Hi,
I recently restarted a storage node in our Ceph cluster and had an
issue bringing one of the OSDs back online. This storage node has
multiple HDDs, each serving as a dedicated OSD for a data pool, and a
single NVMe drive with an LVM partition assigned as an OSD in a
metadata pool.
After rebooting the host, the OSD backed by the LVM partition did not
restart. When I try to start the OSD manually with systemctl, I can
follow the launch of a podman container and see an error message
before the container shuts down again:
Sep 23 14:02:06 X bash[30318]: Running command:
/usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev
/dev/boot/cephfs_meta --path /var/lib/ceph/osd/ceph-165
--no-mon-config
Sep 23 14:02:06 X bash[30318]: stderr: failed to read label for
/dev/boot/cephfs_meta: (2) No such file or directory
Sep 23 14:02:06 X bash[30318]: --> RuntimeError: command returned
non-zero exit status: 1
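For completeness, the start command I am using is along these lines
(the <fsid> placeholder stands in for our actual cluster fsid; the
unit name follows the usual cephadm naming convention):
  sudo systemctl start ceph-<fsid>@osd.165.service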
From the host itself, the device looks fine:
1. The /dev/boot/cephfs_meta symlink exists and points to the device
../dm-3
2. `lsblk` shows the LVM partition 'boot-cephfs_meta' under nvme0n1p3
3. `sudo lvscan --all` shows it as activated:
` ACTIVE '/dev/boot/cephfs_meta' [3.42 TiB] inherit`
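Since the error is about failing to read the label, I could presumably
also test reading the BlueStore label directly on the host, assuming
ceph-bluestore-tool is installed outside the container; a sketch:
  sudo ceph-bluestore-tool show-label --dev /dev/boot/cephfs_meta
If that succeeds on the host, the device is readable there and the
failure would seem specific to the container environment.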
This is on a CentOS 8 system, with ceph version 15.2.1
(9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable).
Related issues I have found include:
1. https://github.com/rook/rook/issues/2591
2. https://github.com/rook/rook/issues/3289
The suggested fix in both of those issues was to install the LVM2
package. I did that with `sudo dnf install lvm2`, then rebooted the
system and restarted the container, but this did not resolve the
problem for the LVM-partition-based OSD.
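Since ceph-volume is running inside the podman container, I also
wonder whether the device node is visible from in there at all; if I
understand `cephadm shell` correctly, a check along these lines would
tell (sketch, untested):
  sudo cephadm shell -- ls -l /dev/boot/cephfs_meta /dev/dm-3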
This LVM-based OSD was initially created with a `ceph-volume` command:
`ceph-volume lvm create --bluestore --data /dev/sd<x> --block.db /dev/nvme0n1<partition-nr>`
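One workaround I have been considering (but have not tried) is to
re-activate the OSD manually with ceph-volume; my understanding is
that under cephadm this would be roughly:
  sudo cephadm ceph-volume -- lvm list
  sudo cephadm ceph-volume -- lvm activate 165 <osd-fsid>
where <osd-fsid> is the per-OSD fsid reported by `lvm list`. I am not
certain this is the right invocation for a containerized deployment.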
Is there a workaround for this problem, where the container process
is unable to read the label of the LVM partition and therefore fails
to start the OSD?
Thanks,
Matt
--
Matt Larson, PhD
Madison, WI 53705 U.S.A.