I have a cluster (14.2.11) with bluestore osds that have their block.db on an SSD partition separate from the primary osd device.  On some of my storage servers, the osd processes fail to start at boot because the permissions on the block.db device are either never changed from root:root or are reset by udev after ceph-volume-systemd has run successfully.  The problem only occurs on a couple of the storage servers, though all of them are configured the same and run the same software versions.
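For reference, this is roughly how I'm checking the ownership of the DB devices (paths assume the default /var/lib/ceph/osd layout):

# Resolve each osd's block.db symlink and print the owner of the target device.
for db in /var/lib/ceph/osd/ceph-*/block.db; do
    dev=$(readlink -f "$db")
    echo "$db -> $dev ($(stat -c %U:%G "$dev"))"
done

On the affected hosts the target partition shows up as root:root instead of ceph:ceph after boot.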

I suspect a race condition or a conflict with the udev rules, but I have not been able to identify where the problem lies, and udev is a complete nightmare to debug and diagnose.
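In case someone wants to poke at this on their own hosts, something along these lines should show what udev is doing during osd activation (sdb1 is just a placeholder for the DB partition):

# Watch udev events (and the properties they carry) while an osd activates.
udevadm monitor --udev --property

# Replay the rules for the DB partition to see which rules would fire.
udevadm test /sys/class/block/sdb1

# Dump the current udev database entry for the device.
udevadm info --query=all --name=/dev/sdb1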

One workaround is to update /usr/lib/ceph/ceph-osd-prestart.sh so that it checks (and corrects) the permissions on the block.db device before the osd starts.  This particular script looks like it hasn't been updated to support bluestore, so I added some lines to address the problem, which works for me.

Has anyone else seen a similar issue and found a different solution? 

Here is the code I added to the ceph-osd-prestart.sh script:

...
blockdb="$data/block.db"
if [ -L "$blockdb" -a -e "$blockdb" ]; then
    dev_db=`readlink -f $blockdb`
    owner=`stat -c %U $dev_db`
    if [ $owner != 'ceph' ]; then
        echo "ceph-osd(${cluster:-ceph}-$id): bluestore DB ($dev_db) has incorrect permissions, fixing." 1>&2
        chown ceph:ceph $dev_db
    fi
fi
...
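For completeness, the other approach I've been wondering about (untested sketch; the PARTUUID is a placeholder you'd take from blkid) is a udev rule that pins the ownership of the DB partition directly, e.g. in /etc/udev/rules.d/99-ceph-db-perms.rules:

# Untested sketch: force ceph ownership on the block.db partition.
ACTION=="add|change", SUBSYSTEM=="block", ENV{ID_PART_ENTRY_UUID}=="<partuuid-from-blkid>", OWNER="ceph", GROUP="ceph", MODE="0660"

Though that only papers over the race in much the same way the prestart check does.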


thanks,
   Wyllys Ingersoll