Hi Chris,
I found the problem. "ceph-volume simple activate" modifies the OSD's metadata
in an invalid way.
On a converted pre-LVM ceph-disk OSD I had in my cupboard:
[root@ceph-adm:ceph-20 ~]# mount /dev/sdq1 mnt
[root@ceph-adm:ceph-20 ~]# ls -l mnt
[...]
lrwxrwxrwx. 1 ceph ceph 58 Mar 15 2019 block -> /dev/disk/by-partuuid/a1e5ef7d-9bab-4911-abe5-9075b91d88a4
[...]
[root@ceph-adm:ceph-20 ~]# umount mnt
[root@ceph-adm:ceph-20 ~]# cat /etc/ceph/osd/59-9b88d6ec-87a4-4640-b80e-81d3d56fac15.json
{
  "active": "ok",
  "block": {
    "path": "/dev/disk/by-partuuid/a1e5ef7d-9bab-4911-abe5-9075b91d88a4",
    "uuid": "a1e5ef7d-9bab-4911-abe5-9075b91d88a4"
  },
  "block_uuid": "a1e5ef7d-9bab-4911-abe5-9075b91d88a4",
  "bluefs": 1,
  "ceph_fsid": "e4ece518-f2cb-4708-b00f-b6bf511e91d9",
  "cluster_name": "ceph",
  "data": {
    "path": "/dev/sdq1",
    "uuid": "9b88d6ec-87a4-4640-b80e-81d3d56fac15"
  },
  "fsid": "9b88d6ec-87a4-4640-b80e-81d3d56fac15",
  "keyring": "AQBP4opcBeCYOxAA4sOpTthNE6T28WUf4Bgm3w==",
  "kv_backend": "rocksdb",
  "magic": "ceph osd volume v026",
  "mkfs_done": "yes",
  "none": "",
  "ready": "ready",
  "require_osd_release": "",
  "type": "bluestore",
  "whoami": 59
}
Now, "ceph-volume simple activate" modifies the symlink "block" to
point to an unstable path:
[root@ceph-adm:ceph-20 ~]# ceph-volume simple activate --file "/etc/ceph/osd/59-9b88d6ec-87a4-4640-b80e-81d3d56fac15.json" --no-systemd
Running command: /usr/bin/mount -v /dev/sdq1 /var/lib/ceph/osd/ceph-59
stdout: mount: /dev/sdq1 mounted on /var/lib/ceph/osd/ceph-59.
Running command: /usr/bin/ln -snf /dev/sdq2 /var/lib/ceph/osd/ceph-59/block
Running command: /usr/bin/chown -R ceph:ceph /dev/sdq2
--> Skipping enabling of `simple` systemd unit
--> Skipping masking of ceph-disk systemd units
--> Skipping enabling and starting OSD simple systemd unit because --no-systemd was used
--> Successfully activated OSD 59 with FSID 9b88d6ec-87a4-4640-b80e-81d3d56fac15
It's the command "/usr/bin/ln -snf /dev/sdq2 /var/lib/ceph/osd/ceph-59/block"
that destroys the integrity of the OSD. If you reboot the machine and the devices get
different names, the next execution of "ceph-volume simple scan" will produce a
corrupted metadata file. The same happens if you move a converted OSD to another
host and try to scan and start it there.
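To find OSDs on a host that are already affected, one could check where each "block" symlink points. Below is a minimal POSIX-shell sketch, assuming the default /var/lib/ceph/osd/ceph-<id> layout seen in the transcript above. The function name list_unstable_block_links is hypothetical and not part of ceph-volume.

```shell
#!/bin/sh
# Hypothetical helper (not part of ceph-volume): print every OSD "block"
# symlink whose target is NOT under /dev/disk/by-partuuid/.
# $1: base directory, defaults to /var/lib/ceph/osd (overridable for testing).
list_unstable_block_links() {
  base=${1:-/var/lib/ceph/osd}
  for link in "$base"/ceph-*/block; do
    # Skip if the glob matched nothing or the entry is not a symlink.
    [ -L "$link" ] || continue
    target=$(readlink "$link")
    case $target in
      /dev/disk/by-partuuid/*) ;;                # stable target, nothing to report
      *) printf '%s -> %s\n' "$link" "$target" ;; # e.g. /dev/sdq2
    esac
  done
}
```

Running it after "simple activate" on the OSD above would be expected to list /var/lib/ceph/osd/ceph-59/block -> /dev/sdq2.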
Changing the symbolic link to an unstable device path is a critical bug, and I
don't even understand why it happens in the first place. There is no point to it; the
only valid link target is
"/dev/disk/by-partuuid/a1e5ef7d-9bab-4911-abe5-9075b91d88a4" anyway.
I can work around this by resetting the link to its correct value after activation.
However, this should really be fixed.
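The workaround of resetting the link after activation might look like the POSIX-shell sketch below. It assumes the partition UUID is taken from the "block_uuid" field of the scan JSON shown earlier and that the OSD has not been started yet. restore_block_link is a hypothetical name, not a ceph-volume command.

```shell
#!/bin/sh
# Hypothetical helper (not part of ceph-volume): point the OSD's "block"
# symlink back at the stable by-partuuid path after "simple activate".
#   $1  OSD data directory, e.g. /var/lib/ceph/osd/ceph-59
#   $2  block partition UUID, e.g. "block_uuid" from /etc/ceph/osd/<id>-<fsid>.json
restore_block_link() {
  osd_dir=$1
  block_uuid=$2
  # -s: symbolic link, -n: don't dereference an existing link,
  # -f: replace it (same flags ceph-volume uses in the transcript above)
  ln -snf "/dev/disk/by-partuuid/$block_uuid" "$osd_dir/block"
}
```

If jq is installed, the UUID can be read from the JSON with something like restore_block_link /var/lib/ceph/osd/ceph-59 "$(jq -r .block_uuid /etc/ceph/osd/59-9b88d6ec-87a4-4640-b80e-81d3d56fac15.json)". The original link was owned by ceph:ceph; "chown -h ceph:ceph" on the link restores that if wanted.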
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Chris Dunlop <chris(a)onthe.net.au>
Sent: 03 March 2021 05:06:09
To: Frank Schilder
Cc: ceph-users(a)ceph.io
Subject: Re: [ceph-users] OSD id 241 != my id 248: conversion from "ceph-disk" to "ceph-volume simple" destroys OSDs
Hi Frank,
On Tue, Mar 02, 2021 at 02:58:05PM +0000, Frank Schilder wrote:
> Hi all,
> this is a follow-up on "reboot breaks OSDs converted from ceph-disk to ceph-volume
> simple".
> I converted a number of ceph-disk OSDs to ceph-volume using "simple scan" and
> "simple activate". Somewhere along the way, the OSDs' metadata gets corrupted, and
> the most prominent symptom is that the symlink "block" is changed from a partuuid
> target to an unstable device-name target, for example:
> before conversion:
> block -> /dev/disk/by-partuuid/9123be91-7620-495a-a9b7-cc85b1de24b7
> after conversion:
> block -> /dev/sdj2
> This is a huge problem, as the "after conversion" device names are unstable. I now
> have a cluster on which I cannot reboot servers because of this. OSDs whose devices
> get randomly reassigned refuse to start with:
> 2021-03-02 15:56:21.709 7fb7c2549b80 -1 OSD id 241 != my id 248
> Please help me get out of this mess.
These paths might be coming from the /etc/ceph/osd/*.json files.
Have you tried editing the files to replace the /dev/sdXX paths with the by-partuuid paths?
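Before editing the JSON files, one could first list which of them still contain kernel device names such as "/dev/sdq1" or "/dev/sdj2". This is a minimal POSIX-shell sketch that only reports candidates and edits nothing. The function name list_json_with_kernel_names is hypothetical, and the pattern assumes /dev/sd* style names only.

```shell
#!/bin/sh
# Hypothetical helper: print the names of ceph-volume "simple" JSON files
# that contain a quoted /dev/sdX or /dev/sdXN path.
# $1: directory to search, defaults to /etc/ceph/osd.
list_json_with_kernel_names() {
  dir=${1:-/etc/ceph/osd}
  # -l: print matching file names only; -E: extended regex
  grep -lE '"/dev/sd[a-z]+[0-9]*"' "$dir"/*.json
}
```

For the JSON shown earlier in this thread, this would flag the file because of the "data" path "/dev/sdq1", even though its "block" path already uses by-partuuid.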
Cheers,
Chris