in the "ceph orch device ls --format json-pretty" output, in the blob for
that specific device, is the "ceph_device" field set? There was a bug where
it wouldn't be set at all (
https://tracker.ceph.com/issues/57100) and it
would make it so you couldn't use a device serving as a db device for any
further OSDs, unless the device was fully cleaned out (so it is no longer
serving as a db device). The "ceph_device" field is meant to be our way of
knowing "yes there are LVM partitions here, but they're our partitions for
ceph stuff, so we can still use the device" and without it (or with it just
being broken, as in the tracker) redeploying OSDs that used the device for
its DB wasn't working as we don't know if those LVs imply its our device or
has LVs for some other purpose. I had thought this was fixed already in
16.2.13 but it sounds too similar to what you're seeing not to consider it.
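A quick way to check is just to grep the JSON for that host (a rough
sketch; exactly where the field shows up in the blob, per device or per LV,
may differ a bit between versions):

ceph orch device ls node05 --format json-pretty | grep -n ceph_device

If it doesn't show up at all for /dev/nvme2n1, that would line up with the
tracker issue.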
On Tue, Jul 18, 2023 at 10:53 AM Luis Domingues <luis.domingues(a)proton.ch>
wrote:
Hi,
We are running a ceph cluster managed with cephadm v16.2.13. Recently we
needed to replace a disk, which we did with:
ceph orch osd rm 37 --replace
It worked fine: the disk was drained and the OSD was marked as destroyed.
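For reference, the progress of such a removal/replacement can be followed
with:
ceph orch osd rm status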
However, after changing the disk, no OSD was created. Looking at the db
device, the db partition for OSD 37 was still there, so we destroyed it
with:
ceph-volume lvm zap --osd-id=37 --destroy
But still no OSD was redeployed.
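To double-check the zap, the LVs that ceph-volume still knows about on that
node can be listed (run on the host itself, e.g. via "cephadm ceph-volume"):
ceph-volume lvm list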
Here is our spec:
---
service_type: osd
service_id: osd-hdd
placement:
  label: osds
spec:
  data_devices:
    rotational: 1
  encrypted: true
  db_devices:
    size: '1TB:2TB'
  db_slots: 12
And the disk looks good:
HOST    PATH          TYPE  DEVICE ID                                   SIZE   AVAILABLE  REFRESHED  REJECT REASONS
node05  /dev/nvme2n1  ssd   SAMSUNG MZPLJ1T6HBJR-00007_S55JNG0R600357   1600G             12m ago    LVM detected, locked
node05  /dev/sdk      hdd   SEAGATE_ST10000NM0206_ZA21G2170000C7240KPF  10.0T  Yes        12m ago
And the VG on the db_device looks to have enough space:
ceph-33b06f1a-f6f6-57cf-9ca8-6e4aa81caae0 1 11 0 wz--n- <1.46t 173.91g
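(That line is in "vgs" format, columns VG, #PV, #LV, #SN, Attr, VSize and
VFree, e.g. as printed by:
vgs ceph-33b06f1a-f6f6-57cf-9ca8-6e4aa81caae0
so roughly 173.91 GiB free out of a VG just under 1.46 TiB.)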
If I remove the db_devices and db_slots from the specs, and do a dry run,
the orchestrator seems to see the new disk as available:
ceph orch apply -i osd_specs.yml --dry-run
WARNING! Dry-Runs are snapshots of a certain point in time and are bound
to the current inventory setup. If any of these conditions change, the
preview will be invalid. Please make sure to have a minimal
timeframe between planning and applying the specs.
####################
SERVICESPEC PREVIEWS
####################
+---------+------+--------+-------------+
|SERVICE |NAME |ADD_TO |REMOVE_FROM |
+---------+------+--------+-------------+
+---------+------+--------+-------------+
################
OSDSPEC PREVIEWS
################
+---------+---------+-------------------------+----------+----+-----+
|SERVICE |NAME |HOST |DATA |DB |WAL |
+---------+---------+-------------------------+----------+----+-----+
|osd |osd-hdd |node05 |/dev/sdk |- |- |
+---------+---------+-------------------------+----------+----+-----+
But as soon as I add db_devices back, the orchestrator is happy as it is,
as if there were nothing to do:
ceph orch apply -i osd_specs.yml --dry-run
WARNING! Dry-Runs are snapshots of a certain point in time and are bound
to the current inventory setup. If any of these conditions change, the
preview will be invalid. Please make sure to have a minimal
timeframe between planning and applying the specs.
####################
SERVICESPEC PREVIEWS
####################
+---------+------+--------+-------------+
|SERVICE |NAME |ADD_TO |REMOVE_FROM |
+---------+------+--------+-------------+
+---------+------+--------+-------------+
################
OSDSPEC PREVIEWS
################
+---------+------+------+------+----+-----+
|SERVICE |NAME |HOST |DATA |DB |WAL |
+---------+------+------+------+----+-----+
I do not know why ceph will not use this disk, and I do not know where to
look; the logs do not seem to say anything. And the weirdest thing is that
another disk was replaced on the same machine, and it went through without
any issues.
Luis Domingues
Proton AG