in the "ceph orch device ls --format json-pretty" output, in the blob for
that specific device, is the "ceph_device" field set? There was a bug where
it wouldn't be set at all (
https://tracker.ceph.com/issues/57100) and it
would make it so you couldn't use a device serving as a db device for any
further OSDs, unless the device was fully cleaned out (so it is no longer
serving as a db device). The "ceph_device" field is meant to be our way of
knowing "yes there are LVM partitions here, but they're our partitions for
ceph stuff, so we can still use the device" and without it (or with it just
being broken, as in the tracker) redeploying OSDs that used the device for
its DB wasn't working as we don't know if those LVs imply its our device or
has LVs for some other purpose. I had thought this was fixed already in
16.2.13 but it sounds too similar to what you're seeing not to consider it.
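A quick way to check is just to grep the JSON for that host (a rough
sketch; exactly where the field shows up in the blob, per device or per LV,
may differ a bit between versions):

ceph orch device ls node05 --format json-pretty | grep -n ceph_device

If it doesn't show up at all for /dev/nvme2n1, that would line up with the
tracker issue.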
On Tue, Jul 18, 2023 at 10:53 AM Luis Domingues <luis.domingues(a)proton.ch>
wrote:
Hi,
We are running a ceph cluster managed with cephadm v16.2.13. Recently we
needed to replace a disk, which we did with:
ceph orch osd rm 37 --replace
It worked fine: the disk was drained and the OSD was marked as destroyed.
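For reference, the progress of such a removal/replacement can be followed
with:
ceph orch osd rm status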
However, after changing the disk, no OSD was created. Looking at the db
device, the db partition for OSD 37 was still there, so we destroyed it
with:
ceph-volume lvm zap --osd-id=37 --destroy
But still no OSD was redeployed.
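To double-check the zap, the LVs that ceph-volume still knows about on that
node can be listed (run on the host itself, e.g. via "cephadm ceph-volume"):
ceph-volume lvm list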
Here is our spec:
---
service_type: osd
service_id: osd-hdd
placement:
  label: osds
spec:
  data_devices:
    rotational: 1
  encrypted: true
  db_devices:
    size: '1TB:2TB'
  db_slots: 12
And the disk looks good:
HOST    PATH          TYPE  DEVICE ID                                   SIZE   AVAILABLE  REFRESHED  REJECT REASONS
node05  /dev/nvme2n1  ssd   SAMSUNG MZPLJ1T6HBJR-00007_S55JNG0R600357   1600G             12m ago    LVM detected, locked
node05  /dev/sdk      hdd   SEAGATE_ST10000NM0206_ZA21G2170000C7240KPF  10.0T  Yes        12m ago
And the VG on the db_device looks to have enough space:
ceph-33b06f1a-f6f6-57cf-9ca8-6e4aa81caae0 1 11 0 wz--n- <1.46t 173.91g
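(That line is in "vgs" format, columns VG, #PV, #LV, #SN, Attr, VSize and
VFree, e.g. as printed by:
vgs ceph-33b06f1a-f6f6-57cf-9ca8-6e4aa81caae0
so roughly 173.91 GiB free out of a VG just under 1.46 TiB.)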
If I remove the db_devices and db_slots from the specs, and do a dry run,
the orchestrator seems to see the new disk as available:
ceph orch apply -i osd_specs.yml --dry-run
WARNING! Dry-Runs are snapshots of a certain point in time and are bound
to the current inventory setup. If any of these conditions change, the
preview will be invalid. Please make sure to have a minimal
timeframe between planning and applying the specs.
####################
SERVICESPEC PREVIEWS
####################
+---------+------+--------+-------------+
|SERVICE |NAME |ADD_TO |REMOVE_FROM |
+---------+------+--------+-------------+
+---------+------+--------+-------------+
################
OSDSPEC PREVIEWS
################
+---------+---------+-------------------------+----------+----+-----+
|SERVICE |NAME |HOST |DATA |DB |WAL |
+---------+---------+-------------------------+----------+----+-----+
|osd |osd-hdd |node05 |/dev/sdk |- |- |
+---------+---------+-------------------------+----------+----+-----+
But as soon as I add db_devices back, the orchestrator is happy as it is,
as if there were nothing to do:
ceph orch apply -i osd_specs.yml --dry-run
WARNING! Dry-Runs are snapshots of a certain point in time and are bound
to the current inventory setup. If any of these conditions change, the
preview will be invalid. Please make sure to have a minimal
timeframe between planning and applying the specs.
####################
SERVICESPEC PREVIEWS
####################
+---------+------+--------+-------------+
|SERVICE |NAME |ADD_TO |REMOVE_FROM |
+---------+------+--------+-------------+
+---------+------+--------+-------------+
################
OSDSPEC PREVIEWS
################
+---------+------+------+------+----+-----+
|SERVICE |NAME |HOST |DATA |DB |WAL |
+---------+------+------+------+----+-----+
I do not know why ceph will not use this disk, and I do not know where to
look; the logs do not seem to say anything. And the weirdest thing is that
another disk was replaced on the same machine, and it went through without
any issues.
Luis Domingues
Proton AG