That part looks quite good:
"available": false,
"ceph_device": true,
"created": "2023-07-18T16:01:16.715487Z",
"device_id": "SAMSUNG MZPLJ1T6HBJR-00007_S55JNG0R600354",
"human_readable_type": "ssd",
"lsm_data": {},
"lvs": [
{
"cluster_fsid": "11b47c57-5e7f-44c0-8b19-ddd801a89435",
"cluster_name": "ceph",
"db_uuid": "CUMgp7-Uscn-ASLo-bh14-7Sxe-80GE-EcywDb",
"name":
"osd-block-db-5cb8edda-30f9-539f-b4c5-dbe420927911",
"osd_fsid": "089894cf-1782-4a3a-8ac0-9dd043f80c71",
"osd_id": "7",
"osdspec_affinity": "",
"type": "db"
},
{
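
For reference, the flag can also be checked programmatically. A minimal sketch in Python, run against a hypothetical, simplified sample of the "ceph orch device ls --format json-pretty" output (the real JSON layout may differ between releases, so adjust the keys to your output):

```python
import json

# Sketch: check the "ceph_device" flag across saved orchestrator output,
#   ceph orch device ls --format json-pretty > devices.json
# The sample below is hypothetical and heavily simplified.
sample = """
[
  {"path": "/dev/nvme2n1", "available": false, "ceph_device": true},
  {"path": "/dev/sdk", "available": true}
]
"""

for dev in json.loads(sample):
    # A missing "ceph_device" key is the symptom of tracker issue 57100.
    print(f'{dev["path"]}: ceph_device={dev.get("ceph_device", "MISSING")}')
```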
I forgot to mention that the cluster was initially deployed with ceph-ansible and adopted
by cephadm.
Luis Domingues
Proton AG
------- Original Message -------
On Tuesday, July 18th, 2023 at 18:15, Adam King <adking(a)redhat.com> wrote:
> in the "ceph orch device ls --format json-pretty" output, in the blob for
> that specific device, is the "ceph_device" field set? There was a bug
where
> it wouldn't be set at all (
https://tracker.ceph.com/issues/57100) and it
> would make it so you couldn't use a device serving as a db device for any
> further OSDs, unless the device was fully cleaned out (so it is no longer
> serving as a db device). The "ceph_device" field is meant to be our way of
> knowing "yes there are LVM partitions here, but they're our partitions for
> ceph stuff, so we can still use the device" and without it (or with it just
> being broken, as in the tracker) redeploying OSDs that used the device for
> its DB wasn't working, as we don't know whether those LVs mean it's our
> device or the device holds LVs for some other purpose. I had thought this
> was fixed already in
> 16.2.13 but it sounds too similar to what you're seeing not to consider it.
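
The decision described above can be sketched as follows (a hypothetical helper for illustration, not the actual cephadm code path):

```python
# Sketch: a device already carrying LVs is treated as reusable only when
# "ceph_device" is true, i.e. the LVs are ceph-owned.
def device_usable(has_lvs: bool, ceph_device: bool) -> bool:
    if not has_lvs:
        return True       # a clean device is always a candidate
    return ceph_device    # LVs present: only ceph-owned ones keep it usable

# With the flag missing/false (the bug in issue 57100), a db device holding
# ceph-owned LVs is wrongly rejected for further OSDs:
print(device_usable(has_lvs=True, ceph_device=False))  # → False
print(device_usable(has_lvs=True, ceph_device=True))   # → True
```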
>
> On Tue, Jul 18, 2023 at 10:53 AM Luis Domingues <luis.domingues(a)proton.ch>
> wrote:
>
> > Hi,
> >
> > We are running a ceph cluster managed with cephadm v16.2.13. Recently we
> > needed to change a disk, and we replaced it with:
> >
> > ceph orch osd rm 37 --replace.
> >
> > It worked fine: the disk was drained and the OSD was marked as destroyed.
> >
> > However, after changing the disk, no OSD was created. Looking at the db
> > device, the partition for the db of OSD 37 was still there. So we destroyed it
> > using:
> > ceph-volume lvm zap --osd-id=37 --destroy.
> >
> > But we still have no OSD redeployed.
> > Here we have our spec:
> >
> > ---
> > service_type: osd
> > service_id: osd-hdd
> > placement:
> >   label: osds
> > spec:
> >   data_devices:
> >     rotational: 1
> >     encrypted: true
> >   db_devices:
> >     size: '1TB:2TB'
> >   db_slots: 12
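
For what it's worth, the size: '1TB:2TB' filter in the spec should admit the 1600G NVMe. A rough sketch of the range match (a hypothetical helper, not ceph's actual filtering code, assuming vendor-style decimal units where 1TB = 1000 GB, matching the 1600G size reported by "ceph orch device ls"):

```python
# Sketch: parse a 'low:high' drive-group size filter and test a device size.
def parse_size_gb(token: str) -> float:
    token = token.strip().upper()
    if token.endswith("TB"):
        return float(token[:-2]) * 1000.0
    if token.endswith("GB"):
        return float(token[:-2])
    return float(token)  # bare number: assume GB

def in_size_range(size_gb: float, spec: str) -> bool:
    low, high = (parse_size_gb(t) for t in spec.split(":"))
    return low <= size_gb <= high

print(in_size_range(1600, "1TB:2TB"))  # → True
```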
> >
> > And the disk looks good:
> >
> > HOST    PATH          TYPE  DEVICE ID                                   SIZE   AVAILABLE  REFRESHED  REJECT REASONS
> > node05  /dev/nvme2n1  ssd   SAMSUNG MZPLJ1T6HBJR-00007_S55JNG0R600357   1600G             12m ago    LVM detected, locked
> > node05  /dev/sdk      hdd   SEAGATE_ST10000NM0206_ZA21G2170000C7240KPF  10.0T  Yes        12m ago
> >
> > And the VG on the db device looks to have enough space:
> >
> > VG                                        #PV #LV #SN Attr   VSize  VFree
> > ceph-33b06f1a-f6f6-57cf-9ca8-6e4aa81caae0   1  11   0 wz--n- <1.46t 173.91g
> >
> > If I remove the db_devices and db_slots from the specs, and do a dry run,
> > the orchestrator seems to see the new disk as available:
> >
> > ceph orch apply -i osd_specs.yml --dry-run
> > WARNING! Dry-Runs are snapshots of a certain point in time and are bound
> > to the current inventory setup. If any of these conditions change, the
> > preview will be invalid. Please make sure to have a minimal
> > timeframe between planning and applying the specs.
> > ####################
> > SERVICESPEC PREVIEWS
> > ####################
> > +---------+------+--------+-------------+
> > |SERVICE |NAME |ADD_TO |REMOVE_FROM |
> > +---------+------+--------+-------------+
> > +---------+------+--------+-------------+
> > ################
> > OSDSPEC PREVIEWS
> > ################
> > +---------+---------+-------------------------+----------+----+-----+
> > |SERVICE |NAME |HOST |DATA |DB |WAL |
> > +---------+---------+-------------------------+----------+----+-----+
> > |osd |osd-hdd |node05 |/dev/sdk |- |- |
> > +---------+---------+-------------------------+----------+----+-----+
> >
> > But as soon as I add db_devices back, the orchestrator acts as if there
> > is nothing to do:
> >
> > ceph orch apply -i osd_specs.yml --dry-run
> > WARNING! Dry-Runs are snapshots of a certain point in time and are bound
> > to the current inventory setup. If any of these conditions change, the
> > preview will be invalid. Please make sure to have a minimal
> > timeframe between planning and applying the specs.
> > ####################
> > SERVICESPEC PREVIEWS
> > ####################
> > +---------+------+--------+-------------+
> > |SERVICE |NAME |ADD_TO |REMOVE_FROM |
> > +---------+------+--------+-------------+
> > +---------+------+--------+-------------+
> > ################
> > OSDSPEC PREVIEWS
> > ################
> > +---------+------+------+------+----+-----+
> > |SERVICE |NAME |HOST |DATA |DB |WAL |
> > +---------+------+------+------+----+-----+
> >
> > I do not know why Ceph will not use this disk, nor where to look; the
> > logs do not say anything. And the weirdest thing: another disk was
> > replaced on the same machine without any issues.
> >
> > Luis Domingues
> > Proton AG
> > _______________________________________________
> > ceph-users mailing list -- ceph-users(a)ceph.io
> > To unsubscribe send an email to ceph-users-leave(a)ceph.io