In case anyone was wondering, I figured out the problem. It is this nasty bug
in Pacific 16.2.10:
https://tracker.ceph.com/issues/56031 - I believe it is
fixed in the upcoming 16.2.11 release and in Quincy.
This bug causes the computed maximum size of the bluestore DB partition to be much
smaller than it should be, so if you request a reasonable size that is larger than the
incorrectly computed maximum, the DB creation fails.
Our problem was that we added 3 new SSDs that were considered "unused" by the
system, giving us a total of 8 (5 used, 3 unused). When the orchestrator issues a
"ceph-volume lvm batch" command, it passes 40 data devices and 8 db devices.
Normally, you would expect it to divide them into 5 slots per DB device (40/8). But when
it computes the size of the slots, that is where the problem occurs.
ceph-volume first sees the 3 unused devices in a group and incorrectly decides that the
number of slots needed is 3 * 5 = 15, then divides the size of a single DB device by 15,
making the maximum DB size 3x smaller than it should be. If the code had used the
combined size of all the devices in the group when computing the maximum, it would have
been fine, but it only accounts for the size of the first DB device in the group.
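To make the arithmetic concrete, here is a minimal sketch of the computation (the sizes are illustrative, based on our 1.7TB SSDs and the 40-data / 8-db layout):

```python
# Illustrative numbers: one 1.7 TB DB SSD, 40 data devices across 8 DB devices.
db_dev_size_gb = 1700
slots_per_db_dev = 40 // 8      # 5 slots per DB device
unused_in_group = 3             # the 3 new "unused" SSDs grouped together

# Buggy (16.2.10): slot count is multiplied by the group size, but only
# ONE device's size is divided by it.
buggy_max_gb = db_dev_size_gb / (slots_per_db_dev * unused_in_group)

# Correct: one device's size divided by its own slot count.
correct_max_gb = db_dev_size_gb / slots_per_db_dev

print(round(buggy_max_gb), correct_max_gb)   # 113 vs 340.0
```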
The workaround is to trick Ceph into treating each DB device as its own group of 1 by
putting a minimal VG with a unique name on each of the unused SSDs. When ceph-volume
then computes the sizing, it sees groups of 1 and doesn't multiply the slot count
incorrectly. I used "vgcreate bug1 -s 1M /dev/xyz" to create a bogus VG on each of the
unused SSDs, and now I have properly sized DB devices on the new SSDs (the "bugX" VGs
can be removed once there are legitimate DB VGs on the device).
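Scripted, the workaround looks like the loop below. The device names are placeholders for your actual unused SSDs, and the commands are only echoed as a dry run - drop the "echo" to really create the VGs:

```shell
# Placeholder device names - substitute your actual unused SSDs.
# 'echo' makes this a dry run; remove it to actually create the VGs.
i=1
for dev in /dev/sdx /dev/sdy /dev/sdz; do
    echo vgcreate "bug$i" -s 1M "$dev"
    i=$((i + 1))
done
# Later, once legitimate DB VGs exist on a device:
#   vgremove bugN
```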
Question - Because our cluster was initially laid out using the buggy ceph-volume
(16.2.10), we now have hundreds of DB devices that are far smaller than they should be
(far less than the recommended 1-4% of the data device size). Is it possible to resize
the DB devices without destroying and recreating the OSD itself?
What are the implications of having bluestore DB devices that are far smaller than they
should be?
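(For reference, the approach I would try first - untested, and I'd welcome corrections: grow the DB LV with lvextend, then tell BlueFS about the new space with ceph-bluestore-tool. The VG/LV names and OSD id below are placeholders, this assumes free extents exist in the VG, and the commands are echoed as a dry run.)

```shell
# Dry run (echo): grow the DB LV, then expand BlueFS into the new space.
# Placeholders: VG ceph-db-vg, LV db-lv, OSD id 12. The OSD must be stopped,
# and with cephadm the OSD data path may differ from the one shown here.
OSD_ID=12
echo systemctl stop ceph-osd@"$OSD_ID"
echo lvextend -L 300G /dev/ceph-db-vg/db-lv
echo ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-"$OSD_ID"
echo systemctl start ceph-osd@"$OSD_ID"
```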
thanks,
Wyllys Ingersoll
________________________________
From: Wyll Ingersoll <wyllys.ingersoll(a)keepertech.com>
Sent: Friday, January 13, 2023 4:35 PM
To: ceph-users(a)ceph.io <ceph-users(a)ceph.io>
Subject: [ceph-users] ceph orch osd spec questions
Ceph Pacific 16.2.9
We have a storage server with multiple 1.7TB SSDs dedicated to the bluestore DB usage.
The osd spec originally was misconfigured slightly and had set the "limit"
parameter on the db_devices to 5 (there are 8 SSDs available) and did not specify a
block_db_size. Ceph laid out the original 40 OSDs and put 8 DBs on each of 5 of the SSDs
(because of the limit param). Ceph seems to have auto-sized the bluestore DB partitions to be
about 45GB, which is far less than the recommended 1-4% (using 10TB drives). How does
ceph-volume determine the size of the bluestore DB/WAL partitions when it is not specified
in the spec?
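For comparison, the commonly cited 1-4% guideline works out to the following for our 10TB data drives (quick back-of-the-envelope, using decimal GB):

```python
# 1-4% of a 10 TB data device, in GB (taking 1 TB = 1000 GB).
data_gb = 10 * 1000
low_gb = data_gb * 0.01     # lower end of the guideline: 100 GB
high_gb = data_gb * 0.04    # upper end of the guideline: 400 GB
observed_gb = 45            # what ceph-volume actually allocated

print(low_gb, high_gb, observed_gb < low_gb)   # 100.0 400.0 True
```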
We updated the spec and specified a block_db_size of 300G and removed the
"limit" value. Now we can see in the cephadm.log that the ceph-volume command
being issued is using the correct list of SSD devices (all 8) as options to the lvm batch
(--db-devices ...), but it keeps failing to create the new OSD because we are asking for
300G and it thinks there is only 44G available even though the last 3 SSDs in the list are
empty (1.7T). So, it appears that somehow the orchestrator is ignoring the last 3 SSDs.
I have verified that these SSDs are wiped clean, have no partitions or LVM, and no label
(sgdisk -Z, wipefs -a). They appear as available in the inventory and not locked or
otherwise in use.
Also, the "db_slots" spec parameter is ignored in Pacific due to a bug, so there is no
way to tell the orchestrator how many DB slots to use. Adding "block_db_slots" to the
spec alongside "block_db_size" fails since it is not recognized.
Any help figuring out why these SSDs are being ignored would be much appreciated.
Our spec for this host looks like this:
---
spec:
  data_devices:
    rotational: 1
    size: '3TB:'
  db_devices:
    rotational: 0
    size: ':2T'
    vendor: 'SEAGATE'
  block_db_size: 300G
---
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io