I'm building a new 4-node Proxmox/Ceph cluster to hold disk images for our VMs (Ceph version 15.2.5).
Each node has 6 x NVMe SSDs (4TB), and 1 x Optane drive (960GB).
The CPU is an AMD EPYC 7442 (Rome), so there should be plenty of CPU capacity to spare.
My aim is to create 4 x OSDs per NVMe SSD (to make more effective use of the NVMe
performance) and use the Optane drive to hold the WAL/DB partition for each OSD, i.e. a
total of 24 x 35GB WAL/DB partitions on the Optane.
However, I am struggling to get the right ceph-volume command to achieve this.
Thanks to a very kind Redditor, I was able to get close:
/dev/nvme0n1 is an Optane device (900GB).
/dev/nvme2n1 is an Intel NVMe SSD (4TB).
```
# ceph-volume lvm batch --osds-per-device 4 /dev/nvme2n1 --db-devices /dev/nvme0n1
Total OSDs: 4

Solid State VG:
  Targets:   block.db                  Total size: 893.00 GB
  Total LVs: 16                        Size per LV: 223.25 GB
  Devices:   /dev/nvme0n1

  Type            Path                                      LV Size         % of device
------------------------------------------------------------------------------------------
  [data]          /dev/nvme2n1                              931.25 GB       25.0%
  [block.db]      vg: vg/lv                                 223.25 GB       25%
------------------------------------------------------------------------------------------
  [data]          /dev/nvme2n1                              931.25 GB       25.0%
  [block.db]      vg: vg/lv                                 223.25 GB       25%
------------------------------------------------------------------------------------------
  [data]          /dev/nvme2n1                              931.25 GB       25.0%
  [block.db]      vg: vg/lv                                 223.25 GB       25%
------------------------------------------------------------------------------------------
  [data]          /dev/nvme2n1                              931.25 GB       25.0%
  [block.db]      vg: vg/lv                                 223.25 GB       25%
--> The above OSDs would be created if the operation continues
--> do you want to proceed? (yes/no)
```
This does split the NVMe disk into 4 OSDs and create a WAL/DB partition on the Optane
drive for each one - however, it creates 4 x 223 GB partitions on the Optane (whereas I
want 35 GB partitions).
Is there any way to specify the WAL/DB partition size in the above?
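From the ceph-volume docs it looks like the batch subcommand accepts a --block-db-size option, so I'm guessing something along these lines would do it (untested - and I'm not sure whether it wants a plain byte count or accepts a suffix like 35G):
```
# Dry-run with --report first, to check the proposed layout before creating anything
ceph-volume lvm batch --report \
    --osds-per-device 4 \
    --block-db-size 35G \
    /dev/nvme2n1 --db-devices /dev/nvme0n1

# If it insists on bytes: 35 GiB = 37580963840
# ceph-volume lvm batch --report --osds-per-device 4 --block-db-size 37580963840 \
#     /dev/nvme2n1 --db-devices /dev/nvme0n1
```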
And can it be done in such a way that you can run successive ceph-volume commands, one per
NVMe disk, to add the OSDs and their WAL/DB partitions?
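If that works, my hope is that adding the rest of the disks is just a matter of repeating the same command once per data disk, pointing at the same Optane device each time, and letting ceph-volume carve out further DB LVs alongside the ones already there. Something like this is what I have in mind (again untested - the device names after the first are just placeholders for my other five SSDs, and I don't know whether the batch calculator copes with the space already used on the Optane):
```
# One batch run per 4TB NVMe SSD, all sharing the Optane (/dev/nvme0n1) for block.db.
# Device names after the first are placeholders for the remaining SSDs in each node.
for dev in /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 /dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1; do
    ceph-volume lvm batch --yes \
        --osds-per-device 4 \
        --block-db-size 35G \
        "$dev" --db-devices /dev/nvme0n1
done
```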
(Or if there's an easier way to achieve the above layout, please let me know).
That being said - I also just saw this ceph-users thread:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/3Y6DEJCF7ZM…
It talks about "osd op num shards" and "osd op num threads per shard" - is there some way
to set those to achieve performance similar to, say, 4 x OSDs per NVMe drive, but with only
1 x OSD per NVMe? Has anybody done any testing/benchmarking on this that they can share?
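For completeness, if those options turn out to be the right lever, I assume they'd be set with something like the below (the values are placeholders, not recommendations - my understanding is the SSD defaults are 8 shards x 2 threads per shard, and that changes only take effect after the OSDs are restarted):
```
# Placeholder values - I haven't benchmarked these at all
ceph config set osd osd_op_num_shards_ssd 16
ceph config set osd osd_op_num_threads_per_shard_ssd 2

# My understanding is these are read at OSD start-up, so a restart is needed
systemctl restart ceph-osd.target
```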