I have a similar setup and have been running some large concurrent
benchmarks, and I am seeing that running multiple OSDs per NVMe doesn't
make much difference. In fact, it actually increases write
amplification under write-heavy workloads, so performance degrades
over time.
Also, if you have powerful enough CPU and memory on the system,
ingest throttling and replication become the bottleneck rather than the NVMe
writes.
It might make a difference for read-heavy workloads, though; I
haven't tested that enough.
Regards,
Shridhar
On Wed, 11 Nov 2020 at 00:27, Jan Fajerski <jfajerski(a)suse.com> wrote:
On Fri, Nov 06, 2020 at 10:15:52AM -0000, victorhooi(a)yahoo.com wrote:
I'm building a new 4-node Proxmox/Ceph cluster, to hold disk images for
our VMs. (Ceph version is 15.2.5).
Each node has 6 x NVMe SSDs (4TB), and 1 x Optane drive (960GB).
CPU is AMD Rome 7442, so there should be plenty of CPU capacity to spare.
My aim is to create 4 x OSDs per NVMe SSD (to make more effective use of
the NVMe performance) and use the Optane drive to store the WAL/DB
partition for each OSD. (I.e. total of 24 x 35GB WAL/DB partitions).
However, I am struggling to get the right ceph-volume command to achieve
this.
Thanks to a very kind Redditor, I was able to get close:
/dev/nvme0n1 is an Optane device (900GB).
/dev/nvme2n1 is an Intel NVMe SSD (4TB).
```
# ceph-volume lvm batch --osds-per-device 4 /dev/nvme2n1 --db-devices /dev/nvme0n1

Total OSDs: 4

Solid State VG:
  Targets:   block.db                  Total size: 893.00 GB
  Total LVs: 16                        Size per LV: 223.25 GB
  Devices:   /dev/nvme0n1

  Type            Path                                      LV Size         % of device
----------------------------------------------------------------------------------------------------
  [data]          /dev/nvme2n1                              931.25 GB       25.0%
  [block.db]      vg: vg/lv                                 223.25 GB       25%
----------------------------------------------------------------------------------------------------
  [data]          /dev/nvme2n1                              931.25 GB       25.0%
  [block.db]      vg: vg/lv                                 223.25 GB       25%
----------------------------------------------------------------------------------------------------
  [data]          /dev/nvme2n1                              931.25 GB       25.0%
  [block.db]      vg: vg/lv                                 223.25 GB       25%
----------------------------------------------------------------------------------------------------
  [data]          /dev/nvme2n1                              931.25 GB       25.0%
  [block.db]      vg: vg/lv                                 223.25 GB       25%

--> The above OSDs would be created if the operation continues
--> do you want to proceed? (yes/no)
```
This does split up the NVMe disk into 4 OSDs, and creates the WAL/DB
partitions on the Optane drive - however, it creates 4 x 223 GB partitions
on the Optane (whereas I want 35GB partitions).
Is there any way to specify the WAL/DB partition size in the above?
And can it be done such that I can run successive ceph-volume commands,
to add the OSDs and WAL/DB partitions for each NVMe disk?
Is there a particular reason you want to run ceph-volume multiple times? The
batch subcommand can handle that in one go, without the need to explicitly
specify any sizes, as another reply proposed (though that will work nicely).
Something like this should get you there:

ceph-volume lvm batch --osds-per-device 4 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 --db-devices /dev/nvme0n1

This of course makes assumptions regarding device names; adjust accordingly.
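If you do want to pin the DB size explicitly, a minimal sketch (assuming the
--block-db-size option is available in your ceph-volume release, and reusing
the device names from the example above) would be:

```
# Explicitly size each DB volume at ~35 GB; the exact size syntax
# (suffixed value vs. raw bytes) can vary between releases.
ceph-volume lvm batch --osds-per-device 4 --block-db-size 35G \
    /dev/nvme2n1 --db-devices /dev/nvme0n1
```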
Another option to size the volumes on the Optane drive would be to rely on the
*slots arguments of the batch subcommand. See either ceph-volume lvm batch --help or
https://docs.ceph.com/en/latest/ceph-volume/lvm/batch/#implicit-sizing
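As an illustrative sketch of the slots approach (device names and availability
of the --block-db-slots flag are assumptions; adjust for your system): with 6
data SSDs per node, 24 slots on the ~900 GB Optane would yield roughly 37 GB
per DB volume, close to the 35 GB target.

```
# Let ceph-volume divide the Optane into 24 equal DB slots (6 SSDs x 4 OSDs).
ceph-volume lvm batch --osds-per-device 4 --block-db-slots 24 \
    /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 /dev/nvme5n1 /dev/nvme6n1 \
    --db-devices /dev/nvme0n1
```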
(Or if there's an easier way to achieve the above layout, please let me know).
That being said - I also just saw this ceph-users thread:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/3Y6DEJCF7ZM…
It talks about "osd op num shards" and "osd op num threads per
shard" - is there some way to set those to achieve similar performance to,
say, 4 x OSDs per NVMe drive, but with only 1 x OSD per NVMe? Has anybody done
any testing/benchmarking on this they can share?
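For anyone who wants to experiment, those names correspond to the
osd_op_num_shards and osd_op_num_threads_per_shard OSD options; a rough sketch
of setting them (the values are purely illustrative, not a tuning
recommendation) would be:

```
# Illustrative values only; the OSDs need a restart for these to take effect.
ceph config set osd osd_op_num_shards 8
ceph config set osd osd_op_num_threads_per_shard 2
```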
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io