I have a similar setup and have been running some large concurrent
benchmarks, and I am seeing that running multiple OSDs per NVMe doesn't
make much difference. In fact, it actually increases write
amplification under write-heavy workloads, so performance degrades
over time.
Also, if you have powerful enough CPU and memory on the system,
ingest throttling and replication become the bottleneck rather than the NVMe
writes.
It might make a difference for read-heavy workloads, though; I
haven't tested that enough.
Regards,
Shridhar
On Wed, 11 Nov 2020 at 00:27, Jan Fajerski <jfajerski(a)suse.com> wrote:
On Fri, Nov 06, 2020 at 10:15:52AM -0000, victorhooi(a)yahoo.com wrote:
I'm building a new 4-node Proxmox/Ceph cluster, to hold disk images for
our VMs. (Ceph version is 15.2.5).
Each node has 6 x NVMe SSDs (4TB), and 1 x Optane drive (960GB).
CPU is AMD Rome 7442, so there should be plenty of CPU capacity to spare.
My aim is to create 4 x OSDs per NVMe SSD (to make more effective use of
the NVMe performance) and use the Optane drive to store the WAL/DB
partition for each OSD. (I.e. total of 24 x 35GB WAL/DB partitions).
However, I am struggling to get the right ceph-volume command to achieve
this.
Thanks to a very kind Redditor, I was able to get close:
/dev/nvme0n1 is an Optane device (900GB).
/dev/nvme2n1 is an Intel NVMe SSD (4TB).
```
# ceph-volume lvm batch --osds-per-device 4 /dev/nvme2n1 --db-devices /dev/nvme0n1

Total OSDs: 4

Solid State VG:
  Targets:   block.db                  Total size: 893.00 GB
  Total LVs: 16                        Size per LV: 223.25 GB
  Devices:   /dev/nvme0n1

  Type            Path                                      LV Size         % of device
----------------------------------------------------------------------------------------------------
  [data]          /dev/nvme2n1                              931.25 GB       25.0%
  [block.db]      vg: vg/lv                                 223.25 GB       25%
----------------------------------------------------------------------------------------------------
  [data]          /dev/nvme2n1                              931.25 GB       25.0%
  [block.db]      vg: vg/lv                                 223.25 GB       25%
----------------------------------------------------------------------------------------------------
  [data]          /dev/nvme2n1                              931.25 GB       25.0%
  [block.db]      vg: vg/lv                                 223.25 GB       25%
----------------------------------------------------------------------------------------------------
  [data]          /dev/nvme2n1                              931.25 GB       25.0%
  [block.db]      vg: vg/lv                                 223.25 GB       25%

--> The above OSDs would be created if the operation continues
--> do you want to proceed? (yes/no)
```
This does split up the NVMe disk into 4 OSDs, and creates the WAL/DB
partitions on the Optane drive - however, it creates 4 x 223 GB partitions
on the Optane (whereas I want 35GB partitions).
Is there any way to specify the WAL/DB partition size in the above?
And can it be done such that I can run successive ceph-volume commands,
to add the OSDs and WAL/DB partitions for each NVMe disk?
Is there a particular reason you want to run ceph-volume multiple times? The
batch subcommand can handle that in one go, without the need to explicitly
specify any sizes, as another reply proposed (though that will work nicely).
Something like this should get you there:

ceph-volume lvm batch --osds-per-device 4 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 --db-devices /dev/nvme0n1

This of course makes assumptions regarding device names; adjust accordingly.
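If you do want to pin the DB size explicitly, a minimal sketch (assuming the
--block-db-size option is available in your ceph-volume release, and reusing
the device names from the example above) would be:

```
# Explicitly size each DB volume at ~35 GB; the exact size syntax
# (suffixed value vs. raw bytes) can vary between releases.
ceph-volume lvm batch --osds-per-device 4 --block-db-size 35G \
    /dev/nvme2n1 --db-devices /dev/nvme0n1
```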
Another option to size the volumes on the Optane drive would be to rely on the
*slots arguments of the batch subcommand. See either ceph-volume lvm batch --help or
https://docs.ceph.com/en/latest/ceph-volume/lvm/batch/#implicit-sizing
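As an illustrative sketch of the slots approach (device names and availability
of the --block-db-slots flag are assumptions; adjust for your system): with 6
data SSDs per node, 24 slots on the ~900 GB Optane would yield roughly 37 GB
per DB volume, close to the 35 GB target.

```
# Let ceph-volume divide the Optane into 24 equal DB slots (6 SSDs x 4 OSDs).
ceph-volume lvm batch --osds-per-device 4 --block-db-slots 24 \
    /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 /dev/nvme5n1 /dev/nvme6n1 \
    --db-devices /dev/nvme0n1
```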
(Or if there's an easier way to achieve the above layout, please let me know).
That being said - I also just saw this ceph-users thread:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/3Y6DEJCF7ZM…
It talks about "osd op num shards" and "osd op num threads per
shard" - is there some way to set those to achieve similar performance to,
say, 4 x OSDs per NVMe drive, but with only 1 x OSD per NVMe? Has anybody done
any testing/benchmarking on this they can share?
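For anyone who wants to experiment, those names correspond to the
osd_op_num_shards and osd_op_num_threads_per_shard OSD options; a rough sketch
of setting them (the values are purely illustrative, not a tuning
recommendation) would be:

```
# Illustrative values only; the OSDs need a restart for these to take effect.
ceph config set osd osd_op_num_shards 8
ceph config set osd osd_op_num_threads_per_shard 2
```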
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io