Hi,
we are currently building a new cluster. As far as we can tell, it is a
config copy (via Ansible) of our existing cluster, just 5 years later and
with new hardware (NVMe instead of SSD, bigger disks, ...).
The setup:
* NVMe for Journals and "Cache"-Pool
* HDD with NVMe Journals for "Data"-Pool
* Cache-Pool as writeback-Tier on Data-Pool
* We are using Ceph 12.2.13 (Luminous) without BlueStore.
If we run a rados benchmark against this pool, everything seems fine, but
as soon as we start a fio benchmark
-<-
[global]
ioengine=rbd
clientname=cinder
pool=cinder
rbdname=fio_test
rw=write
bs=4M
[rbd_iodepth32]
iodepth=32
->-
the bandwidth drops to <15 MB/s after a few seconds, and our HDDs are
doing more I/Os than our journal disks.
We also removed the cache tier completely, but the issue remains.
The output of "ceph osd pool stats" shows ~100 op/s, but our disks are
doing:
-<-
Device    rrqm/s  wrqm/s   r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz    await  r_await  w_await  svctm   %util
nvme0n1     0.00    0.00  0.00  278.50   0.00  34.07    250.51      0.14     0.50     0.00     0.50   0.03    0.80
nvme1n1     0.00    0.00  0.00   64.00   0.00   7.77    248.50      0.01     0.22     0.00     0.22   0.03    0.20
sda         0.00    1.50  0.00  557.00   0.00  29.49    108.45    180.57   160.59     0.00   160.59   1.80  100.00
sdb         0.00   42.00  0.00  592.00   0.00  28.21     97.60    176.51  1105.79     0.00  1105.79   1.69  100.00
sdc         0.00   14.50  0.00  528.50   0.00  27.95    108.31    183.02   179.47     0.00   179.47   1.89  100.00
sde         0.00  134.50  0.00  223.50   0.00  14.05    128.72     17.38    60.05     0.00    60.05   0.89   20.00
sdg         0.00   76.00  0.00  492.00   0.00  26.32    109.54    191.81  1474.96     0.00  1474.96   2.03  100.00
sdf         0.00    0.00  0.00  491.50   0.00  26.76    111.49    176.55   326.05     0.00   326.05   2.03  100.00
sdh         0.00    0.00  0.00  548.50   0.00  26.71     99.75    204.39   327.57     0.00   327.57   1.82  100.00
sdi         0.00  112.00  0.00  526.00   0.00  23.15     90.14    158.32  1325.61     0.00  1325.61   1.90  100.00
sdj         0.00   12.00  0.00  641.00   0.00  34.78    111.13    185.51   278.29     0.00   278.29   1.56  100.00
sdk         0.00   23.50  0.00  399.50   0.00  20.38    104.46    166.77   461.67     0.00   461.67   2.50  100.00
sdl         0.00  267.00  0.00  498.50   0.00  34.46    141.58    200.37   490.80     0.00   490.80   2.01  100.00
->-
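To put numbers on the iostat output above, here is a rough Python sketch
that sums the write IOPS and throughput per device class and converts
avgrq-sz into an average request size. It is a hypothetical helper, not
part of Ceph or sysstat. It assumes that avgrq-sz is reported in 512-byte
sectors and that any device whose name starts with "nvme" is a journal
device; the rows are copied by hand from the output above.

```python
# Hypothetical helper: summarize the iostat sample shown above.
# Each row is (device, w/s, wMB/s, avgrq-sz); values copied from the post.
ROWS = [
    ("nvme0n1", 278.50, 34.07, 250.51),
    ("nvme1n1", 64.00, 7.77, 248.50),
    ("sda", 557.00, 29.49, 108.45),
    ("sdb", 592.00, 28.21, 97.60),
    ("sdc", 528.50, 27.95, 108.31),
    ("sde", 223.50, 14.05, 128.72),
    ("sdg", 492.00, 26.32, 109.54),
    ("sdf", 491.50, 26.76, 111.49),
    ("sdh", 548.50, 26.71, 99.75),
    ("sdi", 526.00, 23.15, 90.14),
    ("sdj", 641.00, 34.78, 111.13),
    ("sdk", 399.50, 20.38, 104.46),
    ("sdl", 498.50, 34.46, 141.58),
]

SECTOR_BYTES = 512  # assumption: avgrq-sz is given in 512-byte sectors


def avg_request_kib(avgrq_sz_sectors):
    """Convert an avgrq-sz value (sectors) to KiB per request."""
    return avgrq_sz_sectors * SECTOR_BYTES / 1024


def summarize(rows):
    """Aggregate write IOPS, write MB/s and mean request size per class."""
    result = {}
    for kind in ("nvme", "hdd"):
        # Assumption: devices named nvme* are journals, the rest are HDDs.
        selected = [r for r in rows if r[0].startswith("nvme") == (kind == "nvme")]
        result[kind] = {
            "write_iops": sum(r[1] for r in selected),
            "write_mb_s": sum(r[2] for r in selected),
            "mean_request_kib": sum(avg_request_kib(r[3]) for r in selected)
            / len(selected),
        }
    return result


if __name__ == "__main__":
    for kind, stats in summarize(ROWS).items():
        print(
            f"{kind}: {stats['write_iops']:.1f} w/s, "
            f"{stats['write_mb_s']:.2f} MB/s, "
            f"{stats['mean_request_kib']:.1f} KiB/request"
        )
```

With these numbers the HDDs together handle roughly 5500 writes per second
at about 45 to 71 KiB each, while "ceph osd pool stats" reports ~100 op/s
and the fio job issues 4M writes.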
Any hints on how to debug this issue?
Thanks a lot,
Fabian