Hi,
I have a 3-OSD-node Ceph cluster with one 480GB SSD and 8 x 2TB
12Gbps SAS HDDs on each node, providing storage to an OpenStack
cluster. Both the public and cluster networks are 2x10G. The WAL and
DB of each OSD are on the SSD and share the same 60GB partition.
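For reference, WAL/DB placement can be confirmed on each OSD node
with ceph-volume, which lists the block, block.db and block.wal
devices backing every OSD:

ceph-volume lvm list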
I ran fio with different combinations of operation, block size, and
io-depth to collect IOPS, bandwidth, and latency. I tried fio on a
compute node with ioengine=rbd, and also fio within a VM (backed by
Ceph) with ioengine=libaio.
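To cover the combinations, I use a small shell loop along the lines
of the sketch below (the pool/image names match the commands further
down; the output directory is arbitrary):

#!/bin/bash
# Sweep I/O pattern, block size and queue depth against one RBD image,
# writing one JSON result file per combination for later comparison.
OUT=./fio-results; mkdir -p "$OUT"
for rw in read write randread randwrite; do
  for bs in 4k 64k 1m; do
    for qd in 1 16 64; do
      fio --name=test --ioengine=rbd --clientname=admin \
          --pool=benchmark --rbdname=test --numjobs=1 \
          --runtime=30 --direct=1 --size=2G \
          --rw=$rw --bs=$bs --iodepth=$qd \
          --output-format=json \
          --output="$OUT/${rw}-${bs}-qd${qd}.json"
    done
  done
done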
The results don't seem good. Here are a couple of examples.
====================================
fio --name=test --ioengine=rbd --clientname=admin \
--pool=benchmark --rbdname=test --numjobs=1 \
--runtime=30 --direct=1 --size=2G \
--rw=read --bs=4k --iodepth=1
test: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=rbd, iodepth=1
fio-3.7
Starting 1 process
Jobs: 1 (f=0): [f(1)][100.0%][r=27.6MiB/s,w=0KiB/s][r=7075,w=0 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=56310: Mon Sep 14 19:01:24 2020
read: IOPS=7610, BW=29.7MiB/s (31.2MB/s)(892MiB/30001msec)
slat (nsec): min=1550, max=57662, avg=3312.74, stdev=2981.42
clat (usec): min=77, max=4799, avg=127.39, stdev=39.88
lat (usec): min=78, max=4812, avg=130.70, stdev=40.67
clat percentiles (usec):
| 1.00th=[ 82], 5.00th=[ 86], 10.00th=[ 95], 20.00th=[ 98],
| 30.00th=[ 100], 40.00th=[ 104], 50.00th=[ 116], 60.00th=[ 129],
| 70.00th=[ 141], 80.00th=[ 157], 90.00th=[ 182], 95.00th=[ 198],
| 99.00th=[ 233], 99.50th=[ 245], 99.90th=[ 359], 99.95th=[ 515],
| 99.99th=[ 709]
bw ( KiB/s): min=27160, max=40696, per=100.00%, avg=30474.29, stdev=2826.23, samples=59
iops : min= 6790, max=10174, avg=7618.56, stdev=706.56, samples=59
lat (usec) : 100=28.89%, 250=70.72%, 500=0.34%, 750=0.05%, 1000=0.01%
lat (msec) : 2=0.01%, 10=0.01%
cpu : usr=3.55%, sys=3.80%, ctx=228358, majf=0, minf=29
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=228333,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=29.7MiB/s (31.2MB/s), 29.7MiB/s-29.7MiB/s (31.2MB/s-31.2MB/s), io=892MiB (935MB), run=30001-30001msec
Disk stats (read/write):
dm-0: ios=290/3, merge=0/0, ticks=2427/19, in_queue=2446, util=0.95%, aggrios=290/4, aggrmerge=0/0, aggrticks=2427/39, aggrin_queue=2332, aggrutil=0.95%
sda: ios=290/4, merge=0/0, ticks=2427/39, in_queue=2332, util=0.95%
====================================
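As a sanity check, at iodepth=1 the IOPS follow directly from the
mean latency: 1 / 130.70 usec ~= 7650, close to the reported 7610.
The 4k write run below works out the same way (1 / 622.40 usec ~=
1607), so the numbers are at least internally consistent.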
====================================
fio --name=test --ioengine=rbd --clientname=admin \
--pool=benchmark --rbdname=test --numjobs=1 \
--runtime=30 --direct=1 --size=2G \
--rw=write --bs=4k --iodepth=1
test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=rbd, iodepth=1
fio-3.7
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][r=0KiB/s,w=6352KiB/s][r=0,w=1588 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=56544: Mon Sep 14 19:03:36 2020
write: IOPS=1604, BW=6417KiB/s (6571kB/s)(188MiB/30003msec)
slat (nsec): min=2240, max=45925, avg=6526.95, stdev=3486.19
clat (usec): min=399, max=35411, avg=615.88, stdev=231.41
lat (usec): min=402, max=35421, avg=622.40, stdev=232.08
clat percentiles (usec):
| 1.00th=[ 420], 5.00th=[ 449], 10.00th=[ 469], 20.00th=[ 498],
| 30.00th=[ 529], 40.00th=[ 562], 50.00th=[ 611], 60.00th=[ 652],
| 70.00th=[ 685], 80.00th=[ 709], 90.00th=[ 766], 95.00th=[ 799],
| 99.00th=[ 881], 99.50th=[ 955], 99.90th=[ 2671], 99.95th=[ 3097],
| 99.99th=[ 3785]
bw ( KiB/s): min= 5944, max= 6792, per=100.00%, avg=6415.95, stdev=178.72, samples=60
iops : min= 1486, max= 1698, avg=1603.93, stdev=44.67, samples=60
lat (usec) : 500=20.82%, 750=67.23%, 1000=11.55%
lat (msec) : 2=0.25%, 4=0.14%, 10=0.01%, 20=0.01%, 50=0.01%
cpu : usr=1.22%, sys=1.25%, ctx=48143, majf=0, minf=18
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,48129,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=6417KiB/s (6571kB/s), 6417KiB/s-6417KiB/s (6571kB/s-6571kB/s), io=188MiB (197MB), run=30003-30003msec
Disk stats (read/write):
dm-0: ios=31/2, merge=0/0, ticks=342/14, in_queue=356, util=0.12%, aggrios=33/3, aggrmerge=0/0, aggrticks=390/27, aggrin_queue=404, aggrutil=0.13%
sda: ios=33/3, merge=0/0, ticks=390/27, in_queue=404, util=0.13%
====================================
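For reference, the in-VM runs used ioengine=libaio against a file on
the VM's Ceph-backed disk; a representative invocation (the file path
is just an example) is:

fio --name=test --ioengine=libaio --filename=/mnt/test/fio.dat \
--numjobs=1 --runtime=30 --direct=1 --size=2G \
--rw=write --bs=4k --iodepth=1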
Does that make sense? How do you benchmark your Ceph cluster?
I'd appreciate it if you could share your experiences here.
Thanks!
Tony