Here are the test results from inside the VM.
================
# fio --name=test --ioengine=libaio --numjobs=1 --runtime=30 \
--direct=1 --size=2G --end_fsync=1 \
--rw=read --bs=4K --iodepth=1
test: (groupid=0, jobs=1): err= 0: pid=14615: Mon Sep 14 21:50:55 2020
read: IOPS=3209, BW=12.5MiB/s (13.1MB/s)(376MiB/30001msec)
slat (usec): min=3, max=162, avg= 6.91, stdev= 4.74
clat (usec): min=85, max=17366, avg=303.17, stdev=639.42
lat (usec): min=161, max=17373, avg=310.38, stdev=639.93
clat percentiles (usec):
| 1.00th=[ 167], 5.00th=[ 172], 10.00th=[ 176], 20.00th=[ 182],
| 30.00th=[ 188], 40.00th=[ 194], 50.00th=[ 204], 60.00th=[ 221],
| 70.00th=[ 239], 80.00th=[ 277], 90.00th=[ 359], 95.00th=[ 461],
| 99.00th=[ 3130], 99.50th=[ 5735], 99.90th=[ 8094], 99.95th=[11338],
| 99.99th=[14091]
bw ( KiB/s): min= 9688, max=15120, per=99.87%, avg=12820.51, stdev=1001.88, samples=59
iops : min= 2422, max= 3780, avg=3205.12, stdev=250.47, samples=59
lat (usec) : 100=0.01%, 250=74.99%, 500=20.76%, 750=2.21%, 1000=0.50%
lat (msec) : 2=0.39%, 4=0.27%, 10=0.81%, 20=0.06%
cpu : usr=0.65%, sys=3.06%, ctx=96287, majf=0, minf=13
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=96287,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=12.5MiB/s (13.1MB/s), 12.5MiB/s-12.5MiB/s (13.1MB/s-13.1MB/s), io=376MiB (394MB), run=30001-30001msec
Disk stats (read/write):
vda: ios=95957/2, merge=0/0, ticks=29225/12, in_queue=6027, util=82.52%
================
================
# fio --name=test --ioengine=libaio --numjobs=1 --runtime=30 \
--direct=1 --size=2G --end_fsync=1 \
--rw=write --bs=4K --iodepth=1
test: (groupid=0, jobs=1): err= 0: pid=14619: Mon Sep 14 21:52:04 2020
write: IOPS=16.3k, BW=63.7MiB/s (66.8MB/s)(1917MiB/30074msec)
slat (usec): min=3, max=182, avg= 5.94, stdev= 1.30
clat (usec): min=11, max=5234, avg=54.08, stdev=18.58
lat (usec): min=35, max=5254, avg=60.26, stdev=18.80
clat percentiles (usec):
| 1.00th=[ 36], 5.00th=[ 38], 10.00th=[ 40], 20.00th=[ 46],
| 30.00th=[ 48], 40.00th=[ 50], 50.00th=[ 53], 60.00th=[ 56],
| 70.00th=[ 59], 80.00th=[ 63], 90.00th=[ 67], 95.00th=[ 71],
| 99.00th=[ 85], 99.50th=[ 100], 99.90th=[ 289], 99.95th=[ 355],
| 99.99th=[ 412]
bw ( KiB/s): min=59640, max=80982, per=100.00%, avg=65462.25, stdev=7166.81, samples=59
iops : min=14910, max=20245, avg=16365.54, stdev=1791.69, samples=59
lat (usec) : 20=0.01%, 50=39.85%, 100=59.65%, 250=0.36%, 500=0.14%
lat (usec) : 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%
cpu : usr=2.10%, sys=11.63%, ctx=490639, majf=0, minf=12
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,490635,0,1 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=63.7MiB/s (66.8MB/s), 63.7MiB/s-63.7MiB/s (66.8MB/s-66.8MB/s), io=1917MiB (2010MB), run=30074-30074msec
Disk stats (read/write):
vda: ios=9/490639, merge=0/0, ticks=26/27102, in_queue=184, util=99.36%
================
Both networking and storage workloads were light during the runs.
Which system stats should I monitor? Something like the commands sketched below, on both the VM and the OSD nodes?
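(That is just my guess at what "system stat" covers; iostat and sar come
from the sysstat package, and ceph osd perf would run on the cluster side.)

iostat -x 1       # per-device await, queue size and %util
sar -n DEV 1      # per-interface throughput on the public and cluster networks
ceph osd perf     # commit/apply latency reported for each OSD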
Thanks!
Tony
> -----Original Message-----
> From: rainning <tweetypie(a)qq.com>
> Sent: Monday, September 14, 2020 8:39 PM
> To: Tony Liu <tonyliu0592(a)hotmail.com>; ceph-users <ceph-users(a)ceph.io>
> Subject: [ceph-users] Re: benchmark Ceph
>
> Can you post fio results using the libaio ioengine? From what you
> posted, it seems to me that the read test hit the cache, and the write
> performance was not good: the latency was too high (~35.4ms) while
> numjobs and iodepth were both 1. Did you monitor system stats on both
> sides (VM/compute node and cluster)?
>
>
>
>
> ------------------ Original ------------------
> From: "Tony Liu";<tonyliu0592(a)hotmail.com&gt;ail.com>;
> Date: Sep 15, 2020
> To: "ceph-users" <ceph-users(a)ceph.io>
>
> Subject: [ceph-users] benchmark Ceph
>
>
>
> Hi,
>
> I have a 3-OSD-node Ceph cluster with 1 480GB SSD and 8 x 2TB 12Gbps SAS
> HDDs on each node, providing storage to an OpenStack cluster. Both public
> and cluster networks are 2x10G. The WAL and DB of each OSD are on the SSD,
> and they share the same 60GB partition.
>
> I ran fio with different combinations of operation, block size and
> iodepth to collect IOPS, bandwidth and latency. I tried fio on the
> compute node with ioengine=rbd, and also fio within a VM (backed by
> Ceph) with ioengine=libaio.
>
> The results don't seem good. Here are a couple of examples.
> ====================================
> fio --name=test --ioengine=rbd --clientname=admin \
> --pool=benchmark --rbdname=test --numjobs=1 \
> --runtime=30 --direct=1 --size=2G \
> --rw=read --bs=4k --iodepth=1
>
> test: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-
> 4096B, ioengine=rbd, iodepth=1
> fio-3.7
> Starting 1 process
> Jobs: 1 (f=0): [f(1)][100.0%][r=27.6MiB/s,w=0KiB/s][r=7075,w=0 IOPS][eta
> 00m:00s]
> test: (groupid=0, jobs=1): err= 0: pid=56310: Mon Sep 14 19:01:24 2020
> read: IOPS=7610, BW=29.7MiB/s (31.2MB/s)(892MiB/30001msec)
> slat (nsec): min=1550, max=57662, avg=3312.74, stdev=2981.42
> clat (usec): min=77, max=4799, avg=127.39, stdev=39.88
> lat (usec): min=78, max=4812, avg=130.70, stdev=40.67
> clat percentiles (usec):
> | 1.00th=[ 82], 5.00th=[ 86], 10.00th=[ 95], 20.00th=[ 98],
> | 30.00th=[ 100], 40.00th=[ 104], 50.00th=[ 116], 60.00th=[ 129],
> | 70.00th=[ 141], 80.00th=[ 157], 90.00th=[ 182], 95.00th=[ 198],
> | 99.00th=[ 233], 99.50th=[ 245], 99.90th=[ 359], 99.95th=[ 515],
> | 99.99th=[ 709]
> bw ( KiB/s): min=27160, max=40696, per=100.00%, avg=30474.29, stdev=2826.23, samples=59
> iops : min= 6790, max=10174, avg=7618.56, stdev=706.56, samples=59
> lat (usec) : 100=28.89%, 250=70.72%, 500=0.34%, 750=0.05%, 1000=0.01%
> lat (msec) : 2=0.01%, 10=0.01%
> cpu : usr=3.55%, sys=3.80%, ctx=228358, majf=0, minf=29
> IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> issued rwts: total=228333,0,0,0 short=0,0,0,0 dropped=0,0,0,0
> latency : target=0, window=0, percentile=100.00%, depth=1
>
> Run status group 0 (all jobs):
> READ: bw=29.7MiB/s (31.2MB/s), 29.7MiB/s-29.7MiB/s (31.2MB/s-31.2MB/s), io=892MiB (935MB), run=30001-30001msec
>
> Disk stats (read/write):
> dm-0: ios=290/3, merge=0/0, ticks=2427/19, in_queue=2446, util=0.95%, aggrios=290/4, aggrmerge=0/0, aggrticks=2427/39, aggrin_queue=2332, aggrutil=0.95%
> sda: ios=290/4, merge=0/0, ticks=2427/39, in_queue=2332, util=0.95%
> ====================================
> ====================================
> fio --name=test --ioengine=rbd --clientname=admin \
> --pool=benchmark --rbdname=test --numjobs=1 \
> --runtime=30 --direct=1 --size=2G \
> --rw=write --bs=4k --iodepth=1
>
> test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-
> 4096B, ioengine=rbd, iodepth=1
> fio-3.7
> Starting 1 process
> Jobs: 1 (f=1): [W(1)][100.0%][r=0KiB/s,w=6352KiB/s][r=0,w=1588 IOPS][eta
> 00m:00s]
> test: (groupid=0, jobs=1): err= 0: pid=56544: Mon Sep 14 19:03:36 2020
> write: IOPS=1604, BW=6417KiB/s (6571kB/s)(188MiB/30003msec)
> slat (nsec): min=2240, max=45925, avg=6526.95, stdev=3486.19
> clat (usec): min=399, max=35411, avg=615.88, stdev=231.41
> lat (usec): min=402, max=35421, avg=622.40, stdev=232.08
> clat percentiles (usec):
> | 1.00th=[ 420], 5.00th=[ 449], 10.00th=[ 469], 20.00th=[ 498],
> | 30.00th=[ 529], 40.00th=[ 562], 50.00th=[ 611], 60.00th=[ 652],
> | 70.00th=[ 685], 80.00th=[ 709], 90.00th=[ 766], 95.00th=[ 799],
> | 99.00th=[ 881], 99.50th=[ 955], 99.90th=[ 2671], 99.95th=[ 3097],
> | 99.99th=[ 3785]
> bw ( KiB/s): min= 5944, max= 6792, per=100.00%, avg=6415.95, stdev=178.72, samples=60
> iops : min= 1486, max= 1698, avg=1603.93, stdev=44.67, samples=60
> lat (usec) : 500=20.82%, 750=67.23%, 1000=11.55%
> lat (msec) : 2=0.25%, 4=0.14%, 10=0.01%, 20=0.01%, 50=0.01%
> cpu : usr=1.22%, sys=1.25%, ctx=48143, majf=0, minf=18
> IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> issued rwts: total=0,48129,0,0 short=0,0,0,0 dropped=0,0,0,0
> latency : target=0, window=0, percentile=100.00%, depth=1
>
> Run status group 0 (all jobs):
> WRITE: bw=6417KiB/s (6571kB/s), 6417KiB/s-6417KiB/s (6571kB/s-6571kB/s), io=188MiB (197MB), run=30003-30003msec
>
> Disk stats (read/write):
> dm-0: ios=31/2, merge=0/0, ticks=342/14, in_queue=356, util=0.12%, aggrios=33/3, aggrmerge=0/0, aggrticks=390/27, aggrin_queue=404, aggrutil=0.13%
> sda: ios=33/3, merge=0/0, ticks=390/27, in_queue=404, util=0.13%
> ====================================
>
> Does that make sense? How do you benchmark your Ceph cluster?
> I'd appreciate it if you could share your experience here.
>
> Thanks!
> Tony
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io