You should tell us a bit about your cluster setup, e.g. the number of OSDs and whether you use 3x replication on your testing pool.
E.g. this[1] was my test on a cluster with only 1Gbit ethernet and a 3x replicated HDD pool. This[2] is with 10Gbit and more OSDs added.
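To check those on an existing cluster, something like this should do (a sketch, assuming the test pool is called rbd):

ceph osd tree                    # number and layout of OSDs
ceph osd pool get rbd size       # replication factor of the pool
ceph df                          # pool usage and available capacity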
[2]
[root@c01 ~]# rados bench -p rbd 10 write
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_c01_3576497
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0       0         0         0         0         0            -           0
    1      16        41        25   99.9948       100     0.198773     0.41148
    2      16       101        85   169.984       240     0.203578    0.347027
    3      16       172       156   207.979       284    0.0863202    0.296866
    4      16       245       229   228.975       292     0.139681    0.268933
    5      16       322       306   244.772       308     0.107296    0.257353
    6      16       385       369    245.97       252     0.601879    0.250782
    7      16       460       444   253.684       300     0.154803    0.247178
    8      16       541       525   262.467       324     0.274302    0.241951
    9      16       604       588     261.3       252      0.11929    0.238717
   10      16       672       656   262.367       272     0.134654    0.241424
Total time run: 10.1504
Total writes made: 673
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 265.212
Stddev Bandwidth: 63.0823
Max bandwidth (MB/sec): 324
Min bandwidth (MB/sec): 100
Average IOPS: 66
Stddev IOPS: 15.7706
Max IOPS: 81
Min IOPS: 25
Average Latency(s): 0.241012
Stddev Latency(s): 0.154282
Max latency(s): 1.05851
Min latency(s): 0.0702826
Cleaning up (deleting benchmark objects)
Removed 673 objects
Clean up completed and total clean up time :1.26346
[1]
[@]# rados bench -p rbd 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_c01_18283
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0       0         0         0         0         0            -           0
    1      16        27        11   43.9884        44     0.554119    0.624979
    2      16        47        31   61.9841        80      1.04112    0.793553
    3      16        57        41    54.654        40      1.33104    0.876273
    4      16        75        59   58.9869        72     0.840098     0.97091
    5      16        97        81   64.7864        88      1.02915    0.922043
    6      16       105        89   59.3207        32       1.2471    0.915408
    7      16       129       113   64.5582        96     0.616579    0.947882
    8      16       145       129   64.4866        64      1.09397    0.921441
    9      16       163       147   65.3201        72     0.885566    0.906388
   10      16       166       150   59.9881        12      1.22834    0.909591
   11      13       167       154   55.9889        16      2.30029    0.942798
Total time run: 11.141939
Total writes made: 167
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 59.9537
Stddev Bandwidth: 28.7889
Max bandwidth (MB/sec): 96
Min bandwidth (MB/sec): 12
Average IOPS: 14
Stddev IOPS: 7
Max IOPS: 24
Min IOPS: 3
Average Latency(s): 1.06157
Stddev Latency(s): 0.615773
Max latency(s): 3.23088
Min latency(s): 0.171585
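Note that because [1] was run with --no-cleanup, the benchmark objects stay in the pool, so they can be reused for read tests and must be removed afterwards. Roughly (assuming the same rbd pool):

rados bench -p rbd 10 seq        # sequential read of the objects written above
rados bench -p rbd 10 rand       # random read
rados -p rbd cleanup             # remove the leftover benchmark objects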
-----Original Message-----
Sent: 10 February 2021 10:14
To: Marc <Marc(a)f1-outsourcing.eu>
Cc: ceph-users <ceph-users(a)ceph.io>
Subject: [ceph-users] Re: struggling to achieve high bandwidth on Ceph dev cluster - HELP
Thanks for the reply.

Yes, 4MB is the default, and I have tried it. For example, below is a 4MB (default) run over 600 seconds. The seq read and rand read give me good bandwidth (not posted here), but with write it is still very low. I am particularly interested in block sizes, and the rados bench tool has a block size option which I have been using.
Total time run: 601.106
Total writes made: 2966
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 19.7369
Stddev Bandwidth: 14.8408
Max bandwidth (MB/sec): 64
Min bandwidth (MB/sec): 0
Average IOPS: 4
Stddev IOPS: 3.67408
Max IOPS: 16
Min IOPS: 0
Average Latency(s): 3.24064
Stddev Latency(s): 2.75111
Max latency(s): 42.4551
Min latency(s): 0.167701
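For reference, a run like the one above could be reproduced with something along these lines (a sketch; the pool name and -t value are assumptions on my part):

rados bench -p rbd 600 write -b 4194304 -t 16 --no-cleanup   # 600 s, 4MB blocks
rados bench -p rbd 600 seq                                   # then read it back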
On Wed, Feb 10, 2021 at 9:46 AM Marc <Marc(a)f1-outsourcing.eu> wrote:
Try 4MB, that is the default, no?
> -----Original Message-----
> Sent: 10 February 2021 09:30
> To: ceph-users <ceph-users(a)ceph.io>; dev <dev(a)ceph.io>; ceph-qa(a)ceph.io
> Subject: [ceph-users] struggling to achieve high bandwidth on Ceph dev
> cluster - HELP
>
> Hi,
>
> I am using the rados bench tool on the development cluster, after
> running the vstart.sh script. It is working fine and I am interested
> in benchmarking the cluster. However, I am struggling to achieve good
> bandwidth (MB/sec). My target throughput is at least 50 MB/sec or
> more, but mostly I am achieving around 15-20 MB/sec, which is very
> poor.
>
> I am quite sure I am missing something. Either I have to change my
> cluster through the vstart.sh script, or I am not fully utilizing the
> rados bench tool, or maybe both: not the right cluster and also not
> using the rados bench tool correctly.
>
> Some of the shell examples I have been using to build the cluster are
> below:
> MDS=0 RGW=1 ../src/vstart.sh -d -l -n --bluestore
> MDS=0 RGW=1 MON=1 OSD=4 ../src/vstart.sh -d -l -n --bluestore
>
> While using the rados bench tool I have been trying different block
> sizes: 4K, 8K, 16K, 32K, 64K, 128K, 256K, 512K. And I have also been
> changing the -t parameter in the shell to increase concurrent IOs.
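If it helps, a sweep over those block sizes and -t values could be scripted like this (a rough sketch; the pool name, 60-second runtime and the value ranges are my own assumptions, not from the thread):

# run a short write benchmark for each block size / concurrency pair;
# without --no-cleanup, rados bench removes its objects after each run
for bs in 4096 65536 262144 4194304; do
  for t in 16 32 64; do
    rados bench -p rbd 60 write -b $bs -t $t
  done
done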