Just to clarify, it is better to separate the different performance cases:
1. Regular IO performance (IOPS / throughput): this should be good.
2. vMotion between datastores that are both managed by Ceph: this will be
good, as VAAI xcopy will be used.
3. vMotion between a Ceph datastore and an external datastore: this will be
bad, and it seems to be the case you are testing. It is bad because between
two different storage systems (the IQNs are served by different targets),
VAAI xcopy cannot be used and VMware falls back to its own data mover. It
moves data using a 64k block size, which gives low performance. To add some
flavor, it does indeed use 32 threads, but unfortunately they use co-located
addresses, which does not work well in Ceph: they hit the same rbd object,
which gets serialized due to PG locks, so you will not get any
parallelization. Your speed will mostly be determined by a serial stream of
64k writes, so with 1 ms write latency on an SSD cluster you will get around
64 MB/s; it will be slightly higher as the extra threads have some small effect.
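As a rough sanity check of that number (a tiny Python sketch, assuming the
writes are effectively serial at the 64k block size and ~1 ms latency
mentioned above):

    # single serial stream: throughput = io_size / latency
    io_size_bytes = 64 * 1024     # 64 KiB, the block size VMware's data mover uses
    write_latency_s = 0.001       # assumed ~1 ms per write on an SSD-backed cluster
    throughput_mb_s = io_size_bytes / write_latency_s / 1e6
    print(round(throughput_mb_s, 1))  # ~65.5 MB/s, close to the ~60 MB/s observed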
Note your esxtop does show 32 active IOs under ACTV. The QUED of zero is not
the queue depth, but rather the "queued" IO that ESX would suspend in case
your active count reaches the adapter maximum (128).
This is just to clarify: if case 3 is not your primary concern then I would
forget about it and benchmark 1 and 2 if they are relevant. Else, if 3 is
important, I am not sure you can do much, as it is happening within VMware.
Maybe there is a way to map the external IQN so it is served by the same
target serving the Ceph IQN; then there could be a chance that xcopy gets
activated. Mike would probably know if this has any chance of working :)
/Maged
On 25/10/2019 22:01, Ryan wrote:
esxtop is showing a queue length of 0
Storage motion to ceph
DEVICE                                PATH/WORLD/PARTITION DQLEN WQLEN ACTV QUED %USD LOAD   CMDS/s READS/s WRITES/s MBREAD/s MBWRTN/s DAVG/cmd KAVG/cmd GAVG/cmd QAVG/cmd
naa.6001405ec60d8b82342404d929fbbd03  -                      128     -   32    0   25 0.25  1442.32    0.18  1440.50     0.00    89.78    21.32     0.01    21.34     0.01
Storage motion from ceph
DEVICE                                PATH/WORLD/PARTITION DQLEN WQLEN ACTV QUED %USD LOAD   CMDS/s READS/s WRITES/s MBREAD/s MBWRTN/s DAVG/cmd KAVG/cmd GAVG/cmd QAVG/cmd
naa.6001405ec60d8b82342404d929fbbd03  -                      128     -   32    0   25 0.25  4065.38 4064.83     0.36   253.52     0.00     7.57     0.01     7.58     0.00
I tried using fio like you mentioned but it was hanging with
[r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS] and the ETA kept climbing. I ended
up using rbd bench on the ceph iscsi gateway. With a 64K write
workload I'm seeing 400MB/s transfers.
rbd create test --size 100G --image-feature layering
rbd map test
mkfs.ext4 /dev/rbd/rbd/test
mount /dev/rbd/rbd/test test
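# same test against a second image whose data is placed on the separate rbd_ec data pool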
rbd create testec --size 100G --image-feature layering --data-pool rbd_ec
rbd map testec
mkfs.ext4 /dev/rbd/rbd/testec
mount /dev/rbd/rbd/testec testec
[root@ceph-iscsi1 mnt]# rbd bench --image test --io-size 64K --io-type
write --io-total 10G
bench type write io_size 65536 io_threads 16 bytes 10737418240
pattern sequential
SEC OPS OPS/SEC BYTES/SEC
1 6368 6377.59 417961796.64
2 12928 6462.27 423511630.71
3 19296 6420.18 420752986.78
4 26320 6585.61 431594792.67
5 33296 6662.37 436624891.04
6 40128 6754.67 442673957.25
7 46784 6765.75 443400452.26
8 53280 6809.02 446236110.93
9 60032 6739.67 441691068.73
10 66784 6698.91 439019550.77
11 73616 6690.88 438493253.66
12 80016 6654.35 436099640.00
13 85712 6485.07 425005611.11
14 91088 6202.49 406486113.46
15 96896 6021.17 394603137.62
16 102368 5741.19 376254347.24
17 107568 5501.57 360550910.38
18 113728 5603.17 367209502.58
19 120144 5820.48 381451245.32
20 126496 5917.60 387816078.53
21 132768 6089.71 399095466.00
22 139040 6306.98 413334431.09
23 145104 6276.42 411331743.63
24 151440 6256.67 410036891.68
25 157808 6261.12 410328554.98
26 163456 6140.03 402392725.36
elapsed: 26 ops: 163840 ops/sec: 6271.36 bytes/sec: 410999626.38
[root@ceph-iscsi1 mnt]# rbd bench --image testec --io-size 64K
--io-type write --io-total 10G
bench type write io_size 65536 io_threads 16 bytes 10737418240
pattern sequential
SEC OPS OPS/SEC BYTES/SEC
1 7392 7415.38 485974266.41
2 14464 7243.59 474715656.29
3 22000 7341.08 481104853.50
4 29408 7352.29 481839517.16
5 37296 7459.38 488857889.75
6 44864 7494.36 491150574.57
7 52848 7676.76 503104281.98
8 60784 7756.76 508347136.11
9 68608 7835.26 513491609.52
10 76784 7902.30 517885290.67
11 84544 7935.96 520091129.45
12 92432 7916.76 518832844.57
13 100064 7855.96 514848275.43
14 107040 7692.52 504136734.09
15 114320 7499.66 491497933.56
16 121744 7436.99 487390477.85
17 129664 7438.92 487517345.01
18 136704 7326.50 480149408.39
19 144960 7587.00 497221460.09
20 153264 7796.56 510955233.33
21 160832 7814.44 512126854.90
elapsed: 21 ops: 163840 ops/sec: 7659.97 bytes/sec: 502004079.43
On Fri, Oct 25, 2019 at 11:54 AM Mike Christie <mchristi@redhat.com> wrote:
On 10/24/2019 11:47 PM, Ryan wrote:
> I'm using CentOS 7.7.1908 with kernel 3.10.0-1062.1.2.el7.x86_64. The
> workload was a VMware Storage Motion from a local SSD backed datastore

Ignore my comments. I thought you were just doing fio like tests in the vm.

> to the ceph backed datastore. Performance was measured using dstat on
> the iscsi gateway for network traffic and ceph status as this cluster
> is basically idle. I changed max_data_area_mb to 256 and cmdsn_depth
> to 128. This appears to have given a slight improvement of maybe 10MB/s.
>
> Moving VM to the ceph backed datastore
>   io:
>     client: 124 KiB/s rd, 76 MiB/s wr, 95 op/s rd, 1.26k op/s wr
>
> Moving VM off the ceph backed datastore
>   io:
>     client: 344 MiB/s rd, 625 KiB/s wr, 5.54k op/s rd, 62 op/s wr
If you run esxtop while running your test what do you see for the number of
commands in the iscsi LUN's queue?
> I'm going to test bonnie++ with an rbd volume mounted directly on the
To try and isolate whether it's iscsi or rbd, you need to run fio with the
librbd io engine. We know krbd is going to be the fastest. ceph-iscsi uses
librbd so it is a better baseline. If you are not familiar with fio you can
just do something like:

fio --group_reporting --ioengine=rbd --direct=1 --name=librbdtest \
    --numjobs=32 --bs=512k --iodepth=128 --size=10G --rw=write \
    --rbdname=name_of_your_image --pool=name_of_pool
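If you want something closer to the vMotion write pattern, one option is a
64k sequential write run at a lower queue depth; the image, pool, and client
names here are placeholders:

fio --group_reporting --ioengine=rbd --name=vmotion-like \
    --numjobs=1 --bs=64k --iodepth=32 --size=10G --rw=write \
    --rbdname=name_of_your_image --pool=name_of_pool --clientname=admin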
> iscsi gateway. Also will test bonnie++ inside a VM on a ceph backed
> datastore.
On Thu, Oct 24, 2019 at 7:15 PM Mike Christie <mchristi@redhat.com> wrote:
On 10/24/2019 12:22 PM, Ryan wrote:
> I'm in the process of testing the iscsi target feature of ceph. The
> cluster is running ceph 14.2.4 and ceph-iscsi 3.3. It consists of 5
What kernel are you using?
> hosts with 12 SSD OSDs per host. Some basic testing moving VMs to a ceph
> backed datastore is only showing 60MB/s transfers. However moving these
> back off the datastore is fast at 200-300MB/s.
What is the workload and what are you using to measure the throughput?
If you are using fio, what arguments are you using? And, could you change
the ioengine to rbd and re-run the test from the target system so we can
check if rbd is slow or iscsi?
For small IOs, 60 is about right.
For 128-512K IOs you should be able to get around 300 MB/s for writes and
600 for reads.

1. Increase max_data_area_mb. This is a kernel buffer lio/tcmu uses to pass
data between the kernel and tcmu-runner. The default is only 8MB.

In gwcli cd to your disk and do:

# reconfigure max_data_area_mb %N

where N is between 8 and 2048 MBs.
2. The Linux kernel target only allows 64 commands per iscsi session by
default. We increase that to 128, but you can increase this to 512.

In gwcli cd to the target dir and do:

reconfigure cmdsn_depth 512
3. I think ceph-iscsi and lio work better with higher queue depths so if
you are using fio you want higher numjobs and/or iodepths.
>
> What should I be looking at to track down the write performance issue?
> In comparison with the Nimble Storage arrays I can see 200-300MB/s in
> both directions.
>
> Thanks,
> Ryan
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io