FWIW when using rbd-mirror to migrate volumes between SATA SSD clusters, I found that
rbd_mirror_journal_max_fetch_bytes:
  section: "client"
  value: "33554432"
rbd_journal_max_payload_bytes:
  section: "client"
  value: "8388608"
made a world of difference in expediting journal replay on Luminous 12.2.2. With the
defaults, some active volumes would take hours to converge, and a couple were falling
even further behind.
This was mirroring 1 to 2 volumes at a time. YMMV.
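Expressed as a plain ceph.conf fragment (same two options, sections, and values as above; a sketch of how it would look on the host running rbd-mirror):

```ini
# /etc/ceph/ceph.conf on the rbd-mirror (destination) host
[client]
rbd_mirror_journal_max_fetch_bytes = 33554432   ; 32 MiB per journal fetch
rbd_journal_max_payload_bytes = 8388608         ; 8 MiB max journal payload
```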
On Mar 10, 2020, at 7:36 AM, Ml Ml
<mliebherr99(a)googlemail.com> wrote:
Hello Jason,
thanks for that fast reply.
This is now my /etc/ceph/ceph.conf
[client]
rbd_mirror_journal_max_fetch_bytes = 4194304
I stopped and started my rbd-mirror manually with:
rbd-mirror -d -c /etc/ceph/ceph.conf
Still the same result: slow speed shown by iftop, and entries_behind_master
keeps increasing a lot if I produce 20 MB/s of traffic on that
replicated image.
The latency is like:
--- 10.10.50.1 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 20199ms
rtt min/avg/max/mdev = 0.067/0.286/1.418/0.215 ms
iperf from the source node to the destination node (where the
rbd-mirror runs): 8.92 Gbits/sec
Any other idea?
Thanks,
Michael
On Tue, Mar 10, 2020 at 2:19 PM Jason Dillaman <jdillama(a)redhat.com> wrote:
On Tue, Mar 10, 2020 at 6:47 AM Ml Ml <mliebherr99(a)googlemail.com> wrote:
Hello List,
when I initially enable journal/mirror on an image it gets
bootstrapped to my site-b pretty quickly at 250 MB/s, which is about
the IO write limit.
Once it's up to date, the replay is very slow, about 15 KB/s, and
entries_behind_master just keeps running away:
root@ceph01:~# rbd --cluster backup mirror pool status rbd-cluster6 --verbose
health: OK
images: 3 total
3 replaying
...
vm-112-disk-0:
global_id: 60a795c3-9f5d-4be3-b9bd-3df971e531fa
state: up+replaying
description: replaying, master_position=[object_number=623,
tag_tid=3, entry_tid=345567], mirror_position=[object_number=35,
tag_tid=3, entry_tid=18371], entries_behind_master=327196
last_update: 2020-03-10 11:36:44
...
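As a sanity check on the verbose status above: entries_behind_master is simply the gap between the master and mirror entry_tid values, which you can pull out of the description line with ordinary shell tools (a sketch; the status line is the one quoted above):

```shell
# description line from `rbd mirror pool status --verbose`, quoted from above
desc='replaying, master_position=[object_number=623, tag_tid=3, entry_tid=345567], mirror_position=[object_number=35, tag_tid=3, entry_tid=18371], entries_behind_master=327196'

# extract the two entry_tid values and the reported backlog
master_tid=$(echo "$desc" | grep -o 'entry_tid=[0-9]*' | sed -n 1p | cut -d= -f2)
mirror_tid=$(echo "$desc" | grep -o 'entry_tid=[0-9]*' | sed -n 2p | cut -d= -f2)
behind=$(echo "$desc" | grep -o 'entries_behind_master=[0-9]*' | cut -d= -f2)

# the backlog equals the entry_tid gap: 345567 - 18371 = 327196
echo "$((master_tid - mirror_tid)) $behind"
```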
Write traffic on the source is about 20-25 MB/s.
On the source I run 14.2.6 and on the destination 12.2.13.
Any idea why the replay is so slow?
What is the latency between the two clusters?
I would recommend increasing the "rbd_mirror_journal_max_fetch_bytes"
config setting (it defaults to 32KiB) on your destination cluster, i.e.
try adding "rbd_mirror_journal_max_fetch_bytes = 4194304" to the
"[client]" section of your Ceph configuration file on the node where
the "rbd-mirror" daemon is running, and restart it. The daemon
defaults to a very small read size from the remote cluster in a
primitive attempt to reduce its potential memory usage, but this has
the side effect of slowing down mirroring on links with higher
latencies.
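To put those numbers in perspective (simple arithmetic, nothing cluster-specific): the suggested 4194304-byte value is a 128x increase over the 32 KiB default fetch size mentioned above:

```shell
# default per-fetch read size vs. the suggested override
default_fetch=$((32 * 1024))   # 32 KiB = 32768 bytes
suggested=4194304              # the value suggested for the [client] section above
echo "${default_fetch} -> ${suggested} ($((suggested / default_fetch))x larger)"
```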
Thanks,
Michael
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io
--
Jason