Thanks,
Michael
On Tue, Mar 10, 2020 at 3:43 PM Jason Dillaman <jdillama(a)redhat.com> wrote:
On Tue, Mar 10, 2020 at 10:36 AM Ml Ml <mliebherr99(a)googlemail.com> wrote:
Hello Jason,
thanks for that fast reply.
This is now my /etc/ceph/ceph.conf
[client]
rbd_mirror_journal_max_fetch_bytes = 4194304
I stopped and started my rbd-mirror manually with:
rbd-mirror -d -c /etc/ceph/ceph.conf
Still the same result: iftop shows slow transfer speed, and
entries_behind_master keeps increasing rapidly if I produce 20 MB/s of
write traffic on the replicated image.
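One quick sanity check here would be to confirm the restarted daemon actually picked up the new value, by querying it over its admin socket. This is only a sketch: the socket filename below is an assumption and depends on the client ID the daemon runs as (look in /var/run/ceph for the actual name).

```shell
# Sketch: ask the running rbd-mirror daemon for its effective setting.
# The socket name is an assumption -- substitute the real one from /var/run/ceph.
ceph --admin-daemon /var/run/ceph/ceph-client.rbd-mirror.0.asok \
    config get rbd_mirror_journal_max_fetch_bytes
```

If this still reports 32768, the daemon is not reading the edited ceph.conf.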
The latency is like:
--- 10.10.50.1 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 20199ms
rtt min/avg/max/mdev = 0.067/0.286/1.418/0.215 ms
iperf from the source node to the destination node (where the
rbd-mirror runs): 8.92 Gbits/sec
Any other idea?
Do you know the average IO sizes against the primary image? Can you
create a similar image in the secondary cluster and run "fio" or "rbd
bench-write" against it using similar settings to verify that your
secondary cluster can handle the IO load? The initial image sync
portion will be issuing large, whole-object writes whereas the journal
replay will replay the writes exactly as written in the journal.
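A sketch of the benchmark suggested above, using `rbd bench` against a throwaway image on the backup cluster. The image name and the 4K sequential write pattern are assumptions; match the I/O size and queue depth to what your VMs actually generate against the primary image.

```shell
# Create a scratch image on the secondary cluster (name is hypothetical)
rbd --cluster backup create rbd-cluster6/mirror-bench --size 10G

# Small sequential writes, roughly approximating journal replay I/O.
# Adjust --io-size / --io-threads to mirror your real workload.
rbd --cluster backup bench --io-type write --io-size 4K \
    --io-threads 1 --io-total 1G --io-pattern seq rbd-cluster6/mirror-bench

# Clean up
rbd --cluster backup rm rbd-cluster6/mirror-bench
```

If this benchmark is also slow, the bottleneck is the secondary cluster's write path rather than the mirroring daemon itself.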
Thanks,
Michael
On Tue, Mar 10, 2020 at 2:19 PM Jason Dillaman <jdillama(a)redhat.com> wrote:
On Tue, Mar 10, 2020 at 6:47 AM Ml Ml <mliebherr99(a)googlemail.com> wrote:
>
> Hello List,
>
> when i initially enable journal/mirror on an image it gets
> bootstrapped to my site-b pretty quickly with 250MB/sec which is about
> the IO Write limit.
>
> Once it's up to date, the replay is very slow, about 15 KB/s, and
> entries_behind_master just keeps running away:
>
> root@ceph01:~# rbd --cluster backup mirror pool status rbd-cluster6 --verbose
> health: OK
> images: 3 total
> 3 replaying
>
> ...
>
> vm-112-disk-0:
> global_id: 60a795c3-9f5d-4be3-b9bd-3df971e531fa
> state: up+replaying
> description: replaying, master_position=[object_number=623,
> tag_tid=3, entry_tid=345567], mirror_position=[object_number=35,
> tag_tid=3, entry_tid=18371], entries_behind_master=327196
> last_update: 2020-03-10 11:36:44
>
> ...
>
> Write traffic on the source is about 20-25 MB/s.
>
> On the Source i run 14.2.6 and on the destination 12.2.13.
>
> Any idea why the replaying is so slow?
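(For reference, the entries_behind_master figure in the status output quoted above is simply the gap between the master's and the mirror's journal entry positions:)

```python
# Positions taken from the "rbd mirror pool status --verbose" output above
master = {"object_number": 623, "tag_tid": 3, "entry_tid": 345567}
mirror = {"object_number": 35, "tag_tid": 3, "entry_tid": 18371}

# entries_behind_master is the difference between the two entry_tids
lag = master["entry_tid"] - mirror["entry_tid"]
print(lag)  # 327196, matching entries_behind_master in the status output
```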
What is the latency between the two clusters?
I would recommend increasing the "rbd_mirror_journal_max_fetch_bytes"
config setting (it defaults to 32KiB) on your destination cluster, i.e.
try adding "rbd_mirror_journal_max_fetch_bytes = 4194304" to the
"[client]" section of your Ceph configuration file on the node where
"rbd-mirror" daemon is running, and restart it. It defaults to a very
small read size from the remote cluster in a primitive attempt to
reduce the potential memory usage of the rbd-mirror daemon, but it has
the side-effect of slowing down mirroring for links with higher
latencies.
>
> Thanks,
> Michael
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>
--
Jason