FWIW when using rbd-mirror to migrate volumes between SATA SSD clusters, I found that
rbd_mirror_journal_max_fetch_bytes:
  section: "client"
  value: "33554432"
rbd_journal_max_payload_bytes:
  section: "client"
  value: "8388608"
made a world of difference in expediting journal replay on Luminous 12.2.2. With the
defaults, some active volumes would take hours to converge, and a couple were falling
even further behind.
This was mirroring 1 to 2 volumes at a time. YMMV.
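Expressed as a plain ceph.conf fragment (same two options, sections, and values as above; a sketch of how it would look on the host running rbd-mirror):

```ini
# /etc/ceph/ceph.conf on the rbd-mirror (destination) host
[client]
rbd_mirror_journal_max_fetch_bytes = 33554432   ; 32 MiB per journal fetch
rbd_journal_max_payload_bytes = 8388608         ; 8 MiB max journal payload
```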
On Mar 10, 2020, at 7:36 AM, Ml Ml
<mliebherr99(a)googlemail.com> wrote:
Hello Jason,
thanks for that fast reply.
This is now my /etc/ceph/ceph.conf
[client]
rbd_mirror_journal_max_fetch_bytes = 4194304
I stopped and started my rbd-mirror manually with:
rbd-mirror -d -c /etc/ceph/ceph.conf
Still the same result: slow speed shown by iftop, and entries_behind_master
keeps increasing a lot if I produce 20 MB/s of traffic on that
replicated image.
The latency is like:
--- 10.10.50.1 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 20199ms
rtt min/avg/max/mdev = 0.067/0.286/1.418/0.215 ms
iperf from the source node to the destination node (where the
rbd-mirror runs): 8.92 Gbits/sec
Any other idea?
Thanks,
Michael
On Tue, Mar 10, 2020 at 2:19 PM Jason Dillaman <jdillama(a)redhat.com> wrote:
On Tue, Mar 10, 2020 at 6:47 AM Ml Ml <mliebherr99(a)googlemail.com> wrote:
Hello List,
when I initially enable journal/mirror on an image it gets
bootstrapped to my site-b pretty quickly at 250 MB/s, which is about
the IO write limit.
Once it's up to date, the replay is very slow, about 15 KB/s, and
entries_behind_master just keeps running away:
root@ceph01:~# rbd --cluster backup mirror pool status rbd-cluster6 --verbose
health: OK
images: 3 total
3 replaying
...
vm-112-disk-0:
global_id: 60a795c3-9f5d-4be3-b9bd-3df971e531fa
state: up+replaying
description: replaying, master_position=[object_number=623,
tag_tid=3, entry_tid=345567], mirror_position=[object_number=35,
tag_tid=3, entry_tid=18371], entries_behind_master=327196
last_update: 2020-03-10 11:36:44
...
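As a sanity check on the verbose status above: entries_behind_master is simply the gap between the master and mirror entry_tid values, which you can pull out of the description line with ordinary shell tools (a sketch; the status line is the one quoted above):

```shell
# description line from `rbd mirror pool status --verbose`, quoted from above
desc='replaying, master_position=[object_number=623, tag_tid=3, entry_tid=345567], mirror_position=[object_number=35, tag_tid=3, entry_tid=18371], entries_behind_master=327196'

# extract the two entry_tid values and the reported backlog
master_tid=$(echo "$desc" | grep -o 'entry_tid=[0-9]*' | sed -n 1p | cut -d= -f2)
mirror_tid=$(echo "$desc" | grep -o 'entry_tid=[0-9]*' | sed -n 2p | cut -d= -f2)
behind=$(echo "$desc" | grep -o 'entries_behind_master=[0-9]*' | cut -d= -f2)

# the backlog equals the entry_tid gap: 345567 - 18371 = 327196
echo "$((master_tid - mirror_tid)) $behind"
```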
Write traffic on the source is about 20-25 MB/s.
On the source I run 14.2.6 and on the destination 12.2.13.
Any idea why the replay is so slow?
What is the latency between the two clusters?
I would recommend increasing the "rbd_mirror_journal_max_fetch_bytes"
config setting (it defaults to 32KiB) on your destination cluster, i.e.
try adding "rbd_mirror_journal_max_fetch_bytes = 4194304" to the
"[client]" section of your Ceph configuration file on the node where
the "rbd-mirror" daemon is running, and restart it. The daemon
defaults to a very small read size from the remote cluster in a
primitive attempt to reduce its potential memory usage, but this has
the side effect of slowing down mirroring on links with higher
latencies.
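To put those numbers in perspective (simple arithmetic, nothing cluster-specific): the suggested 4194304-byte value is a 128x increase over the 32 KiB default fetch size mentioned above:

```shell
# default per-fetch read size vs. the suggested override
default_fetch=$((32 * 1024))   # 32 KiB = 32768 bytes
suggested=4194304              # the value suggested for the [client] section above
echo "${default_fetch} -> ${suggested} ($((suggested / default_fetch))x larger)"
```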
Thanks,
Michael
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io
--
Jason