Hi,
We have two clusters (v18.2.1) used primarily for RGW, holding over 2 billion RGW
objects. They are in a multisite configuration with two zones, and we have around
2 Gbps of dedicated (P2P) bandwidth for the multisite traffic. Running
"radosgw-admin sync status" on zone 2 shows all 128 shards as recovering, yet very
little data is being transferred from the primary zone: link utilization is barely
100 Mbps out of 2 Gbps. Our objects are also quite small, averaging about 1 MB in
size.
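For scale, here is a rough Python estimate of how long a full sync would take at the link's capacity versus the observed rate. It assumes exactly 2 billion objects of 1 MB (decimal) each and ignores protocol overhead and the 304 round trips; the figures are rounded from this post and `transfer_days` is a hypothetical helper.

```python
# Back-of-envelope full-sync time. Assumptions (rounded from the post):
# ~2 billion objects, ~1 MB (decimal) average size, link either fully used
# (2 Gbps) or at the observed ~100 Mbps. Protocol overhead is ignored.

OBJECTS = 2_000_000_000
AVG_OBJECT_BYTES = 1_000_000  # ~1 MB average object size


def transfer_days(link_bps: float, objects: int = OBJECTS,
                  avg_bytes: int = AVG_OBJECT_BYTES) -> float:
    """Days needed to copy `objects` objects of `avg_bytes` each at `link_bps` bits/s."""
    total_bits = objects * avg_bytes * 8
    seconds = total_bits / link_bps
    return seconds / 86_400


if __name__ == "__main__":
    for label, bps in [("full link, 2 Gbps", 2e9), ("observed, 100 Mbps", 100e6)]:
        print(f"{label}: ~{transfer_days(bps):.0f} days")
```

Under these assumptions the full link would need roughly 93 days, while the observed ~100 Mbps would need roughly 1852 days (about five years).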
On further inspection, we noticed that the RGW access logs at the primary site
mostly show "304 Not Modified" responses to the RGWs at site 2. Is this expected?
Here are some of the logs (information redacted):
root@host-04:~# tail -f /var/log/haproxy-msync.log
Feb 12 05:06:51 host-04 haproxy[971171]: 10.1.85.14:33730 [12/Feb/2024:05:06:51.047] https~ backend/host-04-msync 0/0/0/2/2 304 143 - - ---- 56/55/1/0/0 0/0 "GET /bucket1/object1.jpg?rgwx-zonegroup=71dceb3d-3092-4dc6-897f-a9abf60c9972&rgwx-prepend-metadata=true&rgwx-sync-manifest&rgwx-sync-cloudtiered&rgwx-skip-decrypt&rgwx-if-not-replicated-to=a8204ce2-b69e-4d90-bca1-93edd05a1a29%3Abucket1%3A8b96aea5-c763-40a3-8430-efd67cff0c62.20010.7 HTTP/1.1"
Feb 12 05:06:51 host-04 haproxy[971171]: 10.1.85.14:59730 [12/Feb/2024:05:06:51.048] https~ backend/host-04-msync 0/0/0/2/2 304 143 - - ---- 56/55/3/1/0 0/0 "GET /bucket1/object91.jpg?rgwx-zonegroup=71dceb3d-3092-4dc6-897f-a9abf60c9972&rgwx-prepend-metadata=true&rgwx-sync-manifest&rgwx-sync-cloudtiered&rgwx-skip-decrypt&rgwx-if-not-replicated-to=a8204ce2-b69e-4d90-bca1-93edd05a1a29%3Abucket1%3A8b96aea5-c763-40a3-8430-efd67cff0c62.20010.7 HTTP/1.1"
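To quantify the 304 vs 200 split directly from the load balancer log rather than from Grafana, a minimal Python sketch like the following could tally status codes. It assumes haproxy's default HTTP log format as in the lines above, where the status code follows the five-part timer field (e.g. "0/0/0/2/2"); `tally_status` and the script name in the comment are hypothetical.

```python
import re
import sys
from collections import Counter

# Timer block (e.g. "0/0/0/2/2", values may be -1 or "+"-prefixed) followed by
# the 3-digit HTTP status code, as in haproxy's default HTTP log format.
STATUS_RE = re.compile(r"\s(?:[+-]?\d+/){4}[+-]?\d+\s+(\d{3})\s")


def tally_status(lines):
    """Count HTTP status codes across haproxy log lines (one entry per line)."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:  # lines that don't look like HTTP log entries are skipped
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python tally_status.py < /var/log/haproxy-msync.log
    counts = tally_status(sys.stdin)
    total = sum(counts.values()) or 1
    for status, n in counts.most_common():
        print(f"{status}: {n} ({100 * n / total:.1f}%)")
```

Because the regex anchors on the timer block, the later connection-count field (e.g. "56/55/1/0/0") is not mistaken for it.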
We also checked our Grafana instance: of roughly 1000 requests per second, 200 return
"200 OK" and 800 return "304 Not Modified". Sync threads run on only two RGW daemons
per zone, which sit behind a load balancer. "radosgw-admin sync error list" also
shows around 20 errors, most of which are automatically recoverable.
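As a rough consistency check between these request rates and the observed link utilization, the Python sketch below computes the bandwidth 200 full-object responses per second would need, and how many full objects per second ~100 Mbps can account for. It assumes every "200 OK" sync GET returns one full ~1 MB object and every "304 Not Modified" returns essentially no body; both helper names are hypothetical.

```python
# Consistency check between Grafana request rates and link utilization.
# Assumptions: each "200 OK" carries one full ~1 MB (decimal) object body,
# and "304 Not Modified" responses carry no meaningful payload.


def expected_mbps(ok_per_sec: float, avg_object_bytes: float) -> float:
    """Payload bandwidth in Mbit/s if each 200 OK carries one average-sized object."""
    return ok_per_sec * avg_object_bytes * 8 / 1e6


def implied_ok_per_sec(observed_mbps: float, avg_object_bytes: float) -> float:
    """Full-object transfers per second that the observed bandwidth can account for."""
    return observed_mbps * 1e6 / (avg_object_bytes * 8)


if __name__ == "__main__":
    print(f"expected: ~{expected_mbps(200, 1_000_000):.0f} Mbps")  # 1600
    print(f"implied:  ~{implied_ok_per_sec(100, 1_000_000):.1f} objects/s")  # 12.5
```

Taken at face value, 200 full-object responses per second would need about 1600 Mbps, while ~100 Mbps covers only about 12.5 objects per second. That gap may be worth checking (for example, whether the 200 OK responses actually carry full object bodies); it is not a diagnosis on its own.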
Does this mean that the RGW multisite sync logs in the log pool have not been
generated yet, or something of that sort? Please share any insights and let us know
how to resolve this.
Thanks,
Saif
Hi All,
I just wanted to quickly follow up on my previous mail about "Slow RGW multisite sync
due to '304 Not Modified' responses on primary zone". I'm still facing the issue and
urgently need your guidance to resolve it.
I appreciate your attention to this matter.
Thanks,
Saif
Hi All,
We just wanted to quickly follow up on our earlier email, "Slow RGW multisite sync
due to '304 Not Modified' responses on primary zone." We are still having problems
and desperately need your help to find a solution.
Thank you for taking the time to consider this.
Thanks,
Praveen