Thank you for the information, Christian. When you reshard, the bucket id is updated (with
the most recent versions of Ceph, a generation number is incremented). The first bucket id
matches the bucket marker, but after the first reshard they diverge.
The bucket id appears in the names of the currently used bucket index shards. You’re
grepping for the marker, which means you’re finding the older bucket index shards.
Change your commands to these:
# rados -p raum.rgw.buckets.index ls \
    | grep 3caabb9a-4e3b-4b8a-8222-34c33dd63210.10648356.1 \
    | sort -V
# rados -p raum.rgw.buckets.index ls \
    | grep 3caabb9a-4e3b-4b8a-8222-34c33dd63210.10648356.1 \
    | sort -V \
    | xargs -IOMAP sh -c \
        'rados -p raum.rgw.buckets.index listomapkeys OMAP | wc -l'
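To see why grepping for the bucket id (rather than the marker) finds the live shards: bucket index shard objects are named `.dir.<bucket_id>.<shard_number>`. A minimal illustration, with made-up object names standing in for `rados ls` output:

```shell
# Simulated `rados ls` output: two old shards (named after the marker,
# ...10610190.9) and two current shards (named after the bucket id,
# ...10648356.1). Grepping for the bucket id keeps only the current ones.
printf '%s\n' \
    .dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9.0 \
    .dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9.1 \
    .dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10648356.1.0 \
    .dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10648356.1.1 \
    | grep 10648356.1 \
    | sort -V
```

Only the two `.10648356.1.*` objects survive the filter; the marker-named shards are the leftovers from before the reshard.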
When you refer to the “second zone”, what do you mean? Is this cluster using multisite? If
and only if your answer is “no” is it safe to remove the old bucket index shards.
Depending on the version of Ceph that was running when the reshard was performed, they were
either intentionally left behind (earlier behavior) or removed automatically (later behavior).
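If the cluster turns out to be single-site, one cautious cleanup approach is to list the shards named after the old marker, review them, and only then remove them. A sketch, with illustrative object names standing in for real cluster output (on the cluster you would replace the `printf` with `rados -p raum.rgw.buckets.index ls`; the removal step is deliberately left as a comment):

```shell
OLD_MARKER=3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9
# Stand-in for `rados -p raum.rgw.buckets.index ls` on the real cluster:
printf '%s\n' \
    ".dir.${OLD_MARKER}.0" \
    ".dir.${OLD_MARKER}.1" \
    .dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10648356.1.0 \
    | grep "$OLD_MARKER" \
    | sort -V
# After reviewing the list, each stale shard object could be removed with:
#   rados -p raum.rgw.buckets.index rm <object-name>
```

The filter keeps only the marker-named (stale) shards, so the live `.10648356.1.*` objects never reach the removal step.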
Eric
(he/him)
On Jul 25, 2023, at 6:32 AM, Christian Kugler
<syphdias+ceph(a)gmail.com> wrote:
Hi Eric,
1. I recommend that you *not* issue another
bucket reshard until you figure out what’s going on.
Thanks, noted!
2. Which version of Ceph are you using?
17.2.5
I wanted to get the cluster to HEALTH_OK before upgrading. I didn't
see anything that led me to believe an upgrade could fix the
reshard issue.
3. Can you issue a `radosgw-admin metadata get
bucket:<bucket-name>` so we can verify what the current marker is?
# radosgw-admin metadata get bucket:sql20
{
    "key": "bucket:sql20",
    "ver": {
        "tag": "_hGhtgzjcWY9rO9JP7YlWzt8",
        "ver": 3
    },
    "mtime": "2023-07-12T15:56:55.226784Z",
    "data": {
        "bucket": {
            "name": "sql20",
            "marker": "3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9",
            "bucket_id": "3caabb9a-4e3b-4b8a-8222-34c33dd63210.10648356.1",
            "tenant": "",
            "explicit_placement": {
                "data_pool": "",
                "data_extra_pool": "",
                "index_pool": ""
            }
        },
        "owner": "S3user",
        "creation_time": "2023-04-26T09:22:01.681646Z",
        "linked": "true",
        "has_bucket_info": "false"
    }
}
4. After you resharded previously, did you get
command-line output along the lines of:
2023-07-24T13:33:50.867-0400 7f10359f2a80 1 execute INFO: reshard of bucket
"<bucket-name>" completed successfully
I think so, at least for the second reshard, but I wouldn't bet my
life on it. I fear I might have missed an error on the first one, since
I have run radosgw-admin bucket reshard so often and have never seen it
fail.
Christian