Hi David,
On Tuesday, March 30th, 2021 at 00:50, David Orman <ormandj(a)corenode.com> wrote:
> Sure enough, it is more than 200,000, just as the
> alert indicates.
>
> However, why did it not reshard further? Here's the kicker - we only
> see this with versioned buckets/objects. I don't see anything in the
> documentation that indicates this is a known issue with sharding, but
> perhaps there is something going on with versioned buckets/objects. Is
> there any clarity here/suggestions on how to deal with this? It sounds
> like you expect this behavior with versioned buckets, so we must be
> missing something.
The issue with versioned buckets is that each object is associated with at least 4 index
entries, and each additional version of the object adds 2 more. Dynamic resharding is
based on the number of objects, not the number of index entries, and it counts each
version of an object as an object. So the discrepancy between the object count and the
index entry count is largest when there's only one version of each object (a factor of
4), and it tends towards a factor of 2 as the number of versions per object grows.

But there's one more special case: deleting a versioned object creates two more index
entries, and those are not taken into account by dynamic resharding at all. The absolute
worst case is therefore a bucket where each object had a single version and all the
objects have been deleted. In that case, there are 6 index entries for each object
counted by dynamic resharding, i.e. a factor of 6.
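
To make the arithmetic concrete, using the counts above:

    1 object, 1 live version:      4 entries, counted as 1 object   -> factor 4
    1 object, v live versions:     2 + 2v entries, counted as v     -> factor 2 + 2/v
    1 object, 1 version, deleted:  4 + 2 = 6 entries, counted as 1  -> factor 6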
So one way to "solve" this issue is to set
`osd_deep_scrub_large_omap_object_key_threshold=600000`, which (with the default
`rgw_max_objs_per_shard=100000`) guarantees that dynamic resharding kicks in before you
get a large omap object warning, even in the worst-case scenario for versioned buckets.
If you're not comfortable having that many keys per omap object, you could instead
decrease `rgw_max_objs_per_shard`.
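
For reference, here's a minimal sketch of applying either knob with the config CLI. The
`osd` and `client.rgw` targets are assumptions on my part, so adjust them to however
your daemons are sectioned:

    # Raise the deep-scrub warning threshold on the OSDs, so the large omap
    # warning only fires after dynamic resharding has had a chance to run:
    ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 600000

    # Or lower the reshard trigger instead. With the default 200,000-key
    # warning threshold, something at or below 200000/6 ~= 33000 would cover
    # the worst case above (33000 here is just an illustrative value):
    ceph config set client.rgw rgw_max_objs_per_shard 33000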
Cheers,
--
Ben