Stefan,
I can't find it now, but I seem to remember a discussion on this mailing list that sharded
RGW performance is significantly better when the shard count is a power of 2, so you might
try increasing the shard count from 50 to 64.
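If you go that route, something like the following sketch may help (the bucket name is just an example; note that dynamic resharding is not supported in multi-site deployments on Octopus, and even manual resharding there has caveats, so check the release notes for your version first):

```shell
# Check the bucket's current shard count (bucket name is an example)
radosgw-admin bucket stats --bucket=my-bucket | grep num_shards

# Manually reshard to the next power of 2 above 50.
# Caution: in a multi-site setup, verify resharding support for
# your Ceph release before running this.
radosgw-admin bucket reshard --bucket=my-bucket --num-shards=64
```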
Also, you might look at the OSD logs while a listing is running, to see if that
illuminates anything for you.
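For example, you could temporarily raise the debug level on the OSDs backing the index pool while you reproduce the listing (osd.0 is a placeholder ID; this is verbose, so revert it afterwards):

```shell
# Bump OSD debug logging on one of the index-pool OSDs
ceph tell osd.0 config set debug_osd 10

# ...trigger the S3 listing, then inspect /var/log/ceph/ceph-osd.0.log...

# Restore the default level when done
ceph tell osd.0 config set debug_osd 1
```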
You said: "2 x SATA SSDs for RGW index pool," but is the zone's index pool assigned to a
CRUSH rule that targets only SSDs, or only those particular SSDs? Are you running RGW
multi-site? Are you running RGW replication between the sites?
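You can verify which rule the index pool is using with something like this (the pool name below is the default for a default zone; substitute your zone's index pool):

```shell
# Show which CRUSH rule the index pool uses
ceph osd pool get default.rgw.buckets.index crush_rule

# Dump all CRUSH rules and check that the one above
# selects only the SSD device class (or your specific SSDs)
ceph osd crush rule dump
```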
Thank you,
Dominic L. Hilsbos, MBA
Director – Information Technology
Perform Air International, Inc.
DHilsbos(a)PerformAir.com
www.PerformAir.com
-----Original Message-----
From: Stefan Wild [mailto:swild@tiltworks.com]
Sent: Wednesday, June 10, 2020 6:05 PM
To: ceph-users(a)ceph.io
Subject: [ceph-users] RGW listing slower on nominally faster setup
Hi everyone,
We are currently transitioning from a temporary machine to our production hardware. Since
we're starting with under 200 TB raw storage, we are currently on only 1–2 physical
machines per cluster, eventually in 3 zones. The temporary machine is undersized for even
that with an older single 6-core CPU and spinning disks only. As of now that
"cluster-of-one" is running Nautilus and has 3 buckets with 98K, 1.1M, and
1.4M objects, respectively, for a total of 9.1 TB. As we're expecting these to grow to
around 5M objects each and will be in a multi-site configuration, I went with 50 shards per
bucket.
Listing "directories" via S3 is somewhat slow (sometimes to the point of read
timeouts) but mostly bearable. After the new production setup (dual 8-core/16-thread Xeon
Silvers, 2 x SATA SSDs for RGW index pool, on Octopus, with enough free memory to easily
fit all bucket indexes multiple times) synced successfully, listings via S3 always time
out on the RGW on that machine/zone.
As soon as I trigger a single listing via S3 (even on the 98K object bucket), reads go up
to a sustained 300–500MB/s and 20–50K IOPS on the bucket index pool for several hours. The
RGW debug log is flooded with lines like this:
{"log":"debug 2020-06-08T19:31:08.315+0000 7f83d704c700 1
RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing requires read
#1\n","stream":"stdout","time":"2020-06-08T19:31:08.317198682Z"}
I get that sharded RGW indexes (and listing objects in S3 buckets in general) are not very
efficient, but after getting somewhat decent results on slower hardware and an older Ceph
version, I wasn't expecting the nominally much better setup to be orders of magnitude
slower.
Any help or pointers would be greatly appreciated.
Thank you,
Stefan
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io