Stefan,
I can't find it now, but I seem to remember a discussion on this mailing list that sharded
RGW performance is significantly better when the shard count is a power of 2, so you might
try increasing the shard count from 50 to 64.
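If you go that route, something like the following sketch may help (the bucket name is just an example; note that dynamic resharding is not supported in multi-site deployments on Octopus, and even manual resharding there has caveats, so check the release notes for your version first):

```shell
# Check the bucket's current shard count (bucket name is an example)
radosgw-admin bucket stats --bucket=my-bucket | grep num_shards

# Manually reshard to the next power of 2 above 50.
# Caution: in a multi-site setup, verify resharding support for
# your Ceph release before running this.
radosgw-admin bucket reshard --bucket=my-bucket --num-shards=64
```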
Also, you might look at the OSD logs while a listing is running, to see if that
illuminates anything for you.
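For example, you could temporarily raise the debug level on the OSDs backing the index pool while you reproduce the listing (osd.0 is a placeholder ID; this is verbose, so revert it afterwards):

```shell
# Bump OSD debug logging on one of the index-pool OSDs
ceph tell osd.0 config set debug_osd 10

# ...trigger the S3 listing, then inspect /var/log/ceph/ceph-osd.0.log...

# Restore the default level when done
ceph tell osd.0 config set debug_osd 1
```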
You said: "2 x SATA SSDs for RGW index pool," but is the zone's index pool assigned to a
CRUSH rule that targets only SSDs, or only those particular SSDs? Are you running RGW
multi-site? Are you running RGW replication between the sites?
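You can verify which rule the index pool is using with something like this (the pool name below is the default for a default zone; substitute your zone's index pool):

```shell
# Show which CRUSH rule the index pool uses
ceph osd pool get default.rgw.buckets.index crush_rule

# Dump all CRUSH rules and check that the one above
# selects only the SSD device class (or your specific SSDs)
ceph osd crush rule dump
```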
Thank you,
Dominic L. Hilsbos, MBA
Director – Information Technology
Perform Air International, Inc.
DHilsbos(a)PerformAir.com
www.PerformAir.com
-----Original Message-----
From: Stefan Wild [mailto:swild@tiltworks.com]
Sent: Wednesday, June 10, 2020 6:05 PM
To: ceph-users(a)ceph.io
Subject: [ceph-users] RGW listing slower on nominally faster setup
Hi everyone,
We are currently transitioning from a temporary machine to our production hardware. Since
we're starting with under 200 TB raw storage, we are currently on only 1–2 physical
machines per cluster, eventually in 3 zones. The temporary machine is undersized for even
that with an older single 6-core CPU and spinning disks only. As of now that
"cluster-of-one" is running Nautilus and has 3 buckets with 98K, 1.1M, and
1.4M objects, respectively, for a total of 9.1 TB. As we're expecting these to grow to
around 5M objects each and will be in a multi-site configuration, I went with 50 shards per
bucket.
Listing "directories" via S3 is somewhat slow (sometimes to the point of read
timeouts) but mostly bearable. After the new production setup (dual 8-core/16-thread Xeon
Silvers, 2 x SATA SSDs for RGW index pool, on Octopus, with enough free memory to easily
fit all bucket indexes multiple times) synced successfully, listings via S3 always time
out on the RGW on that machine/zone.
As soon as I trigger a single listing via S3 (even on the 98K object bucket), reads go up
to a sustained 300–500MB/s and 20–50K IOPS on the bucket index pool for several hours. The
RGW debug log is flooded with lines like this:
{"log":"debug 2020-06-08T19:31:08.315+0000 7f83d704c700 1
RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing requires read
#1\n","stream":"stdout","time":"2020-06-08T19:31:08.317198682Z"}
I get that sharded RGW indexes (and listing objects in S3 buckets in general) are not very
efficient, but after getting somewhat decent results on slower hardware and an older Ceph
version, I wasn't expecting the nominally much better setup to be orders of magnitude
slower.
Any help or pointers would be greatly appreciated.
Thank you,
Stefan
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io