Thanks, everyone, for all of your guidance. To answer the questions:
Are the OSD nodes connected with 10Gb as well?
Yes
Are you using SSDs for your index pool? How many?
Yes, for a node with 39 HDD OSDs we are using 6 Index SSDs
How big are your objects?
Most tests run at 64K, but I have gone up to 4M with similar results
Try to increase gradually to, say, 1GB and measure.
Tried with a 1GB size and the result was the same, around 250Mb/s
rados bench gives me ~1Gb/s, so four times what I am getting through our haproxy-balanced
endpoint.
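For reference, the bench runs were along these lines (pool name and runtime here are placeholders, not our real ones):

  # write objects first and keep them around for the read pass
  rados bench -p testpool 60 write -b 65536 --no-cleanup
  # sequential read of the objects written above
  rados bench -p testpool 60 seq
  # remove the benchmark objects afterwards
  rados -p testpool cleanup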
Our haproxy is version 1.5.18, and it looks like multithreading was introduced in 1.8, so
that could be the culprit.
These CPU stats during the benchmark seem to agree as well; one core (CPU 4) is completely saturated while the rest sit nearly idle:
03:34:15 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
03:34:16 PM all 3.23 0.00 4.12 0.00 0.00 0.76 0.00 0.00 0.00 91.88
03:34:16 PM 0 4.00 0.00 2.00 0.00 0.00 1.00 0.00 0.00 0.00 93.00
03:34:16 PM 1 2.06 0.00 2.06 0.00 0.00 0.00 0.00 0.00 0.00 95.88
03:34:16 PM 2 2.04 0.00 3.06 0.00 0.00 0.00 0.00 0.00 0.00 94.90
03:34:16 PM 3 2.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00 0.00 96.00
03:34:16 PM 4 39.00 0.00 58.00 0.00 0.00 3.00 0.00 0.00 0.00 0.00
03:34:16 PM 5 0.99 0.00 1.98 0.00 0.00 0.00 0.00 0.00 0.00 97.03
03:34:16 PM 6 2.04 0.00 3.06 0.00 0.00 1.02 0.00 0.00 0.00 93.88
03:34:16 PM 7 1.01 0.00 2.02 0.00 0.00 0.00 0.00 0.00 0.00 96.97
03:34:16 PM 8 2.25 0.00 1.12 0.00 0.00 5.62 0.00 0.00 0.00 91.01
03:34:16 PM 9 2.02 0.00 2.02 0.00 0.00 0.00 0.00 0.00 0.00 95.96
03:34:16 PM 10 1.02 0.00 2.04 0.00 0.00 1.02 0.00 0.00 0.00 95.92
03:34:16 PM 11 1.98 0.00 1.98 0.00 0.00 0.99 0.00 0.00 0.00 95.05
03:34:16 PM 12 1.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 98.97
03:34:16 PM 13 1.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 98.98
03:34:16 PM 14 1.08 0.00 2.15 0.00 0.00 1.08 0.00 0.00 0.00 95.70
03:34:16 PM 15 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
03:34:16 PM 16 1.04 0.00 3.12 0.00 0.00 1.04 0.00 0.00 0.00 94.79
03:34:16 PM 17 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.00
03:34:16 PM 18 1.04 0.00 1.04 0.00 0.00 1.04 0.00 0.00 0.00 96.88
03:34:16 PM 19 0.99 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.01
03:34:16 PM 20 2.04 0.00 3.06 0.00 0.00 1.02 0.00 0.00 0.00 93.88
03:34:16 PM 21 2.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 97.98
03:34:16 PM 22 1.04 0.00 2.08 0.00 0.00 2.08 0.00 0.00 0.00 94.79
03:34:16 PM 23 3.00 0.00 7.00 0.00 0.00 0.00 0.00 0.00 0.00 90.00
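For anyone following along, that was captured with something like:

  # per-CPU utilization, one-second samples, taken mid-benchmark
  mpstat -P ALL 1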
I'll start by upgrading that and enabling multithreading, and I'll let you know how it
goes.
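Something like this is what I have in mind for the new config (the thread count is a guess to start with; nbthread needs haproxy 1.8+):

  global
      # 1.8+ runs one process with several worker threads instead of
      # pushing all traffic through a single core
      nbthread 8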
Cheers,
Dylan
On Fri, 2020-09-25 at 19:39 +0000, Dylan Griff wrote:
Hey folks!
Just shooting this out there in case someone has some advice. We're
setting up RGW object storage for one of our new Ceph clusters (3
mons, 1072 OSDs, 34 nodes) and doing some benchmarking before letting
users on it.
We have a 10Gb network to our two RGW nodes behind a single IP on
haproxy, and some iperf testing shows I can push that much; latencies
look okay. However, when using a small cosbench cluster I am unable to
get more than ~250Mb/s of read throughput in total.
If I add more nodes to the cosbench cluster, it just spreads the
load evenly under the same cap; same results when running two cosbench
clusters from different locations. I don't see any obvious bottlenecks
in terms of the RGW server hardware, but since I'm asking for
assistance I won't rule out having missed something. I have
attached one of my cosbench load files with keys removed, but I get
similar results with different numbers of workers, objects, buckets,
object sizes, and cosbench drivers.
Does anyone have pointers on how to nail this bottleneck down? Am I
wrong to expect more throughput? Let me know if I can get you any
other info.
Cheers,
Dylan
--
Dylan Griff
Senior System Administrator
CLE D063
RCS - Systems - University of Victoria
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io