Thanks, everyone, for all of your guidance. To answer the questions:
Are the OSD nodes connected with 10Gb as well?
Yes
Are you using SSDs for your index pool? How many?
Yes, for a node with 39 HDD OSDs we are using 6 Index SSDs
How big are your objects?
Most tests run at 64K, but I have gone up to 4M with similar results
Try to increase gradually to, say, 1GB and measure.
Tried with a 1GB size and the result was the same, around 250Mb/s
rados bench gives me ~1Gb/s, so four times what I am getting through our haproxy-balanced
endpoint.
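For reference, the bench runs were along these lines (pool name and runtime here are placeholders, not our real ones):

  # write objects first and keep them around for the read pass
  rados bench -p testpool 60 write -b 65536 --no-cleanup
  # sequential read of the objects written above
  rados bench -p testpool 60 seq
  # remove the benchmark objects afterwards
  rados -p testpool cleanup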
Our haproxy is version 1.5.18, and it looks like multithreading was introduced in 1.8, so
that could be the culprit.
These CPU stats during the benchmark seem to agree as well; one core (CPU 4) is completely saturated while the rest sit nearly idle:
03:34:15 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
03:34:16 PM all 3.23 0.00 4.12 0.00 0.00 0.76 0.00 0.00 0.00 91.88
03:34:16 PM 0 4.00 0.00 2.00 0.00 0.00 1.00 0.00 0.00 0.00 93.00
03:34:16 PM 1 2.06 0.00 2.06 0.00 0.00 0.00 0.00 0.00 0.00 95.88
03:34:16 PM 2 2.04 0.00 3.06 0.00 0.00 0.00 0.00 0.00 0.00 94.90
03:34:16 PM 3 2.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00 0.00 96.00
03:34:16 PM 4 39.00 0.00 58.00 0.00 0.00 3.00 0.00 0.00 0.00 0.00
03:34:16 PM 5 0.99 0.00 1.98 0.00 0.00 0.00 0.00 0.00 0.00 97.03
03:34:16 PM 6 2.04 0.00 3.06 0.00 0.00 1.02 0.00 0.00 0.00 93.88
03:34:16 PM 7 1.01 0.00 2.02 0.00 0.00 0.00 0.00 0.00 0.00 96.97
03:34:16 PM 8 2.25 0.00 1.12 0.00 0.00 5.62 0.00 0.00 0.00 91.01
03:34:16 PM 9 2.02 0.00 2.02 0.00 0.00 0.00 0.00 0.00 0.00 95.96
03:34:16 PM 10 1.02 0.00 2.04 0.00 0.00 1.02 0.00 0.00 0.00 95.92
03:34:16 PM 11 1.98 0.00 1.98 0.00 0.00 0.99 0.00 0.00 0.00 95.05
03:34:16 PM 12 1.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 98.97
03:34:16 PM 13 1.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 98.98
03:34:16 PM 14 1.08 0.00 2.15 0.00 0.00 1.08 0.00 0.00 0.00 95.70
03:34:16 PM 15 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
03:34:16 PM 16 1.04 0.00 3.12 0.00 0.00 1.04 0.00 0.00 0.00 94.79
03:34:16 PM 17 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.00
03:34:16 PM 18 1.04 0.00 1.04 0.00 0.00 1.04 0.00 0.00 0.00 96.88
03:34:16 PM 19 0.99 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.01
03:34:16 PM 20 2.04 0.00 3.06 0.00 0.00 1.02 0.00 0.00 0.00 93.88
03:34:16 PM 21 2.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 97.98
03:34:16 PM 22 1.04 0.00 2.08 0.00 0.00 2.08 0.00 0.00 0.00 94.79
03:34:16 PM 23 3.00 0.00 7.00 0.00 0.00 0.00 0.00 0.00 0.00 90.00
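For anyone following along, that was captured with something like:

  # per-CPU utilization, one-second samples, taken mid-benchmark
  mpstat -P ALL 1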
I'll start by upgrading that and enabling multithreading, and I'll let you know how it
goes.
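Something like this is what I have in mind for the new config (the thread count is a guess to start with; nbthread needs haproxy 1.8+):

  global
      # 1.8+ runs one process with several worker threads instead of
      # pushing all traffic through a single core
      nbthread 8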
Cheers,
Dylan
On Fri, 2020-09-25 at 19:39 +0000, Dylan Griff wrote:
Hey folks!
Just shooting this out there in case someone has some advice. We're
setting up RGW object storage for one of our new Ceph clusters (3
mons, 1072 OSDs, 34 nodes) and doing some benchmarking before letting
users on it.
We have a 10Gb network to our two RGW nodes behind a single IP on
haproxy, and some iperf testing shows I can push that much; latencies
look okay. However, when using a small cosbench cluster I am unable to
get more than ~250Mb/s of read throughput in total.
If I add more nodes to the cosbench cluster, it just spreads the
load evenly under the same cap; same results when running two cosbench
clusters from different locations. I don't see any obvious bottlenecks
in terms of the RGW server hardware, but since I'm asking for
assistance I won't rule out having missed something. I have
attached one of my cosbench load files with keys removed, but I get
similar results with different numbers of workers, objects, buckets,
object sizes, and cosbench drivers.
Does anyone have pointers on how to nail this bottleneck down? Am I
wrong to expect more throughput? Let me know if I can get you any
other info.
Cheers,
Dylan
--
Dylan Griff
Senior System Administrator
CLE D063
RCS - Systems - University of Victoria
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io