Yes, it pays off to know what to do before you do it, instead of after. If you complain
about speed, is it a general, unfounded complaint, or did you compare Ceph with similar
solutions? I really have no idea what the standards are for these kinds of solutions. I
remember asking at such a seminar that 'general' performance numbers should be
published for every release, so people do not have to go through the ordeal of
investigating on their own. However, I also understand that there is a technical
performance limit to distributing your data like this.
I bookmarked this quite a while ago; if you are in dire need, you can do some external
caching for RBDs.
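(The bookmarked link is not included above, so purely for illustration: one readily
available option is librbd's client-side write-back cache, which helps librbd consumers
such as QEMU but not the krbd kernel client. A minimal sketch using the centralized
config available in Octopus; the sizes are placeholder values, not recommendations:)

  # Enable librbd's write-back cache for all librbd clients; example values only.
  ceph config set client rbd_cache true
  ceph config set client rbd_cache_policy writeback
  ceph config set client rbd_cache_size 134217728        # 128 MiB per image
  ceph config set client rbd_cache_max_dirty 100663296   # dirty data limit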
Yes, during my last adventure of trying to get any reasonable performance
out of Ceph, I realized my testing methodology was wrong. Both the kernel
client and QEMU have queues everywhere that make the numbers hard to
interpret.
fio has rbd support, which gives more useful values (frustratingly, much
lower ones, showing just how slow Ceph actually is):
https://subscription.packtpub.com/book/cloud-&-networking/9781784393502/10/ch10lvl1sec112/benchmarking-ceph-rbd-using-fio
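(For example, a read test straight through librbd might look like the sketch below;
pool name, image name, and client name are placeholders, and fio must be built with
rbd engine support:)

  fio --name=rbd-randread \
      --ioengine=rbd \
      --clientname=admin \
      --pool=rbd \
      --rbdname=fio_test \
      --rw=randread \
      --bs=4k \
      --iodepth=32 \
      --runtime=60 \
      --time_based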
On Sat, Mar 18, 2023 at 8:59 PM Rafael Weingartner
<work.ceph.user.mailing(a)gmail.com> wrote:
Hello guys!
I would like to ask if somebody has already experienced a similar
situation. We have a new cluster with 5 nodes with the following setup:
- 128 GB of RAM
- 2 Intel Xeon Silver 4210R CPUs
- 1 NVMe of 2 TB for RocksDB caching
- 5 HDDs of 14 TB
- 1 dual-port 25 Gbit NIC in bond mode
We are starting with a single dual-port NIC (the bond has 50 Gbit in
total). The design has been prepared so that a new NIC can be added and a
new bond created, onto which we intend to offload the cluster network.
Therefore, logically speaking, we have already configured different VLANs
and networks for the public and cluster traffic of Ceph.
We are using Ubuntu 20.04 with Ceph Octopus. It is a standard deployment
that we are used to. During our initial validations and evaluations of the
cluster, we are reaching write speeds between 250-300 MB/s, which would be
the ballpark for this kind of setup for HDDs with the NVMe as RocksDB
cache (in our experience). However, the issue is the reading process.
While reading, we barely hit the mark of 100 MB/s; we would expect at
least something similar to the write speed. These tests are being
performed in a pool with a replication factor of 3.
We have already checked the disks, and they all seem to be reading just
fine. The network does not seem to be the bottleneck either (checked with
atop while reading/writing to the cluster).
Have you guys ever encountered similar situations? Do you have any tips
for us to proceed with the troubleshooting?
We suspect that we are missing some small tuning detail, which is
affecting the read performance only, but so far we could not pinpoint it.
Any help would be much appreciated :)
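(A few generic checks that might help narrow down a read-only slowdown like the one
described above; "testpool" and "sdb" are placeholders, and the readahead value is
only an example, not a recommendation for this specific cluster:)

  # Baseline raw RADOS read throughput, bypassing the RBD layer entirely.
  # The write phase with --no-cleanup leaves objects behind to read back.
  rados bench -p testpool 60 write --no-cleanup
  rados bench -p testpool 60 seq
  rados bench -p testpool 60 rand
  rados -p testpool cleanup

  # Per-OSD latency while a read test is running.
  ceph osd perf

  # Readahead on the OSD data disks; large sequential reads from HDDs are
  # sensitive to this (the default is usually 128 KB).
  cat /sys/block/sd*/queue/read_ahead_kb
  echo 4096 | sudo tee /sys/block/sdb/queue/read_ahead_kb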
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io