Yes, it pays off to know what to do before you do it, instead of after. If you complain
about speed, is it a general, unfounded complaint, or did you compare Ceph with similar
solutions? I really have no idea what the standards are for these kinds of solutions. I
remember asking at such a seminar that 'general' performance numbers should be
published for every release, so people do not have to go through the ordeal of
investigating on their own. However, I also understand that there is a technical
performance limit to distributing your data like this.
I bookmarked this quite a while ago; if you are in dire need, you can do some external
caching for RBDs.
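(The bookmarked link is not included above, so purely for illustration: one readily
available option is librbd's client-side write-back cache, which helps librbd consumers
such as QEMU but not the krbd kernel client. A minimal sketch using the centralized
config available in Octopus; the sizes are placeholder values, not recommendations:)

  # Enable librbd's write-back cache for all librbd clients; example values only.
  ceph config set client rbd_cache true
  ceph config set client rbd_cache_policy writeback
  ceph config set client rbd_cache_size 134217728        # 128 MiB per image
  ceph config set client rbd_cache_max_dirty 100663296   # dirty data limit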
Yes, during my last adventure of trying to get any reasonable performance
out of Ceph, I realized my testing methodology was wrong. Both the kernel
client and QEMU have queues everywhere that make the numbers hard to
interpret.
fio has rbd support, which gives more useful values (frustratingly, much
lower ones, showing just how slow Ceph actually is):
https://subscription.packtpub.com/book/cloud-&-networking/9781784393502/10/ch10lvl1sec112/benchmarking-ceph-rbd-using-fio
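(For example, a read test straight through librbd might look like the sketch below;
pool name, image name, and client name are placeholders, and fio must be built with
rbd engine support:)

  fio --name=rbd-randread \
      --ioengine=rbd \
      --clientname=admin \
      --pool=rbd \
      --rbdname=fio_test \
      --rw=randread \
      --bs=4k \
      --iodepth=32 \
      --runtime=60 \
      --time_based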
On Sat, Mar 18, 2023 at 8:59 PM Rafael Weingartner
<work.ceph.user.mailing(a)gmail.com> wrote:
Hello guys!
I would like to ask if somebody has already experienced a similar
situation. We have a new cluster with 5 nodes with the following setup:
- 128 GB of RAM
- 2 Intel Xeon Silver 4210R CPUs
- 1 NVMe of 2 TB for RocksDB caching
- 5 HDDs of 14 TB
- 1 dual-port 25 Gbit NIC in bond mode
We are starting with a single dual-port NIC (the bond has 50 Gbit in
total). The design has been prepared so that a new NIC can be added and a
new bond created, onto which we intend to offload the cluster network.
Therefore, logically speaking, we have already configured different VLANs
and networks for the public and cluster traffic of Ceph.
We are using Ubuntu 20.04 with Ceph Octopus. It is a standard deployment
that we are used to. During our initial validations and evaluations of the
cluster, we are reaching write speeds between 250-300 MB/s, which would be
the ballpark for this kind of setup for HDDs with the NVMe as RocksDB
cache (in our experience). However, the issue is the reading process.
While reading, we barely hit the mark of 100 MB/s; we would expect at
least something similar to the write speed. These tests are being
performed in a pool with a replication factor of 3.
We have already checked the disks, and they all seem to be reading just
fine. The network does not seem to be the bottleneck either (checked with
atop while reading/writing to the cluster).
Have you guys ever encountered similar situations? Do you have any tips
for us to proceed with the troubleshooting?
We suspect that we are missing some small tuning detail, which is
affecting the read performance only, but so far we could not pinpoint it.
Any help would be much appreciated :)
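(A few generic checks that might help narrow down a read-only slowdown like the one
described above; "testpool" and "sdb" are placeholders, and the readahead value is
only an example, not a recommendation for this specific cluster:)

  # Baseline raw RADOS read throughput, bypassing the RBD layer entirely.
  # The write phase with --no-cleanup leaves objects behind to read back.
  rados bench -p testpool 60 write --no-cleanup
  rados bench -p testpool 60 seq
  rados bench -p testpool 60 rand
  rados -p testpool cleanup

  # Per-OSD latency while a read test is running.
  ceph osd perf

  # Readahead on the OSD data disks; large sequential reads from HDDs are
  # sensitive to this (the default is usually 128 KB).
  cat /sys/block/sd*/queue/read_ahead_kb
  echo 4096 | sudo tee /sys/block/sdb/queue/read_ahead_kb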
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io