Thanks Istvan.
I did some more investigation, and what I found is that if I run FIO with
100% writes on an already warm volume, the performance degradation
doesn't happen. In other words, 100% write ops on an empty volume cause the
degradation, while subsequent reads/writes on a volume where the data has
already been allocated don't. I tested this with thick-provisioned volumes
too and saw the same problem.
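For reference, this is roughly how I compared a cold vs. a warm volume (the
device path, sizes, and job parameters below are illustrative, not my exact
job files):

    # Pass 1: sequential prefill so every block is allocated ("warm" the volume)
    fio --name=prefill --filename=/dev/rbd0 --rw=write --bs=1M \
        --direct=1 --ioengine=libaio --iodepth=16 --size=10G

    # Pass 2: the actual 100% write test; on the prefilled volume this
    # no longer shows the degradation
    fio --name=write-test --filename=/dev/rbd0 --rw=randwrite --bs=4k \
        --direct=1 --ioengine=libaio --iodepth=32 --size=10G \
        --runtime=300 --time_based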
Regards,
Shridhar
On Thu, 8 Oct 2020 at 18:31, Szabo, Istvan (Agoda) <Istvan.Szabo(a)agoda.com>
wrote:
Hi,
We have a quite serious issue regarding slow ops.
In our case the DB team used the cluster to read and write in the same pool
at the same time, and it made the cluster useless.
When we ran fio, we realised that Ceph doesn't like reads and writes at the
same time in the same pool. We tested this by creating 2 separate pools with
fio, putting the read operations on one pool and the writes on the other,
and magic happened: no slow ops and way higher performance.
We asked the DB team to split the reads and writes as well (as much as they
can) and the issue was solved (after 2 weeks).
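Roughly what our split test looked like, using fio's rbd engine (the pool,
image, and client names below are made up):

    # Reads against one pool...
    fio --name=reads --ioengine=rbd --clientname=admin --pool=db-read \
        --rbdname=vol-r --rw=randread --bs=8k --iodepth=32 --direct=1 \
        --runtime=300 --time_based

    # ...and writes against a separate pool, run at the same time
    fio --name=writes --ioengine=rbd --clientname=admin --pool=db-write \
        --rbdname=vol-w --rw=randwrite --bs=8k --iodepth=32 --direct=1 \
        --runtime=300 --time_based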
Thank you
________________________________________
From: Void Star Nill <void.star.nill(a)gmail.com>
Sent: Thursday, October 8, 2020 1:14 PM
To: ceph-users
Subject: [Suspicious newsletter] [ceph-users] Weird performance issue with
long heartbeat and slow ops warnings
Hello,
I have a Ceph cluster running 14.2.11. I am running benchmark tests with
FIO concurrently on ~2000 volumes of 10G each. During the initial warm-up,
FIO creates a 10G file on each volume before it runs the actual read/write
I/O operations. During this phase the Ceph cluster reports about 35 GiB/s
of write throughput for a while, but then "long heartbeat" and "slow ops"
warnings appear, and within a few minutes the throughput drops to ~1 GB/s
and stays there until all FIO runs complete.
The cluster has 5 monitor nodes and 10 data nodes, each with 10x 3.2TB NVMe
drives. I have set up 3 OSDs per NVMe drive, so there are 300 OSDs in total.
Each server has a 200Gb uplink, and there's no apparent network bottleneck
as the network is provisioned for over 1Tbps of bandwidth. I don't see any
CPU or memory pressure on the servers either.
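The OSDs were deployed along these lines; ceph-volume's batch mode splits
each drive into the requested number of OSDs (device names are illustrative,
and the command is repeated on each node):

    # 3 OSDs per NVMe drive, for each of the 10 drives in a node
    ceph-volume lvm batch --osds-per-device 3 /dev/nvme0n1 /dev/nvme1n1 ...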
There is a single manager instance running on one of the mons.
The pool is configured with a replication factor of 3 and min_size of 2. I
tried pg_num values of 8192 and 16384 and saw the issue with both settings.
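For reference, the pool setup is equivalent to the following (the pool name
is illustrative, and the pg count shown is for the 8192 case):

    ceph osd pool create bench 8192 8192 replicated
    ceph osd pool set bench size 3
    ceph osd pool set bench min_size 2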
Could you please suggest if this is a known issue, or if there are any
parameters I can tune? The health warnings look like this:
    Long heartbeat ping times on back interface seen, longest is 1202.120 msec
    Long heartbeat ping times on front interface seen, longest is 1535.191 msec
    35 slow ops, oldest one blocked for 122 sec, daemons
    [osd.135,osd.14,osd.141,osd.143,osd.149,osd.15,osd.151,osd.153,osd.157,osd.162]...
    have slow ops.
Regards,
Shridhar
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io