New subject: Ceph OSD reported Slow operations

2 Nov 2023

 Hi Eugen,
 Please find the details below.

root@meghdootctr1:/var/log/ceph# ceph -s
  cluster:
    id:     c59da971-57d1-43bd-b2b7-865d392412a5
    health: HEALTH_WARN
            nodeep-scrub flag(s) set
            544 pgs not deep-scrubbed in time

  services:
    mon: 3 daemons, quorum meghdootctr1,meghdootctr2,meghdootctr3 (age 5d)
    mgr: meghdootctr1(active, since 5d), standbys: meghdootctr2, meghdootctr3
    mds: 3 up:standby
    osd: 36 osds: 36 up (since 34h), 36 in (since 34h)
         flags nodeep-scrub

  data:
    pools:   2 pools, 544 pgs
    objects: 10.14M objects, 39 TiB
    usage:   116 TiB used, 63 TiB / 179 TiB avail
    pgs:     544 active+clean

  io:
    client:   24 MiB/s rd, 16 MiB/s wr, 2.02k op/s rd, 907 op/s wr
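
As a quick sanity check on the figures in this status output, the raw utilisation and the ratio of raw usage to stored data can be worked out directly. This is only a sketch: the function name `cluster_ratios` is hypothetical, and reading a ratio close to 3 as 3-way replication is an inference, since the pool settings are not shown here.

```shell
# cluster_ratios STORED_TIB USED_TIB TOTAL_TIB
# Prints raw utilisation (used / total) and the raw-to-stored ratio (used / stored).
cluster_ratios() {
  awk -v s="$1" -v u="$2" -v t="$3" 'BEGIN {
    printf "raw utilisation: %.1f%%\n", 100 * u / t
    printf "raw-to-stored ratio: %.2f\n", u / s
  }'
}

# Values taken from the "data:" section above: 39 TiB stored, 116 TiB used, 179 TiB total.
cluster_ratios 39 116 179
```

A ratio near 3 would be consistent with replicated pools of size 3, but `ceph osd pool ls detail` is the place to confirm that.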

Ceph Versions:

root@meghdootctr1:/var/log/ceph# ceph --version
ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus (stable)

Ceph df -h
https://pastebin.com/1ffucyJg

Ceph OSD performance dump
https://pastebin.com/1R6YQksE

ceph tell osd.XX bench: Of the 36 OSDs, only 8 give a high IOPS value of 250 or
more. Of those 8, 4 OSDs are on HP 3PAR and 4 are on DELL EMC. We use only 4
OSDs from HP 3PAR, and they have worked fine without any latency or IOPS issues
from the beginning. The remaining 32 OSDs are on DELL EMC, and 4 of them perform
much better than the other 28.

https://pastebin.com/CixaQmBi
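
For anyone repeating this comparison, a minimal sketch of how the per-OSD bench results could be collected and filtered is given below. The function names `flag_slow_osds` and `collect_bench_iops` are hypothetical, `jq` is assumed to be installed, and the bench output is assumed to be JSON containing an `iops` field (as it appears to be on Nautilus). The 250 threshold is the figure quoted above.

```shell
# Reads "<osd-id> <iops>" lines on stdin and prints the OSDs below the threshold.
flag_slow_osds() {
  local threshold="${1:-250}"
  awk -v t="$threshold" '$2 + 0 < t { printf "osd.%s %s\n", $1, $2 }'
}

# Runs the built-in bench on every OSD and prints "<osd-id> <iops>".
# Assumption: the bench output is JSON with an "iops" field; requires jq.
collect_bench_iops() {
  local id
  for id in $(ceph osd ls); do
    printf '%s %s\n' "$id" "$(ceph tell "osd.$id" bench | jq -r '.iops')"
  done
}

# On an admin node (the bench writes data to each OSD, so run it off-peak):
#   collect_bench_iops | flag_slow_osds 250
```

Grouping the flagged OSD ids by host and by backing array (HP 3PAR or DELL EMC) would show whether the slow ones share a LUN, controller or path.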

Please help me identify whether the issue lies with the DELL EMC storage, the
Ceph configuration parameter tuning, or overload in the cloud setup.

On November 1, 2023 at 9:48 PM Eugen Block <eblock(a)nde.ag> wrote:
...
  Hi,

 for starters please add more cluster details like 'ceph status', 'ceph
 versions', 'ceph osd df tree'. Increasing the network to 10G was the right
 thing to do, you don't get far with 1G with real cluster load. How are
 the OSDs configured (HDD only, SSD only or HDD with rocksdb on SSD)?
 How is the disk utilization?

 Regards,
 Eugen

 Zitat von prabhav(a)cdac.in:

  In a production setup, 36 OSDs (SAS disks) totalling 180 TB are
 allocated to a single Ceph cluster with 3 monitors and 3 managers.
 There were 830 volumes and VMs created in OpenStack with Ceph as the
 backend. On Sep 21, users reported slowness in accessing the VMs.
 Analysing the logs led us to problems with the SAS disks, network
 congestion and the Ceph configuration (as all default values were used).
 We upgraded the network from 1 Gbps to 10 Gbps for public and cluster
 networking. There was no change.
 The Ceph benchmark showed that 28 of the 36 OSDs reported very low
 IOPS of 30 to 50, while the remaining OSDs showed 300+ IOPS.
 We gradually started reducing the load on the Ceph cluster, and the
 volume count is now 650. The slow operations have gradually
 reduced, but I am aware that this is not the solution.
 The Ceph configuration was updated by increasing
 osd_journal_size to 10 GB and setting:
 osd_max_backfills = 1
 osd_recovery_max_active = 1
 osd_recovery_op_priority = 1
 bluestore_cache_trim_max_skip_pinned=10000
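
For reference, one way the settings listed above could be applied through the centralized config store is sketched below as a dry run that only prints the commands. The helper name `emit_osd_settings` is hypothetical. osd_journal_size is left out because it is a FileStore journal option and would have no effect on BlueStore OSDs; the bluestore_ option in the list suggests BlueStore, but the message does not state the backend.

```shell
# Dry run: prints one `ceph config set` command per setting instead of executing it.
# osd_journal_size is omitted on purpose (FileStore-only option).
emit_osd_settings() {
  while read -r key value; do
    printf 'ceph config set osd %s %s\n' "$key" "$value"
  done <<'EOF'
osd_max_backfills 1
osd_recovery_max_active 1
osd_recovery_op_priority 1
bluestore_cache_trim_max_skip_pinned 10000
EOF
}

emit_osd_settings
```

Note that the three recovery and backfill values throttle recovery traffic; they would not be expected to raise client IOPS on a healthy active+clean cluster.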

 After one month, we now face another issue: the mgr daemon stopped
 on all 3 nodes and 16 OSDs went down. We could not find the reason
 in ceph-mon.log or ceph-mgr.log. Please guide me, as
 it is a production setup.
 _______________________________________________
 ceph-users mailing list -- ceph-users(a)ceph.io
 To unsubscribe send an email to ceph-users-leave(a)ceph.io 

Thanks & Regards,
Ms V A Prabha / श्रीमती प्रभा वी ए
Joint Director / संयुक्त निदेशक
Centre for Development of Advanced Computing(C-DAC) / प्रगत संगणन विकास
केन्द्र(सी-डैक)
“Tidel Park”, 8th Floor, “D” Block, (North & South) / “टाइडल पार्क”, 8वीं मंजिल,
“डी” ब्लॉक, (उत्तर और दक्षिण)
No.4, Rajiv Gandhi Salai / नं.4, राजीव गांधी सलाई
Taramani / तारामणि
Chennai / चेन्नई – 600113
Ph.No.:044-22542226/27
Fax No.: 044-22542294
