Hi,
The current output of ceph -s reports a warning:
2 slow ops, oldest one blocked for 347335 sec, mon.ld5505 has slow ops
The blocked time keeps increasing.
root@ld3955:~# ceph -s
  cluster:
    id:     6b1b5117-6e08-4843-93d6-2da3cf8a6bae
    health: HEALTH_WARN
            9 daemons have recently crashed
            2 slow ops, oldest one blocked for 347335 sec, mon.ld5505 has slow ops

  services:
    mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 3d)
    mgr: ld5507(active, since 8m), standbys: ld5506, ld5505
    mds: cephfs:2 {0=ld5507=up:active,1=ld5505=up:active} 2 up:standby-replay 3 up:standby
    osd: 442 osds: 442 up (since 8d), 442 in (since 9d)

  data:
    pools:   7 pools, 19628 pgs
    objects: 65.78M objects, 251 TiB
    usage:   753 TiB used, 779 TiB / 1.5 PiB avail
    pgs:     19628 active+clean

  io:
    client:   427 KiB/s rd, 22 MiB/s wr, 851 op/s rd, 647 op/s wr
The details are as follows:
root@ld3955:~# ceph health detail
HEALTH_WARN 9 daemons have recently crashed; 2 slow ops, oldest one blocked for 347755 sec, mon.ld5505 has slow ops
RECENT_CRASH 9 daemons have recently crashed
    mds.ld4464 crashed on host ld4464 at 2020-02-09 07:33:59.131171Z
    mds.ld5506 crashed on host ld5506 at 2020-02-09 07:42:52.036592Z
    mds.ld4257 crashed on host ld4257 at 2020-02-09 07:47:44.369505Z
    mds.ld4464 crashed on host ld4464 at 2020-02-09 06:10:24.515912Z
    mds.ld5507 crashed on host ld5507 at 2020-02-09 07:13:22.400268Z
    mds.ld4257 crashed on host ld4257 at 2020-02-09 06:48:34.742475Z
    mds.ld5506 crashed on host ld5506 at 2020-02-09 06:10:24.680648Z
    mds.ld4465 crashed on host ld4465 at 2020-02-09 06:52:33.204855Z
    mds.ld5506 crashed on host ld5506 at 2020-02-06 07:59:37.089007Z
SLOW_OPS 2 slow ops, oldest one blocked for 347755 sec, mon.ld5505 has slow ops
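
To put the blocked time in perspective, here is a minimal sketch that converts the reported seconds into days, hours, minutes, and seconds. The helper name fmt_duration is hypothetical and not part of the ceph CLI; the two input values are the ones reported above by ceph -s and ceph health detail.

```shell
#!/bin/sh
# Convert a duration in seconds to "Dd Hh Mm Ss".
# fmt_duration is a hypothetical helper, not a ceph command.
fmt_duration() {
    secs=$1
    printf '%dd %dh %dm %ds\n' \
        "$((secs / 86400))" \
        "$((secs % 86400 / 3600))" \
        "$((secs % 3600 / 60))" \
        "$((secs % 60))"
}

fmt_duration 347335   # value reported by ceph -s
fmt_duration 347755   # value reported by ceph health detail
```

Both values come out at a little over four days, and the 420 sec gap between them matches the seven minutes or so that passed between the two commands, so the counter is still running.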
The services themselves (mgr, mon, osd) report no errors.
Can you please advise how to identify the root cause of these slow ops?
Thanks