Hello,
I'm running Nautilus 14.2.16.
I had an OSD go bad about 10 days ago. Apparently, as it was going down,
some MDS ops got hung up waiting for it to come back. I was out of town
for a couple of days and found the OSD 'Down and Out' when I checked in.
(Also, oddly, the cluster did not appear to initiate recovery right away.
Recovery did not start until I rebooted the OSD node.)
As of right now, the damaged OSD is 'safe-to-destroy' but the slow ops are
still hanging around. Earlier today I quiesced the clients that were
accessing the CephFS, then unmounted and re-mounted it. However, this did
not clear the lingering ops.
When I had the node offline I verified that the HDD and NVMe associated
with the OSD seem to actually be healthy, so I plan to zap and re-deploy
using the same hardware. I would also like to upgrade to 14.2.20 (the latest
Ceph release for Debian 10), but I'm hesitant to do any of this until I get
rid of these 29 slow ops.
Can anybody suggest a path forward?
Thanks.
-Dave
--
Dave Hall
Binghamton University
kdhall(a)binghamton.edu