ceph crash hangs forever and recovery stops - ceph-users

30 Apr 2020

Hi everybody (again),
We recently had a lot of OSD crashes (more than 30 OSDs crashed). This is 
now fixed, but it triggered a huge rebalancing and recovery.
At more or less the same time, we noticed that ceph crash ls (or 
any other ceph crash command) hangs forever and never returns.
Finally, the recovery process stops regularly (after ~1 hour), but it 
can be restarted by restarting the mgr daemon (systemctl restart 
ceph-mgr.target on the active manager).
There is nothing in the logs (the manager still works, the service is 
up, and the dashboard is accessible; the recovery simply stops).
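
To check for the ceph crash hang without blocking a terminal, the command
can be run under a timeout. This is only a minimal sketch: the helper name
run_with_timeout and the 30-second limit are assumptions for illustration
and are not part of Ceph. It shows whether the command returns at all and
does not diagnose why it hangs.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its stdout, or None if it exceeds timeout_s.

    run_with_timeout is a hypothetical helper name, not a Ceph API.
    subprocess.run kills the child process when the timeout expires.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout


# Example use on a cluster node (the 30 s limit is an arbitrary choice):
#   out = run_with_timeout(["ceph", "crash", "ls"], 30.0)
#   if out is None:
#       print("ceph crash ls did not return within 30 s")
```

A None result only means the command did not finish within the chosen
limit; a slow but working manager would look the same with a short limit.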
We also tried rebooting the managers, but that does not solve the problem.
I guess these two problems are linked, but I am not sure.
Does anybody have a clue?
Thanks.
F.