On Thu, Dec 5, 2019 at 10:31 AM Janek Bevendorff wrote:
> I had similar issues again today. Some users were trying to train a
> neural network on several million files, resulting in enormous cache
> sizes. Due to my custom cap recall and decay rate settings, the MDSs
> were able to withstand the load for quite some time, but at some point
> the active rank crashed, taking the whole CephFS down.
> As usual, the MDSs were playing round-robin Russian roulette trying to
> recover the cache, only to be killed by the MONs after some time. I
> tried increasing the beacon grace time, but it didn't help; the MONs
> were still kicking MDSs after what seemed like a random timeout.
You set mds_beacon_grace?
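If not, something like this should raise it at runtime (300 is only an
illustrative value, not a recommendation from this thread):

  # Give a recovering MDS more time before the MONs replace it.
  ceph config set global mds_beacon_grace 300
  # Confirm the value actually applied:
  ceph config get mds mds_beacon_grace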
> Even with the setting to wipe the MDS cache on startup, the CephFS was
> unable to recover. I had to manually delete the mds0_openfiles.*
> objects from the CephFS metadata pool (nine in total). Only then was I
> able to get the MDS back into a working state.
Yes, this optimization struggles with large cache sizes (ironically).
Luckily, nuking the open file objects is harmless...
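For reference, removing them looks roughly like this (the pool name
cephfs_metadata is just an assumption for the example, and the MDS rank
should be stopped or failed first):

  # List the open file table objects in the metadata pool:
  rados -p cephfs_metadata ls | grep '^mds0_openfiles'
  # Remove them one by one, e.g.:
  rados -p cephfs_metadata rm mds0_openfiles.0
  rados -p cephfs_metadata rm mds0_openfiles.1
  # ...and so on for the remaining objects.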
> I know there are some unreleased patches to improve the MDS behaviour
> as a result of this thread. Is there any timeline for when those will
> be released? This issue is rather critical. What I need is a faster
> recall (which got fixed, I think, but hasn't been released so far) as
> well as probably some kind of hard limit after which a client has to
> release file handles.
MDS will soon be more aggressive about recalling caps from idle
sessions, which may help: https://tracker.ceph.com/issues/41865
That'll make 14.2.6 probably.
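In the meantime, these are roughly the knobs to look at; the values
below are purely illustrative, and the session-liveness options only
exist once that change ships:

  # Recall more caps from a client per recall event:
  ceph config set mds mds_recall_max_caps 30000
  # Ceiling on how many caps a single client session may hold before
  # the MDS starts recalling from it:
  ceph config set mds mds_max_caps_per_client 500000
  # After https://tracker.ceph.com/issues/41865 is released, idle-session
  # recall is governed (as I understand it) by
  # mds_session_cache_liveness_decay_rate and
  # mds_session_cache_liveness_magnitude.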
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA