MDS Corruption: ceph_assert(!p) in MDCache::add_inode - ceph-users

18 Dec 2020

This is my third attempt to submit this issue to the mailing list. I
don't expect it to get through. I give up.

I have an MDS corruption issue that I haven't been able to resolve with
the recovery steps I've found online. I'm running Ceph v15.2.6 (Octopus).
I've tried every recovery step on this page except copying the pool:
https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/

When I try to start an MDS instance, it crashes after a few seconds. It
logs a series of "bad backtrace on directory inode" errors and then fails
the ceph_assert(!p) assertion in MDCache::add_inode at line 313:
https://github.com/ceph/ceph/blob/cb8c61a60551b72614257d632a574d420064c17a/…
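As far as I understand it, this assertion fires when the MDS tries to add an inode to its in-memory inode map under an inode number that is already cached, most likely because two dentries link the same inode. The Python sketch below is a toy model of that check, plus a way to spot such duplicates from a list of (parent directory inode, dentry name, linked inode) tuples. `ToyInodeCache`, `find_duplicate_inos` and the tuple layout are hypothetical illustrations, not Ceph code, and building the real list would require decoding dentry values from the metadata pool, which is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout: (parent directory inode number, dentry name)
Link = Tuple[int, str]


class ToyInodeCache:
    """Toy model of the duplicate check in MDCache::add_inode (not Ceph code)."""

    def __init__(self) -> None:
        self.inode_map: Dict[int, Link] = {}

    def add_inode(self, ino: int, link: Link) -> None:
        # Mirrors ceph_assert(!p): an inode number may only be cached once.
        if ino in self.inode_map:
            raise AssertionError(
                f"duplicate ino {ino:#x}: {self.inode_map[ino]} and {link}"
            )
        self.inode_map[ino] = link


def find_duplicate_inos(
    dentries: Iterable[Tuple[int, str, int]],
) -> Dict[int, List[Link]]:
    """Group dentries by the inode number they link to and return only the
    inode numbers that are reached from more than one dentry."""
    links: Dict[int, List[Link]] = defaultdict(list)
    for parent_ino, name, ino in dentries:
        links[ino].append((parent_ino, name))
    return {ino: sorted(paths) for ino, paths in links.items() if len(paths) > 1}


if __name__ == "__main__":
    # Made-up scan result: 0x10000000001 is linked from two dentries.
    scanned = [
        (0x1, "a", 0x10000000001),
        (0x1, "b", 0x10000000002),
        (0x10000000000, "a-copy", 0x10000000001),
    ]
    print(find_duplicate_inos(scanned))

    cache = ToyInodeCache()
    try:
        for parent_ino, name, ino in scanned:
            cache.add_inode(ino, (parent_ino, name))
    except AssertionError as err:
        # The second link to 0x10000000001 trips the toy assertion.
        print("assert:", err)
```

Replacing the assertion with a return, as mentioned further down, would correspond to silently skipping the second `add_inode` call here, leaving one of the two links without a cached inode.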

Here's the output of journalctl -xe: https://pastebin.com/9g1UJaKQ

I asked in the IRC channel, and someone suggested I might be able to
delete the duplicate inodes manually using the RADOS API, though I don't
know how I would actually do that. I have also cloned the code and built
Ceph with the failing assertion replaced by a return, but I haven't run
it yet; I'm saving that as my last resort. I'd appreciate any help you
all can give.
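If the duplicate links can be identified, the IRC suggestion would amount to removing one dentry's omap key from its directory object in the metadata pool. The Python sketch below uses the python3-rados bindings and assumes CephFS's usual on-disk naming: directory fragment objects are named `<dir ino in hex>.<frag as 8 hex digits>` (e.g. `1.00000000` for the root), and head dentries are stored as omap keys `<name>_head`. It also assumes a metadata pool named `cephfs_metadata` and a working `/etc/ceph/ceph.conf` with admin credentials. The helper names are hypothetical. The sketch does not decode dentry values, so it cannot by itself tell which link is the bad one. It should only be run with every MDS stopped and after recording the object's omap contents (for example with `rados -p cephfs_metadata listomapvals <object>`). Removing a key this way may also leave directory statistics inconsistent.

```python
from typing import List


def dirfrag_object_name(dir_ino: int, frag: int = 0) -> str:
    # Assumed CephFS layout: "<ino in hex>.<frag as 8 hex digits>".
    return f"{dir_ino:x}.{frag:08x}"


def head_dentry_key(name: str) -> str:
    # Assumed omap key for a non-snapshot ("head") dentry.
    return f"{name}_head"


def open_metadata_pool(pool: str = "cephfs_metadata",
                       conffile: str = "/etc/ceph/ceph.conf"):
    # Imported lazily so the naming helpers work without python3-rados.
    import rados

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    return cluster, cluster.open_ioctx(pool)


def list_dentry_keys(ioctx, dir_ino: int, frag: int = 0,
                     max_return: int = 10000) -> List[str]:
    """Return up to max_return omap keys (dentry keys) of one dirfrag object."""
    import rados

    with rados.ReadOpCtx() as read_op:
        omap_iter, _ = ioctx.get_omap_vals(read_op, "", "", max_return)
        ioctx.operate_read_op(read_op, dirfrag_object_name(dir_ino, frag))
        return [key for key, _value in omap_iter]


def remove_head_dentry(ioctx, dir_ino: int, name: str, frag: int = 0) -> None:
    """Delete one head dentry's omap key. Run only with every MDS stopped."""
    import rados

    with rados.WriteOpCtx() as write_op:
        ioctx.remove_omap_keys(write_op, (head_dentry_key(name),))
        ioctx.operate_write_op(write_op, dirfrag_object_name(dir_ino, frag))
```

For example, `remove_head_dentry(ioctx, 0x10000000000, "a-copy")` would delete the key `a-copy_head` from object `10000000000.00000000`. These values are placeholders, not taken from the crash log. Afterwards, release the handles with `ioctx.close()` and `cluster.shutdown()`.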

Thank you,
- Brandon Lyon