Hi Paul,
On Wed, Dec 13, 2023 at 9:50 PM Paul Mezzanini <pfmeec(a)rit.edu> wrote:
Long story short, we've got a lot of empty directories that I'm working on
removing. While removing directories, using "perf top -g" we can watch the MDS
daemon go to 100% CPU usage in "SnapRealm::split_at" and
"CInode::is_ancestor_of".
It's this 2 year old bug that still is around.
https://tracker.ceph.com/issues/53192
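For anyone wanting to reproduce the observation, a minimal sketch (assuming a single ceph-mds process on the host running the active MDS):

```shell
# Sample the MDS daemon with call-graph recording; hot frames such as
# SnapRealm::split_at and CInode::is_ancestor_of show up at the top.
sudo perf top -g -p "$(pidof ceph-mds)"
```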
Unfortunately the fix isn't straightforward (it was attempted), so
lately we've been working around the issue by pinning
to-be-deleted directories to a (separate) active MDS. This may need
some tuning at the application level to move stuff into this
"special" pinned directory and then delete it there.
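A minimal sketch of that workaround using the ceph.dir.pin vxattr — rank 1 and the paths below are placeholders, not actual names from our setup:

```shell
# Pin the staging directory to MDS rank 1 so all deletion work
# is handled by the dedicated active MDS.
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/to-delete

# Application-level move-then-delete: rename into the pinned
# directory first, then remove the tree there.
mv /mnt/cephfs/data/old-dir /mnt/cephfs/to-delete/
rm -rf /mnt/cephfs/to-delete/old-dir
```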
HTH.
To help combat this, we've moved our snapshot schedule down the tree one level so the
snaprealm is significantly smaller. Our luck with multiple active MDSs hasn't been
great, so we are still on a single MDS. To help split the load, I'm working on moving
different workloads to different filesystems within Ceph.
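For reference, moving the schedule down a level can be done with the snap_schedule mgr module; the subtree paths and the 1d interval below are placeholders:

```shell
# Drop the schedule at the filesystem root and add per-subtree
# schedules one level down, so each snaprealm covers less.
ceph fs snap-schedule remove /
ceph fs snap-schedule add /projects 1d
ceph fs snap-schedule add /home 1d
```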
A user can still fairly easily overwhelm the MDS's finisher thread and basically stop
all CephFS I/O through that MDS. I'm hoping we can get some other people chiming in
with "Me too!" so there can be some traction behind fixing this.
It's a longstanding bug so the version is less important, but we are on 17.2.7.
Thoughts?
-paul
--
Paul Mezzanini
Platform Engineer III
Research Computing
Rochester Institute of Technology
“End users is a description, not a goal.”
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io
--
Cheers,
Venky