Hi,
I've been migrating data from one EC pool to another EC pool: two
directories have the ceph.dir.layout.pool extended attribute set
appropriately, I rsync from the old directory to the new one, and
finally I delete the old files. I'm using the kernel client to do
this. While the removed files are no longer present on the
filesystem, they still appear to be accounted for in "ceph df".
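For reference, the per-directory procedure looks roughly like this
(paths and pool names below are just placeholders):

    # point the destination directory's layout at the new EC pool
    setfattr -n ceph.dir.layout.pool -v newpool_ec /mnt/cephfs/newdir
    # copy everything over, preserving attributes
    rsync -a /mnt/cephfs/olddir/ /mnt/cephfs/newdir/
    # then remove the originals
    rm -rf /mnt/cephfs/olddir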
When I tally up the sizes reported by "ls -lh" on all subdirectories
under the CephFS root (excluding those on the new EC pool), using a
FUSE client mount, the total is just under 2 PiB. However, "ceph df"
shows the original EC pool as 2.5 PiB used. I've copied and deleted
approximately 545 TiB so far, so it seems the unlinked files aren't
being fully released/purged.
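In case the method matters: I'm relying on the recursive statistics
the FUSE client reports as directory sizes, which I believe are the
ceph.dir.rbytes values, e.g. (path is a placeholder):

    ls -lh /mnt/cephfs    # dir sizes = recursive bytes on ceph-fuse
    getfattr -n ceph.dir.rbytes /mnt/cephfs/somedir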
I've only been watching the num_strays counter from "ceph daemon
mds.$name perf dump" for a few days, since I first suspected an
issue, but in that time I've never seen it drop below roughly 310k.
From other ML postings I've gathered that the stat has something to
do with files pending deletion, but I'm not positive.
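For what it's worth, I'm sampling it like this, and I assume the
pq_* counters from the purge queue section are related, though I
don't know how to interpret them:

    ceph daemon mds.$name perf dump | grep -E '"num_strays"|"pq_'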
So far all I've done is restart the mds and mon daemons, which
hasn't helped. What are the next steps for troubleshooting? I can
turn up mds debug logging, but I'm not sure what to look for.
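(I assume raising debug_mds via the admin socket is the way to do
that, e.g.:

    ceph daemon mds.$name config set debug_mds 10/10

but I don't know which messages would point at stalled purging.)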
Thanks for your help!
Josh