cephfs forward scrubbing docs - ceph-users

10 Nov 2020

Hi,

Today while debugging something we had a few questions that might lead
to improving the cephfs forward scrub docs:
https://docs.ceph.com/en/latest/cephfs/scrub/

tldr:
1. Should we document which sorts of issues the forward scrub is
able to fix?
2. Can we make it more visible (in docs) that scrubbing is not
supported with multi-mds?
3. Isn't the new `ceph -s` scrub task status misleading with multi-mds?

Details here:

1) We found a CephFS directory with a number of zero-sized files:

# ls -l
...
-rw-r--r-- 1 1001890000 1001890000        0 Nov  3 11:58 upload_fc501199e3e7abe6b574101cf34aeefb.png
-rw-r--r-- 1 1001890000 1001890000        0 Nov  3 12:23 upload_fce4f55348185fefa0abdd8d11095ba8.gif
-rw-r--r-- 1 1001890000 1001890000        0 Nov  3 11:54 upload_fd95b8358851f0dac22fb775046a6163.png
...

The user claims that those files were non-empty last week. The
sequence of zero-sized files includes *all* files written between Nov
2 and 9.
The user also says that his client was running out of memory, but that
this is now fixed. So I suspect that his ceph client (kernel
3.10.0-1127.19.1.el7.x86_64) was not behaving well.

Anyway, I noticed that even though the dentries list 0 bytes, the
underlying rados objects have data, and the data looks good. E.g.

# rados get -p cephfs_data 200212e68b5.00000000 --namespace=xxx 200212e68b5.00000000
# file 200212e68b5.00000000
200212e68b5.00000000: PNG image data, 960 x 815, 8-bit/color RGBA, non-interlaced
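
As a side note on the object name used above: CephFS names a file's
data objects as the inode number in hex, a dot, and the object index
as eight hex digits. A minimal sketch of that mapping, where
`cephfs_obj_name` is a hypothetical helper name and not part of any
Ceph tool:

```shell
# Hypothetical helper: build the RADOS object name for one chunk of a
# CephFS file.
#   $1 = inode number (decimal)
#   $2 = object index within the file (decimal, defaults to 0)
cephfs_obj_name() {
    printf '%x.%08x\n' "$1" "${2:-0}"
}

# The inode from the example above, written as a hex constant:
cephfs_obj_name "$((0x200212e68b5))"    # prints 200212e68b5.00000000
```

On a mounted client the inode number could come from `stat -c %i <file>`
(GNU coreutils). Because the index is hexadecimal, the eleventh object
of a file ends in `.0000000a`.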

So I managed to recover the files with something like the script in
[0] below (using an input file mapping inode to filename).

But I'm wondering if a forward scrub is able to fix this sort of
problem directly?
Should we document which sorts of issues the forward scrub is able to fix?

Anyway, I tried to scrub it, which led to:

# ceph tell mds.cephflax-mds-xxx scrub start /volumes/_nogroup/xxx recursive repair
Scrub is not currently supported for multiple active MDS. Please reduce max_mds to 1 and then scrub.
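
For anyone hitting the same message, the workaround it describes would
look roughly like the sequence below. This is only a sketch: the helper
name `scrub_plan`, the file system name `cephfs`, and the `max_mds`
value to restore are placeholders, and it prints the commands for
review instead of running them. The scrub options are written the same
way as in the command above; other releases may expect a different
form.

```shell
# Hypothetical helper: print (do not run) the commands for scrubbing
# with a single active MDS.
#   $1 = file system name     $2 = MDS daemon name
#   $3 = path to scrub        $4 = max_mds value to restore afterwards
scrub_plan() {
    echo "ceph fs set $1 max_mds 1"
    # Before the next step, wait until only one rank is active
    # (e.g. by watching "ceph fs status").
    echo "ceph tell mds.$2 scrub start $3 recursive repair"
    echo "ceph tell mds.$2 scrub status"
    echo "ceph fs set $1 max_mds $4"
}

# Placeholder values; "cephfs" and "3" are assumptions for illustration.
scrub_plan cephfs cephflax-mds-xxx /volumes/_nogroup/xxx 3
```

Reducing `max_mds` changes how metadata load is distributed, so this is
something to plan on a busy cluster and not a quick toggle.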

So ...
2) Shouldn't we update the doc to mention loud and clear that scrub is
not currently supported for multiple active MDS?

3) I was somewhat surprised by this, because I had thought that the new
`ceph -s` multi-mds scrub status implied that multi-mds scrubbing was
now working:

  task status:
    scrub status:
        mds.x: idle
        mds.y: idle
        mds.z: idle

Is it worth reporting this task status for cephfs if we can't even
scrub with multiple active MDSs?

Thanks!!

Dan

[0]
mkdir -p recovered
# Each input line is "<inode number> <filename>". This only prints the
# rados commands for review; it does not run them.
while read -r a b; do
    ino=$(printf "%x" "$a")    # objects are named <hex inode>.<8-digit index>
    for i in {0..9}            # first 10 objects of each file only
    do
        obj="$ino.0000000$i"
        echo "rados stat --cluster=flax --pool=cephfs_data --namespace=xxx $obj &&" \
            "rados get --cluster=flax --pool=cephfs_data --namespace=xxx $obj $obj"
    done
    echo "cat $ino.* > $ino"
    echo "mv $ino recovered/$b"
done < inodes_fnames.txt