Hi Dan,
Sorry for the very late reply -- I'm going through old unanswered email.
On Mon, Nov 9, 2020 at 4:13 PM Dan van der Ster <dan(a)vanderster.com> wrote:
Hi,
Today while debugging something we had a few questions that might lead
to improving the cephfs forward scrub docs:
https://docs.ceph.com/en/latest/cephfs/scrub/
tldr:
1. Should we document which sorts of issues that the forward scrub is
able to fix?
Yes, I've made a ticket:
https://tracker.ceph.com/issues/51459
2. Can we make it more visible (in docs) that
scrubbing is not
supported with multi-mds?
This is no longer the case since Pacific, as you probably know.
3. Isn't the new `ceph -s` scrub task status
misleading with multi-mds?
Details here:
1) We found a CephFS directory with a number of zero sized files:
# ls -l
...
-rw-r--r-- 1 1001890000 1001890000 0 Nov 3 11:58
upload_fc501199e3e7abe6b574101cf34aeefb.png
-rw-r--r-- 1 1001890000 1001890000 0 Nov 3 12:23
upload_fce4f55348185fefa0abdd8d11095ba8.gif
-rw-r--r-- 1 1001890000 1001890000 0 Nov 3 11:54
upload_fd95b8358851f0dac22fb775046a6163.png
...
The user claims that those files were non-zero sized last week. The
sequence of zero sized files includes *all* files written between Nov
2 and 9.
The user claims that his client was running out of memory, but this is
now fixed. So I suspect that his ceph client (kernel
3.10.0-1127.19.1.el7.x86_64) was not behaving well.
Anyway, I noticed that even though the dentries list 0 bytes, the
underlying rados objects have data, and the data looks good. E.g.
# rados get -p cephfs_data 200212e68b5.00000000 --namespace=xxx
200212e68b5.00000000
# file 200212e68b5.00000000
200212e68b5.00000000: PNG image data, 960 x 815, 8-bit/color RGBA,
non-interlaced
So I managed to recover the files doing something like this (using an
input file mapping inode to filename) [see PS 0].
But I'm wondering if a forward scrub is able to fix this sort of
problem directly?
Someday perhaps but not yet. But it's not clear this is something the
MDS should repair. The client clearly didn't flush the dirty size to
the MDS yet. This is one of those situations where the client has done
write() but not yet fsync(), logically.
Should we document which sorts of issues that the
forward scrub is able to fix?
I anyway tried to scrub it, which led to:
# ceph tell mds.cephflax-mds-xxx scrub start /volumes/_nogroup/xxx
recursive repair
Scrub is not currently supported for multiple active MDS. Please
reduce max_mds to 1 and then scrub.
So ...
2) Shouldn't we update the doc to mention loud and clear that scrub is
not currently supported for multiple active MDS?
For Octopus sure but at this point (late reply, my fault), I'm not
sure it's worth the trouble.
3) I was somehow surprised by this, because I had
thought that the new
`ceph -s` multi-mds scrub status implied that multi-mds scrubbing was
now working:
task status:
scrub status:
mds.x: idle
mds.y: idle
mds.z: idle
Is it worth reporting this task status for cephfs if we can't even scrub them?
This was fixed a few months ago.
Thanks for the email,
--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D