[ceph-users] Re: cephfs forward scrubbing docs

30 Jun 2021

On Thu, Jul 1, 2021 at 12:53 AM Patrick Donnelly <pdonnell(a)redhat.com> wrote:
...

 Hi Dan,

 Sorry for the very late reply -- I'm going through old unanswered email.

 On Mon, Nov 9, 2020 at 4:13 PM Dan van der Ster <dan(a)vanderster.com> wrote:

 Hi,

 Today while debugging something we had a few questions that might lead
 to improving the cephfs forward scrub docs:
 https://docs.ceph.com/en/latest/cephfs/scrub/

 tldr:
 1. Should we document which sorts of issues the forward scrub is
 able to fix?

 Yes, I've made a ticket: https://tracker.ceph.com/issues/51459

Great, thanks!

...

 2. Can we make it more visible (in docs) that scrubbing is not
 supported with multi-mds?

 This is no longer the case since Pacific, as you probably know.

 3. Isn't the new `ceph -s` scrub task status misleading with multi-mds?

 Details here:

 1) We found a CephFS directory with a number of zero sized files:

 # ls -l
 ...
 -rw-r--r-- 1 1001890000 1001890000        0 Nov  3 11:58 upload_fc501199e3e7abe6b574101cf34aeefb.png
 -rw-r--r-- 1 1001890000 1001890000        0 Nov  3 12:23 upload_fce4f55348185fefa0abdd8d11095ba8.gif
 -rw-r--r-- 1 1001890000 1001890000        0 Nov  3 11:54 upload_fd95b8358851f0dac22fb775046a6163.png
 ...

 The user claims that those files were non-zero sized last week. The
 sequence of zero sized files includes *all* files written between Nov
 2 and 9.
 The user claims that his client was running out of memory, but this is
 now fixed. So I suspect that his ceph client (kernel
 3.10.0-1127.19.1.el7.x86_64) was not behaving well.

 Anyway, I noticed that even though the dentries list 0 bytes, the
 underlying rados objects have data, and the data looks good. E.g.

 # rados get -p cephfs_data 200212e68b5.00000000 --namespace=xxx 200212e68b5.00000000
 # file 200212e68b5.00000000
 200212e68b5.00000000: PNG image data, 960 x 815, 8-bit/color RGBA, non-interlaced

 So I managed to recover the files by doing something like this (using
 an input file mapping inode to filename) [see PS 0].
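
The script referred to in the PS is not reproduced here. As a rough sketch of the idea only, the `rados get` commands could be generated from a mapping file as shown below. It assumes the default file layout, where the first data object of a file is named with the inode number in hex followed by `.00000000` (as in the `200212e68b5.00000000` example above). The mapping format (one `<decimal inode> <filename>` pair per line), the function names, and the restriction to the first object are all assumptions made for illustration, not the procedure that was actually used.

```python
import shlex


def object_name(inode: int, index: int = 0) -> str:
    """Name of the index-th data object of a file: <inode hex>.<index as 8 hex digits>."""
    return f"{inode:x}.{index:08x}"


def recovery_commands(mapping_lines, pool="cephfs_data", namespace=None):
    """Build one `rados get` command per '<decimal inode> <filename>' line.

    Hypothetical helper: it only fetches the first object of each file and
    only returns the command strings; it does not run anything.
    """
    commands = []
    for line in mapping_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        inode_str, filename = line.split(maxsplit=1)
        cmd = ["rados", "get", "-p", pool, object_name(int(inode_str))]
        if namespace:
            cmd.append(f"--namespace={namespace}")
        cmd.append(filename)  # output file, named after the original dentry
        commands.append(shlex.join(cmd))
    return commands


if __name__ == "__main__":
    # Example mapping line; the filename is a placeholder, not real data.
    example = [f"{0x200212e68b5} recovered.png"]
    for command in recovery_commands(example, namespace="xxx"):
        print(command)
```

Files larger than one object (4 MiB with the default layout) would also need the subsequent objects (`.00000001`, `.00000002`, ...) fetched and concatenated in order, which this sketch does not attempt.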

 But I'm wondering if a forward scrub is able to fix this sort of
 problem directly?

 Someday perhaps, but not yet. It's also not clear that this is
 something the MDS should repair: the client clearly hadn't flushed the
 dirty size to the MDS yet. Logically, this is one of those situations
 where the client has done write() but not yet fsync().

The root cause in this case turned out to be
https://bugzilla.redhat.com/show_bug.cgi?id=1710751
We haven't seen this again after updating client kernels.

Best Regards,

Dan
