Hi Patrick,
we have been running one daily snapshot since December, and our CephFS crashed three times
because of this issue:
https://tracker.ceph.com/issues/38452
We currently have 19 files with corrupt metadata found by your first-damage.py script. We
have isolated these files from user access and are waiting for a fix before we remove
them with your script (or perhaps a new method?).
Today we upgraded our cluster from 16.2.11 to 16.2.13. After upgrading the MDS servers,
cluster health went to ERROR MDS_DAMAGE. 'ceph tell mds.0 damage ls' is showing me
the same files as your script (initially only a subset; after a CephFS scrub, all of them).
I noticed "mds: catch damage to CDentry's first member before persisting
(issue#58482, pr#50781, Patrick Donnelly)" in the changelog for 16.2.13 and would like to
ask you the following questions:
a) Can we now repair the damaged files online instead of bringing down the whole
filesystem and using the Python script?
b) Should we set one of the new MDS options in our specific case to prevent our file
server from crashing because of the wrong snap IDs?
c) Will your patch prevent wrong snap IDs in the future?
Regards
Felix
---------------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Volker Rieke
Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
Dr. Astrid Lambrecht, Prof. Dr. Frauke Melchior
---------------------------------------------------------------------------------------------
On 08.01.2023 at 02:14, Patrick Donnelly <pdonnell(a)redhat.com> wrote:
On Thu, Dec 15, 2022 at 9:32 AM Stolte, Felix <f.stolte(a)fz-juelich.de> wrote:
Hi Patrick,
we used your script over the weekend to repair the damaged objects, and it went smoothly.
Thanks for your support.
We adapted your script to scan for damaged files on a daily basis; runtime is about 6 h.
Until Thursday last week, we had exactly the same 17 files. On Thursday at 13:05 a
snapshot was created, and our active MDS crashed at that moment:
2022-12-08T13:05:48.919+0100 7f440afec700 -1 /build/ceph-16.2.10/src/mds/ScatterLock.h: In
function 'void ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 7f440afec700
time 2022-12-08T13:05:48.921223+0100
/build/ceph-16.2.10/src/mds/ScatterLock.h: 59: FAILED ceph_assert(state == LOCK_XLOCK ||
state == LOCK_XLOCKDONE)
Twelve minutes later, the unlink_local crashes appeared again, this time with a new file.
During debugging we noticed an MTU mismatch between the MDS (1500) and a client (9000)
using the CephFS kernel mount. This client also creates the snapshots via mkdir in the
.snap directory.
We have disabled snapshot creation for now, but we really need this feature. I uploaded
the MDS logs of the first crash, along with the information above, to
https://tracker.ceph.com/issues/38452
I would greatly appreciate it if you could answer the following question:
Is the bug related to our MTU mismatch? We also fixed the MTU issue over the weekend by
going back to 1500 on all nodes in the Ceph public network.
I doubt it.
If you need a debug level 20 log of the ScatterLock for further analysis, I could schedule
snapshots at the end of our workdays and increase the debug level five minutes around
snapshot creation.
This would be very helpful!
--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D