Re: MDS crash - ceph-users

27 Apr 2024

Colleagues, thank you for the advice to check whether the MGRs were operational. It is strange,
though: we checked our nodes for network issues (IP connectivity, sockets, ACLs, DNS) and found
nothing wrong, yet simply restarting all the MGRs solved both the stale PGs and the hanging
ceph commands!

So we are back at the starting point: Ceph is working, except that the MDS daemons crash. But
now we see some additional errors in the MDS logs when we try to start the daemon:

dir 0x1000dd10fa0 object missing on disk; some files may be lost
(/volumes/csi/csi-vol-2eb40f89-f2e1-11ee-b657-3aa98da4c4a6/1080803d-1277-4ad8-ae80-a004bd3a5699/gallery/pc-12083932925583528732)

dir 0x1000dd10f9d object missing on disk; some files may be lost
(/volumes/csi/csi-vol-2eb40f89-f2e1-11ee-b657-3aa98da4c4a6/1080803d-1277-4ad8-ae80-a004bd3a5699/cadserver-filevault/project-files/661fb14d341d3746ea5c2a8f
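To cross-check entries like these, a minimal shell sketch can turn each reported inode number into the name of the directory's object in the metadata pool, which can then be inspected with `rados stat`. It assumes each directory is unfragmented, so its only dirfrag object is named `<inode hex>.00000000`; the function name `missing_dirfrag_objects` is hypothetical, and log lines with anything after the inode number (for example a fragment suffix) are simply not matched.

```shell
#!/bin/sh
# Sketch: print metadata-pool object names for directories that an MDS log
# reports as "object missing on disk".
# Assumption (not confirmed by the log itself): every directory is
# unfragmented, so its single dirfrag object is "<inode hex>.00000000".

missing_dirfrag_objects() {
  logfile=$1
  # Extract "dir 0x<ino> object missing on disk" wherever it appears in a
  # line (timestamps and log prefixes are ignored), map the inode number to
  # the dirfrag object name, and drop duplicates.
  grep -o 'dir 0x[0-9a-f]* object missing on disk' "$logfile" \
    | sed -E 's/^dir 0x([0-9a-f]+) object missing on disk$/\1.00000000/' \
    | LC_ALL=C sort -u
}
```

For example: `missing_dirfrag_objects /var/log/ceph/ceph-mds.a.log | while read -r obj; do rados -p cephfs_metadata stat "$obj"; done`, where the log path and the pool name `cephfs_metadata` are placeholders for the real ones.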

I promised to file a bug report and will do so a bit later. But should I also try something
more on my side? Here is exactly what I did last time:

cephfs-journal-tool journal reset
cephfs-table-tool all reset session
cephfs-data-scan scan_extents
cephfs-data-scan scan_inodes
cephfs-data-scan scan_links
cephfs-data-scan cleanup

And one more question: is it possible to access CephFS content directly, without an MDS?