Sorry, the above post has to be corrected as: "Out of the info
that has emerged so far, it seems the Ceph client wanted to write an object
of size 1555896 but managed to write only 1540096 bytes to the journal."
Yes, I would think so, too.
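For reference, the shortfall implied by the quoted numbers can be checked with quick shell arithmetic (the sizes are taken from the post above; nothing here touches the cluster):

```shell
# Shortfall between the intended object size and what reached the journal
expected=1555896   # object size the client wanted to write
written=1540096    # bytes actually written to the journal
echo $((expected - written))   # prints 15800 (bytes missing from the tail of the object)
```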
I think what we need to do now is:
1. Get MDS.0 to recover, discarding part of the object
200.00006048 if necessary, and bring MDS.0 up.
Yes, I agree, I just can't tell what the best way is here, maybe
remove all three objects from the disks (make a backup before doing
that, just in case) and try the steps to recover the journal (also
make a backup of the journal first):
mds01:~ # systemctl stop ceph-mds@mds01.service
mds01:~ # cephfs-journal-tool --rank=cephfs:0 journal export myjournal.bin
mds01:~ # cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
mds01:~ # cephfs-journal-tool --rank=cephfs:0 journal reset
mds01:~ # cephfs-table-tool all reset session
mds01:~ # systemctl start ceph-mds@mds01.service
mds01:~ # ceph mds repaired 0
mds01:~ # ceph daemon mds.mds01 scrub_path / recursive repair
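For the object removal mentioned above, a dry-run sketch of the backup-then-remove step (the pool name cephfs_metadata is an assumption, verify it with `ceph fs ls`; only 200.00006048 is named in the thread, so add the other two object names reported in your logs):

```shell
POOL=cephfs_metadata       # assumption: your CephFS metadata pool; verify with `ceph fs ls`
OBJECTS="200.00006048"     # add the other two object names from your logs here
for obj in $OBJECTS; do
  # Dry run: print the commands first; remove the leading `echo` to execute them.
  echo rados -p "$POOL" get "$obj" "$obj.backup"   # back up the object to a local file
  echo rados -p "$POOL" rm "$obj"                  # then remove it from the pool
done
```

Keep the `.backup` files until the recovery is confirmed, so the objects can be re-injected with `rados put` if needed.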
2. Do the same recovery for MDS.1 as in step 1 and
bring MDS.1 also up.
If step 1 succeeds the standby daemons will most likely also start
successfully.
Quoting Sagara Wijetunga <sagarawmw@yahoo.com>:
> Sorry, above post has to be corrected as: "Out of the info now
> emerged so far seems Ceph client wanted to write an object of
> size 1555896 but managed to write only 1540096 bytes to the journal."
>
> Sagara
> On Saturday, May 22, 2021, 08:29:34 PM GMT+8, Sagara Wijetunga
> <sagarawmw@yahoo.com> wrote:
>
> Out of the info now emerged so far seems Ceph client wanted to
> write an object of size 1555896 but managed to write only 1555896
> bytes to the journal.
> I think what we need to do now is:
> 1. Get MDS.0 to recover, discard
> if necessary part of the object 200.00006048 and bring MDS.0 up.
> 2. Do the same recovery for MDS.1 as in step 1 and
> bring MDS.1 also up.
> 3. The above two steps will most probably bring
> CephFS up.
> 4. Once CephFS is up, scan for corrupted files, remove them and
> restore them from backup.
> 5. Get MDS.2 to sync with MDS.0 or 1 and bring the cluster to a
> synced state.
>
> My question is, what exactly is necessary to carry out step 1 above?
> Sagara
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-leave@ceph.io