Hello,
The MDS process crashed suddenly. After I tried to restart it, it failed to replay the journal
and kept restarting in a loop.
Just to summarize, here is what happened:
1/ The cluster is up and running with 3 nodes (mon and mds on the same nodes) and 3 OSDs.
2/ After a few days, 2 of the 3 MDS daemons (the standby-replay and the standby) crashed. No
PID is left running, and ceph status reports the daemons as down.
3/ I try to restart them:
- Sometimes the restart fails with a segmentation fault. Here is the relevant part of the
ceph-mds.log file:
-20> 2020-07-17T13:50:27.888+0000 7fc8c6c51700 10 monclient: _renew_subs
-19> 2020-07-17T13:50:27.888+0000 7fc8c6c51700 10 monclient: _send_mon_message to
mon.2 at v1:172.31.36.98:6789/0
-18> 2020-07-17T13:50:27.888+0000 7fc8c6c51700 10 monclient: handle_get_version_reply
finishing 0x559dcf9530c0 version 269
-17> 2020-07-17T13:50:27.888+0000 7fc8c6c51700 10 monclient: handle_get_version_reply
finishing 0x559dcfa87520 version 269
-16> 2020-07-17T13:50:27.888+0000 7fc8c6c51700 10 monclient: handle_get_version_reply
finishing 0x559dcfa875c0 version 269
-15> 2020-07-17T13:50:27.888+0000 7fc8c6c51700 10 monclient: handle_get_version_reply
finishing 0x559dcfa871c0 version 269
-14> 2020-07-17T13:50:27.888+0000 7fc8c8c55700 10 monclient: get_auth_request con
0x559dcfada000 auth_method 0
-13> 2020-07-17T13:50:27.888+0000 7fc8c9456700 10 monclient: get_auth_request con
0x559dcfada800 auth_method 0
-12> 2020-07-17T13:50:27.892+0000 7fc8bfc43700 1 mds.282966.journaler.mdlog(ro)
recover start
-11> 2020-07-17T13:50:27.892+0000 7fc8bfc43700 1 mds.282966.journaler.mdlog(ro)
read_head
-10> 2020-07-17T13:50:27.892+0000 7fc8bfc43700 4 mds.0.log Waiting for journal 0x200
to recover...
-9> 2020-07-17T13:50:27.893+0000 7fc8c0444700 1 mds.282966.journaler.mdlog(ro)
_finish_read_head loghead(trim 4194304, expire 4231216, write 4329405, stream_format 1).
probing for end of log (from 4329405)...
-8> 2020-07-17T13:50:27.893+0000 7fc8c0444700 1 mds.282966.journaler.mdlog(ro) probing
for end of the log
-7> 2020-07-17T13:50:27.893+0000 7fc8c0444700 1 mds.282966.journaler.mdlog(ro)
_finish_probe_end write_pos = 4329949 (header had 4329405). recovered.
-6> 2020-07-17T13:50:27.893+0000 7fc8bfc43700 4 mds.0.log Journal 0x200 recovered.
-5> 2020-07-17T13:50:27.893+0000 7fc8bfc43700 4 mds.0.log Recovered journal 0x200 in
format 1
-4> 2020-07-17T13:50:27.893+0000 7fc8bfc43700 2 mds.0.0 Booting: 1:
loading/discovering base inodes
-3> 2020-07-17T13:50:27.893+0000 7fc8bfc43700 0 mds.0.cache creating system inode with
ino:0x100
-2> 2020-07-17T13:50:27.894+0000 7fc8bfc43700 0 mds.0.cache creating system inode with
ino:0x1
-1> 2020-07-17T13:50:27.894+0000 7fc8c0444700 2 mds.0.0 Booting: 2: replaying mds log
0> 2020-07-17T13:50:27.896+0000 7fc8bec41700 -1 *** Caught signal (Segmentation fault)
**
in thread 7fc8bec41700 thread_name:md_log_replay
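To get more detail on this segfault, I plan to pull the full crash report through the crash
module (assuming the crashed daemon managed to post one) and to raise the MDS debug level
before the next restart, roughly like this (<crash-id> being whatever id shows up in the
listing):
# ceph crash ls
# ceph crash info <crash-id>
# ceph config set mds debug_mds 20
# ceph config set mds debug_journaler 20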
- Sometimes the restart works, but journal replay still fails even after I reset the journal
(# cephfs-journal-tool --rank=cephfs:0 journal reset) on the failed nodes (the fuller recovery
sequence I intend to try next is sketched after the status output below). The cluster status
looks like this:
# ceph status -w
cluster:
id: acd73aa2-8cdd-41a3-9941-fb397aa1d79e
health: HEALTH_WARN
1 daemons have recently crashed
services:
mon: 3 daemons, quorum 2,0,1 (age 3w)
mgr: mgr.0(active, since 11w), standbys: mgr.2, mgr.1
mds: cephfs:1 {0=node1=up:active} 1 up:standby-replay 1 up:standby
osd: 3 osds: 3 up (since 33h), 3 in (since 11w)
task status:
scrub status:
mds.node1: idle
data:
pools: 3 pools, 49 pgs
objects: 165 objects, 157 MiB
usage: 3.5 GiB used, 41 TiB / 41 TiB avail
pgs: 49 active+clean
io:
client: 1.8 MiB/s rd, 4 op/s rd, 0 op/s wr
2020-10-05T13:32:03.798231+0000 mds.node0 [ERR] failure replaying journal (EMetaBlob)
2020-10-05T13:32:03.851986+0000 mon.2 [INF] daemon mds.node0 restarted
2020-10-05T13:32:04.605163+0000 mds.node0 [ERR] failure replaying journal (EMetaBlob)
2020-10-05T13:32:08.652989+0000 mon.2 [INF] daemon mds.node0 restarted
2020-10-05T13:32:08.916347+0000 mds.node0 [ERR] failure replaying journal (EMetaBlob)
2020-10-05T13:32:12.961902+0000 mon.2 [INF] daemon mds.node0 restarted
2020-10-05T13:32:13.974410+0000 mds.node0 [ERR] failure replaying journal (EMetaBlob)
2020-10-05T13:32:14.023126+0000 mon.2 [INF] daemon mds.node0 restarted
2020-10-05T13:32:14.610039+0000 mds.node0 [ERR] failure replaying journal (EMetaBlob)
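As mentioned above, the only recovery step I have run so far is the plain journal reset. If I
understand the disaster-recovery documentation correctly, the full sequence is roughly the
following (please correct me if this is wrong, I have not applied it yet):
# cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
# cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
# cephfs-journal-tool --rank=cephfs:0 journal reset
# cephfs-table-tool all reset session
The export is only there to keep a backup of the journal before touching it.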
Questions:
- Why do 2 of the 3 MDS daemons sometimes crash? I suspect the client (kernel 4.20) on which
a CephFS in-tree provisioner (not CSI) for Kubernetes is running. How can I confirm that this
client is the cause?
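To try to pin this on that client, I was thinking of dumping the MDS client sessions and
checking the client metadata (kernel version, hostname), roughly:
# ceph tell mds.node1 client ls
or, on the MDS host itself:
# ceph daemon mds.node1 session ls
Is checking which sessions/caps that 4.20 client held at crash time the right way to correlate
the crash with a specific client, or is there a better approach?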
Thanks for your support