Hi,
I recently did a fresh install of Ceph Octopus 15.2.3.
After a few days, the two standby MDS daemons suddenly crashed with a segmentation fault.
I tried to restart them, but they no longer start.
Here is the error from the MDS log:
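For reference, this is how I restart them (systemd, package-based install; "node1" below is just a placeholder for the standby's hostname):

# systemctl restart ceph-mds@node1
# systemctl status ceph-mds@node1
# tail -f /var/log/ceph/ceph-mds.node1.log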
-20> 2020-07-17T13:50:27.888+0000 7fc8c6c51700 10 monclient: _renew_subs
-19> 2020-07-17T13:50:27.888+0000 7fc8c6c51700 10 monclient: _send_mon_message to mon.2 at v1:172.31.36.98:6789/0
-18> 2020-07-17T13:50:27.888+0000 7fc8c6c51700 10 monclient: handle_get_version_reply finishing 0x559dcf9530c0 version 269
-17> 2020-07-17T13:50:27.888+0000 7fc8c6c51700 10 monclient: handle_get_version_reply finishing 0x559dcfa87520 version 269
-16> 2020-07-17T13:50:27.888+0000 7fc8c6c51700 10 monclient: handle_get_version_reply finishing 0x559dcfa875c0 version 269
-15> 2020-07-17T13:50:27.888+0000 7fc8c6c51700 10 monclient: handle_get_version_reply finishing 0x559dcfa871c0 version 269
-14> 2020-07-17T13:50:27.888+0000 7fc8c8c55700 10 monclient: get_auth_request con 0x559dcfada000 auth_method 0
-13> 2020-07-17T13:50:27.888+0000 7fc8c9456700 10 monclient: get_auth_request con 0x559dcfada800 auth_method 0
-12> 2020-07-17T13:50:27.892+0000 7fc8bfc43700 1 mds.282966.journaler.mdlog(ro) recover start
-11> 2020-07-17T13:50:27.892+0000 7fc8bfc43700 1 mds.282966.journaler.mdlog(ro) read_head
-10> 2020-07-17T13:50:27.892+0000 7fc8bfc43700 4 mds.0.log Waiting for journal 0x200 to recover...
-9> 2020-07-17T13:50:27.893+0000 7fc8c0444700 1 mds.282966.journaler.mdlog(ro) _finish_read_head loghead(trim 4194304, expire 4231216, write 4329405, stream_format 1). probing for end of log (from 4329405)...
-8> 2020-07-17T13:50:27.893+0000 7fc8c0444700 1 mds.282966.journaler.mdlog(ro) probing for end of the log
-7> 2020-07-17T13:50:27.893+0000 7fc8c0444700 1 mds.282966.journaler.mdlog(ro) _finish_probe_end write_pos = 4329949 (header had 4329405). recovered.
-6> 2020-07-17T13:50:27.893+0000 7fc8bfc43700 4 mds.0.log Journal 0x200 recovered.
-5> 2020-07-17T13:50:27.893+0000 7fc8bfc43700 4 mds.0.log Recovered journal 0x200 in format 1
-4> 2020-07-17T13:50:27.893+0000 7fc8bfc43700 2 mds.0.0 Booting: 1: loading/discovering base inodes
-3> 2020-07-17T13:50:27.893+0000 7fc8bfc43700 0 mds.0.cache creating system inode with ino:0x100
-2> 2020-07-17T13:50:27.894+0000 7fc8bfc43700 0 mds.0.cache creating system inode with ino:0x1
-1> 2020-07-17T13:50:27.894+0000 7fc8c0444700 2 mds.0.0 Booting: 2: replaying mds log
0> 2020-07-17T13:50:27.896+0000 7fc8bec41700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fc8bec41700 thread_name:md_log_replay
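The segfault happens in the md_log_replay thread right after "Booting: 2: replaying mds log", so the daemon seems to die while replaying the MDS journal. If it would help, I can inspect the journal and take a backup of it with cephfs-journal-tool ("cephfs" is our file system name, rank 0 is the rank being replayed):

# cephfs-journal-tool --rank=cephfs:0 journal inspect
# cephfs-journal-tool --rank=cephfs:0 journal export backup.bin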
Here is the cluster information:
# ceph status
  cluster:
    id:     dd024fe1-4996-4fed-ba57-03090e53724d
    health: HEALTH_WARN
            20 daemons have recently crashed

  services:
    mon: 3 daemons, quorum 2,0,1 (age 2d)
    mgr: mgr.0(active, since 9d), standbys: mgr.2, mgr.1
    mds: cephfs:1 {0=node0=up:active} 1 up:standby-replay 1 up:standby
    osd: 3 osds: 3 up (since 28h), 3 in (since 9d)

  task status:
    scrub status:
        mds.node0: idle
        mds.node2: idle

  data:
    pools:   3 pools, 49 pgs
    objects: 29 objects, 170 KiB
    usage:   3.0 GiB used, 41 TiB / 41 TiB avail
    pgs:     49 active+clean

  io:
    client: 853 B/s rd, 1 op/s rd, 0 op/s wr
There is only one client connected to the cluster.
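If the full stack trace would help, I can pull it from the crash module as well (<crash-id> being one of the IDs it lists):

# ceph crash ls
# ceph crash info <crash-id>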
Does anyone have any idea what could cause this?
Thanks