Ceph standby-replay metadata server: MDS internal heartbeat is not healthy - ceph-users

13 Feb 2020

Hi all,

today we observed that, all of a sudden, our standby-replay metadata
server started continuously writing the following log messages:

2020-02-13 11:56:50.216102 7fd2ad229700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2020-02-13 11:56:50.287699 7fd2ad229700  0 mds.beacon.dcucmds401
Skipping beacon heartbeat to monitors (last acked 100.836s ago); MDS
internal heartbeat is not healthy!

and its memory usage keeps growing until no memory is left, at which
point the service gets restarted and then stops. The odd thing is that
on the active MDS we see neither these log messages nor any increase
in memory usage.
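
To see how far the beacon lag has drifted over time, the "last acked"
values can be pulled out of the MDS log. The following is only a minimal
sketch: it assumes each log entry sits on a single line in the log file
(the excerpt above is wrapped by the mail client), and the helper name
beacon_lags and the regular expression are illustrative, not part of Ceph.

```python
import re

# Pattern modelled on the beacon warning quoted above (assumed to be on one line
# in the actual log file; the names below are hypothetical helpers, not Ceph APIs).
BEACON_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) .*"
    r"Skipping beacon heartbeat to monitors \(last acked (?P<lag>[\d.]+)s ago\)"
)


def beacon_lags(lines):
    """Return (timestamp, seconds since last acked beacon) for each matching line."""
    results = []
    for line in lines:
        match = BEACON_RE.search(line)
        if match:
            results.append((match.group("ts"), float(match.group("lag"))))
    return results


if __name__ == "__main__":
    sample = [
        "2020-02-13 11:56:50.216102 7fd2ad229700  1 heartbeat_map is_healthy "
        "'MDSRank' had timed out after 15",
        "2020-02-13 11:56:50.287699 7fd2ad229700  0 mds.beacon.dcucmds401 "
        "Skipping beacon heartbeat to monitors (last acked 100.836s ago); "
        "MDS internal heartbeat is not healthy!",
    ]
    for ts, lag in beacon_lags(sample):
        print(ts, lag)
```

Feeding it the lines of the real MDS log file instead of the sample list
would show whether the lag grows steadily or jumps at a particular time.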

We are running ceph version 12.2.10 on all nodes of our Ceph cluster.
Any suggestions?

Best,
Martin