Hi Martin,
On Thu, Feb 13, 2020 at 4:10 AM Martin Palma <martin(a)palma.bz> wrote:
Hi all,
Today we observed that, all of a sudden, our standby-replay metadata
server continuously writes the following log messages:
2020-02-13 11:56:50.216102 7fd2ad229700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2020-02-13 11:56:50.287699 7fd2ad229700 0 mds.beacon.dcucmds401 Skipping beacon heartbeat to monitors (last acked 100.836s ago); MDS internal heartbeat is not healthy!
and its memory usage grows until no memory is left, at which point the
service gets restarted and then stops. The odd thing is that on the
active MDS we see neither these log messages nor any increase in
memory usage.
We are running ceph version 12.2.10 on all nodes of our Ceph cluster.
Any suggestions?
Please increase debugging on the standby-replay daemon and share the logs.
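One possible way to do that is through the daemon's admin socket, on the
host where the standby-replay MDS runs. This is only a sketch: the daemon
name dcucmds401 is an assumption taken from the "mds.beacon.dcucmds401"
log line above, the levels 20 and 1 are commonly used values rather than
a requirement, and mds_debug_cmds is a hypothetical helper that only
prints the commands so they can be reviewed first.

```shell
# Hypothetical helper: prints the admin-socket commands that raise the
# MDS and messenger debug levels for one MDS daemon. It runs nothing.
# Arguments: <mds name> [debug_mds level, default 20] [debug_ms level, default 1]
mds_debug_cmds() {
  name="$1"
  mds_level="${2:-20}"
  ms_level="${3:-1}"
  echo "ceph daemon mds.${name} config set debug_mds ${mds_level}"
  echo "ceph daemon mds.${name} config set debug_ms ${ms_level}"
}

# Assumed daemon name, taken from the log line in the report above.
mds_debug_cmds dcucmds401
```

Piping the output to sh on the MDS host applies the settings. With a
default configuration the log should end up in
/var/log/ceph/ceph-mds.<name>.log. The levels should be lowered again
once the logs are captured, since debug_mds 20 is very verbose.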
--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D