Thank you very much for the hint regarding the log files; I wasn't aware that the logs are still kept on the host even though everything runs in containers nowadays.
There was nothing in the log files, but I did find out that the host (a RasPi4) could not cope with the two external USB SSDs connected to it, probably because it could not supply enough power. The disks disappeared and the OSDs went away with them. After a restart of the host the disks were back, as were the OSD containers. I have now removed that second OSD and will keep only a single OSD per server.
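In case it helps someone else: the orchestrator can drain and remove an OSD, roughly like this (just a sketch; <osd_id> is the ID of the OSD to remove):

    ceph orch osd rm <osd_id>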
For reference, here is the relevant part of the kernel log I saw:
[Thu May 6 15:24:34 2021] blk_update_request: I/O error, dev sda, sector 40063143 op
0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[Thu May 6 15:24:34 2021] usb 1-1-port4: over-current change #1
and of course it did that for both sda and sdb.
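If anyone wants to check a Pi for the same problem, the over-current events end up in the kernel ring buffer, so something like this should show them (assuming the usual dmesg/grep tools are available):

    dmesg -T | grep -iE 'over-current|blk_update_request'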
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, May 6, 2021 4:17 PM, David Caro <dcaro(a)wikimedia.org> wrote:
On 05/06 14:03, mabi wrote:
Hello,
I have a small 6-node Octopus 15.2.11 cluster installed on bare metal with cephadm, and I added a second OSD to one of my 3 OSD nodes. I then started copying data to my CephFS (kernel mount), but both OSDs on that specific node crashed.
On this topic I have the following questions:
1. How can I find out why the two OSDs crashed? Because everything runs in Podman containers, I don't know where the logs are to find out why this happened. From the OS itself everything looks OK; there was no out-of-memory error.
There should be some logs under /var/log/ceph/<cluster_fsid>/osd.<osd_id>/ on
the host/hosts that were running the osds.
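For example, something like this (with the placeholders replaced by your cluster fsid and OSD id; the exact file names may differ):

    ls -l /var/log/ceph/<cluster_fsid>/
    tail -n 200 /var/log/ceph/<cluster_fsid>/osd.<osd_id>/*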
Sometimes, though, I have found myself disabling the '--rm' flag for the pod in the 'unit.run' script under
/var/lib/ceph/<ceph_fsid>/osd.<id>/unit.run to make podman persist the
container, so I can do a 'podman logs' on it.
Though that's probably sensible only when debugging.
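Roughly, what I mean is something like this (a sketch; the exact unit and container names depend on your fsid and OSD id):

    # edit unit.run and drop the --rm flag from the podman run command
    vi /var/lib/ceph/<ceph_fsid>/osd.<id>/unit.run
    # restart the systemd unit so the change takes effect
    systemctl restart ceph-<ceph_fsid>@osd.<id>.service
    # once the OSD dies again the container is kept around, so you can inspect it
    podman ps -a
    podman logs <container name from 'podman ps -a'>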
2. I would assume the two OSD containers would restart on their own, but it looks like this is not the case. How can I manually restart these 2 OSD containers on that node? I believe this should be a "cephadm orch" command?
I think 'ceph orch daemon redeploy' might do it? What is the output of 'ceph
orch ls' and 'ceph orch ps'?
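For example (osd.<id> being the failed daemon as listed by 'ceph orch ps'):

    ceph orch ps
    ceph orch ls
    ceph orch daemon redeploy osd.<id>

'ceph orch daemon restart osd.<id>' might also be enough if the daemon just needs a kick.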
The health of the cluster right now is:
CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
PG_DEGRADED: Degraded data redundancy: 132518/397554 objects degraded (33.333%), 65
pgs degraded, 65 pgs undersized
Thank you for your hints.
Best regards,
Mabi
--
David Caro
SRE - Cloud Services
Wikimedia Foundation
https://wikimediafoundation.org/
PGP Signature: 7180 83A2 AC8B 314F B4CE 1171 4071 C7E1 D262 69C3
"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io