Hello,
I have a small 6-node Octopus 15.2.11 cluster installed on bare metal with cephadm, and I
added a second OSD to one of my 3 OSD nodes. I then started copying data to my CephFS,
mounted with the kernel client, but both OSDs on that specific node crashed.
Regarding this, I have the following questions:
1) How can I find out why the two OSDs crashed? Because everything runs in podman
containers, I don't know where to find the logs that would tell me why this happened. From
the OS itself everything looks fine; there was no out-of-memory error. (My rough guess of
where to look is sketched after the questions.)
2) I would have assumed that the two OSD containers would restart on their own, but
apparently that is not the case. How can I manually restart these 2 OSD containers on that
node? I believe this should be a "ceph orch" command? (My guess at the command is also
sketched below.)
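For question 1, the only places I have thought to look so far are the systemd journal and
the cephadm log helper, roughly like this (osd.3 and <fsid> are just placeholders, since I
don't know the ids of the failed daemons off-hand):

    # list all daemons and their status, to identify the failed OSDs
    ceph orch ps
    # show the journal of one OSD daemon (the systemd unit is ceph-<fsid>@osd.<id>.service)
    cephadm logs --name osd.3
    journalctl -u ceph-<fsid>@osd.3.service
    # list any crash reports the cluster has recorded
    ceph crash ls

Is that the right approach, or do the container logs end up somewhere else?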
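For question 2, my guess from the documentation is something along these lines (again with
osd.3 and osd.4 as placeholder ids for the two failed daemons):

    # restart the failed daemons through the orchestrator
    ceph orch daemon restart osd.3
    ceph orch daemon restart osd.4
    # or, directly on the affected node, via the per-daemon systemd unit
    systemctl restart ceph-<fsid>@osd.3.service

Is that correct, or is there a better way to do it with cephadm?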
The health of the cluster right now is:
CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
PG_DEGRADED: Degraded data redundancy: 132518/397554 objects degraded (33.333%), 65 pgs degraded, 65 pgs undersized
Thank you for your hints.
Best regards,
Mabi