It looks like the second mon server was down after my reboot. I restarted it and everything is
functional again, but I still can’t figure out why only 2 of the 5 mon servers are running
and the other 3 won’t start. If they had been functioning, I probably wouldn’t have noticed the
cluster being down.
Thanks
-jeremy
On Jun 7, 2021, at 7:53 PM, Jeremy Hansen
<jeremy(a)skidrow.la> wrote:
In an attempt to troubleshoot why only 2/5 mon services were running, I believe I’ve
broken something:
[ceph: root@cn01 /]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager                      1/1      81s ago    9d   count:1
crash                             6/6      7m ago     9d   *
grafana                           1/1      80s ago    9d   count:1
mds.testfs                        2/2      81s ago    9d   cn01.ceph.la1.clx.corp;cn02.ceph.la1.clx.corp;cn03.ceph.la1.clx.corp;cn04.ceph.la1.clx.corp;cn05.ceph.la1.clx.corp;cn06.ceph.la1.clx.corp;count:2
mgr                               2/2      81s ago    9d   count:2
mon                               2/5      81s ago    9d   count:5
node-exporter                     6/6      7m ago     9d   *
osd.all-available-devices         20/26    7m ago     9d   *
osd.unmanaged                     7/7      7m ago     -    <unmanaged>
prometheus                        2/2      80s ago    9d   count:2
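To pick out the services that are short of their target count, the RUNNING column can be filtered with a small awk helper. This is only a sketch: the function name degraded_services is made up for illustration, and it assumes the text layout of `ceph orch ls` shown above, where the first field is the service name and RUNNING is the first field of the form N/M.

```shell
# Hypothetical helper: reads `ceph orch ls` text output on stdin and prints
# every service whose running count is below its expected count.
degraded_services() {
    awk '{
        # Find the first field that looks like "running/expected".
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^[0-9]+\/[0-9]+$/) {
                split($i, n, "/")
                # Compare numerically; print service name and the N/M value.
                if (n[1] + 0 < n[2] + 0) print $1, $i
                break
            }
        }
    }'
}
```

Used as `ceph orch ls | degraded_services`, it would list mon (2/5) and osd.all-available-devices (20/26) for the output above. The header line and any wrapped placement line are skipped because they contain no N/M field.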
I tried to stop and start the mon service, but now the cluster is pretty much
unresponsive, presumably because I stopped mon:
[ceph: root@cn01 /]# ceph orch stop mon
Scheduled to stop mon.cn01 on host 'cn01.ceph.la1.clx.corp'
Scheduled to stop mon.cn02 on host 'cn02.ceph.la1.clx.corp'
Scheduled to stop mon.cn03 on host 'cn03.ceph.la1.clx.corp'
Scheduled to stop mon.cn04 on host 'cn04.ceph.la1.clx.corp'
Scheduled to stop mon.cn05 on host 'cn05.ceph.la1.clx.corp'
[ceph: root@cn01 /]# ceph orch start mon
^CCluster connection aborted
Now even after a reboot of the cluster, it’s unresponsive. How do I get mon started
again?
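A note on why `ceph orch start mon` hangs: the ceph CLI has to reach a mon quorum before it can do anything, so once every mon is stopped the orchestrator cannot be asked to start them again. On a cephadm deployment each daemon also runs as a systemd unit on its host, named ceph-<fsid>@<type>.<id>.service, so one possible way back is to start the mon units directly on the hosts. The sketch below only builds that unit name; the helper name mon_unit is made up, and the fsid is a placeholder because it is not given anywhere in this thread.

```shell
# Hypothetical helper: build the systemd unit name cephadm uses for a mon.
# $1 = cluster fsid (placeholder, not known from this thread)
# $2 = mon id, e.g. cn01 for mon.cn01
mon_unit() {
    printf 'ceph-%s@mon.%s.service\n' "$1" "$2"
}

# On each mon host, outside the cephadm shell, something like:
#   systemctl start "$(mon_unit "$FSID" cn01)"
# where FSID is assumed to hold the cluster fsid (the "fsid" line in
# /etc/ceph/ceph.conf, if that file is present on the host).
```

Once enough mons are up to form a quorum, the regular `ceph` and `ceph orch` commands should respond again.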
I’m going through Ceph and breaking things left and right, so I apologize for all the
questions. I learn best from breaking things and figuring out how to resolve the issues.
Thank you
-jeremy