Only 2/5 mon services running - ceph-users

8 Jun 2021

While troubleshooting why only 2/5 mon services were running, I believe I’ve broken
something:

[ceph: root@cn01 /]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager                          1/1  81s ago    9d   count:1
crash                                 6/6  7m ago     9d   *
grafana                               1/1  80s ago    9d   count:1
mds.testfs                            2/2  81s ago    9d   cn01.ceph.la1.clx.corp;cn02.ceph.la1.clx.corp;cn03.ceph.la1.clx.corp;cn04.ceph.la1.clx.corp;cn05.ceph.la1.clx.corp;cn06.ceph.la1.clx.corp;count:2
mgr                                   2/2  81s ago    9d   count:2
mon                                   2/5  81s ago    9d   count:5
node-exporter                         6/6  7m ago     9d   *
osd.all-available-devices           20/26  7m ago     9d   *
osd.unmanaged                         7/7  7m ago     -    <unmanaged>
prometheus                            2/2  80s ago    9d   count:2

I tried to stop and start the mon service, but now the cluster is pretty much
unresponsive. I’m assuming that’s because I stopped mon:

[ceph: root@cn01 /]# ceph orch stop mon
Scheduled to stop mon.cn01 on host 'cn01.ceph.la1.clx.corp'
Scheduled to stop mon.cn02 on host 'cn02.ceph.la1.clx.corp'
Scheduled to stop mon.cn03 on host 'cn03.ceph.la1.clx.corp'
Scheduled to stop mon.cn04 on host 'cn04.ceph.la1.clx.corp'
Scheduled to stop mon.cn05 on host 'cn05.ceph.la1.clx.corp'
[ceph: root@cn01 /]# ceph orch start mon

^CCluster connection aborted

Now, even after rebooting the cluster, it’s still unresponsive.  How do I get mon started
again?
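
For context on what "getting mon started again" might look like: `ceph orch` commands go through the monitors, so they cannot bring the monitors back once all of them are stopped. On a cephadm deployment each daemon also runs as a systemd unit named `ceph-<fsid>@<type>.<id>.service`, which can be started on its host without a working cluster connection. The sketch below only prints the commands; the FSID value is a placeholder and `mon_unit` is a hypothetical helper, neither is taken from this cluster. The real fsid is the directory name under /var/lib/ceph/ on each host.

```shell
#!/bin/sh
# Sketch: build the systemd unit name cephadm uses for a mon daemon.
# mon_unit is a hypothetical helper written for this example.
mon_unit() {
    # $1 = cluster fsid, $2 = mon daemon id (e.g. cn01 for mon.cn01)
    printf 'ceph-%s@mon.%s.service\n' "$1" "$2"
}

# Placeholder fsid (assumption); replace with the directory name
# found under /var/lib/ceph/ on the mon hosts.
FSID="00000000-0000-0000-0000-000000000000"

# Daemon ids taken from the "Scheduled to stop mon.cnNN" lines above.
for id in cn01 cn02 cn03 cn04 cn05; do
    # Print rather than execute: each command has to be run as root on
    # the host that owns the daemon, outside the cephadm shell.
    echo "systemctl start $(mon_unit "$FSID" "$id")"
done
```

Once a majority of the mons are up again, `ceph -s` and `ceph orch ls` should start responding; if a unit fails to start, `journalctl -u` with the same unit name shows why.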

I’m going through Ceph and breaking things left and right, so I apologize for all the
questions.  I learn best from breaking things and figuring out how to resolve the issues.

Thank you
-jeremy