We have a 5-node cluster, all monitors, installed with cephadm. Recently the hosts needed
to be rebooted for upgrades, but after we rebooted them, hosts began failing their cephadm
check. As you can see, ceph1 is in quorum and is the host the commands were run from; the
output of ceph -s and ceph orch host ls follows further down. Running ceph orch pause and
then ceph orch resume only removed the "Offline" status of cephmon-temp, which really is
offline. How do we fix ceph orch's confusion?
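
For reference, the pause/resume cycle was nothing more than this, run from ceph1:

ceph1:~# ceph orch pause
ceph1:~# ceph orch resume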
cephmon-temp, the third offline host in the listing, is a temporary node we had that
ceph orch host rm couldn't get rid of.
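
The removal attempts looked roughly like this; I'm reconstructing from memory, and the
--offline/--force variant assumes a release new enough to support those flags:

ceph1:~# ceph orch host rm cephmon-temp
ceph1:~# ceph orch host rm cephmon-temp --offline --force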
ceph1:~# ceph -s
  health: HEALTH_WARN
          3 hosts fail cephadm check

  services:
    mon: 5 daemons, quorum ceph5,ceph4,ceph3,ceph2,ceph1 (age 2d)
    mgr: ceph3.dmpmih(active, since 3w), standbys: ceph5.pwseyi
    osd: 30 osds: 30 up (since 2d), 30 in (since 3w)

  data:
    pools:   2 pools, 129 pgs
    objects: 2.29M objects, 8.7 TiB
    usage:   22 TiB used, 87 TiB / 109 TiB avail
    pgs:     129 active+clean

  io:
    client: 149 KiB/s wr, 0 op/s rd, 14 op/s wr
ceph1:~# ceph orch host ls
HOST          ADDR          LABELS  STATUS
ceph1         ceph1         mon     Offline
ceph2         ceph2         mon     Offline
ceph3         ceph3         mon
ceph4         ceph4         mon
ceph5         ceph5         mon
cephmon-temp  cephmon-temp          Offline
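
Is re-running the host check by hand the right way to debug this? I assume that is this
command, run against one of the supposedly offline hosts:

ceph1:~# ceph cephadm check-host ceph2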