Hey folks,
I'm working through some basic ops drills and noticed what I think is an
inconsistency in the cephadm docs. Some Googling suggests this is a known
thing, but I haven't found clear direction on cooking up a solution yet.
On a cluster with 5 mons, 2 were abruptly removed when their host OS
decided to do scheduled maintenance without asking first. Those hosts only
ran mons (plus mds/crash/node-exporter), so I still have a 3-mon quorum
and the cluster is happy.
It's not clear to me how to add these hosts back in as mons, though. The
troubleshooting docs describe bringing all mons down and then extracting a
monmap. I tried various iterations of this: bringing them all down, then
bringing one back up and entering its container; bringing them all down and
trying to use ceph-mon from a cephadm shell; and so on. I either hit
rocksdb lock errors (presumably because a mon was still running) or an
error that the path to the mon data didn't exist (presumably for the
opposite reason).
Is there guidance on the container-friendly way to perform the monmap
maintenance?
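For reference, my best reconstruction of the docs' procedure adapted for containers is below. I may well be misreading things, and the mon name and fsid are placeholders for my real values, but this is the shape of what I was attempting:

```shell
# Stop the mon first so rocksdb isn't locked (run on the mon's own host;
# <fsid> and <host> are placeholders)
systemctl stop ceph-<fsid>@mon.<host>.service

# "cephadm shell --name" mounts that daemon's data dir into the container
cephadm shell --name mon.<host>

# Then, inside that container shell:
ceph-mon -i <host> --extract-monmap /tmp/monmap
monmaptool --print /tmp/monmap
```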
Since I still have quorum, I did wonder whether I could simply do ceph orch
apply mon label:mon instead, but I'm nervous this might upset my remaining
mons. Looking at the ceph orch ls output I see:
root@kida:/# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager                      1/1      7m ago     2h   count:1
crash                             5/5      9m ago     2h   *
grafana                           1/1      7m ago     2h   count:1
mds.media                         3/3      9m ago     2h   thebends;okcomputer;amnesiac
mgr                               2/2      9m ago     2h   count:2
mon                               3/5      9m ago     2h   label:mon
node-exporter                     5/5      9m ago     2h   *
osd.all-available-devices         5/10     9m ago     2h   *
prometheus                        1/1      7m ago     2h   count:1
root@kida:/#
So is it expecting 2 more mons, or has it autoscaled down cleverly?
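One option I'm considering, on the assumption that the orchestrator would reconcile back to the label:mon placement on its own, is just removing the stale daemon records. These are standard orch commands, but I haven't dared run them yet:

```shell
# Remove the stale daemon entries and let the scheduler redeploy
ceph orch daemon rm mon.amnesiac --force
ceph orch daemon rm mon.kingoflimbs --force

# Check the hosts still carry the mon label
ceph orch host ls
```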
Looking at ceph orch ps I see:
root@kida:/# ceph orch ps
NAME                         HOST         PORTS        STATUS         REFRESHED  AGE  VERSION    IMAGE ID      CONTAINER ID
alertmanager.kida            kida         *:9093,9094  running (2h)   8m ago     2h   0.20.0     0881eb8f169f  89c604455194
crash.amnesiac               amnesiac                  running (11h)  8m ago     11h  16.2.4     8d91d370c2b8  bff086c930db
crash.kida                   kida                      running (2h)   8m ago     2h   16.2.4     8d91d370c2b8  b0ac059be109
crash.kingoflimbs            kingoflimbs               running (13h)  8m ago     13h  16.2.4     8d91d370c2b8  b0955309a8b9
crash.okcomputer             okcomputer                running (2h)   10m ago    2h   16.2.4     8d91d370c2b8  a75cf65ef235
crash.thebends               thebends                  running (2h)   8m ago     2h   16.2.4     8d91d370c2b8  befe9c1015f3
grafana.kida                 kida         *:3000       running (2h)   8m ago     2h   6.7.4      ae5c36c3d3cd  f85747138299
mds.media.amnesiac.uujwlk    amnesiac                  running (11h)  8m ago     2h   16.2.4     8d91d370c2b8  512a2fcc0f97
mds.media.okcomputer.nednib  okcomputer                running (2h)   10m ago    2h   16.2.4     8d91d370c2b8  10c6244a9308
mds.media.thebends.pqsfeb    thebends                  running (2h)   8m ago     2h   16.2.4     8d91d370c2b8  c1b75831a973
mgr.kida.kchysa              kida         *:9283       running (2h)   8m ago     2h   16.2.4     8d91d370c2b8  602acc0d8df3
mgr.okcomputer.rjtrqw        okcomputer   *:8443,9283  running (2h)   10m ago    2h   16.2.4     8d91d370c2b8  605a8a25a604
mon.amnesiac                 amnesiac                  stopped        8m ago     2h   <unknown>  <unknown>     <unknown>
mon.kida                     kida                      running (2h)   8m ago     2h   16.2.4     8d91d370c2b8  a441563a978d
mon.kingoflimbs              kingoflimbs               stopped        8m ago     2h   <unknown>  <unknown>     <unknown>
mon.okcomputer               okcomputer                running (2h)   10m ago    2h   16.2.4     8d91d370c2b8  c4297efafe27
mon.thebends                 thebends                  running (2h)   8m ago     2h   16.2.4     8d91d370c2b8  e2394d5f152b
node-exporter.amnesiac       amnesiac     *:9100       running (11h)  8m ago     2h   0.18.1     e5a616e4b9cf  da3c69057c4f
node-exporter.kida           kida         *:9100       running (2h)   8m ago     2h   0.18.1     e5a616e4b9cf  5c9219a29257
node-exporter.kingoflimbs    kingoflimbs  *:9100       running (13h)  8m ago     2h   0.18.1     e5a616e4b9cf  c2236491fb6e
node-exporter.okcomputer     okcomputer   *:9100       running (2h)   10m ago    2h   0.18.1     e5a616e4b9cf  2e53a82eed32
node-exporter.thebends       thebends     *:9100       running (2h)   8m ago     2h   0.18.1     e5a616e4b9cf  def6bdd359d6
osd.0                        kida                      running (2h)   8m ago     2h   16.2.4     8d91d370c2b8  c1419a29ddd8
osd.1                        kida                      running (85m)  8m ago     2h   16.2.4     8d91d370c2b8  dcb172c628ec
osd.2                        thebends                  running (2h)   8m ago     2h   16.2.4     8d91d370c2b8  4826e3da8d14
osd.3                        okcomputer                running (2h)   10m ago    2h   16.2.4     8d91d370c2b8  5424d437c270
osd.4                        thebends                  running (2h)   8m ago     2h   16.2.4     8d91d370c2b8  47e682c3727d
prometheus.kida              kida         *:9095       running (2h)   8m ago     2h   2.18.1     de242295e225  4c8e7fdd89a8
root@kida:/#
So those mon containers are still there, stopped. ceph orch daemon restart
mon.amnesiac reports that a restart is scheduled for that mon, and its
status in ceph orch ps flips to running, but version, image ID and
container ID remain <unknown>, and I don't see that mon in any status
output or log. cephadm unit --name mon.amnesiac restart --fsid
yadda-yadda-yadda errors with "daemon not found"; it seems the cephadm CLI
is scoped to daemons running on the host it's executed on, rather than
cluster-wide like ceph orch.
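If the cephadm CLI really is host-scoped, then presumably I'd need to run it on the affected host directly, something along these lines (fsid elided as before):

```shell
# On (or via ssh to) the affected host itself
ssh amnesiac cephadm ls                                  # what cephadm sees locally
ssh amnesiac cephadm unit --fsid <fsid> --name mon.amnesiac start
```

but I'd rather understand what state the orchestrator thinks these daemons are in before poking further.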
Any clues offered to further investigation are welcomed.
Best regards
Phil