FYI, on the 'Pacific' 16.2.4 release, monitors assigned via
'... apply label:mon' are 'committing suicide' after surprise reboots,
even though their hosts have current and valid 'mon' tags. The tag
indicating a monitor should be assigned to that host is present and
never changed.
Deleting the mon tag, waiting a minute, then re-adding the 'mon' tag to
the host causes the monitor to redeploy and run properly.
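
For anyone who wants to script that workaround, here is a minimal
sketch. It assumes the tag is an orchestrator host label managed with
'ceph orch host label rm' and 'ceph orch host label add', and that it
runs on a node with an admin keyring. The function names are made up for
illustration, the host name is taken from the log further down, and the
60 second wait is just the "minute" mentioned above, not a tuned value.

```python
import subprocess
import time


def relabel_commands(host, label="mon"):
    """Return the two orchestrator commands for the remove/re-add cycle."""
    return [
        ["ceph", "orch", "host", "label", "rm", host, label],
        ["ceph", "orch", "host", "label", "add", host, label],
    ]


def cycle_label(host, label="mon", wait_seconds=60,
                run=subprocess.run, sleep=time.sleep):
    """Remove the label, wait, then add it back so the monitor is redeployed.

    `run` and `sleep` are injectable so the sequence can be checked
    without touching a real cluster.
    """
    remove_cmd, add_cmd = relabel_commands(host, label)
    run(remove_cmd, check=True)   # orchestrator should drop the mon from this host
    sleep(wait_seconds)           # give the orchestrator time to act on the change
    run(add_cmd, check=True)      # orchestrator should place a fresh mon on the host


# Example (would run against the live cluster):
# cycle_label("noc4")
```

With check=True a failing 'ceph' call raises CalledProcessError, so the
label is not re-added on top of a failed removal.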
I have 5 monitors assigned via the orchestrator's 'label:mon', all in
docker containers. After a reboot only 4 monitors are deployed. In the
logs on the offending host I see this:
May 28 11:06:59 noc4 bash[10563]: debug 2021-05-28T16:06:59.771+0000
7f7a029bf700 0 using public_addr v2:[fc00:1002:c7::44]:0/0 ->
[v2:[fc00:1002:c7::44]:3300/0,v1:[fc00:1002:c7::44]:6789/0]
May 28 11:06:59 noc4 bash[10563]: debug 2021-05-28T16:06:59.771+0000
7f7a029bf700 0 starting mon.noc4 rank -1 at public addrs
[v2:[fc00:1002:c7::44]:3300/0,v1:[fc00:1002:c7::44]:6789/0] at bind
addrs [v2:[fc00:1002:c7::44]:3300/0,v1:[fc00:1002:c7::44]:6789/0]
mon_data /var/lib/ceph/mon/ceph-noc4 fsid
4067126d-01cb-40af-824a-881c130140f8
May 28 11:06:59 noc4 bash[10563]: debug 2021-05-28T16:06:59.775+0000
7f7a029bf700 1 mon.noc4@-1(???) e40 preinit fsid
4067126d-01cb-40af-824a-xxxxxxxxx
May 28 11:06:59 noc4 bash[10563]: debug 2021-05-28T16:06:59.775+0000
7f7a029bf700 -1 mon.noc4@-1(???) e40 not in monmap and have been in a
quorum before; must have been removed
May 28 11:06:59 noc4 bash[10563]: debug 2021-05-28T16:06:59.775+0000
7f7a029bf700 -1 mon.noc4@-1(???) e40 commit suicide!
May 28 11:06:59 noc4 bash[10563]: debug 2021-05-28T16:06:59.775+0000
7f7a029bf700 -1 failed to initialize
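
To spot this failure across several hosts without reading each journal
by hand, something like the following could work. It is only a sketch:
it assumes the log text has already been collected into a string (how
you fetch it is up to you), and it keys on the exact message wording
shown in the excerpt above, which may differ in other releases. The
function name is hypothetical.

```python
import re

# Message text copied from the log excerpt above.
REMOVED_MSG = "not in monmap and have been in a quorum before"
# Matches the "mon.<name>@<rank>(" prefix that precedes the message.
MON_NAME = re.compile(r"\bmon\.([A-Za-z0-9_-]+)@-?\d+\(")


def removed_monitors(log_text):
    """Return names of monitors that logged the 'not in monmap' refusal."""
    # Pasted journal output is often hard-wrapped, so collapse whitespace first.
    flat = " ".join(log_text.split())
    names = []
    # Each daemon message in the excerpt starts with the word "debug".
    for chunk in flat.split(" debug "):
        if REMOVED_MSG in chunk:
            match = MON_NAME.search(chunk)
            if match and match.group(1) not in names:
                names.append(match.group(1))
    return names
```

Fed the excerpt above, this should report the one monitor that refused to
start, and an empty list for a host whose monitor came up normally.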
Seems odd. And, you know, as debug comments go, 'commit suicide!'
appears to have an 'extra coffee that day' aspect.
HC