I might have a reproducer: the second rebuilt mon is not joining the
cluster either. I'll look into it and let you know if I find anything.
Quoting Eugen Block <eblock(a)nde.ag>:
Hi,
> Can anyone confirm that ancient (2017) leveldb database mons should
> just accept ‘mon.$hostname’ names for mons, as well as ‘mon.$id’?
at some point you had (or have) to remove one of the mons to recreate
it with a rocksdb backend, so the mismatch should not be an issue
here. I can confirm that it works: when I tried to reproduce it in a
small test cluster with leveldb, the rebuilt MON joined just fine. So
now I have two leveldb MONs and one rocksdb MON:
jewel:~ # cat
/var/lib/ceph/b08424fa-8530-4080-876d-2821c916d26c/mon.jewel/kv_backend
rocksdb
jewel2:~ # cat
/var/lib/ceph/b08424fa-8530-4080-876d-2821c916d26c/mon.jewel2/kv_backend
leveldb
jewel3:~ # cat
/var/lib/ceph/b08424fa-8530-4080-876d-2821c916d26c/mon.jewel3/kv_backend
leveldb
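The same check in one pass, for the record (assuming passwordless ssh
between the nodes; adjust hostnames and fsid to your setup):

jewel:~ # for h in jewel jewel2 jewel3; do printf '%s: ' "$h"; \
  ssh "$h" cat /var/lib/ceph/b08424fa-8530-4080-876d-2821c916d26c/mon.$h/kv_backend; \
done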
And the cluster is healthy, although it took a minute or two for the
rebuilt MON to sync (in a real cluster with some load etc. it might
take longer):
jewel:~ # ceph -s
  cluster:
    id:     b08424fa-8530-4080-876d-2821c916d26c
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum jewel2,jewel3,jewel (age 3m)
I'm wondering if this could be related to the insecure_global_id
settings. Can you send the output of:
ceph config get mon auth_allow_insecure_global_id_reclaim
ceph config get mon auth_expose_insecure_global_id_reclaim
ceph config get mon mon_warn_on_insecure_global_id_reclaim
ceph config get mon mon_warn_on_insecure_global_id_reclaim_allowed
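(For context: those settings stem from the CVE-2021-20288 global_id
fix. If that turns out to be the culprit, and once all clients and
daemons are patched, the insecure reclaim can be disabled entirely
with:

ceph config set mon auth_allow_insecure_global_id_reclaim false

But let's check the current values first.)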
Quoting Mark Schouten <mark(a)tuxis.nl>:
> Hi,
>
> I don’t have a fourth machine available, so that’s not an option
> unfortunately.
>
> I did enable a lot of debugging earlier, but that shows no
> information as to why things are not working as expected.
>
> Proxmox just deploys the mons, nothing fancy there, no special cases.
>
> Can anyone confirm that ancient (2017) leveldb database mons should
> just accept ‘mon.$hostname’ names for mons, as well as ‘mon.$id’?
>
> —
> Mark Schouten
> CTO, Tuxis B.V.
> +31 318 200208 / mark(a)tuxis.nl
>
>
> ------ Original Message ------
> From "Eugen Block" <eblock(a)nde.ag>
> To ceph-users(a)ceph.io
> Date 31/01/2024, 13:02:04
> Subject [ceph-users] Re: Cannot recreate monitor in upgrade from
> pacific to quincy (leveldb -> rocksdb)
>
>> Hi Mark,
>>
>> as I'm not familiar with proxmox I'm not sure what happens under
>> the hood. There are a couple of things I would try, not
>> necessarily in this order:
>>
>> - Check the troubleshooting guide [1]; for example, a clock skew
>> could be one reason. Have you verified ntp/chronyd functionality?
>> - Inspect debug log output, maybe first on the probing mon, and if
>> those logs don't reveal the reason, enable debug logs for the other
>> MONs as well:
>> ceph config set mon.proxmox03 debug_mon 20
>> ceph config set mon.proxmox03 debug_paxos 20
>>
>> or for all MONs:
>> ceph config set mon debug_mon 20
>> ceph config set mon debug_paxos 20
>>
>> - Try to deploy an additional MON on a different server (if you
>> have more available) and see if that works.
>> - Does proxmox log anything?
>> - Maybe as a last resort, try to start a MON manually after adding
>> it to the monmap with monmaptool (rough sketch below), but only if
>> you know what you're doing. I wonder if the monmap doesn't get
>> updated...
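>>
>> Regarding the clock skew: 'chronyc tracking' (or 'timedatectl') on
>> each node shows the offset quickly.
>>
>> For the monmap route, a rough, untested sketch (adapt the mon ID
>> and address to your setup):
>>
>> # on proxmox03, with the probing mon stopped:
>> systemctl stop ceph-mon@proxmox03
>> # fetch the current monmap from the remaining quorum:
>> ceph mon getmap -o /tmp/monmap
>> monmaptool --print /tmp/monmap
>> # add the mon if it's missing from the map:
>> monmaptool --add proxmox03 10.10.10.3:6789 /tmp/monmap
>> # inject it into the local mon store, then start the mon again:
>> ceph-mon -i proxmox03 --inject-monmap /tmp/monmap
>> systemctl start ceph-mon@proxmox03
>>
>> Don't forget to reset the debug levels afterwards (ceph config rm
>> mon debug_mon / ceph config rm mon debug_paxos).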
>>
>> Regards,
>> Eugen
>>
>> [1]
>>
>> https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/
>>
>> Quoting Mark Schouten <mark(a)tuxis.nl>:
>>
>>> Hi,
>>>
>>> During an upgrade from pacific to quincy, we needed to recreate
>>> the mons because they were pretty old and still using leveldb.
>>>
>>> So step one was to destroy one of the mons. After that we
>>> recreated the monitor, and although it starts, it remains in
>>> state ‘probing’, as you can see below.
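>>> (The destroy/recreate was done with the stock Proxmox tooling,
>>> i.e., if memory serves, roughly:
>>>
>>> pveceph mon destroy proxmox03
>>> pveceph mon create
>>>
>>> nothing manual beyond that.)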
>>>
>>> No matter what I tried, it won’t come up. I’ve seen quite a few
>>> messages suggesting that the MTU might be an issue, but that seems
>>> to be OK:
>>> root@proxmox03:/var/log/ceph# fping -b 1472 10.10.10.{1..3} -M
>>> 10.10.10.1 is alive
>>> 10.10.10.2 is alive
>>> 10.10.10.3 is alive
>>>
>>>
>>> Does anyone have an idea how to fix this? I’ve tried destroying
>>> and recreating the mon a few times now. Could it be that the
>>> leveldb mons only support mon.$id notation for the monitors?
>>>
>>> root@proxmox03:/var/log/ceph# ceph daemon mon.proxmox03 mon_status
>>> {
>>>     "name": "proxmox03",
>>>     "rank": 2,
>>>     "state": "probing",
>>>     "election_epoch": 0,
>>>     "quorum": [],
>>>     "features": {
>>>         "required_con": "2449958197560098820",
>>>         "required_mon": [
>>>             "kraken",
>>>             "luminous",
>>>             "mimic",
>>>             "osdmap-prune",
>>>             "nautilus",
>>>             "octopus",
>>>             "pacific",
>>>             "elector-pinging"
>>>         ],
>>>         "quorum_con": "0",
>>>         "quorum_mon": []
>>>     },
>>>     "outside_quorum": [
>>>         "proxmox03"
>>>     ],
>>>     "extra_probe_peers": [],
>>>     "sync_provider": [],
>>>     "monmap": {
>>>         "epoch": 0,
>>>         "fsid": "39b1e85c-7b47-4262-9f0a-47ae91042bac",
>>>         "modified": "2024-01-23T21:02:12.631320Z",
>>>         "created": "2017-03-15T14:54:55.743017Z",
>>>         "min_mon_release": 16,
>>>         "min_mon_release_name": "pacific",
>>>         "election_strategy": 1,
>>>         "disallowed_leaders: ": "",
>>>         "stretch_mode": false,
>>>         "tiebreaker_mon": "",
>>>         "removed_ranks: ": "2",
>>>         "features": {
>>>             "persistent": [
>>>                 "kraken",
>>>                 "luminous",
>>>                 "mimic",
>>>                 "osdmap-prune",
>>>                 "nautilus",
>>>                 "octopus",
>>>                 "pacific",
>>>                 "elector-pinging"
>>>             ],
>>>             "optional": []
>>>         },
>>>         "mons": [
>>>             {
>>>                 "rank": 0,
>>>                 "name": "0",
>>>                 "public_addrs": {
>>>                     "addrvec": [
>>>                         {
>>>                             "type": "v2",
>>>                             "addr": "10.10.10.1:3300",
>>>                             "nonce": 0
>>>                         },
>>>                         {
>>>                             "type": "v1",
>>>                             "addr": "10.10.10.1:6789",
>>>                             "nonce": 0
>>>                         }
>>>                     ]
>>>                 },
>>>                 "addr": "10.10.10.1:6789/0",
>>>                 "public_addr": "10.10.10.1:6789/0",
>>>                 "priority": 0,
>>>                 "weight": 0,
>>>                 "crush_location": "{}"
>>>             },
>>>             {
>>>                 "rank": 1,
>>>                 "name": "1",
>>>                 "public_addrs": {
>>>                     "addrvec": [
>>>                         {
>>>                             "type": "v2",
>>>                             "addr": "10.10.10.2:3300",
>>>                             "nonce": 0
>>>                         },
>>>                         {
>>>                             "type": "v1",
>>>                             "addr": "10.10.10.2:6789",
>>>                             "nonce": 0
>>>                         }
>>>                     ]
>>>                 },
>>>                 "addr": "10.10.10.2:6789/0",
>>>                 "public_addr": "10.10.10.2:6789/0",
>>>                 "priority": 0,
>>>                 "weight": 0,
>>>                 "crush_location": "{}"
>>>             },
>>>             {
>>>                 "rank": 2,
>>>                 "name": "proxmox03",
>>>                 "public_addrs": {
>>>                     "addrvec": [
>>>                         {
>>>                             "type": "v2",
>>>                             "addr": "10.10.10.3:3300",
>>>                             "nonce": 0
>>>                         },
>>>                         {
>>>                             "type": "v1",
>>>                             "addr": "10.10.10.3:6789",
>>>                             "nonce": 0
>>>                         }
>>>                     ]
>>>                 },
>>>                 "addr": "10.10.10.3:6789/0",
>>>                 "public_addr": "10.10.10.3:6789/0",
>>>                 "priority": 0,
>>>                 "weight": 0,
>>>                 "crush_location": "{}"
>>>             }
>>>         ]
>>>     },
>>>     "feature_map": {
>>>         "mon": [
>>>             {
>>>                 "features": "0x3f01cfbdfffdffff",
>>>                 "release": "luminous",
>>>                 "num": 1
>>>             }
>>>         ]
>>>     },
>>>     "stretch_mode": false
>>> }
>>>
>>> —
>>> Mark Schouten
>>> CTO, Tuxis B.V.
>>> +31 318 200208 / mark(a)tuxis.nl