Hello,
Thank you for the help. This is done and everything is working now.
Best Regards
Mateusz Skała
On 13.10.2020, at 14:59, Gaël THEROND
<gael.therond(a)bitswalk.com> wrote:
If you’ve got all nodes up and running fine now, here is what I did on my
own cluster just this morning (a rough command sketch follows the list).
1°/- Ensure all MONs have the same /etc/ceph/ceph.conf file.
2°/- Your MONs often share the same keyring; if so, ensure you’ve got
the right keyring in both places: /etc/ceph/ceph.mon.keyring and
/var/lib/ceph/mon/<clustername>-<hostname>/keyring
3°/- Delete the store and kv of your UNHEALTHY mons, found under
/var/lib/ceph/mon/<clustername>-<hostname>/; they will be rebuilt when the
mon process restarts.
4°/- Start the last healthy monitor and wait for it to complain that it
cannot acquire a global_id.
5°/- Start the remaining MONs.
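Here is a rough shell sketch of those five steps, assuming a cluster named
"ceph", systemd-managed mons, and hypothetical hostnames mon1 (healthy),
mon2 (unhealthy), mon3; adapt the unit names and paths to your deployment
(e.g. docker restart instead of systemctl for containerized mons):

    # 1°/- Compare ceph.conf checksums across all MON hosts.
    for h in mon1 mon2 mon3; do ssh "$h" md5sum /etc/ceph/ceph.conf; done

    # 2°/- Check that both keyring locations match on each MON.
    diff /etc/ceph/ceph.mon.keyring /var/lib/ceph/mon/ceph-mon2/keyring

    # 3°/- On the unhealthy MON only: stop it and delete its store
    #      (rebuilt from the healthy MON when the process restarts).
    systemctl stop ceph-mon@mon2
    rm -rf /var/lib/ceph/mon/ceph-mon2/store.db

    # 4°/- Start the last healthy MON first and watch its log for the
    #      global_id complaint.
    systemctl start ceph-mon@mon1
    journalctl -fu ceph-mon@mon1

    # 5°/- Then start the remaining MONs.
    systemctl start ceph-mon@mon2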
You should see the quorum trigger a new election as soon as each mon detects
that it is part of an already existing cluster and retrieves the appropriate
data (store/kv/etc.) from the remaining healthy MON.
This procedure can fail if your unhealthy MONs don’t have the appropriate
keyring.
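Once all the mons are back up, a quick way to verify that the quorum actually
re-formed, using the standard ceph CLI from any node with an admin keyring:

    # Cluster health summary, including which mons are in quorum.
    ceph -s

    # Detailed quorum membership and the current election epoch.
    ceph quorum_status --format json-pretty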
On Tue, 13 Oct 2020 at 12:56, Mateusz Skała <mateusz.skala(a)gmail.com>
wrote:
Hi,
Thanks for responding. All monitors went down; 2 of 3 are actually up now,
but probably not in quorum. A quick summary of what happened beforehand:
1. a few PGs without scrub and deep-scrub, 2 mons in the cluster
2. added one monitor (via Ansible); Ansible restarted the OSDs
3. the OS filesystem filled up on every node (because of multiple SST files)
4. all pods with monitors went down
5. added a new filesystem for the monitors and moved the data from the OS
filesystem to it
6. 2 monitors started (the last one with a failure), but they are not
responding to any commands
Regards
Mateusz Skała
On Tue, 13 Oct 2020 at 11:25, Gaël THEROND <gael.therond(a)bitswalk.com>
wrote:
> This error means your quorum didn’t form.
>
> How many mon nodes do you usually have, and how many went down?
>
> On Tue, 13 Oct 2020 at 10:56, Mateusz Skała <mateusz.skala(a)gmail.com>
> wrote:
>
>> Hello Community,
>> I have problems with ceph-mons in Docker. The Docker pods are starting,
>> but I get a lot of "e6 handle_auth_request failed to assign global_id”
>> messages in the log. 2 mons are up, but I can’t run any ceph commands.
>> Regards
>> Mateusz
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io