[ceph-users] Re: ceph mons stuck in electing state

4 Sep 2019

Hi Huang,

Thanks for offering to help, but the original issue with the ceph-mons not
connecting was already diagnosed last week as a possible networking error at
the hardware level. We first removed all the mons except one to force it to
come online without waiting for a quorum, and the networking problem was
diagnosed and fixed after that was in place. We are pretty sure it was
ultimately the result of aging hardware being pushed to its limit by the
rebuilding and repairing, plus several changes I made in a short period of
time.
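
For anyone finding this thread later: the "remove all the mons except one" step is normally done by editing the monmap offline. What follows is only a minimal sketch, assuming the procedure described in the Ceph documentation for removing monitors from an unhealthy cluster; the exact commands used on this cluster are not given in the thread. The helper name `shrink_monmap_plan`, the mon names and the path `/tmp/monmap` are hypothetical. The script only prints the commands, it does not run them, and the surviving ceph-mon has to be stopped before they are run.

```python
from typing import List


def shrink_monmap_plan(survivor: str, mon_names: List[str],
                       monmap_path: str = "/tmp/monmap") -> List[List[str]]:
    """Return the commands (as argv lists) to run, in order, on the survivor's host.

    Nothing is executed here. The surviving ceph-mon must be stopped first.
    """
    if survivor not in mon_names:
        raise ValueError(f"{survivor!r} is not in the monmap")

    # 1. Dump the survivor's current monmap to a file.
    plan = [["ceph-mon", "-i", survivor, "--extract-monmap", monmap_path]]

    # 2. Remove every other monitor from that file.
    for name in mon_names:
        if name != survivor:
            plan.append(["monmaptool", monmap_path, "--rm", name])

    # 3. Write the edited monmap back into the survivor's store.
    plan.append(["ceph-mon", "-i", survivor, "--inject-monmap", monmap_path])
    return plan


# Placeholder names matching the redacted hostnames in the original post.
mons = ["mgmt1", "mon1", "mon2", "mon3", "mon4"]
for argv in shrink_monmap_plan("mgmt1", mons):
    print(" ".join(argv))
```

Printing the plan instead of executing it is deliberate: injecting a wrong monmap can make things worse, so the commands should be reviewed (and the mon store backed up) before anything is run by hand.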

On Tue, Sep 3, 2019 at 9:41 PM huang jun <hjwsm1989(a)gmail.com> wrote:

...
 Can you set debug_mon=20, debug_paxos=20 and debug_ms=1 on all mons
 and get the logs?

 On Tue, Sep 3, 2019 at 9:35 PM Ashley Merrick <singapore(a)amerrick.co.uk> wrote:

 What change did you make in ceph.conf?

 I'd check that hasn't caused an issue first.

 ---- On Tue, 27 Aug 2019 04:37:15 +0800 nkerns92(a)gmail.com wrote ----

 Hello,

 I have an old Ceph 0.94.10 cluster that had 10 storage nodes plus one extra
 management node used for running commands on the cluster. Over time we've
 had hardware failures on some of the storage nodes, so we're down to 6,
 with ceph-mon running on the management server and 4 of the storage nodes.
 We attempted to deploy a ceph.conf change and restarted the ceph-mon and
 ceph-osd services, but the cluster went down on us. We found that all the
 ceph-mons are stuck in the electing state. I can't get any response from
 any ceph commands, but I found I can contact the daemon directly and get
 this information (hostnames removed for privacy reasons):

 root@<mgmt1>:~# ceph daemon mon.<mgmt1> mon_status
 {
     "name": "<mgmt1>",
     "rank": 0,
     "state": "electing",
     "election_epoch": 4327,
     "quorum": [],
     "outside_quorum": [],
     "extra_probe_peers": [],
     "sync_provider": [],
     "monmap": {
         "epoch": 10,
         "fsid": "69611c75-200f-4861-8709-8a0adc64a1c9",
         "modified": "2019-08-23 08:20:57.620147",
         "created": "0.000000",
         "mons": [
             {
                 "rank": 0,
                 "name": "<mgmt1>",
                 "addr": "[fdc4:8570:e14c:132d::15]:6789\/0"
             },
             {
                 "rank": 1,
                 "name": "<mon1>",
                 "addr": "[fdc4:8570:e14c:132d::16]:6789\/0"
             },
             {
                 "rank": 2,
                 "name": "<mon2>",
                 "addr": "[fdc4:8570:e14c:132d::28]:6789\/0"
             },
             {
                 "rank": 3,
                 "name": "<mon3>",
                 "addr": "[fdc4:8570:e14c:132d::29]:6789\/0"
             },
             {
                 "rank": 4,
                 "name": "<mon4>",
                 "addr": "[fdc4:8570:e14c:132d::151]:6789\/0"
             }
         ]
     }
 }
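
As a side note on reading the output above: Ceph monitors need a strict majority of the mons listed in the monmap to form a quorum, so with five entries a single mon cannot finish an election on its own. A small sketch of that arithmetic, using a trimmed copy of the JSON (addresses dropped, hostnames still placeholders); the function name `majority` is just illustrative:

```python
import json


def majority(n_mons: int) -> int:
    """Smallest number of monitors that is a strict majority of n_mons."""
    return n_mons // 2 + 1


# Trimmed copy of the mon_status output from the post; only the fields
# needed for the count are kept.
mon_status = json.loads("""
{
  "state": "electing",
  "quorum": [],
  "monmap": {
    "epoch": 10,
    "mons": [
      {"rank": 0, "name": "mgmt1"},
      {"rank": 1, "name": "mon1"},
      {"rank": 2, "name": "mon2"},
      {"rank": 3, "name": "mon3"},
      {"rank": 4, "name": "mon4"}
    ]
  }
}
""")

names = [m["name"] for m in mon_status["monmap"]["mons"]]
needed = majority(len(names))

print(f"{len(names)} mons in monmap, {needed} must agree to form a quorum")
print(f"with a monmap of 1 mon, {majority(1)} is enough")
```

This is why shrinking the monmap to a single entry lets one mon come up alone, whereas injecting an unchanged five-mon monmap into the other nodes does not change how many peers each mon has to reach.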

 Is there any way to force the cluster back into a quorum, even if it's just
 one mon running to start it up? I've tried exporting the mgmt's monmap and
 injecting it into the other nodes, but it didn't make any difference.

 Thanks!
 _______________________________________________
 ceph-users mailing list -- ceph-users(a)ceph.io
 To unsubscribe send an email to ceph-users-leave(a)ceph.io
