Can't add a ceph-mon to existing large cluster


Dan van der Ster

5 Mar 2020

5:41 p.m.

Hi all,

There's something broken in our env when we try to add new mons to existing clusters, confirmed on two clusters running mimic and nautilus. It's basically this issue: https://tracker.ceph.com/issues/42830

In case something is wrong with our puppet manifests, I'm trying to do it manually.

First we --mkfs the mon and start it, but as soon as the new mon starts synchronizing, the existing leader becomes unresponsive and an election is triggered.

Here's exactly what I'm doing:

# cd /var/lib/ceph/tmp/
# scp cephmon1:/var/lib/ceph/tmp/keyring.mon.cephmon1 keyring.mon.cephmon4
# ceph mon getmap -o monmap
# ceph-mon --mkfs -i cephmon4 --monmap monmap --keyring keyring.mon.cephmon4 --setuser ceph --setgroup ceph
# vi /etc/ceph/ceph.conf <add the new mon to ceph.conf like this>
[mon.cephmon4]
host = cephmon4
mon addr = a.b.c.d:6790
# systemctl start ceph-mon@cephmon4

The log file on the new mon shows it start synchronizing, then immediately the CPU usage on the leader goes to 100%, elections start happening, and ceph health shows mon slow ops. perf top of the ceph-mon with 100% CPU is shown below [1].

On a small nautilus cluster, the new mon gets added within a minute or so (but not cleanly -- the leader is unresponsive for quite a while until the new mon joins). debug_mon=20 on the leader doesn't show anything very interesting.

On our large mimic cluster we tried waiting more than 10 minutes -- suffering through several mon elections and 100% usage bouncing around between leaders -- until we gave up.

I'm pulling my hair out a bit on this -- it's really weird!

Did anyone add a new mon to an existing large cluster recently, and it went smoothly?

Cheers, Dan

[1]
15.12%  ceph-mon              [.] MonitorDBStore::Transaction::encode
 8.95%  libceph-common.so.0   [.] ceph::buffer::v14_2_0::ptr::append
 8.68%  libceph-common.so.0   [.] ceph::buffer::v14_2_0::list::append
 7.69%  libceph-common.so.0   [.] ceph::buffer::v14_2_0::ptr::release
 5.86%  libceph-common.so.0   [.] ceph::buffer::v14_2_0::ptr::ptr


Sage Weil

5 Mar

9:22 p.m.

On Thu, 5 Mar 2020, Dan van der Ster wrote:

I'm pulling my hair out a bit on this -- it's really weird!

Can you try running a rocksdb compaction on the existing mons before adding the new one and see if that helps?

s

Wido den Hollander

9:31 p.m.

On 3/5/20 3:22 PM, Sage Weil wrote:


Can you try running a rocksdb compaction on the existing mons before adding the new one and see if that helps?

I can chime in here: I had this happen to a customer as well. Compact did not work.

Some background: 5 Monitors and the DBs were ~350M in size. They upgraded one MON from 13.2.6 to 13.2.8 and that caused one MON (sync source) to eat 100% CPU. The logs showed that the upgraded MON (which was restarted) was in the synchronizing state.

Because they had 5 MONs they now had 3 left so the cluster kept running. I left this for about 5 minutes, but it never synced. I tried a compact, which didn't work either.

Eventually I stopped one MON, tarballed its database and used that to bring back the MON which was upgraded to 13.2.8.

That worked without any hiccups. The MON joined again within a few seconds.

Wido

Dan van der Ster

10:29 p.m.

On Thu, Mar 5, 2020 at 3:31 PM Wido den Hollander <wido(a)42on.com> wrote:

Eventually I stopped one MON, tarballed its database and used that to bring back the MON which was upgraded to 13.2.8.

Yeah, that works! -- something like:

ceph mon add <newmon> ip:6789
rsync <oldmon>:/var/lib/ceph/mon/ceph.. /var/lib/ceph/mon
systemctl start ceph-mon.target

I guess that's a workaround, but it would be good to find out why the sync source is spinning.

-- dan

Dan van der Ster

10:14 p.m.

Hi Sage,

On Thu, Mar 5, 2020 at 3:22 PM Sage Weil <sage(a)newdream.net> wrote:

Can you try running a rocksdb compaction on the existing mons before adding the new one and see if that helps?

It doesn't help. I compacted the 3 mons in quorum then started a new one with debug mon & paxos = 20.

ceph-post-file: 9867d4ef-38cc-4ae7-9631-c6b86e666683

I stopped that new mon as soon as the sync source started spinning 100% and left the quorum.

-- Dan
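For readers who want to repeat the compaction test, here is a minimal sketch. It assumes the compaction is triggered with `ceph tell mon.<id> compact`; the thread does not say which command was actually used, and the mon IDs below are placeholders. The function only prints the commands so they can be reviewed and run by hand.

```shell
#!/bin/sh
# Print one compaction command per existing mon.
# Assumption: compaction is requested via "ceph tell mon.<id> compact".
# The mon IDs passed in are placeholders, not taken from a real cluster.
compact_cmds() {
    for id in "$@"; do
        echo "ceph tell mon.${id} compact"
    done
}

# Example: the three mons in quorum (hypothetical IDs).
compact_cmds cephmon1 cephmon2 cephmon3
```

As reported in this thread, compaction did not fix the problem, so this is only useful for ruling it out.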

Sage Weil

10:42 p.m.

On Thu, 5 Mar 2020, Dan van der Ster wrote:

It doesn't help. I compacted the 3 mons in quorum then started a new one with debug mon & paxos = 20. ceph-post-file: 9867d4ef-38cc-4ae7-9631-c6b86e666683 I stopped that new mon as soon as the sync source started spinning 100% and left the quorum.

Can you include the log from the sync source too? That's presumably where the bug is.

Thanks!
sage

Dan van der Ster

6 Mar

1:59 a.m.

On Thu, Mar 5, 2020 at 4:42 PM Sage Weil <sage(a)newdream.net> wrote:

Can you include the log from the sync source too? That's presumably where the bug is.

Here's a different new mon and the leader, with debug_paxos & mon = 20:

ceph-post-file: 8db3d788-e266-4034-9d0c-4ee55eb1d055

Things start to go wrong at this line:

2020-03-05 19:37:35.697 7f5fe87e2700 10 mon.p05517715y58557@0(leader) e32 handle_sync mon_sync(get_chunk cookie 170322296835) v2

...which is just before it tries to sync osd_snap.

I also included the output of ceph-monstore-tool dump-keys. There are really a lot of osd_snap keys!

Thanks!

-- dan
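To see how many keys each prefix holds, the dump-keys output can be summarized with a small filter. This is a sketch under two assumptions: that each line of `ceph-monstore-tool <store-path> dump-keys` output starts with the key prefix as its first whitespace-separated field, and that the tool is run against a stopped mon or a copy of its store. The store path in the usage comment is hypothetical.

```shell
#!/bin/sh
# Count mon store keys per prefix.
# Reads dump-keys style lines on stdin, assuming the first field is the prefix,
# and prints "<count> <prefix>" sorted with the largest count first.
count_prefixes() {
    awk 'NF > 0 { n[$1]++ } END { for (p in n) print n[p], p }' | sort -rn
}

# Usage (hypothetical store path):
#   ceph-monstore-tool /var/lib/ceph/mon/ceph-cephmon1 dump-keys | count_prefixes
```

A prefix such as osd_snap dominating the list would match what is described above.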

Sage Weil

2:05 a.m.

On Thu, 5 Mar 2020, Dan van der Ster wrote:

I also included the output of ceph-monstore-tool dump-keys. There are really a lot of osd_snap keys!

Aha, I knew this sounded familiar! See https://github.com/ceph/ceph/pull/31581

We should backport this for the next nautilus...

sage

Dan van der Ster

2:07 a.m.

On Thu, Mar 5, 2020 at 8:05 PM Sage Weil <sage(a)newdream.net> wrote:


Aha, I knew this sounded familiar! See https://github.com/ceph/ceph/pull/31581 We should backport this for the next nautilus...

Perfect.. thanks!!

-- dan

Dan van der Ster

2:12 a.m.

On Thu, Mar 5, 2020 at 8:07 PM Dan van der Ster <dan(a)vanderster.com> wrote:


Perfect.. thanks!!

Sage, do you think I can work around this by setting mon_sync_max_payload_size ridiculously small, like 1024 or something like that?

-- dan

Sage Weil

2:19 a.m.

On Thu, 5 Mar 2020, Dan van der Ster wrote:


Sage, do you think I can work around this by setting mon_sync_max_payload_size ridiculously small, like 1024 or something like that?

Yeah... IIRC that is how the original user worked around the problem. I think they use 64 or 128 KB.

sage

Dan van der Ster

2:32 a.m.

On Thu, Mar 5, 2020 at 8:19 PM Sage Weil <sage(a)newdream.net> wrote:


Yeah... IIRC that is how the original user worked around the problem. I think they use 64 or 128 KB.

Nice... 64kB still triggered elections but 4kB worked. I have 5 mons again!

-- dan
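A minimal sketch of the workaround, with some assumptions: the thread does not say how the option was applied, so `ceph config set mon ...` is only one plausible way to do it, and the value is taken to be in bytes (which matches the 1024 suggested earlier). The helper converts kB to bytes and prints the command rather than running it.

```shell
#!/bin/sh
# Print the command that would set mon_sync_max_payload_size to <kB> kilobytes.
# Assumptions: the option takes a byte count, and "ceph config set mon"
# is an acceptable way to apply it on the cluster in question.
sync_payload_cmd() {
    kb=$1
    bytes=$((kb * 1024))
    echo "ceph config set mon mon_sync_max_payload_size ${bytes}"
}

sync_payload_cmd 64   # tried first in this thread; still triggered elections
sync_payload_cmd 4    # the value that worked in this thread
```

Once the new mon has joined, the setting can presumably be put back to its previous value; the thread does not cover that step.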

Anthony D'Atri

5:43 a.m.


Nice... 64kB still triggered elections but 4kB worked. I have 5 mons again!

I had an experience on 12.2.2 (Luminous) that seems as though it may have been related.

* mon02, not the lead, crashed with a DIMM error and rebooted. The cluster rode it out just fine.
* The next day mon02 was taken down gracefully to address the DIMM issue. The lead mon’s memory footprint spiked and an election storm commenced.
* IIRC recovery required bringing back the second mon and restarting ceph-mon on the lead.


ceph-users@ceph.io
