On Tue, May 4, 2021 at 4:21 PM Janne Johansson <icepic.dz(a)gmail.com> wrote:
On Tue, May 4, 2021 at 4:10 PM Rainer Krienke <krienke(a)uni-koblenz.de> wrote:
Hello,
I am playing around with a test ceph 14.2.20 cluster. The cluster
consists of 4 VMs, each VM has 2 OSDs. The first three VMs vceph1,
vceph2 and vceph3 are monitors. vceph1 is also mgr.
What I did was quite simple. The cluster is in the state HEALTHY:
vceph2: systemctl stop ceph-osd@2
# let ceph repair until ceph -s reports cluster is healthy again
vceph2: systemctl start ceph-osd@2 # @ 15:39:15, for the logs
# ceph -s reports that all 8 OSDs are up and in, then
# rebalancing of osd.2 starts
vceph2: ceph -s # hangs forever, also when executed on vceph3 or vceph4
# mon on vceph1 permanently eats 100% CPU, the other mons ~0% CPU
vceph1: systemctl stop ceph-mon@vceph1 # wait ~30 sec to terminate
vceph1: systemctl start ceph-mon@vceph1 # Everything is OK again
I posted the mon-log to:
https://cloud.uni-koblenz.de/s/t8tWjWFAobZb5Hy
Strangely enough, if I set "debug mon 20" before starting the experiment,
this bug does not show up. I also tried the very same procedure on the
same cluster updated to 15.2.11, but I was unable to reproduce the bug
in that ceph version.
I might have run into the same issue recently, except not on a test
cluster but on a live system,
also running 14.2.20 like you. We have (for other reasons) some flapping OSDs,
and repairs/backfills take a lot of time. While the mons might have had slightly
less memory than they should have, they didn't OOM or anything;
we found ourselves in the situation where one mon would eat 100% CPU,
not log anything of value at all, and the other two would be all but idle.
Restarting the 100%-CPU mon finally allowed us to get on with the rest
of the recovery.
Same question as above -- does your mgr log negative progress at level 4?
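For reference, here's a rough sketch of how one might raise the mgr debug level and scan its log for progress-module messages. The log path is an assumption (the default location on many packaged installs); adjust for your deployment:

```shell
# Raise mgr debug logging to level 4 via the config database
ceph config set mgr debug_mgr 4/4

# Scan the active mgr's log for progress-module output
# (log path is an assumption -- adjust to your setup)
grep -i progress /var/log/ceph/ceph-mgr.*.log | tail -n 20

# Revert to the default debug level when done
ceph config rm mgr debug_mgr
```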
BTW, if you find that this is indeed what's blocking your mons, you
can work around it by setting `ceph progress off` until the fixes are
released.
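Concretely, the workaround and its reversal might look like this (assuming you are on a release that includes the progress on/off switch):

```shell
# Disable the mgr progress module's event reporting as a workaround
ceph progress off

# Once you've upgraded to a release containing the fixes, re-enable it
ceph progress on
```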
-- Dan