We just had the same problem again after a power outage that took out
62% of our cluster and three out of five MONs. Once everything was back
up, the MONs started lagging and piling up slow ops while the MON store
grew to double-digit gigabytes. It got so bad that I couldn't even list
the in-flight ops any more, because ceph daemon mon.XXX ops did not
return at all.
Like last time, after I restarted all five MONs, the store size
decreased and everything went back to normal. I also had to restart the
MGRs and MDSs afterwards. This is starting to look like a bug to me.
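For anyone hitting the same thing, a rough sketch of the commands
involved (mon ID "mon1" and the default cluster name "ceph" are
placeholders, adjust for your deployment):

```shell
# Check the on-disk size of a MON's RocksDB store
# (default path layout: /var/lib/ceph/mon/<cluster>-<id>/store.db):
du -sh /var/lib/ceph/mon/ceph-mon1/store.db

# Ask the MON to compact its store manually:
ceph tell mon.mon1 compact

# If the daemon no longer responds, restart it
# (systemd-based deployment assumed):
systemctl restart ceph-mon@mon1
```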
Janek
On 26/02/2021 15:24, Janek Bevendorff wrote:
Since the full cluster restart and disabling logging to syslog, it's
not a problem any more (for now).
Unfortunately, just disabling clog_to_monitors didn't have the desired
effect when I tried it yesterday. But I still believe it is somehow
related. I could not find any specific reason for yesterday's incident
in the logs besides a few more RocksDB status and compact messages than
usual, but that's more of a symptom.
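For reference, a sketch of how the option can be toggled (the exact
syntax depends on the Ceph release; injectargs works on running
daemons):

```shell
# Disable cluster-log forwarding to the MONs on all running OSDs,
# without restarting them:
ceph tell osd.* injectargs '--clog_to_monitors=false'

# On releases with the centralized config store, the persistent
# equivalent would be:
ceph config set osd clog_to_monitors false

# Note: restart the OSDs when re-enabling it, to avoid the crash
# tracked in https://tracker.ceph.com/issues/48946
```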
On 26/02/2021 13:05, Mykola Golub wrote:
> On Thu, Feb 25, 2021 at 08:58:01PM +0100, Janek Bevendorff wrote:
>
>> On the first MON, the command doesn’t even return, but I was able to
>> get a dump from the one I restarted most recently. The oldest ops
>> look like this:
>>
>> {
>>     "description": "log(1000 entries from seq 17876238 at
>>         2021-02-25T15:13:20.306487+0100)",
>>     "initiated_at": "2021-02-25T20:40:34.698932+0100",
>>     "age": 183.762551121,
>>     "duration": 183.762599201,
> The mon stores cluster log messages in the mon db. You mentioned
> problems with osds flooding with log messages. It looks related.
>
> If you still observe the db growth, you may try temporarily disabling
> clog_to_monitors, i.e. set for all osds:
>
> clog_to_monitors = false
>
> And see if it stops growing after this and if it helps with the slow
> ops (it might make sense to restart mons if some look like they got
> stuck). You can apply the config option on the fly (without restarting
> the osds, e.g. with injectargs), but when re-enabling it you will
> have to restart the osds to avoid crashes due to this bug [1].
>
> [1] https://tracker.ceph.com/issues/48946
>
--
Bauhaus-Universität Weimar
Bauhausstr. 9a, R308
99423 Weimar, Germany
Phone: +49 3643 58 3577
www.webis.de