[ceph-users] Re: MDS Behind on Trimming...

8 Apr 2024

Hi Xiubo,

...
> Thanks for your logs, and it should be the same issue with
> https://tracker.ceph.com/issues/62052, could you try to test with this
> fix again?

This sounds good - but I'm not clear on what I should do?  I see a patch 
in that tracker page, is that what you are referring to?  If so, how 
would I apply such a patch?  Or is there simply a binary update I can 
apply somehow to the MDS server software?

Thanks for helping!

-erich

> Please let me know if you still could see this bug then it should be the 
> locker order bug as https://tracker.ceph.com/issues/62123.
> 
> Thanks
> 
> - Xiubo
> 
> 
> On 3/28/24 04:03, Erich Weiler wrote:
>> Hi All,
>>
>> I've been battling this for a while and I'm not sure where to go from 
>> here.  I have a Ceph health warning as such:
>>
>> # ceph -s
>>   cluster:
>>     id:     58bde08a-d7ed-11ee-9098-506b4b4da440
>>     health: HEALTH_WARN
>>             1 MDSs report slow requests
>>             1 MDSs behind on trimming
>>
>>   services:
>>     mon: 5 daemons, quorum pr-md-01,pr-md-02,pr-store-01,pr-store-02,pr-md-03 (age 5d)
>>     mgr: pr-md-01.jemmdf(active, since 3w), standbys: pr-md-02.emffhz
>>     mds: 1/1 daemons up, 2 standby
>>     osd: 46 osds: 46 up (since 9h), 46 in (since 2w)
>>
>>   data:
>>     volumes: 1/1 healthy
>>     pools:   4 pools, 1313 pgs
>>     objects: 260.72M objects, 466 TiB
>>     usage:   704 TiB used, 424 TiB / 1.1 PiB avail
>>     pgs:     1306 active+clean
>>              4    active+clean+scrubbing+deep
>>              3    active+clean+scrubbing
>>
>>   io:
>>     client:   123 MiB/s rd, 75 MiB/s wr, 109 op/s rd, 1.40k op/s wr
>>
>> And the specifics are:
>>
>> # ceph health detail
>> HEALTH_WARN 1 MDSs report slow requests; 1 MDSs behind on trimming
>> [WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
>>     mds.slugfs.pr-md-01.xdtppo(mds.0): 99 slow requests are blocked > 30 secs
>> [WRN] MDS_TRIM: 1 MDSs behind on trimming
>>     mds.slugfs.pr-md-01.xdtppo(mds.0): Behind on trimming (13884/250) max_segments: 250, num_segments: 13884
>>
>> That "num_segments" number slowly keeps increasing.  I suspect I just 
>> need to tell the MDS servers to trim faster but after hours of 
>> googling around I just can't figure out the best way to do it. The 
>> best I could come up with was to decrease "mds_cache_trim_decay_rate" 
>> from 1.0 to .8 (to start), based on this page:
>>
>> https://www.suse.com/support/kb/doc/?id=000019740
>>
>> But it doesn't seem to help, maybe I should decrease it further? I am 
>> guessing this must be a common issue...?  I am running Reef on the MDS 
>> servers, but most clients are on Quincy.
>>
>> Thanks for any advice!
>>
>> cheers,
>> erich
>> _______________________________________________
>> ceph-users mailing list -- ceph-users(a)ceph.io
>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>>
> 
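
For anyone following along who wants to watch whether "num_segments" keeps growing, here is a minimal sketch that pulls the two numbers out of the "Behind on trimming" line shown in the `ceph health detail` output above. The function names (`parse_trim_backlog`, `backlog_ratio`) are made up for illustration and are not part of Ceph, and the regex assumes the line looks exactly like the one quoted in this thread; other releases may format it differently.

```python
import re

# Matches the "(num_segments/max_segments)" pair in the MDS_TRIM detail line,
# e.g. "Behind on trimming (13884/250)". The format is assumed from the
# output quoted in this thread.
TRIM_RE = re.compile(r"Behind on trimming \((\d+)/(\d+)\)")


def parse_trim_backlog(health_detail):
    """Return (num_segments, max_segments), or None if no MDS_TRIM line is found."""
    match = TRIM_RE.search(health_detail)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def backlog_ratio(num_segments, max_segments):
    """How many times over the configured segment limit the journal currently is."""
    return num_segments / max_segments


# Sample line copied from the health output in the original message.
sample = (
    "mds.slugfs.pr-md-01.xdtppo(mds.0): Behind on trimming (13884/250) "
    "max_segments: 250, num_segments: 13884"
)

backlog = parse_trim_backlog(sample)
if backlog is not None:
    num, limit = backlog
    print(f"num_segments={num} max_segments={limit} ratio={backlog_ratio(num, limit):.1f}x")
```

Feeding it the output of `ceph health detail` every few minutes and logging the result would show whether the backlog is still climbing after a config change or a patched MDS, which is easier than eyeballing the warning.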
