[ceph-users] Re: MDS Behind on Trimming...

28 Mar 2024

Hello Erich,

Does the workload, by any chance, involve rsync? It is unfortunately
well-known for triggering such issues. A workaround is to export the
directory via NFS and run rsync against the NFS mount instead of
directly against CephFS.

On Fri, Mar 29, 2024 at 4:58 AM Erich Weiler &lt;weiler(a)soe.ucsc.edu&gt; wrote:
...

 MDS logs show:

 Mar 28 13:42:29 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
 log [WRN] : 16 slow requests, 0 included below; oldest blocked for >
 3676.400077 secs
 Mar 28 13:42:30 pr-md-02.prism ceph-mds[1464328]:
 mds.slugfs.pr-md-02.sbblqq Updating MDS map to version 22775 from mon.3
 Mar 28 13:42:34 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
 log [WRN] : 320 slow requests, 5 included below; oldest blocked for >
 3681.400104 secs
 Mar 28 13:42:34 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
 log [WRN] : slow request 3668.805732 seconds old, received at
 2024-03-28T19:41:25.772531+0000: client_request(client.99375:574268
 getattr AsXsFs #0x1000c097307 2024-03-28T19:41:25.770954+0000
 caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
 getattr
 Mar 28 13:42:34 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
 log [WRN] : slow request 3667.883853 seconds old, received at
 2024-03-28T19:41:26.694410+0000: client_request(client.99390:374844
 getattr AsXsFs #0x1000c097307 2024-03-28T19:41:26.696172+0000
 caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
 getattr
 Mar 28 13:42:34 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
 log [WRN] : slow request 3663.724571 seconds old, received at
 2024-03-28T19:41:30.853692+0000: client_request(client.99390:375258
 getattr AsXsFs #0x1000c097307 2024-03-28T19:41:30.852166+0000
 caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
 getattr
 Mar 28 13:42:34 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
 log [WRN] : slow request 3681.399582 seconds old, received at
 2024-03-28T19:41:13.178681+0000: client_request(client.99385:11712080
 getattr AsXsFs #0x1000c097307 2024-03-28T19:41:13.178764+0000
 caller_uid=30150, caller_gid=600{600,608,999,}) currently failed to
 rdlock, waiting
 Mar 28 13:42:34 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
 log [WRN] : slow request 3680.508972 seconds old, received at
 2024-03-28T19:41:14.069291+0000: client_request(client.99385:11712556
 getattr AsXsFs #0x1000c097307 2024-03-28T19:41:14.070764+0000
 caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
 getattr

 The client IDs map to several of our cluster nodes but the inode
 reference always refers to the same directory in these recent logs:

 /private/groups/shapirolab/brock/r2/cactus_coord

 That directory does not respond to an 'ls', but other directories
 directly above it do just fine.  Maybe it's a bad cache item on the MDS?

 # ceph health detail
 HEALTH_WARN 2 clients failing to advance oldest client/flush tid; 1 MDSs
 report slow requests; 1 MDSs behind on trimming
 [WRN] MDS_CLIENT_OLDEST_TID: 2 clients failing to advance oldest
 client/flush tid
      mds.slugfs.pr-md-02.sbblqq(mds.0): Client mustard failing to
 advance its oldest client/flush tid.  client_id: 101305
      mds.slugfs.pr-md-01.xdtppo(mds.1): Client  failing to advance its
 oldest client/flush tid.  client_id: 101305
 [WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
      mds.slugfs.pr-md-02.sbblqq(mds.0): 201 slow requests are blocked >
 30 secs
 [WRN] MDS_TRIM: 1 MDSs behind on trimming
      mds.slugfs.pr-md-02.sbblqq(mds.0): Behind on trimming (4786/250)
 max_segments: 250, num_segments: 4786

 I think that this is somehow causing the "slow requests", on the nodes
 listed in the logs, as that directory in inaccessible.  And maybe the
 'behind on trimming' part is also related, as it can't trim past that
 inode or something?

 If I restart the MDS daemon this will clear (I've done it before).  But
 it just comes back.  Often somewhere in the same directory
 /private/groups/shapirolab/brock/...[something].

 -erich

 On 3/28/24 10:11 AM, Erich Weiler wrote:
  Here are some of the MDS logs:

 Mar 27 11:58:25 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
 log [WRN] : slow request 511.703289 seconds old, received at
 2024-03-27T18:49:53.623192+0000: client_request(client.99375:459393
 getattr AsXsFs #0x100081b9ceb 2024-03-27T18:49:53.620806+0000
 caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
 getattr
 Mar 27 11:58:25 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
 log [WRN] : slow request 690.189459 seconds old, received at
 2024-03-27T18:46:55.137022+0000: client_request(client.99445:4189994
 getattr AsXsFs #0x100081b9ceb 2024-03-27T18:46:55.134857+0000
 caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
 getattr
 Mar 27 11:58:25 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
 log [WRN] : slow request 686.308604 seconds old, received at
 2024-03-27T18:46:59.017876+0000: client_request(client.99445:4190508
 getattr AsXsFs #0x100081b9ceb 2024-03-27T18:46:59.018864+0000
 caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
 getattr
 Mar 27 11:58:25 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
 log [WRN] : slow request 686.156943 seconds old, received at
 2024-03-27T18:46:59.169537+0000: client_request(client.99400:591887
 getattr AsXsFs #0x100081b9ceb 2024-03-27T18:46:59.170644+0000
 caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
 getattr
 Mar 27 11:58:26 pr-md-01.prism ceph-mds[1296468]:
 mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16631 from mon.0
 Mar 27 11:58:30 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
 log [WRN] : 16 slow requests, 0 included below; oldest blocked for >
 699.385743 secs
 Mar 27 11:58:34 pr-md-01.prism ceph-mds[1296468]:
 mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16632 from mon.0
 Mar 27 11:58:35 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
 log [WRN] : 16 slow requests, 0 included below; oldest blocked for >
 704.385896 secs
 Mar 27 11:58:38 pr-md-01.prism ceph-mds[1296468]:
 mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16633 from mon.0
 Mar 27 11:58:40 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
 log [WRN] : 16 slow requests, 0 included below; oldest blocked for >
 709.385979 secs
 Mar 27 11:58:42 pr-md-01.prism ceph-mds[1296468]:
 mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16634 from mon.0
 Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
 log [WRN] : 78 slow requests, 5 included below; oldest blocked for >
 714.386040 secs
 Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
 log [WRN] : slow request 710.189838 seconds old, received at
 2024-03-27T18:46:55.137022+0000: client_request(client.99445:4189994
 getattr AsXsFs #0x100081b9ceb 2024-03-27T18:46:55.134857+0000
 caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
 getattr
 Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
 log [WRN] : slow request 706.308983 seconds old, received at
 2024-03-27T18:46:59.017876+0000: client_request(client.99445:4190508
 getattr AsXsFs #0x100081b9ceb 2024-03-27T18:46:59.018864+0000
 caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
 getattr
 Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
 log [WRN] : slow request 706.157322 seconds old, received at
 2024-03-27T18:46:59.169537+0000: client_request(client.99400:591887
 getattr AsXsFs #0x100081b9ceb 2024-03-27T18:46:59.170644+0000
 caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
 getattr
 Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
 log [WRN] : slow request 706.086751 seconds old, received at
 2024-03-27T18:46:59.240108+0000: client_request(client.99400:591894
 getattr AsXsFs #0x100081b9ceb 2024-03-27T18:46:59.242644+0000
 caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
 getattr
 Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
 log [WRN] : slow request 705.196030 seconds old, received at
 2024-03-27T18:47:00.130829+0000: client_request(client.99400:591985
 getattr AsXsFs #0x100081b9ceb 2024-03-27T18:47:00.130641+0000
 caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
 getattr
 Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]:
 mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16635 from mon.0
 Mar 27 11:58:50 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
 log [WRN] : 16 slow requests, 0 included below; oldest blocked for >
 719.386116 secs
 Mar 27 11:58:53 pr-md-01.prism ceph-mds[1296468]:
 mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16636 from mon.0
 Mar 27 11:58:55 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
 log [WRN] : 16 slow requests, 0 included below; oldest blocked for >
 724.386184 secs
 Mar 27 11:58:57 pr-md-01.prism ceph-mds[1296468]:
 mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16637 from mon.0
 Mar 27 11:59:00 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
 log [WRN] : 16 slow requests, 0 included below; oldest blocked for >
 729.386333 secs
 Mar 27 11:59:02 pr-md-01.prism ceph-mds[1296468]:
 mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16638 from mon.0
 Mar 27 11:59:05 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
 log [WRN] : 53 slow requests, 5 included below; oldest blocked for >
 734.386400 secs
 Mar 27 11:59:05 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
 log [WRN] : slow request 730.190197 seconds old, received at
 2024-03-27T18:46:55.137022+0000: client_request(client.99445:4189994
 getattr AsXsFs #0x100081b9ceb 2024-03-27T18:46:55.134857+0000
 caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
 getattr

 Can we tell which client the slow requests are coming from?  It says
 stuff like "client.99445:4189994" but I don't know how to map that to a
 client...

 Thanks for the response!

 -erich

 On 3/27/24 21:28, Xiubo Li wrote:
>
> On 3/28/24 04:03, Erich Weiler wrote:
>> Hi All,
>>
>> I've been battling this for a while and I'm not sure where to go from
>> here.  I have a Ceph health warning as such:
>>
>> # ceph -s
>>   cluster:
>>     id:     58bde08a-d7ed-11ee-9098-506b4b4da440
>>     health: HEALTH_WARN
>>             1 MDSs report slow requests
>
> There had slow requests. I just suspect the behind on trimming was
> caused by this.
>
> Could you share the logs about the slow requests ? What are they ?
>
> Thanks
>
>
>> 1 MDSs behind on trimming
>>
>>   services:
>>     mon: 5 daemons, quorum
>> pr-md-01,pr-md-02,pr-store-01,pr-store-02,pr-md-03 (age 5d)
>>     mgr: pr-md-01.jemmdf(active, since 3w), standbys: pr-md-02.emffhz
>>     mds: 1/1 daemons up, 2 standby
>>     osd: 46 osds: 46 up (since 9h), 46 in (since 2w)
>>
>>   data:
>>     volumes: 1/1 healthy
>>     pools:   4 pools, 1313 pgs
>>     objects: 260.72M objects, 466 TiB
>>     usage:   704 TiB used, 424 TiB / 1.1 PiB avail
>>     pgs:     1306 active+clean
>>              4    active+clean+scrubbing+deep
>>              3    active+clean+scrubbing
>>
>>   io:
>>     client:   123 MiB/s rd, 75 MiB/s wr, 109 op/s rd, 1.40k op/s wr
>>
>> And the specifics are:
>>
>> # ceph health detail
>> HEALTH_WARN 1 MDSs report slow requests; 1 MDSs behind on trimming
>> [WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
>>     mds.slugfs.pr-md-01.xdtppo(mds.0): 99 slow requests are blocked >
>> 30 secs
>> [WRN] MDS_TRIM: 1 MDSs behind on trimming
>>     mds.slugfs.pr-md-01.xdtppo(mds.0): Behind on trimming (13884/250)
>> max_segments: 250, num_segments: 13884
>>
>> That "num_segments" number slowly keeps increasing.  I suspect I just
>> need to tell the MDS servers to trim faster but after hours of
>> googling around I just can't figure out the best way to do it. The
>> best I could come up with was to decrease "mds_cache_trim_decay_rate"
>> from 1.0 to .8 (to start), based on this page:
>>
>> https://www.suse.com/support/kb/doc/?id=000019740
>>
>> But it doesn't seem to help, maybe I should decrease it further? I am
>> guessing this must be a common issue...?  I am running Reef on the
>> MDS servers, but most clients are on Quincy.
>>
>> Thanks for any advice!
>>
>> cheers,
>> erich
>> _______________________________________________
>> ceph-users mailing list -- ceph-users(a)ceph.io
>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>>
>  _______________________________________________
 ceph-users mailing list -- ceph-users(a)ceph.io
 To unsubscribe send an email to ceph-users-leave(a)ceph.io 

-- 
Alexander E. Patrakov

2024

2023

2022

2021

2020

2019

[ceph-users] Re: MDS Behind on Trimming...