Could there be an issue with the fact that the servers (MDS, MGR, MON,
OSD) are running reef and all the clients are running quincy?
I can easily enough get the new reef repo in for all our clients (Ubuntu
22.04) and upgrade the clients to reef if that might help...?
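
In the meantime, to double-check what each mounted client actually reports,
I believe something like this should work ("ceph features" summarizes the
connected clients by release, and "session ls" on an MDS shows per-client
details; the daemon name below is just one of ours):

# ceph features
# ceph tell mds.slugfs.pr-md-02.sbblqq session ls | grep -E '"id"|hostname|ceph_version|kernel_version'
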
On 3/28/24 3:05 PM, Erich Weiler wrote:
> I asked the user and they said no, no rsync involved. I did rsync 500TB
> into this filesystem in the beginning without incident, though, so
> hopefully rsync isn't the problem here anyway.
>
> I'm asking the user what their workflow does to try and pin this down.
>
> Is there any other known reason why a slow request would start on a
> certain inode, then block a bunch of cache segments behind it, until the
> MDS is restarted?
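>
> Next time it hangs I'm going to try dumping the in-flight and blocked ops
> on the stuck MDS to see what the oldest request is actually waiting on. I
> believe the ops-tracker commands are available via tell (or "ceph daemon"
> on the MDS host), something like:
>
> # ceph tell mds.slugfs.pr-md-02.sbblqq dump_ops_in_flight
> # ceph tell mds.slugfs.pr-md-02.sbblqq dump_blocked_ops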
>
> Once I restart the MDS daemon that is slow, I can see the cache segments
> transfer to the other MDS server and very quickly drop to zero, and then
> everything is healthy again: the stuck directory in question responds
> again and all is well. Then a few hours later it starts happening again
> (not always the same directory).
>
> I hope I'm not experiencing a bug, but I can't see what would be causing
> this...
>
> On 3/28/24 2:37 PM, Alexander E. Patrakov wrote:
>> Hello Erich,
>>
>> Does the workload, by any chance, involve rsync? It is unfortunately
>> well-known for triggering such issues. A workaround is to export the
>> directory via NFS and run rsync against the NFS mount instead of
>> directly against CephFS.
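>>
>> If you don't already have an NFS gateway, a rough sketch using the
>> built-in NFS module would be something like the following (the cluster
>> id and pseudo-path below are just placeholders; please double-check the
>> exact syntax for your release):
>>
>> # ceph nfs cluster create mynfs
>> # ceph nfs export create cephfs --cluster-id mynfs --pseudo-path /slugfs --fsname slugfs
>>
>> Then mount that export on the client and point rsync at the NFS mount.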
>>
>> On Fri, Mar 29, 2024 at 4:58 AM Erich Weiler <weiler(a)soe.ucsc.edu> wrote:
>>>
>>> MDS logs show:
>>>
>>> Mar 28 13:42:29 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
>>> log [WRN] : 16 slow requests, 0 included below; oldest blocked for >
>>> 3676.400077 secs
>>> Mar 28 13:42:30 pr-md-02.prism ceph-mds[1464328]:
>>> mds.slugfs.pr-md-02.sbblqq Updating MDS map to version 22775 from mon.3
>>> Mar 28 13:42:34 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
>>> log [WRN] : 320 slow requests, 5 included below; oldest blocked for >
>>> 3681.400104 secs
>>> Mar 28 13:42:34 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
>>> log [WRN] : slow request 3668.805732 seconds old, received at
>>> 2024-03-28T19:41:25.772531+0000: client_request(client.99375:574268
>>> getattr AsXsFs #0x1000c097307 2024-03-28T19:41:25.770954+0000
>>> caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
>>> getattr
>>> Mar 28 13:42:34 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
>>> log [WRN] : slow request 3667.883853 seconds old, received at
>>> 2024-03-28T19:41:26.694410+0000: client_request(client.99390:374844
>>> getattr AsXsFs #0x1000c097307 2024-03-28T19:41:26.696172+0000
>>> caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
>>> getattr
>>> Mar 28 13:42:34 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
>>> log [WRN] : slow request 3663.724571 seconds old, received at
>>> 2024-03-28T19:41:30.853692+0000: client_request(client.99390:375258
>>> getattr AsXsFs #0x1000c097307 2024-03-28T19:41:30.852166+0000
>>> caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
>>> getattr
>>> Mar 28 13:42:34 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
>>> log [WRN] : slow request 3681.399582 seconds old, received at
>>> 2024-03-28T19:41:13.178681+0000: client_request(client.99385:11712080
>>> getattr AsXsFs #0x1000c097307 2024-03-28T19:41:13.178764+0000
>>> caller_uid=30150, caller_gid=600{600,608,999,}) currently failed to
>>> rdlock, waiting
>>> Mar 28 13:42:34 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
>>> log [WRN] : slow request 3680.508972 seconds old, received at
>>> 2024-03-28T19:41:14.069291+0000: client_request(client.99385:11712556
>>> getattr AsXsFs #0x1000c097307 2024-03-28T19:41:14.070764+0000
>>> caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
>>> getattr
>>>
>>> The client IDs map to several of our cluster nodes but the inode
>>> reference always refers to the same directory in these recent logs:
>>>
>>> /private/groups/shapirolab/brock/r2/cactus_coord
>>>
>>> That directory does not respond to an 'ls', but other directories
>>> directly above it do just fine. Maybe it's a bad cache item on the MDS?
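>>>
>>> If it is a stuck cache entry, I believe the MDS can dump its view of
>>> that inode (or the whole subtree), which might show what lock or caps it
>>> is stuck on. Something like this, though I'm not sure whether the inode
>>> argument wants hex or decimal:
>>>
>>> # ceph tell mds.slugfs.pr-md-02.sbblqq dump inode 0x1000c097307
>>> # ceph tell mds.slugfs.pr-md-02.sbblqq dump tree /private/groups/shapirolab/brock/r2/cactus_coord 0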
>>>
>>> # ceph health detail
>>> HEALTH_WARN 2 clients failing to advance oldest client/flush tid; 1 MDSs
>>> report slow requests; 1 MDSs behind on trimming
>>> [WRN] MDS_CLIENT_OLDEST_TID: 2 clients failing to advance oldest
>>> client/flush tid
>>> mds.slugfs.pr-md-02.sbblqq(mds.0): Client mustard failing to
>>> advance its oldest client/flush tid. client_id: 101305
>>> mds.slugfs.pr-md-01.xdtppo(mds.1): Client failing to advance its
>>> oldest client/flush tid. client_id: 101305
>>> [WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
>>> mds.slugfs.pr-md-02.sbblqq(mds.0): 201 slow requests are blocked >
>>> 30 secs
>>> [WRN] MDS_TRIM: 1 MDSs behind on trimming
>>> mds.slugfs.pr-md-02.sbblqq(mds.0): Behind on trimming (4786/250)
>>> max_segments: 250, num_segments: 4786
>>>
>>> I think that this is somehow causing the "slow requests" on the nodes
>>> listed in the logs, as that directory is inaccessible. And maybe the
>>> 'behind on trimming' part is also related, as it can't trim past that
>>> inode or something?
>>>
>>> If I restart the MDS daemon this will clear (I've done it before), but
>>> it just comes back, often somewhere under the same directory,
>>> /private/groups/shapirolab/brock/...[something].
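>>>
>>> (For reference, the way I've been clearing it is just bouncing the stuck
>>> daemon so a standby takes over the rank, roughly one of:
>>>
>>> # ceph mds fail slugfs.pr-md-02.sbblqq
>>> # ceph orch daemon restart mds.slugfs.pr-md-02.sbblqq
>>>
>>> with our cephadm-style daemon names.)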
>>>
>>> -erich
>>>
>>> On 3/28/24 10:11 AM, Erich Weiler wrote:
>>>> Here are some of the MDS logs:
>>>>
>>>> Mar 27 11:58:25 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
>>>> log [WRN] : slow request 511.703289 seconds old, received at
>>>> 2024-03-27T18:49:53.623192+0000: client_request(client.99375:459393
>>>> getattr AsXsFs #0x100081b9ceb 2024-03-27T18:49:53.620806+0000
>>>> caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
>>>> getattr
>>>> Mar 27 11:58:25 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
>>>> log [WRN] : slow request 690.189459 seconds old, received at
>>>> 2024-03-27T18:46:55.137022+0000: client_request(client.99445:4189994
>>>> getattr AsXsFs #0x100081b9ceb 2024-03-27T18:46:55.134857+0000
>>>> caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
>>>> getattr
>>>> Mar 27 11:58:25 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
>>>> log [WRN] : slow request 686.308604 seconds old, received at
>>>> 2024-03-27T18:46:59.017876+0000: client_request(client.99445:4190508
>>>> getattr AsXsFs #0x100081b9ceb 2024-03-27T18:46:59.018864+0000
>>>> caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
>>>> getattr
>>>> Mar 27 11:58:25 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
>>>> log [WRN] : slow request 686.156943 seconds old, received at
>>>> 2024-03-27T18:46:59.169537+0000: client_request(client.99400:591887
>>>> getattr AsXsFs #0x100081b9ceb 2024-03-27T18:46:59.170644+0000
>>>> caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
>>>> getattr
>>>> Mar 27 11:58:26 pr-md-01.prism ceph-mds[1296468]:
>>>> mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16631 from mon.0
>>>> Mar 27 11:58:30 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
>>>> log [WRN] : 16 slow requests, 0 included below; oldest blocked for >
>>>> 699.385743 secs
>>>> Mar 27 11:58:34 pr-md-01.prism ceph-mds[1296468]:
>>>> mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16632 from mon.0
>>>> Mar 27 11:58:35 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
>>>> log [WRN] : 16 slow requests, 0 included below; oldest blocked for >
>>>> 704.385896 secs
>>>> Mar 27 11:58:38 pr-md-01.prism ceph-mds[1296468]:
>>>> mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16633 from mon.0
>>>> Mar 27 11:58:40 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
>>>> log [WRN] : 16 slow requests, 0 included below; oldest blocked for >
>>>> 709.385979 secs
>>>> Mar 27 11:58:42 pr-md-01.prism ceph-mds[1296468]:
>>>> mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16634 from mon.0
>>>> Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
>>>> log [WRN] : 78 slow requests, 5 included below; oldest blocked for >
>>>> 714.386040 secs
>>>> Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
>>>> log [WRN] : slow request 710.189838 seconds old, received at
>>>> 2024-03-27T18:46:55.137022+0000: client_request(client.99445:4189994
>>>> getattr AsXsFs #0x100081b9ceb 2024-03-27T18:46:55.134857+0000
>>>> caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
>>>> getattr
>>>> Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
>>>> log [WRN] : slow request 706.308983 seconds old, received at
>>>> 2024-03-27T18:46:59.017876+0000: client_request(client.99445:4190508
>>>> getattr AsXsFs #0x100081b9ceb 2024-03-27T18:46:59.018864+0000
>>>> caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
>>>> getattr
>>>> Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
>>>> log [WRN] : slow request 706.157322 seconds old, received at
>>>> 2024-03-27T18:46:59.169537+0000: client_request(client.99400:591887
>>>> getattr AsXsFs #0x100081b9ceb 2024-03-27T18:46:59.170644+0000
>>>> caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
>>>> getattr
>>>> Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
>>>> log [WRN] : slow request 706.086751 seconds old, received at
>>>> 2024-03-27T18:46:59.240108+0000: client_request(client.99400:591894
>>>> getattr AsXsFs #0x100081b9ceb 2024-03-27T18:46:59.242644+0000
>>>> caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
>>>> getattr
>>>> Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
>>>> log [WRN] : slow request 705.196030 seconds old, received at
>>>> 2024-03-27T18:47:00.130829+0000: client_request(client.99400:591985
>>>> getattr AsXsFs #0x100081b9ceb 2024-03-27T18:47:00.130641+0000
>>>> caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
>>>> getattr
>>>> Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]:
>>>> mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16635 from mon.0
>>>> Mar 27 11:58:50 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
>>>> log [WRN] : 16 slow requests, 0 included below; oldest blocked for >
>>>> 719.386116 secs
>>>> Mar 27 11:58:53 pr-md-01.prism ceph-mds[1296468]:
>>>> mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16636 from mon.0
>>>> Mar 27 11:58:55 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
>>>> log [WRN] : 16 slow requests, 0 included below; oldest blocked for >
>>>> 724.386184 secs
>>>> Mar 27 11:58:57 pr-md-01.prism ceph-mds[1296468]:
>>>> mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16637 from mon.0
>>>> Mar 27 11:59:00 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
>>>> log [WRN] : 16 slow requests, 0 included below; oldest blocked for >
>>>> 729.386333 secs
>>>> Mar 27 11:59:02 pr-md-01.prism ceph-mds[1296468]:
>>>> mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16638 from mon.0
>>>> Mar 27 11:59:05 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
>>>> log [WRN] : 53 slow requests, 5 included below; oldest blocked for >
>>>> 734.386400 secs
>>>> Mar 27 11:59:05 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
>>>> log [WRN] : slow request 730.190197 seconds old, received at
>>>> 2024-03-27T18:46:55.137022+0000: client_request(client.99445:4189994
>>>> getattr AsXsFs #0x100081b9ceb 2024-03-27T18:46:55.134857+0000
>>>> caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
>>>> getattr
>>>>
>>>> Can we tell which client the slow requests are coming from? It says
>>>> stuff like "client.99445:4189994" but I don't know how to map that to
>>>> a client...
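>>>>
>>>> I'm guessing the session list on the MDS would show it, i.e. something
>>>> like this and then looking for the entry whose "id" matches (hostname
>>>> should be under client_metadata), but I haven't dug into it yet:
>>>>
>>>> # ceph tell mds.slugfs.pr-md-01.xdtppo session ls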
>>>>
>>>> Thanks for the response!
>>>>
>>>> -erich
>>>>
>>>> On 3/27/24 21:28, Xiubo Li wrote:
>>>>>
>>>>> On 3/28/24 04:03, Erich Weiler wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> I've been battling this for a while and I'm not sure where to go
>>>>>> from here. I have a Ceph health warning as such:
>>>>>>
>>>>>> # ceph -s
>>>>>> cluster:
>>>>>> id: 58bde08a-d7ed-11ee-9098-506b4b4da440
>>>>>> health: HEALTH_WARN
>>>>>> 1 MDSs report slow requests
>>>>>
>>>>> There were slow requests, and I suspect the behind-on-trimming was
>>>>> caused by them.
>>>>>
>>>>> Could you share the logs about the slow requests? What are they?
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>> 1 MDSs behind on trimming
>>>>>>
>>>>>> services:
>>>>>> mon: 5 daemons, quorum
>>>>>> pr-md-01,pr-md-02,pr-store-01,pr-store-02,pr-md-03 (age 5d)
>>>>>> mgr: pr-md-01.jemmdf(active, since 3w), standbys:
>>>>>> pr-md-02.emffhz
>>>>>> mds: 1/1 daemons up, 2 standby
>>>>>> osd: 46 osds: 46 up (since 9h), 46 in (since 2w)
>>>>>>
>>>>>> data:
>>>>>> volumes: 1/1 healthy
>>>>>> pools: 4 pools, 1313 pgs
>>>>>> objects: 260.72M objects, 466 TiB
>>>>>> usage: 704 TiB used, 424 TiB / 1.1 PiB avail
>>>>>> pgs: 1306 active+clean
>>>>>> 4 active+clean+scrubbing+deep
>>>>>> 3 active+clean+scrubbing
>>>>>>
>>>>>> io:
>>>>>> client: 123 MiB/s rd, 75 MiB/s wr, 109 op/s rd, 1.40k op/s wr
>>>>>>
>>>>>> And the specifics are:
>>>>>>
>>>>>> # ceph health detail
>>>>>> HEALTH_WARN 1 MDSs report slow requests; 1 MDSs behind on trimming
>>>>>> [WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
>>>>>> mds.slugfs.pr-md-01.xdtppo(mds.0): 99 slow requests are
>>>>>> blocked >
>>>>>> 30 secs
>>>>>> [WRN] MDS_TRIM: 1 MDSs behind on trimming
>>>>>> mds.slugfs.pr-md-01.xdtppo(mds.0): Behind on trimming
>>>>>> (13884/250)
>>>>>> max_segments: 250, num_segments: 13884
>>>>>>
>>>>>> That "num_segments" number slowly keeps increasing. I
suspect I just
>>>>>> need to tell the MDS servers to trim faster but after hours of
>>>>>> googling around I just can't figure out the best way to do
it. The
>>>>>> best I could come up with was to decrease
"mds_cache_trim_decay_rate"
>>>>>> from 1.0 to .8 (to start), based on this page:
>>>>>>
>>>>>>
https://www.suse.com/support/kb/doc/?id=000019740
>>>>>>
>>>>>> But it doesn't seem to help, maybe I should decrease it
further? I am
>>>>>> guessing this must be a common issue...? I am running Reef on
the
>>>>>> MDS servers, but most clients are on Quincy.
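>>>>>>
>>>>>> (In case the exact syntax matters, I made the change with "ceph
>>>>>> config", roughly:
>>>>>>
>>>>>> # ceph config set mds mds_cache_trim_decay_rate 0.8
>>>>>> # ceph config get mds mds_cache_trim_decay_rate
>>>>>>
>>>>>> so do let me know if that's the wrong knob for trimming log segments.)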
>>>>>>
>>>>>> Thanks for any advice!
>>>>>>
>>>>>> cheers,
>>>>>> erich
>>>>>