Hi Xiubo,
Is there any way we could get a development build of the PR to upgrade
to, so we can test whether the lock order bug per Bug #62123 could be
the answer? Although I'm not sure that bug has been fixed yet?
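If it helps, here is roughly what I would expect to run on our end,
assuming the fix gets published as a test container image (for example
on quay.io/ceph-ci) and assuming a cephadm-managed deployment - both of
which are assumptions on my part, so please correct me:

# <build-tag> is a placeholder for whatever tag you point me at
ceph orch upgrade start --image quay.io/ceph-ci/ceph:<build-tag>
ceph orch upgrade status
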
-erich
On 4/21/24 9:39 PM, Xiubo Li wrote:
> Hi Erich,
>
> I raised one tracker for this: https://tracker.ceph.com/issues/65607.
>
> Currently I haven't figured out what was holding the 'dn->lock' in the
> 'lookup' request or elsewhere, since there are no debug logs.
>
> Hopefully we can get the debug logs, with which we can push this further.
>
> Thanks
>
> - Xiubo
>
> On 4/19/24 23:55, Erich Weiler wrote:
>> Hi Xiubo,
>>
>> Never mind, I was wrong: most of the blocked ops were 12 hours old. Ugh.
>>
>> I restarted the MDS daemon to clear them.
>>
>> I just reset to having one active MDS instead of two; let's see if
>> that makes a difference.
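>>
>> (For reference, that change was just the following, with 'cephfs' as a
>> placeholder for the real filesystem name:)
>>
>> ceph fs set cephfs max_mds 1
>> ceph fs status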
>>
>> I am beginning to think it may be impossible to catch the logs that
>> matter here. I feel like sometimes the blocked ops are just waiting
>> because of load, and sometimes they are waiting because they are stuck.
>> It's really hard to tell which without waiting a while. But I can't
>> wait with debug turned on, because my root disks (which are 150 GB)
>> fill up with debug logs in 20 minutes. So it almost seems that unless
>> I could somehow store many TB of debug logs we won't be able to catch
>> this.
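>>
>> One idea I may try - assuming our MDS hosts are plain package installs
>> using the default log path, and that the systemd unit id matches the
>> hostname, both of which I'd need to double-check - is to move the log
>> directory onto a bigger data volume before turning debug up:
>>
>> systemctl stop ceph-mds@$(hostname -s)
>> mv /var/log/ceph /bigdata/ceph-logs    # /bigdata is a placeholder mount
>> ln -s /bigdata/ceph-logs /var/log/ceph
>> systemctl start ceph-mds@$(hostname -s)
>>
>> But even that only buys so much room.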
>>
>> Let's see how having one MDS helps. Or maybe I actually need like 4
>> MDSs because the load is too high for only one or two. I don't know.
>> Or maybe it's the lock issue you've been working on. I guess I can
>> test the lock order fix once it's available.
>>
>> -erich
>>
>> On 4/19/24 7:26 AM, Erich Weiler wrote:
>>> So I woke up this morning and checked the blocked_ops again; there
>>> were 150 of them. But the age of each ranged from 500 to 4300
>>> seconds. So it seems as if they are eventually being processed.
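>>>
>>> For reference, I'm pulling these from the MDS admin socket on each MDS
>>> host ('mds.<name>' below is a placeholder, and I'm assuming the
>>> blocked-ops dump has the same per-op 'age' field as the regular ops
>>> dump):
>>>
>>> ceph daemon mds.<name> dump_blocked_ops | jq '.ops | length'
>>> ceph daemon mds.<name> dump_blocked_ops | jq '[.ops[].age] | max'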
>>>
>>> I wonder if we are thinking about this in the wrong way? Maybe I
>>> should be *adding* MDS daemons because my current ones are overloaded?
>>>
>>> Can a single server hold multiple MDS daemons? Right now I have
>>> three physical servers each with one MDS daemon on it.
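>>>
>>> I'm guessing - and this is only a guess, assuming a cephadm-managed
>>> cluster, with the hostnames and counts below as placeholders - that it
>>> would be a placement spec along these lines:
>>>
>>> cat > mds-spec.yaml <<EOF
>>> service_type: mds
>>> service_id: cephfs
>>> placement:
>>>   hosts:
>>>     - mds-host-1
>>>     - mds-host-2
>>>     - mds-host-3
>>>   count_per_host: 2
>>> EOF
>>> ceph orch apply -i mds-spec.yaml
>>> ceph fs set cephfs max_mds 4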
>>>
>>> I can still try reducing to one. And I'll keep an eye on blocked ops
>>> to see if any get to a very old age (and are thus wedged).
>>>
>>> -erich
>>>
>>> On 4/18/24 8:55 PM, Xiubo Li wrote:
>>>> Okay, please try setting only one active MDS.
>>>>
>>>>
>>>> On 4/19/24 11:54, Erich Weiler wrote:
>>>>> We have 2 active MDS daemons and one standby.
>>>>>
>>>>> On 4/18/24 8:52 PM, Xiubo Li wrote:
>>>>>> BTW, how many active MDS daemons are you using?
>>>>>>
>>>>>>
>>>>>> On 4/19/24 10:55, Erich Weiler wrote:
>>>>>>> OK, I'm sure I caught it in the right order this time; the logs
>>>>>>> should definitely show when the blocked/slow requests start.
>>>>>>> Check out these logs and dumps:
>>>>>>>
>>>>>>> http://hgwdev.gi.ucsc.edu/~weiler/
>>>>>>>
>>>>>>> It's a 762 MB tarball but it uncompresses to 16 GB.
>>>>>>>
>>>>>>> -erich
>>>>>>>
>>>>>>>
>>>>>>> On 4/18/24 6:57 PM, Xiubo Li wrote:
>>>>>>>> Okay, could you try this with 18.2.0?
>>>>>>>>
>>>>>>>> I suspect it was introduced by:
>>>>>>>>
>>>>>>>> commit e610179a6a59c463eb3d85e87152ed3268c808ff
>>>>>>>> Author: Patrick Donnelly <pdonnell(a)redhat.com>
>>>>>>>> Date:   Mon Jul 17 16:10:59 2023 -0400
>>>>>>>>
>>>>>>>>     mds: drop locks and retry when lock set changes
>>>>>>>>
>>>>>>>>     An optimization was added to avoid an unnecessary gather on the inode
>>>>>>>>     filelock when the client can safely get the file size without also
>>>>>>>>     getting issued the requested caps. However, if a retry of getattr
>>>>>>>>     is necessary, this conditional inclusion of the inode filelock
>>>>>>>>     can cause lock-order violations resulting in deadlock.
>>>>>>>>
>>>>>>>>     So, if we've already acquired some of the inode's locks
>>>>>>>>     then we must drop locks and retry.
>>>>>>>>
>>>>>>>>     Fixes: https://tracker.ceph.com/issues/62052
>>>>>>>>     Fixes: c822b3e2573578c288d170d1031672b74e02dced
>>>>>>>>     Signed-off-by: Patrick Donnelly <pdonnell(a)redhat.com>
>>>>>>>>     (cherry picked from commit b5719ac32fe6431131842d62ffaf7101c03e9bac)
>>>>>>>>
>>>>>>>>
>>>>>>>> On 4/19/24 09:54, Erich Weiler wrote:
>>>>>>>>> I'm on 18.2.1. I think I may have gotten the timing off on the
>>>>>>>>> logs and dumps, so I'll try again. It's just really hard to capture
>>>>>>>>> because I need to kind of be looking at it in real time to catch it.
>>>>>>>>> Hang on, lemme see if I can get another capture...
>>>>>>>>>
>>>>>>>>> -erich
>>>>>>>>>
>>>>>>>>> On 4/18/24 6:35 PM, Xiubo Li wrote:
>>>>>>>>>>
>>>>>>>>>> BTW, which Ceph version are you using?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 4/12/24 04:22, Erich Weiler wrote:
>>>>>>>>>>> BTW - it just happened again. I upped the debugging settings
>>>>>>>>>>> as you instructed and got more dumps (then returned the debug
>>>>>>>>>>> settings to normal).
>>>>>>>>>>>
>>>>>>>>>>> Attached are the new dumps.
>>>>>>>>>>>
>>>>>>>>>>> Thanks again,
>>>>>>>>>>> erich
>>>>>>>>>>>
>>>>>>>>>>> On 4/9/24 9:00 PM, Xiubo Li wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 4/10/24 11:48, Erich Weiler wrote:
>>>>>>>>>>>>>>> Does that mean it could be the lock order bug
>>>>>>>>>>>>>>> (https://tracker.ceph.com/issues/62123) as Xiubo suggested?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have raised one PR to fix the lock order issue; if possible,
>>>>>>>>>>>>>> please give it a try and see whether it resolves this issue.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you! Yeah, this issue is happening every couple days
>>>>>>>>>>>>> now. It just happened again today and I got more MDS dumps.
>>>>>>>>>>>>> If it would help, let me know and I can send them!
>>>>>>>>>>>>>
>>>>>>>>>>>> Once this happens, it would be better if you could enable the MDS
>>>>>>>>>>>> debug logs:
>>>>>>>>>>>>
>>>>>>>>>>>> debug mds = 20
>>>>>>>>>>>>
>>>>>>>>>>>> debug ms = 1
>>>>>>>>>>>>
>>>>>>>>>>>> And then provide the debug logs together with the MDS dumps.
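>>>>>>>>>>>>
>>>>>>>>>>>> For example, one way to set these at runtime and revert afterwards:
>>>>>>>>>>>>
>>>>>>>>>>>> ceph config set mds debug_mds 20
>>>>>>>>>>>> ceph config set mds debug_ms 1
>>>>>>>>>>>>
>>>>>>>>>>>> # revert once you have captured the logs
>>>>>>>>>>>> ceph config rm mds debug_mds
>>>>>>>>>>>> ceph config rm mds debug_ms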
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> I assume if this fix is approved and backported it will
>>>>>>>>>>>>> then appear in like 18.2.3 or something?
>>>>>>>>>>>>>
>>>>>>>>>>>> Yeah, it will be backported after being well tested.
>>>>>>>>>>>>
>>>>>>>>>>>> - Xiubo
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks again,
>>>>>>>>>>>>> erich
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>