Hi Nigel,
For your issue I have created a dedicated tracker; please see
https://tracker.ceph.com/issues/65630.
I have found the root cause and am still working out the proper way to
fix it. Please watch the tracker.
Thanks
- Xiubo
On 4/18/24 14:22, Nigel Williams wrote:
> Hi Xiubo,
>
> Is the issue we provided logs for the same one as Erich's, or is that
> a third, different locking issue?
>
> thanks,
> nigel.
>
> On Thu, 18 Apr 2024 at 12:29, Xiubo Li <xiubli(a)redhat.com> wrote:
>
>
> On 4/18/24 08:57, Erich Weiler wrote:
> >> Have you already shared information about this issue? Please do
> >> if not.
> >
> > I am working with Xiubo Li and providing debugging information - in
> > progress!
> >
> From the blocked ops output it looks very similar to the lock order
> issue that Patrick fixed before.
>
> I am still waiting for the complete debug logs from Erich.
>
> And the lock order PR is under review.
>
> - Xiubo
>
>
> >>> I was wondering if it would be included in 18.2.3 which I *think*
> >>> should be released soon? Is there any way of knowing if that is true?
> >> This PR is primarily a debugging tool. It will not make 18.2.3 as
> >> it's not even merged to main yet.
> >
> > Ah, OK. I hope some solution can be had soon for this item if Xiubo
> > figures it out - it's requiring constant attention to keep my
> > filesystem from hanging, or restarting the MDS daemons multiple times
> > a day to "unstick" the filesystem on random cluster nodes. We think
> > it's due to lock contention/deadlocking.
> >
> > Possibly it's not affecting others as much as me... We have an HPC
> > cluster hammering the filesystem (18.2.1) and the MDS daemons seem to
> > be reporting lock issues pretty frequently while nodes and processes
> > fight to get file and directory locks, and deadlock (we think).
> >
> > I'll keep working with Xiubo.
> >
> > -erich
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>