kernel client osdc ops stuck and mds slow reqs

Dan van der Ster

31 Jan 2020

2:05 a.m.

Hi all,

We are quite regularly (a couple of times per week) seeing:

HEALTH_WARN 1 clients failing to respond to capability release; 1 MDSs report slow requests
MDS_CLIENT_LATE_RELEASE 1 clients failing to respond to capability release
    mdshpc-be143(mds.0): Client hpc-be028.cern.ch: failing to respond to capability release client_id: 52919162
MDS_SLOW_REQUEST 1 MDSs report slow requests
    mdshpc-be143(mds.0): 1 slow requests are blocked > 30 secs

This is caused by osdc ops stuck in a kernel client, e.g.:

10:57:18 root hpc-be028 /root
→ cat /sys/kernel/debug/ceph/4da6fd06-b069-49af-901f-c9513baabdbd.client52919162/osdc
REQUESTS 9 homeless 0
46559317 osd243 3.ee6ffcdb 3.cdb [243,501,92]/243 [243,501,92]/243 e678697 fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a01.00000057 0x400014 1 read
46559322 osd243 3.ee6ffcdb 3.cdb [243,501,92]/243 [243,501,92]/243 e678697 fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a01.00000057 0x400014 1 read
46559323 osd243 3.969cc573 3.573 [243,330,226]/243 [243,330,226]/243 e678697 fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a56.00000056 0x400014 1 read
46559341 osd243 3.969cc573 3.573 [243,330,226]/243 [243,330,226]/243 e678697 fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a56.00000056 0x400014 1 read
46559342 osd243 3.969cc573 3.573 [243,330,226]/243 [243,330,226]/243 e678697 fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a56.00000056 0x400014 1 read
46559345 osd243 3.969cc573 3.573 [243,330,226]/243 [243,330,226]/243 e678697 fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a56.00000056 0x400014 1 read
46559621 osd243 3.6313e8ef 3.8ef [243,330,521]/243 [243,330,521]/243 e678697 fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a45.0000007a 0x400014 1 read
46559629 osd243 3.b280c852 3.852 [243,113,539]/243 [243,113,539]/243 e678697 fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a3a.0000007f 0x400014 1 read
46559928 osd243 3.1ee7bab4 3.ab4 [243,332,94]/243 [243,332,94]/243 e678697 fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f099ff.0000073f 0x400024 1 write
LINGER REQUESTS
BACKOFFS

We can unblock those requests by doing `ceph osd down osd.243` (or restarting osd.243).

This is ceph v14.2.6 and the client kernel is el7 3.10.0-957.27.2.el7.x86_64.

Is there a better way to debug this?

Best Regards,

Dan
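The debugfs `osdc` listing only shows what is in flight at one instant, so a single dump cannot tell a slow request from a stuck one. A minimal Python sketch, assuming the column layout shown above (request tid in the first column, target `osdN` in the second; other kernel versions may format this file differently), compares two snapshots and counts, per OSD, the requests still outstanding in both. An OSD that keeps accumulating such requests would be the candidate for the `ceph osd down` workaround. The function names are hypothetical.

```python
import glob
from collections import Counter


def parse_osdc(text: str) -> dict[int, str]:
    """Map request tid -> target OSD for the REQUESTS section of an osdc dump.

    Assumes the layout shown above: tid in the first column, "osdN" in the
    second. LINGER REQUESTS and BACKOFFS entries are ignored.
    """
    ops: dict[int, str] = {}
    in_requests = False
    for line in text.splitlines():
        if line.startswith("REQUESTS"):
            in_requests = True
            continue
        if line.startswith(("LINGER REQUESTS", "BACKOFFS")):
            in_requests = False
            continue
        fields = line.split()
        if in_requests and len(fields) >= 2 and fields[0].isdigit() and fields[1].startswith("osd"):
            ops[int(fields[0])] = fields[1]
    return ops


def persistent_ops_by_osd(before: str, after: str) -> Counter:
    """Count, per OSD, requests present (same tid, same OSD) in both snapshots."""
    first, second = parse_osdc(before), parse_osdc(after)
    return Counter(osd for tid, osd in second.items() if first.get(tid) == osd)


def snapshot_all(pattern: str = "/sys/kernel/debug/ceph/*/osdc") -> dict[str, str]:
    """Read every matching osdc file; needs root and a mounted debugfs."""
    snapshots: dict[str, str] = {}
    for path in glob.glob(pattern):
        with open(path) as f:
            snapshots[path] = f.read()
    return snapshots
```

For example, call `snapshot_all()` twice, a minute or so apart, and pass `before[path]` and `after[path]` to `persistent_ops_by_osd` for each path. The one-minute interval is arbitrary; it only needs to be well above normal op latency.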


Ilya Dryomov

31 Jan 2020

2:33 a.m.

On Fri, Jan 31, 2020 at 11:06 AM Dan van der Ster <dan(a)vanderster.com> wrote:

...

Hi Dan,

I assume that these ops don't show up as slow requests on the OSD side? How long did you see it stuck for before intervening?

Do you happen to have "debug ms = 1" logs from osd243?

Do you have PG autoscaler enabled? Any PG splits and/or merges at the time?

Thanks,

Ilya

Dan van der Ster

7:56 a.m.

Hi Ilya,

On Fri, Jan 31, 2020 at 11:33 AM Ilya Dryomov <idryomov(a)gmail.com> wrote:

...

> On Fri, Jan 31, 2020 at 11:06 AM Dan van der Ster <dan(a)vanderster.com> wrote:
>
> Hi Dan,
>
> I assume that these ops don't show up as slow requests on the OSD side? How long did you see it stuck for before intervening?

That's correct -- the osd had no active ops (ceph daemon.... ops). The late release slow req was stuck for 4129s before we intervened.

...

> Do you happen to have "debug ms = 1" logs from osd243?

Nope, but I can try to get it afterwards next time. (Though you need it at the moment the ops get stuck, not only from the moment we notice the stuck ops, right?)

...

> Do you have PG autoscaler enabled? Any PG splits and/or merges at the time?

Not on the cephfs_(meta)data pools (though on the 30th I increased those pool sizes from 2 to 3). Also on the 30th I did some PG merging on an unrelated test pool. And anyway, we have seen this type of lockup in the past without those pool changes (also while running the mimic MDS, before we upgraded to nautilus).

Looking further back in the client's kernel log, we see a page alloc failure on the 30th:

Jan 30 16:16:35 hpc-be028.cern.ch kernel: kworker/1:36: page allocation failure: order:5, mode:0x104050
Jan 30 16:16:35 hpc-be028.cern.ch kernel: CPU: 1 PID: 78445 Comm: kworker/1:36 Kdump: loaded Tainted: P
Jan 30 16:16:35 hpc-be028.cern.ch kernel: Workqueue: ceph-msgr ceph_con_workfn [libceph]

The machine is running HPC jobs, so there is memory pressure and likely fragmentation due to hugepages being enabled. (But other instances of stuck ops didn't have any page alloc failures.)

Here is the client dmesg when we restarted the OSD with stuck ops:

Jan 31 10:57:59 hpc-be028.cern.ch kernel: libceph: osd243 down
Jan 31 10:58:17 hpc-be028.cern.ch kernel: libceph: osd243 up

(There was nothing else adjacent in time.)

Cheers,

Dan
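Whether an order-5 request (32 physically contiguous pages) can be satisfied depends on how fragmented free memory is. As a rough indicator only, the kernel's `/proc/buddyinfo` lists, per NUMA node and zone, the number of free blocks at each order; any free block at order 5 or above could in principle be split to serve such a request. A minimal Python sketch with hypothetical helper names, assuming the usual `Node N, zone NAME c0 c1 ...` layout:

```python
def parse_buddyinfo(text: str) -> dict[tuple[str, str], list[int]]:
    """Map (node, zone) -> free-block counts indexed by order."""
    zones: dict[tuple[str, str], list[int]] = {}
    for line in text.splitlines():
        # "Node 0, zone   Normal   12  8 ..." -> ["Node", "0", "zone", "Normal", "12", "8", ...]
        fields = line.replace(",", " ").split()
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue
        zones[(fields[1], fields[3])] = [int(x) for x in fields[4:]]
    return zones


def free_blocks_at_or_above(counts: list[int], order: int) -> int:
    """Number of free blocks large enough for a 2**order-page allocation."""
    return sum(counts[order:])


def read_buddyinfo(path: str = "/proc/buddyinfo") -> dict[tuple[str, str], list[int]]:
    """Parse the live buddyinfo file on the local machine."""
    with open(path) as f:
        return parse_buddyinfo(f.read())
```

On the client, printing `free_blocks_at_or_above(counts, 5)` for each zone from `read_buddyinfo()` while jobs are running would show whether the Normal zone is running out of order-5 blocks. A non-zero count does not guarantee success, since zone watermarks and the allocation flags also matter.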

Ilya Dryomov

9:32 a.m.

On Fri, Jan 31, 2020 at 4:57 PM Dan van der Ster <dan(a)vanderster.com> wrote:

...

> Nope, but I can try to get it afterwards next time. (Though you need it at the moment the ops get stuck, not only from the moment we notice the stuck ops, right?)

Yes, starting before the moment the ops get stuck and ending after you kick the OSD.

...

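>> Do you have PG autoscaler enabled? Any PG splits and/or merges at the time?
>
> Not on the cephfs_(meta)data pools (though on the 30th I increased those pool sizes from 2 to 3). And also on the 30th I did some PG merging on an unrelated test pool. And anyway we have seen this type of lockup in the past, without those pool changes (also with mimic MDS until we upgraded to nautilus).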

The MDS is out of question here. This issue is between the kernel client and the OSD.

...

> Looking back further in the client's kernel log we see a page alloc failure on the 30th:
>
> Jan 30 16:16:35 hpc-be028.cern.ch kernel: kworker/1:36: page allocation failure: order:5, mode:0x104050
> Jan 30 16:16:35 hpc-be028.cern.ch kernel: CPU: 1 PID: 78445 Comm: kworker/1:36 Kdump: loaded Tainted: P
> Jan 30 16:16:35 hpc-be028.cern.ch kernel: Workqueue: ceph-msgr ceph_con_workfn [libceph]

Can you share the stack trace? That's a 128k allocation, so worth taking a look.

Thanks,

Ilya
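The 128k figure follows from the `order:5` field in the failure message: the page allocator hands out blocks of 2^order physically contiguous pages, so with 4 KiB pages (an assumption that holds for x86_64) order 5 is 32 pages, or 128 KiB. A small Python sketch of that arithmetic, with a hypothetical helper that pulls the order out of a log line in the format quoted above:

```python
import re
from typing import Optional

PAGE_SIZE = 4096  # assumption: 4 KiB base pages, as on x86_64

_ORDER = re.compile(r"page allocation failure: order:(\d+)")


def allocation_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    """Size of one buddy-allocator block of 2**order contiguous pages."""
    return (1 << order) * page_size


def failed_order(log_line: str) -> Optional[int]:
    """Extract the order from a 'page allocation failure' line, if present."""
    m = _ORDER.search(log_line)
    return int(m.group(1)) if m else None


line = ("Jan 30 16:16:35 hpc-be028.cern.ch kernel: kworker/1:36: "
        "page allocation failure: order:5, mode:0x104050")
order = failed_order(line)
print(order, allocation_bytes(order) // 1024, "KiB")  # 5 128 KiB
```

Large contiguous requests like this are the ones most likely to fail on a long-running, fragmented node, which is why the order matters more than the total amount of free memory.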

Dan van der Ster

3 Feb 2020

1:37 a.m.

On Fri, Jan 31, 2020 at 6:32 PM Ilya Dryomov <idryomov(a)gmail.com> wrote:

...

> Can you share the stack trace? That's a 128k allocation, so worth taking a look.

Pasted here: https://pastebin.com/neyah54k

The same node had a lockup again last night. The paste also includes the page alloc failure and the resulting dump of the osdmap "corruption".

-- Dan

Ilya Dryomov

2:50 a.m.

On Mon, Feb 3, 2020 at 10:38 AM Dan van der Ster <dan(a)vanderster.com> wrote:

...

>> Can you share the stack trace? That's a 128k allocation, so worth taking a look.
>
> Pasted here: https://pastebin.com/neyah54k

This should be fixed with the following commits in 5.4:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

...

> The same node had a lockup again last night.

Same as before? Mitigated by a restart of a single OSD?

Thanks,

Ilya

Dan van der Ster

3:06 a.m.

On Mon, Feb 3, 2020 at 11:50 AM Ilya Dryomov <idryomov(a)gmail.com> wrote:

...

> This should be fixed with
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
> in 5.4.

Ahh thanks. We'll open a ticket to RH for a backport to el7.x.

...

>> The same node had a lockup again last night.
>
> Same as before? Mitigated by a restart of a single OSD?

This time, the user noticed that IO was stuck (the stuck cp commands are in the paste). There were no slow reqs on the MDS or elsewhere in the cluster. Anyway, the stuck client had old osdc ops on osd.244, so we did `ceph osd down osd.244` and the client was un-stuck.

Cheers,

Dan

Kuhring, Mathias

20 Feb 2023

6:28 a.m.

Hey Dan, hey Ilya,

I know this issue is two years old already, but we are having similar issues. Do you know if the fixes ever got backported to RHEL kernels? I'm not looking for el7 but rather el8 fixes. I'm wondering if the patches were backported, in which case we shouldn't actually see these issues, or if we could maybe resolve them with a kernel upgrade.

Most active clients are currently on kernel versions such as:

4.18.0-348.el8.0.2.x86_64
4.18.0-348.2.1.el8_5.x86_64
4.18.0-348.7.1.el8_5.x86_64

while the cluster runs kernel 3.10.0-1160.42.2.el7.x86_64 and cephadm with ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy (stable).

I'm not sure if the cluster kernel is actually relevant here for the OSD <> kernel client connection.

Thank you for your help.

Best,
Mathias

Xiubo Li

4 p.m.

On 20/02/2023 22:28, Kuhring, Mathias wrote:

...

> Hey Dan, hey Ilya,
>
> I know this issue is two years old already, but we are having similar issues. Do you know if the fixes ever got backported to RHEL kernels?

It was backported to RHEL 8 a long time ago, starting with kernel-4.18.0-154.el8.
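To check which clients already carry the backport, here is a rough Python sketch, assuming `uname -r` strings like the ones in Mathias's message and the 4.18.0-154.el8 cutoff given above. It compares only the numeric release components before `.el8`; this is a simplification, not RPM's full version-comparison algorithm, and the helper name is hypothetical.

```python
import re

# Per this thread, the fix is in RHEL 8 kernels starting with 4.18.0-154.el8.
FIXED_EL8_RELEASE = (154,)

# Captures the numeric release part, e.g. "348.7.1" in "4.18.0-348.7.1.el8_5.x86_64".
_EL8_KERNEL = re.compile(r"^4\.18\.0-(?P<release>\d+(?:\.\d+)*)\.el8")


def el8_kernel_has_fix(uname_r: str) -> bool:
    """Rough check of a `uname -r` string against 4.18.0-154.el8.

    Simplified: compares only the numeric release components before ".el8",
    which is not the same as RPM's full version comparison.
    """
    m = _EL8_KERNEL.match(uname_r)
    if m is None:
        # Not an el8 4.18.0 kernel; per this thread el7 kernels did not get the fix,
        # and anything else is outside the scope of this sketch.
        return False
    release = tuple(int(part) for part in m.group("release").split("."))
    return release >= FIXED_EL8_RELEASE


for version in ("4.18.0-348.el8.0.2.x86_64",
                "4.18.0-348.2.1.el8_5.x86_64",
                "4.18.0-348.7.1.el8_5.x86_64",
                "3.10.0-1160.42.2.el7.x86_64"):
    print(version, el8_kernel_has_fix(version))
```

The three 4.18.0-348 client kernels report True (release 348 is above 154), while the el7 3.10.0 kernel reports False.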

...

> I'm not looking for el7 but rather el8 fixes. I'm wondering if the patches were backported, in which case we shouldn't actually see these issues, or if we could maybe resolve them with a kernel upgrade.
>
> Most active clients are currently on kernel versions such as:
>
> 4.18.0-348.el8.0.2.x86_64
> 4.18.0-348.2.1.el8_5.x86_64
> 4.18.0-348.7.1.el8_5.x86_64
>
> while the cluster runs kernel 3.10.0-1160.42.2.el7.x86_64 and cephadm with ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy (stable).

It seems it hasn't been backported to el7 yet.

Thanks,

...

> I'm not sure if the cluster kernel is actually relevant here for the OSD <> kernel client connection.
>
> Thank you for your help.
>
> Best,
> Mathias

_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io

Kuhring, Mathias

21 Feb 2023

1:04 a.m.

New subject: [ext] Re: Re: kernel client osdc ops stuck and mds slow reqs

Hey Li,

Thank you for the quick reply. So the kernel on the cluster nodes might be the issue here? I thought the client kernel was the only relevant one (since we use cephadm). Anyhow, we plan to upgrade the cluster nodes to Rocky 8 soon. We'll see if this helps with the issue.

Best,
Mathias

On 2/21/2023 1:00 AM, Xiubo Li wrote:

...

> On 20/02/2023 22:28, Kuhring, Mathias wrote:
>
>> Hey Dan, hey Ilya,
>>
>> I know this issue is two years old already, but we are having similar issues. Do you know if the fixes ever got backported to RHEL kernels?
>
> It was backported to RHEL 8 a long time ago, starting with kernel-4.18.0-154.el8.
>
> It seems it hasn't been backported to el7 yet.
>
> Thanks,

Ilya Dryomov

2:42 a.m.

On Tue, Feb 21, 2023 at 1:01 AM Xiubo Li <xiubli(a)redhat.com> wrote:

...

> On 20/02/2023 22:28, Kuhring, Mathias wrote:
>
>> Hey Dan, hey Ilya,
>>
>> I know this issue is two years old already, but we are having similar issues. Do you know if the fixes ever got backported to RHEL kernels?
>
> It was backported to RHEL 8 a long time ago, starting with kernel-4.18.0-154.el8.
>
> It seems it hasn't been backported to el7 yet.

"Yet" might be misleading here -- I don't think there is/was ever a plan to backport these fixes to RHEL 7.

...

>> I'm not sure if the cluster kernel is actually relevant here for the OSD <> kernel client connection.

If you are seeing page allocation failures only on the kernel client nodes, then it's not relevant.

Unless the stack trace is the same as in the original tracker [1] or Dan's paste [2] (note the ceph_osdmap_decode() -> osdmap_set_max_osd() -> krealloc() sequence), you are hitting a different issue. Pasting the entire splat(s) from the kernel log would be a good start.

[1] https://tracker.ceph.com/issues/40481
[2] https://pastebin.com/neyah54k

Thanks,

Ilya
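One way to act on this advice is to check a captured splat for the `ceph_osdmap_decode() -> osdmap_set_max_osd() -> krealloc()` chain before digging further. A minimal Python sketch, assuming kernel stack frames are printed as `name+0xOFF/0xLEN` and listed innermost call first (the usual layout); compiler suffixes such as `.isra.0` are stripped. It only checks that the three functions appear in call order, possibly with other frames in between, so a match is a hint to compare against the tracker and the paste by eye, not proof of the same bug. The function names are hypothetical.

```python
import re

# Outermost caller first, as in "ceph_osdmap_decode() -> osdmap_set_max_osd() -> krealloc()".
KNOWN_CHAIN = ("ceph_osdmap_decode", "osdmap_set_max_osd", "krealloc")

# A frame symbol such as "krealloc+0x4f/0xa0" or "foo.isra.3+0x10/0x20".
_FRAME = re.compile(r"([A-Za-z_][A-Za-z0-9_.]*)\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+")


def frames(splat: str) -> list[str]:
    """Function names in the order they appear in the trace, suffixes stripped."""
    return [name.split(".", 1)[0] for name in _FRAME.findall(splat)]


def has_call_chain(splat: str, chain: tuple[str, ...] = KNOWN_CHAIN) -> bool:
    """True if `chain` occurs, outermost first, as a subsequence of the trace."""
    remaining = iter(reversed(frames(splat)))  # traces list the innermost frame first
    # `name in remaining` advances the iterator, so the order of `chain` is enforced.
    return all(name in remaining for name in chain)
```

Feeding it the full text of each splat from `dmesg` or `/var/log/messages` gives a quick yes/no; a "no" suggests, per the reply above, a different issue.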

Kuhring, Mathias

23 Feb 2023

6:31 a.m.

New subject: [ext] Re: Re: kernel client osdc ops stuck and mds slow reqs

Hey Ilya,

I'm not sure if the things I find in the logs are actually related or useful, and I'm not really sure if I'm looking in the right places.

I enabled "debug_ms 1" for the OSDs as suggested above. But this filled up our host disks pretty fast, leading to e.g. monitors crashing. I disabled the debug messages again and trimmed the logs to free up space, but I made copies of two OSD log files which were involved in another capability release / slow requests issue. They are quite big (~3GB), and even if I remove things like the ping messages, I have more than 1 million lines just for the morning until the disk was full (around 7 hours). So now I'm wondering how to filter for the right things here.

When I grep for "error", I get a few of these messages:

{"log":"debug 2023-02-22T06:18:08.113+0000 7f15c5fff700 1 -- [v2:192.168.1.13:6881/4149819408,v1:192.168.1.13:6884/4149819408] \u003c== osd.161 v2:192.168.1.31:6835/1012436344 182573 ==== pg_update_log_missing(3.1a6s2 epoch 646235/644895 rep_tid 1014320 entries 646235'7672108 (0'0) error 3:65836dde:::10016e9b7c8.00000000:head by mds.0.1221974:8515830 0.000000 -2 ObjectCleanRegions clean_offsets: [0~18446744073709551615], clean_omap: 1, new_object: 0 trim_to 646178'7662340 roll_forward_to 646192'7672106) v3 ==== 261+0+0 (crc 0 0 0) 0x562d55e52380 con 0x562d8a2de400\n","stream":"stderr","time":"2023-02-22T06:18:08.115002765Z"}

And if I grep for "failed", I get a couple of these:

{"log":"debug 2023-02-22T06:15:25.242+0000 7f58bbf7c700 1 -- [v2:172.16.62.11:6829/3509070161,v1:172.16.62.11:6832/3509070161] \u003e\u003e 172.16.62.10:0/3127362489 conn(0x55ba06bf3c00 msgr2=0x55b9ce07e580 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed\n","stream":"stderr","time":"2023-02-22T06:15:25.243808392Z"}
{"log":"debug 2023-02-22T06:15:25.242+0000 7f58bbf7c700 1 --2- [v2:172.16.62.11:6829/3509070161,v1:172.16.62.11:6832/3509070161] \u003e\u003e 172.16.62.10:0/3127362489 conn(0x55ba06bf3c00 0x55b9ce07e580 crc :-1 s=READY pgs=2096664 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_read_frame_preamble_main read frame preamble failed r=-1 ((1) Operation not permitted)\n","stream":"stderr","time":"2023-02-22T06:15:25.243813528Z"}

I'm not sure if they are related to the issue.

In the kernel logs of the client (dmesg, journalctl or /var/log/messages), there seem to be no errors or stack traces in the relevant time periods. The only thing I can see is our restart of the relevant OSDs:

[Mi Feb 22 07:29:59 2023] libceph: osd90 down
[Mi Feb 22 07:30:34 2023] libceph: osd90 up
[Mi Feb 22 07:31:55 2023] libceph: osd93 down
[Mi Feb 22 08:37:50 2023] libceph: osd93 up

I noticed a socket closed for another client, but I assume that's more related to monitors failing due to full disks:

[Mi Feb 22 05:59:52 2023] libceph: mon2 (1)172.16.62.12:6789 socket closed (con state OPEN)
[Mi Feb 22 05:59:52 2023] libceph: mon2 (1)172.16.62.12:6789 session lost, hunting for new mon
[Mi Feb 22 05:59:52 2023] libceph: mon3 (1)172.16.62.13:6789 session established

Best,
Mathias

On 2/21/2023 11:42 AM, Ilya Dryomov wrote:

...

--
Mathias Kuhring

Dr. rer. nat.
Bioinformatician
HPC & Core Unit Bioinformatics
Berlin Institute of Health at Charité (BIH)

E-Mail: mathias.kuhring@bih-charite.de
Mobile: +49 172 3475576
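The OSD log excerpts above are wrapped one JSON object per line (`log`, `stream`, `time`), which looks like a container runtime's log format, so a plain grep also matches the wrapper and has to scan the whole file. A minimal Python sketch that streams such a file, unwraps each line, keeps only a time window, drops heartbeat traffic and yields the messages matching a pattern. The default exclusion `osd_ping\(` is an assumption about how OSD heartbeat messages appear with `debug_ms 1`; adjust both patterns to what the logs actually contain. The function name is hypothetical, and the window check assumes every `time` value is a UTC timestamp in the same ISO format, so plain string comparison orders them correctly.

```python
import json
import re
from typing import Iterable, Iterator


def filter_wrapped_log(lines: Iterable[str], start: str, end: str,
                       include: str, exclude: str = r"osd_ping\(") -> Iterator[str]:
    """Yield unwrapped messages with start <= time < end that match `include`.

    `start`/`end` are ISO-8601 prefixes such as "2023-02-22T06:00"; they are
    compared as strings, which works because the timestamps share one format.
    """
    want = re.compile(include)
    skip = re.compile(exclude) if exclude else None
    for raw in lines:
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # e.g. a line truncated when the disk filled up
        if not isinstance(record, dict):
            continue
        if not start <= record.get("time", "") < end:
            continue
        message = record.get("log", "").rstrip("\n")
        if skip is not None and skip.search(message):
            continue
        if want.search(message):
            yield message
```

For example, iterating `filter_wrapped_log(f, "2023-02-22T06:00", "2023-02-22T07:30", r"failed|error")` over an open log file `f` processes a 3GB file line by line instead of loading it into memory; narrowing `include` to the address or id of the affected client would cut the output further.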

Ilya Dryomov

8:03 a.m.

New subject: [ext] Re: Re: kernel client osdc ops stuck and mds slow reqs

On Thu, Feb 23, 2023 at 3:31 PM Kuhring, Mathias <mathias.kuhring(a)bih-charite.de> wrote:

...

Hi Mathias,

Then it's very unlikely to be a kernel client issue, meaning that you don't need to worry about your kernel versions.

...

> The only thing I can see is our restart of the relevant OSDs:
>
> [Mi Feb 22 07:29:59 2023] libceph: osd90 down
> [Mi Feb 22 07:30:34 2023] libceph: osd90 up
> [Mi Feb 22 07:31:55 2023] libceph: osd93 down
> [Mi Feb 22 08:37:50 2023] libceph: osd93 up
>
> I noticed a socket closed for another client, but I assume that's more related to monitors failing due to full disks:
>
> [Mi Feb 22 05:59:52 2023] libceph: mon2 (1)172.16.62.12:6789 socket closed (con state OPEN)
> [Mi Feb 22 05:59:52 2023] libceph: mon2 (1)172.16.62.12:6789 session lost, hunting for new mon
> [Mi Feb 22 05:59:52 2023] libceph: mon3 (1)172.16.62.13:6789 session established

Yeah, these are expected when a monitor or an OSD goes down.

Thanks,

Ilya
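To line these client-side events up with OSD and MDS logs, it can help to turn the `libceph: osdN down/up` pairs into outage durations. A small Python sketch, assuming the `dmesg -T` style prefix shown above (weekday, month, day, time, year); the weekday is skipped because it is localized ("Mi"), and the month is parsed with English abbreviations, which happens to match "Feb" here but would need a mapping for months whose local abbreviation differs. The helper name is hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable

# e.g. "[Mi Feb 22 07:29:59 2023] libceph: osd90 down"
_EVENT = re.compile(
    r"^\[\S+\s+(?P<ts>[A-Za-z]{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4})\]\s+"
    r"libceph: (?P<osd>osd\d+) (?P<state>down|up)\s*$"
)


def down_intervals(lines: Iterable[str]) -> list[tuple[str, float]]:
    """Return (osd, seconds) for every 'down' followed by a later 'up'."""
    went_down: dict[str, datetime] = {}
    intervals: list[tuple[str, float]] = []
    for line in lines:
        m = _EVENT.match(line.strip())
        if m is None:
            continue  # mon events and anything else are ignored
        ts = datetime.strptime(" ".join(m.group("ts").split()), "%b %d %H:%M:%S %Y")
        osd = m.group("osd")
        if m.group("state") == "down":
            went_down[osd] = ts
        elif osd in went_down:
            intervals.append((osd, (ts - went_down.pop(osd)).total_seconds()))
    return intervals
```

Fed the four OSD lines quoted above, it returns `[('osd90', 35.0), ('osd93', 3955.0)]`, i.e. osd90 was reported down for 35 seconds and osd93 for about 66 minutes.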


ceph-users@ceph.io


participants (4)

Dan van der Ster
Ilya Dryomov
Kuhring, Mathias
Xiubo Li