[ceph-users] Re: kernel client osdc ops stuck and mds slow reqs

21 Feb 2023

On Tue, Feb 21, 2023 at 1:01 AM Xiubo Li <xiubli(a)redhat.com> wrote:
...

 On 20/02/2023 22:28, Kuhring, Mathias wrote:
  Hey Dan, hey Ilya

 I know this issue is two years old already, but we are having similar
 issues.

 Do you know if the fixes ever got backported to RHEL kernels?

 It was backported to RHEL 8 a long time ago, as of kernel-4.18.0-154.el8.

  Not looking for el7 but rather el8 fixes.
 Wondering if the patches were backported and we shouldn't actually see
 these issues.
 Or if you could maybe resolve them with a kernel upgrade.

 Most active clients are currently on kernel versions such as:
 4.18.0-348.el8.0.2.x86_64, 4.18.0-348.2.1.el8_5.x86_64,
 4.18.0-348.7.1.el8_5.x86_64

 While the cluster runs with kernel 3.10.0-1160.42.2.el7.x86_64 and
 cephadm with
 ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy
 (stable).

 It does not seem to have been backported to el7 yet.
"Yet" might be misleading here -- I don't think there is/was ever
a plan to backport these fixes to RHEL 7.

...
 > Not sure, if the cluster kernel is actually relevant here for OSD <>
 > kernel client connection.
If you are seeing page allocation failures only on the kernel client
nodes, then it's not relevant.

Unless the stack trace is the same as in the original tracker [1] or
Dan's paste [2] (note ceph_osdmap_decode() -> osdmap_set_max_osd() ->
krealloc() sequence), you are hitting a different issue.  Pasting the
entire splat(s) from the kernel log would be a good start.

[1] https://tracker.ceph.com/issues/40481
[2] https://pastebin.com/neyah54k
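
For comparing a splat against the sequence noted above, here is a minimal
sketch, not a definitive tool. It assumes the kernel log has been saved as
text, that each trace starts with a "Call Trace:" line, and that frames are
printed innermost first in the usual "symbol+0xoff/0xlen" form. The function
names symbols_in_trace() and matches_known_splat() are made up for this
illustration. Inlined or missing frames would defeat it, so reading the
splat by eye is still needed.

```python
import re

# A frame looks like "krealloc+0x4f/0xa0", optionally followed by "[libceph]".
FRAME_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+")

# Innermost frame first: the order in which a kernel call trace is printed.
# (Assumption: none of these frames is inlined away on your kernel build.)
EXPECTED = ["krealloc", "osdmap_set_max_osd", "ceph_osdmap_decode"]


def symbols_in_trace(trace_text):
    """Return the symbol names of all frames found in trace_text, in order."""
    return FRAME_RE.findall(trace_text)


def matches_known_splat(log_text):
    """True if any call trace contains EXPECTED as an ordered subsequence."""
    # Everything before the first "Call Trace:" marker is not a trace.
    for chunk in log_text.split("Call Trace:")[1:]:
        frames = iter(symbols_in_trace(chunk))
        # "sym in frames" consumes the iterator, which enforces the ordering.
        if all(sym in frames for sym in EXPECTED):
            return True
    return False
```

Calling matches_known_splat() on the saved log contents returns False when
the sequence is absent, which would suggest a different issue than [1].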

Thanks,

                Ilya
