Hi,
At which point in the update procedure did you run `ceph osd require-osd-release octopus`?
And are you sure it was set to nautilus before the update? (`ceph osd dump`
will show it.)
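For example, something like
ceph osd dump | grep require_osd_release
should print the currently required release, assuming the usual osdmap dump output.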
Cheers, Dan
On Thu, Jun 10, 2021, 5:45 PM Manuel Lausch <manuel.lausch(a)1und1.de> wrote:
Hi Peter,
Your suggestion pointed me to the right spot.
I didn't know about the feature that lets Ceph read from replica PGs.
I found two functions in osd/PrimaryLogPG.cc: "check_laggy" and
"check_laggy_requeue". Both start with a check whether the peers have
the octopus feature; if not, the rest of the function is skipped.
This explains why the problem only started once about half of the
cluster was updated.
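(As a quick cross-check of how far the rollout has got, something like
ceph versions
should summarize how many OSDs still report the old release.)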
To verify that these checks are the culprit, I added "return true" as the
first line of both functions. With that change the issue is gone, but I
don't know what problems it could trigger, and I know it doesn't fix the
root cause.
I think I will open a bug ticket with these findings.
osd_op_queue_cut_off is set to high,
and ICMP rate limiting should not be happening.
Thanks
Manuel
On Thu, 10 Jun 2021 11:28:48 +0200
Peter Lieven <pl(a)kamp.de> wrote:
On 10.06.21 at 11:08, Manuel Lausch wrote:
Hi,
does no one have an idea what could cause this issue, or how I could debug
it?
In a few days I have to go live with this cluster. If I don't have a
solution by then, I will have to go live with nautilus.
Hi Manuel,
I had similar issues with Octopus and am thus stuck with Nautilus.
Can you debug the slow ops and see if they are caused by the
status "waiting for readable"?
I suspected that it has something to do with the new feature in
Octopus that allows reads from replica OSDs, regardless of whether
they are the primary for a PG or not.
Can you also verify that osd_op_queue_cut_off is set to high and that
icmp rate limiting is disabled on your hosts?
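To check, something like
ceph daemon osd.<id> config get osd_op_queue_cut_off
and
sysctl net.ipv4.icmp_ratelimit
on the hosts should show both values (the sysctl assumes Linux).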
Peter
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io