[ceph-users] Re: slow ops at restarting OSDs (octopus)

11 Jun 2021

On 10.06.21 at 17:45, Manuel Lausch wrote:
...
  Hi Peter,

 your suggestion pointed me to the right spot. 
 I didn't know about the feature that lets Ceph read from replica
 PGs.

 Going further, I found two functions in osd/PrimaryLogPG.cc:
 "check_laggy" and "check_laggy_requeue". Both start with a check
 whether the partners have the octopus features. If not, the function
 is skipped. This explains why the problem began after about half of
 the cluster was updated.

 To verify this, I added "return true" as the first line of both
 functions. The issue is gone with it. But
 I don't know what problems this could trigger. I know the root cause
 is not fixed with it.
 I think I will open a bug ticket with this knowledge.
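
For readers following along, the gating described above can be pictured with a small standalone sketch. This is not the Ceph source: the struct, the feature constant, and the function name are hypothetical, and the behaviour after the gate (holding ops back while the PG is laggy) is an assumption made only for illustration.

```cpp
#include <cstdint>

// Hypothetical feature bit, NOT the real Ceph feature constant.
constexpr std::uint64_t FEATURE_OCTOPUS_SKETCH = 1ULL << 0;

// Hypothetical, simplified view of a PG as seen by its primary.
struct PgState {
    std::uint64_t min_peer_features;  // features shared by all partners
    bool laggy;                       // assumed flag: PG currently laggy
};

// Returns true if an op may proceed, false if it has to wait.
bool check_laggy_sketch(const PgState& pg) {
    // Gate: if any partner lacks the octopus features, skip the check.
    // This is why the check is inactive until the partners are upgraded.
    if ((pg.min_peer_features & FEATURE_OCTOPUS_SKETCH) == 0) {
        return true;
    }
    // Assumed behaviour once all partners are octopus: hold ops while laggy.
    return !pg.laggy;
}
```

Putting "return true" on the first line, as in the test above, makes the function behave as if the gate always fired, so the laggy check never runs at all.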

I wonder if I faced the same issue. The issue I had occurred when OSDs came back up and
peering started.

My cluster was a fresh octopus install, so I think the min osd release was set to octopus.

Is it generally safe to leave this switch at nautilus while running octopus, in order to
stay on a maintained release?

...

 osd_op_queue_cutoff is set to high
 and ICMP rate limiting should not happen

It could, if you choose fast shutdown and connections to the OSD daemon are refused with
ICMP port unreachable?!

Peter
