Hi,
At which point in the update procedure did you run `ceph osd require-osd-release octopus`?
And are you sure it was set to nautilus before the update? (`ceph osd dump`
will show it.)
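For example, something like
ceph osd dump | grep require_osd_release
should print the currently required release, assuming the usual osdmap dump output.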
Cheers, Dan
On Thu, Jun 10, 2021, 5:45 PM Manuel Lausch <manuel.lausch(a)1und1.de> wrote:
Hi Peter,
Your suggestion pointed me to the right spot.
I didn't know about the feature that lets Ceph read from replica PGs.
I found two functions in osd/PrimaryLogPG.cc: "check_laggy" and
"check_laggy_requeue". Both start with a check whether the peers have
the octopus feature; if not, the rest of the function is skipped.
This explains why the problem only started once about half of the
cluster was updated.
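(As a quick cross-check of how far the rollout has got, something like
ceph versions
should summarize how many OSDs still report the old release.)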
To verify that these checks are the culprit, I added "return true" as the
first line of both functions. With that change the issue is gone, but I
don't know what problems it could trigger, and I know it doesn't fix the
root cause.
I think I will open a bug ticket with these findings.
osd_op_queue_cut_off is set to high,
and ICMP rate limiting should not be happening.
Thanks
Manuel
On Thu, 10 Jun 2021 11:28:48 +0200
Peter Lieven <pl(a)kamp.de> wrote:
On 10.06.21 at 11:08, Manuel Lausch wrote:
Hi,
does no one have an idea what could cause this issue, or how I could debug
it?
In a few days I have to go live with this cluster. If I don't have a
solution by then, I will have to go live with nautilus.
Hi Manuel,
I had similar issues with Octopus and am thus stuck with Nautilus.
Can you debug the slow ops and see if they are caused by the
status "waiting for readable"?
I suspected that it has something to do with the new feature in
Octopus that allows reads from replica OSDs, regardless of whether
they are the primary for a PG or not.
Can you also verify that osd_op_queue_cut_off is set to high and that
icmp rate limiting is disabled on your hosts?
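To check, something like
ceph daemon osd.<id> config get osd_op_queue_cut_off
and
sysctl net.ipv4.icmp_ratelimit
on the hosts should show both values (the sysctl assumes Linux).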
Peter
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io