Encountered this one again today; I've
updated the issue with new
information:
Paul
--
Paul Emmerich
croit GmbH
On Sat, Feb 29, 2020 at 10:21 PM Nikola Ciprich
<nikola.ciprich(a)linuxbox.cz> wrote:
Hi,
I just wanted to report that we've just hit a very similar problem on Mimic
(13.2.6). Any manipulation of an OSD (e.g. a restart) causes a lot of slow
ops that are waiting for a new map. They seem to be held up by the SATA
OSDs, which stay 100% busy reading for a long time until all ops are gone,
blocking ops on unrelated NVMe pools; the SATA pools are completely unused now.
Is it possible that those maps are being requested from the slow SATA OSDs
and that this takes such a long time for some reason? Why could it take so long?
The cluster is very small, with a very light load.
BR
nik
On Wed, Feb 19, 2020 at 10:03:35AM +0100, Wido den Hollander wrote:
On 2/19/20 9:34 AM, Paul Emmerich wrote:
> On Wed, Feb 19, 2020 at 7:26 AM Wido den Hollander <wido(a)42on.com> wrote:
>>
>>
>>
>> On 2/18/20 6:54 PM, Paul Emmerich wrote:
>>> I've also seen this problem once on Nautilus, with no obvious reason for
>>> the slowness.
>>
>> Did this resolve itself? Or did you remove the pool?
>
> I've seen this twice on the same cluster. It fixed itself the first
> time (maybe with some OSD restarts?), and the second time I removed the
> pool after a few minutes because the OSDs were running into heartbeat
> timeouts. Unfortunately, there seems to be no way to reproduce this :(
>
Yes, that's the problem. I've been trying to reproduce it, but I can't.
It works on all my Nautilus systems except for this one.
Since you saw it and Bryan saw it, I expect others to encounter this at some
point as well.
I don't have any extensive logging as this cluster is in production and
I can't simply crank up the logging and try again.
> In this case it wasn't a new pool that caused problems but a very old one.
>
>
> Paul
>
>>
>>> In my case it was a rather old cluster that was upgraded all the way
>>> from Firefly.
>>>
>>>
>>
>> This cluster has also been installed with Firefly. It was installed in
>> 2015, so a while ago.
>>
>> Wido
--
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.