Thanks. I also set osd_op_queue_cut_off to high in the global section (as you
mentioned in a previous thread that both the OSDs and the MDS should use it).
F.
On 26/06/2020 at 16:35, Frank Schilder wrote:
I never tried "prio", but the reports I have seen claim that prio is
inferior to wpq.
However, as far as I know it is safe to change these settings. Unfortunately, you need to
restart the services to apply the changes.
Before you do, check that *all* daemons are using the same setting. Contrary to the naming
(osd_*), this setting applies to all daemons. I added it to the global options and, most
notably, the performance of the MDS improved a lot.
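For reference, a minimal sketch of the relevant ceph.conf fragment, assuming both settings live in the [global] section as described above:

```ini
[global]
# Despite the osd_* prefix, these settings apply to all daemons (OSDs, MDS, ...)
osd_op_queue = wpq
osd_op_queue_cut_off = high
```

You can check what a running daemon actually uses via its admin socket, e.g. "ceph daemon mds.<id> config get osd_op_queue_cut_off" on the host where it runs.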
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Francois Legrand <fleg(a)lpnhe.in2p3.fr>
Sent: 26 June 2020 15:03:23
To: Frank Schilder; ceph-users(a)ceph.io
Subject: Re: [ceph-users] Re: Removing pool in nautilus is incredibly slow
I changed osd_op_queue_cut_off to high and rebooted all the OSDs. But
the result is more or less the same (storage is still extremely slow:
2h30 to rbd export a 64 GB image!). The only improvement is that the
degraded PGs seem to have disappeared (which is at least a good
point). It seems that there is a problem with the priority of operations.
So do you (and others on the list) think that changing the
osd_op_queue setting could help (switching to prio or mclock_client)?
What are the risks or side effects of trying mclock_client on a
production cluster (is it safe)?
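(For reference, the switch itself would just be a config change plus daemon restarts; a sketch, assuming the setting stays cluster-wide in [global]:)

```ini
[global]
# candidate schedulers: wpq (current), prio, mclock_client
osd_op_queue = mclock_client
osd_op_queue_cut_off = high
```

After changing this, every daemon has to be restarted for the new queue to take effect.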
F.
On 26/06/2020 at 09:46, Frank Schilder wrote:
> I'm using
>
> osd_op_queue = wpq
> osd_op_queue_cut_off = high
>
> and these settings are recommended.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Francois Legrand <fleg(a)lpnhe.in2p3.fr>
> Sent: 26 June 2020 09:44:00
> To: Frank Schilder; ceph-users(a)ceph.io
> Subject: Re: [ceph-users] Re: Removing pool in nautilus is incredibly slow
>
> We are now using osd_op_queue = wpq. Maybe returning to prio would help?
> What are you using on your mimic cluster?
> F.
>
> On 25/06/2020 at 19:28, Frank Schilder wrote:
>> OK, this *does* sound bad. I would consider this a show stopper for an upgrade from mimic.
>>
>> Best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> ________________________________________
>> From: Francois Legrand <fleg(a)lpnhe.in2p3.fr>
>> Sent: 25 June 2020 19:25:14
>> To: ceph-users(a)ceph.io
>> Subject: [ceph-users] Re: Removing pool in nautilus is incredibly slow
>>
>> I also had this kind of symptom with nautilus.
>> Replacing a failed disk (starting from a healthy cluster) generates degraded objects.
>> Also, we have a Proxmox cluster accessing VM images stored in our ceph storage with rbd.
>> Each time I did some operation on the ceph cluster, like adding or removing a pool,
>> most of our Proxmox VMs lost contact with their system disk in ceph and crashed (or
>> remounted their system storage in read-only mode). At first I thought it was a network
>> problem, but now I am sure that it's related to ceph becoming unresponsive during
>> background operations.
>> For now, Proxmox cannot even access the ceph storage using rbd (it fails with a timeout).
>> _______________________________________________
>> ceph-users mailing list -- ceph-users(a)ceph.io
>> To unsubscribe send an email to ceph-users-leave(a)ceph.io