[ceph-users] Re: Removing pool in nautilus is incredibly slow

25 Jun 2020

Hi Frank,

"With our mimic cluster I have absolutely no problems migrating pools in
one go to a completely new set of disks. I have no problems doubling the
number of disks and at the same time doubling the number of PGs in a pool
and let the rebalancing loose in one single go. No need for slowly
increasing weights. No need for slow changes of PG counts. In such
cases,..."

This is also my experience.
I have 2 clusters running on Nautilus 14.2.8, one upgraded 2 weeks ago from
mimic.
I do NOT see any performance drop from the client side. But recovery is
extremely slow after replacing a defective OSD.
When I need to replace an OSD, I destroy it, turn on the noout flag, turn
off the server, replace the disk, and turn the server on. All within 30 min.
In Mimic I had only some misplaced objects and it recovered within an hour.
In Nautilus, when I do exactly the same, I get, besides misplaced objects,
also degraded PGs and undersized PGs, and the recovery takes almost a day.
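
The replacement steps described above can be sketched roughly as follows. This is only an illustration, not the exact commands used on these clusters: the OSD id is a hypothetical placeholder, the `run` helper and the two function names are made up for this sketch, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of a single-disk replacement. Dry run by default: commands are
# printed, not executed. Set DRY_RUN=0 to run them against a real cluster.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Before powering off the host: destroy the failed OSD (keeps its id for
# reuse), then stop the cluster from marking the host's other OSDs out.
replace_osd_prepare() {
    osd_id="$1"   # numeric id of the failed OSD (placeholder value)
    run ceph osd destroy "$osd_id" --yes-i-really-mean-it
    run ceph osd set noout
}

# After the host is back and the new OSD has been created: clear the flag
# and look at the cluster status to follow recovery.
replace_osd_finish() {
    run ceph osd unset noout
    run ceph -s
}
```

For example, `replace_osd_prepare 12` prints the two commands for a hypothetical osd.12 without touching the cluster. Creating the new OSD on the replaced disk is left out because the thread does not say how it was done.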

I still need to investigate this (tips are welcome ;) ). But what
stands out is the load on the manager.

Grtz, Jiri

On Thu, 25 Jun 2020 at 17:18, Frank Schilder <frans(a)dtu.dk> wrote:

...
 I actually don't think this is the problem. I removed a 120TB file system
 EC-data pool in mimic without any special flags and magic. The OSDs of the
 data pool are HDD with everything collocated. I had absolutely no problem,
 the data was removed after 2-3 days and nobody even noticed. This is a
 standard operation and should just work without OPS queues running full,
 heartbeat losses and manual compaction or the like.

 Looking at all the different reports that came in on this list over the
 past 1-2 years about performance issues starting with nautilus, it really
 sounds to me that a serious regression happened. Maybe the messenger
 introduction? Maybe the prioritizing problem that Robert LeBlanc reported
 in https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/W4M5XQRDBLX… ?

 I guess anyone who started with nautilus doesn't know the good old times
 of being able to do admin work without a completely normal cluster
 collapsing for no reason. Others do. I find it a bit strange that there is
 such a long silence on this topic. There are numerous reports of people
 having issues with PG changes or rebalancing. Benign operations that should
 just work.

 With our mimic cluster I have absolutely no problems migrating pools in
 one go to a completely new set of disks. I have no problems doubling the
 number of disks and at the same time doubling the number of PGs in a pool
 and let the rebalancing loose in one single go. No need for slowly
 increasing weights. No need for slow changes of PG counts. In such cases, I
 casually push the recovery options up close to max available bandwidth and
 nobody even notices a performance drop. And all this with WAL/DB and data
 collocated on the same disk and with rather low RAM available, I can only
 afford 2GB per HDD OSD.
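
 A minimal sketch of what "pushing the recovery options up" could look like. The message does not say which options or values were used, so the two options chosen here (osd_max_backfills, osd_recovery_max_active) and their values are assumptions for illustration only. The `run` helper and function names are made up, and nothing is executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Raise the recovery/backfill throttles for all OSDs.
# The values are illustrative assumptions, not taken from the thread.
recovery_boost() {
    run ceph config set osd osd_max_backfills 4
    run ceph config set osd osd_recovery_max_active 8
}

# Remove the overrides again so the OSDs fall back to their defaults.
recovery_reset() {
    run ceph config rm osd osd_max_backfills
    run ceph config rm osd osd_recovery_max_active
}
```

 Calling `recovery_boost` before a large rebalance and `recovery_reset` afterwards keeps the change temporary. Suitable values depend on the hardware and on how much client impact is acceptable.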

 Anyone on nautilus or higher who has the same experience?

 Best regards,
 =================
 Frank Schilder
 AIT Risø Campus
 Bygning 109, rum S14

 ________________________________________
 From: Eugen Block <eblock(a)nde.ag>
 Sent: 25 June 2020 16:42:57
 To: ceph-users(a)ceph.io
 Subject: [ceph-users] Re: Removing pool in nautilus is incredibly slow

 I'm not sure if your OSDs have their RocksDB on faster devices; if not,
 it sounds a lot like RocksDB fragmentation [1] leading to a very high
 load on the OSDs and occasionally crashing OSDs. If you don't plan to
 delete so much data at once on a regular basis you could sit this one
 out, but one solution is to re-create the OSDs with RocksDB/WAL on
 faster devices.

 [1] https://www.mail-archive.com/ceph-users@ceph.io/msg03160.html

 Quoting Francois Legrand <fleg(a)lpnhe.in2p3.fr>:

  Thanks for the hint.
 I tried, but it doesn't seem to change anything...
 Moreover, as the OSDs seem quite loaded, I regularly had some OSDs
 marked down, which triggered new peering and thus more load!
 I set the nodown flag, but I still have some OSDs reported
 (wrongly) as down (and back up within a minute), which generates peering
 and remapping. I don't really understand what the nodown flag
 does!
 Is there a way to tell Ceph not to peer immediately after an OSD is
 reported down (let's say wait for 60 s)?
 I am thinking about restarting all OSDs (or maybe the whole cluster)
 to get osd_op_queue_cut_off changed to high and
 osd_op_thread_timeout to something higher than 15 (but I don't think
 it will really improve the situation).
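
 The settings mentioned above could be applied along these lines. This is a sketch under assumptions: the timeout value of 30 is only an example of "something higher than 15", and osd_heartbeat_grace is suggested here as a possible knob for the "wait for 60 s" question, not something confirmed in this thread. The `run` helper and function names are made up, and the script only prints commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Try to keep loaded OSDs from flapping during the pool removal.
stabilise_osds() {
    grace="${1:-60}"        # seconds; assumption, matches the 60 s asked about
    op_timeout="${2:-30}"   # seconds; assumption, "something higher than 15"
    run ceph osd set nodown
    # Assumption: a longer heartbeat grace delays OSDs being reported down.
    run ceph config set global osd_heartbeat_grace "$grace"
    # As noted in the message, these two are expected to need an OSD restart.
    run ceph config set osd osd_op_queue_cut_off high
    run ceph config set osd osd_op_thread_timeout "$op_timeout"
}

# Undo the temporary measures once the cluster is healthy again.
restore_osds() {
    run ceph osd unset nodown
    run ceph config rm global osd_heartbeat_grace
}
```

 Running `stabilise_osds` with no arguments prints the four commands with the example values. Whether these settings help in this particular situation is untested.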
 F.

 On 25/06/2020 at 14:26, Wout van Heeswijk wrote:
  Hi Francois,

 Have you already looked at the option "osd_delete_sleep"? It will
 not speed up the process but it will give you some control over your
 cluster performance.

 Something like:

 ceph tell osd.\* injectargs '--osd_delete_sleep 1'

 kind regards,

 Wout
 42on
 On 25-06-2020 09:57, Francois Legrand wrote:
  Does someone have an idea ?
 F.
 _______________________________________________
 ceph-users mailing list -- ceph-users(a)ceph.io
 To unsubscribe send an email to ceph-users-leave(a)ceph.io

