March 2023 - ceph-users - lists.ceph.io

Cephalocon Amsterdam 2023 Photographer Volunteer Help Needed

by Mike Perez

Hi everyone, To help with costs for Cephalocon Amsterdam 2023, we wanted to see if anyone would like to volunteer to help with photography for the event. A group of people would be ideal so that we have good coverage in the expo hall and sessions. If you're interested, please reply to me directly for more information. -- Mike Perez Community Manager Ceph Foundation

1 year, 1 month

3
2
0 0

Re: Very slow backfilling/remapping of EC pool PGs

by Gauvain Pocentek

(adding back the list) On Tue, Mar 21, 2023 at 11:25 AM Joachim Kraftmayer < joachim.kraftmayer(a)clyso.com> wrote: > i added the questions and answers below. > > ___________________________________ > Best Regards, > Joachim Kraftmayer > CEO | Clyso GmbH > > Clyso GmbH > p: +49 89 21 55 23 91 2 > a: Loristraße 8 | 80335 München | Germany > w: https://clyso.com | e: joachim.kraftmayer(a)clyso.com > > We are hiring: https://www.clyso.com/jobs/ > --- > CEO: Dipl. Inf. (FH) Joachim Kraftmayer > Unternehmenssitz: Utting am Ammersee > Handelsregister beim Amtsgericht: Augsburg > Handelsregister-Nummer: HRB 25866 > USt. ID-Nr.: DE275430677 > > Am 21.03.23 um 11:14 schrieb Gauvain Pocentek: > > Hi Joachim, > > > On Tue, Mar 21, 2023 at 10:13 AM Joachim Kraftmayer < > joachim.kraftmayer(a)clyso.com> wrote: > >> Which Ceph version are you running, is mclock active? >> >> > We're using Quincy (17.2.5), upgraded step by step from Luminous if I > remember correctly. > > did you recreate the osds? if yes, at which version? > I actually don't remember all the history, but I think we added the HDD nodes while running Pacific. > > mlock seems active, set to high_client_ops profile. HDD OSDs have very > different settings for max capacity iops: > > osd.137 basic osd_mclock_max_capacity_iops_hdd > 929.763899 > osd.161 basic osd_mclock_max_capacity_iops_hdd > 4754.250946 > osd.222 basic osd_mclock_max_capacity_iops_hdd > 540.016984 > osd.281 basic osd_mclock_max_capacity_iops_hdd > 1029.193945 > osd.282 basic osd_mclock_max_capacity_iops_hdd > 1061.762870 > osd.283 basic osd_mclock_max_capacity_iops_hdd > 462.984562 > > We haven't set those explicitly, could they be the reason of the slow > recovery? > > i recommend to disable mclock for now, and yes we have seen slow recovery > caused by mclock. > Stupid question: how do you do that? I've looked through the docs but could only find information about changing the settings. > > > Bonus question: does ceph set that itself? > > yes and if you have a setup with HDD + SSD (db & wal) the discovery works > not in the right way. > Good to know! Gauvain > > Thanks! > > Gauvain > > > > >> Joachim >> >> ___________________________________ >> Clyso GmbH - Ceph Foundation Member >> >> Am 21.03.23 um 06:53 schrieb Gauvain Pocentek: >> > Hello all, >> > >> > We have an EC (4+2) pool for RGW data, with HDDs + SSDs for WAL/DB. This >> > pool has 9 servers with each 12 disks of 16TBs. About 10 days ago we >> lost a >> > server and we've removed its OSDs from the cluster. Ceph has started to >> > remap and backfill as expected, but the process has been getting slower >> and >> > slower. Today the recovery rate is around 12 MiB/s and 10 objects/s. All >> > the remaining unclean PGs are backfilling: >> > >> > data: >> > volumes: 1/1 healthy >> > pools: 14 pools, 14497 pgs >> > objects: 192.38M objects, 380 TiB >> > usage: 764 TiB used, 1.3 PiB / 2.1 PiB avail >> > pgs: 771559/1065561630 objects degraded (0.072%) >> > 1215899/1065561630 objects misplaced (0.114%) >> > 14428 active+clean >> > 50 active+undersized+degraded+remapped+backfilling >> > 18 active+remapped+backfilling >> > 1 active+clean+scrubbing+deep >> > >> > We've checked the health of the remaining servers, and everything looks >> > like (CPU/RAM/network/disks). >> > >> > Any hints on what could be happening? >> > >> > Thank you, >> > Gauvain >> > _______________________________________________ >> > ceph-users mailing list -- ceph-users(a)ceph.io >> > To unsubscribe send an email to ceph-users-leave(a)ceph.io >> >

1 year, 1 month

2
2
0 0

Re: Ceph Bluestore tweaks for Bcache

by Matthias Ferdinand

ceph version: 17.2.0 on Ubuntu 22.04 non-containerized ceph from Ubuntu repos cluster started on luminous I have been using bcache on filestore on rotating disks for many years without problems. Now converting OSDs to bluestore, there are some strange effects. If I create the bcache device, set its rotational flag to '1', then do ceph-volume lvm create ... --crush-device-class=hdd the OSD comes up with the right parameters and much improved latency compared to OSD directly on /dev/sdX. ceph osd metatdata ... shows "bluestore_bdev_type": "hdd", "rotational": "1" But after reboot, bcache rotational flag is set '0' again, and the OSD now comes up with "rotational": "0" Latency immediately starts to increase (and continually increases over the next days, possibly due to accumulating fragmention). These wrong settings stay in place even if I stop the OSD, set the bcache rotational flag to '1' again and restart the OSD. I have found no way to get back to the original settings other than destroying and recreating the OSD. I guess I am just not seeing something obvious, like from where these settings get pulled at OSD startup. I even created udev rules to set bcache rotational=1 at boot time, before any ceph daemon starts, but it did not help. Something running after these rules reset the bcache rotationl flags back to 0. Haven't found the culprit yet, but not sure if it even matters. Are these OSD settings (bluestore_bdev_type, rotational) persisted somewhere and can they be edited and pinned? Alternatively, can I manually set and persist the relevant bluestore tunables (per OSD / per device class) so as to make the bcache rotational flag irrelevant after the OSD is first created? Regards Matthias On Fri, Apr 08, 2022 at 03:05:38PM +0300, Igor Fedotov wrote: > Hi Frank, > > in fact this parameter impacts OSD behavior at both build-time and during > regular operationing. It simply substitutes hdd/ssd auto-detection with > manual specification. And hence relevant config parameters are applied. If > e.g. min_alloc_size is persistent after OSD creation - it wouldn't be > updated. But if specific setting allows at run-time - it would be altered. > > So the proper usage would definitely be manual ssd/hdd mode selection before > the first OSD creation and keeping it in that mode along the whole OSD > lifecycle. But technically one can change the mode at any arbitrary point in > time which would result in run-rime setting being out-of-sync with creation > ones. With some unclear side-effects.. > > Please also note that this setting was orignally intended mostly for > development/testing purposes not regular usage. Hence it's flexible but > rather unsafe if used improperly. > > > Thanks, > > Igor > > On 4/7/2022 2:40 PM, Frank Schilder wrote: > > Hi Richard and Igor, > > > > are these tweaks required at build-time (osd prepare) only or are they required for every restart? > > > > Is this setting "bluestore debug enforce settings=hdd" in the ceph config data base or set somewhere else? How does this work if deploying HDD- and SSD-OSDs at the same time? > > > > Ideally, all these tweaks should be applicable and settable at creation time only without affecting generic settings (that is, at the ceph-volume command line and not via config side effects). Otherwise it becomes really tedious to manage these. > > > > For example, would the following work-flow apply the correct settings *permanently* across restarts: > > > > 1) Prepare OSD on fresh HDD with ceph-volume lvm batch --prepare ... > > 2) Assign dm_cache to logical OSD volume created in step 1 > > 3) Start OSD, restart OSDs, boot server ... > > > > I would assume that the HDD settings are burned into the OSD in step 1 and will be used in all future (re-)starts without the need to do anything despite the device being detected as non-rotational after step 2. Is this assumption correct? > > > > Thanks and best regards, > > ================= > > Frank Schilder > > AIT Risø Campus > > Bygning 109, rum S14 > > > > ________________________________________ > > From: Richard Bade <hitrich(a)gmail.com> > > Sent: 06 April 2022 00:43:48 > > To: Igor Fedotov > > Cc: Ceph Users > > Subject: [Warning Possible spam] [ceph-users] Re: Ceph Bluestore tweaks for Bcache > > > > Just for completeness for anyone that is following this thread. Igor > > added that setting in Octopus, so unfortunately I am unable to use it > > as I am still on Nautilus. > > > > Thanks, > > Rich > > > > On Wed, 6 Apr 2022 at 10:01, Richard Bade <hitrich(a)gmail.com> wrote: > > > Thanks Igor for the tip. I'll see if I can use this to reduce the > > > number of tweaks I need. > > > > > > Rich > > > > > > On Tue, 5 Apr 2022 at 21:26, Igor Fedotov <igor.fedotov(a)croit.io> wrote: > > > > Hi Richard, > > > > > > > > just FYI: one can use "bluestore debug enforce settings=hdd" config > > > > parameter to manually enforce HDD-related settings for a BlueStore > > > > > > > > > > > > Thanks, > > > > > > > > Igor > > > > > > > > On 4/5/2022 1:07 AM, Richard Bade wrote: > > > > > Hi Everyone, > > > > > I just wanted to share a discovery I made about running bluestore on > > > > > top of Bcache in case anyone else is doing this or considering it. > > > > > We've run Bcache under Filestore for a long time with good results but > > > > > recently rebuilt all the osds on bluestore. This caused some > > > > > degradation in performance that I couldn't quite put my finger on. > > > > > Bluestore osds have some smarts where they detect the disk type. > > > > > Unfortunately in the case of Bcache it detects as SSD, when in fact > > > > > the HDD parameters are better suited. > > > > > I changed the following parameters to match the HDD default values and > > > > > immediately saw my average osd latency during normal workload drop > > > > > from 6ms to 2ms. Peak performance didn't change really, but a test > > > > > machine that I have running a constant iops workload was much more > > > > > stable as was the average latency. > > > > > Performance has returned to Filestore or better levels. > > > > > Here are the parameters. > > > > > > > > > > ; Make sure that we use values appropriate for HDD not SSD - Bcache > > > > > gets detected as SSD > > > > > bluestore_prefer_deferred_size = 32768 > > > > > bluestore_compression_max_blob_size = 524288 > > > > > bluestore_deferred_batch_ops = 64 > > > > > bluestore_max_blob_size = 524288 > > > > > bluestore_min_alloc_size = 65536 > > > > > bluestore_throttle_cost_per_io = 670000 > > > > > > > > > > ; Try to improve responsiveness when some disks are fully utilised > > > > > osd_op_queue = wpq > > > > > osd_op_queue_cut_off = high > > > > > > > > > > Hopefully someone else finds this useful. > > > > > _______________________________________________ > > > > > ceph-users mailing list -- ceph-users(a)ceph.io > > > > > To unsubscribe send an email to ceph-users-leave(a)ceph.io > > > > -- > > > > Igor Fedotov > > > > Ceph Lead Developer > > > > > > > > Looking for help with your Ceph cluster? Contact us at https://croit.io > > > > > > > > croit GmbH, Freseniusstr. 31h, 81247 Munich > > > > CEO: Martin Verges - VAT-ID: DE310638492 > > > > Com. register: Amtsgericht Munich HRB 231263 > > > > Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx > > > > > > _______________________________________________ > > ceph-users mailing list -- ceph-users(a)ceph.io > > To unsubscribe send an email to ceph-users-leave(a)ceph.io > > -- > Igor Fedotov > Ceph Lead Developer > > Looking for help with your Ceph cluster? Contact us at https://croit.io > > croit GmbH, Freseniusstr. 31h, 81247 Munich > CEO: Martin Verges - VAT-ID: DE310638492 > Com. register: Amtsgericht Munich HRB 231263 > Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx > > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io

1 year, 1 month

2
2
0 0

Ceph Days India 2023 - Call for proposals

by Gaurav Sitlani

Hello Cephers, We're happy to share that we're organizing Ceph Day India on 5th May 2023 this year. The event is now sold out! If you missed getting a ticket, consider submitting the CFP and we'll provide a ticket if accepted. https://ceph.io/en/community/events/2023/ceph-days-india/ Please reach out to us if you need any help regarding the submissions. Thanks and regards, Gaurav Sitlani Ceph Community Ambassador

1 year, 1 month

1
0
0 0

Changing os to ubuntu from centos 8

by Szabo, Istvan (Agoda)

Hi, I'd like to change the os to ubuntu 20.04.5 from my bare metal deployed octopus 15.2.14 on centos 8. On the first run I would go with octopus 15.2.17 just to not make big changes in the cluster. I've found couple of threads on the mailing list but those were containerized (like: Re: Upgrade/migrate host operating system for ceph nodes (CentOS/Rocky) or Re: Migrating CEPH OS looking for suggestions). Wonder what is the proper steps for this kind of migration? Do we need to start with mgr or mon or rgw or osd? Is it possible to reuse the osd with ceph-volume scan on the reinstalled machine? I'd stay with baremetal deployment and even maybe with octopus but I'm curious your advice. Thank you ________________________________ This message is confidential and is for the sole use of the intended recipient(s). It may also be privileged or otherwise protected by copyright or other legal rules. If you have received it by mistake please let us know by reply email and delete it from your system. It is prohibited to copy this message or disclose its content to anyone. Any confidentiality or privilege is not waived or lost by any mistaken delivery or unauthorized disclosure of the message. All messages sent to and from Agoda may be monitored to ensure compliance with company policies, to protect the company's interests and to remove potential malware. Electronic messages may be intercepted, amended, lost or deleted, or contain viruses.

1 year, 1 month

2
2
0 0

Very slow backfilling/remapping of EC pool PGs

by Gauvain Pocentek

Hello all, We have an EC (4+2) pool for RGW data, with HDDs + SSDs for WAL/DB. This pool has 9 servers with each 12 disks of 16TBs. About 10 days ago we lost a server and we've removed its OSDs from the cluster. Ceph has started to remap and backfill as expected, but the process has been getting slower and slower. Today the recovery rate is around 12 MiB/s and 10 objects/s. All the remaining unclean PGs are backfilling: data: volumes: 1/1 healthy pools: 14 pools, 14497 pgs objects: 192.38M objects, 380 TiB usage: 764 TiB used, 1.3 PiB / 2.1 PiB avail pgs: 771559/1065561630 objects degraded (0.072%) 1215899/1065561630 objects misplaced (0.114%) 14428 active+clean 50 active+undersized+degraded+remapped+backfilling 18 active+remapped+backfilling 1 active+clean+scrubbing+deep We've checked the health of the remaining servers, and everything looks like (CPU/RAM/network/disks). Any hints on what could be happening? Thank you, Gauvain

1 year, 1 month

2
1
0 0

Upgrade 16.2.10 --> 16.2.11 OSD "UPGRADE_REDEPLOY_DAEMON" failed

by Marco Pizzolo

Hello Everyone, We made the mistake of trying to patch to 16.2.11 from 16.2.10 which has been stable as we felt that 16.2.11 had been out for a while already. As luck would have it, we are having failure after failure with OSDs not upgrading successfully, and have 355 more OSDs to go. I'm pretty sure we're not alone on this and am wondering what others have done to address. Appreciate all suggestions. Thanks, Marco

1 year, 1 month

1
1
0 0

Multiple instance_id and services for rbd-mirror daemon

by Aielli, Elia

Hi, I'm facing a strange issue and google doesn't seem to help me. I've a couple of clusters with Octopus v15.2.17, recently upgraded from 15.2.13 I had a rbd mirror service correctly working between the two clusters, i then updated, and after some days where all was ok, I've come to the situation where I have a single DAEMON, but multiple services (on different version tho), each one with its own instance_id, giving you some insight: rbd mirror pool status mypool --verbose health: WARNING daemon health: OK image health: WARNING images: 37 total 2 starting_replay 35 replaying DAEMONS service 123673101: instance_id: 127483700 client_id: admin hostname: DR-Host1 version: *15.2.13* leader: false health: OK service 123674106: instance_id: 127208040 client_id: admin hostname: DR-Host1 version: *15.2.13* leader: false health: OK service 123675375: instance_id: 124630539 client_id: admin hostname: DR-Host1 version: *15.2.13* leader: true health: OK service 124670331: instance_id: 127208013 client_id: backup hostname: DR-Host1 version: *15.2.17* leader: false health: OK As you can see one has the "leader" section as true, and the other are false, I'd like to delete the first 2 false ones, then delete the true one with 15.2.13 and hope the last one becomes true. Do any of you faced any similar issue or can help me in killing the wrong services? Let me know, thanks in advance! Elia

1 year, 1 month

1
0
0 0

The release time of v16.2.12 is?

by Louis Koo

https://github.com/ceph/ceph/pull/47702?notification_referrer_id=NT_kwDOANW… This issue had been backported to pacific, and the release time of v16.2.12 is?

1 year, 1 month

1
0
0 0

RBD latency

by Murilo Morais

Good evening everyone! Guys, what to expect latency for RBD images in a cluster with only HDD (36 HDDs)? Sometimes I see that the write latency is around 2-5 ms in some images even with very low IOPS and bandwidth while the read latency is around 0.2-0.7 ms. For a cluster with only HDD is this latency expected? Is there any parameter I can study to improve? What do you recommend for tuning in the OS? The latency between machines is always around 0.1ms, all connected via fiber optics. Thanks in advance!

1 year, 2 months

3
2
0 0

2024

2023

2022

2021

2020

2019

ceph-users March 2023