Hi everyone,
To help with costs for Cephalocon Amsterdam 2023, we wanted to see if anyone would like to volunteer to help with photography for the event. A group of people would be ideal so that we have good coverage in the expo hall and sessions.
If you're interested, please reply to me directly for more information.
--
Mike Perez
Community Manager
Ceph Foundation
(adding back the list)
On Tue, Mar 21, 2023 at 11:25 AM Joachim Kraftmayer <
joachim.kraftmayer(a)clyso.com> wrote:
> I added the questions and answers below.
>
> ___________________________________
> Best Regards,
> Joachim Kraftmayer
> CEO | Clyso GmbH
>
> Clyso GmbH
> p: +49 89 21 55 23 91 2
> a: Loristraße 8 | 80335 München | Germany
> w: https://clyso.com | e: joachim.kraftmayer(a)clyso.com
>
> We are hiring: https://www.clyso.com/jobs/
> ---
> CEO: Dipl. Inf. (FH) Joachim Kraftmayer
> Registered office: Utting am Ammersee
> Commercial register at the local court: Augsburg
> Commercial register number: HRB 25866
> VAT ID no.: DE275430677
>
> On 21.03.23 at 11:14, Gauvain Pocentek wrote:
>
> Hi Joachim,
>
>
> On Tue, Mar 21, 2023 at 10:13 AM Joachim Kraftmayer <
> joachim.kraftmayer(a)clyso.com> wrote:
>
>> Which Ceph version are you running, is mclock active?
>>
>>
> We're using Quincy (17.2.5), upgraded step by step from Luminous if I
> remember correctly.
>
> Did you recreate the OSDs? If yes, at which version?
>
I actually don't remember all the history, but I think we added the HDD
nodes while running Pacific.
>
> mclock seems active, set to the high_client_ops profile. HDD OSDs have very
> different settings for max capacity IOPS:
>
> osd.137  basic  osd_mclock_max_capacity_iops_hdd   929.763899
> osd.161  basic  osd_mclock_max_capacity_iops_hdd  4754.250946
> osd.222  basic  osd_mclock_max_capacity_iops_hdd   540.016984
> osd.281  basic  osd_mclock_max_capacity_iops_hdd  1029.193945
> osd.282  basic  osd_mclock_max_capacity_iops_hdd  1061.762870
> osd.283  basic  osd_mclock_max_capacity_iops_hdd   462.984562
>
> We haven't set those explicitly; could they be the reason for the slow
> recovery?
>
> I recommend disabling mclock for now, and yes, we have seen slow recovery
> caused by mclock.
>
Stupid question: how do you do that? I've looked through the docs but could
only find information about changing the settings.
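For what it's worth, the only approach I've found so far (untested on our side, so please correct me if this is the wrong way) is switching the scheduler back to wpq and restarting the OSDs, since the op queue seems to be read only at startup:

    ceph config set osd osd_op_queue wpq
    ceph config set osd osd_op_queue_cut_off high

Is that what you would recommend, or is there a way to keep mclock and just relax its limits?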
>
>
> Bonus question: does ceph set that itself?
>
> Yes, and if you have a setup with HDD + SSD (DB & WAL), the auto-detection
> does not work correctly.
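> (If you later want to clear the auto-measured values from the config database,
> something like "ceph config rm osd.137 osd_mclock_max_capacity_iops_hdd" per
> OSD should do it, but I haven't double-checked that on your version.)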
>
Good to know!
Gauvain
>
> Thanks!
>
> Gauvain
>
>
>
>
>> Joachim
>>
>> ___________________________________
>> Clyso GmbH - Ceph Foundation Member
>>
>> On 21.03.23 at 06:53, Gauvain Pocentek wrote:
>> > Hello all,
>> >
>> > We have an EC (4+2) pool for RGW data, with HDDs + SSDs for WAL/DB. This
>> > pool has 9 servers, each with 12 disks of 16 TB. About 10 days ago we lost a
>> > server and we've removed its OSDs from the cluster. Ceph has started to
>> > remap and backfill as expected, but the process has been getting slower and
>> > slower. Today the recovery rate is around 12 MiB/s and 10 objects/s. All
>> > the remaining unclean PGs are backfilling:
>> >
>> >   data:
>> >     volumes: 1/1 healthy
>> >     pools:   14 pools, 14497 pgs
>> >     objects: 192.38M objects, 380 TiB
>> >     usage:   764 TiB used, 1.3 PiB / 2.1 PiB avail
>> >     pgs:     771559/1065561630 objects degraded (0.072%)
>> >              1215899/1065561630 objects misplaced (0.114%)
>> >              14428 active+clean
>> >              50    active+undersized+degraded+remapped+backfilling
>> >              18    active+remapped+backfilling
>> >              1     active+clean+scrubbing+deep
>> >
>> > We've checked the health of the remaining servers, and everything looks
>> > fine (CPU/RAM/network/disks).
>> >
>> > Any hints on what could be happening?
>> >
>> > Thank you,
>> > Gauvain
>>
>
ceph version: 17.2.0 on Ubuntu 22.04
non-containerized ceph from Ubuntu repos
cluster started on luminous
I have been using bcache with filestore on rotating disks for many years
without problems. Now that I am converting OSDs to bluestore, I am seeing
some strange effects.
If I create the bcache device, set its rotational flag to '1', then do
ceph-volume lvm create ... --crush-device-class=hdd
the OSD comes up with the right parameters and much improved latency
compared to OSD directly on /dev/sdX.
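Concretely, the sequence is roughly the following (bcache0 only as an example
device name):

    # mark the bcache device as rotational before creating the OSD
    echo 1 > /sys/block/bcache0/queue/rotational
    ceph-volume lvm create --bluestore --data /dev/bcache0 --crush-device-class=hdd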
ceph osd metadata ...
shows
"bluestore_bdev_type": "hdd",
"rotational": "1"
But after a reboot, the bcache rotational flag is set to '0' again, and the OSD
now comes up with "rotational": "0".
Latency immediately starts to increase (and continues to increase over
the next days, possibly due to accumulating fragmentation).
These wrong settings stay in place even if I stop the OSD, set the
bcache rotational flag to '1' again and restart the OSD. I have found no
way to get back to the original settings other than destroying and
recreating the OSD. I guess I am just not seeing something obvious, like
from where these settings get pulled at OSD startup.
I even created udev rules to set bcache rotational=1 at boot time,
before any ceph daemon starts, but it did not help. Something running
after these rules resets the bcache rotational flags back to 0.
Haven't found the culprit yet, but not sure if it even matters.
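For illustration, the rule is along these lines (exact match keys may differ):

    ACTION=="add|change", KERNEL=="bcache*", ATTR{queue/rotational}="1"

but as said above, something later in the boot flips the flag back to 0.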
Are these OSD settings (bluestore_bdev_type, rotational) persisted
somewhere and can they be edited and pinned?
Alternatively, can I manually set and persist the relevant bluestore
tunables (per OSD / per device class) so as to make the bcache
rotational flag irrelevant after the OSD is first created?
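In case it matters, what I had in mind for the run-time tunables is something
like the following (untested; this obviously would not help for creation-time
values such as bluestore_min_alloc_size), using the crush device class as a
config mask since I create the OSDs with --crush-device-class=hdd:

    ceph config set osd/class:hdd bluestore_prefer_deferred_size 32768

Is that a reasonable direction, or is there a better way to pin these?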
Regards
Matthias
On Fri, Apr 08, 2022 at 03:05:38PM +0300, Igor Fedotov wrote:
> Hi Frank,
>
> in fact this parameter impacts OSD behavior both at build time and during
> regular operation. It simply substitutes the hdd/ssd auto-detection with a
> manual specification, and hence the relevant config parameters are applied.
> If a setting such as min_alloc_size is persisted at OSD creation, it won't
> be updated; but if a specific setting can be changed at run time, it will be
> altered.
>
> So the proper usage would definitely be manual ssd/hdd mode selection before
> the first OSD creation, keeping that mode for the whole OSD lifecycle. But
> technically one can change the mode at any arbitrary point in time, which
> would result in run-time settings being out of sync with the creation-time
> ones, with some unclear side effects.
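> In practice that would mean setting it centrally before the first OSD is
> created and leaving it in place, e.g. (assuming the underscored option name
> matches the space-separated form quoted below):
>
>     ceph config set osd bluestore_debug_enforce_settings hdd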
>
> Please also note that this setting was originally intended mostly for
> development/testing purposes, not regular usage. Hence it's flexible but
> rather unsafe if used improperly.
>
>
> Thanks,
>
> Igor
>
> On 4/7/2022 2:40 PM, Frank Schilder wrote:
> > Hi Richard and Igor,
> >
> > are these tweaks required at build-time (osd prepare) only or are they required for every restart?
> >
> > Is this setting "bluestore debug enforce settings=hdd" in the ceph config database or set somewhere else? How does this work if deploying HDD and SSD OSDs at the same time?
> >
> > Ideally, all these tweaks should be applicable and settable at creation time only without affecting generic settings (that is, at the ceph-volume command line and not via config side effects). Otherwise it becomes really tedious to manage these.
> >
> > For example, would the following work-flow apply the correct settings *permanently* across restarts:
> >
> > 1) Prepare OSD on fresh HDD with ceph-volume lvm batch --prepare ...
> > 2) Assign dm_cache to logical OSD volume created in step 1
> > 3) Start OSD, restart OSDs, boot server ...
> >
> > I would assume that the HDD settings are burned into the OSD in step 1 and will be used in all future (re-)starts without the need to do anything despite the device being detected as non-rotational after step 2. Is this assumption correct?
> >
> > Thanks and best regards,
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > ________________________________________
> > From: Richard Bade <hitrich(a)gmail.com>
> > Sent: 06 April 2022 00:43:48
> > To: Igor Fedotov
> > Cc: Ceph Users
> > Subject: [Warning Possible spam] [ceph-users] Re: Ceph Bluestore tweaks for Bcache
> >
> > Just for completeness for anyone that is following this thread. Igor
> > added that setting in Octopus, so unfortunately I am unable to use it
> > as I am still on Nautilus.
> >
> > Thanks,
> > Rich
> >
> > On Wed, 6 Apr 2022 at 10:01, Richard Bade <hitrich(a)gmail.com> wrote:
> > > Thanks Igor for the tip. I'll see if I can use this to reduce the
> > > number of tweaks I need.
> > >
> > > Rich
> > >
> > > On Tue, 5 Apr 2022 at 21:26, Igor Fedotov <igor.fedotov(a)croit.io> wrote:
> > > > Hi Richard,
> > > >
> > > > just FYI: one can use the "bluestore debug enforce settings=hdd" config
> > > > parameter to manually enforce HDD-related settings for a BlueStore OSD
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > Igor
> > > >
> > > > On 4/5/2022 1:07 AM, Richard Bade wrote:
> > > > > Hi Everyone,
> > > > > I just wanted to share a discovery I made about running bluestore on
> > > > > top of Bcache in case anyone else is doing this or considering it.
> > > > > We've run Bcache under Filestore for a long time with good results but
> > > > > recently rebuilt all the osds on bluestore. This caused some
> > > > > degradation in performance that I couldn't quite put my finger on.
> > > > > Bluestore osds have some smarts where they detect the disk type.
> > > > > Unfortunately in the case of Bcache it detects as SSD, when in fact
> > > > > the HDD parameters are better suited.
> > > > > I changed the following parameters to match the HDD default values and
> > > > > immediately saw my average osd latency during normal workload drop
> > > > > from 6ms to 2ms. Peak performance didn't change really, but a test
> > > > > machine that I have running a constant iops workload was much more
> > > > > stable as was the average latency.
> > > > > Performance has returned to Filestore levels or better.
> > > > > Here are the parameters.
> > > > >
> > > > > ; Make sure that we use values appropriate for HDD not SSD - Bcache
> > > > > gets detected as SSD
> > > > > bluestore_prefer_deferred_size = 32768
> > > > > bluestore_compression_max_blob_size = 524288
> > > > > bluestore_deferred_batch_ops = 64
> > > > > bluestore_max_blob_size = 524288
> > > > > bluestore_min_alloc_size = 65536
> > > > > bluestore_throttle_cost_per_io = 670000
> > > > >
> > > > > ; Try to improve responsiveness when some disks are fully utilised
> > > > > osd_op_queue = wpq
> > > > > osd_op_queue_cut_off = high
> > > > >
> > > > > Hopefully someone else finds this useful.
> > > > --
> > > > Igor Fedotov
> > > > Ceph Lead Developer
> > > >
> > > > Looking for help with your Ceph cluster? Contact us at https://croit.io
> > > >
> > > > croit GmbH, Freseniusstr. 31h, 81247 Munich
> > > > CEO: Martin Verges - VAT-ID: DE310638492
> > > > Com. register: Amtsgericht Munich HRB 231263
> > > > Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
> > > >
>
> --
> Igor Fedotov
> Ceph Lead Developer
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
>
Hello Cephers,
We're happy to share that we're organizing Ceph Day India on 5 May 2023.
The event is now sold out! If you missed getting a ticket, consider
submitting a talk to the CFP and we'll provide a ticket if it is accepted.
https://ceph.io/en/community/events/2023/ceph-days-india/
Please reach out to us if you need any help regarding the submissions.
Thanks and regards,
Gaurav Sitlani
Ceph Community Ambassador
Hi,
I'd like to change the OS to Ubuntu 20.04.5 on my bare-metal-deployed Octopus 15.2.14 cluster currently running on CentOS 8. In the first run I would go with Octopus 15.2.17, just to avoid making big changes in the cluster.
I've found couple of threads on the mailing list but those were containerized (like: Re: Upgrade/migrate host operating system for ceph nodes (CentOS/Rocky) or Re: Migrating CEPH OS looking for suggestions).
I wonder what the proper steps are for this kind of migration. Do we need to start with the mgr, mon, rgw, or osd nodes?
Is it possible to reuse the OSDs with ceph-volume scan on the reinstalled machine?
I'd stay with the bare-metal deployment, and maybe even with Octopus, but I'm curious about your advice.
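For the OSDs I was hoping that something along these lines would be enough after the reinstall (untested, and assuming the data devices themselves are left untouched):

    ceph-volume lvm activate --all

or, for any OSDs still created with ceph-disk:

    ceph-volume simple scan
    ceph-volume simple activate --all

but please correct me if that is not how it works.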
Thank you
Hello all,
We have an EC (4+2) pool for RGW data, with HDDs + SSDs for WAL/DB. This
pool has 9 servers, each with 12 disks of 16 TB. About 10 days ago we lost a
server and we've removed its OSDs from the cluster. Ceph has started to
remap and backfill as expected, but the process has been getting slower and
slower. Today the recovery rate is around 12 MiB/s and 10 objects/s. All
the remaining unclean PGs are backfilling:
  data:
    volumes: 1/1 healthy
    pools:   14 pools, 14497 pgs
    objects: 192.38M objects, 380 TiB
    usage:   764 TiB used, 1.3 PiB / 2.1 PiB avail
    pgs:     771559/1065561630 objects degraded (0.072%)
             1215899/1065561630 objects misplaced (0.114%)
             14428 active+clean
             50    active+undersized+degraded+remapped+backfilling
             18    active+remapped+backfilling
             1     active+clean+scrubbing+deep
We've checked the health of the remaining servers, and everything looks
fine (CPU/RAM/network/disks).
Any hints on what could be happening?
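If it helps, we can also paste the recovery/backfill throttles reported by one of the OSDs, e.g. via something like:

    ceph config show osd.<id> | grep -E 'backfill|recovery'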
Thank you,
Gauvain
Hello Everyone,
We made the mistake of trying to patch to 16.2.11 from 16.2.10, which had been
stable for us, as we felt that 16.2.11 had been out for a while already.
As luck would have it, we are having failure after failure with OSDs not
upgrading successfully, and have 355 more OSDs to go.
I'm pretty sure we're not alone on this and am wondering what others have
done to address it.
Appreciate all suggestions.
Thanks,
Marco
Hi,
I'm facing a strange issue and Google doesn't seem to help me.
I have a couple of clusters on Octopus v15.2.17, recently upgraded from
15.2.13.
I had an rbd-mirror service working correctly between the two clusters. I
then updated, and after some days where all was OK, I've come to a
situation where I have a single DAEMON but multiple services (on different
versions, though), each with its own instance_id. To give you some insight:
rbd mirror pool status mypool --verbose

health: WARNING
daemon health: OK
image health: WARNING
images: 37 total
    2 starting_replay
    35 replaying

DAEMONS
service 123673101:
  instance_id: 127483700
  client_id: admin
  hostname: DR-Host1
  version: *15.2.13*
  leader: false
  health: OK

service 123674106:
  instance_id: 127208040
  client_id: admin
  hostname: DR-Host1
  version: *15.2.13*
  leader: false
  health: OK

service 123675375:
  instance_id: 124630539
  client_id: admin
  hostname: DR-Host1
  version: *15.2.13*
  leader: true
  health: OK

service 124670331:
  instance_id: 127208013
  client_id: backup
  hostname: DR-Host1
  version: *15.2.17*
  leader: false
  health: OK
As you can see, one has the "leader" field set to true and the others are
false. I'd like to delete the first 2 false ones, then delete the true one
on 15.2.13 and hope the last one becomes the leader.
Have any of you faced a similar issue, or can you help me kill the wrong
services?
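For context, I assume these are the same entries that show up in the cluster's service map, i.e. in:

    ceph service dump

but I haven't found a way to drop individual registrations from there either.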
Let me know, thanks in advance!
Elia
Good evening everyone!
What latency should I expect for RBD images in a cluster with only HDDs (36
HDDs)?
Sometimes I see write latency of around 2-5 ms on some images, even
with very low IOPS and bandwidth, while the read latency is around 0.2-0.7
ms.
For an HDD-only cluster, is this latency expected? Is there any
parameter I can look into to improve it? What do you recommend tuning in the
OS?
The latency between machines is always around 0.1ms, all connected via
fiber optics.
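For per-OSD numbers I can check the commit/apply latencies with:

    ceph osd perf

but I am not sure what values would be considered normal for pure-HDD OSDs.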
Thanks in advance!