On Tue, Mar 21, 2023 at 2:21 PM Clyso GmbH - Ceph Foundation Member <
joachim.kraftmayer(a)clyso.com> wrote:
Since this requires a restart, I went another way to speed up the recovery
of degraded PGs and avoid weirdness while restarting the OSDs: I've
increased the value of osd_mclock_max_capacity_iops_hdd to a ridiculous
number for spinning disks (6000). The effect is not magical, but the
recovery went from 4 to 60 objects/s. Ceph should be back to normal in a
few hours.
I will change the osd_op_queue value once the cluster is stable.
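For reference, the runtime tweak was something along these lines (the
capacity override applies immediately, while switching the scheduler later
needs an OSD restart; I'm assuming wpq as the target value, since that was
the default queue before mclock):

  # raise the assumed HDD capacity so mclock stops throttling recovery traffic
  ceph config set osd osd_mclock_max_capacity_iops_hdd 6000

  # later, once the cluster is stable: move off the mclock scheduler
  # (only takes effect after restarting the OSDs)
  ceph config set osd osd_op_queue wpq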
Thanks for the help, it's been really useful, and I know a little bit more
about Ceph :)
Gauvain
___________________________________
Clyso GmbH - Ceph Foundation Member
On 21.03.23 at 12:51, Gauvain Pocentek wrote:
(adding back the list)
On Tue, Mar 21, 2023 at 11:25 AM Joachim Kraftmayer <
joachim.kraftmayer(a)clyso.com> wrote:
I added the questions and answers below.
___________________________________
Best Regards,
Joachim Kraftmayer
CEO | Clyso GmbH
Clyso GmbH
p: +49 89 21 55 23 91 2
a: Loristraße 8 | 80335 München | Germany
w: https://clyso.com | e: joachim.kraftmayer(a)clyso.com
We are hiring: https://www.clyso.com/jobs/
---
CEO: Dipl. Inf. (FH) Joachim Kraftmayer
Registered office: Utting am Ammersee
Commercial register at the district court: Augsburg
Commercial register number: HRB 25866
VAT ID no.: DE275430677
On 21.03.23 at 11:14, Gauvain Pocentek wrote:
Hi Joachim,
On Tue, Mar 21, 2023 at 10:13 AM Joachim Kraftmayer <
joachim.kraftmayer(a)clyso.com> wrote:
Which Ceph version are you running? Is mclock active?
We're using Quincy (17.2.5), upgraded step by step from Luminous if I
remember correctly.
Did you recreate the OSDs? If yes, at which version?
I actually don't remember all the history, but I think we added the HDD
nodes while running Pacific.
mclock seems active, set to the high_client_ops profile. HDD OSDs have very
different settings for max capacity IOPS:
osd.137  basic  osd_mclock_max_capacity_iops_hdd  929.763899
osd.161  basic  osd_mclock_max_capacity_iops_hdd  4754.250946
osd.222  basic  osd_mclock_max_capacity_iops_hdd  540.016984
osd.281  basic  osd_mclock_max_capacity_iops_hdd  1029.193945
osd.282  basic  osd_mclock_max_capacity_iops_hdd  1061.762870
osd.283  basic  osd_mclock_max_capacity_iops_hdd  462.984562
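(For reference, the listing above is the output of something like
"ceph config dump | grep osd_mclock_max_capacity_iops_hdd".)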
We haven't set those explicitly; could they be the reason for the slow
recovery?
I recommend disabling mclock for now, and yes, we have seen slow recovery
caused by mclock.
Stupid question: how do you do that? I've looked through the docs but
could only find information about changing the settings.
Bonus question: does Ceph set that itself?
Yes, and if you have a setup with HDD + SSD (DB & WAL), the automatic
capacity discovery does not work correctly.
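If the auto-measured numbers look wrong, you can also drop the per-OSD
overrides so the configured default applies instead, something like:

  ceph config rm osd.137 osd_mclock_max_capacity_iops_hdd

(repeating for each affected OSD id).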
Good to know!
Gauvain
>
> Thanks!
>
> Gauvain
>
>> Joachim
>>
>> ___________________________________
>> Clyso GmbH - Ceph Foundation Member
>>
>> On 21.03.23 at 06:53, Gauvain Pocentek wrote:
>> > Hello all,
>> >
>> > We have an EC (4+2) pool for RGW data, with HDDs + SSDs for WAL/DB. This
>> > pool has 9 servers, each with 12 disks of 16 TB. About 10 days ago we lost a
>> > server and we've removed its OSDs from the cluster. Ceph has started to
>> > remap and backfill as expected, but the process has been getting slower and
>> > slower. Today the recovery rate is around 12 MiB/s and 10 objects/s. All
>> > the remaining unclean PGs are backfilling:
>> >
>> >   data:
>> >     volumes: 1/1 healthy
>> >     pools:   14 pools, 14497 pgs
>> >     objects: 192.38M objects, 380 TiB
>> >     usage:   764 TiB used, 1.3 PiB / 2.1 PiB avail
>> >     pgs:     771559/1065561630 objects degraded (0.072%)
>> >              1215899/1065561630 objects misplaced (0.114%)
>> >              14428 active+clean
>> >              50    active+undersized+degraded+remapped+backfilling
>> >              18    active+remapped+backfilling
>> >              1     active+clean+scrubbing+deep
>> >
>> > We've checked the health of the remaining servers, and everything looks
>> > fine (CPU/RAM/network/disks).
>> >
>> > Any hints on what could be happening?
>> >
>> > Thank you,
>> > Gauvain