New subject: increasing number of (deep) scrubs

9 Jan 2023

Hi Dan,

thanks for your answer. I don't have a problem with increasing osd_max_scrubs (=1 at
the moment) as such. I would simply prefer a somewhat finer grained way of controlling
scrubbing than just doubling or tripling it right away.

Some more info. These 2 pools are data pools for a large FS. Unfortunately, we have a
large percentage of small files, which is a pain for recovery and seemingly also for deep
scrubbing. Our OSDs are about 25% used and I had to increase the warning interval already
to 2 weeks. With all the warning grace parameters this means that we manage to deep scrub
everything about every month. I need to plan for 75% utilisation and a 3 months period is
a bit far on the risky side.

Our data is to a large percentage cold data. Client reads will not do the check for us, we
need to combat bit-rot pro-actively.

The reasons I'm interested in parameters initiating more scrubs while also converting
more scrubs into deep scrubs are, that

1) scrubs seem to complete very fast. I almost never catch a PG in state
"scrubbing", I usually only see "deep scrubbing".

2) I suspect the low deep-scrub count is due to a low number of deep-scrubs scheduled and
not due to conflicting per-OSD deep scrub reservations. With the OSD count we have and the
distribution over 12 servers I would expect at least a peak of 50% OSDs being active in
scrubbing instead of the 25% peak I'm seeing now. It ought to be possible to schedule
more PGs for deep scrub than actually are.

3) Every OSD having only 1 deep scrub active seems to have no measurable impact on user
IO. If I could just get more PGs scheduled with 1 deep scrub per OSD it would already help
a lot. Once this is working, I can eventually increase osd_max_scrubs when the OSDs fill
up. For now I would just like that (deep) scrub scheduling looks a bit harder and
schedules more eligible PGs per time unit.

If we can get deep scrubbing up to an average of 42PGs completing per hour with keeping
osd_max_scrubs=1 to maintain current IO impact, we should be able to complete a deep scrub
with 75% full OSDs in about 30 days. This is the current tail-time with 25% utilisation. I
believe currently a deep scrub of a PG in these pools takes 2-3 hours. Its just a gut
feeling from some repair and deep-scrub commands, I would need to check logs for more
precise info.

Increasing osd_max_scrubs would then be a further and not the only option to push for more
deep scrubbing. My expectation would be that values of 2-3 are fine due to the
increasingly higher percentage of cold data for which no interference with client IO will
happen.

Hope that makes sense and there is a way beyond bumping osd_max_scrubs to increase the
number of scheduled and executed deep scrubs.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Dan van der Ster &lt;dvanders(a)gmail.com&gt;
Sent: 05 January 2023 15:36
To: Frank Schilder
Cc: ceph-users(a)ceph.io
Subject: Re: [ceph-users] increasing number of (deep) scrubs

Hi Frank,

What is your current osd_max_scrubs, and why don't you want to increase it?
With 8+2, 8+3 pools each scrub is occupying the scrub slot on 10 or 11
OSDs, so at a minimum it could take 3-4x the amount of time to scrub
the data than if those were replicated pools.
If you want the scrub to complete in time, you need to increase the
amount of scrub slots accordingly.

On the other hand, IMHO the 1-week deadline for deep scrubs is often
much too ambitious for large clusters -- increasing the scrub
intervals is one solution, or I find it simpler to increase
mon_warn_pg_not_scrubbed_ratio and mon_warn_pg_not_deep_scrubbed_ratio
until you find a ratio that works for your cluster.
Of course, all of this can impact detection of bit-rot, which anyway
can be covered by client reads if most data is accessed periodically.
But if the cluster is mostly idle or objects are generally not read,
then it would be preferable to increase slots osd_max_scrubs.

Cheers, Dan

On Tue, Jan 3, 2023 at 2:30 AM Frank Schilder &lt;frans(a)dtu.dk&gt; wrote:
>
> Hi all,
>
> we are using 16T and 18T spinning drives as OSDs and I'm observing that they are
not scrubbed as often as I would like. It looks like too few scrubs are scheduled for
these large OSDs. My estimate is as follows: we have 852 spinning OSDs backing a 8+2 pool
with 2024 and an 8+3 pool with 8192 PGs. On average I see something like 10PGs of pool 1
and 12 PGs of pool 2 (deep) scrubbing. This amounts to only 232 out of 852 OSDs scrubbing
and seems to be due to a conservative rate of (deep) scrubs being scheduled. The PGs (dep)
scrub fairly quickly.
>
> I would like to increase gently the number of scrubs scheduled for these drives and
*not* the number of scrubs per OSD. I'm looking at parameters like:
>
> osd_scrub_backoff_ratio
> osd_deep_scrub_randomize_ratio
>
> I'm wondering if lowering osd_scrub_backoff_ratio to 0.5 and, maybe, increasing
osd_deep_scrub_randomize_ratio to 0.2 would have the desired effect? Are there other
parameters to look at that allow gradual changes in the number of scrubs going on?
>
> Thanks a lot for your help!
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io

Re: increasing number of (deep) scrubs