Hi all, I have a technical question about scrub scheduling. I replaced a disk and it is
backfilling slowly. We have set osd_scrub_during_recovery = true and still observe that
scrub times keep increasing (the number of PGs not scrubbed in time is growing
continuously). Investigating the situation, it looks like any OSD that has a PG in state
"backfill_wait" or "backfilling" prevents scrubs from being scheduled
on the PGs it is a member of. However, it does not seem to be quite that simple.
On the one hand, I have never seen a PG in a state like
"active+scrubbing+remapped+backfilling", so backfilling PGs at least never seem
to scrub. On the other hand, more PGs seem to get scrubbed than would be eligible if
*all* OSDs holding a remapped PG refused scrubs. The behaviour looks like something in
between "only OSDs with a backfilling PG block requests for scrub reservations"
and "all OSDs with a PG in state backfilling or backfill_wait block requests for
scrub reservations". Does the position in the backfill reservation queue play a
role?
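To make the two bounds concrete, here is a toy Python model of the two hypotheses. It is purely hypothetical: it is not how Ceph implements scrub reservations, and the PG ids, states and OSD sets are made-up example data. Under each hypothesis it marks the OSDs that would refuse scrub reservations and counts the PGs whose members all still accept them; the observed behaviour seems to fall somewhere between the two results.

```python
# Hypothetical toy model of two hypotheses about scrub blocking.
# This is NOT Ceph's actual scrub reservation logic; data below is invented.
from typing import Dict, FrozenSet, List, Set, Tuple

# pg_id -> (PG state flags, ids of the OSDs the PG lives on)
PGMap = Dict[str, Tuple[FrozenSet[str], List[int]]]


def blocking_osds(pgs: PGMap, blocking_states: Set[str]) -> Set[int]:
    """OSDs assumed to refuse scrub reservations under a given hypothesis."""
    blocked: Set[int] = set()
    for states, osds in pgs.values():
        if states & blocking_states:
            blocked.update(osds)
    return blocked


def scrub_eligible(pgs: PGMap, blocking_states: Set[str]) -> Set[str]:
    """PGs none of whose OSDs refuse scrub reservations."""
    blocked = blocking_osds(pgs, blocking_states)
    return {pg for pg, (_, osds) in pgs.items() if not blocked.intersection(osds)}


pgs: PGMap = {
    "1.0": (frozenset({"active", "remapped", "backfilling"}), [0, 1, 2]),
    "1.1": (frozenset({"active", "remapped", "backfill_wait"}), [3, 4, 5]),
    "1.2": (frozenset({"active", "clean"}), [2, 6, 7]),
    "1.3": (frozenset({"active", "clean"}), [5, 8, 9]),
    "1.4": (frozenset({"active", "clean"}), [6, 8, 9]),
}

# Hypothesis A: only OSDs with a backfilling PG block scrub reservations.
only_backfilling = scrub_eligible(pgs, {"backfilling"})
# Hypothesis B: OSDs with a backfilling or backfill_wait PG block them.
backfilling_or_wait = scrub_eligible(pgs, {"backfilling", "backfill_wait"})

print(sorted(only_backfilling))     # upper bound on scrub-eligible PGs
print(sorted(backfilling_or_wait))  # lower bound on scrub-eligible PGs
```

With this example data, hypothesis A leaves PGs 1.1, 1.3 and 1.4 eligible, while hypothesis B leaves only 1.4, because OSD 5 is shared with the backfill_wait PG 1.1.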
If anyone has insight into how scrub reservations are granted, and when they are refused,
while an OSD is backfilling, that would be great. My naive interpretation of
"osd_scrub_during_recovery = true" was that scrubs proceed as if no backfill were
going on. This, however, is clearly not the case. An answer to the question above
would help me a lot to estimate when things will go back to normal.
Thanks a lot and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14