Partially answering my own question. I think it is possible to tweak the existing
parameters to achieve what I'm looking for on average. The main reason I want to use
the internal scheduler is the high number of PGs on some pools, which I actually intend to
increase even further. For such pools a simple calculation shows that it is impractical to
do manual scrubbing with cron, I simply cannot execute cron jobs often enough to achieve a
reasonable scrub distribution (looking at a script like
https://gist.github.com/ethaniel/5db696d9c78516308b235b0cb904e4ad).
Looking at scrub stamp distributions for specific pools, PGs with old deep-scrub time
stamps tend to correlate with PGs with old scrub stamps. The idea now is to
tweak scrub_min_interval such that the scrub scheduler is forced to select PGs out of the
20-30% with the oldest scrub stamps. This should imply that, after a reasonable time
interval, the age of the oldest deep-scrub stamps is reduced, because PGs that have not
been deep-scrubbed for a long time become much more likely to be scheduled for a deep-scrub.
This adjustment is done together with making osd_deep_scrub_randomize_ratio and
osd_scrub_backoff_ratio more aggressive, to cycle more frequently through the small list of
PGs eligible for scrubbing. This is a bit like the reverse calculation for achieving the
effect of a (not implemented) deep_scrub_min_interval.
I made the following global changes (and hope the parameters do something like what their
documentation says):
global advanced osd_deep_scrub_randomize_ratio 0.330000
global dev osd_scrub_backoff_ratio 0.500000
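For reference, I applied these via the centralized config database (this assumes a release that supports `ceph config set`; the values match the table above):

```shell
# make a larger fraction of scrubs deep-scrubs, and let the scheduler
# back off less often between scrub attempts
ceph config set global osd_deep_scrub_randomize_ratio 0.33
ceph config set global osd_scrub_backoff_ratio 0.5
```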
With this setting, about 33% of all scrubs should be deep-scrubs, meaning that on average
a PG is also deep-scrubbed after every 3 scrub events. This leads to the following
estimate for the expected deep-scrub interval:
given that the scrub interval is scrub_min_interval*[1, 1+osd_scrub_interval_randomize_ratio] =
scrub_min_interval*[1, 1.5], the expected deep-scrub interval is (assuming worst-case
realisation of the randomize ratio for the upper value):
scrub_min_interval*[1, 1.5]*3 = scrub_min_interval*[3, 4.5]
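As a quick sanity check of this estimate (a Monte-Carlo sketch only, not Ceph code): with scrub delays drawn uniformly from [1, 1.5] scrub_min_intervals and each scrub promoted to a deep-scrub with probability 0.33, the average deep-scrub interval should come out near 1.25/0.33 ≈ 3.8, inside the [3, 4.5] window:

```shell
# Monte-Carlo check: scrub delay uniform in [1, 1.5] (in units of
# scrub_min_interval), each scrub is a deep-scrub with probability 0.33.
awk 'BEGIN {
    srand(42)
    n = 200000; total = 0; deep = 0
    for (i = 0; i < n; ++i) {
        total += 1 + 0.5*rand()      # time until the next scrub
        if (rand() < 0.33) ++deep    # scrub promoted to deep-scrub
    }
    printf("%.2f\n", total/deep)     # expect about 1.25/0.33 ~ 3.8
}'
```

Note that the number of scrubs between deep-scrubs is geometric, so its tail decays exponentially; this suggests that really long tails would come from busy OSDs rather than from the randomization alone.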
I can't calculate what the tail will look like, but I hope it's not a fat tail. I will
report back what I observe; see below.
The idea now is to tune scrub_min_interval per pool such that only about 20-30% of PGs
have a scrub stamp older than scrub_min_interval. The scheduler will then cycle only through
these, and a bit faster than by default. As the stamp histograms included below indicate, the
distribution is probably very sensitive to changes of this interval. I have now changed these
values on some pools and already see that PGs with much older deep-scrub stamps are
selected for deep-scrubbing. I will observe what these settings converge to and report
back. It seems that this will lead to an improved stamp distribution, and one only needs to
issue manual deep-scrubs for the very few PGs that are outliers of the random number generator
(that's the tail I talked about above). My goal is to have a script schedule a
deep-scrub on the outliers no more often than daily.
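To pick such a per-pool scrub_min_interval, one needs the stamp age below which about 70% of PGs fall. A minimal sketch of that quantile computation (the function name and the toy input are made up; in practice the ages-in-hours would come from a jq extraction like the one in the report script below):

```shell
# Read one "scrub stamp age in hours" per line; print the age below which
# (1 - eligible_fraction) of the PGs fall. Use that as a candidate
# scrub_min_interval, so only eligible_fraction of PGs remain eligible.
eligible_cutoff() { # usage: ... | eligible_cutoff <eligible_fraction>
    sort -n | awk -v f="$1" '
        { a[NR] = $1 }
        END { print a[int(NR*(1-f)) + 1] }   # the (1-f) quantile of the ages
    '
}

# toy example: 10 PGs with scrub stamps 1..10 hours old
seq 1 10 | eligible_cutoff 0.3   # prints 8
```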
The reports below were pulled about 2-3h after the changed settings were applied.
There is already improvement in the right direction, but the original distribution issue
is still very pronounced.
Here are two per-pool scrub stamp distributions, for an SSD pool and an HDD pool, both with
a large number of PGs per OSD:
=== SSD pool:
Scrub info for pool sr-rbd-data-one (id=2): dumped pgs
Scrub report:
22% 941 PGs not scrubbed since 1 intervals ( 6h)
42% 795 PGs not scrubbed since 2 intervals ( 12h)
62% 829 PGs not scrubbed since 3 intervals ( 18h)
82% 823 PGs not scrubbed since 4 intervals ( 24h)
96% 576 PGs not scrubbed since 5 intervals ( 30h)
100% 132 PGs not scrubbed since 6 intervals ( 36h)
4096 PGs out of 4096 reported, 0 missing.
Deep-scrub report:
13% 545 PGs not deep-scrubbed since 1 intervals ( 24h)
25% 508 PGs not deep-scrubbed since 2 intervals ( 48h) 1 scrubbing+deep
45% 797 PGs not deep-scrubbed since 3 intervals ( 72h) 1 scrubbing+deep
59% 587 PGs not deep-scrubbed since 4 intervals ( 96h)
70% 463 PGs not deep-scrubbed since 5 intervals (120h)
78% 312 PGs not deep-scrubbed since 6 intervals (144h)
84% 263 PGs not deep-scrubbed since 7 intervals (168h)
89% 173 PGs not deep-scrubbed since 8 intervals (192h)
92% 151 PGs not deep-scrubbed since 9 intervals (216h)
95% 106 PGs not deep-scrubbed since 10 intervals (240h)
96% 55 PGs not deep-scrubbed since 11 intervals (264h)
97% 50 PGs not deep-scrubbed since 12 intervals (288h)
98% 44 PGs not deep-scrubbed since 13 intervals (312h)
99% 24 PGs not deep-scrubbed since 14 intervals (336h)
100% 18 PGs not deep-scrubbed since 15 intervals (360h)
4096 PGs out of 4096 reported, 0 missing.
PGs marked with a * are on busy OSDs and not eligible for scrubbing.
sr-rbd-data-one scrub_min_interval=0h
sr-rbd-data-one scrub_max_interval=0h
sr-rbd-data-one deep_scrub_interval=0h
===
Here we see that after 24h 82% of PGs are scrubbed, but we have quite a tail of not
deep-scrubbed PGs. It's long enough to trigger a warning with default parameters. In this
case, reducing scrub_min_interval to a value around 18-20h could shorten the tail enough.
The alternative is simply to schedule a deep-scrub on the oldest PGs manually (cron). Such
a scrub would start immediately, since no OSDs of this pool are allocated to scrubbing/recovery.
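Such a cron job could look roughly like this (an untested sketch; the pool id, the limit of 20 PGs per run and the use of jq are my choices, while `ceph pg deep-scrub` itself is a standard command):

```shell
#!/bin/bash
# Issue a manual deep-scrub for the N PGs of one pool with the oldest
# deep-scrub stamps. Intended to run at most once daily from cron.
POOL_ID=2        # numeric id of the pool (here: sr-rbd-data-one)
N_OUTLIERS=20    # how many outlier PGs to deep-scrub per run

ceph -f json pg dump pgs 2>/dev/null |
  jq -r --arg p "$POOL_ID" '
    .pg_stats[]
    | select(.pgid | startswith($p + "."))
    | [ .last_deep_scrub_stamp, .pgid ] | @tsv' |
  sort | head -n "$N_OUTLIERS" | cut -f2 |
  while read -r pgid; do
      ceph pg deep-scrub "$pgid"
  done
```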
=== HDD pool:
Scrub info for pool con-fs2-data2 (id=19): dumped pgs
Scrub report:
11% 939 PGs not scrubbed since 1 intervals ( 6h)
22% 936 PGs not scrubbed since 2 intervals ( 12h)
33% 874 PGs not scrubbed since 3 intervals ( 18h)
43% 821 PGs not scrubbed since 4 intervals ( 24h)
54% 931 PGs not scrubbed since 5 intervals ( 30h)
64% 766 PGs not scrubbed since 6 intervals ( 36h)
72% 646 PGs not scrubbed since 7 intervals ( 42h)
79% 559 PGs not scrubbed since 8 intervals ( 48h)
84% 411 PGs not scrubbed since 9 intervals ( 54h)
88% 346 PGs not scrubbed since 10 intervals ( 60h)
90% 222 PGs not scrubbed since 11 intervals ( 66h)
93% 213 PGs not scrubbed since 12 intervals ( 72h)
95% 160 PGs not scrubbed since 13 intervals ( 78h)
96% 87 PGs not scrubbed since 14 intervals ( 84h)
97% 77 PGs not scrubbed since 15 intervals ( 90h)
98% 57 PGs not scrubbed since 16 intervals ( 96h) 1 scrubbing
98% 42 PGs not scrubbed since 17 intervals (102h)
99% 32 PGs not scrubbed since 18 intervals (108h)
99% 19 PGs not scrubbed since 19 intervals (114h)
99% 19 PGs not scrubbed since 20 intervals (120h)
99% 10 PGs not scrubbed since 21 intervals (126h)
99% 1 PGs not scrubbed since 22 intervals (132h) 19.165f*
99% 5 PGs not scrubbed since 24 intervals (138h) 19.412* 19.75c* 19.140f* 19.134c* 19.fb7*
99% 5 PGs not scrubbed since 25 intervals (144h) 19.1714* 19.148d* 19.1fa9* 19.1f05* 19.1cda*
99% 1 PGs not scrubbed since 26 intervals (150h) 19.a3f*
99% 1 PGs not scrubbed since 27 intervals (156h) 19.a01*
99% 3 PGs not scrubbed since 28 intervals (162h) 19.12f2* 19.1284* 19.c90*
99% 1 PGs not scrubbed since 29 intervals (168h)
99% 1 PGs not scrubbed since 30 intervals (174h) 19.f13*
99% 2 PGs not scrubbed since 32 intervals (180h) 19.1f87* 19.67b*
99% 2 PGs not scrubbed since 36 intervals (186h) 19.133f* 19.1318*
99% 2 PGs not scrubbed since 40 intervals (192h) 19.12f4* 19.248*
100% 1 PGs not scrubbed since 43 intervals (198h) 19.1984*
8192 PGs out of 8192 reported, 0 missing.
Deep-scrub report:
14% 1210 PGs not deep-scrubbed since 1 intervals ( 24h)
28% 1136 PGs not deep-scrubbed since 2 intervals ( 48h) 1 scrubbing+deep
40% 985 PGs not deep-scrubbed since 3 intervals ( 72h) 4 scrubbing+deep
51% 851 PGs not deep-scrubbed since 4 intervals ( 96h) 5 scrubbing+deep
59% 713 PGs not deep-scrubbed since 5 intervals (120h) 4 scrubbing+deep
63% 276 PGs not deep-scrubbed since 6 intervals (144h) 2 scrubbing+deep
70% 566 PGs not deep-scrubbed since 7 intervals (168h) 1 scrubbing+deep
76% 534 PGs not deep-scrubbed since 8 intervals (192h) 2 scrubbing+deep
82% 480 PGs not deep-scrubbed since 9 intervals (216h) 2 scrubbing+deep
87% 381 PGs not deep-scrubbed since 10 intervals (240h) 2 scrubbing+deep
90% 253 PGs not deep-scrubbed since 11 intervals (264h) 1 scrubbing+deep
92% 222 PGs not deep-scrubbed since 12 intervals (288h) 1 scrubbing+deep
94% 136 PGs not deep-scrubbed since 13 intervals (312h) 1 scrubbing+deep
96% 179 PGs not deep-scrubbed since 14 intervals (336h) 3 scrubbing+deep
98% 156 PGs not deep-scrubbed since 15 intervals (360h) 6 scrubbing+deep
99% 65 PGs not deep-scrubbed since 16 intervals (384h) 3 scrubbing+deep
99% 31 PGs not deep-scrubbed since 17 intervals (408h) 4 scrubbing+deep
99% 14 PGs not deep-scrubbed since 18 intervals (432h) 3 scrubbing+deep
99% 3 PGs not deep-scrubbed since 19 intervals (456h) 19.1d89* 19.fb7* 19.807*
100% 1 PGs not deep-scrubbed since 21 intervals (480h) 19.a01*
8192 PGs out of 8192 reported, 0 missing.
PGs marked with a * are on busy OSDs and not eligible for scrubbing.
con-fs2-data2 scrub_min_interval=42h
con-fs2-data2 scrub_max_interval=0h
con-fs2-data2 deep_scrub_interval=0h
===
This is the real deal, the pool I'm fighting with at the moment. I made a small change
of scrub_min_interval (pool setting) from 24h to 42h, which resulted in the much better
deep-scrub state allocation of the PGs in this pool. With scrub_min_interval=24h, basically
all scrubbing happened on PGs that had last been deep-scrubbed only 1-6 days ago. After
increasing this value to the time interval within which about 70% of PGs were scrubbed
(leaving 30% eligible), the allocation of deep-scrub states is much, much better. I expect
both tails to get shorter and the overall deep-scrub load to go down as well. I hope to
reach a state where I only need to issue a few deep-scrubs manually per day to get
everything scrubbed within 1 week and deep-scrubbed within 3-4 weeks.
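For reference, the per-pool change corresponds to something like the following (to my knowledge the pool-level scrub_min_interval is given in seconds, so 42h = 151200s; please verify against your release's documentation):

```shell
# raise this pool's scrub_min_interval from 24h to 42h (value in seconds)
ceph osd pool set con-fs2-data2 scrub_min_interval 151200
ceph osd pool get con-fs2-data2 scrub_min_interval
```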
For now I will wait and see what effect the global settings have on the SSD pools and what
the HDD pool converges to. This will need 1-2 months of observation and I will report back
when significant changes show up.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Frank Schilder <frans(a)dtu.dk>
Sent: Wednesday, November 15, 2023 11:14 AM
To: ceph-users(a)ceph.io
Subject: [ceph-users] How to configure something like osd_deep_scrub_min_interval?
Hi folks,
I am fighting a bit with odd deep-scrub behavior on HDDs and discovered a likely cause of
why the distribution of last_deep_scrub_stamps is so weird. I wrote a small script to
extract a histogram of scrubs by "days not scrubbed" (more precisely, intervals
not scrubbed; see code) to find out how (deep-) scrub times are distributed. Output
below.
What I expected is along the lines that HDD-OSDs try to scrub every 1-3 days, while they
try to deep-scrub every 7-14 days. In other words, OSDs that have been deep-scrubbed
within the last 7 days would *never* be in scrubbing+deep state. However, what I see is
completely different. There seems to be no distinction between scrub- and deep-scrub start
times. This is really unexpected as nobody would try to deep-scrub HDDs every day. Weekly
to bi-weekly is normal, specifically for large drives.
Is there a way to configure something like osd_deep_scrub_min_interval (no, I don't
want to run cron jobs for scrubbing yet)? In the output below, I would like to be able to
configure a minimum period of 1-2 weeks before the next deep-scrub happens. How can I do
that?
The observed behavior is very unusual for RAID systems (if it's not a bug in the report
script). With this behavior it's not surprising that people complain about "not
deep-scrubbed in time" messages and too-high deep-scrub IO load, when such a large
percentage of PGs is needlessly deep-scrubbed again after only 1-6 days.
Sample output:
# scrub-report
dumped pgs
Scrub report:
4121 PGs not scrubbed since 1 intervals (6h)
3831 PGs not scrubbed since 2 intervals (6h)
4012 PGs not scrubbed since 3 intervals (6h)
3986 PGs not scrubbed since 4 intervals (6h)
2998 PGs not scrubbed since 5 intervals (6h)
1488 PGs not scrubbed since 6 intervals (6h)
909 PGs not scrubbed since 7 intervals (6h)
771 PGs not scrubbed since 8 intervals (6h)
582 PGs not scrubbed since 9 intervals (6h) 2 scrubbing
431 PGs not scrubbed since 10 intervals (6h)
333 PGs not scrubbed since 11 intervals (6h) 1 scrubbing
265 PGs not scrubbed since 12 intervals (6h)
195 PGs not scrubbed since 13 intervals (6h)
116 PGs not scrubbed since 14 intervals (6h)
78 PGs not scrubbed since 15 intervals (6h) 1 scrubbing
72 PGs not scrubbed since 16 intervals (6h)
37 PGs not scrubbed since 17 intervals (6h)
      5 PGs not scrubbed since 18 intervals (6h) 14.237* 19.5cd* 19.12cc* 19.1233* 14.40e*
33 PGs not scrubbed since 20 intervals (6h)
23 PGs not scrubbed since 21 intervals (6h)
16 PGs not scrubbed since 22 intervals (6h)
12 PGs not scrubbed since 23 intervals (6h)
8 PGs not scrubbed since 24 intervals (6h)
2 PGs not scrubbed since 25 intervals (6h) 19.eef* 19.bb3*
4 PGs not scrubbed since 26 intervals (6h) 19.b4c* 19.10b8* 19.f13* 14.1ed*
      5 PGs not scrubbed since 27 intervals (6h) 19.43f* 19.231* 19.1dbe* 19.1788* 19.16c0*
6 PGs not scrubbed since 28 intervals (6h)
2 PGs not scrubbed since 30 intervals (6h) 19.10f6* 14.9d*
3 PGs not scrubbed since 31 intervals (6h) 19.1322* 19.1318* 8.a*
1 PGs not scrubbed since 32 intervals (6h) 19.133f*
1 PGs not scrubbed since 33 intervals (6h) 19.1103*
3 PGs not scrubbed since 36 intervals (6h) 19.19cc* 19.12f4* 19.248*
1 PGs not scrubbed since 39 intervals (6h) 19.1984*
1 PGs not scrubbed since 41 intervals (6h) 14.449*
1 PGs not scrubbed since 44 intervals (6h) 19.179f*
Deep-scrub report:
3723 PGs not deep-scrubbed since 1 intervals (24h)
4621 PGs not deep-scrubbed since 2 intervals (24h) 8 scrubbing+deep
3588 PGs not deep-scrubbed since 3 intervals (24h) 8 scrubbing+deep
2929 PGs not deep-scrubbed since 4 intervals (24h) 3 scrubbing+deep
1705 PGs not deep-scrubbed since 5 intervals (24h) 4 scrubbing+deep
1904 PGs not deep-scrubbed since 6 intervals (24h) 5 scrubbing+deep
1540 PGs not deep-scrubbed since 7 intervals (24h) 7 scrubbing+deep
1304 PGs not deep-scrubbed since 8 intervals (24h) 7 scrubbing+deep
923 PGs not deep-scrubbed since 9 intervals (24h) 5 scrubbing+deep
557 PGs not deep-scrubbed since 10 intervals (24h) 7 scrubbing+deep
501 PGs not deep-scrubbed since 11 intervals (24h) 2 scrubbing+deep
363 PGs not deep-scrubbed since 12 intervals (24h) 2 scrubbing+deep
377 PGs not deep-scrubbed since 13 intervals (24h) 1 scrubbing+deep
383 PGs not deep-scrubbed since 14 intervals (24h) 2 scrubbing+deep
252 PGs not deep-scrubbed since 15 intervals (24h) 2 scrubbing+deep
116 PGs not deep-scrubbed since 16 intervals (24h) 5 scrubbing+deep
47 PGs not deep-scrubbed since 17 intervals (24h) 2 scrubbing+deep
10 PGs not deep-scrubbed since 18 intervals (24h)
2 PGs not deep-scrubbed since 19 intervals (24h) 19.1c6c* 19.a01*
1 PGs not deep-scrubbed since 20 intervals (24h) 14.1ed*
2 PGs not deep-scrubbed since 21 intervals (24h) 19.1322* 19.10f6*
1 PGs not deep-scrubbed since 23 intervals (24h) 19.19cc*
1 PGs not deep-scrubbed since 24 intervals (24h) 19.179f*
PGs marked with a * are on busy OSDs and not eligible for scrubbing.
The script (pasted here because attaching doesn't work):
# cat bin/scrub-report
#!/bin/bash
# Compute last scrub interval count. Scrub interval 6h, deep-scrub interval 24h.
# Print how many PGs have not been (deep-)scrubbed since #intervals.
ceph -f json pg dump pgs 2>&1 > /root/.cache/ceph/pgs_dump.json
echo ""
T0="$(date +%s)"
scrub_info="$(jq --arg T0 "$T0" -rc '.pg_stats[] | [
.pgid,
(.last_scrub_stamp[:19]+"Z" | (($T0|tonumber) -
fromdateiso8601)/(60*60*6)|ceil),
(.last_deep_scrub_stamp[:19]+"Z" | (($T0|tonumber) -
fromdateiso8601)/(60*60*24)|ceil),
.state,
(.acting | join(" "))
] | @tsv
' /root/.cache/ceph/pgs_dump.json)"
# less <<<"$scrub_info"
# 1 2 3 4 5..NF
# pg_id scrub-ints deep-scrub-ints status acting[]
awk <<<"$scrub_info" '{
for(i=5; i<=NF; ++i) pg_osds[$1]=pg_osds[$1] " " $i
if($4 == "active+clean") {
si_mx=si_mx<$2 ? $2 : si_mx
dsi_mx=dsi_mx<$3 ? $3 : dsi_mx
pg_sn[$2]++
pg_sn_ids[$2]=pg_sn_ids[$2] " " $1
pg_dsn[$3]++
pg_dsn_ids[$3]=pg_dsn_ids[$3] " " $1
} else if($4 ~ /scrubbing\+deep/) {
deep_scrubbing[$3]++
for(i=5; i<=NF; ++i) osd[$i]="busy"
} else if($4 ~ /scrubbing/) {
scrubbing[$2]++
for(i=5; i<=NF; ++i) osd[$i]="busy"
} else {
unclean[$2]++
unclean_d[$3]++
si_mx=si_mx<$2 ? $2 : si_mx
dsi_mx=dsi_mx<$3 ? $3 : dsi_mx
pg_sn[$2]++
pg_sn_ids[$2]=pg_sn_ids[$2] " " $1
pg_dsn[$3]++
pg_dsn_ids[$3]=pg_dsn_ids[$3] " " $1
for(i=5; i<=NF; ++i) osd[$i]="busy"
}
}
END {
print "Scrub report:"
for(si=1; si<=si_mx; ++si) {
if(pg_sn[si]==0 && scrubbing[si]==0 && unclean[si]==0)
continue;
printf("%7d PGs not scrubbed since %2d intervals (6h)",
pg_sn[si], si)
if(scrubbing[si]) printf(" %d scrubbing", scrubbing[si])
if(unclean[si]) printf(" %d unclean", unclean[si])
if(pg_sn[si]<=5) {
split(pg_sn_ids[si], pgs)
for(pg in pgs) {
# reset the flag per PG; otherwise one busy PG stars all following PGs
osds_busy=0
split(pg_osds[pgs[pg]], osds)
for(o in osds) if(osd[osds[o]]=="busy") osds_busy=1
if(osds_busy) printf(" %s*", pgs[pg])
else printf(" %s", pgs[pg])
}
}
printf("\n")
}
print ""
print "Deep-scrub report:"
for(dsi=1; dsi<=dsi_mx; ++dsi) {
if(pg_dsn[dsi]==0 && deep_scrubbing[dsi]==0 &&
unclean_d[dsi]==0) continue;
printf("%7d PGs not deep-scrubbed since %2d intervals (24h)",
pg_dsn[dsi], dsi)
if(deep_scrubbing[dsi]) printf(" %d scrubbing+deep",
deep_scrubbing[dsi])
if(unclean_d[dsi]) printf(" %d unclean", unclean_d[dsi])
if(pg_dsn[dsi]<=5) {
split(pg_dsn_ids[dsi], pgs)
for(pg in pgs) {
# reset the flag per PG; otherwise one busy PG stars all following PGs
osds_busy=0
split(pg_osds[pgs[pg]], osds)
for(o in osds) if(osd[osds[o]]=="busy") osds_busy=1
if(osds_busy) printf(" %s*", pgs[pg])
else printf(" %s", pgs[pg])
}
}
printf("\n")
}
print ""
print "PGs marked with a * are on busy OSDs and not eligible for scrubbing."
}
'
Don't forget the last "'" when copy-pasting.
Thanks for any pointers.
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io