The SUSE docs are pretty good for this:
https://www.suse.com/support/kb/doc/?id=000019693
Basically, raise osd-max-backfills / osd-recovery-max-active; this allows
concurrent backfills to the same device. If you watch the OSDs in Grafana you should be
able to see the underlying device utilisation and tune it until it's reasonably high
but not falling over. If you set it too high you will just end up with an OSD
that continually restarts.
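As a sketch, the tuning above can be applied like this (the values 4 and 8 are illustrative starting points, not recommendations; note that on Quincy and later the default mClock scheduler may override these unless you allow manual recovery settings):

```shell
# Illustrative values only -- watch device utilisation and adjust.
# Allow more concurrent backfills per OSD (default is 1):
ceph config set osd osd_max_backfills 4
ceph config set osd osd_recovery_max_active 8

# Or inject at runtime into all running OSDs without a restart:
ceph tell 'osd.*' injectargs '--osd-max-backfills 4 --osd-recovery-max-active 8'

# Check the effective value on a single OSD:
ceph config show osd.123 osd_max_backfills
```

Raise the values in small steps and keep an eye on `ceph -s` and the OSD logs; if OSDs start flapping, back off.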
________________________________
From: Peter <petersun(a)raksmart.com>
Sent: 26 July 2023 17:19
To: ceph-users(a)ceph.io <ceph-users(a)ceph.io>
Subject: [ceph-users] PG backfilled slow
Hi all,
I need to replace some disks due to bad sectors. I have crushed these disks out, and Ceph
did the backfilling and migrated the data as I wanted. However, after a day of waiting I can
see these OSDs still have one or more PGs left, and backfilling is really slow. There is now
only one PG backfilling at a time.
host001:~# ceph osd df
ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
122  hdd    9.37500         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0    1  up
123  hdd    9.37500   1.00000  9.4 TiB  2.1 TiB  1.8 TiB  224 KiB  4.7 GiB  7.3 TiB  22.06  0.69   64  up
124  hdd    9.37500   1.00000  9.4 TiB  2.0 TiB  1.7 TiB  211 KiB  4.4 GiB  7.4 TiB  21.14  0.67   61  up
125  hdd    9.37500   1.00000  9.4 TiB  2.2 TiB  1.9 TiB  218 KiB  5.0 GiB  7.2 TiB  22.94  0.72   67  up
126  hdd    9.37500   1.00000  9.4 TiB  2.3 TiB  2.0 TiB  235 KiB  4.7 GiB  7.1 TiB  24.50  0.77   72  up
127  hdd    9.37500   1.00000  9.4 TiB  2.4 TiB  2.1 TiB  248 KiB  5.5 GiB  6.9 TiB  25.91  0.82   77  up
128  hdd    9.37500   1.00000  9.4 TiB  2.2 TiB  1.9 TiB  349 KiB  5.0 GiB  7.2 TiB  23.52  0.74   69  up
129  hdd    9.37500   1.00000  9.4 TiB  2.1 TiB  1.8 TiB  216 KiB  4.6 GiB  7.3 TiB  22.62  0.71   66  up
130  hdd    9.37500   1.00000  9.4 TiB  2.5 TiB  2.2 TiB  244 KiB  5.3 GiB  6.9 TiB  26.51  0.83   79  up
131  hdd    9.37500   1.00000  9.4 TiB  2.1 TiB  1.8 TiB  230 KiB  4.0 GiB  7.3 TiB  22.09  0.70   64  up
132  hdd    9.37500   1.00000  9.4 TiB  2.2 TiB  2.0 TiB  231 KiB  5.1 GiB  7.1 TiB  23.93  0.75   70  up
133  hdd    9.37500   1.00000  9.4 TiB  2.7 TiB  2.4 TiB  479 KiB  6.1 GiB  6.7 TiB  28.92  0.91   87  up
134  hdd    9.37500   1.00000  9.4 TiB  2.3 TiB  2.1 TiB  225 KiB  4.9 GiB  7.0 TiB  25.02  0.79   74  up
135  hdd    9.37500   1.00000  9.4 TiB  2.0 TiB  1.7 TiB  395 KiB  4.5 GiB  7.4 TiB  21.46  0.68   62  up
136  hdd    9.37500   1.00000  9.4 TiB  2.8 TiB  2.5 TiB  294 KiB  5.6 GiB  6.6 TiB  29.52  0.93   89  up
137  hdd    9.37500         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0    2  up
138  hdd    9.37500         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0    5  up
139  hdd    9.37500   1.00000  9.4 TiB  2.4 TiB  2.2 TiB  259 KiB  5.3 GiB  6.9 TiB  25.94  0.82   77  up
140  hdd    9.37500   1.00000  9.4 TiB  2.5 TiB  2.2 TiB  355 KiB  4.8 GiB  6.9 TiB  26.86  0.85   80  up
141  hdd    9.37500         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0    1  up
142  hdd    9.37500   1.00000  9.4 TiB  2.6 TiB  2.3 TiB  1.6 GiB  4.9 GiB  6.8 TiB  27.43  0.86   83  up
143  hdd    9.37500   1.00000  9.4 TiB  2.7 TiB  2.4 TiB  276 KiB  5.7 GiB  6.7 TiB  28.64  0.90   86  up
144  hdd    9.37500   1.00000  9.4 TiB  2.5 TiB  2.2 TiB  256 KiB  5.5 GiB  6.9 TiB  26.77  0.84   80  up
145  hdd    9.37500   1.00000  9.4 TiB  2.3 TiB  2.0 TiB  248 KiB  5.0 GiB  7.1 TiB  24.46  0.77   72  up
146  hdd    9.37500         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0    1  up
147  hdd    9.37500   1.00000  9.4 TiB  2.2 TiB  1.9 TiB  237 KiB  5.1 GiB  7.2 TiB  23.53  0.74   69  up
148  hdd    9.37500         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0    1  up
host001:~# ceph pg dump_stuck
PG_STAT  STATE                          UP             UP_PRIMARY  ACTING         ACTING_PRIMARY
5.3fd    active+remapped+backfill_wait  [145,158,151]  145         [145,151,126]  145
5.3a8    active+remapped+backfilling    [136,133,158]  136         [136,133,167]  136
5.2e0    active+remapped+backfill_wait  [147,158,135]  147         [147,135,166]  147
5.294    active+remapped+backfill_wait  [147,128,164]  147         [147,128,138]  147
5.ef     active+remapped+backfill_wait  [123,134,158]  123         [123,148,137]  123
5.116    active+remapped+backfill_wait  [123,166,145]  123         [123,166,138]  123
5.1e8    active+remapped+backfill_wait  [127,158,157]  127         [127,157,161]  127
5.106    active+remapped+backfill_wait  [124,158,144]  124         [124,144,167]  124
5.1c     active+remapped+backfill_wait  [128,158,155]  128         [128,155,140]  128
5.2ef    active+remapped+backfill_wait  [128,163,153]  128         [128,163,137]  128
5.1e0    active+remapped+backfill_wait  [129,158,153]  129         [129,153,162]  129
5.1d2    active+remapped+backfill_wait  [128,168,149]  128         [128,168,146]  128
5.167    active+remapped+backfill_wait  [129,142,158]  129         [129,142,168]  129
5.f1     active+remapped+backfill_wait  [124,147,158]  124         [124,147,168]  124
5.2c     active+remapped+backfill_wait  [129,159,154]  129         [129,159,141]  129
5.12b    active+remapped+backfill_wait  [128,169,157]  128         [128,169,138]  128
5.3eb    active+remapped+backfill_wait  [136,158,149]  136         [136,149,127]  136
5.6e     active+remapped+backfill_wait  [136,168,152]  136         [136,168,122]  136
5.3d8    active+remapped+backfill_wait  [124,147,134]  124         [124,147,138]  124
5.b4     active+remapped+backfill_wait  [123,142,166]  123         [123,166,138]  123
5.1f5    active+remapped+backfill_wait  [145,153,158]  145         [145,153,169]  145
5.19c    active+remapped+backfill_wait  [129,158,151]  129         [129,151,164]  129
5.b3     active+remapped+backfill_wait  [124,143,158]  124         [124,143,155]  124
5.108    active+remapped+backfill_wait  [136,158,133]  136         [136,133,153]  136
Can anyone suggest how to speed up this process?
Thanks,
Peter
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io