Hello Paul,
thanks for your analysis.
I want to share more statistics from my cluster to follow up on your
comment "You have way too few PGs in one of the roots".
Here are the pool details:
root@ld3955:~# ceph osd pool ls detail
pool 11 'hdb_backup' replicated size 3 min_size 2 crush_rule 1
object_hash rjenkins pg_num 8192 pgp_num 8192 autoscale_mode warn
last_change 294572 flags hashpspool,selfmanaged_snaps stripe_width 0
application rbd
removed_snaps [1~3]
pool 59 'hdd' replicated size 2 min_size 2 crush_rule 3 object_hash
rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 267271
flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
removed_snaps [1~3]
pool 60 'ssd' replicated size 2 min_size 2 crush_rule 4 object_hash
rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 299719
lfor 299717/299717/299717 flags hashpspool,selfmanaged_snaps
stripe_width 0 application rbd
removed_snaps [1~3]
pool 61 'nvme' replicated size 2 min_size 2 crush_rule 2 object_hash
rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 267125 flags
hashpspool stripe_width 0 application rbd
pool 62 'cephfs_data' replicated size 3 min_size 2 crush_rule 3
object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn
last_change 300312 lfor 300310/300310/300310 flags hashpspool
stripe_width 0 application cephfs
pool 63 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 3
object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change
267069 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16
recovery_priority 5 application cephfs
Every pool's pg_num / pgp_num is monitored by Ceph, meaning I get a
warning in the log / health status if a pool's PG count is too low.
I didn't enable the PG autoscaler for any pool, though.
The number of PGs per pool was calculated with pgcalc
<https://ceph.io/pgcalc/>.
Here's a screenshot <https://ibb.co/VjR6X3x> of this calculation.
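As a back-of-the-envelope cross-check of the PG sizing, here is a rough
sketch of the PGs-per-OSD figure for hdb_backup, assuming the pool sits
alone on the (3 + 4) x 48 = 336 disks listed further below (pg_num and
replica size are from the pool details above):

```shell
#!/bin/sh
# Rough PGs-per-OSD estimate for hdb_backup.
# Assumption: these 336 OSDs carry only this pool.
pg_num=8192
size=3
osds=$(( (3 + 4) * 48 ))                # 336 OSDs backing the pool
pgs_per_osd=$(( pg_num * size / osds ))
echo "approx. PG replicas per OSD: ${pgs_per_osd}"
```

That works out to roughly 73 PG replicas per OSD for this pool alone,
which is within the commonly recommended 50-100 range.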
My focus is on pool hdb_backup.
Based on these statistics
root@ld3955:~# ceph df detail
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 1.4 PiB 744 TiB 729 TiB 730 TiB 49.53
nvme 23 TiB 23 TiB 43 GiB 51 GiB 0.22
ssd 27 TiB 25 TiB 1.9 TiB 1.9 TiB 7.15
TOTAL 1.5 PiB 792 TiB 731 TiB 732 TiB 48.02
POOLS:
    POOL            ID     STORED  OBJECTS     USED  %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES    DIRTY  USED COMPR  UNDER COMPR
    hdb_backup      11    241 TiB   63.29M  241 TiB  57.03     61 TiB            N/A          N/A   63.29M         0 B          0 B
    hdd             59    553 GiB  142.16k  553 GiB   0.50     54 TiB            N/A          N/A  142.16k         0 B          0 B
    ssd             60    2.0 TiB  530.75k  2.0 TiB   8.72     10 TiB            N/A          N/A  530.75k         0 B          0 B
    nvme            61        0 B        0      0 B      0     11 TiB            N/A          N/A        0         0 B          0 B
    cephfs_data     62    356 GiB  102.29k  356 GiB   0.32     36 TiB            N/A          N/A  102.29k         0 B          0 B
    cephfs_metadata 63    117 MiB       52  117 MiB      0     36 TiB            N/A          N/A       52         0 B          0 B
there's only 57% used, but effectively I cannot store much more data
because some OSDs are already filled above 80%.
It is true that the disks used exclusively for this pool differ in
size:
3 x 48 disks of 7.2 TB each
4 x 48 disks of 1.6 TB each
Disk usage ranges from 41% to 54% on the 7.2 TB disks and from 52% to
81% on the 1.6 TB disks.
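I understand this is why the pool looks only 57% used while being
nearly full: MAX AVAIL in ceph df is derived from the OSD that would
fill up first, not from the average. A minimal sketch of the remaining
headroom, assuming the default full ratio of 0.95 (I have not changed
it):

```shell
#!/bin/sh
# Headroom on the fullest OSD before Ceph blocks writes.
# Assumption: the cluster uses the default mon full ratio of 95%.
fullest_pct=81                 # fullest 1.6 TB OSD, from the figures above
full_ratio_pct=95              # default full ratio (assumption)
headroom_pct=$(( full_ratio_pct - fullest_pct ))
echo "headroom on the fullest OSD: ${headroom_pct}%"
```

So the fullest OSDs have only about 14% to go before the cluster starts
refusing writes, regardless of the 57% pool-level usage.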
If Ceph is not capable of rebalancing this automatically, how can I
proceed to rebalance the data manually?
OSD reweight is not an option in my opinion because it starts filling
OSDs that do not have the lowest usage rate.
Can I move PGs to specific OSDs?
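One approach I'm considering is the upmap mechanism available since
Luminous, which pins individual PGs to specific OSDs without touching
reweights. A sketch of what I'd try (the PG ID and OSD IDs below are
placeholders, not taken from my cluster):

```shell
# Requires all clients to speak at least the Luminous protocol.
ceph osd set-require-min-compat-client luminous

# Option 1: let the balancer module compute upmap entries automatically.
ceph balancer mode upmap
ceph balancer on

# Option 2: remap a single PG by hand, e.g. move one replica of PG 11.2a
# from osd.12 to osd.34 (placeholder IDs):
ceph osd pg-upmap-items 11.2a 12 34
```

Would the balancer in upmap mode be the safer route here, since it
equalizes PGs per OSD across the whole root instead of one PG at a time?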
THX
On 18.11.2019 at 20:18, Paul Emmerich wrote:
You have way too few PGs in one of the roots. Many OSDs have so few
PGs that you should see a lot of health warnings because of it.
The other root has a factor 5 difference in disk size which isn't ideal either.
Paul