We also suffer heavily from this, so I wrote a custom balancer which yields much better results.
After you run it, it echoes the PG movements it suggests. You can then just run those
commands and the cluster will balance further.
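The suggestions are typically plain upmap exceptions, something like this (I'm assuming upmap exceptions here; the PG IDs and OSD numbers are made up for illustration):

    # remap PG 2.1f from osd.121 to osd.45
    ceph osd pg-upmap-items 2.1f 121 45
    # remap PG 2.3a from osd.12 to osd.87
    ceph osd pg-upmap-items 2.3a 12 87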
It's still kind of a work in progress, so I'd be glad to get your feedback.
Maybe it helps you :)
On 27/01/2021 17.15, Francois Legrand wrote:
I have a cluster with 116 disks (24 new 16TB disks added in December, the rest being
8TB) running Nautilus 14.2.16.
I moved (8 months ago) from crush_compat to upmap balancing.
But the cluster seems not well balanced: the number of PGs on the 8TB disks varies
from 26 to 52! And their occupancy from 35 to 69%.
The recent 16TB disks are more homogeneous, with 48 to 61 PGs and usage between 30 and
Last week, I realized that some OSDs were maybe not using upmap, because I ran ceph osd
crush weight-set ls and got (compat) as the result.
So I ran ceph osd crush weight-set rm-compat, which triggered some rebalancing. There
has been no recovery activity for 2 days now, but the cluster is still unbalanced.
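For reference, that check and cleanup was roughly:

    ceph osd crush weight-set ls          # printed "(compat)" before the cleanup
    ceph osd crush weight-set rm-compat   # remove the compat weight-set so placement follows the raw CRUSH weights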
As far as I understand, upmap is supposed to reach an equal number of PGs on all the
disks (weighted by their capacity, I guess).
Thus I would expect more or less 30 PGs on the 8TB disks and 60 on the 16TB ones, and
around 50% usage everywhere, which is far from the case.
The problem is that this impacts the free space available in the pools (264 TiB, while
there is more than 578 TiB free in the cluster), because the pool free space seems to be
computed from the space available before the first OSD becomes full!
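For what it's worth, I'm reading these numbers from the usual status commands:

    ceph df            # per-pool MAX AVAIL, which seems capped by the fullest OSD
    ceph osd df tree   # per-OSD utilization and PG counts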
Is this normal? Did I miss something? What can I do?