Hi,
in my cluster with 7 OSD nodes I have the following disks:
Node 1: 48x 1.6 TB
Node 2: 48x 1.6 TB
Node 3: 48x 1.6 TB
Node 4: 48x 1.6 TB
Node 5: 48x 7.2 TB
Node 6: 48x 7.2 TB
Node 7: 48x 7.2 TB
The different disk sizes are reflected accordingly in the CRUSH map weights.
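(The per-OSD weights can be verified with e.g. "ceph osd crush tree" or "ceph osd df tree"; the 7.2 TB OSDs carry a correspondingly larger CRUSH weight than the 1.6 TB ones.)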
Using only these disks, I created a pool "hdb_backup" with size 3 (i.e. 3 replicas).
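(For reference, the pool was created roughly like this; the rule name "hdb_rule", the device class "hdd" and the pg_num of 8192 below are placeholders and not necessarily the exact values used:
ceph osd crush rule create-replicated hdb_rule default host hdd
ceph osd pool create hdb_backup 8192 8192 replicated hdb_rule
ceph osd pool set hdb_backup size 3
)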
Based on the output of rados df, this pool is currently using 247 TiB:
root@ld3955:~# rados df
POOL_NAME          USED     OBJECTS  CLONES     COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED    RD_OPS       RD     WR_OPS       WR  USED COMPR  UNDER COMPR
cephfs_data     345 GiB       99092       0     297276                   0        0         0      1638  811 MiB     109235  365 GiB         0 B          0 B
cephfs_metadata 102 MiB          48       0        144                   0        0         0         8    8 KiB       8588  106 MiB         0 B          0 B
hdb_backup      247 TiB    64671398       0  194014194                   0        0         0  12902005  4.3 TiB  323647757  601 TiB         0 B          0 B
hdd             2.4 TiB      635457       0    1270914                   0        0         0  13237278  321 GiB   21526953  3.0 TiB         0 B          0 B
nvme                0 B           0       0          0                   0        0         0         0      0 B          0      0 B         0 B          0 B
ssd             251 GiB       64307       0     128614                   0        0         0    615475   29 GiB     885085   55 GiB         0 B          0 B
total_objects 65470302
total_used 747 TiB
total_avail 784 TiB
total_space 1.5 PiB
In order to rebalance the data in this pool I have configured the balancer in upmap mode:
root@ld3955:~# ceph balancer status
{
"active": true,
"plans": [],
"mode": "upmap"
}
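(For reference, the balancer was enabled with commands along these lines; I am listing them from memory, so the exact invocation may have differed slightly:
ceph osd set-require-min-compat-client luminous
ceph balancer mode upmap
ceph balancer on
)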
Unfortunately the data distribution on the 1.6 TB disks is not balanced at all: the utilization ranges from 53.37% to 83.04%.
root@ld3955:~# ceph osd df | awk '{ print "osd."$1, "size: "$5, "usage: "$17, "reweight: "$4 }' | sort -nk5
osd.ID size: SIZE usage: reweight: REWEIGHT
osd.MIN/MAX size: 26.45 usage: reweight: STDDEV:
osd.TOTAL size: TiB usage: reweight: 747
osd.265 size: 1.6 usage: 53.37 reweight: 1.00000
osd.248 size: 1.6 usage: 53.41 reweight: 1.00000
osd.111 size: 1.6 usage: 53.43 reweight: 1.00000
osd.161 size: 1.6 usage: 53.46 reweight: 1.00000
osd.85 size: 1.6 usage: 53.46 reweight: 1.00000
osd.241 size: 1.6 usage: 53.49 reweight: 1.00000
osd.238 size: 1.6 usage: 53.51 reweight: 1.00000
osd.259 size: 1.6 usage: 53.56 reweight: 1.00000
osd.88 size: 1.6 usage: 53.57 reweight: 1.00000
osd.204 size: 1.6 usage: 53.58 reweight: 1.00000
osd.159 size: 1.6 usage: 55.16 reweight: 1.00000
osd.81 size: 1.6 usage: 55.16 reweight: 1.00000
osd.116 size: 1.6 usage: 55.20 reweight: 1.00000
osd.195 size: 1.6 usage: 55.25 reweight: 1.00000
osd.169 size: 1.6 usage: 55.33 reweight: 1.00000
osd.158 size: 1.6 usage: 55.34 reweight: 1.00000
[...]
osd.146 size: 1.6 usage: 79.31 reweight: 1.00000
osd.140 size: 1.6 usage: 79.34 reweight: 0.89999
osd.262 size: 1.6 usage: 79.38 reweight: 0.89999
osd.217 size: 1.6 usage: 79.48 reweight: 1.00000
osd.83 size: 1.6 usage: 79.50 reweight: 1.00000
osd.239 size: 1.6 usage: 79.52 reweight: 0.79999
osd.190 size: 1.6 usage: 80.87 reweight: 1.00000
osd.97 size: 1.6 usage: 80.95 reweight: 1.00000
osd.216 size: 1.6 usage: 80.97 reweight: 1.00000
osd.160 size: 1.6 usage: 81.03 reweight: 1.00000
osd.145 size: 1.6 usage: 81.19 reweight: 1.00000
osd.137 size: 1.6 usage: 81.20 reweight: 0.89999
osd.136 size: 1.6 usage: 81.21 reweight: 0.89999
osd.54 size: 1.6 usage: 82.88 reweight: 1.00000
osd.252 size: 1.6 usage: 83.04 reweight: 0.89999
Questions:
Why is the data distribution on the 1.6 TB disks unequal?
How can I correct this?
THX