Hi Thomas,
For a 100% even byte distribution of data across the OSDs, you should set up the Ceph
balancer in "byte" mode, not in PG mode.
That change will bring all OSDs to the same percentage of usage, but the object/PG
counts per OSD will not end up balanced.
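If you want to try that byte-based balancing, a rough sketch would be the following
(I'm assuming the crush-compat metrics option name here from memory, please
double-check it against the Nautilus docs):

  # switch the balancer module to crush-compat mode
  ceph balancer mode crush-compat
  # weight the optimization by bytes only (option name is my assumption, verify it)
  ceph config set mgr mgr/balancer/crush_compat_metrics bytes
  ceph balancer on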
After several weeks and months of testing the balancer, the best profile is balancing
by PG with upmap.
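In practice that means something like:

  # upmap requires all clients to speak at least luminous
  ceph osd set-require-min-compat-client luminous
  ceph balancer mode upmap
  ceph balancer on
  ceph balancer status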
In PG mode you will always get (at least until the balancer gets a better algorithm) a
data distribution that is not perfectly equal, and you sometimes have to manually
redistribute weights via the CLI.
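By "redistribute weights via the CLI" I mean the usual knobs, for example (the OSD id
and weight below are only examples):

  # dry run: show which OSDs would get a new reweight
  ceph osd test-reweight-by-utilization
  # apply it
  ceph osd reweight-by-utilization
  # or adjust a single overfull OSD by hand
  ceph osd reweight 123 0.95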
From Nautilus onwards you can also manage the balancer directly from the Dashboard.
Note that the balancer is not an "active" agent that is consulted before data is
written to disk; Ceph stores the data first, and the balancer moves objects afterwards.
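You can watch it doing that afterwards, e.g.:

  # score of the current distribution (lower is better)
  ceph balancer eval
  # build, inspect and run an optimization plan by hand ("myplan" is just an example name)
  ceph balancer optimize myplan
  ceph balancer show myplan
  ceph balancer execute myplan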
Regards
Manuel
-----Original Message-----
From: Thomas <74cmonty(a)gmail.com>
Sent: Monday, 23 September 2019 11:08
To: ceph-users(a)ceph.io
Subject: [ceph-users] OSD rebalancing issue - should drives be distributed equally over all
nodes
Hi,
I'm facing several issues with my ceph cluster (2x MDS, 6x OSD nodes).
Here I would like to focus on the issue with pgs backfill_toofull.
I assume this is related to the fact that the data distribution on my OSDs is not
balanced.
This is the current ceph status:
root@ld3955:~# ceph -s
  cluster:
    id:     6b1b5117-6e08-4843-93d6-2da3cf8a6bae
    health: HEALTH_ERR
            1 MDSs report slow metadata IOs
            78 nearfull osd(s)
            1 pool(s) nearfull
            Reduced data availability: 2 pgs inactive, 2 pgs peering
            Degraded data redundancy: 304136/153251211 objects degraded (0.198%), 57 pgs degraded, 57 pgs undersized
            Degraded data redundancy (low space): 265 pgs backfill_toofull
            3 pools have too many placement groups
            74 slow requests are blocked > 32 sec
            80 stuck requests are blocked > 4096 sec

  services:
    mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 98m)
    mgr: ld5505(active, since 3d), standbys: ld5506, ld5507
    mds: pve_cephfs:1 {0=ld3976=up:active} 1 up:standby
    osd: 368 osds: 368 up, 367 in; 302 remapped pgs

  data:
    pools:   5 pools, 8868 pgs
    objects: 51.08M objects, 195 TiB
    usage:   590 TiB used, 563 TiB / 1.1 PiB avail
    pgs:     0.023% pgs not active
             304136/153251211 objects degraded (0.198%)
             1672190/153251211 objects misplaced (1.091%)
             8564 active+clean
             196  active+remapped+backfill_toofull
             57   active+undersized+degraded+remapped+backfill_toofull
             35   active+remapped+backfill_wait
             12   active+remapped+backfill_wait+backfill_toofull
             2    active+remapped+backfilling
             2    peering

  io:
    recovery: 18 MiB/s, 4 objects/s
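For reference, the per-OSD fill levels behind the nearfull/backfill_toofull warnings
can be listed with, for example:

  ceph osd df tree
  ceph health detail | grep -i full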
Currently I'm using 6 OSD nodes.
Node A: 48x 1.6TB HDD
Node B: 48x 1.6TB HDD
Node C: 48x 1.6TB HDD
Node D: 48x 1.6TB HDD
Node E: 48x 7.2TB HDD
Node F: 48x 7.2TB HDD
Question:
Is it advisable to distribute the drives equally over all nodes?
If yes, how should this be done without disrupting the cluster?
Regards
Thomas
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io