Hi,
I'm facing several issues with my ceph cluster (2x MDS, 6x OSD nodes).
Here I would like to focus on the issue with pgs backfill_toofull.
I assume this is related to the fact that the data distribution on my
OSDs is not balanced.
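(For reference, I believe the quickest way to see the imbalance is the per-OSD utilization and the state of the mgr balancer; nothing in the commands below is specific to my setup:)

# per-OSD %USE and variance, grouped by CRUSH host
ceph osd df tree
# current state/mode of the mgr balancer module
ceph balancer status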
This is the current ceph status:
root@ld3955:~# ceph -s
  cluster:
    id:     6b1b5117-6e08-4843-93d6-2da3cf8a6bae
    health: HEALTH_ERR
            1 MDSs report slow metadata IOs
            78 nearfull osd(s)
            1 pool(s) nearfull
            Reduced data availability: 2 pgs inactive, 2 pgs peering
            Degraded data redundancy: 304136/153251211 objects degraded (0.198%), 57 pgs degraded, 57 pgs undersized
            Degraded data redundancy (low space): 265 pgs backfill_toofull
            3 pools have too many placement groups
            74 slow requests are blocked > 32 sec
            80 stuck requests are blocked > 4096 sec

  services:
    mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 98m)
    mgr: ld5505(active, since 3d), standbys: ld5506, ld5507
    mds: pve_cephfs:1 {0=ld3976=up:active} 1 up:standby
    osd: 368 osds: 368 up, 367 in; 302 remapped pgs

  data:
    pools:   5 pools, 8868 pgs
    objects: 51.08M objects, 195 TiB
    usage:   590 TiB used, 563 TiB / 1.1 PiB avail
    pgs:     0.023% pgs not active
             304136/153251211 objects degraded (0.198%)
             1672190/153251211 objects misplaced (1.091%)
             8564 active+clean
             196  active+remapped+backfill_toofull
             57   active+undersized+degraded+remapped+backfill_toofull
             35   active+remapped+backfill_wait
             12   active+remapped+backfill_wait+backfill_toofull
             2    active+remapped+backfilling
             2    peering

  io:
    recovery: 18 MiB/s, 4 objects/s
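As far as I understand, a PG is marked backfill_toofull when one of its backfill target OSDs is above the backfillfull_ratio, which would fit the 78 nearfull OSDs above. The configured ratios can be checked, and if really necessary temporarily raised (the value below is only an example, not a recommendation):

# show the configured full / backfillfull / nearfull ratios
ceph osd dump | grep ratio
# example only: raise the backfill threshold slightly above the default 0.90
ceph osd set-backfillfull-ratio 0.91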
Currently I'm using 6 OSD nodes:
Node A: 48x 1.6TB HDD
Node B: 48x 1.6TB HDD
Node C: 48x 1.6TB HDD
Node D: 48x 1.6TB HDD
Node E: 48x 7.2TB HDD
Node F: 48x 7.2TB HDD
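Given that mix of 1.6TB and 7.2TB drives, nodes E and F carry a much larger CRUSH weight than the other four, which can be seen per host with:

# show the CRUSH hierarchy with per-host and per-OSD weights
ceph osd crush tree

Before physically moving drives, my understanding is that the upmap balancer is the usual first step to even out the distribution (sketch only; it requires all clients to be at least luminous):

# upmap needs: ceph osd set-require-min-compat-client luminous
ceph balancer mode upmap
ceph balancer on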
Question:
Is it advisable to distribute the drives equally over all nodes?
If yes, how can this be done without disrupting the cluster?
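My rough idea (completely untested, so please correct me) would be to move one drive at a time and let its OSD come back up under the new host; data would still rebalance once CRUSH sees the OSD under the new host, but the OSD itself stays intact:

# sketch only; <id> is a placeholder for the OSD being moved
ceph osd set noout                # avoid the OSD being marked out while it is down
systemctl stop ceph-osd@<id>      # on the current node
# physically move the drive, then on the target node:
ceph-volume lvm activate --all    # bring the moved OSD back up
ceph osd unset noout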
Regards
Thomas