Hi,
I want to use balancer mode "upmap" for all pools.
This mode is currently enabled for pool "hdb_backup" with ~600 TB of used space.
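For context, this is roughly how I enabled upmap mode (a sketch of the standard commands; `ceph balancer status` shows the active mode and whether the module is running):

```shell
# upmap requires all clients to speak at least the luminous protocol
ceph osd set-require-min-compat-client luminous
# select upmap mode and turn the balancer module on
ceph balancer mode upmap
ceph balancer on
# verify: active mode, plans, last optimization
ceph balancer status
```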
root@ld3955:~# rados df
POOL_NAME         USED      OBJECTS    CLONES  COPIES     MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS    RD       WR_OPS     WR       USED COMPR  UNDER COMPR
backup            0 B       0          0       0          0                   0        0         0         0 B      0          0 B      0 B         0 B
cephfs_data       1.1 TiB   89592      0       268776     0                   0        0         0         0 B      43443      144 GiB  0 B         0 B
cephfs_metadata   311 MiB   48         0       144        0                   0        0         6         6 KiB    7465       106 MiB  0 B         0 B
hdb_backup        585 TiB   51077985   0       153233955  0                   0        0         12577024  4.3 TiB  281002173  523 TiB  0 B         0 B
hdd               6.3 TiB   585051     0       1755153    0                   0        0         4420255   69 GiB   8219453    1.2 TiB  0 B         0 B
root@ld3955:~# ceph osd lspools
11 hdb_backup
51 hdd
52 backup
57 cephfs_data
58 cephfs_metadata
I started with pool "cephfs_metadata", which holds comparably little data.
At the moment I executed
ceph config set mgr mgr/balancer/pool_ids 11,52,58
Ceph status showed an increasing number of
Reduced data availability: <x> pgs inactive, <y> pgs peering
and the count of
<z> slow requests are blocked > 32 sec
increased heavily, up to 180.
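For reference, this is how I verify what the balancer is configured to touch and how imbalanced a pool still is (a sketch; pool names are the ones from `ceph osd lspools` above):

```shell
# confirm which pool ids the balancer is restricted to
ceph config get mgr mgr/balancer/pool_ids
# score the current distribution (lower is better) for one pool
ceph balancer eval hdb_backup
# show module state, mode, and any optimization in progress
ceph balancer status
```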
Observing the Ceph log (ceph -w), it becomes clear that there's a correlation
between inactive PGs and blocked slow requests in my cluster.
How can I start analyzing why the cluster reports slow requests?
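In case it helps others with the same question, the usual starting points I know of are (a sketch; `<id>` is the OSD reported by health detail, and the daemon commands must run on the host carrying that OSD):

```shell
# which OSDs are reporting the blocked requests
ceph health detail
# commit/apply latency per OSD, to spot slow disks
ceph osd perf
# on the affected OSD's host: inspect the currently blocked ops
ceph daemon osd.<id> dump_ops_in_flight
# and recently completed slow ops, with per-phase timestamps
ceph daemon osd.<id> dump_historic_ops
```

The `dump_ops_in_flight` output shows at which stage each op is stuck (e.g. waiting for peering), which should tie the slow requests back to the inactive/peering PGs.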
THX