Hi,
I have a Rook-provisioned Ceph cluster used for RBDs only, with two pools
named replicated-metadata-pool and ec-data-pool. The EC profile is 6+3
(k=6, m=3). I've been writing data to this cluster for some time and
noticed that the reported usage is not what I was expecting.
# ceph df
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
    hdd       5.4 PiB     4.3 PiB     1.2 PiB      1.2 PiB         21.77
    TOTAL     5.4 PiB     4.3 PiB     1.2 PiB      1.2 PiB         21.77

POOLS:
    POOL                         ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    replicated-metadata-pool      1     90 KiB          408     38 MiB          0       1.2 PiB
    ec-data-pool                  2     722 TiB     191.64M     1.2 PiB      25.04      2.4 PiB
Since these numbers are rounded a bit too much, I generally use the
Prometheus metrics exposed by the mgr, which are as follows:
ceph_pool_stored: 793,746 G for ec-data-pool and 92,323 for replicated-metadata-pool
ceph_pool_stored_raw: 1,190,865 G for ec-data-pool and 99,213 for replicated-metadata-pool
ceph_cluster_total_used_bytes: 1,329,374 G
ceph_cluster_total_used_raw_bytes: 1,333,013 G
sum(ceph_bluefs_db_used_bytes): 3,638 G
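(For anyone who wants exact byte values without going through Prometheus,
the CLI should give unrounded numbers as well:

    # exact per-pool and raw storage byte values
    ceph df detail --format json-pretty
)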
So ceph_pool_stored for the EC pool is a bit higher than the total used
space of the formatted RBDs. I think that's because of the sparse nature
of the images and deleted blocks not having been fstrimmed yet. That's OK.
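If anyone wants to verify that theory, comparing per-image allocation with
what the filesystems report should show the gap. A rough sketch (the
mountpoint is hypothetical; note that with an EC data pool the image
headers live in the replicated pool, so that's the pool to query):

    # provisioned vs. actually allocated size per image
    rbd du --pool replicated-metadata-pool
    # on a client with the RBD-backed filesystem mounted,
    # release blocks of deleted files back to the pool
    fstrim -v /mnt/volume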
ceph_pool_stored_raw is almost exactly 1.5x ceph_pool_stored, which is
what I'd expect given the 6+3 EC parameters.
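For completeness: (k+m)/k = (6+3)/6 = 1.5, and 1,190,865 / 793,746 ≈ 1.5003,
so that multiplier checks out.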
What I can't find is the 138,509 G difference between
ceph_cluster_total_used_bytes and ceph_pool_stored_raw
(1,329,374 − 1,190,865). This is not static, BTW: checking the same data
historically shows used bytes consistently around 1.12x of what we expect,
which turns our nominal 1.5x EC overhead into roughly a 1.68x overhead in
reality. Does anyone have any ideas why this is the case?
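In case anyone wants to reproduce this, a PromQL sketch (using the same
mgr metrics quoted above) for tracking the overhead over time:

    # raw bytes consumed vs. bytes the pools account for; ~1.12 here
    sum(ceph_cluster_total_used_bytes) / sum(ceph_pool_stored_raw)

    # the same gap in absolute bytes; ~138,509 G here
    sum(ceph_cluster_total_used_bytes) - sum(ceph_pool_stored_raw)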
We also have the ceph_cluster_total_used_raw_bytes metric, which I believe
should be close to data + metadata; that's why I included
sum(ceph_bluefs_db_used_bytes) above. Is that correct?
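The numbers seem to support it: 1,333,013 G − 1,329,374 G = 3,639 G, which
is almost exactly the 3,638 G of BlueFS DB usage.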
Best,
--
erdem agaoglu