Hello,
We use purely cephfs in our ceph cluster (version 14.2.7). The cephfs
data pool is an EC pool (k=4, m=2) with hdd OSDs using bluestore. The
default file layout (i.e. 4 MB object size) is used.
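For reference, here is a minimal sketch of how the layout of a file can be
read back via the CephFS virtual xattr (the mount path below is just a
placeholder, not our actual mount point):
---
import os

# Read the CephFS layout virtual xattr from an existing file on the
# mounted filesystem; /mnt/cephfs/some/file is a placeholder path.
layout = os.getxattr("/mnt/cephfs/some/file", "ceph.file.layout")
print(layout.decode())
# Expected to print something like:
#   stripe_unit=4194304 stripe_count=1 object_size=4194304 pool=cephfs-data
---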
We see the following output of ceph df:
---
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED    %RAW USED
    hdd       951 TiB     888 TiB     63 TiB      63 TiB      6.58
    ssd       9.6 TiB     9.6 TiB     1.4 GiB     16 GiB      0.17
    TOTAL     961 TiB     898 TiB     63 TiB      63 TiB      6.52

POOLS:
    POOL                ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    cephfs-data          2     34 TiB      12.51M      52 TiB      5.93      553 TiB
    cephfs-metadata      4     994 MiB     98.61k      1.5 GiB     0.02      3.0 TiB
---
What caught my attention is the discrepancy between the reported
"USED" (52 TiB) and "STORED" (34 TiB) sizes for the cephfs-data
pool.
According to this document (
https://docs.ceph.com/docs/master/releases/nautilus/#upgrade-compatibility-…
):
- "USED" represents the amount of space allocated purely for data by all
OSD nodes, in KB
- "STORED" represents the amount of data stored by the user.
My understanding is that the "USED" size can be roughly taken as the
number of objects (12.51M) times the object size (4 MB) of the file
layout; and since many files in our system are smaller than 4 MB, the
actual stored data is less. The back-of-envelope arithmetic I have in
mind is sketched below.
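This is a sketch only; it assumes every object is allocated a full 4 MiB,
which is an upper bound rather than a claim about how bluestore actually
allocates:
---
objects = 12.51e6             # OBJECTS reported by 'ceph df' for cephfs-data
object_size = 4 * 1024**2     # default cephfs file layout object size (4 MiB)
stored = 34 * 1024**4         # STORED reported by 'ceph df' (34 TiB)

used_estimate = objects * object_size
print(used_estimate / 1024**4)   # ~47.7 TiB, same ballpark as the 52 TiB "USED"
print(used_estimate / stored)    # ~1.4x the 34 TiB actually stored
---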
Is my interpretation correct? If so, does it mean that we will be
wasting a lot of space when we have many files smaller than the object
size of 4 MB in the system? Thanks for the help!
Cheers, Hong
--
Hurng-Chun (Hong) Lee, PhD
ICT manager
Donders Institute for Brain, Cognition and Behaviour,
Centre for Cognitive Neuroimaging
Radboud University Nijmegen
e-mail: h.lee(a)donders.ru.nl
tel: +31(0) 243610977
web: http://www.ru.nl/donders/