Hi Christian, I understand your point, but in my understanding a
small-capacity OSD should be weighted so that fewer PGs are allocated
on it, and the bulk of the data should then reside on the other, bigger
OSDs. In my specific case I also have more hosts (10) than shards (8),
so a single small host should not severely constrain the overall
capacity, since the 8 shards can be allocated on 8 hosts without the
small one necessarily being among them.
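To put some numbers on this (just a back-of-the-envelope Python sketch,
using the per-host capacities summed from the osd tree below):

# If CRUSH spread the PG shards exactly in proportion to the weights,
# each host would hold roughly weight/total of the raw data.
hosts = {"aka": 14.56, "balin": 14.56, "bifur": 29.11, "bofur": 29.11,
         "dwalin": 29.10, "ogion": 14.56, "prestno": 14.56,
         "remolo": 29.10, "rokanan": 14.56, "romolo": 7.28}
total = sum(hosts.values())                      # ~196.5 TiB raw
for name, w in sorted(hosts.items()):
    print(f"{name:8s} {w:6.2f} TiB -> {100 * w / total:4.1f}% of the data")
# romolo (the small host) gets ~3.7% of the data, matching its ~3.7%
# share of the raw capacity, so in the ideal case it should not be a
# bottleneck.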
My feeling is that my problem is due to the small number of PGs (512
for 104 OSDs): large fluctuations in the PG assignment cause some small
OSDs to end up with too many PGs, even if they are properly weighted.
For example, currently the most used 500 GB OSD (61% occupancy) has
21 PGs, while the most used 2 TB OSD (41%) has 54 PGs: the small OSD
thus holds more than a third of the big one's PGs despite having only a
quarter of its capacity. The overpopulated small OSDs might be the real
limiting factor, combined with the OSD distribution (all the 500 GB
ones in the same machine) and the host failure domain.
Increasing the number of PGs would probably level out the fluctuations
and result in more available space, but since my machines are very old
and limited (the lowest-spec one is a dual core with 8 GB of RAM plus
32 GB of swap on an OSD disk, managing 8x2 TB OSDs) I'm worried about
the increased resource requirements.
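As a sanity check on the expected PG counts (a rough estimate, assuming
the PG shards were distributed exactly in proportion to the OSD
weights):

# 512 PGs x 8 EC shards = 4096 PG shards placed on 104 OSDs.
pg_shards = 512 * 8
total_weight = 196.5                     # TiB, sum of all OSD weights
for name, weight in [("500 GB OSD", 0.455), ("2 TB OSD", 1.819)]:
    expected = pg_shards * weight / total_weight
    print(f"{name}: ~{expected:.0f} PG shards expected")
# -> ~9 for a 500 GB OSD and ~38 for a 2 TB OSD. The fullest 500 GB
# OSD actually holds 21 PGs (more than twice the expectation), while
# the fullest 2 TB one holds 54 (~1.4x), which is why the small OSDs
# fill up first.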
Nicola
On 02/04/23 23:08, Christian Wuerdig wrote:
With failure domain host your max usable cluster capacity is
essentially constrained by the total capacity of the smallest host,
which is 8 TB if I read the output correctly. You need to balance your
hosts better by swapping drives.
On Fri, 31 Mar 2023 at 03:34, Nicola Mori <mori(a)fi.infn.it> wrote:
Dear Ceph users,
my cluster is made up of 10 old machines with an uneven number of
disks and uneven disk sizes. Essentially I have just one big data pool
(6+2 erasure code, with host failure domain) for which I am currently
seeing very poor available space (88 TB, of which 40 TB occupied, as
reported by df -h on hosts mounting the cephfs) compared to the raw
capacity (196.5 TB). I have a total of 104 OSDs and 512 PGs for the
pool; I cannot increase the PG count since the machines are old, have
very little RAM, and some of them are already overloaded.
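For reference, a naive estimate of what the usable space should be with
6+2 erasure coding if the data were spread perfectly evenly (just a
back-of-the-envelope calculation):

raw = 196.5                    # TB of raw capacity
k, m = 6, 2                    # erasure code data/parity shards
ideal_usable = raw * k / (k + m)
print(f"ideal usable capacity: ~{ideal_usable:.0f} TB")   # ~147 TB
# df on the cephfs mount reports only 88 TB, so roughly 60 TB of the
# theoretical capacity is currently not reachable.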
In this situation I'm seeing high occupancy on the small OSDs (500 GB)
compared with the bigger ones (2 and 4 TB), even though the weights are
set proportional to disk capacity (see the ceph osd tree below). For
example, OSD 9 is at 62% occupancy even with weight 0.5 and reweight
0.75, while the highest occupancy is 41% for the 2 TB OSDs (OSD 18) and
23% for the 4 TB OSDs (OSD 79). I guess this high occupancy on the
500 GB OSDs, combined with the erasure code size and the host failure
domain, might be the cause of the poor available space: could this be
true? The upmap balancer is currently running but I don't know if and
how much it can improve the situation.
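My rough mental model of why a single nearly full OSD matters so much
(only a sketch of how I understand the available-space computation, not
the exact formula Ceph uses; the 0.95 full ratio is the assumed default
mon_osd_full_ratio):

# The pool can only grow until the OSD that fills up first hits its
# full ratio; free space elsewhere beyond that point is unusable.
osd_size   = 0.5                # TB, osd.9
osd_used   = 0.62 * osd_size    # 62% occupancy
osd_share  = 0.5 / 196.5        # fraction of raw data landing on osd.9
                                # if placement followed the weights
full_ratio = 0.95               # assumed default mon_osd_full_ratio
raw_headroom = (full_ratio * osd_size - osd_used) / osd_share
usable_headroom = raw_headroom * 6 / 8       # 6+2 erasure coding
print(f"usable space left before osd.9 is full: ~{usable_headroom:.0f} TB")
# ~49 TB, in the same ballpark as the ~48 TB (88 - 40) that df reports
# as available, so the fullest small OSD does look like the limiting
# factor.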
Any hint is greatly appreciated, thanks.
Nicola
# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 196.47754 root default
-7 14.55518 host aka
4 hdd 1.81940 osd.4 up 1.00000 1.00000
11 hdd 1.81940 osd.11 up 1.00000 1.00000
18 hdd 1.81940 osd.18 up 1.00000 1.00000
26 hdd 1.81940 osd.26 up 1.00000 1.00000
32 hdd 1.81940 osd.32 up 1.00000 1.00000
41 hdd 1.81940 osd.41 up 1.00000 1.00000
48 hdd 1.81940 osd.48 up 1.00000 1.00000
55 hdd 1.81940 osd.55 up 1.00000 1.00000
-3 14.55518 host balin
0 hdd 1.81940 osd.0 up 1.00000 1.00000
8 hdd 1.81940 osd.8 up 1.00000 1.00000
15 hdd 1.81940 osd.15 up 1.00000 1.00000
22 hdd 1.81940 osd.22 up 1.00000 1.00000
29 hdd 1.81940 osd.29 up 1.00000 1.00000
34 hdd 1.81940 osd.34 up 1.00000 1.00000
43 hdd 1.81940 osd.43 up 1.00000 1.00000
49 hdd 1.81940 osd.49 up 1.00000 1.00000
-13 29.10950 host bifur
3 hdd 3.63869 osd.3 up 1.00000 1.00000
14 hdd 3.63869 osd.14 up 1.00000 1.00000
27 hdd 3.63869 osd.27 up 1.00000 1.00000
37 hdd 3.63869 osd.37 up 1.00000 1.00000
50 hdd 3.63869 osd.50 up 1.00000 1.00000
59 hdd 3.63869 osd.59 up 1.00000 1.00000
64 hdd 3.63869 osd.64 up 1.00000 1.00000
69 hdd 3.63869 osd.69 up 1.00000 1.00000
-17 29.10950 host bofur
2 hdd 3.63869 osd.2 up 1.00000 1.00000
21 hdd 3.63869 osd.21 up 1.00000 1.00000
39 hdd 3.63869 osd.39 up 1.00000 1.00000
57 hdd 3.63869 osd.57 up 1.00000 1.00000
66 hdd 3.63869 osd.66 up 1.00000 1.00000
72 hdd 3.63869 osd.72 up 1.00000 1.00000
76 hdd 3.63869 osd.76 up 1.00000 1.00000
79 hdd 3.63869 osd.79 up 1.00000 1.00000
-21 29.10376 host dwalin
88 hdd 1.81898 osd.88 up 1.00000 1.00000
89 hdd 1.81898 osd.89 up 1.00000 1.00000
90 hdd 1.81898 osd.90 up 1.00000 1.00000
91 hdd 1.81898 osd.91 up 1.00000 1.00000
92 hdd 1.81898 osd.92 up 1.00000 1.00000
93 hdd 1.81898 osd.93 up 1.00000 1.00000
94 hdd 1.81898 osd.94 up 1.00000 1.00000
95 hdd 1.81898 osd.95 up 1.00000 1.00000
96 hdd 1.81898 osd.96 up 1.00000 1.00000
97 hdd 1.81898 osd.97 up 1.00000 1.00000
98 hdd 1.81898 osd.98 up 1.00000 1.00000
99 hdd 1.81898 osd.99 up 1.00000 1.00000
100 hdd 1.81898 osd.100 up 1.00000 1.00000
101 hdd 1.81898 osd.101 up 1.00000 1.00000
102 hdd 1.81898 osd.102 up 1.00000 1.00000
103 hdd 1.81898 osd.103 up 1.00000 1.00000
-9 14.55518 host ogion
7 hdd 1.81940 osd.7 up 1.00000 1.00000
16 hdd 1.81940 osd.16 up 1.00000 1.00000
23 hdd 1.81940 osd.23 up 1.00000 1.00000
33 hdd 1.81940 osd.33 up 1.00000 1.00000
40 hdd 1.81940 osd.40 up 1.00000 1.00000
47 hdd 1.81940 osd.47 up 1.00000 1.00000
54 hdd 1.81940 osd.54 up 1.00000 1.00000
61 hdd 1.81940 osd.61 up 1.00000 1.00000
-19 14.55518 host prestno
81 hdd 1.81940 osd.81 up 1.00000 1.00000
82 hdd 1.81940 osd.82 up 1.00000 1.00000
83 hdd 1.81940 osd.83 up 1.00000 1.00000
84 hdd 1.81940 osd.84 up 1.00000 1.00000
85 hdd 1.81940 osd.85 up 1.00000 1.00000
86 hdd 1.81940 osd.86 up 1.00000 1.00000
87 hdd 1.81940 osd.87 up 1.00000 1.00000
104 hdd 1.81940 osd.104 up 1.00000 1.00000
-15 29.10376 host remolo
6 hdd 1.81897 osd.6 up 1.00000 1.00000
12 hdd 1.81897 osd.12 up 1.00000 1.00000
19 hdd 1.81897 osd.19 up 1.00000 1.00000
28 hdd 1.81897 osd.28 up 1.00000 1.00000
35 hdd 1.81897 osd.35 up 1.00000 1.00000
44 hdd 1.81897 osd.44 up 1.00000 1.00000
52 hdd 1.81897 osd.52 up 1.00000 1.00000
58 hdd 1.81897 osd.58 up 1.00000 1.00000
63 hdd 1.81897 osd.63 up 1.00000 1.00000
67 hdd 1.81897 osd.67 up 1.00000 1.00000
71 hdd 1.81897 osd.71 up 1.00000 1.00000
73 hdd 1.81897 osd.73 up 1.00000 1.00000
74 hdd 1.81897 osd.74 up 1.00000 1.00000
75 hdd 1.81897 osd.75 up 1.00000 1.00000
77 hdd 1.81897 osd.77 up 1.00000 1.00000
78 hdd 1.81897 osd.78 up 1.00000 1.00000
-5 14.55518 host rokanan
1 hdd 1.81940 osd.1 up 1.00000 1.00000
10 hdd 1.81940 osd.10 up 1.00000 1.00000
17 hdd 1.81940 osd.17 up 1.00000 1.00000
24 hdd 1.81940 osd.24 up 1.00000 1.00000
31 hdd 1.81940 osd.31 up 1.00000 1.00000
38 hdd 1.81940 osd.38 up 1.00000 1.00000
46 hdd 1.81940 osd.46 up 1.00000 1.00000
53 hdd 1.81940 osd.53 up 1.00000 1.00000
-11 7.27515 host romolo
5 hdd 0.45470 osd.5 up 1.00000 1.00000
9 hdd 0.45470 osd.9 up 0.75000 1.00000
13 hdd 0.45470 osd.13 up 1.00000 1.00000
20 hdd 0.45470 osd.20 up 0.95000 1.00000
25 hdd 0.45470 osd.25 up 0.75000 1.00000
30 hdd 0.45470 osd.30 up 1.00000 1.00000
36 hdd 0.45470 osd.36 up 1.00000 1.00000
42 hdd 0.45470 osd.42 up 1.00000 1.00000
45 hdd 0.45470 osd.45 up 0.85004 1.00000
51 hdd 0.45470 osd.51 up 0.89999 1.00000
56 hdd 0.45470 osd.56 up 1.00000 1.00000
60 hdd 0.45470 osd.60 up 1.00000 1.00000
62 hdd 0.45470 osd.62 up 1.00000 1.00000
65 hdd 0.45470 osd.65 up 0.85004 1.00000
68 hdd 0.45470 osd.68 up 1.00000 1.00000
70 hdd 0.45470 osd.70 up 1.00000 1.00000
--
Nicola Mori, Ph.D.
INFN sezione di Firenze
Via Bruno Rossi 1, 50019 Sesto F.no (Italy)
+390554572660
mori(a)fi.infn.it