I'm in the middle of increasing the PG count for one of our pools by making small
increments, waiting for the process to complete, then repeating. I'm doing it this
way so I can control when all this activity happens and keep it away from the
busier production traffic times.
I'm expecting some imbalance as PGs get created on already unbalanced OSDs, however
our monitoring picked up something today that I'm not really understanding. Our total
utilization is just over 50% and about 96% of our total data is in this one pool. Due to
there not being enough PGs, the amount of data in each is quite large and since they
aren't evenly spread across the OSDs, there's a bit of imbalance. That's all
cool and to be expected, which is the reason for increasing the PG count in the first
place.
However, as some PGs are splitting, the new PGs are sometimes being created on OSDs that
already have a disproportionate amount of data. Again, not totally unexpected. Our
monitoring detected the usage of this pool to be >85% today as I neared the end of
another increase in PG count. What I'm not understanding is how this value is
determined. I've read other posts and the calculations suggested don't give a
result that equals what shows in my %USED column. I'm suspecting that it's
somehow related to the MAX AVAIL value (which I believe is somewhat indirectly related to
the amount available based on the individual OSD utilization), but none of the posts I
read mention this in their calculations and I've been unable to create a formula with
any of the values I have to end up with the %USED value I have.
For the record, my current total utilization based on a 'ceph osd df' looks like
this:
TOTAL 39507G(SIZE) 19931G(USE) 17568G(AVAIL) 50.45(%USE)
My most utilised OSD (currently in the process of moving some data off this OSD) is 81.58%
used with 188G available and a variance of 1.62.
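As a sanity check on the numbers above, here is a small sketch of how the variance figure appears to relate to the others. It assumes that VAR in 'ceph osd df' is the OSD's utilization divided by the cluster-wide average utilization; that assumption is mine, but it is consistent with the figures quoted here.

```python
# Figures copied from the 'ceph osd df' summary quoted above.
osd_pct_used = 81.58      # %USE of the most utilised OSD
cluster_pct_used = 50.45  # %USE on the TOTAL line

# Assumption: VAR = OSD utilization / average cluster utilization.
var = osd_pct_used / cluster_pct_used

print(f"VAR = {var:.2f}")
```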
A cut-down output of 'ceph df' looks like this:
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    39507G     17569G     19930G       50.45
POOLS:
    NAME                         ID     USED      %USED     MAX AVAIL     OBJECTS
    default.rgw.buckets.data     30     9552G     86.05     1548G         36285066
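One candidate relationship that does reproduce the figures above is %USED = USED / (USED + MAX AVAIL), that is, the pool's data as a share of what it holds plus what it could still accept. This is only a sketch under that assumption, not a confirmed description of how Ceph computes the column, and the helper name pool_pct_used is made up for illustration. The global %RAW USED is checked alongside it as RAW USED / SIZE.

```python
# Figures copied from the 'ceph df' output quoted above (all in GB).
pool_used = 9552.0
pool_max_avail = 1548.0
global_raw_used = 19930.0
global_size = 39507.0


def pool_pct_used(used, max_avail):
    """Hypothetical helper: pool %USED, assuming used / (used + max_avail)."""
    total = used + max_avail
    return 100.0 * used / total if total else 0.0


pool_pct = pool_pct_used(pool_used, pool_max_avail)
global_pct = 100.0 * global_raw_used / global_size

print(f"pool %USED       = {pool_pct:.2f}")
print(f"global %RAW USED = {global_pct:.2f}")
```

If this is the right relationship, it would explain the link to MAX AVAIL: because MAX AVAIL shrinks as the fullest OSDs fill up, the pool's %USED rises even while overall raw utilization stays near 50%.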
I suspect that as I get the utilization of my over-utilized OSDs down, this %USED value
will drop. But I'd just love to fully understand how this value is calculated.
Thanks,
Mark J