Hey, that's it!
I stopped the two OSDs that were up but out (100 and 177), and now the stats are correct!
# ceph df
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
    hdd       5.5 PiB     1.2 PiB     4.3 PiB     4.3 PiB          78.62
    TOTAL     5.5 PiB     1.2 PiB     4.3 PiB     4.3 PiB          78.62

POOLS:
    POOL       ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    public     68     2.9 PiB     143.56M     4.3 PiB     84.55       538 TiB
    test       71      29 MiB       6.56k     1.2 GiB         0       269 TiB
    foo        72     1.2 GiB         308     3.6 GiB         0       269 TiB
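For the record, such stray OSDs can be spotted programmatically. A minimal Python sketch, run here against a made-up excerpt of `ceph osd dump -f json` (the real output has many more fields per OSD):

```python
import json

# Hypothetical excerpt of `ceph osd dump -f json`; "up" and "in" are 0/1 flags.
osd_dump = json.loads("""
{"osds": [
  {"osd": 100, "up": 1, "in": 0},
  {"osd": 101, "up": 1, "in": 1},
  {"osd": 177, "up": 1, "in": 0}
]}
""")

# OSDs that are up but out keep reporting (possibly stale) per-pool statfs,
# which can skew the aggregated USED numbers in `ceph df`.
up_but_out = [o["osd"] for o in osd_dump["osds"] if o["up"] and not o["in"]]
print(up_but_out)
```

Against the excerpt above this prints the two culprits, 100 and 177.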
On Thu, Nov 26, 2020 at 8:02 PM Dan van der Ster <dan(a)vanderster.com> wrote:
>
> There are a couple gaps, yes: https://termbin.com/9mx1
>
> What should I do?
>
> -- dan
>
> On Thu, Nov 26, 2020 at 7:52 PM Igor Fedotov <ifedotov(a)suse.de> wrote:
> >
> > Does "ceph osd df tree" show stats properly (I mean there are no evident
> > gaps like unexpected zero values) for all the daemons?
> >
> >
> > > 1. Anyway, I found something weird...
> > >
> > > I created a new 1-PG pool "foo" on a different cluster and wrote some
> > > data to it.
> > >
> > > The STORED and USED values are equal.
> > >
> > > Thu 26 Nov 19:26:58 CET 2020
> > > RAW STORAGE:
> > > CLASS SIZE AVAIL USED RAW USED %RAW USED
> > > hdd 5.5 PiB 1.2 PiB 4.3 PiB 4.3 PiB 78.31
> > > TOTAL 5.5 PiB 1.2 PiB 4.3 PiB 4.3 PiB 78.31
> > >
> > > POOLS:
> > > POOL       ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
> > > public     68     2.9 PiB     143.54M     2.9 PiB     78.49       538 TiB
> > > test       71      29 MiB       6.56k      29 MiB         0       269 TiB
> > > foo        72     1.2 GiB         308     1.2 GiB         0       269 TiB
> > >
> > > But when I tried restarting the three relevant OSDs, the bytes_used was
> > > temporarily reported correctly:
> > >
> > > Thu 26 Nov 19:27:00 CET 2020
> > > RAW STORAGE:
> > > CLASS SIZE AVAIL USED RAW USED %RAW USED
> > > hdd 5.5 PiB 1.2 PiB 4.3 PiB 4.3 PiB 78.62
> > > TOTAL 5.5 PiB 1.2 PiB 4.3 PiB 4.3 PiB 78.62
> > >
> > > POOLS:
> > > POOL       ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
> > > public     68     2.9 PiB     143.54M     4.3 PiB     84.55       538 TiB
> > > test       71      29 MiB       6.56k     1.2 GiB         0       269 TiB
> > > foo        72     1.2 GiB         308     3.6 GiB         0       269 TiB
> > >
> > > But then a few seconds later it's back to used == stored:
> > >
> > > Thu 26 Nov 19:27:03 CET 2020
> > > RAW STORAGE:
> > > CLASS SIZE AVAIL USED RAW USED %RAW USED
> > > hdd 5.5 PiB 1.2 PiB 4.3 PiB 4.3 PiB 78.47
> > > TOTAL 5.5 PiB 1.2 PiB 4.3 PiB 4.3 PiB 78.47
> > >
> > > POOLS:
> > > POOL       ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
> > > public     68     2.9 PiB     143.54M     2.9 PiB     78.49       538 TiB
> > > test       71      29 MiB       6.56k      29 MiB         0       269 TiB
> > > foo        72     1.2 GiB         308     1.2 GiB         0       269 TiB
> > >
> > > It seems to report the correct stats only while the PG is peering (or
> > > in some other transitional state).
> > > I've restarted all three relevant OSDs now -- the stats are reported
> > > as stored == used.
> > >
> > > 2. Another data point -- I found another old cluster that reports
> > > stored/used correctly. I have no idea what might be different about
> > > that cluster -- we updated it just like the others.
> > >
> > > Cheers, Dan
> > >
> > > On Thu, Nov 26, 2020 at 6:22 PM Igor Fedotov <ifedotov(a)suse.de> wrote:
> > >> For a specific BlueStore instance you can learn the relevant statfs
> > >> output by setting debug_bluestore to 20 and leaving the OSD alone for
> > >> 5-10 seconds (or maybe a couple of minutes -- I don't remember the
> > >> exact statfs poll period).
> > >>
> > >> Then grep the osd log for "statfs" and/or "pool_statfs" and read the
> > >> output as formatted by the following operator (taken from
> > >> src/osd/osd_types.cc):
> > >>
> > >> ostream& operator<<(ostream& out, const store_statfs_t& s)
> > >> {
> > >>   out << std::hex
> > >>       << "store_statfs(0x" << s.available
> > >>       << "/0x" << s.internally_reserved
> > >>       << "/0x" << s.total
> > >>       << ", data 0x" << s.data_stored
> > >>       << "/0x" << s.allocated
> > >>       << ", compress 0x" << s.data_compressed
> > >>       << "/0x" << s.data_compressed_allocated
> > >>       << "/0x" << s.data_compressed_original
> > >>       << ", omap 0x" << s.omap_allocated
> > >>       << ", meta 0x" << s.internal_metadata
> > >>       << std::dec
> > >>       << ")";
> > >>   return out;
> > >> }
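As an illustration of reading those log lines back, a small Python sketch; the log line below is made up, but the field order follows the operator shown above:

```python
import re

# Made-up log line in the format produced by the operator<< above.
line = ("store_statfs(0x1000000/0x0/0x2000000, data 0x400000/0x500000, "
        "compress 0x0/0x0/0x0, omap 0x10000, meta 0x20000)")

# Pull every hex field out in the order the operator emits them.
names = ["available", "internally_reserved", "total", "data_stored",
         "allocated", "data_compressed", "data_compressed_allocated",
         "data_compressed_original", "omap_allocated", "internal_metadata"]
fields = [int(v, 16) for v in re.findall(r"0x([0-9a-f]+)", line)]
stats = dict(zip(names, fields))

# data_stored vs. allocated is the pair to compare against `ceph df`.
print(stats["data_stored"], stats["allocated"])
```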
> > >>
> > >> But honestly I doubt it's BlueStore that is reporting incorrectly,
> > >> since BlueStore doesn't care about replication.
> > >>
> > >> It looks more like a lack of stats from some replicas, or improper
> > >> handling of the pg replication factor...
> > >>
> > >> Perhaps legacy vs. new pool is what matters... Can you try creating a
> > >> new pool on the old cluster, filling it with some data (e.g. just a
> > >> single 64K object), and checking the stats?
> > >>
> > >>
> > >> Thanks,
> > >>
> > >> Igor
> > >>
> > >> On 11/26/2020 8:00 PM, Dan van der Ster wrote:
> > >>> Hi Igor,
> > >>>
> > >>> No BLUESTORE_LEGACY_STATFS warning, and
> > >>> bluestore_warn_on_legacy_statfs is the default true on this (and all)
> > >>> clusters.
> > >>> I'm quite sure we did the statfs conversion during one of the recent
> > >>> upgrades (I forget which one exactly).
> > >>>
> > >>> # ceph tell osd.* config get bluestore_warn_on_legacy_statfs | grep -v true
> > >>> #
> > >>>
> > >>> Is there a command to see the statfs reported by an individual OSD?
> > >>> We have a mix of ~year-old and recently recreated OSDs, so I could try
> > >>> to see if they differ.
> > >>>
> > >>> Thanks!
> > >>>
> > >>> Dan
> > >>>
> > >>>
> > >>> On Thu, Nov 26, 2020 at 5:50 PM Igor Fedotov <ifedotov(a)suse.de> wrote:
> > >>>> Hi Dan
> > >>>>
> > >>>> don't you have a BLUESTORE_LEGACY_STATFS alert raised (it might be
> > >>>> silenced by the bluestore_warn_on_legacy_statfs param) on the older
> > >>>> cluster?
> > >>>>
> > >>>>
> > >>>> Thanks,
> > >>>>
> > >>>> Igor
> > >>>>
> > >>>>
> > >>>> On 11/26/2020 7:29 PM, Dan van der Ster wrote:
> > >>>>> Hi,
> > >>>>>
> > >>>>> Depending on which cluster I look at (all running v14.2.11),
> > >>>>> bytes_used variably reports either raw space or stored bytes.
> > >>>>>
> > >>>>> Here's a 7 year old cluster:
> > >>>>>
> > >>>>> # ceph df -f json | jq .pools[0]
> > >>>>> {
> > >>>>> "name": "volumes",
> > >>>>> "id": 4,
> > >>>>> "stats": {
> > >>>>> "stored": 1229308190855881,
> > >>>>> "objects": 294401604,
> > >>>>> "kb_used": 1200496280133,
> > >>>>> "bytes_used": 1229308190855881,
> > >>>>> "percent_used": 0.4401889145374298,
> > >>>>> "max_avail": 521125025021952
> > >>>>> }
> > >>>>> }
> > >>>>>
> > >>>>> Note that stored == bytes_used for that pool (and this is a 3x
> > >>>>> replica pool).
> > >>>>>
> > >>>>> But here's a newer cluster (installed recently with Nautilus):
> > >>>>>
> > >>>>> # ceph df -f json | jq .pools[0]
> > >>>>> {
> > >>>>> "name": "volumes",
> > >>>>> "id": 1,
> > >>>>> "stats": {
> > >>>>> "stored": 680977600893041,
> > >>>>> "objects": 163155803,
> > >>>>> "kb_used": 1995736271829,
> > >>>>> "bytes_used": 2043633942351985,
> > >>>>> "percent_used": 0.23379847407341003,
> > >>>>> "max_avail": 2232457428467712
> > >>>>> }
> > >>>>> }
> > >>>>>
> > >>>>> In the second cluster, bytes_used is 3x stored.
> > >>>>>
> > >>>>> Does anyone know why these are not reported consistently?
> > >>>>> Noticing this just now, I'll update our monitoring to plot stored
> > >>>>> rather than bytes_used from now on.
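The inconsistency is easy to check mechanically from the two "stats" objects pasted above; a quick Python sketch (values copied from the outputs, replication factor of 3 assumed for both pools):

```python
# "stats" values from the two `ceph df -f json` outputs quoted above.
old = {"stored": 1229308190855881, "bytes_used": 1229308190855881}  # 7-year-old cluster
new = {"stored": 680977600893041, "bytes_used": 2043633942351985}   # fresh Nautilus install

# On a 3x-replica pool, bytes_used should be roughly 3x stored.
# The old cluster instead reports the two as identical.
print(round(old["bytes_used"] / old["stored"], 2))  # 1.0
print(round(new["bytes_used"] / new["stored"], 2))  # 3.0
```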
> > >>>>>
> > >>>>> Thanks!
> > >>>>>
> > >>>>> Dan
> > >>>>> _______________________________________________
> > >>>>> ceph-users mailing list -- ceph-users(a)ceph.io
> > >>>>> To unsubscribe send an email to ceph-users-leave(a)ceph.io