hmm... I would suspect some issue in OSD-MON communication.
The first question is whether this "broken" OSD set is constant or it
changes over time?
Do any of these OSDs back the 'foo' PG?
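For example -- assuming the pool really is named 'foo' there -- something like

    ceph pg ls-by-pool foo

should list that pool's PG(s) together with their acting OSD sets, so you can
cross-check against the OSDs that look broken in the stats.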
Igor
On 11/26/2020 10:02 PM, Dan van der Ster wrote:
> There are a couple of gaps, yes: https://termbin.com/9mx1
>
> What should I do?
>
> -- dan
>
> On Thu, Nov 26, 2020 at 7:52 PM Igor Fedotov <ifedotov(a)suse.de> wrote:
>> Does "ceph osd df tree" show stats properly (I mean there are no evident
>> gaps like unexpected zero values) for all the daemons?
>>
>>
>>> 1. Anyway, I found something weird...
>>>
>>> I created a new 1-PG pool "foo" on a different cluster and wrote some
>>> data to it.
>>>
>>> The stored and used values are equal:
>>>
>>> Thu 26 Nov 19:26:58 CET 2020
>>> RAW STORAGE:
>>> CLASS SIZE AVAIL USED RAW USED %RAW USED
>>> hdd 5.5 PiB 1.2 PiB 4.3 PiB 4.3 PiB 78.31
>>> TOTAL 5.5 PiB 1.2 PiB 4.3 PiB 4.3 PiB 78.31
>>>
>>> POOLS:
>>>     POOL       ID    STORED     OBJECTS    USED       %USED    MAX AVAIL
>>>     public     68    2.9 PiB    143.54M    2.9 PiB    78.49    538 TiB
>>>     test       71    29 MiB     6.56k      29 MiB     0        269 TiB
>>>     foo        72    1.2 GiB    308        1.2 GiB    0        269 TiB
>>>
>>> But I tried restarting the relevant three OSDs, and bytes_used is
>>> temporarily reported correctly:
>>>
>>> Thu 26 Nov 19:27:00 CET 2020
>>> RAW STORAGE:
>>> CLASS SIZE AVAIL USED RAW USED %RAW USED
>>> hdd 5.5 PiB 1.2 PiB 4.3 PiB 4.3 PiB 78.62
>>> TOTAL 5.5 PiB 1.2 PiB 4.3 PiB 4.3 PiB 78.62
>>>
>>> POOLS:
>>>     POOL       ID    STORED     OBJECTS    USED       %USED    MAX AVAIL
>>>     public     68    2.9 PiB    143.54M    4.3 PiB    84.55    538 TiB
>>>     test       71    29 MiB     6.56k      1.2 GiB    0        269 TiB
>>>     foo        72    1.2 GiB    308        3.6 GiB    0        269 TiB
>>>
>>> But then a few seconds later it's back to used == stored:
>>>
>>> Thu 26 Nov 19:27:03 CET 2020
>>> RAW STORAGE:
>>> CLASS SIZE AVAIL USED RAW USED %RAW USED
>>> hdd 5.5 PiB 1.2 PiB 4.3 PiB 4.3 PiB 78.47
>>> TOTAL 5.5 PiB 1.2 PiB 4.3 PiB 4.3 PiB 78.47
>>>
>>> POOLS:
>>>     POOL       ID    STORED     OBJECTS    USED       %USED    MAX AVAIL
>>>     public     68    2.9 PiB    143.54M    2.9 PiB    78.49    538 TiB
>>>     test       71    29 MiB     6.56k      29 MiB     0        269 TiB
>>>     foo        72    1.2 GiB    308        1.2 GiB    0        269 TiB
>>>
>>> It seems to report the correct stats only while the PG is peering (or in
>>> some other transitional state).
>>> I've restarted all three relevant OSDs now -- the stats are reported
>>> as stored == used.
>>>
>>> 2. Another data point -- I found another old cluster that reports
>>> stored/used correctly. I have no idea what might be different about
>>> that cluster -- we updated it just like the others.
>>>
>>> Cheers, Dan
>>>
>>> On Thu, Nov 26, 2020 at 6:22 PM Igor Fedotov <ifedotov(a)suse.de> wrote:
>>>> For a specific BlueStore instance you can learn the relevant statfs output by
>>>> setting debug_bluestore to 20 and leaving the OSD running for 5-10 seconds (or
>>>> maybe a couple of minutes - I don't remember the exact statfs poll period).
>>>>
>>>> Then grep the OSD log for "statfs" and/or "pool_statfs" and get the output
>>>> formatted as per the following operator (taken from src/osd/osd_types.cc):
>>>>
>>>> ostream& operator<<(ostream& out, const store_statfs_t &s)
>>>> {
>>>>   out << std::hex
>>>>       << "store_statfs(0x" << s.available
>>>>       << "/0x" << s.internally_reserved
>>>>       << "/0x" << s.total
>>>>       << ", data 0x" << s.data_stored
>>>>       << "/0x" << s.allocated
>>>>       << ", compress 0x" << s.data_compressed
>>>>       << "/0x" << s.data_compressed_allocated
>>>>       << "/0x" << s.data_compressed_original
>>>>       << ", omap 0x" << s.omap_allocated
>>>>       << ", meta 0x" << s.internal_metadata
>>>>       << std::dec
>>>>       << ")";
>>>>   return out;
>>>> }
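>>>>
>>>> For example, something along these lines should do (the osd id and the log
>>>> path are just placeholders for your environment):
>>>>
>>>>     ceph tell osd.0 config set debug_bluestore 20
>>>>     # wait a minute or two, then:
>>>>     grep -E 'statfs|pool_statfs' /var/log/ceph/ceph-osd.0.log
>>>>     ceph tell osd.0 config set debug_bluestore 1/5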
>>>>
>>>> But honestly I doubt it is BlueStore that is reporting incorrectly, since
>>>> it doesn't care about replication.
>>>>
>>>> It rather looks like a lack of stats from some replicas, or improper
>>>> handling of the pg replication factor...
>>>>
>>>> Perhaps it's legacy vs. new pool that matters... Can you try to create a new
>>>> pool on the old cluster, fill it with some data (e.g. just a single 64K
>>>> object), and check the stats?
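>>>>
>>>> Something like this should be enough (pool and object names are arbitrary,
>>>> a single PG just to keep it simple):
>>>>
>>>>     ceph osd pool create statfs-test 1 1
>>>>     dd if=/dev/urandom of=/tmp/obj64k bs=64K count=1
>>>>     rados -p statfs-test put obj1 /tmp/obj64k
>>>>     ceph df detail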
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Igor
>>>>
>>>> On 11/26/2020 8:00 PM, Dan van der Ster wrote:
>>>>> Hi Igor,
>>>>>
>>>>> No BLUESTORE_LEGACY_STATFS warning, and
>>>>> bluestore_warn_on_legacy_statfs is the default true on this (and all)
>>>>> clusters.
>>>>> I'm quite sure we did the statfs conversion during one of the recent
>>>>> upgrades (I forget which one exactly).
>>>>>
>>>>> # ceph tell osd.* config get bluestore_warn_on_legacy_statfs | grep -v true
>>>>> #
>>>>>
>>>>> Is there a command to see the statfs reported by an individual OSD?
>>>>> We have a mix of ~year-old and recently recreated OSDs, so I could try
>>>>> to see if they differ.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> On Thu, Nov 26, 2020 at 5:50 PM Igor Fedotov <ifedotov(a)suse.de> wrote:
>>>>>> Hi Dan
>>>>>>
>>>>>> don't you have the BLUESTORE_LEGACY_STATFS alert raised (it might be
>>>>>> silenced by the bluestore_warn_on_legacy_statfs param) for the older cluster?
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Igor
>>>>>>
>>>>>>
>>>>>> On 11/26/2020 7:29 PM, Dan van der Ster wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Depending on which cluster I look at (all running v14.2.11), the
>>>>>>> bytes_used field is reported as either raw space or stored bytes.
>>>>>>>
>>>>>>> Here's a 7-year-old cluster:
>>>>>>>
>>>>>>> # ceph df -f json | jq .pools[0]
>>>>>>> {
>>>>>>> "name": "volumes",
>>>>>>> "id": 4,
>>>>>>> "stats": {
>>>>>>> "stored": 1229308190855881,
>>>>>>> "objects": 294401604,
>>>>>>> "kb_used": 1200496280133,
>>>>>>> "bytes_used": 1229308190855881,
>>>>>>> "percent_used": 0.4401889145374298,
>>>>>>> "max_avail": 521125025021952
>>>>>>> }
>>>>>>> }
>>>>>>>
>>>>>>> Note that stored == bytes_used for that pool (this is a 3x replica pool).
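>>>>>>> (With 3x replication one would expect bytes_used to be roughly 3 x stored,
>>>>>>> i.e. about 3.7e15 bytes here, rather than an identical value.)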
>>>>>>>
>>>>>>> But here's a newer cluster (installed recently with nautilus):
>>>>>>>
>>>>>>> # ceph df -f json | jq .pools[0]
>>>>>>> {
>>>>>>> "name": "volumes",
>>>>>>> "id": 1,
>>>>>>> "stats": {
>>>>>>> "stored": 680977600893041,
>>>>>>> "objects": 163155803,
>>>>>>> "kb_used": 1995736271829,
>>>>>>> "bytes_used": 2043633942351985,
>>>>>>> "percent_used": 0.23379847407341003,
>>>>>>> "max_avail": 2232457428467712
>>>>>>> }
>>>>>>> }
>>>>>>>
>>>>>>> In the second cluster, bytes_used is 3x stored.
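>>>>>>> (Roughly: 680977600893041 * 3 = 2042932802679123, which matches the reported
>>>>>>> bytes_used of 2043633942351985 to within a fraction of a percent.)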
>>>>>>>
>>>>>>> Does anyone know why these are not reported consistently?
>>>>>>> Noticing this just now, I'll update our monitoring to plot stored
>>>>>>> rather than bytes_used from now on.
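>>>>>>> (E.g. to pull just the stored values out of the json, something like
>>>>>>> "ceph df -f json | jq '.pools[] | {name, stored: .stats.stored}'" should do.)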
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Dan
>>>>>>> _______________________________________________
>>>>>>> ceph-users mailing list -- ceph-users(a)ceph.io
>>>>>>> To unsubscribe send an email to ceph-users-leave(a)ceph.io