OK, cool!
Will try to reproduce this locally tomorrow...
Thanks,
Igor
On 11/26/2020 10:19 PM, Dan van der Ster wrote:
> Those osds are intentionally out, yes. (They were drained to be replaced).
>
> I have fixed 2 clusters' stats already with this method ... both had
> up but out osds, and stopping the up/out osd fixed the stats.
>
> I opened a tracker for this: https://tracker.ceph.com/issues/48385
>
> -- dan
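For anyone hitting the same symptom, the up-but-out daemons Dan stopped can be spotted from `ceph osd dump -f json`. A minimal sketch; the sample dump below is hypothetical and trimmed to the relevant fields (a real dump carries many more keys per OSD):

```python
import json

# Hypothetical, trimmed sample of `ceph osd dump -f json` output.
sample = json.loads("""
{
  "osds": [
    {"osd": 99,  "up": 1, "in": 1},
    {"osd": 100, "up": 1, "in": 0},
    {"osd": 177, "up": 1, "in": 0}
  ]
}
""")

# OSDs that are up but out: per the tracker issue above, stopping these
# daemons corrected the `ceph df` pool stats.
up_but_out = [o["osd"] for o in sample["osds"]
              if o["up"] == 1 and o["in"] == 0]
print(up_but_out)  # [100, 177]
```

In practice you would pipe the live `ceph osd dump -f json` into the same filter instead of the embedded sample.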
>
> On Thu, Nov 26, 2020 at 8:14 PM Igor Fedotov <ifedotov(a)suse.de> wrote:
>> Also wondering whether you have the same "gap" OSDs on the other
>> cluster(s) which show stats improperly?
>>
>>
>> On 11/26/2020 10:08 PM, Dan van der Ster wrote:
>>> Hey that's it!
>>>
>>> I stopped the up but out OSDs (100 and 177), and now the stats are correct!
>>>
>>> # ceph df
>>> RAW STORAGE:
>>> CLASS SIZE AVAIL USED RAW USED %RAW USED
>>> hdd 5.5 PiB 1.2 PiB 4.3 PiB 4.3 PiB 78.62
>>> TOTAL 5.5 PiB 1.2 PiB 4.3 PiB 4.3 PiB 78.62
>>>
>>> POOLS:
>>>     POOL       ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
>>>     public     68     2.9 PiB     143.56M     4.3 PiB     84.55       538 TiB
>>>     test       71      29 MiB       6.56k     1.2 GiB         0       269 TiB
>>>     foo        72     1.2 GiB         308     3.6 GiB         0       269 TiB
>>>
>>>
>>>
>>> On Thu, Nov 26, 2020 at 8:02 PM Dan van der Ster <dan(a)vanderster.com> wrote:
>>>> There are a couple gaps, yes: https://termbin.com/9mx1
>>>>
>>>> What should I do?
>>>>
>>>> -- dan
>>>>
>>>> On Thu, Nov 26, 2020 at 7:52 PM Igor Fedotov <ifedotov(a)suse.de> wrote:
>>>>> Does "ceph osd df tree" show stats properly (I mean there are no evident
>>>>> gaps like unexpected zero values) for all the daemons?
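The gap check above can be scripted against `ceph osd df tree -f json`. A minimal sketch; the sample output below is hypothetical and trimmed to the fields used:

```python
import json

# Hypothetical, trimmed sample of `ceph osd df tree -f json` output.
sample = json.loads("""
{
  "nodes": [
    {"id": 100, "name": "osd.100", "type": "osd", "kb_used": 0},
    {"id": 101, "name": "osd.101", "type": "osd", "kb_used": 7516192768}
  ]
}
""")

# Flag the "gaps": daemons whose usage is unexpectedly reported as zero.
gaps = [n["name"] for n in sample["nodes"]
        if n["type"] == "osd" and n["kb_used"] == 0]
print(gaps)  # ['osd.100']
```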
>>>>>
>>>>>
>>>>>> 1. Anyway, I found something weird...
>>>>>>
>>>>>> I created a new 1-PG pool "foo" on a different cluster and wrote some
>>>>>> data to it.
>>>>>>
>>>>>> The stored and used are equal.
>>>>>>
>>>>>> Thu 26 Nov 19:26:58 CET 2020
>>>>>> RAW STORAGE:
>>>>>>     CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
>>>>>>     hdd       5.5 PiB     1.2 PiB     4.3 PiB      4.3 PiB         78.31
>>>>>>     TOTAL     5.5 PiB     1.2 PiB     4.3 PiB      4.3 PiB         78.31
>>>>>>
>>>>>> POOLS:
>>>>>>     POOL       ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
>>>>>>     public     68     2.9 PiB     143.54M     2.9 PiB     78.49       538 TiB
>>>>>>     test       71      29 MiB       6.56k      29 MiB         0       269 TiB
>>>>>>     foo        72     1.2 GiB         308     1.2 GiB         0       269 TiB
>>>>>>
>>>>>> But I tried restarting the relevant three OSDs, and the bytes_used are
>>>>>> temporarily reported correctly:
>>>>>>
>>>>>> Thu 26 Nov 19:27:00 CET 2020
>>>>>> RAW STORAGE:
>>>>>>     CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
>>>>>>     hdd       5.5 PiB     1.2 PiB     4.3 PiB      4.3 PiB         78.62
>>>>>>     TOTAL     5.5 PiB     1.2 PiB     4.3 PiB      4.3 PiB         78.62
>>>>>>
>>>>>> POOLS:
>>>>>>     POOL       ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
>>>>>>     public     68     2.9 PiB     143.54M     4.3 PiB     84.55       538 TiB
>>>>>>     test       71      29 MiB       6.56k     1.2 GiB         0       269 TiB
>>>>>>     foo        72     1.2 GiB         308     3.6 GiB         0       269 TiB
>>>>>>
>>>>>> But then a few seconds later it's back to used == stored:
>>>>>>
>>>>>> Thu 26 Nov 19:27:03 CET 2020
>>>>>> RAW STORAGE:
>>>>>>     CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
>>>>>>     hdd       5.5 PiB     1.2 PiB     4.3 PiB      4.3 PiB         78.47
>>>>>>     TOTAL     5.5 PiB     1.2 PiB     4.3 PiB      4.3 PiB         78.47
>>>>>>
>>>>>> POOLS:
>>>>>>     POOL       ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
>>>>>>     public     68     2.9 PiB     143.54M     2.9 PiB     78.49       538 TiB
>>>>>>     test       71      29 MiB       6.56k      29 MiB         0       269 TiB
>>>>>>     foo        72     1.2 GiB         308     1.2 GiB         0       269 TiB
>>>>>>
>>>>>> It seems to report the correct stats only when the PG is peering (or in
>>>>>> some other transitional state).
>>>>>> I've restarted all three relevant OSDs now -- the stats are reported
>>>>>> as stored == used.
>>>>>>
>>>>>> 2. Another data point -- I found another old cluster that reports
>>>>>> stored/used correctly. I have no idea what might be different about
>>>>>> that cluster -- we updated it just like the others.
>>>>>>
>>>>>> Cheers, Dan
>>>>>>
>>>>>> On Thu, Nov 26, 2020 at 6:22 PM Igor Fedotov <ifedotov(a)suse.de> wrote:
>>>>>>> For a specific BlueStore instance you can learn the relevant statfs
>>>>>>> output by setting debug_bluestore to 20 and leaving the OSD running for
>>>>>>> 5-10 seconds (or maybe a couple of minutes - I don't remember the exact
>>>>>>> statfs poll period).
>>>>>>>
>>>>>>> Then grep the osd log for "statfs" and/or "pool_statfs" and read the
>>>>>>> output formatted as per the following operator (taken from
>>>>>>> src/osd/osd_types.cc):
>>>>>>>
>>>>>>> ostream& operator<<(ostream& out, const store_statfs_t &s)
>>>>>>> {
>>>>>>>   out << std::hex
>>>>>>>       << "store_statfs(0x" << s.available
>>>>>>>       << "/0x" << s.internally_reserved
>>>>>>>       << "/0x" << s.total
>>>>>>>       << ", data 0x" << s.data_stored
>>>>>>>       << "/0x" << s.allocated
>>>>>>>       << ", compress 0x" << s.data_compressed
>>>>>>>       << "/0x" << s.data_compressed_allocated
>>>>>>>       << "/0x" << s.data_compressed_original
>>>>>>>       << ", omap 0x" << s.omap_allocated
>>>>>>>       << ", meta 0x" << s.internal_metadata
>>>>>>>       << std::dec
>>>>>>>       << ")";
>>>>>>>   return out;
>>>>>>> }
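Those hex fields can be pulled out of a grepped log line with a short script. A sketch assuming exactly the format printed by the operator above; the sample line and its hex values are made up for illustration:

```python
import re

# A statfs line constructed by hand to match the operator<< format above.
line = ("store_statfs(0x1000000/0x0/0x2000000, data 0x400000/0x500000, "
        "compress 0x0/0x0/0x0, omap 0x1000, meta 0x20000)")

# One named group per field of store_statfs_t, in the order printed.
pattern = re.compile(
    r"store_statfs\(0x(?P<available>[0-9a-f]+)"
    r"/0x(?P<internally_reserved>[0-9a-f]+)"
    r"/0x(?P<total>[0-9a-f]+)"
    r", data 0x(?P<data_stored>[0-9a-f]+)"
    r"/0x(?P<allocated>[0-9a-f]+)"
    r", compress 0x(?P<data_compressed>[0-9a-f]+)"
    r"/0x(?P<data_compressed_allocated>[0-9a-f]+)"
    r"/0x(?P<data_compressed_original>[0-9a-f]+)"
    r", omap 0x(?P<omap_allocated>[0-9a-f]+)"
    r", meta 0x(?P<internal_metadata>[0-9a-f]+)\)")

m = pattern.search(line)
stats = {k: int(v, 16) for k, v in m.groupdict().items()}
print(stats["data_stored"], stats["total"])  # 4194304 33554432
```

Running the same `pattern.search` over each grepped log line gives per-poll statfs values in plain integers, which makes it easy to diff old vs. recreated OSDs.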
>>>>>>>
>>>>>>> But honestly I doubt it is BlueStore which reports incorrectly, since
>>>>>>> it doesn't care about replication.
>>>>>>>
>>>>>>> It rather looks like a lack of stats from some replicas, or improper pg
>>>>>>> replica factor processing...
>>>>>>>
>>>>>>> Perhaps legacy vs. new pool is what matters... Can you try to create a
>>>>>>> new pool on the old cluster, fill it with some data (e.g. just a single
>>>>>>> 64K object), and check the stats?
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Igor
>>>>>>>
>>>>>>> On 11/26/2020 8:00 PM, Dan van der Ster wrote:
>>>>>>>> Hi Igor,
>>>>>>>>
>>>>>>>> No BLUESTORE_LEGACY_STATFS warning, and
>>>>>>>> bluestore_warn_on_legacy_statfs is the default true on this (and all)
>>>>>>>> clusters.
>>>>>>>> I'm quite sure we did the statfs conversion during one of the recent
>>>>>>>> upgrades (I forget which one exactly).
>>>>>>>>
>>>>>>>> # ceph tell osd.* config get bluestore_warn_on_legacy_statfs | grep -v true
>>>>>>>> #
>>>>>>>>
>>>>>>>> Is there a command to see the statfs reported by an individual OSD?
>>>>>>>> We have a mix of ~year old and recently recreated OSDs, so I could try
>>>>>>>> to see if they differ.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Nov 26, 2020 at 5:50 PM Igor Fedotov <ifedotov(a)suse.de> wrote:
>>>>>>>>> Hi Dan
>>>>>>>>>
>>>>>>>>> Don't you have a BLUESTORE_LEGACY_STATFS alert raised (it might be
>>>>>>>>> silenced by the bluestore_warn_on_legacy_statfs param) for the older
>>>>>>>>> cluster?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Igor
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 11/26/2020 7:29 PM, Dan van der Ster wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Depending on which cluster I look at (all running v14.2.11), the
>>>>>>>>>> bytes_used is reporting raw space or stored bytes variably.
>>>>>>>>>>
>>>>>>>>>> Here's a 7 year old cluster:
>>>>>>>>>>
>>>>>>>>>> # ceph df -f json | jq .pools[0]
>>>>>>>>>> {
>>>>>>>>>>   "name": "volumes",
>>>>>>>>>>   "id": 4,
>>>>>>>>>>   "stats": {
>>>>>>>>>>     "stored": 1229308190855881,
>>>>>>>>>>     "objects": 294401604,
>>>>>>>>>>     "kb_used": 1200496280133,
>>>>>>>>>>     "bytes_used": 1229308190855881,
>>>>>>>>>>     "percent_used": 0.4401889145374298,
>>>>>>>>>>     "max_avail": 521125025021952
>>>>>>>>>>   }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Note that stored == bytes_used for that pool (this is a 3x replica pool).
>>>>>>>>>>
>>>>>>>>>> But here's a newer cluster (installed recently with nautilus):
>>>>>>>>>>
>>>>>>>>>> # ceph df -f json | jq .pools[0]
>>>>>>>>>> {
>>>>>>>>>>   "name": "volumes",
>>>>>>>>>>   "id": 1,
>>>>>>>>>>   "stats": {
>>>>>>>>>>     "stored": 680977600893041,
>>>>>>>>>>     "objects": 163155803,
>>>>>>>>>>     "kb_used": 1995736271829,
>>>>>>>>>>     "bytes_used": 2043633942351985,
>>>>>>>>>>     "percent_used": 0.23379847407341003,
>>>>>>>>>>     "max_avail": 2232457428467712
>>>>>>>>>>   }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> In the second cluster, bytes_used is 3x stored.
>>>>>>>>>>
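As a sanity check on the two jq outputs above, the bytes_used/stored ratio makes the discrepancy obvious: on a healthy 3x-replica pool it should be close to 3.0, while ~1.0 indicates the underreporting. A minimal sketch using the figures quoted in this thread:

```python
# Pool stats taken from the two `ceph df -f json` outputs above.
old_cluster = {"stored": 1229308190855881, "bytes_used": 1229308190855881}
new_cluster = {"stored": 680977600893041, "bytes_used": 2043633942351985}

# On a 3x-replica pool, bytes_used should be roughly 3x stored; a ratio
# of ~1.0 means raw usage is being underreported.
for name, pool in (("old", old_cluster), ("new", new_cluster)):
    ratio = pool["bytes_used"] / pool["stored"]
    print(f"{name}: bytes_used/stored = {ratio:.2f}")
# old: bytes_used/stored = 1.00
# new: bytes_used/stored = 3.00
```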
>>>>>>>>>> Does anyone know why these are not reported consistently?
>>>>>>>>>> Noticing this just now, I'll update our monitoring to plot stored
>>>>>>>>>> rather than bytes_used from now on.
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> Dan
>>>>>>>>>> _______________________________________________
>>>>>>>>>> ceph-users mailing list -- ceph-users(a)ceph.io
>>>>>>>>>> To unsubscribe send an email to ceph-users-leave(a)ceph.io