Hi,
> I don't think the default osd_min_pg_log_entries
> has changed recently.
> In https://tracker.ceph.com/issues/47775 I proposed that we limit the
> pg log length by memory -- if it is indeed possible for log entries to
> get into several MB, then this would be necessary IMHO.
I've had a surprising crash course on pg_log in the last 36 hours. But about the
size of each entry, you're right. I counted pg log size * OSDs, and did not
factor in pg log size * OSDs * PGs per OSD. Still, the total memory an OSD
process was using for pg_log was ~22 GB.
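As a rough back-of-envelope (assuming all ~20k pgs sit in the 8+3 EC pool, so
11 shards each, spread over our 56 * 25 = 1400 OSDs):

  pg shards per OSD  ~= 20000 * 11 / 1400  ~= 157
  memory per pg log  ~= 22 GB / 157        ~= 140 MB
  memory per entry   ~= 140 MB / 3000      ~= 48 KB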
> But you said you were trimming PG logs with the offline tool? How long
> were those logs that needed to be trimmed?
The logs we trimmed were ~3000 entries; we trimmed them to the new size of 500.
After restarting the OSDs, pg_log memory usage dropped from ~22 GB to what we
guess is 2-3 GB, but with the cluster in this state it's hard to be specific.
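For reference, the offline trim we're running looks roughly like this (osd id
and pgid are examples; the OSD has to be stopped first):

# we lowered the cluster-wide targets from 3000 to 500
ceph config set osd osd_min_pg_log_entries 500
ceph config set osd osd_max_pg_log_entries 500

# then per pg, on each stopped OSD
systemctl stop ceph-osd@42
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-42 \
    --pgid 37.2b9 --op trim-pg-log --osd_max_pg_log_entries=500
systemctl start ceph-osd@42
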
Cheers,
Kalle
> -- dan
>
>
> On Tue, Nov 17, 2020 at 11:58 AM Kalle Happonen <kalle.happonen(a)csc.fi> wrote:
>>
>> Another idea, which I don't know if has any merit.
>>
>> If 8 MB is a realistic log size (or has this grown for some reason?), did the
>> enforcement (or default) of the minimum value change lately
>> (osd_min_pg_log_entries)?
>>
>> If the minimum were set to 1000, at 8 MB per log, we would have issues
>> with memory.
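>>
>> A quick way to check what an OSD is actually running with (osd id is an
>> example):
>>
>> ceph daemon osd.42 config show | grep pg_log_entries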
>>
>> Cheers,
>> Kalle
>>
>>
>>
>> ----- Original Message -----
>> > From: "Kalle Happonen" <kalle.happonen(a)csc.fi>
>> > To: "Dan van der Ster" <dan(a)vanderster.com>
>> > Cc: "ceph-users" <ceph-users(a)ceph.io>
>> > Sent: Tuesday, 17 November, 2020 12:45:25
>> > Subject: [ceph-users] Re: osd_pglog memory hoarding - another case
>>
>> > Hi Dan & co.,
>> > Thanks for the support (moral and technical).
>> >
>> > That sounds like a good guess, but it seems like there is nothing alarming
>> > here.
>> > In all our pools, some pgs are a bit over 3100, but not at any exceptional
>> > values.
>> >
>> > cat pgdumpfull.txt | jq '.pg_map.pg_stats[] |
>> > select(.ondisk_log_size > 3100)' | egrep "pgid|ondisk_log_size"
>> > "pgid": "37.2b9",
>> > "ondisk_log_size": 3103,
>> > "pgid": "33.e",
>> > "ondisk_log_size": 3229,
>> > "pgid": "7.2",
>> > "ondisk_log_size": 3111,
>> > "pgid": "26.4",
>> > "ondisk_log_size": 3185,
>> > "pgid": "33.4",
>> > "ondisk_log_size": 3311,
>> > "pgid": "33.8",
>> > "ondisk_log_size": 3278,
>> >
>> > I also have no idea what the average size of a pg log entry should be; in
>> > our case it seems to be around 8 MB (22 GB / 3000 entries).
>> >
>> > Cheers,
>> > Kalle
>> >
>> > ----- Original Message -----
>> >> From: "Dan van der Ster" <dan(a)vanderster.com>
>> >> To: "Kalle Happonen" <kalle.happonen(a)csc.fi>
>> >> Cc: "ceph-users" <ceph-users(a)ceph.io>, "xie xingguo" <xie.xingguo(a)zte.com.cn>,
>> >> "Samuel Just" <sjust(a)redhat.com>
>> >> Sent: Tuesday, 17 November, 2020 12:22:28
>> >> Subject: Re: [ceph-users] osd_pglog memory hoarding - another case
>> >
>> >> Hi Kalle,
>> >>
>> >> Do you have active PGs now with huge pglogs?
>> >> You can do something like this to find them:
>> >>
>> >> ceph pg dump -f json | jq '.pg_map.pg_stats[] |
>> >> select(.ondisk_log_size > 3000)'
>> >>
>> >> If you find some, could you increase debug_osd to 10 and then share the
>> >> osd log. I am interested in the debug lines from calc_trim_to_aggressively
>> >> (or calc_trim_to if you didn't enable pglog_hardlimit), but the whole log
>> >> might show other issues.
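>> >>
>> >> Something like this should raise the log level at runtime without a
>> >> restart (osd id is an example):
>> >>
>> >> ceph tell osd.42 injectargs '--debug_osd 10'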
>> >>
>> >> Cheers, dan
>> >>
>> >>
>> >> On Tue, Nov 17, 2020 at 9:55 AM Dan van der Ster <dan(a)vanderster.com> wrote:
>> >>>
>> >>> Hi Kalle,
>> >>>
>> >>> Strangely and luckily, in our case the memory explosion didn't reoccur
>> >>> after that incident. So I can mostly only offer moral support.
>> >>>
>> >>> But if this bug indeed appeared between 14.2.8 and 14.2.13, then I
>> >>> think this is suspicious:
>> >>>
>> >>> b670715eb4 osd/PeeringState: do not trim pg log past last_update_ondisk
>> >>>
>> >>>
>> >>> https://github.com/ceph/ceph/commit/b670715eb4
>> >>>
>> >>> Given that it adds a case where the pg_log is not trimmed, I wonder if
>> >>> there could be an unforeseen condition where `last_update_ondisk`
>> >>> isn't being updated correctly, and therefore the osd stops trimming
>> >>> the pg_log altogether.
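>> >>>
>> >>> One way to spot a stalled trimmer would be to watch whether a busy pg's
>> >>> log length only ever grows, something like (pgid is an example):
>> >>>
>> >>> while sleep 60; do
>> >>>   ceph pg dump -f json 2>/dev/null |
>> >>>     jq '.pg_map.pg_stats[] | select(.pgid=="37.2b9") | .ondisk_log_size'
>> >>> done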
>> >>>
>> >>> Xie or Samuel: does that sound possible?
>> >>>
>> >>> Cheers, Dan
>> >>>
>> >>> On Tue, Nov 17, 2020 at 9:35 AM Kalle Happonen <kalle.happonen(a)csc.fi> wrote:
>> >>> >
>> >>> > Hello all,
>> >>> > wrt:
>> >>> >
>> >>> > https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/7IMIWCKIHXN…
>> >>> >
>> >>> > Yesterday we hit a problem with osd_pglog memory, similar to the thread above.
>> >>> >
>> >>> > We have a 56 node object storage (S3+SWIFT) cluster with 25 OSD disks per node.
>> >>> > We run 8+3 EC for the data pool (metadata is on a replicated NVMe pool).
>> >>> >
>> >>> > The cluster has been running fine, and (as relevant to the post) the
>> >>> > memory usage has been stable at 100 GB / node. We've had the default
>> >>> > pg_log of 3000. The user traffic doesn't seem to have been exceptional lately.
>> >>> >
>> >>> > Last Thursday we updated the OSDs from 14.2.8 -> 14.2.13. On Friday the
>> >>> > memory usage on OSD nodes started to grow. On each node it grew steadily,
>> >>> > about 30 GB/day, until the servers started OOM killing OSD processes.
>> >>> >
>> >>> > After a lot of debugging we found that the pg_logs were huge. Each OSD
>> >>> > process's pg_log had grown to ~22 GB, which we naturally didn't have
>> >>> > memory for, and then the cluster was in an unstable situation. This is
>> >>> > significantly more than the 1.5 GB in the post above. We do have ~20k
>> >>> > pgs, which may directly affect the size.
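>> >>> >
>> >>> > The ~22 GB figure comes from the OSD mempools; something like this
>> >>> > shows it per daemon (osd id is an example):
>> >>> >
>> >>> > ceph daemon osd.42 dump_mempools | jq '.mempool.by_pool.osd_pglog'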
>> >>> >
>> >>> > We've reduced the pg_log to 500, started offline trimming where we can,
>> >>> > and also just waited. The pg_log size dropped to ~1.2 GB on at least some
>> >>> > nodes, but we're still recovering, and still have a lot of OSDs down and out.
>> >>> >
>> >>> > We're unsure if version 14.2.13 triggered this, or if the OSD restarts
>> >>> > triggered this (or something unrelated we don't see).
>> >>> >
>> >>> > This mail is mostly to ask whether there are good guesses as to why the
>> >>> > pg_log size per OSD process exploded. Any technical (and moral) support
>> >>> > is appreciated. Also, since we're not sure whether 14.2.13 triggered
>> >>> > this, this is also to put a data point out there for other debuggers.
>> >>> >
>> >>> > Cheers,
>> >>> > Kalle Happonen
>> >>> > _______________________________________________
>> >>> > ceph-users mailing list -- ceph-users(a)ceph.io
>> >>> > To unsubscribe send an email to ceph-users-leave(a)ceph.io
>> > _______________________________________________
>> > ceph-users mailing list -- ceph-users(a)ceph.io
>> > To unsubscribe send an email to ceph-users-leave(a)ceph.io