Hi,
> I don't think the default osd_min_pg_log_entries
> has changed recently.
> In https://tracker.ceph.com/issues/47775 I proposed that we limit the
> pg log length by memory -- if it is indeed possible for log entries to
> get into several MB, then this would be necessary IMHO.
I've had a surprising crash course on pg_log in the last 36 hours. But about the
size of each entry, you're right. I counted pg log size * OSDs, and did not
factor in pg log size * OSDs * PGs per OSD. Still, the total memory an OSD
process was using for pg_log was ~22 GB.
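As a rough back-of-envelope (assuming all ~20k pgs sit in the 8+3 EC pool, so
11 shards each, spread over our 56 * 25 = 1400 OSDs):

  pg shards per OSD  ~= 20000 * 11 / 1400  ~= 157
  memory per pg log  ~= 22 GB / 157        ~= 140 MB
  memory per entry   ~= 140 MB / 3000      ~= 48 KB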
> But you said you were trimming PG logs with the offline tool? How long
> were those logs that needed to be trimmed?
The logs we trimmed were ~3000 entries; we trimmed them to the new size of 500.
After restarting the OSDs, pg_log memory usage dropped from ~22 GB to what we
guess is 2-3 GB, but with the cluster in this state it's hard to be specific.
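For reference, the offline trim we're running looks roughly like this (osd id
and pgid are examples; the OSD has to be stopped first):

# we lowered the cluster-wide targets from 3000 to 500
ceph config set osd osd_min_pg_log_entries 500
ceph config set osd osd_max_pg_log_entries 500

# then per pg, on each stopped OSD
systemctl stop ceph-osd@42
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-42 \
    --pgid 37.2b9 --op trim-pg-log --osd_max_pg_log_entries=500
systemctl start ceph-osd@42
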
Cheers,
Kalle
> -- dan
>
>
> On Tue, Nov 17, 2020 at 11:58 AM Kalle Happonen <kalle.happonen(a)csc.fi> wrote:
>>
>> Another idea, which I don't know if has any merit.
>>
>> If 8 MB is a realistic log size (or has this grown for some reason?), did the
>> enforcement (or default) of the minimum value change lately
>> (osd_min_pg_log_entries)?
>>
>> If the minimum were set to 1000, at 8 MB per log, we would have issues
>> with memory.
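>>
>> A quick way to check what an OSD is actually running with (osd id is an
>> example):
>>
>> ceph daemon osd.42 config show | grep pg_log_entries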
>>
>> Cheers,
>> Kalle
>>
>>
>>
>> ----- Original Message -----
>> > From: "Kalle Happonen" <kalle.happonen(a)csc.fi>
>> > To: "Dan van der Ster" <dan(a)vanderster.com>
>> > Cc: "ceph-users" <ceph-users(a)ceph.io>
>> > Sent: Tuesday, 17 November, 2020 12:45:25
>> > Subject: [ceph-users] Re: osd_pglog memory hoarding - another case
>>
>> > Hi Dan & co.,
>> > Thanks for the support (moral and technical).
>> >
>> > That sounds like a good guess, but it seems like there is nothing alarming
>> > here.
>> > In all our pools, some pgs are a bit over 3100, but not at any exceptional
>> > values.
>> >
>> > cat pgdumpfull.txt | jq '.pg_map.pg_stats[] |
>> > select(.ondisk_log_size > 3100)' | egrep "pgid|ondisk_log_size"
>> > "pgid": "37.2b9",
>> > "ondisk_log_size": 3103,
>> > "pgid": "33.e",
>> > "ondisk_log_size": 3229,
>> > "pgid": "7.2",
>> > "ondisk_log_size": 3111,
>> > "pgid": "26.4",
>> > "ondisk_log_size": 3185,
>> > "pgid": "33.4",
>> > "ondisk_log_size": 3311,
>> > "pgid": "33.8",
>> > "ondisk_log_size": 3278,
>> >
>> > I also have no idea what the average size of a pg log entry should be; in
>> > our case it seems to be around 8 MB (22 GB / 3000 entries).
>> >
>> > Cheers,
>> > Kalle
>> >
>> > ----- Original Message -----
>> >> From: "Dan van der Ster" <dan(a)vanderster.com>
>> >> To: "Kalle Happonen" <kalle.happonen(a)csc.fi>
>> >> Cc: "ceph-users" <ceph-users(a)ceph.io>, "xie xingguo" <xie.xingguo(a)zte.com.cn>,
>> >> "Samuel Just" <sjust(a)redhat.com>
>> >> Sent: Tuesday, 17 November, 2020 12:22:28
>> >> Subject: Re: [ceph-users] osd_pglog memory hoarding - another case
>> >
>> >> Hi Kalle,
>> >>
>> >> Do you have active PGs now with huge pglogs?
>> >> You can do something like this to find them:
>> >>
>> >> ceph pg dump -f json | jq '.pg_map.pg_stats[] |
>> >> select(.ondisk_log_size > 3000)'
>> >>
>> >> If you find some, could you increase debug_osd to 10 and then share the
>> >> osd log. I am interested in the debug lines from calc_trim_to_aggressively
>> >> (or calc_trim_to if you didn't enable pglog_hardlimit), but the whole log
>> >> might show other issues.
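>> >>
>> >> Something like this should raise the log level at runtime without a
>> >> restart (osd id is an example):
>> >>
>> >> ceph tell osd.42 injectargs '--debug_osd 10'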
>> >>
>> >> Cheers, dan
>> >>
>> >>
>> >> On Tue, Nov 17, 2020 at 9:55 AM Dan van der Ster <dan(a)vanderster.com> wrote:
>> >>>
>> >>> Hi Kalle,
>> >>>
>> >>> Strangely and luckily, in our case the memory explosion didn't reoccur
>> >>> after that incident. So I can mostly only offer moral support.
>> >>>
>> >>> But if this bug indeed appeared between 14.2.8 and 14.2.13, then I
>> >>> think this is suspicious:
>> >>>
>> >>> b670715eb4 osd/PeeringState: do not trim pg log past last_update_ondisk
>> >>>
>> >>>
>> >>> https://github.com/ceph/ceph/commit/b670715eb4
>> >>>
>> >>> Given that it adds a case where the pg_log is not trimmed, I wonder if
>> >>> there could be an unforeseen condition where `last_update_ondisk`
>> >>> isn't being updated correctly, and therefore the osd stops trimming
>> >>> the pg_log altogether.
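>> >>>
>> >>> One way to spot a stalled trimmer would be to watch whether a busy pg's
>> >>> log length only ever grows, something like (pgid is an example):
>> >>>
>> >>> while sleep 60; do
>> >>>   ceph pg dump -f json 2>/dev/null |
>> >>>     jq '.pg_map.pg_stats[] | select(.pgid=="37.2b9") | .ondisk_log_size'
>> >>> done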
>> >>>
>> >>> Xie or Samuel: does that sound possible?
>> >>>
>> >>> Cheers, Dan
>> >>>
>> >>> On Tue, Nov 17, 2020 at 9:35 AM Kalle Happonen <kalle.happonen(a)csc.fi> wrote:
>> >>> >
>> >>> > Hello all,
>> >>> > wrt:
>> >>> >
>> >>> > https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/7IMIWCKIHXN…
>> >>> >
>> >>> > Yesterday we hit a problem with osd_pglog memory, similar to the thread above.
>> >>> >
>> >>> > We have a 56 node object storage (S3+SWIFT) cluster with 25 OSD disks per node.
>> >>> > We run 8+3 EC for the data pool (metadata is on a replicated NVMe pool).
>> >>> >
>> >>> > The cluster has been running fine, and (as relevant to the post) the
>> >>> > memory usage has been stable at 100 GB / node. We've had the default
>> >>> > pg_log of 3000. The user traffic doesn't seem to have been exceptional lately.
>> >>> >
>> >>> > Last Thursday we updated the OSDs from 14.2.8 -> 14.2.13. On Friday the
>> >>> > memory usage on OSD nodes started to grow. On each node it grew steadily,
>> >>> > about 30 GB/day, until the servers started OOM killing OSD processes.
>> >>> >
>> >>> > After a lot of debugging we found that the pg_logs were huge. Each OSD
>> >>> > process's pg_log had grown to ~22 GB, which we naturally didn't have
>> >>> > memory for, and then the cluster was in an unstable situation. This is
>> >>> > significantly more than the 1.5 GB in the post above. We do have ~20k
>> >>> > pgs, which may directly affect the size.
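>> >>> >
>> >>> > The ~22 GB figure comes from the OSD mempools; something like this
>> >>> > shows it per daemon (osd id is an example):
>> >>> >
>> >>> > ceph daemon osd.42 dump_mempools | jq '.mempool.by_pool.osd_pglog'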
>> >>> >
>> >>> > We've reduced the pg_log to 500, started offline trimming where we can,
>> >>> > and also just waited. The pg_log size dropped to ~1.2 GB on at least some
>> >>> > nodes, but we're still recovering, and still have a lot of OSDs down and out.
>> >>> >
>> >>> > We're unsure if version 14.2.13 triggered this, or if the OSD restarts
>> >>> > triggered this (or something unrelated we don't see).
>> >>> >
>> >>> > This mail is mostly to ask whether there are good guesses as to why the
>> >>> > pg_log size per OSD process exploded. Any technical (and moral) support
>> >>> > is appreciated. Also, since we're not sure whether 14.2.13 triggered
>> >>> > this, this is also to put a data point out there for other debuggers.
>> >>> >
>> >>> > Cheers,
>> >>> > Kalle Happonen
>> >>> > _______________________________________________
>> >>> > ceph-users mailing list -- ceph-users(a)ceph.io
>> >>> > To unsubscribe send an email to ceph-users-leave(a)ceph.io
>> > _______________________________________________
>> > ceph-users mailing list -- ceph-users(a)ceph.io
>> > To unsubscribe send an email to ceph-users-leave(a)ceph.io