Theoretically we shouldn't be spiking memory as much these days during
recovery, but the code is complicated and it's tough to reproduce these
kinds of issues in-house. If you happen to catch it in the act, do you
see the pglog mempool stats also spiking up?
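
For reference, one way to watch that is to dump the mempool stats from
the OSD's admin socket while it is recovering (a minimal sketch; osd.N
is a placeholder for the affected OSD's id):

    # query the admin socket of the affected OSD
    ceph daemon osd.N dump_mempools

In the JSON output, compare the "osd_pglog" and "buffer_anon" pools
over time to see which one is growing.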
Mark
On 10/21/20 2:34 AM, Dan van der Ster wrote:
Hi,
This might be the pglog issue which has been coming up a few times
on the list.
If the OSD cannot boot without going OOM, you might have success by
trimming the pglog, e.g. search this list for "ceph-objectstore-tool
--op trim-pg-log" for some recipes. The thread "OSDs taking too much
memory, for pglog" in particular might help.
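
As a rough sketch of what those recipes look like (assumptions: the OSD
must be stopped first, and the OSD id N and <pgid> below are
placeholders you need to fill in for your deployment; see the list
threads for the full procedure):

    # stop the OSD that cannot boot
    systemctl stop ceph-osd@N

    # trim the pg log for one PG in that OSD's store
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-N \
        --pgid <pgid> --op trim-pg-log

Repeat per affected PG, then start the OSD again and watch its memory.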
Cheers, Dan
On Tue, Oct 20, 2020 at 11:57 PM Ing. Luis Felipe Domínguez Vega
<luis.dominguez(a)desoft.cu> wrote:
> Hi, today my infra provider had a blackout. Ceph then tried to
> recover but is stuck in an inconsistent state, because many OSDs
> cannot recover: the kernel kills them via OOM. Even now an OSD that
> was fine is going down, OOM-killed.
>
> Even on a server with 32 GB of RAM the OSD uses ALL of it and never
> recovers; I think this could be a memory leak. Ceph version is
> Octopus 15.2.3.
>
> In https://pastebin.pl/view/59089adc you can see that buffer_anon
> reaches 32 GB, but why? My whole cluster is down because of that.
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io