After reading a lot about it I still don't understand how this happened and
what I can do to fix it.
This only trims the pg log, but not the duplicates:

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-41 \
    --op trim-pg-log --pgid 8.664
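For reference, this is roughly how I count the dups for a single PG while the OSD is stopped (jq assumed to be available; I'm also assuming the dups sit under "pg_log_t.dups" in the --op log output, as the tracker notes suggest):

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-41 \
    --op log --pgid 8.664 > /tmp/pg-8.664-log.json
# regular log entries vs. duplicate entries
jq '.pg_log_t.log | length' /tmp/pg-8.664-log.json
jq '.pg_log_t.dups | length' /tmp/pg-8.664-log.json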
I also tried to recreate the OSDs (sync out, crush rm, wipe disk, create new
osd, sync in), but the osd_pglog_items value keeps growing after everything
is synced back in (my 8TB disks are at around 10 million items, one day
after I synced them back in). It has not yet reached the old value of around
50 million, but it is still growing.
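The recreate sequence is roughly this (osd.41 and /dev/sdX are placeholders
for the actual OSD id and device):

ceph osd out 41                            # sync out; wait for backfill to finish
systemctl stop ceph-osd@41
ceph osd purge 41 --yes-i-really-mean-it   # crush rm + auth del + osd rm in one step
ceph-volume lvm zap /dev/sdX --destroy     # wipe disk
ceph-volume lvm create --data /dev/sdX     # create new osd, then it syncs back in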
Is there anything I can do on an octopus cluster, or is upgrading the only
way?
And why does this happen in the first place?
On Tue, Feb 21, 2023 at 6:31 PM Boris Behrens <bb(a)kervyn.de> wrote:
Thanks a lot, Josh. That really seems to be my problem.
Those numbers do not look healthy in our cluster. Oof.
~# ceph tell osd.* perf dump |grep 'osd_pglog\|^osd\.[0-9]'
osd.0: {
"osd_pglog_bytes": 459617868,
"osd_pglog_items": 2955043,
osd.1: {
"osd_pglog_bytes": 598414548,
"osd_pglog_items": 4315956,
osd.2: {
"osd_pglog_bytes": 357056504,
"osd_pglog_items": 1942486,
osd.3: {
"osd_pglog_bytes": 436198324,
"osd_pglog_items": 2863501,
osd.4: {
"osd_pglog_bytes": 373516972,
"osd_pglog_items": 2127588,
osd.5: {
"osd_pglog_bytes": 335471560,
"osd_pglog_items": 1822608,
osd.6: {
"osd_pglog_bytes": 391814808,
"osd_pglog_items": 2394209,
osd.7: {
"osd_pglog_bytes": 541849048,
"osd_pglog_items": 3880437,
...
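To rank the OSDs quickly, something like this works on the same output (GNU
awk/sort assumed; it just pastes the three lines per OSD together and sorts
by item count):

ceph tell osd.* perf dump | grep 'osd_pglog\|^osd\.[0-9]' \
    | paste - - - \
    | awk '{gsub(/[":,{]/,""); print $5, $1}' \
    | sort -rn | head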
On Tue, Feb 21, 2023 at 6:21 PM Josh Baergen <jbaergen(a)digitalocean.com> wrote:
Hi Boris,
This sounds a bit like
https://tracker.ceph.com/issues/53729.
https://tracker.ceph.com/issues/53729#note-65 might help you diagnose
whether this is the case.
Josh
On Tue, Feb 21, 2023 at 9:29 AM Boris Behrens <bb(a)kervyn.de> wrote:
Hi,
today I wanted to increase the PGs from 2k -> 4k, and random OSDs went
offline in the cluster.
After some investigation we saw that the OSDs got OOM killed (I've seen a
host go from 90GB used memory to 190GB before the OOM kills happened).
We have around 24 SSD OSDs per host and 128GB/190GB/265GB memory in these
hosts. All of them experienced OOM kills.
All hosts are octopus / ubuntu 20.04.
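In case it's relevant: to check what each OSD daemon is allowed to use,
something like the following should work on octopus (osd.0 as an example;
the default osd_memory_target is 4GB if I remember correctly):

ceph config get osd.0 osd_memory_target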
And at every step, new OSDs crashed with OOM. (We have now set
pg_num/pgp_num to 2516 to stop the process.)
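For reference, that was done along these lines (<pool> is a placeholder for
the affected pool):

ceph osd pool set <pool> pg_num 2516
ceph osd pool set <pool> pgp_num 2516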
The OSD logs do not show anything why this might happen.
Some OSDs also segfault.
I have now started to stop all OSDs on a host and run a "ceph-bluestore-tool
repair" and a "ceph-kvstore-tool bluestore-kv compact" on all of them. This
takes around 30 minutes per 8TB OSD. When I start the OSDs again, I
instantly get a lot of slow OPS from all the other OSDs while the restarted
ones come up (the 8TB OSDs take around 10 minutes in "load_pgs").
I am unsure what I can do to restore normal cluster performance. Any ideas
or suggestions, or maybe even known bugs? Even a hint on what to search for
in the logs would help.
Cheers
Boris
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io
--
The "UTF-8 problems" self-help group is meeting in the large hall this time,
as an exception.