[Octopus] OSD overloading

List overview All Threads
Download

newer

older

Nautilus 14.2.7 radosgw lifecycle...

Multiple OSDs down, and won't come...

Jack

8 Apr 2020 8 Apr '20

12:38 p.m.

Hello, I've a issue, since my Nautilus -> Octopus upgrade My cluster has many rbd images (~3k or something) Each of them has ~30 snapshots Each day, I create and remove a least a snapshot per image Since Octopus, when I remove the "nosnaptrim" flags, each OSDs uses 100% of its CPU time The whole cluster collapses: OSDs no longer see each others, most of them are seens as down .. I do not see any progress being made : it does not appear the problem will solve by itself What can I do ? Best regards,

Show replies by date

Wido den Hollander

8 Apr 8 Apr

12:58 p.m.

On 4/8/20 1:38 PM, Jack wrote:

...

Why do you have the 'nosnaptrim' flag set? I'm missing that piece of information. > The whole cluster collapses: OSDs no longer see each others, most of > them are seens as down .. > I do not see any progress being made : it does not appear the problem > will solve by itself > > What can I do ? > > Best regards, > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io >

Jack

1:02 p.m.

I put the nosnaptrim during upgrade because I saw high CPU usage and though it was somehow related to the upgrade process However, all my daemon are now running Octopus, and the issue is still here, so I was wrong On 4/8/20 1:58 PM, Wido den Hollander wrote:

...

On 4/8/20 1:38 PM, Jack wrote:

Dan van der Ster

1:53 p.m.

Do you have a custom value for osd_snap_trim_sleep ? On Wed, Apr 8, 2020 at 2:03 PM Jack <ceph(a)jack.fr.eu.org> wrote: > > I put the nosnaptrim during upgrade because I saw high CPU usage and > though it was somehow related to the upgrade process > However, all my daemon are now running Octopus, and the issue is still > here, so I was wrong > > > On 4/8/20 1:58 PM, Wido den Hollander wrote: > > > > > > On 4/8/20 1:38 PM, Jack wrote: > >> Hello, > >> > >> I've a issue, since my Nautilus -> Octopus upgrade > >> > >> My cluster has many rbd images (~3k or something) > >> Each of them has ~30 snapshots > >> Each day, I create and remove a least a snapshot per image > >> > >> Since Octopus, when I remove the "nosnaptrim" flags, each OSDs uses 100% > >> of its CPU time > > > > Why do you have the 'nosnaptrim' flag set? I'm missing that piece of > > information. > > > >> The whole cluster collapses: OSDs no longer see each others, most of > >> them are seens as down .. > >> I do not see any progress being made : it does not appear the problem > >> will solve by itself > >> > >> What can I do ? > >> > >> Best regards, > >> _______________________________________________ > >> ceph-users mailing list -- ceph-users(a)ceph.io > >> To unsubscribe send an email to ceph-users-leave(a)ceph.io > >> > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io

Jack

2:08 p.m.

I do: root@backup1:~# ceph config dump | grep snap_trim_sleep global advanced osd_snap_trim_sleep 60.000000 global advanced osd_snap_trim_sleep_hdd 60.000000 (cluster is fully rusty) On 4/8/20 2:53 PM, Dan van der Ster wrote:

...

Do you have a custom value for osd_snap_trim_sleep ? On Wed, Apr 8, 2020 at 2:03 PM Jack <ceph(a)jack.fr.eu.org> wrote: > > I put the nosnaptrim during upgrade because I saw high CPU usage and > though it was somehow related to the upgrade process > However, all my daemon are now running Octopus, and the issue is still > here, so I was wrong > > > On 4/8/20 1:58 PM, Wido den Hollander wrote: >> >> >> On 4/8/20 1:38 PM, Jack wrote: >>> Hello, >>> >>> I've a issue, since my Nautilus -> Octopus upgrade >>> >>> My cluster has many rbd images (~3k or something) >>> Each of them has ~30 snapshots >>> Each day, I create and remove a least a snapshot per image >>> >>> Since Octopus, when I remove the "nosnaptrim" flags, each OSDs uses 100% >>> of its CPU time >> >> Why do you have the 'nosnaptrim' flag set? I'm missing that piece of >> information. >> >>> The whole cluster collapses: OSDs no longer see each others, most of >>> them are seens as down .. >>> I do not see any progress being made : it does not appear the problem >>> will solve by itself >>> >>> What can I do ? >>> >>> Best regards, >>> _______________________________________________ >>> ceph-users mailing list -- ceph-users(a)ceph.io >>> To unsubscribe send an email to ceph-users-leave(a)ceph.io >>> > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io

Paul Emmerich

2:14 p.m.

What's the CPU busy with while spinning at 100%? Check "perf top" for a quick overview Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90 On Wed, Apr 8, 2020 at 3:09 PM Jack <ceph(a)jack.fr.eu.org> wrote: > > I do: > root@backup1:~# ceph config dump | grep snap_trim_sleep > global advanced osd_snap_trim_sleep > 60.000000 > global advanced osd_snap_trim_sleep_hdd > 60.000000 > > (cluster is fully rusty) > > > On 4/8/20 2:53 PM, Dan van der Ster wrote: > > Do you have a custom value for osd_snap_trim_sleep ? > > > > On Wed, Apr 8, 2020 at 2:03 PM Jack <ceph(a)jack.fr.eu.org> wrote: > >> > >> I put the nosnaptrim during upgrade because I saw high CPU usage and > >> though it was somehow related to the upgrade process > >> However, all my daemon are now running Octopus, and the issue is still > >> here, so I was wrong > >> > >> > >> On 4/8/20 1:58 PM, Wido den Hollander wrote: > >>> > >>> > >>> On 4/8/20 1:38 PM, Jack wrote: > >>>> Hello, > >>>> > >>>> I've a issue, since my Nautilus -> Octopus upgrade > >>>> > >>>> My cluster has many rbd images (~3k or something) > >>>> Each of them has ~30 snapshots > >>>> Each day, I create and remove a least a snapshot per image > >>>> > >>>> Since Octopus, when I remove the "nosnaptrim" flags, each OSDs uses 100% > >>>> of its CPU time > >>> > >>> Why do you have the 'nosnaptrim' flag set? I'm missing that piece of > >>> information. > >>> > >>>> The whole cluster collapses: OSDs no longer see each others, most of > >>>> them are seens as down .. > >>>> I do not see any progress being made : it does not appear the problem > >>>> will solve by itself > >>>> > >>>> What can I do ? > >>>> > >>>> Best regards, > >>>> _______________________________________________ > >>>> ceph-users mailing list -- ceph-users(a)ceph.io > >>>> To unsubscribe send an email to ceph-users-leave(a)ceph.io > >>>> > >> _______________________________________________ > >> ceph-users mailing list -- ceph-users(a)ceph.io > >> To unsubscribe send an email to ceph-users-leave(a)ceph.io > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io

Jack

2:27 p.m.

The CPU is used by userspace, not kernelspace Here is the perf top, see attachment Rocksdb eats everything :/ On 4/8/20 3:14 PM, Paul Emmerich wrote:

...

What's the CPU busy with while spinning at 100%? Check "perf top" for a quick overview Paul

Jack

7:13 p.m.

Just to confirm this does not get better: root@backup1:~# ceph status cluster: id: 9cd41f0f-936d-4b59-8e5d-9b679dae9140 health: HEALTH_WARN 20 OSD(s) reporting legacy (not per-pool) BlueStore omap usage stats 4/50952060 objects unfound (0.000%) nobackfill,norecover,noscrub,nodeep-scrub flag(s) set 1 osds down 3 nearfull osd(s) Reduced data availability: 826 pgs inactive, 616 pgs down, 185 pgs peering, 158 pgs stale Low space hindering backfill (add storage if this doesn't resolve itself): 93 pgs backfill_toofull Degraded data redundancy: 13285415/101904120 objects degraded (13.037%), 706 pgs degraded, 696 pgs undersized 989 pgs not deep-scrubbed in time 378 pgs not scrubbed in time 10 pool(s) nearfull 2216 slow ops, oldest one blocked for 13905 sec, daemons [osd.1,osd.11,osd.20,osd.24,osd.25,osd.29,osd.31,osd.37,osd.4,osd.5]... have slow ops. services: mon: 1 daemons, quorum backup1 (age 8d) mgr: backup1(active, since 8d) osd: 37 osds: 26 up (since 9m), 27 in (since 2h); 626 remapped pgs flags nobackfill,norecover,noscrub,nodeep-scrub rgw: 1 daemon active (backup1.odiso.net) task status: data: pools: 10 pools, 2785 pgs objects: 50.95M objects, 92 TiB usage: 121 TiB used, 39 TiB / 160 TiB avail pgs: 29.659% pgs not active 13285415/101904120 objects degraded (13.037%) 433992/101904120 objects misplaced (0.426%) 4/50952060 objects unfound (0.000%) 840 active+clean+snaptrim_wait 536 down 490 active+undersized+degraded+remapped+backfilling 326 active+clean 113 peering 88 active+undersized+degraded 83 active+undersized+degraded+remapped+backfill_toofull 79 stale+down 63 stale+peering 51 active+clean+snaptrim 24 activating 22 active+recovering+degraded 19 active+remapped+backfilling 13 stale+active+undersized+degraded 9 remapped+peering 9 active+undersized+remapped+backfilling 9 active+undersized+degraded+remapped+backfill_wait+backfill_toofull 2 stale+active+clean+snaptrim 2 active+undersized 1 stale+active+clean+snaptrim_wait 1 active+remapped+backfill_toofull 1 active+clean+snaptrim_wait+laggy 1 active+recovering+undersized+remapped 1 down+remapped 1 activating+undersized+degraded+remapped 1 active+recovering+laggy On 4/8/20 3:27 PM, Jack wrote:

...

The CPU is used by userspace, not kernelspace Here is the perf top, see attachment Rocksdb eats everything :/ On 4/8/20 3:14 PM, Paul Emmerich wrote:

What's the CPU busy with while spinning at 100%? Check "perf top" for a quick overview Paul

_______________________________________________ ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an email to ceph-users-leave(a)ceph.io

Jack

7:15 p.m.

...

The CPU is used by userspace, not kernelspace Here is the perf top, see attachment Rocksdb eats everything :/ On 4/8/20 3:14 PM, Paul Emmerich wrote:

What's the CPU busy with while spinning at 100%? Check "perf top" for a quick overview Paul

_______________________________________________ ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an email to ceph-users-leave(a)ceph.io

Ashley Merrick

9 Apr 9 Apr

3:51 a.m.

Are you sure your not being hit by: ceph config set osd bluestore_fsck_quick_fix_on_mount false @ https://docs.ceph.com/docs/master/releases/octopus/ Have all your OSD's successfully completed the fsck? Reasons I say that is I can see "20 OSD(s) reporting legacy (not per-pool) BlueStore omap usage stats" ---- On Thu, 09 Apr 2020 02:15:02 +0800 Jack <mailto:ceph@jack.fr.eu.org> wrote ---- Just to confirm this does not get better: root@backup1:~# ceph status cluster: id: 9cd41f0f-936d-4b59-8e5d-9b679dae9140 health: HEALTH_WARN 20 OSD(s) reporting legacy (not per-pool) BlueStore omap usage stats 4/50952060 objects unfound (0.000%) nobackfill,norecover,noscrub,nodeep-scrub flag(s) set 1 osds down 3 nearfull osd(s) Reduced data availability: 826 pgs inactive, 616 pgs down, 185 pgs peering, 158 pgs stale Low space hindering backfill (add storage if this doesn't resolve itself): 93 pgs backfill_toofull Degraded data redundancy: 13285415/101904120 objects degraded (13.037%), 706 pgs degraded, 696 pgs undersized 989 pgs not deep-scrubbed in time 378 pgs not scrubbed in time 10 pool(s) nearfull 2216 slow ops, oldest one blocked for 13905 sec, daemons [osd.1,osd.11,osd.20,osd.24,osd.25,osd.29,osd.31,osd.37,osd.4,osd.5]... have slow ops. services: mon: 1 daemons, quorum backup1 (age 8d) mgr: backup1(active, since 8d) osd: 37 osds: 26 up (since 9m), 27 in (since 2h); 626 remapped pgs flags nobackfill,norecover,noscrub,nodeep-scrub rgw: 1 daemon active (backup1.odiso.net) task status: data: pools: 10 pools, 2785 pgs objects: 50.95M objects, 92 TiB usage: 121 TiB used, 39 TiB / 160 TiB avail pgs: 29.659% pgs not active 13285415/101904120 objects degraded (13.037%) 433992/101904120 objects misplaced (0.426%) 4/50952060 objects unfound (0.000%) 840 active+clean+snaptrim_wait 536 down 490 active+undersized+degraded+remapped+backfilling 326 active+clean 113 peering 88 active+undersized+degraded 83 active+undersized+degraded+remapped+backfill_toofull 79 stale+down 63 stale+peering 51 active+clean+snaptrim 24 activating 22 active+recovering+degraded 19 active+remapped+backfilling 13 stale+active+undersized+degraded 9 remapped+peering 9 active+undersized+remapped+backfilling 9 active+undersized+degraded+remapped+backfill_wait+backfill_toofull 2 stale+active+clean+snaptrim 2 active+undersized 1 stale+active+clean+snaptrim_wait 1 active+remapped+backfill_toofull 1 active+clean+snaptrim_wait+laggy 1 active+recovering+undersized+remapped 1 down+remapped 1 activating+undersized+degraded+remapped 1 active+recovering+laggy On 4/8/20 3:27 PM, Jack wrote:

...

The CPU is used by userspace, not kernelspace Here is the perf top, see attachment Rocksdb eats everything :/ On 4/8/20 3:14 PM, Paul Emmerich wrote:

What's the CPU busy with while spinning at 100%? Check "perf top" for a quick overview Paul

_______________________________________________ ceph-users mailing list -- mailto:ceph-users@ceph.io To unsubscribe send an email to mailto:ceph-users-leave@ceph.io

Jack

12 Apr 12 Apr

11:01 p.m.

Yep I am The issue is solved now .. and by solved, brace yourselves, I mean I had to recreate all OSDs And this the cluster would not heal itself (because of the original issue), I had to drop every rados pool, stop all OSDs, destroy & recreate them .. Yeah, well, hum There is definitly an underlying issue there Those OSDs were created and upgraded since Luminous I have no more cue on the bug Sadly, there is only so much downtime I can afford on this cluster Anyway .. On 4/9/20 4:51 AM, Ashley Merrick wrote:

...

The CPU is used by userspace, not kernelspace Here is the perf top, see attachment Rocksdb eats everything :/ On 4/8/20 3:14 PM, Paul Emmerich wrote:

What's the CPU busy with while spinning at 100%? Check "perf top" for a quick overview Paul

_______________________________________________ ceph-users mailing list -- mailto:ceph-users@ceph.io To unsubscribe send an email to mailto:ceph-users-leave@ceph.io

Igor Fedotov

13 Apr 13 Apr

11:20 a.m.

Given the symptoms high CPU usage within RocksDB and corresponding slowdown were presumably caused by RocksDB fragmentation. And temporary workaround would be to do manual DB compaction using ceph-kvstore-tool's compact command. Thanks, Igor On 4/13/2020 1:01 AM, Jack wrote: > Yep I am > > The issue is solved now .. and by solved, brace yourselves, I mean I had > to recreate all OSDs > > And this the cluster would not heal itself (because of the original > issue), I had to drop every rados pool, stop all OSDs, destroy & > recreate them .. > Yeah, well, hum > > There is definitly an underlying issue there > Those OSDs were created and upgraded since Luminous > > I have no more cue on the bug > Sadly, there is only so much downtime I can afford on this cluster > > Anyway .. > > On 4/9/20 4:51 AM, Ashley Merrick wrote: >> Are you sure your not being hit by: >> >> >> >> ceph config set osd bluestore_fsck_quick_fix_on_mount false @ https://docs.ceph.com/docs/master/releases/octopus/ >> >> Have all your OSD's successfully completed the fsck? >> >> >> >> Reasons I say that is I can see "20 OSD(s) reporting legacy (not per-pool) BlueStore omap usage stats" >> >> >> >> >> >> ---- On Thu, 09 Apr 2020 02:15:02 +0800 Jack <mailto:ceph@jack.fr.eu.org> wrote ---- >> >> >> >> Just to confirm this does not get better: >> >> root@backup1:~# ceph status >> cluster: >> id: 9cd41f0f-936d-4b59-8e5d-9b679dae9140 >> health: HEALTH_WARN >> 20 OSD(s) reporting legacy (not per-pool) BlueStore omap >> usage stats >> 4/50952060 objects unfound (0.000%) >> nobackfill,norecover,noscrub,nodeep-scrub flag(s) set >> 1 osds down >> 3 nearfull osd(s) >> Reduced data availability: 826 pgs inactive, 616 pgs down, >> 185 pgs peering, 158 pgs stale >> Low space hindering backfill (add storage if this doesn't >> resolve itself): 93 pgs backfill_toofull >> Degraded data redundancy: 13285415/101904120 objects >> degraded (13.037%), 706 pgs degraded, 696 pgs undersized >> 989 pgs not deep-scrubbed in time >> 378 pgs not scrubbed in time >> 10 pool(s) nearfull >> 2216 slow ops, oldest one blocked for 13905 sec, daemons >> [osd.1,osd.11,osd.20,osd.24,osd.25,osd.29,osd.31,osd.37,osd.4,osd.5]... >> have slow ops. >> >> services: >> mon: 1 daemons, quorum backup1 (age 8d) >> mgr: backup1(active, since 8d) >> osd: 37 osds: 26 up (since 9m), 27 in (since 2h); 626 remapped pgs >> flags nobackfill,norecover,noscrub,nodeep-scrub >> rgw: 1 daemon active (backup1.odiso.net) >> >> task status: >> >> data: >> pools: 10 pools, 2785 pgs >> objects: 50.95M objects, 92 TiB >> usage: 121 TiB used, 39 TiB / 160 TiB avail >> pgs: 29.659% pgs not active >> 13285415/101904120 objects degraded (13.037%) >> 433992/101904120 objects misplaced (0.426%) >> 4/50952060 objects unfound (0.000%) >> 840 active+clean+snaptrim_wait >> 536 down >> 490 active+undersized+degraded+remapped+backfilling >> 326 active+clean >> 113 peering >> 88 active+undersized+degraded >> 83 active+undersized+degraded+remapped+backfill_toofull >> 79 stale+down >> 63 stale+peering >> 51 active+clean+snaptrim >> 24 activating >> 22 active+recovering+degraded >> 19 active+remapped+backfilling >> 13 stale+active+undersized+degraded >> 9 remapped+peering >> 9 active+undersized+remapped+backfilling >> 9 >> active+undersized+degraded+remapped+backfill_wait+backfill_toofull >> 2 stale+active+clean+snaptrim >> 2 active+undersized >> 1 stale+active+clean+snaptrim_wait >> 1 active+remapped+backfill_toofull >> 1 active+clean+snaptrim_wait+laggy >> 1 active+recovering+undersized+remapped >> 1 down+remapped >> 1 activating+undersized+degraded+remapped >> 1 active+recovering+laggy >> >> On 4/8/20 3:27 PM, Jack wrote: >>> The CPU is used by userspace, not kernelspace >>> >>> Here is the perf top, see attachment >>> >>> Rocksdb eats everything :/ >>> >>> >>> On 4/8/20 3:14 PM, Paul Emmerich wrote: >>>> What's the CPU busy with while spinning at 100%? >>>> >>>> Check "perf top" for a quick overview >>>> >>>> >>>> Paul >>>> >>> >>> _______________________________________________ >>> ceph-users mailing list -- mailto:ceph-users@ceph.io >>> To unsubscribe send an email to mailto:ceph-users-leave@ceph.io >>> >> _______________________________________________ >> ceph-users mailing list -- mailto:ceph-users@ceph.io >> To unsubscribe send an email to mailto:ceph-users-leave@ceph.io >> > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io

Xiaoxi Chen

6:40 p.m.

I am not sure if any change in Octopus make this worse, but we are in Nautilus also seeing the RocksDB overhead during snaptrim is huge, we walk around by throttling the snaptrim speed to minimal as well as throttle deep-scurb, see https://www.spinics.net/lists/dev-ceph/msg01277.html for detail. We were expecting the Octopus get rid of removed_snaps key in OSDMap may improve things. Igor Fedotov <ifedotov(a)suse.de> 于2020年4月13日周一下午6:20写道：

...

https://docs.ceph.com/docs/master/releases/octopus/

> > Have all your OSD's successfully completed the fsck? > > > > Reasons I say that is I can see "20 OSD(s) reporting legacy (not

per-pool) BlueStore omap usage stats"

> > > > > > ---- On Thu, 09 Apr 2020 02:15:02 +0800 Jack <mailto:

ceph(a)jack.fr.eu.org> wrote ----

The CPU is used by userspace, not kernelspace Here is the perf top, see attachment Rocksdb eats everything :/ On 4/8/20 3:14 PM, Paul Emmerich wrote: > What's the CPU busy with while spinning at 100%? > > Check "perf top" for a quick overview > > > Paul > _______________________________________________ ceph-users mailing list -- mailto:ceph-users@ceph.io To unsubscribe send an email to mailto:ceph-users-leave@ceph.io

_______________________________________________ ceph-users mailing list -- mailto:ceph-users@ceph.io To unsubscribe send an email to mailto:ceph-users-leave@ceph.io

_______________________________________________ ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an email to ceph-users-leave(a)ceph.io

1501

days inactive

1501

days old

ceph-users@ceph.io

Manage subscription

12 comments

8 participants

tags (0)

participants (8)

Ashley Merrick
Dan van der Ster
Igor Fedotov
Jack
Jack
Paul Emmerich
Wido den Hollander
Xiaoxi Chen