On Nov 18, 2019, at 8:12 AM, Dan van der Ster <dan@vanderster.com> wrote:
On Fri, Nov 15, 2019 at 4:45 PM Joao Eduardo Luis <joao@suse.de> wrote:
On 19/11/14 11:04AM, Gregory Farnum wrote:
On Thu, Nov 14, 2019 at 8:14 AM Dan van der Ster <dan@vanderster.com> wrote:
Hi Joao,
I might have found the reason why several of our clusters (and maybe
Bryan's too) are getting stuck not trimming osdmaps.
It seems that when an osd fails, the min_last_epoch_clean gets stuck
forever (even long after HEALTH_OK), until the ceph-mons are
restarted.
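Roughly the mechanism I have in mind, as a toy sketch (Python, purely
illustrative -- the names and numbers are made up; the real logic lives
in the mon's OSDMonitor):

    # Each OSD reports the last epoch in which all of its PGs were clean.
    # The mon only trims osdmaps older than the minimum of those reports.
    last_epoch_clean = {0: 250000, 1: 250000, 2: 120000}  # osd.2 died at epoch 120000

    def get_min_last_epoch_clean(reports):
        # Floor below which old osdmaps may be trimmed.
        return min(reports.values())

    # The failed osd.2 never sends a newer report, so the floor stays at
    # 120000 and maps 120000..250000 pile up, even long after HEALTH_OK,
    # until a mon restart rebuilds this state.
    print(get_min_last_epoch_clean(last_epoch_clean))  # -> 120000
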
I've updated the ticket:
https://tracker.ceph.com/issues/41154
Wrong ticket, I think you meant
https://tracker.ceph.com/issues/37875#note-7
I saw this behavior a long, long time ago, but stopped being able to
reproduce it consistently enough to ensure the patch was working properly.
I think I have a patch here:
https://github.com/ceph/ceph/pull/19076/commits
If you are feeling adventurous and want to give it a try, let me know. I'll
be happy to forward-port it to whatever you are running.
Thanks Joao, this patch is what I had in mind.
I'm trying to evaluate how adventurous this would be -- is there any
risk that, if a huge number of osds are down all at once (but
transiently), it would trigger the mon to trim too many maps?
I would expect that the remaining up OSDs will have a safe, low osd_epoch?
And anyway I guess that your proposed get_min_last_epoch_clean patch
is equivalent to what we have today if we restart the ceph-mon leader
while an osd is down.
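For what it's worth, extending the toy sketch above, this is the
behaviour I'm assuming (again purely illustrative Python, not the
actual change in the PR):

    reports = {0: 250000, 1: 250000, 2: 120000}
    up_osds = {0, 1}  # osd.2 is (transiently) down

    # Today: the down osd's stale report pins the floor at 120000.
    floor_today = min(reports.values())

    # If down osds are skipped, the floor is still bounded by the up osds'
    # own reports, so nothing they could still need gets trimmed.
    floor_patched = min(epoch for osd, epoch in reports.items() if osd in up_osds)

    print(floor_today, floor_patched)  # 120000 250000
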
Joao,
I ran into this again today and found over 100,000 osdmaps on all 1,000 OSDs (~50 TiB of
disk space used just by osdmaps). There were down OSDs (a pretty regular occurrence with
~1,000 OSDs), so that matches up with what Dan found. Then, when I restarted all the mon
nodes twice, the osdmaps started cleaning up.
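(For scale, that's roughly 500 KB per stored map -- 50 TiB spread over
100,000 maps x 1,000 OSDs -- which seems about right if each epoch is
kept as a full osdmap on every OSD at this cluster size.)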
I believe the steps to reproduce would look like this:
1. Start with a cluster with at least 1 down osd
2. Expand the cluster (the bigger the expansion, the more osdmaps that pile up)
3. Notice that after the expansion completes and the cluster is healthy, the old
osdmaps aren't cleaned up (a quick check is sketched below)
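A quick way to check (sketch only; it assumes ceph report exposes the
osdmap_first_committed/osdmap_last_committed fields, as the Nautilus
builds here appear to):

    import json
    import subprocess

    # Ask the mon for its committed osdmap range; a large gap that never
    # shrinks while the cluster is HEALTH_OK is the symptom described above.
    report = json.loads(subprocess.check_output(["ceph", "report"]))
    first = report["osdmap_first_committed"]
    last = report["osdmap_last_committed"]
    print("%d osdmaps kept (epochs %d..%d)" % (last - first, first, last))
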
I would be willing to test the fix on our test cluster after 14.2.5 comes out. Could you
make a build based on that release?
Thanks,
Bryan