On Nov 18, 2019, at 8:12 AM, Dan van der Ster <dan(a)vanderster.com> wrote:
On Fri, Nov 15, 2019 at 4:45 PM Joao Eduardo Luis <joao(a)suse.de> wrote:
On 19/11/14 11:04AM, Gregory Farnum wrote:
On Thu, Nov 14, 2019 at 8:14 AM Dan van der Ster wrote:
I might have found the reason why several of our clusters (and maybe
Bryan's too) are getting stuck not trimming osdmaps.
It seems that when an osd fails, the min_last_epoch_clean gets stuck
forever (even long after HEALTH_OK), until the ceph-mons are restarted.
I've updated the ticket: https://tracker.ceph.com/issues/41154
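
A toy model of the failure mode described above, not the actual ceph-mon code. It assumes the monitor remembers the last epoch reported by each OSD and only trims osdmaps below the minimum of those values. The names `osd_epochs` and `trim_floor` are hypothetical and used only for illustration.

```python
# Toy model only: assumes the mon trims osdmaps below the lowest epoch
# it has on record for any OSD. All names here are hypothetical.

def trim_floor(osd_epochs):
    """Return the lowest epoch on record across all OSDs."""
    return min(osd_epochs.values())

# Three OSDs, all last seen at epoch 100.
osd_epochs = {0: 100, 1: 100, 2: 100}

# osd.2 fails and stops reporting; osd.0 and osd.1 keep advancing.
for epoch in range(101, 201):
    osd_epochs[0] = epoch
    osd_epochs[1] = epoch

# The stale entry for osd.2 still pins the floor, so nothing newer
# than that epoch would be eligible for trimming in this model.
floor = trim_floor(osd_epochs)
print(floor)
```

In this model the floor only moves again once the stale entry is dropped, which would be consistent with a mon restart clearing the in-memory state.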
Wrong ticket, I think you meant https://tracker.ceph.com/issues/37875#note-7
I've seen this behavior a long, long time ago, but stopped being able to
reproduce it consistently enough to ensure the patch was working properly.
I think I have a patch here:
If you are feeling adventurous, and want to give it a try, let me know. I'll
be happy to forward port it to whatever you are running.
Thanks Joao, this patch is what I had in mind.
I'm trying to evaluate how adventurous this would be -- Is there any
risk that if a huge number of osds are down all at once (but
transiently), it would trigger the mon to trim too many maps?
I would expect that the remaining up OSDs will have a safe, low osd_epoch?
And anyway I guess that your proposed get_min_last_epoch_clean patch
is equivalent to what we have today if we restart the ceph-mon leader
while an osd is down.
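
The question above can be sketched in a toy model. This assumes, without having confirmed it against the patch, that the change amounts to ignoring the recorded epoch of OSDs that are currently down. The function name, its arguments, and the "return 0 when nothing is up" fallback are all hypothetical choices made for this sketch.

```python
# Toy model only: assumes the patched behavior is "take the minimum
# epoch over OSDs that are currently up". Names are hypothetical.

def trim_floor_up_only(osd_epochs, up_osds):
    """Lowest recorded epoch among up OSDs; 0 (trim nothing) if none are up."""
    up_epochs = [e for osd, e in osd_epochs.items() if osd in up_osds]
    # Conservative fallback chosen for this sketch, not taken from the patch.
    return min(up_epochs) if up_epochs else 0

# osd.0 went down at epoch 100; the rest have advanced.
osd_epochs = {0: 100, 1: 250, 2: 250, 3: 248}

# One OSD down: the stale epoch no longer pins the floor.
one_down = trim_floor_up_only(osd_epochs, up_osds={1, 2, 3})

# Many OSDs transiently down: the floor is still bounded by the
# epochs of whichever OSDs remain up.
many_down = trim_floor_up_only(osd_epochs, up_osds={3})

print(one_down, many_down)
```

In this model a mass outage does not push the floor above what the remaining up OSDs have reported, which matches the expectation stated above. Whether the real monitor behaves the same way depends on details this sketch does not capture.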
I ran into this again today and found over 100,000 osdmaps on all 1,000 OSDs (~50 TiB of
disk space used just by osdmaps). There were down OSDs (pretty regular occurrence with
~1,000 OSDs), so that matches up with what Dan found. Then, when I restarted all the mon
nodes twice, the osdmaps started cleaning up.
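
As a rough sanity check on those figures, the arithmetic below treats "over 100,000" as exactly 100,000 maps per OSD and assumes the ~50 TiB is spread evenly across 1,000 OSDs. Both are simplifications, so the results are order-of-magnitude estimates only.

```python
# Back-of-the-envelope estimate from the figures quoted above.
TIB = 2**40
GIB = 2**30
KIB = 2**10

total_bytes = 50 * TIB   # ~50 TiB used by osdmaps cluster-wide
osds = 1_000             # OSDs holding the maps
maps_per_osd = 100_000   # "over 100,000", rounded down

bytes_per_osd = total_bytes / osds
bytes_per_map = bytes_per_osd / maps_per_osd

gib_per_osd = bytes_per_osd / GIB
kib_per_map = bytes_per_map / KIB

print(f"{gib_per_osd:.1f} GiB of osdmaps per OSD")
print(f"{kib_per_map:.0f} KiB per stored osdmap")
```

That works out to roughly 51 GiB per OSD and a bit over half a MiB per stored map under these assumptions.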
I believe the steps to reproduce would look like this:
1. Start with a cluster with at least 1 down osd
2. Expand the cluster (the bigger the expansion, the more osdmaps that pile up)
3. Notice that after the expansion completes and the cluster is healthy, the old
osdmaps aren't cleaned up
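
One way to check step 3 is to compare the first and last committed osdmap epochs that the monitors report. The sketch below assumes `ceph report` prints JSON on stdout with `osdmap_first_committed` and `osdmap_last_committed` fields; please verify those field names on your release. The helper names are hypothetical.

```python
import json
import subprocess

def retained_osdmaps(report):
    """Number of osdmap epochs the mons still hold, inclusive of both ends.

    Assumes the report dict carries 'osdmap_first_committed' and
    'osdmap_last_committed' (field names are an assumption to verify).
    """
    first = report["osdmap_first_committed"]
    last = report["osdmap_last_committed"]
    return last - first + 1

def fetch_report():
    """Run `ceph report` and parse its stdout as JSON (needs a live cluster)."""
    out = subprocess.run(
        ["ceph", "report"], check=True, capture_output=True, text=True
    ).stdout
    return json.loads(out)

# Example with made-up numbers rather than a live cluster:
sample = {"osdmap_first_committed": 1000, "osdmap_last_committed": 1500}
print(retained_osdmaps(sample))
```

On a live cluster, `retained_osdmaps(fetch_report())` staying large and growing after the cluster returns to health would match the behavior described in the steps above.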
I would be willing to test the fix on our test cluster after 14.2.5 comes out. Could you
make a build based on that release?