Hi Rich,
I've noticed this a couple of times on Nautilus after doing some large
backfill operations. It seems the osd maps aren't cleared properly after
the cluster returns to HEALTH_OK, and they build up on the mons. Running
"du" on the mon folder, e.g. du -shx /var/lib/ceph/mon/, then shows
several GB of data.
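
For anyone who wants to track this over time rather than run "du" by hand, the check can be approximated in a few lines of Python. This is only a sketch: the function name dir_size_bytes is made up for illustration, and it sums apparent file sizes, so the figure may differ somewhat from what du reports (du counts allocated blocks).

```python
import os

def dir_size_bytes(root: str) -> int:
    """Sum the apparent sizes of all non-directory entries below root."""
    root_dev = os.lstat(root).st_dev
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Roughly mirror "du -x": do not descend into other filesystems.
        dirnames[:] = [
            d for d in dirnames
            if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev
        ]
        for name in filenames:
            # lstat so that symlinks are counted as links, not followed.
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total

# Example call (path taken from the du command above):
#   size_gib = dir_size_bytes("/var/lib/ceph/mon/") / 1024**3
```

Reading the mon directory will usually need root or membership in the ceph group.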
It does, almost 8 GB for <300 OSDs, which increased several-fold over
the last weeks (since we started upgrading Nautilus->Pacific). However,
I didn't think much of it after reading the hardware recommendations in
the docs, which call for at least 60 GB per ceph-mon [1].
I give all my mgrs and mons a restart and after a few minutes I can
see this osd map data getting purged from the mons. After a while it
should be back to a few hundred MB (depending on cluster size).
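
One way to watch the purge happen is to compare the oldest and newest osdmap epochs the mons still hold. The sketch below assumes that the JSON printed by "ceph report" contains the fields osdmap_first_committed and osdmap_last_committed; please verify that on your release. The helper names are hypothetical.

```python
import json
import subprocess

def retained_osdmap_epochs(report_json: str) -> int:
    """Number of osdmap epochs between first and last committed, inclusive."""
    report = json.loads(report_json)
    first = int(report["osdmap_first_committed"])  # assumed field name
    last = int(report["osdmap_last_committed"])    # assumed field name
    return last - first + 1

def fetch_report() -> str:
    """Run 'ceph report' and return its stdout (needs a working ceph CLI)."""
    result = subprocess.run(
        ["ceph", "report"], check=True, capture_output=True, text=True
    )
    return result.stdout

# Example call on a node with admin access to the cluster:
#   print(retained_osdmap_epochs(fetch_report()))
```

If the number shrinks in the minutes after the restart, the mons are trimming old maps again.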
This may not be the problem in your case, but an easy thing to try.
Note that if something is holding your cluster in Warning or Error,
that can also explain the osd maps not clearing. Make sure you get the
cluster back to HEALTH_OK first.
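
A scripted version of that precondition might look like the following. It is a sketch under the assumption that "ceph health --format json" returns an object with a top-level "status" key holding HEALTH_OK, HEALTH_WARN or HEALTH_ERR; the function names are invented for this example.

```python
import json
import subprocess

def health_status(health_json: str) -> str:
    """Extract the overall status string from the health JSON."""
    return json.loads(health_json)["status"]  # assumed key name

def is_health_ok(health_json: str) -> bool:
    """True only when the cluster reports HEALTH_OK."""
    return health_status(health_json) == "HEALTH_OK"

def fetch_health() -> str:
    """Run 'ceph health --format json' and return its stdout."""
    result = subprocess.run(
        ["ceph", "health", "--format", "json"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout

# Example call before restarting mons and mgrs:
#   if not is_health_ok(fetch_health()):
#       print("cluster not HEALTH_OK yet, osd maps may not be cleared")
```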
Thanks for the suggestion, will try that once we reach HEALTH_OK.
Best regards,
Jan-Philipp
[1]: https://docs.ceph.com/en/latest/start/hardware-recommendations/#minimum-har…