Thanks for taking care of this. I read "works as designed; be sure to have disk
space for the mon available". Well, yeah ;)
It sounds a little odd that the growth from 50 MB to ~15 GB plus compaction
space happens within a couple of seconds when two OSDs rejoin the cluster.
I’m suspicious: even on an Optane (which would be unusual for this application),
writing 15-20 GB (you mentioned both figures) of mon DB couldn’t plausibly happen
within a couple of seconds. Is it that `ceph -s` et al. only start complaining when
the DB reaches that threshold, and that before the warning it’s actually already
using something like 14 GB?
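One way to settle that question is to sample the store size periodically and see whether the growth is really sudden or only the warning is. A minimal sketch; `sample_mon_store` is a hypothetical helper name, and the mon store path you pass in is the usual default layout, so adjust it for your deployment:

```shell
# Print a timestamped size sample of a mon store.db directory, a few
# times in a row, so sudden vs. gradual growth is visible.
# Arguments: <store.db path> [samples, default 12] [interval secs, default 5]
sample_mon_store() {
    store="$1"
    samples="${2:-12}"
    interval="${3:-5}"
    i=0
    while [ "$i" -lt "$samples" ]; do
        printf '%s  ' "$(date '+%H:%M:%S')"
        du -sh "$store" 2>/dev/null || echo "not found: $store"
        sleep "$interval"
        i=$((i + 1))
    done
}

# Example (path is an assumption, the common default mon data layout):
# sample_mon_store "/var/lib/ceph/mon/ceph-$(hostname -s)/store.db" 12 5
```

Run it while the two OSDs rejoin; if the store was already at ~14 GB beforehand, the "sudden" growth is just the warning threshold being crossed.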
One thing, usually minor: dig into the mon DB directory. Are there extraneous files that
somehow are being dumped there? Next,
`cd store.db ; ls -l LOG*`
If there’s a huge LOG.old file, that can cause problems and can be deleted. If the LOG
file is huge, something else is going on.
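That check can be automated across the store directory. A sketch; `check_mon_logs` is a hypothetical helper, and the ~100 MB threshold is an arbitrary illustration, not a Ceph-defined limit:

```shell
# Flag RocksDB LOG* files in a mon store.db that exceed a size threshold.
# Arguments: <store.db path> [threshold in KB, default 102400 (~100 MB)]
check_mon_logs() {
    store_db="$1"
    threshold_kb="${2:-102400}"
    for f in "$store_db"/LOG*; do
        [ -e "$f" ] || continue
        kb=$(du -k "$f" | cut -f1)
        if [ "$kb" -gt "$threshold_kb" ]; then
            echo "LARGE: $f (${kb} KB)"
        else
            echo "ok:    $f (${kb} KB)"
        fi
    done
}
```

A huge LOG.old flagged this way can simply be deleted; a huge current LOG points at something else, as noted above.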
If I get new disks (partitions) for the mons, is there a size recommendation?
Is there a rule of thumb? BTW: do I still need a filesystem on the partition
for the mon DB?
How small is your drive, that you run out of space with just 20 GB of mon DB? Above you
hint that you have a separate filesystem for /var/lib/ceph; that could be your problem.
Did something dump a bunch of data in /var/tmp?
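A quick way to see what is actually eating the filesystem. `du_top` is a hypothetical helper; the paths in the example are just the usual suspects mentioned above:

```shell
# List the ten largest directory subtrees under a given path, staying on
# one filesystem (-x) and limiting recursion depth, largest first.
# Arguments: <directory> [depth, default 2]
du_top() {
    dir="$1"
    depth="${2:-2}"
    du -x -k -d "$depth" "$dir" 2>/dev/null | sort -rn | head -n 10
}

# Example:
# du_top /var/lib/ceph
# du_top /var/tmp
```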
I tend to run mons with / using the whole of the drive (modulo /boot). So long as you
have proper rotation of /var/log/* configured in /etc/logrotate*, with decent size drives
you shouldn’t run out.
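Ceph packages normally install a logrotate policy of their own; for reference, an illustrative sketch of what such a policy looks like (not the packaged file verbatim):

```
# /etc/logrotate.d/ceph -- illustrative sketch, not the shipped file
/var/log/ceph/*.log {
    rotate 7
    daily
    compress
    missingok
    notifempty
    sharedscripts
}
```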
A pair of ~240 GB SSDs usually suffices for mons; 400-480 GB drives are better yet.