Hi,
I've done a bit more testing ...
On 05.03.2020, Hartwig Hauschild wrote:
Hi,
I'm (still) testing an upgrade from Luminous to Nautilus and ran into the
following situation:
The lab setup I'm testing in has three OSD hosts.
If one of those hosts dies, the store.db in /var/lib/ceph/mon/ on all my
mon nodes starts to grow rapidly until either the OSD host comes back up
or the disks are full.
This also happens when I take a single OSD offline: /var/lib/ceph/mon/
grew from around 100MB to ~2GB in about 5 minutes, at which point I
aborted the test.
Since we've had an OSD host fail over a weekend, I know the growth won't
stop until the disk is full; that usually happens within around 20 minutes,
with the store taking up 17GB of disk space by then.
On another cluster that's still on Luminous I
don't see any growth at all.
I retested that cluster as well; watching the on-disk size of
/var/lib/ceph/mon/ suggests there are writes and deletes/compactions going
on, as it kept floating within +/- 5% of the original size.
Is that a difference in behaviour between Luminous and Nautilus, or is it
caused by the lab setup only having three hosts, so that losing one host
leaves all PGs degraded at the same time?
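One thing I've started looking at (field names from memory, so they may be
slightly off for your release) is whether the mons are holding on to a long
range of old osdmaps while the PGs are degraded:

  # first/last osdmap epoch the mons still keep around
  ceph report 2>/dev/null | jq '.osdmap_first_committed, .osdmap_last_committed'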
I've read somewhere in the docs that I should provide ample space (tens of
GB) for the store.db, and found on the ML and bug tracker that ~100GB might
not be a bad idea and that large clusters may need an order of magnitude
more.
Is there some sort of formula I can use to approximate the space required?
Also: is the db supposed to grow this fast in Nautilus when it did not do
that in Luminous? Is that behaviour configurable somewhere?
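For reference, the only knobs and commands I've found so far that look
related (please correct me if these are the wrong ones) are manual/startup
compaction and the size warning threshold:

  # trigger a manual compaction of one mon's store
  ceph tell mon.<id> compact

  # compact the store every time the mon starts
  ceph config set mon mon_compact_on_start true

  # size threshold for the mon-store health warning (15GB by default,
  # if I read the docs right)
  ceph config show mon.<id> mon_data_size_warn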
--
Cheers,
Hardy