On Wed, Sep 23, 2020 at 3:03 AM Ivan Kurnosov <zerkms(a)zerkms.com> wrote:
> Hi,
> this morning I woke up to a degraded test ceph cluster (managed by rook,
> but it does not really change anything for the question I'm about to ask).
> After checking logs I have found that bluestore on one of the OSDs ran out
> of space.
I think this is a consequence, and the real error is something else
that happened before.
The problem is that, if the cluster is unhealthy, the MON storage
accumulates a lot of osdmaps and pgmaps, and is not cleaned up
automatically, because the MONs think that these old versions might be
needed. The OSDs also get a copy of these osdmaps and pgmaps, if I
understand correctly, which is why small OSDs fill up quickly if
the cluster stays unhealthy for a few hours.
> So, my question would be: how could I have prevented that? According to
> the monitoring I have (prometheus), the OSDs are healthy and have plenty
> of space, yet they are not.
> What command (and prometheus metric) would help me understand the actual
> real bluestore use? Or am I missing something?
You can fix monitoring by setting the "mon data size warn" to
something like 1 GB or even less.
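
As a minimal sketch of that setting: the snippet below assumes a release
with the centralized config store (so that "ceph config set" is available)
and that the option is spelled mon_data_size_warn and takes a byte count.
The 1 GiB value is only the example figure from above, and the variable
names are made up for illustration. The command is printed rather than
executed so it can be reviewed first.

```shell
# 1 GiB in bytes; the stock threshold is much higher (15 GiB)
WARN_BYTES=$((1024 * 1024 * 1024))

# Assumed option name and syntax, see the note above
SET_CMD="ceph config set mon mon_data_size_warn ${WARN_BYTES}"

# Print the command for review; run it from a node with an admin keyring
echo "${SET_CMD}"
```

With the threshold lowered, a growing MON store should show up as a
HEALTH_WARN long before small OSDs start filling up.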
> Oh, and I "fixed" the cluster by expanding the broken osd.0 with a larger
> 15GB volume. And 2 other OSDs still run on 10GB volumes.
Sometimes this doesn't help. For data recovery purposes, the most
helpful step if you get the "bluefs enospc" error is to add a separate
db device, like this:
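    # Stop the full OSD and keep it from restarting
    systemctl disable --now ceph-osd@${OSDID}
    # Create a sparse 32G file to serve as the new DB device
    mkdir -p /junk/osd.${OSDID}-recover
    truncate -s 32G /junk/osd.${OSDID}-recover/block.db
    # Write a GPT with one partition spanning the file
    sgdisk -n 0:0:0 /junk/osd.${OSDID}-recover/block.db
    # Attach the file to the OSD as a separate BlueFS DB device
    ceph-bluestore-tool \
        bluefs-bdev-new-db --path /var/lib/ceph/osd/ceph-${OSDID} \
        --dev-target /junk/osd.${OSDID}-recover/block.db \
        --bluestore-block-db-size=31G --bluefs-log-compact-min-size=31G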
Of course you can use a real block device instead of just a file.
After that, export all PGs using ceph-objectstore-tool and re-import
into a fresh OSD, then destroy or purge the full one.
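
The export and re-import step could look roughly like the sketch below.
It is untested against a real cluster and assumes that both the full OSD
and the fresh one are stopped while the tool runs. The function names,
the COT variable and the /junk/pg-export directory are introduced here
for illustration only; NEWID stands for the id of the fresh OSD.

```shell
# Tool to call; overridable so the functions can be dry-run with a stub
COT=${COT:-ceph-objectstore-tool}

# Export every PG found on the OSD at $1 into one file per PG under $2
export_all_pgs() {
    exp_src=$1
    exp_dump=$2
    mkdir -p "$exp_dump" || return 1
    for exp_pg in $("$COT" --data-path "$exp_src" --op list-pgs); do
        "$COT" --data-path "$exp_src" --op export \
            --pgid "$exp_pg" --file "$exp_dump/$exp_pg.export" || return 1
    done
}

# Import every previously exported PG file under $2 into the OSD at $1
import_all_pgs() {
    imp_dst=$1
    imp_dump=$2
    for imp_file in "$imp_dump"/*.export; do
        [ -e "$imp_file" ] || continue   # nothing was exported
        "$COT" --data-path "$imp_dst" --op import \
            --file "$imp_file" || return 1
    done
}
```

Usage would be something like
"export_all_pgs /var/lib/ceph/osd/ceph-${OSDID} /junk/pg-export" followed by
"import_all_pgs /var/lib/ceph/osd/ceph-${NEWID} /junk/pg-export". Both
functions stop at the first failing PG, so a partial run is easy to notice.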
Here is why these options are needed:
--bluestore-block-db-size=31G: ceph-bluestore-tool refuses to do
anything unless this option is set to some value.
--bluefs-log-compact-min-size=31G: makes absolutely sure that log
compaction doesn't happen, because it would hit "bluefs enospc" again.
--
Alexander E. Patrakov
CV:
http://pc.cd/PLz7