Sorry for following up on myself (again), but I had left out an
important detail:
Simon Leinen writes:
Using the "stupid" allocator, we never had any crashes with this
assert. But the OSDs run more slowly this way.
So what we ended up doing was: When an OSD crashed with this assert, we
did an offline compaction of the DB, and then started it again with the
bitmap allocator. So far the resulting OSDs seem to run fine.
For the offline compaction, we used the "stupid" allocator, i.e.
    sudo env CEPH_ARGS="--bluestore-allocator stupid" \
        ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-$OSD compact
With the default "bitmap" allocator, the compaction job would fail with
the same ceph_assert().
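For anyone who wants to script this, here is a rough sketch of the whole sequence for one OSD (stop, compact with the "stupid" allocator, start again). It assumes systemd-managed, non-containerized OSDs with the usual ceph-osd@<id> unit name, which may not match your deployment. The compact_osd function and the DRY_RUN variable are made up for this example; by default it only prints the commands (the quotes around the CEPH_ARGS value are lost in that printout).

```shell
#!/bin/sh
# Sketch only: offline-compact one crashed OSD with the "stupid" allocator.
# Assumptions (not from the original post): systemd-managed OSDs with the
# unit name ceph-osd@<id>; compact_osd and DRY_RUN are hypothetical helpers.
set -eu

DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

compact_osd() {
    osd="$1"

    # The OSD must not be running during an offline compaction.
    run sudo systemctl stop "ceph-osd@$osd"

    # Same compaction command as above, forcing the "stupid" allocator.
    run sudo env CEPH_ARGS="--bluestore-allocator stupid" \
        ceph-kvstore-tool bluestore-kv "/var/lib/ceph/osd/ceph-$osd" compact

    # Start the OSD again; it then uses its configured allocator.
    run sudo systemctl start "ceph-osd@$osd"
}

compact_osd "${OSD:-0}"
```

Run it as e.g. "OSD=12 sh compact-osd.sh" to check the printed commands, then add DRY_RUN=0 once they look right for your setup.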
--
Simon.