Hello Ingo.
Did the problem actually go away after you upgraded everything to Nautilus?
I’m seeing the same issue in a Luminous cluster where a Nautilus node was introduced (with
the intent of upgrading the whole cluster to Nautilus).
When the problem happened we had:
Mons, Mgr - Nautilus
OSDs, rgw - Most on Luminous, 1 on Nautilus
Afterwards the Nautilus RGW was disabled, but we left the Nautilus OSDs in place, and the
problem has never happened again.
There’s also this issue, which seems related but implies that it can also happen
on an all-Nautilus cluster:
https://tracker.ceph.com/issues/47451
Best regards,
André