Hi Hervé,
On 01.06.21 14:00, Hervé Ballans wrote:
> I'm aware of your points, and maybe I was not really clear in my
> previous email (written in a hurry!)
> The problematic pool is the metadata one. All its OSDs (x3) are full.
> The associated data pool is OK and no OSD is full on the data pool.
Are you saying that you only have 3 OSDs for your metadata pool, which
are the full ones? Alright, then you can - at least for this specific
issue - disregard my previous comment.
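
For anyone reading along, this is roughly how one could double-check
that mapping; the pool name `cephfs_metadata` is just a placeholder,
substitute your own:

    # Which CRUSH rule the metadata pool uses (placeholder pool name)
    ceph osd pool get cephfs_metadata crush_rule
    # PGs of the pool and the OSDs they map to
    ceph pg ls-by-pool cephfs_metadata
    # Per-OSD utilisation, to spot the full ones
    ceph osd df tree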
> The problem is that the metadata pool suddenly and continuously
> increased from 3% to 100% in 5 hours (from 5 am to 10 am, then crash)
724 GiB stored in the metadata pool for only 11 TiB of cephfs data
does seem huge at first glance. For reference, I have about 160 TiB of
cephfs data with only 31 GiB stored in the metadata pool.
I don't have an explanation for this behaviour, as I am relatively new
to Ceph. Maybe the list can chime in?
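
If it were my cluster, I would probably start by checking whether stray
files or a backed-up purge queue are inflating the pool; a rough sketch,
assuming the active MDS daemon is called mds.a (a placeholder) and that
you run the daemon commands on its host:

    # Overall filesystem and per-rank state, incl. metadata pool usage
    ceph fs status
    # Cache counters; a large num_strays can point at unreclaimed deletes
    ceph daemon mds.a perf dump mds_cache
    # Purge queue counters; a growing backlog keeps metadata around
    ceph daemon mds.a perf dump purge_queue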
> And we don't understand the reason, since there were no specific
> activities on the data pool.
> This cluster has been running perfectly with the current configuration
> for many years.
Probably unrelated to your issues: I noticed that the STORED and USED
columns in your `ceph df` output are identical. Is that because of
Nautilus (I
myself am running Octopus, where USED is the expected multiple of STORED
depending on replication factor / EC configuration in the pool) or are
you running a specific configuration that might cause that?
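
With 3x replication I would expect USED to be roughly three times
STORED. You could verify the pool's replication factor like this (pool
name again a placeholder):

    # Replication factor (size) of the metadata pool
    ceph osd pool get cephfs_metadata size
    # Per-pool breakdown including STORED and USED
    ceph df detail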
Cheers
Sebastian