Hello Simon,
On Wed, Feb 24, 2021 at 7:43 AM Simon Oosthoek <s.oosthoek(a)science.ru.nl> wrote:
On 24/02/2021 12:40, Simon Oosthoek wrote:
Hi
we've been running our Ceph cluster (Nautilus) for nearly 2 years now,
and recently, due to a temporary situation, the cluster is at 80% full.
We are only using CephFS on the cluster.
I realize that normally we should be adding OSD nodes, but this is a
temporary situation, and I expect the cluster to drop below 60% full quite soon.
Anyway, we are noticing some really problematic slowdowns. There are
some things that could be related, but we are unsure...
- Our 2 MDS nodes (1 active, 1 standby) are configured with 128GB RAM,
but are not using more than 2GB; this looks either very inefficient or
wrong ;-)
After looking at our monitoring history, it seems the MDS cache is
actually used more fully, but most of our servers get a weekly
reboot by default, which obviously clears the MDS cache. I wonder if
that's a smart idea for an MDS node...? ;-)
No, it's not. Can you also check that you do not have mds_cache_size
configured, perhaps in the MDS's local ceph.conf?
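For reference, one way to check this (a sketch; replace mds.<name> with your actual MDS daemon name, and adjust the ceph.conf path if yours differs) is to query both the cluster config and the daemon's running config:

```shell
# Check the cluster-wide MDS cache memory limit (Nautilus default is 4GiB)
ceph config get mds mds_cache_memory_limit

# Ask the running MDS daemon what it actually has in effect
# (via the admin socket on the MDS host; mds.<name> is a placeholder)
ceph daemon mds.<name> config get mds_cache_memory_limit
ceph daemon mds.<name> config get mds_cache_size

# Look for any cache settings left over in the local ceph.conf
grep -i mds_cache /etc/ceph/ceph.conf
```

If mds_cache_size (the old inode-count-based limit) shows a non-zero value from the local ceph.conf, it can cap the cache well below what 128GB of RAM would allow.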