Hi
we've been running our Ceph cluster (Nautilus) for nearly 2 years now,
and recently, due to a temporary situation, the cluster is at 80% full.
We are only using CephFS on the cluster.
I realize we should normally be adding OSD nodes, but this is a
temporary situation, and I expect the cluster to drop below 60% full quite soon.
Anyway, we are noticing some really problematic slowdowns. There are
some things that could be related, but we are unsure:
- Our 2 MDS nodes (1 active, 1 standby) are configured with 128 GB of RAM,
but the MDS is not using more than 2 GB of it, which looks either very
inefficient, or wrong ;-)
"ceph config dump | grep mds":
  mds   basic      mds_cache_memory_limit          107374182400
  mds   advanced   mds_max_scrub_ops_in_progress   10
Perhaps we need more or different settings before the MDS will properly
use that memory?
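(For what it's worth, I did sanity-check the configured limit; it is
exactly 100 GiB, so the setting itself looks sane and the cache simply
isn't being filled:)

```shell
# mds_cache_memory_limit from "ceph config dump", converted to GiB
echo $((107374182400 / 1024 / 1024 / 1024))   # prints 100
```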
- On all our OSD nodes, the memory line is red in "atop". No swap is in
use, but the memory on the OSD nodes seems to be taking quite a beating.
Is this normal, or can we tweak settings to reduce the pressure?
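(On the OSD side, this is what I was planning to look at, as far as I
understand the docs; "osd.0" is just an example id, and I'd be happy to
be corrected:)

```shell
# Per-OSD memory target the daemons try to stay under
# (Nautilus default is 4 GiB)
ceph config get osd osd_memory_target

# Break down where one OSD's memory actually goes
# (run on the host where osd.0 lives)
ceph daemon osd.0 dump_mempools
```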
This is the first time we are having performance issues like this, and
I'd like to learn some commands to help me analyse what is going on.
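So far I have only looked at the basics (please tell me if there are
better ones):

```shell
ceph -s              # overall health, plus any slow-ops warnings
ceph health detail   # details behind any HEALTH_WARN
ceph osd df tree     # per-OSD utilisation; nearfull OSDs slow things down
ceph osd perf        # per-OSD commit/apply latency
ceph fs status       # MDS state and CephFS client count
```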
I hope this will ring a bell with someone...
Cheers
/Simon