[ceph-users] Re: ceph slow at 80% full, mds nodes lots of unused memory

25 Feb 2021

On 24/02/2021 22:28, Patrick Donnelly wrote:
...
  Hello Simon,

 On Wed, Feb 24, 2021 at 7:43 AM Simon Oosthoek &lt;s.oosthoek(a)science.ru.nl&gt; wrote:

 On 24/02/2021 12:40, Simon Oosthoek wrote:
  Hi

 we've been running our Ceph cluster for nearly 2 years now (Nautilus)
 and recently, due to a temporary situation the cluster is at 80% full.

 We are only using CephFS on the cluster.

 Normally, I realize we should be adding OSD nodes, but this is a
 temporary situation, and I expect the cluster to go to <60% full quite soon.

 Anyway, we are noticing some really problematic slowdowns. There are
 some things that could be related but we are unsure...

 - Our 2 MDS nodes (1 active, 1 standby) are configured with 128GB RAM,
 but are not using more than 2GB, this looks either very inefficient, or
 wrong ;-) 
 After looking at our monitoring history, it seems the mds cache is
 actually used more fully, but most of our servers are getting a weekly
 reboot by default. This clears the mds cache obviously. I wonder if
 that's a smart idea for an MDS node...? ;-)  
 No, it's not. Can you also check that you do not have mds_cache_size
 configured, perhaps on the MDS local ceph.conf?

Hi Patrick,

I've already changed the reboot period to 1 month.

The mds_cache_size is not configured locally in the /etc/ceph/ceph.conf
file, so I guess it's just the weekly reboot that cleared the memory of
cache data...

I'm starting to think that a full ceph cluster could probably be the
only cause of performance problems. Though I don't know why that would be.

Cheers

/Simon

2024

2023

2022

2021

2020

2019

[ceph-users] Re: ceph slow at 80% full, mds nodes lots of unused memory