Ok, thanks, I will try to update Nautilus. But I really don't
understand the problem; warnings like these appear apparently at random:
[WRN] Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)
cluster [INF] Health check cleared: FS_DEGRADED (was: 1 filesystem is degraded)
: cluster [DBG] mds.? [v2:10.100.190.39:6800/2624951349,v1:10.100.190.39:6801/2624951349] up:rejoin
2021-05-26 10:55:33.215102 mon.ceph2mon01 (mon.0) 700 : cluster [DBG] fsmap nxtclfs:2/2 {0=ceph2mon03=up:rejoin,1=ceph2mon01=up:active} 1 up:standby
The filesystem ends up degraded, and I have assumed that the problem is
due to the memory consumption of the MDS process, which can reach around
80% or more of the total memory.
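
For what it's worth, I have been watching the MDS memory with something
like this (ceph2mon01 is one of my MDS daemons, as in the fsmap above;
run on the MDS host):

# cache usage compared to the configured mds_cache_memory_limit
ceph daemon mds.ceph2mon01 cache status
# memory counters (including rss) of the MDS process itself
ceph daemon mds.ceph2mon01 perf dump mds_mem
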
On 26/5/21 at 13:21, Dan van der Ster wrote:
I've seen your other thread. Using 78GB of RAM when the memory limit
is set to 64GB is not highly unusual, and doesn't necessarily indicate
any problem.
It *would* be a problem if the MDS memory grows uncontrollably, however.
Otherwise, check those new defaults for caps recall -- they were
released around 14.2.19 IIRC.
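
For example, to compare what a running MDS is using against what's in
the config database (the daemon name is just an example):

# value stored centrally for the mds section
ceph config get mds mds_recall_max_caps
# value in effect on a running daemon
ceph daemon mds.ceph2mon01 config get mds_recall_max_caps
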
-- Dan
On Wed, May 26, 2021 at 12:46 PM Andres Rojas Guerrero <a.rojas(a)csic.es> wrote:
>
> Thanks for the answer. Yes, during these last weeks I have had memory
> consumption problems in the MDS nodes that led, at least it seemed to
> me, to performance problems in CephFS. I have been varying, for example:
>
> mds_cache_memory_limit
> mds_min_caps_per_client
> mds_health_cache_threshold
> mds_max_caps_per_client
> mds_cache_reservation
>
> But without much knowledge, and with a trial-and-error procedure, i.e.
> observing how CephFS behaved when changing one of the parameters.
> Although I have achieved some improvement, the procedure does not
> convince me at all, and that's why I was asking if there was something
> more reliable ...
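>
> Each test was basically: change one value at runtime, then watch how
> the filesystem reacts, something like:
>
> # change a single parameter for all MDS daemons
> ceph config set mds mds_cache_memory_limit 34359738368  # 32 GiB
> # and then observe the effect
> ceph health detail
> ceph fs status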
>
>
>
>
> On 26/5/21 at 12:15, Dan van der Ster wrote:
>> Hi,
>>
>> The mds_cache_memory_limit should be set to something relative to the
>> RAM size of the MDS -- maybe 50% is a good rule of thumb, because
>> there are a few cases where the RSS can exceed this limit. Your
>> experience will help guide what size you need (metadata pool IO
>> activity will be really high if the MDS cache is too small).
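>>
>> For example, on an MDS host with 128GB of RAM that rule of thumb would
>> give a limit of around 64GB:
>>
>> # ~50% of a 128GB host: 64 GiB, expressed in bytes
>> ceph config set mds mds_cache_memory_limit 68719476736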
>>
>> Otherwise, in recent releases of N/O/P the defaults for those settings
>> you mentioned are quite good [1]; I would be surprised if they need
>> further tuning for 99% of users.
>> Is there any reason you want to start adjusting these params?
>>
>> Best Regards,
>>
>> Dan
>>
>> [1] https://github.com/ceph/ceph/pull/38574
>>
>> On Wed, May 26, 2021 at 11:58 AM Andres Rojas Guerrero <a.rojas(a)csic.es> wrote:
>>>
>>> Hi all, I have observed that the MDS Cache Configuration has 18 parameters:
>>>
>>> mds_cache_memory_limit
>>> mds_cache_reservation
>>> mds_health_cache_threshold
>>> mds_cache_trim_threshold
>>> mds_cache_trim_decay_rate
>>> mds_recall_max_caps
>>> mds_recall_max_decay_threshold
>>> mds_recall_max_decay_rate
>>> mds_recall_global_max_decay_threshold
>>> mds_recall_warning_threshold
>>> mds_recall_warning_decay_rate
>>> mds_session_cap_acquisition_throttle
>>> mds_session_cap_acquisition_decay_rate
>>> mds_session_max_caps_throttle_ratio
>>> mds_cap_acquisition_throttle_retry_request_timeout
>>> mds_session_cache_liveness_magnitude
>>> mds_session_cache_liveness_decay_rate
>>> mds_max_caps_per_client
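>>>
>>> For reference, the values currently in effect can be dumped from a
>>> running MDS (the daemon name here is just an example):
>>>
>>> ceph daemon mds.ceph2mon01 config show | \
>>>   grep -E 'mds_(cache|health_cache|recall|session|cap_acquisition|max_caps)'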
>>>
>>> I find the Ceph documentation in this section a bit cryptic and I have
>>> tried to find some resources that talk about how to tune these
>>> parameters, but without success.
>>>
>>> Does anyone have experience adjusting these parameters according to
>>> the characteristics of the Ceph cluster itself, the hardware, and the
>>> MDS workload?
>>>
>>> Regards!
>
--
*******************************************************
Andrés Rojas Guerrero
Unidad Sistemas Linux
Area Arquitectura Tecnológica
Secretaría General Adjunta de Informática
Consejo Superior de Investigaciones Científicas (CSIC)
Pinar 19
28006 - Madrid
Tel: +34 915680059 -- Ext. 990059
email: a.rojas(a)csic.es
ID comunicate.csic.es: @50852720l:matrix.csic.es
*******************************************************