I am not sure whether making caps recall more aggressive will help. It seems to be the client that is failing to respond to the recall (at least that's what the warnings say).

But I will try the new settings you suggested as soon as I get the chance and will report back with the results.
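
In case it helps, this is roughly how I have been checking the warnings and the per-client cap counts; just a quick sketch, with mds.X standing in for the actual active MDS name:

ceph health detail            # health warnings (the "failing to respond" message shows up here)
ceph daemon mds.X session ls  # per-session info, including num_caps for each client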

On 25 Jul 2019 11:00 pm, Patrick Donnelly <pdonnell@redhat.com> wrote:

On Thu, Jul 25, 2019 at 12:49 PM Janek Bevendorff
<janek.bevendorff@uni-weimar.de> wrote:
>
>
> > Based on that message, it would appear you still have an inode limit
> > in place ("mds_cache_size"). Please unset that config option. Your
> > mds_cache_memory_limit is apparently ~19GB.
>
> No, I do not have an inode limit set. Only the memory limit.
>
>
> > There is another limit, mds_max_caps_per_client (default 1M), which the
> > client is hitting. That's why the MDS is recalling caps from the
> > client, not because any cache memory limit has been hit. It is not
> > recommended that you increase this.
> Okay, this setting isn't documented either and I did not change it,
> but it's also quite clear that it isn't working. My MDS hasn't crashed
> yet (without the recall settings it would have), but ceph fs status is
> reporting 14M inodes at this point and the number is slowly going up.

Can you share two captures of `ceph daemon mds.X perf dump`, taken about
one second apart?
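For example, something along these lines on the host where the active MDS
runs (substituting the real daemon name for mds.X):

ceph daemon mds.X perf dump > perf_dump_1.json
sleep 1
ceph daemon mds.X perf dump > perf_dump_2.json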

You can also try increasing the aggressiveness of the MDS recall, but
I'm surprised it's still a problem with the settings I gave you:

ceph config set mds mds_recall_max_caps 15000
ceph config set mds mds_recall_max_decay_rate 0.75
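
If you want to double-check that the running daemon picked the new values up,
something like this on the MDS host (again with the real daemon name in place
of mds.X) should show them:

ceph daemon mds.X config get mds_recall_max_caps
ceph daemon mds.X config get mds_recall_max_decay_rate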

--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D