Thanks. I tried playing around a bit with
mds_export_ephemeral_distributed just now, because it's pretty much the
same thing that your script does manually. Unfortunately, it seems to
have no effect.
I pinned all top-level directories to mds.0 and then enabled
ceph.dir.pin.distributed for a few subtrees. Despite
mds_export_ephemeral_distributed being set to true, all the work is
still done by mds.0, and I don't see any additional pins in ceph tell
mds.* get subtrees either.
Any ideas why that might be?
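For reference, this is roughly what I did (/mnt/cephfs stands in for
our actual mount point, and "somedir" for a real directory):

  # pin every top-level directory to rank 0
  for dir in /mnt/cephfs/*/; do setfattr -n ceph.dir.pin -v 0 "$dir"; done

  # enable distributed ephemeral pinning on selected subtrees
  setfattr -n ceph.dir.pin.distributed -v 1 /mnt/cephfs/somedir

  # MDS-side switch for the feature
  ceph config set mds mds_export_ephemeral_distributed true

  # inspect the resulting subtree/pin layout
  ceph tell mds.* get subtrees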
On 07/12/2020 10:49, Dan van der Ster wrote:
> On Mon, Dec 7, 2020 at 10:39 AM Janek Bevendorff
> <janek.bevendorff@uni-weimar.de> wrote:
>>
>>> What exactly do you set to 64k?
>>> We used to set mds_max_caps_per_client to 50000, but once we started
>>> using the tuned caps recall config, we reverted that back to the
>>> default 1M without issue.
>> mds_max_caps_per_client. As I mentioned, some clients hit this limit
>> regularly and they aren't entirely idle. I will keep tuning the recall
>> settings, though.
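>> For reference, the change itself is a one-liner (65536 being the 64k
>> I mentioned; take the value as what we settled on, not a general
>> recommendation):
>>
>>   ceph config set mds mds_max_caps_per_client 65536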
>>
>>> This 15k caps client I mentioned is not related to the max caps per
>>> client config. In recent nautilus, the MDS will proactively recall
>>> caps from idle clients -- so a client with even just a few caps like
>>> this can provoke the caps recall warnings (if it is buggy, like in
>>> this case). The client doesn't cause any real problems, just the
>>> annoying warnings.
>> We only see these warnings under real load, not from idle clients. I
>> remember having massive issues with early Nautilus releases, but
>> thanks to the more aggressive recall behaviour in newer releases,
>> that is fixed. Back then it was virtually impossible to keep the MDS
>> within the bounds of its memory limit. Nowadays, the warnings only
>> appear when the MDS is really stressed, and in that situation overall
>> FS performance is already massively degraded and the MDSs are likely
>> to fail and run into the rejoin loop.
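>> The recall knobs I am tuning are along these lines (values are
>> illustrative only, I am still experimenting):
>>
>>   ceph config set mds mds_recall_max_caps 30000
>>   ceph config set mds mds_recall_max_decay_rate 1.5
>>   ceph config set mds mds_recall_global_max_decay_threshold 131072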
>>
>>> Multi-active + pinning definitely increases the overall metadata
>>> throughput
>>> (once you can get the relevant inodes cached), because as you know the
>>> MDS is single threaded and CPU bound at the limit.
>>> We could get something like 4-5k handle_client_requests out of a
>>> single MDS, and that really does scale horizontally as you add MDSs
>>> (and pin).
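>>> (For context: going multi-active is just a matter of raising
>>> max_mds, e.g. "ceph fs set cephfs max_mds 4", with "cephfs" standing
>>> in for the filesystem name.)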
>> Okay, I will definitely re-evaluate options for pinning individual
>> directories; perhaps a small script can do it.
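>> A minimal, untested sketch of what such a script could look like
>> (mount point and rank count are placeholders):
>>
>>   #!/bin/bash
>>   # Distribute top-level directories round-robin across MDS ranks 0..N.
>>   N=1  # highest active MDS rank, i.e. max_mds - 1
>>   rank=0
>>   for dir in /mnt/cephfs/*/; do
>>       setfattr -n ceph.dir.pin -v "$rank" "$dir"
>>       rank=$(( (rank + 1) % (N + 1) ))
>>   done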
> There is a new ephemeral pinning option in the latest releases, but we
> didn't try it yet.
> Here's our script -- it assumes the parent dir is pinned to rank zero
> or that the balancer is disabled:
>
>
> https://github.com/cernceph/ceph-scripts/blob/master/tools/cephfs/cephfs-ba…
>
> Too many pins can cause problems -- though we have something like 700
> pins at the moment and that's fine.
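> If you want to count your pins, something like this should work
> (assuming jq, and that your version's subtree dump has the
> "export_pin" field like ours does):
>
>   ceph tell mds.0 get subtrees | jq '[.[] | select(.export_pin >= 0)] | length'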
>
> Cheers, Dan