Hello Janek,
On Mon, Jul 22, 2019 at 6:02 AM Janek Bevendorff
<janek.bevendorff(a)uni-weimar.de> wrote:
> Hi,
>
> I am trying to copy the contents of our storage server into a CephFS,
> but am experiencing stability issues with my MDSs. The CephFS sits on
> top of an erasure-coded pool with 5 MONs, 5 MDSs and a max_mds setting
> of two. My Ceph cluster version is Nautilus, the client is Mimic and
> uses the kernel module to mount the FS.
>
> The index of filenames to copy is about 23GB and I am using 16 parallel
> rsync processes over a 10G link to copy the files over to Ceph. This
> works perfectly for a while, but then the MDSs start reporting oversized
> caches (between 20 and 50GB, sometimes more)
What did you have the MDS cache size set to at the time?
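For reference, you can check and adjust the limit with the config commands below (the 8 GiB value is just an illustrative example, not a recommendation for your hardware):

```shell
# Show the MDS cache memory limit currently in effect
# (the Nautilus default is 1 GiB):
ceph config get mds mds_cache_memory_limit

# Raise it, e.g. to 8 GiB, if the MDS hosts have RAM to spare:
ceph config set mds mds_cache_memory_limit 8589934592
```

Note the limit is a target, not a hard cap; the MDS will report a health warning when it overshoots it significantly, which matches what you're seeing.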
> and an inode count between 1 and 4 million. Particularly the inode
> count seems quite high to me.
>
> Each rsync job has 25k files to work with, so if all 16 processes open
> all their files at the same time, I should not exceed 400k. Even if I
> double this number to account for the client's page cache, I should get
> nowhere near that number of inodes (a sync flush takes about 1 second).
> Then after a few hours, my MDSs start failing with messages like this:
>
>    -21> 2019-07-22 14:00:05.877 7f67eacec700  1 heartbeat_map
> is_healthy 'MDSRank' had timed out after 15
>    -20> 2019-07-22 14:00:05.877 7f67eacec700  0 mds.beacon.XXX Skipping
> beacon heartbeat to monitors (last acked 24.0042s ago); MDS internal
> heartbeat is not healthy!
This is probably related to using multiple active metadata servers.
There have been stability issues we're still looking to work out with
the MDS balancer, especially with these batch-create style workloads.
However, if you're willing to use subtree pinning [1] where you
statically assign each directory tree before each rsync job uploads
its data, then that should be safe as the balancer will effectively be
disabled.
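Pinning is done with an extended attribute on each directory; something like the following, where the mount point and directory names are just examples matching a 16-job layout:

```shell
# Pin each rsync job's top-level directory to an MDS rank.
# With max_mds = 2 the valid ranks are 0 and 1, so split the
# 16 job directories between them (names here are hypothetical):
setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/job-00
setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/job-01
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/job-08
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/job-09
```

The pin is inherited by the whole subtree beneath each directory, so one setfattr per job directory is enough.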
Alternatively, using one active MDS for the duration of the batch
upload should work too but may be significantly slower.
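Dropping to a single active MDS is a one-liner; on Nautilus the extra rank is stopped automatically when you lower max_mds (substitute your actual filesystem name for the placeholder):

```shell
# Reduce to one active MDS for the duration of the bulk load:
ceph fs set <fs_name> max_mds 1

# Restore two active ranks once the copy is finished:
ceph fs set <fs_name> max_mds 2
```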
> The standby nodes try to take over, but take forever to become active
> and will fail as well eventually.
Same error? Missed heartbeats?
[1]
https://ceph.com/community/new-luminous-cephfs-subtree-pinning/
--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D