I am trying to copy the contents of our storage server into a CephFS,
but am experiencing stability issues with my MDSs. The CephFS sits on
top of an erasure-coded data pool; the cluster has 5 MONs, 5 MDSs and a
max_mds setting of two. The cluster runs Nautilus, the client runs
Mimic and uses the kernel module to mount the FS.
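For context, the setup was created roughly along these lines (pool and
FS names, PG counts and the EC profile below are placeholders, not the
real ones):

  # EC data pool with overwrites enabled, plus a replicated metadata pool
  ceph osd pool create cephfs_data 1024 1024 erasure my-ec-profile
  ceph osd pool set cephfs_data allow_ec_overwrites true
  ceph osd pool create cephfs_metadata 256 256 replicated
  ceph fs new cephfs cephfs_metadata cephfs_data --force
  ceph fs set cephfs max_mds 2

  # kernel client mount on the Mimic machine
  mount -t ceph mon1,mon2,mon3:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret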
The index of filenames to copy is about 23GB and I am using 16 parallel
rsync processes over a 10G link to copy the files over to Ceph. This
works perfectly for a while, but then the MDSs start reporting oversized
caches (between 20 and 50GB, sometimes more) and an inode count between
1 and 4 million. The inode count in particular seems quite high to me:
each rsync job works on 25k files, so even if all 16 processes had all
their files open at the same time, I should not exceed 400k. Even if I
double that number to account for the client's page cache, I should get
nowhere near that many inodes (a sync flush takes about 1 second).
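In case it matters, those cache and inode numbers come from the cluster
health warnings and the MDS admin socket (mds.XXX stands for the actual
daemon name):

  # cluster-wide view; this is where the oversized-cache warnings show up
  ceph health detail

  # per-daemon view, run on the MDS host
  ceph daemon mds.XXX cache status
  ceph daemon mds.XXX perf dump mds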
Then after a few hours, my MDSs start failing with messages like this:
-21> 2019-07-22 14:00:05.877 7f67eacec700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
-20> 2019-07-22 14:00:05.877 7f67eacec700 0 mds.beacon.XXX Skipping beacon heartbeat to monitors (last acked 24.0042s ago); MDS internal heartbeat is not healthy!
The standby nodes try to take over, but take a very long time to become
active and eventually fail as well.
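I am watching the failover with something like the following (again,
mds.XXX is a placeholder); the standbys sit in the intermediate states
for a long time before they either go active or die:

  watch -n 5 'ceph fs status; ceph mds stat'

  # state of an individual daemon, run on its host
  ceph daemon mds.XXX status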
During my research, I found this related topic:
but I tried everything suggested in there, from increasing and lowering
my cache size to changing the number of log segments, etc. I also played
around with the number of active MDSs; two appears to work best, whereas
one cannot keep up with the load and three seems to be the worst choice
of all.
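Concretely, these are the kinds of knobs I have been turning (the values
are just examples of what I tried, and "cephfs" stands in for the real
FS name):

  # MDS cache size, tried both above and below the default
  ceph config set mds mds_cache_memory_limit 17179869184   # 16 GiB

  # journal segment count
  ceph config set mds mds_log_max_segments 256

  # number of active ranks
  ceph fs set cephfs max_mds 2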
Do you have any ideas how I can improve the stability of my MDS daemons
so they handle this load properly? A single 10G link is a toy and we
could hit the cluster with far more requests per second, but it is
already giving way under just 16 rsync processes.