[Ceph-users] Re: MDS failing under load with large cache sizes

23 Jul 2019

Alright, I did some further research and found this topic which seems to 
be about the same problem: 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-February/024944.ht…

We have many small files (as I said, the file list is 23GB), and since we 
are only copying them and never accessing them afterwards, the clients 
start piling up capabilities. That at least explains the growing cache 
sizes and the MDSs' failure to keep up. There should definitely be a 
solution to this; batch-copying many files to a CephFS is a pretty 
standard use case.
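As a rough illustration of the capability pile-up described above, here is a hypothetical toy model in Python (not Ceph code). It assumes that every copied file leaves one capability on the client and that the MDS can recall only a fixed number of caps per second. The function name `simulate_caps` and all rates are made up for illustration.

```python
from __future__ import annotations

# Toy model (hypothetical): a client creates files faster than the MDS
# recalls capabilities, so the number of caps held keeps growing.


def simulate_caps(seconds: int, files_per_sec: int, recalls_per_sec: int) -> list[int]:
    """Return the number of capabilities held at the end of each second.

    Assumptions (not taken from Ceph): every copied file leaves one cap
    on the client, and the MDS can reclaim at most `recalls_per_sec`
    caps per second.
    """
    held = 0
    history = []
    for _ in range(seconds):
        held += files_per_sec               # new caps from freshly copied files
        held -= min(held, recalls_per_sec)  # caps the MDS manages to recall (never below 0)
        history.append(held)
    return history


if __name__ == "__main__":
    # Hypothetical rates: 2,000 new files/s against 1,500 recalls/s.
    hist = simulate_caps(seconds=3600, files_per_sec=2000, recalls_per_sec=1500)
    print(f"caps held after 1 h: {hist[-1]:,}")  # caps held after 1 h: 1,800,000
```

Whenever the creation rate exceeds the recall rate, the held count in this model grows linearly (here by 500 caps per second), which is consistent with caches that grow steadily over hours rather than spiking all at once.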

After I increased the beacon grace period, I experienced fewer MDS 
crashes (although I still see them flapping occasionally), but now I 
have another problem. After too many MDS failures (?), the client starts 
locking up and the mount becomes unresponsive. Sometimes it becomes so 
unresponsive that I cannot even unmount it with umount -lf and have to 
force-reboot the server. While the client is locked up, the MDSs recover 
and the FS is accessible again from other clients without issues. This 
looks like a bug to me. I tried upgrading the client from Mimic to 
Nautilus, but the problem persists.

I massively increased the MDS max cache size and restarted the copy job; 
let's see how far it gets this time.

On 22.07.19 15:02, Janek Bevendorff wrote:
...
  Hi,

 I am trying to copy the contents of our storage server into a CephFS, 
 but am experiencing stability issues with my MDSs. The CephFS sits on 
 top of an erasure-coded pool; the cluster has 5 MONs, 5 MDSs and a 
 max_mds setting of two. The Ceph cluster runs Nautilus; the client runs 
 Mimic and uses the kernel module to mount the FS.

 The index of filenames to copy is about 23GB and I am using 16 
 parallel rsync processes over a 10G link to copy the files over to 
 Ceph. This works perfectly for a while, but then the MDSs start 
 reporting oversized caches (between 20 and 50GB, sometimes more) and 
 an inode count between 1 and 4 million. The inode count in particular 
 seems quite high to me. Each rsync job has 25k files to work with, so 
 even if all 16 processes opened all their files at the same time, I 
 should not exceed 400k. Even if I double this number to account for the 
 client's page cache, I should get nowhere near that number of inodes 
 (a sync flush takes about 1 second).
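The arithmetic above can be sketched in Python, together with one way to split a file list into the 25k-file batches handed to each rsync job. The helper names `batches` and `expected_inode_bound`, the sample paths, and the idea of writing each batch to a file for rsync's `--files-from` option are assumptions for illustration, not a description of the actual setup.

```python
from __future__ import annotations

from itertools import islice
from typing import Iterable, Iterator

FILES_PER_JOB = 25_000  # per-job batch size stated in the mail
PARALLEL_JOBS = 16      # number of concurrent rsync processes


def batches(paths: Iterable[str], size: int = FILES_PER_JOB) -> Iterator[list[str]]:
    """Yield consecutive batches of at most `size` paths (one batch per rsync job)."""
    it = iter(paths)
    while batch := list(islice(it, size)):
        yield batch


def expected_inode_bound(jobs: int = PARALLEL_JOBS, per_job: int = FILES_PER_JOB,
                         slack: float = 2.0) -> int:
    """Inodes in flight if every job touches its whole batch at once,
    multiplied by a slack factor (2x in the mail) for the client page cache."""
    return int(jobs * per_job * slack)


if __name__ == "__main__":
    sample = (f"/data/file{i:07d}" for i in range(60_000))  # hypothetical file list
    print([len(b) for b in batches(sample)])  # [25000, 25000, 10000]
    print(expected_inode_bound(slack=1.0))    # 400000
    print(expected_inode_bound())             # 800000
    # The MDSs reported 1 to 4 million inodes: 1.25x to 5x even the doubled bound.
```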

 Then after a few hours, my MDSs start failing with messages like this:

    -21> 2019-07-22 14:00:05.877 7f67eacec700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
    -20> 2019-07-22 14:00:05.877 7f67eacec700  0 mds.beacon.XXX Skipping beacon heartbeat to monitors (last acked 24.0042s ago); MDS internal heartbeat is not healthy!
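To track how often these messages appear during a long copy job, a small scanner can pull the heartbeat timeout and the beacon ack age out of the MDS log. This is a hypothetical helper (`scan` is not part of any Ceph tool); its regular expressions are written only against the two log lines quoted above, and other Ceph releases may word these messages differently.

```python
import re

# Patterns derived from the two quoted log lines only; other Ceph versions
# may format these messages differently.
HB_TIMEOUT = re.compile(
    r"is_healthy '(?P<worker>[^']+)' had timed out after (?P<grace>\d+(?:\.\d+)?)"
)
BEACON_SKIP = re.compile(r"last acked (?P<age>\d+(?:\.\d+)?)s ago")


def scan(lines):
    """Return (worker, timeout) tuples for heartbeat timeouts and a list of beacon ack ages."""
    timeouts, ack_ages = [], []
    for line in lines:
        if m := HB_TIMEOUT.search(line):
            timeouts.append((m["worker"], float(m["grace"])))
        if m := BEACON_SKIP.search(line):
            ack_ages.append(float(m["age"]))
    return timeouts, ack_ages


if __name__ == "__main__":
    log = [
        "-21> 2019-07-22 14:00:05.877 7f67eacec700  1 heartbeat_map is_healthy "
        "'MDSRank' had timed out after 15",
        "-20> 2019-07-22 14:00:05.877 7f67eacec700  0 mds.beacon.XXX Skipping beacon "
        "heartbeat to monitors (last acked 24.0042s ago); MDS internal heartbeat is not healthy!",
    ]
    timeouts, ages = scan(log)
    print(timeouts)  # [('MDSRank', 15.0)]
    print(ages)      # [24.0042]
```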

 The standby nodes try to take over, but take forever to become active 
 and eventually fail as well.

 During my research, I found this related topic: 
 http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-January/015959.html, 
 but I tried everything suggested there, from increasing to lowering my 
 cache size to changing the number of segments, etc. I also played around 
 with the number of active MDSs: two appears to work best, whereas one 
 cannot keep up with the load and three seems to be the worst choice of all.

 Do you have any ideas how I can improve the stability of my MDS daemons 
 so that they handle the load properly? A single 10G link is a toy; we 
 could query the cluster with many more requests per second, but it 
 already buckles under 16 rsync processes.

 Thanks

-- 
Bauhaus-Universität Weimar
Bauhausstr. 9a, Room 308
99423 Weimar, Germany

Phone: +49 (0)3643 - 58 3577
