Unfortunately, the experiment failed, so I tried increasing the number
of active MDSs to four (although I had a bad experience with three
before). This worked surprisingly well for some time, but the crash
came eventually and the rank-0 MDS got kicked. Now the standbys have
been playing round-robin for two hours straight, trying to join and
getting kicked again, with no end in sight.
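
(For reference: the number of active MDS daemons is controlled by the
file system's max_mds setting; the file system name below is a
placeholder for ours.)

  # raise the number of active MDS ranks to four
  ceph fs set cephfs max_mds 4
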
The log messages are:
2019-07-23 17:53:35.618 7f3b165ab700 0 mds.beacon.XXX Skipping beacon
heartbeat to monitors (last acked 4.50406s ago); MDS internal heartbeat
is not healthy!
2019-07-23 17:53:36.118 7f3b165ab700 1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
Followed by some of these:
2019-07-23 17:53:37.386 7f3b135a5700 0 mds.0.cache.ino(0x100019693f8)
have open dirfrag * but not leaf in fragtree_t(*^3): [dir 0x100019693f8
/XXX_12_doc_ids_part7/ [2,head] auth{1=2,2=2} v=0 cv=0/0
state=1140850688 f() n() hs=17033+0,ss=0+0 | child=1 replicated=1
0x5642a2ff7700]
and finally:
2019-07-23 17:53:48.786 7fb02bc08700 1 mds.XXX Map has assigned me to
become a standby
It is impossible to migrate our storage server to CephFS if this
continues. I would immensely appreciate some help on this. Thanks a lot!
Janek
On 23.07.19 14:18, Janek Bevendorff wrote:
> Alright, I did some further research and found this thread, which
> seems to describe the same problem:
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-February/024944.ht…
>
> We have many small files (as I said, the file list is 23GB), and since
> we are only copying the files and not accessing them afterwards, the
> clients keep piling up capabilities, which at least explains the
> growing cache sizes and the MDSs' failure to keep up (there should
> definitely be a solution to this; batch-copying many files onto a
> CephFS is a pretty standard use case).
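>
> (For reference: the per-client capability counts can be checked on the
> respective MDS host; the MDS name below is a placeholder.)
>
>   # number of caps held by each client session
>   ceph daemon mds.XXX session ls | grep num_caps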
>
> After I increased the beacon grace period (see the setting below), I
> experienced fewer MDS crashes (although I still see them flapping
> occasionally), but now I have another problem. After too many MDS
> failures (?), the client starts locking up and the mount becomes
> unresponsive. Sometimes it becomes so unresponsive that I cannot even
> unmount it with umount -lf and have to force-reboot the server. While
> the client is locked up, the MDSs recover and the FS is accessible
> again without issues from other clients. This looks like a bug to me.
> I tried upgrading the client from Mimic to Nautilus, but the problem
> persists.
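>
> (For reference: the grace period is the mds_beacon_grace option; the
> value below is only an example, not a recommendation.)
>
>   # give the MDS more time before the mons mark it laggy/failed
>   ceph config set global mds_beacon_grace 120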
>
> I increased the MDS max cache size massively (roughly as sketched
> below) and started the copy job again; let's see how far it gets this
> time.
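>
> (The exact value here is an example; mds_cache_memory_limit takes
> bytes.)
>
>   # allow the MDS cache to grow to ~64 GiB
>   ceph config set mds mds_cache_memory_limit 68719476736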
>
>
> On 22.07.19 15:02, Janek Bevendorff wrote:
>> Hi,
>>
>> I am trying to copy the contents of our storage server into a CephFS,
>> but am experiencing stability issues with my MDSs. The CephFS sits on
>> top of an erasure-coded data pool; the cluster has 5 MONs, 5 MDSs and
>> a max_mds setting of two. The cluster runs Nautilus, the client is on
>> Mimic and uses the kernel module to mount the FS.
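>>
>> (For what it's worth, it is a plain kernel mount along these lines;
>> addresses and paths are placeholders.)
>>
>>   mount -t ceph mon1:6789,mon2:6789:/ /mnt/cephfs \
>>     -o name=admin,secretfile=/etc/ceph/admin.secret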
>>
>> The index of filenames to copy is about 23GB and I am using 16
>> parallel rsync processes over a 10G link to copy the files over to
>> Ceph. This works perfectly for a while, but then the MDSs start
>> reporting oversized caches (between 20 and 50GB, sometimes more) and
>> an inode count between 1 and 4 million. The inode count in particular
>> seems quite high to me. Each rsync job has 25k files to work with, so
>> even if all 16 processes opened all of their files at the same time,
>> I should not exceed 400k. Even if I double that number to account for
>> the client's page cache, I should get nowhere near that many inodes
>> (a sync flush takes about 1 second).
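>>
>> (For reference, the per-rank dentry and inode counts are visible in
>> the output of ceph fs status, and the cache memory usage in the MDS
>> cache status; the MDS name is a placeholder.)
>>
>>   ceph fs status
>>   ceph daemon mds.XXX cache status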
>>
>> Then after a few hours, my MDSs start failing with messages like this:
>>
>> -21> 2019-07-22 14:00:05.877 7f67eacec700 1 heartbeat_map
>> is_healthy 'MDSRank' had timed out after 15
>> -20> 2019-07-22 14:00:05.877 7f67eacec700 0 mds.beacon.XXX
>> Skipping beacon heartbeat to monitors (last acked 24.0042s ago); MDS
>> internal heartbeat is not healthy!
>>
>> The standby nodes try to take over, but they take forever to become
>> active and eventually fail as well.
>>
>> During my research, I found this related thread:
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-January/015959.html,
>> but I have already tried everything suggested in there, from raising
>> and lowering my cache size to changing the number of log segments,
>> etc. (roughly the knobs sketched below). I also played around with
>> the number of active MDSs: two appears to work best, whereas one
>> cannot keep up with the load and three seems to be the worst of all
>> choices.
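>>
>> (For reference, the cache size and segment count correspond to these
>> options; the values here are only examples.)
>>
>>   ceph config set mds mds_cache_memory_limit 34359738368   # 32 GiB
>>   ceph config set mds mds_log_max_segments 256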
>>
>> Do you have any ideas on how I can improve the stability of my MDS
>> daemons so they handle the load properly? A single 10G link is a toy
>> and we could hit the cluster with far more requests per second, but
>> it is already buckling under 16 rsync processes.
>>
>> Thanks
>>
>