Unfortunately, the experiment failed, so I tried increasing the number
of active MDSs to four (although I had a bad experience with three
before). This worked surprisingly well for some time, but the crash
came eventually and the rank-0 MDS got kicked. Now the standbys have
been playing round-robin for two hours straight, trying to join and
getting kicked again, with no end in sight.
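
(For reference: the number of active MDS daemons is controlled by the
file system's max_mds setting; the file system name below is a
placeholder for ours.)

  # raise the number of active MDS ranks to four
  ceph fs set cephfs max_mds 4
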
The log messages are:
2019-07-23 17:53:35.618 7f3b165ab700 0 mds.beacon.XXX Skipping beacon
heartbeat to monitors (last acked 4.50406s ago); MDS internal heartbeat
is not healthy!
2019-07-23 17:53:36.118 7f3b165ab700 1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
Followed by some of these:
2019-07-23 17:53:37.386 7f3b135a5700 0 mds.0.cache.ino(0x100019693f8)
have open dirfrag * but not leaf in fragtree_t(*^3): [dir 0x100019693f8
/XXX_12_doc_ids_part7/ [2,head] auth{1=2,2=2} v=0 cv=0/0
state=1140850688 f() n() hs=17033+0,ss=0+0 | child=1 replicated=1
0x5642a2ff7700]
and finally:
2019-07-23 17:53:48.786 7fb02bc08700 1 mds.XXX Map has assigned me to
become a standby
It is impossible to migrate our storage server to CephFS if this
continues. I would immensely appreciate some help on this. Thanks a lot!
Janek
On 23.07.19 14:18, Janek Bevendorff wrote:
> Alright, I did some further research and found this thread, which
> seems to describe the same problem:
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-February/024944.ht…
>
> We have many small files (as I said, the file list is 23GB), and since
> we are only copying the files and not accessing them afterwards, the
> clients keep piling up capabilities, which at least explains the
> growing cache sizes and the MDSs' failure to keep up (there should
> definitely be a solution to this; batch-copying many files onto a
> CephFS is a pretty standard use case).
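>
> (For reference: the per-client capability counts can be checked on the
> respective MDS host; the MDS name below is a placeholder.)
>
>   # number of caps held by each client session
>   ceph daemon mds.XXX session ls | grep num_caps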
>
> After I increased the beacon grace period (see the setting below), I
> experienced fewer MDS crashes (although I still see them flapping
> occasionally), but now I have another problem. After too many MDS
> failures (?), the client starts locking up and the mount becomes
> unresponsive. Sometimes it becomes so unresponsive that I cannot even
> unmount it with umount -lf and have to force-reboot the server. While
> the client is locked up, the MDSs recover and the FS is accessible
> again without issues from other clients. This looks like a bug to me.
> I tried upgrading the client from Mimic to Nautilus, but the problem
> persists.
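>
> (For reference: the grace period is the mds_beacon_grace option; the
> value below is only an example, not a recommendation.)
>
>   # give the MDS more time before the mons mark it laggy/failed
>   ceph config set global mds_beacon_grace 120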
>
> I increased the MDS max cache size massively (roughly as sketched
> below) and started the copy job again; let's see how far it gets this
> time.
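>
> (The exact value here is an example; mds_cache_memory_limit takes
> bytes.)
>
>   # allow the MDS cache to grow to ~64 GiB
>   ceph config set mds mds_cache_memory_limit 68719476736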
>
>
> On 22.07.19 15:02, Janek Bevendorff wrote:
>> Hi,
>>
>> I am trying to copy the contents of our storage server into a CephFS,
>> but am experiencing stability issues with my MDSs. The CephFS sits on
>> top of an erasure-coded data pool; the cluster has 5 MONs, 5 MDSs and
>> a max_mds setting of two. The cluster runs Nautilus, the client is on
>> Mimic and uses the kernel module to mount the FS.
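>>
>> (For what it's worth, it is a plain kernel mount along these lines;
>> addresses and paths are placeholders.)
>>
>>   mount -t ceph mon1:6789,mon2:6789:/ /mnt/cephfs \
>>     -o name=admin,secretfile=/etc/ceph/admin.secret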
>>
>> The index of filenames to copy is about 23GB and I am using 16
>> parallel rsync processes over a 10G link to copy the files over to
>> Ceph. This works perfectly for a while, but then the MDSs start
>> reporting oversized caches (between 20 and 50GB, sometimes more) and
>> an inode count between 1 and 4 million. The inode count in particular
>> seems quite high to me. Each rsync job has 25k files to work with, so
>> even if all 16 processes opened all of their files at the same time,
>> I should not exceed 400k. Even if I double that number to account for
>> the client's page cache, I should get nowhere near that many inodes
>> (a sync flush takes about 1 second).
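>>
>> (For reference, the per-rank dentry and inode counts are visible in
>> the output of ceph fs status, and the cache memory usage in the MDS
>> cache status; the MDS name is a placeholder.)
>>
>>   ceph fs status
>>   ceph daemon mds.XXX cache status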
>>
>> Then after a few hours, my MDSs start failing with messages like this:
>>
>> -21> 2019-07-22 14:00:05.877 7f67eacec700 1 heartbeat_map
>> is_healthy 'MDSRank' had timed out after 15
>> -20> 2019-07-22 14:00:05.877 7f67eacec700 0 mds.beacon.XXX
>> Skipping beacon heartbeat to monitors (last acked 24.0042s ago); MDS
>> internal heartbeat is not healthy!
>>
>> The standby nodes try to take over, but they take forever to become
>> active and eventually fail as well.
>>
>> During my research, I found this related thread:
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-January/015959.html,
>> but I have already tried everything suggested in there, from raising
>> and lowering my cache size to changing the number of log segments,
>> etc. (roughly the knobs sketched below). I also played around with
>> the number of active MDSs: two appears to work best, whereas one
>> cannot keep up with the load and three seems to be the worst of all
>> choices.
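>>
>> (For reference, the cache size and segment count correspond to these
>> options; the values here are only examples.)
>>
>>   ceph config set mds mds_cache_memory_limit 34359738368   # 32 GiB
>>   ceph config set mds mds_log_max_segments 256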
>>
>> Do you have any ideas on how I can improve the stability of my MDS
>> daemons so they handle the load properly? A single 10G link is a toy
>> and we could hit the cluster with far more requests per second, but
>> it is already buckling under 16 rsync processes.
>>
>> Thanks
>>
>