Looking at the archives, I think this list is dead. I moved the
discussion to the ceph.com mailing list. No idea why there are two
lists.
Cheers and apologies to anybody here on this list.
On 23/07/2019 18:20, Janek Bevendorff wrote:
> Unfortunately, the experiment failed, so I tried increasing the number
> of MDSs to four (although I had a bad experience with three before).
> This worked surprisingly well for some time, but the crash came
> eventually and the rank-0 MDS got kicked. Now the standbys have been
> playing round-robin trying to join and getting kicked again for two
> hours straight without any end in sight.
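>
> For reference, the change itself was nothing fancier than the usual
> commands; the fs name below is a placeholder:
>
>   ceph fs set cephfs max_mds 4
>   ceph fs status        # shows which ranks are active and who is standby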
>
> Log messages are
>
> 2019-07-23 17:53:35.618 7f3b165ab700 0 mds.beacon.XXX Skipping beacon
> heartbeat to monitors (last acked 4.50406s ago); MDS internal
> heartbeat is not healthy!
> 2019-07-23 17:53:36.118 7f3b165ab700 1 heartbeat_map is_healthy
> 'MDSRank' had timed out after 15
>
> Followed by some of these:
>
> 2019-07-23 17:53:37.386 7f3b135a5700 0 mds.0.cache.ino(0x100019693f8)
> have open dirfrag * but not leaf in fragtree_t(*^3): [dir
> 0x100019693f8 /XXX_12_doc_ids_part7/ [2,head] auth{1=2,2=2} v=0 cv=0/0
> state=1140850688 f() n() hs=17033+0,ss=0+0 | child=1 replicated=1
> 0x5642a2ff7700]
>
> and finally:
>
> 2019-07-23 17:53:48.786 7fb02bc08700 1 mds.XXX Map has assigned me to
> become a standby
>
>
> It is impossible to migrate our storage server to CephFS if this
> continues. I would immensely appreciate some help on this. Thanks a lot!
>
> Janek
>
>
> On 23.07.19 14:18, Janek Bevendorff wrote:
>> Alright, I did some further research and found this topic which seems
>> to be about the same problem:
>>
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-February/024944.ht…
>>
>> We have many small files (as I said, the file list alone is 23GB), and
>> since we only copy them and never access them again afterwards, the
>> clients keep piling up capabilities. That at least explains the growing
>> cache sizes and the MDSs' failure to keep up. (There should definitely
>> be a solution to this; batch-copying many files into a CephFS is a
>> pretty standard use case.)
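>>
>> The cap build-up is easy to watch per client session on the MDS host.
>> The session listing below is a standard admin-socket command; the two
>> recall-related options are ones I believe exist in Nautilus, and the
>> values are purely illustrative, not a recommendation:
>>
>>   ceph daemon mds.<name> session ls | grep num_caps
>>
>>   # possibly relevant knobs for recalling caps more aggressively and
>>   # capping the per-client count (illustrative values):
>>   ceph config set mds mds_recall_max_caps 10000
>>   ceph config set mds mds_max_caps_per_client 500000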
>>
>> After I increased the beacon grace period, I experienced fewer MDS
>> crashes (although I still see them flapping occasionally), but now I
>> have another problem. After too many MDS failures (?), the client
>> starts locking up and the mount becomes unresponsive. Sometimes it
>> becomes so unresponsive that I cannot even unmount it with umount -lf
>> and have to force-reboot the server. While the client is locked up,
>> the MDSs recover and the FS is accessible again without issues from
>> other clients. This looks like a bug to me. I tried upgrading the
>> client from Mimic to Nautilus, but the problem persists.
>>
>> I increased the MDS max cache size massively and started the copy job
>> again; let's see how far it goes this time.
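>>
>> For the record, the two settings I mean are roughly these; the values
>> are just examples of the magnitude, not the exact figures I used:
>>
>>   ceph config set global mds_beacon_grace 120
>>   ceph config set mds mds_cache_memory_limit 34359738368   # 32 GiB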
>>
>>
>> On 22.07.19 15:02, Janek Bevendorff wrote:
>>> Hi,
>>>
>>> I am trying to copy the contents of our storage server into a
>>> CephFS, but I am experiencing stability issues with my MDSs. The
>>> CephFS uses an erasure-coded data pool; the cluster has 5 MONs,
>>> 5 MDSs, and max_mds is set to two. The cluster runs Nautilus, the
>>> client runs Mimic and mounts the FS via the kernel module.
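>>>
>>> For completeness, the client uses a plain kernel mount along these
>>> lines (hostnames, user name and secret path are placeholders):
>>>
>>>   mount -t ceph mon1,mon2,mon3:/ /mnt/cephfs \
>>>     -o name=backup,secretfile=/etc/ceph/backup.secret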
>>>
>>> The index of filenames to copy is about 23GB and I am using 16
>>> parallel rsync processes over a 10G link to copy the files over to
>>> Ceph. This works perfectly for a while, but then the MDSs start
>>> reporting oversized caches (between 20 and 50GB, sometimes more) and
>>> an inode count between 1 and 4 million. The inode count in particular
>>> seems quite high to me. Each rsync job has 25k files to work with, so
>>> even if all 16 processes opened all of their files at the same time,
>>> I should not exceed 400k. Even doubling that number to account for
>>> the client's page cache, I get nowhere near that many inodes (a sync
>>> flush takes about 1 second).
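>>>
>>> The copy is driven by something along these lines; paths and the
>>> exact chunking are illustrative, the real script differs in detail:
>>>
>>>   # slice the index into 16 roughly equal chunks without splitting lines
>>>   split -n l/16 filelist.txt chunk_
>>>   for f in chunk_*; do
>>>       rsync -a --files-from="$f" /source/ /mnt/cephfs/import/ &
>>>   done
>>>   wait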
>>>
>>> Then after a few hours, my MDSs start failing with messages like this:
>>>
>>> -21> 2019-07-22 14:00:05.877 7f67eacec700 1 heartbeat_map
>>> is_healthy 'MDSRank' had timed out after 15
>>> -20> 2019-07-22 14:00:05.877 7f67eacec700 0 mds.beacon.XXX
>>> Skipping beacon heartbeat to monitors (last acked 24.0042s ago); MDS
>>> internal heartbeat is not healthy!
>>>
>>> The standby nodes try to take over, but take forever to become
>>> active and will fail as well eventually.
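>>>
>>> (A convenient way to watch this is something like
>>>
>>>   watch -n 5 ceph fs status
>>>
>>> which shows each rank and its current state, e.g. replay or rejoin,
>>> as the standbys attempt to take over.)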
>>>
>>> During my research, I found this related topic:
>>>
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-January/015959.html,
>>> but I have tried everything suggested there, from increasing and
>>> lowering my cache size to changing the number of log segments, etc.
>>> I also played around with the number of active MDSs; two appears to
>>> work best, whereas one cannot keep up with the load and three seems
>>> to be the worst choice of all.
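>>>
>>> Concretely, the kind of options I have been varying are these; the
>>> values only indicate the range I tried, not the exact settings:
>>>
>>>   ceph config set mds mds_cache_memory_limit 17179869184   # 16 GiB
>>>   ceph config set mds mds_log_max_segments 256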
>>>
>>> Do you have any ideas on how I can improve the stability of my MDS
>>> daemons so that they handle the load properly? A single 10G link is
>>> a toy; we could hit the cluster with far more requests per second,
>>> but it is already buckling under 16 rsync processes.
>>>
>>> Thanks
>>>
>>