Hello Janek,
On Mon, Jul 22, 2019 at 6:02 AM Janek Bevendorff
<janek.bevendorff(a)uni-weimar.de> wrote:
> Hi,
>
> I am trying to copy the contents of our storage server into a CephFS,
> but am experiencing stability issues with my MDSs. The CephFS sits on
> top of an erasure-coded pool with 5 MONs, 5 MDSs and a max_mds setting
> of two. My Ceph cluster version is Nautilus, the client is Mimic and
> uses the kernel module to mount the FS.
>
> The index of filenames to copy is about 23GB and I am using 16 parallel
> rsync processes over a 10G link to copy the files over to Ceph. This
> works perfectly for a while, but then the MDSs start reporting oversized
> caches (between 20 and 50GB, sometimes more)
What did you have the MDS cache size set to at the time?
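For reference, you can check and adjust the limit with the config commands below (the 8 GiB value is just an illustrative example, not a recommendation for your hardware):

```shell
# Show the MDS cache memory limit currently in effect
# (the Nautilus default is 1 GiB):
ceph config get mds mds_cache_memory_limit

# Raise it, e.g. to 8 GiB, if the MDS hosts have RAM to spare:
ceph config set mds mds_cache_memory_limit 8589934592
```

Note the limit is a target, not a hard cap; the MDS will report a health warning when it overshoots it significantly, which matches what you're seeing.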
> and an inode count between 1 and 4 million. Particularly the inode
> count seems quite high to me.
>
> Each rsync job has 25k files to work with, so if all 16 processes open
> all their files at the same time, I should not exceed 400k. Even if I
> double this number to account for the client's page cache, I should get
> nowhere near that number of inodes (a sync flush takes about 1 second).
> Then after a few hours, my MDSs start failing with messages like this:
>
>    -21> 2019-07-22 14:00:05.877 7f67eacec700  1 heartbeat_map
> is_healthy 'MDSRank' had timed out after 15
>    -20> 2019-07-22 14:00:05.877 7f67eacec700  0 mds.beacon.XXX Skipping
> beacon heartbeat to monitors (last acked 24.0042s ago); MDS internal
> heartbeat is not healthy!
This is probably related to using multiple active metadata servers.
There have been stability issues we're still looking to work out with
the MDS balancer, especially with these batch-create style workloads.
However, if you're willing to use subtree pinning [1] where you
statically assign each directory tree before each rsync job uploads
its data, then that should be safe as the balancer will effectively be
disabled.
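Pinning is done with an extended attribute on each directory; something like the following, where the mount point and directory names are just examples matching a 16-job layout:

```shell
# Pin each rsync job's top-level directory to an MDS rank.
# With max_mds = 2 the valid ranks are 0 and 1, so split the
# 16 job directories between them (names here are hypothetical):
setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/job-00
setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/job-01
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/job-08
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/job-09
```

The pin is inherited by the whole subtree beneath each directory, so one setfattr per job directory is enough.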
Alternatively, using one active MDS for the duration of the batch
upload should work too but may be significantly slower.
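Dropping to a single active MDS is a one-liner; on Nautilus the extra rank is stopped automatically when you lower max_mds (substitute your actual filesystem name for the placeholder):

```shell
# Reduce to one active MDS for the duration of the bulk load:
ceph fs set <fs_name> max_mds 1

# Restore two active ranks once the copy is finished:
ceph fs set <fs_name> max_mds 2
```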
> The standby nodes try to take over, but take forever to become active
> and will fail as well eventually.
Same error? Missed heartbeats?
[1]
https://ceph.com/community/new-luminous-cephfs-subtree-pinning/
--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D