MDS failing under load with large cache sizes

List overview All Threads
Download

newer

older

RBD Mirroring down+unknown

Disk fail, some question...

Janek Bevendorff

22 Jul 2019 22 Jul '19

5:32 p.m.

Hi, I am trying to copy the contents of our storage server into a CephFS, but am experiencing stability issues with my MDSs. The CephFS sits on top of an erasure-coded pool with 5 MONs, 5 MDSs and a max_mds setting of two. My Ceph cluster version is Nautilus, the client is Mimic and uses the kernel module to mount the FS. The index of filenames to copy is about 23GB and I am using 16 parallel rsync processes over a 10G link to copy the files over to Ceph. This works perfectly for a while, but then the MDSs start reporting oversized caches (between 20 and 50GB, sometimes more) and an inode count between 1 and 4 million. Particularly the Inode count seems quite high to me. Each rsync job has 25k files to work with, so if all 16 processes open all their files at the same time, I should not exceed 400k. Even if I double this number to account for the client's page cache, I should get nowhere near that number of inodes (a sync flush takes about 1 second). Then after a few hours, my MDSs start failing with messages like this: -21> 2019-07-22 14:00:05.877 7f67eacec700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15 -20> 2019-07-22 14:00:05.877 7f67eacec700 0 mds.beacon.XXX Skipping beacon heartbeat to monitors (last acked 24.0042s ago); MDS internal heartbeat is not healthy! The standby nodes try to take over, but take forever to become active and will fail as well eventually. During my research, I found this related topic: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-January/015959.html, but I tried everything in there from increasing to lowering my cache size, the number of segments etc. I also played around with the number of active MDSs and two appears to work the best, whereas one cannot keep up with the load and three seems to be the worst of all choices. Do you have any ideas how I can improve the stability of my MDS damons to handle the load properly? single 10G link is a toy and we could query the cluster with a lot more requests per second, but it's already yielding to 16 rsync processes. Thanks

Show replies by thread

Janek Bevendorff

23 Jul 23 Jul

4:48 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

Alright, I did some further research and found this topic which seems to be about the same problem: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-February/024944.ht… We have many small files (as I said, the file list is 23GB) and since we are only copying them but we are not accessing them afterwards, the clients start piling up capabilities, which at least explains the growing cache sizes and the MDSs' failure to keep up (there should definitely be a solution to this, batch-copying many files to a CephFS is a pretty standard use case). After I increased the beacon grace period, I experienced few MDS crashes (although I still see them flapping occasionally), but now I have another problem. After too many MDS failures (?) the client starts locking up and the mount becomes unresponsive. Sometimes it becomes so unresponsive, I cannot even unmount it with umount -lf and have to force-reboot the server. While the client is locked up, the MDSs recover and the FS is accessible again without issues from other clients. This looks like a bug to me. I tried upgrading the client from Mimic to Nautilus, but I have the same problem. I increased the MDS max cache size massively and started the copy job again, let's see how far it goes this time. On 22.07.19 15:02, Janek Bevendorff wrote:

...

-- Bauhaus-Universität Weimar Bauhausstr. 9a, Room 308 99423 Weimar, Germany Phone: +49 (0)3643 - 58 3577

Janek Bevendorff

8:50 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

Unfortunately, the experiment failed, so I tried increasing the number of MDS to four (although I had a bad experience with three before). This worked surprisingly well for some time, but the crash came eventually and the rank-0 MDS got kicked. Now the standbys have been playing round-robin trying to join and getting kicked again for two hours straight without any end in sight. Log messages are 2019-07-23 17:53:35.618 7f3b165ab700 0 mds.beacon.XXX Skipping beacon heartbeat to monitors (last acked 4.50406s ago); MDS internal heartbeat is not healthy! 2019-07-23 17:53:36.118 7f3b165ab700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15 Followed by some of these: 2019-07-23 17:53:37.386 7f3b135a5700 0 mds.0.cache.ino(0x100019693f8) have open dirfrag * but not leaf in fragtree_t(*^3): [dir 0x100019693f8 /XXX_12_doc_ids_part7/ [2,head] auth{1=2,2=2} v=0 cv=0/0 state=1140850688 f() n() hs=17033+0,ss=0+0 | child=1 replicated=1 0x5642a2ff7700] and finally: 2019-07-23 17:53:48.786 7fb02bc08700 1 mds.XXX Map has assigned me to become a standby It is impossible to migrate our storage server to CephFS if this continues. I would immensely appreciate some help on this. Thanks a lot! Janek On 23.07.19 14:18, Janek Bevendorff wrote: > Alright, I did some further research and found this topic which seems > to be about the same problem: > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-February/024944.ht… > > We have many small files (as I said, the file list is 23GB) and since > we are only copying them but we are not accessing them afterwards, the > clients start piling up capabilities, which at least explains the > growing cache sizes and the MDSs' failure to keep up (there should > definitely be a solution to this, batch-copying many files to a CephFS > is a pretty standard use case). > > After I increased the beacon grace period, I experienced few MDS > crashes (although I still see them flapping occasionally), but now I > have another problem. After too many MDS failures (?) the client > starts locking up and the mount becomes unresponsive. Sometimes it > becomes so unresponsive, I cannot even unmount it with umount -lf and > have to force-reboot the server. While the client is locked up, the > MDSs recover and the FS is accessible again without issues from other > clients. This looks like a bug to me. I tried upgrading the client > from Mimic to Nautilus, but I have the same problem. > > I increased the MDS max cache size massively and started the copy job > again, let's see how far it goes this time. > > > On 22.07.19 15:02, Janek Bevendorff wrote: >> Hi, >> >> I am trying to copy the contents of our storage server into a CephFS, >> but am experiencing stability issues with my MDSs. The CephFS sits on >> top of an erasure-coded pool with 5 MONs, 5 MDSs and a max_mds >> setting of two. My Ceph cluster version is Nautilus, the client is >> Mimic and uses the kernel module to mount the FS. >> >> The index of filenames to copy is about 23GB and I am using 16 >> parallel rsync processes over a 10G link to copy the files over to >> Ceph. This works perfectly for a while, but then the MDSs start >> reporting oversized caches (between 20 and 50GB, sometimes more) and >> an inode count between 1 and 4 million. Particularly the Inode count >> seems quite high to me. Each rsync job has 25k files to work with, so >> if all 16 processes open all their files at the same time, I should >> not exceed 400k. Even if I double this number to account for the >> client's page cache, I should get nowhere near that number of inodes >> (a sync flush takes about 1 second). >> >> Then after a few hours, my MDSs start failing with messages like this: >> >> -21> 2019-07-22 14:00:05.877 7f67eacec700 1 heartbeat_map >> is_healthy 'MDSRank' had timed out after 15 >> -20> 2019-07-22 14:00:05.877 7f67eacec700 0 mds.beacon.XXX >> Skipping beacon heartbeat to monitors (last acked 24.0042s ago); MDS >> internal heartbeat is not healthy! >> >> The standby nodes try to take over, but take forever to become active >> and will fail as well eventually. >> >> During my research, I found this related topic: >> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-January/015959.html, >> but I tried everything in there from increasing to lowering my cache >> size, the number of segments etc. I also played around with the >> number of active MDSs and two appears to work the best, whereas one >> cannot keep up with the load and three seems to be the worst of all >> choices. >> >> Do you have any ideas how I can improve the stability of my MDS >> damons to handle the load properly? single 10G link is a toy and we >> could query the cluster with a lot more requests per second, but it's >> already yielding to 16 rsync processes. >> >> Thanks >> >

Janek Bevendorff

11:01 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

Looking at the archives, I think this list is dead. I moved the discussion to the ceph.com mailing list. No idea why there are two lists. Cheers and apologies to anybody here on this list. On 23/07/2019 18:20, Janek Bevendorff wrote: > Unfortunately, the experiment failed, so I tried increasing the number > of MDS to four (although I had a bad experience with three before). > This worked surprisingly well for some time, but the crash came > eventually and the rank-0 MDS got kicked. Now the standbys have been > playing round-robin trying to join and getting kicked again for two > hours straight without any end in sight. > > Log messages are > > 2019-07-23 17:53:35.618 7f3b165ab700 0 mds.beacon.XXX Skipping beacon > heartbeat to monitors (last acked 4.50406s ago); MDS internal > heartbeat is not healthy! > 2019-07-23 17:53:36.118 7f3b165ab700 1 heartbeat_map is_healthy > 'MDSRank' had timed out after 15 > > Followed by some of these: > > 2019-07-23 17:53:37.386 7f3b135a5700 0 mds.0.cache.ino(0x100019693f8) > have open dirfrag * but not leaf in fragtree_t(*^3): [dir > 0x100019693f8 /XXX_12_doc_ids_part7/ [2,head] auth{1=2,2=2} v=0 cv=0/0 > state=1140850688 f() n() hs=17033+0,ss=0+0 | child=1 replicated=1 > 0x5642a2ff7700] > > and finally: > > 2019-07-23 17:53:48.786 7fb02bc08700 1 mds.XXX Map has assigned me to > become a standby > > > It is impossible to migrate our storage server to CephFS if this > continues. I would immensely appreciate some help on this. Thanks a lot! > > Janek > > > On 23.07.19 14:18, Janek Bevendorff wrote: >> Alright, I did some further research and found this topic which seems >> to be about the same problem: >> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-February/024944.ht… >> >> We have many small files (as I said, the file list is 23GB) and since >> we are only copying them but we are not accessing them afterwards, >> the clients start piling up capabilities, which at least explains the >> growing cache sizes and the MDSs' failure to keep up (there should >> definitely be a solution to this, batch-copying many files to a >> CephFS is a pretty standard use case). >> >> After I increased the beacon grace period, I experienced few MDS >> crashes (although I still see them flapping occasionally), but now I >> have another problem. After too many MDS failures (?) the client >> starts locking up and the mount becomes unresponsive. Sometimes it >> becomes so unresponsive, I cannot even unmount it with umount -lf and >> have to force-reboot the server. While the client is locked up, the >> MDSs recover and the FS is accessible again without issues from other >> clients. This looks like a bug to me. I tried upgrading the client >> from Mimic to Nautilus, but I have the same problem. >> >> I increased the MDS max cache size massively and started the copy job >> again, let's see how far it goes this time. >> >> >> On 22.07.19 15:02, Janek Bevendorff wrote: >>> Hi, >>> >>> I am trying to copy the contents of our storage server into a >>> CephFS, but am experiencing stability issues with my MDSs. The >>> CephFS sits on top of an erasure-coded pool with 5 MONs, 5 MDSs and >>> a max_mds setting of two. My Ceph cluster version is Nautilus, the >>> client is Mimic and uses the kernel module to mount the FS. >>> >>> The index of filenames to copy is about 23GB and I am using 16 >>> parallel rsync processes over a 10G link to copy the files over to >>> Ceph. This works perfectly for a while, but then the MDSs start >>> reporting oversized caches (between 20 and 50GB, sometimes more) and >>> an inode count between 1 and 4 million. Particularly the Inode count >>> seems quite high to me. Each rsync job has 25k files to work with, >>> so if all 16 processes open all their files at the same time, I >>> should not exceed 400k. Even if I double this number to account for >>> the client's page cache, I should get nowhere near that number of >>> inodes (a sync flush takes about 1 second). >>> >>> Then after a few hours, my MDSs start failing with messages like this: >>> >>> -21> 2019-07-22 14:00:05.877 7f67eacec700 1 heartbeat_map >>> is_healthy 'MDSRank' had timed out after 15 >>> -20> 2019-07-22 14:00:05.877 7f67eacec700 0 mds.beacon.XXX >>> Skipping beacon heartbeat to monitors (last acked 24.0042s ago); MDS >>> internal heartbeat is not healthy! >>> >>> The standby nodes try to take over, but take forever to become >>> active and will fail as well eventually. >>> >>> During my research, I found this related topic: >>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-January/015959.html, >>> but I tried everything in there from increasing to lowering my cache >>> size, the number of segments etc. I also played around with the >>> number of active MDSs and two appears to work the best, whereas one >>> cannot keep up with the load and three seems to be the worst of all >>> choices. >>> >>> Do you have any ideas how I can improve the stability of my MDS >>> damons to handle the load properly? single 10G link is a toy and we >>> could query the cluster with a lot more requests per second, but >>> it's already yielding to 16 rsync processes. >>> >>> Thanks >>> >>

Patrick Donnelly

24 Jul 24 Jul

9:46 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

Hello Janek, On Mon, Jul 22, 2019 at 6:02 AM Janek Bevendorff <janek.bevendorff(a)uni-weimar.de> wrote:

...

What did you have the MDS cache size set to at the time? < and an inode count between

...

1 and 4 million. Particularly the Inode count seems quite high to me. Each rsync job has 25k files to work with, so if all 16 processes open all their files at the same time, I should not exceed 400k. Even if I double this number to account for the client's page cache, I should get nowhere near that number of inodes (a sync flush takes about 1 second). Then after a few hours, my MDSs start failing with messages like this: -21> 2019-07-22 14:00:05.877 7f67eacec700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15 -20> 2019-07-22 14:00:05.877 7f67eacec700 0 mds.beacon.XXX Skipping beacon heartbeat to monitors (last acked 24.0042s ago); MDS internal heartbeat is not healthy!

This is probably related to using multiple active metadata servers. There have been stability issues we're still looking to work out with the MDS balancer, especially with these batch-create style workloads. However, if you're willing to use subtree pinning [1] where you statically assign each directory tree before each rsync job uploads its data, then that should be safe as the balancer will effectively be disabled. Alternatively, using one active MDS for the duration of the batch upload should work too but may be significantly slower.

...

The standby nodes try to take over, but take forever to become active and will fail as well eventually.

Same error? Missed heartbeats? [1] https://ceph.com/community/new-luminous-cephfs-subtree-pinning/ -- Patrick Donnelly, Ph.D. He / Him / His Senior Software Engineer Red Hat Sunnyvale, CA GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

Janek Bevendorff

9:56 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

...

what's the ceph.com mailing list? I wondered whether this list is dead but it's the list announced on the official ceph.com homepage, isn't it?

There are two mailing lists announced on the website. If you go to https://ceph.com/resources/ you will find the subscribe/unsubscribe/archive links for the (much more active) ceph.com MLs. But if you click on "Mailing Lists & IRC page" you will get to a page where you can subscribe to this list, which is different. Very confusing.

...

What did you have the MDS cache size set to at the time? < and an inode count between

I actually did not think I'd get a reply here. We are a bit further than this on the other mailing list. This is the thread: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-July/036095.html To sum it up: the ceph client prevents the MDS from freeing its cache, so inodes keep piling up until either the MDS becomes too slow (fixable by increasing the beacon grace time) or runs out of memory. The latter will happen eventually. In the end, my MDSs couldn't even rejoin because they hit the host's 128GB memory limit and crashed.

...

The same happens with only one MDS. I tried it with a fresh CephFS and after two minutes of rsyncing stuff into it, I hit 900k inodes before I stopped the process.

...

Alternatively, using one active MDS for the duration of the batch upload should work too but may be significantly slower.

The point was to have multiple for the transfer, because one couldn't keep up. But now it seems like the problem wasn't really the single-MDS bottleneck. The problem was rather the unbounded growth of runaway inodes, which I do not have a solution for.

...

Same error? Missed heartbeats?

Yeah, same thing.

Patrick Donnelly

25 Jul 25 Jul

2:20 a.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

+ other ceph-users On Wed, Jul 24, 2019 at 10:26 AM Janek Bevendorff <janek.bevendorff(a)uni-weimar.de> wrote:

...

what's the ceph.com mailing list? I wondered whether this list is dead but it's the list announced on the official ceph.com homepage, isn't it?

It is confusing. This is supposed to be the new ML but I don't think the migration has started yet.

...

What did you have the MDS cache size set to at the time? < and an inode count between

It's possible the MDS is not being aggressive enough with asking the single (?) client to reduce its cache size. There were recent changes [1] to the MDS to improve this. However, the defaults may not be aggressive enough for your client's workload. Can you try: ceph config set mds mds_recall_max_caps 10000 ceph config set mds mds_recall_max_decay_rate 1.0 Also your other mailings made me think you may still be using the old inode limit for the cache size. Are you using the new mds_cache_memory_limit config option? Finally, if this fixes your issue (please let us know!) and you decide to try multiple active MDS, you should definitely use pinning as the parallel create workload will greatly benefit from it. [1] https://ceph.com/community/nautilus-cephfs/ -- Patrick Donnelly, Ph.D. He / Him / His Senior Software Engineer Red Hat Sunnyvale, CA GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

Janek Bevendorff

12:16 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

...

Thank you. I was looking for config directives that do exactly this all week. Why are they not documented anywhere outside that blog post? I added them as you described and the MDS seems to have stabilized and stays just under 1M inos now. I will continue to monitor it and see if it is working in the long run. Settings like these should be the default IMHO. Clients should never be able to crash the server just by holding onto their capabilities. If a server decides to drop things from its cache, clients must deal with it. Everything else threatens the stability of the system (and may even prevent the MDS from ever starting again, as we saw).

...

Also your other mailings made me think you may still be using the old inode limit for the cache size. Are you using the new mds_cache_memory_limit config option?

No, I am not. I tried it at some point to see if it made things better, but just like the memory cache limit, it seemed to have no effect whatsoever except for delaying the health warning.

...

Finally, if this fixes your issue (please let us know!) and you decide to try multiple active MDS, you should definitely use pinning as the parallel create workload will greatly benefit from it.

I will try that, although I directory tree is quite imbalanced.

Janek Bevendorff

2:38 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

The rsync job has been copying quite happily for two hours now. The good news is that the cache size isn't increasing unboundedly with each request anymore. The bad news is that it still is increasing afterall, though much slower. I am at 3M inodes now and it started off with 900k, settling at 1M initially. I had a peak just now of 3.7M, but it went back down to 3.2M shortly after that. According to the health status, the client has started failing to respond to cache pressure, so it's still not working as reliably as I would like it to. I am also getting this very peculiar message: MDS cache is too large (7GB/19GB); 52686 inodes in use by clients I guess the 53k inodes is the number that is actively in use right now (compared to the 3M for which the client generally holds caps). Is that so? Cache memory is still well within bounds, however. Perhaps the message is triggered by the recall settings and just a bit misleading? On 25/07/2019 09:46, Janek Bevendorff wrote: > >> It's possible the MDS is not being aggressive enough with asking the >> single (?) client to reduce its cache size. There were recent changes >> [1] to the MDS to improve this. However, the defaults may not be >> aggressive enough for your client's workload. Can you try: >> >> ceph config set mds mds_recall_max_caps 10000 >> ceph config set mds mds_recall_max_decay_rate 1.0 > > Thank you. I was looking for config directives that do exactly this > all week. Why are they not documented anywhere outside that blog post? > > I added them as you described and the MDS seems to have stabilized and > stays just under 1M inos now. I will continue to monitor it and see if > it is working in the long run. Settings like these should be the > default IMHO. Clients should never be able to crash the server just by > holding onto their capabilities. If a server decides to drop things > from its cache, clients must deal with it. Everything else threatens > the stability of the system (and may even prevent the MDS from ever > starting again, as we saw). > >> Also your other mailings made me think you may still be using the old >> inode limit for the cache size. Are you using the new >> mds_cache_memory_limit config option? > > No, I am not. I tried it at some point to see if it made things > better, but just like the memory cache limit, it seemed to have no > effect whatsoever except for delaying the health warning. > > >> Finally, if this fixes your issue (please let us know!) and you decide >> to try multiple active MDS, you should definitely use pinning as the >> parallel create workload will greatly benefit from it. > > I will try that, although I directory tree is quite imbalanced. > > _______________________________________________ > Ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io

Patrick Donnelly

11:01 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

On Thu, Jul 25, 2019 at 3:08 AM Janek Bevendorff <janek.bevendorff(a)uni-weimar.de> wrote:

...

Based on that message, it would appear you still have an inode limit in place ("mds_cache_size"). Please unset that config option. Your mds_cache_memory_limit is apparently ~19GB. There is another limit mds_max_caps_per_client (default 1M) which the client is hitting. That's why the MDS is recalling caps from the client and not because any cache memory limit is hit. It is not recommend you increase this. -- Patrick Donnelly, Ph.D. He / Him / His Senior Software Engineer Red Hat Sunnyvale, CA GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

Janek Bevendorff

26 Jul 26 Jul

12:19 a.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

...

Based on that message, it would appear you still have an inode limit in place ("mds_cache_size"). Please unset that config option. Your mds_cache_memory_limit is apparently ~19GB.

No, I do not have an inode limit set. Only the memory limit.

...

There is another limit mds_max_caps_per_client (default 1M) which the client is hitting. That's why the MDS is recalling caps from the client and not because any cache memory limit is hit. It is not recommend you increase this.

Okay, this this setting isn't documented either and I did not change it, but it's also quite clear that it isn't working. My MDS hasn't crashed yet (without the recall settings it would have), but ceph fs status is reporting 14M inodes at this point and the number is slowly going up.

Patrick Donnelly

1:30 a.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

On Thu, Jul 25, 2019 at 12:49 PM Janek Bevendorff <janek.bevendorff(a)uni-weimar.de> wrote:

...

Based on that message, it would appear you still have an inode limit in place ("mds_cache_size"). Please unset that config option. Your mds_cache_memory_limit is apparently ~19GB.

No, I do not have an inode limit set. Only the memory limit.

Can you share two captures of `ceph daemon mds.X perf dump` about 1 second apart. You can also try increasing the aggressiveness of the MDS recall but I'm surprised it's still a problem with the settings I gave you: ceph config set mds mds_recall_max_caps 15000 ceph config set mds mds_recall_max_decay_rate 0.75 -- Patrick Donnelly, Ph.D. He / Him / His Senior Software Engineer Red Hat Sunnyvale, CA GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

Janek Bevendorff

1:44 a.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

Janek Bevendorff

5 Aug 5 Aug

11:51 a.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

Hi,

...

You can also try increasing the aggressiveness of the MDS recall but I'm surprised it's still a problem with the settings I gave you: ceph config set mds mds_recall_max_caps 15000 ceph config set mds mds_recall_max_decay_rate 0.75

I finally had the chance to try the more aggressive recall settings, but they did not change anything. As soon as the client starts copying files again, the numbers go up an I get a health message that the client is failing to respond to cache pressure. After this week of idle time, the dns/inos numbers (what does dns stand for anyway?) settled at around 8000k. That's basically that "idle" number that it goes back to when the client stops copying files. Though, for some weird reason, this number gets (quite) a bit higher every time (last time it was around 960k). Of course, I wouldn't expect it to go back all the way to zero, because that would mean dropping the entire cache for no reason, but it's still quite high and the same after restarting the MDS and all clients, which doesn't make a lot of sense to me. After resuming the copy job, the number went up to 20M in just the time it takes to write this email. There must be a bug somewhere.

...

Can you share two captures of `ceph daemon mds.X perf dump` about 1 second apart.

I attached the requested perf dumps. Thanks!

Patrick Donnelly

11:23 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

On Mon, Aug 5, 2019 at 12:21 AM Janek Bevendorff <janek.bevendorff(a)uni-weimar.de> wrote:

...

Hi,

Can you share two captures of `ceph daemon mds.X perf dump` about 1 second apart.

I attached the requested perf dumps.

Thanks that helps. Looks like the problem is that the MDS is not automatically trimming its cache fast enough. Please try bumping mds_cache_trim_threshold: bin/ceph config set mds mds_cache_trim_threshold 512K Increase it further if it's not aggressive enough. Please let us know if that helps. It shouldn't be necessary to do this so I'll make a tracker ticket once we confirm that's the issue. -- Patrick Donnelly, Ph.D. He / Him / His Senior Software Engineer Red Hat Sunnyvale, CA GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

Janek Bevendorff

6 Aug 6 Aug

12:07 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

...

That did help. Somewhat. I removed the aggressive recall settings I set before and only set this option instead. The cache size seems to be quite stable now, although still increasing in the long run (but at least not strictly monotonically). However, now my client processes are basically in constant I/O wait state and the CephFS is slow for everybody. After I restarted the copy job, I got around 4k reqs/s and then it went down to 100 reqs/s with everybody waiting their turn. So yes, it does seem to help, but it increases latency by a magnitude. As always, it would be great if these options were documented somewhere. Google has like five results, one of them being this thread. ;-) > Increase it further if it's not aggressive enough. Please let us know > if that helps. > > It shouldn't be necessary to do this so I'll make a tracker ticket > once we confirm that's the issue.

Janek Bevendorff

12:18 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

...

However, now my client processes are basically in constant I/O wait state and the CephFS is slow for everybody. After I restarted the copy job, I got around 4k reqs/s and then it went down to 100 reqs/s with everybody waiting their turn. So yes, it does seem to help, but it increases latency by a magnitude.

Addition: I reduced the number to 256K and the cache size started inflating instantly (with about 140 reqs/s). So I reset it to 512K and the cache size started reducing slowly, though with fewer reqs/s. So I guess it is solving the problem, but only by trading it off against severe latency issues (order of magnitude as we saw).

Patrick Donnelly

6:39 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

On Tue, Aug 6, 2019 at 12:48 AM Janek Bevendorff <janek.bevendorff(a)uni-weimar.de> wrote:

...

> However, now my client processes are basically in constant I/O wait > state and the CephFS is slow for everybody. After I restarted the copy > job, I got around 4k reqs/s and then it went down to 100 reqs/s with > everybody waiting their turn. So yes, it does seem to help, but it > increases latency by a magnitude.

4k req/s is too fast for a create workload on one MDS. That must include other operations like getattr.

...

I wouldn't expect such extreme latency issues. Please share: ceph config dump ceph daemon mds.X cache status and the two perf dumps one second apart again please. Also, you said you removed the aggressive recall changes. I assume you didn't reset them to the defaults, right? Just the first suggested change (10k/1.0)? -- Patrick Donnelly, Ph.D. He / Him / His Senior Software Engineer Red Hat Sunnyvale, CA GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

Janek Bevendorff

7:27 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

...

4k req/s is too fast for a create workload on one MDS. That must include other operations like getattr.

That is rsync going through millions of files checking which ones need updating. Right now there are not actually any create operations, since I restarted the copy job.

...

I wouldn't expect such extreme latency issues. Please share: ceph config dump ceph daemon mds.X cache status

Config dump: https://pastebin.com/1jTrjzA9 Cache status: { "pool": { "items": 127688932, "bytes": 20401092561 } }

...

and the two perf dumps one second apart again please.

Perf dump 1: https://pastebin.com/US3y6JEJ Perf dump 2: https://pastebin.com/Mm02puje

...

Also, you said you removed the aggressive recall changes. I assume you didn't reset them to the defaults, right? Just the first suggested change (10k/1.0)?

Either seems to work. I added two more MDSs to split the workload and got a steady 150 reqs/s after that. Then I noticed that I still had a max segments settings from one of my earlier attempts at fixing the cache runaway issue and after removing that, I got 250-500 reqs/s, sometimes up to 1.5k (per MDS). However, to generate the dumps for you, I changed my max_mds setting back to 1 and reqs/s went down to 80. After re-adding the two active MDSs again, I am back at higher numbers, although not quite as much as before. But I think to remember that it took several minutes if not more until all MDSs received approximately equal load the last time.

Patrick Donnelly

9:20 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

On Tue, Aug 6, 2019 at 7:57 AM Janek Bevendorff <janek.bevendorff(a)uni-weimar.de> wrote:

...

4k req/s is too fast for a create workload on one MDS. That must include other operations like getattr.

That is rsync going through millions of files checking which ones need updating. Right now there are not actually any create operations, since I restarted the copy job.

Your parallel rsync job is only getting 150 creates per second? What was the previous throughput?

...

I wouldn't expect such extreme latency issues. Please share: ceph config dump ceph daemon mds.X cache status

Config dump: https://pastebin.com/1jTrjzA9 Cache status: { "pool": { "items": 127688932, "bytes": 20401092561 } }

and the two perf dumps one second apart again please.

Perf dump 1: https://pastebin.com/US3y6JEJ Perf dump 2: https://pastebin.com/Mm02puje

The cache size looks correct here.

...

Also, you said you removed the aggressive recall changes. I assume you didn't reset them to the defaults, right? Just the first suggested change (10k/1.0)?

Okay, so you're getting a more normal throughput for parallel creates on a single MDS.

...

However, to generate the dumps for you, I changed my max_mds setting back to 1 and reqs/s went down to 80. After re-adding the two active MDSs again, I am back at higher numbers, although not quite as much as before. But I think to remember that it took several minutes if not more until all MDSs received approximately equal load the last time.

Try pinning if possible in each parallel rsync job. Here are tracker tickets to resolve the issues you encountered: https://tracker.ceph.com/issues/41140 https://tracker.ceph.com/issues/41141 -- Patrick Donnelly, Ph.D. He / Him / His Senior Software Engineer Red Hat Sunnyvale, CA GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

Janek Bevendorff

10:22 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

...

Your parallel rsync job is only getting 150 creates per second? What was the previous throughput?

I am actually not quite sure what the exact throughput was or is or what I can expect. It varies so much. I am copying from a 23GB file list that is split into 3000 chunks which are then processed by 16-24 parallel rsync processes. I have copied 27 of 64TB so far (according to df -h) and to my taste it's taking a lot longer than it should be doing. The main problem here is not that I'm trying to copy 64TB (drop in the bucket), the problem is that it's 64TB in tiny, small, and medium-sized files. This whole MDS mess and several pauses and restarts in between have completely distorted my sense of how far in the process I actually am or how fast I would expect it to go. Right now it's starting again from the beginning, so I expect it'll be another day or so until it starts moving some real data again.

...

The cache size looks correct here.

Yeah. Cache appears to be constant-size now. I am still getting occasional "client failing to respond to cache pressure", but that goes away as fast as it came.

...

Try pinning if possible in each parallel rsync job.

I was considering that, but couldn't come up with a feasible pinning strategy. We have all those files of very different sizes spread very unevenly across a handful of top-level directories. I get the impression that I couldn't do much (or any) better than the automatic balancer.

...

Here are tracker tickets to resolve the issues you encountered: https://tracker.ceph.com/issues/41140 https://tracker.ceph.com/issues/41141

Thanks a lot!

Janek Bevendorff

12 Aug 12 Aug

2:07 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

I've been copying happily for days now (not very fast, but the MDS were stable), but eventually the MDSs started flapping again due to large cache sizes (they are being killed after 11M inodes). I could solve the problem by temporarily increasing the cache size in order to allow them to rejoin, but it tells me that my settings do not fully solve the problem yet (unless perhaps I increase the trim threshold even further. On 06.08.19 19:52, Janek Bevendorff wrote: >> Your parallel rsync job is only getting 150 creates per second? What >> was the previous throughput? > I am actually not quite sure what the exact throughput was or is or what > I can expect. It varies so much. I am copying from a 23GB file list that > is split into 3000 chunks which are then processed by 16-24 parallel > rsync processes. I have copied 27 of 64TB so far (according to df -h) > and to my taste it's taking a lot longer than it should be doing. The > main problem here is not that I'm trying to copy 64TB (drop in the > bucket), the problem is that it's 64TB in tiny, small, and medium-sized > files. > > This whole MDS mess and several pauses and restarts in between have > completely distorted my sense of how far in the process I actually am or > how fast I would expect it to go. Right now it's starting again from the > beginning, so I expect it'll be another day or so until it starts moving > some real data again. > >> The cache size looks correct here. > Yeah. Cache appears to be constant-size now. I am still getting > occasional "client failing to respond to cache pressure", but that goes > away as fast as it came. > > >> Try pinning if possible in each parallel rsync job. > I was considering that, but couldn't come up with a feasible pinning > strategy. We have all those files of very different sizes spread very > unevenly across a handful of top-level directories. I get the impression > that I couldn't do much (or any) better than the automatic balancer. > > >> Here are tracker tickets to resolve the issues you encountered: >> >> https://tracker.ceph.com/issues/41140 >> https://tracker.ceph.com/issues/41141 > Thanks a lot! > _______________________________________________ > Ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io

Patrick Donnelly

29 Aug 29 Aug

10:28 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

Hi Janek, On Tue, Aug 6, 2019 at 11:25 AM Janek Bevendorff <janek.bevendorff(a)uni-weimar.de> wrote:

...

> Here are tracker tickets to resolve the issues you encountered: > > https://tracker.ceph.com/issues/41140 > https://tracker.ceph.com/issues/41141

The fix has been merged into master and will be backported soon. I've also done testing in a large cluster to confirm the issue you found. Using multiple processes to create files as fast as possible in a single client reliably reproduced the issue. The MDS cannot recall capabilities fast enough when the internal upkeep thread ran every 5 seconds. Moving the cache trimming and capability recall to a separate thread running every second resolved the issue. -- Patrick Donnelly, Ph.D. He / Him / His Senior Software Engineer Red Hat Sunnyvale, CA GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

Janek Bevendorff

30 Aug 30 Aug

11:57 a.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

...

The fix has been merged into master and will be backported soon.

Amazing, thanks! > I've > also done testing in a large cluster to confirm the issue you found. > Using multiple processes to create files as fast as possible in a > single client reliably reproduced the issue. The MDS cannot recall > capabilities fast enough when the internal upkeep thread ran every 5 > seconds. Moving the cache trimming and capability recall to a separate > thread running every second resolved the issue. > > -- > Patrick Donnelly, Ph.D. > He / Him / His > Senior Software Engineer > Red Hat Sunnyvale, CA > GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

Janek Bevendorff

5 Dec 5 Dec

10 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

I had similar issues again today. Some users were trying to train a neural network on several million files resulting in enormous cache sizes. Due to my custom cap recall and decay rate settings, the MDSs were able to withstand the load for quite some time, but at some point the active rank crashed taking the whole CephFS down. As usual, the MDS were playing round-robin Russian roulette trying to recover the cache only to be killed by MONs after some time. I tried increasing the beacon grace time, but it didn't help, MONs were still kicking MDSs after what seemed like a random timeout. Even with the setting to wipe the MDS cache on startup, the CephFS was unable to recover. I had to manually delete the mds0_openfiles.* objects from the CephFS metadata pool of which I had a total of 9. Only then was I able to get the MDS back into a working state. I know there are some unreleased patches to improve the MDS behaviour as a result of this thread. Is there any timeline for when those will be available? This issue is rather critical. What I need is a faster cap recall (which got fixed I think, but hasn't been released so far) as well as probably some kind of hard limit after which a client has to release file handles. Right now it can still continue requesting new caps while the MDS is failing to evict sufficiently many old ones. Also, the must be a way to inspect the MDS cache and retire individual clients even when none of the ranks are active (because they are failing to rejoin). Or at least give us a setting for a clean MDS start similar to what deleting the mds0_openfiles objects does. Cheers Janek On 30/08/2019 09:27, Janek Bevendorff wrote: > >> The fix has been merged into master and will be backported soon. > > Amazing, thanks! > > >> I've >> also done testing in a large cluster to confirm the issue you found. >> Using multiple processes to create files as fast as possible in a >> single client reliably reproduced the issue. The MDS cannot recall >> capabilities fast enough when the internal upkeep thread ran every 5 >> seconds. Moving the cache trimming and capability recall to a separate >> thread running every second resolved the issue. >> >> -- >> Patrick Donnelly, Ph.D. >> He / Him / His >> Senior Software Engineer >> Red Hat Sunnyvale, CA >> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io

Patrick Donnelly

10:22 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

On Thu, Dec 5, 2019 at 10:31 AM Janek Bevendorff <janek.bevendorff(a)uni-weimar.de> wrote:

...

You set mds_beacon_grace ?

...

Even with the setting to wipe the MDS cache on startup, the CephFS was unable to recover. I had to manually delete the mds0_openfiles.* objects from the CephFS metadata pool of which I had a total of 9. Only then was I able to get the MDS back into a working state.

Yes, this optimization is having some struggles with large cache sizes (ironically). Luckily, nuking the open file objects is harmless...

...

I know there are some unreleased patches to improve the MDS behaviour as a result of this thread. Is there any timeline for when those will be available?

14.2.5: https://tracker.ceph.com/issues/41467

...

This issue is rather critical. What I need is a faster cap recall (which got fixed I think, but hasn't been released so far) as well as probably some kind of hard limit after which a client has to release file handles.

MDS will soon be more aggressive about recalling caps from idle sessions, which may help: https://tracker.ceph.com/issues/41865 That'll make 14.2.6 probably. -- Patrick Donnelly, Ph.D. He / Him / His Senior Software Engineer Red Hat Sunnyvale, CA GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

Janek Bevendorff

6 Dec 6 Dec

12:07 a.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

...

You set mds_beacon_grace ?

Yes, as I said. It seemed to have no effect or at least none that I could see. The kick timeout seemed random after all. I even set it to something ridiculous like 1800 and the MDS were still timed out. Sometimes they got to 20M inodes, sometimes only to a few 100k. The ones that got further often reported slow metadata operations, the less lucky ones unhealthy MDS beacons. But none lasted for a full 1800s.

...

Yes, this optimization is having some struggles with large cache sizes

:-(

Janek Bevendorff

17 Dec 17 Dec

12:26 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

Hey Patrick, I just wanted to give you some feedback about how 14.2.5 is working for me. I've had the chance to test it for a day now and overall, the experience is much better, although not perfect (perhaps far from it). I have two active MDS (I figured that'd spread the meta data load a little and seems to work pretty well for me). After the upgrade to the new release, I removed all special recall settings, so my MDS config is basically on default. The only thing I set is a mds_max_caps_per_client of 200k, a mds_cache_reservation of 0.1 and 40G of mds_cache_memory_limit. Right now, everything seems to be running smoothly, although I notice that the max cap setting isn't fully honoured. The overall cache size seems fairly constant at 15M (for mds.0, mds.1 a little less), but the client cap count can easily exceed 10M if I run something like `find` on a large directory. We have one particularly problematic folder containing about 400 sub folders holding a total of about 35M files among them. My first attempts at running `find -type d` on those had the weird effect that after pretty much exactly 2M caps, mds.1 got killed and replaced by a standby. Fortunately, the standby managed to take over in a matter of seconds (sometimes up to a few minutes) resetting the cap count to about 5k. The same thing then happened again once the new MDS reached the magical 2M caps. I would suppose that this is still the same problem as before, but with the huge improvement that the take-over standby MDS can actually recover. Previously, it would just die the same way after a minute or two of futile recovery attempts and the FS would be down indefinitely until I delete the openfiles object. Right now, I cannot reproduce the crash any more---the caps to surge to 10-15M, but no crash. However, I keep seeing the dreaded "client failing to respond to cache pressure" message occasionally. So far, though, the MDS have been able to keep up and reduce the number of caps after about 15M, though, so that the message disappears after a while and the cap count growth isn't entirely unbounded. I ran a `find -type d` on the most problematic folder and attached two perf dumps for you (current cap count on the client: 14660568): https://pastebin.com/W2dVJiW0 https://pastebin.com/pzQ5uQQ3 Cheers Janek P.S. Just as I was finishing this email, the rank 0 MDS actually crashed. Unfortunately, I didn't have increased debug levels enabled, so its death note is rather uninformative: 2019-12-17 09:42:12.325 7f7633dde700 1 mds.deltaweb011 Updating MDS map to version 103112 from mon.3 2019-12-17 09:43:27.774 7f7633dde700 1 mds.deltaweb011 Updating MDS map to version 103113 from mon.3 2019-12-17 09:43:40.086 7f7633dde700 1 mds.deltaweb011 Updating MDS map to version 103114 from mon.3 2019-12-17 09:44:46.203 7f7633dde700 -1 *** Caught signal (Aborted) ** in thread 7f7633dde700 thread_name:ms_dispatch Also, this time around the recovery appears to be a lot more problematic, so I'm afraid I have to apply the previous procedure again of deleting the openfiles object to get it back up. I don't think my `find` alone would have crashed the MDS, but if another client is doing similar things at the same time, it overloads the MDS.

Stefan Kooman

4:36 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

Hi Janek, Quoting Janek Bevendorff (janek.bevendorff(a)uni-weimar.de):

...

Have you already tried to adjust the "mds_cache_memory_limit" and or "ceph tell mds.* cache drop"? I really wonder how the MDS copes with that with milions of CAPS. Gr. Stefan -- | BIT BV https://www.bit.nl/ Kamer van Koophandel 09090351 | GPG: 0xD14839C6 +31 318 648 688 / info(a)bit.nl

Janek Bevendorff

4:41 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

...

Have you already tried to adjust the "mds_cache_memory_limit" and or "ceph tell mds.* cache drop"? I really wonder how the MDS copes with that with milions of CAPS.

I played with the cache size, yeah. I kind of need a large cache, otherwise everything is just slow and I'm constantly getting cache size warnings. ceph tell mds.* doesn't work when the FS is degraded.

Janek Bevendorff

6 Jan 6 Jan

2:35 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

Hi, my MDS failed again, but this time I cannot recover it by deleting the mds*_openfiles .0 object. The startup behaviour is also different. Both inode count and cache size stay at zero while the MDS is replaying. When I set the MDS log level to 7, I get tons of these messages: 2020-01-06 11:59:49.303 7f30149e4700 7 mds.1.cache current root is [dir 0x10000073682 /XXX/XXX/ [2,head] auth v=5527265 cv=0/0 dir_auth=1 state=1073741824 f(v0 m2019-08-14 16:39:17.790395 4=1+3) n(v84855 rc2019-09-17 08:54:57.569803 b3226894326662 5255834=4707755+548079) hs=1+0,ss=0+0 | child=1 subtree=1 0x5608a02e7900] 2020-01-06 11:59:49.323 7f30149e4700 7 mds.1.cache adjust_subtree_auth -1,-2 -> -2,-2 on [dir 0x1000ae4a784 /XXX/XXX/ [2,head] auth v=114 cv=0/0 state=1073741824 f(v0 m2019-08-23 05:07:32.658490 9=9+0) n(v1 rc2019-09-16 15:51:58.418555 b21646377 9=9+0) hs=0+0,ss=0+0 0x5608c602cd00] 2020-01-06 11:59:49.323 7f30149e4700 7 mds.1.cache current root is [dir 0x10000073682 /XXX/XXX/ [2,head] auth v=5527265 cv=0/0 dir_auth=1 state=1073741824 f(v0 m2019-08-14 16:39:17.790395 4=1+3) n(v84855 rc2019-09-17 08:54:57.569803 b3226894326662 5255834=4707755+548079) hs=1+0,ss=0+0 | child=1 subtree=1 0x5608a02e7900] 2020-01-06 11:59:49.343 7f30149e4700 7 mds.1.cache adjust_subtree_auth -1,-2 -> -2,-2 on [dir 0x1000ae4a78b /XXX/XXX/ [2,head] auth v=102 cv=0/0 state=1073741824 f(v0 m2019-08-23 05:07:35.046498 9=9+0) n(v1 rc2019-09-16 15:51:58.478556 b1430317 9=9+0) hs=0+0,ss=0+0 0x5608c602d200] 2020-01-06 11:59:49.343 7f30149e4700 7 mds.1.cache current root is [dir 0x10000073682 /XXX/XXX/ [2,head] auth v=5527265 cv=0/0 dir_auth=1 state=1073741824 f(v0 m2019-08-14 16:39:17.790395 4=1+3) n(v84855 rc2019-09-17 08:54:57.569803 b3226894326662 5255834=4707755+548079) hs=1+0,ss=0+0 | child=1 subtree=1 0x5608a02e7900] 2020-01-06 11:59:49.363 7f30149e4700 7 mds.1.cache adjust_subtree_auth -1,-2 -> -2,-2 on [dir 0x1000ae4a78e /XXX/XXX/ [2,head] auth v=91 cv=0/0 state=1073741824 f(v0 m2019-08-23 05:07:38.986513 8=8+0) n(v1 rc2019-09-16 15:51:58.498556 b1932614 8=8+0) hs=0+0,ss=0+0 0x5608c602d700] 2020-01-06 11:59:49.363 7f30149e4700 7 mds.1.cache current root is [dir 0x10000073682 /XXX/XXX/ [2,head] auth v=5527265 cv=0/0 dir_auth=1 state=1073741824 f(v0 m2019-08-14 16:39:17.790395 4=1+3) n(v84855 rc2019-09-17 08:54:57.569803 b3226894326662 5255834=4707755+548079) hs=1+0,ss=0+0 | child=1 subtree=1 0x5608a02e7900] Is there any way I can recover the MDS? I tried wiping sessions on startup etc., but nothing worked. Thanks

Janek Bevendorff

4:02 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

Update: turns out I just had to wait for an hour. The MDSs were sending Beacons regularly, so the MONs didn't try to kill them and instead let them finish doing whatever they were doing. Unlike the other bug where the number of open files outgrows what the MDS can handle, this incident allowed "self-healing", but I still consider this a severe bug. On 06/01/2020 12:05, Janek Bevendorff wrote: > Hi, my MDS failed again, but this time I cannot recover it by deleting > the mds*_openfiles .0 object. The startup behaviour is also different. > Both inode count and cache size stay at zero while the MDS is replaying. > > When I set the MDS log level to 7, I get tons of these messages: > > 2020-01-06 11:59:49.303 7f30149e4700 7 mds.1.cache current root is > [dir 0x10000073682 /XXX/XXX/ [2,head] auth v=5527265 cv=0/0 dir_auth=1 > state=1073741824 f(v0 m2019-08-14 16:39:17.790395 4=1+3) n(v84855 > rc2019-09-17 08:54:57.569803 b3226894326662 5255834=4707755+548079) > hs=1+0,ss=0+0 | child=1 subtree=1 0x5608a02e7900] > 2020-01-06 11:59:49.323 7f30149e4700 7 mds.1.cache adjust_subtree_auth > -1,-2 -> -2,-2 on [dir 0x1000ae4a784 /XXX/XXX/ [2,head] auth v=114 > cv=0/0 state=1073741824 f(v0 m2019-08-23 05:07:32.658490 9=9+0) n(v1 > rc2019-09-16 15:51:58.418555 b21646377 9=9+0) hs=0+0,ss=0+0 0x5608c602cd00] > 2020-01-06 11:59:49.323 7f30149e4700 7 mds.1.cache current root is > [dir 0x10000073682 /XXX/XXX/ [2,head] auth v=5527265 cv=0/0 dir_auth=1 > state=1073741824 f(v0 m2019-08-14 16:39:17.790395 4=1+3) n(v84855 > rc2019-09-17 08:54:57.569803 b3226894326662 5255834=4707755+548079) > hs=1+0,ss=0+0 | child=1 subtree=1 0x5608a02e7900] > 2020-01-06 11:59:49.343 7f30149e4700 7 mds.1.cache adjust_subtree_auth > -1,-2 -> -2,-2 on [dir 0x1000ae4a78b /XXX/XXX/ [2,head] auth v=102 > cv=0/0 state=1073741824 f(v0 m2019-08-23 05:07:35.046498 9=9+0) n(v1 > rc2019-09-16 15:51:58.478556 b1430317 9=9+0) hs=0+0,ss=0+0 0x5608c602d200] > 2020-01-06 11:59:49.343 7f30149e4700 7 mds.1.cache current root is > [dir 0x10000073682 /XXX/XXX/ [2,head] auth v=5527265 cv=0/0 dir_auth=1 > state=1073741824 f(v0 m2019-08-14 16:39:17.790395 4=1+3) n(v84855 > rc2019-09-17 08:54:57.569803 b3226894326662 5255834=4707755+548079) > hs=1+0,ss=0+0 | child=1 subtree=1 0x5608a02e7900] > 2020-01-06 11:59:49.363 7f30149e4700 7 mds.1.cache adjust_subtree_auth > -1,-2 -> -2,-2 on [dir 0x1000ae4a78e /XXX/XXX/ [2,head] auth v=91 cv=0/0 > state=1073741824 f(v0 m2019-08-23 05:07:38.986513 8=8+0) n(v1 > rc2019-09-16 15:51:58.498556 b1932614 8=8+0) hs=0+0,ss=0+0 0x5608c602d700] > 2020-01-06 11:59:49.363 7f30149e4700 7 mds.1.cache current root is > [dir 0x10000073682 /XXX/XXX/ [2,head] auth v=5527265 cv=0/0 dir_auth=1 > state=1073741824 f(v0 m2019-08-14 16:39:17.790395 4=1+3) n(v84855 > rc2019-09-17 08:54:57.569803 b3226894326662 5255834=4707755+548079) > hs=1+0,ss=0+0 | child=1 subtree=1 0x5608a02e7900] > > Is there any way I can recover the MDS? I tried wiping sessions on > startup etc., but nothing worked. > > Thanks > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io

Stefan Kooman

7 Jan 7 Jan

5:13 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

Quoting Janek Bevendorff (janek.bevendorff(a)uni-weimar.de):

...

Just to get this straight : was your fs offline during this time? Do you have any idea why it was busy trimming it's cache (because that was wat is was doing, right?). Gr. Stefan -- | BIT BV https://www.bit.nl/ Kamer van Koophandel 09090351 | GPG: 0xD14839C6 +31 318 648 688 / info(a)bit.nl

janek.bevendorff＠uni-weimar.de

10:07 p.m.

New subject: [Ceph-users] Re: MDS failing under load with large cache sizes

1565

days inactive

1734

days old

ceph-users@ceph.io

Manage subscription

33 comments

4 participants

tags (0)

participants (4)

Janek Bevendorff
janek.bevendorff＠uni-weimar.de
Patrick Donnelly
Stefan Kooman