On Tue, Sep 10, 2019 at 1:11 PM Frank Schilder <frans@dtu.dk> wrote:
Hi Robert,
I have metadata on SSD (3x replicated) and data on 8+2 EC on spinning disks, so
the speed difference is orders of magnitude. Our usage is quite metadata
heavy, so this suits us well, in particular since EC pools give high
throughput with large IO sizes.
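Roughly, the layout looks like this (pool names, PG counts and the exact
commands below are illustrative, not copied from our cluster; wiring the EC
pool into CephFS as a data pool is omitted):

    ceph osd erasure-code-profile set ec-8-2 k=8 m=2 crush-device-class=hdd
    ceph osd pool create fs-data 1024 1024 erasure ec-8-2
    ceph osd pool set fs-data allow_ec_overwrites true   # needed for CephFS data on EC
    ceph osd crush rule create-replicated ssd-only default host ssd
    ceph osd pool create fs-meta 128 128 replicated ssd-only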
As long as one uses fio with direct=1 (probably also with sync=1
and/or fsync=1), everything is fine and behaves as you describe. IOPS
fluctuate but adjust to media speed. No problems at all.
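For illustration, a direct job along these lines behaves fine (path, size and
queue depth are placeholders, not the exact job I ran):

    fio --name=direct-write --directory=/mnt/cephfs/fiotest --rw=randwrite \
        --bs=4k --size=4G --ioengine=libaio --iodepth=16 --direct=1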
As mentioned in my last update (trimmed from the quote below), the destructive fio
command runs with direct=0 and neither sync=1 nor fsync=1. This test just
writes as fast as it can (to buffers) without waiting for acks. I would have
expected a Ceph client to translate that into synced or direct IO,
which would be fine.
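The problematic pattern is roughly this kind of job (again placeholders, not
the exact command):

    fio --name=buffered-write --directory=/mnt/cephfs/fiotest --rw=write \
        --bs=4k --size=16G --numjobs=8 --direct=0
    # no sync=1/fsync=1: purely buffered writes, acked from the page cache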
But it doesn't. Instead, it also pushes the IO to the cluster as fast as
possible. I have seen 40k write ops/s on the EC pool (100+ HDDs), which can
handle maybe 1k write ops/s in total. The queues were growing at
an incredible rate (several hundred ops per second). I hope that with the change
to cut_off=high heartbeats will no longer get lost, but this will
still destabilize our Ceph cluster quite dramatically.
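(For reference, the queue build-up can be watched on the OSD nodes with
something like this; osd.12 and the pool name are just examples:)

    ceph daemon osd.12 dump_ops_in_flight | grep num_ops
    ceph daemon osd.12 dump_historic_ops
    ceph osd pool stats fs-data   # per-pool client IO rates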
Changing the cut_off to high is not what keeps heartbeats from getting lost
(heartbeats have a priority far above the high mark). What cut_off = high
does is put replication ops into the main queue instead of the strict
priority queue. That way an OSD doesn't get DDoSed by its peers to the point
where it can never service its own clients.
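For reference, the setting can be applied roughly like this (option spelling
as in the osd config reference; it typically only takes effect after an OSD
restart, so check your version's docs):

    # ceph.conf, [osd] section:
    osd op queue cut off = high

    # or via the monitors' config database:
    ceph config set osd osd_op_queue_cut_off high
    ceph daemon osd.0 config get osd_op_queue_cut_off   # verify on a running OSD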
When I did my fio testing, it was on Firefly/Hammer and on RBD, so I can't
speak specifically to newer versions and CephFS. We haven't had time to set
up our test cluster, so I can't run benchmarks at the moment.
My problem is not so much that such an IO pattern could occur in reasonable
software, but rather
- that someone might try it just for fun, and
- that our 500+ clients might occasionally produce such a workload in
aggregate.
I find it somewhat alarming that a storage system that promises data
integrity and reliability can be taken down with a publicly available
benchmark tool in a matter of a few dozen seconds by ordinary users,
potentially with damaging effects. I guess something similar could be
achieved with a modified rogue client.
I would expect that a storage cluster should have basic self-defence
mechanisms that prevent this kind of overload or DOS attack by throttling
clients with crazy IO requests. Are there any settings that can be enabled
to prevent this from happening?
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1