Hi Robert,
Thanks for your reply. These are actually the settings I found in the cases I referred to as
"other cases" in my mail, and they could be a first step. Looking at the
documentation, solving the overload problem might require some of the QoS settings described
below "osd op queue":
https://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/#opera… .
I see some possibilities, but I'm not sure how to use these settings to enforce
load-dependent rate limiting on clients. As far as I can see, the IOPS-based QoS does not
take backlog into account, which would be important for distinguishing a burst from a
sustained overload. In addition, it requires the mClock scheduler, which is labelled
experimental.
If anyone could shed some light on what possibilities currently exist beyond playing with
"osd op queue" and "osd op queue cut off", that would be great. I would also be glad to
hear of any practical experience with this problem.
For example, would reducing "osd client op priority" have any effect? As far as I can see,
this setting only weights recovery IO against client IO; it does not prioritize IO already
in flight over new client OPs.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Robert LeBlanc <robert(a)leblancnet.us>
Sent: 23 August 2019 17:28
To: Frank Schilder
Cc: ceph-users
Subject: Re: [ceph-users] ceph fs crashes on simple fio test
The WPQ scheduler may help your clients back off when things get busy.
Put this in your ceph.conf and restart your OSDs.
osd op queue = wpq
osd op queue cut off = high
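
For reference, a minimal ceph.conf fragment applying the two settings above (a sketch;
option names may equivalently be written with underscores, e.g. osd_op_queue, and both
options take effect only after an OSD restart):

```ini
[osd]
; Weighted priority queue: dequeues ops in proportion to their
; priority, so low-priority work is throttled rather than starved.
osd op queue = wpq
; With "high", only the highest-priority ops (e.g. peering) use the
; strict subqueue; client and replication ops go through WPQ and are
; weighted fairly, which helps clients back off under load.
osd op queue cut off = high
```

After restarting, the active value can be checked on an OSD host with the admin socket,
e.g. `ceph daemon osd.0 config get osd_op_queue` (assuming osd.0 runs on that host).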
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1