High should be the default with WPQ.
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
On Mon, Aug 26, 2019 at 10:44 AM Paul Emmerich <paul.emmerich(a)croit.io>
wrote:
WPQ has been the default queue for quite some time now (Luminous?).
However, the default cut off is low. I remember changing this to high in
some early Jewel (or Kraken?) version, and it helped a lot with the only
cluster we had back then.
We've been running all of our clusters with cut off high since then. Is
there any reason why this isn't the default?
Paul
On Mon, Aug 26, 2019 at 6:21 PM Robert LeBlanc <robert(a)leblancnet.us>
wrote:
Frank,
I wrote the WPQ and the cut off code because the only scheduler at the
time was not servicing other priorities under extreme load. The default op
scheduler prioritized replication ops in the strict queue, which meant that
as long as there were any replication ops from other OSDs, no client or
backfill ops would be serviced. Once the strict queue was empty it would
start dequeuing client ops, but the way the token bucket code worked, it
would drain the client queue quickly and then start running
backfill/recovery ops, which didn't drain that bucket as fast. This did not
sit well with our VMs with heavy write loads.
I wrote WPQ to dequeue each op priority based on the weight of the op
rather than through the token bucket queue, and showed that it dequeued ops
in proportion to their priority. It meant that sometimes a higher priority
op would be blocked to run a lower priority op, but no queue was ever
starved from dequeuing an op like before. An op that had twice the priority
of another op had twice the probability of being dequeued. The op scheduler
in Ceph actually consists of two queues, a strict priority queue and a
TB/WPQ queue. The cut off refers to the op priority number that separates
the strict priority queue from the WPQ or default token bucket. By setting
it to high, you are telling Ceph to include the replication ops in the
token bucket or WPQ rather than the strict queue, and to allow only very
small ops that don't require disk access into the strict priority queue
(heartbeats, Mon messages, OSD messages, etc.), so that all the slow work
is prioritized by the WPQ/TB queue.
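The two-queue arrangement and the "twice the priority, twice the
probability" behaviour can be sketched as a toy model. This is not Ceph's
implementation: the class name TwoTierQueue, the cutoff value of 100 and the
priorities 10, 20 and 200 are all made up for illustration, and the real WPQ
takes more into account than this sketch does.

```python
import random
from collections import defaultdict, deque


class TwoTierQueue:
    """Toy model: a strict priority queue in front of a weighted queue."""

    def __init__(self, cutoff, seed=0):
        self.cutoff = cutoff                # priorities >= cutoff are strict
        self.strict = defaultdict(deque)    # priority -> FIFO of ops
        self.weighted = defaultdict(deque)  # priority -> FIFO of ops
        self.rng = random.Random(seed)      # seeded so runs are repeatable

    def enqueue(self, priority, op):
        # Assumes priority > 0, since it doubles as the selection weight.
        tier = self.strict if priority >= self.cutoff else self.weighted
        tier[priority].append(op)

    def dequeue(self):
        # Strict tier: always serve the highest priority that has work.
        if self.strict:
            return self._pop(self.strict, max(self.strict))
        if not self.weighted:
            return None
        # Weighted tier: pick a priority class with probability
        # proportional to its priority, so no class is starved.
        prios = sorted(self.weighted)
        prio = self.rng.choices(prios, weights=prios, k=1)[0]
        return self._pop(self.weighted, prio)

    @staticmethod
    def _pop(tier, prio):
        op = tier[prio].popleft()
        if not tier[prio]:
            del tier[prio]  # drop empty classes so they are never picked
        return op


def simulate(n=30000):
    """Count what gets served while both classes stay backlogged."""
    q = TwoTierQueue(cutoff=100)
    for _ in range(n):
        q.enqueue(10, "backfill")  # hypothetical low-priority class
        q.enqueue(20, "client")    # hypothetical class with twice the priority
    counts = {"backfill": 0, "client": 0}
    for _ in range(n // 2):
        counts[q.dequeue()] += 1
    return counts


if __name__ == "__main__":
    print(simulate())
```

In this model the client class should be served roughly twice as often as
the backfill class, while anything enqueued at or above the cutoff (say a
heartbeat at priority 200) is always served first.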
With this, we found that we didn't need QoS, as all clients now got a fair
share of I/O instead of some clients being 'lucky' to land on a non-busy
OSD and send many rep ops to a busy OSD that could only service replication
ops and never any client ops. I also found that op priorities worked as
expected. We could raise the number of backfill operations on an OSD and it
would negligibly impact clients, as backfill used only idle capacity and
client traffic was prioritized. I assume that if you change the op priority
of the different classes of ops, it would work more predictably with WPQ,
but I don't think that you can change it on the fly; it would require an
OSD reboot, which I could not do at the time I tried.
WPQ did not prevent all blocked I/O, but what it did was prevent any single
client from being blocked indefinitely. I saw latencies become very tight
across all clients: instead of some clients having very good latency and
others extremely poor latency, each client had statistically the same
latency. No longer was the cluster limited by the slowest drive in the
cluster. The OSD with the slow drive would now execute client ops, sending
rep ops to other OSDs and helping to generate more load on a less loaded
OSD, which would then possibly reduce the load on the overloaded OSD
(because now the idle OSD had work to do other than just servicing client
ops). This allowed the cluster to appropriately throttle clients by
increasing latency on all clients in a more uniform manner. It also allows
the cluster to achieve 100% utilization at the same time.
WPQ was planned to be the default scheduler, but I left the company I was
working for shortly after getting it merged, and my new company wasn't
doing object storage, so I wasn't there to see it become the default. I'm
now at a new company, again working with Ceph, and have made it the default
on our two large production clusters with great success. The client
latencies and backfill pain that my co-workers experienced on a daily basis
have all been alleviated since moving to WPQ.
Honestly, WPQ may do what you need without having to try to configure
QoS.
On Sat, Aug 24, 2019 at 2:08 AM Frank Schilder <frans(a)dtu.dk> wrote:
>
> Hi Robert,
>
> thanks for your reply. These are actually settings I found in the cases I
> referred to as "other cases" in my mail. These settings could be a first
> step. Looking at the documentation, solving the overload problem might
> require some QoS settings I found below the description of "osd op queue":
> https://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/#opera…
>
> I see some possibilities, but I'm not sure how to use these settings to
> enforce load-dependent rate limiting on clients. As far as I can see,
> IOPs QoS does not take backlog into account, which would be important for
> distinguishing a burst from a sustained overload. In addition, this
> requires mClock, which is labelled experimental.
>
> If anyone could shed some light on what possibilities currently exist
> beyond playing with "osd op queue" and "osd op queue cut off", that would
> be great. Also, if there is some experience out there with this problem, I
> would like to hear about it.
>
> For example, would reducing "osd client op priority" have any effect?
> As far as I can see, this is only for weighting between recovery and
> client IO, not for priority of IO already in flight versus new client ops.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Robert LeBlanc <robert(a)leblancnet.us>
Sent: 23 August 2019 17:28
To: Frank Schilder
Cc: ceph-users
Subject: Re: [ceph-users] ceph fs crashes on simple fio test
The WPQ scheduler may help your clients back off when things get busy.
Put this in your ceph.conf and restart your OSDs.
osd op queue = wpq
osd op queue cut off = high
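
As a rough sketch, the two settings above could be checked in a ceph.conf
with a few lines of Python before restarting the OSDs. The helper names
(WANTED, normalize, missing_settings) are made up here, and the sketch
assumes the options sit in a [global] or [osd] section, which the mail
above does not specify.

```python
import configparser

# The two options from the mail above, written with underscores.
WANTED = {"osd_op_queue": "wpq", "osd_op_queue_cut_off": "high"}


def normalize(name):
    # Option names may be written with spaces or underscores; compare
    # them in a single form.
    return name.strip().lower().replace(" ", "_")


def missing_settings(conf_text, sections=("global", "osd")):
    """Return the wanted options that are absent or set to another value."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    found = {}
    for section in sections:  # later sections override earlier ones
        if parser.has_section(section):
            for key, value in parser.items(section):
                found[normalize(key)] = value.strip()
    return {k: v for k, v in WANTED.items() if found.get(k) != v}


if __name__ == "__main__":
    sample = "[osd]\nosd op queue = wpq\nosd op queue cut off = high\n"
    print(missing_settings(sample))
```

An empty result means both options are present with the values shown
above; anything else lists what still needs to be set.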