Hi Yongseok, apologies for the delayed response.
On 10/1/20 7:49 AM, Yongseok Oh wrote:
Hi Josh and Greg,
Let me try to recall the major issues related to the dmClock QoS scheduler from the previous PRs
and your helpful comments.
[MDS]
- NFS Ganesha. Exploiting NFS Ganesha is one of the possible solutions for client QoS.
But I think introducing a new layer could limit overall performance or scalability. It
also cannot cover the CephFS native clients.
- Multiple different clients on the same subvolume. In this case, the subvolume is probably
shared by multiple clients, where each client maintains its own QoS tracker, resulting
in inappropriate scheduling. In other words, another layer (or process) must monitor the
global QoS state among clients. We can simply devise a client group approach in which each
client group comprises multiple clients and the allocated IOPS are divided equally or based
on a policy. For instance, if there are four clients in a group running on the same volume
and 1000 IOPS are given per group, each client can get 250 IOPS. Another approach
(more complex) is that a client perf monitor classifies clients' workloads and
dynamically tunes and reallocates their IOPS based on those workloads.
This seems like a good approach.
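
The client group split described above can be sketched roughly as follows. This is only an illustration of the arithmetic, not code from any of the PRs; the function name `split_group_iops` and the `weights` policy parameter are hypothetical.

```python
def split_group_iops(group_iops, clients, weights=None):
    """Divide a group's IOPS budget among its clients.

    With no weights, every client gets an equal share. With weights
    (a hypothetical stand-in for "based on a policy"), shares are
    proportional to each client's weight.
    """
    if not clients:
        return {}
    if weights is None:
        weights = {c: 1 for c in clients}
    total = sum(weights[c] for c in clients)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {c: group_iops * weights[c] / total for c in clients}


# Four clients sharing a 1000 IOPS group budget equally.
equal = split_group_iops(1000, ["c1", "c2", "c3", "c4"])
# Two clients with a 3:1 policy weighting.
weighted = split_group_iops(1000, ["a", "b"], {"a": 3, "b": 1})
```

The dynamic variant mentioned above would amount to recomputing the weights periodically from observed workloads.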
[OSD]
- Multiple OP queue shards per OSD. Previously, they mentioned [1] that it is
difficult to provide the right QoS because each shard sees fewer in-flight requests as the
number of shards increases and requests are distributed across shards. To overcome this
issue, they proposed two solutions. One is to set the number of shards to ‘1’. The other is
an Outstanding I/O (OIO) throttler that gathers as many requests as possible at the cost of
a short added latency. As clients cannot distinguish between shards in an OSD,
they came up with a shard identifier to go along with the OSD ID [2]. For background
operations, normalized rho/delta values are calculated based on their average numbers [3].
Setting shards to 1 is simpler, and has shown the same performance when
we keep the total number of threads constant (i.e. 1 shard x 16
threads, instead of today's default of 8 x 2 on ssd). We'll propose
changing the defaults to reflect this before Pacific.
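
To make the shard trade-off concrete, here is a small sketch of the arithmetic. It assumes requests spread evenly across shards, which is a simplification, and the helper names are hypothetical. In Ceph the two knobs correspond, as far as I know, to the `osd_op_num_shards` and `osd_op_num_threads_per_shard` options.

```python
def total_worker_threads(num_shards, threads_per_shard):
    """Total op worker threads for a given shard layout."""
    return num_shards * threads_per_shard


def mean_in_flight_per_shard(in_flight, num_shards):
    """Average requests each shard's scheduler can choose among,
    assuming an even spread of requests across shards."""
    return in_flight / num_shards


# Same thread budget either way: 1 x 16 versus 8 x 2.
single = total_worker_threads(1, 16)
sharded = total_worker_threads(8, 2)

# With 64 requests in flight, a single shard schedules over all 64,
# while each of 8 shards only sees about 8 of them.
depth_single = mean_in_flight_per_shard(64, 1)
depth_sharded = mean_in_flight_per_shard(64, 8)
```

The scheduler can only order what is queued in its own shard, so the smaller per-shard depth is what makes QoS harder with more shards.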
[Compatibility]
- One of the feature bits needs to be allotted to QoS for compatibility. Since feature bits
are a very limited resource, further discussion and confirmation by the maintainers are
required [4].
We may be able to piggyback on the SERVER_PACIFIC feature bit to avoid
consuming another one. Ilya, any concern about that from the kernel
client?
That depends entirely on what else is going to be covered by
SERVER_PACIFIC. Unless it's just client-aware QoS, we should try
to avoid conditioning anything else that is client-facing (or can
potentially become client-facing in the future) on it. One example
we ran into in the past was an X+2 or X+3 version of some encoding
depending on an otherwise optional feature (i.e. no way to get the
version with a new client-facing field without adding some form of
support for the feature).
Thanks,
Ilya
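
The encoding-version concern raised above can be illustrated with a toy sketch. Everything here is hypothetical: the feature bit, the version numbers, and the function are made up for illustration and are not Ceph's actual feature bits or encoding logic.

```python
# Hypothetical feature bit, for illustration only.
FEATURE_OPTIONAL_QOS = 1 << 0

# Hypothetical encoding versions of some client-facing structure.
VERSION_BASE = 1          # "X": understood by every peer
VERSION_QOS = 2           # "X+1": gated on the optional feature
VERSION_NEW_FIELD = 3     # "X+2": adds an unrelated client-facing field


def pick_encoding_version(peer_features):
    """Pick the encoding version to send to a peer.

    Versions are strictly ordered, so anything above VERSION_QOS is
    only reachable by peers that advertise the optional feature.
    """
    if peer_features & FEATURE_OPTIONAL_QOS:
        return VERSION_NEW_FIELD
    return VERSION_BASE


# A client without the optional feature is stuck on the base version
# and never sees the new field, even though it has nothing to do with QoS.
without_feature = pick_encoding_version(0)
with_feature = pick_encoding_version(FEATURE_OPTIONAL_QOS)
```

In this toy model the only way for a client to receive the new field is to advertise the optional feature, which is the coupling the message warns about.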