Hi Yongseok,

I’m guessing you know this already, but dmClock becomes mClock when rho and delta are both 0. Within the dmClock library, the ServiceTracker class, which runs on the clients, tracks the rho and delta values. But you don’t have to use a ServiceTracker. In fact, of the three current users of the dmClock library in ceph master (osd, crimson osd, rgw), none uses a ServiceTracker, so they’re essentially getting mClock.
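To make that concrete, here is a rough sketch of the kind of per-request tag update dmClock does, written in the convention above where rho and delta count work done for the client elsewhere, so both being 0 collapses it to mClock. The names and the exact increments are illustrative only, not the dmclock library’s actual code:

    #include <algorithm>

    // Illustrative sketch only -- not the dmclock library's actual code or API.
    struct ClientInfo { double r, w, l; };    // reservation, weight, limit
    struct Tags { double resv, prop, lim; };  // the three per-request tags

    // rho:   reservation-phase work done for this client by other servers
    //        since its last request here
    // delta: total work done for this client by other servers since its
    //        last request here
    Tags next_tags(const Tags& prev, const ClientInfo& c,
                   double now, unsigned rho, unsigned delta) {
      Tags t;
      t.resv = std::max(prev.resv + (1 + rho)   / c.r, now);
      t.prop = std::max(prev.prop + (1 + delta) / c.w, now);
      t.lim  = std::max(prev.lim  + (1 + delta) / c.l, now);
      return t;
    }
    // With rho == 0 and delta == 0 the extra terms vanish and this is
    // exactly the mClock update.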

The other thing worth considering is that “server” and “client” can be viewed in an abstract sense. So in the osd, for example, the mClock “clients” are not the true clients but instead the different classifications of operations.
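For example (class names and QoS triples are made up for illustration, not the actual OSD defaults), the scheduler can be keyed by operation class rather than by the originating client:

    #include <map>

    // Illustrative only; the classes and numbers below are invented.
    enum class OpClass { client_io, background_recovery, background_best_effort };

    struct QosParams { double reservation, weight, limit; };

    // Each mClock "client" here is an operation class with its own QoS triple.
    const std::map<OpClass, QosParams> qos_by_class = {
      { OpClass::client_io,              {0.5,  1.0,  0.0} },
      { OpClass::background_recovery,    {0.25, 0.75, 1.0} },
      { OpClass::background_best_effort, {0.0,  0.25, 0.5} },
    };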

One of the nice things about dmClock is that the “servers” do not need to communicate directly among themselves in order to provide QoS. The “clients” provide extra information to the servers that allows them to compensate for the work done by the other servers.
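Conceptually, the client-side piece just counts what has been done for it between requests and piggybacks those counts on the next request. A hypothetical sketch of that bookkeeping (names and structure are illustrative, not the dmclock library’s actual ServiceTracker interface):

    #include <map>
    #include <string>

    // Hypothetical sketch; not the dmclock library's ServiceTracker API.
    struct ReqParams { unsigned rho = 0; unsigned delta = 0; };

    class TrackerSketch {
      // Per server: work completed for this client since its last request
      // to that server (reservation-phase work and total work).
      struct Counts { unsigned resv = 0; unsigned total = 0; };
      std::map<std::string, Counts> since_last_;
    public:
      // Called for every response the client receives, from any server.
      void track_response(bool served_by_reservation) {
        for (auto& entry : since_last_) {
          ++entry.second.total;
          if (served_by_reservation) ++entry.second.resv;
        }
      }
      // Called just before sending a request to 'server'; the counts tell
      // that server how much has been done for this client since its last
      // request there, so it can compensate without talking to its peers.
      ReqParams params_for(const std::string& server) {
        Counts& c = since_last_[server];  // zero-initialized on first use
        ReqParams p;
        p.rho = c.resv;
        p.delta = c.total;
        c = Counts{};
        return p;
      }
    };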

Eric
--
J. Eric Ivancich
he / him / his
Red Hat Storage
Ann Arbor, Michigan, USA

On Oct 26, 2020, at 7:44 AM, Yongseok Oh <yongseok.oh@linecorp.com> wrote:

Hi Josh and maintainers,

We have confirmed again that the dmClock algorithm does not address the case of multiple clients running on the same subvolume. In dmClock, the client-side tracker monitors how much work has been performed by each server and acts as a sort of scheduler by sending rho/delta values to the servers. Our simple idea was to divide the total QoS IOPS among the clients within a subvolume, either based on workload dynamics or evenly. For example, assume 1000 IOPS is allocated to a subvolume and that budget is periodically shared among the clients. If 100 clients simultaneously issue requests on the same subvolume, each client can consume 10 IOPS under the mClock scheduler.
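A minimal sketch of that dividing idea, entirely hypothetical and only meant to make the arithmetic concrete:

    // Hypothetical sketch of splitting a subvolume's QoS budget evenly.
    // With subvolume_limit_iops = 1000 and 100 active clients this returns
    // 10 IOPS per client; a demand-weighted split would replace the plain
    // division below.
    double per_client_limit(double subvolume_limit_iops, unsigned active_clients) {
      if (active_clients == 0) return subvolume_limit_iops;
      return subvolume_limit_iops / active_clients;
    }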

In this way, we could consider a client workload metric-based approach, but it is not easy to ensure QoS stability because client workloads change dynamically and the period over which metrics are collected affects the allocation accuracy. Alternatively, per-client-session QoS could be considered, but it is difficult to predict and limit the number of sessions in a subvolume.

For this reason, instead of applying dmClock, the mClock scheduler can be considered a good solution to the noisy neighbor problem. The expected per-subvolume QoS IOPS can be estimated as follows (assuming MDS and OSD requests are distributed almost evenly; reservation and weight values are omitted for brevity; a small worked example follows the formulas).

[MDS]
- Per MDS Limit IOPS * # of MDSs (Per MDS Limit IOPS * 1 when ephemeral random pinning is configured.)
[OSD]
- Per OSD Limit IOPS * # of OSDs
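For example, with made-up numbers: if each of 5 active MDSs has a per-MDS limit of 200 IOPS, a subvolume can expect roughly 200 * 5 = 1000 metadata IOPS (or 200 * 1 = 200 IOPS with ephemeral random pinning), and with 10 OSDs and a per-OSD limit of 500 IOPS the data path tops out around 500 * 10 = 5000 IOPS.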

Of course, depending on the workload or server conditions, the above equations may not be 100% satisfied. However, the QoS scheduler can be implemented without modifying the clients or the manager.

Are there any other comments from a CephFS point of view?

Finally, we are going to implement a prototype that applies the mClock scheduler to the MDS and then make a pull request to share and discuss it.

Thanks

Yongseok
_______________________________________________
Dev mailing list -- dev@ceph.io
To unsubscribe send an email to dev-leave@ceph.io