Hi Abhishek
On 2 Nov 2020, at 14:54, Abhishek Lekshmanan
<abhishek(a)suse.com> wrote:
There isn't much in terms of code changes in the scheduler from
v15.2.4->5. Does the perf dump (`ceph daemon <client.rgw-name> perf dump`)
on the RGW socket show any throttle counts?
I know. I was wondering whether this commit might somehow have an influence, but I’m likely wrong:
https://github.com/ceph/ceph/commit/c43f71056322e1a149a444735bf65d80fec7a7ae
As for the perf counters, I don’t see anything that stands out. I dumped the current state
anyway, though I’m not sure how useful it is:
https://gist.github.com/href/a42c30e001789f005e9aa748f6f858fc
At the moment we don’t see any errors, but I already count 135 incomplete requests in
the current log (out of 3 million).
This number is typical: on most days we see around 150 such requests.
Our working theory is that out of the 1024 maximum outstanding requests of the throttler,
~150 get lost every day to those incomplete requests, until our need for up to 400
requests per instance can no longer be met (first a few will be over the watermark, then
more, then all).
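
A rough back-of-the-envelope sketch of that theory, using the numbers above. It assumes that every incomplete request holds exactly one throttler slot forever, that the leak rate is a constant 150 per day, and that all figures apply to a single instance. None of this is confirmed, and the function name is made up for illustration:

```python
def days_until_starved(max_outstanding, leaked_per_day, peak_demand):
    """Return the first day at the end of which the free slots
    no longer cover the peak demand.

    Assumption: leaked slots are never returned to the throttler.
    """
    if leaked_per_day <= 0:
        raise ValueError("leaked_per_day must be positive")
    days = 0
    free = max_outstanding
    # Keep leaking one day at a time while the peak still fits.
    while free >= peak_demand:
        free -= leaked_per_day
        days += 1
    return days


if __name__ == "__main__":
    # 1024 slots, ~150 lost per day, up to 400 needed at peak.
    print(days_until_starved(1024, 150, 400))
```

Under these assumptions 424 slots are still free after four days and only 274 after five, so the peak of 400 would stop fitting during the fifth day after a restart.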
For those incomplete requests we know that the following line is executed, producing
“starting new request”:
https://github.com/ceph/ceph/blob/8f393c0fc1886a369d213d5e5791c10cb1591828/src/rgw/rgw_process.cc#L187
However, it never reaches “req done” in the same function:
https://github.com/ceph/ceph/blob/master/src/rgw/rgw_process.cc#L350
That entry and the “beast” entry are missing for those few requests.
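
For reference, counting such requests can be sketched roughly as follows. This is only an illustration: it assumes that both log lines carry a `req=<id>` token right after the message text and that an id may be reused once a request is done. The regular expressions and the function name are my assumptions, not something taken from the RGW source:

```python
import re

# Assumed line shapes: "... starting new request req=0x... ..." and
# "... req done req=0x... ...". Adjust to the actual log format.
START_RE = re.compile(r"starting new request req=(\S+)")
DONE_RE = re.compile(r"req done req=(\S+)")


def find_incomplete(lines):
    """Return (req_id, start_line_number) for every request that was
    started but never logged "req done"."""
    open_reqs = {}      # req id -> line number of its start entry
    incomplete = []
    for lineno, line in enumerate(lines, start=1):
        m = START_RE.search(line)
        if m:
            req = m.group(1)
            # The id shows up again before its "req done": the earlier
            # request never completed.
            if req in open_reqs:
                incomplete.append((req, open_reqs[req]))
            open_reqs[req] = lineno
            continue
        m = DONE_RE.search(line)
        if m:
            open_reqs.pop(m.group(1), None)
    # Whatever is still open at the end of the log is incomplete too.
    incomplete.extend(sorted(open_reqs.items(), key=lambda kv: kv[1]))
    return incomplete


if __name__ == "__main__":
    sample = [
        "1 ====== starting new request req=0x1 =====",
        "1 ====== req done req=0x1 op status=0 ======",
        "1 ====== starting new request req=0x2 =====",
    ]
    print(find_incomplete(sample))
```

Requests that are still in flight when the log ends are reported as well, so the count is only meaningful for a rotated log or after ignoring the last few seconds.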
Cheers, Denis