Re: [ceph-users] RGW blocking on large objects

18 Oct 2019

On 10/17/19 4:00 PM, Robert LeBlanc wrote:
> On Thu, Oct 17, 2019 at 11:46 AM Casey Bodley <cbodley(a)redhat.com> wrote:
>> On 10/17/19 12:59 PM, Robert LeBlanc wrote:
>>> On Thu, Oct 17, 2019 at 9:22 AM Casey Bodley <cbodley(a)redhat.com> wrote:
>>>> With respect to this issue, civetweb and beast should behave the same.
>>>> Both frontends have a large thread pool, and their calls to
>>>> process_request() run synchronously (including blocking on rados
>>>> requests) on a frontend thread. So once there are more concurrent client
>>>> connections than there are frontend threads, new connections will block
>>>> until there's a thread available to service them.
>>> Okay, this really helps me understand what's going on here. Is there
>>> plans to remove the synchronous calls and make them async or improve
>>> this flow a bit?
>> Absolutely yes, this work has been in progress for a long time now, and
>> octopus does get a lot of concurrency here. Eventually, all of
>> process_request() will be async-enabled, and we'll be able to run beast
>> with a much smaller thread pool.
> This is great news. Anything we can do to help in this effort as it is
> very important for us?

We would love help here. Most of the groundwork is done, so the 
remaining work is mostly mechanical.

To summarize the strategy, the beast frontend spawns a coroutine for 
each client connection, and that coroutine is represented by a 
boost::asio::yield_context. We wrap this in an 'optional_yield' struct 
that gets passed to process_request(). The civetweb frontend always 
passes an empty object (i.e. null_yield) so that everything runs 
synchronously. When making calls into librados, we have a 
rgw_rados_operate() function that supports this optional_yield argument. 
If it gets a null_yield, it calls the blocking version of 
librados::IoCtx::operate(). Otherwise it calls a special 
librados::async_operate() function which suspends the coroutine until 
completion instead of blocking the thread.
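
To make the dispatch concrete, here is a rough, self-contained sketch of the idea. It is not the actual rgw code: fake_yield_context, the simplified optional_yield class, blocking_operate(), async_operate() and sketch_rados_operate() are all hypothetical stand-ins, so that the example builds without Boost or librados. It only shows how one function can pick the blocking or the coroutine path depending on whether a yield context was passed in.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for boost::asio::yield_context. The real type
// represents a suspended coroutine; this one only carries an id.
struct fake_yield_context {
  int coroutine_id;
};

// Simplified model of 'optional_yield': either empty, or a pointer to a
// yield context. This is an assumption, not the real definition.
class optional_yield {
  fake_yield_context* ctx = nullptr;
 public:
  optional_yield() = default;
  explicit optional_yield(fake_yield_context& c) : ctx(&c) {}
  explicit operator bool() const { return ctx != nullptr; }
  fake_yield_context& get_yield_context() const {
    assert(ctx != nullptr);  // only valid when a context was supplied
    return *ctx;
  }
};

// The empty value that a synchronous frontend would pass.
static const optional_yield null_yield{};

// Stand-ins for the blocking and coroutine-based librados calls.
std::string blocking_operate() { return "blocking"; }
std::string async_operate(fake_yield_context&) { return "async"; }

// Sketch of the dispatch: choose the path based on whether a yield
// context is available.
std::string sketch_rados_operate(optional_yield y) {
  if (y) {
    return async_operate(y.get_yield_context());
  }
  return blocking_operate();
}
```

In this model, a caller that owns a coroutine passes optional_yield{ctx} and gets the "async" path, while a caller with nothing to yield to passes null_yield and gets the "blocking" path.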

So most of the remaining work is in plumbing this optional_yield 
variable through all of the code paths under process_request() that call 
into librados. The rgw_rados_operate() helpers will log a "WARNING: 
blocking librados call" whenever they block inside of a beast frontend 
thread, so we can go through the rgw log to identify all of the places 
that still need a yield context. By iterating on this process, we can 
eventually remove all of the blocking calls, then set up regression 
testing to verify that no rgw logs contain that warning.
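
As a rough idea of what such a regression check could look like, the sketch below counts log lines that contain the warning text. The function name count_blocking_warnings() is made up for illustration, and nothing is assumed about the rest of the log line format beyond the warning appearing somewhere on the line.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Count the lines in a log stream that contain the blocking-call warning.
// Only the warning substring is taken from this thread; the surrounding
// log format is not assumed.
std::size_t count_blocking_warnings(std::istream& log) {
  const std::string needle = "WARNING: blocking librados call";
  std::size_t hits = 0;
  std::string line;
  while (std::getline(log, line)) {
    if (line.find(needle) != std::string::npos) {
      ++hits;
    }
  }
  return hits;
}
```

A test run would pass only if this returns 0 for every rgw log it produced; opening the log with std::ifstream and passing it in works the same way as the string streams used for testing.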

Here's an example PR from Ali that adds the optional_yield to requests 
for bucket instance info: https://github.com/ceph/ceph/pull/27898. It 
extends the get_bucket_info() call to take optional_yield, and passes 
one in where available, using null_yield to mark the synchronous cases 
where one isn't available.
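
The shape of that kind of change can be sketched roughly as follows. This is a toy model and not the code from the PR: the optional_yield struct, the on_frontend_thread flag, the warnings vector and both get_bucket_info variants are simplified, hypothetical stand-ins. It shows why a call path that cannot receive a yield context ends up on the blocking path, and how adding the argument lets callers that have one avoid it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-ins; not the real Ceph types.
struct optional_yield {
  bool has_context = false;
};
static const optional_yield null_yield{};

// Models "this thread belongs to the beast frontend".
thread_local bool on_frontend_thread = false;
// Stands in for the rgw log.
std::vector<std::string> warnings;

// Blocks when no yield context is given, and records the warning if that
// happens on a frontend thread.
int sketch_rados_operate(optional_yield y) {
  if (!y.has_context) {
    if (on_frontend_thread) {
      warnings.push_back("WARNING: blocking librados call");
    }
    return 0;  // blocking path
  }
  return 0;  // coroutine path
}

// Before: no way to hand down a yield context, so this always blocks.
int get_bucket_info_old(const std::string& /*bucket*/) {
  return sketch_rados_operate(null_yield);
}

// After: the caller decides, passing its yield context when it has one
// and null_yield to mark the synchronous cases.
int get_bucket_info(const std::string& /*bucket*/, optional_yield y) {
  return sketch_rados_operate(y);
}
```

With on_frontend_thread set, the old variant records a warning on every call, while the new one records it only when the caller explicitly passes null_yield. Those remaining null_yield call sites are the ones still to be converted.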

>
>>> Currently I'm seeing 1024 max concurrent ops and 512 thread pool. Does
>>> this mean that on an equally distributed requests that one op could be
>>> processing on the backend RADOS with another queued behind it waiting?
>>> Is this done in round robin fashion so for 99% small io, a very long
>>> RADOS request can get many IO blocked behind it because it is being
>>> round-robin dispatched to the thread pool? (I assume the latter is
>>> what I'm seeing).
>>>
>>> rgw_max_concurrent_requests                                1024
>>> rgw_thread_pool_size                                       512
>>>
>>> If I match the two, do you think it would help prevent small IO from
>>> being blocked by larger IO?
>> rgw_max_concurrent_requests was added in support of the beast/async
>> work, precisely because (post-Nautilus) the number of beast threads will
>> no longer limit the number of concurrent requests. This variable is what
>> throttles incoming requests to prevent radosgw's resource consumption
>> from ballooning under heavy workload. And unlike the existing model
>> where a request remains in the queue until a thread is ready to service
>> it, any requests that exceed rgw_max_concurrent_requests will be
>> rejected with '503 SlowDown' in s3 or '498 Rate Limited' in swift.
>>
>> With respect to prioritization, there isn't any by default but we do
>> have a prototype request scheduler that uses dmclock to prioritize
>> requests based on some hard-coded request classes. It's not especially
>> useful in its current form, but we do have plans to further elaborate
>> the classes and eventually pass the information down to osds for
>> integrated QOS.
>>
>> As of nautilus, though, the thread pool size is the only effective knob
>> you have.
> Do you see any problems with running 2k-4k threads if we have the RAM to do so?
>
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
