Hi Casey,
If Beast can't, I don't have an alternative that I'd suggest, no. Adam
seems confident that this makes sense for now, also.
Matt
On Tue, Feb 28, 2023 at 1:52 PM Casey Bodley <cbodley(a)redhat.com> wrote:
hey Matt,
On Tue, Feb 28, 2023 at 1:10 PM Matt Benjamin <mbenjami(a)redhat.com> wrote:
Removing the dependency on libcurl was one of the things I hoped to get
out of the refactoring.
can you expand on your objections to libcurl? there's some messy http
client code in rgw, but i wouldn't necessarily attribute that to
libcurl. i feel like the only thing it's really missing, as a C
library, is support for custom allocators. even so, i don't know that
we've ever shown libcurl to be a bottleneck anywhere. i've also been
contributing to its aws sigv4 support in
https://github.com/curl/curl/pull/9995, which could allow us to remove
our own custom client-side signing code
what would you replace it with? i haven't done a recent review of c++
libraries in this space, but i don't think beast will ever solve all
of these problems for us. for one, the author never expressed an
interest in supporting HTTP/2 or 3. during my last interaction in
https://github.com/boostorg/beast/pull/2334#issuecomment-952122694, he
suggested that beast was a first draft and that he would rather start
over on a new library (now at
https://github.com/CPPAlliance/http_proto, which still only covers
HTTP/1.1)
Matt
On Tue, Feb 28, 2023 at 12:57 PM Casey Bodley <cbodley(a)redhat.com> wrote:
>
> reviving this old thread about http clients after reading through
> https://github.com/RobertLeahy/ASIO-cURL and discovering the
> "multi-socket flavor" of the libcurl-multi API documented in
> https://curl.se/libcurl/c/libcurl-multi.html
>
> rgw's existing RGWHTTPManager uses the older flavor of libcurl-multi,
> which requires a background thread that polls libcurl for io and
> completions. this new flavor allows us to do all of the polling and
> timers asynchronously with asio, and only call into libcurl for
> non-blocking io when the sockets are ready to read/write. getting rid
> of the background thread makes it much easier to integrate with asio
> applications, because it removes many complications around locking and
> object lifetimes
>
> i experimented with this multi-socket API by building my own asio
> integration in https://github.com/cbodley/ceph/pull/6. there are two
> main reasons i find this especially interesting:
>
> 1) we've been doing some prototyping for multisite sync with asio's
> c++20 coroutines. RGWHTTPManager only supports the
> optional_yield-style coroutines, so we were talking about using beast
> for this initial prototype. however, i listed several of beast's
> missing features earlier in this thread (mainly timeouts and
> connection pooling), so this new curl client could be a much better
> fit here
>
> 2) curl can be built with HTTP/3 support, and that's what we've been
> using to test rgw's prototype frontend in
> https://github.com/ceph/ceph/pull/48178. we need a multiplexing client
> like libcurl-multi in order to test QUIC's stream multiplexing. and
> because the QUIC library depends on BoringSSL, this HTTP/3-enabled
> version of curl can't be linked against rgw (which requires OpenSSL)
> for RGWHTTPManager
>
> On Thu, Oct 28, 2021 at 12:24 PM Casey Bodley <cbodley(a)redhat.com> wrote:
> >
> > On Thu, Oct 28, 2021 at 10:41 AM Yuval Lifshitz <ylifshit(a)redhat.com> wrote:
> > >
> > > Hi Casey,
> > > When it comes to "technical debt", the main question is what is the
> > > ongoing cost of not making this change?
> > > Do we see memory allocation and copy into RGWHTTPArgs as a noticeable
> > > perf issue? Maybe there is a simpler way to resolve this specific issue?
> >
> > historically, we have seen very bad behavior from tcmalloc at high
> > thread counts in rgw, and we've been making general efforts both to
> > reduce allocations and the number of threads required. i don't think
> > anyone has tried to measure the impact of RGWHTTPArgs itself, but i do
> > see its use of map<string, string> as low-hanging fruit. and because
> > this piece is on rgw's http server side, replacing this map wouldn't
> > require any of the client stuff described above
> >
> > > It looks like the list of things to do to achieve feature parity
> > > with libcurl is substantial.
> >
> > i agree! i wanted to start by documenting where the gaps are, to help
> > us understand the scope of a project here
> >
> > even without dropping libcurl, i think there's a lot of potential
> > cleanup in the several layers (rgw_http_client, rgw_rest_client,
> > rgw_rest_conn, rgw_cr_rest) between libcurl and multisite. for
> > multisite in general, i would really like to see it adopt similar
> > async primitives to the rest of the rgw codebase so that we can share
> > more code
> >
> > > Is there a desire by the beast maintainers to add these capabilities?
> >
> > beast has generally positioned itself as a low-level http protocol
> > library, to serve as building blocks for higher-level client and
> > server libraries/applications. the http ecosystem is vast, so it makes
> > sense to limit the scope of any individual library. libcurl is
> > enormous, yet still only covers the client side
> >
> > though with the addition of the tcp_stream in boost 1.70
> > (https://www.boost.org/doc/libs/1_70_0/libs/beast/doc/html/beast/release_not…),
> > beast did take a step toward this higher level of abstraction. it's
> > definitely worth discussing whether additional features like client
> > connection pooling would be in scope for the project. it's also worth
> > researching what other asio-compatible http client libraries are out
> > there
> >
> >
> > > Yuval
> > >
> > >
> > > On Tue, Oct 26, 2021 at 9:34 PM Casey Bodley <cbodley(a)redhat.com> wrote:
> > >>
> > >> dear Adam and list,
> > >>
> > >> aside from rgw's frontend, which is the server side of http, we also
> > >> have plenty of http client code that sends http requests to other
> > >> servers. the biggest user of the client is multisite sync, which uses
> > >> http to read replication logs and fetch objects from other zones. all
> > >> of this http client code is based on libcurl, and uses its 'multi api'
> > >> to provide an async interface with a background thread that polls for
> > >> completions
> > >>
> > >> it's hard to beat libcurl for stability and features, but there has
> > >> also been interest in using asio+beast for the client ever since we
> > >> added it to the frontend. benefits there would include a nicer c++
> > >> interface, better integration with the asio async model (we do
> > >> currently have wrappers for libcurl, but they're specific to
> > >> coroutines), and the potential to use custom allocators to avoid most
> > >> of the per-request allocations
> > >>
> > >>
> > >> to help with a comparison against beast, these are the features of
> > >> libcurl that we rely on:
> > >>
> > >> - asynchronous using the 'multi api' and a background thread
> > >> (https://everything.curl.dev/libcurl/drive/multi)
> > >> - connection pooling (see https://everything.curl.dev/libcurl/connectionreuse)
> > >> - ssl context and optional certificate verification
> > >> - connect/request timeouts
> > >> - rate limits
> > >>
> > >> see RGWHTTPClient::init_request() in rgw_http_client.cc for all of the
> > >> specific CURLOPT_ features we're using now
> > >>
> > >> also noteworthy is curl's support for http/1.1, http/2, and http/3
> > >> (https://everything.curl.dev/libcurl-http/versions)
> > >>
> > >>
> > >> asio does not have connection pooling or connect timeouts (though it
> > >> has the components necessary to build them), and beast only supports
> > >> http/1.1. i think everything else in the list is covered:
> > >>
> > >> ssl support comes from boost::asio::ssl and ssl_stream
> > >>
> > >> there's a tcp_stream class
> > >> (https://www.boost.org/doc/libs/1_70_0/libs/beast/doc/html/beast/ref/boost__…)
> > >> that wraps a tcp socket and adds rate limiting and timeouts. we use
> > >> that in the frontend, though we're tracking a performance regression
> > >> related to its timeouts in https://tracker.ceph.com/issues/52333
> > >>
> > >> there's a very nice http::fields class
> > >> (https://www.boost.org/doc/libs/1_70_0/libs/beast/doc/html/beast/ref/boost__…)
> > >> for headers that has custom allocator support. there's an
> > >> 'http_server_fast' example at
> > >> https://www.boost.org/doc/libs/1_70_0/libs/beast/example/http/server/fast/h…
> > >> that uses the custom allocator in
> > >> https://www.boost.org/doc/libs/1_70_0/libs/beast/example/http/server/fast/f….
> > >> i'd love to see something like that replace our use of map<string,
> > >> string> for headers in RGWHTTPArgs during request processing
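
To make the custom-allocator idea for request headers concrete, here is a minimal sketch. It swaps in the standard library's std::pmr containers instead of beast's http::fields, so it only illustrates the technique. RequestHeaders, its 2 KiB buffer, and the set/get methods are hypothetical names invented for this example; none of this is rgw or beast code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory_resource>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical per-request header container (not http::fields, not RGWHTTPArgs).
class RequestHeaders {
 public:
  // Map nodes and long strings are carved out of storage_; `upstream` is
  // only consulted if the fixed buffer runs out.
  explicit RequestHeaders(
      std::pmr::memory_resource* upstream = std::pmr::new_delete_resource())
      : arena_(storage_.data(), storage_.size(), upstream),
        fields_(&arena_) {}

  void set(std::string_view name, std::string_view value) {
    assert(!name.empty());
    // Build both strings with the arena so they share the map's allocator.
    fields_.insert_or_assign(
        std::pmr::string(name.data(), name.size(), &arena_),
        std::pmr::string(value.data(), value.size(), &arena_));
  }

  std::optional<std::string_view> get(std::string_view name) const {
    auto it = fields_.find(name);  // std::less<> allows string_view lookup
    if (it == fields_.end()) return std::nullopt;
    return std::string_view(it->second);
  }

  std::size_t size() const { return fields_.size(); }

 private:
  // Declaration order matters: the buffer and arena must outlive the map.
  std::array<std::byte, 2048> storage_;
  std::pmr::monotonic_buffer_resource arena_;
  std::pmr::map<std::pmr::string, std::pmr::string, std::less<>> fields_;
};
```

Passing std::pmr::null_memory_resource() as the upstream turns any spill past the fixed buffer into a std::bad_alloc, which is one way to check that a request stays inside its arena. A monotonic arena never reuses freed memory, so this only suits containers that are discarded at the end of each request.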
> > >>
> > >>
> > >> for connection pooling with asio, i did explore this for a while with
> > >> Abhishek in
> > >> https://github.com/cbodley/nexus/tree/wip-connection-pool/include/nexus/htt….
> > >> it had connect timeouts and some test coverage in
> > >> https://github.com/cbodley/nexus/blob/wip-connection-pool/test/http/test_co…,
> > >> but needs more work. for example, each connection_pool is constructed
> > >> with one hostname/port. there also needs to be a map of these pools,
> > >> keyed either on hostname/port or resolved address, so we can cache
> > >> connections for any url the client requests
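
The "map of pools keyed on hostname/port" could look roughly like the sketch below. It uses only the standard library and leaves out all asio and socket handling. Connection, ConnectionPool, PoolMap, and pool_for are hypothetical names for illustration and are not taken from the nexus prototype.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical placeholder; a real client would hold a socket or stream here.
struct Connection {
  int id;
};

// One pool per endpoint: it only caches idle connections for a single
// hostname/port pair.
class ConnectionPool {
 public:
  // Hand back a cached idle connection, or nothing if the caller must connect.
  std::optional<Connection> acquire() {
    if (idle_.empty()) return std::nullopt;
    Connection c = idle_.back();
    idle_.pop_back();
    return c;
  }
  // Return a connection to the pool once its request completes.
  void release(Connection c) { idle_.push_back(c); }
  std::size_t idle() const { return idle_.size(); }

 private:
  std::vector<Connection> idle_;
};

using Endpoint = std::pair<std::string, std::uint16_t>;

// Map of pools keyed on hostname/port, so any url can find its pool.
class PoolMap {
 public:
  ConnectionPool& pool_for(const std::string& host, std::uint16_t port) {
    assert(!host.empty());
    // operator[] default-constructs the pool on first use of an endpoint.
    return pools_[Endpoint{host, port}];
  }
  std::size_t size() const { return pools_.size(); }

 private:
  std::map<Endpoint, ConnectionPool> pools_;
};
```

Keying on the resolved address instead would let several hostnames that resolve to the same server share one pool, at the cost of a resolve step before every lookup. This sketch only covers the hostname/port variant.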
> > >>
> > >> i was also imagining higher-level interfaces like http::async_get()
> > >> (and head/put/post/etc) that would hide the use of connection pooling
> > >> entirely, and use beast's request/response concepts to write the
> > >> request and read its response. this is also a good place to implement
> > >> retries. i explored this idea in a separate repo here
> > >> https://github.com/cbodley/requests/tree/master/include/requests
> > >>
> > >> with asio, we can attach a connection pooling service as an
> > >> io_context::service that gets created automatically on first use, and
> > >> saved over the lifetime of the io_context. the application would have
> > >> the option to configure it, but doesn't have to know anything about it
> > >> otherwise
> > >>
> > >> overloading those high-level interfaces could also provide a good
> > >> abstraction to support http 2 and 3, where their connection pools
> > >> would just have one connection per address, and each request would
> > >> open its own stream
> > >>
> > >> _______________________________________________
> > >> Dev mailing list -- dev(a)ceph.io
> > >> To unsubscribe send an email to dev-leave(a)ceph.io
> > >>
--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103
http://www.redhat.com/en/technologies/storage
tel. 734-821-5101
fax. 734-769-8938
cel. 734-216-5309