February 2023 - Dev - lists.ceph.io

Ceph Leadership Team Meeting, Feb 22 2023 Minutes

by Ernesto Puerta

Hi Cephers,

These are the minutes of today's meeting (quicker than usual since some CLT members were at Ceph Days NYC):

- *[Yuri] Upcoming Releases:*
  - Pending PRs for Quincy
  - Sepia Lab still absorbing the PR queue after the past issues
- [Ernesto] Github started sending dependabot alerts to devels (previously they were only sent to org admins)
  - https://github.blog/2023-01-17-dependabot-alerts-are-now-visible-to-more-de…
  - Most don't necessarily involve a risk (e.g.: Javascript dependency only exploitable in a back-end/node.js server)...
  - ... but it might still cause some unnecessary concern among devs/users regarding Ceph security status
  - Current list of vulnerable dependencies: https://github.com/ceph/ceph/security/dependabot
    - 40% are Dashboard Javascript ones (most could be dismissed since they only have an impact when used in node.js apps)
    - Remaining ones are:
      - Python: requirements.txt (not relevant since Python package versions change with every distro and we assume distro-maintainers will fix those)
        - It might become more relevant when we start packaging Python deps (https://github.com/ceph/ceph/pull/47501/)
      - Golang: "/examples/rgw" path (Casey opened https://tracker.ceph.com/issues/58828, but maybe we should just dismiss the alert?)
- [Ernesto] Enabling Github Auto-merge feature in the Ceph repo
  - https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/i…
  - Use case:
    - There's a PR with approvals but flaky CI tests (API, make check, ...) (example: https://github.com/ceph/ceph/pull/50201)
    - We could retrigger tests and come back to the PR page multiple times until all tests pass...
    - ... Or we just click the "Auto-merge" button, fill out the merge message as usual, and let Github merge it when the CI tests pass.
    - It'd reduce cognitive load, especially with small PRs (docs, backport PRs) where the overhead of the PR process is more noticeable.
  - There's still one issue:
    - Keeping Redmine in sync with Github
      - It could be done either when clicking Auto-merge, or by still requiring reviewers to poll the PR until it passes and then updating Redmine (not ideal)
      - A Github action that updates a tracker when Github merges the PR would be very useful
  - Yuri/Ilya: discussion around backport requirement reverse order (needs-qa label vs. approvals vs. CI tests passing).
  - Greg pointed out the risks of auto-merge merging PRs with patches submitted after passing requirements or approvals. Auto-merge status should be reset on new commits.
  - Decision: not to enable it.
  - Yuri suggested auto-labeling PRs with passing CI, so they better know when to start QA testing.
- Separate discussion on CI flakiness & stability and lack of clear points of contact (Kefu and David did that). For unit tests it's clear that affected teams should do that, but for infrastructure issues there's still a vacuum.

Kind Regards,
Ernesto
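Greg's concern in the minutes above (auto-merge should be reset when new commits arrive) can be illustrated with a small model. This is only a sketch: the `PullRequest` class and its methods are hypothetical names invented for illustration, not GitHub's API, and the rule shown is one possible reading of the requirement, assuming approvals and CI results are tied to a specific head commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PullRequest:
    """Hypothetical model of a PR; not GitHub's data model or API."""
    head_sha: str
    approved_sha: Optional[str] = None   # commit the reviewers approved
    ci_passed_sha: Optional[str] = None  # commit the CI last passed on
    auto_merge: bool = False

    def approve(self) -> None:
        self.approved_sha = self.head_sha

    def enable_auto_merge(self) -> None:
        self.auto_merge = True

    def ci_passed(self) -> None:
        self.ci_passed_sha = self.head_sha

    def push(self, new_sha: str) -> None:
        # A new commit invalidates the earlier decision to auto-merge.
        self.head_sha = new_sha
        self.auto_merge = False

    def can_merge(self) -> bool:
        # Merge only if auto-merge is still armed and both the approval
        # and the CI result refer to the current head commit.
        return (
            self.auto_merge
            and self.approved_sha == self.head_sha
            and self.ci_passed_sha == self.head_sha
        )


pr = PullRequest(head_sha="aaa111")
pr.approve()
pr.enable_auto_merge()
pr.push("bbb222")   # patch submitted after approval
pr.ci_passed()      # CI passes on the new, unreviewed commit
print(pr.can_merge())
```

In this model the late push disarms auto-merge, so a passing CI run on the unreviewed commit is not enough to merge it.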

rgw: breaking changes to the rgw qa suite merging next Monday

by Casey Bodley

there are some upcoming changes to the rgw qa suite [1] and its accompanying s3-tests [2] and ragweed repos [3] that, once merged, will cause earlier ceph-ci builds to fail the rgw suite

this just means that ceph-ci branches and suite-branches will need to rebase on main after these merges, so plan accordingly. we don't usually announce these changes, but with the reef freeze on the horizon i don't want to delay anyone's testing

[1] https://github.com/ceph/ceph/pull/49950
[2] https://github.com/ceph/s3-tests/pull/487
[3] https://github.com/ceph/ragweed/pull/26

Re: RGW encrypt is implemented by qat batch and queue mode

by Casey Bodley

On Mon, Feb 13, 2023 at 8:48 PM Feng, Hualong <hualong.feng(a)intel.com> wrote:
>
> > -----Original Message-----
> > From: Casey Bodley <cbodley(a)redhat.com>
> > Sent: Wednesday, October 12, 2022 11:11 PM
> > To: Feng, Hualong <hualong.feng(a)intel.com>
> > Cc: Mark Kogan <mkogan(a)redhat.com>; Tang, Guifeng <guifeng.tang(a)intel.com>; Ma, Jianpeng <jianpeng.ma(a)intel.com>; dev(a)ceph.io
> > Subject: Re: RGW encrypt is implemented by qat batch and queue mode
> >
> > On Thu, Sep 22, 2022 at 9:31 PM Feng, Hualong <hualong.feng(a)intel.com> wrote:
> > >
> > > > -----Original Message-----
> > > > From: Casey Bodley <cbodley(a)redhat.com>
> > > > Sent: Wednesday, September 21, 2022 10:20 PM
> > > > To: Feng, Hualong <hualong.feng(a)intel.com>
> > > > Cc: Mark Kogan <mkogan(a)redhat.com>; Tang, Guifeng <guifeng.tang(a)intel.com>; Ma, Jianpeng <jianpeng.ma(a)intel.com>; dev(a)ceph.io
> > > > Subject: Re: RGW encrypt is implemented by qat batch and queue mode
> > > >
> > > > On Mon, Sep 19, 2022 at 4:06 AM Feng, Hualong <hualong.feng(a)intel.com> wrote:
> > > > >
> > > > > Hi Mark, Casey
> > > > >
> > > > > Could you spare some time to help review these two PRs or add them to your plan?
> > > > >
> > > > > The PR link is below:
> > > > >
> > > > > https://github.com/ceph/ceph/pull/47040
> > > > > https://github.com/ceph/ceph/pull/47845
> > > > >
> > > > > I reimplemented the qat encryption plugin. Since the existing RGW encryption uses 4KB as an encryption unit, the performance is poor when the qat batch interface is not used. Now I have reimplemented the encryption plug-in using the qat batch interface, which is done in two PRs. PR47040 is used to realize that when the encrypted data block is larger than 128KB, 32 pieces of 4K data are taken out for a batch submission each time. PR47845 is based on PR47040, each time the encrypted data block is smaller than 128KB, it is put into a buffer queue first, and when 32 pieces of 4K data or timeout can be reached, a batch submission is performed.
> > > > >
> > > > > The performance result is below, and moreover, the higher the CPU usage, the more obvious the effect of qat.
> > > > >
> > > > > From the flame graph, the proportion of the encryption plug-in implemented by qat in the RGWPutObj::execute function is lower than that of the encryption plug-in implemented by isal.
> > > > >
> > > > > Thanks
> > > > > -Hualong
> > > >
> > > > hey Hualong et al, (cc dev list)
> > > >
> > > > thanks for reaching out, this really helps me understand what those PRs are trying to accomplish
> > > >
> > > > in general i'm concerned about the need for threads, locking, and buffering down in the crypto plugins. ideally this stuff would be under the application's control. in radosgw, we've been trying to eliminate any blocking waits on condition variables in our i/o path now that requests are handled in coroutines - instead of blocking an entire thread, we just suspend the coroutine and run others in the meantime
> > >
> > > I agree with your view, but now crypto function calls are still using the synchronous interface. If we don't want the plugin to contain condition variables, we need to implement the plugin in an asynchronous way and provide an asynchronous interface. This requires the RGW to call the interface to make changes.
> > >
> > > And the number of QAT instances is difficult to keep consistent with the number of threads. The number of QAT instance(hardware resources) is limited. When the number of instances is less than the number of threads, we still need to wait for the free instance. If the asynchronous interface is used, we can use the queue as a buffer to avoid blocking the current thread while waiting for a free instance.
> > >
> > > If it is still a synchronous interface, there is no good way to eliminate the condition variable. Do you have a better suggestion here?
> >
> > below you suggest that we could fall back to CPU processing for small object uploads. could we use that same fallback in the cases where we'd otherwise have to block waiting for a QAT instance?
> >
> > > > seeing that graph by object size, my first impression was that radosgw should be using bigger buffers.
> > >
> > > Use a bigger buffer? You mean we should change the encrypted CHUNK_SIZE, from the current 4096B, to a bigger one?
> > > Or other buffers?
> >
> > sorry not the CHUNK_SIZE, but the total amount of data we can feed into QAT at a time. i see in https://github.com/ceph/ceph/pull/47040 that you've found the loop in AES_256_CBC::cbc_transform() which breaks the input into CHUNK_SIZEd pieces, and you converted that loop into a single batch() call - that part looks great
> >
> > if each call to cbc_transform() is getting large enough buffers, it could acquire exclusive access to one QAT instance, feed all of its data through it, then release the instance back to a pool. for a large object workload it seems like this strategy would best utilize the hardware resources, because you never have to coordinate a single batch across several threads. you just need to acquire/release access to a QAT instance every 4MB, which you can use for 32 batches (assuming batch size is 32*4k=128k?) at a time
> >
> > > > GetObj and PutObj are both reading data in 4MB chunks, maybe we can find a way to use the qat batch interfaces within those chunks?
> > >
> > > Yes, they are both reading data in 4MB.
> > > But when the object we put is larger than 4MB, the data block size when calling the encryption function is not necessarily 4MB.
> > >
> > > Such as the below that put an object, but the data block size that the encryption function using is 64KB
> > >
> > > PUT /examplebucket/chunkObject.txt
> > >
> > > content-encoding:aws-chunked
> > > content-length:66824
> > > host:s3.amazonaws.com
> > > x-amz-content-sha256:STREAMING-AWS4-HMAC-SHA256-PAYLOAD
> > > x-amz-date:20130524T000000Z
> > > x-amz-decoded-content-length:66560
> > > x-amz-storage-class:REDUCED_REDUNDANCY
> > > Authorization:AWS4-HMAC-SHA256 Credential=AKIAIOSFODNN7EXAMPLE/20130524/us-east-1/s3/aws4_request,SignedHeaders=content-encoding;content-length;host;x-amz-content-sha256;x-amz-date;x-amz-decoded-content-length;x-amz-storage-class,Signature=4f232c4386841ef735655705268965c44a0e4690baa4adea153f7db9fa80a0a9
> > > ---------------
> > > 10000;chunk-signature=ad80c730a21e5b8d04586a2213dd63b9a0e99e0e2307b0ade35a65485a288648
> > > <65536-bytes>
> > > ---------------
> > > 10000;chunk-signature=ad80c730a21e5b8d04586a2213dd63b9a0e99e0e2307b0ade35a65485a288648
> > > <65536-bytes>
> >
> > are you sure that this http-level chunking has an effect on the buffer sizes that encryption sees? it may cause the buffers to be segmented at 64k, but encryption and decryption both call bufferlist::c_str() to reallocate a single contiguous buffer:
> > https://github.com/ceph/ceph/blob/9aa8bed/src/rgw/rgw_crypt.cc#L490
> >
> > so i'd still expect this loop in RGWPutObj::execute() to read up to rgw_max_chunk_size at a time:
> > https://github.com/ceph/ceph/blob/fc01eeb7/src/rgw/rgw_op.cc#L4111-L4141
> >
> > if there are cases where the RGWPutObj_BlockEncrypt filter isn't getting large enough buffers, we can use the same strategy as https://github.com/ceph/ceph/pull/21479, where we improved compression ratios by adding a buffering filter in front
> >
> > > > that could avoid the need for cross-thread queues and synchronization. compared to your approach in https://github.com/ceph/ceph/pull/47845, i imagine this would show less of a benefit for small object uploads, but more of a benefit for the big ones. do you think this could work?
> > >
> > > If in order to avoid the need for cross-thread queues and show less of a benefit for small object uploads, we can turn small objects to CPU processing. Only for big object, we use QAT batch api.
> > >
> > > Hi Casey
> > >
> > > Thanks for your reply. The detail message and some question are above.
> > >
> > > Thanks
> > > -Hualong
> >
> > all of my feedback here relates to large objects, though you've really focused on small objects in https://github.com/ceph/ceph/pull/47845. for small object workloads, i do agree that the queuing and thread synchronization is necessary to take advantage of this batching
> >
> > it's just hard for me to tell whether that extra complexity is worth it. we've tried to minimize any synchronization between rgw's requests so that we're able to scale to thousands of concurrent requests/connections. at scale, i'd worry that lock contention here would negate some of the gains from QAT
> >
> > in workloads with a mix of small and large objects, i think we'd make the best use of QAT if we applied it to the larger objects (>= 128k?) where we can use it most efficiently
>
> Hi Casey
>
> About the PR https://github.com/ceph/ceph/pull/47040. I have changed the code to be coroutine, so it is able to scale without lock contention. And I restrict the use of qat only when object>64K so that we can use it most efficiently.
>
> In the QAT part code, I use two async_* interface: one used to get instance, another one used to submit perform request.
> And in rgw code, in order to get `yield` parameter in crypto plugin, I add extra parameter in all process function.
>
> Can you help to review, is the coroutine mode I changed feasible?

thanks Hualong, the coroutine changes are nicely done; however i still have concerns about the overall design:

1. these crypto plugins are meant to be common to all ceph components. rgw may be the only user now, but this reliance on a coroutine-based runtime could make the plugin unusable elsewhere

the `optional_yield` wrapper (which may or may not contain a real coroutine yield_context) can potentially make this more general, if-and-only-if the plugin has a synchronous implementation as a fallback. currently, the plugin interfaces take an optional_yield, but QccCrypto::perform_op_batch() calls y.get_yield_context() on it unconditionally. even within rgw, there may not be a real yield_context - rgw_beast_enable_async may be false, or the object write may be driven synchronously by an admin command like `radosgw-admin obj rewrite`

2. even with coroutines, rgw requests may still have to wait for a qat instance. with a limited number of instances, couldn't this itself become a bottleneck as we scale up the number of concurrent requests?

earlier in the thread, we discussed falling back to the cpu implementation if there wasn't a qat instance available. that could avoid the need for waits, synchronous or otherwise, inside of the plugin. this would let us take advantage of hardware acceleration when we can without introducing any new contention. do you see any drawbacks to this approach?

i hate to keep sending you back to the drawing board; would it help to discuss this in person? the Performance Weekly call (https://pad.ceph.com/p/performance_weekly) could be a good place for that. if that isn't a good time, we might schedule a separate call or wait until March 1st for the Ceph Developer Monthly (APAC)

> And about the PR https://github.com/ceph/ceph/pull/47845, I do agree your view that the extra complexity isn't worth it. I will close this PR.
>
> Thanks
> -Hualong
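The fallback idea discussed in this thread (use an accelerator instance only when one is free, otherwise run on the CPU, and feed large buffers through in 32 x 4K batches) can be sketched roughly as follows. This is an illustrative Python model, not the Ceph plugin code: `InstancePool`, `split_batches`, `transform`, and the `accel_batch` / `cpu_chunk` callables are hypothetical names, and the 128K threshold is the figure floated in the thread rather than a confirmed setting.

```python
import threading
from typing import Callable, List, Optional, Tuple

CHUNK_SIZE = 4096                       # encryption unit mentioned in the thread
BATCH_CHUNKS = 32                       # chunks per batch submission
BATCH_BYTES = CHUNK_SIZE * BATCH_CHUNKS # 32 * 4K = 128K


class InstancePool:
    """Fixed set of accelerator instances (hypothetical stand-in)."""

    def __init__(self, count: int) -> None:
        self._free = list(range(count))
        # Short mutex around a list pop/append; callers never wait on a
        # condition variable for an instance to become free.
        self._lock = threading.Lock()

    def try_acquire(self) -> Optional[int]:
        with self._lock:
            return self._free.pop() if self._free else None

    def release(self, inst: int) -> None:
        with self._lock:
            self._free.append(inst)


def split_batches(data: bytes) -> List[List[bytes]]:
    """Split data into 4K chunks, grouped into batches of up to 32 chunks."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    return [chunks[i:i + BATCH_CHUNKS] for i in range(0, len(chunks), BATCH_CHUNKS)]


def transform(
    data: bytes,
    pool: InstancePool,
    accel_batch: Callable[[int, List[bytes]], List[bytes]],
    cpu_chunk: Callable[[bytes], bytes],
    min_accel_size: int = BATCH_BYTES,
) -> Tuple[bytes, str]:
    """Return (output, path) where path is "accel" or "cpu"."""
    # Small buffers, or no free instance: use the CPU path instead of waiting.
    inst = pool.try_acquire() if len(data) >= min_accel_size else None
    batches = split_batches(data)
    if inst is None:
        return b"".join(cpu_chunk(c) for b in batches for c in b), "cpu"
    try:
        # Hold one instance for the whole buffer, one batch call per 128K.
        out = b"".join(b"".join(accel_batch(inst, b)) for b in batches)
        return out, "accel"
    finally:
        pool.release(inst)


# A 4MB buffer splits into 32 batches of 32 chunks each.
print(len(split_batches(bytes(4 * 1024 * 1024))))
```

The point of the design is that a request never waits inside the plugin: it either gets an instance immediately and keeps it for the whole buffer, or it takes the CPU path.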

clt meeting summary [15/02/2023]

by Nizamudeen A

Hi all,

today's topics were:

- Labs:
  - Keeping a catalog
  - Have a dedicated group to debug/work through the issues.
  - Looking for interested parties that would like to contribute to the lab maintenance tasks
  - Poll for meeting time, looking for a central person to follow up / organize
  - No one's been actively coordinating on the lab issues apart from Laura. David Orman volunteered if we need help coordinating the lab issues
- Reef release
  - [casey] things aren't looking good for end-of-february freeze
  - Since the whole thing depends on test-infra, can't really estimate the time frame.
  - The freeze may be delayed
- Dev Summit in Amsterdam: estimate how many would attend in person, remote
  - 50/50 of those present would attend (as per the voting)
  - Ad hoc virtual could work
- Need to update the component leads page: https://ceph.io/en/community/team/
  - Vikhyath volunteered before, so Josh will check with him.

Regards,
--
Nizamudeen A
Software Engineer
Red Hat <https://www.redhat.com/>

02/16/2023 perf meeting is canceled!

by Mark Nelson

Hi Folks,

I won't be able to make it to run the meeting, so let's cancel today. Have a good week folks!

Etherpad: https://pad.ceph.com/p/performance_weekly
Bluejeans: https://bluejeans.com/908675367

Mark

RGW matching guest's stats against bucket owner's when checking quotas

by Paolo De Pasquale

Hi all,

I'm seeing the following behavior on a Pacific (16.2.9) cluster and wanted to know whether it is expected and, if so, what the rationale behind it is.

- User A owns the bucket X
- User B is allowed to write to X and also owns other buckets of his own
- Both A and B users have bucket and user quotas set

When User B writes to the bucket X, I can see from the logs that:

- Bucket quota is checked matching User A stats against User A limits (...so far so - almost - good, A being the owner of the bucket...)
- User quota is checked matching User B stats against User A limits

Recalling that User B stats are affected by his own buckets, I'm wondering whether the last check makes sense.

Thank you.

Paolo De Pasquale
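To make the question concrete, the user-quota comparison described above can be modelled with made-up numbers. This is only a sketch of the behavior as reported in the message, not RGW's actual quota code: the `Account` class, the `over_quota` function, and all byte counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical per-user record: usage across owned buckets plus a limit."""
    name: str
    used_bytes: int
    max_bytes: int  # user quota limit; a negative value means unlimited


def over_quota(stats: Account, limits: Account, incoming: int) -> bool:
    """True if writing `incoming` bytes would exceed `limits`, counting `stats`."""
    if limits.max_bytes < 0:
        return False
    return stats.used_bytes + incoming > limits.max_bytes


# Made-up numbers: A owns bucket X, B writes to X and also owns other buckets.
owner = Account("A", used_bytes=10, max_bytes=100)
writer = Account("B", used_bytes=95, max_bytes=1000)
incoming = 10

checks = {
    # The combination reported from the logs: B's stats against A's limits.
    "reported": over_quota(writer, owner, incoming),
    # Two self-consistent alternatives, shown for comparison only.
    "owner_only": over_quota(owner, owner, incoming),
    "writer_only": over_quota(writer, writer, incoming),
}
for label, rejected in checks.items():
    print(label, "rejected" if rejected else "allowed")
```

With these numbers, only the mixed comparison rejects the write, because B's usage in his own buckets is counted against A's limit.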

Re: Announcing go-ceph v0.17.0

by Sven Anderson

We are happy to announce another release of the go-ceph API library. This is a regular release following our every-two-months release cadence.

https://github.com/ceph/go-ceph/releases/tag/v0.20.0

Changes include additions to the rbd, rgw and cephfs packages. More details are available at the link above.

The library includes bindings that aim to play a similar role to the "pybind" python bindings in the ceph tree but for the Go language. The library also includes additional APIs that can be used to administer cephfs, rbd, and rgw subsystems. There are already a few consumers of this library in the wild, including the ceph-csi project.

Sven

Process question: How to make progress on PR#48697

by John Mulligan

As part of an effort to build and test ceph in containers I posted a PR in November: https://github.com/ceph/ceph/pull/48697

As noted in the PR there are two approvals but no one seems willing or has the time to shepherd the PR through the merge process. I'm not sure what my next steps on this PR can be.

I had planned on having this PR serve as an intermediate step towards being able to build from source and run 'make tests' in containers. This is an effort Ernesto started and I've been working on for a while. I'm currently blocked because of the uncertainty around this PR.

If anyone has any thoughts or recommendations for me I'd appreciate it. I did ping ceph/core in the PR as well, but I figured I might get some more attention here on the list.

Thanks!

ninja/cmake stuck at downloading boost during build

by Rishabh Dave

Hi all,

I am trying to build Ceph binaries from the main branch. Sometimes "ninja -j 7" (see the bottom for steps I run to initiate the build) gets stuck at the step where the Boost library is downloaded. The step description printed on stdout is "Performing download step (download, verify and extract) for 'Boost'". Running nethogs shows that cmake is indeed downloading but the download speed is less than 50 Kbps. Last night, I also saw this issue while building binaries for the "quincy" branch. Cancelling and reinitiating the build has no effect on the download speed.

Is there a way to choose the faster/fastest mirror to download the Boost library? How can I change the default mirror that cmake uses?

Another option is to use the boost library installed on my system. For this I can pass "-DWITH_SYSTEM_BOOST=ON" to the "do_cmake.sh" script. My system has version 1.76 installed. Will this version work fine? Has anyone tried this before?

I don't face this issue all the time. It happens once in a couple months but whenever it does, I am stuck for several hours. System I am using is Fedora 36.

Following are the commands I usually run to initiate building of Ceph binaries -

    $ sudo ./install-deps.sh
    $ ./do_cmake.sh -DWITH_CEPHFS_SHELL=ON -DWITH_BABELTRACE=OFF -DWITH_MANPAGE=OFF -DWITH_RBD=OFF -DWITH_RADOSGW=OFF -DWITH_KRBD=OFF -DWITH_MGR_DASHBOARD_FRONTEND=OFF
    $ cd build
    $ ninja -j 7

Thanks,
- Rishabh

github: rgw team and CODEOWNERS

by Casey Bodley

i've created an rgw team [1] on github for use with CODEOWNERS [2]. i also raised a pull request [3] to add rgw paths to our .github/CODEOWNERS file

members of the team will get github notifications to review pull requests that touch any of these paths. this can be spammy, so i've only added the members listed in src/rgw/MAINTAINERS.md. if other devs would like to opt-in to this team, please reach out to me off-list

[1] https://github.com/orgs/ceph/teams/rgw
[2] https://docs.github.com/en/repositories/managing-your-repositorys-settings-…
[3] https://github.com/ceph/ceph/pull/50073
