March 2024 - Dev - lists.ceph.io

Interest in Participating in Ceph Project for Google Summer of Code

by Xu, Jiachen

Dear Team, My name is Jiachen Xu. I'm a junior Computer Science student at the University of Florida, interested in joining the Ceph project for Google Summer of Code. Although new to Ceph, I have experience with AWS and a strong foundation in data structures and operating systems. Beyond that, I am eager to grow and learn with the team, especially in areas that go beyond the classroom. I am excited to learn and contribute to the Future of Storage with Ceph. Looking forward to the opportunity. Sincerely, Jiachen Xu

1 month, 1 week


How can I set osd fast shutdown = true

by Suyash Dongre

Hello, I want to set osd fast shutdown = true. How should I achieve this? Regards, Suyash

1 month, 2 weeks


Query regarding sts token refresh error code on expiry

by Ajinkya Deshpande

Dear Team,

I have a query regarding STS token refresh. With Ceph, when a token expires and we attempt any operation, we receive an "access denied" error. However, with other S3 providers like Amazon, we receive the correct error code "ExpiredToken". Is this discrepancy a known issue with Ceph, and if so, are there plans to address it? Returning "access denied" as the error code on token expiry seems incorrect.

Given the above scenario, please let us know how token refresh can be handled with Ceph.

Ajinkya Deshpande
Sr Software Engineer
NetBackup Engineering
Veritas Technologies LLC
Mobile: (7776835515)
Ajinkya.Deshpande(a)veritas.com
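One possible client-side workaround for the situation described above, offered only as a sketch and not as Ceph or RGW behavior: if the client records the expiration time it received along with the temporary credentials, it can treat an "AccessDenied" that arrives after that time as a probable expiry. The helper name `should_refresh_token` and the `Clock` alias are hypothetical; the error-code strings are the ones quoted in the post.

```cpp
#include <chrono>
#include <string_view>

using Clock = std::chrono::system_clock;

// Hypothetical helper (not part of any Ceph or AWS SDK API).
// Returns true when the caller should fetch fresh STS credentials and
// retry the request.
constexpr bool should_refresh_token(std::string_view s3_error_code,
                                    Clock::time_point now,
                                    Clock::time_point expiration) noexcept {
  // Providers that report expiry explicitly.
  if (s3_error_code == "ExpiredToken") {
    return true;
  }
  // Assumption: an "AccessDenied" seen at or after the locally recorded
  // expiration time is most likely an expired token rather than a real
  // permission problem.
  return s3_error_code == "AccessDenied" && now >= expiration;
}
```

An "AccessDenied" received before the recorded expiration is left alone, so genuine permission errors are still surfaced to the caller instead of triggering a refresh loop.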

1 month, 2 weeks


librbd error type

by Anatoly Finch

Currently, the error type of librbd is int, which contains an error code of the system_error category. Each function assigns its own meaning to different error codes, and a single code can be used for different situations, e.g. PreRemoveRequest returns EBUSY when an image has watchers, on acquiring a lock, or when the image is being migrated.

Librados uses the same int as an error type, however neorados uses boost::system::error_code. The latest version of boost::system::error_code is {std::error_code, std::source_location*}. And std::error_code is {int, error_category*} (in libstdc++ and libc++).

It is possible to use custom error types internally, however user-facing APIs have to use a popular type such as std or boost error_code, otherwise users would have to convert error types. std::error_code is the standard error type, and boost::system::error_code is convertible to it. Various alternative approaches to errors, such as Boost.LEAF, Boost.Outcome and the corresponding std::error P1028 proposal, are all compatible with std(boost)::error_code.

As std::error_code is a pair of an int and a pointer, it fits into two registers (eax:rdx), so changing the result type from int to std::error_code basically adds one instruction:

    std::error_code f() { return SomeErrEnum::Code1; }

with a properly defined std::is_error_code_enum<SomeErrEnum> trait, and a SomeErrCategory type with a corresponding error category global value, compiles to

    f():
        mov edx, OFFSET some_category_var
        mov eax, 1
        ret

Full code - https://godbolt.org/z/K7vv5h8oo

Although there are corner cases such as the surprisingly inefficient

    error_category& cat() { static SomeErrCategory c; return c; }

where "static" requires branching on a guard variable, it is possible to keep the overhead to a single added instruction for the error category.

Back to librbd. It is indeed not possible to simply change its C++ API from returning int to error_code. Also, we cannot add overloads that differ only in return type; we would have to change the arguments as well. Passing error_code as an output parameter works

    int flatten(); // current API
    void flatten(std::error_code& ec); // alternative API

but it doesn't look great; ideally we'd rather use "std::expected<T, error_code> f()" and not "void f(T& out, error_code& ec)".

We can also add an asynchronous API like in neorados, with CompletionTokens and callbacks that receive error_code. However, much of the librbd code is synchronous (Operations), so rewriting it as async code would needlessly complicate things.

Is there a way to make librbd return std::error_code, so that we could use different error categories and enums for more distinct error codes?
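To make the idea in this post concrete, here is a minimal sketch of what distinct error codes behind an output-parameter API might look like. It is not librbd code and not the code behind the godbolt link: `RemoveErr`, `RemoveErrCategory`, `remove_err_category` and `pre_remove` are hypothetical names invented for illustration. It assumes the three EBUSY situations mentioned above each get their own enumerator, while the category maps all of them to the generic "busy" condition so that callers comparing against EBUSY keep working.

```cpp
#include <cassert>
#include <string>
#include <system_error>

// Hypothetical enum: three situations that are all reported as EBUSY today.
enum class RemoveErr { HasWatchers = 1, LockHeld, Migrating };

// Opt the enum in to implicit conversion to std::error_code.
namespace std {
template <>
struct is_error_code_enum<RemoveErr> : true_type {};
}  // namespace std

class RemoveErrCategory final : public std::error_category {
 public:
  const char* name() const noexcept override { return "rbd-remove-sketch"; }

  std::string message(int ev) const override {
    switch (static_cast<RemoveErr>(ev)) {
      case RemoveErr::HasWatchers: return "image has watchers";
      case RemoveErr::LockHeld:    return "exclusive lock is held";
      case RemoveErr::Migrating:   return "image is being migrated";
    }
    return "unknown error";
  }

  // Every code in this category is equivalent to the generic EBUSY condition.
  std::error_condition default_error_condition(int) const noexcept override {
    return std::make_error_condition(std::errc::device_or_resource_busy);
  }
};

// Namespace-scope category object, avoiding a function-local static guard.
const RemoveErrCategory remove_err_category{};

// Found by argument-dependent lookup when a RemoveErr is converted.
inline std::error_code make_error_code(RemoveErr e) noexcept {
  assert(static_cast<int>(e) != 0);  // 0 is reserved for "no error"
  return std::error_code(static_cast<int>(e), remove_err_category);
}

// Hypothetical stand-in for an operation using the output-parameter style.
inline void pre_remove(bool has_watchers, bool migrating, std::error_code& ec) {
  if (has_watchers) {
    ec = RemoveErr::HasWatchers;
  } else if (migrating) {
    ec = RemoveErr::Migrating;
  } else {
    ec.clear();
  }
}
```

With this shape a caller can test for the precise reason (`ec == RemoveErr::HasWatchers`) or for the broad condition (`ec == std::errc::device_or_resource_busy`), which is one way the existing int-based meaning could coexist with more specific codes.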

1 month, 2 weeks


Ceph Leadership Team meeting 2024-03-18

by Nizamudeen A

hi,

below are the topics discussed:

- Re-evaluate 19.1.0
  * cephfs waiting for 2 feature PR, to be finished by this week
  * rgw: https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/CT5MJ44ELP5OYLUVXV…
  * blockers:
    centos9 containers: https://github.com/ceph/ceph-container/pull/2183
    ceph-object-corpus: https://github.com/ceph/ceph-object-corpus/pull/17 (testing in https://github.com/ceph/ceph/pull/54735)
    jammy packages for quincy to support upgrades: https://github.com/ceph/ceph-build/pull/2206
  * target date is March 27. Will be re-evaluating again on next monday (3/25)
- Replacement for Trello
  * Patrick proposed redmine agile board: https://tracker.ceph.com/projects/ceph-qa/agile/board
    It basically works the same. There are new issue statuses only used by the "QA Run" tracker (like "Bug" or "Backport") which correspond to what's used in Trello.
  * Yuri is going to give redmine a try: https://tracker.ceph.com/projects/ceph-qa/issues?query_id=309
- Redmine upgrade
  * We are on 3.3.1 (Oct 2016): https://www.redmine.org/versions/122
  * Current stable is: redmine-5.1.2
  * talk with Dan/Adam to see the possibilities for upgrading! - Patrick will follow-up
  * https://www.redmine.org/projects/redmine/wiki/RedmineUpgrade
- Retrospective on github team permissions
  * write access is needed for almost everything
  * need to rely on branch protection hooks instead of permissions
  * re-iterate on the criteria for merging PRs (more discussion needed, will be followed up in the coming week CLTs)

The full meeting minutes can be found here: https://pad.ceph.com/p/clt-weekly-minutes

Regards
--
Nizamudeen A
Software Engineer
Red Hat <https://www.redhat.com/>

1 month, 3 weeks


PSA: CephFS/MDS config defer_client_eviction_on_laggy_osds

by Venky Shankar

If you are using CephFS on Pacific v16.2.14(+), the MDS config `defer_client_eviction_on_laggy_osds' is enabled by default. This config is used to not evict cephfs clients if OSDs are laggy[1]. However, this can result in a single client holding up the MDS in servicing other clients. To avoid this, please disable the config by executing

    ceph config set mds defer_client_eviction_on_laggy_osds false

This issue doesn't affect any Quincy or Reef releases. Furthermore, the config is being disabled by default[2] while the fix is being worked on.

[1]: https://tracker.ceph.com/issues/58023
[2]: https://tracker.ceph.com/issues/64685

--
Cheers, Venky

1 month, 3 weeks


[PSA] push/admin access to ceph/ceph.io

by Patrick Donnelly

All,

At the CLT today we discussed the proliferation of write/admin access on the ceph repository. One of the consequences of this has been that Ceph's code guidelines have not been followed in merges [1]. Additionally, having too many folks -- many of whom have retired from active development -- with write access to the repository presents security concerns.

With the CLT's support, I have addressed this by pruning write/admin access to the Ceph repository to only these Github teams:

- https://github.com/orgs/ceph/teams/ceph-maintainers
- https://github.com/orgs/ceph/teams/ceph-release-team
- https://github.com/orgs/ceph/teams/admins
- https://github.com/orgs/ceph/people?query=role%3Aowner

"ceph-maintainers" is a new team that includes component team leads and senior Ceph engineers. If you feel you should be in this list and were missed (sorry!), please reply to this mail.

"ceph-release-team" is a new team that includes folks working on Ceph releases, right now Yuri.

"admins" is an extant team that includes members who help administrate the Ceph project.

The members of the Ceph org who are owners have write/admin privileges regardless of team organization. I've included that for completeness.

Anyone not in these aforementioned teams will (should) be unable to push to ceph.git [2]. Please coordinate with your component team lead for merging your changes.

[1] https://github.com/ceph/ceph/blob/main/SubmittingPatches.rst
[2] https://github.com/ceph/ceph/settings/access

--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

1 month, 3 weeks


Ceph Leadership Team meeting 2024-03-13

by Ilya Dryomov

Hello,

Topics discussed:

- a noticeable backlog of "make check" jobs and shaman builds (>6 hours)
  - mostly self-inflicted because folks have been retriggering "make check" a lot recently in the hopes of working around a number of transient failures
  - even more so if the pool of jenkins workers used for "make check" and shaman builds is the same
    - Laura will confirm in the infra meeting
- Patrick will downgrade github org owners that aren't active to regular members
- proposal to prune down the list of individuals with write access to ceph.git repo (Patrick)
  - component leads and long-time senior contributors only
  - the goal is to enforce our SubmittingPatches.rst rules better
  - also some additional security
- concerns over fractured issue tracking
  - the most recent example is ceph-nvmeof using github issues, but there is also nvmeof subproject on tracker.ceph.com
  - other notable examples are ceph-csi and go-ceph, although there hasn't been anything on tracker.ceph.com for these
  - conclusion: github issues or any other issue tracking system is fine as long as there is no tight coupling to ceph.git
  - question: does a repo being brought in as a submodule as is the case with ceph-nvmeof in https://github.com/ceph/ceph/pull/54671 constitute tight coupling?
    - in this case the submodule is intended to bring in two .proto files for use by NVMeofGwMonitorClient daemon, everything else in ceph-nvmeof repo shouldn't be looked at
    - is this the best way to do that -- could these files just be copied?
    - dashboard already carries a copy of one of them (src/pybind/mgr/dashboard/services/proto/gateway.proto)
    - creating a PyPI package is another option
- QA nightlies
  - now that they are back, need to ensure they are looked at!
  - poll among component leads: is status quo where all results go to ceph-qa list OK or do we need to have teuthology email people/teams directly?
  - let's tally up next week

Topics moved to next week:

- 19.1.0 status
- trello to limit free workspaces to 10 collaborators
  - need a replacement for Yuri's board at the very least

Thanks,
Ilya

1 month, 3 weeks


References to common bad code / practices

by Suyash Dongre

Hi there. I am creating a static analysis tool using clang-tidy for the Ceph codebase to quickly find potential bugs. I am looking for references to, or examples of, bad code, or of frequently used code that can cause failures when written badly. With these I hope to build the tool's checks for bad code. Any help would be appreciated. Thank you. Regards, Suyash

1 month, 4 weeks


v18.2.2 Reef (hot-fix) released

by Yuri Weinstein

We're happy to announce the 2nd hotfix release in the Reef series. We recommend that users update to this release. For detailed release notes with links & changelog please refer to the official blog entry at https://ceph.io/en/news/blog/2024/v18-2-2-reef-released/

Notable Changes
---------------
* mgr/Prometheus: refine the orchestrator availability check to prevent crashes in the prometheus module during startup. Introduce additional checks to handle daemon_ids generated within the Rook environment, thus preventing potential issues during RGW metrics metadata generation.
  Related tracker: https://tracker.ceph.com/issues/64721

Getting Ceph
------------
* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph_18.2.2.orig.tar.gz
* Containers at https://quay.io/repository/ceph/ceph
* For packages, see https://docs.ceph.com/en/latest/install/get-packages/
* Release git sha1: 531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2

1 month, 4 weeks

