Dear Team,
My name is Jiachen Xu. I'm a junior Computer Science student at the University of Florida, and I'm interested in joining the Ceph project for Google Summer of Code. Although I am new to Ceph, I have experience with AWS and a strong foundation in data structures and operating systems. Beyond that, I am eager to grow and learn with the team, especially in areas not covered in the classroom.
I'm excited to learn and contribute to the Future of Storage with Ceph, and I look forward to the opportunity.
Sincerely,
Jiachen Xu
Dear Team,
I have a query regarding STS token refresh. With Ceph, when a token expires and we attempt any operation, we receive an "access denied" error. However, with other S3 providers such as Amazon, we receive the correct error code, "ExpiredToken". Is this discrepancy a known issue with Ceph, and if so, are there plans to address it? Returning "access denied" as the error code on token expiry seems incorrect.
Given the above scenario, please let us know how token refresh can be handled with Ceph.
Ajinkya Deshpande
Sr Software Engineer
NetBackup Engineering
Veritas Technologies LLC
Mobile: (7776835515)
Ajinkya.Deshpande(a)veritas.com
Currently, the error type of librbd is int, which contains an error code of system_error category.
Each function assigns its own meaning to the error codes,
and the same code can be used for different situations,
e.g. PreRemoveRequest returns EBUSY when an image has watchers,
on acquiring a lock, or when the image is being migrated.
Librados uses the same int as an error type,
however neorados uses boost::system::error_code.
Latest version of boost::system::error_code is {std::error_code, std::source_location*}.
And std::error_code is {int, error_category*} (in libstdc++ and libc++).
It is possible to use custom error types, internally,
however user-facing APIs have to use a popular type such as std or boost error_code,
otherwise users would have to convert error types.
std::error_code is the standard error type, and boost::system::error_code is convertible to it.
Alternative approaches to error handling, such as Boost.LEAF,
Boost.Outcome and the corresponding std::error proposal (P1028),
are all compatible with std(boost)::error_code.
As std::error_code is a pair of an int and a pointer, it fits into two registers (eax:rdx),
so changing the result type from int to std::error_code basically adds one instruction:
    std::error_code f() {
      return SomeErrEnum::Code1;
    }
with properly defined std::is_error_code_enum<SomeErrEnum> trait,
and a SomeErrCategory type with a corresponding error category global value,
compiles to
    f():
      mov edx, OFFSET some_category_var
      mov eax, 1
      ret
Full code - https://godbolt.org/z/K7vv5h8oo
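For readers without the link at hand, a minimal self-contained version of the pieces mentioned above might look like the following. This is only a sketch and not necessarily identical to the linked code: the enum values, the category name and the messages are assumptions chosen for illustration.

```cpp
#include <string>
#include <system_error>
#include <type_traits>

// Hypothetical error enum; 0 is conventionally left to mean "no error".
enum class SomeErrEnum { Code1 = 1, Code2 = 2 };

// Hypothetical category that names and describes SomeErrEnum values.
class SomeErrCategory final : public std::error_category {
public:
  const char* name() const noexcept override { return "some_err"; }
  std::string message(int ev) const override {
    switch (static_cast<SomeErrEnum>(ev)) {
      case SomeErrEnum::Code1: return "code 1";
      case SomeErrEnum::Code2: return "code 2";
    }
    return "unknown";
  }
};

// Namespace-scope category object, so that no function-local
// "static" guard variable is involved when it is referenced.
const SomeErrCategory some_category_var{};

// Opt the enum in to implicit conversion to std::error_code.
namespace std {
template <>
struct is_error_code_enum<SomeErrEnum> : true_type {};
}  // namespace std

// Found via ADL by the std::error_code converting constructor.
inline std::error_code make_error_code(SomeErrEnum e) noexcept {
  return {static_cast<int>(e), some_category_var};
}

std::error_code f() {
  return SomeErrEnum::Code1;
}
```

A caller can then compare the result directly against the enum, e.g. `if (f() == SomeErrEnum::Code1) { ... }`.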
Although there are corner cases, such as the surprisingly inefficient
error_category& cat() { static SomeErrCategory c; return c; }
where "static" requires branching on a guard variable,
it is possible to keep the overhead to a single added instruction for the error category.
Back to librbd.
It is indeed not possible to simply change its C++ API from returning int to error_code.
Also, we cannot add overloads that differ only in return type; we'd have to change the arguments as well.
Passing error_code as an output parameter works:
int flatten(); // current API
void flatten(std::error_code& ec); // alternative API
but it doesn't look great, ideally we'd rather use
"std::expected<T, error_code> f()" and not
"void f(T& out, error_code& ec)".
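To make the output-parameter variant concrete, here is a minimal sketch of how such an overload could wrap the existing int-returning call. The Image class below is a hypothetical stand-in, not the real librbd::Image, and mapping negative errno values to std::generic_category() is an assumption made for illustration.

```cpp
#include <cerrno>
#include <system_error>

// Hypothetical stand-in for an image handle; NOT the real librbd::Image.
class Image {
public:
  explicit Image(bool busy) : busy_(busy) {}

  // Current style: 0 on success, negative errno on failure.
  int flatten() { return busy_ ? -EBUSY : 0; }

  // Alternative style: report the outcome through an output parameter.
  // Assumption: negative errno values map onto std::generic_category().
  void flatten(std::error_code& ec) {
    const int r = flatten();
    if (r < 0) {
      ec.assign(-r, std::generic_category());
    } else {
      ec.clear();
    }
  }

private:
  bool busy_;  // simulates a condition under which flatten() fails
};
```

With this, a caller can write `std::error_code ec; img.flatten(ec); if (ec == std::errc::device_or_resource_busy) { ... }` without touching the existing int overload.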
We could also add an asynchronous API like the one in neorados,
with CompletionTokens and callbacks that receive an error_code.
However, much of the librbd code is synchronous (Operations),
so rewriting it as async code would needlessly complicate things.
Is there a way to make librbd return std::error_code,
so that we could use different error categories, enums, for more distinct error codes?
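As an illustration of what "more distinct error codes" could mean for the EBUSY example above, the sketch below defines a dedicated enum whose values all remain equivalent to the generic "busy" condition, so callers that only check for EBUSY keep working. All names here (RemoveErrc, RemoveErrCategory, remove_category_var) are hypothetical and do not exist in librbd.

```cpp
#include <string>
#include <system_error>
#include <type_traits>

// Hypothetical enum separating the three situations that share EBUSY today.
enum class RemoveErrc { HasWatchers = 1, LockNotAcquired, MigrationInProgress };

// Hypothetical category for RemoveErrc.
class RemoveErrCategory final : public std::error_category {
public:
  const char* name() const noexcept override { return "rbd_remove"; }
  std::string message(int ev) const override {
    switch (static_cast<RemoveErrc>(ev)) {
      case RemoveErrc::HasWatchers: return "image has watchers";
      case RemoveErrc::LockNotAcquired: return "could not acquire lock";
      case RemoveErrc::MigrationInProgress: return "image is being migrated";
    }
    return "unknown";
  }
  // Map every known code to the generic "busy" condition, so that
  // a comparison against std::errc::device_or_resource_busy still matches.
  std::error_condition default_error_condition(int ev) const noexcept override {
    switch (static_cast<RemoveErrc>(ev)) {
      case RemoveErrc::HasWatchers:
      case RemoveErrc::LockNotAcquired:
      case RemoveErrc::MigrationInProgress:
        return std::make_error_condition(std::errc::device_or_resource_busy);
    }
    return std::error_condition(ev, *this);
  }
};

const RemoveErrCategory remove_category_var{};

namespace std {
template <>
struct is_error_code_enum<RemoveErrc> : true_type {};
}  // namespace std

inline std::error_code make_error_code(RemoveErrc e) noexcept {
  return {static_cast<int>(e), remove_category_var};
}
```

A caller that cares about the exact cause can compare against RemoveErrc::HasWatchers, while one that only cares about "busy" can compare against std::errc::device_or_resource_busy.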
If you are using CephFS on Pacific v16.2.14(+), the MDS config
`defer_client_eviction_on_laggy_osds' is enabled by default. This
config is used to not evict cephfs clients if OSDs are laggy[1].
However, this can result in a single client holding up the MDS in
servicing other clients. To avoid this, please disable the config by
executing
ceph config set mds defer_client_eviction_on_laggy_osds false
This issue doesn't affect any Quincy or Reef releases. Furthermore,
the config is being disabled by default[2] while a fix is being
worked on.
[1]: https://tracker.ceph.com/issues/58023
[2]: https://tracker.ceph.com/issues/64685
--
Cheers,
Venky
All,
At the CLT today we discussed the proliferation of write/admin access
on the ceph repository. One of the consequences of this has been that
Ceph's code guidelines have not been followed in merges [1].
Additionally, having too many folks -- many of whom have retired from
active development -- with write access to the repository presents
security concerns.
With the CLT's support, I have addressed this by pruning write/admin
access to the Ceph repository to only these GitHub teams:
- https://github.com/orgs/ceph/teams/ceph-maintainers
- https://github.com/orgs/ceph/teams/ceph-release-team
- https://github.com/orgs/ceph/teams/admins
- https://github.com/orgs/ceph/people?query=role%3Aowner
"ceph-maintainers" is a new team that includes component team leads
and senior Ceph engineers. If you feel you should be in this list and
were missed (sorry!), please reply to this mail.
"ceph-release-team" is a new team that includes folks working on Ceph
releases, right now Yuri.
"admins" is an extant team that includes members who help administrate
the Ceph project.
The members of the Ceph org who are owners have write/admin privileges
regardless of team organization. I've included that for completeness.
Anyone not in these aforementioned teams will (should) be unable to
push to ceph.git [2]. Please coordinate with your component team lead
for merging your changes.
[1] https://github.com/ceph/ceph/blob/main/SubmittingPatches.rst
[2] https://github.com/ceph/ceph/settings/access
--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
Hello,
Topics discussed:
- a noticeable backlog of "make check" jobs and shaman builds
(>6 hours)
- mostly self-inflicted because folks have been retriggering "make
check" a lot recently in the hopes of working around a number of
transient failures
- even more so if the pool of Jenkins workers used for "make check"
and shaman builds is the same
- Laura will confirm in the infra meeting
- Patrick will downgrade github org owners that aren't active to
regular members
- proposal to prune down the list of individuals with write access to
ceph.git repo (Patrick)
- component leads and long-time senior contributors only
- the goal is to enforce our SubmittingPatches.rst rules better
- also some additional security
- concerns over fractured issue tracking
- the most recent example is ceph-nvmeof using GitHub issues, but
there is also nvmeof subproject on tracker.ceph.com
- other notable examples are ceph-csi and go-ceph, although there
hasn't been anything on tracker.ceph.com for these
- conclusion: GitHub issues or any other issue tracking system is
fine as long as there is no tight coupling to ceph.git
- question: does a repo being brought in as a submodule as is the
case with ceph-nvmeof in https://github.com/ceph/ceph/pull/54671
constitute tight coupling?
- in this case the submodule is intended to bring in two .proto
files for use by NVMeofGwMonitorClient daemon, everything else in
ceph-nvmeof repo shouldn't be looked at
- is this the best way to do that -- could these files just be
copied?
- dashboard already carries a copy of one of them
(src/pybind/mgr/dashboard/services/proto/gateway.proto)
- creating a PyPI package is another option
- QA nightlies
- now that they are back, need to ensure they are looked at!
- poll among component leads: is status quo where all results go to
ceph-qa list OK or do we need to have teuthology email people/teams
directly?
- let's tally up next week
Topics moved to next week:
- 19.1.0 status
- Trello to limit free workspaces to 10 collaborators
- need a replacement for Yuri's board at the very least
Thanks,
Ilya
Hi there.
I am creating a static analysis tool using clang-tidy for the Ceph codebase
to quickly find potential bugs.
I am looking for references to, or examples of, bad code, or of frequently
used code that can cause failures when written badly.
I hope to use these to build checks for such code.
Any help would be appreciated.
Thank you.
Regards,
Suyash