Hi everyone,
CDM (APAC) is happening tomorrow, March 6th at 9:00pm ET. See more meeting
details below.
Please add any topics you'd like to discuss to the agenda:
https://tracker.ceph.com/projects/ceph/wiki/CDM_06-MAR-2024
Thanks,
Laura Flores
Meeting link:
https://meet.jit.si/ceph-dev-monthly
Time conversions:
UTC: Thursday, March 7, 2:00 UTC
Mountain View, CA, US: Wednesday, March 6, 18:00 PST
Phoenix, AZ, US: Wednesday, March 6, 19:00 MST
Denver, CO, US: Wednesday, March 6, 19:00 MST
Huntsville, AL, US: Wednesday, March 6, 20:00 CST
Raleigh, NC, US: Wednesday, March 6, 21:00 EST
London, England: Thursday, March 7, 2:00 GMT
Paris, France: Thursday, March 7, 3:00 CET
Helsinki, Finland: Thursday, March 7, 4:00 EET
Tel Aviv, Israel: Thursday, March 7, 4:00 IST
Pune, India: Thursday, March 7, 7:30 IST
Brisbane, Australia: Thursday, March 7, 12:00 AEST
Singapore, Asia: Thursday, March 7, 10:00 +08
Auckland, New Zealand: Thursday, March 7, 15:00 NZDT
--
Laura Flores
She/Her/Hers
Software Engineer, Ceph Storage <https://ceph.io>
Chicago, IL
lflores@ibm.com | lflores@redhat.com
M: +17087388804
We're happy to announce the 15th, and expected to be the last,
backport release in the Pacific series.
https://ceph.io/en/news/blog/2024/v16-2-15-pacific-released/
Notable Changes
---------------
* `ceph config dump --format <json|xml>` output will display the localized
option names instead of their normalized version. For example,
"mgr/prometheus/x/server_port" will be displayed instead of
"mgr/prometheus/server_port". This matches the output of the non pretty-print
formatted version of the command.
* CephFS: MDS evicts clients that are not advancing their request tids, which
causes a large buildup of session metadata, resulting in the MDS going
read-only due to the RADOS operation exceeding the size threshold. The
`mds_session_metadata_threshold` config controls the maximum size to which
the (encoded) session metadata can grow.
* RADOS: The `get_pool_is_selfmanaged_snaps_mode` C++ API has been deprecated
due to its susceptibility to false negative results. Its safer replacement is
`pool_is_in_selfmanaged_snaps_mode`.
* RBD: When diffing against the beginning of time (`fromsnapname == NULL`) in
fast-diff mode (`whole_object == true` with `fast-diff` image feature enabled
and valid), diff-iterate is now guaranteed to execute locally if exclusive
lock is available. This brings a dramatic performance improvement for QEMU
live disk synchronization and backup use cases (a sketch follows below).
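For illustration, a minimal sketch of the diff-iterate case described in the
RBD note above, using the librbd C++ API; the pool and image names are
placeholders and error handling is abbreviated:

    // Minimal sketch: full-image diff-iterate (fromsnapname == NULL) with
    // whole_object == true, the case the release note above refers to.
    // Pool/image names are placeholders; error handling is abbreviated.
    #include <rados/librados.hpp>
    #include <rbd/librbd.hpp>
    #include <iostream>

    static int diff_cb(uint64_t offset, size_t len, int exists, void *arg) {
      // Called once per extent; 'exists' is non-zero for allocated data.
      auto *used = static_cast<uint64_t *>(arg);
      if (exists)
        *used += len;
      return 0;
    }

    int main() {
      librados::Rados cluster;
      cluster.init("admin");                 // client id is an assumption
      cluster.conf_read_file(nullptr);       // read the default ceph.conf
      if (cluster.connect() < 0)
        return 1;

      librados::IoCtx io_ctx;
      cluster.ioctx_create("rbd", io_ctx);   // pool name is a placeholder

      librbd::RBD rbd;
      librbd::Image image;
      rbd.open(io_ctx, image, "test-image"); // image name is a placeholder

      librbd::image_info_t info;
      image.stat(info, sizeof(info));

      uint64_t used = 0;
      // fromsnapname == NULL -> diff against the beginning of time;
      // whole_object == true -> fast-diff mode when the feature is enabled.
      int r = image.diff_iterate2(nullptr, 0, info.size,
                                  true /* include_parent */,
                                  true /* whole_object */,
                                  diff_cb, &used);
      if (r < 0)
        std::cerr << "diff_iterate2 failed: " << r << std::endl;
      else
        std::cout << "allocated bytes: " << used << std::endl;

      image.close();
      cluster.shutdown();
      return r < 0 ? 1 : 0;
    }

Build against the librados and librbd development headers and link with
-lrados -lrbd.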
Getting Ceph
------------
* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph-16.2.15.tar.gz
* Containers at https://quay.io/repository/ceph/ceph
* For packages, see https://docs.ceph.com/en/latest/install/get-packages/
* Release git sha1: 618f440892089921c3e944a991122ddc44e60516
Hello all,
We are using Ceph as the storage backend for some cloud research which involves offloading functions to storage nodes to benefit from near-storage processing. We are using rados_exec to achieve this: we attempt to call a class method on the object, which then executes the function locally. However, we have been running into an issue where rados_exec fails with EIO; the request never reaches the storage node, and the method is never called.
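For reference, a minimal sketch of the call pattern described above, using the
librados C++ API (IoCtx::exec is the C++ counterpart of rados_exec); the pool,
object, class, and method names here are placeholders:

    // Minimal sketch of invoking an OSD object class method, the C++
    // equivalent of rados_exec(). All names below are placeholders.
    #include <rados/librados.hpp>
    #include <cstring>
    #include <iostream>

    int main() {
      librados::Rados cluster;
      cluster.init("admin");            // client id is an assumption
      cluster.conf_read_file(nullptr);  // default ceph.conf
      if (cluster.connect() < 0)
        return 1;

      librados::IoCtx io_ctx;
      cluster.ioctx_create("mypool", io_ctx);   // placeholder pool

      librados::bufferlist in, out;
      const char payload[] = "function-args";   // placeholder input payload
      in.append(payload, sizeof(payload) - 1);

      // Calls method "run" of object class "mycls" on object "myobj";
      // a negative return (e.g. -EIO) means the op failed on the OSD side.
      int r = io_ctx.exec("myobj", "mycls", "run", in, out);
      if (r < 0)
        std::cerr << "exec failed: " << r << std::endl;
      else
        std::cout << "exec returned " << out.length() << " bytes" << std::endl;

      cluster.shutdown();
      return r < 0 ? 1 : 0;
    }

Note that the exec call is handled by the primary OSD for the object, so the
object class plugin needs to be present and loadable on every OSD host.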
Upon debugging this, I have noticed that if I re-put the same object with a different key, it works (provided it is on a different OSD). It appears that the OSD cannot serve rados_exec requests.
This bug happens under a few conditions:
1. If we invoke the function before uploading it.
2. Non-deterministically, when the OSD is under load.
I cannot seem to debug it for the life of me, and the only thing I have to go on is that the OSDs cannot serve requests. I have attempted to remove the object from the pool and put it back with the same key, and it does exactly the same thing.
Any advice on where to get started / help debugging this would be greatly appreciated, as my thesis depends on it. (Any request to OSD.0 or OSD.1 fails.) 🙁
Donald
Good morning,
I am trying to understand how Ceph snapshots work.
I have read that snapshots are COW (copy-on-write), which means, if I understand
correctly, that when a new write updates an existing block on a volume, the "old"
block is copied to the snapshot before it is overwritten on the original volume.
Am I right?
So, I created a volume, say 10 GB in size, empty, and then created a snapshot.
Coming to my doubt: why is the snapshot 10 GB in size? It should be 0, because
no new writes or updates were done. Am I right?
Thank you VERY much for your time.
Hi folks,
Today we discussed:
- [casey] on dropping ubuntu focal support for squid
- Discussion thread:
https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/ONAWOAE7MPMT7CP6KH…
- Quincy doesn't build jammy packages, so quincy->squid upgrade tests
have to run on focal
- proposing to add jammy packages for quincy to enable that upgrade path
(from 17.2.8+)
- https://github.com/ceph/ceph-build/pull/2206
- Need to indicate that Quincy clusters must upgrade to jammy before
upgrading to Squid.
- T release name: https://pad.ceph.com/p/t
- Tentacle wins!
- Patrick to do release kick-off
- Cephalocon news?
- Planning is in progress; no news as knowledgeable parties not present
for this meeting.
- Volunteers for compiling the Contributor Credits?
-
https://tracker.ceph.com/projects/ceph/wiki/Ceph_contributors_list_maintena…
- Laura will give it a try.
- Plan for tagged vs. named Github milestones?
- Continue using priority order for qa testing: exhaust testing on
tagged milestone, then go to "release" catch-all milestone
- v18.2.2 hotfix release next
- Reef HEAD is still cooking with to-be-addressed upgrade issues.
- v19.1.0 (first Squid RC)
- two rgw features still waiting to go into squid
- cephfs quiesce feature to be backported
- Nightlies crontab to be updated by Patrick.
- V19.1.0 milestone: https://github.com/ceph/ceph/milestone/21
--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
Hello everyone,
As part of the nvmeof monitor PR, a new dependency on gRPC/C++ was introduced to the Ceph codebase.
There are no issues with CentOS 9; however, on Ubuntu we stumbled upon an unexpected issue. We declared gRPC as a Ceph .deb package build dependency under debian/control, but the Ubuntu gRPC packages do not include CMake files; see https://github.com/grpc/grpc/issues/29977
For instance, make check runs are failing with "CMake Error at src/CMakeLists.txt:900 (find_package): gRPC". This issue renders the Ubuntu packages unusable.
Could anyone assist with fixing the Ubuntu grpc packages?
Thank you,
~baum
Redouane and Avan came to me with an issue with RGW-related metrics that
warrants a broader community discussion for all daemons. For more
information, the issue is being tracked by
https://tracker.ceph.com/issues/64598
Currently, the RGW-related metrics consumed by Prometheus are generated by
combining two parts:
1. The RGW perf counters: these counters are generated by the ceph-exporter
by parsing the output of the rgw command `ceph counter dump`.
2. The RGW metadata (daemon, ceph-version, hostname, etc): this information
is generated by the prometheus mgr module.
To combine the two parts, ceph-exporter uses a key field called instance_id,
which is generated as follows:
1. On the ceph-exporter side, the asok (admin socket) filename is parsed to
extract the daemon_id, from which the instance_id is derived (see the sketch
after this list).
2. On the prometheus-mgr module side, the orchestrator (cephadm or Rook) is
called to get the daemon_id, and the instance_id is then derived from it.
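As a purely illustrative sketch of why filename-based derivation is fragile:
the naming pattern, helper name, and paths below are assumptions made for
illustration, not the actual ceph-exporter logic.

    // Illustrative only: deriving a daemon id by parsing an admin socket
    // filename. The "ceph-<name>.asok" pattern assumed here is a
    // simplification; the real ceph-exporter code may differ. The point is
    // that any change in how the orchestrator names the socket file breaks
    // the derived id.
    #include <filesystem>
    #include <iostream>
    #include <string>

    static std::string daemon_id_from_asok(const std::filesystem::path &asok) {
      std::string name = asok.filename().string(); // e.g. "ceph-client.rgw.foo.asok"
      const std::string prefix = "ceph-";
      const std::string suffix = ".asok";
      if (name.size() > prefix.size() + suffix.size() &&
          name.compare(0, prefix.size(), prefix) == 0 &&
          name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0) {
        return name.substr(prefix.size(),
                           name.size() - prefix.size() - suffix.size());
      }
      return ""; // pattern mismatch -> no id, metrics lose their join key
    }

    int main() {
      // The same daemon with two hypothetical socket names yields two
      // different derived ids, so the join on instance_id fails.
      std::cout << daemon_id_from_asok("/var/run/ceph/ceph-client.rgw.foo.asok") << "\n";
      std::cout << daemon_id_from_asok("/var/run/ceph/client.rgw.foo.sock") << "\n";
      return 0;
    }

This is the fragility that issues 2 and 3 below refer to.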
This approach/design suffers from the following issues:
1. It creates a strong dependency between the prometheus-mgr module and the
orchestrator module (this has already caused issues for Rook environments;
ceph v18.2.1 metrics are completely broken because of this).
2. instance_id management on the ceph-exporter side is weak, as it relies on
socket filename parsing.
3. instance_id generation is error-prone, as it relies on how daemon_ids are
handled by the orchestrator module (which differs between Rook and cephadm).
The issue for RGW is that with certain orchestrators, for example in Rook,
there is a mismatch between the instance IDs for the metrics emitted by the
exporter and the metrics from the prometheus manager module.
This has ramifications for Prometheus queries in which the instance id is the
key used to join the metrics.
There are many options for solutions, and I'd be happy to hear the
community's thoughts about what they think.
Here are ours (Avan, Redouane, and I):
1. We think daemon specific metrics meant for Prometheus should only be
emitted from one place, and that place should be the newer ceph-exporter.
2. We discussed having a command you can run on an admin socket that would
emit all of the metadata that is currently being sent by the manager
module. This way we're not relying on parsing file names anymore.
3. The prometheus-mgr module will still exist and will be used to emit
cluster-wide metrics.
The command could be something like `ceph who-am-i` that you would expect
to work on any daemon's admin socket, or something daemon-specific like
`ceph rgw-info`.
In other words, move the metadata source from the mgr-prometheus module to
the ceph-exporter and use this new command `ceph who-am-i` to get it. This
way, each ceph-daemon will be self-sufficient and able to provide the
metadata needed to label/tag its metrics.
At this moment this affects at least two daemons: rgw and rbd-mirror, but
following the approach above and by introducing the new generic command we
can follow the same pattern for other legacy (or new) daemons.
Looking forward to hearing other thoughts,
Ali, Redouane, Avan
Hi Ceph Developers,
Ceph is going to be part of Google Summer of Code
<https://summerofcode.withgoogle.com/> (GSoC) this summer.
If you have any ideas for intern projects, please add them to the pad below.
Projects are due by *Tuesday, March 12th*, and they will be added to
ceph.io as they come in.
https://pad.ceph.com/p/project-ideas
I will be reaching out to tech leads and other previous mentors within the
community over the next week.
Best,
Ali
Greetings!
I hope this message finds you well. I am writing to express my admiration for the innovative technology that your organization is working on for GSoC 2024. I have been following your work and find it truly amazing.
Currently, I am committed to another organisation and may not be able to contribute to your projects at this time. However, I noticed that your organisation sometimes faces a shortage of proposals. If such a situation arises, please consider this email as an expression of my interest.
I hold a strong interest in your organisation and would be more than willing to submit a comprehensive proposal should the need arise. I have also been an active contributor to open source projects and have decent programming experience.
Thank you. I look forward to the possibility of future interactions.
Best regards,
Akhilesh.