October 2023 - Dev - lists.ceph.io

by Casey Bodley

hey Gal and Eric, in today's standup, we discussed the version of our apache arrow submodule. it's currently pinned at 6.0.1, which was tagged in nov. 2021. the centos9 builds are using the system package libarrow-devel-9.0.0. arrow's upstream recently tagged an 11.0.0 release. as far as i know, there still aren't any system packages for ubuntu, so we're likely to be stuck with the submodule for quite a while. how do you guys want to handle these updates? is it worth trying to update before the reef release?

ceph_assert() vs ceph_assert_always()

by Casey Bodley

Matt recently raised the issue of ceph assertions in production code, which reminded me of Sage's 2016 pr https://github.com/ceph/ceph/pull/9969 that added a ceph_assert_always(). the idea was to eventually make ceph_assert() conditional on NDEBUG to match the behavior of libc's assert(), leaving ceph_assert_always() as the marked unconditional case.

i would love to see this finally happen, but there are some potential risks:

* ceph_assert()s with side effects won't behave as expected in release builds. assert() documents this same issue at https://www.man7.org/linux/man-pages/man3/assert.3.html#BUGS. if we could at least identify these cases, we could switch them to ceph_assert_always()
* in teuthology, we test the same release builds that we ship to users. that means teuthology won't catch the code paths that trigger debug assertions. if those paths lead to crashes, they could be much harder to debug without the assertions and backtraces
* conversely, merging pull requests after a successful teuthology run may introduce new assertion failures in debug builds. it could be annoying for developers to track down and fix new assertions after pulling the latest main or stable release branch
* unused variable warnings in release builds where ceph_assert() was the only reference. at least the compiler will catch all of these for us, and [[maybe_unused]] annotations can clear them up

in general, do folks agree that this is a change worth making? if so, what can we do to mitigate the risks? if not, how should we handle the use of ceph_assert() vs raw assert() in new code? should there be some guidance in CodingStyle?

as a half-measure, we might introduce a new ceph_assert_debug() as an alternative to raw assert(), then convert some existing uses of ceph_assert() on a case-by-case basis
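A minimal C++ sketch of the proposed split, the side-effect pitfall from the first bullet, and the [[maybe_unused]] fix from the last one. The macros `SKETCH_ASSERT_ALWAYS` and `SKETCH_ASSERT_DEBUG` and the helpers `pop_front()`, `drop_one_unsafe()`, `drop_one_safe()` and `drop_one_always()` are hypothetical illustrations, not Ceph's actual `ceph_assert()` / `ceph_assert_always()` implementation; the only assumption carried over from the proposal is that the debug variant compiles out under `NDEBUG` exactly like libc's `assert()`.

```cpp
#include <cassert>  // libc assert(), which the debug variant below reuses
#include <cstdio>
#include <cstdlib>
#include <deque>

// Hypothetical stand-in for an always-on check (the role ceph_assert_always()
// plays in the proposal): evaluated in every build type.
#define SKETCH_ASSERT_ALWAYS(expr)                                  \
  do {                                                              \
    if (!(expr)) {                                                  \
      std::fprintf(stderr, "assertion failed: %s (%s:%d)\n", #expr, \
                   __FILE__, __LINE__);                             \
      std::abort();                                                 \
    }                                                               \
  } while (0)

// Hypothetical debug-only check: forwards to libc assert(), so it is
// compiled out entirely when NDEBUG is defined (release builds).
#define SKETCH_ASSERT_DEBUG(expr) assert(expr)

// Removes the front element; returns true if one was removed.
bool pop_front(std::deque<int>& q) {
  if (q.empty()) return false;
  q.pop_front();
  return true;
}

// Pitfall: the side effect lives inside the assertion, so a release
// build (NDEBUG) never pops the element at all.
void drop_one_unsafe(std::deque<int>& q) {
  SKETCH_ASSERT_DEBUG(pop_front(q));
}

// Fix 1: perform the side effect unconditionally, then check its result.
// [[maybe_unused]] silences the unused-variable warning when NDEBUG
// removes the only reference to `popped`.
void drop_one_safe(std::deque<int>& q) {
  [[maybe_unused]] bool popped = pop_front(q);
  SKETCH_ASSERT_DEBUG(popped);
}

// Fix 2: keep the side effect, but inside an always-on check.
void drop_one_always(std::deque<int>& q) {
  SKETCH_ASSERT_ALWAYS(pop_front(q));
}
```

`drop_one_safe()` and `drop_one_always()` shrink the queue in both debug and release builds, while `drop_one_unsafe()` silently skips the pop once `NDEBUG` is defined. Call sites shaped like `drop_one_unsafe()` are the ones that would need auditing before ceph_assert() could become debug-only.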

Debian packages for Reef - any chance of reviews / builds?

by Matthew Vernon

Hi,

When Reef was released, the announcement said that Debian packages would be built once the blocking bug in Bookworm was fixed. As I noted on the tracker item https://tracker.ceph.com/issues/61845 a couple of weeks ago, that is now the case after the most recent Bookworm point release.

I also opened a PR to make the minimal change that would build Reef packages on Bookworm[0]. I subsequently opened another PR to fix some low-hanging fruit in terms of packaging errors - missing #! in maintscripts, syntax errors in debian/control, erroneous dependencies on Essential packages[1]. Neither PR has had any feedback/review as far as I can see.

Those packages (and the previous state of the debian/ tree) had some significant problems - no copyright file, and some of them contain python scripts without declaring a python dependency, so I've today submitted a slightly larger PR that brings the dh compatibility level up to what I think the latest lowest-common-denominator level is, as well as fixing these errors[2].

I believe these changes all ought to go into the reef branch, but obviously you might prefer to just make the bare-minimum-to-build change in the first PR.

Is there any chance of having some reef packages for Bookworm, please?

Relatedly, is there interest in further packaging fixes for future branches? lintian still has quite a lot to say about the .debs for Ceph, and while you might reasonably not want to care about crossing every t of Debian policy, I think there are still changes that would be worth doing...

I should declare a bit of an interest here - I'd like to evaluate cephadm for work use, which would require us to be able to build our own packages per local policy[3], which in turn would mean I'd want to get Debian-based images going again. But that requires Reef .debs being available to install onto said images :)

Thanks,

Matthew

[0] https://github.com/ceph/ceph/pull/53342
[1] https://github.com/ceph/ceph/pull/53397
[2] https://github.com/ceph/ceph/pull/53546
[3] https://wikitech.wikimedia.org/wiki/Kubernetes/Images#Production_images

Ceph Developer Monthly happening tomorrow, November 1st

by Laura Flores

Hi everyone,

CDM is happening tomorrow, November 1st at 9:00 PM EDT. See more meeting details below.

Please add any topics you'd like to discuss to the agenda: https://tracker.ceph.com/projects/ceph/wiki/CDM_01-NOV-2023

Thanks,
Laura Flores

Meeting link: https://meet.jit.si/ceph-dev-monthly

Time conversions:
UTC: Thursday, November 2, 1:00 UTC
Mountain View, CA, US: Wednesday, November 1, 18:00 PDT
Phoenix, AZ, US: Wednesday, November 1, 18:00 MST
Denver, CO, US: Wednesday, November 1, 19:00 MDT
Huntsville, AL, US: Wednesday, November 1, 20:00 CDT
Raleigh, NC, US: Wednesday, November 1, 21:00 EDT
London, England: Thursday, November 2, 1:00 GMT
Paris, France: Thursday, November 2, 2:00 CET
Helsinki, Finland: Thursday, November 2, 3:00 EET
Tel Aviv, Israel: Thursday, November 2, 3:00 IST
Pune, India: Thursday, November 2, 6:30 IST
Brisbane, Australia: Thursday, November 2, 11:00 AEST
Singapore, Asia: Thursday, November 2, 9:00 +08
Auckland, New Zealand: Thursday, November 2, 14:00 NZDT

--
Laura Flores
She/Her/Hers
Software Engineer, Ceph Storage <https://ceph.io>
Chicago, IL
lflores(a)ibm.com | lflores(a)redhat.com
M: +17087388804

v17.2.7 Quincy released

by Yuri Weinstein

We're happy to announce the 7th backport release in the Quincy series.

https://ceph.io/en/news/blog/2023/v17-2-7-quincy-released/

Notable Changes
---------------

* The `ceph mgr dump` command now displays the name of the Manager module that registered a RADOS client in the `name` field added to elements of the `active_clients` array. Previously, only the address of a module's RADOS client was shown in the `active_clients` array.

* mClock Scheduler: The mClock scheduler (default scheduler in Quincy) has undergone significant usability and design improvements to address the slow backfill issue. Some important changes are:

  * The 'balanced' profile is set as the default mClock profile because it represents a compromise between prioritizing client IO and prioritizing recovery IO. Users can then choose either the 'high_client_ops' profile to prioritize client IO or the 'high_recovery_ops' profile to prioritize recovery IO.

  * QoS parameters including reservation and limit are now specified in terms of a fraction (range: 0.0 to 1.0) of the OSD's IOPS capacity.

  * The cost parameters (osd_mclock_cost_per_io_usec_* and osd_mclock_cost_per_byte_usec_*) have been removed. The cost of an operation is now determined using the random IOPS and maximum sequential bandwidth capability of the OSD's underlying device.

  * Degraded object recovery is given higher priority than misplaced object recovery, because degraded objects present a data safety issue not present with objects that are merely misplaced. Therefore, backfilling operations with the 'balanced' and 'high_client_ops' mClock profiles may progress more slowly than they did with the 'WeightedPriorityQueue' (WPQ) scheduler.

  * The QoS allocations in all mClock profiles are optimized based on the above fixes and enhancements.

  * For more detailed information see: https://docs.ceph.com/en/quincy/rados/configuration/mclock-config-ref/

* RGW: S3 multipart uploads using Server-Side Encryption now replicate correctly in multi-site. Previously, the replicas of such objects were corrupted on decryption. A new tool, ``radosgw-admin bucket resync encrypted multipart``, can be used to identify these original multipart uploads. The ``LastModified`` timestamp of any identified object is incremented by 1 nanosecond to cause peer zones to replicate it again. For multi-site deployments that make any use of Server-Side Encryption, we recommend running this command against every bucket in every zone after all zones have upgraded.

* CephFS: The MDS evicts clients that are not advancing their request tids. Such clients cause a large buildup of session metadata, which can result in the MDS going read-only when the RADOS operation exceeds the size threshold. The `mds_session_metadata_threshold` config option controls the maximum size that (encoded) session metadata can grow to.

* CephFS: After recovering a Ceph File System by following the disaster recovery procedure, the recovered files under the `lost+found` directory can now be deleted.

Getting Ceph
------------

* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph-17.2.7.tar.gz
* Containers at https://quay.io/repository/ceph/ceph
* For packages, see https://docs.ceph.com/en/latest/install/get-packages/
* Release git sha1: b12291d110049b2f35e32e0de30d70e9a4c060d2
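To make the fraction-based mClock QoS settings and the device-based cost model concrete, here is a small C++ sketch. The helpers `fraction_to_iops()`, `bandwidth_cost_per_io()` and `op_cost()` are hypothetical, not Ceph functions, and the cost formula (operation size plus a per-IO overhead equal to sequential bandwidth divided by random IOPS) is our reading of the mClock config reference linked in the notes, so treat it as an approximation rather than the scheduler's exact code.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: converts an mClock QoS fraction (reservation or
// limit, documented range 0.0 to 1.0) into absolute IOPS for one OSD.
double fraction_to_iops(double fraction, double osd_iops_capacity) {
  // Clamp to the documented range so an out-of-range value can neither
  // exceed the OSD's capacity nor go negative.
  const double f = std::clamp(fraction, 0.0, 1.0);
  return f * osd_iops_capacity;
}

// Assumed per-IO overhead: the number of bytes the device could move
// sequentially in the time it takes to serve one random IO.
double bandwidth_cost_per_io(double max_seq_bytes_per_sec, double random_iops) {
  assert(random_iops > 0.0);  // guard against division by zero
  return max_seq_bytes_per_sec / random_iops;
}

// Assumed operation cost: its size in bytes plus the fixed per-IO
// overhead, so small random IOs are not treated as nearly free.
double op_cost(double op_bytes, double max_seq_bytes_per_sec,
               double random_iops) {
  return op_bytes + bandwidth_cost_per_io(max_seq_bytes_per_sec, random_iops);
}
```

For example, with an assumed capacity of 10000 IOPS, a reservation of 0.5 corresponds to 5000 IOPS. With an assumed sequential bandwidth of 100000000 bytes/s and 1000 random IOPS, the per-IO overhead is 100000 bytes, so a 4096-byte operation costs 4096 + 100000 = 104096 byte-equivalents under this model.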

Tracing configurations for ceph

by Александр Гуркин

Hi all.

I've been experimenting with tracing configurations for ceph from the docs, and it seems like they don't work as described.

There is an option using Jaeger, described in the documentation: https://docs.ceph.com/en/latest/jaegertracing/#jaeger-distributed-tracing/. Unfortunately, at this time only a few spans are left inside the traces, and there is no end-to-end tracing between components. This is not enough to be useful.

There is also an option using LTTng with Zipkin for visualization, described in the documentation: https://docs.ceph.com/en/latest/dev/blkin/#tracing-ceph-with-lttng. When the required compilation flags are added, the system stops functioning:

* With -DWITH_LTTNG=ON, a crash occurs while rados bench is running.
* With -DWITH_BLKIN=ON, the cluster cannot create a pool.
* With -DWITH_EVENTTRACE=ON, the application does not build at all.

Are there any plans to restore LTTng functionality? Are there any plans to improve Jaeger tracing? Is there any recommended way to use tracing in ceph today?

Thanks in advance.
Aleksandr Gurkin

Join us for the User + Dev Meeting, happening tomorrow!

by Laura Flores

Hi Ceph users and developers,

You are invited to join us at the User + Dev meeting tomorrow at 10:00 AM EDT! See below for more meeting details.

We have two guest speakers joining us tomorrow:

1. "CRUSH Changes at Scale" by Joshua Baergen, DigitalOcean
   In this talk, Joshua Baergen will discuss the problems that operators encounter with CRUSH changes at scale and how DigitalOcean built pg-remapper to control and speed up CRUSH-induced backfill.

2. "CephFS Management with Ceph Dashboard" by Pedro Gonzalez Gomez, IBM
   This talk will demonstrate new Dashboard behavior regarding CephFS management.

The last part of the meeting will be dedicated to open discussion. Feel free to add questions for the speakers or additional topics under the "Open Discussion" section on the agenda: https://pad.ceph.com/p/ceph-user-dev-monthly-minutes

If you have an idea for a focus topic you'd like to present at a future meeting, you are welcome to submit it to this Google Form: https://docs.google.com/forms/d/e/1FAIpQLSdboBhxVoBZoaHm8xSmeBoemuXoV_rmh4v… Any Ceph user or developer is eligible to submit!

Thanks,
Laura Flores

Meeting link: https://meet.jit.si/ceph-user-dev-monthly

Time conversions:
UTC: Thursday, October 19, 14:00 UTC
Mountain View, CA, US: Thursday, October 19, 7:00 PDT
Phoenix, AZ, US: Thursday, October 19, 7:00 MST
Denver, CO, US: Thursday, October 19, 8:00 MDT
Huntsville, AL, US: Thursday, October 19, 9:00 CDT
Raleigh, NC, US: Thursday, October 19, 10:00 EDT
London, England: Thursday, October 19, 15:00 BST
Paris, France: Thursday, October 19, 16:00 CEST
Helsinki, Finland: Thursday, October 19, 17:00 EEST
Tel Aviv, Israel: Thursday, October 19, 17:00 IDT
Pune, India: Thursday, October 19, 19:30 IST
Brisbane, Australia: Friday, October 20, 0:00 AEST
Singapore, Asia: Thursday, October 19, 22:00 +08
Auckland, New Zealand: Friday, October 20, 3:00 NZDT

--
Laura Flores
She/Her/Hers
Software Engineer, Ceph Storage <https://ceph.io>
Chicago, IL
lflores(a)ibm.com | lflores(a)redhat.com
M: +17087388804

quincy v17.2.7 QE Validation status

by Yuri Weinstein

Details of this release are summarized here:
https://tracker.ceph.com/issues/63219#note-2
Release Notes - TBD

Issue https://tracker.ceph.com/issues/63192 appears to be failing several runs. Should it be fixed for this release?

Seeking approvals/reviews for:

smoke - Laura
rados - Laura, Radek, Travis, Ernesto, Adam King
rgw - Casey
fs - Venky
orch - Adam King
rbd - Ilya
krbd - Ilya
upgrade/quincy-p2p - Known issue IIRC, Casey pls confirm/approve
client-upgrade-quincy-reef - Laura
powercycle - Brad pls confirm
ceph-volume - Guillaume pls take a look

Please reply to this email with approval and/or trackers of known issues/PRs to address them.

Josh, Neha - gibba and LRC upgrades -- N/A for quincy now after reef release.

Thx
YuriW

10/26/2023 perf meeting is on: Samsung to present on xNVMe

by Mark Nelson

Hi Folks,

Today we will have Simon A. F. Lund presenting on his work on xNVMe at Samsung! For a brief overview of the project, please see: https://xnvme.io/

Hope to see you there!

Etherpad: https://pad.ceph.com/p/performance_weekly
Meeting URL: https://meet.jit.si/ceph-performance

Mark

--
Best Regards,
Mark Nelson
Head of R&D (USA)

Clyso GmbH
p: +49 89 21552391 12
a: Loristraße 8 | 80335 München | Germany
w: https://clyso.com | e: mark.nelson(a)clyso.com

We are hiring: https://www.clyso.com/jobs/

Ceph Leadership Team notes 10/25

by Dan van der Ster

Hi all,

Here are this week's notes from the CLT:

* Collective review of the Reef/Squid "State of Cephalopod" slides.
* Smoke test suite was unscheduled, but it's back on now.
* Releases:
  * 17.2.7: was about to start building last week, but was delayed by a few issues (https://tracker.ceph.com/issues/63257, https://tracker.ceph.com/issues/63305, https://github.com/ceph/ceph/pull/54169). ceph_exporter test coverage will be prioritized.
  * 18.2.1: all PRs are in testing or merged.
* Ceph Board approved a new Foundation membership tier model: Silver, Gold, Platinum, Diamond. Working on implementation with LF.

-- dan
