Hi all,
I wanted to call attention to some RGW issues that we've observed on a
Pacific cluster over the past several weeks. The problems relate to versioned
buckets and index entries that can be left behind after transactions complete
abnormally. The scenario is multi-faceted and we're still investigating some of
the details, but I wanted to provide a big-picture summary of what we've found
so far. It looks like most of these issues should be reproducible on versions
before and after Pacific as well. I'll enumerate the individual issues below:
1. PUT requests during reshard of a versioned bucket fail with 404 and leave
behind dark data
Tracker: https://tracker.ceph.com/issues/61359
2. When bucket index ops are cancelled, they can leave behind zombie index entries
The fix for this one was merged a few months ago and made it into the
v16.2.13 release, but in our case we had accumulated billions of extra index
entries by the time we upgraded to the patched version.
Tracker: https://tracker.ceph.com/issues/58673
3. Issuing a delete for a key that already has a delete marker as the current
version leaves behind index entries and OLH objects
Note that the tracker's original description frames the problem a bit
differently, but I've clarified the nature of the issue in a comment.
Tracker: https://tracker.ceph.com/issues/59663
The extra index entries and OLH objects left behind by these sorts of issues
obviously waste space, but we've found that they can also cause severe
performance degradation for bucket listings, lifecycle processing, and,
indirectly through the higher OSD latencies they induce, other ops.
The reason for the performance impact is that bucket listing calls must
repeatedly perform additional OSD ops until they find the requisite number
of entries to return. The OSD cls method for bucket listing also does its own
internal iteration for the same purpose. Since these entries are invalid, they
are skipped. In the case we observed, where some of our bucket indexes were
filled with a sea of contiguous leftover entries, the process of continually
iterating over and skipping invalid entries caused enormous read amplification.
I believe that the following tracker describes symptoms related to the same
issue: https://tracker.ceph.com/issues/59164.
Note that this can also cause LC processing to repeatedly fail in cases where
there are enough contiguous invalid entries, since the OSD cls code eventually
gives up and returns an error that isn't handled.
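For anyone wondering whether a bucket is affected, a rough first check is to
dump the bucket index and count entries by type. This is only a sketch
(radosgw-admin bi list and jq are the real tools, but deciding which entries
are actually leftovers still requires comparing against the live objects;
<bucket> is a placeholder):
radosgw-admin bi list --bucket=<bucket> > bi.json
# tally index entries by type; a sea of olh/instance entries with few
# corresponding current objects can be a symptom of the issues above
jq -r '.[].type' bi.json | sort | uniq -c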
The severity of these issues likely varies greatly based upon client behavior.
If anyone has experienced similar problems, we'd love to hear about the nature
of how they've manifested for you so that we can be more confident that we've
plugged all of the holes.
Thanks,
Cory Snyder
11:11 Systems
If you are a Ceph developer, you should familiarize yourself with this
core infrastructure service [1]. Included are instructions for
mounting these CephFS file systems on your development machines.
[1] https://wiki.sepia.ceph.com/doku.php?id=services:cephfs
--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
Hi everyone,
This month's Ceph User + Dev Monthly meetup is on June 15, 14:00-15:00 UTC.
We'd love to share details about the first Reef release candidate. Please
feel free to add more topics to
https://pad.ceph.com/p/ceph-user-dev-monthly-minutes.
Hope to see you there!
Thanks,
Neha
Hi everyone,
This is the first release candidate for Reef.
The Reef release comes with a new RocksDB version (7.9.2) [0], which
incorporates several performance improvements and features. Our internal
testing doesn't show any side effects from the new version, but we are very
eager to hear community feedback on it. This is the first release to have
the ability to tune RocksDB settings per column family [1], which allows for
more granular tunings to be applied to different kinds of data stored in
RocksDB. A new set of settings is used in Reef to optimize performance for
most kinds of workloads; it carries a slight penalty in some cases, but that
is outweighed by large improvements, in terms of compactions and write
amplification, in use cases such as RGW. We would highly encourage community
members to give these a try against their performance benchmarks and use
cases. The detailed list of RocksDB and BlueStore changes can be
found in https://pad.ceph.com/p/reef-rc-relnotes.
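For anyone who wants to poke at these settings, the column-family layout and
per-CF overrides are carried in the bluestore_rocksdb_cfs option. A minimal
sketch for inspecting and staging a different spec (the option name is real,
<sharding-spec> is a placeholder, and note that changing the sharding of an
existing OSD requires an offline reshard with ceph-bluestore-tool rather
than a config change alone):
# show the sharding/tuning spec a running OSD is using
ceph config show osd.0 bluestore_rocksdb_cfs
# stage an alternative spec; applies to newly created or resharded OSDs
ceph config set osd bluestore_rocksdb_cfs '<sharding-spec>'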
If any of our community members would like to help us with performance
investigations or regression testing of the Reef release candidate, please
feel free to provide feedback via email or in
https://pad.ceph.com/p/reef_scale_testing. For more active discussions,
please use the #ceph-at-scale Slack channel in ceph-storage.slack.com.
Overall things are looking pretty good based on our testing. Please try it
out and report any issues you encounter. Happy testing!
Thanks,
Neha
Get the release from
* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph-18.1.0.tar.gz
* Containers at https://quay.io/repository/ceph/ceph
* For packages, see https://docs.ceph.com/en/latest/install/get-packages/
* Release git sha1: c2214eb5df9fa034cc571d81a32a5414d60f0405
[0] https://github.com/ceph/ceph/pull/49006
[1] https://github.com/ceph/ceph/pull/51821
The teuthology file system is where results are stored from QA runs.
You don't _have_ to login to the teuthology VM to access these test
artifacts. In fact, you can mount this file system from your laptop
with sepia VPN access or a dev machine (like vossi [1] or senta [2]).
For example:
pdonnell@vossi04 $ grep teuthology < /etc/fstab
172.21.2.201,172.21.2.202,172.21.6.108:/teuthology-archive /teuthology ceph name=teuthology-ro,secret=<redacted>,mds_namespace=teuthology,_netdev 0 2
This client.teuthology-ro credential can only read the file system. To
get the secret, login to vossi04.front.sepia.ceph.com to read the
unredacted /etc/fstab or email me directly.
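If you want to test before editing /etc/fstab, the equivalent one-off mount
(same monitors and options as the entry above; the secret stays redacted
here) would look something like:
sudo mkdir -p /teuthology
sudo mount -t ceph 172.21.2.201,172.21.2.202,172.21.6.108:/teuthology-archive /teuthology -o name=teuthology-ro,secret=<redacted>,mds_namespace=teuthology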
You may ask: why? Because it's usually much faster to access the file
system on another machine. The teuthology VM is often under heavy load
and memory pressure, so the file system cache is cold. Looking at
multi-GB test artifacts also uses up significant memory that's
primarily earmarked for running the teuthology workers.
[1] https://wiki.sepia.ceph.com/doku.php?id=hardware:vossi
[2] https://wiki.sepia.ceph.com/doku.php?id=hardware:senta
--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
Hello,
- call home functionality
- https://github.com/ceph/ceph/pull/51538 closed as a false start
- something like this is definitely useful but there is a concern
with making it a manager module since that could fracture cluster
data collection interfaces; also push vs pull/scrape model
- suggestion to take what is there in the exporter and run
a translation layer in a sidecar which could be managed by cephadm
as a third-party container to provide tighter integration; that way
any new metric is readily available to all users (David)
- not everything can be, or is suitable to be, a metric, though; there is
a variety of data sources (inventory data, one-off reports, etc.)
- there is a desire to generate and push "SOS report" blobs when
issues are encountered
- technical discussion to continue in other forums
- reef 18.1.0 RC
- https://github.com/ceph/ceph/pull/51900 is in
- gibba cluster was upgraded (twice?) last week, no blocker issues
- expecting to publish in the next day or two
- better tracking for kernel client backports
- currently manual, on a best-effort basis, by posting/editing
comments; there is no support structure in the ticket
- even this is often missed; it's not clear to users which release
the original patch landed in, let alone a backport
- proposal to add version drop-downs to "Linux kernel client"
sub-project and have it generate backport tickets the same way it's
done for most other sub-projects under "Ceph"
- only upstream LTS kernels are in scope for now
- resolving backport tickets can be automated with a new action that
watches for the respective commits showing up in the linux-stable.git
repo (a sketch of such a check follows these notes)
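For illustration only, the kind of check such an action might run: stable
backports are cherry-picks that cite the mainline sha1 in their commit
message ("commit <sha> upstream."), so a watcher can grep the stable repo's
history for the ticket's upstream sha1 (<upstream-sha> is a placeholder):
git -C linux-stable log --all --oneline --grep='<upstream-sha>'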
Thanks,
Ilya
Details of this release are summarized here:
https://tracker.ceph.com/issues/61515#note-1
Release Notes - TBD
Seeking approvals/reviews for:
rados - Neha, Radek, Travis, Ernesto, Adam King (we still have to
merge https://github.com/ceph/ceph/pull/51788 for
the core)
rgw - Casey
fs - Venky
orch - Adam King
rbd - Ilya
krbd - Ilya
upgrade/octopus-x - deprecated
upgrade/pacific-x - known issues, Ilya, Laura?
upgrade/reef-p2p - N/A
clients upgrades - not run yet
powercycle - Brad
ceph-volume - in progress
Please reply to this email with approval and/or trackers of known
issues/PRs to address them.
gibba upgrade was done and will need to be done again this week.
LRC upgrade TBD
TIA
Hi Cephers,
Do we have Ceph Developer Monthly this week? If so, we'd like to share some
progress on Ceph openEuler support.
--
*Best Regards*
*Kevin Zhao*
Tech Lead, LDCG Cloud Infra & Storage
Linaro Vertical Technologies
IRC(freenode): kevinz
Slack(kubernetes.slack.com): kevinz
kevin.zhao@linaro.org | Mobile/Direct/Wechat: +86 18818270915