Hi Folks,
We're going to be relaunching the user+dev meeting tomorrow, and I'm
double-booked during the perf meeting time slot, so let's cancel this
week and reconvene next week. Have a good week, folks!
Mark
--
Best Regards,
Mark Nelson
Head of R&D (USA)
Clyso GmbH
p: +49 89 21552391 12
a: Loristraße 8 | 80335 München | Germany
w: https://clyso.com | e: mark.nelson(a)clyso.com
We are hiring: https://www.clyso.com/jobs/
Hello,
A packed agenda today:
- User + Dev meeting relaunch happening tomorrow!
- https://ceph.io/en/news/blog/2023/user-dev-meeting-relaunch/
- https://pad.ceph.com/p/ceph-user-dev-monthly-minutes
- dropping CentOS/RHEL 8 and Ubuntu 20.04 for Squid (Casey)
- https://github.com/ceph/ceph/pull/53517
- CentOS 8 -> CentOS 9, Ubuntu 20.04 -> Ubuntu 22.04
- RHEL 8 facet would be dropped without replacement; the CephFS team
will take on adding RHEL 9 for testing if the need arises
- Debian packages for Reef
- lacking review on https://github.com/ceph/ceph/pull/53342
- it's a bare-minimum change, but we can't "just" merge it because it
may affect Ubuntu packages
- looks like further improvements are being consolidated in
https://github.com/ceph/ceph/pull/53546
- Sepia user cleanup/expiration policy (Patrick)
- currently ad-hoc, needs to be formally defined
- https://tracker.ceph.com/issues/62909
- using an Ansible playbook for this isn't ideal, LDAP anyone?
- https://tracker.ceph.com/issues/62908
- we can rely on tickets under the Infrastructure project more; they should
be scrubbed regularly now!
- backport release cadence
- very long gaps (six months from 17.2.5 to 17.2.6, five months
and counting from 17.2.6), folks end up using private builds in
some cases
- let's start by defining a 3-4 month cadence: it's still pretty
long but should be realistic, the key is to become predictable
- what resources are needed to reduce that time to ideally 6-8 weeks
- TBD, need Yuri in the room
- Ceph Quarterly to be published on Oct 2 (Zac)
Thanks,
Ilya
Hello, Ceph community.
I know that Ceph stores metadata in RocksDB.
So I studied RocksDB, and what I learned is as follows.
(https://github.com/facebook/rocksdb/wiki/Administration-and-Data-Access-Tool, https://docs.ceph.com/en/latest/man/8/ceph-bluestore-tool/, etc)
1. RocksDB SST files can be inspected using the ldb tool.
2. BlueStore allocates prefixes to its metadata and saves them per RocksDB column family.
3. Using the "ceph-kvstore-tool rocksdb [db path] list" command, we can check that object names exist in the SST files under prefix O (which I think is the prefix for objects); an example is below.
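For example, something like the following appears to work (the paths are just from my test environment, and I believe the OSD must be stopped before opening its DB):

  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 list O

With the bluestore-kv backend the tool opens the OSD directory directly, and "list O" prints only the keys under the O prefix; "list" with no prefix dumps every key.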
I am leaving some questions here because I want to learn more about Ceph and RocksDB. Here are my questions.
1. I saw the following in the code, but there are many parts that I do not understand.
// kv store prefixes
const string PREFIX_SUPER = "S"; // field -> value
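The same block in src/os/bluestore/BlueStore.cc seems to continue with more entries like these (copied here for reference; the exact set may differ between releases):

const string PREFIX_STAT = "T";  // field -> value (int64 array)
const string PREFIX_COLL = "C";  // collection name -> cnode_t
const string PREFIX_OBJ = "O";   // object name -> onode_t
const string PREFIX_OMAP = "M";  // u64 + keyname -> value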
What is the exact meaning of each prefix of BlueStore metadata? (e.g., O = object name)
2. The meaning of RocksDB SST file contents and how to interpret them.
In particular, I wonder how to read the key-value pairs in a data block.
3. It may be covered by the questions above, but is it possible to determine the physical location of an object through RocksDB?
Those are the questions that come to mind for now.
Have a good day, and thank you.
At the infrastructure meeting today, we decided on a course of action
for migrating the existing /home directory to CephFS. This is being
done for a few reasons:
- Alleviate load on the root file system device (which is also hosted
on the LRC via iSCSI)
- Avoid the disk-full scenarios we've regularly hit
- Improve recoverability in the event of teuthology corruption/catastrophe
- Get generally much better performance
- Potentially use it as a home file system on other Sepia resources
To effect this:
- The new "home" CephFS file system is mounted at /cephfs/home
- Each user's home /home/$USER has been or will be (again) rsync'd to
/cephfs/home/$USER
- Each user's account "home" (/etc/passwd) is being updated to /cephfs/home/$USER
- Each user's old home /home/$USER will be archived to /home/.archive/$USER
- A symlink will be placed in /home/$USER pointing to
/cephfs/home/$USER for compatibility with existing
(mis-)configurations (a rough sketch of these per-user steps is below).
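For the curious, the per-user steps are roughly equivalent to something like
this (illustrative only; the real procedure includes more checks):

  # final sync of the user's data to CephFS
  rsync -aH /home/$USER/ /cephfs/home/$USER/
  # point the account's home directory at CephFS in /etc/passwd
  usermod -d /cephfs/home/$USER $USER
  # archive the old home and leave a compatibility symlink behind
  mv /home/$USER /home/.archive/$USER
  ln -s /cephfs/home/$USER /home/$USER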
The main reason for not simply updating /home is to allow
administrators continued access to teuthology in the event of a
Ceph(FS) outage.
Most home directories have already been rsync'd as of 2 weeks ago. A
final rsync will be performed prior to each user's terminal migration.
In order to update a user's home directory, the user must be logged
out. Generally, no action needs to be taken, but I may kindly ask you to log
out of teuthology if necessary.
Thanks to Laura Flores, Venky Shankar, Yuri Weinstein, and Leonid Usov
for volunteering as guinea pigs for my early testing. They have
already been migrated. The rest of the users will be migrated in a few
days' time, incrementally.
--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
Hi.
Finally, we are nearing the completion of the GPFS and BeeGFS tests.
Unfortunately, we encountered several issues along the way, such as
significant disk failures, etc. We estimate that in two to three weeks,
we could start the Ceph installation. Are you still interested in trying
out some configurations and running tests, although they might be
different now due to the time delay?
Thanks,
Michal
On 7/31/23 21:10, Michal Strnad wrote:
> Hi!
>
> When we finish the GPFS and BeeGFS tests and move on to Ceph, as
> mentioned in my previous email, we can proceed with RocksDB testing,
> including its behavior under real workload conditions. Our users
> primarily utilize S3 and then RBD. Regarding S3, typical tools used
> include s3cmd, s5cmd, aws-cli, Veeam, Restic, Bacula, etc. For RBD
> images, the common scenario involves attaching the block device to their
> own server, encrypting it, creating a file system on top, and writing
> data according to their preferences. I just wanted to quickly describe
> to you the type of workload you can expect ...
>
> We thought of providing you with easy access to information about all
> clusters, including the one we are currently discussing, in the form of
> telemetry. I can enable the perf module to give you performance metrics
> and then the ident module, where I can set a defined email as an
> identifier. Do you agree?
>
> Thanks,
> Michal
>
>
>>> After an agreement, it will be possible to arrange some form of access
>>> to the machines, for example, by meeting via video conference and
>>> fine-tuning them together. Alternatively, we can also work on it through
>>> email, IRC, Slack, or any other suitable means.
>>>
>>>
>>> We are coordinating community efforts around such testing in
>>> #ceph-at-scale slack channel in ceph-storage.slack.com. I sent you an invite.
>>>
>>> Thanks,
>>> Neha
>>>
>>>
>>> Kind regards,
>>> Michal Strnad
>>>
>>>
>>> On 6/13/23 22:27, Neha Ojha wrote:
>>> > Hi everyone,
>>> >
>>> > This is the first release candidate for Reef.
>>> >
>>> > The Reef release comes with a new RocksDB version (7.9.2) [0], which
>>> > incorporates several performance improvements and features. Our internal
>>> > testing doesn't show any side effects from the new version, but we are
>>> > very eager to hear community feedback on it. This is the first release to
>>> > have the ability to tune RocksDB settings per column family [1], which
>>> > allows for more granular tunings to be applied to different kinds of data
>>> > stored in RocksDB. A new set of settings has been used in Reef to optimize
>>> > performance for most kinds of workloads with a slight penalty in some
>>> > cases, outweighed by large improvements in use cases such as RGW, in terms
>>> > of compactions and write amplification. We would highly encourage
>>> > community members to give these a try against their performance
>>> > benchmarks and use cases. The detailed list of changes in terms of RocksDB
>>> > and BlueStore can be found in https://pad.ceph.com/p/reef-rc-relnotes.
>>> >
>>> > If any of our community members would like to help us with performance
>>> > investigations or regression testing of the Reef release candidate,
>>> > please feel free to provide feedback via email or in
>>> > https://pad.ceph.com/p/reef_scale_testing. For more active discussions,
>>> > please use the #ceph-at-scale slack channel in ceph-storage.slack.com.
>>> >
>>> > Overall things are looking pretty good based on our testing. Please try
>>> > it out and report any issues you encounter. Happy testing!
>>> >
>>> > Thanks,
>>> > Neha
>>> >
>>> > Get the release from
>>> >
>>> > * Git at git://github.com/ceph/ceph.git
>>> > * Tarball at https://download.ceph.com/tarballs/ceph-18.1.0.tar.gz
>>> > * Containers at https://quay.io/repository/ceph/ceph
>>> > * For packages, see https://docs.ceph.com/en/latest/install/get-packages/
>>> > * Release git sha1: c2214eb5df9fa034cc571d81a32a5414d60f0405
>>> >
>>> > [0] https://github.com/ceph/ceph/pull/49006
>>> > [1] https://github.com/ceph/ceph/pull/51821
>>> > _______________________________________________
>>> > ceph-users mailing list -- ceph-users(a)ceph.io
>>> > To unsubscribe send an email to ceph-users-leave(a)ceph.io
>>>
>
--
Michal Strnad
Data Storage Department
CESNET z.s.p.o.
Now that Reef gives our users an upgrade path away from CentOS Stream
8 and Ubuntu 20.04, I propose that we drop support for those old
distros from the S release. If there's agreement here, we can stop
building/testing them on main ASAP.
We weren't targeting bullseye; once we discovered the compiler version
problem, the focus shifted to bookworm. If anyone would like to help
maintain Debian builds or look into these issues, it would be welcome:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1030129
https://tracker.ceph.com/issues/61845
On Mon, Aug 21, 2023 at 7:50 AM Matthew Darwin <bugs(a)mdarwin.ca> wrote:
> Thanks for the link to the issue. Any reason it wasn't added to the
> release notes (for bullseye)?
>
> I am also waiting for this to be available to start testing.
> On 2023-08-21 10:25, Josh Durgin wrote:
>
> There was difficulty building on bullseye due to the older version of GCC
> available: https://tracker.ceph.com/issues/61845
>
On Mon, Aug 21, 2023 at 3:01 AM Chris Palmer <chris.palmer(a)idnet.com> wrote:
>
>
> I'd like to try reef, but we are on debian 11 (bullseye).
> In the ceph repos, there is debian-quincy/bullseye and
> debian-quincy/focal, but under reef there is only focal & jammy.
>
> Is there a reason why there is no reef/bullseye build? I had thought
> that the blocker only affected debian-bookworm builds.
>
> Thanks, Chris
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
Dear Cephers,
Today brought us an eventful CTL meeting: it looks like Jitsi recently started
requiring user authentication
<https://jitsi.org/blog/authentication-on-meet-jit-si/> (anonymous users
will get a "Waiting for a moderator" modal), but authentication didn't work
against Google or GitHub accounts, so we had to move to the good old Google
Meet.
As a result of this, Neha has kindly set up a new private Slack channel
(#clt) to allow for quicker communication among CLT members (if you usually
attend the CLT meeting and have not been added, please ping any CLT member
to request that).
Now, let's move on to the important stuff:
*The latest Pacific Release (v16.2.14)*
*The Bad*
The 14th drop of the Pacific release has landed with a few hiccups:
- Some .deb packages were made available on downloads.ceph.com before
the release process was complete. Although this is not the first time this
has happened, we want to ensure it is the last, so we'd like to gather
ideas for improving the release publishing process. Neha encouraged everyone
to share ideas here:
- https://tracker.ceph.com/issues/62671
- https://tracker.ceph.com/issues/62672
- v16.2.14 also hit issues during the ceph-container stage. Laura
wanted to raise awareness of its current setbacks
<https://pad.ceph.com/p/16.2.14-struggles> and collect ideas to tackle
them:
- Enforce reviews and mandatory CI checks
- Rework the current approach to use simple Dockerfiles
<https://github.com/ceph/ceph/pull/43292>
- Call the Ceph community for help: ceph-container is currently
maintained part-time by a single contributor (Guillaume Abrioux). This
sub-project would benefit from the sound expertise on containers
among Ceph
users. If you have ever considered contributing to Ceph, but felt a bit
intimidated by C++, Paxos and race conditions, ceph-container is a good
place to shed your fear.
*The Good*
Not everything about v16.2.14 was going to be bleak: David Orman brought us
really good news. They tested v16.2.14 on a large production cluster
(10gbit/s+ RGW and ~13PiB raw) and found that it solved a major issue
affecting RGW in Pacific <https://github.com/ceph/ceph/pull/52552>.
*The Ugly*
During that testing, they noticed that ceph-mgr was occasionally OOM killed
(nothing new to 16.2.14, as it was previously reported). They already tried:
- Disabling modules (like the restful one, which was a suspect)
- Enabling debug 20
- Turning the pg autoscaler off
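For reference, those roughly correspond to commands along these lines (the
pool name is just a placeholder):
  ceph mgr module disable restful
  ceph config set mgr debug_mgr 20
  ceph osd pool set <pool> pg_autoscale_mode off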
Debugging will continue to characterize this issue:
- Enable profiling (Mark Nelson)
- Try Bloomberg's Python mem profiler
<https://github.com/bloomberg/memray> (Matthew Leonard)
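For anyone who wants to get familiar with memray before pointing it at
ceph-mgr, basic standalone usage is roughly the following (file names are
placeholders; attaching it to ceph-mgr's embedded Python interpreter is a
separate exercise):
  pip install memray
  python3 -m memray run -o output.bin some_script.py
  python3 -m memray flamegraph output.bin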
*Infrastructure*
*Reminder: Infrastructure Meeting Tomorrow, 11:30-12:30 Central Time*
Patrick brought up the following topics:
- Need to reduce the OVH spending ($72k/year, which is a sizable chunk of
the Ceph Foundation budget; that's a lot fewer avocado sandwiches for the
next Cephalocon):
- Move services (e.g.: Chacra) to the Sepia lab
- Re-use CentOS (and any spared/unused) machines for devel purposes
- Current Ceph sys admins are overloaded, so devel/community involvement
would be much appreciated.
- More to be discussed in tomorrow's meeting. Please join if you
think you can help solve/improve the Ceph infrastructure!
*BTW*: today's CDM will be canceled, since no topics were proposed.
Kind Regards,
Ernesto
Hi everyone, CDM is happening tomorrow @ 1:00 UTC. See more meeting details
below.
Please add any topics you'd like to discuss to the agenda:
https://tracker.ceph.com/projects/ceph/wiki/CDM_06-SEP-2023
Thanks,
Laura Flores
Meeting link:
https://meet.jit.si/ceph-dev-monthly
Time conversions:
UTC: Thursday, September 7, 1:00 UTC
Mountain View, CA, US: Wednesday, September 6, 18:00 PDT
Phoenix, AZ, US: Wednesday, September 6, 18:00 MST
Denver, CO, US: Wednesday, September 6, 19:00 MDT
Huntsville, AL, US: Wednesday, September 6, 20:00 CDT
Raleigh, NC, US: Wednesday, September 6, 21:00 EDT
London, England: Thursday, September 7, 2:00 BST
Paris, France: Thursday, September 7, 3:00 CEST
Helsinki, Finland: Thursday, September 7, 4:00 EEST
Tel Aviv, Israel: Thursday, September 7, 4:00 IDT
Pune, India: Thursday, September 7, 6:30 IST
Brisbane, Australia: Thursday, September 7, 11:00 AEST
Singapore, Asia: Thursday, September 7, 9:00 +08
Auckland, New Zealand: Thursday, September 7, 13:00 NZST
--
Laura Flores
She/Her/Hers
Software Engineer, Ceph Storage <https://ceph.io>
Chicago, IL
lflores(a)ibm.com | lflores(a)redhat.com
M: +17087388804