Hey all,
Ceph Quarterly announcement [Josh and Zac]
A one-page digest that may be published quarterly
Planned for the 1st of June, September, and December
Reef RC
https://pad.ceph.com/p/reef_scale_testing
https://pad.ceph.com/p/ceph-user-dev-monthly-minutes#L17
ETA last week of May
Missing CentOS 9 Python deps
Ken Dreyer has volunteered to help get the packages into EPEL
Lab update
Reimaging issues and kernel timeouts have been seen in the lab; the
infra team is working through fixing them
Please raise infrastructure trackers for any bugs you see, as we have
started to do monthly infrastructure bug scrubs
Regards,
Nizam
Hello,
I am emailing about a make check failure on my PR. I added a workunit
called test_rgw_d4n.py to the branch, and it causes run_tox_qa to fail
because the workunit is executed during that test. The failure is due to
a library I am importing in the workunit. Is there a way around this
error?
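One workaround I'm considering is making the import optional, so the
module can still be imported in environments where the library isn't
installed; a rough sketch, with "somelib" as a placeholder for the
actual dependency:

    # Rough sketch: guard the third-party import so tooling that merely
    # imports this module (like the tox run) doesn't fail when the
    # dependency is missing. "somelib" is a placeholder for the real
    # library.
    try:
        import somelib
    except ImportError:
        somelib = None

    def main():
        if somelib is None:
            raise SystemExit("somelib is required to run this workunit")
        # ... actual test logic using somelib ...

    if __name__ == "__main__":
        main()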
If further information is needed, please let me know. Here is the PR:
https://github.com/ceph/ceph/pull/48879
Thank you in advance.
Sincerely,
Samarah Uriarte
Hi guys,
I wonder if anyone could help me figure out how to achieve the following. Basically, I'd like to add a new string data field to RGWBucketInfo. I use bucket->put_info() to update it; however, this call doesn't seem to cause the bucket info to be updated in the other zones, and I don't know what I'm missing. I also did a "period update --commit" after the update in the master zone. Is there an example I can follow to get my new piece of information to sync across?
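For reference, here is roughly how I'm checking whether the field made
it to the other zone (zone names, bucket name, and instance id are
placeholders; it assumes both zones are reachable from one admin node):

    # Dump a metadata entry in each zone and compare. Since the field
    # lives in RGWBucketInfo, the bucket.instance entry is the one to
    # inspect.
    import json
    import subprocess

    def meta(zone, key):
        out = subprocess.check_output(
            ["radosgw-admin", "--rgw-zone", zone, "metadata", "get", key])
        return json.loads(out)

    key = "bucket.instance:mybucket:PLACEHOLDER-INSTANCE-ID"
    master = meta("master-zone", key)
    secondary = meta("secondary-zone", key)
    print(json.dumps(master["data"], indent=2))
    print(json.dumps(secondary["data"], indent=2))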
Thanks,
Yixin
We have a continuing problem with OpenStack Swift public URLs returning “NoSuchBucket” even though the equivalent S3 URL works fine. This was not a problem in our Nautilus 14.2.22 clusters, but after upgrading to 16.2.9 or 16.2.10 this regression returned. All 13 of our prod clusters were upgraded to 16.2.11 and still experience the problem, and I have now upgraded one prod cluster to Quincy/17.2.6 where it is still present.
I’ve submitted the following bug report:
https://tracker.ceph.com/issues/58019
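For what it's worth, the difference is easy to demonstrate; with
placeholder host, bucket, and object names (the bucket is public-read),
the S3-style path succeeds while the Swift-style path returns
NoSuchBucket:

    # Fetch the same public object through both front-end URL styles.
    # Host, bucket, and object names are placeholders.
    import requests

    RGW = "http://rgw.example.com:8080"
    BUCKET = "mybucket"
    OBJ = "hello.txt"

    s3 = requests.get("%s/%s/%s" % (RGW, BUCKET, OBJ))
    swift = requests.get("%s/swift/v1/%s/%s" % (RGW, BUCKET, OBJ))

    print("S3 path:   ", s3.status_code)  # 200
    print("Swift path:", swift.status_code, swift.text[:120])  # 404 NoSuchBucket on affected versions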
Thanks in advance,
Josh Beaman
CEPH/Storage, Eng 4
816-406-9289
Hi everyone,
Over this weekend we will run a sync between telemetry crashes and Redmine
tracker issues.
This might affect your inbox, depending on your Redmine email notification
setup. You can set up filters for these emails to skip your inbox.
Thanks,
Yaarit
Hi, I'm doing a master's degree on distributed file systems (CephFS in
HPC), and I have some questions about migrating a subtree to another
MDS to balance the MDS cluster.
* Sage Weil's Ph.D. thesis and the documentation on the Ceph site say
that before a subtree is migrated to another MDS, the first step is to
freeze the subtree, and it is only unfrozen after the migration is
complete. Doesn't freezing the subtree for the whole migration affect
performance?
* Can someone explain why all operations on the subtree are frozen
during the migration to another MDS to balance the MDS cluster?
* Would it be possible to change the migration process so that the
inodes are first migrated in bulk, and the subtree is only frozen
afterwards to migrate the inodes that were modified during the bulk
copy, the way live virtual machines are migrated?
Sorry if my questions are naive. I'm really trying to understand the
subtree migration process used to balance the MDS cluster. If I've got
it wrong, could someone explain the process, or point me to
documentation or a paper about the subtree migration process. Thanks so
much.
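As far as I understand, this migration can also be triggered by hand by
pinning a directory to an MDS rank, for example (mount path and rank
are placeholders):

    # Pin a directory (and thus its subtree) to MDS rank 1; on a
    # multi-MDS file system this triggers the export/migration being
    # discussed. Mount path and rank are placeholders.
    import os
    os.setxattr("/mnt/cephfs/projects", "ceph.dir.pin", b"1")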
--
Odair M. Ditkun Jr
Research group: Center for Scientific Computing and Open Source
Master's candidate in Networking and Distributed Systems — State
University of Paraná – Brazil
i'd like to congratulate Shilpa, Or, and Eric on the successful merge
of https://github.com/ceph/ceph/pull/45958. Eric designed the
distributed bidding algorithm back in 2019, and started on its
original implementation in cls_lock. Or took over that work early last
year, built up a process for testing and measurement, and eventually
rewrote the feature in terms of watch/notify. Shilpa took over early
this year to finish up the changes to metadata sync, and shepherded
the PR through final reviews and testing.
i'm really happy with the result, and look forward to its use in data
sync as well. thanks team!
>
> We set up a test cluster with a script producing realistic workload and
> started testing an upgrade under load. This took about a month (meaning
> repeating the upgrade with a cluster on mimic deployed and populated
Hi Frank, do you have such scripts online? On GitHub or so? I was thinking of compiling el9 RPMs for Nautilus and running tests for a few days on a test cluster with mixed el7 and el9 hosts.
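Even a loop as simple as the sketch below would be a useful starting
point for me, using the librados Python bindings (pool name, object
sizes, and counts are placeholders):

    # Keep a small working set of objects under constant mixed
    # read/write load while the upgrade runs. Pool name and sizes are
    # placeholders.
    import rados

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx("upgrade-test")
        payload = b"x" * 4096
        for i in range(100000):
            name = "obj-%d" % (i % 1000)
            ioctx.write_full(name, payload)  # overwrite within the working set
            if i % 10 == 0:
                ioctx.read(name)             # mix in reads
        ioctx.close()
    finally:
        cluster.shutdown()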
>
> So to get back to my starting point, we admins actually value rock solid
> over features. I know that this is boring for devs, but nothing is worse
> than nobody using your latest and greatest - which probably was the
> motivation for your question. If the upgrade paths were more solid and
> things like the question "why does an OSD conversion not lead to an OSD
> that is identical to one deployed freshly" or "where does the
> performance go" were actually tracked down, we would be much
> less reluctant to upgrade.
>
> I will bring it up here again: with the complexity that the code base
> has reached by now, the 2-year release cadence is way too fast; it
> doesn't provide sufficient maturity for upgrading quickly either. More
> and more admins will be several cycles behind, and we are reaching the
> point where major bugs in so-called EOL versions will only be
> discovered before large clusters have even reached this version, which
> might become a fundamental blocker to upgrades entirely.
Indeed.
> An alternative to increasing the release cadence would be to keep more
> cycles in the life-time loop instead of only the last 2 major releases.
> 4 years really is nothing when it comes to storage.
>
I would like to see this change also.
Hi,
We use Hadoop for big data, and S3A has many limitations of its own (it is not a file system). cephfs-hadoop is inconvenient to use and is no longer maintained, and users often cannot modify their old Hadoop applications.
Is there any plan for separating storage and compute on Ceph, or any suggestion about a Hadoop-native implementation on Ceph?
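For context, we currently point S3A straight at RGW, e.g. from PySpark
as below (endpoint and credentials are placeholders, and it assumes the
hadoop-aws package is on the classpath), but the S3A limitations remain:

    # Point Hadoop's S3A connector at a Ceph RGW endpoint from PySpark.
    # Endpoint and credentials are placeholders; requires hadoop-aws
    # jars on the classpath.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .config("spark.hadoop.fs.s3a.endpoint", "http://rgw.example.com:8080")
        .config("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")
        .config("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")
        .config("spark.hadoop.fs.s3a.path.style.access", "true")
        .getOrCreate()
    )

    df = spark.read.text("s3a://mybucket/sample.txt")  # bucket/object are placeholders
    df.show()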
Hi Folks,
The weekly performance meeting will be starting in approximately 30
minutes at 8AM PST. Today, Nitzan will be presenting his work on a
dashboard for tracking Teuthology CBT performance results.
Etherpad:
https://pad.ceph.com/p/performance_weekly
Meeting URL:
https://meet.jit.si/ceph-performance
Mark
--
Best Regards,
Mark Nelson
Head of R&D (USA)
Clyso GmbH
p: +49 89 21552391 12
a: Loristraße 8 | 80335 München | Germany
w: https://clyso.com | e: mark.nelson@clyso.com
We are hiring: https://www.clyso.com/jobs/