Hi.
When I use `rados ls -p buckets.index | wc -l` it shows me 9680 lines
(objects) but in `ceph df` it shows me that this pool has 28.98K
objects.
Why are they so different?
Hi Cephers,
Could someone please point me to the research and industrial conferences
where one can publish new Ceph-related research results? Additionally, are
there any conferences that are particularly interested in Ceph results? I
would like to know about all suitable conferences. Thanks :-)
Looking forward to hearing from you.
BR
Bobby !!
Hi,
This problem was reported some time ago [1], though at the time the root
cause was not identified.
Now we have a customer experiencing the same problem, so we did some
investigation. The issue is that in a mixed bluestore/filestore cluster
a deep scrub may incorrectly report missing objects because the object
sorting order differs on bluestore and filestore OSDs.
In the cases we have seen, this happens for objects that have the same
hash: when building the sorting key, bluestore escapes the object name
string, while filestore sorts on the raw object name, which may result
in a different order.
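To illustrate the kind of mismatch, here is a toy Python sketch. The
escape() function below is a made-up scheme, not bluestore's actual
key-building code; it only shows how sorting on an escaped name can
disagree with sorting on the raw name for two objects that share a hash.

    # Toy illustration only: escape() is a hypothetical scheme, not the
    # real bluestore escaping. It shows how escaping the object name
    # before building the sort key can flip the relative order.

    def escape(name):
        # Hypothetically escape '/' as '#2f'.
        return name.replace('/', '#2f')

    # Two object names that we assume hash to the same value, so only
    # the name decides their relative order.
    names = ["a.b", "a/b"]

    raw_order = sorted(names)                  # filestore-style: raw names
    escaped_order = sorted(names, key=escape)  # bluestore-style: escaped keys

    print(raw_order)      # ['a.b', 'a/b']  since '.' (0x2e) < '/' (0x2f)
    print(escaped_order)  # ['a/b', 'a.b']  since '#' (0x23) < '.' (0x2e)

When two OSDs in the same PG walk their objects in different orders like
this, the scrub comparison can conclude an object is missing on one side
even though it is present.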
I pushed a PR for review that changes the sorting order for
filestore [2]. This would fix a cluster of mixed bluestore and upgraded
filestore OSDs, but it would introduce the same issue for a cluster of
mixed old-version and new-version filestore OSDs.
[1] https://tracker.ceph.com/issues/43174
[2] https://github.com/ceph/ceph/pull/35938
--
Mykola Golub
Hi Folks,
I don't have anything on the agenda today, and we've got a couple of
folks starting the U.S. 4th of July weekend early, so let's cancel the
meeting for today. Have a great weekend everyone!
Thanks,
Mark
I've temporarily removed the requirement for the above check, as it's
often failing on unrelated pull requests. Once it is healthy again we
can re-enable it (it still runs, it just won't cause the pull request to
be blocked).
Hello Ceph-Devs,
we have noticed a rise in the overall load of the MGR daemon after upgrading from Luminous 12.2.13 to Nautilus 14.2.9. This has resulted in the Prometheus module being unable to respond due to overload while, for example, an OSD is out. We evaluated this on our test clusters with recent hardware; the issue persisted and even got worse, with gaps in the Prometheus metric collection while the cluster is being written to in a perfectly healthy state.
After some digging, and hoping that the pull request from https://tracker.ceph.com/issues/45439 (https://github.com/ceph/ceph/pull/34356) would alleviate the issue, which it didn't, we traced most of our troubles down to the progress MGR module:
The notify function in the progress module is highly inefficient in its current form due to unnecessary collection of PG data when nothing is being done with it (self._events being empty).
This regularly blocks the Prometheus module, which then does not respond in time (response times of > 10 seconds, or even outright cherrypy timeouts).
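To make this concrete, here is a rough Python sketch of the kind of
guard we mean; it is not the exact code from the pull request. The
ProgressSketch class, its get() stand-in, and _update_progress_events()
are placeholders (the real progress module subclasses MgrModule and
calls MgrModule.get('pg_dump')):

    # Rough sketch only, not the exact change in the PR.
    class ProgressSketch:
        def __init__(self):
            self._events = {}  # in-flight progress events

        def get(self, what):
            # Stand-in for MgrModule.get(); in the real mgr this fetches
            # a large structure such as the PG dump from the C++ side.
            return {}

        def _update_progress_events(self, pg_dump):
            # Placeholder for the per-event bookkeeping the module does.
            pass

        def notify(self, notify_type, notify_id):
            if notify_type == 'pg_summary':
                # The guard: skip the expensive PG dump entirely when
                # there are no events to update.
                if not self._events:
                    return
                pg_dump = self.get('pg_dump')
                self._update_progress_events(pg_dump)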
We have prepared an issue ticket and a pull request for this to be fixed:
https://tracker.ceph.com/issues/46416
https://github.com/ceph/ceph/pull/35973
After implementing this easy fix we haven't experienced any Prometheus timeouts.
Could someone please review, merge, and backport this pull request?
Thanks in advance
Hello,
There is a plan to hold a code walkthrough of the implementation of
backfill in crimson.
It's scheduled just after the project's sync-up: this Wednesday, 05:00 AM UTC.
Link: https://redhat.bluejeans.com/908675367.
Regards,
Radek