Hi guys,
We recently upgraded the ceph-mgr to 15.2.4 (Octopus) in our production
clusters. The status of the cluster is now as follows:
# ceph versions
{
    "mon": {
        "ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)": 5
    },
    "mgr": {
        "ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)": 3
    },
    "osd": {
        "ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)": 1933
    },
    "mds": {
        "ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)": 14
    },
    "overall": {
        "ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)": 1955
    }
}
We are now seeing several problems in this cluster:
1. `ceph pg dump` always takes significantly longer than before to return a result.
2. the ceph-exporter sometimes fails to fetch cluster metrics.
3. the cluster occasionally shows a few inactive/down PGs, but they recover very quickly.
We investigated the ceph-mgr but haven't found the root cause yet. There
are some scattered clues (I am not sure whether they help):
1. the ms_dispatch thread is constantly saturating one core.
2. the message size is significantly large, over 40 KB:
2020-09-24T14:47:50.216+0000 7f8f811f6700 1 -- [v2:{mgr_ip}:6800/111,v1:{mgr_ip}:6801/111] <== osd.3038 v2:{osd_ip}:6800/384927 431 ==== pg_stats(17 pgs tid 0 v 0) v2 ==== 42153+0+0 (secure 0 0 0) 0x55dae07c1800 con 0x55daf6dde400
3. we get some "Fail to parse JSON result" errors:
2020-09-24T15:47:42.739+0000 7f8f8da0f700 0 [devicehealth ERROR root] Fail to parse JSON result from daemon osd.1292 ()
4. in the sending channel, we can see lots of faults:
2020-09-24T14:53:17.725+0000 7f8fa866e700 1 -- [v2:{mgr_ip}:6800/111,v1:{mgr_ip}:6801/111] >> v1:{osd_ip}:0/1442957044 conn(0x55db38757400 legacy=0x55db03d8e800 unknown :6801 s=STATE_CONNECTION_ESTABLISHED l=1).tick idle (909347879) for more than 900000000 us, fault.
2020-09-24T14:53:17.725+0000 7f8fa866e700 1 --1- [v2:{mgr_ip}:6800/111,v1:{mgr_ip}:6801/111] >> v1:{osd_ip}:0/1442957044 conn(0x55db38757400 0x55db03d8e800 :6801 s=OPENED pgs=1572189 cs=1 l=1).fault on lossy channel, failing
5. sometimes the mgr-fin thread is also saturating one core.
and from the perf dump we got:
"finisher-Mgr": {
    "queue_len": 1359862,
    "complete_latency": {
        "avgcount": 14,
        "sum": 40300.307764855,
        "avgtime": 2878.593411775
    }
},
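For what it's worth, the avgtime in a perf dump appears to be just sum / avgcount, and those numbers already look pathological. A quick check of the arithmetic (values copied from the dump above; the interpretation is mine):

```python
# Values copied from the "finisher-Mgr" perf dump above.
queue_len = 1359862            # items still waiting in the Mgr finisher queue
avgcount = 14                  # completions measured so far
total = 40300.307764855        # total seconds spent on those completions

avgtime = total / avgcount
print(f"average completion latency: {avgtime:.2f} s")
# At roughly 2878 s (~48 min) per completion, a queue of ~1.36M items
# will effectively never drain, which would explain a busy mgr-fin thread.
```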
Sorry these clues are a little messy. Do you have any comments on
this?
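One footnote on clue 3: the empty parentheses in that devicehealth log suggest osd.1292 returned an empty reply, which json.loads rejects. A minimal sketch of that failure mode (the function is my illustration, not the actual module code):

```python
import json

def parse_daemon_result(daemon, raw):
    """Loosely mimic parsing a daemon's JSON reply; an empty reply fails."""
    try:
        return json.loads(raw)
    except ValueError:
        # Mirrors the shape of the log line above; the "()" held the raw reply.
        print(f"Fail to parse JSON result from daemon {daemon} ({raw})")
        return None

parse_daemon_result("osd.1292", "")   # empty reply -> parse failure, as logged
ok = parse_daemon_result("osd.1292", '{"smart_status": {"passed": true}}')
print(ok["smart_status"])
```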
Thanks.
Regards,
Hao
Today it came to my attention that not all Ceph developers agree with the
following cherry-picking rule:
"if a commit could not be cherry-picked from master, the commit message must
explain why that was not possible" [1]
[1]
https://github.com/ceph/ceph/blob/master/SubmittingPatches-backports.rst#ch…
Now, I (Nathan) am the one who wrote these rules down, but I'm not their author.
These rules are a codification of a set of best practices I "inherited" from
Loic. Although I hesitate to speak on his behalf, I don't think he's around here
anymore, so I'll just go ahead and present what I think is the rationale for this
particular rule.
In the past, regressions often happened because bugs got fixed directly in
a stable branch, but not in master. Later, after a new major stable release was
split off from master and users upgraded their clusters to it, BOOM the bugs
were back! Of course, nobody initially knew why, but it was clear that the bug
was a regression. Therefore, forensic investigations of the git history were
undertaken to find the answer to the question: "which commit fixed this bug
in N-1 and why is that commit not in N?".
One possible tactic in such an investigation is to find all commits in the
N-1 stable branch (which does not exhibit the bug) that aren't cherry-picks, but
potentially should have been. One of these might be the fix, but which one?
Some bugs have to be fixed directly in a stable branch: they cannot be
cherry-picked from master for any number of valid reasons. So, in our
hypothetical forensic investigation, we are faced with the necessity of
distinguishing these "good" direct bug-fixing commits from "bad" ones which
should have been cherry-picks, but are not. But how to make that distinction
when the commit messages themselves are silent on the question of why they
aren't cherry picks? That, I believe, is where this rule came from.
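One way to mechanize the tactic described above is `git cherry`, which compares patch-ids between two branches. A toy sketch in a throwaway repo (branch names and commit messages are illustrative, not Ceph's real history):

```python
import os
import subprocess
import tempfile

def git(*args, cwd):
    """Run a git command and return its stdout."""
    return subprocess.run(("git",) + args, cwd=cwd, check=True,
                          capture_output=True, text=True).stdout

repo = tempfile.mkdtemp()
git("init", "-q", cwd=repo)
git("symbolic-ref", "HEAD", "refs/heads/master", cwd=repo)
git("config", "user.email", "demo@example.com", cwd=repo)
git("config", "user.name", "Demo", cwd=repo)

# Shared history, then the branches diverge.
with open(os.path.join(repo, "f"), "w") as fh:
    fh.write("shared\n")
git("add", "f", cwd=repo)
git("commit", "-qm", "shared history", cwd=repo)
git("branch", "stable", cwd=repo)
with open(os.path.join(repo, "f"), "a") as fh:
    fh.write("master work\n")
git("commit", "-aqm", "master-only change", cwd=repo)

# A bug fixed directly on the stable branch, never landed on master.
git("checkout", "-q", "stable", cwd=repo)
with open(os.path.join(repo, "g"), "w") as fh:
    fh.write("fix\n")
git("add", "g", cwd=repo)
git("commit", "-qm", "direct fix on stable, not a cherry-pick", cwd=repo)

# Lines starting with "+" are commits on stable with no equivalent patch on
# master -- exactly the set a forensic investigation has to sift through.
print(git("cherry", "-v", "master", "stable", cwd=repo), end="")
```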
Nowadays, it would seem that this type of forensic investigation is rarely
undertaken. BUT let us ask ourselves, could that be because (1) we have these
cherry-picking rules and (2) they are - for the most part - enforced?
Anyway, I thought I'd bring the matter up here in the hope of finding
a consensus on whether this rule should stand as-is or be revised. I don't
relish being in the position of enforcing a rule that the leads (and the
developer community as a whole) don't understand or agree with.
Thanks,
Nathan
I'm looking for clarification on which command should be used to manage configuration settings in Nautilus.
It's not clear which of the "config" commands are supposed to be used. The documentation refers to "ceph config set/get ...", but the man page for ceph(8) only references "ceph config-key ...".
What is the difference between "ceph config set" and "ceph config-key set" and which one is supposed to be used in Nautilus (and later) ?
For example, setting values for the dashboard appears to work using "ceph config set mgr ...":
ceph config set mgr mgr/dashboard/mon02/server_addr 10.4.3.22
But trying to read this value back using "config get" fails:
ceph config get mgr mgr/dashboard/mon02/server_addr
Error EINVAL: unrecognized entity 'mgr'
Furthermore, "ceph config ls" shows a rather long list of keys that should be readable, but attempting to fetch any of them with "ceph config get" always results in the same error, "EINVAL: unrecognized entity 'mgr'".
Is this a known bug? The inconsistencies between the commands and the documentation are confusing.
thanks,
Wyllys Ingersoll
Hi!
After almost a year of development in my spare time I present my own software-defined block storage system: Vitastor - https://vitastor.io
I designed it similarly to Ceph in many ways: it also has Pools, PGs, OSDs, different coding schemes, rebalancing and so on. However, it's much simpler and much faster. In a test cluster with SATA SSDs it achieved a Q1T1 latency of 0.14 ms, which is especially good compared to Ceph RBD's 1 ms for writes and 0.57 ms for reads. In an "iops saturation" parallel-load benchmark it reached 895k read / 162k write iops, compared to Ceph's 480k / 100k on the same hardware. But the most interesting part was CPU usage: the Ceph OSDs were using 40 of the 64 CPU cores on each node, while Vitastor was using only 4.
Of course it's an early pre-release, which means that, for example, it lacks snapshot support and other useful features. However, the base is finished: it works and runs QEMU VMs. I like the design and I plan to develop it further.
There are more details in the README file, which is currently served from https://vitastor.io
Sorry if it was a bit off-topic, I just thought it could be interesting for you :)
--
With best regards,
Vitaliy Filippov
Hi Ceph Developers,
The Ceph community is planning on participating in the upcoming round of
Outreachy (https://www.outreachy.org/).
Applicants will be applying for internships during the month of October and
interns would work on their projects from December - March.
If you're interested in mentoring a project, please add your ideas to this
projects list:
https://pad.ceph.com/p/project-ideas
I will be visiting various standup meetings over the coming weeks to
discuss project ideas as well. If you have any questions, please reach out
to me.
Best,
Ali
Hoi,
Clearly some code has been backported to Nautilus, since my FreeBSD
Nautilus builds fail on:
gmake[2]: Leaving directory '/home/jenkins/workspace/ceph-nautilus/build'
/home/jenkins/workspace/ceph-nautilus/src/test/libcephfs/lazyio.cc:24:10: fatal error: 'sys/xattr.h' file not found
#include <sys/xattr.h>
^~~~~~~~~~~~~
gmake[2]: Entering directory '/home/jenkins/workspace/ceph-nautilus/build'
Something I have fixed in:
https://github.com/ceph/ceph/pull/30505
And tracked in:
https://tracker.ceph.com/issues/42448
So what do I need to do to get this fix backported as well?
Thanx,
--WjW
Thanks Marc :)
It's easier to write code than to cooperate :) I can do whatever I want in my own project.
Ceph is rather complex. For example, I failed to find bottlenecks in OSD when I tried to profile it - I'm not an expert of course, but still... The only bottleneck I found was cephx_sign_messages=true by default. Now I always disable it. In fact I don't think Ceph needs those signatures at all because 99.9% of setups live in private networks. Ceph has ~1M lines of code. Vitastor has 22k :). Bluestore is complicated, SeaStore seems like it may also end up being complicated, there are a lot of other architectural things like RBD cache, RBD object map, immediate commit semantics for all writes and so on that can't be easily fixed. It would take MUCH more than a year to fix everything.
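For reference, turning off message signing as mentioned above is a one-line ceph.conf change. This is a sketch of the trade-off being described, not a blanket recommendation, since it does remove an integrity check that some deployments may want:

```ini
[global]
# Disable cephx message signing: saves CPU per message, but drops
# per-message integrity protection -- only sensible on trusted
# private networks.
cephx_sign_messages = false
```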
Ceph is great for object storage, but 1ms write latency in an NVMe cluster is something that annoyed me so much that I basically had to try to reinvent the wheel. So I hope my wheel will make its way into production at some point :)
> Vitaliy you are crazy ;) But really cool work. Why not combine efforts
> with ceph? Especially with something as important as SDS and PB's of
> clients data stored on it, everyone with a little bit of brain chooses a
> solution from a 'reliable' source. For me it was decisive to learn that
> CERN and NASA were using this on a large scale. I do not have the
> expertise nor time (like probably 90% of ceph users) to test how they
> have been testing and using ceph.
>
> I often see opensource projects that could benefit from cooperation.
> Some teams totally lack the expertise that others have, and vice versa.
> Providing the community with 10 or 20 'shitty' projects instead of 3
> 'good' projects.
> I think opensource projects should more often embrace a sort of modular
> development solution. Where others can change functionality by replacing
> just a module. If I ever get my idea funded, I would make it like this.
Hi Folks,
The weekly performance meeting will be starting in 20 minutes! Last week
was mostly spent discussing Igor's recent testing, so today we are going
to continue discussing refactoring onodes in bluestore to improve memory
usage and CPU overhead. See you there!
Etherpad:
https://pad.ceph.com/p/performance_weekly
Bluejeans:
https://bluejeans.com/908675367
Thanks,
Mark
Hi Marc,
On 9/17/20 11:16 AM, Marc Roos wrote:
> This[1] and natural evolution(?)
>
> [1]
> https://bootstrap-datepicker.readthedocs.io/en/v1.9.0/
> Support Read the Docs!
>
> Please help keep us sustainable by allowing our Ethical Ads in your ad
> blocker or go ad-free by subscribing.
>
> Thank you! ❤️
Thanks for the info! That prompted me to read up some more on this.
RTD is actually quite open and honest about their advertising model [1],
and they do provide options to opt out of paid ads [2], so disabling
those is definitely an option we should consider, if we haven't
done so already.
At present, I personally feel that the benefits of hosting the docs on
their platform, versus doing it on our own infrastructure, make the ads
an acceptable trade-off, especially if they are only about other open
source projects.
As RTD is really an aggregation point for a lot of other open source
projects, it may also help us to gain visibility.
Of course, if "natural evolution" really kicks in and things get too
intrusive/annoying, we should re-evaluate this and take action accordingly.
Lenz
[1] https://docs.readthedocs.io/en/latest/advertising/index.html
[2]
https://docs.readthedocs.io/en/latest/advertising/ethical-advertising.html#…
--
SUSE Software Solutions Germany GmbH - Maxfeldstr. 5 - 90409 Nuernberg
GF: Felix Imendörffer, HRB 36809 (AG Nürnberg)