Hi everyone.
The next DocuBetter meeting is scheduled for 13 Nov 2019 at 1730 UTC.
Etherpad: https://pad.ceph.com/p/Ceph_Documentation
Meeting: https://bluejeans.com/908675367
Agenda: This week Zac Dover, the new Ceph writer, will introduce himself.
We will also discuss RST, the Getting Started Guide, and documentation
gaps and wishes.
Thanks, everyone.
Zac Dover
On Wed, 13 Nov 2019, Paul Cuzner wrote:
> Hi Sage,
>
> So I tried switching out the udev calls to pyudev, and shaved a whopping
> 1 sec from the timings. Looking deeper, I found that the issue is related
> to *ALL* subprocess.Popen calls (of which there are many!) - they all use
> close_fds=True.
>
> My suspicion is that when running in a container, close_fds sees fds
> from the host too - so it tries to tidy up more than it should. If you set
> ulimit -n 1024 or something and then try a ceph-volume inventory, it should
> just fly through! (At least it did for me.)
>
> Let me know if this works for you.
Yes, that speeds things up significantly! 1.5s -> 0.2s in my case. I can't
say that I understand why, though... it seems like ulimit -n would make file
open attempts fail, but I don't see any failures.
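One plausible explanation (a guess, not verified against what ceph-volume
actually runs in the container): with Python 2's subprocess, close_fds=True
closes every descriptor from 3 up to the soft RLIMIT_NOFILE before exec, and
containers often inherit a limit around one million, so every Popen pays for
~1M close() calls; Python 3 walks /proc/self/fd instead, which is much
cheaper. A minimal standalone sketch to measure it on a given host:

    #!/usr/bin/env python
    # Timing sketch (not ceph-volume code): compare Popen with and without
    # close_fds under the current RLIMIT_NOFILE soft limit.
    import resource
    import subprocess
    import time

    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("RLIMIT_NOFILE soft=%d hard=%d" % (soft, hard))

    def time_popen(close_fds, runs=20):
        start = time.time()
        for _ in range(runs):
            # 'true' exits immediately; what we measure is the per-call fd
            # cleanup done by Popen itself.
            subprocess.Popen(['true'], close_fds=close_fds).wait()
        return (time.time() - start) / runs

    print("close_fds=True : %.4fs per call" % time_popen(True))
    print("close_fds=False: %.4fs per call" % time_popen(False))

Running that inside the container with and without 'ulimit -n 1024' should
show the same jump Paul saw.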
Can we drop the close_fds arg?
sage
Hi everyone,
The transition to python3-only is blocked on three missing python packages
in EPEL7:
- python36-werkzeug: tracked by https://bugzilla.redhat.com/show_bug.cgi?id=1545888
- python36-pecan: tracked by https://bugzilla.redhat.com/show_bug.cgi?id=1766839
- python36-cherrypy: tracked by https://bugzilla.redhat.com/show_bug.cgi?id=1765032
In order to get these into EPEL, they need to go into Fedora first, which
has its own (slow) process. In the meantime, these packages are easy to
build manually as one-offs (and may already have been built by David and
sitting in a temporary repo).
To unblock this, what if we require that temporary repo for centos7
*master* installs, and add it to the teuthology workers via
ceph-cm-ansible? The assumption is that by the time we release octopus we
will have gotten the dependencies into the appropriate upstream repos.
That means we have until March 2020... 4 months away.
Thanks!
sage
Right now the way ceph-daemon is used by the ssh orchestrator is designed
to minimize the dependencies/setup complexity. The only requirements for
a host to be added to the cluster are
- python (2 or 3)
- systemd
- either podman or docker installed
- the ceph cluster's pub key in /root/.ssh/authorized_keys
No other software (including Ceph) needs to be installed. The mgr/ssh
module invokes ceph-daemon on the remote host by running /usr/bin/python
over ssh and piping the cluster's version of ceph-daemon to stdin.
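As a rough illustration of that pattern (not the actual mgr/ssh code; the
host name, the local path to the script, and the argument handling below
are placeholders), the call amounts to piping the script into a remote
python:

    #!/usr/bin/env python3
    # Sketch of "pipe the ceph-daemon script over ssh"; the real mgr/ssh
    # module manages keys, errors, and output differently.
    import subprocess

    def run_remote(host, script_path, args):
        with open(script_path, 'rb') as f:
            script = f.read()
        # '-' tells the remote python to read the program from stdin;
        # anything after it ends up in sys.argv of the piped script.
        cmd = ['ssh', 'root@%s' % host, '/usr/bin/python', '-'] + list(args)
        return subprocess.run(cmd, input=script, capture_output=True,
                              check=True)

    # e.g. run_remote('mon1', './ceph-daemon', ['ls'])   # placeholder args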
The downside to this approach is that some users might not like the idea
of ceph having an ssh key with root access. For large clusters I'm not
sure how much this really matters--if you pwn ceph you can delete TB to PB
of data so do you really care if someone has root?--but for hyperconverged
cases this might be a problem.
One alternative might be to
- create a ceph user on the node, and put the cluster's key in
that user's authorized_keys
- install a package that includes ceph-daemon (/usr/bin/ceph-daemon)
- install an /etc/sudoers.d/ceph file that lets the ceph user
'sudo ceph-daemon ...'
Cons:
- This makes the bootstrap process slightly more complicated: (1) install
package, (2) create user, (3) install ssh key (vs just #3).
- The remote version of ceph-daemon can get out of sync with the
cluster: either stale and missing some feature, or even too new and
not behaving the way the cluster expects.
Pros:
- This limits the attack surface area (if someone manages to get the
cluster's ssh key) to the functions that ceph-daemon implements, vs
full root.
We could mitigate the 'keep ceph-daemon up to date' problem somewhat by
implementing a 'ceph-daemon update' function that will apt/dnf/yum install
ceph-daemon on the local host, so that the cluster could self-update the
remote host.
Or... we could skip the package entirely and install the ceph-daemon
script in /home/ceph/ceph-daemon, and include an update function that
updates the script in place. That is less complicated than knowing how to
apt/dnf/yum install a package for all the random distros and repo location
combinations (one of the biggest benefits of containers IMO). But it
still requires some sort of process to keep ceph-daemon up to date.
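For the in-place variant, a minimal sketch of what 'update' could look like
(the install path and download URL are placeholders, and a real version
would verify a checksum or signature before swapping the file):

    #!/usr/bin/env python3
    # Sketch of an in-place self-update for a script at a fixed path.
    import os
    import tempfile
    import urllib.request

    SCRIPT_PATH = '/home/ceph/ceph-daemon'               # assumed location
    SOURCE_URL = 'https://example.invalid/ceph-daemon'   # placeholder

    def update_in_place():
        data = urllib.request.urlopen(SOURCE_URL).read()
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(SCRIPT_PATH))
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
        os.chmod(tmp, 0o755)
        os.rename(tmp, SCRIPT_PATH)   # atomic swap on the same filesystem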
Maybe we always pass an md5sum to ceph-daemon whenever we invoke it, to
assert that we are running the version we want, and if there is a
mismatch, ceph-daemon bails out with a special exit code that triggers an
update and retry?
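Something like the following sketch, where the flag name and the exit code
are invented for illustration rather than an existing ceph-daemon interface:

    #!/usr/bin/env python3
    # Sketch of the version-assertion idea; '--expect-md5' and the exit
    # code are hypothetical.
    import hashlib
    import sys

    EXIT_VERSION_MISMATCH = 10   # hypothetical reserved exit code

    def self_md5():
        with open(__file__, 'rb') as f:
            return hashlib.md5(f.read()).hexdigest()

    def assert_expected_version(argv):
        if '--expect-md5' in argv:
            expected = argv[argv.index('--expect-md5') + 1]
            if expected != self_md5():
                # The caller sees this exit code, updates the remote copy,
                # and retries the original command.
                sys.exit(EXIT_VERSION_MISMATCH)

    if __name__ == '__main__':
        assert_expected_version(sys.argv)
        # ... normal command dispatch would follow here ...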
Anyway, what are people's thoughts here? How much more complicated are we
interested or willing to make this to make people more comfortable with
the idea that ceph owns an ssh key?
sage
Gal was able to track down some memory bugs with address sanitizer and
identify them as overflows of the coroutine stack used by the beast
frontend. Unlike pthread stacks, these heap-allocated coroutine stacks
don't have any memory protection to catch these overflows.
The boost::asio::spawn() [1] function that creates these coroutines does
allow you to pass in coroutine attributes to control the stack size
(which defaults to 128K), and we were able to increase that in testing
and see the errors go away. But since these coroutines are how we expect
the beast frontend to scale to thousands of connections, we really don't
want to raise the stack size permanently and limit how many we can fit
in memory.
Unfortunately, the boost::coroutine library used by boost::asio doesn't
give us any flexibility outside of the stack size, though the underlying
boost::context library provides several different options for the stack
allocation [2]. And despite boost::coroutine being long deprecated by
boost::coroutine2, boost::asio has never removed its dependency on the
former. The maintainer of these coroutine libraries even submitted a
pull request against boost::asio in 2017 to use boost::context directly
[3], allowing boost::asio::spawn() to use any of its stack allocators.
My sense is that the asio maintainer is more interested in integration
with the C++ Coroutines TS, rather than continuing to support these
stackful coroutines that aren't part of the C++ Networking TS.
I rebased this stale pull request and added some test coverage in
https://github.com/cbodley/asio/commits/wip-spawn-context. Then, since
spawn() is relatively independent from the rest of boost::asio, I forked
that part into a new repo at https://github.com/cbodley/spawn. Using
this fork, radosgw could continue using boost::asio as it is (and
eventually the Networking TS), but call spawn::spawn() instead for
direct control over the coroutine stack allocation.
Of the available stack allocators, the segmented_stack looked promising;
that would allow us to start with a small stack, and grow as more space
was needed. However, this would require us to build our own boost as
segmented stacks aren't enabled by default. It also doesn't help us
control the total stack size used per connection.
So the consensus is to use the protected_fixedsize allocator, based on
mmap/mprotect, with the existing 128K limit. That way we can catch these
stack overflows in testing, and treat excessive stack usage itself as a bug.
Casey
[1] https://www.boost.org/doc/libs/1_71_0/doc/html/boost_asio/reference/spawn.h…
[2] https://www.boost.org/doc/libs/1_71_0/libs/context/doc/html/context/stack.h…
[3] https://github.com/boostorg/asio/pull/55
hi folks,
just want to share my findings regarding building ceph on RHEL8. today i
tried to build ceph on RHEL8, but it seems we are missing some build
dependencies on this distro: quite a few packages were removed from RHEL8
[0], and the following packages are still missing from EPEL8:
No matching package to install: 'gperftools-devel >= 2.6.1'
No matching package to install: 'leveldb-devel > 1.2'
No matching package to install: 'libbabeltrace-devel'
No matching package to install: 'liboath-devel'
No matching package to install: 'python3-cherrypy'
No matching package to install: 'python3-coverage'
No matching package to install: 'python3-pecan'
No matching package to install: 'python3-routes'
No matching package to install: 'python3-tox'
No matching package to install: 'xmlstarlet'
i've added the EPEL8 repo by following [1]. packages being added to EPEL8
in the future will probably help ease the pain, but at the moment some of
the needed dependencies are just missing.
cheers,
---
[0] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/ht…
[1] https://fedoraproject.org/wiki/EPEL
--
Regards
Kefu Chai
I'm attempting to rebuild the Nautilus 14.2.4 SRPM on CentOS 7 and there's one build dependency I'm not able to find:
[bstillwell@build01 ~]$ rpmbuild --rebuild ceph-14.2.4-0.el7.src.rpm
Installing ceph-14.2.4-0.el7.src.rpm
warning: ceph-14.2.4-0.el7.src.rpm: Header V4 RSA/SHA256 Signature, key ID 460f3994: NOKEY
warning: user jenkins-build does not exist - using root
warning: group jenkins-build does not exist - using root
warning: user jenkins-build does not exist - using root
warning: group jenkins-build does not exist - using root
error: Failed build dependencies:
python3-Cython is needed by ceph-2:14.2.4-0.el7.x86_64
Could someone tell me where I can find the RPM for python3-Cython? It appears that Cython-0.19-5.el7 provides python2-Cython, but not the python3 version.
Thanks,
Bryan