Hello Sage,
...I think that part of this comes down to a learning
curve...
...cephadm represent two of the most successful efforts to address
usability...
Somehow it does not look right to me.
There is much more to operating a Ceph cluster than just deploying software.
Of course that helps in the short run to keep people from leaving the train
right when they start their Ceph journey. But the harder part is what to do
when shit hits the fan and your cluster is down due to some issue, and then
additional layers of complexity kick in and bite you.
Just saying that day-2 ops is much more important than getting a cluster
up and running. In my belief, no admin wants to dig around in containers and
other abstractions when the single most important part of a whole IT
infrastructure stops working. But that's just my thought, maybe I'm wrong.
In my opinion, the best possible way to run IT software is KISS, keep it
simple and stupid: no additional layers, no abstractions of abstractions,
and good error messages.
For example, the Docker topic here looks like something that can be
showcased:
Question: If it uses Docker and the Docker daemon fails, what happens to
your containers?
Answer: This is an obnoxious feature of Docker.
As you might see, you need a lot of knowledge about the abstraction layers
to operate them well. Docker, for example, provides so-called live-restore
(https://docs.docker.com/config/containers/live-restore/), which allows you
to stop the daemon without killing your containers. This enables you to
update the Docker daemon without downtime, but you have to know about it
and, of course, enable it. This can make operating a Ceph cluster harder,
not easier.
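To illustrate the point, this is roughly what enabling the setting looks
like per the linked Docker docs (a minimal sketch: if your daemon.json
already contains other options, merge the key in rather than overwriting
the file):

```shell
# Sketch: enable Docker's live-restore so running containers survive
# a dockerd stop or upgrade (see the linked live-restore docs).
cat > /etc/docker/daemon.json <<'EOF'
{
  "live-restore": true
}
EOF

# Reload the daemon configuration without restarting containers.
systemctl reload docker
```

It is exactly this kind of non-default, easy-to-miss knob that an admin has
to know about before the daemon fails, not after.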
What about more sophisticated topics, for example performance? Ceph
already is not a fast storage solution, with way too high latency. Does it
help to add containers instead of going more directly to the hardware and
reducing overhead? Of course you can run SPDK and/or DPDK inside
containers, but does that make it better, faster, or even easier? If you
need high-performance storage today, you can turn to open source
alternatives that are massively cheaper per IO and only minimally more
expensive per GB. I therefore believe that stripping out overhead is also
an important topic for the future of Ceph.
--
Martin Verges
Managing director
Mobile: +49 174 9335695
E-Mail: martin.verges(a)croit.io
Chat: https://t.me/MartinVerges
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx
On Fri, 18 Jun 2021 at 20:43, Sage Weil <sage(a)newdream.net> wrote:
Following up with some general comments on the main container downsides
and on the upsides that led us down this path in the first place.
Aside from a few minor misunderstandings, it seems like most of the
objections to containers boil down to a few major points:
Containers are more complicated than packages,
making debugging harder.
I think that part of this comes down to a learning curve and some
semi-arbitrary changes to get used to (e.g., systemd unit name has
changed; logs now in /var/log/ceph/$fsid instead of /var/log/ceph).
Other changes are real hoops to jump through: to inspect the
process(es) inside a container you have to `cephadm enter --name ...`;
the ceph CLI may not be automatically installed on every host; stracing
or finding coredumps requires extra steps. We're continuing to improve
the tools etc., so please call these things out as you see them!
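For anyone who hasn't tried it, the extra steps look roughly like this (a
sketch only; the daemon name `osd.3` is illustrative, list the real ones
with `cephadm ls`):

```shell
# Show the containerized daemons deployed on this host.
cephadm ls

# Open a shell inside the container running a specific daemon.
cephadm enter --name osd.3

# Fetch the journald logs for that daemon.
cephadm logs --name osd.3

# Run the ceph CLI from a container when it isn't installed on the host.
cephadm shell -- ceph -s
```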
Security (50 containers -> 50 versions of openssl to patch)
This feels like the most tangible critique. It's a tradeoff. We have
had so many bugs over the years due to varying versions of our
dependencies that containers feel like a huge win: we can finally test
and distribute something that we know won't break due to some random
library on some random distro. But it means the Ceph team is on the
hook for rebuilding our containers when the libraries inside the
container need to be patched.
On the flip side, cephadm's use of containers offers some huge wins:
- Package installation hell is gone. Previously, ceph-deploy and
ceph-ansible had thousands of lines of code to deal with the myriad
ways that packages could be installed and where they could be
published. With containers, this now boils down to a single string,
which is usually just something like "ceph/ceph:v16". We've grown a
handful of complexity there to let you log into private registries,
but otherwise things are so much simpler. Not to mention what happens
when package dependencies break.
- Upgrades/downgrades can be carefully orchestrated. With packages,
the version change is by host, with a limbo period (and occasional
SIGBUS) before daemons were restarted. Now we can run new or patched
code on individual daemons and avoid an accidental upgrade when a
daemon restarts. (Also, ceph CLI commands no longer error out with a
dynamic linker error while the package upgrade itself is in progress,
something all of our automated upgrade tests have to carefully avoid
to prevent intermittent failures.)
- Ceph installations are carefully sandboxed. Removing/scrubbing ceph
from a host is trivial as only a handful of directories or
configuration files are touched. And we can safely run multiple
clusters on the same machine without worry about bad interactions
(mostly great for development, but also handy for users experimenting
with new features etc).
- Cephadm deploys a bunch of non-ceph software as well to provide a
complete storage system, including haproxy and keepalived for HA
ingress for RGW and NFS, ganesha for NFS service, grafana, prometheus,
node-exporter, and (soon) samba for SMB. All neatly containerized to
avoid bumping into other software on the host; testing and supporting
the huge matrix of package versions available via various distros
would be a huge time sink.
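The orchestrated-upgrade point above can be sketched concretely (the image
tag here is only an example; pick the release you are targeting):

```shell
# Start a rolling upgrade to a specific container image; cephadm
# restarts daemons one at a time rather than per-host.
ceph orch upgrade start --image ceph/ceph:v16.2.4

# Watch progress, and stop the rollout if something looks wrong.
ceph orch upgrade status
ceph orch upgrade pause

# Continue once the issue is understood.
ceph orch upgrade resume
```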
Most importantly, cephadm and the orchestrator API vastly improve the
overall ceph experience from the CLI and dashboard. Users no longer
have to give any thought to where and which daemons run if they don't
want to (or they can carefully specify daemon placement if they
choose). And users can use commands like 'ceph fs volume create foo'
and the fs will get created *and* MDS daemons will be started all in
one go. (This would also be possible with a package-based
orchestrator implementation if one existed.)
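The 'ceph fs volume create' flow above, spelled out as a sketch ('foo' is
just an example volume name):

```shell
# Create the pools and filesystem, and schedule MDS daemons, in one go.
ceph fs volume create foo

# The orchestrator now shows the resulting mds service...
ceph orch ls --service-type mds

# ...and placement can still be adjusted explicitly if desired.
ceph orch apply mds foo --placement=3
```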
We've been beaten up for years about how complicated and hard Ceph is.
Rook and cephadm represent two of the most successful efforts to
address usability (and not just because they enable deployment
management via the dashboard!), and taking advantage of containers was
one expedient way to get to where we needed to go. If users feel
strongly about supporting packages, we can get much of the same
experience with another package-based orchestrator module. My view,
though, is that we have much higher priority problems to tackle.
sage
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io