Orchestration is hard, especially when trying to support every permutation. From the
sound of it, the devs have implemented what they feel is the right solution for their own
needs. The orchestration was made modular to support non-containerized deployment; it
just takes someone stepping up to implement the permutations desired. And ultimately
that's what open source is geared towards. With open source and some desired feature, you can:
1. Implement it
2. Pay someone else to implement it
3. Convince someone else to implement it in their spare time.
The thread currently seems focused on #3, but no developer appears interested in
implementing it. So that leaves options 1 and 2?
To move this forward, is anyone interested in developing package support in the
orchestration system or paying to have it implemented?
________________________________________
From: Oliver Freyermuth <freyermuth(a)physik.uni-bonn.de>
Sent: Wednesday, June 2, 2021 2:26 PM
To: Matthew Vernon; ceph-users(a)ceph.io
Subject: [ceph-users] Re: Why you might want packages not containers for Ceph deployments
Hi,
that's also a +1 from me. We also use containers heavily for scientific workflows, and
know their benefits well.
But they are not the "best", or rather, the most fitting tool in every
situation.
You have provided a great summary, and I agree with all points. Thank you for this very
competent and concise write-up.
Since static linking, and containers as a way to solve the issue of many
inter-dependencies for production services, have both been mentioned as solutions in
this lengthy thread,
I'd like to add another point to your list of complexities:
* Keeping production systems secure may be a lot more of a hassle.
Even though the following article is long and many may regard it as controversial, I'd
like to link to it: a write-up from a packager discussing this topic in a quite
generic way:
https://gcc02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fblogs.gen…
While the article discusses the issues of static linking and package management performed
in language-specific domains, it applies all the same to containers.
If I operate services in containers built by developers, this of course ensures the setup
works, the dependencies are well tested, and even upgrades go smoothly. But it also means
that,
at the end of the day, if I run 50 services in 50 different containers from 50 different
upstreams, I'll have up to 50 different versions of OpenSSL floating around my
production servers.
If a security issue is found in any of the packages used across those container images, I
now need to trust the security teams of all 50 developer groups building these
containers
(and most FOSS projects, understandably, won't have the resources...),
instead of the one security team of the distro I use. And then I also have to re-pull all
these containers after finding out that a security fix has become available.
Or I need to build all these containers myself, and effectively take over the complete
job, and have my own security team.
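To make the auditing burden concrete, here is a minimal sketch of such a check, in
Python; it assumes the docker CLI is on PATH and that each image ships an openssl
binary, neither of which is a given:

    import subprocess

    # List running containers as "<id> <image>", one per line.
    ps = subprocess.run(
        ["docker", "ps", "--format", "{{.ID}} {{.Image}}"],
        capture_output=True, text=True, check=True,
    )

    versions = {}
    for line in ps.stdout.splitlines():
        cid, image = line.split(maxsplit=1)
        # Probe each container; tolerate images without an openssl binary.
        probe = subprocess.run(
            ["docker", "exec", cid, "openssl", "version"],
            capture_output=True, text=True,
        )
        version = probe.stdout.strip() if probe.returncode == 0 else "unknown"
        versions.setdefault(version, []).append(image)

    for version, images in sorted(versions.items()):
        print(f"{version}: {', '.join(images)}")

Even a crude survey like this tends to show how wide the spread of library versions
across images can be.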
This may scale somewhat well if you have a team of 50 people and every person takes care
of one service. Containers are often your friend in this case[1],
since they let you isolate the different responsibilities along with the services.
But this is rarely the case outside of industry, and especially not in academia.
So the approach we chose is to have one common OS everywhere, and to automate all of
our deployment and configuration management with Puppet.
Of course, that puts us in one of the many corners out there, but it scales extremely well
to all the services we operate,
and I can still trust the distro maintainers to keep the base OS safe on all our servers,
automate reboots, etc.
For Ceph, we've actually seen questions about security issues already on the list[0]
(never answered AFAICT).
To conclude, I strongly believe there is no one-size-fits-all here.
That is why I was hopeful when I first heard about the Ceph orchestrator idea, back when
it looked to be planned as something modular,
with the different tasks implementable by several backends: one could imagine
them being implemented with containers, with classic SSH on bare metal (i.e.
ceph-deploy-like), with Ansible, with Rook, or maybe others.
Sadly, it seems it ended up being "container-only".
Containers certainly have many uses, and we run thousands of them daily, but neither do
they fit each and every existing requirement,
nor are they a magic bullet to solve all issues.
Cheers,
Oliver
[0]
https://gcc02.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.cep…
[1] But you may also just have a very well structured configuration management system
fitting your organizational structure.
On 02.06.21 11:36, Matthew Vernon wrote:
Hi,
In the discussion after the Ceph Month talks yesterday, there was a bit of chat about
cephadm / containers / packages. IIRC, Sage observed that a common reason in the recent
user survey for not using cephadm was that it only worked on containerised deployments. I
think he then went on to say that he hadn't heard any compelling reasons why not to
use containers, and suggested that resistance was essentially a user education
question[0].
I'd like to suggest, briefly, that:
* containerised deployments are more complex to manage, and this is not simply a matter
of familiarity
* reducing the complexity of systems makes admins' lives easier
* the trade-off of the pros and cons of containers vs packages is not obvious, and will
depend on deployment needs
* Ceph users will benefit from both approaches being supported into the future
We make extensive use of containers at Sanger, particularly for scientific workflows, and
also for bundling some web apps (e.g. Grafana). We've also looked at a number of
container runtimes (Docker, Singularity, Charliecloud). They do have advantages - it's
easy to distribute a complex userland in a way that will run on (almost) any target
distribution; rapid "cloud" deployment; some separation (via namespaces) of
network/users/processes.
For what I think of as a 'boring' Ceph deploy (i.e. install on a set of dedicated
hardware and then run for a long time), I'm not sure any of these benefits are
particularly relevant and/or compelling - Ceph upstream produce Ubuntu .debs and Canonical
(via their Ubuntu Cloud Archive) provide .debs of a couple of different Ceph releases per
Ubuntu LTS - meaning we can easily separate out OS upgrade from Ceph upgrade. And
upgrading the Ceph packages _doesn't_ restart the daemons[1], meaning that we maintain
control over restart order during an upgrade. And while we might briefly install packages
from a PPA or similar to test a bugfix, we roll those (test-)cluster-wide, rather than
trying to run a mixed set of versions on a single cluster - and I understand this
single-version approach is best practice.
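To make the restart-control point concrete: with packages, an upgrade is roughly the
following (a sketch only - the ceph-*.target units ship with the Ceph packages, and the
restart order should follow the usual mon / mgr / osd guidance):

    apt update && apt upgrade            # new binaries on disk; daemons keep running
    systemctl restart ceph-mon.target    # then restart type by type,
    systemctl restart ceph-mgr.target    # host by host, at a pace and
    systemctl restart ceph-osd.target    # in an order of our choosing

With containers, pulling a new image and restarting the daemon tend to be much more
tightly coupled.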
Deployment via containers does bring complexity; some examples we've found at Sanger
(not all of them Ceph-related - our Ceph we run from packages):
* you now have 2 process supervision points - dockerd and systemd
* docker updates (via distribution unattended-upgrades) have an unfortunate habit of
rudely restarting everything
* docker squats on a chunk of RFC 1918 space that coincides with our internal network
(and telling it not to can be a bore - see the snippet after this list)
* there is more friction if you need to look inside containers (particularly if you have
a lot running on a host and are trying to find out what's going on)
* you typically need to be root to build docker containers (unlike packages)
* we already have package deployment infrastructure (which we'll need regardless of
deployment choice)
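On the RFC 1918 point, for reference: the usual fix is to pin dockerd's address ranges
in /etc/docker/daemon.json. The keys below are standard (bip for the default bridge,
default-address-pools for user-defined networks), but the ranges themselves are
placeholders you'd pick to avoid your own network:

    {
      "bip": "10.210.0.1/24",
      "default-address-pools": [
        { "base": "10.210.4.0/22", "size": 24 }
      ]
    }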
We also currently use systemd overrides to tweak some of the Ceph units (e.g. to do some
network sanity checks before bringing up an OSD), and have some tools to pair up OSD /
journal / LVM / disk devices; I think these would be more fiddly in a containerised
deployment. I'd accept that fixing these might just be a SMOP[2] on our part.
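For illustration, the sort of override we mean is just a small drop-in like this (the
script path is our own and purely hypothetical):

    # /etc/systemd/system/ceph-osd@.service.d/network-checks.conf
    [Service]
    ExecStartPre=/usr/local/sbin/osd-network-sanity-check

Nothing exotic - but it relies on the unit being a plain systemd service rather than a
wrapper around a container runtime.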
Now none of this is show-stopping, and I am most definitely not saying "don't
ship containers". But I think there is added complexity to your deployment from going
the containers route, and that is not simply a "learn how to use containers"
learning curve. I do think it is reasonable for an admin to want to reduce the complexity
of what they're dealing with - after all, much of my job is trying to automate or
simplify the management of complex systems!
I can see from a software maintainer's point of view that just building one container
and shipping it everywhere is easier than building packages for a number of different
distributions (one of my other hats is a Debian developer, and I have a bunch of machinery
for doing this sort of thing). But it would be a bit unfortunate if the general thrust of
"let's make Ceph easier to set up and manage" was somewhat derailed with
"you must use containers, even if they make your life harder".
I'm not going to criticise anyone who decides to use a container-based deployment
(and I'm sure there are plenty of setups where it's an obvious win), but if I were
advising someone who wanted to set up and use a 'boring' Ceph cluster for the
medium term, I'd still advise on using packages. I don't think this makes me a
luddite :)
Regards, and apologies for the wall of text,
Matthew
[0] I think that's a fair summary!
[1] This hasn't always been true...
[2] Simple (sic.) Matter of Programming
--
Oliver Freyermuth
Universität Bonn
Physikalisches Institut, Raum 1.047
Nußallee 12
53115 Bonn
--
Tel.: +49 228 73 2367
Fax: +49 228 73 7869
--