-----Original Message-----
Sent: Sunday, 20 June 2021 21:34
To: ceph-users(a)ceph.io
Subject: [ceph-users] Re: Why you might want packages not containers for Ceph deployments
3. Why is systemd still being talked about in the context of cephadm?
Your orchestrator should handle restarts, namespaces, and failed tasks,
no? There should be no need for a systemd dependency; at least I have
not seen any container images relying on this.
Podman uses systemd to manage containers so that it can be daemonless,
in contrast with Docker, where one has to maintain a separate daemon and
use Docker-specific tools to manage containers. If you assert that
Podman should not exist, please take that up with the Podman folks.
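For illustration, this is roughly the mechanism in play; a minimal
sketch assuming an already-running container named ceph-osd-0 (the name
is hypothetical):

    # Generate a systemd unit file for the container in the current directory
    podman generate systemd --files --new --name ceph-osd-0
    # Install it so systemd handles restarts on failure and at boot
    mv container-ceph-osd-0.service /etc/systemd/system/
    systemctl daemon-reload
    systemctl enable --now container-ceph-osd-0.service

With --new, the unit recreates the container on each start, so systemd,
not a long-running engine daemon, owns the lifecycle.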
If your OC uses systemd, that means your OC is dependent on systemd, not
that Ceph is. Nobody here is discussing OC specifics.
4. OK, I found the container images[2] (I think). Sorry, but this has
'nothing' to do with container thinking. I expected to find separate,
smaller container images for the osd, mds, and rgw. This looks more like
an OS deployment.
Bundling all the daemons together into one container is *genius*. It is
much simpler to build and maintain: one artifact vs. a bunch. I wouldn't
be surprised if there are memory usage efficiencies too.
😃 What nonsense. If building container images is a problem, do not even
get involved with containers.
7. If you are not setting CPU and memory limits on your cephadm
containers, then again one has to ask why even use containers.
This seems like a non sequitur. As many have written, CPU and memory
limits aren't the reason for containerizing Ceph daemons. If there are
other container applications where doing so makes sense, that's fine for
those applications.
Indeed, so now we have concluded that cephadm does not use container functionality.
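To be fair to the original point, applying limits to a hand-run
container is straightforward; a minimal sketch with podman (the image
tag and container name are illustrative, and this is not how cephadm
itself launches daemons):

    # Cap a container at 2 CPUs and 4 GiB of RAM
    podman run -d --name ceph-osd-0 --cpus 2 --memory 4g quay.io/ceph/ceph:v16

Whether such caps are wise for an OSD at all is exactly what the next
paragraph disputes.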
I suspect that artificial CPU limiting of Ceph daemons
would have a
negative impact on latency, peering/flapping, and slow requests. Ceph
is a distributed system, not a massively parallel one. OSDs already
have a memory target that can be managed natively, vs an external
mechanism that arbitrarily cuts them off at the knees when they need it
the most. That approach would be addressing the symptoms, not the root
cause. Having been through a multi-day outage that was substantially
worsened by the OOMkiller (*), I personally want nothing to do with
blind external mechanisms deciding that they know better than Ceph
daemons whether or not they should be running. If your availability and
performance needs favor rigidly defined areas of doubt and uncertainty,
that’s your own lookout.
Agreed, no real need for using containers.
* A release that didn’t have OSD memory target setting yet. Having that
would have helped dramatically.
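For reference, the native mechanism referred to above is the OSD memory
target; a minimal sketch of setting it cluster-wide (the 4 GiB value is
illustrative):

    # Let OSDs manage themselves toward ~4 GiB each, rather than an external cap
    ceph config set osd osd_memory_target 4294967296
    # Confirm what a given daemon picked up (osd.0 as an example)
    ceph config show osd.0 osd_memory_target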
8. I still see lots of comments on the mailing list about accessing
logs. I have all my containers log to a remote syslog server. If your
ceph daemons still cannot do this (correctly), what is the point of even
going to containers?
With all possible respect, that's another non sequitur, or at the very
least an assumption that your needs are everyone's needs. Centralized
logging makes sense in some contexts. But not always, and not to
everyone. Back around the Hammer or Jewel releases there was a bug
where logging to syslog resulted in daemon crashes. I haven’t tried it
with newer releases, but assume that’s long been fixed.
I’m far from an [r]syslog[-ng] expert, but I suspect that central-only
logging might not deal well with situations like an OSD spewing
entries when the debug subsystem level is elevated. Moreover, many of
the issues one sees with Ceph clusters are network related. So when
there’s a network problem, I want to rely on the network to be able to
see logs? I’ve seen syslog drop entries under load, something I
personally wouldn’t want for Ceph daemons. There are of course many
strategies between the extremes.
So your argument is: if it does not work in <5% of cases, let's not use it?
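For context, the daemon-side logging switches under discussion; a
minimal sketch, assuming a release where syslog logging works correctly:

    # Send daemon and error logs to syslog (and on to your remote collector)
    ceph config set global log_to_syslog true
    ceph config set global err_to_syslog true
    # Optionally drop local file logging once the syslog path is trusted
    ceph config set global log_to_file false

Keeping log_to_file enabled alongside syslog is one of the middle-ground
strategies mentioned above.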
9. I am updating my small cluster something like this:
I’m guessing you’ve never updated between major releases. That process
tends to have additional steps and nuances, which is one of the
compelling arguments in favor of orchestration: when it’s done well,
most operators don’t need to rev their own homebrew orchestration to set
the right flags at the right time, etc. But one of the great things
about OSS is that you have the flexibility to roll your own if you so
choose.
I am never going to run 'ceph orch upgrade start --ceph-version 16.2.0'.
I want to see if everything is OK after each command I issue. I want to
see if scrubbing stopped; I want to see if the OSDs have correctly
accepted the new config.
So you want to do all the things that an orchestrated rolling upgrade
does for you. Check.
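For comparison, the manual flow being described looks roughly like this;
a minimal sketch of a package-based rolling restart (package manager and
restart targets vary by distro and by which daemons run on each host):

    # Keep the cluster from rebalancing or scrubbing mid-upgrade
    ceph osd set noout
    ceph osd set noscrub
    ceph osd set nodeep-scrub
    # Then, one host at a time:
    apt-get install --only-upgrade ceph    # or the dnf/yum equivalent
    systemctl restart ceph-osd.target
    ceph -s                                # wait for HEALTH_OK before moving on
    # When every host is done:
    ceph osd unset noout
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub

An orchestrated upgrade performs essentially this loop, plus any
release-specific steps, on your behalf.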
I have a small cluster, so I do not see this procedure as a waste of
time. If I look at your telemetry data[3], I see 600 clusters with 35k
OSDs; that is an average of about 60 OSDs per cluster. So these are
quite small clusters, and I would think these admins have a similar
point of view to mine.
Careful with those inferences.
We are not operating some website cPanel project here. Given the size of
these storage solutions, one should expect lots of third-party data to
be stored there. So I would argue you should not want 'I only know
Kubernetes commands' sysadmins operating Ceph.
* Operators who submit telemetry may not be a representative sample.
* Sites may have many more than one cluster. If one has a 20 OSD lab
cluster and a 1000 OSD production cluster, perspectives and processes
are going to be different than for someone with a single 60 OSD cluster.
* Average != median (quick example below).
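To illustrate that last bullet with made-up numbers: four clusters of
20, 20, 20, and 1000 OSDs have an average of (20 + 20 + 20 + 1000) / 4 =
265 OSDs, yet a median of 20. A few very large clusters can pull the
average far above what a typical cluster looks like.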
Whatever; prove me wrong that the vast majority of clusters are not
small. Logic alone dictates this.
I am rather getting the impression that you need an easy deployment tool
for Ceph more than you want to really utilize containers. First there
was ceph-deploy, then ceph-ansible, both of which I luckily skipped.
That’s more than a little harsh. A lot of people get a lot of value out
of those tools.
The Ceph daemons seem not to be prepared for container use; Ceph
containers can't use CPU/memory limits.
They don’t make julienne fries either. What of it?
So the argument for cephadm to move to containers is nonsense. If there
were a true container aspiration, you would want to apply at least a few
of these suggestions.
And last but not least, you totally bypass the fact that the (Ceph)
admin should choose the OC platform, not you, because they probably have
more than just Ceph nodes.
Nobody’s stopping you from rolling your own containers, using
traditional packages, or heck even deploying with tarballs. That’s the
beauty of OSS. Let’s leave Orange County out of it though.
I agree, there should be no strong relationship between Ceph and any OC.
I tend to think that Kubernetes or other OCs should be responsible for
offering a Ceph implementation.
So my question to you: what problem is your cephadm dev team actually
trying to solve? That is not clear to me.
Asked and answered, sir.
Could you explain it to me like I am a six-year-old? I do not get from
your text what problem the cephadm team is trying to solve.