On Thu, 28 Nov 2019, Paul Cuzner wrote:
On Thu, Nov 28, 2019 at 2:37 AM Sage Weil <sweil(a)redhat.com> wrote:
On Wed, 27 Nov 2019, Paul Cuzner wrote:
> Hi,
>
> I've got a working gist for the add/remove of the monitoring solution.
>
> https://gist.github.com/pcuzner/ac542ce3fa9a4699bb9310b1fd5095d0
>
> I'm out for the next couple of days, but will get a PR raised next week to
> get this started properly.
For some reason it won't let me comment on that gist.
- I don't think we should install anything on the host outside of the unit
file and /var/lib/ceph/$fsid/$thing. I suggest $thing be 'prometheus',
'alertmanager', 'node-exporter', 'grafana'. We could combine all but
node-exporter into a single 'monitoring' thing, but I'm worried this
obscures things too much when, for example, the user might have an
external prometheus but still need alertmanager, and so on.
So all the configs should live in /var/lib/ceph/$fsid/$thing/prometheus.yml
and so on, and then be bound to the right /etc/whatever location by the
container config.
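To make that concrete, here is a rough sketch of what the generated unit file's start command could look like. The unit name, paths, and image are my assumptions for illustration, not a settled interface (and $fsid would need real templating in practice):

```ini
# Hypothetical excerpt of a generated ceph-$fsid-prometheus.service unit.
[Service]
# Config and data live under /var/lib/ceph/$fsid/prometheus on the host,
# bind-mounted to the paths the container expects (config under /etc
# *inside* the container, data under /prometheus).
ExecStart=/usr/bin/podman run --rm --name ceph-prometheus \
    -v /var/lib/ceph/$fsid/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro \
    -v /var/lib/ceph/$fsid/prometheus/data:/prometheus \
    prom/prometheus
```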
I struggle with this one. Channelling my inner sysadmin: "I expect config
settings to be in /etc and data to be in /var/lib - that's what FHS says
and that's how other systems look that I have to manage, so why does Ceph
have to do things differently?"
1. Because it's a containerized service. Things are in /etc inside the
container, not outside. Sprinkling these configs in /etc mixes
containerized service configs with the *host*'s configs, which seems very
untidy to me.
2. Putting it all in /var/lib/ceph/whatever means it's easy to find and
clean up.
I'm also not sure of the value of fsid in the dir names. I can see the
value if a host has to support multiple ceph clusters - but outside dev, is
that something that the community or our customers actually want?
Most deployments won't need it, but it will avoid a whole range of
problems when they do. Especially once it becomes trivial to bootstrap
clusters, it also becomes trivial to end up with multiple clusters
overlapping on the same host.
And, like above, it keeps things tidy.
The gist downloads the separate containers we need in parallel - which I
think is a good thing! It reduces deployment time.
Sure... that's something we could do regardless of whether it's a separate
script or part of ceph-daemon. Probably what we actually want is for the
ssh 'host add' command to kick off some prestaging of containers in the
background so that the first daemon deployment doesn't wait for a
container download at all.
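A sketch of what that prestaging could look like; the image names and the choice of podman here are assumptions, and the real orchestrator would supply the image list:

```shell
# Pull the given images in the background, in parallel, and wait for all
# of them to finish; intended to be kicked off by the 'host add' handler.
prestage() {
    engine="$1"; shift
    for img in "$@"; do
        "$engine" pull "$img" &   # each download runs concurrently
    done
    wait                          # returns once every pull has completed
}

# e.g.:
# prestage podman prom/prometheus prom/alertmanager prom/node-exporter grafana/grafana
```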
IMO, having monitoring-add deploy grafana/prom and alertmanager together
by default is the way to go. TBH, when I started this, I was putting them
all in the same pod under podman for management and treating them as a
single unit - but having to support 'legacy' docker put an end to that :)
If a user wishes to use a separate prometheus, that will normally have its
own alertmanager too. Which alertmanager a prometheus server uses is
defined in its prometheus.yml. With an external prometheus, rules, alerts
and receiver definitions are going to be an exercise for the reader. We'll
need to document the settings, but the admin will need to apply them - in
this scenario, we could possibly generate sample files that the admin can
pick up and apply? To my mind, deployment of monitoring has two pathways:
default - "monitoring add" yields prom/grafana/alertmanager containers
deployed to the machine
external-prom - "monitoring add" just deploys grafana, and points its
default data source at the external prom URL. We're also making an
assumption here that the prometheus server is open and doesn't require
auth (OCP's prometheus, for example, has auth enabled).
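For reference, a minimal example of the prometheus.yml stanza that wires a prometheus server to its alertmanager (the host and port are illustrative):

```yaml
# prometheus.yml (fragment): tells this prometheus which alertmanager(s)
# to send firing alerts to.
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager.example.com:9093']
```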
I think it makes sense to focus on the out-of-the-box opinionated easy
scenario vs the DIY case, in general at least. But I have a few
questions...
I think this focus will leave some users in the dust. Monitoring with
prometheus can get complex, especially if it is to be fault tolerant
(which imho is important for confidence in such a system). Also, typically
users don't want several monitoring systems in their environment. So let's
keep the case of existing prometheus systems in mind, please.