Multiple Metric Generation Locations in Ceph - Dev

27 Feb 2024

Redouane and Avan came to me with an issue with RGW-related metrics that
warrants a broader community discussion for all daemons. For more
information, the issue is being tracked by
https://tracker.ceph.com/issues/64598

Currently, metrics consumed by Prometheus related to the RGW are being
generated by combining two parts:
1. The RGW perf counters: these counters are generated by the ceph-exporter
by parsing the output of the rgw command `ceph counter dump`.
2. The RGW metadata (daemon, ceph-version, hostname, etc): this information
is generated by the prometheus mgr module.

To combine the two parts, ceph-exporter uses a key field called instance_id,
which is generated as follows:
1. On the ceph-exporter side, the admin socket (asok) filename is parsed to
extract the daemon_id, which is used to derive the instance_id.
2. On the prometheus-mgr module side, the orchestrator (cephadm or rook) is
called to get the daemon_id, and the instance_id is then derived from it.
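
As a rough illustration of step 1, here is a minimal Python sketch of
filename-based derivation. The socket naming layout, the `client.rgw.` marker,
and the "last dot-separated component" rule are all assumptions made for this
example; they are not taken from the ceph-exporter source.

```python
import os

ASOK_SUFFIX = ".asok"
RGW_MARKER = "client.rgw."  # assumed filename component, for illustration only


def daemon_id_from_socket(path: str) -> str:
    """Extract a daemon_id from an admin socket path.

    Assumes a hypothetical layout "<cluster>-client.rgw.<daemon_id>.asok".
    """
    name = os.path.basename(path)
    if not name.endswith(ASOK_SUFFIX) or RGW_MARKER not in name:
        raise ValueError(f"unrecognized admin socket name: {name}")
    stem = name[: -len(ASOK_SUFFIX)]
    return stem.split(RGW_MARKER, 1)[1]


def instance_id_from_daemon_id(daemon_id: str) -> str:
    """Assumed rule: the instance_id is the last dot-separated component."""
    return daemon_id.rsplit(".", 1)[-1]


# Hypothetical socket path, not copied from a real deployment.
sock = "/var/run/ceph/ceph-client.rgw.foo.host1.abcdef.asok"
daemon_id = daemon_id_from_socket(sock)
print(daemon_id, instance_id_from_daemon_id(daemon_id))
```

Any change in how a daemon names its socket, or in how an orchestrator builds
the daemon_id, changes the result of this kind of parsing.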

This approach/design suffers from the following issues:
1. It creates a strong dependency between the prometheus-mgr module and the
orchestrator module (this has already caused issues for Rook environments;
ceph v18.2.1 metrics are completely broken because of this).
2. instance_id handling on the ceph-exporter side is fragile, as it relies on
socket filename parsing.
3. instance_id generation is error-prone, as it relies on how daemon_ids are
handled by the orchestrator module (which differs between rook and
cephadm).

The issue for RGW is that with certain orchestrators, for example in Rook,
there is a mismatch between the instance IDs for the metrics emitted by the
exporter and the metrics from the prometheus manager module.
This affects Prometheus queries that use the instance_id as the key for
joining the exporter metrics with the metadata metrics.
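
To make the effect concrete, here is a small Python sketch of a join keyed on
instance_id. It only mimics the matching behaviour; the ids and values are
made up for illustration and do not come from a real cluster.

```python
def join_on_instance_id(counters, metadata):
    """Inner join: keep only counters whose instance_id has metadata."""
    return {
        iid: {**metadata[iid], "value": value}
        for iid, value in counters.items()
        if iid in metadata
    }


# Illustrative values only; none of these ids come from a real cluster.
exporter_counters = {"abcdef": 42}
mgr_metadata_match = {"abcdef": {"hostname": "host1"}}
mgr_metadata_mismatch = {"a": {"hostname": "host1"}}

matched = join_on_instance_id(exporter_counters, mgr_metadata_match)
unmatched = join_on_instance_id(exporter_counters, mgr_metadata_mismatch)
print(matched)    # counter enriched with its metadata
print(unmatched)  # empty: the ids disagree, so nothing joins
```

When the two sides derive different ids, the join returns nothing rather than
raising an error, so the breakage shows up as missing data.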

There are many possible solutions, and I'd be happy to hear the
community's thoughts.

Here are ours (Avan, Redouane, and I):
1. We think daemon-specific metrics meant for Prometheus should only be
emitted from one place, and that place should be the newer ceph-exporter.
2. We discussed having a command you can run on an admin socket that would
emit all of the metadata that is currently being sent by the manager
module. This way we're not relying on parsing file names anymore.
3. The prometheus-mgr module will still exist and will be used to emit
cluster-wide metrics.

The command could be something like `ceph who-am-i` that you would expect
to work on any daemon's admin socket, or something daemon-specific like
`ceph rgw-info`.

In other words, move the metadata source from the prometheus-mgr module to
the ceph-exporter and use this new command `ceph who-am-i` to get it. This
way, each Ceph daemon will be self-sufficient and able to provide the
metadata needed to label/tag its metrics.
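
A minimal Python sketch of how that could look on the exporter side. The
command does not exist yet, so the JSON reply, its field names, and the metric
name `ceph_rgw_req` are all hypothetical and only meant to show the idea.

```python
import json

# Hypothetical reply from the proposed `who-am-i` admin socket command.
# The field names are assumptions; no such command or schema exists yet.
WHO_AM_I_REPLY = json.dumps({
    "daemon": "rgw.foo",
    "instance_id": "abcdef",
    "hostname": "host1",
    "ceph_version": "18.2.1",
})


def format_metric(name, value, labels):
    """Render one sample in the Prometheus text exposition style."""
    rendered = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{rendered}}} {value}"


def label_counters(counters, who_am_i_json):
    """Attach the daemon's self-reported metadata to each of its counters."""
    labels = json.loads(who_am_i_json)
    return [format_metric(n, v, labels) for n, v in sorted(counters.items())]


# Illustrative counter name and value, not real RGW output.
for line in label_counters({"ceph_rgw_req": 42}, WHO_AM_I_REPLY):
    print(line)
```

Because the labels come from the same daemon that produced the counters, no
filename parsing or orchestrator lookup is needed to agree on an instance_id.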

At this moment this affects at least two daemons: rgw and rbd-mirror, but
following the approach above and by introducing the new generic command we
can follow the same pattern for other legacy (or new) daemons.

Look forward to hearing other thoughts,
Ali, Redouane, Avan