Hi Prashant,
Is this move limited to reducing the impact of the cluster
log on the mon store db, or is it part of a larger mon db
clean-up effort?
I'm asking because, besides the cluster log, the mon store
db is currently also used (and perhaps abused) by some mgr
modules via:
- set_module_option() to set
MODULE_OPTIONS values via CLI commands.
- set_store(): there are 2 main
storage use cases here:
- Immutable/sensitive data (password hashes, private
certificates, API keys, etc.), which shouldn't be exposed
as MODULE_OPTIONS.
- Changing data: mgr-module internal state. While
this shouldn't cause the db to grow in the long term, it
might cause short-term/compaction issues (I'm not
familiar with rocksdb internals, just extrapolating from
experience with sstable/leveldb).
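For reference, the usage pattern I'm referring to looks
roughly like this (just a sketch: MgrStub is a stand-in for
the real MgrModule base class, and DashboardLike is a made-up
module; in Ceph these calls end up as key/value pairs in the
mon store db, which is exactly the growth concern):

```python
# Sketch of the set_store()/get_store() pattern used by mgr
# modules. MgrStub mimics the mon-store-backed KV behavior
# with a plain dict.

class MgrStub:
    def __init__(self):
        self._kv = {}  # in Ceph this is backed by the mon store db

    def set_store(self, key, value):
        # passing None removes the key, as in the real API
        if value is None:
            self._kv.pop(key, None)
        else:
            self._kv[key] = value

    def get_store(self, key, default=None):
        return self._kv.get(key, default)


class DashboardLike(MgrStub):
    def save_api_key(self, key):
        # "immutable/sensitive" use case: kept out of MODULE_OPTIONS
        self.set_store('api_key', key)

    def record_task(self, task_id, state):
        # "changing data" use case: rapidly-mutating internal state,
        # which is what stresses mon store db compaction
        self.set_store('task/%s' % task_id, state)


mod = DashboardLike()
mod.save_api_key('s3cr3t')
mod.record_task('42', 'running')
```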
For the latter case, Dashboard developers have been
looking for an efficient alternative for persistently
storing rapidly-changing data. We discarded the idea of
using a pool, since the Dashboard should be able to operate
prior to any OSD provisioning and during storage downtimes.
Coming back to your original questions, I understand that
there are two different issues at stake:
- Cluster log processing: currently done by the mons via
Paxos. (Do we really need a Paxos ack for logs? Could we
live with some kind of eventually-consistent/best-effort
storage here?)
- Cluster log storage: currently the mon store db.
AFAIK this is the main issue, right?
From there, I see 2 possible paths:
- Keep cluster-wide logs as a Ceph concern:
- IMHO putting some throttling in place should be a
must, since client-triggered cluster logs could easily
become a DoS vector.
- I wouldn't put them into a rados pool, not so much
because of data availability during OSD service
downtime (logs would still be recoverable
from logfiles), but because of the potential interference
with user workloads/deployment patterns (as Frank
mentioned before).
- Could we run the ".mgr" pool on a new type of
"internal/service-only" colocated OSDs (memstore)?
- Save logs to a fixed-size/TTL-bound priority or
multi-level queue structure?
- Add some (eventually-consistent) store db to the
ceph-mgr?
- To solve ceph-mgr scalability issues, we recently
added a new kind of Ceph utility daemon
(ceph-exporter) whose sole purpose is to fetch metrics
from co-located Ceph daemons' perf-counters and make
those available for Prometheus scraping. We could
think about a similar thing but for logs... (although
it'd be very similar to the Loki approach below).
- Move them outside Ceph:
- Cephadm + Dashboard now
support Centralized Logging via Loki + Promtail,
which basically tails all daemon logfiles and ships
new log lines to a central service (Loki), where they
can be monitored/filtered in real time.
- If we find the previous solution too bulky for
regular cluster monitoring, we could explore systemd-journal-remote/rsyslog/...
- The main downside of this approach is that it might
break the "ceph log" command (though rados_monitor_log and
log events could still be watched, I guess).
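On the throttling point above, this is roughly what I have
in mind: a plain per-client token bucket (names and numbers
are made up, nothing Ceph-specific):

```python
import time


class LogThrottle:
    """Token bucket: each client may emit at most `rate`
    cluster-log entries per second, with bursts up to `burst`.
    Entries over the budget are dropped (or could be collapsed
    into an 'N entries suppressed' line instead)."""

    def __init__(self, rate=10.0, burst=50):
        self.rate = rate
        self.burst = burst
        self._buckets = {}  # client_id -> (tokens, last_ts)

    def allow(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        tokens, last = self._buckets.get(client_id, (self.burst, now))
        # refill proportionally to the time elapsed, capped at burst
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1.0:
            self._buckets[client_id] = (tokens, now)
            return False  # over budget: drop this log entry
        self._buckets[client_id] = (tokens - 1.0, now)
        return True
```

A misbehaving client then degrades only its own log budget
instead of hammering Paxos and the mon store db for everyone.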
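For the fixed-size/TTL-bound structure I mentioned above,
something along these lines (again just an in-memory sketch
with best-effort durability; the class name is made up):

```python
import time
from collections import deque


class TTLLogQueue:
    """Fixed-size, TTL-bound log buffer: keeps at most `maxlen`
    entries and drops anything older than `ttl` seconds, so the
    store can never grow unbounded the way the mon store db
    can today."""

    def __init__(self, maxlen=10000, ttl=3600.0):
        self._q = deque(maxlen=maxlen)  # deque evicts oldest on overflow
        self.ttl = ttl

    def append(self, entry, now=None):
        now = time.monotonic() if now is None else now
        self._expire(now)
        self._q.append((now, entry))

    def entries(self, now=None):
        now = time.monotonic() if now is None else now
        self._expire(now)
        return [entry for _, entry in self._q]

    def _expire(self, now):
        # drop entries past their TTL from the front (oldest first)
        while self._q and now - self._q[0][0] > self.ttl:
            self._q.popleft()
```

Both bounds are enforced cheaply on every append, which seems
a better fit for rapidly-changing data than compaction-driven
reclamation in a KV store.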