Hi Ernesto,
Thanks for your valuable inputs. Kindly find my answers inline below.
On Wed, Mar 22, 2023 at 6:11 AM Ernesto Puerta <epuertat(a)redhat.com> wrote:
Hi Prashant,
Is this move just limited to the impact of the cluster log in the mon
store db or is it part of a larger mon db clean-up effort?
Yes, it's limited to moving cluster logs from the monstore db.
I'm asking this because, besides the cluster log, the mon store db is
currently used (and perhaps abused) by some mgr modules via:
- set_module_option()
<https://docs.ceph.com/en/quincy/mgr/modules/#mgr_module.MgrModule.set_module_option>
to set MODULE_OPTIONS values via CLI commands.
- set_store()
<https://docs.ceph.com/en/quincy/mgr/modules/#mgr_module.MgrModule.set_store>:
there are 2 main storage use cases here:
   - *Immutable/sensitive data*: values that shouldn't be exposed as
MODULE_OPTIONS (password hashes, private certificates, API keys, etc.).
   - *Changing data*: mgr-module internal state. While this shouldn't
cause the db to grow in the long term, it might cause short-term/compaction
issues (I'm not familiar with rocksdb internals, just extrapolating from
experience with sstable/leveldb).
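For reference, the set_store()/get_store() access pattern looks roughly like
this (a minimal sketch: the backing mon store db is mocked with a plain dict,
so only the key/value semantics are illustrated, not the real persistence):

```python
# Hypothetical sketch of the mgr-module key/value pattern described above.
# In a real module, MgrModule.set_store()/get_store() persist to the mon
# store db via ceph-mgr; here the backing store is a dict for illustration.
import json


class FakeMgrModule:
    def __init__(self):
        self._kv = {}  # stands in for the mon store db

    def set_store(self, key, val):
        # Passing None removes the key, mirroring MgrModule.set_store()
        if val is None:
            self._kv.pop(key, None)
        else:
            self._kv[key] = val

    def get_store(self, key, default=None):
        return self._kv.get(key, default)


# A module persisting rapidly-changing internal state as JSON
# ("dashboard/state" is an invented key, not a real Ceph one):
mod = FakeMgrModule()
mod.set_store("dashboard/state", json.dumps({"last_scrape": 1679500000}))
state = json.loads(mod.get_store("dashboard/state"))
```

Every such write lands in the mon store db, which is why heavy use of this
API by modules adds to the growth/compaction concern.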
The config-related information stored in the db should not be a problem
here. We are only concerned about logm entries in the event of a health
error, and specifically when those entries are not getting trimmed.
For the latter case, Dashboard developers have been looking for an
efficient alternative to persistently store rapidly-changing data. We
discarded the idea of using a pool, since the Dashboard should be able to
operate prior to any OSD provisioning and during storage downtime.
Coming back to your original questions, I understand that there are two
different issues at stake:
   - *Cluster log processing*: currently done by the mon via Paxos. (Do we
really need a Paxos ack for logs? Can we live with some type of
eventually-consistent/best-effort storage here?)
Yes, we need a Paxos ack for logm. The logm entries get written to the mon
store on a Paxos proposal and to the cluster log on an update from Paxos.
Yes, we are working on different approaches, and one of them is to write to
a dedicated pool.
- *Cluster log storage*: currently mon store db.
AFAIK this is the
main issue, right?
Yes, that's right.
From there, I see 2 possible paths:
- *Keep cluster-wide logs as a Ceph concern:*
- IMHO putting some throttling in place should be a must, since
client-triggered cluster logs could easily become a DoS vector.
   - I wouldn't put them into a rados pool, not so much because of data
availability in case of OSD service downtime (logs will still
be recoverable from logfiles), but because of the potential interference with
user workloads/deployment patterns (as Frank mentioned before).
- Could we run the ".mgr" pool on a new type of
"internal/service-only" colocated OSDs (memstore)?
- Save logs to a fixed-size/TTL-bound priority or multi-level queue
structure?
- Add some (eventually-consistent) store db to the ceph-mgr?
      - To solve ceph-mgr scalability issues, we recently added a new
kind of Ceph utility daemon (ceph-exporter) whose sole purpose is to fetch
metrics from co-located Ceph daemons' perf counters and make those
available for Prometheus scraping. We could think about a similar thing
for logs... (although it'd be very similar to the Loki approach below).
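The fixed-size/TTL-bound queue idea above could be as simple as the
following (illustrative only: this is not Ceph code, and the class name and
parameters are invented):

```python
# Illustrative fixed-size, TTL-bound log queue (not Ceph code).
# Old entries fall out either by count (maxlen) or by age (ttl).
import collections
import time


class TTLLogQueue:
    def __init__(self, maxlen=10000, ttl=3600.0, clock=time.monotonic):
        self._q = collections.deque(maxlen=maxlen)  # size bound
        self._ttl = ttl        # seconds an entry stays visible
        self._clock = clock    # injectable clock, eases testing

    def push(self, entry):
        self._q.append((self._clock(), entry))

    def entries(self):
        cutoff = self._clock() - self._ttl
        # drop expired entries from the front (oldest first)
        while self._q and self._q[0][0] < cutoff:
            self._q.popleft()
        return [e for _, e in self._q]
```

Something along these lines would bound both the memory footprint and the
age of retained cluster log entries without touching the mon store db.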
- *Move them outside Ceph:*
- Cephadm + Dashboard now support Centralized Logging via Loki +
Promtail <https://ceph.io/en/news/blog/2022/centralized_logging/>,
which basically polls all daemon logfiles and sends new log traces to a
central service (Loki) where they can be monitored/filtered in real-time.
- If we find the previous solution too bulky for regular cluster
monitoring, we could explore systemd-journal-remote
<https://www.freedesktop.org/software/systemd/man/systemd-journal-remote.service.html>
/rsyslog/...
      - The main downside of this approach is that it might break the
"ceph log" command (rados_monitor_log and log events could still be
watched, I guess).
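On the throttling point above (client-triggered cluster logs as a DoS
vector), a minimal token-bucket sketch, with invented rate/burst parameters
and not reflecting Ceph's actual throttle machinery:

```python
# Minimal token-bucket throttle for client-triggered cluster log entries.
# Illustrative sketch only; rate/burst values are made up.
import time


class TokenBucket:
    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)    # tokens replenished per second
        self.burst = float(burst)  # maximum bucket size
        self.tokens = float(burst)
        self.clock = clock         # injectable clock, eases testing
        self.last = clock()

    def allow(self):
        now = self.clock()
        # replenish proportionally to elapsed time, capped at burst
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # drop/defer this log entry
```

Anything a client can make the cluster log would pass through a gate like
allow() before reaching persistent storage, so a misbehaving client degrades
its own logs rather than the mon store.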
This is really helpful. Let me explore these paths. If required, we will
propose a meeting with a wider audience to discuss this further.
Kind Regards,
Ernesto
Regards,
Prashant
On Wed, Mar 22, 2023 at 11:12 AM Janne Johansson <icepic.dz(a)gmail.com>
wrote:
> > 2) .mgr pool
> >
> > 2.1) I have become really tired of these administrative pools that are
> created on the fly without any regards to device classes, available
> capacity, PG allocation and the like. The first one that showed up without
> warning was device_health_metrics, which turned the cluster health_err
> right away because the on-the-fly pool creation is, well, not exactly smart.
> >
> > We don't even have drives below the default root. We have a lot of
> different pools on different (custom!) device classes with different
> replication schemes to accommodate a large variety of use cases.
> Administrative pools showing up randomly somewhere in the tree are a real
> pain. There are ceph-user cases where people deleted and recreated it only
> to make the device health module useless, because it seems to store the
> pool ID and there is no way to tell it to use the new pool.
> >
>
> Ah, that's why it looked unused after I also had to remake it. Since
> it gets created when you don't have the OSDs yet, the possibilities
> for it ending up wrong seem very large.
>
> --
> May the most significant bit of your life be positive.