Hi Prashant, et al.,
Separating the logs from the DB might be a good thing.
I would second what Frank suggested: local storage, local to the mon
hosts. Perhaps we could simply state that flash is required, which
shouldn't be an issue nowadays. This would also give the best latency
and avoid IOPS starvation in case of a disaster. With redundancy across
the mon instances, the data remains available from at least one of the
mon hosts. Relying on pools would assume that communication between the
members of the pool is still intact. Even an exclusive pool dedicated to
this purpose would still depend on the network connection and would
introduce additional latency, too.
The other alternatives sound promising as well; however, I would like to
raise some concerns.
Pushing the logs only to a central location would create a dependency on
that location in case of a disaster. A disaster could also coincide with
a network issue affecting the connection to the outside world. So it
might be useful as an add-on, but for troubleshooting it would rather be
an additional challenge.
Eventually consistent distribution of the data might make
troubleshooting hard. The underlying assumption would be that the logs
aren't important enough to be available in full everywhere, i.e. on each
of the mon hosts. Eventual consistency would also add another layer of
trouble to debug during a disaster: the interconnect requirements may
not be met, or the service may be only partially available, which would
not help to get the data to the place where it is needed.
Kind regards,
-matt
On 22.03.23 14:10, Ernesto Puerta wrote:
Hi Prashant,
Is this move just limited to the impact of the cluster log in the mon
store db or is it part of a larger mon db clean-up effort?
I'm asking this because, besides the cluster log, the mon store db is
currently used (and perhaps abused) also by some mgr modules via:
* set_module_option()
<https://docs.ceph.com/en/quincy/mgr/modules/#mgr_module.MgrModule.set_module_option> to
set MODULE_OPTIONS values via CLI commands.
* set_store()
<https://docs.ceph.com/en/quincy/mgr/modules/#mgr_module.MgrModule.set_store>:
there are 2 main storage use cases here:
o *Immutable/sensitive data*: data that shouldn't be exposed as
MODULE_OPTIONS (password hashes, private certificates, API
keys, etc.).
o *Changing data*: mgr-module internal state. While this
shouldn't cause the db to grow in the long term, it might
cause short-term/compaction issues (I'm not familiar
with rocksdb internals, just extrapolating from experience
with sstable/leveldb)
For the latter case there, Dashboard developers have been looking for
an efficient alternative to persistently store rapidly-changing data.
We discarded the idea of using a pool since the Dashboard should be
able to operate prior to any OSD provisioning and in case of storage
downtimes.
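For context on the store-db usage pattern: set_store()/get_store() form a plain string key/value API whose real implementation lives in ceph-mgr's MgrModule and persists to the mon store db. The stand-in below is an illustrative mock (a dict instead of the mon store db, and FakeMgrModule is a made-up name), showing only the call pattern that rapidly-changing module state puts on the backing store:

```python
# Hypothetical stand-in for the MgrModule KV API (set_store/get_store),
# backed here by a plain dict instead of the mon store db, so the access
# pattern can be shown outside a running ceph-mgr.
import json


class FakeMgrModule:
    """Mimics the subset of MgrModule used for persistent module state."""

    def __init__(self):
        self._store = {}  # the real implementation persists to the mon store db

    def set_store(self, key, val):
        # val=None deletes the key, matching the documented API behavior
        if val is None:
            self._store.pop(key, None)
        else:
            self._store[key] = val

    def get_store(self, key, default=None):
        return self._store.get(key, default)


# The "changing data" case: serialized state rewritten frequently,
# which is what stresses the backing rocksdb over time.
mgr = FakeMgrModule()
mgr.set_store("dashboard/task_state", json.dumps({"running": 3}))
state = json.loads(mgr.get_store("dashboard/task_state", "{}"))
print(state["running"])  # -> 3
```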
Coming back to your original questions, I understand that there are
two different issues at stake:
* *Cluster log processing*: currently mon via Paxos (Do we really
need Paxos ack for logs? Can we live with some type of
eventually-consistent/best-effort storage here?)
* *Cluster log storage*: currently mon store db. AFAIK this is the
main issue, right?
From there, I see 2 possible paths:
* *Keep cluster-wide logs as a Ceph concern:*
o IMHO putting some throttling in place should be a must, since
client-triggered cluster logs could easily become a DoS vector.
o I wouldn't put them into a rados pool, not so much for the
data availability in case of OSD service downtime (logs will
still be recoverable from logfiles), but as for the potential
interference with user workloads/deployment patterns (as Frank
mentioned before).
+ Could we run the ".mgr" pool on a new type of
"internal/service-only" colocated OSDs (memstore)?
o Save logs to a fixed-size/TTL-bound priority or multi-level
queue structure?
o Add some (eventually-consistent) store db to the ceph-mgr?
o To solve ceph-mgr scalability issues, we recently added a new
kind of Ceph utility daemon (ceph-exporter) whose sole purpose
is to fetch metrics from co-located Ceph daemon's
perf-counters and make those available for Prometheus
scraping. We could think about a similar thing but for logs...
(although it'd be very similar to the Loki approach below).
* *Move them outside Ceph:*
o Cephadm + Dashboard now support Centralized Logging via Loki +
Promtail
<https://ceph.io/en/news/blog/2022/centralized_logging/>,
which basically polls all daemon logfiles and sends new log
traces to a central service (Loki) where they can be
monitored/filtered in real-time.
+ If we find the previous solution too bulky for regular
cluster monitoring, we could explore
systemd-journal-remote
<https://www.freedesktop.org/software/systemd/man/systemd-journal-remote.service.html>/rsyslog/...
o The main downside of this approach is that it might break the
"ceph log" command (rados_monitor_log and log events could
still be watched I guess).
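Two of the bullet ideas above -- throttling client-triggered log traffic and a fixed-size/TTL-bound queue -- could be sketched roughly as follows. All class names, rates, and limits here are made-up illustrations, not existing Ceph code:

```python
# Sketch: a token-bucket throttle in front of a fixed-size, TTL-bound
# log queue. Illustrative only; names and limits are invented.
import time
from collections import deque


class TokenBucket:
    """Admit at most `rate` log entries per second, with a small burst."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate, self.burst, self.clock = rate, burst, clock
        self.tokens = burst
        self.last = clock()

    def allow(self):
        now = self.clock()
        # refill proportionally to elapsed time, capped at the burst size
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # entry dropped/deferred: the DoS protection


class TtlLogQueue:
    """Keep at most `maxlen` entries, each for at most `ttl` seconds."""

    def __init__(self, maxlen, ttl, clock=time.monotonic):
        self.q = deque(maxlen=maxlen)  # oldest entries fall off when full
        self.ttl, self.clock = ttl, clock

    def append(self, entry):
        self.q.append((self.clock(), entry))

    def entries(self):
        cutoff = self.clock() - self.ttl
        return [e for (t, e) in self.q if t >= cutoff]


bucket = TokenBucket(rate=100, burst=5)
logq = TtlLogQueue(maxlen=4, ttl=3600)
for i in range(10):
    if bucket.allow():
        logq.append(f"log entry {i}")
print(len(logq.entries()))  # -> 4 (size-bounded regardless of input rate)
```

The point of the combination: the bucket bounds the write rate into the store (the DoS concern), while the queue bounds the stored volume (the db-growth concern), independently of how chatty clients are.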
Kind Regards,
Ernesto
On Wed, Mar 22, 2023 at 11:12 AM Janne Johansson <icepic.dz(a)gmail.com>
wrote:
2) .mgr pool
2.1) I have become really tired of these administrative pools
that are created
on the fly without any regard to device classes,
available capacity, PG allocation and the like. The first one that
showed up without warning was device_health_metrics, which turned
the cluster to HEALTH_ERR right away because the on-the-fly pool
creation is, well, not exactly smart.
We don't even have drives below the default root. We have a lot
of
different pools on different (custom!) device classes with
different replication schemes to accommodate a large variety of
use cases. Administrative pools showing up randomly somewhere in
the tree are a real pain. There are ceph-user cases where people
deleted and recreated it only to make the device health module
useless, because it seems to store the pool ID and there is no way
to tell it to use the new pool.
Ah, that's why it looked unused after I also had to remake it. Since
it gets created when you don't have the OSDs yet, the possibilities
for it ending up wrong seem very large.
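One mitigation for the placement complaint, assuming the cluster already has OSDs with an "ssd" device class (the rule name below is made up, the commands are standard CRUSH-rule CLI). Untested config sketch, to repoint the auto-created pool instead of leaving it on the default root:

```shell
# Create a replicated CRUSH rule restricted to the ssd device class,
# then point the auto-created .mgr pool at it.
ceph osd crush rule create-replicated mgr-ssd default host ssd
ceph osd pool set .mgr crush_rule mgr-ssd
# The older device_health_metrics pool can be repointed the same way.
```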
--
May the most significant bit of your life be positive.
_______________________________________________
Dev mailing list -- dev(a)ceph.io
To unsubscribe send an email to dev-leave(a)ceph.io
--
——————————————————
Matthias Muench
Principal Specialist Solution Architect
EMEA Storage Specialist
matthias.muench(a)redhat.com
Phone: +49-160-92654111
Red Hat GmbH
Technopark II
Werner-von-Siemens-Ring 12
85630 Grasbrunn
Germany
_______________________________________________________________________
Red Hat GmbH, Registered seat: Werner von Siemens Ring 12, D-85630 Grasbrunn, Germany
Commercial register: Amtsgericht Muenchen/Munich, HRB 153243,
Managing Directors: Ryan Barnhart, Charles Cachera, Michael O'Neill, Amy Ross