Hi Kefu,
On Thu, Jun 17, 2021 at 9:24 PM kefu chai <tchaikov(a)gmail.com> wrote:
On Wed, Jun 16, 2021 at 10:23 PM Patrick Donnelly <pdonnell(a)redhat.com> wrote:
Introduced by [1] for the Quincy release. This builds on work in [2] to
add RADOS-backed sqlite3 support to Ceph (available in Pacific).
The MgrModule API for accessing your module's database is introduced
in [3]. An example of a module ("devicehealth") using the API can be
seen in [4].
Please let me know if you have any questions or feedback.
Hi Patrick,
My concern is that, without careful planning of how the pool storing
the health data is segregated from the pools being monitored, we could
interfere with the system being monitored by mutating its status.
For instance, if a cluster is experiencing large-scale slow ops and
pumping out lots of warning messages and/or structured performance
metrics, some mgr module might want to collect this information from
the health monitoring subsystem and persist it into the sqlite3
database. But that database is in turn backed by the same cluster.
Without careful planning, the objects stored in the .mgr pool could be
mapped to the same set of OSDs and monitors that are suffering from
the performance issue; in the worst case, this could even worsen the
situation. On the other hand, allocating dedicated OSDs and creating a
CRUSH rule picking them just for the .mgr pool might be difficult or
overkill from a maintainability point of view.
We actually had the same issue when adding cluster log support back to
the OSD for recording slow requests: the large volume of clog messages
puts more burden on the monitors, and if the slow requests are caused
by a monitor, these clog messages in turn slow the monitors down
further.
Shall we switch to a (local) backup sqlite backend if we identify a
performance issue, and restore / backfill the records once the issue
is resolved?
Thanks for bringing this up. I think it would be reasonable to decide
this depending on what the mgr module is doing. For example, I think
devicehealth and snap_schedule are innocuous enough that we don't need
to give special consideration for the system potentially being under
load. Also these modules' mutations of the databases do not depend on
the cluster state, healthy or degraded. OTOH, a module that is
collecting large streams of data into the database might first ingest
that data into a local in-memory database and only backup [1] that
in-memory database to RADOS when the cluster is healthy. If the
database is very large, then a full backup would not be desirable
because the in-memory database itself would grow too large. In that
case, I would suggest streaming batched updates in large transactions.
What do you think?
[1]
https://www.sqlite.org/backup.html
--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D