Hi Kefu,
On Thu, Jun 17, 2021 at 9:24 PM kefu chai <tchaikov(a)gmail.com> wrote:
On Wed, Jun 16, 2021 at 10:23 PM Patrick Donnelly <pdonnell(a)redhat.com> wrote:
Introduced by [1] for the Quincy release. This builds on work in [2] to
add RADOS-backed sqlite3 support to Ceph (available in Pacific).
The MgrModule API for accessing your module's database is introduced
in [3]. An example of a module ("devicehealth") using the API can be
seen in [4].
Please let me know if you have any questions or feedback.
Hi Patrick,
My concern is that, without careful planning of how the pool storing
the health data is segregated from the pools being monitored, we could
interfere with the system being monitored by mutating its status.
For instance, if a cluster is experiencing large-scale slow ops and
pumping out lots of warning messages and/or structured performance
metrics, some mgr module might want to collect this information from
the health monitoring subsystem and persist it into the sqlite3
database. But that database is in turn backed by the same cluster.
Without careful planning, the objects stored in the .mgr pool could be
mapped to the same set of OSDs and monitors that are suffering from
the performance issue; in the worst case, this could even worsen the
situation. On the other hand, allocating dedicated OSDs and creating a
CRUSH rule picking them just for the .mgr pool might be difficult or
overkill from a maintainability point of view.
We actually had the same issue when adding cluster log support back to
the OSD for recording slow requests: the large volume of clog messages
puts more burden on the monitors, and if the slow requests are caused
by a monitor, these clog messages in turn slow the monitors down
further.
Shall we switch to a (local) backup sqlite backend if we identify a
performance issue, and restore / backfill the records once the issue
is resolved?
Thanks for bringing this up. I think it would be reasonable to decide
this depending on what the mgr module is doing. For example, I think
devicehealth and snap_schedule are innocuous enough that we don't need
to give special consideration for the system potentially being under
load. Also these modules' mutations of the databases do not depend on
the cluster state, healthy or degraded. OTOH, a module that is
collecting large streams of data into the database might first ingest
that data into a local in-memory database and only backup [1] that
in-memory database to RADOS when the cluster is healthy. If the
database is very large, then a full backup would not be desirable
because the in-memory database itself would grow too large. In that
case, I would suggest streaming batched updates in large transactions.
What do you think?
[1]
https://www.sqlite.org/backup.html
--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D