Slow mon sync can be caused by a mon_sync_max_payload_size that is too large. The default
is usually far too high. I had sync problems until I set
mon_sync_max_payload_size = 4096
Since then, mon sync has not been an issue any more.
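If it helps, here is a sketch of how that setting can be applied with the standard Ceph CLI. This assumes a Nautilus-or-later cluster (so the centralized config database is available); 'mon.a' is a placeholder daemon name, not from the thread:

```shell
# Persist the smaller sync payload size in the cluster config database
# (applies to all monitors; picked up by new sync sessions).
ceph config set mon mon_sync_max_payload_size 4096

# Optionally inject it into the running monitors without a restart.
ceph tell mon.* injectargs '--mon_sync_max_payload_size=4096'

# Verify the active value on one monitor ('mon.a' is a placeholder).
ceph config show mon.a mon_sync_max_payload_size
```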
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Peter Woodman <peter(a)shortbus.org>
Sent: 03 March 2021 06:26:47
To: Lincoln Bryant
Cc: ceph-users
Subject: [ceph-users] Re: Monitor leveldb growing without bound v14.2.16
Is the ceph insights plugin enabled? This caused huge bloat of the mon
stores for me. Before I figured that out, I turned on leveldb compression
options on the mon store and also got pretty significant savings.
On Tue, Mar 2, 2021 at 6:56 PM Lincoln Bryant <lincolnb(a)uchicago.edu> wrote:
Hi list,
We recently had a cluster outage over the weekend where several OSDs were
inaccessible over night for several hours. When I found the cluster in the
morning, the monitors' root disks (which contained both the monitor's
leveldb and the Ceph logs) had completely filled.
After restarting OSDs, cleaning out the monitors' logs, moving
/var/lib/ceph to dedicated disks on the mons, and starting recovery (in
which there was 1 unfound object that I marked lost, if that has any
relevancy), the leveldb continued/continues to grow without bound. The
cluster has all PGs in active+clean at this point, yet I'm accumulating
what seems like approximately ~1GB/hr of new leveldb data.
Two of the monitors (a, c) are in quorum, while the third (b) has been
synchronizing for the last several hours, but doesn't seem to be able to
catch up. Mon 'b' has been running for 4 hours now in the 'synchronizing'
state. The mon's log has many messages about compacting and deleting files,
yet we never exit the synchronization state.
The ceph.log is also rapidly accumulating complaints that the mons are
slow (not surprising, I suppose, since the leveldbs are ~100GB at this
point).
I've found that using the monstore tool to compact mons 'a' and 'c'
helps, but it is only a temporary fix. Soon the database inflates again and
I'm back to where I started.
Thoughts on how to proceed here? Some ideas I had:
- Would it help to add some new monitors that use RocksDB?
- Stop a monitor and dump the keys via monstoretool, just to get an
idea of what's going on?
- Increase mon_sync_max_payload_size to try to move data in larger
chunks?
- Drop down to a single monitor, and see if normal compaction triggers
and stops growing unbounded?
- Stop both 'a' and 'c', compact them, start them, and immediately
start 'b' ?
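For reference, the compaction and key-dump ideas above can be run offline against a stopped monitor with ceph-monstore-tool. A sketch, assuming the default store path and a monitor named 'a' (both are placeholders, not taken from the thread):

```shell
# Stop the monitor before touching its store.
systemctl stop ceph-mon@a

# Compact the store in place (temporary relief, as noted above).
ceph-monstore-tool /var/lib/ceph/mon/ceph-a compact

# Dump the keys to see which prefixes are accumulating.
ceph-monstore-tool /var/lib/ceph/mon/ceph-a dump-keys > /tmp/mon-a-keys.txt

# Count entries per key prefix (e.g. logm, osdmap, auth) to spot the bloat.
awk '{print $1}' /tmp/mon-a-keys.txt | sort | uniq -c | sort -rn | head

systemctl start ceph-mon@a
```

A large count under one prefix (logm entries are a common culprit when clusters are unhealthy for a while) would point at what is growing.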
Appreciate any advice.
Regards,
Lincoln
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io