Hi all,
Thanks for the responses.
I stopped the monitor that wasn't syncing and dumped its keys with the monstore tool. The
keys were mostly of type 'logm', which I guess matches up with the huge
number of log messages I was getting about slow ops. I tried injecting
clog_to_monitors=false along the way, but it did not help in my case.
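For reference, those steps looked roughly like the following (the mon id 'b' and the default
data path are assumptions; adjust for your deployment):

    # Stop the out-of-quorum mon before poking at its store
    systemctl stop ceph-mon@b

    # Dump all keys and count them by prefix to see what dominates (e.g. 'logm')
    ceph-monstore-tool /var/lib/ceph/mon/ceph-b dump-keys | awk '{print $1}' | sort | uniq -c

    # Stop OSDs from shipping cluster log entries to the mons at runtime
    ceph tell osd.* injectargs '--clog_to_monitors=false'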
I ended up doing a rolling restart of the whole cluster, which must have cleared whatever
was blocking things because the monitors automatically compacted and 'b' rejoined
the quorum about 75% of the way through.
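The restart itself was nothing fancy, roughly the following host by host (hostnames are
placeholders and the pause is just to let PGs settle before moving on):

    ceph osd set noout                        # avoid rebalancing while daemons bounce
    for host in osd-host-01 osd-host-02 osd-host-03; do
        ssh "$host" 'systemctl restart ceph-osd.target'
        sleep 60                              # give PGs time to peer again
        ceph -s                               # eyeball that things settle before continuing
    done
    ceph osd unset noout
    ceph quorum_status --format json-pretty   # confirm mon 'b' rejoined the quorum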
Thanks,
Lincoln
________________________________
From: Wido den Hollander <wido(a)42on.com>
Sent: Wednesday, March 3, 2021 2:03 AM
To: Lincoln Bryant <lincolnb(a)uchicago.edu>; ceph-users <ceph-users(a)ceph.io>
Subject: Re: [ceph-users] Monitor leveldb growing without bound v14.2.16
On 03/03/2021 00:55, Lincoln Bryant wrote:
Hi list,
We recently had a cluster outage over the weekend where several OSDs were inaccessible
overnight for several hours. When I found the cluster in the morning, the monitors'
root disks (which contained both the monitors' leveldb and the Ceph logs) had
completely filled.
After restarting OSDs, cleaning out the monitors' logs, moving /var/lib/ceph to
dedicated disks on the mons, and starting recovery (in which there was 1 unfound object
that I marked lost, if that has any relevance), the leveldb has continued to grow
without bound. The cluster has all PGs in active+clean at this point, yet I'm
accumulating roughly 1 GB/hr of new leveldb data.
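A crude way to see that growth rate, assuming the default mon store path:

    # Check mon store size every 10 minutes
    watch -n 600 'du -sh /var/lib/ceph/mon/ceph-*/store.db'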
Two of the monitors (a, c) are in quorum, while the third (b) has been synchronizing for
the last several hours, but doesn't seem to be able to catch up. Mon 'b' has
been running for 4 hours now in the 'synchronizing' state. The mon's log has
many messages about compacting and deleting files, yet we never exit the synchronization
state.
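For reference, the sync state can be checked on the mon host itself (the log path below is
the default and may differ here):

    # Ask the mon what state it thinks it is in ('synchronizing', 'probing', ...)
    ceph daemon mon.b mon_status | grep '"state"'
    # Follow sync progress in the mon log
    tail -f /var/log/ceph/ceph-mon.b.log | grep -i sync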
The ceph.log is also rapidly accumulating complaints that the mons are slow (not
surprising, I suppose, since the leveldbs are ~100 GB at this point).
I've found that using the monstore tool to do compaction on mons 'a' and
'c' helps, but it is only a temporary fix. Soon the database inflates again and
I'm back to where I started.
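For completeness, the compaction can be triggered a few different ways (mon ids and store
paths are assumptions for this cluster; the kvstore-tool variant needs the mon stopped):

    ceph tell mon.a compact                                        # online compaction
    # or offline, against a stopped mon's store:
    ceph-kvstore-tool leveldb /var/lib/ceph/mon/ceph-a/store.db compact
    # or compact automatically on every mon (re)start:
    ceph config set mon mon_compact_on_start true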
Are all the PGs in the active+clean state? I assume they are not. If they are
not, the MONs will keep a large history of OSDMaps in their DB and thus
it will keep growing.
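A couple of quick checks for that (the ceph report field names are from memory, so take
this as a sketch):

    ceph -s                                   # overall PG summary
    ceph pg dump_stuck                        # anything stuck unclean/inactive?
    # If these two differ by a huge margin, the mons are holding a long OSDMap history:
    ceph report 2>/dev/null | jq '.osdmap_first_committed, .osdmap_last_committed'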
Thoughts on how to proceed here? Some ideas I had:
- Would it help to add some new monitors that use RocksDB?
They would need to sync, which can take a lot of time. Moving to RocksDB
is a good idea once this is all fixed.
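As a side note, you can check which backend an existing mon is on by looking at the
kv_backend file in its data dir (default path assumed):

    cat /var/lib/ceph/mon/ceph-a/kv_backend   # prints 'leveldb' or 'rocksdb'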
- Stop a monitor and dump the keys via
the monstore tool, just to get an idea of what's going on?
- Increase mon_sync_max_payload_size to try to move data in larger chunks?
I would just try it.
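For example, something like the following (the value is only an illustration, in bytes; it
only reaches mons that are up and reachable):

    ceph tell mon.* injectargs '--mon_sync_max_payload_size=4194304'   # 4 MiB, example value
    # or persist it:
    ceph config set mon mon_sync_max_payload_size 4194304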
- Drop down to a single monitor, and see if normal
compaction triggers and stops growing unbounded?
It will keep growing; the compaction only helps for a limited time. Make
sure the PGs become clean again.
In the meantime, make sure you have enough disk space.
Wido
- Stop both 'a' and 'c', compact
them, start them, and immediately start 'b' ?
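If it came to that, the sequence would look roughly like this, run on the respective mon
hosts (unit names and store paths are the defaults; note that stopping both in-quorum mons
means a short window with no monitor quorum at all):

    systemctl stop ceph-mon@a          # on the mon 'a' host
    systemctl stop ceph-mon@c          # on the mon 'c' host
    ceph-kvstore-tool leveldb /var/lib/ceph/mon/ceph-a/store.db compact
    ceph-kvstore-tool leveldb /var/lib/ceph/mon/ceph-c/store.db compact
    systemctl start ceph-mon@a
    systemctl start ceph-mon@c
    systemctl start ceph-mon@b         # let 'b' sync against the freshly compacted stores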
Appreciate any advice.
Regards,
Lincoln
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io