On Thu, Dec 12, 2019 at 11:46:19PM +0000, Bryan Stillwell <bstillwell(a)godaddy.com>
wrote:
> On our test cluster after upgrading to 14.2.5 I'm having problems with
> the mons pegging a CPU core while moving data around. I'm currently
> converting the OSDs from FileStore to BlueStore by marking the OSDs out
> in multiple nodes, destroying the OSDs, and then recreating them with
> ceph-volume lvm batch. This seems to get the ceph-mon process into a
> state where it pegs a CPU core on one of the mons:
I had a similar issue in a partially virtualized test cluster with
14.2.4 when I removed many OSDs. I resolved it either by adding more RAM
to the MON VMs or by restarting the MONs multiple times. The VM RAM
changed from 2 GB to 4 GB. ceph-mon in this small test cluster normally
uses around 100-200 MB; during the issue it used (at least) 1.7 GB for a
short time. I can't tell you exactly which of the two resolved it, since
adding more RAM to the VM required a reboot and thus a restart of the
affected process.
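
If you want to check whether memory is a factor on your mons too, here is
a rough sketch for reading the resident memory of ceph-mon on a MON host.
It is not a Ceph tool: it only reads the standard Linux /proc files, and
the function names (parse_vmrss_kb, rss_by_process_name) are made up for
this example. It assumes the process name is exactly "ceph-mon".

```python
import os
from typing import Dict, Optional


def parse_vmrss_kb(status_text: str) -> Optional[int]:
    """Return the VmRSS value in kB from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like: "VmRSS:     123456 kB"
            return int(line.split()[1])
    return None  # kernel threads have no VmRSS line


def rss_by_process_name(name: str = "ceph-mon",
                        proc_root: str = "/proc") -> Dict[int, float]:
    """Map PID -> resident memory in MB for processes whose comm is `name`."""
    result: Dict[int, float] = {}
    if not os.path.isdir(proc_root):
        return result  # not a Linux host, or /proc is not mounted
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() != name:
                    continue
            with open(os.path.join(proc_root, entry, "status")) as f:
                rss_kb = parse_vmrss_kb(f.read())
        except OSError:
            continue  # process exited or is not readable
        if rss_kb is not None:
            result[int(entry)] = rss_kb / 1024
    return result


if __name__ == "__main__":
    for pid, mb in rss_by_process_name().items():
        print(f"ceph-mon pid {pid}: {mb:.0f} MB resident")
```

Running this every few seconds while the OSDs are being removed should
show whether the mon gets close to the RAM available on the host.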
Also my cluster was unable to elect a MGR and it was pretty much dead
until the MONs recovered again. I'm not sure if that is the case with
your cluster as well.
HTH,
Florian