Hi guys,

I am running Red Hat Ceph (essentially Luminous; ceph version 12.2.12-48.el7cp (26388d73d88602005946d4381cc5796d42904858)) and am seeing something similar on our test cluster.

One of the mons is running at around 300% CPU nonstop. It doesn't seem to be the lead mon or any one mon in particular: if the high-load mon is restarted, the load shifts to another mon.
I thought it might be related to this thread, since it seems to have started when we were removing and adding a lot of OSDs. In fact I have removed and re-added all the OSDs in the cluster several times, and the mons have been restarted several times, but the load persists.

At debug_mon 20/5, I see endless lines like the following, which seem to be related to the osdmap:

2019-12-17 11:59:47.916098 7f27dfba1700 10 mon.mon1@1(peon) e4 handle_get_version mon_get_version(what=osdmap handle=2874836684) v1
2019-12-17 11:59:47.916139 7f27dfba1700 20 mon.mon1@1(peon) e4 _ms_dispatch existing session 0x55ab61fb6300 for client.27824428 10.0.0.2:0/461841538
2019-12-17 11:59:47.916146 7f27dfba1700 20 mon.mon1@1(peon) e4  caps allow *
2019-12-17 11:59:47.916149 7f27dfba1700 20 is_capable service=mon command= read on cap allow *
2019-12-17 11:59:47.916151 7f27dfba1700 20  allow so far , doing grant allow *
2019-12-17 11:59:47.916152 7f27dfba1700 20  allow all
2019-12-17 11:59:47.916153 7f27dfba1700 10 mon.mon1@1(peon) e4 handle_get_version mon_get_version(what=osdmap handle=2871621985) v1
2019-12-17 11:59:47.916203 7f27dfba1700 20 mon.mon1@1(peon) e4 _ms_dispatch existing session 0x55ab61d7c780 for client.27824430 10.0.0.2:0/898487246
2019-12-17 11:59:47.916210 7f27dfba1700 20 mon.mon1@1(peon) e4  caps allow *
2019-12-17 11:59:47.916213 7f27dfba1700 20 is_capable service=mon command= read on cap allow *
2019-12-17 11:59:47.916215 7f27dfba1700 20  allow so far , doing grant allow *
2019-12-17 11:59:47.916216 7f27dfba1700 20  allow all
2019-12-17 11:59:47.916217 7f27dfba1700 10 mon.mon1@1(peon) e4 handle_get_version mon_get_version(what=osdmap handle=2882637609) v1
2019-12-17 11:59:47.916254 7f27dfba1700 20 mon.mon1@1(peon) e4 _ms_dispatch existing session 0x55ab62649c80 for client.27824431 10.0.0.2:0/972633098
2019-12-17 11:59:47.916262 7f27dfba1700 20 mon.mon1@1(peon) e4  caps allow *
2019-12-17 11:59:47.916266 7f27dfba1700 20 is_capable service=mon command= read on cap allow *
2019-12-17 11:59:47.916268 7f27dfba1700 20  allow so far , doing grant allow *
2019-12-17 11:59:47.916269 7f27dfba1700 20  allow all
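A quick way to see which clients are generating this flood is to tally the client ids on the _ms_dispatch lines. A minimal sketch (the sample log lines are embedded here for illustration; point LOG at the real /var/log/ceph/ceph-mon.<id>.log in practice):

```shell
#!/bin/sh
# Tally mon requests per client in a mon debug log.
# Sample lines embedded for illustration only.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
2019-12-17 11:59:47.916139 7f27dfba1700 20 mon.mon1@1(peon) e4 _ms_dispatch existing session 0x55ab61fb6300 for client.27824428 10.0.0.2:0/461841538
2019-12-17 11:59:47.916203 7f27dfba1700 20 mon.mon1@1(peon) e4 _ms_dispatch existing session 0x55ab61d7c780 for client.27824430 10.0.0.2:0/898487246
2019-12-17 11:59:47.916254 7f27dfba1700 20 mon.mon1@1(peon) e4 _ms_dispatch existing session 0x55ab62649c80 for client.27824431 10.0.0.2:0/972633098
2019-12-17 11:59:47.916300 7f27dfba1700 20 mon.mon1@1(peon) e4 _ms_dispatch existing session 0x55ab61fb6300 for client.27824428 10.0.0.2:0/461841538
EOF
# Pull out the client id on each dispatch line and count occurrences.
counts=$(grep -o 'client\.[0-9]*' "$LOG" | sort | uniq -c | sort -rn)
echo "$counts"
rm -f "$LOG"
```

The busiest client ids can then be matched against the source address in the same log lines to work out which hosts to look at.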

Continuing to investigate.

Raf

On Tue, 17 Dec 2019 at 11:53, Sasha Litvak <alexander.v.litvak@gmail.com> wrote:
Bryan, thank you.  We are about to start testing the 14.2.2 -> 14.2.5 upgrade, so folks here are a bit cautious :-)  We don't need to convert, but we may have to rebuild a few disks after the upgrade.

On Mon, Dec 16, 2019 at 3:57 PM Bryan Stillwell <bstillwell@godaddy.com> wrote:
Sasha,

I was able to get past it by restarting the ceph-mon processes every time it got stuck, but that's not a very good solution for a production cluster.

Right now I'm trying to narrow down what is causing the problem.  Rebuilding the OSDs with BlueStore doesn't seem to be enough.  I believe it could be related to our use of the extra space on the journal device as an SSD-based OSD.  During the conversion process I'm removing this SSD-based OSD (since with BlueStore the omap data ends up on the SSD anyway), and I suspect that might be causing the problem.

Bryan

On Dec 14, 2019, at 10:27 AM, Sasha Litvak <alexander.v.litvak@gmail.com> wrote:

Bryan,

Were you able to resolve this?  If yes, can you please share with the list?

On Fri, Dec 13, 2019 at 10:08 AM Bryan Stillwell <bstillwell@godaddy.com> wrote:
Adding the dev list since it seems like a bug in 14.2.5.

I was able to capture the output from perf top:

  21.58%  libceph-common.so.0               [.] ceph::buffer::v14_2_0::list::append
  20.90%  libstdc++.so.6.0.19               [.] std::getline<char, std::char_traits<char>, std::allocator<char> >
  13.25%  libceph-common.so.0               [.] ceph::buffer::v14_2_0::list::append
  10.11%  libstdc++.so.6.0.19               [.] std::istream::sentry::sentry
   8.94%  libstdc++.so.6.0.19               [.] std::basic_ios<char, std::char_traits<char> >::clear
   3.24%  libceph-common.so.0               [.] ceph::buffer::v14_2_0::ptr::unused_tail_length
   1.69%  libceph-common.so.0               [.] std::getline<char, std::char_traits<char>, std::allocator<char> >@plt
   1.63%  libstdc++.so.6.0.19               [.] std::istream::sentry::sentry@plt
   1.21%  [kernel]                          [k] __do_softirq
   0.77%  libpython2.7.so.1.0               [.] PyEval_EvalFrameEx
   0.55%  [kernel]                          [k] _raw_spin_unlock_irqrestore
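Aggregating that output per shared object makes the hotspot clearer: the time is split almost entirely between libceph-common's bufferlist code and libstdc++'s getline/istream machinery. A quick awk sketch over the pasted perf lines (sample embedded; the column layout is assumed to be overhead, object, type, symbol):

```shell
#!/bin/sh
# Sum perf-top overhead per shared object ($1 = overhead, $2 = object).
summary=$(awk '{ sum[$2] += $1 }
END { for (o in sum) printf "%.2f %s\n", sum[o], o }' <<'EOF'
  21.58  libceph-common.so.0               [.] ceph::buffer::v14_2_0::list::append
  20.90  libstdc++.so.6.0.19               [.] std::getline<char, std::char_traits<char>, std::allocator<char> >
  13.25  libceph-common.so.0               [.] ceph::buffer::v14_2_0::list::append
  10.11  libstdc++.so.6.0.19               [.] std::istream::sentry::sentry
   8.94  libstdc++.so.6.0.19               [.] std::basic_ios<char, std::char_traits<char> >::clear
   3.24  libceph-common.so.0               [.] ceph::buffer::v14_2_0::ptr::unused_tail_length
   1.69  libceph-common.so.0               [.] std::getline<char, std::char_traits<char>, std::allocator<char> >@plt
   1.63  libstdc++.so.6.0.19               [.] std::istream::sentry::sentry@plt
   1.21  [kernel]                          [k] __do_softirq
   0.77  libpython2.7.so.1.0               [.] PyEval_EvalFrameEx
   0.55  [kernel]                          [k] _raw_spin_unlock_irqrestore
EOF
)
echo "$summary"
```

That puts libceph-common and libstdc++ at roughly 40% each, which points at something parsing text into bufferlists in a tight loop.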

I increased mon debugging to 20 and nothing stood out to me.
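For anyone wanting to repeat that, the mon debug level can be raised and restored at runtime without restarting the daemon. A dry-run sketch (commands are echoed rather than executed; replace run() with direct execution on a real cluster):

```shell
#!/bin/sh
# Dry-run sketch: raise mon debug logging, then restore the default.
# run() echoes the command and records it instead of executing it.
LOGGED=""
run() { echo "+ $*"; LOGGED="${LOGGED}${*}; "; }

# Raise debug_mon to 20/20 on all mons (runtime only, not persisted).
run ceph tell mon.\* injectargs '--debug_mon 20/20'

# ... reproduce the issue, then inspect /var/log/ceph/ceph-mon.*.log ...

# Restore the default level.
run ceph tell mon.\* injectargs '--debug_mon 1/5'
```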

Bryan

> On Dec 12, 2019, at 4:46 PM, Bryan Stillwell <bstillwell@godaddy.com> wrote:
> 
> On our test cluster after upgrading to 14.2.5 I'm having problems with the mons pegging a CPU core while moving data around.  I'm currently converting the OSDs from FileStore to BlueStore by marking the OSDs out on multiple nodes, destroying the OSDs, and then recreating them with ceph-volume lvm batch.  This seems to get the ceph-mon process into a state where it pegs a CPU core on one of the mons:
> 
> 1764450 ceph      20   0 4802412   2.1g  16980 S 100.0 28.1   4:54.72 ceph-mon
> 
> Has anyone else run into this with 14.2.5 yet?  I didn't see this problem while the cluster was running 14.2.4.
> 
> Thanks,
> Bryan
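The per-OSD conversion described above (mark out, destroy, recreate with ceph-volume lvm batch) looks roughly like this as a dry-run sketch. The OSD id and device are placeholders, and run() echoes commands instead of executing them:

```shell
#!/bin/sh
# Dry-run sketch of converting one OSD from FileStore to BlueStore.
# run() echoes instead of executing; adapt before use on a real cluster.
run() { echo "+ $*"; }

OSD_ID=12            # placeholder OSD id
DEV=/dev/sdb         # placeholder data device

run ceph osd out "$OSD_ID"                    # drain data off the OSD
# wait for rebalancing: check ceph osd safe-to-destroy "$OSD_ID"
run systemctl stop "ceph-osd@$OSD_ID"
run ceph osd destroy "$OSD_ID" --yes-i-really-mean-it
run ceph-volume lvm zap "$DEV" --destroy      # wipe the old FileStore data
run ceph-volume lvm batch --bluestore "$DEV"  # recreate as a BlueStore OSD
```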
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io


--
Rafael Lopez
Research Devops Engineer
Monash University eResearch Centre

E: rafael.lopez@monash.edu