On Fri, Apr 9, 2021 at 8:39 PM Robert LeBlanc <robert(a)leblancnet.us> wrote:
On Fri, Apr 9, 2021 at 11:49 AM Dan van der Ster <dan(a)vanderster.com> wrote:
Thanks. Nothing obviously wrong jumped out at me.
But I did notice the nearfull warnings, so I wonder whether this cluster is
churning through osdmaps. Did you see a large increase in inbound or
outbound network traffic on this mon following the upgrade?
Totally speculating here, but maybe you have some old clients that
can't decode an incremental osdmap from a nautilus mon, so the single
mon is busy serving these maps to those clients.
Does the mon load decrease if you stop the osdmap churn, e.g. by
setting norebalance (if rebalancing is indeed ongoing)?
Could you also share a log with debug_ms = 1 covering a minute of the mon at high CPU?
Here are the new logs with debug_ms=1 enabled for a short while.
https://owncloud.leblancnet.us/owncloud/index.php/s/1hvtJo3s2oLPpWn
Something strange in this log: one hammer client is asking for nearly
a million incremental osdmaps (816,908 epochs), seemingly every 30s.
client.131831153 at 172.16.212.55 is asking for incrementals
1170448..1987355 (see [1]).
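A minimal sketch for spotting clients like this one in a larger mon log, assuming the log is plain text with one entry per line (the excerpt in [1] was wrapped by the mail client). It ranks clients by the largest osdmap range the mon sent them in a single send_incremental. The function name largest_incremental_requests is hypothetical, and the regex only covers the send_incremental format visible in [1].

```python
import re
from collections import defaultdict

# Assumption: matches only the send_incremental format seen in the mon log, e.g.
#   ... send_incremental [1170448..1987355] to client.131831153
SEND_INC = re.compile(r"send_incremental \[(\d+)\.\.(\d+)\] to (client\.\d+)")


def largest_incremental_requests(log_lines):
    """Hypothetical helper: {client: largest number of osdmap epochs sent at once}."""
    worst = defaultdict(int)
    for line in log_lines:
        m = SEND_INC.search(line)
        if m:
            first, last, client = int(m.group(1)), int(m.group(2)), m.group(3)
            span = last - first + 1  # the range is inclusive of both epochs
            worst[client] = max(worst[client], span)
    return dict(worst)


# One log entry from [1], re-joined onto a single line.
sample = [
    "2021-04-09 17:07:27.238 7f9fc83e4700 5 mon.sun-storemon01@0(leader).osd "
    "e1987355 send_incremental [1170448..1987355] to client.131831153",
]
print(largest_incremental_requests(sample))  # {'client.131831153': 816908}
```

Sorting the result by value puts the worst offenders first; a healthy client should only ever need a handful of epochs.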
Can you try to evict/kill/block that client and see if your mon load drops?
-- dan
[1]
-43> 2021-04-09 13:12:37.032 7f50de246700 5
mon.sun-storemon01@0(leader).osd e1987341 send_incremental
[1170448..1987341] to client.131831153
2021-04-09 17:07:27.238 7f9fc83e4700 10 mon.sun-storemon01@0(leader)
e45 handle_subscribe
mon_subscribe({mdsmap=3914079+,monmap=0+,osdmap=1170448})
2021-04-09 17:07:27.238 7f9fc83e4700 10
mon.sun-storemon01@0(leader).osd e1987355 check_osdmap_sub
0x55e2e2133de0 next 1170448 (onetime)
2021-04-09 17:07:27.238 7f9fc83e4700 5
mon.sun-storemon01@0(leader).osd e1987355 send_incremental
[1170448..1987355] to client.131831153
2021-04-09 17:07:50.910 7f9fc83e4700 5 mon.sun-storemon01@0(leader)
e45 dispatch_op client.131831153 v1:172.16.212.55:0/527701465 is not
authenticated, dropping
mon_subscribe({mdsmap=3914079+,monmap=0+,osdmap=1170448})
2021-04-09 18:14:47.295 7f9fc83e4700 1 --
[v2:10.65.7.203:3300/0,v1:10.65.7.203:6789/0] <== client.131831153
v1:172.16.212.55:0/527701465 3 ====
mon_subscribe({mdsmap=3914127+,monmap=0+,osdmap=1170448}) ==== 85+0+0
(unknown 1413914345 0 0) 0x55e2dbc52c00 con 0x55e2e1cf5680
2021-04-09 18:15:17.006 7f9fc83e4700 1 --
[v2:10.65.7.203:3300/0,v1:10.65.7.203:6789/0] <== client.131831153
v1:172.16.212.55:0/527701465 2 ====
mon_subscribe({mdsmap=3914127+,monmap=0+,osdmap=1170448}) ==== 85+0+0
(unknown 1413914345 0 0) 0x55e2da565200 con 0x55e2df00a880
2021-04-09 18:15:17.278 7f9fc83e4700 1 --
[v2:10.65.7.203:3300/0,v1:10.65.7.203:6789/0] <== client.131831153
v1:172.16.212.55:0/527701465 3 ====
mon_subscribe({mdsmap=3914127+,monmap=0+,osdmap=1170448}) ==== 85+0+0
(unknown 1413914345 0 0) 0x55e2de443000 con 0x55e2ee3d8400
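To check the "seemingly every 30s" observation, a minimal sketch that measures the gap between consecutive incoming mon_subscribe messages per client. It assumes debug_ms=1 messenger lines with one entry per line, starting with the timestamp and containing "<== client.N ... mon_subscribe(" as in the excerpt above. The function name subscribe_intervals is hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumption: each log entry starts with a timestamp like "2021-04-09 18:14:47.295".
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)")
# Assumption: incoming messenger lines look like "<== client.N <addr> ... mon_subscribe(".
SUB = re.compile(r"<== (client\.\d+) .*mon_subscribe\(")


def subscribe_intervals(log_lines):
    """Hypothetical helper: {client: [seconds between consecutive mon_subscribe]}."""
    seen = defaultdict(list)
    for line in log_lines:
        ts, sub = TS.match(line), SUB.search(line)
        if ts and sub:
            when = datetime.strptime(ts.group(1), "%Y-%m-%d %H:%M:%S.%f")
            seen[sub.group(1)].append(when)
    return {
        client: [round((b - a).total_seconds(), 3) for a, b in zip(times, times[1:])]
        for client, times in seen.items()
    }


# The three messenger entries above, re-joined onto single lines (trailing fields trimmed).
sample = [
    "2021-04-09 18:14:47.295 7f9fc83e4700 1 -- [v2:10.65.7.203:3300/0,v1:10.65.7.203:6789/0] "
    "<== client.131831153 v1:172.16.212.55:0/527701465 3 ==== "
    "mon_subscribe({mdsmap=3914127+,monmap=0+,osdmap=1170448}) ==== 85+0+0",
    "2021-04-09 18:15:17.006 7f9fc83e4700 1 -- [v2:10.65.7.203:3300/0,v1:10.65.7.203:6789/0] "
    "<== client.131831153 v1:172.16.212.55:0/527701465 2 ==== "
    "mon_subscribe({mdsmap=3914127+,monmap=0+,osdmap=1170448}) ==== 85+0+0",
    "2021-04-09 18:15:17.278 7f9fc83e4700 1 -- [v2:10.65.7.203:3300/0,v1:10.65.7.203:6789/0] "
    "<== client.131831153 v1:172.16.212.55:0/527701465 3 ==== "
    "mon_subscribe({mdsmap=3914127+,monmap=0+,osdmap=1170448}) ==== 85+0+0",
]
print(subscribe_intervals(sample))  # {'client.131831153': [29.711, 0.272]}
```

The short second gap comes from a request arriving on a different connection (note the differing con addresses), so counting per connection rather than per client may give a cleaner picture of the retry period.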