> On Sat, Apr 10, 2021 at 10:11 AM Robert LeBlanc <robert@leblancnet.us> wrote:
>> On Fri, Apr 9, 2021 at 4:04 PM Dan van der Ster <dan@vanderster.com> wrote:
>>> Here's what you should look for, with debug_mon=10. It shows clearly
>>> that it takes the mon 23 seconds to run through
>>> get_removed_snaps_range.
>>> So if this is happening every 30s, it explains at least part of why
>>> this mon is busy.
>>> 2021-04-09 17:07:27.238 7f9fc83e4700 10 mon.sun-storemon01@0(leader) e45 handle_subscribe mon_subscribe({mdsmap=3914079+,monmap=0+,osdmap=1170448})
>>> 2021-04-09 17:07:27.238 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 check_osdmap_sub 0x55e2e2133de0 next 1170448 (onetime)
>>> 2021-04-09 17:07:27.238 7f9fc83e4700  5 mon.sun-storemon01@0(leader).osd e1987355 send_incremental [1170448..1987355] to client.131831153
>>> 2021-04-09 17:07:28.590 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 0 [1~3]
>>> 2021-04-09 17:07:29.898 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 5 []
>>> 2021-04-09 17:07:31.258 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 6 []
>>> 2021-04-09 17:07:32.562 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 20 []
>>> 2021-04-09 17:07:33.866 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 21 []
>>> 2021-04-09 17:07:35.162 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 22 []
>>> 2021-04-09 17:07:36.470 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 23 []
>>> 2021-04-09 17:07:37.778 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 24 []
>>> 2021-04-09 17:07:39.090 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 25 []
>>> 2021-04-09 17:07:40.398 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 26 []
>>> 2021-04-09 17:07:41.706 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 27 []
>>> 2021-04-09 17:07:43.006 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 28 []
>>> 2021-04-09 17:07:44.322 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 29 []
>>> 2021-04-09 17:07:45.630 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 30 []
>>> 2021-04-09 17:07:46.938 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 31 []
>>> 2021-04-09 17:07:48.246 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 32 []
>>> 2021-04-09 17:07:49.562 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 34 []
>>> 2021-04-09 17:07:50.862 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 35 []
>>> 2021-04-09 17:07:50.862 7f9fc83e4700 20 mon.sun-storemon01@0(leader).osd e1987355 send_incremental starting with base full 1986745 664086 bytes
>>> 2021-04-09 17:07:50.862 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 build_incremental [1986746..1986785] with features 107b84a842aca
>>> So have a look for that client again or other similar traces.
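The stall is visible directly in the timestamps above: one get_removed_snaps_range call per snapshot interval, each taking roughly 1.3 seconds. As a minimal sketch, the total wall time can be computed from a pasted log like so (the lines below are trimmed to timestamp plus event; only a few of the quoted entries are included):

```python
from datetime import datetime

# A subset of the quoted mon log, trimmed to timestamp + event.
log = """\
2021-04-09 17:07:27.238 send_incremental [1170448..1987355]
2021-04-09 17:07:28.590 get_removed_snaps_range 0
2021-04-09 17:07:49.562 get_removed_snaps_range 34
2021-04-09 17:07:50.862 get_removed_snaps_range 35
2021-04-09 17:07:50.862 build_incremental [1986746..1986785]
"""

# The first 23 characters of each line are the "YYYY-MM-DD HH:MM:SS.mmm" stamp.
times = [datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")
         for line in log.splitlines()]
total = (times[-1] - times[0]).total_seconds()
print(f"total: {total:.1f}s")  # -> total: 23.6s
```

If a handful of clients resubscribe with a stale osdmap epoch every 30 seconds, one mon thread spends most of its time in this loop, which matches the high mon CPU seen here.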
>> So, even though I blacklisted the client and we remounted the file
>> system on it, that wasn't enough to stop the same bad requests. We
>> found another node that had two sessions to the same mount point. We
>> rebooted both nodes and the mon CPU is now back at a reasonable 4-6%,
>> and the cluster is running at full performance again. I've added both
>> MONs back so that all 3 mons are in the system, and there are no more
>> elections. Thank you for helping us track down the bad clients out of
>> over 2,000 clients.
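Finding a node holding two sessions to the same mount point can be done by scanning the MDS session list (e.g. `ceph tell mds.<name> session ls`). The sketch below assumes a simplified, hypothetical sample of that JSON output; real session entries carry many more fields, and the hostnames here are invented:

```python
import json
from collections import Counter

# Hypothetical, trimmed-down sample of `session ls` output.
sessions_json = """
[
  {"id": 131831153, "client_metadata": {"hostname": "node-a", "mount_point": "/mnt/cephfs"}},
  {"id": 131831877, "client_metadata": {"hostname": "node-b", "mount_point": "/mnt/cephfs"}},
  {"id": 131832010, "client_metadata": {"hostname": "node-b", "mount_point": "/mnt/cephfs"}}
]
"""

# Count sessions per (hostname, mount point) and flag duplicates.
counts = Counter((s["client_metadata"]["hostname"],
                  s["client_metadata"]["mount_point"])
                 for s in json.loads(sessions_json))
dupes = {key: n for key, n in counts.items() if n > 1}
print(dupes)  # node-b holds two sessions to the same mount point
```

With 2,000+ clients, this kind of aggregation is much faster than eyeballing the session list by hand.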
>> Maybe if that code path isn't needed in Nautilus it can be removed in
>> the next point release?
> I think there were other major changes in this area that might make
> such a backport difficult. And we should expect Nautilus to be nearing
> its end...
But ... we just got to Nautilus... :)
Thank you,
Robert LeBlanc
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io