Hi,

I've been seeing a mon segfault on current master that is consistently
triggered by a kernel CephFS mount attempt against a vstart cluster:

Thread 14 "msgr-worker-2" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f42403bf700 (LWP 92639)]
0x00007f4247d2b960 in __lll_unlock_elision () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007f4247d2b960 in __lll_unlock_elision () from /lib64/libpthread.so.0
#1 0x00007f424b1b4f4f in __gthread_mutex_unlock (__mutex=0x5613b7e0a2c8) at /usr/include/c++/7/x86_64-suse-linux/bits/gthr-default.h:778
#2 0x00007f424b1baa8a in std::mutex::unlock (this=0x5613b7e0a2c8) at /usr/include/c++/7/bits/std_mutex.h:121
#3 0x00007f424b5664d0 in ProtocolV1::open (this=0x5613b83a7800, reply=..., authorizer_reply=...) at /home/david/ceph/src/msg/async/ProtocolV1.cc:2481
#4 0x00007f424b561e4a in ProtocolV1::handle_connect_message_2 (this=0x5613b83a7800) at /home/david/ceph/src/msg/async/ProtocolV1.cc:2055
#5 0x00007f424b55fd1f in ProtocolV1::handle_connect_message_1 (this=0x5613b83a7800, buffer=0x5613b8289000 "\332*\244j\270\217\001/\b", r=0) at /home/david/ceph/src/msg/async/ProtocolV1.cc:1915
(rest in https://paste.opensuse.org/43958295 )

Git bisect points at the following commit as the culprit:

c48a29b9edde3c6d3c msg/async: do not register lossy client connections

I'll raise a ticket to track this, but just thought I'd ping the list to
see whether others were hitting it...

Cheers, David