On Thu, May 14, 2020 at 3:12 AM Brad Hubbard <bhubbard(a)redhat.com> wrote:
+1 to needing the backtrace. I don't see any indications of a problem in
that log.
apport.log
Wed May 13 09:35:30 2020: host pid 4440 crashed in a separate mount namespace, ignoring
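
Since apport skips crashes that happen in a separate mount namespace, no core file is being saved for this crash. A rough sketch of one way around that (the paths and the qemu.conf setting are just examples, not verified on this host) is to point core_pattern at a plain file and allow full-size cores for the guests before reproducing:

    # write cores straight to disk instead of piping them to apport
    sudo mkdir -p /var/crash/cores
    sudo sysctl -w kernel.core_pattern=/var/crash/cores/core.%e.%p
    # in /etc/libvirt/qemu.conf, allow guests to dump a full core, then
    # restart libvirtd:
    #   max_core = "unlimited"
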
kernel.log
May 13 09:35:30 compute5 kernel: [123071.373217] fn-radosclient[4485]: segfault at 0 ip 00007f4c8c85d7ed sp 00007f4c66ffc470 error 4 in librbd.so.1.12.0[7f4c8c65a000+5cb000]
May 13 09:35:30 compute5 kernel: [123071.373228] Code: 8d 44 24 08 48 81 c3 d8 3e 00 00 49 21 f9 48 c1 e8 30 83 c0 01 48 c1 e0 30 48 89 02 48 8b 03 48 89 04 24 48 8b 34 24 48 21 fe <48> 8b 06 48 89 44 24 08 48 8b 44 24 08 48 8b 0b 48 21 f8 48 39 0c
May 13 09:35:33 compute5 kernel: [123074.832700] brqa72d845b-e9: port 1(tap33511c4d-2c) entered disabled state
May 13 09:35:33 compute5 kernel: [123074.838520] device tap33511c4d-2c left promiscuous mode
May 13 09:35:33 compute5 kernel: [123074.838527] brqa72d845b-e9: port 1(tap33511c4d-2c) entered disabled state
syslog
May 13 09:35:30 compute5 kernel: [123071.373217] fn-radosclient[4485]: segfault at 0 ip 00007f4c8c85d7ed sp 00007f4c66ffc470 error 4 in librbd.so.1.12.0[7f4c8c65a000+5cb000]
May 13 09:35:30 compute5 kernel: [123071.373228] Code: 8d 44 24 08 48 81 c3 d8 3e 00 00 49 21 f9 48 c1 e8 30 83 c0 01 48 c1 e0 30 48 89 02 48 8b 03 48 89 04 24 48 8b 34 24 48 21 fe <48> 8b 06 48 89 44 24 08 48 8b 44 24 08 48 8b 0b 48 21 f8 48 39 0c
May 13 09:35:30 compute5 libvirtd[1844]: internal error: End of file from qemu monitor
May 13 09:35:33 compute5 systemd-networkd[1326]: tap33511c4d-2c: Link DOWN
May 13 09:35:33 compute5 systemd-networkd[1326]: tap33511c4d-2c: Lost carrier
May 13 09:35:33 compute5 kernel: [123074.832700] brqa72d845b-e9: port 1(tap33511c4d-2c) entered disabled state
May 13 09:35:33 compute5 kernel: [123074.838520] device tap33511c4d-2c left promiscuous mode
May 13 09:35:33 compute5 kernel: [123074.838527] brqa72d845b-e9: port 1(tap33511c4d-2c) entered disabled state
May 13 09:35:33 compute5 networkd-dispatcher[1614]: Failed to request link: No such device
On Fri, May 8, 2020 at 5:40 AM Brad Hubbard <bhubbard(a)redhat.com> wrote:
>
> > On Fri, May 8, 2020 at 12:10 PM Lomayani S. Laizer <lomlaizer(a)gmail.com> wrote:
> >
> > Hello,
> > On my side, at the point of the VM crash these are the logs below. At the
> > moment my debug level is 10; I will raise it to 20 for full debug. These
> > crashes are random and so far happen on very busy VMs. Downgrading the
> > clients on the host to Nautilus makes these crashes disappear.
>
> You could try adding debug_rados as well but you may get a very large
> log so keep an eye on things.
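
For reference, a minimal sketch of what that could look like in the client's ceph.conf on the hypervisor (the section name, levels and log path are just an example, adjust to taste):

    [client]
        debug rbd = 20
        debug rados = 20
        debug objecter = 20
        log file = /var/log/ceph/qemu-guest.$pid.log

The qemu process has to be restarted to pick the settings up, it needs write access to the log path, and the log grows very quickly at level 20.
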
>
> >
> > Qemu is not shutting down in general, because the other VMs on the same
> > host continue working.
>
> A process cannot reliably continue after encountering a segfault, so
> the qemu-kvm process must be ending and therefore it should be
> possible to capture a coredump with the right configuration.
>
> In the following example, if you were to search for pid 6060 you would
> find it is no longer running.
> >> > [ 7682.233684] fn-radosclient[6060]: segfault at 2b19 ip 00007f8165cc0a50 sp 00007f81397f6490 error 4 in librbd.so.1.12.0[7f8165ab4000+537000]
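
Once a core from a crash like that is on disk, a sketch of how the backtrace could be pulled out of it (the qemu binary path is an assumption, use whichever binary actually crashed, and install the matching librbd/librados debug symbols for the distro first):

    # dump a full backtrace of every thread in the core to a file
    gdb /usr/bin/qemu-system-x86_64 /path/to/the/core \
        -batch -ex 'set pagination off' -ex 'thread apply all bt' > qemu-bt.txt

The resulting qemu-bt.txt is the kind of output that would go into a tracker issue.
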
>
> Without at least a backtrace it may be very difficult to work out
> what's going on with certainty. If you open a tracker for the issue,
> though, maybe one of the devs specialising in rbd will have some
> feedback.
>
> >
> > 2020-05-07T13:02:12.121+0300 7f88d57fa700 10 librbd::io::ReadResult: 0x7f88c80bfbf0 finish: got {} for [0,24576] bl 24576
> > 2020-05-07T13:02:12.193+0300 7f88d57fa700 10 librbd::io::ReadResult: 0x7f88c80f9330 finish: C_ObjectReadRequest: r=0
> > 2020-05-07T13:02:12.193+0300 7f88d57fa700 10 librbd::io::ReadResult: 0x7f88c80f9330 finish: got {} for [0,16384] bl 16384
> > 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::ImageState: 0x5569b5da9bb0 0x5569b5da9bb0 send_close_unlock
> > 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::ImageState: 0x5569b5da9bb0 0x5569b5da9bb0 send_close_unlock
> > 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::image::CloseRequest: 0x7f88c8175fd0 send_block_image_watcher
> > 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::ImageWatcher: 0x7f88c400dfe0 block_notifies
> > 2020-05-07T13:02:28.694+0300 7f890ba90500 5 librbd::Watcher: 0x7f88c400dfe0 block_notifies: blocked_count=1
> > 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_block_image_watcher: r=0
> > 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::image::CloseRequest: 0x7f88c8175fd0 send_shut_down_update_watchers
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_shut_down_update_watchers: r=0
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 send_shut_down_io_queue
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 5 librbd::io::ImageRequestWQ: 0x7f88e8001570 shut_down: shut_down: in_flight=0
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_shut_down_io_queue: r=0
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 send_shut_down_exclusive_lock
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::ExclusiveLock: 0x7f88c4011ba0 shut_down
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::ManagedLock: 0x7f88c4011bb8 shut_down:
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::ManagedLock: 0x7f88c4011bb8 send_shutdown:
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::ManagedLock: 0x7f88c4011bb8 send_shutdown_release:
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::ExclusiveLock: 0x7f88c4011ba0 pre_release_lock_handler
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 send_cancel_op_requests:
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 handle_cancel_op_requests: r=0
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 send_block_writes:
> > 2020-05-07T13:02:28.698+0300 7f88d4ff9700 5 librbd::io::ImageRequestWQ: 0x7f88e8001570 block_writes: 0x5569b5e1ffd0, num=1
> > 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 handle_block_writes: r=0
> > 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 send_wait_for_ops:
> > 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 handle_wait_for_ops:
> > 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 send_invalidate_cache:
> > 2020-05-07T13:02:28.698+0300 7f88d4ff9700 5 librbd::io::ObjectDispatcher: 0x5569b5dab700 invalidate_cache:
> > 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 handle_invalidate_cache: r=0
> > 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 send_flush_notifies:
> > 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 handle_flush_notifies:
> > 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 send_close_object_map:
> > 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::object_map::UnlockRequest: 0x7f88c807a450 send_unlock: oid=rbd_object_map.2f18f2a67fad72
> > 2020-05-07T13:02:28.702+0300 7f88d57fa700 10 librbd::object_map::UnlockRequest: 0x7f88c807a450 handle_unlock: r=0
> > 2020-05-07T13:02:28.702+0300 7f88d57fa700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 handle_close_object_map: r=0
> > 2020-05-07T13:02:28.702+0300 7f88d57fa700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 send_unlock:
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::ManagedLock: 0x7f88c4011bb8 handle_shutdown_pre_release: r=0
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::managed_lock::ReleaseRequest: 0x7f88c80b68a0 send_unlock: entity=client.58292796, cookie=auto 140225447738256
> > 2020-05-07T13:02:28.702+0300 7f88d57fa700 10 librbd::managed_lock::ReleaseRequest: 0x7f88c80b68a0 handle_unlock: r=0
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::ExclusiveLock: 0x7f88c4011ba0 post_release_lock_handler: r=0 shutting_down=1
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 5 librbd::io::ImageRequestWQ: 0x7f88e8001570 unblock_writes: 0x5569b5e1ffd0, num=0
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::ImageWatcher: 0x7f88c400dfe0 notify released lock
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::ImageWatcher: 0x7f88c400dfe0 current lock owner: [0,0]
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::ManagedLock: 0x7f88c4011bb8 handle_shutdown_post_release: r=0
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::ManagedLock: 0x7f88c4011bb8 wait_for_tracked_ops: r=0
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::ManagedLock: 0x7f88c4011bb8 complete_shutdown: r=0
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_shut_down_exclusive_lock: r=0
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 send_unregister_image_watcher
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::ImageWatcher: 0x7f88c400dfe0 unregistering image watcher
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::Watcher: 0x7f88c400dfe0 unregister_watch:
> > 2020-05-07T13:02:28.702+0300 7f88d57fa700 5 librbd::Watcher: 0x7f88c400dfe0 notifications_blocked: blocked=1
> > 2020-05-07T13:02:28.706+0300 7f88ceffd700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_unregister_image_watcher: r=0
> > 2020-05-07T13:02:28.706+0300 7f88ceffd700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 send_flush_readahead
> > 2020-05-07T13:02:28.706+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_flush_readahead: r=0
> > 2020-05-07T13:02:28.706+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 send_shut_down_object_dispatcher
> > 2020-05-07T13:02:28.706+0300 7f88d4ff9700 5 librbd::io::ObjectDispatcher: 0x5569b5dab700 shut_down:
> > 2020-05-07T13:02:28.706+0300 7f88d4ff9700 5 librbd::io::ObjectDispatch: 0x5569b5ee8360 shut_down:
> > 2020-05-07T13:02:28.706+0300 7f88d4ff9700 5 librbd::io::SimpleSchedulerObjectDispatch: 0x7f88c4013ce0 shut_down:
> > 2020-05-07T13:02:28.706+0300 7f88d4ff9700 5 librbd::cache::WriteAroundObjectDispatch: 0x7f88c8003780 shut_down:
> > 2020-05-07T13:02:28.706+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_shut_down_object_dispatcher: r=0
> > 2020-05-07T13:02:28.706+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 send_flush_op_work_queue
> > 2020-05-07T13:02:28.706+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_flush_op_work_queue: r=0
> > 2020-05-07T13:02:28.706+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_flush_image_watcher: r=0
> > 2020-05-07T13:02:28.706+0300 7f88d4ff9700 10 librbd::ImageState: 0x5569b5da9bb0 0x5569b5da9bb0 handle_close: r=0
> >
> > On Fri, May 8, 2020 at 12:40 AM Brad Hubbard <bhubbard(a)redhat.com> wrote:
> >>
> >> On Fri, May 8, 2020 at 3:42 AM Erwin Lubbers <erwin(a)lubbers.org> wrote:
> >> >
> >> > Hi,
> >> >
> >> > Did anyone find a way to resolve the problem? I'm seeing the same on a
> >> > clean Octopus Ceph installation on Ubuntu 18 with an Octopus-compiled
> >> > KVM server running on CentOS 7.8. The KVM machine shows:
> >> >
> >> > [ 7682.233684] fn-radosclient[6060]: segfault at 2b19 ip 00007f8165cc0a50 sp 00007f81397f6490 error 4 in librbd.so.1.12.0[7f8165ab4000+537000]
> >>
> >> Are you able to either capture a backtrace from a coredump or set up
> >> logging and hopefully capture a backtrace that way?
> >>
> >> >
> >> > Ceph has been healthy and stable for a few weeks and I did not get these
> >> > messages while running on KVM compiled with Luminous libraries.
> >> >
> >> > Regards,
> >> > Erwin
> >> > _______________________________________________
> >> > ceph-users mailing list -- ceph-users(a)ceph.io
> >> > To unsubscribe send an email to ceph-users-leave(a)ceph.io
> >> >
> >>
> >>
> >> --
> >> Cheers,
> >> Brad
> >> _______________________________________________
> >> ceph-users mailing list -- ceph-users(a)ceph.io
> >> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>
>
>
> --
> Cheers,
> Brad
>
--
Cheers,
Brad
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io