Re: [ceph-users] reproducable rbd-nbd crashes

23 Jul 2019

Hi Mike,

On 22.07.19 at 16:48, Mike Christie wrote:
...
  On 07/22/2019 06:00 AM, Marc Schöchlin wrote:
 With older kernels no timeout would be set for each command by default,
 so if you were not running that tool then you would not see the nbd
 disconnect+io_errors+xfs issue. You would just see slow IOs.

 With newer kernels, like 4.15, nbd.ko always sets a per-command timeout
 even if you do not set it via an nbd ioctl/netlink command. By default
 the timeout is 30 seconds. After the timeout period, the kernel performs
 the disconnect+IO_errors error handling, which causes xfs to get errors.
  Did I understand you correctly: setting an unlimited timeout should prevent crashes on
  kernel 4.15?

 It looks like with newer kernels there is no way to turn it off.

 You can set it really high. There is no max check and so it depends on
 various calculations and what some C types can hold and how your kernel
 is compiled. You should be able to set the timer to an hour.

Okay, I already experimented with high timeouts (e.g. 600 seconds). As far as I remember, this
led to a pretty unusable system when I put a high I/O load on the EC volume.
This system also runs a krbd volume which saturates the system with ~30-60% iowait;
this volume never had a problem.
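
For reference, a high timeout such as the 600 seconds mentioned above could be set with a small script like the one below. This is only a sketch under assumptions: the function name `set_nbd_timeout` is made up for illustration, and the request number is written out by hand from `NBD_SET_TIMEOUT` (`_IO(0xab, 9)` in `<linux/nbd.h>`), which should evaluate to 0xAB09 on x86 and ARM but may be encoded differently on other architectures.

```python
import fcntl
import os

# Assumption: NBD_SET_TIMEOUT is _IO(0xab, 9) from <linux/nbd.h>,
# i.e. 0xAB09 on x86/ARM. Verify against your kernel headers.
NBD_SET_TIMEOUT = (0xAB << 8) | 9


def set_nbd_timeout(device: str, seconds: int) -> None:
    """Ask the kernel to use `seconds` as the timeout for an nbd device.

    Hypothetical helper; needs root and an existing nbd device node.
    """
    if seconds < 0:
        raise ValueError("timeout must not be negative")
    fd = os.open(device, os.O_RDWR)
    try:
        # The timeout is passed as a plain integer argument.
        fcntl.ioctl(fd, NBD_SET_TIMEOUT, seconds)
    finally:
        os.close(fd)
```

Calling `set_nbd_timeout("/dev/nbd0", 600)` as root would correspond to the 600 second experiment; whether very large values behave sensibly depends on the kernel, as Mike notes.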

A commenter in https://tracker.ceph.com/issues/40822#change-141205 suggests that I
reduce the rbd cache.
What do you think about that?

...

  For testing purposes I set the timeout to unlimited ("nbd_set_ioctl /dev/nbd0 0",
  on the already mounted device).
  I re-executed the problem procedure and found that the compression procedure does not
  crash at the same file, but crashes 30 seconds later with the same crash behavior.

 0 will cause the default timeout of 30 secs to be used.

Okay, then the usage description of
https://github.com/OnApp/nbd-kernel_mod/blob/master/nbd_set_timeout.c does not seem to be
correct :-)
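
To make the point about 0 concrete, here is a tiny model of the behaviour as described in this thread. It is a sketch, not kernel code: `effective_nbd_timeout` and `DEFAULT_TIMEOUT_SECS` are names invented for illustration, and the 30 second default is simply the value stated above for kernel 4.15.

```python
DEFAULT_TIMEOUT_SECS = 30  # default stated in this thread for kernel 4.15


def effective_nbd_timeout(requested_secs: int) -> int:
    """Hypothetical model: 0 selects the default timeout, not 'unlimited'."""
    if requested_secs < 0:
        raise ValueError("timeout must not be negative")
    return requested_secs if requested_secs > 0 else DEFAULT_TIMEOUT_SECS


for requested in (0, 600, 3600):
    print(requested, "->", effective_nbd_timeout(requested))
# prints:
# 0 -> 30
# 600 -> 600
# 3600 -> 3600
```

So passing 0 would explain why the crash still showed up after 30 seconds.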

Regards
Marc
