Hello Jason,
On 18.07.19 at 20:10, Jason Dillaman wrote:
On Thu, Jul 18, 2019 at 1:47 PM Marc Schöchlin
<ms(a)256bit.org> wrote:
Hello cephers,
rbd-nbd crashes in a reproducible way here.
I don't see a crash report in the
log below. Is it really crashing or
is it shutting down? If it is crashing and it's reproducible, can you
install the debuginfo packages, attach gdb, and get a full backtrace
of the crash?
I do not get a crash report from rbd-nbd.
It seems that "rbd-nbd" simply terminates, and the XFS filesystem then fails because
the nbd device is no longer available.
("rbd nbd ls" no longer shows any mapped device.)
It seems like your cluster cannot keep up w/ the load and the nbd
kernel driver is timing out the IO and shutting down. There is a
"--timeout" option on "rbd-nbd" that you can use to increase the
kernel IO timeout for nbd.
I also have a 36TB XFS (non_ec) volume on this virtual system, mapped by krbd, which
is under really heavy read/write usage.
I have never experienced problems like this on this system with similar usage patterns.
The volume involved in the problem only handles a really low load, and I was
able to trigger the error with the simple "find . -type f -name
"*.sql" -exec ionice -c3 nice -n 20 gzip -v {} \;" command.
I copied and read ~1.5 TB of data to this volume without a problem, so it seems that the
gzip command provokes an I/O pattern which leads to the error.
As described, I use a luminous "12.2.11" client, which does not support that
"--timeout" option (btw. a backport would be nice).
Our ceph system runs with a heavy write load, therefore we have already set a 60-second
timeout using the following code:
(https://github.com/OnApp/nbd-kernel_mod/blob/master/nbd_set_timeout.c)
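As a rough illustration only (this is not the linked C code): setting the nbd I/O timeout from userspace boils down to a single ioctl on the device node. The sketch below assumes the NBD_SET_TIMEOUT request from <linux/nbd.h>, defined there as _IO(0xab, 9), and the common ioctl encoding (e.g. x86, arm) where that evaluates to 0xab09. The helper name set_nbd_timeout and the device path in the usage note are made up for this example.

```python
import fcntl
import os

# NBD_SET_TIMEOUT is _IO(0xab, 9) in <linux/nbd.h>.
# Assumption: the generic ioctl encoding where _IOC_NONE is 0, so the
# request number is simply (type << 8) | nr. Some architectures differ.
NBD_IOCTL_TYPE = 0xAB
NBD_SET_TIMEOUT = (NBD_IOCTL_TYPE << 8) | 9


def set_nbd_timeout(device: str, seconds: int) -> None:
    """Ask the nbd kernel driver to use `seconds` as the I/O timeout for `device`.

    Hypothetical helper for illustration; raises OSError if the device
    cannot be opened or the ioctl is rejected.
    """
    if seconds < 0:
        raise ValueError("timeout must be non-negative")
    fd = os.open(device, os.O_RDWR)
    try:
        # The timeout is passed by value as the ioctl argument.
        fcntl.ioctl(fd, NBD_SET_TIMEOUT, seconds)
    finally:
        os.close(fd)
```

Something like set_nbd_timeout("/dev/nbd0", 60) would correspond to the 60 seconds mentioned above; it needs root privileges and an existing nbd device node.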
We have ~500 heavy-load rbd-nbd devices in our xen cluster (rbd-nbd 12.2.5, kernel
4.4.0+10, centos clone) and ~20 high-load krbd devices (kernel 4.15.0-45, ubuntu 16.04),
and we have never experienced problems like this there.
We only experience problems like this with rbd-nbd > 12.2.5 on ubuntu 16.04 (kernel
4.15) or ubuntu 18.04 (kernel 4.15), with or without erasure coding.
Regards
Marc