I am wondering what the problem is behind these error messages I am
seeing. I do not think it is really related to capability release,
because those messages only show up at a later time.
I have two processes being affected by this: an rsync on a ceph-fuse
mount and an rsync on an nfs-ganesha mount, running at the same time.
The ceph-fuse mount gets the cache pressure notification at a later time.
I think everything starts to go wrong once the mds starts reporting on
the failed xlock. This message is being logged:
2020-06-13 03:38:36.981 7fb5edd82700 0 log_channel(cluster) log [WRN] :
slow request 244.377406 seconds old, received at 2020-06-13
03:34:32.604412: client_request(client.4019800:20354 setattr
mtime=2020-04-22 12:58:47.000000 atime=2020-06-13 03:34:32.000000
#0x100001b9177 2020-06-13 03:34:32.604119 caller_uid=500,
caller_gid=500{500,1,2,3,4,6,10,}) currently failed to xlock, waiting
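To get a bit more detail on what that request is stuck on, I have been
dumping the in-flight ops on the active mds via the admin socket (mds.a
is just a placeholder for my daemon name):

ceph daemon mds.a ops
ceph daemon mds.a dump_ops_in_flight

That lists the same client_request together with its event list, so I
can see at which point it is still waiting.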
From the charts here you can see that caps are climbing and inodes are
dropping.
https://snapshot.raintank.io/dashboard/snapshot/4ij6AF1JoDzdZNI6WzCyewkn7Oq…
(I am not entirely sure the units used are correct, and the ino/caps
chart should really have a dual y-axis.)
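In case the unit question matters: I assume the chart values correspond
to the mds perf counters, which can be cross-checked directly (again,
mds.a is a placeholder and the exact counter names are my assumption):

ceph daemon mds.a perf dump | jq '.mds | {inodes, caps}'

That should give the raw inode and cap counts the graph is based on.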
It looks like I can trigger this problem by starting multiple concurrent
rsyncs on the nfs-ganesha mount. Every time, it seems I get the same
xlock listed in the mds log.
2020-06-13 17:32:24.920 7fb5edd82700 0 log_channel(cluster) log [WRN] :
slow request 240.294505 seconds old, received at 2020-06-13
17:28:24.626608: client_request(client.4021284:2468 setattr
mtime=2020-04-22 12:58:47.000000 atime=2020-06-13 17:28:24.000000
#0x100001b9177 2020-06-13 17:28:24.626527 caller_uid=500,
caller_gid=500{500,1,2,3,4,6,10,}) currently failed to xlock, waiting
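For reference, this is roughly what I run to trigger it (the paths are
just placeholders for my actual source and destination directories):

# start four concurrent rsyncs onto the nfs-ganesha mount
for i in 1 2 3 4; do
  rsync -a /srv/data/set$i/ /mnt/ganesha/backup/set$i/ &
done
wait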
Question: I have seen this work-around/solution[1] being offered a lot,
but I do not understand why I am getting the xlock in the first place.
When does one get an xlock?
I think that if I can prevent the xlock, I do not need to set the osd op
queue options.
[1]
https://www.mail-archive.com/ceph-users@ceph.io/msg04421.html
The 'work around' being:
osd op queue = wpq
osd op queue cut off = high
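If I do end up applying it, I assume these go into the [osd] section of
ceph.conf on every OSD node, or centrally with:

ceph config set osd osd_op_queue wpq
ceph config set osd osd_op_queue_cut_off high

followed by an OSD restart, since as far as I understand these settings
are only picked up at daemon start.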