On Feb 13, 2020, at 3:52 AM, Jeff Layton
<jlayton(a)redhat.com> wrote:
If the OSD daemon dies, then it will have closed all of its fds, and any
locks it held will have been released. Therefore you almost certainly have
some other process running that is holding the lock.
You may have to do a bit of digging in /proc/locks: determine the dev+inode
number of the file on which the lock is being set, find that entry in
/proc/locks, and then track down the PID that's holding the lock.
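Something like this should track it down (a rough sketch; the path is an
example, adjust it to your build dir):

    # inode of the file BlueStore is trying to lock
    ino=$(stat -c '%i' dev/osd0/block)
    # /proc/locks shows MAJ:MIN:INODE in field 6 and the holder's PID in field 5
    grep ":${ino} " /proc/locks
    # cross-check with lsof, which lists every process that has the file open
    lsof dev/osd0/block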
I have checked the locks with lslocks; here are the locks after I vstarted Ceph
(bluestore block = /dev/sdc, where sdc is a raw device):
COMMAND   PID    TYPE   SIZE  MODE   M  START  END  PATH
ceph-mgr  19852  POSIX        WRITE  0  0      0    /...
iscsid    1061   POSIX        WRITE  0  0      0    /run...
ceph-mgr  14889  POSIX        WRITE  0  0      0    /...
rpcbind   990    FLOCK        WRITE  0  0      0    /run...
ceph-mon  16430  POSIX        WRITE  0  0      0    /...
ceph-mon  16430  POSIX        WRITE  0  0      0    /...
ceph-mon  18107  POSIX        WRITE  0  0      0    /...
ceph-mon  18107  POSIX        WRITE  0  0      0    /...
ceph-mon  19711  POSIX        WRITE  0  0      0    /...
ceph-mon  19711  POSIX        WRITE  0  0      0    /...
ceph-mon  10495  POSIX        WRITE  0  0      0    /...
ceph-mon  10495  POSIX        WRITE  0  0      0    /...
ceph-mon  14748  POSIX        WRITE  0  0      0    /...
ceph-mon  14748  POSIX        WRITE  0  0      0    /...
cron      1085   FLOCK        WRITE  0  0      0    /run...
ceph-mgr  18247  POSIX        WRITE  0  0      0    /...
atd       1111   POSIX        WRITE  0  0      0    /run...
lvmetad   807    POSIX        WRITE  0  0      0    /run...
ceph-mgr  10635  POSIX        WRITE  0  0      0    /...
ceph-mgr  16571  POSIX        WRITE  0  0      0    /...
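Since lslocks truncates long paths by default, I can also check whether any of
these locks are on the block file itself (a sketch):

    lslocks -u -o COMMAND,PID,TYPE,PATH | grep -e 'dev/osd0' -e '/dev/sdc'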
Then I killed all related processes and restarted the cluster, but the error
"_lock flock failed on /users/xxx/ceph/build/dev/osd0/block" persists.
After the kill, the remaining locks are:
COMMAND   PID    TYPE   SIZE  MODE   M  START  END  PATH
rpcbind   20267  FLOCK        WRITE  0  0      0    /run...
lvmetad   20266  POSIX        WRITE  0  0      0    /run...
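To rule Ceph out entirely, the same LOCK_EX | LOCK_NB path can be exercised
from the shell with util-linux flock(1) (a sketch; path relative to the build
dir):

    # exits nonzero immediately if some other process holds an exclusive lock
    flock -xn dev/osd0/block -c true && echo 'lock acquired' || echo 'still locked'

One thing worth noting: flock(2) locks belong to the open file description, so
if this succeeds while ceph-osd still fails, the conflicting lock may be held
on another fd inside the OSD process itself.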
The error happens in KernelDevice.cc:

    int r = ::flock(fd_directs[WRITE_LIFE_NOT_SET], LOCK_EX | LOCK_NB);

where r is -1, fd_directs[WRITE_LIFE_NOT_SET] is 11, and WRITE_LIFE_NOT_SET
is 0. (The "(11)" in the log is errno EAGAIN/EWOULDBLOCK, which flock() sets
with LOCK_NB when the lock is already held.)
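If lslocks shows nothing, strace can capture the exact open and flock calls
involved (a sketch; the mkfs invocation is an example, the actual vstart flags
may differ):

    # follow forks; -ff writes one trace file per process
    strace -ff -e trace=openat,flock -o osd.trace bin/ceph-osd -i 0 -c ceph.conf --mkfs
    grep flock osd.trace.*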
Any suggestions on how to proceed with this issue?
Thanks,
-ym
Cheers,
Jeff
On Wed, 2020-02-12 at 09:03 -0800, Yiming Zhang wrote:
The weird thing is that I don't have systemd-udev
installed on my server.
Are there any other possible solutions?
The error only happens when I redirect the OSD data to a raw device.
Thanks,
Yiming
On Feb 12, 2020, at 8:36 AM, Sage Weil
<sage(a)newdream.net> wrote:
Talib was chasing down a similar issue a while back and found that the
root cause was systemd-udev, which spawns a process that opens the device
after it is closed. You might try removing or disabling that package
and see if it goes away?
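A quick way to check whether udev is involved (a sketch; service and package
names vary by distro):

    # is the udev daemon running at all?
    systemctl status systemd-udevd.service
    # watch for udev events touching block devices while the OSD starts
    udevadm monitor --udev --subsystem-match=block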
On Wed, 12 Feb 2020, Yiming Zhang wrote:
> Hi All,
>
> I noticed a locking issue in KernelDevice.
> When I stopped the Ceph cluster and all daemons, the KernelDevice _lock is
> somehow still held, and this line below returns r < 0:
>
> int KernelDevice::_lock()
> {
>     int r = ::flock(fd_directs[WRITE_LIFE_NOT_SET], LOCK_EX | LOCK_NB);
>     ...
> }
>
> The way I stop the cluster and daemons:
>
> sudo ../src/stop.sh
> sudo bin/init-ceph --verbose forcestop
>
> This error happens even after a reboot, when I try to use vstart:
>
> bdev _lock flock failed on ceph/build/dev/osd0/block
> bdev open failed to lock /home/yzhan298/ceph/build/dev/osd0/block: (11) Resource temporarily unavailable
> OSD::mkfs: couldn't mount ObjectStore: error (11) Resource temporarily unavailable
> ** ERROR: error creating empty object store in ceph/build/dev/osd0: (11) Resource temporarily unavailable
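>
> A quick sanity check that the stop scripts left nothing behind (a sketch;
> run from the build dir):
>
>     pgrep -a ceph           # any surviving ceph processes?
>     lsof dev/osd0/block     # anything still holding the block file open?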
>
>
> Please advise. (On the master branch.)
>
> Thanks,
> Yiming
_______________________________________________
Dev mailing list -- dev(a)ceph.io
To unsubscribe send an email to dev-leave(a)ceph.io
--
Jeff Layton <jlayton(a)redhat.com>