Hi Ilya,
thanks a lot for the information. Yes, I was talking about the exclusive-lock feature and
was under the impression that only one RBD client can get write access on connect and will
keep it until disconnect. The problem we are facing with multi-VM write access is that
this will inevitably corrupt the file system created on the RBD image if two instances can
get write access. It's not a shared file system, it's just an XFS-formatted virtual disk.
There is a way to disable automatic lock transitions
but I don't think
it's wired up in QEMU.
Can you point me to some documentation about that? It sounds like this is what would be
needed to avoid the file system corruption in our use case. The lock transition should be
initiated from the outside and the lock should then stay fixed on the client holding it
until it is instructed to give up the lock or it disconnects.
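If I understand the docs correctly, krbd already exposes this pinned-lock behaviour as a map option; a sketch, with "rbd/vm-disk" as a made-up placeholder image spec:

```shell
# Map the image so the exclusive lock is acquired up front and is
# NOT handed over automatically to other clients (kernel >= 4.12).
rbd map --exclusive rbd/vm-disk

# Show who currently holds the lock (works from any client).
rbd lock ls rbd/vm-disk
```

With librbd the equivalent would presumably be acquiring the lock explicitly via rbd_lock_acquire() in exclusive mode, which, as you say, QEMU does not call.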
Is this a
known problem with libceph and libvirtd?
Not sure what you mean by libceph.
I simply meant that it's not a krbd client. Libvirt uses libceph (or was it librbd?) to
emulate virtual drives, not krbd.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Ilya Dryomov <idryomov(a)gmail.com>
Sent: 18 January 2023 14:26:54
To: Frank Schilder
Cc: ceph-users(a)ceph.io
Subject: Re: [ceph-users] Ceph rbd clients surrender exclusive lock in critical situation
On Wed, Jan 18, 2023 at 1:19 PM Frank Schilder <frans(a)dtu.dk> wrote:
Hi all,
we are observing a problem on a libvirt virtualisation cluster that might come from ceph
rbd clients. Something went wrong during execution of a live-migration operation and as a
result we have two instances of the same VM running on 2 different hosts, the source- and
the destination host. What we observe now is that the exclusive lock of the RBD disk image
moves between these two clients periodically (every few minutes the owner flips).
Hi Frank,
If you are talking about RBD exclusive lock feature ("exclusive-lock"
under "features" in "rbd info" output) then this is expected. This
feature provides automatic cooperative lock transitions between clients
to ensure that only a single client is writing to the image at any
given time. It's there to protect internal per-image data structures
such as the object map, the journal or the client-side PWL (persistent
write log) cache from concurrent modifications in case the image is
opened by two or more clients. The name is confusing but it's NOT
about preventing other clients from opening and writing to the image.
Rather it's about serializing those writes.
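To illustrate (the image spec below is just an example):

```shell
# See whether the feature is enabled on a given image:
rbd info rbd/vm-disk
# look for "exclusive-lock" in the "features:" line

# The feature can be toggled per image; features that depend on it
# (object-map, journaling) must be disabled along with it:
rbd feature disable rbd/vm-disk object-map exclusive-lock
rbd feature enable rbd/vm-disk exclusive-lock
```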
We are pretty sure that no virsh commands possibly having that effect are executed during
this time. The client connections are not lost and the OSD blacklist is empty. I don't
understand why a ceph rbd client would surrender an exclusive lock in such a split-brain
situation; it's exactly when it needs to hold on to it. As a result, the affected virtual
drives are corrupted.
There is no split-brain from the Ceph POV here. RBD has always
supported the multiple clients use case.
The questions we have in this context are:
Under what conditions does a ceph rbd client surrender an exclusive lock?
Exclusive lock transitions are cooperative so any time another client
asks for it (not immediately though -- the current lock owner finishes
processing in-flight I/O and flushes its caches first).
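The periodic owner flip you describe can be observed directly (hypothetical image spec again):

```shell
# Print the current lock holder every 30 seconds; with two VMs
# issuing writes, the owner alternates between their addresses.
watch -n 30 rbd lock ls rbd/vm-disk
```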
Could this be a bug in the client or a ceph config
error?
Very unlikely.
There is a way to disable automatic lock transitions but I don't think
it's wired up in QEMU.
Is this a known problem with libceph and libvirtd?
Not sure what you mean by libceph.
Thanks,
Ilya