Hi everyone,
apologies in advance; this will be long. It's also been through a bunch
of edits and rewrites, so I don't know how well I'm expressing myself at
this stage — please holler if anything is unclear and I'll be happy to
try to clarify.
I am currently investigating the behavior of OpenStack Nova instances
during snapshot and suspend operations, in conjunction with
qemu-guest-agent (qemu-ga). I realize that RBD-backed Nova/libvirt
instances are expected to behave differently from file-backed ones, but
I have reason to believe that the RBD-backed ones are behaving
incorrectly, and I'd like to verify that.
So first up, for comparison, let's recap how a Nova/libvirt/KVM instance
behaves when it is *not* backed by RBD (e.g. it uses a qcow2 file that
lives on a Nova compute node in /var/lib/nova/instances), is booted
from an image with the hw_qemu_guest_agent=yes metadata property set,
and runs qemu-guest-agent within the guest:
- User issues "nova suspend" or "openstack server suspend".
- If nova-compute on the compute node decides that the instance has
qemu-guest-agent running (which is the case if it's qemu or kvm, and its
image has hw_qemu_guest_agent=yes), it sends a guest-sync command over
the guest agent VirtIO serial port. This command registers in the
qemu-ga log file in the guest.
- nova-compute on the compute node sends a libvirt managed-save command.
- Nova reports the instance as suspended.
- User issues "nova resume" or "openstack server resume".
- nova-compute on the compute node sends a libvirt start command.
- Again, if nova-compute on the compute node knows that the instance has
qemu-guest-agent running, it sends another command over the serial port,
namely guest-set-time. This, too, registers in the guest's qemu-ga log.
- Nova reports the instance as active (running normally) again.
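The gating logic and the agent call described above boil down to
something like the following sketch. The helper names are mine, not
Nova's, but the JSON is qemu-ga's actual wire format, which matches
what shows up in the guest's agent log:

```python
import json

def wants_guest_agent(virt_type, image_meta):
    # Mirrors the check described above: agent commands are only sent
    # for qemu/kvm instances whose image sets hw_qemu_guest_agent=yes.
    return (virt_type in ("qemu", "kvm")
            and image_meta.get("hw_qemu_guest_agent") == "yes")

def guest_sync_command(sync_id):
    # The guest-sync request in qemu-ga's JSON wire format, as sent
    # over the guest agent VirtIO serial port.
    return json.dumps({"execute": "guest-sync",
                       "arguments": {"id": sync_id}})

# A KVM instance with the property set gets the agent command:
if wants_guest_agent("kvm", {"hw_qemu_guest_agent": "yes"}):
    print(guest_sync_command(42))
```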
Now, when I instead use a Nova environment that is fully RBD-backed, I
see exactly the same behavior as described above. So I know that in
principle, nova-compute/qemu-ga communication works in both an
RBD-backed and a non-RBD-backed environment.
However, things appear to get very different when it comes to snapshots.
Again, starting with a file-backed environment:
- User issues "nova image-create" or "openstack server image create".
- If nova-compute on the compute node decides that the instance can be
quiesced (which is the case if it's qemu or kvm, and its image has
hw_qemu_guest_agent=yes), then it sends a "guest-fsfreeze-freeze"
command over the guest agent VirtIO serial port.
- The guest agent inside the guest loops over all mounted filesystems,
and issues the FIFREEZE ioctl (which maps to the kernel freeze_super()
function). This can be seen in the qemu-ga log file in the guest, and it
is also verifiable by using ftrace on the qemu-ga PID and checking for
the freeze_super() function call.
- nova-compute then takes a live snapshot of the instance.
- Once complete, the guest gets a "guest-fsfreeze-thaw" command, and
again I can see this in the qemu-ga log, and with ftrace.
And now with RBD:
- User issues "nova image-create" or "openstack server image create".
- The guest-fsfreeze-freeze agent command never happens.
Now I can see the info message from
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac…
in my nova-compute log, which confirms that we're attempting a live
snapshot.
I also do *not* see the warning from
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac…,
so it looks like the direct_snapshot() call from
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac…
succeeds. This is defined in
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac…
and it uses RBD functionality only. Importantly, it never interacts
with qemu-ga, so it appears not to concern itself with freezing the
filesystem at all.
(Which does seem to contradict
https://docs.ceph.com/docs/master/rbd/rbd-openstack/?highlight=uuid#image-p…,
by the way, so that may be a documentation bug.)
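To make that concrete: my reading is that the direct snapshot path is
roughly equivalent to the following sequence of RBD operations. This is
a sketch only; the pool/image names are hypothetical, the real code
uses the rbd Python bindings rather than the CLI, and I may be
misremembering details of the exact sequence:

```python
def direct_snapshot_plan(pool, image, snap, glance_pool, dest):
    # Roughly what the direct snapshot path does, expressed as rbd CLI
    # calls. Note: no qemu-ga interaction anywhere in this sequence.
    return [
        f"rbd snap create {pool}/{image}@{snap}",   # snapshot the root disk
        f"rbd snap protect {pool}/{image}@{snap}",  # allow cloning from it
        f"rbd clone {pool}/{image}@{snap} {glance_pool}/{dest}",
        f"rbd flatten {glance_pool}/{dest}",        # detach clone from parent
    ]

for step in direct_snapshot_plan("vms", "disk", "snap", "images", "img"):
    print(step)
```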
Now here's another interesting part. If I read
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac…
and
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac…
correctly, then were the direct snapshot to fail, the fallback behavior
would be as follows: the domain would be "suspended" (note, again, this
is Nova suspend, which maps to libvirt managed-save per
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac…),
then snapshotted using a libvirt call, and resumed again post-snapshot.
In that case there *would* be a guest-sync call on suspend.
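The overall control flow, as I understand it, looks like this sketch
(the function names are mine, and the stubs below just simulate a
failing direct snapshot to show which path runs):

```python
calls = []

def direct_snapshot(instance):
    # Simulate the pure-RBD path failing, to exercise the fallback.
    calls.append("direct_snapshot")
    raise RuntimeError("direct snapshot failed")

def suspend(instance):
    calls.append("suspend")           # libvirt managed-save + guest-sync

def libvirt_snapshot(instance):
    calls.append("libvirt_snapshot")

def resume(instance):
    calls.append("resume")            # libvirt start + guest-set-time

def snapshot(instance):
    # Sketch of the snapshot control flow as I read the Nova source.
    try:
        direct_snapshot(instance)     # pure-RBD path: never touches qemu-ga
    except Exception:
        suspend(instance)
        libvirt_snapshot(instance)
        resume(instance)

snapshot("inst-1")
print(calls)  # → ['direct_snapshot', 'suspend', 'libvirt_snapshot', 'resume']
```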
And it's this part that has me a bit worried. If an RBD-backed
instance, on a successful snapshot, never freezes its filesystems *and*
never does any kind of sync either, doesn't that mean that such an
instance can't be made to produce consistent snapshots? (Particularly
in the case of write-back caching, which is recommended and normally
safe for RBD/virtio devices.) Or is there some magic within the QEMU
RBD storage driver, which I am unaware of, that makes any such
contortions unnecessary?
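In case it helps frame the question: absent such magic, I'd expect one
would have to quiesce manually before an out-of-band RBD snapshot,
along these lines. This is a rough sketch with hypothetical
domain/pool/image names; the `run` parameter only exists so the flow
can be exercised without a live cluster:

```python
import json
import subprocess

def quiesced_snapshot(domain, pool, image, snap, run=subprocess.run):
    # Freeze the guest's filesystems via qemu-ga, take the RBD snapshot,
    # then thaw. Requires the guest agent channel to be configured.
    freeze = json.dumps({"execute": "guest-fsfreeze-freeze"})
    thaw = json.dumps({"execute": "guest-fsfreeze-thaw"})
    run(["virsh", "qemu-agent-command", domain, freeze], check=True)
    try:
        run(["rbd", "snap", "create", f"{pool}/{image}@{snap}"], check=True)
    finally:
        # Always thaw, even if the snapshot fails.
        run(["virsh", "qemu-agent-command", domain, thaw], check=True)
```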
Thanks in advance for your insights!
Cheers,
Florian