Hi everyone,
apologies in advance; this will be long. It's also been through a bunch
of edits and rewrites, so I don't know how well I'm expressing myself at
this stage — please holler if anything is unclear and I'll be happy to
try to clarify.
I am currently investigating the behavior of OpenStack Nova instances
during snapshot and suspend operations, in conjunction with
qemu-guest-agent (qemu-ga). I realize that RBD-backed Nova/libvirt
instances are expected to behave differently from file-backed ones, but
I have reason to believe that the RBD-backed ones are behaving
incorrectly, and I'd like to verify that.
So first up, for comparison, let's recap how a Nova/libvirt/KVM instance
behaves when it is *not* backed by RBD (e.g. it uses a qcow2 file that
lives on a Nova compute node in /var/lib/nova/instances), is booted
from an image with the hw_qemu_guest_agent=yes metadata property set,
and runs qemu-guest-agent within the guest:
- User issues "nova suspend" or "openstack server suspend".
- If nova-compute on the compute node decides that the instance has
qemu-guest-agent running (which is the case if it's qemu or kvm, and its
image has hw_qemu_guest_agent=yes), it sends a guest-sync command over
the guest agent VirtIO serial port. This command registers in the
qemu-ga log file in the guest.
- nova-compute on the compute node sends a libvirt managed-save command.
- Nova reports the instance as suspended.
- User issues "nova resume" or "openstack server resume".
- nova-compute on the compute node sends a libvirt start command.
- Again, if nova-compute on the compute node knows that the instance has
qemu-guest-agent running, it sends another command over the serial port,
namely guest-set-time. This, too, registers in the guest's qemu-ga log.
- Nova reports the instance as active (running normally) again.
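The gating logic and the agent call described above boil down to
something like the following sketch. The helper names are mine, not
Nova's, but the JSON is qemu-ga's actual wire format, which matches
what shows up in the guest's agent log:

```python
import json

def wants_guest_agent(virt_type, image_meta):
    # Mirrors the check described above: agent commands are only sent
    # for qemu/kvm instances whose image sets hw_qemu_guest_agent=yes.
    return (virt_type in ("qemu", "kvm")
            and image_meta.get("hw_qemu_guest_agent") == "yes")

def guest_sync_command(sync_id):
    # The guest-sync request in qemu-ga's JSON wire format, as sent
    # over the guest agent VirtIO serial port.
    return json.dumps({"execute": "guest-sync",
                       "arguments": {"id": sync_id}})

# A KVM instance with the property set gets the agent command:
if wants_guest_agent("kvm", {"hw_qemu_guest_agent": "yes"}):
    print(guest_sync_command(42))
```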
Now, when I instead use a Nova environment that is fully RBD-backed, I
see exactly the same behavior as described above. So I know that in
principle, nova-compute/qemu-ga communication works in both an
RBD-backed and a non-RBD-backed environment.
However, things appear to get very different when it comes to snapshots.
Again, starting with a file-backed environment:
- User issues "nova image-create" or "openstack server image create".
- If nova-compute on the compute node decides that the instance can be
quiesced (which is the case if it's qemu or kvm, and its image has
hw_qemu_guest_agent=yes), then it sends a "guest-fsfreeze-freeze"
command over the guest agent VirtIO serial port.
- The guest agent inside the guest loops over all mounted filesystems,
and issues the FIFREEZE ioctl (which maps to the kernel freeze_super()
function). This can be seen in the qemu-ga log file in the guest, and it
is also verifiable by using ftrace on the qemu-ga PID and checking for
the freeze_super() function call.
- nova-compute then takes a live snapshot of the instance.
- Once complete, the guest gets a "guest-fsfreeze-thaw" command, and
again I can see this in the qemu-ga log, and with ftrace.
And now with RBD:
- User issues "nova image-create" or "openstack server image create".
- The guest-fsfreeze-freeze agent command never happens.
Now I can see the info message from
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac…
in my nova-compute log, which confirms that we're attempting a live
snapshot.
I also do *not* see the warning from
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac…,
so it looks like the direct_snapshot() call from
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac…
succeeds. This is defined in
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac…
and it uses RBD functionality only. Importantly, it never interacts
with qemu-ga, so it appears not to concern itself with freezing the
filesystem at all.
(Which does seem to contradict
https://docs.ceph.com/docs/master/rbd/rbd-openstack/?highlight=uuid#image-p…,
by the way, so that may be a documentation bug.)
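To make that concrete: my reading is that the direct snapshot path is
roughly equivalent to the following sequence of RBD operations. This is
a sketch only; the pool/image names are hypothetical, the real code
uses the rbd Python bindings rather than the CLI, and I may be
misremembering details of the exact sequence:

```python
def direct_snapshot_plan(pool, image, snap, glance_pool, dest):
    # Roughly what the direct snapshot path does, expressed as rbd CLI
    # calls. Note: no qemu-ga interaction anywhere in this sequence.
    return [
        f"rbd snap create {pool}/{image}@{snap}",   # snapshot the root disk
        f"rbd snap protect {pool}/{image}@{snap}",  # allow cloning from it
        f"rbd clone {pool}/{image}@{snap} {glance_pool}/{dest}",
        f"rbd flatten {glance_pool}/{dest}",        # detach clone from parent
    ]

for step in direct_snapshot_plan("vms", "disk", "snap", "images", "img"):
    print(step)
```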
Now here's another interesting part. If I read
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac…
and
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac…
correctly, then were the direct snapshot to fail, the fallback behavior
would be as follows: the domain would be "suspended" (note, again, this
is Nova suspend, which maps to libvirt managed-save per
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac…),
then snapshotted using a libvirt call, and resumed again post-snapshot.
In that case there *would* be a guest-sync call on suspend.
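The overall control flow, as I understand it, looks like this sketch
(the function names are mine, and the stubs below just simulate a
failing direct snapshot to show which path runs):

```python
calls = []

def direct_snapshot(instance):
    # Simulate the pure-RBD path failing, to exercise the fallback.
    calls.append("direct_snapshot")
    raise RuntimeError("direct snapshot failed")

def suspend(instance):
    calls.append("suspend")           # libvirt managed-save + guest-sync

def libvirt_snapshot(instance):
    calls.append("libvirt_snapshot")

def resume(instance):
    calls.append("resume")            # libvirt start + guest-set-time

def snapshot(instance):
    # Sketch of the snapshot control flow as I read the Nova source.
    try:
        direct_snapshot(instance)     # pure-RBD path: never touches qemu-ga
    except Exception:
        suspend(instance)
        libvirt_snapshot(instance)
        resume(instance)

snapshot("inst-1")
print(calls)  # → ['direct_snapshot', 'suspend', 'libvirt_snapshot', 'resume']
```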
And it's this part that has me a bit worried. If an RBD-backed
instance, on a successful snapshot, never freezes its filesystems *and*
never does any kind of sync either, doesn't that mean that such an
instance can't be made to produce consistent snapshots? (Particularly
in the case of write-back caching, which is recommended and normally
safe for RBD/virtio devices.) Or is there some magic within the QEMU
RBD storage driver, which I am unaware of, that makes any such
contortions unnecessary?
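In case it helps frame the question: absent such magic, I'd expect one
would have to quiesce manually before an out-of-band RBD snapshot,
along these lines. This is a rough sketch with hypothetical
domain/pool/image names; the `run` parameter only exists so the flow
can be exercised without a live cluster:

```python
import json
import subprocess

def quiesced_snapshot(domain, pool, image, snap, run=subprocess.run):
    # Freeze the guest's filesystems via qemu-ga, take the RBD snapshot,
    # then thaw. Requires the guest agent channel to be configured.
    freeze = json.dumps({"execute": "guest-fsfreeze-freeze"})
    thaw = json.dumps({"execute": "guest-fsfreeze-thaw"})
    run(["virsh", "qemu-agent-command", domain, freeze], check=True)
    try:
        run(["rbd", "snap", "create", f"{pool}/{image}@{snap}"], check=True)
    finally:
        # Always thaw, even if the snapshot fails.
        run(["virsh", "qemu-agent-command", domain, thaw], check=True)
```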
Thanks in advance for your insights!
Cheers,
Florian