On Sat, Jan 11, 2020 at 4:55 PM Sage Weil <sage(a)newdream.net> wrote:
On Fri, 10 Jan 2020, Neal Gompa wrote:
I don't know how many of you folks are aware, but early last year,
Datto (full disclosure, my current employer, though I'm sending this
email pretty much on my own) released a tool called "zfs2ceph" as an
open source project. This project was the result of a week-long
internal hackathon (SUSE folks may be familiar with this concept from
their own "HackWeek" program) that Datto held in
December 2018. I was a member of that team, helping with research,
setting up infra, and making demos for it.
Anyway, I'm bringing it up here because I've had conversations with
some folks individually who suggested that I raise it on the mailing
list and talk about some of the motivations and what I'd like to see
in the future from Ceph on this.
The main motivation here was to provide a seamless mechanism to
transfer ZFS based datasets with the full chain of historical
snapshots onto Ceph storage with as much fidelity as possible to allow
a storage migration without requiring 2x-4x system resources. Datto is
in the disaster recovery business, so working backups with full
history are extremely valuable to Datto, its partners, and their
customers. That's why the traditional path of just syncing the current
state and letting the old stuff die off is not workable. At the scale
of having literally thousands of servers with each server having
hundreds of terabytes of ZFS storage (adding up in aggregate to
hundreds of petabytes of data), there's no feasible way to consider
alternative storage options without having a way to transfer datasets
from ZFS to Ceph so that we can cut over servers to being Ceph nodes
with minimal downtime and near zero new server purchasing requirements
(there's obviously a little bit of extra hardware needed to "seed" a
Ceph cluster, but that's fine).
The current zfs2ceph implementation handles zvol sends and transforms
them into rbd v1 import streams. I don't recall exactly the reason why
we don't use v2 anymore, but I think there were some gaps that made it
so it wasn't usable for our case back then (we were using Ceph
Luminous). I'm unsure if this has improved now, though it wouldn't
surprise me if it has. However, zvols aren't enough for us. Most of
I'd be surprised if there was something that v1 had that v2 didn't. Any
other details you can remember? Jason, does this bring anything to mind?
One of my teammates retrieved our notes from the time we were hacking
on it, and now I have some of those details:
The Ceph export feature set was inconsistent across v1 and v2:
* v1 format does not support multiple snapshots
* v2 format does not support partial exports
* import/export move complete volumes (v1 or v2)
* import-diff/export-diff send diffs between snapshots (v1)
Perhaps once v2 has all the features of v1, it'd be a proper replacement.
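
For reference, here is a minimal sketch of how a snapshot chain can be replayed with the diff-based commands listed above. It only builds the `rbd export-diff` / `rbd import-diff` command lines and does not run them. The function names, image names, and the one-file-per-snapshot layout are assumptions made for illustration, not something taken from zfs2ceph.

```python
from typing import List, Optional


def plan_export_diffs(image: str, snapshots: List[str],
                      out_dir: str = ".") -> List[List[str]]:
    """Build one `rbd export-diff` argv per snapshot, oldest first.

    The first diff has no --from-snap, so it covers everything from image
    creation up to the first snapshot. Each later diff is relative to the
    snapshot before it. (Hypothetical helper, for illustration only.)
    """
    commands: List[List[str]] = []
    previous: Optional[str] = None
    for snap in snapshots:
        argv = ["rbd", "export-diff"]
        if previous is not None:
            argv += ["--from-snap", previous]
        argv += [f"{image}@{snap}", f"{out_dir}/{snap}.diff"]
        commands.append(argv)
        previous = snap
    return commands


def plan_import_diffs(image: str, snapshots: List[str],
                      in_dir: str = ".") -> List[List[str]]:
    """Build the matching `rbd import-diff` argv list, in the same order.

    Diffs must be applied oldest first, since each one expects its start
    snapshot to already exist on the target image.
    """
    return [["rbd", "import-diff", f"{in_dir}/{snap}.diff", image]
            for snap in snapshots]


if __name__ == "__main__":
    # "rbd/vol1" and "backup/vol1" are made-up image names.
    snaps = ["2018-12-01", "2018-12-02", "2018-12-03"]
    for argv in plan_export_diffs("rbd/vol1", snaps):
        print(" ".join(argv))
    for argv in plan_import_diffs("backup/vol1", snaps):
        print(" ".join(argv))
```

The target image has to exist before the first diff is applied, and the order matters: a diff taken relative to a start snapshot can only be imported once that snapshot is present on the target.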
our datasets are in the ZFS filesystem form, not the ZVol block
device form. Unfortunately, there is no import equivalent for CephFS,
which blocked an implementation of this capability. I had filed a
request about it on the issue tracker, but it was rejected on the
basis that something was being worked on. However, I haven't seen
something exactly like what I need land in CephFS yet.
Patrick would know more (copied).
The code is pretty simple, and I think it would be easy enough for it
to be incorporated into Ceph itself. However, there's a greater
question here. Is there interest from the Ceph developer community in
developing and supporting strategies to migrate from legacy data
stores to Ceph with as much fidelity as reasonably possible?
Personally, I hope so. My hope is that this post generates some
interesting conversation about how to make this a better supported
capability within Ceph for block and filesystem data. :)
I think including this in the relevant tools (e.g., rbd CLI) makes
sense... as long as we can bring some tests along with it to ensure we're
properly handling the 'zfs send' data stream.
That would be very cool!
真実はいつも一つ！/ Always, there's only one truth!