On Fri, Mar 20, 2020 at 1:29 PM <vitalif(a)yourcmc.ru> wrote:
Hi.
For a long time I was under the impression that clones are as efficient
in BlueStore as snapshots.
But today I finally decided to test it and ... I discovered it was an
utterly wrong impression :) RBD copies the whole 4 MB object even when a
small 4 KB block is modified within it in the child image. In my
all-NVMe cluster this leads to 40 (40!!!) random write iops (bs=4k
iodepth=1) in a fresh RBD clone, which is terrible.
Anything with an iodepth of 1 is going to be (relatively) terrible on RBD.
Question of the day: is it possible to reimplement RBD clones using
"sparse objects"? As I understand it, support for sparse objects
themselves is already there. So maybe librbd could only write the
modified part to the child image when writing and read "holes" from
parents when reading?
The forthcoming Octopus release of librbd adds support for sparse
copy-up writes [1] when your min OSD release is set to Octopus (reads
from the parent image were already sparse-read ops). Using holes was
previously not very practical due to the large allocation sizes on
the OSD, but with the change to 4KiB minimum block sizes, such a
technique would be possible (albeit a breaking change for all older
clients controlled via a new feature bit). You also have the ability
to control the RBD object sizes and use something smaller than the
4MiB default.
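
To put rough numbers on the object-size point, here is a minimal
back-of-the-envelope sketch in Python. It assumes the behaviour described
above: the first write to an object in a clone copies the whole object up
from the parent. The function name and the chosen object sizes are
illustrative only, and the result is an upper bound, since a parent object
may be only partially populated and later writes to the same object need
no further copy-up.

```python
KIB = 1024
MIB = 1024 * KIB


def copyup_amplification(object_size: int, write_size: int = 4 * KIB) -> float:
    """Bytes copied into the child per byte written by the client.

    Models only the first write to an object in a fresh clone, assuming
    the whole object is copied up from the parent (hypothetical helper,
    not part of librbd).
    """
    if write_size <= 0 or object_size < write_size:
        raise ValueError("expected 0 < write_size <= object_size")
    return object_size / write_size


if __name__ == "__main__":
    # Compare the 4 MiB default against some smaller object sizes.
    for object_size in (4 * MIB, 1 * MIB, 64 * KIB, 4 * KIB):
        factor = copyup_amplification(object_size)
        print(f"{object_size // KIB:>5} KiB objects: {factor:.0f}x copy-up")
```

With the 4 MiB default, a 4 KiB write triggers up to 1024 times its own
size in copy-up traffic; at 64 KiB objects the same write costs at most 16
times. Smaller objects mean more objects per image, so this is a
trade-off rather than a free win.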
--
Vitaliy Filippov
_______________________________________________
Dev mailing list -- dev(a)ceph.io
[1]
https://github.com/ceph/ceph/pull/27999
--
Jason