[ceph-users] Re: Using RBD to pack billions of small files

2 Feb 2021

I’d be nervous about a plan to utilize a single volume, growing indefinitely.  I would
think that from a blast radius perspective that you’d want to strike a balance between a
single monolithic blockchain-style volume vs a zillion tiny files.  Perhaps a strategy to
shard into, say, 10 TB volumes.  That size is large enough to hold lots of immutable code
yet not so unwieldy that it becomes infeasible to manage.
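
To make the sharding idea concrete, here is a rough sketch of how writes could roll over from one fixed-size volume to the next. Everything in it is an assumption for illustration: the ShardAllocator name, the decimal 10 TB capacity, and the rule that an artifact never straddles two volumes. It only computes placements; it does not talk to Ceph.

```python
from dataclasses import dataclass
from typing import Tuple

TB = 10 ** 12  # decimal terabyte; an assumption, the thread only says "10 TB"


@dataclass
class ShardAllocator:
    """Hypothetical helper: hands out (shard, offset) placements for appends."""

    shard_capacity: int = 10 * TB
    shard: int = 0    # index of the volume currently being filled
    offset: int = 0   # next free byte in that volume

    def allocate(self, size: int) -> Tuple[int, int]:
        if size > self.shard_capacity:
            raise ValueError("artifact larger than a whole shard")
        # Roll over to a fresh volume rather than splitting an artifact.
        if self.offset + size > self.shard_capacity:
            self.shard += 1
            self.offset = 0
        location = (self.shard, self.offset)
        self.offset += size
        return location
```

With this layout the index would store a shard number next to each offset and size, and losing or retiring one volume affects only the artifacts placed in it.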

...
 Packing's obviously a good idea for storing these kinds of artifacts
 in Ceph, and hacking through the existing librbd might indeed be
 easier than building something up from raw RADOS, especially if you
 want to use stuff like rbd-mirror.

 My main concern would just be as Dan points out, that we don't test
 rbd with extremely large images and we know deleting that image will
 take a looooong time — I don't know of other issues off the top of my
 head, and in the worst case you could always fall back to manipulating
 it with raw librados if there is an issue.

 But you might also check in on the status of Danny Al-Gaaf's rados
 email project. Email and these artifacts seemingly have a lot in
 common.
 -Greg

 On Mon, Feb 1, 2021 at 12:52 PM Loïc Dachary <loic(a)dachary.org> wrote:

 Hi Dan,

 On 01/02/2021 21:13, Dan van der Ster wrote:
  Hi Loïc,

 We've never managed 100TB+ in a single RBD volume. I can't think of
 anything, but perhaps there are some unknown limitations when they get so
 big.
 It should be easy enough to use rbd bench to create and fill a massive test
 image to validate everything works well at that size.

Good idea! I'll look for a cluster with 100TB of free space and post my findings.

 Also, I assume you'll be doing the IO from just one client? Multiple
 readers/writers to a single volume could get complicated.

Yes.

 Otherwise, yes RBD sounds very convenient for what you need.

It is inspired by
https://static.usenix.org/event/osdi10/tech/full_papers/Beaver.pdf which suggests an
ad-hoc implementation to pack immutable objects together. But I think RBD already provides
the underlying logic, even though it is not specialized for this use case. RGW also packs
small objects together and would be a good candidate. But it provides more flexibility to
modify/delete objects and I assume it will be slower to write N objects with RGW than to
write them sequentially on an RBD image. But I did not try and maybe I should.

 To be continued.

 Cheers, Dan

 On Sat, Jan 30, 2021, 4:01 PM Loïc Dachary <loic(a)dachary.org> wrote:

  Bonjour,

 In the context of Software Heritage (a noble mission to preserve all source
 code)[0], artifacts have an average size of ~3KB and there are billions of
 them. They never change and are never deleted. To save space it would make
 sense to write them, one after the other, in an ever-growing RBD volume
 (more than 100TB). An index, located somewhere else, would record the
 offset and size of the artifacts in the volume.
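
 A minimal sketch of that layout, to make the question concrete. It is only an illustration under assumptions: the PackWriter name is made up, any seekable binary file object stands in for the RBD volume (for example a mapped block device opened in binary mode), the index is an in-memory dict rather than a separate store, and keying artifacts by their SHA-1 is my choice for the example.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Tuple


class PackWriter:
    """Hypothetical append-only packer: artifacts back to back, index kept apart."""

    def __init__(self, volume: BinaryIO) -> None:
        self.volume = volume
        # artifact id -> (offset, size); in practice this lives somewhere else
        self.index: Dict[str, Tuple[int, int]] = {}
        # Start appending at the current end of the volume.
        self.end = volume.seek(0, io.SEEK_END)

    def append(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        if key in self.index:
            return key  # artifacts are immutable, so a duplicate is a no-op
        self.volume.seek(self.end)
        self.volume.write(data)
        self.index[key] = (self.end, len(data))
        self.end += len(data)
        return key

    def read(self, key: str) -> bytes:
        offset, size = self.index[key]
        self.volume.seek(offset)
        return self.volume.read(size)
```

 For a quick local try, io.BytesIO() can play the role of the volume: append two artifacts, then read each one back through its index entry.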

 I wonder if someone already implemented this idea with success? And if
 not... does anyone see a reason why it would be a bad idea?

 Cheers

 [0] https://docs.softwareheritage.org/

 --
 Loïc Dachary, Artisan Logiciel Libre

 _______________________________________________
 ceph-users mailing list -- ceph-users(a)ceph.io
 To unsubscribe send an email to ceph-users-leave(a)ceph.io
 --
 Loïc Dachary, Artisan Logiciel Libre

