[ceph-users] Re: Using RBD to pack billions of small files

4 Feb 2021

Hi Matt,

I did not know about pixz, thanks for the pointer. The idea it implements
is also new to me, and it looks like it could usefully be applied to this
use case. I'm not going to say "awesome" because I can't grasp how useful
it really is right now. But I'll definitely think about it :-)
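For my own understanding, here is a minimal sketch of the core idea: many
small files packed into one archive, plus an index of where each member's
data starts, so a single file can be read with one ranged read and cached.
It is only an approximation under stated assumptions: pixz itself uses xz
compression and its own index format, while this sketch uses an
uncompressed tar built with Python's `tarfile`. `build_bundle`,
`BundleReader` and `fetch_range` are hypothetical names, and `fetch_range`
merely stands in for a ranged GET of the bundle object stored in RadosGW.

```python
import io
import tarfile
from functools import lru_cache
from typing import Callable


def build_bundle(files: dict[str, bytes]) -> tuple[bytes, dict[str, tuple[int, int]]]:
    """Pack small files into one uncompressed tar and build an index.

    The index maps member name -> (offset of the member's data, size),
    which allows random access without scanning the archive.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    archive = buf.getvalue()

    # Walk the headers once to record where each member's data starts.
    index: dict[str, tuple[int, int]] = {}
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        for member in tar.getmembers():
            index[member.name] = (member.offset_data, member.size)
    return archive, index


class BundleReader:
    """Serve single files out of a bundle, caching recently read ones."""

    def __init__(self, fetch_range: Callable[[int, int], bytes],
                 index: dict[str, tuple[int, int]], cache_size: int = 1024) -> None:
        # fetch_range(offset, length) is a hypothetical stand-in for a
        # ranged GET of the bundle object stored in RadosGW.
        self._fetch_range = fetch_range
        self._index = index
        # Cache whole file contents, keyed by member name.
        self.read = lru_cache(maxsize=cache_size)(self._read_uncached)

    def _read_uncached(self, name: str) -> bytes:
        offset, size = self._index[name]
        return self._fetch_range(offset, size)
```

In this sketch a cache hit avoids a round trip to the object store
entirely; the obvious difference from pixz is that nothing is compressed.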

Cheers

On 03/02/2021 22:02, Matt Wilder wrote:
...
  If it were me, I would do something along the lines of:

 - Bundle larger blocks of code into pixz <https://github.com/vasi/pixz>
   (essentially indexed tar files, allowing random access) and store them
   in RadosGW.
 - Build a small frontend that fetches them (with caching) and serves the
   file contents via whatever your UI is.

 On Wed, Feb 3, 2021 at 12:55 AM Burkhard Linke
 <Burkhard.Linke(a)computational.bio.uni-giessen.de> wrote:

> Hi,
>
> On 2/3/21 9:41 AM, Loïc Dachary wrote:
>>> Just my 2 cents:
>>>
>>> You could use the first byte of the SHA sum to identify the image,
>>> e.g. using a fixed number of 256 images, or some more flexible approach
>>> similar to the way filestore used to store RADOS objects.
>> A friend suggested the same to save space. Good idea.
>
> If you want to further reduce the index size, you can just store the
> offset, and let the first 4 (or 8?) bytes at that offset define the size
> of the following artifact. That's similar to the way Pascal used to store
> strings in the good ol' times. You might also want to think about using
> a complete header which also includes the artifact's name etc. This will
> allow you to rebuild the index if it becomes corrupted. The storage
> overhead should be insignificant.
>
> Your index will become a simple mapping of SHA sum -> offset, and you
> might also be able to use optimized implementations.
>
>
> Regards,
>
> Burkhard
>
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
> 
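The scheme described in the quoted message (one of 256 images chosen by
the first byte of the SHA sum, a header in front of each artifact, and an
index mapping SHA sum -> offset) can be sketched roughly as follows. This
is a minimal sketch under assumptions: the header layout (8-byte length
followed by the 32-byte SHA-256, standing in for "the artifact's name") is
hypothetical, and `Image` is an in-memory bytearray standing in for an RBD
image; nothing here uses the librbd API, and `Store`, `Image` and
`image_for` are illustrative names only.

```python
import hashlib
import struct

# Hypothetical record header: 8-byte big-endian payload length followed by
# the 32-byte SHA-256 of the payload (playing the role of the artifact's name).
HEADER = struct.Struct(">Q32s")
NUM_IMAGES = 256  # one image per possible value of the first SHA byte


def image_for(digest: bytes) -> int:
    """Select the image from the first byte of the SHA sum (0..255)."""
    return digest[0]


class Image:
    """In-memory stand-in for one RBD image holding packed artifacts."""

    def __init__(self) -> None:
        self.data = bytearray()
        self.index: dict[bytes, int] = {}  # SHA sum -> offset of the header

    def append(self, digest: bytes, payload: bytes) -> None:
        if digest in self.index:  # content-addressed: already stored
            return
        self.index[digest] = len(self.data)
        self.data += HEADER.pack(len(payload), digest) + payload

    def read(self, digest: bytes) -> bytes:
        offset = self.index[digest]
        size, stored = HEADER.unpack_from(self.data, offset)
        if stored != digest:
            raise ValueError("index points at the wrong record")
        start = offset + HEADER.size
        return bytes(self.data[start:start + size])

    def rebuild_index(self) -> dict[bytes, int]:
        """Recover the SHA sum -> offset map by walking the headers."""
        index: dict[bytes, int] = {}
        offset = 0
        while offset + HEADER.size <= len(self.data):
            size, digest = HEADER.unpack_from(self.data, offset)
            index[digest] = offset
            offset += HEADER.size + size
        return index


class Store:
    """Route each artifact to one of NUM_IMAGES images by its SHA sum."""

    def __init__(self) -> None:
        self.images = [Image() for _ in range(NUM_IMAGES)]

    def put(self, payload: bytes) -> bytes:
        digest = hashlib.sha256(payload).digest()
        self.images[image_for(digest)].append(digest, payload)
        return digest

    def get(self, digest: bytes) -> bytes:
        return self.images[image_for(digest)].read(digest)
```

Because each header carries the SHA sum, `rebuild_index` can regenerate a
lost or corrupted index by scanning the image, which is the property
pointed out above; in this particular layout the cost is 40 bytes per
artifact.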
-- 
Loïc Dachary, Artisan Logiciel Libre
