Bonjour,
Reading Karan's blog post from last year about benchmarking the insertion of
billions of objects into Ceph via S3 / RGW[0], it reads:
> we decided to lower bluestore_min_alloc_size_hdd to 18KB and re-test. As
> represented in chart-5, the object creation rate found to be notably reduced
> after lowering the bluestore_min_alloc_size_hdd parameter from 64KB (default)
> to 18KB. As such, for objects larger than the bluestore_min_alloc_size_hdd,
> the default values seems to be optimal, smaller objects further require more
> investigation if you intended to reduce bluestore_min_alloc_size_hdd parameter.
There is also a 2018 mailing-list thread on this topic that reaches the same
conclusion, although using RADOS directly rather than RGW[3]. I read the RGW
data layout page in the documentation[1] and concluded that, by default, every
object inserted via S3 / RGW will indeed allocate at least 64KB on disk. A pull
request from last year[2] seems to confirm this, and also suggests that
modifying bluestore_min_alloc_size_hdd has adverse side effects.
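If my reading is correct, the space cost is easy to estimate. Here is a minimal
sketch; the round-up-to-allocation-unit model is my assumption of how BlueStore
allocates for small objects, based on the discussions above, not a statement
about the actual implementation:

```python
def allocated_bytes(object_size, min_alloc_size=64 * 1024):
    """Estimated bytes allocated on an HDD OSD for one object copy,
    assuming each object is rounded up to a whole number of
    min_alloc_size units (bluestore_min_alloc_size_hdd, 64KB default)."""
    units = -(-object_size // min_alloc_size)  # ceiling division
    return units * min_alloc_size

# A 4 KiB object would still occupy a full 64 KiB allocation unit:
print(allocated_bytes(4 * 1024))   # 65536, i.e. a 16x space overhead
# A 65 KiB object would spill into a second unit:
print(allocated_bytes(65 * 1024))  # 131072
```

Under that assumption, the overhead only becomes negligible once objects are
several allocation units in size, which matches the blog's conclusion that the
default is fine for larger objects.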
That being said, I'm curious to know whether people have developed strategies
to cope with this overhead. Someone mentioned packing small objects together
client-side to make them larger, but maybe there are simpler ways to achieve
the same result?
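For concreteness, here is the kind of client-side packing I understood was
being suggested: concatenate small blobs into one container object and keep an
(offset, length) index per blob. The names and the format are illustrative
assumptions on my part, not an existing Ceph or RGW API:

```python
def pack(blobs):
    """Pack a {name: bytes} dict into (container_bytes, index),
    where index maps each name to its (offset, length)."""
    index, chunks, offset = {}, [], 0
    for name, data in blobs.items():
        index[name] = (offset, len(data))
        chunks.append(data)
        offset += len(data)
    return b"".join(chunks), index

def unpack(container, index, name):
    """Retrieve one blob from the packed container."""
    offset, length = index[name]
    return container[offset:offset + length]

blobs = {"a": b"hello", "b": b"world!"}
container, index = pack(blobs)
assert unpack(container, index, "b") == b"world!"
```

Reads could presumably use an S3 ranged GET (Range header) on the container
object to fetch a single packed blob, but that adds index management and makes
deletes and overwrites awkward, hence my question about simpler alternatives.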
Cheers
[0] https://www.redhat.com/en/blog/scaling-ceph-billion-objects-and-beyond
[1] https://docs.ceph.com/en/latest/radosgw/layout/
[2] https://github.com/ceph/ceph/pull/32809
[3] https://www.spinics.net/lists/ceph-users/msg45755.html
--
Loïc Dachary, Artisan Logiciel Libre