In our experience, delete speed scales with the CPU available to the MDS,
and a single MDS daemon only seems to scale to 2-4 CPUs, so for our biggest
filesystem we run 5 active MDS daemons. Migrations between MDS ranks reduced
performance a lot, but pinning fixed that. Even better is simply getting the
fastest cores you can.
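(For reference, and not something from the original posts: directory pinning
is done by setting the ceph.dir.pin extended attribute on a directory, e.g.
with setfattr or from Python; the mount point and rank below are placeholders.)

    import os

    # Pin /mnt/cephfs/projects (and the subtree below it) to MDS rank 1,
    # equivalent to: setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/projects
    os.setxattr("/mnt/cephfs/projects", "ceph.dir.pin", b"1")

    # A value of b"-1" removes the pin so the balancer may migrate it again.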
On Thu., Nov. 12, 2020, 6:08 p.m. Brent Kennedy, <bkennedy(a)cfl.rr.com>
wrote:
Ceph is definitely a good choice for storing millions of files. It sounds
like you plan to use this like S3, so my first question would be: are the
deletes done for a specific reason? (e.g. the files are used for a process
and then discarded) If it's an age thing, you can set the files to expire
when putting them in, and Ceph will automatically clear them.
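(As an illustration of per-object expiry, not something from Brent's mail:
with the Swift API, RGW can expire objects via the X-Delete-After header set
at upload time; the endpoint, credentials and container below are placeholders,
and this assumes your RGW release supports Swift object expiration.)

    from swiftclient.client import Connection

    # Placeholder endpoint and credentials; adjust for your RGW Swift setup.
    conn = Connection(
        authurl="https://rgw.example.com/auth/v1.0",
        user="tenant:user",
        key="secret",
    )

    # Upload an object that RGW should delete automatically 30 days after PUT.
    conn.put_object(
        "mycontainer",
        "thumbnails/img-0001.jpg",
        contents=b"...object data...",
        headers={"X-Delete-After": str(30 * 24 * 3600)},
    )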
The more spinners you have, the more performance you will end up with. Is
the network 10Gb or higher?
Octopus is production stable and contains many performance enhancements.
Depending on the OS, you may not be able to upgrade from Nautilus until
they work out that process (e.g. CentOS 7/8).
Delete speed is not that great, but you would have to test it with your
cluster to see how it performs for your use case. If you have enough space
available, is there a process that breaks if the files are not deleted?
Regards,
-Brent
Existing Clusters:
Test: Octopus 15.2.5 (all virtual on NVMe)
US Production(HDD): Nautilus 14.2.11 with 11 osd servers, 3 mons, 4
gateways, 2 iscsi gateways
UK Production(HDD): Nautilus 14.2.11 with 18 osd servers, 3 mons, 4
gateways, 2 iscsi gateways
US Production(SSD): Nautilus 14.2.11 with 6 osd servers, 3 mons, 4
gateways, 2 iscsi gateways
UK Production(SSD): Octopus 15.2.5 with 5 osd servers, 3 mons, 4 gateways
-----Original Message-----
From: Adrian Nicolae <adrian.nicolae(a)rcs-rds.ro>
Sent: Wednesday, November 11, 2020 3:42 PM
To: ceph-users <ceph-users(a)ceph.io>
Subject: [ceph-users] question about rgw delete speed
Hey guys,
I'm in charge of a local cloud-storage service. Our primary object storage
is a vendor-based one, and I want to replace it in the near future with Ceph
with the following setup:
- 6 OSD servers with 36 SATA 16TB drives each and 3 big NVMe per server
(1 big NVMe for every 12 drives, so I can reserve 300GB of NVMe storage for
every SATA drive), 3 MON, 2 RGW with EPYC 7402P and 128GB RAM. So in the
end we'll have ~3PB of raw data and 216 SATA drives.
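(A quick back-of-envelope check of those numbers, mine rather than part of
Adrian's mail:)

    # Rough capacity math for the proposed setup.
    osd_servers = 6
    hdds_per_server = 36
    hdd_tb = 16
    nvmes_per_server = 3

    total_hdds = osd_servers * hdds_per_server           # 216 SATA drives
    raw_pb = total_hdds * hdd_tb / 1000.0                 # 3.456 PB raw
    hdds_per_nvme = hdds_per_server // nvmes_per_server   # 12 HDDs share one NVMe
    nvme_tb_per_device = hdds_per_nvme * 0.3              # 3.6 TB NVMe per device
                                                          # to give 300 GB per HDD

    print(total_hdds, raw_pb, hdds_per_nvme, nvme_tb_per_device)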
Currently we have ~100 million files on the primary storage, with the
following distribution:
- ~10% = very small files (less than 1MB: thumbnails, text & office files
and so on)
- ~60% = small files (between 1MB and 10MB)
- ~20% = medium files (between 10MB and 1GB)
- ~10% = big files (over 1GB).
My main concern is the speed of delete operations. We have around
500k-600k delete ops every 24 hours (an average of roughly 6-7 deletes per
second), so quite a lot. Our current storage is not deleting all the files
fast enough (it's always 1 week to 10 days behind). I guess it's not only a
software issue, and the delete speed will probably get better if we add more
drives (we now have 108).
What do you think about Ceph delete speed? I read on other threads that
it's not very fast. I wonder if this hardware setup can handle our current
delete load better than our current storage. On the RGW servers I want to
use Swift, not S3.
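(One way to get a feel for delete throughput on a test cluster is to time
concurrent deletes through the Swift API. This is only a sketch with
placeholder endpoint, credentials, container and object names, not something
from the thread.)

    import threading
    import time
    from concurrent.futures import ThreadPoolExecutor
    from swiftclient.client import Connection

    # Placeholder endpoint/credentials: point these at a test RGW instance.
    AUTH = dict(authurl="https://rgw.example.com/auth/v1.0",
                user="tenant:user", key="secret")
    CONTAINER = "delete-test"

    tls = threading.local()

    def delete_one(name):
        # One Swift connection per worker thread.
        if not hasattr(tls, "conn"):
            tls.conn = Connection(**AUTH)
        tls.conn.delete_object(CONTAINER, name)

    # Assumes these objects were already uploaded to the test container.
    objects = ["obj-%06d" % i for i in range(10000)]

    start = time.time()
    with ThreadPoolExecutor(max_workers=32) as pool:
        list(pool.map(delete_one, objects))
    elapsed = time.time() - start
    print("%.1f deletes/sec with 32 workers" % (len(objects) / elapsed))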
And another question: can I deploy the latest Ceph version (Octopus)
directly in production, or is it safer to start with Nautilus until Octopus
is more stable?
Any input would be greatly appreciated!
Thanks,
Adrian.
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io