On Mon, Dec 9, 2019 at 7:47 AM Arash Shams <ara4sh@hotmail.com> wrote:
Dear All, 

I have almost 30 million objects that I want to list and index somewhere else.
I'm using boto3 with a continuation marker, but it takes almost 9 hours.

Can I run it in multiple threads to make it faster? What solution do you suggest to speed up this process?


Thanks 
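
On the multi-threading question: a single marker-based listing is sequential, because each page needs the marker from the previous one. One way to parallelize is to split the key space by prefix and list each prefix in its own thread. The sketch below assumes the keys fall under a known set of prefixes (anything outside those prefixes is not listed) and that a boto3 S3 client is passed in. The function names `list_prefix` and `list_bucket_parallel` are made up for illustration. It relies only on the client's `get_paginator("list_objects")` call.

```python
from concurrent.futures import ThreadPoolExecutor


def list_prefix(client, bucket, prefix):
    """Return every key under one prefix, following the listing marker page by page."""
    paginator = client.get_paginator("list_objects")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching objects has no "Contents" entry.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys


def list_bucket_parallel(client, bucket, prefixes, workers=16):
    """List several prefixes concurrently and return the keys in prefix order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as `prefixes`.
        chunks = pool.map(lambda p: list_prefix(client, bucket, p), prefixes)
        return [key for chunk in chunks for key in chunk]


# Hypothetical usage (endpoint, bucket name and prefixes are placeholders):
#   import boto3
#   client = boto3.client("s3", endpoint_url="http://rgw.example.com")
#   keys = list_bucket_parallel(client, "mybucket", ["0", "1", "2", "3"])
```

Whether this helps depends on how evenly the keys spread across the prefixes and on how well the bucket index handles concurrent listings, so it is worth measuring with a small worker count first.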

I've thought about indexing objects elsewhere as well. One thought I had was hooking into the HTTP flow so that each PUT or DELETE would update the object's entry in some kind of database (asynchronously, of course). We could also gather stats from GET and POST requests. Initially, my thought was to hook into haproxy, since we already use it, but RGW itself might work if that is an option. That way the index would always be up to date and we would not have to do big scans of the buckets (our buckets would not perform well under such scans). I haven't actually gotten to the implementation phase of this idea.
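
To make the idea a bit more concrete, here is a minimal sketch of the asynchronous part only. It assumes that something in the request path (haproxy, RGW, or a log tailer; none of that is shown) can hand over `(method, bucket, key)` tuples. The names `index_worker` and `start_indexer`, the `objects` table, and the choice of SQLite are all hypothetical, picked just to keep the example self-contained.

```python
import queue
import sqlite3
import threading


def index_worker(events, db_path):
    """Consume (method, bucket, key) events and mirror them into a SQLite table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS objects ("
        "bucket TEXT, key TEXT, PRIMARY KEY (bucket, key))"
    )
    while True:
        event = events.get()
        if event is None:  # sentinel: stop the worker
            break
        method, bucket, key = event
        if method == "PUT":
            conn.execute(
                "INSERT OR REPLACE INTO objects VALUES (?, ?)", (bucket, key)
            )
        elif method == "DELETE":
            conn.execute(
                "DELETE FROM objects WHERE bucket = ? AND key = ?", (bucket, key)
            )
        # Other methods (GET, POST, ...) are ignored in this sketch.
        conn.commit()
    conn.close()


def start_indexer(db_path):
    """Start the background worker; the caller puts events on the returned queue."""
    events = queue.Queue()
    worker = threading.Thread(
        target=index_worker, args=(events, db_path), daemon=True
    )
    worker.start()
    return events, worker
```

The request path only has to call `events.put(("PUT", bucket, key))` and return, so the index update never blocks the client. Events that are still in the queue are lost if the process dies, so a real setup would probably want a durable queue or a periodic reconciliation scan.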

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1