I recently installed Hadoop and related tools (Hive, TPC-DS, TeraGen, and
Spark) in a container and configured them to work with S3.
As a result, the same S3 data can be accessed through all of these tools.
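
For anyone unfamiliar with how Hadoop-based tools are usually pointed at S3, here is a minimal sketch. It assumes the standard Hadoop S3A connector; the container's actual configuration may differ. The helper name, endpoint, and credentials are placeholders invented for illustration. Only the `fs.s3a.*` property names come from the S3A connector itself.

```python
from xml.etree import ElementTree as ET


def s3a_core_site(endpoint, access_key, secret_key):
    """Render a core-site.xml fragment with common S3A connector settings.

    Hypothetical helper for illustration only; it is not taken from the container.
    """
    props = {
        "fs.s3a.endpoint": endpoint,          # S3-compatible endpoint URL
        "fs.s3a.access.key": access_key,      # placeholder credential
        "fs.s3a.secret.key": secret_key,      # placeholder credential
        "fs.s3a.path.style.access": "true",   # often needed for non-AWS endpoints
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values; replace with a real endpoint and credentials.
xml_text = s3a_core_site(
    "http://s3.example.invalid:8000", "PLACEHOLDER_KEY", "PLACEHOLDER_SECRET"
)
print(xml_text)
```

Because Hive and Spark typically read the same Hadoop configuration, settings like these are what usually let every tool reach the same bucket through `s3a://` paths.
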
The container image is hosted on Docker Hub, so running it is all that is
needed to download it.
The attached document describes the container's contents and how to use it:
https://docs.google.com/document/d/1v5jiarEK6CEU0cstBjpyQ8T5ucTF7LWSwHe-Fct…
I would appreciate your comments.
Gal.