gh2s3 downloads the GitHub Archive data for 2016 and uploads it to an Amazon S3 bucket.
Running it on an Amazon EC2 instance is recommended, for higher bandwidth and lower latency.
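The sketch below shows how much data "2016" covers. It assumes GH Archive's hourly layout, where each hour is one gzipped JSON file named `YYYY-MM-DD-H.json.gz` with an unpadded hour. The host `data.gharchive.org` is the current GH Archive domain and is an assumption here; gh2s3.py may build its URLs differently.

```python
from datetime import date, timedelta

# Assumption: one file per hour, named YYYY-MM-DD-H.json.gz (hour not
# zero-padded). The base URL is illustrative, not taken from gh2s3.py.
BASE_URL = "https://data.gharchive.org"


def hourly_archive_urls(year):
    """Return the URL of every hourly archive file for the given year."""
    urls = []
    day = date(year, 1, 1)
    while day.year == year:
        for hour in range(24):
            # day.isoformat() yields e.g. "2016-01-01"
            urls.append(f"{BASE_URL}/{day.isoformat()}-{hour}.json.gz")
        day += timedelta(days=1)
    return urls


if __name__ == "__main__":
    urls = hourly_archive_urls(2016)
    print(len(urls))  # 8784: 2016 is a leap year, so 366 days * 24 hours
    print(urls[0])
    print(urls[-1])
```

At thousands of files, per-request latency adds up, which is why running close to S3 helps.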
To install the dependencies with Python 3 and pip:
pip install -r requirements.txt
You need to set up your AWS credentials the same way as for the AWS CLI.
Then run:
export S3_BUCKET=YOUR_BUCKET
./gh2s3.py
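A minimal sketch of a download-then-upload loop, assuming the bucket name comes from `S3_BUCKET` as in the `export` step above. The functions `key_for`, `mirror` and `bucket_from_env` are hypothetical, not gh2s3's actual code. The network calls are passed in as `fetch` and `upload` so the loop can run without AWS access.

```python
import os
from typing import Callable, Iterable


def key_for(url: str) -> str:
    """Hypothetical key scheme: use the archive file name as the S3 key."""
    return url.rsplit("/", 1)[-1]


def mirror(urls: Iterable[str],
           bucket: str,
           fetch: Callable[[str], bytes],
           upload: Callable[[str, str, bytes], None]) -> int:
    """Download each URL, upload its bytes to the bucket, return the count."""
    count = 0
    for url in urls:
        data = fetch(url)
        upload(bucket, key_for(url), data)
        count += 1
    return count


def bucket_from_env() -> str:
    """Read the target bucket from the S3_BUCKET environment variable."""
    bucket = os.environ.get("S3_BUCKET")
    if not bucket:
        raise RuntimeError("S3_BUCKET is not set")
    return bucket


if __name__ == "__main__":
    # Offline demo with stand-in callables; nothing touches the network.
    stored = {}
    n = mirror(
        ["https://data.gharchive.org/2016-01-01-0.json.gz"],
        "example-bucket",
        fetch=lambda url: b"placeholder bytes",
        upload=lambda b, k, d: stored.__setitem__((b, k), d),
    )
    print(n, sorted(stored))
```

In a real run, `fetch` could wrap `urllib.request.urlopen(url).read()` and `upload` could call boto3's `s3.put_object(Bucket=bucket, Key=key, Body=data)`. Using boto3 here is an assumption about requirements.txt. boto3 resolves credentials through the same chain as the AWS CLI, which is why AWS CLI-style credentials are enough.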
Alternatively, with Docker:
docker build -t gh2s3 .
docker run \
--rm \
-e "AWS_ACCESS_KEY_ID=YOUR_ID" \
-e "AWS_SECRET_ACCESS_KEY=YOUR_KEY" \
-e "S3_BUCKET=YOUR_BUCKET" \
gh2s3
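Inside the container, the only configuration is the three variables passed with `-e`. The hypothetical pre-flight check below is not part of gh2s3. It lists which of those variables are unset or empty. Outside Docker, credentials may instead come from `~/.aws/credentials` or another AWS CLI source, so this check only makes sense for the environment-variable setup shown above.

```python
import os
from typing import List, Mapping

# The variables passed with -e in the docker run command above.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "S3_BUCKET")


def missing_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    print("missing:", ", ".join(missing) if missing else "none")
```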