gh2s3.py downloads the GitHub Archive data for 2016 and uploads it to an Amazon S3 bucket.
Running it on an Amazon EC2 instance is preferable, since bandwidth and latency to S3 are better there.
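As a rough illustration of what "GitHub Archive 2016 data" covers, the sketch below enumerates one URL per hourly archive file. The host name, the `archive_urls` helper and the idea of iterating hour by hour are assumptions made for this example; they are not taken from gh2s3.py, which may select files differently.

```python
from datetime import datetime, timedelta

# Assumed base URL of the public GH Archive host; not taken from gh2s3.py.
BASE_URL = "https://data.gharchive.org"


def archive_urls(year=2016):
    """Yield one URL per hourly archive file in the given year (hypothetical helper)."""
    current = datetime(year, 1, 1)
    end = datetime(year + 1, 1, 1)
    while current < end:
        # GH Archive names files YYYY-MM-DD-H.json.gz, with an unpadded hour.
        yield f"{BASE_URL}/{current:%Y-%m-%d}-{current.hour}.json.gz"
        current += timedelta(hours=1)


urls = list(archive_urls())
print(len(urls))  # one entry per hour of the year
print(urls[0])
print(urls[-1])
```

Since 2016 is a leap year, a full run would cover 366 days of hourly files, which is one reason the bandwidth of the machine running the script matters.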
With Python 3 and pip:
    pip install -r requirements.txt
You need to set up your AWS credentials the same way as for the AWS CLI.
Then run:
    export S3_BUCKET=YOUR_BUCKET
    ./gh2s3.py
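Before starting a long transfer, it can help to confirm that the settings mentioned above are in place. The following is a minimal sketch, assuming the script reads S3_BUCKET from the environment and that credentials come either from the standard AWS environment variables or from the shared credentials file used by the AWS CLI. The `missing_settings` helper is hypothetical and is not part of gh2s3.py.

```python
import os
from pathlib import Path


def missing_settings(env=None, home=None):
    """Return a list of problems that would likely stop an upload (hypothetical helper)."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    problems = []

    if not env.get("S3_BUCKET"):
        problems.append("S3_BUCKET is not set")

    # Two common credential sources: environment variables and the shared file.
    has_env_keys = bool(
        env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY")
    )
    credentials_file = Path(
        env.get("AWS_SHARED_CREDENTIALS_FILE", home / ".aws" / "credentials")
    )
    if not has_env_keys and not credentials_file.is_file():
        problems.append(
            "no AWS credentials found in the environment or in "
            + str(credentials_file)
        )

    return problems


if __name__ == "__main__":
    for problem in missing_settings():
        print("warning:", problem)
```

The check only prints warnings because it does not cover every credential source. On an EC2 instance, for example, credentials may come from an attached IAM role, in which case neither the variables nor the file need to exist.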
With Docker:
    docker build -t gh2s3 .
    docker run \
        --rm \
        -e "AWS_ACCESS_KEY_ID=YOUR_ID" \
        -e "AWS_SECRET_ACCESS_KEY=YOUR_KEY" \
        -e "S3_BUCKET=YOUR_BUCKET" \
        gh2s3