gh2s3 downloads the GitHub Archive data for 2016 and uploads it to an Amazon S3 bucket.
Running it on an Amazon EC2 instance is recommended, for better bandwidth and lower latency.
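The transfer itself comes down to listing every hourly archive file for 2016 and copying each one into the bucket. gh2s3.py is not reproduced here, so the following is only a minimal sketch. It assumes files come from https://data.gharchive.org/ using GH Archive's YYYY-MM-DD-H.json.gz naming, where the hour is not zero-padded. It also assumes the uploader is a boto3 S3 client. The names hourly_keys, transfer and BASE_URL are hypothetical.

```python
"""Sketch: enumerate GH Archive hourly files for 2016 and stream each one to S3.

Assumptions (not taken from gh2s3.py): files are fetched from
https://data.gharchive.org/ using its YYYY-MM-DD-H.json.gz naming, and
`s3_client` is a boto3 S3 client (or anything with an upload_fileobj method).
"""
import datetime
import urllib.request

BASE_URL = "https://data.gharchive.org/"  # assumed source host


def hourly_keys(year=2016):
    """Yield archive file names for every hour of `year` (hour is not zero-padded)."""
    day = datetime.date(year, 1, 1)
    while day.year == year:
        for hour in range(24):
            yield f"{day.isoformat()}-{hour}.json.gz"
        day += datetime.timedelta(days=1)


def transfer(key, bucket, s3_client, opener=urllib.request.urlopen):
    """Stream one archive file from BASE_URL into `bucket` under the same key."""
    with opener(BASE_URL + key) as response:
        # upload_fileobj reads the HTTP response in chunks, so the file is
        # never written to local disk.
        s3_client.upload_fileobj(response, bucket, key)
```

With boto3 installed, a loop such as `for key in hourly_keys(): transfer(key, "YOUR_BUCKET", boto3.client("s3"))` would copy the whole year, which is 8784 hourly files because 2016 is a leap year. Streaming each response straight into S3 is also one reason an EC2 instance helps: the data never crosses a slow home uplink.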
With Python 3 and pip:
pip install -r requirements.txt
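You need to set up your AWS credentials the same way as for the AWS CLI (for example, in ~/.aws/credentials or through the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables).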
Then run:
export S3_BUCKET=YOUR_BUCKET
./gh2s3.py
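Both the export command and the Docker example pass the destination bucket through the S3_BUCKET environment variable. Below is a minimal sketch of how a script might read that setting and fail early with a clear message when it is missing. The function name get_bucket and the error text are hypothetical. AWS credentials are not read here, on the assumption that boto3 resolves them itself through the same chain the AWS CLI uses.

```python
"""Sketch: read the target bucket from S3_BUCKET, as the commands above supply it.

Assumption: the function name and error message are hypothetical; credentials
are left to the AWS SDK's own lookup (environment variables, ~/.aws/credentials).
"""
import os


def get_bucket(environ=os.environ):
    """Return the S3 bucket name, exiting with a clear message if it is unset."""
    bucket = environ.get("S3_BUCKET", "").strip()
    if not bucket:
        raise SystemExit("S3_BUCKET is not set; run: export S3_BUCKET=YOUR_BUCKET")
    return bucket
```

Taking the mapping as a parameter keeps the check testable without touching the real environment. A normal run would simply call get_bucket().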
With Docker:
docker build -t gh2s3 .
docker run \
--rm \
-e "AWS_ACCESS_KEY_ID=YOUR_ID" \
-e "AWS_SECRET_ACCESS_KEY=YOUR_KEY" \
-e "S3_BUCKET=YOUR_BUCKET" \
gh2s3