# ckanext-s3multipart

An extension that allows client-side multipart uploads of files to Amazon S3. A large file is uploaded in 5 MB parts, up to 4 parts concurrently, with failed parts retried and upload progress reported. Access to S3 is granted only through temporary 60-minute AWS API keys, issued to users who have CKAN permission to create packages.

## Requirements

* The `boto3` Python library
* An Amazon Web Services account for S3 API usage

## Installation

To install ckanext-s3multipart:

1. Activate your CKAN virtual environment, for example:

        . /usr/lib/ckan/default/bin/activate

2. Install the ckanext-s3multipart Python package into your virtual environment:

        git clone https://github.com/maxious/ckanext-s3multipart.git
        cd ckanext-s3multipart
        python setup.py develop
        pip install -r dev-requirements.txt

3. Add `s3multipart` to the `ckan.plugins` setting in your CKAN config file (by default the config file is located at `/etc/ckan/default/production.ini`).

4. Create an S3 bucket and set the bucket name and region in the CKAN config file.

5. Allow CORS access to your bucket. Make sure the CORS settings for your S3 bucket look similar to the configuration below (the PUT allowed method and the ETag exposed header are critical).

        <CORSConfiguration>
            <CORSRule>
                <AllowedOrigin>https://*.yourdomain.com</AllowedOrigin>
                <AllowedOrigin>http://*.yourdomain.com</AllowedOrigin>
                <AllowedMethod>PUT</AllowedMethod>
                <AllowedMethod>POST</AllowedMethod>
                <AllowedMethod>DELETE</AllowedMethod>
                <MaxAgeSeconds>3000</MaxAgeSeconds>
                <ExposeHeader>x-amz-version-id</ExposeHeader>
                <ExposeHeader>ETag</ExposeHeader>
                <AllowedHeader>*</AllowedHeader>
            </CORSRule>
            <CORSRule>
                <AllowedOrigin>*</AllowedOrigin>
                <AllowedMethod>GET</AllowedMethod>
                <AllowedHeader>*</AllowedHeader>
            </CORSRule>
        </CORSConfiguration>

6. Create an S3 access policy. For maximum security, create a custom IAM policy rather than using the Amazon managed policies. You should also insert the name of your bucket in the resource clause to further limit access, e.g. `"Resource": "arn:aws:s3:::bucketname/*"`. (Remove any spaces before the first bracket after copying.)

        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Action": [
                        "s3:GetObject*",
                        "s3:GetBucketLocation",
                        "s3:PutObject*",
                        "s3:*Multipart*"
                    ],
                    "Resource": "*"
                }
            ]
        }

7. Set up an IAM role that has S3 access only. The type of role is "Role for Cross-Account Access -> Provide access between AWS accounts you own" (the account will be accessing itself). You'll need your 12-digit Amazon Account ID from the Billing Information control panel. Then add the role to the CKAN config file (the `ckanext.s3multipart.s3_role` setting takes the role ARN).

8. Set up an IAM user with the S3 access policy and with AWS Security Token Service access to AssumeRole. You should also insert the name of the IAM role in the resource clause to further limit access, e.g. `"Resource": "arn:aws:iam::1234:role/S3MultipartUploadOnly"`.

        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Action": "sts:AssumeRole",
                    "Resource": "*"
                }
            ]
        }

    Make that user's credentials available to boto3, the Python library for AWS, e.g. by setting the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.

9. Restart CKAN. For example, if you've deployed CKAN with Apache on Ubuntu:

        sudo service apache2 reload

Config Settings
===============

    # S3 bucket name
    ckanext.s3multipart.s3_bucket = bucket_name

    # S3 region eg. ap-southeast-2
    ckanext.s3multipart.s3_region = region_name

    # S3 IAM role ARN eg. "arn:aws:iam::$account-id:role/$role-name"
    ckanext.s3multipart.s3_role = arn:aws:iam::1234:role/S3MultipartUploadOnly

    # organisations that can use s3, as url slugs, space separated
    ckanext.s3multipart.enabled_orgs = abc dta dto

TODOs
=====

* Additional key/value metadata, including original portal, user, package and resource id, when uploaded via JS
* Limit file upload size using an IAM policy on content-length headers?
* Prefill metadata like format
* Make persistent download links appear/redirect in the web UI
* Analytics for downloads
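
Example: temporary credentials
==============================

The installation steps describe an IAM user that is allowed to call AssumeRole on an upload-only role, and the introduction mentions 60-minute keys. The following is a minimal sketch of how such keys could be requested with boto3's STS client. It is not taken from the extension's source: the function name `temporary_credentials`, the `sts_client` parameter and the shape of the returned dictionary are assumptions made for illustration.

```python
SESSION_SECONDS = 60 * 60  # the 60-minute key lifetime mentioned above


def temporary_credentials(role_arn, session_name, sts_client=None):
    """Assume the upload-only role and return short-lived keys.

    Hypothetical helper, not part of ckanext-s3multipart.
    ``sts_client`` can be injected so the function runs without AWS access.
    """
    if sts_client is None:
        # Imported lazily; boto3 picks up AWS_ACCESS_KEY_ID and
        # AWS_SECRET_ACCESS_KEY from the environment.
        import boto3
        sts_client = boto3.client("sts")

    response = sts_client.assume_role(
        RoleArn=role_arn,              # e.g. the ckanext.s3multipart.s3_role value
        RoleSessionName=session_name,  # 2-64 characters, appears in AWS audit logs
        DurationSeconds=SESSION_SECONDS,
    )
    creds = response["Credentials"]
    return {
        "access_key": creds["AccessKeyId"],
        "secret_key": creds["SecretAccessKey"],
        "session_token": creds["SessionToken"],
        "expiration": creds["Expiration"],
    }
```

With real credentials in the environment, `temporary_credentials("arn:aws:iam::1234:role/S3MultipartUploadOnly", "ckan-upload")` would return keys that a browser-side uploader could use until they expire. All three values (access key, secret key and session token) are needed to sign requests.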
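
Example: part sizes
===================

To illustrate how a file is divided for a multipart upload, here is a small sketch that computes the byte range of each part. It assumes that "5 MB" means 5 MiB (S3's minimum size for every part except the last) and applies S3's limit of 10,000 parts per upload. The function `part_ranges` and the constants are hypothetical and not part of the extension, where the splitting happens in the browser.

```python
PART_SIZE = 5 * 1024 * 1024  # assumption: "5 MB" means 5 MiB
MAX_PARTS = 10000            # S3 limit on parts in one multipart upload


def part_ranges(file_size, part_size=PART_SIZE):
    """Return a list of (part_number, start, end_exclusive) byte ranges.

    Hypothetical helper for illustration only.
    """
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    count = -(-file_size // part_size)  # ceiling division
    if count > MAX_PARTS:
        raise ValueError("file needs more than %d parts" % MAX_PARTS)
    return [
        (n + 1, n * part_size, min((n + 1) * part_size, file_size))
        for n in range(count)
    ]
```

A 12 MiB file yields three parts, the last one 2 MiB long. Under these assumptions the largest file that fits in 10,000 parts is 50,000 MiB (about 48.8 GiB), so larger files would need a bigger part size.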