README.md 15 KB

s3tk

A security toolkit for Amazon S3

Another day, another leaky Amazon S3 bucket

— The Register, 12 Jul 2017

Don’t be the... next... big... data... leak

Screenshot

:tangerine: Battle-tested at Instacart

Installation

Run:

pip install s3tk

You can use the AWS CLI or AWS Vault to set up your AWS credentials:

pip install awscli
aws configure

See IAM policies needed for each command.

Commands

Scan

Scan your buckets for:

  • ACL open to public
  • policy open to public
  • public access blocked
  • logging enabled
  • versioning enabled
  • default encryption enabled
s3tk scan

Only run on specific buckets

s3tk scan my-bucket my-bucket-2

Also works with wildcards

s3tk scan "my-bucket*"

Confirm correct log bucket(s) and prefix

s3tk scan --log-bucket my-s3-logs --log-bucket other-region-logs --log-prefix "{bucket}/"

Check CloudTrail object-level logging [experimental]

s3tk scan --object-level-logging

Skip logging, versioning, or default encryption

s3tk scan --skip-logging --skip-versioning --skip-default-encryption

Get email notifications of failures (via SNS)

s3tk scan --sns-topic arn:aws:sns:...

List Policy

List bucket policies

s3tk list-policy

Only run on specific buckets

s3tk list-policy my-bucket my-bucket-2

Show named statements

s3tk list-policy --named

Set Policy

Note: This replaces the previous policy

Only private uploads

s3tk set-policy my-bucket --no-object-acl

Delete Policy

Delete policy

s3tk delete-policy my-bucket

Block Public Access

Block public access on specific buckets

s3tk block-public-access my-bucket my-bucket-2

Use the --dry-run flag to test

Enable Logging

Enable logging on all buckets

s3tk enable-logging --log-bucket my-s3-logs

Only on specific buckets

s3tk enable-logging my-bucket my-bucket-2 --log-bucket my-s3-logs

Set log prefix ({bucket}/ by default)

s3tk enable-logging --log-bucket my-s3-logs --log-prefix "logs/{bucket}/"

Use the --dry-run flag to test

A few notes about logging:

  • buckets with logging already enabled are not updated at all
  • the log bucket must in the same region as the source bucket - run this command multiple times for different regions
  • it can take over an hour for logs to show up

Enable Versioning

Enable versioning on all buckets

s3tk enable-versioning

Only on specific buckets

s3tk enable-versioning my-bucket my-bucket-2

Use the --dry-run flag to test

Enable Default Encryption

Enable default encryption on all buckets

s3tk enable-default-encryption

Only on specific buckets

s3tk enable-default-encryption my-bucket my-bucket-2

This does not encrypt existing objects - use the encrypt command for this

Use the --dry-run flag to test

Scan Object ACL

Scan ACL on all objects in a bucket

s3tk scan-object-acl my-bucket

Only certain objects

s3tk scan-object-acl my-bucket --only "*.pdf"

Except certain objects

s3tk scan-object-acl my-bucket --except "*.jpg"

Reset Object ACL

Reset ACL on all objects in a bucket

s3tk reset-object-acl my-bucket

This makes all objects private. See bucket policies for how to enforce going forward.

Use the --dry-run flag to test

Specify certain objects the same way as scan-object-acl

Encrypt

Encrypt all objects in a bucket with server-side encryption

s3tk encrypt my-bucket

Use S3-managed keys by default. For KMS-managed keys, use:

s3tk encrypt my-bucket --kms-key-id arn:aws:kms:...

For customer-provided keys, use:

s3tk encrypt my-bucket --customer-key secret-key

Use the --dry-run flag to test

Specify certain objects the same way as scan-object-acl

Note: Objects will lose any custom ACL

Delete Unencrypted Versions

Delete all unencrypted versions of objects in a bucket

s3tk delete-unencrypted-versions my-bucket

For safety, this will not delete any current versions of objects

Use the --dry-run flag to test

Specify certain objects the same way as scan-object-acl

Scan DNS

Scan Route 53 for buckets to make sure you own them

s3tk scan-dns

Otherwise, you may be susceptible to subdomain takeover

Credentials

Credentials can be specified in ~/.aws/credentials or with environment variables. See this guide for an explanation of environment variables.

You can specify a profile to use with:

AWS_PROFILE=your-profile s3tk

IAM Policies

Here are the permissions needed for each command. Only include statements you need.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Scan",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:GetBucketAcl",
                "s3:GetBucketPolicy",
                "s3:GetBucketPublicAccessBlock",
                "s3:GetBucketLogging",
                "s3:GetBucketVersioning",
                "s3:GetEncryptionConfiguration"
            ],
            "Resource": "*"
        },
        {
            "Sid": "ScanObjectLevelLogging",
            "Effect": "Allow",
            "Action": [
                "cloudtrail:ListTrails",
                "cloudtrail:GetTrail",
                "cloudtrail:GetEventSelectors",
                "s3:GetBucketLocation"
            ],
            "Resource": "*"
        },
        {
            "Sid": "ScanDNS",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "route53:ListHostedZones",
                "route53:ListResourceRecordSets"
            ],
            "Resource": "*"
        },
        {
            "Sid": "ListPolicy",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:GetBucketPolicy"
            ],
            "Resource": "*"
        },
        {
            "Sid": "SetPolicy",
            "Effect": "Allow",
            "Action": [
                "s3:PutBucketPolicy"
            ],
            "Resource": "*"
        },
        {
            "Sid": "DeletePolicy",
            "Effect": "Allow",
            "Action": [
                "s3:DeleteBucketPolicy"
            ],
            "Resource": "*"
        },
        {
            "Sid": "BlockPublicAccess",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:PutBucketPublicAccessBlock"
            ],
            "Resource": "*"
        },
        {
            "Sid": "EnableLogging",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:PutBucketLogging"
            ],
            "Resource": "*"
        },
        {
            "Sid": "EnableVersioning",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:PutBucketVersioning"
            ],
            "Resource": "*"
        },
        {
            "Sid": "EnableDefaultEncryption",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:PutEncryptionConfiguration"
            ],
            "Resource": "*"
        },
        {
            "Sid": "ResetObjectAcl",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetObjectAcl",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "arn:aws:s3:::my-bucket",
                "arn:aws:s3:::my-bucket/*"
            ]
        },
        {
            "Sid": "Encrypt",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::my-bucket",
                "arn:aws:s3:::my-bucket/*"
            ]
        },
        {
            "Sid": "DeleteUnencryptedVersions",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucketVersions",
                "s3:GetObjectVersion",
                "s3:DeleteObjectVersion"
            ],
            "Resource": [
                "arn:aws:s3:::my-bucket",
                "arn:aws:s3:::my-bucket/*"
            ]
        }
    ]
}

Access Logs

Amazon Athena is great for querying S3 logs. Create a table (thanks to this post for the table structure) with:

CREATE EXTERNAL TABLE my_bucket (
    bucket_owner string,
    bucket string,
    time string,
    remote_ip string,
    requester string,
    request_id string,
    operation string,
    key string,
    request_verb string,
    request_url string,
    request_proto string,
    status_code string,
    error_code string,
    bytes_sent string,
    object_size string,
    total_time string,
    turn_around_time string,
    referrer string,
    user_agent string,
    version_id string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
    'serialization.format' = '1',
    'input.regex' = '([^ ]*) ([^ ]*) \\[(.*?)\\] ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) \\\"([^ ]*) ([^ ]*) (- |[^ ]*)\\\" (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\\") ([^ ]*)$'
) LOCATION 's3://my-s3-logs/my-bucket/';

Change the last line to point to your log bucket (and prefix) and query away

SELECT
    date_parse(time, '%d/%b/%Y:%H:%i:%S +0000') AS time,
    request_url,
    remote_ip,
    user_agent
FROM
    my_bucket
WHERE
    requester = '-'
    AND status_code LIKE '2%'
    AND request_url LIKE '/some-keys%'
ORDER BY 1

CloudTrail Logs

Amazon Athena is also great for querying CloudTrail logs. Create a table (thanks to this post for the table structure) with:

CREATE EXTERNAL TABLE cloudtrail_logs (
    eventversion STRING,
    userIdentity STRUCT<
        type:STRING,
        principalid:STRING,
        arn:STRING,
        accountid:STRING,
        invokedby:STRING,
        accesskeyid:STRING,
        userName:String,
        sessioncontext:STRUCT<
            attributes:STRUCT<
                mfaauthenticated:STRING,
                creationdate:STRING>,
            sessionIssuer:STRUCT<
                type:STRING,
                principalId:STRING,
                arn:STRING,
                accountId:STRING,
                userName:STRING>>>,
    eventTime STRING,
    eventSource STRING,
    eventName STRING,
    awsRegion STRING,
    sourceIpAddress STRING,
    userAgent STRING,
    errorCode STRING,
    errorMessage STRING,
    requestId  STRING,
    eventId  STRING,
    resources ARRAY<STRUCT<
        ARN:STRING,
        accountId:STRING,
        type:STRING>>,
    eventType STRING,
    apiVersion  STRING,
    readOnly BOOLEAN,
    recipientAccountId STRING,
    sharedEventID STRING,
    vpcEndpointId STRING,
    requestParameters STRING,
    responseElements STRING,
    additionalEventData STRING,
    serviceEventDetails STRING
)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED  AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION  's3://my-cloudtrail-logs/'

Change the last line to point to your CloudTrail log bucket and query away

SELECT
    eventTime,
    eventName,
    userIdentity.userName,
    requestParameters
FROM
    cloudtrail_logs
WHERE
    eventName LIKE '%Bucket%'
ORDER BY 1

Best Practices

Keep things simple and follow the principle of least privilege to reduce the chance of mistakes.

  • Strictly limit who can perform bucket-related operations
  • Avoid mixing objects with different permissions in the same bucket (use a bucket policy to enforce this)
  • Don’t specify public read permissions on a bucket level (no GetObject in bucket policy)
  • Monitor configuration frequently for changes

Bucket Policies

Only private uploads

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObjectAcl",
            "Resource": "arn:aws:s3:::my-bucket/*"
        }
    ]
}

Performance

For commands that iterate over bucket objects (scan-object-acl, reset-object-acl, encrypt, and delete-unencrypted-versions), run s3tk on an EC2 server for minimum latency.

Notes

The set-policy, block-public-access, enable-logging, enable-versioning, and enable-default-encryption commands are provided for convenience. We recommend Terraform for managing your buckets.

resource "aws_s3_bucket" "my_bucket" {
  bucket = "my-bucket"
  acl    = "private"

  logging {
    target_bucket = "my-s3-logs"
    target_prefix = "my-bucket/"
  }

  versioning {
    enabled = true
  }
}

resource "aws_s3_bucket_public_access_block" "my_bucket" {
  bucket = "${aws_s3_bucket.my_bucket.id}"

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

Upgrading

Run:

pip install s3tk --upgrade

To use master, run:

pip install git+https://github.com/ankane/s3tk.git --upgrade

Docker

Run:

docker run -it ankane/s3tk aws configure

Commit your credentials:

docker commit $(docker ps -l -q) my-s3tk

And run:

docker run -it my-s3tk s3tk scan

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

To get started with development:

git clone https://github.com/ankane/s3tk.git
cd s3tk
pip install -r requirements.txt