Amazon S3 Find and Forget is a solution you deploy in your own AWS account using AWS CloudFormation. There is no charge for the solution itself: you pay only for the AWS services it uses when it runs. This page outlines those services and gives examples of the charges you should expect for typical usage.
Disclaimer
You are responsible for the cost of the AWS services used while running this deployment. There is no additional cost for using the solution. For full details, see the pricing pages for each AWS service you will be using. Prices are subject to change.
The Amazon S3 Find and Forget solution uses a serverless computing architecture. This model minimises costs when you're not actively using the solution, and lets the solution scale while you pay only for what you use.
The sample VPC provided in this solution makes use of VPC Endpoints, which have an hourly cost as well as a data transfer cost. All other costs depend on how much you use the API. For typical usage, the greatest proportion of what you pay will be for Amazon Athena, Amazon S3 and AWS Fargate.
The Forget phase of the solution uses AWS Fargate. You pay for the time that Fargate tasks run during this phase.
The AWS Fargate cost is affected by the number of Fargate tasks you choose to run concurrently, and their configuration (vCPU and memory). You can configure these parameters when deploying the solution.
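As a rough illustration of how task count, task size and run time combine, the following minimal sketch estimates a Fargate charge. The function name `fargate_cost` and the two hourly rates are assumptions made for this example and are not stated in this document; the rates were chosen because they reproduce the Fargate figures in the scenario tables later on this page. Check the AWS Fargate pricing page for current values in your region. The sketch ignores per-second billing granularity and any storage charges.

```python
# Assumed on-demand Fargate rates in USD (not stated in this document).
VCPU_HOUR_RATE = 0.04048   # per vCPU per hour (assumption)
GB_HOUR_RATE = 0.004445    # per GB of memory per hour (assumption)


def fargate_cost(tasks, vcpus, memory_gb, hours,
                 vcpu_hour_rate=VCPU_HOUR_RATE, gb_hour_rate=GB_HOUR_RATE):
    """Estimate the charge for `tasks` identical tasks, each running for `hours`."""
    # Cost of running one task for one hour: CPU share plus memory share.
    per_task_hour = vcpus * vcpu_hour_rate + memory_gb * gb_hour_rate
    return tasks * hours * per_task_hour


if __name__ == "__main__":
    # Configuration taken from the first scenario table:
    # 3 tasks x 4 vCPUs, 30GB memory x 1 hour.
    print(f"${fargate_cost(3, 4, 30, 1):.2f}")
```

Because the cost is linear in every input, doubling the number of concurrent tasks only pays off if it roughly halves the job duration.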
AWS Glue Data Catalog is used by the solution to define data mappers. You pay a monthly fee based on the number of objects stored in the data catalog, and for requests made to the AWS Glue service when the solution runs.
AWS Lambda functions are used throughout the solution. You pay for the requests to, and execution time of, these functions. Functions execute when you use the solution web interface or API, and when a deletion job runs.
AWS Step Functions Standard Workflows are used when a deletion job runs. You pay for the number of state transitions in the Step Functions workflow. The number of state transitions increases with the number of data mappers, and partitions in those data mappers, included in a deletion job.
Amazon API Gateway is used to provide the solution web interface and API. You pay for requests made when using the web interface or API, and any data transferred out.
Amazon Athena scans your data lake during the Find phase of a deletion job. You pay for the Athena queries run based on the amount of data scanned.
You can achieve significant cost savings and performance gains by reducing the quantity of data Athena needs to scan per query: compress your data, partition it, and convert it to a columnar format. See Supported Data Formats for more information regarding supported data and compression formats.
The Amazon Athena Pricing page contains an overview of prices and provides a calculator to estimate the Athena query cost for each deletion job run based on the data lake size. See Using Workgroups to Control Query Access and Costs for more information on using workgroups to set limits on the amount of data each query or the entire workgroup can process, and to track costs.
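To make the relationship between data scanned and cost concrete, here is a minimal sketch of the calculation. The rate of \$5 per TB scanned and the conversion of 1 TB = 1024 GB are assumptions made for this example, not figures stated in this document, and the function name `athena_cost` is hypothetical. With these assumptions the result matches the Athena figures for the Parquet scenarios later on this page. Per-query minimum charges are ignored.

```python
ATHENA_USD_PER_TB = 5.00  # assumed price per TB scanned; check the Athena pricing page
GB_PER_TB = 1024          # assumed conversion factor


def athena_cost(gb_scanned, usd_per_tb=ATHENA_USD_PER_TB):
    """Estimate the Athena charge for scanning `gb_scanned` gigabytes."""
    return gb_scanned / GB_PER_TB * usd_per_tb


if __name__ == "__main__":
    # Scanned volumes taken from the Parquet scenario tables.
    for gb in (6.8, 10, 156, 671):
        print(f"{gb}GB scanned -> ${athena_cost(gb):.2f}")
```

Since the charge scales with bytes scanned rather than with data lake size, columnar formats and partitioning reduce cost directly.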
If you choose to deploy a CloudFront distribution for the solution interface, you will pay CloudFront charges for requests and data transferred when you access the web interface.
Amazon Cognito provides authentication to secure access to the API using an administrative user created during deployment. You pay a monthly fee for active users in the Cognito User Pool.
Amazon DynamoDB stores internal state data for the solution. All tables created by the solution use the on-demand capacity mode of pricing. You pay for storage used by these tables, and DynamoDB capacity used when interacting with the solution web interface, API, or running a deletion job.
Four types of charges occur when working with Amazon S3: Storage, Requests and data retrievals, Data Transfer, and Management.
Uses of Amazon S3 in the solution include:
The solution uses standard and FIFO SQS queues to handle internal state during a deletion job. You pay for the number of requests made to SQS. The number of requests increases with the number of data mappers, partitions in those data mappers, and the number of Amazon S3 objects processed in a deletion job.
Amazon VPC provides network connectivity for AWS Fargate tasks that run during the Forget phase.
How you build the VPC will determine the prices you pay. For example, VPC Endpoints and NAT Gateways are two different ways to provide network access to the solution's dependencies. The two options have different hourly prices and different costs for data transferred.
The sample VPC provided in this solution makes use of VPC Endpoints, which have an hourly cost as well as a data transfer cost. You can choose to use this sample VPC; however, it may be more cost-efficient to use an existing suitable VPC in your account if you have one.
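Both networking options follow the same cost shape: a fixed hourly charge per provisioned component plus a charge per GB transferred. The sketch below shows how you might compare them. The function name `network_cost` is hypothetical and every rate in the example is a placeholder chosen for illustration, not a real AWS price; substitute the values from the Amazon VPC and AWS PrivateLink pricing pages for your region.

```python
def network_cost(hourly_rate, per_gb_rate, hours, gb_transferred, units=1):
    """Fixed hourly charge for `units` components plus a per-GB data charge."""
    return units * hourly_rate * hours + per_gb_rate * gb_transferred


if __name__ == "__main__":
    hours_in_month = 730
    gb_transferred = 500

    # Placeholder rates (hypothetical, for illustration only).
    endpoints = network_cost(0.01, 0.01, hours_in_month, gb_transferred, units=6)
    nat_gateway = network_cost(0.05, 0.05, hours_in_month, gb_transferred, units=1)

    print(f"VPC Endpoints: ${endpoints:.2f}")
    print(f"NAT Gateway:   ${nat_gateway:.2f}")
```

The `units` parameter matters because an endpoint-based design typically needs one endpoint per AWS service, while the per-GB term dominates for jobs that move a lot of data.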
During deployment, the solution uses AWS CodeBuild, AWS CodePipeline and AWS Lambda custom resources to deploy the frontend and the backend. AWS Fargate uses Amazon Elastic Container Registry to store container images.
You are responsible for the cost of the AWS services used while running this solution. As of the date of publication of this version of the source code, the estimated cost to run a job with different data lake configurations in the Europe (Ireland) region is shown in the tables below. The estimates do not include VPC costs.
Scenario | Summary |
---|---|
Scenario 1 | 100GB Snappy Parquet |
Scenario 2 | 750GB Snappy Parquet |
Scenario 3 | 10TB Snappy Parquet |
Scenario 4 | 50TB Snappy Parquet |
Scenario 5 | 100GB Gzip JSON |
This example shows how the charges would be calculated for a deletion job where:
Service | Spending | Notes |
---|---|---|
Amazon Athena | \$0.03 | 6.8GB of data scanned |
AWS Fargate | \$0.89 | 3 tasks x 4 vCPUs, 30GB memory x 1 hour |
Amazon S3 | \$0.01 | \$0.01 of requests and data retrieval. \$0 of data transfer |
Other services | \$0.05 | n/a |
Total | \$0.98 | n/a |
Note: This estimate doesn't include the costs for Amazon VPC
This example shows how the charges would be calculated for a deletion job where:
Service | Spending | Notes |
---|---|---|
Amazon Athena | \$0.05 | 10GB of data scanned |
AWS Fargate | \$11.07 | 50 tasks x 4 vCPUs, 30GB memory x 0.75 hours |
Amazon S3 | \$0.01 | \$0.01 of requests and data retrieval. \$0 of data transfer |
Other services | \$0.01 | n/a |
Total | \$11.14 | n/a |
Note: This estimate doesn't include the costs for Amazon VPC
This example shows how the charges would be calculated for a deletion job where:
Service | Spending | Notes |
---|---|---|
Amazon Athena | \$0.76 | 156GB of data scanned |
AWS Fargate | \$73.82 | 100 tasks x 4 vCPUs, 30GB memory x 2.5 hours |
Amazon S3 | \$0.11 | \$0.11 of requests and data retrieval. \$0 of data transfer |
Other services | \$1 | n/a |
Total | \$75.69 | n/a |
Note: This estimate doesn't include the costs for Amazon VPC
This example shows how the charges would be calculated for a deletion job where:
Service | Spending | Notes |
---|---|---|
Amazon Athena | \$3.28 | 671GB of data scanned |
AWS Fargate | \$310.03 | 100 tasks x 4 vCPUs, 30GB memory x 10.5 hours |
Amazon S3 | \$0.49 | \$0.49 of requests and data retrieval. \$0 of data transfer |
Other services | \$3 | n/a |
Total | \$316.80 | n/a |
Note: This estimate doesn't include the costs for Amazon VPC
This example shows how the charges would be calculated for a deletion job where:
Service | Spending | Notes |
---|---|---|
Amazon Athena | \$0.50 | 100GB of data scanned |
AWS Fargate | \$5.31 | 50 tasks x 4 vCPUs, 30GB memory x 0.36 hours |
Amazon S3 | \$0.03 | \$0.03 of requests and data retrieval. \$0 of data transfer |
Other services | \$0.05 | n/a |
Total | \$5.89 | n/a |
Note: This estimate doesn't include the costs for Amazon VPC
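Each scenario total is the sum of its per-service rows. The short check below reproduces that arithmetic using only the figures from the five tables above; the names `SCENARIOS` and `scenario_total` are introduced here for illustration. `Decimal` is used so that the cents add up exactly.

```python
from decimal import Decimal

# scenario number -> (Athena, Fargate, S3, other services), stated total
SCENARIOS = {
    1: (("0.03", "0.89", "0.01", "0.05"), "0.98"),
    2: (("0.05", "11.07", "0.01", "0.01"), "11.14"),
    3: (("0.76", "73.82", "0.11", "1"), "75.69"),
    4: (("3.28", "310.03", "0.49", "3"), "316.80"),
    5: (("0.50", "5.31", "0.03", "0.05"), "5.89"),
}


def scenario_total(rows):
    """Sum the per-service charges of one scenario."""
    return sum((Decimal(r) for r in rows), Decimal("0"))


if __name__ == "__main__":
    for number, (rows, stated) in SCENARIOS.items():
        total = scenario_total(rows)
        status = "ok" if total == Decimal(stated) else "mismatch"
        print(f"Scenario {number}: ${total} ({status})")
```

In every scenario AWS Fargate accounts for the large majority of the total, so task count and job duration are the main levers on cost.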