# Cost Overview Amazon S3 Find and Forget is a solution you deploy in your own AWS account using [AWS CloudFormation]. There is no charge for the solution: you pay only for the AWS services used to run the solution. This page outlines the services used by the solution, and examples of the charges you should expect for typical usage of the solution. > **Disclaimer** > > You are responsible for the cost of the AWS services used while running this > deployment. There is no additional cost for using the solution. For full > details, see the following pricing pages for each AWS service you will be > using. Prices are subject to change. ## Index - [Overview](#overview) - [AWS Fargate](#aws-fargate) - [AWS Glue](#aws-glue) - [AWS Lambda](#aws-lambda) - [AWS Step Functions](#aws-step-functions) - [Amazon API Gateway](#amazon-api-gateway) - [Amazon Athena](#amazon-athena) - [Amazon CloudFront](#amazon-cloudfront) - [Amazon Cognito](#amazon-cognito) - [Amazon DynamoDB](#amazon-dynamodb) - [Amazon S3](#amazon-s3) - [Amazon SQS](#amazon-sqs) - [Amazon VPC](#amazon-vpc) - [Other Supporting Services](#other-supporting-services) - [Solution Cost Estimate](#solution-cost-estimate) - [Scenario 1](#scenario-1) - [Scenario 2](#scenario-2) - [Scenario 3](#scenario-3) - [Scenario 4](#scenario-4) - [Scenario 5](#scenario-5) ## Overview The Amazon S3 Find and Forget solution uses a serverless computing architecture. This model minimises costs when you're not actively using the solution, and allows the solution to scale while only paying for what you use. The sample VPC provided in this solution makes use of VPC Endpoints, which have an hourly cost as well as data transfer cost. All the other costs depend on the usage of the API, and for typical usage, the greatest proportion of what you pay will be for use of Amazon Athena, Amazon S3 and AWS Fargate. ### AWS Fargate The Forget phase of the solution uses AWS Fargate. Using Fargate, you pay for the duration that Fargate tasks run during the Forget phase. The AWS Fargate cost is affected by the number of Fargate tasks you choose to run concurrently, and their configuration (vCPU and memory). You can configure these parameters when deploying the Solution. [AWS Fargate Pricing] ### AWS Glue AWS Glue Data Catalog is used by the solution to define data mappers. You pay a monthly fee based on the number of objects stored in the data catalog, and for requests made to the AWS Glue service when the solution runs. [AWS Glue Pricing] ### AWS Lambda AWS Lambda Functions are used throughout the solution. You pay for the requests to, and execution time of, these functions. Functions execute when using the solution web interface, API, and when a deletion job runs. [AWS Lambda Pricing] ### AWS Step Functions AWS Step Functions Standard Workflows are used when a deletion job runs. You pay for the amount of state transitions in the Step Function Workflow. The number of state transitions will increase with the number of data mappers, and partitions in those data mappers, included in a deletion job. [AWS Step Functions Pricing][deletion job workflow] ### Amazon API Gateway Amazon API Gateway is used to provide the solution web interface and API. You pay for requests made when using the web interface or API, and any data transferred out. [Amazon API Gateway Pricing] ### Amazon Athena Amazon Athena scans your data lake during the _Find phase_ of a deletion job. You pay for the Athena queries run based on the amount of data scanned. You can achieve significant cost savings and performance gains by reducing the quantity of data Athena needs to scan per query by using compression, partitioning and conversion of your data to a columnar format. See [Supported Data Formats](LIMITS.md#supported-data-formats) for more information regarding supported data and compression formats. The [Amazon Athena Pricing] page contains an overview of prices and provides a calculator to estimate the Athena query cost for each deletion job run based on the Data Lake size. See [Using Workgroups to Control Query Access and Costs] for more information on using workgroups to set limits on the amount of data each query or the entire workgroup can process, and to track costs. ### Amazon CloudFront If you choose to deploy a CloudFront distribution for the solution interface, you will pay CloudFront charges for requests and data transferred when you access the web interface. [Amazon CloudFront Pricing] ### Amazon Cognito Amazon Cognito provides authentication to secure access to the API using an administrative user created during deployment. You pay a monthly fee for active users in the Cognito User Pool. [Amazon Cognito Pricing] ### Amazon DynamoDB Amazon DynamoDB stores internal state data for the solution. All tables created by the solution use the on-demand capacity mode of pricing. You pay for storage used by these tables, and DynamoDB capacity used when interacting with the solution web interface, API, or running a deletion job. - [Amazon DynamoDB Pricing] - [Solution Persistence Layer] ### Amazon S3 Four types of charges occur when working with Amazon S3: Storage, Requests and data retrievals, Data Transfer, and Management. Uses of Amazon S3 in the solution include: - The solution web interface is deployed to, and served, from an S3 Bucket - During the _Find_ phase, Amazon Athena will: 1. Retrieve data from Amazon S3 for the columns defined in the data mapper 1. Store its results in an S3 bucket - During the _Forget_ phase, a program run in AWS Fargate processes each object identified in the Find phase will: 1. Retrieve the entire object and its metadata 1. Create a new version of the file, and PUT this object to a staging bucket 1. Delete the original object 1. Copy the updated object from the staging bucket to the data bucket, and sets any metadata identified from the original object 1. Delete the object from the staging bucket - Some artefacts, and state data relating to AWS Step Functions Workflows may be stored in S3 [Amazon S3 Pricing] ### Amazon SQS The solution uses standard and FIFO SQS queues to handle internal state during a deletion job. You pay for the number of requests made to SQS. The number of requests increases with the number of data mappers, partitions in those data mappers, and the number of Amazon S3 objects processed in a deletion job. [Amazon SQS Pricing] ### Amazon VPC Amazon VPC provides network connectivity for AWS Fargate tasks that run during the _Forget_ phase. How you build the VPC will determine the prices you pay. For example, VPC Endpoints and NAT Gateways are two different ways to provide network access to the solutions' dependencies. Both ways have different hourly prices and costs for data transferred. The sample VPC provided in this solution makes use of VPC Endpoints, which have an hourly cost as well as data transfer cost. You can choose to use this sample VPC, however it may be more cost-efficient to use an existing suitable VPC in your account if you have one. - [Amazon VPC Pricing] - [AWS PrivateLink Pricing] ### Other Supporting Services During deployment, the solution uses [AWS CodeBuild], [AWS CodePipeline] and [AWS Lambda] custom resources to deploy the frontend and the backend. [AWS Fargate] uses [Amazon Elastic Container Registry] to store container images. ## Solution Cost Estimate You are responsible for the cost of the AWS services used while running this solution. As of the date of publication of this version of the source code, the estimated cost to run a job with different Data Lake configurations in the Europe (Ireland) region is shown in the tables below. The estimates do not include VPC costs. | Summary | | | ------------------------- | -------------------- | | [Scenario 1](#scenario-1) | 100GB Snappy Parquet | | [Scenario 2](#scenario-2) | 750GB Snappy Parquet | | [Scenario 3](#scenario-3) | 10TB Snappy Parquet | | [Scenario 4](#scenario-4) | 50TB Snappy Parquet | | [Scenario 5](#scenario-5) | 100GB Gzip JSON | ### Scenario 1 This example shows how the charges would be calculated for a deletion job where: - Your dataset is 100GB of Snappy compressed Parquet objects that are distributed across 2 Partitions - The S3 bucket containing the objects is in the same region as the S3 Find and Forget Solution - The total size of the data held in the column queried by Athena is 6.8GB - The Find phase returns 15 objects which need to be modified - The Forget phase uses 3 Fargate tasks with 4 vCPUs and 30GB of memory each, running concurrently for 60 minutes | Service | Spending | Notes | | -------------- | -------- | ----------------------------------------------------------- | | Amazon Athena | \$0.03 | 6.8GB of data scanned | | AWS Fargate | \$0.89 | 3 tasks x 4 vCPUs, 30GB memory x 1 hour | | Amazon S3 | \$0.01 | \$0.01 of requests and data retrieval. \$0 of data transfer | | Other services | \$0.05 | n/a | | Total | \$0.98 | n/a | > Note: This estimate doesn't include the costs for Amazon VPC ### Scenario 2 This example shows how the charges would be calculated for a deletion job where: - Your dataset is 750GB of Snappy compressed Parquet objects that are distributed across 1000 Partitions - The S3 bucket containing the objects is in the same region as the S3 Find and Forget Solution - The total size of the data held in the column queried by Athena is 10GB - The Find phase returns 1000 objects which need to be modified - The Forget phase uses 50 Fargate tasks with 4 vCPUs and 30GB of memory each, running concurrently for 45 minutes | Service | Spending | Notes | | -------------- | -------- | ----------------------------------------------------------- | | Amazon Athena | \$0.05 | 10GB of data scanned | | AWS Fargate | \$11.07 | 50 tasks x 4 vCPUs, 30GB memory x 0.75 hours | | Amazon S3 | \$0.01 | \$0.01 of requests and data retrieval. \$0 of data transfer | | Other services | \$0.01 | n/a | | Total | \$11.14 | n/a | > Note: This estimate doesn't include the costs for Amazon VPC ### Scenario 3 This example shows how the charges would be calculated for a deletion job where: - Your dataset is 10TB of Snappy compressed Parquet objects that are distributed across 2000 Partitions - The S3 bucket containing the objects is in the same region as the S3 Find and Forget Solution - The total size of the data held in the column queried by Athena is 156GB - The Find phase returns 11000 objects which need to be modified - The Forget phase uses 100 Fargate tasks with 4 vCPUs and 30GB of memory each, running concurrently for 150 minutes | Service | Spending | Notes | | -------------- | -------- | ----------------------------------------------------------- | | Amazon Athena | \$0.76 | 156GB of data scanned | | AWS Fargate | \$73.82 | 100 tasks x 4 vCPUs, 30GB memory x 2.5 hours | | Amazon S3 | \$0.11 | \$0.11 of requests and data retrieval. \$0 of data transfer | | Other services | \$1 | n/a | | Total | \$75.69 | n/a | > Note: This estimate doesn't include the costs for Amazon VPC ### Scenario 4 This example shows how the charges would be calculated for a deletion job where: - Your dataset is 50TB of Snappy compressed Parquet objects that are distributed across 5300 Partitions - The S3 bucket containing the objects is in the same region as the S3 Find and Forget Solution - The total size of the data held in the column queried by Athena is 671GB - The Find phase returns 45300 objects which need to be modified - The Forget phase uses 100 Fargate tasks with 4 vCPUs and 30GB of memory each, running concurrently for 10.5 hours | Service | Spending | Notes | | -------------- | -------- | ----------------------------------------------------------- | | Amazon Athena | \$3.28 | 671GB of data scanned | | AWS Fargate | \$310.03 | 100 tasks x 4 vCPUs, 30GB memory x 10.5 hours | | Amazon S3 | \$0.49 | \$0.49 of requests and data retrieval. \$0 of data transfer | | Other services | \$3 | n/a | | Total | \$316.80 | n/a | > Note: This estimate doesn't include the costs for Amazon VPC ### Scenario 5 This example shows how the charges would be calculated for a deletion job where: - Your dataset is 100GB of Gzip compressed JSON objects that are distributed across 310 Partitions - The S3 bucket containing the objects is in the same region as the S3 Find and Forget Solution - The Find phase returns 3500 objects which need to be modified - The Forget phase uses 50 Fargate tasks with 4 vCPUs and 30GB of memory each, running concurrently for 22 minutes | Service | Spending | Notes | | -------------- | -------- | ----------------------------------------------------------- | | Amazon Athena | \$0.50 | 100GB of data scanned | | AWS Fargate | \$5.31 | 50 tasks x 4 vCPUs, 30GB memory x 0.36 hours | | Amazon S3 | \$0.03 | \$0.03 of requests and data retrieval. \$0 of data transfer | | Other services | \$0.05 | n/a | | Total | \$5.89 | n/a | > Note: This estimate doesn't include the costs for Amazon VPC [aws cloudformation]: https://aws.amazon.com/cloudformation/ [aws codebuild]: https://aws.amazon.com/codebuild/pricing/ [aws codepipeline]: https://aws.amazon.com/codepipeline/pricing/ [aws fargate pricing]: https://aws.amazon.com/fargate/pricing/ [aws fargate]: https://aws.amazon.com/fargate/pricing/ [aws glue pricing]: https://aws.amazon.com/glue/pricing/ [aws lambda pricing]: https://aws.amazon.com/lambda/pricing/ [aws lambda]: https://aws.amazon.com/lambda/pricing/ [aws privatelink pricing]: https://aws.amazon.com/privatelink/pricing/ [aws step functions pricing]: https://aws.amazon.com/step-functions/pricing/ [amazon api gateway pricing]: https://aws.amazon.com/api-gateway/pricing/ [amazon athena pricing]: https://aws.amazon.com/athena/pricing/ [amazon cloudfront pricing]: https://aws.amazon.com/cloudfront/pricing/ [amazon cognito pricing]: https://aws.amazon.com/cognito/pricing/ [amazon dynamodb pricing]: https://aws.amazon.com/dynamodb/pricing/ [amazon elastic container registry]: https://aws.amazon.com/ecr/pricing/ [amazon s3 pricing]: https://aws.amazon.com/s3/pricing/ [amazon sqs pricing]: https://aws.amazon.com/sqs/pricing/ [amazon vpc pricing]: https://aws.amazon.com/vpc/pricing/ [deletion job workflow]: ARCHITECTURE.md#deletion-job-workflow [solution persistence layer]: ARCHITECTURE.md#persistence-layer [using workgroups to control query access and costs]: https://docs.aws.amazon.com/athena/latest/ug/manage-queries-control-costs-with-workgroups.html [vpc configuration]: USER_GUIDE.md#pre-requisite-Configuring-a-vpc-for-the-solution [some VPC endpoints]: [https://github.com/awslabs/amazon-s3-find-and-forget/blob/master/templates/vpc.yaml]