|
@@ -0,0 +1,942 @@
|
|
|
+# User Guide
|
|
|
+
|
|
|
+This section describes how to install, configure and use the Amazon S3 Find and
|
|
|
+Forget solution.
|
|
|
+
|
|
|
+## Index
|
|
|
+
|
|
|
+- [User Guide](#user-guide)
|
|
|
+ - [Index](#index)
|
|
|
+ - [Pre-requisites](#pre-requisites)
|
|
|
+ - [Configuring a VPC for the Solution](#configuring-a-vpc-for-the-solution)
|
|
|
+ - [Creating a New VPC](#creating-a-new-vpc)
|
|
|
+ - [Using an Existing VPC](#using-an-existing-vpc)
|
|
|
+ - [Provisioning Data Access IAM Roles](#provisioning-data-access-iam-roles)
|
|
|
+ - [Deploying the Solution](#deploying-the-solution)
|
|
|
+ - [Accessing the application](#accessing-the-application)
|
|
|
+ - [Logging in for the first time (only relevant if the Web UI is deployed)](#logging-in-for-the-first-time-only-relevant-if-the-web-ui-is-deployed)
|
|
|
+ - [Managing users (only relevant if Cognito is chosen for authentication)](#managing-users-only-relevant-if-cognito-is-chosen-for-authentication)
|
|
|
+ - [Making authenticated API requests](#making-authenticated-api-requests)
|
|
|
+ - [Cognito](#cognito)
|
|
|
+ - [IAM](#iam)
|
|
|
+ - [Integrating the solution with other applications using CloudFormation stack outputs](#integrating-the-solution-with-other-applications-using-cloudformation-stack-outputs)
|
|
|
+ - [Configuring Data Mappers](#configuring-data-mappers)
|
|
|
+ - [AWS Lake Formation Configuration](#aws-lake-formation-configuration)
|
|
|
+ - [Data Mapper Creation](#data-mapper-creation)
|
|
|
+ - [Granting Access to Data](#granting-access-to-data)
|
|
|
+ - [Updating your Bucket Policy](#updating-your-bucket-policy)
|
|
|
+ - [Data Encrypted with a Customer Managed CMK](#data-encrypted-with-a-customer-managed-cmk)
|
|
|
+ - [Adding to the Deletion Queue](#adding-to-the-deletion-queue)
|
|
|
+ - [Running a Deletion Job](#running-a-deletion-job)
|
|
|
+ - [Deletion Job Statuses](#deletion-job-statuses)
|
|
|
+ - [Deletion Job Event Types](#deletion-job-event-types)
|
|
|
+ - [Adjusting Configuration](#adjusting-configuration)
|
|
|
+ - [Updating the Solution](#updating-the-solution)
|
|
|
+ - [Identify current solution version](#identify-current-solution-version)
|
|
|
+ - [Identify the Stack URL to deploy](#identify-the-stack-url-to-deploy)
|
|
|
+ - [Minor Upgrades: Perform CloudFormation Stack Update](#minor-upgrades-perform-cloudformation-stack-update)
|
|
|
+ - [Major Upgrades: Manual Rolling Deployment](#major-upgrades-manual-rolling-deployment)
|
|
|
+ - [Deleting the Solution](#deleting-the-solution)
|
|
|
+
|
|
|
+## Pre-requisites
|
|
|
+
|
|
|
+### Configuring a VPC for the Solution
|
|
|
+
|
|
|
+The Fargate tasks used by this solution to perform deletions must be able to
|
|
|
+access the following AWS services, either via an Internet Gateway or via [VPC
|
|
|
+Endpoints]:
|
|
|
+
|
|
|
+- Amazon S3 (gateway endpoint _com.amazonaws.**region**.s3_)
|
|
|
+- Amazon DynamoDB (gateway endpoint _com.amazonaws.**region**.dynamodb_)
|
|
|
+- Amazon CloudWatch Monitoring (interface endpoint
|
|
|
+ _com.amazonaws.**region**.monitoring_) and Logs (interface endpoint
|
|
|
+ _com.amazonaws.**region**.logs_)
|
|
|
+- AWS ECR API (interface endpoint _com.amazonaws.**region**.ecr.api_) and Docker
|
|
|
+ (interface endpoint _com.amazonaws.**region**.ecr.dkr_)
|
|
|
+- Amazon SQS (interface endpoint _com.amazonaws.**region**.sqs_)
|
|
|
+- AWS STS (interface endpoint _com.amazonaws.**region**.sts_)
|
|
|
+- AWS KMS (interface endpoint _com.amazonaws.**region**.kms_) - **required only
|
|
|
+ if S3 Objects are encrypted using AWS KMS client-side encryption**
|
|
|
+
|
|
|
+#### Creating a New VPC
|
|
|
+
|
|
|
+By default the CloudFormation template will create a new VPC that has been
|
|
|
+purpose-built for the solution. The VPC includes VPC endpoints for the
|
|
|
+aforementioned services, and does not provision internet connectivity.
|
|
|
+
|
|
|
+You can use the provided VPC to operate the solution with no further
|
|
|
+customisations. However, if you have more complex requirements, we recommend
+using an existing VPC as described in the following section.
|
|
|
+
|
|
|
+#### Using an Existing VPC
|
|
|
+
|
|
|
+Amazon S3 Find and Forget can also be used in an existing VPC. You may want to
|
|
|
+do this if you have requirements that aren't met by using the VPC provided with
|
|
|
+the solution.
|
|
|
+
|
|
|
+To use an existing VPC, set the `DeployVpc` parameter to `false` when launching
|
|
|
+the solution CloudFormation stack. You must also specify the subnet and security
|
|
|
+groups that the Fargate tasks will use by setting the `VpcSubnets` and
|
|
|
+`VpcSecurityGroups` parameters respectively.
|
|
|
+
|
|
|
+The subnets and security groups that you specify must allow the tasks to connect
|
|
|
+to the aforementioned AWS services.
|
|
|
+
|
|
|
+You can obtain your subnet and security group IDs from the AWS Console or by
|
|
|
+using the AWS CLI. If using the AWS CLI, you can use the following command to
|
|
|
+get a list of VPCs:
|
|
|
+
|
|
|
+```bash
|
|
|
+aws ec2 describe-vpcs \
|
|
|
+ --query 'Vpcs[*].{ID:VpcId,Name:Tags[?Key==`Name`].Value | [0], IsDefault: IsDefault}'
|
|
|
+```
|
|
|
+
|
|
|
+Once you have found the VPC you wish to use, to get a list of subnets and
|
|
|
+security groups in that VPC:
|
|
|
+
|
|
|
+```bash
|
|
|
+export VPC_ID=<chosen-vpc-id>
|
|
|
+aws ec2 describe-subnets \
|
|
|
+ --filter Name=vpc-id,Values="$VPC_ID" \
|
|
|
+ --query 'Subnets[*].{ID:SubnetId,Name:Tags[?Key==`Name`].Value | [0],AZ:AvailabilityZone}'
|
|
|
+aws ec2 describe-security-groups \
|
|
|
+ --filter Name=vpc-id,Values="$VPC_ID" \
|
|
|
+ --query 'SecurityGroups[*].{ID:GroupId,Name:GroupName}'
|
|
|
+```
|
|
|
+
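As a rough check that an existing VPC covers the services listed in this section, the sketch below compares the endpoints found in a VPC against the endpoint names given above. The function names `required_endpoints` and `missing_endpoints` are illustrative helpers, not part of the solution. The optional KMS endpoint is left out, and a VPC that reaches these services through an Internet Gateway will legitimately report endpoints as missing.

```bash
# Print the VPC endpoint service names listed above for a given region.
# The KMS endpoint is omitted because it is only needed for objects
# encrypted using AWS KMS client-side encryption.
required_endpoints() {
  region="$1"
  for svc in s3 dynamodb monitoring logs ecr.api ecr.dkr sqs sts; do
    echo "com.amazonaws.${region}.${svc}"
  done
}

# Read endpoint service names from stdin (one per line) and print the
# required names that are not among them.
missing_endpoints() {
  region="$1"
  present="$(cat)"
  required_endpoints "$region" | while read -r name; do
    echo "$present" | grep -qxF "$name" || echo "$name"
  done
}

# Example (requires AWS credentials; VPC_ID as exported above):
#   aws ec2 describe-vpc-endpoints \
#     --filters Name=vpc-id,Values="$VPC_ID" \
#     --query 'VpcEndpoints[*].ServiceName' --output text \
#     | tr '\t' '\n' | missing_endpoints eu-west-1
```

An empty result suggests every listed endpoint exists in the VPC. It does not confirm that the chosen subnets and security groups allow traffic to those endpoints.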
|
|
|
+### Provisioning Data Access IAM Roles
|
|
|
+
|
|
|
+The Fargate tasks used by this solution to perform deletions require a specific
|
|
|
+IAM role to exist in each account that owns a bucket that you will use with the
|
|
|
+solution. The role must have the exact name **S3F2DataAccessRole** (no path). A
|
|
|
+CloudFormation template is available as part of this solution which can be
|
|
|
+deployed separately to the main stack in each account. A way to deploy this role
|
|
|
+to many accounts, for example across your organization, is to use [AWS
|
|
|
+CloudFormation StackSets].
|
|
|
+
|
|
|
+To deploy this template manually, use the "_Launch IAM Role Template_" link in
+the table in [Deploying the Solution](#deploying-the-solution), then follow steps
|
|
|
+5-9. The **Outputs** tab will contain the Role ARN which you will need when
|
|
|
+adding data mappers.
|
|
|
+
|
|
|
+You will need to grant this role read and write access to your data. We
|
|
|
+recommend you do this using a bucket policy. For more information, see
|
|
|
+[Granting Access to Data](#granting-access-to-data).
|
|
|
+
|
|
|
+## Deploying the Solution
|
|
|
+
|
|
|
+The solution is deployed as an
|
|
|
+[AWS CloudFormation](https://aws.amazon.com/cloudformation) template and should
|
|
|
+take about 20 to 40 minutes to deploy.
|
|
|
+
|
|
|
+Your access to the AWS account must have IAM permissions to launch AWS
|
|
|
+CloudFormation templates that create IAM roles and to create the solution
|
|
|
+resources.
|
|
|
+
|
|
|
+> **Note** You are responsible for the cost of the AWS services used while
|
|
|
+> running this solution. For full details, see the pricing pages for each AWS
|
|
|
+> service you will be using in this sample. Prices are subject to change.
|
|
|
+
|
|
|
+1. Deploy the latest CloudFormation template using the AWS Console by choosing
|
|
|
+ the "_Launch Template_" button below for your preferred AWS region. If you
|
|
|
+ wish to [deploy using the AWS CLI] instead, you can refer to the "_Template
|
|
|
+ Link_" to download the template files.
|
|
|
+
|
|
|
+| Region | Launch Template | Template Link | Launch IAM Role Template | IAM Role Template Link |
|
|
|
+| ------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------- |
|
|
|
+| **US East (N. Virginia)** (us-east-1) | [Launch](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/new?stackName=S3F2&templateURL=https://solution-builders-us-east-1.s3.us-east-1.amazonaws.com/amazon-s3-find-and-forget/latest/template.yaml) | [Link](https://solution-builders-us-east-1.s3.us-east-1.amazonaws.com/amazon-s3-find-and-forget/latest/template.yaml) | [Launch](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/new?stackName=S3F2-Role&templateURL=https://solution-builders-us-east-1.s3.us-east-1.amazonaws.com/amazon-s3-find-and-forget/latest/role.yaml) | [Link](https://solution-builders-us-east-1.s3.us-east-1.amazonaws.com/amazon-s3-find-and-forget/latest/role.yaml) |
|
|
|
+| **US East (Ohio)** (us-east-2) | [Launch](https://console.aws.amazon.com/cloudformation/home?region=us-east-2#/stacks/new?stackName=S3F2&templateURL=https://solution-builders-us-east-2.s3.us-east-2.amazonaws.com/amazon-s3-find-and-forget/latest/template.yaml) | [Link](https://solution-builders-us-east-2.s3.us-east-2.amazonaws.com/amazon-s3-find-and-forget/latest/template.yaml) | [Launch](https://console.aws.amazon.com/cloudformation/home?region=us-east-2#/stacks/new?stackName=S3F2-Role&templateURL=https://solution-builders-us-east-2.s3.us-east-2.amazonaws.com/amazon-s3-find-and-forget/latest/role.yaml) | [Link](https://solution-builders-us-east-2.s3.us-east-2.amazonaws.com/amazon-s3-find-and-forget/latest/role.yaml) |
|
|
|
+| **US West (Oregon)** (us-west-2) | [Launch](https://console.aws.amazon.com/cloudformation/home?region=us-west-2#/stacks/new?stackName=S3F2&templateURL=https://solution-builders-us-west-2.s3.us-west-2.amazonaws.com/amazon-s3-find-and-forget/latest/template.yaml) | [Link](https://solution-builders-us-west-2.s3.us-west-2.amazonaws.com/amazon-s3-find-and-forget/latest/template.yaml) | [Launch](https://console.aws.amazon.com/cloudformation/home?region=us-west-2#/stacks/new?stackName=S3F2-Role&templateURL=https://solution-builders-us-west-2.s3.us-west-2.amazonaws.com/amazon-s3-find-and-forget/latest/role.yaml) | [Link](https://solution-builders-us-west-2.s3.us-west-2.amazonaws.com/amazon-s3-find-and-forget/latest/role.yaml) |
|
|
|
+| **Asia Pacific (Sydney)** (ap-southeast-2) | [Launch](https://console.aws.amazon.com/cloudformation/home?region=ap-southeast-2#/stacks/new?stackName=S3F2&templateURL=https://solution-builders-ap-southeast-2.s3.ap-southeast-2.amazonaws.com/amazon-s3-find-and-forget/latest/template.yaml) | [Link](https://solution-builders-ap-southeast-2.s3.ap-southeast-2.amazonaws.com/amazon-s3-find-and-forget/latest/template.yaml) | [Launch](https://console.aws.amazon.com/cloudformation/home?region=ap-southeast-2#/stacks/new?stackName=S3F2-Role&templateURL=https://solution-builders-ap-southeast-2.s3.ap-southeast-2.amazonaws.com/amazon-s3-find-and-forget/latest/role.yaml) | [Link](https://solution-builders-ap-southeast-2.s3.ap-southeast-2.amazonaws.com/amazon-s3-find-and-forget/latest/role.yaml) |
|
|
|
+| **Asia Pacific (Tokyo)** (ap-northeast-1) | [Launch](https://console.aws.amazon.com/cloudformation/home?region=ap-northeast-1#/stacks/new?stackName=S3F2&templateURL=https://solution-builders-ap-northeast-1.s3.ap-northeast-1.amazonaws.com/amazon-s3-find-and-forget/latest/template.yaml) | [Link](https://solution-builders-ap-northeast-1.s3.ap-northeast-1.amazonaws.com/amazon-s3-find-and-forget/latest/template.yaml) | [Launch](https://console.aws.amazon.com/cloudformation/home?region=ap-northeast-1#/stacks/new?stackName=S3F2-Role&templateURL=https://solution-builders-ap-northeast-1.s3.ap-northeast-1.amazonaws.com/amazon-s3-find-and-forget/latest/role.yaml) | [Link](https://solution-builders-ap-northeast-1.s3.ap-northeast-1.amazonaws.com/amazon-s3-find-and-forget/latest/role.yaml) |
|
|
|
+| **EU (Ireland)** (eu-west-1) | [Launch](https://console.aws.amazon.com/cloudformation/home?region=eu-west-1#/stacks/new?stackName=S3F2&templateURL=https://solution-builders-eu-west-1.s3.eu-west-1.amazonaws.com/amazon-s3-find-and-forget/latest/template.yaml) | [Link](https://solution-builders-eu-west-1.s3.eu-west-1.amazonaws.com/amazon-s3-find-and-forget/latest/template.yaml) | [Launch](https://console.aws.amazon.com/cloudformation/home?region=eu-west-1#/stacks/new?stackName=S3F2-Role&templateURL=https://solution-builders-eu-west-1.s3.eu-west-1.amazonaws.com/amazon-s3-find-and-forget/latest/role.yaml) | [Link](https://solution-builders-eu-west-1.s3.eu-west-1.amazonaws.com/amazon-s3-find-and-forget/latest/role.yaml) |
|
|
|
+| **EU (London)** (eu-west-2) | [Launch](https://console.aws.amazon.com/cloudformation/home?region=eu-west-2#/stacks/new?stackName=S3F2&templateURL=https://solution-builders-eu-west-2.s3.eu-west-2.amazonaws.com/amazon-s3-find-and-forget/latest/template.yaml) | [Link](https://solution-builders-eu-west-2.s3.eu-west-2.amazonaws.com/amazon-s3-find-and-forget/latest/template.yaml) | [Launch](https://console.aws.amazon.com/cloudformation/home?region=eu-west-2#/stacks/new?stackName=S3F2-Role&templateURL=https://solution-builders-eu-west-2.s3.eu-west-2.amazonaws.com/amazon-s3-find-and-forget/latest/role.yaml) | [Link](https://solution-builders-eu-west-2.s3.eu-west-2.amazonaws.com/amazon-s3-find-and-forget/latest/role.yaml) |
|
|
|
+| **EU (Frankfurt)** (eu-central-1) | [Launch](https://console.aws.amazon.com/cloudformation/home?region=eu-central-1#/stacks/new?stackName=S3F2&templateURL=https://solution-builders-eu-central-1.s3.eu-central-1.amazonaws.com/amazon-s3-find-and-forget/latest/template.yaml) | [Link](https://solution-builders-eu-central-1.s3.eu-central-1.amazonaws.com/amazon-s3-find-and-forget/latest/template.yaml) | [Launch](https://console.aws.amazon.com/cloudformation/home?region=eu-central-1#/stacks/new?stackName=S3F2-Role&templateURL=https://solution-builders-eu-central-1.s3.eu-central-1.amazonaws.com/amazon-s3-find-and-forget/latest/role.yaml) | [Link](https://solution-builders-eu-central-1.s3.eu-central-1.amazonaws.com/amazon-s3-find-and-forget/latest/role.yaml) |
|
|
|
+| **EU (Stockholm)** (eu-north-1) | [Launch](https://console.aws.amazon.com/cloudformation/home?region=eu-north-1#/stacks/new?stackName=S3F2&templateURL=https://solution-builders-eu-north-1.s3.eu-north-1.amazonaws.com/amazon-s3-find-and-forget/latest/template.yaml) | [Link](https://solution-builders-eu-north-1.s3.eu-north-1.amazonaws.com/amazon-s3-find-and-forget/latest/template.yaml) | [Launch](https://console.aws.amazon.com/cloudformation/home?region=eu-north-1#/stacks/new?stackName=S3F2-Role&templateURL=https://solution-builders-eu-north-1.s3.eu-north-1.amazonaws.com/amazon-s3-find-and-forget/latest/role.yaml) | [Link](https://solution-builders-eu-north-1.s3.eu-north-1.amazonaws.com/amazon-s3-find-and-forget/latest/role.yaml) |
|
|
|
+
|
|
|
+2. If prompted, log in using your AWS account credentials.
|
|
|
+3. You should see a screen titled "_Create Stack_" at the "_Specify template_"
|
|
|
+ step. The fields specifying the CloudFormation template are pre-populated.
|
|
|
+ Choose the _Next_ button at the bottom of the page.
|
|
|
+4. On the "_Specify stack details_" screen you should provide values for the
|
|
|
+ following parameters of the CloudFormation stack:
|
|
|
+
|
|
|
+ - **Stack Name:** (Default: S3F2) This is the name that is used to refer to
|
|
|
+ this stack in CloudFormation once deployed.
|
|
|
+ - **AdminEmail:** The email address you wish to set up as the initial user of
|
|
|
+ this Amazon S3 Find and Forget deployment.
|
|
|
+ - **DeployWebUI:** (Default: true) Whether to deploy the Web UI as part of
|
|
|
+ the solution. If set to **true**, the AuthMethod parameter must be set to
|
|
|
+ **Cognito**. If set to **false**, interaction with the solution is
|
|
|
+ performed via the API Gateway only.
|
|
|
+ - **AuthMethod:** (Default: Cognito) The authentication method to be used for
|
|
|
+ the solution. Must be set to **Cognito** if DeployWebUI is true.
|
|
|
+
|
|
|
+ The following parameters are optional and allow further customisation of the
|
|
|
+ solution if required:
|
|
|
+
|
|
|
+ - **DeployVpc:** (Default: true) Whether to deploy the solution provided VPC.
|
|
|
+ If you wish to use your own VPC, set this value to false. The solution
|
|
|
+ provided VPC uses VPC Endpoints to access the required services which will
|
|
|
+ incur additional costs. For more details, see the [VPC Endpoint Pricing]
|
|
|
+ page.
|
|
|
+ - **VpcSecurityGroups:** (Default: "") List of security group IDs to apply to
|
|
|
+ Fargate deletion tasks. For more information on how to obtain these IDs,
|
|
|
+ see
|
|
|
+ [Configuring a VPC for the Solution](#configuring-a-vpc-for-the-solution).
|
|
|
+ If _DeployVpc_ is true, this parameter is ignored.
|
|
|
+ - **VpcSubnets:** (Default: "") List of subnets to run Fargate deletion tasks
|
|
|
+ in. For more information on how to obtain these IDs, see
|
|
|
+ [Configuring a VPC for the Solution](#configuring-a-vpc-for-the-solution).
|
|
|
+ If _DeployVpc_ is true, this parameter is ignored.
|
|
|
+ - **FlowLogsGroup**: (Default: "") If using the solution provided VPC,
|
|
|
+ defines the CloudWatch Log group which should be used for flow logs. If not
|
|
|
+ set, flow logs will not be enabled. If _DeployVpc_ is false, this parameter
|
|
|
+ is ignored. Enabling flow logs will incur additional costs. See the
|
|
|
+ [CloudWatch Logs Pricing] page for the associated costs.
|
|
|
+ - **FlowLogsRoleArn**: (Default: "") If using the solution provided VPC,
|
|
|
+ defines which IAM Role should be used to send flow logs to CloudWatch. If
|
|
|
+ not set, flow logs will not be enabled. If _DeployVpc_ is false, this
|
|
|
+ parameter is ignored.
|
|
|
+ - **CreateCloudFrontDistribution:** (Default: true) Creates a CloudFront
|
|
|
+ distribution for accessing the web interface of the solution.
|
|
|
+ - **AccessControlAllowOriginOverride:** (Default: false) Allows overriding
|
|
|
+ the origin from which the API can be called. If 'false' is provided, the
|
|
|
+ API will only accept requests from the Web UI origin.
|
|
|
+ - **AthenaConcurrencyLimit:** (Default: 20) The number of concurrent Athena
|
|
|
+ queries the solution will run when scanning your data lake.
|
|
|
+ - **AthenaQueryMaxRetries:** (Default: 2) Max number of retries for each
+ Athena query after a failure.
|
|
|
+ - **DeletionTasksMaxNumber:** (Default: 3) Max number of concurrent Fargate
|
|
|
+ tasks to run when performing deletions.
|
|
|
+ - **DeletionTaskCPU:** (Default: 4096) Fargate task CPU limit. For more info
|
|
|
+ see [Fargate Configuration]
|
|
|
+ - **DeletionTaskMemory:** (Default: 30720) Fargate task memory limit. For
|
|
|
+ more info see [Fargate Configuration]
|
|
|
+ - **QueryExecutionWaitSeconds:** (Default: 3) How long to wait when checking
|
|
|
+ if an Athena Query has completed.
|
|
|
+ - **QueryQueueWaitSeconds:** (Default: 3) How long to wait when checking if
|
|
|
+ the current number of executing queries is less than the specified
|
|
|
+ concurrency limit.
|
|
|
+ - **ForgetQueueWaitSeconds:** (Default: 30) How long to wait when checking if
|
|
|
+ the Forget phase is complete.
|
|
|
+ - **AccessLogsBucket:** (Default: "") The name of the bucket to use for
|
|
|
+ storing the Web UI access logs. Leave blank to disable UI access logging.
|
|
|
+ Ensure the provided bucket has the appropriate permissions configured. For
|
|
|
+ more information see [CloudFront Access Logging Permissions] if
|
|
|
+ **CreateCloudFrontDistribution** is set to true, or [S3 Access Logging
|
|
|
+ Permissions] if not.
|
|
|
+ - **CognitoAdvancedSecurity:** (Default: "OFF") The setting to use for
|
|
|
+ Cognito advanced security. Allowed values for this parameter are: OFF,
|
|
|
+ AUDIT and ENFORCED. For more information on this parameter, see [Cognito
|
|
|
+ Advanced Security]
|
|
|
+ - **EnableAPIAccessLogging:** (Default: false) Whether to enable access
|
|
|
+ logging via CloudWatch Logs for API Gateway. Enabling this feature will
|
|
|
+ incur additional costs.
|
|
|
+ - **EnableContainerInsights:** (Default: false) Whether to enable CloudWatch
|
|
|
+ Container Insights.
|
|
|
+ - **JobDetailsRetentionDays:** (Default: 0) How long job records should
|
|
|
+ remain in the Job table and how long job manifests should remain in the S3
|
|
|
+ manifests bucket. Use 0 to retain data indefinitely. **Note**: if the
|
|
|
+ retention setting is changed it will only apply to new deletion jobs in
+ DynamoDB; existing deletion jobs will retain the TTL set at the time they
+ were run. The policy will, however, apply immediately to new and existing job
|
|
|
+ manifests in S3.
|
|
|
+ - **EnableDynamoDBBackups:** (Default: false) Whether to enable [DynamoDB
|
|
|
+ Point-in-Time Recovery] for the DynamoDB tables. Enabling this feature will
|
|
|
+ incur additional costs. See the [DynamoDB Pricing] page for the associated
|
|
|
+ costs.
|
|
|
+ - **RetainDynamoDBTables:** (Default: true) Whether to retain the DynamoDB
|
|
|
+ tables upon Stack Update and Stack Deletion.
|
|
|
+ - **AthenaWorkGroup:** (Default: primary) The Athena work group that should
|
|
|
+ be used when the solution runs Athena queries.
|
|
|
+ - **PreBuiltArtefactsBucketOverride:** (Default: false) Overrides the default
|
|
|
+ Bucket containing Front-end and Back-end pre-built artefacts. Use this if
|
|
|
+ you are using a customised version of these artefacts.
|
|
|
+ - **ResourcePrefix:** (Default: S3F2) Resource prefix to apply to resource
|
|
|
+ names when creating statically named resources.
|
|
|
+ - **KMSKeyArns:** (Default: "") Comma-delimited list of KMS Key ARNs used for
|
|
|
+ Client-side Encryption. Leave empty if data is not client-side encrypted
|
|
|
+ with KMS.
|
|
|
+
|
|
|
+ When completed, click _Next_
|
|
|
+
|
|
|
+5. [Configure stack options](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-add-tags.html)
|
|
|
+ if desired, then click _Next_.
|
|
|
+6. On the review screen, you must check the boxes for:
|
|
|
+
|
|
|
+ - "_I acknowledge that AWS CloudFormation might create IAM resources_"
|
|
|
+ - "_I acknowledge that AWS CloudFormation might create IAM resources with
|
|
|
+ custom names_"
|
|
|
+ - "_I acknowledge that AWS CloudFormation might require the following
|
|
|
+ capability: CAPABILITY_AUTO_EXPAND_"
|
|
|
+
|
|
|
+ These are required to allow CloudFormation to create a Role to allow access
|
|
|
+ to resources needed by the stack and name the resources in a dynamic way.
|
|
|
+
|
|
|
+7. Choose _Create Stack_
|
|
|
+8. Wait for the CloudFormation stack to launch. Completion is indicated when the
|
|
|
+ "Stack status" is "_CREATE_COMPLETE_".
|
|
|
+ - You can monitor the stack creation progress in the "Events" tab.
|
|
|
+9. Note the _WebUIUrl_ displayed in the _Outputs_ tab for the stack. This is
|
|
|
+ used to access the application.
|
|
|
+
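For a command-line deployment into an existing VPC, the steps above can be approximated as in the sketch below. It only prints the command so that it can be reviewed before running. `deploy_command` is an illustrative helper, the stack name `S3F2` is the default mentioned above, and the template file is assumed to have been downloaded from the "_Template Link_" column. Depending on the template size, `aws cloudformation deploy` may also need an `--s3-bucket` argument.

```bash
# Print an `aws cloudformation deploy` command for an existing-VPC deployment.
# Arguments: template file, admin email, comma-separated subnet IDs,
# comma-separated security group IDs.
deploy_command() {
  template="$1"; admin_email="$2"; subnets="$3"; security_groups="$4"
  printf '%s\n' "aws cloudformation deploy \\
  --template-file $template \\
  --stack-name S3F2 \\
  --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM CAPABILITY_AUTO_EXPAND \\
  --parameter-overrides \\
    AdminEmail=$admin_email \\
    DeployVpc=false \\
    VpcSubnets=$subnets \\
    VpcSecurityGroups=$security_groups"
}

# Example:
#   deploy_command template.yaml admin@example.com subnet-1,subnet-2 sg-1
```

The three capabilities correspond to the acknowledgements in step 6. Any other parameter from step 4 can be appended to `--parameter-overrides` in the same `Key=Value` form.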
|
|
|
+## Accessing the application
|
|
|
+
|
|
|
+The solution provides a web user interface and a REST API to allow you to
|
|
|
+integrate it in your own applications. If you have chosen not to deploy the Web
|
|
|
+UI you will need to use the API to interface with the solution.
|
|
|
+
|
|
|
+### Logging in for the first time (only relevant if the Web UI is deployed)
|
|
|
+
|
|
|
+1. Note the _WebUIUrl_ displayed in the _Outputs_ tab for the stack. This is
|
|
|
+ used to access the application.
|
|
|
+2. When accessing the web user interface for the first time, you will be
|
|
|
+ prompted to insert a username and a password. In the username field, enter
|
|
|
+ the admin e-mail specified during stack creation. In the password field,
|
|
|
+ enter the temporary password sent by the system to the admin e-mail. Then
|
|
|
+ select "Sign In".
|
|
|
+3. Next, you will need to reset the password. Enter a new password and then
|
|
|
+ select "Submit".
|
|
|
+4. You should now be able to access all of the solution's features.
|
|
|
+
|
|
|
+### Managing users (only relevant if Cognito is chosen for authentication)
|
|
|
+
|
|
|
+To add more users to the application:
|
|
|
+
|
|
|
+1. Access the [Cognito Console] and choose "Manage User Pools".
|
|
|
+2. Select the solution's User Pool (its name is displayed as
|
|
|
+ _CognitoUserPoolName_ in the _Outputs_ tab for the CloudFormation stack).
|
|
|
+3. Select "Users and Groups" from the menu on the right.
|
|
|
+4. Use this page to create or manage users. For more information, consult the
|
|
|
+ [Managing Users in User Pools Guide].
|
|
|
+
|
|
|
+### Making authenticated API requests
|
|
|
+
|
|
|
+To use the API directly, you will need to authenticate requests using the
|
|
|
+Cognito User Pool or IAM. The method for authenticating differs depending on
|
|
|
+which authentication option was chosen:
|
|
|
+
|
|
|
+#### Cognito
|
|
|
+
|
|
|
+After resetting the password via the UI, you can make authenticated requests
|
|
|
+using the AWS CLI:
|
|
|
+
|
|
|
+1. Note the _CognitoUserPoolId_, _CognitoUserPoolClientId_ and _ApiUrl_
|
|
|
+ parameters displayed in the _Outputs_ tab for the stack.
|
|
|
+2. Take note of the Cognito user email and password.
|
|
|
+3. Generate a token by running this command with the values you noted in the
|
|
|
+ previous steps:
|
|
|
+
|
|
|
+ ```sh
|
|
|
+ aws cognito-idp admin-initiate-auth \
|
|
|
+ --user-pool-id $COGNITO_USER_POOL_ID \
|
|
|
+ --client-id $COGNITO_USER_POOL_CLIENT_ID \
|
|
|
+ --auth-flow ADMIN_NO_SRP_AUTH \
|
|
|
+ --auth-parameters "{\"USERNAME\":\"$USER_EMAIL_ADDRESS\",\"PASSWORD\":\"$USER_PASSWORD\"}"
|
|
|
+ ```
|
|
|
+
|
|
|
+4. Use the `IdToken` generated by the previous command to make an authenticated
|
|
|
+ request to the API. For instance, the following command will show the matches
|
|
|
+ in the deletion queue:
|
|
|
+
|
|
|
+ ```sh
|
|
|
+ curl $API_URL/v1/queue -H "Authorization: Bearer $ID_TOKEN"
|
|
|
+ ```
|
|
|
+
|
|
|
+For more information, consult the [Cognito REST API integration guide].
|
|
|
+
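The two Cognito steps above can be chained so that the token never has to be copied by hand. The following is a minimal sketch, assuming the same environment variables as the commands above are set. `auth_parameters` and `get_id_token` are illustrative names, and the JSON builder assumes that neither the email address nor the password contains a double quote or a backslash.

```bash
# Build the JSON document passed to --auth-parameters.
# Assumes neither value contains a double quote or a backslash.
auth_parameters() {
  printf '{"USERNAME":"%s","PASSWORD":"%s"}' "$1" "$2"
}

# Request a token and print only the IdToken field (requires AWS credentials).
get_id_token() {
  aws cognito-idp admin-initiate-auth \
    --user-pool-id "$COGNITO_USER_POOL_ID" \
    --client-id "$COGNITO_USER_POOL_CLIENT_ID" \
    --auth-flow ADMIN_NO_SRP_AUTH \
    --auth-parameters "$(auth_parameters "$USER_EMAIL_ADDRESS" "$USER_PASSWORD")" \
    --query 'AuthenticationResult.IdToken' \
    --output text
}

# Example:
#   ID_TOKEN="$(get_id_token)"
#   curl "$API_URL/v1/queue" -H "Authorization: Bearer $ID_TOKEN"
```

Using `--query` with `--output text` returns the bare token, which avoids parsing the JSON response separately.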
|
|
|
+#### IAM
|
|
|
+
|
|
|
+IAM authentication for API requests uses the
|
|
|
+[Signature Version 4 signing process](https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html).
|
|
|
+Add the resulting signature to the **Authorization** header when making requests
|
|
|
+to the API.
|
|
|
+
|
|
|
+Use the SigV4 process linked above to generate the Authorization header value
|
|
|
+and then call the API as normal:
|
|
|
+
|
|
|
+```sh
|
|
|
+curl $API_URL/v1/queue -H "Authorization: $Sigv4Auth"
|
|
|
+```
|
|
|
+
|
|
|
+IAM authentication can be used anywhere you have AWS credentials with the
|
|
|
+correct permissions. This could be an IAM User or an assumed IAM Role.
|
|
|
+
|
|
|
+Please refer to the documentation
|
|
|
+[here](https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-control-access-using-iam-policies-to-invoke-api.html)
|
|
|
+to understand how to define the IAM policy to match your requirements. The ARN
|
|
|
+for the API can be found in the value of the `ApiArn` CloudFormation Stack
|
|
|
+Output.
|
|
|
+
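One way to avoid computing the signature by hand is curl's built-in SigV4 support, available in curl 7.75.0 and later. The sketch below is an illustration under stated assumptions: `region_from_api_url` and `signed_get` are hypothetical helper names, the API URL is assumed to be a default API Gateway URL of the form `https://<api-id>.execute-api.<region>.amazonaws.com/<stage>`, and credentials are assumed to be available in the standard `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and (for temporary credentials) `AWS_SESSION_TOKEN` environment variables.

```bash
# Derive the region from a default API Gateway URL of the form
# https://<api-id>.execute-api.<region>.amazonaws.com/<stage>
region_from_api_url() {
  echo "$1" | cut -d. -f3
}

# Send a SigV4-signed GET request to the given API path.
# Requires curl 7.75.0 or later and AWS credentials in the environment.
signed_get() {
  path="$1"
  region="$(region_from_api_url "$API_URL")"
  if [ -n "${AWS_SESSION_TOKEN:-}" ]; then
    # Temporary credentials (for example an assumed role) need the token header.
    curl "$API_URL$path" \
      --aws-sigv4 "aws:amz:${region}:execute-api" \
      --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
      -H "x-amz-security-token: $AWS_SESSION_TOKEN"
  else
    curl "$API_URL$path" \
      --aws-sigv4 "aws:amz:${region}:execute-api" \
      --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY"
  fi
}

# Example:
#   signed_get /v1/queue
```

If the API is served from a custom domain, the region cannot be read from the URL and would need to be supplied explicitly.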
|
|
|
+### Integrating the solution with other applications using CloudFormation stack outputs
|
|
|
+
|
|
|
+Applications deployed using AWS CloudFormation in the same AWS account and
|
|
|
+region can integrate with Find and Forget by using CloudFormation output values.
|
|
|
+You can use the solution stack as a nested stack to use its outputs (such as the
|
|
|
+API URL) as inputs for another application.
|
|
|
+
|
|
|
+Some outputs are also available as exports. You can import these values to use
|
|
|
+in your own CloudFormation stacks that you deploy following the Find and Forget
|
|
|
+stack.
|
|
|
+
|
|
|
+**Note for using exports:** After another stack imports an output value, you
|
|
|
+can't delete the stack that is exporting the output value or modify the exported
|
|
|
+output value. All of the imports must be removed before you can delete the
|
|
|
+exporting stack or modify the output value.
|
|
|
+
|
|
|
+Consult the [exporting stack output values] guide to review the differences
|
|
|
+between importing exported values and using nested stacks.
|
|
|
+
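For scripts that run outside CloudFormation, stack outputs can also be read with the AWS CLI. This is a minimal sketch: `output_query` and `stack_output` are illustrative helper names, and `S3F2` is the default stack name from the deployment steps.

```bash
# Build the JMESPath expression that selects one stack output by key.
output_query() {
  printf "Stacks[0].Outputs[?OutputKey=='%s'].OutputValue" "$1"
}

# Print the value of a stack output (requires AWS credentials).
# Arguments: stack name, output key.
stack_output() {
  aws cloudformation describe-stacks \
    --stack-name "$1" \
    --query "$(output_query "$2")" \
    --output text
}

# Example:
#   API_URL="$(stack_output S3F2 ApiUrl)"
```

Unlike imports of exported values, reading outputs this way creates no dependency between stacks, so it does not block deletion or modification of the Find and Forget stack.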
|
|
|
+## Configuring Data Mappers
|
|
|
+
|
|
|
+After [Deploying the Solution](#deploying-the-solution), your first step should
|
|
|
+be to configure one or more [data mappers](ARCHITECTURE.md#data-mappers) which
|
|
|
+will connect your data to the solution. Identify the S3 Bucket containing the
|
|
|
+data you wish to connect to the solution and ensure you have defined a table in
|
|
|
+your data catalog and that all existing and future partitions (as they are
|
|
|
+created) are known to the Data Catalog. Currently AWS Glue is the only supported
|
|
|
+data catalog provider. For more information on defining your data in the Glue
|
|
|
+Data Catalog, see [Defining Glue Tables]. You must define your Table in the Glue
|
|
|
+Data Catalog in the same region and account as the S3 Find and Forget solution.
|
|
|
+
|
|
|
+### AWS Lake Formation Configuration
|
|
|
+
|
|
|
+For data lakes registered with AWS Lake Formation, you must grant additional
|
|
|
+permissions in Lake Formation before you can use them with the solution. If you
|
|
|
+are not using Lake Formation, proceed directly to the
|
|
|
+[Data Mapper creation](#data-mapper-creation) section.
|
|
|
+
|
|
|
+To grant these permissions in Lake Formation:
|
|
|
+
|
|
|
+1. Using the **WebUIRole** output from the solution CloudFormation stack as the
|
|
|
+ IAM principal, use the [Lake Formation Data Permissions Console] to grant the
|
|
|
+ `Describe` permission for all Glue Databases that you will want to use with
|
|
|
+ the solution; then grant the `Describe` and `Select` permissions to the role
|
|
|
+ for all Glue Tables that you will want to use with the solution. These
|
|
|
+ permissions are necessary to create data mappers in the web interface.
|
|
|
+2. Using the **PutDataMapperRole** output from the solution CloudFormation stack
|
|
|
+ as the IAM principal, use the [Lake Formation Data Permissions Console] to
|
|
|
+ grant `Describe` and `Select` permissions for all Glue Tables that you will
|
|
|
+ want to use with the solution. These permissions allow the solution to access
|
|
|
+ Table metadata when creating a Data Mapper.
|
|
|
+3. Using the **AthenaExecutionRole** and **GenerateQueriesRole** outputs from
|
|
|
+ the solution CloudFormation stack as IAM principals, use the [Lake Formation
|
|
|
+ Data Permissions Console] to grant the `Describe` and `Select` permissions to
|
|
|
+ both principals for all of the tables that you will want to use with the
|
|
|
+ solution. These permissions allow the solution to plan and execute Athena
|
|
|
+ queries during the Find Phase.
|
|
|
+
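The table grants described above can also be made with the AWS CLI instead of the console. The sketch below assumes that the stack outputs named above hold role ARNs, which should be confirmed in the _Outputs_ tab. `table_resource` and `grant_table_access` are illustrative helper names, and the JSON builder assumes that database and table names contain no double quotes or backslashes.

```bash
# Build the --resource JSON that identifies a Glue table.
# Assumes the names contain no double quotes or backslashes.
table_resource() {
  printf '{"Table":{"DatabaseName":"%s","Name":"%s"}}' "$1" "$2"
}

# Grant DESCRIBE and SELECT on one table to a role (requires AWS credentials).
# Arguments: role ARN, database name, table name.
grant_table_access() {
  aws lakeformation grant-permissions \
    --principal "DataLakePrincipalIdentifier=$1" \
    --permissions DESCRIBE SELECT \
    --resource "$(table_resource "$2" "$3")"
}

# Example (ATHENA_EXECUTION_ROLE_ARN is a placeholder for the stack output value):
#   grant_table_access "$ATHENA_EXECUTION_ROLE_ARN" my_database my_table
```

The same call would be repeated for each role and table combination. The database-level `Describe` grant from step 1 uses a `Database` resource instead and is not covered by this sketch.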
|
|
|
+### Data Mapper Creation
|
|
|
+
|
|
|
+1. Access the application UI via the **WebUIUrl** displayed in the _Outputs_ tab
|
|
|
+ for the stack.
|
|
|
+2. Choose **Data Mappers** from the menu then choose **Create Data Mapper**
|
|
|
+3. On the Create Data Mapper page input a **Name** to uniquely identify this
|
|
|
+ Data Mapper.
|
|
|
+4. Select a **Query Executor Type** then choose the **Database** and **Table**
|
|
|
+ in your data catalog which describes the target data in S3. A list of columns
|
|
|
+ will be displayed for the chosen Table.
|
|
|
+5. From the Partition Keys list, select the partition key(s) that you want the
|
|
|
+ solution to use when generating the queries. If you select none, only one
|
|
|
+ query will be performed for the data mapper. If you select any or all, you'll
|
|
|
+ have a greater number of smaller queries (the same query will be repeated
|
|
|
+ with an additional `WHERE` clause for each combination of partition values).
|
|
|
+ If you have a lot of small partitions, it may be more efficient to choose
|
|
|
+ none or a subset of partition keys from the list in order to increase speed
|
|
|
+ of execution. If instead you have very big partitions, it may be more
|
|
|
+ efficient to choose all the partition keys in order to reduce probability of
|
|
|
+ failure caused by query timeout. We recommend that the average query does not
|
|
|
+ exceed hundreds of GBs in size and does not take more than 5 minutes.
|
|
|
+
|
|
|
+ > As an example, let's consider 10 years of daily data with partition keys of
|
|
|
+ > `year`, `month` and `day` with total size of `10TB`. By declaring
|
|
|
+ > PartitionKeys=`[]` (none) a single query of `10TB` would run during the
|
|
|
+ > Find phase, and that may be too much to complete within the 30-minute limit of
|
|
|
+ > Athena execution time. On the other hand, using all the combinations of the
|
|
|
+ > partition keys we would have approximately `3652` queries, each being
|
|
|
+ > probably very small, and given the default Athena concurrency limit of
|
|
|
+ > `20`, it may take a very long time to execute all of them. The best option in this
|
|
|
+ > scenario is possibly the `['year','month']` combination, which would result
|
|
|
+ > in `120` queries.
|
|
|
+
|
|
|
+6. From the columns list, choose the column(s) the solution should use to
|
|
|
+ find items in the data which should be deleted. For example, if your table
|
|
|
+ has three columns named **customer_id**, **description** and **created_at**
|
|
|
+ and you want to search for items using the **customer_id**, you should choose
|
|
|
+ only the **customer_id** column from this list.
|
|
|
+7. Enter the ARN of the role for Fargate to assume when modifying objects in S3
|
|
|
+ buckets. This role should already exist if you have followed the
|
|
|
+ [Provisioning Data Access IAM Roles](#provisioning-data-access-iam-roles)
|
|
|
+ steps.
|
|
|
+8. If you do not want the solution to delete all older versions except the
|
|
|
+ latest created object version, deselect _Delete previous object versions
|
|
|
+ after update_. By default the solution will delete all previous versions
|
|
|
+ after creating a new version.
|
|
|
+9. If you want the solution to ignore Object Not Found exceptions, select
|
|
|
+ _Ignore object not found exceptions during deletion_. By default deletion
|
|
|
+ jobs will fail if any objects that are found by the Find phase don't exist in
|
|
|
+ the Delete phase. This setting can be useful if you have some other system
|
|
|
+ deleting objects from the bucket, for example S3 lifecycle policies.
|
|
|
+
|
|
|
+ Note that the solution **will not** delete old versions for these objects.
|
|
|
+ This can cause data to be **retained longer than intended**. Make sure there
|
|
|
+ is some mechanism to handle old versions. One option would be to configure
|
|
|
+ [S3 lifecycle policies] on non-current versions.
|
|
|
+
|
|
|
+10. Choose **Create Data Mapper**.
|
|
|
+11. A message is displayed advising you to update the S3 Bucket Policy for the
|
|
|
+ S3 Bucket referenced by the newly created data mapper. See
|
|
|
+ [Granting Access to Data](#granting-access-to-data) for more information on
|
|
|
+ how to do this. Choose **Return to Data Mappers**.
|
|
|
+
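The partition key trade-off described in step 5 can be checked with a little arithmetic before you create the data mapper. The sketch below counts how many queries each choice of partition keys would produce for ten years of daily data. The date range is an arbitrary example, and the average size per query assumes the `10TB` is spread evenly across partitions, which may not hold for your data.

```python
from datetime import date, timedelta


def count_queries(first_day, last_day, partition_keys):
    """Count the distinct partition-value combinations in a date range.

    Each combination corresponds to one query in the Find phase; with no
    partition keys there is a single combination, hence a single query.
    """
    combinations = set()
    day = first_day
    while day <= last_day:
        combinations.add(tuple(getattr(day, key) for key in partition_keys))
        day += timedelta(days=1)
    return len(combinations)


# Ten years of daily partitions (2010-2019 is an arbitrary example range).
first_day, last_day = date(2010, 1, 1), date(2019, 12, 31)
total_size_gb = 10 * 1024  # 10TB expressed in GB

for keys in ([], ["year"], ["year", "month"], ["year", "month", "day"]):
    queries = count_queries(first_day, last_day, keys)
    print(keys, queries, f"{total_size_gb / queries:.1f} GB per query on average")
```

For this range the counts are 1, 10, 120 and 3652 queries respectively, matching the figures in the example above.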
|
|
|
+You can also create Data Mappers directly via the API. For more information, see
|
|
|
+the [API Documentation].
|
|
|
+
|
|
|
+## Granting Access to Data
|
|
|
+
|
|
|
+After configuring a data mapper you must ensure that the S3 Find and Forget
|
|
|
+solution has the required level of access to the S3 location the data mapper
|
|
|
+refers to. The recommended way to achieve this is through the use of [S3 Bucket
|
|
|
+Policies].
|
|
|
+
|
|
|
+> **Note:** AWS IAM uses an
|
|
|
+> [eventual consistency model](https://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_general.html#troubleshoot_general_eventual-consistency)
|
|
|
+> and therefore any change you make to IAM, Bucket or KMS Key policies may take
|
|
|
+> time to become visible. Ensure you have allowed time for permissions changes
|
|
|
+> to propagate to all endpoints before starting a job. If your job fails with a
|
|
|
+> status of `FIND_FAILED` and the `QueryFailed` events indicate S3 permissions
|
|
|
+> issues, you may need to wait for the permissions changes to propagate.
|
|
|
+
|
|
|
+### Updating your Bucket Policy
|
|
|
+
|
|
|
+To update the S3 bucket policy to grant **read** access to the IAM role used by
|
|
|
+Amazon Athena, and **write** access to the Data Access IAM role used by AWS
|
|
|
+Fargate, follow these steps:
|
|
|
+
|
|
|
+1. Access the application UI via the **WebUIUrl** displayed in the _Outputs_ tab
|
|
|
+ for the stack.
|
|
|
+2. Choose **Data Mappers** from the menu then choose the radio button for the
|
|
|
+ relevant data mapper from the **Data Mappers** list.
|
|
|
+3. Choose **Generate Access Policies** and follow the instructions on the
|
|
|
+ **Bucket Access** tab to update the bucket policy. If you already have a
|
|
|
+ bucket policy in place, add the statements shown to your existing bucket
|
|
|
+ policy rather than replacing it completely. If your data is encrypted with an
|
|
|
+ **Customer Managed CMK** rather than an **AWS Managed CMK**, see
|
|
|
+ [Data Encrypted with Customer Managed CMK](#data-encrypted-with-a-customer-managed-cmk)
|
|
|
+ to grant the solution access to the Customer Managed CMK. For more
|
|
|
+ information on using Server-Side Encryption (SSE) with S3, see [Using SSE
|
|
|
+ with CMKs].
|
|
|
+
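If you manage bucket policies as JSON, the advice to add statements rather than replace the policy can be sketched as a small merge function. This is an illustration only: the statement shown under `generated_statements` is a placeholder, not the output of the **Bucket Access** tab, and the bucket name, role names and `Sid` values are hypothetical.

```python
import copy


def add_statements(existing_policy, new_statements):
    """Return a copy of existing_policy with new_statements appended.

    Statements whose Sid is already present are skipped so that running the
    merge twice does not duplicate them.
    """
    merged = copy.deepcopy(existing_policy)
    statements = merged.get("Statement", [])
    if isinstance(statements, dict):
        # A policy may hold a single statement object instead of a list.
        statements = [statements]
    seen_sids = {s["Sid"] for s in statements if "Sid" in s}
    for statement in new_statements:
        sid = statement.get("Sid")
        if sid is not None and sid in seen_sids:
            continue
        statements.append(copy.deepcopy(statement))
        if sid is not None:
            seen_sids.add(sid)
    merged["Statement"] = statements
    return merged


existing_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ExistingStatement",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:role/SomeOtherRole"},
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-data-bucket/*",
        }
    ],
}
# Placeholder for the statements copied from the Bucket Access tab.
generated_statements = [
    {
        "Sid": "ExampleAthenaRead",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::123456789012:role/ExampleAthenaRole"},
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": ["arn:aws:s3:::my-data-bucket", "arn:aws:s3:::my-data-bucket/*"],
    }
]
updated_policy = add_statements(existing_policy, generated_statements)
```

The original policy object is left untouched, so you can compare the two before applying the updated policy to the bucket.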
|
|
|
+### Data Encrypted with a Customer Managed CMK
|
|
|
+
|
|
|
+Where the data you are connecting to the solution is encrypted with a Customer
|
|
|
+Managed CMK rather than an AWS Managed CMK, you must also grant the Athena and
|
|
|
+Data Access IAM roles access to use the key so that the data can be decrypted
|
|
|
+when reading and re-encrypted when writing.
|
|
|
+
|
|
|
+Once you have updated the bucket policy as described in
|
|
|
+[Updating your Bucket Policy](#updating-your-bucket-policy), choose the **KMS
|
|
|
+Access** tab from the **Generate Access Policies** modal window and follow the
|
|
|
+instructions to update the key policy with the provided statements. The
|
|
|
+statements provided are for use when using the **policy view** in the AWS
|
|
|
+console or making updates to the key policy via the CLI, CloudFormation or the
|
|
|
+API. If you wish to use the **default view** in the AWS console, add the
|
|
|
+**Principals** in the provided statements as **key users**. For more
|
|
|
+information, see [How to Change a Key Policy].
|
|
|
+
|
|
|
+## Adding to the Deletion Queue
|
|
|
+
|
|
|
+Once your Data Mappers are configured, you can begin adding "Matches" to the
|
|
|
+[Deletion Queue](ARCHITECTURE.md#deletion-queue).
|
|
|
+
|
|
|
+1. Access the application UI via the **WebUIUrl** displayed in the _Outputs_ tab
|
|
|
+ for the stack.
|
|
|
+2. Choose **Deletion Queue** from the menu then choose **Add Match to the
|
|
|
+ Deletion Queue**.
|
|
|
+
|
|
|
+Matches can be **Simple** or **Composite**.
|
|
|
+
|
|
|
+- A **Simple** match is a value to be matched against any column identifier of
|
|
|
+ one or more data mappers. For instance a value _12345_ to be matched against
|
|
|
+ the _customer_id_ column of _DataMapperA_ or the _admin_id_ of _DataMapperB_.
|
|
|
+- A **Composite** match consists of one or more values to be matched against
|
|
|
+ specific column identifiers of a multi-column based data mapper. For instance
|
|
|
+ a tuple _John_ and _Doe_ to be matched against the _first_name_ and
|
|
|
+ _last_name_ columns of _DataMapperC_.
|
|
|
+
|
|
|
+To add a simple match:
|
|
|
+
|
|
|
+1. Choose _Simple_ as **Match Type**.
|
|
|
+2. Input a **Match**, which is the value to search for in your data mappers. If
|
|
|
+ you wish to search for the match from all data mappers choose **All Data
|
|
|
+ Mappers**, otherwise choose **Select your Data Mappers** then select the
|
|
|
+ relevant data mappers from the list.
|
|
|
+3. Choose **Add Item to the Deletion Queue** and confirm you can see the match
|
|
|
+ in the Deletion Queue.
|
|
|
+
|
|
|
+To add a composite match you need to have at least one data mapper with more
|
|
|
+than one column identifier. Then:
|
|
|
+
|
|
|
+1. Choose _Composite_ as **Match Type**.
|
|
|
+2. Select the data mapper from the list.
|
|
|
+3. Select all the columns (at least one) that you want to map to a match and
|
|
|
+ then provide a value for each of them. Empty is a valid value.
|
|
|
+4. Choose **Add Item to the Deletion Queue** and confirm you can see the match
|
|
|
+ in the Deletion Queue.
|
|
|
+
|
|
|
+You can also add matches to the Deletion Queue directly via the API. For more
|
|
|
+information, see the [API Documentation].
|
|
|
+
|
|
|
+When the next deletion job runs, the solution will scan the configured columns
|
|
|
+of your data for any occurrences of the Matches present in the queue at the time
|
|
|
+the job starts and remove any items where one of the Matches is present.
|
|
|
+
|
|
|
+If across all your data mappers you can find all items related to a single
|
|
|
+logical entity using the same value, you only need to add one Match value to the
|
|
|
+deletion queue to delete that logical entity from all data mappers.
|
|
|
+
|
|
|
+If the value used to identify a single logical entity is not consistent across
|
|
|
+your data mappers, you should add an item to the deletion queue **for each
|
|
|
+distinct value** which identifies the logical entity, selecting the specific
|
|
|
+data mapper(s) to which that value is relevant.
|
|
|
+
|
|
|
+If you make a mistake when adding a Match to the deletion queue, you can remove
|
|
|
+that match from the queue as long as there is no job running. Once a job has
|
|
|
+started no items can be removed from the deletion queue until the running job
|
|
|
+has completed. You may continue to add matches to the queue whilst a job is
|
|
|
+running, but only matches which were present when the job started will be
|
|
|
+processed by that job. Once a job completes, only the matches that job has
|
|
|
+processed will be removed from the queue.
|
|
|
+
|
|
|
+In order to facilitate different teams using a single deployment within an
|
|
|
+organisation, the same match can be added to the deletion queue more than once.
|
|
|
+When the job executes, it will merge the lists of data mappers for duplicates in
|
|
|
+the queue.
|
|
|
+
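One plausible reading of how duplicate matches are merged is sketched below. It is an illustration of the idea, not the solution's implementation: the `MatchId` and `DataMappers` field names are assumptions made for this example, and the sketch does not model matches that target all data mappers.

```python
def merge_duplicate_matches(queue_items):
    """Combine queue items sharing a match value into one entry per match.

    Returns a dict mapping each match to the ordered, de-duplicated list of
    data mappers it should be searched for in.
    """
    merged = {}
    for item in queue_items:
        mappers = merged.setdefault(item["MatchId"], [])
        for mapper in item["DataMappers"]:
            if mapper not in mappers:
                mappers.append(mapper)
    return merged


queue = [
    {"MatchId": "12345", "DataMappers": ["DataMapperA"]},  # added by one team
    {"MatchId": "12345", "DataMappers": ["DataMapperB", "DataMapperA"]},  # another team
    {"MatchId": "67890", "DataMappers": ["DataMapperB"]},
]
merged_queue = merge_duplicate_matches(queue)
```

Here the match `12345` ends up being searched for in both _DataMapperA_ and _DataMapperB_, even though neither team requested both.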
|
|
|
+## Running a Deletion Job
|
|
|
+
|
|
|
+Once you have configured your data mappers and added one or more items to the
|
|
|
+deletion queue, you can start a job.
|
|
|
+
|
|
|
+1. Access the application UI via the **WebUIUrl** displayed in the _Outputs_ tab
|
|
|
+ for the stack.
|
|
|
+2. Choose **Deletion Jobs** from the menu and ensure there are no jobs currently
|
|
|
+ running. Choose **Start a Deletion Job** and review the settings displayed on
|
|
|
+ the screen. For more information on how to edit these settings, see
|
|
|
+ [Adjusting Configuration](#adjusting-configuration).
|
|
|
+3. If you are happy with the current solution configuration choose **Start a
|
|
|
+ Deletion Job**. The job details page should be displayed.
|
|
|
+
|
|
|
+Once a job has started, you can leave the page and return to view its progress
|
|
|
+at any point by choosing the job ID from the Deletion Jobs list. The job details
|
|
|
+page will automatically refresh to display the current status and statistics
|
|
|
+for the job. For more information on the possible statuses and their meaning,
|
|
|
+see [Deletion Job Statuses](#deletion-job-statuses).
|
|
|
+
|
|
|
+You can also start jobs and check their status using the API. For more
|
|
|
+information, see the [API Documentation].
|
|
|
+
|
|
|
+Job events are continuously emitted whilst a job is running. These events are
|
|
|
+used to update the status and statistics for the job. You can view all the
|
|
|
+emitted events for a job in the **Job Events** table. Whilst a job is running,
|
|
|
+the **Load More** button will continue to be displayed even if no new events
|
|
|
+have been received. Once a job has finished, the **Load More** button will
|
|
|
+disappear once you have loaded all the emitted events. For more information on
|
|
|
+the events which can be emitted during a job, see
|
|
|
+[Deletion Job Event Types](#deletion-job-event-types).
|
|
|
+
|
|
|
+To optimise costs, it is best practice when using the solution to start jobs on
|
|
|
+a regular schedule, rather than every time a single item is added to the
|
|
|
+Deletion Queue. This is because the marginal cost of the Find phase when
|
|
|
+deleting an additional item from the queue is far less than re-executing the
|
|
|
+Find phase (where the data mappers searched are the same). Similarly, the
|
|
|
+marginal cost of removing an additional match from an object is negligible when
|
|
|
+there is already at least 1 match present in the object contents.
|
|
|
+
|
|
|
+> **Important**
|
|
|
+>
|
|
|
+> Ensure no external processes perform write/delete actions against existing
|
|
|
+> objects whilst a job is running. For more information, consult the [Limits]
|
|
|
+> guide.
|
|
|
+
|
|
|
+### Deletion Job Statuses
|
|
|
+
|
|
|
+The list of possible job statuses is as follows:
|
|
|
+
|
|
|
+- `QUEUED`: The job has been accepted but has yet to start. Jobs are started
|
|
|
+ asynchronously by a Lambda invoked by the [DynamoDB event
|
|
|
+ stream][dynamodb streams] for the Jobs table.
|
|
|
+- `RUNNING`: The job is still in progress.
|
|
|
+- `FORGET_COMPLETED_CLEANUP_IN_PROGRESS`: The job is still in progress: the Forget phase has completed with no errors and the Deletion Queue is being cleaned up.
|
|
|
+- `COMPLETED`: The job finished successfully.
|
|
|
+- `COMPLETED_CLEANUP_FAILED`: The job finished successfully; however, the deletion
|
|
|
+ queue items could not be removed. You should manually remove these or leave
|
|
|
+ them to be removed on the next job.
|
|
|
+- `FORGET_PARTIALLY_FAILED`: The job finished but it was unable to successfully
|
|
|
+ process one or more objects. The Deletion DLQ for messages will contain a
|
|
|
+ message per object that could not be updated.
|
|
|
+- `FIND_FAILED`: The job failed during the Find phase as there was an issue
|
|
|
+ querying one or more data mappers.
|
|
|
+- `FORGET_FAILED`: The job failed during the Forget phase as there was an issue
|
|
|
+ running the Fargate tasks.
|
|
|
+- `FAILED`: An unknown error occurred during the Find and Forget workflow, for
|
|
|
+ example, the Step Functions execution timed out or the execution was manually
|
|
|
+ cancelled.
|
|
|
+
|
|
|
+For more information on how to resolve statuses indicative of errors, consult
|
|
|
+the [Troubleshooting] guide.
|
|
|
+
|
|
|
+### Deletion Job Event Types
|
|
|
+
|
|
|
+The list of events is as follows:
|
|
|
+
|
|
|
+- `JobStarted`: Emitted when the deletion job state machine first starts. Causes
|
|
|
+ the status of the job to transition from `QUEUED` to `RUNNING`.
|
|
|
+- `FindPhaseStarted`: Emitted when the deletion job has purged any messages from
|
|
|
+ the query and object queues and is ready to search for data.
|
|
|
+- `FindPhaseEnded`: Emitted when all queries have executed and written their
|
|
|
+ results to the objects queue.
|
|
|
+- `FindPhaseFailed`: Emitted when one or more queries fail. Causes the status to
|
|
|
+ transition to `FIND_FAILED`.
|
|
|
+- `ForgetPhaseStarted`: Emitted when the Find phase has completed successfully
|
|
|
+ and the Forget phase is starting.
|
|
|
+- `ForgetPhaseEnded`: Emitted when the Forget phase has completed. If the Forget
|
|
|
+ phase completes with no errors, this event causes the status to transition to
|
|
|
+ `FORGET_COMPLETED_CLEANUP_IN_PROGRESS`. If the Forget phase completes but
|
|
|
+ there was an error updating one or more objects, this causes the status to
|
|
|
+ transition to `FORGET_PARTIALLY_FAILED`.
|
|
|
+- `ForgetPhaseFailed`: Emitted when there was an issue running the Fargate
|
|
|
+ tasks. Causes the status to transition to `FORGET_FAILED`.
|
|
|
+- `CleanupSucceeded`: The **final** event emitted when a job has executed
|
|
|
+ successfully and the Deletion Queue has been cleaned up. Causes the status to
|
|
|
+ transition to `COMPLETED`.
|
|
|
+- `CleanupFailed`: The **final** event emitted when the job executed
|
|
|
+ successfully but there was an error removing the processed matches from the
|
|
|
+ Deletion Queue. Causes the status to transition to `COMPLETED_CLEANUP_FAILED`.
|
|
|
+- `CleanupSkipped`: Emitted when the job is finalising and the job status is one
|
|
|
+ of `FIND_FAILED`, `FORGET_FAILED` or `FAILED`.
|
|
|
+- `QuerySucceeded`: Emitted whenever a single query executes successfully.
|
|
|
+- `QueryFailed`: Emitted whenever a single query fails.
|
|
|
+- `ObjectUpdated`: Emitted whenever an updated object is written to S3 and any
|
|
|
+ associated deletions are complete.
|
|
|
+- `ObjectUpdateFailed`: Emitted whenever an object cannot be updated, an object
|
|
|
+ version integrity conflict is detected or an associated deletion fails.
|
|
|
+- `ObjectRollbackFailed`: Emitted whenever a rollback (triggered by a detected
|
|
|
+ version integrity conflict) fails.
|
|
|
+- `Exception`: Emitted whenever a generic error occurs during the job execution.
|
|
|
+ Causes the status to transition to `FAILED`.
|
|
|
+
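The status transitions named in the event list above can be summarised as a small replay function. This is a simplified sketch for reasoning about the **Job Events** table, not the solution's own logic. It encodes only the transitions stated in the list, and it assumes that an earlier `ObjectUpdateFailed` event is what distinguishes `FORGET_PARTIALLY_FAILED` from `FORGET_COMPLETED_CLEANUP_IN_PROGRESS`.

```python
# Status transitions stated in the event list; other events leave it unchanged.
EVENT_TO_STATUS = {
    "JobStarted": "RUNNING",
    "FindPhaseFailed": "FIND_FAILED",
    "ForgetPhaseFailed": "FORGET_FAILED",
    "CleanupSucceeded": "COMPLETED",
    "CleanupFailed": "COMPLETED_CLEANUP_FAILED",
    "Exception": "FAILED",
}


def replay_status(event_names):
    """Derive a job status by replaying event names in emission order."""
    status = "QUEUED"
    object_update_failed = False
    for name in event_names:
        if name == "ObjectUpdateFailed":
            object_update_failed = True
        elif name == "ForgetPhaseEnded":
            # Assumption: an earlier ObjectUpdateFailed event is what marks
            # the Forget phase as having had an error updating an object.
            status = (
                "FORGET_PARTIALLY_FAILED"
                if object_update_failed
                else "FORGET_COMPLETED_CLEANUP_IN_PROGRESS"
            )
        elif name in EVENT_TO_STATUS:
            status = EVENT_TO_STATUS[name]
    return status


successful_job = [
    "JobStarted", "FindPhaseStarted", "QuerySucceeded", "FindPhaseEnded",
    "ForgetPhaseStarted", "ObjectUpdated", "ForgetPhaseEnded", "CleanupSucceeded",
]
failed_find = [
    "JobStarted", "FindPhaseStarted", "QueryFailed", "FindPhaseFailed",
    "CleanupSkipped",
]
```

Replaying `successful_job` ends in `COMPLETED`, while `failed_find` ends in `FIND_FAILED` because `CleanupSkipped` does not change the status.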
|
|
|
+## Adjusting Configuration
|
|
|
+
|
|
|
+There are several parameters to set when
|
|
|
+[Deploying the Solution](#deploying-the-solution) which affect the behaviour of
|
|
|
+the solution in terms of data retention and performance:
|
|
|
+
|
|
|
+- `AthenaConcurrencyLimit`: Increasing the number of concurrent queries that
|
|
|
+ should be executed will decrease the total time spent performing the Find
|
|
|
+ phase. You should not increase this value beyond your account Service Quota
|
|
|
+ for concurrent DML queries, and should ensure that the value set takes into
|
|
|
+ account any other Athena DML queries that may be executing whilst a job is
|
|
|
+ running.
|
|
|
+- `DeletionTasksMaxNumber`: Increasing the number of concurrent tasks that
|
|
|
+ should consume messages from the object queue will decrease the total time
|
|
|
+ spent performing the Forget phase.
|
|
|
+- `QueryExecutionWaitSeconds`: Decreasing this value will decrease the length of
|
|
|
+ time between each check to see whether a query has completed. You should aim
|
|
|
+ to set this to the "ceiling function" of your average query time. For example,
|
|
|
+ if your average query takes 3.2 seconds, set this to 4.
|
|
|
+- `QueryQueueWaitSeconds`: Decreasing this value will decrease the length of
|
|
|
+ time between each check to see whether additional queries can be scheduled
|
|
|
+ during the Find phase. If your jobs fail due to exceeding the Step Functions
|
|
|
+ execution history quota, you may have set this value too low and should
|
|
|
+ increase it to allow more queries to be scheduled after each check.
|
|
|
+- `ForgetQueueWaitSeconds`: Decreasing this value will decrease the length of
|
|
|
+ time between each check to see whether the Fargate object queue is empty. If
|
|
|
+ your jobs fail due to exceeding the Step Functions execution history quota,
|
|
|
+ you may have set this value too low.
|
|
|
+- `JobDetailsRetentionDays`: Changing this value will change how long
|
|
|
+ job details and events are retained for. Set this to 0 to retain them
|
|
|
+ indefinitely.
|
|
|
+
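Two of the recommendations above reduce to simple arithmetic. The sketch below shows the "ceiling function" rule for `QueryExecutionWaitSeconds` and a headroom check for `AthenaConcurrencyLimit`. The function names and the sample figures are made up for illustration; use your own measured query durations and your account's quota.

```python
import math


def recommended_query_execution_wait(query_durations_seconds):
    """Return the ceiling of the average query duration, in whole seconds."""
    if not query_durations_seconds:
        raise ValueError("at least one query duration is required")
    average = sum(query_durations_seconds) / len(query_durations_seconds)
    return math.ceil(average)


def concurrency_within_quota(athena_concurrency_limit, dml_quota, other_dml_queries=0):
    """Check the limit leaves room for other DML queries under the quota."""
    return athena_concurrency_limit + other_dml_queries <= dml_quota


# Sample durations averaging roughly 3.2 seconds.
wait_seconds = recommended_query_execution_wait([2.9, 3.5, 3.2])
```

With these sample durations the recommended wait is 4 seconds, in line with the example given for `QueryExecutionWaitSeconds`.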
|
|
|
+The values for these parameters are stored in an SSM Parameter Store String
|
|
|
+Parameter named `/s3f2/S3F2-Configuration` as a JSON object. The recommended
|
|
|
+approach for updating these values is to perform a
|
|
|
+[Stack Update](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-direct.html)
|
|
|
+and change the relevant parameters for the stack.
|
|
|
+
|
|
|
+It is possible to [update the SSM Parameter][updating an ssm parameter] directly
|
|
|
+but this is not a recommended approach. **You should not alter the structure
|
|
|
+or data types of the configuration JSON object.**
|
|
|
+
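If you do edit the parameter directly, a quick structural comparison before saving can catch accidental changes to keys or data types. The following is a minimal sketch; the JSON values shown are hypothetical and the real configuration object may contain different keys.

```python
import json


def same_structure(current, proposed):
    """Return True if proposed has the same keys and value types as current."""
    if type(current) is not type(proposed):
        return False
    if isinstance(current, dict):
        return current.keys() == proposed.keys() and all(
            same_structure(current[key], proposed[key]) for key in current
        )
    # Scalars and lists are compared by type only.
    return True


# Hypothetical parameter values; the real JSON object may contain other keys.
current_value = '{"AthenaConcurrencyLimit": 20, "QueryExecutionWaitSeconds": 3}'
good_edit = '{"AthenaConcurrencyLimit": 15, "QueryExecutionWaitSeconds": 4}'
bad_edit = '{"AthenaConcurrencyLimit": "15", "QueryExecutionWaitSeconds": 4}'

current = json.loads(current_value)
good_ok = same_structure(current, json.loads(good_edit))
bad_ok = same_structure(current, json.loads(bad_edit))
```

Here `good_ok` is `True` and `bad_ok` is `False`, because the second edit turns a number into a string.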
|
|
|
+Once updated, the configuration will affect any **future** job executions. In
|
|
|
+progress and previous executions will **not** be affected. The current
|
|
|
+configuration values are displayed when confirming that you wish to start a job.
|
|
|
+
|
|
|
+You can only update the vCPUs/memory allocated to Fargate tasks by performing a
|
|
|
+stack update. For more information, see
|
|
|
+[Updating the Solution](#updating-the-solution).
|
|
|
+
|
|
|
+## Updating the Solution
|
|
|
+
|
|
|
+To benefit from the latest features and improvements, you should update the
|
|
|
+solution deployed to your account when a new version is published. To find out
|
|
|
+what the latest version is and what has changed since your currently deployed
|
|
|
+version, check the [Changelog].
|
|
|
+
|
|
|
+How you update the solution depends on the difference between versions. If the
|
|
|
+new version is a _minor_ upgrade (for instance, from version 3.45 to 3.67) you
|
|
|
+should deploy using a CloudFormation Stack Update. If the new version is a
|
|
|
+_major_ upgrade (for instance, from 2.34 to 3.0) you should perform a manual
|
|
|
+rolling deployment.
|
|
|
+
|
|
|
+Major version releases are made in exceptional circumstances and may contain
|
|
|
+changes that break backward compatibility. Minor version releases are
|
|
|
+backward-compatible.
|
|
|
+
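The rule for choosing an update approach can be expressed as a comparison of major version numbers. This sketch assumes versions are written as `major.minor`, optionally prefixed with `v`; the function names are illustrative.

```python
def parse_version(version):
    """Split a version such as "3.45" or "v0.2" into (major, minor) integers."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def upgrade_method(deployed, target):
    """Return the update approach for moving from deployed to target."""
    deployed_major, _ = parse_version(deployed)
    target_major, _ = parse_version(target)
    if target_major != deployed_major:
        return "manual rolling deployment"
    return "CloudFormation stack update"
```

For example, `upgrade_method("3.45", "3.67")` returns `"CloudFormation stack update"` and `upgrade_method("2.34", "3.0")` returns `"manual rolling deployment"`.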
|
|
|
+### Identify current solution version
|
|
|
+
|
|
|
+You can find the version of the currently deployed solution by retrieving the
|
|
|
+`SolutionVersion` output for the solution stack. The solution version is also
|
|
|
+shown on the Dashboard of the Web UI.
|
|
|
+
|
|
|
+### Identify the Stack URL to deploy
|
|
|
+
|
|
|
+After reviewing the [Changelog], obtain the `Template Link` URL of the latest
|
|
|
+version from ["Deploying the Solution"](#deploying-the-solution) (it will be
|
|
|
+similar to
|
|
|
+`https://solution-builders-us-east-1.s3.us-east-1.amazonaws.com/amazon-s3-find-and-forget/latest/template.yaml`).
|
|
|
+If you wish to deploy a specific version rather than the latest version, replace
|
|
|
+`latest` in the URL with the chosen version, for instance
|
|
|
+`https://solution-builders-us-east-1.s3.us-east-1.amazonaws.com/amazon-s3-find-and-forget/v0.2/template.yaml`.
|
|
|
+
|
|
|
+### Minor Upgrades: Perform CloudFormation Stack Update
|
|
|
+
|
|
|
+To deploy via AWS Console:
|
|
|
+
|
|
|
+1. Open the [CloudFormation Console Page] and choose the Solution by selecting
|
|
|
+ the stack's radio button, then choose "Update".
|
|
|
+2. Choose "Replace current template" and then input the template URL for the
|
|
|
+ version you wish to deploy in the "Amazon S3 URL" textbox, then choose "Next"
|
|
|
+3. On the _Stack Details_ screen, review the Parameters and then choose "Next"
|
|
|
+4. On the _Configure stack options_ screen, choose "Next"
|
|
|
+5. On the _Review stack_ screen, you must check the boxes for:
|
|
|
+
|
|
|
+ - "_I acknowledge that AWS CloudFormation might create IAM resources_"
|
|
|
+ - "_I acknowledge that AWS CloudFormation might create IAM resources with
|
|
|
+ custom names_"
|
|
|
+ - "_I acknowledge that AWS CloudFormation might require the following
|
|
|
+ capability: CAPABILITY_AUTO_EXPAND_"
|
|
|
+
|
|
|
+ These are required to allow CloudFormation to create a Role to allow access
|
|
|
+ to resources needed by the stack and name the resources in a dynamic way.
|
|
|
+
|
|
|
+6. Choose "Update stack" to start the stack update.
|
|
|
+7. Wait for the CloudFormation stack to finish updating. Completion is indicated
|
|
|
+ when the "Stack status" is "_UPDATE_COMPLETE_".
|
|
|
+
|
|
|
+To deploy via the AWS CLI
|
|
|
+[consult the documentation](https://docs.aws.amazon.com/cli/latest/reference/cloudformation/update-stack.html).
|
|
|
+
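For a scripted minor upgrade, the console steps above map onto a single CloudFormation UpdateStack request. The sketch below only builds the request as keyword arguments and sends nothing to AWS. The stack name `S3F2` and the parameter keys passed in are placeholders, and the template URL pattern is taken from the example URL shown earlier, so confirm it against the link in the "Deploying the Solution" section.

```python
TEMPLATE_URL = (
    "https://solution-builders-us-east-1.s3.us-east-1.amazonaws.com"
    "/amazon-s3-find-and-forget/{version}/template.yaml"
)


def build_update_stack_request(stack_name, parameter_keys, version="latest"):
    """Return keyword arguments for a CloudFormation UpdateStack request."""
    return {
        "StackName": stack_name,
        "TemplateURL": TEMPLATE_URL.format(version=version),
        # Keep the existing value of every listed stack parameter.
        "Parameters": [
            {"ParameterKey": key, "UsePreviousValue": True} for key in parameter_keys
        ],
        # Programmatic equivalents of the three acknowledgement checkboxes.
        "Capabilities": [
            "CAPABILITY_IAM",
            "CAPABILITY_NAMED_IAM",
            "CAPABILITY_AUTO_EXPAND",
        ],
    }


# Hypothetical stack name and parameter keys, for illustration only.
request = build_update_stack_request(
    "S3F2", ["AthenaConcurrencyLimit", "DeletionTasksMaxNumber"], version="v0.2"
)
```

The result could be passed to `boto3.client("cloudformation").update_stack(**request)`, assuming boto3 is installed. List every parameter whose current value you want to keep, since parameters left out of the request may fall back to the template defaults.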
|
|
|
+### Major Upgrades: Manual Rolling Deployment
|
|
|
+
|
|
|
+The process for a manual rolling deployment is as follows:
|
|
|
+
|
|
|
+1. Create a new stack from scratch
|
|
|
+2. Export the data from the old stack to the new stack
|
|
|
+3. Migrate consumers to new API and Web UI URLs
|
|
|
+4. Delete the old stack.
|
|
|
+
|
|
|
+The steps for performing this process are:
|
|
|
+
|
|
|
+1. Deploy a new instance of the Solution by following the instructions contained
|
|
|
+ in the ["Deploying the Solution" section](#deploying-the-solution). Make sure
|
|
|
+ you use unique values for the Stack Name and ResourcePrefix parameters which
|
|
|
+ differ from the existing stack.
|
|
|
+2. Migrate Data from DynamoDB to ensure the new stack contains the necessary
|
|
|
+ configuration related to Data Mappers and settings. When both stacks are
|
|
|
+ deployed in the same account and region, the simplest way to migrate is via
|
|
|
+ [On-Demand Backup and Restore]. If the stacks are deployed in different
|
|
|
+ regions or accounts, you can use [AWS Data Pipeline].
|
|
|
+3. Ensure that all the bucket policies for the Data Mappers are in place for the
|
|
|
+ new stack. See the
|
|
|
+ ["Granting Access to Data" section](#granting-access-to-data) for steps to do
|
|
|
+ this.
|
|
|
+4. Review the [Changelog] for changes that may affect how you use the new
|
|
|
+ deployment. This may require you to make changes to any software you have
|
|
|
+ that interacts with the solution's API.
|
|
|
+5. Once all the consumers are migrated to the new stack (API and Web UI), delete
|
|
|
+ the old stack.
|
|
|
+
|
|
|
+## Deleting the Solution
|
|
|
+
|
|
|
+To delete a stack via AWS Console:
|
|
|
+
|
|
|
+1. Open the [CloudFormation Console Page] and choose the solution stack, then
|
|
|
+ choose "Delete".
|
|
|
+2. Once the confirmation modal appears, choose "Delete stack".
|
|
|
+3. Wait for the CloudFormation stack to finish deleting. Completion is indicated
|
|
|
+ when the "Stack status" is "_DELETE_COMPLETE_".
|
|
|
+
|
|
|
+To delete a stack via the AWS CLI
|
|
|
+[consult the documentation](https://docs.aws.amazon.com/cli/latest/reference/cloudformation/delete-stack.html).
|
|
|
+
|
|
|
+[api documentation]: api/README.md
|
|
|
+[troubleshooting]: TROUBLESHOOTING.md
|
|
|
+[fargate configuration]:
|
|
|
+ https://docs.aws.amazon.com/AmazonECS/latest/developerguide/AWS_Fargate.html#fargate-tasks-size
|
|
|
+[vpc endpoints]:
|
|
|
+ https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints.html
|
|
|
+[vpc endpoint pricing]: https://aws.amazon.com/privatelink/pricing/
|
|
|
+[cloudwatch logs pricing]: https://aws.amazon.com/cloudwatch/pricing/
|
|
|
+[dynamodb streams]:
|
|
|
+ https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
|
|
|
+[dynamodb point-in-time recovery]:
|
|
|
+ https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/PointInTimeRecovery.html
|
|
|
+[dynamodb pricing]: https://aws.amazon.com/dynamodb/pricing/on-demand/
|
|
|
+[defining glue tables]:
|
|
|
+ https://docs.aws.amazon.com/glue/latest/dg/tables-described.html
|
|
|
+[s3 bucket policies]:
|
|
|
+ https://docs.aws.amazon.com/AmazonS3/latest/dev/using-iam-policies.html
|
|
|
+[using sse with cmks]:
|
|
|
+ https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingKMSEncryption.html
|
|
|
+[customer master keys]:
|
|
|
+ https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#master_keys
|
|
|
+[how to change a key policy]:
|
|
|
+ https://docs.aws.amazon.com/kms/latest/developerguide/key-policy-modifying.html#key-policy-modifying-how-to
|
|
|
+[cross account s3 access]:
|
|
|
+ https://docs.aws.amazon.com/AmazonS3/latest/dev/example-walkthroughs-managing-access-example2.html
|
|
|
+[cross account kms access]:
|
|
|
+ https://docs.aws.amazon.com/kms/latest/developerguide/key-policy-modifying-external-accounts.html
|
|
|
+[updating an ssm parameter]:
|
|
|
+ https://docs.aws.amazon.com/systems-manager/latest/userguide/sysman-paramstore-cli.html
|
|
|
+[deploy using the aws cli]:
|
|
|
+ https://docs.aws.amazon.com/cli/latest/reference/cloudformation/deploy/index.html
|
|
|
+[cloudformation console page]:
|
|
|
+ https://console.aws.amazon.com/cloudformation/home
|
|
|
+[changelog]: ../CHANGELOG.md
|
|
|
+[on-demand backup and restore]:
|
|
|
+ https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BackupRestore.html
|
|
|
+[aws data pipeline]: https://aws.amazon.com/datapipeline
|
|
|
+[cognito advanced security]:
|
|
|
+ https://docs.aws.amazon.com/cognito/latest/developerguide/cognito-user-pool-settings-advanced-security.html
|
|
|
+[cloudfront access logging permissions]:
|
|
|
+ https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/AccessLogs.html#AccessLogsBucketAndFileOwnership
|
|
|
+[s3 access logging permissions]:
|
|
|
+ https://docs.aws.amazon.com/AmazonS3/latest/dev/enable-logging-programming.html#grant-log-delivery-permissions-general
|
|
|
+[limits]: LIMITS.md
|
|
|
+[aws cloudformation stacksets]:
|
|
|
+ https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/what-is-cfnstacksets.html
|
|
|
+[cognito console]: https://console.aws.amazon.com/cognito
|
|
|
+[managing users in user pools guide]:
|
|
|
+ https://docs.aws.amazon.com/cognito/latest/developerguide/managing-users.html
|
|
|
+[cognito rest api integration guide]:
|
|
|
+ https://docs.aws.amazon.com/apigateway/latest/developerguide/apigateway-invoke-api-integrated-with-cognito-user-pool.html
|
|
|
+[lake formation data permissions console]:
|
|
|
+ https://docs.aws.amazon.com/lake-formation/latest/dg/granting-catalog-permissions.html
|
|
|
+[exporting stack output values]:
|
|
|
+ https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-stack-exports.html
|
|
|
+[s3 lifecycle policies]:
|
|
|
+ https://docs.aws.amazon.com/AmazonS3/latest/userguide/intro-lifecycle-rules.html
|