Limits

This section describes current limitations of the Amazon S3 Find and Forget solution. We are actively working on adding additional features and supporting more data formats. For feature requests, please open an issue on our Issue Tracker.

Supported Data Formats

The following data formats are supported:

Apache Parquet


Compression on Read	Snappy, Brotli, Gzip, uncompressed
Compression on Write	Snappy
Supported Types for Column Identifier	bigint, char, decimal, double, float, int, smallint, string, tinyint, varchar. Of the nested types (types whose parent is a struct, map, or array), only types nested in a struct are supported (*).
Notes	(*) When using a type nested in a struct as a column identifier with Apache Parquet files, use Athena engine version 2. For more information, see Managing Workgroups.

JSON


Compression on Read	Gzip, uncompressed (**)
Compression on Write	Gzip, uncompressed (**)
Supported Types for Column Identifier	number, string. Of the nested types (types whose parent is an object or array), only types nested in an object are supported.
Notes	(**) The compression type is determined from the file extension. If no file extension is present, the solution treats the data as uncompressed. If the data is compressed, make sure the file name includes the compression extension, such as `gz`. When using OpenX JSON SerDe, `ignore.malformed.json` cannot be `TRUE`, `dots.in.keys` cannot be `TRUE`, and column mappings are not supported. For more information, see OpenX JSON SerDe.
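The extension rule in the note above can be illustrated with a minimal sketch. The function names `detect_compression` and `decode_body` are hypothetical and are not taken from the solution's code; the sketch also assumes that only a trailing `.gz` extension is inspected and that the match is case-insensitive.

```python
import gzip


def detect_compression(object_key: str) -> str:
    """Infer the compression type from the file extension only."""
    # Assumption: a trailing ".gz" (any letter case) means Gzip;
    # anything else, including no extension, is treated as uncompressed.
    return "gzip" if object_key.lower().endswith(".gz") else "uncompressed"


def decode_body(object_key: str, body: bytes) -> bytes:
    """Return the raw JSON bytes, decompressing when the key indicates Gzip."""
    if detect_compression(object_key) == "gzip":
        return gzip.decompress(body)
    return body
```

Under this rule, a Gzip-compressed object stored without the extension would be read as if it were plain text, which is why the file name needs to carry the compression extension.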

Supported Query Providers

The following data catalog provider and query executor combinations are supported:

Catalog Provider	Query Executor
AWS Glue	Amazon Athena

Concurrency Limits

Limit	Value
Max Concurrent Jobs	1
Max Athena Concurrency	See account service quota
Max Fargate Concurrency	See account service quota

Other Limitations

Only buckets with versioning set to Enabled are supported
Decompressed individual object size must be less than the Fargate task memory limit (DeletionTaskMemory) specified when launching the stack
S3 Objects using the GLACIER or DEEP_ARCHIVE storage classes are not supported and will be ignored
The bucket targeted by a data mapper must be in the same region as the Amazon S3 Find and Forget deployment
Client-side encrypted S3 Objects are supported only when they are encrypted using one of the AWS supported SDKs with a symmetric customer master key (CMK) stored in AWS Key Management Service (AWS KMS).
If the bucket targeted by a data mapper belongs to an account other than the account that the Amazon S3 Find and Forget Solution is deployed in, only SSE-KMS with a customer master key (CMK) may be used for encryption
To avoid race conditions when objects are processed by the solution, do not modify existing data lake objects while a Job is running. The solution will attempt to verify object integrity between read and write operations and attempt to roll back any changes if an inconsistency is detected. If the rollback fails, you will need to manually reconcile the object versions to avoid data inconsistency or loss
We recommend that you avoid running a Deletion Job in parallel to a workload that reads from the data lake unless it has been designed to handle temporary inconsistencies between objects
Buckets with MFA Delete enabled are not supported
When the Ignore object not found exceptions during deletion setting is enabled, the solution will not delete old versions for ignored objects. Make sure there is some mechanism for deleting these old versions to avoid retaining data longer than intended.
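Some of the limitations above (versioning, MFA Delete, storage class) can be checked before a Job is started. The following is a minimal sketch, not part of the solution: the helper names are hypothetical, and the `versioning` argument is assumed to be the response dictionary of the S3 `GetBucketVersioning` API (for example, from boto3's `get_bucket_versioning`).

```python
from typing import Optional

# Storage classes that the list above says are ignored by the solution.
UNSUPPORTED_STORAGE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def bucket_problems(versioning: dict) -> list:
    """List the reasons a bucket would be unsupported.

    `versioning` is assumed to look like {"Status": "Enabled", "MFADelete": "Disabled"};
    both keys can be absent when versioning was never configured.
    """
    problems = []
    if versioning.get("Status") != "Enabled":
        problems.append("versioning is not Enabled")
    if versioning.get("MFADelete") == "Enabled":
        problems.append("MFA Delete is enabled")
    return problems


def is_object_processed(storage_class: Optional[str]) -> bool:
    """Return False for objects in storage classes that would be ignored."""
    # Assumption: a missing storage class means STANDARD.
    return (storage_class or "STANDARD") not in UNSUPPORTED_STORAGE_CLASSES
```

A bucket is a candidate for a data mapper only when `bucket_problems` returns an empty list. The remaining limitations, such as region, encryption, and object size, are not covered by this sketch.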

Service Quotas

If you wish to increase the number of concurrent queries that can be run by Athena and therefore speed up the Find phase, you will need to request a Service Quota increase for Athena. For more information, consult the Athena Service Quotas page. Similarly, to increase the number of concurrent Fargate tasks and therefore speed up the Forget phase, consult the Fargate Service Quotas page. When configuring the solution, you should not set an AthenaConcurrencyLimit or DeletionTasksMaxNumber greater than the respective Service Quota for your account.
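As a rough illustration of the last point, the sketch below compares the two settings with the account quotas before deployment. The function name `quota_violations` is hypothetical, and the quota figures in the usage example are placeholders, not real defaults; look up the actual values on the Service Quotas pages for your account.

```python
def quota_violations(settings: dict, quotas: dict) -> list:
    """Return the names of settings whose value exceeds the matching quota."""
    return [
        name
        for name, value in settings.items()
        if name in quotas and value > quotas[name]
    ]


# Placeholder numbers for illustration only.
settings = {"AthenaConcurrencyLimit": 20, "DeletionTasksMaxNumber": 3}
quotas = {"AthenaConcurrencyLimit": 25, "DeletionTasksMaxNumber": 2}
violations = quota_violations(settings, quotas)
```

With these placeholder numbers, `violations` contains only `DeletionTasksMaxNumber`, so that setting would need to be lowered or the quota increased.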

Amazon S3 Find and Forget is also bound by any other service quotas which apply to the underlying AWS services that it leverages. For more information, consult the AWS docs for Service Quotas and the relevant Service Quota page for the service in question:
