This section describes current limitations of the Amazon S3 Find and Forget solution. We are actively working on adding additional features and supporting more data formats. For feature requests, please open an issue on our Issue Tracker.
The following data formats are supported:

Apache Parquet:

Property | Details |
---|---|
Compression on Read | Snappy, Brotli, Gzip, uncompressed |
Compression on Write | Snappy |
Supported Types for Column Identifier | bigint, char, decimal, double, float, int, smallint, string, tinyint, varchar. Nested types (types whose parent is a struct, map, or array) are only supported when the parent is a struct (*). |
Notes | (*) When using a type nested in a struct as a column identifier with Apache Parquet files, use Athena engine version 2. For more information, see Managing Workgroups. |

JSON:

Property | Details |
---|---|
Compression on Read | Gzip, uncompressed (**) |
Compression on Write | Gzip, uncompressed (**) |
Supported Types for Column Identifier | number, string. Nested types (types whose parent is an object or array) are only supported when the parent is an object. |
Notes | (**) The compression type is determined from the file extension. If no file extension is present, the solution treats the data as uncompressed. If the data is compressed, make sure the file name includes the compression extension, such as `gz`. When using the OpenX JSON SerDe, `ignore.malformed.json` cannot be `TRUE`, `dots.in.keys` cannot be `TRUE`, and column mappings are not supported. For more information, see OpenX JSON SerDe. |

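
The two JSON notes above can be illustrated with a minimal sketch. This is not the solution's own code: the function names, the extension table, and the assumption that column mappings show up as `mapping.<column>` SerDe parameters are all hypothetical and only meant to make the rules concrete.

```python
import os

# Hypothetical lookup table, limited to the compression type the notes mention.
EXTENSION_TO_COMPRESSION = {".gz": "gzip"}


def infer_compression(object_key):
    """Return "gzip" for keys ending in .gz, otherwise None (uncompressed)."""
    _, extension = os.path.splitext(object_key)
    return EXTENSION_TO_COMPRESSION.get(extension.lower())


def find_unsupported_serde_params(serde_params):
    """List OpenX JSON SerDe parameters that conflict with the notes above."""
    problems = []
    for name in ("ignore.malformed.json", "dots.in.keys"):
        # Assumption: the flag value is compared case-insensitively.
        if str(serde_params.get(name, "false")).lower() == "true":
            problems.append(name)
    # Assumption: column mappings are stored as "mapping.<column>" parameters.
    problems.extend(sorted(k for k in serde_params if k.startswith("mapping.")))
    return problems
```

For example, a key such as `data/part-0001.json` would be treated as uncompressed here, even if its bytes were actually gzip data, which is why the file name needs to carry the extension.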
The following data catalog provider and query executor combinations are supported:
Catalog Provider | Query Executor |
---|---|
AWS Glue | Amazon Athena |

The following concurrency limits apply:

Limit | Value |
---|---|
Max Concurrent Jobs | 1 |
Max Athena Concurrency | See account service quota |
Max Fargate Concurrency | See account service quota |
- … (`DeletionTaskMemory`) specified when launching the stack
- Objects in the `GLACIER` or `DEEP_ARCHIVE` storage classes are not supported
  and will be ignored

If you wish to increase the number of concurrent queries that can be executed
by Athena, and therefore speed up the Find phase, you will need to request a
Service Quota increase for Athena. For more information, consult the Athena
Service Quotas page. Similarly, to increase the number of concurrent Fargate
tasks, and therefore speed up the Forget phase, consult the Fargate Service
Quotas page. When configuring the solution, you should not set an
`AthenaConcurrencyLimit` or `DeletionTasksMaxNumber` greater than the
respective Service Quota for your account.
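
As a rough illustration of the rule above, the following sketch compares the two concurrency settings against quota values before deployment. The helper name and the quota numbers are hypothetical; the real quotas for your account would have to be looked up on the relevant Service Quotas pages.

```python
def find_settings_over_quota(settings, quotas):
    """Return the names of settings whose value exceeds the matching quota."""
    return [
        name
        for name, value in settings.items()
        if name in quotas and value > quotas[name]
    ]


# Placeholder numbers for illustration only, not real account quotas.
example_quotas = {"AthenaConcurrencyLimit": 20, "DeletionTasksMaxNumber": 100}
example_settings = {"AthenaConcurrencyLimit": 25, "DeletionTasksMaxNumber": 3}

over_quota = find_settings_over_quota(example_settings, example_quotas)
```

With these placeholder values, only `AthenaConcurrencyLimit` is flagged, so it would need to be lowered or the Athena quota raised before it takes effect.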
Amazon S3 Find and Forget is also bound by any other service quotas which apply to the underlying AWS services that it leverages. For more information, consult the AWS docs for Service Quotas and the relevant Service Quota page for the service in question: