Warning: Consult the Production Readiness guidelines prior to using the solution with production data
Amazon S3 Find and Forget is a solution to the need to selectively erase records from data lakes stored on Amazon Simple Storage Service (Amazon S3). This solution can assist data lake operators to handle data erasure requests, for example, pursuant to the European General Data Protection Regulation (GDPR).
The solution can be used with Parquet and JSON format data stored in Amazon S3 buckets. Your data lake is connected to the solution via AWS Glue tables and by specifying which columns in the tables need to be used to identify the data to be erased.
Once configured, you can queue the identifiers of the records whose data you want erased. You can then run a deletion job to remove the corresponding data from the objects in the data lake. A report log of all modified S3 objects is provided.
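To illustrate the idea behind a deletion job, the sketch below filters records out of a JSON Lines payload whose identifier column matches a queued value. It is a minimal, in-memory illustration only, not the solution's implementation: it does not read from or write to S3, it does not handle Parquet, and the function name `erase_matches` and the `customer_id` column are hypothetical.

```python
import json
from collections.abc import Hashable
from typing import Iterable, List, Set, Tuple


def erase_matches(lines: Iterable[str], column: str, match_ids: Set) -> Tuple[List[str], int]:
    """Hypothetical helper: drop JSON Lines records whose `column` value is queued.

    Returns the original text of the lines that were kept, plus the number of
    records that were removed. Blank lines are skipped and not returned.
    """
    kept: List[str] = []
    removed = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines rather than failing to parse them
        record = json.loads(line)
        value = record.get(column)
        # Only hashable values (strings, numbers, None) can be looked up in the set;
        # nested objects or arrays in the identifier column never count as a match.
        if isinstance(value, Hashable) and value in match_ids:
            removed += 1
        else:
            kept.append(line)  # keep the original text unchanged
    return kept, removed


# Usage: erase every record belonging to the (hypothetical) customer "123".
data = [
    '{"customer_id": "123", "item": "book"}',
    '{"customer_id": "456", "item": "pen"}',
    '{"customer_id": "123", "item": "lamp"}',
]
kept, removed = erase_matches(data, "customer_id", {"123"})
print(removed)  # 2
```

Returning the count of removed records mirrors, in a very simplified way, the idea of reporting which objects a job modified; a real job would also need to rewrite the object in place in S3.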
The solution is available as an AWS CloudFormation template and should take about 20 to 40 minutes to deploy. See the deployment guide for one-click deployment instructions, and the cost overview guide to learn about costs.
The solution provides a web user interface and a REST API that lets you integrate it into your own applications.
See the user guide to learn how to use the solution and the API specification to integrate the solution with your own applications.
The goal of the solution is to provide a secure, reliable, performant and cost-effective tool for finding and removing individual records within objects stored in S3 buckets. To achieve this goal, the solution has adopted the following design principles:
See the Architecture guide to learn more about the architecture.
Contributions are more than welcome. Please read the code of conduct and the contributing guidelines.
This project is licensed under the Apache-2.0 License.