The following metrics are important indicators of the health of the Amazon S3 Find and Forget Solution:
- AWS/SQS - ApproximateNumberOfMessagesVisible for the Object Deletion Queue
  DLQ. Any value greater than 0 for this metric indicates that one or more
  objects could not be processed during a deletion job. The job which
  triggered the message(s) to be put in the queue will have a status of
  COMPLETED_WITH_ERRORS, and the ObjectUpdateFailed event(s) will contain
  further debugging information.
- AWS/SQS - ApproximateNumberOfMessagesVisible for the Events DLQ. Any value
  greater than 0 for this metric indicates that one or more Job Events could
  not be processed.
- AWS/Athena - ProcessedBytes/TotalExecutionTime. If the average processed
  bytes and/or total execution time per query is rising, it may indicate that
  the average partition size is also growing. This is not an issue per se;
  however, if partitions grow too large (or your dataset is unpartitioned),
  you may eventually encounter Athena errors.
- AWS/States - ExecutionsFailed. Failed state machine executions indicate
  that the Amazon S3 Find and Forget solution is misconfigured. To resolve
  this, find the State Machine execution which failed and investigate the
  cause of the failure.
- AWS/States - ExecutionsTimedOut. State machine timeouts indicate that
  Amazon S3 Find and Forget is unable to complete a job before Step Functions
  kills the execution for exceeding the allowed execution time limit. See
  Troubleshooting for more details.

If required, you can create CloudWatch Alarms for any of the aforementioned metrics to be notified of potential solution misconfiguration.
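
As an illustration of alarming on the metrics listed above, here is a minimal Python sketch that builds CloudWatch alarm definitions for the two DLQ metrics and the two Step Functions metrics. The function names, alarm names, 5-minute period and choice of statistics are assumptions made for this example, not part of the solution. The queue names and state machine ARN must be taken from your own deployment. `create_alarms` relies on the third-party `boto3` package and on AWS credentials being configured.

```python
from typing import Any, Dict, List, Optional


def build_alarm(
    alarm_name: str,
    namespace: str,
    metric_name: str,
    dimension_name: str,
    dimension_value: str,
    statistic: str,
    sns_topic_arn: Optional[str] = None,
) -> Dict[str, Any]:
    """Build put_metric_alarm arguments that alarm when the metric exceeds 0."""
    params: Dict[str, Any] = {
        "AlarmName": alarm_name,
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": [{"Name": dimension_name, "Value": dimension_value}],
        "Statistic": statistic,
        "Period": 300,  # assumed 5-minute evaluation window
        "EvaluationPeriods": 1,
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no data is treated as healthy
    }
    if sns_topic_arn:
        params["AlarmActions"] = [sns_topic_arn]
    return params


def health_alarms(
    deletion_dlq_name: str,
    events_dlq_name: str,
    state_machine_arn: str,
    sns_topic_arn: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Return one alarm definition per DLQ and Step Functions health metric."""
    alarms = []
    for label, queue in (("ObjectDeletionDLQ", deletion_dlq_name),
                         ("EventsDLQ", events_dlq_name)):
        alarms.append(build_alarm(
            f"S3F2-{label}-NotEmpty", "AWS/SQS",
            "ApproximateNumberOfMessagesVisible",
            "QueueName", queue, "Maximum", sns_topic_arn))
    for metric in ("ExecutionsFailed", "ExecutionsTimedOut"):
        alarms.append(build_alarm(
            f"S3F2-{metric}", "AWS/States", metric,
            "StateMachineArn", state_machine_arn, "Sum", sns_topic_arn))
    return alarms


def create_alarms(alarms: List[Dict[str, Any]]) -> None:
    """Create the alarms in the current AWS account and region."""
    import boto3  # third-party; imported here so the builders work without it

    cloudwatch = boto3.client("cloudwatch")
    for alarm in alarms:
        cloudwatch.put_metric_alarm(**alarm)
```

Calling `create_alarms(health_alarms(...))` with your queue names, state machine ARN and, optionally, an SNS topic ARN would create four alarms. The Athena metrics are left out of this sketch because a rising average is better judged as a trend than against a fixed threshold.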
All standard metrics for the services used by the Amazon S3 Find and Forget Solution are available. For detailed information about the metrics and logging for a given service, view the relevant Monitoring docs for that service. The key services used by the solution:
1 CloudWatch Container Insights can be enabled when deploying the
solution by setting EnableContainerInsights to true. Using Container
Insights will incur additional charges. It is disabled by default.
2 To obtain Athena metrics, you will need to enable metrics for the
workgroup you are using to execute the queries as described in the Athena
docs. By default the solution uses the primary workgroup; however, you can
change this when deploying the stack using the AthenaWorkGroup parameter.
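
The two notes above could be put into practice roughly as follows. This is a sketch under stated assumptions: the helper names are hypothetical, the "false" value used when Container Insights is disabled is assumed (the text only documents setting it to true), and the parameter list is written in the shape accepted by the `Parameters` argument of CloudFormation's `create_stack` and `update_stack` calls in `boto3`. Enabling workgroup metrics uses the Athena `update_work_group` call and requires `boto3` and AWS credentials.

```python
from typing import Dict, List


def monitoring_stack_parameters(
    enable_container_insights: bool = False,
    athena_workgroup: str = "primary",
) -> List[Dict[str, str]]:
    """Build the monitoring-related CloudFormation parameters for a deployment."""
    return [
        {
            "ParameterKey": "EnableContainerInsights",
            # "false" is an assumed value; the docs only mention "true".
            "ParameterValue": "true" if enable_container_insights else "false",
        },
        {"ParameterKey": "AthenaWorkGroup", "ParameterValue": athena_workgroup},
    ]


def enable_workgroup_metrics(workgroup: str = "primary") -> None:
    """Turn on CloudWatch metric publishing for an Athena workgroup."""
    import boto3  # third-party; imported here so the builder works without it

    athena = boto3.client("athena")
    athena.update_work_group(
        WorkGroup=workgroup,
        ConfigurationUpdates={"PublishCloudWatchMetricsEnabled": True},
    )
```

For example, `monitoring_stack_parameters(True, "analytics")` yields the parameters for a deployment with Container Insights enabled and queries run in a workgroup named `analytics` (a made-up name), after which `enable_workgroup_metrics("analytics")` would make the Athena metrics available.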