Understanding SQS Retry Behavior with Visibility Timeout and Max Receive Count

2025-08-26

When building resilient, event-driven applications with Amazon Simple Queue Service (SQS), it is crucial to understand and correctly configure the retry mechanism. When a consumer fails to process a message, SQS makes the message available again so that it can be retried. This behavior is controlled by two key settings: the Visibility Timeout and the Max Receive Count (part of the Redrive policy).

Let's explore how these settings work together to create a robust retry strategy.

Core Configuration Settings

Here is an overview of the essential SQS queue settings that dictate its retry behavior.

1. Visibility Timeout

The Visibility Timeout is the cornerstone of SQS message processing. When a consumer picks up a message from the queue, that message becomes "invisible" to other consumers for the duration of the visibility timeout. This prevents multiple consumers from processing the same message simultaneously.

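In our example, the Visibility Timeout is set to 2 minutes. The timer starts when the message is received, not when processing fails: if the consumer does not delete the message within that window, SQS makes it visible again roughly 2 minutes after the receive, and it can then be picked up for another attempt.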
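To make the receive/delete contract concrete, here is a minimal sketch of a generic polling consumer. It is not the Lambda setup used later in this article (with Lambda, the SQS event source handles receiving and deleting for you). The function name `poll_once` and the `handler` callback are hypothetical; `sqs` is assumed to behave like a boto3 SQS client, whose `receive_message` and `delete_message` calls are used here.

```python
from typing import Any, Callable


def poll_once(sqs: Any, queue_url: str, handler: Callable[[str], None]) -> int:
    """Receive one batch and delete only the messages that were handled.

    `sqs` is assumed to behave like a boto3 SQS client (hypothetical wiring).
    Returns the number of messages deleted.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # long polling
    )
    deleted = 0
    for message in response.get("Messages", []):
        try:
            handler(message["Body"])
        except Exception:
            # Failure: do not delete. The message stays invisible until its
            # visibility timeout expires, then becomes receivable again.
            continue
        # Success: delete the message so it is not delivered again.
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
        deleted += 1
    return deleted
```

The retry is implicit: the consumer never asks for a retry, it simply declines to delete the message and lets the visibility timeout run out.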

2. Max Receive Count

The Max Receive Count is part of the queue's "Redrive policy." It defines the maximum number of times a message can be received (i.e., attempted) by consumers. If the receive count for a message exceeds this number, SQS will move the message to a designated Dead-Letter Queue (DLQ).

This is a critical safety mechanism to prevent "poison pill" messages (messages that consistently fail to process) from getting stuck in an infinite retry loop, which could clog your queue and waste resources.

In this configuration, the Max Receive Count is set to 3. A message will be attempted a total of three times before being moved to the DLQ.
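As a rough sketch, the two settings described above could be applied with boto3 as follows. The helper names, the queue URL, and the DLQ ARN are placeholders rather than values from this article; the attribute names `VisibilityTimeout` and `RedrivePolicy` and the keys `deadLetterTargetArn` and `maxReceiveCount` are the ones SQS uses for queue attributes.

```python
import json


def retry_attributes(
    dlq_arn: str,
    visibility_timeout_s: int = 120,  # 2 minutes, as in this article
    max_receive_count: int = 3,       # as in this article
) -> dict:
    """Build the queue attributes that control retry behavior."""
    return {
        # SQS queue attribute values are passed as strings.
        "VisibilityTimeout": str(visibility_timeout_s),
        # The redrive policy is itself a JSON document inside a string.
        "RedrivePolicy": json.dumps(
            {
                "deadLetterTargetArn": dlq_arn,
                "maxReceiveCount": max_receive_count,
            }
        ),
    }


def apply_retry_attributes(queue_url: str, dlq_arn: str) -> None:
    """Apply the attributes to an existing queue (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so retry_attributes works without boto3

    sqs = boto3.client("sqs")
    sqs.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes=retry_attributes(dlq_arn),
    )
```

Keeping the attribute builder separate from the AWS call makes it easy to check the values before touching a real queue.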

Observing Retry Behavior in Logs

Let's analyze the Lambda execution logs to see these settings in action. The logs show a message being processed three times, with a consistent interval between each attempt.


timestamp,message
1756176908995,START RequestId: 367aa8d0-b575-5a72-8221-c9f79135dcc3 Version: $LATEST
1756176909833,level=error msg="Error on send mt receive request..."
1756176910078,{"errorMessage":"failed jobs: [{ItemIdentifier:ece84c9e-5555-4e44-b131-f6d74efa5e53}]","errorType":"errorString"}
...
1756177028635,START RequestId: d69a9bf8-c254-50cb-8f13-fadfe6ee4d07 Version: $LATEST
1756177028820,level=error msg="Error on send mt receive request..."
1756177028940,{"errorMessage":"failed jobs: [{ItemIdentifier:ece84c9e-5555-4e44-b131-f6d74efa5e53}]","errorType":"errorString"}
...
1756177148645,START RequestId: 759f2721-238e-5047-9efc-460606bc5d8b Version: $LATEST
1756177148823,level=error msg="Error on send mt receive request..."
1756177148938,{"errorMessage":"failed jobs: [{ItemIdentifier:ece84c9e-5555-4e44-b131-f6d74efa5e53}]","errorType":"errorString"}
    

Log Analysis:

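All three invocations fail on the same message (ItemIdentifier ece84c9e-5555-4e44-b131-f6d74efa5e53). The START timestamps are in epoch milliseconds, and consecutive attempts are 119.64 and 120.01 seconds apart, which matches the 2-minute Visibility Timeout.

Since the Max Receive Count is 3, the message is not delivered to the consumer a fourth time after this third failed attempt. Once its visibility timeout expires, SQS moves it to the configured Dead-Letter Queue, where it can be inspected and handled manually.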
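The spacing between attempts can be checked directly from the START timestamps in the log excerpt above. The short script below is only an illustration; the function name `intervals_seconds` is made up for this example.

```python
# START timestamps (epoch milliseconds) copied from the log excerpt above.
start_ms = [1756176908995, 1756177028635, 1756177148645]


def intervals_seconds(timestamps_ms):
    """Return the gap in seconds between each pair of consecutive timestamps."""
    return [
        (later - earlier) / 1000
        for earlier, later in zip(timestamps_ms, timestamps_ms[1:])
    ]


print(intervals_seconds(start_ms))  # [119.64, 120.01]
```

Both gaps are within half a second of the configured 120-second Visibility Timeout.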

Conclusion

By combining Visibility Timeout and Max Receive Count, you gain fine-grained control over your message processing and retry strategy. The Visibility Timeout sets the delay between retries, giving transient issues time to resolve, while the Max Receive Count acts as a fail-safe to isolate problematic messages, ensuring the health and stability of your system.