
Fully managed message queuing that scales to virtually unlimited throughput and decouples distributed systems with precision
Amazon Simple Queue Service (SQS) is a fully managed message queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications. It eliminates the complexity and overhead of managing and operating message-oriented middleware, allowing producers and consumers to operate independently at their own pace. SQS offers two queue types — Standard (at-least-once, best-effort ordering) and FIFO (exactly-once processing, strict ordering) — each optimized for different workload requirements.
Asynchronous message buffering between application components to prevent data loss, absorb traffic spikes, and enable independent scaling of producers and consumers
Use When
Avoid When
Standard Queue (At-Least-Once Delivery)
Best-effort ordering, nearly unlimited throughput, messages may be delivered more than once — requires idempotent consumers
FIFO Queue (Exactly-Once Processing)
Strict message ordering per message group ID, exactly-once processing within 5-minute deduplication window, 300 TPS (3,000 with batching)
Dead Letter Queue (DLQ)
Automatically routes failed messages after maxReceiveCount attempts; DLQ must match source queue type (Standard→Standard, FIFO→FIFO)
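A hedged sketch of wiring this up: the RedrivePolicy queue attribute is a JSON string holding deadLetterTargetArn and maxReceiveCount, passed to set_queue_attributes. The ARN and count below are illustrative placeholders.

```python
import json

def redrive_attributes(dlq_arn: str, max_receive_count: int = 5) -> dict:
    """Build the SQS queue attributes that attach a dead-letter queue.

    After max_receive_count failed receives, SQS moves the message to
    the queue identified by dlq_arn (which must be the same type as
    the source queue: Standard->Standard, FIFO->FIFO).
    """
    return {
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": str(max_receive_count),
        })
    }

# Pass the result to sqs.set_queue_attributes(QueueUrl=..., Attributes=attrs)
attrs = redrive_attributes("arn:aws:sqs:us-east-1:123456789012:orders-dlq", 3)
```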
Long Polling
WaitTimeSeconds 1-20; reduces empty responses, API costs, and CPU usage on consumers; strongly recommended over short polling
Short Polling
Default behavior (WaitTimeSeconds=0); queries a subset of servers, may return empty responses even when messages exist — not recommended for production
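The long- vs short-polling difference comes down to one parameter. A minimal boto3-style sketch: the dict below holds the keyword arguments you would pass to sqs.receive_message (the queue URL is hypothetical).

```python
def receive_params(queue_url: str, long_poll: bool = True) -> dict:
    """Keyword arguments for sqs.receive_message.

    WaitTimeSeconds=20 enables long polling: the call returns as soon
    as a message arrives, or after 20 s if the queue stays empty.
    WaitTimeSeconds=0 (short polling, the default) samples only a
    subset of SQS servers and may return empty even when messages exist.
    """
    return {
        "QueueUrl": queue_url,
        "MaxNumberOfMessages": 10,  # up to 10 messages per receive call
        "WaitTimeSeconds": 20 if long_poll else 0,
    }
```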
Delay Queues
Postpone delivery of ALL new messages by 0-900 seconds (15 minutes) at the queue level
Message Timers (Per-Message Delay)
Individual message delay 0-900 seconds; overrides queue-level delay for that specific message
Visibility Timeout
Hides message from other consumers during processing; default 30 seconds, max 12 hours; extendable via ChangeMessageVisibility
Server-Side Encryption (SSE) with KMS
Encrypts message body at rest using AWS KMS CMKs; message attributes are NOT encrypted by SSE — store sensitive metadata in the message body
SSE with SQS-managed keys (SSE-SQS)
No additional KMS cost; uses SQS-managed encryption keys; less control than KMS but simpler and cheaper
VPC Endpoints (Interface Endpoints via PrivateLink)
Access SQS from within a VPC without traversing the public internet; required for compliance-sensitive workloads
Resource-Based Policies (Queue Policies)
Control who can send/receive messages; required for cross-account access and SNS→SQS delivery
Lambda Event Source Mapping (ESM)
Lambda polls SQS on your behalf; automatically scales concurrency based on queue depth; supports batch size 1-10,000 for Standard, 1-10 for FIFO
Message Deduplication (FIFO only)
Content-based deduplication (SHA-256 hash of body) or explicit MessageDeduplicationId; 5-minute deduplication window
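Content-based deduplication can be illustrated in a few lines. This mimics what SQS does internally (a SHA-256 hash of the body becomes the implicit MessageDeduplicationId); it is a sketch of the concept, not the service's actual code path.

```python
import hashlib

def content_dedup_id(body: str) -> str:
    """FIFO content-based deduplication: SQS derives the implicit
    MessageDeduplicationId from a SHA-256 hash of the message body.
    Within the 5-minute window, identical bodies collapse to one message."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()
```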
Message Group ID (FIFO only)
Messages with the same MessageGroupId are processed in strict FIFO order; different group IDs can be processed in parallel by different consumers
Temporary Queues (Virtual Queues)
Client-side feature using the Temporary Queue Client library; enables request-response pattern without creating actual SQS queues for each request
SQS Extended Client Library
Stores large payloads (up to 2 GB) in S3 and sends S3 reference in SQS message; transparent to consumers using the same library
CloudWatch Metrics Integration
ApproximateNumberOfMessagesVisible, ApproximateAgeOfOldestMessage, NumberOfMessagesSent — use these for auto-scaling triggers and alarms
Cross-Account Queue Access
Requires queue policy (resource-based) to grant permissions to the external account; IAM policy alone is insufficient for cross-account
Message Filtering (via SNS Subscription Filter Policies)
SNS filters messages before delivering to SQS — SQS itself does not natively filter; filtering happens at the SNS layer in SNS→SQS fan-out
Batch Operations (Send, Receive, Delete)
Up to 10 messages per batch API call; reduces API calls and cost by up to 10x; critical for cost optimization
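A small helper, assuming plain string bodies, that slices a message list into batches of at most 10 for send_message_batch (each entry needs a batch-unique Id):

```python
def batch_entries(messages, batch_size=10):
    """Yield send_message_batch entry lists, max 10 entries per call.

    Batching turns up to 10 messages into one API request, cutting
    API call count (and cost) by up to 10x.
    """
    for start in range(0, len(messages), batch_size):
        chunk = messages[start:start + batch_size]
        yield [
            {"Id": str(start + i), "MessageBody": body}
            for i, body in enumerate(chunk)
        ]

# Each yielded list goes to sqs.send_message_batch(QueueUrl=..., Entries=batch)
```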
At-Rest Encryption
Via SSE-SQS (free) or SSE-KMS (KMS charges apply per API call)
In-Transit Encryption
HTTPS (TLS) enforced via queue policy condition aws:SecureTransport
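A sketch of such a policy document; the ARN is a placeholder. Denying all sqs actions when aws:SecureTransport is false forces TLS for every caller:

```python
import json

def deny_insecure_transport(queue_arn: str) -> str:
    """Queue policy that denies any SQS action over plain HTTP."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Principal": "*",
            "Action": "sqs:*",
            "Resource": queue_arn,
            # Matches requests that did NOT arrive over TLS
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    })
```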
High Throughput FIFO Mode
Increases FIFO throughput beyond standard limits; check current Service Quotas for exact values as limits have evolved
Lambda Event Source Mapping (ESM) — Serverless Queue Consumer
high freq: Lambda polls SQS queues automatically via ESM. Lambda scales concurrency based on queue depth (up to 1,000 concurrent executions for Standard queues). Failed batches can be sent to a DLQ or Lambda destination. Batch size 1-10,000 (Standard) or 1-10 (FIFO). Use ReportBatchItemFailures to partially succeed a batch — only failed messages return to the queue. Critical: set the queue's visibility timeout greater than the Lambda function timeout (AWS recommends at least 6x) to prevent duplicate processing.
SNS Fan-Out to Multiple SQS Queues
high freq: One SNS topic delivers messages to multiple SQS queues simultaneously. Each queue can have different consumers, enabling parallel independent processing of the same event. Use SNS Subscription Filter Policies to route specific message subsets to specific queues. The queue policy must allow the SNS topic to call sqs:SendMessage. This is the canonical decoupling + fan-out pattern on AWS exams.
Queue-Based Auto Scaling
high freq: Use the ApproximateNumberOfMessagesVisible CloudWatch metric to trigger Auto Scaling policies. Scale out EC2 instances when queue depth grows; scale in when queue drains. Target Tracking Scaling with a custom metric (messages per instance) is the recommended approach. This pattern decouples traffic spikes from processing capacity and is a core SAA-C03 scenario.
S3 Event Notifications → SQS
high freq: S3 sends event notifications (ObjectCreated, ObjectRemoved, etc.) directly to an SQS queue. The queue buffers events for reliable processing (a Standard queue gives best-effort ordering only). The queue policy must allow S3 to send messages. Use this pattern when you need to buffer S3 events before processing rather than triggering Lambda directly — it provides durability and retry capability.
SQS as Step Functions Task Input / Wait for Callback
high freq: Step Functions can send messages to SQS and pause execution using .waitForTaskToken integration. A worker picks up the SQS message, processes it, and calls SendTaskSuccess/SendTaskFailure with the task token to resume the workflow. This enables human approval workflows and long-running external integrations within Step Functions state machines.
EventBridge → SQS Dead Letter / Buffering
high freq: EventBridge rules can target SQS queues directly, buffering events for downstream consumers. SQS acts as a reliable buffer when the downstream service cannot keep up with EventBridge event rates. Configure a DLQ on the EventBridge rule target (not just on SQS) to capture events that EventBridge fails to deliver — these are two separate DLQ configurations that serve different failure modes.
Queue Depth Monitoring and Alerting
high freq: Monitor ApproximateNumberOfMessagesVisible for queue backlog, ApproximateAgeOfOldestMessage for processing latency SLA violations, and NumberOfMessagesSent vs NumberOfMessagesDeleted for throughput balance. Set CloudWatch Alarms on these metrics for operational visibility. ApproximateAgeOfOldestMessage is the most important metric for detecting stuck consumers or DLQ buildup.
Write Buffer / Database Offload Pattern
high freq: Producers write to SQS instead of directly to a database. Consumers read from SQS and write to RDS/DynamoDB at a controlled rate. This prevents database overload during traffic spikes, enables retry on database failures, and smooths write throughput. Common in e-commerce, gaming leaderboards, and IoT data ingestion architectures.
Standard vs FIFO Decision Framework: If the question mentions 'exactly-once', 'ordered', 'deduplication', 'financial transactions', or 'inventory' → FIFO. If it mentions 'maximum throughput', 'best effort', or 'at-least-once' → Standard. FIFO queue names MUST end in .fifo.
Visibility Timeout is NOT the same as message retention. Visibility timeout hides a message from other consumers while one consumer processes it (default 30s, max 12h). Retention is how long messages stay in the queue before being deleted (default 4 days, max 14 days). Confusing these two settings is the #1 SQS exam trap.
DLQ must match queue type: Standard queue → Standard DLQ. FIFO queue → FIFO DLQ. If you configure a FIFO source with a Standard DLQ, it will fail. This is tested in DVA-C02 and DOP-C02.
For Lambda + SQS: If Lambda fails to process a batch, the ENTIRE batch returns to the queue (unless you use ReportBatchItemFailures). To avoid reprocessing successful messages in a failed batch, implement partial batch success reporting — this is a DVA-C02 favorite.
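A minimal handler shape for partial batch responses. The process function and its poison-message rule are hypothetical stand-ins for real business logic; the batchItemFailures response format is the documented contract for ReportBatchItemFailures.

```python
def handler(event, context=None):
    """Lambda handler using partial batch responses.

    With ReportBatchItemFailures enabled on the event source mapping,
    only the message IDs listed in batchItemFailures return to the
    queue; the rest of the batch is deleted as successfully processed.
    """
    failures = []
    for record in event["Records"]:
        try:
            process(record["body"])  # real business logic goes here
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}

def process(body):
    # Hypothetical stand-in: fail on bodies marked "poison"
    if body == "poison":
        raise ValueError("cannot process")
```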
Long polling (WaitTimeSeconds=1-20) is ALWAYS better than short polling in production. Short polling is the default and queries only a subset of SQS servers, which can return empty responses even when messages exist. Long polling waits up to 20 seconds for a message, reducing empty responses and API costs.
FIFO DLQ must be FIFO. Standard DLQ must be Standard. You cannot cross queue types for Dead Letter Queues. Also: FIFO queues require the .fifo suffix in their name, including the DLQ.
For Lambda + SQS, set the SQS visibility timeout GREATER than the Lambda function timeout (AWS recommends at least 6x), or extend it with ChangeMessageVisibility for long-running work. If processing takes longer than the visibility timeout, the message reappears in the queue and gets processed again — causing duplicates even in FIFO queues.
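AWS documents a 6x multiplier for event source mappings: set the queue's visibility timeout to at least six times the function timeout. A tiny sizing helper, assuming that rule (adding the batch window as extra margin is this sketch's assumption, not official guidance):

```python
def recommended_visibility_timeout(function_timeout_s: int,
                                   batch_window_s: int = 0) -> int:
    """Minimum visibility timeout for a Lambda SQS event source mapping:
    at least 6x the function timeout, so in-flight messages do not
    reappear mid-invocation; batch window added as a safety margin."""
    return 6 * function_timeout_s + batch_window_s
```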
Message size > 256 KB: Use the SQS Extended Client Library. Store the payload in S3, send the S3 object reference in the SQS message. Both producer and consumer must use the Extended Client Library. Maximum payload via this pattern is 2 GB (stored in S3).
SNS → SQS Fan-Out requires a Queue Policy (resource-based policy) that explicitly allows SNS to call sqs:SendMessage on the queue. IAM role on the SNS topic alone is NOT sufficient — the queue must also grant permission. Cross-account fan-out requires both IAM and queue policy.
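A sketch of the required queue policy; both ARNs are placeholders. The ArnEquals condition scopes delivery to exactly one topic:

```python
import json

def allow_sns_send(queue_arn: str, topic_arn: str) -> str:
    """Queue policy granting one SNS topic permission to deliver.
    Without this resource-based grant, SNS -> SQS fan-out fails."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            # Only this specific topic may deliver to the queue
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    })
```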
FIFO Message Group ID controls parallelism: messages with the SAME group ID are processed sequentially by ONE consumer. Messages with DIFFERENT group IDs can be processed in parallel by different consumers. Use multiple group IDs to increase FIFO throughput while maintaining per-group ordering.
Pricing gotcha: SQS charges per 64 KB chunk, not per message. A single 256 KB message = 4 billing requests. Always batch messages (up to 10 per API call) to minimize API call count and reduce costs. Batch operations are billed as one request per 64 KB chunk of the total batch payload.
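The chunk math in one helper:

```python
import math

CHUNK = 64 * 1024  # SQS bills per 64 KB chunk of payload

def billed_requests(message_bytes: int) -> int:
    """Billed requests for one message: one per started 64 KB chunk.
    A 256 KB message counts as 4 requests; a 1 KB message as 1."""
    return max(1, math.ceil(message_bytes / CHUNK))
```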
For queue-based auto scaling of EC2, use the custom CloudWatch metric 'ApproximateNumberOfMessagesVisible / number of running instances' as the target for Target Tracking Scaling. This ensures each instance processes a roughly equal share of the queue. This is the canonical answer for 'how do you auto-scale consumers based on SQS queue depth'.
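The backlog-per-instance metric is simple arithmetic; publishing it to CloudWatch (e.g. via put_metric_data) and attaching the Target Tracking policy are left out of this sketch:

```python
def backlog_per_instance(visible_messages: int, running_instances: int) -> float:
    """Custom metric for Target Tracking Scaling: queue depth divided
    by instance count. Track a target of (acceptable latency in s) x
    (messages one instance processes per s) so each instance carries
    a roughly equal share of the backlog."""
    return visible_messages / max(running_instances, 1)  # avoid div by zero
```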
Delay Queue vs Message Timer vs Visibility Timeout: Delay Queue delays ALL new messages (queue setting, 0-15 min). Message Timer delays ONE specific message (per-message, 0-15 min). Visibility Timeout hides a message AFTER it has been received (not a delivery delay). These three are commonly confused in scenario questions.
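A sketch of a per-message timer, assuming a plain string body; DelaySeconds is the real SendMessage parameter and 0-900 is its documented range:

```python
def send_params(queue_url: str, body: str, delay_s: int = 0) -> dict:
    """Keyword arguments for sqs.send_message with a message timer.

    DelaySeconds (0-900) postpones delivery of this one message and
    overrides any queue-level delay. Visibility timeout, by contrast,
    only applies AFTER a consumer has received the message.
    """
    if not 0 <= delay_s <= 900:
        raise ValueError("DelaySeconds must be 0-900")
    return {"QueueUrl": queue_url, "MessageBody": body,
            "DelaySeconds": delay_s}
```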
SQS is NOT a streaming service. It does not support replay/reprocessing of consumed messages. If you need to reprocess historical events, use Amazon Kinesis Data Streams (retention up to 365 days with extended retention). SQS messages are permanently deleted after successful consumption.
SSE with KMS: message attributes are NOT encrypted, only the message body is. If you need to encrypt sensitive metadata, include it in the message body, not as message attributes. SSE-SQS (SQS-managed keys) is free and encrypts the body — use KMS only when you need key rotation control or audit trails.
Common Mistake
FIFO queues guarantee exactly-once delivery forever — once a message is sent, it will never be duplicated under any circumstance
Correct
FIFO exactly-once deduplication only applies within the 5-minute deduplication window. After 5 minutes, a message with the same MessageDeduplicationId is treated as a NEW message and WILL be delivered again. Additionally, if you don't provide a deduplication ID and don't enable content-based deduplication, FIFO provides no deduplication at all.
Exam questions test whether you understand the 5-minute window constraint. If a scenario describes a producer retrying after a network timeout that lasted longer than 5 minutes, FIFO will not deduplicate — the consumer must be idempotent regardless.
Common Mistake
A Dead Letter Queue (DLQ) is a performance optimization — it speeds up queue processing by removing slow messages
Correct
A DLQ is a RELIABILITY and OBSERVABILITY feature, not a performance tool. Its purpose is to capture messages that repeatedly fail processing (exceeding maxReceiveCount) so they can be inspected, debugged, and reprocessed — preventing 'poison pill' messages from blocking or endlessly cycling in the source queue. DLQs do not improve throughput.
This misconception appears directly in exam questions. The correct framing is: DLQ = failure isolation + debugging capability + prevents infinite retry loops. Always pair a DLQ with CloudWatch alarms on the DLQ depth to detect processing failures proactively.
Common Mistake
Setting a long visibility timeout is always safe because it prevents duplicate processing
Correct
An excessively long visibility timeout means that if a consumer crashes or is terminated mid-processing, the message remains hidden from other consumers for the entire timeout duration — causing significant processing delays. The correct approach is to set visibility timeout slightly longer than your expected processing time, and use ChangeMessageVisibility to extend it dynamically if processing takes longer than expected.
There is a direct tradeoff: too short → duplicates; too long → delayed recovery from consumer failures. The exam tests whether you understand this balance and know that ChangeMessageVisibility is the mechanism for dynamic extension.
Common Mistake
SQS Standard queues deliver messages in the order they were sent
Correct
Standard queues offer 'best-effort ordering' — messages are generally delivered in the order sent, but this is NOT guaranteed. Messages can arrive out of order, and the same message can be delivered more than once (at-least-once delivery). If ordering and exactly-once delivery matter, you MUST use a FIFO queue.
This is one of the most common wrong answers on scenario questions. When a question describes a use case requiring strict ordering (financial ledger, inventory updates), Standard queue is always wrong — even if it seems simpler or cheaper.
Common Mistake
You can use a Standard SQS queue as the Dead Letter Queue for a FIFO SQS queue
Correct
The DLQ must be the SAME TYPE as the source queue. A FIFO queue's DLQ must be a FIFO queue (with .fifo suffix). A Standard queue's DLQ must be a Standard queue. Mixing types is not supported and will result in configuration errors.
This is a specific, testable constraint in DVA-C02 and DOP-C02. Remember: FIFO → FIFO DLQ, Standard → Standard DLQ. The .fifo suffix requirement for the DLQ name is also tested.
Common Mistake
Long polling always waits the full 20 seconds before returning, making it slower than short polling for time-sensitive applications
Correct
Long polling returns IMMEDIATELY when a message is available — it only waits up to the configured WaitTimeSeconds if no messages are present. If messages are in the queue, long polling returns them instantly, just like short polling. Long polling is strictly better than short polling in virtually all production scenarios.
This misconception causes candidates to incorrectly recommend short polling for 'low latency' scenarios. Long polling has equal or better latency when messages are present, and dramatically lower cost and CPU usage when the queue is empty.
Common Mistake
Configuring a DLQ on an SQS queue is sufficient to capture all failed events in an EventBridge → SQS → Lambda pipeline
Correct
In an EventBridge → SQS → Lambda pipeline, there are THREE potential failure points, each requiring its own DLQ: (1) EventBridge rule target DLQ — captures events EventBridge fails to deliver to SQS; (2) SQS DLQ — captures messages Lambda fails to process after maxReceiveCount attempts; (3) Lambda destination (on-failure) — captures Lambda execution failures. Configuring only the SQS DLQ misses EventBridge delivery failures.
This is a critical architectural gap that appears in DOP-C02 and SAP-C02 questions. Each service layer in an event-driven pipeline can fail independently and needs its own failure capture mechanism.
Common Mistake
SQS automatically scales Lambda to handle any volume of messages without any configuration
Correct
While Lambda ESM does automatically scale Lambda concurrency based on SQS queue depth, there are limits: Lambda concurrency is bounded by the account-level concurrency limit and any function-level reserved concurrency. If Lambda hits its concurrency limit, messages remain in the queue (not lost) but processing is delayed. You must monitor Lambda throttling and set appropriate reserved concurrency and queue visibility timeout to avoid cascading failures.
Exam questions test whether you understand that Lambda auto-scaling has limits and that throttled Lambda invocations cause messages to return to the queue once the visibility timeout expires (incrementing their receive count), not to be lost — the result is processing delays and, eventually, DLQ entries if maxReceiveCount is exceeded.
FIFO = Finance, Inventory, Financial, Orders — use FIFO when ordering and exactly-once processing matter in these domains
DLQ = Dead Letter = Debugging + Logging + Quarantine — it's for investigation, not speed
Visibility Timeout vs Retention: 'V' for Visibility = Vanishes temporarily (hidden during processing). 'R' for Retention = Really gone after expiry (deleted from queue).
Long Polling: 'WAIT for it' — waits up to 20s if empty, returns INSTANTLY if messages exist. Never slower, always cheaper.
Standard = Scatter (any order, any number of times). FIFO = First In, First Out (strict order, once per 5-min window).
Pricing chunks: 64 KB = 1 request. 256 KB message = 4 requests. Think of it as '64 KB slices of pizza — you pay per slice, not per pizza.'
SNS → SQS fan-out: 'The queue must INVITE SNS' — queue policy must grant sqs:SendMessage to SNS, or the delivery fails silently.
CertAI Tutor · SAA-C03, SAP-C02, DVA-C02, DEA-C01, DOP-C02, CLF-C02 · 2026-02-21