
Massively scalable, durable, real-time data streaming — built for millisecond ingestion and replay
Amazon Kinesis Data Streams (KDS) is a fully managed, serverless (on-demand mode) or provisioned real-time data streaming service capable of capturing gigabytes of data per second from hundreds of thousands of sources. Data is stored durably for up to 365 days and can be replayed, enabling multiple independent consumers to process the same stream at different times. Unlike message queues, KDS preserves ordering within shards and supports fan-out to multiple concurrent consumers without message deletion.
Ingest and process high-throughput, ordered, real-time data streams with durability and multi-consumer fan-out — the backbone of event-driven analytics pipelines on AWS.
Provisioned Shards Mode
Manual shard management; predictable cost; use when throughput is known and stable
On-Demand Mode
Auto-scales shards; no capacity planning; higher per-GB cost; ideal for variable traffic
Enhanced Fan-Out (EFO)
HTTP/2 push delivery; 2 MB/sec dedicated per consumer per shard; additional cost per shard-hour per consumer
Server-Side Encryption (SSE)
AWS KMS encryption at rest; can use AWS-managed or customer-managed CMKs
Data Replay
Consumers can rewind to any point within the retention window; critical differentiator vs SQS
Ordering Guarantees
Strict ordering guaranteed WITHIN a shard only; use consistent partition keys for related records
VPC Endpoints (PrivateLink)
Interface VPC endpoints available for private connectivity without internet gateway
CloudWatch Metrics
Stream-level and shard-level metrics; GetShardIterator, PutRecord, GetRecords metrics available
CloudTrail Integration
API calls logged to CloudTrail for auditing; data plane calls (PutRecord, GetRecords) can be logged
Lambda Event Source Mapping
Lambda polls KDS using event source mapping (not push); supports bisect-on-error, parallelization factor, tumbling windows
Kinesis Client Library (KCL)
Java/Python/.NET library for building consumer applications; handles checkpointing via DynamoDB table
Kinesis Producer Library (KPL)
Aggregates small records into larger ones (up to 1 MB) for cost efficiency; uses PutRecords under the hood
Shard Splitting and Merging
Manual resharding operations; splitting increases capacity, merging reduces cost; takes time to complete
Dead Letter Queue (native)
KDS itself has no DLQ; Lambda event source mapping supports on-failure destinations (SQS/SNS) for failed batch handling
Message Filtering (native)
KDS does not filter records; all consumers receive all records in a shard; filtering must be done in consumer code
FIFO Guarantee across shards
Ordering is per-shard only; cross-shard ordering is NOT guaranteed — this is a critical exam distinction
Event Source Mapping (Polling Consumer)
High freq · Lambda polls KDS shards using event source mapping — NOT a push model. Lambda reads batches of records and processes them. Supports parallelization factor (1-10) to process multiple batches per shard concurrently. Supports bisect-on-error to split failing batches. On-failure destinations route failed batches to SQS or SNS. CRITICAL: This is polling, not event-driven push like S3→Lambda.
KDS as Firehose Source
High freq · Kinesis Data Firehose can read directly from a KDS stream as its source, enabling managed delivery to S3, Redshift, OpenSearch, or Splunk without custom consumer code. Use this pattern when you need both real-time processing (KDS consumers) AND managed archival (Firehose) from the same stream simultaneously.
Stream Monitoring and Alarming
High freq · CloudWatch collects KDS metrics including GetRecords.IteratorAgeMilliseconds (consumer lag), WriteProvisionedThroughputExceeded, and ReadProvisionedThroughputExceeded. Set alarms on IteratorAgeMilliseconds to detect when consumers fall behind — this is the primary KDS health metric in production.
KCL Checkpointing Backend
High freq · The Kinesis Client Library (KCL) automatically creates and manages a DynamoDB table to store shard checkpoints and coordinate multi-instance consumer applications. Each shard gets one row in DynamoDB. Provision adequate DynamoDB capacity or use on-demand mode to avoid throttling the KCL coordination layer.
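The coordination table can be pictured as one lease row per shard. A toy in-memory version (the real KCL uses a DynamoDB table, lease renewal, and worker failover; names below are illustrative):

```python
class LeaseTable:
    """One row per shard: which worker owns the lease and the last
    checkpointed sequence number for that shard."""

    def __init__(self):
        self.rows = {}  # shard_id -> {"owner": ..., "checkpoint": ...}

    def take_lease(self, shard_id, worker):
        row = self.rows.get(shard_id)
        if row is not None and row["owner"] != worker:
            return False  # lease is held by another worker
        self.rows[shard_id] = {"owner": worker,
                               "checkpoint": row["checkpoint"] if row else None}
        return True

    def checkpoint(self, shard_id, sequence_number):
        # Record progress so a restarted worker resumes after this record.
        self.rows[shard_id]["checkpoint"] = sequence_number
```

This is why under-provisioned DynamoDB capacity throttles the consumer fleet: every checkpoint and lease renewal is a table write.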
Hybrid Stream + Queue Architecture
High freq · KDS handles ordered, high-throughput ingestion and fan-out; Lambda or EC2 consumers read from KDS and write processed results or work items to SQS for downstream workers that need visibility timeout, DLQ, and at-least-once delivery semantics. Use when combining real-time processing with reliable task distribution.
Streaming Service Selection
High freq · KDS vs MSK (Managed Streaming for Apache Kafka) is a frequent exam comparison. Choose KDS for AWS-native integration, no broker management, and simpler ops. Choose MSK when you need Kafka ecosystem compatibility (Kafka Connect, Kafka Streams, existing Kafka producers/consumers), or when migrating existing Kafka workloads to AWS.
Kinesis → Lambda → EventBridge Fan-Out
High freq · Lambda consumes KDS records and publishes structured events to EventBridge for content-based routing to multiple targets. Use this pattern when downstream consumers need filtering, schema validation, or routing to heterogeneous targets (Step Functions, SNS, SQS, HTTP endpoints) that KDS cannot natively support.
Audit and Compliance Logging
High freq · CloudTrail logs all KDS management API calls (CreateStream, DeleteStream, AddTagsToStream, etc.) automatically. Data plane operations (PutRecord, GetRecords) can optionally be logged. Use for compliance auditing of who accessed stream data and when.
Lambda + KDS uses EVENT SOURCE MAPPING (polling), NOT a push model. Lambda polls the stream — KDS does not invoke Lambda directly. This is fundamentally different from S3→Lambda (push/async invoke) and SNS→Lambda (push/sync invoke). If an exam question describes 'Lambda being triggered by KDS,' the mechanism is polling via event source mapping.
Standard consumers SHARE the 2 MB/sec read throughput per shard. If you have multiple consumers reading the same shard via GetRecords, they compete for this bandwidth. The solution is Enhanced Fan-Out (EFO), which gives each registered consumer a DEDICATED 2 MB/sec per shard. Exam questions describing slow consumers or read throttling with multiple applications = EFO answer.
KDS guarantees ordering WITHIN a shard only. To ensure related records are ordered (e.g., all events for user_id=123), use a consistent partition key (user_id). Records with the same partition key always go to the same shard. Cross-shard ordering is never guaranteed regardless of configuration.
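The routing above can be sketched in a few lines. KDS MD5-hashes the partition key to a 128-bit integer and routes the record to the shard whose hash-key range contains it; the sketch below assumes evenly divided ranges (real streams can have uneven ranges after resharding):

```python
import hashlib

def shard_for_key(partition_key: str, num_shards: int) -> int:
    """Map a partition key to a shard index the way KDS does:
    MD5 the key to a 128-bit integer, then find the shard whose
    (here: uniform) hash-key range contains that integer."""
    hash_value = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = 2 ** 128 // num_shards
    return min(hash_value // range_size, num_shards - 1)

# Same partition key -> same shard -> ordered relative to each other.
assert shard_for_key("user_id=123", 4) == shard_for_key("user_id=123", 4)
```

This is also why low-cardinality partition keys create hot shards: a handful of distinct keys can only ever land on a handful of shards.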
IteratorAgeMilliseconds is the most important KDS operational metric. It measures how far behind the last record read is from the latest record in the stream (consumer lag). If this metric grows, your consumers cannot keep up — you need more shards, more consumer parallelism, or Enhanced Fan-Out. Set CloudWatch alarms on this metric in production.
ProvisionedThroughputExceededException on WRITE means you've exceeded 1 MB/sec OR 1,000 records/sec on a shard. Solutions: (1) Increase shard count, (2) Implement exponential backoff with jitter in producers, (3) Use KPL for automatic aggregation, (4) Switch to on-demand mode. Do NOT increase record size — that makes it worse.
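Solution (2) is usually implemented as full-jitter backoff. A minimal sketch (the base and cap values here are illustrative, not AWS recommendations):

```python
import random

def backoff_delay(attempt: int, base: float = 0.1, cap: float = 5.0) -> float:
    """Full jitter: sleep a random duration in [0, min(cap, base * 2^attempt)]
    before retrying a throttled PutRecord / PutRecords call. Randomizing
    the delay spreads retries out instead of synchronizing them."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```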
Lambda POLLS Kinesis (event source mapping) — KDS does NOT push to Lambda. This is identical to SQS→Lambda behavior. S3→Lambda is a push model. Getting this wrong invalidates your understanding of retry behavior, concurrency, and error handling for KDS+Lambda architectures.
Standard consumer read throughput (2 MB/sec) is SHARED across all consumers on a shard. Multiple applications reading the same shard = bandwidth competition. Solution = Enhanced Fan-Out for dedicated 2 MB/sec per registered consumer. Exam keyword: 'multiple consumer applications reading the same stream' → Enhanced Fan-Out.
Ordering in KDS is guaranteed WITHIN a shard only. Use consistent partition keys to route related records to the same shard. There is NO cross-shard ordering guarantee regardless of any configuration. Single shard = global order but limits throughput to 1 MB/sec write.
On-demand mode vs Provisioned mode decision: Choose ON-DEMAND when traffic is unpredictable/spiky or you want zero capacity planning. Choose PROVISIONED when traffic is predictable and you want to optimize cost (on-demand is more expensive per GB at high, sustained throughput). Both modes support all KDS features.
KDS data retention defaults to 24 hours. Maximum is 365 days (not 7 days — that was the old maximum). Extended retention costs extra per shard-hour. Long-term retention (>7 days) costs per GB-hour. Exam scenarios involving data replay or late-arriving consumers require checking if retention is sufficient.
PutRecords (batch) can partially fail — some records succeed while others fail within the same API call. Always inspect the FailedRecordCount in the response and retry only failed records. This is NOT an atomic/transactional operation. Contrast with SQS SendMessageBatch which also has partial failure semantics.
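A retry loop for that partial-failure contract might look like this sketch, where `client` stands for any object exposing a boto3-style `put_records` method:

```python
def put_records_with_retry(client, stream_name, records, max_attempts=3):
    """Call PutRecords, then retry only the failed entries. The response's
    Records list is positionally aligned with the request; entries that
    failed carry an ErrorCode field."""
    pending = list(records)
    for _ in range(max_attempts):
        resp = client.put_records(StreamName=stream_name, Records=pending)
        if resp["FailedRecordCount"] == 0:
            return []
        pending = [req for req, res in zip(pending, resp["Records"])
                   if "ErrorCode" in res]
    return pending  # records still failing after all attempts
```

Note that retrying failed entries can reorder them relative to the records that succeeded on the first call, another reason ordering guarantees hinge on the partition key, not the API call.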
KCL (Kinesis Client Library) uses DynamoDB to store checkpoints and coordinate lease assignment across consumer instances. If your KCL application is throttling DynamoDB, provision more DynamoDB capacity or switch the KCL table to on-demand mode. This is a real operational gotcha that appears in associate and professional exam scenarios.
When comparing KDS vs SQS: KDS = ordered within shard, replayable, fan-out to multiple consumers, retention up to 365 days, requires consumer to track position. SQS = unordered (standard) or FIFO (limited throughput), message deleted after consumption, single logical consumer per message, built-in DLQ, visibility timeout. For 'multiple consumers independently processing the same message' — always KDS (or SNS fan-out to multiple SQS queues).
Lambda parallelization factor for KDS: You can process up to 10 batches per shard concurrently by setting ParallelizationFactor (1-10). This increases throughput without adding shards. Lambda still processes records for a given partition key in order (same-key records go to the same concurrent batch processor), but records with different partition keys in the same shard may be processed out of order relative to each other.
The Kinesis Producer Library (KPL) aggregates multiple small records into a single KDS record (up to 1 MB) to maximize throughput and reduce cost (fewer PUT payload units billed). The Kinesis Client Library (KCL) automatically de-aggregates KPL records. If you use KPL producers but a non-KCL consumer, you must manually de-aggregate records.
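The cost saving comes from packing many small payloads into fewer PUT payload units. A greedy batcher conveys the idea; this is a simplified stand-in, not the real KPL format, which wraps batches in a protobuf envelope that the KCL de-aggregates:

```python
def aggregate(payloads, max_size=1_000_000):
    """Greedily pack small byte payloads into batches under max_size bytes,
    loosely mimicking what KPL aggregation achieves (fewer, larger records)."""
    batches, current, size = [], [], 0
    for p in payloads:
        if current and size + len(p) > max_size:
            batches.append(current)   # batch full: start a new one
            current, size = [], 0
        current.append(p)
        size += len(p)
    if current:
        batches.append(current)
    return batches
```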
Common Mistake
Lambda is 'triggered' by Kinesis Data Streams in a push model, similar to how S3 event notifications push events to Lambda.
Correct
Lambda uses EVENT SOURCE MAPPING to POLL Kinesis Data Streams. Lambda's internal poller reads batches from the stream on your behalf — KDS never directly invokes Lambda. This is the same polling model used for SQS. S3 event notifications are a completely different mechanism (asynchronous push invocation).
This is the #1 misconception across multiple certifications. Exam questions deliberately mix up push vs pull invocation models. Remember: KDS + SQS = Lambda POLLS. S3 + SNS + API Gateway = Lambda is PUSHED. Getting this wrong leads to incorrect answers about retry behavior, error handling, and concurrency.
Common Mistake
Multiple consumer applications can all read from a Kinesis shard at full speed because each gets its own 2 MB/sec read throughput.
Correct
Standard (GetRecords) consumers SHARE the 2 MB/sec read throughput per shard across ALL consumers. If you have 4 consumers each needing 2 MB/sec, you need Enhanced Fan-Out — which gives each REGISTERED consumer a dedicated 2 MB/sec per shard via HTTP/2 push (SubscribeToShard API), at additional cost.
Candidates assume 2 MB/sec is per consumer. It's per shard total for standard consumers. This causes real production issues and appears frequently in exam questions about slow consumers or read throttling. The keyword 'multiple applications consuming the same stream' should immediately make you think Enhanced Fan-Out.
Common Mistake
Kinesis Data Streams guarantees strict ordering of all records across the entire stream.
Correct
KDS guarantees ordering ONLY WITHIN a single shard. Records with the same partition key always go to the same shard (preserving relative order for that key). Records across different shards have NO guaranteed ordering. If you need global ordering, you must use a single shard (which limits throughput to 1 MB/sec write, 2 MB/sec read).
Exam questions often describe a scenario requiring 'ordered processing of financial transactions' and ask which service/configuration ensures this. The correct answer involves using a consistent partition key (like account_id) to route related records to the same shard — not assuming the whole stream is ordered.
Common Mistake
Kinesis Data Streams and Amazon Kinesis Data Firehose (Amazon Data Firehose) are the same service or interchangeable.
Correct
KDS is a raw streaming service requiring custom consumer code; you control retention, replay, and processing logic. Firehose is a fully managed delivery service that automatically loads data to destinations (S3, Redshift, OpenSearch, Splunk) with built-in transformation via Lambda — no shard management, no consumer code. Firehose CAN use KDS as its source, but they serve different purposes.
AWS has multiple 'Kinesis' branded services and candidates conflate them. On exams: 'real-time processing with replay' = KDS. 'Managed delivery to S3/Redshift with no consumer code' = Firehose. 'Real-time analytics with SQL' = Kinesis Data Analytics (now Amazon Managed Service for Apache Flink).
Common Mistake
Kinesis Data Streams provides exactly-once delivery of records to consumers.
Correct
KDS provides AT-LEAST-ONCE delivery. Duplicate records can occur due to producer retries (network timeouts where the record was actually written), shard splits/merges, or consumer checkpointing failures. Consumer applications must implement idempotency to handle duplicates. KDS does NOT provide exactly-once semantics natively.
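Idempotency here usually means tracking a unique record identifier, such as the KDS sequence number. A minimal sketch, where an in-memory set stands in for a durable store like DynamoDB:

```python
def process_once(record_id, payload, seen, handler):
    """At-least-once delivery means duplicates will arrive: skip any
    record whose unique id (e.g. its sequence number) was already
    processed, otherwise handle it and remember the id."""
    if record_id in seen:
        return False          # duplicate, already handled
    handler(payload)
    seen.add(record_id)
    return True
```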
Candidates assume 'managed service' = exactly-once. This matters for exam questions about financial systems or deduplication requirements — the correct answer involves idempotent consumers, not relying on KDS for deduplication. Contrast with SQS FIFO which offers exactly-once processing within a 5-minute deduplication window.
Common Mistake
Switching from provisioned to on-demand mode (or vice versa) is instant and has no operational impact.
Correct
You can switch between on-demand and provisioned modes, but only twice per rolling 24-hour period per stream. Plan mode switches carefully; this is not a toggle you can flip many times per day for cost optimization.
Candidates assume full flexibility to switch modes at will. The limit of two mode switches per 24 hours is a real operational constraint that appears in scenario questions about cost optimization and traffic pattern changes.
Common Mistake
Adding more shards to a Kinesis stream immediately solves all throughput problems for both producers and consumers.
Correct
Adding shards increases write capacity (more 1 MB/sec write lanes) and standard read capacity (more 2 MB/sec read lanes). However: (1) Resharding takes time and the stream is not immediately at full capacity. (2) If the bottleneck is consumer processing speed (not shard read throughput), adding shards without adding consumer instances won't help. (3) Hot shards from poor partition key distribution won't be fixed by adding shards — fix the partition key strategy first.
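Point (3) connects to how SplitShard works: you choose a NewStartingHashKey inside the parent shard's hash-key range, and halving the range is the common choice. A sketch (hash keys in the KDS API are decimal strings):

```python
def split_midpoint(starting_hash_key, ending_hash_key):
    """Compute a NewStartingHashKey for a SplitShard call that halves a
    hot shard's hash-key range into two roughly equal child shards."""
    start, end = int(starting_hash_key), int(ending_hash_key)
    return str((start + end) // 2)
```

If one partition key dominates traffic, both children of the split still receive that key's records on a single shard, which is why fixing key cardinality comes before resharding.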
Resharding is often presented as the universal fix for KDS performance issues. Exam questions test whether you understand the root cause: write throttling (add shards or fix partition key), read throttling with multiple consumers (use EFO), consumer processing lag (add consumer parallelism), hot shards (improve partition key cardinality).
SHARD = Size (1MB write), Hundred records (1,000/sec write), Age (IteratorAge = lag metric), Replay (retention up to 365d), Dedicated (EFO = 2MB/sec per consumer)
KDS vs SQS memory trick: KDS = KEEP (data stays, replay possible, multiple consumers read same data). SQS = CONSUME (message gone after consumption, one logical consumer per message)
Enhanced Fan-Out = 'Everyone gets their OWN pipe' — each registered consumer gets dedicated 2 MB/sec, no sharing, HTTP/2 push instead of polling
Partition Key = Postal Code: all packages with the same postal code go to the same shard (delivery route), ensuring they arrive in order relative to each other
Lambda + KDS = Lambda LOOKS (polls). Lambda + S3 = S3 SHOUTS (push). Remember: streams require looking, events shout at you.
CertAI Tutor · SAP-C02, DEA-C01, DOP-C02, SAA-C03, DVA-C02, CLF-C02 · 2026-02-21