
Massively scalable, durable, real-time data streaming — built for millisecond ingestion and replay
Amazon Kinesis Data Streams (KDS) is a fully managed, serverless (on-demand mode) or provisioned real-time data streaming service capable of capturing gigabytes of data per second from hundreds of thousands of sources. Data is stored durably for up to 365 days and can be replayed, enabling multiple independent consumers to process the same stream at different times. Unlike message queues, KDS preserves ordering within shards and supports fan-out to multiple concurrent consumers without message deletion.
Ingest and process high-throughput, ordered, real-time data streams with durability and multi-consumer fan-out — the backbone of event-driven analytics pipelines on AWS.
Provisioned Shards Mode
Manual shard management; predictable cost; use when throughput is known and stable
On-Demand Mode
Auto-scales shards; no capacity planning; higher per-GB cost; ideal for variable traffic
Enhanced Fan-Out (EFO)
HTTP/2 push delivery; 2 MB/sec dedicated per consumer per shard; additional cost per shard-hour per consumer
Server-Side Encryption (SSE)
AWS KMS encryption at rest; can use AWS-managed or customer-managed CMKs
Data Replay
Consumers can rewind to any point within the retention window; critical differentiator vs SQS
Ordering Guarantees
Strict ordering guaranteed WITHIN a shard only; use consistent partition keys for related records
VPC Endpoints (PrivateLink)
Interface VPC endpoints available for private connectivity without internet gateway
CloudWatch Metrics
Stream-level and shard-level metrics; GetShardIterator, PutRecord, GetRecords metrics available
CloudTrail Integration
API calls logged to CloudTrail for auditing; data plane calls (PutRecord, GetRecords) can be logged
Lambda Event Source Mapping
Lambda polls KDS using event source mapping (not push); supports bisect-on-error, parallelization factor, tumbling windows
Kinesis Client Library (KCL)
Java/Python/.NET library for building consumer applications; handles checkpointing via DynamoDB table
Kinesis Producer Library (KPL)
Aggregates small records into larger ones (up to 1 MB) for cost efficiency; uses PutRecords under the hood
Shard Splitting and Merging
Manual resharding operations; splitting increases capacity, merging reduces cost; takes time to complete
Dead Letter Queue (native)
KDS itself has no DLQ; Lambda event source mapping supports on-failure destinations (SQS/SNS) for failed batch handling
Message Filtering (native)
KDS does not filter records; all consumers receive all records in a shard; filtering must be done in consumer code
FIFO Guarantee across shards
Ordering is per-shard only; cross-shard ordering is NOT guaranteed — this is a critical exam distinction
Event Source Mapping (Polling Consumer)
High freq · Lambda polls KDS shards using event source mapping — NOT a push model. Lambda reads batches of records and processes them. Supports parallelization factor (1-10) to process multiple batches per shard concurrently. Supports bisect-on-error to split failing batches. On-failure destinations route failed batches to SQS or SNS. CRITICAL: This is polling, not event-driven push like S3→Lambda.
KDS as Firehose Source
High freq · Kinesis Data Firehose can read directly from a KDS stream as its source, enabling managed delivery to S3, Redshift, OpenSearch, or Splunk without custom consumer code. Use this pattern when you need both real-time processing (KDS consumers) AND managed archival (Firehose) from the same stream simultaneously.
Stream Monitoring and Alarming
High freq · CloudWatch collects KDS metrics including GetRecords.IteratorAgeMilliseconds (consumer lag), WriteProvisionedThroughputExceeded, and ReadProvisionedThroughputExceeded. Set alarms on IteratorAgeMilliseconds to detect when consumers fall behind — this is the primary KDS health metric in production.
KCL Checkpointing Backend
High freq · The Kinesis Client Library (KCL) automatically creates and manages a DynamoDB table to store shard checkpoints and coordinate multi-instance consumer applications. Each shard gets one row in DynamoDB. Provision adequate DynamoDB capacity or use on-demand mode to avoid throttling the KCL coordination layer.
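The coordination table can be pictured as one lease row per shard. A toy in-memory version (the real KCL uses a DynamoDB table, lease renewal, and worker failover; names below are illustrative):

```python
class LeaseTable:
    """One row per shard: which worker owns the lease and the last
    checkpointed sequence number for that shard."""

    def __init__(self):
        self.rows = {}  # shard_id -> {"owner": ..., "checkpoint": ...}

    def take_lease(self, shard_id, worker):
        row = self.rows.get(shard_id)
        if row is not None and row["owner"] != worker:
            return False  # lease is held by another worker
        self.rows[shard_id] = {"owner": worker,
                               "checkpoint": row["checkpoint"] if row else None}
        return True

    def checkpoint(self, shard_id, sequence_number):
        # Record progress so a restarted worker resumes after this record.
        self.rows[shard_id]["checkpoint"] = sequence_number
```

This is why under-provisioned DynamoDB capacity throttles the consumer fleet: every checkpoint and lease renewal is a table write.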
Hybrid Stream + Queue Architecture
High freq · KDS handles ordered, high-throughput ingestion and fan-out; Lambda or EC2 consumers read from KDS and write processed results or work items to SQS for downstream workers that need visibility timeout, DLQ, and at-least-once delivery semantics. Use when combining real-time processing with reliable task distribution.
Streaming Service Selection
High freq · KDS vs MSK (Managed Streaming for Apache Kafka) is a frequent exam comparison. Choose KDS for AWS-native integration, no broker management, and simpler ops. Choose MSK when you need Kafka ecosystem compatibility (Kafka Connect, Kafka Streams, existing Kafka producers/consumers), or when migrating existing Kafka workloads to AWS.
Kinesis → Lambda → EventBridge Fan-Out
High freq · Lambda consumes KDS records and publishes structured events to EventBridge for content-based routing to multiple targets. Use this pattern when downstream consumers need filtering, schema validation, or routing to heterogeneous targets (Step Functions, SNS, SQS, HTTP endpoints) that KDS cannot natively support.
Audit and Compliance Logging
High freq · CloudTrail logs all KDS management API calls (CreateStream, DeleteStream, AddTagsToStream, etc.) automatically. Data plane operations (PutRecord, GetRecords) can optionally be logged. Use for compliance auditing of who accessed stream data and when.
Lambda + KDS uses EVENT SOURCE MAPPING (polling), NOT a push model. Lambda polls the stream — KDS does not invoke Lambda directly. This is fundamentally different from S3→Lambda (push/async invoke) and SNS→Lambda (push/sync invoke). If an exam question describes 'Lambda being triggered by KDS,' the mechanism is polling via event source mapping.
Standard consumers SHARE the 2 MB/sec read throughput per shard. If you have multiple consumers reading the same shard via GetRecords, they compete for this bandwidth. The solution is Enhanced Fan-Out (EFO), which gives each registered consumer a DEDICATED 2 MB/sec per shard. Exam questions describing slow consumers or read throttling with multiple applications = EFO answer.
KDS guarantees ordering WITHIN a shard only. To ensure related records are ordered (e.g., all events for user_id=123), use a consistent partition key (user_id). Records with the same partition key always go to the same shard. Cross-shard ordering is never guaranteed regardless of configuration.
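The routing above can be sketched in a few lines. KDS MD5-hashes the partition key to a 128-bit integer and routes the record to the shard whose hash-key range contains it; the sketch below assumes evenly divided ranges (real streams can have uneven ranges after resharding):

```python
import hashlib

def shard_for_key(partition_key: str, num_shards: int) -> int:
    """Map a partition key to a shard index the way KDS does:
    MD5 the key to a 128-bit integer, then find the shard whose
    (here: uniform) hash-key range contains that integer."""
    hash_value = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = 2 ** 128 // num_shards
    return min(hash_value // range_size, num_shards - 1)

# Same partition key -> same shard -> ordered relative to each other.
assert shard_for_key("user_id=123", 4) == shard_for_key("user_id=123", 4)
```

This is also why low-cardinality partition keys create hot shards: a handful of distinct keys can only ever land on a handful of shards.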
IteratorAgeMilliseconds is the most important KDS operational metric. It measures how far behind the last record read is from the latest record in the stream (consumer lag). If this metric grows, your consumers cannot keep up — you need more shards, more consumer parallelism, or Enhanced Fan-Out. Set CloudWatch alarms on this metric in production.
ProvisionedThroughputExceededException on WRITE means you've exceeded 1 MB/sec OR 1,000 records/sec on a shard. Solutions: (1) Increase shard count, (2) Implement exponential backoff with jitter in producers, (3) Use KPL for automatic aggregation, (4) Switch to on-demand mode. Do NOT increase record size — that makes it worse.
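Solution (2) is usually implemented as full-jitter backoff. A minimal sketch (the base and cap values here are illustrative, not AWS recommendations):

```python
import random

def backoff_delay(attempt: int, base: float = 0.1, cap: float = 5.0) -> float:
    """Full jitter: sleep a random duration in [0, min(cap, base * 2^attempt)]
    before retrying a throttled PutRecord / PutRecords call. Randomizing
    the delay spreads retries out instead of synchronizing them."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```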
Lambda POLLS Kinesis (event source mapping) — KDS does NOT push to Lambda. This is identical to SQS→Lambda behavior. S3→Lambda is a push model. Getting this wrong invalidates your understanding of retry behavior, concurrency, and error handling for KDS+Lambda architectures.
Standard consumer read throughput (2 MB/sec) is SHARED across all consumers on a shard. Multiple applications reading the same shard = bandwidth competition. Solution = Enhanced Fan-Out for dedicated 2 MB/sec per registered consumer. Exam keyword: 'multiple consumer applications reading the same stream' → Enhanced Fan-Out.
Ordering in KDS is guaranteed WITHIN a shard only. Use consistent partition keys to route related records to the same shard. There is NO cross-shard ordering guarantee regardless of any configuration. Single shard = global order but limits throughput to 1 MB/sec write.
On-demand mode vs Provisioned mode decision: Choose ON-DEMAND when traffic is unpredictable/spiky or you want zero capacity planning. Choose PROVISIONED when traffic is predictable and you want to optimize cost (on-demand is more expensive per GB at high, sustained throughput). Both modes support all KDS features.
KDS data retention defaults to 24 hours. Maximum is 365 days (not 7 days — that was the old maximum). Extended retention costs extra per shard-hour. Long-term retention (>7 days) costs per GB-hour. Exam scenarios involving data replay or late-arriving consumers require checking if retention is sufficient.
PutRecords (batch) can partially fail — some records succeed while others fail within the same API call. Always inspect the FailedRecordCount in the response and retry only failed records. This is NOT an atomic/transactional operation. Contrast with SQS SendMessageBatch which also has partial failure semantics.
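A retry loop for that partial-failure contract might look like this sketch, where `client` stands for any object exposing a boto3-style `put_records` method:

```python
def put_records_with_retry(client, stream_name, records, max_attempts=3):
    """Call PutRecords, then retry only the failed entries. The response's
    Records list is positionally aligned with the request; entries that
    failed carry an ErrorCode field."""
    pending = list(records)
    for _ in range(max_attempts):
        resp = client.put_records(StreamName=stream_name, Records=pending)
        if resp["FailedRecordCount"] == 0:
            return []
        pending = [req for req, res in zip(pending, resp["Records"])
                   if "ErrorCode" in res]
    return pending  # records still failing after all attempts
```

Note that retrying failed entries can reorder them relative to the records that succeeded on the first call, another reason ordering guarantees hinge on the partition key, not the API call.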
KCL (Kinesis Client Library) uses DynamoDB to store checkpoints and coordinate lease assignment across consumer instances. If your KCL application is throttling DynamoDB, provision more DynamoDB capacity or switch the KCL table to on-demand mode. This is a real operational gotcha that appears in associate and professional exam scenarios.
When comparing KDS vs SQS: KDS = ordered within shard, replayable, fan-out to multiple consumers, retention up to 365 days, requires consumer to track position. SQS = unordered (standard) or FIFO (limited throughput), message deleted after consumption, single logical consumer per message, built-in DLQ, visibility timeout. For 'multiple consumers independently processing the same message' — always KDS (or SNS fan-out to multiple SQS queues).
Lambda parallelization factor for KDS: You can process up to 10 batches per shard concurrently by setting ParallelizationFactor (1-10). This increases throughput without adding shards. Lambda still processes records for a given partition key in order (same-key records go to the same concurrent batch processor), but records with different partition keys in the same shard may be processed out of order relative to each other.
The Kinesis Producer Library (KPL) aggregates multiple small records into a single KDS record (up to 1 MB) to maximize throughput and reduce cost (fewer PUT payload units billed). The Kinesis Client Library (KCL) automatically de-aggregates KPL records. If you use KPL producers but a non-KCL consumer, you must manually de-aggregate records.
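The cost saving comes from packing many small payloads into fewer PUT payload units. A greedy batcher conveys the idea; this is a simplified stand-in, not the real KPL format, which wraps batches in a protobuf envelope that the KCL de-aggregates:

```python
def aggregate(payloads, max_size=1_000_000):
    """Greedily pack small byte payloads into batches under max_size bytes,
    loosely mimicking what KPL aggregation achieves (fewer, larger records)."""
    batches, current, size = [], [], 0
    for p in payloads:
        if current and size + len(p) > max_size:
            batches.append(current)   # batch full: start a new one
            current, size = [], 0
        current.append(p)
        size += len(p)
    if current:
        batches.append(current)
    return batches
```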
Common Mistake
Lambda is 'triggered' by Kinesis Data Streams in a push model, similar to how S3 event notifications push events to Lambda.
Correct
Lambda uses EVENT SOURCE MAPPING to POLL Kinesis Data Streams. Lambda's internal poller reads batches from the stream on your behalf — KDS never directly invokes Lambda. This is the same polling model used for SQS. S3 event notifications are a completely different mechanism (asynchronous push invocation).
This is the #1 misconception across multiple certifications. Exam questions deliberately mix up push vs pull invocation models. Remember: KDS + SQS = Lambda POLLS. S3 + SNS + API Gateway = Lambda is PUSHED. Getting this wrong leads to incorrect answers about retry behavior, error handling, and concurrency.
Common Mistake
Multiple consumer applications can all read from a Kinesis shard at full speed because each gets its own 2 MB/sec read throughput.
Correct
Standard (GetRecords) consumers SHARE the 2 MB/sec read throughput per shard across ALL consumers. If you have 4 consumers each needing 2 MB/sec, you need Enhanced Fan-Out — which gives each REGISTERED consumer a dedicated 2 MB/sec per shard via HTTP/2 push (SubscribeToShard API), at additional cost.
Candidates assume 2 MB/sec is per consumer. It's per shard total for standard consumers. This causes real production issues and appears frequently in exam questions about slow consumers or read throttling. The keyword 'multiple applications consuming the same stream' should immediately make you think Enhanced Fan-Out.
Common Mistake
Kinesis Data Streams guarantees strict ordering of all records across the entire stream.
Correct
KDS guarantees ordering ONLY WITHIN a single shard. Records with the same partition key always go to the same shard (preserving relative order for that key). Records across different shards have NO guaranteed ordering. If you need global ordering, you must use a single shard (which limits throughput to 1 MB/sec write, 2 MB/sec read).
Exam questions often describe a scenario requiring 'ordered processing of financial transactions' and ask which service/configuration ensures this. The correct answer involves using a consistent partition key (like account_id) to route related records to the same shard — not assuming the whole stream is ordered.
Common Mistake
Kinesis Data Streams and Amazon Kinesis Data Firehose (Amazon Data Firehose) are the same service or interchangeable.
Correct
KDS is a raw streaming service requiring custom consumer code; you control retention, replay, and processing logic. Firehose is a fully managed delivery service that automatically loads data to destinations (S3, Redshift, OpenSearch, Splunk) with built-in transformation via Lambda — no shard management, no consumer code. Firehose CAN use KDS as its source, but they serve different purposes.
AWS has multiple 'Kinesis' branded services and candidates conflate them. On exams: 'real-time processing with replay' = KDS. 'Managed delivery to S3/Redshift with no consumer code' = Firehose. 'Real-time analytics with SQL' = Kinesis Data Analytics (now Amazon Managed Service for Apache Flink).
Common Mistake
Kinesis Data Streams provides exactly-once delivery of records to consumers.
Correct
KDS provides AT-LEAST-ONCE delivery. Duplicate records can occur due to producer retries (network timeouts where the record was actually written), shard splits/merges, or consumer checkpointing failures. Consumer applications must implement idempotency to handle duplicates. KDS does NOT provide exactly-once semantics natively.
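Idempotency here usually means tracking a unique record identifier, such as the KDS sequence number. A minimal sketch, where an in-memory set stands in for a durable store like DynamoDB:

```python
def process_once(record_id, payload, seen, handler):
    """At-least-once delivery means duplicates will arrive: skip any
    record whose unique id (e.g. its sequence number) was already
    processed, otherwise handle it and remember the id."""
    if record_id in seen:
        return False          # duplicate, already handled
    handler(payload)
    seen.add(record_id)
    return True
```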
Candidates assume 'managed service' = exactly-once. This matters for exam questions about financial systems or deduplication requirements — the correct answer involves idempotent consumers, not relying on KDS for deduplication. Contrast with SQS FIFO which offers exactly-once processing within a 5-minute deduplication window.
Common Mistake
Switching from provisioned to on-demand mode (or vice versa) is instant and has no operational impact.
Correct
You can switch between on-demand and provisioned modes, but only twice per rolling 24-hour period per stream. Plan mode switches carefully; this is not a toggle you can flip many times per day for cost optimization.
Candidates assume full flexibility to switch modes at will. The limit of two mode switches per 24 hours is a real operational constraint that appears in scenario questions about cost optimization and traffic pattern changes.
Common Mistake
Adding more shards to a Kinesis stream immediately solves all throughput problems for both producers and consumers.
Correct
Adding shards increases write capacity (more 1 MB/sec write lanes) and standard read capacity (more 2 MB/sec read lanes). However: (1) Resharding takes time and the stream is not immediately at full capacity. (2) If the bottleneck is consumer processing speed (not shard read throughput), adding shards without adding consumer instances won't help. (3) Hot shards from poor partition key distribution won't be fixed by adding shards — fix the partition key strategy first.
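Point (3) connects to how SplitShard works: you choose a NewStartingHashKey inside the parent shard's hash-key range, and halving the range is the common choice. A sketch (hash keys in the KDS API are decimal strings):

```python
def split_midpoint(starting_hash_key, ending_hash_key):
    """Compute a NewStartingHashKey for a SplitShard call that halves a
    hot shard's hash-key range into two roughly equal child shards."""
    start, end = int(starting_hash_key), int(ending_hash_key)
    return str((start + end) // 2)
```

If one partition key dominates traffic, both children of the split still receive that key's records on a single shard, which is why fixing key cardinality comes before resharding.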
Resharding is often presented as the universal fix for KDS performance issues. Exam questions test whether you understand the root cause: write throttling (add shards or fix partition key), read throttling with multiple consumers (use EFO), consumer processing lag (add consumer parallelism), hot shards (improve partition key cardinality).
SHARD = Size (1MB write), Hundred records (1,000/sec write), Age (IteratorAge = lag metric), Replay (retention up to 365d), Dedicated (EFO = 2MB/sec per consumer)
KDS vs SQS memory trick: KDS = KEEP (data stays, replay possible, multiple consumers read same data). SQS = CONSUME (message gone after consumption, one logical consumer per message)
Enhanced Fan-Out = 'Everyone gets their OWN pipe' — each registered consumer gets dedicated 2 MB/sec, no sharing, HTTP/2 push instead of polling
Partition Key = Postal Code: all packages with the same postal code go to the same shard (delivery route), ensuring they arrive in order relative to each other
Lambda + KDS = Lambda LOOKS (polls). Lambda + S3 = S3 SHOUTS (push). Remember: streams require looking, events shout at you.
CertAI Tutor · SAP-C02, DEA-C01, DOP-C02, SAA-C03, DVA-C02, CLF-C02 · 2026-02-21