
Run SQL or Apache Flink on streaming data without managing infrastructure
Amazon Kinesis Data Analytics enables you to process and analyze streaming data in real time using SQL (legacy) or Apache Flink (current recommended path), without needing to manage servers or clusters. It reads from Kinesis Data Streams or Kinesis Data Firehose, applies transformations or analytics, and writes results to downstream AWS services. The service is fully managed, auto-scaling, and integrates natively into the AWS streaming ecosystem.
Perform continuous, real-time analytics on streaming data using SQL or Apache Flink without provisioning or managing compute resources
Use When
Avoid When
Apache Flink runtime support
Recommended path; supports Java, Scala, Python; stateful stream processing with exactly-once semantics
SQL-based stream processing (legacy)
Original offering using ANSI SQL with streaming extensions; AWS recommends migrating to Flink-based applications
Kinesis Data Streams as source
Native, low-latency integration; primary streaming source
Kinesis Data Firehose as source
Can read from Firehose delivery streams as input
Amazon MSK (Managed Streaming for Apache Kafka) as source
Flink applications support Apache Kafka sources via MSK connector
Auto-scaling
Flink applications auto-scale KPUs based on load; SQL applications also scale automatically
Stateful processing with checkpointing
Flink only; checkpoints and savepoints stored in S3 for fault tolerance
Windowed aggregations
Tumbling and sliding windows supported in both SQL and Flink; session windows are Flink only
Lambda function as output destination
Results can be sent to Lambda for further processing or custom delivery
S3 as output destination
Flink applications can write directly to S3
Kinesis Data Firehose as output destination
Common pattern: analytics results → Firehose → S3/Redshift
Kinesis Data Streams as output destination
Enables chained stream processing pipelines
VPC support
Flink applications can run in a VPC to access private resources like RDS or MSK
CloudWatch metrics and logging
Application-level metrics and logs available natively
Kinesis Data Analytics Studio (Zeppelin notebooks)
Interactive development environment using Apache Zeppelin; runs Flink under the hood
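The window types named in the feature list above are easier to remember with a concrete picture. The following is a minimal pure-Python sketch of the three windowing ideas, not Flink or Kinesis code. The function names, the integer timestamps, and the choice to skip windows that would start before time 0 are all assumptions made for illustration.

```python
from collections import defaultdict


def tumbling_windows(events, size):
    """Group (timestamp, value) events into fixed, non-overlapping windows."""
    windows = defaultdict(list)
    for ts, value in events:
        start = (ts // size) * size  # window that contains this timestamp
        windows[start].append(value)
    return dict(sorted(windows.items()))


def sliding_windows(events, size, slide):
    """Group events into overlapping windows of length `size`, one every `slide`."""
    windows = defaultdict(list)
    for ts, value in events:
        start = (ts // slide) * slide  # latest window start at or before ts
        while start > ts - size:       # every window [start, start + size) covering ts
            if start >= 0:             # simplification: ignore windows before time 0
                windows[start].append(value)
            start -= slide
    return dict(sorted(windows.items()))


def session_windows(events, gap):
    """Start a new session whenever consecutive events are more than `gap` apart."""
    sessions = []
    for ts, value in sorted(events):
        if sessions and ts - sessions[-1]["end"] <= gap:
            sessions[-1]["end"] = ts
            sessions[-1]["values"].append(value)
        else:
            sessions.append({"start": ts, "end": ts, "values": [value]})
    return sessions


events = [(1, 10), (4, 20), (11, 5), (30, 7)]
tumbling = tumbling_windows(events, size=10)
sliding = sliding_windows(events, size=10, slide=5)
sessions = session_windows(events, gap=5)
```

With these sample events, the tumbling windows never overlap, the event at timestamp 11 lands in two sliding windows, and the session windows close whenever the stream goes quiet for longer than the gap.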
Real-Time ETL to Data Lake
High frequency. Kinesis Data Streams ingests raw events → Kinesis Data Analytics (Flink) transforms, enriches, and filters → Kinesis Data Firehose buffers and delivers → S3 data lake. Classic architecture for near-real-time analytics pipelines.
Real-Time Leaderboard / Aggregation
High frequency. Game events or IoT readings stream via Kinesis Data Streams → Kinesis Data Analytics computes windowed aggregations (top scores, running averages) → results written to DynamoDB for low-latency reads by the application tier.
Real-Time Anomaly Detection and Alerting
High frequency. Kinesis Data Analytics processes the live stream with anomaly detection logic → anomalous records sent to Lambda → Lambda publishes an alert to SNS. Used for fraud detection, security monitoring, and operational alerting.
Kafka-to-Warehouse Analytics
Medium frequency. MSK (Kafka) as event source → Kinesis Data Analytics Flink application performs windowed aggregations → results delivered to Redshift via Firehose for BI and reporting. Bridges Kafka ecosystems with AWS analytics.
Real-Time Log Analytics Dashboard
Medium frequency. Application logs streamed via Kinesis Data Streams → Kinesis Data Analytics enriches and structures log events → OpenSearch Service for indexing and Kibana dashboards. Common for operational intelligence.
IoT Telemetry Processing
Medium frequency. IoT devices publish to AWS IoT Core → IoT Rules route to Kinesis Data Streams → Kinesis Data Analytics applies time-series windowing and threshold detection → alerts or aggregated data to S3.
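Several of the patterns above (anomaly detection, IoT threshold detection) reduce to "compare each new record against recent history and flag outliers". Here is a rough sketch of that processing step in plain Python. It uses a simple rolling z-score rule chosen for illustration, not any algorithm built into the service, and `detect_anomalies`, `window`, and `threshold` are hypothetical names. In a real pipeline the flagged records would be handed to the downstream destination (for example Lambda) instead of collected in a list.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(readings, window=5, threshold=3.0):
    """Yield (index, value) for readings far from the mean of the previous `window` readings."""
    history = deque(maxlen=window)  # bounded state: only the most recent readings
    for i, value in enumerate(readings):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # Flag the reading if it deviates by more than `threshold` standard deviations.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield i, value
        history.append(value)


readings = [10, 11, 9, 10, 11, 10, 50, 10, 9]
alerts = list(detect_anomalies(readings))
```

The `deque` plays the role of operator state: it is the small piece of memory the job must keep between records, which is exactly what a stateful stream processor has to preserve across failures.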
Kinesis Data Analytics for Apache Flink is the CURRENT and RECOMMENDED service. The original SQL-based offering is considered legacy. Exam questions about 'real-time stream processing with stateful operations' point to Flink.
Kinesis Data Analytics does NOT replace Kinesis Data Streams or Kinesis Data Firehose — it SITS BETWEEN them as the processing layer. Know the full pipeline: ingest (KDS/MSK) → process (KDA) → deliver (Firehose/S3/Redshift).
When a scenario asks for real-time analytics WITH stateful processing (e.g., session tracking, exactly-once semantics, fault-tolerant aggregations), the answer is Kinesis Data Analytics for Apache Flink — not Kinesis Data Streams alone.
KDA is a PROCESSING layer, not a replacement for Kinesis Data Streams — you need both: KDS ingests, KDA analyzes. Never select KDA as a standalone ingestion solution.
When a scenario requires stateful stream processing, exactly-once semantics, or fault-tolerant real-time analytics — the answer is Kinesis Data Analytics for Apache Flink, not the legacy SQL application.
KDA is STREAMING only — any question mentioning historical data, batch processing, or data at rest should lead you away from KDA toward Glue, Athena, or EMR.
Kinesis Data Analytics Studio uses Apache Zeppelin notebooks backed by Flink. It is ideal for INTERACTIVE development and exploration of streaming data, but notebook clusters should be stopped when not in use to control costs.
For the DVA-C02 exam: understand that Flink applications use checkpoints (automatic, periodic) and savepoints (manual, on-demand; the managed service calls these snapshots) for fault tolerance and stateful restart. Snapshots are stored in S3.
Kinesis Data Analytics can output to Kinesis Data Streams, enabling CHAINED pipelines where the output of one analytics application becomes the input of another. This is a tested architecture pattern.
For CLF-C02: categorize Kinesis Data Analytics under 'Analytics' services. Know it processes STREAMING (real-time) data, distinguishing it from batch services like AWS Glue (ETL) or Amazon EMR (big data batch).
You are NOT billed for a Flink application in READY/stopped state — only for RUNNING applications consuming KPUs. This is a cost optimization fact that appears in scenario-based pricing questions.
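The checkpoint and savepoint tip above is easier to reason about with a toy model. The sketch below is not Flink code. It is a hypothetical `CountingJob` class, assumed for illustration, that saves its input offset together with its counts and restores both after a simulated crash, which is the basic idea behind a stateful restart that does not double-count events.

```python
import json


class CountingJob:
    """Toy stateful job: counts events per key and remembers its input offset."""

    def __init__(self):
        self.offset = 0   # position of the next unread event
        self.counts = {}  # operator state

    def process(self, stream, stop_at=None):
        """Consume events from the current offset up to `stop_at` (or the end)."""
        end = len(stream) if stop_at is None else stop_at
        while self.offset < end:
            key = stream[self.offset]
            self.counts[key] = self.counts.get(key, 0) + 1
            self.offset += 1

    def checkpoint(self):
        """Serialize offset and state together so they stay consistent."""
        return json.dumps({"offset": self.offset, "counts": self.counts})

    @classmethod
    def restore(cls, snapshot):
        """Rebuild a job from a previously saved snapshot."""
        data = json.loads(snapshot)
        job = cls()
        job.offset = data["offset"]
        job.counts = data["counts"]
        return job


stream = ["a", "b", "a", "c", "a"]

job = CountingJob()
job.process(stream, stop_at=3)   # handle the first three events
snapshot = job.checkpoint()      # durable copy (S3 in the real service)
job.process(stream, stop_at=4)   # more work happens, then the job "crashes"

recovered = CountingJob.restore(snapshot)
recovered.process(stream)        # replay from the saved offset to the end
```

Because the offset is saved in the same snapshot as the counts, the work done after the snapshot is discarded and replayed, so each event affects the final state once.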
Common Mistake
Kinesis Data Analytics replaces Kinesis Data Streams — you only need one of them for real-time processing
Correct
Kinesis Data Analytics is a PROCESSING layer that sits on top of Kinesis Data Streams (or MSK). You need both: KDS to ingest and buffer the stream, and KDA to analyze it. They serve different, complementary roles.
This is the #1 architectural confusion on the SAA-C03 exam. Think of it as: KDS = the pipe, KDA = the brain processing what flows through the pipe. Neither replaces the other.
Common Mistake
Kinesis Data Analytics can process data stored in S3 or databases directly (batch processing)
Correct
Kinesis Data Analytics is designed exclusively for STREAMING data sources (Kinesis Data Streams, Kinesis Data Firehose, Amazon MSK). For batch processing of data at rest in S3, use AWS Glue, Amazon Athena, or Amazon EMR.
Candidates confuse the 'analytics' in the name with general-purpose analytics. The service is stream-only. When you see 'historical data' or 'data at rest' in a question, Kinesis Data Analytics is the wrong answer.
Common Mistake
The SQL-based Kinesis Data Analytics and Flink-based Kinesis Data Analytics are interchangeable with identical capabilities
Correct
The SQL-based application is the LEGACY offering with limited capabilities. Apache Flink is the current, recommended runtime with support for stateful processing, exactly-once semantics, multiple languages (Java, Scala, Python), and richer connector ecosystem. AWS recommends migrating SQL applications to Flink.
Exam questions about advanced features like exactly-once delivery, session windows, and stateful aggregations should lead you to Flink. SQL applications lack these capabilities and are being phased out.
Common Mistake
Kinesis Data Analytics automatically handles all scaling with no limits — you can process unlimited throughput
Correct
Kinesis Data Analytics scales automatically within account-level KPU quotas. If your application needs more KPUs than the default quota allows, you must request a limit increase via AWS Support. Unbounded scaling is a myth.
Quota awareness is tested on SAA-C03. Always recommend requesting limit increases proactively for production workloads, and design with quotas in mind.
Common Mistake
Kinesis Data Analytics Studio notebooks are free to use for development because they're just a development tool
Correct
Kinesis Data Analytics Studio notebooks incur KPU charges while running, even during development. You are billed for the compute resources consumed by the underlying Flink cluster backing the notebook. Always stop notebooks when not actively developing.
This is a common cost management trap in DVA-C02 scenarios. 'Development environments are free' is never true in AWS when compute is involved.
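The cost points above (billing only while RUNNING, notebooks billed while running) can be checked with quick arithmetic. This is a back-of-the-envelope sketch only: the rate of 0.10 per KPU-hour is a made-up placeholder, not an AWS price, the function names are hypothetical, and the model ignores any per-application overhead and storage charges.

```python
def kpu_hours(intervals):
    """Sum KPU-hours over (state, hours, kpus) intervals; only RUNNING time counts."""
    return sum(hours * kpus for state, hours, kpus in intervals if state == "RUNNING")


def estimated_cost(intervals, rate_per_kpu_hour):
    """Multiply billable KPU-hours by an assumed hourly rate."""
    return kpu_hours(intervals) * rate_per_kpu_hour


HYPOTHETICAL_RATE = 0.10  # placeholder value, check current pricing for real numbers

# A notebook left running all day versus one stopped after an 8-hour session.
always_on = [("RUNNING", 24, 2)]
stopped_after_work = [("RUNNING", 8, 2), ("READY", 16, 2)]

always_on_cost = estimated_cost(always_on, HYPOTHETICAL_RATE)
stopped_cost = estimated_cost(stopped_after_work, HYPOTHETICAL_RATE)
```

Whatever the real rate is, the ratio is what matters for exam scenarios: stopping the application for 16 of 24 hours cuts the billable KPU-hours to one third.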
KDA = Kitchen Disposal for Analytics: raw streaming data goes IN, clean processed insights come OUT — but you still need the pipes (Kinesis Data Streams) to bring the data to the disposal
FLINK = Fault-tolerant, Low-latency, Intelligent, aNalytics worKflow — the 5 reasons to choose Flink over legacy SQL in KDA
KDA Pipeline: Ingest (KDS/MSK) → Analyze (KDA) → Deliver (Firehose/S3/Redshift) — remember I-A-D like 'I Analyzed Data'
CertAI Tutor · SAA-C03, DVA-C02, CLF-C02 · 2026-02-22