monitoring · SAP-C02 · DEA-C01 · DOP-C02 · SAA-C03 · DVA-C02 · SCS-C02 · AIF-C01 · CLF-C02

Amazon CloudWatch: The Observability Command Center

Your unified monitoring, logging, and alerting backbone for every AWS workload

Updated 2026-02-21

Overview

Amazon CloudWatch is AWS's native observability service that collects metrics, logs, events, and traces from virtually every AWS service and custom application. It enables you to set alarms, visualize operational data in dashboards, automatically react to changes, and troubleshoot issues — all from a single pane of glass. CloudWatch is the foundational monitoring layer that feeds into automation, security, and operational excellence across the AWS ecosystem.

Provide unified operational visibility into AWS resources, applications, and on-premises infrastructure through metrics, logs, alarms, dashboards, and automated actions.

Use When

Monitoring AWS resource utilization (CPU, memory via agent, disk, network) and setting threshold-based alarms
Centralizing application and infrastructure logs for search, filtering, and retention using CloudWatch Logs
Creating event-driven automation by triggering Lambda, SNS, or Systems Manager actions when metric thresholds are breached
Building operational dashboards that aggregate cross-account and cross-region metrics for executive or NOC visibility
Detecting anomalies in metric patterns using CloudWatch Anomaly Detection powered by ML
Tracking custom business KPIs by publishing custom metrics from applications via the PutMetricData API

Avoid When

Auditing WHO made API calls to AWS services — use AWS CloudTrail instead; CloudWatch Logs can receive CloudTrail events but CloudTrail is the authoritative API audit source
Deep distributed tracing across microservices with flame graphs and service maps — use AWS X-Ray for end-to-end request tracing; CloudWatch ServiceLens wraps X-Ray but X-Ray is the tracing engine
Long-term cold storage of logs at low cost — archive to S3 via Logs subscription filters and use S3 Glacier for cost-effective retention beyond your CloudWatch retention window
Real-time streaming analytics on log data at high throughput — use Amazon Kinesis Data Streams or Amazon OpenSearch Service for sub-second analytics pipelines

Key Features

CloudWatch Metrics (Standard)

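Built-in metrics from 70+ AWS services, reported at 1-minute or 5-minute granularity depending on the service (EC2 basic monitoring is 5-minute; Detailed Monitoring makes it 1-minute)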

CloudWatch Custom Metrics

Publish via PutMetricData API or CloudWatch Agent; supports standard (1-min) and high-resolution (1-sec)

CloudWatch Alarms

Threshold, anomaly detection, and metric math-based alarms with M-of-N evaluation

CloudWatch Logs

Ingest, store, search, and stream logs from AWS services, EC2, Lambda, on-premises, and containers

CloudWatch Logs Insights

Interactive query language for log analysis; supports aggregation, filtering, pattern matching

CloudWatch Dashboards

Cross-account, cross-region, shareable dashboards with automatic or custom refresh

CloudWatch Anomaly Detection

ML-based baseline modeling for metrics; auto-adjusts for seasonality and trends

CloudWatch Contributor Insights

Identify top contributors to metric changes (e.g., top callers, top error sources)

CloudWatch Synthetics

Canary scripts that simulate user journeys to proactively monitor endpoints and APIs

CloudWatch ServiceLens

Unified view integrating CloudWatch, X-Ray traces, and logs for service health maps

CloudWatch Internet Monitor

Monitor internet-facing application availability and performance from AWS backbone perspective

CloudWatch Evidently

Feature flagging and A/B testing with metrics-based evaluation

CloudWatch RUM (Real User Monitoring)

Client-side JavaScript snippet captures real browser performance and errors

Cross-Account Observability

Share metrics, logs, and traces across AWS accounts in an organization

Metric Streams

Continuously stream CloudWatch metrics to Kinesis Data Firehose and then to third-party tools (Datadog, Splunk, etc.)

CloudWatch Agent

Unified agent for EC2 and on-premises; collects system-level metrics (memory, disk) and custom logs

Embedded Metric Format (EMF)

Structured JSON log format that CloudWatch automatically extracts as metrics — zero additional API calls

Log Subscriptions (real-time)

Stream logs to Lambda, Kinesis Data Streams, or Kinesis Firehose in near real-time

CloudWatch Logs Live Tail

Real-time streaming view of incoming log events for active debugging sessions

Metric Math

Create new time-series by applying math functions across multiple metrics

Composite Alarms

Combine multiple alarms with AND/OR logic to reduce alarm noise

CloudWatch Application Signals

Auto-instrument applications with SLOs/SLIs using OpenTelemetry-compatible instrumentation

Integration Patterns

API Audit Trail → CloudWatch Logs Alarm

high freq

Amazon CloudWatch · AWS CloudTrail

CloudTrail delivers management events to a CloudWatch Logs log group. CloudWatch metric filters extract specific API call patterns (e.g., root login, security group changes) and trigger alarms. This is the standard pattern for real-time security alerting — but remember: CloudTrail is the audit source, CloudWatch is the alerting mechanism.
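As a rough illustration of the metric-filter step, the sketch below builds the keyword arguments for the CloudWatch Logs PutMetricFilter API using only the standard library. The filter pattern is the commonly used root-account-usage pattern; the log group name, filter name, metric name, and namespace are hypothetical placeholders.

```python
import json

# Commonly used pattern for root-account activity in CloudTrail events.
ROOT_USAGE_PATTERN = (
    '{ $.userIdentity.type = "Root" '
    "&& $.userIdentity.invokedBy NOT EXISTS "
    '&& $.eventType != "AwsServiceEvent" }'
)


def root_usage_metric_filter(log_group_name: str) -> dict:
    """Build keyword arguments for the CloudWatch Logs PutMetricFilter API."""
    return {
        "logGroupName": log_group_name,
        "filterName": "RootAccountUsage",  # hypothetical name
        "filterPattern": ROOT_USAGE_PATTERN,
        "metricTransformations": [
            {
                "metricName": "RootAccountUsageCount",  # hypothetical name
                "metricNamespace": "Security/CloudTrail",  # hypothetical namespace
                "metricValue": "1",  # emit 1 for every matching log event
            }
        ],
    }


# Hypothetical log group that receives the CloudTrail events.
params = root_usage_metric_filter("cloudtrail/management-events")
print(json.dumps(params, indent=2))
```

With boto3 installed and credentials configured, these arguments could be passed as `boto3.client("logs").put_metric_filter(**params)`; an alarm on the resulting metric then supplies the alerting half of the pattern.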

Serverless Observability via EMF + CloudWatch Logs

high freq

Amazon CloudWatch · AWS Lambda

Lambda automatically publishes invocation metrics (Duration, Errors, Throttles, ConcurrentExecutions) to CloudWatch. Custom application metrics can be embedded in structured JSON logs using Embedded Metric Format (EMF) — CloudWatch extracts them as metrics without additional PutMetricData API calls, reducing cost and latency.
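To make the EMF idea concrete, here is a minimal sketch of a function that serializes one EMF record. The namespace, dimension name, and metric name are illustrative assumptions; in a Lambda function, printing the resulting line to stdout is what delivers it to CloudWatch Logs.

```python
import json
import time


def emf_record(namespace, service, latency_ms, timestamp_ms=None):
    """Serialize one Embedded Metric Format log line with a single metric."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # EMF timestamps are epoch milliseconds
    return json.dumps({
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [["Service"]],  # names of dimension keys defined below
                "Metrics": [{"Name": "Latency", "Unit": "Milliseconds"}],
            }],
        },
        # Top-level members carry the actual dimension and metric values.
        "Service": service,
        "Latency": latency_ms,
    })


# "MyApp" and "checkout" are hypothetical values for illustration.
line = emf_record("MyApp", "checkout", 42.5, timestamp_ms=1700000000000)
print(line)
```

The metadata under `_aws` tells CloudWatch which top-level keys are metrics and which are dimensions; everything else in the record stays searchable as ordinary log fields.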

CloudWatch Alarms → EventBridge → Automated Remediation

high freq

Amazon CloudWatch · Amazon EventBridge

CloudWatch Alarms can send events to EventBridge (formerly CloudWatch Events). EventBridge rules then route to Lambda for auto-remediation, Step Functions for orchestrated responses, or Systems Manager Automation for runbook execution. This is the preferred modern pattern over direct alarm-to-Lambda.
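The routing step can be pictured with an EventBridge event pattern for alarm state changes. The pattern below uses the event source and detail-type that CloudWatch alarms emit; the `matches` helper is a deliberately simplified local stand-in for EventBridge's matching (exact values only), included just so the pattern can be exercised offline, and the sample event is hypothetical.

```python
# Event pattern that selects only transitions into the ALARM state.
ALARM_PATTERN = {
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
    "detail": {"state": {"value": ["ALARM"]}},
}


def matches(pattern, event):
    """Simplified matcher: lists mean 'any of these exact values', dicts recurse."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hypothetical, trimmed-down alarm event for local testing.
sample_event = {
    "source": "aws.cloudwatch",
    "detail-type": "CloudWatch Alarm State Change",
    "detail": {"alarmName": "cpu-high", "state": {"value": "ALARM"}},
}
ok_event = dict(
    sample_event,
    detail={"alarmName": "cpu-high", "state": {"value": "OK"}},
)

print(matches(ALARM_PATTERN, sample_event))  # True
print(matches(ALARM_PATTERN, ok_event))      # False
```

Real EventBridge patterns support richer operators (prefix, numeric, anything-but) that this helper does not attempt to reproduce.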

Alarm Notification Fan-Out

high freq

Amazon CloudWatch · Amazon SNS

CloudWatch Alarms trigger SNS topics on state change (ALARM, OK, INSUFFICIENT_DATA). SNS fans out to email, SMS, PagerDuty (via HTTPS), Lambda, and SQS simultaneously. This decouples alerting from response logic and enables multi-channel notification.

Dynamic Scaling Policies

high freq

Amazon CloudWatch · AWS Auto Scaling

CloudWatch alarms on metrics like CPUUtilization or custom application metrics (requests per instance) trigger Auto Scaling actions. Target tracking policies create and manage CloudWatch alarms automatically. Step scaling and simple scaling require manually defined CloudWatch alarms.
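A target tracking policy is a good example of alarms being managed for you: the request names a metric and a target, and no alarm is defined by hand. The sketch below builds arguments for the EC2 Auto Scaling PutScalingPolicy API; the group name, policy name, and 50% target are assumptions for illustration.

```python
def cpu_target_tracking_policy(asg_name, target_percent=50.0):
    """Build keyword arguments for the EC2 Auto Scaling PutScalingPolicy API."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-target",  # hypothetical naming scheme
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": target_percent,
        },
    }


policy = cpu_target_tracking_policy("web-asg")  # "web-asg" is a placeholder
print(policy["PolicyName"])  # web-asg-cpu-target
```

With boto3 available, this could be submitted as `boto3.client("autoscaling").put_scaling_policy(**policy)`. Step and simple scaling policies, by contrast, need a separately created alarm whose action points at the policy.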

CloudWatch Agent for Enhanced EC2 Monitoring

high freq

Amazon CloudWatch · Amazon EC2

The CloudWatch Agent must be installed on EC2 instances to collect memory utilization, disk space, and process-level metrics — these are NOT available from the hypervisor and are NOT published by default. Agent configuration is managed via SSM Parameter Store for fleet-wide deployment.
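For orientation, a minimal agent configuration that collects memory and disk-space usage might look like the output of the sketch below. The collection interval and the monitored mount point are assumptions; the generated JSON is the kind of document that would be stored in an SSM parameter for fleet-wide use.

```python
import json


def agent_config(disk_paths=("/",), interval_seconds=60):
    """Build a minimal CloudWatch Agent config for memory and disk usage."""
    return {
        "agent": {"metrics_collection_interval": interval_seconds},
        "metrics": {
            # Tag every metric with the instance ID resolved by the agent.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {"measurement": ["mem_used_percent"]},
                "disk": {
                    "measurement": ["used_percent"],
                    "resources": list(disk_paths),  # mount points to report on
                },
            },
        },
    }


config = agent_config()
print(json.dumps(config, indent=2))
```

Neither `mem_used_percent` nor disk `used_percent` exists among the default EC2 metrics, which is exactly why the agent is required.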

Configuration Compliance + Metric Correlation

medium freq

Amazon CloudWatch · AWS Config

AWS Config detects configuration drift and compliance violations; CloudWatch monitors operational metrics. Together they provide both 'what changed' (Config) and 'what impact did it have' (CloudWatch). Config rules can trigger SNS → CloudWatch Logs for correlation.

CloudWatch ServiceLens: Unified Observability

medium freq

Amazon CloudWatch · AWS X-Ray

CloudWatch ServiceLens integrates X-Ray traces with CloudWatch metrics and logs to create service maps showing latency, error rates, and throughput per service node. Use for root-cause analysis in microservices architectures. X-Ray provides the traces; CloudWatch provides the operational context.

CloudWatch Metric Streams → Third-Party Monitoring

medium freq

Amazon CloudWatch · Amazon Kinesis Data Firehose

Metric Streams continuously push CloudWatch metrics to Kinesis Data Firehose, which delivers to Datadog, Splunk, New Relic, or an S3 bucket. This is lower-latency and more scalable than polling GetMetricData APIs. Ideal for organizations with existing third-party observability investments.

OpsCenter + CloudWatch Alarms → Automated Runbooks

medium freq

Amazon CloudWatch · AWS Systems Manager

CloudWatch Alarms can create OpsItems in Systems Manager OpsCenter. SSM Automation runbooks can then execute remediation steps (e.g., restart service, resize instance) automatically. This closes the loop from detection to remediation without human intervention.

Service Limits & Quotas

Limit | Value | Note

Standard resolution metric minimum granularity

1 minute per data point

Many candidates assume all metrics are 1-second by default. Only high-resolution custom metrics (published with StorageResolution=1) achieve 1-second granularity.

High-resolution metric minimum granularity

1 second per data point

High-resolution alarms can be set at 10-second or 30-second periods — not arbitrary values. Standard alarms minimum period is 60 seconds.

CloudWatch Logs: maximum log event size

256 KB per event

The batch limit (PutLogEvents) is 1 MB total per call or 10,000 log events per call — these are separate constraints that both apply simultaneously.

CloudWatch Logs: PutLogEvents batch limit (events)

10,000 log events per batch

Commonly confused with the 256 KB per-event limit. Both limits apply independently.

CloudWatch Logs: PutLogEvents batch limit (size)

1 MB per batch

None

CloudWatch Logs: maximum log group retention

3653 days (10 years) configurable per log group

A very common exam trap: candidates assume logs auto-expire. They do NOT unless you configure a retention policy.

CloudWatch Logs: minimum log group retention

1 day per log group

None

CloudWatch Alarms: minimum evaluation period

10 seconds for high-resolution alarms only

You cannot set a 15-second evaluation period — only 10s, 30s, or any multiple of 60s are valid period values.

CloudWatch Alarms: maximum datapoints to alarm

Evaluated over up to 1440 data points within evaluation window

None

CloudWatch Dashboards: maximum widgets per dashboard

500 widgets per dashboard

None

CloudWatch Metrics: data retention for 1-second resolution

3 hours retention period

After 3 hours, 1-second data is aggregated to 1-minute resolution and retained for 15 days. After 15 days it rolls up to 5-minute resolution for 63 days, then 1-hour resolution for 15 months.

CloudWatch Metrics: data retention for 1-minute resolution

15 days retention period

This rollup cascade is heavily tested: 1s→3hr, 1min→15d, 5min→63d, 1hr→15mo. Candidates confuse these windows.

CloudWatch Metrics: data retention for 5-minute resolution

63 days retention period

None

CloudWatch Metrics: data retention for 1-hour resolution

15 months retention period

None

CloudWatch custom metrics: PutMetricData API limit

Up to 1,000 metrics (unique combinations of namespace, metric name, and dimensions) per PutMetricData request

Each unique combination of namespace, metric name, and dimension set is tracked as its own metric, so a single request containing metrics with different dimension combinations counts as multiple metrics, not one.

CloudWatch Logs Insights: maximum query duration

15 minutes per query execution

None

CloudWatch Logs Insights: maximum log groups per query

50 log groups per query

None

CloudWatch Contributor Insights: maximum rules

Refer to current service quotas page per account per region

Limits may vary by account type — always check the service quotas console for your specific account.

CloudWatch Synthetics: canary script timeout

Up to 15 minutes per canary run

None

Metric Math: maximum metrics per expression

10 metrics per math expression

None

CloudWatch Alarm actions per alarm

5 actions per alarm state (ALARM, OK, INSUFFICIENT_DATA)

None

Pricing Model

Pay-per-use across metrics, logs, alarms, dashboards, and API calls

First 10 custom metrics and 10 alarms are FREE under the AWS Free Tier (perpetual, not just 12 months)
Standard AWS service metrics (e.g., EC2 CPU, S3 request counts) are FREE — you only pay for custom metrics you publish
CloudWatch Logs charges for data ingestion (per GB), storage (per GB/month), and data scanned by Logs Insights queries (per GB scanned)
High-resolution metrics (sub-minute) cost more than standard 1-minute metrics — factor this into architecture decisions
CloudWatch Dashboards: first 3 dashboards with up to 50 metrics each are free; additional dashboards are charged per dashboard per month
CloudWatch Alarms: standard resolution alarms and high-resolution alarms have separate per-alarm monthly pricing
Metric Streams pricing is based on number of metric updates streamed — high-cardinality environments can generate significant cost
CloudWatch Synthetics charges per canary run — design canary frequency carefully for cost vs. detection speed tradeoff
CloudWatch RUM charges per event recorded from browser sessions
Log exports to S3 are free for the export itself, but S3 storage costs apply — a cost-effective archival strategy

Exam Tips

critical · CloudWatch vs CloudTrail distinction

CloudWatch is for OPERATIONAL metrics and logs. CloudTrail is for API AUDIT trails. Never use CloudWatch Logs as a substitute for CloudTrail when the question asks 'who made this API call' or 'when was this resource modified by whom'.

critical · CloudWatch Agent, EC2 default metrics

EC2 memory utilization and disk space are NOT published to CloudWatch by default — the hypervisor cannot see inside the OS. You MUST install the CloudWatch Agent to collect these metrics. This is one of the most frequently tested EC2 monitoring facts.

critical · Metric data retention and rollup

CloudWatch metric data follows a rollup cascade: 1-second data survives only 3 hours, 1-minute data lasts 15 days, 5-minute data lasts 63 days, and 1-hour data lasts 15 months. Exam questions about 'why can't I see granular data from 3 weeks ago' test this rollup behavior.
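The cascade is easy to encode as a lookup. The sketch below returns the finest resolution still available for data of a given age; it treats 15 months as 455 days and uses inclusive boundaries, both of which are simplifying assumptions.

```python
from datetime import timedelta

# (maximum age, period in seconds) for each retention tier, finest first.
ROLLUP_TIERS = [
    (timedelta(hours=3), 1),      # high-resolution custom metrics only
    (timedelta(days=15), 60),
    (timedelta(days=63), 300),
    (timedelta(days=455), 3600),  # 15 months, taken as 455 days
]


def finest_period_seconds(age, high_resolution=False):
    """Return the finest period (seconds) retained for data of this age, or None."""
    for max_age, period in ROLLUP_TIERS:
        if period == 1 and not high_resolution:
            continue  # standard metrics never had 1-second data
        if age <= max_age:
            return period
    return None  # older than the longest retention window


print(finest_period_seconds(timedelta(weeks=3)))                        # 300
print(finest_period_seconds(timedelta(hours=1), high_resolution=True))  # 1
print(finest_period_seconds(timedelta(days=500)))                       # None
```

The three-week case returns 300 seconds, which is the answer to the 'why can't I see granular data' question: only 5-minute aggregates remain at that age.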

critical · CloudWatch Logs retention

CloudWatch Logs default retention is NEVER EXPIRE. This is a cost trap in real environments AND a compliance trap in exams. Always set explicit retention policies per log group. Maximum retention is 3653 days (10 years).

critical · PutMetricData quotas, dimension cardinality

PutMetricData API calls with metrics that have DIFFERENT dimension combinations count as SEPARATE quota units, so a single API call can consume multiple quota units. High-cardinality dimensions (such as one value per URL path) multiply this effect.

critical

CloudTrail = API audit (WHO/WHAT/WHEN). CloudWatch = operational monitoring. Never substitute one for the other in exam answers — they solve fundamentally different problems.

critical

EC2 memory and disk metrics REQUIRE the CloudWatch Agent. Enabling Detailed Monitoring only increases frequency of existing metrics to 1-minute — it adds NO new metric types.

critical

CloudWatch Logs never expire by default. Always set retention policies for cost control. Metric data rollup cascade: 1s=3hr, 1min=15d, 5min=63d, 1hr=15mo — know this sequence cold.

important · Composite Alarms, alarm noise reduction

When many related alarms tend to fire together, use Composite Alarms (AND/OR logic across multiple alarms) to reduce alarm noise and avoid alert fatigue. A single Composite Alarm can represent the health of an entire application tier.
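A composite alarm is defined by a rule expression over other alarms' states. As a small sketch, the helper below assembles such an expression from alarm names; the names shown are hypothetical.

```python
def composite_rule(alarm_names, operator="AND"):
    """Join child alarms into a composite AlarmRule expression."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    if not alarm_names:
        raise ValueError("at least one alarm name is required")
    return f" {operator} ".join(f'ALARM("{name}")' for name in alarm_names)


# Hypothetical child alarms for a web tier.
rule = composite_rule(["web-cpu-high", "web-5xx-high"])
print(rule)  # ALARM("web-cpu-high") AND ALARM("web-5xx-high")
```

The resulting string is what the PutCompositeAlarm API takes as its `AlarmRule`. Using AND here means the tier-level alarm fires only when both symptoms are present, which is the noise-reduction effect described above.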

important · Alarm states, missing data treatment

CloudWatch Alarms have THREE states: ALARM, OK, and INSUFFICIENT_DATA. INSUFFICIENT_DATA occurs when there is not enough data to evaluate the alarm — this is NOT the same as OK. Alarms can be configured to treat missing data as 'breaching', 'not breaching', 'ignore', or 'missing'.
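One way to see these options together is an alarm definition that sets M-of-N evaluation, missing-data treatment, and an action for each of the three states. The sketch builds arguments for the PutMetricAlarm API; the threshold, the 3-of-5 setting, and the ARN are illustrative assumptions. Note that the API spells the second option `notBreaching`.

```python
VALID_MISSING_DATA = {"breaching", "notBreaching", "ignore", "missing"}


def cpu_alarm_params(instance_id, topic_arn, treat_missing_data="missing"):
    """Build keyword arguments for the CloudWatch PutMetricAlarm API."""
    if treat_missing_data not in VALID_MISSING_DATA:
        raise ValueError(f"unsupported TreatMissingData: {treat_missing_data!r}")
    return {
        "AlarmName": f"cpu-high-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 60,
        "EvaluationPeriods": 5,   # N: look at the last 5 periods
        "DatapointsToAlarm": 3,   # M: alarm when 3 of them breach
        "Threshold": 80.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": treat_missing_data,
        # One action list per alarm state.
        "AlarmActions": [topic_arn],
        "OKActions": [topic_arn],
        "InsufficientDataActions": [topic_arn],
    }


# Placeholder instance ID and SNS topic ARN.
alarm = cpu_alarm_params(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-east-1:123456789012:ops-alerts",
    treat_missing_data="notBreaching",
)
print(alarm["AlarmName"])  # cpu-high-i-0123456789abcdef0
```

Choosing `notBreaching` keeps a quiet metric from leaving the alarm in INSUFFICIENT_DATA, whereas the default `missing` lets that third state surface.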

important · High-resolution alarms, alarm periods

High-resolution alarms can evaluate at 10-second or 30-second periods ONLY, not arbitrary sub-minute values. Standard alarms use periods that are multiples of 60 seconds, with 60 seconds as the minimum. Confusing these leads to wrong architecture choices in exam scenarios.

important · Logs Insights query language

CloudWatch Logs Insights uses its own query language (not SQL, not CloudTrail syntax). Key commands: fields, filter, stats, sort, limit, parse. The 'stats' command with 'by' enables GROUP BY equivalent aggregations.
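A short query shows most of these commands working together. The helper below only assembles the query text; the `ERROR` match, the bin size, and the limit are assumptions chosen for illustration.

```python
def error_count_query(bin_size="5m", limit=20):
    """Build a Logs Insights query that counts ERROR lines per time bin."""
    return (
        "fields @timestamp, @message\n"
        "| filter @message like /ERROR/\n"
        f"| stats count(*) as errors by bin({bin_size})\n"
        "| sort errors desc\n"
        f"| limit {limit}"
    )


query = error_count_query()
print(query)
```

The `stats ... by bin(5m)` stage plays the role of GROUP BY over 5-minute buckets. The string can be pasted into the Logs Insights console, or passed as `queryString` to the StartQuery API along with a log group and a start/end time.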

important · EMF, Lambda custom metrics

Embedded Metric Format (EMF) lets Lambda and other services emit custom metrics as structured JSON logs — CloudWatch automatically extracts them as real metrics. This avoids PutMetricData API calls and associated costs/throttling. Preferred pattern for high-volume Lambda metric publishing.

important · CloudWatch Synthetics, proactive monitoring

When a question asks about PROACTIVE monitoring of endpoints (simulating user behavior before real users are affected), the answer is CloudWatch Synthetics — not CloudWatch Alarms on existing metrics. Synthetics canaries run scripts that make real HTTP calls.

Good to Know · Cross-account observability, AWS Organizations

For cross-account monitoring in an AWS Organization, use CloudWatch cross-account observability (a separate capability from the older cross-account cross-region console feature). A monitoring account can view metrics, logs, and traces from source accounts without requiring data replication.

Common Misconceptions & Traps

Common Mistake

CloudWatch Logs can serve as the authoritative audit trail for 'who made API calls to AWS services'

Correct

AWS CloudTrail is the authoritative service for API auditing. CloudWatch Logs can RECEIVE CloudTrail events (via CloudTrail → CloudWatch Logs integration) for alerting and querying, but CloudTrail is the source of truth. CloudWatch alone cannot tell you who called DeleteBucket or who modified a security group.

This is the #1 most tested misconception. Exam questions deliberately describe a scenario requiring API audit history and include CloudWatch Logs as a distractor. Remember: CloudTrail = WHO did WHAT WHEN. CloudWatch = operational metrics and application logs.

Common Mistake

A single PutMetricData API call always counts as one API request against quotas, regardless of what metrics it contains

Correct

Each unique combination of metric name, namespace, and dimension set within a PutMetricData call counts separately against quotas. An API call publishing a metric for /api/v1/users and /api/v1/dependencies with different dimensions counts as multiple quota units — this is why high-cardinality dimension sets can rapidly exhaust API limits.

When designing metric publishing architectures, remember that high-cardinality dimensions (like per-URL or per-user-ID) can cause unexpected throttling. Use Embedded Metric Format or aggregate dimensions to avoid this.
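To see how dimension values multiply the number of metrics, the sketch below counts the distinct (namespace, metric name, dimension set) combinations in a PutMetricData-style payload. The namespace, dimension name, and sample values are hypothetical.

```python
def metric_identity(namespace, datum):
    """Return the tuple that identifies one CloudWatch metric."""
    dims = tuple(sorted(
        (d["Name"], d["Value"]) for d in datum.get("Dimensions", [])
    ))
    return (namespace, datum["MetricName"], dims)


def count_unique_metrics(namespace, metric_data):
    """Count distinct metrics addressed by a list of MetricData entries."""
    return len({metric_identity(namespace, d) for d in metric_data})


# Three datapoints, but only two distinct Path values.
metric_data = [
    {"MetricName": "Latency", "Value": 12.0, "Unit": "Milliseconds",
     "Dimensions": [{"Name": "Path", "Value": "/api/v1/users"}]},
    {"MetricName": "Latency", "Value": 30.0, "Unit": "Milliseconds",
     "Dimensions": [{"Name": "Path", "Value": "/api/v1/dependencies"}]},
    {"MetricName": "Latency", "Value": 15.0, "Unit": "Milliseconds",
     "Dimensions": [{"Name": "Path", "Value": "/api/v1/users"}]},
]

print(count_unique_metrics("MyApp", metric_data))  # 2
```

Every new path value adds another metric, so a per-URL or per-user dimension grows the count with traffic variety rather than with the number of metric names in the code.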

Common Mistake

CloudWatch automatically collects memory utilization and disk usage from EC2 instances

Correct

CloudWatch only receives metrics the EC2 hypervisor can observe externally: CPU utilization, network in/out, disk I/O operations, and status checks. Memory utilization, disk space used/available, and process-level metrics require the CloudWatch Agent installed inside the OS.

This appears constantly in SAA-C03 and SysOps scenarios. The key phrase in exam questions is 'memory utilization' or 'available disk space'; these always require the CloudWatch Agent. Distractor answers often include 'enable detailed monitoring', which only increases metric frequency to 1-minute and adds no memory or disk-space metrics.

Common Mistake

CloudWatch Logs automatically expire after a set period to control costs

Correct

CloudWatch Logs groups have NO expiration by default — they retain logs indefinitely (never expire) until you explicitly configure a retention policy. This means costs grow unboundedly if not managed. You must set retention per log group (1 day to 3653 days).

In real environments this causes surprise bills. In exams, questions about cost optimization of logging almost always have 'configure log retention policies' as a correct answer. Never assume logs auto-delete.
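Retention cannot be set to an arbitrary number of days; the API accepts a fixed set of values between 1 and 3653. As a sketch, the helper below rounds a requested retention up to the nearest accepted value so a policy never keeps logs for less time than intended. Rounding up rather than down is a design assumption.

```python
import bisect

# Values accepted for retentionInDays by the PutRetentionPolicy API.
VALID_RETENTION_DAYS = [
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
]


def nearest_valid_retention(requested_days):
    """Round a requested retention up to the next value the API accepts."""
    if requested_days < 1:
        raise ValueError("retention must be at least 1 day")
    idx = bisect.bisect_left(VALID_RETENTION_DAYS, requested_days)
    if idx == len(VALID_RETENTION_DAYS):
        raise ValueError("retention exceeds the 3653-day maximum")
    return VALID_RETENTION_DAYS[idx]


print(nearest_valid_retention(45))   # 60
print(nearest_valid_retention(365))  # 365
```

With boto3 available, the result could be applied with `boto3.client("logs").put_retention_policy(logGroupName=..., retentionInDays=...)`, one call per log group.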

Common Mistake

CloudWatch Alarms only have two states: ALARM and OK

Correct

CloudWatch Alarms have THREE states: ALARM (threshold breached), OK (within threshold), and INSUFFICIENT_DATA (not enough data points to evaluate). INSUFFICIENT_DATA is particularly important during initial alarm creation or when a metric stops being published. You can configure how missing data is treated.

Exam questions about alarm behavior during startup, instance termination, or metric gaps require understanding INSUFFICIENT_DATA. Missing data treatment options (breaching, not breaching, ignore, missing) are also tested.

Common Mistake

Enabling 'Detailed Monitoring' on EC2 gives you memory and disk metrics

Correct

Detailed Monitoring only changes the frequency of EXISTING hypervisor-visible metrics from 5-minute intervals to 1-minute intervals. It does NOT add new metric types. Memory and disk metrics still require the CloudWatch Agent regardless of monitoring mode.

This is a classic distractor. Exam questions pair 'detailed monitoring' and 'CloudWatch Agent' as answer choices. Detailed Monitoring = more frequent standard metrics. CloudWatch Agent = additional metric types (memory, disk).

Common Mistake

CloudWatch X-Ray integration means CloudWatch performs distributed tracing

Correct

AWS X-Ray performs distributed tracing. CloudWatch ServiceLens is a VISUALIZATION layer that integrates X-Ray trace data with CloudWatch metrics and logs to create service maps. CloudWatch itself does not generate or store traces — X-Ray does.

ServiceLens appears in exam questions as a distractor for X-Ray. Remember: X-Ray = tracing engine. CloudWatch ServiceLens = unified dashboard consuming X-Ray data.

Common Mistake

CloudWatch Logs Insights uses standard SQL syntax

Correct

CloudWatch Logs Insights uses its own proprietary query language with commands like fields, filter, stats, sort, limit, and parse. While conceptually similar to SQL, the syntax is different. The 'parse' command extracts custom fields from unstructured log text using glob or regex patterns.

Exam questions sometimes show query syntax and ask which service it belongs to. Recognizing Logs Insights syntax vs. Athena SQL vs. OpenSearch DSL is tested in DEA-C01 and DOP-C02.

Memory Tricks

🧠

MALD = CloudWatch pillars: Metrics, Alarms, Logs, Dashboards — the four core capabilities

🧠

CloudTrail = 'Trail of breadcrumbs WHO did WHAT' | CloudWatch = 'Watch the HEALTH of what's running' — never swap these

🧠

Memory Needs Agent (MNA): Memory, Network process-level, and Application disk metrics all Need the Agent — they're invisible to the hypervisor

🧠

Rollup Cascade: '3-15-63-15' = 3 hours (1s), 15 days (1min), 63 days (5min), 15 months (1hr) — memorize this sequence

🧠

Alarm States: AOI = ALARM (bad), OK (good), INSUFFICIENT_DATA (unknown) — three states, not two

🧠

Default = Never Expire: CloudWatch Logs default retention is 'never expire' — think of it as a hoarder who never throws anything away until you tell them to

CertAI Tutor · SAP-C02, DEA-C01, DOP-C02, SAA-C03, DVA-C02, SCS-C02, AIF-C01, CLF-C02 · 2026-02-21
