
Your unified monitoring, logging, and alerting backbone for every AWS workload
Amazon CloudWatch is AWS's native observability service that collects metrics, logs, events, and traces from virtually every AWS service and custom application. It enables you to set alarms, visualize operational data in dashboards, automatically react to changes, and troubleshoot issues — all from a single pane of glass. CloudWatch is the foundational monitoring layer that feeds into automation, security, and operational excellence across the AWS ecosystem.
Provide unified operational visibility into AWS resources, applications, and on-premises infrastructure through metrics, logs, alarms, dashboards, and automated actions.
Use When
Avoid When
CloudWatch Metrics (Standard)
Built-in metrics from 70+ AWS services; most publish at 1-minute granularity, while EC2 basic monitoring publishes at 5-minute intervals
CloudWatch Custom Metrics
Publish via PutMetricData API or CloudWatch Agent; supports standard (1-min) and high-resolution (1-sec)
CloudWatch Alarms
Threshold, anomaly detection, and metric math-based alarms with M-of-N evaluation
CloudWatch Logs
Ingest, store, search, and stream logs from AWS services, EC2, Lambda, on-premises, and containers
CloudWatch Logs Insights
Interactive query language for log analysis; supports aggregation, filtering, pattern matching
CloudWatch Dashboards
Cross-account, cross-region, shareable dashboards with automatic or custom refresh
CloudWatch Anomaly Detection
ML-based baseline modeling for metrics; auto-adjusts for seasonality and trends
CloudWatch Contributor Insights
Identify top contributors to metric changes (e.g., top callers, top error sources)
CloudWatch Synthetics
Canary scripts that simulate user journeys to proactively monitor endpoints and APIs
CloudWatch ServiceLens
Unified view integrating CloudWatch, X-Ray traces, and logs for service health maps
CloudWatch Internet Monitor
Monitor the availability and performance of internet-facing applications from the perspective of the AWS backbone
CloudWatch Evidently
Feature flagging and A/B testing with metrics-based evaluation
CloudWatch RUM (Real User Monitoring)
Client-side JavaScript snippet captures real browser performance and errors
Cross-Account Observability
Share metrics, logs, and traces across AWS accounts in an organization
Metric Streams
Continuously stream CloudWatch metrics to Kinesis Data Firehose and then to third-party tools (Datadog, Splunk, etc.)
CloudWatch Agent
Unified agent for EC2 and on-premises; collects system-level metrics (memory, disk) and custom logs
Embedded Metric Format (EMF)
Structured JSON log format that CloudWatch automatically extracts as metrics — zero additional API calls
Log Subscriptions (real-time)
Stream logs to Lambda, Kinesis Data Streams, or Kinesis Firehose in near real-time
CloudWatch Logs Live Tail
Real-time streaming view of incoming log events for active debugging sessions
Metric Math
Create new time-series by applying math functions across multiple metrics
Composite Alarms
Combine multiple alarms with AND/OR logic to reduce alarm noise
CloudWatch Application Signals
Auto-instrument applications with SLOs/SLIs using OpenTelemetry-compatible instrumentation
API Audit Trail → CloudWatch Logs Alarm
high freq
CloudTrail delivers management events to a CloudWatch Logs log group. Metric filters turn specific API call patterns (e.g., root login, security group changes) into metrics, and alarms on those metrics send the alert. This is the standard pattern for near-real-time security alerting, but remember: CloudTrail is the audit source, CloudWatch is the alerting mechanism.
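As a rough illustration of the metric-filter step, the sketch below builds the parameters for a CloudWatch Logs metric filter that counts root-user activity in CloudTrail events. The log group name, filter name, metric name, and namespace are hypothetical placeholders, and the filter pattern is one commonly used form rather than the only valid one. The code only builds and prints the request; nothing is sent to AWS.

```python
import json


def root_login_metric_filter(log_group_name: str) -> dict:
    """Build keyword arguments shaped for the CloudWatch Logs PutMetricFilter API."""
    return {
        "logGroupName": log_group_name,
        "filterName": "RootAccountUsage",  # hypothetical filter name
        # JSON filter pattern: events made by the root user, excluding
        # calls made on its behalf by AWS services.
        "filterPattern": (
            '{ $.userIdentity.type = "Root" '
            "&& $.userIdentity.invokedBy NOT EXISTS "
            '&& $.eventType != "AwsServiceEvent" }'
        ),
        "metricTransformations": [
            {
                "metricName": "RootAccountUsageCount",  # hypothetical metric
                "metricNamespace": "Security/CloudTrail",  # hypothetical namespace
                "metricValue": "1",  # emit 1 for every matching log event
            }
        ],
    }


params = root_login_metric_filter("cloudtrail/management-events")
print(json.dumps(params, indent=2))
# Assuming boto3 is installed and credentials are configured, this could be sent with:
#   boto3.client("logs").put_metric_filter(**params)
```

An alarm on the resulting metric (for example, Sum >= 1 over one period) would then complete the alerting path described above.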
Serverless Observability via EMF + CloudWatch Logs
high freq
Lambda automatically publishes invocation metrics (Duration, Errors, Throttles, ConcurrentExecutions) to CloudWatch. Custom application metrics can be embedded in structured JSON logs using Embedded Metric Format (EMF). CloudWatch extracts them as metrics without additional PutMetricData API calls, reducing cost and latency.
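To make the EMF idea concrete, here is a minimal sketch that emits one EMF-shaped JSON log line using only the standard library. The namespace `MyApp`, the `Service` dimension, and the `Latency` metric are illustrative assumptions; in a Lambda function, printing the line to stdout is what places it in CloudWatch Logs.

```python
import json
import time


def emf_log_line(namespace, dimensions, metric_name, value, unit="Milliseconds"):
    """Return one EMF-formatted JSON log line carrying a single metric."""
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # One dimension set, listing the dimension keys by name.
                    "Dimensions": [list(dimensions.keys())],
                    "Metrics": [{"Name": metric_name, "Unit": unit}],
                }
            ],
        },
        # The metric value sits at the top level, keyed by the metric name.
        metric_name: value,
    }
    record.update(dimensions)  # dimension values also sit at the top level
    return json.dumps(record)


line = emf_log_line("MyApp", {"Service": "checkout"}, "Latency", 123)
print(line)
```

AWS also publishes EMF client libraries that handle this formatting; the hand-built version here is only meant to show what the log line contains.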
CloudWatch Alarms → EventBridge → Automated Remediation
high freq
CloudWatch Alarms can send events to EventBridge (formerly CloudWatch Events). EventBridge rules then route to Lambda for auto-remediation, Step Functions for orchestrated responses, or Systems Manager Automation for runbook execution. This is the preferred modern pattern over direct alarm-to-Lambda.
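The routing step hinges on an EventBridge event pattern that matches alarm state changes. The sketch below builds such a pattern as a plain dictionary; the alarm name `web-high-cpu` is a hypothetical example, and the code does not create any rule.

```python
import json


def alarm_state_change_pattern(alarm_names=None, state="ALARM"):
    """Build an EventBridge event pattern for CloudWatch alarm state changes."""
    pattern = {
        "source": ["aws.cloudwatch"],
        "detail-type": ["CloudWatch Alarm State Change"],
        "detail": {"state": {"value": [state]}},
    }
    if alarm_names:
        # Restrict the rule to specific alarms instead of every alarm in the account.
        pattern["detail"]["alarmName"] = list(alarm_names)
    return pattern


pattern = alarm_state_change_pattern(["web-high-cpu"])
print(json.dumps(pattern, indent=2))
# Assuming boto3 and credentials, a rule could be created with something like:
#   boto3.client("events").put_rule(Name="on-web-alarm", EventPattern=json.dumps(pattern))
```

Matching only the `ALARM` state keeps remediation targets from firing again when the alarm returns to `OK`.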
Alarm Notification Fan-Out
high freq
CloudWatch Alarms trigger SNS topics on state change (ALARM, OK, INSUFFICIENT_DATA). SNS fans out to email, SMS, PagerDuty (via HTTPS), Lambda, and SQS simultaneously. This decouples alerting from response logic and enables multi-channel notification.
Dynamic Scaling Policies
high freq
CloudWatch alarms on metrics like CPUUtilization or custom application metrics (requests per instance) trigger Auto Scaling actions. Target tracking policies create and manage CloudWatch alarms automatically. Step scaling and simple scaling require manually defined CloudWatch alarms.
CloudWatch Agent for Enhanced EC2 Monitoring
high freq
The CloudWatch Agent must be installed on EC2 instances to collect memory utilization, disk space, and process-level metrics. These are NOT available from the hypervisor and are NOT published by default. Agent configuration is managed via SSM Parameter Store for fleet-wide deployment.
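A minimal agent configuration for the memory and disk metrics mentioned above might look like the following sketch, which builds the JSON document in Python. The collection interval and mount point are assumptions to adjust per fleet; the measurement names shown (`mem_used_percent`, `used_percent`) are standard agent measurements on Linux.

```python
import json


def agent_config(interval_seconds=60, mount_points=("/",)):
    """Build a small CloudWatch Agent config collecting memory and disk usage."""
    return {
        "agent": {"metrics_collection_interval": interval_seconds},
        "metrics": {
            "namespace": "CWAgent",  # the agent's default namespace
            "metrics_collected": {
                "mem": {"measurement": ["mem_used_percent"]},
                "disk": {
                    "measurement": ["used_percent"],
                    "resources": list(mount_points),  # filesystems to report on
                },
            },
        },
    }


config_json = json.dumps(agent_config(), indent=2)
print(config_json)
# The resulting JSON could be stored as an SSM parameter (for example under a
# hypothetical name such as "AmazonCloudWatch-linux-base") and fetched by the agent.
```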
Configuration Compliance + Metric Correlation
medium freq
AWS Config detects configuration drift and compliance violations; CloudWatch monitors operational metrics. Together they provide both 'what changed' (Config) and 'what impact did it have' (CloudWatch). Config rules can trigger SNS → CloudWatch Logs for correlation.
CloudWatch ServiceLens: Unified Observability
medium freq
CloudWatch ServiceLens integrates X-Ray traces with CloudWatch metrics and logs to create service maps showing latency, error rates, and throughput per service node. Use for root-cause analysis in microservices architectures. X-Ray provides the traces; CloudWatch provides the operational context.
CloudWatch Metric Streams → Third-Party Monitoring
medium freq
Metric Streams continuously push CloudWatch metrics to Kinesis Data Firehose, which delivers to Datadog, Splunk, New Relic, or an S3 bucket. This is lower-latency and more scalable than polling GetMetricData APIs. Ideal for organizations with existing third-party observability investments.
OpsCenter + CloudWatch Alarms → Automated Runbooks
medium freq
CloudWatch Alarms can create OpsItems in Systems Manager OpsCenter. SSM Automation runbooks can then execute remediation steps (e.g., restart service, resize instance) automatically. This closes the loop from detection to remediation without human intervention.
CloudWatch is for OPERATIONAL metrics and logs. CloudTrail is for API AUDIT trails. Never use CloudWatch Logs as a substitute for CloudTrail when the question asks 'who made this API call' or 'when was this resource modified by whom'.
EC2 memory utilization and disk space are NOT published to CloudWatch by default — the hypervisor cannot see inside the OS. You MUST install the CloudWatch Agent to collect these metrics. This is one of the most frequently tested EC2 monitoring facts.
CloudWatch metric data follows a rollup cascade: 1-second data survives only 3 hours, 1-minute data lasts 15 days, 5-minute data lasts 63 days, and 1-hour data lasts 15 months. Exam questions about 'why can't I see granular data from 3 weeks ago' test this rollup behavior.
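The rollup cascade can be expressed as a small lookup. This sketch returns the finest period still retained for data of a given age; it treats 15 months as 455 days, and the 1-second tier only applies to metrics that were published at high resolution in the first place.

```python
from datetime import timedelta

# (maximum age, finest period in seconds still retained at that age)
RETENTION_TIERS = [
    (timedelta(hours=3), 1),
    (timedelta(days=15), 60),
    (timedelta(days=63), 300),
    (timedelta(days=455), 3600),  # roughly 15 months
]


def finest_period_seconds(age: timedelta):
    """Return the finest retained period for data of this age, or None if expired."""
    for max_age, period in RETENTION_TIERS:
        if age <= max_age:
            return period
    return None


# Three-week-old data is past the 15-day tier, so only 5-minute points remain.
print(finest_period_seconds(timedelta(weeks=3)))
```

This is why a graph of three-week-old data cannot show 1-minute detail: only the 5-minute aggregates are still stored.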
CloudWatch Logs default retention is NEVER EXPIRE. This is a cost trap in real environments AND a compliance trap in exams. Always set explicit retention policies per log group. Maximum retention is 3653 days (10 years).
PutMetricData API calls with metrics that have DIFFERENT dimension combinations count as SEPARATE quota units — a single API call can consume multiple quota units. This is the root of the Ruby /api/v1/dependencies exam misconception.
CloudTrail = API audit (WHO/WHAT/WHEN). CloudWatch = operational monitoring. Never substitute one for the other in exam answers — they solve fundamentally different problems.
EC2 memory and disk metrics REQUIRE the CloudWatch Agent. Enabling Detailed Monitoring only increases frequency of existing metrics to 1-minute — it adds NO new metric types.
CloudWatch Logs never expire by default. Always set retention policies for cost control. Metric data rollup cascade: 1s=3hr, 1min=15d, 5min=63d, 1hr=15mo — know this sequence cold.
For composite alarm scenarios: use Composite Alarms (AND/OR logic across multiple alarms) to reduce alarm noise and avoid alert fatigue. A single Composite Alarm can represent the health of an entire application tier.
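As an illustration of the AND/OR logic, the sketch below assembles a composite alarm rule expression from child alarm names. The alarm names are hypothetical, and the helpers only build the rule string; they do not create an alarm.

```python
def alarm(name: str) -> str:
    """Rule term that is true while the named child alarm is in ALARM state."""
    return f'ALARM("{name}")'


def all_of(*terms: str) -> str:
    """Combine terms so that every one must be true."""
    return "(" + " AND ".join(terms) + ")"


def any_of(*terms: str) -> str:
    """Combine terms so that at least one must be true."""
    return "(" + " OR ".join(terms) + ")"


# Page only when CPU is high AND users are also seeing errors or slow responses.
rule = all_of(
    alarm("web-high-cpu"),
    any_of(alarm("web-5xx-rate"), alarm("web-high-latency")),
)
print(rule)
# Assuming boto3 and credentials, the rule could be supplied as AlarmRule in:
#   boto3.client("cloudwatch").put_composite_alarm(AlarmName="web-tier-health", AlarmRule=rule)
```

Requiring a user-facing symptom alongside the resource signal is one way a composite alarm cuts noise: high CPU alone no longer pages anyone.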
CloudWatch Alarms have THREE states: ALARM, OK, and INSUFFICIENT_DATA. INSUFFICIENT_DATA occurs when there is not enough data to evaluate the alarm — this is NOT the same as OK. Alarms can be configured to treat missing data as 'breaching', 'not breaching', 'ignore', or 'missing'.
High-resolution alarms can evaluate at 10-second or 30-second periods ONLY, not arbitrary sub-minute values, and they require a high-resolution metric. Standard-resolution alarms use periods that are multiples of 60 seconds. Confusing these leads to wrong architecture choices in exam scenarios.
CloudWatch Logs Insights uses its own query language (not SQL, not CloudTrail syntax). Key commands: fields, filter, stats, sort, limit, parse. The 'stats' command with 'by' enables GROUP BY equivalent aggregations.
Embedded Metric Format (EMF) lets Lambda and other services emit custom metrics as structured JSON logs — CloudWatch automatically extracts them as real metrics. This avoids PutMetricData API calls and associated costs/throttling. Preferred pattern for high-volume Lambda metric publishing.
When a question asks about PROACTIVE monitoring of endpoints (simulating user behavior before real users are affected), the answer is CloudWatch Synthetics — not CloudWatch Alarms on existing metrics. Synthetics canaries run scripts that make real HTTP calls.
For cross-account monitoring in an AWS Organization, use CloudWatch cross-account observability (a separate feature from the older cross-account cross-region console view). A monitoring account can view metrics, logs, and traces from linked source accounts without requiring data replication.
Common Mistake
CloudWatch Logs can serve as the authoritative audit trail for 'who made API calls to AWS services'
Correct
AWS CloudTrail is the authoritative service for API auditing. CloudWatch Logs can RECEIVE CloudTrail events (via CloudTrail → CloudWatch Logs integration) for alerting and querying, but CloudTrail is the source of truth. CloudWatch alone cannot tell you who called DeleteBucket or who modified a security group.
This is the #1 most tested misconception. Exam questions deliberately describe a scenario requiring API audit history and include CloudWatch Logs as a distractor. Remember: CloudTrail = WHO did WHAT WHEN. CloudWatch = operational metrics and application logs.
Common Mistake
A single PutMetricData API call always counts as one API request against quotas, regardless of what metrics it contains
Correct
Each unique combination of metric name, namespace, and dimension set within a PutMetricData call counts separately against quotas. An API call publishing a metric for /api/v1/users and /api/v1/dependencies with different dimensions counts as multiple quota units — this is why high-cardinality dimension sets can rapidly exhaust API limits.
Directly maps to the Ruby /api/v1/dependencies exam question pattern. When designing metric publishing architectures, high-cardinality dimensions (like per-URL or per-user-ID) can cause unexpected throttling. Use Embedded Metric Format or aggregate dimensions to avoid this.
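To see how dimension cardinality multiplies, the sketch below counts the distinct (namespace, metric name, dimension set) combinations in a batch shaped like PutMetricData's `MetricData` parameter. The namespace, metric name, and paths are illustrative, and the function only counts combinations; how those combinations are charged against quotas or billing is as described above, not something the code verifies.

```python
def distinct_metric_identities(namespace, metric_data):
    """Count unique (namespace, metric name, dimension set) combinations in a batch."""
    identities = set()
    for datum in metric_data:
        # Dimension order does not matter, so compare dimension sets as frozensets.
        dims = frozenset((d["Name"], d["Value"]) for d in datum.get("Dimensions", []))
        identities.add((namespace, datum["MetricName"], dims))
    return len(identities)


batch = [
    {"MetricName": "Latency", "Value": 12.0,
     "Dimensions": [{"Name": "Path", "Value": "/api/v1/users"}]},
    {"MetricName": "Latency", "Value": 40.0,
     "Dimensions": [{"Name": "Path", "Value": "/api/v1/dependencies"}]},
    {"MetricName": "Latency", "Value": 15.0,
     "Dimensions": [{"Name": "Path", "Value": "/api/v1/users"}]},
]

# Three data points, but only two distinct dimension combinations.
print(distinct_metric_identities("MyApp", batch))
```

Running a count like this over a sample of real traffic is a quick way to spot a dimension (per-URL, per-user-ID) whose cardinality will grow without bound.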
Common Mistake
CloudWatch automatically collects memory utilization and disk usage from EC2 instances
Correct
By default, CloudWatch only receives the metrics the EC2 hypervisor can observe externally: CPU utilization, network in/out, disk I/O operations, and status checks. Memory utilization, disk space used/available, and process-level metrics require the CloudWatch Agent installed inside the OS.
This appears constantly in SAA-C03 and SysOps scenarios. The key phrase in exam questions is 'memory utilization' or 'available disk space'; these always require the CloudWatch Agent. Distractor answers often include 'enable detailed monitoring', which only raises the frequency of existing metrics to 1-minute and adds no memory or disk-space metrics.
Common Mistake
CloudWatch Logs automatically expire after a set period to control costs
Correct
CloudWatch Logs log groups have NO expiration by default: they retain logs indefinitely until you explicitly configure a retention policy. This means storage costs grow without bound if not managed. You must set retention per log group (1 day to 3653 days).
In real environments this causes surprise bills. In exams, questions about cost optimization of logging almost always have 'configure log retention policies' as a correct answer. Never assume logs auto-delete.
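Retention is set per log group, and the API accepts only specific day counts rather than any integer. The sketch below rounds a desired minimum up to the next accepted value; the list reflects the documented values as best understood here and should be checked against the current PutRetentionPolicy reference before relying on it.

```python
# Retention values (in days) accepted by CloudWatch Logs, as documented for
# PutRetentionPolicy at the time of writing; verify before relying on this list.
ALLOWED_RETENTION_DAYS = [
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
]


def nearest_allowed_retention(min_days: int) -> int:
    """Return the smallest accepted retention that keeps logs at least min_days."""
    for days in ALLOWED_RETENTION_DAYS:
        if days >= min_days:
            return days
    raise ValueError("requested retention exceeds the 3653-day maximum")


print(nearest_allowed_retention(45))
# Assuming boto3 and credentials, the policy could be applied with:
#   boto3.client("logs").put_retention_policy(
#       logGroupName="/aws/lambda/my-function",  # hypothetical log group
#       retentionInDays=nearest_allowed_retention(45),
#   )
```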
Common Mistake
CloudWatch Alarms only have two states: ALARM and OK
Correct
CloudWatch Alarms have THREE states: ALARM (threshold breached), OK (within threshold), and INSUFFICIENT_DATA (not enough data points to evaluate). INSUFFICIENT_DATA is particularly important during initial alarm creation or when a metric stops being published. You can configure how missing data is treated.
Exam questions about alarm behavior during startup, instance termination, or metric gaps require understanding INSUFFICIENT_DATA. Missing data treatment options (breaching, not breaching, ignore, missing) are also tested.
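The interaction between the three states, M-of-N evaluation, and missing-data treatment can be modeled in a few lines. This is a deliberately simplified sketch, not CloudWatch's actual evaluation algorithm (which considers a wider evaluation range and has additional rules); it assumes a greater-than-threshold comparison and represents missing data points as `None`.

```python
def evaluate_alarm(datapoints, threshold, m, treat_missing="missing", current_state="OK"):
    """Simplified M-of-N evaluation over the last N datapoints (None = missing)."""
    if treat_missing == "breaching":
        # Every gap counts as a breach.
        flags = [True if v is None else v > threshold for v in datapoints]
    elif treat_missing == "notBreaching":
        # Every gap counts as healthy.
        flags = [False if v is None else v > threshold for v in datapoints]
    else:
        # "ignore" and "missing" both evaluate only the points that exist.
        flags = [v > threshold for v in datapoints if v is not None]
        if not flags:
            # No data at all: "ignore" keeps the current state,
            # "missing" moves the alarm to INSUFFICIENT_DATA.
            return current_state if treat_missing == "ignore" else "INSUFFICIENT_DATA"
    return "ALARM" if sum(flags) >= m else "OK"


# 2-of-3 alarm on CPU > 80 with one gap in the data.
print(evaluate_alarm([90, None, 95], threshold=80, m=2))
# A metric that stopped publishing entirely.
print(evaluate_alarm([None, None, None], threshold=80, m=2))
```

The second call shows why a terminated instance leaves its alarms in INSUFFICIENT_DATA under the default treatment, rather than in OK.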
Common Mistake
Enabling 'Detailed Monitoring' on EC2 gives you memory and disk metrics
Correct
Detailed Monitoring only changes the frequency of EXISTING hypervisor-visible metrics from 5-minute intervals to 1-minute intervals. It does NOT add new metric types. Memory and disk metrics still require the CloudWatch Agent regardless of monitoring mode.
This is a classic distractor. Exam questions pair 'detailed monitoring' and 'CloudWatch Agent' as answer choices. Detailed Monitoring = more frequent standard metrics. CloudWatch Agent = additional metric types (memory, disk).
Common Mistake
CloudWatch X-Ray integration means CloudWatch performs distributed tracing
Correct
AWS X-Ray performs distributed tracing. CloudWatch ServiceLens is a VISUALIZATION layer that integrates X-Ray trace data with CloudWatch metrics and logs to create service maps. CloudWatch itself does not generate or store traces — X-Ray does.
ServiceLens appears in exam questions as a distractor for X-Ray. Remember: X-Ray = tracing engine. CloudWatch ServiceLens = unified dashboard consuming X-Ray data.
Common Mistake
CloudWatch Logs Insights uses standard SQL syntax
Correct
CloudWatch Logs Insights uses its own proprietary query language with commands like fields, filter, stats, sort, limit, and parse. While conceptually similar to SQL, the syntax is different. The 'parse' command extracts custom fields from unstructured log text using glob or regex patterns.
Exam questions sometimes show query syntax and ask which service it belongs to. Recognizing Logs Insights syntax vs. Athena SQL vs. OpenSearch DSL is tested in DEA-C01 and DOP-C02.
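For recognition purposes, the sketch below assembles a typical Logs Insights query from the commands named above. The search pattern, bin size, and limit are illustrative defaults; the function only builds the query text.

```python
def error_count_query(pattern="ERROR", bin_size="5m", limit=20):
    """Build a Logs Insights query that counts matching log lines per time bin."""
    return "\n".join([
        "fields @timestamp, @message",
        f"| filter @message like /{pattern}/",
        # stats ... by is the GROUP BY equivalent; bin() buckets by time.
        f"| stats count(*) as errors by bin({bin_size})",
        "| sort errors desc",
        f"| limit {limit}",
    ])


query = error_count_query()
print(query)
# Assuming boto3 and credentials, the query could be submitted with:
#   boto3.client("logs").start_query(
#       logGroupName="/aws/lambda/my-function",  # hypothetical log group
#       startTime=start_epoch_seconds, endTime=end_epoch_seconds,
#       queryString=query,
#   )
```

The pipe-separated commands and `@`-prefixed system fields are the quickest visual cues that a snippet is Logs Insights rather than Athena SQL or OpenSearch DSL.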
MALD = CloudWatch pillars: Metrics, Alarms, Logs, Dashboards — the four core capabilities
CloudTrail = 'Trail of breadcrumbs WHO did WHAT' | CloudWatch = 'Watch the HEALTH of what's running' — never swap these
Memory Needs Agent (MNA): Memory, Network process-level, and Application disk metrics all Need the Agent — they're invisible to the hypervisor
Rollup Cascade: '3-15-63-15' = 3 hours (1s), 15 days (1min), 63 days (5min), 15 months (1hr) — memorize this sequence
Alarm States: AOI = ALARM (bad), OK (good), INSUFFICIENT_DATA (unknown) — three states, not two
Default = Never Expire: CloudWatch Logs default retention is 'never expire' — think of it as a hoarder who never throws anything away until you tell them to
CertAI Tutor · SAP-C02, DEA-C01, DOP-C02, SAA-C03, DVA-C02, SCS-C02, AIF-C01, CLF-C02 · 2026-02-21