monitoring · SAA-C03, DOP-C02, SAP-C02, DVA-C02, DEA-C01, AIF-C01, CLF-C02

AWS X-Ray: The Distributed Tracing Detective

End-to-end request tracing across microservices, Lambda, and APIs — see exactly where latency hides

Updated 2026-02-22

Overview

AWS X-Ray is a distributed tracing service that collects data about requests as they travel through your application, helping you analyze and debug production and distributed applications. It provides a visual service map showing the relationships between services, pinpoints performance bottlenecks, and identifies root causes of errors and latency issues. X-Ray works across EC2, ECS, Lambda, Elastic Beanstalk, API Gateway, and more — but requires SDK instrumentation in your application code to generate trace data.

Identify performance bottlenecks, errors, and root causes in distributed and microservices architectures by providing end-to-end request tracing with a visual service map.

Use When

Debugging latency issues in microservices where a single request touches multiple services (e.g., API Gateway → Lambda → DynamoDB → SNS)
Identifying which downstream service is causing errors or throttling in a complex distributed application
Analyzing performance of serverless applications built on AWS Lambda and API Gateway
Meeting compliance or operational requirements to trace and audit request flows across multi-tier applications
Optimizing application performance by visualizing service dependencies and identifying cold start impacts in Lambda

Avoid When

Infrastructure-level monitoring (use CloudWatch Metrics and Alarms instead — X-Ray is application-level, not infrastructure-level)
Auditing API calls made to AWS services for security/compliance (use CloudTrail instead — X-Ray traces application requests, not AWS control-plane API calls)
Log aggregation and search (use CloudWatch Logs or OpenSearch — X-Ray is not a log management tool)
Monitoring ML model performance or data drift in SageMaker (use SageMaker Model Monitor — X-Ray does not inspect model inference quality)

Key Features

Distributed Tracing with Trace IDs

A unique Trace ID is propagated via HTTP headers (X-Amzn-Trace-Id) across all services in a request path
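The trace ID has a documented shape: the version number `1`, the request start time as 8 hexadecimal digits of Unix epoch seconds, and a 96-bit random identifier as 24 hex digits, joined by hyphens. The minimal sketch below generates one with the standard library. In practice the X-Ray SDK or ADOT does this for you, so `new_trace_id` is a hypothetical helper for illustration only.

```python
import secrets
import time
from typing import Optional


def new_trace_id(epoch_seconds: Optional[int] = None) -> str:
    """Return an ID shaped like 1-<8 hex digits of epoch seconds>-<24 hex digits> (illustrative helper)."""
    if epoch_seconds is None:
        epoch_seconds = int(time.time())
    time_part = f"{epoch_seconds:08x}"     # request start time, seconds since the Unix epoch
    random_part = secrets.token_hex(12)    # 12 random bytes = 96 bits = 24 hex digits
    return f"1-{time_part}-{random_part}"  # leading "1" is the format version


if __name__ == "__main__":
    print(new_trace_id())
```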

Service Map (Visual Graph)

Auto-generated visual map of all services and their connections, color-coded by health status

Segments and Subsegments

Segments represent work done by a service; subsegments represent downstream calls (DB queries, HTTP calls, etc.)
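To make the segment/subsegment relationship concrete, here is a minimal segment document built as a Python dict. The field names (`name`, `id`, `trace_id`, `start_time`, `end_time`, `subsegments`, `namespace`) follow the X-Ray segment document schema as I understand it; the service name, the "DynamoDB" subsegment, and the timings are hypothetical.

```python
import json
import secrets
import time


def new_id() -> str:
    """64-bit identifier as 16 hex digits, the format used for segment and subsegment IDs."""
    return secrets.token_hex(8)


def build_segment(trace_id: str, service: str, db_call_seconds: float) -> dict:
    start = time.time()
    # Subsegment: one downstream call made while this service handled the request
    subsegment = {
        "name": "DynamoDB",    # hypothetical downstream resource
        "id": new_id(),
        "namespace": "aws",    # "aws" for AWS SDK calls, "remote" for other HTTP calls
        "start_time": start,
        "end_time": start + db_call_seconds,
    }
    # Segment: the work this service did; it encloses its subsegments in time
    return {
        "name": service,
        "id": new_id(),
        "trace_id": trace_id,
        "start_time": start,
        "end_time": start + db_call_seconds + 0.005,
        "subsegments": [subsegment],
    }


if __name__ == "__main__":
    doc = build_segment("1-58406520-a006649127e371903a2de979", "orders-api", 0.120)
    print(json.dumps(doc, indent=2))
```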

Annotations (Indexed Key-Value Pairs)

Filterable metadata attached to segments; used in GetTraceSummaries filter expressions

Metadata (Non-Indexed Key-Value Pairs)

Rich contextual data stored with the trace; values can be any JSON type (objects, arrays, and so on) but are not indexed, so they cannot be searched or used in filter expressions
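Seeing both kinds of key-value data side by side in a segment document helps. This sketch adds them to a segment dict and applies a few checks: annotation keys limited to letters, digits, and underscores; annotation values limited to strings, numbers, and booleans; and the 50-annotation and 1,000-character limits listed under the quotas on this sheet. The helpers `add_annotation` and `add_metadata` are hypothetical. The SDKs expose their own equivalents (for example `put_annotation` and `put_metadata` in the Python SDK). Refusing the 51st annotation, checked per segment rather than per trace, is a simplification chosen for this sketch.

```python
import re

ANNOTATION_KEY = re.compile(r"^[A-Za-z0-9_]+$")
MAX_ANNOTATIONS = 50         # indexed annotations per trace; checked per segment here (simplification)
MAX_ANNOTATION_VALUE = 1000  # characters, per the quota listed on this sheet


def add_annotation(segment: dict, key: str, value) -> None:
    """Attach an indexed, filterable key-value pair (hypothetical helper)."""
    if not ANNOTATION_KEY.match(key):
        raise ValueError(f"annotation key {key!r} must use only letters, digits, or underscores")
    if not isinstance(value, (str, int, float, bool)):
        raise TypeError("annotation values must be a string, number, or boolean")
    if isinstance(value, str) and len(value) > MAX_ANNOTATION_VALUE:
        raise ValueError("annotation value too long; store it as metadata instead")
    annotations = segment.setdefault("annotations", {})
    if key not in annotations and len(annotations) >= MAX_ANNOTATIONS:
        raise ValueError("annotation limit reached")
    annotations[key] = value


def add_metadata(segment: dict, key: str, value, namespace: str = "default") -> None:
    """Attach non-indexed context under a namespace; any JSON-serializable value is fine."""
    segment.setdefault("metadata", {}).setdefault(namespace, {})[key] = value
```

For example, `add_annotation(seg, "customer_id", "12345")` makes the trace findable with a filter expression, while `add_metadata(seg, "cart", {...})` keeps the full cart for inspection without indexing it.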

Sampling Rules (Custom)

Define rules by service name, service type, host, HTTP method, URL path, and resource ARN to control what gets traced
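A custom rule is a small document. The sketch below writes two as Python dicts using the field names of the `SamplingRule` structure accepted by the `CreateSamplingRule` API, then picks the first matching rule in ascending `Priority` order. The rule name `checkout-100pct`, the service name, and the `/checkout*` path are hypothetical. `fnmatchcase` is a stand-in for X-Ray's own wildcard matching (`*` and `?`); it also accepts `[...]` character classes, which X-Ray rules may not.

```python
from fnmatch import fnmatchcase
from typing import List, Optional

# Hypothetical rule: trace every checkout request, e.g. for an audit requirement.
CHECKOUT_RULE = {
    "RuleName": "checkout-100pct",
    "ResourceARN": "*",
    "Priority": 100,       # lower numbers are evaluated first
    "FixedRate": 1.0,      # 100% of requests beyond the reservoir
    "ReservoirSize": 1,    # requests per second sampled before FixedRate applies
    "ServiceName": "orders-api",
    "ServiceType": "*",
    "Host": "*",
    "HTTPMethod": "POST",
    "URLPath": "/checkout*",
    "Version": 1,
}

# Mirrors the built-in default: 1 request/second plus 5%, matching everything.
DEFAULT_RULE = dict(CHECKOUT_RULE, RuleName="Default", Priority=10000, FixedRate=0.05,
                    ServiceName="*", HTTPMethod="*", URLPath="*")


def rule_matches(rule: dict, request: dict) -> bool:
    """Simplified matcher: every request attribute must match the rule's wildcard pattern."""
    fields = {
        "ServiceName": request.get("service", ""),
        "ServiceType": request.get("service_type", ""),
        "Host": request.get("host", ""),
        "HTTPMethod": request.get("method", ""),
        "URLPath": request.get("path", ""),
        "ResourceARN": request.get("resource_arn", ""),
    }
    return all(fnmatchcase(value, rule[name]) for name, value in fields.items())


def pick_rule(rules: List[dict], request: dict) -> Optional[dict]:
    """Return the first matching rule in priority order, or None."""
    for rule in sorted(rules, key=lambda r: r["Priority"]):
        if rule_matches(rule, request):
            return rule
    return None
```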

X-Ray Daemon

A lightweight process that receives UDP traffic from SDKs and forwards to X-Ray API; required on EC2/ECS; built-in on Lambda
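The daemon's wire format is simple enough to show with the standard library: a one-line JSON header, a newline, then the segment document, sent as a single UDP datagram (by default to `127.0.0.1:2000`). This sketch mirrors what the SDKs do internally and does not replace them. `send_segment` is a hypothetical helper. The size check treats 64,000 bytes for the whole datagram as a conservative reading of the 64 KB segment limit listed in the quotas.

```python
import json
import socket

HEADER = b'{"format": "json", "version": 1}\n'
MAX_DATAGRAM_BYTES = 64_000  # conservative reading of the 64 KB limit, applied to the whole datagram


def send_segment(segment: dict, address: tuple = ("127.0.0.1", 2000)) -> int:
    """Serialize a segment and send it to the X-Ray daemon as one UDP datagram (hypothetical helper)."""
    payload = HEADER + json.dumps(segment, separators=(",", ":")).encode("utf-8")
    if len(payload) > MAX_DATAGRAM_BYTES:
        raise ValueError(f"segment too large ({len(payload)} bytes); trim annotations and metadata")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, address)  # fire-and-forget: UDP gives no delivery guarantee
```

Because this is UDP, a missing or stopped daemon does not cause an error in the application. The segment is simply lost, which is one reason traces can silently go missing.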

X-Ray SDK (Multiple Languages)

Available for Java, Python, Node.js, Ruby, Go, .NET; must be integrated into application code

X-Ray API

PutTraceSegments, PutTelemetryRecords, GetTraceSummaries, BatchGetTraces, GetServiceGraph, GetTraceGraph
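Query APIs such as GetTraceSummaries return results in pages, so a NextToken loop is the usual shape. This generator is written against the boto3 X-Ray client's `get_trace_summaries` call (parameters `StartTime`, `EndTime`, `FilterExpression`, `NextToken`; response keys `TraceSummaries` and `NextToken`). It is a sketch, and boto3 also ships a paginator for this operation that does the same job.

```python
from datetime import datetime
from typing import Iterator


def iter_trace_summaries(client, start: datetime, end: datetime,
                         filter_expression: str = "") -> Iterator[dict]:
    """Yield every trace summary in the window, following NextToken until it is absent."""
    kwargs = {"StartTime": start, "EndTime": end}
    if filter_expression:
        kwargs["FilterExpression"] = filter_expression
    while True:
        page = client.get_trace_summaries(**kwargs)
        # A page may be empty yet still carry a NextToken, so keep going until the token disappears
        yield from page.get("TraceSummaries", [])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token


# Usage with boto3 (requires AWS credentials):
#   import boto3
#   from datetime import timedelta, timezone
#   xray = boto3.client("xray")
#   end = datetime.now(timezone.utc)
#   for summary in iter_trace_summaries(xray, end - timedelta(hours=1), end, "fault = true"):
#       print(summary["Id"])
```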

Lambda Active Tracing

Enable via Lambda console or SAM/CloudFormation; daemon runs automatically; no separate installation needed

API Gateway Integration

Enable tracing per stage on API Gateway REST APIs (HTTP APIs do not support X-Ray tracing); passes the trace header to downstream Lambda or HTTP integrations

Elastic Beanstalk Integration

Enable via .ebextensions or console; daemon is pre-installed on Beanstalk platforms

ECS Integration

Run X-Ray daemon as a sidecar container in the same task definition

X-Ray Groups

Filter expression-based groups; can have CloudWatch alarms on error/fault/throttle rates per group

CloudWatch ServiceLens Integration

ServiceLens in CloudWatch combines X-Ray traces with CloudWatch metrics and logs for unified observability

AWS Distro for OpenTelemetry (ADOT)

AWS-supported distribution of OpenTelemetry that can send traces to X-Ray; recommended for new implementations

Cross-Account Tracing

Traces can span accounts when using ADOT or when services pass the trace header across account boundaries

Encryption

Trace data encrypted at rest using AWS managed keys or customer-managed KMS keys

Filter Expressions

Query traces by annotation values, response time, HTTP status, service name, etc.
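Filter expressions are plain strings, so it can help to assemble them in code. The sketch below builds expressions using syntax from the X-Ray filter expression documentation (`annotation.<key>`, `responsetime`, `service("name") { ... }`, and the `AND` operator). The helper names are hypothetical. Rejecting string values that contain double quotes is this sketch's own choice, not an X-Ray rule. The result can be passed as the `FilterExpression` of GetTraceSummaries or of an X-Ray group.

```python
def render(value) -> str:
    """Render a Python value as a filter-expression literal."""
    if isinstance(value, bool):              # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if '"' in str(value):                    # this sketch's choice: do not guess at escaping
        raise ValueError("embedded double quotes are not handled by this sketch")
    return f'"{value}"'


def annotation_equals(key: str, value) -> str:
    return f"annotation.{key} = {render(value)}"


def slower_than(seconds: float) -> str:
    return f"responsetime > {seconds}"


def service_faults(service: str) -> str:
    return f"service({render(service)}) {{ fault = true }}"


def all_of(*clauses: str) -> str:
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(all_of(service_faults("orders-api"), annotation_equals("customer_id", "12345"), slower_than(2)))
```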

Fault vs Error vs Throttle Classification

Faults = 5xx (server-side); Errors = 4xx (client-side); Throttle = 429 (subset of errors)
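In a segment document this classification is carried by three boolean fields, `fault`, `error`, and `throttle`, next to the recorded `http.response.status`. A minimal sketch of mapping a status code onto them, following the rule above (a 429 sets both `error` and `throttle`). The helper names are hypothetical.

```python
def classify_status(status: int) -> dict:
    """Map an HTTP status code to X-Ray's fault/error/throttle flags."""
    return {
        "fault": 500 <= status <= 599,   # server-side problem
        "error": 400 <= status <= 499,   # client-side problem
        "throttle": status == 429,       # Too Many Requests, a subset of errors
    }


def apply_status(segment: dict, status: int) -> dict:
    """Record the response status and set only the flags that are true."""
    segment.setdefault("http", {}).setdefault("response", {})["status"] = status
    for flag, value in classify_status(status).items():
        if value:
            segment[flag] = True
    return segment
```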

Integration Patterns

Serverless Active Tracing

high freq

AWS X-Ray + AWS Lambda

Enable Active Tracing on the Lambda function (TracingConfig: Active). The X-Ray daemon runs automatically in the Lambda execution environment, so no sidecar is needed. Lambda itself records a segment for the Lambda service and a segment for the function, with subsegments such as Initialization (cold start), Invocation, and Overhead; the SDK (or ADOT) adds subsegments for downstream calls and custom code. This makes it critical for debugging cold starts and downstream latency.

API Gateway Stage-Level Tracing

high freq

AWS X-Ray + Amazon API Gateway

Enable X-Ray tracing at the API Gateway stage level. API Gateway creates a segment for each request and passes the X-Amzn-Trace-Id header downstream. IMPORTANT: This only traces the API Gateway portion and what it calls — it does NOT automatically trace all downstream services unless they also instrument with X-Ray SDK. Common exam trap: candidates think enabling API Gateway tracing covers the entire application.

CloudWatch ServiceLens Unified Observability

high freq

AWS X-Ray + Amazon CloudWatch

CloudWatch ServiceLens integrates X-Ray service maps with CloudWatch metrics and logs into a single pane of glass. You can navigate from a CloudWatch alarm → ServiceLens service map → individual X-Ray traces. X-Ray Groups can trigger CloudWatch alarms on error/fault rates. This is the primary pattern for operations teams needing unified observability.

Sidecar Daemon Container Pattern

high freq

AWS X-Ray + Amazon ECS

Deploy the X-Ray daemon as a sidecar container in the same ECS task definition as your application container. The application sends segments over UDP port 2000 to the daemon, which it reaches on localhost when the task uses awsvpc networking (in bridge mode, point the SDK at the daemon container, for example through the AWS_XRAY_DAEMON_ADDRESS environment variable). The daemon forwards batched segments to the X-Ray API. The task IAM role must include the xray:PutTraceSegments and xray:PutTelemetryRecords permissions.
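A trimmed task definition fragment for this pattern, written as a Python dict so it serializes to the camelCase JSON that ECS task definitions use. The family, container names, account ID, and application image are hypothetical placeholders. The daemon image is, as far as I know, the one AWS publishes on Amazon ECR Public. `awsvpc` mode is assumed so the application reaches the sidecar on `127.0.0.1:2000` via `AWS_XRAY_DAEMON_ADDRESS`, the variable the X-Ray SDKs read. The fragment is not a complete registration payload (no task-level CPU, memory, or launch type).

```python
import json

# Trimmed fragment; names, ARNs, and the app image are hypothetical placeholders.
TASK_DEFINITION = {
    "family": "orders-api",
    "networkMode": "awsvpc",  # containers in one task share localhost
    "taskRoleArn": "arn:aws:iam::123456789012:role/orders-api-task-role",  # needs X-Ray write permissions
    "containerDefinitions": [
        {
            "name": "app",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/orders-api:latest",
            "essential": True,
            "environment": [
                # Tells the X-Ray SDK where the daemon listens
                {"name": "AWS_XRAY_DAEMON_ADDRESS", "value": "127.0.0.1:2000"}
            ],
        },
        {
            "name": "xray-daemon",
            "image": "public.ecr.aws/xray/aws-xray-daemon:latest",
            "essential": False,  # design choice: a daemon crash should not stop the app
            "cpu": 32,
            "memory": 256,
            "portMappings": [{"containerPort": 2000, "protocol": "udp"}],
        },
    ],
}

if __name__ == "__main__":
    print(json.dumps(TASK_DEFINITION, indent=2))
```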

Beanstalk Built-in X-Ray Daemon

high freq

AWS X-Ray + AWS Elastic Beanstalk

X-Ray daemon is pre-installed on Elastic Beanstalk platforms. Enable via the Beanstalk console (Software configuration) or .ebextensions config file. Application must still use X-Ray SDK for instrumentation — enabling the daemon alone does not create traces. Common in exam scenarios asking about the easiest way to add tracing to an existing Beanstalk application.

Complementary Observability (NOT Replacements)

high freq

AWS X-Ray + AWS CloudTrail

X-Ray traces application-level request flows (what your code does). CloudTrail records AWS API calls (who did what to AWS resources). They serve completely different purposes and complement each other. A common exam scenario asks which service to use for application debugging vs. security auditing — X-Ray for app tracing, CloudTrail for API audit.

Event-Driven Tracing Correlation

medium freq

AWS X-Ray + Amazon EventBridge

In event-driven architectures built on EventBridge, X-Ray traces the producer service and the consumer Lambda functions independently. For end-to-end correlation across the EventBridge boundary, the trace context must travel with the event (for example, in the TraceHeader field of a PutEvents entry); otherwise the producer and consumer appear as disconnected traces. ADOT (OpenTelemetry) provides better support for this pattern.
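One way to carry trace context across the bus is to attach the producer's current X-Amzn-Trace-Id value to each event. The sketch builds a PutEvents entry that sets the `TraceHeader` field of EventBridge's `PutEventsRequestEntry` structure. The source, detail type, and helper name are hypothetical. The boto3 call appears only in a comment so the snippet runs without AWS credentials.

```python
import json
from typing import Optional


def build_event_entry(detail: dict, trace_header: Optional[str]) -> dict:
    """Build one PutEvents entry, attaching the trace header when one is available (hypothetical helper)."""
    entry = {
        "Source": "com.example.orders",   # hypothetical event source
        "DetailType": "OrderPlaced",      # hypothetical detail type
        "Detail": json.dumps(detail),
        "EventBusName": "default",
    }
    if trace_header:
        # Root/Parent/Sampled travel with the event so targets can join the same trace
        entry["TraceHeader"] = trace_header
    return entry

# With boto3 (not executed here):
#   boto3.client("events").put_events(Entries=[build_event_entry(detail, header)])
```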

Compliance and Configuration Audit (Separate Concerns)

medium freq

AWS X-Ray + AWS Config

AWS Config tracks resource configuration changes over time. X-Ray traces request flows through applications. They are NOT interchangeable. Exam questions sometimes present Config as an option for application tracing — it is not. Config is for configuration compliance; X-Ray is for performance/error tracing.

Modern Instrumentation with ADOT

medium freq

AWS X-Ray + AWS Distro for OpenTelemetry (ADOT)

ADOT is the AWS-supported, vendor-neutral alternative to X-Ray SDK for instrumentation. It can send traces to X-Ray, Prometheus, and other backends. AWS recommends ADOT for new workloads. ADOT Lambda Layer available for Lambda functions. Supports cross-account and cross-region tracing more naturally than native X-Ray SDK.

Service Limits & Quotas

Limit · Value · Note

Trace retention period

30 days

Candidates often assume X-Ray retains data as long as CloudWatch Logs (configurable). X-Ray retention is fixed at 30 days.

Default sampling rate

1 request per second (reservoir) plus 5% of additional requests (fixed rate)

Many candidates think X-Ray traces 100% of requests by default — it does NOT. Sampling is applied by default to control cost and overhead.

Maximum segment document size

64 KB

If a segment document exceeds 64 KB, X-Ray will reject it. Keep annotations and metadata lean.

Annotations per trace

50 annotations

Annotations (indexed, filterable) vs Metadata (not indexed, not filterable) is a critical distinction tested on DVA-C02 and DOP-C02.

X-Ray daemon UDP port

2000 UDP

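The X-Ray daemon listens on UDP port 2000. Application SDKs send segment data to the daemon, which batches it and forwards it to the X-Ray API over HTTPS. When the daemon runs on a separate host rather than locally or as a sidecar, security groups and network ACLs must allow this UDP traffic.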

Segment fields: annotation value max length

1,000 characters

Annotation values are capped at 1,000 characters. Use metadata for larger values, but remember metadata is not searchable.

GetTraceSummaries API — max results per call

Paginated (use NextToken)

Results are paginated. Always implement NextToken handling when querying large trace datasets programmatically.

Sampling rules per account

25 rules

You can define up to 25 custom sampling rules per account to fine-tune which requests are traced (by URL, HTTP method, service name, etc.).

Groups per account

25 groups

X-Ray Groups allow you to filter traces using filter expressions and set CloudWatch alarms on group metrics. Up to 25 groups per account.

Pricing Model

Pay-per-use based on traces recorded and retrieved

First 100,000 traces recorded per month are FREE; beyond that, charged per million traces recorded
First 1,000,000 traces retrieved (scanned) per month are FREE; beyond that, charged per million traces scanned
No upfront costs, no minimum fees — pure pay-as-you-go
Sampling significantly reduces cost — the default sampling rule means you typically only trace a fraction of total requests
X-Ray daemon itself has no additional charge; costs are based on trace data volume sent to X-Ray service
CloudWatch ServiceLens usage may incur separate CloudWatch charges for metrics and logs

Exam Tips

critical: SDK Instrumentation vs Service-Level Enabling

X-Ray requires SDK instrumentation in your application code — enabling X-Ray at the service level (API Gateway, Lambda) alone is NOT sufficient to trace your custom application logic. You must import and configure the X-Ray SDK in your code to create custom segments and subsegments.

critical: Annotations vs Metadata

Know the difference between Annotations and Metadata: Annotations are indexed key-value pairs (max 50 per trace, values up to 1,000 chars) that can be used in filter expressions to search traces. Metadata is non-indexed and cannot be used for searching. If a question asks how to search/filter traces by a custom business attribute, the answer is Annotations.

critical: ECS Sidecar Pattern and IAM Permissions

For ECS, the X-Ray daemon must run as a SIDECAR container in the same task definition. The application container communicates with the daemon over UDP port 2000. The task IAM role (not the instance profile) must have xray:PutTraceSegments and xray:PutTelemetryRecords permissions.
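The permissions can be expressed as an identity-based policy attached to the task role. The sketch below builds that JSON document. The two `Put*` actions come from the text above. The three sampling actions are an assumption worth verifying: they let the daemon fetch centralized sampling rules, and as far as I know they are also part of the AWS managed policy `AWSXRayDaemonWriteAccess`.

```python
import json

XRAY_WRITE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "SendTraceData",
            "Effect": "Allow",
            "Action": ["xray:PutTraceSegments", "xray:PutTelemetryRecords"],
            "Resource": "*",
        },
        {
            # Assumption: lets the daemon/SDK use centralized sampling rules
            "Sid": "ReadSamplingRules",
            "Effect": "Allow",
            "Action": [
                "xray:GetSamplingRules",
                "xray:GetSamplingTargets",
                "xray:GetSamplingStatisticSummaries",
            ],
            "Resource": "*",
        },
    ],
}

if __name__ == "__main__":
    print(json.dumps(XRAY_WRITE_POLICY, indent=2))
```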

critical: Lambda Active Tracing

Lambda with X-Ray: When Active Tracing is enabled, Lambda automatically runs the X-Ray daemon — you do NOT need to configure or run it yourself. However, you still need the X-Ray SDK in your function code to create custom subsegments and add annotations/metadata to traces.

critical: Default Sampling Behavior

Sampling is ON by default — X-Ray does NOT trace 100% of requests. The default rule is 1 request/second (reservoir) + 5% of additional requests. For exam questions about cost optimization or reducing overhead, sampling rules are the answer. For compliance requiring 100% tracing, you must configure a custom sampling rule with a fixed rate of 100%.
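The reservoir-plus-rate behaviour can be simulated in a few lines. This is a local, single-process sketch using the default values (a reservoir of 1 request per second, a fixed rate of 5%). With centralized sampling the X-Ray service divides the reservoir across all instances, which this ignores. The random draw is injectable so the behaviour can be tested deterministically.

```python
import math
import random
from typing import Callable, Optional


class ReservoirSampler:
    """Sample the first `reservoir` requests in each second, then a fixed fraction of the rest."""

    def __init__(self, reservoir: int = 1, fixed_rate: float = 0.05,
                 rand: Callable[[], float] = random.random):
        self.reservoir = reservoir
        self.fixed_rate = fixed_rate
        self.rand = rand
        self._second: Optional[int] = None
        self._used = 0

    def should_sample(self, now: float) -> bool:
        second = math.floor(now)
        if second != self._second:        # new second: refill the reservoir
            self._second = second
            self._used = 0
        if self._used < self.reservoir:   # guaranteed slot
            self._used += 1
            return True
        return self.rand() < self.fixed_rate  # "lottery" for everything else
```

At a steady 100 requests per second, the defaults trace on average 1 + 0.05 × 99 = 5.95 requests per second, about 6% of traffic. Setting `fixed_rate=1.0` traces every request.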

critical

X-Ray requires SDK instrumentation in application code — enabling the daemon or toggling X-Ray at the service level (API Gateway, Lambda console) is NOT enough to trace custom application logic. You must use the X-Ray SDK or ADOT in your code.

critical

CloudTrail ≠ X-Ray: CloudTrail audits AWS API calls (security/compliance); X-Ray traces application request flows (performance/debugging). They are complementary, never interchangeable. If the question asks about debugging microservices latency or errors, the answer is X-Ray.

critical

Annotations are indexed and searchable (use for filtering traces); Metadata is not indexed (use for rich context). Max 50 annotations per trace. This distinction appears directly in exam questions about finding traces by custom business attributes.

important: X-Ray Groups and CloudWatch Integration

X-Ray Groups allow you to create filtered subsets of traces using filter expressions, and you can configure CloudWatch alarms on the error rate, fault rate, and response time for each group. This is the pattern for alerting on specific service or endpoint degradation.

important: Trace Header Propagation

The X-Amzn-Trace-Id HTTP header carries the Trace ID, Parent ID, and Sampling decision across service boundaries. If your custom HTTP service does not read and forward this header, the trace chain breaks and you get disconnected traces. The X-Ray SDK handles this automatically for supported frameworks.
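For a custom HTTP service the SDK does not cover, forwarding the header means: parse the incoming value, keep `Root` and `Sampled`, and replace `Parent` with the ID of the segment or subsegment making the outgoing call. A minimal sketch; `parse_trace_header` and `forward_header` are hypothetical helpers, and the example IDs are illustrative.

```python
from typing import Dict, Optional


def parse_trace_header(value: str) -> Dict[str, str]:
    """Parse 'Root=...;Parent=...;Sampled=...' into a dict (unknown keys are kept)."""
    fields: Dict[str, str] = {}
    for part in value.split(";"):
        key, sep, val = part.strip().partition("=")
        if sep:
            fields[key] = val
    return fields


def forward_header(incoming: str, current_segment_id: str) -> Optional[str]:
    """Build the X-Amzn-Trace-Id value to send downstream, or None if no trace is present."""
    fields = parse_trace_header(incoming)
    if "Root" not in fields:
        return None  # caller should start a new trace instead
    fields["Parent"] = current_segment_id  # downstream work hangs off this service's segment
    ordered = ["Root", "Parent", "Sampled"]
    parts = [f"{k}={fields[k]}" for k in ordered if k in fields]
    parts += [f"{k}={v}" for k, v in fields.items() if k not in ordered]
    return ";".join(parts)
```

The returned string goes into the `X-Amzn-Trace-Id` header of the outgoing request. Keeping `Sampled` unchanged means every service honours the sampling decision made at the entry point.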

important: Error Classification

Faults vs Errors vs Throttles classification: 5xx responses = Faults (server-side problems); 4xx responses = Errors (client-side problems); 429 Too Many Requests = Throttle (a specific subset of Errors). The X-Ray service map color-codes nodes by these categories. Exam questions may ask which category a specific HTTP status falls into.

important: Elastic Beanstalk Integration

For Elastic Beanstalk, you can enable X-Ray via the console (Software configuration → X-Ray daemon) or via .ebextensions. The daemon is pre-installed — you just need to enable it AND instrument your application code with the SDK. This is the fastest path to add tracing to an existing Beanstalk app.

important: CloudWatch ServiceLens

CloudWatch ServiceLens is the unified observability feature that combines X-Ray service maps, CloudWatch metrics, and CloudWatch Logs. When an exam question asks about a single dashboard or console view combining traces, metrics, and logs for a microservices application, the answer is CloudWatch ServiceLens (powered by X-Ray).

important: Data Retention and Export

X-Ray trace data is retained for exactly 30 days — this is fixed and cannot be changed. If you need longer retention, use the GetTraceSummaries and BatchGetTraces APIs to export data to S3, then analyze with Athena. This is a common architecture question for compliance-driven organizations.
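The export step can be sketched as: collect trace IDs (for example from GetTraceSummaries), fetch full traces with BatchGetTraces in small batches, and write one JSON object per line. Athena can read that layout with a JSON SerDe once the file is in S3. The batch size of 5 reflects my understanding of the per-call trace ID limit of BatchGetTraces and is an assumption worth checking. The client is passed in (a boto3 `xray` client in practice), and the S3 upload is left out.

```python
import json
from typing import Iterable, Iterator, List, TextIO

BATCH_SIZE = 5  # assumption: per-call TraceIds limit for BatchGetTraces


def chunks(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield lists of at most `size` items."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def export_traces(client, trace_ids: Iterable[str], out: TextIO) -> int:
    """Write each trace as one JSON line; return the number of traces written."""
    written = 0
    for batch in chunks(trace_ids, BATCH_SIZE):
        kwargs = {"TraceIds": batch}
        while True:
            page = client.batch_get_traces(**kwargs)
            for trace in page.get("Traces", []):
                # Segment "Document" fields are JSON strings; decode them so Athena sees nested fields
                segments = [json.loads(s["Document"]) for s in trace.get("Segments", [])]
                out.write(json.dumps({"id": trace["Id"], "segments": segments}) + "\n")
                written += 1
            token = page.get("NextToken")
            if not token:
                break
            kwargs["NextToken"] = token
    return written
```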

Good to Know: AI/ML Observability Boundaries

For AI/ML workloads (AIF-C01 relevance): X-Ray is NOT used to monitor model performance, data drift, or inference quality for Bedrock or SageMaker models. Use SageMaker Model Monitor for model quality. X-Ray can trace the application code that calls Bedrock/SageMaker APIs as a downstream service, but it cannot inspect what the model does internally.

Common Misconceptions & Traps

Common Mistake

CloudTrail provides application-level request tracing and can replace X-Ray for debugging microservices latency issues.

Correct

CloudTrail records AWS API calls (control-plane actions) for security auditing — who called which AWS API, when, from where. It has NO visibility into application-level request flows, latency between your microservices, or custom business logic. X-Ray is the correct service for distributed application tracing.

This is the #1 misconception on exams. Remember: CloudTrail = WHO did WHAT to AWS (security audit log). X-Ray = HOW a request traveled through YOUR application (performance and debugging). They answer completely different questions and are complementary, not interchangeable.

Common Mistake

Enabling X-Ray tracing on API Gateway automatically traces the entire application flow including all downstream Lambda functions, databases, and external APIs.

Correct

Enabling tracing on API Gateway only creates an X-Ray segment for the API Gateway portion of the request. Each downstream service (Lambda, EC2, ECS) must independently have X-Ray enabled and SDK-instrumented to contribute to the same trace. The trace ID is propagated via the X-Amzn-Trace-Id header, but downstream services must be configured to use it.

Exam questions often describe a multi-tier app and ask why traces are incomplete. The answer is always that downstream services need their own X-Ray instrumentation. API Gateway tracing is not a magic 'trace everything' switch — it's just the entry point.

Common Mistake

The X-Ray daemon or Systems Manager Agent automatically instruments application code without any SDK changes.

Correct

The X-Ray daemon is a network proxy that collects UDP data from the X-Ray SDK and forwards it to the X-Ray service — it does NOT instrument your code. The SSM Agent has absolutely nothing to do with X-Ray. You MUST modify your application code to import and use the X-Ray SDK (or ADOT) to generate trace segments.

Candidates confuse 'installing the daemon' with 'enabling tracing.' The daemon is infrastructure; the SDK is what creates trace data. Without SDK instrumentation, the daemon has nothing to forward. SSM Agent appearing as a distractor answer is a known exam trap.

Common Mistake

X-Ray Annotations and Metadata are both searchable and can both be used in filter expressions to find specific traces.

Correct

ONLY Annotations are indexed and searchable via filter expressions (e.g., annotation.user_id = "12345"). Metadata is stored with the trace but is NOT indexed and CANNOT be used in filter expressions or GetTraceSummaries queries. Use Annotations when you need to search/filter; use Metadata for rich context that doesn't need to be queried.

This distinction appears directly in DVA-C02 and DOP-C02 questions. The limit of 50 annotations per trace forces you to be selective. A common scenario: 'How do you find all traces for a specific customer ID?' — Add customer ID as an Annotation, not Metadata.

Common Mistake

X-Ray automatically traces 100% of all requests, providing complete visibility into every transaction.

Correct

X-Ray uses sampling by default. The default sampling rule records 1 request per second (the reservoir) plus 5% of additional requests beyond the reservoir. This is intentional to reduce overhead and cost. You can customize sampling rules (up to 25 per account) or set a 100% fixed rate if complete tracing is required (e.g., for compliance), but this increases cost and overhead.

Exam questions test whether you understand the cost/completeness tradeoff. If a question asks why some requests don't appear in X-Ray, sampling is almost always the answer. If a question asks how to ensure every request is traced, the answer is a custom sampling rule with FixedRate set to 1.0 (100%).

Common Mistake

SageMaker Model Monitor or Amazon CloudWatch can be used to trace and debug Bedrock model invocations the same way X-Ray traces application requests.

Correct

SageMaker Model Monitor monitors data quality, model quality, bias drift, and feature attribution drift for SageMaker-hosted models — not Bedrock. CloudWatch monitors infrastructure metrics. X-Ray can trace the APPLICATION CODE that invokes Bedrock APIs (as a downstream HTTP call), but cannot inspect what happens inside the model. For Bedrock-specific observability, use CloudWatch metrics for Bedrock and application-level X-Ray tracing for the calling service.

This is a specific trap for AIF-C01 and DVA-C02 candidates working with generative AI. The key insight: X-Ray sees Bedrock as an external HTTP endpoint — it can measure latency and success/failure of the API call, but has no visibility into model internals, token usage details, or inference quality.

Memory Tricks

🧠

SAFE = Segments (work done by a service), Annotations (indexed, filterable), Faults (5xx), Errors (4xx) — the four core X-Ray concepts

🧠

DAM = Daemon collects, Annotations are indexed, Metadata is not — remember which one you can search

🧠

CATS = CloudTrail for API audit, X-Ray (Xray) for Application Tracing in Services — never confuse their purposes

🧠

The X-Ray daemon is like a MAILBOX: your SDK drops letters (segments) in it via UDP, and the daemon posts them to AWS. The mailbox doesn't write the letters — your code (SDK) does.

🧠

Reservoir + Rate = X-Ray sampling: think of the reservoir as a 'guaranteed slots' bucket (1/sec default) and the rate as a 'lottery' for the rest (5% default)

CertAI Tutor · SAA-C03, DOP-C02, SAP-C02, DVA-C02, DEA-C01, AIF-C01, CLF-C02 · 2026-02-22
