
Fully managed runtime for deploying, scaling, and orchestrating production-grade AI agents on AWS
Amazon Bedrock AgentCore is a fully managed service that provides a secure, scalable runtime environment for deploying and executing AI agents built with any framework. It handles the heavy lifting of containerized agent execution, session management, and invocation routing so developers can focus on agent logic rather than infrastructure. AgentCore supports both container-based and direct code deployments, streaming responses, asynchronous workloads, and WebSocket connections for real-time agent interactions.
Provide a production-ready, serverless execution environment for AI agents that scales automatically, enforces security boundaries, and integrates natively with the broader Amazon Bedrock ecosystem
Use When
Avoid When
Container-based agent deployment
Deploy agents as Docker images up to 2 GB; supports any framework or runtime
Direct code deployment
Deploy code directly without Docker; 250 MB compressed / 750 MB uncompressed limits apply
Agent versioning
Up to 1,000 versions per agent; inactive versions auto-deleted after 45 days
Endpoint aliases
Up to 10 endpoints per agent for blue/green deployments and environment isolation
Synchronous invocation
Request/response pattern with 15-minute maximum timeout and 100 MB payload limit
Streaming invocation
Server-sent events or WebSocket streaming up to 60 minutes; 10 MB chunk size
Asynchronous job execution
Fire-and-forget jobs running up to 8 hours for long-running agentic workflows
WebSocket bidirectional streaming
32 KB frame size, 250 frames/second per connection, 25 TPS invocation rate
Session management
Isolated session workloads per agent invocation with dedicated compute allocation
Per-endpoint throttling
25 TPS per endpoint; use multiple endpoints to scale throughput horizontally
AWS IAM integration
Fine-grained access control for agent invocation and management APIs
Amazon VPC support
Deploy agents within a VPC for network isolation and private connectivity
Service Quotas integration
Most limits are adjustable via AWS Service Quotas console or API
Multi-region deployment
Available in multiple regions; session limits differ (1,000 in us-east-1/us-west-2, 500 elsewhere)
Custom hardware allocation
Hardware is fixed at 2 vCPU / 8 GB per session — not configurable or adjustable
Foundation Model-Powered Agent Runtime
High freq: AgentCore hosts the agent orchestration logic while Amazon Bedrock InvokeModel/Converse APIs provide the underlying LLM inference; AgentCore manages sessions and routing while Bedrock handles model calls
Large Payload Staging
High freq: When agent inputs or outputs exceed practical inline sizes, S3 pre-signed URLs are used to stage data; the agent runtime retrieves/stores data from S3 rather than passing large payloads directly through the 100 MB invocation limit
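The staging decision above can be sketched as a simple size check. This is an illustrative helper, not an AWS API: the 100 MB inline limit comes from the quotas in this document, while the function name and the safety-margin choice are assumptions.

```python
# Sketch: decide whether to send a payload inline or stage it in S3
# behind a pre-signed URL. The 100 MB cap is AgentCore's invocation
# payload limit; the margin leaves headroom for envelope overhead.

INLINE_LIMIT_BYTES = 100 * 1024 * 1024  # 100 MB invocation payload cap

def should_stage_in_s3(payload_size_bytes: int, margin: float = 0.9) -> bool:
    """Return True when the payload should be staged via S3 instead of
    being passed inline through the invocation API."""
    return payload_size_bytes > INLINE_LIMIT_BYTES * margin

print(should_stage_in_s3(5 * 1024 * 1024))   # small payload -> False
print(should_stage_in_s3(95 * 1024 * 1024))  # near the cap -> True
```

In production the `True` branch would upload the payload to S3 and pass the agent a pre-signed URL (e.g. via `boto3`'s `generate_presigned_url`) instead of the raw bytes.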
Agent Tool/Function Execution
High freq: AgentCore hosts the agent reasoning loop while Lambda functions serve as tools the agent can invoke for specific actions (API calls, data transformation, business logic); Lambda handles short-duration tool execution while AgentCore manages the long-running orchestration
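The tool-dispatch pattern described above can be sketched with a small registry. Local callables stand in for Lambda-backed tools here; the registry, tool names, and return shapes are all illustrative assumptions, not an AgentCore or Lambda API.

```python
# Sketch: the agent's reasoning loop picks a tool by name; in production
# each tool would be a Lambda function invoked via boto3, but local
# functions stand in here. All names are illustrative.

def lookup_order(order_id: str) -> dict:
    # Stand-in for a Lambda-backed tool call.
    return {"order_id": order_id, "status": "shipped"}

TOOLS = {"lookup_order": lookup_order}

def dispatch_tool(name: str, **kwargs) -> dict:
    """Route a tool call selected by the agent's reasoning loop."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)

print(dispatch_tool("lookup_order", order_id="A-42"))
```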
Agent Session State Persistence
Medium freq: DynamoDB stores conversation history and agent state across sessions; AgentCore provides the compute runtime while DynamoDB provides durable, low-latency state storage for multi-turn agent interactions
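A minimal sketch of what that persisted state could look like, using DynamoDB's low-level attribute-value format. The table schema, attribute names, and helper are assumptions for illustration; AgentCore does not mandate this shape.

```python
import json
import time

# Sketch: build a DynamoDB item keyed by session id, with the
# conversation history serialized as JSON. Attribute names are
# illustrative; pass the result to a DynamoDB PutItem call.

def build_session_item(session_id: str, turn: int, history: list) -> dict:
    return {
        "sessionId": {"S": session_id},
        "turn": {"N": str(turn)},
        "history": {"S": json.dumps(history)},
        "updatedAt": {"N": str(int(time.time()))},
    }

item = build_session_item("sess-123", 2, [{"role": "user", "content": "hi"}])
print(item["sessionId"])  # {'S': 'sess-123'}
```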
Agent Observability and Monitoring
Medium freq: CloudWatch Logs captures agent execution traces, invocation metrics, and error rates; CloudWatch Alarms trigger on throttling events or session limit breaches to enable proactive scaling via quota increase requests
Container Image Registry for Agent Deployment
Medium freq: Docker images up to 2 GB are stored in ECR and referenced during AgentCore agent creation; ECR lifecycle policies help manage image versions aligned with AgentCore's 1,000 version limit per agent
Public Agent Endpoint Facade
Medium freq: API Gateway provides a public-facing REST/WebSocket endpoint with authentication, rate limiting, and request validation; it proxies requests to AgentCore's private invocation endpoints, adding an additional security and throttling layer
Multi-Agent Workflow Orchestration
Medium freq: Step Functions orchestrates sequences of AgentCore agent invocations for complex multi-agent pipelines; each state in the workflow invokes a specialized agent, with Step Functions handling retry logic, error handling, and state passing between agents
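The sequential pipeline shape can be sketched as follows. Local functions stand in for AgentCore agent invocations that Step Functions Task states would make; the stage names and payloads are illustrative assumptions.

```python
# Sketch: thread state through a sequence of specialized agents, the way
# a Step Functions state machine passes one Task state's output to the
# next Task state's input. Stages are local stand-ins for agent calls.

def run_pipeline(stages, payload: dict) -> dict:
    for stage in stages:
        payload = stage(payload)
    return payload

def classify(state):  # stand-in for a "classifier" agent
    return {**state, "category": "billing"}

def answer(state):    # stand-in for an "answering" agent
    return {**state, "reply": f"routed to {state['category']} team"}

print(run_pipeline([classify, answer], {"question": "refund?"}))
```

In the real architecture, Step Functions would also own the retry and error-handling policy for each stage, which this sketch omits.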
Memorize the three duration limits and their hierarchy: synchronous requests = 15 minutes (NOT adjustable), streaming sessions = 60 minutes (NOT adjustable), async jobs = 8 hours (NOT adjustable). Exam questions will describe a workload duration and ask which invocation mode to use.
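The duration hierarchy above reduces to a simple lookup. The three timeouts are from this document; the function itself is an illustrative study aid, not an AWS API.

```python
# Sketch: map an estimated task duration to the AgentCore invocation
# mode, using the three non-adjustable timeouts (15 min sync, 60 min
# streaming, 8 h async). Helper name is illustrative.

SYNC_MAX_S = 15 * 60        # synchronous: 15 minutes
STREAM_MAX_S = 60 * 60      # streaming: 60 minutes
ASYNC_MAX_S = 8 * 60 * 60   # asynchronous: 8 hours

def pick_invocation_mode(estimated_seconds: int) -> str:
    if estimated_seconds <= SYNC_MAX_S:
        return "synchronous"
    if estimated_seconds <= STREAM_MAX_S:
        return "streaming"
    if estimated_seconds <= ASYNC_MAX_S:
        return "asynchronous"
    return "not-a-fit"  # beyond 8 hours, AgentCore cannot host the task

print(pick_invocation_mode(10 * 60))   # synchronous
print(pick_invocation_mode(45 * 60))   # streaming
print(pick_invocation_mode(4 * 3600))  # asynchronous
```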
The 2 vCPU / 8 GB hardware limit per session is NOT adjustable — this is the single most important architectural constraint. Any exam scenario requiring more compute per agent session means AgentCore is the WRONG service; use ECS or EKS instead.
Active session limits differ by region: 1,000 in us-east-1 and us-west-2, but only 500 in all other regions. Multi-region architecture questions must account for this capacity difference — a global deployment cannot assume uniform session capacity.
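For multi-region capacity planning, the region split above can be modeled directly. The quota values come from this document; the helper is an illustrative sketch.

```python
# Sketch: region-dependent active-session quota (1,000 in us-east-1 and
# us-west-2, 500 elsewhere), usable for rough global capacity sums.

HIGH_CAPACITY_REGIONS = {"us-east-1", "us-west-2"}

def session_quota(region: str) -> int:
    return 1000 if region in HIGH_CAPACITY_REGIONS else 500

regions = ["us-east-1", "us-west-2", "eu-west-1"]
print(sum(session_quota(r) for r in regions))  # 2500, not 3000
```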
The 2 vCPU / 8 GB hardware limit per session is NOT adjustable and is the definitive signal that a workload exceeds AgentCore's capabilities — redirect to ECS or EKS for compute-intensive agents
Three invocation modes, three timeouts: Sync = 15 min, Streaming = 60 min, Async = 8 hours — match the workload duration to the correct invocation mode in every architecture scenario
AgentCore (custom agent runtime) ≠ Amazon Bedrock Agents (managed declarative agent builder) — always distinguish these two services when answering service selection questions on AIF-C01
Inactive agent versions are automatically deleted after 45 days. If a scenario involves regulatory compliance, rollback requirements, or audit trails for agent versions, you must implement an external archival strategy (e.g., store version metadata in S3 or DynamoDB) before versions are purged.
Control-plane APIs (Create/Update/Delete) are throttled at 5 TPS while data-plane APIs (Invoke) run at 25 TPS and read APIs (Get) run at 50 TPS. This 5/25/50 pattern is a testable API rate hierarchy — always recommend exponential backoff for control-plane operations in automation scripts.
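A minimal sketch of the recommended backoff for throttled control-plane calls. The "full jitter" strategy is a common AWS retry pattern; the base and cap values here are illustrative choices, not documented AgentCore defaults.

```python
import random

# Sketch: exponential backoff with full jitter for retrying throttled
# control-plane (Create/Update/Delete) calls. Each delay is drawn
# uniformly from [0, min(cap, base * 2**attempt)].

def backoff_delays(attempts, base=0.5, cap=30.0, rng=None):
    """Return one randomized delay (seconds) per retry attempt."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** i)) for i in range(attempts)]

delays = backoff_delays(5, rng=random.Random(42))
print([round(d, 2) for d in delays])
```

In practice each delay would be slept before reissuing the API call, stopping as soon as a call succeeds.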
Container deployments create new sessions at 100 TPM (transactions per minute) per endpoint, while direct code deployments create sessions at 25 TPS (transactions per second). Direct code deployment is ~15x faster for session creation — choose it when rapid cold-start is critical.
The 10 endpoints (aliases) per agent limit is adjustable and enables horizontal throughput scaling. With 10 endpoints each supporting 25 TPS, a single agent can effectively handle 250 TPS total by distributing load across aliases. This is the correct answer when asked how to scale beyond 25 TPS for one agent.
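The horizontal-scaling idea above can be sketched as a round-robin selector over endpoint aliases. The rotator class and endpoint names are illustrative assumptions; only the 25 TPS-per-endpoint figure comes from this document.

```python
import itertools

# Sketch: spread invocations across an agent's endpoint aliases so
# aggregate throughput is roughly 25 TPS * number of endpoints.

class EndpointRotator:
    """Cycle through endpoint aliases to spread load evenly."""
    def __init__(self, endpoints):
        self._cycle = itertools.cycle(endpoints)

    def next_endpoint(self) -> str:
        return next(self._cycle)

rotator = EndpointRotator(["prod-a", "prod-b", "prod-c"])
print([rotator.next_endpoint() for _ in range(4)])
# ['prod-a', 'prod-b', 'prod-c', 'prod-a']
```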
For the AIF-C01 exam, AgentCore is primarily tested in the context of responsible AI deployment, agent lifecycle management, and choosing appropriate AWS services for agentic workloads. Focus on understanding WHEN to use AgentCore vs Lambda vs ECS rather than memorizing every API rate.
WebSocket frame size (32 KB) and streaming chunk size (10 MB) are completely different limits for different protocols. WebSocket frames are tiny and numerous (250/sec max), while streaming chunks are large and infrequent. Do not confuse these in questions about real-time agent communication.
Common Mistake
All Amazon Bedrock AgentCore limits are hard limits that cannot be changed
Correct
Most limits are adjustable via AWS Service Quotas — including active sessions, total agents, versions, endpoints, invocation rates, and API throttle limits. Only hardware per session (2 vCPU / 8 GB), Docker image size (2 GB), code deployment sizes, request timeout, payload size, streaming limits, and WebSocket frame limits are NOT adjustable.
Exam questions often describe a scenario hitting a limit and ask what to do — the correct answer is usually 'request a quota increase' for adjustable limits, but 'redesign the architecture' for non-adjustable ones. Knowing which is which is critical.
Common Mistake
Amazon Bedrock AgentCore is just another name for Amazon Bedrock Agents (the managed agent service)
Correct
AgentCore is a distinct service that provides a runtime execution environment for custom-built agents (any framework, any code). Amazon Bedrock Agents is a separate managed service for building agents declaratively with Bedrock-native constructs. AgentCore gives you full control over agent code; Bedrock Agents is higher-level and more opinionated.
This is the #1 conceptual confusion for this service. AIF-C01 tests whether candidates understand the AWS AI agent service landscape — confusing these two leads to wrong service selection answers.
Common Mistake
The 15-minute request timeout in AgentCore is the maximum time any agent task can run
Correct
The 15-minute limit applies only to synchronous invocations. Streaming sessions can run for up to 60 minutes, and asynchronous jobs can run for up to 8 hours. Long-running agentic tasks should use async invocation mode, not synchronous.
Candidates familiar with Lambda's 15-minute limit assume the same applies universally to AgentCore. The existence of three different invocation modes with three different timeouts is a key differentiator that appears in architecture scenario questions.
Common Mistake
You need to manage servers or containers manually to run agents in AgentCore
Correct
AgentCore is fully managed — you provide the code or container image, and AWS handles all infrastructure provisioning, scaling, session isolation, and compute management. You never interact with the underlying servers.
The word 'container' in the deployment options makes candidates think they need to manage ECS or EKS. AgentCore abstracts all of that — it's serverless from the user's perspective, similar to how Lambda handles function execution.
Common Mistake
Agent versions in AgentCore are retained indefinitely unless manually deleted
Correct
Inactive agent versions are automatically deleted after 45 days. If you need to retain version metadata for compliance, auditing, or rollback purposes beyond 45 days, you must implement an external archival strategy before versions are purged.
This catches candidates who assume AWS services retain data forever by default. The 45-day auto-deletion is a compliance and operational risk that must be proactively managed in production environments.
Common Mistake
The 100 MB maximum payload size in AgentCore is the same as the streaming chunk size limit
Correct
These are two completely different limits: 100 MB is the maximum total payload for a single invocation, while 10 MB is the maximum size of each individual streaming chunk. A streaming response can deliver up to 100 MB total, but must do so in chunks no larger than 10 MB each.
Conflating these two limits leads to incorrect answers about streaming architecture design. Understanding that large payloads can be streamed in chunks is key to designing scalable agent response patterns.
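The relationship between the two limits can be made concrete with a chunking sketch: a response up to the 100 MB payload cap is delivered as chunks no larger than the 10 MB chunk cap. The chunker is illustrative, not an AgentCore API.

```python
# Sketch: split a large response body into streaming chunks of at most
# 10 MB each (the per-chunk cap), while the total stays under the
# 100 MB payload cap.

CHUNK_MAX = 10 * 1024 * 1024  # 10 MB streaming chunk cap

def chunk_payload(payload: bytes, chunk_size: int = CHUNK_MAX):
    """Yield successive chunks of at most chunk_size bytes."""
    for offset in range(0, len(payload), chunk_size):
        yield payload[offset:offset + chunk_size]

payload = b"x" * (25 * 1024 * 1024)  # 25 MB body
sizes = [len(c) for c in chunk_payload(payload)]
print(sizes)  # [10485760, 10485760, 5242880]
```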
Duration Ladder — 15 min (sync), 60 min (stream), 8 hours (async): 'Sync Sprints, Streams Stroll, Async Ambles All Day'
Hardware is FIXED at '2 and 8' — 2 vCPU, 8 GB RAM. Think '2 wheels, 8 cylinders — you can't add more to this engine'
Control/Data/Read API rates follow the 5/25/50 pattern: 'Five to Write, Twenty-Five to Invoke, Fifty to Read'
Region session limits: 'East and West get the BEST (1,000), all the REST get less (500)'
CertAI Tutor · AIF-C01 · 2026-03-09