
Fully managed runtime for deploying, scaling, and orchestrating production-grade AI agents on AWS
Amazon Bedrock AgentCore is a fully managed service that provides a secure, scalable runtime environment for deploying and executing AI agents built with any framework. It handles the heavy lifting of containerized agent execution, session management, and invocation routing so developers can focus on agent logic rather than infrastructure. AgentCore supports both container-based and direct code deployments, streaming responses, asynchronous workloads, and WebSocket connections for real-time agent interactions.
Provide a production-ready, serverless execution environment for AI agents that scales automatically, enforces security boundaries, and integrates natively with the broader Amazon Bedrock ecosystem
Use When
Avoid When
Container-based agent deployment
Deploy agents as Docker images up to 2 GB; supports any framework or runtime
Direct code deployment
Deploy code directly without Docker; 250 MB compressed / 750 MB uncompressed limits apply
Agent versioning
Up to 1,000 versions per agent; inactive versions auto-deleted after 45 days
Endpoint aliases
Up to 10 endpoints per agent for blue/green deployments and environment isolation
Synchronous invocation
Request/response pattern with 15-minute maximum timeout and 100 MB payload limit
Streaming invocation
Server-sent events or WebSocket streaming up to 60 minutes; 10 MB chunk size
Asynchronous job execution
Fire-and-forget jobs running up to 8 hours for long-running agentic workflows
WebSocket bidirectional streaming
32 KB frame size, 250 frames/second per connection, 25 TPS invocation rate
Session management
Isolated session workloads per agent invocation with dedicated compute allocation
Per-endpoint throttling
25 TPS per endpoint; use multiple endpoints to scale throughput horizontally
AWS IAM integration
Fine-grained access control for agent invocation and management APIs
Amazon VPC support
Deploy agents within a VPC for network isolation and private connectivity
Service Quotas integration
Most limits are adjustable via AWS Service Quotas console or API
Multi-region deployment
Available in multiple regions; session limits differ (1,000 in us-east-1/us-west-2, 500 elsewhere)
Custom hardware allocation
Hardware is fixed at 2 vCPU / 8 GB per session — not configurable or adjustable
Foundation Model-Powered Agent Runtime
High freq: AgentCore hosts the agent orchestration logic while Amazon Bedrock InvokeModel/Converse APIs provide the underlying LLM inference; AgentCore manages sessions and routing while Bedrock handles model calls
Large Payload Staging
High freq: When agent inputs or outputs exceed practical inline sizes, S3 pre-signed URLs are used to stage data; the agent runtime retrieves/stores data from S3 rather than passing large payloads directly through the 100 MB invocation limit
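The staging decision above can be sketched as a simple size check. This is an illustrative helper, not an AWS API: the 100 MB inline limit comes from the quotas in this document, while the function name and the safety-margin choice are assumptions.

```python
# Sketch: decide whether to send a payload inline or stage it in S3
# behind a pre-signed URL. The 100 MB cap is AgentCore's invocation
# payload limit; the margin leaves headroom for envelope overhead.

INLINE_LIMIT_BYTES = 100 * 1024 * 1024  # 100 MB invocation payload cap

def should_stage_in_s3(payload_size_bytes: int, margin: float = 0.9) -> bool:
    """Return True when the payload should be staged via S3 instead of
    being passed inline through the invocation API."""
    return payload_size_bytes > INLINE_LIMIT_BYTES * margin

print(should_stage_in_s3(5 * 1024 * 1024))   # small payload -> False
print(should_stage_in_s3(95 * 1024 * 1024))  # near the cap -> True
```

In production the `True` branch would upload the payload to S3 and pass the agent a pre-signed URL (e.g. via `boto3`'s `generate_presigned_url`) instead of the raw bytes.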
Agent Tool/Function Execution
High freq: AgentCore hosts the agent reasoning loop while Lambda functions serve as tools the agent can invoke for specific actions (API calls, data transformation, business logic); Lambda handles short-duration tool execution while AgentCore manages the long-running orchestration
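The tool-dispatch pattern described above can be sketched with a small registry. Local callables stand in for Lambda-backed tools here; the registry, tool names, and return shapes are all illustrative assumptions, not an AgentCore or Lambda API.

```python
# Sketch: the agent's reasoning loop picks a tool by name; in production
# each tool would be a Lambda function invoked via boto3, but local
# functions stand in here. All names are illustrative.

def lookup_order(order_id: str) -> dict:
    # Stand-in for a Lambda-backed tool call.
    return {"order_id": order_id, "status": "shipped"}

TOOLS = {"lookup_order": lookup_order}

def dispatch_tool(name: str, **kwargs) -> dict:
    """Route a tool call selected by the agent's reasoning loop."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)

print(dispatch_tool("lookup_order", order_id="A-42"))
```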
Agent Session State Persistence
Medium freq: DynamoDB stores conversation history and agent state across sessions; AgentCore provides the compute runtime while DynamoDB provides durable, low-latency state storage for multi-turn agent interactions
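A minimal sketch of what that persisted state could look like, using DynamoDB's low-level attribute-value format. The table schema, attribute names, and helper are assumptions for illustration; AgentCore does not mandate this shape.

```python
import json
import time

# Sketch: build a DynamoDB item keyed by session id, with the
# conversation history serialized as JSON. Attribute names are
# illustrative; pass the result to a DynamoDB PutItem call.

def build_session_item(session_id: str, turn: int, history: list) -> dict:
    return {
        "sessionId": {"S": session_id},
        "turn": {"N": str(turn)},
        "history": {"S": json.dumps(history)},
        "updatedAt": {"N": str(int(time.time()))},
    }

item = build_session_item("sess-123", 2, [{"role": "user", "content": "hi"}])
print(item["sessionId"])  # {'S': 'sess-123'}
```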
Agent Observability and Monitoring
Medium freq: CloudWatch Logs captures agent execution traces, invocation metrics, and error rates; CloudWatch Alarms trigger on throttling events or session limit breaches to enable proactive scaling via quota increase requests
Container Image Registry for Agent Deployment
Medium freq: Docker images up to 2 GB are stored in ECR and referenced during AgentCore agent creation; ECR lifecycle policies help manage image versions aligned with AgentCore's 1,000 version limit per agent
Public Agent Endpoint Facade
Medium freq: API Gateway provides a public-facing REST/WebSocket endpoint with authentication, rate limiting, and request validation; it proxies requests to AgentCore's private invocation endpoints, adding an additional security and throttling layer
Multi-Agent Workflow Orchestration
Medium freq: Step Functions orchestrates sequences of AgentCore agent invocations for complex multi-agent pipelines; each state in the workflow invokes a specialized agent, with Step Functions handling retry logic, error handling, and state passing between agents
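The sequential pipeline shape can be sketched as follows. Local functions stand in for AgentCore agent invocations that Step Functions Task states would make; the stage names and payloads are illustrative assumptions.

```python
# Sketch: thread state through a sequence of specialized agents, the way
# a Step Functions state machine passes one Task state's output to the
# next Task state's input. Stages are local stand-ins for agent calls.

def run_pipeline(stages, payload: dict) -> dict:
    for stage in stages:
        payload = stage(payload)
    return payload

def classify(state):  # stand-in for a "classifier" agent
    return {**state, "category": "billing"}

def answer(state):    # stand-in for an "answering" agent
    return {**state, "reply": f"routed to {state['category']} team"}

print(run_pipeline([classify, answer], {"question": "refund?"}))
```

In the real architecture, Step Functions would also own the retry and error-handling policy for each stage, which this sketch omits.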
Memorize the three duration limits and their hierarchy: synchronous requests = 15 minutes (NOT adjustable), streaming sessions = 60 minutes (NOT adjustable), async jobs = 8 hours (NOT adjustable). Exam questions will describe a workload duration and ask which invocation mode to use.
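The duration hierarchy above reduces to a simple lookup. The three timeouts are from this document; the function itself is an illustrative study aid, not an AWS API.

```python
# Sketch: map an estimated task duration to the AgentCore invocation
# mode, using the three non-adjustable timeouts (15 min sync, 60 min
# streaming, 8 h async). Helper name is illustrative.

SYNC_MAX_S = 15 * 60        # synchronous: 15 minutes
STREAM_MAX_S = 60 * 60      # streaming: 60 minutes
ASYNC_MAX_S = 8 * 60 * 60   # asynchronous: 8 hours

def pick_invocation_mode(estimated_seconds: int) -> str:
    if estimated_seconds <= SYNC_MAX_S:
        return "synchronous"
    if estimated_seconds <= STREAM_MAX_S:
        return "streaming"
    if estimated_seconds <= ASYNC_MAX_S:
        return "asynchronous"
    return "not-a-fit"  # beyond 8 hours, AgentCore cannot host the task

print(pick_invocation_mode(10 * 60))   # synchronous
print(pick_invocation_mode(45 * 60))   # streaming
print(pick_invocation_mode(4 * 3600))  # asynchronous
```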
The 2 vCPU / 8 GB hardware limit per session is NOT adjustable — this is the single most important architectural constraint. Any exam scenario requiring more compute per agent session means AgentCore is the WRONG service; use ECS or EKS instead.
Active session limits differ by region: 1,000 in us-east-1 and us-west-2, but only 500 in all other regions. Multi-region architecture questions must account for this capacity difference — a global deployment cannot assume uniform session capacity.
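For multi-region capacity planning, the region split above can be modeled directly. The quota values come from this document; the helper is an illustrative sketch.

```python
# Sketch: region-dependent active-session quota (1,000 in us-east-1 and
# us-west-2, 500 elsewhere), usable for rough global capacity sums.

HIGH_CAPACITY_REGIONS = {"us-east-1", "us-west-2"}

def session_quota(region: str) -> int:
    return 1000 if region in HIGH_CAPACITY_REGIONS else 500

regions = ["us-east-1", "us-west-2", "eu-west-1"]
print(sum(session_quota(r) for r in regions))  # 2500, not 3000
```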
The 2 vCPU / 8 GB hardware limit per session is NOT adjustable and is the definitive signal that a workload exceeds AgentCore's capabilities — redirect to ECS or EKS for compute-intensive agents
Three invocation modes, three timeouts: Sync = 15 min, Streaming = 60 min, Async = 8 hours — match the workload duration to the correct invocation mode in every architecture scenario
AgentCore (custom agent runtime) ≠ Amazon Bedrock Agents (managed declarative agent builder) — always distinguish these two services when answering service selection questions on AIF-C01
Inactive agent versions are automatically deleted after 45 days. If a scenario involves regulatory compliance, rollback requirements, or audit trails for agent versions, you must implement an external archival strategy (e.g., store version metadata in S3 or DynamoDB) before versions are purged.
Control-plane APIs (Create/Update/Delete) are throttled at 5 TPS while data-plane APIs (Invoke) run at 25 TPS and read APIs (Get) run at 50 TPS. This 5/25/50 pattern is a testable API rate hierarchy — always recommend exponential backoff for control-plane operations in automation scripts.
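A minimal sketch of the recommended backoff for throttled control-plane calls. The "full jitter" strategy is a common AWS retry pattern; the base and cap values here are illustrative choices, not documented AgentCore defaults.

```python
import random

# Sketch: exponential backoff with full jitter for retrying throttled
# control-plane (Create/Update/Delete) calls. Each delay is drawn
# uniformly from [0, min(cap, base * 2**attempt)].

def backoff_delays(attempts, base=0.5, cap=30.0, rng=None):
    """Return one randomized delay (seconds) per retry attempt."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** i)) for i in range(attempts)]

delays = backoff_delays(5, rng=random.Random(42))
print([round(d, 2) for d in delays])
```

In practice each delay would be slept before reissuing the API call, stopping as soon as a call succeeds.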
Container deployments create new sessions at 100 TPM (transactions per minute) per endpoint, while direct code deployments create sessions at 25 TPS (transactions per second). Direct code deployment is ~15x faster for session creation — choose it when rapid cold-start is critical.
The 10 endpoints (aliases) per agent limit is adjustable and enables horizontal throughput scaling. With 10 endpoints each supporting 25 TPS, a single agent can effectively handle 250 TPS total by distributing load across aliases. This is the correct answer when asked how to scale beyond 25 TPS for one agent.
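The horizontal-scaling idea above can be sketched as a round-robin selector over endpoint aliases. The rotator class and endpoint names are illustrative assumptions; only the 25 TPS-per-endpoint figure comes from this document.

```python
import itertools

# Sketch: spread invocations across an agent's endpoint aliases so
# aggregate throughput is roughly 25 TPS * number of endpoints.

class EndpointRotator:
    """Cycle through endpoint aliases to spread load evenly."""
    def __init__(self, endpoints):
        self._cycle = itertools.cycle(endpoints)

    def next_endpoint(self) -> str:
        return next(self._cycle)

rotator = EndpointRotator(["prod-a", "prod-b", "prod-c"])
print([rotator.next_endpoint() for _ in range(4)])
# ['prod-a', 'prod-b', 'prod-c', 'prod-a']
```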
For the AIF-C01 exam, AgentCore is primarily tested in the context of responsible AI deployment, agent lifecycle management, and choosing appropriate AWS services for agentic workloads. Focus on understanding WHEN to use AgentCore vs Lambda vs ECS rather than memorizing every API rate.
WebSocket frame size (32 KB) and streaming chunk size (10 MB) are completely different limits for different protocols. WebSocket frames are tiny and numerous (250/sec max), while streaming chunks are large and infrequent. Do not confuse these in questions about real-time agent communication.
Common Mistake
All Amazon Bedrock AgentCore limits are hard limits that cannot be changed
Correct
Most limits are adjustable via AWS Service Quotas — including active sessions, total agents, versions, endpoints, invocation rates, and API throttle limits. Only hardware per session (2 vCPU / 8 GB), Docker image size (2 GB), code deployment sizes, request timeout, payload size, streaming limits, and WebSocket frame limits are NOT adjustable.
Exam questions often describe a scenario hitting a limit and ask what to do — the correct answer is usually 'request a quota increase' for adjustable limits, but 'redesign the architecture' for non-adjustable ones. Knowing which is which is critical.
Common Mistake
Amazon Bedrock AgentCore is just another name for Amazon Bedrock Agents (the managed agent service)
Correct
AgentCore is a distinct service that provides a runtime execution environment for custom-built agents (any framework, any code). Amazon Bedrock Agents is a separate managed service for building agents declaratively with Bedrock-native constructs. AgentCore gives you full control over agent code; Bedrock Agents is higher-level and more opinionated.
This is the #1 conceptual confusion for this service. AIF-C01 tests whether candidates understand the AWS AI agent service landscape — confusing these two leads to wrong service selection answers.
Common Mistake
The 15-minute request timeout in AgentCore is the maximum time any agent task can run
Correct
The 15-minute limit applies only to synchronous invocations. Streaming sessions can run for up to 60 minutes, and asynchronous jobs can run for up to 8 hours. Long-running agentic tasks should use async invocation mode, not synchronous.
Candidates familiar with Lambda's 15-minute limit assume the same applies universally to AgentCore. The existence of three different invocation modes with three different timeouts is a key differentiator that appears in architecture scenario questions.
Common Mistake
You need to manage servers or containers manually to run agents in AgentCore
Correct
AgentCore is fully managed — you provide the code or container image, and AWS handles all infrastructure provisioning, scaling, session isolation, and compute management. You never interact with the underlying servers.
The word 'container' in the deployment options makes candidates think they need to manage ECS or EKS. AgentCore abstracts all of that — it's serverless from the user's perspective, similar to how Lambda handles function execution.
Common Mistake
Agent versions in AgentCore are retained indefinitely unless manually deleted
Correct
Inactive agent versions are automatically deleted after 45 days. If you need to retain version metadata for compliance, auditing, or rollback purposes beyond 45 days, you must implement an external archival strategy before versions are purged.
This catches candidates who assume AWS services retain data forever by default. The 45-day auto-deletion is a compliance and operational risk that must be proactively managed in production environments.
Common Mistake
The 100 MB maximum payload size in AgentCore is the same as the streaming chunk size limit
Correct
These are two completely different limits: 100 MB is the maximum total payload for a single invocation, while 10 MB is the maximum size of each individual streaming chunk. A streaming response can deliver up to 100 MB total, but must do so in chunks no larger than 10 MB each.
Conflating these two limits leads to incorrect answers about streaming architecture design. Understanding that large payloads can be streamed in chunks is key to designing scalable agent response patterns.
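The relationship between the two limits can be made concrete with a chunking sketch: a response up to the 100 MB payload cap is delivered as chunks no larger than the 10 MB chunk cap. The chunker is illustrative, not an AgentCore API.

```python
# Sketch: split a large response body into streaming chunks of at most
# 10 MB each (the per-chunk cap), while the total stays under the
# 100 MB payload cap.

CHUNK_MAX = 10 * 1024 * 1024  # 10 MB streaming chunk cap

def chunk_payload(payload: bytes, chunk_size: int = CHUNK_MAX):
    """Yield successive chunks of at most chunk_size bytes."""
    for offset in range(0, len(payload), chunk_size):
        yield payload[offset:offset + chunk_size]

payload = b"x" * (25 * 1024 * 1024)  # 25 MB body
sizes = [len(c) for c in chunk_payload(payload)]
print(sizes)  # [10485760, 10485760, 5242880]
```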
Duration Ladder — 15 min (sync), 60 min (stream), 8 hours (async): 'Sync Sprints, Streams Stroll, Async Ambles All Day'
Hardware is FIXED at '2 and 8' — 2 vCPU, 8 GB RAM. Think '2 wheels, 8 cylinders — you can't add more to this engine'
Control/Data/Read API rates follow the 5/25/50 pattern: 'Five to Write, Twenty-Five to Invoke, Fifty to Read'
Region session limits: 'East and West get the BEST (1,000), all the REST get less (500)'
CertAI Tutor · AIF-C01 · 2026-03-09