
Fully managed access to leading foundation models — build, customize, and deploy generative AI without managing infrastructure
Amazon Bedrock is a fully managed service that provides API-based access to high-performing foundation models (FMs) from Amazon and leading AI companies such as Anthropic, Meta, Mistral, Cohere, and Stability AI. It enables developers to build and scale generative AI applications using features like Agents, Knowledge Bases, Guardrails, and fine-tuning — all without provisioning or managing any ML infrastructure. Bedrock integrates natively with AWS security services to ensure enterprise-grade data privacy, with your data never used to train the underlying models.
Enable enterprises to rapidly build production-grade generative AI applications using best-in-class foundation models via a single, secure, serverless API — with built-in customization, retrieval-augmented generation (RAG), orchestration, and responsible AI controls
On-Demand Inference (pay-per-token)
Default mode; no commitment; throughput not guaranteed under load
Provisioned Throughput
Guaranteed throughput; required for fine-tuned models; 1-month or 6-month commitment
Batch Inference
Process large datasets asynchronously at lower cost; results stored in S3
Knowledge Bases (RAG)
Managed RAG pipeline with automatic embedding, vector storage, and retrieval
Bedrock Agents
LLM-driven autonomous orchestration using ReAct framework; integrates with Lambda, APIs, Knowledge Bases
Guardrails
Content filtering, topic denial, PII redaction, grounding checks — applied bidirectionally
Fine-Tuning (Supervised)
Supported for select models only; requires labeled training data in S3; the resulting custom model requires Provisioned Throughput for inference
Continued Pre-training
Unlabeled domain data; adapts model knowledge base without task-specific labels
Model Evaluation
Automatic metrics and human review workflows; compare multiple FMs side-by-side
Watermark Detection (Amazon Titan Image)
Invisible watermarks embedded in Titan-generated images; detectable via API
Cross-Region Inference
Routes requests across Regions for higher throughput and resilience; uses inference profiles
VPC Endpoints (PrivateLink)
Private connectivity without traversing public internet; required for strict compliance workloads
AWS PrivateLink support
Interface VPC endpoints available for Bedrock runtime and management APIs
Encryption at Rest (Customer Managed Keys)
KMS CMKs supported for fine-tuned model artifacts and Knowledge Base data
CloudTrail API Logging
ALL Bedrock API calls logged to CloudTrail — critical for compliance and auditing
CloudWatch Metrics and Logs
Invocation metrics, latency, token counts; model invocation logging to S3 or CloudWatch Logs
IAM Resource-Based Policies
Fine-grained access control per model ARN; supports cross-account model sharing via resource policies
Streaming Responses
InvokeModelWithResponseStream API for token-by-token streaming; improves perceived latency
Converse API
Unified multi-turn conversation API that works across all supported models with a consistent interface
Prompt Management
Store, version, and share prompt templates across teams via Bedrock Prompt Management
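The Converse API and streaming entries above can be sketched with boto3. This is a minimal sketch: the model ID is an illustrative assumption, and a real call needs AWS credentials, so the payload builder is kept SDK-free and the client is passed in.

```python
def build_converse_request(system_text, history, user_text):
    """Assemble a Converse API payload: a system prompt plus prior
    user/assistant turns, ending with the new user message."""
    messages = list(history) + [{"role": "user", "content": [{"text": user_text}]}]
    return {
        "system": [{"text": system_text}],
        "messages": messages,
        "inferenceConfig": {"maxTokens": 512, "temperature": 0.2},
    }

def ask(client, model_id, question, history=()):
    """Send one turn via the model-agnostic Converse API and return the
    assistant's text. `client` is a boto3 'bedrock-runtime' client, e.g.
    boto3.client('bedrock-runtime'); model_id like
    'anthropic.claude-3-haiku-20240307-v1:0' is an assumption."""
    req = build_converse_request("You are a concise assistant.", list(history), question)
    resp = client.converse(modelId=model_id, **req)
    return resp["output"]["message"]["content"][0]["text"]
```

For token-by-token streaming, `client.converse_stream` accepts the same arguments and yields event chunks instead of a single response — that is what improves perceived latency.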
Managed RAG Pipeline via Knowledge Bases
(high freq) Store source documents (PDF, DOCX, TXT, HTML) in S3 → Bedrock Knowledge Base automatically chunks, embeds, and indexes into OpenSearch Serverless (or another supported vector store) → at query time, Bedrock retrieves relevant chunks and augments the FM prompt with grounded context. Greatly reduces hallucination for enterprise knowledge queries.
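The managed RAG query step above maps to a single `retrieve_and_generate` call on the `bedrock-agent-runtime` client. A minimal sketch — the knowledge base ID and model ARN are placeholders you supply, and the pure request builder is separated so it can be checked without credentials:

```python
def build_rag_request(query, kb_id, model_arn):
    """Build a RetrieveAndGenerate request: Bedrock retrieves relevant
    chunks from the Knowledge Base and grounds the FM answer in them."""
    return {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }

def query_knowledge_base(query, kb_id, model_arn, region="us-east-1"):
    """Call the managed RAG endpoint (requires AWS credentials)."""
    import boto3  # imported lazily so the builder above stays SDK-free
    client = boto3.client("bedrock-agent-runtime", region_name=region)
    resp = client.retrieve_and_generate(**build_rag_request(query, kb_id, model_arn))
    return resp["output"]["text"]
```

Note the division of labor this makes concrete: S3 is only the ingestion source; at query time the call hits the vector store behind the Knowledge Base, never S3.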
Serverless GenAI API Backend
(high freq) API Gateway receives user requests → Lambda invokes Bedrock InvokeModel or Converse API → response returned to client. Lambda handles prompt construction, session state management, and post-processing. Fully serverless, scales to zero.
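The Lambda piece of that pattern can be sketched as an API Gateway proxy-integration handler. The model ID is an assumption, and the Bedrock client is injectable so the handler can be exercised without AWS access:

```python
import json

def handler(event, context, bedrock_client=None):
    """Minimal Lambda handler for an API Gateway proxy integration:
    parse the user prompt from the request body, call Bedrock Converse,
    return a JSON response. bedrock_client is injectable for testing."""
    if bedrock_client is None:
        import boto3  # created lazily; Lambda provides credentials at runtime
        bedrock_client = boto3.client("bedrock-runtime")
    body = json.loads(event.get("body") or "{}")
    prompt = body.get("prompt", "")
    resp = bedrock_client.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumed model ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    answer = resp["output"]["message"]["content"][0]["text"]
    return {"statusCode": 200, "body": json.dumps({"answer": answer})}
```

In production you would add session state (e.g. prior turns loaded from DynamoDB) to the `messages` list before the call.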
Compliance Auditing and Observability
(high freq) CloudTrail captures every Bedrock API call (who called which model, when, with what parameters) for compliance auditing. CloudWatch collects invocation metrics (latency, token counts, errors) and model invocation logs (full prompt/response) for operational monitoring. These serve DIFFERENT purposes — CloudTrail for audit, CloudWatch for operations.
Autonomous AI Agent with Action Groups
(high freq) Bedrock Agent uses FM reasoning to determine which action to take → calls Action Group (Lambda function) → Lambda queries DynamoDB or external APIs → returns results to agent → agent synthesizes final response. Enables multi-step task automation without hardcoded workflows.
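The Action Group Lambda in that flow receives the agent's chosen API path and must reply in the response envelope the agent expects. A sketch — the `/orders/status` path and the canned result are hypothetical, and the envelope fields follow the documented action-group shape (`messageVersion`, `response`, `responseBody`), so verify against the current Agents docs:

```python
import json

def lambda_handler(event, context):
    """Action Group handler sketch: the agent passes apiPath/httpMethod;
    dispatch on the path, then wrap the result in the response envelope
    the agent parses to continue its ReAct loop."""
    api_path = event.get("apiPath", "")
    if api_path == "/orders/status":       # hypothetical path from the API schema
        result = {"status": "shipped"}     # a real handler would query DynamoDB here
    else:
        result = {"error": f"unknown path {api_path}"}
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event.get("actionGroup"),
            "apiPath": api_path,
            "httpMethod": event.get("httpMethod"),
            "httpStatusCode": 200,
            "responseBody": {"application/json": {"body": json.dumps(result)}},
        },
    }
```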
Pre/Post Processing NLP Pipeline
(high freq) Use Amazon Comprehend for deterministic NLP tasks (entity extraction, sentiment analysis, language detection, PII identification) as pre-processing before sending to Bedrock FM, or as post-processing to validate/structure FM outputs. Comprehend is cheaper and faster for tasks it handles natively.
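A concrete pre-processing step: detect PII with Comprehend and redact it before the text ever reaches a Bedrock FM. The redaction helper is pure (it consumes Comprehend's `Type`/`BeginOffset`/`EndOffset` entity shape), and the AWS call is isolated in its own function:

```python
def redact_pii(text, entities):
    """Replace each detected PII span with its [TYPE] label.
    Process spans right-to-left so earlier offsets stay valid
    as the string shrinks or grows."""
    for ent in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        text = text[:ent["BeginOffset"]] + f"[{ent['Type']}]" + text[ent["EndOffset"]:]
    return text

def detect_and_redact(text, region="us-east-1"):
    """Run Comprehend PII detection, then redact — the cleaned text
    is what you pass to the Bedrock FM (requires AWS credentials)."""
    import boto3
    comprehend = boto3.client("comprehend", region_name=region)
    resp = comprehend.detect_pii_entities(Text=text, LanguageCode="en")
    return redact_pii(text, resp["Entities"])
```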
Document Intelligence Pipeline
(high freq) Textract extracts structured text, tables, and forms from scanned PDFs/images stored in S3 → extracted text passed to Bedrock FM for summarization, Q&A, or classification. Textract handles OCR/layout understanding; Bedrock handles language reasoning.
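The Textract-to-Bedrock handoff can be sketched as: OCR the document, keep only LINE blocks, join them into prompt-ready text. The synchronous `detect_document_text` call shown suits single-page documents; multi-page PDFs use the asynchronous Start/Get APIs instead. Bucket and key are placeholders you supply:

```python
def lines_from_textract(response):
    """Pull plain text out of a Textract DetectDocumentText response:
    keep only LINE blocks, in the order Textract returns them."""
    return [b["Text"] for b in response.get("Blocks", []) if b["BlockType"] == "LINE"]

def extract_text_for_bedrock(bucket, key, region="us-east-1"):
    """OCR a document stored in S3 with Textract, then join the lines
    into one string ready to embed in a Bedrock prompt."""
    import boto3
    textract = boto3.client("textract", region_name=region)
    resp = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return "\n".join(lines_from_textract(resp))
```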
Custom Model Training + Bedrock Deployment
(medium freq) Train custom models or perform advanced fine-tuning in SageMaker (full control over training infrastructure) → import resulting model artifacts into Bedrock for managed serverless inference. Combines SageMaker's training flexibility with Bedrock's operational simplicity.
Secure Third-Party API Key Management for Agents
(medium freq) Bedrock Agents that call external APIs (e.g., Salesforce, weather APIs) need credentials. Store API keys in Secrets Manager with automatic rotation → Lambda Action Group retrieves secrets at runtime. NEVER hardcode credentials in agent instructions or Lambda code.
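The runtime retrieval step is one `get_secret_value` call. A sketch — the JSON secret shape with an `api_key` field is an assumption (your secret may be a bare string or use different keys), and the client is injectable for testing:

```python
import json

def get_api_key(secret_id, client=None):
    """Fetch a third-party API key from Secrets Manager at runtime,
    never from hardcoded Lambda code or agent instructions."""
    if client is None:
        import boto3  # Lambda's execution role supplies credentials
        client = boto3.client("secretsmanager")
    resp = client.get_secret_value(SecretId=secret_id)
    return json.loads(resp["SecretString"])["api_key"]  # assumed JSON secret shape
```

Because the key is fetched on each (cold) invocation rather than baked in, Secrets Manager's automatic rotation takes effect without redeploying the Lambda.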
Enterprise Search + Generative AI
(medium freq) Amazon Kendra indexes enterprise content (SharePoint, Confluence, S3) with ML-powered search → Bedrock FM generates natural language answers grounded in Kendra search results. Kendra handles structured enterprise search; Bedrock handles natural language generation.
Deterministic Orchestration of GenAI Workflows
(medium freq) Use Step Functions for complex, multi-stage workflows requiring deterministic branching, error handling, and retry logic — with individual steps invoking Bedrock for specific FM tasks. Contrast with Bedrock Agents (LLM-driven, non-deterministic). Step Functions = predictable; Agents = autonomous.
CloudTrail logs ALL Bedrock API calls for AUDITING (who called what model, when) — CloudWatch Logs captures model INVOCATION CONTENT (prompts and responses) for operational monitoring. These serve completely different purposes. Exam questions about compliance auditing → CloudTrail. Questions about debugging FM outputs → CloudWatch/model invocation logging.
Fine-tuned models in Bedrock CANNOT be invoked via on-demand pricing — they REQUIRE Provisioned Throughput. If an exam scenario describes deploying a fine-tuned model to production, the answer must include Provisioned Throughput, not on-demand inference.
Bedrock Guardrails perform CONTENT FILTERING (blocking harmful, toxic, or off-topic content) — they do NOT detect statistical bias, fairness issues, or demographic disparities in model outputs. Bias detection requires model evaluation frameworks, external fairness tooling, or human review workflows.
In RAG architecture with Bedrock Knowledge Bases: S3 is the DATA SOURCE (where documents are stored), NOT the vector store. The vector store (where embeddings live) is OpenSearch Serverless, Aurora pgvector, Redis, Pinecone, or MongoDB Atlas. Exam questions frequently test this distinction.
AWS-managed KMS keys for Bedrock CANNOT be shared across AWS accounts. For cross-account model access or shared encryption, you must use Customer Managed Keys (CMKs) with appropriate key policies. This is a common trap in enterprise architecture questions.
CloudTrail = compliance audit trail for WHO called WHAT model WHEN. CloudWatch = operational monitoring of HOW models perform. Never swap these — compliance questions → CloudTrail, operational debugging → CloudWatch.
Fine-tuned models REQUIRE Provisioned Throughput — on-demand inference is impossible for custom model variants. Any exam scenario about deploying a fine-tuned model must include Provisioned Throughput in the solution.
Guardrails = content safety filtering (harmful content, PII, off-topic). NOT bias detection. Bias requires model evaluation or human review. Content policy enforcement ≠ algorithmic fairness.
Bedrock Agents use a ReAct (Reasoning + Acting) framework — the FM reasons about what to do, acts by calling tools/APIs, observes results, and iterates. This is fundamentally different from Step Functions (deterministic state machine). Choose Agents when you need autonomous, adaptive task completion; Step Functions when you need predictable, auditable workflows.
Few-shot prompting (providing 2-5 examples in the prompt) significantly outperforms zero-shot prompting for complex analytical, classification, or structured output tasks. Exam scenarios asking how to improve FM accuracy without fine-tuning → answer is few-shot or chain-of-thought prompting, NOT zero-shot.
Secrets Manager (NOT Parameter Store) is the correct answer for storing API keys used by Bedrock Agents when AUTOMATIC ROTATION is required. Parameter Store supports rotation only with custom Lambda rotation functions and is not designed for third-party API credential rotation. Exam questions with 'automatic rotation' → Secrets Manager.
Bedrock's data privacy guarantee: your prompts and completions are NEVER used to train or improve Amazon's or any third-party's base models. This is a key differentiator for enterprise adoption and frequently appears in exam questions about data governance and privacy.
Use Amazon Comprehend for DETERMINISTIC NLP tasks (entity recognition, sentiment, PII detection, language detection) and Bedrock FMs for GENERATIVE or REASONING tasks. Comprehend is cheaper, faster, and more consistent for structured NLP. Exam questions testing cost optimization or task appropriateness often hinge on this distinction.
Bedrock supports BATCH INFERENCE for large-scale asynchronous workloads at lower cost — input data from S3, results written back to S3. Choose batch inference when latency is not critical and you're processing thousands of records. On-demand = real-time; Batch = async + cheaper.
The Converse API provides a UNIFIED interface for multi-turn conversations across ALL supported Bedrock models — you don't need model-specific prompt formatting. This simplifies code when switching between models and is the recommended approach for chatbot/assistant applications.
Amazon Titan Image Generator embeds INVISIBLE WATERMARKS in generated images by default — these watermarks can be detected via the Bedrock DetectGeneratedContent API. This is a responsible AI feature for provenance tracking and is tested in the Responsible AI domain of AIF-C01.
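The batch inference point above hinges on the input format: a JSONL file in S3 where each line pairs a `recordId` with a model-native `modelInput` body. A sketch of building those lines — the Anthropic messages body shown is one model family's format and other models differ, so treat the field names as assumptions to verify:

```python
import json

def build_batch_records(prompts, max_tokens=256):
    """Turn a list of prompts into Bedrock batch-inference JSONL lines.
    Each line carries a recordId (echoed back in the S3 results so you
    can join outputs to inputs) plus the model-native request body."""
    lines = []
    for i, prompt in enumerate(prompts):
        record = {
            "recordId": f"rec-{i:06d}",
            "modelInput": {  # Anthropic-style body; other models differ
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": max_tokens,
                "messages": [
                    {"role": "user", "content": [{"type": "text", "text": prompt}]}
                ],
            },
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)
```

You would upload this file to S3 and start the job via the Bedrock control-plane API; results land back in S3 as JSONL keyed by the same `recordId`.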
Common Mistake
CloudWatch Logs is the right service to audit WHO called which Bedrock model and WHEN for compliance reporting
Correct
AWS CloudTrail is the authoritative audit trail for ALL Bedrock API calls — it records the caller identity (IAM principal), source IP, timestamp, model ID, and API action. CloudWatch Logs captures model invocation CONTENT (prompts and responses) when model invocation logging is enabled, but this is for operational debugging, not compliance auditing.
This is the #1 most common trap in Bedrock exam questions. Remember: CloudTrail = WHO did WHAT (audit log) | CloudWatch = WHAT happened operationally (metrics + content logs). Compliance questions → CloudTrail. Debugging FM quality → CloudWatch.
Common Mistake
Bedrock Guardrails detect and prevent AI bias, ensuring fair and unbiased model outputs
Correct
Guardrails perform CONTENT FILTERING — blocking harmful content categories (hate speech, violence, sexual content), denying off-topic subjects, redacting PII, and detecting prompt injection attacks. They do NOT evaluate statistical bias, demographic fairness, or model accuracy disparities. Bias detection requires model evaluation, human review (A2I), or external fairness frameworks.
Content safety ≠ Algorithmic fairness. Guardrails are a runtime safety net for content policy enforcement. Bias is a model quality and training data problem that requires evaluation methodologies, not runtime filters.
Common Mistake
Zero-shot prompting is sufficient for complex analytical tasks like multi-step reasoning, structured data extraction, or nuanced classification
Correct
Zero-shot prompting (no examples) works for simple, well-defined tasks but consistently underperforms for complex analytical work. Few-shot prompting (2-5 examples in context), chain-of-thought prompting (asking the model to reason step-by-step), or fine-tuning are required for complex tasks. Exam scenarios asking how to improve FM accuracy → few-shot or CoT first, fine-tuning as last resort.
The exam tests understanding of the prompt engineering hierarchy: Zero-shot → Few-shot → Chain-of-Thought → Fine-tuning. Each step increases complexity and cost but improves performance for harder tasks.
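The few-shot step of that ladder is just deliberate prompt assembly. A minimal sketch — the `Input:`/`Output:` labels are a common convention, not a Bedrock requirement, and the resulting string is what you would place in a Converse user message:

```python
def few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: the task instruction, 2-5 worked
    (input, output) examples, then the new input left open for the
    model to complete."""
    parts = [instruction, ""]
    for example_in, example_out in examples:
        parts += [f"Input: {example_in}", f"Output: {example_out}", ""]
    parts += [f"Input: {query}", "Output:"]
    return "\n".join(parts)
```

Adding "Think step by step before answering" to the instruction converts this into the chain-of-thought variant, the next rung of the ladder.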
Common Mistake
AWS-managed KMS keys can be shared across AWS accounts to encrypt Bedrock fine-tuned model artifacts
Correct
AWS-managed KMS keys are account-specific and CANNOT be shared across accounts or have their key policies modified. For cross-account encryption or sharing encrypted Bedrock resources (fine-tuned model artifacts, Knowledge Base data), you MUST use Customer Managed Keys (CMKs) with a key policy that grants cross-account access.
AWS-managed keys = AWS controls, account-scoped, no sharing. CMKs = you control, shareable via key policy. Any exam question mentioning 'cross-account' + 'encryption' → CMK is always the answer.
Common Mistake
S3 is the vector store for Bedrock Knowledge Bases — documents are stored and searched directly in S3
Correct
S3 is the DATA SOURCE for Knowledge Bases (where your raw documents live). After ingestion, Bedrock chunks the documents, generates embeddings using an embedding model, and stores the vector embeddings in a VECTOR DATABASE (Amazon OpenSearch Serverless, Aurora pgvector, Redis Enterprise Cloud, Pinecone, or MongoDB Atlas). S3 is never queried at retrieval time — the vector store is.
RAG architecture has two distinct storage layers: S3 (raw documents, write-once) and vector store (embeddings, queried at runtime). Confusing these leads to wrong answers about where to optimize retrieval performance.
Common Mistake
Fine-tuned Bedrock models can be invoked using standard on-demand pricing like base models
Correct
Fine-tuned (custom) models in Bedrock REQUIRE Provisioned Throughput to invoke — you must purchase at least 1 Model Unit with a 1-month or 6-month commitment before you can call your fine-tuned model. On-demand inference is only available for base foundation models in the Bedrock model catalog.
This trips up candidates who assume fine-tuning is just a training step after which the model works like any other. The Provisioned Throughput requirement has significant cost implications and is a critical architectural consideration.
Common Mistake
AWS Parameter Store is equivalent to Secrets Manager for storing API keys used by Bedrock Agents when rotation is needed
Correct
AWS Secrets Manager is the correct service when AUTOMATIC ROTATION of credentials is required. Secrets Manager has built-in rotation support for AWS services and supports custom Lambda rotation functions for third-party APIs. Parameter Store SecureString supports encryption but does NOT have native automatic rotation — rotation requires custom implementation. Exam keyword 'automatic rotation' → always Secrets Manager.
Both services store secrets securely, but the rotation requirement is the differentiator. Parameter Store is cheaper for static secrets; Secrets Manager is required for automatically rotating credentials used by Bedrock Agent Action Groups calling external APIs.
Common Mistake
Bedrock Agents and AWS Step Functions are interchangeable for orchestrating multi-step AI workflows
Correct
Bedrock Agents use LLM-driven, non-deterministic orchestration (ReAct framework) — the FM autonomously decides which tools to call and in what order based on reasoning. Step Functions is a deterministic state machine where YOU define every state, transition, and branch. Use Agents when tasks are open-ended and require adaptive reasoning; use Step Functions when workflows are predictable, auditable, and require guaranteed execution order.
Agents = autonomous + adaptive (LLM decides). Step Functions = deterministic + auditable (developer decides). Mixing these up leads to wrong architecture answers. A compliance-heavy workflow requiring audit trails → Step Functions. An open-ended customer service task → Bedrock Agents.
GRAFFITI for Bedrock features: Guardrails, RAG (Knowledge Bases), Agents, Fine-tuning, Foundation models, Invocation logging, Titan models, Inference (on-demand/provisioned)
CloudTrail = 'Trail of breadcrumbs WHO walked WHERE' (audit) | CloudWatch = 'Watch WHAT is happening now' (operations) — never swap these for Bedrock compliance questions
RAG storage layers: 'S3 = Source documents (raw)' | 'Vector DB = Vectorized embeddings (searchable)' — S3 IN, Vector DB OUT at query time
Fine-tune → Must Provision: 'You FINE-tune your PROVISION-al athlete before the big game' — fine-tuned models always need Provisioned Throughput
Prompt engineering ladder (bottom to top, increasing power): Zero-shot → Few-shot → Chain-of-Thought → Fine-tuning → Continued Pre-training
Secrets Manager vs Parameter Store: 'ROTATE with Secrets Manager, STORE static with Parameter Store' — rotation keyword → Secrets Manager, always
CertAI Tutor · AIF-C01 · 2026-02-22