
Fully managed Retrieval-Augmented Generation that connects your private data to foundation models — no vector DB ops required.
Amazon Bedrock Knowledge Bases is a fully managed RAG (Retrieval-Augmented Generation) service that automatically ingests, chunks, embeds, and stores your documents in a vector store, then retrieves semantically relevant context at query time to ground foundation model responses in your private data. It eliminates the need to build and maintain custom pipelines for document processing, embedding generation, and vector search. Knowledge Bases integrates natively with Amazon Bedrock Agents, Guardrails, and Model Evaluation to form a complete, production-ready generative AI stack.
Enable foundation models to answer questions about private, proprietary, or up-to-date enterprise data without retraining or fine-tuning the model — using fully managed RAG.
Fully managed vector store provisioning (OpenSearch Serverless): auto-created if no BYOVS (Bring Your Own Vector Store) is specified
Bring Your Own Vector Store (BYOVS): Aurora pgvector, Redis Enterprise, Pinecone, MongoDB Atlas, OpenSearch Managed Cluster
Automatic document chunking: fixed-size, hierarchical, semantic, or no-chunking strategies
Semantic chunking via FM inference: higher quality, higher ingestion cost
Hierarchical chunking (parent-child): retrieves child chunks, returns parent context to the FM
Metadata filtering at query time: supports equality, contains, and range filters
Source attribution / citations: RetrieveAndGenerate returns source document URIs with each response
RetrieveAndGenerate API (managed RAG): single API call handles retrieval + generation
Retrieve API (retrieval only): returns chunks without calling an FM; use for custom generation pipelines
Integration with Bedrock Agents: agents can query a knowledge base as a built-in action type
Integration with Bedrock Guardrails: apply content filtering and PII redaction to KB responses
Integration with Bedrock Model Evaluation: evaluate retrieval quality and end-to-end RAG pipeline performance
Query reformulation / rewriting: optional FM-based query rewriting to improve retrieval relevance
Hybrid search (semantic + keyword): available with supported vector stores; improves recall for exact-match queries
Re-ranking of retrieved results: optional reranker model to reorder chunks by relevance before passing to the FM
Web crawler data source: crawl and ingest public web pages directly into a knowledge base
Confluence, Salesforce, SharePoint connectors: managed connectors for enterprise content sources
Custom transformation via Lambda (custom chunking): bring your own chunking logic via a Lambda function
CloudWatch metrics and logging: monitor ingestion jobs, retrieval latency, and token usage
AWS CloudTrail integration: all API calls logged for audit and compliance
VPC support / PrivateLink: keep data off the public internet
Encryption at rest and in transit: KMS CMK support for customer-managed keys
Cross-region inference profiles: route inference to multiple regions for resilience
Streaming responses from RetrieveAndGenerate: as of early 2026, streaming is not supported for RetrieveAndGenerate; use direct InvokeModel with streaming for streaming use cases
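The split between the two query APIs above can be sketched as request shapes. These are built as plain dicts (not sent anywhere) so the shapes are easy to inspect; the knowledge base ID and model ARN are placeholders, and with boto3 they would be passed as keyword arguments to the `bedrock-agent-runtime` client.

```python
# Sketch of the two query paths. IDs and the model ARN are placeholders.

# Retrieve: returns chunks only, for custom generation pipelines.
retrieve_request = {
    "knowledgeBaseId": "KBID123456",  # placeholder
    "retrievalQuery": {"text": "What is our PTO policy?"},
    "retrievalConfiguration": {
        "vectorSearchConfiguration": {"numberOfResults": 5}
    },
}

# RetrieveAndGenerate: retrieval + grounded generation in one call.
rag_request = {
    "input": {"text": "What is our PTO policy?"},
    "retrieveAndGenerateConfiguration": {
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KBID123456",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/"
                        "anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
}

# With boto3 these become, e.g.:
#   client = boto3.client("bedrock-agent-runtime")
#   client.retrieve(**retrieve_request)
#   client.retrieve_and_generate(**rag_request)
```

Keeping the requests as dicts also makes it easy to diff the two: only `Retrieve` takes `retrievalConfiguration` at the top level, and only `RetrieveAndGenerate` needs a model ARN.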
S3-Backed Document Corpus
High frequency: Store source documents in S3 and configure S3 as a data source. Trigger sync jobs via EventBridge Scheduler or on-demand. S3 is the most common and exam-relevant data source. Remember: S3 changes do NOT auto-trigger re-ingestion.
Agent with Knowledge Base Action
High frequency: Attach a knowledge base to a Bedrock Agent as a built-in knowledge base action type. The agent automatically decides when to query the KB based on the user's intent — no custom Lambda required for retrieval. This is the primary pattern for agentic RAG.
Safe RAG with Content Filtering
High frequency: Apply Guardrails to RetrieveAndGenerate API calls to filter harmful content, redact PII from retrieved chunks before passing to the FM, and block responses that violate defined policies. Guardrails operate on both the retrieved context and the generated output.
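A guardrail is attached inside the generation configuration of a RetrieveAndGenerate request. The sketch below shows the request shape only (nothing is sent); the knowledge base ID, model ARN, and guardrail ID/version are placeholders you would replace with your own resources.

```python
# RetrieveAndGenerate request with a guardrail on the generation step.
# All IDs are placeholders; the nesting follows the boto3
# bedrock-agent-runtime retrieve_and_generate request shape.
rag_with_guardrail = {
    "input": {"text": "Summarize the latest incident report."},
    "retrieveAndGenerateConfiguration": {
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KBID123456",      # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/"
                        "anthropic.claude-3-haiku-20240307-v1:0",
            "generationConfiguration": {
                "guardrailConfiguration": {
                    "guardrailId": "gr-placeholder",   # placeholder
                    "guardrailVersion": "1",
                }
            },
        },
    },
}
```

Because the guardrail sits in `generationConfiguration`, it governs the grounded generation step; content filtering and PII redaction policies defined on the guardrail then apply to what the caller ultimately sees.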
RAG Pipeline Quality Assessment
High frequency: Use Bedrock Model Evaluation to assess the end-to-end quality of a knowledge base — measuring retrieval relevance, faithfulness (is the answer grounded in retrieved chunks?), and answer correctness. This is NOT traditional ML cross-validation; there is no train/test split.
Operational Observability
Medium frequency: Monitor ingestion job status, retrieval latency, number of retrieved chunks, and FM token consumption via CloudWatch metrics and logs. Set alarms on ingestion failures or high retrieval latency to detect knowledge base health issues proactively.
Managed Vector Store (Default)
Medium frequency: When no BYOVS is specified, Bedrock automatically provisions an OpenSearch Serverless collection as the vector store. This is the lowest-ops option but incurs OpenSearch Serverless OCU costs. Suitable for most enterprise RAG use cases.
Custom Chunking Pipeline
Medium frequency: Provide a Lambda function as a custom transformation step during ingestion to implement proprietary chunking logic (e.g., chunk by section headers, extract tables separately). The Lambda receives raw document content and returns custom chunks.
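The kind of proprietary chunking logic you would plug into such a Lambda can be sketched in isolation. This is only the transformation itself — splitting a document at section headers so each chunk is one coherent section; the actual event/response contract between Bedrock and the Lambda is defined by the service and is not reproduced here.

```python
import re

def chunk_by_section_headers(document: str) -> list[str]:
    """Illustrative chunking logic: split a document at Markdown-style
    section headers so each chunk covers exactly one section. This is
    the custom logic you would wrap in the ingestion Lambda; the Lambda
    event/response shape itself is defined by Bedrock."""
    # Split immediately before every line that starts a header
    # (one to six '#' characters followed by a space).
    parts = re.split(r"(?m)^(?=#{1,6} )", document)
    return [p.strip() for p in parts if p.strip()]

doc = "# Intro\nWelcome.\n## Policy\nPTO is 20 days.\n## Scope\nAll staff."
chunks = chunk_by_section_headers(doc)
# -> ["# Intro\nWelcome.", "## Policy\nPTO is 20 days.", "## Scope\nAll staff."]
```

Header-based chunks tend to retrieve better than fixed-size windows for policy-style documents, because a chunk never straddles two unrelated sections.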
BYOVS with Relational Metadata
Low frequency: Use Aurora PostgreSQL with pgvector as the vector store when you need to combine vector similarity search with relational SQL queries or when you already operate Aurora for other workloads. Reduces operational sprawl vs. adding OpenSearch.
The RetrieveAndGenerate API consumes FM context window tokens for BOTH the retrieved chunks AND your prompt — you cannot use the full model context window (e.g., 200K tokens for Claude 3) exclusively for retrieved content. System prompts, conversation history, and reserved output tokens all reduce available space. Always account for prompt overhead when sizing numberOfResults.
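The budgeting above can be made concrete with a small worked example. All numbers below are illustrative, not real model limits or measured overheads.

```python
# Worked token-budget example (all figures are illustrative).
total_context  = 200_000   # headline context window
system_prompt  = 1_500     # instructions sent on every call
history        = 6_000     # prior conversation turns
user_query     = 200       # the question itself
output_reserve = 4_000     # tokens reserved for the model's answer

usable_for_chunks = (
    total_context - system_prompt - history - user_query - output_reserve
)
print(usable_for_chunks)  # 188300

# If retrieved chunks average ~500 tokens each, the budget caps how
# high numberOfResults can realistically go:
max_chunks = usable_for_chunks // 500
print(max_chunks)  # 376
```

Even with a generous window the usable space is below the headline figure, and with longer histories or bigger system prompts the gap widens, which is why `numberOfResults` must be sized against the net budget, not the advertised context length.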
Knowledge Bases evaluation via Bedrock Model Evaluation is NOT traditional ML model evaluation. There is no train/test data split, no cross-validation, and no concept of overfitting in the classical ML sense. Evaluation measures retrieval relevance (did we get the right chunks?) and faithfulness (is the answer grounded in the chunks?) — these are RAG-specific metrics, not ML training metrics.
S3 data sources do NOT auto-sync. When documents in S3 are added, updated, or deleted, you must explicitly call StartIngestionJob (or schedule it via EventBridge) to update the knowledge base. Exam questions will test whether you know this re-sync is manual/scheduled.
Knowledge Base evaluation is NOT ML model evaluation — no data splitting, no cross-validation, no overfitting. It measures retrieval relevance and answer faithfulness using RAG-specific metrics via Bedrock Model Evaluation.
You CANNOT use the full FM context window for retrieved chunks — system prompt, conversation history, query, and output reservation all reduce the usable token budget. Always calculate: Usable = Total - Overhead.
S3 data sources do NOT auto-sync to Knowledge Bases. You must explicitly trigger StartIngestionJob. Design an automation pattern (EventBridge or Lambda) if near-real-time freshness is required.
When an exam question asks how to scope a knowledge base query to a subset of documents (e.g., 'only search HR documents, not legal documents'), the answer is metadata filtering — attach metadata tags during ingestion and apply filter expressions at query time. Do NOT create separate knowledge bases just to isolate document sets when metadata filtering suffices.
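A filter expression of this kind lives inside the retrieval configuration. The sketch below builds the dicts only; the metadata keys (`department`, `year`) are placeholders for tags you would have attached to documents during ingestion.

```python
# Scope retrieval to HR documents only. The metadata key "department"
# is a placeholder attached to documents at ingestion time.
retrieval_config = {
    "vectorSearchConfiguration": {
        "numberOfResults": 5,
        "filter": {
            "equals": {"key": "department", "value": "hr"}
        },
    }
}

# Filters combine with andAll / orAll, e.g. HR documents from 2024 on:
combined_filter = {
    "andAll": [
        {"equals": {"key": "department", "value": "hr"}},
        {"greaterThanOrEquals": {"key": "year", "value": 2024}},
    ]
}
```

One knowledge base with per-document metadata and query-time filters replaces what would otherwise be several parallel knowledge bases, which keeps ingestion, cost, and permissions management in one place.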
Hierarchical chunking stores BOTH a parent chunk and smaller child chunks. At retrieval time, the system retrieves the most relevant child chunks but returns the parent chunk (with more context) to the FM. This improves answer quality for questions requiring broader context. Contrast with fixed-size chunking which retrieves exactly what was stored.
The Retrieve API (retrieval only, no generation) is the correct choice when you want to use Knowledge Bases as a semantic search engine and handle generation yourself — for example, to apply custom prompt engineering, use a model not supported by RetrieveAndGenerate, or implement streaming responses. Know when to use Retrieve vs. RetrieveAndGenerate.
DynamoDB is NOT a supported vector store for Bedrock Knowledge Bases. Supported options are: Amazon OpenSearch Serverless (default), Amazon Aurora (pgvector), Redis Enterprise Cloud, Pinecone, MongoDB Atlas, and Amazon OpenSearch Managed Cluster. DynamoDB appears as a distractor in exam questions.
You CANNOT change the embedding model after a knowledge base is created. The embedding model is locked at creation time because all stored vectors must be in the same embedding space. To change models, you must create a new knowledge base and re-ingest all documents.
Source attribution (citations) is a built-in feature of RetrieveAndGenerate — the API response includes the source document S3 URI and chunk text for each retrieved passage. This is critical for enterprise compliance and auditability use cases. You do NOT need custom code to implement citations.
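Pulling those URIs out of a response is a short traversal of the `citations` structure. The sample response below is hand-built to mirror the documented shape, not real API output, and the bucket/key names are placeholders.

```python
def extract_sources(rag_response: dict) -> list[str]:
    """Collect source S3 URIs from a RetrieveAndGenerate response.
    Walks the documented 'citations' -> 'retrievedReferences' nesting;
    missing keys are skipped rather than raising."""
    uris = []
    for citation in rag_response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            if uri:
                uris.append(uri)
    return uris

# Hand-built stand-in for an API response (placeholder bucket/key).
sample = {
    "output": {"text": "PTO is 20 days per year."},
    "citations": [{
        "retrievedReferences": [{
            "content": {"text": "Employees accrue 20 PTO days annually."},
            "location": {
                "type": "S3",
                "s3Location": {"uri": "s3://corp-docs/hr/pto-policy.pdf"},
            },
        }]
    }],
}
print(extract_sources(sample))  # ['s3://corp-docs/hr/pto-policy.pdf']
```

Surfacing these URIs alongside the generated answer is usually all that compliance-oriented UIs need: the user sees which document each claim came from without any custom retrieval bookkeeping.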
Bedrock Agents with an attached Knowledge Base automatically decide WHEN to query the KB based on the agent's reasoning — you do not need to manually invoke the KB. The agent uses the KB description you provide to determine relevance. Write clear, specific KB descriptions to improve agent routing accuracy.
Common Mistake
Evaluating a Knowledge Base is like evaluating a trained ML model — you split data into training and test sets, run cross-validation, and check for overfitting.
Correct
Knowledge Bases are retrieval systems, not trained models. There is no training phase, no data splitting, and no overfitting in the classical ML sense. Evaluation focuses on retrieval quality metrics (precision, recall of relevant chunks), faithfulness (is the generated answer supported by retrieved content?), and answer correctness — all via Bedrock Model Evaluation using a curated Q&A evaluation dataset.
This is the #1 conceptual trap for candidates with ML backgrounds. The exam tests whether you understand that RAG evaluation is fundamentally different from supervised learning evaluation. Remember: Knowledge Bases RETRIEVE, they don't TRAIN.
Common Mistake
Increasing the number of retrieved chunks (numberOfResults) always improves answer quality because the FM has more information to work with.
Correct
More retrieved chunks consume more context window tokens, can dilute relevance with loosely related content, and can cause the FM to lose focus or exceed the context window limit entirely. Optimal numberOfResults is a tuning exercise — more is not always better. Quality of retrieved chunks (via better chunking strategy, reranking, or metadata filtering) matters more than quantity.
This misconception directly maps to the exam trap about 'more few-shot examples always improving performance' — the same principle applies to retrieved context. Token budget awareness is a core AIF-C01 competency.
Common Mistake
You can use the full context window of the FM (e.g., 200,000 tokens for Claude 3) for retrieved document chunks in a RetrieveAndGenerate call.
Correct
The effective token budget for retrieved chunks is the model's total context window MINUS tokens consumed by: the system prompt, conversation history, the user query itself, and tokens reserved for the model's output. In practice, the usable space for retrieved content is significantly less than the headline context window size.
Exam questions will present a scenario where a solution 'should work' based on the raw context window size, but fails in practice due to prompt overhead. Always think: Total Context = System Prompt + History + Query + Retrieved Chunks + Output. Budget accordingly.
Common Mistake
Knowledge Bases automatically update when you add or change files in the S3 data source bucket.
Correct
Knowledge Bases do NOT poll S3 for changes. You must explicitly trigger a sync by calling StartIngestionJob. This can be automated by configuring an S3 Event Notification to trigger a Lambda function that calls StartIngestionJob, or by scheduling sync jobs via Amazon EventBridge Scheduler.
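The S3-event-to-sync automation can be sketched as a small Lambda handler. The environment variable names below are our own convention, and the knowledge base / data source IDs would come from your deployment.

```python
import os

def build_sync_params() -> dict:
    """Read the target knowledge base and data source from environment
    variables (the variable names are our own convention)."""
    return {
        "knowledgeBaseId": os.environ["KNOWLEDGE_BASE_ID"],
        "dataSourceId": os.environ["DATA_SOURCE_ID"],
    }

def lambda_handler(event, context):
    """Invoked by an S3 Event Notification (or an EventBridge schedule);
    kicks off a knowledge base sync."""
    # boto3 ships with the Lambda Python runtime; imported here so this
    # sketch parses even where the AWS SDK is not installed.
    import boto3
    client = boto3.client("bedrock-agent")
    job = client.start_ingestion_job(**build_sync_params())
    return {"ingestionJobId": job["ingestionJob"]["ingestionJobId"]}
```

Note that `StartIngestionJob` re-syncs the whole data source incrementally on the service side, so a burst of S3 events can safely be debounced into a single call rather than one sync per object.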
A common architecture mistake and exam trap. Candidates assume 'managed' means 'automatic sync.' The knowledge base will serve stale data until you explicitly re-sync. This is architecturally important for designing up-to-date RAG systems.
Common Mistake
Amazon DynamoDB can be used as the vector store for a Bedrock Knowledge Base.
Correct
DynamoDB does NOT support vector similarity search and is NOT a supported vector store for Bedrock Knowledge Bases. Supported options are Amazon OpenSearch Serverless (default managed), Amazon Aurora with pgvector, Redis Enterprise Cloud, Pinecone, MongoDB Atlas, and Amazon OpenSearch Managed Cluster.
DynamoDB is one of the most widely used AWS database services and appears frequently as a distractor in exam questions about Knowledge Bases. The key differentiator: vector stores require approximate nearest neighbor (ANN) search capabilities, which DynamoDB does not natively provide.
Common Mistake
Bedrock Knowledge Bases performs overfitting detection similar to how ML training pipelines monitor training vs. validation loss curves.
Correct
Knowledge Bases have no concept of overfitting because no model training occurs. The retrieval index is a static representation of your documents. 'Overfitting' in a RAG context would manifest as a knowledge base that retrieves very specific chunks that match training queries perfectly but fails on novel queries — this is addressed through chunking strategy tuning and query rewriting, not regularization or early stopping.
Exam questions may describe symptoms of poor generalization in a RAG system and ask you to identify the cause. The answer is never 'overfitting of the knowledge base' — it's more likely poor chunking, insufficient document coverage, or missing query reformulation.
RICE for RAG evaluation: Relevance (did we retrieve the right chunks?), Integration (are chunks coherent together?), Correctness (is the answer factually right?), Evidence (is the answer supported by retrieved text?) — NOT train/test splits.
SPEC for Knowledge Base creation decisions: Store (which vector store?), Parse (which chunking strategy?), Embed (which embedding model — cannot change later!), Connect (which data source?) — lock these in at design time.
Remember 'S3 is LAZY' — S3 data sources don't push changes to Knowledge Bases. YOU must pull (trigger StartIngestionJob). Lazy S3 = manual sync.
Token Budget Formula: USABLE = TOTAL - SYSTEM_PROMPT - HISTORY - QUERY - OUTPUT_RESERVE. Never assume USABLE = TOTAL.
CertAI Tutor · AIF-C01 · 2026-03-07