
Fully managed Retrieval-Augmented Generation that connects your private data to foundation models — no vector DB ops required.
Amazon Bedrock Knowledge Bases is a fully managed RAG (Retrieval-Augmented Generation) service that automatically ingests, chunks, embeds, and stores your documents in a vector store, then retrieves semantically relevant context at query time to ground foundation model responses in your private data. It eliminates the need to build and maintain custom pipelines for document processing, embedding generation, and vector search. Knowledge Bases integrates natively with Amazon Bedrock Agents, Guardrails, and Model Evaluation to form a complete, production-ready generative AI stack.
Enable foundation models to answer questions about private, proprietary, or up-to-date enterprise data without retraining or fine-tuning the model — using fully managed RAG.
Fully managed vector store provisioning (OpenSearch Serverless): auto-created if no BYOVS (Bring Your Own Vector Store) is specified
Bring Your Own Vector Store (BYOVS): Aurora pgvector, Redis Enterprise, Pinecone, MongoDB Atlas, OpenSearch Managed Cluster
Automatic document chunking: fixed-size, hierarchical, semantic, or no-chunking strategies
Semantic chunking via FM inference: higher quality, higher ingestion cost
Hierarchical chunking (parent-child): retrieves child chunks, returns parent context to the FM
Metadata filtering at query time: supports equality, contains, and range filters
Source attribution / citations: RetrieveAndGenerate returns source document URIs with each response
RetrieveAndGenerate API (managed RAG): single API call handles retrieval + generation
Retrieve API (retrieval only): returns chunks without calling an FM; use for custom generation pipelines
Integration with Bedrock Agents: agents can query a knowledge base as a built-in action type
Integration with Bedrock Guardrails: apply content filtering and PII redaction to KB responses
Integration with Bedrock Model Evaluation: evaluate retrieval quality and end-to-end RAG pipeline performance
Query reformulation / rewriting: optional FM-based query rewriting to improve retrieval relevance
Hybrid search (semantic + keyword): available with supported vector stores; improves recall for exact-match queries
Re-ranking of retrieved results: optional reranker model to reorder chunks by relevance before passing to the FM
Web crawler data source: crawl and ingest public web pages directly into a knowledge base
Confluence, Salesforce, SharePoint connectors: managed connectors for enterprise content sources
Custom transformation via Lambda (custom chunking): bring your own chunking logic via a Lambda function
CloudWatch metrics and logging: monitor ingestion jobs, retrieval latency, and token usage
AWS CloudTrail integration: all API calls logged for audit and compliance
VPC support / PrivateLink: keep data off the public internet
Encryption at rest and in transit: KMS CMK support for customer-managed keys
Cross-region inference profiles: route inference to multiple regions for resilience
Streaming responses from RetrieveAndGenerate: as of early 2026, streaming is not supported for RetrieveAndGenerate; use direct InvokeModel with streaming for streaming use cases
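The split between the two query APIs above can be sketched as request shapes. These are built as plain dicts (not sent anywhere) so the shapes are easy to inspect; the knowledge base ID and model ARN are placeholders, and with boto3 they would be passed as keyword arguments to the `bedrock-agent-runtime` client.

```python
# Sketch of the two query paths. IDs and the model ARN are placeholders.

# Retrieve: returns chunks only, for custom generation pipelines.
retrieve_request = {
    "knowledgeBaseId": "KBID123456",  # placeholder
    "retrievalQuery": {"text": "What is our PTO policy?"},
    "retrievalConfiguration": {
        "vectorSearchConfiguration": {"numberOfResults": 5}
    },
}

# RetrieveAndGenerate: retrieval + grounded generation in one call.
rag_request = {
    "input": {"text": "What is our PTO policy?"},
    "retrieveAndGenerateConfiguration": {
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KBID123456",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/"
                        "anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
}

# With boto3 these become, e.g.:
#   client = boto3.client("bedrock-agent-runtime")
#   client.retrieve(**retrieve_request)
#   client.retrieve_and_generate(**rag_request)
```

Keeping the requests as dicts also makes it easy to diff the two: only `Retrieve` takes `retrievalConfiguration` at the top level, and only `RetrieveAndGenerate` needs a model ARN.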
S3-Backed Document Corpus
High frequency: Store source documents in S3 and configure S3 as a data source. Trigger sync jobs via EventBridge Scheduler or on-demand. S3 is the most common and exam-relevant data source. Remember: S3 changes do NOT auto-trigger re-ingestion.
Agent with Knowledge Base Action
High frequency: Attach a knowledge base to a Bedrock Agent as a built-in knowledge base action type. The agent automatically decides when to query the KB based on the user's intent — no custom Lambda required for retrieval. This is the primary pattern for agentic RAG.
Safe RAG with Content Filtering
High frequency: Apply Guardrails to RetrieveAndGenerate API calls to filter harmful content, redact PII from retrieved chunks before passing to the FM, and block responses that violate defined policies. Guardrails operate on both the retrieved context and the generated output.
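A guardrail is attached inside the generation configuration of a RetrieveAndGenerate request. The sketch below shows the request shape only (nothing is sent); the knowledge base ID, model ARN, and guardrail ID/version are placeholders you would replace with your own resources.

```python
# RetrieveAndGenerate request with a guardrail on the generation step.
# All IDs are placeholders; the nesting follows the boto3
# bedrock-agent-runtime retrieve_and_generate request shape.
rag_with_guardrail = {
    "input": {"text": "Summarize the latest incident report."},
    "retrieveAndGenerateConfiguration": {
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KBID123456",      # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/"
                        "anthropic.claude-3-haiku-20240307-v1:0",
            "generationConfiguration": {
                "guardrailConfiguration": {
                    "guardrailId": "gr-placeholder",   # placeholder
                    "guardrailVersion": "1",
                }
            },
        },
    },
}
```

Because the guardrail sits in `generationConfiguration`, it governs the grounded generation step; content filtering and PII redaction policies defined on the guardrail then apply to what the caller ultimately sees.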
RAG Pipeline Quality Assessment
High frequency: Use Bedrock Model Evaluation to assess the end-to-end quality of a knowledge base — measuring retrieval relevance, faithfulness (is the answer grounded in retrieved chunks?), and answer correctness. This is NOT traditional ML cross-validation; there is no train/test split.
Operational Observability
Medium frequency: Monitor ingestion job status, retrieval latency, number of retrieved chunks, and FM token consumption via CloudWatch metrics and logs. Set alarms on ingestion failures or high retrieval latency to detect knowledge base health issues proactively.
Managed Vector Store (Default)
Medium frequency: When no BYOVS is specified, Bedrock automatically provisions an OpenSearch Serverless collection as the vector store. This is the lowest-ops option but incurs OpenSearch Serverless OCU costs. Suitable for most enterprise RAG use cases.
Custom Chunking Pipeline
Medium frequency: Provide a Lambda function as a custom transformation step during ingestion to implement proprietary chunking logic (e.g., chunk by section headers, extract tables separately). The Lambda receives raw document content and returns custom chunks.
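The kind of proprietary chunking logic you would plug into such a Lambda can be sketched in isolation. This is only the transformation itself — splitting a document at section headers so each chunk is one coherent section; the actual event/response contract between Bedrock and the Lambda is defined by the service and is not reproduced here.

```python
import re

def chunk_by_section_headers(document: str) -> list[str]:
    """Illustrative chunking logic: split a document at Markdown-style
    section headers so each chunk covers exactly one section. This is
    the custom logic you would wrap in the ingestion Lambda; the Lambda
    event/response shape itself is defined by Bedrock."""
    # Split immediately before every line that starts a header
    # (one to six '#' characters followed by a space).
    parts = re.split(r"(?m)^(?=#{1,6} )", document)
    return [p.strip() for p in parts if p.strip()]

doc = "# Intro\nWelcome.\n## Policy\nPTO is 20 days.\n## Scope\nAll staff."
chunks = chunk_by_section_headers(doc)
# -> ["# Intro\nWelcome.", "## Policy\nPTO is 20 days.", "## Scope\nAll staff."]
```

Header-based chunks tend to retrieve better than fixed-size windows for policy-style documents, because a chunk never straddles two unrelated sections.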
BYOVS with Relational Metadata
Low frequency: Use Aurora PostgreSQL with pgvector as the vector store when you need to combine vector similarity search with relational SQL queries or when you already operate Aurora for other workloads. Reduces operational sprawl vs. adding OpenSearch.
The RetrieveAndGenerate API consumes FM context window tokens for BOTH the retrieved chunks AND your prompt — you cannot use the full model context window (e.g., 200K tokens for Claude 3) exclusively for retrieved content. System prompts, conversation history, and reserved output tokens all reduce available space. Always account for prompt overhead when sizing numberOfResults.
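The budgeting above can be made concrete with a small worked example. All numbers below are illustrative, not real model limits or measured overheads.

```python
# Worked token-budget example (all figures are illustrative).
total_context  = 200_000   # headline context window
system_prompt  = 1_500     # instructions sent on every call
history        = 6_000     # prior conversation turns
user_query     = 200       # the question itself
output_reserve = 4_000     # tokens reserved for the model's answer

usable_for_chunks = (
    total_context - system_prompt - history - user_query - output_reserve
)
print(usable_for_chunks)  # 188300

# If retrieved chunks average ~500 tokens each, the budget caps how
# high numberOfResults can realistically go:
max_chunks = usable_for_chunks // 500
print(max_chunks)  # 376
```

Even with a generous window the usable space is below the headline figure, and with longer histories or bigger system prompts the gap widens, which is why `numberOfResults` must be sized against the net budget, not the advertised context length.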
Knowledge Bases evaluation via Bedrock Model Evaluation is NOT traditional ML model evaluation. There is no train/test data split, no cross-validation, and no concept of overfitting in the classical ML sense. Evaluation measures retrieval relevance (did we get the right chunks?) and faithfulness (is the answer grounded in the chunks?) — these are RAG-specific metrics, not ML training metrics.
S3 data sources do NOT auto-sync. When documents in S3 are added, updated, or deleted, you must explicitly call StartIngestionJob (or schedule it via EventBridge) to update the knowledge base. Exam questions will test whether you know this re-sync is manual/scheduled.
Knowledge Base evaluation is NOT ML model evaluation — no data splitting, no cross-validation, no overfitting. It measures retrieval relevance and answer faithfulness using RAG-specific metrics via Bedrock Model Evaluation.
You CANNOT use the full FM context window for retrieved chunks — system prompt, conversation history, query, and output reservation all reduce the usable token budget. Always calculate: Usable = Total - Overhead.
S3 data sources do NOT auto-sync to Knowledge Bases. You must explicitly trigger StartIngestionJob. Design an automation pattern (EventBridge or Lambda) if near-real-time freshness is required.
When an exam question asks how to scope a knowledge base query to a subset of documents (e.g., 'only search HR documents, not legal documents'), the answer is metadata filtering — attach metadata tags during ingestion and apply filter expressions at query time. Do NOT create separate knowledge bases just to isolate document sets when metadata filtering suffices.
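A filter expression of this kind lives inside the retrieval configuration. The sketch below builds the dicts only; the metadata keys (`department`, `year`) are placeholders for tags you would have attached to documents during ingestion.

```python
# Scope retrieval to HR documents only. The metadata key "department"
# is a placeholder attached to documents at ingestion time.
retrieval_config = {
    "vectorSearchConfiguration": {
        "numberOfResults": 5,
        "filter": {
            "equals": {"key": "department", "value": "hr"}
        },
    }
}

# Filters combine with andAll / orAll, e.g. HR documents from 2024 on:
combined_filter = {
    "andAll": [
        {"equals": {"key": "department", "value": "hr"}},
        {"greaterThanOrEquals": {"key": "year", "value": 2024}},
    ]
}
```

One knowledge base with per-document metadata and query-time filters replaces what would otherwise be several parallel knowledge bases, which keeps ingestion, cost, and permissions management in one place.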
Hierarchical chunking stores BOTH a parent chunk and smaller child chunks. At retrieval time, the system retrieves the most relevant child chunks but returns the parent chunk (with more context) to the FM. This improves answer quality for questions requiring broader context. Contrast with fixed-size chunking which retrieves exactly what was stored.
The Retrieve API (retrieval only, no generation) is the correct choice when you want to use Knowledge Bases as a semantic search engine and handle generation yourself — for example, to apply custom prompt engineering, use a model not supported by RetrieveAndGenerate, or implement streaming responses. Know when to use Retrieve vs. RetrieveAndGenerate.
DynamoDB is NOT a supported vector store for Bedrock Knowledge Bases. Supported options are: Amazon OpenSearch Serverless (default), Amazon Aurora (pgvector), Redis Enterprise Cloud, Pinecone, MongoDB Atlas, and Amazon OpenSearch Managed Cluster. DynamoDB appears as a distractor in exam questions.
You CANNOT change the embedding model after a knowledge base is created. The embedding model is locked at creation time because all stored vectors must be in the same embedding space. To change models, you must create a new knowledge base and re-ingest all documents.
Source attribution (citations) is a built-in feature of RetrieveAndGenerate — the API response includes the source document S3 URI and chunk text for each retrieved passage. This is critical for enterprise compliance and auditability use cases. You do NOT need custom code to implement citations.
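Pulling those URIs out of a response is a short traversal of the `citations` structure. The sample response below is hand-built to mirror the documented shape, not real API output, and the bucket/key names are placeholders.

```python
def extract_sources(rag_response: dict) -> list[str]:
    """Collect source S3 URIs from a RetrieveAndGenerate response.
    Walks the documented 'citations' -> 'retrievedReferences' nesting;
    missing keys are skipped rather than raising."""
    uris = []
    for citation in rag_response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            if uri:
                uris.append(uri)
    return uris

# Hand-built stand-in for an API response (placeholder bucket/key).
sample = {
    "output": {"text": "PTO is 20 days per year."},
    "citations": [{
        "retrievedReferences": [{
            "content": {"text": "Employees accrue 20 PTO days annually."},
            "location": {
                "type": "S3",
                "s3Location": {"uri": "s3://corp-docs/hr/pto-policy.pdf"},
            },
        }]
    }],
}
print(extract_sources(sample))  # ['s3://corp-docs/hr/pto-policy.pdf']
```

Surfacing these URIs alongside the generated answer is usually all that compliance-oriented UIs need: the user sees which document each claim came from without any custom retrieval bookkeeping.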
Bedrock Agents with an attached Knowledge Base automatically decide WHEN to query the KB based on the agent's reasoning — you do not need to manually invoke the KB. The agent uses the KB description you provide to determine relevance. Write clear, specific KB descriptions to improve agent routing accuracy.
Common Mistake
Evaluating a Knowledge Base is like evaluating a trained ML model — you split data into training and test sets, run cross-validation, and check for overfitting.
Correct
Knowledge Bases are retrieval systems, not trained models. There is no training phase, no data splitting, and no overfitting in the classical ML sense. Evaluation focuses on retrieval quality metrics (precision, recall of relevant chunks), faithfulness (is the generated answer supported by retrieved content?), and answer correctness — all via Bedrock Model Evaluation using a curated Q&A evaluation dataset.
This is the #1 conceptual trap for candidates with ML backgrounds. The exam tests whether you understand that RAG evaluation is fundamentally different from supervised learning evaluation. Remember: Knowledge Bases RETRIEVE, they don't TRAIN.
Common Mistake
Increasing the number of retrieved chunks (numberOfResults) always improves answer quality because the FM has more information to work with.
Correct
More retrieved chunks consume more context window tokens, can dilute relevance with loosely related content, and can cause the FM to lose focus or exceed the context window limit entirely. Optimal numberOfResults is a tuning exercise — more is not always better. Quality of retrieved chunks (via better chunking strategy, reranking, or metadata filtering) matters more than quantity.
This misconception directly maps to the exam trap about 'more few-shot examples always improving performance' — the same principle applies to retrieved context. Token budget awareness is a core AIF-C01 competency.
Common Mistake
You can use the full context window of the FM (e.g., 200,000 tokens for Claude 3) for retrieved document chunks in a RetrieveAndGenerate call.
Correct
The effective token budget for retrieved chunks is the model's total context window MINUS tokens consumed by: the system prompt, conversation history, the user query itself, and tokens reserved for the model's output. In practice, the usable space for retrieved content is significantly less than the headline context window size.
Exam questions will present a scenario where a solution 'should work' based on the raw context window size, but fails in practice due to prompt overhead. Always think: Total Context = System Prompt + History + Query + Retrieved Chunks + Output. Budget accordingly.
Common Mistake
Knowledge Bases automatically update when you add or change files in the S3 data source bucket.
Correct
Knowledge Bases do NOT poll S3 for changes. You must explicitly trigger a sync by calling StartIngestionJob. This can be automated by configuring an S3 Event Notification to trigger a Lambda function that calls StartIngestionJob, or by scheduling sync jobs via Amazon EventBridge Scheduler.
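The S3-event-to-sync automation can be sketched as a small Lambda handler. The environment variable names below are our own convention, and the knowledge base / data source IDs would come from your deployment.

```python
import os

def build_sync_params() -> dict:
    """Read the target knowledge base and data source from environment
    variables (the variable names are our own convention)."""
    return {
        "knowledgeBaseId": os.environ["KNOWLEDGE_BASE_ID"],
        "dataSourceId": os.environ["DATA_SOURCE_ID"],
    }

def lambda_handler(event, context):
    """Invoked by an S3 Event Notification (or an EventBridge schedule);
    kicks off a knowledge base sync."""
    # boto3 ships with the Lambda Python runtime; imported here so this
    # sketch parses even where the AWS SDK is not installed.
    import boto3
    client = boto3.client("bedrock-agent")
    job = client.start_ingestion_job(**build_sync_params())
    return {"ingestionJobId": job["ingestionJob"]["ingestionJobId"]}
```

Note that `StartIngestionJob` re-syncs the whole data source incrementally on the service side, so a burst of S3 events can safely be debounced into a single call rather than one sync per object.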
A common architecture mistake and exam trap. Candidates assume 'managed' means 'automatic sync.' The knowledge base will serve stale data until you explicitly re-sync. This is architecturally important for designing up-to-date RAG systems.
Common Mistake
Amazon DynamoDB can be used as the vector store for a Bedrock Knowledge Base.
Correct
DynamoDB does NOT support vector similarity search and is NOT a supported vector store for Bedrock Knowledge Bases. Supported options are Amazon OpenSearch Serverless (default managed), Amazon Aurora with pgvector, Redis Enterprise Cloud, Pinecone, MongoDB Atlas, and Amazon OpenSearch Managed Cluster.
DynamoDB is one of the most widely used AWS database services and appears frequently as a distractor in exam questions about Knowledge Bases. The key differentiator: vector stores require approximate nearest neighbor (ANN) search capabilities, which DynamoDB does not natively provide.
Common Mistake
Bedrock Knowledge Bases performs overfitting detection similar to how ML training pipelines monitor training vs. validation loss curves.
Correct
Knowledge Bases have no concept of overfitting because no model training occurs. The retrieval index is a static representation of your documents. 'Overfitting' in a RAG context would manifest as a knowledge base that retrieves very specific chunks that match training queries perfectly but fails on novel queries — this is addressed through chunking strategy tuning and query rewriting, not regularization or early stopping.
Exam questions may describe symptoms of poor generalization in a RAG system and ask you to identify the cause. The answer is never 'overfitting of the knowledge base' — it's more likely poor chunking, insufficient document coverage, or missing query reformulation.
RICE for RAG evaluation: Relevance (did we retrieve the right chunks?), Integration (are chunks coherent together?), Correctness (is the answer factually right?), Evidence (is the answer supported by retrieved text?) — NOT train/test splits.
SPEC for Knowledge Base creation decisions: Store (which vector store?), Parse (which chunking strategy?), Embed (which embedding model — cannot change later!), Connect (which data source?) — lock these in at design time.
Remember 'S3 is LAZY' — S3 data sources don't push changes to Knowledge Bases. YOU must pull (trigger StartIngestionJob). Lazy S3 = manual sync.
Token Budget Formula: USABLE = TOTAL - SYSTEM_PROMPT - HISTORY - QUERY - OUTPUT_RESERVE. Never assume USABLE = TOTAL.
CertAI Tutor · AIF-C01 · 2026-03-07