
Centralized, configurable content safety and responsible AI controls for foundation models — no custom code required.
Amazon Bedrock Guardrails is a managed safety layer that lets you implement customizable safeguards across all foundation models in Amazon Bedrock, including your own fine-tuned models and those accessed via the Converse API. It detects and blocks harmful content, filters topics, redacts PII, and grounds model responses — all configured through policy without writing custom filtering logic. Guardrails apply consistently at inference time and work across Bedrock Agents and Knowledge Bases, making them the authoritative solution for responsible AI governance in AWS.
Enforce consistent, centralized responsible-AI policies across every foundation model interaction in Bedrock — blocking harmful content, sensitive topics, PII, hallucinations, and prompt injection attacks without building custom filtering infrastructure.
Content Filters (Harmful Category Detection)
Six categories with independently tunable strength levels for both prompts and responses
Denied Topics
Semantic classifier — blocks topics by meaning, not just keywords; up to 30 topics per guardrail
Word Filters (Profanity & Custom Words)
Includes an AWS-managed profanity list plus up to 10,000 custom words/phrases
Sensitive Information / PII Redaction
Built-in PII types plus custom regex; actions: REDACT or BLOCK
Contextual Grounding Checks (Anti-Hallucination)
Scores response against provided source context; configurable threshold
Prompt Attack Detection (Prompt Injection)
Detects jailbreak and prompt injection attempts in user inputs
Apply to Bedrock Agents
Guardrails integrate natively with Bedrock Agents to protect agentic workflows
Apply to Bedrock Knowledge Bases
Filters both the query and the retrieved+generated response in RAG flows
Apply via ApplyGuardrail API (standalone)
Can be called independently of model inference — useful for validating external content or pre-screening user inputs
Model-agnostic (works across all Bedrock FMs)
Works with Anthropic Claude, Amazon Titan, Meta Llama, Mistral, and custom fine-tuned models in Bedrock
Versioning of Guardrail configurations
Guardrails support versioning — you can test a DRAFT version and promote to a numbered version for production
CloudWatch Metrics & Logging
Guardrail interventions are logged and can trigger CloudWatch alarms for compliance monitoring
Cross-Region inference support
Guardrails apply even when using cross-region inference profiles
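The policy types above can all be declared in a single CreateGuardrail request. The following is a minimal sketch, assuming the boto3 "bedrock" control-plane client; the guardrail name, topic, custom word, and messages are placeholder values, and one detail worth noting is that the API spells the "redact" PII action as ANONYMIZE:

```python
# Sketch of a CreateGuardrail request body covering the major policy types.
# All names and example strings are placeholders, not a verified production config.

def build_guardrail_request(name: str) -> dict:
    """Assemble keyword arguments for bedrock.create_guardrail(**request)."""
    return {
        "name": name,
        "description": "Centralized content-safety policy",
        # Content filters: strengths are tuned independently per direction.
        "contentPolicyConfig": {
            "filtersConfig": [
                {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
                {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "MEDIUM"},
                # Prompt-attack detection applies to user inputs only.
                {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
            ]
        },
        # Denied topics: a plain-English definition, evaluated semantically.
        "topicPolicyConfig": {
            "topicsConfig": [{
                "name": "CompetitorProducts",
                "definition": "Discussion of competitor products or pricing.",
                "examples": ["How does AcmeBot compare to your product?"],
                "type": "DENY",
            }]
        },
        # Word filters: managed profanity list plus exact-match custom phrases.
        "wordPolicyConfig": {
            "wordsConfig": [{"text": "internal-codename"}],
            "managedWordListsConfig": [{"type": "PROFANITY"}],
        },
        # PII: the API enum for the redact/mask action is ANONYMIZE.
        "sensitiveInformationPolicyConfig": {
            "piiEntitiesConfig": [
                {"type": "EMAIL", "action": "ANONYMIZE"},
                {"type": "US_SOCIAL_SECURITY_NUMBER", "action": "BLOCK"},
            ]
        },
        # Contextual grounding: block responses scoring below the threshold.
        "contextualGroundingPolicyConfig": {
            "filtersConfig": [{"type": "GROUNDING", "threshold": 0.75}]
        },
        "blockedInputMessaging": "Sorry, I can't help with that request.",
        "blockedOutputsMessaging": "Sorry, I can't provide that response.",
    }

request = build_guardrail_request("demo-guardrail")
# To create it for real (requires AWS credentials):
#   import boto3
#   guardrail = boto3.client("bedrock").create_guardrail(**request)
```

Building the request as a plain dict keeps the policy reviewable and testable before any AWS call is made.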
Grounded RAG with Hallucination Blocking
(high frequency) Attach a Guardrail with contextual grounding enabled to a Knowledge Base retrieval flow. The guardrail scores the model's response against retrieved document chunks and blocks or flags responses that are not grounded in the source material — the primary AWS-native anti-hallucination architecture.
Safe Agentic Workflow Enforcement
(high frequency) Assign a Guardrail to a Bedrock Agent to filter both user instructions and agent-generated actions/responses. This prevents prompt injection attacks from hijacking agent tool calls and blocks harmful outputs from multi-step agentic reasoning chains.
Centralized Policy Across Multiple FMs
(high frequency) Create a single Guardrail and reference it in InvokeModel or Converse API calls across different foundation models (Claude, Titan, Llama, etc.). This ensures consistent content policy enforcement regardless of which model the application uses — critical for multi-model architectures.
Layered NLP Safety (Exam Distractor Pattern)
(high frequency) Comprehend can perform sentiment analysis and entity detection on text, but for generative AI safety in Bedrock, Guardrails is the correct integrated solution. Exam questions may offer Comprehend as an alternative — recognize that Guardrails is purpose-built for FM content safety and does not require custom Comprehend integration.
Compliance Monitoring & Alerting
(medium frequency) Enable Guardrail metrics and model invocation logging to CloudWatch. Create alarms on guardrail intervention rates to detect abuse patterns, policy violations, or prompt injection campaigns in production applications.
Defense-in-Depth PII Protection (Exam Distractor Pattern)
(medium frequency) Macie detects PII in S3 data at rest; Guardrails detects and redacts PII in real-time inference traffic. They solve different problems — Macie is not a substitute for Guardrails in a live chatbot, and Guardrails is not a substitute for Macie in data lake governance.
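The centralized multi-model pattern can be sketched with the Converse API: the same guardrailConfig block is attached regardless of which model handles the request. This is a sketch assuming the boto3 "bedrock-runtime" client; the guardrail ID and version are placeholders:

```python
# Sketch: one guardrail enforced identically across different foundation models
# via the Converse API. "gr-abc123" and version "1" are placeholder values.

def converse_kwargs(model_id: str, user_text: str,
                    guardrail_id: str, guardrail_version: str) -> dict:
    """Keyword arguments for bedrock_runtime.converse(**kwargs)."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        # Identical guardrailConfig no matter which FM is invoked.
        "guardrailConfig": {
            "guardrailIdentifier": guardrail_id,
            "guardrailVersion": guardrail_version,
            "trace": "enabled",  # include intervention details in the response
        },
    }

for model in ("anthropic.claude-3-haiku-20240307-v1:0",
              "amazon.titan-text-express-v1"):
    kwargs = converse_kwargs(model, "Hello!", "gr-abc123", "1")
    # Actual invocation (requires AWS credentials):
    #   import boto3
    #   response = boto3.client("bedrock-runtime").converse(**kwargs)
```

Only modelId changes between calls; the safety policy itself lives in the guardrail, not in per-model application code.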
Guardrails are the ONLY correct answer for content safety in Bedrock — whenever a question asks how to prevent harmful outputs, filter PII, or block topics in a Bedrock application, Guardrails is the service. Do not pick Amazon Comprehend, Amazon Macie, or SageMaker Clarify as the primary solution.
Inference parameters (temperature, top-p, max tokens) control output style and length — they provide ZERO security or content safety guarantees. Any question suggesting inference parameters can prevent harmful content is a trap.
Contextual grounding checks require a SOURCE DOCUMENT to be provided at inference time. They compare the model's response against that document to detect hallucinations. Without a source context, grounding checks cannot function — this is critical for RAG architecture questions.
Guardrails work across ALL Bedrock foundation models including fine-tuned custom models — fine-tuning a model does NOT embed content safety. You still need Guardrails on top of a fine-tuned model for content filtering.
Inference parameters (temperature, top-p) provide ZERO content safety — Bedrock Guardrails is the ONLY correct answer for blocking harmful FM outputs, PII, or off-topic content in Bedrock applications.
Fine-tuning a model does NOT replace Guardrails — fine-tuning shapes behavior on training data, but adversarial user inputs can still elicit harmful outputs. Always layer Guardrails on top of any FM, fine-tuned or not.
Know the PII tool by context: Macie = PII in S3 at rest, Comprehend = NLP entity detection toolkit, Bedrock Guardrails = real-time PII redaction/blocking in FM inference traffic. Never substitute one for another on exam scenarios.
Denied topics use SEMANTIC understanding, not keyword matching. A topic definition written in plain English (e.g., 'Do not discuss competitor products') is evaluated by an ML classifier — you do not need to enumerate every possible keyword variation.
PII handling has TWO distinct actions — REDACT (mask the value with a placeholder, e.g., [NAME]) and BLOCK (reject the entire input or output). Know when each is appropriate: REDACT for logging/analytics use cases, BLOCK for strict compliance scenarios.
Guardrails support VERSIONING — a DRAFT version for testing and numbered versions (e.g., v1, v2) for production. Exam scenarios about safely rolling out policy changes should reference Guardrail versioning, not creating a new guardrail from scratch.
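The DRAFT-then-promote rollout can be sketched as follows, assuming boto3; the helper function, guardrail ID, and pinned version number are hypothetical placeholders:

```python
# Sketch of the DRAFT -> numbered-version rollout for a guardrail policy change.
# guardrail_ref is a hypothetical helper; "gr-abc123" and "1" are placeholders.

def guardrail_ref(guardrail_id: str, stage: str) -> dict:
    """guardrailConfig value for inference calls: DRAFT while testing,
    a pinned numbered version in production."""
    version = "DRAFT" if stage == "test" else "1"
    return {"guardrailIdentifier": guardrail_id, "guardrailVersion": version}

# Promotion step, once the DRAFT passes testing (requires AWS credentials):
#   import boto3
#   bedrock = boto3.client("bedrock")
#   new_version = bedrock.create_guardrail_version(
#       guardrailIdentifier="gr-abc123",
#       description="Tightened violence filter",
#   )["version"]
# Production traffic then pins guardrailVersion to the returned number,
# so later DRAFT edits cannot silently change live behavior.
```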
The standalone ApplyGuardrail API lets you evaluate content WITHOUT invoking a foundation model. This is useful for screening user-uploaded documents or pre-validating inputs before they enter an AI pipeline — a pattern that appears in architecture design questions.
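A standalone ApplyGuardrail call can be sketched like this, assuming the boto3 "bedrock-runtime" client; it also illustrates the point about grounding needing a source document, which is passed with the "grounding_source" qualifier. IDs and text are placeholders:

```python
# Sketch: evaluating content with ApplyGuardrail, with no model invocation.
# The grounding check compares the candidate answer against the supplied
# source document. "gr-abc123" and version "1" are placeholder values.

def apply_guardrail_kwargs(guardrail_id: str, version: str,
                           source_doc: str, query: str, answer: str) -> dict:
    """Arguments for bedrock_runtime.apply_guardrail(**kwargs) on a model OUTPUT."""
    return {
        "guardrailIdentifier": guardrail_id,
        "guardrailVersion": version,
        "source": "OUTPUT",  # evaluate as a model response, not a user prompt
        "content": [
            # Document the grounding score is computed against:
            {"text": {"text": source_doc, "qualifiers": ["grounding_source"]}},
            {"text": {"text": query, "qualifiers": ["query"]}},
            # The text actually being checked:
            {"text": {"text": answer, "qualifiers": ["guard_content"]}},
        ],
    }

kwargs = apply_guardrail_kwargs(
    "gr-abc123", "1",
    source_doc="Policy: refunds are accepted within 30 days of purchase.",
    query="What is the refund window?",
    answer="Refunds are allowed within 90 days.",  # ungrounded claim
)
# Actual call (requires AWS credentials):
#   import boto3
#   result = boto3.client("bedrock-runtime").apply_guardrail(**kwargs)
#   if result["action"] == "GUARDRAIL_INTERVENED":
#       ...  # the ungrounded answer was blocked or flagged
```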
Content filter strength is configured INDEPENDENTLY for inputs (prompts) and outputs (responses). A question about blocking violent content in user prompts but allowing it in model responses (e.g., for a creative writing app) is answered by setting different strength levels per direction.
Common Mistake
Inference parameters like temperature=0 or restrictive system prompts are sufficient security controls to prevent harmful AI outputs.
Correct
Inference parameters control output randomness and style — they are not safety mechanisms. A low temperature makes outputs more deterministic, not safer. System prompts can be bypassed by prompt injection. Only Bedrock Guardrails provides enforceable, policy-based content controls that cannot be overridden by user input.
This is the #1 trap on AIF-C01. Candidates conflate 'controlling the model' with 'securing the model.' Remember: inference parameters = quality knobs, Guardrails = security controls.
Common Mistake
Fine-tuning a foundation model on clean, curated data eliminates the need for content filtering guardrails.
Correct
Fine-tuning adapts a model's knowledge and style for a domain — it does not remove the model's ability to generate harmful content when prompted adversarially. Guardrails must be applied at inference time regardless of whether the underlying model was fine-tuned, because user inputs are unpredictable.
Exam questions often present fine-tuning as a complete responsible-AI solution. It is not. Fine-tuning + Guardrails is the correct pattern — they are complementary, not substitutes.
Common Mistake
Amazon Macie or Amazon Comprehend should be used to detect and redact PII in Bedrock chatbot responses.
Correct
Amazon Macie is designed to discover and protect PII stored in Amazon S3 — it does not process real-time inference traffic. Amazon Comprehend can detect entities in text but requires custom integration. Amazon Bedrock Guardrails provides native, zero-integration PII detection and redaction directly in the inference path — the correct choice for any Bedrock application.
AWS exams frequently list Macie and Comprehend as distractors for Bedrock PII scenarios. The key differentiator: Guardrails is in-line and real-time; Macie is batch/at-rest; Comprehend requires custom plumbing.
Common Mistake
SageMaker Clarify can be used to evaluate foundation model outputs for bias and fairness in real-time Bedrock applications.
Correct
SageMaker Clarify is designed for bias detection in traditional ML models and structured tabular datasets — it is not integrated with Bedrock foundation models for real-time output evaluation. For evaluating FM outputs, use Amazon Bedrock Model Evaluation. For preventing harmful outputs at inference time, use Bedrock Guardrails.
The Clarify vs. Bedrock Guardrails vs. Model Evaluation triangle is a common confusion zone on AIF-C01. Map them: Clarify = traditional ML bias, Model Evaluation = FM quality assessment, Guardrails = FM runtime safety.
Common Mistake
Denied topics in Guardrails work like a keyword blocklist — you must list every word or phrase related to the topic.
Correct
Denied topics use a semantic ML classifier trained on your topic definition (a plain-English description plus optional sample phrases). You describe the CONCEPT, not every possible phrasing. The classifier understands paraphrases, synonyms, and indirect references — making it far more robust than keyword filtering.
This misconception leads candidates to underestimate Guardrails' sophistication and overestimate the effort required to configure it. It also distinguishes Guardrails from simple word filters (which ARE exact-match) — know the difference between the two features.
Common Mistake
Provisioned Throughput in Bedrock is used for model evaluation and testing guardrail configurations.
Correct
Provisioned Throughput is a billing/capacity model for production inference workloads requiring consistent, low-latency performance. It has nothing to do with evaluation or guardrail testing. Use the DRAFT version of a Guardrail with on-demand inference for testing; use Bedrock Model Evaluation for systematic FM assessment.
Provisioned Throughput sounds like it could be related to 'throughput testing' of guardrails — it is not. It is purely a capacity reservation for production inference.
GUARD = Govern harmful content, Understand topics semantically, Anonymize PII, Reject prompt injections, Detect hallucinations — the five jobs of Bedrock Guardrails.
Think of Guardrails as a BOUNCER at the door of your FM: it checks everyone coming IN (prompts) and going OUT (responses), and it doesn't care which band (model) is playing inside.
Macie = S3 at REST, Comprehend = NLP toolkit, Guardrails = Bedrock REAL-TIME SAFETY. Three different tools, three different jobs — never swap them on the exam.
CertAI Tutor · AIF-C01 · 2026-03-07