
Build, train, tune, deploy, and monitor machine learning models at scale — without managing infrastructure
Amazon SageMaker is a fully managed ML platform that covers the entire machine learning lifecycle: data labeling, feature engineering, model training, hyperparameter tuning, bias detection, model deployment, and production monitoring. It abstracts away infrastructure complexity so data scientists and ML engineers can focus on building models rather than managing servers. SageMaker integrates deeply with the broader AWS ecosystem, making it the central hub for enterprise-grade ML workloads on AWS.
Accelerate the end-to-end machine learning workflow — from raw data to production-grade, monitored, and governed ML models — on a single managed platform
SageMaker Studio (unified IDE)
Successor to classic notebook instances; JupyterLab-based, collaborative
SageMaker Autopilot (AutoML)
Automatically trains and tunes models; provides full visibility into generated code — not a black box
SageMaker Clarify (Bias & Explainability)
Detects pre-training and post-training bias; generates SHAP-based feature attribution reports
SageMaker Feature Store
Centralized repository for ML features; online (real-time) and offline (batch) stores
SageMaker Pipelines (MLOps)
ML-native CI/CD pipeline with lineage tracking, model versioning, and conditional steps
SageMaker Model Registry
Version, approve, and deploy models; integrates with CI/CD for gated deployments
SageMaker Model Monitor
Detects data drift, model quality drift, bias drift, and feature attribution drift in production
SageMaker Ground Truth (Data Labeling)
Human + automated labeling with active learning to reduce labeling costs
SageMaker JumpStart
Pre-trained models and solution templates; one-click deployment of foundation models
SageMaker Canvas (No-Code ML)
Visual, no-code ML for business analysts; uses Autopilot under the hood
Managed Spot Training
Uses EC2 Spot Instances for training; requires S3 checkpointing; saves up to 90% on training costs
Distributed Training (Data/Model Parallel)
SageMaker Distributed Library splits training across multiple GPUs/nodes
Real-Time Inference Endpoints
Persistent endpoints with auto scaling; supports A/B testing via production variants
Serverless Inference
Pay-per-invocation; ideal for intermittent traffic; max 6 GB memory, 200 concurrent invocations
Asynchronous Inference
For large payloads (up to 1 GB) or long processing times; results written to S3
Batch Transform
Offline bulk inference from S3; no persistent endpoint; cost-effective for non-real-time use cases
SageMaker Experiments
Tracks training runs, metrics, and parameters for comparison and reproducibility
SageMaker Debugger
Real-time profiling and debugging of training jobs; detects vanishing gradients, overfitting, etc.
SageMaker Neo (Model Optimization)
Compiles models for edge devices and cloud targets; optimizes inference speed
Amazon Augmented AI (A2I)
Human review workflow for ML predictions that fall below confidence thresholds
SageMaker Data Wrangler
Visual data preparation and feature engineering; 300+ built-in transforms
SageMaker Lineage Tracking
Automatically tracks ML artifact lineage (data → training → model → endpoint)
Multi-Model Endpoints
Host thousands of models on a single endpoint; models loaded/unloaded dynamically
Multi-Container Endpoints
Run multiple containers on one endpoint for different frameworks; reduces cost
SageMaker Role Manager
Simplifies IAM role creation for ML personas (data scientist, MLOps engineer, etc.)
VPC Integration (Private Endpoints)
Training and inference can run in VPC; use VPC endpoints for S3 access without internet
Encryption at Rest and In Transit
KMS encryption for S3 artifacts, EBS volumes, and inter-node training traffic
Training Data and Model Artifact Store
S3 is the universal data lake for SageMaker — training data is read from S3, model artifacts (model.tar.gz) are written to S3, and Batch Transform reads input/writes output to S3. Always use S3 VPC endpoints in production to keep traffic off the public internet.
Custom Fine-Tuning vs. Managed Foundation Models
SageMaker is used to fine-tune foundation models with custom datasets when Bedrock's built-in fine-tuning doesn't meet requirements, or when you need full control over the training process. Bedrock is preferred for API-based access to FMs without infrastructure management. The boundary: custom training → SageMaker; managed FM inference → Bedrock.
Lightweight Inference Trigger
Lambda functions invoke SageMaker real-time endpoints via the SageMaker Runtime API (InvokeEndpoint). This pattern is used for event-driven inference — e.g., an S3 upload triggers Lambda, which calls SageMaker for classification. Lambda cannot host ML models directly for complex inference due to its 15-minute timeout and 10 GB memory limit.
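A minimal sketch of this event-driven pattern: a Lambda handler that receives an S3 event and forwards a payload to a real-time endpoint via `invoke_endpoint`. The endpoint name `my-classifier-endpoint` is a hypothetical placeholder; the client is injected as a parameter so the handler can be unit-tested without AWS access.

```python
import json


def lambda_handler(event, context, runtime_client=None):
    """Triggered by an S3 upload; calls a SageMaker real-time endpoint.

    `runtime_client` is injectable for local testing; inside Lambda it
    defaults to the real sagemaker-runtime boto3 client.
    """
    if runtime_client is None:
        import boto3
        runtime_client = boto3.client("sagemaker-runtime")

    # Build the inference payload from the S3 event record
    record = event["Records"][0]["s3"]
    payload = json.dumps({
        "bucket": record["bucket"]["name"],
        "key": record["object"]["key"],
    })

    # InvokeEndpoint is the SageMaker Runtime API call Lambda uses here
    response = runtime_client.invoke_endpoint(
        EndpointName="my-classifier-endpoint",  # hypothetical endpoint name
        ContentType="application/json",
        Body=payload,
    )
    return json.loads(response["Body"].read())
```

Injecting the client keeps the handler testable and makes it explicit that inference happens on the endpoint, not inside Lambda itself.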
Infrastructure and Training Metrics Monitoring
CloudWatch collects SageMaker training metrics (loss, accuracy emitted via print statements or SDK), endpoint invocation metrics (latency, error rate, invocations), and instance-level metrics (CPU, GPU, memory). CloudWatch CANNOT detect model bias, data drift, or algorithmic fairness — that requires SageMaker Clarify and Model Monitor.
API Audit Logging for Governance
CloudTrail logs all SageMaker API calls (CreateTrainingJob, CreateEndpoint, DeleteModel, etc.) for security auditing, compliance, and change tracking. Required for regulated industries. CloudTrail records WHO did WHAT to SageMaker resources — not model performance or bias.
NLP Pipeline: Custom vs. Pre-Built
Use Amazon Comprehend for pre-built NLP tasks (entity recognition, sentiment, key phrases, topic modeling) without training a model. Use SageMaker when you need a custom NLP model trained on domain-specific data that Comprehend's pre-built models don't handle accurately. SageMaker + Comprehend can be combined: Comprehend pre-processes text, SageMaker handles custom classification.
Bias Detection and Explainability in ML Pipelines
SageMaker Clarify runs as a processing job within SageMaker Pipelines to detect pre-training bias (in data) and post-training bias (in model predictions). It generates bias reports and SHAP-based explainability reports. This is the ONLY AWS-native tool for algorithmic fairness — not CloudWatch, not Comprehend, not GuardDuty.
ETL-to-Training Pipeline
AWS Glue performs data cataloging, ETL transformations, and writes processed data to S3. SageMaker then reads from S3 for training. For feature engineering at scale, SageMaker Data Wrangler (interactive) or SageMaker Processing Jobs (programmatic, using Spark/sklearn) can replace or complement Glue.
Feature Reuse Across Teams and Models
Feature Store centralizes feature computation so multiple teams and models share consistent features. The online store (DynamoDB-backed) serves real-time inference; the offline store (S3-backed with Glue catalog) serves training. Eliminates training-serving skew — a critical production ML problem.
Human-in-the-Loop Review
A2I routes low-confidence ML predictions to human reviewers via Amazon Mechanical Turk, a private workforce, or a vendor workforce. Integrates with SageMaker endpoints, Rekognition, and Textract. Used for responsible AI workflows where high-stakes decisions require human validation.
SageMaker Clarify is the ONLY correct answer for detecting algorithmic bias and model explainability. CloudWatch monitors infrastructure metrics (CPU, latency, errors) — it has ZERO capability to detect bias, fairness issues, or model drift. If an exam question asks how to detect bias in ML predictions, the answer is always Clarify, never CloudWatch.
Know the four inference modes and when to use each: (1) Real-time endpoints — low latency, persistent traffic; (2) Serverless Inference — intermittent traffic, no idle cost, max 6 GB RAM, 200 concurrent; (3) Asynchronous Inference — large payloads up to 1 GB, long processing; (4) Batch Transform — offline bulk inference from S3, no endpoint. Exam scenarios will describe traffic patterns and ask you to choose.
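The decision rules above can be captured as a small helper. This is a study aid, not an AWS API: the function name and parameters are invented here, and the 6 MB payload cutoff is an illustrative threshold (real-time synchronous invocations cap payloads in the single-digit-MB range, while Asynchronous Inference accepts up to 1 GB).

```python
def choose_inference_mode(latency_sensitive: bool, traffic: str,
                          payload_mb: float, long_running: bool) -> str:
    """Map a workload description to one of SageMaker's four inference modes.

    `traffic` is one of: "steady", "intermittent", "offline".
    """
    if traffic == "offline":
        return "Batch Transform"          # bulk S3-to-S3, no persistent endpoint
    if payload_mb > 6 or long_running:
        return "Asynchronous Inference"   # large payloads (up to 1 GB), results to S3
    if traffic == "intermittent" and not latency_sensitive:
        return "Serverless Inference"     # no idle cost; 6 GB RAM / 200 concurrency cap
    return "Real-Time Endpoint"           # persistent, low latency, auto scaling
```

Reading an exam scenario, extract exactly these signals — traffic pattern, payload size, latency need — and the mode falls out.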
Managed Spot Training can reduce training costs by up to 90% but REQUIRES S3 checkpointing. If a training job is interrupted (Spot reclamation), SageMaker resumes from the last checkpoint. Without checkpointing, the entire job restarts. Exam questions about cost optimization for training will test this.
Amazon Augmented AI (A2I) is the correct answer for human review of ML predictions — NOT SageMaker Ground Truth. Ground Truth is for labeling NEW training data. A2I is for reviewing PRODUCTION predictions that fall below a confidence threshold. Confusing these two is a common exam trap.
Post-processing filters (e.g., thresholding outputs, filtering predictions) do NOT fix bias — they mask it. The correct approach is to detect bias with Clarify pre-training (fix the data) or post-training (re-train with balanced data or apply algorithmic debiasing). Exam questions about responsible AI will test whether you know the difference between hiding bias and addressing it.
CloudWatch CANNOT detect bias — only SageMaker Clarify can. If any exam question asks about detecting algorithmic bias, model fairness, or explainability, the answer involves Clarify, never CloudWatch. This single fact can save you multiple questions on AIF-C01.
Know all four inference modes by their traffic pattern: Real-time (persistent, low-latency), Serverless (intermittent, no idle cost), Async (large payload, long processing), Batch Transform (offline bulk from S3, no endpoint). Exam scenarios describe the pattern — you identify the mode.
Post-processing output filters do NOT fix model bias — they hide it. Fixing bias requires addressing the root cause: rebalancing training data, re-weighting samples, or applying algorithmic fairness constraints, then re-training. Clarify detects; data/training changes fix. This is tested heavily on AIF-C01 responsible AI questions.
SageMaker Autopilot is AutoML — it automatically selects algorithms, preprocesses data, and tunes hyperparameters, but it is NOT a black box. It generates the actual Python code for the best pipeline, which you can inspect, modify, and retrain. This distinguishes it from fully opaque AutoML services.
SageMaker Feature Store solves training-serving skew by ensuring the same feature transformations are used in both training (offline store → S3) and inference (online store → DynamoDB). If an exam question describes inconsistent model performance between training and production, Feature Store is the architectural solution.
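To make the online-store write path concrete, here is a small converter into the record shape the Feature Store runtime `PutRecord` API expects (a list of `FeatureName`/`ValueAsString` pairs — all values are sent as strings). The feature names below are hypothetical examples.

```python
def to_feature_record(features: dict) -> list:
    """Convert a plain dict of features into the PutRecord `Record` shape.

    In production this list is passed to the sagemaker-featurestore-runtime
    client, e.g.:
        client.put_record(FeatureGroupName="customers",  # hypothetical group
                          Record=to_feature_record(features))
    """
    return [{"FeatureName": name, "ValueAsString": str(value)}
            for name, value in features.items()]
```

The same feature group backs both stores, so the values written here for real-time inference are the values the offline (S3) store later serves for training — which is exactly how training-serving skew is avoided.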
SageMaker Model Monitor has four monitor types: (1) Data Quality — detects statistical drift in input features; (2) Model Quality — detects accuracy/F1 drift using ground truth labels; (3) Bias Drift — detects fairness metric changes post-deployment; (4) Feature Attribution Drift — detects SHAP value changes. Each requires a separate monitoring schedule.
SageMaker Ground Truth uses active learning: a small human-labeled dataset trains an automated labeling model, which labels high-confidence examples automatically and routes uncertain examples back to humans. This iteratively reduces human labeling work. Exam questions about cost-effective data labeling at scale point to Ground Truth.
Multi-Model Endpoints allow hosting thousands of models on a single endpoint with dynamic loading/unloading. Use this pattern when you have many similar models (e.g., one per customer) and don't need all models loaded simultaneously. This dramatically reduces endpoint costs versus one endpoint per model.
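The key API detail for multi-model endpoints is the `TargetModel` parameter of `invoke_endpoint`, which names the model artifact (relative to the endpoint's S3 model prefix) to load and run. A sketch of the request parameters, with an invented per-customer artifact path:

```python
def mme_invoke_params(endpoint: str, model_artifact: str,
                      payload: bytes) -> dict:
    """Parameters for sagemaker-runtime invoke_endpoint against a
    multi-model endpoint.

    `TargetModel` selects which model.tar.gz to serve; SageMaker loads it
    into memory on first use and may evict cold models under pressure.
    """
    return {
        "EndpointName": endpoint,
        "TargetModel": model_artifact,  # e.g. "customer-123/model.tar.gz" (hypothetical)
        "ContentType": "application/json",
        "Body": payload,
    }
```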
SageMaker JumpStart provides pre-trained foundation models (including open-source LLMs like Llama, Falcon) that can be deployed with one click or fine-tuned. The key distinction from Bedrock: JumpStart models run on your own SageMaker infrastructure (you manage instances); Bedrock is fully managed and serverless.
SageMaker Pipelines (not AWS Step Functions) is the ML-native answer for automating ML workflows with lineage tracking, conditional execution, and model approval gates. Step Functions can orchestrate SageMaker jobs but lacks native ML lineage tracking. For pure ML CI/CD pipelines, Pipelines is the preferred answer.
SageMaker endpoints support production variants for A/B testing — you can split traffic between model versions (e.g., 90% to v1, 10% to v2) on a SINGLE endpoint. You do NOT need separate endpoints for A/B testing. Shadow variants allow testing a new model on real traffic without serving its predictions to users.
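A sketch of the `ProductionVariants` list inside a CreateEndpointConfig request for the 90/10 split described above. SageMaker normalizes the variant weights, so 9.0 and 1.0 produce a 90%/10% split on one endpoint; the model names and instance type here are illustrative.

```python
def ab_test_variants(model_weights: dict) -> list:
    """Build ProductionVariants for CreateEndpointConfig.

    `model_weights` maps model name -> traffic weight. Weights are
    relative: SageMaker routes traffic proportionally to each variant.
    """
    return [
        {
            "VariantName": f"variant-{name}",
            "ModelName": name,
            "InitialInstanceCount": 1,
            "InstanceType": "ml.m5.large",      # illustrative instance type
            "InitialVariantWeight": weight,
        }
        for name, weight in model_weights.items()
    ]
```

Shifting more traffic to the challenger later is just an UpdateEndpointWeightsAndCapacities call — no new endpoint, no client-side changes.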
Common Mistake
CloudWatch can detect algorithmic bias and model fairness issues in SageMaker models
Correct
CloudWatch only monitors infrastructure metrics (CPU utilization, memory, latency, error rates, invocation counts). It has absolutely no capability to analyze model predictions for bias, fairness, or discrimination. SageMaker Clarify is the ONLY AWS service that detects algorithmic bias using statistical fairness metrics.
This is the #1 trap on AIF-C01 and appears frequently in SAA-C03 ML scenario questions. Remember: CloudWatch = infrastructure health; Clarify = model fairness. If a question mentions bias detection or explainability, eliminate any CloudWatch answer immediately.
Common Mistake
Content filtering and bias detection are the same thing — filtering model outputs prevents bias
Correct
Content filtering (e.g., blocking toxic outputs, profanity filters) is a safety mechanism that prevents harmful content from being shown to users. Bias detection (SageMaker Clarify) identifies systematic unfairness in how a model treats different demographic groups. Filtering outputs does NOT address the root cause of bias in training data or model weights — it only hides the symptom.
AIF-C01 specifically tests responsible AI concepts. A model that produces biased loan decisions for minority groups cannot be 'fixed' by filtering — the bias is in the model's learned parameters. The correct fix is data rebalancing, re-weighting, or algorithmic fairness constraints during training.
Common Mistake
Having a large, diverse dataset guarantees a fair and unbiased ML model
Correct
Data volume and diversity are necessary but not sufficient for fairness. A large dataset can still encode historical societal biases (e.g., historical hiring data that discriminated against women). The model will learn and amplify these patterns regardless of dataset size. Bias detection with Clarify must be performed on the data AND the trained model, and fairness metrics must be explicitly measured.
This misconception is explicitly tested on AIF-C01. The correct mental model: more data reduces variance but doesn't eliminate systematic bias. You must actively measure fairness metrics (demographic parity, equalized odds, etc.) — not assume them.
Common Mistake
SageMaker Ground Truth and Amazon A2I (Augmented AI) do the same thing
Correct
Ground Truth is for labeling TRAINING DATA — it creates labeled datasets for model training using human labelers + active learning. A2I is for human review of PRODUCTION PREDICTIONS — it routes low-confidence model outputs to human reviewers for validation before acting on them. These serve completely different stages of the ML lifecycle.
Exam questions describe a scenario (e.g., 'a deployed model returns predictions with low confidence and needs human review') and offer both as options. The temporal context is the key: pre-training → Ground Truth; post-deployment → A2I.
Common Mistake
Amazon Bedrock and SageMaker JumpStart are interchangeable for foundation model deployment
Correct
Bedrock is a fully managed, serverless API for foundation models — you never see or manage any infrastructure, and you cannot access model weights. SageMaker JumpStart deploys foundation models onto SageMaker managed instances (you choose instance type, you pay for running instances) and gives you full access to fine-tune model weights. Bedrock = managed FM API; JumpStart = self-managed FM deployment with fine-tuning access.
As Bedrock grows in prominence (especially on AIF-C01), this distinction is increasingly tested. Key differentiator: if the question mentions 'no infrastructure management' or 'API access to FMs' → Bedrock. If it mentions 'fine-tuning with custom data' or 'deploying open-source LLMs' → JumpStart/SageMaker.
Common Mistake
SageMaker Batch Transform keeps an endpoint running and processes data continuously
Correct
Batch Transform is a job-based, ephemeral inference mode. It spins up compute, reads input from S3, runs inference on all records, writes results to S3, and then TERMINATES all compute resources. There is no persistent endpoint. You are only charged for the duration of the job. It is the cost-optimal choice for non-real-time bulk inference.
Candidates confuse Batch Transform with real-time endpoints. The key signal in exam questions: 'process a large dataset overnight,' 'no real-time requirement,' 'run inference on all historical records' → Batch Transform. 'Low latency,' 'real-time API,' 'persistent' → Real-time endpoint.
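The ephemeral, S3-to-S3 nature of Batch Transform is visible in the shape of a CreateTransformJob request: input location, output location, and job-scoped compute, with no endpoint anywhere. The helper and the instance choice are illustrative; the field names are the actual API parameters.

```python
def batch_transform_request(job_name: str, model_name: str,
                            input_s3: str, output_s3: str) -> dict:
    """Skeleton of a CreateTransformJob request.

    SageMaker provisions TransformResources for the job, runs inference
    over every record under `input_s3`, writes results to `output_s3`,
    then tears the compute down -- you pay only for the job's duration.
    """
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3},
            },
            "ContentType": "application/json",
        },
        "TransformOutput": {"S3OutputPath": output_s3},
        "TransformResources": {
            "InstanceType": "ml.m5.xlarge",  # illustrative instance type
            "InstanceCount": 1,
        },
    }
```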
Common Mistake
SageMaker Autopilot is a black-box AutoML tool that hides how it builds models
Correct
Autopilot is a transparent AutoML service that generates the actual Python code (data transformation + algorithm selection + hyperparameter tuning) for the best pipeline it discovers. You can download, inspect, modify, and retrain this code. This transparency is a key differentiator and a responsible AI feature.
Exam questions about AutoML transparency or explainability will test this. Autopilot's generated notebooks are fully inspectable — this is intentional for governance and regulatory compliance use cases.
Common Mistake
Serverless Inference is always cheaper than real-time endpoints
Correct
Serverless Inference eliminates idle costs (no charge when not invoked) and is cheaper for intermittent, low-volume traffic. However, for sustained high-traffic workloads, real-time endpoints with appropriate instance types are significantly cheaper per inference because the per-invocation pricing of Serverless adds up at scale. Additionally, Serverless has cold start latency which may violate SLAs.
Cost optimization questions will describe traffic patterns. Low/intermittent traffic → Serverless wins. High/sustained traffic → Real-time endpoint wins. Always consider both cost AND latency requirements.
CLARIFY = C-L-A-R-I-F-Y: CloudWatch Lies About Revealing Injustice — Fairness? Yes! (Clarify finds bias; CloudWatch cannot)
Four inference modes — 'Real Servers Are Better': Real-time (persistent), Serverless (intermittent), Async (large payloads), Batch (bulk offline)
Ground Truth vs. A2I: Ground Truth = GROUND (before training, building the foundation with labeled data); A2I = AIR (after deployment, floating above production reviewing live predictions)
Feature Store ONLINE = NOW (real-time inference, DynamoDB-backed, low latency); OFFLINE = LATER (training, S3-backed, batch access)
Bias fix order: Detect (Clarify pre-training) → Fix Data → Retrain → Detect Again (Clarify post-training) → Monitor (Model Monitor bias drift). Never just filter outputs!
CertAI Tutor · AIF-C01, SAA-C03, SAP-C02, DEA-C01, CLF-C02 · 2026-02-22