
Fully managed computer vision AI that finds faces, objects, text, and unsafe content in images and video — no ML expertise required.
Amazon Rekognition is a fully managed deep learning-based computer vision service that analyzes images and videos to detect objects, scenes, faces, text, celebrities, and inappropriate content. It offers two core APIs — Rekognition Image for synchronous analysis and Rekognition Video for asynchronous, streaming, or stored video analysis — with no infrastructure to manage. Rekognition Custom Labels extends the service by allowing you to train custom models on your own labeled images using AutoML, without writing a single line of ML code.
Automate visual content moderation, identity verification, searchable media archives, and workplace safety monitoring at scale without building or managing ML models.
Use When
Avoid When
Object and Scene Detection (DetectLabels)
Returns labels with confidence scores and bounding boxes for 3,000+ object and scene categories
Face Detection (DetectFaces)
Detects face attributes: age range, emotions, gender, glasses, smile, pose, quality, landmarks
Face Comparison (CompareFaces)
Compares two face images and returns similarity score; used for 1:1 identity verification
Face Search / Collections (SearchFacesByImage)
1:N face search against an indexed collection; used for identity lookup at scale
Celebrity Recognition (RecognizeCelebrities)
Identifies thousands of celebrities with name and IMDb/Wikipedia links
Content Moderation (DetectModerationLabels)
Hierarchical taxonomy of unsafe content: nudity, violence, hate symbols, drugs, etc.
Text in Image Detection (DetectText)
Detects and reads printed and handwritten text in images (not a replacement for Textract on documents)
Custom Labels (AutoML)
Train custom object/scene classifiers on your own labeled images with no ML code
PPE Detection
Detects personal protective equipment (hard hats, face covers, hand covers) on persons in images
Asynchronous Video Analysis (Start/Get APIs)
Label detection, face detection, content moderation, celebrity recognition, text detection on stored S3 video
Streaming Video Analysis (Kinesis Video Streams)
Real-time face search against collections on live video streams
Segment Detection (StartSegmentDetection)
Detects technical cues (black frames, end credits, slates) and shot changes in stored video
Image Properties (DetectLabels with the IMAGE_PROPERTIES feature)
Returns dominant colors, image quality, and foreground/background analysis
GDPR / data privacy controls
Face collections can be deleted; no training on customer data without explicit opt-in
Audio analysis
Rekognition processes only visual content (pixels). Use Amazon Transcribe for audio/speech.
PDF / multi-page document processing
Use Amazon Textract for structured document extraction
Real-time object detection on streaming video (non-face)
Streaming Video via KVS only supports face search. Label/object detection on streaming requires custom architecture (SageMaker + KVS)
Event-driven image moderation pipeline
High frequency: An S3 PutObject event triggers a Lambda function that calls Rekognition DetectModerationLabels. If confidence exceeds your threshold, Lambda publishes to SNS to alert moderators or auto-quarantine the object. This is the canonical serverless content-moderation architecture.
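The Lambda side of this pipeline can be sketched as follows, a minimal example assuming an S3 event trigger; the SNS topic ARN and the 80% threshold are placeholder assumptions, not fixed values:

```python
import json
from urllib.parse import unquote_plus

# Hypothetical application threshold -- Rekognition returns confidence
# scores; deciding what to act on is your job.
MODERATION_THRESHOLD = 80.0

def flagged_labels(moderation_labels, threshold=MODERATION_THRESHOLD):
    """Return the moderation label names at or above the threshold."""
    return [l["Name"] for l in moderation_labels if l["Confidence"] >= threshold]

def handler(event, context):
    # boto3 is imported inside the handler so the module stays importable
    # outside a Lambda runtime; it is preinstalled in Lambda itself.
    import boto3
    rekognition = boto3.client("rekognition")
    sns = boto3.client("sns")

    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = unquote_plus(record["object"]["key"])  # S3 event keys are URL-encoded

    resp = rekognition.detect_moderation_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=50,  # let the API pre-filter; the final decision is ours
    )
    hits = flagged_labels(resp["ModerationLabels"])
    if hits:
        sns.publish(
            TopicArn="arn:aws:sns:us-east-1:123456789012:moderation-alerts",  # placeholder
            Message=json.dumps({"bucket": bucket, "key": key, "labels": hits}),
        )
    return {"flagged": bool(hits)}
```

Keeping the threshold in application code (not hard-coded into the API call) is what enables the human-in-the-loop review band described later: e.g., auto-quarantine above 95%, route 50-95% to reviewers.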
Real-time face recognition on live video
High frequency: A camera streams video to Kinesis Video Streams. The Rekognition Streaming Video processor performs face search against a pre-indexed collection in near real time. Matches are written to DynamoDB for access logging or alerting. Used in security/surveillance and employee-timekeeping scenarios.
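The setup step can be sketched in Python: the request shape below follows the CreateStreamProcessor API, while every ARN, name, and the 85% match threshold are placeholder assumptions:

```python
def stream_processor_request(name, kvs_arn, kds_arn, role_arn,
                             collection_id, threshold=85.0):
    """Build a CreateStreamProcessor request for face search on a live
    Kinesis Video Stream. All ARNs and names here are placeholders."""
    return {
        "Name": name,
        "Input": {"KinesisVideoStream": {"Arn": kvs_arn}},
        "Output": {"KinesisDataStream": {"Arn": kds_arn}},
        "RoleArn": role_arn,
        # Face search against a pre-indexed collection -- the streaming
        # use case the exams focus on.
        "Settings": {
            "FaceSearch": {
                "CollectionId": collection_id,
                "FaceMatchThreshold": threshold,
            }
        },
    }

# Usage (requires boto3, real streams, and an IAM role):
# import boto3
# rek = boto3.client("rekognition")
# rek.create_stream_processor(**stream_processor_request(
#     "lobby-cam", kvs_arn, kds_arn, role_arn, "employees"))
# rek.start_stream_processor(Name="lobby-cam")
```

Match records land on the output Kinesis Data Stream; a consumer (e.g., Lambda) would then write them to DynamoDB.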
Intelligent document + image understanding pipeline
High frequency: Rekognition detects and crops relevant image regions (e.g., ID card photo, signature area). Textract extracts structured text from the document. Comprehend analyzes extracted text for entities or PII. Together these three services cover the full spectrum of document intelligence — a common multi-service exam scenario.
Custom Labels vs. SageMaker decision boundary
High frequency: Use Rekognition Custom Labels when you need a custom computer vision model with minimal ML expertise, small-to-medium datasets, and AutoML convenience. Use SageMaker when you need full control over model architecture, hyperparameter tuning, custom training loops, or very large datasets. The exam tests this decision boundary frequently.
Multimodal media analysis pipeline
High frequency: Rekognition analyzes video frames for visual content (labels, faces, text). Transcribe converts the audio track to text. Translate localizes the transcript. Comprehend performs sentiment/entity analysis on the text. Each service handles its modality — the exam tests knowing which service owns which data type.
Vision + generative AI augmentation
Medium frequency: Rekognition provides structured metadata (labels, bounding boxes, confidence scores) from images. Bedrock multimodal models (e.g., Claude, Titan) can then generate natural language descriptions, summaries, or Q&A responses about that visual content. Rekognition handles the structured detection; Bedrock handles open-ended generative reasoning.
Async video analysis orchestration
Medium frequency: Step Functions orchestrates a workflow: start a Rekognition video job (StartLabelDetection), wait for completion via SNS/SQS callback, retrieve results (GetLabelDetection), store structured metadata in DynamoDB, and archive processed video to S3 Glacier. This pattern handles the async nature of the Rekognition Video APIs.
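The start/poll/get cycle can be sketched in Python with an injected boto3 Rekognition client. This is a simplified polling loop for illustration; a production design would prefer the SNS completion notification (or a Step Functions wait state), and would page through results with NextToken:

```python
import time

def start_and_wait_label_detection(rekognition, bucket, key,
                                   poll_seconds=5, timeout=600):
    """Start an async label-detection job on a stored S3 video and poll
    until it finishes. Sketch only -- real workflows should react to the
    SNS notification instead of sleeping in a loop."""
    job = rekognition.start_label_detection(
        Video={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    job_id = job["JobId"]  # async: all you get back immediately is a JobId
    waited = 0
    while waited < timeout:
        resp = rekognition.get_label_detection(JobId=job_id)
        if resp["JobStatus"] in ("SUCCEEDED", "FAILED"):
            return resp  # first page of results; follow NextToken for the rest
        time.sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError(f"Job {job_id} did not finish within {timeout}s")
```

Calling GetLabelDetection immediately after StartLabelDetection and expecting results is exactly the anti-pattern the Common Mistake section below warns about.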
Rekognition is a VISUAL service only — it processes pixels (images and video frames). It has zero capability for audio analysis, natural language processing, document layout understanding, or PII detection in text. When an exam question asks about detecting PII, use Comprehend. When it asks about document text extraction, use Textract. When it asks about audio, use Transcribe.
Rekognition Custom Labels hosted endpoints are NOT serverless — they are billed per inference unit per hour while running. You must call StartProjectVersion to start and StopProjectVersion to stop. Standard Rekognition APIs (DetectLabels, etc.) ARE serverless pay-per-call. Exam scenarios about cost optimization for custom vision models will test whether you know to stop idle Custom Labels endpoints.
For face COMPARISON (1:1 verification — 'is this person the same as this photo?'), use CompareFaces. For face SEARCH (1:N identification — 'who is this person from my database of 10,000?'), use IndexFaces + SearchFacesByImage with a Face Collection. The exam distinguishes verification vs. identification scenarios.
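The two call shapes can be sketched in Python (boto3 client injected); the 90% thresholds, collection name, and use of ExternalImageId as the person identifier are illustrative assumptions:

```python
def verify_identity(rekognition, source_bytes, target_bytes, threshold=90.0):
    """1:1 verification (CompareFaces): is the face in source the same
    person as the face in target?"""
    resp = rekognition.compare_faces(
        SourceImage={"Bytes": source_bytes},
        TargetImage={"Bytes": target_bytes},
        SimilarityThreshold=threshold,
    )
    return len(resp["FaceMatches"]) > 0

def identify_person(rekognition, collection_id, image_bytes, threshold=90.0):
    """1:N identification (SearchFacesByImage) against a collection whose
    faces were added earlier via IndexFaces."""
    resp = rekognition.search_faces_by_image(
        CollectionId=collection_id,
        Image={"Bytes": image_bytes},
        FaceMatchThreshold=threshold,
        MaxFaces=1,
    )
    matches = resp["FaceMatches"]
    return matches[0]["Face"]["ExternalImageId"] if matches else None
```

The function shapes mirror the exam distinction: verification returns yes/no, identification returns who.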
Rekognition Video APIs are ASYNCHRONOUS. You call Start* (e.g., StartLabelDetection), receive a JobId, and then either poll GetLabelDetection or configure an SNS topic to receive a completion notification. You cannot get results synchronously for stored video. Design patterns must account for this async workflow (SNS + SQS or Step Functions polling).
Rekognition is PIXELS ONLY — no audio, no NLP, no document structure. PII in text → Comprehend. Speech → Transcribe. Form extraction → Textract. Visual content → Rekognition.
Custom Labels endpoints are billed per-hour (not per-call) and must be explicitly started/stopped. A scenario about reducing Custom Labels costs = stop idle models with a scheduled Lambda.
Video APIs are always ASYNC (Start* → SNS notification → Get*). Image APIs are always SYNC. Streaming Video (KVS) only supports face search — NOT label or object detection.
Streaming Video analysis via Kinesis Video Streams ONLY supports face search (SearchFaces). It does NOT support real-time label/object detection or content moderation on live streams. If an exam scenario needs real-time object detection on a live stream, the answer involves SageMaker or a custom solution — NOT Rekognition streaming.
Image format support is JPEG and PNG only. If a scenario involves GIF, TIFF, HEIC, WebP, or BMP images, a conversion step (e.g., Lambda using PIL/Pillow, or AWS Elemental MediaConvert for video) must be included in the architecture before calling Rekognition.
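The conversion step can be sketched with Pillow; this is a minimal example, and HEIC specifically would additionally require registering the pillow-heif plugin, which is an assumption beyond stock Pillow:

```python
import io

def to_jpeg_bytes(image_bytes, quality=90):
    """Re-encode an arbitrary raster image (PNG, GIF, BMP, WebP, TIFF...)
    as JPEG bytes that Rekognition accepts. Requires Pillow."""
    from PIL import Image
    img = Image.open(io.BytesIO(image_bytes))
    if img.mode not in ("RGB", "L"):
        img = img.convert("RGB")  # JPEG has no alpha/palette support
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    return buf.getvalue()
```

In the architectures above, this would run inside the Lambda between the S3 event and the Rekognition call.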
Content moderation results include a hierarchical taxonomy (e.g., 'Explicit Nudity' → 'Graphic Female Nudity'). The API returns confidence scores, and YOU set the confidence threshold in your application logic — Rekognition does not auto-block content. This is important for designing human-in-the-loop review workflows.
When comparing Rekognition Custom Labels vs. SageMaker: choose Custom Labels for minimal ML expertise, AutoML convenience, and small/medium labeled datasets. Choose SageMaker when you need custom model architectures, full hyperparameter control, large datasets, or integration with MLOps pipelines. The keyword 'no ML expertise' in exam scenarios points to Custom Labels.
The 5 MB (raw bytes) vs. 15 MB (S3 reference) image size limit is a frequently tested trap. Best practice: always store images in S3 and pass the S3 object reference to Rekognition APIs rather than embedding image bytes in the request payload.
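The S3-reference call shape looks like this in Python (client injected for testability); the bucket, key, and MinConfidence values are placeholders:

```python
def detect_labels_s3(rekognition, bucket, key, max_labels=10, min_conf=75.0):
    """Call DetectLabels with an S3 object reference instead of inline
    bytes, which raises the size ceiling from 5 MB to 15 MB and keeps
    large payloads out of the request."""
    resp = rekognition.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,
        MinConfidence=min_conf,
    )
    return [(l["Name"], l["Confidence"]) for l in resp["Labels"]]
```

The same `Image={"S3Object": ...}` shape works across the image APIs (DetectFaces, DetectModerationLabels, DetectText, etc.).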
Rekognition integrates with AWS IAM for access control. Face collections are regional resources — a collection created in us-east-1 cannot be queried from us-west-2. Architecture designs for global face search require either replication of face collections or routing to the correct region.
Common Mistake
Amazon Rekognition can detect PII (Personally Identifiable Information) in images and documents, making it a data security tool.
Correct
Rekognition detects VISUAL content — faces, objects, text pixels in images. It cannot semantically understand that a string of digits is a Social Security Number or that a name is PII. For PII detection in text, use Amazon Comprehend (DetectPiiEntities). For PII discovery in S3 data stores, use Amazon Macie.
Exam questions frequently present scenarios about 'detecting sensitive data' and include Rekognition as a distractor. The key signal: if the scenario involves text/data semantics → Comprehend/Macie. If it involves visual pixels → Rekognition.
Common Mistake
Rekognition's content moderation feature can detect prompt injection attacks or adversarial AI content, making it useful for AI safety guardrails.
Correct
Rekognition content moderation detects visual categories of unsafe content (nudity, violence, hate symbols, drugs) in images and video. It has no capability to analyze text semantics, detect malicious prompts, or identify AI-generated adversarial inputs. For prompt injection defense, use Amazon Bedrock Guardrails.
The AIF-C01 exam specifically tests AI safety tooling. Rekognition is a distractor in AI safety scenarios. Remember: Rekognition = visual safety (pixels). Bedrock Guardrails = AI safety (prompts and text).
Common Mistake
Rekognition Custom Labels works the same way as standard Rekognition APIs — pay-per-call, always available, no management needed.
Correct
Custom Labels requires you to explicitly START a model endpoint (StartProjectVersion) before inference and STOP it (StopProjectVersion) when done. You are billed per inference unit per hour while the model is running, regardless of whether you make API calls. Leaving a Custom Labels model running 24/7 with low traffic is a significant cost waste.
This is a critical cost optimization trap on SAA-C03 and SAP-C02. The exam will describe a scenario with an idle Custom Labels model and ask how to reduce costs — the answer is to stop the model when not in use, potentially using EventBridge + Lambda to schedule start/stop.
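The scheduled start/stop action can be sketched as a small helper that an EventBridge-triggered Lambda would call; the project-version ARN and inference-unit count are placeholders:

```python
def set_model_state(rekognition, project_version_arn, action, inference_units=1):
    """Start or stop a Custom Labels model endpoint. Billing accrues per
    inference unit per hour the whole time the model is RUNNING, whether
    or not anyone calls it."""
    if action == "start":
        rekognition.start_project_version(
            ProjectVersionArn=project_version_arn,
            MinInferenceUnits=inference_units,
        )
    elif action == "stop":
        rekognition.stop_project_version(ProjectVersionArn=project_version_arn)
    else:
        raise ValueError(f"unknown action: {action}")
    return action

# Usage from a Lambda handler (boto3 is preinstalled in Lambda):
# import boto3
# set_model_state(boto3.client("rekognition"),
#                 event["project_version_arn"], event["action"])
```

Two EventBridge schedules (one "start" before business hours, one "stop" after) invoking this Lambda is the pattern the exam expects.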
Common Mistake
Rekognition Video APIs return results synchronously, just like Rekognition Image APIs.
Correct
Rekognition Video APIs for stored video (StartLabelDetection, StartFaceDetection, etc.) are fully asynchronous. You submit a job, receive a JobId, and must either poll GetLabelDetection or configure an SNS topic for completion notification. Only Rekognition Image APIs are synchronous.
Architectures that try to call StartLabelDetection and immediately call GetLabelDetection will fail because the job won't be complete. The correct pattern uses SNS notification → SQS → Lambda → GetLabelDetection, or Step Functions with a wait state.
Common Mistake
Amazon Rekognition Streaming Video can detect any type of object or label in real-time from a live video feed.
Correct
Rekognition Streaming Video (via Kinesis Video Streams) ONLY supports face search against a pre-indexed face collection. Real-time label detection, content moderation, or PPE detection on live streams is NOT supported through the streaming API. These capabilities only work on stored video (async) or images (sync).
Exam scenarios about real-time surveillance often ask about detecting specific objects (weapons, vehicles) on live streams. Rekognition streaming cannot do this — the answer would involve SageMaker real-time inference with a custom model connected to KVS.
Common Mistake
Rekognition's DetectText API is equivalent to Amazon Textract and can extract structured data from forms and tables.
Correct
Rekognition DetectText detects and reads text that appears visually in an image (e.g., a street sign, a watermark, a license plate). It returns raw text strings and bounding boxes. It has NO understanding of document structure, form fields, table cells, or key-value pairs. Amazon Textract is purpose-built for structured document understanding with AnalyzeDocument (forms and tables).
Both services 'read text,' but the use cases are completely different. Rekognition DetectText = text in natural scenes. Textract = structured documents. Exam scenarios about processing invoices, tax forms, or medical records → Textract. Scenarios about reading text on physical objects in photos → Rekognition.
REKOGNITION = RETINA: It only sees what's in the PICTURE. No ears (audio → Transcribe), no brain for language (NLP → Comprehend), no document reader (layout → Textract). If it's not a pixel, it's not Rekognition's job.
Custom Labels = CUSTOM CHEF: You have to TURN ON the kitchen (StartProjectVersion) before cooking and TURN IT OFF when done (StopProjectVersion) — or you pay for a running kitchen with no customers.
Face COMPARE (1:1) = 'Is THIS you?' | Face SEARCH (1:N) = 'WHO are you?' — Compare two photos, Search a database.
Video = ASYNC always. Image = SYNC always. Remember: Videos take time to process; you can't hold the phone line open waiting.
CertAI Tutor · SAA-C03, SAP-C02, AIF-C01, CLF-C02 · 2026-02-22