
Images & video vs. document extraction vs. text understanding — pick the right tool every time
Three AI services that look similar but solve completely different problems
| Feature | **Rekognition**: see and analyze images and video | **Textract**: extract structured text from documents | **Comprehend**: understand meaning inside text |
|---|---|---|---|
| **Primary Input Type** CRITICAL: Comprehend cannot read images. If the question involves a scanned document, Textract extracts the text first, then Comprehend analyzes it. | Images (JPEG, PNG, GIF, WebP) and video (MP4, MOV via S3) | Documents: images and PDFs containing text, forms, tables | Raw UTF-8 text strings (plain text, not images) |
| **Core Capability** Rekognition can detect text in images (DetectText), but it is NOT a replacement for Textract: Textract understands document structure, while Rekognition just spots text pixels. | Computer vision: object detection, facial analysis, scene understanding, content moderation, celebrity recognition, PPE detection | Optical character recognition (OCR) plus structured data extraction: key-value pairs, tables, forms, signatures, queries | Natural language processing (NLP): sentiment, entities, key phrases, language detection, PII detection, topic modeling, custom classification |
| **What It Outputs** | Labels with confidence scores, bounding boxes, face attributes, emotions, landmarks, unsafe-content categories | Structured JSON with WORD, LINE, CELL, KEY_VALUE_SET, TABLE, SIGNATURE, and QUERY blocks and their geometry | Sentiment scores (POSITIVE/NEGATIVE/NEUTRAL/MIXED), entity types (PERSON, LOCATION, DATE…), key phrases, language code, PII entity types |
| **Processing Modes** Multi-page PDFs always require Textract's async APIs; single-image synchronous calls are the default for Rekognition; Comprehend batch jobs use S3 for input/output. | Synchronous (DetectLabels, DetectFaces, etc.) for images; asynchronous (StartLabelDetection, etc.) for video; real-time streaming video via Rekognition Streaming | Synchronous (DetectDocumentText, AnalyzeDocument) for single-page documents; asynchronous (StartDocumentAnalysis, StartDocumentTextDetection) for multi-page PDFs | Synchronous (DetectSentiment, DetectEntities, etc.) for single documents; asynchronous batch jobs (StartSentimentDetectionJob, etc.) for large corpora; real-time endpoints for custom models |
| **Custom / Trainable** All three services offer customization WITHOUT managing ML infrastructure. Never recommend SageMaker custom training when the question is about extending these managed services. | Yes: Custom Labels (train your own image classifier/object detector on your images); Custom Moderation | Yes: Adapters (fine-tune Textract to recognize domain-specific document layouts, e.g., specific insurance forms) | Yes: Custom Classification (multi-class/multi-label) and Custom Entity Recognition trained on your labeled data |
| **Language Support** Comprehend Medical is a separate service for healthcare NLP; do not confuse it with standard Comprehend on clinical-text questions. | Language-agnostic for vision tasks; DetectText supports Latin-script languages | English, German, French, Spanish, Italian, Portuguese, Arabic, Hindi, Japanese, Korean, Chinese (Simplified & Traditional), Russian; varies by feature | 100+ languages for language detection; specific NLP features (sentiment, entities) support a defined subset including EN, ES, FR, DE, IT, PT, AR, HI, JA, KO, ZH |
| **PII / Sensitive Data Handling** For PII redaction from text documents: Textract (extract) → Comprehend (detect/redact PII). This pipeline appears frequently in exam scenarios. | No native PII text detection; can detect faces (which may be considered biometric PII); use with IAM and data-governance controls | No built-in PII detection; extract text first, then pass it to Comprehend or Macie for PII identification | Native PII detection and redaction (DetectPiiEntities, ContainsPiiEntities); can identify SSNs, credit card numbers, emails, phone numbers, addresses, etc. |
| **Key Integrations** | S3 (image/video source), Lambda (event-driven analysis), SNS/SQS (async job notifications), Kinesis Video Streams (real-time video), CloudWatch (metrics), IAM | S3 (document source/output), Lambda, SNS (job completion), A2I (human review), SageMaker (downstream ML), CloudWatch | S3 (batch input/output), Lambda, SageMaker (custom model integration), CloudWatch, EventBridge, Kinesis Data Streams (real-time NLP pipelines) |
| **Amazon Augmented AI (A2I) Support** A2I has built-in task types for Rekognition (content moderation) and Textract (document analysis). This is tested: if the question mentions a 'human review loop' with these services, A2I is the answer. | Yes: A2I human review for content moderation (built-in task type) | Yes: A2I human review for document analysis (built-in task type for forms) | No built-in A2I task type; custom A2I workflows can be built, but they are not native |
| **Streaming / Real-Time** | Yes: Rekognition Streaming Video Events (connected home, surveillance) via Kinesis Video Streams; also real-time face search | No streaming; document-based only | Real-time inference endpoints for custom models; standard APIs are synchronous per document, not streaming |
| **Pricing Model** Comprehend custom endpoints accrue costs even when idle, a common cost-optimization exam trap. Stop or delete endpoints when not in use. | Per image analyzed or per minute of video processed; Custom Labels charged per inference hour; stored faces charged per face per month | Per page processed; pricing varies by API (DetectDocumentText is cheaper than AnalyzeDocument with tables/forms/queries); Adapters carry additional charges | Per unit (100 characters) for synchronous APIs; per unit for async batch; custom-model endpoints charged per inference hour while running |
| **Compliance / FIPS Endpoints** All three support FIPS endpoints for government/regulated workloads. Comprehend does NOT have a FIPS endpoint in us-west-1 (N. California); Rekognition and Textract do. | FIPS 140-2 endpoints available in US East (N. Virginia, Ohio) and US West (N. California, Oregon) | FIPS 140-2 endpoints available in US East (N. Virginia, Ohio) and US West (N. California, Oregon) | FIPS 140-2 endpoints available in US East (N. Virginia, Ohio) and US West (Oregon) |
| **Typical Use Cases** | Content moderation (social media), facial recognition/access control, media asset management, workplace safety (PPE), celebrity identification, video surveillance, e-commerce image tagging | Invoice/receipt processing, mortgage/loan document digitization, medical record extraction, ID document verification, tax form processing, contract analysis | Customer-feedback sentiment analysis, support-ticket routing, news article categorization, compliance document screening, social media monitoring, chatbot intent understanding, PII redaction from text |
| **Responsible AI / Bias Considerations** AIF-C01 exam: managed services like Rekognition, Textract, and Comprehend have built-in responsible-AI features. Do NOT recommend custom implementations when built-in guardrails (A2I, confidence thresholds, PII redaction) already exist. | Facial-analysis bias documented by AWS; AWS restricts law-enforcement use cases; age-range estimation has confidence intervals; follow the AWS Acceptable Use Policy | Accuracy varies with document quality and language; Adapters help with domain-specific accuracy; human review via A2I is recommended for high-stakes documents | Sentiment models may reflect training-data biases; custom-model fairness is the user's responsibility; use SageMaker Clarify for bias detection on custom Comprehend models |
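The Textract output described above (WORD, LINE, KEY_VALUE_SET blocks) can be resolved into plain key-value pairs with a small amount of post-processing. The sketch below follows Textract's documented block format (Id, BlockType, EntityTypes, Relationships), but the block IDs and text are hypothetical sample data, not real API output:

```python
# Sketch: turning Textract AnalyzeDocument KEY_VALUE_SET blocks into a
# plain dict. The block shapes follow Textract's documented response
# format; the IDs and text below are hypothetical sample data.

def blocks_to_kv(blocks):
    """Resolve KEY_VALUE_SET blocks into {key_text: value_text}."""
    by_id = {b["Id"]: b for b in blocks}

    def child_text(block):
        # Gather the WORD children referenced by CHILD relationships.
        words = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                for cid in rel["Ids"]:
                    child = by_id[cid]
                    if child["BlockType"] == "WORD":
                        words.append(child["Text"])
        return " ".join(words)

    result = {}
    for b in blocks:
        if b["BlockType"] == "KEY_VALUE_SET" and "KEY" in b.get("EntityTypes", []):
            key = child_text(b)
            value = ""
            for rel in b.get("Relationships", []):
                if rel["Type"] == "VALUE":
                    for vid in rel["Ids"]:
                        value = child_text(by_id[vid])
            result[key] = value
    return result

# Hypothetical miniature response: one key ("Invoice Total") -> "$120".
sample_blocks = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Invoice"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Total"},
    {"Id": "w3", "BlockType": "WORD", "Text": "$120"},
]

print(blocks_to_kv(sample_blocks))  # {'Invoice Total': '$120'}
```

This is exactly the structure Rekognition's DetectText lacks: there are no KEY/VALUE relationships in its output to resolve.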
Summary
Use Rekognition when your input is an image or video and you need to understand visual content (objects, faces, scenes, unsafe content). Use Textract when you have scanned documents or PDFs and need to extract structured text, forms, tables, or key-value pairs with layout awareness. Use Comprehend when you already have text and need to understand its meaning — sentiment, entities, language, topics, or PII. These services are frequently chained together: Textract extracts → Comprehend understands → results stored in S3/DynamoDB.
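The chained pattern can be sketched as one function. The operation names (detect_document_text, detect_sentiment) are the real boto3 methods, but this is a sketch, not a production implementation: the clients are passed in so the glue logic can be exercised without AWS credentials, and the bucket/key are placeholders.

```python
# Sketch of the Textract -> Comprehend chain for a single-page scanned
# document. Clients are injected (in real use: boto3.client("textract")
# and boto3.client("comprehend")); bucket and key are placeholders.

def sentiment_of_scanned_doc(textract, comprehend, bucket, key):
    """OCR a scanned document, then score the sentiment of its text."""
    # 1. Extract: Textract returns Blocks; LINE blocks carry readable text.
    ocr = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    text = "\n".join(
        b["Text"] for b in ocr["Blocks"] if b["BlockType"] == "LINE"
    )
    # 2. Understand: Comprehend accepts only plain UTF-8 text, never the image.
    nlp = comprehend.detect_sentiment(Text=text, LanguageCode="en")
    return nlp["Sentiment"], text
```

Note the division of labor: Textract never interprets meaning, and Comprehend never touches pixels. For multi-page PDFs, swap the synchronous call for the async Start*/Get* APIs with an SNS completion notification.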
🎯 Decision Tree
- Is the input an image or video? → Rekognition.
- Is the input a scanned document/PDF needing structured extraction? → Textract.
- Is the input already plain text needing NLP analysis? → Comprehend.
- Need to process a scanned invoice for sentiment of customer notes? → Textract THEN Comprehend (pipeline).
- Need human review of low-confidence results? → Add A2I (native for Rekognition content moderation and Textract forms).
- Need to detect PII in extracted document text? → Textract → Comprehend DetectPiiEntities.
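The decision tree above can be written as a tiny routing helper. The input labels ("image", "scanned_document", "text") are this sketch's own shorthand, not AWS terminology:

```python
# The decision tree as a routing helper. Input-kind and need labels are
# this sketch's own shorthand, not AWS terminology.

def route(input_kind, need=None):
    """Map (input kind, need) to the service(s) to call, in order."""
    if input_kind in ("image", "video"):
        # Visual understanding -> Rekognition; a photographed form that
        # needs structured extraction is still a Textract job.
        return ["Textract"] if need == "structured_extraction" else ["Rekognition"]
    if input_kind in ("scanned_document", "pdf"):
        # Comprehend cannot read documents directly: extract, then analyze.
        if need in ("sentiment", "entities", "pii"):
            return ["Textract", "Comprehend"]
        return ["Textract"]
    if input_kind == "text":
        return ["Comprehend"]
    raise ValueError(f"unknown input kind: {input_kind}")

print(route("scanned_document", "pii"))  # ['Textract', 'Comprehend']
```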
CRITICAL — The Textract→Comprehend Pipeline: Comprehend CANNOT process images or PDFs directly. For any question involving 'analyze sentiment/entities/PII in scanned documents or invoices,' the correct architecture is always Textract (extract text) → Comprehend (analyze text). Choosing Comprehend alone on document images is a trap answer.
CRITICAL — Rekognition DetectText ≠ Textract: Rekognition's DetectText API can find text IN images (e.g., street signs, product labels) but has NO understanding of document structure. Textract understands forms, tables, key-value pairs, and multi-page layouts. If the question mentions 'forms,' 'tables,' 'key-value pairs,' or 'invoices,' Textract is always correct over Rekognition.
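The contrast shows up directly in the response shapes. Rekognition DetectText returns a flat TextDetections list (DetectedText, Type, Confidence, geometry) with no form or table semantics; the sample values below are hypothetical:

```python
# Rekognition DetectText returns flat TextDetections (LINE and WORD) with
# geometry and confidence: strings found in pixels, no key-value or table
# structure. Sample values are hypothetical.

def lines_from_detect_text(response, min_confidence=90.0):
    """Collect LINE-level detections above a confidence threshold."""
    return [
        d["DetectedText"]
        for d in response["TextDetections"]
        if d["Type"] == "LINE" and d["Confidence"] >= min_confidence
    ]

sample = {"TextDetections": [
    {"DetectedText": "SPEED LIMIT 55", "Type": "LINE", "Confidence": 99.1},
    {"DetectedText": "SPEED", "Type": "WORD", "Confidence": 99.3},
    {"DetectedText": "EXIT 12", "Type": "LINE", "Confidence": 72.4},
]}

print(lines_from_detect_text(sample))  # ['SPEED LIMIT 55']
```

There is nothing here to tell you which string is a field label and which is its value; that pairing only exists in Textract's KEY_VALUE_SET output.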
CRITICAL — Custom Models Without SageMaker: All three services support customization without managing ML infrastructure. Rekognition Custom Labels, Textract Adapters, and Comprehend Custom Classification/Entity Recognition are the right answers when the question asks to extend these services for domain-specific needs. Never recommend building a custom SageMaker model when these managed customization options exist.
IMPORTANT — A2I Human Review: Amazon Augmented AI (A2I) has BUILT-IN task types for Rekognition (content moderation) and Textract (document analysis). If an exam question asks how to add human review/oversight to these services, A2I is the answer. Comprehend does NOT have a built-in A2I task type.
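The human-review pattern typically gates on confidence: auto-accept high-confidence predictions and escalate the rest. A minimal sketch of that gate, assuming a placeholder flow-definition ARN (in real use the escalation payload would be sent via the a2i-runtime StartHumanLoop API):

```python
# Confidence gating for an A2I human-review loop: accept high-confidence
# predictions automatically, escalate the rest. The flow-definition ARN
# below is a placeholder, not a real resource.
import json

FLOW_DEFINITION_ARN = (
    "arn:aws:sagemaker:us-east-1:123456789012:flow-definition/example"  # placeholder
)

def review_or_accept(prediction, confidence, threshold=80.0):
    """Return an auto-accept result or a human-review escalation payload."""
    if confidence >= threshold:
        return {"action": "accept", "result": prediction}
    return {
        "action": "human_review",
        "human_loop_input": {
            "FlowDefinitionArn": FLOW_DEFINITION_ARN,
            "HumanLoopInput": {"InputContent": json.dumps(
                {"prediction": prediction, "confidence": confidence})},
        },
    }

print(review_or_accept("EXPLICIT_NUDITY", 62.5)["action"])  # human_review
```

The threshold itself is a business decision: lower it and reviewers see less, raise it and more low-confidence moderation or form-extraction results get a human check.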
IMPORTANT — PII Detection Ownership: Only Comprehend has native PII detection and redaction (DetectPiiEntities, ContainsPiiEntities). Rekognition handles biometric data (faces) but not textual PII. Textract extracts text but does not identify PII within it. For GDPR/HIPAA compliance pipelines involving documents, the pattern is Textract → Comprehend → redacted output.
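DetectPiiEntities returns character offsets, so redaction is a string-slicing exercise. The sketch below uses Comprehend's documented entity shape (Type, BeginOffset, EndOffset, Score) on hypothetical sample data:

```python
# Redacting PII using Comprehend DetectPiiEntities offsets. The entity
# shape (Type/BeginOffset/EndOffset/Score) is Comprehend's documented
# format; the sample text and entities are hypothetical.

def redact(text, entities, min_score=0.9):
    """Replace each detected PII span with [TYPE], working right to left
    so earlier offsets stay valid as the string changes length."""
    spans = sorted(
        (e for e in entities if e["Score"] >= min_score),
        key=lambda e: e["BeginOffset"], reverse=True,
    )
    for e in spans:
        text = text[:e["BeginOffset"]] + f"[{e['Type']}]" + text[e["EndOffset"]:]
    return text

sample_text = "Call 555-0100 or mail jo@example.com"
sample_entities = [
    {"Type": "PHONE", "BeginOffset": 5, "EndOffset": 13, "Score": 0.99},
    {"Type": "EMAIL", "BeginOffset": 22, "EndOffset": 36, "Score": 0.98},
]
print(redact(sample_text, sample_entities))  # Call [PHONE] or mail [EMAIL]
```

In the document pipeline, `text` would be the LINE output of Textract; the redacted result is what lands in S3/DynamoDB.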
IMPORTANT — Cost Trap for Comprehend Custom Endpoints: Comprehend real-time inference endpoints for custom models are billed per hour while provisioned, regardless of usage. For cost optimization questions, the answer is to delete or stop the endpoint when not in use, or use async batch jobs instead of real-time endpoints for non-latency-sensitive workloads.
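The trade-off is simple arithmetic. The rates below are made-up placeholders, NOT actual AWS prices; the point is the cost shape, not the numbers:

```python
# Illustrative arithmetic for the endpoint-vs-batch trade-off. The rates
# are made-up placeholders, NOT actual AWS prices.

HOURS_PER_MONTH = 730

def endpoint_monthly_cost(hourly_rate, hours_provisioned=HOURS_PER_MONTH):
    """A real-time endpoint bills for every provisioned hour, busy or idle."""
    return hourly_rate * hours_provisioned

def batch_monthly_cost(rate_per_unit, units_processed):
    """Async batch jobs bill only for the text actually processed."""
    return rate_per_unit * units_processed

# Placeholder rates: $0.50/hr endpoint, $0.0001 per 100-character unit.
idle_endpoint = endpoint_monthly_cost(0.50)        # 365.0 for a month left running
light_batch = batch_monthly_cost(0.0001, 200_000)  # 20.0 for the same workload
print(idle_endpoint, light_batch)
```

With any spiky or low-volume workload, the always-on endpoint dominates the bill, which is why "delete the idle endpoint" or "switch to batch" is the expected answer.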
NICE-TO-KNOW — Comprehend Medical is Separate: Amazon Comprehend Medical is a distinct service optimized for clinical/healthcare NLP (ICD-10-CM, RxNorm, SNOMED CT). Do not confuse with standard Comprehend. Exam questions about extracting diagnoses, medications, or medical conditions from clinical notes should use Comprehend Medical, not standard Comprehend.
NICE-TO-KNOW — Rekognition Streaming for Real-Time Video: Standard Rekognition video analysis is asynchronous (submit to S3, get SNS notification). For REAL-TIME video analysis (surveillance, connected home), Rekognition Streaming Video Events via Kinesis Video Streams is the correct architecture. This distinction appears in architecture design questions.
The #1 exam trap: Choosing Comprehend to analyze text from scanned documents or PDFs without Textract. Comprehend only accepts plain UTF-8 text — it cannot read images or PDFs. The correct pattern always requires Textract first to extract text, then Comprehend to analyze it. A secondary trap is choosing Rekognition's DetectText instead of Textract for structured document processing — Rekognition finds text in images but has zero understanding of document structure, forms, or tables.
CertAI Tutor · AIF-C01, SAA-C03, SAP-C02, CLF-C02 · 2026-02-22