
Images & video vs. document extraction vs. text understanding — pick the right tool every time
Three AI services that look similar but solve completely different problems
| Feature | **Rekognition**: see and analyze images and video | **Textract**: extract structured text from documents | **Comprehend**: understand meaning inside text |
|---|---|---|---|
| **Primary Input Type** CRITICAL: Comprehend cannot read images. If the question involves a scanned document, Textract extracts the text first, then Comprehend analyzes it. | Images (JPEG, PNG, GIF, WebP) and video (MP4, MOV via S3) | Documents: images and PDFs containing text, forms, tables | Raw UTF-8 text strings (plain text, not images) |
| **Core Capability** Rekognition can detect text in images (DetectText), but it is NOT a replacement for Textract: Textract understands document structure, while Rekognition just spots text pixels. | Computer vision: object detection, facial analysis, scene understanding, content moderation, celebrity recognition, PPE detection | Optical character recognition (OCR) plus structured data extraction: key-value pairs, tables, forms, signatures, queries | Natural language processing (NLP): sentiment, entities, key phrases, language detection, PII detection, topic modeling, custom classification |
| **What It Outputs** | Labels with confidence scores, bounding boxes, face attributes, emotions, landmarks, unsafe-content categories | Structured JSON with WORD, LINE, CELL, KEY_VALUE_SET, TABLE, SIGNATURE, and QUERY blocks and their geometry | Sentiment scores (POSITIVE/NEGATIVE/NEUTRAL/MIXED), entity types (PERSON, LOCATION, DATE…), key phrases, language code, PII entity types |
| **Processing Modes** Multi-page PDFs always require Textract's async APIs; single-image synchronous calls are the default for Rekognition; Comprehend batch jobs use S3 for input/output. | Synchronous (DetectLabels, DetectFaces, etc.) for images; asynchronous (StartLabelDetection, etc.) for video; real-time streaming video via Rekognition Streaming | Synchronous (DetectDocumentText, AnalyzeDocument) for single-page documents; asynchronous (StartDocumentAnalysis, StartDocumentTextDetection) for multi-page PDFs | Synchronous (DetectSentiment, DetectEntities, etc.) for single documents; asynchronous batch jobs (StartSentimentDetectionJob, etc.) for large corpora; real-time endpoints for custom models |
| **Custom / Trainable** All three services offer customization WITHOUT managing ML infrastructure. Never recommend SageMaker custom training when the question is about extending these managed services. | Yes: Custom Labels (train your own image classifier/object detector on your images); Custom Moderation | Yes: Adapters (fine-tune Textract to recognize domain-specific document layouts, e.g., specific insurance forms) | Yes: Custom Classification (multi-class/multi-label) and Custom Entity Recognition trained on your labeled data |
| **Language Support** Comprehend Medical is a separate service for healthcare NLP; do not confuse it with standard Comprehend on clinical-text questions. | Language-agnostic for vision tasks; DetectText supports Latin-script languages | English, German, French, Spanish, Italian, Portuguese, Arabic, Hindi, Japanese, Korean, Chinese (Simplified & Traditional), Russian; varies by feature | 100+ languages for language detection; specific NLP features (sentiment, entities) support a defined subset including EN, ES, FR, DE, IT, PT, AR, HI, JA, KO, ZH |
| **PII / Sensitive Data Handling** For PII redaction from text documents: Textract (extract) → Comprehend (detect/redact PII). This pipeline appears frequently in exam scenarios. | No native PII text detection; can detect faces (which may be considered biometric PII); use with IAM and data-governance controls | No built-in PII detection; extract text first, then pass it to Comprehend or Macie for PII identification | Native PII detection and redaction (DetectPiiEntities, ContainsPiiEntities); can identify SSNs, credit card numbers, emails, phone numbers, addresses, etc. |
| **Key Integrations** | S3 (image/video source), Lambda (event-driven analysis), SNS/SQS (async job notifications), Kinesis Video Streams (real-time video), CloudWatch (metrics), IAM | S3 (document source/output), Lambda, SNS (job completion), A2I (human review), SageMaker (downstream ML), CloudWatch | S3 (batch input/output), Lambda, SageMaker (custom model integration), CloudWatch, EventBridge, Kinesis Data Streams (real-time NLP pipelines) |
| **Amazon Augmented AI (A2I) Support** A2I has built-in task types for Rekognition (content moderation) and Textract (document analysis). This is tested: if the question mentions a 'human review loop' with these services, A2I is the answer. | Yes: A2I human review for content moderation (built-in task type) | Yes: A2I human review for document analysis (built-in task type for forms) | No built-in A2I task type; custom A2I workflows can be built, but they are not native |
| **Streaming / Real-Time** | Yes: Rekognition Streaming Video Events (connected home, surveillance) via Kinesis Video Streams; also real-time face search | No streaming; document-based only | Real-time inference endpoints for custom models; standard APIs are synchronous per document, not streaming |
| **Pricing Model** Comprehend custom endpoints accrue costs even when idle, a common cost-optimization exam trap. Stop or delete endpoints when not in use. | Per image analyzed or per minute of video processed; Custom Labels charged per inference hour; stored faces charged per face per month | Per page processed; pricing varies by API (DetectDocumentText is cheaper than AnalyzeDocument with tables/forms/queries); Adapters carry additional charges | Per unit (100 characters) for synchronous APIs; per unit for async batch; custom-model endpoints charged per inference hour while running |
| **Compliance / FIPS Endpoints** All three support FIPS endpoints for government/regulated workloads. Comprehend does NOT have a FIPS endpoint in us-west-1 (N. California); Rekognition and Textract do. | FIPS 140-2 endpoints available in US East (N. Virginia, Ohio) and US West (N. California, Oregon) | FIPS 140-2 endpoints available in US East (N. Virginia, Ohio) and US West (N. California, Oregon) | FIPS 140-2 endpoints available in US East (N. Virginia, Ohio) and US West (Oregon) |
| **Typical Use Cases** | Content moderation (social media), facial recognition/access control, media asset management, workplace safety (PPE), celebrity identification, video surveillance, e-commerce image tagging | Invoice/receipt processing, mortgage/loan document digitization, medical record extraction, ID document verification, tax form processing, contract analysis | Customer-feedback sentiment analysis, support-ticket routing, news article categorization, compliance document screening, social media monitoring, chatbot intent understanding, PII redaction from text |
| **Responsible AI / Bias Considerations** AIF-C01 exam: managed services like Rekognition, Textract, and Comprehend have built-in responsible-AI features. Do NOT recommend custom implementations when built-in guardrails (A2I, confidence thresholds, PII redaction) already exist. | Facial-analysis bias documented by AWS; AWS restricts law-enforcement use cases; age-range estimation has confidence intervals; follow the AWS Acceptable Use Policy | Accuracy varies with document quality and language; Adapters help with domain-specific accuracy; human review via A2I is recommended for high-stakes documents | Sentiment models may reflect training-data biases; custom-model fairness is the user's responsibility; use SageMaker Clarify for bias detection on custom Comprehend models |
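The Textract output described above (WORD, LINE, KEY_VALUE_SET blocks) can be resolved into plain key-value pairs with a small amount of post-processing. The sketch below follows Textract's documented block format (Id, BlockType, EntityTypes, Relationships), but the block IDs and text are hypothetical sample data, not real API output:

```python
# Sketch: turning Textract AnalyzeDocument KEY_VALUE_SET blocks into a
# plain dict. The block shapes follow Textract's documented response
# format; the IDs and text below are hypothetical sample data.

def blocks_to_kv(blocks):
    """Resolve KEY_VALUE_SET blocks into {key_text: value_text}."""
    by_id = {b["Id"]: b for b in blocks}

    def child_text(block):
        # Gather the WORD children referenced by CHILD relationships.
        words = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                for cid in rel["Ids"]:
                    child = by_id[cid]
                    if child["BlockType"] == "WORD":
                        words.append(child["Text"])
        return " ".join(words)

    result = {}
    for b in blocks:
        if b["BlockType"] == "KEY_VALUE_SET" and "KEY" in b.get("EntityTypes", []):
            key = child_text(b)
            value = ""
            for rel in b.get("Relationships", []):
                if rel["Type"] == "VALUE":
                    for vid in rel["Ids"]:
                        value = child_text(by_id[vid])
            result[key] = value
    return result

# Hypothetical miniature response: one key ("Invoice Total") -> "$120".
sample_blocks = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Invoice"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Total"},
    {"Id": "w3", "BlockType": "WORD", "Text": "$120"},
]

print(blocks_to_kv(sample_blocks))  # {'Invoice Total': '$120'}
```

This is exactly the structure Rekognition's DetectText lacks: there are no KEY/VALUE relationships in its output to resolve.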
Summary
Use Rekognition when your input is an image or video and you need to understand visual content (objects, faces, scenes, unsafe content). Use Textract when you have scanned documents or PDFs and need to extract structured text, forms, tables, or key-value pairs with layout awareness. Use Comprehend when you already have text and need to understand its meaning — sentiment, entities, language, topics, or PII. These services are frequently chained together: Textract extracts → Comprehend understands → results stored in S3/DynamoDB.
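The chained pattern can be sketched as one function. The operation names (detect_document_text, detect_sentiment) are the real boto3 methods, but this is a sketch, not a production implementation: the clients are passed in so the glue logic can be exercised without AWS credentials, and the bucket/key are placeholders.

```python
# Sketch of the Textract -> Comprehend chain for a single-page scanned
# document. Clients are injected (in real use: boto3.client("textract")
# and boto3.client("comprehend")); bucket and key are placeholders.

def sentiment_of_scanned_doc(textract, comprehend, bucket, key):
    """OCR a scanned document, then score the sentiment of its text."""
    # 1. Extract: Textract returns Blocks; LINE blocks carry readable text.
    ocr = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    text = "\n".join(
        b["Text"] for b in ocr["Blocks"] if b["BlockType"] == "LINE"
    )
    # 2. Understand: Comprehend accepts only plain UTF-8 text, never the image.
    nlp = comprehend.detect_sentiment(Text=text, LanguageCode="en")
    return nlp["Sentiment"], text
```

Note the division of labor: Textract never interprets meaning, and Comprehend never touches pixels. For multi-page PDFs, swap the synchronous call for the async Start*/Get* APIs with an SNS completion notification.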
🎯 Decision Tree
- Is the input an image or video? → Rekognition.
- Is the input a scanned document/PDF needing structured extraction? → Textract.
- Is the input already plain text needing NLP analysis? → Comprehend.
- Need to process a scanned invoice for sentiment of customer notes? → Textract THEN Comprehend (pipeline).
- Need human review of low-confidence results? → Add A2I (native for Rekognition content moderation and Textract forms).
- Need to detect PII in extracted document text? → Textract → Comprehend DetectPiiEntities.
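The decision tree above can be written as a tiny routing helper. The input labels ("image", "scanned_document", "text") are this sketch's own shorthand, not AWS terminology:

```python
# The decision tree as a routing helper. Input-kind and need labels are
# this sketch's own shorthand, not AWS terminology.

def route(input_kind, need=None):
    """Map (input kind, need) to the service(s) to call, in order."""
    if input_kind in ("image", "video"):
        # Visual understanding -> Rekognition; a photographed form that
        # needs structured extraction is still a Textract job.
        return ["Textract"] if need == "structured_extraction" else ["Rekognition"]
    if input_kind in ("scanned_document", "pdf"):
        # Comprehend cannot read documents directly: extract, then analyze.
        if need in ("sentiment", "entities", "pii"):
            return ["Textract", "Comprehend"]
        return ["Textract"]
    if input_kind == "text":
        return ["Comprehend"]
    raise ValueError(f"unknown input kind: {input_kind}")

print(route("scanned_document", "pii"))  # ['Textract', 'Comprehend']
```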
CRITICAL — The Textract→Comprehend Pipeline: Comprehend CANNOT process images or PDFs directly. For any question involving 'analyze sentiment/entities/PII in scanned documents or invoices,' the correct architecture is always Textract (extract text) → Comprehend (analyze text). Choosing Comprehend alone on document images is a trap answer.
CRITICAL — Rekognition DetectText ≠ Textract: Rekognition's DetectText API can find text IN images (e.g., street signs, product labels) but has NO understanding of document structure. Textract understands forms, tables, key-value pairs, and multi-page layouts. If the question mentions 'forms,' 'tables,' 'key-value pairs,' or 'invoices,' Textract is always correct over Rekognition.
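The contrast shows up directly in the response shapes. Rekognition DetectText returns a flat TextDetections list (DetectedText, Type, Confidence, geometry) with no form or table semantics; the sample values below are hypothetical:

```python
# Rekognition DetectText returns flat TextDetections (LINE and WORD) with
# geometry and confidence: strings found in pixels, no key-value or table
# structure. Sample values are hypothetical.

def lines_from_detect_text(response, min_confidence=90.0):
    """Collect LINE-level detections above a confidence threshold."""
    return [
        d["DetectedText"]
        for d in response["TextDetections"]
        if d["Type"] == "LINE" and d["Confidence"] >= min_confidence
    ]

sample = {"TextDetections": [
    {"DetectedText": "SPEED LIMIT 55", "Type": "LINE", "Confidence": 99.1},
    {"DetectedText": "SPEED", "Type": "WORD", "Confidence": 99.3},
    {"DetectedText": "EXIT 12", "Type": "LINE", "Confidence": 72.4},
]}

print(lines_from_detect_text(sample))  # ['SPEED LIMIT 55']
```

There is nothing here to tell you which string is a field label and which is its value; that pairing only exists in Textract's KEY_VALUE_SET output.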
CRITICAL — Custom Models Without SageMaker: All three services support customization without managing ML infrastructure. Rekognition Custom Labels, Textract Adapters, and Comprehend Custom Classification/Entity Recognition are the right answers when the question asks to extend these services for domain-specific needs. Never recommend building a custom SageMaker model when these managed customization options exist.
IMPORTANT — A2I Human Review: Amazon Augmented AI (A2I) has BUILT-IN task types for Rekognition (content moderation) and Textract (document analysis). If an exam question asks how to add human review/oversight to these services, A2I is the answer. Comprehend does NOT have a built-in A2I task type.
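The human-review pattern typically gates on confidence: auto-accept high-confidence predictions and escalate the rest. A minimal sketch of that gate, assuming a placeholder flow-definition ARN (in real use the escalation payload would be sent via the a2i-runtime StartHumanLoop API):

```python
# Confidence gating for an A2I human-review loop: accept high-confidence
# predictions automatically, escalate the rest. The flow-definition ARN
# below is a placeholder, not a real resource.
import json

FLOW_DEFINITION_ARN = (
    "arn:aws:sagemaker:us-east-1:123456789012:flow-definition/example"  # placeholder
)

def review_or_accept(prediction, confidence, threshold=80.0):
    """Return an auto-accept result or a human-review escalation payload."""
    if confidence >= threshold:
        return {"action": "accept", "result": prediction}
    return {
        "action": "human_review",
        "human_loop_input": {
            "FlowDefinitionArn": FLOW_DEFINITION_ARN,
            "HumanLoopInput": {"InputContent": json.dumps(
                {"prediction": prediction, "confidence": confidence})},
        },
    }

print(review_or_accept("EXPLICIT_NUDITY", 62.5)["action"])  # human_review
```

The threshold itself is a business decision: lower it and reviewers see less, raise it and more low-confidence moderation or form-extraction results get a human check.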
IMPORTANT — PII Detection Ownership: Only Comprehend has native PII detection and redaction (DetectPiiEntities, ContainsPiiEntities). Rekognition handles biometric data (faces) but not textual PII. Textract extracts text but does not identify PII within it. For GDPR/HIPAA compliance pipelines involving documents, the pattern is Textract → Comprehend → redacted output.
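DetectPiiEntities returns character offsets, so redaction is a string-slicing exercise. The sketch below uses Comprehend's documented entity shape (Type, BeginOffset, EndOffset, Score) on hypothetical sample data:

```python
# Redacting PII using Comprehend DetectPiiEntities offsets. The entity
# shape (Type/BeginOffset/EndOffset/Score) is Comprehend's documented
# format; the sample text and entities are hypothetical.

def redact(text, entities, min_score=0.9):
    """Replace each detected PII span with [TYPE], working right to left
    so earlier offsets stay valid as the string changes length."""
    spans = sorted(
        (e for e in entities if e["Score"] >= min_score),
        key=lambda e: e["BeginOffset"], reverse=True,
    )
    for e in spans:
        text = text[:e["BeginOffset"]] + f"[{e['Type']}]" + text[e["EndOffset"]:]
    return text

sample_text = "Call 555-0100 or mail jo@example.com"
sample_entities = [
    {"Type": "PHONE", "BeginOffset": 5, "EndOffset": 13, "Score": 0.99},
    {"Type": "EMAIL", "BeginOffset": 22, "EndOffset": 36, "Score": 0.98},
]
print(redact(sample_text, sample_entities))  # Call [PHONE] or mail [EMAIL]
```

In the document pipeline, `text` would be the LINE output of Textract; the redacted result is what lands in S3/DynamoDB.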
IMPORTANT — Cost Trap for Comprehend Custom Endpoints: Comprehend real-time inference endpoints for custom models are billed per hour while provisioned, regardless of usage. For cost optimization questions, the answer is to delete or stop the endpoint when not in use, or use async batch jobs instead of real-time endpoints for non-latency-sensitive workloads.
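The trade-off is simple arithmetic. The rates below are made-up placeholders, NOT actual AWS prices; the point is the cost shape, not the numbers:

```python
# Illustrative arithmetic for the endpoint-vs-batch trade-off. The rates
# are made-up placeholders, NOT actual AWS prices.

HOURS_PER_MONTH = 730

def endpoint_monthly_cost(hourly_rate, hours_provisioned=HOURS_PER_MONTH):
    """A real-time endpoint bills for every provisioned hour, busy or idle."""
    return hourly_rate * hours_provisioned

def batch_monthly_cost(rate_per_unit, units_processed):
    """Async batch jobs bill only for the text actually processed."""
    return rate_per_unit * units_processed

# Placeholder rates: $0.50/hr endpoint, $0.0001 per 100-character unit.
idle_endpoint = endpoint_monthly_cost(0.50)        # 365.0 for a month left running
light_batch = batch_monthly_cost(0.0001, 200_000)  # 20.0 for the same workload
print(idle_endpoint, light_batch)
```

With any spiky or low-volume workload, the always-on endpoint dominates the bill, which is why "delete the idle endpoint" or "switch to batch" is the expected answer.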
NICE-TO-KNOW — Comprehend Medical is Separate: Amazon Comprehend Medical is a distinct service optimized for clinical/healthcare NLP (ICD-10-CM, RxNorm, SNOMED CT). Do not confuse with standard Comprehend. Exam questions about extracting diagnoses, medications, or medical conditions from clinical notes should use Comprehend Medical, not standard Comprehend.
NICE-TO-KNOW — Rekognition Streaming for Real-Time Video: Standard Rekognition video analysis is asynchronous (submit to S3, get SNS notification). For REAL-TIME video analysis (surveillance, connected home), Rekognition Streaming Video Events via Kinesis Video Streams is the correct architecture. This distinction appears in architecture design questions.
The #1 exam trap: Choosing Comprehend to analyze text from scanned documents or PDFs without Textract. Comprehend only accepts plain UTF-8 text — it cannot read images or PDFs. The correct pattern always requires Textract first to extract text, then Comprehend to analyze it. A secondary trap is choosing Rekognition's DetectText instead of Textract for structured document processing — Rekognition finds text in images but has zero understanding of document structure, forms, or tables.
CertAI Tutor · AIF-C01, SAA-C03, SAP-C02, CLF-C02 · 2026-02-22