
Neural machine translation that makes your applications speak every language — instantly and at scale.
Amazon Translate is a neural machine translation (NMT) service that delivers fast, high-quality, and affordable language translation. It supports real-time and batch translation across a wide range of language pairs, enabling developers to build multilingual applications without managing translation infrastructure. Translate integrates natively with the broader AWS AI/ML ecosystem, making it a key component in pipelines involving Comprehend, Transcribe, Textract, and Bedrock.
Automate high-quality text translation across dozens of languages for real-time and batch workloads in applications, content pipelines, and AI workflows.
Real-Time (Synchronous) Translation
Single API call, up to 5,000 bytes per request
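A minimal sketch of the real-time call with boto3's `translate_text` (the helper names are ours; the guard reflects the 5,000-byte limit):

```python
def utf8_size(text: str) -> int:
    # The real-time limit is 5,000 BYTES of UTF-8, not characters;
    # multi-byte scripts (CJK, accented Latin) hit it with fewer characters.
    return len(text.encode("utf-8"))

def translate_once(client, text: str, target: str, source: str = "auto") -> str:
    # "auto" asks Translate to detect the source language
    # (it uses Comprehend internally); no separate detection call needed.
    if utf8_size(text) > 5000:
        raise ValueError("over the 5,000-byte real-time limit; use batch translation")
    response = client.translate_text(
        Text=text,
        SourceLanguageCode=source,
        TargetLanguageCode=target,
    )
    return response["TranslatedText"]

# Typical wiring (requires boto3 and AWS credentials):
#   import boto3
#   translate = boto3.client("translate")
#   print(translate_once(translate, "Hello, world", "es"))
```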
Batch Translation (Asynchronous)
S3 input → S3 output; handles large-scale document jobs
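The S3-to-S3 flow can be sketched with `start_text_translation_job`; the job name prefix, URIs, and role ARN below are illustrative placeholders:

```python
import uuid

def start_batch_job(client, input_uri: str, output_uri: str,
                    role_arn: str, source: str, targets: list) -> str:
    # Batch translation reads documents from an S3 prefix and writes
    # translated output back to S3; Translate assumes the given IAM role
    # to read the input bucket and write the output bucket.
    response = client.start_text_translation_job(
        JobName="batch-translate-" + uuid.uuid4().hex[:8],
        InputDataConfig={"S3Uri": input_uri, "ContentType": "text/plain"},
        OutputDataConfig={"S3Uri": output_uri},
        DataAccessRoleArn=role_arn,
        SourceLanguageCode=source,
        TargetLanguageCodes=targets,  # one job can fan out to several languages
    )
    return response["JobId"]

# Typical wiring (requires boto3 and AWS credentials):
#   import boto3
#   translate = boto3.client("translate")
#   job_id = start_batch_job(translate, "s3://my-input/", "s3://my-output/",
#                            "arn:aws:iam::111122223333:role/TranslateBatchRole",
#                            "en", ["es", "fr"])
```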
Custom Terminology
CSV or TMX files to enforce specific term translations (brand names, product names)
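A sketch of the CSV route: the terminology file is a header row of language codes followed by one source-term,target-term row per enforced mapping. Function and terminology names here are our own:

```python
def build_terminology_csv(source_lang: str, target_lang: str,
                          term_pairs: dict) -> bytes:
    # Custom Terminology CSV: a header row of language codes, then one
    # "source term,target term" row per mapping to enforce verbatim.
    rows = [f"{source_lang},{target_lang}"]
    rows += [f"{src},{tgt}" for src, tgt in term_pairs.items()]
    return "\n".join(rows).encode("utf-8")

def import_terms(client, name: str, csv_bytes: bytes) -> str:
    # OVERWRITE replaces an existing terminology of the same name.
    response = client.import_terminology(
        Name=name,
        MergeStrategy="OVERWRITE",
        TerminologyData={"File": csv_bytes, "Format": "CSV"},
    )
    return response["TerminologyProperties"]["Name"]

# Typical wiring (requires boto3 and AWS credentials):
#   import boto3
#   translate = boto3.client("translate")
#   csv_bytes = build_terminology_csv("en", "es", {"AWS": "AWS"})
#   import_terms(translate, "brand-terms", csv_bytes)
#   # Then pass TerminologyNames=["brand-terms"] on translate_text calls.
```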
Parallel Data (Active Custom Translation)
Train translation style with example sentence pairs for domain adaptation
Automatic Language Detection
Can auto-detect source language using Amazon Comprehend under the hood; no separate call needed
Formality Setting
Control formal vs. informal register for supported language pairs
Profanity Masking
Mask profane words in translated output with a grawlix (****)
Brevity Setting
Request shorter translations when length matters (e.g., UI labels, SMS)
Do Not Translate Tags
HTML/XML tags can mark content to be preserved as-is (e.g., brand names in markup)
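Formality, Profanity Masking, and Brevity are all optional `Settings` on the same `TranslateText` call, not separate APIs. A sketch (the wrapper function is our own):

```python
def translate_with_settings(client, text: str, target: str,
                            formality=None, mask_profanity=False,
                            brevity=False) -> str:
    # All three options ride on one TranslateText request;
    # they are request parameters, not separate services.
    settings = {}
    if formality:            # "FORMAL" or "INFORMAL"; supported language pairs only
        settings["Formality"] = formality
    if mask_profanity:       # profane words come back masked with a grawlix
        settings["Profanity"] = "MASK"
    if brevity:              # request shorter output (UI labels, SMS)
        settings["Brevity"] = "ON"
    response = client.translate_text(
        Text=text,
        SourceLanguageCode="auto",
        TargetLanguageCode=target,
        Settings=settings,
    )
    return response["TranslatedText"]
```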
Encryption at Rest
Uses AWS KMS; customer-managed keys (CMK) supported
Encryption in Transit
TLS enforced for all API calls
VPC Endpoints (PrivateLink)
Keep translation traffic off the public internet
CloudWatch Metrics
Monitor request counts, latency, and errors
AWS CloudTrail Integration
API call logging for compliance and audit
Speech-to-Translated-Text Pipeline
High frequency · Transcribe converts audio to text in the source language; Translate converts the transcript to the target language. Used for multilingual call center analytics, meeting transcription, and subtitle generation.
Multilingual Sentiment & Entity Analysis
High frequency · Translate normalizes non-English text to English first, then Comprehend performs sentiment analysis, entity recognition, or key phrase extraction. Comprehend's primary NLP models are optimized for English — Translate bridges the gap.
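The normalize-then-analyze pattern in one sketch; both arguments are the standard boto3 `translate` and `comprehend` clients, passed in so the function stays testable:

```python
def analyze_multilingual(translate_client, comprehend_client, text: str):
    # Step 1: Translate normalizes any source language to English.
    english = translate_client.translate_text(
        Text=text, SourceLanguageCode="auto", TargetLanguageCode="en"
    )["TranslatedText"]
    # Step 2: Comprehend analyzes the English text; its richest NLP
    # models target English, which is why Translate runs first.
    sentiment = comprehend_client.detect_sentiment(
        Text=english, LanguageCode="en"
    )
    return english, sentiment["Sentiment"]

# Typical wiring (requires boto3 and AWS credentials):
#   import boto3
#   translate = boto3.client("translate")
#   comprehend = boto3.client("comprehend")
#   text_en, mood = analyze_multilingual(translate, comprehend, "¡Me encanta!")
```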
Multilingual Document Intelligence
High frequency · Textract extracts text from scanned documents (PDFs, images); Translate converts extracted text to a target language. Enables cross-language document processing workflows for legal, financial, and compliance use cases.
Multilingual Chatbot
High frequency · Lex handles conversational logic in a single language (e.g., English); Translate converts user input to English before Lex processing and translates Lex responses back to the user's language. Avoids building separate Lex bots per language.
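A sketch of the translate-in/translate-out wrapper around a single English Lex V2 bot; the bot IDs and session ID are placeholders, and the clients are the standard boto3 `translate` and `lexv2-runtime` clients:

```python
def multilingual_chat(translate_client, lex_client, user_text: str,
                      user_lang: str, bot_id: str, bot_alias_id: str,
                      session_id: str) -> str:
    # Inbound: user's language -> English, so one Lex bot serves every locale.
    english_in = translate_client.translate_text(
        Text=user_text, SourceLanguageCode=user_lang, TargetLanguageCode="en"
    )["TranslatedText"]
    # One English Lex V2 bot processes the intent.
    lex_response = lex_client.recognize_text(
        botId=bot_id, botAliasId=bot_alias_id, localeId="en_US",
        sessionId=session_id, text=english_in,
    )
    english_out = lex_response["messages"][0]["content"]
    # Outbound: English -> user's language.
    return translate_client.translate_text(
        Text=english_out, SourceLanguageCode="en", TargetLanguageCode=user_lang
    )["TranslatedText"]
```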
Multilingual Generative AI Applications
High frequency · Translate normalizes multilingual user prompts to English before sending to a foundation model in Bedrock, then translates the model's English response back to the user's language. Maximizes FM effectiveness while supporting global users.
Text-to-Translated-Speech (TTS Pipeline)
Medium frequency · Translate converts text to the target language; Polly synthesizes speech in that language. Builds multilingual voice assistants, IVR systems, and accessibility features.
Multilingual Image/Video Metadata Pipeline
Medium frequency · Rekognition detects labels, text, or moderation signals in images/video; Translate localizes the resulting labels or detected text for international audiences.
Event-Driven Batch Translation
Medium frequency · S3 PUT event triggers Lambda to start a Translate Batch Translation job; results are written back to S3. Serverless, cost-efficient pipeline for processing uploaded documents at scale.
Real-Time Content Localization Cache
Medium frequency · Lambda translates content on first request and caches translated strings in DynamoDB; subsequent requests for the same content/language pair are served from cache, reducing translation costs and latency.
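The cache pattern, sketched storage-agnostically: here a plain dict stands in for the DynamoDB table, and `translate_fn` wraps the real TranslateText call, so the cost-saving logic itself is clear and testable:

```python
import hashlib

def cache_key(text: str, target_lang: str) -> str:
    # Key on (target language, content hash) so identical strings are
    # translated once per language pair.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{target_lang}:{digest}"

def translate_cached(cache, translate_fn, text: str, target_lang: str) -> str:
    key = cache_key(text, target_lang)
    cached = cache.get(key)
    if cached is not None:                       # cache hit: no per-character charge
        return cached
    translated = translate_fn(text, target_lang)  # miss: pay once, then reuse
    cache[key] = translated
    return translated
```

With DynamoDB, `cache.get`/`cache[key] = …` become `GetItem`/`PutItem` on a table keyed by the same hash string.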
When a question asks how to make a single-language NLP service (like Comprehend) work with multilingual input, the answer is almost always: Translate the input to English FIRST, then pass it to Comprehend. This is the canonical AWS multilingual NLP architecture.
Real-time translation limit is 5,000 BYTES (not characters). For large documents or bulk processing, Batch Translation (S3 → Translate → S3) is the correct architectural choice. If a question mentions 'thousands of documents' or 'large files', Batch Translation is the answer.
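When a real-time path must occasionally handle text over the limit, a byte-aware chunker is a common workaround (batch translation remains the right answer at document scale). A minimal sketch, splitting on whitespace:

```python
def chunk_for_realtime(text: str, max_bytes: int = 5000) -> list:
    # Greedily pack whole words into chunks whose UTF-8 encoding stays
    # within the real-time byte limit. A single word longer than
    # max_bytes would still yield an oversized chunk; production code
    # should also prefer sentence boundaries for translation quality.
    chunks, current = [], ""
    for word in text.split():
        candidate = (current + " " + word).strip()
        if len(candidate.encode("utf-8")) > max_bytes and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```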
Custom Terminology enforces SPECIFIC WORD MAPPINGS (e.g., brand names, product names, technical terms) — it does NOT change the translation style or domain. Parallel Data (Active Custom Translation) is used to adapt translation STYLE and DOMAIN. Exams test this distinction directly.
For AIF-C01: In a generative AI pipeline serving global users, Translate is the correct service to normalize multilingual prompts before sending to a foundation model via Bedrock. This is a common architecture question in the AI Practitioner exam.
Multilingual NLP = Translate FIRST (normalize to English), THEN Comprehend (analyze). Translate does not perform NLP — it only moves text between languages.
Real-time limit = 5,000 bytes. Large documents or bulk processing = Batch Translation (S3-based). This distinction drives architecture decisions on SAA-C03 and SAP-C02.
Custom Terminology = enforce specific word substitutions (brand names). Parallel Data = adapt translation style/domain with example sentences. These are NOT interchangeable.
Amazon Translate does NOT perform sentiment analysis, entity recognition, or language detection as standalone features for downstream use — those are Amazon Comprehend's jobs. Translate's auto-detect feature uses Comprehend internally, but you cannot use Translate as a replacement for Comprehend in NLP pipelines.
For multilingual chatbots using Amazon Lex: the recommended pattern is Translate user input → Lex (English) → Translate Lex response back. Do NOT build separate Lex bots per language — that's operationally expensive and not the AWS-recommended architecture.
Translate is NOT a content moderation service. It does not detect harmful content, prompt injection, or policy violations. For AI safety in multilingual pipelines, you still need Comprehend (toxicity/sentiment) or Bedrock Guardrails; AWS WAF can filter requests but cannot perform semantic analysis. Don't confuse translation with moderation.
Pricing is per CHARACTER translated. When designing cost-optimized architectures, implement a translation cache (e.g., DynamoDB or ElastiCache) to avoid re-translating identical strings. This is a classic SAA-C03/SAP-C02 cost-optimization pattern.
The Formality, Profanity Masking, and Brevity settings are OPTIONAL parameters on the translation API — not separate services. Know that these exist for exam questions about customizing translation output quality.
Common Mistake
Amazon Translate can detect sentiment or intent in translated text — I can use it as an all-in-one NLP service.
Correct
Amazon Translate ONLY translates text between languages. Sentiment analysis, entity recognition, key phrase extraction, and language detection for NLP purposes are all Amazon Comprehend features. The two services are complementary, not interchangeable.
This is one of the most common service-boundary confusion traps on SAA-C03 and AIF-C01. The correct architecture is: Translate (normalize language) → Comprehend (analyze text). Remember: Translate moves words between languages; Comprehend understands what those words mean.
Common Mistake
I can use Translate's auto-detect feature to identify the source language and then use that result in my application logic — it's a language detection service.
Correct
While Translate can auto-detect the source language (using Comprehend internally), if you need reliable language detection as a standalone capability for routing, classification, or analytics, use Amazon Comprehend's DetectDominantLanguage API directly. Translate's auto-detect is a convenience feature for translation calls, not a primary language detection service.
Exam questions about 'detect the language of incoming customer messages' point to Comprehend, not Translate. Using Translate just to detect language (without actually translating) is wasteful and architecturally incorrect.
Common Mistake
Custom Terminology and Parallel Data do the same thing — they both customize how Translate handles specific content.
Correct
Custom Terminology is a lookup-table approach: it enforces specific word-for-word substitutions (e.g., always translate 'AWS' as 'AWS', never localize brand names). Parallel Data (Active Custom Translation) trains the model on example sentence pairs to adapt the STYLE, TONE, and DOMAIN of translations (e.g., making translations sound more formal or more like your industry's terminology).
Exam questions will describe a scenario and ask which customization feature to use. Key signal: if the requirement is 'never translate this specific term' → Custom Terminology. If the requirement is 'make translations sound like our industry documents' → Parallel Data.
Common Mistake
Batch Translation is more expensive than real-time translation because it's a managed job service.
Correct
Batch Translation uses the same per-character pricing as real-time translation. There is no premium for using the asynchronous batch job capability. The choice between real-time and batch is driven by use case (latency vs. volume) and document size limits, not cost.
Cost-optimization questions may tempt you to choose real-time translation as 'cheaper' — it is not. Choose Batch Translation for large documents (>5,000 bytes) and high-volume processing regardless of cost concerns.
Common Mistake
Amazon Translate can moderate translated content and flag harmful or inappropriate translations before they reach users.
Correct
Amazon Translate has a Profanity Masking feature, but this is limited to masking profane words with grawlix characters (****). It is NOT a content moderation service. For comprehensive content moderation, use Amazon Comprehend (toxicity detection), Amazon Rekognition (image/video moderation), or AWS Bedrock Guardrails. Translate processes language — it does not evaluate content safety.
This is directly relevant to AIF-C01's AI safety domain. In multilingual AI pipelines, you need BOTH Translate (for language normalization) AND a separate moderation/safety service. Translate alone cannot protect against harmful content, prompt injection, or policy violations.
Common Mistake
To build a multilingual chatbot with Amazon Lex, I need to create a separate Lex bot for each language I want to support.
Correct
The AWS-recommended pattern is to build ONE Lex bot in a primary language (typically English) and use Amazon Translate to translate user inputs INTO that language before sending to Lex, then translate Lex's responses BACK to the user's language. This dramatically reduces operational complexity.
Exam questions about 'cost-effective multilingual chatbot' or 'scale chatbot to 10 new languages quickly' point to the Translate+Lex pattern, not creating N separate bots. Creating separate bots per language is expensive, hard to maintain, and not the AWS-recommended architecture.
TRANSLATE = Text Relay And Navigation for Speaking Languages Across The Enterprise. It MOVES text between languages — it doesn't READ meaning (that's Comprehend's job).
Custom Terminology = DICTIONARY (specific word → specific word). Parallel Data = STYLE GUIDE (here are example sentences in our style — learn from them).
The 5,000-byte real-time limit: Think '5K bytes = 5 kilobytes = roughly one page of text'. More than one page? Use Batch (S3).
Translate pipeline order: TRANSCRIBE (ears) → TRANSLATE (tongue) → COMPREHEND (brain) → POLLY (voice). Each service does ONE job.
CertAI Tutor · AIF-C01, SAA-C03, SAP-C02, CLF-C02 · 2026-02-22