
Automate, accelerate, and simplify online data transfers between on-premises storage and AWS — or between AWS storage services.
AWS DataSync is a fully managed, online data transfer service that automates moving large amounts of data between on-premises storage (NFS, SMB, HDFS, object storage) and AWS storage services (S3, EFS, FSx), as well as between AWS storage services. It handles scheduling, monitoring, encryption, integrity validation, and bandwidth throttling automatically — no custom scripts required. DataSync is purpose-built for file and object data movement, not database migration.
Accelerate and automate recurring or one-time large-scale data migrations and replications between on-premises file/object storage and AWS storage services, with built-in integrity checking and scheduling.
Automatic data integrity verification (checksums): Performed during and after transfer; ensures bit-for-bit accuracy.
TLS encryption in transit: Always on (TLS 1.2); cannot be disabled.
Bandwidth throttling per task: Configurable limit in MiB/s; the limit can also be changed on a running task execution, for example to lower it during business hours.
Task scheduling: Built-in scheduler with a one-hour minimum interval; use EventBridge for finer-grained or event-driven control.
File metadata preservation: Preserves POSIX permissions, timestamps, ownership, and extended attributes.
SMB share support: Supports SMB 2.0, 2.1, and 3.0; critical for Windows file server migrations.
NFS share support: Supports NFS v3, v4.0, and v4.1.
HDFS support: Transfers data from the Hadoop Distributed File System (HDFS) to S3 or EFS.
Amazon S3-compatible object storage support: Supports on-premises S3-compatible storage as a source.
Cross-cloud transfer (Azure Blob, GCS): Can use other cloud providers' object stores as a source, with the agent deployed in the source cloud.
AWS Direct Connect integration: Agent traffic can be routed over Direct Connect for private, high-bandwidth transfers.
VPC endpoint support (PrivateLink): Keeps transfer traffic off the public internet; required where network compliance rules forbid public endpoints.
AWS Migration Hub integration: Tracks DataSync task progress alongside other migration tools in a single pane.
CloudWatch metrics and logging: Detailed task execution metrics, file-level error reporting, and CloudWatch Logs integration.
EventBridge integration: Triggers Lambda functions or SNS notifications on task completion or failure.
Incremental transfers (delta sync): Only new or changed files are transferred after the initial full transfer.
Preserve deleted files option: Configurable; the destination can either mirror source deletions or keep all files.
S3 storage class selection: Choose the target S3 storage class (Standard, Standard-IA, Glacier, etc.) on the S3 destination location.
On-premises agent (VM/hardware): Deployed as a VM on VMware ESXi, KVM, or Hyper-V, or as an EC2 AMI for cloud-to-cloud transfers.
Agent-less mode (for AWS-to-AWS transfers): No agent is needed when both source and destination are AWS storage services.
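Most of the features above are switched on through the `Options` structure that DataSync's `CreateTask` and `StartTaskExecution` APIs accept. The following is a minimal Python sketch that builds such a dict for use with boto3. The helper name and its parameters are hypothetical. The values chosen are one reasonable configuration, not the service defaults. The guard encodes the documented rule that `PreserveDeletedFiles=REMOVE` cannot be combined with `TransferMode=ALL`. Check the enum strings against the current DataSync API reference.

```python
from typing import Any, Dict, Optional

MIB = 1024 * 1024  # DataSync bandwidth limits are expressed in bytes per second


def build_task_options(
    bandwidth_mib_per_s: Optional[int] = None,
    mirror_deletions: bool = False,
    changed_files_only: bool = True,
) -> Dict[str, Any]:
    """Build an Options dict for boto3 DataSync create_task / start_task_execution."""
    if mirror_deletions and not changed_files_only:
        # In ALL mode DataSync does not scan the destination, so it cannot mirror deletions.
        raise ValueError("mirroring deletions requires TransferMode CHANGED")
    return {
        # Integrity: after the transfer, compare the whole destination with the source.
        "VerifyMode": "POINT_IN_TIME_CONSISTENT",
        # Delta sync: copy only data that differs between source and destination.
        "TransferMode": "CHANGED" if changed_files_only else "ALL",
        # Metadata preservation for POSIX (NFS/EFS) locations.
        "PosixPermissions": "PRESERVE",
        "Uid": "INT_VALUE",
        "Gid": "INT_VALUE",
        "Mtime": "PRESERVE",
        "Atime": "BEST_EFFORT",
        # REMOVE mirrors source deletions; PRESERVE keeps files deleted at the source.
        "PreserveDeletedFiles": "REMOVE" if mirror_deletions else "PRESERVE",
        # -1 means "use all available bandwidth".
        "BytesPerSecond": -1 if bandwidth_mib_per_s is None else bandwidth_mib_per_s * MIB,
    }
```

A typical call would pass `Options=build_task_options(bandwidth_mib_per_s=50)` to `boto3.client("datasync").create_task(...)` along with the source and destination location ARNs.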
Private High-Bandwidth Migration Lane
Frequency: high. Route DataSync agent traffic over a Direct Connect connection (dedicated or hosted), using a private VIF into a VPC that hosts a DataSync interface VPC endpoint. This maximizes throughput and keeps data off the public internet. It is essential for large-scale migrations with strict data-privacy or network-compliance requirements. A single DataSync task can saturate a 10 Gbps Direct Connect link.
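In this private path, the agent is activated against a DataSync interface VPC endpoint instead of the public service endpoint. Below is a minimal Python sketch of the keyword arguments for boto3's `create_agent`. It assumes you have already obtained the agent's activation key for that endpoint. The function name and default agent name are hypothetical, and the ARNs in any call are placeholders.

```python
from typing import Any, Dict, List


def private_agent_request(
    activation_key: str,
    vpc_endpoint_id: str,
    subnet_arns: List[str],
    security_group_arns: List[str],
    agent_name: str = "onprem-datasync-agent",  # hypothetical name
) -> Dict[str, Any]:
    """Keyword arguments for boto3 DataSync create_agent, activating against a VPC endpoint."""
    if not vpc_endpoint_id.startswith("vpce-"):
        raise ValueError("expected an interface VPC endpoint ID such as 'vpce-...'")
    if not subnet_arns or not security_group_arns:
        raise ValueError("subnets and security groups are required for private activation")
    return {
        "ActivationKey": activation_key,
        "AgentName": agent_name,
        # The agent reaches DataSync through this endpoint, which on-premises
        # hosts can reach over the Direct Connect private VIF.
        "VpcEndpointId": vpc_endpoint_id,
        # DataSync creates the network interfaces used for data transfer here.
        "SubnetArns": subnet_arns,
        "SecurityGroupArns": security_group_arns,
    }
```

The result would be unpacked into `boto3.client("datasync").create_agent(**private_agent_request(...))`.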
On-Premises NAS to S3 Data Lake Ingestion
Frequency: high. Deploy a DataSync agent on-premises, configure an NFS/SMB source location, and set S3 as the destination. Schedule recurring tasks for ongoing hybrid sync, and choose an appropriate S3 storage class (e.g., S3 Standard-IA for infrequently accessed archives). DataSync stores file metadata as S3 object metadata. It writes to the bucket through an IAM role you specify, so any restrictive bucket policy must allow that role.
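The following sketch shows this NAS-to-S3 setup with boto3. It assumes an already-activated agent and an IAM role that DataSync can assume for the bucket. The function name, the `/nas-archive` prefix, the task name, and the daily 01:00 UTC schedule are hypothetical choices. Pass `boto3.client("datasync")` as the `datasync` argument.

```python
from typing import Any


def create_nas_to_s3_task(
    datasync: Any,
    agent_arn: str,
    nfs_host: str,
    export_path: str,
    bucket_arn: str,
    bucket_role_arn: str,
) -> str:
    """Create an NFS source, an S3 Standard-IA destination, and a scheduled task; return the task ARN."""
    src = datasync.create_location_nfs(
        ServerHostname=nfs_host,
        Subdirectory=export_path,
        OnPremConfig={"AgentArns": [agent_arn]},  # the on-premises agent reads the share
        MountOptions={"Version": "AUTOMATIC"},
    )
    dst = datasync.create_location_s3(
        S3BucketArn=bucket_arn,
        Subdirectory="/nas-archive",  # hypothetical key prefix
        S3StorageClass="STANDARD_IA",  # storage class is set on the S3 location
        S3Config={"BucketAccessRoleArn": bucket_role_arn},
    )
    task = datasync.create_task(
        SourceLocationArn=src["LocationArn"],
        DestinationLocationArn=dst["LocationArn"],
        Name="nas-to-s3-archive",  # hypothetical name
        Schedule={"ScheduleExpression": "cron(0 1 * * ? *)"},  # daily at 01:00 UTC
    )
    return task["TaskArn"]
```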
File Share Migration to Managed NFS
Frequency: high. Migrate on-premises NFS workloads to Amazon EFS for serverless, elastic file storage. DataSync preserves POSIX permissions, UIDs/GIDs, and timestamps, which is critical for Linux workload migrations. No agent is needed when moving from EFS to EFS across Regions.
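On the AWS side, the EFS destination (or the source, in an EFS-to-EFS copy) is described by a file system ARN plus a subnet and security groups. DataSync uses these to mount the file system, with no agent involved. The sketch below builds the boto3 `create_location_efs` arguments. The helper names are hypothetical. The Region check is a simplification: DataSync also requires the subnet to be in the file system's VPC and in an Availability Zone with a mount target, which this sketch does not verify.

```python
from typing import Any, Dict, List


def arn_region(arn: str) -> str:
    """Return the Region field of an ARN (arn:partition:service:region:account:resource)."""
    parts = arn.split(":")
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn}")
    return parts[3]


def efs_location_request(
    filesystem_arn: str,
    subnet_arn: str,
    security_group_arns: List[str],
    subdirectory: str = "/",
) -> Dict[str, Any]:
    """Keyword arguments for boto3 DataSync create_location_efs."""
    # Only the Region is checked here; VPC and mount-target AZ are not.
    if arn_region(filesystem_arn) != arn_region(subnet_arn):
        raise ValueError("subnet and file system must be in the same Region")
    return {
        "EfsFilesystemArn": filesystem_arn,
        "Subdirectory": subdirectory,
        "Ec2Config": {"SubnetArn": subnet_arn, "SecurityGroupArns": security_group_arns},
        # Encrypt the NFS traffic between DataSync and EFS.
        "InTransitEncryption": "TLS1_2",
    }
```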
Windows File Server Lift-and-Shift
Frequency: high. Use DataSync with the SMB protocol to migrate on-premises Windows file shares to FSx for Windows File Server, preserving ACLs, NTFS permissions, and file metadata. This is often combined with AWS MGN for the server OS and DataSync for the file data.
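A boto3 sketch of the SMB-to-FSx path follows. It assumes an activated agent and a single domain service account that can read the source share and write to FSx. Using one credential pair for both sides is a simplification. The function name, task name, and all argument values are hypothetical. `OWNER_DACL` copies the owner and NTFS DACLs. `OWNER_DACL_SACL` would also copy audit SACLs but needs additional Windows privileges for the DataSync user.

```python
from typing import Any, List


def create_smb_to_fsx_task(
    datasync: Any,
    agent_arn: str,
    smb_host: str,
    share_path: str,
    fsx_arn: str,
    fsx_sg_arns: List[str],
    user: str,
    password: str,
    domain: str,
) -> str:
    """Create an SMB source, an FSx for Windows destination, and a task; return the task ARN."""
    src = datasync.create_location_smb(
        ServerHostname=smb_host,
        Subdirectory=share_path,
        User=user,
        Password=password,
        Domain=domain,
        AgentArns=[agent_arn],  # the on-premises agent mounts the share
        MountOptions={"Version": "SMB3"},
    )
    dst = datasync.create_location_fsx_windows(
        FsxFilesystemArn=fsx_arn,
        SecurityGroupArns=fsx_sg_arns,
        User=user,
        Password=password,
        Domain=domain,
    )
    task = datasync.create_task(
        SourceLocationArn=src["LocationArn"],
        DestinationLocationArn=dst["LocationArn"],
        Name="windows-share-to-fsx",  # hypothetical name
        # Copy the file owner and NTFS DACLs (Windows permissions).
        Options={"SecurityDescriptorCopyFlags": "OWNER_DACL"},
    )
    return task["TaskArn"]
```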
Hybrid Offline + Online Migration
Frequency: high. Use Snowball Edge for the initial bulk data transfer (petabyte-scale, offline), then switch to DataSync for ongoing delta synchronization until cutover. This 'seed and sync' pattern minimizes the migration window and network bandwidth consumption.
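The case for seeding offline comes down to simple arithmetic: how long would the initial copy take over the available link? The Python helper below is a back-of-the-envelope sketch. It assumes decimal terabytes (10^12 bytes) and a constant sustained link utilization. The 80% default is an illustrative assumption, not a DataSync figure.

```python
def transfer_days(data_tb: float, link_gbps: float, utilization: float = 0.8) -> float:
    """Days needed to move data_tb terabytes over a link_gbps link at a sustained utilization."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = data_tb * 1e12 * 8                       # terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * utilization)  # effective bits per second
    return seconds / 86_400                          # seconds -> days
```

For example, `transfer_days(500, 1.0)` comes to roughly 58 days for 500 TB over 1 Gbps at 80% utilization. A Snowball Edge seed avoids that long initial copy. After the seed, DataSync only has to move the changes made since then.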
Full Application Migration (Files + Database)
Frequency: high. Use DataSync to migrate application file data (documents, media, logs) while DMS handles the relational or NoSQL database migration in parallel. These are complementary tools: DataSync for unstructured/semi-structured data, DMS for structured database records.
Complete Workload Migration (Compute + Data)
Frequency: high. AWS MGN handles OS/application server replication (lift-and-shift), while DataSync handles migration of the associated file share data. Neither tool alone handles the complete workload; they are complementary for full application migrations.
Centralized Migration Tracking
Frequency: medium. Register DataSync tasks with AWS Migration Hub to get a unified view of file migration progress alongside server migrations (MGN) and database migrations (DMS). This provides executive-level migration dashboards.
Event-Driven Post-Transfer Processing
Frequency: medium. DataSync publishes task execution state changes to Amazon EventBridge. An EventBridge rule can then invoke a Lambda function that kicks off downstream processing (e.g., an EMR job, a Glue crawler, or an SNS notification) when a task completes. This enables fully automated data pipeline orchestration.
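A minimal sketch of the Lambda side in Python. It assumes the rule matches events with source `aws.datasync`, detail-type `DataSync Task Execution State Change`, and a `detail.State` of `SUCCESS`. Confirm that event shape against the DataSync EventBridge documentation before relying on it. The `CRAWLER_NAME` environment variable and the crawler itself are hypothetical. The Glue client is injected so the logic can be exercised without AWS.

```python
from typing import Any, Dict, Optional

# Assumed EventBridge rule pattern for successful DataSync task executions.
EVENT_PATTERN = {
    "source": ["aws.datasync"],
    "detail-type": ["DataSync Task Execution State Change"],
    "detail": {"State": ["SUCCESS"]},
}


def handle_datasync_event(event: Dict[str, Any], glue: Any, crawler_name: str) -> Optional[str]:
    """Start a Glue crawler when a DataSync task execution succeeds; return its name or None."""
    if event.get("source") != "aws.datasync":
        return None
    if event.get("detail", {}).get("State") != "SUCCESS":
        return None  # ignore ERROR and intermediate states
    glue.start_crawler(Name=crawler_name)
    return crawler_name


def lambda_handler(event, context):
    import os
    import boto3  # available in the AWS Lambda Python runtime

    return handle_datasync_event(event, boto3.client("glue"), os.environ["CRAWLER_NAME"])
```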
NetApp On-Premises to AWS ONTAP Migration
Frequency: medium. DataSync can move data from on-premises NetApp ONTAP systems to FSx for NetApp ONTAP, preserving NFS/SMB metadata. NetApp SnapMirror is an alternative, but DataSync provides a managed, AWS-native path.
DataSync is ONLY for file and object data movement — it has zero database awareness. If a question mentions migrating MySQL, PostgreSQL, Oracle, or DynamoDB data, DataSync is NOT the answer; use DMS.
The 'seed and sync' pattern is a critical exam scenario: use Snow Family (Snowball Edge) for the initial bulk offline transfer, then switch to DataSync for delta sync until final cutover. This minimizes migration window and network consumption.
DataSync requires an agent for on-premises-to-AWS transfers. No agent is needed for AWS-to-AWS transfers (e.g., S3 to EFS, EFS to FSx). Questions about 'agentless' DataSync refer specifically to AWS-to-AWS scenarios.
DataSync = FILES only. Never databases. If the question mentions SQL, schemas, tables, or database engines, the answer is DMS (not DataSync). DataSync moves NFS/SMB/HDFS/S3 object data — period.
The 'seed and sync' pattern is the canonical answer for large migrations with tight cutover windows: Snowball Edge for initial bulk offline transfer → DataSync for ongoing delta sync → cutover. This appears frequently on SAA-C03 and SAP-C02.
DataSync is agentless for AWS-to-AWS transfers (S3↔EFS, EFS↔FSx, cross-region S3). An agent (VM) is only needed when the source or destination is outside AWS. This distinction appears in architecture and cost optimization questions.
DataSync automatically performs end-to-end data integrity verification using checksums — this is a built-in differentiator vs. robocopy, rsync, or S3 CLI. If a question asks how to ensure data integrity during migration, DataSync's automatic verification is the answer.
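To act on that verification programmatically, you can poll a task execution until it reaches a terminal state and then read the verification result. A Python sketch follows. It assumes boto3's `describe_task_execution` response carries `Status` and a `Result.VerifyStatus` field, and that the task's `VerifyMode` is not `NONE`. If verification is disabled, this helper reports failure. The function name and polling defaults are hypothetical. `sleep` is injectable for testing.

```python
import time
from typing import Any, Callable

TERMINAL_STATES = {"SUCCESS", "ERROR"}


def wait_and_check_integrity(
    datasync: Any,
    execution_arn: str,
    poll_seconds: float = 30.0,
    max_polls: int = 2880,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a task execution until it finishes; True only if transfer and verification succeeded."""
    for _ in range(max_polls):
        desc = datasync.describe_task_execution(TaskExecutionArn=execution_arn)
        if desc["Status"] in TERMINAL_STATES:
            # Result.VerifyStatus reports the outcome of the checksum verification phase.
            verify_status = desc.get("Result", {}).get("VerifyStatus")
            return desc["Status"] == "SUCCESS" and verify_status == "SUCCESS"
        sleep(poll_seconds)
    raise TimeoutError(f"{execution_arn} did not finish after {max_polls} polls")
```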
DataSync preserves file metadata including POSIX permissions, timestamps, ownership, and ACLs. This is critical for Linux NFS migrations to EFS and Windows SMB migrations to FSx — if metadata preservation is a requirement, DataSync is preferred over S3 CLI or custom scripts.
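For bandwidth-constrained migrations, DataSync's per-task bandwidth limit caps how much of a shared link a transfer consumes, so no external throttling mechanism is needed. To stay light during business hours and run at full speed off-hours, schedule the task for off-hours windows or adjust the limit of a running task execution (via the UpdateTaskExecution API). This is a common exam scenario for hybrid environments.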
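One way to get time-of-day behaviour is a small scheduled job that sets the limit on the current execution. The Python sketch below assumes boto3's `update_task_execution` accepts `Options={"BytesPerSecond": ...}`, with `-1` meaning unlimited. The business-hours window (Mon to Fri, 08:00 to 18:00) and the 100 MiB/s cap are illustrative assumptions. `now` is assumed to be in the site's local time.

```python
from datetime import datetime
from typing import Any

MIB = 1024 * 1024
UNLIMITED = -1  # DataSync's value for "use all available bandwidth"


def bandwidth_for(now: datetime, business_cap_mib: int = 100) -> int:
    """Bytes per second to allow: capped Mon-Fri 08:00-18:00, unlimited otherwise."""
    in_business_hours = now.weekday() < 5 and 8 <= now.hour < 18
    return business_cap_mib * MIB if in_business_hours else UNLIMITED


def apply_bandwidth(datasync: Any, execution_arn: str, now: datetime) -> int:
    """Update the bandwidth limit of a running or queued task execution; return the new limit."""
    limit = bandwidth_for(now)
    datasync.update_task_execution(
        TaskExecutionArn=execution_arn,
        Options={"BytesPerSecond": limit},
    )
    return limit
```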
DataSync integrates with AWS Migration Hub for centralized tracking. On SAP-C02 questions about migration governance and visibility, remember that Migration Hub is the single pane of glass that aggregates DataSync, MGN, and DMS progress.
DataSync encrypts all data in transit with TLS 1.2 by default — this cannot be disabled. For questions about compliance requirements during data transfer, DataSync satisfies transit encryption requirements out of the box.
DataSync supports incremental (delta) transfers after the initial full sync — only changed files are transferred in subsequent runs. This makes it suitable for ongoing hybrid data synchronization, not just one-time migrations.
DataSync can transfer data from other cloud providers (Azure Blob, Google Cloud Storage) to AWS. For cross-cloud migration questions, DataSync is the managed AWS-native answer rather than custom scripts or third-party tools.
For the DEA-C01 exam, DataSync is relevant in the context of moving large datasets into S3 for analytics pipelines. Remember that DataSync can trigger downstream processing via EventBridge → Lambda → Glue/EMR after task completion.
Common Mistake
AWS DataSync can migrate databases — just point it at the database server and it will move the data.
Correct
DataSync is a file and object transfer service with absolutely no database awareness. It cannot read SQL tables, understand schemas, handle transactions, or migrate stored procedures. For database migration, use AWS DMS (for data) and AWS SCT (for schema conversion). DataSync might move database backup files (flat files on disk), but it cannot perform live database replication.
This is the #1 DataSync misconception on certification exams. Questions are designed to present DataSync as a tempting answer for database migration scenarios. The key discriminator: if the question mentions tables, schemas, SQL, or database engines → use DMS. If it mentions files, file shares, NFS, SMB, S3 objects → use DataSync.
Common Mistake
AWS DataSync and AWS Snow Family (Snowball) are competing services — you use one OR the other for a migration.
Correct
DataSync and Snow Family are complementary and frequently used together in the 'seed and sync' pattern. Snowball Edge handles the initial petabyte-scale offline bulk transfer (avoiding weeks of network transfer time), and then DataSync handles the ongoing delta synchronization of changed files until the final cutover window. Neither service alone is optimal for large migrations with tight cutover windows.
Exam questions frequently present scenarios with large datasets AND tight migration windows — the correct answer is almost always the hybrid Snow + DataSync approach, not one service alone. Remember: Snow = bulk offline seed, DataSync = online delta sync.
Common Mistake
AWS DataSync requires an agent for all transfers, including AWS-to-AWS transfers.
Correct
DataSync only requires an on-premises agent when the source or destination is outside AWS (on-premises NFS/SMB, HDFS, or other cloud storage). For transfers between AWS storage services (S3 ↔ EFS, EFS ↔ FSx, S3 ↔ S3 cross-region), DataSync operates in agentless mode — no VM deployment required.
Exam questions about migrating data between AWS storage services will try to trick you into thinking you need to deploy and manage an agent. The correct answer for AWS-to-AWS DataSync scenarios is agentless operation, reducing operational overhead significantly.
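For illustration, here is a Python sketch of an S3-to-S3 task in which no agent appears anywhere: neither location gets `AgentArns`. It assumes one IAM role per bucket that DataSync can assume. It also assumes both locations and the task are created with the same DataSync client. For cross-Region copies, check the DataSync documentation for which Region the task must live in. The function and task names are hypothetical.

```python
from typing import Any


def create_agentless_s3_copy(
    datasync: Any,
    source_bucket_arn: str,
    dest_bucket_arn: str,
    source_role_arn: str,
    dest_role_arn: str,
) -> str:
    """Create an S3-to-S3 DataSync task without any agent; return the task ARN."""
    # No AgentArns on either location: DataSync accesses both buckets directly.
    src = datasync.create_location_s3(
        S3BucketArn=source_bucket_arn,
        S3Config={"BucketAccessRoleArn": source_role_arn},
    )
    dst = datasync.create_location_s3(
        S3BucketArn=dest_bucket_arn,
        S3Config={"BucketAccessRoleArn": dest_role_arn},
    )
    task = datasync.create_task(
        SourceLocationArn=src["LocationArn"],
        DestinationLocationArn=dst["LocationArn"],
        Name="s3-to-s3-copy",  # hypothetical name
    )
    return task["TaskArn"]
```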
Common Mistake
AWS DataSync is the same as AWS Transfer Family — both move files to AWS.
Correct
These are fundamentally different services with different use cases. DataSync is for bulk, automated, scheduled data migration and synchronization (NFS, SMB, HDFS, S3). AWS Transfer Family provides managed SFTP, FTPS, FTP, and AS2 endpoints for human-initiated or legacy-application file transfers to/from S3 and EFS. Transfer Family is for B2B file exchange and legacy FTP workflows; DataSync is for large-scale automated data movement.
Both services appear in migration and storage questions. The discriminator: if the question mentions SFTP, FTP, B2B file exchange, or legacy applications that 'speak FTP' → Transfer Family. If it mentions bulk migration, NFS/SMB shares, scheduled sync, or petabyte-scale transfer → DataSync.
Common Mistake
AWS DataSync replaces AWS Application Migration Service (MGN) for server migrations.
Correct
DataSync moves file and object data only — it cannot replicate operating systems, application binaries, system state, or running processes. AWS MGN (Application Migration Service) is required for lift-and-shift server/VM replication. In a complete application migration, MGN handles the server and DataSync handles associated file share data — they are complementary, not competing.
SAP-C02 exam scenarios about migrating entire applications often require both services. A common trap is choosing DataSync alone for a 'migrate the application server and its file shares' scenario — the correct answer uses both MGN (for the server) and DataSync (for the file data).
Common Mistake
DataSync's built-in scheduling is sufficient for all automation needs, including sub-hourly transfers.
Correct
DataSync's built-in scheduler has a minimum interval of one hour. For sub-hourly or event-driven transfers (e.g., 'trigger a sync immediately after a batch job completes'), you must use the DataSync API triggered by Amazon EventBridge or AWS Lambda. The combination of EventBridge + Lambda + DataSync API enables near-real-time automated transfers.
Exam questions about 'near-real-time' or 'event-driven' data sync to S3 or EFS will test whether you know DataSync's scheduling limitation. The correct architecture adds EventBridge/Lambda as the trigger mechanism rather than relying on the built-in scheduler.
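A sketch of the trigger side in Python follows. It assumes an EventBridge Scheduler schedule (for example `rate(15 minutes)`) or an event rule invokes `lambda_handler`. The handler starts the DataSync task unless an execution is already in progress. The `DATASYNC_TASK_ARN` environment variable and function names are hypothetical. Only the first page of `list_task_executions` is inspected, which a production version would paginate. Skipping a run avoids piling up queued executions, since DataSync otherwise queues a new run behind the current one.

```python
from typing import Any, Optional

# Execution states that mean a run is still in progress.
RUNNING_STATES = {"QUEUED", "LAUNCHING", "PREPARING", "TRANSFERRING", "VERIFYING", "CANCELLING"}


def start_sync_if_idle(datasync: Any, task_arn: str) -> Optional[str]:
    """Start a task execution unless one is in progress; return the new execution ARN or None."""
    page = datasync.list_task_executions(TaskArn=task_arn)
    # Sketch only: checks the first page of results, without pagination.
    if any(e.get("Status") in RUNNING_STATES for e in page.get("TaskExecutions", [])):
        return None
    return datasync.start_task_execution(TaskArn=task_arn)["TaskExecutionArn"]


def lambda_handler(event, context):
    import os
    import boto3  # available in the AWS Lambda Python runtime

    return start_sync_if_idle(boto3.client("datasync"), os.environ["DATASYNC_TASK_ARN"])
```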
DATASYNC = 'Data Across The Airwaves, Scheduled, Yet Not Connecting databases' — it moves FILES, not database records.
Remember the 3 Ds of migration: DMS (Databases), DataSync (Data files), Direct Connect (network highway) — each has a distinct role.
Snow + Sync = Seed and Sync: Snowball plants the seed (bulk data offline), DataSync waters it (delta changes online) until harvest (cutover).
Agent rule: 'On-prem needs an Agent; AWS-to-AWS is Agentless.' The agent is what bridges non-AWS storage to DataSync, so a transfer that never leaves AWS needs none.
DataSync's superpowers vs. robocopy/rsync: Automatic Integrity checks, Automatic Throttling, Automatic Scheduling, Automatic Metadata — the 4 Automatics.
CertAI Tutor · SAA-C03, SAP-C02, DEA-C01 · 2026-02-22