
Subscribe, discover, and deliver third-party data at scale — without building pipelines from scratch
AWS Data Exchange is a fully managed data marketplace that enables customers to find, subscribe to, and use third-party data products directly within AWS. Providers can publish data sets (files, APIs, Amazon Redshift datashares, or Amazon S3 data access) and subscribers receive automatic, governed delivery to their AWS environment. It eliminates the operational burden of custom data ingestion pipelines, licensing negotiations, and compliance overhead for third-party data consumption.
To provide a governed, scalable marketplace for discovering and consuming third-party data products directly within AWS services — without custom ETL pipelines or manual data transfers
Use When
Avoid When
S3-based data set delivery
Subscribers export revisions directly to their own S3 buckets; no data crosses account boundaries insecurely
API-based data products (Amazon API Gateway)
Subscribers call provider APIs through Data Exchange without managing API keys manually; entitlement is governed by subscription
Amazon Redshift datashare delivery
Query provider data directly in Redshift without physically copying it — zero-ETL data sharing
AWS Lake Formation data access
Fine-grained column and row level access to provider data lakes via Lake Formation permissions
Automatic revision notifications
Subscribers receive EventBridge events when new revisions are published, enabling automated downstream processing
AWS CloudTrail integration
All API calls (subscribe, export, publish) are logged in CloudTrail for audit and compliance
Amazon EventBridge integration
New revision events trigger automated Lambda functions, Step Functions, or Glue jobs for hands-free data pipelines
AWS KMS encryption support
Data exported to S3 can be encrypted with customer-managed KMS keys; provider data is encrypted in transit and at rest
Built-in Data Loss Prevention (DLP)
Data Exchange does NOT include DLP — use Amazon Macie on the destination S3 bucket for sensitive data classification
Built-in analytics or query engine
Data Exchange only delivers data; analytics must be performed using Athena, Redshift, QuickSight, or other AWS services post-delivery
Built-in data storage or backup
Data Exchange is a delivery/subscription layer only; persistent storage is the subscriber's S3 bucket
Automatic compliance certification transfer
AWS infrastructure compliance (e.g., HIPAA, PCI) does not automatically certify the customer's use of third-party data products
AWS IAM access control
IAM policies control who can subscribe, export, and manage data sets within an account
Multi-region availability
Available in select AWS regions; check regional availability before designing cross-region architectures
Free data products
Many providers offer free data sets; cost is only incurred for paid subscriptions billed via AWS Marketplace
Automated Data Lake Ingestion
(high freq) Subscribers export Data Exchange revisions directly into a designated S3 bucket. New revision events trigger downstream processing. S3 is the primary landing zone for all file-based data products — this is the most common integration pattern.
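The export step of this pattern can be sketched with boto3's `EXPORT_REVISIONS_TO_S3` job type. This is a minimal sketch, not a production implementation: the data set ID, revision ID, and bucket name are placeholders, and error handling and job polling are omitted.

```python
# Sketch: export one Data Exchange revision to an S3 landing bucket.
# All IDs and bucket names passed in are placeholders for your own values.

def build_export_job_request(data_set_id: str, revision_id: str, bucket: str) -> dict:
    """Build the CreateJob request that copies a revision's assets to S3."""
    return {
        "Type": "EXPORT_REVISIONS_TO_S3",
        "Details": {
            "ExportRevisionsToS3": {
                "DataSetId": data_set_id,
                "RevisionDestinations": [
                    {
                        "Bucket": bucket,
                        "RevisionId": revision_id,
                        # KeyPattern controls the S3 object key layout of exported assets.
                        "KeyPattern": "${Asset.Name}",
                    }
                ],
            }
        },
    }

def export_revision(data_set_id: str, revision_id: str, bucket: str) -> str:
    import boto3  # requires credentials with dataexchange:CreateJob / StartJob

    dx = boto3.client("dataexchange")
    job = dx.create_job(**build_export_job_request(data_set_id, revision_id, bucket))
    dx.start_job(JobId=job["Id"])  # jobs are created in WAITING state and must be started
    return job["Id"]
```

The split between request-building and the API call keeps the request shape easy to unit-test without AWS credentials.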
Event-Driven Data Pipeline
(high freq) When a provider publishes a new revision, Data Exchange emits an EventBridge event. A Lambda function is triggered to automatically export the revision to S3 and kick off a Glue or Athena job — enabling fully automated, serverless data refresh pipelines.
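The Lambda step of this pipeline can be sketched as below. The event shape follows the documented Data Exchange "Revision Published To Data Set" EventBridge event, where `resources` carries the data set ID and `detail.RevisionIds` lists the new revisions; the actual export call is left as a comment.

```python
# Sketch of the Lambda handler for Data Exchange new-revision events.
# Assumes the "Revision Published To Data Set" event shape:
#   resources -> [data set ID], detail.RevisionIds -> list of revision IDs.

def lambda_handler(event, context):
    if event.get("source") != "aws.dataexchange":
        return []  # ignore events that did not come from Data Exchange

    data_set_id = event["resources"][0]
    revision_ids = event["detail"]["RevisionIds"]

    # In a real pipeline you would now call the Data Exchange CreateJob /
    # StartJob APIs to export each revision to S3, then trigger Glue or Athena.
    return [{"DataSetId": data_set_id, "RevisionId": rid} for rid in revision_ids]
```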
Audit and Compliance Logging
(high freq) All Data Exchange API calls (subscribe, export, create-revision, publish) are automatically logged in CloudTrail. This is essential for compliance, cost attribution, and access auditing — especially in regulated industries using third-party data.
Sensitive Data Classification Post-Delivery
(high freq) After third-party data is exported to S3, Amazon Macie scans the bucket for PII, PHI, or other sensitive data. Data Exchange itself has NO built-in DLP — Macie must be layered on top for data governance and compliance.
Encrypted Data Delivery
(high freq) Subscribers configure S3 buckets with KMS customer-managed keys (CMKs). When Data Exchange exports revisions to the S3 bucket, objects are encrypted at rest using the CMK — ensuring data sovereignty and compliance with encryption mandates.
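Requesting the CMK on the export itself can be sketched by attaching the `Encryption` block to the `EXPORT_REVISIONS_TO_S3` job details; the key ARN below is a placeholder.

```python
# Sketch: attach a customer-managed KMS key to an export job's details.
# The key ARN shown is a placeholder; pass your real CMK ARN in practice.

def with_cmk_encryption(export_details: dict, kms_key_arn: str) -> dict:
    details = dict(export_details)  # shallow copy; leaves the input untouched
    details["Encryption"] = {"Type": "aws:kms", "KmsKeyArn": kms_key_arn}
    return details

details = with_cmk_encryption(
    {
        "DataSetId": "ds-example",
        "RevisionDestinations": [{"Bucket": "my-bucket", "RevisionId": "rev-1"}],
    },
    "arn:aws:kms:us-east-1:111122223333:key/placeholder",
)
```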
Least-Privilege Data Access Control
(high freq) IAM policies restrict which principals can call Data Exchange APIs (e.g., dataexchange:GetDataSet, dataexchange:CreateJob) and who can subscribe to products via the AWS Marketplace action aws-marketplace:Subscribe. This ensures only authorized teams can subscribe to paid products or export data — critical for cost control and data governance.
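A least-privilege policy for a subscriber team might look like the sketch below: the team can discover and export data it is already entitled to, but cannot create new paid subscriptions. The action names are real Data Exchange and Marketplace actions; in practice `Resource` would be scoped to specific data set ARNs rather than `"*"`.

```python
# Sketch of a least-privilege IAM policy for a read/export-only subscriber
# team. Resource "*" is used for brevity; scope to data set ARNs in practice.
import json

subscriber_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Allow discovering entitled data sets and running export jobs.
            "Effect": "Allow",
            "Action": [
                "dataexchange:GetDataSet",
                "dataexchange:ListDataSets",
                "dataexchange:CreateJob",
                "dataexchange:StartJob",
                "dataexchange:GetJob",
            ],
            "Resource": "*",
        },
        {
            # Subscribing to products is an AWS Marketplace action; deny it
            # so this team cannot create new paid subscriptions.
            "Effect": "Deny",
            "Action": ["aws-marketplace:Subscribe"],
            "Resource": "*",
        },
    ],
}

policy_json = json.dumps(subscriber_policy, indent=2)
```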
Zero-ETL Third-Party Data Sharing
(high freq) Providers publish Redshift datashares via Data Exchange. Subscribers query the data directly in their Redshift cluster without physically copying it — eliminating ETL overhead and storage duplication for large structured datasets.
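On the subscriber side, attaching an entitled datashare is a one-time SQL step, after which its tables are queryable like local ones. The sketch below uses placeholder database, share, account, and namespace values, and wraps the SQL in the Redshift Data API for illustration.

```python
# Sketch: SQL a Redshift subscriber runs to attach an entitled datashare.
# All identifiers (database, share, account, namespace) are placeholders.

ATTACH_DATASHARE_SQL = """
CREATE DATABASE provider_db
FROM DATASHARE provider_share
OF ACCOUNT '111122223333' NAMESPACE 'producer-namespace-guid';
"""

QUERY_SQL = "SELECT count(*) FROM provider_db.public.daily_prices;"

def run_on_cluster(sql: str, cluster_id: str, database: str, db_user: str) -> str:
    """Submit a statement via the Redshift Data API; returns the statement ID."""
    import boto3  # requires redshift-data permissions at call time

    client = boto3.client("redshift-data")
    resp = client.execute_statement(
        ClusterIdentifier=cluster_id, Database=database, DbUser=db_user, Sql=sql
    )
    return resp["Id"]
```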
Subscription and Usage Monitoring
(high freq) CloudWatch metrics and alarms monitor Data Exchange API usage, export job status, and errors. Combined with CloudTrail, this provides full observability over data consumption patterns and costs.
AWS Data Exchange is ONLY a marketplace delivery mechanism — it does NOT store data, perform analytics, provide backup, or include DLP. If a question asks about analytics, storage, or DLP on third-party data, the answer involves S3, Athena, Redshift, Macie, or Glue — not Data Exchange itself.
When a question describes 'automatically processing new third-party data as soon as it's published,' the correct architecture is: Data Exchange → EventBridge (new revision event) → Lambda → S3 → Glue/Athena. EventBridge is the glue between Data Exchange and downstream automation.
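The chain above hinges on a single EventBridge rule matching Data Exchange events. A minimal sketch of that rule's event pattern follows; the rule name is a placeholder, and wiring up the Lambda target is shown only as a comment.

```python
# Sketch: EventBridge rule pattern that matches Data Exchange
# new-revision events and triggers downstream automation.
import json

event_pattern = {
    "source": ["aws.dataexchange"],
    "detail-type": ["Revision Published To Data Set"],
}

def create_rule(rule_name: str) -> None:
    import boto3  # requires events:PutRule permission at call time

    events = boto3.client("events")
    events.put_rule(Name=rule_name, EventPattern=json.dumps(event_pattern))
    # events.put_targets(Rule=rule_name, Targets=[{"Id": "1", "Arn": lambda_arn}])
```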
Data Exchange is for COMMERCIAL third-party data marketplace transactions. If the scenario involves sharing data BETWEEN your own AWS accounts or within your organization, the correct services are AWS RAM, S3 bucket policies, Lake Formation, or Redshift datashares — NOT Data Exchange.
AWS infrastructure compliance certifications (HIPAA, PCI-DSS, SOC 2) apply to AWS's infrastructure, NOT to the third-party data products you subscribe to via Data Exchange. Customers must independently validate compliance of the data they consume.
Data Exchange = DELIVERY ONLY. No analytics, no storage, no DLP, no backup. Any exam answer requiring these capabilities needs S3 + Athena/Redshift/Macie/Glue layered on top.
For automated pipelines: Data Exchange new revision → EventBridge event → Lambda → S3 → downstream analytics. EventBridge is the mandatory bridge between Data Exchange events and AWS automation.
Data Exchange is for COMMERCIAL third-party marketplace data. Internal account-to-account sharing uses S3 cross-account policies, AWS RAM, or Lake Formation — never Data Exchange for internal use cases.
For cost optimization questions: Data Exchange subscription charges appear on the AWS Marketplace line item of your bill. Data transfer costs when exporting to S3 (especially cross-region) are ADDITIONAL charges at standard AWS rates — always account for both in cost architecture questions.
Know the four data set delivery types: (1) S3 files — export revisions to your S3 bucket; (2) API Gateway — call provider APIs via subscription entitlement; (3) Redshift datashares — query without copying; (4) Lake Formation — fine-grained access to data lakes. Match the delivery type to the use case in exam scenarios.
Revisions in Data Exchange are IMMUTABLE once published. This is a deliberate design for data reproducibility and audit trails. If a question asks how providers update data, the answer is always 'publish a new revision' — you cannot modify an existing revision.
For CLF-C02: Place Data Exchange in the Analytics category of AWS services. It is NOT a database, NOT a storage service, and NOT a security service. The CLF exam may test category awareness — 'Which service allows you to subscribe to third-party data?' = AWS Data Exchange.
Amazon Macie does NOT integrate directly with Data Exchange — it integrates with S3. The correct pattern for sensitive data governance with Data Exchange is: export to S3 FIRST, THEN enable Macie on that S3 bucket. Never say 'Macie scans Data Exchange directly.'
CloudTrail automatically logs ALL Data Exchange API calls. For audit and compliance scenarios involving third-party data access, CloudTrail is always part of the correct answer. You do NOT need to configure anything extra — it's automatic for supported API actions.
Common Mistake
AWS Data Exchange provides built-in analytics so I can query third-party data directly within the service
Correct
AWS Data Exchange is purely a delivery and subscription mechanism with zero analytics capabilities. After data is delivered to your S3 bucket or Redshift cluster, you use separate services (Amazon Athena, Amazon Redshift, Amazon QuickSight, AWS Glue) to analyze it.
This is the #1 conceptual trap. The word 'Exchange' implies data movement, not data analysis. Think of Data Exchange as a courier service — it delivers the package (data) to your door (S3/Redshift), but you must open and use it yourself with other tools. Exam questions will describe an analytics need and include Data Exchange as a distractor.
Common Mistake
AWS Data Exchange stores and backs up the third-party data I subscribe to
Correct
AWS Data Exchange has NO storage layer. It is a transactional delivery mechanism. Subscribers must have their own S3 bucket (or Redshift cluster) as the destination. The data is not retained by Data Exchange after delivery — if you delete your S3 bucket, the data is gone.
Candidates conflate 'subscribing to data' with 'AWS storing it for me.' The service is stateless from a storage perspective. For exam questions about data retention, durability, or backup of third-party data, the answer always involves S3 versioning, S3 Glacier, or AWS Backup — never Data Exchange itself.
Common Mistake
Because AWS is HIPAA-eligible and PCI-compliant, using AWS Data Exchange with third-party health or financial data automatically makes my use case compliant
Correct
AWS infrastructure compliance certifications do NOT transfer to third-party data products or to the customer's use of that data. Customers must independently assess the compliance posture of each data provider and their own data handling practices. AWS Data Exchange does not certify, validate, or audit the content of data products.
This is a classic shared responsibility model trap applied to data products. AWS is responsible for the security OF the exchange infrastructure; you are responsible for the security and compliance IN the data you consume. Exam scenarios about regulated data (PHI, PCI) require the customer to implement their own controls (encryption with KMS, access control with IAM, classification with Macie).
Common Mistake
I can use AWS Data Exchange to share data between my own AWS accounts within my organization instead of setting up S3 cross-account access
Correct
AWS Data Exchange is a commercial marketplace for third-party data products — it is NOT designed for internal account-to-account data sharing within an organization. For internal sharing, use S3 bucket policies with cross-account IAM roles, AWS Resource Access Manager (RAM), Amazon Redshift datashares (directly), or AWS Lake Formation — these are purpose-built for intra-organization data sharing.
The word 'Exchange' misleads candidates into thinking it's a general data sharing tool. The key differentiator: Data Exchange involves AWS Marketplace transactions, provider/subscriber commercial relationships, and entitlement management. Internal sharing has none of these requirements and should use simpler, cheaper, purpose-built mechanisms.
Common Mistake
AWS Data Exchange includes built-in Data Loss Prevention (DLP) to automatically detect and protect sensitive data in subscribed datasets
Correct
AWS Data Exchange has absolutely NO built-in DLP capabilities. To classify and protect sensitive data in third-party datasets, you must export the data to Amazon S3 first, then enable Amazon Macie on that S3 bucket. Macie is the AWS-native DLP service — it is completely separate from Data Exchange.
This trap appears because candidates assume a 'data service' must include data protection features. The architectural pattern to memorize: Data Exchange (delivery) → S3 (storage) → Macie (DLP classification) → CloudTrail (audit). Each service has a distinct, non-overlapping role.
Common Mistake
AWS Data Exchange subscription charges appear as a separate 'AWS Data Exchange' line item on my AWS bill
Correct
All AWS Data Exchange subscription charges appear under the AWS Marketplace line item on your AWS bill — not as a standalone Data Exchange charge. Additionally, data transfer costs (e.g., cross-region S3 exports) appear as separate standard AWS data transfer charges.
For cost optimization and billing questions, candidates must know where charges appear. Consolidated billing for AWS Organizations will show Marketplace charges aggregated — this matters when allocating costs across business units using cost allocation tags or AWS Cost Explorer.
DELIVER, DON'T ANALYZE: Data Exchange = D-E-L-I-V-E-R-Y only. For analytics, you need the A-team: Athena, Amazon Redshift, AWS Glue.
The COURIER Model: Data Exchange is like FedEx — it delivers the package (data) to your S3 door. FedEx doesn't store your packages, analyze their contents, or protect what's inside. YOU handle storage (S3), analysis (Athena/Redshift), and protection (Macie/KMS).
MARKETPLACE ≠ SHARING: If money changes hands or there's a provider/subscriber commercial relationship → Data Exchange. If it's your own accounts/org → S3 policies, RAM, or Lake Formation.
EVENTBRIDGE IS THE TRIGGER: New revision published → EventBridge fires → Lambda runs → S3 gets data → Glue/Athena analyzes. Remember: E-L-S-G (EventBridge, Lambda, S3, Glue) is the automated Data Exchange pipeline chain.
CertAI Tutor · SAA-C03, SAP-C02, CLF-C02 · 2026-02-22