
Discover, share, and govern data across your organization with built-in catalog, access control, and collaboration
Amazon DataZone is a data management service that makes it easy to catalog, discover, share, and govern data stored across AWS, on-premises, and third-party sources. It enables data producers to publish data assets to a searchable business catalog, while data consumers can discover and request access to those assets through a self-service portal. DataZone enforces fine-grained access controls and maintains data governance policies across the entire data lifecycle.
Enable secure, governed data sharing and collaboration across business units and teams without compromising compliance or security
Data Domain
Top-level organizational unit that defines the governance boundary, the portal URL, and the associated AWS account
Business Data Catalog
Searchable catalog of data assets with business metadata, tags, glossary terms, and lineage information
Data Projects
Collaboration workspaces within a domain where producers and consumers interact with data assets
Subscription-Based Access Control
Consumers request access to assets; producers or domain admins approve; access grants are automated via Lake Formation or IAM
Business Glossary
Domain-wide dictionary of business terms linked to data assets for semantic consistency
Metadata Forms
Custom schemas to attach structured business metadata to data assets
Data Lineage
Tracks the origin, movement, and transformation of data assets across the pipeline
AWS Glue Data Catalog Integration
DataZone can import table metadata from the Glue Data Catalog through data source runs and publish those tables as data assets automatically
Amazon Redshift Integration
Publish Redshift tables/views as data assets and manage access via DataZone subscriptions
Amazon Athena Integration
Consumers can query approved S3-based assets via Athena within a DataZone environment
AWS Lake Formation Integration
DataZone uses Lake Formation to enforce fine-grained column and row-level access controls on approved subscriptions
Cross-Account Data Sharing
Data assets can be shared across AWS accounts within a DataZone domain using AWS RAM and Lake Formation
DataZone Portal (Web UI)
Self-service web portal for producers to publish and consumers to discover and request data assets
API and SDK Access
Full API support for automating DataZone operations including asset publishing, subscription management, and catalog queries
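As a rough illustration of the API automation described above, the sketch below builds the keyword arguments for two DataZone operations as they are exposed by the boto3 `datazone` client (`search_listings` and `create_subscription_request`). The helper names, the identifiers, and the request reason are hypothetical; the parameter names follow the boto3 documentation as best understood here and should be checked against the current SDK reference before use.

```python
from typing import Any


def build_search_listings_request(domain_id: str, text: str,
                                  max_results: int = 10) -> dict[str, Any]:
    """Keyword arguments for a catalog search (SearchListings)."""
    return {
        "domainIdentifier": domain_id,
        "searchText": text,
        "maxResults": max_results,
    }


def build_subscription_request(domain_id: str, listing_id: str,
                               consumer_project_id: str,
                               reason: str) -> dict[str, Any]:
    """Keyword arguments for CreateSubscriptionRequest.

    The consumer project is the principal that asks for access
    to a published listing.
    """
    return {
        "domainIdentifier": domain_id,
        "requestReason": reason,
        "subscribedListings": [{"identifier": listing_id}],
        "subscribedPrincipals": [{"project": {"identifier": consumer_project_id}}],
    }


def submit_subscription_request(request_kwargs: dict[str, Any]) -> dict[str, Any]:
    """Send the request. Needs boto3 and AWS credentials, so the import is
    deferred until the function is actually called."""
    import boto3  # third-party dependency, assumed to be installed

    client = boto3.client("datazone")
    return client.create_subscription_request(**request_kwargs)
```

A caller would typically search first, pick a listing identifier from the results, and pass it to `build_subscription_request`; nothing is sent to AWS until `submit_subscription_request` is invoked.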
Amazon SageMaker Integration
ML teams can discover and use governed datasets from DataZone directly in SageMaker environments
EventBridge Integration
DataZone publishes events (e.g., subscription approved, asset published) to EventBridge for workflow automation
CloudTrail Logging
All DataZone API calls are logged in CloudTrail for audit and compliance purposes
IAM Integration
IAM policies control who can create domains, manage projects, and publish assets
Data Mesh Architecture Support
DataZone is purpose-built to support federated data mesh patterns with domain ownership and self-serve infrastructure
Third-Party Data Source Connectors
DataZone supports connecting to sources beyond AWS, including on-premises databases via custom connectors
Automated Asset Publishing from Glue Catalog
High frequency: DataZone connects to the Glue Data Catalog as a data source and imports table metadata on each data source run, publishing the tables as searchable data assets in the DataZone catalog. This is the most common pattern for S3-based data lakes.
Fine-Grained Access Control Enforcement
High frequency: When a DataZone subscription is approved, Lake Formation automatically grants the appropriate column-level, row-level, or table-level permissions to the consumer. This decouples access policy from data location.
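To make the kind of grant involved more concrete, here is a minimal sketch of the request shape for the Lake Formation `GrantPermissions` operation, covering a table-level and a column-level `SELECT`. DataZone issues grants of this kind on your behalf after approval, so this is not something you normally call by hand; the helper name, role ARN, database, table, and column names are all illustrative assumptions.

```python
from typing import Any, Optional, Sequence


def build_select_grant(principal_arn: str, database: str, table: str,
                       columns: Optional[Sequence[str]] = None) -> dict[str, Any]:
    """Build keyword arguments for lakeformation.grant_permissions.

    With `columns`, the grant is limited to those columns
    (TableWithColumns); without it, the grant covers the whole table.
    """
    if columns:
        resource = {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        }
    else:
        resource = {"Table": {"DatabaseName": database, "Name": table}}

    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": resource,
        "Permissions": ["SELECT"],
    }


# Hypothetical consumer role and table, for illustration only.
grant = build_select_grant(
    "arn:aws:iam::111122223333:role/example-consumer-role",
    database="sales",
    table="orders",
    columns=["order_id", "order_date"],
)
```

Row-level restrictions use Lake Formation data filters rather than this resource shape, so they are left out of the sketch.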
Data Warehouse Asset Governance
High frequency: Redshift tables and views are registered as DataZone assets. When consumers subscribe, DataZone orchestrates Redshift data sharing grants, enabling governed SQL access without direct credential sharing.
Self-Service Governed Query Access
High frequency: Data consumers discover S3-based assets in DataZone, subscribe and receive approval, then query the data using Athena within a DataZone-managed environment, with Lake Formation enforcing access.
Cross-Account Data Mesh
High frequency: A DataZone domain can span multiple AWS accounts. Producers in account A publish assets; consumers in account B discover and subscribe. AWS RAM and Lake Formation handle cross-account permission grants automatically.
Event-Driven Governance Workflow Automation
Medium frequency: DataZone publishes domain events (asset published, subscription requested, subscription approved/rejected) to EventBridge. Downstream Lambda functions or Step Functions automate notifications, provisioning, or compliance logging.
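A minimal sketch of this pattern, assuming a Lambda function targeted by an EventBridge rule: the rule pattern matches events whose source is `aws.datazone`, and the handler routes on the `detail-type` field. The specific detail-type strings and the action names returned by the handler are assumptions for illustration; confirm the exact event names in the DataZone events documentation.

```python
import json

# EventBridge rule pattern. The detail-type values are assumed names.
EVENT_PATTERN = {
    "source": ["aws.datazone"],
    "detail-type": [
        "Subscription Request Created",
        "Subscription Request Accepted",
        "Subscription Request Rejected",
    ],
}

# Hypothetical mapping from event type to a downstream action.
ACTIONS = {
    "Subscription Request Created": "notify_approvers",
    "Subscription Request Accepted": "log_approval",
    "Subscription Request Rejected": "notify_requester",
}


def handler(event, context=None):
    """Lambda-style entry point: decide what to do with one event."""
    if event.get("source") != "aws.datazone":
        return {"action": "ignored"}
    detail_type = event.get("detail-type", "")
    return {
        "action": ACTIONS.get(detail_type, "ignored"),
        "detail_type": detail_type,
    }


if __name__ == "__main__":
    # Print the pattern in the JSON form an EventBridge rule expects.
    print(json.dumps(EVENT_PATTERN, indent=2))
```

Keeping the routing table separate from the handler makes it easy to add a Step Functions trigger or a compliance logger for a new event type without touching the rule logic.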
Governed ML Feature and Dataset Discovery
Medium frequency: ML engineers discover approved datasets and feature stores via the DataZone catalog, then import them directly into SageMaker Studio environments with governance controls enforced automatically.
Audit Trail for Data Access Governance
Medium frequency: All DataZone API operations (asset publishing, subscription approvals, catalog searches) are captured in CloudTrail, enabling compliance audits and security investigations.
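As a small illustration of such an audit, the sketch below counts DataZone API calls per principal from already-parsed CloudTrail records. It relies on the standard CloudTrail record fields `eventSource`, `eventName`, and `userIdentity.arn`, with `datazone.amazonaws.com` as the event source; the sample records and ARNs are invented for the example.

```python
from collections import Counter


def datazone_activity(records):
    """Count (principal ARN, API operation) pairs for DataZone calls only."""
    counts = Counter()
    for record in records:
        if record.get("eventSource") != "datazone.amazonaws.com":
            continue  # skip calls made to other services
        principal = record.get("userIdentity", {}).get("arn", "unknown")
        counts[(principal, record.get("eventName", "unknown"))] += 1
    return counts


# Invented sample records, shaped like entries in a CloudTrail log file.
SAMPLE_RECORDS = [
    {"eventSource": "datazone.amazonaws.com",
     "eventName": "AcceptSubscriptionRequest",
     "userIdentity": {"arn": "arn:aws:sts::111122223333:assumed-role/steward/alice"}},
    {"eventSource": "datazone.amazonaws.com",
     "eventName": "Search",
     "userIdentity": {"arn": "arn:aws:sts::111122223333:assumed-role/analyst/bob"}},
    {"eventSource": "s3.amazonaws.com",
     "eventName": "GetObject",
     "userIdentity": {"arn": "arn:aws:sts::111122223333:assumed-role/analyst/bob"}},
]

activity = datazone_activity(SAMPLE_RECORDS)
```

In practice the records would come from the CloudTrail log files delivered to S3 or from a CloudTrail Lake query rather than an in-memory list.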
Federated Identity for DataZone Portal
Medium frequency: IAM Identity Center (SSO) provides federated login to the DataZone web portal, enabling enterprise users to access the data catalog with their corporate credentials without individual IAM users.
DataZone uses a DOMAIN as the top-level governance boundary — everything (projects, catalog, glossary, portal) exists within a domain. On exams, if a question asks how to create separate governance boundaries for different business units, the answer is separate DataZone domains.
The subscription model is the core access control mechanism in DataZone: consumers do NOT get direct IAM or Redshift credentials. They subscribe to assets, a producer or domain admin approves the request, and DataZone provisions the access automatically (through Lake Formation for Glue and S3-based assets, through Redshift data sharing for Redshift assets). If an exam asks about self-service governed data access, DataZone subscriptions are the answer.
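The request, approve, grant flow can be pictured with a toy model. This is a conceptual sketch only, not the DataZone API: every class, field, and project name below is hypothetical, and approval is simplified so that only the asset's owning project may decide a request.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "PENDING"
    ACCEPTED = "ACCEPTED"
    REJECTED = "REJECTED"


@dataclass
class Asset:
    name: str
    owning_project: str  # the producer project that published the asset


@dataclass
class SubscriptionRequest:
    asset: Asset
    consumer_project: str
    status: Status = Status.PENDING


@dataclass
class Domain:
    name: str
    grants: list = field(default_factory=list)  # (consumer project, asset name)

    def request(self, asset: Asset, consumer_project: str) -> SubscriptionRequest:
        """A consumer asks for access; no credentials change hands yet."""
        return SubscriptionRequest(asset, consumer_project)

    def decide(self, req: SubscriptionRequest, approver_project: str,
               accept: bool) -> SubscriptionRequest:
        """The owning project accepts or rejects a pending request.

        Acceptance records a grant, standing in for the permissions the
        enforcement layer would provision automatically.
        """
        if approver_project != req.asset.owning_project:
            raise PermissionError("only the owning project may decide")
        if req.status is not Status.PENDING:
            raise ValueError("request has already been decided")
        req.status = Status.ACCEPTED if accept else Status.REJECTED
        if accept:
            self.grants.append((req.consumer_project, req.asset.name))
        return req
```

The point of the model is that the consumer never handles a credential: access exists only as a grant that appears after an approval decision.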
DataZone does NOT store or move data — it only manages metadata, governance policies, and access grants. The actual data stays in S3, Redshift, or wherever it lives. Confusing DataZone as a data movement service is a common exam trap.
Lake Formation is the enforcement engine behind DataZone access control. When a DataZone subscription is approved, it's Lake Formation that actually grants the permissions. Exam questions may ask which service enforces the fine-grained permissions in a DataZone workflow — the answer is Lake Formation.
DataZone supports DATA MESH architecture natively — it's purpose-built for federated ownership where domain teams own and publish their own data products. If an exam scenario describes a data mesh with multiple business domains needing governed sharing, DataZone is almost certainly the right answer.
DataZone = Business data governance platform with self-service portal, subscription-based access, and business glossary. It is NOT a data movement service and does NOT store data — it only manages metadata and governs access to data in its original location.
The DataZone governance stack = DataZone (catalog + workflow) + Lake Formation (permission enforcement) + Glue Data Catalog (technical metadata source). All three work together; DataZone alone is not sufficient for complete fine-grained governance.
Data Mesh architecture on AWS = Amazon DataZone. Whenever an exam scenario describes federated data ownership, cross-domain data sharing with governance, or self-service data access with approval workflows across business units, DataZone is the purpose-built answer.
The DataZone Business Glossary is a differentiating feature — it links business terminology to technical data assets. AWS Glue Data Catalog does NOT have this business glossary capability. Exam questions testing the difference between Glue Catalog and DataZone often hinge on this feature.
DataZone integrates with EventBridge to emit events for key lifecycle actions (asset published, subscription requested, subscription approved). Use this pattern for automated governance workflows. Exam questions about automating responses to DataZone governance events point to EventBridge.
IAM Identity Center (SSO) is the recommended identity provider for the DataZone portal. Users log in with corporate credentials via SSO — they do not need individual IAM users. Exam scenarios about enterprise users accessing the DataZone portal should trigger this knowledge.
DataZone CloudTrail integration means ALL catalog operations are auditable — asset publishing, subscription approvals, and searches. For compliance and audit exam scenarios involving data governance, DataZone + CloudTrail is the complete answer.
DataZone PROJECTS are collaboration workspaces within a domain — they are NOT AWS projects, CodeBuild projects, or any other AWS service concept. Projects group producers and consumers together and define the scope of data sharing. This terminology distinction appears on exams.
Common Mistake
Amazon DataZone is just another name for AWS Glue Data Catalog with extra features
Correct
DataZone is a separate, higher-level governance service that CAN use Glue Data Catalog as a data source but adds business catalog capabilities (glossaries, metadata forms, subscription-based access, self-service portal) that Glue Data Catalog does not provide. Glue Data Catalog is a technical metadata store; DataZone is a business data governance platform.
This is the #1 confusion on exams. Remember: Glue = technical schema catalog for ETL; DataZone = business data marketplace with governance. If the scenario mentions business users, self-service access, or cross-team sharing with approvals, it's DataZone, not Glue.
Common Mistake
DataZone moves or copies data to a central repository so users can access it
Correct
DataZone NEVER moves or copies data. It is purely a metadata, governance, and access management layer. Data stays in its original location (S3, Redshift, etc.). DataZone manages who can access it and provides a catalog to find it, but the data itself is never touched or replicated by DataZone.
Exam questions may describe a scenario and ask which service 'provides access to data' — DataZone provides governed ACCESS (metadata + permissions), not data movement. Confusing this leads to choosing DataZone when a data transfer service is needed, or vice versa.
Common Mistake
AWS Lake Formation and Amazon DataZone are competing services that do the same thing
Correct
Lake Formation and DataZone are complementary. Lake Formation is the fine-grained access control and permissions enforcement engine for data lakes. DataZone is the business catalog, discovery, and governance workflow layer that USES Lake Formation to enforce the permissions it grants. Together they form a complete governance solution — DataZone for the 'what and who', Lake Formation for the 'how access is enforced'.
Exam scenarios often test whether you know that DataZone orchestrates Lake Formation — when a DataZone subscription is approved, Lake Formation is what actually grants the table/column/row permissions. Choosing one over the other is wrong; the correct answer often involves both.
Common Mistake
DataZone can only govern data stored in AWS — it cannot connect to on-premises or third-party data sources
Correct
DataZone supports connecting to on-premises databases and third-party data sources through custom connectors, not just AWS-native services. While its deepest integrations are with AWS services (Glue, Redshift, S3), it is not limited to AWS-only data sources.
Hybrid data governance scenarios on exams may include on-premises data sources. Eliminating DataZone because you think it's AWS-only would be incorrect.
Common Mistake
Any IAM user with S3 access can bypass DataZone governance and access data directly
Correct
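An IAM admin CAN technically grant direct S3 access outside of DataZone, and Lake Formation does not by itself block a principal whose IAM and bucket policies allow direct S3 reads. Lake Formation enforces permissions when data is accessed through integrated services such as Athena, Redshift Spectrum, and Glue. In a properly governed DataZone + Lake Formation architecture, the S3 locations are registered with Lake Formation, Lake Formation is set as the authoritative access control layer, and direct S3 permissions on those locations are restricted to the roles that genuinely need them. Governance is then enforced at the service level, not just at the portal level.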
Exam questions about governance bypass risks test whether you understand that DataZone governance is only as strong as the underlying Lake Formation configuration. DataZone alone without Lake Formation enforcement can be bypassed via direct IAM/S3 access.
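One configuration check that follows from this: in Lake Formation, tables that still carry a grant to the special `IAM_ALLOWED_PRINCIPALS` group are governed by IAM policies alone, so Lake Formation is not yet the authoritative layer for them. The sketch below scans entries shaped like the `PrincipalResourcePermissions` items returned by the Lake Formation `ListPermissions` operation; the function name and the sample entries are hypothetical.

```python
IAM_ALLOWED = "IAM_ALLOWED_PRINCIPALS"


def tables_open_to_iam(permission_entries):
    """Return sorted (database, table) pairs still granted to the
    IAM_ALLOWED_PRINCIPALS group."""
    open_tables = set()
    for entry in permission_entries:
        principal = entry.get("Principal", {}).get("DataLakePrincipalIdentifier")
        table = entry.get("Resource", {}).get("Table")
        if principal == IAM_ALLOWED and table:
            # Wildcard grants carry no table name, so report them as "*".
            open_tables.add((table["DatabaseName"], table.get("Name", "*")))
    return sorted(open_tables)


# Invented sample entries for illustration.
SAMPLE_ENTRIES = [
    {"Principal": {"DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"},
     "Resource": {"Table": {"DatabaseName": "sales", "Name": "orders"}},
     "Permissions": ["ALL"]},
    {"Principal": {"DataLakePrincipalIdentifier":
                   "arn:aws:iam::111122223333:role/example-consumer-role"},
     "Resource": {"Table": {"DatabaseName": "sales", "Name": "customers"}},
     "Permissions": ["SELECT"]},
]

needs_review = tables_open_to_iam(SAMPLE_ENTRIES)
```

A non-empty result suggests reviewing those tables before relying on subscription approvals as the only path to the data.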
Common Mistake
DataZone is only relevant for the Data Analytics specialty exam
Correct
DataZone concepts appear on the AWS Solutions Architect Associate, Solutions Architect Professional, and Data Engineer Associate exams because it represents a modern architectural pattern (data mesh, governed data sharing, self-service analytics) that architects must know. It is not limited to specialty certifications.
Candidates preparing for SA-level exams sometimes skip DataZone assuming it's too specialized. Architectural questions about cross-team data governance, data mesh, or self-service data access can and do appear on general architect exams.
DataZone = DOGS: Discover (catalog search), Own (domain governance), Govern (policies + Lake Formation), Share (subscription-based access) — the four core pillars of DataZone
Remember the DataZone hierarchy: DOMAIN → PROJECT → ASSET → SUBSCRIPTION (DPAS) — like a company org chart: Company → Team → Product → Sale
DataZone does NOT move data — think of it as a LIBRARY CARD SYSTEM: the library catalog (DataZone) tells you what books (data) exist and where they are, and the librarian (Lake Formation) checks your card before letting you read them. The books never leave the shelf (original source).
Glue vs DataZone: Glue is for ENGINEERS (technical schemas, ETL jobs), DataZone is for EVERYONE (business catalog, self-service portal, governance workflows)
CertAI Tutor · 2026-02-22