Amazon DataZone: The Data Governance Powerhouse

Discover, share, and govern data across your organization with built-in catalog, access control, and collaboration

Updated 2026-02-22

Overview

Amazon DataZone is a data management service that makes it easy to catalog, discover, share, and govern data stored across AWS, on-premises, and third-party sources. It enables data producers to publish data assets to a searchable business catalog, while data consumers can discover and request access to those assets through a self-service portal. DataZone enforces fine-grained access controls and maintains data governance policies across the entire data lifecycle.

Enable secure, governed data sharing and collaboration across business units and teams without compromising compliance or security

Use When

Building an enterprise data marketplace where multiple teams need to discover and share datasets across accounts
Enforcing data governance policies and access controls across a data mesh architecture
Creating a searchable business data catalog that maps technical assets to business terms and glossaries
Enabling self-service data access with automated approval workflows so data consumers can request access without manual IT intervention
Managing data lineage and metadata for compliance and auditing purposes across heterogeneous data sources

Avoid When

Simple ETL pipelines with no governance requirements — use AWS Glue directly without the overhead of DataZone's governance layer
Single-team data warehousing with no cross-team sharing needs — Amazon Redshift or Athena alone is sufficient and cheaper
Real-time streaming data governance — DataZone is optimized for batch/cataloged assets, not real-time stream governance (use Kinesis + Lake Formation instead)
Lightweight metadata tagging only — AWS Glue Data Catalog with resource tagging may be sufficient and more cost-effective

Key Features

Data Domain

Top-level organizational unit that defines governance boundary, portal URL, and associated AWS account

Business Data Catalog

Searchable catalog of data assets with business metadata, tags, glossary terms, and lineage information

Data Projects

Collaboration workspaces within a domain where producers and consumers interact with data assets

Subscription-Based Access Control

Consumers request access to assets; producers or domain admins approve; for managed assets, DataZone then fulfills the grant automatically through Lake Formation (Glue tables) or Amazon Redshift

Business Glossary

Domain-wide dictionary of business terms linked to data assets for semantic consistency

Metadata Forms

Custom schemas to attach structured business metadata to data assets

Data Lineage

Tracks the origin, movement, and transformation of data assets across the pipeline

AWS Glue Data Catalog Integration

A DataZone data source can import Glue Data Catalog tables on demand or on a schedule and publish them as data assets automatically

Amazon Redshift Integration

Publish Redshift tables/views as data assets and manage access via DataZone subscriptions

Amazon Athena Integration

Consumers can query approved S3-based assets via Athena within a DataZone environment

AWS Lake Formation Integration

DataZone uses Lake Formation to enforce fine-grained column and row-level access controls on approved subscriptions

Cross-Account Data Sharing

Data assets can be shared across AWS accounts within a DataZone domain using RAM and Lake Formation

DataZone Portal (Web UI)

Self-service web portal for producers to publish and consumers to discover and request data assets

API and SDK Access

Full API support for automating DataZone operations including asset publishing, subscription management, and catalog queries

Amazon SageMaker Integration

ML teams can discover and use governed datasets from DataZone directly in SageMaker environments

EventBridge Integration

DataZone publishes events (e.g., subscription approved, asset published) to EventBridge for workflow automation

CloudTrail Logging

All DataZone API calls are logged in CloudTrail for audit and compliance purposes

IAM Integration

IAM policies control who can create domains, manage projects, and publish assets

Data Mesh Architecture Support

DataZone is purpose-built to support federated data mesh patterns with domain ownership and self-serve infrastructure

Third-Party Data Source Connectors

DataZone can catalog sources beyond AWS, including on-premises databases, through custom connectors; the deepest, fully managed integrations remain the AWS-native ones

Integration Patterns

Automated Asset Publishing from Glue Catalog

high freq

Amazon DataZone + AWS Glue Data Catalog

DataZone connects to the Glue Data Catalog through a data source, imports table metadata on demand or on a schedule, and publishes the tables as searchable data assets in the DataZone catalog. This is the most common pattern for S3-based data lakes.
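As a rough sketch of what registering such a data source looks like programmatically, the helper below builds the keyword arguments for the boto3 `datazone` client's `create_data_source` call. The parameter layout reflects the CreateDataSource API as understood here and should be checked against the current boto3 reference; the function name, the `glue-` naming scheme, and all identifiers are hypothetical.

```python
from typing import Any


def glue_data_source_request(
    domain_id: str,
    project_id: str,
    environment_id: str,
    database_name: str,
    publish_on_import: bool = False,
) -> dict[str, Any]:
    """Build arguments for datazone.create_data_source (assumed shape, verify first)."""
    return {
        "domainIdentifier": domain_id,
        "projectIdentifier": project_id,
        "environmentIdentifier": environment_id,
        "name": f"glue-{database_name}",  # hypothetical naming convention
        "type": "GLUE",
        "configuration": {
            "glueRunConfiguration": {
                "relationalFilterConfigurations": [
                    {
                        "databaseName": database_name,
                        # Include every table in the database.
                        "filterExpressions": [{"expression": "*", "type": "INCLUDE"}],
                    }
                ]
            }
        },
        # When True, imported tables are published to the catalog automatically.
        "publishOnImport": publish_on_import,
    }


request = glue_data_source_request("dzd_example", "prj_example", "env_example", "sales_db", True)
# With AWS credentials configured, the request could be sent as:
#   boto3.client("datazone").create_data_source(**request)
```

Leaving `publish_on_import` off keeps imported tables in the project inventory so a producer can curate the metadata before publishing.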

Fine-Grained Access Control Enforcement

high freq

Amazon DataZone + AWS Lake Formation

When a DataZone subscription is approved, DataZone calls Lake Formation to grant the appropriate column-level, row-level, or table-level permissions to the consumer. Access policy is therefore defined in the catalog workflow rather than hard-coded against a storage location.
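To make the column-level grant concrete, here is a minimal sketch of the kind of Lake Formation request that such a fulfillment corresponds to, expressed as arguments for the boto3 `lakeformation` client's `grant_permissions` call. DataZone issues its own grants on approval, so this is only an illustration; the helper name, role ARN, database, table, and column names are all hypothetical.

```python
from typing import Any, Iterable


def column_grant_request(
    consumer_role_arn: str,
    database: str,
    table: str,
    columns: Iterable[str],
) -> dict[str, Any]:
    """Build arguments for lakeformation.grant_permissions limited to named columns."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": consumer_role_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        # Read-only access: the consumer can SELECT only the listed columns.
        "Permissions": ["SELECT"],
    }


grant = column_grant_request(
    "arn:aws:iam::111122223333:role/consumer-project-role",  # hypothetical role
    "sales_db",
    "orders",
    ["order_id", "order_date", "region"],
)
# With credentials configured: boto3.client("lakeformation").grant_permissions(**grant)
```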

Data Warehouse Asset Governance

high freq

Amazon DataZone + Amazon Redshift

Redshift tables and views are registered as DataZone assets. When consumers subscribe, DataZone orchestrates Redshift data sharing grants, enabling governed SQL access without direct credential sharing.

Self-Service Governed Query Access

high freq

Amazon DataZone + Amazon Athena

Data consumers discover S3-based assets in DataZone, subscribe and receive approval, then query the data using Athena within a DataZone-managed environment with Lake Formation enforcing access.
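The subscribe, approve, and query steps can be sketched as three request payloads. The first two target the boto3 `datazone` client (`create_subscription_request` and `accept_subscription_request`) and the third targets `athena.start_query_execution`. The DataZone parameter names are written as understood here and should be verified against the current API reference; the helper names and every identifier are hypothetical.

```python
from typing import Any


def subscription_request(
    domain_id: str, listing_id: str, consumer_project_id: str, reason: str
) -> dict[str, Any]:
    """Consumer side: arguments for datazone.create_subscription_request."""
    return {
        "domainIdentifier": domain_id,
        "requestReason": reason,
        "subscribedListings": [{"identifier": listing_id}],
        "subscribedPrincipals": [{"project": {"identifier": consumer_project_id}}],
    }


def approval(domain_id: str, request_id: str, comment: str) -> dict[str, Any]:
    """Producer side: arguments for datazone.accept_subscription_request."""
    return {
        "domainIdentifier": domain_id,
        "identifier": request_id,
        "decisionComment": comment,
    }


def athena_query(database: str, table: str, workgroup: str, limit: int = 10) -> dict[str, Any]:
    """After approval: arguments for athena.start_query_execution."""
    return {
        "QueryString": f'SELECT * FROM "{table}" LIMIT {int(limit)}',
        "QueryExecutionContext": {"Database": database},
        "WorkGroup": workgroup,
    }


steps = [
    subscription_request("dzd_example", "listing_example", "prj_consumer", "Quarterly churn analysis"),
    approval("dzd_example", "req_example", "Approved for analytics use"),
    athena_query("consumer_sub_db", "orders", "consumer-workgroup"),
]
```

Nothing in these payloads carries credentials for the underlying data: the consumer asks for a listing, and access arrives through the grant that follows approval.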

Cross-Account Data Mesh

high freq

Amazon DataZone + AWS RAM + Multiple AWS Accounts

A DataZone domain can span multiple AWS accounts. Producers in account A publish assets; consumers in account B discover and subscribe. AWS RAM and Lake Formation handle the cross-account permission grants automatically.

Event-Driven Governance Workflow Automation

medium freq

Amazon DataZone + Amazon EventBridge

DataZone publishes domain events (asset published, subscription requested, subscription approved/rejected) to EventBridge. Downstream Lambda functions or Step Functions automate notifications, provisioning, or compliance logging.
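A minimal sketch of the downstream side: an EventBridge rule pattern that matches DataZone events, plus a Lambda-style handler that routes them to an action. The source value `aws.datazone` follows the usual AWS naming for service events, and the `detail-type` strings in the routing table are placeholders; both are assumptions to confirm against the events your domain actually emits.

```python
import json
from typing import Any

# Rule pattern: match every event emitted by DataZone (assumed source name).
EVENT_PATTERN = {"source": ["aws.datazone"]}

# Hypothetical detail-type strings mapped to the follow-up action to take.
ROUTES = {
    "Subscription Request Created": "notify_approvers",
    "Subscription Request Accepted": "log_approval",
    "Subscription Request Rejected": "notify_requester",
}


def lambda_handler(event: dict[str, Any], context: Any = None) -> dict[str, Any]:
    """Decide which follow-up action a DataZone event should trigger."""
    if event.get("source") != "aws.datazone":
        return {"action": "ignored", "detail": {}}
    # Unknown event types are still recorded for compliance logging.
    action = ROUTES.get(event.get("detail-type", ""), "log_only")
    return {"action": action, "detail": event.get("detail", {})}


sample = {
    "source": "aws.datazone",
    "detail-type": "Subscription Request Created",
    "detail": {"requestId": "req_example"},  # hypothetical payload
}
print(json.dumps(EVENT_PATTERN))
print(lambda_handler(sample)["action"])
```

Keeping the routing table as data means new event types can be handled by adding an entry rather than changing the handler logic.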

Governed ML Feature and Dataset Discovery

medium freq

Amazon DataZone + Amazon SageMaker

ML engineers discover approved datasets and feature stores via DataZone catalog, then import them directly into SageMaker Studio environments with governance controls enforced automatically.

Audit Trail for Data Access Governance

medium freq

Amazon DataZone + AWS CloudTrail

All DataZone API operations — asset publishing, subscription approvals, catalog searches — are captured in CloudTrail, enabling compliance audits and security investigations.
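For an audit, the captured records can be summarized with a few lines of code. The sketch below counts DataZone API calls per operation from records in the CloudTrail log file format (the `Records` array delivered to S3). The event source string `datazone.amazonaws.com` follows the usual CloudTrail convention and is an assumption, and the sample records are invented for illustration.

```python
from collections import Counter
from typing import Any, Iterable

DATAZONE_EVENT_SOURCE = "datazone.amazonaws.com"  # assumed; confirm in your trail


def summarize_datazone_calls(records: Iterable[dict[str, Any]]) -> Counter:
    """Count DataZone API calls per operation name, ignoring other services."""
    return Counter(
        record.get("eventName", "Unknown")
        for record in records
        if record.get("eventSource") == DATAZONE_EVENT_SOURCE
    )


# Invented sample records for illustration only.
records = [
    {"eventSource": "datazone.amazonaws.com", "eventName": "AcceptSubscriptionRequest"},
    {"eventSource": "datazone.amazonaws.com", "eventName": "SearchListings"},
    {"eventSource": "datazone.amazonaws.com", "eventName": "SearchListings"},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
]
summary = summarize_datazone_calls(records)
```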

Federated Identity for DataZone Portal

medium freq

Amazon DataZone + AWS IAM Identity Center

IAM Identity Center (SSO) provides federated login to the DataZone web portal, so enterprise users can access the data catalog with their corporate credentials without needing individual IAM users.

Service Limits & Quotas

Limit: DataZone domains per AWS account
Value: Not explicitly published in live docs; consult the Service Quotas console (unit: domains)
Note: Do not confuse DataZone domains with Route 53 hosted zones or IAM identity domains; they are DataZone-specific organizational constructs

Limit: Projects per domain
Value: Not explicitly published in live docs; consult the Service Quotas console (unit: projects)
Note: Projects are a DataZone-specific concept; do not confuse them with AWS CodeBuild projects or Lake Formation governed tables

Limit: Data sources per project
Value: Not explicitly published in live docs; consult the Service Quotas console (unit: data sources)
Note: A data source in DataZone is a connection configuration, not the data itself; the actual data stays in its original location

Limit: Metadata forms per asset
Value: Not explicitly published in live docs; consult the Service Quotas console (unit: forms)
Note: Metadata forms are customizable schemas; they are distinct from AWS Glue table schemas and carry business context rather than technical schema

Limit: Subscription requests per asset
Value: Not explicitly published in live docs; consult the Service Quotas console (unit: requests)
Note: Subscriptions in DataZone are not the same as SNS subscriptions; they represent governed data access grants

Limit: Glossary terms per domain
Value: Not explicitly published in live docs; consult the Service Quotas console (unit: terms)
Note: Glossary terms are a governance feature unique to DataZone; AWS Glue Data Catalog does not natively support business glossaries at this level

Limit: API requests per second (default)
Value: Not explicitly published in live docs; consult the Service Quotas console (unit: RPS)
Note: API limits may vary by operation type (read vs. write vs. search); always validate in the Service Quotas console for your specific region
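Since the values have to be looked up per account and region, a short script is one way to do it. This sketch uses the boto3 `service-quotas` client and its `list_service_quotas` paginator; the service code `datazone` is an assumption (it can be confirmed with `list_services`), and the function names are hypothetical.

```python
from typing import Any, Iterable


def summarize_quotas(quotas: Iterable[dict[str, Any]]) -> dict[str, float]:
    """Map each quota name to its applied value from ListServiceQuotas entries."""
    return {quota["QuotaName"]: quota["Value"] for quota in quotas}


def fetch_datazone_quotas(region: str, service_code: str = "datazone") -> dict[str, float]:
    """Fetch quotas for one region; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so summarize_quotas has no AWS dependency

    client = boto3.client("service-quotas", region_name=region)
    quotas: list[dict[str, Any]] = []
    paginator = client.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode=service_code):
        quotas.extend(page["Quotas"])
    return summarize_quotas(quotas)


# Example call (not executed here because it needs credentials):
#   fetch_datazone_quotas("us-east-1")
```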

Pricing Model

Pay-as-you-go with no upfront cost; check the AWS Pricing page for the current billing dimensions

AWS has published DataZone pricing in terms of users, metadata storage, API requests, and compute units rather than a flat monthly fee per published asset, so verify any per-asset estimate against the pricing page
Programmatic access to the DataZone catalog and governance APIs counts toward request usage
Do not assume portal access is free of charge; cost follows whichever usage dimensions the current pricing model meters
Underlying AWS services (Glue, Redshift, Athena, Lake Formation, S3) are billed separately at their own rates
Cross-account data sharing relies on AWS RAM, which has no additional charge, and on Lake Formation; storage and query services in each account are still billed at their own rates
No minimum fee or commitment; cost scales with usage
Always check the AWS Pricing page for your specific region as DataZone pricing may vary by region

Exam Tips

critical: DataZone Domain Architecture

DataZone uses a DOMAIN as the top-level governance boundary — everything (projects, catalog, glossary, portal) exists within a domain. On exams, if a question asks how to create separate governance boundaries for different business units, the answer is separate DataZone domains.
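As an illustration of "one domain per governance boundary", the sketch below builds one `create_domain` request per business unit for the boto3 `datazone` client. The `name`, `description`, and `domainExecutionRole` parameters are written as understood from the CreateDomain API and should be verified; the naming scheme, business units, and role ARN are hypothetical.

```python
from typing import Any


def domain_request(business_unit: str, execution_role_arn: str) -> dict[str, Any]:
    """Build arguments for datazone.create_domain for a single business unit."""
    return {
        "name": f"{business_unit}-data-domain",  # hypothetical naming scheme
        "description": f"Governance boundary for the {business_unit} business unit",
        "domainExecutionRole": execution_role_arn,
    }


def domains_for(business_units: list[str], execution_role_arn: str) -> list[dict[str, Any]]:
    """One request per business unit, so each unit gets its own catalog and portal."""
    return [domain_request(unit, execution_role_arn) for unit in business_units]


requests = domains_for(
    ["retail", "insurance"],
    "arn:aws:iam::111122223333:role/DataZoneDomainExecutionRole",  # hypothetical role
)
# With credentials configured, each request could be sent as:
#   boto3.client("datazone").create_domain(**request)
```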

critical: Subscription-Based Access Control

The subscription model is the core access control mechanism in DataZone — consumers do NOT get direct IAM or Redshift credentials. They subscribe to assets, approval is granted, and DataZone + Lake Formation automatically provision the access. If an exam asks about self-service governed data access, DataZone subscriptions is the answer.

critical: Metadata Management vs. Data Movement

DataZone does NOT store or move data; it only manages metadata, governance policies, and access grants. The actual data stays in S3, Redshift, or wherever it lives. Mistaking DataZone for a data movement service is a common exam trap.

critical: Lake Formation Integration

Lake Formation is the enforcement engine behind DataZone access control. When a DataZone subscription is approved, it's Lake Formation that actually grants the permissions. Exam questions may ask which service enforces the fine-grained permissions in a DataZone workflow — the answer is Lake Formation.

critical: Data Mesh Architecture

DataZone supports DATA MESH architecture natively — it's purpose-built for federated ownership where domain teams own and publish their own data products. If an exam scenario describes a data mesh with multiple business domains needing governed sharing, DataZone is almost certainly the right answer.

critical

DataZone = Business data governance platform with self-service portal, subscription-based access, and business glossary. It is NOT a data movement service and does NOT store data — it only manages metadata and governs access to data in its original location.

critical

The DataZone governance stack = DataZone (catalog + workflow) + Lake Formation (permission enforcement) + Glue Data Catalog (technical metadata source). All three work together; DataZone alone is not sufficient for complete fine-grained governance.

critical

Data Mesh architecture on AWS = Amazon DataZone. Whenever an exam scenario describes federated data ownership, cross-domain data sharing with governance, or self-service data access with approval workflows across business units, DataZone is the purpose-built answer.

important: Business Glossary vs. Glue Data Catalog

The DataZone Business Glossary is a differentiating feature — it links business terminology to technical data assets. AWS Glue Data Catalog does NOT have this business glossary capability. Exam questions testing the difference between Glue Catalog and DataZone often hinge on this feature.

important: EventBridge Integration

DataZone integrates with EventBridge to emit events for key lifecycle actions (asset published, subscription requested, subscription approved). Use this pattern for automated governance workflows. Exam questions about automating responses to DataZone governance events point to EventBridge.

important: IAM Identity Center Integration

IAM Identity Center (SSO) is the recommended identity provider for the DataZone portal. Users log in with corporate credentials via SSO — they do not need individual IAM users. Exam scenarios about enterprise users accessing the DataZone portal should trigger this knowledge.

important: Audit and Compliance

DataZone CloudTrail integration means ALL catalog operations are auditable — asset publishing, subscription approvals, and searches. For compliance and audit exam scenarios involving data governance, DataZone + CloudTrail is the complete answer.

Good to Know: DataZone Project Concept

DataZone PROJECTS are collaboration workspaces within a domain — they are NOT AWS projects, CodeBuild projects, or any other AWS service concept. Projects group producers and consumers together and define the scope of data sharing. This terminology distinction appears on exams.

Common Misconceptions & Traps

Common Mistake

Amazon DataZone is just another name for AWS Glue Data Catalog with extra features

Correct

DataZone is a separate, higher-level governance service that CAN use Glue Data Catalog as a data source but adds business catalog capabilities (glossaries, metadata forms, subscription-based access, self-service portal) that Glue Data Catalog does not provide. Glue Data Catalog is a technical metadata store; DataZone is a business data governance platform.

This is the #1 confusion on exams. Remember: Glue = technical schema catalog for ETL; DataZone = business data marketplace with governance. If the scenario mentions business users, self-service access, or cross-team sharing with approvals, it's DataZone, not Glue.

Common Mistake

DataZone moves or copies data to a central repository so users can access it

Correct

DataZone NEVER moves or copies data. It is purely a metadata, governance, and access management layer. Data stays in its original location (S3, Redshift, etc.). DataZone manages who can access it and provides a catalog to find it, but the data itself is never touched or replicated by DataZone.

Exam questions may describe a scenario and ask which service 'provides access to data' — DataZone provides governed ACCESS (metadata + permissions), not data movement. Confusing this leads to choosing DataZone when a data transfer service is needed, or vice versa.

Common Mistake

AWS Lake Formation and Amazon DataZone are competing services that do the same thing

Correct

Lake Formation and DataZone are complementary. Lake Formation is the fine-grained access control and permissions enforcement engine for data lakes. DataZone is the business catalog, discovery, and governance workflow layer that USES Lake Formation to enforce the permissions it grants. Together they form a complete governance solution — DataZone for the 'what and who', Lake Formation for the 'how access is enforced'.

Exam scenarios often test whether you know that DataZone orchestrates Lake Formation — when a DataZone subscription is approved, Lake Formation is what actually grants the table/column/row permissions. Choosing one over the other is wrong; the correct answer often involves both.

Common Mistake

DataZone can only govern data stored in AWS — it cannot connect to on-premises or third-party data sources

Correct

DataZone supports connecting to on-premises databases and third-party data sources through custom connectors, not just AWS-native services. While its deepest integrations are with AWS services (Glue, Redshift, S3), it is not limited to AWS-only data sources.

Hybrid data governance scenarios on exams may include on-premises data sources. Eliminating DataZone because you think it's AWS-only would be incorrect.

Common Mistake

Any IAM user with S3 access can bypass DataZone governance and access data directly

Correct

An IAM principal that holds direct S3 permissions can still read the underlying objects without going through DataZone, because Lake Formation enforces permissions for integrated query engines such as Athena and Redshift Spectrum, not for raw S3 API calls. In a properly governed DataZone + Lake Formation architecture, data locations are registered with Lake Formation, consumers query through those engines, and direct S3 access is locked down with IAM and bucket policies. Governance is therefore enforced at the service level rather than only in the portal, but it depends on Lake Formation being the authoritative access control layer.

Exam questions about governance bypass risks test whether you understand that DataZone governance is only as strong as the underlying Lake Formation configuration. DataZone alone without Lake Formation enforcement can be bypassed via direct IAM/S3 access.

Common Mistake

DataZone is only relevant for the Data Analytics specialty exam

Correct

DataZone concepts can appear beyond specialty exams, for example on the AWS Solutions Architect Associate, Solutions Architect Professional, and Data Engineer Associate exams, because the service represents a modern architectural pattern (data mesh, governed data sharing, self-service analytics) that architects are expected to know.

Candidates preparing for SA-level exams sometimes skip DataZone assuming it's too specialized. Architectural questions about cross-team data governance, data mesh, or self-service data access can and do appear on general architect exams.

Memory Tricks

🧠

DataZone = DOGS: Discover (catalog search), Own (domain governance), Govern (policies + Lake Formation), Share (subscription-based access) — the four core pillars of DataZone

🧠

Remember the DataZone hierarchy: DOMAIN → PROJECT → ASSET → SUBSCRIPTION (DPAS) — like a company org chart: Company → Team → Product → Sale
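The hierarchy can also be pictured as nested data structures. The sketch below is a toy model for memorization only, not the DataZone API: the class names mirror the four levels, and the domain, project, and asset names are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Subscription:
    consumer_project: str
    status: str = "PENDING"  # becomes "APPROVED" once the producer accepts


@dataclass
class Asset:
    name: str
    subscriptions: list[Subscription] = field(default_factory=list)


@dataclass
class Project:
    name: str
    assets: list[Asset] = field(default_factory=list)


@dataclass
class Domain:
    name: str
    projects: list[Project] = field(default_factory=list)

    def approved_consumers(self, asset_name: str) -> list[str]:
        """Walk DOMAIN -> PROJECT -> ASSET -> SUBSCRIPTION and keep approved grants."""
        return [
            sub.consumer_project
            for project in self.projects
            for asset in project.assets
            if asset.name == asset_name
            for sub in asset.subscriptions
            if sub.status == "APPROVED"
        ]


orders = Asset("orders", [Subscription("marketing", "APPROVED"), Subscription("finance")])
domain = Domain("retail", [Project("sales-producers", [orders])])
print(domain.approved_consumers("orders"))  # prints ['marketing']
```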

🧠

DataZone does NOT move data — think of it as a LIBRARY CARD SYSTEM: the library catalog (DataZone) tells you what books (data) exist and where they are, and the librarian (Lake Formation) checks your card before letting you read them. The books never leave the shelf (original source).

🧠

Glue vs DataZone: Glue is for ENGINEERS (technical schemas, ETL jobs), DataZone is for EVERYONE (business catalog, self-service portal, governance workflows)

CertAI Tutor · 2026-02-22
