
Petabyte-scale SQL analytics on structured and semi-structured data — fast, managed, and deeply integrated with the AWS ecosystem
Amazon Redshift is a fully managed, petabyte-scale cloud data warehouse that uses columnar storage, massively parallel processing (MPP), and advanced query optimization to deliver fast analytical query performance. It supports standard SQL and integrates natively with Amazon S3, AWS Glue, Amazon Athena, and BI tools, making it the go-to choice for OLAP workloads on AWS. Redshift Serverless and Redshift Spectrum extend its capabilities to on-demand compute and querying data directly in S3 without loading it first.
Run complex analytical (OLAP) queries across massive structured datasets with high concurrency, low latency, and deep AWS ecosystem integration — without managing infrastructure.
Use When
Avoid When
Columnar Storage
Stores data by column instead of row — dramatically reduces I/O for analytical queries that scan specific columns across billions of rows
Massively Parallel Processing (MPP)
Distributes query execution across all nodes and slices simultaneously — linear performance scaling with cluster size
Redshift Spectrum
Query data directly in Amazon S3 (Parquet, ORC, JSON, CSV, Avro) without loading it into Redshift — uses separate Spectrum layer, does not consume cluster compute
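To use Spectrum, you register a Glue Data Catalog database as an external schema and then query its tables like local ones. A minimal sketch, assuming a hypothetical catalog database `lake_db`, an external table `sales`, and a role ARN you would replace with your own:

```sql
-- Register the Glue Data Catalog database as an external schema
CREATE EXTERNAL SCHEMA spectrum_lake
FROM DATA CATALOG
DATABASE 'lake_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;

-- Query Parquet files in S3 as if they were a local table;
-- the S3 scan runs on the separate Spectrum fleet
SELECT region, SUM(amount) AS total_sales
FROM spectrum_lake.sales
GROUP BY region;
```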
Redshift Serverless
Auto-scaling, pay-per-use data warehouse with no cluster management — ideal for intermittent or unpredictable workloads
Concurrency Scaling
Automatically adds transient read clusters during peak demand — 1 free hour per 24h of main cluster usage
Workload Management (WLM)
Define up to 8 queues with memory and concurrency allocations — Auto WLM uses ML to dynamically manage resources
AQUA (Advanced Query Accelerator)
Hardware-accelerated cache layer for RA3 clusters — pushes computation closer to storage for up to 10x faster queries on some workloads
Data Sharing
Share live data across Redshift clusters and AWS accounts without copying — producers share datashares, consumers query in real time
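A sketch of the producer/consumer flow, with hypothetical share, schema, and table names (the namespace GUIDs come from each cluster's console or `SELECT current_namespace`):

```sql
-- On the producer cluster: create and populate the datashare
CREATE DATASHARE sales_share;
ALTER DATASHARE sales_share ADD SCHEMA public;
ALTER DATASHARE sales_share ADD TABLE public.sales;
GRANT USAGE ON DATASHARE sales_share TO NAMESPACE '<consumer-namespace-guid>';

-- On the consumer cluster: mount the share as a database and query live
CREATE DATABASE sales_db FROM DATASHARE sales_share
    OF NAMESPACE '<producer-namespace-guid>';
SELECT COUNT(*) FROM sales_db.public.sales;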
Federated Query
Query live data in Amazon RDS (PostgreSQL, MySQL) and Aurora directly from Redshift SQL — joins warehouse data with operational DB data
Materialized Views
Pre-compute and cache complex query results — support incremental refresh for streaming ingestion scenarios
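A minimal materialized-view sketch with hypothetical table and column names; `AUTO REFRESH YES` lets Redshift refresh it incrementally in the background when the view is eligible:

```sql
CREATE MATERIALIZED VIEW daily_revenue
AUTO REFRESH YES
AS
SELECT order_date, SUM(amount) AS revenue
FROM public.orders
GROUP BY order_date;

-- Manual refresh is also available (incremental where possible)
REFRESH MATERIALIZED VIEW daily_revenue;
```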
Automatic Table Optimization
Redshift automatically selects distribution keys, sort keys, and table encoding based on query patterns — reduces manual DBA tuning
COPY Command (parallel bulk load)
Primary method for bulk data ingestion from S3, DynamoDB, EMR, SSH — far faster than INSERT statements; always preferred for large loads
UNLOAD Command (parallel export)
Export query results to S3 in parallel — supports Parquet, CSV, JSON; integrates with Athena and data lake architectures
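The two commands pair naturally: COPY in, UNLOAD out. A sketch with hypothetical bucket paths, table name, and role ARN:

```sql
-- Bulk load Parquet files from S3 (parallelized across all slices)
COPY public.sales
FROM 's3://my-bucket/input/sales/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS PARQUET;

-- Export query results back to S3 as Parquet, written in parallel
-- (output objects get the given prefix plus slice/part suffixes)
UNLOAD ('SELECT region, SUM(amount) FROM public.sales GROUP BY region')
TO 's3://my-bucket/output/sales_by_region_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS PARQUET;
```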
Encryption at Rest
AES-256 encryption using AWS KMS or HSM — enabled at cluster creation; cannot be toggled post-creation without restore
Encryption in Transit
SSL/TLS for all client connections — enforce with require_ssl parameter group setting
VPC Isolation
Deploy cluster in a VPC for network isolation — use VPC security groups and private subnets for compliance workloads
Multi-AZ Deployment
RA3 clusters support Multi-AZ for high availability — automatically fails over to standby in another AZ. NOT available for DC2 node types
IAM Identity Center Integration
Preferred method for federated human user access — maps SSO users/groups to Redshift roles without managing individual DB users
Row-Level Security (RLS)
Define policies that restrict which rows users can see — enforced at query time without application changes
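A sketch of the RLS DDL, with a deliberately simple (hypothetical) rule that each user may only see rows for the region matching their user name:

```sql
-- Policy predicate evaluated at query time against the named column
CREATE RLS POLICY region_policy
WITH (region VARCHAR(32))
USING (region = current_user);

-- Attach the policy to a table for a role, then switch RLS on
ATTACH RLS POLICY region_policy ON public.sales TO ROLE analyst;
ALTER TABLE public.sales ROW LEVEL SECURITY ON;
```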
Column-Level Security
GRANT/REVOKE privileges at the column level — restrict sensitive columns (PII, financial) to authorized roles only
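Column-level grants use ordinary GRANT syntax with a column list. A sketch with hypothetical table, columns, and role:

```sql
-- Expose only non-sensitive columns to the reporting role;
-- columns not listed (e.g. ssn, card_number) stay inaccessible
GRANT SELECT (order_id, order_date, amount)
ON public.orders TO ROLE reporting;
```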
Dynamic Data Masking
Mask sensitive data (credit cards, SSNs) at query time based on user role — data remains stored unmasked
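A masking-policy sketch (hypothetical table, column, and role names): the expression rewrites the value at query time for users holding the role, while stored data stays unmasked.

```sql
-- Keep only the last four digits visible
CREATE MASKING POLICY mask_card
WITH (card_number VARCHAR(16))
USING ('****-****-****-' || SUBSTRING(card_number, 13, 4));

ATTACH MASKING POLICY mask_card
ON public.payments(card_number)
TO ROLE analyst PRIORITY 10;
```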
Audit Logging
Connection, user activity, and user logs written to S3 — enable for compliance; analyze with Athena or CloudWatch Logs Insights
Query Editor v2
Browser-based SQL editor with schema browser, query history, and collaboration — no client software required
Streaming Ingestion (Kinesis / MSK)
Ingest directly from Kinesis Data Streams or Amazon MSK (Kafka) using materialized views — low-latency, no intermediate S3 hop
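A streaming-ingestion sketch, assuming a hypothetical Kinesis stream named `my-click-stream` and a role with read access to it; records arrive in the `kinesis_data` payload column of the materialized view:

```sql
-- Map the Kinesis account/role as an external schema
CREATE EXTERNAL SCHEMA kds
FROM KINESIS
IAM_ROLE 'arn:aws:iam::123456789012:role/MyStreamingRole';

-- The materialized view IS the ingestion point — no S3 hop
CREATE MATERIALIZED VIEW clickstream_mv
AUTO REFRESH YES
AS
SELECT approximate_arrival_timestamp,
       JSON_PARSE(kinesis_data) AS payload
FROM kds."my-click-stream";
```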
ML Model Training (Redshift ML)
Train and invoke SageMaker models using SQL — CREATE MODEL statement exports data to S3, trains via AutoML, deploys as SQL function
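A CREATE MODEL sketch with hypothetical table, columns, role, and staging bucket — training data is exported to S3, SageMaker Autopilot trains, and the result comes back as a callable SQL function:

```sql
CREATE MODEL churn_model
FROM (SELECT age, tenure_months, monthly_spend, churned
      FROM public.customers)
TARGET churned
FUNCTION predict_churn
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftMLRole'
SETTINGS (S3_BUCKET 'my-ml-staging-bucket');

-- Once training completes, invoke like any SQL function
SELECT customer_id,
       predict_churn(age, tenure_months, monthly_spend) AS will_churn
FROM public.customers;
```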
Spatial Data (ST_ functions)
GEOMETRY and GEOGRAPHY data types with PostGIS-compatible spatial functions — supports geospatial analytics
Semi-structured Data (SUPER type)
SUPER data type stores JSON/nested data — query with PartiQL syntax; bridges relational and schema-less data in one engine
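A SUPER/PartiQL sketch with hypothetical table and payload shape — dot and bracket notation navigate the nested structure directly in SQL:

```sql
CREATE TABLE events (
    event_id BIGINT,
    payload  SUPER   -- stores arbitrary JSON / nested data
);

INSERT INTO events VALUES
    (1, JSON_PARSE('{"user": {"id": 42, "plan": "pro"}, "tags": ["a", "b"]}'));

-- PartiQL navigation into nested fields and arrays
SELECT event_id, payload."user".plan, payload.tags[0]
FROM events
WHERE payload."user".id = 42;
```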
COPY/UNLOAD Bulk Data Pipeline
Use COPY to bulk-load data from S3 into Redshift (fastest ingestion method — parallelizes across slices). Use UNLOAD to export query results back to S3 in Parquet for downstream consumers. This is the foundational Redshift data movement pattern.
Lakehouse ETL Architecture
AWS Glue crawls and catalogs S3 data lake; Glue ETL jobs transform and load data into Redshift. Redshift Spectrum queries the raw S3 layer while the warehouse holds curated data. Glue Data Catalog serves as the unified metadata layer for both.
Query Federation — Warehouse vs. Lake
Athena queries raw/unprocessed S3 data (ad-hoc, serverless, pay-per-scan). Redshift handles complex multi-join analytics on curated, loaded data. They share the Glue Data Catalog. Choose Athena for occasional queries on raw data; choose Redshift for repeated, complex analytics on structured data.
Federated Query — Operational to Analytical
Redshift Federated Query connects directly to RDS PostgreSQL/MySQL or Aurora to join live operational data with warehouse data in a single SQL query — without ETL. Ideal for reports that need real-time operational context alongside historical warehouse data.
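A federated-query sketch, assuming a hypothetical Aurora PostgreSQL endpoint and a Secrets Manager secret holding its credentials:

```sql
-- Map the operational database as an external schema
CREATE EXTERNAL SCHEMA ops
FROM POSTGRES
DATABASE 'appdb' SCHEMA 'public'
URI 'app-db.cluster-xyz.us-east-1.rds.amazonaws.com' PORT 5432
IAM_ROLE 'arn:aws:iam::123456789012:role/MyFederatedRole'
SECRET_ARN 'arn:aws:secretsmanager:us-east-1:123456789012:secret:appdb-creds';

-- Join live operational rows with warehouse history in one query
SELECT o.order_id, o.status, h.lifetime_value
FROM ops.orders o
JOIN public.customer_history h ON h.customer_id = o.customer_id;
```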
Big Data Processing to Warehouse
EMR (Spark/Hive) processes raw big data (logs, clickstreams, IoT) and writes output to S3 in Parquet/ORC. Redshift COPY loads the processed data or Spectrum queries it in place. EMR handles compute-intensive transformation; Redshift handles analytical querying.
Event-Driven Data Loading
S3 event triggers Lambda on new file arrival → Lambda invokes Redshift Data API to execute COPY command — fully serverless ingestion pipeline without persistent connections. Also used for custom WLM triggers, alerting on query results, or post-load data quality checks.
Federated Human User Access (Preferred)
AWS IAM Identity Center (SSO) maps corporate identity provider (Okta, Azure AD, Active Directory) users and groups to Redshift roles. Users authenticate via SSO — no individual Redshift database users to manage. This is the AWS-recommended pattern for team access to Redshift, replacing legacy IAM role-based federation.
Real-Time Streaming Ingestion via S3
Kinesis Data Firehose buffers streaming records and delivers batches to S3, then triggers a COPY into Redshift. Alternatively, use Redshift Streaming Ingestion directly from Kinesis Data Streams for lower latency without the S3 intermediate hop.
BI Dashboard on Data Warehouse
QuickSight connects to Redshift via JDBC with SPICE caching for fast dashboard rendering. Direct Query mode hits Redshift in real time. Use Redshift WLM to prioritize QuickSight query queues and prevent BI queries from competing with ETL jobs.
Redshift ML — In-Database Machine Learning
Redshift ML exports training data to S3, triggers SageMaker Autopilot to train a model, and imports the model back as a SQL function. Data scientists and analysts can then call the prediction function in SQL without leaving Redshift — democratizing ML for SQL users.
COPY is the ONLY correct answer for bulk loading large datasets into Redshift — INSERT statements are catastrophically slow for bulk loads and should never be used for more than a few rows. If an exam question asks about loading millions of rows efficiently, the answer is COPY from S3, with input files split to match the cluster's slice count for maximum parallelism.
Redshift is OLAP (analytics), NOT OLTP (transactions). Any exam scenario describing frequent single-row inserts, updates, or deletes with high throughput should route to RDS or Aurora, not Redshift. The moment you see 'transactional' or 'OLTP', eliminate Redshift.
For human team access to Redshift, AWS IAM Identity Center (SSO) is the PREFERRED and RECOMMENDED approach — not IAM roles, not individual database users, not IAM federation alone. IAM Identity Center maps SSO groups to Redshift database roles centrally with the least operational overhead. Exam questions testing 'centralized identity management for Redshift' should point to IAM Identity Center.
Redshift Spectrum queries S3 data using SEPARATE compute infrastructure — it does NOT consume your cluster's compute resources for the S3 scan portion. This means Spectrum can run concurrently with cluster queries without degrading performance. Cost is $5/TB scanned in S3.
Distribution keys (DISTKEY) and sort keys (SORTKEY) are the #1 performance tuning levers in Redshift. DISTKEY determines how data is distributed across nodes (minimize data movement in JOINs). SORTKEY determines physical row order (accelerate range queries and WHERE clauses). Automatic Table Optimization (ATO) now handles this automatically — but you must know the concepts for exam questions.
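A DDL sketch of both levers on a hypothetical fact table — DISTKEY on the common join column, SORTKEY on the common filter column:

```sql
CREATE TABLE fact_sales (
    sale_id     BIGINT,
    customer_id BIGINT,
    sale_date   DATE,
    amount      DECIMAL(12,2)
)
DISTSTYLE KEY
DISTKEY (customer_id)   -- co-locate rows that join on customer_id
SORTKEY (sale_date);    -- accelerate date-range predicates
```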
Redshift = OLAP (analytics, complex queries, columnar storage, MPP). RDS/Aurora = OLTP (transactions, row storage, ACID). DynamoDB = NoSQL (key-value, no SQL analytics). These three are NEVER interchangeable — every exam scenario has exactly one correct service based on workload type.
Redshift Serverless RPUs (Redshift Processing Units) are NOT the same as provisioned node types. Serverless auto-scales between your min and max RPU settings and bills per second. Use Serverless for intermittent/unpredictable workloads; use provisioned + Reserved Instances for steady-state production warehouses.
Concurrency Scaling adds transient READ clusters during peak demand — it does NOT help with write throughput or ETL performance. The first 1 hour of Concurrency Scaling per 24 hours of main cluster usage is FREE. Beyond that, standard node rates apply.
Encryption at rest in Redshift is configured AT CLUSTER CREATION using AWS KMS or CloudHSM. You CANNOT enable or change encryption on a running cluster — you must take a snapshot, restore to a new encrypted cluster, and redirect traffic. This is a common exam scenario for security compliance questions.
Manual Redshift snapshots NEVER expire — they persist until you explicitly delete them and incur S3 storage charges. Automated snapshots have a configurable retention of 1–35 days (default: 1 day). Exam questions about unexpected storage costs or long-term retention often hinge on this distinction.
Redshift Federated Query lets you JOIN Redshift warehouse data with live data in RDS PostgreSQL/MySQL or Aurora — without ETL. This is the answer when an exam question asks how to combine operational database data with historical warehouse data in real time without data movement.
When an exam question asks about querying data in S3 WITHOUT loading it into a data warehouse, the answer is either Amazon Athena (fully serverless, no warehouse needed) OR Redshift Spectrum (if you already have a Redshift cluster and want to extend queries to S3). Athena = no warehouse. Spectrum = extends existing warehouse to S3.
Redshift Data Sharing allows you to share LIVE data across clusters and AWS accounts without copying or moving data. Producers grant access to datashares; consumers query the data in real time. This is the answer for multi-cluster or cross-account analytics without ETL pipelines.
For the DEA-C01 exam specifically: Redshift Streaming Ingestion from Kinesis Data Streams or Amazon MSK uses materialized views to ingest data with low latency — NO S3 intermediate step. This is newer and lower-latency than the classic Firehose → S3 → COPY pattern. Know both patterns and when to use each.
VACUUM and ANALYZE are critical maintenance operations. VACUUM reclaims space from deleted rows and re-sorts data (Redshift uses MVCC — deletes mark rows invisible, not physically removed). ANALYZE updates table statistics for the query planner. Redshift auto-runs both in the background, but you may need manual runs after large bulk operations.
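The manual forms, on a hypothetical table, look like this — FULL both reclaims space and re-sorts, while DELETE ONLY skips the re-sort:

```sql
VACUUM FULL public.sales;          -- reclaim deleted space and fully re-sort
VACUUM DELETE ONLY public.sales;   -- reclaim space without re-sorting
ANALYZE public.sales;              -- refresh planner statistics
```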
Distribution styles: EVEN (default, round-robin — good for tables with no clear join key), KEY (same key value on same node — best for large table JOINs), ALL (copies full table to every node — best for small dimension tables in star schemas), AUTO (Redshift chooses automatically). Knowing when to use KEY vs ALL is a common exam differentiator.
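The KEY-vs-ALL choice in a star schema can be sketched like this (hypothetical tables): the dimension is replicated everywhere, and fact rows are already placed by the join key, so the JOIN needs no data movement.

```sql
-- Small dimension table: replicate to every node
CREATE TABLE dim_region (
    region_id   INT,
    region_name VARCHAR(64)
)
DISTSTYLE ALL;

-- Large fact table: co-locate on the join key
CREATE TABLE fact_orders (
    order_id  BIGINT,
    region_id INT,
    amount    DECIMAL(12,2)
)
DISTSTYLE KEY
DISTKEY (region_id);
```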
Common Mistake
IAM roles can directly manage human user access to Redshift — just create an IAM role with Redshift permissions and assign it to users.
Correct
IAM roles grant AWS API-level access (e.g., for services like Lambda or Glue to call Redshift APIs), but they do NOT manage database-level authentication for human users connecting via JDBC/ODBC. For human team access, AWS recommends IAM Identity Center (SSO), which maps corporate identity groups to Redshift database roles. Individual Redshift database users are a legacy, unscalable approach.
This is the #1 access control misconception in exam questions. The trap is conflating AWS IAM (API access control plane) with Redshift database authentication (data plane). IAM Identity Center bridges them properly. Remember: IAM roles = for services/applications; IAM Identity Center = for humans.
Common Mistake
Redshift and Amazon RDS/Aurora are interchangeable for analytical workloads — Aurora is fast, so it can handle data warehouse queries too.
Correct
RDS and Aurora use row-based storage optimized for OLTP (fast single-row reads/writes, ACID transactions). Redshift uses columnar storage + MPP optimized for OLAP (aggregations, full column scans across billions of rows). Aurora may be 5x faster than MySQL for OLTP, but Redshift will be orders of magnitude faster for analytical queries on large datasets. They solve fundamentally different problems.
Exam questions often present both as options for 'high-performance database' scenarios. The differentiator is always OLTP vs OLAP, not just 'performance.' If the workload involves complex aggregations, GROUP BY, large scans, or BI reporting → Redshift. If it involves frequent transactions, point lookups, or application backends → RDS/Aurora.
Common Mistake
DynamoDB can replace Redshift for analytics — it's a managed AWS database service that scales automatically.
Correct
DynamoDB is a NoSQL key-value/document database optimized for single-digit millisecond OLTP operations at any scale. It does NOT support complex SQL JOINs, GROUP BY aggregations, or analytical query patterns. It has no columnar storage or MPP. For analytics, DynamoDB data must first be exported to S3 (DynamoDB Streams → Kinesis → S3, or DynamoDB Export to S3) and then queried with Athena or loaded into Redshift.
NoSQL vs relational is a foundational concept. DynamoDB = fast key lookups, flexible schema, no SQL analytics. Redshift = complex SQL analytics on structured data. These are not substitutes — they are complementary services in a modern data architecture.
Common Mistake
Redshift is single-AZ only and requires manual failover — it doesn't support high availability like RDS Multi-AZ.
Correct
Since 2023, Redshift RA3 clusters support Multi-AZ deployment with automatic failover to a standby cluster in another Availability Zone — similar in concept to RDS Multi-AZ. DC2 node types do NOT support Multi-AZ. Additionally, Redshift Serverless is inherently multi-AZ. Older study materials may incorrectly state Redshift is always single-AZ.
This is a historically changing fact — older exam prep materials and many candidates still believe Redshift is single-AZ. For HA architecture questions involving Redshift, Multi-AZ RA3 is now a valid answer. Always check node type: RA3 = Multi-AZ capable; DC2 = single-AZ only.
Common Mistake
Using IAM federation directly (with AssumeRoleWithWebIdentity or SAML) is the simplest and best way to give a team of analysts federated access to Redshift.
Correct
While IAM federation technically works, it requires complex custom SAML/OIDC configuration, manual mapping of federation attributes to Redshift database users, and ongoing maintenance as teams change. AWS IAM Identity Center provides a purpose-built, centrally managed SSO experience that maps identity provider groups directly to Redshift roles with far less operational overhead. IAM Identity Center is the AWS-recommended approach.
Exam questions testing 'least operational overhead for federated access' or 'centralized identity management' should point to IAM Identity Center, not raw IAM federation. The key phrase is 'centralized' and 'minimal operational overhead.'
Common Mistake
Redshift Spectrum uses your cluster's compute resources to query S3, so adding Spectrum queries will slow down your existing cluster workloads.
Correct
Redshift Spectrum uses a dedicated, separate fleet of compute nodes managed by AWS to scan and process S3 data — it does NOT consume your provisioned cluster's CPU or memory for the S3 scan. Only the aggregation/final join of Spectrum results happens on your cluster. This means Spectrum queries can run concurrently with cluster queries with minimal impact.
This misconception leads candidates to incorrectly recommend cluster resizing when adding Spectrum workloads. The correct answer for 'query S3 data without impacting cluster performance' is Redshift Spectrum — its compute is separate and scales automatically.
Common Mistake
You can enable encryption on an existing Redshift cluster by modifying the cluster settings, just like enabling encryption on an EBS volume.
Correct
Unlike EBS (which supports in-place encryption via snapshot/restore with a simple API call), Redshift encryption CANNOT be enabled or changed on a running cluster. You must: (1) take a snapshot of the unencrypted cluster, (2) restore the snapshot to a NEW cluster with encryption enabled, (3) redirect your application to the new cluster endpoint, and (4) delete the old cluster.
Security compliance questions often describe a scenario where an unencrypted Redshift cluster needs to be encrypted urgently. The only correct answer is the snapshot-restore-to-encrypted-cluster approach — not 'modify the cluster to enable encryption.' This is a critical operational gotcha.
COPY > INSERT: 'COPY is King' — for bulk loads, COPY from S3 is always the answer; INSERT is for single rows only
OLAP = Redshift, OLTP = RDS/Aurora: 'Analytics → Redshift, Transactions → RDS' — never swap them
Spectrum = S3 queries with SEPARATE compute: 'Spectrum sees Stars (S3) from afar' — it doesn't touch your cluster compute
IAM Identity Center for HUMANS, IAM Roles for MACHINES: 'People use SSO, Services use Roles'
Distribution styles — remember 'EKAA': Even (default/round-robin), Key (join optimization), All (small dimension tables), Auto (let Redshift decide)
Manual snapshots NEVER expire: 'Manual means Manual deletion too' — they cost money forever until you delete them
Encryption at creation only: 'Encrypt at Birth, not After' — snapshot + restore to new encrypted cluster is the only path
CertAI Tutor · DEA-C01, SAA-C03, SAP-C02, DOP-C02, CLF-C02 · 2026-02-21