databases · SAA-C03, SAP-C02, DEA-C01, CLF-C02

Amazon Neptune: The Graph Database Powerhouse

Purpose-built managed graph database for highly connected data — relationships are first-class citizens

Updated 2026-02-21

Overview

Amazon Neptune is a fully managed, serverless-capable graph database service optimized for storing and querying billions of relationships with millisecond latency. It supports two popular graph models — Property Graph (via Apache TinkerPop Gremlin and openCypher) and RDF (via SPARQL) — making it the go-to AWS service for social networks, knowledge graphs, fraud detection, and recommendation engines. Neptune handles the heavy lifting of provisioning, patching, backup, recovery, and replication so you can focus on traversing relationships, not managing infrastructure.

Core purpose: efficiently store, query, and traverse highly connected datasets in which the relationships between entities matter as much as the entities themselves. These are the workloads where JOIN-heavy SQL or document-model queries become prohibitively expensive.

Use When

Social networks — modeling friends-of-friends, followers, and community graphs where multi-hop relationship traversal is core
Fraud detection — identifying rings of fraudulent accounts by traversing shared attributes (IP, device, address) across millions of nodes
Recommendation engines — 'customers who bought X also bought Y' computed via graph traversal rather than expensive table scans
Knowledge graphs and semantic search — linking entities (people, places, concepts) with typed relationships for intelligent querying
Network and IT operations — modeling infrastructure topology, dependency graphs, and impact analysis for root-cause detection
Life sciences — drug interaction graphs, genomic data relationships, and clinical knowledge bases using RDF/SPARQL standards
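To make multi-hop traversal concrete, the sketch below runs a friends-of-friends query in plain Python over an in-memory adjacency list. It does not touch Neptune, and the graph and names are invented for illustration. Each hop expands only the neighbors of the current frontier, which is the access pattern a graph database is built to serve efficiently.

```python
from collections import deque

# Hypothetical social graph: person -> people they "know" (directed edges).
GRAPH: dict[str, set[str]] = {
    "alice": {"bob", "carol"},
    "bob": {"dave"},
    "carol": {"dave", "erin"},
    "dave": {"frank"},
    "erin": set(),
    "frank": set(),
}


def within_hops(graph: dict[str, set[str]], start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first search: {person: shortest hop distance} for everyone
    reachable from `start` in 1..max_hops hops (start itself excluded)."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # this node is at the hop limit; do not expand further
        for neighbor in graph.get(node, set()):
            if neighbor not in distance:  # BFS reaches each node first via a shortest path
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]
    return distance


if __name__ == "__main__":
    reach = within_hops(GRAPH, "alice", max_hops=3)
    friends_of_friends = sorted(p for p, d in reach.items() if d == 2)
    print(friends_of_friends)  # ['dave', 'erin']
```

In Neptune the same question is a short Gremlin or openCypher traversal. The BFS here only mirrors the shape of that work.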

Avoid When

Flat tabular data with few relationships — use Amazon RDS or Aurora; Neptune's graph model adds complexity without benefit when data is naturally relational and JOIN depth is shallow
Document-centric workloads — use Amazon DocumentDB (MongoDB-compatible); storing JSON documents with nested attributes is not a graph problem
High-volume key-value lookups — use Amazon DynamoDB; Neptune is not optimized for simple key-based point lookups at extreme throughput
Full-text search as primary access pattern — use Amazon OpenSearch Service; Neptune has no native full-text index, and its full-text search capability relies on an OpenSearch integration
OLAP / analytical aggregations over massive datasets — use Amazon Redshift or Athena; Neptune is optimized for graph traversal, not columnar aggregation

Key Features

Property Graph Model (TinkerPop Gremlin)

Vertices and edges with arbitrary key-value properties; ideal for social/fraud/recommendation graphs

Property Graph Model (openCypher)

Cypher query language — familiar to Neo4j users; AWS added openCypher support to ease migration

RDF / SPARQL (W3C standards)

Resource Description Framework for knowledge graphs, semantic web, and linked data use cases

Neptune Serverless

Automatically scales compute capacity based on workload — eliminates capacity planning for variable graph workloads

Multi-AZ High Availability

6 copies of data across 3 AZs; automatic failover to read replica in under 30 seconds

Read Replicas (up to 15)

Scale read throughput; also serve as failover targets

Automated Backups (continuous)

Point-in-time recovery to any second within the retention window (1–35 days)

Manual Snapshots

Retained indefinitely; can be shared with other AWS accounts or copied to other regions

Encryption at Rest (AWS KMS)

Must be enabled at cluster creation — cannot be added after the fact

Encryption in Transit (TLS)

TLS enforced by default on Neptune endpoints

VPC Isolation

Neptune is VPC-only — no public endpoint option; access via VPC, VPN, or Direct Connect

IAM Authentication

IAM database authentication using Signature Version 4 signing — no password management

Neptune Streams

Ordered, change-data-capture stream of all graph mutations — enables real-time downstream processing

Neptune ML (Graph Neural Networks)

Integrates with Amazon SageMaker to train and deploy GNN models directly on Neptune graph data

Neptune Analytics

In-memory graph analytics engine for fast graph algorithms (PageRank, community detection) on large graphs

Bulk Load (from S3)

Neptune Loader imports CSV or RDF data from S3 — primary mechanism for initial data ingestion

Global Database

Cross-region replication with sub-second RPO for disaster recovery and low-latency global reads

CloudWatch Metrics Integration

CPU, memory, query latency, buffer cache hit ratio, and gremlin/SPARQL request metrics published automatically

Multi-master (not supported)

Neptune does not support multi-master writes — one primary writer per cluster (unlike Aurora Multi-Master)

Public Endpoint (not supported)

Neptune has no public internet endpoint — must be accessed within a VPC

Integration Patterns

Bulk Graph Data Loading

Exam frequency: high

Amazon Neptune + Amazon S3

Use the Neptune Bulk Loader to import large datasets from S3 in CSV (for Property Graph) or Turtle/N-Triples (for RDF) format. This is the recommended initial load mechanism and is far faster than issuing individual insert queries. S3 is also the target for Neptune export jobs.
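Here is a minimal sketch of starting a bulk load. It assumes the loader is invoked over HTTPS at the cluster's /loader endpoint on port 8182 with a JSON body. The parameter names (source, format, iamRoleArn, region, failOnError, parallelism) follow the Neptune loader API as documented; verify them against your engine version. The endpoint, bucket, and role ARN are placeholders. The function only builds the request and does not send it.

```python
import json
import urllib.request

# Values accepted by the loader's "format" parameter.
PROPERTY_GRAPH_FORMATS = {"csv", "opencypher"}
RDF_FORMATS = {"ntriples", "nquads", "rdfxml", "turtle"}


def build_load_request(endpoint: str, s3_source: str, fmt: str,
                       role_arn: str, region: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the Neptune bulk loader endpoint."""
    if fmt not in PROPERTY_GRAPH_FORMATS | RDF_FORMATS:
        raise ValueError(f"unsupported loader format: {fmt}")
    if not s3_source.startswith("s3://"):
        raise ValueError("source must be an s3:// URI")
    body = {
        "source": s3_source,      # S3 object or prefix to load
        "format": fmt,
        "iamRoleArn": role_arn,   # role associated with the cluster, with read access to the bucket
        "region": region,         # region of the bucket (same region as the cluster)
        "failOnError": "FALSE",   # keep loading past bad records; the loader takes string booleans
        "parallelism": "MEDIUM",
    }
    return urllib.request.Request(
        url=f"https://{endpoint}:8182/loader",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request must be sent (for example with urllib.request.urlopen) from inside the cluster's VPC. It also needs SigV4 signing if IAM authentication is enabled. The loader replies with a load ID whose status can then be polled.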

Graph Database Monitoring and Alerting

Exam frequency: high

Amazon Neptune + Amazon CloudWatch

Neptune automatically publishes metrics to CloudWatch including GremlinRequestsPerSec, SparqlRequestsPerSec, BufferCacheHitRatio, CPUUtilization, and FreeableMemory. Set CloudWatch Alarms on query latency and cache hit ratio to detect performance degradation. Use CloudWatch Logs for slow query logs.
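For the alarm side, the function below assembles keyword arguments for CloudWatch's PutMetricAlarm call (put_metric_alarm in boto3). It targets the BufferCacheHitRatio metric in the AWS/Neptune namespace, with DBClusterIdentifier as the dimension. The threshold and evaluation window are illustrative choices, not AWS recommendations. The cluster ID and SNS topic are placeholders.

```python
from typing import Optional


def buffer_cache_alarm_params(cluster_id: str, threshold_pct: float = 99.0,
                              sns_topic_arn: Optional[str] = None) -> dict:
    """Keyword arguments for CloudWatch PutMetricAlarm (boto3 put_metric_alarm):
    alarm when the cluster's average BufferCacheHitRatio stays below threshold_pct."""
    params = {
        "AlarmName": f"{cluster_id}-buffer-cache-hit-ratio-low",
        "Namespace": "AWS/Neptune",
        "MetricName": "BufferCacheHitRatio",
        "Dimensions": [{"Name": "DBClusterIdentifier", "Value": cluster_id}],
        "Statistic": "Average",
        "Period": 300,            # seconds per datapoint
        "EvaluationPeriods": 3,   # three consecutive 5-minute datapoints must breach
        "Threshold": threshold_pct,
        "ComparisonOperator": "LessThanThreshold",
    }
    if sns_topic_arn:
        params["AlarmActions"] = [sns_topic_arn]  # notify this SNS topic on ALARM
    return params
```

With boto3 installed and credentials configured, boto3.client("cloudwatch").put_metric_alarm(**buffer_cache_alarm_params("my-neptune-cluster")) would create the alarm.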

Serverless Graph Query API

Exam frequency: high

Amazon Neptune + AWS Lambda

Lambda functions inside the same VPC query Neptune via Gremlin or openCypher over WebSocket or HTTP. Common pattern: API Gateway → Lambda → Neptune for real-time graph queries in serverless applications. Lambda must be in the same VPC as Neptune (no public endpoint).
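Here is a hedged sketch of the Lambda side of API Gateway → Lambda → Neptune. It assumes an openCypher query sent as a form-encoded POST to the cluster's /openCypher HTTPS endpoint on port 8182, and a JSON response containing a results array. The NEPTUNE_ENDPOINT environment variable, the person label, the knows edge label, and the personId property are hypothetical. The request is not SigV4-signed, so this only works with IAM authentication disabled. The send parameter lets the handler be exercised without a live cluster.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Callable

# Hypothetical environment variable holding the cluster endpoint hostname.
NEPTUNE_ENDPOINT = os.environ.get("NEPTUNE_ENDPOINT", "localhost")

# Friends-of-friends in openCypher; labels and property names are hypothetical.
FOF_QUERY = (
    "MATCH (p:person {personId: $pid})-[:knows]->()-[:knows]->(fof:person) "
    "WHERE fof <> p RETURN DISTINCT fof.personId AS personId"
)


def build_opencypher_request(query: str, params: dict) -> urllib.request.Request:
    """Form-encoded POST to the openCypher HTTPS endpoint (unsigned)."""
    form = urllib.parse.urlencode({"query": query, "parameters": json.dumps(params)})
    return urllib.request.Request(
        f"https://{NEPTUNE_ENDPOINT}:8182/openCypher",
        data=form.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def send_to_neptune(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs network access to the VPC)."""
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())


def lambda_handler(event: dict, context: object,
                   send: Callable[[urllib.request.Request], dict] = send_to_neptune) -> dict:
    """API Gateway proxy handler, e.g. GET /people/{id}/fof."""
    person_id = (event.get("pathParameters") or {}).get("id")
    if not person_id:
        return {"statusCode": 400, "body": json.dumps({"error": "missing id"})}
    result = send(build_opencypher_request(FOF_QUERY, {"pid": person_id}))
    ids = [row["personId"] for row in result.get("results", [])]
    return {"statusCode": 200, "body": json.dumps({"friendsOfFriends": ids})}
```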

Choosing the Right NoSQL Database

Exam frequency: high

Amazon Neptune + Amazon DocumentDB

This is primarily an architectural decision pattern, not an integration. DocumentDB is for document-centric JSON workloads (MongoDB-compatible). Neptune is for relationship-centric graph workloads. The exam tests your ability to select the correct service — they are NOT interchangeable.

Neptune ML — Graph Neural Networks

Exam frequency: medium

Amazon Neptune + Amazon SageMaker

Neptune ML uses Deep Graph Library (DGL) with SageMaker to train GNN models on your Neptune graph data. Use cases: link prediction, node classification, entity resolution. Neptune exports graph data to S3, SageMaker trains the model, and Neptune serves inference results via graph queries.

Real-Time Graph Change Processing via Neptune Streams

Exam frequency: medium

Amazon Neptune + Amazon Kinesis Data Streams

Neptune Streams captures every create/update/delete operation in order. A Lambda poller reads from Neptune Streams and publishes to Kinesis for downstream consumers — enabling real-time graph change propagation to search indexes, caches, or analytics systems.
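The polling half of this pattern can be sketched as two pure functions. One builds the next read URL from a saved checkpoint. The other turns a response into events and a new checkpoint. The /propertygraph/stream path, the iteratorType values, and the response fields (lastEventId, records, op, data) reflect the Neptune Streams REST API as documented for recent engine versions. Treat them as assumptions to check against your own version. Publishing the events to Kinesis is left out.

```python
import urllib.parse
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Checkpoint:
    """Position of the last processed change (commit number, operation number)."""
    commit_num: int
    op_num: int


def stream_url(endpoint: str, checkpoint: Optional[Checkpoint], limit: int = 100) -> str:
    """URL for the next read from the Property Graph change stream."""
    if checkpoint is None:
        # No saved position yet: start from the oldest change still retained.
        query = {"iteratorType": "TRIM_HORIZON", "limit": limit}
    else:
        query = {"iteratorType": "AFTER_SEQUENCE_NUMBER",
                 "commitNum": checkpoint.commit_num,
                 "opNum": checkpoint.op_num,
                 "limit": limit}
    return f"https://{endpoint}:8182/propertygraph/stream?{urllib.parse.urlencode(query)}"


def process_batch(response: dict) -> tuple[list[dict], Optional[Checkpoint]]:
    """Flatten stream records into downstream events (e.g. for Kinesis) and
    return the checkpoint to persist before the next poll."""
    events = []
    for record in response.get("records", []):
        data = record.get("data", {})
        events.append({
            "op": record.get("op"),      # "ADD" or "REMOVE"
            "id": data.get("id"),
            "type": data.get("type"),    # vertex label, vertex property, edge, or edge property
            "key": data.get("key"),
            "value": data.get("value"),
        })
    last = response.get("lastEventId")
    checkpoint = Checkpoint(last["commitNum"], last["opNum"]) if last else None
    return events, checkpoint
```

When a batch carries no lastEventId, the poller should keep its previous checkpoint rather than reset to TRIM_HORIZON.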

Graph + Full-Text Search Hybrid

Exam frequency: medium

Amazon Neptune + Amazon OpenSearch Service

Neptune handles graph traversal while OpenSearch handles full-text search. A common pattern: search OpenSearch for entity IDs matching a text query, then traverse Neptune to find related entities. Neptune Streams can sync graph mutations to OpenSearch in near real-time.
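The two-stage flow can be sketched with in-memory stand-ins. A tiny inverted index plays the role of OpenSearch, and a dictionary of edges plays the role of Neptune. All data is hypothetical. In a real system, stage 1 would be an OpenSearch query and stage 2 a Gremlin or openCypher traversal from the returned IDs.

```python
from collections import defaultdict

# Stand-in for OpenSearch: entity id -> searchable text (hypothetical data).
DOCUMENTS = {
    "p1": "Acme Corp payments platform",
    "p2": "Globex logistics platform",
    "p3": "Acme Corp mobile app",
}

# Stand-in for Neptune: entity id -> ids of directly related entities.
EDGES = {
    "p1": {"team-a", "vendor-x"},
    "p2": {"team-b"},
    "p3": {"team-a"},
}


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase token to the ids of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search_then_traverse(query: str, docs: dict[str, str],
                         edges: dict[str, set[str]]) -> dict[str, set[str]]:
    """Stage 1: ids whose text contains every query term (the search engine's job).
    Stage 2: one-hop neighbors of each hit (the graph database's job)."""
    terms = query.lower().split()
    if not terms:
        return {}
    index = build_inverted_index(docs)
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return {doc_id: edges.get(doc_id, set()) for doc_id in sorted(hits)}
```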

Graph Metadata + High-Volume Attribute Store

Exam frequency: medium

Amazon Neptune + Amazon DynamoDB

Store graph topology (nodes, edges, relationships) in Neptune and store high-cardinality, frequently-updated attributes (click counts, timestamps, raw events) in DynamoDB. Neptune handles 'who is connected to whom' while DynamoDB handles 'what are the details of each entity'.

Transactional Data + Graph Relationship Layer

Exam frequency: medium

Amazon Neptune + Amazon RDS

RDS/Aurora stores normalized transactional data (orders, inventory, accounts) while Neptune stores the relationship graph derived from that data. ETL pipelines (AWS Glue or Lambda) extract relationships from RDS and load them into Neptune for graph-specific queries.
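This sketch covers the transform step of such an ETL job. Order rows (hypothetical, as if read from RDS) become a vertex file and an edge file in Neptune's Gremlin CSV load format. In that format, the system columns are ~id, ~label, ~from, and ~to, and typed property columns use a name:Type header. The customer-/product-/order- ID scheme is an illustrative choice. A real pipeline would write both files to S3 and then start the bulk loader.

```python
import csv
import io

# Hypothetical rows extracted from RDS/Aurora (e.g. SELECT order_id, customer_id, product_id FROM orders).
ORDER_ROWS = [
    {"order_id": 1001, "customer_id": "c1", "product_id": "p9"},
    {"order_id": 1002, "customer_id": "c2", "product_id": "p9"},
    {"order_id": 1003, "customer_id": "c1", "product_id": "p4"},
]


def to_neptune_csv(rows: list[dict]) -> tuple[str, str]:
    """Return (vertices_csv, edges_csv) in Neptune's Gremlin load-data format."""
    vertices: dict[str, str] = {}  # vertex id -> label, deduplicated across rows
    vertices_buf, edges_buf = io.StringIO(), io.StringIO()

    edge_writer = csv.writer(edges_buf, lineterminator="\n")
    edge_writer.writerow(["~id", "~from", "~to", "~label", "orderId:Int"])
    for row in rows:
        customer = f"customer-{row['customer_id']}"
        product = f"product-{row['product_id']}"
        vertices[customer] = "customer"
        vertices[product] = "product"
        # One "bought" edge per order row, keyed by the order id.
        edge_writer.writerow([f"order-{row['order_id']}", customer, product, "bought", row["order_id"]])

    vertex_writer = csv.writer(vertices_buf, lineterminator="\n")
    vertex_writer.writerow(["~id", "~label"])
    for vertex_id in sorted(vertices):
        vertex_writer.writerow([vertex_id, vertices[vertex_id]])
    return vertices_buf.getvalue(), edges_buf.getvalue()
```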

IAM Database Authentication

Exam frequency: medium

Amazon Neptune + AWS IAM

Enable IAM authentication on Neptune to allow IAM users, roles, and Lambda execution roles to authenticate using AWS Signature Version 4 — no database passwords required. This is the recommended authentication model for applications running on EC2, Lambda, or ECS.
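AWS SDKs normally handle Signature Version 4 for you (for example, botocore's SigV4Auth). The stdlib-only sketch below shows what the signing involves for a POST to Neptune: a canonical request, a string to sign bound to a date/region/service scope, and a signing key derived by chained HMAC-SHA256. It assumes the service name neptune-db, signs only the host and x-amz-date headers, and uses an empty query string. It does not handle the session token that temporary credentials (such as a Lambda execution role's) also require.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from typing import Optional

SERVICE = "neptune-db"  # service name used in the credential scope for Neptune IAM auth


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sign_post(host: str, path: str, body: bytes, region: str,
              access_key: str, secret_key: str,
              now: Optional[datetime] = None) -> dict[str, str]:
    """Return Host, X-Amz-Date, and Authorization headers for a SigV4-signed
    POST with an empty query string. Session tokens are not handled."""
    now = now or datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")

    # 1. Canonical request: method, URI, query, headers, signed header list, payload hash.
    canonical_headers = f"host:{host}\nx-amz-date:{amz_date}\n"
    signed_headers = "host;x-amz-date"
    payload_hash = hashlib.sha256(body).hexdigest()
    canonical_request = "\n".join(
        ["POST", path, "", canonical_headers, signed_headers, payload_hash])

    # 2. String to sign, bound to the date/region/service credential scope.
    scope = f"{date_stamp}/{region}/{SERVICE}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()])

    # 3. Derive the signing key through chained HMACs, then sign.
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, SERVICE)
    k_signing = _hmac(k_service, "aws4_request")
    signature = hmac.new(k_signing, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    authorization = (f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
                     f"SignedHeaders={signed_headers}, Signature={signature}")
    return {"Host": host, "X-Amz-Date": amz_date, "Authorization": authorization}
```

The host value must match the Host header the HTTP client actually sends, which includes :8182 for Neptune's default port.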

Service Limits & Quotas

Limit / Value / Note

Maximum DB cluster storage

128 TiB

Candidates often confuse Neptune's auto-scaling storage model with RDS where you must specify allocated storage upfront

Maximum read replicas per cluster

15 replicas

Same limit as Aurora — easy to mix up. Both Aurora and Neptune support up to 15 read replicas per cluster

Minimum backup retention period

1 day

Automated backups are always enabled on Neptune; you cannot disable them (minimum 1 day retention)

Maximum backup retention period

35 days

Beyond 35 days, rely on manual snapshots: they are retained indefinitely until explicitly deleted (a common exam distinction)

Storage auto-scaling increment

10 GiB

Neptune storage grows in 10 GiB chunks automatically — no manual intervention or downtime required

Availability Zones for Multi-AZ

3 AZs

Neptune automatically replicates data across 3 AZs with 6 copies of your data (2 per AZ) — this is the cluster volume architecture, similar to Aurora

Data copies across AZs

6 copies

This is the same shared cluster volume model as Aurora — Neptune and Aurora share this architectural pattern

Maximum number of DB clusters per region (default)

Refer to Service Quotas console

Default quotas are soft limits — request increases via AWS Service Quotas console for production workloads

Supported graph query languages

3 languages (Gremlin, openCypher, SPARQL)

Neptune supports Gremlin (TinkerPop) and openCypher for Property Graph, and SPARQL for RDF — choosing the right language is an architectural decision, not a limitation

Neptune Serverless minimum Neptune Capacity Units (NCUs)

1 NCU

Neptune Serverless automatically scales compute from 1 NCU upward — ideal for variable or unpredictable graph workloads
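The limits above can be turned into a quick pre-deployment check. The sketch below validates a proposed design against the replica, retention, and storage limits from the table, and models volume growth in whole 10 GiB increments. The ProposedCluster type and helper names are hypothetical. Because billing is based on storage consumed, volume_after_growth illustrates growth only, not cost.

```python
from dataclasses import dataclass

GIB_PER_TIB = 1024
MAX_STORAGE_GIB = 128 * GIB_PER_TIB   # 128 TiB cluster volume ceiling
GROWTH_INCREMENT_GIB = 10
MAX_READ_REPLICAS = 15


@dataclass
class ProposedCluster:
    read_replicas: int
    backup_retention_days: int
    expected_data_gib: float


def validate(cluster: ProposedCluster) -> list[str]:
    """Return a list of limit violations (empty if the design fits)."""
    problems = []
    if not 0 <= cluster.read_replicas <= MAX_READ_REPLICAS:
        problems.append(f"read replicas must be 0-{MAX_READ_REPLICAS}")
    if not 1 <= cluster.backup_retention_days <= 35:
        problems.append("automated backup retention must be 1-35 days; use manual snapshots for longer")
    if cluster.expected_data_gib > MAX_STORAGE_GIB:
        problems.append("data exceeds the 128 TiB cluster volume limit")
    return problems


def volume_after_growth(data_gib: float) -> int:
    """Volume size if storage grows in whole 10 GiB increments, capped at 128 TiB."""
    increments = -(-data_gib // GROWTH_INCREMENT_GIB)  # ceiling division
    return min(int(increments) * GROWTH_INCREMENT_GIB, MAX_STORAGE_GIB)
```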

Pricing Model

Pay for what you use — instance hours + I/O + storage + backup storage

Instance pricing: Charged per DB instance-hour for provisioned instances (primary + replicas billed separately)
Storage pricing: Charged per GiB-month for cluster volume storage (auto-scales; you pay for what's consumed)
I/O pricing: Charged per million I/O requests to the cluster volume (reads and writes)
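Neptune Serverless pricing: Charged per Neptune Capacity Unit (NCU) hour instead of per instance-hour; when idle, capacity scales down to the configured minimum (at least 1 NCU, not zero), and storage and I/O are billed as usual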
Backup storage: Automated backup storage up to 100% of cluster size is free; excess charged per GiB-month
Data transfer: Standard AWS data transfer rates apply for cross-AZ and cross-region traffic
Neptune ML: Additional SageMaker charges apply for training and inference endpoints
Neptune Analytics: Separate pricing based on memory graph size and query processing
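The billing dimensions above combine into a simple monthly estimate. Every rate in this sketch is a hypothetical placeholder rather than a real Neptune price; substitute current numbers from the pricing page for your region. The model covers instance-hours or NCU-hours, storage, I/O, and backup storage beyond the free 100% allowance. It ignores data transfer, Neptune ML, and Neptune Analytics.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Unit prices in USD. Values passed in are placeholders; use the
    current Neptune pricing page for your region."""
    instance_hour: float       # per provisioned DB instance-hour
    ncu_hour: float            # per Neptune Capacity Unit-hour (serverless)
    storage_gib_month: float   # per GiB-month of cluster volume
    io_per_million: float      # per million I/O requests
    backup_gib_month: float    # per GiB-month of backup storage beyond the free allowance


def monthly_cost(rates: Rates, *, storage_gib: float, io_millions: float,
                 backup_gib: float, instance_hours: float = 0.0,
                 ncu_hours: float = 0.0) -> float:
    """Rough monthly bill: compute + storage + I/O + billable backup."""
    compute = instance_hours * rates.instance_hour + ncu_hours * rates.ncu_hour
    storage = storage_gib * rates.storage_gib_month
    io = io_millions * rates.io_per_million
    # Backup storage up to 100% of the cluster's storage is free; only the excess is billed.
    backup = max(0.0, backup_gib - storage_gib) * rates.backup_gib_month
    return round(compute + storage + io + backup, 2)
```

For a provisioned cluster with a primary and two read replicas over a 730-hour month, pass instance_hours=3 * 730. For Serverless, pass the consumed NCU-hours instead.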

Exam Tips

Critical: Graph database use case identification

When an exam question describes highly connected data, multi-hop relationship traversal, social networks, fraud rings, or recommendation engines — the answer is Neptune, NOT DynamoDB, DocumentDB, or RDS. The keyword 'graph' or 'relationships between entities' is your signal.

Critical: Network architecture and VPC isolation

Neptune is VPC-ONLY: there is no public endpoint. Any architecture requiring Neptune access from outside its VPC must use VPN, Direct Connect, VPC Peering, or AWS Transit Gateway. Lambda functions querying Neptune must be configured for VPC access, typically in the same VPC as the cluster.

Critical: Graph query language selection

Neptune supports THREE query languages: Gremlin (Apache TinkerPop), openCypher, and SPARQL. Gremlin and openCypher are for Property Graphs; SPARQL is for RDF graphs. If a question mentions 'semantic web', 'linked data', or 'ontologies', the answer involves SPARQL/RDF.
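For reference, here is roughly the same friends-of-friends question written in each of the three languages, plus the mapping from language to data model. The person label, knows edge, name property, and ex: RDF vocabulary are hypothetical. The queries are plain strings. On a cluster they would go to the /gremlin, /openCypher, and /sparql endpoints respectively.

```python
# Roughly the same "friends of Alice's friends" question in each language.
QUERIES = {
    "gremlin": (
        "g.V().has('person', 'name', 'Alice')"
        ".out('knows').out('knows').dedup().values('name')"
    ),
    "opencypher": (
        "MATCH (a:person {name: 'Alice'})-[:knows]->()-[:knows]->(fof:person) "
        "RETURN DISTINCT fof.name"
    ),
    "sparql": (
        "PREFIX ex: <http://example.org/> "
        'SELECT DISTINCT ?name WHERE { ?a ex:name "Alice" . '
        "?a ex:knows ?f . ?f ex:knows ?fof . ?fof ex:name ?name }"
    ),
}

# Data model each language reads: property-graph data is not visible to SPARQL,
# and RDF data is not visible to Gremlin or openCypher.
MODEL_OF = {"gremlin": "property_graph", "opencypher": "property_graph", "sparql": "rdf"}


def languages_for(model: str) -> list[str]:
    """Languages that can query data stored in the given model."""
    return sorted(lang for lang, m in MODEL_OF.items() if m == model)
```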

critical

Highly connected data + multi-hop traversal + social/fraud/recommendation = Neptune. Not DynamoDB, not DocumentDB, not RDS. Neptune is the ONLY AWS graph database.

critical

Neptune is VPC-ONLY — no public endpoint exists. Lambda must be in the same VPC. External access requires VPN, Direct Connect, or VPC Peering.

critical

Neptune supports Gremlin, openCypher, AND SPARQL on a single cluster. RDF/SPARQL = semantic/knowledge graphs. Property Graph/Gremlin/openCypher = social, fraud, recommendation graphs.

Important: Neptune cluster storage architecture

Neptune's storage architecture mirrors Aurora: 6 copies of data across 3 AZs, auto-scaling in 10 GiB increments up to 128 TiB, with continuous backups to S3. You never provision storage — it grows automatically. This is a key differentiator from RDS.

Important: Neptune Serverless capacity model

Neptune Serverless is the answer when the question mentions 'unpredictable graph workloads', 'variable traffic', or 'minimize operational overhead for a graph database'. It scales NCUs automatically and you pay only for capacity consumed.

Important: Neptune Streams and CDC

Neptune Streams is the change-data-capture (CDC) mechanism for graph mutations. If a question asks how to propagate Neptune graph changes to downstream systems (OpenSearch, Lambda, Kinesis) in real time — Neptune Streams is the answer.

Important: Encryption at rest (immutable cluster setting)

Encryption at rest must be enabled at cluster creation time — you CANNOT enable KMS encryption on an existing unencrypted Neptune cluster. To encrypt an existing cluster, take a snapshot, restore it with encryption enabled, and migrate.
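The snapshot-and-restore path can be scripted. The sketch below accepts any object exposing the boto3 Neptune client methods create_db_cluster_snapshot, describe_db_cluster_snapshots, restore_db_cluster_from_snapshot, and create_db_instance. Cluster names, the KMS key, and the instance class are placeholders. It only builds the encrypted copy. Pointing applications at the new endpoint and retiring the old cluster are separate steps.

```python
import time
from typing import Any


def encrypt_via_snapshot(neptune: Any, source_cluster: str, target_cluster: str,
                         kms_key_id: str, instance_class: str = "db.r6g.large",
                         poll_seconds: float = 30.0) -> None:
    """Snapshot an unencrypted cluster, restore it as a new KMS-encrypted cluster,
    and add a writer instance. `neptune` is a boto3 Neptune client or a stand-in."""
    snapshot_id = f"{source_cluster}-pre-encryption"
    neptune.create_db_cluster_snapshot(
        DBClusterSnapshotIdentifier=snapshot_id,
        DBClusterIdentifier=source_cluster,
    )
    # Poll until the snapshot can be restored.
    while True:
        snaps = neptune.describe_db_cluster_snapshots(DBClusterSnapshotIdentifier=snapshot_id)
        if snaps["DBClusterSnapshots"][0]["Status"] == "available":
            break
        time.sleep(poll_seconds)

    # Supplying KmsKeyId on restore yields an encrypted cluster from the unencrypted snapshot.
    neptune.restore_db_cluster_from_snapshot(
        DBClusterIdentifier=target_cluster,
        SnapshotIdentifier=snapshot_id,
        Engine="neptune",
        KmsKeyId=kms_key_id,
    )
    # The restore call creates only the cluster; add a writer instance so it can serve queries.
    neptune.create_db_instance(
        DBInstanceIdentifier=f"{target_cluster}-writer",
        DBInstanceClass=instance_class,
        Engine="neptune",
        DBClusterIdentifier=target_cluster,
    )
```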

Important: Neptune Global Database vs multi-master

Neptune Global Database provides cross-region disaster recovery with sub-second RPO. If a question asks about Neptune with multi-region availability or low-latency global reads — Neptune Global Database is the answer, not multi-master (which Neptune does NOT support).

Good to Know: Neptune ML and SageMaker integration

Neptune ML integrates with Amazon SageMaker to run Graph Neural Networks (GNNs) on your Neptune data. If a question asks about machine learning on graph data or link prediction within a graph database — Neptune ML + SageMaker is the answer.

Good to Know: Neptune bulk data ingestion

For initial bulk data loading into Neptune, use the Neptune Loader from S3 — not individual API calls. The Loader is significantly faster for large datasets and supports CSV (Property Graph) and RDF formats (Turtle, N-Triples, N-Quads).

Common Misconceptions & Traps

Common Mistake

Any NoSQL database (DynamoDB, DocumentDB, MongoDB) can handle graph workloads efficiently — they're all 'flexible schema' databases

Correct

Graph workloads require a purpose-built graph database like Neptune. DynamoDB and DocumentDB lack native graph traversal algorithms. Multi-hop relationship queries (e.g., 'find all friends-of-friends within 3 degrees') require exponentially expensive table scans in DynamoDB and DocumentDB but are natively efficient in Neptune.

This is the #1 misconception on Neptune exam questions. The exam will describe a graph use case and offer DynamoDB or DocumentDB as distractors. Remember: Neptune is the ONLY AWS service with native graph traversal. The keywords 'highly connected', 'relationships', 'traversal', 'social graph', and 'fraud ring' all point to Neptune.

Common Mistake

Relational databases (RDS, Aurora) are suitable for graph data because you can model graphs with JOIN tables

Correct

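While you CAN model graph relationships in RDS using junction tables, performance degrades sharply with relationship depth. A 3-hop graph traversal in SQL requires 3 JOINs minimum and a 6-hop traversal requires 6 JOINs, with intermediate result sets growing at every hop; this becomes computationally prohibitive at scale. Neptune is optimized for exactly this pattern: it stores graph data as quads in indexes keyed by subject, predicate, and object (for a property graph, roughly vertex, edge label, and neighbor), so each hop is a targeted index lookup from the current vertices rather than another self-join over the edge table.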

Exam questions will present a scenario where an existing RDS system is struggling with complex JOIN queries on relationship data and ask what to migrate to. The answer is Neptune, not Aurora, not a bigger RDS instance. The trigger phrase is 'complex relationships' or 'JOIN performance degrading as data grows'.
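To see how the JOIN count tracks the hop count, the sketch below generates the relational form of a k-hop query, using one self-join of a hypothetical friendships table per hop. It then runs the query against an in-memory SQLite database. It demonstrates query shape only, not Neptune performance. The people/friendships schema and data are invented.

```python
import sqlite3


def k_hop_sql(hops: int) -> str:
    """Relational form of 'who is reachable from person ? by a walk of exactly
    `hops` edges': one self-join of the edge table per hop."""
    if hops < 1:
        raise ValueError("hops must be >= 1")
    parts = [f"SELECT DISTINCT f{hops}.dst FROM people p "
             "JOIN friendships f1 ON f1.src = p.id"]
    for i in range(2, hops + 1):
        parts.append(f"JOIN friendships f{i} ON f{i}.src = f{i - 1}.dst")
    parts.append("WHERE p.id = ?")
    return " ".join(parts)


def run_k_hop(start: str, hops: int) -> list[str]:
    """Load a tiny hypothetical graph into in-memory SQLite and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE people (id TEXT PRIMARY KEY);
        CREATE TABLE friendships (src TEXT NOT NULL, dst TEXT NOT NULL);
        CREATE INDEX idx_friendships_src ON friendships (src);
    """)
    edges = [("alice", "bob"), ("alice", "carol"), ("bob", "dave"),
             ("carol", "dave"), ("carol", "erin"), ("dave", "frank")]
    people = sorted({p for edge in edges for p in edge})
    conn.executemany("INSERT INTO people (id) VALUES (?)", [(p,) for p in people])
    conn.executemany("INSERT INTO friendships (src, dst) VALUES (?, ?)", edges)
    rows = conn.execute(k_hop_sql(hops), (start,)).fetchall()
    conn.close()
    return sorted(row[0] for row in rows)
```

Each additional hop adds another self-join to the SQL. In a graph query language, the same step is one more traversal step, for example another .out('knows') in Gremlin.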

Common Mistake

DocumentDB is a graph database because it stores JSON documents with nested relationships

Correct

DocumentDB is a document database compatible with MongoDB APIs — it stores JSON documents and supports nested data, but it does NOT provide graph traversal capabilities. Nested JSON represents hierarchy, not arbitrary graph relationships. Neptune is AWS's graph database; DocumentDB is AWS's document database. They solve fundamentally different problems.

This confusion appears frequently because both Neptune and DocumentDB are 'NoSQL' and both can represent relationships in data. The key distinction: DocumentDB is optimized for document retrieval and nested attribute queries; Neptune is optimized for traversing arbitrary relationship networks. If the question mentions 'graph', choose Neptune. If it mentions 'JSON documents' or 'MongoDB', choose DocumentDB.

Common Mistake

Neptune requires you to choose between Property Graph and RDF at the service level — you must use separate Neptune clusters for each

Correct

A single Neptune cluster supports BOTH Property Graph queries (Gremlin and openCypher) AND RDF queries (SPARQL) simultaneously. You can run all three query languages against the same Neptune cluster. However, the data models are separate: Property Graph data can be queried only with Gremlin or openCypher, and RDF data only with SPARQL.

Exam questions may try to trick you into thinking you need multiple Neptune clusters for different query languages. One Neptune cluster = multiple query language support. This also means you don't need to 'pick' Gremlin vs openCypher upfront — both work on Property Graph data in the same cluster.

Common Mistake

Neptune Multi-AZ means you have a standby instance like RDS Multi-AZ — if the primary fails, there's a brief outage while the standby is promoted

Correct

Neptune uses a shared cluster volume architecture (like Aurora) where data is automatically replicated across 3 AZs with 6 copies. Failover promotes an existing read replica in under 30 seconds — there's no separate 'standby' instance. The cluster volume itself is always multi-AZ; you're not paying for a hidden standby instance.

This matters for exam cost optimization questions. RDS Multi-AZ charges for a standby instance you never query. Neptune's HA is built into the storage layer — you only pay for read replicas you explicitly add. If a question asks about Neptune HA costs vs RDS Multi-AZ costs, Neptune's model is more cost-efficient for the same durability level.

Common Mistake

Neptune can be accessed from the public internet with the right security group rules

Correct

Neptune has NO public endpoint — period. It is strictly VPC-only. No security group rule change can expose Neptune to the public internet. Applications outside your VPC must connect via VPN, Direct Connect, VPC Peering, or Transit Gateway. This is a hard architectural constraint, not a configuration choice.

Architecture questions will present Neptune as a backend for a public-facing application and ask how external services connect. The answer always involves a network connectivity service (VPN/Direct Connect/VPC Peering) or an intermediary layer (Lambda in VPC, EC2 in VPC). Never 'public endpoint with security group'.

Memory Tricks

🧠

GRAPH = Go Relationship And Path Hopping: that's Neptune's purpose. When you see 'relationships', 'traversal', 'connected data', 'social network', 'fraud ring', or 'recommendation' in an exam question, GRAPH = Neptune.

🧠

Neptune's query languages: G-O-S = Gremlin (TinkerPop Property Graph), openCypher (Property Graph), SPARQL (RDF/Semantic). Remember: 'GOS' — two for Property Graph, one for RDF.

🧠

Neptune Storage = Aurora Storage: Both use 6 copies across 3 AZs, auto-scale in 10 GiB increments, max 128 TiB, continuous backup to S3. If you know Aurora storage, you know Neptune storage.

🧠

Neptune is like a city with NO public roads — VPC-only, always. Every visitor (application) must use a private tunnel (VPN), a dedicated highway (Direct Connect), or live in the same neighborhood (same VPC).

CertAI Tutor · SAA-C03, SAP-C02, DEA-C01, CLF-C02 · 2026-02-21
