
Purpose-built managed graph database for highly connected data — relationships are first-class citizens
Amazon Neptune is a fully managed, serverless-capable graph database service optimized for storing and querying billions of relationships with millisecond latency. It supports two popular graph models — Property Graph (via Apache TinkerPop Gremlin and openCypher) and RDF (via SPARQL) — making it the go-to AWS service for social networks, knowledge graphs, fraud detection, and recommendation engines. Neptune handles the heavy lifting of provisioning, patching, backup, recovery, and replication so you can focus on traversing relationships, not managing infrastructure.
Use Neptune to efficiently store, query, and traverse highly connected datasets where relationships between entities are as important as the entities themselves, particularly where JOIN-heavy SQL or document-model queries become prohibitively expensive.
Use When
Avoid When
Property Graph Model (TinkerPop Gremlin)
Vertices and edges with arbitrary key-value properties; ideal for social/fraud/recommendation graphs
Property Graph Model (openCypher)
Cypher query language — familiar to Neo4j users; AWS added openCypher support to ease migration
RDF / SPARQL (W3C standards)
Resource Description Framework for knowledge graphs, semantic web, and linked data use cases
Neptune Serverless
Automatically scales compute capacity based on workload — eliminates capacity planning for variable graph workloads
Multi-AZ High Availability
6 copies of data across 3 AZs; automatic failover to a read replica, typically in under 30 seconds
Read Replicas (up to 15)
Scale read throughput; also serve as failover targets
Automated Backups (continuous)
Point-in-time recovery to any second within the retention window (1–35 days)
Manual Snapshots
Retained indefinitely; can be shared across AWS accounts or regions
Encryption at Rest (AWS KMS)
Must be enabled at cluster creation — cannot be added after the fact
Encryption in Transit (TLS)
TLS enforced by default on Neptune endpoints
VPC Isolation
Neptune is VPC-only — no public endpoint option; access via VPC, VPN, or Direct Connect
IAM Authentication
IAM database authentication using Signature Version 4 signing — no password management
Neptune Streams
Ordered, change-data-capture stream of all graph mutations — enables real-time downstream processing
Neptune ML (Graph Neural Networks)
Integrates with Amazon SageMaker to train and deploy GNN models directly on Neptune graph data
Neptune Analytics
In-memory graph analytics engine for fast graph algorithms (PageRank, community detection) on large graphs
Bulk Load (from S3)
Neptune Loader imports CSV or RDF data from S3 — primary mechanism for initial data ingestion
Global Database
Cross-region replication with sub-second RPO for disaster recovery and low-latency global reads
CloudWatch Metrics Integration
CPU, memory, query latency, buffer cache hit ratio, and Gremlin/SPARQL request metrics published automatically
Multi-master
Neptune does not support multi-master writes — one primary writer per cluster (unlike Aurora Multi-Master)
Public Endpoint
Neptune has no public internet endpoint — must be accessed within a VPC
Bulk Graph Data Loading
High frequency: Use the Neptune Bulk Loader to import large datasets from S3 in CSV (for Property Graph) or an RDF format such as Turtle or N-Triples. This is the recommended initial load mechanism and is far faster than inserting records through individual queries. S3 is also the target for Neptune export jobs.
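As a rough illustration of the bulk load described above, the sketch below builds the HTTP POST that starts a loader job against the cluster's `/loader` path. The endpoint, bucket, and role ARN are hypothetical placeholders, and the helper only constructs the request; it does not send it.

```python
import json
import urllib.request


def build_loader_request(endpoint, s3_uri, role_arn, region, data_format="csv"):
    """Build (but do not send) a POST request that would start a bulk load job."""
    payload = {
        "source": s3_uri,          # S3 prefix holding the files to load
        "format": data_format,     # e.g. "csv", "turtle", "ntriples", "nquads"
        "iamRoleArn": role_arn,    # role Neptune assumes to read the bucket
        "region": region,
        "failOnError": "FALSE",    # keep loading past individual bad records
    }
    return urllib.request.Request(
        url=f"https://{endpoint}:8182/loader",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values, for illustration only.
req = build_loader_request(
    endpoint="my-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com",
    s3_uri="s3://my-graph-bucket/initial-load/",
    role_arn="arn:aws:iam::123456789012:role/NeptuneLoadFromS3",
    region="us-east-1",
)
```

Sending the request (for example with `urllib.request.urlopen(req)`) would have to happen from inside the VPC, and would need SigV4 signing if IAM authentication is enabled on the cluster.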
Graph Database Monitoring and Alerting
High frequency: Neptune automatically publishes metrics to CloudWatch including GremlinRequestsPerSec, SparqlRequestsPerSec, BufferCacheHitRatio, CPUUtilization, and FreeableMemory. Set CloudWatch Alarms on query latency and cache hit ratio to detect performance degradation. Use CloudWatch Logs for slow query logs.
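One way to express the cache-hit-ratio alarm mentioned above is as the keyword arguments for CloudWatch's `put_metric_alarm` call. This is a sketch only: the cluster identifier, the 99% threshold, and the evaluation window are assumptions chosen for illustration, not recommended values.

```python
def buffer_cache_alarm(cluster_id, threshold_pct=99.0):
    """Return keyword arguments for a CloudWatch alarm on BufferCacheHitRatio."""
    return {
        "AlarmName": f"{cluster_id}-low-buffer-cache-hit-ratio",
        "Namespace": "AWS/Neptune",
        "MetricName": "BufferCacheHitRatio",
        "Dimensions": [{"Name": "DBClusterIdentifier", "Value": cluster_id}],
        "Statistic": "Average",
        "Period": 300,                 # seconds per datapoint (assumed)
        "EvaluationPeriods": 3,        # consecutive breaching periods (assumed)
        "Threshold": threshold_pct,    # assumed threshold; tune per workload
        "ComparisonOperator": "LessThanThreshold",
    }


alarm = buffer_cache_alarm("my-neptune-cluster")  # hypothetical cluster ID

# With boto3 installed and credentials configured, this could be applied as:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```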
Serverless Graph Query API
High frequency: Lambda functions attached to the Neptune VPC query the cluster via Gremlin or openCypher over WebSocket or HTTPS. Common pattern: API Gateway → Lambda → Neptune for real-time graph queries in serverless applications. Because Neptune has no public endpoint, the Lambda function must run in the same VPC as Neptune or in a VPC with a private network path to it.
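A minimal sketch of the Lambda side of this pattern, assuming IAM authentication is disabled on the cluster and that the function is attached to the Neptune VPC. The environment variable name, the `User` label, the `FOLLOWS` edge, and the event shape are all hypothetical. It posts a parameterized openCypher query to the cluster's HTTPS `/openCypher` path using only the standard library.

```python
import json
import os
import urllib.parse
import urllib.request

# Hypothetical endpoint; in practice this would come from function configuration.
NEPTUNE_ENDPOINT = os.environ.get(
    "NEPTUNE_ENDPOINT",
    "my-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com",
)


def build_query_request(user_id):
    """Build a form-encoded openCypher request for the accounts a user follows."""
    body = urllib.parse.urlencode({
        "query": "MATCH (u:User {id: $id})-[:FOLLOWS]->(f:User) "
                 "RETURN f.id AS id LIMIT 25",
        "parameters": json.dumps({"id": user_id}),
    }).encode("utf-8")
    return urllib.request.Request(
        f"https://{NEPTUNE_ENDPOINT}:8182/openCypher", data=body, method="POST"
    )


def handler(event, context):
    """Lambda entry point: run the query and return Neptune's JSON response."""
    req = build_query_request(event["userId"])
    with urllib.request.urlopen(req, timeout=5) as resp:
        return {"statusCode": 200, "body": resp.read().decode("utf-8")}
```

Passing the user ID as a query parameter rather than formatting it into the query string avoids injection problems and lets the engine reuse the query plan.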
Choosing the Right NoSQL Database
High frequency: This is primarily an architectural decision pattern, not an integration. DocumentDB is for document-centric JSON workloads (MongoDB-compatible). Neptune is for relationship-centric graph workloads. The exam tests your ability to select the correct service; they are NOT interchangeable.
Neptune ML — Graph Neural Networks
Medium frequency: Neptune ML uses Deep Graph Library (DGL) with SageMaker to train GNN models on your Neptune graph data. Use cases: link prediction, node classification, entity resolution. Neptune exports graph data to S3, SageMaker trains the model, and Neptune serves inference results via graph queries.
Real-Time Graph Change Processing via Neptune Streams
Medium frequency: Neptune Streams captures every create/update/delete operation in order. A Lambda poller reads from Neptune Streams and publishes to Kinesis for downstream consumers, enabling real-time graph change propagation to search indexes, caches, or analytics systems.
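The polling side of this pattern can be sketched as two small helpers: one builds the URL for the property-graph stream endpoint, the other extracts the checkpoint to resume from. The sample response below is hand-written to resemble the stream's response shape and is an assumption for illustration, not captured output; the endpoint host is a placeholder.

```python
import urllib.parse


def stream_url(endpoint, commit_num=None, op_num=None, limit=100):
    """Build the polling URL: start at the oldest record, or resume after a checkpoint."""
    if commit_num is None:
        params = {"iteratorType": "TRIM_HORIZON", "limit": limit}
    else:
        params = {
            "iteratorType": "AFTER_SEQUENCE_NUMBER",
            "commitNum": commit_num,
            "opNum": op_num,
            "limit": limit,
        }
    query = urllib.parse.urlencode(params)
    return f"https://{endpoint}:8182/propertygraph/stream?{query}"


def next_checkpoint(response):
    """Return (commitNum, opNum) of the last event in a stream response."""
    last = response["lastEventId"]
    return last["commitNum"], last["opNum"]


# Illustrative response shape (assumed, abbreviated).
sample_response = {
    "lastEventId": {"commitNum": 12, "opNum": 2},
    "records": [
        {"eventId": {"commitNum": 12, "opNum": 1}, "op": "ADD", "data": {}},
        {"eventId": {"commitNum": 12, "opNum": 2}, "op": "REMOVE", "data": {}},
    ],
}

commit, op = next_checkpoint(sample_response)
resume_url = stream_url("my-neptune-endpoint", commit, op)
```

A poller would persist the checkpoint after each successful batch (for example in DynamoDB) so that a restarted Lambda resumes where the previous one stopped.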
Graph + Full-Text Search Hybrid
Medium frequency: Neptune handles graph traversal while OpenSearch handles full-text search. A common pattern: search OpenSearch for entity IDs matching a text query, then traverse Neptune to find related entities. Neptune Streams can sync graph mutations to OpenSearch in near real-time.
Graph Metadata + High-Volume Attribute Store
Medium frequency: Store graph topology (nodes, edges, relationships) in Neptune and store high-cardinality, frequently-updated attributes (click counts, timestamps, raw events) in DynamoDB. Neptune handles 'who is connected to whom' while DynamoDB handles 'what are the details of each entity'.
Transactional Data + Graph Relationship Layer
Medium frequency: RDS/Aurora stores normalized transactional data (orders, inventory, accounts) while Neptune stores the relationship graph derived from that data. ETL pipelines (AWS Glue or Lambda) extract relationships from RDS and load them into Neptune for graph-specific queries.
IAM Database Authentication
Medium frequency: Enable IAM authentication on Neptune to allow IAM users, roles, and Lambda execution roles to authenticate using AWS Signature Version 4, with no database passwords required. This is the recommended authentication model for applications running on EC2, Lambda, or ECS.
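To make the Signature Version 4 step less abstract, the sketch below shows only the signing-key derivation and credential scope, using the standard library. It is not a complete request signer; in practice an AWS SDK or signing library would handle the full process. The secret key is the placeholder value used in AWS documentation examples, and `neptune-db` is used as the service name for Neptune data-plane requests.

```python
import hashlib
import hmac


def _hmac_sha256(key, message):
    """HMAC-SHA256 of a text message under a bytes key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(secret_key, date_stamp, region, service="neptune-db"):
    """Derive the SigV4 signing key: secret -> date -> region -> service -> request."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def credential_scope(date_stamp, region, service="neptune-db"):
    """Scope string that accompanies the access key ID in the Authorization header."""
    return f"{date_stamp}/{region}/{service}/aws4_request"


# Placeholder credentials, for illustration only.
key = sigv4_signing_key("wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY", "20240101", "us-east-1")
scope = credential_scope("20240101", "us-east-1")
```

Because the key is scoped to a date, region, and service, a leaked signature cannot be replayed against another service or on another day, which is part of why no long-lived database password is needed.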
When an exam question describes highly connected data, multi-hop relationship traversal, social networks, fraud rings, or recommendation engines — the answer is Neptune, NOT DynamoDB, DocumentDB, or RDS. The keyword 'graph' or 'relationships between entities' is your signal.
Neptune is VPC-ONLY: there is no public endpoint. Any architecture requiring Neptune access from outside a VPC must use VPN, Direct Connect, VPC Peering, or AWS Transit Gateway. Lambda functions querying Neptune must be attached to the Neptune VPC or to a VPC with a private route to it.
Neptune supports THREE query languages: Gremlin (Apache TinkerPop), openCypher, and SPARQL. Gremlin and openCypher are for Property Graphs; SPARQL is for RDF graphs. If a question mentions 'semantic web', 'linked data', or 'ontologies', the answer involves SPARQL/RDF.
Highly connected data + multi-hop traversal + social/fraud/recommendation = Neptune. Not DynamoDB, not DocumentDB, not RDS. Neptune is the ONLY AWS graph database.
Neptune is VPC-ONLY: no public endpoint exists. Lambda must run in the same VPC or in a connected one. External access requires VPN, Direct Connect, or VPC Peering.
Neptune supports Gremlin, openCypher, AND SPARQL on a single cluster. RDF/SPARQL = semantic/knowledge graphs. Property Graph/Gremlin/openCypher = social, fraud, recommendation graphs.
Neptune's storage architecture mirrors Aurora: 6 copies of data across 3 AZs, auto-scaling in 10 GiB increments up to 128 TiB, with continuous backups to S3. You never provision storage — it grows automatically. This is a key differentiator from RDS.
Neptune Serverless is the answer when the question mentions 'unpredictable graph workloads', 'variable traffic', or 'minimize operational overhead for a graph database'. It scales NCUs automatically and you pay only for capacity consumed.
Neptune Streams is the change-data-capture (CDC) mechanism for graph mutations. If a question asks how to propagate Neptune graph changes to downstream systems (OpenSearch, Lambda, Kinesis) in real time — Neptune Streams is the answer.
Encryption at rest must be enabled at cluster creation time — you CANNOT enable KMS encryption on an existing unencrypted Neptune cluster. To encrypt an existing cluster, take a snapshot, restore it with encryption enabled, and migrate.
Neptune Global Database provides cross-region disaster recovery with sub-second RPO. If a question asks about Neptune with multi-region availability or low-latency global reads — Neptune Global Database is the answer, not multi-master (which Neptune does NOT support).
Neptune ML integrates with Amazon SageMaker to run Graph Neural Networks (GNNs) on your Neptune data. If a question asks about machine learning on graph data or link prediction within a graph database — Neptune ML + SageMaker is the answer.
For initial bulk data loading into Neptune, use the Neptune Loader from S3 — not individual API calls. The Loader is significantly faster for large datasets and supports CSV (Property Graph) and RDF formats (Turtle, N-Triples, N-Quads).
Common Mistake
Any NoSQL database (DynamoDB, DocumentDB, MongoDB) can handle graph workloads efficiently — they're all 'flexible schema' databases
Correct
Graph workloads require a purpose-built graph database like Neptune. DynamoDB and DocumentDB lack native graph traversal algorithms. Multi-hop relationship queries (e.g., 'find all friends-of-friends within 3 degrees') require exponentially expensive table scans in DynamoDB and DocumentDB but are natively efficient in Neptune.
This is the #1 misconception on Neptune exam questions. The exam will describe a graph use case and offer DynamoDB or DocumentDB as distractors. Remember: Neptune is the ONLY AWS service with native graph traversal. The keywords 'highly connected', 'relationships', 'traversal', 'social graph', and 'fraud ring' all point to Neptune.
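To show what a "friends within 3 degrees" query actually computes, here is a small in-memory breadth-first traversal in Python. It illustrates the traversal logic only and says nothing about how Neptune implements it internally; the graph data and names are made up.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return {node: hop_distance} for every node reachable within max_hops of start."""
    seen = {start}
    distances = {}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                distances[neighbor] = depth + 1
                queue.append((neighbor, depth + 1))
    return distances


# Hypothetical "knows" edges stored as an adjacency list.
friends = {
    "alice": ["bob"],
    "bob": ["carol"],
    "carol": ["dave"],
    "dave": ["erin"],
}

reachable = within_hops(friends, "alice", 3)
# reachable == {"bob": 1, "carol": 2, "dave": 3}; "erin" is 4 hops away and excluded
```

In Gremlin, a comparable traversal might look like `g.V().has('person','name','alice').repeat(out('knows').simplePath()).times(3).emit().dedup()`, assuming a `person` label and `knows` edges.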
Common Mistake
Relational databases (RDS, Aurora) are suitable for graph data because you can model graphs with JOIN tables
Correct
While you CAN model graph relationships in RDS using junction tables, performance degrades sharply with relationship depth. A 3-hop graph traversal in SQL requires 3 JOINs minimum; a 6-hop traversal requires 6 JOINs, and each added JOIN multiplies the intermediate rows the engine must process, which becomes computationally prohibitive at scale. Neptune's graph engine is purpose-built and optimized for exactly this traversal pattern.
Exam questions will present a scenario where an existing RDS system is struggling with complex JOIN queries on relationship data and ask what to migrate to. The answer is Neptune, not Aurora, not a bigger RDS instance. The trigger phrase is 'complex relationships' or 'JOIN performance degrading as data grows'.
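The JOIN growth described above can be made concrete with a small generator that emits the SQL for a k-hop traversal. The `person` and `friendship(src_id, dst_id)` tables are hypothetical, and the generated SQL is illustrative rather than tuned.

```python
def k_hop_sql(hops, edge_table="friendship"):
    """Generate SQL that finds nodes exactly `hops` edges away from a start person."""
    if hops < 1:
        raise ValueError("hops must be at least 1")
    lines = [
        f"SELECT DISTINCT e{hops}.dst_id",
        "FROM person p",
        f"JOIN {edge_table} e1 ON e1.src_id = p.id",
    ]
    # Each additional hop adds one more self-join on the edge table.
    for i in range(2, hops + 1):
        lines.append(f"JOIN {edge_table} e{i} ON e{i}.src_id = e{i - 1}.dst_id")
    lines.append("WHERE p.id = :start_id")
    return "\n".join(lines)


three_hop = k_hop_sql(3)
six_hop = k_hop_sql(6)
```

Printing `three_hop` shows three JOIN clauses and `six_hop` shows six. The query text grows linearly, but the intermediate result set can grow multiplicatively with each hop, which is where the cost comes from.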
Common Mistake
DocumentDB is a graph database because it stores JSON documents with nested relationships
Correct
DocumentDB is a document database compatible with MongoDB APIs — it stores JSON documents and supports nested data, but it does NOT provide graph traversal capabilities. Nested JSON represents hierarchy, not arbitrary graph relationships. Neptune is AWS's graph database; DocumentDB is AWS's document database. They solve fundamentally different problems.
This confusion appears frequently because both Neptune and DocumentDB are 'NoSQL' and both can represent relationships in data. The key distinction: DocumentDB is optimized for document retrieval and nested attribute queries; Neptune is optimized for traversing arbitrary relationship networks. If the question mentions 'graph', choose Neptune. If it mentions 'JSON documents' or 'MongoDB', choose DocumentDB.
Common Mistake
Neptune requires you to choose between Property Graph and RDF at the service level — you must use separate Neptune clusters for each
Correct
A single Neptune cluster supports BOTH Property Graph queries (Gremlin and openCypher) AND RDF queries (SPARQL) simultaneously. You can run all three query languages against the same Neptune cluster. However, data models are separate — Property Graph data and RDF data are stored differently within the cluster.
Exam questions may try to trick you into thinking you need multiple Neptune clusters for different query languages. One Neptune cluster = multiple query language support. This also means you don't need to 'pick' Gremlin vs openCypher upfront — both work on Property Graph data in the same cluster.
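As a sketch of what "one cluster, three query languages" looks like in practice, the mapping below pairs each language with the HTTPS path it is served on and a trivial sample query. The cluster endpoint and the `person` label are hypothetical.

```python
# Language -> (HTTPS path on the cluster endpoint, sample query)
QUERY_ENDPOINTS = {
    "gremlin": ("/gremlin", "g.V().hasLabel('person').limit(5).valueMap()"),
    "openCypher": ("/openCypher", "MATCH (p:person) RETURN p LIMIT 5"),
    "sparql": ("/sparql", "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"),
}


def endpoint_url(cluster_endpoint, language):
    """Return the URL a client would send queries to for the given language."""
    path, _sample_query = QUERY_ENDPOINTS[language]
    return f"https://{cluster_endpoint}:8182{path}"


# Same hypothetical cluster endpoint, three different query paths.
host = "my-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com"
urls = {lang: endpoint_url(host, lang) for lang in QUERY_ENDPOINTS}
```

The Gremlin and openCypher paths operate on the same Property Graph data, while the SPARQL path operates on RDF data held in the same cluster.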
Common Mistake
Neptune Multi-AZ means you have a standby instance like RDS Multi-AZ — if the primary fails, there's a brief outage while the standby is promoted
Correct
Neptune uses a shared cluster volume architecture (like Aurora) where data is automatically replicated across 3 AZs with 6 copies. Failover promotes an existing read replica in under 30 seconds — there's no separate 'standby' instance. The cluster volume itself is always multi-AZ; you're not paying for a hidden standby instance.
This matters for exam cost optimization questions. RDS Multi-AZ charges for a standby instance you never query. Neptune's HA is built into the storage layer — you only pay for read replicas you explicitly add. If a question asks about Neptune HA costs vs RDS Multi-AZ costs, Neptune's model is more cost-efficient for the same durability level.
Common Mistake
Neptune can be accessed from the public internet with the right security group rules
Correct
Neptune has NO public endpoint — period. It is strictly VPC-only. No security group rule change can expose Neptune to the public internet. Applications outside your VPC must connect via VPN, Direct Connect, VPC Peering, or Transit Gateway. This is a hard architectural constraint, not a configuration choice.
Architecture questions will present Neptune as a backend for a public-facing application and ask how external services connect. The answer always involves a network connectivity service (VPN/Direct Connect/VPC Peering) or an intermediary layer (Lambda in VPC, EC2 in VPC). Never 'public endpoint with security group'.
GRAPH = Go Relationships And Paths — that's Neptune's purpose. When you see 'relationships', 'traversal', 'connected data', 'social network', 'fraud ring', or 'recommendation' in an exam question — GRAPH = Neptune.
Neptune's query languages: G-O-S = Gremlin (TinkerPop Property Graph), openCypher (Property Graph), SPARQL (RDF/Semantic). Remember: 'GOS' — two for Property Graph, one for RDF.
Neptune Storage = Aurora Storage: Both use 6 copies across 3 AZs, auto-scale in 10 GiB increments, max 128 TiB, continuous backup to S3. If you know Aurora storage, you know Neptune storage.
Neptune is like a city with NO public roads — VPC-only, always. Every visitor (application) must use a private tunnel (VPN), a dedicated highway (Direct Connect), or live in the same neighborhood (same VPC).
CertAI Tutor · SAA-C03, SAP-C02, DEA-C01, CLF-C02 · 2026-02-21