
Master every AWS caching pattern to slash latency, cut costs, and ace certification exams
Caching on AWS is the practice of storing frequently accessed data in a fast, temporary storage layer to reduce latency, decrease backend load, and lower costs. AWS offers multiple caching services and strategies — each optimized for different data types, access patterns, and consistency requirements. For certification exams, understanding WHICH caching strategy to apply in WHICH scenario (and why) is the core competency being tested.
Exams test your ability to select the correct caching strategy (lazy loading, write-through, TTL-based, etc.) and the correct AWS service (ElastiCache, DAX, CloudFront, API Gateway cache, etc.) given a specific architectural scenario — especially when cost, consistency, and latency are in tension.
Lazy Loading (Cache-Aside)
Data is loaded into the cache only when it is requested and not already present (a cache miss). The application first checks the cache; on a miss, it fetches from the database, writes the result to the cache, and returns it to the caller. Subsequent requests for the same data are served from the cache (a cache hit).
Best for read-heavy workloads where not all data needs to be cached, data access patterns are unpredictable, and stale data is tolerable for short periods. Ideal when you want to avoid pre-populating the cache with data that may never be read.
Cache misses result in three round-trips (cache check → DB read → cache write), adding latency on first access. Data in cache can become stale if the underlying DB is updated without invalidating the cache entry. Cold start penalty when cache is empty (e.g., after a restart).
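The cache-aside flow above can be sketched in a few lines. This is a minimal illustration using plain dicts to stand in for the cache (e.g. ElastiCache) and the database; the names are illustrative, not AWS API calls.

```python
# In-memory stand-ins for a cache and a backing database (illustrative only).
cache = {}
database = {"user:1": "Alice", "user:2": "Bob"}

def get_lazy(key):
    """Cache-aside read: check cache first, fall back to the DB on a miss."""
    if key in cache:            # 1) cache check (hit path: one round-trip)
        return cache[key]
    value = database.get(key)   # 2) DB read on a miss
    if value is not None:
        cache[key] = value      # 3) cache write so the next read is a hit
    return value

assert get_lazy("user:1") == "Alice"   # miss: fetched from DB, then cached
assert "user:1" in cache               # subsequent reads are hits
```

Note that only requested keys ever enter the cache, which is exactly why lazy loading is memory-efficient for sparse access patterns.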
Write-Through
Every write to the database is simultaneously written to the cache. The application writes data to the cache first (or alongside the DB write), ensuring the cache is always up to date. There are never stale reads for data that has been written through.
Best for write-heavy workloads where data freshness is critical and stale reads are unacceptable. Use when reads follow writes closely and you can afford the extra write latency. Common in financial, inventory, or session data scenarios.
Cache may be populated with data that is never subsequently read (cache churn / wasted memory). Every write incurs the overhead of two writes (DB + cache), increasing write latency. Cache nodes must be sized for the full write volume, not just the read hot-set.
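For contrast, a write-through sketch under the same in-memory assumptions shows the double-write cost directly:

```python
cache = {}
database = {}

def put_write_through(key, value):
    """Write-through: every write hits both the DB and the cache."""
    database[key] = value   # write 1: source of truth
    cache[key] = value      # write 2: cache stays in sync, no stale reads

def get(key):
    # Written keys are always present in the cache, so reads never see stale data.
    return cache.get(key, database.get(key))

put_write_through("sku:42", {"price": 999})
assert get("sku:42") == {"price": 999}   # fresh read straight from cache
```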
Write-Behind (Write-Back)
The application writes data to the cache only; the cache asynchronously flushes the data to the database after a configurable delay or batch interval. Reads always come from the cache, and writes are buffered.
Best for extremely write-intensive workloads where write throughput to the database is a bottleneck and some risk of data loss (if the cache node fails before flushing) is acceptable. Useful for analytics counters, leaderboards, or IoT telemetry aggregation.
Risk of data loss if the cache node fails before the async write completes. Increased architectural complexity. Not natively supported as a first-class pattern in ElastiCache — requires application-level implementation or a Lambda/stream-based flush mechanism.
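Since ElastiCache has no first-class write-behind mode, the pattern lives in application code. A minimal sketch, with the async flush reduced to a manually invoked function (in practice a timer, a stream consumer, or a Lambda would drain the buffer):

```python
cache = {}
database = {}
dirty = set()   # keys written to the cache but not yet flushed to the DB

def put_write_behind(key, value):
    """Write goes to the cache only; the DB write is deferred."""
    cache[key] = value
    dirty.add(key)

def flush():
    """Flush step. If the cache node dies before this runs, dirty keys are lost —
    the data-loss risk described above."""
    for key in list(dirty):
        database[key] = cache[key]
        dirty.discard(key)

put_write_behind("counter:page1", 10)
assert "counter:page1" not in database   # DB write has not happened yet
flush()
assert database["counter:page1"] == 10   # buffered write lands later
```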
TTL-Based Expiration (Time-to-Live)
Cache entries are assigned an expiration time (TTL). When the TTL expires, the entry is evicted and the next request triggers a cache miss and a fresh data fetch. TTL can be combined with lazy loading or write-through strategies.
Use whenever data has a known or acceptable staleness window — e.g., product catalog data that changes hourly, session tokens, API responses, or configuration data. TTL is a universal complement to any other caching strategy to prevent indefinite staleness.
Too short a TTL defeats the purpose of caching (too many misses). Too long a TTL risks serving stale data. Thundering herd problem: if many cache entries share the same TTL and expire simultaneously, a flood of DB requests can overwhelm the backend. Mitigate with TTL jitter (randomizing expiry times).
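TTL with jitter can be sketched as follows; the cache entry stores its expiry timestamp, and each TTL gets a random offset so entries written together do not expire together. The constants are illustrative.

```python
import random
import time

BASE_TTL = 300   # seconds (illustrative)
JITTER = 60      # random spread added to each TTL to avoid a thundering herd

cache = {}       # key -> (value, expires_at)

def put_with_ttl(key, value, now=None):
    now = time.time() if now is None else now
    ttl = BASE_TTL + random.uniform(0, JITTER)   # jittered expiry
    cache[key] = (value, now + ttl)

def get_with_ttl(key, now=None):
    now = time.time() if now is None else now
    entry = cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if now >= expires_at:    # expired: evict and report a miss
        del cache[key]
        return None
    return value

put_with_ttl("cfg", "v1", now=0)
assert get_with_ttl("cfg", now=100) == "v1"    # within base TTL: hit
assert get_with_ttl("cfg", now=1000) is None   # past max possible TTL: miss
```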
Content Delivery Network (CDN) Caching — CloudFront
Static and dynamic content is cached at AWS edge locations (Points of Presence) globally. CloudFront caches HTTP responses based on cache behaviors, query strings, headers, and cookies. Cache-Control and Expires headers from the origin govern TTL at the edge.
Use for any globally distributed web application, static asset delivery (S3-backed), API acceleration, or video streaming. Ideal when reducing geographic latency and origin load is the primary goal. Also use for DDoS mitigation via AWS Shield integration.
Cache invalidation costs money (first 1,000 paths/month free, then charged per path). Highly dynamic, personalized content is difficult to cache at the edge. Requires careful Cache-Control header design to avoid caching sensitive or user-specific data.
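The interaction between origin Cache-Control headers and a cache behavior's TTL settings can be modeled roughly like this. This is a simplified sketch of CloudFront's documented precedence (no header falls back to the default TTL; an origin max-age is clamped between the behavior's minimum and maximum TTL); the real rules have more cases.

```python
def edge_ttl(origin_max_age, min_ttl=0, default_ttl=86400, max_ttl=31536000):
    """Simplified model of how a CloudFront cache behavior derives the edge TTL
    from the origin's Cache-Control max-age. Defaults mirror CloudFront's
    documented defaults (1 day default TTL, 1 year max TTL)."""
    if origin_max_age is None:                    # origin sent no caching headers
        return default_ttl
    return max(min_ttl, min(origin_max_age, max_ttl))

assert edge_ttl(None) == 86400          # no header -> DefaultTTL applies
assert edge_ttl(60) == 60               # in-range max-age is honored
assert edge_ttl(10, min_ttl=30) == 30   # clamped up to the behavior's MinTTL
```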
In-Memory Caching — ElastiCache (Redis / Memcached)
A fully managed, in-memory data store placed between your application and your database. ElastiCache for Redis supports rich data structures (strings, hashes, lists, sets, sorted sets, bitmaps, streams), persistence, replication, pub/sub, and Lua scripting. ElastiCache for Memcached is a simpler, multi-threaded, pure caching engine with no persistence.
Use Redis when you need persistence, replication, complex data structures, geospatial indexing, leaderboards, pub/sub messaging, or Multi-AZ failover. Use Memcached when you need the simplest possible horizontal scaling of a pure cache with no persistence requirements and multi-threaded performance.
ElastiCache is not serverless by default (you manage node types and cluster size); ElastiCache Serverless is available but has different cost characteristics. Redis's single-threaded command execution can become a bottleneck for CPU-intensive operations. Data on a failed Memcached node is lost and must be repopulated (e.g., via lazy loading).
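To see why Redis sorted sets are the natural fit for leaderboards, here is a minimal in-memory stand-in for the two commands a leaderboard needs (ZADD to record a score, ZREVRANGE to read the top ranks). This models the semantics only; it is not the redis client API.

```python
# Stand-in for a Redis sorted set: member -> score, ranked queries by score.
scores = {}

def zadd(member, score):
    """Record or update a member's score (like ZADD)."""
    scores[member] = score

def zrevrange(start, stop):
    """Members ordered by score, highest first, inclusive of stop
    (mirroring ZREVRANGE's inclusive range semantics)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[start:stop + 1]

zadd("alice", 3200)
zadd("bob", 4100)
zadd("carol", 2800)
assert zrevrange(0, 1) == [("bob", 4100), ("alice", 3200)]   # top 2
```

In real Redis the sorted set keeps members ordered on every insert, so top-N reads are cheap without re-sorting — the property this sketch fakes with `sorted()`.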
DAX — DynamoDB Accelerator
A fully managed, in-memory cache purpose-built for DynamoDB. DAX is API-compatible with DynamoDB — your application uses the DAX client and the cache is transparent. DAX provides microsecond read latency for eventually consistent reads and caches both item-level results (item cache) and query/scan results (query cache).
Use exclusively with DynamoDB when read latency must be in the microsecond range, read-to-write ratios are high, and your application performs repeated reads of the same items or query results. Ideal for gaming leaderboards, social media feeds, and real-time bidding.
DAX only works with DynamoDB — it cannot cache other data sources. DAX does NOT cache strongly consistent reads (they always go to DynamoDB). DAX is not serverless (you provision node types). Write operations go through DAX to DynamoDB synchronously, so write latency is not improved. DAX is not suitable if your workload is write-heavy.
API Gateway Caching
API Gateway can cache endpoint responses at the stage level with a configurable TTL (default 300 seconds, range 0–3600 seconds). Cached responses are returned without invoking the backend Lambda function or HTTP endpoint, reducing latency and backend cost.
Use when your API serves repetitive, cacheable responses (e.g., reference data, product listings) and you want to reduce Lambda invocation costs and backend load. Useful for REST APIs — note that HTTP APIs (v2) do NOT support response caching natively.
API Gateway caching is charged per hour based on the cache size selected (0.5 GB to 237 GB). Cache is per-stage and per-region. Cache invalidation can be triggered via the console, CLI, or by clients sending Cache-Control: max-age=0 headers (if permitted). Not available for HTTP APIs (only REST APIs).
Session Caching
User session state is stored in a centralized cache (typically ElastiCache for Redis) rather than on individual application server instances. This enables stateless application tiers where any instance can serve any user request.
Use whenever you run multiple EC2 instances, ECS tasks, or Lambda functions behind a load balancer and need to maintain user session state (shopping carts, auth tokens, preferences) without sticky sessions. Essential for horizontal scaling and fault tolerance.
Adds a network hop for every session read/write. Redis TTL must be aligned with your session timeout policy. If Redis is unavailable, session data is inaccessible — design for graceful degradation or use Multi-AZ Redis with automatic failover.
Object Caching — S3 + CloudFront
Amazon S3 stores durable objects (images, videos, HTML, JS, CSS, data files). CloudFront caches these objects at edge locations. S3 Transfer Acceleration can also speed up uploads. For frequently accessed S3 objects, CloudFront dramatically reduces S3 GET request costs and latency.
Use for any static website, media delivery, software distribution, or dataset that is read far more often than it is written. Combine with S3 Versioning and CloudFront cache invalidation for content update workflows.
CloudFront has an eventual consistency model at the edge — after S3 object updates, old cached versions may be served until TTL expires or invalidation is triggered. Invalidation has cost implications beyond the free tier.
• STEP 1 — What is the data source?
→ DynamoDB only? → Use DAX (microsecond latency, API-compatible).
→ RDS / Aurora / other DB? → Use ElastiCache (Redis or Memcached).
→ HTTP API responses? → Use API Gateway Caching (REST APIs only) or CloudFront.
→ Static/media content? → Use CloudFront + S3.
• STEP 2 — What are the consistency requirements?
→ Stale data acceptable for a window? → Lazy Loading + TTL.
→ Data must always be fresh on reads? → Write-Through.
→ Write throughput is the bottleneck and some loss is OK? → Write-Behind.
→ Strongly consistent reads required with DynamoDB? → Bypass DAX, go directly to DynamoDB.
• STEP 3 — What is the access pattern?
→ Unpredictable / sparse reads? → Lazy Loading (avoid pre-populating).
→ Predictable / all data will be read? → Write-Through or pre-warming.
→ Global users, geographic latency? → CloudFront CDN caching.
→ Session state across stateless instances? → ElastiCache Redis for session caching.
• STEP 4 — What are the operational requirements?
→ Need persistence + replication + complex data types? → ElastiCache Redis.
→ Need pure horizontal scale, no persistence? → ElastiCache Memcached.
→ Need fully serverless with no cluster management? → ElastiCache Serverless or DAX (for DynamoDB).
→ Need to reduce Lambda invocation costs for repetitive API calls? → API Gateway stage caching.
• STEP 5 — Watch for anti-patterns:
→ Never use DAX for write-heavy DynamoDB workloads.
→ Never use Memcached when you need failover, persistence, or complex data structures.
→ Never cache strongly consistent DynamoDB reads through DAX.
→ Never rely on API Gateway caching for HTTP APIs (v2) — it is not supported.
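The decision tree above can be condensed into a small helper for drilling scenarios. This is a toy encoding with illustrative labels and flags, not an exhaustive rule set:

```python
def pick_cache(data_source, needs_strong_consistency=False,
               needs_persistence=False, http_api_v2=False):
    """Toy encoding of the caching decision tree (illustrative labels)."""
    if data_source == "dynamodb":
        if needs_strong_consistency:
            return "no cache: read DynamoDB directly"   # DAX never caches SC reads
        return "DAX"
    if data_source == "api":
        # HTTP APIs (v2) have no built-in cache; REST APIs can use either.
        return "CloudFront" if http_api_v2 else "API Gateway cache or CloudFront"
    if data_source == "static":
        return "CloudFront + S3"
    # RDS / Aurora / other databases
    return "ElastiCache Redis" if needs_persistence else "ElastiCache (Redis or Memcached)"

assert pick_cache("dynamodb") == "DAX"
assert pick_cache("dynamodb", needs_strong_consistency=True).startswith("no cache")
assert pick_cache("api", http_api_v2=True) == "CloudFront"
```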
DAX does NOT improve write performance and does NOT cache strongly consistent reads. If a question mentions strongly consistent reads with DynamoDB, DAX is NEVER the answer for those reads — they bypass the DAX cache and go directly to DynamoDB.
API Gateway response caching is ONLY available for REST APIs (v1), NOT for HTTP APIs (v2). If a question asks how to cache API Gateway responses for an HTTP API, the answer involves CloudFront in front of the API, not API Gateway's built-in cache.
ElastiCache for Redis supports Multi-AZ with automatic failover. ElastiCache for Memcached does NOT support replication or Multi-AZ — if a Memcached node fails, all data on that node is lost and must be repopulated via lazy loading.
Redis = persistence + replication + complex data structures + Multi-AZ. Memcached = pure volatile cache, multi-threaded, no persistence, no replication. Match the requirement to the engine.
Lazy Loading results in a cache miss penalty of THREE operations: (1) check cache, (2) read from DB, (3) write to cache. Exams may ask about the performance impact of cold caches or cache node failures — lazy loading means stale or missing data is re-fetched on demand.
The Thundering Herd (or cache stampede) problem occurs when many cache keys expire at the same time, causing a flood of simultaneous DB requests. The solution is TTL jitter — adding a random offset to each cache entry's TTL to spread out expiration times.
Write-Through caching populates the cache on every write, which means the cache may contain data that is NEVER read — wasting memory and increasing costs. Lazy Loading only caches data that is actually requested, making it more memory-efficient for sparse access patterns.
CloudFront cache invalidation: the first 1,000 invalidation paths per month are free. After that, each invalidation path is charged. Wildcard invalidations (e.g., /images/*) count as ONE path. For cost-efficient content updates, use versioned file names (cache-busting) instead of invalidations.
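Versioned file names (cache-busting) can be generated by embedding a short content hash in the object key, so updated content gets a brand-new URL and no invalidation is ever needed. A minimal sketch (the naming scheme is illustrative):

```python
import hashlib

def versioned_name(filename, content):
    """Cache-busting: embed a content hash in the object key so changed
    content produces a new cache key at the edge (no invalidation cost)."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, dot, ext = filename.rpartition(".")
    return f"{stem}.{digest}.{ext}" if dot else f"{filename}.{digest}"

v1 = versioned_name("app.js", b"console.log(1)")
v2 = versioned_name("app.js", b"console.log(2)")
assert v1 != v2                                  # changed content -> new URL
assert v1.startswith("app.") and v1.endswith(".js")
```

The HTML that references the asset is updated to point at the new name (typically by the build tool), while old versions can be cached at the edge indefinitely.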
Session state stored in ElastiCache Redis enables truly stateless application tiers. If an exam question describes a scenario where users lose their session when an EC2 instance is terminated or replaced, the fix is to move session storage to ElastiCache — NOT to use sticky sessions (which create single points of failure).
ElastiCache is NOT accessible from the public internet by default — it lives inside a VPC. Applications must be in the same VPC (or a peered VPC) to connect. This is a security feature, not a limitation — exams may test whether you know ElastiCache requires VPC placement.
Common Mistake
DAX caches all DynamoDB reads, including strongly consistent reads, giving microsecond latency for all read types.
Correct
DAX ONLY caches eventually consistent reads. Strongly consistent reads ALWAYS bypass the DAX item cache and go directly to DynamoDB, incurring normal DynamoDB read latency. This is a hard architectural constraint, not a configuration option.
This is one of the most common DAX traps on the exam. If a question says 'strongly consistent reads' and asks about caching, DAX is wrong. Remember: DAX = Eventually consistent only.
Common Mistake
ElastiCache for Memcached and ElastiCache for Redis are interchangeable — choose either for any caching use case.
Correct
Redis and Memcached have fundamentally different capabilities. Redis supports persistence (AOF/RDB snapshots), replication, Multi-AZ failover, complex data structures (sorted sets, streams, geospatial), pub/sub, and Lua scripting. Memcached is a pure, multi-threaded, volatile cache with no persistence and no replication. Choosing the wrong one for a scenario is a common exam mistake.
Exams frequently present scenarios that require one specific capability (e.g., 'survive a node failure' → Redis; 'multi-threaded horizontal scale, no persistence' → Memcached). Map the requirement to the capability, not just the name.
Common Mistake
Write-Through caching is always better than Lazy Loading because the cache is always up to date.
Correct
Write-Through has a significant drawback: every item written to the DB is also written to the cache, even if it is never subsequently read. This wastes cache memory and increases write latency. For workloads with sparse or unpredictable read patterns, Lazy Loading is more efficient. Neither pattern is universally superior — the right choice depends on the read/write ratio and freshness requirements.
Exams test nuanced trade-off understanding. Watch for questions that describe a scenario with many writes but few reads of the same data — Write-Through is the WRONG choice there despite seeming 'safer' for consistency.
Common Mistake
Adding a cache always improves performance and is always the right architectural choice.
Correct
Caching introduces complexity (cache invalidation, consistency management, cold start penalties, thundering herd risk) and cost (cache node pricing, CloudFront data transfer). For data that changes very frequently or must always be strongly consistent, caching may add latency (cache miss + DB read) without meaningful benefit. Caching is a trade-off, not a universal improvement.
Exams may present scenarios where caching is the WRONG answer — e.g., financial transaction data requiring strong consistency, or data that changes on every request. Recognizing when NOT to cache is as important as knowing when to cache.
Common Mistake
API Gateway caching works the same way for both REST APIs and HTTP APIs.
Correct
API Gateway response caching is ONLY supported for REST APIs (v1). HTTP APIs (v2) do not have a built-in response cache. To cache responses for an HTTP API, you must place CloudFront in front of it and configure CloudFront cache behaviors.
This is a frequent trap because HTTP APIs are positioned as the 'modern, cheaper' option, leading candidates to assume they have all REST API features. They don't — caching is a key missing feature.
Common Mistake
CloudFront only caches static content from S3 and cannot cache dynamic content or API responses.
Correct
CloudFront can cache ANY HTTP response, including dynamic content and API responses, based on configurable cache behaviors (query strings, headers, cookies). CloudFront can sit in front of API Gateway, Application Load Balancers, EC2, and custom origins — not just S3. Cache behavior configuration determines what is cached and for how long.
Candidates often limit CloudFront mentally to 'S3 CDN.' In reality, CloudFront is a full reverse-proxy cache that can accelerate and cache any HTTP-based workload.
LAZY = Load As Zapped (on miss): data enters cache only when requested. WRITE-THROUGH = Write Twice, Read Once: every write goes to both DB and cache simultaneously.
DAX = Definitely Always eXcludes strongly consistent reads. If 'strongly consistent' appears in the question, DAX is NOT the caching answer.
Redis vs Memcached: Redis = Rich (persistence, replication, data structures, pub/sub). Memcached = Minimal (pure cache, multi-threaded, volatile, nothing fancy).
TTL Jitter = Jitter prevents stampede: randomize expiry times to prevent the thundering herd from hitting your database simultaneously.
CloudFront Invalidation Cost Rule: '1000 free, then fee per path' — use versioned filenames (cache-busting) to avoid invalidation costs entirely.
Assuming DAX caches all DynamoDB reads including strongly consistent reads — it does NOT. Strongly consistent reads bypass DAX entirely and go directly to DynamoDB, incurring full DynamoDB latency. This single misconception causes more DAX-related exam failures than any other.
CertAI Tutor · 2026-02-22