System Design Lab
Rate limiter design is a latency-sensitive atomic state problem.
Adjust the request volume, quota, burst tolerance, key cardinality, hot-key skew, number of regions, and latency target. As these inputs change, the recommended architecture shifts from a local counter to Redis/Lua, sharded state, local pre-checks, and a quota service when global correctness matters.
Normal evolution scenarios
Each scenario card changes the workload inputs. For the selected inputs, the lab reports a recommended shape, the bottlenecks, why the design changes, and the decision tradeoffs.
Source-backed rules
These are the durable system-design claims behind the model. The exact slider thresholds are deliberately labeled as teaching assumptions.
Atomic increment plus expiry is the simple rate-limiter baseline
Redis documents counter-based rate limiter patterns using INCR and key expiry, which matches the single-window baseline.
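A minimal sketch of the single-window baseline, assuming an in-memory dictionary as a stand-in for Redis: the `count += 1` line plays the role of INCR, and the stored `expires_at` plays the role of key expiry. The class name `FixedWindowLimiter`, its parameters, and the injectable `clock` are hypothetical and chosen only for illustration.

```python
import time


class FixedWindowLimiter:
    """In-memory stand-in for the INCR + key-expiry pattern (not a Redis client)."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the window can be tested without sleeping
        self._counters = {}  # key -> (count, expires_at)

    def allow(self, key):
        now = self.clock()
        entry = self._counters.get(key)
        if entry is None or now >= entry[1]:
            # Key is missing or has expired: this request opens a fresh window.
            count, expires_at = 0, now + self.window
        else:
            count, expires_at = entry
        count += 1  # INCR analogue: denied requests are counted too
        self._counters[key] = (count, expires_at)
        return count <= self.limit
```

With `limit=2` and a 60-second window, the third call inside the window is denied, and the first call after the window expires is allowed again. This in-process version avoids races only because it runs in one thread; it does not show how many API servers share the counter.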
Source: Redis Docs
Lua scripts make check-and-update atomic on one Redis shard
A limiter should not perform read, compute, and write as separate network operations when many API servers are racing.
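The race can be illustrated with a small simulation. This is a sketch under stated assumptions: `Shard` is a hypothetical in-memory stand-in for one Redis shard, `racy_interleaving` forces the worst-case ordering where every server reads before any server writes, and `check_and_incr` models a script that the shard executes as one indivisible step.

```python
import threading


class Shard:
    """Hypothetical in-memory stand-in for one Redis shard."""

    def __init__(self):
        self.counts = {}
        self._lock = threading.Lock()

    def get(self, key):
        return self.counts.get(key, 0)

    def set(self, key, value):
        self.counts[key] = value

    def check_and_incr(self, key, limit):
        # Models a server-side script: check and update happen as one unit.
        with self._lock:
            current = self.counts.get(key, 0)
            if current >= limit:
                return False
            self.counts[key] = current + 1
            return True


def racy_interleaving(shard, key, limit, servers):
    """Separate read, compute, write steps, with all reads landing first."""
    seen = [shard.get(key) for _ in range(servers)]  # every server sees the same stale count
    allowed = 0
    for count in seen:
        if count < limit:
            shard.set(key, count + 1)  # later writes overwrite earlier ones
            allowed += 1
    return allowed


def atomic_calls(shard, key, limit, servers):
    """Each server makes a single atomic check-and-update call."""
    return sum(shard.check_and_incr(key, limit) for _ in range(servers))
```

With a limit of 1 and three servers, the racy path admits all three requests while the stored count ends at 1; the atomic path admits exactly one.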
Source: Redis Docs
Production rate limiting is usually enforced before the origin
Edge enforcement protects the backend by deciding whether a request may continue before origin resources are spent.
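As a rough sketch of that ordering, the wrapper below consults a limiter decision first and only calls the origin when the request is allowed. The names `make_edge_handler` and `make_budget_check` are hypothetical, and the budget check is a deliberately simple placeholder for a real limiter.

```python
from collections import Counter


def make_budget_check(limit):
    """Hypothetical limiter decision: allow the first `limit` requests per key."""
    seen = Counter()

    def allow(key):
        seen[key] += 1
        return seen[key] <= limit

    return allow


def make_edge_handler(allow, origin):
    """Wrap `origin` so the limiter decision happens before any origin work."""

    def handler(key, request):
        if not allow(key):
            # Rejected at the edge: the origin function is never invoked.
            return 429, "rate limit exceeded"
        return 200, origin(request)

    return handler
```

With a budget of 2, five requests from the same key produce two origin calls and three 429 responses, so the rejected traffic costs the origin nothing.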
Source: Cloudflare Docs
Distributed rate limiting has an explicit local versus global tradeoff
Global rate limiting centralizes decisions, while local checks are faster but less exact across many instances or regions.
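The inexactness of local checks can be shown with simple arithmetic. The functions below are an illustrative sketch, not a model of any specific proxy: each takes the number of requests arriving at each instance in one window and returns how many would be admitted under a hypothetical policy.

```python
def admitted_global(requests_per_instance, limit):
    """All instances consult one shared counter, so the limit holds exactly."""
    return min(sum(requests_per_instance), limit)


def admitted_local_full(requests_per_instance, limit):
    """Each instance enforces the full limit against its own counter only."""
    return sum(min(n, limit) for n in requests_per_instance)


def admitted_local_split(requests_per_instance, limit):
    """Each instance enforces an equal static share of the limit."""
    share = limit // len(requests_per_instance)
    return sum(min(n, share) for n in requests_per_instance)


even_traffic = [80, 80, 80, 80]    # load spread evenly over four instances
skewed_traffic = [100, 0, 0, 0]    # all load lands on one instance
```

With a limit of 100, even traffic is admitted 100 times globally but 320 times when every instance applies the full limit locally. Splitting the limit into static shares of 25 fixes the even case, yet admits only 25 of the 100 skewed requests that the global counter would have allowed. The global counter avoids both errors, at the cost of a shared decision point on every request.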
Source: Envoy Docs
Teaching assumptions
- The lab models only the synchronous enforcement path; Kafka-style event streams sit outside that path and serve analytics, abuse investigation, and tuning.
- Hot-key thresholds are intentionally conservative because one abusive key can dominate a shard even when total QPS looks safe.
- Strict global quotas across regions are modeled as a correctness choice that spends latency and availability budget.
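The hot-key assumption above can be made concrete with a small sketch. The traffic numbers, key names, and the CRC32-modulo placement are all illustrative assumptions, not the lab's actual thresholds or any particular cluster's sharding scheme.

```python
import zlib
from collections import Counter


def shard_for(key, shard_count):
    """Deterministic hash placement (CRC32 modulo shard count), for illustration."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


def shard_load(traffic, shard_count):
    """Sum the per-key request rate landing on each shard."""
    load = Counter()
    for key, qps in traffic.items():
        load[shard_for(key, shard_count)] += qps
    return load


# Hypothetical workload: 1000 quiet keys at 1 QPS each, plus one abusive key.
traffic = {f"tenant:{i}": 1 for i in range(1000)}
traffic["tenant:hot"] = 2000

load = shard_load(traffic, shard_count=4)
hot_shard = shard_for("tenant:hot", 4)
```

Total traffic here is 3000 QPS, an average of 750 per shard across four shards, yet the shard that owns `tenant:hot` carries at least 2000 QPS because a single key cannot be split by key-based hashing.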