System Design Lab

Rate limiter design is a latency-sensitive atomic state problem.

Adjust request volume, per-key quota, burst tolerance, key cardinality, hot-key skew, region count, and latency target. As those inputs grow, the recommended architecture shifts from a local counter to Redis with Lua scripts, sharded state, and local pre-checks, and it adds a quota service when global correctness matters.


Normal evolution scenarios

Click left to right for the intended demo path. Each card changes the workload inputs.

Workload

These are inputs, not preset architecture stages.

Request rate: Synchronous allow/deny checks on the enforcement path.
Quota per key: Allowed requests for one enforcement key in a one-minute window.
Burst allowance: How much short burst above the steady quota should be tolerated.
API servers: Independent servers making allow/deny decisions.
Active keys: Distinct users, IPs, API keys, devices, or advertiser accounts with limiter state.
Hottest key share: How much total traffic one abusive or popular key can own.
Regions: Regions that must participate in enforcement.
Decision latency target: Budget for the limiter check before the backend request continues.
Strict global quota: Every region should share one precise quota instead of approximate regional budgets.
Fail closed on store errors: Deny requests when limiter state is unavailable; safer for abuse, riskier for availability.
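To make the relationship between "Quota per key" and "Burst allowance" concrete, here is a minimal token-bucket sketch. It is an illustration only, not the lab's own model: the class name `TokenBucket`, the choice to treat the burst allowance as the bucket capacity, and the injected `now` timestamps are all assumptions made for this example.

```python
class TokenBucket:
    """Illustrative token bucket for one enforcement key (hypothetical helper)."""

    def __init__(self, quota_per_minute, burst, now=0.0):
        # Steady refill rate derived from the one-minute quota.
        self.rate = quota_per_minute / 60.0
        # Assumption: the burst allowance is modeled as the bucket capacity.
        self.capacity = float(burst)
        self.tokens = self.capacity
        self.last = now

    def allow(self, now):
        # Refill according to elapsed time, capped at the capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        # Spend one token per allowed request.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(quota_per_minute=60, burst=5)
burst_results = [bucket.allow(now=0.0) for _ in range(6)]  # 5 allowed, then denied
after_one_second = bucket.allow(now=1.0)                   # one token has refilled
```

With a 60-per-minute quota the bucket refills one token per second, so a key can spend its burst of 5 immediately and is then held to the steady rate.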

Recommended shape

Current architecture path

Clients

Client: sends traffic that must receive a synchronous allow or deny

Edge / API

API gateway: runs enforcement before the backend call

Local pre-check: fast in-process check for low-risk or cached state

Limiter state

Redis Lua: atomic check-and-update for distributed servers

Shard router: spreads key state and isolates hot keys

Coordination

Quota service: coordinates strict global or regional budgets

Service + analytics

Backend: receives only allowed traffic

Events: records decisions for abuse analysis and tuning

Bottlenecks

Atomic path load

Hot-key pressure

State memory

Cross-region correctness

Latency budget

Why this changes

Decision tradeoffs

Limiter algorithm

Local memory

Redis + Lua

State sharding

Global quota

Fail mode

Source-backed rules

These are the durable system-design claims behind the model. The exact slider thresholds are deliberately labeled as teaching assumptions.

Verified rule

Atomic increment plus expiry is the simple rate-limiter baseline

Redis documents counter-based rate limiter patterns using INCR and key expiry, which matches the single-window baseline.
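The single-window baseline can be sketched without a Redis server. The following is an in-memory stand-in for the increment-plus-expiry pattern, not Redis client code: the class name `FixedWindowLimiter`, the dictionary store, and the injected `now` timestamps are assumptions made for illustration.

```python
class FixedWindowLimiter:
    """In-memory analogue of a counter with expiry (hypothetical helper)."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # key -> (count, expires_at)

    def allow(self, key, now):
        count, expires_at = self.counters.get(key, (0, 0.0))
        if now >= expires_at:
            # The counter has expired (or never existed): start a new window,
            # which mirrors setting the expiry on the first increment.
            count, expires_at = 0, now + self.window
        count += 1  # the increment step
        self.counters[key] = (count, expires_at)
        return count <= self.limit


limiter = FixedWindowLimiter(limit=3, window_seconds=60)
first_window = [limiter.allow("user-1", now=t) for t in (0, 1, 2, 3)]
next_window = limiter.allow("user-1", now=60)
```

Here the fourth request in the window is denied, and the key is allowed again once the window that started at the first request has expired.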

Redis Docs

Verified rule

Lua scripts make check-and-update atomic on one Redis shard

A limiter should not perform read, compute, and write as separate network operations when many API servers are racing.
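The race can be shown deterministically with a small simulation. This is a sketch under stated assumptions, not Redis or Lua code: a plain dictionary stands in for the shared store, and the function names `racy_two_servers` and `atomic_check_and_update` are hypothetical. The second function represents what a single script invocation would do as one indivisible step.

```python
LIMIT = 10  # illustrative per-window limit


def racy_two_servers(store, key):
    """Two servers interleave read, compute, and write as separate steps."""
    seen_a = store.get(key, 0)  # server A reads
    seen_b = store.get(key, 0)  # server B reads the same value before A writes
    allow_a = seen_a < LIMIT
    allow_b = seen_b < LIMIT
    if allow_a:
        store[key] = seen_a + 1
    if allow_b:
        store[key] = seen_b + 1  # overwrites A's update with the same number
    return allow_a, allow_b


def atomic_check_and_update(store, key):
    """Check and increment as one uninterrupted unit."""
    count = store.get(key, 0)
    if count >= LIMIT:
        return False
    store[key] = count + 1
    return True


racy_store = {"k": 9}  # one slot left in the window
racy_result = racy_two_servers(racy_store, "k")

atomic_store = {"k": 9}
atomic_result = (atomic_check_and_update(atomic_store, "k"),
                 atomic_check_and_update(atomic_store, "k"))
```

In the racy interleaving both servers admit a request even though only one slot remained, and the stored count records only one of them. The atomic version admits exactly one.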

Redis Docs

Verified rule

Production rate limiting is usually enforced before the origin

Edge enforcement protects the backend by deciding whether a request may continue before origin resources are spent.

Cloudflare Docs

Verified rule

Distributed rate limiting has an explicit local versus global tradeoff

Global rate limiting centralizes decisions, while local checks are faster but less exact across many instances or regions.
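A little arithmetic illustrates why local checks are less exact. The sketch below assumes one simple policy, splitting a global quota evenly into per-instance budgets; the function names and the traffic numbers are made up for the example and are not taken from the Envoy documentation.

```python
def local_budgets(global_quota, instances):
    """Split a global quota evenly into per-instance local budgets."""
    base, extra = divmod(global_quota, instances)
    return [base + (1 if i < extra else 0) for i in range(instances)]


def admitted_locally(budgets, traffic):
    """Each instance admits up to its own budget, unaware of the others."""
    return sum(min(b, t) for b, t in zip(budgets, traffic))


def admitted_globally(global_quota, traffic):
    """A central decision point sees total traffic against one quota."""
    return min(global_quota, sum(traffic))


budgets = local_budgets(global_quota=100, instances=4)
skewed_traffic = [70, 10, 10, 10]  # hypothetical per-instance request counts
local_total = admitted_locally(budgets, skewed_traffic)
global_total = admitted_globally(100, skewed_traffic)
```

With skewed traffic, the even split admits only 55 of the 100 requests the quota would permit, because the busy instance exhausts its 25 while the others leave budget unused. A central check admits all 100 at the cost of a remote call on every decision.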

Envoy Docs

Teaching assumptions

The lab models only the synchronous enforcement path; Kafka-style event streams sit off that path and serve analytics, abuse investigation, and tuning.
Hot-key thresholds are intentionally conservative because one abusive key can dominate a shard even when total QPS looks safe.
Strict global quotas across regions are modeled as a correctness choice that spends latency and availability budget.
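The hot-key assumption above can be checked with rough arithmetic. This sketch assumes the hottest key's traffic lands entirely on one shard and all remaining traffic spreads evenly across every shard; the function name `shard_loads` and the example numbers are hypothetical, not the lab's slider thresholds.

```python
def shard_loads(total_qps, shards, hottest_key_percent):
    """Return (load on the hot key's shard, load on every other shard)."""
    hot_qps = total_qps * hottest_key_percent / 100
    # Assumption: non-hot traffic hashes evenly across all shards.
    even_qps = (total_qps - hot_qps) / shards
    return hot_qps + even_qps, even_qps


hot_shard, other_shard = shard_loads(total_qps=100_000, shards=10,
                                     hottest_key_percent=30)
skew_ratio = hot_shard / other_shard
```

In this example the average load per shard is 10,000 QPS, which looks safe, yet the shard that owns the hot key carries 37,000 QPS while the others carry 7,000 each.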