System Design Lab

Rate limiter design is a latency-sensitive atomic state problem.

Adjust the workload inputs: request volume, quota, burst tolerance, key cardinality, hot-key skew, number of regions, and latency target. As the workload grows, the recommended architecture shifts from a local counter to Redis/Lua, sharded state, and local pre-checks, and adds a quota service when global correctness matters.

Normal evolution scenarios

Click left to right for the intended demo path. Each card changes the workload inputs.

Workload

These are inputs, not preset architecture stages.

Recommended shape

Current architecture path
[Figure: Rate limiter architecture diagram. Whiteboard-style architecture diagram for synchronous rate-limit enforcement, local pre-checks, Redis Lua state, sharding, quota service, backend forwarding, and analytics events.]
Clients
Client: sends traffic that must receive a synchronous allow or deny
Edge / API
API gateway: runs enforcement before the backend call
Local pre-check: fast in-process check for low-risk or cached state
Limiter state
Redis Lua: atomic check-and-update for distributed servers
Shard router: spreads key state and isolates hot keys
Coordination
Quota service: coordinates strict global or regional budgets
Service + analytics
Backend: receives only allowed traffic
Events: records decisions for abuse analysis and tuning
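
The shard router's job (key -> state shard, with hot keys isolated) can be sketched roughly as follows. This is a minimal illustration, not the lab's implementation: the function names, the SHA-256 hash choice, and the `hot_keys` override table are all assumptions made for the example.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a limiter key to a shard index using a stable hash.

    A stable hash (rather than Python's per-process hash()) keeps the
    same key on the same shard across API servers and restarts.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(key: str, num_shards: int, hot_keys: dict[str, int]) -> int:
    """Route a key, honoring a hypothetical hot-key override table.

    hot_keys maps a known hot key to a dedicated shard index so that
    one abusive key does not crowd out its hash neighbors.
    """
    if key in hot_keys:
        return hot_keys[key]
    return shard_for(key, num_shards)
```

For example, `route("tenant:42", 8, {})` always lands on the same one of eight shards, while `route("abuser", 8, {"abuser": 8})` sends the pinned key to an extra ninth shard reserved for it.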

Bottlenecks

Atomic path load

Hot-key pressure

State memory

Cross-region correctness

Latency budget

Why this changes

    Decision tradeoffs

    Limiter algorithm

    Local memory

    Redis + Lua

    State sharding

    Global quota

    Fail mode

    Source-backed rules

    These are the durable system-design claims behind the model. The exact slider thresholds are deliberately labeled as teaching assumptions.

    Verified rule

    Atomic increment plus expiry is the simple rate-limiter baseline

    Redis documents counter-based rate limiter patterns using INCR and key expiry, which matches the single-window baseline.

    Redis Docs
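
The increment-plus-expiry baseline can be illustrated without a Redis server. The sketch below is an in-memory stand-in that mirrors the semantics of incrementing a counter and letting the key expire at the end of the window; the class name, the `clock` parameter, and the dictionary layout are assumptions for illustration, not Redis APIs.

```python
import time


class FixedWindowLimiter:
    """Single-window counter: increment on each request, expire the key
    when the window ends. Single-process only; no locking."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock            # injectable for testing
        self._counters = {}           # key -> (count, expires_at)

    def allow(self, key):
        now = self.clock()
        count, expires_at = self._counters.get(key, (0, 0.0))
        if now >= expires_at:
            # Key is missing or expired: start a fresh window.
            count, expires_at = 0, now + self.window
        count += 1                    # the "INCR" step
        self._counters[key] = (count, expires_at)
        return count <= self.limit
```

With `limit=2` and a 10-second window, the first two calls for a key are allowed, the third is denied, and the key is allowed again once the window has elapsed.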
    Verified rule

    Lua scripts make check-and-update atomic on one Redis shard

    A limiter should not perform read, compute, and write as separate network operations when many API servers are racing.

    Redis Docs
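
To show why the read, compute, and write steps must run as one unit, here is a small Python sketch in which a lock stands in for the atomic execution of a script on one shard. The token-bucket algorithm, the class name, and the explicit `now` argument are assumptions chosen for the example; the lab does not prescribe them.

```python
import threading


class AtomicTokenBucket:
    """Token bucket whose check-and-update runs under one lock,
    standing in for an atomic script on a single shard."""

    def __init__(self, capacity, refill_per_second):
        self.capacity = capacity
        self.refill = refill_per_second
        self._lock = threading.Lock()
        self._state = {}              # key -> (tokens, last_seen)

    def allow(self, key, now):
        with self._lock:
            # Read: load current tokens, defaulting to a full bucket.
            tokens, last = self._state.get(key, (float(self.capacity), now))
            # Compute: refill for elapsed time, capped at capacity.
            tokens = min(self.capacity, tokens + (now - last) * self.refill)
            allowed = tokens >= 1.0
            if allowed:
                tokens -= 1.0
            # Write: store the new state before any other caller can read.
            self._state[key] = (tokens, now)
            return allowed
```

If 50 threads race on a bucket with capacity 10 and no refill, exactly 10 are admitted. Splitting the same three steps into separate unguarded operations would let several callers read the same token count and all be admitted.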
    Verified rule

    Production rate limiting is usually enforced before the origin

    Edge enforcement protects the backend by deciding whether a request may continue before origin resources are spent.

    Cloudflare Docs
    Verified rule

    Distributed rate limiting has an explicit local versus global tradeoff

    Global rate limiting centralizes decisions, while local checks are faster but less exact across many instances or regions.

    Envoy Docs
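
The inexactness of local checks can be made concrete with a little arithmetic. The helpers below are hypothetical and only illustrate two uncoordinated strategies: every instance enforcing the full limit on its own, or the limit being split evenly across instances.

```python
def local_only_worst_case(limit: int, instances: int) -> int:
    """Worst-case requests admitted per window when every instance
    enforces the full limit independently, with no shared state."""
    return limit * instances


def even_split_share(limit: int, instances: int) -> int:
    """Per-instance allowance when the limit is divided evenly.

    The global total stays at or under the limit, but a client whose
    traffic lands on one instance is capped at this smaller share.
    """
    return limit // instances
```

With a limit of 100 and 8 instances, independent enforcement can admit up to 800 requests per window, while an even split caps a client pinned to one instance at 12. A global limiter avoids both errors at the cost of a network hop on the request path.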

    Teaching assumptions

    • The lab models only the synchronous enforcement path; Kafka-style event streams sit outside that path and serve analytics, abuse investigation, and tuning.
    • Hot-key thresholds are intentionally conservative because one abusive key can dominate a shard even when total QPS looks safe.
    • Strict global quotas across regions are modeled as a correctness choice that spends latency and availability budget.