System Design Lab
Ad tracking architecture changes only when constraints force it.
Start with the simplest click/impression collector. Use the scenarios for a normal evolution path, then adjust the workload to see exactly when one host, one shared database, or one partition stops being a good answer.
Normal evolution scenarios
Click left to right for the intended demo path. Each card changes the workload inputs.
Recommended shape
Single host collector
Keep ingestion, validation, and storage together while the workload fits one machine.
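As a rough illustration of this starting shape, the sketch below keeps ingestion, validation, and storage in one process on top of SQLite. The table layout, field names (`event_id`, `ad_id`, `type`, `ts`), and the `collect` helper are assumptions made for this example, not part of the lab.

```python
import sqlite3

ALLOWED_TYPES = {"impression", "click"}  # assumed event types for the sketch


def open_store(path=":memory:"):
    """Open the local store; a file path would be used on a real host."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "event_id TEXT PRIMARY KEY, ad_id TEXT NOT NULL, "
        "type TEXT NOT NULL, ts INTEGER NOT NULL)"
    )
    return conn


def collect(conn, event):
    """Validate one event and store it. Returns True only if a new row was written."""
    if event.get("type") not in ALLOWED_TYPES:
        return False
    if not event.get("event_id") or not event.get("ad_id"):
        return False
    if not isinstance(event.get("ts"), int):
        return False
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO events VALUES (?, ?, ?, ?)",
            (event["event_id"], event["ad_id"], event["type"], event["ts"]),
        )
    return cur.rowcount == 1  # 0 means a duplicate event_id was ignored


conn = open_store()
results = [
    collect(conn, {"event_id": "e1", "ad_id": "a1", "type": "click", "ts": 1}),
    collect(conn, {"event_id": "e1", "ad_id": "a1", "type": "click", "ts": 1}),
    collect(conn, {"event_id": "e2", "ad_id": "a1", "type": "hover", "ts": 2}),
]
stored = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Everything shares one failure domain and one disk here, which is the point: the design stays this simple until a measured constraint says otherwise.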
Bottlenecks
Why this changes
Decision tradeoffs
Source-backed rules
These are the durable system-design claims behind the model. The exact slider thresholds are deliberately labeled as teaching assumptions.
Partitions scale throughput, but ordering is partition-local
Kafka topics are split into partitions across brokers. Consumers read events in order within a single partition, but there is no ordering guarantee across partitions of the same topic. This is why the lab treats the partition key as both a scaling tool and an ordering tradeoff.
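The ordering tradeoff can be sketched without a broker. The example below hashes a key to a partition and shows that events sharing a key stay in arrival order inside one partition. The partition count, the `campaign_id` key, and the CRC32 hash are assumptions chosen for illustration (Kafka's default partitioner uses murmur2 for keyed records).

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition with a stable hash (CRC32 stands in for murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions=NUM_PARTITIONS):
    """Append each event to the partition chosen by its campaign_id."""
    partitions = defaultdict(list)
    for seq, event in enumerate(events):
        p = partition_for(event["campaign_id"], num_partitions)
        # seq records global arrival order so it can be compared per partition
        partitions[p].append({**event, "seq": seq})
    return partitions


events = [
    {"campaign_id": "c-1", "type": "impression"},
    {"campaign_id": "c-2", "type": "impression"},
    {"campaign_id": "c-1", "type": "click"},
    {"campaign_id": "c-2", "type": "click"},
]
partitions = route(events)

# Each partition preserves arrival order for the keys it owns.
for p, evs in sorted(partitions.items()):
    print(p, [(e["campaign_id"], e["type"], e["seq"]) for e in evs])
```

Choosing `campaign_id` as the key keeps each campaign's impressions and clicks ordered relative to each other, at the cost that one very hot campaign cannot be spread over several partitions.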
Apache Kafka docs
Durable queues and streams decouple failure and back pressure
Queueing and streaming systems isolate producers from consumers and let components scale or fail independently. This supports the event-log step when direct writes start coupling ingestion to reports.
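A minimal sketch of the decoupling idea, using Python's in-process `queue.Queue` as a stand-in for a real queue or log. It is bounded but not durable, so it only illustrates back pressure, not failure isolation. The `ingest` and `drain` helpers and the buffer size are assumptions for this example.

```python
import queue


def ingest(buffer, event):
    """Try to enqueue; return False when the buffer is full so the caller can retry or shed load."""
    try:
        buffer.put_nowait(event)
        return True
    except queue.Full:
        return False


def drain(buffer, sink, max_items):
    """Consumer side: move up to max_items events into the sink at its own pace."""
    drained = 0
    while drained < max_items:
        try:
            sink.append(buffer.get_nowait())
        except queue.Empty:
            break
        drained += 1
    return drained


buffer = queue.Queue(maxsize=3)  # hypothetical bound
accepted = [ingest(buffer, {"id": i}) for i in range(5)]

report_store = []
drained = drain(buffer, report_store, max_items=10)
accepted_after_drain = ingest(buffer, {"id": 5})
```

The producer never calls the report store directly. It only learns that the buffer is full, which is the signal a collector would use to slow down or reject traffic instead of stalling on a slow downstream write.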
AWS Well-Architected
Realtime windows need event time, watermarks, and late-event policy
Flink uses watermarks to track event-time progress, and late or out-of-order events force latency/correctness tradeoffs. This backs the streaming and freshness parts of the lab.
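The tradeoff can be made concrete with a small simulation. This is not Flink's API. It is a simplified sketch of a bounded-out-of-orderness watermark over tumbling windows, where the window size, the allowed delay, and the "collect late events in a side list" policy are all assumptions for illustration.

```python
from collections import defaultdict

WINDOW_MS = 60_000           # hypothetical tumbling window size
MAX_OUT_OF_ORDER_MS = 5_000  # hypothetical allowed out-of-orderness


class TumblingCounter:
    def __init__(self):
        self.max_event_time = 0
        self.open = defaultdict(int)  # window start -> running count
        self.closed = {}              # window start -> final count
        self.late = []                # events that arrived after their window closed

    @property
    def watermark(self):
        # Event-time progress: we assume nothing older than this will still arrive.
        return self.max_event_time - MAX_OUT_OF_ORDER_MS

    def on_event(self, event_time_ms):
        start = event_time_ms - event_time_ms % WINDOW_MS
        if start + WINDOW_MS <= self.watermark:
            self.late.append(event_time_ms)  # late-event policy: set aside, do not count
            return
        self.open[start] += 1
        self.max_event_time = max(self.max_event_time, event_time_ms)
        # Finalize every window whose end the watermark has passed.
        for s in sorted(self.open):
            if s + WINDOW_MS <= self.watermark:
                self.closed[s] = self.open.pop(s)


counter = TumblingCounter()
for t in [1_000, 30_000, 59_000, 62_000, 20_000, 66_000, 10_000]:
    counter.on_event(t)
```

In this run the event at 20,000 ms arrives out of order but is still counted because its window is open, while the event at 10,000 ms arrives after the watermark has passed the window end and is set aside. A larger allowed delay would count more stragglers but publish each window later.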
Apache Flink docs
Realtime analytics is an event-stream serving problem
Realtime analytics systems derive insights from event streams soon after generation. This is why dashboard/risk/billing views should split away from the raw ingest path as read pressure grows.
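One way to picture the split is a raw append-only log plus a separately maintained rollup that dashboards query. The sketch below is an in-memory illustration only. The `ClickRollup` class, its per-minute granularity, and the field names are hypothetical and not taken from ClickHouse or the lab.

```python
from collections import Counter


class ClickRollup:
    """Raw log for ingest, pre-aggregated counts for reads."""

    def __init__(self):
        self.raw_log = []        # write path: append only
        self.offset = 0          # how far the rollup has consumed the log
        self.counts = Counter()  # read path: (campaign_id, minute) -> clicks

    def append(self, event):
        self.raw_log.append(event)

    def refresh(self):
        """Fold events added since the last refresh into the rollup."""
        new_events = self.raw_log[self.offset:]
        for e in new_events:
            self.counts[(e["campaign_id"], e["ts"] // 60)] += 1
        self.offset = len(self.raw_log)
        return len(new_events)

    def clicks_per_minute(self, campaign_id, minute):
        # Dashboards read the small rollup and never scan raw_log.
        return self.counts[(campaign_id, minute)]


rollup = ClickRollup()
for ts in (5, 42, 61):
    rollup.append({"campaign_id": "c-1", "ts": ts})
folded = rollup.refresh()
rollup.append({"campaign_id": "c-1", "ts": 70})  # not visible until the next refresh
```

The gap between `append` and `refresh` is the freshness lag a dashboard would see, so read pressure and ingest pressure can be tuned separately.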
ClickHouse docs
Teaching assumptions
- Capacity numbers in the sliders are teaching thresholds, not production benchmarks.
- Real capacity depends on payload size, batching, replication, acks, indexes, query shape, disk, network, and operational SLOs.
- The lab is strongest for interview reasoning: show the simplest design first, then name the exact constraint that forces the next component.
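The "name the constraint" habit can be sketched as a small function. The stage names and every threshold below are hypothetical placeholders in the same spirit as the sliders. They are not the lab's actual values and not benchmarks.

```python
# Hypothetical teaching thresholds (events per second), ordered simplest first.
STAGES = [
    ("single host collector", 5_000),
    ("collectors with a shared database", 20_000),
    ("event log with one partition", 50_000),
]
FALLBACK = "event log with multiple partitions"


def recommend(events_per_sec, stages=STAGES):
    """Return the simplest stage that fits, plus the constraints that ruled out simpler ones."""
    exceeded = []
    for name, limit in stages:
        if events_per_sec <= limit:
            return name, exceeded
        exceeded.append(f"{name}: {events_per_sec} > {limit} events/s")
    return FALLBACK, exceeded


shape, reasons = recommend(30_000)
print(shape)
for reason in reasons:
    print("  ruled out:", reason)
```

A real assessment would replace the single events-per-second figure with the factors listed above, such as payload size, replication, and query shape.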