Introduction to System Design
System design is about making decisions that let software work reliably at scale. When your app has 100 users, almost any architecture works. When it has 10 million, the decisions you made early either save you or haunt you.
This tutorial introduces the core concepts, vocabulary, and thinking patterns you need before diving into specific components like load balancers, caches, and message queues.
What System Design Actually Means
System design is answering questions like:
- How do we handle 10,000 requests per second?
- What happens when a server crashes at 3 AM?
- How do we add features without breaking existing ones?
- Where does the data live, and how do we keep it consistent?
- How do we deploy changes without downtime?
It’s not about picking the “best” technology — it’s about understanding trade-offs and choosing what’s appropriate for your constraints.
The Core Trade-Offs
Every design decision involves trade-offs. The most fundamental ones:
Consistency vs Availability (CAP Theorem)
In a distributed system, when a network partition occurs, you must choose:
- Consistency — Every read returns the most recent write (or an error)
- Availability — Every request gets a response (but it might be stale)
A banking system chooses consistency (you can’t show a wrong balance). A social media feed chooses availability (showing a slightly stale feed is better than showing nothing).
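The choice can be illustrated with a toy replica that loses contact with its primary. This is only a sketch: the `Replica` class, its `prefer` setting, and `PartitionError` are hypothetical names invented for this example, and a real database makes the decision through its replication and quorum settings rather than a flag.

```python
class PartitionError(Exception):
    """Raised when a consistency-first replica cannot confirm it is up to date."""


class Replica:
    def __init__(self, prefer):
        if prefer not in ("consistency", "availability"):
            raise ValueError("prefer must be 'consistency' or 'availability'")
        self.prefer = prefer
        self.value = None
        self.partitioned = False

    def replicate(self, value):
        # Writes from the primary never arrive while the network is partitioned.
        if not self.partitioned:
            self.value = value

    def read(self):
        if self.partitioned and self.prefer == "consistency":
            # Consistency first: refuse to answer rather than risk a stale value.
            raise PartitionError("cannot confirm the latest write")
        # Availability first: always answer, even if the value may be stale.
        return self.value


bank = Replica(prefer="consistency")
feed = Replica(prefer="availability")
for replica in (bank, feed):
    replica.replicate("version 1")
    replica.partitioned = True
    replica.replicate("version 2")  # lost: the replica is cut off

print(feed.read())  # still answers, with the value from before the partition
try:
    bank.read()
except PartitionError as error:
    print("bank replica refused:", error)
```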
Latency vs Throughput
- Latency — How long a single request takes
- Throughput — How many requests you can handle per second
Optimizing for one often hurts the other. Batching requests improves throughput but increases latency for individual items.
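A rough cost model makes the batching trade-off concrete. The sketch below assumes each call pays a fixed overhead plus a per-item cost, and that an item is done only when its whole batch is done; the function name and the 10 ms and 1 ms figures are made up for illustration.

```python
def batch_stats(batch_size, overhead_ms=10.0, per_item_ms=1.0):
    """Return (throughput in items/sec, latency in ms) for one batch.

    Assumed model: fixed overhead per call plus a cost per item. The time an
    item spends waiting for its batch to fill up is ignored here, so real
    per-item latency would be higher still.
    """
    batch_time_ms = overhead_ms + per_item_ms * batch_size
    throughput = batch_size / batch_time_ms * 1000
    return throughput, batch_time_ms


for size in (1, 10, 100):
    throughput, latency = batch_stats(size)
    print(f"batch={size:>3}  throughput={throughput:7.1f} items/sec  latency={latency:6.1f} ms")
```

Under these assumptions, larger batches spread the fixed overhead across more items, so throughput rises, while every item waits for the whole batch, so latency rises too.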
Simplicity vs Scalability
A monolithic application is simpler to develop, deploy, and debug. Microservices scale independently but add complexity in networking, deployment, and data consistency.
Start simple. Add complexity only when you have a specific scaling problem.
Key Metrics to Understand
Latency numbers every developer should know
| Operation | Time |
|---|---|
| L1 cache reference | 1 ns |
| RAM reference | 100 ns |
| SSD random read | 16 μs |
| Network round trip (same datacenter) | 500 μs |
| HDD seek | 4 ms |
| Network round trip (cross-continent) | 150 ms |
The takeaway: memory is orders of magnitude faster than disk or network, and a cross-continent round trip costs more than a million RAM references. Design accordingly.
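To get a feel for the gaps, it can help to express each row in terms of RAM references. The snippet below copies the table's values into nanoseconds; the dictionary and helper names are illustrative only.

```python
# Values copied from the table above, converted to nanoseconds.
LATENCY_NS = {
    "L1 cache reference": 1,
    "RAM reference": 100,
    "SSD random read": 16_000,
    "Network round trip (same datacenter)": 500_000,
    "HDD seek": 4_000_000,
    "Network round trip (cross-continent)": 150_000_000,
}


def ram_refs_per(operation):
    """How many RAM references fit in the time one `operation` takes."""
    return LATENCY_NS[operation] // LATENCY_NS["RAM reference"]


for name in LATENCY_NS:
    print(f"{name:<40} = {ram_refs_per(name):>9,} RAM references")
```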
Back-of-envelope estimation
Before designing, estimate the scale:
- Users: 10M daily active users
- Requests: 10 actions/user/day = 100M requests/day ≈ 1,160 req/sec average
- Peak: 3x average ≈ 3,500 req/sec
- Storage: 100M records/day × 1 KB = 100 GB/day (assuming each request stores one 1 KB record)
- Bandwidth: 3,500 req/sec × 1 KB = 3.5 MB/sec at peak
These rough numbers tell you whether you need one server or a hundred.
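The same arithmetic is easy to keep in a small script so the inputs can be changed later. This is a sketch: the function and parameter names are illustrative, 1 KB is treated as 1,000 bytes, and storage assumes every request writes one record.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate(daily_active_users, actions_per_user_per_day, peak_factor, record_bytes):
    """Back-of-envelope numbers for request rate, storage, and bandwidth."""
    requests_per_day = daily_active_users * actions_per_user_per_day
    avg_rps = requests_per_day / SECONDS_PER_DAY
    peak_rps = avg_rps * peak_factor
    return {
        "requests_per_day": requests_per_day,
        "avg_rps": avg_rps,
        "peak_rps": peak_rps,
        # Assumes one stored record per request.
        "storage_gb_per_day": requests_per_day * record_bytes / 1e9,
        "peak_bandwidth_mb_per_sec": peak_rps * record_bytes / 1e6,
    }


numbers = estimate(
    daily_active_users=10_000_000,
    actions_per_user_per_day=10,
    peak_factor=3,
    record_bytes=1_000,
)
for name, value in numbers.items():
    print(f"{name}: {value:,.1f}")
```

The exact results differ slightly from the rounded figures in the list, which is fine: the goal is the order of magnitude, not precision.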
Building Blocks
Every large system is composed of a small set of building blocks:
Clients and Servers
The basic model: clients send requests, servers process them and return responses. A “server” might itself be a client to another service.
Load Balancers
Distribute incoming traffic across multiple servers. If one server dies, traffic routes to the others. Algorithms include round-robin, least connections, and consistent hashing.
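Two of these algorithms fit in a few lines each. The classes below are a minimal in-memory sketch with hypothetical names; a production load balancer also tracks health, weights, and timeouts, and none of that is modeled here.

```python
import itertools


class RoundRobinBalancer:
    """Hand out servers in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Send each new request to the server with the fewest active requests."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when the request finishes so the count stays accurate.
        self.active[server] -= 1


round_robin = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([round_robin.pick() for _ in range(4)])
```

Round-robin ignores how busy each server is, which works when requests cost about the same. Least connections adapts when some requests run much longer than others.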
Application Servers
Run your business logic. Stateless servers (no local state) are easier to scale — just add more behind the load balancer.
Databases
Persistent storage. Relational (PostgreSQL, MySQL) for structured data with relationships. NoSQL (MongoDB, DynamoDB, Redis) for specific access patterns such as key-value lookups or document storage.
Caches
Store frequently accessed data in memory for fast retrieval. Redis and Memcached are common choices. Caching reduces database load and improves latency dramatically.
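One common way to use a cache is the cache-aside pattern: check the cache first, and on a miss load from the database and store the result. The sketch below uses a plain dictionary in place of Redis or Memcached; the class name, the TTL default, and the `load_from_db` callback are assumptions made for the example.

```python
import time


class CacheAside:
    """Read-through helper: serve from memory, fall back to a loader on a miss."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]  # cache hit: no database call
        value = self._load(key)  # cache miss or expired entry
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key):
        # Call after writing to the database so readers do not see old data.
        self._entries.pop(key, None)


db_calls = []


def load_user(user_id):
    db_calls.append(user_id)  # stands in for a slow database query
    return {"id": user_id, "name": f"user-{user_id}"}


users = CacheAside(load_user)
users.get(7)
users.get(7)
print("database calls:", len(db_calls))
```

The TTL bounds how stale an entry can get, and `invalidate` covers the case where the application knows the underlying row has changed.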
Message Queues
Decouple producers from consumers. Instead of processing a task immediately, put it on a queue and let a worker handle it asynchronously. Kafka, RabbitMQ, and SQS are popular options.
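The producer and consumer roles can be sketched in a single process with Python's standard `queue` and `threading` modules. This stands in for a real broker such as Kafka, RabbitMQ, or SQS only conceptually; the email task and the `None` shutdown sentinel are choices made for the example.

```python
import queue
import threading


def run_worker(tasks, results):
    """Consume tasks until the None sentinel arrives."""
    while True:
        task = tasks.get()
        if task is None:
            tasks.task_done()
            break
        # Stands in for slow work such as sending an email.
        results.append(f"sent email to {task}")
        tasks.task_done()


tasks = queue.Queue()
results = []
worker = threading.Thread(target=run_worker, args=(tasks, results))
worker.start()

# The producer only enqueues and moves on; it never waits for the email itself.
for address in ("a@example.com", "b@example.com"):
    tasks.put(address)

tasks.put(None)  # tell the worker to stop
worker.join()
print(results)
```

Because the producer returns as soon as the task is queued, a slow or temporarily unavailable worker delays the task rather than the user-facing request.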
CDNs (Content Delivery Networks)
Serve static content (images, CSS, JS) from servers geographically close to users. Reduces latency and offloads traffic from your origin servers.
Scaling Patterns
Vertical Scaling (Scale Up)
Get a bigger machine: more CPU, more RAM, more disk. Simple but has limits — you can’t buy an infinitely large server.
Horizontal Scaling (Scale Out)
Add more machines. More complex (you need load balancing, data partitioning), but capacity is no longer capped by the largest machine you can buy.
Database Scaling
- Read replicas — Copy data to read-only databases; route reads there, writes to the primary
- Sharding — Split data across multiple databases by some key (user ID, geography)
- Caching — Put a cache (Redis) in front of the database for hot data
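The first two techniques come down to routing decisions, which the sketch below illustrates. The names are hypothetical, and the modulo-based shard choice is only the simplest option: changing the shard count remaps most keys, which is one reason schemes like consistent hashing exist.

```python
import hashlib
import itertools


def shard_for(key, shard_count):
    """Map a key to a shard index with a stable hash (same key, same shard)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ReplicatedDatabase:
    """Send writes to the primary and spread reads across the replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, is_write):
        return self.primary if is_write else next(self._replicas)


db = ReplicatedDatabase("primary", ["replica-1", "replica-2"])
print(db.route(is_write=True))
print(db.route(is_write=False))
print("user 42 lives on shard", shard_for(42, shard_count=4))
```

A stable hash such as SHA-256 is used instead of Python's built-in `hash()` because the built-in is randomized per process for strings, so different servers would disagree about where a key lives.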
Stateless Services
If your application servers store no local state (sessions, caches), any server can handle any request. This makes horizontal scaling trivial — just add servers behind the load balancer.
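The idea can be shown with two server objects that keep session data in a shared store instead of on themselves. In this sketch a plain dictionary stands in for an external store such as Redis, and the `AppServer` class and cart example are invented for illustration.

```python
class AppServer:
    """Keeps no per-user data of its own; everything lives in the shared store."""

    def __init__(self, name, session_store):
        self.name = name
        self._sessions = session_store

    def add_to_cart(self, session_id, item):
        cart = self._sessions.setdefault(session_id, [])
        cart.append(item)
        return {"served_by": self.name, "cart": list(cart)}


shared_sessions = {}  # stands in for an external session store
server_a = AppServer("app-1", shared_sessions)
server_b = AppServer("app-2", shared_sessions)

# Consecutive requests from one user can land on different servers.
server_a.add_to_cart("session-123", "book")
response = server_b.add_to_cart("session-123", "pen")
print(response)
```

If each server kept its own cart dictionary, the second request would see an empty cart, and the load balancer would have to pin every user to one server.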
Reliability Patterns
Redundancy
No single point of failure. Multiple servers, multiple databases, multiple availability zones. If one fails, others take over.
Health Checks
Load balancers periodically check if servers are healthy. Unhealthy servers are removed from rotation automatically.
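A minimal version of that bookkeeping might look like the following. The `probe` callback, the failure threshold, and the class name are assumptions for the sketch; real load balancers typically probe an HTTP endpoint or TCP port on a timer.

```python
class HealthCheckedPool:
    """Track consecutive probe failures and drop servers that exceed a threshold."""

    def __init__(self, servers, probe, failure_threshold=3):
        self._probe = probe  # callable(server) -> True if healthy
        self._threshold = failure_threshold
        self._failures = {server: 0 for server in servers}

    def run_checks(self):
        """Probe every server once; intended to be called on a schedule."""
        for server in self._failures:
            if self._probe(server):
                self._failures[server] = 0  # a success resets the count
            else:
                self._failures[server] += 1

    def in_rotation(self):
        return [s for s, count in self._failures.items() if count < self._threshold]


status = {"app-1": True, "app-2": False}
pool = HealthCheckedPool(["app-1", "app-2"], probe=lambda s: status[s], failure_threshold=2)
pool.run_checks()
pool.run_checks()
print(pool.in_rotation())
```

Requiring several consecutive failures avoids removing a server because of one dropped probe, and resetting on success lets a recovered server rejoin automatically.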
Graceful Degradation
When a component fails, the system continues with reduced functionality rather than crashing entirely. If the recommendation service is down, show popular items instead of personalized ones.
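In code, the recommendation example is often just a fallback around the risky call. The function and the `POPULAR_ITEMS` list below are hypothetical placeholders, and catching every exception is a simplification; real code would usually catch only the errors that signal an outage.

```python
POPULAR_ITEMS = ["item-1", "item-2", "item-3"]  # hypothetical precomputed fallback


def recommendations_for(user_id, fetch_personalized):
    """Return personalized items, or popular items if the service is unavailable."""
    try:
        return {"items": fetch_personalized(user_id), "personalized": True}
    except Exception:
        # Degrade instead of failing the whole page.
        return {"items": list(POPULAR_ITEMS), "personalized": False}


def broken_service(user_id):
    raise ConnectionError("recommendation service is down")


print(recommendations_for(42, broken_service))
```

Returning a `personalized` flag lets the caller log or display the degraded state rather than hiding it.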
Circuit Breakers
If a downstream service is failing, stop sending it requests (open the circuit). Periodically try again. This prevents cascading failures.
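The behavior described above can be sketched as a small wrapper around outgoing calls. The class name, thresholds, and `CircuitOpenError` are assumptions for this example, and the sketch is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to send a request."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("circuit open; request not sent")
            # Timeout elapsed: let one trial request through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial request, or too many failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self._threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, callers fail immediately instead of waiting on timeouts, which gives the struggling service room to recover and keeps the caller's own threads free.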
Retries with Backoff
When a request fails, retry — but wait longer between each attempt (exponential backoff). Add jitter (randomness) so all clients don’t retry simultaneously.
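A minimal retry helper along these lines is shown below. It uses the "full jitter" variant, where each wait is a random value between zero and the exponential limit; the function name, defaults, and injectable `sleep` parameter are choices made for the sketch, not a standard API.

```python
import random
import time


def retry_with_backoff(func, max_attempts=4, base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call func until it succeeds or max_attempts is reached.

    The wait limit doubles after each failure (base_delay, 2x, 4x, ...) up to
    max_delay, and the actual wait is drawn at random below that limit.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            limit = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, limit))
```

In practice only errors that are safe to retry (timeouts, connection resets) should be caught, and the operation should be idempotent so that a repeated request does no harm.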
A Simple Architecture
Here’s a typical web application architecture:
    Users → CDN (static assets)
          → Load Balancer → App Servers → Cache (Redis)
                                        → Database (PostgreSQL)
                                        → Message Queue → Workers
- CDN serves images, CSS, JS
- Load balancer distributes requests across app servers
- App servers handle business logic (stateless)
- Cache stores hot data (sessions, frequent queries)
- Database stores persistent data
- Message queue + workers handle async tasks (emails, image processing, reports)
This handles most applications up to millions of users. Beyond that, you start sharding databases, adding more cache layers, and splitting into microservices.
How to Think About System Design
- Start with requirements — What does the system need to do? What are the constraints?
- Estimate scale — How many users, requests, data? This determines your architecture.
- Design the happy path — How does a normal request flow through the system?
- Identify bottlenecks — What breaks first at 10x scale?
- Add reliability — What happens when components fail?
- Consider evolution — How will this change in 6 months? 2 years?
What’s Next
This introduction gives you the vocabulary and mental models. The next tutorials dive deep into specific building blocks: load balancing strategies, caching patterns, database scaling, and message queue architectures. Each one builds on these fundamentals.