Introduction to System Design

May 18, 2026
#system-design #scalability #architecture

System design is about making decisions that let software work reliably at scale. When your app has 100 users, almost any architecture works. When it has 10 million, the decisions you made early either save you or haunt you.

This tutorial introduces the core concepts, vocabulary, and thinking patterns you need before diving into specific components like load balancers, caches, and message queues.

What System Design Actually Means

System design is answering questions like:

  • How do we handle 10,000 requests per second?
  • What happens when a server crashes at 3 AM?
  • How do we add features without breaking existing ones?
  • Where does the data live, and how do we keep it consistent?
  • How do we deploy changes without downtime?

It’s not about picking the “best” technology — it’s about understanding trade-offs and choosing what’s appropriate for your constraints.

The Core Trade-Offs

Every design decision involves trade-offs. The most fundamental ones:

Consistency vs Availability (CAP Theorem)

In a distributed system, when a network partition occurs, you must choose:

  • Consistency — Every read returns the most recent write (or an error)
  • Availability — Every request gets a response (but it might be stale)

A banking system chooses consistency (you can’t show a wrong balance). A social media feed chooses availability (showing a slightly stale feed is better than showing nothing).
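
To make the choice concrete, here is a toy sketch of a single replica that gets cut off from its primary. The `Replica` class and the "CP"/"AP" mode labels are hypothetical names for illustration only; real databases implement this trade-off with far more machinery.

```python
class Replica:
    """Toy replica that may be cut off from the primary by a partition."""

    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP" (illustrative labels)
        self.value = None         # last value replicated from the primary
        self.partitioned = False  # True while the primary is unreachable

    def replicate(self, value):
        # Writes from the primary only arrive while the network is healthy.
        if not self.partitioned:
            self.value = value

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse rather than risk a stale answer.
            raise RuntimeError("unavailable: cannot confirm latest write")
        # Availability first (or no partition): answer with what we have.
        return self.value


def demo(mode):
    replica = Replica(mode)
    replica.replicate("balance=100")
    replica.partitioned = True
    replica.replicate("balance=40")  # lost: the replica is unreachable
    try:
        return replica.read()
    except RuntimeError as err:
        return f"error: {err}"
```

In this sketch `demo("AP")` answers with the stale `"balance=100"`, while `demo("CP")` returns an error string instead of a possibly wrong balance.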

Latency vs Throughput

  • Latency — How long a single request takes
  • Throughput — How many requests you can handle per second

Optimizing for one often hurts the other. Batching requests improves throughput but increases latency for individual items.
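
The batching trade-off can be worked through with a little arithmetic. This sketch assumes each call pays a fixed overhead plus a per-item cost; the 5 ms and 1 ms figures are made up for illustration, and the model ignores the time items spend waiting for a batch to fill.

```python
def batch_stats(batch_size, overhead_ms=5.0, per_item_ms=1.0):
    """Return (latency in ms, throughput in items/sec) for one batch size."""
    # Every item in the batch waits for the whole batch to finish.
    batch_ms = overhead_ms + batch_size * per_item_ms
    # Items completed per second when batches run back to back.
    throughput = batch_size / batch_ms * 1000
    return batch_ms, throughput


for size in (1, 10, 100):
    latency_ms, items_per_sec = batch_stats(size)
    print(f"batch={size:>3}  latency={latency_ms:6.1f} ms  "
          f"throughput={items_per_sec:7.1f} items/sec")
```

Under these assumptions, going from a batch of 1 to a batch of 100 raises throughput several times over, but each item now waits 105 ms instead of 6 ms.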

Simplicity vs Scalability

A monolithic application is simpler to develop, deploy, and debug. Microservices scale independently but add complexity in networking, deployment, and data consistency.

Start simple. Add complexity only when you have a specific scaling problem.

Key Metrics to Understand

Latency numbers every developer should know

Operation                               Time
L1 cache reference                      1 ns
RAM reference                           100 ns
SSD random read                         16 μs
Network round trip (same datacenter)    500 μs
HDD seek                                4 ms
Network round trip (cross-continent)    150 ms

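The takeaway: memory is far faster than disk, and a cross-continent round trip is slower than either. A round trip inside the datacenter sits between an SSD read and an HDD seek. Design accordingly.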

Back-of-envelope estimation

Before designing, estimate the scale:

  • Users: 10M daily active users
  • Requests: 10 actions/user/day = 100M requests/day ≈ 1,150 req/sec average
  • Peak: 3x average = ~3,500 req/sec
  • Storage: 100M records/day × 1KB = 100GB/day (assuming each request writes one record)
  • Bandwidth: 3,500 req/sec × 1KB = 3.5 MB/sec

These rough numbers tell you whether you need one server or a hundred.
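
The same estimate can be written as a few lines of code, which makes it easy to try other inputs. This is a sketch under the assumptions listed above; the `estimate` function and its parameter names are hypothetical, and 1KB is taken as 1,000 bytes.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate(daily_users, actions_per_user, record_bytes, peak_factor=3):
    """Back-of-envelope scale numbers from a handful of assumptions."""
    requests_per_day = daily_users * actions_per_user
    avg_rps = requests_per_day / SECONDS_PER_DAY
    peak_rps = avg_rps * peak_factor
    return {
        "requests_per_day": requests_per_day,
        "avg_rps": avg_rps,
        "peak_rps": peak_rps,
        # Assumes one stored record per request.
        "storage_gb": requests_per_day * record_bytes / 1e9,
        "peak_bandwidth_mb_s": peak_rps * record_bytes / 1e6,
    }


numbers = estimate(daily_users=10_000_000, actions_per_user=10, record_bytes=1_000)
for name, value in numbers.items():
    print(f"{name}: {value:,.1f}")
```

The exact results are about 1,157 req/sec on average and 3,472 req/sec at peak, which round to the figures in the list. Precision does not matter here; the order of magnitude does.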

Building Blocks

Every large system is composed of a small set of building blocks:

Clients and Servers

The basic model: clients send requests, servers process them and return responses. A “server” might itself be a client to another service.

Load Balancers

Distribute incoming traffic across multiple servers. If one server dies, traffic routes to the others. Algorithms include round-robin, least connections, and consistent hashing.
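
Two of these algorithms fit in a few lines. The sketch below is a simplified in-process illustration, not how a production load balancer is built; the class names and server labels are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hand out servers in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Send each request to the server with the fewest open connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        # On a tie, min() returns the first server in insertion order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when the request finishes so the count stays accurate.
        self.active[server] -= 1
```

Round-robin ignores how busy each server is, which is fine when requests cost about the same. Least connections adapts when some requests run much longer than others.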

Application Servers

Run your business logic. Stateless servers (no local state) are easier to scale — just add more behind the load balancer.

Databases

Persistent storage. Relational databases (PostgreSQL, MySQL) suit structured data with relationships. NoSQL stores (MongoDB, DynamoDB, Redis) suit specific access patterns, such as document lookups or key-value reads.

Caches

Store frequently accessed data in memory for fast retrieval. Redis and Memcached are common choices. Caching reduces database load and improves latency dramatically.
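
One common way to use a cache is the cache-aside pattern: check the cache first, and on a miss load from the database and store the result. The sketch below uses an in-memory dictionary in place of Redis or Memcached; `TTLCache`, `FakeDatabase`, and `get_user` are hypothetical names for illustration.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._entries = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (value, time.monotonic() + self.ttl)


class FakeDatabase:
    """Stand-in for a real database; counts how often it is queried."""

    def __init__(self):
        self.calls = 0

    def load_user(self, user_id):
        self.calls += 1
        return {"id": user_id, "name": f"user-{user_id}"}


def get_user(cache, db, user_id):
    user = cache.get(user_id)
    if user is None:                 # cache miss: go to the database
        user = db.load_user(user_id)
        cache.set(user_id, user)     # populate for the next reader
    return user
```

Calling `get_user` twice for the same ID within the TTL hits the database only once. The TTL bounds how stale a cached value can get, which is the price of the lower latency.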

Message Queues

Decouple producers from consumers. Instead of processing a task immediately, put it on a queue and let a worker handle it asynchronously. Kafka, RabbitMQ, and SQS are popular options.
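
The producer/worker idea can be sketched inside a single process with Python's standard `queue` and `threading` modules. This is only an analogy for a real broker such as Kafka or RabbitMQ, which adds persistence and delivery guarantees; `run_workers` and its parameters are hypothetical names.

```python
import queue
import threading


def run_workers(tasks, handler, worker_count=2):
    """Put tasks on a queue and let worker threads process them."""
    task_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            task = task_queue.get()
            if task is None:  # sentinel: no more work for this worker
                return
            outcome = handler(task)
            with results_lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()

    for task in tasks:        # the producer only enqueues and moves on
        task_queue.put(task)
    for _ in threads:         # one sentinel per worker
        task_queue.put(None)

    for thread in threads:
        thread.join()
    return results
```

For example, `run_workers(["a@example.com", "b@example.com"], send_email)` would hand each address to a hypothetical `send_email` function on a worker thread. Results can arrive in any order, which is typical of queue-based processing.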

CDNs (Content Delivery Networks)

Serve static content (images, CSS, JS) from servers geographically close to users. Reduces latency and offloads traffic from your origin servers.

Scaling Patterns

Vertical Scaling (Scale Up)

Get a bigger machine: more CPU, more RAM, more disk. Simple but has limits — you can’t buy an infinitely large server.

Horizontal Scaling (Scale Out)

Add more machines. This is more complex (you need load balancing and data partitioning), but the ceiling is far higher than that of any single machine.

Database Scaling

  • Read replicas — Copy data to read-only databases; route reads there, writes to the primary
  • Sharding — Split data across multiple databases by some key (user ID, geography)
  • Caching — Put a cache (Redis) in front of the database for hot data
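
The first two techniques come down to routing decisions, sketched here. The shard names, `shard_for`, and `ReplicaRouter` are hypothetical; the hash-then-modulo scheme is one simple option, not the only one.

```python
import hashlib

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for(user_id, shards=SHARDS):
    """Map a key to a shard using a stable hash."""
    # hashlib gives the same answer on every run and every machine,
    # unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


class ReplicaRouter:
    """Send writes to the primary and spread reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._next = 0

    def target(self, is_write):
        if is_write or not self.replicas:
            return self.primary
        replica = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return replica
```

Two caveats apply to this sketch. Modulo sharding moves most keys when the number of shards changes, and replicas can lag behind the primary, so a read issued right after a write may return stale data.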

Stateless Services

If your application servers keep no local state (sessions and cached data live in a shared store such as Redis instead), any server can handle any request. This makes horizontal scaling straightforward: just add servers behind the load balancer.

Reliability Patterns

Redundancy

No single point of failure. Multiple servers, multiple databases, multiple availability zones. If one fails, others take over.

Health Checks

Load balancers periodically check if servers are healthy. Unhealthy servers are removed from rotation automatically.
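
A minimal sketch of that bookkeeping follows. The `probe` callable stands in for a real check such as an HTTP request to a health endpoint, and the failure threshold is an assumption; `HealthCheckedPool` is a hypothetical name.

```python
class HealthCheckedPool:
    """Track consecutive probe failures and expose the healthy servers."""

    def __init__(self, servers, probe, failure_threshold=3):
        self.probe = probe  # callable: server -> True if healthy
        self.failure_threshold = failure_threshold
        self.failures = {server: 0 for server in servers}

    def run_checks(self):
        # Meant to be called on a timer, e.g. every few seconds.
        for server in self.failures:
            if self.probe(server):
                self.failures[server] = 0   # a success resets the count
            else:
                self.failures[server] += 1

    def in_rotation(self):
        return [server for server, count in self.failures.items()
                if count < self.failure_threshold]
```

Requiring several consecutive failures before removing a server avoids ejecting it over a single dropped probe. A recovered server rejoins the rotation on its next successful check.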

Graceful Degradation

When a component fails, the system continues with reduced functionality rather than crashing entirely. If the recommendation service is down, show popular items instead of personalized ones.
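
In code, the recommendation example might look like the sketch below. The function name and its arguments are hypothetical, and it assumes the failing service surfaces as a connection or timeout error.

```python
def recommendations_for(user_id, personalized, popular_items):
    """Return (items, source), falling back to popular items on failure."""
    try:
        return personalized(user_id), "personalized"
    except (ConnectionError, TimeoutError):
        # The recommendation service is unreachable: degrade, don't crash.
        return popular_items, "fallback"
```

Returning the source alongside the items lets you log or measure how often the fallback path is taken.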

Circuit Breakers

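If a downstream service is failing, stop sending it requests (open the circuit). After a cooldown, let a trial request through, and close the circuit again if it succeeds. This keeps one failing service from dragging down its callers in a cascading failure.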
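
Here is a minimal, single-threaded sketch of a circuit breaker. The thresholds, the `CircuitBreaker` and `CircuitOpenError` names, and the injectable `clock` are assumptions for illustration; production libraries handle concurrency and finer-grained states.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # cooldown in seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("downstream marked unhealthy")
            # Cooldown elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            trial_failed = self.opened_at is not None
            if trial_failed or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers fail fast with `CircuitOpenError` instead of waiting on timeouts, which gives the downstream service room to recover.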

Retries with Backoff

When a request fails, retry — but wait longer between each attempt (exponential backoff). Add jitter (randomness) so all clients don’t retry simultaneously.
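
A sketch of exponential backoff with jitter is shown here. The default delays are arbitrary, `retry_with_backoff` is a hypothetical helper, and the jitter strategy picks a random delay between zero and the current cap, which is one of several reasonable choices.

```python
import random
import time


def retry_with_backoff(func, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       sleep=time.sleep, rng=random.random):
    """Call func until it succeeds or max_attempts is reached."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The cap doubles each attempt: 0.1s, 0.2s, 0.4s, ... up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Jitter: wait a random fraction of the cap so that many
            # clients do not all retry at the same instant.
            sleep(rng() * cap)
```

The `sleep` and `rng` parameters exist only so the function can be tested without real waiting. In real use, retry only operations that are safe to repeat, and only on errors that are likely to be transient.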

A Simple Architecture

Here’s a typical web application architecture:

Users → CDN (static assets)
     → Load Balancer → App Servers → Cache (Redis)
                                   → Database (PostgreSQL)
                                   → Message Queue → Workers

  • CDN serves images, CSS, JS
  • Load balancer distributes requests across app servers
  • App servers handle business logic (stateless)
  • Cache stores hot data (sessions, frequent queries)
  • Database stores persistent data
  • Message queue + workers handle async tasks (emails, image processing, reports)

This setup can carry most applications into the millions of users. Beyond that, you start sharding databases, adding more cache layers, and splitting into microservices.

How to Think About System Design

  1. Start with requirements — What does the system need to do? What are the constraints?
  2. Estimate scale — How many users, requests, data? This determines your architecture.
  3. Design the happy path — How does a normal request flow through the system?
  4. Identify bottlenecks — What breaks first at 10x scale?
  5. Add reliability — What happens when components fail?
  6. Consider evolution — How will this change in 6 months? 2 years?

What’s Next

This introduction gives you the vocabulary and mental models. The next tutorials dive deep into specific building blocks: load balancing strategies, caching patterns, database scaling, and message queue architectures. Each one builds on these fundamentals.