Simonyan System Design Architecture Skill

Last updated: 27 May 2026

Apply a structured, senior-engineer methodology to design scalable systems from scratch, make defensible architectural trade-off decisions, and communicate them clearly in interviews or on the job.

// TL;DR

The Simonyan System Design Architecture Skill is a structured, senior-engineer methodology for designing scalable systems from scratch. It walks you through a ten-step workflow — from a single-server baseline to full production infrastructure — covering database selection, horizontal scaling, load balancing, API design (REST, GraphQL, gRPC), caching, and redundancy. Use it whenever you need to design a new system, prepare for a system design interview, conduct an architectural review, or evaluate an existing codebase for scalability, reliability, and performance. Every step requires you to explicitly articulate the trade-offs behind each decision.

Framework

// When should I use the Simonyan System Design Architecture Skill?

Use this skill whenever you need to design a new system or evaluate an existing one for scalability, reliability, and performance — including system design interviews, architectural reviews, or onboarding onto a complex codebase.

// What inputs do I need before starting a system design?

System or feature to design (required)
A plain-English description of what the system must do (e.g., 'a product catalogue API serving millions of mobile users').
Scale requirements (required)
Approximate user volume, read/write ratio, latency targets, and whether traffic is bursty or steady.
Data characteristics (required)
Whether data is structured/relational, unstructured/semi-structured, or graph-like; consistency vs. availability priorities.
Interaction pattern
Request-response, real-time streaming, async processing, or a mix.
Deployment context
Cloud provider, existing infrastructure, microservices vs. monolith.

// What are the core principles of the Simonyan System Design framework?

Start Small, Then Scale

Begin with a single-server setup that handles one user. Understand every component in that setup before adding complexity. Every complex system is just an evolved simple one.

Separate the Web Tier and Data Tier

Once user demand grows, split the server handling web/mobile traffic from the server managing the database. This lets each tier scale independently based on its specific load.

Avoid Single Points of Failure

Any component whose failure brings the entire system down is a single point of failure. Eliminate them through redundancy, health checks, and self-healing systems — for databases, load balancers, and every critical component.
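A minimal sketch of the three remedies named above (redundancy, health checks, self-healing) might look like the following. The `Instance` and `Pool` classes and the `healthy` flag are hypothetical stand-ins; a real system would set that flag from an actual probe such as an HTTP ping.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instance:
    name: str
    healthy: bool = True  # assumption: a real probe would set this flag


@dataclass
class Pool:
    instances: List[Instance]
    replaced: List[str] = field(default_factory=list)
    _counter: int = 0

    def routable(self) -> List[Instance]:
        """Health check: only instances that pass the probe receive traffic."""
        return [i for i in self.instances if i.healthy]

    def heal(self) -> None:
        """Self-healing: swap every failed instance for a fresh one."""
        for idx, inst in enumerate(self.instances):
            if not inst.healthy:
                self._counter += 1
                self.replaced.append(inst.name)
                self.instances[idx] = Instance(f"{inst.name}-replacement-{self._counter}")


pool = Pool([Instance("web-1"), Instance("web-2")])  # redundancy: two instances
pool.instances[0].healthy = False                    # simulate a crash
serving = [i.name for i in pool.routable()]          # traffic goes to web-2 only
pool.heal()                                          # web-1 is replaced with a fresh instance
```

Because the pool has two instances, the crash of one never leaves `routable()` empty, which is the point of the redundancy.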

Horizontal Scaling Over Vertical Scaling

Vertical scaling (scale up) adds resources to one server; it hits a hard cap and has no redundancy. Horizontal scaling (scale out) adds more servers and is generally more suitable for high-traffic applications because it offers higher fault tolerance and growth that is not capped by the size of a single machine.
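As a rough illustration of sizing a horizontally scaled pool, the arithmetic can be sketched as below. The function name, the traffic figures, and the "one spare server" rule are assumptions made for the example, not part of the framework.

```python
import math


def servers_needed(peak_rps: float, per_server_rps: float, spare: int = 1) -> int:
    """Servers required to carry peak load, plus `spare` extras so a failure is survivable."""
    if peak_rps <= 0 or per_server_rps <= 0:
        raise ValueError("request rates must be positive")
    return math.ceil(peak_rps / per_server_rps) + spare


# Hypothetical numbers: 12,000 requests/s at peak, 2,500 requests/s per server.
count = servers_needed(12_000, 2_500)  # ceil(4.8) = 5 servers, plus 1 spare = 6
```

With vertical scaling the same load would have to fit on one machine, and there would be no spare to absorb a failure.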

Match the Database to the Data Shape

Use SQL (relational) databases when data is well-structured, relationships are clear, and strong ACID transactional integrity is required. Use NoSQL databases when data is unstructured or semi-structured, low latency is critical, or the schema must be flexible at massive scale.
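One way to make this decision explicit is a small function like the sketch below. The `DataProfile` fields and the order of the checks are assumptions for illustration; the NoSQL sub-types are the ones the workflow's database step lists.

```python
from dataclasses import dataclass


@dataclass
class DataProfile:
    structured: bool                  # well-defined schema with clear relationships
    needs_acid: bool                  # e.g. banking, e-commerce orders
    relationship_heavy: bool = False  # e.g. recommendations
    ram_speed_lookups: bool = False   # e.g. sessions, hot cache entries
    massive_writes: bool = False      # e.g. event ingestion


def choose_database(p: DataProfile) -> str:
    # Assumption: ACID or structured data wins first; the rest is one possible ordering.
    if p.needs_acid or p.structured:
        return "SQL (e.g. PostgreSQL, MySQL)"
    if p.relationship_heavy:
        return "NoSQL graph store (e.g. Neo4j)"
    if p.ram_speed_lookups:
        return "NoSQL key-value store (e.g. Redis)"
    if p.massive_writes:
        return "NoSQL wide-column store (e.g. Cassandra)"
    return "NoSQL document store (e.g. MongoDB)"


orders = choose_database(DataProfile(structured=True, needs_acid=True))
events = choose_database(DataProfile(structured=False, needs_acid=False, massive_writes=True))
```

A real decision would also weigh team experience and operational cost, which this sketch ignores.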

Choose the Protocol Based on Interaction Pattern

Your protocol choice fundamentally shapes your API design options and performance. HTTP for standard request-response, WebSockets for real-time bidirectional communication, gRPC for high-performance inter-service communication, and AMQP for async message queuing.
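To illustrate the async message-queuing pattern, here is a sketch that uses Python's in-process `queue.Queue` as a stand-in for an AMQP broker. The event shape and the `None` shutdown sentinel are assumptions; a real deployment would use a broker and a client library.

```python
import queue
import threading

events: queue.Queue = queue.Queue()  # stand-in for a broker queue
processed: list = []


def consumer() -> None:
    """Pull messages at the consumer's own pace until the sentinel arrives."""
    while True:
        msg = events.get()
        if msg is None:  # hypothetical shutdown signal
            break
        processed.append(msg["event_id"])


worker = threading.Thread(target=consumer)
worker.start()

# The producer publishes without waiting for the consumer to finish each message.
for i in range(5):
    events.put({"event_id": i, "type": "product_view"})
events.put(None)
worker.join()
```

The producer never calls the consumer directly, so either side can be slowed down, restarted, or scaled without the other noticing.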

The Best API Needs No Documentation

A great API is consistent in naming and casing, simple enough for developers to use intuitively, secure by default (authentication, authorization, rate limiting, input validation), and performant through caching, pagination, and minimised round trips.
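Rate limiting is one of the "secure by default" items above. A common way to implement it is a token bucket, sketched here under the assumption that the caller passes in the current time (production code would more likely read `time.monotonic()`). The class and parameter names are illustrative.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_second` tokens per second."""

    def __init__(self, capacity: int, refill_per_second: float) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the API would typically answer 429 Too Many Requests here


bucket = TokenBucket(capacity=3, refill_per_second=1.0)
burst = [bucket.allow(now=0.0) for _ in range(4)]  # fourth call in the same instant is rejected
later = bucket.allow(now=1.0)                      # one token has refilled a second later
```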

Articulate the Trade-offs

Senior-level system design is not about finding the perfect answer — it is about understanding what you gain and what you give up with each architectural decision. Always state the trade-off explicitly.

// How do you apply the Simonyan System Design Skill step by step?

1
Establish the single-server baseline
Describe the simplest version of the system: one server, one database, one cache, accessed via DNS → IP resolution. Trace the request flow: client types domain → DNS returns IP → HTTP request to server → server processes → response (HTML for browser, JSON for mobile). Identify the two traffic sources: web applications and mobile applications.
2
Identify scale triggers and separate tiers
Ask: at what point does the single server break? Separate the Web Tier (handles web/mobile traffic) from the Data Tier (manages the database). Each can now be scaled independently. Flag any single points of failure introduced at this stage.
3
Select the right database type
Run the database selection decision: Is data well-structured with clear relationships? → SQL (PostgreSQL, MySQL). Need ACID transactions (banking, finance, e-commerce orders)? → SQL. Need super-low latency, flexible/unstructured schema, or massive write throughput? → NoSQL. Then pick the NoSQL sub-type: document store (MongoDB) for JSON-like records; wide-column store (Cassandra) for massive write scale; key-value store (Redis) for RAM-speed lookups; graph store (Neo4j) for relationship-heavy data like recommendations.
4
Design the scaling strategy
Decide between vertical scaling (scale up: add RAM/CPU to an existing server; simple, but with a hard resource cap and zero redundancy) and horizontal scaling (scale out: add more servers; preferred for high-traffic systems because it provides fault tolerance and growth that is not capped by one machine). For horizontal scaling, introduce a load balancer in front of the server pool.
5
Configure the load balancing algorithm
Choose the algorithm that fits the server pool and use case: Round Robin (equal-spec servers, simple rotation); Least Connections (variable-length sessions, routes to server with fewest active connections); Least Response Time (heterogeneous servers, optimises for fastest response + fewest connections); IP Hash (client must consistently reach the same server); Weighted variants (servers have different RAM/CPU capacities); Geographic (global service, route to nearest server to reduce latency); Consistent Hashing (distributes via hash ring, ensures session affinity). Configure health checks so the load balancer stops routing to failed servers automatically.
6
Eliminate single points of failure
For every critical component (load balancer, database, cache), apply one of three strategies: Redundancy (run multiple instances so if one fails, others absorb traffic); Health checks and monitoring (continuously probe components, stop routing to failed ones); Self-healing systems (auto-replace a failed instance with a fresh one). Apply database replication here (covered in the databases section of the methodology).
7
Define the API style and protocol
Pick the API style: REST for standard web/mobile apps (stateless, resource-based, HTTP methods); GraphQL for complex UIs needing flexible queries with minimal round trips and precise data shapes; gRPC for high-performance microservice-to-microservice communication using Protocol Buffers over HTTP/2. Then pick the protocol: HTTP/HTTPS for request-response (always use HTTPS; TLS encryption is the gold standard); WebSockets for real-time bidirectional communication (chat, live feeds); AMQP for async message queuing with producer-consumer decoupling; gRPC over HTTP/2 for internal microservice RPC calls. At the transport layer, choose between TCP and UDP: TCP for reliable ordered delivery (payments, auth, user data); UDP for speed over reliability (video calls, gaming, live streams).
8
Design the API contract
For REST: model resources as plural nouns, not verbs: /products, not /getProducts. Use proper HTTP methods (GET=read, POST=create, PUT=full replace, PATCH=partial update, DELETE=remove). Return correct status codes (200 OK, 201 Created, 400 Bad Request, 401 Unauthorized, 404 Not Found, 500 Internal Server Error). Add versioning (/api/v1/). Support filtering (query params), sorting, and pagination (page+limit, offset+limit, or cursor-based). For GraphQL: define a schema (types, queries, mutations) that mirrors the domain model. Keep schemas small and modular. Limit query depth. Report failures in the errors field of the response body and have clients check it, because GraphQL servers commonly return HTTP 200 even when a query fails. Use input types for all mutations.
9
Apply the four API design principles as a checklist
Consistency: uniform naming, casing, and patterns across all endpoints. Simplicity: developers should intuit usage without reading docs. Security: authentication, authorization, input validation, rate limiting — non-negotiable. Performance: caching strategy, pagination on all list endpoints, minimise payload size, reduce round trips by co-locating related data where sensible.
10
Articulate the trade-offs for every major decision
For each architectural choice made in steps 1–9, state explicitly: what you gain (e.g., gRPC gives lower latency between services) and what you give up (e.g., browsers cannot call gRPC directly because their networking APIs do not expose the HTTP/2 control it relies on, so browser-facing use needs a gRPC-Web proxy). This is what separates senior-level answers from junior-level answers.
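Two of the algorithms from step 5, Round Robin and Least Connections, can be sketched together with the health-check filter the step calls for. The `Server` and `LoadBalancer` classes and the connection counts are hypothetical; real load balancers track these values from live traffic.

```python
from dataclasses import dataclass
from itertools import count
from typing import List


@dataclass
class Server:
    name: str
    active_connections: int = 0
    healthy: bool = True  # assumption: updated by a periodic health check


class LoadBalancer:
    def __init__(self, servers: List[Server]) -> None:
        self.servers = servers
        self._turn = count()

    def _healthy(self) -> List[Server]:
        live = [s for s in self.servers if s.healthy]
        if not live:
            raise RuntimeError("no healthy servers")
        return live

    def round_robin(self) -> Server:
        """Rotate through healthy servers in order; suits equal-spec servers."""
        live = self._healthy()
        return live[next(self._turn) % len(live)]

    def least_connections(self) -> Server:
        """Pick the healthy server with the fewest active connections."""
        return min(self._healthy(), key=lambda s: s.active_connections)


servers = [Server("a", 5), Server("b", 2), Server("c", 9)]
lb = LoadBalancer(servers)
rotation = [lb.round_robin().name for _ in range(4)]  # a, b, c, a
least = lb.least_connections().name                   # b has the fewest connections
servers[1].healthy = False                            # b fails its health check
least_after_failure = lb.least_connections().name     # traffic shifts to a
```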

// What does the Simonyan System Design framework look like in practice?

A startup building a social media app expects to grow from 1,000 to 10 million users over 18 months. They have a feed, user profiles, posts, and a real-time notification system.

Start with a single-server baseline, then separate the Web Tier and Data Tier immediately given the growth trajectory. Use PostgreSQL (SQL) for user profiles and posts (structured, relational, need transactional integrity). Add Redis (key-value NoSQL) for session caching and feed caching (RAM-speed lookups). Horizontally scale the API servers behind a load balancer using Least Connections (sessions vary in length). Eliminate single points of failure on the database with replication. Use REST for the standard CRUD feed/profile API with versioning (/api/v1/) and pagination on all list endpoints. Use WebSockets for real-time notifications (bidirectional, push-based). Trade-off stated: WebSockets require persistent connections which consume server memory — mitigate with horizontal scaling and connection limits.
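The "pagination on all list endpoints" part of this design could look roughly like the handler below. The `/api/v1/posts` path, the in-memory `POSTS` list, the 100-item limit cap, and the response field names are assumptions for illustration, not a prescribed contract.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical in-memory data standing in for the posts table.
POSTS: List[Dict[str, Any]] = [{"id": i, "title": f"Post {i}"} for i in range(1, 26)]


def list_posts(offset: int = 0, limit: int = 10) -> Tuple[int, Dict[str, Any]]:
    """Sketch of GET /api/v1/posts?offset=..&limit=.. returning (status, body)."""
    if offset < 0 or not 1 <= limit <= 100:
        return 400, {"error": "offset must be >= 0 and limit between 1 and 100"}
    page = POSTS[offset:offset + limit]
    # Tell the client where the next page starts, or None when there is no more data.
    next_offset = offset + limit if offset + limit < len(POSTS) else None
    return 200, {"data": page, "total": len(POSTS), "next_offset": next_offset}


status, body = list_posts(offset=20, limit=10)  # last page: 5 items, no next_offset
```

Offset pagination is the simplest option; cursor-based pagination tends to behave better when rows are inserted while a client is paging, at the cost of a slightly more complex contract.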

An e-commerce platform needs a product recommendation engine that processes hundreds of millions of user-activity events and serves personalised results at low latency.

Separate the recommendation write path (event ingestion) from the read path (serving recommendations). Use a NoSQL wide-column store (Cassandra) for storing user-activity events at massive write scale. Use a graph database (Neo4j-style) for modelling product-user relationships that power recommendations. Introduce a message queue (AMQP) between the event producer (web/app) and the recommendation processor (consumer) so the consumer processes at its own pace without dropping events. Expose recommendations via a REST API with Redis key-value caching in front of it for sub-millisecond read latency. Use geographic load balancing if users are globally distributed. Trade-off stated: eventual consistency in the recommendation engine is acceptable because a slightly stale recommendation is better than high latency.
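The caching layer in front of the recommendation API is often built with the cache-aside pattern, sketched below. A plain dictionary stands in for Redis, and `slow_recommendations` is a hypothetical placeholder for the real recommendation query; there is no expiry here, which a production cache would need.

```python
from typing import Callable, Dict, List


class CacheAside:
    def __init__(self, load_from_store: Callable[[str], List[str]]) -> None:
        self._cache: Dict[str, List[str]] = {}  # stand-in for Redis
        self._load = load_from_store
        self.store_reads = 0

    def get(self, user_id: str) -> List[str]:
        if user_id in self._cache:        # cache hit: no trip to the backing store
            return self._cache[user_id]
        self.store_reads += 1             # cache miss: load, then remember
        value = self._load(user_id)
        self._cache[user_id] = value
        return value

    def invalidate(self, user_id: str) -> None:
        """Drop a cached entry so the next read sees fresh data."""
        self._cache.pop(user_id, None)


def slow_recommendations(user_id: str) -> List[str]:
    # Hypothetical placeholder for the expensive recommendation lookup.
    return [f"product-{user_id}-{n}" for n in range(3)]


recs = CacheAside(slow_recommendations)
first = recs.get("u1")   # miss: reads the backing store
second = recs.get("u1")  # hit: served from the cache
```

Until `invalidate` is called the cache may serve stale results, which is the eventual-consistency trade-off the scenario accepts.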

// What mistakes should I avoid when designing systems with this framework?

Jumping to complex architecture before establishing the single-server baseline — you will miss the component interactions that matter.
Choosing vertical scaling (scale up) as a long-term strategy: it has a hard resource cap and introduces a single point of failure with no redundancy.
Using a SQL database for every use case — unstructured/semi-structured data at massive scale belongs in NoSQL; forcing it into SQL creates schema rigidity and performance bottlenecks.
Treating the load balancer as infinitely reliable — a single load balancer is itself a single point of failure; always add redundancy, health checks, or self-healing for the load balancer itself.
Using verbs in REST resource URLs (e.g., /getProducts, /deleteUser) instead of plural nouns with proper HTTP methods.
Returning all results from a list endpoint without pagination — this wastes server bandwidth and degrades both server and client performance at scale.
Confusing JWT (a token format) with an authentication method, or calling OAuth2 an authentication system when it is an authorization framework.
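Selecting gRPC for browser-to-server communication: browsers support HTTP/2 but do not give page scripts the low-level control over it that gRPC needs, so browser clients require a gRPC-Web proxy; gRPC is best reserved for server-to-server (microservice) communication.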
Using HTTP polling for real-time features instead of WebSockets — polling wastes bandwidth, increases latency, and consumes server resources unnecessarily.
Building deeply nested GraphQL queries without a query depth limit — this opens the API to denial-of-service via maliciously complex queries.
Omitting API versioning in REST — without /v1/, /v2/ prefixes, breaking changes in the backend immediately break all existing clients.
Skipping the trade-off articulation step — naming the right technology without explaining what you gain and give up is a junior-level answer, not a senior-level one.
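The query depth limit from the list above can be approximated by counting brace nesting, as in this sketch. It is a deliberate simplification: it ignores braces inside string arguments and input objects, and it counts the outermost operation braces as level 1. Production GraphQL servers usually enforce depth on the parsed query rather than on raw text. The function names and the default limit of 5 are assumptions.

```python
def query_depth(query: str) -> int:
    """Return the maximum brace nesting of a GraphQL query string."""
    depth = max_depth = 0
    for ch in query:
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced braces")
    if depth != 0:
        raise ValueError("unbalanced braces")
    return max_depth


def within_depth_limit(query: str, max_allowed: int = 5) -> bool:
    """Reject queries nested more deeply than `max_allowed` before executing them."""
    return query_depth(query) <= max_allowed


shallow = "{ user(id: 1) { name posts { title } } }"
deep = "{ user { friends { friends { friends { friends { friends { name } } } } } } }"
```

Here `shallow` nests three levels and passes the default limit, while `deep` nests seven levels and would be rejected.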

// What key terms do I need to know for system design?

Single Point of Failure: Any component in the system whose failure alone causes the entire system to go down. Must be eliminated through redundancy, health checks, or self-healing systems.
Web Tier: The layer of the architecture that handles incoming web and mobile traffic. Separated from the Data Tier to allow independent scaling.
Data Tier: The layer of the architecture responsible for managing the database. Separated from the Web Tier to allow independent scaling.
Vertical Scaling (Scale Up): Adding more resources (RAM, CPU) to an existing single server. Simple but has a hard resource cap and no redundancy.
Horizontal Scaling (Scale Out): Adding more servers to share the load. Preferred for high-traffic applications; offers fault tolerance and theoretically unlimited growth.
Round Robin: Load balancing algorithm that routes each new request to the next server in sequential rotating order. Best for servers with similar specifications.
Least Connections: Load balancing algorithm that routes traffic to the server with the fewest active connections. Best for variable-length sessions.
Least Response Time: Load balancing algorithm that routes to the server with the lowest response time and fewest active connections. Best when servers have different performance capabilities.
IP Hash: Load balancing algorithm that hashes the client's IP address to consistently route the same client to the same server. Used when session affinity is required.
Consistent Hashing: Load balancing algorithm that places servers and clients on a hash ring; a client is routed to the next server clockwise from its position on the ring. Provides session affinity, and only a fraction of clients are remapped when servers are added or removed.
Health Check: A continuous probe the load balancer sends to servers to determine availability. Failed health checks cause the load balancer to stop routing traffic to that server until it recovers.
ACID: The four properties of SQL database transactions: Atomicity (all-or-nothing), Consistency (valid state to valid state), Isolation (concurrent transactions don't interfere), Durability (committed data persists even after failure).
Document Store: A NoSQL database type (e.g., MongoDB) that stores data as JSON-like documents, allowing complex nested data structures within a single record.
Wide-Column Store: A NoSQL database type (e.g., Cassandra) that stores data in tables with dynamic columns. Optimised for massive write throughput and scale.
Key-Value Store: A NoSQL database type (e.g., Redis) that stores data as key-value pairs, primarily in RAM. Typically the fastest database type for simple reads and writes because of in-memory storage.
Graph Store: A NoSQL database type (e.g., Neo4j) that models entities and their relationships as graph nodes and edges. Used for recommendation engines and relationship-heavy data.
REST (Representational State Transfer): An API style using resource-based URLs, standard HTTP methods (GET, POST, PUT, PATCH, DELETE), stateless requests, and fixed response structures. The most common API style for web and mobile applications.
GraphQL: An API query language with a single endpoint where the client specifies the exact shape and fields of the response. Eliminates over-fetching and reduces round trips for complex UIs. Created by Facebook.
gRPC (gRPC Remote Procedure Calls): A high-performance RPC framework, originally developed at Google, that uses Protocol Buffers and HTTP/2. Best for microservice-to-microservice communication; browser-facing use needs a gRPC-Web proxy because browser APIs do not expose the HTTP/2 control gRPC relies on.
WebSockets: A protocol enabling persistent, bidirectional connections between client and server after an initial HTTP handshake. Enables the server to push data to the client without the client polling. Used for real-time features like chat and live notifications.
AMQP (Advanced Message Queuing Protocol): An enterprise messaging protocol used for asynchronous message queuing between a producer (publisher) and a consumer (processor). The message broker holds messages in queues until the consumer is ready, decoupling system components.
TCP (Transmission Control Protocol): A transport layer protocol that provides ordered, reliable delivery. It opens each connection with a three-way handshake, then uses acknowledgements and retransmission to recover lost packets. Required for payments, authentication, and any data where loss is unacceptable. Higher overhead than UDP.
UDP (User Datagram Protocol): A transport layer protocol that sends packets without delivery guarantees, ordering, or handshaking. Faster and lower-overhead than TCP. Acceptable for video calls, gaming, and live streams where some packet loss is tolerable.
Three-Way Handshake: The TCP connection establishment process: client sends SYN → server responds SYN-ACK → client responds ACK. Only after this is data transmission permitted.
Overfetching: A REST API problem where the client receives more data than it needs for a given view, wasting bandwidth. GraphQL solves this by letting the client specify exactly which fields to return.
Pagination: The practice of returning data in pages (using page+limit, offset+limit, or cursor parameters) rather than returning all records at once. Mandatory on all list endpoints for performance and bandwidth efficiency.
Self-Healing System: An architectural pattern where a failed component (e.g., a load balancer) is automatically detected and replaced with a fresh instance, preventing service interruption.
Contract-First Design: An API design approach where the request/response contract is defined before implementation begins. Common in interviews and collaborative team environments.
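The atomicity property from the ACID entry above can be seen in a short sketch using Python's built-in `sqlite3` module. The `accounts` table, the names, and the amounts are made up for the example; the `CHECK` constraint is what forces the second transfer to fail partway through.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money in one transaction: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the credit to dst was rolled back along with the failed debit


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # debit violates the CHECK, so nothing changes
balances = dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

In the failed transfer the credit to `bob` runs first and is then undone, which is the all-or-nothing behaviour the definition describes.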

// FREQUENTLY ASKED QUESTIONS

What is the Simonyan System Design Architecture Skill?

It is a structured, ten-step methodology that guides you from a single-server baseline to a fully scalable production architecture. Developed from Hayk Simonyan's system design teaching, the framework covers tier separation, database selection, horizontal scaling, load balancing, API design, caching, redundancy, and trade-off articulation. It is designed for system design interviews, architectural reviews, and real-world infrastructure planning.

What is the single-server baseline in system design?

The single-server baseline is a starting architecture where one server handles all traffic, application logic, and database operations for a single user. You trace the full request flow — DNS resolution, HTTP request, server processing, response — before adding any complexity. The purpose is to understand every component interaction first, because every complex system is simply an evolved simple one.

How do I choose between SQL and NoSQL databases in system design?

Choose SQL (PostgreSQL, MySQL) when your data is well-structured, relationships are clear, and you need ACID transactional integrity — such as banking or e-commerce orders. Choose NoSQL when data is unstructured or semi-structured, you need super-low latency, or you require flexible schemas at massive scale. NoSQL sub-types include document stores (MongoDB), wide-column stores (Cassandra), key-value stores (Redis), and graph stores (Neo4j).

How do I pick the right load balancing algorithm?

Match the algorithm to your server pool and use case. Use Round Robin for equal-spec servers. Use Least Connections when session lengths vary. Use Least Response Time for heterogeneous server performance. Use IP Hash or Consistent Hashing when you need session affinity. Use Geographic load balancing for global services to reduce latency. Always configure health checks so failed servers are removed automatically.
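Consistent Hashing is the least obvious of these algorithms, so here is a minimal sketch of a hash ring. The use of MD5, the `HashRing` class, and the 100 virtual nodes per server are assumptions chosen for the example; they are common choices rather than anything this framework prescribes.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    # MD5 is used only to spread keys around the ring, not for security.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, servers: List[str], vnodes: int = 100) -> None:
        self._ring: Dict[int, str] = {}
        self._points: List[int] = []
        self._vnodes = vnodes  # several points per server even out the distribution
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for v in range(self._vnodes):
            point = _hash(f"{server}#{v}")
            self._ring[point] = server
            bisect.insort(self._points, point)

    def remove(self, server: str) -> None:
        for v in range(self._vnodes):
            point = _hash(f"{server}#{v}")
            del self._ring[point]
            self._points.remove(point)

    def route(self, client_key: str) -> str:
        """Route to the first server point clockwise from the client's hash."""
        idx = bisect.bisect(self._points, _hash(client_key)) % len(self._points)
        return self._ring[self._points[idx]]


ring = HashRing(["s1", "s2", "s3"])
clients = [f"client-{i}" for i in range(200)]
before = {c: ring.route(c) for c in clients}
ring.remove("s3")
after = {c: ring.route(c) for c in clients}
moved = [c for c in clients if before[c] != after[c]]  # only clients that were on s3
```

Removing `s3` remaps only the clients that were routed to it; everyone else keeps their server, which is what makes the scheme attractive for session affinity.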

How does REST compare to GraphQL and gRPC for API design?

REST is best for standard web and mobile apps: it uses resource-based URLs, HTTP methods, and stateless requests. GraphQL is ideal for complex UIs needing flexible, precise data queries with minimal round trips, but requires query depth limits. gRPC excels at high-performance microservice-to-microservice communication using Protocol Buffers over HTTP/2, but browsers cannot call it directly without a gRPC-Web proxy. Each has trade-offs in simplicity, performance, and client compatibility.

When should I use the Simonyan System Design framework?

Use it whenever you need to design a new system or evaluate an existing one for scalability, reliability, and performance. This includes system design interviews at FAANG-level companies, architectural reviews at your current job, greenfield project planning, or onboarding onto a complex codebase. The framework is especially valuable when you need to make and communicate defensible trade-off decisions.

What results can I expect from applying this system design framework?

You can expect to produce architectures that are scalable, fault-tolerant, and clearly communicated. In interviews, you will deliver senior-level answers by explicitly stating trade-offs rather than just naming technologies. On the job, you will make defensible decisions about databases, APIs, scaling strategies, and redundancy. The framework ensures you never skip critical steps like pagination, API versioning, or single-point-of-failure elimination.

What are the most common system design mistakes to avoid?

The most common mistakes include jumping to complex architecture before establishing a single-server baseline, choosing vertical scaling as a long-term strategy, forcing unstructured data into SQL databases, treating the load balancer as infinitely reliable, using verbs in REST URLs instead of plural nouns, skipping pagination on list endpoints, and omitting trade-off articulation. Each of these signals a junior-level understanding of system design.

How do WebSockets differ from HTTP polling for real-time features?

WebSockets establish a persistent, bidirectional connection after an initial HTTP handshake, allowing the server to push data to the client instantly. HTTP polling forces the client to repeatedly ask the server for updates at fixed intervals, wasting bandwidth, increasing latency, and consuming unnecessary server resources. Always prefer WebSockets for real-time features like chat, live notifications, and live feeds.
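The cost of polling can be estimated with simple arithmetic, sketched below. The five-second interval and the twelve events per hour are made-up figures, and the model assumes each event is picked up by exactly one poll.

```python
def polling_requests(duration_s: int, interval_s: int) -> int:
    """Requests a polling client sends, whether or not anything changed."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return duration_s // interval_s


def wasted_polls(duration_s: int, interval_s: int, events: int) -> int:
    """Polls that return nothing new (assumes one useful poll per event)."""
    return max(0, polling_requests(duration_s, interval_s) - events)


hour = 3600
polls = polling_requests(hour, interval_s=5)          # 720 requests in an hour
wasted = wasted_polls(hour, interval_s=5, events=12)  # 708 of them carry no new data
```

With a push-based connection the server would send only the 12 messages, and a client would not wait up to a full polling interval to see each one.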

What does it mean to articulate trade-offs in system design?

Articulating trade-offs means explicitly stating what you gain and what you give up with every architectural decision. For example, gRPC gives lower latency between services, but browsers cannot call it directly, so browser-facing use needs a gRPC-Web proxy. Eventual consistency in a recommendation engine is acceptable because slightly stale results are better than high latency. This practice is what separates senior-level from junior-level system design answers.
