Introduction
What frustrates me the most about the system design interview is how poorly specified it is. Every definition of it I have seen is circular: the system design interview is one where you must design a real-world system at internet scale.
Any questions?
Poorly specifying the format of this interview makes it very difficult to prepare. You are simply left to review sample solutions, notice a pattern, and then hopefully imitate the underlying structure correctly in real interviews. Whoever prepared the sample solutions is probably also operating according to the same guidance, and the result is a recursive series of imitations diluting any educational value.
The system design interview requires the interviewee to provide a draft of a minimum viable product which generally covers the following areas:
- requirements gathering
- estimates
- APIs
- data model
- deep dive
- system architecture
In addition, the interviewer will try to assess the depth of your technical knowledge and your ability to recognize tradeoffs.
The goal of the interview is to provide breadth on the important aspects of system design while leaving time for the interviewer to probe in depth on particular topics of their choice. For each section that involves design, there are almost always several approaches, each with its own tradeoffs. You are to identify the approaches and their tradeoffs, and pick the best option for your requirements.
When new systems are built, similar system design documents are created. The main difference is that real system design documents tend to include a lot more detail because they are written over more than 45 minutes and are reviewed by a team of managers / engineers who will be responsible for implementing the system. As system design documents tend to be created by more senior engineers leading projects, this interview tends to become more important for more senior engineering positions.
Details on each section
Requirements gathering
The first thing to understand about the system design interview is that there is no one-shot prompt. In a coding interview, the entire question can be specified in a single prompt and there is no actual need to interact with the interviewer to complete the question. This is partly why it’s easier to practice coding: you can practice coding in isolation.
System design starts with a vague prompt (e.g. “design Google”) and requires interaction with the interviewer to scope out some minimal set of functional requirements. Functional requirements are exactly what the system is expected to do and no more. When you scope out the requirements, you are expected to take a proactive approach. It is for this reason that system design typically covers services you are already familiar with, often consumer products. Taking a proactive approach means you propose the functional requirements and only let the interviewer correct you on the details or provide guidance on what is and isn’t in scope.
The reason for the interactive nature of system design is that it simulates the real-world requirements gathering process with customers / stakeholders. The more senior your position is, the more scrutiny any committee will pay to your requirements gathering. Your functional requirements must completely determine system behavior, and all the remaining parts of this interview are determined by this section.
Functional requirements can get incredibly detailed. For example, if you are designing a search engine:
- what results are you returning? Images, web pages, documents, emails, something else?
- what character encoding are you working with (ASCII, Unicode, something else)?
- should you restrict your searches by language? (e.g. “pain” means “bread” in French)
- how do you handle similar characters like ï and i?
- how should you rank the results? By relevancy? Chronologically?
- do you have bandwidth constrained users? If so should you not return images?
- If you are returning images, what image formats do you need to support?
- should results be filtered by some property, like language or feature (e.g. restaurants)?
- should returned results be truncated or compressed in some way?
That said, functional requirements are not what make the system design interview challenging. Ultimately the functional requirements can be satisfied by a set of APIs on a single web server with a database. The non-functional requirements are what make the interview challenging. Non-functional requirements are like metadata attributes that describe how the functional requirements are accomplished: low latency, high availability, scale, consistency requirements, data residency requirements, support for users with constrained bandwidth, security, etc. Meeting them requires optimizations that are neither intuitive nor trivial to implement.
Estimates
Estimation is the least important section of system design, and sometimes the interviewer is ok with skipping it entirely. For estimates you are to calculate some important performance requirements of the system, using guesswork for unknown variables. The most important performance properties tend to be:
- requests per second (may be divided into reads / writes per second)
- inbound bandwidth (bytes per second)
- outbound bandwidth (bytes per second)
- storage per year
Sometimes the interviewer will ask you to estimate some hardware requirements, such as how many servers you need, how much RAM you need, etc.
You will generally compute these estimates with Fermi calculations. The interviewer just wants to know that you are capable of making reasonable system estimates and it’s important not to dwell too much on this section in the interest of time. Almost regardless of the numbers you compute, your system design will be the same.
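As a rough illustration of such a Fermi calculation, here is a minimal sketch in Python. Every input (daily active users, requests per user, payload sizes) is a made-up guess, and the function name `estimate` is hypothetical. It also assumes, for simplicity, that inbound traffic is dominated by writes and outbound traffic by reads.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
DAYS_PER_YEAR = 365


def estimate(daily_active_users, reads_per_user_per_day, writes_per_user_per_day,
             write_request_bytes, read_response_bytes, stored_bytes_per_write):
    """Back-of-the-envelope estimates from guessed inputs (all hypothetical)."""
    reads_per_day = daily_active_users * reads_per_user_per_day
    writes_per_day = daily_active_users * writes_per_user_per_day

    # Average rates; real traffic peaks above the average.
    reads_per_sec = reads_per_day / SECONDS_PER_DAY
    writes_per_sec = writes_per_day / SECONDS_PER_DAY

    return {
        "reads_per_sec": reads_per_sec,
        "writes_per_sec": writes_per_sec,
        # Simplifying assumption: writes dominate inbound, reads dominate outbound.
        "inbound_bytes_per_sec": writes_per_sec * write_request_bytes,
        "outbound_bytes_per_sec": reads_per_sec * read_response_bytes,
        "storage_bytes_per_year": writes_per_day * stored_bytes_per_write * DAYS_PER_YEAR,
    }


# Example with invented numbers: 10M daily users, 20 reads and 2 writes each.
numbers = estimate(10_000_000, 20, 2, 1_000, 5_000, 1_000)
for name, value in numbers.items():
    print(f"{name}: {value:,.0f}")
```

In an interview you would round aggressively (for example, treat a day as roughly 100,000 seconds) rather than compute exact values.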
APIs
APIs are application programming interfaces, and they are how your clients programmatically interact with your system. You will need to specify what kind of API you provide (a library, a web service, something else), and the APIs themselves. Each API will have some set of inputs and outputs.
The important thing to keep in mind is that your APIs must totally satisfy your functional requirements. It is not enough to specify just the most commonly called APIs; they must all be there. The parameters must also be complete.
That said, authentication and user signup are typically out of scope for a set of APIs.
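To show what "complete inputs and outputs" might look like, here is a minimal sketch of a single search API written as Python types, loosely based on the search engine example from the requirements section. All names (`SearchRequest`, `SearchResult`, `SearchResponse`, `search`) and all fields are hypothetical, and the function body is a toy in-memory stand-in for whatever the real system would do.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SearchRequest:
    query: str
    language: Optional[str] = None  # e.g. "fr"; None means any language
    offset: int = 0                 # pagination: index of the first result
    limit: int = 10                 # pagination: maximum results per page


@dataclass(frozen=True)
class SearchResult:
    url: str
    title: str
    language: str


@dataclass(frozen=True)
class SearchResponse:
    results: List[SearchResult]
    total_matches: int


def search(request: SearchRequest, corpus: List[SearchResult]) -> SearchResponse:
    """Toy implementation: match every query term against the title."""
    terms = request.query.lower().split()
    matches = [
        doc for doc in corpus
        if all(term in doc.title.lower() for term in terms)
        and (request.language is None or doc.language == request.language)
    ]
    page = matches[request.offset:request.offset + request.limit]
    return SearchResponse(results=page, total_matches=len(matches))
```

Writing the request and response out this way makes it easier to check each functional requirement (language filtering, pagination, and so on) against an actual parameter.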
Data Model
This is mostly about specifying all the data entities you want to keep track of and writing their relationships in Third Normal Form (3NF). Third Normal Form was developed for relational databases to avoid common problems like data duplication and data anomalies.
Again, your data model must have enough information to meet all of your functional requirements. The data model tends to be the simplest section since it’s mostly about identifying the data entities and their relationships (1:1, 1:N, N:M, etc.).
The data model doesn’t require you to commit to a particular type of database, such as a relational database or a key-value store.
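As a database-neutral illustration, here is a minimal sketch of a normalized data model using plain Python dataclasses. The entities (`User`, `Post`, `Tag`, `PostTag`) are hypothetical and not tied to any particular interview question. They are only there to show a 1:N relationship and an N:M relationship resolved through a junction entity.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class User:
    user_id: int
    name: str


@dataclass(frozen=True)
class Post:
    post_id: int
    author_id: int  # 1:N, one User has many Posts
    body: str


@dataclass(frozen=True)
class Tag:
    tag_id: int
    label: str


@dataclass(frozen=True)
class PostTag:
    # Junction entity for the N:M relationship between Post and Tag.
    post_id: int
    tag_id: int


def tags_for_post(post_id: int, post_tags: List[PostTag], tags: List[Tag]) -> List[str]:
    """Resolve a post's tag labels by following the junction entity."""
    labels_by_id = {tag.tag_id: tag.label for tag in tags}
    return [labels_by_id[pt.tag_id] for pt in post_tags if pt.post_id == post_id]
```

Each fact lives in exactly one place: a tag's label is stored once on `Tag`, so renaming it never requires touching posts.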
Deep dive
There is typically some important design decision that is not covered in the other sections. If you are designing a rate limiter, what rate limiting algorithm will you use (e.g. leaky bucket, token bucket, sliding window, fixed window)? If you are implementing search, what indexing structure will you use (inverted index, prefix tree, quad tree, etc.)?
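To make one of the indexing options concrete, here is a minimal sketch of an inverted index in Python. It assumes whitespace tokenization, lowercase matching, and AND semantics across query terms. The function names `build_inverted_index` and `lookup` are hypothetical, and a real search engine would do far more (stemming, ranking, compression).

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each term to the set of document ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def lookup(index: Dict[str, Set[int]], query: str) -> List[int]:
    """Return sorted ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # .get avoids creating empty entries for unknown terms.
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


docs = {1: "fresh bread", 2: "bread and butter", 3: "butter"}
index = build_inverted_index(docs)
print(lookup(index, "bread butter"))  # [2]
```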
Most system design questions tend to have one very important design decision, and that tends to be the focus of this section. The interviewer will often probe for more and more detail about how the implementation works to test the limit of your knowledge. Details about the implementation often have important performance implications. For example, can you cache your indexing structure in RAM or do you need to use disk/object storage? In a global service with multiple data centers, do you have a single global rate limiter or one at each data center? And where in your architecture do you even put your rate limiter?
This section may also include your choice of database: relational, key-value, wide-column, etc. Different database types have different performance properties, and you are expected to pick the one which best meets your expected data access patterns. For example, if you have a high ratio of writes to reads, you might choose a wide-column store. If you need ACID properties, you might choose a relational database.
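As an example of the level of detail an interviewer might probe for, here is a minimal single-process sketch of a token bucket, the token-based rate limiting option mentioned above. The class name `TokenBucket` and its parameters are hypothetical. The sketch is not thread-safe and says nothing about the harder distributed question of sharing state across servers or data centers.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `refill_per_sec` tokens/second."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._clock = clock          # injectable so the limiter can be tested
        self._tokens = capacity      # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the elapsed time, but never above capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_sec)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# Usage: at most 5 requests in a burst, 1 request/second sustained.
limiter = TokenBucket(capacity=5, refill_per_sec=1)
print(limiter.allow())
```

The state is just two numbers per client (token count and last refill time), which is part of why this algorithm is a common answer: it fits easily in an in-memory cache.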
System architecture
System architecture is the section that people typically associate with system design. This is usually the final section. You will be asked to draw a diagram of system components: typically microservices and undifferentiated cloud services. The system components are expected to have arrows designating either the flow of information or interactions between separate components.
Your microservices are typically expected to represent specific tasks that your system performs. It is broadly similar to class design except services are a higher level of abstraction. Ideally each microservice will do one specific task very well and is allowed to communicate with other microservices or components.
The undifferentiated cloud services are things like load balancers, CDNs, caches, object storage, API Gateways, message queues, etc. They are fundamental building blocks of any distributed service and things that are so commonly used in internet architectures that you can just leverage existing solutions instead of trying to design your own. The only exception to using undifferentiated services is when you are building the thing. If you are building a message queue you of course can’t use AWS Simple Queue Service.
Again, your system architecture must meet your functional requirements. That part is easy. But it must also meet your non-functional requirements. If you have high scale, you may need your services to scale horizontally, as individual hardware components aren’t performant enough on their own. If you need low latency, you may need to pre-compute data, make use of caching, or use an indexing structure for fast data access. If you expect spiky traffic patterns and need high availability, you should make use of message queues for services with relaxed latency requirements.
What makes system design challenging is that the design space is large and the accepted design is the one that makes what are considered the best tradeoffs. Everything in system design is about trading off important properties (e.g. consistency, latency, availability, etc.), as there is rarely a single design that maximizes every desirable property simultaneously.
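To see why a message queue helps with spiky traffic, here is a minimal simulation in Python. It assumes discrete time ticks and a consumer with a fixed processing capacity per tick. The function name `queue_depths` and the numbers are hypothetical, and a real queue would be a separate durable service rather than an in-process `deque`.

```python
from collections import deque
from typing import List


def queue_depths(arrivals_per_tick: List[int], capacity_per_tick: int) -> List[int]:
    """Return the backlog left in the queue after each tick."""
    queue = deque()
    depths = []
    for tick, arrivals in enumerate(arrivals_per_tick):
        # Producers enqueue immediately, no matter how busy the consumer is.
        queue.extend((tick, i) for i in range(arrivals))
        # The consumer drains at its own steady pace.
        for _ in range(min(capacity_per_tick, len(queue))):
            queue.popleft()
        depths.append(len(queue))
    return depths


# A burst of 5 messages against a consumer that handles 2 per tick.
print(queue_depths([5, 0, 0], capacity_per_tick=2))  # [3, 1, 0]
```

The burst is absorbed as a temporary backlog instead of dropped requests. The price is added latency for queued work, which is why this only suits services with relaxed latency requirements.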
Your interviewer will typically ask you to explain the request flow for a given scenario. This is to verify that your system meets its functional requirements and you understand how the components interact. Sometimes it is to draw your attention to a flaw in your design that you are expected to recognize with enough probing. They will often ask about optimizations, or probe you to consider alternative designs by asking about edge cases or additional requirements.