Genius as a search problem

Great men and women of history are a subject of fascination. When their greatness is due to the quality, novelty, and impact of their ideas, they are often called geniuses. Limiting the concept of genius to academic, research, and engineering pursuits, we can define genius as being the first to seriously explore high-impact ideas.

Ideas don’t exist in isolation and are connected to other ideas via reasoning such as deduction, inference, intuition, etc. In this sense we can envision the world of ideas as a large graph. Nodes are ideas. Edges are the transition of one idea to another in a chain of thought via some reasoning mechanism. Not all nodes need to be true ideas, and not all edges need to be logically consistent transitions. But out there in the vast universe of idea space are ideas yet unexplored which have great consequence at an individual scale and at societal scale.

Now we can think of genius as a search problem. Historical geniuses achieve the title in retrospect for having seriously explored a cluster of idea space that was previously unknown and highly impactful.

Search problems in graphs are a familiar topic in computer science and mathematics and have been studied for many decades, if not centuries. We can reason a bit about the factors that increase the chances of exploring these new frontiers of idea space:

  1. Where you start in idea space.
  2. The search process.
  3. The total amount of idea space explored.

These aspects are multi-dimensional (more than one thing can influence each of them). Some of this may depend on chance, creativity, or innate ability, and exploring idea space in a way of historic significance perhaps requires that. However, there are still factors within the control of the everyman. A non-exhaustive list might include:

  1. You can choose where to start.
  2. You can choose the effort or persistence you put into exploring idea space.
  3. You can choose the tools available to explore idea space.

If the goal were to explore beyond the frontier of idea space, you might start on a problem that nobody has seriously explored, perhaps because it’s not thought to be possible, practical, or profitable. Hard work and persistence give you the best chance of exploring the most you can in the limited time you have. The tools you might use may be things like new technologies, existing technologies used in new ways, new data, new theories, etc. The tools you use can inform your search process by allowing you to traverse edges in idea space that might not otherwise be possible.
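
To make the framing concrete, here's a toy sketch in Python (the graph, impact scores, and budget are entirely illustrative, and real impact is of course unknowable in advance): exploring an idea graph breadth-first from a chosen starting point with a bounded budget of effort.

from collections import deque

def explore(idea_graph, start, budget, impact):
    # idea_graph: dict mapping an idea to ideas reachable by one reasoning step
    # start: where you choose to begin; budget: how many ideas you can visit
    # impact: assumed impact score per idea (illustrative only)
    seen, frontier = {start}, deque([start])
    best_idea, best_impact = start, impact.get(start, 0)
    visits = 0
    while frontier and visits < budget:
        idea = frontier.popleft()
        visits += 1
        if impact.get(idea, 0) > best_impact:
            best_idea, best_impact = idea, impact.get(idea, 0)
        for neighbor in idea_graph.get(idea, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append(neighbor)
    return best_idea, best_impact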

If this search-problem framing is correct, it naturally suggests the pitfalls to avoid. Such a scenario might look like starting on a high-visibility problem that is already thoroughly explored (e.g. I’ve heard todo list apps and dating apps are very common software startups). The problem may have no or weak personal significance, making it hard to persist after so many failures, when nobody believes in you and there is little financial incentive to keep trying. And there may be no differentiator in the toolset used compared to those who have come before you and failed.

I think it’s likely difficult to know where the frontier of idea space lies. For all we know the frontier may be all around us, limited only by the capabilities of the human brain, everyone one super-human deduction away from something extraordinary. Who could have guessed that classical blackbody radiation would be so close to the frontier of quantum mechanics, or number theory so close to the frontier of cryptography? That said, the most predictable frontiers are likely where the problems are. Problems themselves represent a lack of adequate solutions. Either the laws of physics define the boundary of what a solution might look like, discoverable in idea space, or the solutions exist, also discoverable in idea space.

There’s much more that can be written about this, but my goal was to articulate impactful ideation as a search problem. If there is one inference that I think should be made, it’s that we should think strategically about how we might push past existing frontiers in idea space to solve problems. The gap between any problem and its solution is just a collection of paths in idea space.

What is the system design interview?

Introduction

What frustrates me the most about the system design interview is how poorly specified it is. Every definition of it I have seen is circular: the system design interview is one where you must design a real-world system at internet scale.

Any questions?

Poorly specifying the format of this interview makes it very difficult to prepare. You are simply left to review sample solutions, notice a pattern, and then hopefully imitate the underlying structure correctly in real interviews. Whoever prepared the sample solutions is probably also operating according to the same guidance, and the result is a recursive series of imitations diluting any educational value.

The system design interview requires the interviewee to provide a draft of a minimum viable product which generally covers the following areas:

  • requirements gathering
  • estimations
  • APIs
  • data model
  • deep dive
  • system architecture

In addition, the interviewer will also try to assess the depth of your technical knowledge and your ability to recognize tradeoffs.

The goal of the interview is to provide breadth across the important aspects of system design while leaving time for the interviewer to probe in depth on particular topics of their choice. For each section that involves design, there are almost always tradeoffs for each approach. You are to identify each approach and its tradeoffs, and pick the best option for your requirements.

When new systems are built, similar system design documents are created. The main difference is that real system design documents tend to include a lot more detail, because they are written in more than 45 minutes and they are reviewed by a team of managers and engineers who will be responsible for implementing them. As system design documents tend to be created by more senior engineers leading projects, this interview tends to become more important for more senior engineering positions.

Details on each section

Requirements gathering

The first thing to understand about the system design interview is that there is no one-shot prompt. In a coding interview, the entire question can be specified in a single prompt and there is no actual need to interact with the interviewer to complete the question. This is partly why it’s easier to practice coding: you can practice coding in isolation.

System design starts with a vague prompt (e.g. “design google”) and requires interaction with the interviewer to scope out some minimal set of functional requirements. Functional requirements are exactly what the system is expected to do and no more. When you scope out the requirements, you are expected to take a proactive approach. It is for this reason that system design typically covers services you are already familiar with, often consumer products. Taking a proactive approach means you propose the functional requirements and only let the interviewer correct you on the details or provide guidance on what is and isn’t out of scope.

The reason for the interactive nature of system design is that it simulates the real-world requirements-gathering process with customers and stakeholders. The more senior your position, the more scrutiny any committee will pay to your requirements gathering. Your functional requirements must completely determine system behavior, and all the remaining parts of this interview are determined by this section.

Functional requirements can get incredibly detailed. For example, if you are designing a search engine:

  • what results are you returning? Images, web pages, documents, emails, something else?
  • what character encoding are you working with (ascii, unicode, something else)?
  • should you restrict your searches by language? (e.g. “pain” means “bread” in French)
  • how do you handle similar characters like ï and i?
  • how should you rank the results? By relevancy? Chronologically?
  • do you have bandwidth constrained users? If so should you not return images?
  • If you are returning images, what image formats do you need to support?
  • should results be filtered by some property, like language or feature (e.g. restaurants)?
  • should returned results be truncated or compressed in some way?

That said, functional requirements are not what makes the system design interview challenging. Ultimately the functional requirements could be satisfied by a set of APIs on a single web server with a database. The non-functional requirements are what make the interview challenging. Non-functional requirements are like metadata attributes describing how the functional requirements are accomplished: low latency, high availability, scale, consistency requirements, data residency requirements, users with constrained bandwidth, security, etc. These non-functional requirements demand a degree of optimization that is not intuitive or trivial to implement.

Estimates

Estimates are the least important section of system design, and sometimes the interviewer is OK with skipping them entirely. For estimates you are to calculate some important performance requirements of the system, using some guesswork for unknown variables. The most important performance properties tend to be:

  • requests per second (may be divided into reads / writes per second)
  • inbound bandwidth per second
  • outbound bandwidth per second
  • storage per year

Sometimes the interviewer will ask you to estimate some hardware requirements, such as how many servers you need, how much RAM you need, etc.

You will generally compute these estimates with Fermi calculations. The interviewer just wants to know that you are capable of making reasonable system estimates and it’s important not to dwell too much on this section in the interest of time. Almost regardless of the numbers you compute, your system design will be the same.
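
As a concrete illustration, here's what the arithmetic might look like for a hypothetical photo-sharing service (every number below is an assumption invented for the example):

SECONDS_PER_DAY = 86_400

daily_active_users = 100_000_000            # assumed
uploads_per_user_per_day = 2                # assumed
reads_per_write = 100                       # assumed read-heavy workload
avg_photo_bytes = 2 * 1024**2               # assume ~2 MB per photo

writes_per_sec = daily_active_users * uploads_per_user_per_day / SECONDS_PER_DAY
reads_per_sec = writes_per_sec * reads_per_write
inbound_bw = writes_per_sec * avg_photo_bytes     # bytes/sec uploaded
outbound_bw = reads_per_sec * avg_photo_bytes     # bytes/sec served
storage_per_year = daily_active_users * uploads_per_user_per_day * avg_photo_bytes * 365

print(f"~{writes_per_sec:,.0f} writes/s, ~{reads_per_sec:,.0f} reads/s")
print(f"~{inbound_bw / 1e9:.1f} GB/s in, ~{outbound_bw / 1e9:.0f} GB/s out")
print(f"~{storage_per_year / 1e15:.0f} PB of new storage per year")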

APIs

APIs are application programming interfaces, and they are how your clients programmatically interact with your system. You will need to specify what kinds of APIs you provide (a library, a web service, something else), and the APIs themselves. Each API will have some set of inputs and outputs.

The important thing to keep in mind is that your APIs must totally satisfy your functional requirements. It is not enough to specify just the most commonly called APIs; they must all be there. The parameters must also be complete.

That said, authentication and user signup are typically out of scope for a set of APIs.
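
For example, sticking with the search engine prompt from the requirements section, a first pass at the APIs might look like the sketch below (the names, parameters, and types are all my own invention for illustration):

from dataclasses import dataclass
from typing import Optional

@dataclass
class SearchResult:
    url: str
    title: str
    snippet: str

def search(query: str,
           language: Optional[str] = None,    # optional language filter
           result_type: str = "web",          # "web", "image", "news", ...
           page: int = 1,
           page_size: int = 10) -> list[SearchResult]:
    """Return one ranked page of results for the query."""
    ...

def suggest(prefix: str, limit: int = 5) -> list[str]:
    """Return query completions for a partial query."""
    ...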

Data Model

This is mostly about specifying all the data entities you want to keep track of and writing out their relationships in Third Normal Form (3NF). Third Normal Form was developed for relational databases to avoid common problems like data duplication and data anomalies.

Again, your data model must have enough information to meet all of your functional requirements. The data model tends to be the simplest section, since it’s mostly about identifying the data entities and their relationships (1:1, 1:N, N:M, etc.).

The data model doesn’t require you to commit to a particular database type, such as a relational or key-value store.
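
To make this concrete, a minimal data model for a hypothetical photo-sharing service (entities and fields invented for illustration) might capture a 1:N relationship between users and photos and an N:M relationship between users and the photos they like:

from dataclasses import dataclass
from datetime import datetime

@dataclass
class User:
    user_id: int            # primary key
    username: str
    email: str

@dataclass
class Photo:
    photo_id: int           # primary key
    owner_id: int           # references User (1:N, one user owns many photos)
    storage_url: str
    created_at: datetime

@dataclass
class Like:                 # join entity for the N:M user-likes-photo relationship
    user_id: int            # references User
    photo_id: int           # references Photo
    liked_at: datetime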

Deep dive

There is typically some important design decision that is not covered in the other sections. If you are designing a rate limiter, what rate limiting algorithm will you use (e.g. leaky bucket, token bucket, sliding window, fixed window)? If you are implementing search, what indexing structure will you use (inverted index, prefix tree, quad tree, etc.)?

Most system design questions tend to have one very important design decision, and that tends to be the focus of this section. The interviewer will often probe for more and more detail about how the implementation works to test the limits of your knowledge. Details about the implementation often have important performance implications. For example, can you cache your indexing structure in RAM or do you need to use disk/object storage? In a global service with multiple data centers, do you have a single global rate limiter or one at each data center? And where in your architecture do you even put your rate limiter?

This section may also include your choice of database: relational, key-value, wide-column, etc. Different database types have different performance properties, and you are expected to pick the one that best meets your expected data access patterns. For example, if you have a high ratio of writes to reads, you might choose a wide-column store. If you need ACID properties, you might choose a relational database.
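
As an example of the level of detail a deep dive can reach, here's a minimal sketch of one of the algorithms mentioned above, a token bucket rate limiter (the parameters are illustrative, and a production version would also have to deal with distributed state and concurrency):

import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate                  # tokens added per second
        self.capacity = capacity          # maximum burst size
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # refill in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1              # spend one token on this request
            return True
        return False                      # out of tokens: reject or queue

limiter = TokenBucket(rate=5, capacity=10)    # ~5 requests/sec, bursts up to 10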

System architecture

System architecture is the section that people typically associate with system design. This is usually the final section. You will be asked to draw a diagram of system components: typically microservices and undifferentiated cloud services. The system components are expected to have arrows designating either the flow of information or interactions between separate components.

Your microservices are typically expected to represent specific tasks that your system performs. It is broadly similar to class design except services are a higher level of abstraction. Ideally each microservice will do one specific task very well and is allowed to communicate with other microservices or components.

The undifferentiated cloud services are things like load balancers, CDNs, caches, object storage, API Gateways, message queues, etc. They are fundamental building blocks of any distributed service and things that are so commonly used in internet architectures that you can just leverage existing solutions instead of trying to design your own. The only exception to using undifferentiated services is when you are building the thing. If you are building a message queue you of course can’t use AWS Simple Queue Service.

Again, your system architecture must meet your functional requirements. That part is easy. But it must also meet your non-functional requirements. If you have high scale, you may need your services to scale horizontally because individual hardware components aren’t performant enough on their own. If you need low latency, you may need to pre-compute data, make use of caching, or use an indexing structure for fast data access. If you expect spiky traffic patterns and need high availability, you should make use of message queues for services with relaxed latency requirements.

What makes system design challenging is that the design space is large, and the accepted design is the one judged to make the best tradeoffs. Everything in system design comes down to tradeoffs among important properties (e.g. consistency, latency, availability), as there is rarely a single design that maximizes every desirable property simultaneously.

Your interviewer will typically ask you to explain the request flow for a given scenario. This is to verify that your system meets its functional requirements and you understand how the components interact. Sometimes it is to draw your attention to a flaw in your design that you are expected to recognize with enough probing. They will often ask about optimizations, or probe you to consider alternative designs by asking about edge cases or additional requirements.

Exponential trends for fun and profit

If there were one decision point to inform career and investment choices, I would recommend understanding contemporary exponential trends, or technology trends. By exponential trends I am mostly referring to changes generally produced by new technology, be it the size of a user base, production/service costs, or something else. Making the right decision here could be the difference between starting your career at Blockbuster or Netflix in 2002. Or investing in Sears rather than Amazon in 2002.

Technology innovation is often a precursor to major historical events that remake social and economic modes of organization. Imagine how different the world would look without the combustion engine, gunpowder, steel, computers, the printing press, and the internet. The introduction of a new technology alone can reshape economic futures. Whaling used to be a major market because blubber was used for lamps, but that entire profession disappeared because of the use of petroleum. Then the entire industries around oil lamps disappeared because of the introduction of electricity and the incandescent light bulb. Even then, incandescent light bulb manufacturers faced economic peril if they didn’t eventually migrate to manufacturing more energy efficient bulbs like LEDs. Each innovation created and destroyed fortunes, though at each transition more value was created than lost as the cost structure improved.

When invention first ushers in the beginning of disruption, technology adoption tends to happen more slowly than people expect. The birth of the internet is debatable depending on how we define it, but let’s say 1990. Even in 2022, only about 60% of the world’s population has internet access. I think a good rule of thumb today is about 30 years to go from inception to mass adoption (maybe 60-80%).

Not captured by adoption curves are technology disruption cycles. Print advertising like newspapers and magazines has been disrupted by online advertising like Facebook and Google. For print advertisers, the years 2000-2010 erased the prior 50 years of revenue gains. For each technology adoption cycle there is some complementary technology disruption cycle, as each new technology is ultimately replacing some less efficient solution. For example, Uber’s rise came at the expense of taxis and Airbnb’s rise came at the expense of hotels.

I don’t have good data on the length of technology disruption cycles but I’ll estimate 10-20 years. Disruption cycles tend to be shorter than adoption cycles because adoption cycles tend to create new markets, in addition to disrupting existing ones.

There are many active technology adoption cycles ongoing today. You are doubtlessly part of this either directly by using some emerging consumer technology (e.g. online dating apps) or indirectly by prioritizing cost/performance in your purchase decisions (e.g. buying a phone with a long battery life). Exponential trends tend to precede widespread technology adoption or act as precursors to other forms of technology adoption. The increasing ability to send more and more data over the internet (bandwidth), observable in the 90s, was a harbinger for the eventual spread of high-bandwidth internet applications such as streaming video and video conferencing.

So if we want to identify future technology adoption cycles, we can just identify existing exponential trends. Then we can reason about where these exponential trends might lead us.

A non-exhaustive list of exponential trends today:

Moore’s law

I mention Moore’s law because it’s the most well-known example of an exponential trend, even though it’s mostly no longer true today. It concerns the number of transistors that can be placed on a computer chip such as a CPU. More transistors = more compute. For decades computers were getting so fast that old computers were almost immediately obsoleted in compute power. This pattern eventually tapered off due to physical limits. More transistors = more power dissipated = more heat. If the computer chips get too hot, the transistors get damaged. So heat dissipation prevented making computers faster by just making transistors smaller and stuffing more of them into the same chip.

Nielsen’s law

Nielsen’s law concerns the bandwidth, or the rate of data transmission, of an average consumer internet connection. With low bandwidth it might only make sense to send text, so applications like email. With more bandwidth you might be able to download/upload images and maybe short videos (social media). With more bandwidth still you could stream long-form video (YouTube, Netflix, and Zoom). With even more bandwidth you could stream high-bandwidth video in real time (think VR). This progression is predictable, and it will end when physical limits are hit in how much data can be streamed over fiber optic cables, but we are nowhere near this limit.

Huang’s law

GPUs are computing devices that are really efficient at parallelization. There are a few applications for such kinds of compute: gaming, deep learning, and scientific computing. Of particular importance is deep learning, which you might know as artificial intelligence. As GPUs get exponentially more efficient at a fixed price point, the cost to train a deep learning model decreases exponentially. That is, artificial intelligence training is largely GPU-compute bound. Unless physical limits are hit soon, you can expect artificial intelligence to become dirt cheap.

Energy density of Lithium Ion Batteries

Li-Ion batteries are what power your smartphone and your laptop. They’re the predominant battery type for consumer electronics and electric vehicles. Higher energy density means smaller batteries. Smaller batteries mean lighter batteries, or if you prefer, longer battery life at a constant size. Unless this trend hits physical limits soon, we can expect such things as:

  • phones you charge once a week
  • cars you charge once a week
  • consumer drones that can fly for hours at a time (currently flight time is up to 20 min)
  • potentially electric planes

Decrease in storage costs

HDDs and SSDs are two types of computer storage drives. HDDs are older and use a spinning magnetic disk. SSDs are newer and have no moving parts. As storage becomes cheaper, we can afford to store more things for the same cost. You might have used this trend to predict the rise of cloud storage (think of saving your phone’s images on Apple’s servers). You could predict the rise in telemetry (monitoring some data point such as temperature) and consequently the rise in big data (analytics on the vast quantities of historical telemetry/data). As this trend continues, you can expect to see longer historical record keeping, such as car dashcams never deleting data. And consequently this high-quality data can be used for other purposes, such as training self-driving cars. This trend is analogous to the storage trends that followed the decreasing cost of paper.

Growth in money supply from central banking

Today’s national currencies come from central banks, like the Federal Reserve. Central banking is a relatively new phenomenon in human history, only being widespread for the last 100 years or so. With central banking, an institution actually has the ability to print money. As history proves, governments with the ability to print money tend to operate their economies on credit and are rarely stingy. There are many interpretations you can make from an exponentially increasing money supply, and one of them is that the exchange rate for less inflationary assets like stocks or gold will increase in nominal value.

Cost to send a kg into space

Space is a pretty expensive place to send materials. The main things we send into space today are satellites (think GPS or weather satellites). As costs decrease, we can send more material. That might mean bigger, more sophisticated satellites. Or it could mean a new form of travel even faster than air travel. The longest flight in the world is New York to Singapore at 18 hours 30 minutes. With a rocket this could be done in less than an hour. It’s technologically possible today, just too expensive for anyone to do. That is, until it isn’t. Cheap transport of materials into space opens up new opportunities for high-precision manufacturing (dust in the air can result in impurities) and research I can only imagine.

Cost of DNA sequencing

Large classes of disease are hereditary. A DNA test could identify genetic diseases immediately, or responses to certain drug classes, or the propensity to develop certain diseases later in life (e.g. diabetes, cardiovascular disease, addiction). But before we can even get to that point, there needs to be more genomic research, and that’s only possible with more data (DNA sequencing of patients with known diseases). For both genomics research and clinical practice, cheap DNA sequencing is needed. DNA sequencing is currently around the $600 price range, which is still not economical at scale. That said, these falling costs could be used to predict the rise of companies like 23andMe, which do partial DNA sequencing.

Final words

As I said, this is a non-exhaustive list of trends and I’ve only chosen some of my favorite examples. Even within one trend there are often many dimensions. Li-Ion batteries aren’t just becoming more energy dense, they’re also becoming cheaper with better performance characteristics like recharging speed. There are many important trends I’ve probably missed that should have made my list too. And these trends need not be purely about technology. You could just as well make a chart of fentanyl overdoses and predict its emergence as a national issue years ahead of schedule. These trends need not be exponential, either, but exponential trends naturally tend to have the largest impact.

There were a huge number of historical exponential trends I could identify even with cursory research. Imagine the decreasing cost of rail transport, or the decreasing accident rates of elevators and airplanes and how they enabled skyscrapers and air travel, or the decreasing cost of lighting. There are simply too many to list and it would only be scratching the surface. The point being that these trends are everywhere, all the time, including the moment you read this sentence. The exact exponential trends will change but the cycle of technology adoption changing economic and social modes of organization will continue. Identifying technology adoption cycles ahead of schedule by monitoring exponential trends can be hugely beneficial for fun and profit.

What differentiates a really good doctor

I’ve seen a lot of doctors in my time. What makes my experience different is the lengths I went to source them. I’ve travelled to different states, countries, and continents to see them. Sometimes they needed to be the best I could find, for any price and any distance. I’ve done this several times for different problem domains. As you can imagine, there’s a lot of diversity in how good they were, or to put it a different way, how useful they were. These meetings were usually just to discuss a single service. Could they do it, or not?

The best doctors were able to render services outside of something called the standard of care. The standard of care is just medical consensus for how doctors should approach various problems, and it tends to lag leading medical research by 20+ years. There’s a joke in medical school that 50% of what’s in medical texts (read: standard of care) is wrong, but we just don’t know which parts. Well, the best doctors tend to be medical researchers and they know which parts are wrong in their research domain. Instead of administering existing treatments they administer experimental ones. They try things that nobody else will, so they get results that nobody else gets. This could mean a curative treatment, or a drastic change in some important outcome (e.g. mortality rate).

Average doctors follow the standard of care too closely to get above average or below average results. The smarter they are, the more consistently average the results. At best they may have earlier awareness of updates to the standard of care, but they can never go beyond it. If they happen to also be medical researchers, then they’re working on marginal improvements to the standard of care. Questions like whether to administer 4 or 5 days of antibiotics just don’t have the same impact as limb reconstruction or nerve decompression. They may know of more impactful and interesting research problems but deliberately avoid the risk and uncertainty that comes with them. And sometimes they just happen to be overly optimistic about the accuracy of the standard of care, or perhaps not knowledgeable enough to confidently challenge it. They feel maybe other fields of medicine have made obvious, backwards mistakes, but probably not their field in this day and age.

The reason average doctors exist and are useful is that dumb patients also exist. They’re the people who naturally aren’t familiar with medicine, and also lack good intuition. If they flout the standard of care they often choose a much worse treatment option, like homeopathy for a serious, progressive disease. Sometimes they can get things right, but only because the standard of care got it wrong. It turns out that if the standard of care is wrong, it’s not hard for a patient to beat it. An underappreciated aspect of being a patient is that you have much better data quality than doctors do. Doctors read about symptoms from books, but patients experience them. If you’re a patient and you stumble on an effective treatment, you’ll know. But as a doctor, do you really believe the guy who can’t tell the difference between a ligament and a tendon?

By the existence of dumb patients you might also guess the existence of smart patients. They’re the most competent of all. They have the trifecta of high data quality, medical knowledge, and good intuition. Because they experience the actual pathology, it’s 10x easier for them to identify medical knowledge that’s wrong. Is condition X really caused by stress or lack of sleep? If Y is the cause then Z should also be true, but it’s not. Patients also have 10x more confidence in data quality. Consider the concept of sun sneezing (aka the photic sneeze reflex). It’s such a bizarre concept, that exposure to sunlight can trigger a sneeze reflex. As a patient you can be assured it’s a real thing. Maybe it’s happened dozens of times. But as a doctor hearing this from a patient? I’m not sure it’d be considered credible without independent confirmation, and even then you’d wonder if it was true. Lastly, patients don’t experience conflicts of interest, or have legal concerns, etc. Doctors may say they act in the best interests of the patient, but they quietly practice defensive medicine and accept grant money from pharmaceutical companies. A doctor’s judgment can be easily swayed by external forces, but much less so a patient’s.

In summary, what differentiates the really good doctors just comes down to the knowledge and judgment to selectively abandon the standard of care and try something better. Effective medical researchers sometimes fall into this category, and so do smart patients. By doing what nobody else does, they can get results nobody else can get. Sometimes these results are exceptional. Doctors who rigorously adhere to the standard of care get approximately the same results as any other doctor employing the same strategy. Often there is good reason to adhere to the standard of care, but when it’s wrong, the outcomes will never be exceptional. Dumb patients exercise independent judgment like smart patients but lack the knowledge and intuition to outperform the standard of care. By doing what nobody else does, they can also get results nobody else can get. Sometimes these results are poor.

When you don’t know what you don’t know

There’s a certain class of problems that kind of fascinates me and I’d describe them as not knowing what you don’t know. You could say it’s involuntary ignorance or unawareness of common knowledge. It’s something that you don’t know, that everybody else seems to know, and because everybody else seems to know it they assume you couldn’t possibly not know it. If I had to list the features of the problem it’d look like this:

  • Very hard to discover the solution
  • Once you discover the solution, fixing the problem is usually trivial
  • The solutions are always obvious in retrospect
  • The solutions are already widely known by others

I have several examples of this happening in my life. The first and best example is from when I was 10. In school I was failing history tests but doing well in every other subject. A poor grade would be something like < 75/100. I was getting scores in the 20s/100. Basically bottom 1st percentile stuff. The school was calling my parents, suggesting maybe I should be moved to a different school. The school and my parents had me IQ tested three times. I underwent a battery of tests: neurological, vision, hearing, psychological, etc.

Apparently many thousands of dollars were spent trying to find the underlying problem. The solution came from a child psychologist who figured out that I didn’t understand the concept of studying. I didn’t know I could or was supposed to prepare for these tests by repeatedly reviewing the material so I would memorize it. Nobody had told me about it and I had never done it before. I guess the subject never came up in conversations with friends, or if it did maybe I thought studying meant something different. I really have no idea how other people figured this out on their own. Naturally the solution was to get a tutor and my next score on these tests was 96/100. This issue was immediately resolved and never happened again.

A more relatable and general example is someone who mispronounces words because he learned them on his own by reading.

There’s really no doubt that this kind of problem must be pervasive in every functional area of life: social, academic, professional, health, romantic, etc. For some people, the difference between success and mediocrity may come down to unawareness of some common knowledge that plays an important functional role.

I don’t really have firm recommendations on detecting or fixing these sorts of problems. The most I can say is that in my experience the people most likely to point out these common-sense solutions are mentors, trusted friends, or hired consultants. They will generally provide you honest feedback that others wouldn’t, for risk of offending you.

A very common interviewing mistake: not practicing on generalized problems

What frustrates me about most advice with respect to interview practice is that it’s generally obvious and the target audience is people who are inexperienced. Examples of this advice are things like “make sure your interviewer knows what you’re thinking” or “don’t jump immediately into coding”. There’s no good source of advice for people who are at an intermediate level or above. One example is the subject of this article. It’s one of the lowest-hanging fruits among the issues I see at the intermediate level for the software engineering interviewee.

Before I explain this, I want to provide an example. In calculus everyone learns the concept of a derivative. Conceptually it’s the rate of change at some point. You might be shown that the derivative of x^2 is 2x^1. The formula for the derivative of x^n is n * x^(n-1). You will get so used to these questions that it becomes muscle memory. And then in a homework assignment or a test you need to find the derivative of 2^x. Suddenly you realize your mental model is broken. If the polynomial rule for a derivative doesn’t work here, then what’s the real equation for a derivative?
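
(For the curious, the resolution is the limit definition itself, of which the power rule is just one consequence:)

\frac{d}{dx} f(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}, \qquad
\frac{d}{dx} 2^x = \lim_{h \to 0} \frac{2^{x+h} - 2^x}{h} = 2^x \lim_{h \to 0} \frac{2^h - 1}{h} = 2^x \ln 2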

The problem here is a selection bias on the examples you practiced on. You practiced on easy questions because the textbook author or instructor didn’t want to distract you with the real complexity involved. The easy questions, or maybe just repetitive questions, tend to have a shortcut. You apply the shortcut and now associate the shortcut with the generalized solution. Teaching you how to apply this concept to slightly more complicated examples was “someone else’s job” or “something for you to figure out on your own”. You are left half trained and might not even know it.

This problem was the subject of my favorite book, Mathematical Methods in the Physical Sciences by Mary Boas. In the introduction, she outlined how physics departments don’t have the time to teach students the physics and the math, so they teach the physics and expect the math department to teach the math. But the math department doesn’t do this for various reasons. Her book was to teach applied mathematics at the level of rigor required in the physical sciences. It is essentially a book about more rigorous applied math education for STEM fields.

This problem also exists in a lot of other domains, such as practicing for the software engineering interview. It’s a general flaw of teaching, where repetitive and simplified examples give a distorted view of the underlying structure of the solution. In the spirit of this, I’ll provide you with real, non-trivial examples.

Let’s say you’re given an array of integers and you’re asked to find the k smallest items. You master that question on Leetcode and you interview. Now you’re asked to find the k closest 2D points to a 2D target point. They are the same question. For each question, you are given some input set of elements, and you are to find the top k of them according to some distance function. That’s it, but one has a much lower pass rate than the other.
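
One way to internalize that these are the same question is to write the top-k routine once, with the distance function as a parameter. A sketch (the function and variable names are mine):

import heapq

def k_smallest(items, k, distance):
    # Max-heap of the k best items seen so far, via negated distances.
    heap = []   # entries are (-distance, index, item); the index breaks ties
    for i, item in enumerate(items):
        heapq.heappush(heap, (-distance(item), i, item))
        if len(heap) > k:
            heapq.heappop(heap)          # evict the current worst of the k
    return [item for _, _, item in sorted(heap, reverse=True)]

# k smallest integers: the distance is the value itself
k_smallest([5, 1, 9, 3], k=2, distance=lambda x: x)                  # [1, 3]

# k closest 2D points: the distance is squared Euclidean distance to the target
target = (0, 0)
k_smallest([(1, 2), (3, 1), (0, 1)], k=2,
           distance=lambda p: (p[0] - target[0])**2 + (p[1] - target[1])**2)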

Let’s consider another example. You want to find the square root of a 64-bit double, or something really close to the square root. You might recognize that a binary search could work. You might be surprised by the number of smart, experienced people who end up suggesting a hierarchical binary search (do a binary search to find integer bounds [n, n+1] around the real square root, then a second one over doubles using those bounds), make implementation mistakes as if this binary search were over an integer array (e.g. high = mid - 1 and low = mid + 1 do not make sense in this context for adjusting bounds), or have significant implementation difficulty.

Here’s all you ever needed to know about generalized binary search: if position(x) < position(y) implies f(x) < f(y), then you can use a single binary search to get to your answer, regardless of what position() and f() mean. For a “simple” binary search over an array, position(x) is just the index of x. So this becomes i < j implies array[i] < array[j]. But for computing the square root, position(x) is just x itself, so it becomes x < y implies x^2 < y^2, which is true for non-negative numbers. And the generalized code looks like this:

while left < right:
    mid = ComputeMid(left, right)   # midpoint in whatever space you are searching
    if CanReturnEarly(mid):
        return mid
    if GoLeft(mid):                 # the answer lies to the "left" of mid
        right = mid - offset
    else:
        left = mid + offset

By overpracticing on simplified examples, some interviewees assume that offset must be 1 (which causes incorrect behavior for my question). They may not realize that offset can be zero if you don’t have to worry about convergence. They may set left/right/mid as integer types, which wouldn’t make sense for this problem. Sometimes they struggle with when they can return early, which is usually just a sign that they are so overwhelmed with new information that they struggle with something that was literally provided as part of the question.
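
Here’s what the template looks like instantiated for the square root question (a sketch under my own choices: a tolerance-based early return, an iteration cap instead of a strict left < right loop to sidestep floating-point convergence issues, and offset = 0 because the domain is continuous):

def sqrt_approx(target: float, tolerance: float = 1e-9) -> float:
    # assumes target >= 0; left/right/mid are floats because the range is continuous
    left, right = 0.0, max(1.0, target)
    for _ in range(100):                           # 100 halvings is plenty for a double
        mid = (left + right) / 2                   # ComputeMid
        if abs(mid * mid - target) <= tolerance:   # CanReturnEarly
            return mid
        if mid * mid > target:                     # GoLeft: the answer lies below mid
            right = mid                            # offset is 0
        else:
            left = mid
    return (left + right) / 2

print(sqrt_approx(2.0))   # ~1.41421356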

The general lesson here is that many of the concepts you learned aren’t specific to the data types or values you practiced with. Unfortunately, instructors/authors/teachers have a bias toward teaching you with simplified examples that hide the complexity of the actual concept they want you to understand. The tension here is that you need exposure to that complexity if you want to perform at an intermediate or higher level of proficiency.

So what can you do about this? Mostly it comes down to understanding the general structure of the solution, and knowing which part of the general solution is fixed and which parts vary with the inputs. Once you understand this, you start practicing against classes of problems as opposed to specific problems.

Fundamentals of the software engineering interview

Introduction

I’ve been interviewing software engineers professionally for almost a year now. I’ve thought a lot about why some people don’t pass interviews, why others do extremely well, and what makes a good coding/algorithm question. One thing that’s clear to me is that people who do not do well often do not understand fundamental data structures, algorithms, or coding skills. Conversely, people who do extremely well often have a much better than average grasp of the fundamental data structures, algorithms, and coding skills. This post is a non-exhaustive list of what I consider the fundamentals among these three categories. I find that virtually all coding questions require a composition of these skills and knowledge.

Data structures

  • Arrays
  • Linked lists (singly linked, doubly linked)
  • Map/dictionary
  • Priority queue / heap
  • Trees
  • Graphs
  • Stacks
  • Queues
  • Sets

Algorithms

  • Depth first search traversal
  • Breadth first search traversal
  • pre-order/post-order/in-order tree traversal
  • Greedy algorithm
  • Binary search (and generalized binary search)
  • Recursion
  • Dynamic programming
  • Quicksort
  • Mergesort
  • Backtracking
  • Topological sort
  • Union find
  • Divide and conquer

Coding skills (python specific)

  • Custom sort in one expression (sort with custom comparator)
  • enums
  • lexical scoping
  • deep copy an object
  • While loops
  • for loops (forwards and reverse iteration)
  • enumerate
  • bitwise logic (&, |, shifting, etc.)
  • array operations (index, append, extend, remove, find, slicing)
  • Dict operations (iterate keys, iterate keys+values, iterate values, add, delete, union)
  • set operations (add, remove, contains, union)
  • Create an array with range of integers m through n
  • Use every data structure with arbitrary objects
  • Create a class
  • Use class inheritance
  • Polymorphism
  • Use an abstract base class
  • Overload operators (+, -, <, >, ==, etc.)
  • Overload the __str__() function
  • Get the memory location / unique id for an object
  • Convert a string to int/double
  • All constants (max number, min number, lowercase alphabet, uppercase alphabet, numeric digits, punctuation)
  • String interpolation
  • Map/filter/any/all
  • List comprehension (in 1D, 2D, 3D, etc.)
  • Lambda functions
  • Adding type checking to function signature
  • Type checking during runtime
  • Caching utils
  • All language keywords
  • Reading from a file
  • Writing to a file
  • Convert a string to a file-like object (for testing)
  • try/catch exception handing
  • Built-in and custom exceptions
  • bit operations (and, xor, or)
  • String operations (to lowercase, is prefix, is suffix, find substring index, etc.)
  • Convert char to ascii and vice versa (e.g. ‘a’ -> 97)
  • Slicing an object (array / string)
  • Reverse an array
  • Iterate a container forward and reverse
  • Define an iterator for a custom class
  • Generate a random integer between a and b
  • Generate a random float between a and b
  • Datetime operations
  • Math operations (exponentiation, logarithm, integer division, modulus, min/max/abs/floor/ceil)
  • Swap two elements of a random access container
  • Assert statements
  • Debug logging
  • Regexes
  • Invert a min-heap to a max-heap and vice versa
  • Read user input from the console
  • Decorators
  • Parallel programming
  • Measuring time elapsed
  • Sleeping for a specified amount of time
  • Functors
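
To make a few of the items above concrete, here's a quick self-check snippet (illustrative only) covering custom sorts, enumerate, dict and set operations, list comprehensions, and turning Python's min-heap into a max-heap:

import heapq

words = ["pear", "fig", "banana"]
by_length = sorted(words, key=len)                     # custom sort in one expression
for i, w in enumerate(by_length):                      # enumerate: index + value
    print(i, w)

letter_counts = {}
for ch in "".join(words):
    letter_counts[ch] = letter_counts.get(ch, 0) + 1   # dict add/update
unique_letters = set("".join(words))                   # build a set from an iterable
squares = [n * n for n in range(3, 8)]                 # list comprehension over a range

# heapq is a min-heap; negate values to simulate a max-heap
nums = [5, 1, 9]
max_heap = [-n for n in nums]
heapq.heapify(max_heap)
largest = -heapq.heappop(max_heap)                     # 9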

Other

  • clarifying requirements
  • code quality
  • manual testing
  • creating good test cases
  • time / space tradeoffs
  • design patterns

FAANG

Interviewers at the most competitive companies often have a tendency to ask questions that I guess you would call Leetcode hard. Many Leetcode hard questions require prior knowledge of optimizations / techniques / patterns / etc. to solve in 40-45 minutes. In a sense it turns the interview into a mini competitive programming competition. The short version is that there’s an additional level of skills and knowledge beyond the fundamentals which deserves its own category and is usually discussed in guides to competitive programming.

  • Interval trees
  • Segment trees
  • 2D prefix sums
  • Sliding windows
  • Monotonic queues/stack
  • Rolling hashes
  • string processing algorithms
  • bucket sort / square root decomposition
  • https://usaco.guide/general/

Thinking patterns that make you good in physics and software

I studied physics and later went on to become a self-taught software engineer. There was a fair amount of overlap between the skills used in those two fields, namely abstract reasoning, short-term memory, and pattern matching. Naturally there were also differences, and I found that what made me good at physics didn’t translate to software. I guess you could say I am naturally a better physicist than I am a software engineer. After five years of software engineering I’ve gotten a much better sense of the thinking patterns that increase productivity. What’s interesting to me is that in each of these fields there’s one way of looking at problems that works very well, but it doesn’t translate to other fields.

The first way to explain what makes you good at physics is actually to explain what differentiates people who are really bad at physics from people who can perform at even an average level. While physics was straightforward for me, it seemed that it was often extremely challenging for other people. Sometimes I would tutor students struggling with their introductory physics classes and it was always interesting to try to follow their thought process. Many students would be given a physics problem and have absolutely no idea what to do. This happened least frequently for reciting facts and most frequently for open-ended problems.

The way to approach open ended problems in physics is to understand all the physical laws related to the problem. You might even call these “first principles”. You generally need some level of pattern-matching to compose these first principles into an equation or expression. Then reducing that expression to the answer usually requires some mathematical facility. That’s pretty much it. Sometimes you’ll be permitted to use approximations, which generally means taking a Taylor series of some expression, and discarding all but the first term. The hardest part of real physics is the math, not using “first principles” and approximations to create an expression. But it’s the first principles thinking that differentiates the skill levels for easy physics problems, because the math is designed not to be that hard.

Software is different. When I first got into software I found it extremely challenging, but not for the reasons you’d expect. Writing my first “hello world” program took many days. I had C source code but no idea how this text somehow “did something”. For some reason, the online tutorials of the day never mentioned the process of compilation or running an executable. I found many simple tasks overwhelming and frustrating, such as uploading a file. I didn’t understand how to make a file “go to another computer”. Unfortunately, the language I used to express the problem was too generic to get search results from Google, just like with my first hello world program.

Things got much easier for me once I started thinking in terms of APIs. I always understood the “what” but not the “how” of doing things in software. The API answered that question. Instead of asking the vague question of “getting it to do something, to work”, I could just ask “what’s the API?” Now when I needed to build something, once I had a sense of what components were necessary, the details just came down to figuring out what the APIs were and interfacing with them properly. This is a flexible way of looking at things because all software interfaces at its boundaries with an API: databases, networks, compilers, functions, libraries, etc. The useful thinking pattern here was to view all interactions in terms of APIs.

In conclusion, first-principles thinking works very well for physics, and thinking in terms of APIs works very well for software. These are very different ways of looking at the world. Different problem domains (e.g. law, politics, medicine) could each have one unique way of looking at the world that makes work much easier but doesn’t translate well to other problem domains.

When small differences are meaningful

My father’s office was always a mess. His desk was covered with so many documents in so many layers that there was only room to place something on the desk by piling it on top of something else. To an extent I inherited this: I also kept messy rooms and desks. I realized this was because I’d treat objects almost as “to do” lists. If I got a letter and didn’t have the time to open it, I’d place it on my desk. I intended to get to it later but didn’t always. Things eventually piled up.

At some point I wondered what would be the easiest way to keep things organized with the least effort. My guess was to leave things slightly more organized with each interaction. The best way to do this was to clean up one unrelated thing every time I had to find or place something. Over a long enough time period this would bring any arbitrarily messy space to a clean, organized one.

Another way to look at this is that the difference between clean and messy can be extremely small. It’s the guy who throws away his trash and also one piece of litter vs. the guy who doesn’t. Over a long enough time period these differences compound to create highly visible effects. In a physical sense this is about entropy. If entropy is increasing slightly for each time unit, everything will eventually be disordered. If entropy is decreasing slightly for each time unit, everything will eventually be ordered. Let’s call this entropic velocity.

So the difference between positive entropic velocity and negative entropic velocity can create highly visible, divergent, emergent effects over the right time scales. This concept probably only applies to things which are capable of accumulating state over time. If your immune system cannot fully fight off a pathogen, even if only by a little, you’re eventually going to develop serious, potentially life-threatening health problems (e.g. gangrene, sepsis). If you are extroverted and seek out meeting new people, you can grow your network extremely wide over time. If gravity slightly exceeds the outward centrifugal effect of rotation, matter can condense into a planetary object.

This phenomenon does influence some important human outcomes. If you are earning slightly more than you spend, you can accumulate wealth. If you are saving with compound interest, you can grow wealth exponentially. Over time scales of 30-50 years, the emergent effects can be profound (e.g. top 10% vs bottom 10% of wealth). There are probably other human outcomes for which this is true, and you probably want to be on the right side of entropy for them.

Preparing for your interviews — Leetcode is not enough

Introduction

The most common component of the software engineering interview is the algorithmic-style interview. For algorithm interviews, you are given a problem whose solution is expected to be written in code within 45 minutes to 1 hour. The goal is not just to write a correct solution, but to solve it with better time/space complexity than a naive solution. You are generally graded on the performance and correctness of your solution.

If you’re preparing for your interviews, the best single resource is probably Leetcode. Leetcode is a platform for practicing algorithm questions, ranked by difficulty and tagged by which companies have used them in real interviews. Like other platforms, Leetcode provides the questions and will check the correctness of your solutions according to test cases and run-time limits. Unlike other platforms (e.g. HackerRank), Leetcode’s customers are people practicing for their software engineering interviews.

In my experience being interviewed as well as interviewing others, Leetcode is a good first approximation for practice because it implicitly tests for the end goal of writing a solution that passes a battery of test cases with strict runtime requirements. However, there are many components of the software engineering interview which can greatly affect your performance, and no amount of Leetcode practice will help you grow those skills.

What Leetcode is good at

As I mentioned, Leetcode or a similar platform is perhaps the single best tool you can use. It will implicitly check for a combination of skills that are necessary to reach the end goal of writing correct code under strict runtime requirements.

Coding

You can’t solve an algorithmic question without being able to code. In this sense, Leetcode will help you answer an important question: are you capable of coding at the most basic level of proficiency?

Algorithmic Analysis

You also can’t solve an algorithmic question without having a solution of the appropriate time complexity. The test cases will often include extremely large inputs, so an O(n^2) solution will have very clear runtime differences from an O(n log n) solution. In this way, you can clearly see if your solution is timing out on test cases, which would suggest you are not solving the problem with the appropriate time complexity. If you are capable of passing all test cases, this is a sign that you have devised a solution with the appropriate time complexity.

Problem Solving

Naturally algorithm questions aren’t about memorization. They involve higher-level problem solving skills where you combine your knowledge of data structures and algorithmic analysis to come up with a general algorithm which solves the question in the appropriate time complexity.

Data Structures

Without the correct data structures you may not be able to solve problems at all. For example, there is no good replacement for priority queues / heaps. Since different data structures have different time complexities for their operations, knowledge of data structures and algorithmic analysis are complementary skills, both necessary for solving problems.

Edge Cases for Correct Inputs

Code correctness is paramount for any solution. You want your function to work properly for any appropriate input that it may receive. If a well-formed input causes incorrect behavior, then your code is incorrect. Leetcode is pretty good at including test cases that provide coverage of edge cases. If you can write code that passes Leetcode’s test cases on the first attempt, you can probably handle edge cases in your interviews.

What Leetcode is bad at

In my experience being interviewed as well as interviewing others, there are many aspects of the interview that are not covered by solving Leetcode-style problems, especially at selective companies. This is unfortunate because there is no other resource I am aware of that will help you practice these skills. These things are usually not explicit grading criteria but can nonetheless hinder your progress towards writing correct code within the time constraints.

The thing that ties together everything Leetcode is good at is that it’s easy and scalable to grade. It’s not a coincidence that Leetcode questions involve no ambiguity about correct inputs/outputs, that grading is easily quantifiable, and that everything is extremely structured. Leetcode is not testing the full range of what’s graded in an interview; rather, it tests a balance of what is important and what is easily measured. Leetcode is a business after all, and they implement the features that bring the highest ROI. I say this to underscore that there are clear reasons that Leetcode (or any other platform) exists in exactly the way it does. As you might expect, the things Leetcode doesn’t help you practice are the ones that are harder to grade through software in the way a human grades them:

Requirements Gathering

In a real interview setting the question is often deliberately underspecified. What is expected of you is to properly scope out the requirements of the problem — the inputs, outputs, what to return in special cases, and coming up with good examples to understand the question. In a professional setting these skills are extremely important because mistakes of this nature are so costly. Because this normally happens in a dialog between interviewee and interviewer, it’s not easily automated in code.

Code Quality

Code quality is hard to quantify in an algorithm but easy for humans to understand. Most code in production environments is written once and read many times. Code can often last years, or even decades until it is removed or deprecated. Writing easily understandable code is a valuable skill because there is a shared understanding that clean code is easier to maintain. There are normally many ways to write code and you want to do it in a way that minimizes complexity, state, and lines of code given the constraints of your programming language.

Testing Bad Inputs

Since writing correct code consistently in production environments is an ideal and not a reality, handling error cases is important. Occasionally you may be asked during interviews to deal with these error cases (usually bad inputs, such as malformed arguments or invalid values). You may also be asked to handle these errors with exceptions if your language provides exception handling.

Familiarity with the Language

You are expected to understand the language you are interviewing in very well. You need to know how to import the correct libraries, what libraries are available to you, what methods/properties are available for built in classes, string processing, sorting, etc. This extended list is pretty important and I may elaborate on it in another blog post. But the short version is that you need to know your language well enough to never need to look anything up, because you won’t have that opportunity in an interview anyway. And if you don’t know your language well and fail to use built-in functions or libraries, your interviewer won’t be able to disambiguate that from writing unclean code.

Manual Testing

Manual testing is how you test your code when you don’t have access to a compiler. This is an important skill because the first line of defense against introducing bugs is understanding how your code works by looking at it. Due to the high cost of bugs in production environments (in terms of downtime, degraded product quality for customers, engineering time investigating issues, etc.), your ability to find bugs on your own is highly valued. The goal of manual testing is to execute the code like a debugger would, going through the code line by line and keeping track of state. Many mistake manual testing for testing the algorithm design when it is actually about testing the algorithm implementation. This is a part of the interview where people very commonly make mistakes, and these mistakes happen when they execute their mental model of what the code is supposed to do rather than executing what the code actually says.

Whiteboarding

In an in-person interview you will likely be coding on a whiteboard and not a computer. There is no good reason for this except network effects — whiteboards were used for interviews when computers were still expensive (or company budgets were low), and now we’re all stuck with it like the QWERTY keyboard. This is regrettable for a number of reasons I outline elsewhere, but what is important to keep in mind is that you are practicing for your interviews on a computer but using a totally different medium for transcribing information in real settings. With whiteboards your transcription speed is slower, you can’t shift lines down, shift characters right, or copy-paste. You will find that whiteboards are far less efficient than keyboards and because of this you will need to practice whiteboarding in a way that minimizes the time spent making edits.

Writing Code from Scratch

When you have access to a compiler, as in many phone interviews, you will often be writing code from scratch. For languages like Java, where your solution may need to be a public static method invoked from a main function, this can be an important thing to practice. Make sure you can always write your solutions from scratch. Leetcode provides a function signature because they need to grade you against an objective API, but in a real interview setting that function signature hooked into a main function will not be provided.

Conclusions

Leetcode is a great tool for interview preparation, but it has fundamental limitations since everything is graded by code, not humans. This means Leetcode may help you build skills that are easy to test in code but will have poor coverage of skills that are hard to evaluate with code. Leetcode tends to be good at improving problem solving, algorithmic analysis, data structures, handling edge cases, and basic coding proficiency. Leetcode tends to be poor at requirements gathering, code quality, error handling, language familiarity, manual testing, whiteboarding, and coding from scratch. If you are preparing for your interviews it would be good to keep these weaknesses in mind, because you will need to consciously work on them; they are not implicitly tested by solving Leetcode-style problems.