Comparative history and my reconsideration of history

Over most of my adult life I considered the study of history to be mostly without value. Since every pursuit has an opportunity cost, the study of history always seemed secondary to something more practical: engineering, medicine, a trade, programming, and so on.

History, as educators presented it, was always a chronologically ordered sequence of events over a geographic area. You took a class on the American Revolution, saw a documentary on the Roman Empire, or read a book on the scramble for Africa. History was just a sequence of data describing largely public or documented events. Even with all the available data, it was often lacking in causality. Why did a given policy change happen at the time that it did? At times the answers are known; often they are not. Consequential choices are made, and we are left to speculate as to why. The real drivers of many historical events, such as the private experiences, communications, and thoughts of elites (those in control of strategic resources or institutions), could never truly be known. In this sense history is the study of data, and the data is incomplete.

My thoughts eventually changed as I considered how one might acquire a better understanding of phenomena with historical precedent. For example, you might wonder what causes the rise and fall of empires. It will not be enough to study one empire. You should instead study the British, Roman, Mongol, and Ottoman empires, among others, and look for trends. Maybe you are interested in something more practical, like how modern banking crises unfold. I am more interested in the structure of scientific and technological revolutions, so I would study the history of science, the industrial revolution, the history of Silicon Valley, and the history of modern technology disruption. What I’m describing is probably called comparative history.

In a sense, what changed my view of the practicality of history was a change in how I perceived it: data vs. insights. I view a single historical period as just data. It’s hard to be confident in any trends when you only have a single historical period to extrapolate from. Everything about the past was different, and maybe the way history unfolded was just a product of the legal, technological, and political situation of the times. I view multiple historical periods as enough from which to draw inferences. If you see a trend across them, there is a much stronger case that it represents something deeper and lasting about human psychology and politics at scale.

I am not sure why history is not marketed in this way, since it just seems much more practical. Focusing on narrow time periods in the hope of establishing cause-effect links between successive historical events does not sound achievable to me. History relies largely on public, not private, information, and so it will always lack the ability to truly explain why key decision makers made the choices they did. But studying multiple areas of history to determine trends seems eminently reasonable. There is then no need to understand an unknowable cause-effect chain; the focus instead becomes which trends persist across completely different periods of history that share only the constant of human psychology.

The story of composability

The importance and power of composability was something I understood much later than I should have. I call this the story of composability, and by it I mean the repeated, outsize success of composition in solving complex problems. To me this is an exception to the general rule that solutions created without a specific problem in mind are a recipe for failure, while the reverse (solutions created for a specific problem) is a recipe for success.

Composability is so powerful that it’s in its own class of concepts. Ultimately it’s defined by a set of atomic building blocks, each performing some small, narrow function. Products, services, and solutions are created by arranging these building blocks in a particular topology to create complex, emergent behavior. Not all building block sets are collectively capable of the same emergent behavior, but the most powerful ones are something called Turing complete. Turing completeness is a measure of computational power and flexibility. If a set of building blocks is Turing complete, even under a particular set of constraints, it’s very difficult to overestimate the scale and breadth of its capabilities.

Composability is all around us in various forms.

  1. If you are reading this, you are probably doing so on a computing device. Computers are fundamentally built on a set of atomic instructions called opcodes. This is the most basic foundation upon which all software runs: the same software that runs your device’s operating system and browser, the same software that runs the internet, the same software that runs the computers serving you this web page.
  2. Technology innovations all around you have been built upon prior technology innovations. We had the wheel, then the cart, then the train, then the car, etc. We had jets, then missiles, then rockets. We had wood, then coal, then gas/oil, then nuclear for energy production. We had telegraph lines, then phone lines, then internet lines. There is a topology and sequence to the emergence of technology. A more concrete example is the topology of citations for scientific discoveries: most research papers build incrementally on concepts developed elsewhere. MRI machines were built on concepts from math (Fourier analysis), physics (electromagnetism), and electrical engineering (signal processing). More advanced MR imaging like magnetic resonance neurography was then built on top of MRI.
  3. Economies are built from composition. The foundation is a human population, some level of free trade, and some level of property rights. The building blocks are naturally individuals. They can be composed together in a certain topology to form trade networks. Some of those trade networks are formalized as companies. Unformalized trade networks are usually called industries / markets. The building blocks interact through free trade or contractual agreements. For example, I give you $100, and you perform a service. Or you pay me $X per year, and I’ll come work for you at scheduled times and perform assigned tasks.

There are countless other examples I could list if given enough time. There’s DNA (the building blocks are the A, T, C, and G nucleotides, from which all living things are built); the periodic table of elements, from which all forms of matter are composed; there are Legos; neural networks (built from neurons); ecological systems (built from various species plus materials and an energy source); Magic: The Gathering is famously Turing complete; the public cloud (each AWS service is a building block); software APIs; etc.

My guess is that any system built of atomic blocks that is capable of Turing completeness is capable of more things than anyone can envision. If it’s Turing complete, it is capable of conceivably solving every computable problem. This means that any problem appropriate for that particular system can be solved by it, and nobody knows every problem out there that exists. So to bet against Turing complete systems is to bet on the non-existence of appropriate problems. Betting on the non-existence of problems is very difficult to be right about, because it also includes problems that don’t exist today. Who could have guessed in 1822 that Fourier analysis would be used in medical imaging (MRI)? Or in 1930 that computers would eventually be so useful that every person would have one in their pocket (smartphones)?

To illustrate how unintuitive and weird it is to predict the abilities of Turing complete systems, I’ll make one prediction: all medical therapies are possible strictly through genetic reprogramming, and it will replace large parts of medicine as we know it one day. We already know DNA is Turing complete, which means it is capable of any therapeutic computation. One shot in the arm could reprogram the body’s cells to cure psychiatric disease, to heal a traumatic injury with tissue loss (amputation, paraplegia, head injury), to create antibodies for a cancer, disease, or virus, to reverse senescence, to change the shape of your body for cosmetic purposes, etc. This has real-world consequences today: the medical frontier of genetic engineering may be bigger than you think.

As much of the world is built on incremental progress, a corollary is that introducing new atomic building blocks into a composable system is a formula for success. Amazon’s 2002 and 2023 letters to shareholders had exactly this thesis. In 2002 Bezos mandated that all services interact strictly through APIs, making the service a fundamental, composable building block in a service-oriented architecture. In 2023 Jassy outlined how AWS is built on fundamental primitives, and how the success of Amazon comes from creating new kinds of primitives (such as AI-based products) that let customers compose solutions appropriate for their unique problems.

While a system may already be Turing complete, higher-level constructs make complex behavior easier to assemble. A great example of this is modern programming languages. Today we don’t program in computer opcodes, but rather in human-readable programming languages with libraries. Those languages and libraries may be built on another layer, such as a virtual machine, which may be built on yet another layer, assembly language, which finally rests on the computer’s opcodes.

This is the same idea behind the value of companies, which are just higher-level constructs built on the individual efforts of people. Building a company in this sense is just economy engineering. Often this is misunderstood (e.g. “Uber reinvented taxis”) because the core innovation is a new set of APIs. With Uber, you use your phone to find the nearest “taxi”. With conventional taxis you have to be in line-of-sight of the driver and flag them down. Similar transportation services, totally different APIs. Elon Musk’s hyperloop is famously misunderstood as reinventing trains, but the intended API is again different: conventional trains move slower due to air friction, while the hyperloop (air vacuum plus magnetic levitation) can move much faster due to the absence of that friction.

There’s a lot to learn about the world by viewing it through the lens of composability. This is due to how powerful composability is. To say it’s a force of nature is not enough; it’s truly a force of physics. To view the world through this lens, you should understand the concepts of Turing completeness and APIs (application programming interfaces). You should be able to recognize when composable systems meet this computational threshold. You also have to learn to recognize atomic building blocks. Understanding APIs will allow you to recognize higher-level constructs and potentially create one yourself.

Genius as a search problem

Great men and women of history are a subject of fascination. When their greatness is due to the quality, novelty, and impact of their ideas, they are often called geniuses. Limiting the concept of genius to purely academic, research, or engineering work, we can define genius as being the first to seriously explore high-impact ideas.

Ideas don’t exist in isolation and are connected to other ideas via reasoning such as deduction, inference, intuition, etc. In this sense we can envision the world of ideas as a large graph. Nodes are ideas. Edges are the transition of one idea to another in a chain of thought via some reasoning mechanism. Not all nodes need to be true ideas, and not all edges need to be logically consistent transitions. But out there in the vast universe of idea space are ideas yet unexplored which have great consequence at an individual scale and at societal scale.

Now we can think of genius as a search problem. Historical geniuses achieve the title in retrospect for having seriously explored a cluster of idea space that was previously unknown and highly impactful.

Search problems in graphs are a familiar topic in computer science and mathematics and have been studied for many decades, if not centuries. We can reason a bit about factors that increase the chances of exploring these new frontiers of idea space:

  1. Where you start in idea space.
  2. The search process.
  3. The total amount of idea space explored.

These aspects are multi-dimensional (more than one thing can influence each). Some of this may depend on chance, creativity, or innate ability, and exploring idea space in a way of historic significance perhaps requires exactly that. However, there are still factors within the control of the everyman. A non-exhaustive list might include:

  1. You can choose where to start.
  2. You can choose the effort or persistence you put into exploring idea space.
  3. You can choose the tools available to explore idea space.

If the goal were to explore beyond the frontier of idea space, you might start on a problem that nobody has seriously explored, perhaps because it’s not thought to be possible, practical, or profitable. Hard work and persistence give you the best chance of exploring the most you can in the limited time you have. The tools you might use may be things like new technologies, existing technologies used in new ways, new data, new theories, etc. The tools you use can inform your search process by allowing you to traverse edges in idea space that might not otherwise be possible.

If this search-problem framing is correct, it naturally suggests the pitfalls to avoid. A pitfall scenario might look like starting on a high-visibility problem that has already been thoroughly explored (e.g. I’ve heard to-do list apps and dating apps are very common software startups). The problem may have no or weak personal significance, making it hard to persist after so many failures, when nobody believes in you and there is little financial incentive to keep trying. And there may be no differentiator in the toolset used compared to those who have come before you and failed.

I think it’s likely difficult to know where the frontier of idea space lies. For all we know the frontier may be all around us, limited only by the capabilities of the human brain, with everyone just one super-human deduction away from something extraordinary. Who could have guessed that classical blackbody radiation would be so close to the frontier of quantum mechanics, or number theory so close to the frontier of cryptography? That said, the most predictable frontiers are likely where the problems are. Problems themselves represent a lack of adequate solutions. Either the laws of physics define the boundary of what a solution might look like, discoverable in idea space, or the solutions exist, also discoverable in idea space.

There’s much more that could be written about this, but my goal was to articulate impactful ideation as a search problem. If there is one inference that I think should be made, it’s that we should think strategically about how we might push past existing frontiers in idea space to solve problems. The gap between any problem and its solution is just a collection of paths in idea space.

What is the system design interview?

Introduction

What frustrates me the most about the system design interview is how poorly specified it is. Every definition of it I have seen is circular: the system design interview is one where you must design a real-world system at internet scale.

Any questions?

Poorly specifying the format of this interview makes it very difficult to prepare. You are simply left to review sample solutions, notice a pattern, and then hopefully imitate the underlying structure correctly in real interviews. Whoever prepared the sample solutions is probably also operating according to the same guidance, and the result is a recursive series of imitations diluting any educational value.

The system design interview requires the interviewee to provide a draft of a minimum viable product which generally covers the following areas:

  • requirements gathering
  • estimations
  • APIs
  • data model
  • deep dive
  • system architecture

In addition, the interviewer will also try to assess the depth of your technical knowledge and your ability to recognize tradeoffs.

The goal of the interview is to provide breadth on the important aspects of system design while leaving time for the interviewer to probe in depth on particular topics of their choice. For each section that involves design, there are almost always tradeoffs for each approach. You are expected to identify the approaches, weigh their tradeoffs, and pick the best option for your requirements.

When new systems are built, similar system design documents are created. The main difference is that real system design documents tend to include a lot more detail, because they are written over more than 45 minutes and are reviewed by the team of managers and engineers who will be responsible for implementing them. As system design documents tend to be created by more senior engineers leading projects, this interview tends to become more important for more senior engineering positions.

Details on each section

Requirements gathering

The first thing to understand about the system design interview is that there is no one-shot prompt. In a coding interview, the entire question can be specified in a single prompt and there is no actual need to interact with the interviewer to complete the question. This is partly why it’s easier to practice coding: you can practice coding in isolation.

System design starts with a vague prompt (e.g. “design Google”) and requires interaction with the interviewer to scope out some minimal set of functional requirements. Functional requirements are exactly what the system is expected to do and no more. When you scope out the requirements, you are expected to take a proactive approach. It is for this reason that system design typically covers services you are already familiar with, often consumer products. Taking a proactive approach means you propose the functional requirements and only let the interviewer correct you on the details or provide guidance on what is and isn’t in scope.

The reason for the interactive nature of system design is that it simulates the real-world requirements-gathering process with customers and stakeholders. The more senior your position is, the more scrutiny any committee will pay to your requirements gathering. Your functional requirements must completely determine system behavior, and all the remaining parts of the interview are determined by this section.

Functional requirements can get incredibly detailed. For example, if you are designing a search engine:

  • what results are you returning? Images, web pages, documents, emails, something else?
  • what character encoding are you working with (ASCII, Unicode, something else)?
  • should you restrict your searches by language? (e.g. “pain” means “bread” in French)
  • how do you handle similar characters like ï and i?
  • how should you rank the results? By relevancy? Chronologically?
  • do you have bandwidth constrained users? If so should you not return images?
  • If you are returning images, what image formats do you need to support?
  • should results be filtered by some property, like language or feature (e.g. restaurants)?
  • should returned results be truncated or compressed in some way?

That said, functional requirements are not what makes the system design interview challenging. Ultimately the functional requirements could be satisfied by a set of APIs on a single web server with a database. The non-functional requirements are what make the interview challenging. Non-functional requirements are like metadata attributes describing how the functional requirements are accomplished: low latency, high availability, scale, consistency, data residency, users with constrained bandwidth, security, etc. Meeting these non-functional requirements requires optimizations that are neither intuitive nor trivial to implement.

Estimates

Estimates are the least important section of system design, and sometimes the interviewer is OK with skipping them entirely. For estimates you are to calculate some important performance requirements of the system using some guesswork for unknown variables. The most important performance properties tend to be:

  • requests per second (may be divided into reads / writes per second)
  • inbound bandwidth per second
  • outbound bandwidth per second
  • storage per year

Sometimes the interviewer will ask you to estimate some hardware requirements, such as how many servers you need, how much RAM you need, etc.

You will generally compute these estimates with Fermi calculations. The interviewer just wants to know that you are capable of making reasonable system estimates and it’s important not to dwell too much on this section in the interest of time. Almost regardless of the numbers you compute, your system design will be the same.
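
To make the arithmetic concrete, here is a minimal worked example in Python, assuming a hypothetical service with 100 million daily active users, 10 requests per user per day, and roughly 1 KB per request; every input number is an illustrative guess, not a figure from any real system.

# Illustrative Fermi estimate; all inputs below are assumptions.
daily_active_users = 100_000_000
requests_per_user_per_day = 10
avg_request_size_bytes = 1_000                 # ~1 KB
seconds_per_day = 86_400                       # often rounded to 100,000 for mental math

requests_per_second = daily_active_users * requests_per_user_per_day / seconds_per_day
inbound_bytes_per_second = requests_per_second * avg_request_size_bytes
storage_per_year_bytes = (daily_active_users * requests_per_user_per_day
                          * avg_request_size_bytes * 365)

print(f"{requests_per_second:,.0f} requests/s")              # ~11,574 requests/s
print(f"{inbound_bytes_per_second / 1e6:.1f} MB/s inbound")  # ~11.6 MB/s
print(f"{storage_per_year_bytes / 1e12:.0f} TB per year")    # ~365 TB per year

The interviewer only cares about orders of magnitude, so rounding 86,400 seconds to 100,000 for mental math is perfectly acceptable.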

APIs

APIs are application programming interfaces, and they are how your clients programmatically interact with your system. You will need to specify what kinds of APIs you provide (a library, a web service, something else), and the APIs themselves. Each API will have some set of inputs and outputs.

The important thing to keep in mind is that your APIs must totally satisfy your functional requirements. It is not enough to specify just the most commonly called APIs; they must all be there. The parameters must also be complete.

That said, authentication and user signup are typically out of scope for a set of APIs.
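
As an illustration, here is a sketch of what the API surface might look like for the search-engine example above, written as Python signatures; the names, parameters, and pagination scheme are my own assumptions rather than a canonical answer.

# Illustrative API sketch for the search-engine example; names and fields are assumptions.
from typing import Optional

def search(query: str,
           language: str = "en",
           result_type: str = "web",            # web, image, news, ...
           page_size: int = 10,
           page_token: Optional[str] = None) -> dict:
    """Returns ranked results plus a next_page_token for fetching the next page."""
    ...

def get_document(document_id: str) -> dict:
    """Fetches a single indexed document, e.g. when a result is opened."""
    ...

Whatever shape you choose, the test is simply whether every functional requirement can be satisfied by calling these APIs with the parameters you listed.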

Data Model

This is mostly about specifying all the data entities you want to keep track of and writing out their relationships in Third Normal Form (3NF). Third Normal Form was developed for relational databases to avoid common problems like data duplication and data anomalies.

Again, your data model must have enough information to meet all of your functional requirements. The data model tends to be the simplest section, since it’s mostly about identifying the data entities and their relationships (1:1, 1:N, N:M, etc.).
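
For instance, here is a minimal sketch of a data model for the search-engine example, with the N:M relationship between terms and documents resolved through a junction entity; the entities and fields are illustrative assumptions only.

# Illustrative 3NF-style data model for the search-engine example (fields are assumptions).
from dataclasses import dataclass
from typing import List

@dataclass
class Document:
    document_id: str
    url: str
    language: str
    fetched_at: str          # ISO timestamp

@dataclass
class Term:
    term_id: str
    text: str

@dataclass
class Posting:               # junction entity resolving the N:M between Term and Document
    term_id: str             # N:1 -> Term
    document_id: str         # N:1 -> Document
    positions: List[int]     # offsets where the term appears in the document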

The data model doesn’t require you to commit to a particular database type, such as relational or key-value.

Deep dive

There is typically some important design decision that is not covered in the other sections. If you are designing a rate limiter, what rate-limiting algorithm will you use (e.g. leaky bucket, token bucket, sliding window, fixed window)? If you are implementing search, what indexing structure will you use (inverted index, prefix tree, quadtree, etc.)?
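
As an example of the level of detail this section can reach, here is a minimal single-process sketch of one of the algorithms above, the token bucket; in a real design the bucket state would need to live in shared storage (or you would run one limiter per data center), which is exactly the kind of tradeoff the interviewer will push on.

import time

class TokenBucket:
    """Minimal single-process token bucket; an illustrative sketch only."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity            # maximum burst size, in tokens
        self.refill_rate = refill_rate      # tokens added per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True                     # request admitted
        return False                        # request rejected (rate limited)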

Most system design questions tend to have one very important design decision, and that tends to be the focus of this section. The interviewer will often probe for more and more detail about how the implementation works to test the limits of your knowledge. Details about the implementation often have important performance implications. For example, can you cache your indexing structure in RAM or do you need to use disk or object storage? In a global service with multiple data centers, do you have a single global rate limiter or one at each data center? And where in your architecture do you even put your rate limiter?

This section may also include your choice of database: relational, key-value, wide-column, etc. Different database types have different performance properties, and you are expected to pick the one that best matches your expected data access patterns. For example, if you have a high ratio of writes to reads, you might choose a wide-column store. If you need ACID properties, you might choose a relational database.

System architecture

System architecture is the section that people typically associate with system design. This is usually the final section. You will be asked to draw a diagram of system components: typically microservices and undifferentiated cloud services. The system components are expected to have arrows designating either the flow of information or interactions between separate components.

Your microservices are typically expected to represent specific tasks that your system performs. It is broadly similar to class design except services are a higher level of abstraction. Ideally each microservice will do one specific task very well and is allowed to communicate with other microservices or components.

The undifferentiated cloud services are things like load balancers, CDNs, caches, object storage, API gateways, message queues, etc. They are fundamental building blocks of any distributed service, so commonly used in internet architectures that you can just leverage existing solutions instead of designing your own. The only exception is when the thing you are asked to design is itself one of these services. If you are building a message queue, you of course can’t use AWS Simple Queue Service.

Again, your system architecture must meet your functional requirements. That part is easy. But it must also meet your non-functional requirements. If you have high scale, you may need your services to scale horizontally because individual hardware components aren’t performant enough on their own. If you need low latency, you may need to pre-compute data, make use of caching, or use an indexing structure for fast data access. If you expect spiky traffic patterns and need high availability, you should make use of message queues for services with relaxed latency requirements.

What makes system design challenging is that there is a large design space, and the accepted design is the one that makes what are considered the best tradeoffs. Everything in system design is about tradeoffs between important properties (e.g. consistency, latency, availability), as there is rarely a single design that maximizes every desirable property simultaneously.

Your interviewer will typically ask you to explain the request flow for a given scenario. This is to verify that your system meets its functional requirements and you understand how the components interact. Sometimes it is to draw your attention to a flaw in your design that you are expected to recognize with enough probing. They will often ask about optimizations, or probe you to consider alternative designs by asking about edge cases or additional requirements.

Exponential trends for fun and profit

If there were one decision point to inform career and investment choices, I would recommend understanding contemporary exponential trends, or technology trends more generally. By exponential trends I am mostly referring to changes produced by new technology, be it the size of a user base, production or service costs, or something else. Making the right decision here could be the difference between starting your career at Blockbuster or Netflix in 2002. Or investing in Sears rather than Amazon in 2002.

Technology innovation is often a precursor to major historical events that remake social and economic modes of organization. Imagine how different the world would look without the combustion engine, gunpowder, steel, computers, the printing press, and the internet. The introduction of a new technology alone can reshape economic futures. Whaling used to be a major market because whale oil was used for lamps, but that entire profession disappeared because of petroleum. Then entire industries around oil lamps disappeared because of the introduction of electricity and the incandescent light bulb. Even then, incandescent bulb manufacturers faced economic peril if they didn’t eventually migrate to manufacturing more energy-efficient bulbs like LEDs. Each innovation created and destroyed fortunes, though at each transition more value was created than lost as the cost structure improved.

When an invention first ushers in the beginning of disruption, technology adoption tends to happen more slowly than people expect. The birth of the internet is debatable depending on how we define it, but let’s say 1990. Even in 2022, only about 60% of the world’s population had internet access. I think a good rule of thumb today is about 30 years to go from inception to mass adoption (maybe 60-80%).

Adoption curves also don’t show technology disruption cycles. Print advertising like newspapers and magazines has been disrupted by online advertising like Facebook and Google. For print advertisers, the years 2000 to 2010 erased the prior 50 years of revenue gains. For each technology adoption cycle there is some complementary technology disruption cycle, as each new technology is ultimately replacing some less efficient solution. For example, Uber’s rise came at the expense of taxis and Airbnb’s rise came at the expense of hotels.

I don’t have good data on the length of technology disruption cycles but I’ll estimate 10-20 years. Disruption cycles tend to be shorter than adoption cycles because adoption cycles tend to create new markets, in addition to disrupting existing ones.

There are many active technology adoption cycles ongoing today. You are doubtlessly part of this either directly by using some emerging consumer technology (e.g. online dating apps) or indirectly by prioritizing cost/performance in your purchase decisions (e.g. buying a phone with a long battery life). Exponential trends tend to precede widespread technology adoption or act as precursors to other forms of technology adoption. The increasing ability to send more and more data over the internet (bandwidth), observable in the 90s, was a harbinger for the eventual spread of high-bandwidth internet applications such as streaming video and video conferencing.

So if we want to identify future technology adoption cycles, we can just identify existing exponential trends. Then we can reason about where these exponential trends might lead us.

A non-exhaustive list of exponential trends today:

Moore’s law

I mention Moore’s law because it’s the most well-known example of an exponential trend, even though it has mostly stopped holding today. It concerns the number of transistors that can be placed on a computer chip such as a CPU. More transistors = more compute. For decades computers were getting so fast that old computers were almost immediately rendered obsolete in compute power. This pattern eventually tapered off due to physical limits. More transistors = more power dissipated = more heat. If the computer chips get too hot, the transistors get damaged. So heat dissipation prevented making computers faster by just making transistors smaller and stuffing more of them into the same chip.

Nielsen’s law

Nielsen’s law concerns the bandwidth, or the rate of data transmission, of an average consumer internet connection. With low bandwidth it might only make sense to send text, enabling applications like email. With more bandwidth you might be able to download and upload images and maybe short video (social media). With more bandwidth still you could stream long-form video (YouTube, Netflix, and Zoom). With even more bandwidth you could stream high-bandwidth video in real time (think VR). This progression is predictable, and it will end when physical limits are hit in how much data can be streamed over fiber optic cables, but we are nowhere near this limit.

Huang’s law

GPUs are computing devices that are really efficient at parallelization. There are a few applications for such kinds of compute: gaming, deep learning, and scientific computing. Of particular importance is deep learning, which you might know as artificial intelligence. As GPUs get exponentially more efficient at a fixed price point, the cost to train a deep learning model decreases exponentially. That is, artificial intelligence training is largely GPU-compute bound. Unless physical limits are hit soon, you can expect artificial intelligence to become dirt cheap.

Energy density of Lithium Ion Batteries

Li-ion batteries are what power your smartphone and your laptop. They’re the predominant battery type for consumer electronics and electric vehicles. Higher energy density means smaller batteries. Smaller batteries mean lighter batteries, or if you prefer, longer battery life at a constant size. Unless this trend hits physical limits soon, we can expect such things as:

  • phones you charge once a week
  • cars you charge once a week
  • consumer drones that can fly for hours at a time (currently flight time is up to 20 min)
  • potentially electric planes

Decrease in storage costs

HDD and SSD are two types of computer storage drives. HDD is older and uses a spinning magnetic disk. SSD is newer and has no moving parts. As storage becomes cheaper, we can afford to store more things for the same cost. You might have used this trend to predict the rise of cloud storage (think of saving your phone’s images on Apple’s servers). You could predict the rise in telemetry (monitoring some data point such as temperature) and consequently the rise in Big Data (analytics on the vast quantities of historical telemetry and data). As this trend continues, you can expect to see longer historical record keeping, such as car dashcams never deleting data. Consequently, this high-quality data can be used for other purposes, such as training self-driving cars. This trend is analogous to the storage trends that followed the decreasing cost of paper.

Growth in money supply from central banking

Today’s national currencies come from central banks, like the Federal Reserve. Central banking is a relatively new phenomenon in human history, having been around for only the last 100 years or so. With central banking, an institution actually has the ability to print money. As history shows, governments with the ability to print money tend to operate their economies on credit and are rarely stingy. There are many interpretations you can make from an exponentially increasing money supply, and one of them is that the exchange rate for less inflationary assets like stocks or gold will increase in nominal value.

Cost to send a kg into space

Space is a pretty expensive place to send materials. The main things we send into space today are satellites (think GPS or weather satellites). As costs decrease, we can send more material. That might mean bigger, more sophisticated satellites. Or it could mean a new form of travel even faster than air travel. The longest flight in the world is New York to Singapore at 18 hours 30 minutes. With a rocket this could be done in less than an hour. It’s technologically possible today, just too expensive for anyone to do. That is, until it isn’t. Cheap transport of materials into space opens up new opportunities for high-precision manufacturing (dust in the air can result in impurities) and research I can only imagine.

Cost of DNA sequencing

Large classes of disease are hereditary. A DNA test could identify genetic diseases immediately, or responses to certain drug classes, or the propensity to develop certain diseases later in life (e.g. diabetes, cardiovascular disease, addiction). But before we can even get to that point, there needs to be more genomic research, and that’s only possible with more data (DNA sequencing of patients with known diseases). For both genomics research and clinical practice, cheap DNA sequencing is needed. DNA sequencing is currently around the $600 price range, which is still not economical at scale. That said, these falling costs could be used to predict the rise of companies like 23andMe which do partial DNA sequencing.

Final words

As I said, this is a non-exhaustive list of trends, and I’ve only chosen some of my favorite examples. Even within one trend there are often many dimensions. Li-ion batteries aren’t just becoming more energy dense; they’re also becoming cheaper, with better performance characteristics like recharging speed. There are many important trends I’ve probably missed that should have made my list too. And these trends need not be purely about technology. You could just as well make a chart of fentanyl overdoses and predict its emergence as a national issue years ahead of schedule. These trends need not be exponential, either, but exponential trends naturally tend to have the largest impact.

There are a huge number of historical exponential trends I could identify even with cursory research. Imagine the decreasing cost of rail transport, or the decreasing accident rates of elevators and airplanes and how they enabled skyscrapers and air travel, or the decreasing cost of lighting. There are simply too many to list, and listing them would only scratch the surface. The point is that these trends are everywhere, all the time, including the moment you read this sentence. The exact exponential trends will change, but the cycle of technology adoption changing economic and social modes of organization will continue. Identifying technology adoption cycles ahead of schedule by monitoring exponential trends can be hugely beneficial, for fun and profit.

What differentiates a really good doctor

I’ve seen a lot of doctors in my time. What makes my experience different is the lengths I went to source them. I’ve travelled to different states, countries, and continents to see them. Sometimes they needed to be the best I could find, for any price and any distance. I’ve done this several times for different problem domains. As you can imagine, there’s a lot of diversity in how good they were, or to put it a different way, how useful they were. These meetings were usually just to discuss a single service. Could they do it, or not?

The best doctors were able to render services outside of something called the standard of care. The standard of care is just medical consensus for how doctors should approach various problems, and it tends to lag leading medical research by 20+ years. There’s a joke in medical school that 50% of what’s in medical texts (read: standard of care) is wrong, but we just don’t know which parts. Well, the best doctors tend to be medical researchers and they know which parts are wrong in their research domain. Instead of administering existing treatments they administer experimental ones. They try things that nobody else will, so they get results that nobody else gets. This could mean a curative treatment, or a drastic change in some important outcome (e.g. mortality rate).

Average doctors follow the standard of care too closely to get above-average or below-average results. The smarter they are, the more consistently average their results. At best they may have earlier awareness of updates to the standard of care, but they can never go beyond it. If they happen to also be medical researchers, then they’re working on marginal improvements to the standard of care. Questions like whether to administer 4 or 5 days of antibiotics just don’t have the same impact as limb reconstruction or nerve decompression. They may know of more impactful and interesting research problems but deliberately avoid the risk and uncertainty that comes with them. And sometimes they just happen to be overly optimistic about the accuracy of the standard of care, or perhaps not knowledgeable enough to confidently challenge it. They feel that maybe other fields of medicine have made obvious, backwards mistakes, but surely not their field in this day and age.

The reason average doctors exist and are useful is that dumb patients also exist. They’re the people who naturally aren’t familiar with medicine and also lack good intuition. If they flout the standard of care they often choose a much worse treatment option, like homeopathy for a serious, progressive disease. Sometimes they can get things right, but only because the standard of care got it wrong. It turns out that if the standard of care is wrong, it’s not hard for a patient to beat it. An underappreciated aspect of being a patient is that you have much better data quality than doctors do. Doctors read about symptoms in books, but patients experience them. If you’re a patient and you stumble on an effective treatment, you’ll know. But as a doctor, do you really trust the guy who can’t tell the difference between a ligament and a tendon?

By the existence of dumb patients you might also guess the existence of smart patients. They’re the most competent of all. They have the trifecta of high data quality, medical knowledge, and good intuition. Because they experience the actual pathology, it’s 10x easier for them to identify medical knowledge that’s wrong. Is condition X really caused by stress or lack of sleep? If Y is the cause then Z should also be true, but it’s not. Patients also have 10x more confidence in data quality. Consider the concept of sun sneezing (aka the photic sneeze reflex). It’s such a bizarre concept, that exposure to sunlight can trigger a sneeze reflex. As a patient you can be assured it’s a real thing. Maybe it’s happened dozens of times. But as a doctor hearing this from a patient? I’m not sure it’d be considered credible without independent confirmation, and even then you’d wonder if it was true. Lastly, patients don’t experience conflicts of interest, have legal concerns, etc. Doctors may say they act in the best interests of the patient, but they quietly practice defensive medicine and accept grant money from pharmaceutical companies. A doctor’s judgment can be easily swayed by external forces, but much less so a patient’s.

In summary, what differentiates the really good doctors comes down to the knowledge and judgment to selectively abandon the standard of care and try something better. Effective medical researchers sometimes fall into this category, and so do smart patients. By doing what nobody else does, they can get results nobody else can get. Sometimes these results are exceptional. Doctors who rigorously adhere to the standard of care get approximately the same results as any other doctor employing the same strategy. Often there is good reason to adhere to the standard of care, but when it’s wrong, the outcomes will never be exceptional. Dumb patients exercise independent judgment like smart patients but lack the knowledge and intuition to outperform the standard of care. By doing what nobody else does, they too can get results nobody else gets. Sometimes these results are poor.

When you don’t know what you don’t know

There’s a certain class of problems that kind of fascinates me and I’d describe them as not knowing what you don’t know. You could say it’s involuntary ignorance or unawareness of common knowledge. It’s something that you don’t know, that everybody else seems to know, and because everybody else seems to know it they assume you couldn’t possibly not know it. If I had to list the features of the problem it’d look like this:

  • Very hard to discover the solution
  • Once you discover the solution, fixing the problem is usually trivial
  • The solutions are always obvious in retrospect
  • The solutions are already widely known by others

I have several examples of this happening in my life. The first and best example is from when I was 10. In school I was failing history tests but doing well in every other subject. A poor grade would be something like < 75/100. I was getting scores in the 20s out of 100. Basically bottom 1st percentile stuff. The school was calling my parents, suggesting maybe I should be moved to a different school. The school and my parents had me IQ tested three times. I underwent a battery of tests: neurological, vision, hearing, psychological, etc.

Apparently many thousands of dollars were spent trying to find the underlying problem. The solution came from a child psychologist who figured out that I didn’t understand the concept of studying. I didn’t know I could or was supposed to prepare for these tests by repeatedly reviewing the material so I would memorize it. Nobody had told me about it and I had never done it before. I guess the subject never came up in conversations with friends, or if it did maybe I thought studying meant something different. I really have no idea how other people figured this out on their own. Naturally the solution was to get a tutor and my next score on these tests was 96/100. This issue was immediately resolved and never happened again.

A more relatable and general example is someone who mispronounces words because he learned them on his own by reading.

There’s really no doubt that this kind of problem is pervasive in every functional area of life: social, academic, professional, health, romantic, etc. For some people, the difference between success and mediocrity may come down to unawareness of some common knowledge that plays an important functional role.

I don’t really have firm recommendations on detecting or fixing these sorts of problems. The most I can say is that in my experience the people most likely to point out these common-sense solutions are mentors, trusted friends, or hired consultants. They will generally give you the honest feedback that others wouldn’t, for fear of offending you.

A very common interviewing mistake: not practicing on generalized problems

What frustrates me about most advice on interview practice is that it’s generally obvious and the target audience is people who are inexperienced. Examples of this advice are things like “make sure your interviewer knows what you’re thinking” or “don’t jump immediately into coding”. There’s no good source of advice for people who are at an intermediate level or above. One example is the subject of this article. It’s one of the lowest-hanging fruits among the issues I see at the intermediate level in software engineering interviewees.

Before I explain this, I want to provide an example. In calculus everyone learns the concept of a derivative. Conceptually it’s the rate of change at some point. You might be shown that the derivative of x^2 is 2x^1. The formula for the derivative of x^n is n * x^(n-1). You get so used to these questions that it becomes muscle memory. And then in a homework assignment or a test you need to find the derivative of 2^x. Suddenly you realize your mental model is broken. If the power rule for a derivative doesn’t work here, then what’s the real definition of a derivative?

The problem here is a selection bias on the examples you practiced on. You practiced on easy questions because the textbook author or instructor didn’t want to distract you with the real complexity involved. The easy questions, or maybe just repetitive questions, tend to have a shortcut. You apply the shortcut and now associate the shortcut with the generalized solution. Teaching you how to apply this concept to slightly more complicated examples was “someone else’s job” or “something for you to figure out on your own”. You are left half trained and might not even know it.

This problem was the subject of my favorite book, Mathematical Methods in the Physical Sciences by Mary Boas. In the introduction, she outlined how physics departments don’t have the time to teach students both the physics and the math, so they teach the physics and expect the math department to teach the math. But the math department doesn’t do this, for various reasons. Her book was written to teach applied mathematics at the level of rigor required in the physical sciences. It is essentially a book about more rigorous applied math education for STEM fields.

This problem also exists in a lot of other domains, such as practicing for the software engineering interview. It’s a general flaw of teaching, where repetitive and simplified examples give a distorted view of the underlying structure of the solution. In the spirit of this, I’ll provide you with real, non-trivial examples.

Let’s say you’re given an array of integers and you’re asked to find the k smallest items. You master that question on Leetcode and you interview. Now you’re asked to find the k closest 2D points to a 2D target point. They are the same question. For each question, you are given some input set of elements, and you are to find the top k of them according to some distance function. That’s it, but one has a much lower pass rate than the other.
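
Here is a sketch of the shared structure, using Python’s heapq.nsmallest; the only thing that changes between the two questions is the key (distance) function, and the function names are my own.

import heapq

def k_smallest(nums, k):
    # The "distance" is just the value itself.
    return heapq.nsmallest(k, nums)

def k_closest(points, target, k):
    # Same top-k structure, different distance function: squared Euclidean
    # distance to the target (the square root is unnecessary for comparisons).
    tx, ty = target
    return heapq.nsmallest(k, points, key=lambda p: (p[0] - tx) ** 2 + (p[1] - ty) ** 2)

For example, k_closest([(1, 3), (-2, 2), (5, 8)], (0, 0), 2) returns [(-2, 2), (1, 3)].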

Let’s consider another example. You want to find the square root of a 64-bit double, or something really close to the square root. You might recognize that a binary search could work. You might be surprised by the number of smart, experienced people who end up suggesting a hierarchical binary search (do a binary search to find integer bounds [n, n+1] around the real square root, then a second one over doubles using those bounds), make implementation mistakes as if this binary search were over an integer array (e.g. high = mid - 1 and low = mid + 1 do not make sense in this context for adjusting bounds), or have significant implementation difficulty.

Here’s all you ever needed to know about generalized binary search: if position(x) < position(y) implies f(x) < f(y), then you can use a single binary search to get to your answer, regardless of what position() and f() mean. For a “simple” binary search over an array, position(x) is just the index of x, so this becomes i < j implies array[i] < array[j]. For computing the square root, position(x) is just x itself, so it becomes x < y implies x^2 < y^2, which is true for non-negative numbers. And the generalized code looks like this:

while left < right:
    mid = ComputeMid(left, right)
    if CanReturnEarly(mid):
        return mid
    if GoLeft(mid):
        right = mid - offset
    else:
        left = mid + offset

By overpracticing on simplified examples, some interviewees assume that offset must be 1 (which causes incorrect behavior for my question). They may not realize that offset can be zero if you don’t have to worry about convergence. They may set left/right/mid as integer types, which wouldn’t make sense for this problem. Sometimes they struggle with when they can return early, which is usually just a sign they are so overwhelmed with new information that they struggle on something that was literally provided as part of the question.
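
For completeness, here is one way the template instantiates for the square-root question; capping the number of halvings as a convergence guard is my own choice, not the only valid one.

def square_root(target: float, tolerance: float = 1e-9) -> float:
    """Approximate sqrt(target) for target >= 0 using the generalized template."""
    left, right = 0.0, max(1.0, target)      # sqrt(target) always lies in [0, max(1, target)]
    for _ in range(200):                     # 200 halvings more than exhaust double precision
        mid = (left + right) / 2.0           # ComputeMid
        if abs(mid * mid - target) <= tolerance:
            return mid                       # CanReturnEarly: "really close" per the question
        if mid * mid > target:               # GoLeft: the answer lies below mid
            right = mid                      # offset is zero; floats need no +1/-1 adjustment
        else:
            left = mid
    return (left + right) / 2.0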

The general lesson here is that many of the concepts you learned aren’t specific to the data types or values you practiced with. Unfortunately, instructors, authors, and teachers have a bias toward teaching you with simplified examples that hide the complexity of the actual concept they want you to understand. The tension here is that you need exposure to that complexity if you want to perform at an intermediate or higher level of proficiency.

So what can you do about this? Mostly it comes down to understanding the general structure of the solution, and knowing which parts of the general solution are fixed and which vary with the inputs. Once you understand this, you can start practicing against classes of problems as opposed to specific problems.

Fundamentals of the software engineering interview

Introduction

I’ve been interviewing software engineers professionally for almost a year now. I’ve thought a lot about why some people don’t pass interviews, why others do extremely well, and what makes a good coding/algorithm question. One thing that’s clear to me is that people who do not do well often do not understand fundamental data structures, algorithms, or coding skills. Conversely, people who do extremely well often have a much better than average grasp of the fundamental data structures, algorithms, and coding skills. This post is a non-exhaustive list of what I consider the fundamentals among these three categories. I find that virtually all coding questions require a composition of these skills and knowledge.

Data structures

  • Arrays
  • Linked lists (singly linked, doubly linked)
  • Map/dictionary
  • Priority queue / heap
  • Trees
  • Graphs
  • Stacks
  • Queues
  • Sets

Algorithms

  • Depth first search traversal
  • Breadth first search traversal
  • pre-order/post-order/in-order tree traversal
  • Greedy algorithm
  • Binary search (and generalized binary search)
  • Recursion
  • Dynamic programming
  • Quicksort
  • Mergesort
  • Backtracking
  • Topological sort
  • Union find
  • Divide and conquer

Coding skills (Python specific)

  • Custom sort in one expression (sort with custom comparator)
  • enums
  • lexical scoping
  • deep copy an object
  • While loops
  • for loops (forwards and reverse iteration)
  • enumerate
  • bitwise logic (&, |, shifting, etc.)
  • array operations (index, append, extend, remove, find, slicing)
  • Dict operations (iterate keys, iterate keys+values, iterate values, add, delete, union)
  • set operations (add, remove, contains, union)
  • Create an array with range of integers m through n
  • Use every data structure with arbitrary objects
  • Create a class
  • Use class inheritance
  • Polymorphism
  • Use an abstract base class
  • Overload operators (+, -, <, >, ==, etc.)
  • Overload the __str__() function
  • Get the memory location / unique id for an object
  • Convert a string to int/double
  • All constants (max number, min number, lowercase alphabet, uppercase alphabet, numeric digits, punctuation)
  • String interpolation
  • Map/filter/any/all
  • List comprehension (in 1D, 2D, 3D, etc.)
  • Lambda functions
  • Adding type checking to function signature
  • Type checking during runtime
  • Caching utils
  • All language keywords
  • Reading from a file
  • Writing to a file
  • Convert a string to a file-like object (for testing)
  • try/catch exception handling
  • Built-in and custom exceptions
  • bit operations (and, xor, or)
  • String operations (to lowercase, is prefix, is suffix, find substring index, etc.)
  • Convert char to ascii and vice versa (e.g. ‘a’ -> 97)
  • Slicing an object (array / string)
  • Reverse an array
  • Iterate a container forward and reverse
  • Define an iterator for a custom class
  • Generate a random integer between a and b
  • Generate a random float between a and b
  • Datetime operations
  • Math operations (exponentiation, logarithm, integer division, modulus, min/max/abs/floor/ceil)
  • Swap two elements of a random access container
  • Assert statements
  • Debug logging
  • Regexes
  • Invert a min-heap to a max-heap and vice versa
  • Read user input from the console
  • Decorators
  • Parallel programming
  • Measuring time elapsed
  • Sleeping for a specified amount of time
  • Functors

Other

  • clarifying requirements
  • code quality
  • manual testing
  • creating good test cases
  • time / space tradeoffs
  • design patterns

FAANG

Interviewers at the most competitive companies often have a tendency to ask questions that I guess you would call Leetcode hard. Many Leetcode hard questions require prior knowledge of optimizations, techniques, patterns, etc. to solve in 40-45 minutes. In a sense it turns the interview into a mini competitive programming competition. The short version is that there’s an additional level of skills and knowledge beyond the fundamentals which deserves its own category and is usually discussed in guides to competitive programming.

  • Interval trees
  • Segment trees
  • 2D prefix sums
  • Sliding windows
  • Monotonic queues/stack
  • Rolling hashes
  • string processing algorithms
  • bucket sort / square root decomposition
  • https://usaco.guide/general/

Thinking patterns that make you good in physics and software

I studied physics and later went on to become a self-taught software engineer. There was a fair amount of overlap between the skills used in those two fields, namely abstract reasoning, short-term memory, and pattern matching. Naturally there were also differences, and what made me good at physics didn’t translate to software. I guess you could say I am naturally a better physicist than I am a software engineer. After five years of software engineering I’ve gotten a much better sense of the thinking patterns that increase productivity. What’s interesting to me is that in each of these fields there’s one way of looking at problems that works very well, but it doesn’t translate to other fields.

The first way to explain what makes you good at physics is actually to explain what differentiates people who are really bad at physics from people who can perform at even an average level. While physics was straightforward for me, it seemed that it was often extremely challenging for other people. Sometimes I would tutor students struggling with their introductory physics classes, and it was always interesting to try to follow their thought process. Many students would be given a physics problem and have absolutely no idea what to do. This happened least frequently for reciting facts and most frequently for open-ended problems.

The way to approach open-ended problems in physics is to understand all the physical laws related to the problem. You might even call these “first principles”. You generally need some level of pattern matching to compose these first principles into an equation or expression. Then reducing that expression to the answer usually requires some mathematical facility. That’s pretty much it. Sometimes you’ll be permitted to use approximations, which generally means taking a Taylor series of some expression and discarding all but the leading terms. The hardest part of real physics is the math, not using “first principles” and approximations to create an expression. But it’s the first-principles thinking that differentiates the skill levels on easy physics problems, because the math is designed not to be that hard.
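
A canonical instance of this (my example, not one from the original text) is the small-angle approximation for a pendulum: expand the sine as a Taylor series, keep only the leading term, and the equation of motion becomes solvable.

\sin\theta = \theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \cdots \;\approx\; \theta \qquad (\theta \ll 1)

\ddot{\theta} = -\frac{g}{L}\sin\theta \quad\Longrightarrow\quad \ddot{\theta} \approx -\frac{g}{L}\,\theta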

Software is different. When I first got into software I found it extremely challenging, but not for the reasons you’d expect. Writing my first “hello world” program took many days. I had C source code but no idea how this text somehow “did something”. For some reason, the online tutorials of the day never mentioned the process of compilation or running an executable. I found many simple tasks overwhelming and frustrating, such as uploading a file. I didn’t understand how to make a file “go to another computer”. Unfortunately, the language I used to express the problem was too generic to get search results from Google, just as with my first hello world program.

Things got much easier for me once I started thinking in terms of APIs. I always understood the “what” but not the “how” of doing things in software. The API answered that question. Instead of asking the vague question of “how do I get it to do something, to work?” I could just ask “what’s the API?” Now when I needed to build something, once I had a sense of what components were necessary, the details just came down to figuring out what the APIs were and interfacing with them properly. This is a flexible way of looking at things because all software interfaces at its boundaries through APIs: databases, networks, compilers, functions, libraries, etc. The useful thinking pattern here was to view all interactions in terms of APIs.

In conclusion, first-principles thinking works very well for physics, and thinking in terms of APIs works very well for software. These are very different ways of looking at the world. Other problem domains (e.g. law, politics, medicine) could each have their own unique way of looking at the world that makes the work much easier but doesn’t translate well to other problem domains.