Premature optimization
A mental model to detect and prevent optimizing the wrong thing, at the wrong time, or for the wrong reasons
This article builds a mental model to distinguish mature from premature optimization. We then cover eight practices that reduce the risks of any optimization effort.
As usual, there are ample examples and illustrations.
3T’s of optimization
Optimization is the intentional process of changing systems (software, hardware, people, etc.) to improve one or more aspects. Typically, it involves trade-offs.
Good optimization improves the right things, at the right time, and with reasonable trade-offs.
However, change introduces risk (delay, failure, cost, etc.).
Premature optimization is the root of all evil —Donald Knuth
Premature optimization is when at least one of the following is true:
Changing the wrong thing. Examples:
The team spends 6 months rewriting a Java microservice in Rust to improve response time. After the rollout, they learn that the biggest source of delay was cross-region network dependencies.
A non-technical manager falls in love with DORA metrics. Her team sets up dashboards to track all four, but she's particularly focused on Lead Time. She uses carrots and sticks to get the team to review PRs faster. The team games the system by making lots of pointless PRs to skew the data (e.g. updating inline code comments).
Picking the wrong time. Examples:
A nerdy startup founder burns the budget creating Google-level infrastructure for their one-endpoint API. The startup fails before discovering product-market fit. First make it work, then make it better.
As the market was shrinking, a company realized they needed to rearrange their [human] resources to reduce waste and keep the balance sheet clean. So they decided to do a reorg. But every reorg has a J-curve (as we'll discuss in this article): productivity takes a hit before it gets better. Too impatient, leadership panicked at the first sign of the J-curve and went with the second-best option on their table: layoffs!
Choosing the wrong trade-offs. Examples:
Leadership decides to switch observability providers to save cost. After 3 months of migration (the hidden cost), the team learns that the new provider has poor data quality and the support is crap! You get what you pay for.
Developers of a game engine decide to rewrite the critical parts of the code in assembly. This results in a 20% performance boost over the C++ code, but now the company has to double the price to cover the extra cost of supporting multiple platforms and fixing quirky bugs.
Now you have the 3T's of optimization: the wrong Thing, the wrong Time, and the wrong Trade-offs.
Pro-Tips: Mature optimization
Now that we have a mental model to detect premature optimization, let’s discuss a few basic rules to do the right thing (effectiveness) and do it right (efficiency).
1. The first rule of optimization: don’t!
Interesting problems turn engineers on. Resist the urge. Not every optimization effort is worth pursuing.
A key insight is to know the difference between can and should. Just because something is possible, doesn’t mean that it should be done.
When an optimization idea pops up, apply the 5 Whys to it. Clarify:
Why is it an issue? Do we have enough data? Are we measuring the right data? What did we do to avoid data bias?
How does the investment in optimization yield business value?
What's the best-case and worst-case scenario? What's the risk and benefit to the user if the optimization reaches its target band?
What are the alternatives to optimizing performance in one system? Are there any workarounds? Can we get away with some UX tweaks for example?
What’s the cost of the optimization?
I know it sounds like too much work. But it pays dividends to do this investigation upfront before touching the systems or code.
2. Find the right metric to optimize
If you can't measure it, you cannot know whether you're actually optimizing it.
Some metrics measure the output, while others measure the outcome. The difference is important:
Output is usually easier to measure but more likely to be a vanity metric. For example:
Network latency
API error rate
Outcome is more connected to the needs of service consumers and in turn the business objectives. For example:
Showing the correct price on the UI
Availability of the critical purchase flow
Optimization has a cost (headcount, time, resources, etc.). The cost needs to be justified by the business model. The closer to the value stream, the better the metric.
Business metrics are often harder to measure but motivate the cost of optimization.
You may choose a system metric to optimize but it’s important to know WHY a particular metric may directly or indirectly impact the business outcome.
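To make the output/outcome distinction concrete, here is a minimal sketch in Python; the request-log schema and the numbers are made up for illustration:

```python
# A minimal sketch contrasting an output metric with an outcome metric.
# The log schema and numbers are hypothetical.

checkout_requests = [
    # (latency_ms, http_status, purchase_completed)
    (120, 200, True),
    (340, 200, False),  # fast enough, but the user never completed the purchase
    (95, 500, False),
    (210, 200, True),
]

# Output metric: p95 latency. Easy to measure, but it says nothing about
# whether users actually managed to buy anything.
latencies = sorted(r[0] for r in checkout_requests)
p95_latency = latencies[int(0.95 * (len(latencies) - 1))]

# Outcome metric: share of checkout requests that ended in a completed purchase.
# This is much closer to the value stream.
success_rate = sum(r[2] for r in checkout_requests) / len(checkout_requests)

print(f"p95 latency (output):       {p95_latency} ms")
print(f"purchase success (outcome): {success_rate:.0%}")
```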
3. Focus on the subset that matters
Most metrics aggregate multiple variables, some of which are out of your control or not important.
The business may have unrealistic expectations of the optimization effort. It is best to write down the requirements, especially clarifying what is “good enough” (lagom in Swedish).
The Pareto principle states that for many outcomes, roughly 80% of consequences come from 20% of causes.
It’s not exactly a physical law, but a rule of thumb that can be very useful when optimizing a system because it invites a search to find the most impactful areas to change.
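As a rough sketch of what that search can look like in code, you can let a profiler point at the vital few functions before touching anything; the workload below is entirely made up:

```python
# A minimal sketch of Pareto-driven optimization: profile first, then focus
# only on the handful of functions that dominate the total time.
# The workload below is hypothetical, purely for illustration.
import cProfile
import pstats

def parse_input():
    sum(range(10_000))

def query_db():
    sum(range(1_000_000))  # the actual hot spot

def render_response():
    sum(range(50_000))

def handle_request():
    parse_input()
    query_db()
    render_response()

profiler = cProfile.Profile()
profiler.enable()
for _ in range(100):
    handle_request()
profiler.disable()

# The top few rows typically account for the bulk of the cumulative time;
# that is where the optimization effort should go.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```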
4. Use control metrics
Optimization typically requires trade-offs. On the way to optimize one metric, it is easy to hurt another.
That's why you should have control metrics: they keep track of what is currently within acceptable thresholds and must not get worse.
A typical example is the CPU/RAM tradeoff. You can use in-memory caching to increase performance and reduce the load on the CPU or vice versa: you can compute on the fly in order to keep the RAM usage low.
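Here is a minimal sketch of that trade-off in Python, with a hypothetical pricing function standing in for the expensive computation; the cache size plays the role of the control metric you watch while optimizing for speed:

```python
# A minimal sketch of the CPU/RAM trade-off: caching buys lower latency at the
# price of memory, so memory usage becomes the control metric.
# price_with_discounts is a hypothetical stand-in for an expensive computation.
from functools import lru_cache

@lru_cache(maxsize=10_000)  # the cache bound is the knob of the trade-off
def price_with_discounts(product_id: int) -> float:
    return sum(i * 0.0001 for i in range(product_id % 1_000)) + 9.99

# Optimization metric: repeated lookups become cheap.
for pid in [42, 42, 7, 42, 7]:
    price_with_discounts(pid)

# Control metric: cached entries (a proxy for RAM) must stay within threshold.
info = price_with_discounts.cache_info()
print(f"hits={info.hits} misses={info.misses} cached_entries={info.currsize}")
```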
For example, suppose your optimization goal is to improve a metric to reach a certain threshold. The cost diagram looks something like this:
But looking at the control metric, we see a different story:
Putting the two together, we get a better picture of the sweet spot where both the performance target and control metrics are above the acceptable threshold.
This was a very simple example. In reality:
There might be multiple control metrics
The threshold may be a range
There may not be a direct correlation between those metrics
5. Experiment small before going big bang
If possible, carry out the optimization iteratively:
Experiment in a small scope before going big bang.
At each step, formulate your learnings and assumptions as a hypothesis.
Then proceed to verify your assumptions as cheaply as possible.
Be ready to abandon the optimization effort. It is easy to fall into the sunk-cost fallacy: continuing to invest because of costs that have already been incurred and cannot be recovered.
If a step becomes too expensive, you're making a big bet. The larger the bet, the more work you have to do to reduce the risk (threat × likelihood).
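As a small illustration, here is a sketch of verifying an optimization hypothesis cheaply on a small, representative sample before committing to a rollout; the workload, the candidate change, and the threshold are all hypothetical:

```python
# A minimal sketch of cheap hypothesis verification before a big bet.
# The workload, the candidate change, and the threshold are all hypothetical.
import random
import timeit

def current_impl(items, queries):
    return [q in items for q in queries]  # linear scan per lookup

def candidate_impl(items, queries):
    lookup = set(items)  # hypothesis: a set lookup pays off here
    return [q in lookup for q in queries]

items = list(range(5_000))
queries = [random.randrange(10_000) for _ in range(1_000)]  # small sample

baseline = timeit.timeit(lambda: current_impl(items, queries), number=20)
candidate = timeit.timeit(lambda: candidate_impl(items, queries), number=20)

speedup = baseline / candidate
print(f"speedup on the sample: {speedup:.1f}x")
if speedup < 1.5:  # hypothesis threshold agreed upfront
    print("Gain is marginal -- abandon before sinking more cost into it.")
```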
6. Identify the point of diminishing returns and stop
There’s a point at which further effort, cost, or time to optimize yields no tangible results or sometimes gives negative results.
It's important to continuously evaluate the optimization effort and stop it as soon as it becomes too expensive for what it's trying to achieve.
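One way to make that evaluation explicit is a simple stopping rule, sketched below with made-up measurements: stop as soon as a round of effort no longer buys a meaningful gain.

```python
# A minimal sketch of a stopping rule for diminishing returns.
# The per-round latency measurements below are hypothetical.
measured_p95_ms = [800, 450, 300, 260, 250, 248]  # after each optimization round
MIN_RELATIVE_GAIN = 0.05                          # stop below a 5% improvement

previous = measured_p95_ms[0]
for round_number, latency in enumerate(measured_p95_ms[1:], start=1):
    gain = (previous - latency) / previous
    if gain < MIN_RELATIVE_GAIN:
        print(f"Round {round_number}: {gain:.1%} gain -- diminishing returns, stop here.")
        break
    print(f"Round {round_number}: {gain:.1%} gain -- keep going.")
    previous = latency
```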
7. Know your concern priority
The trade-offs you encounter when optimizing a system can be multi-dimensional. A mental model that helps me is to list the priorities like this:
Security
Reliability
Usability (Accessibility & UX)
Maintainability (Developer experience/DX)
Cheaper runtime
Higher performance
How can you use this list? If, for example, a change in the code hurts maintainability but improves reliability, that's OK. But if a change increases runtime cost to improve performance, it probably needs some good justification.
These priorities differ from product to product but once you clarify them, it makes it much easier to see the trade-offs.
Different systems have different priorities. A mobile game engine, for example, may prioritize performance above maintainability.
Whatever the priority of your system, make sure to identify it before changing the system.
8. Pick the right time
And last but not least, timing is the most important aspect of optimization.
Take reorgs, for example. A reorg is inherently an optimization effort. A good reorg should make bad things hard and good things easy.
But every reorg has a J-curve: productivity goes down while the new teams establish their roles, and hopefully afterwards we end up with higher productivity, but that's not always the case.
Another example is TDD: a methodology that advocates writing tests before the code. Sometimes you're in discovery mode and the code is in a state of flux. Adding tests can improve the quality, but any change to the code incurs the cost of updating the tests.
There's no right or wrong here. It very much depends on the quality and reliability requirements of the service in those early days. But if you're in a situation where the tests are getting in the way of discovering what works for the users, you're probably dealing with a case of premature optimization.
Example
Making Python 100x faster with less than 100 lines of Rust is a master class in optimization done right:
Right time: They optimized just as user demand was increasing, not before and not after.
Right thing: The code was profiled before making any changes. Instead of rewriting the whole thing, they identified the smallest part of the code that would yield an outsized return on investment (ROI).
Right trade-offs: This approach reduced the downsides, for example: the majority of the maintainers didn’t have to learn a new programming language.