Tech bet
Overengineering, premature optimization, resume-driven development, hype-driven development, gold-plating, cargo-culting, etc.
My article about a pragmatic approach to paying back tech debt became one of the most popular posts in this newsletter last year. It made the rounds on Hacker News and social media, and has been read over 85K times and shared over 180 times.
A side effect of that reach was a lot of interesting conversations. They helped me formulate some thoughts around another relevant phenomenon that I call tech bet.
Do any of these symptoms sound familiar?
The software improvement budget is approved before figuring out exactly what we’re about to do.
Big bang improvement projects without any clear and pragmatic iterative checkpoints get cancelled mid-execution.
Someone responsible for fixing some bugs in a system manages to convince leadership to rewrite the whole thing in a new language.
An overly complex technical solution is created to solve a simple problem. When you lift the hood, the code feels like the programmers had a fight with the product requirements and struggled with the tech.
None of those symptoms are bad on their own, but there’s an element of pure chance and lack of concrete data that makes these investments particularly risky.
If tech debt is about having inferior technology, tech bet is about having too much of it.
Tech debt is reactive, whereas tech bet is proactive.
Tech debt gives the engineers a bad conscience, whereas tech bet is a sign of naïve ambition or malice.
Tech Bet
Tech bet is the practice of paying the price of a hypothetical future tech debt upfront without having the data or insight to back it up. It leads to waste (money, time, energy) and creates unnecessary friction for the product.
The defining feature of tech bet is lack of information or wrong information which leads to inaccurate predictions and costly wrong decisions —hence the term bet.
For example:
Engineers build a technical solution that can handle global scale before the startup has product-market fit.
An ivory-tower Staff Engineer spends months writing a technical strategy instead of solving the immediate challenges that are driving the company toward bankruptcy.
Backend engineers build a sophisticated database architecture with replicas and multi-AZ where a Google spreadsheet would do.
Your new colleague from a reputable company convinces you to invest in a Platform as soon as your total number of services goes above 2!
And my favorite: the platform team is building a commodity service that has nothing to do with the business model instead of buying it.
Sounds familiar? This phenomenon goes by many names: overengineering, premature optimization, resume-driven development, hype-driven development, gold-plating, cargo-culting, etc.
At its core, tech bet is about taking the cost now (in terms of development time, energy, and money), hoping that it will lead to some savings in the future.
Sometimes tech bets are painted as compliance with “best practices”. Often what’s considered best practice is just someone else’s opinion, taken out of context and claiming to solve more problems than it actually does.
Sometimes tech bet is called a “bet”. It is honest about the risk and is often more cost-conscious but regardless of the term, the essence remains the same: to pay the price upfront hoping to get lucky with the return —hence the bet.
Sometimes tech bets are sold as a technical investment to make us more confident. In reality there’s little to no data to support the claim that the investment can be realized.
When the future arrives, if the product is still alive, you often have to change it anyway because:
Newer tech has deprecated the old tech
The product requirement has evolved to levels that demand rethinking the tech solution
Ironically, shoehorning the initial investments into the changing requirements has led to tech debt
The original team have left, and the new recruits can’t cope with the unnecessary complication (there’s a difference between complex and complicated)
In the meantime, you have to pay the higher cost of maintaining a system that is more expensive than it had to be. You had too much tech.
At the end of this article, I will touch upon how to measure the amount of tech and how to know you have too much. But first, a story.
The Story
Many years ago, I joined a team as a front-end developer.
The product I would be working on was a greenfield identity provider. It was the identification component for 160+ sites owned by the company.
In simple words, it was the login screen!
There was already a brownfield product in place, written in PHP. It was a bespoke solution developed completely in-house, roughly following the OAuth standard. I was told it had so much tech debt that it was “beyond repair”.
The majority of the old team had quit after some drama that led to the departure of the EM and the product manager.
It was a bastard product but too important to kill.
Instead, the company decided to assemble a new team who would be tasked to create a completely new solution from the ground up. Then flip a switch and 🎉tada! We’d be live.
At least that was the plan!
On my first day, one of the senior engineers kindly threw two printed booklets on my desk and asked me to read them cover to cover. They were these two documents:
Their plan was to create a new bespoke identity provider fully compliant with the above standards. This time in Java!
The few remaining people from the old team were tasked to keep the lights on and run the PHP solution. They were not part of the new team. In some sense, the two teams were competing. They didn’t even talk to each other. Hello! Java vs PHP? Two different species. 😄
The new team was given complete autonomy. It was the brainchild of one of our directors of engineering: a young, handsome Norwegian dude with 6 years of software engineering at Google under his belt, which had catapulted his job title to tech lead, then VP of engineering, and now Director.
Back then “Google” still meant something. It hadn’t removed the “don’t be evil” motto or resorted to deceptive techniques to hide its incompetence in its own game: AI.
I liked the guy. He was full of energy. It was comforting to have someone from Google at the helm. In retrospect, I think we all suffered from a bit of halo effect.
I was the second front-end developer in the team. The first had a “senior” title but had a background in backend development and was new to the crazy land of the browsers.
That didn’t stop him from spending the majority of his time creating a browser logging pipeline using Amazon Kinesis and Firehose. If it sounds complicated, that’s because it is. This is the same problem that Sentry or Splunk had already solved. But who needs to buy when you can DIY? Especially when there’s little supervision and no one to ask the right questions.
The PM was a project manager and primarily busy with Gantt charts and stakeholder management. As far as she was concerned, everybody was working full throttle to make the Google-guy’s dream a reality and kick the old team’s a**.
Did the company need to spend money building an in-house logging solution? This is the type of problem that Simon Wardley calls a commodity. We didn’t even need one, to be honest! He was treating the browser as a backend running inside a data center, which was understandable given his background.
Most of my needs during that phase could be easily addressed with local console logging.
Don’t solve a problem you don’t have.
6 months into the project, we hired a new head of product: a tall guy, a bit cocky but friendly. As part of his self-onboarding, he came all the way from the company HQ to Stockholm to see what everyone was up to.
When he approached where my team was sitting, he asked for a demo. I showed him what we had but honestly told him:
I’m not sure how this product came to be or when we will be able to replace the old one with this one.
This didn’t sound like new information to him. Without flinching, he asked:
Show me what you mean
By that time, I had spent more time checking out the old product than reading those RFCs! I was more curious to see what the end product would be measured against. I spent around 15 minutes showing him the control panel, SDK, technical documentation and other aspects of the old product. I could have gone on for hours, but he was in a rush to talk to other teams.
I didn’t know what to make of that visit, but a month later the new product was shut down. The old product was the center of attention again.
That ex-Google guy quit. A few key backend developers of the new product quit in protest. The Java code rested in peace in some GitHub repo for eternity. The rest of us were merged into what was remaining of the old team.
I learned a ton about maintaining and refactoring legacy systems. My respect for the team behind the old product grew a lot. Turns out, none of them were happy with a total rewrite but I guess the halo effect was just too strong! 😇
Sometimes I wonder:
Did I singlehandedly kill that product?
Nah! I think the new head of product had seen this kind of tech bet before and didn’t have the appetite. Even if I did manage to kill the product by being transparent about the vanity of the rewrite, I probably saved the company millions of SEK.
I don’t know how the decision to rewrite started but I’d imagine it went something like this behind closed doors in the HQ:
“The identity platform is what connects the user data across all our brands, but most of the team quit,” said a hypothetical senior director.
After some awkward silence, some genius screamed:
How hard can that be? It’s just a login screen. Here’s my plan: we hire a new team and put them in charge of that product. In fact, I know just the guy. He’s from Google…
And once the Google guy was hired, he tried to look into the code and said:
🤮 Get that PHP away! I can’t even read it. We’re gonna rewrite in Java.
The initial plan changed from maintaining the system and fixing bugs to a total rewrite! A few meetings later, he was given 6 months to solve this problem.
Challenge accepted!
He assembled a new team of rockstars composed entirely of new developers. No one talked to the old team. As far as the company was concerned, it was a dead product at the end of its lifecycle.
The new team did the best it could, but 6 months was too little, and without proper product management it had already burned too much time playing with Kinesis and whatnot.
Did the company take a bet? Yes. But this was far from the only case. There were multiple other rewrites. There were too many new people with too many new ideas to shine.
Eventually the company faced serious challenges and split into two halves (I’ve written about that period here). Last time I checked, another company was devouring what was left of it. Sad, but if you’ve read the end of my last post, you know that the world of business is no joke and gambling is certainly not risk-free!
Tech debt vs tech bet
Let’s compare some aspects to distinguish tech debt from tech bet:
Payment intent
Tech Debt: postponed to the future
Tech Bet: paid in advance hoping for a return in the future
Justification
Tech Debt: tactical: “we need to cut some corners to meet the deadline with the current bandwidth” (deadline is often made up)
Tech Bet: strategic: “if we do this now, we’re going to be future-proof” (future state is often a subjective guess)
Symptom
Tech Debt: the capabilities of the technical solution lag behind the product requirements
Tech Bet: the product requirements lag behind what the technical solution is capable of
Root cause
Tech Debt: poor engineering, ambitious product management
Tech Bet: ambitious engineering, poor product management
Mitigation
Tech Debt: start a payment plan
Tech Bet: start simplifying and removing cruft
How to have enough tech?
Let’s get this out of the way: tech is hard to measure. If you were hoping that I would drop a vanity metric like SLOC (source lines of code), the number of labels on a tech radar, or even DORA metrics (e.g. lead time, change failure rate, etc.), I’m sorry to disappoint.
Consider the fact that different people have different competence levels, and each product has unique customers, business model, and architecture. There’s no easy metric to measure the amount of tech.
However, there is a set of heuristics that I’ve developed over the years to help steer clear of tech bet.
If someone is pushing for a technical investment, ask them how long it can be postponed and what the cost of doing it later would be. Don’t solve a problem you don’t yet have. Usually they don’t have any numbers, and if they do, they’re usually attractive numbers about the costs that the investment will save. You can maneuver around that by asking about the cost of postponing it and then focusing the conversation on verifying the data. If the problem is too far into the future, it’s most probably a bet.
FAB framework: separate facts, assumptions, and beliefs. Most of what flies around as facts is just someone’s assumption or a perspective of the data that supports their belief.
Hire and retain organizational truth tellers. In a podcast episode, Jeff Bezos said: “any high performing organization, whether it’s a sports team, a business, a political organization, an activist group, I don’t care what it is, any high performing organization has to have mechanisms and a culture that supports truth-telling.” The truth-tellers act as the canary in the coal mine. They call bullsh*t before it becomes too expensive to clean up. Hiring isn’t enough. Involve them in the discussions (inclusion) and build the psychological safety for them to speak up.
Release fast and early. A greenfield team should have a demo in less than a month to show that they have a good grasp on the problem. They should have something in production in less than 3 months. Anything that takes longer than 3 months is too high of a bet. Break the problem into smaller deliverable chunks and continuously evaluate whether you should place a bet and what the delivery would look like. Avoid big bang releases like the plague. Better to have a broken product now than the promise of a solid one in the future.
Use written decision documents. Not only do they help quickly onboard newcomers, but they also serve as a track record to learn from the mistakes. Gambling is risky. Once the company takes the bill for the wrong bet, maximize the ROI by learning from it. Do a retrospective and make sure everyone understands what went wrong.
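The heuristics above can be turned into a rough checklist. Here is a minimal Python sketch, not a framework. The `Proposal` fields, the `bet_signals` function, and the 12-month cutoff for "too far into the future" are assumptions made up for illustration; only the one-month demo and three-month production limits come from the list above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Proposal:
    """A proposed technical investment. All fields are hypothetical inputs."""
    name: str
    months_until_needed: float   # how long the work could be postponed
    cost_now: float              # estimated cost of doing it today
    cost_later: float            # estimated cost of doing it when actually needed
    months_to_demo: float        # time until a first demo
    months_to_production: float  # time until something runs in production
    backed_by_data: bool         # True if the claims are facts, not assumptions or beliefs


def bet_signals(p: Proposal, far_future_months: float = 12.0) -> List[str]:
    """Return the warning signs this proposal trips (an empty list means none).

    far_future_months is an arbitrary, assumed cutoff; pick your own.
    """
    signals = []
    if p.months_until_needed >= far_future_months:
        signals.append("the problem is far in the future")
    if p.cost_later <= p.cost_now:
        signals.append("postponing costs nothing extra")
    if not p.backed_by_data:
        signals.append("the claims are assumptions or beliefs, not facts")
    if p.months_to_demo >= 1:
        signals.append("no demo in less than a month")
    if p.months_to_production >= 3:
        signals.append("nothing in production in less than 3 months")
    return signals


# Made-up numbers for a "global scale before product-market fit" proposal.
proposal = Proposal(
    name="global-scale architecture",
    months_until_needed=24,
    cost_now=500_000,
    cost_later=500_000,
    months_to_demo=2,
    months_to_production=6,
    backed_by_data=False,
)
for signal in bet_signals(proposal):
    print("-", signal)
```

A non-empty list doesn't prove anything. It is a prompt to go back and verify the data before the budget is approved.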
There are always exceptions where we’re dealing with a genuine technical investment, not gambling. And there’s always a level of uncertainty and risk that the business needs to take before startups eat its lunch. Don’t be dogmatic.
Like all things in life, balance is the sweet spot, but it is hard to find. I refuse to claim a universal framework that protects all companies from all tech bets, but hopefully this article gave you a few tools to assess them.
I am very familiar with the story you told in this article, Alex, though I was not very close to that specific team. Well, if I remember correctly one of "my guys" was appointed as DevRel embedded with it at some point, but I'm digressing.
I think you're touching on a very important point here, where individuals and teams start thinking, talking and acting on technology for the sake of technology. Because they know better, and they're coming from a place where "we're serious about this stuff".
In the specific story you relate to, there was also a fractal element, where the over-investment in technology happened at the broader scale as well.