Tech gamble
Overengineering, premature optimization, resume-driven development, hype-driven development, gold-plating, cargo-culting, etc.
My article about a pragmatic approach to pay back tech debt became one of the most popular posts in this newsletter last year. It made the rounds on Hacker News and social media, and has been read over 85K times and shared over 180 times.
A side effect of that reach was a lot of interesting conversations. They helped me formulate some thoughts around another relevant phenomenon that I call tech gamble.
Do any of these symptoms sound familiar?
The software improvement budget is approved before figuring out exactly what we’re about to do.
Big bang improvement projects without any clear and pragmatic iterative checkpoints get cancelled mid-execution.
Someone responsible for fixing some bugs in a system manages to convince leadership to rewrite the whole thing in a new language.
An overly complex technical solution is created to solve a simple problem. When you lift the hood, the code feels like the programmers had a fight with the product requirements and struggled with the tech.
None of those symptoms are bad on their own, but there’s an element of pure chance and lack of concrete data that makes these investments particularly risky.
If tech debt is about having inferior technology, tech gamble is about having too much of it.
Tech debt is reactive, whereas tech gamble is proactive.
Tech debt gives the engineers a bad conscience, whereas tech gamble is a sign of naïve ambition or malice.
Tech Gamble
Tech gamble is the practice of paying the price of a hypothetical future tech debt upfront without having the data or insight to back it up. It leads to waste (money, time, energy) and creates unnecessary friction for the product.
The defining feature of tech gamble is lack of information or wrong information which leads to inaccurate predictions and costly wrong decisions —hence the term gamble.
For example:
Engineers build a technical solution that can handle global scale before the startup has product-market fit.
An ivory-tower Staff Engineer spends months writing a technical strategy instead of solving the immediate challenges that are driving the company to bankruptcy.
Backend engineers build a sophisticated database architecture with replicas and multi-AZ where a Google spreadsheet would do.
Your new colleague from a reputable company convinces you to invest in a Platform as soon as your total number of services goes above 2!
And my favorite: the platform team is building a commodity service that has nothing to do with the business model instead of buying it.
Sounds familiar? This phenomenon goes by many names: overengineering, premature optimization, resume-driven development, hype-driven development, gold-plating, cargo-culting, etc.
At its core, tech gamble is about taking the cost now (in terms of development time, energy, and money), hoping that it will lead to some savings in the future.
Sometimes tech gamble is painted as compliance with “best practices”. Often, what’s considered best practice is just someone else’s opinion, taken out of context and claiming to solve more problems than it actually does.
Sometimes tech gamble is called a “bet”. It is honest about the risk and is often more cost-conscious but regardless of the term, the essence remains the same: to pay the price upfront hoping to get lucky with the return —hence the bet.
Sometimes tech gamble is sold as a technical investment to make us more confident. In reality there’s little to no data to support the claim that the investment can be realized.
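One rough way to see why this is a gamble is to write the bet down as an expected-value calculation. The sketch below is only an illustration: the function name, its parameters, and every number in it are made up for this example. The probability that the future need actually materializes is exactly the input that nobody has data for.

```python
def expected_net_value(upfront_cost, future_saving, p_needed, extra_upkeep=0.0):
    """Expected payoff of building ahead of demand.

    All inputs are guesses (hypothetical units: engineer-weeks).
    p_needed is the assumed probability that the future need shows up.
    """
    return p_needed * future_saving - upfront_cost - extra_upkeep


# Same cost and same hoped-for saving, but different beliefs about the future.
confident = expected_net_value(upfront_cost=20, future_saving=60,
                               p_needed=0.8, extra_upkeep=10)
gamble = expected_net_value(upfront_cost=20, future_saving=60,
                            p_needed=0.2, extra_upkeep=10)

print(f"need is likely (80%):   {confident:+.1f} engineer-weeks")
print(f"need is unlikely (20%): {gamble:+.1f} engineer-weeks")
```

With these invented numbers, the same project goes from a gain of roughly 18 engineer-weeks to a loss of roughly 18, purely because of the assumed probability. When that probability is a guess, so is the whole investment case.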
When the future arrives, if the product is still alive, you often have to change it anyway because:
Newer tech has deprecated the old tech
The product requirement has evolved to levels that demand rethinking the tech solution
Ironically, shoehorning the initial investments into the changing requirements has led to tech debt
The original team have left, and the new recruits can’t cope with the unnecessary complication (there’s a difference between complex and complicated)
In the meantime, you have to pay the higher cost of maintaining a system that is more expensive than it had to be. You had too much tech.
At the end of this article, I will touch upon how to measure the amount of tech and how to know you have too much. But first, a story.
The Story
Many years ago, I joined a team as a front-end developer.
The product I would be working on was a greenfield identity provider. It was the identification component for 160+ sites owned by the company.
In simple words, it was the login screen!
There was already a brownfield product in place, written in PHP. It was a bespoke solution, developed entirely in-house, that roughly followed the OAuth standard. I was told it had so much tech debt that it was “beyond repair”.
The majority of the old team had quit after some drama which led to the EM and product departure.
It was a bastard product but too important to kill.
Instead, the company decided to assemble a new team who would be tasked to create a completely new solution from the ground up. Then flip a switch and 🎉tada! We’d be live.
At least that was the plan!
On my first day, one of the senior engineers kindly threw two printed booklets on my desk and asked me to read them cover to cover. They were these two documents:
Their plan was to create a new bespoke identity provider fully compliant with the above standards. This time in Java!
The few remaining people from the old team were tasked to keep the lights on and run the PHP solution. They were not part of the new team. In some sense, the two teams were competing. They didn’t even talk to each other. Hello! Java vs PHP? Two different species. 😄
The new team was given complete autonomy. It was the brainchild of one of our directors of engineering. A young, handsome Norwegian dude with 6 years of software engineering at Google under his belt, which had catapulted his job title to tech lead, then VP of engineering, and now Director.
Back then “Google” still meant something. It hadn’t removed the “don’t be evil” motto or resorted to deceptive techniques to hide its incompetence in its own game: AI.
I liked the guy. He was full of energy. It was comforting to have someone from Google at the helm. In retrospect, I think we all suffered from a bit of halo effect.
I was the second front-end developer in the team. The first had a “senior” title but had a background in backend development and was new to the crazy land of the browsers.
That didn’t stop him from spending the majority of his time creating a browser logging pipeline using Amazon Kinesis and Firehose. If it sounds complicated, that’s because it is. This is the same problem that Sentry or Splunk had already solved. But who needs to buy when you can DIY? Especially when there’s little supervision and no one to ask the right questions.
The PM was a project manager and primarily busy with Gantt charts and stakeholder management. As far as she was concerned, everybody was working full throttle to make the Google-guy’s dream a reality and kick the old team’s a**.
Did the company need to spend money building an in-house logging solution? This is the type of problem that Simon Wardley calls a commodity. To be honest, we didn’t even need one! My colleague was treating the browser as a backend running inside a data center, which was understandable given his background.
Most of my needs during that phase could be easily addressed with local console logging.
Don’t solve a problem you don’t have.
Six months into the project, we hired a new head of product. A tall guy, a bit cocky but friendly. As part of his self-onboarding, he came all the way from the company HQ to Stockholm to see what everyone was up to.
When he approached where my team was sitting, he asked for a demo. I showed him what we had but honestly told him:
I’m not sure how this product came to be, or when we will be able to replace the old one with this one.
This didn’t sound like new information to him. Without a flinch he asked:
Show me what you mean
By that time, I had spent more time checking out the old product than reading those RFCs! I was more curious to see what the end product would be measured against. I spent around 15 minutes showing him the control panel, SDK, technical documentation and other aspects of the old product. I could have gone on for hours, but he was in a rush to talk to other teams.
I didn’t know what to make of that visit, but a month later the new product was shut down. The old product was the center of attention again.
That ex-Google guy quit. A few key backend developers of the new product quit in protest. The Java code rested in peace in some GitHub repo for eternity. The rest of us were merged into what was remaining of the old team.
I learned a ton about maintaining and refactoring legacy systems. My respect for the team behind the old product grew a lot. Turns out, none of them were happy with a total rewrite but I guess the halo effect was just too strong! 😇
Sometimes I wonder:
Did I singlehandedly kill that product?
Nah! I think the new head of product had seen this kind of tech gamble before and didn’t have the appetite. Even if I did manage to kill the product by being transparent about the vanity of the rewrite, I probably saved the company millions of SEK.
I don’t know how the decision to rewrite started but I’d imagine it went something like this behind closed doors in the HQ:
“The identity platform is what connects the user data across all our brands, but most of the team quit” —Said a hypothetical senior director
After some awkward silence, some genius screamed:
How hard can that be? It’s just a login screen. Here’s my plan: we hire a new team and put them in charge of that product. In fact, I just know the guy. He’s from Google…
And after hiring the Google guy, he tried to look into the code and said:
🤮 Get that PHP away! I can’t even read it. We’re gonna rewrite in Java.
The initial plan changed from maintaining the system and fixing bugs to a total rewrite! A few meetings later, he was given 6 months to solve this problem.
Challenge accepted!
He assembled a new team of rockstars composed entirely of new developers. No one talked to the old team. As far as the company was concerned, it was a dead product at the end of its lifecycle.
The new team did the best it could, but 6 months was too little and, without proper product management, it had already burned too much time playing with Kinesis and whatnot.
Did the company gamble? Yes. But this was far from the only case. There were multiple other rewrites. There were too many new people with too many new ideas to shine.
Eventually the company faced serious challenges and broke into two halves (I’ve written about that period here). Last time I checked, another company was devouring what was left of it. Sad, but if you’ve read the end of my last post, you know that the world of business is no joke and gambling is certainly not risk-free!
Tech debt vs tech gamble
Let’s compare some aspects to distinguish tech debt from tech gamble:
Payment intent
Tech Debt: postponed to the future
Tech Gamble: paid in advance hoping for a return in the future
Justification
Tech Debt: tactical: “we need to cut some corners to meet the deadline with the current bandwidth” (deadline is often made up)
Tech Gamble: strategic: “if we do this now, we’re going to be future-proof” (future state is often a subjective guess)
Symptom
Tech Debt: the capabilities of the technical solution lag behind the product requirements
Tech Gamble: the product requirements lag behind what the technical solution is capable of
Root cause
Tech Debt: poor engineering, ambitious product management
Tech Gamble: ambitious engineering, poor product management
Mitigation
Tech Debt: start a payment plan
Tech Gamble: start simplifying and removing cruft