Technical Debt: The Myth at the Heart of the IT Labyrinth
Or, Ben-Hur in the C-Suites
*Swwwwwit*
The crack of the whip hitting the developer’s back echoes through the meeting room, icy and wet. The senior Java engineer grits his teeth. He knows that he must go through this, that it’s the only way to get the credits he needs to finally get rid of this junky authentication micro-service using TLS 1.1, whose security flaws have gone far beyond the ranks of a traveling circus to join the pantheon of P.T. Barnum and Harry Houdini.
He moans, “It’s way past time for us to finally pay our Technical Debt.” The cord, already dripping with blood, rises again to hang over the engineer, his tired back falling under the bored eyes of the Financial Director… Will this meeting give him the three days of development time the team needs to get rid of the infectious wart lodged within their system?
Feeling a bit of déjà-vu? Old memories of the technical team flagellating themselves in front of the executive committee, confessing the sin of technical debt in order to earn themselves a bit of time?
Let’s talk about the real sinners in a conversation about how to finance technical development:
Software needs to make a real return on investment.
The accountants have done a large part of the damage on their own [1], notably by letting this idea of “technical debt” take over, an idea that doesn’t, in my opinion, correspond to reality. So instead let’s talk seriously about the characteristics and consequences of technical investment. Let’s knock off, for good, the urban legend and say:
Technical debt doesn’t exist.
First of all, it doesn’t exist because a debt is something that is contractually owed to someone, a creditor, and in this case there’s no one who’s getting a loan. If there’s no creditor and no debtor, well, the name just doesn’t make any sense.
Deleting code is an investment
Code isn’t a material good; it’s not a building that we can depreciate. Yes, I know, it’s not cool to work on things that aren’t susceptible to fancy financial maneuvers, but that’s how it is.
Writing a line of code is taking a risk in order to provide a service — ideally a service that’s faster, better, and costs less. In that context, we don’t really pay too much attention to the aesthetics of it all: having a section of the tubing, a section that’s absolutely vital, quickly installed will always be preferable to thoughtfully considering all your different tubing options as the water keeps gushing out onto the floor.
The best thing that can happen to a coder, and to a project, is having that blessed moment when they DELETE code. Because then that code is no longer needed, because we’ve just replaced it with something much better (i.e., by much less code).
And let’s not forget that each line you eliminate will save you some cash and bazillions in efficiency.
The accounting approach to depreciation over X number of years is thus completely inadequate when it comes to talking about code. Throwing out code that is no longer needed, because it was written at a certain moment to respond to a certain need, and taking the time to rewrite it because it’s no longer responding to that need, is a good investment.
This all leads to other phrases that we hear all the time, and that I’d like to also banish from the mouths and ears of all coders and their managers. For example, “We’re just creating more technical debt *sigh*”, or “There’s going to come a day when we have to pay our technical debt!” Most of the time, they’re said in response to pressure from the C-Suites, asking engineers to decide among different priorities.
But this mental pressure around the myth of technical debt doesn’t have any real benefit for the business; on the contrary, it’s extremely harmful.
Knocking ourselves for having created technical debt and living in sin does absolutely no good for anyone. It just puts more stress on teams and individuals who already have more than enough of that. It’s much healthier to move forward while accepting the controlled risk of technical investment.
Another way of thinking about technical investment comes from actually knowing the profession that you’re coding in, knowledge that takes a long time to acquire. It’s about investing in the future ability of a team to solve the problem better, once that team has the specific knowledge needed to understand that particular problem.
A good coder in the banking industry is first going to need to spend time understanding the job, and so become a banker who knows how to code. It’ll probably take them 2 years, and their value will then have a whole new dimension that’s not only linked to the code they have produced. At that point, they’ll probably rewrite that code from zero, much more efficiently, producing via iteration a solution that better fits the banking industry. If they end up being an expert in the domain, able to quickly and efficiently explain exactly what the industry does, that’s close to sublime.
We’ve seen this kind of thing happen at Clever Cloud, a few times even, like for our monitoring system, which became Clever Cloud Metrics, or our load balancing system, which was rewritten in two days after the v1 took 8 months (and was buggy to boot, but we’d approached it all wrong from the beginning). They’re examples that really make me feel sure when I say that the real value is found in the team.
That’s also what lets you drive forward tech project design today, especially in terms of user experience — it’s how you are better informed earlier about investment decisions, favoring rapid prototyping and iteration that help limit risk.
A moralizing approach to technical debt would logically push you to fire a team that failed the first time when faced with an industry- or client-specific problem. But the experience acquired by that team would probably push a decision-maker who had donned the glasses of technical investment to double down: the risk of a second failure is much lower with that team, all of whom have learned about both themselves and the industry they’re working in over the course of that initial failure. They’ll do better and likely succeed, as they now know the pitfalls linked to the activity they’re trying to improve.
We invest in code like it’s real estate (kind of)
In reality, there is no technical debt, just investment decisions that are aimed at short-, medium-, or long-term objectives. Let’s break it down a bit:
Being CTO means making decisions that let you answer 3 types of simple questions: “Where are we sleeping tonight?”, “Where are we going to sleep in 5 years?”, and “What kind of real estate portfolio am I going to leave to my children?”
A little discount Quechua(™) tent for two, the most rustic of all shelters, one that goes up real fast… well, that can be very useful, especially when it’s almost midnight. That’s the kind of prototype that’s pretty ugly, but serves as a proof-of-concept to get you some financing or otherwise generate that early bit of cash that can turn a “project” into a real product that you show to clients (an MVP for a startup, a POC in a corporation, something thrown together at a startup weekend…). A more solid long-term investment, such as a Parisian apartment in a nice Haussmannian building, takes more time to create; it’s not the kind of thing that goes up in one night. To get through the question of where you’re sleeping tonight, you buy the little tent, which you’ll end up selling (at a loss) to some other lost soul or just tossing out when it’s completely worn out.
When you are ready for a whole building, you don’t build it on the same foundations where you put your tent: many have tried, but it always turns into a tent city, and the user experience goes from bad to worse (not recommended).
And a natural evolution in your development from tent to beautiful building will probably be in putting together a nice modular home, where you can live as the walls go up on your final home.
It’s easy to identify software, or bits of software, that have tons of long-term value: GCC, Linux, glibc (all of GNU, for example), the JVM, FFmpeg… And we can well assume that Photoshop still has quite a few lines that were there at the beginning. DHH spoke recently about an old script, more than 10 years old, that’s still in production at Basecamp, buried somewhere in a minor element like “billing clients”.
In our own humble experience, I can easily find some bits from the Clever Cloud console that were coded 9 years ago while sitting next to a pool in Italy, and others that (aside from some small updates) haven’t changed in quite a long time (with 400 micro-services, I figure it’s normal). But that doesn’t mean they’re going to stay there: almost all of them are on the roadmap to be decommissioned, which doesn’t happen by simply iterating on them, but instead with a complete rewrite.
Note: Here I’m talking about not making any iterative updates in the code EXCEPT the regular updates. To keep the housing metaphor going, your tent or modular home, just like your beautiful Haussmannian apartment, sometimes needs a cleaning. You have to take care of your home and make sure that it stays habitable, otherwise it’ll become unlivable and unusable. Code is the same: we at Clever Cloud do regular cleanings and updates on all projects at least monthly, at minimum for security and key features.
There is also one particular case: “migration” software, written over the course of several days but only executed once, during The Migration. Often considered as an investment loss, it’s overlooked, under-financed, and generally disrespected when it comes to resource allocation — but in fact, it’s a key part of the system. A migration that goes wrong, losing data or causing churn (with up to 10–15% of clients who leave), or that simply drags out with updates and corrections that take weeks or months (taking even more time from the dev team), is a giant source of hidden losses. If you judge software by its cost per use and try to apply that to a migration, you’re using the wrong metric.
Instead, look at how a development hour is related to the revenue that it safeguards, generates, or saves. And that one’s a brief, but very profitable, entry.
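As a rough illustration of that metric, here is a minimal sketch in Python. The function name `value_per_dev_hour`, its parameters, and every figure in the example are hypothetical and chosen only to show the arithmetic; they do not come from any real migration.

```python
def value_per_dev_hour(dev_hours, revenue_safeguarded=0.0,
                       revenue_generated=0.0, cost_saved=0.0):
    """Relate development time to the money it safeguards, generates, or saves.

    All inputs are estimates supplied by the caller (hypothetical helper).
    """
    if dev_hours <= 0:
        raise ValueError("dev_hours must be positive")
    return (revenue_safeguarded + revenue_generated + cost_saved) / dev_hours


# Hypothetical example: a migration tool written in 40 hours and run once.
# Assume a botched migration would cost 10% of 500,000 in yearly revenue.
annual_revenue = 500_000
churn_avoided = 0.10
migration_value = value_per_dev_hour(
    dev_hours=40,
    revenue_safeguarded=annual_revenue * churn_avoided,
)
print(f"Value per development hour: {migration_value:.0f}")
```

Judged by cost per use, this tool looks terrible since it runs exactly once; judged by the revenue each hour protects, it may well be one of the best lines in the portfolio.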
The CTO, managing your investment portfolio
Just like a beautiful Parisian apartment building will lose a great deal of value if there are no investments in regular maintenance, you also need to maintain a tech product so that it keeps up (or improves) its condition.
Closing a previous line of investment, which is to say eliminating code, should bring a light-hearted sense of gain, as doing so is extremely profitable.
On the occasions when you’re losing something, it’s even more important to close things off quickly: an investment that’s going wrong becomes harder and harder to make up for. It’s OK to make investment mistakes; it’s not OK to be stubborn just because no one wants to be the one to shut things down and take the loss. Our first primary database for Clever Cloud was Riak. Pretty quickly, we figured out that PostgreSQL was better for our use case (we’re not evaluating the technology in any absolute sense here), and so we didn’t hesitate to throw everything out and start writing again from zero.
It was a healthy technical investment decision. You can’t just buy more shares of Google to make up for other portfolio mistakes that are pulling down the rest of a balanced portfolio.
Note that technical investment decisions aren’t just about architecture or code; they can also be about infrastructure. Essentially, many, many so-called “devops” technologies are based on a particular vision of scripting, of a “binary” vs. “semantic” approach (note to self, do an article about all that). These are always super-short-term investment decisions, even if they seem to be more than that because we decided to use those complicated names. It’s a bit like if we had sealed our bed in the tent onto the big building’s foundations, so that once the building’s actually done we can’t move it wherever we want.
There’s a relatively common atavism out there, poisoning developers’ code with constraints arising from production that are more or less pertinent: technological limitations, the need to deploy with chef/puppet/docker/capistrano/jenkins, an abstruse system for managing secrets, or improbable network settings that make the application completely inoperable without a specific topology… It’s important to understand that everything dealing with infrastructure and operations is an important but oftentimes short-term investment (assuming that isn’t your core value add). Do note, I’m talking here about tooling in production, not infrastructure purchasing, which is oftentimes a sound financial decision (you do know that we sell Clever Cloud on-premise, yes?).
In brief (yeah, yeah),
To me, technical debt seems like a bad way to look at IT financing and investment, since it leaves the idea that every bit of software has within it a certain debt carelessly placed there by the team. It’s not true. Every software investment, just like other industrial investments such as in machines or buildings, needs to be seen as an investment line where we calculate a real-time return on investment (ROI, if you’re feeling saucy), that includes the cost of acquisition (the initial purchase price, the internal conception and writing of code; this is amortized more or less quickly) and the maintenance cost (which is unavoidable, as otherwise we lower the total value of the investment).
A good investment comes when the (cumulative) revenues are more than the (cumulative) costs, and it’s not a bad thing to think about decommissioning code when the amortization has been achieved.
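To make the break-even idea concrete, here is a minimal sketch in Python. The function `break_even_period`, the choice of a fixed period (a month, say), and all the numbers are assumptions made for illustration, not an accounting method described in this post.

```python
def break_even_period(acquisition_cost, maintenance_costs, revenues):
    """Return the 1-based period in which cumulative revenue first exceeds
    cumulative cost (acquisition plus maintenance), or None if it never does.

    Hypothetical helper: one maintenance cost and one revenue per period.
    """
    if len(maintenance_costs) != len(revenues):
        raise ValueError("need one maintenance cost and one revenue per period")
    total_cost = acquisition_cost
    total_revenue = 0.0
    for period, (cost, revenue) in enumerate(
            zip(maintenance_costs, revenues), start=1):
        total_cost += cost        # maintenance keeps accumulating
        total_revenue += revenue
        if total_revenue > total_cost:
            return period
    return None


# Hypothetical investment line: 100 to build, 10 per period to maintain,
# 40 per period in revenue.
print(break_even_period(100, [10, 10, 10, 10], [40, 40, 40, 40]))  # prints 4
```

Once that period has passed, the line has paid for itself, and decommissioning the code becomes a question worth asking rather than a confession.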
And let us pass discreetly over those cases where the lack of any full measurement of costs and revenues blocks healthy technical investment decision-making.
Any negligence in the maintenance costs will bring on later costs to get things moving again (often referred to as technical debt), and that’s why you always have to keep looking forward — as the saying goes, “every line of code that’s in production is a line of code that has to be maintained.”
Finally, like with all investments, sometimes we make a bad one, and the only thing to do is to accept it and cross that line off the list. It doesn’t do you any good to hold onto a bunch of Thomas Cook stock, so focus your attention and energy on a more pertinent issue.
Why is this all so important? Because it completely changes the approach to technical development: it stops looking at code as something carved in stone, something eternal or marked with the red brush of “technical debt”. If you demand code that is eternal and free of any “debt”, you’re demanding that every technical manager be able to predict the future perfectly, rather than making investments that are more or less risky and that are adapted to the situation.
Doing so would be to deny the fact that this is an industry, that our work is to design tools that can substitute for the human brain, and that this work includes the maintenance, evolution, decommissioning, and recycling of those tools.
A hearty thanks to Yann Heurtaux for his help in writing the French version of this post, to Kyle Hall for translating it, and to Yoann Grange for his ideas and graphics.
[1] I’ve talked on stage many times about this. See e.g. “Why and how bookkeepers f***d up IT” — Devops Days 2016, to be shared extensively with all our spreadsheets & powerpoint fans.
Quentin ADAM is Clever Cloud’s CEO.