Tuesday, October 1, 2019

The Single Worst Mistake any Software Product Based Company Can Make

I am a longtime follower and reader of Joel Spolsky. Way, way back in 2000, he penned what has become a seminal blog post.

Things You Should Never Do, Part 1

It should be required reading for all leadership teams in all software companies, especially the small ones, because those are the ones most likely to commit hara-kiri in the way described in the above post. I will quote it.

We’re programmers. Programmers are, in their hearts, architects, and the first thing they want to do when they get to a site is to bulldoze the place flat and build something grand. We’re not excited by incremental renovation: tinkering, improving, planting flower beds. There’s a subtle reason that programmers always want to throw away the code and start over. The reason is that they think the old code is a mess. And here is the interesting observation: they are probably wrong. The reason that they think the old code is a mess is because of a cardinal, fundamental law of programming: It’s harder to read code than to write it. This is why code reuse is so hard. This is why everybody on your team has a different function they like to use for splitting strings into arrays of strings. They write their own function because it’s easier and more fun than figuring out how the old function works.

Here's something I've seen happen time and time again in my career.

Manager Dave:   Steve, why did we ship version 3.2 and break all our customers' ability to do business again? I thought once we hired QA people that would stop happening.
Team Lead Steve (haggard, has had 3 hours of sleep):   Um, Dave, the regressions in 3.2 were caused by the requirement in that release to rewrite a core module of our system to please one of our customers, and that broke all three hundred installed customers.
Manager Dave:  Can you explain to me why that is?
Team Lead Steve:  Goes on to explain the concept of technical debt, makes reference to configuration spaces (parameter 310 is on, parameter 311 is off, parameter 312 is set to X, parameter 313 is set to Y).
Manager Dave (cradling his head in his hands): So what do we do, Steve?
Team Lead Steve:  Well, we could rewrite everything, get rid of all the technical debt, and refuse to add parameters to turn features on and off in 900 ways.
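
To make Steve's point about configuration spaces concrete, here is a minimal sketch of how quickly the number of distinct configurations grows. The parameter names and their allowed values are hypothetical, loosely modeled on the dialogue above, and are not taken from any real product.

```python
from itertools import product


def count_configurations(option_counts):
    """Multiply together the number of values each parameter can take."""
    total = 1
    for count in option_counts:
        total *= count
    return total


def enumerate_configurations(parameters):
    """Yield every configuration as a dict. Only practical for a few parameters."""
    names = list(parameters)
    for values in product(*(parameters[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical parameters and values, for illustration only.
params = {
    "param_310": [True, False],
    "param_311": [True, False],
    "param_312": ["X", "Z"],
    "param_313": ["Y", "W", "V"],
}

small = count_configurations(len(values) for values in params.values())
all_configs = list(enumerate_configurations(params))

# Three hundred independent on/off switches: 2 ** 300 combinations.
big = count_configurations([2] * 300)

print(small)  # 2 * 2 * 2 * 3 = 24
print(big == 2 ** 300)
```

Four small parameters already give 24 combinations to test. A few hundred switches give a number no QA team will ever cover, which is why every new per-customer parameter makes the next regression more likely.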

In my view, the thing that saves people from making the single worst mistake (the ground-up rewrite, discussed in Joel Spolsky's blog post) is that most people realize the economic disaster before they commit all their resources to it. Wait, we can't ship any minor updates to our customers for THREE YEARS? We won't HAVE any customers in three years.

Another way the rewrite conversation comes up is when it's time to grow your team from four Delphi devs to five, and you can't find one who will move to Spooksville, Indiana, and join your local team, and you aren't going to add remote team members. If you moved to C# and rewrote, the thinking goes, all your problems would be over: there are reams of unemployed C# developers everywhere, and you could probably grow your team to 20 people, no problem. You just need to rewrite in C# the 9 million lines of code that are currently in Delphi.

There are still teams running on Delphi 7 and Delphi 5 who think they can't manage the transition to modern Unicode Delphi, but who believe they can successfully rewrite everything in C#.

Note that there ARE times when I think you can rewrite, ground up, same language or different, same database or different. They are when all the following stars align:

1. When you can keep your existing team working on your classic product, start a parallel effort, and afford to develop both in parallel for the years to a decade it will take before the new product can replace the old.

2. When you don't just think you know your requirements, your designs, and your use cases, but you actually do know them. Most of the time, confirmation bias is telling you that you know this stuff, but guess what: you're probably wrong here.

If the economics of a rewrite don't kill you, the next thing that's gonna kill you is the second-system effect. "Second-system effect" is a term coined by Fred Brooks, and it is the title of one of the essays in his seminal book "The Mythical Man-Month", which every senior developer or software team leader/manager should read.

Joel says the same thing thus:

It’s important to remember that when you start from scratch there is absolutely no reason to believe that you are going to do a better job than you did the first time. First of all, you probably don’t even have the same programming team that worked on version one, so you don’t actually have “more experience”. You’re just going to make most of the old mistakes again, and introduce some new problems that weren’t in the original version.

I have noticed that old code is scarred with bug fixes. We don't like the look of it; we wish it looked clean, rather like pseudo-code. The thing is, the real world has lock conflicts, network timeouts, retries, user errors, Windows Defender locking and even altering your on-disk files, disk performance and corruption issues, video card bugs, and USB driver glitches. Need I go on?
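
Here is a small, made-up illustration of what such a scar looks like. The function names, the retry count, and the delay are all hypothetical. The first version is the pseudo-code we wish we had, and the second is roughly what it tends to become after a customer reports that a virus scanner briefly locked the file.

```python
import time


def read_settings_clean(path):
    """The version we wish we had: reads like pseudo-code."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def read_settings_scarred(path, attempts=3, delay_seconds=0.1,
                          opener=open, sleep=time.sleep):
    """The version that survived production.

    Retries when the file is temporarily locked by another process,
    which surfaces in Python as PermissionError. The opener and sleep
    parameters exist only so the retry path can be exercised in tests.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            with opener(path, encoding="utf-8") as f:
                return f.read()
        except PermissionError as error:
            last_error = error
            # Wait before trying again, but not after the final attempt.
            if attempt < attempts - 1:
                sleep(delay_seconds)
    raise last_error
```

The second function is uglier, but every extra line in it records a failure that really happened. A rewrite that starts from the clean version has to rediscover each of those failures the hard way.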

The real world is a mess, and if your product works at all for your customers, it's amazing that you got that far. Don't blow it now. Fix your bugs, and clean your messes up. And you'd better stop digging new holes (increasing technical debt) before you can expect to see the old technical debts get paid off.

One of the leading causes of technical debt is crazy features for ONE customer.   You could rethink when and where you add custom hacks for one customer who is loud and persistent, and perhaps could accomplish their goals another way without breaking your product in half.

Careful thinking about risk management and careful heads-up planning are necessary to avoid killing your software products or your whole software company.

Don't rewrite your code from the ground up.