Tuesday, February 09, 2021

When is a rewrite the right answer?

Whether a rewrite is the right answer is a perennial topic among developers. The urge to rewrite is a natural instinct; my own thoughts usually run along the lines of "I could write this so much better if I just started again". Diving into the middle of someone else's unfamiliar architecture to change one thing can be quite overwhelming, because the code is rarely written the way we would naturally think about the problem. So we imagine an alternate universe where the code is designed exactly the way we are used to thinking, and in that dreamland we make the change effortlessly and move on almost instantly.

The downside of a rewrite is all the details. It takes a huge amount of effort to iron out all the little things that make up the last 90% of the project. We don't think about this when fantasizing about a rewrite, because detailed estimates require breaking down the problem quite a lot and that is effort in itself.

None of this is new; it has been discussed repeatedly. c2's take at https://wiki.c2.com/?RewriteCodeFromScratch lists some reasons for and against a rewrite. The bad reasons include "We don't want to read it to figure out how it works" and "Not Invented Here". Joel Spolsky's most famous Joel on Software article, Things You Should Never Do, discusses Netscape's decision to rewrite its browser from scratch (which ultimately resulted in Internet Explorer taking over).

As in all things software, however, the true answer is "it depends". Sometimes, rewrites can work. Herb Caudill starts from the Netscape example but adds five other case studies at https://medium.com/@herbcaudill/lessons-from-6-software-rewrite-stories-635e4c8f7c22 and concludes that throwing away your current product is a waste of value but you can still innovate by building something else next to it.

The BM Project

In our case, the considerations were different again. Our BM project started as a C# console application that ran on users' machines. This version was never released because we were nervous about maintaining it and the interfaces it used; it was built as insurance while we worked on the web-hosted version, which consisted largely of C# code ported from the console application to run in AWS Lambda. C# was chosen because it was the language the original team was familiar with.

A few months later, after the webapp had proved successful, the product owner asked us to integrate it into a larger product called XP. BM was already using XP as an endpoint, so it made sense for users to reach BM from inside XP rather than typing their XP password into a strange app.

This is where we should have done the rewrite. The new team had already written three microfrontends with Python Lambda backends and was comfortable with that technology stack. We ended up having to remove about half of the application (as XP already had an ordering system that the standalone app had to manage) and make significant adjustments to the other half of the server-side code.

Instead of doing a rewrite, we listened to the conventional wisdom and attempted to make adjustments to the existing codebase. The problems we decided to live with were:

  • C# Lambdas have (or had at the time) terrible cold-start characteristics: on a cold start, the most underpowered (128 MB) Lambdas time out before the C# runtime has even come online
  • each endpoint of the API was coded as a separate Lambda. Every new endpoint also needed its own Terraform configuration, about 30 lines of error-prone boilerplate. Deploying to a developer test environment took about ten seconds per Lambda, so this also added a few minutes to every coding cycle
  • C# was unfamiliar to the new team, so we spent a fair bit of time learning a new language. The language itself was less of a problem than the tooling: getting Docker running to compile the project with .NET Core, figuring out how to update dependencies on Linux, and the various other minor issues that almost always come with a new technology stack
  • the C# code had never really been tidied up since the BM project started, and used a little too much inheritance to be really clear. It hadn't got as far as full spaghetti (the limited scope of the project saved us there), but it took a bit of effort to sort out neat logging and authentication patterns that were watertight.
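Much of the Terraform boilerplate in the second point follows from having one Lambda per endpoint. A common alternative, and roughly the shape a Python version might take, is a single Lambda that dispatches on HTTP method and path, with logging and authentication applied by a decorator rather than through inheritance. This is a minimal sketch, not BM's actual code: it assumes an API Gateway REST API proxy integration (events carrying `httpMethod`, `path` and `headers`), and the routes, the `Authorization` header check and every function name here are hypothetical.

```python
import functools
import json
import logging

logger = logging.getLogger("bm")  # hypothetical logger name
logger.setLevel(logging.INFO)

# Hypothetical route table: (HTTP method, path) -> handler function.
ROUTES = {}


def route(method, path):
    """Register a handler for one endpoint in the shared route table."""
    def register(func):
        ROUTES[(method, path)] = func
        return func
    return register


def _response(status, body):
    # Response shape expected by an API Gateway proxy integration.
    return {"statusCode": status, "body": json.dumps(body)}


def authenticated(func):
    """Wrap a handler with uniform logging and a placeholder auth check.

    The check only tests that an Authorization header is present; a real
    system would validate the token against its identity provider.
    """
    @functools.wraps(func)
    def wrapper(event):
        # Header names may arrive in any case, so compare lowercased.
        headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
        if not headers.get("authorization"):
            logger.warning("rejected %s: missing Authorization header", func.__name__)
            return _response(401, {"error": "unauthorized"})
        logger.info("handling %s", func.__name__)
        return func(event)
    return wrapper


# Decorators apply bottom-up, so the route table stores the wrapped handler.
@route("GET", "/items")
@authenticated
def list_items(event):
    return _response(200, {"items": []})  # hypothetical endpoint


@route("GET", "/health")
def health(event):
    return _response(200, {"status": "ok"})  # unauthenticated on purpose


def lambda_handler(event, context):
    """Single Lambda entry point: look up the handler by method and path."""
    handler = ROUTES.get((event.get("httpMethod"), event.get("path")))
    if handler is None:
        return _response(404, {"error": "not found"})
    return handler(event)
```

With one function, the Terraform side would typically shrink to a single Lambda resource plus a catch-all API Gateway route, and a developer deployment uploads one artifact instead of one per handler. The trade-off is coarser control: every endpoint shares the same memory setting and IAM role.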

In the end, the project took us about three months. We hadn't anticipated it taking that long, and if we had known how fiddly the C# was going to be, we would almost certainly have cloned one of the existing Python projects and translated the C# API handlers into Python.

As it stands, that translation is now on the backlog, and it has been made harder by the number of handlers we ended up adding to the C# codebase. The problem has become particularly acute now that maintainers from other teams are getting involved with the BM project and have to figure out the Docker build environment, deploy many Lambdas to AWS at once, and learn the syntax and tooling around C# itself.

There are new types of actions planned for XP, and the project for those will hopefully start by translating the existing BM handlers (and their tests!) into Python, for the sanity of future maintainers.