Tuesday, February 09, 2021

When is a rewrite the right answer?

Whether a rewrite is the right answer is a perennial topic among developers. It's a natural instinct -- my thoughts are usually something along the lines of "I could write this so much better if I just started again". The difficulty of working with someone else's unfamiliar architecture, written in a way that probably doesn't come naturally to someone who dives into the middle of it and tries to change something, can be quite overwhelming; we imagine an alternate universe where the code is designed exactly the way we are used to thinking, and in that dreamland we effortlessly make the change we want to make and move on almost instantly.

The downside of a rewrite is all the details. It takes a huge amount of effort to iron out all the little things that make up the last 90% of the project. We don't think about this when fantasizing about a rewrite, because detailed estimates require breaking down the problem quite a lot and that is effort in itself.

None of this is new; it has been discussed repeatedly. c2's take at https://wiki.c2.com/?RewriteCodeFromScratch lists some reasons for and against a rewrite. The bad reasons include "We don't want to read it to figure out how it works" and "Not Invented Here". Joel on Software's most famous article is called Things You Should Never Do, and discusses Netscape's decision to rewrite (ultimately resulting in Internet Explorer taking over).

As in all things software, however, the true answer is "it depends". Sometimes, rewrites can work. Herb Caudill starts from the Netscape example but adds five other case studies at https://medium.com/@herbcaudill/lessons-from-6-software-rewrite-stories-635e4c8f7c22 and concludes that throwing away your current product is a waste of value but you can still innovate by building something else next to it.

The BM Project

In our case, there are different considerations again. Our BM project started as a C# console application that ran on users' machines. This version was never released because we were nervous about maintaining it and the interfaces it used; it was built as insurance while we worked on the web-hosted version, which consisted largely of C# code from the console application ported to run in AWS Lambda. C# was chosen because that was the language the original team was familiar with.

A few months later, after the webapp had been successful, the product owner asked us to integrate the webapp into a larger product called XP. BM was using XP as an endpoint anyway; it made sense for users to be able to access it without having to type their XP password into a strange app.

This is where we should have done the rewrite. The new team had already written three microfrontends with Python Lambda backends, and was comfortable with that technology stack. We ended up having to remove about half of the application (as XP already had an ordering system that the standalone app had to manage) and make significant adjustments to the other half of the server-side code.

Instead of doing a rewrite, we listened to the conventional wisdom and attempted to make adjustments to the existing codebase. The problems we decided to live with were:

  • C# lambdas have (or had at the time) terrible cold-start characteristics; the most underpowered (128MB) lambdas time out before the C# runtime comes online on a cold start
  • each endpoint of the API was coded as a separate lambda. As we added endpoints, we also had to add Terraform configuration for each one, which was about 30 lines of error-prone boilerplate; each deployment to a developer test environment took about ten seconds per lambda, so this also blew out our coding cycle time by a few minutes
  • C# was unfamiliar to the new team, so we spent a fair bit of time learning a new language. The language itself wasn't a problem so much as the tooling -- getting Docker running to compile the project on .NET Core, figuring out how to update dependencies on Linux, and various other minor issues that almost always come with a new technology stack
  • the C# code had never really been tidied up since the BM project started, and used a little too much inheritance for things to be really clear. It hadn't got as far as full spaghetti (the limited scope of the project saved us there), but it took a bit of effort to sort out neat logging and authentication patterns that were watertight.

In the end, the project took us about three months. We hadn't anticipated it taking that long, and if we had known how fiddly the C# was going to be we would almost certainly have chosen to clone one of the existing Python projects and translate the C# API handlers into Python.

As it stands, such work is now on the backlog, and is made more difficult by the number of handlers we ended up adding to the C#. The problem has become particularly acute as maintainers from other teams are starting to get involved with the BM project, and are having to figure out the Docker build environment, how to deploy many lambdas to AWS at once, and the syntax and tooling around C# itself.

There are new types of actions planned for XP, and the project for those will hopefully start by translating the existing BM handlers (and their tests!) into Python, for the sanity of future maintainers.

Monday, July 20, 2020

web auth by mistake

A brief and inaccurate history of application security, for learning purposes.

Facebook and Google's Gmail are used as example client and server respectively, but many companies both serve and consume OAuth.
  1. In the beginning, there were usernames. You connected to a computer (either with a keyboard or over a network) and this created a session, where the computer (server) set aside some working memory for your use. Unfortunately, once someone found out your username, they could do things you didn't want them to do. So a public identifier needed an associated secret (a password).
  2. Unfortunately, password databases were leaked, and the passwords stolen. So hashing was introduced, where even if the database leaked, the passwords could not be known. (A one-way hash function takes a password and reliably produces the same "hash"; there's no easy way to go the other way, from a hash to a password. You can hash the user's input, and if it is the same as the stored hash, it must have been the right password).
  3. Unfortunately, rainbow tables were invented: huge precomputed lists of passwords and their hashes. So salts were used, where extra random data is combined with each password before hashing, so that precomputed tables no longer match. (There is a small sketch of salted hashing after this list.)
  4. The internet was invented. Users logged in with passwords to create sessions. Unfortunately, Internet Service Providers and other intermediate networks proved untrustworthy, so HTTPS was introduced. Servers were required to identify themselves with a certificate signed by a trusted company such as Thawte or Verisign, and the connection was encrypted. Browsers are distributed with the certificates of the trusted companies, known as Certification Authorities; a signature made with the CA's private key (known only to a very carefully guarded server) can be verified using their public key.
  5. Unfortunately, users didn't like typing their password into every page (and later, dodgy javascript would fake login pages). So cookies were invented: a session identifier stored in the browser and sent automatically with each request, so the password only has to be typed once.
  6. Javascript was invented, and was able to read the cookies. Unfortunately, third-party javascript on web pages started stealing cookies.
  7. HttpOnly cookies were invented, which were not readable by JavaScript and were only ever sent back to the site that created them. This was good.
  8. Unfortunately, sites such as Facebook wanted to read data from sites such as Gmail. They asked for users' passwords, and unfortunately some sites stole the users' accounts. This created a lot of work for the Gmail account recovery team (and other problems). So the idea of access tokens was invented, where the user could give Facebook a Gmail access token instead of their Google password. An example access token might be FACEMAILACCESS303.
  9. Unfortunately, some sites did dodgy things with their access tokens. So the idea of scopes was invented, where access tokens have restricted permissions. So, for example, the user could give Facebook an access token that can read Gmail contacts but not delete (Myspace) emails.
  10. Unfortunately, users didn't like manually issuing access tokens. So the idea of browser redirects with codes was invented, where Facebook would send the user to Google and then Google would ask the user if they wanted to allow access and then redirect back to Facebook with the access token in the url.
  11. Unfortunately, sites such as Fakebook would pretend to be Facebook and would ask to be redirected back to them instead. So OAuth was invented, where sites would register for OAuth and get a client ID, such as FACE01, and that client ID would only work for certain redirect URLs (www.facebook.com).
  12. Unfortunately, Fakebook could use Facebook's client ID. So the idea of a client secret was invented, which meant that access tokens would only work if the client secret was also provided, and Facebook would only use the secret to prove its identity to Google (or similar). The client secret is Facebook's Google password, for example, FACESECRET.
  13. Unfortunately, it was not safe to expose the client secret to the browser. So the idea of an authorization code was invented, which could only be used a single time to create an access token. The way this works is that Facebook asks the user to go to Google to create an authorization code (such as FACEAUTH303) for the Facebook client ID (FACE01); Facebook is then given that code and uses it with its secret to create a Google access token: sending FACEAUTH303 along with FACESECRET to Google might yield the access token FACEMAILACCESS303.
  14. Unfortunately, lots of sites leaked their access tokens. So the idea of a refresh token was introduced, so that losing an access token was less of a problem. The refresh token can only be used to create access tokens, and is not used to make normal requests, so it is less exposed. The access tokens that leak can then expire after a few minutes or hours, and the refresh token is used to create more access tokens without having to prompt the user again.
  15. Then, mobile apps were invented. They received their authorization codes via redirects between apps, which were unfortunately controlled by the apps themselves (not the server providing the redirect url). So the client secret couldn't help, because it would leak the first time it was used. Instead, the idea of PKCE ("pixie") was introduced, where a fresh secret is generated for every authorization request -- Facebook generates it, sends only its hash with the request, and then provides the unhashed value when redeeming the authorization code. This makes intercepting authorization codes on their way back to the requester pointless. (The second sketch after this list shows roughly which secrets travel where.)
  16. Unfortunately, our story stops there because we've run out of history. Welcome to the present!
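
As promised above, here is a minimal sketch of salted password hashing in Python. Treat it as an illustration of the idea rather than a recipe -- a real system should use a dedicated password-hashing function such as bcrypt, scrypt or Argon2 rather than plain SHA-256.

    import hashlib
    import secrets

    def hash_password(password):
        # A fresh random salt per user means identical passwords produce
        # different hashes, so a precomputed rainbow table is useless.
        salt = secrets.token_hex(16)
        digest = hashlib.sha256((salt + password).encode()).hexdigest()
        return salt, digest  # store both alongside the username

    def check_password(password, salt, stored_digest):
        candidate = hashlib.sha256((salt + password).encode()).hexdigest()
        # compare_digest avoids leaking information through timing differences
        return secrets.compare_digest(candidate, stored_digest)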
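
And here is a rough Python sketch of the authorization-code-plus-PKCE dance from steps 13 and 15, reusing the made-up values from the story (FACE01, FACESECRET, FACEAUTH303). The endpoint URL, redirect URL and scope name below are placeholders of mine, not anyone's real API.

    import base64
    import hashlib
    import secrets
    import urllib.parse

    # Facebook generates a one-off PKCE verifier and sends only its hash
    # (the "challenge") along with the authorization request.
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    challenge = base64.urlsafe_b64encode(
        hashlib.sha256(verifier.encode()).digest()
    ).rstrip(b"=").decode()

    authorize_url = "https://google.example/oauth/authorize?" + urllib.parse.urlencode({
        "client_id": "FACE01",                                 # public client ID
        "redirect_uri": "https://www.facebook.com/callback",   # must match registration
        "response_type": "code",
        "scope": "read-contacts",                              # placeholder scope name
        "code_challenge": challenge,
        "code_challenge_method": "S256",
    })
    # The user visits authorize_url, says yes to Google, and is redirected back
    # to Facebook with an authorization code such as FACEAUTH303.

    # Facebook then redeems the code server-side, proving it knows both the
    # client secret and the original (unhashed) verifier.
    token_request = {
        "grant_type": "authorization_code",
        "code": "FACEAUTH303",
        "client_id": "FACE01",
        "client_secret": "FACESECRET",  # never exposed to the browser
        "code_verifier": verifier,
        "redirect_uri": "https://www.facebook.com/callback",
    }
    # POSTing token_request to the token endpoint would yield the access token
    # (FACEMAILACCESS303) and usually a refresh token as well.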

Takeaways

  1. Use secrets to prove identity so that you don't have to worry about fraudulent connections.
  2. Use https everywhere so you don't have to worry about insecure networks
  3. Only send client secrets to the site that issued them so that you don't have to worry about them leaking.
  4. Only store hashes of salted passwords so that you don't have to worry so much about the password database leaking.
  5. Protect secrets travelling through insecure clients (browsers or apps) so that you don't have to worry about them leaking.
  6. Keep tokens away from JavaScript so you don't have to worry about XSS attacks.
  7. There seems to be some overlap between authorization codes and refresh tokens. They are both used to protect access tokens. Both are (or can be) single-use-only, and both can be made to work only when presented with the client secret. Confusingly, authorization codes protect refresh tokens, which protect access tokens. Authorization codes seem to be more about prompting the user and validating client secrets, whereas refresh tokens are about keeping access tokens short-lived. Some flows blur the two (e.g. refresh tokens requiring client secrets to work).
  8. Use refresh tokens so you can worry less about untrustworthy browsers and apps leaking client secrets and access tokens
  9. Use authorization codes so you know the user has been asked whether they want to grant a permission
  10. Use PKCE so that you don't have to worry about the redirect process being insecure
  11. OAuth allows websites to exchange fine-grained user permissions. For example, it can allow Facebook to read your Gmail contact list but not read your emails.
  12. Prevent refresh tokens from getting stolen and reused by making them single-use. If a refresh token does get stolen and reused, make the user authorize again and stop the stolen one (and possibly its access tokens) from working. For example, if Facebook asked the user to create a second Google refresh token (via an authorization code), the first one would stop working. (The user is triggered to authorize again because the thief has made the user's copy of the refresh token invalid -- when the thief creates an access token, the server also issues an updated refresh token.) A rough sketch of this rotation logic follows the list.
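
Here is that rotation logic as a minimal Python sketch, with an in-memory dict standing in for whatever the real server stores; all the names and token values are invented for illustration.

    # token -> record; a real server would keep this in a database
    issued = {"REFRESH1": {"family": "family-42", "used": False}}
    revoked_families = set()

    def redeem_refresh_token(token):
        record = issued.get(token)
        if record is None:
            raise PermissionError("unknown refresh token")
        if record["used"] or record["family"] in revoked_families:
            # A rotated-out token came back: assume it was stolen, revoke the
            # whole family so the thief's copy stops working, and make the
            # user authorize again.
            revoked_families.add(record["family"])
            raise PermissionError("refresh token reuse detected; authorize again")
        # Rotation: mark this token as spent, then issue a new single-use
        # refresh token in the same family plus a short-lived access token.
        record["used"] = True
        new_token = "REFRESH{}".format(len(issued) + 1)
        issued[new_token] = {"family": record["family"], "used": False}
        return {"access_token": "ACCESS-for-" + new_token, "refresh_token": new_token}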

Saturday, June 27, 2020

Emotions are like event loops

Emotions are the human psychology equivalent of a computer program's main event loop.

Emotions and feelings have causes. A person experiences things in a rough analogy with how programs receive input or internal messages. The emotions then drive behaviour in accordance with what the person has been programmed to do by past experience.

Sometimes -- although not very often, as it is extremely expensive in terms of CPU time -- the rational mind gets involved and has a small amount of input to the course-of-action decision.

Most of the time, the event loop runs efficiently, relying on emotions -- simple heuristics that can process events quickly and reasonably accurately.

The rational mind is only engaged -- at great expense -- when the emotional mind, which is in the driver's seat and ultimately dictates actions, decides it is worthwhile to do so.

Friday, November 29, 2019

What do you look for in a code review?

I have thought about this question a lot. I think the ultimate answer is "non-functional requirements": simplicity, maintainability, security, performance, test coverage (not the number; how many of the functional requirements are encoded into tests).

This comes from the point of code review: to make the software we are working on more useful.
There is a big list of non-functional requirements on Wikipedia; I will cover the ones I think are most important below. I'm biased toward maintainability because I've spent many years maintaining other people's code.

Simplicity

For me, this is the most important thing. Software dies when it becomes too complex. Development slows to a crawl when it is not possible to keep the important architectural components in your head. Changing anything becomes a process of trial-and-error which takes forever, and usually makes things even more complicated.
So, it is important to be constantly vigilant about simplicity. Sometimes that means saying no to a new requirement. Sometimes it means using a library instead of writing new code. Sometimes it means NOT using a library to avoid the complexity of having to manage that dependency. Often, none of the above are possible and the best you can do is refactor.
Simplicity applies at all levels. Requirements, architecture, module-level design, and individual lines of code. Even the size of an individual pull request. Keep them all as simple as possible. Jargon often used on this topic is "domain boundaries" or "high-cohesion/low-coupling" or "cyclomatic complexity" or "cognitive complexity".
Red flags for simplicity are:
  1. singletons/globals/side-effect-heavy code
  2. long, generic names
  3. many lines of code in one function
  4. unnecessary abstraction (also known as "overengineering")
  5. "surprises" or "magic" in the code
  6. Flag arguments (boolean parameters that switch behaviour)
  7. Inheritance hierarchies
  8. Mutexes
"Keep it simple" is my #1 thing for code reviews.

Test Coverage

When looking at test coverage, I will try to work out how many of the edge cases in the requirements for the change have been encoded into tests. For example, if the change is to send an email when an order is complete, there should be a happy-path test to make sure that whatever email interface we use is actually getting hit, and that the request contains some of the strings the email is expected to contain (a sketch of such a test is below). If the change is to process multi-leg orders, the tests should demonstrate how the obvious two-leg case gets processed, but also that mixing a buy and a sell leg is OK, that having three buy legs works, and maybe even that a leg with zero quantity or a negative price is handled correctly.
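A sketch of what that happy-path email test might look like, in Python with pytest-style asserts; the OrderService and its email_sender collaborator are invented here purely for illustration, since the real names depend on the codebase.

    from unittest.mock import Mock

    # A made-up, minimal version of the code under test, just so the test
    # below has something to exercise.
    class OrderService:
        def __init__(self, email_sender):
            self.email_sender = email_sender

        def complete_order(self, order_id, customer_email):
            self.email_sender.send(
                to=customer_email,
                body="Your order {} is complete.".format(order_id),
            )

    def test_completing_an_order_sends_a_confirmation_email():
        email_sender = Mock()
        service = OrderService(email_sender=email_sender)

        service.complete_order(order_id="A123", customer_email="sam@example.com")

        # The email interface was hit, and the request contains the strings
        # the email is expected to contain.
        email_sender.send.assert_called_once()
        _, kwargs = email_sender.send.call_args
        assert kwargs["to"] == "sam@example.com"
        assert "A123" in kwargs["body"]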
Having tests like this makes the software maintainable. A new maintainer can come along and re-discover all the requirements that have been built into the program by reading the tests. There is no easier way for a programmer to learn the requirements (apart from having been there when the requirements were determined).
This allows new maintainers to understand the program -- and this allows them to keep their changes simple.
For a discussion of the other kind of tests ("don't test the glue"), see https://stackoverflow.com/q/12860657. Aiming for 95% test coverage (or whatever) almost always leads to writing useless tests that become a maintenance burden.

Readability

Going hand-in-hand with reading the tests, maintainers will spend a lot of time trying to figure out the code itself. There are a few things that help a lot here.
  1. Comments that explain WHY. The how and the what I can get from reading the code. Hopefully, the important part for understanding -- the why -- comes from reading the tests; sometimes it doesn't.
  2. Lots of descriptive names -- variables, functions, etc. Don't use generic words ("-Manager" is my pet hate), they don't help. Be specific.
  3. Auto-formatters. All the big languages have one now. Stop wasting your time by not using them. Black, Prettier, clang-format, gofmt, whatever.

Teamwork

The best code review is one you do together. It's not worth trying to have difficult conversations through text when you can talk to someone directly.
If you're doing a code review and find something that's not a super-quick fix, it's almost always better to discuss it in person.
For everyone else's benefit, it is generous to record the outcome of the discussion in the review.

Tuesday, August 06, 2019

What is CORS?

CORS is Cross-Origin Resource Sharing.
You can set up CORS for your site by:
  1. responding appropriately to preflight OPTIONS requests (see the access-control-* headers below), and
  2. adding the access-control-allow-origin header to your normal HTTP method responses (GET, POST, etc.) as well; there is a small server-side sketch below.
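
For example, a Python Lambda behind an API Gateway proxy integration might do something like the sketch below. The origin, headers and methods here are placeholders -- only allow what your site actually needs.

    ALLOWED_ORIGIN = "https://alice.com"  # placeholder origin

    CORS_HEADERS = {
        "Access-Control-Allow-Origin": ALLOWED_ORIGIN,
        "Access-Control-Allow-Headers": "Authorization,Content-Type",
        "Access-Control-Allow-Methods": "OPTIONS,GET,POST",
    }

    def handler(event, context):
        # 1. Answer the preflight request.
        if event["httpMethod"] == "OPTIONS":
            return {"statusCode": 204, "headers": CORS_HEADERS, "body": ""}
        # 2. Add the allow-origin header to the normal responses as well,
        #    or the browser will discard them.
        return {"statusCode": 200, "headers": CORS_HEADERS, "body": '{"ok": true}'}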

Why is CORS a thing?

In the beginning, javascript was only allowed to make requests to the same server that the javascript itself came from. This was called the Same-Origin Policy. Without the Same-Origin Policy, sites would be able to make requests as each other: Mallory's website would be able to call Alice's servers, and the browser would add Alice's authentication cookie to the request.
However, that policy proved restrictive, so CORS was added to allow websites to permit requests from other origins.

How it works

  1. Some javascript tries to send a request to a different domain than where that javascript came from
    // running on https://alice.com
    fetch('https://alice-api.com/add_user', {method: 'post'})
    
  2. The browser does the Same-Origin check and enforces the CORS policy:
    1. The browser sends the "preflight" request. This is an OPTIONS request to the url (in this case, https://alice-api.com/add_user):
          Host: alice-api.com
          User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0
          Access-Control-Request-Method: POST
          Access-Control-Request-Headers: content-type
          Referer: https://alice.com
          Origin: https://alice.com
          Connection: keep-alive
      
    2. The different domain (in this case, alice-api.com) responds:
          HTTP/2.0 200 OK
          date: Tue, 06 Aug 2019 02:26:03 GMT
          access-control-allow-origin: https://alice.com
          access-control-allow-headers: Authorization,Content-Type,Content-Disposition
          access-control-allow-methods: OPTIONS,HEAD,GET,POST,PUT,PATCH,DELETE
          access-control-allow-credentials: true
          access-control-max-age: 7200
      
  3. Now that CORS has been checked, the browser makes the real request to the different domain: a POST to https://alice-api.com/add_user. The response generated by alice-api.com must ALSO contain the header:
        access-control-allow-origin: https://alice.com
    
    or the browser will not accept it.

Wednesday, July 31, 2019

Pairing notes from session by Michelle Gleeson

Every time: Starting a session

  • Can you both see the monitor? Get comfortable.
  • how long will you be pairing for, when will the breaks be (every 45 minutes)
  • when swap? (every ten minutes, every green test, other little goals?)
  • talk through the business logic / user story

Every time: after a session:

  • what was hard about the session
  • what were the unexpected benefits?
  • how can we make it better next time?

Key takeaway:

  • most benefits realised when you combine pairing, clean code and test-driven-development.

Why pair?

  • whole team accountability
  • higher standards
  • can go on holiday (no key-man dependency, no bus-factor, no handover)
  • continuous handover
  • continuous code review
  • build strong team relationships (how long does this take?)
  • increases inclusion and authenticity
    • story about girl who hated her job until that team started pairing
    • opportunities for juniors to work on critical items
    • everyone learns lots of tips and tricks
  • deliberate action: challenge your own thinking
    • story about submarine verbalising routine possibly-dangerous actions

How to pair?

  • no phones
  • one monitor
  • same thought process
  • low level hum of quiet discussion
  • if on laptop, plug in a keyboard (no hogging screen)
  • mirror displays
  • no slack

Quickstart to get a team used to pairing

  • identify why (measurable goals, e.g. 150% improved cycle time)
  • agree to a two week experiment
  • work through discomfort
  • be courageous
  • review regularly
    • what's hard
    • what are the unexpected benefits
    • how can we make it better
  • block out time in calendar and link slack status to it so you don't get interrupted

tips and tricks:

  • offsite pairing might help
  • team charter to make sure values aligned
  • development style: use a linter!

advanced:

  • mob once a week. Several developers sitting around a shared TV, passing around the keyboard.
  • use a timer to swap
  • track who you pair with to make sure you get everyone. Make a chart.
  • pairing cycle: make a test pass, refactor, write next (failing) test, pass over the keyboard
  • everyone knows what's going on, manager can ask anyone
  • team size 4-6 (bigger gets inefficient)
  • google "Clean Code" by Robert Martin
  • 17 November code retreat: kata exercises -- pair with someone for 45 minutes, then delete your code; dojo format, with a new challenge for each pair