Tuesday, March 14, 2017

The Story of the Iceberg Tests

About two years after I started at a financial software company, I was assigned to fix a problem with iceberg orders for a Canadian market. Our Trading Interface was not sending the correct display volume to the order server.

I was starting to get sick of these jobs because we had already had several problems with icebergs. Every market represents them differently -- one will tell you just the visible volume, another the hidden volume, and another might only tell you the refill quantity when the order is created. Since it is our job to normalize all this, we have to handle every variation and produce a complete picture for the order server.
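
To make that concrete, here is a minimal sketch in Python -- hypothetical field names and market styles, not the real TI code -- of turning those three representations into one complete picture:

    def normalize_iceberg(style, total=None, visible=None, hidden=None, refill=None):
        """Return (total, display, hidden) regardless of how the market reports it."""
        if style == "reports_visible":
            # Market gives the total and the visible slice; hidden is the remainder.
            return total, visible, total - visible
        if style == "reports_hidden":
            # Market gives the total and the hidden part; display is what is left.
            return total, total - hidden, hidden
        if style == "reports_refill":
            # Market only gives the refill quantity at creation time; that refill is
            # what gets displayed, capped by what remains of the order.
            display = min(refill, total)
            return total, display, total - display
        raise ValueError("unknown market style: %s" % style)

    # All three of these describe the same 100-lot iceberg showing 10 at a time:
    assert normalize_iceberg("reports_visible", total=100, visible=10) == (100, 10, 90)
    assert normalize_iceberg("reports_hidden", total=100, hidden=90) == (100, 10, 90)
    assert normalize_iceberg("reports_refill", total=100, refill=10) == (100, 10, 90)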

What was happening was that every time someone would fix icebergs for one market, it would break for another. Annoyed by this, I spent two days combing through all the markets' code, gathered all the implicit requirements, and after verifying a few other requirements with knowledgeable people, I wrote a complete set of unit tests. I was then able, through a fair bit of trial and error, to produce normalization code that worked for all the markets. In total, it took nearly a week of careful, painstaking work and resulted in 26 unique tests -- and afterwards, all the TIs were able to handle their markets' idiosyncrasies at the same time.

The problems with icebergs vanished overnight.

About a year later, my boss decided our normalization code needed to be refactored. He discussed it with us, and I said to go for it, but to make sure the new code passed all the tests. He was a cowboy coder -- his test strategy was to manually test the happy path and let clients find the other problems.

(This approach was really awful: it took us days to find the cause of a problem, pulling logs and stepping through code. And there were a lot of problems -- two or three every week. It also meant that clients became resistant to upgrading. The end result was that our team spent more than half its time on support.)

My boss was unable to write code that passed all the tests. He gave up after several weeks. I felt victorious -- without those tests, it is likely that I would have spent several months in a painful release-investigate-debug cycle, fixing the same iceberg problems over and over again.

Fortunately for software engineers, building the bridge between the user and the machine is not like civil engineering. Our builds take seconds, not years. It is easy for us to build the bridge a hundred times a day, reinforcing parts of the code that are under tension or stress. As a result, it should be easy for us to find problems with our design.

Just like with a real bridge, reinforcing the more fragile parts of your code with full-coverage tests can save huge amounts of pain in missed deadlines and exhausted engineers.



Here is how to write good tests (as taught in every undergraduate software engineering course); a short code sketch of points 3 and 4 follows the list:
  1. Cover both branches of every conditional. Make sure your tests fully explore all possible outcomes of the code. Only someone working with the source code can do this.
  2. Cover every requirement.
  3. Test each equivalence class once and only once. An equivalence class is all inputs that result in the same calculation process but with slightly different numbers. So an order with quantity=7 and quantity=8 would be in the same equivalence class.
  4. Test each edge case once and only once. See what happens when we send quantity=0 or quantity=-1 or quantity=blank.
  5. Test anything else that seems likely to break in the future.
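
Here is what points 3 and 4 can look like in practice -- a sketch using Python's unittest purely for illustration, with display_volume as a hypothetical stand-in for the real normalization code:

    import unittest

    def display_volume(total, refill):
        """How much of an iceberg order is shown at a time."""
        if total <= 0:
            raise ValueError("total quantity must be positive")
        return min(refill, total)

    class IcebergDisplayTests(unittest.TestCase):
        def test_refill_smaller_than_total(self):
            # One representative of the class "refill smaller than total";
            # quantity=7 vs quantity=8 would not need separate tests.
            self.assertEqual(display_volume(100, 10), 10)

        def test_refill_larger_than_remaining(self):
            # Different class: the refill exceeds what is left, so show what is left.
            self.assertEqual(display_volume(7, 10), 7)

        def test_zero_quantity_rejected(self):
            # Edge case: quantity=0 must be rejected, not passed to the order server.
            with self.assertRaises(ValueError):
                display_volume(0, 10)

        def test_negative_quantity_rejected(self):
            # Edge case: quantity=-1.
            with self.assertRaises(ValueError):
                display_volume(-1, 10)

    if __name__ == "__main__":
        unittest.main()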

I have repeated this process twice, for multileg cancellations and the IosAttributeFilter. It worked every time.

Even better, the tests are still saving us work. A team member came to ask me about a failing iceberg test the other day, so I pointed out a market that would have broken if they made the change they wanted to make. They found a better solution to their market's problem.

previously: unit testing: writing better code faster

Addendum

I have grown as a programmer since I wrote this article. I would recommend reading the following:

http://blog.stevensanderson.com/2009/08/24/writing-great-unit-tests-best-and-worst-practises/

(particularly the point about the two sweet spots of testing: highly focused tests for validating a complicated component, versus integration tests that are cheap to maintain)

Good tests are useful as documentation as well. I don't have to support the old TIs I wrote, because I can just refer the new developers (or my own failing memory) to the tests when they have a question about how or why something is coded the way it is, and whether they can change it.

How is answered by debugging through the code (made easy by the test providing typical input and a harness). Why is answered by the test's description and its very existence. Whether they can change something is answered by "change it and see if a test breaks", which is a damn sight better than the previous "we're too afraid to change anything because we have no way of knowing if we've broken something without giving it to clients".

It is important that the tests are written with these things in mind; a test that will not be useful when it fails is counterproductive, because it makes the test suite overwhelming and thus useless as documentation. So "100% coverage" is not a good thing to aim for with these goals in mind. Test the hard stuff; test the API boundary; but don't test the wiring and glue code between modules.
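
For example, a hypothetical test (again Python, not the real TI code) where the name, the docstring, and the failure message carry the "why", so that when it goes red it explains itself:

    import unittest

    def display_volume(total, refill):
        # Hypothetical stand-in for the real normalizer.
        return min(refill, total)

    class RefillOnlyMarketTests(unittest.TestCase):
        def test_display_volume_is_derived_from_refill(self):
            """This market only reports the refill quantity at creation, so the
            TI must derive the display volume itself for the order server."""
            self.assertEqual(
                display_volume(100, 10), 10,
                "the order server expects a display volume even though this "
                "market never sends one")

    if __name__ == "__main__":
        unittest.main()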