Friday, May 18, 2018

Postmortem: Why the Git/Conan toolchain migration took a year instead of the originally estimated week

1. There was actually quite a lot of development to do. During the course of this project, Jonathan:
    * learned Kotlin and TeamCity's DSL (through several iterations)
    * learned Conan and fixed some issues with it (it evolved from version 0.26 to 1.3 during the project)
    * wrote many lines of code and generated many more, often several hundred lines per day for months
    * ran hundreds of test builds in TeamCity, and debugged each one
    * debugged through many problems with Git, GitVersion, and Git-Tfs
    * documented everything
2. We kept changing the design. Each change had knock-on effects elsewhere, and these accumulated: icu, build.py adjustments, package.py adjustments (still going on), is2c changes, re-migrating projects, etc.
    * Keeping the list of projects as a static list (is2c/proj_names.py) was a really good decision, as it was very quick to make changes and basically did not require testing
    * The design churn was partly unavoidable (we were designing something new, after all), but several mitigations were possible.
    * Adding people to the project before it was ready introduced more dependencies before the foundations had solidified. This caused a lot of unnecessary work (maybe a cumulative total of three months).
    * Trying to migrate all the ti dependencies at once was a substantial effort. There were a lot of dependencies (51, currently) in a very complicated dependency tree, 9 levels deep. This is pretty wild for C++ libraries; most open source projects do not seem to handle this well. The only reason we got away with it until now was the monolithic architecture; unfortunately we are now deeply embedded in the obvious maintainability nightmare that results from that approach. It would have been better to pick a project with only a few dependencies (say, three or four) to do first and, once that was solid, automatically migrate many more. Unfortunately, Jonathan worked on the REDACTED team, so it was politically justifiable to aim for REDACTED projects initially.
    * It didn't help that Conan was evolving with us. In some ways that was good (the Conan maintainers were very responsive to pull requests, often accepting and releasing within 24 hours), but it would have been better to let others evolve Conan and pick up a more mature product once it had stabilized. Conan still does not have support for modifying multiple dependencies at once.
3. There were lots of existing problems that needed to be fixed before dependencies could be migrated.
    * The monitoring tool was still on an old build of a base library 14 (this took several days to fix; the architecture of that project was chandlerized)
    * The Python wrapper of our binary protocol was using a checked-in copy of baselibrary.dll because it was initially deemed too hard to set up properly (this is now good, with a conanfile.py to build the dll and then setup.py to package it into a wheel)
    * The Python test library still required Python 3.4, so support for 3.4 was implemented and then removed again when the Python wrapper turned out to need Python 3.5+ and the test library had to be fixed instead
    * Various other minor issues taking hours to days to deal with
4. Writing documentation was not allowed for in the original estimate. Jonathan hadn't written much user documentation in the past and it is quite laborious.
5. Jonathan had other demands on his time from his team. Getting pulled off repeatedly to work on team support or client work meant that the adjustments introduced by c and s were not dealt with immediately and took significantly longer to fix when they were finally discovered. It also made merges from TFS to Git significantly more involved, as many more commits had piled up while the merges weren't happening (this could have been handled a lot better by discovering the git-subtree merge process sooner). A further complication was that the aim was a solution that would work for most of the company's 1000+ released C++ projects, so keeping things simple and flexible was a constant consideration.
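
As a rough illustration of the mitigation suggested in point 2 (start with a project that has only a few dependencies, then migrate the rest in dependency order), here is a minimal Python sketch. It is not the actual is2c code: the DEPS map, the project names in it, and the helper functions are all hypothetical, and it assumes Python 3.9+ for graphlib. It only shows how a static dependency map could be used to measure tree depth, pick easy first candidates, and derive a safe migration order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical static dependency map: project -> direct dependencies.
# The names are invented for illustration only.
DEPS = {
    "app": ["protocol", "logging"],
    "protocol": ["baselib"],
    "logging": ["baselib"],
    "baselib": [],
}


def migration_order(deps):
    """Return projects ordered so every dependency comes before its dependents."""
    return list(TopologicalSorter(deps).static_order())


def depth(project, deps, _cache=None):
    """Levels in the dependency tree rooted at project (a leaf has depth 1)."""
    cache = {} if _cache is None else _cache
    if project not in cache:
        below = (depth(d, deps, cache) for d in deps[project])
        cache[project] = 1 + max(below, default=0)
    return cache[project]


def transitive_count(project, deps):
    """Number of distinct direct and indirect dependencies of project."""
    seen = set()
    stack = list(deps[project])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(deps[dep])
    return len(seen)


def easiest_first(deps):
    """Projects sorted by how many transitive dependencies they drag along."""
    return sorted(deps, key=lambda p: transitive_count(p, deps))


if __name__ == "__main__":
    print("migration order:", migration_order(DEPS))
    print("depth of app:", depth("app", DEPS))
    print("pilot candidates:", easiest_first(DEPS)[:2])
```

Run against a real project list, a report like this would have flagged the 51-dependency, 9-level tree before the first migration started, and pointed at a small pilot project instead.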