My company, www.aflorithmic.ai, recently went through several migrations. They ranged from the easy (moving from PythonAnywhere to AWS Elastic Beanstalk) to the more involved (a big refactoring that meant ejecting from Expo). The details differ, but I think the lessons apply to migrations in general.
First, I’m not an expert. I’ve been involved in migrations throughout my career, and I’ve always found them stressful. I think that’s because a migration often involves existing stakeholders, and sometimes migrations fail.
Why do we need migrations?
Everyone who’s worked with code knows that software gets harder to change over time, and sometimes that’s a point of embarrassment. In a hackathon, you can create an app in 3 hours, but it can take you weeks to add a new feature to an existing app or system. A lot of that is just a fact of life: entropy increases, and things get more complicated. It also takes weeks because you need to respect the existing users.
Technical debt inevitably builds up in a codebase over time, and that comes on top of the ordinary maintenance cost of an existing codebase.
Migrations are not easy, but they are a way to get technical leverage. You invest now and slow down in the hope of a payoff in the future. There are other fixes that engineers and engineering managers will bring in, such as monitoring tools and code reviews.
I’m the co-founder of a startup. I’ve seen my engineering team grow from 3 people to 10 in the past 15 months, so I’ve inevitably seen us bring in more communication and practices like code review. These fixes are good, but they only take you so far.
Migrations are the only mechanism to effectively manage technical debt as your company and code grow. If you don’t get effective at software and system migrations, you’ll end up languishing in technical debt.
One thing I’ve learned in my career is that most tools can only survive about one order of magnitude of growth. If your company grows, you’ll have to run migrations. If you never need to, you probably over-engineered your earlier systems.
Productivity and migrations
This diagram explains how migrations help with technical debt
That’s one scenario. What happens if a migration fails? You might think that’s a strange question, but migrations do fail. Sometimes you’ll move to a new ecosystem, and it just doesn’t work. You can end up with a codebase where productivity stops growing altogether. If this happens you’re in a lot of trouble, and it does happen.
It looks like this.
Migrations as strategic bets
First, it’s worth pointing out that any migration is a strategic bet for your company. Often you’ll know a migration is coming because you keep pushing it back. This may be because a tool is reaching the end of its life (Python 2.7 to Python 3+ is one example), or because you need functionality that your current tools can’t provide (certain database migrations). You may also hit the limits of a particular framework (Expo, Ruby on Rails). It is the nature of strategic bets that they sometimes fail. One more point is that you can only handle so many migrations at once. Your organization can only complete a limited number of them in a year.
A framework for migrations
- Prototype (one easy, then hard)
The first step is to de-risk the migration. A good way to do this is to write a design document. It’s important to show this to both your detractors and your supporters. The whole process of writing a document and having people discuss it is super important for creating a shared understanding and for laying out the tradeoffs in a particular design. All architecture is compromise.
Also, remember that each team is betting on the migration. As engineers, we’ve all heard horror stories of companies stuck between two databases, or of a never-ending project that could never be finished.
It’s also worth asking ‘is this worth doing’ and ‘is it worth doing now’. For example, it’s very easy to end up with not-invented-here syndrome. These concerns should all be in your design document, and surfacing them early is one of the strengths of the Amazon press release method.
The first part is to prototype one easy migration and then one hard one. It can also be a good idea to invest in tooling at this point. For example, if you can automate 90% of a migration and build tooling for that, it’s super powerful, and it puts time on your side.
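To make the tooling idea concrete, here is a minimal sketch of a migration script in Python. It applies the mechanical renames automatically and reports the lines that still need a human. The module names (`old_storage`, `new_storage`) and the rename table are hypothetical, made up for illustration, and a real tool would likely parse the code rather than rely on regular expressions.

```python
import re

# Hypothetical rename table: old call -> new call. Replace with your own.
RENAMES = {
    "old_storage.save": "new_storage.put",
    "old_storage.load": "new_storage.get",
}

# Anything still touching the old module after the renames needs manual work.
LEFTOVER = re.compile(r"\bold_storage\.\w+")


def migrate_source(source):
    """Apply the mechanical renames to a source string.

    Returns the rewritten source and the 1-based line numbers that still
    reference the old module and need manual attention.
    """
    for old, new in RENAMES.items():
        # Word boundaries stop "old_storage.save" matching "old_storage.save_all".
        source = re.sub(r"\b" + re.escape(old) + r"\b", new, source)
    manual = [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if LEFTOVER.search(line)
    ]
    return source, manual
```

Running something like this over the whole codebase gives you two things: the easy part of the migration done for free, and a list of the hard cases to prototype by hand.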
Another good idea is to think of a migration as a product. So spend time on the tooling, and on explaining it to the wider team. When I was at Amazon we used to get ‘product updates’ even for infrastructure projects or migrations. This helped a lot because it gave us visibility into what was happening, and it kept the focus on customer value.
The last part of a migration is to get people to stop using the legacy system that you’ve deprecated. This can take a long time.
You can use tooling to your advantage here by adding a check to your linter, or by providing self-service tooling or documentation. In our case, because we’re a small team, we focused a lot on documentation.
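As an illustration of the linter idea, here is a minimal sketch of a check that flags imports of a deprecated module, using Python’s standard `ast` module. The module name `legacy_billing` is a hypothetical placeholder, and in practice you would wire a check like this into whatever linter or CI step your team already uses.

```python
import ast
from typing import List, Tuple

# Hypothetical name of the deprecated package. Replace with your own.
DEPRECATED_MODULES = {"legacy_billing"}


def find_deprecated_imports(source: str) -> List[Tuple[int, str]]:
    """Return (line number, module name) for each import of a deprecated module."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports such as "from . import x" have no module name.
            names = [node.module] if node.module else []
        else:
            continue
        for name in names:
            # Compare the top-level package so submodules are caught too.
            if name.split(".")[0] in DEPRECATED_MODULES:
                hits.append((node.lineno, name))
    return sorted(hits)
```

A check like this makes the deprecation visible at the moment someone reaches for the old system, which is cheaper than chasing down usages later.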
The last 5% is hard. The final move to a new system, the nitty-gritty details, and expectations like ‘why doesn’t it work like the last system’ are all hard to manage. Sometimes you just need to put in the work and deliver, though. Pushing through friction is a big part of leadership.
When you finish the migration you need to celebrate. It’s important that you reward finishing and delivering, not just starting the project. I’ve seen organizations that reward those who pitch projects at the expense of those who deliver. This is bad, especially because projects that are started but never finished leave you with significant technical debt.