Lessons Learned Leading High-Stakes Data Migrations

“No one ever said, ‘Meh, it’s just our database.’”
— Tim Koopmans, senior director at ScyllaDB
Every data migration is high stakes to the person leading it. Whether you’re upgrading an internal app’s database or moving 362 petabytes of data for the social platform X (formerly Twitter) from bare metal to Google Cloud Platform (GCP), a lot can go awry, and you don’t want to be blamed for downtime or data loss.
But a migration done right will not only optimize your project’s infrastructure. It will also leave you with a deeper understanding of your system, and maybe even yield some fun “war stories” to share with your peers.
To cheat a bit, why not learn from others’ experiences first? Enter Miles Ward, CTO at SADA and former Google and AWS cloud lead, and Tim Koopmans, senior director at ScyllaDB, performance geek and SaaS startup founder. Ward and Koopmans recently got together to chat about lessons they’ve personally learned from leading real-world data migrations.
You can watch the complete discussion and hear some interesting anecdotes in the full recording.
Let’s look at three key takeaways from the chat.
1. Start With the Hardest, Ugliest Part First
It’s always tempting to start a project with some quick wins, but tackling the worst part first will yield better results overall. Ward explains, “Start with the hardest, ugliest part first because you’re going to be wrong in terms of estimating timelines and noodling through who has the correct skills for each step and what are all of the edge conditions that drive complexity.”
For example, he saw this approach in action during Google’s seven-year migration of the Gmail backend (handling trillions of transactions per day) from its internal Gmail data system to Spanner. First, Google built Spanner specifically for this purpose. Then, the migration team ran roll-forwards and roll-backs of individual mailbox migrations for over two years before deciding that the performance, reliability and consistency in the new environment met their expectations.
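To make that pattern concrete, here’s a minimal sketch of a per-unit roll-forward/roll-back loop, with toy in-memory stores standing in for the real backends. Gmail’s actual tooling is internal to Google, so every name below is hypothetical:

```python
class MemoryStore:
    """A toy in-memory store standing in for the real mailbox backends."""
    def __init__(self):
        self._data = {}
    def read(self, key):
        return self._data.get(key)
    def write(self, key, value):
        self._data[key] = value
    def delete(self, key):
        self._data.pop(key, None)

def migrate_unit(unit_id, source, target, verify):
    """Roll one unit (e.g., a mailbox) forward, verify it, and roll back
    on any mismatch, leaving the source as the system of record either way."""
    target.write(unit_id, source.read(unit_id))            # roll-forward: copy the unit
    if verify(source.read(unit_id), target.read(unit_id)):
        return True                                        # safe to cut this unit over
    target.delete(unit_id)                                 # roll-back: discard the copy
    return False

src, dst = MemoryStore(), MemoryStore()
src.write("mailbox-42", {"messages": 3})
assert migrate_unit("mailbox-42", src, dst, verify=lambda a, b: a == b)
```

Running that loop per unit, for months, is what lets a team accumulate evidence about performance and consistency before committing to the new system.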
Ward added, “You also get an emotional benefit in your teams. Once that scariest part is done, everything else is easier. I think that tends to work well, both interpersonally and technically.”
2. Map the Minefield
You can’t safely migrate until you’ve fully mapped out every little dependency. Both Koopmans and Ward stress the importance of exhaustive discovery: cataloging every upstream caller, every downstream consumer, every health check and contractual downtime window before a single byte shifts. Ward warns, “If you don’t have an idea of what the consequences of your change are … you’ll design a migration that’s ignorant of those needs.”
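That catalog works best when it’s codified and version-controlled rather than scattered across wikis. Here’s a minimal sketch of what a dependency inventory might look like; the fields and entries are hypothetical, not from any real migration:

```python
from dataclasses import dataclass

@dataclass
class Dependency:
    name: str
    direction: str        # "upstream" (calls us) or "downstream" (we call it)
    owner: str            # team to contact before the cutover
    health_check: str     # probe that must stay green during the migration
    downtime_window: str  # contractual window, or "none"

CATALOG = [
    Dependency("ads-reporting-api", "upstream", "ads-platform",
               "https://example.internal/healthz", "Sun 02:00-04:00 UTC"),
    Dependency("billing-export-job", "downstream", "finance-eng",
               "cron-monitor:billing-export", "none"),
]

# Fail loudly if anything is uncatalogued before a single byte moves.
unowned = [d.name for d in CATALOG if d.owner in ("", "unknown")]
assert not unowned, f"Find owners before migrating: {unowned}"
```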
Ward then offered a cautionary anecdote from his time at Twitter, as part of a team that migrated 362 petabytes of active data from bare-metal data centers into Google Cloud. They used an 800 Gbps interconnect (about the total internet throughput at the time) and transferred everything in 43 days. To be fair, this was a data warehouse migration, so it didn’t involve hundreds of thousands of transactional queries per second. Still, Twitter’s ad systems and revenue depended entirely on that warehouse, making the migration mission-critical.
Ward shared: “They brought incredible engineers, and those folks worked with us for months to lay out the plan before we moved any bytes. Compare that to something done a little more slapdash. I think there are plenty of places where businesses go too slow, where they overinvest in risk management because they haven’t modeled the cost-benefit of a faster migration. But if you don’t have that modeling done, you should probably take the slow boat and do it carefully.”
3. Engineer a ‘Blissfully Boring’ Cutover
“If you’re not feeling sleepy on cut-over day,” Ward quipped, “you’ve done something terribly wrong.” But how do you get to that point?
Koopmans shared that he’s always found dual writes with single reads useful: Write to both the old and new systems while still reading only from the old one, then switch over once both systems are in step. If the database doesn’t support dual writes, replicating writes via change data capture (CDC) or something similar works well. Either strategy builds confidence that the source and target behave the same under load before you start serving real traffic.
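Here’s a minimal sketch of that dual-write, single-read pattern, assuming simple key-value stores; the class and method names are hypothetical:

```python
class DualWriteStore:
    """Writes go to both old and new stores; reads come only from the old one."""

    def __init__(self, old, new):
        self.old, self.new = old, new
        self.mismatches = []  # divergences to investigate before cutover

    def put(self, key, value):
        self.old.put(key, value)         # old system stays the source of truth
        try:
            self.new.put(key, value)     # best-effort shadow write
        except Exception as exc:
            self.mismatches.append((key, "shadow write failed", repr(exc)))

    def get(self, key):
        value = self.old.get(key)        # single read path: old system only
        if self.new.get(key) != value:   # compare to build confidence
            self.mismatches.append((key, "read mismatch", value))
        return value

class DictStore:
    def __init__(self): self._d = {}
    def put(self, k, v): self._d[k] = v
    def get(self, k): return self._d.get(k)

store = DualWriteStore(DictStore(), DictStore())
store.put("user:1", {"plan": "pro"})
assert store.get("user:1") == {"plan": "pro"} and not store.mismatches
```

In production you’d more likely sample and compare asynchronously rather than reading the shadow store on every request, since the synchronous comparison above adds latency to the hot path.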
Then Koopmans asked Ward, “Would you say those are generally good approaches, or does it just depend?” Ward’s response: “I think the biggest driver of ‘It depends’ is that those concepts are generally sound, but real-world migrations are more complex. You always want split writes when feasible, so you build operational experience under write load in the new environment. But sample architecture diagrams and Terraform examples make migrations look simpler than they usually are.”
Another complicating factor: Most companies don’t have one application on one database. They have dozens of applications talking across multiple databases, data warehouses, cache layers and so on. All of this matters once you start routing read traffic from various sources. Some systems rely on scheduled database-to-warehouse extractions rather than continuous replication, often to keep streaming costs down. Load patterns shift throughout the day as different workloads come online. That’s why you should keep testing well past the first reads and writes against the new environment.
So codify every step, version it and test it all multiple times — exactly the same way. And if you need to justify extra preparation or planning for migration, frame it as improving your overall high-availability design. Those practices will carry forward even after the cutover.
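One way to codify and version the cutover, sketched minimally below: the runbook is plain data checked into the repo, every rehearsal executes it the same way, and completed steps are recorded so a rerun resumes where it left off. The step names and placeholder commands are hypothetical:

```python
import json
import pathlib
import subprocess

# Each step is a (name, command) pair; the placeholder echo commands stand in
# for real migration tooling.
STEPS = [
    ("freeze-writes", ["echo", "pausing writers"]),
    ("final-sync",    ["echo", "running last CDC catch-up"]),
    ("verify-counts", ["echo", "comparing row counts"]),
    ("flip-reads",    ["echo", "routing reads to the new cluster"]),
]

STATE = pathlib.Path("runbook_state.json")

def run():
    done = json.loads(STATE.read_text()) if STATE.exists() else []
    for name, cmd in STEPS:
        if name in done:
            continue                        # rerunnable: skip completed steps
        subprocess.run(cmd, check=True)     # any failure halts the runbook
        done.append(name)
        STATE.write_text(json.dumps(done))  # record progress after each step

if __name__ == "__main__":
    run()
```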
Also, be aware that new platforms will inevitably have different operational characteristics. That’s why you’re adopting them. But these changes can break hard-coded alerts or automation. For example, maybe you had alerts set to trigger at 10,000 transactions per second, but the new system easily handles 100,000. Ensure that your previous automation still works and systematically evaluate all upstream and downstream dependencies.
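As a small illustration of avoiding that trap, alert thresholds can be derived from a measured per-environment baseline instead of being hard-coded; all numbers and names below are hypothetical:

```python
# Observed peak sustained TPS per environment, measured during load testing.
BASELINE_PEAK_TPS = {
    "legacy-cluster": 10_000,
    "new-cluster": 100_000,   # the new system comfortably handles 10x the load
}

def tps_alert_threshold(env: str, headroom: float = 0.8) -> float:
    """Alert when sustained TPS exceeds this fraction of the environment's peak."""
    return BASELINE_PEAK_TPS[env] * headroom

# The old hard-coded 10,000 TPS rule would be meaningless on the new cluster:
assert tps_alert_threshold("legacy-cluster") == 8_000.0
assert tps_alert_threshold("new-cluster") == 80_000.0
```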
Follow these tips and the big day could resemble Digital Turbine’s stellar example. Ward shared, “If Digital Turbine’s database went down, its business went down. But the company’s DynamoDB to ScyllaDB migration was totally drama-free. It took two and a half weeks, all buttoned up, done. It was going so well that everybody had a beer in the middle of the cutover.”
Closing Thoughts
Data migrations are always “high stakes.” As Ward bluntly put it, “I know that if I screw this up, I’ll piss off customers, drive them to competitors or miss out on joint growth opportunities. It all comes down to trust. There are countless ways you can screw up an application in a way that breaches stakeholder trust. But doing careful planning, being thoughtful about the migration process and making the right design decisions sets the team up to grow trust instead of eroding it.”
Data migration projects are also great opportunities to strengthen your team’s architecture and build your own engineering expertise.
Koopmans left us with this thought: “My advice for anyone who’s scared of running a data migration: Just have a crack at it. Do it carefully, and you’ll learn a lot about distributed systems in general — and gain all sorts of weird new insights into your own systems in particular.”
Bonus: Access our free NoSQL Migration Masterclass for a deeper dive into migration strategy, missteps and logistics.