Why Google Source has high uptime

Google Source has tremendous uptime. The team manages it with a super-flexible schema and code that takes advantage of it.

How googlesource.com manages database upgrades with no downtime, by Shawn Pearce

I love this old post from Shawn Pearce on how Google Source has no planned downtime. They take advantage of protobuf and Bigtable both having a flexible schema. The application is coded to ignore columns it doesn't understand, but it still preserves them when it writes the row back.
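Here's a minimal Python sketch of that read, ignore, write-back pattern, in the same spirit as protobuf's handling of unknown fields. It assumes a row is just a dict of column name to bytes, standing in for a Bigtable row; `RepoConfigV1`, `decode_v1`, `encode_v1`, and the column names are hypothetical, not taken from Shawn's post.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical row model: column name -> raw bytes, standing in for a Bigtable row.
Row = Dict[str, bytes]

# The only columns this (old) version of the code understands.
KNOWN_V1 = ("name", "description")


@dataclass
class RepoConfigV1:
    name: str
    description: str
    # Columns this version doesn't understand, carried along untouched.
    unknown: Dict[str, bytes] = field(default_factory=dict)


def decode_v1(row: Row) -> RepoConfigV1:
    """Parse the columns v1 knows about; stash everything else."""
    return RepoConfigV1(
        name=row.get("name", b"").decode(),
        description=row.get("description", b"").decode(),
        unknown={k: v for k, v in row.items() if k not in KNOWN_V1},
    )


def encode_v1(cfg: RepoConfigV1) -> Row:
    """Serialize known fields and write unknown columns back as-is."""
    row: Row = dict(cfg.unknown)
    row["name"] = cfg.name.encode()
    row["description"] = cfg.description.encode()
    return row


if __name__ == "__main__":
    # A row written by newer code, with an "owner" column v1 has never heard of.
    stored: Row = {"name": b"demo", "description": b"old", "owner": b"alice"}
    cfg = decode_v1(stored)
    cfg.description = "new"
    updated = encode_v1(cfg)
    print(updated["owner"])  # b'alice': the unknown column survived the round trip
```

The important choice is that `encode_v1` starts from `cfg.unknown` rather than an empty row, so an old server can't silently drop a column it doesn't understand.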

Nothing prunes the old fields from Bigtable. Disk storage is cheap; disk I/O is not. Leaving the deleted data on disk is cheaper than scanning every row to clip out the deleted fields.

So when a code upgrade adds or deletes columns, they can roll it out one server at a time: old code plays nicely with any new columns, and new code won't harm any deleted columns. It's a nice alternative to feature flags.
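To make the rolling-upgrade argument concrete, here's a toy Python simulation of a half-upgraded fleet. It assumes a hypothetical schema change where v2 adds an `owner` column and stops using `description`; `make_server`, the column names, and the dict-of-bytes rows are all illustrative, not from the post.

```python
from typing import Callable, Dict, FrozenSet

# Hypothetical row model: column name -> raw bytes.
Row = Dict[str, bytes]
Updater = Callable[[Row, Dict[str, bytes]], Row]


def make_server(known: FrozenSet[str]) -> Updater:
    """Build the read-modify-write path for a server version that only
    understands the columns in `known`."""

    def update(row: Row, changes: Dict[str, bytes]) -> Row:
        # Read: split the row into columns this version understands and the rest.
        # (Real code would parse `parsed` into typed fields.)
        parsed = {k: v for k, v in row.items() if k in known}
        unknown = {k: v for k, v in row.items() if k not in known}
        # Modify: this version can only change columns it knows about.
        for k, v in changes.items():
            if k not in known:
                raise KeyError(f"column {k!r} is unknown to this server version")
            parsed[k] = v
        # Write: unknown columns go back exactly as they were read.
        return {**unknown, **parsed}

    return update


# v1 knows name + description; v2 adds "owner" and no longer uses "description".
old_server = make_server(frozenset({"name", "description"}))
new_server = make_server(frozenset({"name", "owner"}))

if __name__ == "__main__":
    row: Row = {"name": b"demo", "description": b"a repo"}
    row = new_server(row, {"owner": b"alice"})  # upgraded server adds a column
    row = old_server(row, {"name": b"demo2"})   # not-yet-upgraded server edits the row
    print(sorted(row))  # ['description', 'name', 'owner']
```

In this toy, neither version drops what it doesn't understand, so the row stays intact no matter which servers have been upgraded yet.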

I also ❤️ this little bit on why they keep some commented-out code.

This is why we leave deleted fields commented out in source code, so future developers know not to reuse a field number.

Basically, the commented-out code is a piece of documentation. They could put it in an external documentation wiki, as is proper, but that wouldn't be as useful as putting it exactly where mistakes are likely to be made.
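A small Python illustration of why reusing a field number is dangerous, using a toy encoding keyed by field number (protobuf likewise identifies fields on the wire by number, not by name). The schemas, field names, and `decode` helper are hypothetical. With field 3 reused, an old `description` gets read as a `star_count`; with field 3 left commented out, the old data is simply ignored.

```python
from typing import Dict

# Toy encoding: field number -> raw bytes. Only numbers are stored, not names.
Encoded = Dict[int, bytes]
Schema = Dict[int, str]

# A developer who never saw field 3 reuses it for something new.
SCHEMA_REUSED: Schema = {1: "name", 3: "star_count"}

# The same change, but with the deleted field left as a comment.
SCHEMA_CURRENT: Schema = {
    1: "name",
    # 3: "description",  # deleted; do not reuse field number 3
    4: "star_count",
}


def decode(data: Encoded, schema: Schema) -> Dict[str, bytes]:
    """Name each field using `schema`; numbers the schema lacks are skipped here for brevity."""
    return {schema[n]: v for n, v in data.items() if n in schema}


if __name__ == "__main__":
    # Written back when field 3 meant "description".
    old_row: Encoded = {1: b"demo", 3: b"A code review tool"}
    print(decode(old_row, SCHEMA_REUSED))   # description misread as star_count
    print(decode(old_row, SCHEMA_CURRENT))  # {'name': b'demo'}
```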

Just goes to show that you can break the rules for good reason.