This talk, which I only listened to, about the Forbin Project’s (er, Google’s) systems for backup and restore was fun. It just lets you glimpse little bits of what is obviously an elephant.
Here are the things I enjoyed in the talk, particularly the first two.
- He hinted that they can use encryption key management to delete customer data without the bother of erasing all the backup fragments.
- It’s all about the restore. That’s when the system comes under intense scrutiny. So it must be fast and automated. So much of the design evolved after that pressure became clear to them.
- They replicate, à la RAID 4, the backup data across tapes. Bad blocks identified (or suspected) are repaired.
- They do regular restore testing (5% ?).
- Replication/redundancy is necessary over many dimensions, not just copies. Geography, mechanism, … but he didn’t enumerate the dimensions.
- He hinted that to speed restores they might use only half of each tape, since that reduces seek time. Though if you think about it you’ll see that you can sort the tapes so redundancy blocks are in the second half.
- They do ship data physically.
- Logistics planning is obviously a thing.
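To make the RAID 4 bullet concrete for myself, here is a minimal sketch of the general idea: a dedicated parity block is the XOR of the data blocks in a stripe, so any single lost block can be rebuilt from the survivors. This is only textbook RAID 4 parity. The "tapes", block contents, and function names are all my own invention, and I have no idea how Google actually lays things out on tape.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte column at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Parity block for one stripe: the XOR of all its data blocks."""
    return xor_blocks(data_blocks)


def rebuild(surviving_blocks, parity):
    """Recover the single missing block from the survivors plus parity."""
    return xor_blocks(surviving_blocks + [parity])


# Three hypothetical "tapes", each holding one equal-sized block of a stripe.
tapes = [b"backup-A", b"backup-B", b"backup-C"]
parity_tape = make_parity(tapes)

# Pretend the block on the middle tape went bad; rebuild it from the rest.
survivors = [tapes[0], tapes[2]]
recovered = rebuild(survivors, parity_tape)
assert recovered == tapes[1]
```

The catch is that one parity block only covers one failure per stripe, which fits with his point that redundancy has to come along several dimensions.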
I would have loved to get a brief overview of the API they deliver to the systems that use their services. Particularly the introductory material that outlines the contracts you can negotiate via configuration and what then are your responsibilities as a user of the services. There are some very slight hints about that, but not much.
The multi-dimensional aspect to redundancy got me to wondering if they have a backup exchange agreement with other big Sky Net operators.
The talk and questions together run an hour and fifteen minutes.