Here’s something I hadn’t quite noticed before about the trade-offs between reliability and efficiency. This was triggered by the hearsay that yesterday’s troubles in the US air traffic control system were the fault of a single router. I don’t actually believe that rumor. It makes it sound as if they designed the system so that it could not fail gracefully; more likely this was a breakdown in the hard part – implementing and maintaining the architecture for graceful failure. So I was thinking about why that happens, which led to noticing something I’d not noticed before.
If you want to make a system more efficient or cheaper, a common trick of the trade is to roll up parts. For example, if you have five elementary schools in town it will be cheaper to roll them up into a single school. That’s called economy of scale. What you’re eliminating when you merge schools are the redundant bits: take any two of them and maybe you don’t need two school principals, two school nurses, two language labs, and you can keep the one great PTA president instead of one great and one who’s just OK, and so on. If the redundant bits are a large portion of the overall cost then the roll-up is highly leveraged.
One of my utility companies charges me a base fee plus a fee for usage. That base fee creates an incentive for me to merge my account with my neighbor’s. The base fee is about $240 a year, so by sharing one account we could each save $120 a year. That $120 a year is the economic motivation to roll up the account. But notice that the benefit falls off fast: the motivation weakens as the co-op grows. If five of us got it together to roll up our accounts into a single account we’d save $960 a year in total – but individually we’d only save $192, or an additional $72 each over the two-household deal. The economic motivation for reducing the redundancy weakens quickly.
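The arithmetic is easy to check with a few lines of Python. This is only a sketch of the numbers above: it assumes every household pays the same $240 base fee and that a shared account pays it exactly once, and the function names are mine, not the utility’s.

```python
BASE_FEE = 240  # dollars per year per account (the rough figure quoted above)


def saving_per_household(n: int, base_fee: float = BASE_FEE) -> float:
    """Yearly saving for each of n households that share a single account."""
    # n separate accounts cost n * base_fee; one shared account costs base_fee.
    # The difference, (n - 1) * base_fee, is split evenly n ways.
    return base_fee * (n - 1) / n


def marginal_saving(n: int, base_fee: float = BASE_FEE) -> float:
    """Extra yearly saving each member gets when the group grows from n - 1 to n."""
    return saving_per_household(n, base_fee) - saving_per_household(n - 1, base_fee)


for n in range(2, 7):
    print(
        f"{n} households: total ${saving_per_household(n) * n:.0f}, "
        f"each ${saving_per_household(n):.0f}, "
        f"gain from the latest member ${marginal_saving(n):.0f}"
    )
```

The last column is the one to watch: the second household is worth $120 a year to each member, the third only $40 more, and the fifth only $12 more.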
I’d noticed that before, along with the other reason the motivation weakens, i.e. the increasing coordination costs of the larger organization. I’ve presumed that in different spheres of activity there are distinct sweet spots, but that’s not the topic today. The topic for today is how this plays off against the problem of system reliability.
What I hadn’t noticed is that the motivations for economies of scale are at their most powerful at just the moment when they will do the maximum damage to your redundant system design. For example, if you’re running the FAA flight control network you balance costs against reliability; wearing those two hats is exhausting – not to mention confusing. Those are very different kinds of expert skill. So if Mr. Reliability is out sick and Mr. Efficiency is looking for his most highly leveraged move to reduce cost by merging duplicate systems – well – the duplications where two systems are reduced to one will sort to the top of that list, since each of those merges cuts the subsystem’s cost in half. And there goes your last bit-o-redundancy for that subsystem. A complex real system has numerous subsystems, and you need only lose redundancy in one of them to put the entire system at risk.
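Here is a toy version of Mr. Efficiency’s list, again in Python. Everything in it is made up for illustration: the subsystem names and copy counts are hypothetical, and it assumes each copy of a subsystem costs the same, so retiring one of n copies saves 1/n of that subsystem’s cost.

```python
from dataclasses import dataclass


@dataclass
class Subsystem:
    name: str    # hypothetical label, not a real FAA component
    copies: int  # how many redundant copies are running today

    def fraction_saved(self) -> float:
        """Share of this subsystem's cost removed by retiring one copy."""
        return 1 / self.copies

    def spares_after_merge(self) -> int:
        """Backup copies left once one copy has been retired."""
        return self.copies - 2


def rank_by_leverage(subsystems: list[Subsystem]) -> list[Subsystem]:
    """Sort merge candidates so the biggest fractional saving comes first."""
    # A subsystem with a single copy has no duplicate left to remove.
    candidates = [s for s in subsystems if s.copies >= 2]
    return sorted(candidates, key=lambda s: s.fraction_saved(), reverse=True)


systems = [
    Subsystem("radar feed", 4),
    Subsystem("flight-plan router", 2),
    Subsystem("weather link", 3),
]

for s in rank_by_leverage(systems):
    print(
        f"{s.name}: saves {s.fraction_saved():.0%}, "
        f"spares left after merge: {s.spares_after_merge()}"
    )
```

Under those assumptions the two-copy subsystem always lands at the top, and it is also the only one whose merge leaves no spare at all.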