Continuous vs. Batch: The Census

Log, from Blamo: Civil War Reenactor

I am enjoying this extremely long blog post about how logs can form the hub for a distributed system, by Jay Kreps from LinkedIn. It’s TLMR “too long, must read?” It reminds me of my post about listening to the system, but more so.

He has a wonderful example of batch vs. continuous processing, a dialectic worthy of its own post at some point.

The US census provides a good example of batch data collection. The census periodically kicks off and does a brute force discovery and enumeration of US citizens by having people walking around door-to-door. This made a lot of sense in 1790 when the census was first begun. Data collection at the time was inherently batch oriented, it involved riding around on horseback and writing down records on paper, then transporting this batch of records to a central location where humans added up all the counts. These days, when you describe the census process one immediately wonders why we don’t keep a journal of births and deaths and produce population counts either continuously or with whatever granularity is needed.

Cute. My go-to example has always been the difference between the annual cycles that arise from agriculture and tax law revisions vs. the newspaper’s daily cycle in service of the demand for fish wrapping.
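A minimal sketch of the journal Kreps proposes, assuming a hypothetical log of dated entries where a birth is +1 and a death is -1, plus a known starting baseline (immigration and emigration are ignored). The names `Entry`, `population_at`, `counts_every`, and `running_count` are illustrative only. The point is that one journal answers both questions: the continuous running count, and a batch-style count at any cycle time you like, daily for the newspaper or yearly for the tax man.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    day: int    # hypothetical: days since the journal was started
    delta: int  # +1 for a birth, -1 for a death


def population_at(journal, day, baseline=0):
    """Batch-style answer: replay the journal up to and including `day`."""
    return baseline + sum(e.delta for e in journal if e.day <= day)


def counts_every(journal, period, until, baseline=0):
    """Counts at whatever granularity is needed: one every `period` days."""
    return [population_at(journal, d, baseline) for d in range(0, until + 1, period)]


def running_count(journal, baseline=0):
    """Continuous answer: the count after each entry, in day order."""
    total = baseline
    out = []
    for entry in sorted(journal, key=lambda item: item.day):
        total += entry.delta
        out.append((entry.day, total))
    return out


if __name__ == "__main__":
    journal = [Entry(1, +1), Entry(3, +1), Entry(4, -1), Entry(9, +1)]
    print(population_at(journal, 4, baseline=100))                 # 101
    print(counts_every(journal, period=5, until=10, baseline=100))  # [100, 101, 102]
    print(running_count(journal, baseline=100))  # [(1, 101), (3, 102), (4, 101), (9, 102)]
```

Replaying the whole journal for every snapshot is deliberately naive; a real system would keep a materialized running total and only replay when a new view is wanted.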

But of course that’s not really continuous; it’s just batch with different cycle times. And yet I once encountered a continuous system that involved a pipeline across a desert. Each time the sun would emerge from behind the clouds the pipe would warm up and a vast slug of material would be ejected out the far end into a hastily built holding pit at the refinery. Maybe slug processing would be a good fallback term for the inevitable emergence of batches in continuous systems. Blame the clouds.
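One way to make slug processing concrete, as a sketch under assumptions: take a hypothetical stream of arrival times from a nominally continuous flow and cut it into slugs wherever the quiet gap between arrivals exceeds some threshold. Both the `slugs` function and the `gap` parameter are illustrative inventions. This is the same move as the gap-based "session" windows found in stream processors.

```python
def slugs(arrivals, gap):
    """Group arrival times into slugs: a new slug starts whenever the
    time since the previous arrival is greater than `gap`."""
    groups = []
    for t in sorted(arrivals):
        if groups and t - groups[-1][-1] <= gap:
            groups[-1].append(t)   # still part of the current slug
        else:
            groups.append([t])     # the clouds parted; a new slug begins
    return groups


if __name__ == "__main__":
    print(slugs([0, 1, 2, 10, 11, 30], gap=3))  # [[0, 1, 2], [10, 11], [30]]
```

The batches here are not scheduled; they fall out of the data, which is rather the point of blaming the clouds.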

Ascription is an Anathema to any Enthusiasm

Ben Hyde
