A friend recently inquired:
… it says “Transferring data from www.google-analytics.com”. It has been sitting in that state now for minutes.
to which my immediate reaction was: “Oh, that’s Web 2.0.”
Web 2.0 is many things to many people. One of my favorites is that Web 2.0 is a vision for how the architecture of the internet operating system might shake out. In this vision there are many vendors who contribute services to the system, and applications are built by picking among those services. I joke that in that world the only dominant player would be O’Reilly, who’d naturally get to publish a book for every service. Doc writers rule!
A somewhat less general version of that strawman architecture, applications delivered by aggregating diverse vendor services, looks only at the individual web page: the page is assembled by pulling content from diverse vendor services. In that variation the UI subsystem for the internet operating system is situated in the web browser (much as back in the day we thought it might be situated in an X terminal). UI designers know that low latency is a must-have feature.
There is a gotcha in the Web 2.0 architecture. When you assemble your application, each additional supplier increases your risk. That’s called supplier risk. This is what my friend was observing. It used to be conventional wisdom that no sane web site developer would let this kind of supplier risk into his design. That has turned out to be false, and I think it was always overstated to the point of being silly.
Internet systems built along the lines of my Web 2.0 sketch are like just-in-time manufacturing, but with the knob turned up to eleven. Supply chains sometimes fail catastrophically in a cascading failure. There is a wonderful example of that in The Machine That Changed the World, a book about the auto industry. The story takes place in Detroit in the early 20th century. Before the story begins, the auto industry’s supply chains are dense and somewhat equitable. Detroit has many small producers of assorted component parts. The producer of seats would come into work each morning to find his inputs sitting on his loading dock. He’d assemble his seats and deliver them on to the next guy. And then there was a recession. He comes in and his morning bucket of bolts is missing. His supplier has gone bankrupt. This failure cascaded, and when it was over, when the recession ended, the auto industry was a lot less diverse.
There are days when I think it’s all about latency. And in this world each hiccup drives us toward another round of consolidation. For example, I think it’s safe to say the chances you suffer the hiccup my friend observed are much reduced if you situate your site inside Google’s data centers.
Well, so, thinking about my friend’s comment got me to wondering: How’s that Web 2.0 thing working out? Do we have any data on the depth and breadth of supply chain entanglement in the web application industry? Do we have any metrics? Can we see any trends? Ben Laurie has recently been looking at something similar (about DNS, about AS); the supplier risk he’s thinking about is what bad actors might do if they owned (pwn’d in Ben’s terms) one of the points of concentrated control. He’s got pretty pictures, but no metrics.
Here’s a possibility. I’ve been enjoying a Firefox plugin, Ghostery, which reveals how many “web bugs” or “behavioral marketing trackers” or whatever you want to call them are embedded in each page I visit. For example, if you go to Paul Kedrosky’s awesome blog Infectious Greed there are ten (Google Analytics, Google Adsense, Lijit, Minit, Federated Media, Doubleclick, ShareThis, Sphere, and Insight Express). Ghostery isn’t quite doing what I wanted. It is surveying only a subset of the universe of Web 2.0 services used in assembling a page. So it doesn’t report when the page is pulling in Yahoo maps or widgets from Flickr or Etsy. But it’s a start.
If you opt in, Ghostery will pipe what it learns from your browsing back into a survey of what’s happening across various pages. That includes, of course, a directory of all the services it’s keeping an eye on. For example, here is the Ghostery directory page for Lijit, which reveals a bit of what’s being accumulated, i.e. that Lijit was found on over a thousand sites by Ghostery users who have opted in to reporting back what they are seeing.
So yesterday I hacked up a tiny bit of code to pull those counts from Ghostery’s directory so I could see what the tracker market is looking like. (Note that the Ghostery Firefox plugin is open source, but as yet the server is not.) You can see the rankings of top trackers here. I presume they are power-law distributed. Organically grown, unregulated market shares usually are. Even so, it is extremely concentrated, with four of the top six positions being Google’s. Here’s the top handful:
800000 Google Analytics
300000 Google Adsense
200000 Doubleclick
70000 Statcounter
60000 AddThis
40000 Google Custom Search Engine
40000 Quantcast
30000 OpenAds
20000 Omniture
20000 WordPress Stats
20000 SiteMeter
10000 Revenue Science
10000 AdBrite
10000 Casale Media
10000 Twitter Badge
10000 MyBlogLog
10000 DiggThis
10000 Microsoft Atlas
10000 ShareThis
9000 NetRatings SiteCensus
9000 Google Widgets
9000 ValueClick Mediaplex
8000 AddtoAny
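For the curious, the scrape was nothing fancy. A minimal sketch of the idea looks like the following, though the sample markup and the “found on over N sites” phrasing here are stand-ins I made up for illustration, not Ghostery’s actual page structure:

```python
import re

# Hypothetical snippet standing in for a Ghostery directory page;
# the real markup differs.
sample_html = """
<div class="tracker"><a href="/apps/lijit">Lijit</a>
  found on over 1,000 sites</div>
<div class="tracker"><a href="/apps/google_analytics">Google Analytics</a>
  found on over 800,000 sites</div>
"""

def tracker_counts(html):
    """Pull (name, count) pairs out of directory HTML with a crude regex."""
    pattern = re.compile(r'>([^<]+)</a>\s*found on over ([\d,]+) sites')
    pairs = [(name.strip(), int(count.replace(',', '')))
             for name, count in pattern.findall(html)]
    # Sort descending by count, like the ranking above.
    return sorted(pairs, key=lambda p: p[1], reverse=True)

for name, count in tracker_counts(sample_html):
    print(count, name)
```

In practice you’d fetch each directory page and feed its HTML through something like this; a regex is brittle against markup changes, but for a one-off survey it gets the job done.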