Author Archives: bhyde

California to issue warrants?

I see from the news that California may issue registered warrants, or some such, as a workaround for not having the cash to pay its bills. The newspapermen are calling them IOUs, but it's always hard to tell when a piece of paper transitions between stock, bond, currency, warrant, IOU, etc. Are these instruments better or worse than California's bonds?

This reminded me of a story from the Depression. In that story the local merchants arranged to convert the town-issued tax warrants into their local micro-currency. You can read that story in this posting; it's the second story in the piece.

Market concentration in Web 2.0

A friend recently inquired:

… it says “Transferring data from www.google-analytics.com”. It has been sitting in that state now for minutes.

to which my immediate reaction was: “Oh, that’s Web 2.0.”

Web 2.0 is many things to many people. One of my favorites is that Web 2.0 is a vision for how the architecture of the internet operating system might shake out. In this vision there are many vendors who contribute services to the system, and applications are built by picking among those services. I joke that in that world the only dominant player would be O'Reilly, who'd naturally get to publish a book for every service. Doc writers rule!

A somewhat less general version of that strawman architecture (applications delivered by aggregating diverse vendor services) looks only at the individual web page, where the page is assembled by pulling content from diverse vendor services. In that variation the UI subsystem of the internet operating system is situated in the web browser (much as back in the day we thought it might be situated in an X terminal). UI designers know that low latency is a must-have feature.

There is a gotcha in the Web 2.0 architecture. When you assemble your application, each additional supplier increases your risk. That's called supplier risk. This is what my friend was observing. It used to be conventional wisdom that no sane web site developer would let this kind of supplier risk into his design. That has turned out to be false, and I think it was always overstated to the point of being silly.

Internet systems built along the lines of my Web 2.0 sketch are like just-in-time manufacturing, but with the knob turned up to eleven. Supply chains sometimes fail catastrophically in a cascading failure. There is a wonderful example of that in The Machine that Changed the World, a book about the auto industry. The story takes place in Detroit in the early 20th century. Before the story begins the auto industry's supply chains are dense and somewhat equitable. Detroit has many small producers of assorted component parts. The producer of seats would come into work each morning to find his inputs sitting on his loading dock. He'd assemble his seats and deliver them on to the next guy. And then there was a recession. He comes in and his morning bucket of bolts is missing. His supplier has gone bankrupt. This failure cascaded, and when it was over, when the recession ended, the auto industry was a lot less diverse.

There are days when I think it's all about latency. And in this world each hiccup drives us toward another round of consolidation. For example, I think it's safe to say the chances you suffer the hiccup my friend observed are much reduced if you situate your site inside Google's data centers.

Well, so, thinking about my friend's comment got me to wondering: How's that Web 2.0 thing working out? Do we have any data on the depth and breadth of supply chain entanglement in the web application industry? Do we have any metrics? Can we see any trends? Ben Laurie has recently been looking at something similar (about DNS, about AS); the supplier risk he's thinking about is what bad actors might do if they owned (pwn'd, in Ben's terms) one of the points of concentrated control. He's got pretty pictures, but no metrics.

Here's a possibility. I've been enjoying a Firefox plugin, Ghostery, which reveals how many "web bugs" or "behavioral marketing trackers" or whatever you want to call them are embedded in each page I visit. For example, if you go to Paul Kedrosky's awesome blog Infectious Greed there are ten (Google Analytics, Google Adsense, Lijit, Minit, Federated Media, Doubleclick, ShareThis, Sphere, and Insight Express). Ghostery isn't quite doing what I wanted. It surveys only a subset of the universe of Web 2.0 services used in assembling a page, so it doesn't report when the page is pulling in Yahoo maps or widgets from Flickr or Etsy. But it's a start.

If you opt in, Ghostery will pipe what it learns from your browsing back into a survey of what's happening across various pages. That includes, of course, a directory of all the services it's keeping an eye on. For example, here is the Ghostery directory page for Lijit, which reveals a bit of what's being accumulated, i.e. that Lijit was found on over a thousand sites by Ghostery users who have opted in to reporting back what they are seeing.

So yesterday I hacked up a tiny bit of code to pull those counts from Ghostery's directory so I could see what the tracker market looks like. (Note that the Ghostery Firefox plugin is open source, but as yet the server is not.) You can see the rankings of the top trackers here. I presume they are power-law distributed; organically grown unregulated market shares usually are. Even so, it is extremely concentrated, with four of the top six positions held by Google. Here's the top handful:

800000 Google Analytics
300000 Google Adsense
200000 Doubleclick
70000 Statcounter
60000 AddThis
40000 Google Custom Search Engine
40000 Quantcast
30000 OpenAds
20000 Omniture
20000 WordPress Stats
20000 SiteMeter
10000 Revenue Science
10000 AdBrite
10000 Casale Media
10000 Twitter Badge
10000 MyBlogLog
10000 DiggThis
10000 Microsoft Atlas
10000 ShareThis
9000 NetRatings SiteCensus
9000 Google Widgets
9000 ValueClick Mediaplex
8000 AddtoAny
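As a rough check on that power-law hunch, you can fit a line to log(count) versus log(rank); a straight line with negative slope is the power-law signature. A quick sketch over the counts in the table above, stdlib only:

```python
import math

# Tracker install counts from the table above, already in rank order.
counts = [800000, 300000, 200000, 70000, 60000, 40000, 40000,
          30000, 20000, 20000, 20000, 10000, 10000, 10000, 10000,
          10000, 10000, 10000, 10000, 9000, 9000, 9000, 8000]

# Least-squares fit of log(count) = a + b*log(rank); a power law
# shows up as a roughly straight line with a negative slope b.
xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
ys = [math.log(c) for c in counts]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
     / sum((x - mx) ** 2 for x in xs))
a = my - b * mx

top_share = counts[0] / sum(counts)          # Google Analytics alone
print("fitted exponent: %.2f" % b)
print("top tracker's share of all installs: %.0f%%" % (100 * top_share))
```

The steeper (more negative) the fitted exponent, the more concentrated the market; the top tracker's share alone makes the concentration point without any curve fitting.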

Grapple

I enjoyed Lera Boroditsky's essay in support of the Whorfian hypothesis. Denialists like to mention grapple. It's a kind of snow, slightly melted and refrozen. So, see, English has lots of words for snow too. But I long ago picked a side in that argument: language deeply affects your thinking. So for me her wonderful essay is preaching to the choir.

It's a great read, with lots of really fun stories. There is a tribe which describes location via compass points: "There is a spot on your shirt's southwest collar." If you ask them to order a series of images in timeline order they orient them east to west, much as Mandarin speakers will order them vertically. Bridges have male gender in Spanish, and female in German. Asked, in English, to describe a bridge, people whose native tongue is Spanish or German will answer, respectively: big, strong, sturdy, towering vs. beautiful, elegant, slender.

Some languages, like English, aren't really into gender, while others have lots. I can't find anything to confirm this, but she reports that some Australian Aboriginal languages have a gender used for shiny things. Which is notable given where I presume they got the name for Google Wave.

Of course, what Tim O'Reilly was trying to do when he ginned up the term Web 2.0, the millionth word, was to shape the conversation. He may have set his sights too low.

Wrong Frame

I have long been a huge fan of Robert Cialdini's first book, Influence. The original printing is the best because it retains the maximal emotion. He was horrified to discover that people had these clever tricks for manipulating his behavior. The book is written as a kind of handbook for how to defend yourself. Later editions, and his later books, are colored by a more even-handed attitude, and sometimes you think he's gone entirely over to the dark side. I suspect he makes a good living giving talks to salesmen.

I’ve not read the most recent book  Yes! 50 Scientifically Proven Ways to Be Persuasive but there is a nice short summary of all 50 techniques to be found here.

Reading those I was struck by one entry:

As time goes by, the value of a favor increases in the eyes of the favor-giver, and decreases in the eyes of the favor-receiver. Researchers asked a group of people in the random office environment to exchange favors and then rate the value of the given/received favor in their eyes. A few weeks later the same employees were reminded of the favor, and asked to evaluate the favor again. Favor-givers consistently assigned higher value to a given favor, while as the time passed by, favor-receivers tended to assign lower value to the received favor.

Ha! That's amusing, but the reason why it's amusing should be drawn out. It's amusing because the entire statement is an oxymoron, a farce in one line. Such misunderstandings are always amusing. It's a category error. Favors are gifts; they are not economic transactions. When you do a favor you are not collecting IOUs in the currency of some pseudo-economy. If you think you are, well then you're not actually doing a favor, you're playing a game. Keeping score. And there is nothing wrong with playing games; there are lots of games in this life. Certainly lots of activities labeled as gift exchanges are in fact just point scoring in some game or another. But if you think you're playing such a game you're presuming that the recipient knows the rules of your imaginary game, a presumption fraught with affordances for misunderstanding. And that is the stuff of farce.

It helps to recognize that it is in the nature of public goods that the books do not balance. To push them into that frame is to miss the point. Recently I've come to saying to people who are suffering from this category error: "Those books don't balance, nor should they; but if we must think in those terms, how do you want the accounts to look when you arrive at your deathbed?"

Persuasion is often the art of moving the decision into an advantageous frame.

Look Ma! No Hands!

Years ago I worked for a company that had no quality assurance, none! No testing, nothing! In point of fact they didn't have a lot of things, furniture for example. We had some folding tables and chairs, but not enough. Performing without a net, wee! That may have been the first time I mumbled "Look Ma! No hands!" We took a childish glee in our bravado. I was talking to my inner mom. She smiled lovingly and quietly suggested: you be careful, honey.

I'm always amused when I mumble that. I'm the audience at my own farce. Self-awareness is better if amused. I've worked on projects without: customer contact, product management, specifications, management, real engineers, sales, money, an office, email, operations, user documentation, source control, a good editor, a sane language, a clue; I could go on. And I must point out that these days, what with cloud computing, there is a fad for computer projects without computers!

In fact this pattern is so common that I’m starting to think there’s something to it.  We presume it’s a bug, but maybe it is a feature. In any case, I’ve gotten a lot of mileage out of being on the look out for it.

Each time there is a narrative. There is always a list of awful things that happen if you add it back: lazy ops, whiny QA, micro-managers. There is a whole literature that lays the blame for institutional inability to innovate on the fine offices and the heavy sauces in the company cafeteria. There is always a bit (or more) of a sense of mission in doing without. This isn't just a hair shirt; there's a real pleasure in showing that you can rub your stomach and pat your head at the same time. All while riding a bicycle with no hands! These all create a kind of pride and solidarity in the team, along with a bit of a dirty secret. In a sense all that narrative is an amazingly positive way to make good out of scarcity.

This do-without, no-hands habit seems to be a positive. And not just because you get a better story to tell your grandkids. Doing without can be a total win. Lousy is often a damn sight more expensive than none for a lot of the parts of projects. Money is always short. Thrift is a virtue; it buys time.

The only time it blows up in your face is when the team becomes so deeply committed to the positive aspects of forgoing this or that, and then suddenly that is exactly what they desperately need. One firm of my experience had lousy QA and it all blew up. It took two expensive tries to fix. First they hired up some QA, but the new hires got no respect and it failed. Then they took one of the most senior engineers and stuck him with the job. His status, and his intimate familiarity with the local customs, let him route around the deeply entrenched belief that we could ride that bicycle with nothing but dancer-like body language.

Cascades of Surprise

We build monitoring frameworks like the one I outlined in "Listening to the System" for at least four reasons. There may be legal requirements that we keep records for later auditing and dispute resolution. We may want to monitor the system so we can remain in control. We may want to collect data in service of tuning the system, say to reduce cost or improve latency. And then there is debugging. Audit, control, tuning, and debugging are, of course, not disjoint categories.

Good monitoring will draw our attention to surprising behaviors. Surprising behaviors trigger debugging projects. The universe of tools for gleaning surprising behavior from systems is very large. Years ago, when I worked at BBN, the acoustics guys were working on a system that listened to the machine-room noise on a ship, hoping to sense that something anomalous was happening.

I attended a talk this morning, "Using Influence to Understand Complex Systems," by Adam Oliner (the same talk performed by his coauthor Alex Aiken is on YouTube), where I was again reminded of how you can often do surprisingly effective things with surprisingly simple schemes.

Adam and Alex are tackling an increasingly common problem. You have a huge system with numerous modules. It is acting in surprising ways. You've got a vast pile of logging data from some of those modules. Now what do you do?

Their scheme works as follows. For each of the data streams, convert the stream into a metric that roughly measures how surprising the behavior was at each interval in time. Then do time-series correlation between the modules. That lets you draw a graph: module A influences B (i.e. surprising behavior in A tends to precede surprising behavior in B). You can also have arcs that say A and B tend to behave surprisingly at the same time. These arcs are the influence mentioned in their title.
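The influence arcs boil down to correlating one module's surprise series against another's at a time lag. Here's a naive toy version of that idea (my own sketch, not their implementation):

```python
def lagged_correlation(a, b, lag):
    """Pearson correlation between series a and series b shifted by
    `lag` steps, i.e. how well surprise in a predicts surprise in b
    `lag` steps later."""
    a, b = a[:len(a) - lag], b[lag:]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va and vb else 0.0

# Module A behaves surprisingly, and two steps later so does module B.
surprise_a = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
surprise_b = [0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0]

# A lagged correlation well above the zero-lag one suggests an
# "A influences B" arc in the graph.
print(lagged_correlation(surprise_a, surprise_b, 0))
print(lagged_correlation(surprise_a, surprise_b, 2))
```

With a strong correlation at some positive lag you draw the directed arc; a strong correlation at lag zero gives you the "behave surprisingly at the same time" arc.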

If you add a pseudo-module to represent the anomalous behavior you're investigating, then the graph can give you some hints for where to investigate further.

At first blush you'd think that you need domain expertise to convert each log into a metric of how surprising the log appears at each point in time. But statistics is fun. So they adopted a very naive scheme for converting logs into time series of surprise.

They discard everything in the log except the intervals between the messages. Then they keep a long-term and a short-term histogram of those intervals. The surprise is a measure of how different the two histograms appear. The only domain knowledge is deciding what short-term and long-term mean.
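A minimal sketch of that scheme, with the surprise measured as plain L1 distance between the normalized short-term and long-term interval histograms (the distance measure, window sizes, and bucket edges here are my guesses, not theirs):

```python
from collections import deque, Counter

def surprise(intervals, short=5, long=50, buckets=(0.1, 1.0, 10.0)):
    """Stream of inter-message intervals -> stream of surprise scores.
    Each interval is coarsened into a bucket; the surprise is the L1
    distance between the short-term and long-term bucket distributions."""
    def bucket(dt):
        for i, edge in enumerate(buckets):
            if dt < edge:
                return i
        return len(buckets)

    recent, history = deque(maxlen=short), deque(maxlen=long)
    for dt in intervals:
        recent.append(bucket(dt))
        history.append(bucket(dt))
        s, h = Counter(recent), Counter(history)
        yield sum(abs(s[b] / len(recent) - h[b] / len(history))
                  for b in set(s) | set(h))

# A module that ticks steadily, then suddenly slows down.
ticks = [0.5] * 50 + [5.0] * 5
scores = list(surprise(ticks))
print(max(scores[:50]), max(scores[50:]))  # quiet, then a spike
```

The score sits at zero while the module behaves as it always has and jumps when the recent intervals stop resembling the long-term ones, which is all you need to feed the correlation step above.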

The talk includes a delightful story about applying this to a complex robot's naughty behaviors, drawing attention first to the portion of the system at fault and then revealing the existence of a hidden component where the problem was actually hiding out. Good fun!

I gather that they don't currently have a code base you can download and apply in-house, but the system seems simple enough that cloning it looks straightforward.

They would love to have more data to work on. So if you have a vast pile of logs for a system with lots and lots of modules, and you're willing to reveal the inter-message timestamps, module names, and some information about when mysterious things were happening, I suspect they would be enthusiastic about sending you back some pretty influence graphs to help illuminate your mysterious behaviors.

It would be fun to apply this to some social interaction data (email/IM/commit logs). I suspect the histograms would need to be tinkered with a bit to better match the distributions seen in such natural systems. Just trying various signals as to what denotes a surprising behavior on the part of the participants in the social network would be fun. It would be cool to reveal that when Alice acts in a surprising way, shortly thereafter Bob does; and a bit later the entire group descends into a flame war.

Islanding

I read recently that when Microsoft was selecting the sites for their new cloud computing data centers they had 31 variables as input. I assume they plotted those on heat maps like this one showing the price of electricity across the United States.

Back in high school Jane Jacobs' books on the economics of urban regions schooled me in a cynical attitude about these stories of site optimization. I recall learning that the most powerful predictor of where a large firm would site its new office park was the distance from the CEO's wife's horses. So I wasn't terribly surprised that one of Microsoft's big data centers is in the countryside east of Redmond.

FYI, the drawing above is terribly misleading. For wholesale power West Texas is a steal right now: wind power. For a residential power consumer the per-month cost to connect to the grid tends to be a large additional cost. I wrote about that under the heading of "micro-utility coops," using the gas company as an example. Since then I've learned there is a nice term of art in the utility industry: "islanding." That's worth reading about if you want yet another way to look at the issues around localism.

Islanding is one of the themes that runs through the discussions of cloud computing, though it goes under various guises (security, control, specialization, cost of ops, capital equipment, bandwidth, latency). I continue to presume that anybody who can make a credible case for building their own island will be able to negotiate a pricing deal with their cloud vendor, which means I'm starting to think that people who run their own data centers are fellow travelers with the off-grid enthusiasts. You gotta love 'em.

Wave – Part 3

Google has signaled that they would like to see Wave widely adopted. In service of that goal they have a dot-org site which reveals a rough draft of a design that enables Wave servers to interoperate so their users can collaborate. But the whole story is much more complex. There is a lot to do before their signaled desire can become a widespread reality.

Let's start with that picture. Google has revealed some of the plumbing that might form a standard on the boundary between Wave service providers and any Wave federations that may emerge. Lower in the drawing we have some hints about how the traffic between members of these federations might move. The term XMPP is mentioned. But that is really not enough. I'm not bothered by that; it's early days and rough specs are to be expected.

Let's move up in the drawing, into the Wave Service Providers box. It would have been a more credible industry-standard move if Google had one or two other players willing to run up on stage and signal their intent to become Wave service providers. Alternately they might have released a complete reference implementation of a Wave service provider and, of course, placed it under an open source license. The word providers is plural, but at the moment we can only be confident that Google will deploy. Until some other names signal that they are, at minimum, seriously considering Wave, I think it is fair to say: Google's open-platform signal isn't really credible. It's not a party if your peers don't come. It's a certain kind of federation if all your partners are tiny and you are huge. But, again, it's early days and it's a big world out there, so I'm sure Google can find somebody to come on board.

All the cut points in that layer cake are full of make-or-break options for Google. Take a historical example up at the top: when the Macintosh shipped in 1984 Apple set a stellar example of getting that bit right. They provided beautiful example applications and they provided clear and concise user interface guidelines. Right now all we have is an example application, one which is almost a research proof of concept. It certainly isn't as elegant a user experience as the other Google web apps.

Much as my layer cake implies we will see multiple Wave service providers, it implies we will see multiple Wave-aware applications. How many? I think, and hope, it's many. But where is the signal from Google about this? One wonders where the Google Search, Maps, Mail, Calendar, Docs, Voice, etc. teams stand on all this. Now, I think it would be insane for Google to force all and sundry throughout the company into a forced march to adopt Wave. But it's very odd that you could make the argument that there will be only one Wave-aware application. We need a much clearer signal about this.

Say you wanted to build an application for collaborative bill payment. Questions start popping up fast. Do you design it as a plugin to some master Wave application's UI? There was a period in the 80s and early 90s when a lot of the desktop OS vendors tried to create unified application frameworks; these didn't work out. Often that was due to market-power dynamics between the app vendors and the OS vendors, and to a lesser degree due to execution issues; but it looks to me like the same questions arise here.

Say I'm a vendor of web forum software; my customers install it on their systems. Say I'm contemplating building a next-generation version that's Wave aware. Can it be installed on any of N Wave service providers? Is that sentence even meaningful? This looks like a rerun. We spent a lot of time in the 80s building software so it could run across multiple platforms. These Wave service providers look to me very similar to cloud computing vendors, very similar to platform vendors, and very similar to desktop OS vendors with their minimally interoperable user interface conventions.

Google has revealed a bit of the API provided by their sandbox Wave service provider: how to build automated participants (aka robots), and how to build smallish widgets that provide little visualizations and games. There is a bit of a hint that it will be possible to build entirely unique kinds of Wave documents. That is one of the signs that it would be possible to build, say, a Wave-aware accounting application, a Wave-aware college admissions system, a Wave-aware travel agency.

It's early days and none of the above should be taken as critical. My intent is to see if I can block out what I'd like to see happen. Or maybe to just start to block out where to ask questions about what's to be desired.

Wave – Part 2

This posting isn't really about Wave; it's about a lovely detail in the protocol design, in particular in the crypto spec.

Imagine that Germany, Britain, and France are running Wave servers for their citizens, and of course many Wave documents have participants from all three countries. These three servers are all federated, and a torrent of traffic flows between the servers as citizens of the various countries collaborate.

Federation is always a bit like Goldilocks (not too hot, not too cold). These three trust each other enough to exchange messages, but they don't really trust each other not to mess with those messages. So obviously it would be nice if the crypto design assured that all the messages had a cryptographic signature on them. Then when the German server got a message from a French user by way of the Brits it could check that the Brits didn't mess with it. Nice idea, sure, but these messages typically constitute a single character, so signing every message is going to be awfully expensive.

The design has a lovely trick for solving this problem. The French server bundles up batches of messages, signs the batch, and sends it off to the Brits. This batch will include the edits of numerous citizens on numerous wavelets. These batches will also include conflict resolutions for edits on wavelets the French server is hosting. The Brits chew on that bundle and some of it, but not all of it, gets sent on to the Germans. So the Brits clip out all the parts of the bundle that the Germans don't need (and in fact shouldn't see) and pass it on. The trick? Well, usually if you delete a portion of a signed message the signature breaks, but by designing the bundles just right the design avoids that.

They do this with hash trees. The bundles are a tree; the messages are the leaves. Each node in the tree has a hash of its immediate children. At the root we sign just the topmost hash. You can clip off any branch in this tree as long as you leave its hash behind. Cute.
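A toy version of the trick in Python, with SHA-256 standing in for whatever hash the spec actually uses, and a simple root comparison standing in for the real signature check:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def root(nodes):
    """Fold leaf-or-hash nodes up to a single root hash.  A node is
    ('leaf', message) or ('hash', digest); the latter is what remains
    after a branch has been clipped out of the bundle."""
    level = [h(m) if kind == 'leaf' else m for kind, m in nodes]
    while len(level) > 1:
        level = [h(b''.join(level[i:i + 2]))
                 for i in range(0, len(level), 2)]
    return level[0]

msgs = [b'edit-1', b'edit-2', b'private-edit-3', b'edit-4']
signed_root = root([('leaf', m) for m in msgs])   # the French server signs this

# The Brits forward the bundle to the Germans with message 3 clipped
# out, leaving only its hash behind; the signed root still verifies.
pruned = [('leaf', msgs[0]), ('leaf', msgs[1]),
          ('hash', h(msgs[2])), ('leaf', msgs[3])]
assert root(pruned) == signed_root
```

The Germans recompute the root from what they received and check it against the French signature; they learn nothing about the clipped message beyond its hash, and any tampering with the surviving messages breaks the check.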

I've been having fun thinking of other applications for this trick. It lets you sign a document and then selectively reveal portions of that document. For example you could use it to play Battleship: sign the overall board at the beginning and then reveal each square one at a time as the game proceeds. You could use it to sign a secret document which is later declassified, but with portions of the document censored out.

Wave – Part 1?

Wave is neat and I currently think it will be very widely adopted. This note is a quick summary of what it appears to be. It is very impressionistic; the specifications are amazingly rough! I'm not sure I'd call this a platform, but I'm sure other people will. It certainly creates a large option space for building things. Wave certainly meets one of the requirements of a great platform: it opens up so many options for things to do that if you ask "What is it for?" the right answer is: "I don't know." Or as a friend of mine once observed: a good platform is about nothing. The demos at Google I/O obscure that. That always happens. For example, when they first demoed the Macintosh they had built one or two spectacular applications, MacPaint for example. People would look at those demos and think: "Oh, so this is a machine for doing drawings."

Wave provides the tools to do realtime distributed coordination of a complex activity. For example, that activity might be a game of checkers, developing a product plan, a conversation, or the distribution of a todo list. So Wave provides tools to solve the coordination problems that arise when you have a data structure distributed around and multiple parties all modifying it. Wave adopts the same technique we use for source control: optimistic concurrency. Everybody edits their local copy. These edits may turn out to conflict with each other. The resulting conflicts are resolved by some mechanism, which in the Wave terminology goes by the math-like name operational transforms. In source control systems I've always called that conflict resolution.

A Wave document is said to consist of a set of wavelets, which in turn contain one or more XML documents. For example, a Wave document representing a game might have wavelets for all the players, spectators, officials, the score board, game state, the moves, advertising, discussion threads, individual comments, etc. Nothing in the Wave specification blocks out how all those wavelets relate to each other. Different activities will, of course, have different kinds of constituent parts. Nothing I've read yet specifies even the building blocks for things like users, bits of HTML text, etc.

But the spec does block out the primitives for editing the XML documents that constitute the atomic elements of the Wave. Those are a small set of editing operations: move pointer, insert text, insert XML tag, split XML element. It reminds you of using a text editor.
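A minimal interpreter for operations of that flavor, plain text only and with an op encoding invented here for illustration (the real spec also covers the XML cases):

```python
def apply_ops(doc, ops):
    """Apply a list of editor-style operations to a string document.
    Ops (invented encoding, not Wave's wire format):
      ('move', n)       -- move the cursor to character n
      ('insert', text)  -- insert text at the cursor, advancing it
      ('delete', n)     -- delete n characters at the cursor
    """
    cursor = 0
    for op, arg in ops:
        if op == 'move':
            cursor = arg
        elif op == 'insert':
            doc = doc[:cursor] + arg + doc[cursor:]
            cursor += len(arg)
        elif op == 'delete':
            doc = doc[:cursor] + doc[cursor + arg:]
    return doc

print(apply_ops("helo world", [('move', 3), ('insert', 'l')]))
print(apply_ops("hello cruel world", [('move', 5), ('delete', 6)]))
```

The point of expressing edits as such tiny cursor-relative operations, rather than as whole-document diffs, is that they are small enough to ship in realtime and simple enough to transform against one another when they conflict.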

These are the operations which might give rise to conflict. If Alice and Bob are working on the plan for a party and both change the budget for snacks, those edits might both be represented by a series of operations (move to char 120, delete 5 characters, insert "25.00"); with Alice entering "25.00" and Bob entering "45.00". The protocol has a scheme for resolving this conflict. It does not average the two! It just picks one, deterministically, and moves on.
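That deterministic pick can be sketched in a few lines. Any fixed rule works as long as every server applies the same one; here I invent a lowest-site-id-wins tiebreak, which is surely not Wave's actual rule:

```python
def resolve(op_a, op_b):
    """Deterministically pick one of two conflicting edits.  The rule
    itself hardly matters; what matters is that every server, applying
    the ops in any order, picks the same winner and so converges."""
    return min(op_a, op_b, key=lambda op: op["site"])

def apply_edit(doc, op):
    return doc[:op["pos"]] + op["insert"] + doc[op["pos"] + op["delete"]:]

# Alice and Bob both replace the 5-character snack budget at char 8.
alice = {"site": 1, "pos": 8, "delete": 5, "insert": "25.00"}
bob   = {"site": 2, "pos": 8, "delete": 5, "insert": "45.00"}

doc = "snacks: 10.00"
# Regardless of which edit a server hears about first, the same
# winner is chosen everywhere.
winner = resolve(alice, bob)
assert winner is resolve(bob, alice)
print(apply_edit(doc, winner))   # -> snacks: 25.00
```

No averaging, no merging; Bob's "45.00" simply loses, everywhere, and the wave stays consistent.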

That's about it. But there are some entertaining bits piled on top that are fun, and necessary. I'll mention three of these: historical background, how this all gets federated, and how access rights are managed.

Optimistic concurrency goes back at least into the 1970s; at least that's the first time I saw it. I think the first time I saw it used for a realtime application with human users was a drawing system out of PARC in the 1990s, and one of Google's whitepapers on Wave mentions that. These days there are two very nice applications of it that I've used to coordinate activities: SubEthaEdit and EtherPad. I highly recommend EtherPad to anybody who's working on an agenda or meeting notes jointly with other people; it's fun.

While it is possible to implement Wave entirely as a peer-to-peer system with no central coordination (SubEthaEdit actually does that), Wave implementors are all going to have a server that labors on behalf of the users participating in the activity the Wave represents: storing the Wave document, orchestrating the ongoing edits, and naively resolving conflicts as they arise. The plan is to allow a user's Wave server to collaborate with other such servers. That works by having one server act as master for each wavelet. It's worth noting that every participant in a Wave document is not necessarily a participant in every wavelet of that document. In our example game, two spectators can have a private chat within the game's Wave document. To be responsive, each server caches copies of the wavelets its users are participating in, and for reasons of thrift and privacy these are limited to just those. The authoritative server is responsible for retaining the master copy of the wavelet and for resolving conflicts. So every edit flows to this master and then back out to the other participating servers. There is a bit of crypto complexity layered on top of that to assure that bad actors can't masquerade as another participant.
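That flow through the authoritative server can be sketched as a toy master that serializes edits and fans them back out (invented names and a deliberately naive conflict rule, nothing from the actual protocol):

```python
class WaveletMaster:
    """Toy sketch of the authoritative server for one wavelet: it keeps
    the master copy, applies edits in the order they arrive, and fans
    each accepted edit back out to every participating server."""

    def __init__(self):
        self.doc = ""
        self.version = 0
        self.participants = []          # each is just an inbox list here

    def join(self, inbox):
        self.participants.append(inbox)

    def submit(self, edit):
        pos, text = edit                # edit: (position, text to insert)
        pos = min(pos, len(self.doc))   # naive stand-in for conflict resolution
        self.doc = self.doc[:pos] + text + self.doc[pos:]
        self.version += 1
        for inbox in self.participants:
            inbox.append((self.version, edit))

british, german = [], []                # two caching servers' inboxes
master = WaveletMaster()                # say, the French server owns this wavelet
master.join(british)
master.join(german)
master.submit((0, "hello"))
master.submit((5, " world"))
print(master.doc)                       # every cache saw the same ordered stream
```

Because the master imposes a single total order on the edits, every caching server replays the identical stream and converges on the identical document, which is the whole reason for routing edits through it.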

It is very unclear at this point how access rights to Waves are managed. Obviously wavelets will have participating users. The participation will be asserted along with some rights; for example the right to view but not the right to modify. In addition there will be groups of users, and these too will be asserted as participating in a wavelet. If that's the case, then determining if a user has the right to do something will involve searching the user and group assertions on the wavelet. Remember, above, that a Wave document consists of just a set of wavelets. Obviously for any given kind of Wave document there will be more structure than just a set. For example our spectators' private conversation would consist of the conversation and then a thread of comments. It is totally unclear how access rights are propagated or computed across those structures.

Everything is amazingly raw at this point. Which signals virgin territory. It's unclear how good a landlord Google will be, but no doubt a lot of people are going to go forth and attempt to stake claims on this landscape.