Monthly Archives: January 2005

Solicitation Handler

Here’s a fun idea: a generalization of MIME types.

As a prologue, let’s look at the details of one of the two one-click subscription solutions mentioned in the prior posting: the one that avoids a central hub/server but instead requires that we do some rework on all the clients. That cost got me thinking about how to pack more value into the rework.

In the blog subscription scenario the blog offers a “subscribe now!” button. Clicking it sends the user’s browser/reader a document, which the browser/reader reacts to by creating a subscription. That action is triggered by the MIME type of the document. In Tim’s scenario the MIME type is application/atom+xml.

If you’re managing your blog feeds using a local news reader, as I am (e.g. NetNewsWire), then it’s an easy matter for the reader to register as the handler for that MIME type. Things are a bit harder if your news reader is a web site, for example BlogLines. In that case you need to install something that catches incoming documents of this MIME type and hands them off to the web site.

Can we generalize this idea? Certainly. How much value can we pack into this subscription offer MIME type?

A simple step up in the idea is to create a MIME type document bouncer: an application you install on your machine which, when a document of a given MIME type is received or opened, turns around and redirects that document to a helpful website. For example, such a beast could help users view documents that aren’t widely supported. There are plenty of these! This would be useful.

A nice improvement on that would be to provide a way for helpful websites to plug into this easily. For example, Bob implements a service that can convert documents of MIME type ‘application/powerpoint’ into other formats, for example SVG, Open Office, etc. Bob’s website is a pain to use. Bob needs a way to tell the document-bouncing application about his service, so he adds an “Add Converter” button to his site. Should the user click it, a document is sent to the user’s reader/browser (it might have the type application/dispatch-offer). That gets handed off to the document bouncer, which updates its dispatch tables/pattern matchers. The next time a document of type application/powerpoint comes over the threshold, the bouncer can offer our happy user the option of sending the document to Bob’s conversion service.
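Here is a minimal Python sketch of Bob’s scenario, assuming a dispatch offer is a small JSON document naming the service, the URL it wants documents sent to, and the MIME types it accepts. That JSON shape, the `Handler` and `DocumentBouncer` names, and the bob.example URL are all hypothetical; this is not a real registration format. Note that the bouncer only collects choices; the user still decides where a document goes.

```python
import json
from dataclasses import dataclass, field

# The offer type named in the text; its JSON body below is an assumption.
DISPATCH_OFFER = "application/dispatch-offer"


@dataclass
class Handler:
    name: str  # human-readable service name shown to the user
    url: str   # where the service wants documents sent (assumed field)


@dataclass
class DocumentBouncer:
    # MIME type -> services the user has accepted offers from
    table: dict = field(default_factory=dict)

    def receive(self, mime_type, body):
        """Handle one incoming document and return the services to offer the user."""
        if mime_type == DISPATCH_OFFER:
            self.register(json.loads(body))  # update the dispatch table
            return []
        return list(self.table.get(mime_type, []))

    def register(self, offer):
        handler = Handler(offer["name"], offer["url"])
        for accepted in offer["accepts"]:
            self.table.setdefault(accepted, []).append(handler)


bouncer = DocumentBouncer()
bob_offer = {"name": "Bob's converter",
             "url": "https://bob.example/convert",
             "accepts": ["application/powerpoint"]}
bouncer.receive(DISPATCH_OFFER, json.dumps(bob_offer))  # user clicked "Add Converter"
choices = bouncer.receive("application/powerpoint", "<slides>")
print([h.name for h in choices])  # ["Bob's converter"]
```

Keeping the table keyed on MIME type keeps the bouncer ignorant of what the services actually do; richer pattern matching could replace the plain dictionary lookup.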

In my fantasy this can get even cooler. Going back to the blog example: the blog provides a ‘Subscribe!’ button, and the document that returns can describe a number of means of subscribing: email, RSS, Atom, whatever. The news reader(s) can place a number of patterns into the dispatch negotiation program. After hitting the subscribe button the user is presented with a menu of choices that allows him to route the subscription offer to the handler appropriate for the blog in question. He can redirect the subscribe offer to BlogLines or NetNewsWire, depending on what fits the situation.
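A sketch of the menu step, under stated assumptions: the subscription offer is modeled as a list of (kind, address) pairs, and each reader has installed the set of kinds it can handle. The `Means` type, the `subscription_menu` function, and the "Mail" reader are hypothetical; NetNewsWire and BlogLines appear only as names from the text, not as real integrations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Means:
    kind: str     # e.g. "atom", "rss", "email"
    address: str  # feed URL or mailbox


# Hypothetical patterns the user's readers installed: reader -> kinds it accepts.
READER_PATTERNS = {
    "NetNewsWire": {"rss", "atom"},
    "BlogLines": {"rss", "atom"},
    "Mail": {"email"},
}


def subscription_menu(offer, patterns):
    """Pair each offered means of subscribing with every reader that accepts it."""
    menu = []
    for means in offer:
        for reader, kinds in sorted(patterns.items()):
            if means.kind in kinds:
                menu.append((reader, means))
    return menu


offer = [Means("atom", "https://blog.example/atom.xml"),
         Means("email", "subscribe@blog.example")]
menu = subscription_menu(offer, READER_PATTERNS)
for number, (reader, means) in enumerate(menu, start=1):
    print(f"{number}. {reader} via {means.kind}: {means.address}")
# The user picks one line; only that reader receives the offer.
```

The blog lists what it can provide, the readers list what they can handle, and the matching happens on the user’s side, so neither party has to know about the other in advance.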

One last step. The data feeds from blogs are just the tip of the iceberg. For example all my financial institutions have data that I pull down from them in service of keeping my financial house in order. Each one of these ought to, but doesn’t, have a “get data feed” button. Stitching up the links, for example passing those links thru to my accountant, is a huge plumbing problem.

It’s a fun idea that we might be able to create a very general scheme that allows data providers to describe what they have to offer, a set of choices, in a single document. It’s also a fun idea that data consumers could describe what they offer to handle. Given a MIME type for the offer to provide, another for the offer to handle, and a manager that the user controls, we can keep the user in control of making these linkages, where he belongs, while making the user’s experience tractable.

Mathworld privatized and destroyed.

In my father’s day every scientist and engineer paid a regular tax to CRC for access to its table of logarithms, etc. His copy of that book, which he gave me when I went off to engineering school, is so worn that he rebound it in a piece of corduroy. These days many pay a tax to Wolfram for something analogous. So that CRC considers Wolfram a very threatening competitor shouldn’t surprise anybody.

What a sad story! It’s really amazing how common this pattern is: a public good emerges; a private entity finds a means to gain ownership of it; the private entity then proceeds to destroy the public good. It’s enough to make you adopt the GNU license.

Making Links

By now we all have come to understand that links are a unit of currency. The number of inbound links you have, the number of customer accounts, the number of subscribers to your site’s feeds are all metrics that denote something about how successful you are. In turn we know that links create graphs, and graphs of links often have power-law distributions with amazing class distinctions betwixt the parties in the graph. We know those class distinctions are not a consequence of the merit or value created by the links but instead of how fast the graph is grown, or of how the nodes merge as market share is rolled up thru mergers. So we know a lot about links as elements in the process of creating wealth. Every scheme for creating links will become the target of bad actors.
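The claim that arrival order, not merit, drives these class distinctions can be illustrated with a standard preferential-attachment simulation. This is a textbook model brought in for illustration, not data about any real link graph: every node is identical, and each newcomer links to one existing node chosen with probability proportional to that node’s inbound links plus one. The `grow_graph` function and its parameters are hypothetical.

```python
import random


def grow_graph(n_nodes, seed=0):
    """Grow a link graph with a simple linear preferential-attachment rule.

    Each newcomer makes one link to an existing node chosen with probability
    proportional to (that node's inbound links + 1). All nodes are identical;
    the only thing that differs is when they arrived.
    """
    rng = random.Random(seed)
    inbound = [0]  # node 0 starts alone with no inbound links
    # One entry per node plus one per inbound link, so a uniform pick from
    # this list is a pick weighted by (inbound links + 1).
    tickets = [0]
    for newcomer in range(1, n_nodes):
        target = rng.choice(tickets)
        inbound[target] += 1
        tickets.append(target)    # the target's extra weight for its new link
        inbound.append(0)
        tickets.append(newcomer)  # the newcomer's baseline weight
    return inbound


inbound = grow_graph(10_000)
print("links held by the first 100 nodes:", sum(inbound[:100]))
print("links held by the last 100 nodes: ", sum(inbound[-100:]))
```

Run it and the first 100 nodes typically hold far more links than the last 100, even though nothing distinguishes them except when they showed up.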

We also know that links play a role in the identity problem: the more you know about a person’s links, the more accurate your model of him can be. We know that accurate models of users are fungible. A better handle on who the user is enables targeted advertising and more highly discriminated pricing. A better handle on who the user is also enables transaction costs to be reduced. Single sign-on, one-click purchasing, and automated form filling are not the only examples of that.

It surprises me that we need to be reminded of this each time we encounter another effort to create a means of creating a large quantity of links.

This month’s contribution to the let-me-help-establish-a-mess-of-links party is one-click subscription. The puzzle in this case is how to lower the barriers to subscribing to a blog. Solving this problem requires moving three hard-to-move objects: all the blogs, all the readers, and something stuck in the middle between them. Both suggested solutions need to move all three, but they vary in where they put their emphasis. The blog hosts are probably the easiest of the three to move: they have an incentive to move, and the market is already very concentrated.

One plan is the classic big-server-in-the-sky plan. Everybody rendezvouses around the hub server. Requests to subscribe are posted to the hub. The user’s reader keeps its subscription set in sync with the hub. The business model suggested is a consortium organized around a common cause, with a stick (fear of somebody else owning this hub) and a carrot (the bloom of increased linking it would encourage). Since early and fast movers will capture power-law elite rewards in such linking build-outs, there are some interesting drivers to build the consortium. Large existing players should find it advantageous to get on board. The principal problem with this plan is that it’s a bit naive. A consortium of this kind is likely to become a player in any number of similar hub problems, for example identity. This hub will have an account relationship with everybody. It would know a lot about everybody’s interests. To say the least, that’s very hotly disputed territory. This plan has triggered more discussion than the following plan.

The second plan that’s been floated is to introduce into the middle a standard which blogs can adopt and readers can then leverage. This implies changing the behavior of most of the installed base of blog readers, and the structure of that installed base is less easily shifted. The idea is to have the subscribe button return a document to the client’s browser (or blog aggregator/reader) which describes how to subscribe. Automation on the reader side can then respond to that information. This means introducing and driving the adoption of a new type of document, a new MIME type. It probably means installing a new bit of client software on everybody’s machines. The browser market leader would have some advantages in making this happen, and could therefore very likely co-opt any success in this plan to drive users to its aggregator. But then that may only point out that the only reason we have a vibrant market of blog-reading solutions is that the dominant browser has been dormant for a few years.

These are hard problems, and this is only one of many we currently face.

Viral Communications

I’m not sure why this paper is called ‘Viral Communications’ (pdf), but it’s fun, and two things leapt out at me.

Future Proof: “Upon installation, the purchaser will have the expectation that they should work for the expected lifetime of the device itself independent of any other changes in the … environment.”

“… a commons where each new cow adds grass.”

I need to go back and add “future proof” to the reasons why people standardize; it’s a variation on what I called “prevent stranding” there. The cow line is a nice way of framing the club-good boundary maintenance problem.

My problem with the term ‘viral communications’ is that it’s so similar to ‘viral marketing.’ I know what viral marketing is: it’s marketing communication that manages to parasitize somebody else’s communication and thereby captures some legitimacy, which it can use to get past the recipient’s defenses against marketing. The paper is about systems of collaborating devices that use a modicum of coordination to create scalable distributed networks with lots of bandwidth. The vision is to use standards (the ones that guide that collaboration) as a substitute for the lawyers at the FCC.

Tagging Powerlaw


Following up on something Clay mentioned, the following chart plots the distribution of tags for four popular URIs at del.icio.us. Each line is the tags assigned to one URI. Each point is one tag; the vertical axis is how many times that tag was used to label that URI. The most popular tag for a URI is on the left, the least popular on the right. Note the power-law distributions.

I’m extremely surprised that the slopes are so similar. Of course a sample of four isn’t very large. If tags were drawn at random from the English language the slope would be slightly larger than -1. I’d assume that as the page becomes more focused the slope becomes more extreme. So here is, I guess, a hypothesis: if you find all the pages with more than 500 tags and extract for each of them a score, i.e. the negative of their slope, the high-scoring ones will be very focused while the low-scoring ones will be more generic, i.e. scattered over the space of all things taggable in English.
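The proposed score is easy to compute. A minimal sketch, assuming each URI’s tag counts are available as a plain list: fit a least-squares line to log(count) against log(rank) and negate the slope. The `focus_score` name is hypothetical, and the sample counts below are synthetic power laws, not del.icio.us data.

```python
import math


def focus_score(tag_counts):
    """Negative of the least-squares slope of log(count) against log(rank).

    tag_counts: how many times each tag was used to label one URI, in any order.
    A steeper fall-off gives a higher score.
    """
    counts = sorted((c for c in tag_counts if c > 0), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two tags to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var


# Synthetic counts, not real tagging data: exact slopes of -1 and -2.
generic = [1000 / rank for rank in range(1, 51)]
focused = [1000 / rank ** 2 for rank in range(1, 51)]
print(round(focus_score(generic), 3), round(focus_score(focused), 3))  # 1.0 2.0
```

Real tag counts are noisy, especially in the long tail of tags used once or twice, so in practice one might fit only the head of each curve; that cut-off is a further judgment call.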

The audience of del.icio.us readers is presumably also critical in determining the slope. If they are all Java-programming, web-site-hacking, 20-to-30-year-old geeks, then that certainly imposes a high degree of focus. Or, if you like, message discipline.

Sadly I don’t see a trivial way to find a random set of pages with more than 500 tags.

Update: More here.

Standards: Bottom-up, Top-down, and other?

I think I’m noticing something I’d not noted before about the statistics of the populations served by a standard. Some standards don’t scale the way you might expect them to. For example, the B2B standards written in the 1980s and 1990s have not scaled: they are not used by small businesses. Bear with me as I build some substrate.

Exchange standards provide efficiencies for the transactions between parties. If we all adopt a similar handshake then we can have more handshakes at lower cost. Next up world peace.

In thinking about standards you can focus down on the details of the single exchange, but I find it more fascinating to shift up and think about the populations involved with the standards. For example, consider selling a house. Down in the details you can pick apart the process steps taken by a buyer, a seller, and a middleman as they do the transaction. Stepping back to the real estate market level, consider the three populations: home buyers, home sellers, and real estate agents.

Statistics gives us tools to talk about these populations. Even simple statistics can illuminate some interesting things. For example, we know that small populations are easier to organize and coordinate. In our example the smallest population is the real estate agents. When society comes to negotiate the rules (aka standards) for home sales, the agents are much more likely to get their desires fulfilled because they have an easier time getting their act together. The smallest of the three populations in any standard-setting scenario has an advantage. That is a political reality.

Another simple statistic: some members of the population do more exchanges than others. The surprising fact is how skewed that usually is. Some members do a lot more exchanges than others. Again and again, when you look at the populations around these exchange standards, you find a power-law distribution.

A cartoon approximation of a power law curve splits the population into two groups: the elites and the masses. In the blogging world they call the elite bloggers the A-list. (Sidebar about the risk of cartoons. This crude approximation is blind to the middle class. Blindness can do harm and so can approximations. That said, we return to the fun of this cartoon.)
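To put rough numbers on the cartoon, here is a sketch that measures how much of the traffic the busiest slice of a population carries. It assumes, purely for illustration, a population of 1000 in which the member at rank r does 1/r as many exchanges as the busiest one; real populations will have different exponents and cut-offs. `top_share` is a hypothetical helper.

```python
def top_share(counts, fraction):
    """Share of all exchanges made by the busiest `fraction` of the population."""
    ranked = sorted(counts, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical population of 1000 members where the member at rank r makes
# 1/r as many exchanges as the busiest one (a Zipf-like assumption).
population = [1 / rank for rank in range(1, 1001)]
for fraction in (0.01, 0.10, 0.20):
    print(f"top {fraction:.0%}: {top_share(population, fraction):.0%} of exchanges")
# top 1%: 39% of exchanges
# top 10%: 69% of exchanges
# top 20%: 79% of exchanges
```

Under this assumption ten members, the elite, carry well over a third of all exchanges, which is why it is so much cheaper for them to show up at the standards table than for the masses.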

The small size of the elite population has political consequences exactly like those of the small population of realtors. The elite of large transactors have power. For example, returning to the rules around real estate sales, we can look at the population of house sellers for a high-volume player, i.e. real estate developers. Not surprisingly, they show up at the table to help set the rules.

OK, so back to the seemingly new thing.

What I’m noticing today is that there seems to be something very interesting to say about the correlation between two population statistics: how many exchanges a member does, and whom those exchanges are with.

What happens if the elites tend to exchange mostly with each other? That’s what happened with the B2B standards written in the 1980s and 1990s. Large businesses recognized the benefits of getting standards in place to improve their efficiency, so they wrote all these standards to fit their needs. Today lots and lots of commerce takes place intermediated by those standards, but the majority of that commerce doesn’t involve small economic entities.

It’s probably worth ringing all the changes here, but that’s a project for another day. For example you get a particular kind of standard when there is a small concentrated elite on one side exchanging with a huge diffuse lower-class on the other.

If a standard is designed with only one class of players, large transactors say, then it won’t be well suited to the needs of the players who were not there when it was designed. I think that’s a very interesting insight.

Let’s go back to the example of the B2B standards designed in the 1980s and 1990s. Consider this design question: “How hard/expensive should it be to adopt this standard we are designing?” The answer the elites gave was “No more than two expert consultants for 3 months.” The answer the masses would have given: “Oh, $49.95 would be acceptable.” There is a third group at the table: the guys who design and implement the standards, i.e. the vendors. Their answer is always “As much as possible.” So the B2B standards of that era got designed with a high adoption cost, i.e. they are standards with a very high barrier to entry.

This doesn’t always happen. Sometimes a standard comes out of the masses, as SMTP and HTTP did: small players solving a problem that then got widely adopted. Such standards have their own problems; for example, they may not scale well for the traffic patterns that the elite players experience. If you fear the power law’s tendency to concentrate power you might like this kind of standard. If you’re trying to consolidate a market you might prefer the other kind of standard making.

Owning an installed base of words.

At one point in David Weinberger’s delightful after-dinner speech he’s working thru some thoughts on ontologies and tagging by describing how odd the 200s are in the Dewey Decimal system. “The Buddhists, they’re to the right of the decimal point.”

He then asks why this hasn’t been fixed. The short answer is “immovable installed base,” but Dave has much more fun with it by asking his listeners to visualize librarians slowly scraping the white paint off the backs of millions of volumes as they convert to the upgraded version.

Two things came to mind when I read that. First is the way that a fixed-width field of integers creates scarcity: if there are only N digits in the product bar code you must establish a central registry, and that in turn creates a hub, which in turn creates a point of power. The fixed-size fields for IP numbers are another example. Even if you design for unlimited abundance, as the domain name system strove to do, you still get forces that lead to scarcity. It’s nice to have a short domain name. It’s nice to own .com.
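A toy sketch of the first point: a fixed-width field of N decimal digits caps the supply of codes at 10**N (IPv4’s 32-bit address field likewise caps it at 2**32 addresses), and somebody has to hand them out without collisions. The `CodeRegistry` class is hypothetical and far simpler than any real bar code or address registry; it only shows that the width forces a single allocator and, eventually, exhaustion.

```python
class CodeRegistry:
    """Hypothetical central registry issuing codes from a fixed-width field of
    decimal digits: the width alone caps the supply at 10**digits codes."""

    def __init__(self, digits):
        self.digits = digits
        self.capacity = 10 ** digits
        self.owners = {}  # code -> owner, in order of allocation

    def allocate(self, owner):
        if len(self.owners) >= self.capacity:
            raise RuntimeError(f"all {self.capacity} codes are taken")
        code = str(len(self.owners)).zfill(self.digits)  # next unused code
        self.owners[code] = owner
        return code


registry = CodeRegistry(digits=2)
codes = [registry.allocate(f"maker-{i}") for i in range(100)]
print(codes[0], codes[-1])  # 00 99
try:
    registry.allocate("latecomer")
except RuntimeError as err:
    print(err)  # all 100 codes are taken
```

Whoever runs `allocate` decides who gets the remaining codes, and the latecomer has nowhere else to go; that is the hub, and the point of power, the text describes.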

The second thing that came to mind was how the immovable installed base is, on the one hand, the object of desire: what the capitalist seeks to own, since an immovable installed base is but another name for loyal users. On the other hand, it is what the many fear. The careful designer of an ontology lives in fear that his legacy will not be the next Dewey Decimal system but rather that he shunted the 360 million Buddhists to the right of the decimal point. This gives him pause; it slows him down.

The puzzles, not to be lightly tossed aside, are these. Does the internet’s culture of abundance lower this risk enough that our designer can set this particular fear aside? How do we preclude the key ontologies from being privately owned in the way that Westlaw owns the pointers into all case law?

Rendezvous: lower life forms

My father liked to order things off the menu he had never eaten and then cheerfully attempt to get his children to try them. This is a cheerful kind of cruelty that I have inherited. We often told the story in later years of the time we ordered sea urchins in a dingy Chinese restaurant in New York. The consensus was that the sea urchins weren’t actually dead when they got to the table. They would go in your mouth, and when they discovered you were attempting to chew on them they would quickly flee to the other side of your mouth. Since they were very low life forms, once you had succeeded in biting one in two your reward was two panicked sea urchins in your mouth.

I’m reminded of this story by the fun people are having with neologisms these days, for example blog, or folksonomy. A neologism is rarely a very highly evolved creature, which makes it hard to pin down. But therein lies the fun. You can have entire conferences about a single word because collectively nobody really knows what the word means. These littoral zones are full of odd creatures. The tide of Moore’s law and his friends keeps rising. The cheerfully cruel keep finding things to order off the menu.

But before I was taken prisoner by that nostalgic reminding, what I wanted to talk about was this definition of ontology that Clay posted this morning.

The definition of ontology I’m referring to is derived more from AI than philosophy: a formal, explicit specification of a shared conceptualization. (Other glosses of this AI-flavored view can be found using Google for define:ontology.) It is that view that I am objecting to.

Now I don’t want to get drawn into the fun that Clay’s having: bear-baiting the information sciences.

What I do want to do is point out that the function of these “explicit specifications of a shared conceptualization” is not just to provide a solid outcome to the fun-for-all neologism game.

The purpose of these labors is to create shared conceptualizations, explicit specifications, that enable more casual acts of exchange between parties. The labor to create an ontology isn’t navel gazing. It isn’t about ordering books on the library shelves. It isn’t about stuffing your book’s worth of knowledge, a tasty chicken salad, between two dry crusts: the table of contents and the index.

It’s about enabling a commerce of transactions that take place upon that ontology. Thus a system of weights and measures is an ontology over the problem of measurement, and it enables exchanges to take place without having to negotiate from scratch, probably with the help of a lawyer, the meaning of a cord each time you order firewood. And weights and measures are only the tip of the iceberg of ontologies that provide the foundation for efficient commerce.

But it’s not just commerce: ontologies provide the vocabulary that enables one to describe the weather all over the planet and in turn predict tomorrow’s snowstorm. They provide the opportunity to notice the planet is getting warmer and to decide that, holy crap, it’s true!

To rail against ontology is to rail against both the scientific method and modern capitalism. That’s not a little sand castle on the beach soon to be rolled over by the rising tide. Unlike, say, journalism vs. blogs, it’s not an institution whose distribution channel is being disintermediated by Mr. Moore and his friends. Those two are very big sand castles. They will be, and are being, reshaped by these processes, but they will still stand when it’s over.


What really caught my attention in Clay’s quote was “shared conceptualizations.” Why? Because sharing is, to me, an arc in a graph; and that means network, which means network effect, which means we can start to talk about Reed’s law, power laws, etc. It implies that each ontology forms a group in the network of actors. To worry about how big, durable, or well founded these groups are is to miss the point of what’s happening. It’s the quantity, once again, that counts.

What we are seeing around what is currently labeled as folksonomy is a bloom of tiny groups that have rendezvoused around some primitive ontology. For example, consider this group at flickr: a small group of people stitching together a quilt, each one creating a square. They are going to auction it on eBay to raise funds for tsunami victims. For a while they have taken ownership of the word quilt.

What’s not to like? The kinds of ontology emerging in examples like that are smaller than your classic ontologies. These are not likely to predict global warming, but they are certainly heart-warming.

This is the long tail story from another angle: the huge numbers, the excruciating, mind-boggling diversity, billions and billions of tiny effects that sum up to something huge.