Category Archives: General

Multihome

Yeah, here’s an idea.  Why can’t I set my Firefox browser to distribute my search randomly across N selected search engines?  That would let me continually test which search engine is coughing up the best results with the least offensive ads.  It would erode the monopoly power of the search engines and increase competition.  Well, why not?
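
Just to make the idea concrete, here’s a minimal sketch of the random-dispatch part — not a real Firefox extension; the engine list and URL templates are illustrative placeholders, and any set of engines you cared to test would do.

    import random
    import urllib.parse

    # Illustrative placeholders: swap in whatever engines you want to compare.
    ENGINES = [
        "https://www.google.com/search?q={q}",
        "https://search.yahoo.com/search?p={q}",
        "https://www.ask.com/web?q={q}",
    ]

    def multihomed_search_url(query: str) -> str:
        """Send each search to one of N engines, chosen at random."""
        template = random.choice(ENGINES)
        return template.format(q=urllib.parse.quote_plus(query))

    # Over many searches the traffic spreads roughly evenly, which is the
    # point: you get a running comparison of result quality across engines.
    print(multihomed_search_url("two-sided networks"))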

Two-sided networks tend to become highly condensed; in the extreme case this is called “winner take all.”  One driver toward condensation is the cost for players on either side of maintaining relationships with two competing networks.  You could own multiple computers and run multiple operating systems, but by and large you don’t.  You could use multiple search engines/portals, but by and large you don’t.  The cost of coordinating two homes tends to make it unusual.

The motivations for maintaining two or more homes seem to come down to competition, but in reducing them to that we lose something.  It helps to back out what the benefits of competition are.  I won’t go into that, but terms like choice, reduced provider pricing power, a modicum of self-regulation, and innovation all come to mind.

Motivations for multihoming are, presumably, a vein that can be mined to find countervailing forces to the natural tendency of two-sided networks to condense.  You can then draw out those players who most strongly feel the benefit and create advocates for a less condensed market structure.

Intermediaries, middlemen again, seem like one way that can happen.  The irony here is that two-sided networks are, at their heart, about reducing the cost of searching out and coordinating with the population on the other side of the network.  That is, they reduce the cost of multihoming and enable interoperability.  This irony is one of the reasons that powerful hub owners tend to swallow the intermediaries adjacent to them: the adjacent intermediaries are a source of competitive threat, not because they compete but because they can enable other competitors.

When Google pays the Firefox folks for the premier slot in the UI they are, of course, working around the problem.

Rich Relationship

I’ve been musing about Google’s acquisition of Double Click.

I have a friend who had a problem. When you put his name into Google the results consisted entirely of articles about a contentious event he was peripherally associated with. He spent three years engineering other materials on the web, including soliciting links from me and other friends, to drive that drivel off the first page of “his” results. Here’s a guy with an impressive resume of fine work, so the word slander comes to mind.  We are all coming to fear Google, because it can so casually destroy.

I’ve mentioned before that Double Click is very likely the largest identity provider on the net. They manage that trick by avoiding the hard problem: moving the installed base. They don’t try to change the installed base of browsers. They don’t try to get the installed base of users to sign up. They adopt the gossip model for identity: they build statistical models based on gossip. They sell the gossip models to firms. Part of their payment is gossip about the users.

I hadn’t noticed before how similar this is to Google’s scheme for modeling the value of sites. In the original Google they built statistical models of sites based on the link graph. (Obviously the data that web bugs collect can improve those models.)
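
(For the curious: the link-graph model at the heart of the original Google is PageRank. Here’s a tiny power-iteration sketch on a toy graph, just to make “statistical models of sites based on the link graph” concrete; the graph, damping factor, and iteration count are all illustrative.)

    # Minimal PageRank power iteration on a toy link graph.
    # The graph, damping factor, and iteration count are illustrative only.
    links = {
        "a": ["b", "c"],
        "b": ["c"],
        "c": ["a"],
    }

    def pagerank(links, damping=0.85, iters=50):
        nodes = list(links)
        rank = {n: 1.0 / len(nodes) for n in nodes}
        for _ in range(iters):
            new = {n: (1 - damping) / len(nodes) for n in nodes}
            for src, outs in links.items():
                if not outs:
                    continue  # dangling nodes just leak rank in this sketch
                share = damping * rank[src] / len(outs)
                for dst in outs:
                    new[dst] += share
            rank = new
        return rank

    print(pagerank(links))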

When I first began thinking about Double Click as an identity provider I was more focused on how evil they appear because they have no relationship with the users they are modeling. That, not surprisingly, makes the users suspicious. “Who are these people talking about me without my involvement!” That Google does the same thing for web sites, and that we treat that as less offensive, says something deep, both about how alienated we are from our sites and about the entire industry around self-presentation (PR, search engine optimization, etc.).

The key reason Double Click offends us is the fear that their model will bite us. While they may be malicious, they are almost certain to be cavalier. The same concern arises regarding Google and the models it builds of our sites. It can casually destroy them.

Nobody should continue to pretend that these gossip models can be avoided; a handful of firms will have extensive ones. I wondered some time ago: if you were king of Double Click, could you fix this problem? At the time it seemed to me that part of fixing it would be to begin to build a relationship with the users. I guess it will be interesting, as in “may you live in interesting times,” to see how Google tackles that problem. Hopefully they can do a better job than the credit reporting firms have. The puzzle is how to do that with the tools at hand: vast numbers of people and computers.

Want to get cited? Write an old paper.

[Figure: citations_vs_year.png, average citations per paper by year of publication]

This chart shows the average number of citations that papers published in a given year have garnered since their publication. Again, this is the CiteSeer data.
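
Roughly how such a chart falls out of a per-paper table, assuming one row per paper with its publication year and citation count; the file name and column names below are my own placeholders, not CiteSeer’s actual schema.

    import csv
    from collections import defaultdict

    # Placeholder file and column names; substitute the real CiteSeer fields.
    totals = defaultdict(lambda: [0, 0])  # year -> [citation_sum, paper_count]

    with open("citeseer_papers.csv") as f:
        for row in csv.DictReader(f):
            year = int(row["year"])
            totals[year][0] += int(row["citations"])
            totals[year][1] += 1

    for year in sorted(totals):
        cites, count = totals[year]
        print(year, cites / count)  # average citations per paper, by pub year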

Let me offer three very tentative explanations; I’m sure people can come up with more.  Presumably it’s easier to write a fundamental, useful, provocative paper when a field is young. Preferential attachment effects help older papers bank more citations. And, noting how few papers from 1985 are in the collection, I suspect the collection’s manager is seeking out papers from that era and starting with the ones that are cited.

Citeseer Year of Publication

[Figure: citeseer_pubyear1.png, CiteSeer publications by year]
I guess I don’t have any idea what universe Citeseer’s collection actually covers, or attempts to cover. I think it’s papers in Computer Science. I think it attempts to be the entire universe of such papers.

You can download a dump of their catalog in XML. The copy I got has 717,172 publication IDs in it. Many of those have a publication date recorded, and most of those are plotted above.
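
For what it’s worth, here’s a sketch of tallying publication years from a dump like that. I’m guessing at the element names (“record”, “year”), so treat them as placeholders for whatever the real schema uses.

    import xml.etree.ElementTree as ET
    from collections import Counter

    # Count publication years in the catalog dump.
    # "record" and "year" are guesses at the schema, not the real tag names.
    years = Counter()
    for _, elem in ET.iterparse("citeseer_dump.xml"):
        if elem.tag == "record":
            year = elem.findtext("year")
            if year:
                years[int(year)] += 1
            elem.clear()  # keep memory bounded for ~700k records

    for year in sorted(years):
        print(year, years[year])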

Looks like the exponential growth in these publications came to an end in the 1990s.

Theories, anybody? Extra points for amusing theories. For example: they all decided to post at Wikipedia and on their blogs instead. Or the opportunities in the web space were highly distracting but made for lousy papers. Maybe Microsoft hired everybody and drew a curtain around their work.

The updraft of preferential attachment goes missing

This posting is a follow-up to the posting about the shape of the author citation curve in computer science. That showed, unsurprisingly, a power-law distribution, and an interesting tempering of the inequality among the top few authors. This chart shows only the top 100 papers; their rank runs left to right and their year of publication runs vertically. I made it expecting to see signs of preferential attachment, but you don’t.

Year of Publication vs. Rank, Top 100 Papers
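
Roughly how the chart’s data would be assembled: sort the papers by citation count, keep the top 100, and pair each rank with its publication year. This reuses the placeholder CSV layout from the sketch above, so the column names are assumptions.

    import csv

    # Rank the 100 most-cited papers and list publication year by rank.
    with open("citeseer_papers.csv") as f:
        papers = [(int(r["citations"]), int(r["year"])) for r in csv.DictReader(f)]

    top100 = sorted(papers, reverse=True)[:100]
    for rank, (cites, year) in enumerate(top100, start=1):
        print(rank, year)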

I think this is yet another example of how the elites are often a different species; but I’ll need to see if I can get a wider window than the top 100 to be sure. I don’t see why professional communities like these would be immune to preferential attachment.

Another Criminal Business Model

My favorite criminal business models involve large firms that subcontract or franchise out work which would get them into trouble. They usually hand the work over to small actors who lack the assets to be effectively punished for the bad acts. But here’s an example of a criminal business model where the firm doesn’t even bother to intermediate the bad acts.

In San Francisco … “Last year, United Parcel Service paid $673,334 in fines for 11,788 tickets — an average of one ticket every 45 minutes throughout the year.

The traffic rules are a means society uses to keep things civil and moving smoothly. UPS has figured out that it can make its internal operations run more smoothly by making the rest of society run less smoothly. If you made the parking fines progressive, so that legal entities like UPS that abuse the system are fined escalating amounts, then UPS would just restructure its urban delivery operations so that the operators were a flurry of smaller operators. That’s what Herbalife does so it can strap butt-ugly signs up all over town.

So next time you spend a few minutes stuck in traffic behind a double-parked UPS truck, you can think about this puzzle: how is civil order maintained when commerce trains its people to make these trade-offs to its advantage?

“McMillan Electric Co. contributed $74,375 toward the total. The family-owned San Francisco firm, which does most of its business downtown, received 1,497 tickets over the year. “It’s a business decision,” company president Pat McMillan said. “Is it cheaper to pay the ticket, or is it cheaper to pay the guys working for me to spend time looking for a legal parking space?”

McMillan pays his workers about $80 an hour and said risking a parking ticket often wins out. ‘I don’t like it, but we’ve got a job to do, and we have to get our guys in there to work.'”
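
A quick back-of-the-envelope using only the numbers in the quote: the average ticket cost McMillan about $50, so against an $80-an-hour worker the ticket wins whenever hunting for legal parking would eat more than about 37 minutes.

    # Back-of-the-envelope, using only the figures quoted above.
    fines = 74_375          # dollars in tickets over the year
    tickets = 1_497
    wage_per_hour = 80      # what McMillan says he pays his workers

    cost_per_ticket = fines / tickets                         # ~ $49.68
    breakeven_minutes = cost_per_ticket / wage_per_hour * 60  # ~ 37 minutes

    print(f"${cost_per_ticket:.2f} per ticket, "
          f"break-even at {breakeven_minutes:.0f} minutes of parking hunt")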

Who’s the jerk: the driver, the firm, or the boss?

(Nod to Faisal)

Technorati’s WTF vs Everything2

I’m struck by how much Technorati’s new WTF seems analogous to Everything2. Both want to be about providing useful definitions for things. They aren’t quite encyclopedias or dictionaries (à la Wikipedia) because their domain is more dynamic, more situated in the cultural noise. I’ve mentioned in passing before my cartoon version of the Everything2 story, e.g. by giving bonus points for “cool” postings they ended up with a site full of entertaining rather than authoritative stuff.  So far the Technorati system has a very simple voting system; too simple if you ask me.  Technorati’s WTF is, of course, in that fun period of first contact with users [1].  But then there is a quite substantive difference: Everything2 was a giant experiment in reputation and governance.  I doubt that’s high on Technorati’s agenda.

Preparing to Condense

Clay’s essay urges a strategy of preparing for emergence rather than designing to control risk.  It is perfectly natural for designers to presume that the right route to a mature large system is to enumerate the features of large mature systems and then to start designing, building, and testing them. Clay argues that this behavior is a mistake, and I agree.

The 2D B-school plane upon which Clay’s essay rests is plasticity vs. specificity. As systems mature they become less plastic and more specific. Designers can decide to consciously chart a course across this plane.

The nicest turn of phrase in the essay is that “Software systems, however perfect, rarely survive first contact with users unscathed…”. There is a phase change as systems mature, a chilling effect. The designer’s problem is to husband the system’s plasticity, its warmth. Spending it during design is a mistake.

The trick is to chart a course during design that increases both specificity and plasticity, so that when you meet the users and the cooling begins you can condense structures that are best suited to your joint needs. You want these structures to be ones the users will perceive as legitimate, valuable, and comprehensible, because they were negotiated jointly. The challenge during design is to prepare to build them, but not to build them.

Most attractive Computer Scientists

Another example for my collection of power-law distributions. The data is taken from CiteSeer. It shows the most attractive ten thousand computer scientists, where attractive means that their papers attracted bibliographic citations. This is normalized by publication year, whatever that means. I feel guilty about posting this one; this kind of competition is extremely toxic to collaboration.

Power law of the most cited computer scientists.

The flat bits on the curve appear to be coauthor pairs. The gentle bow of the curve suggests that something regulatory starts to kick in toward the top. If I had more data I presume the curve would become more stern.

Somebody must have computed the slope of this curve for different academic domains. The shape over time would be even more interesting. Assuming one has a choice about one’s calling, wouldn’t that be valuable input to the decision?
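
If you wanted that slope yourself, a crude but serviceable estimate is an ordinary least-squares fit of log(citations) against log(rank); the sketch below assumes you already have per-author citation totals sorted from most to least cited. (Maximum-likelihood estimators are better behaved, but this matches the eyeball-the-log-log-plot habit.)

    import math

    def powerlaw_slope(citation_counts):
        """Least-squares slope of log(citations) vs. log(rank).

        Assumes counts are sorted in descending order, so any zero-citation
        entries fall at the tail and are dropped cleanly.
        """
        xs = [math.log(rank) for rank in range(1, len(citation_counts) + 1)]
        ys = [math.log(c) for c in citation_counts if c > 0]
        xs = xs[: len(ys)]
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        var = sum((x - mx) ** 2 for x in xs)
        return cov / var  # typically negative; steeper means more unequal

    # e.g. powerlaw_slope(sorted(per_author_citations, reverse=True))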

Slander local business, earn a dollar

Last fall InsiderPages.com put a price, one dollar, on what they were willing to pay for local business reviews.  I stumbled on this yesterday morning when Google injected some very critical reviews of my auto mechanic onto a map page.  I was checking whether I’d found the right phone number.

These reviews were bad enough to trigger my calling two other mechanics.  They couldn’t take my car until next week.

Finally I went back to look at the reviews. They were weird.  They complained about the prices my guy charges for food and referred to some newspaper article about his bad behavior. They were totally bogus!  He doesn’t sell food, and my area’s papers are far too business-friendly to ever write such an article.

The reviews Google subscribed to came from Insider Pages.  Looking up auto mechanics I could see that they, amazingly, had one and only one review for every mechanic in town.  That’s not how community-generated content looks.

When I signed up to write a counter-review I was prompted for a promotion code, and that led me to web pages showing their promotion last fall.

This kind of slander could destroy a small business, and it should be criminal. Insider Pages knowingly paid people to create fraudulent reviews, and given the statistics of the reviews I’d say they made no effort to manage the problem. Google’s authority only magnifies the crime. I presume it would be a piece of cake for some lawyer to find a thousand businesses slandered the way my auto mechanic has been and bring suit against the various parties in this example. Something more severe than letting the market punish them is appropriate.

This is the dark side of social networking (talent-scraping) sites. Get the talent to commit the crimes you need committed. It’s the social networking site owner as troll. In the worst case the business is looking for a way to force, i.e. blackmail, local businesses into participating at the site.  Angie’s List has been accused of that.