Monthly Archives: June 2005

Trust in Exchange Swarms (and Streaming)

Exchange networks are full of trust and reliability issues. How do you know that the intermediaries aren’t modifying the content? How do you know that swarm members in a p2p distribution scheme aren’t freeloading?

You can tackle the freeloading problem with a reputation scheme. The participants in the swarm can report on how helpful their peers are. The system can create statistics on those reports. The statistic for a given peer become his reputation. In networks where peers have durable identities and participate over long periods this is probably sufficient to reduce freeloading to reasonable levels. Though you would still need generous participants to enable new participants to enter the swarms.

That participants can collaborate to keep an eye on each other. These models are statistical models of the behavior reports. Successful collaborative efforts demand a lot forgiveness and generosity, so for many measures it’s not a problem that the models are only estimates. What you sum up into the participants reputations will define what the participants aspire to. For example if you model how long a participant stays in the swarm and you can create incentives to stay longer.

The standard solution to middleman problems is to insist on validation end-to-end. That’s one reason the .torrent files in BitTorrent have checksums in them. An intermediary can’t modify the distributed file without the end user noticing.

I’m interested in how to use swarming approaches on streaming data; for example video or audio broadcast, blog pings, weather or other sensor networks. Harder to get an end to end solution for these streams. The broadcaster could sign every packet, I guess, but that would put significantly add to the costs.

Another possible approach is to assure that the checksums and the packets they validate travel by different and hard to predict routes. Participants can report on their peers adherence to a rule that checksum’s should never travel with their associated packets.

BitTorrent’s module dependency

I’ve been reading the BitTorrent code. Which lead to discovering this nice tool for drawing python module dependency graphs.

The drawing shows the modules of a swarm member in the current beta version (4.1.2) of BitTorrent. In the middle, the brown tangle, is the new DHT (or distributed hash table). Using that a swarm implement their own tracker. The two purple bits on the lower right implement the http server for the tracker. The purple bits on the left include the BitTorrent encoder, or bencode, as well as configuration and argument parsing. The stuff at the top implements the swarm’s collaborative file exchange; e.g. is orchestrating the exchange of pieces over connections with peers at reasonably rates. Here is a pdf version of that drawing.

If you want to use the module drawing tool you really need to subclass it’s main class and fine tune what’s shown. That’s not hard but you probably have to climb the foot hills of both the graphviz and python learning curves. If you enjoy playing with graphviz, as I do, then you can edit’s it’s output before handing it off to the graph layout program. The Macintosh application that wraps up graphviz in a pretty Mac GUI is sweet. (Or, LOL, you could paypal me $998.35, email me your code and I’ll make one for you! Fourth of July sale, mention this ad before the 10th of July and get 10% off!)

powerlaw in data structure pointers

The second data set I encountered with power-law behavior was while attempting to improve the performance of a huge software system, circa 1979. But I haven’t managed to collect an example of that kind for my collection so far.

Meanwhile this example shows power-law distributions in the inter-object references for in core data structures for a range of programs.

Back in the day when I’d read articles about GC algorithum design I never saw these graphs. Odd, don’t you think, designing a GC algorithm with out a model of this distribution?

via Rainer Wasserfuhr via his del.icio.us tagging.

Survey

Take the MIT Weblog Survey

I just wanted to post that.

The survey site doesn’t work in Safari, but it’s fine in Firefox. It also seems to think that I “posted” a lot of the links that WordPress bundled with my blog.

P2P Reputation and Social Stratification

The nice thing about P2P content distribution systems is how they lower the barriers to entry for content producers. When things are working then the cost of distributing content to N consumers drops for the producer from N to 1, and the cost for the consumers rises from 1 to 2 (see here). In theory this enables content production to move much further down the long tail. It empowers the smallest players.

The design of these systems is a beautiful example of the design issues around a collaborative system. The consumers need to collaborate. If the total contributions of the consumers don’t amount to 2N then things fall apart. I’m finding it interesting to kick the tires on this problem. You can design systems to temper the freeloading by having the consumers accumulate a reputation. You can base the reputation on reports provided by other consumers. So if A provides content to B, then B can add to A’s reputation as a good actor in the system. If peer to peer exchanges happen in largely random patterns then A’s reputation will be assembled from a diffuse set of partners; making it harder to forge.

I assume it’s possible to design such a scheme. One that would allow peers in the system to know the contribution level of their partners with a reasonable degree of confidence. I haven’t looked very hard. I assume there are some papers on designing such diffuse reputation systems.

Ok, so I take it as a given that I can design a system where the participation demands a uniformity of contribution. But wait, I don’t want that! Look at the real world. Systems in the real world have multiple actors contributing to their total energy; and the distribution of their contributions is usually highly skewed. If the real world is that way, then is it a good idea to design P2P systems that effectively outlaw that distribution? What consequences would follow from that?

One thing’s clear. If you enforced uniformity you’d get class stratification. Participants of similar reputations would tend to flock together.

I’m reading a book about Common Interest Developments, i.e. the semi-walled garden highly homogenous communities created by developers. Their enthusiasts buy into a utopian fantasy. One who’s “overvalued idea” is the maintenance of property values. Other values tend to be displaced.

If we force uniformity of contribution into the architecture of P2P system, then that becomes it’s overvalued idea. Getting all fixated on the prevention of freeloading displaces other values? Why does the real world not work that way.

The delicate trick in getting the P2P reputation system may well be finding ways to encourage a diversity of participants. A means that it doesn’t lead to stratification, with that stratification and progressively tightening regulation of each group’s wall and internal norms. Very interesting tangle of problems.

Freeloading on the healthcare system

Massachusetts, where I live, has a Republican governor who has been attempting to make progress on the healthcare debacle. The other day an article appeared in the paper with the latest chapter in this ongoing attempt. The word “freeloading” appeared in the article; to suggest that the current refinement’s goal is to remove options for freeloading on the system.

Freeloading is a problem in all collective action systems. It’s one of the reasons that classical minded economists run screaming from the analysis of public goods – you know, like public schools, public roads, public health, environmental regulation, etc. etc. More mature economists get a gleam in their eye and roll up their sleeves, of course.

One way to pull the rug out from under any collective activity is to highlight how some people are, or might, be getting
a better deal than others. For example you announce that families with more children are freeloading on the public schools, or that my acquaintance who got encephalitis is freeloading on the health care system. Of course that was probably caused by cut backs on the public health expense of keeping the mosquito population under control.

My first thought when people mention freeloading as a problem around healthcare is Wal*Mart. World’s largest corporation which manages to avoid much of the responsibility for healthcare that comes with being an employer. Employer responsibility for health care was one of the bargains struck by American society during the 20th century; it arose out of union negotiations. Over the last 30 years corporations have discovered that as unions weaken and their political power has increased they can opt-out of that bargain. The society has yet to reach a new consensus about how to bear this responsibility.

Of course there are lots of other candidates for the the role of healthcare freeloader; or to use more exaggerated speech – parasites. A lot of money has been spent to cast malpractice lawyers into this role, though the data would seem to suggest it’s the malpractice insurance providers who benefit from that subplot.

My impression is that since we have a total breakdown in the consensus about where the responsibility for healthcare resides we have fallen into a messy trap where all the players spend vast amounts of resources attempting to shift responsibility to somebody else. The business community created that mess. All now compete to see if they can shift costs someplace else.

The governor’s plan is to force individuals to buy health insurance, much as we force everybody to buy car insurance in this state. So today’s casting for the role of freeloader is individuals. That’s a very popular move on the right, shift risk onto the weakest player in the social game.

But looking more closely it’s interesting to note that the governor is, at the same time, trying to force employers to provide health insurance. To make them pick up the responsibility they dropped over the last few decades. Of course the details of that make all the difference.

One last detail, states are not allowed to work on problems like this without the permission of the Bush administration. To institute any change in the regulatory framework the state has to be granted a waver. Only one state has managed to get a waver that allowed them force firms to bear some of the responsibility, i.e. Hawaii.

Shifting Distribution and Coordination Costs

Here’s a slightly formal way to look at various ways of coordinating an activity. This grew out of my thinking about how to push content from producers to consumers without introducing a hub that coordinates the work. I was thinking about who bears the bandwidth costs.

One obvious way to solve the problem is to have the producer ship the content directly to all N consumers. He pays for N units of outbound bandwidth and each of his consumers pays for 1 unit of inbound bandwidth. The total cost to get the message out is then 2*N. Of course I’m assuming inbound and outbound bandwidth costs are identical. If we assume that point to point message passing is all we’ve got, i.e. no broadcast, then 2*N is the minimal overall cost to get the content distributed.

Two issues. All else being equal the producer would like to shift costs onto the consumers. Secondly – the hard problem here is not moving the bytes around; the hard problem is coordinating their movement. In the our first model most of the coordination cost is born by the producer. That has the benefit that coordination expertise will accumulate so that the cost of coordination can fall and the quality can rise. The producer retains substantial control over the relationships.

It’s not hard to imagine solutions where the consumers do more of the coordination, the cost is split more equitably, the producer’s cost plummet, and the whole system is substantially more fragile. For example we can just line the consumers up in a row and have them bucket brigade the content. We still have N links, and we still have a total coast of 2*N, but most of the consumers are now paying for 2 units of bandwidth; one to consume the content and one to pass it on. In this scheme the producer lucks out and has to pay for only one unit of band width, as does the last consumer in the chain. This scheme is obviously very fragile. A design like this minimizes the chance of coordination expertise condensing so it will likely remain of poor quality and high cost. Control over the relationships is very diffuse.

We can solve the distribution problem by adding a middleman. The producer hands his content to the middleman (adding one more link) and the middleman hands the content off to the consumers. This market architecture has N+1 links or a total cost of this scheme is 2*(N+1). Since the middleman can server multiple producers the chance for coordination expertise to condense is generally higher in this scenario. Everybody, except the middleman, see their costs drop to 1. Assuming the producer doesn’t mind being intermediated he has incentive to shift to this model. His bandwidth costs drop from N to 1, and he doesn’t have to become an expert on coordinating distribution. The middleman becomes a powerful force in the market. That’s a risk for the producers and the consumers.

It is possible to solve problems like these without a middleman, instead we introduce exchange standards. Replacing the middleman with a standard. Aside: Note that the second illustration, Consumers coordinate, is effectively showing a standards based solution as well. We might use a peer to peer distribution scheme, like Bit Torrent for example. To use Bit Torrent’s terminology the substitute for the middleman is called “the swarm” and the coordination is done by an entity known as the “the tracker.” I didn’t show the tracker in my illustration. When bit torrent works perfectly the producer hands one Nth of his content off to each of the N consumers. They then trade content amongst themselves. The cost is approximately 2 units of bandwidth for each of them. The tracker’s job is only to introduce them to each other. The coordination expertise is condensed into the standard. The system is robust if the average consumer contributes slightly over 2 units of bandwidth to the enterprise, it falls apart if that median falls below 2. A few consumers willing to contribute substantially more than 2N can be a huge help in avoiding market failure. The producer can fill that role.

Of course swarming is not the only way we can arrange a standards based solution to this problem. It’s notable because it is both reliable and the total bandwidth cost can be 2N, the minimum. I find it interesting that when the cost approachs that minimum the swarm becomes unreliable. The second model where consumers coordinate the distribution in a bucket brigade can be made more reliable by introducing additional redundant links; these are another way to buy reliablity in exchange for increasing the cost above the 2N minimum.

I find it fascinating to see how the coordination costs, market power, and reliability of the market’s clearing are shifted around in these various scenarios. The bandwidth costs act as a partial proxy for those. Market participants are most concerned about risk. They want to place their faith in a market structure. Once the rendezvous around a given structure then can have a meaningful discussion about the risks that structure creates. The first model has the risk of powerful producer. The second and last models have the risk of policing standards compliance. The middleman has well known agency risks.

Standards based solutions always have problems with policing and freeloading. I think it’s neat to notice that if the producers and consumers are exchanging data over many time periods they can establish a trading framework with reputation, currency, and market clearing schemes that assure that everybody contributes their 2 units of bandwidth. In effect you can make such systems self policing in much the same manner used in a competitive market. Which goes to reenforce the way that exchange standards create and shape markets.

Reciprocity and Data Licensing

The phrase “reciprocal licensing” got into my head a few months ago. If you go poke around you find that the term is mostly used to describe ways that one organization agrees to honor the licenses granted to professionals by some other organization. Licenses are, of course, set of rights and responsibilities. For example is the state of Massachusetts grants me the right to drive a car on it’s highways most of the rest of the planet will also grant me that right. Lots of professions have associated systems of reciprocal licensing: plumbers, teachers, and lawyers. When you grant respect to a person’s title, for example Vice President of Engineering, outside the context where that title was granted you would seem to be engaged in a form of reciprocity.

Reciprocity isn’t just for roles. The GNU software licenses are a form of reciprocal licensing. In exchange for the right to modify the software you relinquish your option of hoarding those modifications. The peer to peer file sharing, systems are another example, in exchange for drawing down a file from the system you are expected to reciprocate by providing bandwidth for further file sharing. The peer to peer messaging and VOIP systems are two more example. Once you start noticing you’ll see lots of these.

Reciprocity is one way of framing the coordination aspects of collaboration. It’s a way of talking more explicitly about the responsibilities that come with club membership around a club goods. The peer to peer system, the GNU code community, the professional society around plumbers or doctors, even the universe of international drivers all have rules that govern how the goods are maintained in collaboration. You can find other examples around patent pools. Some platform vendors have begun to use these ideas to assure that members of their developer networks add to the platform. For example Microsoft’s consumer electronics platform has rules that clarify the timing of various developer IP rights so innovations can flow back into the common pool around their platform – or so I’ve been told.

Peering agreements are another interesting application of reciprocal licensing used is for. IP packet routing, or phone call routing, or check clearing are all examples. The problem is you want to join a distributed network but your concerned that other members of the network will freeload or cheat in various ways that disadvantage you. You need trust at a distance. By setting up a standard contract that every participant is required to sign you can temper some of the risks.

Data pooling agreements have the same challenges. Consider an example. Imagine you are an insurance company. You have worked hard at great expense to collect data about insurance claims. This data is valuable because in aggregate it drives how you set rates. If you had more data you could set rates even more accurately. That could be very profitable. The insurance industry shares data like this by establishing master database. And, I assume, they use a contract much like a patent pool or a peering agreement to reduce the risk that some companies freeload or engage in any number of other ways of gaining an advantage.

Of course all industries have a power-law distribution of players. One way you can disadvantage the small players is to create barriers to entry that only large players can overcome. One place you can establish these barriers is with the rules around the reciprocity agreements. But, on the other hand, you can use the reciprocity agreement to frustrate that. The GNU license is a fine example of that. There is another example found in the open source VOIP community. They need to exchange call routing data. They have a peering agreement for that which tries to keep large players from locking out tiny guys – like me with my basement phone switch.

For the last year or so I’ve been puzzling about the problem of how to equitably distribute blog pings from blogs to interested parties. The tough nut in the design is that there are some very large gorillas in the blog world and they are all, presumably, tempted to horde their ping data to gain various competitive advantages. And, as usual in these two sided markets where there are lots of providers and lots of consumers there is a two sided network effect that could lead to a single winner who acts as the hub for the exchange. A middleman taking a small cut as he coordinates the interactions. The usual way to temper that risk is to introduce some standards that fragment the market while solving the coordination problem enough. With luck you can even get the standard to do a better job than a single hub would.

All that has brought me around to the question of what the reciprocity rules would be in such a market. What the world needs is creative commons like license for pings. I want to know that when I ping weblogs.com, or ping-o-matic, or technorati, or pubsub.com that they aren’t going to horde the resulting data. I want to know that they have signed onto a peering agreement that assures that my ping joins rest of the ping flux as public data. Some rights and responsibilities are granted when you ping these guys. Their responsibilities need to be made more explicit.

I don’t mind if they want to horde information that they accumulated by spidering, polling, and scraping; though I suspect we will see similar creative commons like licenses appearing on sites. Licenses that make explicit the reciprocal nature of the relationships.