Monthly Archives: February 2006

Problem Cases

Malcolm Gladwell’s last two essays in the New Yorker (Troublemakers and Million-Dollar Murray) are both about power-law distributions; in fact I think you can make the argument that most of his writing is about power-law distributions. It cheers me to see such a high-profile discussion of the power-law distribution.

These last two essays are about problem cases; i.e. that in a large population of actors a small handful will own the responsibility for the majority of the externalities the population creates. His examples include problem dogs, problem homeless people, polluting cars, and violent policemen. You’ll notice that the externalities he focuses on are negative externalities.

The more recent essay, which isn’t online yet, makes a point that often goes underappreciated. One’s natural intuitions about how to handle the typical case need to be completely turned upside down when you’re dealing with the elite. If the societal cost of a high-maintenance homeless person is a million dollars a year then you ought to be willing to drop a few hundred thousand to keep that cost in control. That’s a very hard pill to swallow when the current social services ethic is to dump a single mother of two off the welfare rolls – presumably because you’re concerned about creating “dependency.” In the high-cost homeless case you want to create dependency.

Dealing with the elite players in a population is just plain a different problem than dealing with the typical ones, even if it is hopeless hairsplitting to find the edge that distinguishes the elite from the middle-class or typical members of the population.

I wish he’d found a way to draw more examples to two classes. His emphasis is overweighted toward bad actors, which tends to encourage readers to forget that the most overarching social power-law distribution is that of wealth and property. Similarly his set of examples, with the exception of the police, are all drawn from the poor – people with lousy cars, people without homes, isolated people with unsocialized dogs. This compounds the sin of diverting attention from the powerful elite by licensing the use of exceptional means only on the least powerful members of society.

But there are bad actors in all populations. Emphasis on bad actors drawn from weak populations makes us forget that this same problem always arises. It arises with giant corporations, the ultra-rich, the politically well connected. Many of those actors aren’t bad; but their scale makes the harm they do far greater than that of a million-dollar-a-year homeless person. Just as the elite in any population tend to, they require regulatory schemes that are entirely different than those used on typical actors.

Record keeping for clubs with anonymous members

Previously I wrote up a sketch for how blog readers might guard their privacy by forming reading clubs. The club would reveal the union of what everybody is reading, but inside the club it wouldn’t be possible to discern what each member was reading. Last time I stated that engineering this wouldn’t be particularly difficult, but didn’t reveal my sketch for how it might be done.

My idea is/was that the club members would form into a circle. A stream of encrypted traffic would arrive from the left and be passed on to the right. This traffic would be chat about club business; i.e. what blogs the club is interested in along with notices of the state of those blogs. For example an entry might state “The club is interested in blog vile.example.com as of Jan 12 2006.” or “The blog at embaress.example.com was checked at 11:14am Jan 21; it last changed at 2:37 Dec 17th.” If you record the assertions in the stream for a sufficient period of time you can form a complete model of the blog reading club’s interests and the state of the blogs it reads.
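Here is a minimal sketch of how a listener might fold the passing assertions into such a model. The tuple format, the `ClubModel` name, and the year on the "Dec 17th" timestamp are all my assumptions for illustration; the encryption and the circle itself are left out entirely.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ClubModel:
    # blog -> most recent date the club asserted interest in it
    interests: dict = field(default_factory=dict)
    # blog -> (time of most recent check, time the blog last changed)
    blog_state: dict = field(default_factory=dict)

    def record(self, assertion):
        """Fold one assertion from the stream into the model."""
        kind, blog, *times = assertion
        if kind == "interest":
            (as_of,) = times
            previous = self.interests.get(blog)
            if previous is None or as_of > previous:
                self.interests[blog] = as_of
        elif kind == "checked":
            checked_at, changed_at = times
            previous = self.blog_state.get(blog)
            # Keep only the freshest report about each blog.
            if previous is None or checked_at > previous[0]:
                self.blog_state[blog] = (checked_at, changed_at)


# Hypothetical stream; assertions may arrive out of order.
stream = [
    ("interest", "vile.example.com", datetime(2006, 1, 12)),
    ("checked", "embaress.example.com",
     datetime(2006, 1, 21, 11, 14), datetime(2005, 12, 17, 2, 37)),
    ("interest", "vile.example.com", datetime(2006, 1, 5)),  # older, ignored
]

model = ClubModel()
for assertion in stream:
    model.record(assertion)
```

Note that nothing in an assertion names the member who injected it; the model only ever learns about the club as a whole.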

My presumption was that individual club members would inject their interests into the stream. This turns out to be harder than I thought. If I inject my interest in lame.example.com into the stream my upstream neighbor can only tell that the club he’s a member of has increasingly lame interests, but if he collaborates with my downstream neighbor then he can pin the blame on me for that. Not good. Ben Laurie pointed this out to me.

The full set of assertions collected by listening to the passing stream is a substitute for a centralized club house where the club keeps its records. That club house is a substitute for the ping aggregation service, i.e. the intermediary the club was meant to avoid. The whole point of this exercise is to hide the reading interests of individual club members from the intermediary.

Ben’s somewhat spontaneous suggestion for how to organize this club is to build a club house but run it so the individual members’ interactions are kept anonymous. Systems like TOR or the anonymous email remixing system illustrate how to let the members communicate with the club house anonymously.

The club house could be a simple web site that enumerates all the blogs the club is keeping an eye on. It drops blogs off the list if nobody signals an interest in that blog for a period of time. Club members randomly poll blogs on the list and report what they find back to the club house, including the RSS feed should it change. When a member wants to read his blogs he syncs his copy of the club house data. This synchronization can be done in public if the club member is willing to reveal that he is a member of the club. Of course he should synchronize the full database from the club house, since otherwise he’d reveal his peculiar interests. By establishing an anonymous connection to the club house, à la TOR, the member could avoid pulling down the entire database.
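To make the bookkeeping concrete, here is a rough in-memory sketch of that club house. The class, its method names, and the use of plain numbers for time are assumptions of mine; the anonymous transport (the hard part) is not modeled at all.

```python
import random


class ClubHouse:
    def __init__(self, ttl):
        self.ttl = ttl            # how long a blog survives without interest
        self.last_interest = {}   # blog -> time of latest interest signal
        self.feeds = {}           # blog -> latest feed reported by a member

    def signal_interest(self, blog, now):
        """An (anonymous) member says the club should watch this blog."""
        self.last_interest[blog] = now

    def expire(self, now):
        """Drop blogs nobody has signaled interest in for longer than ttl."""
        stale = [b for b, t in self.last_interest.items() if now - t > self.ttl]
        for blog in stale:
            del self.last_interest[blog]
            self.feeds.pop(blog, None)

    def pick_blog_to_poll(self, rng=random):
        """Members poll a randomly chosen blog, not just their own favorites."""
        blogs = sorted(self.last_interest)
        return rng.choice(blogs) if blogs else None

    def report(self, blog, feed):
        """A member reports what they found; unlisted blogs are ignored."""
        if blog in self.last_interest:
            self.feeds[blog] = feed

    def full_sync(self):
        """Public sync: hand back everything so no interest is revealed."""
        return dict(self.feeds)


house = ClubHouse(ttl=7)
house.signal_interest("vile.example.com", now=0)
house.signal_interest("lame.example.com", now=5)
house.report("vile.example.com", "<rss>old</rss>")
house.report("lame.example.com", "<rss>new</rss>")
house.expire(now=10)  # vile.example.com has gone 10 > 7 without interest
snapshot = house.full_sync()
```

The design choice worth noticing is that `full_sync` returns the whole database; a per-blog fetch would only be safe over an anonymous connection.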

While last time I wrote that engineering a system like this is straightforward, this time I’m less confident of that.

I’m not particularly happy with the introduction of a central club house into the design. Who’s going to volunteer for that thankless task? I’d rather liked the idea that the club members were all asked to carry the same proportion of the load. But now I’m thinking that the streaming around the circle approach is just a scam for relocating the club’s records, and that I’ve gotten myself out on a limb.

Designing distributed anonymous peer to peer databases that enable clubs like this to form looks like a more meaty design problem than I expected. While I’m sure that’s fun for some folks it’s a bit of a barrier to making progress on the problem I care about.

Community drift

This is taken from these notes on a talk Joshua Schachter gave.

As the population gets larger, the bias drifts; del.icio.us/popular becomes
less interesting to the original community members. Work out ways to let the
system fragment in to different areas of attention.

The common cause of a group shifts; in a system like delicious the center is statistical. In other groups it’s more of a social construction, affected by cliques and leaders. Then of course there are the shifts that come from the group’s relationship with its surroundings.

It’s a very nice talk full of things which are true, except when they’re not.

Popularity Hashing

As regulars know, my favorite distribution is the power law.

I think there is a lot of system design that would play out a lot better if people admitted that one or another key statistic about the load on the system will be power-law distributed. It troubles me to read about system designs that implicitly assume that the traffic loads will be uniform. In most cases I suspect the designers haven’t thought this through.

One way that plays out is that I suspect the folks that designed most of the internet just did not understand that graphs of communication links will condense into power-law graphs. So if you want to build a resilient network you’re going to have to take steps to make sure that the hubs don’t become single points of failure. Similarly I don’t think they understood that, for similar reasons, only a few vendors would capture most of the DNS, most of the email, etc. etc.

I got to rant about this to Ben Laurie the other day. One example I was giving was that the distributed hash table designs all appear to assume that the frequency of looking up individual keys is uniform. I think that’s vanishingly unlikely. I presume that a handful of elite keys will get looked up a lot more than others – for example looking up Star Wars is a lot more common than looking up Julie London’s version of Sway. Since I presume the traffic is power-law distributed, the elite keys will account for a disproportionate amount of the traffic. If your node in the peer-to-peer network implementing the distributed hash table happens to get stuck with one of the elite keys your reward is, in effect, a denial of service attack.
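A quick back-of-the-envelope sketch shows how lopsided this gets. I'm assuming a Zipf distribution (lookup frequency proportional to 1/rank) as a stand-in for "power-law", and the key names, key counts, and node counts are made up; real traffic will differ.

```python
import hashlib


def zipf_weights(num_keys, exponent=1.0):
    """Relative lookup frequency of each key, most popular first."""
    raw = [1.0 / rank ** exponent for rank in range(1, num_keys + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def zipf_share(num_keys, top, exponent=1.0):
    """Fraction of all lookups that hit the `top` most popular keys."""
    return sum(zipf_weights(num_keys, exponent)[:top])


def node_loads(num_keys, num_nodes, exponent=1.0):
    """Share of lookup traffic per node when keys are hashed onto nodes."""
    loads = [0.0] * num_nodes
    for rank, weight in enumerate(zipf_weights(num_keys, exponent), start=1):
        digest = hashlib.sha256(f"key-{rank}".encode()).digest()
        node = int.from_bytes(digest[:8], "big") % num_nodes
        loads[node] += weight
    return loads


elite_share = zipf_share(num_keys=1000, top=10)   # top 1% of the keys
loads = node_loads(num_keys=1000, num_nodes=100)
hottest, average = max(loads), sum(loads) / len(loads)
```

Under these assumptions the top 1% of keys draws roughly 39% of the lookups, and whichever node the hash hands the single most popular key carries more than ten times the average load, no matter how evenly the keys themselves are spread.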

The obvious solution is to spray the popular keys across more nodes in the network. Ben had the clever idea that you could look up the popular stuff one way and the regular stuff in the traditional way, in effect running two or more hash tables, one for each tier of popularity. Clients of the distributed hash table would, of course, start by trying the popular table and then fall back on the less popular one. Servers of particular keys would monitor their traffic and shift load onto their neighboring tables as necessary.
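As a toy illustration of the tiered lookup, here is a single-process sketch with two tiers. It is not a distributed hash table: the two dictionaries merely stand in for the popular and regular tables, and the `TieredTable` name and the `promote_after` threshold are my inventions.

```python
class TieredTable:
    def __init__(self, promote_after):
        self.popular = {}   # small tier, tried first
        self.regular = {}   # traditional tier, the fallback
        self.hits = {}      # lookups seen per key in the regular tier
        self.promote_after = promote_after

    def put(self, key, value):
        # Update a key where it already lives; new keys start out regular.
        if key in self.popular:
            self.popular[key] = value
        else:
            self.regular[key] = value

    def get(self, key):
        # Clients try the popular tier first.
        if key in self.popular:
            return self.popular[key]
        value = self.regular[key]  # raises KeyError for unknown keys
        # The "server" monitors traffic and shifts hot keys up a tier.
        self.hits[key] = self.hits.get(key, 0) + 1
        if self.hits[key] >= self.promote_after:
            self.popular[key] = self.regular.pop(key)
            del self.hits[key]
        return value


table = TieredTable(promote_after=3)
table.put("Star Wars", "node-17")
table.put("Julie London - Sway", "node-42")
for _ in range(3):
    table.get("Star Wars")
table.get("Julie London - Sway")
```

After those lookups "Star Wars" lives in the popular tier while "Sway" stays put. A real design would also need demotion when a key cools off, and some way to replicate the popular tier across many nodes.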

Tricks like this would presumably be useful even for some simple single-process, in-memory hash tables. A two-tiered hash table is likely to get the elite entries densely packed into the fast cache memory where they are never paged out.

It pulls my cord that designers continue to ignore the prevalence of power-law distributions in the populations they are designing for. For example all the economics textbooks show the price/volume curve as a straight line. Setting aside my irritation, I bet there are some really cool algorithms to be discovered that take this to heart.

Shout out to the web.

This posting is for other victims of Oracle Calendaring (an enterprise calendaring system) who are trying to get it to work on the Mac and their Treo (or Palm) synchronization.

Mac:

  • I was unable to get it to play nice with iCal, so no more iCal.
  • So you have to use their 1987-quality calendaring application on the desktop.

Treo:

  • You need to use Mark/Space’s Missing Sync because iSync won’t allow you to disable the calendar only; so switch to that before the following steps.
  • The task and address book synching are said to be lousy.
  • You need the Oracle conduits.
  • After they are installed remove the address book and task conduits and restore the Mark/Space conduits.
  • Don’t try to run both the Oracle calendar conduit and the Mark/Space event conduit; since they are both trying to sync the same data.
  • If you get this error: “Out of storage space during update of database CTimeSetupPrefsDB” in the log on the Treo then you need to upgrade your Treo’s firmware.

There may, or may not, be a Mac release of the Treo firmware updater. If there isn’t one then you need a Windows box. Virtual PC will work, but you need to manually kill the Mac processes running in the background listening for the HotSync request that comes in from the Treo; you can find those using the application known as the Activity Monitor.

That said this thing is junk.

I am assured by other users that it does not consistently move appointments created on the palm onto your main calendar. Those users then give you the sardonic smile of a fellow traveler and report that they have been trained to enter appointments only into Oracle calendar.

There is no workaround for that problem on the Mac. I gather on the PC you can force a “full” synchronization and their conduit will then, very slowly, get the right answer.

It is a documented bug with the Mac conduit. For repeating events, only the first instance moves over.

Stratosphere comes to visit

We are about to have a big storm here in Boston. My favorite part of the forecast discussion:

“Cross sections show potential for tropopause fold and gravity wave formation SE of i-95 midday Sun, as conditions appear favorable for stratospheric intrusion in this region.”

Sounds like that pseudo-science they toss about in sci-fi TV shows.

Captain! Long range sensors indicate a stratospheric intrusion!

“Woo Wow,” Snowy cries “Might that create the potential for a tropopause fold?”

Tompson! Engage the gravity wave!

Focusing Diffuse Rage

I found this posting just fascinating. Recall that displacement is an economist’s word for that moment when you wake up and discover that your job, community, culture, etc. have been displaced by some alternative; typically through no fault of your own. Some examples: the hurricane floods your neighborhood; Home Depot sucks the life out of your family’s hardware store; the landlord decides to displace tenant farmers off lands they have occupied for generations. That was the original example; he was making way for more profitable sheep.

The victims of displacement react with rage.  “The rage of the small property holder – the peasant, the artisan, the stall-keeper – against his inexorable ruin by the competition of bigger capital is given a face … to hate: a physical particularity that stands in thought for the abstractions of ‘finance’ and ‘the market’ and ‘the banks’.”

This rage can be a powerful tool for shaping group solidarity. What the posting illuminates is how the rage often lacks a focus. If your neighborhood has been flooded by the hurricane you can quickly organize the citizens into a posse, but you can’t get a rope around Mother Nature’s neck. Potent rage versus a diffuse unnamed other makes for a, ah, interesting situation.

Clever social activists will work to assure the group’s rage is dissipated. They direct the rage toward an available target. Consider some pairs:

Tragic suicide at the high school?  Focus the rage to invigorate the after school activities?

Traditional society finds it is being displaced by more modern and economically vibrant societies. Focus the rage on first world nations and their citizens?

‘The Jew’ has often become the focus of this rage, as the quote above goes on to say. Rage seeks a concrete focus. Displacement rage often grows and festers until it finds a focus. Political actors know this. The rage is an opportunity to solve problems. But the rage is also dangerous, since it might turn on them. You can’t hang Mother Nature, but you can hang the Mayor.

Bush et al. refocused the rage against terrorism (a very diffuse threat) onto Iraq. They were acting as perfectly rational political actors. They believed it was a constructive goal. But they also saw the risk that the rage would focus on something they cared for, say Saudi Arabia.

The posting goes on to suggest that the displacement rage of the Left has yet to find its focus. Personally I think Republicans are a fine focus. Republicans are the new Jew.

Typing Injury.

I’d not seen this before.

JWZ’s essay on RSI, or typing injury. “… it terrified me. … my career being over”

I have one of my own written years and years ago. “… a friend who lost the ability to pick up a piece of paper …”

And I see that Bill Clementson recently joined this miserable club.

It’s a puzzle how until it happens one isn’t particularly interested; and even if you were interested, getting advice isn’t straightforward. The advice is largely the wisdom of crowds. I.e. it’s hearsay, rumor, and stories like the ones above. The best you can hope for is to pick out the better of the old wives’ tales. It’s not often you get to refer to JWZ as an old wife! There is very little hard science and what exists seems to me to be very lame and often self-serving.

It amazes me that an industry that has generated so much wealth hasn’t found a way to fund some substantial research into the affliction that forces the retirement of its most productive labor. Of course all minority groups have trouble getting attention for their problems. But in this case the minority group has actually got money. Still, it says something about who captures the wealth.