Monthly Archives: February 2005

Google bets the company?

John Robb has been in an entirely appropriate lather about Google Toolbar modifying the pages the user is visiting without the consent of the page author. He’s been trying various approaches to verbalizing his discomfort. Today he stumbles on a real good one:

I don’t think that Google clearly thought through its decision to move to Web page modification.  If they succeed in nullifying the opposition to this, they will open the floodgates to Microsoft to rerelease its version of the concept.

Through modification of the browser,  Microsoft could put this on 1,000 times the desktops Google could with its toolbar.  Opposition to the concept is the only thing that is stopping Microsoft from doing it  today. 

It gets worse.  The basis of this modification  is search-based services.  If Microsoft is able to put a basket of search-based services into every Web page  most people  view with a browser (“search in situ” vs. the site based model), Google could be in real trouble.  It could quickly turn Google into Netscape, and I am sure Bill Gates knows this.

This is a bet the company decision.

Sounds about right to me. There are days I think this industry is a memoryless process.

Caterina Quibbles

City of Empty Hats
Caterina, the Duchess of Flickr, quibbles. Here is a portion of what triggered that.

What creates a community in the real world is proximity…. Same theme in social networking: Orkut, Friendster, even Flickr you GO to a specific location on the web.

I too have an easily triggered sore spot about that. The presumption that community in the real world arises from proximity is romantic nonsense. The romantic anthropologist may wander into the bush and return with a fairy tale of idyllic bliss. The professional anthropologist comes back with stories of gossip, long simmering feuds, resource hoarding, murders, kidnapping. It’s easy to announce that proximity breeds community but it’s also a fine way to breed contempt. The romantic myth that tribes huddled around the fire are somehow a more valid, original, and pure form of community is poppycock.

Communities crystallize out of working on a common cause. Yes, surviving the winter night by keeping a fire alive counts as common cause. As does keeping the wolves from the door or raising the passel of children. But creating a profession, developing a craft, mutual aid in the hobby of keeping your glow in the dark guppies alive, or forcing a vendor to honor a warranty are common causes too, and none of them demand proximity.

The romantic myth is only part of the problem. Deeper is a confusion about cause and effect. Proximity is a possible solution to a coordination problem, nothing more. It doesn’t create community. Because communities always have coordination problems and one means of solving these is proximity, the two are oft correlated. Proximity is not causative. It can either help or hinder working on the common cause. It is entirely neutral. Putting people into a cube space, packing prisoners four to a cell, loading up buses with commuters, assigning 30 third graders to a classroom, placing college freshmen together in dorm rooms – none of these create communities except by dumb luck or hard work.

Proximity serves other masters as well; in business plan jargon it’s what’s called sticky. If you have a group of people locked into proximity with each other that’s just about as sticky as it gets. And of course a common cause emerges, the common cause of dealing with the problems created by this lock up. This is why a monopoly’s customers are a community. It may be acceptable in such cases to say that proximity generates community; but it would be more accurate to say that forced proximity is a coercive way to force a community into existence.

The sentence that follows the one quoted above is:

He challenges that these are communities, there’s “proximity” (all on one site) but not community.

I see signs that he was beginning to see thru the romantic myth and was looking for words to split the difference; hence the communities vs. community split. In the end, starting from proximity is not helpful.

Negative Reinforcement

Here is a nice example of reaching a bogus conclusion. I’m a huge fan of positive reinforcement. Animal trainers know that a smart animal will withdraw and become devious if you try to train it using negative reinforcement. But it is trivial to prove that negative reinforcement is more highly correlated with improvement than positive.

Let’s train a pile of pennies to behave. Good pennies come up heads. Flip the pennies ten times. Divide them into two piles, the good ones and the bad ones. Now punish the bad ones. Beat them with a stick. Repeat the experiment. Notice how their behavior improves! Now reward the good ones. Kiss them. Repeat the experiment. Notice how they appear to be slacking off! Clearly negative reinforcement works and positive reinforcement doesn’t.

The technical term for this is regression toward the mean. If the behavior is random the past performance doesn’t tell you anything about the future. But if you have a high performing penny you can be reasonably confident its performance in the future will be only average. In the stock market of pennies, should bad pennies have depressed prices, invest in them. That is counterintuitive, but only a variation on how we can reach the bogus conclusion that negative reinforcement improves performance.
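The penny experiment is easy to run for real. What follows is a minimal sketch, not a definitive model: the function names, the pile size, and the rule that a penny is "good" with more than five heads out of ten and "bad" with fewer (ties are left out of both piles) are all my assumptions for illustration. The stick and the kiss don't appear in the code at all, which is the point.

```python
import random


def flip_heads(n_flips, rng):
    """Count heads in n_flips flips of a fair penny."""
    return sum(rng.random() < 0.5 for _ in range(n_flips))


def penny_experiment(n_pennies=10_000, n_flips=10, seed=0):
    """Score every penny twice and compare the piles before and after.

    Assumption: 'good' means more heads than average in round one,
    'bad' means fewer; pennies scoring exactly average are ignored.
    """
    rng = random.Random(seed)
    first = [flip_heads(n_flips, rng) for _ in range(n_pennies)]
    second = [flip_heads(n_flips, rng) for _ in range(n_pennies)]
    average = n_flips / 2

    good = [i for i, heads in enumerate(first) if heads > average]
    bad = [i for i, heads in enumerate(first) if heads < average]

    def mean_score(scores, pile):
        return sum(scores[i] for i in pile) / len(pile)

    return {
        "bad_before": mean_score(first, bad),
        "bad_after": mean_score(second, bad),    # after the "beating"
        "good_before": mean_score(first, good),
        "good_after": mean_score(second, good),  # after the "kiss"
    }


if __name__ == "__main__":
    for label, value in penny_experiment().items():
        print(f"{label}: {value:.2f}")
```

Both piles drift back toward five heads out of ten in the second round, so the punished pile "improves" and the rewarded pile "slacks off" with no training of any kind.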

Here’s an example of the bogus conclusion in real life. At one point the Israeli Air Force concluded that they should stop complimenting the pilots that did well. They had noticed that afterward they would perform less well. They concluded they should continue scolding the poor performing pilots. They tended to do better. It’s an interesting example of thinking that the causality is due to whatever you did most recently.

In risky, aka random, situations – such as warfare, experimentation, entrepreneurial activities, learning, etc. – you’re very likely to have an extremely strong component of random luck. In that case you can trivially fall into this bogus trap. In this environment you need only find the losers, and beat them. Later they try some other random schemes to make things work and sure enough some will regress toward the mean. You pat yourself on the back for your deep understanding of behaviorism.

This isn’t an abstract issue. It goes straight to the question of how we design the systems we work inside. Let’s look at Brad DeLong’s recent posting on the tournament based architecture of academia.

The process of climbing to the top of the professoriate is structured as a tournament, in which the big prizes go to those willing to work the hardest and the smartest from their mid-twenties to their late thirties.

I have a very poor opinion of tournament based systems design, particularly when rounds in the game are a huge chunk of a human life span. I think it’s pop-Darwinism. I think it leads to abuse of power by those who set the rules. I think it makes those who play in the tournament devious because their most highly leveraged approach is to work on gaming the system. I think this design drives out skill diversity and that has fatal consequences on the quality of problem solving. I could go on, but let me draw attention to something else Brad said.

Brad is primarily talking about how the tournament architecture of academia has some exclusionary effects. Not about all the reasons tournaments might be a lame design. He uses as a launching board the blood sport going on around Larry Summers’s recent comments about women. Because Brad really wants to talk about something else he buries the sentence that clears the air so he can talk about what he’s trying to puzzle out.

And I say this as someone who thinks that Summers’s views on gender, genetics, and math achievement are almost certainly wrong, are unsupported, and should not be pushed forward by somebody who is twenty years beyond the stage of his career where you throw out lots of unfiltered ideas in the belief that what matters is the quality of your best one.

Right on bro.

But I notice how that sentence speaks to the question of tournament design viewed thru the lens of regression to the mean. It appears that there is a forgiving phase in the game when quantity of ideas is treated as a good thing. The phase is highly random. Presumably all your competitors in the game have very similar, or even extremely similar, skills. How might you get that? Well, design the game with a recipe that includes very standardized filtering schemes and then add a good dose of group think. Now, given a very uniform pool of players, the winners of this tournament will almost surely be positioned to undergo regression toward the mean after they get the prize. Oops. They should become extremely careful to husband their high rank after they capture it. I.e. they probably ought to shut up; or at least they should stick to what has worked well in prior rounds. If the game is set up with lots of negative reinforcement, and that is extra true for those of high rank, then this design leads to extremely conservative behavior being your best option – if you’re a winner in early rounds. Blogging bad, since it breaks the mouth shut rule. Tenure good, since it tempers at least a little the risk of idea generation in a highly negative reinforcement based system.

Since they generate far more losers as output than winners, systems based on tournaments are fundamentally about negative reinforcement.

Now look at high tech. Open systems work to create a large surface where random experiments can take place. Some of these work out, most don’t. Notice how, given that, if you manage to have a success you ought to fear regression toward the mean.

Just passing thru

When our play opens, in the 1960s, computers were housed in temples and represented a significant capital expense for the organizations that housed them. In the 1970s the mini-computer allowed departments to play; and in the 1980s the personal computer allowed individual professionals to play. That trend made it reasonable to assume that computing was a democratizing force; something likely to drive more and more power into the hands of smaller and smaller individual units in the economy.

The data center crowd played a starring role, as villain, in the PC revolutionary’s version of the story. The PC disrupted their power. Bewildered their process, their budgets, their software, and their ability to set standards. Those of us working on the PC considered these folks to be the enemy; Luddites. In some fairy tale versions of the story the play ends with their death. In the B movie version in the closing scene we see a guy scraping their name off the office door; but then a few moments later he’s putting up a new name. That name is IT.

These IT folks have, in some organizations, regained a lot of their power over how the organization deploys its computing resources. The complexity of the Windows solution and strong network effects around internal organizational tools such as calendaring and email created carrots and sticks that brought the modern in-house IT organization back to life. And then a number of enterprise software packages began to emerge that had one foot in a corporate data center and another on all the PCs of this or that class of professionals. Sales, HR, Purchasing, Document Management, etc. etc. Many of these sport a centralized database function, which further concentrated power into the data center.

So, what is this? A cyclic process? Did we misread the story of the 60s, 70s, and 80s? Where is the power of the corporate CIO/IT organization going from here? I have an intuition, only that.

I do not believe the power is going to stop in the CIO’s data center. I think it’s just passing thru. I think it’s headed up and into the net. There it will fall into the hands of a small number of players as the network effects emerge and cause condensation. Consider email, calendaring, HR, sales pipelines, etc. etc. None of these has a strong reason – beyond illusions of privacy – to remain parked in the company data center. I’ve observed at least four sales groups at this point that run two sets of books, i.e. two sales pipelines. One in the internal system because they were required to, and another on some internet embedded service run by a 3rd party because it was simple, effective, better socialized, more “situated,” and extremely adaptive.

Pity the poor CIO/IT dudes; 30 years of fighting to get on top again and, if I’m right, it’s only a very short lived interval as the work moves up and out of the firm entirely.

Tribal size

Ted’s post on Finding your Tribe reminds me that I’ve been meaning to see if I could hack something together to say about scale and groups. How many groups is a person typically a member of? If we ask the various social sciences – anthropology, sociology, economics, politics, demography, physiology – do they have an answer for us? If we ask the various social movements what have they to say? Or ask similar questions of other metrics on these tribes? What of size of the tribe? What of the half life of membership; or the length of time required to join? What of the topology of overlapping groups?

I’m very suspicious of a kind of pop sociology that declares some number to be definitive. For example that there is an upper limit on the number of friends you can have; or the number of groups you can be a member of; or the set of skills you can accumulate. There are some very large tribes; American Catholic Democrats, or South American women soccer fans, or people who clip coupons. Notice all the tribes unmentioned in Ted’s posting: fathers, dwellers in wet places… It would be a real project to make even a reasonably good list of the groups one is a member of.

Modern life has brought about a shift in the overall statistics of group/tribal membership. Since people, on the whole, seem a happy lot, I suspect that should you ask the members of some insular tribe, or a modern city dweller, you’d probably get about the same distribution of happiness. But the life the insular are living is totally different than the one the urban dweller can live. The richness of modern economics, the density of human habitation, the network of communications allows some people to engage with the world in surprising ways. Ways that are not just hard for the insular citizen to imagine; they are actually impossible for him to experience. For him physical reality created an upper bound on what was possible. In that situation the rules of thumb are self evident. When the upper bound evaporates the rules get harder to grasp.

If the numbers suggest, which they do, that the group forming is scale free then we need to go back and ask each of those social sciences and movements what they wish to make of that. If they wish to sing the praises of a particular scale, or disparage some other scale, what should we make of that? The numbers certainly don’t care, they are the facts. Are the new ways of living displacing the old insular models? I think that’s obvious.

Wrong Wrong Wrong!

Jamie is one of my heroes and he sure can write, but I’m not impressed by his recent fun raining on the parade of his friend’s open source groupware project. All he’s doing in his fun rant is revealing his loyalty to a world view that treats groups as so vile the only exception to the rule is getting laid and/or just possibly going out to dinner with friends. We have milked that stone dry. The PC revolution was two damn decades ago. Get your hands out of your pocket. Playing with your handheld will make you go blind.

Groupware is not about empowering the lord’s castrate to chase check boxes around conference room tables. Groupware is about collegiality. Groupware is about focusing common cause. Groupware is about manufacturing abundance from the aggregated contributions of the many. Groupware is about creating a vibrant scale free civic society. Groupware is about searching the space of solutions to all the corrosive forces that destroy civility. Groupware is about turning coordination problems into square dancing.

This is where the excitement is. This is where wikis, and del.icio.us, and Flickr, and Meetup, and open source, and Yahoo Groups, and mailing lists, and discussion boards, and peer to peer, and file sharing, and voice over IP, are. This is the single most fun real estate the Internet has enabled.

On this one, Jamie is just plain wrong. The dialectic is not between project managers and the noble free spirited creative individual. The dialectic here is between those forces that kill groups and those that make them thrive. Groupware is a tool in that battle. Dilbert’s coworkers are as much a threat to vibrant groups as Dilbert’s boss. Getting laid is not the goal. Raising the babies of common cause is the goal.

LinuxWorld

I’ve spent a half day twice at LinuxWorld here in Boston.

There are some traits that partition the universe of exhibitors. Big firms, little firms, and “dot orgs.” Hardware, Software. Developer facing, enterprise facing. The Developer facing fall into two camps; hardware and software. There are folks in booths, folks in stations inside somebody else’s booth, and then guys wandering around on the floor.

Generally I find the big firm booths lacking in authenticity; over designed booth furniture; bevies of bitter beautiful booth barista; huge LCD screens owned by PowerPoint; presenters whose talk is fed into their ear from a tape player. Sometimes a gimmick; for example Sun and AMD have a LAN party setup. Well, it’s not a party, it’s a contest, and it’s obnoxiously loud. All its neighbors are pissed.

Redemption for a big firm booth is possible. Some of the firms have let out a big chunk of their real estate to stations occupied by their “developer partners.” These are typically just like the small firms; except they are sitting at a station that lacks the authentic signs of a real firm – i.e. somebody eating in the back of the booth, the luggage, or better yet a child. This lack of authenticity meant I didn’t actually realize they were there for a while.

In a strange demonstration of some kind of emerging hierarchy of nobility I spent a while talking to a guy in a station at a huge firm who runs the developer network for a medium sized firm, and then later in the space of the medium sized firm I spoke to guys at stations he had provided them.

There is a lot of stuff going on around ‘thin client.’

I think my favorite booth was LTSP.org’s. These guys seem to mostly work in schools. Their problem: a huge pipe of misc. computers – old, donated, random – that they can’t bring themselves to give away. So they have given life to the thin-client network computer fantasy. Except this time it works. You get the feeling that if it has a CPU, a screen, and a keyboard these guys can make something useful out of it. Disk? No way! I’d never noticed etherboot.org before. This is such a hack! They replace the ROM on the ethernet card so that you can bootstrap off it, and then off the net. The traffic at both these booths was huge.

I had fun talking to guys whose companies sell thin client hardware. They all have really fun surprising customer stories. Like the schools making useful stuff out of old junk these guys are really making people’s lives better in little insurance offices, auto dealers, etc. etc.

Lots of people say document management in their booth signage. I don’t think two of them are doing the same thing! I had a lot of fun talking to a guy whose document management system has mostly gotten used to capture faxes, and scans of paper documents, in small to medium sized offices.

I’m mostly not that interested in the enterprise facing vendors; but there is a trend there. When I used to go to Mac World you would see trends. Sometimes they were huge – like one year there were a dozen disk drive expansion vendors. The next year there would be two. The following year there would be none, or one. The trend I think you can see here is tools from small vendors to help solve what could be called the migration problem (moving from Windows to Linux for example) or it could be called the heterogeneous systems problem. I like the second term better. I think we are moving into a much more complex and confusing IT world. Lots of junk of various kinds. A dozen databases where somebody obsessive might prefer one. A hundred systems all jockeying for screen space in front of the person getting the work done. Of course if there was nothing but time then we could integrate those hundred systems into one rational unified UI; but who’s got time? It’s much more fun to solve problems and move on to the next problem. Getting a pretty unified UI or master database schema is navel gazing.

So we get the hacks to ensure that documents for a given account number pop up in the web browser whenever the call handling software receives a call. We get hacks to keep N different employee account databases in sync. Why N? Mergers, politics, expedience. Surely you’re not going to tell one division that they can’t install a system that gets them a huge productivity improvement just because it happens to integrate poorly with the LDAP directory the parent company’s obsessive compulsive IT group has wedded itself to?

The Outlook substitutes market is looking healthy. Someday that’s going to totally disrupt the Outlook/Exchange network hub. But its only motivation right now is cost. CIOs do not care about cost. They care about risk. They care about game changing advantage. They care about keeping a lid on diversity, aka chaos. Offensively, they are typical monopolists. So these offerings remain poorly impedance matched to the customers that have huge Outlook/Exchange installations.

There are a lot of offerings in the network monitoring space. Like all of software these show little correlation between features and price.

When I browse at these open source shows I’m always struck by how rare my “man that’s cool” reaction is. During the first years of the Mac it was a very very common experience. Rich geographic database apps on the Mac in 1986, for example. Or the constraint based drawing programs. Or the way that you could drop an object into a palette and the palette would learn that subset of the attributes of the object which were relevant to the palette and create a new cell in the palette.

I expected I’d see cool innovations in the network monitoring booths. But mostly not. There were some clever tricks for packing more info in the trend graphs; but not really surprising. Nobody seems to be making any progress on the hard problem – managing the alert population given the human failings of the recipients of the alert stream.

I did like the system that observes the OS call patterns of your application and then generates alerts when new things happen.

Nobody seems to be doing anything that creates collaboratively authored rule sets.

I’m always amused by how some booths are vicious in their lead qualification. They have one or maybe two questions they ask and if you don’t happen to give the right answer they ignore you and go back to talking to themselves. They have got that evil beast serendipity in a cage.

Whitespace elimination

I solved my PDF whitespace problem using ghostscript and psutils. Ghostscript provides the pdf2ps and ps2pdf tools. psutils includes psnup, which will paste N pages per page. The power tool in the psutils family is pstops, which will do a number of page layout transformations. If you then tinker you can find a combination of scaling and offset which eliminates all the whitespace for the document in question. So, I have the technology, even if I don’t have an easy to use solution … yet.
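For the record, here is a rough sketch of that pipeline. It only builds the three command lines; it doesn't run anything. The function name, the temporary file names, and the scale and offset numbers in the usage note are all made up for illustration, and the right numbers have to be found by tinkering with each document. The page spec assumes the pstops form of modulo, page number, `@scale`, and an `(x,y)` offset in PostScript points.

```python
def whitespace_trim_commands(src_pdf, dst_pdf, scale, x_offset_pt, y_offset_pt):
    """Build the pdf2ps -> pstops -> ps2pdf command lines.

    Assumption: a single scale and offset suits every page, so the
    pstops page spec is '1:0@SCALE(X,Y)' (every page, scaled, then
    shifted by X and Y points). The tmp_*.ps names are placeholders.
    """
    pagespec = f"1:0@{scale}({x_offset_pt},{y_offset_pt})"
    return [
        ["pdf2ps", src_pdf, "tmp_in.ps"],
        ["pstops", pagespec, "tmp_in.ps", "tmp_out.ps"],
        ["ps2pdf", "tmp_out.ps", dst_pdf],
    ]


if __name__ == "__main__":
    # Hypothetical numbers: enlarge by 30% and pull the page down and left.
    for command in whitespace_trim_commands("in.pdf", "out.pdf", 1.3, -60, -90):
        print(" ".join(command))
```

Each list can be handed to `subprocess.run(command, check=True)` once the numbers look right. Keeping the scale and offsets as parameters is the whole trick, since those are the only things that change from one document to the next.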

Negative lines of code per day.

Lines-of-code/day is a common bogus metric in primitive performance review systems. I have never seen anybody get a bonus for achieving a large negative number. That can’t be right! What would a guy be worth to you if he could reduce the size of your system by 10-20%? More.

Baby name drift

I enjoyed this nice little paper on the changes over time in baby names: “Drift as a mechanism for cultural change: an example from baby names.”

They look at the census data on the top thousand baby names over the 20th century. Their goal is to see if they can fit a simple model of how the population picks names to that data. They drag out of the census data a number of interesting facts.

  • New names appear in the top thousand, on average 2.3 names a year.
  • The rate of new names varies (they don’t correlate that with other demographic/economic trends).
  • New girl names appear 1.4 times the rate of boy names.
  • Both the rate and the variation have tended to increase as has, of course, the total population and a number of other measures.
  • The top thousand name a decreasing proportion of the population over the century. 91% at the beginning for both males and females. 86%/75% for male/female at the end.
  • At the same time the slope of the distribution(s) hasn’t changed, instead the larger population has made the long tail much larger. (I’m not sure I entirely buy that.)

The fun the authors are having in the paper is to show how they can create a surprisingly simple simulated world that behaves in just this way.

How simple? Well they don’t need to include any number of things you might think deserve to be in a model of what’s going on here. While we all believe there are baby naming fads they assume that parents select names independent of each other’s choices. While we know that lots of parents name their children after themselves or their grandparents they assume they don’t. While we know that James is much more likely to be invited in for an interview than Karim given identical resumes they assume that names have no functional value. In summary they assume that name choices are independent, not intergenerationally transmitted, and are nonfunctional traits.

The model then simulates naming as a random process with two components. Names are drawn either from the pool of existing names in the prior generation or a small portion are random new names. If Ashley was popular in the last round then she is likely to be popular in the next, as she was in the 1990s. From time to time a few new names pop up, as Ashley did in the 1950s. Exceptional rolls of the dice can enable a name to make rapid moves in rank, as Ashley did in the second half of the century.
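A toy version of that two-component process fits in a few lines. This is a minimal sketch under my own assumptions, not the authors' code: the population size, the constant population across generations, the rate of new names, and every function and variable name here are invented for illustration.

```python
import random
from collections import Counter


def next_generation(names, new_name_rate, rng, next_id):
    """Name one generation of babies.

    Each baby either gets a brand new name (with probability
    new_name_rate) or copies the name of a random member of the
    prior generation, so popular names tend to stay popular.
    """
    children = []
    for _ in range(len(names)):
        if rng.random() < new_name_rate:
            children.append(f"new{next_id}")
            next_id += 1
        else:
            children.append(rng.choice(names))
    return children, next_id


def simulate(pop_size=1000, generations=50, new_name_rate=0.01, seed=0):
    """Run the drift process and return name counts for the last generation."""
    rng = random.Random(seed)
    names = [f"name{i}" for i in range(pop_size)]  # everyone starts unique
    next_id = 0
    for _ in range(generations):
        names, next_id = next_generation(names, new_name_rate, rng, next_id)
    return Counter(names)


if __name__ == "__main__":
    counts = simulate()
    print(len(counts), "distinct names; top five:", counts.most_common(5))
```

Even with no fads, no family names, and no functional value, a handful of names end up carrying a large share of the population while a long tail of rare names trails behind. Varying `new_name_rate` is the obvious knob to play with.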

They are very pleased that their model fits the behavior of the data so well. It certainly fits the aggregate data well. For example they can get the variations in the slope of the distribution to behave very closely to what’s in the data. They don’t say anything explicit about the volatility of individual names, like the Ashley case. If this was the wealth distribution rather than baby names that would be a standard question to look into.

The final point is that this is a slightly different model of how to get a distribution like this. They say in passing that the rate of random names entering the population has a large effect on the slope of the distribution; but then leave it at that. I’ll need to look into that.

Meanwhile, the data on which all this is based appears to be sourced from here; as you can see if you look, the data is in ten year buckets. Bummer.