Negative Reinforcement

Here is a nice example of reaching a bogus conclusion. I’m a huge fan of positive reinforcement. Animal trainers know that a smart animal will withdraw and become devious if you try to train it using negative reinforcement. But it is trivial to prove that negative reinforcement is more highly correlated with improvement than positive.

Let’s train a pile of pennies to behave. Good pennies come up heads. Flip the pennies ten times. Divide them into two piles, the good ones and the bad ones. Now punish the bad ones. Beat them with a stick. Repeat the experiment. Notice how their behavior improves! Now reward the good ones. Kiss them. Repeat the experiment. Notice how they appear to be slacking off! Clearly negative reinforcement works and positive reinforcement doesn’t.
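If you’d rather not beat real pennies, the experiment can be roughly sketched as a simulation. This is only a sketch under my own assumptions: the function names, the pile size, and the seed are made up for illustration, and I’m assuming a "good" penny is one with more than half heads in the first round and a "bad" one has fewer than half, with ties left out of both piles. The punishment and reward steps are deliberately empty, since pennies ignore both.

```python
import random


def count_heads(n_flips, rng):
    """Flip one fair penny n_flips times and count the heads."""
    return sum(rng.random() < 0.5 for _ in range(n_flips))


def penny_experiment(n_pennies=10_000, n_flips=10, seed=0):
    """Run both rounds and report each pile's average before and after.

    Assumption: "good" means more than half heads in round one,
    "bad" means fewer than half; pennies with exactly half are ignored.
    """
    rng = random.Random(seed)
    first = [count_heads(n_flips, rng) for _ in range(n_pennies)]

    half = n_flips / 2
    good = [i for i, heads in enumerate(first) if heads > half]
    bad = [i for i, heads in enumerate(first) if heads < half]

    # Beat the bad pennies and kiss the good ones here.
    # Neither step changes anything: the pennies stay fair.

    second = [count_heads(n_flips, rng) for _ in range(n_pennies)]

    def average(scores, pile):
        return sum(scores[i] for i in pile) / len(pile)

    return {
        "bad_before": average(first, bad),
        "bad_after": average(second, bad),
        "good_before": average(first, good),
        "good_after": average(second, good),
    }


if __name__ == "__main__":
    for name, value in penny_experiment().items():
        print(f"{name}: {value:.2f}")
```

With a big enough pile, the punished pennies should drift up toward five heads and the rewarded ones should drift down toward five, even though nothing was done to either pile.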

The technical term for this is regression toward the mean. If the behavior is random, past performance doesn’t tell you anything about the future. But if you have a high-performing penny you can be reasonably confident its performance in the future will be only average. In a stock market of pennies, if bad pennies have depressed prices, invest in them. That is counterintuitive, but it is only a variation on how we can reach the bogus conclusion that negative reinforcement improves performance.

Here’s an example of the bogus conclusion in real life. At one point the Israeli Air Force concluded that they should stop complimenting the pilots who did well. They had noticed that afterward those pilots would perform less well. They concluded they should continue scolding the poorly performing pilots, who tended to do better afterward. It’s an interesting example of thinking that the causality is due to whatever you did most recently.

In risky, aka random, situations – such as warfare, experimentation, entrepreneurial activities, learning, etc. – you’re very likely to have an extremely strong component of random luck. In that case you can trivially fall into this bogus trap. In this environment you need only find the losers and beat them. Later they try some other random schemes to make things work, and sure enough some will regress toward the mean. You pat yourself on the back for your deep understanding of behaviorism.

This isn’t an abstract issue. It goes straight to the question of how we design the systems we work inside. Let’s look at Brad DeLong’s recent posting on the tournament-based architecture of academia.

The process of climbing to the top of the professoriate is structured as a tournament, in which the big prizes go to those willing to work the hardest and the smartest from their mid-twenties to their late thirties.

I have a very poor opinion of tournament-based systems design, particularly when rounds in the game are a huge chunk of a human life span. I think it’s pop-Darwinism. I think it leads to abuse of power by those who set the rules. I think it makes those who play in the tournament devious because their most highly leveraged approach is to work on gaming the system. I think this design drives out skill diversity, and that has fatal consequences for the quality of problem solving. I could go on, but let me draw attention to something else Brad said.

Brad is primarily talking about how the tournament architecture of academia has some exclusionary effects. Not about all the reasons tournaments might be a lame design. He uses as a launching pad the blood sport going on around Larry Summers’s recent comments about women. Because Brad really wants to talk about something else, he buries the sentence that clears the air so he can talk about what he’s trying to puzzle out.

And I say this as someone who thinks that Summers’s views on gender, genetics, and math achievement are almost certainly wrong, are unsupported, and should not be pushed forward by somebody who is twenty years beyond the stage of his career where you throw out lots of unfiltered ideas in the belief that what matters is the quality of your best one.

Right on bro.

But I notice how that sentence speaks to the question of tournament design viewed through the lens of regression to the mean. It appears that there is a forgiving phase in the game when quantity of ideas is treated as a good thing. The phase is highly random. Presumably all your competitors in the game have very similar, or even nearly identical, skills. How might you get that? Well, design the game with a recipe that includes very standardized filtering schemes and then add a good dose of groupthink. Now, given a very uniform pool of players, the winners of this tournament will almost surely be positioned to undergo regression toward the mean after they get the prize. Oops. They should become extremely careful to husband their high rank after they capture it. I.e. they probably ought to shut up; or at least they should stick to what has worked well in prior rounds. If the game is set up with lots of negative reinforcement, and that is extra true for those of high rank, then this design leads to extremely conservative behavior being your best option – if you’re a winner in early rounds. Blogging bad, since it breaks the mouth-shut rule. Tenure good, since it tempers at least a little the risk of idea generation in a system based so heavily on negative reinforcement.

Since they generate far more losers as output than winners, systems based on tournaments are fundamentally about negative reinforcement.

Now look at high tech. Open systems work to create a large surface where random experiments can take place. Some of these work out, most don’t. Notice how, given that, if you manage to have a success you ought to fear regression toward the mean.

2 thoughts on “Negative Reinforcement”

  1. Markus Sandy

    Hi. I enjoy reading your weblog, but please look up the definition of “negative reinforcement”. You are confusing it with “punishment” and they are not the same. The word “negative” implies “removal”, just as “positive” implies “introduction”. They do not imply “bad” or “good”.

  2. Ben Hyde

    Markus –

    Problem is that I’m not using the official guild definition; I’m using the colloquial definition. That the behaviorist guild would cast me out of the room for this is probably going to make us both happy.

    So yes you are correct – “Negative Reinforcement strengthens a behavior because a negative condition is stopped or avoided as a consequence of the behavior.” But consider the tournament system design. The negative condition is the possibility that you are removed from the game. That you don’t get to play any more. That you must wander off and find another line of work. The reinforcement is that this consequence is avoided if you behave in a manner that draws you into the next round of the game. I think most players in the game fear the consequence. If they fear X how can it not be punishment?
