Learning at the Knee of a Random Number Generator

Here’s a thought-provoking intersection between behaviorism and statistical thinking, plucked out of this book.

Here’s a very naive model of behaviorism: if the animal behaves well we reward him, and if he behaves poorly we punish him. This primitive version of behaviorism is fraught with problems, but it will do for this discussion.

A very naive statistical model of behavior breaks into two parts: average behavior, and random variation around that average. Animal trainers are experts in leveraging that random bit. If you want to train a goldfish to swim clockwise, you wait for a random clockwise turn and feed him right then.

Of course animal training is a two-way street. Pets expend a great deal of effort attempting to train their owners to feed them. It’s fun to walk in front of the tanks at a large pet store wearing the same color shirt as the staff and watch the fish attempt to trigger you into feeding them.

The hope of training is that you shift the average, but that takes time, and in the meantime random variation will generate occasional good and bad performances relative to the mean.

So here’s the rub. If the animal behaves in an exceptionally good or bad manner, it is likely that in the following period his behavior will return toward the previous average. In the jargon of statistics this is an example of regression to the mean.
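To see that return toward the average in numbers, here is a minimal sketch. It assumes the crudest possible animal: every round's behavior is an independent draw from a bell curve with a fixed mean of zero, so nothing is ever learned. The function name, the threshold, and the round count are all made up for illustration.

```python
import random
import statistics


def next_round_after_extremes(rounds=100_000, threshold=1.5, seed=0):
    """Average score in the round that follows an exceptional round.

    Assumption: each round is an independent Gaussian draw (mean 0, sd 1),
    i.e. pure random variation around a fixed average.
    """
    rng = random.Random(seed)
    scores = [rng.gauss(0.0, 1.0) for _ in range(rounds)]

    # Collect the round that comes right after each exceptional round.
    after_good = [scores[i + 1] for i in range(rounds - 1) if scores[i] > threshold]
    after_bad = [scores[i + 1] for i in range(rounds - 1) if scores[i] < -threshold]

    return statistics.mean(after_good), statistics.mean(after_bad)


if __name__ == "__main__":
    good, bad = next_round_after_extremes()
    print(f"average score after an exceptionally good round: {good:+.3f}")
    print(f"average score after an exceptionally bad round:  {bad:+.3f}")
```

Under these assumptions both averages should land close to zero, the long-run mean, no matter how good or bad the preceding round was.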

Regression to the mean is enough to train a bad behavior in the trainer! Consider this scenario: the animal behaves well, the trainer rewards him, and then in the following training rounds the animal is all but certain to regress back toward his average behavior. The trainer learns from this that the reward triggered the decline, and so learns not to reward. Which is bogus.

But it gets worse. Consider this scenario: the animal behaves badly, the trainer punishes the behavior, and then in the following rounds the animal’s behavior randomly regresses back to the mean. The trainer learns that punishment precedes improvement, not because the animal is learning anything but because the statistics say so. The statistics alone are enough to train the trainer to punish bad behavior but not reward good behavior.

This is very bad! It’s a fundamental insight of sophisticated behaviorism that punishment is far less effective than reward; so much so that punishment often doesn’t work at all! Only a stupid or a captive animal will put up with training based on punishment. The problem is redoubled because punished animals become more cautious, which not only reduces the random variance (which you need) but also suppresses the smart active searching you want the animals doing. Punishment doesn’t work on smart animals, and when you can get away with it, it makes the animal stupid.

What a mess! Notice that you don’t need an animal to teach the trainer this bogus behavior pattern of “never reward, but do punish.” If you had the trainer work to train a random number generator he’d learn the same lesson. There really isn’t a more stupid beast than a random number generator. Which I think goes a long way toward explaining why fans of punishment often describe the animals they are trying to train as stupid.
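You can run that experiment yourself. The sketch below is only an illustration under assumptions of my own: the "animal" is a Gaussian random number generator, the trainer rewards any score above a threshold and punishes any score below its negative, and then simply notes whether the next score was worse or better. The names train_rng and threshold are hypothetical.

```python
import random


def train_rng(rounds=100_000, threshold=1.0, seed=1):
    """Tally what a trainer would observe while 'training' a random number generator.

    Assumptions: scores are independent Gaussian draws (mean 0, sd 1);
    the trainer rewards scores above +threshold and punishes scores below
    -threshold. Neither reward nor punishment has any effect on the scores.
    """
    rng = random.Random(seed)
    scores = [rng.gauss(0.0, 1.0) for _ in range(rounds)]

    rewards = worse_after_reward = 0
    punishments = better_after_punishment = 0

    # Pair each round with the round that follows it.
    for today, tomorrow in zip(scores, scores[1:]):
        if today > threshold:
            rewards += 1
            worse_after_reward += tomorrow < today
        elif today < -threshold:
            punishments += 1
            better_after_punishment += tomorrow > today

    return worse_after_reward / rewards, better_after_punishment / punishments


if __name__ == "__main__":
    worse, better = train_rng()
    print(f"rewards followed by worse behavior:      {worse:.0%}")
    print(f"punishments followed by better behavior: {better:.0%}")
```

With a threshold of one standard deviation, both fractions should come out at roughly nine in ten. The trainer's notebook fills up with evidence that rewards backfire and punishments work, and the beast on the other end is nothing but dice.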
