I enjoyed this nice little paper on the changes over time in baby names: “Drift as a mechanism for cultural change: an example from baby names.”
They look at the census data on the top thousand baby names over the 20th century. Their goal is to see if they can fit a simple model of how the population picks names to that data. They drag out of the census data a number of interesting facts.
- New names appear in the top thousand at an average rate of 2.3 names a year.
- The rate of new names varies (they don’t correlate that with other demographic/economic trends).
- New girl names appear at 1.4 times the rate of boy names.
- Both the rate and the variation have tended to increase as has, of course, the total population and a number of other measures.
- The top thousand names cover a decreasing proportion of the population over the century: 91% at the beginning for both males and females, 86%/75% for male/female at the end.
- At the same time the slope of the distribution(s) hasn’t changed, instead the larger population has made the long tail much larger. (I’m not sure I entirely buy that.)
The fun the authors are having in the paper is to show how they can create a surprisingly simple simulated world that behaves in just this way.
How simple? Well, they don’t need to include any number of things you might think deserve to be in a model of what’s going on here. While we all believe there are baby naming fads, they assume that parents select names independent of each other’s choices. While we know that lots of parents name their children after themselves or their grandparents, they assume they don’t. While we know that James is much more likely to be invited in for an interview than Karim given identical resumes, they assume that names have no functional value. In summary, they assume that name choices are independent, not intergenerationally transmitted, and nonfunctional traits.
The model then simulates naming as a random process with two components. Most names are drawn from the pool of existing names in the prior generation; a small portion are random new names. If Ashley was popular in the last round then she is likely to be popular in the next, as she was in the 1990s. From time to time a few new names pop up, as Ashley did in the 1950s. Exceptional rolls of the dice can enable a name to make rapid moves in rank, as Ashley did in the second half of the century.
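To make that two-component process concrete, here is a minimal sketch of how I read it, not the authors' actual code. The function names, the fixed population size, the innovation rate `mu`, and the choice to start everyone off with a unique name are all my own assumptions for illustration.

```python
import random
from collections import Counter
from itertools import count


def next_generation(names, mu, fresh, rng):
    """Name one new generation of babies.

    Each baby copies a name picked at random from the prior generation,
    except that with probability mu (assumed value) it gets a brand-new name.
    """
    return [
        next(fresh) if rng.random() < mu else rng.choice(names)
        for _ in names
    ]


def simulate(pop_size=1000, generations=200, mu=0.01, seed=0):
    """Run the drift process and return a Counter of name -> babies."""
    rng = random.Random(seed)
    fresh = count()  # integer ids stand in for never-before-seen names
    # Assumption: the first generation all have distinct names.
    names = [next(fresh) for _ in range(pop_size)]
    for _ in range(generations):
        names = next_generation(names, mu, fresh, rng)
    return Counter(names)


counts = simulate()
top = counts.most_common(5)  # the five most popular names and their counts
print(len(counts), "distinct names; top five:", top)
```

Copying from the prior generation is what makes popular names stay popular, and `mu` is the only knob that lets new names in. Nothing in the sketch knows about fads, family tradition, or a name's usefulness, which is the point of the exercise.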
They are very pleased that their model fits the behavior of the data so well. It certainly fits the aggregate data well. For example, they can get the variations in the slope of the distribution to track very closely what’s in the data. They don’t say anything explicit about the volatility of individual names, like the Ashley case. If this were the wealth distribution rather than baby names, that would be a standard question to look into.
The final point is that this is a slightly different model of how to get a distribution like this. They say in passing that the rate of random names entering the population has a large effect on the slope of the distribution; but then leave it at that. I’ll need to look into that.
Meanwhile, the data on which all this is based appears to be sourced from here. As you can see if you look, the data is in ten-year buckets. Bummer.