Click here for a more exacting illustration of the power-law distribution of tags for things posted at del.icio.us.
Two things to note. First, when you click through to the page at del.icio.us that summarizes the postings for a given URL, the tags shown are a subset of the full set of tags used. The chart shows all the tags used.
Second, notice that the power-law model doesn’t fit once you include the tags used only once. I think that’s probably because those tags are often used to denote some very personal function. For example, a user might tag a page monday to indicate that they plan to return to it next Monday. But that’s just a guess.
These lines are similar to the ones you get if you plot the usage frequency of words in English or other languages. The tag lines appear to be slightly steeper, but not by much. I’m still surprised how similar they are.
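For anyone who wants to try this on their own data, here is a rough sketch of how a rank-frequency line and its slope could be computed. The tag counts are made up for illustration (they are not the del.icio.us data behind the chart), and the helper names `rank_frequency` and `loglog_slope` are my own. It fits a straight line to log(frequency) against log(rank) by ordinary least squares, once with every tag and once with the tags used only once dropped.

```python
import math
from collections import Counter

def rank_frequency(tags):
    """Return (rank, count) pairs, most-used tag first (rank starts at 1)."""
    counts = Counter(tags)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))

def loglog_slope(points):
    """Least-squares slope of log(count) against log(rank)."""
    xs = [math.log(rank) for rank, _ in points]
    ys = [math.log(count) for _, count in points]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Made-up tags for one URL: six popular tags whose counts fall off as 60/rank,
# plus five "personal" tags used exactly once.
tags = (["web"] * 60 + ["design"] * 30 + ["css"] * 20 + ["blog"] * 15
        + ["tools"] * 12 + ["reference"] * 10
        + ["monday", "todo", "readlater", "forbob", "work2"])

points = rank_frequency(tags)
slope_all = loglog_slope(points)
slope_trimmed = loglog_slope([(r, c) for r, c in points if c > 1])

print("slope with all tags:        ", round(slope_all, 2))
print("slope without single-use tags:", round(slope_trimmed, 2))
```

In this toy data the popular tags sit on a straight line in log-log space, and adding the single-use tags drags the fitted line steeper, which is the same kind of misfit described above. Real data will be noisier, and a least-squares line on a log-log plot is only a quick check, not a rigorous test for a power law.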
Hi, I just wrote a post here on finding various long tails in tagging data, and then found your del.icio.us graphs, which pretty much implement the first graph I mention. Just thought I’d point the post out in case you’re interested or have any related thoughts…