I’m fascinated by the universe of systems that capture a little value from each of a very large number of people and then aggregate that into something of high value. The list of such systems is huge. Just for example:
- Open Source Projects.
- Amazon reviews.
- The original Oxford English Dictionary.
- The Internet Movie Database.
- CMU’s clever picture categorization game.
- This trick for fooling a Turing test.
- Blogger, LiveJournal, Friendster, Orkut
- Netnews, Yahoo Groups
- the web
- this semi-porn site
- This toy.
- Everything 2
- Sheep Market
Here’s a fun little idea I had today along these lines. There is a general problem of classifying dialogs. For example, say you wanted a classifier for angry dialogs. How could you get such a thing?
Well, you could create a Bayesian filter and let a large population of volunteers train it to do the classification. This is the pattern above: a large number of small contributions summing up into an aggregated thing of value. In this case the thing of value is a classifier that can recognize a class of text or dialog.
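To make that concrete, here is a minimal sketch of such a filter, assuming a plain naive Bayes model over word counts with add-one smoothing. Everything in it is made up for illustration: the `CrowdTrainedFilter` class, the `tokenize` helper, the "angry" and "calm" labels, and the handful of sample votes. A real system would need far more volunteers and a better tokenizer.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class CrowdTrainedFilter:
    """Hypothetical naive Bayes filter trained one volunteer vote at a time."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of votes

    def train(self, text, label):
        """Record one volunteer's judgment that `text` belongs to `label`."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def classify(self, text):
        """Return the most probable label, or None if nothing was trained."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        words = tokenize(text)

        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus the sum of add-one smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Each tuple stands in for one volunteer's vote (invented sample data).
votes = [
    ("you are an idiot and i hate this list", "angry"),
    ("shut up you stupid troll", "angry"),
    ("this is garbage and you know it", "angry"),
    ("thanks for the patch it works great", "calm"),
    ("could you please share the config file", "calm"),
    ("i think the second option is a good idea", "calm"),
]

crowd_filter = CrowdTrainedFilter()
for text, label in votes:
    crowd_filter.train(text, label)

print(crowd_filter.classify("you stupid idiot"))
print(crowd_filter.classify("thanks for the great patch"))
```

Each call to `train` is one tiny contribution, and the counts are the aggregate. Nothing in the sketch limits it to two labels, so the same structure would work for any set of categories the volunteers agree to vote on.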
For example, here we have a guy who hand-built a system to entrap child molesters in online forums by posing as a child. I suspect it would be easy to build a classifier that monitors the dialogs in such chat rooms to do the same job, and it wouldn’t be hard to find the transcripts to train it with.
It would be very interesting to try this on the messages in a mailing list, creating a set of classifiers for various kinds of speech acts: baiting, constructive, helpful, query, answer, discussion, debate, argument, etc. I bet that the spooks have such systems, but I wonder how they built them.
I keep trying to find a good name for this class of systems. Brain Farming? Talent Scraping? Help Hoarding?