I bet it really gets their goat, pulls their chain, and makes them cranky. I mean, here it is, nine paragraphs in: “validated by an unrelated study indicating that the data collected by Yahoo, Google’s main rival in Internet search, can also help with early detection of the flu”, and then in the next paragraph: “‘In theory, we could use this stream of information to learn about other disease trends as well,’ said Dr. Philip M. Polgreen, assistant professor of medicine and epidemiology at the University of Iowa and an author of the study based on Yahoo’s data.”
The New York Times article, presumably with the help of the Google PR people, is misleading. No doubt the folks at Google who did this work are embarrassed by this, as it’s an offense to professional norms. The credit for this should go to the folks at the University of Iowa who, with the help of folks at Harvard and Yahoo’s research group in New York, did the original work. That’s published here: “Using Internet Searches for Influenza Surveillance”, Philip M. Polgreen, Yiling Chen, David M. Pennock, and Forrest D. Nelson. People did that work. Not Google, not Yahoo.
All that is an omission from the so-called paper of record about the actual record, but it’s not the omission I find most interesting. What I find interesting is the “what’s next.” It’s notable how, if you search for Dr. Polgreen at Google, you now get almost entirely links to the PR ripple from the NY Times article. No doubt he finds that a bit irritating, maybe even depressing. So then what?
Ok, now that’s not tacky, that is practically criminal and it’s certainly autistic. If somebody asks you about “suicide methods” you do not reply with an enumeration of same, along with a second link to a site featuring “staggering” “adult content and images” of “hash reality.” Can you imagine if the local librarian did that? What you do do is you attempt to intervene.
Let’s step back from that colorful and exaggerated example. Let’s spend just a few seconds thinking about what Google might do with this technology. Yeah! They could place some ads! Why doesn’t that thought get covered in the article? I suspect it’s because they didn’t spend those few seconds. That is pretty sloppy.
But a few more minutes and they would get to the much more interesting question: could the search engine intervene? Could that intervention save lives? I think the answer is obviously yes it could and yes it would. Google doesn’t even attempt to do that on their flu tracking page.
It’s an interesting puzzle to what degree search engine results should be tuned to be more appropriate. As the exaggerated example above highlights, machine results are not necessarily what any reasonable human would do. I very much doubt that any librarian would respond to a question about suicide methods or flu symptoms as Google does. No reasonable scientific researcher would claim that Google invented this technique. Given that, something is clearly broken and needs fixing.
For another example of this, see the search for “miserable failure”; it no longer has Bush on the top of the page, but it does have a bunch of news stories about how Google took him off the top of the page (cementing him there in the process).
I’ve made some neat efforts, with good results, where I take a search term that found a page I had written, re-run the search, and then use the additional information I find from that search to update my page. A thorough, well-referenced approach like that generally improves search results over time, but it can take a long time.
One interesting question is: by what automated method could a company like Google discern which searches should trigger an ‘intervention’ or a ‘search result that breaks from the statistical model’?
Off the top of my head, ad-hoc, I could make up six searches, three of which shouldn’t trigger any special results, but three of which should:
1. chopin awakening drown suicide themes
2. hollow point bullets
3. gay aids condom
4. drowning painful
5. make hollow point bullets at home
6. aids symptoms
The first obvious suggestion for an automated way to separate 4-6 from 1-3 would be by associating them with what other searches the same searcher performs.
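To make that a bit more concrete, here is a rough Python sketch of what I mean. Everything in it is made up for illustration: the function name `context_score`, the idea of a hand-picked set of `concern_terms`, and the sample search histories are all assumptions of mine, not anything a real search engine is known to do.

```python
def context_score(history, query, concern_terms):
    """Fraction of the searcher's OTHER queries that contain a concern term.

    history       -- list of query strings from one searcher (hypothetical data)
    query         -- the query currently being judged
    concern_terms -- set of lowercase words treated as warning signs
                     (a hand-picked, hypothetical list)
    """
    others = [q for q in history if q != query]
    if not others:
        return 0.0  # no context to go on
    hits = sum(1 for q in others if concern_terms & set(q.lower().split()))
    return hits / len(others)


# Hypothetical histories: a literature student versus someone in distress.
concern_terms = {"hotline", "hopeless"}
student = ["chopin awakening drown suicide themes",
           "kate chopin biography",
           "awakening essay topics"]
troubled = ["drowning painful",
            "crisis hotline number",
            "feeling hopeless"]

student_score = context_score(student, student[0], concern_terms)
troubled_score = context_score(troubled, troubled[0], concern_terms)
```

The point is only that the same alarming word means different things depending on the searches around it; a real system would need something much better than a fixed word list.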
My second idea is a Bayesian approach. The problem would be similar to developing a probabilistic spam filter, and the model would involve updating probabilities associated with word combos as real-world data came in.
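Here is a minimal sketch of that spam-filter idea in Python, assuming a plain naive Bayes model over single words with add-one smoothing (my choice for the sketch; word pairs would be the obvious next step). The class name `QueryFilter` and its methods are hypothetical, and the only training data is my six ad-hoc queries above, labeled the way I split them.

```python
import math
from collections import Counter

# The six ad-hoc queries from the list above: 1-3 should not trigger
# anything special, 4-6 should.
TRAINING = [
    ("chopin awakening drown suicide themes", False),
    ("hollow point bullets", False),
    ("gay aids condom", False),
    ("drowning painful", True),
    ("make hollow point bullets at home", True),
    ("aids symptoms", True),
]


class QueryFilter:
    """Naive Bayes over query words with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.query_counts = Counter()
        self.vocabulary = set()

    def update(self, query, needs_intervention):
        """Fold one labeled query into the counts as new data comes in."""
        words = query.lower().split()
        self.word_counts[needs_intervention].update(words)
        self.query_counts[needs_intervention] += 1
        self.vocabulary.update(words)

    def _log_score(self, words, label):
        counts = self.word_counts[label]
        total = sum(counts.values()) + len(self.vocabulary)
        # Smoothed class prior, so an unseen class never gives log(0).
        seen = sum(self.query_counts.values())
        score = math.log((self.query_counts[label] + 1) / (seen + 2))
        for word in words:
            if word in self.vocabulary:  # words never seen are ignored
                score += math.log((counts[word] + 1) / total)
        return score

    def probability(self, query):
        """Posterior probability that the query should trigger an intervention."""
        words = query.lower().split()
        yes = self._log_score(words, True)
        no = self._log_score(words, False)
        return 1.0 / (1.0 + math.exp(no - yes))


query_filter = QueryFilter()
for text, label in TRAINING:
    query_filter.update(text, label)
```

Calling `query_filter.probability("aids symptoms")` then gives a number between 0 and 1, and each new labeled query passed to `update` shifts it. Six examples is of course far too little data to trust; the sketch only shows the shape of the updating.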
Just in coming up with those six bullshit examples, I realized that the thing linking the ones that seem to require intervention is the intense privacy associated with the concerns. This suggests another avenue: a button on Google that essentially says, “make my results be interventionist.” Or ‘Google Private, a way to get help’, or even a link above typical “suicide methods” search results that reads: “Your search tripped our troubled person filters. Would you like to (privately) view search results that might help?”
How to collect those proposed “helping” search results is a second problem. You could limit results to *.gov, *.org, and *.edu sites. You could again do a statistical association, and up-weight web-sites that are frequently visited by people who also visit suicide hotlines, etc.
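Both of those could be combined in a few lines. The following Python sketch is only an illustration under my own assumptions: `rerank`, the `boost` parameter, and the `co_visit_rate` table (the fraction of a site's visitors who also visit a hotline site) are hypothetical names, and the example.* URLs are placeholders.

```python
from urllib.parse import urlparse

ALLOWED_SUFFIXES = (".gov", ".org", ".edu")


def is_allowed(url):
    """True if the URL's host ends in .gov, .org, or .edu.

    URLs must include a scheme (https://...), otherwise no host is found.
    """
    host = urlparse(url).hostname or ""
    return host.endswith(ALLOWED_SUFFIXES)


def rerank(results, co_visit_rate, boost=2.0):
    """Filter results to the allowed domains and up-weight co-visited hosts.

    results       -- list of (url, base_score) pairs
    co_visit_rate -- dict mapping host to a fraction in [0, 1] (hypothetical
                     statistic: visitors who also visit hotline sites)
    boost         -- how strongly that fraction raises the score (assumed)
    """
    ranked = []
    for url, base_score in results:
        if not is_allowed(url):
            continue
        host = urlparse(url).hostname
        weight = 1.0 + boost * co_visit_rate.get(host, 0.0)
        ranked.append((url, base_score * weight))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Placeholder data: the .com page is dropped, the co-visited .org page rises.
results = [
    ("https://example.com/a", 9.0),
    ("https://help.example.org/x", 1.0),
    ("https://example.edu/y", 1.5),
]
helping = rerank(results, {"help.example.org": 0.5})
```

A domain suffix is a crude proxy for trustworthiness, so in practice the co-visit weighting would probably have to carry most of the load.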
It’s a really interesting observation, Ben. It’s probably worth a full article somewhere. Some information theory journal.