Fun with the AOL Data Leak
Last week AOL did another stupid thing, but at least it was in the name of science. The giant Web portal released a data chunk containing three months' worth of queries to its search engine, taken from roughly half a million users. Collected in March, April, and May, the data shows the queries, their date and time, and which Web sites each user ultimately visited. The idea was that this information might be of some use to researchers.
To protect user privacy, AOL replaced the log-in names of searchers with numbers. So you could still see everything that searcher #4356 looked for, but you wouldn't know who #4356 was. There's just one problem: it's incredibly easy to figure out who people are based on their searches, because they tend to look for themselves, their family members, and things in their immediate geographical vicinity. The New York Times did a great story in which reporters examined searches done by user #4417749 and within hours managed to locate their author, a nice old lady in Georgia who now plans to cancel her AOL subscription.
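Why doesn't swapping names for numbers help much? A number that stays the same for every query still bundles all of one person's searches together. AOL's exact procedure isn't spelled out here, so the following is only a minimal sketch, assuming the simplest scheme: each log-in name maps to one fixed number. The function names, log-in names, and queries are all hypothetical, invented for illustration.

```python
from collections import defaultdict
from itertools import count

def pseudonymize(log):
    """Replace each log-in name with a sequential number, reusing it for every query."""
    ids = {}           # hypothetical mapping: log-in name -> number
    next_id = count(1)
    released = []
    for login, query in log:
        if login not in ids:
            ids[login] = next(next_id)
        released.append((ids[login], query))
    return released

def queries_by_user(released):
    """Group the 'anonymized' queries back into one profile per number."""
    profiles = defaultdict(list)
    for user_id, query in released:
        profiles[user_id].append(query)
    return dict(profiles)

# Hypothetical log: (log-in name, query) pairs.
log = [
    ("jsmith", "landscapers in springfield"),
    ("kdoe", "pokemon pictures"),
    ("jsmith", "jane smith"),
    ("jsmith", "smith family reunion springfield"),
]

released = pseudonymize(log)
profiles = queries_by_user(released)
for user_id, queries in profiles.items():
    print(user_id, queries)
```

The names are gone, but user 1's profile still contains a self-search, a family name, and a hometown, which is exactly the trail the Times reporters followed.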
Bloggers and privacy advocates have pointed out that the information AOL released contains more than just the online search patterns of innocent Georgia ladies. It's unclear what law enforcement might do with the thousands of searches for illegal drugs and pornography. It's equally unclear what the feds will make of the handful of searches for "Muslim death rituals," "Muslim brotherhood," and "Islamic militant web forums." In a nation where the government is seriously contemplating blanket warrants for online surveillance, it's hard to imagine there aren't law enforcement types combing this treasure trove of prepackaged personal data. Imagine getting enough dirt on somebody to haul him or her in for questioning just by downloading 400 megabytes of stuff from AOL! That's like free candy.
After public outcry reached a crescendo, AOL apologized and took the data down. Of course, privacy advocates like the Electronic Privacy Information Center's Marc Rotenberg and the Electronic Frontier Foundation's Kurt Opsahl remain pissed off. Why? Because this is the Interweb, folks. Data never dies here. In fact, you can search the records yourself via DontDelete.com.
Once I visited Don't Delete, I couldn't leave. There's a button you can click to get the search terms from a random user, and every time I hit it, I got another gem. My favorite was user #4206444, obviously a college student trying to cheat on his or her exams as quickly as possible in order to get around to the more important things in life. Search phrases like "does social darwinism persist in social welfare policies and in the attitudes of the general public about social welfare" were followed by "free essays on adolescent depression and suicide risks" and "free essays on Charles Dickens Hard Times." In between these queries were hundreds of searches for "sailor moon pictures," "pokemon pictures," "sonic x," and "selena pictures."
As blogger Thomas Claburn points out, there's a kind of poetry to some of the queries. He excerpts a dozen lines from the 8,200 queries made by user #23187425, which together read like a conversation this person was having with the search engine. He or she never actually clicked on any links but just kept querying with plaintive phrases like "i have had trouble," "i want to change," and "i know who i am."
I'm torn. I love having access to this data, both for its touching human qualities and for the kinds of anthropological information it could yield. But as someone who believes strongly in digital privacy, I simply can't sanction what AOL did. It would be different if I had faith that discovering all those porn searches would somehow inspire people to accept that sexual curiosity is normal. And it would be different if I thought that law enforcement would consider that the people searching for "Islamic militant web forums" might simply be trying to understand the world. But I don't. This data will be used to "prove" that the Internet is crawling with child pornographers and terrorists.
Someday AOL's information should be put into the public domain for the anthropologists and cultural researchers of the future. That future, however, is probably decades if not a century away. The data is too close to us now, too easily weaponized. Nevertheless, I hold out hope that one day our search queries will illuminate us and give another generation a digital outline of our daily desires.