Duck Duck Go
My friend Scot recently sent me a g-chat about a new search engine, DuckDuckGo. According to their website, “DuckDuckGo is a general purpose search engine like Google or Bing.” They then offer four bullet points:
- Get way more instant answers
- Less spam and clutter
- Lots and lots of goodies
- Real Privacy
Those first three sound interesting, but what really piqued my interest was the fourth bullet: Real Privacy. DuckDuckGo will not collect any of your browsing information, which could otherwise be used to identify you and potentially reveal what you are searching for. Many people might not have a problem with that kind of collection, but DuckDuckGo offers a very nice illustrated example of why it is potentially a problem. They go on to say in their privacy statement:
“It’s sort of creepy that people at search engines can see all this info about you, but that is not the main concern. The main concern is when they either a) release it to the public or b) give it to law enforcement.”
“Why would they release it to the public? AOL famously released supposedly anonymous search terms for research purposes, except they didn’t do a good job of making them completely anonymous, and they were ultimately sued over it. In fact, almost every attempt to anonymize data has similarly been later found out to be much less anonymous than initially thought.”
That last line is particularly interesting. Two examples that immediately come to mind are the GIC insurance data and the Netflix Prize. In all three cases, GIC, AOL, and Netflix, the organization released data to the public for research purposes. And in each case, the releasing organization recognized that it could not simply release the raw data because of privacy concerns. It needed to do something to anonymize the data, so it did something (I’ve talked about this doing something before). But in every case, the supposedly anonymous data turned out to be, to varying degrees, less anonymous than originally thought. Simple ad hoc procedures like deleting identifying information simply don’t protect privacy unless you live under a rock and attackers have no access to auxiliary information.

The only way to be completely safe is to release no data at all. However, releasing nothing to the public prevents valuable research from being done. While GIC, AOL, and Netflix all released data that ended up being less than anonymous, you have to applaud their effort to let researchers do what they do: research. The Netflix Prize produced plenty of valuable research (Lessons from the Netflix Prize Challenge), and the GIC data had the potential to produce valuable, potentially life-saving public health research. Like most things in life, some balance must be found between the extremes, and the potential benefits of any research must be weighed against the potential costs of a privacy breach.
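To make the “deleting information doesn’t work” point concrete, here is a toy sketch of a linkage attack in the spirit of the GIC re-identification: names are stripped from a dataset, but quasi-identifiers (ZIP code, birthdate, sex) remain, and joining against a public auxiliary dataset recovers identities. All names and records here are made up for illustration.

```python
# Toy linkage attack: removing names does not anonymize a dataset when
# quasi-identifiers survive and auxiliary data is publicly available.
# All data below is fabricated for illustration.

# "Anonymized" medical records: names deleted, quasi-identifiers kept.
medical = [
    {"zip": "02138", "dob": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "dob": "1972-03-12", "sex": "M", "diagnosis": "flu"},
]

# Public auxiliary data (think: a voter roll) with the same fields plus names.
voters = [
    {"name": "Jane Doe", "zip": "02138", "dob": "1945-07-31", "sex": "F"},
    {"name": "John Roe", "zip": "02139", "dob": "1972-03-12", "sex": "M"},
]

def reidentify(records, auxiliary):
    """Link records on (zip, dob, sex); a unique match reveals an identity."""
    matches = []
    for rec in records:
        key = (rec["zip"], rec["dob"], rec["sex"])
        hits = [a["name"] for a in auxiliary
                if (a["zip"], a["dob"], a["sex"]) == key]
        if len(hits) == 1:  # exactly one candidate: the record is de-anonymized
            matches.append((hits[0], rec["diagnosis"]))
    return matches

print(reidentify(medical, voters))
```

When the quasi-identifier combination is unique in the auxiliary data, the “anonymous” record is linked straight back to a name, which is why deleting the name column alone is not a meaningful lock on the door.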
I see it like this: if you have something of value in your house, you wouldn’t leave the door wide open; you’d lock it. But no matter what, someone who really wants to can break into your house with enough effort, and either way it’s still illegal/unethical. It’s the statistician’s job to put as many locks on the house as possible while still being able to reasonably use the house; it’s the lawyer’s job to prosecute people who break in, whether or not the door was well secured.