Category Archives: Privacy
I was reading Flowingdata.com this morning and the blog post at the top of the page was about privacy. Nathan Yau says:
With all the stuff going on with surveillance and data privacy — especially the past week — it’s worthwhile to revisit this essay by Daniel J. Solove, a professor of law at George Washington University, on why privacy matters even if you “have nothing to hide.”
So, I went and read the article, which discusses the argument "nothing to hide, nothing to worry about." It seems obvious to me that this argument is weak, but in case you need some quick counter-arguments, here is a list of responses from that article:
- My response is “So do you have curtains?” or “Can I see your credit-card bills for the last year?”
- So my response to the “If you have nothing to hide … ” argument is simply, “I don’t need to justify my position. You need to justify yours. Come back with a warrant.”
- I don’t have anything to hide. But I don’t have anything I feel like showing you, either.
- If you have nothing to hide, then you don’t have a life.
- Show me yours and I’ll show you mine.
- It’s not about having anything to hide, it’s about things not being anyone else’s business.
- Bottom line, Joe Stalin would [have] loved it. Why should anyone have to say more?
These are all great counter arguments to the “nothing to hide, nothing to worry about” argument, but I’d like to add another: Privacy is a basic human right. I realize that sounds dramatic, but go read the Universal Declaration of Human Rights for yourself.
Article 12: No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honour and reputation. Everyone has the right to the protection of the law against such interference or attacks.
And why are human rights important? From the preamble:
Whereas recognition of the inherent dignity and of the equal and inalienable rights of all members of the human family is the foundation of freedom, justice and peace in the world.
So, according to the United Nations, one of your basic human rights is your right to privacy, and human rights are the foundation of freedom, justice, and peace. That’s a big deal. A really big deal. Let me repeat this one more time: Your privacy is a big deal.
The UN does add the caveat of "arbitrary interference" because of course there are situations where you lose the right to privacy. So what does arbitrary mean? Just to be sure, I looked up the definition. Google returns two definitions. The first is "Based on random choice or personal whim, rather than any reason or system" and the second is "(of power or a ruling body) Unrestrained and autocratic in the use of authority." From what little I have read about the NSA privacy stuff, it sounds like these definitions describe exactly this situation. The "51% sure" that a person is foreign sure sounds a lot like "personal whim" to me. Whether or not this data collection is an "arbitrary interference" is a debate that we need to have as a society. Where is the line?
The issue of privacy, and where this line lies, is particularly interesting to me because when I was in graduate school I wrote my dissertation on statistical disclosure control. This involves attempting to balance the dissemination of useful data to researchers against the privacy of the individual. In terms of what I was studying, the benefit of data dissemination is useful research, often on topics of public health, whereas in the NSA situation the alleged benefit is in fighting terrorism. In both cases, the risk is the erosion of individual privacy. In public health, neither extreme is an acceptable solution. Either you release none of the data and have perfect privacy at the cost of halting scientific research, or you release all of the data to anyone who wants to use it for research (or any other purpose!) and achieve maximum utility at the cost of a total lack of privacy. Some balance between these two must be struck. I think it's probably the same with the NSA and the government: some balance must be struck between fighting terrorists, who are a real and legitimate threat, and the ideals that we have as citizens of the United States and as human beings.
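A standard disclosure-control illustration of this tradeoff is noise addition: perturb a released statistic enough to mask any individual contribution while keeping the aggregate useful. Here is a minimal sketch; the salary figures and noise scales are invented for illustration, and nothing here is a calibrated privacy guarantee.

```python
import random

def noisy_mean(values, noise_scale):
    """Release the mean of `values` perturbed with Gaussian noise.

    A larger noise_scale means more privacy for individuals but
    less utility for researchers. Choosing the scale IS the policy
    debate: neither extreme (0 or huge) is an acceptable answer.
    """
    true_mean = sum(values) / len(values)
    return true_mean + random.gauss(0, noise_scale)

salaries = [42_000, 51_000, 48_000, 95_000, 39_000]  # made-up data
release_low_privacy = noisy_mean(salaries, noise_scale=100)      # near-exact
release_high_privacy = noisy_mean(salaries, noise_scale=20_000)  # heavily masked
```

With `noise_scale=0` you are at the "release everything" extreme; crank the scale up far enough and the released number is pure noise, which is the "release nothing" extreme in disguise.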
One of the difficulties in having this debate, and again I’ll make a comparison to my research, is that quantifying utility and privacy in many situations is very difficult. This makes it hard to really quantify the costs and benefits associated with decisions on where to draw the line when releasing data to researchers. This is likely even more difficult to quantify with what the NSA is doing. If we could easily measure how much terrorism is prevented versus how much privacy we are losing, society could have a debate and reach some sort of conclusion. But we don’t have these measures. (One potential way to measure terrorism would be in attacks prevented, but it’s virtually impossible to do this with any accuracy. And measuring privacy might be even more challenging.) If this were an economic problem we could weigh the costs versus the benefits on some monetary scale. If this were drug testing we would weigh the benefits of the drug versus the side-effects. But in this situation there is no clear way to measure either of these concepts in an objective way, and even if we could, everyone has a different idea about where the bar should be set.
So ultimately, as a society, we're in a situation where we need to have a debate about privacy protection versus terrorism prevention, where both sides have important concerns that are often in conflict with one another. We need to decide where to tip the scale so that we balance these two things in a meaningful way that respects our American and international ideals of freedom, justice, and peace. But we need to do this without any objective scale. It's an extremely difficult problem, and it's only going to get more complicated as we move further into a world dominated by data.
In 2002, “Eric” and “Lola” put an end to their decade-long de facto union. (These are pseudonyms used by the media, because Canadian law forbids the publication of the couple’s real names to protect the privacy of their children.) Lola, a Latin American woman, met Eric, a world-famous billionaire, when she was only 17 and he was 32. Although she wanted to get married throughout their relationship, Eric, who claims like many Quebecois that he “doesn’t believe in marriage,” refused.
Now, speaking as someone who wrote a dissertation on statistical disclosure control, this is not good. Authors in this area often give examples of why you can't release certain characteristics about an individual if you want to maintain privacy. Common examples include releasing occupation information for someone with a very rare occupation (e.g., Senator, Prime Minister) or salary information for someone with a very large salary. Knowing this type of information severely limits the potential pool of people who could possibly be the target. Further, the more details you have about an individual (e.g., age, location, gender, birth date), the easier it is to identify that person.
Here is the information from just that one paragraph on Slate (to be clear, I am not criticizing Slate at all; but if Canada really values privacy, they need to think a little harder than this):
- “Eric” is a billionaire (a famous one, even).
- “Eric” was 32 in 1992. (This means he was born in 1959 or 1960 and is about 52 now.)
- “Eric” is from Quebec.
- “Eric” is a man.
- “Lola” is Latin American.
- “Lola” was 17 in 1992.
- “Lola” dated “Eric” for ten years.
- “Lola” is a woman.
~~I'm not going to do this right now, but I bet I could~~ You can figure out who "Eric" is in about ~~15~~ 4 minutes (according to commenter Brandon) with nothing more than Google. (Here is a good place to start.)
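Each item in that list acts as a filter, and intersecting a handful of filters can shrink a pool of millions to a handful of candidates. A toy sketch of this narrowing, with a made-up five-person "population" (every record here is invented for illustration):

```python
# Toy illustration: each released quasi-identifier filters the candidate pool.
population = [
    {"name": "A", "billionaire": True,  "region": "Quebec",  "sex": "M", "born": 1960},
    {"name": "B", "billionaire": True,  "region": "Ontario", "sex": "M", "born": 1960},
    {"name": "C", "billionaire": False, "region": "Quebec",  "sex": "M", "born": 1960},
    {"name": "D", "billionaire": True,  "region": "Quebec",  "sex": "F", "born": 1958},
    {"name": "E", "billionaire": True,  "region": "Quebec",  "sex": "M", "born": 1959},
]

# Apply the released attributes one on top of another.
candidates = [
    p for p in population
    if p["billionaire"]
    and p["region"] == "Quebec"
    and p["sex"] == "M"
    and p["born"] in (1959, 1960)
]

print(len(candidates))  # the pool collapses from 5 to 2
```

In the real world the starting pool is large, but the list of famous Quebecois billionaire men born around 1959-1960 is not, which is exactly why the attributes above are dangerous to publish together.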
Canada, here are some articles you might be interested in:
- “Anonymized” data really isn’t—and here’s why not
- Exposed: The erosion of privacy in the Internet era
- What Information is “Personally Identifiable”?
My friend Scot recently sent me a g-chat about a new search engine DuckDuckGo. According to their website “DuckDuckGo is a general purpose search engine like Google or Bing.” They then offer four bullet points:
- Get way more instant answers
- Less spam and clutter
- Lots and lots of goodies
- Real Privacy
Those first three sound interesting, but what really piqued my interest was the fourth bullet: Real Privacy. DuckDuckGo will not collect any of your browsing information, which could otherwise be used to identify you and potentially reveal what you are searching for. Many people might not have a problem with this, but DuckDuckGo offers a very nice illustrated example of why it is potentially a problem. They go on to say in their privacy statement:
“It’s sort of creepy that people at search engines can see all this info about you, but that is not the main concern. The main concern is when they either a) release it to the public or b) give it to law enforcement.”
“Why would they release it to the public? AOL famously released supposedly anonymous search terms for research purposes, except they didn’t do a good job of making them completely anonymous, and they were ultimately sued over it. In fact, almost every attempt to anonymize data has similarly been later found out to be much less anonymous than initially thought.”
That last line is particularly interesting. Two examples that immediately come to mind are the GIC insurance example and the Netflix Prize example. The GIC, AOL, and Netflix examples all released data to the public for research purposes. And in each case, the releasing organization realized that it could not simply release the raw data because of privacy concerns. They needed to do something to anonymize the data, so they did something (I've talked about this doing something before). But in every case, the supposedly anonymous data turned out to be, to varying degrees, less anonymous than originally thought. Simple ad hoc procedures like deleting information simply don't work to protect privacy unless you live under a rock and have no access to auxiliary information. The only way to be completely safe is to release no data at all. However, releasing nothing to the public prevents valuable research from being done. While GIC, AOL, and Netflix all released data that ended up being less than anonymous, you have to applaud their effort to allow researchers to do what they do: research. The Netflix Prize produced plenty of valuable research (Lessons from the Netflix Prize Challenge) and the GIC data had the potential to produce valuable, potentially life-saving public health research. Like most things in life, some balance must be found between the extremes, and the potential benefits of any research must be weighed against the potential costs of a privacy breach.
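The reason deleting names doesn't work is auxiliary information: an attacker joins the "anonymized" release to some public dataset on the quasi-identifiers both share. A toy version of this linkage attack, with entirely invented records and names, might look like this:

```python
# "Anonymized" release: names deleted, quasi-identifiers kept.
medical = [
    {"zip": "02138", "dob": "1945-07-31", "sex": "F", "diagnosis": "X"},
    {"zip": "02139", "dob": "1971-02-14", "sex": "M", "diagnosis": "Y"},
]

# Public auxiliary data (think: a voter roll) with names attached.
voters = [
    {"name": "Alice", "zip": "02138", "dob": "1945-07-31", "sex": "F"},
    {"name": "Bob",   "zip": "02139", "dob": "1971-02-14", "sex": "M"},
]

# Join on the shared quasi-identifiers: deleting names didn't help.
reidentified = [
    (v["name"], m["diagnosis"])
    for m in medical for v in voters
    if (m["zip"], m["dob"], m["sex"]) == (v["zip"], v["dob"], v["sex"])
]
```

When the (zip, birth date, sex) combination is unique in both datasets, the join recovers exactly the name-to-diagnosis mapping the deletion was supposed to destroy, which is the pattern behind the GIC, AOL, and Netflix incidents alike.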
I see it like this: If you have something of value in your house, you wouldn't leave the door wide open; you'd lock the door. But no matter what, if someone really wants to, they can break into your house with enough effort. Either way, it's still illegal/unethical. It's the job of statisticians to put as many locks on the house as possible while still being able to reasonably use the house; it's the lawyer's job to prosecute people who break into the house, whether or not the door is well secured.