Oh, Canada / Privacy?
I was reading this article on Slate about a billionaire and his “de-facto spouse”. I came across this paragraph (emphasis added):
In 2002, “Eric” and “Lola” put an end to their decade-long de facto union. (These are pseudonyms used by the media, because Canadian law forbids the publication of the couple’s real names to protect the privacy of their children.) Lola, a Latin American woman, met Eric, a world-famous billionaire, when she was only 17 and he was 32. Although she wanted to get married throughout their relationship, Eric, who claims like many Quebecois that he “doesn’t believe in marriage,” refused.
Now, speaking as someone who wrote a dissertation about statistical disclosure control, this is not good. Authors often give examples of why you can’t release certain characteristics about an individual if you want to maintain privacy. Common examples include releasing occupation information for someone with a very rare occupation (e.g. Senator, Prime Minister) or releasing salary information about someone with a very large salary. Knowing this type of information severely limits the potential pool of people who could possibly be the target. Further, the more details you have about an individual, the easier it is to identify that person (e.g. age, location, gender, birth date).
Here is the information from just that one paragraph on Slate (I should be clear here: I am not criticizing Slate at all; if Canada really values privacy, though, it needs to think a little harder than this):
- “Eric” is a billionaire (a famous one, even).
- “Eric” was 32 in 1992. (This means he was born in 1959 or 1960 and is about 52 now.)
- “Eric” is from Quebec.
- “Eric” is a man.
- “Lola” is Latin American.
- “Lola” was 17 in 1992.
- “Lola” dated “Eric” for ten years.
- “Lola” is a woman.
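To make the point concrete, here is a minimal sketch of how each attribute in the list above shrinks the pool of candidates. The population, the names, and the `narrow` and `possible_birth_years` helpers are hypothetical, made up for illustration. When the pool drops to a single record, that record is unique on those quasi-identifiers (in k-anonymity terms, k = 1). The birth-year helper encodes the same arithmetic as the second bullet: someone who was 32 at some point in 1992 was born in 1959 or 1960.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    name: str
    sex: str
    province: str
    birth_year: int
    net_worth_usd: float

# Hypothetical toy population; NOT real data.
POPULATION = [
    Person("A", "M", "QC", 1960, 2.5e9),
    Person("B", "M", "QC", 1960, 4.0e5),
    Person("C", "F", "QC", 1959, 1.2e9),
    Person("D", "M", "ON", 1959, 3.0e9),
    Person("E", "M", "QC", 1971, 1.1e9),
    Person("F", "M", "QC", 1959, 8.0e4),
]

def possible_birth_years(age, in_year):
    """Anyone who was `age` at some point during `in_year` was born in one of these years."""
    return {in_year - age - 1, in_year - age}

def narrow(population, predicates):
    """Apply each quasi-identifier filter in turn, recording how many candidates remain."""
    remaining = list(population)
    sizes = [len(remaining)]
    for pred in predicates:
        remaining = [p for p in remaining if pred(p)]
        sizes.append(len(remaining))
    return remaining, sizes

years = possible_birth_years(32, 1992)
filters = [
    lambda p: p.sex == "M",            # "Eric" is a man
    lambda p: p.province == "QC",      # "Eric" is from Quebec
    lambda p: p.birth_year in years,   # "Eric" was 32 in 1992
    lambda p: p.net_worth_usd >= 1e9,  # "Eric" is a billionaire
]
matches, sizes = narrow(POPULATION, filters)
print(sizes)                       # [6, 5, 4, 3, 1]
print([p.name for p in matches])   # ['A']
```

With a real population the starting pool is in the millions, but the shape is the same: a handful of innocuous-looking facts, combined, can leave exactly one person.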
I’m not going to do this right now, but I bet you can figure out who “Eric” is in about 4 minutes (according to commenter Brandon) with nothing more than Google. (Here is a good place to start.)
Canada, here are some articles you might be interested in:
- “Anonymized” data really isn’t—and here’s why not
- Exposed: The erosion of privacy in the Internet era
- What Information is “Personally Identifiable”?
Cheers.
MLB Rankings – 5/23/2012
StatsInTheWild MLB rankings as of May 23, 2012 at 11:51pm. SOS=strength of schedule
| Team | Rank | Change | Record | ESPN rank | TeamRankings.com rank | SOS rank |
|---|---|---|---|---|---|---|
| Texas | 1 | – | 27-18 | 3 | 2 | 14 |
| Baltimore | 2 | ↑3 | 28-17 | 2 | 1 | 5 |
| Toronto | 3 | ↑4 | 24-21 | 8 | 5 | 3 |
| TampaBay | 4 | ↑4 | 27-18 | 5 | 4 | 4 |
| LADodgers | 5 | ↓1 | 30-13 | 1 | 3 | 30 |
| Boston | 6 | ↑7 | 22-22 | 16 | 9 | 2 |
| Atlanta | 7 | ↓5 | 26-19 | 4 | 7 | 19 |
| NYYankees | 8 | ↑1 | 23-21 | 10 | 10 | 1 |
| St. Louis | 9 | ↓6 | 24-19 | 6 | 13 | 29 |
| Washington | 10 | ↓4 | 26-18 | 7 | 6 | 17 |
| Cleveland | 11 | ↑10 | 25-18 | 9 | 8 | 12 |
| ChicagoWSox | 12 | ↑7 | 22-22 | 17 | 15 | 13 |
| Miami | 13 | ↓3 | 24-20 | 11 | 12 | 16 |
| Seattle | 14 | ↑6 | 21-25 | 25 | 17 | 8 |
| LAAngels | 15 | ↑7 | 20-25 | 20 | 22 | 6 |
| Cincinnati | 16 | ↓4 | 24-19 | 13 | 11 | 28 |
| Oakland | 17 | ↓3 | 22-23 | 19 | 14 | 7 |
| Detroit | 18 | – | 20-23 | 18 | 18 | 11 |
| Philadelphia | 19 | ↓4 | 22-23 | 14 | 24 | 18 |
| Houston | 20 | ↓6 | 21-23 | 23 | 20 | 25 |
| NYMets | 21 | ↓10 | 24-20 | 15 | 16 | 15 |
| SanFrancisco | 22 | ↓5 | 23-21 | 12 | 19 | 27 |
| Kansas City | 23 | ↑6 | 17-26 | 24 | 23 | 10 |
| Arizona | 24 | ↓1 | 19-25 | 22 | 25 | 23 |
| Pittsburgh | 25 | ↓1 | 20-24 | 21 | 21 | 26 |
| Milwaukee | 26 | ↑1 | 18-26 | 26 | 28 | 24 |
| Colorado | 27 | ↓1 | 16-27 | 27 | 30 | 21 |
| Minnesota | 28 | ↑2 | 15-28 | 29 | 26 | 9 |
| San Diego | 29 | – | 16-28 | 30 | 28 | 20 |
| Chicago Cubs | 30 | ↓5 | 15-29 | 28 | 29 | 22 |
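The post doesn't say how the SOS column is computed, so the following is only a sketch of one common definition, not the method behind this table: a team's strength of schedule is the mean winning percentage of the opponents it has played, counted once per game, and teams are ranked so that 1 is the toughest schedule (which appears to match the table, where the Yankees are 1 and the Dodgers are 30). The three-team league and its records are hypothetical; refinements such as excluding games against the team itself from its opponents' records are left out.

```python
from collections import defaultdict

def win_pct(wins, losses):
    return wins / (wins + losses)

def strength_of_schedule(matchups, records):
    """Mean winning percentage of each team's opponents, one entry per game played.

    matchups: list of (team_a, team_b) games; each game counts for both teams.
    records:  dict mapping team -> (wins, losses).
    """
    opponent_pcts = defaultdict(list)
    for a, b in matchups:
        opponent_pcts[a].append(win_pct(*records[b]))
        opponent_pcts[b].append(win_pct(*records[a]))
    return {team: sum(p) / len(p) for team, p in opponent_pcts.items()}

def sos_rank(sos):
    """Rank teams so that 1 = toughest schedule (highest opponent win pct)."""
    ordered = sorted(sos, key=sos.get, reverse=True)
    return {team: i + 1 for i, team in enumerate(ordered)}

# Hypothetical three-team league; NOT real 2012 data.
records = {"A": (6, 2), "B": (4, 4), "C": (2, 6)}
matchups = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B")]
sos = strength_of_schedule(matchups, records)
ranks = sos_rank(sos)
print(ranks)  # {'C': 1, 'B': 2, 'A': 3}
```

Team C has the toughest schedule simply because it never plays itself, and itself is the weakest team in the league.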
Past Rankings: 5/14/2012, 5/7/2012, 4/30/2012, 4/23/2012, 4/16/2012, 4/13/2012
Cheers.
The Harvard Sports Analysis Collective
By Ben Blatt
In 1964, Mosteller and Wallace published Inference and Disputed Authorship: The Federalist. The book used statistical analysis to try to determine whether James Madison, Alexander Hamilton, or John Jay was the author of the unattributed essays that were part of The Federalist Papers. They approached this historical mystery using differences in word frequencies and Bayesian statistics. Although these methods are controversial, similar approaches have been used to investigate other authorship debates, such as those over Shakespeare’s sonnets and plays. The same can be done for sports articles. While it would certainly be easier to look at the author’s name right underneath the title than to perform a statistical analysis of authorship, I thought it would be fun anyway.
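The word-frequency idea can be sketched as a simple Poisson naive-Bayes comparison: count a few marker words in a text, score the counts under each candidate author's per-word rates, and take the difference of log-likelihoods as the log odds (assuming equal prior odds). Mosteller and Wallace used considerably more careful models, including negative binomial rates; the sketch below is a simplification, and the `RATES` table is entirely made up for illustration. The choice of words like “upon” and “whilst” reflects the well-known Hamilton/Madison markers, but the numbers attached to them are hypothetical.

```python
import math
import re

# Hypothetical per-1,000-word rates for a few marker words. These numbers are
# made up for illustration; they are NOT Mosteller and Wallace's estimates.
RATES = {
    "hamilton": {"upon": 3.0, "whilst": 0.1, "enough": 0.6},
    "madison": {"upon": 0.2, "whilst": 0.5, "enough": 0.1},
}

def poisson_log_pmf(k, lam):
    """log P(K = k) for K ~ Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def log_likelihood(text, author):
    """Log-likelihood of the marker-word counts in `text` under `author`'s rates."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        raise ValueError("text contains no words")
    thousands = len(words) / 1000  # expected counts scale with text length
    return sum(
        poisson_log_pmf(words.count(w), rate * thousands)
        for w, rate in RATES[author].items()
    )

def log_odds(text, a="hamilton", b="madison"):
    """Log posterior odds of author a vs author b, assuming equal prior odds."""
    return log_likelihood(text, a) - log_likelihood(text, b)

essay = "upon " * 5 + "the " * 995
print(log_odds(essay) > 0)  # True: frequent "upon" favors the Hamilton rates
```

Swapping in sports writers and their pet phrases gives the same machinery for the sports-article version of the question.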
Math teachers have been using the “German Tank Problem” for a while to teach estimators. It goes something like this:
The Allies capture five German tanks. Suppose that the serial numbers on the tanks are 15, 23, 59, 83, and 109. Provide an estimate of the number of tanks that were produced.
You can see that your estimate would be a great deal lower than the estimate if the serial numbers were 1015, 2394, and 9438.
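Here is a minimal sketch of one standard frequentist answer, assuming the serial numbers run from 1 to N and the captured tanks are a uniform random sample without replacement. With k tanks captured and largest observed serial m, the minimum-variance unbiased estimator is `m + m/k - 1`. The excerpt doesn't show which formula the Guardian piece used, so treat this as an illustration of the idea rather than that formula.

```python
def german_tank_estimate(serials):
    """Minimum-variance unbiased estimate of the total N, assuming serials run 1..N
    and the captured tanks are a uniform random sample without replacement."""
    k = len(serials)
    m = max(serials)
    return m + m / k - 1

print(round(german_tank_estimate([15, 23, 59, 83, 109]), 1))  # 129.8
print(round(german_tank_estimate([1015, 2394, 9438]), 1))     # 12583.0
```

Only the largest serial and the sample size matter; the `m/k - 1` term corrects for the fact that the largest serial you happened to capture is almost certainly below the true maximum.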
The story and accompanying math problem were written about in the Guardian. My favorite line, which turned out to be true, was this:
The statisticians believed that the Germans, being Germans, had logically numbered their tanks in the order in which they were produced.
It turns out that the statisticians were spot on:
By using this formula, statisticians reportedly estimated that the Germans produced 246 tanks per month between June 1940 and September 1942…
I’d like to see density estimates added to the histograms, but this is interesting.
There are currently 174 books on my Amazon wishlist that I could order directly from Amazon. (My wishlist has a total of 195 books, but 21 are only available from other sellers.) Total price is approximately $3,549 (I rounded all prices to whole dollars), for a mean of approximately $20 per book.
But the median price of a book on my wishlist is (again to the nearest whole dollar) $16; the difference between the median and the mean is a hint that the distribution is skewed. And there are actually two peaks — one centered on $10 and one centered on $16-17. The distribution looks like this:
I’ve cut off the histogram at $100, which omits Mitchell’s Machine Learning at a list price of $168.16. Here’s a zoomed-in version omitting the 23 most expensive (all those over $30):
The two peaks are easy to explain: paperbacks and hardcovers, respectively. The…
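Adding a density estimate to a histogram like the ones above is straightforward. Below is a minimal sketch of a Gaussian kernel density estimate using Silverman’s rule-of-thumb bandwidth, with only the standard library (`statistics.quantiles` needs Python 3.8+). The prices are hypothetical, chosen to mimic a paperback cluster, a hardcover cluster, and a long right tail; they are not the wishlist data. Whether two separate peaks show up depends heavily on the bandwidth, so it is worth passing a smaller `bandwidth` by hand and comparing.

```python
import math
import statistics

def silverman_bandwidth(xs):
    """Silverman's rule of thumb: h = 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    spread = min(statistics.stdev(xs), (q3 - q1) / 1.34)
    return 0.9 * spread * len(xs) ** (-0.2)

def gaussian_kde(xs, bandwidth=None):
    """Return a function x -> estimated density, using a Gaussian kernel at each point."""
    h = silverman_bandwidth(xs) if bandwidth is None else bandwidth
    scale = 1.0 / (len(xs) * h * math.sqrt(2 * math.pi))
    def density(x):
        return scale * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs)
    return density

# Hypothetical wishlist prices in whole dollars (made up, NOT the post's data).
prices = [8, 9, 10, 10, 10, 11, 12, 15, 16, 16, 17, 17, 18, 25, 45, 168]
print(statistics.mean(prices))    # 25.4375
print(statistics.median(prices))  # 15.5  (mean > median: right skew)

density = gaussian_kde(prices)
for x in range(0, 41, 2):
    # crude text plot of the estimated density between $0 and $40
    print(f"${x:>3} " + "#" * round(400 * density(x)))
```

Using the IQR-based spread keeps a single $168 outlier from inflating the bandwidth the way the standard deviation alone would.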
Shan Carter (and an army of others) share some sketches from the NYT electoral map
via @freakonometrics: Shan Carter (and an army of others) share some sketches from the NYT electoral map
Cheers.
What a great quote: “As an employer I want the best prepared and qualified employees. I could care less if the source of their education was accredited by a bunch of old men and women who think they know what is best for the world. I want people who can do the job. I want the best and brightest. Not a piece of paper.” -Mark Cuban
This is what I see when I think about higher education in this country today:
Remember the housing meltdown? Tough to forget, isn’t it? The formula for the housing boom and bust was simple: a lot of easy money being lent to buyers who couldn’t afford the money they were borrowing. That money was then spent on homes with the expectation that the price of the home would go up and it could easily be flipped or refinanced at a profit. Who cares if you couldn’t afford the loan? As long as prices kept going up, everyone was happy. And prices kept going up. And as long as prices kept going up, real estate agents kept selling homes and finding money for buyers.
Until the easy money stopped. When easy money stopped, buyers couldn’t sell. They couldn’t refinance. First sales slowed, then prices started falling…

