The size of things

May 24

Posted by statsinthewild

http://htwins.net/scale2/

Cheers.

Posted in Science

Oh, Canada / Privacy?

May 24

Posted by statsinthewild

I was reading this article on Slate about a billionaire and his “de-facto spouse”. I came across this paragraph (emphasis added):

In 2002, “Eric” and “Lola” put an end to their decade-long de facto union. (These are pseudonyms used by the media, because Canadian law forbids the publication of the couple’s real names to protect the privacy of their children.) Lola, a Latin American woman, met Eric, a world-famous billionaire, when she was only 17 and he was 32. Although she wanted to get married throughout their relationship, Eric, who claims like many Quebecois that he “doesn’t believe in marriage,” refused.

Now, speaking as someone who wrote a dissertation about statistical disclosure control, this is not good. Authors in this area often give examples of why you can’t release certain characteristics about an individual if you want to maintain privacy. Common examples include releasing occupation information for someone who has a very rare occupation (e.g. Senator, Prime Minister) or releasing salary information on someone who has a very large salary. Knowing this type of information severely limits the potential pool of people who could possibly be the target. Further, the more details you have about an individual (e.g. age, location, gender, birth date), the easier it is to identify that person.
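To make the shrinking-pool idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Person` record, the `candidates` and `pool_sizes` helpers, and the six synthetic records are all made up and describe no real people. It simply counts how many records still match as each attribute is disclosed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical quasi-identifiers; not taken from any real data set.
    occupation: str
    province: str
    sex: str
    birth_year: int


# Entirely synthetic records, invented for this illustration.
population = [
    Person("teacher", "Quebec", "M", 1960),
    Person("teacher", "Quebec", "F", 1960),
    Person("teacher", "Ontario", "M", 1960),
    Person("nurse", "Quebec", "M", 1959),
    Person("senator", "Quebec", "M", 1960),
    Person("teacher", "Quebec", "M", 1975),
]


def candidates(records, **known):
    """Return the records consistent with every disclosed attribute."""
    return [
        r for r in records
        if all(getattr(r, name) == value for name, value in known.items())
    ]


def pool_sizes(records, known):
    """Return (attribute, pool size) pairs as attributes are disclosed in order."""
    revealed = {}
    sizes = []
    for name, value in known.items():
        revealed[name] = value
        sizes.append((name, len(candidates(records, **revealed))))
    return sizes


disclosed = {
    "sex": "M",
    "province": "Quebec",
    "birth_year": 1960,
    "occupation": "senator",
}
for name, size in pool_sizes(population, disclosed):
    print(f"after disclosing {name}: {size} candidate(s) remain")
```

In this toy population the pool goes from five candidates to four, then two, then one. The rare occupation does the final damage, which is the point of the examples in the literature.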

Here is the information from just that one paragraph on Slate (I should be clear here: I am not criticizing Slate at all; if Canada really values privacy, though, it needs to think a little bit harder than this):

“Eric” is a billionaire (a famous one, even).
“Eric” was 32 in 1992. (This means he was born in 1959 or 1960 and is about 52 now.)
“Eric” is from Quebec.
“Eric” is a man.
“Lola” is Latin American.
“Lola” was 17 in 1992.
“Lola” dated “Eric” for ten years.
“Lola” is a woman.

I’m not going to do this right now, but ~~I bet I could~~ you can figure out who “Eric” is in about ~~15~~ 4 minutes (according to commenter Brandon) with nothing more than Google. (Here is a good place to start.)

Canada, here are some articles you might be interested in:

Cheers.

Posted in Canada, Law, Privacy

6 Comments

Gelman: Advice on writing research articles

May 24

Posted by statsinthewild

Advice on writing research articles

Cheers.

Posted in Academia, Science

Data visualizations

May 24

Posted by statsinthewild

http://selection.datavisualization.ch/

Cheers.

Posted in Math Pictures

MLB Rankings – 5/23/2012

May 24

Posted by statsinthewild

StatsInTheWild MLB rankings as of May 23, 2012 at 11:51pm. SOS=strength of schedule

Team	Rank	Change	Record	ESPN	TeamRankings.com	SOS
Texas	1	–	27-18	3	2	14
Baltimore	2	↑3	28-17	2	1	5
Toronto	3	↑4	24-21	8	5	3
TampaBay	4	↑4	27-18	5	4	4
LADodgers	5	↓1	30-13	1	3	30
Boston	6	↑7	22-22	16	9	2
Atlanta	7	↓5	26-19	4	7	19
NYYankees	8	↑1	23-21	10	10	1
St. Louis	9	↓6	24-19	6	13	29
Washington	10	↓4	26-18	7	6	17
Cleveland	11	↑10	25-18	9	8	12
ChicagoWSox	12	↑7	22-22	17	15	13
Miami	13	↓3	24-20	11	12	16
Seattle	14	↑6	21-25	25	17	8
LAAngels	15	↑7	20-25	20	22	6
Cincinnati	16	↓4	24-19	13	11	28
Oakland	17	↓3	22-23	19	14	7
Detroit	18	–	20-23	18	18	11
Philadelphia	19	↓4	22-23	14	24	18
Houston	20	↓6	21-23	23	20	25
NYMets	21	↓10	24-20	15	16	15
SanFrancisco	22	↓5	23-21	12	19	27
Kansas City	23	↑6	17-26	24	23	10
Arizona	24	↓1	19-25	22	25	23
Pittsburgh	25	↓1	20-24	21	21	26
Milwaukee	26	↑1	18-26	26	28	24
Colorado	27	↓1	16-27	27	30	21
Minnesota	28	↑2	15-28	29	26	9
San Diego	29	–	16-28	30	28	20
Chicago Cubs	30	↓5	15-29	28	29	22

Past Rankings: 5/14/2012 5/7/2012 4/30/2012 4/23/2012 4/16/2012 4/13/2012

Cheers.

Posted in Baseball, R, Sports

15 Comments

May 23

Posted by statsinthewild

The Harvard Sports Analysis Collective

By Ben Blatt

In 1964, Mosteller and Wallace published Inference and Disputed Authorship: The Federalist. The paper used statistical analysis to try to determine if James Madison, Alexander Hamilton, or John Jay was the author of the unaccredited essays that were part of The Federalist Papers. They approached this historical mystery by using differences in word frequencies and Bayesian statistics. While controversial, similar methods have been used to investigate other authorship debates such as Shakespeare’s sonnets and plays. The same can be done for sports articles. While it would certainly be easier to look at the author’s name right underneath the title than to perform a statistical analysis of authors, I thought it would be fun anyways.

View original post 805 more words

Posted in Uncategorized

1 Comment

May 23

Posted by statsinthewild

3σ → Left

Math teachers have been using the “German Tank Problem” for a while to teach estimators. It goes something like this.

The Allies capture five German tanks. Suppose that the serial numbers on the tanks are 15, 23, 59, 83, and 109. Provide an estimate of the number of tanks that were produced.

You can see that your estimate would be a great deal lower than the estimate if the serial numbers were 1015, 2394, and 9438.
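For a concrete number, here is a minimal sketch of the textbook estimator for this problem, N̂ = m + m/k − 1, where m is the largest serial number observed and k is the number of tanks captured. The excerpt here does not spell out which formula the Guardian piece refers to, so treat this as the standard frequentist estimator rather than necessarily the one the wartime statisticians used. The function name `german_tank_estimate` is my own, and the sketch assumes serial numbers run 1, 2, ..., N and that the captured tanks are a random sample without replacement.

```python
def german_tank_estimate(serials):
    """Estimate total production N from observed serial numbers.

    Uses N_hat = m + m/k - 1, with m the largest serial seen and k the
    sample size. Assumes serials run 1..N and were sampled without
    replacement.
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one serial number")
    m = max(serials)
    return m + m / k - 1


# The two samples from the problem statement above.
print(german_tank_estimate([15, 23, 59, 83, 109]))
print(german_tank_estimate([1015, 2394, 9438]))
```

The first sample gives an estimate of about 129.8 tanks and the second gives 12,583, which matches the intuition that the larger serial numbers point to a much larger production run.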

The story and accompanying math problem were written about in the Guardian. My favorite line, which turned out to be true, was this:

The statisticians believed that the Germans, being Germans, had logically numbered their tanks in the order in which they were produced.

It turns out that the statisticians were spot on:

By using this formula, statisticians reportedly estimated that the Germans produced 246 tanks per month between June 1940 and September 1942…

View original post 148 more words

Posted in Uncategorized

From Deadspin: The Best Shooters In The NBA, And Why Field Goal Percentage Can’t Identify Them

May 23

Posted by statsinthewild

The Best Shooters In The NBA, And Why Field Goal Percentage Can’t Identify Them

Cheers.

Posted in Basketball, Math Pictures, Sports

May 17

Posted by statsinthewild

I’d like to see density estimates added to the histograms, but this is interesting.

God plays dice

There are currently 174 books on my Amazon wishlist that I could order directly from Amazon. (My wishlist has a total of 195 books, but 21 are only available from other sellers.) Total price is approximately $3,549 (I rounded all prices to whole dollars), for a mean of approximately $20 per book.

But the median price of a book on my wishlist is (again to the nearest whole dollar) $16; the difference between the median and the mean is a hint that the distribution is skewed. And there are actually two peaks — one centered on $10 and one centered on $16-17. The distribution looks like this:

I’ve cut off the histogram at $100, which omits Mitchell’s Machine Learning at a list price of $168.16. Here’s a zoomed-in version omitting the 23 most expensive (all those over $30):