Super Bowl Squares
I received an email this morning from a friend: “Is there any sort of a statistical breakdown for which are the best numbers to have in a Super Bowl squares pool (for entertainment purposes only)?”
Now, if my friend were going to use this information to gamble, it would be highly unethical. However, since he clearly stated that it was for “entertainment purposes only,” I feel that I can conduct a study with a clear conscience.
If he had wanted to gamble on it, here is a quick explanation of how that usually takes place. (According to that website: “Basically, if you are at a party where you don’t have betting squares you are a Communist.”)
Anyway, using data from football-reference.com I created a ten by ten frequency table (using R, of course) of exactly how many times each outcome has occurred in the history of the NFL. You can find the graph here.
Some things to note:
- 2-2 is the worst square by far. It’s only happened 5 times in the history of the league. The fair odds for this square are over 2800-to-1.
- The best squares are, no surprise, 7-0 and 0-7, occurring 581 and 577 times, respectively.
- The other great squares to have are, in order, 0-3, 0-4, 4-7, and 7-4. Each of these has occurred more than 480 times.
- These 6 outcomes (7-0, 0-7, 0-3, 0-4, 4-7, and 7-4) account for almost 23% of all the NFL games ever played.
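For anyone who wants to build this kind of table themselves, here is a minimal Python sketch. The original table was made in R, so this is not that code. The score list is made up for illustration, and the convention that the first score in each pair indexes the row is an assumption.

```python
def squares_table(scores):
    """Count final-score last-digit pairs.

    table[i][j] is the number of games where the first team's score
    ended in digit i and the second team's score ended in digit j.
    """
    table = [[0] * 10 for _ in range(10)]
    for first, second in scores:
        table[first % 10][second % 10] += 1
    return table


def fair_odds(count, total):
    """Fair odds against a square, as the x in 'x-to-1'."""
    return (total - count) / count


# Hypothetical final scores, NOT real NFL data.
sample_scores = [(27, 20), (17, 14), (31, 24), (20, 17), (13, 10)]
table = squares_table(sample_scores)
total_games = sum(sum(row) for row in table)
```

With real data, `fair_odds(table[2][2], total_games)` would give the fair price for the 2-2 square.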
Cheers.
Rule for Variance Inflation Factors
A quote from here:
“Goldberger (1991) notes that while the number of pages in econometrics texts devoted to the problem of multi-collinearity in multiple regression is large the same books have little to say about sample size. Goldberger states: “Perhaps that imbalance is attributable to the lack of an exotic polysyllabic name for ‘small sample size.’ If so, we can remove that impediment by introducing the term micronumerosity” (Goldberger, 1991: 248–249).”
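To see why the two problems are on equal footing, here is a small sketch using the standard textbook formula for the variance of an OLS slope, Var(b_j) = sigma^2 / ((n - 1) * Var(x_j)) * VIF_j, where VIF_j = 1 / (1 - R_j^2). The function name and the numbers are made up for illustration.

```python
import math


def slope_variance(sigma2, n, var_x, r2):
    """Variance of an OLS slope estimate.

    sigma2: error variance
    n:      sample size
    var_x:  sample variance of the predictor
    r2:     R^2 from regressing this predictor on the other predictors
    """
    vif = 1.0 / (1.0 - r2)  # variance inflation factor
    return sigma2 / ((n - 1) * var_x) * vif


# Hypothetical numbers: a VIF of 10 with n = 1001 ...
collinear = slope_variance(sigma2=1.0, n=1001, var_x=1.0, r2=0.9)
# ... costs about as much precision as no collinearity with n = 101.
micronumerous = slope_variance(sigma2=1.0, n=101, var_x=1.0, r2=0.0)
```

In this setup a VIF of 10 does the same damage as throwing away roughly nine tenths of the sample, which is more or less Goldberger's point.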
Cheers.
NFL Rankings – After Week 17
Rankings updated as of 1/1/2012; Records updated as of 1/1/2012; CHFF rankings as of 12/28/2011
Key: AFC, NFC, Playoff team, Division Champ, Eliminated from Playoffs
| Team | Rank | Change | Record | CHFF Rank |
| New England | 1 | – | 13-3 | 3 |
| Pittsburgh | 2 | ↑ | 12-4 | 1 |
| Green Bay | 3 | ↓ | 15-1 | 6 |
| Baltimore | 4 | – | 12-4 | 5 |
| Atlanta | 5 | – | 10-6 | 11 |
|  | 6 | ↑ | 8-8 | 2 |
| New Orleans | 7 | ↓ | 13-3 | 15 |
|  | 8 | – | 8-8 | 23 |
| Detroit | 9 | ↑↑ | 10-6 | 10 |
| San Francisco | 10 | – | 13-3 | 4 |
| NY Giants | 11 | ↓↓ | 9-7 | 8 |
|  | 12 | – | 8-8 | 12 |
|  | 13 | – | 8-8 | 18 |
|  | 14 | ↑ | 4-12 | 14 |
|  | 15 | ↑↑ | 7-9 | 30 |
|  | 16 | ↑↑↑↑ | 7-9 | 9 |
|  | 17 | ↓↓↓ | 8-8 | 26 |
| Cincinnati | 18 | ↓↓ | 9-7 | 22 |
|  | 19 | ↑↑ | 8-8 | 7 |
|  | 20 | ↑↑ | 9-7 | 19 |
|  | 21 | ↓↓↓ | 6-10 | 13 |
| Houston | 22 | ↓↓↓ | 10-6 | 20 |
|  | 23 | – | 6-10 | 24 |
|  | 24 | ↑ | 2-14 | 16 |
| Denver | 25 | ↓ | 8-8 | 30 |
|  | 26 | ↑↑↑↑ | 5-11 | 28 |
|  | 27 | ↓ | 3-13 | 27 |
|  | 28 | ↓ | 4-12 | 21 |
|  | 29 | ↓ | 8-8 | 29 |
|  | 30 | ↓ | 5-11 | 25 |
|  | 31 | – | 6-10 | 17 |
|  | 32 | – | 2-14 | 32 |
BCS: My offer still stands... If you want to contact me, you can send me a tweet @StatsInTheWild.
Cheers.
Yates and significance tests
I was reading the newest issue of Significance Magazine last night, and I came across this quote in a letter that someone had written to the magazine:
“The emphasis given to formal tests of significance throughout [R.A. Fisher’s] Statistical Methods…has caused scientific research workers to pay undue attention to the results of the tests of significance they perform on their data, particularly data derived from experiments, and too little to the estimates of the magnitude of the effects they are investigating.” … “The emphasis on tests of significance and the consideration of the results of each experiment in isolation, have had the unfortunate consequence that scientific workers have often regarded the execution of a test of significance on an experiment as the ultimate objective.” (Yates 1951)
I feel like this is extremely relevant today, when it seems that the only thing anyone cares about in a study is the p-value and whether or not it is less than the mythical 0.05 cut-off. But what strikes me most about this quote is that it was written sixty years ago, in 1951.
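A quick illustration of Yates’s complaint, as a minimal Python sketch: the same negligible effect can be “significant” or not depending only on the sample size, which is why the estimate and its interval deserve the attention. The setup (a two-sample z-test with a known, equal standard deviation) and all of the numbers are assumptions chosen for illustration.

```python
import math


def two_sample_z(mean_diff, sd, n_per_group):
    """Two-sided z-test for a difference in means (known, equal SD).

    Returns (z statistic, p-value, approximate 95% confidence interval).
    """
    se = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    ci = (mean_diff - 1.96 * se, mean_diff + 1.96 * se)
    return z, p, ci


# Hypothetical effect: one hundredth of a standard deviation.
z_small, p_small, ci_small = two_sample_z(0.01, 1.0, n_per_group=100)
z_large, p_large, ci_large = two_sample_z(0.01, 1.0, n_per_group=1_000_000)
```

With 100 per group the p-value is nowhere near 0.05; with a million per group it is far below it. The estimated effect is 0.01 standard deviations in both cases.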
Cheers.
Statisticians are special because…..
From Andrew Gelman’s Blog: “P.S. Statisticians are special because, deep in our bones, we know about uncertainty. Economists know about incentives, physicists know about reality, movers can fit big things in the elevator on the first try, evolutionary psychologists know how to get their names in the newspaper, lawyers know you should never never never talk to the cops, and statisticians know about uncertainty. Of that, I’m sure.”
Slate and Statistics
Here are two interesting articles related to statistics that were featured on Slate.com two Mondays ago:
The first article, by Kevin Gold, is called “The Leaky Nature of Online Privacy: Network analysis can uncover your personal details even if you choose to hide them.” This led me to Latanya Sweeney’s webpage (of k-anonymity fame), which I then spent quite a bit of time reading. (I found the work on face de-identification to be very interesting.)
On that same day on Slate, everyone’s favorite former governor of New York, Eliot Spitzer (If you haven’t seen “Client 9” yet, stop what you are doing and watch it) had an article called “World Defeats U.S. in Four Sets: How the decline of American men’s tennis can explain global economics.” In the article, Spitzer discusses the difference between correlation and causation as it relates to tennis and the economy.
Cheers.
Google auto-complete and Slate
Recently, I posted (“Multidimensional Scaling, Republican Presidential Candidates, and ‘a douchebag’” and “Tracking the Republican candidates via google auto-complete”) about Google auto-complete and potential Republican presidential candidates. Slate.com posted a good piece called “Google’s GOP Search Suggestion” (a day after my original post, I should note) where they look at the auto-complete for candidates’ names using a Google image search rather than a straight Google search.
Defense
Tomorrow I defend my dissertation. If all goes well, you will all have the opportunity to finally call me doctor; I know you are just as excited about this as I am. Hopefully, I will have more time to write blog posts post-defense.
Cheers.
Dissertating (in the wild)
Well, I’m almost done with my dissertation, which means I’m almost done with my Ph.D. And when I say done, I mean it in both the senses of “finished” and “sick of”. I have a nearly complete document AND a defense date. Now all I have to do is put the most important skill I learned in grad school to good use: finding and filling out paperwork. Anyone can write a 100+ page dissertation filled with original thoughts, but only the best and brightest can jump through all of the bureaucratic hoops to actually complete the degree.
Anyway, I really enjoyed my dissertation topic, which, I hear, is not something that everyone experiences. I’ll eventually come back to the topic (statistical disclosure limitation), but I really just need some time away from it. I’ll get my wish as I’ll be starting a post-doc this summer researching statistical genetics, which I am probably a little overexcited to start.
Cheers.
ENAR (in the wild)
I’m currently attending the 2011 ENAR spring meeting in Miami. I arrived Sunday night and presented a poster at the opening poster presentation session. On Monday, I attended two sessions in the afternoon: the survival analysis section and, later, the policy section.
In the policy section, I saw a presentation entitled “Issues in the use of survival analysis to estimate damages in equal employment cases” by Qing Pan and Joseph L. Gastwirth, which was published in the March 2009 issue of the journal Law, Probability and Risk. The presentation was two-fold. First, they presented some basic methods for determining whether or not discrimination had taken place. In this case (age discrimination), it was fairly evident that the infraction had occurred. Second, the authors presented how to assess the compensation that should be awarded to the parties who had been discriminated against. In order to do so, they applied survival analysis techniques to estimate how long someone would have worked at the company if they had remained employed. Very interesting stuff.
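To give a flavor of the idea, here is a minimal Python sketch of one standard survival analysis tool, the Kaplan-Meier estimator, applied to job tenure. This is not the authors’ method, which I am not reproducing here. The tenure data, the function names, and the 8-year horizon are all made up for illustration.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve.

    durations: years each comparable employee was followed
    observed:  True if the employee left at that time,
               False if censored (still employed when last seen)
    Returns a list of (time, probability of still being employed).
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        ties = sum(1 for d, _ in data if d == t)
        events = sum(1 for d, e in data if d == t and e)
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= ties
        i += ties
    return curve


def restricted_mean(curve, horizon):
    """Area under the step curve from 0 to horizon: expected years employed."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t >= horizon:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (horizon - prev_t)


# Hypothetical tenure data for comparable employees, NOT from the paper.
curve = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
expected_years = restricted_mean(curve, horizon=8)
```

Multiplying an expected tenure like `expected_years` by lost pay per year is the rough shape of a damages estimate, though a real analysis would need far more care.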
Along the same legal lines, I happened to pick up a book called “Statistics: A Very Short Introduction” by David J. Hand. While I was flipping through it I came across a section about a woman named Sally Clark. She had two children, both of whom died within the first 11 weeks of their respective lives. Subsequently, she was charged with murder, as it seemed suspicious that TWO of her children had died so young. During the trial, Professor Sir Roy Meadow (famous for proposing the theory of Munchausen Syndrome by Proxy, or MSbP) claimed that the chance of two of her babies dying in this fashion purely by chance was 1 in 73,000,000. At those odds, I suppose you would have to convict the person. However, his method for arriving at this number was flawed. The Royal Statistical Society issued a statement that began “In the recent highly-publicised case of R v. Sally Clark, a medical expert witness drew on published studies to obtain a figure for the frequency of sudden infant death syndrome (SIDS, or “cot death”) in families having some of the characteristics of the defendant’s family. He went on to square this figure to obtain a value of 1 in 73 million for the frequency of two cases of SIDS in such a family.” (Read the whole statement here.) The way I feel about this can be summed up by some comments my friend (a lawyer) made when I emailed him about this case: “That’s wild that that happened in 1999. I figured it would be like 1899.”
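The problem with squaring is that it is only valid if the two deaths are independent. Here is a minimal Python sketch of how much the answer moves once that assumption is relaxed. The single-death frequency is simply backed out of the quoted 1 in 73 million, and the relative risk of 10 is a made-up value for illustration, not an estimate from the case.

```python
import math

# Per-family frequency implied by the quoted figure: its square is 73 million.
implied_one_in = math.sqrt(73_000_000)  # roughly 1 in 8,500


def joint_one_in(single_one_in, relative_risk=1.0):
    """'1 in N' frequency of two SIDS deaths in the same family.

    relative_risk: how many times more likely a second death is in a family
    that has already had one (1.0 means the deaths are independent).
    """
    p_first = 1.0 / single_one_in
    p_second_given_first = min(1.0, relative_risk * p_first)
    return 1.0 / (p_first * p_second_given_first)


independent = joint_one_in(implied_one_in)  # squaring: about 73 million
dependent = joint_one_in(implied_one_in, relative_risk=10.0)  # hypothetical
```

Under the hypothetical relative risk of 10, the figure drops from about 1 in 73 million to about 1 in 7.3 million. Even then, the number says nothing by itself about guilt, since the competing explanation (a double murder) is also extremely rare.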
So anyway, I am now sitting in my hotel room at the Leamington (students can’t afford the Hyatt). I’ll leave you with a picture of the hotel I am staying at. I can’t wait to get a job.
Cheers.


