An email from “Ted Wells, Esq.”

Here is an email I received from “Ted Wells, Esq.” (It’s not really from Ted Wells, and the only reason I want to make that clear is because I’m scared the all-powerful Roger Goodell will levy an arbitrarily harsh punishment.)

Dear Greg,

Please help me calculate a probability.  I have been asked to investigate whether Tom Brady and the New England Patriots have conspired to break the rules by using underinflated footballs.  This allegation was made by two different opponents in 2014 and has roots going back over 10 years according to A.J. Feeley, who allegedly played in the NFL.  Some background data for the probability I need you to calculate:  Based on data provided by Rotowire, Tom has handled 3399 snaps over the past 3 years, an average of almost 71 per game.  If we extrapolate from this to his entire career of 238 games including playoffs, he has handled approximately 16,898 snaps during NFL games.  And before every offensive snap the football is handled by a linesman, an NFL official, who places the ball on the line of scrimmage.

So here is what I need you to calculate:  What is the probability that the Patriots have been using underinflated footballs for an extended period of time, perhaps over a decade, perhaps almost 17,000 offensive snaps, when there have been zero – ZERO – instances in which a game official who handles both teams’ footballs over 140 times per game noticed that a Patriots ball might be too soft?  I have also asked Warren Sharp to calculate this for me, but he currently is preoccupied trying to get a gob of peanut butter off the roof of his mouth.

Sincerely,

Ted Wells, Esq.
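
For what it's worth, the arithmetic in the email checks out, and the probability it asks for can be sketched under one deliberately crude assumption that is not in the email: on every snap, an official has the same independent chance of noticing a soft ball. The 48-game figure (three 16-game regular seasons) and the example values of p_detect are also assumptions made purely for illustration.

```python
import math

# Figures quoted in the email
snaps_last_3_years = 3399
career_games = 238

# Assumption (not in the email): 3 seasons x 16 regular-season games
games_last_3_years = 48

snaps_per_game = snaps_last_3_years / games_last_3_years
# The email rounds the per-game average up to 71 before extrapolating
career_snaps = 71 * career_games


def prob_zero_detections(n_snaps, p_detect):
    """P(no official ever notices), assuming each snap is an independent
    trial with the same hypothetical detection probability p_detect."""
    return math.exp(n_snaps * math.log1p(-p_detect))


if __name__ == "__main__":
    print(f"snaps per game: {snaps_per_game:.1f}")
    print(f"career snaps:   {career_snaps}")
    # Hypothetical per-snap detection probabilities
    for p in (0.0001, 0.001, 0.01):
        print(f"p_detect={p}: {prob_zero_detections(career_snaps, p):.3g}")
```

Under this toy model, even a one-in-a-thousand chance of detection per snap puts the probability of zero detections over 16,898 snaps well below one in a million. The independence assumption is doing a lot of work here, so treat the output as a rough illustration rather than an estimate.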

85% is a unicorn – on predictions in the National Hockey League postseason

statsinthewild:

Great unicorn picture

Originally posted on StatsbyLopez:


On February 20, the National Hockey League announced a partnership with software company SAP. The alliance’s primary purpose was to bring a new enhanced stats section to NHL.com, built in the shadows of popular analytics sites like war-on-ice and the now dormant extra-skater.

It was, it seemed, a partial admission from the league that its best metrics were hosted elsewhere.

“The stats landscape in the NHL is kind of all over the place,” suggested Chris Foster, Director of the NHL’s Digital Business Development, at the time. “One of the goals is to make sure that all of the tools that fans need are on NHL.com.”

One tool presented in February was SAP’s Matchup Analysis, designed to predict the league’s postseason play. The tool claimed 85% accuracy, which Yahoo’s Puck Daddy boasted was good enough to make “TV executives nervous and sports [bettors] rather happy.”

There’s just one problem.

85%…


Penalties and the NHL

Originally posted on StatsbyLopez:

Noah and I wrote an article about penalty patterns in the NHL. It’s over on FiveThirtyEight – many thanks to the folks over there for efforts in helping us put it together.

I wanted to share two plots that I thought were interesting, and related to our study.

First, here’s a contrast of penalty violations, comparing the probability of a home team penalty by game type (regular, postseason).

[Figure: probability of a home team penalty by game type (regular season vs. postseason)]

Home teams are unlikely to get penalties when they are owed penalties. They are even less likely to get those penalties in postseason play.

Also, Sam and Micah suggested that it would be worth looking at the type of penalty called given each scenario. While some of the penalties are of unknown types (strange data entry), here’s a mosaic plot of the types that are known, using the first four letters in the columns as the penalty type, along with the penalty differential.

[Figure: mosaic plot of known penalty types by penalty differential]

I might be missing something, but there doesn’t…


A dissatisfied reader

I received this comment on my recent article:

Considering you misspelled Berkeley twice in your article, despite the fact it was spelled correctly (by someone else) in a subsequent quote, and say preposterous things like “the answer to that question is no. It might be yes”, it might behoove you to be a little more civil in your tone throughout – you make stupid and careless mistakes too, like anyone else. I came to this article excited for a strong statistical argument and got mostly adolescent snark and vitriol beneath a “real statistician”. The fact that you get “pissed off” is human, but that you have to keep telling us about it, while ranting belligerently, is puerile and self-absorbed.

Cheers.

No, actually Warren Sharp wasn’t redeemed by the Wells report. He’s still wrong.

TL;DR

Summary

Last year during the AFC Championship game, the New England Patriots were accused of deflating their own footballs below the pressure allowed by the NFL. The Patriots won the game by what felt like 100-7. Following their win, the media, always needing something to talk about, created a kerfuffle of epically idiotic proportions. In the two weeks leading up to the Super Bowl, the media was interviewing physics professors about PSI and how it is affected by temperature, Bill Nye made an appearance, and Tom Brady had to talk about his balls (teehee). It was quite a low point for sports writing.

One highlight of the lowlights was an article written by a tout named Warren Sharp.  He basically claimed that the Patriots were a huge outlier in terms of their fumble rate and it was impossible to explain how this could happen.  This story was picked up by Slate, Huffington Post, and even RealClearPolitics, among others.  However, his “analysis” was riddled with problematic logic, details of which can be found here and here.  Case closed, right?

Well, apparently this led to a “stat spat”.  Eric Adelson, who was nice enough to take the time to interview me, wrote about the “controversy”.  I told him that I thought almost all of Sharp’s work was garbage (he quoted me as describing it as “98% bunk”, which I’m pretty sure I never actually said because I don’t think I would ever use the word “bunk”).  I was generally unhappy with the way that original article was written as it lent credibility to someone whose argument was so easily dismantled.  Adelson presented Sharp’s opinion versus my opinion as if they were equally legitimate in spite of the numerous flaws that Mike Lopez and I pointed out in our Deadspin piece.

The argument “raged on” and then the Super Bowl happened and deflategate went into hibernation, until it was revived recently by the release of the Wells Report.  The Wells Report is a 243-page document detailing the findings of the NFL’s deflategate investigation. (As a fun side note, the firm hired to perform the investigation is the same firm that once denied secondhand smoke causes cancer.) Deadspin sums up that report like this:

Patriots Likely Deflated Balls On Purpose

As a result of this, Tom Brady was suspended for 4 games, the Patriots were fined $1,000,000, and New England lost some draft picks.

So it seems the Patriots cheated.  (Not the first time they’ve been caught either.)  This caused a shit storm (two words? hyphen?) on Twitter and in the media.  But after reading so many articles about the deflategate “scandal”, one stood out above the rest to me on the WTF scale.  I’m talking about Eric Adelson’s follow-up piece entitled “Deflate-gate report re-energizes stat geek’s controversial fumbling analysis of Patriots”.  Here are the first lines of the article:

It began as an intriguing statistical correlation. It blew up into a national debate. Now it’s a civil engineer’s redemption song.

The civil engineer that Adelson is referring to is none other than Warren Sharp, who has supposedly been redeemed by the Wells Report.  Wait.  What?!?!?!

Adelson goes on to say:

Sharp never leapt to the conclusion that the Pats’ alleged deflation of footballs brought about their fumbling advantage – correlation doesn’t mean causation – but many people took it that way. And several statisticians scoffed. After all, this guy runs a gambling site and suddenly he is some sort of stats wizard? One statistician called Sharp’s work “98 percent bunk.”

Some notes:

  • Sharp’s “analysis” regarding the Patriots’ extremely low fumbling rate is incredibly sloppy.  I’ll point, once again, to my article (with Mike Lopez) that explains some of the many, many flaws in Sharp’s analysis.  No one has really suitably shown that the Patriots ever had a fumbling advantage at all, but Adelson seems to keep stating it as fact for some reason.  In fact, the best work I have seen on quantifying the Patriots’ fumbling rates was done recently by Mike Lopez, and he finds that “once you account for play and game characteristics, it is really difficult to distinguish between the fumble rates of NFL teams.”
  • I feel like “correlation doesn’t mean causation”, when stated by a member of the media, is code for “I don’t really know what I’m talking about, but I say this to sound smart.”  Of course, he’s right, correlation doesn’t imply causation.  But I don’t know who these “statisticians who scoffed” are that he is referring to.  I don’t believe either Mike or I ever said anything about causation, because we couldn’t even really find a strong correlation to begin with.
  • Adelson says, “One statistician called Sharp’s work ‘98 percent bunk.’”  The statistician he is referring to is a guy named Gregory J. Matthews at Loyola University Chicago (I hear he’s pretty good, but that he would never use the word bunk ’cause he’s not a 75-year-old grandmother).  Here is a new quote for you: Warren Sharp’s analysis of the Patriots’ fumble rates was amateurish garbage.
  • Finally, a question for Adelson: Why bother calling me to ask for my opinion if you are just going to ignore it anyway and give Sharp’s opinion more weight no matter what I say?  That really pisses me off.

Then there is this:

“Now that it seems likely that the Patriots were violating the rules to gain an advantage,” he [Sharp] wrote, “the fact that they also had an extremely low fumble rate makes it more likely that the relationship between inflation levels and fumbling is real – and more likely that the Patriots have materially benefited from their cheating.”

Disclaimer: “cheating” is not suggested by Sharp. But the proximity between the fumble rate and the possible deflation is gathering more credibility. Sharp’s gun is suddenly smoking again.

  • But the Patriots DIDN’T have extremely low fumbling rates!!!
  • Cheating isn’t suggested by Sharp, it’s just being strongly hinted at by writers in the media who want to create sensationalist stories for the front page of Yahoo Sports.
  • Since members of the media seem to be ignoring all of my rational arguments, here is one that maybe will work: NO NO NO NO NO.  YOU ARE WRONG!  Does yelling work?

The whole article really is a gem of media narrative framing and sensationalism, but I’ll leave you with this one last quote:

“Now I actually have some validation in the field,” Sharp said. “‘Hey, this guy was right all along.’”

No, you weren’t validated.

Recap: Sharp, using terribly flawed statistical analysis, found that the Patriots had outrageously low fumbling rates, then the media picked it up and ran with the story, without asking any questions, because it was convenient and interesting.  And now the Wells Report somehow redeems Sharp.

That is like saying 3+3=7, therefore the sky is filled with water.  The media reports this as brilliant.  Then a mathematician comes along and points out that 3+3 does not in fact equal 7 and therefore the logic is flawed.  Then when it is discovered that the sky is blue (more likely than not), which is like the color of water, the media claims that the original argument is vindicated.  This is insane.

Why don’t people understand that Sharp is wrong??

So, this got me thinking about why this story has so many legs.  I refuse to believe that it’s simply that people are stupid (though some may argue that this is the reason).  Rather, I choose to believe that people simply like simple narratives and interesting anecdotes and so that’s what the media gives them.

A good example of this comes from an article by biological economist Terry Burnham entitled, “A trick for higher SAT scores?  Unfortunately, no.” The article describes an interesting idea (“that people score higher on a test if the questions are hard to read”) backed by statistical evidence.  This got picked up by Malcolm Gladwell, the king of the anecdotes (also a very good and entertaining writer), in his book David and Goliath.  Unfortunately, it’s probably not true. Burnham states:

The original paper reached its conclusions based on the test scores of 40 people. In our paper, we analyze a total of over 7,000 people by looking at the original study and 16 additional studies. Our summary:
   Easy-to-read average score: 1.43/3  (17 studies, 3,657 people)
   Hard-to-read average score: 1.42/3  (17 studies, 3,710 people)

Burnham also mentions three lessons that he takes away from this:

  1. Beware simple stories.
  2. Ideas have considerable “Meme-mentum”.
  3. We can measure the rate of learning.

These first two lessons are directly applicable to Deflategate and the Warren Sharp “analysis”.  The Patriots have a nearly impossible fumble rate (a simple story!).  Story gets picked up by major media outlets (considerable “Meme-mentum”!).  Unfortunately, the story probably isn’t true.

Finally, Burnham sums up the story as follows.

The story told by Professor Kahneman and by Malcolm Gladwell is very good. In most cases, however, reality is messier than the summary story.

If we were to change Kahneman and Gladwell to Sharp and Adelson, this quote could easily be about deflategate and the Patriots’ fumbling rate.

Another thought about aggregated rates

Speaking of the Patriots’ fumble rates, using rates without controlling for any other factors can often lead to erroneous conclusions.  One famous example of this can be found in Bickel, Hammel, and O’Connell (1975), which looked at the rates of admissions of men and women to graduate school at Berkeley in 1973.  44% of men were being admitted while only 35% of women were given the same opportunity.  So a sensationalist media outlet might have posted the headline “Berkeley found to be discriminating against women!”  Imagine the outrage!  Imagine the click-through rate!  Fortunately, it wasn’t true (Simpson’s paradox!).  Here is the abstract from that article:

Examination of aggregate data on graduate admissions to the University of California, Berkeley, for fall 1973 shows a clear but misleading pattern of bias against female applicants. Examination of the disaggregated data reveals few decision-making units that show statistically significant departures from expected frequencies of female admissions, and about as many units appear to favor women as to favor men. If the data are properly pooled, taking into account the autonomy of departmental decision making, thus correcting for the tendency of women to apply to graduate departments that are more difficult for applicants of either sex to enter, there is a small but statistically significant bias in favor of women. The graduate departments that are easier to enter tend to be those that require more mathematics in the undergraduate preparatory curriculum. The bias in the aggregated data stems not from any pattern of discrimination on the part of admissions committees, which seem quite fair on the whole, but apparently from prior screening at earlier levels of the educational system. Women are shunted by their socialization and education toward fields of graduate study that are generally more crowded, less productive of completed degrees, and less well funded, and that frequently offer poorer professional employment prospects.

Sharp is guilty of exactly this (among many other things) when he is comparing fumble rates between teams in the NFL and not controlling for any other factors.
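
To make the Simpson's paradox point concrete, here is a minimal sketch with invented numbers. These are not the 1973 Berkeley data; the two departments and all counts are hypothetical, chosen only so that women mostly apply to the harder department.

```python
# Hypothetical admissions counts, invented for illustration only
# (NOT the 1973 Berkeley data). Each entry is (admitted, applicants).
admissions = {
    "easy_dept": {"men": (480, 800), "women": (130, 200)},
    "hard_dept": {"men": (40, 200), "women": (200, 800)},
}


def rate(admitted, applicants):
    """Admission rate as a fraction."""
    return admitted / applicants


def aggregate_rate(data, sex):
    """Admission rate for one sex, pooling all departments together."""
    admitted = sum(dept[sex][0] for dept in data.values())
    applicants = sum(dept[sex][1] for dept in data.values())
    return rate(admitted, applicants)


def department_rates(data, sex):
    """Admission rate for one sex within each department."""
    return {name: rate(*dept[sex]) for name, dept in data.items()}


if __name__ == "__main__":
    for sex in ("men", "women"):
        print(sex, "overall:", aggregate_rate(admissions, sex))
        print(sex, "by department:", department_rates(admissions, sex))
```

With these made-up counts, men are admitted at 52% overall versus 33% for women, even though women have the higher admission rate in both departments (65% vs. 60% and 25% vs. 20%). The aggregate gap comes entirely from where people applied.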

Other fun examples of this include the US Navy, in the process of recruiting, claiming that it was safer to be in the US Navy than to live in NYC.  They cited the statistics that the death rate in the US Navy during the Spanish-American War was only 9 out of 1000 whereas the death rate in NYC was 16 out of 1000.  I use this example in every statistics class I teach, because it’s so easy to figure out the flawed logic (people in the Navy are mostly young, healthy adults, while NYC’s population includes infants, the elderly, and the sick; the groups aren’t comparable).

A more recent example of this exact same phenomenon, not controlling for age, was seen in Bill Barnwell’s article called Mere Mortals where Barnwell asks the provocative question:

Why is it that baseball players from the ’60s, ’70s, and ’80s are dying more frequently than football players from the same era?

It’s because those baseball players are older on average.  And old people die more often than not old people.  You can’t just compare rates without controlling for external factors.  (Not to mention that you should be using survival analysis rather than comparing death rates in studies like this!)  Details of the problems with that article can be found here.
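
The age problem in examples like these can be sketched with made-up numbers. In the toy setup below, both groups have exactly the same death rate within each age band; only the age mix differs. The group names, age bands, rates, and population counts are all hypothetical.

```python
# Hypothetical age-specific death rates (per 1,000 people per year),
# shared by BOTH groups. All numbers are invented for illustration.
death_rate_per_1000 = {"young": 2, "old": 40}

# Hypothetical populations with different age mixes
population = {
    "group_a": {"young": 9000, "old": 1000},  # mostly young
    "group_b": {"young": 5000, "old": 5000},  # half old
}


def crude_rate(pop, rates):
    """Overall deaths per 1,000, ignoring age structure."""
    deaths = sum(pop[age] * rates[age] / 1000 for age in pop)
    return 1000 * deaths / sum(pop.values())


if __name__ == "__main__":
    for name, pop in population.items():
        print(name, "crude death rate per 1,000:",
              crude_rate(pop, death_rate_per_1000))
```

Both groups die at exactly the same rate within each age band, yet the crude rates come out to 5.8 and 21 per 1,000. Comparing the crude rates says something about the age mix and nothing about which group is "safer".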

Anyway, my point in this section is that the direct comparison of simple rates between groups often leads to incorrect conclusions.  Including the conclusions of Mr. Sharp.

There are really two separate issues here.

A quick note that there are really two separate issues in the Warren Sharp / Deflategate “stat spat”:

  • One issue is did the Patriots cheat.  The answer to that question is probably yes.  The Patriots have been caught cheating in the past.  And I just assume everyone is trying to get away with as much as possible without getting caught.
  • The second issue is do the Patriots have an impossibly low fumble rate.  The answer to that question is no.  (It might be yes, but no one has shown that yes with a legitimate analysis.)
  • These two issues are likely largely unrelated.  Also, the Patriots’ fumble rate is just simply not a huge outlier.  (I don’t know how many times I can possibly say this.)

Final thought

What really bothers me about this whole situation is that it doesn’t seem to be a series of honest mistakes. Adelson knows what real statisticians think about this.  I know because I am one and he called me and I told him.  Mike Lopez and I also laid out the details of flaws in Sharp’s original arguments, and other criticisms of Sharp’s “analysis” can be found here, here, here, and here.  In spite of all of these very legitimate criticisms, it seems that many members of the media ran with this story anyway. But what sets Adelson apart from the rest of the media is that he is now claiming that Sharp has been redeemed by the Wells Report and still lending credence to Sharp’s fumble rate analysis.  The only rational explanation for this, in my mind, is willful ignorance of the facts in favor of an interesting narrative. And that really pisses me off.