Stats in the NFL: Part 1 (in the wild)
So the Jets blew it. They were 8-3 and they finished the season 1-4 to miss the playoffs at 9-7.
I was reading this article on the Freakonomics blog.
Passing yards: 1,011
Passer rating: 55.4
This is terrible. Make the guy retire. He WAS great. WAS.
So anyway, the Jets probably should have made the play-offs, but they just lost too many games. The Patriots, on the other hand, probably should have just made the play-offs. I know there are rules and criteria, but do the Cardinals (who got massacred by the Patriots) or the Chargers (who started the season 4-8) really deserve to be in the play-offs over the Patriots?
Probably not, but the rules are what they are. There just aren’t any good teams (even mediocre teams) in either the AFC or NFC West this year.
This got me thinking. Don’t they always talk about parity in the NFL? How many commentators every week do you hear throwing around parity this and parity that? Well, where was the parity this year? Is there less parity in the league now than in past years? How can we measure parity?
Let’s start with what a good measure of parity would be. If there were perfect parity in the league, every team would finish 8-8. This would be like flipping a coin to decide every game. (Not exactly the most compelling sports league.) The opposite, a lack of parity (teams deviating from a record of 8-8), can be thought of as entropy. Most notably, over the last two seasons we have had a 16-win team and a 0-win team. Clearly, these teams were significantly better and worse, respectively, than the other teams in the league.
So how can we measure parity? Let’s put all 32 teams into a 32 by 2 contingency table: 32 rows, one for each team, and 2 columns, one for wins and one for losses. (This leads to fixed row and fixed column totals.)
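To make the table concrete, here is a minimal sketch of how it could be built from a list of win totals. The four-team, six-game league is made up purely for illustration (it is not real standings), and `build_table` is a hypothetical helper name. Ties are ignored.

```python
def build_table(wins, games_per_team):
    """Return one [wins, losses] row per team (ties are ignored)."""
    return [[w, games_per_team - w] for w in wins]

# Hypothetical 4-team league, 6 games each; NOT real standings.
wins = [5, 3, 3, 1]
table = build_table(wins, games_per_team=6)

# Row totals are fixed by the schedule: every team plays the same number of games.
row_totals = [sum(row) for row in table]

# Column totals are fixed too: every game produces exactly one win and one loss.
col_totals = [sum(col) for col in zip(*table)]

print(table)
print(row_totals)
print(col_totals)
```

With a real 32-team, 16-game season the same construction gives row totals of 16 and column totals of 256 wins and 256 losses, so under perfect parity every cell would hold 8.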
We wish to test the null hypothesis that there is no association between the rows and columns (i.e., the team you play for has nothing to do with the number of wins you get). Clearly this is not true and we will always reject the null hypothesis of no association, but we can compare by how much we reject the null hypothesis, namely the p-value. The smaller the p-value, the stronger the evidence against the null and the less parity in the league. While this is not a perfect measure, it’s an interesting start.
Since we have fixed row and column totals, normally this would lead to using Fisher’s exact test. However, with a 32 by 2 table this is computationally very intensive. Thus, as an alternative, we can use the Pearson chi-square test statistic and the G-squared test statistic. Here I report both statistics. I am going to rely more heavily on the G-squared statistic because of its close relationship with entropy.
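As a rough sketch of the two statistics, the code below computes G-squared (2 times the sum of observed × ln(observed / expected)) and the Pearson chi-square (the sum of (observed − expected)² / expected) for a wins/losses table. The 2007 win totals are an assumption: they are recalled from public standings rather than taken from this post, so double-check them before relying on the output. The function names are hypothetical.

```python
import math

# 2007 regular-season win totals (16 games each, no ties).
# ASSUMPTION: recalled from public standings, not taken from this post.
WINS_2007 = {
    "NE": 16, "BUF": 7, "NYJ": 4, "MIA": 1,
    "PIT": 10, "CLE": 10, "CIN": 7, "BAL": 5,
    "IND": 13, "JAX": 11, "TEN": 10, "HOU": 8,
    "SD": 11, "DEN": 7, "KC": 4, "OAK": 4,
    "DAL": 13, "NYG": 10, "WAS": 9, "PHI": 8,
    "GB": 13, "MIN": 8, "DET": 7, "CHI": 7,
    "TB": 9, "CAR": 7, "NO": 7, "ATL": 4,
    "SEA": 10, "ARI": 8, "SF": 5, "STL": 3,
}

def expected_counts(table):
    """Expected counts under no association: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def g_squared(table):
    """Likelihood-ratio statistic: 2 * sum(O * ln(O / E)); empty cells add 0."""
    total = 0.0
    for obs_row, exp_row in zip(table, expected_counts(table)):
        for o, e in zip(obs_row, exp_row):
            if o > 0:
                total += o * math.log(o / e)
    return 2.0 * total

def pearson_chi2(table):
    """Pearson statistic: sum((O - E)^2 / E)."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected_counts(table))
        for o, e in zip(obs_row, exp_row)
    )

# One [wins, losses] row per team.
table_2007 = [[w, 16 - w] for w in WINS_2007.values()]

print(round(g_squared(table_2007), 3))
print(round(pearson_chi2(table_2007), 3))
```

If the win totals are right, this gives a G-squared of about 97.11 for 2007 and a Pearson statistic of 85.5. The p-value would then come from a chi-square tail probability, which is left out here to keep the sketch free of third-party libraries.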
G-squared   p-value        Season   Teams   SDs above mean
95.62900    1.640269e-08   2008     32      7.95362
97.11005    9.714734e-09   2007     32      8.138756
69.63735    8.525854e-05   2006     32      4.704668
94.40966    2.518507e-08   2005     32      7.801207
79.79948    3.535543e-06   2004     32      5.974935
76.59059    9.912913e-06   2003     32      5.573824
56.98146    3.010793e-03   2002     32      3.122683
86.44988    2.244115e-07   2001     31      7.042142
79.80431    2.107774e-06   2000     31      6.198153
71.86353    2.721014e-05   1999     31      5.189674
(Does anyone know how to make nice tables in wordpress blogs?)
This jumbled mess summarizes the results of the G-squared test. Columns one and two are the G-squared test statistic and the respective p-value and column three is the year in which this regular season took place. (Note that the 2009 Super Bowl would correspond to the 2008 regular season.) Column four is the number of teams in the league that year and the last column is the number of standard deviations above the mean that the test statistic is. (p-value would be the best way to compare seasons, but the scale of p-value is difficult to visualize, so the graphs use standard deviation. Also the test statistics cannot be directly compared to one another because they have distributions with differing degrees of freedom.)
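For anyone who wants to check the last column, here is a small sketch. A chi-square variable with k degrees of freedom has mean k and variance 2k, and the reported values are reproduced when k is taken to be the number of teams in the league that season. That choice of k is an inference from the numbers in the table, so treat it as an assumption. The name `sds_above_mean` is hypothetical.

```python
import math

# (G-squared, season, number of teams), copied from the table above.
RESULTS = [
    (95.62900, 2008, 32),
    (97.11005, 2007, 32),
    (69.63735, 2006, 32),
    (94.40966, 2005, 32),
    (79.79948, 2004, 32),
    (76.59059, 2003, 32),
    (56.98146, 2002, 32),
    (86.44988, 2001, 31),
    (79.80431, 2000, 31),
    (71.86353, 1999, 31),
]

def sds_above_mean(g2, k):
    """Standardize g2 against a chi-square with mean k and variance 2k.

    ASSUMPTION: k is the number of teams, inferred from the reported column.
    """
    return (g2 - k) / math.sqrt(2 * k)

scores = {season: sds_above_mean(g2, teams) for g2, season, teams in RESULTS}

# Seasons ordered from most entropy (least parity) to least entropy.
ranked = sorted(scores, key=scores.get, reverse=True)

for season in ranked:
    print(season, round(scores[season], 3))
```

Sorting on this standardized value puts 2007, 2008, and 2005 at the high-entropy end and 2002, 2006, and 1999 at the high-parity end.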
Over the past ten years, using the p-values of the G-squared statistic, the years with the most entropy were 2005, 2007, and 2008. The years with the most parity were 2002, 1999, and 2006.
Time for some pictures.
The first graph is a plot of NFL season versus what I am calling entropy (the number of standard deviations above the mean of the distribution of the test statistic). I have also labeled each year with the Super Bowl champion and the number of wins they had in the regular season. Notice that over the last four years we observe the three highest amounts of entropy.
Further note 2002. Let’s take a closer look at this year. There were no teams with 13 wins and every team had at least two wins. This is compared to 2007, when there were FOUR teams with at least THIRTEEN wins (Dallas, Green Bay, Indianapolis, and undefeated New England) and one team (Miami) with only one win. The histograms of wins in the 2002 and 2007 seasons are below.
Look how tightly bunched the 2002 teams are in the middle and compare that with the 2007 season.
One last picture. The histogram of the 2008 NFL season.
According to my measure of entropy, the level of parity in the NFL over three of the last four seasons has been very low. There does not, however, appear to be any upward trend in the amount of parity; rather, it seems as if the level of parity in the league varies trendlessly from year to year.
2002 was a season with unusually high parity with many teams finishing with similar records.
One final thought: I would argue that parity is bad for the league. When there is no standout team, it is difficult to market exciting games. Isn’t it more compelling to watch a play-off game featuring teams who absolutely dominated the regular season (think Green Bay, Dallas, New England, and Indianapolis from 2007) than a slugfest of mediocrity between two teams that made it into the play-offs by default (I’m looking at you, Arizona and San Diego)? When parity is high, everyone is mediocre, but someone has to win by default. When there is high entropy, good teams exist. I’ll take the latter any day of the week.