Category Archives: Sports
NCAA Basketball top 25 (in the wild)
(Bold indicates a Sweet Sixteen team.)
StatsInTheWild top 25:
1. Kansas
2. Kentucky
3. Syracuse
4. West Virginia
5. Villanova
6. New Mexico
7. Duke
8. Temple
9. Kansas State
10. Georgetown
11. Baylor
12. Pittsburgh
13. Purdue
14. Texas A&M
15. Tennessee
16. Texas
17. Butler
18. Vanderbilt
19. Ohio State
20. Marquette
21. Maryland
22. Richmond
23. Northern Iowa
24. Xavier
25. BYU
Inexplicable regular season losses:
Penn beat Cornell 79-64
Indiana beat Pittsburgh 74-64
Brown beat Princeton 57-54
Evansville beat Northern Iowa 55-54
Cheers.
2 Point Conversions (in the wild)
This past weekend Navy attempted a 2 point conversion in their game against Ohio State. The conversion failed, and they ultimately lost the game. Clearly, in that situation, the 2 point conversion was necessary, but what about other situations that are more complicated? Luckily, someone else has already thought about this.
So, in the spirit of Navy attempting a two point conversion in their game against Ohio State, the Stats in the Wild blog presents one of its favorite articles: Refining the Point(s)-After-Touchdown Decision by Harold Sackrowitz.
Cheers.
Unassisted Triple Plays (in the wild)
Eric Bruntlett’s unassisted triple play ended the Phillies-Mets game last night. It was the 15th unassisted triple play in MLB history.
From 1909 to 1927, there were 7 unassisted triple plays, and since 1992 there have been 7 more. The striking thing here is that between 1927 and 1992, a span of 64 seasons, there was only one unassisted triple play. In the early part of the 20th century there were 7 in 19 years, about 1 every 2.71 years. In the most recent 18 years, there have been 7, for a rate of 1 every 2.57 years. Then for the entire middle of the 20th century there was one in 64 seasons.
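As a quick check on those rates, here is a minimal Python sketch. The exact season boundaries (1909-1927, 1928-1991, 1992-2009) are my reading of the counts above, and the constant-rate Poisson calculation at the end is only a back-of-the-envelope assumption, not a formal test (the quiet stretch was picked out after looking at the data, so the probability overstates the surprise).

```python
import math

# (number of seasons, unassisted triple plays) per era.
# Era boundaries are an assumption chosen to match the season counts in the post.
eras = {
    "1909-1927": (1927 - 1909 + 1, 7),   # 19 seasons
    "1928-1991": (1991 - 1928 + 1, 1),   # 64 seasons
    "1992-2009": (2009 - 1992 + 1, 7),   # 18 seasons
}

def seasons_per_play(seasons, plays):
    """Average number of seasons between unassisted triple plays."""
    return seasons / plays

for era, (seasons, plays) in eras.items():
    print(era, round(seasons_per_play(seasons, plays), 2))

# Hypothetical model: plays arrive as a Poisson process with one constant rate.
total_seasons = sum(s for s, _ in eras.values())
total_plays = sum(p for _, p in eras.values())
expected_middle = total_plays / total_seasons * 64

# Probability of seeing at most one play in 64 seasons under that model.
p_at_most_one = math.exp(-expected_middle) * (1 + expected_middle)
print(round(expected_middle, 1), p_at_most_one)
```

Under that assumed constant rate, the middle era would be expected to produce roughly nine or ten of these plays rather than one.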
Here is a graph of index number of triple play versus the year in which it took place:

Does anyone have any explanation as to how this could happen? All I can think of is that it might be possible that hit and runs were more popular during the periods of high triple play prevalence, making it easier to turn one. If anyone has any thoughts on this I would love to hear them.
Cheers.
Chernoff faces (in the wild)
I took a multivariate class several years ago, and towards the end of the semester the professor showed us Chernoff faces. I was thinking about them for some reason tonight, so I figured I’d do a search for Chernoff faces on the internet. Here is an interesting application of Chernoff faces to MLB managers. This got me excited, so I did a Google search for “R faces” hoping to find an R package for Chernoff faces. However, this search yielded this web site. The headline on that web site says: “French rapper Monsieur R faces up to three years in prison and a 75,000-euro fine for referring to France as a ‘slut’ and a ‘bitch’ and saying ‘I piss on Napoleon and General de Gaulle’ on his latest album.” Not quite what I was looking for, but completely fantastic. God Bless the internet.
After another quick search, I downloaded this R Package. And now I’ve spent all night “Chern”-ing out Chernoff Faces.
Here is one for a few selected MLB hitters:

Here is one for a few selected MLB pitchers:

And here is my favorite about the economy (Data is from here):

I especially like how the Chernoff faces get smaller and actually appear to get sadder as the economy worsens. I guess it’s so bad that even Chernoff’s faces are feeling the recession.
Cheers.
NCAA Sweet 16 picks in the wild.
After re-evaluating, here are my picks after the first two rounds of the NCAA tournament:
Elite 8: UConn, Pitt, Memphis, Villanova, Louisville, Oklahoma, Michigan State, UNC
My final four is the same: Memphis, Louisville, Pitt, and UNC
Finals: Memphis versus Pitt
Winner: Memphis
Good picks for the Sweet 16 games: (Winners are in bold, losers are in italics.)
Villanova +120 (I have them as a favorite in this game).
Michigan State -1.5 at -110
Gonzaga +350 (They’ll probably lose, but this price is fantastic.)
Pitt -320 (Added 3/26/2009 11:28 am)
Results: 3-1. A bet of “100” on each of these games yields a profit of 142.16 for a 35.54% return.
For the tournament I am 8-5 with a 19.82% return per bet.
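The returns quoted here follow from the usual American-odds payout rule, sketched below in Python. The win/loss flags are my assumption: the bold and italic markup is not visible in this text, so they are chosen to match the reported 3-1 and 5-4 records, and the first-round list is the one from the earlier gambling post. A point-spread bet (Michigan State -1.5 at -110) is treated as a plain bet at its price.

```python
def profit(american_odds, won, stake=100.0):
    """Profit on a single bet at American odds; a losing bet costs the stake."""
    if not won:
        return -stake
    if american_odds > 0:
        # Underdog: +120 pays 120 on a 100 stake.
        return stake * american_odds / 100.0
    # Favorite: -320 pays 100 on a 320 stake.
    return stake * 100.0 / -american_odds

def summarize(bets, stake=100.0):
    """Return (total profit, return per unit staked) for a list of (odds, won)."""
    total = sum(profit(odds, won, stake) for odds, won in bets)
    return total, total / (stake * len(bets))

# Assumed outcomes (see note above): Villanova, Michigan State, Gonzaga, Pitt.
sweet16 = [(120, True), (-110, True), (350, False), (-320, True)]

# Assumed outcomes for the nine first-round bets, in the order they were posted.
first_round = [(-220, True), (-145, False), (-110, False), (-200, False),
               (-200, True), (190, True), (115, True), (115, True), (-160, False)]

for label, bets in (("Sweet 16", sweet16),
                    ("First round", first_round),
                    ("Tournament", first_round + sweet16)):
    total, rate = summarize(bets)
    print(f"{label}: profit {total:.2f}, return {rate:.2%}")
```

With those assumed outcomes the sketch lands on the same figures as the post: 142.16 for the Sweet 16, 115.45 for the first round, and 19.82% per bet overall.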
Check out my results from the first round.
NIT predictions:
San Diego State beats St. Mary’s tonight.
Notre Dame beats Kentucky.
Then, San Diego State beats Baylor to go to the Finals, and Penn State beats Notre Dame for their spot in the finals.
And I’m switching to Penn State as my pick for the NIT champion.
Good luck.
And.
Cheers.
Gambling on Sports in the Wild
Of course, we’re not gambling with American dollars. That would be illegal, so we use “standard betting units” here. (1 “standard betting unit” is approximately 1 American dollar. But it is definitely NOT an American dollar.)
These are from my previous post, written before the first round started. Winners are in bold, losers are in italics:
Good first round bets:
Washington -220
FSU -145
Utah -110
Illinois -200
Arizona St -200
Michigan +190 (This price is fantastic)
Texas A&M +115
Oklahoma St +115 (Seriously? They’re an underdog here?)
Ohio St -160
If you bet 100 “standard betting units” per game, you would have gone 5-4 and be up 115.45 “standard betting units” at this point. That is a 12.8% return per bet. Not bad.
Cheers.
NCAA basketball in the wild
It’s that time of year again where I care about college basketball for three weeks and then forget it even exists.
I have collected data on all of the games played so far (through last night’s 6OT thriller) and I use some standard modeling techniques to come up with a ranking. Last year I accurately predicted all four of the Final Four (but so did everyone in America), both teams in the finals, and the champion. I also went 29 of 32 in the first round and my bracket finished in the top 1% of all brackets on Yahoo. (Of course, all that means is that I will have a dismal year this year.)
Right now (2:17 pm EDT Friday 3/13/2009) here are my thoughts on the NCAA tournament (These will change as more games come in):
Go ahead and compare my predictions with ESPN’s, CBS’s, and Sports Illustrated’s picks (Of course some of those predicted brackets are old).
Try to bear in mind that I am not trying to predict what the committee will do, I am trying to predict what the committee should do. I don’t really think the selection committee will seed West Virginia as a four. And I have a hunch that Penn State will actually get in, but I’ll definitely pick them to lose first round.
The four number 1 seeds should be: North Carolina, Pittsburgh, Memphis, and UConn
With the 2 seeds going to Duke, Louisville, Michigan St., and Wake Forest
Three seeds: Washington, Gonzaga, Oklahoma, Syracuse
Four Seeds: Villanova, Missouri, UCLA, West Virginia
Last four teams in: Maryland, Auburn, Boston College, Tennessee
First four teams out: Florida, Virginia Tech, New Mexico, Providence
Next four teams out: USC, Northwestern, Penn St, Kentucky
Conference Breakdown:
ACC – 7
Big East – 7
Big Ten – 7
Big 12 – 6
Pac 10 – 5
SEC – 3
Mountain West – 3
Horizon League – 2
Over-rated teams that will get in: LSU and Butler
Under-rated teams that will get in: West Virginia, Illinois, Utah, BYU and Texas
Once the dust settles, I’ll re-evaluate and post my picks on Tuesday or Wednesday of next week.
Cheers.
Lou Piniella in the Wild
I was reading some past “Stats of the Day” over at Baseball Reference and I stumbled onto this factoid:
Lou Piniella is the only player (1956-2008) to have more than 100 career home runs without a single multi-homer game.
The entire post can be found here.
Cheers.
Stats in the NFL: Part 2 (in the wild)
In a previous post, I talked about parity in the NFL (or lack of parity) between teams. What about parity between divisions? What we want to do is test the null hypothesis that all divisions are the same (i.e., they will end the season with the same number of wins). To do this we can use the Pearson Chi-square test statistic.
In the table below, I have the year in the first column, the value of the test statistic in the second column, and the p-value in the third column. In none of these years do we reject the null hypothesis of even strength between the divisions at the commonly used alpha=.05 level. However, we can use the p-values to look at relative parity among the divisions. A small p-value (which in this case corresponds to a large test statistic) indicates that there is not parity among the divisions.
Year   Test statistic   p-value
2008   12.765625        0.07802902
2007    8.937500        0.25717451
2006    1.250000        0.98972973
2005    2.437500        0.93172939
2004    3.312500        0.85466795
2003    1.062500        0.99375635
2002    4.078125        0.77073626
Let’s take a look at some of the interesting years.
In 2003, the win totals of the divisions were 36, 29, 34, 31, 31, 31, 31, 33. The first four are the AFC East, North, South, and West respectively. The last four are the NFC in the same order. The best division in the league only won 7 more games than the worst division.
Let’s compare that with 2008. The win totals by division were: 38, 31.5, 38, 23, 38.5, 25, 40, 22 (again in the same order as before). The difference between the best division and the worst division here is 18 games. And look at the miserable AFC and NFC West: 23 and 22 wins respectively. That should win those divisions a collective award for futility. From 2002 (when the NFL moved to 8 divisions of 4 teams) through 2007, the fewest wins for a division was 25. Two divisions managed to break that record in ONE season. And a third division tied that record.
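The 2003 and 2008 rows of the table can be reproduced from the division win totals quoted above. Here is a minimal sketch in Python using only the standard library. It assumes the expected count for every division is the league average (32 wins), and it uses a closed form for the upper tail of a chi-square distribution that holds only for 7 degrees of freedom (8 divisions minus 1). The function names are mine, not from any package.

```python
import math

def pearson_chi_square(wins):
    """Goodness-of-fit statistic against equal expected wins per division."""
    expected = sum(wins) / len(wins)
    return sum((w - expected) ** 2 / expected for w in wins)

def chi2_upper_tail_7df(x):
    """P(X > x) for a chi-square variable with exactly 7 degrees of freedom.

    Closed form for odd degrees of freedom:
    erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * (x^(1/2) + x^(3/2)/3 + x^(5/2)/15),
    where phi is the standard normal density.
    """
    root = math.sqrt(x)
    normal_pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    series = root + root ** 3 / 3 + root ** 5 / 15
    return math.erfc(root / math.sqrt(2)) + 2 * normal_pdf * series

# Division win totals from the post (AFC East, North, South, West, then NFC).
wins_2003 = [36, 29, 34, 31, 31, 31, 31, 33]
wins_2008 = [38, 31.5, 38, 23, 38.5, 25, 40, 22]

for year, wins in ((2003, wins_2003), (2008, wins_2008)):
    stat = pearson_chi_square(wins)
    print(year, stat, round(chi2_upper_tail_7df(stat), 4))
```

The statistics come out to 1.0625 and 12.765625, with p-values of roughly 0.994 and 0.078, matching the table.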
Let’s put into perspective how bad the AFC and NFC West were this year. The Detroit Lions went 0-16. Zero wins. And their division managed 25 wins this season: 2 more than the AFC West and 3 more than the atrocious NFC West. The West divisions are terrible.
That is why it makes me so angry that both the Chargers and the Cardinals won play-off games this past weekend. Will someone remind the Chargers that they started the season 4-8 and are champions of the woeful AFC West? And will someone remind the Cardinals that they are the Cardinals?
Not convinced the NFC West sucks? How about this. The net points for the teams in the NFC West this season were:
Rams: -233
Seahawks: -98
49ers: -42
and Cardinals: (drumroll please……………….) 1.
Let me repeat that. The Arizona Cardinals, winners of an actual NFL division and winners of a play-off game this year, outscored their opponents in the regular season 427 to 426.
Actually, upon further inspection, in 2006 two teams made the play-offs with negative net points. The Giants were outscored by their opponents by 7 points, and the Seahawks won the NFC West with a net total of -6 points on the season. In fact, every team in the NFC West that year had negative net points. That is fairly impressive.
So congratulations to the Arizona Cardinals and the San Diego Chargers. Champions of futility. And also play-off winners.
Cheers.
Stats in the NFL: Part 1 (in the wild)
So the Jets blew it. They were 8-3 and they finished the season 1-4 to miss the playoffs at 9-7.
I was reading this article on the Freakonomics blog, which gives these quarterback numbers:
Attempts/completions: 175/98
Passing yards: 1,011
Touchdowns: 2
Interceptions: 9
Sacks: 9
Passer rating: 55.4
This is terrible. Make the guy retire. He WAS great. WAS.
So anyway, the Jets probably should have made the play-offs, but they just lost too many games. The Patriots, on the other hand, probably just should have made the play-offs. I know there are rules and criteria, but do the Cardinals (who got massacred by the Patriots) or the Chargers (who started the season 4-8) really deserve to be in the play-offs over the Patriots?
Probably not, but the rules are what they are. There just aren’t any good teams (or even mediocre teams) in either the AFC or NFC West this year.
This got me thinking. Don’t they always talk about parity in the NFL? How many commentators every week do you hear throwing around parity this and parity that? Well, where was the parity this year? Is there less parity in the league now than in past years? How can we measure parity?
Let’s start with what a good measure of parity would be. If there were perfect parity in the league, every team would finish 8-8. This would be like flipping a coin to decide every game. (Not exactly the most compelling sports league.) The opposite (teams deviate from a record of 8-8), a lack of parity, can be thought of as entropy. Most notably, over the last two seasons we have had a 16 win team and a 0 win team. Clearly, these teams were significantly better and worse, respectively, than the other teams in the league. So how can we measure parity? Let’s put all 32 teams into a 32 by 2 contingency table: 32 rows, one for each team, and 2 columns, one for wins and one for losses. (This leads to fixed row and fixed column totals.)
We wish to test the null hypothesis that there is no association between the rows and columns (i.e., the team you play for has nothing to do with the number of wins you get). Clearly this is not true and we will always reject the null hypothesis of no association, but we can compare by how much we reject the null hypothesis, namely the p-value. The smaller the p-value, the less parity in the league. While this is not a perfect measure, it’s an interesting start.
Since we have fixed row and column totals, normally this would lead to using Fisher’s exact test. However, with a 32 by 2 table this is computationally very intensive. Thus as an alternative we can use the Pearson Chi-Square test statistics and the G-squared test statistics. Here I report both statistics. I am going to rely more heavily on the G-squared statistics because of its close relationship with entropy.
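To make the two statistics concrete, here is a minimal Python sketch of how they could be computed from a list of team win totals. This is not the code behind the numbers in this post; the function name is hypothetical, ties are ignored, and a 16-game schedule is assumed. The two example leagues are made up purely for illustration.

```python
import math

def parity_statistics(wins, games=16):
    """Return (Pearson chi-square, G-squared) for a teams-by-(wins, losses) table.

    Expected counts come from the usual independence rule:
    row total * column total / grand total.
    """
    total_games = len(wins) * games
    total_wins = sum(wins)
    column_totals = (total_wins, total_games - total_wins)
    chi_square = 0.0
    g_squared = 0.0
    for w in wins:
        for observed, column_total in zip((w, games - w), column_totals):
            expected = games * column_total / total_games
            chi_square += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell contributes 0 to G-squared
                g_squared += 2 * observed * math.log(observed / expected)
    return chi_square, g_squared

# Hypothetical league with perfect parity: every team finishes 8-8.
print(parity_statistics([8] * 32))

# Hypothetical four-team league with one 16-0 team and one 0-16 team.
print(parity_statistics([16, 0, 8, 8]))
```

The all-8-8 league gives zero for both statistics, and every step away from 8-8 pushes both upward, which is the sense in which larger values mean less parity.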
G-squared   p-value        Year   Teams   SDs above mean
95.62900    1.640269e-08   2008   32      7.95362
97.11005    9.714734e-09   2007   32      8.138756
69.63735    8.525854e-05   2006   32      4.704668
94.40966    2.518507e-08   2005   32      7.801207
79.79948    3.535543e-06   2004   32      5.974935
76.59059    9.912913e-06   2003   32      5.573824
56.98146    3.010793e-03   2002   32      3.122683
86.44988    2.244115e-07   2001   31      7.042142
79.80431    2.107774e-06   2000   31      6.198153
71.86353    2.721014e-05   1999   31      5.189674
(Does anyone know how to make nice tables in WordPress blogs?)
This jumbled mess summarizes the results of the G-squared test. Columns one and two are the G-squared test statistic and the respective p-value and column three is the year in which this regular season took place. (Note that the 2009 Super Bowl would correspond to the 2008 regular season.) Column four is the number of teams in the league that year and the last column is the number of standard deviations above the mean that the test statistic is. (p-value would be the best way to compare seasons, but the scale of p-value is difficult to visualize, so the graphs use standard deviation. Also the test statistics cannot be directly compared to one another because they have distributions with differing degrees of freedom.)
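The last column can be recovered from the first and fourth. A chi-square variable with k degrees of freedom has mean k and variance 2k, and the values in the table match when k is set to the number of teams in the league that year. That choice of k is inferred from the arithmetic rather than stated anywhere above (a 32 by 2 table would conventionally have 31 degrees of freedom), so treat this Python sketch as a reconstruction under that assumption.

```python
import math

def sds_above_mean(statistic, n_teams):
    """Standardize a test statistic against a chi-square reference distribution.

    Assumption: degrees of freedom = number of teams, so mean = n_teams
    and standard deviation = sqrt(2 * n_teams).
    """
    return (statistic - n_teams) / math.sqrt(2 * n_teams)

# (G-squared, year, teams, reported standard deviations) copied from the table.
rows = [
    (95.62900, 2008, 32, 7.95362),
    (97.11005, 2007, 32, 8.138756),
    (69.63735, 2006, 32, 4.704668),
    (94.40966, 2005, 32, 7.801207),
    (79.79948, 2004, 32, 5.974935),
    (76.59059, 2003, 32, 5.573824),
    (56.98146, 2002, 32, 3.122683),
    (86.44988, 2001, 31, 7.042142),
    (79.80431, 2000, 31, 6.198153),
    (71.86353, 1999, 31, 5.189674),
]

for g_squared, year, teams, reported in rows:
    print(year, round(sds_above_mean(g_squared, teams), 4), reported)
```

Standardizing this way is what lets the 31-team seasons (1999-2001) sit on the same scale as the 32-team seasons.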
Over the past ten years, using the p-values of the G-squared statistic the years with the most entropy were 2005, 2007, and 2008. The years with the most parity were 2002, 1999, and 2006.
Time for some pictures.
The first graph is a plot of NFL season versus what I am calling entropy (the number of standard deviations above the mean of the distribution of the test statistic). I have also labeled each year with the Super Bowl champion and the number of wins they had in the regular season. Notice that over the last four years we observe the three highest amounts of entropy.

Further, note 2002. Let’s take a closer look at this year. There were no teams with 13 or more wins and every team had at least two wins. Compare this to 2007, when there were FOUR teams with at least THIRTEEN wins (Dallas, Green Bay, Indianapolis, and undefeated New England) and one team (Miami) with only one win. The histograms of wins in the 2002 and 2007 seasons are below.


Look how tightly bunched the 2002 teams are in the middle and compare that with the 2007 season.
One last picture. The histogram of the 2008 NFL season.

Conclusions:
According to my measure of entropy, the level of parity in the NFL over three of the last 4 seasons has been very low. There does not, however, appear to be any upward trend in the amount of parity; rather, it seems as if the level of parity in the league varies trendlessly from year to year.
2002 was a season with unusually high parity with many teams finishing with similar records.
One final thought: I would argue that parity is bad for the league. When there is no standout team, it is difficult to market exciting games. Isn’t it more compelling to watch a play-off game featuring teams who absolutely dominated the regular season (think Green Bay, Dallas, New England, and Indianapolis from 2007) than a slugfest of mediocrity between two teams that made it into the play-offs by default (I’m looking at you, Arizona and San Diego)? When parity is high, everyone is mediocre, but someone has to win by default. When there is high entropy, good teams exist. I’ll take the latter any day of the week.