BCS Methods Reviewed: An overview of the math and the “math” of the BCS.


Every year, undergraduate super fans follow every pass, every defensive stop, and every concussion of the season attempting to will their team to a national championship.  Alas, it is not meant to be for most.  But fear not, for the powers that be have provided an ample, perennial scapegoat for each and every fan of all of the also-rans: The BCS.

Now, I have no problem with anyone blaming the BCS.  I’d even argue that part of the excitement of college football is arguing about the stupidity of the BCS system (But it’s still absurd), and, apparently, getting fans to argue is a fantastic way to make money (Lots of money, in fact, on the backs of essentially unpaid labor).

But what exactly is the BCS?  It’s some combination of humans and computers getting together in a black box and spitting out some sort of ranking system that is used to decide which schools go to the BCS bowls and which schools spent way too much money to end up in the Papa John’s Pizza Bowl.  Of course, all of this is going away for the 2014-15 season, so this is our last year of BCS “joy”.  That also means it’s my last chance to really take a look at how the BCS sausage is made.


So we entered a Kaggle contest…


…and we won.

Originally posted on StatsbyLopez:

The website Kaggle sponsored a March Machine Learning Mania contest over the last few months, which involved picking probabilities for every hypothetical NCAA 2014 tournament game.

Points were awarded, or taken away, given how well each submission’s probabilities fared, relative to everyone else in the pool (for my statistics-oriented readers, it used the loss function from logistic regression). So, if you picked Florida over Albany with probability 0.80, while Florida won, you would’ve lost ground because the majority of entries had the Gators winning with roughly a 95% probability. Meanwhile, if you picked Ohio State over Dayton with probability 0.55, the Dayton win would’ve helped your entry out, with most folks having OSU as a roughly 80% favorite.
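To make the scoring concrete, here is a minimal sketch of the per-game loss, assuming the contest scored each game with the usual log loss (the negative log of the probability assigned to what actually happened; lower is better). The function name `log_loss` is mine, and the field probabilities of 0.95 and 0.80 are the rough figures quoted above, not exact contest data.

```python
import math

def log_loss(p_win, team_won):
    """Loss for one game: -ln of the probability given to the actual outcome."""
    p_outcome = p_win if team_won else 1.0 - p_win
    return -math.log(p_outcome)

# Florida beat Albany: a 0.80 pick is penalized more than the field's ~0.95.
florida_pick = log_loss(0.80, True)    # about 0.223
florida_field = log_loss(0.95, True)   # about 0.051

# Dayton beat Ohio State: a 0.55 pick on OSU loses far less than the field's ~0.80.
osu_pick = log_loss(0.55, False)       # about 0.799
osu_field = log_loss(0.80, False)      # about 1.609

print(florida_pick - florida_field)    # positive: ground lost on this game
print(osu_pick - osu_field)            # negative: ground gained on this game
```

A submission's overall score would then be the average of these per-game losses, which is why a hedged pick on an upset can make up for several slightly timid picks on favorites.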

Kaggle was kind enough to release histograms of the picks for all submissions, like these ones for the Elite 8 games here (alongside our eventual submissions):


For example, in the top left, the average…


Job news

Big news!  I have accepted an assistant professor position in the Department of Mathematics at Loyola University Chicago.  Everything is now official, and my wife, the dogs, and I will be moving to Chicago in August.



[Image: Screen Shot 2014-04-21 at 8.38.18 AM]


I’ll be there on Friday.  Come check out my art!


Stepwise Variable Selection

In the regression class that I am teaching, we recently got to the topic of variable selection.  We covered the standard forward, backward, and stepwise selection procedures, and I then went out of my way to caution the students about relying on them.  In trying to explain this, I believe that I have come up with a fantastic analogy:

Automated variable selection techniques are like putting together the border of a puzzle:  It’s a great place to start, but you’ve still got a long way to go to be finished.  -StatsInTheWild
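For anyone who hasn’t seen one of these procedures spelled out, here is a minimal sketch of forward selection in plain Python. It assumes ordinary least squares with an intercept and uses AIC as the criterion for adding a variable; that criterion is my choice for the sketch (p-value thresholds are just as common in textbooks), and the function names and the toy data are made up for illustration.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def rss(X, y, cols):
    """Residual sum of squares for OLS of y on an intercept plus the chosen columns."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    k = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def aic(X, y, cols):
    """Gaussian AIC up to a constant: n * ln(RSS / n) + 2 * (number of coefficients)."""
    n = len(y)
    return n * math.log(rss(X, y, cols) / n) + 2 * (len(cols) + 1)

def forward_select(X, y):
    """Greedily add the column that lowers AIC the most; stop when none does."""
    selected, remaining = [], list(range(len(X[0])))
    best = aic(X, y, selected)
    while remaining:
        score, col = min((aic(X, y, selected + [c]), c) for c in remaining)
        if score >= best:
            break
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected

# Toy data (made up): y is roughly 2*x0 + 3*x1 plus small noise; x2 is unrelated.
x0 = [1, 2, 3, 4, 5, 6, 7, 8]
x1 = [1, -1, 1, -1, 1, -1, 1, -1]
x2 = [3, 1, 4, 1, 5, 9, 2, 6]
X = [list(row) for row in zip(x0, x1, x2)]
y = [5.2, 0.9, 9.1, 4.8, 12.9, 9.2, 17.1, 12.8]

selected = forward_select(X, y)
print(selected)  # columns in the order they were added
```

On this toy data the first two picks are the columns that actually drive y. That is the border of the puzzle; whether anything added after that belongs in the model is still a judgment call that the procedure can’t make for you.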



Paying NCAA players

What if I set up an organization that collected money for college basketball players and then paid them out when they left the NCAA?  It would work like this:

  • An account is set up for each player in the NCAA.
  • Fans can donate to a player or an entire team.
  • Once a player leaves the NCAA, I write them a check for the full balance of their account.  They can do whatever they want with the money.

Not that I would, but how could the NCAA stop me from doing this?



From this article:

However, Cukierski thinks that the use of big data has become too trendy. “The whole big data idea is really within a big hype cycle,” mainly driven by a particular software framework for dealing with information, called Hadoop. “It’s not that Hadoop isn’t useful,” Cukierski says, but when companies look to it to solve small problems, “the people who are data scientists and actually statistically literate are kind of laughing, because you don’t need Hadoop to do most problems.”



Kaggle Update


Well, we’ve narrowed the gap to about 0.003, but I still don’t think we can possibly catch Grimp with one game left to go.  Our best bet for the money at this point is a disqualification for some reason.

[Image: Screen Shot 2014-04-06 at 11.54.00 AM]

Cheers.


Cinderella Plots 2014

I introduced the Cinderella plot a few years ago, but I haven’t updated it for this year’s tournament until now.  (I’ve been busy with the job search, which was successful(!); more to come on that.)  So without further delay, I present to you the Cinderella plot for all of the NCAA tournaments from 2002 through 2014:

[Figure: CinderellaPlot2001-2014]


As you can see, two 5 seeds (Indiana and Butler) and even an 8 seed (Butler, again) have made it to the final game in the past 13 years.  But they all played a team seeded 3 or better.  Never in the last 13 years, or ever for that matter, have the seeds for the teams in the finals been this high.  We’ve got a 7 seed versus an 8 seed, though I’d argue that both of these teams are better than their seed.  That’s not to say that they didn’t get a seed that they DESERVED, which is based on what has actually happened in the season.  But that’s a little bit different than projecting what they are capable of in the future.  Based on that, it’s not really THAT surprising that either of these teams is in the final game.  In fact, I had both of them ranked in the top 25 at the end of the season: Kentucky at 13 and UConn at 21.  (Even though I thought Kentucky should have been a 7 seed and UConn an 8.)


Kaggle Exercise



With eleven games to play in the NCAA basketball tournament, we (me and @statsbylopez) find our team in second place in the March Machine Learning Mania contest, a mere 0.00365 behind the leader, Grimp Whelkin.  So, we’re 0.00365 away from $15K.


[Image: Screen Shot 2014-03-28 at 11.37.28 AM]


What’s really impressive, though, is that both of our entries are doing so well.  Our best entry is currently at 0.47589 and good for second place, but our other entry is at 0.48081, which would STILL BE GOOD FOR SECOND PLACE.  (Maybe we’re on to something here?)  This is good news as we have TWO realistic ways to win this $15K.  One of our models is big on Virginia and the other is big on Michigan State.  It’s possible that our second submission actually moves ahead of our first and becomes our scoring submission.

[Image: Screen Shot 2014-03-28 at 11.35.18 AM]

I’m not sure I can handle finishing second for $15K.  I’d rather have finished 100th just so I don’t have to worry so much about this (that’s not true of course.  I’m gonna brag about this forever.)


I do hope Kaggle will release winning scenarios for the top 10 prior to the Final Four so that we can at least use that info to hedge our bets.  Though I’d guess they aren’t going to do this.




Why do I keep writing about Field Goals?

Twitterer @brentonk alerted me to the following Grantland article written by the only man in the world to block me on Twitter, Bill Barnwell.  The following excerpt is from the article (emphasis added):

It would also have some interesting effects on kicker value. In a way, it might seem like it should make kickers more valuable by virtue of giving them more opportunities to make meaningful kicks. The average team attempts about 40 extra points each year, so the difference between a kicker who hits 90 percent of his extra points from the 25-yard line and a kicker who hits 70 percent on the same attempts would be eight extra points per season. On the other hand, we also know there’s no year-to-year consistency for a kicker’s field goal percentage, and that’s likely to be the case for these 40 additional extra-point attempts each year, too. So while teams might pay more for the security of a reliable kicker, they’ll still be just as unlikely to end up with one.

I’ve written about this before: we shouldn’t expect a kicker’s field goal percentage to be consistent from year to year, because field goals are taken from different distances.  You can see just how much the distances of place kickers’ attempts vary from year to year with this sweet Shiny app that I made.
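Here is a quick sketch of why the mix of distances matters. It assumes a kicker whose make probability at each distance never changes; the probabilities and the attempt counts below are hypothetical numbers I made up for illustration, not real NFL data.

```python
# Hypothetical make probabilities by distance bucket (yards); not real NFL figures.
MAKE_PROB = {"20-29": 0.97, "30-39": 0.90, "40-49": 0.75, "50+": 0.55}

def expected_fg_pct(attempts):
    """Expected field goal percentage given the number of attempts per bucket."""
    total = sum(attempts.values())
    makes = sum(n * MAKE_PROB[bucket] for bucket, n in attempts.items())
    return makes / total

# Same kicker, same skill, two made-up seasons of 30 attempts each.
season_a = {"20-29": 12, "30-39": 10, "40-49": 6, "50+": 2}   # mostly short kicks
season_b = {"20-29": 5, "30-39": 8, "40-49": 10, "50+": 7}    # mostly long kicks

pct_a = expected_fg_pct(season_a)   # about 0.875
pct_b = expected_fg_pct(season_b)   # about 0.780
print(round(pct_a, 3), round(pct_b, 3))
```

Nothing about the kicker changed between the two seasons, yet his expected percentage moves by almost ten points just because his coach asked him to try longer kicks.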

Finally, as I have written before, even if you do control for distance, I find no evidence that there is any significant variability within kickers between years.




