BCS Methods Reviewed: An overview of the math and the “math” of the BCS.


Every year, undergraduate super fans follow every pass, every defensive stop, and every concussion of the season attempting to will their team to a national championship.  Alas, it is not meant to be for most.  But fear not, for the powers that be have provided an ample, perennial scapegoat for each and every fan of all of the also-rans: The BCS.

Now, I have no problem with anyone blaming the BCS.  I’d even argue that part of the excitement of college football is arguing about the stupidity of the BCS system (But it’s still absurd), and, apparently, getting fans to argue is a fantastic way to make money (Lots of money, in fact, on the backs of essentially unpaid labor).

But what exactly is the BCS?  It’s some combination of humans and computers getting together in a black box and spitting out some sort of ranking system that is used to decide which schools go to the BCS bowls and which schools spent way too much money to end up in the Papa John’s Pizza Bowl.  Of course, all of this is going away for the 2014-15 season, so this is our last year of BCS “joy”.  That also means it’s my last chance to really take a look at how the BCS sausage is made.


Stepwise Variable Selection

In the regression class that I am teaching, we recently got to the topic of variable selection.  We covered the standard forward, backward, and stepwise selection procedures, and I then went out of my way to caution the students about relying on them.  In trying to explain why, I believe that I have come up with a fantastic analogy:

Automated variable selection techniques are like putting together the border of a puzzle:  It’s a great place to start, but you’ve still got a long way to go to be finished.  -StatsInTheWild
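For readers who have not seen one of these procedures, here is a minimal sketch of forward selection.  It is not the code from the class: the stopping rule (`min_gain`, a minimum relative drop in residual sum of squares) and the simulated data are assumptions made for illustration, and textbook versions usually stop on a p-value, AIC, or BIC instead.

```python
import numpy as np


def rss(design, y):
    """Residual sum of squares from an ordinary least squares fit."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def forward_select(X, y, min_gain=0.05):
    """Greedy forward selection; returns chosen column indices in order.

    min_gain is a hypothetical stopping rule: stop once the best remaining
    candidate lowers the RSS by less than this fraction of the current RSS.
    """
    n, p = X.shape
    intercept = np.ones((n, 1))
    selected = []
    current = rss(intercept, y)  # start from the intercept-only model
    while len(selected) < p:
        best_j, best_rss = None, current
        for j in range(p):
            if j in selected:
                continue
            design = np.hstack([intercept, X[:, selected + [j]]])
            candidate = rss(design, y)
            if candidate < best_rss:
                best_j, best_rss = j, candidate
        if best_j is None or (current - best_rss) < min_gain * current:
            break
        selected.append(best_j)
        current = best_rss
    return selected


# Simulated data: only columns 1 and 3 truly matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 1] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=200)
print(forward_select(X, y))
```

The procedure only ever sees the columns it is handed, so it says nothing about transformations, interactions, or whether the model makes sense.  That is the rest of the puzzle.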



Paying NCAA players

What if I set up an organization that collected money for college basketball players and then paid it out to them when they left the NCAA?  It would work like this:

  • An account is set up for each player in the NCAA.
  • Fans can donate to a player or an entire team.
  • Once a player leaves the NCAA, I write them a check for the full balance of their account.  They can do whatever they want with the money.

Not that I would, but how could the NCAA stop me from doing this?



From this article:

However, Cukierski thinks that the use of big data has become too trendy. “The whole big data idea is really within a big hype cycle,” mainly driven by a particular software framework for dealing with information, called Hadoop. “It’s not that Hadoop isn’t useful,” Cukierski says, but when companies look to it to solve small problems, “the people who are data scientists and actually statistically literate are kind of laughing, because you don’t need Hadoop to do most problems.”



Kaggle Update


Well, we’ve narrowed the gap to about 0.003, but I still don’t think we can possibly catch Grimp with one game left to go.  Our best bet for the money at this point is a disqualification for some reason.

[Screenshot, 2014-04-06 at 11:54 AM]

Cheers.


Cinderella Plots 2014

I introduced the Cinderella plot a few years ago, but I haven’t updated it for this year’s tournament until now.  (I’ve been busy with the job search, which was successful(!)….more to come on that….).  So without further delay, I present to you the Cinderella plot for all of the NCAA tournaments from 2002 through 2014:

[Figure: Cinderella plot]


As you can see, two 5 seeds (Indiana and Butler) and even an 8 seed (Butler, again) have made it to the final game in the past 13 years.  But they all played a team seeded 3 or better.  Never in the last 13 years, or ever for that matter, have the seeds for the teams in the finals been this high.  We’ve got a 7 seed versus an 8 seed, though I’d argue that both of these teams are better than their seed.  That’s not to say that they didn’t get a seed that they DESERVED, which is based on what has actually happened in the season.  But that’s a little bit different than projecting what they are capable of in the future.  Based on that, it’s not really THAT surprising that either of these teams is in the final game.  In fact, I had both of them ranked in the top 25 at the end of the season: Kentucky at 13 and UConn at 21.  (Even though I thought Kentucky should have been a 7 seed and UConn an 8.)


Kaggle Exercise



With eleven games to play in the NCAA basketball tournament, we (me and @statsbylopez) find our team in second place in the March Madness Machine learning contest, a mere 0.00365 behind the leader Grimp Whelkin.  So, we’re 0.00365 away from $15K.


[Screenshot, 2014-03-28 at 11:37 AM]


What’s really impressive, though, is that both of our entries are doing so well.  Our best entry is currently at 0.47589 and good for second place, but our other entry is at 0.48081, which would STILL BE GOOD FOR SECOND PLACE.  (Maybe we’re on to something here?)  This is good news as we have TWO realistic ways to win this $15K.  One of our models is big on Virginia and the other is big on Michigan State.  It’s possible that our second submission actually moves ahead of our best one and becomes our scoring submission.

[Screenshot, 2014-03-28 at 11:35 AM]

I’m not sure I can handle finishing second for $15K.  I’d rather have finished 100th just so I don’t have to worry so much about this (that’s not true of course.  I’m gonna brag about this forever.)


I do hope Kaggle will release winning scenarios for the top 10 prior to the Final Four so that we can at least use that info to hedge our bets.  Though I’d guess they aren’t going to do this.




Why do I keep writing about Field Goals?

Twitterer @brentonk alerted me to the following Grantland article written by the only man in the world to block me on Twitter, Bill Barnwell.  The following excerpt is from the article (emphasis added):

It would also have some interesting effects on kicker value. In a way, it might seem like it should make kickers more valuable by virtue of giving them more opportunities to make meaningful kicks. The average team attempts about 40 extra points each year, so the difference between a kicker who hits 90 percent of his extra points from the 25-yard line and a kicker who hits 70 percent on the same attempts would be eight extra points per season. On the other hand, we also know there’s no year-to-year consistency for a kicker’s field goal percentage, and that’s likely to be the case for these 40 additional extra-point attempts each year, too. So while teams might pay more for the security of a reliable kicker, they’ll still be just as unlikely to end up with one.
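The arithmetic in that excerpt is easy to check.  The sketch below reproduces the eight-point gap and, as an added assumption that is not in the article, treats each kick as an independent attempt with a fixed success rate to show how much a single season of 40 attempts can wobble.  The function names are mine.

```python
import math

ATTEMPTS = 40  # extra-point attempts per team per season, from the excerpt


def expected_makes(rate, n=ATTEMPTS):
    """Expected number of made kicks at a given success rate."""
    return rate * n


def makes_sd(rate, n=ATTEMPTS):
    """Binomial standard deviation of made kicks.

    Assumes independent attempts with a constant success rate,
    which is a simplification introduced here.
    """
    return math.sqrt(n * rate * (1.0 - rate))


gap = expected_makes(0.90) - expected_makes(0.70)
print(f"expected gap: {gap:.1f} points")
print(f"season-to-season sd at 90%: {makes_sd(0.90):.2f} makes")
print(f"season-to-season sd at 70%: {makes_sd(0.70):.2f} makes")
```

Under that assumption, a kicker's season total moves by two or three makes from chance alone.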

I’ve written about this before: we shouldn’t expect a kicker’s field goal percentage to be consistent from year to year, because field goals are taken from different distances.  You can see just how much variability there is in the distances of place kickers’ attempts from year to year with this sweet Shiny app that I made.

Finally,  as I have written before, even if you do control for distance, I find no evidence that there is any significant variability within kickers between years.




The media hates cats and statistical rigor

Here is a paragraph from a recent article in the New York Times entitled “The Evil of the Outdoor Cat” (emphasis added):

And wildlife in this country must share this land with a growing population of about 84 million owned cats, and anywhere from 30 to 80 million feral or stray cats. When all of them do “what’s natural” in a fragmented natural world, it adds up. Using deliberately conservative assumptions, federal researchers recently estimated that free-ranging cats killed about 2.4 billion birds annually in the Lower 48 states, a substantial bite out of the total bird population. Outdoor cats also kill about 12.3 billion small mammals a year — not just the proverbial rats and mice but also chipmunks, rabbits and squirrels — and about 650 million reptiles and amphibians. In some cases, they are pushing endangered species toward extinction.

Those numbers are huge and look very familiar.  In fact, I believe they are from the 2013 article in Nature Communications by Dr. Scott Loss called “The impact of free-ranging domestic cats on wildlife of the United States”, which I reviewed and basically concluded was statistically meaningless.  I don’t know why the media insists on printing and reprinting these absolutely meaningless numbers.  Is there a hidden bird agenda in the mainstream media?



Kaggle Round of 32


Last night I posted our picks within the distribution of all picks in the Kaggle competition.  I’ve now updated that for the third (née second) round.

If you’re rooting for us, this chart will help you decide who to root for.  If you’re rooting against us, this chart will also help you decide who to root for.  Either way, it’s useful.



Back from Vegas / Kaggle

I got back from Vegas a few hours ago.  If you’ve never been to Vegas during the NCAA tournament, you’re going to want to do that.  There is nothing in the world like a room full of people watching a 15 point game with 90 seconds left screaming at the giant screen for the team that is winning to heave up some threes.  Absolutely incredible.  And watching an upset is even better.  The first two games I saw were Dayton and Harvard winning.  People were going crazy.  I won a few bucks, but nothing incredible.  I did cash twice in two tournaments, hit 8 out of 9 (for $0) on a parlay card, and watched my wife hit 200-1 on Sigma Derby.

But the really exciting news from the last few days is that my Kaggle team (me and @statsbylopez) is in the top 10 out of 254 teams.  The complete standings are here.

Yesterday, William Cukierski posted the distributions of the predictions for the first-round games.  I’ve highlighted in red where our first-round predictions fall.  The most important ones so far are the Duke game, which we were very confident in (.93) and lost, which is a big penalty, and the Dayton game, where we had Dayton at .66 to win their game over Ohio State, which is notably different from what many of the other teams predicted.  Hopefully the Duke game doesn’t hurt us too much, and we can stay in the top ten.
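To see why a confident miss is so costly, here is a small sketch that assumes the contest is scored by log loss (lower is better), which is consistent with leaderboard scores near 0.48 but is not spelled out above.  The function name is mine, and the contest's final score would be the average of these per-game values.

```python
import math


def game_log_loss(p, won):
    """Log loss for one game.

    p is the predicted probability that the team wins; won is True if it did.
    Probabilities are clipped away from 0 and 1 so the log stays finite.
    """
    p = min(max(p, 1e-15), 1.0 - 1e-15)
    return -math.log(p) if won else -math.log(1.0 - p)


duke = game_log_loss(0.93, won=False)    # confident pick that lost
dayton = game_log_loss(0.66, won=True)   # moderate pick that won
coin_flip = game_log_loss(0.50, won=True)

print(f"Duke at .93, lost:  {duke:.3f}")
print(f"Dayton at .66, won: {dayton:.3f}")
print(f"Coin flip:          {coin_flip:.3f}")
```

Under this scoring the Duke miss costs about 2.66 on that game, roughly four times what a coin-flip prediction would have cost (about 0.69), while the Dayton pick comes in at about 0.42.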



