Category Archives: Uncategorized

Stepwise Variable Selection

In the regression class that I am teaching, we recently got to the topic of variable selection. We covered the standard forward, backward, and stepwise selection procedures, and I then went out of my way to caution the students about relying on them. In trying to explain why, I believe that I have come up with a fantastic analogy:

Automated variable selection techniques are like putting together the border of a puzzle: It’s a great place to start, but you’ve still got a long way to go to be finished. -StatsInTheWild
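To make the caution concrete, here is a minimal sketch of forward selection in plain Python. It is not the procedure from any particular package: the function name `forward_select`, the choice of residual sum of squares as the criterion, and the simulated data are all assumptions for illustration. The response is pure noise, yet the procedure still hands back a list of "selected" predictors and a positive R-squared.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def forward_select(columns, y, k):
    """Greedy forward selection: k times, add the predictor that most reduces the RSS.

    columns: list of predictor columns (each a list of floats); y: response.
    Returns (chosen column indices in order, RSS before and after each step).
    """
    resid = center(y)  # centering stands in for fitting an intercept
    cands = {j: center(c) for j, c in enumerate(columns)}
    chosen, rss_path = [], [dot(resid, resid)]
    for _ in range(k):
        # Candidates are kept orthogonal to the chosen predictors, so the RSS
        # drop from adding candidate j is (x_j . r)^2 / (x_j . x_j).
        best = max(cands, key=lambda j: dot(cands[j], resid) ** 2 / dot(cands[j], cands[j]))
        x = cands.pop(best)
        xx = dot(x, x)
        b = dot(x, resid) / xx
        resid = [r - b * a for r, a in zip(resid, x)]
        # Remove the new predictor's direction from every remaining candidate.
        for j in list(cands):
            g = dot(cands[j], x) / xx
            cands[j] = [a - g * e for a, e in zip(cands[j], x)]
        chosen.append(best)
        rss_path.append(dot(resid, resid))
    return chosen, rss_path

# Simulated data (hypothetical): 20 predictors and a response that is unrelated to all of them.
rng = random.Random(0)
n, p = 50, 20
X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [rng.gauss(0, 1) for _ in range(n)]

chosen, rss_path = forward_select(X, y, 5)
r_squared = 1 - rss_path[-1] / rss_path[0]
print("selected columns:", chosen)
print("in-sample R^2 on pure noise:", round(r_squared, 3))
```

The selected set is a place to start looking, like the border of the puzzle; it says nothing on its own about whether those predictors matter.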

Cheers.

Posted in Uncategorized


Paying NCAA players

Apr 15

Posted by statsinthewild

What if I set up an organization that collected money for college basketball players and then paid it out to them when they left the NCAA? It would work like this:

An account is set up for each player in the NCAA.
Fans can donate to a player or an entire team.
Once a player leaves the NCAA, I write them a check for the full balance of their account. They can do whatever they want with the money.

Not that I would, but how could the NCAA stop me from doing this?

Cheers.

Posted in Uncategorized

1 Comment

Hadoop

Apr 14

Posted by statsinthewild

From this article:

However, Cukierski thinks that the use of big data has become too trendy. “The whole big data idea is really within a big hype cycle,” mainly driven by a particular software framework for dealing with information, called Hadoop. “It’s not that Hadoop isn’t useful,” Cukierski says, but when companies look to it to solve small problems, “the people who are data scientists and actually statistically literate are kind of laughing, because you don’t need Hadoop to do most problems.”

Cheers.

Posted in Uncategorized


Kaggle Update

Apr 6

Posted by statsinthewild

Well, we’ve narrowed the gap to about 0.003, but I still don’t think we can possibly catch Grimp with one game left to go. Our best bet for the money at this point is a disqualification for some reason. Cheers.

Posted in Uncategorized

3 Comments

Cinderella Plots 2014

Apr 6

Posted by statsinthewild

I introduced the Cinderella plot a few years ago, but I haven’t updated it for this year’s tournament until now. (I’ve been busy with the job search, which was successful(!)….more to come on that….). So without further delay, I present to you the Cinderella plot for all of the NCAA tournaments from 2002 through 2014:

As you can see, two 5 seeds (Indiana and Butler) and even an 8 seed (Butler, again) have made it to the final game in the past 13 years. But they all played a team seeded 3 or better. Never in the last 13 years, or ever for that matter, have the seeds of both teams in the final been this high. We’ve got a 7 seed versus an 8 seed, though I’d argue that both of these teams are better than their seed. That’s not to say that they didn’t get the seed that they DESERVED, which is based on what actually happened during the season. But that’s a little bit different from projecting what they are capable of in the future. Based on that, it’s not really THAT surprising that either of these teams is in the final game. In fact, I had both of them ranked in the top 25 at the end of the season: Kentucky at 13 and UConn at 21. (Even though I thought Kentucky should have been a 7 seed and UConn an 8.)

Cheers.

Posted in Uncategorized


Kaggle Exercise

Mar 28

Posted by statsinthewild

With eleven games to play in the NCAA basketball tournament, we (me and @statsbylopez) find our team in second place in the March Madness Machine learning contest, a mere 0.00365 behind the leader Grimp Whelkin. So, we’re 0.00365 away from $15K.

What’s really impressive, though, is that both of our entries are doing so well. Our best entry is currently at 0.47589 and good for second place, but our other entry is at 0.48081, which would STILL BE GOOD FOR SECOND PLACE. (Maybe we’re on to something here?) This is good news, as we have TWO realistic ways to win this $15K. One of our models is big on Virginia and the other is big on Michigan State. It’s possible that our second submission actually moves ahead of our current best one and becomes our scoring submission.

I’m not sure I can handle finishing second when first place is worth $15K. I’d rather have finished 100th just so I don’t have to worry so much about this. (That’s not true, of course. I’m gonna brag about this forever.)

I do hope Kaggle will release winning scenarios for the top 10 prior to the Final Four so that we can at least use that info to hedge our bets. Though I’d guess they aren’t going to do this.

Cheers.

Posted in Uncategorized

1 Comment

Why do I keep writing about Field Goals?

Mar 28

Posted by statsinthewild

Twitterer @brentonk alerted me to the following Grantland article written by the only man in the world to block me on Twitter, Bill Barnwell. The following excerpt is from the article (emphasis added):

It would also have some interesting effects on kicker value. In a way, it might seem like it should make kickers more valuable by virtue of giving them more opportunities to make meaningful kicks. The average team attempts about 40 extra points each year, so the difference between a kicker who hits 90 percent of his extra points from the 25-yard line and a kicker who hits 70 percent on the same attempts would be eight extra points per season. On the other hand, we also know there’s no year-to-year consistency for a kicker’s field goal percentage, and that’s likely to be the case for these 40 additional extra-point attempts each year, too. So while teams might pay more for the security of a reliable kicker, they’ll still be just as unlikely to end up with one.

I’ve written about this before: we shouldn’t expect a kicker’s field goal percentage to be consistent from year to year, because field goals are taken from different distances. You can see just how much the distances of place kickers’ attempts vary from year to year with this sweet Shiny app that I made.
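The distance point can be sketched with a toy calculation. Everything here is an assumption for illustration: the make-probability curve in `make_prob` and the two lists of attempt distances are invented, not fitted to any kicker. The kicker's skill is held fixed, and only the mix of distances changes between the two seasons.

```python
import math

def make_prob(distance_yards):
    """Hypothetical make probability that falls with distance (illustrative numbers only)."""
    return 1.0 / (1.0 + math.exp(-(5.5 - 0.1 * distance_yards)))

def expected_fg_pct(distances):
    """Expected field goal percentage for one season's set of attempt distances."""
    return sum(make_prob(d) for d in distances) / len(distances)

# Same kicker, same skill curve, two invented seasons with different attempt distances.
short_season = [22, 25, 28, 30, 33, 35, 38, 40]
long_season = [35, 40, 44, 47, 50, 52, 54, 56]

pct_short = expected_fg_pct(short_season)
pct_long = expected_fg_pct(long_season)
print(round(pct_short, 3), round(pct_long, 3))
```

The raw percentage moves between the two seasons even though nothing about the kicker changed, which is why the raw number shows little year-to-year consistency.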

Finally, as I have written before, even if you do control for distance, I find no evidence that there is any significant variability within kickers between years.

Cheers.

Posted in Uncategorized


The media hates cats and statistical rigor

Mar 22

Posted by statsinthewild

Here is a paragraph from a recent article in the New York Times entitled “The Evil of the Outdoor Cat” (emphasis added):

And wildlife in this country must share this land with a growing population of about 84 million owned cats, and anywhere from 30 to 80 million feral or stray cats. When all of them do “what’s natural” in a fragmented natural world, it adds up. Using deliberately conservative assumptions, federal researchers recently estimated that free-ranging cats killed about 2.4 billion birds annually in the Lower 48 states, a substantial bite out of the total bird population. Outdoor cats also kill about 12.3 billion small mammals a year — not just the proverbial rats and mice but also chipmunks, rabbits and squirrels — and about 650 million reptiles and amphibians. In some cases, they are pushing endangered species toward extinction.

Those numbers are huge and look very familiar. In fact, I believe they are from the 2013 article in Nature Communications by Dr. Scott Loss called “The impact of free-ranging domestic cats on wildlife of the United States”, which I reviewed and basically concluded was ~~crap~~ statistically meaningless. I don’t know why the media insists on printing and reprinting these absolutely meaningless numbers. Is there a hidden bird agenda in the mainstream media?

Cheers.

Posted in Uncategorized

5 Comments

Kaggle Round of 32

Mar 22

Posted by statsinthewild

Last night I posted where our picks fall within the distribution of all picks in the Kaggle competition. I’ve now updated that for the third (née second) round.

If you’re rooting for us, this chart will help you decide who to root for. If you’re rooting against us, this chart will also help you decide who to root for. Either way, it’s useful.

Cheers.

Posted in Uncategorized


Back from Vegas / Kaggle

Mar 21

Posted by statsinthewild

I got back from Vegas a few hours ago. If you’ve never been to Vegas during the NCAA tournament, you’re going to want to do that. There is nothing in the world like a room full of people watching a 15 point game with 90 seconds left screaming at the giant screen for the team that is winning to heave up some threes. Absolutely incredible. And watching an upset is even better. The first two games I saw were Dayton and Harvard winning. People were going crazy. I won a few bucks, but nothing incredible. I did cash twice in two tournaments, hit 8 out of 9 (for $0) on a parlay card, and watched my wife hit 200-1 on Sigma Derby.

But the really exciting news from the last few days is that my Kaggle team (me and @statsbylopez) is in the top 10 out of 254 teams. The complete standings are here.

Yesterday, William Cukierski posted the distributions of the predictions for the first-round games. I’ve highlighted in red where our first-round predictions fall. The most important ones so far are the Duke game, where we were very confident (.93) and lost, which carries a big penalty, and the Dayton game, where we had Dayton at .66 to beat Ohio State, which is notably different from what many of the other teams predicted. Hopefully the Duke game doesn’t hurt us too much, and we can stay in the top ten.
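To put a number on that penalty, here is a small sketch that assumes the contest is scored by log loss (the post itself does not name the metric, so treat that as an assumption). The helper `log_loss_single` is a hypothetical name; the .93 and .66 figures are the ones quoted above.

```python
import math

def log_loss_single(p_win, won):
    """Log loss for one game: p_win is the predicted win probability, won is True or False."""
    p = p_win if won else 1.0 - p_win
    return -math.log(p)

duke_penalty = log_loss_single(0.93, won=False)   # confident and wrong
dayton_penalty = log_loss_single(0.66, won=True)  # moderately confident and right
coin_flip = log_loss_single(0.50, won=True)       # baseline: no opinion either way

print(round(duke_penalty, 3), round(dayton_penalty, 3), round(coin_flip, 3))
```

Under that assumption the Duke miss costs about 2.66 on that one game, roughly four times what a 50/50 guess would have cost (about 0.69), while the Dayton pick costs only about 0.42. The final score would be the average of these per-game values, so one confident miss can outweigh several good picks.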

Cheers.

Posted in Uncategorized

4 Comments

Stats in the Wild
