New England Symposium on Statistics in Sports (NESSIS) 2013
I attended the New England Symposium on Statistics in Sports (NESSIS) last Saturday at the Harvard Science Center (see the sweet logo below), where I presented a poster. The conference was organized by Mark Glickman and Scott Evans.
My poster (see below) was about openWAR, which is a project I am working on with Ben Baumer and Shane Jensen. Our goal is to create a completely open source version of wins above replacement (WAR) based entirely on publicly available data. We’ve implemented openWAR in R and the package is currently available on GitHub here: openWAR. When we think it’s ready for primetime, we’ll be putting it on CRAN.
I missed the first featured session because it was at 9:30am, and that’s not how I roll on Saturdays. During the parallel sessions at 11:30am, I decided to attend the non-NBA series of talks. The first talk was by Robert Carver and he talked about R.A. Dickey and the curveball. He was followed by Stephanie Kovalchik, who gave an interesting talk about trends in tennis intensity. She had a lot of really interesting data visualizations of tennis trends over the past few decades, but I can’t seem to find them online. If anyone knows where I can find them, please point me in the right direction. After her, Dennis Lock gave a talk about using random forests to estimate win probability. At the end of the day I was trying to explain random forests to someone from ESPN (how awesome is that sentence), and I knew that random forests were essentially regression trees based on bootstrapped samples. When I went to look this up to make sure I wasn’t lying about random forests, I found out that at each split in a tree, only a randomly chosen subset of the predictors is considered. I did not realize this, but it makes total sense. Otherwise, the trees in the forest would all be very similar. So I learned something, and isn’t that the whole point of these conferences?
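To make the random forest idea concrete, here is a minimal sketch in plain Python. It is not Dennis Lock’s model and not any library’s implementation. To keep it short, every “tree” is a single split (a stump), so the random predictor subset is drawn once per tree. The function names and the toy football-style data (score difference, minutes left, won or lost) are made up for illustration.

```python
import random
from statistics import mean

def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_stump(X, y, rows, features):
    """Best single split over `rows`, searching only the given `features`."""
    best = None  # (error, feature, threshold, left_mean, right_mean)
    for f in features:
        for t in sorted({X[i][f] for i in rows}):
            left = [y[i] for i in rows if X[i][f] <= t]
            right = [y[i] for i in rows if X[i][f] > t]
            if not left or not right:
                continue  # a split must put rows on both sides
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, f, t, mean(left), mean(right))
    if best is None:  # no valid split: fall back to a constant prediction
        overall = mean(y[i] for i in rows)
        return (None, None, overall, overall)
    return best[1:]

def fit_forest(X, y, n_trees=50, m_features=1, seed=0):
    """Fit stumps, each on a bootstrap sample and a random predictor subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]   # bootstrap: sample rows with replacement
        features = rng.sample(range(p), m_features)   # random subset of predictors for this split
        forest.append(fit_stump(X, y, rows, features))
    return forest

def predict(forest, x):
    """Average the predictions of all stumps in the forest."""
    preds = []
    for f, t, left_mean, right_mean in forest:
        if f is None or x[f] <= t:
            preds.append(left_mean)
        else:
            preds.append(right_mean)
    return mean(preds)

# Made-up toy data: [score difference, minutes left] and whether the team won.
X = [[-14, 30], [-7, 10], [-3, 45], [0, 20], [3, 5], [7, 40], [10, 15], [14, 25]]
y = [0, 0, 0, 1, 1, 1, 1, 1]

forest = fit_forest(X, y, n_trees=50, m_features=1, seed=0)
p_lead = predict(forest, [14, 20])      # estimated win probability with a big lead
p_deficit = predict(forest, [-14, 20])  # estimated win probability with a big deficit
```

With `m_features` set to the total number of predictors, every stump would search the same predictors and the only source of variety would be the bootstrap sample, which is exactly the "all very similar trees" problem described above.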
The final talk in this session was by Michael Pane, who was attempting to cluster pitches based on PITCHf/x data and improve classification of MLB pitches. They call their procedure CLUMPD and they made a sweet interactive Shiny app. But I didn’t write down the URL, and I can’t seem to find it by googling. Hopefully when they post the slides, the link will be in there.
Following the session I ate lunch with Ben Baumer, Mike Lopez, and one of Mike’s friends from UMass on the rocks outside of the Harvard Science Center. After lunch I meant to go to the afternoon featured speaker, but I ended up talking to two San Francisco fans about my poster. I asked them if they were presenting at the conference, and they told me that they didn’t even know the conference was going to be there. They were just baseball fans in town to see a few Red Sox games, and they apparently just stumbled across NESSIS and my poster. After talking to the two guys from San Francisco, I talked to one of the members of the Tufts SABR club about openWAR for the rest of the time allotted for the featured speakers. After we finished talking, the actual poster session started at 3:30. I met and spoke with a ton of interesting people.
Here’s a list of some of the interesting people that I talked to while at my poster:
- Vince Gennaro – Author of Diamond Dollars: The Economics of Winning in Baseball, President of SABR, consultant to MLB teams, all around baseball fanatic
- Eric Van – Former consultant for the Boston Red Sox
- Michael Humphreys – Author of “Wizardry: Baseball’s All-Time Greatest Fielders Revealed”
- James O’Malley – Professor at Dartmouth
- Andy Andres – Teaches SABR 101 at Tufts
- Doug Noe – Professor at Miami (OH) (This was my favorite meeting because I had never met him before, but he told me that he really liked my blog and that I had actually written about him before.)
Right at the end of the poster session, Eric Van came over to my openWAR poster and criticized our definition of replacement player. The way that we have defined it, about half of the players in our replacement group fall below the average replacement player. I’m not sure whether that is technically a problem, but the fact that he could make the criticism at all is a huge success for our larger idea. By making openWAR completely transparent, people are free to criticize, critique, and compliment every single piece of our procedure (and we definitely welcome constructive criticism), rather than guess at what’s going on inside the black boxes of Baseball-Reference’s and FanGraphs’ WAR.
NESSIS then closed with a panel discussion. The panel consisted of Ben Baumer, Eric Van, and Vince Gennaro. The picture below is the panel, with Carl Morris (you know he’s a big deal cause he’s got a Wikipedia page) saying some words before the discussion began. The panel was ultimately moderated by Andy Andres.
One of the interesting points the panel made was that in the beginning of sabermetrics, a lot of the most interesting work was being done by fans and not necessarily the teams themselves. This has entirely changed today because baseball teams have access to mountains and mountains of data that are either simply not available to the public or that the public can’t afford.
Van also pointed out that the numbers don’t tell you everything. You can’t just view numbers and ignore the personality of players. For instance, if the numbers say that a guy should hit 6th instead of 2nd, you have to weigh the improvement your team will gain against the psychology of moving a guy from 2nd to 6th in the line-up. In his words:
The numbers are just sign posts. You have to actually watch the game to see if you’re onto something. -Eric Van
The whole discussion was fantastic, and it was really interesting to hear the perspective of three people who have actually worked in baseball as statistical analysts.
Fantastic overall conference. See you in 2015!
Posted on September 26, 2013.