The next step in WAR: openWAR
Wins Above Replacement (WAR)
Wins Above Replacement (WAR) is meant to be a comprehensive way to evaluate the total value of a baseball player. The concept is to compute the extra wins that a player provides to a team over those that a “replacement” player would provide. If a team would win X games with a particular roster and Y games with that exact same roster except that player Z is replaced by a “replacement” level player, then the WAR for player Z is theoretically X-Y.
Of course, computing this quantity is not straightforward, and there are many issues involved in the computation of WAR. For instance, the formal definition of a “replacement” player is a slippery concept to pin down. Conceptually, this player is said to be a “quadruple A” level player: better than a minor league player, but probably not a major league starter. The idea is that players like this are always readily available. But this is just one of many issues.
For starters, there is no single way to calculate WAR; there are many reasonable approaches. Everyone agrees on the formula for things like ERA and batting average and, if you give the same data to 10 different people, they will all get the same answer. WAR is not like this. WAR is a concept with a collection of reasonable methods for implementation.
Baseball-reference.com has put together this handy chart for comparing the different implementations of WAR. Basically, there are currently three major implementations of WAR: fWAR (FanGraphs), bWAR (Baseball-reference.com), and WARP (Baseball Prospectus).
Recently, I’ve been involved in a project with Ben Baumer and Shane Jensen to develop a new version of WAR. Our motivating criticisms of the current implementations of WAR are:
- WAR is not reproducible: No reference implementation; No open data set; No open source code
- There is no unified methodology: Each component of WAR is viewed as a separate problem – not a piece of the same problem
- WAR does not consider error estimates: Only reported as point estimates; currently unprincipled estimation of margins of error
It seems that other WAR practitioners also consider these to be issues. For instance, just yesterday Baseball Prospectus published a post on their front page, “Reworking WARP: The Overlooked Uncertainty of Offense” (after talking to us about it…). While BP is addressing the lack of uncertainty quantification in WAR, I suspect that neither they nor any of the other major WAR implementors (e.g. FanGraphs, Baseball-reference.com) will address our first point by making their implementation completely open and reproducible with an openly available data set. They are businesses, after all, and have legitimate reasons to keep some of the pieces of WAR proprietary for competitive purposes. That proprietary nature does mean, though, that any or all of these WAR methods could contain pieces that are simply made up (though I doubt this is the case), and the public would have no idea. The version of WAR that we hope to create, which we refer to as openWAR, will attempt to alleviate these problems.
- openWAR: a reproducible reference implementation of WAR in a fully open-source R package using partially open data.
- Unified methodology: Conservation of runs; Each component estimated as a piece of the same problem
- Error Estimates: Use resampling methods to report WAR distributions.
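To make the resampling idea concrete, here is a minimal sketch of a percentile bootstrap. It is written in Python rather than R, it does not use the openWAR package, and it is not our actual procedure. The function name `bootstrap_interval`, the notion of a per-play run value for a single player, and the simulated input data are all assumptions made purely for illustration.

```python
import random


def bootstrap_interval(play_values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the total of per-play run values.

    play_values: hypothetical list of run values credited to one player,
    one entry per play (an assumed input format, not openWAR's).
    Returns (point_estimate, lower, upper).
    """
    rng = random.Random(seed)
    n = len(play_values)

    # Resample the plays with replacement and recompute the season total
    # each time, which gives an empirical distribution of totals.
    totals = sorted(
        sum(rng.choices(play_values, k=n)) for _ in range(n_resamples)
    )

    # Read the interval endpoints off the sorted bootstrap totals.
    alpha = (1.0 - level) / 2.0
    lower = totals[int(alpha * n_resamples)]
    upper = totals[int((1.0 - alpha) * n_resamples) - 1]
    return sum(play_values), lower, upper


if __name__ == "__main__":
    # Simulated per-play values for a made-up player (not real data).
    demo_rng = random.Random(42)
    plays = [demo_rng.gauss(0.05, 0.4) for _ in range(400)]
    point, lo, hi = bootstrap_interval(plays)
    print(f"{point:.1f} ({lo:.1f}, {hi:.1f})")
```

The output has the same shape as a point estimate followed by an interval, but the numbers come from simulated data and say nothing about any real player.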
Currently, our R package (openWAR) is in the early stages of development, with an emphasis on reproducibility. The latest version of our code is available on GitHub and gives reasonable results, though we still have many details to sort out.
Here are some of the preliminary results:
Top 10 players
Based on their runs above average computed using the openWAR package for the first half of the 2013 season with 95% confidence intervals:
- Mike Trout – 54.5 (32.1, 77.2)
- Miguel Cabrera – 49.7 (24.8, 75.5)
- Chris Davis – 49.0 (25.2, 74.7)
- Jason Kipnis – 33.7 (13.5, 54.2)
- Troy Tulowitzki – 33.2 (14.4, 51.9)
- Paul Goldschmidt – 32.4 (9.9, 56.7)
- David Ortiz – 32.2 (11.9, 54.7)
- Josh Donaldson – 32.2 (12.7, 53.6)
- Matt Carpenter – 31.0 (11.8, 50.8)
- Carlos Santana – 30.8 (11.7, 50.8)