Live Blogging Tango’s Live blog of my Live Blog Live Blogging His Live Blog of the WAR Podcast on Stolen Signs.
So Tango live blogged my live blog of his live blog of the WAR podcast from stolen signs. I am thoroughly enjoying this. Here are my responses (my responses in bold):
use a highly ad hoc methods to reach their replacement level
That’s a FEATURE not a bug!
… still reading
What? How is an ad hoc method better than a definition of a replacement player and then letting the data estimate quantities for us?
and a replacement pool of players and from that pool we are estimating a replacement player for each position
Still a problem.
At least we are trying. We define what a replacement level player is and then try to estimate it based on data. That is how statistics works.
You can criticize is all you want, but what’s a better way to do it? I haven’t seen one.
The way I’m doing it is better. It’s what BRef and Fangraphs have implemented.
There’s art to this. It’s not all science. Well, it’s Bayesian, which is artistic science. Or scientific art?
This is truly astonishing. Tom Tango is a statistical analyst and I believe he is arguing that his completely ad hoc definition of replacement players, which he himself refers to as a feature and not a bug, is better than our attempt at defining a replacement player and then estimating is from the data. That is truly astonishing.
It’s Bayesian in the sense that you seem to have set an ad hoc prior on what replacement level should be, but then you don’t use any data to update what the replacement level is. Tango’s definition of replacement player is Bayesian if you consider a model with no data updating the prior (i.e. the posterior is the same as the prior) to be a Bayesian model.
Does anyone know exactly where these numbers came from?
Yes, me! I have dozens of threads on the topic on my old blog. I don’t have a single best source, but there’s a few easy to find sources.
I’m pretty sure the Fangraphs Library points to one or two of these. And I’m almost positive that Dave at Fangraphs in introducing WAR to Fangraphs talked about it at length in his N-part series.
The internet is the wild west. That’s a feature. It’s not cited like an academic paper, and that’s a feature too.
Being difficult to find is a feature? I disagree.
We think this is important because if two players have the exact same stats and one is a short stop and the other is a first baseman, we believe that the shortstop is providing more offensive value to that team because they are filling a difficult fielding position.
And if the CF outhit the LF, he’d say the same thing. Which is the problem that Sean noted.
If CF are collectively out hitting left fielders then I would argue they are more valuable. But this doesn’t happen in practice. You can argue all you want about theoretically how valuable a position is, but a better way to do it is you let the data tell you want the answer is. For example, in 2017, using 1B as a baseline, the CF was shifted -0.02221 runs per plate appearance and the RF was shifted -0.00801 runs per plate appearance. This indicates that the CF will get more credit than the RF for the same offensive play, but only by 0.0142 per plate appearance. So while I think the argument that offensive players shouldn’t be adjusted for what position they play is a weak argument, even if I do accept it, your complaint is still meaningless because it’s not an issue in practice. The collective RF over the course of a season will outhit the collective CF over the course of a season in practice. It’s possible that someday the CF will outhit the RF, but over the course of TENS OF THOUSANDS of plate appearances this seems unlikely. And further, if that’s what the data is telling us, then we should listen!!!
Ultimately, I don’t think I am right or Tango is right (though Tango thinks openWAR is wrong in this aspect), I just think this particular issue is a matter of choice. We’ve made one choice and Tango has made another and I think they are both valid. I prefer the openWAR way of doing it, but I certainly understand different points of view on this particular choice of adjustments.
I’m just actually wondering if he has.
I did at the time. I think I even had a thread about it. It’s possible I didn’t understand all of it, but I’m pretty sure I understood just about all of it.
We aren’t doing this adjustment at the player level, we are doing this adjustment at the PLATE APPEARANCE level. This means that the average plate appearance involving a CF at the plate will be the same as the average plate appearance involving a RF at the plate. Once all of the plate appearances are aggregated, there is no implication that the average CF has to equal the average LF. It COULD happen, but it would be due to chance.
At the level for the league, and over a period of years, it’ll work out to the same thing just about. Certainly won’t be chance. Or, I have no idea what I’m talking about.
Who cares if it won’t be interpreted properly? It’s the right thing to do statistically!
That in a nutshell is the difference between what Forman and Appelman do, and what everyone else does.
Again, truly astonishing. It sounds to me an awful lot like Tango is saying that what Forman and Appelman do is some data manipulation, and that what everyone else is doing is valid statistics. I don’t really think he means it that way, but it sounds like that. I think (and I’m sure he will tell me if I am wrong) what Tango means is that Forman and Appelman are producing advanced statistics for a general audience and everyone else is making things too complicated. I disagree with this entirely. I don’t think we, as statisticians should dumb it down for a general audience. Instead we should get better at communicating what we are doing. Again, the solution isn’t to dumb it down, it’s to explain out work better.
would casually dismiss a discussion of uncertainty. Maybe I’m misinterpreting this though.
Eh… it’s more that I didn’t think anything would come of it. Compared to the rest of the podcast, where someone driving by could get something out of it, the talk of uncertainty was just theoretical really. We need to see it in action. In other words, it was too early.
That said, I have talked about it for PECOTA, for those who remember when Colin joined BPro, and the “percentile” presentation. I had a few good threads on it, probably around 2009 or 2010.
It’s literally happening as we speak at baseball prospectus.
Here’s one thread on OpenWAR, with Ben Baumer stopping by:
This was an interesting thread, but I’m most interesting in the first comment, which you can read here:
This is so lazy. And as for doubting “it will get any traction”, I’ll just leave this here: Baumer, Brudnicki, McMurray win 2016 SABR Analytics Conference Research Awards #sickburn
You’ll see all the comments in there center around the positional issue.
And more OpenWAR discussion here:
There are some valid (and other not so valid) points that are brought up in this discussion. I’ll highlight one of the not so valid points here:
Tango says: “There is no question that fWAR and rWAR have far more “open source” ideals behind them than openWAR. What? openWAR is fully reproducible with using only publicly available data and the ENTIRE source code is publicly available. This is not true of ANY of the other major implementations of WAR. Argue all you want that openWAR sucks. I’m fine with that (I disagree, but I’m fine with it), but to say that “fWAR and rWAR have far more ‘open source’ ideals behind them than openWAR” seems to me to be a complete misunderstanding of what open source means and what the ideals of open source are.
Posted on December 3, 2017, in Uncategorized. Bookmark the permalink. 4 Comments.
I’ve never understood the method that bWAR and fWAR use to determine replacement level to be ad-hoc. It’s a theoretical method, one that’s based on what people who are involved in rigorously studying baseball believe would be the theoretical record of a team of replacement players. It does appear a bit arbitrary, but a lot of discussion and debate (some details of which may not be recorded) were had to reach that threshold.
When it comes to openWAR, this part of the comment from Guy (cited by Tango in the comment you highlighted at the end) echoes my thoughts about how the replacement player is defined:
“I think using the bottom N players to define replacement level is a reasonable approach, and easy for anyone to replicate and confirm. It would have been better if they had identified a cutoff that comes closer to producing what we believe is the true replacement level—probably something like top 11 position players and top 9 pitchers. After all, their 13/12 decision is no less arbitrary than the 1000 WAR standard they criticize. But defining replacement level by the collective performance of the weakest bench players and callups is a reasonable approach, and arguably has a more solid foundation than current WAR systems.”
A goal of mine is to start taming the wild west nature of sabermetrics and organize some reference like a “Current Index of Sabermetrics”. SABR has some efforts for all baseball research to this end, but it’s a little bit unwieldy (though miles ahead of where it was a few years ago): http://baseballindex.org/
13/12 decision for players is based on how many players there are on a roster. 1000 WAR is TOTALLY arbitrary. They are both ad hoc to some degree, but we can defend 13/12. I don’t even know where 1000 comes from? Is it just that it’s a nice round number?
Well, it is rounded, but it isn’t totally arbitrary.
Start here: http://www.insidethebook.com/ee/index.php/site/comments/how_to_calculate_war/
Then go here: https://www.fangraphs.com/blogs/unifying-replacement-level/
Arbitrary is probably the wrong word. They are chosen to be “reasonable”, but they are in no way even close to optimal. It’s the difference between slugging percentage and wOBA. Both choices of weights are “reasonable” in some sense. In slugging the outcomes are ordered correctly, but are essentially arbitrary choices of weights. wOBA lets the data estimate the weights and it chooses the weights to be optimal according to a specific loss function. That’s what I mean by arbitrary, which is a bit different than the common usage of it.