Jet lag hurts baseball players? The evidence is weak.
I recently came across this article on Gizmodo entitled “The Surprising Way Jet Lag Impacts Major League Baseball Games”. This article was based on a recently published article in the Proceedings of the National Academy of Sciences (PNAS). That article, “How jet lag impairs Major League Baseball performance”, can be read here. This scholarly article was mentioned on NPR, The Washington Post, USA Today, The Wall Street Journal, MSN, and even the AP.
The main conclusions of the paper are that there is evidence of jet lag, manifested in reduced levels of performance, for eastward travel, but insufficient evidence for westward travel. Specifically, they found that slugging percentage was adversely affected for the home-team after traveling, but not for an away team, among other effects.
After a quick read-through of the Gizmodo article and a glance at the full manuscript, the red light in my head that alerts me to be skeptical started blinking. Last night I did a full, thorough reading of the manuscript, and I believe it has some pretty serious flaws, especially given that it was published in such a top-tier journal and that the results were widely shared in the popular press, including a tweet from the AP.
Here is my review.
The authors reviewed 20 years of MLB data covering a little more than 46,000 games. They identified 4,919 games in which a team had traveled more than 2 time zones to a game. They conclude that several baseball statistics are adversely affected by jet lag, including hitting, fielding, and pitching statistics. While this is potentially an interesting and novel finding, I believe that the conclusions of this manuscript are not warranted by the statistical analysis performed in it.
- This whole article seems like p-hacking gone wild. They are running a ton of tests and some are shown to be significant. However, even if you accept these results as being due to jet lag (I’m not even remotely convinced that they demonstrate jet lag), the effect sizes are incredibly small. Let’s look at slugging percentage as an example. From Table 1, the authors find that home teams traveling east have a significantly lower slugging percentage. Their estimate is −0.01 ± 0.006 (I’d like to see more significant digits). Let’s assume each team started every home stand jet-lagged after traveling east (definitely not true). Looking at the Red Sox 2017 schedule, there are only a handful of home games played after a 2- or 3-time-zone trip, and those are usually played after a rest day. So let’s assume there are 5 such games (still likely an overestimate) in an entire season. Suppose the average slugging percentage at home is 0.420 and it is reduced to 0.410 for home games with jet lag. Assuming 33 at-bats in an average game, this is a difference of 0.33 bases per game. Over a season in which a team played five home games after a 2- or 3-time-zone trip, this jet-lag effect on slugging accounts for 1.65 total bases across the entire season. The average MLB team in 2016 had 2,304 total bases. That is a reduction of 0.0716% in total bases over the course of a whole season. So even if this effect is real (and I’m not convinced that it is), it has very, very little effect on a team’s performance over the course of a season.
- There are many, many confounding variables that this paper doesn’t even attempt to deal with. For instance, with slugging percentage this difference could be due to differences in team strengths, differences in temperatures, dome vs non-dome stadiums, physical characteristics of the ballpark, etc. None of this is controlled for. And yet the authors seem willing to attribute all of their findings to jet lag, when, in my opinion, there are much simpler explanations for this significant finding. My guess would be that as you control for some of these other characteristics listed above, the effect of jet lag would disappear entirely. Of course, that’s just a guess. But this is enough of an issue that the major take-home message – “the effects are sufficiently large to erase the home advantage” – is at best an extrapolation, at worst completely false.
- In the results section, the authors define what they mean by “jet-lagged” in accordance with the International Classification of Sleep Disorders, which “requires travel across at least two time zones”. So they defined a team that traveled 2 or 3 time zones to a game as jet-lagged, and a team that traveled 0 or 1 time zones as not jet-lagged. I am generally against taking continuous variables and categorizing them, and I feel that way in this setting also. If jet lag is a problem, as they argue, wouldn’t 3 hours of jet lag have a greater effect than 2 hours, which in turn would have a greater effect than 1 hour? I think the jet-lag variable should have been included in their model as a continuous variable, which would allow them to estimate the effect of each hour of jet lag. Alternatively, they could have included each level of jet lag as a dummy variable to estimate the effect of each level individually.
- I don’t understand why home and away jet-lag were analyzed separately. This seems pretty easy to put into one model. Just add an indicator for home and away teams and put them in the same model. This would improve power also.
- The authors used one-tailed p-values for almost all of their tests. They mention in the methods section that one-sided p-values were used for “parameters that may be adversely impacted by jet-lag”. They then mention that they used a two-tailed test for parameters that are “neutral with respect to jet-lag”. How do you make the distinction between parameters that may be negatively affected by jet lag and those that are neutral to it? (This seems like an issue of the garden of forking paths.)
- Continuing with the previous point, they use a two-tailed test when testing, for example, at-bats.
- Also, they should use batters faced rather than at-bats, in my opinion.
- I originally thought the authors were not using a multiple testing correction, which would have been a massive error here given the number of tests they were doing. But they mention that they used the Benjamini-Hochberg (BH) procedure with an FDR threshold of 0.2. They did this right. Though I wonder what was considered a family of tests? Was it every test? Or was this done separately for home and away? Etc…
- Further, performing a BH adjustment for multiple testing with an FDR threshold of 0.2 means we are controlling the expected proportion of false discoveries among the significant findings to be less than 20%. This implies that some of the findings are likely false discoveries; however, the authors seem to speak about their conclusions with too much certainty. FDR corrections are fantastic and are used, among other reasons, because they increase the power of a given set of tests. The trade-off, however, is a higher type I error rate (finding something significant when the null hypothesis is actually true). In spite of all this, the authors seem to assert with certainty that they have detected the effect of jet lag.
- How did they do a winning percentage model? What is the response variable? How did they do a batting average model? Is it the per game batting average that is used as the response variable? This is unclear to me.
- The authors find that “The only defensive metric affected by westward home-team travel was triples allowed (+0.033, P < 0.01).” (Triples allowed!) This could very easily be explained by ballpark differences.
- Regarding significant digits, the tables report p-values with 2, 3, and sometimes 4 significant digits. This is mostly a style critique, but should have been caught somewhere in the editing process.
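To make the slugging-percentage arithmetic above easy to check, here is the back-of-the-envelope calculation in Python. All numbers are the rough assumptions from that bullet, not values taken from the paper’s data.

```python
# Rough assumptions from the slugging example above (not the paper's data).
slugging_effect = 0.010       # estimated drop in slugging for eastward home games
at_bats_per_game = 33         # assumed average at-bats per game
jet_lagged_home_games = 5     # generous guess for one season
season_total_bases = 2304     # average MLB team, 2016

bases_lost_per_game = slugging_effect * at_bats_per_game             # 0.33
bases_lost_per_season = bases_lost_per_game * jet_lagged_home_games  # 1.65
pct_reduction = 100 * bases_lost_per_season / season_total_bases     # ~0.072%
```

Even under these generous assumptions, the loss is under two total bases a season.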
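To sketch the modeling alternatives I suggested (a continuous jet-lag variable, per-level dummies, and a single pooled home/away model), here is a minimal Python example. The data, variable names, and coefficients are entirely made up for illustration; nothing here comes from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical time zones crossed before each game (eastward travel).
tz_shift = rng.integers(0, 4, n)

# The paper's coding: dichotomize at >= 2 zones ("jet-lagged" yes/no).
jet_lagged = (tz_shift >= 2).astype(float)

# Alternative 1: keep time zones crossed as a continuous predictor,
# estimating a per-zone effect rather than a single yes/no contrast.
x_continuous = tz_shift.astype(float)

# Alternative 2: one dummy per level (0 zones as the reference category),
# estimating each level's effect separately.
dummies = np.column_stack([(tz_shift == k).astype(float) for k in (1, 2, 3)])

# Pooled model: instead of fitting home and away teams separately, include a
# home indicator and its interaction with jet lag in one regression.
is_home = rng.integers(0, 2, n).astype(float)
X = np.column_stack([np.ones(n), jet_lagged, is_home, jet_lagged * is_home])
y = 0.420 - 0.010 * jet_lagged + rng.normal(0.0, 0.05, n)  # synthetic slugging
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The pooled fit recovers an intercept near the assumed 0.420 baseline and a jet-lag coefficient near the assumed −0.010, while using all of the data in one model.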
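For reference, the Benjamini-Hochberg procedure the authors used is simple to write down. This is a generic textbook implementation at their stated FDR threshold of 0.2, not their actual code.

```python
import numpy as np

def benjamini_hochberg(pvals, fdr=0.2):
    """Return a boolean mask marking which p-values are declared significant."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    # Compare the i-th smallest p-value against (i/m) * fdr.
    passed = p[order] <= fdr * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()  # largest i with p_(i) <= (i/m) * fdr
        reject[order[: k + 1]] = True
    return reject
```

Note that at fdr=0.2 the procedure only promises that, on average, under 20% of the rejected hypotheses are false discoveries; it does not certify any individual finding, which is exactly why the authors’ certainty bothers me.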
Conclusion: If I were reviewing this paper, my decision would be either reject or revise and resubmit with major revisions. Major revisions would need to include controlling for the confounding factors that could explain the significant findings. Without greatly improving this analysis, I think this article is unsuitable for publication, especially in such a highly regarded journal as PNAS. Further, I think the authors need to tone down their certainty that they have demonstrated a jet lag effect. Their findings would be interesting if they could be demonstrated in a more rigorous analysis. There may very well be a jet lag effect in baseball, but without substantial improvements to this analysis, I’m not convinced this manuscript demonstrates much more than the fact that you can get a bunch of significant findings if you do enough tests without properly controlling for confounders.
I’d like to thank Mike Lopez (@statsbylopez) for his helpful comments.