## NCAA tournament projections – March 7, 2017

## Predicted field

**Conference Champ (Actual or predicted) – BOLD**

1 seeds: **N. Carolina (ACC), Gonzaga (WCC), Villanova (Big East), Kansas (Big 12)**

2 seeds: **Kentucky (SEC)**, Duke, West Virginia, **UCLA (Pac-12)**

3 seeds: Louisville, Virginia, Oregon, Florida St.

4 seeds: **Wisconsin (Big Ten)**, Baylor, Florida, Arizona

5 seeds: Purdue, **Wichita St. (MVC)**, Butler, Notre Dame

6 seeds: **Dayton (Atlantic 10)**, Oklahoma St., St. Mary’s (CA), Miami (FL)

7 seeds: Iowa St., VCU, Michigan, **SMU (AAC)**

8 seeds: Rhode Island, Kansas St., S. Carolina, Clemson

9 seeds: Cincinnati, Texas Tech, Creighton, Marquette

10 seeds: Syracuse, Utah, California, Maryland, Wake Forest

11 seeds: Minnesota, TCU, Xavier, USC, **UNC Wilmington (CAA)**, Michigan St.

12 seeds: Seton Hall, **MTSU (C-USA)**, **Nevada (MWC), UC Irvine (Big West)**

13 seeds: **CSU Bakersfield (WAC), Bucknell (Patriot), Vermont (America East), Akron (Mid-America)**

14 seeds: **Princeton (Ivy), ETSU (Southern), Iona (MAAC), Winthrop (Big South)**

15 seeds: **Mount St. Mary’s (Northeast), Texas Southern (SWAC), North Dakota (Big Sky), FGCU (Atlantic Sun)**

16 seeds: **NC Central (MEAC), N. Kentucky (Horizon), New Orleans (Southland), South Dakota State (Summit), UT Arlington (Sun Belt), Jacksonville St. (OVC)**

_________________________________________________

Last Four in: Seton Hall, USC, Xavier, TCU

First four out: Northwestern, Indiana, St. Bonaventure, Virginia Tech


## Wrestling and I-80

FloWrestling posted a breakdown of the national qualifiers by state, so I made some maps. The first map displays the raw number of qualifiers per state. With 49 qualifiers, Pennsylvania leads the way, with Ohio and Illinois next (31 each), followed by New Jersey (30). If you look instead at qualifiers per capita (here, qualifiers per million residents), as in the second map, South Dakota leads the way with 4.66 qualifiers per million, followed by Iowa (4.16), Pennsylvania (3.83), New Jersey (3.35), and Delaware (3.17). Ohio and Illinois are next with 2.67 and 2.41, respectively.
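The per-capita numbers are just qualifier counts divided by state population in millions. A minimal Python sketch, assuming a rounded Pennsylvania population of 12.8 million (my assumption, roughly the 2016 Census estimate, not a figure from the FloWrestling breakdown):

```python
def per_million(qualifiers: int, population: int) -> float:
    """Qualifiers per million residents."""
    return qualifiers / population * 1_000_000


# Pennsylvania's 49 qualifiers come from the post; the population is an
# ASSUMED round figure (about 12.8 million), not from the original data.
pa_rate = per_million(49, 12_800_000)
print(f"Pennsylvania: {pa_rate:.2f} qualifiers per million")
```

Swapping in other states' counts and populations reproduces the rest of the per-capita ranking the same way.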

It’s really remarkable to see how localized the sport of wrestling is. Wrestling basically takes place along Interstate 80. If you look at the 11 states that I-80 travels through (New Jersey, Pennsylvania, Ohio, Indiana, Illinois, Iowa, Nebraska, Wyoming, Utah, Nevada, California), almost 55% of qualifiers are from these states, even though these states make up only a little over 32% of the US population. Even more remarkably, if you look at just the first 6 states that I-80 travels through (New Jersey, Pennsylvania, Ohio, Indiana, Illinois, Iowa), these states alone account for over 47% of the qualifiers, even though they make up only a little over 17% of the US population.
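One way to put those two comparisons on the same scale is a representation ratio: a group's share of qualifiers divided by its share of the population. A minimal sketch using the rounded shares quoted above ("almost 55%", "a little over 32%", and so on), so the ratios are approximate; the function name is my own:

```python
def representation_ratio(share_of_qualifiers: float, share_of_population: float) -> float:
    """How many times more qualifiers a group of states produces than its population share predicts."""
    return share_of_qualifiers / share_of_population


# Rounded shares from the post: all 11 I-80 states, then the first 6.
all_i80 = representation_ratio(0.55, 0.32)
first_six = representation_ratio(0.47, 0.17)
print(f"All 11 I-80 states: {all_i80:.2f}x their population share")
print(f"First 6 I-80 states: {first_six:.2f}x their population share")
```

By this rough measure, the 11 I-80 states produce about 1.7 times as many qualifiers as their population alone would suggest, and the first six produce nearly 2.8 times as many.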

And just for fun here’s me in the Massachusetts state quarterfinals in 1999. I was down 7-0 after the first. I won 11-10. Don’t ask me what happened in the semifinals. I’m not over it yet. Give me a few more decades.

Cheers.

## A sequence of Fibonacci facts

The Hampshire College Summer Studies in Mathematics program has what it calls an “Interesting Test” that interested students are invited to take as part of their application. Their web page offers a sample of past test problems. One of these, under the heading “Problems which at first seem to lack sufficient information”, is as follows:

From the third term on, each term of the sequence of real numbers $latex a_1, a_2, \ldots, a_{10}$ is the sum of the preceding two terms; that is, $latex a_n = a_{n-1} + a_{n-2}$ for $latex n = 3, 4, 5, \ldots, 10$. If $latex a_7 = 17$, what is the sum of all the ten terms?

I’d heard this one before (though it’s hard to find a source for things like this – the closest I could find was this Math StackExchange question). The solution is as follows: we can write $latex…

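The reblogged solution is cut off, but the claim that there is enough information is easy to check numerically. This is my own brute-force check, not the original post's argument: writing $latex a_1 = x$ and $latex a_2 = y$ (my notation), the seventh term works out to $latex 5x + 8y$, so for any chosen $latex a_1$ we can solve for the $latex a_2$ that makes $latex a_7 = 17$ and then add up all ten terms. Exact fractions avoid floating-point noise.

```python
from fractions import Fraction


def sequence(a1, a2, n=10):
    """First n terms where each term from the third on is the sum of the previous two."""
    terms = [a1, a2]
    while len(terms) < n:
        terms.append(terms[-1] + terms[-2])
    return terms


def total_given_a7(a1, a7=17):
    """Choose a1 freely, pick a2 so the seventh term equals a7, and sum all ten terms."""
    # With a1 = x and a2 = y, the seventh term is 5x + 8y, so y = (a7 - 5x) / 8.
    a2 = (Fraction(a7) - 5 * Fraction(a1)) / 8
    terms = sequence(Fraction(a1), a2)
    assert terms[6] == a7  # sanity check: the seventh term really is a7
    return sum(terms)


totals = {a1: total_given_a7(a1) for a1 in (-3, 0, 1, 2.5, 10)}
for a1, total in totals.items():
    print(f"a1 = {a1}: sum of ten terms = {total}")
```

Every choice of $latex a_1$ gives the same total, which is why knowing $latex a_7$ alone is enough.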

## Open thread for mathematicians on the immigration executive order

Terry Tao is smarter than you. You should listen to him. https://en.wikipedia.org/wiki/Terence_Tao

The self-chosen remit of my blog is “Updates on my research and expository papers, discussion of open problems, and other maths-related topics”. Of the 774 posts on this blog, I estimate that about 99% of the posts indeed relate to mathematics, mathematicians, or the administration of this mathematical blog, and only about 1% are not related to mathematics or the community of mathematicians in any significant fashion.

This is not one of the 1%.

Mathematical research is clearly an international activity. But actually a stronger claim is true: mathematical research is a transnational activity, in that the specific nationality of individual members of a research team or research community are (or should be) of no appreciable significance for the purpose of advancing mathematics. For instance, even during the height of the Cold War, there was no movement in (say) the United States to boycott Soviet mathematicians or theorems, or to…


## Jet lag hurts baseball players? The evidence is weak.

I recently came across this article on Gizmodo entitled “The Surprising Way Jet Lag Impacts Major League Baseball Games”. This article was based on a recently published article in the *Proceedings of the National Academy of Sciences* (PNAS). That article, “How jet lag impairs Major League Baseball performance”, can be read here. The scholarly article was covered by NPR, The Washington Post, USA Today, The Wall Street Journal, MSN, and even the AP.

The main conclusions of the paper are that there is evidence of jet lag, manifested in reduced performance, after eastward travel, but insufficient evidence for westward travel. Specifically, among other effects, they found that slugging percentage was adversely affected for the home team after traveling, but not for an away team.

After a quick read-through of the Gizmodo article and a glance at the full manuscript, the red light in my head that tells me to be skeptical started blinking. Last night I did a full, thorough reading of the manuscript, and I believe it has some pretty serious flaws. That is especially concerning given that it was published in such a top-tier journal and that the results were widely shared in the popular press, including a tweet from the AP.

Here is my review.

## General overview

The authors reviewed 20 years of MLB data covering a little more than 46,000 games. They identified 4,919 games where a team had traveled across at least 2 time zones to a game. They conclude that several baseball statistics, including hitting, fielding, and pitching statistics, are adversely affected by jet lag. While this is a potentially interesting and novel finding, I believe that the conclusions of this manuscript are not warranted by the statistical analysis it performs.

## Major comments

- This whole article seems like p-hacking gone wild. The authors run a ton of tests, and some come out significant. However, even if you accept these results as being caused by jet lag (I’m not even remotely convinced that this demonstrates jet lag), the effect sizes are incredibly small. Let’s look at slugging percentage as an example. From Table 1, the authors find that home teams traveling east have a significantly lower slugging percentage. Their estimate is −0.01 ± 0.006 (I’d like to see more significant digits). Let’s assume each team started every home stand jet-lagged after traveling east (definitely not true). Looking at the Red Sox 2017 schedule, there are only a handful of home games played after a 2- or 3-time-zone trip, and those usually come after a rest day. So let’s assume that there are 5 games like this in an entire season (still likely an overestimate). Suppose the average slugging percentage at home is 0.420 and it drops to 0.410 for home games with jet lag. Since slugging percentage is total bases per at-bat, assuming 33 at-bats in an average game gives a difference of 0.01 × 33 = 0.33 bases per game. Over a season in which a team plays five home games after a 2- or 3-time-zone trip, this jet-lag effect on slugging accounts for 1.65 total bases *across the entire season*. The average MLB team had 2,304 total bases in 2016, so this is a reduction of about 0.0716% in total bases over a whole season. So even if this effect is real (and I’m not convinced that it is), it has very, very little effect on a team’s performance over the course of a season.

- There are many, many confounding variables that this paper doesn’t even attempt to deal with. For instance, with slugging percentage this difference could be due to differences in team strengths, differences in temperatures, dome vs non-dome, physical characteristics of the ball park, etc. None of this is controlled for. And yet, the authors seem willing to attribute all of their findings to jet lag, when, in my opinion, there are much simpler explanations for this significant finding. My guess would be that as you control for some of these other characteristics that I list above, the effect of jet lag would disappear entirely. Of course, that’s just a guess. But this is enough of an issue that the major take-home message – “the effects are sufficiently large to erase the home advantage” – is at best extrapolating, at worst completely false.
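The back-of-the-envelope slugging calculation from the first comment can be written out as a short sketch. The inputs are the assumptions stated there (a 0.010 drop, 33 at-bats per game, 5 affected home games, 2,304 season total bases); the function name is hypothetical:

```python
def season_total_base_loss(slg_drop: float, at_bats_per_game: float, affected_games: int) -> float:
    """Total bases lost over a season if slugging drops by slg_drop in each affected game."""
    # Slugging percentage is total bases per at-bat, so the per-game loss is drop * at-bats.
    return slg_drop * at_bats_per_game * affected_games


lost = season_total_base_loss(slg_drop=0.010, at_bats_per_game=33, affected_games=5)
season_total_bases = 2304  # average MLB team total bases in 2016, as quoted in the review
share = lost / season_total_bases
print(f"Total bases lost: {lost:.2f}")
print(f"Share of a season's total bases: {share:.4%}")
```

Changing `affected_games` or `slg_drop` shows how little the conclusion depends on the exact assumptions: even doubling both leaves the loss well under 1% of a season's total bases.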

## Specific comments

- In the results section, the authors define what they mean by “jet-lagged” in accordance with the International Classification of Sleep Disorders, which “requires travel across at least two time zones”. So they defined a team that traveled 2 or 3 time zones to a game as jet-lagged, and a team that traveled 0 or 1 time zones as not jet-lagged. I am generally against taking continuous variables and categorizing them, and I feel that way in this setting too. If jet lag is a problem, as they argue, wouldn’t 3 hours of jet lag have a greater effect than 2 hours, which in turn would have a greater effect than 1 hour? I think jet lag should have been included in their model as a continuous variable, which would allow them to estimate the effect of each hour of jet lag. Alternatively, they could have included each level of jet lag as a dummy variable to estimate the effect of each level individually.
- I don’t understand why home and away jet lag were analyzed separately. This seems pretty easy to put into one model: just add an indicator for home and away teams and fit them together. This would also improve power.
- The authors used one-tailed p-values for almost all of their tests. They mention in the methods section that one-sided p-values were used for “parameters that may be adversely impacted by jet-lag”, and that two-tailed tests were used for parameters that are “neutral with respect to jet-lag”. How do you distinguish between parameters that may be negatively affected by jet lag and those that are neutral to it? (This seems like an instance of the garden of forking paths.)
- Continuing with the previous point, they use a two-tailed test when testing, for example, at-bats.
- Also, they should use batters faced rather than at-bats, in my opinion.
- I originally thought the authors were not using a multiple testing correction, which would have been a massive error here given the number of tests they were doing. But they mention that they used the Benjamini-Hochberg (BH) procedure with an FDR threshold of 0.2. They did this right, though I wonder what was considered a family of tests. Was it every test? Or was this done separately for home and away? Etc.
- Further, performing a BH adjustment with an FDR threshold of 0.2 means we are controlling the expected proportion of false discoveries among our significant findings at 20% or less. This implies that some of the findings are likely false discoveries, yet the authors speak about their conclusions with too much certainty. FDR corrections are fantastic and are used, among other reasons, because they give a set of tests more power than family-wise corrections such as Bonferroni. However, the trade-off is a higher type I error rate (finding something significant when the null hypothesis is actually true). In spite of all this, the authors seem to assert with certainty that they have detected the effect of jet lag.

- How did they do a winning percentage model? What is the response variable? How did they do a batting average model? Is it the per game batting average that is used as the response variable? This is unclear to me.
- The authors find that “The only defensive metric affected by westward home-team travel was triples allowed (+0.033, P < 0.01).” (Triples allowed!) This could very easily be explained by ballpark differences.
- Regarding significant digits, the tables report p-values with 2, 3, and sometimes 4 significant digits. This is mostly a style critique, but should have been caught somewhere in the editing process.
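To make the coding suggestions concrete, here is a minimal sketch of three ways to encode time zones crossed (0 to 3) in a single model that also carries a home-team indicator: the paper's binary jet-lagged flag, one continuous term, and separate dummies for 1, 2, and 3 zones. The helper names (`encode_time_zones`, `design_row`) are hypothetical. Direction of travel (east vs west) is ignored here for brevity; in practice you would add it, for example as separate eastward and westward terms, and possibly a home-by-jet-lag interaction if home and away effects are expected to differ. The sketch only builds regression rows; fitting is left to whatever regression library you prefer.

```python
def encode_time_zones(zones_crossed: int, scheme: str) -> list:
    """Encode the number of time zones a team crossed (0-3) under one of three schemes."""
    if scheme == "binary":      # the paper's choice: 2+ zones counts as jet-lagged
        return [1 if zones_crossed >= 2 else 0]
    if scheme == "continuous":  # one slope per time zone crossed
        return [zones_crossed]
    if scheme == "dummies":     # separate effects for 1, 2 and 3 zones (0 is the baseline)
        return [1 if zones_crossed == k else 0 for k in (1, 2, 3)]
    raise ValueError(f"unknown scheme: {scheme}")


def design_row(zones_crossed: int, is_home: bool, scheme: str) -> list:
    """One regression row: intercept, home indicator, then the jet-lag encoding."""
    return [1, 1 if is_home else 0] + encode_time_zones(zones_crossed, scheme)


for scheme in ("binary", "continuous", "dummies"):
    rows = [design_row(z, is_home=True, scheme=scheme) for z in range(4)]
    print(scheme, rows)
```

The binary scheme forces 2 and 3 zones to have identical effects and 1 zone to have none; the continuous and dummy schemes let the data say whether more time zones means more jet lag.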
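For readers who haven't seen it, here is a minimal sketch of the standard Benjamini-Hochberg step-up procedure. The p-values are made up purely for illustration (they are not from the paper). Whatever list you pass in is, in effect, the "family" of tests, which is exactly the ambiguity raised in the comments: running it once over every test versus separately for home and away can change what gets flagged.

```python
def benjamini_hochberg(p_values, q=0.2):
    """Return booleans marking which hypotheses the BH procedure rejects at FDR level q."""
    m = len(p_values)
    # Sort p-values while remembering their original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            cutoff_rank = rank
    # Reject every hypothesis whose p-value is among the k smallest.
    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


# Hypothetical p-values, for illustration only.
example_p = [0.001, 0.008, 0.03, 0.07, 0.12, 0.30, 0.45, 0.60, 0.75, 0.90]
print(benjamini_hochberg(example_p, q=0.2))
print(benjamini_hochberg(example_p, q=0.05))
```

With these illustrative p-values, q = 0.2 flags four tests while q = 0.05 flags only two, and at q = 0.2 we should expect, on average, up to a fifth of the flagged findings to be false discoveries.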

Conclusion: If I were reviewing this paper, my decision would be either reject or revise and resubmit with major revisions. Major revisions would need to include controlling for the confounding factors that could explain the significant findings. Without greatly improving this analysis, I think this article is unsuitable for publication, especially in as highly regarded a journal as PNAS. Further, I think the authors need to tone down their certainty that they have demonstrated a jet lag effect. Their findings would be interesting if they could reproduce them with a more rigorous analysis. There may very well be a jet lag effect in baseball, but without substantial improvements to this analysis, I’m not convinced this manuscript demonstrates much more than the fact that you can get a bunch of significant findings if you do enough tests without properly controlling for confounders.

Cheers.

*I’d like to thank Mike Lopez (@statsbylopez) for his helpful comments.*

## Women’s March Chicago – January 20, 2017

My pictures from the Women’s March in Chicago last Saturday.

## Worst. Playoffs. Ever.

After the first 6 NFL playoff games this year, I asked “Where are all the close NFL playoff games?” We were promptly treated to two games that were both decided by three or fewer points. The playoffs were exciting again… momentarily. Then the conference championship games treated us to 19- and 23-point blowouts in the AFC and NFC, respectively. So are these the worst playoffs ever? Probably not, but they are definitely the worst in quite some time. See this graph:

Each boxplot summarizes the distribution of the margins of victory in one season’s playoff games. It’s immediately evident that each of the past 5 or 6 seasons had much closer games than this season.

I’ve also compared the empirical CDFs (cumulative distribution functions) of the margins of victory for each postseason since 2000:

On this plot, lots of close games pull a season’s curve up toward the upper left corner, whereas lots of games that are not close pull the curve toward the lower right corner.
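A minimal sketch of the empirical CDF behind that kind of plot: for each margin x, the fraction of a season's games decided by x points or fewer. The margin list is my assumption, not data from the original post: it is my reconstruction of the ten 2016-17 playoff margins from final scores (it matches the 15.7 mean, 18 median, and two close games quoted in the post), so check it against official box scores before reusing it.

```python
from bisect import bisect_right


def ecdf(values):
    """Return F where F(x) is the fraction of values less than or equal to x."""
    data = sorted(values)
    n = len(data)

    def F(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(data, x) / n

    return F


# ASSUMPTION: margins of the ten 2016-17 playoff games so far, reconstructed from final scores.
margins_2016 = [13, 20, 18, 25, 16, 18, 2, 3, 23, 19]
F = ecdf(margins_2016)
for x in (3, 10, 14, 21):
    print(f"Share of games decided by {x} or fewer points: {F(x):.1f}")
```

Evaluating F over a grid of margins for each season and plotting the results gives one curve per season; a season full of close games climbs quickly on the left.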

Time for some fun facts! The average margin of victory for a playoff game in the 2016-17 season was 15.7, the highest since 2002-03, when the average margin of victory was 17.09 (driven in large part by the 41-0 beatdown the Jets put on the Colts). Further, the median margin of victory this year so far is 18. That’s the largest median margin of victory in the playoffs since 2000-01, when the median was also 18.

Of the 10 games played so far, only 2 were decided by 10 points or fewer. That is the smallest number of such games in any postseason from 2000 to 2016. In 2000-01 and 2002-03, there were 3 games in each of those playoffs decided by 10 points or fewer. The last time there were as few as 4 games decided by 10 points or fewer was 2009-10. In 2003-04 and 2010-11, 8 of the 11 playoff games were decided by 10 points or fewer, the most such games in a single postseason since 2000.
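These summary numbers (mean, median, and count of games decided by 10 points or fewer) are simple to compute for any season. A minimal sketch using Python's standard `statistics` module; the margin list is my own reconstruction of the ten 2016-17 playoff games from final scores, an assumption rather than data from the original post, and the helper name is hypothetical:

```python
from statistics import mean, median


def summarize_margins(margins, close_cutoff=10):
    """Mean, median and number of close games (margin <= close_cutoff) for one postseason."""
    return {
        "mean": mean(margins),
        "median": median(margins),
        "close_games": sum(1 for m in margins if m <= close_cutoff),
        "games": len(margins),
    }


# ASSUMPTION: 2016-17 playoff margins so far (wild card, divisional, conference rounds).
margins_2016 = [13, 20, 18, 25, 16, 18, 2, 3, 23, 19]
summary = summarize_margins(margins_2016)
print(summary)
```

Running the same function over each postseason since 2000 gives the season-by-season comparisons in the paragraphs above.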

Go Falcons!

After much thought and reflection, I’ve decided to part ways with the @Patriots as a fan. It’s been a hell of a 35 years. 1/2

— SITW (@StatsInTheWild) January 26, 2017

Cheers.

## NFL Picks – Super Bowl LI

**Total (Playoffs) –**

**SU: 8-2 ATS: 2-8 O/U: 3-6**

**Wildcards – SU: 4-0 ATS: 1-3 O/U: 3-1**

**Division Round – SU: 2-2 ATS: 1-3 O/U: 0-4**

**Conference Round – SU: 2-0 ATS: 0-2 O/U: 0-2**

## New England vs Atlanta

Prediction: Patriots 28-26 (56.3%)

Pick: Falcons +3

Total: Under 58

## NFL Picks – Conference Championship

**Total (Playoffs) –**

**SU: 8-2 ATS: 2-8 O/U: 3-6**

**Wildcards – SU: 4-0 ATS: 1-3 O/U: 3-1**

**Division Round – SU: 2-2 ATS: 1-3 O/U: 0-4**

**Conference Round – SU: 2-0 ATS: 0-2 O/U: 0-2**

## Green Bay at Atlanta

Prediction: Falcons 29-28 (55.7%)

Pick: Packers +6

Total: Under 59.5

## Pittsburgh at New England

Prediction: Patriots 25-21 (61.3%)

Pick: Pittsburgh +6

Total: Under 50