Category Archives: Uncategorized

Beware the academic hipster (or, use what works for you) UPDATED

UPDATE: Be sure to read the comments below, and my response

As a newly-minted PhD student, I was talking with a friend about writing papers. “Use LaTeX”, he said. I thought he meant the rubbery material commonly found in lab gloves. But apparently not. LaTeX (pronounced “lay-tech”) is typesetting software that he used for writing papers.

Eager to be on the cutting edge of scholarship, I spent a few days learning how LaTeX worked, how to insert symbols, figures, and tables. I even produced my thesis proposal with it. But my supervisor used Word exclusively, and I had no compelling reason to use LaTeX over Word, so I switched back.

Fast-forward a few years. Now, everyone should be using markdown in a plain text editor, doing statistics in R, uploading versions to GitHub or figshare, and managing citations with JabRef, BibTeX or Mendeley. Apparently, Word, Excel, Endnote, and SPSS are…

View original post 1,176 more words

Posted in Uncategorized


Windows kill a Billion (with a B!?!) birds annually? Probably not.

Feb 6

Posted by statsinthewild

I just came across this article with the following headline:

As many as 988 million birds die annually in window collisions.

Wait, what? Almost a BILLION (with a “B”) birds are killed every year by flying into windows? That’s an outrage! Let’s tear down all of our buildings and get rid of all our windows in defense of our feathered friends!

No wait. Actually, let’s take a step back and see what’s happening here. First, where does the 988 million number come from? It comes from this study: Bird–building collisions in the United States: Estimates of annual mortality and species vulnerability in some journal called “The Condor”. And actually what they say in their abstract is:

Based on 23 studies, we estimate that between 365 and 988 million birds (median = 599 million) are killed annually by building collisions in the U.S.

So 988 million (nearly a BILLION!) is the upper limit of their interval estimate of annual bird deaths by window collisions. But why did the Washington Post leave out the interval estimate and only mention the upper bound? I’d guess because it’s a more sensational headline. (If I’m wrong, please let me know what the reason is.) Come on media, you’re better than this.

So just to be clear, what the study actually “showed” is that they estimate that somewhere between 365 and 988 million birds, with a median of 599 million, are killed in window collisions annually. Even if their methods are completely sound, their median estimate is about 40% lower than the sensational upper limit. (Which I suppose would translate into 40% fewer clicks.) So even if they are correct, I feel like the media is being sensationalist about their “findings”.
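For anyone who wants to check the "about 40%" figure, here is a quick sketch in Python. The three numbers are the ones quoted from the abstract; the variable names are just mine.

```python
# Interval estimate quoted from the abstract, in millions of birds per year.
low, median, high = 365, 599, 988

# Fraction by which the median falls short of the headline's upper bound.
shortfall = (high - median) / high

print(f"The median is {shortfall:.1%} below the upper bound")
```

So the headline number overstates the authors' own central estimate by a wide margin, before any question about the methods even comes up.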

BUT! BUT! I strongly suspect that their methods are not only less than completely sound, but barely acceptable. In fact, I’ve actually reviewed the statistical methods employed by this same exact author, Dr. Scott R. Loss, before for a similar study that claimed that cats were killing incredibly large numbers of birds. My complete report can be viewed here. (Full disclosure: I was a paid consultant for Alley Cat Allies when I wrote that; they are not paying me for this blog post.) So, Dr. Loss doesn’t have the best statistical track record in my opinion, though he may be a fine ornithologist (he was, after all, the Outstanding Conservation Biology Student of 2009-10. Congrats!). In my professional opinion, as I have stated before, the methods employed in the article “The impact of free-ranging domestic cats on wildlife of the United States” have almost no statistical validity.

While I haven’t fully reviewed the methods in the new paper about bird collisions, I suspect that the same or very similar methods have been employed to reach these astronomically sensational numbers. Again, as I have said before, these numbers may be completely correct (though I suspect they are not), but the statistical methods used to arrive at these numbers are extremely shoddy.

And guess what?!?! Dr. Loss looks like he has a whole series of these papers coming out! Check out his CV under submitted papers. Vehicle collisions, power lines, wind farms!

So, Dr. Loss, if you’re reading this, please PLEASE consult with a statistician. Also, if you’d like to respond to my criticism, I’d be happy to post your unedited response on my blog.

Cheers.

P.S. By the way, one BILLION (that’s 1,000,000,000) birds per year would mean that, on average, almost 32 birds were dying PER SECOND, day and night, every day all year round JUST IN THE UNITED STATES!
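The per-second figure is easy to reproduce. This is a minimal sketch that assumes a 365-day year; the function name is mine, and the inputs are the interval estimates quoted above plus the round billion.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds, ignoring leap years


def deaths_per_second(annual_deaths):
    """Average number of deaths per second implied by an annual total."""
    return annual_deaths / SECONDS_PER_YEAR


estimates_in_millions = [
    ("lower bound", 365),
    ("median", 599),
    ("upper bound", 988),
    ("one billion", 1000),
]

for label, millions in estimates_in_millions:
    rate = deaths_per_second(millions * 1_000_000)
    print(f"{label}: {rate:.1f} birds per second")
```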


Beautiful phrase

Feb 3

Posted by statsinthewild

His decision to punt from the Seattle 39-yard line with his team trailing 29-0 early in the third quarter elevated football-coach risk aversion to something more like hyper-milquetoast performance art.

-Josh Levin

Full article


NFL Spreads and Excitement

Feb 2

Posted by statsinthewild

Introduction

Recently I tweeted the following question: “Does the spread of an NFL game predict how exciting the game will be?” If closer games are considered more exciting (I think most people will agree with this), then games with smaller spreads should be, on average, more exciting than games with larger spreads.

So how do we test this? The first big question that we need to answer is this: how do we quantify the “excitement” of a game? Luckily, someone has already done this for us. Advanced NFL Stats calculates a statistic they call the “excitement index” (EI). It is defined as follows on their website:

Excitement Index (EI) – The measure of how exciting a game is. EI measures the total movement of the Win Probability (WP) line during a game. The more that WP fluctuates, the more dramatic, uncertain, and exciting a game is.

I think this is a great definition of excitement, and whether or not you agree that it is the best way to measure excitement, it is certainly a great place to start.
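To make the definition concrete, here is a minimal sketch of one way to compute "total movement of the WP line". It assumes the index is simply the sum of absolute play-to-play changes in win probability. The post does not give the site's exact formula or scaling, and both win probability series below are made up for illustration.

```python
def excitement_index(win_probabilities):
    """Sum of absolute changes between consecutive win probabilities.

    This is an assumed reading of "total movement of the WP line",
    not necessarily the formula the site actually uses.
    """
    pairs = zip(win_probabilities, win_probabilities[1:])
    return sum(abs(later - earlier) for earlier, later in pairs)


# Hypothetical win probability series for one team (not real games).
blowout = [0.50, 0.65, 0.80, 0.90, 0.97, 1.00]
thriller = [0.50, 0.30, 0.65, 0.40, 0.75, 0.35, 1.00]

print(f"blowout:  {excitement_index(blowout):.2f}")
print(f"thriller: {excitement_index(thriller):.2f}")
```

Both toy games end with the same team winning, but the back-and-forth game accumulates far more movement, which is the behavior the definition is after.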

So to begin with I collected the EIs and spreads for all regular season games from 2010 through 2013 for a total of 1024 games.

Summaries of the data

Looking at a summary of the absolute value of the spread over these 1024 games we find that the average spread is 5.44 points with a median of 4. The interquartile range of spreads is from 3 to 7.625 and the minimum spread over this time was 0 with a maximum of 26.5(!).

The average EI is 4.036 with a median of 3.9. The interquartile range of the EI is from 2.7 through 5.2. The least exciting game over this time period had an EI of 1.1, and the most exciting game had an EI of 9.8.
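Summaries like these take only a few lines to compute. The sketch below uses Python's standard library. The list of spreads is a small made-up sample, not the 1024 games analyzed here, and the quartile method is an arbitrary choice since the post does not say which one was used.

```python
import statistics


def summarize(values):
    """Return min, quartiles, median, mean, and max for a list of numbers."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": med,
        "mean": statistics.mean(values),
        "q3": q3,
        "max": max(values),
    }


# Hypothetical absolute spreads, for illustration only.
toy_spreads = [0, 1, 2.5, 3, 3, 3.5, 4, 6.5, 7, 10, 14]

for name, value in summarize(toy_spreads).items():
    print(f"{name}: {value:.3f}")
```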

Regression Model

Next, I fit a simple linear regression model with excitement as the response variable and absolute value of the spread as the predictor. The fitted regression parameters are:

Intercept: 4.33023 (p-value < 2e-16)
Spread: -0.054 (p-value 4.58e-05)

This means that for every two-point increase in spread, we lose, on average, about 0.1 points of excitement index. With a spread of 0, the average predicted excitement index (EI) of a game is 4.33023. This is about the 60th percentile of all EIs from 2010-2013. A spread of 3 yields a predicted excitement index of 4.168, which is about the 55th percentile of all excitement indices. When the spread jumps to 7 points, the average excitement level is about 3.95, which is about the 51st percentile of EI.
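The predictions in this post can be reproduced from the fitted line. In the sketch below the intercept is the one reported above. The slope of -0.054 is inferred from the quoted predictions (4.168 at a spread of 3, about 3.95 at a spread of 7), so treat it as a back-calculated value.

```python
INTERCEPT = 4.33023
SLOPE = -0.054  # inferred from the predictions quoted in the post


def predicted_ei(spread):
    """Predicted excitement index for a given point spread (sign ignored)."""
    return INTERCEPT + SLOPE * abs(spread)


for spread in (0, 2.5, 3, 7):
    print(f"spread {spread}: predicted EI {predicted_ei(spread):.3f}")
```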

What can we expect about the most exciting game in a playoff weekend? If we have four games all with spreads of 0, we can expect that the most exciting game will be about a 6 EI, which is about the 85th percentile of EI. If all four games have a spread of 3 points we can expect the most exciting game to be about 5.9 EI, which is about the 84th percentile of EI, and if all 4 games have spreads of 7 we expect the most exciting game to be about 5.67 which is around the 81st percentile of games. So what’s the difference between a 5.7 and a 6 for instance? Here is an example of a 6 and here’s an example of a 5.7. I guess the answer is not much.
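One plausible way to get numbers like these is simulation: draw four games around the predicted EI and keep the largest. The post does not say how the figures were produced, so the sketch below is only an illustration. It assumes normally distributed scatter around the fitted line, uses a slope inferred from the quoted predictions, and uses a residual standard deviation of 1.6 that is a hypothetical value picked to land near the figures above.

```python
import random

INTERCEPT = 4.33023
SLOPE = -0.054      # inferred from the predictions quoted in the post
RESIDUAL_SD = 1.6   # hypothetical value, not reported in the post


def expected_max_ei(spread, games=4, trials=50_000, seed=1):
    """Monte Carlo estimate of the average highest EI among several games."""
    rng = random.Random(seed)
    mean = INTERCEPT + SLOPE * abs(spread)
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(mean, RESIDUAL_SD) for _ in range(games))
    return total / trials


for spread in (0, 3, 7):
    print(f"spread {spread}: expected max EI {expected_max_ei(spread):.2f}")
```

Under these assumptions the gap between four pick'em games and four seven-point games is only a few tenths of a point of EI, which is consistent with "not much".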

What can we expect in the Super Bowl?

The Super Bowl spread tonight is 2.5 points with Denver favored. This gives us an expected excitement index of 4.19523, which is about the 55th percentile of exciting games. So we should expect the game tonight to be more exciting than more than half of all regular season games played in the last four years. And that is exciting.

Cheers.


Super Bowl and Phil

Feb 2

Posted by statsinthewild

Some interesting stupid facts:

In years when Punxsutawney Phil predicts early spring the average Super Bowl total is 51.67 whereas when long winter is predicted the average total is 44.03. I assume this is because it’s easier to score points in warm weather.
The NFC team averages 30.42 points when Phil predicts early spring as opposed to an average of 23.23 points for a prediction of long winter. This means that the NFC team gains about 1 extra touchdown for a prediction of early spring.
The average margin of victory with a prediction of early spring is 18.5; when long winter is predicted the average margin of victory is 12.6.

Update: Phil emerged this morning at 7:28am and predicted 6 more weeks of winter. This means we can expect a low-scoring, close game with no advantage for the NFC. This is reflected in the current Super Bowl line of Denver -2.5.

Cheers.


Why biostatistics at Brown was the best thing for me

Jan 28

Posted by statsinthewild

StatsbyLopez

Each January and February, hundreds of future statistics and biostatistics PhD students are pawns in what can be a nasty game of application roulette, which roughly consists of the following steps

1) Apply to a dozen schools, and shell out close to $1000 to do so. You don’t get this money back.

2) Wait. Mostly do this.

3) Interview at a recruitment event, which consists of several mini interviews with faculty, most of whom you’ve never heard of and who talk to you about research you’ve mostly never heard of. Meanwhile, they ask you about your research plans that you’re mostly making up. No matter what the school or the circumstance, these days and meetings will undoubtedly be awkward.

4) Wait. Continue to do this:

5) Receive some offer. This first offer is almost never the one you want it to be.

6) Wait some more, while fielding phone calls from…

View original post 645 more words


Super Bowl Pick

Jan 27

Posted by statsinthewild

Denver Broncos vs Seattle Seahawks

Prediction: Seahawks 26-24

Win Probability: Seahawks 54.72%

SU: Seahawks +115

Pick: Seahawks +2.5

OU: Over 46.5
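
The post does not explain how the picks are chosen, but one common way to read a moneyline next to a win probability is to convert the line into a break-even probability and compare the two. The sketch below uses the standard conversion for American odds. The function name is mine, and the comparison is an illustration, not a description of the actual model.

```python
def implied_probability(american_odds):
    """Break-even win probability implied by American moneyline odds."""
    if american_odds > 0:
        return 100 / (american_odds + 100)
    return -american_odds / (-american_odds + 100)


model_probability = 0.5472             # Seahawks win probability from the post
break_even = implied_probability(115)  # Seahawks moneyline of +115 from the post
edge = model_probability - break_even

print(f"break-even probability at +115: {break_even:.1%}")
print(f"model probability:              {model_probability:.1%}")
print(f"edge:                           {edge:+.1%}")
```

A positive edge means the model thinks the team wins more often than the price requires, before accounting for any uncertainty in the model itself.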


What time does the Super Bowl start?

Jan 26

Posted by statsinthewild

6:30 pm. On FOX.

Cheers.


NFL Conference Championships Picks and Predictions

Jan 16

Posted by statsinthewild

San Francisco 49ers at Seattle Seahawks

Prediction: Seahawks 20-17

Win Probability: Seahawks 54.96%

SU: 49ers +155

Pick: 49ers +3.5

OU: Under 39.5

New England Patriots at Denver Broncos

Prediction: Broncos 31-30

Win Probability: Broncos 54.91%

ML: Patriots +205

Pick: Patriots +5.5

OU: Over 56.5


Slow Getting Up

Jan 11

Posted by statsinthewild

I’m three chapters into this book, and I can’t stop reading it. I wanted to share some of my favorite quotes from the book so far.

I watch the 49ers on TV all year with a new appreciation of the machine. For the first time I’m seeing the big picture through the small screen. I listen to the announcers and read the papers. The media narratives are sensational and simplistic, and when compared to what I know about the team, sound like drivel.

-Nate Jackson, “Slow Getting Up”, Page 22

I’d also describe most statistical analysis of sports that is done by the general media as “sensational and simplistic”. I just don’t think the public has any desire for unremarkable, nuanced statistical analysis or narratives. And the two are not unrelated. I’ve seen so many examples of sports writers who oversimplify statistics in order to support their sensationalistic narratives. And if you want to read that stuff, feel free. It’s just not what I’m interested in reading or writing about.

Every morning when I jump onto the soft leather seat and turn over the sweet engine, I tell myself that I better have a great day at practice or I won’t be making the payments. I’ll be “workin’ a nine-to-five with a thirty minute”, just like our special teams coach Ronnie Bradford says will happen to us if we keep fucking up plays.

-Nate Jackson, “Slow Getting Up”, Page 33

I absolutely LOVE the phrase “a nine-to-five with a thirty minute.”

Cheers.


Stats in the Wild
