
Clay Travis said another stupid thing. And this time it actually matters.

Personally, I think Clay Travis is kind of a dipshit.  But he’s not really causing any harm by being a dipshit sports writer.  No matter what his takes are on SEC football, they aren’t life or death, and he’s easy enough to ignore (which I usually do). But recently he wrote a piece downplaying the risks of coronavirus that went beyond just being a regular dipshit sports writer, and crossed into the “guy who is totally full of shit writes about something he knows nothing about and puts people’s lives at risk” territory. And since I have a deep personal policy that when people are full of shit, they should be called out for being full of shit, I have decided to write this post.  And Clay Travis is full of shit.

This kind of writing is really, actually dangerous.  First of all, he has a shit ton of followers (650+ thousand on Twitter) who tend to demographically skew younger, and at least some of them will take what he is saying seriously.  What I’m saying is that this article almost definitely led to some people taking too casual a stance on the risks of coronavirus.  I mean, if I was a junior at Mississippi State and I read this piece a week ago, I definitely would have gone on spring break (even though every epidemiologist in the entire country would have told me not to go).

Coronavirus is a really dangerous virus, and it’s definitely not “just like the flu”.  In the interest of full disclosure, I was a “this is just a flu” guy in early February.  I even tweeted out “Thousands of people die of the seasonal flu every year, but let’s panic about coronavirus”.  But it turns out I was wrong.  This isn’t the seasonal flu.  Dr. Fauci, someone with actual expertise in this, unlike Clay Travis, predicts at this point that there will be millions of infections and 100,000 – 200,000 deaths from coronavirus.  And worst-case scenarios with no interventions, again from actual experts, were up to 2.2 million American deaths and 500K in England.  So I was wrong on Feb 1. It turns out this shit is serious. (It’s OK to be wrong and change your mind when you get new information!) New York City is breaking records for 911 calls.  One hospital in New York is describing what’s happening there as “It’s what was happening in Italy”.  But you wouldn’t know any of this by reading Clay Travis, who boldly proclaimed on March 18, less than two weeks ago, that:

Coronavirus Infections Are Likely To Peak Next Week

This is completely untrue.  It was a stupid thing to say then, and it’s proven to be so very, very wrong since then.  As such, I’ve decided to go through the article and give my comments on it.  (My comments are in bold.  Clay’s original article is in italics.)

I know there is coronavirus doom and gloom everywhere you look on social media right now — in particular with viral predictions of millions of coronavirus deaths on the horizon — but I believe most people are missing the key detail in this outbreak: the number of daily new infections.

The “viral predictions of millions of coronavirus deaths” are actually from a team at Imperial College London led by Neil Ferguson, a professor of mathematical biology.  

Don’t focus on the raw numbers of infection or their growth or death rates — the kind of fear porn you see peddled far and wide on social media — just look at the rates of daily new infection that have occurred in coronavirus outbreaks around the globe and you can divine, to a great extent, what the future is likely to hold.

I think the growth rates and the death rates are EXACTLY what we should be focusing on.  The US is currently on pace to have the number of deaths double every 3 days.  This is above both Iran and China at this point in their outbreaks.  We are still below Italy, but we are on pace to pass Italy in terms of cumulative deaths in under 5 days.  

[Figure: COVID-19 deaths by country, as of March 29, 2020]

As a preliminary, yes, I am a lawyer who now makes a living writing and talking primarily about sports. No, I am not an epidemiologist and I’m not a doctor either. So if you don’t want to read any further or want to denigrate my opinions for that reason, you’re certainly entitled to that perspective.

When the coronavirus outbreak started, I was messing around with the data and making a few posts about SIR models.  But in retrospect I don’t think I should have done this, because I’m neither an epidemiologist nor a medical doctor (though I am a doctor!).  But I’m way closer to someone who should be commenting on this than a former lawyer who is a below-average sports writer.

That being said, I’m not denigrating Travis’ opinions because he isn’t an epidemiologist or a medical doctor; I’m denigrating them because they are shitty opinions that aren’t based on reality or data or facts.

But for the rest of you, let me cite my primary source here at the beginning, I am using this website which has been fantastic at providing factually accurate and up to the date infections around the world.

This site he cites seems to be pretty good.

For those of you who have watched, read or listened to my opinions over the years you know that I love to devour data and I’m not afraid to have opinions that run contrary to the masses. Sometimes those opinions focus on sports, other times they deal with politics, media, or business. When I become fascinated by a subject, I can’t stop thinking about it, I want to consume all the information I can about a particular subject.

When those opinions are about sports, politics, the media, or business, they’re mostly harmless, no matter how bad they are (and believe me, they are bad opinions. So bad.).  But these opinions about a global pandemic can cause actual harm.  For some reason, Mr. Travis has 650+ thousand Twitter followers.  That’s enough people reading his stuff to do some real damage.

And for the past month I have been absolutely fascinated by the coronavirus outbreak around the world. I’ve spent hundreds, if not thousands, of hours reading about this virus and studying the public data arising from outbreaks in countries all over the world.

Many nights in the past several weeks, especially since sports has been shut down, I have slept three or four hours because I haven’t been able to stop devouring information before waking up at 4:30 in the morning to do my national sports talk radio show. And for those of you who have listened, watched or read what I’ve been saying you know that I’ve been more optimistic about America’s ability to withstand this viral outbreak than most people have.

Optimism is a great thing.  I just don’t know how anyone can look at the data that we are seeing now, or on March 18th when he wrote this, and be that optimistic.  

As a result I’ve been denigrated online by many media members with blue checkmarks by their names on Twitter for not being enough of an alarmist, for downplaying the threat of the coronavirus to our nation’s health. That hasn’t been my intent at all, my intent, as it always is, has simply been to share factual data with my audience. As I always say, you can disagree with the opinions I come to based on the underlying facts, but I struggle to ensure I get all my facts right, especially in cases such as these.

But Clay got all the facts wrong here.  

While most in the United States have been focusing on the United States outbreak I’ve paid quite a bit of attention to what’s happened in China, South Korea and Japan.

Why?

Because all three of these countries had coronavirus outbreaks before ours and all three had essentially ended the viral outbreaks in their respective countries before our outbreak really took root. That is, the factual evidence clearly established that when these countries attacked the spread of the virus, they succeeded.

This is correct.  But South Korea had much higher rates of testing from the very beginning, and it recognized the real threat much earlier and took it more seriously than the United States did.  What South Korea was doing to prevent the spread of the disease, the United States has not done.

Interestingly, all three countries took different paths to end that outbreak — China, with the worst outbreak in the world to this point, was more draconian in its restrictions, while South Korea and Japan handled their outbreaks with much less significant disruptions to their economies and daily life.

But again, South Korea had robust testing in place whereas “The blundering lack of an effective testing program in the US is an unconscionable failure and has led (and will lead) to more transmission of COVID-19.” Comparing the United States to South Korea (or Japan or China) is not a very reasonable comparison.  

Certainly in the months and years ahead it will make a great deal of sense to study every country’s response in an effort to find out the best possible model to adopt for future pandemics, but as I write this China, South Korea, and Japan have a total of 126 new infections today.

This means all three countries have effectively ended the viral outbreaks they were combating. (It’s certainly possible the virus could re-emerge in the future, but right now it seems to be beaten).

He is right that all of those countries have done a very good job “flattening the curve”.  

Now let’s pivot to our country.

Yesterday the United States saw our highest number of new infections to date, adding 1748 new cases. (It’s important to note that daily “new” cases doesn’t mean new in the context of they just happened. “New cases” are the result of infections that generally occurred five or more days in the past.)  As I write this we are on track to exceed yesterday’s number of daily new infections again today and probably for the next several days as well. (If you watch the White House press briefings and listen to Dr. Birx –who has been PHENOMENAL in speaking to the media, alongside of Dr. Fauci — she forecast this occurrence several days ago. Letting astute listeners know that as the testing ramped up in a big way in this country the number of infections would surge as well.)

While Dr. Birx has received some praise, she has also been substantially criticized. Harvard epidemiologist Marc Lipsitch, for instance, described Birx’s takes like this:

Lipsitch called Birx’s characterization of the current coronavirus situation in the U.S. “rosy” and even “deceptive.”

But, and this is key, despite the number of new daily infections that doesn’t mean the virus is continuing to spread at this rapid of a rate, it means that we are catching up to the infections that have already occurred in the days and weeks prior to our aggressive action. That is, since the average incubation for this virus is five days, pretty much everyone testing positive today got the virus before the social distancing and quarantines began in substantial numbers in this country.

He’s not entirely wrong here.  It was possible that the rate would slow.  But given that there are still huge areas of the country that weren’t doing social distancing then (and some areas STILL aren’t doing it!), there was no reason to believe that the rates would go down.  

Right now the fear porn purveyors — all too many of whom are in the media — are spinning these infection numbers as hard as they can to terrify people and convince them we are about to descend into abject despair. That our efforts to combat the virus are failing. But the number of daily new infections is a rearguard action, it’s measuring where we were several days ago in the fight against the virus, not where we are today.

He really needs to cite specific examples here of where the media is “spinning these infection numbers.”  I’d love to see what he was talking about.  “Our efforts to combat the virus” do seem to be too little, too late in a lot of places.  Many states in the US didn’t put in stay-at-home orders until after Travis wrote this article, and some states STILL haven’t issued stay-at-home orders (e.g., Florida).  So I’m not sure why anyone thought that the rates of infection would slow.

And I think that means most of the people panicking in this country right now are missing the signal amid the noise.

Peak new daily infection rates in this country are closer than most think and, and this is significant, hitting a daily new infection peak is a very good thing because it signals we are moving to the backside of our outbreak.

This claim isn’t supported at all by anything available right now, let alone on March 18th when this was written.  I have no idea where he got this idea.

And I think we’re going to hit peak new daily infections very soon.

So at the time we had had two days in a row (March 16 and 17) with 22 and 23 deaths, respectively.  On March 18, there were only 10 deaths.  But the overall trend in deaths and confirmed cases was growing right on track with exponential growth and the very next day, March 19, there were 82 deaths recorded.  

Travis has since been proved monumentally wrong, with each of the last 9 days having more deaths than the previous one including March 28 when there were 445 deaths, the most in a day yet.  (Source: Johns Hopkins)

How soon?

Well, the data in other countries suggests that our peak daily infection rate is likely to come next week.

Again, it makes no sense to compare the United States to other countries that had testing in place very early and took the threat seriously from the beginning.  On March 18, we were already on a different trajectory than South Korea and Japan.  On March 18, our exponential adventure was just beginning, while Japan and South Korea had already flattened.  Here is what those countries looked like on March 18.  So it’s really hard to defend these statements given the information he had at the time.

[Figures: Japan and South Korea as of March 18]

Since then, he’s been even wrongerer: 

[Figures: updated data through March 29]

(Source: Johns Hopkins)

How can I project this?

By looking at what already happened in China, Japan and South Korea.

I mean, again, this makes no sense.  Notably South Korea had testing WAY, WAY, WAY ahead of the US.  South Korea was doing proportionally about 6X the amount of testing the US was by the middle of March.  And in February the US was basically doing no testing: 

[Figure: COVID-19 testing, US compared with South Korea]

As I said earlier in this article, those countries have a total of 126 combined new infections today.

That’s because once their infection rate began to decline it do incredibly rapidly, the inverse of the rapid rise.

In other words this virus ramps up rapidly, but it also declines rapidly.

There were several counterexamples to this even at the time, including Iran and Italy.

Okay, some of you are saying, but those Asian countries did X, Y, and Z and we aren’t doing X, Y, and Z!

Yup.  So why bother even saying it! 

Well, let’s leave Asia behind and move to Europe, in particular to Italy, which has been the fear porn proxy for  most American social media users. “ITALY, ITALY, ITALY THEY WAIL! Look at the curve in Italy! We match it perfectly we are never going to be able to survive!”

Yes, Italy has suffered a substantial loss of life — 2503 as I write this afternoon — but as a result of this loss of life Italy has undertaken drastic measures to fight the virus. And, significantly, these drastic measures work and they appear to work rapidly against the virus.

Italy recorded its 10,000th death on March 28.  They have begun to flatten their curve.  The United States is about 8 or 9 days behind Italy, and it’s true we haven’t seen the same rate of growth in deaths as Italy, but we also haven’t flattened our curve at all.  Additionally, we are on pace to pass Italy in deaths in around 4 days.

[Figure: COVID-19 deaths by country, as of March 29, 2020]

On March 14th Italy hit 3,497 daily new infections. On March 15th Italy hit 3,590 new infections, the viral peak for daily new infections so far in their country. Then came 3,233 new infections on March 16th and 3,526 on March 17th. Now it’s still possible, of course, that the number of new daily infections could pop above 3,590, the present high set on March 15th — update they did on 3/18 after I clicked publish — but even with a still vacillating total infection number it seems pretty clear that at a minimum Italy has hit an infection plateau. (The number of daily deaths also peaked in Italy on March 15th at 368 — a new peak death rate came on March 18th after publication — and has declined since then.)

Does it seem “pretty clear that at a minimum Italy has hit an infection plateau” based on the plot below?  I have no idea how anyone could make that statement looking at the plot below.  

[Figure: COVID-19 infections in Italy]

And in fact, Travis was wrong, again.  Daily deaths in Italy have continued to grow including 919 and 889 on March 27 and 28, respectively.  Not exactly a plateau.  

[Figure: daily COVID-19 deaths in Italy]

Far from being an example of exponential growth run wild, Italy stopped the coronavirus’s growth of daily new infections in its country in the space of a week. (This is why so many of these viral epidemiologist studies that go viral on social media are worthless. All of them presume nothing changes. If you want a fascinating read about a man who saw all this before the Chinese outbreak ended, go read the opinion of Micheal Levitt, a biophysicist who won the Nobel prize in chemistry.

In the second paragraph of this article, it states “Although his specialty is not in epidemiology”, and the article should have ended there.  This exact phenomenon of armchair epidemiology is why I stopped doing any analysis (other than simple graphs) of coronavirus.  This guy is clearly really smart.  But he’s not an epidemiologist.  He’s a guy who knows a lot about chemistry.  I’m not saying that he couldn’t do epidemiology, but this “analysis” that he did seems relatively ad hoc.  There are plenty of better places to get information on this from actual epidemiologists who have been studying this exact stuff for decades.  So I’ll continue to play around with the data privately, but I won’t be posting any analysis because, as was pointed out to me, misinformation and weak analysis can be very dangerous.  When someone screws up an analysis of sports data, it doesn’t put anyone in danger.

Here’s the essence of Levitt’s analysis from that article: “The rate of infection of the virus in the Hubei province increased by 30% each day — that is a scary statistic. I am not an influenza expert but I can analyze numbers and that is exponential growth.”

Had the growth continued at that rate, the whole world would have become infected within 90 days. But as Levitt continued to process the numbers, the pattern changed. On February 1, when he first looked at the statistics, Hubei Province had 1,800 new cases a day. By February 6, that number had reached 4,700 new cases a day.

But on February 7, something changed. “The number of new infections started to drop linearly and did not stop,” Levitt said. “A week later, the same happened with the number of the deaths. This dramatic change in the curve marked the median point and enabled better prediction of when the pandemic will end. Based on that, I concluded that the situation in all of China will improve within two weeks. And, indeed, now there are very few new infection cases.”  

Yeah.  But again, China basically shut everything down.  It took a lot for them to stop the spread.  And the US hasn’t done nearly enough.  

Back to Italy, the data now reflects that from this point forward their infections will begin to descend, just like happened in China. And if Italy’s infection rate descends like they did in China, Japan and South Korea that will be a rapid descent which will allow a rapid return to normal life.

There was actually no evidence this would happen then, and it definitely didn’t happen.  

[Figure: daily COVID-19 deaths in Italy]

Now Italy appears to be further along in its viral outbreak than either France, Spain, Germany, or England are, but the lessons of Italy appear to reinforce the lessons of China, Japan and South Korea. Indeed, both France and Spain also appear to be close to hitting their peak numbers of new daily infections also. (By the way, the most interesting data point I have seen is from Germany. Somehow Germany has 11,973 infections and only 28 deaths. That’s a death rate for infected patients of .23%. What are they doing better than everyone else? Because that rate of death is very similar to the flu.)

Once again, Travis was wrong.  The black line marks where Travis claimed that France and Spain were “close to hitting their peak numbers”.  Not sure how he reached that conclusion, but he was…….not correct.

[Figure: France and Spain, with a black line marking the date of Travis’s article]

It IS interesting that Germany has a lower death rate.  But there are a ton of possible reasons for that.  

Which is why if you extrapolate the data from China, Japan, South Korea, Italy, France and Spain to the United States then it seems highly likely that our rate of new daily infection will peak sometime late next week.

First off, extrapolation is dangerous in general.  Extrapolation by a below-average sports writer is really, really dangerous.  Second, if he had actually extrapolated using exponential growth, he would have been more or less correct on Spain and France.

From that point forward we will be on the backside of this particular outbreak and our daily infection rate will decline rapidly.

Nope. 

[Figure: COVID-19 deaths by country, as of March 29, 2020]

That is, I believe rates of daily new infection in America will peak and then begin to drop precipitously starting at the end of next week. That doesn’t mean that new hotspots might not emerge around the United States or that our fight against the coronavirus is over, but it does means there’s a very good chance the worst of our outbreak will have passed by next week.

Nope. 

[Figure: COVID-19 deaths by country, as of March 29, 2020]

We will, in the words of Dr. Fauci, have flattened the curve.

As mentioned before, Dr. Fauci just today estimated there would be 100,000 – 200,000 deaths in the US.  

What does that mean for America?

Well, since this is primarily a sports site it means our sports may well return faster than we thought. Already the KBA and CBA, the two pro basketball associations in Korea and China, are preparing to start back up their leagues in those countries.

Sites that are “primarily sports” are probably the least equipped to do armchair epidemiology.  I have a Ph.D. in statistics and did a post-doc in a school of public health, and even I’m refraining from doing any (more) armchair epi.  And I’m way closer to being qualified to do it.

But, much more significantly, it also means our economy may well bounce back more rapidly than many fear. Hopefully by early to mid-April we can begin to embrace a bit more normalcy in our lives once more.

Nope. 

[Figure: COVID-19 deaths by country, as of March 29, 2020]

And in what might be the most lasting legacy of this outbreak, hopefully we will have put in place a series of pandemic systems to allow us to respond more rapidly to more significant and deadly outbreaks that might arise in the future.

At this point I would like to remind people that Donald Trump shut the federal pandemic office in 2018.

The good news is this virus is (probably) not going to kill you and we (likely) only have about a week until we hit the viral peak of new daily infections in America. Loss of life will be in the thousands, at most, and not the tens of thousands or the hundreds of thousands or the millions as the most terrifying of these forecasts have suggested. That doesn’t mean you should stop social distancing and staying at home if you’re ill, by the way, but it does mean that this containment is likely going to work very, very well.

We are on pace to hit 10,000 deaths in 6 days.  This was such an egregiously incorrect statement to make a week ago when this article was written.  Not a single expert I have read would have agreed with Travis last week, and he’s going to be proven wrong in about 6 days (unless there is somehow a miracle).

[Figure: COVID-19 deaths by country, as of March 29, 2020]

 

What should you do if you believe my forecast is accurate? Buy stocks. I believe Wall Street has baked in far worse expectations than what the reality of this coronavirus will represent.

So if you like getting your epidemiology advice from a shitty sports writer, why not also get investment advice from him!  The stock market has gone up since this article was written, following the passage of the $2 trillion bailout.  It will be really interesting to see what happens to the stock market in the next year.  And Travis may well be right that you should be buying stocks.  But what’s that saying about a broken clock?

That’s why I spent all morning buying stocks myself. Yes, I’m putting my money where my mouth is.

The stock market could skyrocket in the next 6 months, and Travis would make a bunch of money.  In fact, I hope the market goes up and I wish him all the best in his investments.  But he’d still be wrong about his coronavirus projections.

And, by the way, if you hate me and you believe I’m totally wrong in my forecast, you’re entitled to that opinion as well. Indeed, if I personally end up dying of the coronavirus, you have my full permission to make as much fun of me as possible on social media.

To make fun of someone who died of coronavirus, you’d have to be an absolutely shitty person.  I do very much dislike Clay Travis, but I don’t want him to die and I certainly won’t make fun of him if he does die.  That is unthinkably shitty.  

“Writer who said no one would die of COVID 19 dies of COVID 19” is a hell of a headline.

I’d click on that link.

And like most of you I’d be too busy to actually read the article, but I would make a witty and sarcastic comment and share it with all my followers, ensuring it was one of the top trending topics for the day. So have at it.)

The reality is, spoiler alert, we’re all going to die.

But the evidence from around the world suggests that for the vast, vast majority of us, it won’t be from this virus.

This I agree with.  We are all going to die.  And for the vast, vast majority of us, coronavirus won’t kill us.  But it’s possible that most of us will personally know someone who dies from corona.  

That is all for now.  Stay safe.  Stay home.  Here is what to do if you get sick. And please, please, please, don’t get your pandemic information from dipshit sports writers.  (Or dipshit constitutional scholars).

Cheers. 

Coronavirus and Exponential Growth

I’m continuing to marvel at just how well an exponential model fits the number of confirmed cases of coronavirus.  The plot below shows the real number of confirmed cases (on a log10 scale) for Italy, the US, Spain, and France to the left of the black vertical line.  It’s nearly perfectly linear!  To the right of the vertical black line are the projections should this exponential growth continue.  I’ve also provided some basic output from the models for each country, including estimated regression coefficients, R^2 (i.e., how close to exponential the growth is), and when each country is expected to hit certain numbers of cases if the current trend continues.

[Figure: confirmed cases (log10 scale) for Italy, the US, Spain, and France, with projections to the right of the black vertical line]

Here is what it looks like on the raw scale:

[Figure: the same data and projections on the raw scale]

All models are simple linear regressions with the natural log of confirmed cases as the response and time since March 1 as the predictor (i.e., March 1 is 1, March 2 is 2, etc.).  Projections are based on the assumption that exponential growth continues at current rates (hopefully this turns out to be a bad assumption!)
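For anyone who wants to try this kind of fit themselves, here is a minimal pure-Python sketch of the same idea: regress ln(cases) on day number, exponentiate the slope to get the daily growth factor, and solve the fitted line for the first day it crosses a target. This is not the code linked at the end of this post; the function names are hypothetical, the example data are synthetic (generated from the US slope of .283 rather than real counts), and confidence intervals are left out to keep it dependency-free.

```python
import math
from datetime import date, timedelta


def fit_log_linear(days, cases):
    """Least-squares fit of ln(cases) = intercept + slope * day.

    Returns (intercept, slope, r_squared), with R^2 computed on the log scale.
    """
    y = [math.log(c) for c in cases]
    n = len(days)
    x_bar = sum(days) / n
    y_bar = sum(y) / n
    sxx = sum((x - x_bar) ** 2 for x in days)
    sxy = sum((x - x_bar) * (yi - y_bar) for x, yi in zip(days, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - (intercept + slope * x)) ** 2 for x, yi in zip(days, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


def day_to_reach(target, intercept, slope):
    """First whole day number on which the fitted curve reaches `target` cases."""
    return math.ceil((math.log(target) - intercept) / slope)


def day_to_date(day, day_one=date(2020, 3, 1)):
    """Convert a day number to a date, with March 1 as day 1 as in the post."""
    return day_one + timedelta(days=day - 1)


# Synthetic example: exact exponential growth with the US slope of 0.283.
days = list(range(1, 18))
cases = [10 * math.exp(0.283 * d) for d in days]
intercept, slope, r2 = fit_log_linear(days, cases)
print(f"beta = {slope:.3f}, growth rate = {math.exp(slope):.3f}, R^2 = {r2:.4f}")
for target in (10_000, 100_000):
    print(target, day_to_date(day_to_reach(target, intercept, slope)))
```

The growth rate is just exp(beta), which is why a slope of .283 on the log scale shows up as a growth rate of 1.327 in the US numbers. Real data won't give an R^2 of exactly 1 the way this synthetic series does.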

United States

Current number of confirmed cases (as of March 17): 6,421

beta = .283 (.275, .290)

growth rate = exp(.283) = 1.327 (1.317, 1.336)

R^2 = .9978 (!!! That’s so exponential!!!!)

Expected to be at X cases:

X = 10,000: March 19

X = 100,000: March 27

X = 1,000,000: April 4

X = 10,000,000: April 13

France

Current number of confirmed cases (as of March 17): 7,699

beta = .257 (.239, .275)

growth rate = exp(.257) = 1.293 (1.270, 1.316)

R^2 = .9846

Expected to be at X cases:

X = 10,000: March 18

X = 100,000: March 27

X = 1,000,000: April 5

X = 10,000,000: April 14

Spain

Current number of confirmed cases (as of March 17): 11,748

beta = .322 (.308, .337)

growth rate = exp(.322) = 1.380 (1.360, 1.401)

R^2 = .9931

Expected (or actual date) to be at X cases:

X = 10,000: March 17 (Actual Date)

X = 100,000: March 24

X = 1,000,000: March 31

X = 10,000,000: April 7

Italy

Current number of confirmed cases (as of March 17): 31,506

beta = .186 (.178, .194)

growth rate = exp(.186) = 1.204 (1.195, 1.214)

R^2 = .9939

Expected (or actual date) to be at X cases:

X = 10,000: March 10 (Actual Date)

X = 100,000: March 23

X = 1,000,000: April 4

X = 10,000,000: April 17
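Another way to read the betas above: with exponential growth, the time it takes confirmed cases to double is ln(2)/beta. This small sketch (not part of the original analysis code) converts each country's estimated slope into a daily growth factor and a doubling time, under the same keep-growing-at-current-rates assumption as the projections.

```python
import math

# Slopes (beta, on the natural-log scale) reported above, fitted through March 17.
betas = {"United States": 0.283, "France": 0.257, "Spain": 0.322, "Italy": 0.186}


def doubling_time(beta):
    """Days for confirmed cases to double when ln(cases) grows by beta per day."""
    return math.log(2) / beta


for country, beta in betas.items():
    print(f"{country}: x{math.exp(beta):.3f} per day, "
          f"doubling every {doubling_time(beta):.2f} days")
```

At those rates the US is doubling roughly every 2.4 days and Italy roughly every 3.7 days.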

________________________________________________________________

Code can be found here.

Follow me coding on Twitch here.

________________________________________________________________

Cheers!

 

Holy Shit. Holy Shit. Holy Shit. Coronavirus.

OK.  So coronavirus.  On February 1, I tweeted something to the effect of “Why are we panicking about this, the seasonal flu kills X number of people per year”.  Whoops!  Let’s look at the data and see just how wrong I was and how much of a dipshit I am.

I went and got some data from a github page associated with Johns Hopkins University.  Below is a plot of the number of confirmed cases in the United States versus date.  Since March 1, we’ve gone from basically nothing to around 3500 cases.  (CONFIRMED cases.  So who knows how many actual cases there are….)

[Figure: confirmed US cases versus date]

I wanted to see if this was actually following exponential growth, so I took the log of the total number of cases and plotted that versus date.  I get the following plot:

[Figure: log of confirmed US cases versus date]

From January through the end of February there is no indication of exponential growth.  Then on February 29, this starts to look VERY linear.  On the log scale.  Indicating exponential growth.

If we focus only on the period from February 29 through today the plot looks like this:

[Figure: confirmed US cases, February 29 onward]

And here is what it looks like on the log scale:

[Figure: log of confirmed US cases, February 29 onward]

Super linear.

So let’s fit a simple linear regression model with log(cases) as the response and days as the predictor.  That model gives us an intercept of 2.574184 and a slope of 0.340881.  This means that the predicted number of cases on a given day is exp(2.574184)*exp(0.340881*day_number).  Computing the numbers gives us exp(2.574184) = 13.12061 and exp(0.340881) = 1.406186.  If you think about this in terms of money, this is like starting with about $13 in your bank account on day 0 and getting 40.6% interest PER DAY.  After a week you have a little over $142 in your account.  In two weeks you are at a bit over $1,550.  By the time a month has gone by you are sitting on $362,439.90.  After 45 days, you can retire a very wealthy person with $60,238,907.  By the end of 60 DAYS (previous version said MONTHS.  Thanks Hammers for pointing this out), you have over 10 billion dollars.
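If you want to check the bank-account numbers yourself, here is a quick sketch that just plugs the fitted intercept and slope into exp(intercept) * exp(slope * day). The helper name is mine, and day 0 corresponds to the intercept (the "starting balance").

```python
import math

# Intercept and slope from the regression of ln(cases) on day number above.
INTERCEPT = 2.574184
SLOPE = 0.340881


def predicted(day):
    """Fitted curve exp(intercept) * exp(slope * day): the 'bank balance' on a given day."""
    return math.exp(INTERCEPT) * math.exp(SLOPE * day)


print(f"starting balance (day 0): ${math.exp(INTERCEPT):.2f}")
print(f"daily interest: {math.exp(SLOPE) - 1:.1%}")
for day in (7, 14, 30, 45, 60):
    print(f"day {day}: ${predicted(day):,.2f}")
```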

 

 

Here is what this looks like in a plot for the first 30 days:

 

[Figure: model predictions for the first 30 days]

And then for the first 60 days. The blue line is the predicted mean and the red lines are prediction intervals.

[Figure: model predictions for the first 60 days]

 

So obviously, this is extrapolation once we get to day 60, since there aren’t even 10 billion people on earth and the curve at some point has to level off once it’s infected enough people. But what I wanted to show here is just how fast this thing can get out of control given our current path.  It’s easy to ignore an exponential curve in the beginning.  I mean, just look at the data.  On March 1 there were only 6 confirmed cases in the US.  Fifteen days later there were only 772.  That’s still basically nothing in a country of 330 million people.  However, while the first 15 days of March only saw an increase of 766 cases, given the current exponential growth rate of 40% per day, we’ll be over a million cases by April 1.  And by tax day, we would be at 119 million.  Now imagine a 2% mortality rate!  These numbers are staggering.
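Here is a sketch of how those calendar projections fall out of the fitted model. The post doesn't say which calendar day is day 1; this sketch assumes February 29 (the start of the fitted window), because that choice reproduces both the "over a million by April 1" and the "119 million by tax day" figures. Treat that date mapping as an assumption, not a documented detail of the original code.

```python
import math
from datetime import date

INTERCEPT = 2.574184
SLOPE = 0.340881
# Assumption: day 1 is February 29, 2020, the first day of the fitted window.
DAY_ONE = date(2020, 2, 29)


def day_number(d):
    """Day index used by the regression, under the DAY_ONE assumption."""
    return (d - DAY_ONE).days + 1


def projected_cases(d):
    """Confirmed cases on date d if growth stays exponential at the fitted rate."""
    return math.exp(INTERCEPT + SLOPE * day_number(d))


april_1 = projected_cases(date(2020, 4, 1))
tax_day = projected_cases(date(2020, 4, 15))
print(f"April 1:  {april_1:,.0f} cases")
print(f"April 15: {tax_day:,.0f} cases")
print(f"2% mortality on April 15 cases: {0.02 * tax_day:,.0f} deaths")
```

Same caveat as the plots: this is pure extrapolation of a curve that has to bend eventually.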

Staggering enough to cancel the NBA, NHL, major golf tournaments, and MLB opening day. Staggering enough for Chicago to cancel its St. Patrick’s Day parade. Staggering enough for colleges and universities to send their students home. Staggering enough for California, Ohio, Illinois, Massachusetts, and Washington to shut down bars and restaurants.

So the reason we are social distancing is to make that 40% daily growth rate drop. We have to bring it down.
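A quick way to see why that rate matters: the Python sketch below compares how much the case count multiplies over 30 days, and how long it takes to double, at the fitted daily multiplier of about 1.406 versus two lower rates. The 20% and 10% daily rates are hypothetical values chosen for illustration, not estimates from the data.

```python
import math


def growth_factor(daily_multiplier, days):
    """How many times larger the case count is after `days` days of constant growth."""
    return daily_multiplier ** days


def doubling_time(daily_multiplier):
    """Days needed for the case count to double at a constant daily multiplier."""
    return math.log(2) / math.log(daily_multiplier)


if __name__ == "__main__":
    # fitted rate first, then two hypothetical lower rates
    for multiplier in (1.406186, 1.20, 1.10):
        print(
            f"{(multiplier - 1) * 100:4.1f}% per day: "
            f"x{growth_factor(multiplier, 30):,.0f} in 30 days, "
            f"doubling every {doubling_time(multiplier):.1f} days"
        )
```

At about 40% per day, cases double roughly every two days; at a hypothetical 10% per day, the doubling time stretches to a little over a week.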

____________________________________________________________________________________________

As always you can find my code on github.

Code for this post is here.

Also check out some SIR modeling that I did last night here (based on this tutorial).

And finally watch me code on Twitch: https://www.twitch.tv/statsinthewild.

Cheers.

Lisa Goldberg and the Hot Hand in Basketball

So Lisa Goldberg gave a talk at Loyola Chicago on Monday afternoon about the hot hand in basketball, where she presented a paper showing that there is no statistical evidence that the hot hand exists. While we didn’t film her talk, this Numberphile video covers basically the same material she presented on Monday:

 

The basic argument in the paper is that the probability of making the next shot given you made the previous shot is not statistically different from the probability of making the next shot given you missed the previous shot. So I have a lot of thoughts on this, but first let’s talk about the really interesting history of this topic.

In 1985, Gilovich, Vallone, and Tversky published what can very accurately be called the seminal paper on this topic. It defined the hot hand and concluded that there was no statistical evidence for it. For years this was considered orthodoxy in most of the sports statistics world, even though almost everyone in basketball feels that the effect is real.

Years after that paper was originally published, in 2015, Joshua Miller and Adam Sanjurjo published a paper pointing out that the way Gilovich et al. (1985) went about looking for the hot hand was slightly flawed: looking at what happens after streaks within a finite sequence of shots introduces a bias that Gilovich et al. did not account for, and that bias is largest when the sequences are short.

What they pointed out in this paper is really, really interesting, so let me talk about it for a second. Let’s say you have a finite string of coin flips from a fair coin. Call 0 a tails and 1 a heads. So you might have a string of flips like this: 001101001. Now, for a fixed number of flips, what is the expected proportion of 1’s that immediately follow a 1 in such a sequence? It’s 50%, right? Right?!?! It has to be 50%!

Turns out, it’s not 50%.  Andrew Gelman has an excellent explanation of the issue here.
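Here is a small Python sketch of the Miller and Sanjurjo effect, assuming a fair coin. It enumerates every possible sequence of n flips, computes the proportion of heads that immediately follow a heads in each sequence (skipping sequences where that proportion is undefined because no heads is followed by another flip), and averages those proportions. Exact fractions are used so the result is not a rounding artifact.

```python
from fractions import Fraction
from itertools import product


def expected_proportion_heads_after_heads(n):
    """Average, over all 2**n equally likely fair-coin sequences, of the share of
    flips following a heads (1) that are also heads. Sequences with no heads in
    the first n - 1 flips are skipped because the share is undefined there."""
    proportions = []
    for flips in product((0, 1), repeat=n):
        after_heads = [nxt for prev, nxt in zip(flips, flips[1:]) if prev == 1]
        if after_heads:
            proportions.append(Fraction(sum(after_heads), len(after_heads)))
    return sum(proportions) / len(proportions)


if __name__ == "__main__":
    for n in (3, 4, 10):
        p = expected_proportion_heads_after_heads(n)
        print(n, p, float(p))
```

For n = 3 this gives 5/12 (about 0.417) and for n = 4 it gives 17/42 (about 0.405), not 1/2. The gap shrinks as the sequences get longer, but it does not vanish for any finite length.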

Following the Miller and Sanjurjo paper, in 2017 Daks, Desai, and Goldberg published a paper that revisited Gilovich et al.’s original analysis, using permutation testing to account for the bias that Miller and Sanjurjo pointed out. Even with permutation tests that account for the bias, Daks et al. still find no evidence of a hot hand.
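A permutation test along these lines can be sketched in Python as below. This is a minimal sketch under simplifying assumptions, not Daks, Desai, and Goldberg’s exact procedure: the statistic is P(make | previous make) minus P(make | previous miss) for one player’s shot sequence (1 = make, 0 = miss), and the null distribution comes from shuffling the order of the shots. Because shuffling keeps the number of makes fixed, the shuffled sequences carry the same finite-sequence bias as the observed one, which is why this kind of test accounts for it.

```python
import random


def conditional_difference(shots):
    """P(make | previous make) - P(make | previous miss); None if either is undefined."""
    after_make = [nxt for prev, nxt in zip(shots, shots[1:]) if prev == 1]
    after_miss = [nxt for prev, nxt in zip(shots, shots[1:]) if prev == 0]
    if not after_make or not after_miss:
        return None
    return sum(after_make) / len(after_make) - sum(after_miss) / len(after_miss)


def permutation_test(shots, n_permutations=10_000, seed=0):
    """One-sided permutation p-value for 'shooting is streakier than chance'."""
    observed = conditional_difference(shots)
    if observed is None:
        raise ValueError("need both a make and a miss that are followed by another shot")
    rng = random.Random(seed)
    shuffled = list(shots)
    as_extreme = valid = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        stat = conditional_difference(shuffled)
        if stat is None:
            continue  # this ordering leaves one of the conditional rates undefined
        valid += 1
        if stat >= observed:
            as_extreme += 1
    # add-one correction so the p-value is never exactly zero
    return observed, (as_extreme + 1) / (valid + 1)


if __name__ == "__main__":
    hypothetical_shots = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
    print(permutation_test(hypothetical_shots))
```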

The Daks, Desai, and Goldberg paper led to the Numberphile video about the hot hand at the beginning of this post, though Miller and Sanjurjo don’t agree with its findings.

So what are my thoughts on this? I don’t think that any of these papers are looking for the hot hand in the correct way. I think you need to build some sort of mixture model or hidden Markov model with two states representing hot and regular. Once you fit that model, you can see whether there is a significant difference between the states, and compare it to a one-state model to see which one gives a better fit. I’ve written about this type of thing before in baseball with Rob Arthur.
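As a rough illustration of that comparison, here is a minimal Python sketch of the scaled forward algorithm for a two-state (regular/hot) hidden Markov model on a make/miss sequence, next to the one-state (independent Bernoulli) log-likelihood. The parameter values in the example are hypothetical, and labeling state 0 "regular" and state 1 "hot" is an assumption of the sketch. Actually fitting the model would mean maximizing this likelihood over the parameters (for example with the Baum-Welch algorithm), which this sketch does not do; a fitted comparison would then need to penalize the extra parameters, e.g. with AIC or BIC.

```python
import math


def bernoulli_log_likelihood(shots, p_make):
    """One-state model: every shot is an independent make with probability p_make."""
    return sum(math.log(p_make if y == 1 else 1 - p_make) for y in shots)


def hmm_log_likelihood(shots, start, transition, make_prob):
    """Scaled forward algorithm for a hidden Markov model with make/miss emissions.

    start[k]: probability the first shot is taken in state k
    transition[j][k]: probability of moving from state j to state k between shots
    make_prob[k]: probability of a make while in state k
    """
    n_states = len(start)

    def emission(k, y):
        return make_prob[k] if y == 1 else 1 - make_prob[k]

    alpha = [start[k] * emission(k, shots[0]) for k in range(n_states)]
    log_lik = 0.0
    for t in range(len(shots)):
        if t > 0:
            alpha = [
                sum(alpha[j] * transition[j][k] for j in range(n_states)) * emission(k, shots[t])
                for k in range(n_states)
            ]
        scale = sum(alpha)  # P(shot t | shots before t)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow on long sequences
    return log_lik


if __name__ == "__main__":
    hypothetical_shots = [1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1]
    print(bernoulli_log_likelihood(hypothetical_shots, 0.45))
    print(hmm_log_likelihood(
        hypothetical_shots,
        start=[0.8, 0.2],                     # state 0 = regular, state 1 = hot
        transition=[[0.9, 0.1], [0.3, 0.7]],
        make_prob=[0.40, 0.65],
    ))
```

A useful sanity check: if both states have the same make probability, the two-state likelihood collapses to the one-state one, since the hidden state no longer affects the shots.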

I also think, specifically in basketball, you absolutely cannot view the data as a string of 0’s and 1’s. If you only look at makes and misses, you are ignoring so many other factors that affect the probability of making a shot and need to be controlled for, such as shot distance, game situation, distance to the nearest defender, etc. What’s nice about studying the hot hand in baseball for pitchers is that there are relatively few factors that need to be controlled for (runners on base, score, pitch type, etc.). It’s also easier to look at pitchers because there are no opposing players trying to hinder the pitcher’s ability to do what they are doing, and the pitch is always coming from the same distance away. (This is why I think bowling would be a nice place to look for the hot hand. Someone get me the data!) In basketball, everything is different from shot to shot.

So am I convinced that the hot hand exists? My answer is really, truly, I don’t know. I haven’t seen anything that convinces me it does or does not exist. And also, it depends. It depends on exactly how you define the hot hand.

Anyway……..

After Dr. Goldberg’s talk, I was lucky enough to get invited to dinner with her because, dammit, I’m important……….(Also at the dinner were John and Sue Dewan and Lisa Goldberg’s daughter.)

While at dinner, our department head introduced me as the director of our Data Science program, and Dr. Goldberg asked me the following question: What are the three things you want students to take away from your program?

I stalled for a bit, and then just straight up said I was going to avoid answering that, but I’d tell her what I think a Data Science student should know how to do.

Towards the end of dinner, I just had to ask Dr. Goldberg the same question she asked me. Here are the three things she thought were most important (I hope I remember these at least somewhat accurately):

  1. Statistics does better with more data, but more data is harder for computers. Dealing with this issue is fundamental to doing data science.
  2. Remove your personal biases from the analysis.
  3. Design your experiment beforehand.

Pretty good answers. But I’ve been thinking about this question for a few days now. So what are the three most important things that I want our Data Science students to know?

After thinking about this for a while here are my answers (in no particular order):

  1. Always make sure you are doing the thing you are actually trying to do. (For example, if you just want to build the best classifier, you aren’t that interested in interpreting parameters.)
  2. Data Science consists of two major parts: managing the data and analyzing the data.  Neither part is more or less important.
  3. You can manipulate statistics to say many different things. Be ethical. (Present data to others the way you would want others to present data to you.)

And of course, I want all of my students to know that the answer to virtually every single question in statistics is “It Depends”.

Ok.  Good night.

Cheers.

 

NFL Super Bowl Squares Distribution

Here is the distribution of the last digits of the final scores of NFL games all-time:

[Figure: distribution of last digits of final scores, all NFL games]

And here is the distribution for recent games, which I believe I defined as since 2000.

[Figure: distribution of last digits of final scores, recent NFL games]
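For anyone who wants to build a similar table, here is a minimal Python sketch. It assumes you already have final scores as (team A points, team B points) pairs; the scores in the example are made up, and keying the grid by (team A last digit, team B last digit) is an assumption about how the squares are laid out.

```python
from collections import Counter


def square_distribution(final_scores):
    """Share of games landing on each (last digit, last digit) square.

    final_scores: iterable of (team_a_points, team_b_points) pairs.
    """
    counts = Counter((a % 10, b % 10) for a, b in final_scores)
    total = sum(counts.values())
    return {square: n / total for square, n in counts.items()}


if __name__ == "__main__":
    made_up_scores = [(27, 20), (17, 10), (24, 21), (7, 0)]
    for square, share in sorted(square_distribution(made_up_scores).items()):
        print(square, f"{share:.0%}")
```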

Go bears.

Cheers.

What time does the Super Bowl start?

6:30pm Eastern Time

Statsinthewild Official Super Bowl Prediction

Prediction: 49ers, 26-25

Spread: 49ers +1

OU: Under 52.5

Just a regular NFL playoff game…..

This game is bonkers…………

[Image: Chiefs vs. Texans]

NFL Playoff Predictions – Divisional Round

Divisional Round

Ravens (51.84%) over Titans, 23-22

Chiefs (75.8%) over Texans, 33-18

Packers (52.41%) over Seahawks, 26-24

49ers (60.57%) over Eagles, 26-21

The tentative syllabus for my “radical” redesign of Intro Stat

Here is my tentative syllabus for my radical redesign.  The structure of the course follows roughly the 9 goals put forth in the GAISE report.  Please comment.

Oh also this: I’m getting rid of slides.  I’ll have a marker for board work and a computer to do the analysis and simulations.  But no slides!

  • Week 1-1: Intro class.  Go over syllabus.  Discuss the 9 goals put forth in the GAISE report.  Talk about ethics (IRB, informed consent, etc.)
  • Week 1-2: Software: Introduction to R. Syntax. Getting data in/out of R. Basic structures (e.g. data.frames, matrices, vectors). Reproducible documents (e.g. R Markdown)

 

  • Week 2-1: Critical consumers: Assign students to read this paper over the weekend.  Spend a full day of class discussing pros and cons.
  • Week 2-2: Collecting data activity.  I am going to make rectangular cards whose length, width, area, labels, and colors have statistical properties that I design.  I’m going to hand them to the class and make them decide what questions we should ask and what we should measure.  We will come back to this data many times throughout the semester.

 

  • Week 3-1: Graphical Displays and Numerical Summaries:
    • Types of data
      • continuous
      • categorical
      • time-to-event data
    • Univariate summaries for continuous data:
      • mean
      • median
      • variance
      • IQR
      • percentiles
    • Tables for categorical data
    • Univariate dataviz
      • histograms
      • boxplots
      • barplots
      • violin plots
      • maps!
  • Week 3-2: Graphical Displays and Numerical Summaries:
    • Bivariate summaries for continuous data
      • correlation
        • pearson
        • spearman
        • kendall contingency
      • simple linear regression
      • two-way tables
        • odds
        • odds ratio
    • Bivariate dataviz
      • scatter plots
      • mosaic plots
      • stacked bar plots
      • side by side boxplots
      • side by side histograms

(Example data: Hospital General Information.csv https://data.medicare.gov/data/hospital-compare)

 

  • Week 4-1: Variability:
    • Intro to probability
    • Describing Distributions (shape, center, variability, outliers)
      • Expectation and Variance
  • Week 4-2: Variability
    • Bayes Theorem
    • Specific Distributions
      • normal
      • binomial

 

  • Week 5-1: Variability
    • Sampling Distributions
      • Lots of simulations!
      • Emphasize the difference between data distribution and sampling distribution
    • Bootstrapping
  • Week 5-2: Variability
    • Central limit theorem (CLT)
      • Lots of simulations

 

  • Week 6-1: Randomness
    • Sampling
      • Discuss famous cases where sampling was poorly done (e.g. Dewey defeats Truman)
      • Talk about the Census!
      • Selection bias
      • Discuss sampling strategies (probability vs non-probability sampling)
        • SRS
        • Stratified
        • Cluster
      • Discuss population vs sample
  • Week 6-2: Statistical Models:
    • Simpson’s paradox
    • Very simple models (e.g. X ~ N(mu, sigma))
    • Simple Linear Regression (no inference…..yet)

 

  • Week 7-1: Exam 1
  • Week 7-2: Statistical Inference
    • What is statistical inference?
    • Ideas of point and interval estimation
      • Explain correct interpretation of confidence intervals!
    • Idea of hypothesis testing
      • Type I and Type II errors
    • Multiple testing problems (FWER and FDR)

 

  • Week 8-1: Statistical Inference
    • Hypothesis testing of one mean.
      • parametric tests (Z and t-test)
      • non-parametric test (sign test, permutation test)
  • Week 8-2: Statistical Inference
    • Interval estimation of one mean
      • parametric (Z and t-interval)
      • non-parametric (bootstrap intervals)

 

  • Week 9-1: Statistical Inference
    • Two dependent samples hypothesis testing
      • parametric (Z and t-test)
      • non-parametric (Wilcoxon signed rank test, permutation test)
    • Interval estimation
      • parametric (Z and t-intervals)
      • non-parametric (bootstrap intervals)
  • Week 9-2: Statistical Inference
    • Two independent samples hypothesis testing
      • parametric (Z and t-test, Welch’s test, pooled variance)
      • non-parametric (Wilcoxon Rank Sum/Mann Whitney U, permutation test)
    • Interval Estimation
      • parametric (Z and t-intervals)
      • non-parametric (bootstrap intervals)

 

  • Week 10-1: Statistical Inference
    • Simple Linear Regression
      • parametric (t-tests)
  • Week 10-2: Statistical Inference
    • k-sample problems
      • parametric (ANOVA) (It’s just regression with categorical predictors!!!!!)
      • non-parametric (Kruskal-Wallis)

 

  • Week 11-1: Statistical Inference/Statistical Models
    • Multiple Regression
  • Week 11-2: Statistical Inference/Statistical Models
    • Multiple Regression

 

  • Week 12-1: Statistical Inference
    • Categorical Data
      • Inference for proportions
        • parametric (using CLT)
        • non-parametric (permutation test)
      • Chi-square tests
        • parametric (using CLT)
        • non-parametric (permutation test)
  • Week 12-2: Statistical Inference/Statistical Models
    • Simple Logistic Regression

 

  • Week 13-1: Statistical Models
    • Survival Analysis
      • Motivate with example why we can’t just use mortality rates (in 100 years everyone is dead!)
      • Censoring
      • Truncation
      • K-M Curves (comparing two K-M curves)
  • Week 13-2: Statistical Inference
    • Intro to Missing Data
      • Examples where ignoring missing data is bad
      • Why is the data missing?
      • Missingness mechanisms
      • Really simple multiple imputation?

 

  • Week 14-1: Statistical Inference
    • Introduction to Bayesian statistics
      • Motivate Why?
      • Define prior, likelihood, posterior
      • Estimating a proportion example
      • Credible Intervals
  • Week 14-2: Statistical Inference
    • Introduction to Bayesian statistics (continued)
      • Bayesian Hypothesis testing
      • Bayes Factor

 

  • Week 15-1: Case study
    • Case study from start to finish.
    • We are going to start with this data set and analyze it from start to finish.
    • We are going to do it “Data Fest Style”: There is no specific question.  We are just looking for interesting stories to tell from the data.
  • Week 15-2: Case Study
    • Case study continued

 

  • Week 16: Final Exam
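The course itself uses R. Purely as an illustration of the kind of simulations planned for Weeks 5 and 8, here is a minimal Python sketch of (1) simulating the sampling distribution of a mean from a skewed population and (2) a percentile bootstrap interval for a mean. The exponential population, the sample size of 30, and the number of resamples are arbitrary choices for the example.

```python
import random
import statistics


def simulate_sample_means(draw, sample_size, n_samples=5000, seed=0):
    """Sampling distribution of the mean: draw many samples, keep each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


def bootstrap_percentile_interval(data, level=0.95, n_boot=5000, seed=0):
    """Percentile bootstrap interval for the mean: resample with replacement,
    then take the middle `level` share of the resampled means."""
    rng = random.Random(seed)
    n = len(data)
    boot_means = sorted(statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot))
    lower = boot_means[int((1 - level) / 2 * n_boot)]
    upper = boot_means[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Data distribution: exponential with mean 1 (very skewed).
    # Sampling distribution of the mean for n = 30: roughly normal, sd near 1 / sqrt(30).
    means = simulate_sample_means(lambda rng: rng.expovariate(1.0), sample_size=30)
    print(statistics.fmean(means), statistics.stdev(means))
    print(bootstrap_percentile_interval(list(range(1, 101))))
```

Plotting a histogram of the raw exponential draws next to a histogram of `means` is one way to show the difference between the data distribution and the sampling distribution.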

 

 
