Author Archives: statsinthewild
On March 26 Slate posted an article by Tim Requarth entitled “Please, Let’s Stop the Epidemic of Armchair Epidemiology” (which, I should mention, convinced me to stop posting my COVID-19 analysis on this very blog). One of the posts mentioned in the Slate piece was this piece by Abe Stanway on March 14 called “Real Time COVID-19 Tracking”. There has since been some back and forth between Requarth and Stanway on Twitter, and I decided to take a look at Stanway’s blog post for myself. The post has been reproduced below with my comments in bold. (tl;dr: This isn’t a very good article. It’s extremely ad hoc and riddled with statistical shortcomings.)
Note: Two things: 1) I don’t want people to stop blogging. I’m not trying to “dunk” on anyone here. (I save that for Clay Travis). Everyone should write more, including Abe Stanway (who seems like a really, really smart guy)! But this topic is serious, and there is so much misinformation out there. Please, please, please, get your information from the experts. I know it may seem like everyone out there can do this stuff, but it’s really complicated. So please, leave it to the experts.
2) Again, the title “data scientist” rears its head. The author of the article calls himself a data scientist, but he seems to be lacking some very basic statistical skills (based only on what I’m seeing in this article, so I could be completely wrong). I’m sure Abe Stanway is very talented (and I mean this; it’s straight up really impressive), but I think what I would define as a data scientist and how he interprets the title are very different. But I suppose that is a discussion for a different blog post.
Real Time COVID-19 Tracking
EpiQuery is a realtime “influenza-like illness” (ILI) tracker. It’s updated on a daily basis. The system was set up in 2016 to track emergency room visits with chief complaints that mention flu, fever, and sore throat. These are not confirmed nor denied to be influenza, nor any other disease. Meanwhile, the US government has severely bungled the COVID-19 test rollout, and we’ve only tested ~15k people total so far. In the absence of widespread testing, we need to rely on EpiQuery (and ILINet, the federal CDC version covering all 50 states) to understand the likely growth of the COVID-19 outbreak.
I agree that the US government has bungled the COVID-19 test rollout.
I wouldn’t use the word NEED, as in “need to rely on EpiQuery”. At this point in the article I’m already skeptical. I think the author is already speaking with too much certainty. On March 14, and even now, there is so much we don’t know about this disease. We still seem to be learning about new symptoms.
I’m going to nitpick here a bit, but I wouldn’t look at the day with the single highest number of cases and call that the peak of flu season. That’s likely noise. It would be better to smooth the data, possibly using a moving average, and then look for the peak of the smoothed series.
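A minimal sketch of that idea in Python, assuming the data is a plain list of daily counts. The function names, the window sizes, and the toy numbers are illustrative assumptions, not anything taken from EpiQuery.

```python
def moving_average(values, window=7):
    """Trailing moving average; the first window - 1 days have no value."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(values))]


def smoothed_peak_index(values, window=7):
    """Index, in the original series, where the moving average is largest."""
    smoothed = moving_average(values, window)
    best = max(range(len(smoothed)), key=smoothed.__getitem__)
    return best + window - 1


# Hypothetical daily counts: one noisy spike on day 2, a sustained rise later.
daily = [10, 10, 40, 10, 10, 20, 25, 30, 32, 24, 20, 15]
print(daily.index(max(daily)))        # raw single-day maximum: day 2
print(smoothed_peak_index(daily, 3))  # peak of the 3-day average: day 8
```

The single-day maximum lands on the isolated spike, while the smoothed peak lands in the middle of the sustained rise. With real daily ER data, a 7-day window would also smooth over any day-of-week pattern in visits.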
The chart above shows daily ILI ER visits in NYC. The seasonal peaks are highlighted in pink. We see a very seasonal pattern in the data — every year, there’s generally one major peak in December or January, followed by a gradual decrease in ILI cases. This is the annual flu season, visualized.
This is the same data, but zoomed in on 2020. We see our normal seasonal peak on January 29th, and then we see a marked anomaly starting at around March 1st. The anomaly displays a peak of equal magnitude to the regular seasonal peak.
There does appear to be something going on here, but what I would like to see is some analysis showing that this is a real signal and not just noise in the data. It’s not enough to say it looks like something is happening.
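As one crude illustration of what such a check might look like (not a formal surveillance method), you could ask how many standard deviations this year’s count sits above the counts for the same calendar week in earlier seasons. The baseline numbers below are hypothetical.

```python
from statistics import mean, stdev


def z_score(observed, baseline):
    """Standard deviations by which `observed` exceeds the mean of `baseline`."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline values")
    return (observed - mean(baseline)) / stdev(baseline)


# Hypothetical ILI visit counts for the same calendar week in four prior seasons.
baseline = [300, 320, 280, 310]
print(round(z_score(420, baseline), 1))
```

The catch is that with only four baseline seasons the standard deviation is itself very poorly estimated, so a large z-score here is much weaker evidence than it looks. A real analysis would also need to model the seasonal trend rather than compare one week in isolation.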
A double peak flu season appears to be exceedingly unlikely, as it has never occurred in any historical flu season since the start of this data (at least in NYC), nor has it ever occurred with a slope of this magnitude. Therefore, I believe a large percentage of this peak indicates COVID-19 ER visits in NYC, and not nominal flu visits.
This is where this all starts to fall apart. He uses the term “exceedingly unlikely” to describe the probability that this is just a normal flu. What is that based on? 4 flu seasons in NYC. That first sentence starts out making a really strong claim and it just gets weaker and weaker……”it has never occurred in any historical flu season”…….WOW that seems convincing………..”since the start of this data”………….Oh, well that’s only 4 years……….”(at least in NYC)”…………….and it’s only 4 years of one city. So to say that this is “exceedingly unlikely” to be just a normal flu isn’t supported by anything. We have no idea what other flu seasons have looked like based on this data.
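To put a number on how little four seasons can tell us: if a double peak was seen in zero of n seasons, an exact one-sided 95% upper bound on its per-season probability comes from solving (1 - p)^n = 0.05. This sketch treats seasons as independent, which is itself an assumption. (The familiar “rule of three”, 3/n, is only a good approximation for large n.)

```python
def upper_bound_zero_events(n_trials, alpha=0.05):
    """One-sided (1 - alpha) upper confidence bound on a per-trial probability
    when 0 events were seen in n_trials: solves (1 - p) ** n = alpha."""
    return 1 - alpha ** (1 / n_trials)


print(f"{upper_bound_zero_events(4):.2f}")   # 4 seasons, no double peak: 0.53
print(f"{upper_bound_zero_events(40):.2f}")  # 40 seasons, no double peak: 0.07
```

In other words, four seasons without a double peak are consistent with double peaks happening in roughly half of all seasons. That is a long way from “exceedingly unlikely”.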
Also, are double peaked flu seasons rare? Nope. According to William Schaffner, MD, an infectious disease specialist at Vanderbilt University in Nashville:
We may well have, for the second year in a row — unprecedented — a double-barreled influenza season
The 2018-2019 season has been unusual, though, because the flu came in two waves: one that peaked at the end of December, and a second that peaked in early March. The two peaks were caused by two different strains of the flu virus, and the protection given by vaccination early in the season may have waned by the time the second strain appeared.
So literally, just last flu season, we saw a double-peaked season. So, again, describing the possibility that this peak is just regular flu as “exceedingly unlikely” is not supported by any data.
As a side note, I wonder what other diseases are out there that cause influenza like illness (ILI) symptoms besides actual flu and Covid-19? I have no idea.
Another note: this peak could very well be COVID-19. But there are other possible explanations for this peak that the author has basically ignored. The most notable is regular flu: he claims it can’t possibly be regular flu when, literally last year, there was a double-peaked flu season whose second peak was in early March. So I’m not convinced that this is actually a COVID-19 spike, but it could be.
The fact that the peak starts at around March 1st, and the fact that this was also the date of the first confirmed case of COVID-19 in NYC, lends further evidence to support that this spike represents COVID-19 cases.
Yeah, but it could just be regular flu. The author is speaking with too much certainty.
The data above represents daily ER visits. This means that since March 1st, there have been 8,000 cases of ILI-based ER visits in NYC. Subtracting the nominal flu season data (~3,800 cases over this period, assuming a late season R0 of .95), that means there are likely a minimum of 4,200 COVID-19 cases in NYC as of March 12th.
Ok, this is as armchair-y as it gets. Where does the 3,800 number come from? Where does R0 = 0.95 come from? Is that a widely accepted value? No work is shown for how the author arrives at 4,200 COVID-19 cases. The assumptions the author is making are numerous and substantial. For starters, he is assuming that every single one of these “excess” cases is COVID-19 (again ignoring the fact that there was a double-peaked flu season just last year). Perhaps the most head-scratching part of this whole thing comes in the next graph.
This analysis should be considered a napkin sketch — a more detailed study could estimate the precise start date in NYC, knowing the R0 of COVID-19 is estimated to be 2.2¹ and working backwards to infer when Patient 0 actually arrived based on this parameter and the current curve in red.
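The post’s arithmetic is simple enough to write down, which makes it easy to see that the answer is only as good as the unexplained baseline. The 8,000 and 3,800 figures are the post’s; the other two baselines are hypothetical alternatives. Note that even this treats every excess visit as a COVID-19 case.

```python
def excess_visits(observed_total, assumed_baseline):
    """Visits above an assumed 'normal flu' baseline over the same period."""
    return observed_total - assumed_baseline


observed = 8_000  # ILI ER visits since March 1, as quoted in the post
for baseline in (3_800, 5_000, 6_500):  # 3,800 is the post's; others hypothetical
    print(baseline, excess_visits(observed, baseline))
# 3800 4200
# 5000 3000
# 6500 1500
```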
He does claim that this should be considered a “napkin sketch”. So he is giving caveats. But then why even do this? Why spend the time on this? Is it worth it to spread “napkin sketch” information about a pandemic? I lean towards no.
He does give a citation for 2.2 R0 for COVID-19, which I appreciate.
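For what it’s worth, here is roughly what the proposed back-calculation would involve. R0 on its own doesn’t give a daily growth rate; you also need the generation interval (the time from one infection to the next). Under the simplifying assumption of a fixed generation interval T, R0 = exp(r·T), so r = ln(R0)/T. The 5- and 7-day intervals are hypothetical inputs, and starting from the post’s 4,200 figure and a single initial case with constant growth and no importations is purely for illustration.

```python
import math


def growth_rate_from_r0(r0, generation_interval_days):
    """Daily growth rate r, assuming a fixed generation interval: R0 = exp(r * T)."""
    return math.log(r0) / generation_interval_days


def days_since_first_case(current_cases, daily_growth_rate):
    """Days to grow from 1 case to current_cases at a constant growth rate."""
    return math.log(current_cases) / daily_growth_rate


for interval in (5, 7):  # hypothetical generation intervals, in days
    r = growth_rate_from_r0(2.2, interval)
    print(interval, round(days_since_first_case(4_200, r)))
```

The two hypothetical intervals put “Patient 0” roughly three weeks apart, which is the point: the inferred start date depends heavily on inputs the post never states.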
The true number of COVID-19 cases in NYC is likely several times higher (given the fact that not all cases present to an ER, and ER cases that are not admitted are sent home without any proper quarantine protocols — aka they are sending people home in Ubers or subways), but I will refrain from speculating on an exact number until I find more data. However, assuming the exponential curve holds, the current case count as of March 14th is around 6,300. Despite the napkin math, this data indicates that NYC is currently adding around 1,000 ER admissions of COVID-19 per day and growing fast.
I think this is a loose usage of the term “exponential curve”. He doesn’t check at all if the curve is actually exponential. This has a formal mathematical meaning, and the informal use of the term “exponential” often means it’s just growing “really fast”.
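One simple (and by no means sufficient) check: exponential growth is a straight line on a log scale, so you can fit log(count) against day and see how well a line does. A minimal sketch with made-up series; the function name is hypothetical.

```python
import math


def fit_log_linear(counts):
    """Least-squares fit of log(count) = a + b * day.

    Returns (a, b, r_squared). Under exponential growth, b is the daily
    growth rate on the log scale and the fit should be close to perfect.
    """
    n = len(counts)
    xs = range(n)
    ys = [math.log(c) for c in counts]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot


exponential = [100 * 1.2 ** d for d in range(10)]  # grows 20% per day
linear = [1_000 + 1_000 * d for d in range(10)]     # adds 1,000 per day
print(round(fit_log_linear(exponential)[2], 3))
print(round(fit_log_linear(linear)[2], 3))
```

R² alone is a weak check on a short window. Looking at the residuals, or at whether the fitted daily growth rate drifts as you add days, tells you more.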
Also, “adding 1,000 ER admissions per day” wouldn’t be exponential growth. That’s linear growth.
Finally, where does 6,300 come from? Is this an extrapolation? I don’t follow where that number comes from.
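The distinction is easy to see in a toy example: linear growth adds a constant amount each day, while exponential growth multiplies by a constant factor each day. The series below are made up.

```python
def differences(series):
    """Day-over-day change in absolute terms."""
    return [b - a for a, b in zip(series, series[1:])]


def ratios(series):
    """Day-over-day change in relative terms."""
    return [b / a for a, b in zip(series, series[1:])]


linear = [1_000, 2_000, 3_000, 4_000, 5_000]       # adds 1,000 per day
exponential = [1_000, 2_000, 4_000, 8_000, 16_000]  # doubles every day
print(differences(linear))  # [1000, 1000, 1000, 1000]
print(ratios(linear))       # [2.0, 1.5, 1.333..., 1.25]: relative growth keeps falling
print(ratios(exponential))  # [2.0, 2.0, 2.0, 2.0]
```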
BONUS (or, this is where it gets weird):
Below is a breakdown of the cases by neighborhood.
The epicenter appears to be somewhere in Queens.
This is a neighborhood called Corona. You just can’t make this shit up. Edit: yes, I’m fully aware Corona is *always* the highest density of ILI symptoms. This is likely due to the concentration of hospitals in the area. Regardless, this is a joke, and if you take it seriously, you should get out of the house more (once your isolation period is over, of course!)
If this is a joke, it’s hard to tell. And it’s an odd place to put a joke. The entire article is serious in tone, and then the author just throws this in at the end and claims that it’s a joke? I don’t know, man. In a different context, maybe it would be more obvious that this is a joke. But it’s not obvious to me.
Also, looking at raw cases means pretty much nothing. Gotta account for baseline population levels.
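A toy illustration of why raw counts mislead on a map. The neighborhood labels and numbers are hypothetical. For ER-based data even a per-capita rate has a catch: as the post’s own edit concedes, visits partly track where the hospitals are, not just where people live.

```python
def rate_per_100k(cases, population):
    """Cases per 100,000 residents."""
    return cases * 100_000 / population


# Hypothetical neighborhoods: (cases, population). Not real NYC figures.
neighborhoods = {"A": (300, 150_000), "B": (120, 40_000)}

by_count = max(neighborhoods, key=lambda k: neighborhoods[k][0])
by_rate = max(neighborhoods, key=lambda k: rate_per_100k(*neighborhoods[k]))
print(by_count, by_rate)  # A B
```

Neighborhood A has more cases, but B has the higher rate (300 vs 200 per 100,000), so the “epicenter” flips once population is accounted for.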
All data are available for analysis here. Additional data nationwide can be found on ILINet FluView. Thank you to Ben Hunt for discovering this trove of data, and to Dr. Alfred Illoreta of Mount Sinai and Dr. Ydo Wexler of Amperon for reviewing drafts of this post.
I recently wrote a blog post explaining just how full of shit Clay Travis is about coronavirus. On March 18, he wrote an article (which I won’t link to here because I’m not giving him clicks) where he said:
Loss of life will be in the thousands, at most, and not the tens of thousands or the hundreds of thousands or the millions as the most terrifying of these forecasts have suggested.
As of right now, according to Johns Hopkins, there have been 9,619 deaths related to coronavirus, with FOUR straight days of 1,000+ deaths. So we should hit 10,000 today, which would prove Clay Travis (who, let me remind you, is wildly unqualified to make projections like this) unequivocally wrong. As a result, he apologized to his readers for misleading them and said he would do better in the future.
JUST KIDDING! Of course, he didn’t do that. He tweeted this today:
Clay. Clay. Let me be very clear about this. I’ll speak slowly because you don’t seem very bright. The millions of deaths models were based on the assumption that we did NOTHING to stop the spread of the disease. But we did. We’ve been socially distancing, against your advice, mind you, for a few weeks here in Chicago. And a bunch of other places have been doing it too. Most of America in fact.
And remember, the current projection for number of deaths is between 100K and 240K…..IF we do everything nearly perfectly.
So. Clay. I know you truly believe that you are right. But you aren’t. You’re intentionally being a moron. Please stop.
He followed that tweet up with this gem:
Clay Travis should be the first one to get one, because the reported deaths are still growing exponentially (It’s possible that the growth rate of reported deaths is slowing down ever so slightly, which is good but definitely not a trend yet). And that’s REPORTED deaths, which are almost certainly an undercount of the total deaths from Covid-19. Just take a look at the reported deaths graph below on the log scale. We are going to go crashing through that limit, literally TODAY, where Dr. Travis, M.D. said we’d never get to:
The moral of the story, as always, is that Clay Travis is full of shit. He doesn’t know what he is talking about in regards to coronavirus. He also doesn’t know what he’s talking about when it comes to football either. But in that case it doesn’t matter. It really matters in this case. So stay safe and stay at home. And stop listening to Clay Travis.
Personally, I think Clay Travis is kind of a dipshit. But he’s not really causing any harm by being a dipshit sports writer. No matter what his takes are on SEC football, they aren’t life or death, and he’s easy enough to ignore (which I usually do). But recently he wrote a piece downplaying the risks of coronavirus that went beyond just being a regular dipshit sports writer, and crossed into “guy who is totally full of shit writes about something he knows nothing about and puts people’s lives at risk” territory. And since I have a deep personal policy that when people are full of shit, they should be called out for being full of shit, I have decided to write this post. And Clay Travis is full of shit.
This kind of writing is really, actually dangerous. First of all, he has a shit ton of followers (650,000+ on Twitter) who tend to skew younger demographically, and at least some of them will take what he is saying seriously. What I’m saying is that this article almost definitely led some people to take too casual a stance on the risks of coronavirus. I mean, if I were a junior at Mississippi State and I had read this piece a week ago, I definitely would have gone on spring break. (Even though every epidemiologist in the entire country would have told me not to go.)
Coronavirus is a really dangerous virus, and it’s definitely not “just like the flu”. In the interest of full disclosure, I was a “this is just a flu” guy in early February. I even tweeted out “Thousands of people die of the seasonal flu every year, but let’s panic about coronavirus”. But it turns out I was wrong. This isn’t the seasonal flu. Dr. Fauci, someone with actual expertise in this, unlike Clay Travis, predicts at this point that there will be millions of infections and 100,000 – 200,000 deaths from coronavirus. And worst-case scenarios with no interventions, again from actual experts, were up to 2.2 million American deaths and 500K in England. So I was wrong on Feb 1. It turns out this shit is serious. (It’s ok to be wrong and change your mind when you get new information!) New York City is breaking records for 911 calls. One hospital in New York is describing what’s happening there as “It’s what was happening in Italy”. But you wouldn’t know any of this by reading Clay Travis, who boldly proclaimed on March 18, less than two weeks ago, that:
This is completely untrue. It was a stupid thing to say then, and it has proven to be very, very wrong since. As such, I’ve decided to go through the article and give my comments on it. (My comments are in bold. Clay’s original article is in italics.)
I know there is coronavirus doom and gloom everywhere you look on social media right now — in particular with viral predictions of millions of coronavirus deaths on the horizon — but I believe most people are missing the key detail in this outbreak: the number of daily new infections.
Don’t focus on the raw numbers of infection or their growth or death rates — the kind of fear porn you see peddled far and wide on social media — just look at the rates of daily new infection that have occurred in coronavirus outbreaks around the globe and you can divine, to a great extent, what the future is likely to hold.
I think the growth rates and the death rates are EXACTLY what we should be focusing on. The US is currently on pace to have the number of deaths double every 3 days. This is above both Iran and China at this point in their outbreaks. We are still below Italy, but we are on pace to pass Italy in terms of cumulative deaths in under 5 days.
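Doubling time is easy to compute directly. Assuming constant exponential growth between two dates t days apart, the doubling time is t · ln 2 / ln(N_t / N_0). A sketch with hypothetical counts, not the actual US figures:

```python
import math


def doubling_time(count_start, count_end, days):
    """Days per doubling, assuming constant exponential growth between two counts."""
    if count_start <= 0 or count_end <= count_start:
        raise ValueError("need positive, increasing counts")
    return days * math.log(2) / math.log(count_end / count_start)


def project(count_now, doubling_days, days_ahead):
    """Count after days_ahead if the doubling time stays fixed."""
    return count_now * 2 ** (days_ahead / doubling_days)


# Hypothetical cumulative deaths: 100 rising to 800 over 9 days.
print(round(doubling_time(100, 800, 9), 2))  # 3.0
print(project(1_000, 3, 15))                 # 32000.0: five doublings in 15 days
```

A three-day doubling time means a 32-fold increase in about two weeks if nothing changes, which is exactly why the growth rate is the thing to watch.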
As a preliminary, yes, I am a lawyer who now makes a living writing and talking primarily about sports. No, I am not an epidemiologist and I’m not a doctor either. So if you don’t want to read any further or want to denigrate my opinions for that reason, you’re certainly entitled to that perspective.
When the coronavirus outbreak started, I was messing around with the data and making a few posts about SIR models. But in retrospect I don’t think I should have done this, because I’m not an epidemiologist or a medical doctor (but I am a doctor!). Still, I’m way closer to someone who should be commenting on this than a former lawyer who is a below-average sports writer.
That being said, I’m not denigrating Travis’ opinions because he isn’t an epidemiologist nor a medical doctor; I’m denigrating them because they are shitty opinions that aren’t based on reality or data or facts.
But for the rest of you, let me cite my primary source here at the beginning, I am using this website which has been fantastic at providing factually accurate and up to the date infections around the world.
This site he cites seems to be pretty good.
For those of you who have watched, read or listened to my opinions over the years you know that I love to devour data and I’m not afraid to have opinions that run contrary to the masses. Sometimes those opinions focus on sports, other times they deal with politics, media, or business. When I become fascinated by a subject, I can’t stop thinking about it, I want to consume all the information I can about a particular subject.
When those opinions are about sports, politics, the media, or business, it doesn’t much matter how bad they are (and believe me, they are bad opinions. So bad.). But opinions about a global pandemic can cause actual harm. For some reason, Mr. Travis has 650,000+ Twitter followers. That’s enough readers to do some real damage.
And for the past month I have been absolutely fascinated by the coronavirus outbreak around the world. I’ve spent hundreds, if not thousands, of hours reading about this virus and studying the public data arising from outbreaks in countries all over the world.
Many nights in the past several weeks, especially since sports has been shut down, I have slept three or four hours because I haven’t been able to stop devouring information before waking up at 4:30 in the morning to do my national sports talk radio show. And for those of you who have listened, watched or read what I’ve been saying you know that I’ve been more optimistic about America’s ability to withstand this viral outbreak than most people have.
Optimism is a great thing. I just don’t know how anyone can look at the data that we are seeing now, or on March 18th when he wrote this, and be that optimistic.
As a result I’ve been denigrated online by many media members with blue checkmarks by their names on Twitter for not being enough of an alarmist, for downplaying the threat of the coronavirus to our nation’s health. That hasn’t been my intent at all, my intent, as it always is, has simply been to share factual data with my audience. As I always say, you can disagree with the opinions I come to based on the underlying facts, but I struggle to ensure I get all my facts right, especially in cases such as these.
But Clay got all the facts wrong here.
While most in the United States have been focusing on the United States outbreak I’ve paid quite a bit of attention to what’s happened in China, South Korea and Japan.
Because all three of these countries had coronavirus outbreaks before ours and all three had essentially ended the viral outbreaks in their respective countries before our outbreak really took root. That is, the factual evidence clearly established that when these countries attacked the spread of the virus, they succeeded.
This is correct. But South Korea had much higher rates of testing from the very beginning, and it recognized the real threat much earlier and took it more seriously than the United States did. What South Korea did to prevent the spread of the disease, the United States has not done.
Interestingly, all three countries took different paths to end that outbreak — China, with the worst outbreak in the world to this point, was more draconian in its restrictions, while South Korea and Japan handled their outbreaks with much less significant disruptions to their economies and daily life.
But again, South Korea had robust testing in place whereas “The blundering lack of an effective testing program in the US is an unconscionable failure and has led (and will lead) to more transmission of COVID-19.” Comparing the United States to South Korea (or Japan or China) is not a very reasonable comparison.
Certainly in the months and years ahead it will make a great deal of sense to study every country’s response in an effort to find out the best possible model to adopt for future pandemics, but as I write this China, South Korea, and Japan have a total of 126 new infections today.
This means all three countries have effectively ended the viral outbreaks they were combating. (It’s certainly possible the virus could re-emerge in the future, but right now it seems to be beaten).
He is right that all of those countries have done a very good job “flattening the curve”.
Now let’s pivot to our country.
Yesterday the United States saw our highest number of new infections to date, adding 1748 new cases. (It’s important to note that daily “new” cases doesn’t mean new in the context of they just happened. “New cases” are the result of infections that generally occurred five or more days in the past.) As I write this we are on track to exceed yesterday’s number of daily new infections again today and probably for the next several days as well. (If you watch the White House press briefings and listen to Dr. Birx –who has been PHENOMENAL in speaking to the media, alongside of Dr. Fauci — she forecast this occurrence several days ago. Letting astute listeners know that as the testing ramped up in a big way in this country the number of infections would surge as well.)
While Dr. Birx has received some praise, she has also been substantially criticized. Harvard epidemiologist Marc Lipsitch, for instance, described Birx’s takes like this:
Lipsitch called Birx’s characterization of the current coronavirus situation in the U.S. “rosy” and even “deceptive.”
But, and this is key, despite the number of new daily infections that doesn’t mean the virus is continuing to spread at this rapid of a rate, it means that we are catching up to the infections that have already occurred in the days and weeks prior to our aggressive action. That is, since the average incubation for this virus is five days, pretty much everyone testing positive today got the virus before the social distancing and quarantines began in substantial numbers in this country.
He’s not entirely wrong here. It was possible that the rate would slow. But given that there are still huge areas of the country that weren’t doing social distancing then (and some areas STILL aren’t doing it!), there was no reason to believe that the rates would go down.
Right now the fear porn purveyors — all too many of whom are in the media — are spinning these infection numbers as hard as they can to terrify people and convince them we are about to descend into abject despair. That our efforts to combat the virus are failing. But the number of daily new infections is a rearguard action, it’s measuring where we were several days ago in the fight against the virus, not where we are today.
He really needs to cite specific examples here of where the media is “spinning these infection numbers”. I’d love to see what he was talking about. “Our efforts to combat the virus” do seem to be too little, too late in a bunch of places. A lot of states didn’t put in stay-at-home orders until after Travis wrote this article, and some states STILL haven’t issued them (e.g., Florida). So I’m not sure why anyone thought that the rate of infections would slow.
And I think that means most of the people panicking in this country right now are missing the signal amid the noise.
Peak new daily infection rates in this country are closer than most think and, and this is significant, hitting a daily new infection peak is a very good thing because it signals we are moving to the backside of our outbreak.
This claim isn’t supported by anything available right now, let alone on March 18th when this was written. I have no idea where he got this idea from.
And I think we’re going to hit peak new daily infections very soon.
So at the time we had had two days in a row (March 16 and 17) with 22 and 23 deaths, respectively. On March 18, there were only 10 deaths. But the overall trend in deaths and confirmed cases was growing right on track with exponential growth and the very next day, March 19, there were 82 deaths recorded.
Travis has since been proved monumentally wrong, with each of the last 9 days having more deaths than the previous one including March 28 when there were 445 deaths, the most in a day yet. (Source: Johns Hopkins)
Well, the data in other countries suggests that our peak daily infection rate is likely to come next week.
Again, it makes no sense to compare the United States to countries that had testing in place very early and took the threat seriously from the beginning. On March 18, we were already on a different trajectory than South Korea and Japan. On March 18, our exponential adventure was just beginning; Japan and South Korea had already flattened. Here is what those countries looked like on March 18. So it’s really hard to defend these statements given the information he had at the time.
Since then, he’s been even wrongerer:
How can I project this?
By looking at what already happened in China, Japan and South Korea.
I mean, again, this makes no sense. Notably South Korea had testing WAY, WAY, WAY ahead of the US. South Korea was doing proportionally about 6X the amount of testing the US was by the middle of March. And in February the US was basically doing no testing:
As I said earlier in this article, those countries have a total of 126 combined new infections today.
That’s because once their infection rate began to decline it do incredibly rapidly, the inverse of the rapid rise.
In other words this virus ramps up rapidly, but it also declines rapidly.
There were several counter-examples to this even at the time including Iran and Italy.
Okay, some of you are saying, but those Asian countries did X, Y, and Z and we aren’t doing X, Y, and Z!
Yup. So why bother even saying it!
Well, let’s leave Asia behind and move to Europe, in particular to Italy, which has been the fear porn proxy for most American social media users. “ITALY, ITALY, ITALY THEY WAIL! Look at the curve in Italy! We match it perfectly we are never going to be able to survive!”
Yes, Italy has suffered a substantial loss of life — 2503 as I write this afternoon — but as a result of this loss of life Italy has undertaken drastic measures to fight the virus. And, significantly, these drastic measures work and they appear to work rapidly against the virus.
Italy recorded its 10,000th death on March 28. They have begun to flatten their curve. The United States is about 8 or 9 days behind Italy, and it’s true we haven’t seen the same rate of growth in deaths as Italy, but we also haven’t flattened our curve at all. Additionally, we are on pace to pass Italy in deaths in around 4 days.
On March 14th Italy hit 3,497 daily new infections. On March 15th Italy hit 3,590 new infections, the viral peak for daily new infections so far in their country. Then came 3,233 new infections on March 16th and 3,526 on March 17th. Now it’s still possible, of course, that the number of new daily infections could pop above 3,590, the present high set on March 15th — update they did on 3/18 after I clicked publish — but even with a still vacillating total infection number it seems pretty clear that at a minimum Italy has hit an infection plateau. (The number of daily deaths also peaked in Italy on March 15th at 368 — a new peak death rate came on March 18th after publication — and has declined since then.)
Does it seem “pretty clear that at a minimum Italy has hit an infection plateau” based on the plot below? I have no idea how anyone could make that statement looking at the plot below.
And in fact, Travis was wrong, again. Daily deaths in Italy have continued to grow including 919 and 889 on March 27 and 28, respectively. Not exactly a plateau.
Far from being an example of exponential growth run wild, Italy stopped the coronavirus’s growth of daily new infections in its country in the space of a week. (This is why so many of these viral epidemiologist studies that go viral on social media are worthless. All of them presume nothing changes. If you want a fascinating read about a man who saw all this before the Chinese outbreak ended, go read the opinion of Micheal Levitt, a biophysicist who won the Nobel prize in chemistry.
In the second paragraph of this article, it states “Although his specialty is not in epidemiology”, and the article should have ended there. This exact phenomenon of armchair epidemiology is why I stopped doing any analysis (other than simple graphs) of coronavirus. This guy is clearly really smart. But he’s not an epidemiologist. He’s a guy who knows a lot about chemistry. I’m not saying that this guy couldn’t do epidemiology, but this “analysis” that he did seems relatively ad hoc. There are plenty of better places to get information on this, from actual epidemiologists who have been studying this exact stuff for decades. So I’ll continue to play around with the data privately, but I won’t be posting any analysis because, as was pointed out to me, misinformation and weak analysis can be very dangerous. When someone screws up an analysis of sports data, it doesn’t put anyone in danger.
Here’s the essence of Levitt’s analysis from that article: “The rate of infection of the virus in the Hubei province increased by 30% each day — that is a scary statistic. I am not an influenza expert but I can analyze numbers and that is exponential growth.”
Had the growth continued at that rate, the whole world would have become infected within 90 days. But as Levitt continued to process the numbers, the pattern changed. On February 1, when he first looked at the statistics, Hubei Province had 1,800 new cases a day. By February 6, that number had reached 4,700 new cases a day.
But on February 7, something changed. “The number of new infections started to drop linearly and did not stop,” Levitt said. “A week later, the same happened with the number of the deaths. This dramatic change in the curve marked the median point and enabled better prediction of when the pandemic will end. Based on that, I concluded that the situation in all of China will improve within two weeks. And, indeed, now there are very few new infection cases.”
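The arithmetic in that excerpt is easy to check. A 30% daily increase means cases double roughly every 2.6 days, and the two quoted counts (1,800 new cases a day on February 1 and 4,700 on February 6) imply an average of about 21% growth per day over those five days. The excerpt doesn’t say which window or series the 30% figure refers to. A sketch, assuming constant growth between the two dates:

```python
import math


def implied_daily_growth(count_start, count_end, days):
    """Average daily growth rate implied by two counts, assuming constant growth."""
    return (count_end / count_start) ** (1 / days) - 1


def doubling_time_days(daily_growth):
    """Days per doubling at a constant daily growth rate."""
    return math.log(2) / math.log(1 + daily_growth)


print(f"{doubling_time_days(0.30):.1f}")                # 30% per day: 2.6
print(f"{implied_daily_growth(1_800, 4_700, 5):.0%}")  # Feb 1 -> Feb 6: 21%
```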
Yeah. But again, China basically shut everything down. It took a lot for them to stop the spread. And the US hasn’t done nearly enough.
Back to Italy, the data now reflects that from this point forward their infections will begin to descend, just like happened in China. And if Italy’s infection rate descends like they did in China, Japan and South Korea that will be a rapid descent which will allow a rapid return to normal life.
There was actually no evidence this would happen then, and it definitely didn’t happen.
Now Italy appears to be further along in its viral outbreak than either France, Spain, Germany, or England are, but the lessons of Italy appear to reinforce the lessons of China, Japan and South Korea. Indeed, both France and Spain also appear to be close to hitting their peak numbers of new daily infections also. (By the way, the most interesting data point I have seen is from Germany. Somehow Germany has 11,973 infections and only 28 deaths. That’s a death rate for infected patients of .23%. What are they doing better than everyone else? Because that rate of death is very similar to the flu.)
Once again, Travis was wrong. The black line is where Travis claimed that France and Spain were "close to hitting their peak numbers". Not sure how he reached that conclusion, but he was…….not correct.
It IS interesting that Germany has a lower death rate. But there are a ton of possible reasons for that.
Which is why if you extrapolate the data from China, Japan, South Korea, Italy, France and Spain to the United States then it seems highly likely that our rate of new daily infection will peak sometime late next week.
First off, extrapolation is dangerous in general. Extrapolation by a below average sports writer is really, really dangerous. Second, if he had actually extrapolated using exponential growth, he would have been more or less correct on Spain and France.
From that point forward we will be on the backside of this particular outbreak and our daily infection rate will decline rapidly.
That is, I believe rates of daily new infection in America will peak and then begin to drop precipitously starting at the end of next week. That doesn’t mean that new hotspots might not emerge around the United States or that our fight against the coronavirus is over, but it does means there’s a very good chance the worst of our outbreak will have passed by next week.
We will, in the words of Dr. Fauci, have flattened the curve.
As mentioned before, Dr. Fauci just today estimated there would be 100,000 – 200,000 deaths in the US.
What does that mean for America?
Well, since this is primarily a sports site it means our sports may well return faster than we thought. Already the KBA and CBA, the two pro basketball associations in Korea and China, are preparing to start back up their leagues in those countries.
Sites that are "primarily sports" are probably the least equipped to do armchair epidemiology. I have a Ph.D. in statistics and did a post-doc in a school of public health, and even I'm refraining from doing any (more) armchair epi. And I'm way closer to qualified to do it.
But, much more significantly, it also means our economy may well bounce back more rapidly than many fear. Hopefully by early to mid-April we can begin to embrace a bit more normalcy in our lives once more.
And in what might be the most lasting legacy of this outbreak, hopefully we will have put in place a series of pandemic systems to allow us to respond more rapidly to more significant and deadly outbreaks that might arise in the future.
The good news is this virus is (probably) not going to kill you and we (likely) only have about a week until we hit the viral peak of new daily infections in America. Loss of life will be in the thousands, at most, and not the tens of thousands or the hundreds of thousands or the millions as the most terrifying of these forecasts have suggested. That doesn’t mean you should stop social distancing and staying at home if you’re ill, by the way, but it does mean that this containment is likely going to work very, very well.
We are on pace to hit 10,000 deaths in 6 days. This was an egregiously incorrect statement to make a week ago, when this article was written. Not a single expert I have read would have agreed with Travis last week, and he's going to be proven wrong in about 6 days (unless there is somehow a miracle).
What should you do if you believe my forecast is accurate? Buy stocks. I believe Wall Street has baked in far worse expectations than what the reality of this coronavirus will represent.
So if you like getting your epidemiology advice from a shitty sports writer, why not get your investment advice there too! The stock market has gone up since this article was written, following the passage of the $2 trillion bailout. It will be really interesting to see what happens to the stock market over the next year. And Travis may well be right that you should be buying stocks. But what's that saying about a stopped clock?
That’s why I spent all morning buying stocks myself. Yes, I’m putting my money where my mouth is.
The stock market could skyrocket in the next 6 months, and Travis would make a bunch of money. In fact, I hope the market goes up and I wish him all the best in his investments. But he'll still be wrong about his coronavirus projections.
And, by the way, if you hate me and you believe I’m totally wrong in my forecast, you’re entitled to that opinion as well. Indeed, if I personally end up dying of the coronavirus, you have my full permission to make as much fun of me as possible on social media.
To make fun of someone who died of coronavirus, you’d have to be an absolutely shitty person. I do very much dislike Clay Travis, but I don’t want him to die and I certainly won’t make fun of him if he does die. That is unthinkably shitty.
“Writer who said no one would die of COVID 19 dies of COVID 19” is a hell of a headline.
I’d click on that link.
And like most of you I’d be too busy to actually read the article, but I would make a witty and sarcastic comment and share it with all my followers, ensuring it was one of the top trending topics for the day. So have at it.)
The reality is, spoiler alert, we’re all going to die.
But the evidence from around the world suggests that for the vast, vast majority of us, it won’t be from this virus.
This I agree with. We are all going to die. And for the vast, vast majority of us, coronavirus won’t kill us. But it’s possible that most of us will personally know someone who dies from corona.
That is all for now. Stay safe. Stay home. Here is what to do if you get sick. And please, please, please, don’t get your pandemic information from dipshit sports writers. (Or dipshit constitutional scholars).
I'm continuing to marvel at just how well an exponential model fits the number of confirmed cases of coronavirus. The plot below shows the real number of confirmed cases (on a log10 scale) for Italy, the US, Spain, and France to the left of the black vertical line. It's nearly perfectly linear! To the right of the vertical black line are the projections should this exponential growth continue. I've also provided some basic output from the models for each country, including estimated regression coefficients, R^2 (i.e. how close to exponential it is), and when each country is expected to hit certain numbers of cases if the current trend continues.
Here is what it looks like on the raw scale:
All models are simple linear regressions with log(confirmed cases) as the response and time since March 1 as the predictor (i.e. March 1 is 1, March 2 is 2, etc.). Projections are based on the assumption that exponential growth continues at current rates (hopefully this turns out to be a bad assumption!).
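For anyone who wants to reproduce this kind of output, here is a minimal sketch in Python of the same idea: ordinary least squares of log(cases) on day number, the daily growth factor exp(slope), R^2, and the first date the fitted line reaches a threshold. It leaves out the confidence intervals, it assumes natural logs (which is what exp(beta) = growth rate implies), and projecting from the fitted line rather than from the latest count is my assumption, not necessarily how the numbers below were produced. The function names are hypothetical.

```python
import math
from datetime import date, timedelta


def fit_log_linear(days, cases):
    """OLS of log(cases) on day number; returns (intercept, slope, r_squared)."""
    y = [math.log(c) for c in cases]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(days, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 on the log scale: how close to a straight line (i.e. exponential) it is
    ss_res = sum((v - (intercept + slope * x)) ** 2 for x, v in zip(days, y))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return intercept, slope, 1 - ss_res / ss_tot


def projected_date(intercept, slope, threshold, day_one=date(2020, 3, 1)):
    """First calendar day on which the fitted curve reaches `threshold` cases.

    Day numbering follows the post: March 1 is day 1, March 2 is day 2, etc.
    """
    t = (math.log(threshold) - intercept) / slope  # solve log(X) = a + b * t
    return day_one + timedelta(days=math.ceil(t) - 1)
```

You would feed it the daily confirmed counts since March 1; math.exp(slope) is then the "growth rate" line in each block below.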
Current number of confirmed cases in the US (as of March 17): 6,421
beta = .283 (.275, .290)
growth rate = exp(.283) = 1.327 (1.317, 1.336)
R^2 = .9978 (!!! That’s so exponential!!!!)
Expected to be at X cases:
X = 10,000: March 19
X = 100,000: March 27
X = 1,000,000: April 4
X = 10,000,000: April 13
Current number of confirmed cases in France (as of March 17): 7,699
beta = .257 (.239, .275)
growth rate = exp(.257) = 1.293 (1.270, 1.316)
R^2 = .9846
Expected to be at X cases:
X = 10,000: March 18
X = 100,000: March 27
X = 1,000,000: April 5
X = 10,000,000: April 14
Current number of confirmed cases in Spain (as of March 17): 11,748
beta = .322 (.308, .337)
growth rate = exp(.322) = 1.380 (1.360, 1.401)
R^2 = .9931
Expected (or actual date) to be at X cases:
X = 10,000: March 17 (Actual Date)
X = 100,000: March 24
X = 1,000,000: March 31
X = 10,000,000: April 7
Current number of confirmed cases in Italy (as of March 17): 31,506
beta = .186 (.178, .194)
growth rate = exp(.186) = 1.204 (1.195, 1.214)
R^2 = .9939
Expected (or actual date) to be at X cases:
X = 10,000: March 10 (Actual Date)
X = 100,000: March 23
X = 1,000,000: April 4
X = 10,000,000: April 17
Code can be found here.
Follow along as I code on Twitch here.
Ok. So Coronavirus. On February 1, I tweeted something to the effect of "Why are we panicking about this, the seasonal flu kills X number of people per year". Whoops! Let's look at the data and see just how wrong I was and how much of a dipshit I am.
I went and got some data from a GitHub page associated with Johns Hopkins University. Below is a plot of the number of confirmed cases in the United States versus date. Since March 1, we've gone from basically nothing to around 3,500 cases. (CONFIRMED cases. So who knows how many actual cases there are...)
I wanted to see if this was actually following exponential growth. So I took the log of the total number of cases and plotted that against date. I get the following plot:
From January through the end of February there is no indication of exponential growth. Then on February 29, this starts to look VERY linear. On the log scale. Indicating exponential growth.
If we focus only on the period from February 29 through today the plot looks like this:
And here is what it looks like on the log scale:
So let's fit a simple linear regression model with log(cases) as the response and days as the predictor. That model gives us an intercept of 2.574184 and a slope of 0.340881. This means that the predicted number of cases on a given day is exp(2.574184)*exp(0.340881*day_number). Computing the numbers gives us exp(2.574184) = 13.12061 and exp(0.340881) = 1.406186. If you think about this in terms of money, this is like starting with about $13 in your bank account on day 1 and earning 40.6% interest PER DAY. After a week you have a little over $142 in your account. In two weeks you are at a bit over $1,550. By the time a month has gone by you are sitting on $362,439.90. After 45 days, you can retire a very wealthy person with $60,238,907. By the end of 60 DAYS (a previous version said MONTHS; thanks to Hammers for pointing this out), you have over 10 billion dollars.
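The bank-account numbers are just the fitted curve evaluated at a few day counts. Here is a small Python sketch that reproduces them from the intercept and slope quoted above; the constant and function names are mine, for illustration only.

```python
import math

INTERCEPT = 2.574184  # fitted intercept on the log scale (from the model above)
SLOPE = 0.340881      # fitted slope per day on the log scale (from the model above)


def predicted_cases(day_number, intercept=INTERCEPT, slope=SLOPE):
    """Fitted exponential curve: exp(intercept) * exp(slope) ** day_number."""
    return math.exp(intercept + slope * day_number)


if __name__ == "__main__":
    print(f"exp(intercept) = {math.exp(INTERCEPT):.5f}")  # the ~$13 starting balance
    print(f"exp(slope)     = {math.exp(SLOPE):.6f}")      # ~40.6% "interest" per day
    for day in (7, 14, 30, 45, 60):
        print(f"after {day:>2} days: ${predicted_cases(day):,.2f}")
```

Working on the log scale (adding slope * day and exponentiating once) avoids multiplying 1.406186 by itself dozens of times, and it is exactly how the regression defines the curve.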
Here is what this looks like in a plot for the first 30 days:
And then for the first 60 days. The blue line is the predicted mean and the red lines are prediction intervals.
So obviously, this is extrapolation once we get to day 60, since there aren't even 10 billion people on earth, and the curve has to level off at some point once it has infected enough people. But what I wanted to show here is just how fast this thing can get out of control given our current path. It's easy to ignore an exponential curve in the beginning. I mean just look at the data. On March 1 there were only 6 confirmed cases in the US. Fifteen days later there were only 772. That's still basically nothing in a country of 330 million people. However, while the first 15 days of March saw an increase of only 766 cases, given the current exponential growth rate of 40% per day, we'll be over a million cases by April 1. And by tax day, we would be at 119 million. Now imagine a 2% mortality rate! These numbers are staggering.
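The April 1 and tax-day figures come from plugging calendar dates into the same fitted curve. The sketch below assumes February 29, 2020 is day 1 of the fit; that numbering is an assumption on my part, chosen because it is the one that reproduces the "over a million by April 1" and "119 million by tax day" numbers. It also solves the fitted line for the first date a threshold is crossed. Names are hypothetical.

```python
import math
from datetime import date, timedelta

INTERCEPT = 2.574184          # fitted intercept on the log scale
SLOPE = 0.340881              # fitted slope per day on the log scale
DAY_ONE = date(2020, 2, 29)   # assumption: February 29 is day 1 of the fit


def day_number(d):
    """Convert a calendar date to the model's day index."""
    return (d - DAY_ONE).days + 1


def predicted_cases_on(d):
    """Fitted exponential curve evaluated on a calendar date."""
    return math.exp(INTERCEPT + SLOPE * day_number(d))


def first_date_reaching(threshold):
    """First date on which the fitted curve is at or above `threshold`."""
    t = (math.log(threshold) - INTERCEPT) / SLOPE  # solve log(X) = a + b * t for t
    return DAY_ONE + timedelta(days=math.ceil(t) - 1)


if __name__ == "__main__":
    print(f"April 1:  {predicted_cases_on(date(2020, 4, 1)):,.0f}")
    print(f"April 15: {predicted_cases_on(date(2020, 4, 15)):,.0f}")
    print("1 million reached on", first_date_reaching(1_000_000))
```

All of this is pure extrapolation of the current trend, which is exactly the point: it shows where the curve goes if nothing changes.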
Staggering enough to cancel the NBA, NHL, major golf tournaments, and MLB opening day. Staggering enough for Chicago to cancel their St. Patrick's Day parade. Staggering enough for colleges and universities to send their students home. Staggering enough for California, Ohio, Illinois, Massachusetts, and Washington to shut down bars and restaurants.
So the reason we are social distancing is to make that 40% growth rate per day drop. We have to make that go down.
As always you can find my code on github.
Code for this post is here.
And finally watch me code on Twitch: https://www.twitch.tv/statsinthewild.
So Lisa Goldberg gave a talk at Loyola Chicago on Monday afternoon about the hot hand in basketball, presenting a paper that shows there is no statistical evidence that the hot hand exists. We didn't film her talk, but this Numberphile video covers basically the same material she presented on Monday:
The basic argument in the paper is that the probability of making the next shot given you made the previous shot is not statistically different from the probability of making the next shot given you missed the previous shot. I have a lot of thoughts on this, but first let's talk about the really interesting history of this topic.
In 1985, Gilovich, Vallone, and Tversky published what can very accurately be called the seminal paper, which defined the hot hand and concluded that there was no statistical evidence for it. For years this was considered orthodoxy in most of the sports statistics world, even though almost everyone in basketball feels that the effect is real.
Years after that paper was originally published, in 2015, Joshua Miller and Adam Sanjurjo published a paper pointing out that the way Gilovich et al. (1985) went about looking for the hot hand was slightly flawed: measuring what happens after a streak within a finite sequence of shots introduces a bias, and the original analysis did not account for it.
What they pointed out in this paper is really, really interesting, so let me talk about it for a second. Let's say you have a finite string of flips from a fair coin. Call a tail 0 and a head 1, so you might have a string of flips like this: 001101001. Now, for a fixed number of flips, what is the expected proportion of 1's that occur right after a 1 in a finite sequence? It's 50%, right? Right?!?! It has to be 50%!
Turns out, it’s not 50%. Andrew Gelman has an excellent explanation of the issue here.
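You can check this by brute force. The Python sketch below enumerates every equally likely sequence of n fair flips, computes the proportion of heads that immediately follow a head in each sequence (skipping sequences where no head appears before the last flip, since the proportion is undefined there), and averages those proportions with exact fractions. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def expected_proportion_after_heads(n):
    """Average over all 2**n fair-coin sequences (where defined) of the
    proportion of heads (1) that immediately follow a head."""
    proportions = []
    for seq in product((0, 1), repeat=n):
        followers = [seq[i + 1] for i in range(n - 1) if seq[i] == 1]
        if followers:  # undefined if no head appears in the first n - 1 flips
            proportions.append(Fraction(sum(followers), len(followers)))
    return sum(proportions) / len(proportions)


if __name__ == "__main__":
    for n in range(2, 7):
        p = expected_proportion_after_heads(n)
        print(n, p, round(float(p), 4))
```

For n = 2 it is exactly 1/2, but for n = 3 it is 5/12 and for n = 4 it is 17/42 (about 0.405). Each sequence gets equal weight no matter how many heads it contains, and that weighting is where the bias below 50% comes from.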
Following the Miller and Sanjurjo paper, in 2017 Daks, Desai, and Goldberg published a paper that updated Gilovich's original analysis, using permutation testing to account for the bias that Miller and Sanjurjo pointed out. Even with the permutation tests accounting for the bias, Daks et al. still find no evidence of a hot hand.
This paper led to the Numberphile video about the hot hand at the beginning of this post, though Miller and Sanjurjo don't agree with its findings.
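Here is a simplified Python sketch of why a permutation test helps, not a reproduction of the Daks, Desai, and Goldberg procedure. The statistic is P(make | previous make) minus P(make | previous miss). Shuffling the same shots keeps the number of makes fixed, so the shuffled sequences carry the same finite-sequence bias as the observed one, and comparing against them is fair. The one-sided alternative and the add-one correction are my choices; function names are hypothetical.

```python
import random


def conditional_difference(shots):
    """P(make | previous make) - P(make | previous miss) for a 0/1 shot sequence,
    or None if either conditional proportion is undefined."""
    after_make = [nxt for prev, nxt in zip(shots, shots[1:]) if prev == 1]
    after_miss = [nxt for prev, nxt in zip(shots, shots[1:]) if prev == 0]
    if not after_make or not after_miss:
        return None
    return sum(after_make) / len(after_make) - sum(after_miss) / len(after_miss)


def permutation_p_value(shots, n_perm=10_000, seed=0):
    """One-sided p-value for 'streakier than chance', from shuffles of the same shots."""
    observed = conditional_difference(shots)
    if observed is None:
        raise ValueError("statistic is undefined for this sequence")
    rng = random.Random(seed)
    shuffled = list(shots)
    as_extreme = valid = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        stat = conditional_difference(shuffled)
        if stat is None:  # possible when there is only one make or one miss
            continue
        valid += 1
        if stat >= observed:
            as_extreme += 1
    # add-one correction so the p-value is never exactly zero
    return observed, (as_extreme + 1) / (valid + 1)
```

For example, 20 straight makes followed by 20 straight misses gives a difference of 0.95 and a tiny p-value, while perfectly alternating makes and misses gives -1 and a p-value of 1.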
So what are my thoughts on this? I don't think that any of these papers are looking for the hot hand in the correct way. I think you need to build some sort of mixture model or a hidden Markov model with two states representing hot and regular. Once you fit that model, you can check whether there is a significant difference between the states, and compare it to a one-state model to see which one gives you a better fit. I've written about this type of thing before in baseball with Rob Arthur.
I also think, specifically in basketball, you absolutely cannot view the data as a string of 0's and 1's. If you look only at makes and misses, you are ignoring many other factors that affect the probability of making a shot and need to be controlled for, such as shot distance, game situation, and distance to the nearest defender. What's nice about studying the hot hand in baseball for pitchers is that relatively few factors need to be controlled for (runners on base, score, pitch type, etc.). It's also easier to look at pitchers because there are no opposing players trying to hinder the pitcher's ability to do what they are doing, and the pitch always comes from the same distance. (This is why I think bowling would be a nice place to look for the hot hand. Someone get me the data!) In basketball, everything is different from shot to shot.
So am I convinced that the hot hand exists? My answer is really, truly, I don't know. I haven't seen anything that convinces me it does or does not exist. And also, it depends. It depends on exactly how you define the hot hand.
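To make the two-state idea concrete, here is a rough Python sketch under several assumptions of mine: the data are a plain 0/1 make/miss sequence (so it ignores shot-level covariates), each state has its own make probability, the model is fit with Baum-Welch (EM) using a scaled forward-backward pass, and the comparison to a one-state model (constant make probability) uses AIC. The starting values, the AIC choice, and the toy simulator are all hypothetical, not a method from any of the papers above.

```python
import math
import random


def emissions(y, p):
    """P(observation y | state k) for each state, where p[k] = P(make | state k)."""
    return [pk if y == 1 else 1.0 - pk for pk in p]


def forward(obs, pi, A, p):
    """Scaled forward pass: normalized alphas plus per-step scale factors."""
    K, alphas, scales = len(pi), [], []
    for t, y in enumerate(obs):
        e = emissions(y, p)
        if t == 0:
            a = [pi[k] * e[k] for k in range(K)]
        else:
            prev = alphas[-1]
            a = [sum(prev[j] * A[j][k] for j in range(K)) * e[k] for k in range(K)]
        c = sum(a)
        alphas.append([x / c for x in a])
        scales.append(c)
    return alphas, scales


def log_likelihood(obs, pi, A, p):
    return sum(math.log(c) for c in forward(obs, pi, A, p)[1])


def backward(obs, A, p, scales):
    K, T = len(p), len(obs)
    betas = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        e = emissions(obs[t + 1], p)
        betas[t] = [sum(A[j][k] * e[k] * betas[t + 1][k] for k in range(K)) / scales[t + 1]
                    for j in range(K)]
    return betas


def fit_two_state_hmm(obs, n_iter=200, tol=1e-8):
    """Baum-Welch (EM) for a 2-state Bernoulli HMM ("hot" vs "regular")."""
    K, T = 2, len(obs)
    rate = sum(obs) / T
    pi, A = [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]]
    p = [min(rate + 0.1, 0.99), max(rate - 0.1, 0.01)]  # arbitrary asymmetric start
    history = []
    for _ in range(n_iter):
        alphas, scales = forward(obs, pi, A, p)
        history.append(sum(math.log(c) for c in scales))
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
        betas = backward(obs, A, p, scales)
        gamma = [[alphas[t][k] * betas[t][k] for k in range(K)] for t in range(T)]
        xi = [[0.0] * K for _ in range(K)]  # expected transition counts
        for t in range(T - 1):
            e = emissions(obs[t + 1], p)
            for j in range(K):
                for k in range(K):
                    xi[j][k] += alphas[t][j] * A[j][k] * e[k] * betas[t + 1][k] / scales[t + 1]
        pi = gamma[0][:]
        A = [[xi[j][k] / sum(xi[j]) for k in range(K)] for j in range(K)]
        p = [sum(g[k] * y for g, y in zip(gamma, obs)) / sum(g[k] for g in gamma)
             for k in range(K)]
        p = [min(max(pk, 1e-6), 1 - 1e-6) for pk in p]  # keep away from 0 and 1
    return pi, A, p, history


def compare_to_one_state(obs):
    """AIC (2 * params - 2 * log-likelihood) for one-state vs two-state models."""
    rate = sum(obs) / len(obs)
    ll_one = sum(math.log(rate if y == 1 else 1 - rate) for y in obs)
    pi, A, p, _ = fit_two_state_hmm(obs)
    ll_two = log_likelihood(obs, pi, A, p)
    # parameters: 1 make rate vs. 1 initial prob + 2 transition probs + 2 make rates
    return {"aic_one_state": 2 * 1 - 2 * ll_one, "aic_two_state": 2 * 5 - 2 * ll_two, "p": p}


def simulate_hot_regular(n, p_hot=0.8, p_regular=0.3, switch=0.05, seed=1):
    """Hypothetical streaky shooter who switches state with probability `switch`."""
    rng = random.Random(seed)
    state, shots = 0, []
    for _ in range(n):
        if rng.random() < switch:
            state = 1 - state
        shots.append(1 if rng.random() < (p_hot if state == 0 else p_regular) else 0)
    return shots
```

A lower AIC for the two-state model says the data look like they switch between regimes; on a real shooter, you would also want to check that the two estimated make probabilities are meaningfully different.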
While at dinner, our department head introduced me as the director of our Data Science program, and Dr. Goldberg asked me the following question: what are the three things you want students to take away from your program?
I stalled for a bit, and then just straight up said I was going to avoid answering that, but I'd tell her what I think a Data Science student should know how to do.
Towards the end of dinner, I just had to ask Dr. Goldberg the same question she asked me: what did she think the three most important things were? (I hope I remember these at least somewhat accurately.)
- Statistics does better with more data, but more data is harder for computers. Dealing with this issue is fundamental to doing data science.
- Remove your personal biases from the analysis.
- Design your experiment beforehand.
Pretty good answers. But I've been thinking about this question for a few days now. So what are the three most important things that I want our Data Science students to know?
After thinking about this for a while here are my answers (in no particular order):
- Always try to do the thing you are trying to do. (For example, if you just want to build the best classifier model, you aren’t that interested in interpreting parameters.)
- Data Science consists of two major parts: managing the data and analyzing the data. Neither part is more or less important.
- You can manipulate statistics to say many different things. Be ethical. (Present data to others the way you would want others to present data to you.)
And of course, I want all of my students to know that the answer to virtually every single question in statistics is “It Depends”.
Ok. Good night.
Here is the distribution of the last digits of the final scores of NFL games all-time:
And here is the distribution for recent games, which I believe I defined as since 2000.
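Computing a distribution like this is a one-liner once you have the final scores. A small Python sketch, assuming the scores come as (home points, away points) pairs; the function name and the example scores are made up for illustration.

```python
from collections import Counter


def last_digit_distribution(final_scores):
    """Share of team scores ending in each digit 0-9.

    `final_scores` is a list of (home_points, away_points) tuples.
    """
    digits = Counter(points % 10 for game in final_scores for points in game)
    total = sum(digits.values())
    return {d: digits[d] / total for d in range(10)}


if __name__ == "__main__":
    example = [(24, 17), (31, 10), (20, 27)]  # hypothetical scores, not real data
    print(last_digit_distribution(example))
```

To restrict to recent games, filter the list by season before calling the function.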
Prediction: 49ers, 26-25
Spread: 49ers +1
OU: Under 52.5