My thoughts on “Real Time COVID-19 Tracking”

On March 26, Slate posted an article by Tim Requarth entitled “Please, Let’s Stop the Epidemic of Armchair Epidemiology” (which, I should mention, convinced me to stop posting my COVID-19 analysis on this very blog). One of the posts mentioned in the Slate piece was this one by Abe Stanway from March 14, called “Real Time COVID-19 Tracking”. There has since been some back-and-forth between Requarth and Stanway on Twitter, so I decided to take a look at Stanway’s blog post for myself. The post is reproduced below with my comments in bold. (tl;dr: This isn’t a very good article. It’s extremely ad hoc and riddled with statistical shortcomings.)

Note: Two things: 1) I don’t want people to stop blogging. I’m not trying to “dunk” on anyone here. (I save that for Clay Travis). Everyone should write more, including Abe Stanway (who seems like a really, really smart guy)! But this topic is serious, and there is so much misinformation out there. Please, please, please, get your information from the experts. I know it may seem like everyone out there can do this stuff, but it’s really complicated. So please, leave it to the experts.

2) Again, the title “data scientist” rears its head. The author of the article calls himself a data scientist, but he seems to be lacking some very basic statistical skills (based only on what I’m seeing in this article, so I could be completely wrong). I’m sure Abe Stanway is very talented (I mean this; his work is straight up really impressive), but I think what I would define as a data scientist and how he interprets the title are very different. But I suppose that is a discussion for a different blog post.

_________________________________________________________________________________________

Real Time COVID-19 Tracking

EpiQuery is a realtime “influenza-like illness” (ILI) tracker. It’s updated on a daily basis. The system was set up in 2016 to track emergency room visits with chief complaints that mention flu, fever, and sore throat. These are not confirmed nor denied to be influenza, nor any other disease. Meanwhile, the US government has severely bungled the COVID-19 test rollout, and we’ve only tested ~15k people total so far. In the absence of widespread testing, we need to rely on EpiQuery (and ILINet, the federal CDC version covering all 50 states) to understand the likely growth of the COVID-19 outbreak.

I agree that the US government has bungled the COVID-19 test rollout.

I wouldn’t use the word NEED when saying “need to rely on EpiQuery”. At this point in the article I’m already skeptical; I think the author is already speaking with too much certainty. On March 14, and even now, there was (and still is) so much we don’t know about this disease. We still seem to be learning about new symptoms.

Daily influenza-like illness (ILI) ER visits in NYC since 2016, with seasonal peaks highlighted in pink.

I’m going to nitpick a bit here, but I wouldn’t look at the day with the single highest number of cases and call that the peak of flu season. That’s likely noise. It would be better to smooth the data first, for example with a moving average, and then look for the peak of the smoothed series.
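To make the moving-average suggestion concrete, here is a minimal sketch in plain Python. It computes a centered moving average and reports where the smoothed series peaks. The function names and the toy series are made up for illustration; they are not EpiQuery data. For daily ER counts a 7-day window is a common choice, since it would also damp any day-of-week pattern; the toy example uses a 3-day window only to keep the numbers small.

```python
from typing import List, Optional, Sequence


def centered_moving_average(counts: Sequence[float], window: int = 7) -> List[Optional[float]]:
    """Centered moving average; positions without a full window are None."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed: List[Optional[float]] = [None] * len(counts)
    for i in range(half, len(counts) - half):
        smoothed[i] = sum(counts[i - half:i + half + 1]) / window
    return smoothed


def smoothed_peak_index(counts: Sequence[float], window: int = 7) -> int:
    """Index of the largest value of the smoothed series (first one if tied)."""
    smoothed = centered_moving_average(counts, window)
    valid = [(i, v) for i, v in enumerate(smoothed) if v is not None]
    if not valid:
        raise ValueError("series is shorter than the window")
    return max(valid, key=lambda pair: pair[1])[0]


# Toy series (hypothetical): a one-day spike at index 3, then a sustained rise.
daily = [10, 12, 11, 60, 13, 14, 30, 32, 33, 31, 29, 15, 12]
print(daily.index(max(daily)))                 # raw single-day maximum: 3
print(smoothed_peak_index(daily, window=3))    # smoothed peak: 8
```

The raw maximum lands on the one-day spike, while the smoothed peak lands in the middle of the sustained rise, which is the behavior you want when labeling a seasonal peak.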

The chart above shows daily ILI ER visits in NYC. The seasonal peaks are highlighted in pink. We see a very seasonal pattern in the data — every year, there’s generally one major peak in December or January, followed by a gradual decrease in ILI cases. This is the annual flu season, visualized.

Zoomed-in ILI ER visit data, with the first confirmed NYC COVID-19 case highlighted

This is the same data, but zoomed in on 2020. We see our normal seasonal peak on January 29th, and then we see a marked anomaly starting at around March 1st. The anomaly displays a peak of equal magnitude to the regular seasonal peak.

There does appear to be something going on here, but what I would like to see is some analysis showing that this is a real signal and not just noise in the data. It’s not enough to say that it looks like something is happening.
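As one example of the kind of check I mean, here is a minimal sketch of a trailing-window z-score: each day is compared with the mean and standard deviation of the preceding days, and flagged if it sits more than `threshold` standard deviations above that baseline. The function name, the 7-day baseline, the threshold of 3, and the toy counts are all assumptions for illustration. ER visit counts are seasonal and autocorrelated, so a screen like this is a rough first pass, not proof that a signal is real.

```python
import statistics
from typing import List, Sequence


def flag_anomalies(counts: Sequence[float], baseline_days: int = 7,
                   threshold: float = 3.0) -> List[int]:
    """Indices of days whose count exceeds the trailing-baseline mean by more
    than `threshold` baseline standard deviations."""
    flagged = []
    for i in range(baseline_days, len(counts)):
        baseline = counts[i - baseline_days:i]
        mean = statistics.mean(baseline)
        sd = statistics.stdev(baseline)
        if sd == 0:
            continue  # flat baseline: the z-score is undefined, so skip this day
        z = (counts[i] - mean) / sd
        if z > threshold:
            flagged.append(i)
    return flagged


# Hypothetical daily counts: a stable baseline, then two high days.
daily = [100, 104, 98, 101, 97, 103, 99, 100, 102, 150, 160]
print(flag_anomalies(daily))  # [9]
```

Note that day 10 (160 visits) is not flagged: the 150 on day 9 is now inside its baseline, which inflates the standard deviation (z comes out at roughly 2.8). Leaving a gap of a day or two between the baseline and the day being tested is one way to reduce that contamination.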

A double peak flu season appears to be exceedingly unlikely, as it has never occurred in any historical flu season since the start of this data (at least in NYC), nor has it ever occurred with a slope of this magnitude. Therefore, I believe a large percentage of this peak indicates COVID-19 ER visits in NYC, and not nominal flu visits.

This is where this all starts to fall apart. He uses the term “exceedingly unlikely” to describe the probability that this is just a normal flu. What is that based on? 4 flu seasons in NYC. That first sentence starts out making a really strong claim and it just gets weaker and weaker……”it has never occurred in any historical flu season”…….WOW that seems convincing………..”since the start of this data”………….Oh, well that’s only 4 years……….”(at least in NYC)”…………….and it’s only 4 years of one city. So to say that this is “exceedingly unlikely” to be just a normal flu isn’t supported by anything. We have no idea what other flu seasons have looked like based on this data.
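To put a number on how little four seasons can tell us: if an event was seen in 0 of n independent seasons, the exact one-sided 95% upper bound on its per-season probability p solves (1 − p)^n = 0.05. The sketch below computes that bound. It takes the post’s “never occurred” premise at face value and assumes seasons are independent and alike, both of which are generous assumptions; the function name is my own.

```python
def upper_bound_zero_events(n_seasons: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper confidence bound on a per-season probability p
    when the event was observed in 0 of n_seasons independent seasons.
    Solves (1 - p) ** n_seasons = 1 - confidence for p."""
    if n_seasons < 1:
        raise ValueError("need at least one season")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_seasons)


print(f"{upper_bound_zero_events(4):.2f}")  # 0.53
```

In other words, four seasons without a double peak are consistent with double peaks happening in roughly half of all seasons. That is about as far from “exceedingly unlikely” as you can get.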

Also, are double peaked flu seasons rare? Nope. According to William Schaffner, MD, an infectious disease specialist at Vanderbilt University in Nashville:

We may well have, for the second year in a row — unprecedented — a double-barreled influenza season

Other articles have also mentioned this double-barreled flu season, including this one from pbs.org, which said:

The 2018-2019 season has been unusual, though, because the flu came in two waves: one that peaked at the end of December, and a second that peaked in early March. The two peaks were caused by two different strains of the flu virus, and the protection given by vaccination early in the season may have waned by the time the second strain appeared.

So literally, just last flu season, we saw a double-peaked season. So, again, describing the possibility that this peak is just regular flu as “exceedingly unlikely” is not supported by any data.

As a side note, I wonder what other diseases out there cause influenza-like illness (ILI) symptoms besides actual flu and COVID-19. I have no idea.

Another note: this peak could very well be COVID-19. But there are other possible explanations for this peak that the author has basically ignored. The most notable is that he claims it can’t possibly be regular flu, when literally last year there was a double-peaked flu season whose second peak was in early March. So I’m not convinced that this is actually a COVID-19 spike, but it could be.

The fact that the peak starts at around March 1st, and the fact that this was also the date of the first confirmed case of COVID-19 in NYC, lends further evidence to support that this spike represents COVID-19 cases.

Yeah, but it could just be regular flu. The author is speaking with too much certainty.

The data above represents daily ER visits. This means that since March 1st, there have been 8,000 cases of ILI-based ER visits in NYC. Subtracting the nominal flu season data (~3,800 cases over this period, assuming a late season R0 of .95), that means there are likely a minimum of 4,200 COVID-19 cases in NYC as of March 12th.
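The post never shows how the ~3,800 “nominal flu” figure was derived, so here is a minimal sketch of one way such a baseline could be built, mainly to show how sensitive the “excess” is to the assumed reproduction number. Everything except the 8,000 total is a hypothetical assumption: the starting daily baseline `B0`, the 3-day generation interval `GEN_DAYS`, and the 12-day window. This is not the author’s method. (Strictly speaking, R0 describes spread in a fully susceptible population, so what the post calls a “late season R0” is really an effective reproduction number.)

```python
def expected_baseline_total(b0: float, r: float, days: int, generation_days: float) -> float:
    """Expected non-COVID ILI visits summed over `days` days, assuming daily
    visits start at b0 and are multiplied by r once per generation interval,
    i.e. by r ** (1 / generation_days) each day."""
    return sum(b0 * r ** (t / generation_days) for t in range(days))


def excess_visits(observed_total: float, b0: float, r: float, days: int,
                  generation_days: float) -> float:
    """Observed total minus the modeled baseline total."""
    return observed_total - expected_baseline_total(b0, r, days, generation_days)


OBSERVED_TOTAL = 8_000  # ILI ER visits since March 1, as stated in the post
DAYS = 12               # March 1 through March 12 inclusive (assumption)
B0 = 330                # hypothetical baseline daily visits on March 1
GEN_DAYS = 3.0          # hypothetical generation interval in days

for r in (0.80, 0.90, 0.95, 1.00, 1.05):
    excess = excess_visits(OBSERVED_TOTAL, B0, r, DAYS, GEN_DAYS)
    print(f"R = {r:.2f}: excess of about {excess:,.0f} visits")
```

Every number in the output moves with `B0`, `GEN_DAYS`, and `r`, and none of those inputs are justified in the post, which is exactly why a single point estimate like 4,200 needs its work shown.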

Ok, this is as armchair-y as it gets. Where does the 3,800 number come from? Where does R0 = 0.95 come from? Is that a widely accepted value? No work is shown for how the author arrives at 4,200 COVID cases. The assumptions the author is making are numerous and substantial. For starters, he is assuming that every single one of these “excess” cases is COVID-19 (again ignoring the fact that there was a double-peaked flu season just last year). Perhaps the most head-scratching part of this whole thing comes in the next graph.

Hand drawn best-fit lines for a normal flu season (pink) and current COVID-19 outbreak (red)

Hand drawn best-fit lines?! Yikes. It is my opinion that anyone who calls themselves a data scientist should understand how to fit a curve to a set of data. There are numerous options (loess, splines, a simple moving average, etc.). If I am interpreting this correctly, the author makes it sound like they just…..drew a line through the data. (That can’t be right, right?) They then extrapolate (which is dangerous!) what the flu season “should” have looked like with no COVID-19, and then attribute ALL of the growth in the curve at the end to COVID-19. Not great.
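For comparison with a hand-drawn line, here is a bare-bones, non-robust LOESS in plain Python: at each point it fits a straight line by weighted least squares to the nearest `frac` share of the data, using tricube weights. The function name and defaults are my own; in practice you would reach for a library implementation such as `lowess` in statsmodels or R’s `loess()`, which also add robustness iterations that this sketch omits. The point is that a fitted curve is reproducible and has a tunable, reportable smoothing parameter; a line drawn by eye has neither.

```python
from typing import List, Sequence


def loess(x: Sequence[float], y: Sequence[float], frac: float = 0.3) -> List[float]:
    """Bare-bones, non-robust LOESS (local linear regression).

    For each x[i], fit a straight line by weighted least squares to the
    k = round(frac * n) nearest points (at least 3), using tricube weights,
    and return that line's value at x[i].
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length, at least 3")
    k = min(n, max(3, round(frac * n)))
    fitted: List[float] = []
    for x0 in x:
        # k nearest neighbours of x0 (ties broken by position in the data)
        nearest = sorted(range(n), key=lambda j: abs(x[j] - x0))[:k]
        h = max(abs(x[j] - x0) for j in nearest)
        sw = swx = swy = swxx = swxy = 0.0
        for j in nearest:
            d = abs(x[j] - x0) / h if h > 0 else 0.0
            w = (1.0 - d ** 3) ** 3 if d < 1.0 else 0.0  # tricube weight
            sw += w
            swx += w * x[j]
            swy += w * y[j]
            swxx += w * x[j] * x[j]
            swxy += w * x[j] * y[j]
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)  # degenerate window: fall back to weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * x0)
    return fitted
```

Any smoother like this only describes the data you actually have. Extending it past the last observation to claim what the season “should” have looked like is a separate and much riskier step.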

**I still want to know the answer to this question: Are both the pink and red curves literally hand drawn?**

This analysis should be considered a napkin sketch: a more detailed study could estimate the precise start date in NYC, knowing the R0 of COVID-19 is estimated to be 2.2¹ and working backwards to infer when Patient 0 actually arrived based on this parameter and the current curve in red.

He does claim that this should be considered a “napkin sketch”. So he is giving caveats. But then why even do this? Why spend the time on this? Is it worth it to spread “napkin sketch” information about a pandemic? I lean towards no.

He does give a citation for 2.2 R0 for COVID-19, which I appreciate.

The true number of COVID-19 cases in NYC is likely several times higher (given the fact that not all cases present to an ER, and ER cases that are not admitted are sent home without any proper quarantine protocols — aka they are sending people home in Ubers or subways), but I will refrain from speculating on an exact number until I find more data. However, assuming the exponential curve holds, the current case count as of March 14th is around 6,300. Despite the napkin math, this data indicates that NYC is currently adding around 1,000 ER admissions of COVID-19 per day and growing fast.

I think this is a loose usage of the term “exponential curve”. He doesn’t check at all if the curve is actually exponential. This has a formal mathematical meaning, and the informal use of the term “exponential” often means it’s just growing “really fast”.
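One basic way to check the claim is to fit a straight line to the logarithm of the counts: exponential growth is a straight line on a log scale, so the residuals should show no systematic pattern. The sketch below does that with ordinary least squares and reports the implied daily growth factor and doubling time. The function name is my own, and a good fit over a handful of days is weak evidence by itself.

```python
import math
from typing import List, Sequence, Tuple


def fit_log_linear(counts: Sequence[float]) -> Tuple[float, float, List[float]]:
    """Least-squares fit of log(count) = a + b * t for t = 0, 1, 2, ...

    Returns (daily growth factor exp(b), doubling time in days, log-scale residuals).
    """
    if len(counts) < 3 or any(c <= 0 for c in counts):
        raise ValueError("need at least 3 positive counts")
    t = list(range(len(counts)))
    logs = [math.log(c) for c in counts]
    t_mean = sum(t) / len(t)
    l_mean = sum(logs) / len(logs)
    b = (sum((ti - t_mean) * (li - l_mean) for ti, li in zip(t, logs))
         / sum((ti - t_mean) ** 2 for ti in t))
    a = l_mean - b * t_mean
    residuals = [li - (a + b * ti) for ti, li in zip(t, logs)]
    doubling = math.log(2) / b if b > 0 else math.inf
    return math.exp(b), doubling, residuals
```

On truly exponential data the residuals are essentially zero. On data that grows by a constant amount each day, the log-scale residuals come out negative at both ends and positive in the middle, which is the tell-tale bend of growth that is not exponential.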

Also, “adding 1,000 ER admissions per day” wouldn’t be exponential growth. That’s linear growth.
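The distinction is easy to see with two toy series: linear growth has constant day-over-day differences, while exponential growth has constant day-over-day ratios. The series below are made up purely for illustration.

```python
from typing import List, Sequence


def first_differences(series: Sequence[float]) -> List[float]:
    """Day-over-day increments."""
    return [b - a for a, b in zip(series, series[1:])]


def successive_ratios(series: Sequence[float]) -> List[float]:
    """Day-over-day growth factors."""
    return [b / a for a, b in zip(series, series[1:])]


linear = [1000 * d for d in range(1, 6)]         # adds 1,000 per day
exponential = [1000 * 2 ** d for d in range(5)]  # doubles every day

print(first_differences(linear))       # [1000, 1000, 1000, 1000]
print(successive_ratios(linear))       # [2.0, 1.5, 1.3333333333333333, 1.25]
print(first_differences(exponential))  # [1000, 2000, 4000, 8000]
print(successive_ratios(exponential))  # [2.0, 2.0, 2.0, 2.0]
```

A fixed 1,000 added per day shows up as constant differences and shrinking ratios. For growth to be exponential, the daily increment itself has to keep growing in proportion to the current count.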

Finally, where does 6,300 come from? Is this an extrapolation? I can’t follow where that number comes from.
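If 6,300 were an exponential extrapolation from the 4,200 estimated two days earlier (this is only a guess at the author’s reasoning; the post doesn’t say), the arithmetic below shows what growth rate it would imply. The only inputs are the two figures from the post.

```python
import math

cases_mar12 = 4_200  # the post's estimate as of March 12
cases_mar14 = 6_300  # the post's figure for March 14
days = 2

implied_daily_factor = (cases_mar14 / cases_mar12) ** (1 / days)
implied_doubling_days = math.log(2) / math.log(implied_daily_factor)
average_daily_increase = (cases_mar14 - cases_mar12) / days

print(f"implied daily growth factor: {implied_daily_factor:.3f}")    # 1.225
print(f"implied doubling time: {implied_doubling_days:.1f} days")    # 3.4
print(f"average daily increase: {average_daily_increase:,.0f}")      # 1,050
```

Under that reading, the “around 1,000 per day” figure would just be the average increase over those two days, and the growth rate rests entirely on two estimates that were themselves built on unexplained assumptions.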

BONUS (or, this is where it gets weird):

Below is a breakdown of the cases by neighborhood.

The epicenter appears to be somewhere in Queens.

This is a neighborhood called Corona. You just can’t make this shit up. Edit: yes, I’m fully aware Corona is *always* the highest density of ILI symptoms. This is likely due to the concentration of hospitals in the area. Regardless, this is a joke, and if you take it seriously, you should get out of the house more (once your isolation period is over, of course!)

If this is a joke, it’s hard to tell. And it’s an odd place to put a joke. The entire article is serious in tone, and then the author just throws this in at the end and claims that it’s a joke? I don’t know, man. In a different context, maybe it would be more obvious that this is a joke, but here it’s not obvious to me.

Also, looking at raw case counts means pretty much nothing. You’ve gotta account for baseline population levels.
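A minimal sketch of the fix: divide by population before comparing areas. The area names and numbers below are hypothetical, chosen only to show how the ranking can flip once counts are normalized.

```python
from typing import Dict


def rate_per_100k(cases: Dict[str, int], population: Dict[str, int]) -> Dict[str, float]:
    """Cases per 100,000 residents for each area."""
    return {area: 100_000 * cases[area] / population[area] for area in cases}


# Hypothetical numbers, for illustration only.
cases = {"Area A": 500, "Area B": 200}
population = {"Area A": 1_000_000, "Area B": 100_000}

rates = rate_per_100k(cases, population)
print(rates)  # {'Area A': 50.0, 'Area B': 200.0}
```

Area A has more than twice the raw cases but a quarter of the per-capita rate. For ER data there is a further wrinkle: visits are counted where the hospitals are, so the denominator should match how visits are attributed to neighborhoods.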

¹https://www.ncbi.nlm.nih.gov/pubmed/32097725

All data are available for analysis here. Additional data nationwide can be found on ILINet FluView. Thank you to Ben Hunt for discovering this trove of data, and to Dr. Alfred Illoreta of Mount Sinai and Dr. Ydo Wexler of Amperon for reviewing drafts of this post.

_______________________________________________________________________

Posted on April 7, 2020, in Uncategorized.

Stats in the Wild