Dangers of the coronavirus (COVID-19) are dramatically over-hyped, and it’s easy to prove. It’s also a great example of how poor interpretation of data can mislead a lot of people – in this case, virtually the entire world.
When the scare first started, I more or less ignored it. There have been so many major news stories that promote irrational fear of one thing or another that I reflexively just wait these things out a little. But lately, I’ve noticed more and more “precautions” as I travel – more questions about where I’ve been, whether I have any symptoms, more masked faces in airports and more empty seats on airplanes – so I reluctantly decided to investigate it a little.
The first thing I did was to just ask some friends and colleagues what all the fuss was about. How is this different from other outbreaks? The answer was usually the same: It’s the mortality rate. A large percentage of people who are infected with the coronavirus die. That’s what makes it so scary.
But is it true?
Mortality rate involves two factors: the number of people who are infected and the number of people who die as a result of the infection. An error in either of these numbers would of course lead to an error in the mortality rate. For example, if you understate the number of people who are infected, then you will overstate the mortality rate. And that’s what’s happening now, in a big way.
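To make the direction of that error concrete, here is a minimal Python sketch. The numbers are hypothetical and chosen only for illustration, and the function name `fatality_rate` is mine, not a standard term.

```python
def fatality_rate(deaths: int, infections: int) -> float:
    """Return deaths as a percentage of infections."""
    if infections <= 0:
        raise ValueError("infections must be positive")
    return 100.0 * deaths / infections


# Hypothetical numbers, chosen only to show the direction of the error.
true_infections = 1000     # everyone actually infected
counted_infections = 250   # assumption: only 1 in 4 infections is counted
deaths = 10

true_rate = fatality_rate(deaths, true_infections)
apparent_rate = fatality_rate(deaths, counted_infections)

print(f"rate using all infections:     {true_rate:.1f}%")
print(f"rate using counted infections: {apparent_rate:.1f}%")
```

With the same number of deaths, counting only a quarter of the infections makes the rate look four times higher.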
Consider this recent article from the New York Times. Notice how the author includes a deeply flawed and overstated mortality rate along with an invalid comparison, and then a little later, admits there could be a problem with this number. Which part do you think most people will remember? (Emphasis mine.)
“‘Globally, about 3.4 percent of reported Covid-19 cases have died,’ Dr. Tedros said. ‘By comparison, seasonal flu generally kills far fewer than 1 percent of those infected.'”
… then a few sentences later …
“The figure does not include mild cases that do not require medical attention and is skewed by Wuhan, where the death rate is several times higher than elsewhere in China. It is also quite possible that there are many undetected cases that would push the mortality rate lower.”
That last sentence is key. First, many people who get sick with common cold and flu symptoms don’t go to the doctor. So those cases aren’t reported. Second, in the case of COVID-19, the testing is severely limited. Both issues mean that the number of reported cases is much lower than the number of people who are infected. That might sound alarming, but in a way, it’s good news. It means the mortality rate is much lower than what is being widely reported. It also means that comparing the reported COVID-19 mortality rate to the rate for seasonal flu outbreaks – which is based on an estimate of actual symptomatic illnesses, whether or not the infected person was tested or even went to a healthcare provider – is not valid. Not even close.
And here’s one from the Wall Street Journal. In this article, they assert an outright falsehood. The author states that “Globally, the new coronavirus has infected 93,123 people since it was first identified … according to data compiled by Johns Hopkins University. A total of 3,198 coronavirus patients have died.” Does this mean the mortality rate is 3.4%? Well, that’s what Business Insider decided to claim in the headline of this article.
But if you look at the referenced Johns Hopkins information here, you’ll see that they are counting “confirmed” cases. Again, there are many more people infected than have been confirmed. That means the mortality rate is much lower than 3.4%. Probably dramatically lower.
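The 3.4% figure is simply the quoted deaths divided by the quoted confirmed cases. The sketch below reproduces that division and then shows how the rate would shrink if actual infections were some multiple of confirmed cases. The multipliers are purely hypothetical placeholders, not estimates of the real undercount, and `rate_with_undercount` is a name I made up for this illustration.

```python
# Figures quoted above from the Johns Hopkins data.
confirmed_cases = 93_123
deaths = 3_198

reported_rate = 100.0 * deaths / confirmed_cases
print(f"deaths / confirmed cases: {reported_rate:.2f}%")


def rate_with_undercount(deaths: int, confirmed: int, factor: float) -> float:
    """Rate if actual infections were `factor` times the confirmed count."""
    return 100.0 * deaths / (confirmed * factor)


# Hypothetical multipliers, for illustration only.
for factor in (1, 2, 5, 10):
    rate = rate_with_undercount(deaths, confirmed_cases, factor)
    print(f"infections = {factor:>2} x confirmed -> {rate:.2f}%")
```

The point is not any particular multiplier but that the rate falls in direct proportion to however many infections went unconfirmed.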
There may be other issues with the data as well that push the error in either direction. For example, it could be that some people have been infected but haven’t yet succumbed to the illness. Or it could be that the data has been fudged or mistaken one way or another, for one reason or another. We in the field of data management know that bad data quality is extremely common – especially when analysis is rushed out the door.
But assuming the data being reported is valid, it’s clear that the mortality rate is significantly overstated. Even in cases where reporters can claim that they are technically reporting the facts, the implication is clear. And the reader is left to dig and think carefully to find the issue. Most of us don’t have time for that, and the passing impression formed is that of a looming global catastrophe.
So, misinformation and fear continue to spread even faster and more dangerously than the virus itself, usually with the best of intentions I’m sure. And when testing for COVID-19 is done more widely and the number of confirmed cases skyrockets, there will still be no cause for panic. It will only be further evidence that the mortality rate is much lower than what has been widely claimed or implied.
I’ll remain open-minded as more data is collected and (hopefully) a few journalists begin to analyze that data more carefully and responsibly. But for now, it’s business as usual for me. If you extend your hand, I’ll shake it. If you hug me, I’ll hug you back. It’s going to take more evidence than what’s been presented so far for me to give in to hysteria and give up the most basic human contact.
Update: The latest “official” estimate from the CDC of the mortality rate of Covid-19 is approximately 0.26%, with a very wide range of values for different age groups (another important topic regarding interpretation and use of data). Clearly this rate is much, much lower than the 3.4% that was being widely communicated at the time this post was written.
As the post predicted, the reason that the mortality rate “changed” to a much lower number has little to do with the lethality of the disease and instead has to do with how the number is calculated. The 3.4% mortality rate is called the Case Fatality Rate (CFR), which refers to deaths among reported cases. However, the number was widely interpreted as if it were the Infection Fatality Rate (IFR), which refers to the percentage of deaths among people infected with the virus, whether or not the infection was reported. The IFR obviously involves estimating how many people are infected in real life.
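The relationship between the two rates can be sketched in a few lines of Python. If the same deaths sit in the numerator of both, then CFR divided by IFR equals the number of actual infections per reported case. Plugging in the two figures quoted above is only a rough illustration, since they were published at different times and do not share a death count. The variable names are mine.

```python
# CFR = deaths / reported cases
# IFR = deaths / all infections (reported or not)
# With the same deaths in both: CFR / IFR = all infections / reported cases

cfr_percent = 3.4    # figure widely reported when the post was written
ifr_percent = 0.26   # later CDC estimate cited in the update

# Assumption: both rates describe the same set of deaths, which is not
# strictly true here, so treat the result as a rough illustration.
infections_per_reported_case = cfr_percent / ifr_percent

print(f"implied infections per reported case: {infections_per_reported_case:.1f}")
```

Under that simplifying assumption, the gap between the two numbers corresponds to roughly 13 actual infections for every reported case.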
Is this just a trivial semantic issue? No. When people hear “mortality rate” – whatever term is used and however it’s calculated – they will, and did, reflexively use that number to guess their chances of dying if they get infected with the virus. That is especially true when CFR for Covid-19 is compared with IFR for other illnesses, such as influenza, a comparison that was extremely common. Whether or not this was an intentional effort to “get people to take this virus more seriously” or just an innocent mistake, it was dishonest.
Having said all that, if I had known how politicized this subject would become, I would have worded the post much differently. I guarantee that there is no possible way to infer any political leanings I might have from the original post.