
Lies, Damn Lies and Statistics
“There are three types of lies: lies, damn lies and statistics” warns Benjamin Disraeli, or perhaps Mark Twain, depending on which source you quote. Irrespective of whom you attribute the quote to, it remains a timely warning about the pitfalls of blindly believing in numbers. Statistics can lead, and has led, to some of the most consequential mistakes in both medicine and history.
Now, perhaps more than ever, it is important to take a considered and objective approach to the data. At the current time much of the globe is involved in a frightening war against the coronavirus. The reasons for this are primarily statistical.
Let me summarize, while plagiarising Wiki. Seven strains of human coronaviruses are known, of which four produce the generally mild symptoms of the common cold. There have also been potentially more serious strains.
The name “coronavirus” is derived from Latin corona, meaning “crown” or “wreath”. The name refers to the characteristic appearance of virions (the infective form of the virus) by electron microscopy, which have a fringe of surface projections creating an image reminiscent of a crown or of a solar corona.
Human coronaviruses were first discovered in the late 1960s. Many members of this family have since been identified, including SARS-CoV in 2003, HCoV NL63 in 2004, HKU1 in 2005, MERS-CoV in 2012, and SARS-CoV-2 (formerly known as 2019-nCoV) in 2019. Most of these have involved respiratory tract infections and the common cold.
Thus far, nothing too bad to worry about. Late last year, however, scientists discovered a new coronavirus, the cause of Covid 19 (the 19 refers to 2019, when it was discovered, in case you were wondering). The virus was initially discovered in the city of Wuhan, in China's Hubei province. It behaved much like any other cold or flu: it caused respiratory tract infections and in the elderly or vulnerable could precipitate death. As things progressed we saw alarming fatality rates in people infected.
A cold or flu will push vulnerable people over the edge, at a rate of between 250,000 and 750,000 deaths a year, depending on the data you use and the year you survey. Using complicated regression analysis, which is educated guessing, health officials think the death rate of the flu is about 0.1 to 0.3%. The discrepancy comes from the base rate of infection, which we can never measure but have to estimate. That means we estimate that between one and three people in every thousand who get infected will die. Mostly it is old or sick people.
Now the bad news: when health officials calculated the death rate for Covid 19 in Wuhan they got a rate of 4.6%. If you do the maths, it would be up to about 46 times more lethal than the flu. If we extrapolate that, we get about 34,000,000 deaths from Covid if it is as infective as the flu. Early data also suggested that it was more transmissible and infective than the flu, and we also saw a concerning number of deaths in middle-aged people, which is unusual for an average influenza.
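The arithmetic behind that comparison can be checked with a short sketch. The rates and death counts are the ones quoted above; the function name is illustrative, and pairing the highest flu death count with the lowest flu rate is an assumption chosen to reproduce the worst case.

```python
# Rough check of the flu-versus-Covid comparison, using the figures quoted above.
# Assumption (illustrative only): Covid infects as many people as the flu does.

def lethality_ratio(covid_rate, flu_rate):
    """How many times larger one death rate is than the other."""
    return covid_rate / flu_rate

covid_rate = 4.6                         # percent, early Wuhan estimate
flu_rate_low, flu_rate_high = 0.1, 0.3   # percent, estimated flu range
flu_deaths_high = 750_000                # upper yearly flu death figure

ratio_max = lethality_ratio(covid_rate, flu_rate_low)   # flu at its mildest
ratio_min = lethality_ratio(covid_rate, flu_rate_high)  # flu at its worst
projected_deaths = flu_deaths_high * ratio_max          # worst-case extrapolation

print(f"Covid looks {ratio_min:.0f} to {ratio_max:.0f} times more lethal than flu")
print(f"Worst-case projected deaths: {projected_deaths:,.0f}")
```

Under these assumptions the ratio runs from roughly 15 to 46, and the worst case lands at about 34.5 million deaths. Change any one input and the headline number moves with it.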
Covid spread out of China and then into Italy, of all places. In a short period of time we saw death rates calculated at 10%. In fact, at the time of writing the worldwide death rate was still above 4.6% and the death rate in Italy was well above 12%. In Germany and a number of other countries, however, death rates are under 1%.
So with these very contradictory statistics, let's see if we can make sense of the conflicting data.
a) The average age of coronavirus death in Italy is at the current time 81.
b) The median age of coronavirus death worldwide is 73.
c) By an interesting coincidence, life expectancy worldwide is 72.6.
d) Life expectancy in Italy is 82.54, but it is slightly lower for males at 80.5.
These of course are estimates, and different figures are being quoted. One current source of uncertainty for many epidemiologists is the huge discrepancy between mortality rates in Germany (less than 1%) and those in Italy (close to 12%).
One interpretation is that the coronavirus is, through some unique biological feature, particularly virulent in older persons, and that Italy has an old population.
Statistics might, however, offer another possible explanation for the unusual coronavirus figures.
“There are three types of lies: lies, damn lies and statistics” Benjamin Disraeli or Mark Twain
It is prudent to remember Mark Twain's prophetic words about statistics. With Covid, even the smallest change in methodology or baseline assumptions can cause disproportionately large changes in the final calculations. When it comes to mortality rates, attributing cause of death is not as easy as it might appear at first glance.
Except in trauma or other very specific causes of death, most people have a number of medical problems at the time of death, and deciding which is the major factor is not always clear-cut.
When old or sick people die they often have multiple illnesses, and deciding which should be considered the cause of death is complicated. In the case of corona, just because someone tests positive doesn't mean the virus was the cause of death. After all, many thousands of people have thus far recovered from the virus with few if any problems.
Currently, however, the methodology in Italy is to attribute 100% of those deaths to the coronavirus if a deceased person tests positive. Attributing every such death to corona would seem, at face value, to be a gross statistical error referred to as false causality. It creates the problem that the more we test people who have died or who are very sick, the more we risk pushing mortality calculations upwards, perhaps by very large margins. In statistics it is important never to forget a simple axiom: Correlation does not imply causation.
Correlation does not imply causation.
It is salient to know that on average about 1800 people die in Italy every day anyway; that is the baseline mortality rate. Over the last few weeks we have been seeing 600 to 800 deaths per day attributed to corona, but we have no data indicating that the absolute death rate is any higher at all.
If, for instance, we test all people who have died, we may get positive tests in situations where the virus was not in fact the cause of death. A way to establish whether there are higher rates of death attributable to corona would be to measure a direct increase in mortality above the expected level. We should be able to detect an increase in absolute mortality in the range of 30 to 40%. That data, not totally unexpectedly, is not currently available, and it is unclear if and when it might become available.
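As a rough sketch of that check, the expected rise can be worked out from the two figures quoted above. It assumes every attributed death is a genuinely additional death, which is exactly the point in dispute; the function name is illustrative.

```python
# Sketch: the rise in total daily deaths we should see if every death
# attributed to the virus were truly additional to the normal baseline.

def expected_excess_percent(attributed_per_day, baseline_per_day):
    """Attributed deaths as a percentage of the normal daily death count."""
    return 100 * attributed_per_day / baseline_per_day

baseline = 1800                 # average daily deaths in Italy (from the text)
attributed_range = (600, 800)   # daily deaths attributed to the virus

low, high = (expected_excess_percent(a, baseline) for a in attributed_range)
print(f"Expected rise in total deaths: {low:.0f}% to {high:.0f}%")
```

With these inputs the rise comes out at about 33% to 44%, close to the range quoted above. An increase of that size should be hard to miss in national death registers.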
The close correlation between the age at death of corona patients and normal mortality patterns raises the question of what percentage of deaths in people who test positive for corona should in fact be attributed to the virus. If the data does not end up demonstrating an appreciable rise in absolute mortality, then there is a real risk that the death rates in Italy are due more to a redistribution of mortality attribution than to a real increase in deaths due to corona.
The most serious problem in calculating mortality, however, is estimating the number of undiagnosed infections. Mortality rate is, in simplified terms, the number of deaths divided by the number of infections. There are huge variations in the estimates of undiagnosed cases, ranging from two to one hundred times the number of officially diagnosed people. The result is that, using exactly the same data as another person, it is possible to get mortality rates from 4.6% to 0.023% just by altering the assumption (guess) about the rate of undiagnosed infections.
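A minimal sketch of that sensitivity, assuming the multiplier means total infections per officially diagnosed case (other definitions shift the numbers slightly). The function name and the list of multipliers are illustrative.

```python
# Sketch: the same death count gives very different mortality rates
# depending on the guess about undiagnosed infections.
# Assumption: "multiplier" = total infections per diagnosed case.

def adjusted_rate(observed_rate_percent, infections_per_diagnosed_case):
    """Mortality rate once undiagnosed infections are added to the denominator."""
    return observed_rate_percent / infections_per_diagnosed_case

observed = 4.6  # percent, deaths divided by diagnosed cases only

rates = {m: adjusted_rate(observed, m) for m in (1, 2, 10, 100, 200)}
for multiplier, rate in rates.items():
    print(f"{multiplier:>3} infections per diagnosed case -> {rate:.3f}%")
```

Under this definition a multiplier of 2 gives 2.3% and a multiplier of 100 gives 0.046%; a figure of 0.023% corresponds to about 200 infections per diagnosed case. The deaths never change, only the guess in the denominator.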
There is an interesting statistical mistake that can occur in mortality calculations because of this issue. There is a simple way to increase the mortality rate for any disease to 100%: only test the dead. Think about it. If we diagnose a particular disease only in dead people, the mortality rate becomes 100%. If we only test very sick people, we artificially inflate mortality rates by the same process.
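The effect can be illustrated with a toy outbreak. Every number below is invented purely for illustration and is not drawn from any real data set; the group labels and function name are hypothetical.

```python
# Sketch: the apparent mortality rate depends on who gets tested.
# All figures are hypothetical, chosen only to make the arithmetic easy.

groups = [
    # (label, infected, deaths)
    ("mild, stays at home", 9_000, 0),
    ("hospitalised", 900, 20),
    ("intensive care", 100, 30),
]

def apparent_rate(tested_groups):
    """Deaths divided by infections, counting only the groups that were tested."""
    infected = sum(g[1] for g in tested_groups)
    deaths = sum(g[2] for g in tested_groups)
    return 100 * deaths / infected

total_deaths = sum(g[2] for g in groups)

true_rate = apparent_rate(groups)             # everyone tested
hospital_only = apparent_rate(groups[1:])     # only hospital patients tested
icu_only = apparent_rate(groups[2:])          # only intensive care tested
dead_only = 100 * total_deaths / total_deaths # only the dead tested

for label, rate in [("everyone", true_rate), ("hospital only", hospital_only),
                    ("intensive care only", icu_only), ("dead only", dead_only)]:
    print(f"Testing {label}: {rate:.1f}%")
```

The same fifty deaths produce an apparent rate anywhere from 0.5% to 100%, depending only on where the testing stops.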
“The line between the sublime and the ridiculous can be only one step”. Napoleon
The concern now is that the whole global response to the pandemic has been based on an estimated lethality rate for corona of at least 3%. If we have made a statistical error due to false causality and sampling error, and the rates are much lower than this, then the validity of the whole global response is in question. If the true rate is below 1%, we are looking at a powerful, but not that unusual, flu. If we start getting true rates of 0.3% or less, we are dealing with a garden-variety flu. Now if, heaven forbid, the final calculations are under 0.1%, we have a nasty case of athlete's foot. When I crunch the figures I can get estimates as low as 0.02% or as high as 8%, depending on the statistical sleight of hand I use.
We will not know the true statistics for sure until the crisis is well and truly over. Until that time it is prudent to remember the lessons of history. If the experts failed to heed Twain's warning, we will all have been witness to what will be remembered as the single greatest statistical error in human history.
The line between the sublime and the ridiculous can be only one step, especially when it comes to statistics.