How to know if we are beating the Coronavirus?
A data analyst's approach to exponentially increasing COVID-19 numbers
Into COVID-19 numbers
During the current outbreak of COVID-19, we are receiving a lot of information about record-breaking raw numbers in different countries every day. With cases growing exponentially, and digital news outlets profiting from commercializing our attention, the environment is ideal for keeping the population checking for updated numbers day after day. Yet there seems to be very little information about how we are actually dealing with the disease.
There is a lot of discussion about the accuracy of the numbers we have access to; some studies state that the real numbers (counting asymptomatic or unreported cases) could be between 10 and 100 times higher than what we see. The complex epidemiological and political characteristics of the outbreak (and of its fatal cases) also make it really hard to achieve an accurate projection of what we can expect over the next few days and weeks.
The intended goal of this paper is to present a different way of understanding exponentially growing numbers, with the information available, to get a glimpse of how we are doing in the fight against Corona.
This is one of the most common charts used to show where each country stands as the days pass:
Given that the spread of this particular disease follows exponential growth, and that it’s hard for our brains to really understand these numbers, we tend to react with our most common response to the unknown: fear.
As the disease spreads, the number of total cases is never going to get lower than it currently is (even if we put ourselves into lockdown for a year, or decide to leave Earth as Elon suggests), and by no means should it be used as a projection of the final number of cases. To get a fair estimate of the progress we're making (if any), it’s necessary to keep track of the historical growth of the virus: how many new cases appear every day.
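To make the step from total cases to daily new cases concrete, here is a minimal sketch. It assumes the data arrives as a list of cumulative totals, one per day; the function name and the sample numbers are made up for illustration and are not taken from the charts in this article.

```python
def daily_new_cases(cumulative):
    """Turn cumulative totals (one value per day) into new cases per day."""
    new_cases = []
    previous = 0  # assumption: the series starts at the beginning of the outbreak
    for total in cumulative:
        # Clamp at zero: reporting corrections can occasionally lower a total.
        new_cases.append(max(total - previous, 0))
        previous = total
    return new_cases


# Made-up cumulative totals for five consecutive days.
totals = [100, 150, 230, 350, 520]
print(daily_new_cases(totals))  # [100, 50, 80, 120, 170]
```

The cumulative list can only go up or stay flat, while the derived list can rise and fall, which is why it is the more useful one for judging progress.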
One can only try to imagine the reaction of French people on the morning of April 4th, as their country reported that 25,000 new cases had appeared overnight.
The random spread of the virus, plus our limited means for testing, can make our statistics look quite noisy, like a rollercoaster of emotions. So, for the scope of this paper, we are going to use the average of the prior 7 days to get a cleaner view of how we are doing in fighting the disease.
Here we can see the famous curves that need to be flattened to keep our health systems from collapsing under more daily (and quarantined) cases than they can handle.
The first point that I want to raise is that time doesn't seem to be a great dimension for understanding our progress, as the spread of the disease is only affected by how many sick people there are and how many people each of them will infect. A better dimension for analyzing the spread is total cases, using time only to group our count of daily cases.
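A 7-day average of this kind could be sketched as below. This is an illustration under two assumptions of mine: the window covers the current day plus the six days before it, and the first days of a series are averaged over whatever shorter history exists. The function name is hypothetical.

```python
def trailing_average(values, window=7):
    """Average each day together with the days before it, `window` days at most."""
    averaged = []
    for i in range(len(values)):
        start = max(0, i - window + 1)  # shorter window at the start of the series
        chunk = values[start:i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


# Made-up daily counts: a steady 10 cases per day with one reporting spike of 80.
noisy = [10] * 6 + [80] + [10] * 3
print(trailing_average(noisy))
```

In this made-up series the spike of 80 shows up as 20 in the averaged series, spread over the following week, so a single day of catch-up reporting no longer dominates the chart.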
The logarithmic approach
The difficult part with exponentially growing numbers is that it’s really hard for us to see how different countries compare: Brazil, UK and US, although their leaders seem to be taking similar economy-minded measures, are in totally different stages, and it’s hard to see how Brazil could be following the US’s path, just a couple of weeks behind.
For a better understanding of these dimensions, a logarithmic approach will be used to tackle the difficult task of understanding and comparing large exponential numbers, something our brains struggle with. When we plot growth vs. total cases on a logarithmic scale, we can see that, regardless of the country, they all seem to follow a similar trend.
It’s easier to see here how Jair, Boris and Donald may not be facing such different scenarios after all.
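Why would different countries land on the same diagonal? A minimal simulation, assuming idealized exponential growth at a fixed daily rate, shows that new cases are then always a fixed fraction of total cases, so on logarithmic axes the points fall on a line of slope 1 whatever the rate or starting size. The two "countries" below are invented, not real data.

```python
import math


def simulate(start, daily_rate, days):
    """Return (total_cases, new_cases) pairs for idealized exponential growth."""
    points = []
    total = float(start)
    for _ in range(days):
        new = total * daily_rate  # each day adds a fixed fraction of the total
        total += new
        points.append((total, new))
    return points


def loglog_slope(points):
    """Slope of log10(new) against log10(total), from first to last point."""
    (t0, n0), (t1, n1) = points[0], points[-1]
    return (math.log10(n1) - math.log10(n0)) / (math.log10(t1) - math.log10(t0))


# Two hypothetical countries: different sizes, different growth rates.
early_small = simulate(start=50, daily_rate=0.20, days=20)
late_large = simulate(start=5000, daily_rate=0.35, days=20)

print(round(loglog_slope(early_small), 6))
print(round(loglog_slope(late_large), 6))
```

Both slopes come out as 1 up to rounding. The growth rate only shifts the line up or down slightly, which is why countries at very different stages can still look like they are travelling along the same path.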
The exponential trajectory
Although Chinese numbers are highly debated (actually Xi is the only one defending their veracity), they can help us understand what we can expect of a country that has left the straight-line relationship these 2 indicators show on logarithmic axes. As a country progresses in fighting the disease, daily cases will start to decrease and the number of total cases will stop growing exponentially, so the country drops off the diagonal we see. South Korea, for example, managed to leave the path early, before reaching 10,000 total cases (April 2020).
The following charts will be updated automatically every day to track each country’s progress. As an Argentinian based in Berlin, with friends back home and spread out around the US, Mexico, Spain, UK, Finland, Italy, India, Indonesia, Australia and New Zealand, I’ve decided to filter out countries to keep the charts from having way too much information to read (there are over 180 countries with cases). A more inclusive chart of the same trend can be found here.
Can we turn it off?
By swapping the axis, and resorting to logarithmic plotting, we are now able to better understand the current situation of each of the countries that are fighting the spread. But there is another interesting point to this: time can actually be used in our favor. One of the few things we know about the disease is that it stays contagious for around 2 weeks, so when counting how many infected people there might be, we can disregard cases that were reported more than 14 days ago. That brings us to the 3rd and last metric of this paper: active cases.
As East Asian countries show us (the virus spread there first, so they have had more time to fight it), when plotting against active cases instead of total cases, we can actually invert the path!
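The active cases metric can be sketched as a rolling sum of daily new cases. This is a rough illustration that assumes every reported case counts as active for exactly 14 days, including the day it was reported; the function name and the sample series are made up.

```python
def active_cases(new_cases, contagious_days=14):
    """Estimate active cases as new cases reported in the last `contagious_days` days."""
    active = []
    for i in range(len(new_cases)):
        start = max(0, i - contagious_days + 1)
        active.append(sum(new_cases[start:i + 1]))
    return active


# Made-up outbreak: 10 new cases a day for two weeks, then none for two weeks.
series = [10] * 14 + [0] * 14
estimate = active_cases(series)
print(estimate[13])  # 140, the peak
print(estimate[-1])  # 0, every case has aged out of the window
```

Unlike total cases, this number can go down, which is what lets a country travel back along the path once new cases dry up.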
The need for disaggregated information
Each country can only fight as well as each of its states does. Unfortunately, information disaggregated by state and date is really hard to find. Being based in Berlin and quite selfishly aware of German numbers, I’ve decided to crawl the information the Robert Koch Institute provides day by day to follow up.
As of April 10th, Berlin has maintained for 20 days an overall average of around 180 new cases per day, with daily cases growing from 100 to 200 (x2) while the number of total cases soared from 1000 to 4500 (x4.5). So even if the growth seems to be going up, getting out of the exponential path is an important first step towards decreasing it. Prost!
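The Berlin figures can be checked with a few lines of arithmetic. The sketch below uses only the rounded numbers quoted in the paragraph above. The "on the diagonal" value rests on my assumption that, on the exponential path, daily cases grow in proportion to total cases.

```python
# Rounded figures quoted for Berlin over the 20 days up to April 10th.
daily_start, daily_end = 100, 200
total_start, total_end = 1000, 4500
days, average_daily = 20, 180

daily_factor = daily_end / daily_start  # 2.0
total_factor = total_end / total_start  # 4.5

# Assumption: staying on the diagonal means daily cases scale like total cases.
daily_if_on_diagonal = daily_start * total_factor  # 450.0

# Sanity check: 20 days at ~180 a day should roughly explain the rise in the total.
implied_increase = days * average_daily  # 3600
reported_increase = total_end - total_start  # 3500

print(daily_factor, total_factor, daily_if_on_diagonal)
print(implied_increase, reported_increase)
```

Under that assumption, staying on the path would have meant roughly 450 new cases a day instead of 200, and the implied increase of 3600 is close to the reported 3500, so the quoted figures are consistent with each other.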
Aggregating it back
In the end, the only way to finally beat the disease is if all of us do our part, leaving no one behind. So for me, the most relevant chart in this whole situation should always be:
Most of the charts in this paper will update automatically on a daily basis. Leaving a script to run every day is prone to many frustrating errors: should anything change in the data sources, it could crash unexpectedly. Yet there is a fatal error I’m looking forward to: when the virus is finally beaten (and it will be), the chart should try to reach the position (0 growth, 0 active cases), and the script will crash trying to plot (0,0) on a logarithmic scale. Then, and only then, will it be put to rest.
I’ve tried to avoid analyzing the current situation of different countries, such as ‘Argentina seems to be giving a good fight’, as the charts will continue updating and no one really knows how things will unfold. The point of this project was not to present an analysis of the countries' situation, but a different way of understanding the scarce information available.
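The reason (0,0) has no place on a logarithmic scale is that the logarithm of zero is undefined. The snippet below only illustrates that fact with Python's standard library. It is not the script behind these charts, and `log_point` is a hypothetical helper.

```python
import math


def log_point(growth, active):
    """Return (log10(active), log10(growth)), or None if the point cannot be placed."""
    if growth <= 0 or active <= 0:
        return None  # zero or negative values have no position on log axes
    return (math.log10(active), math.log10(growth))


print(log_point(growth=100, active=1000))
print(log_point(growth=0, active=0))  # None

# Without the guard, the logarithm itself refuses to cooperate.
try:
    math.log10(0)
except ValueError as error:
    print("log10(0) fails:", error)
```

A guard like this would keep a script alive, which is exactly what the paragraph above is hoping not to need.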