Visualising COVID19 Timeline - Concepts and Caveats
The transformation of the COVID19 outbreak into a pandemic has resulted in, inter alia, a slew of data and related resources. Animated visualisations make these data accessible to the masses. There are, however, some key concepts that one needs to be aware of to interpret the information in such graphics astutely. This article explores some of these concepts and highlights important caveats to keep in mind when reading timeline graphs.
Linear scale vs Log scale
On a linear scale, the tick marks on the axis are spaced at a fixed increment: each tick is obtained from the previous one by addition (for example 0, 100, 200, 300). On a log scale, each tick is obtained from the previous one by multiplying by a fixed factor (for example 1, 10, 100, 1000). This makes log scale graphs less intuitive, and it takes some practice to read them. Let's plot some data to make the distinction clear. Suppose that we start with one infected individual on day one and that the number of infections doubles with each passing day. That is, the number of infections over the first few days would be 1, 2, 4, 8, 16, 32 and so on. The graphs below show this trend over 10 days, plotted with the y-axis on a linear (left) and a log (right) scale. Notice how different the two curves look!
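The doubling example can also be checked numerically. The short Python sketch below is an illustration, not the code behind the article's graphs. It rebuilds the 10-day series and compares the day-to-day steps on a linear axis with the steps on a log axis, where a point's position is the base-10 logarithm of its value. The variable names are made up for this example.

```python
import math

# Doubling example from the text: 1 infection on day 1, doubling every day.
days = list(range(1, 11))
infections = [2 ** (day - 1) for day in days]

# On a linear axis a point sits at its value; on a log axis it sits at log10(value).
log_positions = [math.log10(n) for n in infections]

# Day-to-day steps: they keep growing on the linear axis,
# but are the same size (log10 of 2) on the log axis.
linear_steps = [b - a for a, b in zip(infections, infections[1:])]
log_steps = [b - a for a, b in zip(log_positions, log_positions[1:])]

for day, n, pos in zip(days, infections, log_positions):
    print(f"day {day:2d}: infections = {n:4d}, log10 = {pos:.3f}")
```

Because every step on the log axis has the same size, steady doubling appears as a straight line on a log scale, while the same data curve sharply upwards on a linear scale.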
The left graph is somewhat easier to interpret than the one on the right. For example, if we want to know how many infected individuals there are on day 9, the left graph shows that it is between 200 and 300 (the exact number is 256). Getting this information from the right graph is rather difficult unless, of course, you are fluent with logarithms. So why even bother plotting on a log scale? When the data are skewed, a few large data points dominate a linear scale and squash the rest together; a log scale avoids this. This feature is especially valuable when analysing trends in time series data. To appreciate this, let's now look at the real time series data for the COVID19 outbreak. The graphs below show the timeline of confirmed cases for the top five countries (based on the total number of cases) during the months of March and April.
Looking at the graph with a linear y-axis (left), it seems that the United States has far more cases, while the other four countries have a comparable number of cases. This is true. However, what this graph fails to portray is that all five countries show a similar trend in the increase in the number of cases. This is well captured by plotting the same data on a log scale (right). And again, a well-trained eye looking at the log scale graph would immediately notice that, by the end of April, the US has more cases than the other countries by a factor of 10.
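To see why a log scale reveals both a shared trend and a fixed ratio, here is a small Python sketch. It uses made-up counts, not the real country data: two hypothetical series that grow at the same rate, one always 10 times larger than the other. The names `country_a`, `country_b` and `log_gap` are invented for this illustration.

```python
import math

# Hypothetical counts, NOT real data: both series double at every step,
# and country_b always has 10 times the cases of country_a.
country_a = [100 * 2 ** i for i in range(6)]
country_b = [10 * n for n in country_a]


def log_gap(larger, smaller):
    """Vertical distance between two values on a log10 axis."""
    return math.log10(larger) - math.log10(smaller)


# On a linear axis the gap between the two series keeps widening...
linear_gaps = [b - a for a, b in zip(country_a, country_b)]

# ...but on a log axis it stays at exactly one decade (log10 of 10 is 1).
log_gaps = [log_gap(b, a) for a, b in zip(country_a, country_b)]

print("linear gaps:", linear_gaps)
print("log10 gaps: ", [round(g, 3) for g in log_gaps])
```

On a log scale the two series would appear as parallel lines one decade apart, which is how a reader can read off a ratio of 10 directly from the vertical distance between curves.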
Total cases vs total cases per unit population
The extent of the COVID19 outbreak in a region can be gauged through the total number of cases within that region. However, we also need to know the total population of that region to assess the situation correctly. For instance, a country with a small population can have a higher ratio of total cases to total population than a country with a large population, even if the total number of cases is comparable in the two countries. The graphs below show the top 10 affected countries based on the total number of cases (left) and the total number of cases per million people (right). As can be seen clearly, smaller countries show up prominently when cases per unit population are plotted. Curiously, Spain and Switzerland appear in both graphs.
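The normalisation described here is a simple ratio. The Python sketch below shows how it can change a ranking. The regions "Largeland" and "Smallland" and all their numbers are invented for illustration and do not correspond to any real country.

```python
def cases_per_million(total_cases, population):
    """Total cases per one million inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return total_cases * 1_000_000 / population


# Hypothetical regions, NOT real data: name -> (total cases, population).
regions = {
    "Largeland": (200_000, 300_000_000),
    "Smallland": (20_000, 8_000_000),
}

# Rank once by raw totals and once by cases per million.
by_total = sorted(regions, key=lambda name: regions[name][0], reverse=True)
by_rate = sorted(regions, key=lambda name: cases_per_million(*regions[name]), reverse=True)

for name, (cases, population) in regions.items():
    print(f"{name}: {cases} cases, {cases_per_million(cases, population):.0f} per million")
print("ranked by total cases:      ", by_total)
print("ranked by cases per million:", by_rate)
```

In this made-up example the larger region has ten times as many cases in total, yet the smaller region has the higher number of cases per million, so the two rankings come out in opposite order.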
Total confirmed cases and testing capacity
The data on total cases is a function of the rate of diagnosis. As countries ramp up their testing capacity, there is an uptick in the number of total cases. Similarly, if the rate of testing goes down, for whatever reason, there would be a decline in new cases. However, this does not imply that the spread of the disease has slowed. In fact, some countries take time to get their testing infrastructure in place and, as a result, see a rise in new cases at a later time. The graph below shows the top 10 affected countries on the basis of new cases identified on 30 April. It is evident that some new countries appear here compared with the graph based on the total number of cases.
The graph above shows how the increase in the daily number of tests performed correlates well with the increase in the number of new cases diagnosed daily for India and Canada.
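One way to put a number on "correlates well" is the Pearson correlation coefficient between daily tests and daily new cases. The sketch below implements it in plain Python. The article does not say how its graph was produced, so this is only one possible check, and the two series are hypothetical numbers, not the figures for India or Canada.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily figures, NOT real data.
daily_tests = [1000, 2000, 4000, 8000, 16000]
daily_new_cases = [50, 90, 210, 390, 800]

r = pearson(daily_tests, daily_new_cases)
print(f"correlation between daily tests and daily new cases: {r:.3f}")
```

A coefficient close to 1 means the two series rise and fall together. On its own it does not say whether the spread of the disease has changed, which is the caveat this section makes about reading case counts without the testing data.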