Coronavirus disease (COVID-19) is caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) and has had a worldwide effect. On March 11, 2020, the World Health Organization (WHO) declared it a pandemic, pointing to the more than 118,000 cases of the coronavirus illness in over 110 countries and territories around the world at the time.
In this report, I hope to give an overview of the pandemic in the UK by trying to answer the following questions:
Special thanks to Emmadoughty and RamiKrispin for collating and sharing the datasets.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
uk1 = pd.read_csv('UK_by_day.csv',parse_dates=['Date'])
uk2 = pd.read_csv('UK_by_area.csv',parse_dates=['date'])
uk1.head()
uk2.head()
plt.plot(uk1.CumCases,label='Total cases')
plt.plot(uk1.CumDeaths,label='Total deaths')
plt.title('The number of cases and deaths')
plt.legend()
plt.show()
The graph indicates that the number of confirmed cases began to grow exponentially about one and a half months after the outbreak began (around March 10, 2020). One month later (April 10, 2020), the UK had recorded 70,272 confirmed cases and 8,958 deaths.
# Select the column before summing so non-numeric columns (e.g. dates) are not aggregated
country = uk2.groupby('country')['confirm'].sum().sort_values()
plt.pie(country,explode=(0,0,0,.1),autopct='%.1f%%',radius=1.3,labels=country.index,colors=('b','c','y','g'))
plt.title('Proportion of cases by country')
plt.show()
As shown above, the vast majority of infections happened in England, accounting for 88% of the total cases, while Northern Ireland was least affected with the corresponding figure of 1.5%.
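# Keep only the NHS regions of England, then total the confirmed cases per region
england = uk2.query("type=='NHSR' and country=='England'")
england.groupby('area')['confirm'].sum().sort_values().iloc[1:8].plot.barh()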
plt.title('The number of cases by region (in England)')
plt.show()
We can see that London has the largest number of confirmed cases among the regions of England, with almost twice as many as the second most affected region, the Midlands.
In epidemiology, the basic reproduction number (R0) of an infection can be thought of as the expected number of cases directly generated by one case.
The R0 value only applies when everyone in a population is completely vulnerable to the disease (which is the case for a novel coronavirus with no vaccine as yet). Therefore, all of the government's restrictive measures (e.g. lockdown, social distancing) are intended to lower the R0 value until it is less than 1.
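To make the threshold of 1 concrete, here is a minimal sketch that assumes every case infects the same average number of people in each generation. The function name `cases_per_generation` and the values 2.0 and 0.5 are illustrative assumptions, not estimates for COVID-19. Strictly speaking, the number that interventions change is usually called the effective reproduction number; the sketch simply calls it `r`.

```python
def cases_per_generation(r, generations, initial_cases=1.0):
    """Expected new cases in each generation if every case infects r others on average."""
    return [initial_cases * r ** n for n in range(generations + 1)]


# Illustrative values only: one above the threshold of 1, one below it
growing = cases_per_generation(r=2.0, generations=4)
shrinking = cases_per_generation(r=0.5, generations=4)

print(growing)    # [1.0, 2.0, 4.0, 8.0, 16.0]
print(shrinking)  # [1.0, 0.5, 0.25, 0.125, 0.0625]
```

With `r` above 1 each generation is larger than the last, while with `r` below 1 the outbreak shrinks toward zero.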
With limited epidemiological knowledge and a basic dataset, I will use a very crude model to estimate the reproductive rates and show the general numerical growth trend of confirmed cases and deaths:
uk1['R0Cases'] = uk1.CumCases / (uk1.CumCases - uk1.NewCases) - 1
uk1['R0Deaths'] = uk1.CumDeaths / (uk1.CumDeaths - uk1.NewDeaths) - 1
uk1.Date = pd.date_range('2020-1-25','2020-4-10',freq='D')
uk1.set_index('Date',inplace=True)
plt.figure(figsize=(8,4))
plt.plot(uk1['2020-2-28':].R0Cases,label='R0 of infection')
plt.plot(uk1['2020-2-28':].R0Deaths,label='R0 of deaths')
plt.title('The R0 values of infection and deaths')
plt.legend()
plt.show()
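It may help to spell out what the crude formula above measures. Since the previous day's cumulative total equals CumCases minus NewCases, the expression CumCases / (CumCases - NewCases) - 1 reduces to NewCases divided by the previous cumulative total, i.e. a day-over-day growth rate rather than a reproduction number in the strict epidemiological sense. The sketch below checks this on made-up numbers; the helper `daily_growth_rate` is hypothetical and uses plain lists so that it runs without the CSV files.

```python
import math


def daily_growth_rate(cumulative):
    """New cases on each day divided by the previous day's cumulative total."""
    rates = []
    for previous, current in zip(cumulative, cumulative[1:]):
        new = current - previous
        # Guard against division by zero before the first recorded case
        rates.append(new / previous if previous else float('nan'))
    return rates


cum_cases = [100, 120, 150, 180]  # made-up cumulative totals, not real data
rates = daily_growth_rate(cum_cases)

# Same quantity written the way the notebook formula writes it
new_cases = [b - a for a, b in zip(cum_cases, cum_cases[1:])]
notebook_style = [c / (c - n) - 1 for c, n in zip(cum_cases[1:], new_cases)]

print([round(r, 2) for r in rates])  # [0.2, 0.25, 0.2]
```

A value of 0.2 means the cumulative total grew by 20% in one day; a value drifting toward zero means the curve is flattening.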
Despite the simplicity of the formula, there is some information we can gain from this graph:
In the middle of an exponential curve, it is very hard to tell how it will develop and whether a turning point is approaching. However, for an exponential function y = f(x), if we plot the y values on the x-axis and the change in y per step (i.e. Δy) on the y-axis, both on a logarithmic scale, the resulting graph should be a straight line. A departure from that line signals that growth is no longer exponential.
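The reason is that for y = a·exp(kt) the daily change is proportional to y itself, so log(Δy) equals log(y) plus a constant, which is a line of slope 1. The sketch below checks this on a synthetic curve; the growth rate `k = 0.2`, the starting value 10 and the helper `loglog_slope` are assumptions chosen for illustration, not values fitted to any dataset.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log(ys) against log(xs)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    den = sum((a - mean_x) ** 2 for a in lx)
    return num / den


k = 0.2  # assumed daily growth rate of the synthetic curve
cumulative = [10 * math.exp(k * t) for t in range(30)]
daily = [b - a for a, b in zip(cumulative, cumulative[1:])]

# Pair each day's new cases with that day's cumulative total
slope = loglog_slope(cumulative[1:], daily)
print(round(slope, 3))  # 1.0
```

Once growth slows, the daily increase stops keeping pace with the cumulative total and the points fall below this straight line.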
This method has provided us with a simple and straightforward approach to show the trend:
Italy = pd.read_csv('Italy_by_day.csv',parse_dates=['date'],index_col=['date'])
sns.regplot(x='cumulative_cases',y='daily_positive_cases',data=Italy,order=4).set(xscale='log',yscale='log')
plt.title('The turning point in Italy')
plt.show()
Italy is one of the countries most affected by the coronavirus and currently has the highest death rate, with thousands of new cases confirmed daily and a huge number of total cases. However, from the graph above we can see that Italy is getting out of the crisis and the hardest times have passed: the fitted curve has bent downward, which means the number of new cases is no longer growing exponentially.
sns.regplot(x='CumCases',y='NewCases',data=uk1['2020-2-28':],order=4).set(xscale='log',yscale='log')
plt.title('The turning point in the UK')
plt.show()
Compared to Italy, the UK faces a more challenging situation at the moment. There is still no clear turning point in the graph, indicating that the disease is still spreading exponentially. However, toward the end of the curve the scatter points are more tightly clustered, and the fitted regression curve also bends downward. Therefore, with an appropriate degree of caution, I would suggest that we are approaching the turning point (the peak), although this remains uncertain.
plt.figure(figsize=(8,4))
plt.plot(uk1['2020-02-24':].CumCases,label='UK')
plt.plot(Italy.cumulative_cases,label='Italy')
plt.title('Number of confirmed cases')
plt.grid(linestyle='--')
plt.legend()
plt.show()
Given the information that:
Combining the data, graphs and analyses in this section, and assuming that the development of coronavirus in the UK will follow a similar pattern to Italy, we have reason to believe that the disease will reach a peak within the next few days in the UK, and the daily increase in cases will drop below 1,000 within the next 25 days.
Finally, we can examine the relationship between infection and mortality.
sns.pairplot(uk1[['NewCases','NewDeaths','CumCases','CumDeaths']],height=1.5,diag_kind='kde')
plt.show()
From this graph, we can tell that:
Therefore, we must lower the mortality rate as a matter of priority.
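As a rough illustration of the scale involved, the crude ratio of recorded deaths to confirmed cases can be computed from the April 10, 2020 totals quoted earlier in this report. This is only a sketch: it divides one cumulative total by the other, ignores the delay between diagnosis and death, and depends heavily on how many people were tested, so it should not be read as the true fatality rate of the infection.

```python
# Totals for the UK on April 10, 2020, as quoted earlier in this report
confirmed_cases = 70_272
deaths = 8_958

# Crude ratio of recorded deaths to confirmed cases
crude_ratio = deaths / confirmed_cases
print(f"{crude_ratio:.1%}")  # 12.7%
```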
This report has examined three aspects of the coronavirus pandemic in the UK, and the conclusions are as follows: