How Deadly Is This Ebola Outbreak?
The current Ebola outbreak in Guinea, Liberia, and Sierra Leone is making headlines around the world, and rightly so: it's a frightening disease, and last week the World Health Organization reported that its spread is outpacing the response. Nearly 900 of the more than 1,600 people infected during this outbreak have died, including some leading medical professionals trying to stanch the outbreak's spread. And yesterday, one of the American doctors who contracted the disease arrived back in the U.S. for treatment.
Many sources state that Ebola virus outbreaks have a case fatality rate of up to 90%, but a look at the data shows that the death rate varies significantly with the Ebola species, the location of the cases, and the year.
Plotting Ebola Outbreaks Since 1976
Infection with the Ebola virus causes a hemorrhagic fever. Symptoms most commonly appear 8 to 10 days after exposure, and include fever, headache, joint and muscle aches, and weakness. These symptoms quickly escalate to diarrhea, vomiting, stomach pain, lack of appetite, abnormal internal and external bleeding, and organ failure.
The disease first appeared in Africa in 1976, and sporadic outbreaks have occurred since then, as shown in graph 1, which plots data from the World Health Organization website. (You can download my Minitab project file, which includes all of the data used in this blog post, here.)
According to the Centers for Disease Control, of the five known species of the Ebola virus, only three have resulted in large outbreaks. The current outbreak is associated with the species Zaire ebolavirus (EBOV). The two other species that have been associated with large outbreaks are Bundibugyo ebolavirus (BDBV) and Sudan ebolavirus (SUDV).
Graphing the outbreak death rate over time can help us understand the impact of species, location, and year. But plotting raw outbreak death rates, as I did above, is not ideal because the number of cases (the sample size) differs across outbreaks, and a rate based on a handful of cases is far less reliable than one based on hundreds. Let's try a different approach.
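To make the sample-size concern concrete, here is a minimal Python sketch (not part of the Minitab analysis) that computes a 95% Wilson score interval for a death rate. It uses the rounded figures quoted earlier for the current outbreak (roughly 900 deaths among 1,600 cases) alongside a hypothetical outbreak with the same raw rate but only 16 cases. The function name and the small outbreak are assumptions made purely for illustration.

```python
import math

def wilson_interval(deaths, cases, z=1.96):
    """Approximate 95% Wilson score interval for a death rate (z=1.96)."""
    p = deaths / cases                      # raw death rate
    denom = 1 + z * z / cases
    center = (p + z * z / (2 * cases)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / cases + z * z / (4 * cases * cases)
    ) / denom
    return center - half_width, center + half_width

# Rounded figures from the post: ~900 deaths among ~1,600 cases.
large_outbreak = wilson_interval(900, 1600)

# HYPOTHETICAL small outbreak with the same raw rate (9 of 16), for contrast.
small_outbreak = wilson_interval(9, 16)

for label, (low, high) in [("1,600 cases", large_outbreak),
                           ("16 cases", small_outbreak)]:
    print(f"{label}: {low:.3f} to {high:.3f}")
```

Both outbreaks have the same raw death rate, yet the interval for the 16-case outbreak is far wider. That difference in precision is invisible on a plot of raw rates.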
Assessing Ebola Outbreaks with Binary Logistic Regression
Fitting a model that accounts for the different sample sizes, and then plotting the model's predictions over time, is more appropriate than simply graphing the raw fatality rates.
I put the data into Minitab Statistical Software and used binary logistic regression to fit a model with three predictors: year, Ebola virus species, and location of outbreak. I could not fit interactions among these factors because of the limited amount of data available.
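For readers without Minitab, the idea of fitting a logistic model to outbreak-level counts can be sketched in plain Python. The block below is a deliberately simplified illustration, not the analysis behind this post: it uses a single predictor (time) rather than year, species, and location, it fits by Newton's method, and the five outbreaks in it are made-up placeholder values rather than WHO data. The function and variable names are illustrative.

```python
import math

def predicted_rate(b0, b1, x):
    """Death rate implied by the model logit(p) = b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def fit_logistic_grouped(x, deaths, cases, max_iter=25, tol=1e-10):
    """Fit logit(p) = b0 + b1 * x to (deaths, cases) counts by Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # gradient of the binomial log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix (negative Hessian)
        for xi, d, n in zip(x, deaths, cases):
            p = predicted_rate(b0, b1, xi)
            resid = d - n * p            # observed minus expected deaths
            weight = n * p * (1.0 - p)   # larger outbreaks carry more weight
            g0 += resid
            g1 += resid * xi
            h00 += weight
            h01 += weight * xi
            h11 += weight * xi * xi
        # Newton step: solve the 2x2 system (information) * step = gradient.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

# PLACEHOLDER data (not WHO figures): five outbreaks, time centered at zero.
time = [-2, -1, 0, 1, 2]            # e.g. decades relative to a midpoint
cases = [300, 30, 50, 400, 150]
deaths = [270, 24, 35, 240, 75]

b0, b1 = fit_logistic_grouped(time, deaths, cases)
print(f"intercept={b0:.3f}, slope={b1:.3f}")
print(f"predicted death rate at time 0: {predicted_rate(b0, b1, 0):.3f}")
```

Because each outbreak enters the likelihood with its full case count, the 400-case outbreak in this sketch influences the fitted curve far more than the 30-case one, which is what a plot of raw rates fails to do. The results reported below come from the full Minitab model with all three predictors, not from this sketch.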
All three predictors had p-values below 0.001, indicating strong statistical significance:
I also created a scatterplot to illustrate the model's predicted death rates over time:
We can draw the following conclusions from the binary logistic regression analysis and the graph above:
- The death rate from Ebola decreases over time.
- The death rate is significantly different across species. After accounting for the effects of location and time, species SUDV and BDBV have lower death rates than EBOV. The current outbreak is EBOV.
- The death rate is significantly different across locations. After accounting for the effects of species and time, Gabon, Sudan, and the current outbreak location (Guinea, Sierra Leone, and Liberia) appear to have a lower death rate.
Assessing the Current EBOV Outbreak with Binary Logistic Regression
The current outbreak has a low death rate relative to previous EBOV outbreaks. Since the current location has not appeared before, we cannot tell whether this decreased death rate is due to improvements in treatment over time, the quality of care available in the location of the outbreak, or some other factor, such as better immunity to the virus in the region.
The graph below shows the EBOV death rate predictions from a binary logistic regression model fit to the EBOV data only.
The current outbreak is severe in terms of number of cases, but the death rate is lower than expected based on past EBOV outbreaks in different locations.
Seeing the Outbreak Day by Day
One final graph shows the number of new cases per day by location for the current outbreak.
The number of cases per day has fluctuated widely in Guinea, while Liberia and Sierra Leone have both seen an extremely rapid rise in cases per day since mid-July.
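A per-day series like this one can be derived from cumulative case counts. Here is a minimal sketch, assuming the counts arrive as cumulative totals on irregularly spaced report dates (an assumption about the data format, not something confirmed by the source). The report values are invented placeholders, not actual figures, and the function name is illustrative.

```python
from datetime import date

def new_cases_per_day(reports):
    """Average new cases per day between consecutive cumulative reports.

    reports: list of (report_date, cumulative_cases), sorted by date.
    Returns a list of (report_date, cases_per_day) for each later report.
    """
    rates = []
    for (d0, c0), (d1, c1) in zip(reports, reports[1:]):
        days = (d1 - d0).days
        if days <= 0:
            raise ValueError("report dates must be strictly increasing")
        rates.append((d1, (c1 - c0) / days))
    return rates

# PLACEHOLDER reports for one location (not actual WHO counts).
reports = [
    (date(2014, 7, 1), 100),
    (date(2014, 7, 8), 135),
    (date(2014, 7, 10), 165),
]

rates = new_cases_per_day(reports)
for report_date, per_day in rates:
    print(report_date.isoformat(), per_day)
```

Dividing by the number of days between reports keeps a long gap between updates from looking like a one-day spike.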
This is one graph that will change greatly from day to day as the outbreak runs its course. Let's hope the data quickly return to zero new cases per day for all locations.