The Falling Child Project: Using Binary Logistic Regression to Predict Borewell Rescue Success
by Lion "Ari" Ondiappan Arivazhagan, guest blogger.
An alarming number of borewell accidents, especially involving small children, have occurred across India in the recent past. This is the second in a series of articles on borewell accidents in India. In the first installment, I used the G-chart in Minitab Statistical Software to predict the probability of children falling, while playing in the fields, into open borewells that farmers sink for agricultural and drinking water.
In this article, I will use predictive analytics, specifically Binary Logistic Regression, to estimate the probability of successfully rescuing a trapped child from two inputs: the child's age and gender.
In Minitab, we can use Stat > Regression > Binary Logistic Regression to create models when the response of interest (Rescue, in this case) is binary and only takes two values: successful or unsuccessful.
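For readers who want to see the model form, a binary logistic regression with an age-by-gender interaction can be written as below. This is the generic textbook form, not output from the analysis: p stands for the probability of the modeled event, Male is an assumed 0/1 indicator for gender, and the beta symbols are placeholders rather than fitted values. The interaction term is included because the analysis later reports a P-value for the age and gender interaction.

```latex
\[
\ln\!\left(\frac{p}{1-p}\right)
  = \beta_0 + \beta_1\,\text{Age} + \beta_2\,\text{Male}
  + \beta_3\,(\text{Age}\times\text{Male}),
\qquad
p = \frac{1}{1 + e^{-\left(\beta_0 + \beta_1\,\text{Age} + \beta_2\,\text{Male}
  + \beta_3\,\text{Age}\times\text{Male}\right)}}
\]
```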
Borewell accident data collected and provided by The Falling Child Project (www.fallingchild.org), a non-governmental organization (NGO) based in the United States, was used for this predictive analysis.
Part of the raw data provided by the NGO is shown in Table 1 below. A total of 62 borewell accident cases in India have been documented from 2001 to January 2015.
As part of the analysis, Minitab will predict probabilities for the events you are interested in, based on your model. The predicted probabilities for unsuccessful events versus the Predicted Age and Predicted Gender are shown in the scatterplot below.
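For readers without Minitab, the idea of fitting the model and then reading off predicted probabilities can be sketched in plain Python. This is only an illustration under stated assumptions: the twelve records below are made up and are not the Falling Child Project data, the names (design_row, fit_logistic, predict_unsuccessful) are hypothetical, and the fit uses simple gradient ascent rather than whatever estimation routine Minitab runs internally.

```python
import math

def sigmoid(z):
    """Logistic function: maps a linear predictor to a probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def design_row(age, is_male):
    """Intercept, age, gender indicator, and age-by-gender interaction."""
    return [1.0, float(age), float(is_male), float(age) * float(is_male)]

def linear_predictor(coefs, x):
    return sum(b * xi for b, xi in zip(coefs, x))

def log_likelihood(coefs, rows, outcomes):
    """Bernoulli log-likelihood of the outcomes under the logistic model."""
    total = 0.0
    for x, y in zip(rows, outcomes):
        p = sigmoid(linear_predictor(coefs, x))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_logistic(rows, outcomes, learning_rate=0.05, n_iter=2000):
    """Gradient ascent on the mean log-likelihood (a stand-in fitting method)."""
    coefs = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(n_iter):
        grad = [0.0] * len(coefs)
        for x, y in zip(rows, outcomes):
            p = sigmoid(linear_predictor(coefs, x))
            for j, xi in enumerate(x):
                grad[j] += (y - p) * xi
        coefs = [b + learning_rate * g / n for b, g in zip(coefs, grad)]
    return coefs

def predict_unsuccessful(age, is_male, coefs):
    """Predicted probability of an unsuccessful rescue for one child."""
    return sigmoid(linear_predictor(coefs, design_row(age, is_male)))

# HYPOTHETICAL toy records: (age in years, is_male, 1 = unsuccessful rescue).
toy_records = [
    (1, 1, 1), (2, 1, 1), (3, 1, 0), (4, 1, 1), (5, 1, 0), (6, 1, 0),
    (1, 0, 0), (2, 0, 0), (3, 0, 1), (4, 0, 0), (5, 0, 1), (6, 0, 1),
]
rows = [design_row(age, male) for age, male, _ in toy_records]
outcomes = [y for _, _, y in toy_records]

coefs = fit_logistic(rows, outcomes)

# Predicted probabilities by age and gender, as in a predicted-probability plot.
for age in range(1, 7):
    boy = predict_unsuccessful(age, 1, coefs)
    girl = predict_unsuccessful(age, 0, coefs)
    print(f"age {age}: boy {boy:.2f}, girl {girl:.2f}")
```

Plotting the printed values against age, with one series per gender, gives the same kind of picture as the scatterplot described above. With real data, a dedicated statistics package would also supply standard errors and P-values, which this sketch does not compute.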
We can predict, with 70% confidence, that the probability of an unsuccessful rescue is 15% higher for a male child of age 2 than for a female child of the same age. Surprisingly, however, above age 5, girls have about a 10% higher chance of an unsuccessful rescue attempt than boys.
I should note that one outlier, a male of age 60, was replaced with a male of age 6 to limit the outlier's undue influence on the analysis and its output.
From the Binary Logistic Regression analysis above, we can predict that boys of age 5 and above have a greater chance of being successfully rescued than girls of the same age. Although the analysis indicates a P-value of 0.736 for the interaction term, hinting that there is not much interaction between the child's age and gender in the predicted probabilities, the overall model's P-value is reasonable at 0.291, hinting at a moderate 70% confidence level in the model.
However, the scatterplot of predicted probabilities shown above paints a different picture. Age 5 seems to be the critical age beyond which girls have a lower chance of being rescued alive than boys do.
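The 70% figure appears to correspond to one minus the overall model P-value. That correspondence is an assumption made here for illustration; it is an informal reading of the P-value, not a confidence level reported by Minitab.

```latex
\[
1 - P_{\text{model}} = 1 - 0.291 = 0.709 \approx 70\%
\]
```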
My goal in performing this analysis and sharing my findings is to be helpful to the rescue teams that plan these rescue efforts, so that they can increase the chances of successfully rescuing every trapped child, boy or girl.
About the Guest Blogger:
Ondiappan "Ari" Arivazhagan is an honors graduate in civil / structural engineering from the University of Madras. He holds the PMP, PMI-SP, and PMI-RMP certifications from the Project Management Institute. He is also a Master Black Belt in Lean Six Sigma and has studied Business Analytics at IIM, Bangalore. He has 30 years of professional global project management experience in various countries and almost 14 years of teaching and training experience in project management, analytics, risk management, and Lean Six Sigma. He is the Founder-CEO of the International Institute of Project Management (IIPM), Chennai, and can be reached at firstname.lastname@example.org.
An earlier version of this article was published on LinkedIn.