This Saturday is Leap Day, so some lucky 39-year-olds on the verge of 40 get to say they're only 10! What's the easiest way to determine the probability of being born on a certain day, like February 29? It's to assume every day of the year has an equal probability of being a birthday. But ... no surprise here ... the numbers actually disagree!
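For reference, here is a minimal sketch of what that equal-probability assumption implies for February 29. It is not part of the Minitab analysis. The variable names are made up for illustration, and the four-year cycle ignores the century rule, so a second figure for the full 400-year Gregorian cycle is included.

```python
from fractions import Fraction

# Assumption: every calendar day is equally likely to be a birthday.
# A simple four-year cycle has three 365-day years and one 366-day year,
# and exactly one of those days is February 29.
days_in_cycle = 3 * 365 + 366
p_leap_day = Fraction(1, days_in_cycle)

# The Gregorian calendar repeats every 400 years and contains 97 leap days.
gregorian_days = 400 * 365 + 97
p_leap_day_gregorian = Fraction(97, gregorian_days)

print(f"Four-year cycle: 1 in {days_in_cycle} ({float(p_leap_day):.6f})")
print(f"Gregorian cycle: 97 in {gregorian_days} ({float(p_leap_day_gregorian):.6f})")
```

Either way, the naive estimate is a little under 0.07%, which is the baseline the real data can be compared against.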
If you average the United Nations statistics for the United States from 1969 to 2015, excluding the years with no data (1976, 1977, 2007, and 2010), the seasonality looks like this:
Seeing this pattern made me think that the seasonality could be different in different countries around the world. While we don’t have good data for all of them, the United Nations does provide statistics covering various years for 148 countries or areas. The data are reported by month rather than by day, so we can’t determine the exact number of births on Leap Days, but we can see where births in February and March are most common.
The best thing about doing this analysis in Minitab is that it’s easy to remove unneeded rows from the data. For example, if you download and open the comma-separated values (CSV) file, you’ll see that the first row of the data is not a month. Instead, it’s the total for an entire year:
It’s easy to preserve the original worksheet in Minitab while you create a worksheet without the extra information. In this case, just follow these steps:
- Choose Data > Subset Worksheet.
- In Do you want to include or exclude rows, choose Exclude rows that match condition.
- In Column, enter Month.
Once you choose the column, Minitab shows you a list of all of the values in the column. You don’t have to know every value in the column to subset the data. Although I noticed the Total values right away because they were at the top, I can also remove the Unknown and Missing values at the same time.
- In Values, check the values to exclude.
- Enter a new worksheet name, such as Birth data. Click OK.
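If you want to see the same idea outside of Minitab, the steps above can be sketched in a few lines of Python. This is only an illustration: the sample rows, the column names other than Month, and the `subset_rows` helper are assumptions made up for the example, not the layout of the United Nations file or anything Minitab runs internally.

```python
import csv
import io

# Illustrative rows only. The country, counts, and column names are
# assumptions for this sketch, not values from the United Nations file.
SAMPLE_CSV = """Country,Year,Month,Births
Exampleland,2012,Total,1200
Exampleland,2012,January,100
Exampleland,2012,February,90
Exampleland,2012,Unknown,5
"""

# The Month values to exclude, as checked in the Values list above.
EXCLUDED_MONTHS = {"Total", "Unknown", "Missing"}


def subset_rows(rows, excluded=EXCLUDED_MONTHS):
    """Return a new list without rows whose Month is in `excluded`."""
    return [row for row in rows if row["Month"] not in excluded]


# Read the original data, then build a separate subset so the
# original rows stay untouched, like keeping the original worksheet.
original = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
birth_data = subset_rows(original)

print([row["Month"] for row in birth_data])
```

As in Minitab, the original rows are preserved and the filtered copy becomes the working data.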
With a few formulas to get birth rates for comparisons and a bit more subsetting, you can produce graphs that show the most popular countries for babies in February and March.
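The exact formulas aren’t shown here, but one plausible version is the share of a year’s births that fall in February and March. The sketch below assumes that definition. The `feb_mar_share` function and the sample counts are hypothetical and only illustrate the arithmetic.

```python
def feb_mar_share(monthly_births):
    """Return the fraction of a year's births that fall in February and March.

    `monthly_births` maps month names to birth counts for one country and year.
    """
    total = sum(monthly_births.values())
    if total == 0:
        raise ValueError("no births recorded for this year")
    return (monthly_births["February"] + monthly_births["March"]) / total


# Made-up counts for illustration: 100 births in every month.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
example_year = {month: 100 for month in MONTHS}

share = feb_mar_share(example_year)
print(f"February and March share: {share:.1%}")
```

A share like this lets you compare countries of very different sizes, which raw birth counts do not.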
When you look at all of the data the United Nations provides, the United States of America has the most recorded births by a wide margin. The other represented countries are on the continents of Africa, Europe, and Asia.
If you look at birth rates only in leap years from the United Nations data, the United Kingdom of Great Britain and Northern Ireland replaces Poland among the top 10 countries or areas with the most births.
Looking at only the most recent leap year changes the list considerably. The United States of America, Pakistan, the Republic of Korea, France, the Russian Federation, and Italy, which all appeared on the first two graphs, are gone. They are replaced by Brazil, Mexico, Turkey, Algeria, South Africa, and Germany, which did not appear on either of the previous two graphs.
It’s interesting to see that the United States has historically ranked high for the number of births in February and March, even though the original bar graph showed that February was the least popular month for births in the United States. This pattern persists in many of the other countries that appear on the list, but there are two interesting exceptions. The Republic of Korea shows a high number of births in February relative to the rest of the year. South Africa has a relatively even pattern for the number of reported births.
Want to learn more about subsetting worksheets?
Check out this support article, Subsetting a worksheet based on starting values of a row.