The easiest way to determine the probability of being born on a certain day is to assume that every day of the year has an equal probability of being a birthday. But academic scholarship tends to point to seasonal variation in births. If you average statistics from the United Nations, the seasonality in the United States of America from 1969 to 2013, excluding 1976 and 1977, looks like this:
Seeing this pattern made me wonder whether seasonality differs from country to country. While we don’t have good data for all of them, the United Nations does provide statistics covering various years for 145 countries or areas. The data are reported by month rather than by day, so we can’t determine the exact number of births on Leap Days, but we can see where births in February and March are most common.
The best thing about doing this analysis in Minitab is that it’s easy to remove unneeded rows from the data. For example, if you download and open the comma-separated values file, you’ll see that the first row of the data is not a month. Instead, it’s the total for an entire year:
It’s easy to preserve the original worksheet in Minitab while you create a worksheet without the extra information. In this case, just follow these steps:
- Choose Data > Subset Worksheet.
- In Do you want to include or exclude rows, choose Exclude rows that match condition.
- In Column, enter Month.
Once you choose the column, Minitab shows you a list of all of the values in the column. You don’t have to know every value in the column to subset the data. Although I noticed the Total values right away because they were at the top, I can also remove the Unknown and Missing values at the same time.
- In Values, check the values to exclude.
- Enter a new worksheet name, such as Birth data. Click OK.
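For readers working outside Minitab, the same exclusion can be sketched in Python. This is only an illustration, not how Minitab does it: the column names (`Month`, `Value`), the labels `Total`, `Unknown`, and `Missing`, the helper `subset_rows`, and the sample figures are all assumptions made up to mirror the layout described above.

```python
import csv
import io

# Made-up sample in the assumed layout: one row per month plus summary rows.
SAMPLE_CSV = """Country,Year,Month,Value
Exampleland,2012,Total,1200
Exampleland,2012,January,100
Exampleland,2012,February,95
Exampleland,2012,Unknown,5
"""

# Assumed labels for the non-month rows to exclude.
EXCLUDE = {"Total", "Unknown", "Missing"}


def subset_rows(rows, column="Month", exclude=EXCLUDE):
    """Return a new list without the rows whose `column` value is excluded.

    The input list is not modified, so the original data stays available,
    much as the original worksheet is preserved.
    """
    return [row for row in rows if row[column].strip() not in exclude]


original = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
birth_data = subset_rows(original)

for row in birth_data:
    print(row["Month"], row["Value"])
```

As in the worksheet approach, you don’t need to know every value in the column ahead of time: printing `sorted({row["Month"] for row in original})` lists the distinct values so you can decide which ones to exclude.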
With a few formulas to get birth rates for comparisons and a bit more subsetting, you can produce graphs that show the most popular countries for babies in February and March.
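The exact formulas aren’t spelled out here, so the following is one plausible reading, offered only as a sketch: express February and March births as a share of the year’s total so that locations of different sizes can be compared. The function name `feb_mar_share` and the monthly counts are hypothetical.

```python
def feb_mar_share(monthly_births):
    """Return the fraction of a year's births that fall in February and March.

    `monthly_births` maps month names to birth counts for a single year.
    """
    total = sum(monthly_births.values())
    if total == 0:
        raise ValueError("no births recorded for this year")
    return (monthly_births["February"] + monthly_births["March"]) / total


# Made-up counts for illustration only.
example_year = {
    "January": 100, "February": 90, "March": 110, "April": 100,
    "May": 100, "June": 100, "July": 100, "August": 100,
    "September": 100, "October": 100, "November": 100, "December": 100,
}

share = feb_mar_share(example_year)

# Baseline if every day of a leap year were equally likely: (29 + 31) / 366.
uniform_share = (29 + 31) / 366

print(f"February and March share: {share:.3f}, uniform baseline: {uniform_share:.3f}")
```

A share above the uniform baseline suggests that February and March are relatively popular birth months in that location for that year.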
When you look at all of the data the United Nations provides, Saint Helena: Ascension has the highest combined birth rates in February and March. Ghana, with a population over 26,000,000 today, is the most populous location in the top 10.
If you look at birth rates only in leap years from the United Nations data, Tajikistan moves into the top spot. South Africa moves into the top 10 locations and becomes the most populous location on the list.
Looking at only the most recent leap year changes the list considerably. Tajikistan, Ghana, and South Africa do not appear. The Republic of Korea is the most populous location among the top 10.
Because we’re looking at rates, none of these locations will actually have the most leap year babies. In raw numbers, countries with lots of births, such as China and India, will have the most babies born on February 29. But in terms of probability, many locations show a seasonality effect: for a randomly selected baby in a particular location, the probability of being born on Leap Day depends on when babies tend to be born there, and it is higher where February births are more common. And, of course, if you want a Leap Day baby and you’re inclined to occasionally stretch associations into causes, it’s not too early to start planning your 2020 vacation to Saint Helena.
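To make the probability argument concrete, here is a rough sketch of how a monthly share could be turned into a Leap Day estimate. It rests on an assumption that the monthly data cannot confirm: that births are spread evenly across the days of February. The function name `leap_day_probability` and the 9% February share are hypothetical.

```python
DAYS_IN_LEAP_FEBRUARY = 29
DAYS_IN_LEAP_YEAR = 366


def leap_day_probability(february_share):
    """Estimate P(born on February 29 | born in a leap year).

    `february_share` is the fraction of the leap year's births that occur
    in February. Assumes births are evenly spread within February.
    """
    if not 0 <= february_share <= 1:
        raise ValueError("february_share must be between 0 and 1")
    return february_share / DAYS_IN_LEAP_FEBRUARY


# Equal-probability baseline: February holds 29 of the 366 days.
uniform = leap_day_probability(DAYS_IN_LEAP_FEBRUARY / DAYS_IN_LEAP_YEAR)

# Hypothetical location where 9% of a leap year's births fall in February.
seasonal = leap_day_probability(0.09)

print(f"uniform: {uniform:.5f}, seasonal: {seasonal:.5f}")
```

Under the equal-probability assumption the estimate reduces to 1 in 366, so any location whose February share exceeds 29/366 comes out above that baseline.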
Want to see more about subsetting worksheets? Check out Subset a worksheet based on starting values of a row.