Probability is the heart and soul of statistics: at a very simple level, we collect data about something we wish to understand, then we use statistics to assess the likelihood—or probability—of the situation described by the data.
In practice, this simple idea gets complicated very quickly. Anyone who's been involved in a Six Sigma project can confirm that sometimes the data we collect, and the results of our statistical analyses, seem far from simple.
If you ask a statistician or mathematician about "probability," you probably need to get more specific. You know how Inuit peoples supposedly have dozens of words for snow, because they have such a detailed awareness of different types of snow? Same thing applies to statisticians and types of probability. You've got subjective probability, conditional probability, joint probability, physical probability, evidential probability...you get the idea.
Now, you may not need to worry about all of those variations unless you're either an Inuit or a statistician (or both). Snow is snow, and probability is probability, right?
Sure...but there's dry snow and then there's the heavy, wet snow that causes tree limbs to break. If you own an orchard, you'd better know the difference. Similarly, there's single-event probability and there's cumulative probability, and if you're looking at data, you need to understand the difference.
The simplest type of probability is the measure of the chance that a single event will occur. For example, let's say we have a fair, six-sided die. The probability that a single throw will be a 4 is 1/6, because only 1 of the six sides is a 4.
Simple, right? Similarly, the probability that a single roll of the die will be a 1 is 1/6. The same holds true for 2, and for 3, and for 5, and for 6. The single-event probability that a roll of the die will result in any one face you select is 1 in 6.
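If you like to see the counting spelled out, here is a minimal Python sketch of the same argument. It assumes a fair six-sided die, and the helper name `single_event_probability` is made up for illustration.

```python
from fractions import Fraction

FACES = [1, 2, 3, 4, 5, 6]  # assumption: a fair six-sided die

def single_event_probability(face, faces=FACES):
    """Probability that one roll shows `face`: favorable outcomes / all outcomes."""
    favorable = sum(1 for f in faces if f == face)
    return Fraction(favorable, len(faces))

print(single_event_probability(4))  # 1/6
```

Using `Fraction` keeps the answer exact as 1/6 rather than a rounded decimal.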
Cumulative probability measures the chance of two, three, or more events happening. There's just one catch involved, and it depends on how you combine the events. To add probabilities, the events need to be mutually exclusive: you can't have two events that occur at the same time. To combine events in a sequence, each event needs to be independent of the others: the outcome of a first event can't influence the probability of the next (which would be conditional probability).
An easy way to get both concepts is to think about tossing a coin: for any one toss, it cannot land on both heads and tails, right? So heads and tails are mutually exclusive. Moreover, getting a head or a tail on your first toss has no effect on whether you get a head or tail on your second toss. So successive coin tosses are independent events.
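The claim that the first toss has no effect on the second can be checked by listing every outcome of two tosses. The sketch below assumes a fair coin, and the helper `prob` is an illustrative name, not anything standard.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two tosses of a fair coin (an assumption):
# ('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Fraction of outcomes for which event(outcome) is True."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

p_first_head = prob(lambda o: o[0] == "H")
p_second_head = prob(lambda o: o[1] == "H")
p_both_heads = prob(lambda o: o == ("H", "H"))

# Independence: the joint probability equals the product of the two.
print(p_both_heads == p_first_head * p_second_head)  # True
```

Two heads in a row comes out to 1/4, which is exactly 1/2 times 1/2.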
The events in cumulative probability may be sequential, like coin tosses in a row, or they may be in a range. For example, if you're observing a response with three categories, the cumulative probability for an observation with response 2 would be the probability that the predicted response is 1 OR 2. So to find the odds of ONE of these two events occurring, we add—or accumulate—the chances of either one occurring.
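Here is a small Python sketch of the three-category case. The probabilities are invented purely for illustration; only the adding-up step reflects the idea described above.

```python
from fractions import Fraction
from itertools import accumulate

# Hypothetical probabilities for responses 1, 2, and 3 (made up; they sum to 1).
categories = [1, 2, 3]
probs = [Fraction(2, 10), Fraction(5, 10), Fraction(3, 10)]

# Running total: cumulative[k] is the probability of response k or lower.
cumulative = dict(zip(categories, accumulate(probs)))

print(cumulative[2])  # 7/10
```

With these made-up numbers, the cumulative probability for response 2 is the chance of response 1 plus the chance of response 2.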
If all this sounds a little confusing, that's okay. We can very easily illustrate the difference between cumulative and single-event probability by putting the data for rolling a die into a table. The table below shows the probability of getting a selected face value (1 through 6) when you throw a single die; the cumulative probability of getting a selected face value or less when you throw a single die; and finally the cumulative probability of getting a selected face value at least once when you throw 1 to 6 separate dice (or 1 die up to six times). In that last row, the column number is the number of dice thrown.
Face Value | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
Probability of rolling face value on a single die (odds of 1 possible event) | 1/6 or 16.7% | 1/6 or 16.7% | 1/6 or 16.7% | 1/6 or 16.7% | 1/6 or 16.7% | 1/6 or 16.7% |
Cumulative probability of rolling face value or less on a single die (odds of 1, 2, or more events) | 1/6 or 16.7% | 2/6 or 33.3% | 3/6 or 50% | 4/6 or 66.7% | 5/6 or 83.3% | 6/6 or 100% |
Cumulative probability of rolling a given face value at least once with multiple dice (column number = number of dice) | 1/6 or 16.7% | 11/36 or 30.6% | 91/216 or 42.1% | 671/1296 or 51.8% | 4651/7776 or 59.8% | 31031/46656 or 66.5% |
The odds that we'll roll a 1 on a single roll of the die will be 1/6, right? That's a single-event probability. But if we roll the die and want to know the probability that we will roll a 1 or a 2, that's cumulative probability, because it is the accumulated value of the odds of one OR the other happening. Obviously, the odds of rolling a 6 or a 5 or a 4 or a 3 or a 2 or a 1 on a single die will be 100%.
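The "accumulating" for a single die is just a running sum, which can be sketched in a few lines of Python. This assumes a fair die; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

single = [Fraction(1, 6)] * 6       # P(face) for faces 1..6, fair die assumed
at_most = list(accumulate(single))  # P(roll <= face), a running total

for face, p in zip(range(1, 7), at_most):
    print(f"P(roll <= {face}) = {p} = {float(p):.1%}")
```

Note that `Fraction` reduces its results, so 2/6 prints as 1/3 and 6/6 prints as 1.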
What if we throw the die multiple times? The probability of rolling a 6 with one die is 1/6, so the probability of not rolling a 6 is 5/6. If we roll two dice, we have 6 x 6 = 36 possible outcomes, and 5 x 5 = 25 of those will not include a 6, leaving 11 of the 36 possible outcomes that will.
With three dice, we have 6 x 6 x 6 = 216 outcomes, 5 x 5 x 5 = 125 of which don't include a 6, leaving 91 that do, and so on. So the cumulative probability of getting at least one 6 when you roll six dice is 66.5%, as shown in the table above, and as reader ThatCalicoCat points out in the comment below.
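The same reasoning works for any number of dice: take 1 minus the chance of rolling no 6 at all. The Python sketch below assumes fair, independent dice, and both function names are hypothetical. The second function double-checks the shortcut by brute-force counting, just like the 36-outcome and 216-outcome tallies above.

```python
from fractions import Fraction
from itertools import product

def p_at_least_one_six(n_dice):
    """Complement rule: 1 - P(no six on any of n independent fair dice)."""
    return 1 - Fraction(5, 6) ** n_dice

def p_by_counting(n_dice):
    """Brute-force check: count the outcomes that contain at least one 6."""
    rolls = product(range(1, 7), repeat=n_dice)
    hits = sum(1 for roll in rolls if 6 in roll)
    return Fraction(hits, 6 ** n_dice)

for n in range(1, 7):
    p = p_at_least_one_six(n)
    assert p == p_by_counting(n)  # both methods agree
    print(n, p, f"{float(p):.1%}")
```

For six dice, 5 x 5 x 5 x 5 x 5 x 5 = 15625 of the 46656 outcomes contain no 6, leaving 31031 that do, which is about 66.5%.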
So the next time you're analyzing your data—or placing a bet on the craps table in Vegas—make sure you understand the type of probability you're dealing with!