Using Probability Distribution Plots to See Data Clearly

Minitab Blog Editor | 20 August, 2015

Topics: Data Analysis, Statistics

When we take pictures with a digital camera or smartphone, what the device really does is capture information in the form of binary code. At the most basic level, our precious photos are really just a bunch of 1s and 0s, but if we were to look at them that way, they'd be pretty unexciting.

In its raw state, all that information the camera records is worthless. The 1s and 0s need to be converted into pictures before we can actually see what we've photographed.

We encounter a similar situation when we try to use statistical distributions and parameters to describe data. There's important information there, but it can seem like a bunch of meaningless numbers without an illustration that makes them easier to interpret.

For instance, if you have data that follows a gamma distribution with a scale of 8 and a shape of 7, what does that really mean? If the distribution shifts to a shape of 10, is that good or bad? And even if you understand it, how easy would it be to explain to people who are more interested in outcomes than statistics?

Enter the Probability Distribution Plot

That's where the probability distribution plot comes in. Making a probability distribution plot using Minitab Statistical Software will create a picture that helps bring the numbers to life. Even novices can benefit from understanding their data’s distribution.

Let's take a look at a few examples.

Changing Shape

A building materials manufacturer develops a new process to increase the strength of its I-beams. The old process fit a gamma distribution with a scale of 8 and a shape of 7, whereas the new process has a shape of 10. 

The manufacturer does not know what this change in the shape parameter means, and the numbers alone don't tell the story. 

But if we choose Graph > Probability Distribution Plot in Minitab, select the "View Probability" option, and enter the parameters of these distributions, the impact of the change becomes clear.

Here's the original process, with the shape of 7:

And here is the plot for the new process, with a shape of 10: 

The probability distribution plots make it easy to see that the shape change increases the proportion of acceptable beams from 91.4% to 99.5%, an improvement of 8.1 percentage points. What's more, the right tail appears to be much thicker in the second graph, which indicates the new process creates many more unusually strong units. Hmmm...maybe the new process could ultimately lead to a premium line of products.
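
The two percentages can also be checked by hand, because a gamma distribution with an integer shape has a closed-form tail probability. The Python sketch below is a rough cross-check, not a reproduction of Minitab's calculation. The minimum acceptable strength of 30 is an assumption: the article does not state the cutoff, and 30 was chosen only because it reproduces the percentages quoted above. The function name `gamma_sf` is illustrative.

```python
import math

def gamma_sf(x, shape, scale):
    """P(X > x) for a gamma distribution with an integer shape parameter.

    For integer shape k:
    P(X > x) = sum_{n=0}^{k-1} exp(-x/scale) * (x/scale)**n / n!
    """
    lam = x / scale
    return sum(math.exp(-lam) * lam**n / math.factorial(n) for n in range(shape))

# Assumed cutoff: not given in the article, chosen to match its percentages.
MIN_STRENGTH = 30
SCALE = 8

p_old = gamma_sf(MIN_STRENGTH, shape=7, scale=SCALE)
p_new = gamma_sf(MIN_STRENGTH, shape=10, scale=SCALE)

# The mean of a gamma distribution is shape * scale.
print(f"old process (shape 7):  {p_old:.1%} acceptable, mean {7 * SCALE}")
print(f"new process (shape 10): {p_new:.1%} acceptable, mean {10 * SCALE}")
print(f"difference: {(p_new - p_old) * 100:.1f} percentage points")
```

Under that assumed cutoff, the computed proportions agree with the 91.4% and 99.5% shown in the plots. The mean strength rises from 56 to 80, which is why so much more of the curve sits to the right of the cutoff.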

Communicating Results

Suppose a chain of department stores is considering a new program to reduce discrepancies between an item’s tagged price and the amount charged at the register. Ideally, the system would eliminate any discrepancies, but a ± 0.5% difference is considered acceptable. However, implementing the program will be extremely expensive, so the company runs a pilot test in a single store.

In the pilot study, the mean improvement is small, and so is the standard deviation. When the company's board looks at the numbers, they don't see the benefits of approving the program, given its cost. 

The store's quality specialist thinks the numbers aren't telling the story, and decides to show the board the pilot test data in a probability distribution plot instead: 

By overlaying the before and after distributions, the specialist makes it very easy to see that price differences using the new system are clustered much closer to zero, and most are in the ± 0.5% acceptable range. Now the board can see the impact of adopting the new system. 

Comparing Distributions

An electronics manufacturer counts the number of printed circuit boards that are completed per hour. The sample data is best described by a Poisson distribution with a mean of 3.2. However, the company's test lab prefers to use an analysis that requires a normal distribution and wants to know if it is appropriate.

The manufacturer can easily compare the known distribution with a normal distribution using the probability distribution plot. If the normal distribution does not approximate the Poisson distribution, then the lab's test results will be invalid.

As the graph indicates, the normal distribution—and the analyses that require it—won’t be a good fit for data that follow a Poisson distribution with a mean of 3.2.
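
The mismatch can also be seen numerically. The sketch below tabulates Poisson probabilities next to the probabilities a normal curve assigns to the same counts. It assumes the comparison normal has the same mean and variance as the Poisson distribution (mean 3.2, standard deviation √3.2); the article does not say which normal parameters were used in the plot, and the helper names are illustrative.

```python
import math
from statistics import NormalDist

MEAN = 3.2
# Assumption: the comparison normal shares the Poisson mean and variance.
normal = NormalDist(mu=MEAN, sigma=math.sqrt(MEAN))

def poisson_pmf(k, mean):
    """P(N = k) for a Poisson distribution."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def normal_bin(k):
    """Normal probability of the interval (k - 0.5, k + 0.5)."""
    return normal.cdf(k + 0.5) - normal.cdf(k - 0.5)

print(" k  Poisson  normal")
for k in range(0, 11):
    print(f"{k:2d}  {poisson_pmf(k, MEAN):.4f}   {normal_bin(k):.4f}")

# Counts cannot be negative, yet the normal curve puts some mass below zero.
below_zero = normal.cdf(0)
print(f"normal probability below 0: {below_zero:.4f}")
```

The Poisson distribution is skewed to the right and cannot go below zero, whereas the normal curve is symmetric and assigns a few percent of its probability to impossible negative counts.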

Creating Probability Distribution Plots in Minitab

It's easy to use Minitab to create plots to visualize and to compare distributions and even to scrutinize an area of interest.

Let's say a market researcher wants to interview customers with satisfaction scores between 115 and 135. Minitab’s Individual Distribution Identification feature shows that these scores are normally distributed with a mean of 100 and a standard deviation of 15. However, the researcher can’t visualize where these subjects fall within the range of scores or what proportion of the entire distribution they represent.

Choose Graph > Probability Distribution Plot > View Probability.
Click OK.

From Distribution, choose Normal.
In Mean, type 100.
In Standard deviation, type 15.
Click on the "Shaded Area" tab. 

In Define Shaded Area By, choose X Value.
Click Middle.
In X value 1, type 115.
In X value 2, type 135.
Click OK.

Minitab creates the following plot: 

[Figure: normal distribution plot (mean 100, standard deviation 15) with the area between 115 and 135 shaded]

About 15% of sampled customers had scores in the region of interest (115-135). This is not a very large percentage, so the researcher may face challenges in finding qualified subjects.
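
The shaded area can be cross-checked outside Minitab. This minimal sketch uses Python's standard-library `NormalDist` with the mean and standard deviation given above; it is a check on the arithmetic, not a description of how Minitab computes the value.

```python
from statistics import NormalDist

# Satisfaction scores: normal with mean 100 and standard deviation 15.
scores = NormalDist(mu=100, sigma=15)

# Probability that a score falls between 115 and 135.
p_middle = scores.cdf(135) - scores.cdf(115)
print(f"P(115 < score < 135) = {p_middle:.3f}")
```

The result is just under 0.15, consistent with the roughly 15% shown in the plot.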

Using Probability Distribution Plots

Just like your camera when it assembles 1s and 0s into pictures, probability distribution plots let you see the deeper meaning of the numbers that describe your distributions. You can use these graphs to highlight the impact of changing distributions and parameter values, to show where target values fall in a distribution, and to view the proportions that are associated with shaded areas. These simple plots also clearly and easily communicate these advanced concepts to a non-statistical audience that might be confused by hard-to-understand concepts and numbers.