Statistics Tip 2: Reaching a Sweet Conclusion With Confidence Intervals

You Need Confidence Intervals. (REALLY.)

WHY?

See “Tip 1: Every sample statistic is a little bit wrong.”

ABSTRACT:

Scenario: We collect sample data, draw pictures of the data, and calculate statistics.
Problem: See Tip 1: a sample does not provide an ‘exact’ depiction of population information.
Solution: Confidence Intervals

DETAIL:

Scenario: We hope to estimate population statistics (formally known as ‘parameters’) with our sample data.

Problem: The population parameter is likely to be at least a little higher or lower than the sample statistic, but from the sample statistic alone we never know by how much.

Solution: Let Minitab use your sample statistics to create a 95% confidence interval for the population parameter. Then, if the 95% confidence interval is (a, b), you can conclude the following:

“We are 95% confident that the population statistic (parameter)
is between ‘a’ and ‘b’.”

EXAMPLE: (A true story, thanks to Chris, who attended a recent Minitab Public Training session)

Scenario: Chris and his family are big fans of a chewy candy, introduced to the US in New York City in 1896. This was the first candy that was individually wrapped as ‘penny candy’, and was originally developed as an economical and convenient (non-melting) substitute for chocolate. Chris’s family buys this chewy candy in boxes that contain multiple flavors.

Problem: Chris’s family’s favorite flavor seemed to be underrepresented among the many flavors in the boxes.

Solution: Back in August, Chris bought 2 boxes of these chewy, tasty treats, and had the family join in on sorting and counting the various flavors. Just as they’d suspected, the sample data that they gathered from these boxes supported their belief. The boxes contained 111 pieces each (so a total sample size of 222 pieces), and 5 flavors. So, if evenly represented, there should be about 22 pieces of each flavor in each box.

Chris’s family found a total of 77 cherry-flavored chews in the two boxes, or about 35% of the 222 pieces. And there were a mere 19 pieces of the orange-flavored treats between the 2 boxes, or only about 9% of the pieces they sampled.
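
The arithmetic behind these percentages is easy to check. Here is a minimal Python sketch using only the counts reported above (the variable names are just for illustration):

```python
# Counts reported in the story: 2 boxes of 111 pieces each, 5 flavors.
pieces_per_box = 111
boxes = 2
flavors = 5
n = pieces_per_box * boxes                    # total sample size

cherry, orange = 77, 19                       # counts across both boxes

expected_per_box = pieces_per_box / flavors   # if flavors were evenly split
cherry_share = cherry / n                     # sample proportion of cherry
orange_share = orange / n                     # sample proportion of orange

print(f"Total pieces sampled: {n}")
print(f"Expected per flavor per box if even: {expected_per_box:.1f}")
print(f"Cherry share of sample: {cherry_share:.1%}")
print(f"Orange share of sample: {orange_share:.1%}")
```

These are sample statistics only, so (per Tip 1) each one is a little bit wrong as an estimate of what goes into every box sold.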

Is this merely sampling variation? Or is this strong enough evidence to infer that this candy maker actually puts more of the cherry flavor in the box, and less of their favorite flavor?

Using Minitab to create confidence intervals for the percentage of pieces of each flavor, we can say the following:

“We are 95% confident that across all packages sold,
the % of cherry-flavored pieces is between 28.4% and 41.3%.”

AND

“We are 95% confident that across all packages sold,
the % of orange-flavored pieces is between 5.2% and 13.0%.”

If each of the 5 flavors was evenly represented, and this was sampling variation only, we would expect each of these intervals to contain 20%. Since neither interval contains 20% (the cherry interval lies entirely above it and the orange interval entirely below it), we have strong evidence (high confidence) that preference is given to some flavors.
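
For readers who want to see where intervals like these can come from, here is a minimal Python sketch. It assumes the "exact" (Clopper-Pearson) method for a single proportion, which may or may not match the settings used in Minitab for this example, so the endpoints it prints should be read as approximate. The function names `binom_cdf` and `exact_ci` are made up for this illustration.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) when X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def exact_ci(x, n, conf=0.95):
    """Clopper-Pearson interval for a proportion, found by bisection.

    Assumption: this is the 'exact' binomial method; other methods
    (for example, the normal approximation) give slightly different endpoints.
    """
    alpha = 1 - conf

    def solve(k, target):
        # binom_cdf(k, n, p) decreases as p increases, so bisect on p.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if binom_cdf(k, n, mid) > target:
                lo = mid      # p is still too small
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if x == 0 else solve(x - 1, 1 - alpha / 2)
    upper = 1.0 if x == n else solve(x, alpha / 2)
    return lower, upper

n = 222        # pieces in the two sampled boxes
even = 0.20    # share of each flavor if all 5 were evenly represented

for flavor, count in [("cherry", 77), ("orange", 19)]:
    low, high = exact_ci(count, n)
    inside = low <= even <= high
    print(f"{flavor}: {low:.1%} to {high:.1%}; contains 20%? {inside}")
```

Under this assumption, neither interval contains 20%, which is the same conclusion reached above.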

Chris’s family was curious enough to contact the manufacturer of these chewy, tasty treats. Chris reports that they responded promptly and enthusiastically. Their customer service representative told them that the amounts of each flavor are based on customer preferences, with targeted amounts as follows:

a whopping 36% cherry,

18% grape and 18% chocolate,

and only 14% raspberry and 14% orange.

That’s a sweet deal if you prefer the cherry-flavored chews,
but not if you want more of the other flavors of these chewy, tasty treats.
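
As a rough cross-check, the manufacturer's targets can be turned into expected counts for a 222-piece sample and set beside the two counts Chris's family reported. This is a minimal Python sketch. The dictionary names are illustrative, and counts for grape, chocolate, and raspberry were not reported in the story, so they are shown as "not reported".

```python
n = 222  # pieces in the two sampled boxes

# Target shares as relayed by the customer service representative.
targets = {
    "cherry": 0.36,
    "grape": 0.18,
    "chocolate": 0.18,
    "raspberry": 0.14,
    "orange": 0.14,
}

# Only these two counts appear in the story.
observed = {"cherry": 77, "orange": 19}

# Sanity check: the stated targets should account for the whole box.
assert abs(sum(targets.values()) - 1.0) < 1e-9

expected = {flavor: share * n for flavor, share in targets.items()}

for flavor, exp_count in expected.items():
    seen = observed.get(flavor, "not reported")
    print(f"{flavor:<10} target {targets[flavor]:.0%}  "
          f"expected about {exp_count:.0f}  observed {seen}")
```

Any single pair of boxes will still differ from the targets by some amount, which is exactly the sampling variation that confidence intervals are meant to account for.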

In my next post, I'll share some technical details summarizing confidence intervals.