Hypothesis Testing | MinitabBlog posts and articles about hypothesis testing, especially in the course of Lean Six Sigma quality improvement projects.
http://blog.minitab.com/blog/hypothesis-testing-2/rss
Sun, 03 May 2015 11:49:51 +0000
FeedCreator 1.7.3
Banned: P Values and Confidence Intervals! A Rebuttal, Part 1
http://blog.minitab.com/blog/adventures-in-statistics/banned-p-values-and-confidence-intervals-a-rebuttal-part-1
<p>Banned! In February 2015, editor David Trafimow and associate editor Michael Marks of the <em>Journal of Basic and Applied Social Psychology</em> <a href="http://www.tandfonline.com/doi/full/10.1080/01973533.2015.1012991#abstract" target="_blank">declared</a> that the null hypothesis statistical testing procedure is invalid. They promptly banned P values, confidence intervals, and hypothesis testing from the journal.</p>
<p>The journal now requires descriptive statistics and effect sizes. The editors also encourage large sample sizes, but they don’t require them.</p>
<p>This is the first of two posts in which I focus on the ban. In this post, I’ll start by showing how hypothesis testing provides crucial information that descriptive statistics alone just can't convey. In my next post, I’ll explain the editors' rationale for the ban—and why I disagree with them.</p>
P Values and Confidence Intervals Are Valuable!
<p>It’s really easy to show how P values and confidence intervals are valuable. Take a look at the graph below and determine which study found a true treatment effect and which one didn’t. The difference between the treatment group and the control group is the effect size, which is what the editors want authors to focus on.</p>
<p><img alt="Bar chart that compares the effect size of two studies" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/94164d874cc69ffe3763cf5cee64d47b/banned_pvalues.png" style="width: 576px; height: 384px;" /></p>
<p>Can you tell? The truth is that the results from both of these studies could represent either a true treatment effect or a random fluctuation due to sampling error.</p>
<p>So, how do you know? There are three factors at play.</p>
<ul>
<li><strong>Effect size</strong>: The larger the effect size, the less likely it is to be a random fluctuation. Clearly, Study A has a larger effect size. The large effect seems significant, but it’s not enough by itself.</li>
<li><strong>Sample size</strong>: A larger sample size allows you to detect smaller effects. If the sample size for Study B is large enough, its smaller treatment effect may very well be real.</li>
<li><strong>Variability in the data</strong>: The greater the variability, the more likely you’ll see large differences between the experimental groups due to random sampling error. If the variability in Study A is large enough, its larger difference may be attributable to random error rather than a treatment effect.</li>
</ul>
<p>The effect size from either study could be meaningful, or not, depending on the other factors. As you can see, there are scenarios where the larger effect size in Study A can be random error while the smaller effect size in Study B can be a true treatment effect.</p>
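<p>The interplay between these three factors is easy to check numerically. The sketch below runs a two-sample t-test from summary statistics using SciPy (all numbers are invented for illustration, not taken from any actual study): a "Study A" with a large effect but noisy data and small samples, and a "Study B" with a small effect but tight data and large samples.</p>

```python
from scipy import stats

# Hypothetical summary statistics (invented for illustration):
# Study A: large effect (20 units), high variability (sd 40), small samples (n 10)
p_a = stats.ttest_ind_from_stats(mean1=120, std1=40, nobs1=10,
                                 mean2=100, std2=40, nobs2=10).pvalue

# Study B: small effect (5 units), low variability (sd 10), large samples (n 100)
p_b = stats.ttest_ind_from_stats(mean1=105, std1=10, nobs1=100,
                                 mean2=100, std2=10, nobs2=100).pvalue

print(p_a)  # > 0.05: the big effect is NOT statistically significant here
print(p_b)  # < 0.05: the small effect IS statistically significant here
```

<p>With these particular numbers, the larger effect fails to reach significance while the smaller one succeeds, which is exactly the scenario described above.</p>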
<p>Presumably, these statistics will all be reported under the journal's new focus on effect size and descriptive statistics. However, assessing different combinations of effect sizes, sample sizes, and variability gets fairly complicated. The ban forces journal readers to use a subjective eyeball approach to determine whether the difference is a true effect. And this is just for <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/tests-of-means/why-use-2-sample-t/" target="_blank">comparing two means</a>, which is about as simple as it can get! (How the heck would you even perform multiple regression analysis with only descriptive statistics?!)</p>
<p>Wouldn’t it be nice if there was some sort of statistic that incorporated all of these factors and rolled them into one objective number?</p>
<p>Hold on . . . that’s the P value! The <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values" target="_blank">P value</a> provides an objective standard for everyone assessing the results from a study.</p>
<p>Now, let’s consider two different experiments that have studied the same treatment and have come up with the following two estimates of the effect size.</p>
<table>
<tr><th><strong>Effect Size Study C</strong></th><th><strong>Effect Size Study D</strong></th></tr>
<tr><td>10</td><td>10</td></tr>
</table>
<p>Which estimate is better? It is pretty hard to say which 10 is better, right? Wouldn’t it be nice if there was a procedure that incorporated the effect size, sample size, and variability to provide a range of probable values <em>and</em> indicate the precision of the estimate?</p>
<p>Oh wait . . . that’s the confidence interval!</p>
<p>If we create the <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-confidence-intervals-and-confidence-levels" target="_blank">confidence intervals</a> for Study C [-5 25] and Study D [8 12], we gain some very valuable information. The confidence interval for Study C is both very wide and contains 0. This estimate is imprecise, and we can't rule out the possibility of no treatment effect. We're not learning anything from this study. On the other hand, the estimate from Study D is both very precise and statistically significant.</p>
<p>The two studies produced the same point estimate of the effect size, but the confidence interval shows that they're actually very different.</p>
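<p>Intervals of roughly these shapes are easy to reproduce from summary statistics. The inputs below are reverse-engineered guesses chosen only to yield intervals near [-5 25] and [8 12]; they are not the actual study data.</p>

```python
import math
from scipy import stats

def ci_95(mean, sd, n):
    """Two-sided 95% t-based confidence interval from summary statistics."""
    margin = stats.t.ppf(0.975, df=n - 1) * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Hypothetical inputs chosen to give intervals of roughly the shapes above
lo_c, hi_c = ci_95(10, 33.9, 22)   # Study C: wide interval that crosses zero
lo_d, hi_d = ci_95(10, 6.3, 40)    # Study D: narrow interval that excludes zero

print(lo_c, hi_c)
print(lo_d, hi_d)
```

<p>Both studies hand the function the same point estimate of 10, yet the variability and sample size behind that estimate produce very different intervals.</p>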
<p>Focusing solely on effect sizes and descriptive statistics is inadequate. P values and confidence intervals contribute truly important information that descriptive statistics alone can’t provide. That's why banning them is a mistake.</p>
<p><a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics">See a graphical explanation of how hypothesis tests work</a>.</p>
<p>If you'd like to see some fun examples of hypothesis tests in action, check out my posts about the Mythbusters!</p>
<ul>
<li><a href="http://blog.minitab.com/blog/adventures-in-statistics/busting-the-mythbusters-are-yawns-contagious">Busting the Mythbusters with Statistics: Are Yawns Contagious?</a></li>
<li><a href="http://blog.minitab.com/blog/adventures-in-statistics/using-hypothesis-tests-to-bust-myths-about-the-battle-of-the-sexes">Using Hypothesis Tests to Bust Myths about the Battle of the Sexes</a></li>
</ul>
<p>The editors do raise some legitimate concerns about the hypothesis testing process. In my next post, I assess their arguments and explain why I believe a ban still is not justified.</p>
Data Analysis, Hypothesis Testing, Learning, Statistics, Statistics Help, Stats
Thu, 30 Apr 2015 12:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics/banned-p-values-and-confidence-intervals-a-rebuttal-part-1
Jim Frost
No Horsing Around with the Poisson Distribution, Troops
http://blog.minitab.com/blog/quality-data-analysis-and-statistics/no-horsing-around-with-the-poisson-distribution-troops
<p>In 1898, Russian economist Ladislaus Bortkiewicz published his first statistics book, <em>Das Gesetz der kleinen Zahlen</em> (<em>The Law of Small Numbers</em>), in which he included an example that eventually became famous for illustrating the Poisson distribution. <img alt="horses" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d76b2523d4819d498f66c0a10250df7b/horses.jpg" style="margin: 10px 15px; float: right; width: 250px; height: 154px;" /></p>
<p>Bortkiewicz researched the annual deaths by horse kicks in the Prussian Army from 1875 to 1894. Data were recorded for 14 different army corps, one of which was the Guard Corps. (According to one Wikipedia article on the subject, the Guard Corps comprised Prussia’s elite Guard units.) Let's take a closer look at his data and see what Minitab has to say using a Poisson goodness-of-fit test.</p>
<p>Here's the data set (thank you, <a href="http://www.math.uah.edu/stat/data/HorseKicks.html" target="_blank">University of Alabama in Huntsville</a>):</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/823a75a6dcf75897e05eaaa468d7350c/data_set.PNG" style="width: 997px; height: 466px;" /><br />
</p>
What Is the Poisson Distribution?
<p>As a review, the Poisson distribution is a discrete probability distribution for the counts of events that occur randomly in a given interval of time or space. The Poisson distribution has only one parameter, lambda, which is also its mean. To divert your attention just a little bit before we run our goodness-of-fit test, let’s look at how the distribution changes with different values of lambda. Go to <strong>Graph > Probability Distribution Plot > View Single</strong>. Select <em>Poisson</em> from the Distribution drop-down, enter <em>.5</em> for the mean, and press <em>OK</em>:</p>
<p align="center"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/770927a9991955cf769dc73e508af2f4/pic1.png" style="width: 415px; height: 334px;" /></p>
<p>After I created my first plot, I created three more probability distribution plots with lambda set to 2, 4, and 10. I then used Minitab’s Layout Tool, under the <a href="http://blog.minitab.com/blog/marilyn-wheatleys-blog/getting-the-most-out-of-your-text-data-part-iii">Editor menu</a>, to combine the four graphs into one.</p>
<p>As lambda increases, the graphs begin to resemble a normally distributed curve:</p>
<p style="text-align: center;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/3e77aef5baac0edd39721bdf9a57a0de/pic2.png" style="width: 577px; height: 385px;" /></p>
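<p>The same progression can be checked numerically rather than graphically (a sketch using SciPy in place of Minitab’s plots): the skewness of a Poisson distribution is 1/&radic;&lambda;, so it shrinks toward the symmetric, normal-like shape as lambda grows.</p>

```python
import numpy as np
from scipy import stats

# Same lambda values used for the four probability distribution plots
for lam in (0.5, 2, 4, 10):
    k = np.arange(0, 30)
    pmf = stats.poisson.pmf(k, lam)                       # P(X = k) for each count
    skewness = float(stats.poisson.stats(lam, moments="s"))  # equals 1/sqrt(lam)
    print(f"lambda={lam:>4}: skewness={skewness:.3f}")
```

<p>The printed skewness drops from about 1.41 at lambda = 0.5 to about 0.32 at lambda = 10, matching the visual flattening of the curves.</p>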
Does This Data Follow a Poisson Distribution?
<p>Interesting, right? But let's get back on track and test whether the overall data obtained by Bortkiewicz follow a Poisson distribution.</p>
<p>I first had to stack the data from 14 columns into one column. This is done via <strong>Data > Stack > Columns…</strong></p>
<p style="text-align: center;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/7a29a555220937423f326e10d1fe2475/pic3.png" style="width: 461px; height: 335px;" /></p>
<p>With the data stacked, I went to <strong>Stat > Basic Statistics > Goodness-of-Fit for Poisson…</strong>, filling out the dialog as shown below:</p>
<p align="center"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/b8dd1fc71f1cccbc277b3b35770b037e/pic4.png" style="width: 434px; height: 334px;" /></p>
<p>After I clicked OK, Minitab delivered the following results:</p>
<p style="text-align: center;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/b6e13edc14af23b6c5f1b27b1616e066/results.PNG" style="width: 339px; height: 216px;" /></p>
<p>The Poisson mean, or lambda, is 0.70. This means we can expect, on average, 0.70 deaths per corps per year. Had I known these statistics and served in an army corps at that time, I would have treated my horse like gold. Anything my horse wants, it gets.</p>
<p>Further down you’ll see a table showing the observed and expected counts for the number of deaths by horse kick. The expected counts mirror the observed counts quite closely. To further test the claim that this data can be modeled by a Poisson distribution, we can use the p-value for the goodness-of-fit test in the last section of the output.</p>
<p>The hypotheses for the Chi-Square Goodness-of-Fit Test for Poisson are:</p>
<p style="margin-left: 40px;">Ho: The data follow a Poisson distribution</p>
<p style="margin-left: 40px;">H1: The data do not follow a Poisson distribution</p>
<p>We are going to use an alpha level of 0.05. Since our p-value is greater than alpha, we do not have enough evidence to reject the null hypothesis that the horse kick deaths per year follow a Poisson distribution.</p>
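<p>The same test can be reproduced outside Minitab. The sketch below uses the classic tabulated counts for this data set (280 corps-years: 144 years with 0 deaths, 91 with 1, 32 with 2, 11 with 3, and 2 with 4) and SciPy’s chi-square test; the exact chi-square and p-values may differ slightly from Minitab’s output depending on how the tail categories are pooled.</p>

```python
import numpy as np
from scipy import stats

deaths = np.array([0, 1, 2, 3, 4])
observed = np.array([144, 91, 32, 11, 2])   # corps-years with that many deaths
n = observed.sum()                          # 280 corps-years
lam = (deaths * observed).sum() / n         # sample mean = 0.70

# Expected counts under Poisson(lam); lump P(X >= 4) into the last cell
expected = n * stats.poisson.pmf(deaths, lam)
expected[-1] = n * stats.poisson.sf(3, lam)

# ddof=1 because lambda was estimated from the data
chi2, p = stats.chisquare(observed, expected, ddof=1)
print(lam, chi2, p)   # p is well above 0.05: no evidence against the Poisson fit
```

<p>Pooling the tail into the last cell keeps every expected count from being too small, which is the usual requirement for a chi-square goodness-of-fit test.</p>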
<p>The chart below shows how close both the expected and observed values for deaths are to each other.</p>
<p style="text-align: center;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/8200b5c574e8824a563089db792909e8/pic5.png" style="width: 577px; height: 385px;" /></p>
<p>I've been thinking about what other data could have been collected to serve as potential predictors if we wanted to do a Poisson regression. We could then see if there were any significant relationships between our horse kick death counts and some factor of interest. Maybe corps location or horse breed could have been documented? Given that the unit of time is one year, that location or breed would have to remain constant for the entire year. For example, Corps 14 in 1893 must have remained entirely in “Location A” during that year, or every horse in a particular corps must have been of the same breed for that year.</p>
<p>According to <a href="http://equusmagazine.com/article/whyhorseskick_012307-8294">equusmagazine.com</a>, horses kick for six reasons:</p>
<ul>
<li>"I feel threatened."</li>
<li>"I feel good."</li>
<li>"I hurt."</li>
<li>"I feel frustrated."</li>
<li>"Back off."</li>
<li>"I'm the boss around here."</li>
</ul>
<p>Wouldn’t this have made for a great categorical variable?</p>
<p> </p>
Data Analysis, Fun Statistics, Hypothesis Testing, Statistics, Statistics Help
Tue, 14 Apr 2015 12:00:00 +0000
http://blog.minitab.com/blog/quality-data-analysis-and-statistics/no-horsing-around-with-the-poisson-distribution-troops
Andy Cheshire
Understanding Hypothesis Tests: Confidence Intervals and Confidence Levels
http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-confidence-intervals-and-confidence-levels
<p>In this series of posts, I show how hypothesis tests and confidence intervals work by focusing on concepts and graphs rather than equations and numbers. </p>
<p>Previously, I used graphs to <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-significance-levels-alpha-and-p-values-in-statistics" target="_blank">show what statistical significance really means</a>. In this post, I’ll explain both confidence intervals and confidence levels, and how they’re closely related to P values and significance levels.</p>
How to Correctly Interpret Confidence Intervals and Confidence Levels
<p><img alt="Illustration of confidence levels" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/a9bd1376510c8289a0daf15f5bcd376f/ci.gif" style="float: right; width: 327px; height: 224px;" />A confidence interval is a range of values that is likely to contain an unknown population parameter. If you draw a random sample many times, a certain percentage of the confidence intervals will contain the population mean. This percentage is the confidence level.</p>
<p>Most frequently, you’ll use confidence intervals to bound the mean or standard deviation, but you can also obtain them for regression coefficients, proportions, rates of occurrence (Poisson), and for the differences between populations.</p>
<p>Just as there is a common <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values" target="_blank">misconception of how to interpret P values</a>, there’s a common misconception of how to interpret confidence intervals. In this case, the confidence level is<em> <strong>not </strong></em>the probability that a specific confidence interval contains the population parameter.</p>
<p>The confidence level represents the theoretical ability of the analysis to produce accurate intervals if you are able to assess <em>many intervals</em> and you know the value of the population parameter. For a <em>specific</em> confidence interval from one study, the interval either contains the population value or it does not—there’s no room for probabilities other than 0 or 1. And you can't choose between these two possibilities because you don’t know the value of the population parameter.</p>
<p style="margin-left: 40px;">"The parameter is an unknown constant and no probability statement concerning its value may be made." <br />
<em><span style="line-height: 1.6;">—Jerzy Neyman, original developer of confidence intervals.</span></em></p>
<p>This will be easier to understand after we discuss the graph below . . .</p>
<p>With this in mind, how <em>do</em> you interpret confidence intervals?</p>
<p>Confidence intervals serve as good estimates of the population parameter because the procedure tends to produce intervals that contain the parameter. Confidence intervals are composed of the point estimate (the most likely value) and a margin of error around that point estimate. The margin of error indicates the amount of uncertainty that surrounds the sample estimate of the population parameter.</p>
<p>In this vein, you can use confidence intervals to assess the precision of the sample estimate. For a specific variable, a narrower confidence interval [90 110] suggests a more precise estimate of the population parameter than a wider confidence interval [50 150].</p>
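<p>A quick simulation makes the correct interpretation concrete (a sketch with an arbitrary made-up population, not the energy cost data): across many repeated samples, about 95% of the t-based intervals capture the true mean, even though any single interval either does or does not.</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n, reps = 100, 15, 25, 20_000   # made-up population and sample size

samples = rng.normal(mu, sigma, size=(reps, n))
means = samples.mean(axis=1)
ses = samples.std(axis=1, ddof=1) / np.sqrt(n)   # standard error of each mean
t_crit = stats.t.ppf(0.975, df=n - 1)

# Fraction of the 20,000 intervals that contain the true population mean
covered = (means - t_crit * ses <= mu) & (mu <= means + t_crit * ses)
print(covered.mean())   # close to 0.95
```

<p>Note that the 95% figure describes the procedure over many repetitions; it is not the probability that any one of those intervals contains mu, which is exactly Neyman's point above.</p>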
Confidence Intervals and the Margin of Error
<p>Let’s move on to see how confidence intervals account for that margin of error. To do this, we’ll use the same tools that we’ve been using to understand hypothesis tests. I’ll create a <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/introductory-concepts/basic-concepts/sampling-distribution/" target="_blank">sampling distribution</a> using probability distribution plots, the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/probability-distributions-and-random-data/distributions/t-distribution/" target="_blank">t-distribution</a>, and the variability in our data. We'll base our confidence interval on the <a href="http://support.minitab.com/datasets/FamilyEnergyCost.MTW">energy cost data set</a> that we've been using.</p>
<p>When we looked at <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-significance-levels-alpha-and-p-values-in-statistics" target="_blank">significance levels</a>, the graphs displayed a sampling distribution centered on the null hypothesis value, and the outer 5% of the distribution was shaded. For confidence intervals, we need to shift the sampling distribution so that it is centered on the sample mean and shade the middle 95%.</p>
<p><img alt="Probability distribution plot that illustrates how a confidence interval works" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/80de5f2397507752d74ffff86fbd94ea/ci_sample_mean.png" style="width: 576px; height: 384px;" /></p>
<p>The shaded area shows the range of sample means that you’d obtain 95% of the time using our sample mean as the point estimate of the population mean. This range [267 394] is our 95% confidence interval.</p>
<p>Using the graph, it’s easier to understand how a specific confidence interval represents the margin of error, or the amount of uncertainty, around the point estimate. The sample mean is the most likely value for the population mean given the information that we have. However, the graph shows it would not be unusual at all for other random samples drawn from the same population to obtain different sample means within the shaded area. These other likely sample means all suggest different values for the population mean. Hence, the interval represents the inherent uncertainty that comes with using sample data.</p>
<p>You can use these graphs to calculate probabilities for specific values. However, notice that you can’t place the population mean on the graph because that value is unknown. Consequently, you can’t calculate probabilities for the population mean, just as Neyman said!</p>
Why P Values and Confidence Intervals Always Agree About Statistical Significance
<p>You can use either P values or confidence intervals to determine whether your results are statistically significant. If a hypothesis test produces both, these results will agree.</p>
<p>The confidence level is equivalent to 1 – the alpha level. So, if your significance level is 0.05, the corresponding confidence level is 95%.</p>
<ul>
<li>If the P value is less than your significance (alpha) level, the hypothesis test is statistically significant.</li>
<li>If the confidence interval does not contain the null hypothesis value, the results are statistically significant.</li>
<li>If the P value is less than alpha, the confidence interval will not contain the null hypothesis value.</li>
</ul>
<p>For our example, the P value (0.031) is less than the significance level (0.05), which indicates that our results are statistically significant. Similarly, our 95% confidence interval [267 394] does not include the null hypothesis mean of 260 and we draw the same conclusion.</p>
<p>To understand why the results always agree, let’s recall how both the significance level and confidence level work.</p>
<ul>
<li>The significance level defines the distance the sample mean must be from the null hypothesis to be considered statistically significant.</li>
<li>The confidence level defines the distance between the sample mean and the confidence limits.</li>
</ul>
<p>Both the significance level and the confidence level define a distance from a limit to a mean. Guess what? The distances in both cases are exactly the same!</p>
<p>The distance equals the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/basics/what-is-a-critical-value/" target="_blank">critical t-value</a> * <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/tests-of-means/what-is-the-standard-error-of-the-mean/" target="_blank">standard error of the mean</a>. For our energy cost example data, the distance works out to be $63.57.</p>
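<p>That equivalence is easy to verify by brute force (a sketch with synthetic data, not the actual energy cost data set): for every simulated sample, the two-sided P value falls below alpha exactly when the null value falls outside the matching confidence interval.</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
null_mean, alpha = 260, 0.05

agree = []
for _ in range(1000):
    sample = rng.normal(300, 150, size=25)   # synthetic energy-cost-like data
    p = stats.ttest_1samp(sample, null_mean).pvalue

    # Matching 95% CI: the same critical t-value * standard error distance
    margin = stats.t.ppf(1 - alpha / 2, df=len(sample) - 1) * stats.sem(sample)
    outside = not (sample.mean() - margin <= null_mean <= sample.mean() + margin)

    agree.append((p < alpha) == outside)

print(all(agree))   # True: the two criteria never disagree
```

<p>They never disagree because both comparisons measure the same critical-t-times-standard-error distance, just from opposite ends.</p>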
<p>Imagine this discussion between the null hypothesis mean and the sample mean:</p>
<p><strong>Null hypothesis mean, hypothesis test representative</strong>: Hey buddy! I’ve found that you’re statistically significant because you’re more than $63.57 away from me!</p>
<p><strong>Sample mean, confidence interval representative</strong>: Actually, I’m significant because <em>you’re</em> more than $63.57 away from <em>me</em>!</p>
<p>Very agreeable aren’t they? And, they always will agree as long as you compare the correct pairs of P values and confidence intervals. If you compare the incorrect pair, you can get conflicting results, as shown by common mistake #1 in this <a href="http://blog.minitab.com/blog/real-world-quality-improvement/3-common-and-dangerous-statistical-misconceptions" target="_blank">post</a>.</p>
Closing Thoughts
<p>In statistical analyses, there tends to be a greater focus on P values and simply detecting a significant effect or difference. However, a statistically significant effect is not necessarily meaningful in the real world. For instance, the effect might be too small to be of any practical value.</p>
<p>It’s important to pay attention to both the magnitude and the precision of the estimated effect. That’s why I'm rather fond of confidence intervals. They allow you to assess these important characteristics along with the statistical significance. You'd like to see a narrow confidence interval where the entire range represents an effect that is meaningful in the real world.</p>
<p>For more about confidence intervals, read my post where I <a href="http://blog.minitab.com/blog/adventures-in-statistics/when-should-i-use-confidence-intervals-prediction-intervals-and-tolerance-intervals">compare them to tolerance intervals and prediction intervals</a>.</p>
<p>If you'd like to see how I made the probability distribution plot, please read: <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-create-a-graphical-version-of-the-1-sample-t-test-in-minitab">How to Create a Graphical Version of the 1-sample t-Test</a>.</p>
Data Analysis, Hypothesis Testing, Learning, Statistics, Statistics Help, Stats
Thu, 02 Apr 2015 12:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-confidence-intervals-and-confidence-levels
Jim Frost
How Could You Benefit from a Box-Cox Transformation?
http://blog.minitab.com/blog/applying-statistics-in-quality-projects/how-could-you-benefit-from-a-box-cox-transformation
<p>Imagine that you are watching a race and that you are located close to the finish line. When the first and fastest runners complete the race, the differences in times between them will probably be quite small.</p>
<p>Now wait until the last runners arrive and consider their finishing times. For these slowest runners, the differences in completion times will be extremely large. This is because at longer race times, a small difference in speed has a large impact on completion time, whereas for the fastest runners, the same difference in speed has only a small (but decisive) impact on arrival times.</p>
<p>This phenomenon is called “<a href="http://blog.minitab.com/blog/adventures-in-statistics/curing-heteroscedasticity-with-weighted-regression-in-minitab-statistical-software">heteroscedasticity</a>” (non-constant variance). In this example, the amount of variation depends on the average value (small variation for shorter completion times, large variation for longer times).</p>
<p><img alt="" spellcheck="true" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/cb6a4d6b498b3525a6d18d579db20557/race.JPG" style="width: 781px; height: 120px;" /></p>
<p>This distribution of running times data will probably not follow the familiar bell-shaped curve (a.k.a. the normal distribution). The resulting distribution will be asymmetrical with a longer tail on the right side. This is because there's small variability on the left side with a short tail for smaller running times, and larger variability for longer running times on the right side, hence the longer tail.</p>
<p><img alt="" spellcheck="true" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/a70fc75a0255884e65f231ac072d42c2/distribution_plot.jpg" style="width: 578px; height: 344px;" /></p>
<p>Why does this matter?</p>
<ul>
<li>Model bias and spurious interactions: If you are performing a regression or a design of experiments (any statistical modelling), this asymmetrical behavior may lead to a bias in the model. If a factor has a significant effect on the average speed, because the variability is much larger for a larger average running time, many factors will seem to have a stronger effect when the mean is larger. This is not due, however, to a true factor effect but rather to an increased amount of variability that affects all factor effect estimates when the mean gets larger. This will probably generate spurious interactions due to a non-constant variation, resulting in a very complex model with many (spurious and unrealistic) interactions.</li>
<li><strong>Biased capability estimates</strong>: If you are performing a standard capability analysis, this analysis is based on the normality assumption. A substantial departure from normality will bias your capability estimates.</li>
</ul>
The Box-Cox Transformation
<p>One solution to this is to transform your data toward normality using a Box-Cox transformation. Minitab will select the best mathematical function for this data transformation. The objective is for the transformed data to follow a normal distribution and have a constant variance.</p>
<p>Consider the asymmetrical distribution below:</p>
<p><img alt="" spellcheck="true" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/796a7b0d27c6613ac17f983c839701e5/transformed_distribution.jpg" style="width: 515px; height: 326px;" /></p>
<p> <img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/9b0c5998839682f685055bb5168ab540/log_function.JPG" style="width: 437px; height: 313px;" /></p>
<p>If a logarithmic transformation is applied to this distribution, the differences between smaller values will be expanded (because the slope of the logarithmic function is steeper when values are small) whereas the differences between larger values will be reduced (because of the very moderate slope of the log function for larger values). If you inflate differences in the left tail and reduce differences in the right tail, the result is a symmetrical normal distribution, and a variance that is now constant (whatever the mean). This is the reason why the <a href="http://www.minitab.com/products/minitab">Minitab Assistant</a> suggests a Box-Cox transformation whenever possible for non-normal data, and why in the Minitab regression and DOE (design of experiments) dialog boxes, the Box-Cox transformation is an option that anyone may consider when residuals need to be transformed toward normality.</p>
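<p>The effect is easy to demonstrate with SciPy’s Box-Cox implementation (a sketch on simulated right-skewed data, not Minitab’s algorithm; for this data the optimal lambda happens to land near 0, which corresponds to the log transform):</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Simulated right-skewed "completion times" with a long right tail
times = rng.lognormal(mean=5.0, sigma=0.5, size=500)

# boxcox searches for the lambda that makes the data most normal
transformed, lam = stats.boxcox(times)

print(stats.skew(times))        # strongly positive before the transformation
print(lam)                      # the lambda SciPy selected (near 0 for this data)
print(stats.skew(transformed))  # near 0: roughly symmetric afterwards
```

<p>A lambda of 0 means a log transform, 0.5 a square root, 1 no transformation at all; the search over lambda is what makes Box-Cox more general than simply taking logs.</p>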
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/fc4ec75ca192d81dabf8556ebdf751e8/transformation.JPG" style="width: 430px; height: 611px;" /></p>
<p>The diagram above illustrates how, thanks to a Box-Cox transformation, performed by the Minitab Assistant (in a capability analysis), an asymmetrical distribution has been transformed into a normal symmetrical distribution (with a successful normality test).</p>
Box-Cox Transformation and Variable Scale
<p>Note that Minitab will search for the best transformation function, which may not necessarily be a logarithmic transformation.</p>
<p>As a result of this transformation, the physical scale of your variable may be altered. When looking at a capability graph, you may not recognize the typical values of your variable on the transformed scale. However, the estimated Pp and Ppk capability indices will be reliable, since they are based on a normal distribution. Similarly, in a regression model, you need to be aware that the coefficients will be modified, although the transformation is still useful for removing spurious interactions and identifying the factors that are really significant.</p>
Data Analysis, Design of Experiments, Hypothesis Testing, Learning, Quality Improvement, Regression Analysis, Statistics, Statistics Help, Stats
Mon, 30 Mar 2015 12:00:00 +0000
http://blog.minitab.com/blog/applying-statistics-in-quality-projects/how-could-you-benefit-from-a-box-cox-transformation
Bruno Scibilia
How to Create a Graphical Version of the 1-sample t-Test in Minitab
http://blog.minitab.com/blog/adventures-in-statistics/how-to-create-a-graphical-version-of-the-1-sample-t-test-in-minitab
<p>This is a companion post for a series of blog posts about understanding hypothesis tests. In this series, I create a graphical equivalent to a 1-sample t-test and confidence interval to help you understand how it works more intuitively.</p>
<p>This post focuses entirely on the steps required to create the graphs. It’s a fairly technical and task-oriented post designed for those who need to create the graphs for illustrative purposes. If you’d instead like to gain a better understanding of the concepts behind the graphs, please see the following posts:</p>
<ul>
<li><a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics" target="_blank">Understanding Hypothesis Tests: Why We Need to Use Hypothesis Tests</a></li>
<li><a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-significance-levels-alpha-and-p-values-in-statistics" target="_blank">Understanding Hypothesis Tests: The Significance Level and P Values</a></li>
<li><a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-confidence-intervals-and-confidence-levels" target="_blank">Understanding Hypothesis Tests: Confidence Intervals and Confidence Levels</a></li>
</ul>
<p>To create the following graphs, we’ll use Minitab’s probability distribution plots in conjunction with several statistics obtained from the 1-sample t output. If you’d like more information about the formulas that are involved, you can find them in Minitab at: <strong>Help > Methods and Formulas > Basic Statistics > 1-Sample t</strong>.</p>
<p>The data for this example is <a href="http://support.minitab.com/datasets/FamilyEnergyCost.MTW">FamilyEnergyCost</a> and it is just one of the many data set examples that can be found in <a href="http://support.minitab.com/datasets/">Minitab’s Data Set Library</a>. We’ll perform the regular 1-sample t-test with a null hypothesis mean of 260, and then graphically recreate the results. </p>
<p><img alt="1-sample t-test output from Minitab statistical software" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/e26965956d7d682888dd0c749e10f7af/1t_swo.png" style="width: 485px; height: 123px;" /></p>
How to Graph the Two-Tailed Critical Region for a Significance Level of 0.05
<p>To create a graphical equivalent to a 1-sample t-test, we’ll need to graph the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/probability-distributions-and-random-data/distributions/t-distribution/" target="_blank">t-distribution</a> using the correct number of degrees of freedom. For a 1-sample t-test, the degrees of freedom equals the sample size minus 1. So, that’s 24 degrees of freedom for our sample of 25.</p>
<ol>
<li>In Minitab, choose: <strong>Graph > Probability Distribution Plot > View Probability</strong>.</li>
<li>In <strong>Distribution</strong>, select <strong>t</strong>.</li>
<li>In <strong>Degrees of freedom</strong>, enter <em>24</em>.</li>
<li>Click the <strong>Shaded Area</strong> tab.</li>
<li>In <strong>Define Shaded Area By</strong>, select <strong>Probability</strong> and <strong>Both Tails</strong>.</li>
<li>In <strong>Probability</strong>, enter <em>0.05</em>.</li>
<li>Click <strong>OK</strong>.</li>
</ol>
<p>You should see this graph.</p>
<p><img alt="Probability distribution plot of t-values" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/4201bc7734483312e6056e023f12272b/t_value_plot_crtical_region.png" style="width: 576px; height: 384px;" /></p>
<p>This graph shows the distribution of t-values for a sample of our size with the t-values for the end points of the critical region. The <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/tests-of-means/what-is-a-t-value/" target="_blank">t-value</a> for our sample mean is 2.29 and it falls within the critical region.</p>
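<p>If you want to verify those end points numerically, a quick check in Python with SciPy (an illustration, not one of the Minitab steps) reproduces the critical t-values for 24 degrees of freedom:</p>

```python
from scipy import stats

df = 25 - 1      # degrees of freedom = sample size - 1
alpha = 0.05

# Upper critical value for a two-tailed test; the lower one is its negative
t_crit = stats.t.ppf(1 - alpha / 2, df)

print(round(t_crit, 3))        # prints 2.064
print(2.29 > t_crit)           # prints True: our t-value is in the critical region
```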
<p>For my blog posts, I thought displaying the x-axis in the same units as our measurement variable (energy costs) would make the graph easier to understand. To do this, we need to transform the x-axis scale from t-values to energy costs.</p>
<p>Transforming the t-values to energy costs for a distribution centered on the null hypothesis mean requires a simple calculation:</p>
<p style="margin-left: 40px;">Energy Cost = Null Hypothesis Mean + (t-value * SE Mean)</p>
<p>We’ll use the null hypothesis value that we entered in the dialog box (260) and the SE Mean value that appears in the 1-sample t-test output (30.8). We need to calculate the energy cost values for all of the t-values that will appear on the x-axis (-4 to +4).</p>
<p>For example, a t-value of 1 equals 290.8 (260 + (1 * 30.8)). A t-value of 0 corresponds to the null hypothesis value, 260.</p>
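<p>The same conversion is easy to script. This Python snippet (illustrative only) applies the formula to every t-value on the x-axis:</p>

```python
null_mean = 260.0   # null hypothesis mean entered in the dialog box
se_mean = 30.8      # SE Mean from the 1-sample t output

# Energy Cost = Null Hypothesis Mean + (t-value * SE Mean)
costs = {t: null_mean + t * se_mean for t in range(-4, 5)}

for t in sorted(costs):
    print(t, round(costs[t], 1))
```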
<p>Next, we need to replace the t-values with the energy cost equivalents.</p>
<ol>
<li>Choose <strong>Editor > Select Item > X Scale</strong>.</li>
<li>Choose <strong>Editor > Edit X Scale</strong>.</li>
<li>In <strong>Major Tick Position</strong>, choose <strong>Number of Ticks</strong> and enter <em>9</em>.</li>
<li>Click the <strong>Show</strong> tab and check the <strong>Low</strong> check box for <strong>Major ticks</strong> and <strong>Major tick labels</strong>.</li>
<li>Click the <strong>Labels</strong> tab of the dialog box that appears. Enter the energy cost values that you calculated as shown below. I use rounded values to keep the x-axis tidy. Click <strong>OK</strong>.</li>
</ol>
<p><img alt="Dialog box for showing the transformed values on the x-scale" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/f96d38fa628633a7e4ff3a12b44483c0/edit_scale_dialog.png" style="width: 400px; height: 348px;" /></p>
<p>You should see this graph. To clean up the x-axis, I had to delete the t-values that were still showing from before. Simply click each t-value once and press the <strong>Delete</strong> key.</p>
<p><img alt="Probability distribution plot of t-distribution with the x-scale transformed to energy costs" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/69de6d5f41cf0703764895b379c5d3fb/t_value_plot_crtical_region2.png" style="width: 576px; height: 384px;" /></p>
<p>Let’s add a reference line to show where our sample mean falls within the sampling distribution and critical region. The trick here is that the x-axis still uses t-values despite displaying the energy costs. We need to use the t-value for our sample mean that appears in the 1-sample t output (2.29).</p>
<ol>
<li>Choose <strong>Editor > Add > Reference Lines</strong>.</li>
<li>In <strong>Show reference lines at X values</strong>, enter<em> 2.29.</em></li>
<li>Click <strong>OK</strong>.</li>
<li>Double click the <em>2.29</em> that now appears on the graph.</li>
<li>In the dialog box that appears, enter <em>330.6</em> in <strong>Text</strong>.</li>
<li>Click <strong>OK</strong>.</li>
</ol>
<p>After editing the title and the x-axis label, you should have a graph similar to the one below.</p>
<p><img alt="Probability distribution plot with two-tailed critical region for a significance level of 0.05" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/212878044412db4ec165745b18c010e8/sig_level_05.png" style="width: 576px; height: 384px;" /></p>
How to Graph the P Value for a 1-sample t-Test
<p>To do this, we’ll duplicate the graph we created above and then modify it. This allows us to reuse some of the work that we’ve already done.</p>
<ol>
<li>Make sure the graph we created is selected.</li>
<li>Choose <strong>Editor > Duplicate Graph</strong>.</li>
<li>Double click the blue distribution curve on the graph.</li>
<li>Click the <strong>Shaded Area</strong> tab in the dialog box that appears.</li>
<li>In <strong>Define Shaded Area By</strong>, select <strong>X Value</strong> and <strong>Both Tails</strong>.</li>
<li>In <strong>X value</strong>, enter <em>2.29</em>.</li>
<li>Click <strong>OK</strong>.</li>
</ol>
<p>You’ll need to edit the graph title and delete some extra numbers on the x-axis. After these edits, you should have a graph similar to this one.</p>
<p><img alt="Probability distribution plot that displays the p-value for our sample mean" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/4a599dfe53a1c065837de772a5b157fb/p_value.png" style="width: 576px; height: 384px;" /></p>
How to Graph the Confidence Interval for a 1-sample t-test
<p>To graphically recreate the confidence interval, we’ll need to start from scratch for this graph. </p>
<ol>
<li>In Minitab, choose: <strong>Graph > Probability Distribution Plot > View Probability</strong>.</li>
<li>In <strong>Distribution</strong>, select <strong>t</strong>.</li>
<li>In <strong>Degrees of freedom</strong>, enter <em>24</em>.</li>
<li>Click the <strong>Shaded Area</strong> tab.</li>
<li>In <strong>Define Shaded Area By</strong>, select <strong>Probability</strong> and <strong>Middle</strong>.</li>
<li>Enter <em>0.025</em> in both <strong>Probability 1</strong> and <strong>Probability 2</strong>.</li>
<li>Click <strong>OK</strong>.</li>
</ol>
<p>Your graph should look like this:</p>
<p><img alt="Probability distribution plot that represents a confidence interval with t-values" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/9ec475422d97330eb236d76f37ee576a/ci_t_values.png" style="width: 576px; height: 384px;" /></p>
<p>Like before, we’ll need to transform the x-axis into energy costs. For this graph, I’ll display x-values only for the end points of the confidence interval and the sample mean. So, we need to convert three t-values: -2.064, 0, and 2.064.</p>
<p>The equation to transform the t-values to energy costs for a distribution centered on the sample mean is:</p>
<p style="margin-left: 40px;">Energy Cost = Sample Mean + (t-score * SE Mean)</p>
<p>We obtain the following rounded values that represent the lower confidence limit, sample mean, and upper confidence limit: 267, 330.6, 394.</p>
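<p>Those limits come straight from the formula above. As a quick check (illustrative Python, not a Minitab step):</p>

```python
sample_mean = 330.6   # from the 1-sample t output
se_mean = 30.8        # from the 1-sample t output
t_crit = 2.064        # t-value for 95% confidence with 24 degrees of freedom

# Energy Cost = Sample Mean + (t-score * SE Mean), at the two end points
lower = sample_mean - t_crit * se_mean
upper = sample_mean + t_crit * se_mean

print(round(lower), round(upper))   # prints 267 394
```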
<p>Simply double-click each value on the x-axis to edit its label, replacing the t-value with the corresponding energy cost. After editing the graph title, you should have a visual representation of the confidence interval that looks like this (I rounded the values for the confidence limits).</p>
<p><img alt="Probability distribution plot that displays a visual representation of a 95% confidence interval around the sample mean" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/80de5f2397507752d74ffff86fbd94ea/ci_sample_mean.png" style="width: 576px; height: 384px;" /></p>
Consider Using Minitab's Command Language
<p>When I create multiple graphs that involve many steps, I generally use Minitab's command language. This may sound daunting if you're not familiar with using this command language. However, Minitab makes this easier for you.</p>
<p>After you create one graph, choose <strong>Editor > Copy Command Language</strong>, and paste it into a text editor, such as Notepad. Save the file with the extension .mtb and you have a Minitab Exec file. This Exec file contains all of the edits you made. Now you can easily create similar graphs simply by modifying the parts that you want to change.</p>
<p>You can also get help for the command language right in Minitab. First, make sure the command prompt is enabled by choosing <strong>Editor > Enable Commands</strong>. At the prompt, type <em>help dplot</em>, and Minitab displays the help specific to probability distribution plots!</p>
<p>To run an exec file, choose <strong>File > Other Files > Run an Exec</strong>. Click <strong>Select File</strong> and browse to the file you saved. Here are the MTB files for my graphs for the <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/File/e4994557f813b872b03687363259faa2/prob_plot_alpha.mtb">critical region</a>, <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/File/3a0e83912f4826db06ee8c0777a5cf73/prob_plot_p.mtb">P value</a>, and <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/File/00f225aa85ce31c80cfae24c1704fa1c/ci_sample.mtb">confidence interval</a>.</p>
<p>Happy graphing!</p>
Hypothesis TestingSix SigmaWed, 25 Mar 2015 12:00:00 +0000http://blog.minitab.com/blog/adventures-in-statistics/how-to-create-a-graphical-version-of-the-1-sample-t-test-in-minitabJim FrostUnderstanding Hypothesis Tests: Significance Levels (Alpha) and P values in Statistics
http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-significance-levels-alpha-and-p-values-in-statistics
<p>What do significance levels and P values mean in hypothesis tests? What <em>is </em>statistical significance anyway? In this post, I’ll continue to focus on concepts and graphs to help you gain a more intuitive understanding of how hypothesis tests work in statistics.</p>
<p>To bring it to life, I’ll add the significance level and P value to the graph from my previous post in order to perform a graphical version of the 1-sample t-test. It’s easier to understand when you can see what statistical significance truly means!</p>
<p>Here’s where we left off in <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics" target="_blank">my last post</a>. We want to determine whether our sample mean (330.6) indicates that this year's average energy cost is significantly different from last year’s average energy cost of $260.</p>
<p><img alt="Descriptive statistics for the example" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/135cd05bde9f7f16ab396a8525d2b09c/desc_stats.png" style="width: 302px; height: 87px;" /></p>
<p><img alt="Probability distribution plot for our example" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/76699c4c1f2bd6c83b88c1ac8e93aa54/sampling_dist_null.png" style="width: 595px; height: 397px;" /></p>
<p>The graph above shows the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/introductory-concepts/basic-concepts/sampling-distribution/" target="_blank">distribution of sample means</a> we’d obtain under the assumption that the null hypothesis is true (population mean = 260) and we repeatedly drew a large number of random samples.</p>
<p>I left you with a question: where do we draw the line for statistical significance on the graph? Now we'll add in the significance level and the P value, which are the decision-making tools we'll need.</p>
<p>We'll use these tools to test the following hypotheses:</p>
<ul>
<li>Null hypothesis: The population mean equals the hypothesized mean (260).</li>
<li>Alternative hypothesis: The population mean differs from the hypothesized mean (260).</li>
</ul>
What Is the Significance Level (Alpha)?
<p>The significance level, also denoted as alpha or α, is the probability of rejecting the null hypothesis when it is true. For example, a significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference.</p>
<p>These types of definitions can be hard to understand because of their technical nature. A picture makes the concepts much easier to comprehend!</p>
<p>The significance level determines how far out from the null hypothesis value we'll draw that line on the graph. To graph a significance level of 0.05, we need to shade the 5% of the distribution that is furthest away from the null hypothesis.</p>
<p><img alt="Probability plot that shows the critical regions for a significance level of 0.05" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/212878044412db4ec165745b18c010e8/sig_level_05.png" style="width: 595px; height: 397px;" /></p>
<p>In the graph above, the two shaded areas are equidistant from the null hypothesis value and each area has a probability of 0.025, for a total of 0.05. In statistics, we call these shaded areas the <em>critical region</em> for a two-tailed test. If the population mean is 260, we’d expect to obtain a sample mean that falls in the critical region 5% of the time. The critical region defines how far away our sample statistic must be from the null hypothesis value before we can say it is unusual enough to reject the null hypothesis.</p>
<p>Our sample mean (330.6) falls within the critical region, which indicates it is statistically significant at the 0.05 level.</p>
<p>We can also see if it is statistically significant using the other common significance level of 0.01.</p>
<p><img alt="Probability plot that shows the critical regions for a significance level of 0.01" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/8744b853f28396be001c2ee9678a9c14/sig_level_01.png" style="width: 595px; height: 397px;" /></p>
<p>The two shaded areas each have a probability of 0.005, which adds up to a total probability of 0.01. This time our sample mean does not fall within the critical region and we fail to reject the null hypothesis. This comparison shows why you need to choose your significance level before you begin your study. It protects you from choosing a significance level because it conveniently gives you significant results!</p>
<p>Thanks to the graph, we were able to determine that our results are statistically significant at the 0.05 level without using a P value. However, when you use the numeric output produced by <a href="http://www.minitab.com/en-us/products/minitab/features/" target="_blank">statistical software</a>, you’ll need to compare the P value to your significance level to make this determination.</p>
What Are P values?
<p>A P value is the probability of obtaining an effect at least as extreme as the one in your sample data, assuming that the null hypothesis is true.</p>
<p>This definition of P values, while technically correct, is a bit convoluted. It’s easier to understand with a graph!</p>
<p>To graph the P value for our example data set, we need to determine the distance between the sample mean and the null hypothesis value (330.6 - 260 = 70.6). Next, we can graph the probability of obtaining a sample mean that is at least as extreme in both tails of the distribution (260 +/- 70.6).</p>
<p><img alt="Probability plot that shows the p-value for our sample mean" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/4a599dfe53a1c065837de772a5b157fb/p_value.png" style="width: 595px; height: 397px;" /></p>
<p>In the graph above, the two shaded areas each have a probability of 0.01556, for a total probability of 0.03112. This probability represents the likelihood of obtaining a sample mean at least as extreme as our sample mean in either tail of the distribution if the population mean is 260. That’s our P value!</p>
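<p>You can reproduce that P value directly from the t-distribution. Here's a minimal check in Python with SciPy (my illustration; the graphs in this post come from Minitab):</p>

```python
from scipy import stats

t_value = 2.29   # t-value for our sample mean, from the 1-sample t output
df = 24          # sample size of 25 minus 1

# Two-tailed P value: the probability in both tails beyond +/- 2.29
p_value = 2 * stats.t.sf(t_value, df)

print(round(p_value, 4))   # close to the 0.0311 shown on the graph
```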
<p>When a P value is less than or equal to the significance level, you reject the null hypothesis. If we take the P value for our example and compare it to the common significance levels, it matches the previous graphical results. The P value of 0.03112 is statistically significant at an alpha level of 0.05, but not at the 0.01 level.</p>
<p>If we stick to a significance level of 0.05, we can conclude that the average energy cost for the population differs from $260.</p>
<p>A common mistake is to interpret the P-value as the probability that the null hypothesis is true. To understand why this interpretation is incorrect, please read my blog post <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values">How to Correctly Interpret P Values</a>.</p>
Discussion about Statistically Significant Results
<p>A hypothesis test evaluates two mutually exclusive statements about a population to determine which statement is best supported by the sample data. A test result is statistically significant when the sample statistic is unusual enough relative to the null hypothesis that we can reject the null hypothesis for the entire population. “Unusual enough” in a hypothesis test is defined by:</p>
<ul>
<li>The assumption that the null hypothesis is true—the graphs are centered on the null hypothesis value.</li>
<li>The significance level—how far out do we draw the line for the critical region?</li>
<li>Our sample statistic—does it fall in the critical region?</li>
</ul>
<p>Keep in mind that there is no magic significance level that distinguishes between the studies that have a true effect and those that don’t with 100% accuracy. The common alpha values of 0.05 and 0.01 are simply based on tradition. For a significance level of 0.05, expect to obtain sample means in the critical region 5% of the time when <em>the</em> <em>null hypothesis is</em> <em>true</em>. In these cases, you won’t know that the null hypothesis is true but you’ll reject it because the sample mean falls in the critical region. That’s why the significance level is also referred to as an <em>error</em> rate!</p>
<p>This <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/basics/type-i-and-type-ii-error/" target="_blank">type of error</a> doesn’t imply that the experimenter did anything wrong, nor does it require any unusual explanation. The graphs show that when the null hypothesis is true, it is possible to obtain these unusual sample means for no reason other than random sampling error. It’s just the luck of the draw.</p>
<p>Significance levels and P values are important tools that help you quantify and control this type of error in a hypothesis test. Using these tools to decide when to reject the null hypothesis increases your chance of making the correct decision.</p>
<p>In my next post, I’ll continue to use this graphical framework to help you <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-confidence-intervals-and-confidence-levels">understand confidence intervals and confidence levels</a>!</p>
<p>If you'd like to see how I made these graphs, please read: <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-create-a-graphical-version-of-the-1-sample-t-test-in-minitab" target="_blank">How to Create a Graphical Version of the 1-sample t-Test</a>.</p>
Data AnalysisHypothesis TestingLearningStatisticsStatistics HelpStatsThu, 19 Mar 2015 12:00:00 +0000http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-significance-levels-alpha-and-p-values-in-statisticsJim FrostP-value Roulette: Making Hypothesis Testing a Winner’s Game
http://blog.minitab.com/blog/rkelly/p-value-roulette-making-hypothesis-testing-a-winner%E2%80%99s-game
<p>Welcome to the Hypothesis Test Casino! The featured game of the house is roulette. But this is no <em>ordinary</em> game of roulette. This is p-value roulette!</p>
<p>Here’s how it works: We have two roulette wheels, the Null wheel and the Alternative wheel. Each wheel has 20 slots (instead of the usual 37 or 38). You get to bet on one slot.</p>
<p><img alt="http://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Edvard_Munch_-_At_the_Roulette_Table_in_Monte_Carlo_-_Google_Art_Project.jpg/256px-Edvard_Munch_-_At_the_Roulette_Table_in_Monte_Carlo_-_Google_Art_Project.jpg" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8647ae2930d63e128d09f0b2cc5cdb87/p_value_roulette.jpg" style="line-height: 20.7999992370605px; border-width: 1px; border-style: solid; margin: 10px 15px; width: 256px; height: 166px; float: right;" /></p>
<p>What happens if the ball lands in the slot you bet on? Well, that depends on which wheel we spin. If we spin the Null wheel, you lose your bet. But if we spin the Alternative wheel, you win!</p>
<p>I’m sorry, but we can’t tell you which wheel we’re spinning.</p>
<p>Doesn’t that sound like a good game?</p>
<p>Not convinced yet? I assure you the odds are in your favor <em>if </em>you choose your slot wisely. Look, I’ll show you a graph of some data from the Null wheel. We spun it 10,000 times and counted how many times the ball landed in each slot. As you can see, each slot is just as likely as any other, with a probability of about 0.05 each. That means there’s a 95% probability the ball won’t land on your slot, so you have only a 5% chance of losing, no matter what, <em>if</em> we happen to spin the Null wheel.</p>
<p><img alt="histogram of p values for null hypothesis" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dc5efcd7001f33a77bea1c635af837e5/histogram_of_p_values_null_hypothesis.png" style="width: 576px; height: 384px;" /></p>
<p>What about that Alternative wheel, you ask? Well, we’ve had quite a few different Alternative wheels over the years. Here’s a graph of some data from one we were spinning last year:</p>
<p><img alt="histogram of p values from alternative hypothesis" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dd0cafe3375f3202adaf3542d15eb9ab/histogram_of_p_values_alternative_hypothesis.png" style="width: 576px; height: 384px;" /></p>
<p>And just a few months ago, we had a different one. Check out the data from this one. It was very, very popular.</p>
<p><img alt=" histogram of p-values from popular alternative hypothesis" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fc6f0ff641e7eb4d3f7750c8163ac968/histogram_of_p_values_alternative_hypothesis_2.png" style="width: 576px; height: 384px;" /></p>
<p>Now that’s what I call an Alternative! People in the know always picked the first slot. You can see why.</p>
<p>I’m not allowed to show you data from the current game. But I assure you the Alternatives all follow this same pattern. They tend to favor those smaller numbers.</p>
<p>So, you’d like to play? Great! Which slot would you like to bet on?</p>
Is this on the level?
<p>No, I don’t really have a casino with two roulette wheels. My graphs are simulated p-values for a <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-is-a-t-test-and-why-is-it-like-telling-a-kid-to-clean-up-that-mess-in-the-kitchen">1-sample t-test</a>. The null hypothesis is that the mean of a process or population is 5. The two-sided alternative is that the mean is different from 5. In my first graph, the null hypothesis was true: I used Minitab to generate random samples of size 20 from a normal distribution with mean 5 and standard deviation of 1. For the other two graphs, the only thing I changed was the mean of the normal distribution I sampled from. For the second graph, the mean was 5.3. For the final graph, the mean was 5.75.</p>
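<p>You can run this roulette simulation yourself. Here's a sketch in Python with NumPy and SciPy (my reimplementation of the Minitab simulation described above, with my own seed and spin count):</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def spin(true_mean, n=20, spins=2000):
    """P-values from 1-sample t-tests of H0: mean = 5, one per simulated sample."""
    samples = rng.normal(loc=true_mean, scale=1.0, size=(spins, n))
    return stats.ttest_1samp(samples, popmean=5, axis=1).pvalue

null_p = spin(5.0)    # the Null wheel: p-values are uniform on [0, 1]
alt_p = spin(5.75)    # an Alternative wheel: small p-values dominate

print("rejections under the null:", round((null_p < 0.05).mean(), 2))  # near 0.05
print("rejections under the alternative:", round((alt_p < 0.05).mean(), 2))
```

<p>Plot a histogram of <code>null_p</code> and <code>alt_p</code> and you get the flat and left-heavy shapes shown in the graphs above.</p>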
<p>For just about any hypothesis test you do in Minitab <a href="http://www.minitab.com/products/minitab">Statistical Software</a>, you will see a p-value. Once you understand how p-values work, you will have greater insight into what they are telling you. Let’s see what we can learn about p-values from playing p-value roulette.</p>
<ol>
<li>Just as you don’t know which wheel we’re spinning, you don’t know for sure whether the null hypothesis is true. But basing your decision to reject the null hypothesis on the p-value improves your chance of making a good decision.<br />
</li>
<li>If the null hypothesis is true, then any p-value is just as likely as any other. You control the probability of making a Type I error by rejecting only when the p-value falls within a narrow range, typically 0.05 or smaller. A <a href="http://blog.minitab.com/blog/the-stats-cat/understanding-type-1-and-type-2-errors-from-the-feline-perspective-all-mistakes-are-not-equal">Type I error</a> occurs if you incorrectly reject a true null hypothesis.<br />
</li>
<li>If the alternative hypothesis is true, then smaller p-values become more likely and larger p-values become less likely. That’s why you can think of a small p-value as evidence in favor of the alternative hypothesis.<br />
</li>
<li>It is tempting to try to interpret the p-value as the probability that the null hypothesis is true. But that’s not what it is. The null hypothesis is either true, or it’s not. Each time you “spin the wheel” the ball will land in a different slot, giving you a different p-value. But the truth of the null hypothesis—or lack thereof—remains unchanged.<br />
</li>
<li>In the roulette analogy there were different alternative wheels, because there is not usually just a single alternative condition. There are infinitely many mean values that are not equal to 5; my graphs looked at just two of these.<br />
</li>
<li>The probability of rejecting the null hypothesis when the alternative hypothesis is true is called the power of the test. In the 1-sample t-test, the power depends on how different the mean is from the null hypothesis value, relative to the standard error. While you don’t control the true mean, you can reduce the standard error by taking a larger sample. This will give the test greater power.<br />
</li>
</ol>
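<p>Point 6 is easy to quantify. The sketch below computes the power of the two-sided 1-sample t-test from the noncentral t-distribution (a standard formula, shown here in Python with SciPy as an illustration):</p>

```python
from math import sqrt
from scipy import stats

def t_test_power(true_mean, null_mean=5.0, sd=1.0, n=20, alpha=0.05):
    """Power of a two-sided 1-sample t-test, via the noncentral t-distribution."""
    df = n - 1
    ncp = (true_mean - null_mean) / (sd / sqrt(n))   # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # Probability the test statistic lands in either tail of the critical region
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)

print(round(t_test_power(5.3), 2))        # small shift: modest power
print(round(t_test_power(5.75), 2))       # larger shift: much higher power
print(round(t_test_power(5.3, n=80), 2))  # same shift, bigger sample: more power
```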
You Too Can Be a Winner!
<p>To be a winner at p-value roulette, you need to make sure you are performing the right hypothesis test and that your data fit the assumptions of that test. Minitab’s <a href="http://www.minitab.com/en-us/products/minitab/assistant/">Assistant menu</a> can help you with that. The Assistant helps you choose the right statistical analysis and provides easy-to-understand guidelines to walk you through data collection and analysis. Then it gives you clear graphical output that shows you how to interpret your p-value and helps you evaluate whether your data are appropriate, so you can trust your results.</p>
Hypothesis TestingStatisticsStatistics HelpStatsThu, 12 Mar 2015 11:00:00 +0000http://blog.minitab.com/blog/rkelly/p-value-roulette-making-hypothesis-testing-a-winner%E2%80%99s-gameRob KellyUnderstanding Hypothesis Tests: Why We Need to Use Hypothesis Tests in Statistics
http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics
<p>Hypothesis testing is an essential procedure in statistics. A hypothesis test evaluates two mutually exclusive statements about a population to determine which statement is best supported by the sample data. When we say that a finding is statistically significant, it’s thanks to a hypothesis test. How do these tests really work and what does statistical significance actually mean?</p>
<p>In this series of three posts, I’ll help you intuitively understand how hypothesis tests work by focusing on concepts and graphs rather than equations and numbers. After all, a key reason to use <a href="http://www.minitab.com/products/minitab/features/">statistical software </a>like Minitab is so you don’t get bogged down in the calculations and can instead focus on understanding your results.</p>
<p>To kick things off in this post, I highlight the rationale for using hypothesis tests with an example.</p>
The Scenario
<p>An economist wants to determine whether the monthly energy cost for families has changed from the previous year, when the mean cost per month was $260. The economist randomly samples 25 families and records their energy costs for the current year. (The data for this example is <a href="http://support.minitab.com/datasets/FamilyEnergyCost.MTW">FamilyEnergyCost</a> and it is just one of the many data set examples that can be found in <a href="http://support.minitab.com/datasets/">Minitab’s Data Set Library</a>.)</p>
<p><img alt="Descriptive statistics for family energy costs" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/135cd05bde9f7f16ab396a8525d2b09c/desc_stats.png" style="width: 302px; height: 87px;" /></p>
<p>I’ll use these descriptive statistics to create a probability distribution plot that shows you the importance of hypothesis tests. Read on!</p>
The Need for Hypothesis Tests
<p>Why do we even need hypothesis tests? After all, we took a random sample and our sample mean of 330.6 is different from 260. That <em>is</em> different, right? Unfortunately, the picture is muddied because we’re looking at a sample rather than the entire population.</p>
<p>Sampling error is the difference between a sample and the entire population. Thanks to sampling error, it’s entirely possible that while our <em>sample </em>mean is 330.6, the population mean could still be 260. Or, to put it another way, if we repeated the experiment, it’s possible that the second sample mean could be close to 260. A hypothesis test helps assess the likelihood of this possibility!</p>
Use the Sampling Distribution to See If Our Sample Mean is Unlikely
<p>For any given random sample, the mean of the sample almost certainly doesn’t equal the true mean of the population due to sampling error. For our example, it’s unlikely that the mean cost for the entire population is exactly 330.6. In fact, if we took multiple random samples of the same size from the same population, we could plot a distribution of the sample means.</p>
<p>A <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/introductory-concepts/basic-concepts/sampling-distribution/" target="_blank">sampling distribution</a> is the distribution of a statistic, such as the mean, that is obtained by repeatedly drawing a large number of samples from a specific population. This distribution allows you to determine the probability of obtaining the sample statistic.</p>
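<p>You can mimic this idea with a quick simulation. The sketch below (Python rather than Minitab, with an assumed population standard deviation of 150 that is purely illustrative) draws many samples of 25 families and shows how the sample means cluster around the population mean.</p>

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population: mean $260/month; the spread (sd=150) is an
# assumed value for illustration, not taken from the post's worksheet.
pop_mean, pop_sd, n = 260, 150, 25

# Draw 10,000 random samples of 25 families and record each sample mean
sample_means = rng.normal(pop_mean, pop_sd, size=(10_000, n)).mean(axis=1)

# The sample means center on 260 with standard error pop_sd / sqrt(n)
print(sample_means.mean(), sample_means.std())
```

The standard deviation of these simulated means is close to 150 / √25 = 30, which is the standard error that the t-distribution approach below uses without any simulation.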
<p>Fortunately, I can create a plot of sample means without collecting many different random samples! Instead, I’ll create a probability distribution plot using the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/probability-distributions-and-random-data/distributions/t-distribution/" target="_blank">t-distribution</a>, the sample size, and the <a href="http://blog.minitab.com/blog/adventures-in-statistics/assessing-variability-for-quality-improvement" target="_blank">variability</a> in our sample to graph the sampling distribution.</p>
<p>Our goal is to determine whether our sample mean is significantly different from the null hypothesis mean. Therefore, we’ll use the graph to see whether our sample mean of 330.6 is unlikely assuming that the population mean is 260. The graph below shows the expected distribution of sample means.</p>
<p><img alt="Sampling distribution plot for the null hypothesis" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/76699c4c1f2bd6c83b88c1ac8e93aa54/sampling_dist_null.png" style="width: 595px; height: 397px;" /></p>
<p>You can see that the most probable sample mean is 260, which makes sense because we’re assuming that the null hypothesis is true. However, there is a reasonable probability of obtaining a sample mean that ranges from 167 to 352, and even beyond! The takeaway from this graph is that while our sample mean of 330.6 is not the most probable, it’s also not outside the realm of possibility.</p>
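<p>To put a rough number on "not outside the realm of possibility," here is a small calculation in Python; the sample standard deviation of 154 is an assumed value for illustration, not the figure from the worksheet.</p>

```python
from scipy import stats

# How unusual is a sample mean of 330.6 if the true mean is 260?
null_mean, sample_mean, n = 260, 330.6, 25
sample_sd = 154.0                        # hypothetical value for illustration
se = sample_sd / n ** 0.5                # standard error of the mean
t_stat = (sample_mean - null_mean) / se
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)   # two-sided tail area
print(t_stat, p_value)
```

The tail area this produces is the same quantity the next post develops graphically as the P value.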
The Role of Hypothesis Tests
<p>We’ve placed our sample mean in the context of all possible sample means while assuming that the null hypothesis is true. Are these results statistically significant?</p>
<p>As you can see, there is no magic place on the distribution curve to make this determination. Instead, we have a continual decrease in the probability of obtaining sample means that are further from the null hypothesis value. Where do we draw the line?</p>
<p>This is where hypothesis tests are useful. A hypothesis test allows us to quantify the probability that our sample mean is unusual. In <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-significance-levels-alpha-and-p-values-in-statistics">my next blog post</a>, I’ll continue to use this graphical framework and add in the significance level and P value to show how hypothesis tests work and what statistical significance means.</p>
<p>If you'd like to see how I made these graphs, please read: <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-create-a-graphical-version-of-the-1-sample-t-test-in-minitab" target="_blank">How to Create a Graphical Version of the 1-sample t-Test</a>.</p>
Data AnalysisHypothesis TestingStatisticsStatistics HelpStatsThu, 05 Mar 2015 16:00:00 +0000http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statisticsJim FrostChoosing Between a Nonparametric Test and a Parametric Test
http://blog.minitab.com/blog/adventures-in-statistics/choosing-between-a-nonparametric-test-and-a-parametric-test
<p>It’s safe to say that most people who use statistics are more familiar with parametric analyses than nonparametric analyses. Nonparametric tests are also called distribution-free tests because they don’t assume that your data follow a specific distribution.</p>
<p>You may have heard that you should use nonparametric tests when your data don’t meet the assumptions of the parametric test, especially the assumption about normally distributed data. That sounds like a nice and straightforward way to choose, but there are additional considerations.</p>
<p>In this post, I’ll help you determine when you should use a:</p>
<ul>
<li>Parametric analysis to test group means.</li>
<li>Nonparametric analysis to test group medians.</li>
</ul>
<p>In particular, I'll focus on an important reason to use nonparametric tests that I don’t think gets mentioned often enough!</p>
Hypothesis Tests of the Mean and Median
<p>Nonparametric tests are like a parallel universe to parametric tests. The table shows related pairs of <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/basics/hypothesis-tests-in-minitab/" target="_blank">hypothesis tests</a> that <a href="http://www.minitab.com/en-us/products/minitab/features/" target="_blank">Minitab statistical software</a> offers.</p>
<table style="margin-left: auto; margin-right: auto;">
<tr><th>Parametric tests (means)</th><th>Nonparametric tests (medians)</th></tr>
<tr><td>1-sample t test</td><td>1-sample Sign, 1-sample Wilcoxon</td></tr>
<tr><td>2-sample t test</td><td>Mann-Whitney test</td></tr>
<tr><td>One-Way ANOVA</td><td>Kruskal-Wallis, Mood’s median test</td></tr>
<tr><td>Factorial DOE with one factor and one blocking variable</td><td>Friedman test</td></tr>
</table>
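<p>If you work outside Minitab, most of these pairs have counterparts in Python's scipy; the sketch below runs a few of them on small made-up samples just to show the calls side by side.</p>

```python
from scipy import stats

# Three small made-up samples, purely for illustration
a = [5.1, 4.9, 6.2, 5.8, 5.5, 6.0, 5.3, 5.9]
b = [4.2, 4.8, 5.0, 4.6, 4.4, 5.1, 4.7, 4.9]
c = [6.5, 6.1, 6.8, 6.3, 6.6, 6.2, 6.7, 6.4]

# Parametric tests on means
print(stats.ttest_1samp(a, popmean=5.0))     # 1-sample t test
print(stats.ttest_ind(a, b))                 # 2-sample t test
print(stats.f_oneway(a, b, c))               # one-way ANOVA

# Nonparametric counterparts on medians/ranks
print(stats.wilcoxon([x - 5.0 for x in a]))  # 1-sample Wilcoxon
print(stats.mannwhitneyu(a, b))              # Mann-Whitney
print(stats.kruskal(a, b, c))                # Kruskal-Wallis
```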
Reasons to Use Parametric Tests
<p><strong>Reason 1: Parametric tests can perform well with skewed and nonnormal distributions</strong></p>
<p>This may be a surprise, but parametric tests can perform well with continuous data that are nonnormal if you satisfy these sample size guidelines.</p>
<table style="margin-left: auto; margin-right: auto;">
<tr><th>Parametric analyses</th><th>Sample size guidelines for nonnormal data</th></tr>
<tr><td>1-sample t test</td><td>Greater than 20</td></tr>
<tr><td>2-sample t test</td><td>Each group should be greater than 15</td></tr>
<tr><td>One-Way ANOVA</td><td>If you have 2-9 groups, each group should be greater than 15. If you have 10-12 groups, each group should be greater than 20.</td></tr>
</table>
<p><strong>Reason 2: Parametric tests can perform well when the spread of each group is different</strong></p>
<p>While nonparametric tests don’t assume that your data follow a normal distribution, they do have other assumptions that can be hard to meet. For nonparametric tests that compare groups, a common assumption is that the data for all groups must have the same spread (dispersion). If your groups have a different spread, the nonparametric tests might not provide valid results.</p>
<p>On the other hand, if you use the 2-sample t test or One-Way ANOVA, you can simply go to the <strong>Options</strong> subdialog and uncheck <em>Assume equal variances</em>. Voilà, you’re good to go even when the groups have different spreads!</p>
<p><strong>Reason 3: Statistical power</strong></p>
<p>Parametric tests usually have more <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/power-and-sample-size/what-is-power/" target="_blank">statistical power</a> than nonparametric tests. Thus, you are more likely to detect a significant effect when one truly exists.</p>
Reasons to Use Nonparametric Tests
<p><strong>Reason 1: Your area of study is better represented by the median</strong></p>
<p><img alt="Comparing two skewed distributions" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/7223b01bc095dbd652bd863be5288cfe/mean_or_median.png" style="float: right; width: 200px; height: 181px; margin: 10px 15px;" />This is my favorite reason to use a nonparametric test and the one that isn’t mentioned often enough! The fact that you <em>can</em> perform a parametric test with nonnormal data doesn’t imply that the mean is the best <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/summary-statistics/measures-of-central-tendency/" target="_blank">measure of the central tendency</a> for your data.</p>
<p>For example, the center of a skewed distribution, like income, can be better measured by the median where 50% are above the median and 50% are below. If you add a few billionaires to a sample, the mathematical mean increases greatly even though the income for the typical person doesn’t change.</p>
<p>When your distribution is skewed enough, the mean is strongly affected by changes far out in the distribution’s tail whereas the median continues to more closely reflect the center of the distribution. For these two distributions, a random sample of 100 from each distribution produces means that are significantly different, but medians that are not significantly different.</p>
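<p>A tiny numerical sketch (in Python, with made-up incomes) makes the point concrete:</p>

```python
import numpy as np

# Nine hypothetical household incomes, in dollars
incomes = [28_000, 32_000, 36_000, 40_000, 44_000,
           48_000, 52_000, 56_000, 60_000]

print(np.mean(incomes), np.median(incomes))   # both describe the center here

# Add one billionaire: the mean explodes, the median barely moves
with_billionaire = incomes + [1_000_000_000]
print(np.mean(with_billionaire), np.median(with_billionaire))
```

One extreme observation drags the mean into the hundreds of millions while the median shifts by only a couple of thousand dollars, so the median keeps describing the typical household.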
<p>Two of my colleagues have written excellent blog posts that illustrate this point:</p>
<ul>
<li>Michelle Paret: <a href="http://blog.minitab.com/blog/michelle-paret/using-the-mean-its-not-always-a-slam-dunk" target="_blank">Using the Mean in Data Analysis: It’s Not Always a Slam-Dunk</a></li>
<li>Redouane Kouiden: <a href="http://blog.minitab.com/blog/statistics-for-lean-six-sigma/the-non-parametric-economy-what-does-average-actually-mean" target="_blank">The Non-parametric Economy: What Does Average Actually Mean?</a></li>
</ul>
<p><strong>Reason 2: You have a very small sample size</strong></p>
<p>If you don’t meet the sample size guidelines for the parametric tests and you are not confident that you have normally distributed data, you should use a nonparametric test. When you have a really small sample, you might not even be able to ascertain the distribution of your data because the distribution tests will lack sufficient power to provide meaningful results.</p>
<p>In this scenario, you’re in a tough spot with no valid alternative. Nonparametric tests have less power to begin with and it’s a double whammy when you add a small sample size on top of that!</p>
<p><strong>Reason 3: You have ordinal data, ranked data, or outliers that you can’t remove</strong></p>
<p>Typical parametric tests can only assess continuous data, and their results can be significantly affected by outliers. Conversely, some nonparametric tests can handle ordinal data and ranked data, and they are not seriously affected by outliers. Be sure to check the assumptions for the nonparametric test, because each one has its own data requirements.</p>
Closing Thoughts
<p>It’s commonly thought that the need to choose between a parametric and nonparametric test occurs when your data fail to meet an assumption of the parametric test. This can be the case when you have both a small sample size and nonnormal data. However, other considerations often play a role because parametric tests can often handle nonnormal data. Conversely, nonparametric tests have strict assumptions that you can’t disregard.</p>
<p>The decision often depends on whether the mean or median more accurately represents the center of your data’s distribution.</p>
<ul>
<li>If the mean accurately represents the center of your distribution and your sample size is large enough, consider a parametric test because they are more powerful.</li>
<li>If the median better represents the center of your distribution, consider the nonparametric test even when you have a large sample.</li>
</ul>
<p>Finally, if you have a very small sample size, you might be stuck using a nonparametric test. Please, collect more data next time if it is at all possible! As you can see, the sample size guidelines aren’t really that large. Your chance of detecting a significant effect when one exists can be very small when you have a small sample size and must use a less efficient nonparametric test!</p>
Hypothesis TestingStatisticsStatistics HelpThu, 19 Feb 2015 13:00:00 +0000http://blog.minitab.com/blog/adventures-in-statistics/choosing-between-a-nonparametric-test-and-a-parametric-testJim FrostWhat’s the Probability that Your Favorite Football Team Will Win?
http://blog.minitab.com/blog/customized-data-analysis/what%E2%80%99s-the-probability-that-your-favorite-football-team-will-win
<div>
<p>If you wanted to figure out the probability that your favorite football team will win their next game, how would you do it? My colleague <a href="http://blog.minitab.com/blog/understanding-statistics-and-its-application">Eduardo Santiago</a> and I recently looked at this question, and in this post we'll share how we approached the solution. Let’s start by breaking down this problem:<img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8954fcace8f66a536aca06fad36a4c5a/boy_football_200.png" style="margin: 10px 15px; float: right; width: 200px; height: 200px;" /></p>
<ol>
<li>There are only two possible outcomes: your favorite team wins, or they lose. Ties are a possibility, but they're very rare, so to simplify things a bit we’ll assume they are so unlikely that they can be disregarded in this analysis.</li>
<li>There are numerous factors to consider.
<ol style="list-style-type:lower-alpha;">
<li>What will the playing conditions be?</li>
<li>Are key players injured?</li>
<li>Do they match up well with their opponent?</li>
<li>Do they have home-field advantage?</li>
<li>And the list goes on...</li>
</ol>
</li>
</ol>
<p>First, since we assumed the outcome is binary, we can put together a <a href="http://blog.minitab.com/blog/real-world-quality-improvement/using-binary-logistic-regression-to-investigate-high-employee-turnover">Binary Logistic Regression</a> model to predict the probability of a win occurring. Next, we need to find which predictors would be best to include. After <a href="http://www.thepredictiontracker.com/ncaaresults.php" target="_blank">a little research</a>, we found that the betting markets seem to take all of this information into account. Basically, we are utilizing the wisdom of the masses to find out what they believe will happen. Since betting markets take these factors into account, we decided to look at the probability of a win given the spread of an NCAA football game. </p>
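<p>As a rough sketch of what such a model looks like outside Minitab, the Python code below fits a binary logistic regression of win/loss on spread by maximizing the likelihood directly; the simulated games and the 0.15 coefficient are made up for illustration and are not the post's fitted values.</p>

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)

# Synthetic games: spread (negative = favored) and simulated outcomes.
# The win probabilities are invented; the post fit real 2000-2014 results.
spread = rng.uniform(-30, 30, size=2000)
true_p = 1 / (1 + np.exp(0.15 * spread))       # favored teams win more often
win = (rng.random(2000) < true_p).astype(float)

def neg_log_likelihood(beta):
    # Binary logistic model: P(win) = 1 / (1 + exp(-(b0 + b1 * spread)))
    z = beta[0] + beta[1] * spread
    p = 1 / (1 + np.exp(-z))
    eps = 1e-12                                 # guard against log(0)
    return -np.sum(win * np.log(p + eps) + (1 - win) * np.log(1 - p + eps))

fit = minimize(neg_log_likelihood, x0=[0.0, 0.0], method="BFGS")
b0, b1 = fit.x

def win_prob(s):
    return 1 / (1 + np.exp(-(b0 + b1 * s)))

print(win_prob(-6.0), win_prob(6.0))   # favorite vs. 6-point underdog
```

The fitted slope is negative, so a more negative spread (a bigger favorite) maps to a higher win probability, the same S-shaped relationship the Binary Fitted Line Plot shows later in the post.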
Data Collection
<p>If you are not convinced about how accurate the spreads can be in determining the outcome of a game, consider this: we collected data for every college football game played between 2000 and 2014. The structure of the data is illustrated below. The third column has the spread (or line) provided by casinos in Vegas, and the last column displayed is the actual score differential (vscore – hscore).</p>
<p><strong><em>Note</em></strong><em>: In betting lines, a negative spread indicates how many points you are favored over the opponent. In short, you are giving the opponent a certain number of points. </em></p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/52aaa628ea28b55523232a9b2da6b623/table1.png" style="width: 600px; height: 352px;" /></p>
<p>The original win-or-lose question can then be rephrased as follows: Is the difference between the spreads and actual score differentials statistically significant?</p>
<p>Since we have two populations that are dependent, we can compare them via a paired t test. In other words, both the <em>Spread</em> and <em>scoreDiffer</em> are observations (a priori and a posteriori) for the same game, and they reflect the relative strength of the home team <em>i</em> versus the road team <em>j</em>.</p>
<p>Using <strong>Stat > Basic Statistics > Paired t </strong>in Minitab Statistical Software, we get the output below.</p>
<p style="margin-left: 40px;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/5531697c176006a4057f4ab7b6fda7dc/t_test_output.png" style="width: 500px; height: 189px;" /></p>
<p>Since the p-value is larger than 0.05, we can conclude from the 15 years of data that the average difference between Las Vegas spreads and actual score differentials is not significantly different from zero. In other words, any bias between these two measures of relative team strength is indistinguishable from zero, which in lay terms means that <em>on average</em> the error between the Vegas lines and actual outcomes is negligible.</p>
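<p>The paired comparison can be sketched in Python with synthetic games; the numbers below are simulated to mimic unbiased spreads and are not the real data set.</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical spreads and actual score differentials for 500 games,
# simulated so the paired differences center on zero (an unbiased line)
score_diff = rng.normal(0, 14, size=500)
spread = score_diff + rng.normal(0, 15.5, size=500)

# Paired t-test: is the mean of (spread - score_diff) zero?
t_stat, p_value = stats.ttest_rel(spread, score_diff)
print(t_stat, p_value)
```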
<p>It is worth noting that the results above were obtained with a sample size of 10,476 games! So we hope you'll excuse our not including <a href="http://blog.minitab.com/blog/understanding-statistics/how-much-data-do-you-really-need-check-power-and-sample-size">power calculations</a> here.</p>
<p>As a final remark on spreads, the histogram of the differences below shows a couple of interesting things:</p>
<ul>
<li>The average difference between the spreads and score differentials seems to be very close to zero. So don’t get too excited yet: the spread cannot be used to predict the exact score differential for a game, but on average it is an unbiased estimate of it.</li>
<li>The standard deviation, however, is 15.5 points. That means that if a game shows a spread of -3 points for your favorite team, the outcome will, with high confidence, fall within plus or minus 2 standard deviations of that point estimate: -3 ± 31 points in this case. So your favorite team could win by 34 points, or lose by 28!</li>
</ul>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f8f64fe200b85bcd5753a62737735de3/histogram1.png" style="width: 577px; height: 385px;" /></p>
<p align="center"><em>Figure 1 - Distribution of the differences between scores and spreads</em></p>
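<p>The plus-or-minus-2-standard-deviations arithmetic in the bullets above can be checked in one line of Python (a 95% normal interval, so 1.96 rather than exactly 2 standard deviations):</p>

```python
from scipy import stats

# A -3 spread with sd 15.5: a rough 95% range for the score differential
lo, hi = stats.norm.interval(0.95, loc=-3, scale=15.5)
print(lo, hi)   # close to the -3 ± 31 points described above
```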
The Binary Logistic Regression Model
<p>By this point, we hope you are convinced about how good these spread values could be. To make the output more readable we summarized the data as follows:</p>
</div>
<p style="margin-left: 40px;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/218aa990c975fa2292d84926ba0002f0/table2.png" style="width: 250px; height: 405px;" /></p>
Creating our Binary Logistic Regression Model
<p>After summarizing the data, we used the Binary Fitted Line Plot (new in Minitab 17) to come up with our model. </p>
<p>If you are following along, here are the steps:</p>
<ol>
<li>Go to <strong>Stat > Regression > Binary Fitted Line Plot</strong></li>
<li>Fill out the dialog box as shown below and click <strong>OK</strong>.</li>
</ol>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ae06bb10e129b9c527b07700a57a7e2f/dialog1.png" style="width: 600px; height: 457px;" /></p>
<p><span style="line-height: 1.6;">The steps will produce the following graph:</span></p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e2acef300ee9d1cd605ff089c217c8d5/binary_fitted_line_plot_w1024.png" style="width: 600px; height: 400px;" /></p>
Interpreting the Plot
<p>If your team is favored to win by 25 points or more, you have a very good chance of winning the game, but what if the spread is much closer?</p>
<p>For the 2014 National Championship, Ohio State was a 6-point underdog to Oregon. Looking at the Binary Fitted Line Plot, the probability that a 6-point underdog wins the game is close to 31% in college football. </p>
<p>Ohio State University ended up beating Oregon by 22 points. Given that the differences described in Figure 1 are normally distributed and centered at zero, if we take the spread as given (or known), we can compute the probability of the national championship game outcome being as extreme as, or more extreme than, it turned out.</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/977aafa9be7a8e2736b1340bca0b3b62/distribution_plot.png" style="width: 576px; height: 384px;" /></p>
<p>With Ohio State a 6-point underdog, and a standard deviation of 15.53, we can use a Probability Distribution Plot to show that Ohio State would win by 22 points or more only 3.6% of the time.</p>
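<p>You can reproduce that 3.6% figure with a one-line normal tail calculation; here is a Python sketch:</p>

```python
from scipy import stats

# Ohio State was a 6-point underdog (expected to lose by 6) but won by 22,
# beating the line by 28 points. With spread errors roughly N(0, 15.53):
beat_line_by = 22 - (-6)                  # 28 points better than the spread
p = stats.norm.sf(beat_line_by / 15.53)   # upper-tail probability
print(p)                                  # about 0.036, the post's 3.6%
```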
<p>Eduardo Santiago and I will be giving a talk on using statistics to rank college football teams at the upcoming <a href="http://www.amstat.org/meetings/csp/2015/" target="_blank">Conference on Statistical Practice</a> in New Orleans. Our talk is February 21 at 2 p.m., and we would love to have you join us. </p>
Fun StatisticsHypothesis TestingRegression AnalysisThu, 12 Feb 2015 13:00:00 +0000http://blog.minitab.com/blog/customized-data-analysis/what%E2%80%99s-the-probability-that-your-favorite-football-team-will-winDaniel GriffithStatistics: Another Weapon in the Galactic Patrol’s Arsenal
http://blog.minitab.com/blog/statistics-in-the-field/statistics-another-weapon-in-the-galactic-patrol%E2%80%99s-arsenal
<p><em><span style="line-height: 1.6;">by Matthew Barsalou, guest blogger. </span></em></p>
<p>E. E. Doc <a href="http://en.wikipedia.org/wiki/E._E._Smith" target="_blank">Smith</a>, one of the greatest authors ever, wrote many classic books such as <a href="http://en.wikipedia.org/wiki/Skylark_%28series%29" target="_blank">The Skylark of </a><a href="http://en.wikipedia.org/wiki/Skylark_%28series%29">Space</a> and his <a href="http://en.wikipedia.org/wiki/Lensman_series" target="_blank">Lensman</a> series. Doc Smith’s imagination knew no limits; his Galactic <a href="http://en.wikipedia.org/wiki/Galactic_Patrol" target="_blank">Patrol</a> had millions of combat fleets under its command and possessed planets turned into movable, armored weapons platforms. Some of the Galactic Patrol’s weapons may be well known. For example, there is the sunbeam, which concentrated the entire output of a sun’s energy into one beam.</p>
<p><span style="line-height: 1.6;"><img alt="amazing stories featuring E. E. "Doc" Smith" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0d1ef573ea1b75bd2e6364f219ec6a19/docsmithcover.png" style="border-width: 1px; border-style: solid; margin: 10px 15px; float: right; width: 296px; height: 400px;" />The Galactic Patrol also created the negasphere, a planet-sized dark matter/dark energy bomb that could eat through anything. I’ll go out on a limb and assume that they first created a container that could contain such a substance,</span><span style="line-height: 20.7999992370605px;"> </span><span style="line-height: 20.7999992370605px;">at least briefly</span><span style="line-height: 1.6;">.</span></p>
<p>When I read about such technology, I always have to wonder “How did they test it?” I can see where Minitab Statistical Software could be very helpful to the Galactic Patrol. How could the Galactic Patrol evaluate smaller, torpedo-sized units of negasphere? Suppose negasphere was created at the time of firing in a space torpedo and needed to be contained for the first 30 seconds after being fired, lest it break containment early and damage the ship that is firing it or rupture the torpedo before it reaches a space pirate.</p>
<p>The table below shows data collected from fifteen samples each of two materials that could be used for negasphere containment. Material 1 has a mean containment time of 33.951 seconds and Material 2 has a mean of 32.018 seconds. But is this difference statistically significant? Does it even matter?</p>
<table style="margin-left: auto; margin-right: auto;">
<tr><th>Material 1</th><th>Material 2</th></tr>
<tr><td>34.5207</td><td>32.1227</td></tr>
<tr><td>33.0061</td><td>31.9836</td></tr>
<tr><td>32.9733</td><td>31.9975</td></tr>
<tr><td>32.4381</td><td>31.9997</td></tr>
<tr><td>34.1364</td><td>31.9414</td></tr>
<tr><td>36.1568</td><td>32.0403</td></tr>
<tr><td>34.6487</td><td>32.1153</td></tr>
<tr><td>36.6436</td><td>31.9661</td></tr>
<tr><td>35.3177</td><td>32.0670</td></tr>
<tr><td>32.4043</td><td>31.9610</td></tr>
<tr><td>31.3107</td><td>32.0303</td></tr>
<tr><td>34.0913</td><td>32.0146</td></tr>
<tr><td>33.2040</td><td>31.9865</td></tr>
<tr><td>32.5601</td><td>32.0079</td></tr>
<tr><td>35.8556</td><td>32.0328</td></tr>
</table>
<p>The questions we're asking, and the type and distribution of the data we have, should determine the type of statistical test we perform. Many statistical tests for continuous data require an assumption of normality, and this can easily be checked in our <a href="http://www.minitab.com/products/minitab">statistical software</a> by going to <strong>Graphs > Probability Plot…</strong> and entering the columns containing the data.</p>
<p><span style="line-height: 1.6;"><img alt="probability plot of material 1" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ebd5796caf013f0204dbddc33c06df56/probability_plot1.png" style="width: 581px; height: 388px;" /></span></p>
<p><span style="line-height: 1.6;"><img alt="probability plot of material 2" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a8464e0302753942334c4e11d31482e5/probability_plot2.png" style="width: 580px; height: 388px;" /></span></p>
<p>The null hypothesis is “the data are normally distributed,” and the resulting P-values are greater than 0.05, so we <a href="http://blog.minitab.com/blog/understanding-statistics/things-statisticians-say-failure-to-reject-the-null-hypothesis">fail to reject the null hypothesis</a>. That means we can evaluate the data using tests that require the data to be normally distributed.</p>
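<p>As a cross-check outside Minitab (which uses probability plots with the Anderson-Darling statistic here), the Shapiro-Wilk test in Python's scipy tests the same normality null on the containment times from the table:</p>

```python
from scipy import stats

material_1 = [34.5207, 33.0061, 32.9733, 32.4381, 34.1364,
              36.1568, 34.6487, 36.6436, 35.3177, 32.4043,
              31.3107, 34.0913, 33.2040, 32.5601, 35.8556]
material_2 = [32.1227, 31.9836, 31.9975, 31.9997, 31.9414,
              32.0403, 32.1153, 31.9661, 32.0670, 31.9610,
              32.0303, 32.0146, 31.9865, 32.0079, 32.0328]

# Null hypothesis for each: the data are normally distributed
for name, data in [("Material 1", material_1), ("Material 2", material_2)]:
    stat, p = stats.shapiro(data)
    print(name, stat, p)
```

Note that different normality tests can disagree on borderline samples, so this is a sanity check rather than a replacement for the probability plots.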
<p>To determine if the mean of Material 1 is indeed greater than the mean of Material 2, we perform a two sample t-test: go to <strong>Stat > Basic Statistics > 2 Sample t…</strong> and select “Each sample in its own column.” We then choose “Options…” and select “Difference > hypothesized difference.”</p>
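<p>For readers following along in Python instead, the same one-sided comparison of means can be sketched with scipy's Welch t-test on the table's data:</p>

```python
from scipy import stats

material_1 = [34.5207, 33.0061, 32.9733, 32.4381, 34.1364,
              36.1568, 34.6487, 36.6436, 35.3177, 32.4043,
              31.3107, 34.0913, 33.2040, 32.5601, 35.8556]
material_2 = [32.1227, 31.9836, 31.9975, 31.9997, 31.9414,
              32.0403, 32.1153, 31.9661, 32.0670, 31.9610,
              32.0303, 32.0146, 31.9865, 32.0079, 32.0328]

# One-sided Welch test: is the mean of Material 1 greater than Material 2's?
t_stat, p = stats.ttest_ind(material_1, material_2,
                            equal_var=False, alternative="greater")
print(t_stat, p)
```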
<p><img alt="two-sample t-test and ci output" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3e270a93cceb77f6818345bcb41c9110/2_sample_t_test_output.png" style="width: 546px; height: 226px;" /></p>
<p><span style="line-height: 1.6;">The P-value for the two sample t-test is less than 0.05, so we can conclude there is a statistically significant difference between the materials. But the two sample t-test does not give us a complete picture of the situation, so we should look at the data by going to <strong>Graph > Individual Value Plot...</strong> and selecting a simple graph for multiple Y’s.</span></p>
<p><span style="line-height: 1.6;"><img alt="individual value plot " src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/96d713980aefd4912a402dc156802788/individual_value_plot1.png" style="width: 583px; height: 391px;" /></span></p>
<p>The mean of Material 1 may be higher, but our biggest concern is identifying a material that does not fail in 30 seconds or less. Material 2 appears to have far less variation, and we can assess this by performing an F-test: go to <strong>Stat > Basic Statistics > 2 Variances…</strong> and select “Each sample in its own column.” Then choose “Options…” and select “Ratio > hypothesized ratio.” The data are normally distributed, so put a checkmark next to “Use test and confidence intervals based on normal distribution.”</p>
<p><img alt="two variances test output" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/28aa53d2ec2582e3b41e29fb5f55331f/two_variances_test_output.png" style="width: 482px; height: 563px;" /></p>
<p>The P-value is less than 0.05, so we can conclude the evidence supports the alternative hypothesis that the variance of the first material is greater than the variance of the second material. Since we have already looked at a graph of the data, this should come as no surprise.</p>
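<p>The same variance comparison can be sketched directly in Python: form the ratio of sample variances and look up the upper tail of the F distribution.</p>

```python
import statistics
from scipy import stats

material_1 = [34.5207, 33.0061, 32.9733, 32.4381, 34.1364,
              36.1568, 34.6487, 36.6436, 35.3177, 32.4043,
              31.3107, 34.0913, 33.2040, 32.5601, 35.8556]
material_2 = [32.1227, 31.9836, 31.9975, 31.9997, 31.9414,
              32.0403, 32.1153, 31.9661, 32.0670, 31.9610,
              32.0303, 32.0146, 31.9865, 32.0079, 32.0328]

# F statistic: ratio of sample variances (alternative: var1 > var2)
f_stat = statistics.variance(material_1) / statistics.variance(material_2)
# Upper-tail F probability with 14 and 14 degrees of freedom
p = stats.f.sf(f_stat, len(material_1) - 1, len(material_2) - 1)
print(f_stat, p)
```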
<p>No statistical software program can tell us which material to choose, but Minitab can provide us with the information needed to make an informed decision. The objective is to exceed a lower specification limit of 30 seconds and the lower variability of Material 2 will achieve this better than the higher mean value for Material 1. Material 2 looks good, but the penalty for a wrong decision could be lost space ships if the negasphere breaches its containment too soon, so we must be certain.</p>
<p>The Galactic Patrol has millions of ships, so a failure rate of even one per million would be unacceptably high. We should therefore perform a capability study by going to <strong>Stat > Quality Tools > Capability Analysis > Normal…</strong> Enter the column containing the data for Material 1, use the same column for the subgroup size, and enter a lower specification of 30. Then repeat for Material 2.</p>
<p><img alt="process capability for material 1" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9c10d14f155707770eb3688aec834ca2/process_capability_report1.png" style="width: 635px; height: 476px;" /></p>
<p><img alt="Process Capability for Material 2" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/eaf12e0874c393730037cf504a90fa8f/process_capability_report2.png" style="width: 638px; height: 476px;" /></p>
<p><span style="line-height: 1.6;">Looking at the Minitab-generated capability studies, we can see that Material 1 can be expected to fail thousands of times per million uses, while Material 2 is not expected to fail at all. In spite of the higher mean of Material 1, the Galactic Patrol should use Material 2 for the negasphere torpedoes. </span></p>
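<p>Under the normality assumption the capability study relies on, the "expected failures per million" is just the normal tail area below the lower spec. A sketch using hypothetical means and standard deviations (not the article's estimates):</p>

```python
from scipy.stats import norm

LSL = 30  # lower specification limit, seconds

# Hypothetical (mean, standard deviation) pairs -- illustrative only
materials = {"Material 1": (38.5, 4.6), "Material 2": (34.4, 0.5)}

# Expected defects per million uses: the normal tail area below the LSL
ppm = {name: norm.cdf(LSL, loc=mu, scale=sigma) * 1e6
       for name, (mu, sigma) in materials.items()}
for name, rate in ppm.items():
    print(f"{name}: {rate:.2f} PPM below spec")
```

<p>With these illustrative numbers, the high-mean but high-variance material still produces tens of thousands of failures per million, while the low-variance material produces essentially none.</p>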
<div>
<p style="line-height: 20.7999992370605px;"><strong>About the Guest Blogger</strong></p>
<p style="line-height: 20.7999992370605px;"><em><a href="https://www.linkedin.com/pub/matthew-barsalou/5b/539/198" target="_blank">Matthew Barsalou</a> is a statistical problem resolution Master Black Belt at <a href="http://www.3k-warner.de/" target="_blank">BorgWarner</a> Turbo Systems Engineering GmbH. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with emphasis in international business, and has a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is author of the books <a href="http://www.amazon.com/Root-Cause-Analysis-Step---Step/dp/148225879X/ref=sr_1_1?ie=UTF8&qid=1416937278&sr=8-1&keywords=Root+Cause+Analysis%3A+A+Step-By-Step+Guide+to+Using+the+Right+Tool+at+the+Right+Time" target="_blank">Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time</a>, <a href="http://asq.org/quality-press/display-item/index.html?item=H1472" target="_blank">Statistics for Six Sigma Black Belts</a> and <a href="http://asq.org/quality-press/display-item/index.html?item=H1473&xvl=76115763" target="_blank">The ASQ Pocket Guide to Statistics for Six Sigma Black Belts</a>.</em></p>
</div>
Data AnalysisHypothesis TestingStatisticsTue, 03 Feb 2015 13:00:00 +0000http://blog.minitab.com/blog/statistics-in-the-field/statistics-another-weapon-in-the-galactic-patrol%E2%80%99s-arsenalGuest BloggerAnalyzing Qualitative Data, part 1: Pareto, Pie, and Stacked Bar Charts
http://blog.minitab.com/blog/applying-statistics-in-quality-projects/analyzing-qualitative-data-part-1-pareto-pie-and-stacked-bar-charts
<p>In several previous blogs, I have discussed the use of statistics for <a href="http://blog.minitab.com/blog/applying-statistics-in-quality-projects/using-nonparametric-analysis-to-visually-manage-durations-in-service-processes">quality improvement in the service sector</a>, which accounts for a very large part of the economy. Lately, when meeting with several people from financial companies, I realized that one of the problems they faced was that they were collecting large amounts of "qualitative" data: types of product, customer profiles, different subsidiaries, various customer requirements, etc.</p>
<p>There are several ways to process such qualitative data. Qualitative data points may still be counted, and once they have been counted they may be quantitatively (numerically) analyzed using statistical methods.</p>
<p>I will focus on the analysis of qualitative data using a simple and obvious example. In this case, we would like to analyze mistakes on invoices made during a period of several weeks by three employees (anonymously identified).</p>
<p><img src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/545c0823fc7368e795585c38424891d9/quali1.jpg" style="width: 288px; height: 273px;" /></p>
<p>I will present three different ways to analyze such qualitative data (counts). In this post, I will cover:</p>
<ol>
<li>A very simple graphical approach based on bar charts to display counts (stacked and clustered bars), Pareto diagrams, and pie charts.</li>
</ol>
<p>Then, in my next post, I will demonstrate: </p>
<ol start="2">
<li> A more complex approach for testing statistical significance using a Chi-square test.<br />
</li>
<li> An even more complex multivariate approach (using correspondence analysis).</li>
</ol>
<p>Again, the main purpose of this example is to show several ways to analyze qualitative data. Quantitative data represent numeric values such as the number of grams, dollars, newtons, etc., whereas qualitative data may represent text values such as different colours, types of defects or different employees.</p>
<p>The <a href="http://www.minitab.com/en-us/products/minitab/assistant/">Assistant</a> in Minitab 17 provides a great breakdown of two main data types: </p>
<p><img alt="" spellcheck="true" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/2fd46235529df11ab90d53efa677b706/quali2.jpg" style="width: 586px; height: 316px; border-width: 1px; border-style: solid;" /></p>
Charts and Diagrams with Qualitative Data
<p>I first created a pie chart using the Minitab Assistant (<strong>Assistant > Graphical Analysis</strong>) as well as a stacked bar chart on counts (<strong>Graph > Bar Charts</strong>) to describe the proportion of each type of mistake according to the day of the week.</p>
<p><img alt="" spellcheck="true" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/15ec9831d178df8fc0cbaddab0975c89/pie_chart_of_mistake_by_day___summary_report.jpg" style="width: 478px; height: 358px; border-width: 1px; border-style: solid;" /></p>
<p>In the pie charts above, the proportion of mistake types seems to be fairly similar across the different days of the week.</p>
<p> <img alt="" spellcheck="true" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/4b92a1293aff3f424d5a6f751653fb17/quali3.jpg" style="width: 403px; height: 302px; border-width: 1px; border-style: solid;" /></p>
<p>The number of mistakes also seems to be very stable and uniform across the days of the week, as the stacked bar chart above shows.</p>
<p><img alt="" spellcheck="true" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/c23dcf3e01cedf8aaad5bad176437ed2/quali4.jpg" style="width: 426px; height: 330px;" /></p>
<p>Now let's create a stacked bar chart on counts to analyze mistakes by employee. In this second graph, shown above, large variations in the number of errors occur from one employee to another. The distribution of errors also seems to be very different, with more “Product” errors associated with employee A.</p>
Qualitative Data in a Pareto Chart
<p><img alt="" spellcheck="true" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/30893b16e7ab4a75024498b7c3cf9fdf/pareto_chart_of_mistake_by_person___diagnostic_report.jpg" style="width: 768px; height: 547px;" /></p>
<p>Above we see <span style="line-height: 1.6;"><span><a href="http://blog.minitab.com/blog/understanding-statistics/explaining-quality-statistics-so-your-boss-will-understand-pareto-charts">Pareto charts</a></span> created using the Minitab Assistant: an overall Pareto and additional Pareto diagrams, one for each employee. Again, it's easy to identify the large number of “product” mistakes (red columns) for employee A.</span></p>
<span style="line-height: 1.6;">Stacked Bar Charts of Qualitative Data</span>
<p><img alt="" spellcheck="true" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/79589c080171780e682cbd69d3353a0e/quali6.jpg" style="width: 426px; height: 347px;" /></p>
<p><span style="line-height: 20.7999992370605px;">Mistake counts are represented as percentages in the s</span><span style="line-height: 1.6;">tacked bar chart above. For each employee the error types are summed up to obtain 100% (within each employee's column). This provides a clearer understanding of how each employee's mistakes are distributed. Again, the high percentage of “Product” errors (in yellow) for employee A is very noticeable, but also note the high percentage, proportionately, of “Address” mistakes (blue areas) for employee C.</span></p>
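<p>The within-employee percentages behind a 100% stacked bar chart can also be tallied directly. The mistake counts below are invented for illustration, not taken from the article's worksheet:</p>

```python
from collections import Counter

# Hypothetical (mistake type, employee) records -- illustrative counts only
records = ([("Product", "A")] * 14 + [("Address", "A")] * 4 + [("Price", "A")] * 3
           + [("Product", "B")] * 3 + [("Address", "B")] * 4 + [("Price", "B")] * 5
           + [("Product", "C")] * 2 + [("Address", "C")] * 8 + [("Price", "C")] * 3)

counts = Counter(records)
percent = {}
for emp in sorted({e for _, e in records}):
    # Each employee's mistake types sum to 100%, as in the stacked chart
    total = sum(n for (m, e), n in counts.items() if e == emp)
    percent[emp] = {m: round(100 * n / total, 1)
                    for (m, e), n in counts.items() if e == emp}
print(percent)
```

<p>With these invented counts, “Product” dominates employee A's column and “Address” dominates employee C's, mirroring the pattern described above.</p>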
<p><img alt="" spellcheck="true" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/9da688410bcb56f516061a4a26e64dfe/quali7.jpg" style="width: 434px; height: 346px;" /></p>
<p>The stacked bar chart above displays changes in the number of errors and in error types from week to week (time trends). Notice that in the last three weeks of the period, only product and address issues occurred: the error types appear to shift towards “product” and “address” mistakes at the end of the period.</p>
Different Views of the Data Give a More Complete Picture
<p>These diagrams do provide a clear picture of mistake occurrences according to employees, error types and weeks. However, as you've seen, it takes several graphs to provide a good understanding of the issue.</p>
<p>This is still a subjective approach, though: several people seated around the same table, looking at these same graphs, might interpret them differently, and in some cases this could result in endless discussions.</p>
<p>Therefore we would also like to use a more scientific and rigorous approach: the Chi-square test. <a href="http://blog.minitab.com/blog/applying-statistics-in-quality-projects/analyzing-qualitative-data-part-2-chi-square-and-multivariate-analysis">We'll cover that in my next post</a>. </p>
<p> </p>
Data AnalysisHypothesis TestingQuality ImprovementSix SigmaStatisticsStatsWed, 28 Jan 2015 13:00:00 +0000http://blog.minitab.com/blog/applying-statistics-in-quality-projects/analyzing-qualitative-data-part-1-pareto-pie-and-stacked-bar-chartsBruno ScibiliaWhat Are T Values and P Values in Statistics?
http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-are-t-values-and-p-values-in-statistics
<p>If you’re not a statistician, looking through statistical output can sometimes make you feel a bit like <em>Alice in</em> <em>Wonderland. </em>Suddenly, you step into a fantastical world where strange and mysterious phantasms appear out of nowhere. </p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/6f4053a89257952fef0b9998547dffe2/tweedle_tweedledum.jpg" style="line-height: 20.7999992370605px; float: right; width: 248px; height: 255px; margin: 10px 15px;" /></p>
<p>For example, consider the T and P in your t-test results.</p>
<p>“Curiouser and curiouser!” you might exclaim, like Alice, as you gaze at your output.</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/1e5a4c064f43f19169121222402e4560/t_test_results_one_sided.jpg" style="width: 467px; height: 121px;" /></p>
<p>What are these values, really? Where do they come from? Even if you’ve used the p-value to interpret the statistical significance of your results<span style="line-height: 20.7999992370605px;"> </span><span style="line-height: 20.7999992370605px;">umpteen times</span><span style="line-height: 1.6;">, its actual origin may remain murky to you.</span></p>
T & P: The Tweedledee and Tweedledum of a T-test
<p>T and P are inextricably linked. They go arm in arm, like Tweedledee and Tweedledum. Here's why.</p>
<p>When you perform a t-test, you're usually trying to find evidence of a significant difference between population means (2-sample t) or between the population mean and a hypothesized value (1-sample t). <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-is-a-t-test-and-why-is-it-like-telling-a-kid-to-clean-up-that-mess-in-the-kitchen">The t-value measures the size of the difference relative to the variation in your sample data</a>. Put another way, T is simply the calculated difference represented in units of standard error. The greater the magnitude of T (it can be either positive or negative), the greater the evidence <em>against </em>the null hypothesis that there is no significant difference. The closer T is to 0, the more likely there isn't a significant difference.</p>
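<p>That definition, the calculated difference expressed in standard-error units, can be computed by hand and cross-checked against a library routine. The 20-observation sample below is hypothetical:</p>

```python
import math
from statistics import mean, stdev

from scipy import stats

# Hypothetical sample of 20 observations; hypothesized mean of 5,
# as in the article's example
sample = [5.2, 6.1, 4.8, 5.9, 6.3, 5.5, 4.9, 6.0, 5.7, 5.4,
          6.2, 5.1, 5.8, 6.4, 5.0, 5.6, 6.1, 5.3, 5.9, 5.7]

# t = (sample mean - hypothesized mean) / standard error of the mean
t_by_hand = (mean(sample) - 5) / (stdev(sample) / math.sqrt(len(sample)))

# Cross-check against SciPy's one-sample t-test
t_scipy = stats.ttest_1samp(sample, popmean=5).statistic
print(round(t_by_hand, 3), round(t_scipy, 3))
```

<p>The two values agree, because the library is doing exactly the arithmetic in the formula.</p>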
<p>Remember, the t-value in your output is calculated from only one sample from the entire population. If you took repeated random samples of data from the same population, you'd get slightly different t-values each time, due to random sampling error (which is not a mistake of any kind, just the random variation expected in the data).</p>
<p>How different could you expect the t-values from many random samples from the same population to be? And how does the t-value from your sample data compare to those expected t-values?</p>
<p>You can use a t-distribution to find out.</p>
Using a t-distribution to calculate probability
<p>For the sake of illustration, assume that you're using a 1-sample t-test to determine whether the population mean is greater than a hypothesized value, such as 5, based on a sample of 20 observations, as shown in the above t-test output.</p>
<ol>
<li>In Minitab, choose <strong>Graph > Probability Distribution Plot</strong>.</li>
<li>Select <strong>View Probability</strong>, then click <strong>OK</strong>.</li>
<li>From <strong>Distribution</strong>, select <strong>t</strong>.</li>
<li>In <strong>Degrees of freedom</strong>, enter <em>19</em>. (For a 1-sample t test, the degrees of freedom equals the sample size minus 1).</li>
<li>Click <strong>Shaded Area</strong>. Select <strong>X Value</strong>. Select <strong>Right Tail</strong>.</li>
<li> In <strong>X Value</strong>, enter 2.8 (the t-value), then click <strong>OK</strong>.</li>
</ol>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/bc5183a42a169d45632fd4f6c0b153b3/distribution_plot_t_2.8" style="width: 576px; height: 384px;" /></p>
<p>The highest part (peak) of the distribution curve shows you where you can expect most of the t-values to fall. Most of the time, you’d expect to get t-values close to 0. That makes sense, right? Because if you randomly select representative samples from a population, the mean of most of those random samples from the population should be close to the overall population mean, making their differences (and thus the calculated t-values) close to 0.</p>
T values, P values, and poker hands
<p>T values of larger magnitudes (either negative or positive) are less likely. The far left and right "tails" of the distribution curve represent instances of obtaining extreme values of t, far from 0. For example, the shaded region represents the probability of obtaining a t-value of 2.8 or greater. Imagine a magical dart that could be thrown to land randomly anywhere under the distribution curve. What's the chance it would land in the shaded region? The calculated probability is 0.005712.....which rounds to 0.006...which is...the p-value obtained in the t-test results! <img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/5633b267494c2017d6d7c7544247d57d/poker_picture.jpg" style="float: right; width: 200px; height: 164px; margin: 10px 15px;" /></p>
<p>In other words, the probability of obtaining a t-value of 2.8 or higher, when sampling from the same population (here, a population with a hypothesized mean of 5), is approximately 0.006.</p>
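<p>One way to reproduce the shaded-area probability outside Minitab is the t-distribution's survival function, shown here in Python:</p>

```python
from scipy.stats import t

# Right-tail probability P(T >= 2.8) for a t-distribution with 19
# degrees of freedom, matching the shaded region in the plot above
p_right_tail = t.sf(2.8, df=19)
print(round(p_right_tail, 6))
```
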
<p>How likely is that? Not very! For comparison, the probability of being dealt 3-of-a-kind in a 5-card poker hand is over three times as high (≈ 0.021).</p>
<p>Given that the probability of obtaining a t-value this high or higher when sampling from this population is so low, what’s more likely? It’s more likely this sample doesn’t come from this population (with the hypothesized mean of 5). It's much more likely that this sample comes from a different population, one with a mean greater than 5.</p>
<p>To wit: Because the p-value is very low (< alpha level), you reject the null hypothesis and conclude that there's a statistically significant difference.</p>
<p>In this way, T and P are inextricably linked. Consider them simply different ways to quantify the "extremeness" of your results under the null hypothesis. You can’t change the value of one without changing the other.</p>
<p>The larger the absolute value of the t-value, the smaller the p-value, and the greater the evidence against the null hypothesis. (You can verify this by entering lower and higher t-values for the t-distribution in step 6 above.)</p>
Try this two-tailed follow up...
<p>The t-distribution example shown above is based on a one-tailed t-test to determine whether the mean of the population is greater than a hypothesized value. Therefore the t-distribution example shows the probability associated with the t-value of 2.8 only in one direction (the right tail of the distribution).</p>
<p>How would you use the t-distribution to find the p-value associated with a t-value of 2.8 for two-tailed t-test (in both directions)?</p>
<p><strong>Hint:</strong> In Minitab, adjust the options in step 5 to find the probability for both tails. If you don't have a copy of Minitab, download a free <a href="http://it.minitab.com/en-us/products/minitab/free-trial.aspx" target="_blank">30-day trial version</a>.</p>
Hypothesis TestingTue, 27 Jan 2015 13:10:00 +0000http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-are-t-values-and-p-values-in-statisticsPatrick RunkelA Minitab Holiday Tale: Featuring the Two Sample t-Test
http://blog.minitab.com/blog/statistics-in-the-field/a-minitab-holiday-tale-featuring-the-two-sample-t-test
<p><em><span style="line-height: 1.6;">by Matthew Barsalou, guest blogger</span></em></p>
<p>Aaron and Billy are two very competitive—and not always well-behaved—eight-year-old twin brothers. They constantly strive to outdo each other, no matter what the subject. If the boys are given a piece of pie for dessert, they each automatically want to make sure that their own piece of pie is bigger than the other’s piece of pie. This causes much exasperation, aggravation and annoyance for their parents. Especially when it happens in a restaurant (although the restaurant situation has improved, since they have been asked not to return to most local restaurants).</p>
<p><img alt="A bag of coal" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d2ccbe9f7c8e887281272ae49854893f/bag_of_coal.jpg" style="border-width: 1px; border-style: solid; margin: 10px 15px; float: right; width: 200px; height: 200px;" />Sending the boys to their rooms never helped. The two would just compete to see who could stay in their room longer. This Christmas their parents were at wits' ends, and they decided the boys needed to be taught a lesson so they could grow up to be upstanding citizens. Instead of the new bicycles the boys were going to get—and probably just race till they crashed anyway—their parents decided to give them each a bag of coal.</p>
<p>An astute reader might ask, “But what does this have to do with <a href="http://www.minitab.com/products/minitab">Minitab</a>?” Well, dear reader, the boys need to figure out who got the most coal. Immediately upon opening their packages, the boys carefully weighed each piece of coal and entered the data into Minitab.</p>
<p><span style="line-height: 1.6;">Then they selected <strong>Stat > Basic Statistics > Display Descriptive Statistics</strong> and used the "Statistics" options dialog to select the metrics they wanted, including the sum of the weights they'd entered:</span></p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dacaebac62e3cc4c2e29329d0a779720/descriptivestatistics.png" style="width: 600px; height: 208px;" /></p>
<p><span style="line-height: 1.6;">Billy quickly saw that he had the most coal, and yelled, “I have 279.383 ounces and you only have 272.896 ounces, and the mean of my pieces of coal is more than the mean of yours. Mine weigh more, so our parents must love me more.” </span></p>
<p><span style="line-height: 1.6;">“Not so fast,” said Aaron. “You may have a higher mean value, but is the difference statistically significant?” There was only one thing left for the boys to do: perform a <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/t-for-2-should-i-use-a-paired-t-or-a-2-sample-t">two sample t-test</a>.</span></p>
<p><span style="line-height: 1.6;">In Minitab, Aaron selected </span><strong><span style="line-height: 1.6;">Stat > Basic Statistics > 2-Sample t…</span></strong></p>
<p>The boys left the default values at a confidence level of 95.0 and a hypothesized difference of 0. The alternative hypothesis was “Difference ≠ hypothesized difference” because the only question they were asking was “Is there a statistically significant difference?” between the two data sets.</p>
<p>The two troublemakers also selected “Graphs” and checked the options to display an individual value plot and a boxplot. They knew they should look at their data. Having the graphs available would also make it easier for them to communicate their results to higher authorities, in this case, their poor parents.</p>
<p><img alt="Individual Value Plot of Coal" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/bf541d8df2461a8edff9060789394b00/individual_value_plot_of_coal.png" style="width: 577px; height: 385px;" /></p>
<p><img alt="Boxplot of Coal" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8945d7a038de654d008f68dc0a8886d3/boxplot_of_coal.png" style="width: 577px; height: 385px;" /></p>
<p>Both the individual value plots and boxplots showed that Aaron's bag of coal had the pieces with the highest individual weights. But he also had the pieces with the least weight, so the values for his Christmas coal were scattered across a wider range than the values for Billy's Christmas coal. But was there really a difference?</p>
<p>Billy went running for his tables of Student's t-scores so he could interpret the resulting t-value of -0.71. Aaron simply looked at the resulting p-value of 0.481. The p-value was greater than 0.05, so the boys could not conclude there was a true difference in the weight of their Christmas "presents."</p>
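<p>The boys' analysis can be mimicked with a Welch two-sample t-test in Python. These coal weights are invented to behave like the story's data (close totals, one bag more variable, no significant difference); they are not the numbers behind the output shown:</p>

```python
from scipy import stats

# Hypothetical coal weights in ounces -- invented to mimic the story,
# NOT the data behind the article's Minitab output
aaron = [26.1, 29.8, 24.5, 31.2, 27.0, 22.9, 30.4, 25.6, 28.3, 27.1]
billy = [27.5, 28.1, 27.9, 28.4, 27.2, 28.8, 27.6, 28.2, 27.8, 28.0]

# Welch's t-test: does not assume the two bags have equal variances
t_stat, p_value = stats.ttest_ind(aaron, billy, equal_var=False)
print(f"totals: {sum(aaron):.1f} vs {sum(billy):.1f} oz; "
      f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

<p>As in the story, a p-value above 0.05 means neither boy can claim his bag is truly heavier, however the totals compare.</p>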
<p><img alt="600" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/549762a9cb277536a76baedba32617d3/2_sample_t_test_coal.png" style="width: 683px; height: 305px;" /></p>
<p><span style="line-height: 1.6;">The boys dutifully reported the results, with illustrative graphs, each demanding that they get a little more to best the other. Clearly, receiving coal for Christmas had done nothing to reduce their level of competitiveness. Their parents realized the boys were probably not going to grow up to be upstanding citizens, but they may at least become good statisticians.</span></p>
<p>Happy Holidays.</p>
<p> </p>
<p style="line-height: 20.7999992370605px;"><strong>About the Guest Blogger</strong></p>
<p style="line-height: 20.7999992370605px;"><em><a href="https://www.linkedin.com/pub/matthew-barsalou/5b/539/198" target="_blank">Matthew Barsalou</a> is a statistical problem resolution Master Black Belt at <a href="http://www.3k-warner.de/" target="_blank">BorgWarner</a> Turbo Systems Engineering GmbH. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with emphasis in international business, and has a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is author of the books <a href="http://www.amazon.com/Root-Cause-Analysis-Step---Step/dp/148225879X/ref=sr_1_1?ie=UTF8&qid=1416937278&sr=8-1&keywords=Root+Cause+Analysis%3A+A+Step-By-Step+Guide+to+Using+the+Right+Tool+at+the+Right+Time" target="_blank">Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time</a>, <a href="http://asq.org/quality-press/display-item/index.html?item=H1472" target="_blank">Statistics for Six Sigma Black Belts</a> and <a href="http://asq.org/quality-press/display-item/index.html?item=H1473&xvl=76115763" target="_blank">The ASQ Pocket Guide to Statistics for Six Sigma Black Belts</a>.</em></p>
Fun StatisticsHypothesis TestingStatisticsTue, 23 Dec 2014 13:00:00 +0000http://blog.minitab.com/blog/statistics-in-the-field/a-minitab-holiday-tale-featuring-the-two-sample-t-testGuest BloggerAre Preseason Football or Basketball Rankings More Accurate?
http://blog.minitab.com/blog/the-statistics-game/are-preseason-football-or-basketball-rankings-more-accurate
<p>College basketball season tips off today, and for the second straight season Kentucky is the #1 ranked preseason team in the AP poll. Last year Kentucky did not live up to that ranking in the regular season, going 24-10 and earning a lowly 8 seed in the NCAA tournament. But then, in the tournament, they overachieved and made a run all the way to the championship game...before losing to Connecticut.</p>
<p>In football, Florida State was the AP poll preseason #1 football team. While they are currently still undefeated, they aren't quite playing like the #1 team in the country. So this made me wonder, which preseason rankings are more accurate, football or basketball?</p>
<p>I gathered <a href="//cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/File/1d3961db92c5ba14bc90b2b8323b95f8/preseason_basketball_vs__football_rankings.MTW">data</a> from the last 10 seasons, and recorded the top 10 teams in the preseason AP poll for both football and basketball. Then I recorded the difference between their preseason ranking and their final ranking. Both sports had 10 teams that weren’t ranked or receiving votes in the final poll, so I gave all of those teams a final ranking of 40.</p>
Creating a Histogram to Compare Two Distributions
<p>Let’s start with a histogram to look at the distributions of the differences. (It's always a good idea to look at the distribution of your data when you're starting an analysis, whether you're working with quality improvement data or sports data.) </p>
<p>You can create this graph in Minitab <a href="http://www.minitab.com/products/minitab">Statistical Software</a> by selecting <strong>Graph > Histograms</strong>, choosing "With Groups" in the dialog box, and using the Basketball Difference and Football Difference columns as the graph variables:</p>
<p><img alt="Histogram" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/53055c57978dbfa85d28688cc816c98a/histogram_of_basketball_difference__football_difference.jpg" style="width: 720px; height: 480px;" /></p>
<p>The differences in the rankings appear to be pretty similar. Most of the data is towards the left side of this histogram, meaning for most cases the difference between the preseason and final ranking is pretty small.</p>
Conducting a Mann-Whitney Hypothesis Test on Two Medians
<p>We can further investigate the data by performing a hypothesis test. Because the data is heavily skewed, I’ll use <a href="http://blog.minitab.com/blog/the-statistics-game/do-the-data-really-say-female-named-hurricanes-are-more-deadly">a Mann-Whitney test</a>. This compares the medians of two samples with similarly-shaped distributions, as opposed to a <a href="http://blog.minitab.com/blog/understanding-statistics/guidelines-and-how-tos-for-the-2-sample-t-test">2-sample t test</a>, which compares the means. <span style="line-height: 20.7999992370605px;">The median is the middle value of the data. Half the observations are less than or equal to it, and half the observations are greater than or equal to it.</span><span style="line-height: 20.7999992370605px;"> </span></p>
<p>To perform this test in our statistical software, we select <strong>Stat > Nonparametrics > Mann-Whitney</strong>, then choose the appropriate columns for our first and second sample: </p>
<p><img alt="Mann-Whitney Test" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/1a1f239841b82e60170e6ecbc8077d4b/mann_whitney.jpg" style="width: 689px; height: 241px;" /></p>
<p>The basketball rankings have a smaller median difference than the football rankings. However, when we examine the <a href="http://blog.minitab.com/blog/understanding-statistics/three-things-the-p-value-cant-tell-you-about-your-hypothesis-test">p-value</a> we see that this difference is not statistically significant. There is not enough evidence to conclude that one preseason poll is more accurate than the other.</p>
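<p>A Mann-Whitney comparison of two skewed samples can be sketched as follows; the ranking differences below are invented stand-ins, not the post's dataset:</p>

```python
from scipy.stats import mannwhitneyu

# Hypothetical preseason-minus-final ranking differences, right-skewed
# like the post's data (unranked teams coded as large differences)
basketball_diff = [0, 1, 1, 2, 2, 3, 4, 5, 8, 30]
football_diff = [0, 1, 2, 2, 3, 4, 5, 7, 10, 30]

# Two-sided test: are the two sets of differences centered differently?
u_stat, p_value = mannwhitneyu(basketball_diff, football_diff,
                               alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.3f}")
```

<p>As in the post, a large p-value here means there is no evidence that one poll's ranking differences are systematically larger than the other's.</p>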
<p>But what about the best teams? I grouped each of the top 3 ranked teams and looked at the median difference between their preseason and final rank.</p>
<p><img alt="Bar Chart" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/692a3db40dd5d3b4c20d539f92395629/bar_chart.jpg" style="width: 720px; height: 480px;" /></p>
<p>The preseason AP basketball poll has a smaller difference for the #1 and #3 ranked teams. But the football poll is better for the #2 team, having an impressive median value of 1. Overall, both polls are relatively good, as neither has a median value greater than 6. And the differences are close enough that we can’t conclude that one is more accurate than the other.</p>
What Does It Mean for the Teams?
<p>While the odds are against both Kentucky and Florida State to finish the season ranked #1 in their respective polls, previous seasons indicate that they’re still likely to finish as one of the top teams. This is better news for Kentucky, as being one of the top teams means they’ll easily make the NCAA basketball tournament and get a high seed. However, Florida State must finish as one of the top 4 teams, or else they’ll miss out on the football postseason completely.</p>
<p>So while we can’t conclude one poll is better than the other, teams at the top of the AP basketball poll are clearly much more likely to reach the postseason than teams at the top of the football poll.</p>
Data AnalysisFun StatisticsHypothesis TestingStatistics in the NewsFri, 14 Nov 2014 15:03:33 +0000http://blog.minitab.com/blog/the-statistics-game/are-preseason-football-or-basketball-rankings-more-accurateKevin Rudy