Hypothesis Testing | Minitab
Blog posts and articles about hypothesis testing, especially in the course of Lean Six Sigma quality improvement projects.
http://blog.minitab.com/blog/hypothesis-testing-2/rss
Wed, 01 Apr 2015 13:08:09 +0000

How Could You Benefit from a Box-Cox Transformation?
http://blog.minitab.com/blog/applying-statistics-in-quality-projects/how-could-you-benefit-from-a-box-cox-transformation
<p>Imagine that you are watching a race and that you are located close to the finish line. When the first and fastest runners complete the race, the differences in times between them will probably be quite small.</p>
<p>Now wait until the last runners arrive and consider their finishing times. For these slowest runners, the differences in completion times will be extremely large. This is due to the fact that for longer racing times a small difference in speed will have a significant impact on completion times, whereas for the fastest runners, small differences in speed will have a small (but decisive) impact on arrival times.</p>
<p>This phenomenon is called “<a href="http://blog.minitab.com/blog/adventures-in-statistics/curing-heteroscedasticity-with-weighted-regression-in-minitab-statistical-software">heteroscedasticity</a>” (non-constant variance). In this example, the amount of variation depends on the average value (small variations for shorter completion times, large variations for longer times).</p>
<p><img alt="" spellcheck="true" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/cb6a4d6b498b3525a6d18d579db20557/race.JPG" style="width: 781px; height: 120px;" /></p>
<p>This distribution of running times will probably not follow the familiar bell-shaped curve (a.k.a. the normal distribution). The resulting distribution will be asymmetrical, with a longer tail on the right side: there is small variability (a short tail) for the shorter running times on the left, and larger variability (hence the longer tail) for the longer running times on the right.</p>
<p><img alt="" spellcheck="true" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/a70fc75a0255884e65f231ac072d42c2/distribution_plot.jpg" style="width: 578px; height: 344px;" /></p>
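<p>This kind of skew is easy to reproduce in a simulation. The sketch below (Python with NumPy/SciPy, not part of the original post; the 90-minute base pace and spread are made-up numbers) draws finishing times from a multiplicative model, so slower runners spread out far more than faster ones, and confirms the long right tail:</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical race: a multiplicative speed factor around a 90-minute base pace,
# so variability grows with the mean finishing time.
times = 90 * np.exp(rng.normal(0.0, 0.25, size=5000))

print(stats.skew(times) > 0)                     # True: long right tail
print(stats.shapiro(times[:500]).pvalue < 0.05)  # True: clearly non-normal
```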
<p>Why does this matter?</p>
<ul>
<li>Model bias and spurious interactions: If you are performing a regression or a design of experiments (any statistical modelling), this asymmetrical behavior may bias the model. Because variability grows with the mean, many factors will appear to have stronger effects when the mean is larger. This is not a true factor effect, but an artifact of the increased variability, which inflates all factor effect estimates at larger means. The non-constant variation will also tend to generate spurious interactions, resulting in a very complex model with many unrealistic interaction terms.</li>
<li>Biased capability estimates: A standard capability analysis is based on the normality assumption, so a substantial departure from normality will bias your capability estimates.</li>
</ul>
The Box-Cox Transformation
<p>One solution to this is to transform your data into normality using a Box-Cox transformation. Minitab will select the best mathematical function for this transformation. The objective is to obtain a normal distribution and a constant variance for the transformed data.</p>
<p>Consider the asymmetrical distribution below:</p>
<p><img alt="" spellcheck="true" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/796a7b0d27c6613ac17f983c839701e5/transformed_distribution.jpg" style="width: 515px; height: 326px;" /></p>
<p> <img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/9b0c5998839682f685055bb5168ab540/log_function.JPG" style="width: 437px; height: 313px;" /></p>
<p>If a logarithmic transformation is applied to this distribution, the differences between smaller values will be expanded (because the slope of the logarithmic function is steeper when values are small), whereas the differences between larger values will be reduced (because of the very moderate slope of the log function for larger values). If you inflate the differences in the left tail and reduce the differences in the right tail, the result will be a symmetrical normal distribution, with a variance that is now constant whatever the mean. This is why the <a href="http://www.minitab.com/products/minitab">Minitab Assistant</a> suggests a Box-Cox transformation whenever this is possible for non-normal data, and why the Minitab regression and DOE (design of experiments) dialog boxes offer the Box-Cox transformation as an option for anyone who needs to transform residual data into normality.</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/fc4ec75ca192d81dabf8556ebdf751e8/transformation.JPG" style="width: 430px; height: 611px;" /></p>
<p>The diagram above illustrates how, thanks to a Box-Cox transformation performed by the Minitab Assistant (in a capability analysis), an asymmetrical distribution has been transformed into a normal, symmetrical distribution (with a successful normality test).</p>
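<p>Outside Minitab, the same idea can be sketched with SciPy, whose <code>boxcox</code> function estimates the transformation parameter lambda by maximum likelihood (the lognormal toy data here is my own, chosen so that lambda should land near 0, which corresponds to a log transform):</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
skewed = rng.lognormal(mean=3.0, sigma=0.5, size=300)  # right-skewed, strictly positive

transformed, lam = stats.boxcox(skewed)  # lambda estimated by maximum likelihood

print(f"estimated lambda: {lam:.2f}")                  # near 0: essentially a log transform
print(stats.normaltest(skewed).pvalue < 0.05)          # True: raw data reject normality
print(stats.normaltest(transformed).pvalue)            # much larger after transformation
```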
Box-Cox Transformation and Variable Scale
<p>Note that Minitab will search for the best transformation function, which may not necessarily be a logarithmic transformation.</p>
<p>As a result of this transformation, the physical scale of your variable may be altered. When looking at a capability graph, you may not recognize the typical values of your variable on the transformed scale. However, the estimated Pp and Ppk capability indices will be reliable, because they are based on a normal distribution. Similarly, in a regression model, be aware that the coefficients will be modified, although the transformation remains useful for removing spurious interactions and identifying the factors that are truly significant.</p>
Data Analysis, Design of Experiments, Hypothesis Testing, Learning, Quality Improvement, Regression Analysis, Statistics, Statistics Help, Stats
Mon, 30 Mar 2015 12:00:00 +0000
http://blog.minitab.com/blog/applying-statistics-in-quality-projects/how-could-you-benefit-from-a-box-cox-transformation
Bruno Scibilia

How to Create a Graphical Version of the 1-sample t-Test in Minitab
http://blog.minitab.com/blog/adventures-in-statistics/how-to-create-a-graphical-version-of-the-1-sample-t-test-in-minitab
<p>This is a companion post for a series of blog posts about understanding hypothesis tests. In this series, I create a graphical equivalent to a 1-sample t-test and confidence interval to help you understand how it works more intuitively.</p>
<p>This post focuses entirely on the steps required to create the graphs. It’s a fairly technical and task-oriented post designed for those who need to create the graphs for illustrative purposes. If you’d instead like to gain a better understanding of the concepts behind the graphs, please see the following posts:</p>
<ul>
<li><a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics" target="_blank">Understanding Hypothesis Tests: Why We Need to Use Hypothesis Tests</a></li>
<li><a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-significance-levels-alpha-and-p-values-in-statistics" target="_blank">Understanding Hypothesis Tests: The Significance Level and P Values</a></li>
<li>Understanding Hypothesis Tests: Confidence Intervals (forthcoming)</li>
</ul>
<p>To create the following graphs, we’ll use Minitab’s probability distribution plots in conjunction with several statistics obtained from the 1-sample t output. If you’d like more information about the formulas that are involved, you can find them in Minitab at: <strong>Help > Methods and Formulas > Basic Statistics > 1-Sample t</strong>.</p>
<p>The data for this example is <a href="http://support.minitab.com/datasets/FamilyEnergyCost.MTW">FamilyEnergyCost</a> and it is just one of the many data set examples that can be found in <a href="http://support.minitab.com/datasets/">Minitab’s Data Set Library</a>. We’ll perform the regular 1-sample t-test with a null hypothesis mean of 260, and then graphically recreate the results. </p>
<p><img alt="1-sample t-test output from Minitab statistical software" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/e26965956d7d682888dd0c749e10f7af/1t_swo.png" style="width: 485px; height: 123px;" /></p>
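<p>The key numbers in this output can be reproduced from the summary statistics alone. A quick sketch in Python/SciPy (not part of the original post), using the n = 25, mean = 330.6, and SE Mean = 30.8 shown above:</p>

```python
from scipy import stats

n, xbar, se, mu0 = 25, 330.6, 30.8, 260        # from the 1-sample t output above

t_stat = (xbar - mu0) / se                     # (330.6 - 260) / 30.8
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)

print(f"T = {t_stat:.2f}, P = {p_value:.3f}")  # T = 2.29, P = 0.031
```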
How to Graph the Two-Tailed Critical Region for a Significance Level of 0.05
<p>To create a graphical equivalent to a 1-sample t-test, we’ll need to graph the t-distribution using the correct number of degrees of freedom. For a 1-sample t-test, the degrees of freedom equals the sample size minus 1. So, that’s 24 degrees of freedom for our sample of 25.</p>
<ol>
<li>In Minitab, choose: <strong>Graph > Probability Distribution Plot > View Probability</strong>.</li>
<li>In <strong>Distribution</strong>, select <strong>t</strong>.</li>
<li>In <strong>Degrees of freedom</strong>, enter <em>24</em>.</li>
<li>Click the <strong>Shaded Area</strong> tab.</li>
<li>In <strong>Define Shaded Area By</strong>, select <strong>Probability</strong> and <strong>Both Tails</strong>.</li>
<li>In <strong>Probability</strong>, enter <em>0.05</em>.</li>
<li>Click <strong>OK</strong>.</li>
</ol>
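<p>The end points of the shaded critical region that Minitab draws can also be computed directly from the t-distribution; a one-line sketch in Python/SciPy:</p>

```python
from scipy import stats

alpha, df = 0.05, 24
t_crit = stats.t.ppf(1 - alpha / 2, df)      # upper end of the two-tailed critical region

print(f"reject H0 when |t| > {t_crit:.3f}")  # |t| > 2.064
```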
<p>You should see this graph.</p>
<p><img alt="Probability distribution plot of t-values" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/4201bc7734483312e6056e023f12272b/t_value_plot_crtical_region.png" style="width: 576px; height: 384px;" /></p>
<p>This graph shows the distribution of t-values for a sample of our size with the t-values for the end points of the critical region. The <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/tests-of-means/what-is-a-t-value/" target="_blank">t-value</a> for our sample mean is 2.29 and it falls within the critical region.</p>
<p>For my blog posts, I thought displaying the x-axis in the same units as our measurement variable (energy costs) would make the graph easier to understand. To do this, we need to transform the x-axis scale from t-values to energy costs.</p>
<p>Transforming the t-values to energy costs for a distribution centered on the null hypothesis mean requires a simple calculation:</p>
<p style="margin-left: 40px;">Energy Cost = Null Hypothesis Mean + (t-value * SE Mean)</p>
<p>We’ll use the null hypothesis value that we entered in the dialog box (260) and the SE Mean value that appears in the 1-sample t-test output (30.8). We need to calculate the energy cost values for all of the t-values that will appear on the x-axis (-4 to +4).</p>
<p>For example, a t-value of 1 equals 290.8 (260 + (1 * 30.8)). Zero is the null hypothesis value, which is 260.</p>
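<p>If you prefer not to do this by hand, the whole set of axis labels follows from the same formula; a plain-Python sketch using the 260 and 30.8 from the output:</p>

```python
mu0, se = 260, 30.8  # null hypothesis mean and SE Mean from the 1-sample t output

# Energy-cost label for each t-value tick on the x-axis (-4 to +4)
labels = {t: mu0 + t * se for t in range(-4, 5)}
for t, cost in labels.items():
    print(f"t = {t:+d}  ->  energy cost = {cost:.1f}")  # e.g. t = +1 gives 290.8
```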
<p>Next, we need to replace the t-values with the energy cost equivalents.</p>
<ol>
<li>Choose <strong>Editor > Select Item > X Scale</strong>.</li>
<li>Choose <strong>Editor > Edit X Scale</strong>.</li>
<li>In <strong>Major Tick Position</strong>, choose <strong>Number of Ticks</strong> and enter <em>9</em>.</li>
<li>Click the <strong>Show</strong> tab and check the <strong>Low</strong> check box for <strong>Major ticks</strong> and <strong>Major tick labels</strong>.</li>
<li>Click the <strong>Labels</strong> tab of the dialog box that appears. Enter the energy cost values that you calculated as shown below. I use rounded values to keep the x-axis tidy. Click <strong>OK</strong>.</li>
</ol>
<p><img alt="Dialog box for showing the transformed values on the x-scale" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/f96d38fa628633a7e4ff3a12b44483c0/edit_scale_dialog.png" style="width: 400px; height: 348px;" /></p>
<p>You should see this graph. To clean up the x-axis, I had to delete the t-values that were still showing from before. Simply click each t-value once and press the <strong>Delete</strong> key.</p>
<p><img alt="Probability distribution plot of t-distribution with the x-scale transformed to energy costs" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/69de6d5f41cf0703764895b379c5d3fb/t_value_plot_crtical_region2.png" style="width: 576px; height: 384px;" /></p>
<p>Let’s add a reference line to show where our sample mean falls within the sampling distribution and critical region. The trick here is that the x-axis still uses t-values despite displaying the energy costs. We need to use the t-value for our sample mean that appears in the 1-sample t output (2.29).</p>
<ol>
<li>Choose <strong>Editor > Add > Reference Lines</strong>.</li>
<li>In <strong>Show reference lines at X values</strong>, enter<em> 2.29.</em></li>
<li>Click <strong>OK</strong>.</li>
<li>Double click the <em>2.29</em> that now appears on the graph.</li>
<li>In the dialog box that appears, enter <em>330.6</em> in <strong>Text</strong>.</li>
<li>Click <strong>OK</strong>.</li>
</ol>
<p>After editing the title and the x-axis label, you should have a graph similar to the one below.</p>
<p><img alt="Probability distribution plot with two-tailed critical region for a significance level of 0.05" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/212878044412db4ec165745b18c010e8/sig_level_05.png" style="width: 576px; height: 384px;" /></p>
How to Graph the P Value for a 1-sample t-Test
<p>To do this, we’ll duplicate the graph we created above and then modify it. This allows us to reuse some of the work that we’ve already done.</p>
<ol>
<li>Make sure the graph we created is selected.</li>
<li>Choose <strong>Editor > Duplicate Graph</strong>.</li>
<li>Double click the blue distribution curve on the graph.</li>
<li>Click the <strong>Shaded Area</strong> tab in the dialog box that appears.</li>
<li>In <strong>Define Shaded Area By</strong>, select <strong>X Value</strong> and <strong>Both Tails</strong>.</li>
<li>In <strong>X value</strong>, enter <em>2.29</em>.</li>
<li>Click <strong>OK</strong>.</li>
</ol>
<p>You’ll need to edit the graph title and delete some extra numbers on the x-axis. After these edits, you should have a graph similar to this one.</p>
<p><img alt="Probability distribution plot that displays the p-value for our sample mean" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/4a599dfe53a1c065837de772a5b157fb/p_value.png" style="width: 576px; height: 384px;" /></p>
How to Graph the Confidence Interval for a 1-sample t-test
<p>To graphically recreate the confidence interval, we’ll need to start from scratch for this graph. </p>
<ol>
<li>In Minitab, choose: <strong>Graph > Probability Distribution Plot > View Probability</strong>.</li>
<li>In <strong>Distribution</strong>, select <strong>t</strong>.</li>
<li>In <strong>Degrees of freedom</strong>, enter <em>24</em>.</li>
<li>Click the <strong>Shaded Area</strong> tab.</li>
<li>In <strong>Define Shaded Area By</strong>, select <strong>Probability</strong> and <strong>Middle</strong>.</li>
<li>Enter <em>0.025</em> in both <strong>Probability 1</strong> and <strong>Probability 2</strong>.</li>
<li>Click <strong>OK</strong>.</li>
</ol>
<p>Your graph should look like this:</p>
<p><img alt="Probability distribution plot that represents a confidence interval with t-values" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/9ec475422d97330eb236d76f37ee576a/ci_t_values.png" style="width: 576px; height: 384px;" /></p>
<p>Like before, we’ll need to transform the x-axis into energy costs. For this graph, I’ll only display the x-values for the end points of the confidence interval and the sample mean. So, we need to convert the three t-values of -2.064, 0, 2.064.</p>
<p>The equation to transform the t-values to energy costs for a distribution centered on the sample mean is:</p>
<p style="margin-left: 40px;">Energy Cost = Sample Mean + (t-score * SE Mean)</p>
<p>We obtain the following rounded values that represent the lower confidence limit, sample mean, and upper confidence limit: 267, 330.6, 394.</p>
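<p>Those limits follow directly from the formula above; a quick check in Python/SciPy (values taken from the 1-sample t output):</p>

```python
from scipy import stats

n, xbar, se = 25, 330.6, 30.8
t_crit = stats.t.ppf(0.975, df=n - 1)          # 2.064 for a 95% interval

lower, upper = xbar - t_crit * se, xbar + t_crit * se
print(f"95% CI: ({lower:.0f}, {upper:.0f})")   # (267, 394)
```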
<p>Simply double-click each value on the x-axis to edit its label, replacing the t-value with the corresponding energy cost. After editing the graph title, you should have a visual representation of the confidence interval that looks like this (the confidence limits are rounded).</p>
<p><img alt="Probability distribution plot that displays a visual representation of a 95% confidence interval around the sample mean" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/80de5f2397507752d74ffff86fbd94ea/ci_sample_mean.png" style="width: 576px; height: 384px;" /></p>
Consider Using Minitab's Command Language
<p>When I create multiple graphs that involve many steps, I generally use Minitab's command language. This may sound daunting if you're not familiar with it, but Minitab makes it easier for you.</p>
<p>After you create one graph, choose <strong>Editor > Copy Command Language</strong>, and paste it into a text editor, such as Notepad. Save the file with the extension *.mtb and you have a Minitab Exec file. This Exec file contains all of the edits you made. Now, you can easily create similar graphs simply by modifying the parts that you want to change.</p>
<p>You can also get help for the command language right in Minitab. First, make sure the command prompt is enabled by choosing <strong>Editor > Enable Commands</strong>. At the prompt, type <em>help dplot</em>, and Minitab displays the help specific to probability distribution plots!</p>
<p>To run an exec file, choose <strong>File > Other Files > Run an Exec</strong>. Click <strong>Select File</strong> and browse to the file you saved. Here are the MTB files for my graphs for the <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/File/e4994557f813b872b03687363259faa2/prob_plot_alpha.mtb">critical region</a>, <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/File/3a0e83912f4826db06ee8c0777a5cf73/prob_plot_p.mtb">P value</a>, and <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/File/00f225aa85ce31c80cfae24c1704fa1c/ci_sample.mtb">confidence interval</a>.</p>
<p>Happy graphing!</p>
Hypothesis Testing, Six Sigma
Wed, 25 Mar 2015 12:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics/how-to-create-a-graphical-version-of-the-1-sample-t-test-in-minitab
Jim Frost

Understanding Hypothesis Tests: Significance Levels (Alpha) and P values in Statistics
http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-significance-levels-alpha-and-p-values-in-statistics
<p>What do significance levels and P values mean in hypothesis tests? What <em>is </em>statistical significance anyway? In this post, I’ll continue to focus on concepts and graphs to help you gain a more intuitive understanding of how hypothesis tests work in statistics.</p>
<p>To bring it to life, I’ll add the significance level and P value to the graph in my previous post in order to perform a graphical version of the 1-sample t-test. It’s easier to understand when you can see what statistical significance truly means!</p>
<p>Here’s where we left off in <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics" target="_blank">my last post</a>. We want to determine whether our sample mean (330.6) indicates that this year's average energy cost is significantly different from last year’s average energy cost of $260.</p>
<p><img alt="Descriptive statistics for the example" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/135cd05bde9f7f16ab396a8525d2b09c/desc_stats.png" style="width: 302px; height: 87px;" /></p>
<p><img alt="Probability distribution plot for our example" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/76699c4c1f2bd6c83b88c1ac8e93aa54/sampling_dist_null.png" style="width: 595px; height: 397px;" /></p>
<p>The graph above shows the distribution of sample means we’d obtain under the assumption that the null hypothesis is true (population mean = 260) and we repeatedly drew a large number of random samples.</p>
<p>I left you with a question: where do we draw the line for statistical significance on the graph? Now we'll add in the significance level and the P value, which are the decision-making tools we'll need.</p>
<p>We'll use these tools to test the following hypotheses:</p>
<ul>
<li>Null hypothesis: The population mean equals the hypothesized mean (260).</li>
<li>Alternative hypothesis: The population mean differs from the hypothesized mean (260).</li>
</ul>
What Is the Significance Level (Alpha)?
<p>The significance level, also denoted as alpha or α, is the probability of rejecting the null hypothesis when it is true. For example, a significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference.</p>
<p>These types of definitions can be hard to understand because of their technical nature. A picture makes the concepts much easier to comprehend!</p>
<p>The significance level determines how far out from the null hypothesis value we'll draw that line on the graph. To graph a significance level of 0.05, we need to shade the 5% of the distribution that is furthest away from the null hypothesis.</p>
<p><img alt="Probability plot that shows the critical regions for a significance level of 0.05" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/212878044412db4ec165745b18c010e8/sig_level_05.png" style="width: 595px; height: 397px;" /></p>
<p>In the graph above, the two shaded areas are equidistant from the null hypothesis value and each area has a probability of 0.025, for a total of 0.05. In statistics, we call these shaded areas the <em>critical region</em> for a two-tailed test. If the population mean is 260, we’d expect to obtain a sample mean that falls in the critical region 5% of the time. The critical region defines how far away our sample statistic must be from the null hypothesis value before we can say it is unusual enough to reject the null hypothesis.</p>
<p>Our sample mean (330.6) falls within the critical region, which indicates it is statistically significant at the 0.05 level.</p>
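<p>In energy-cost units, the boundaries of that critical region are easy to verify; a sketch in Python/SciPy (not part of the original post) using the null mean of 260 and the SE Mean of 30.8 from this example:</p>

```python
from scipy import stats

mu0, se, df = 260, 30.8, 24
t_crit = stats.t.ppf(0.975, df)  # 2.064 at the 0.05 significance level

low, high = mu0 - t_crit * se, mu0 + t_crit * se
print(f"critical region: below {low:.1f} or above {high:.1f}")  # 196.4 / 323.6
print(330.6 > high)  # True: the sample mean falls in the critical region
```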
<p>We can also see if it is statistically significant using the other common significance level of 0.01.</p>
<p><img alt="Probability plot that shows the critical regions for a significance level of 0.01" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/8744b853f28396be001c2ee9678a9c14/sig_level_01.png" style="width: 595px; height: 397px;" /></p>
<p>The two shaded areas each have a probability of 0.005, which adds up to a total probability of 0.01. This time our sample mean does not fall within the critical region and we fail to reject the null hypothesis. This comparison shows why you need to choose your significance level before you begin your study. It protects you from choosing a significance level because it conveniently gives you significant results!</p>
<p>Thanks to the graph, we were able to determine that our results are statistically significant at the 0.05 level without using a P value. However, when you use the numeric output produced by <a href="http://www.minitab.com/en-us/products/minitab/features/" target="_blank">statistical software</a>, you’ll need to compare the P value to your significance level to make this determination.</p>
What Are P values?
<p>P-values are the probability of obtaining an effect at least as extreme as the one in your sample data, assuming the truth of the null hypothesis.</p>
<p>This definition of P values, while technically correct, is a bit convoluted. It’s easier to understand with a graph!</p>
<p>To graph the P value for our example data set, we need to determine the distance between the sample mean and the null hypothesis value (330.6 - 260 = 70.6). Next, we can graph the probability of obtaining a sample mean that is at least as extreme in both tails of the distribution (260 +/- 70.6).</p>
<p><img alt="Probability plot that shows the p-value for our sample mean" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/4a599dfe53a1c065837de772a5b157fb/p_value.png" style="width: 595px; height: 397px;" /></p>
<p>In the graph above, the two shaded areas each have a probability of 0.01556, for a total probability of 0.03112. This probability represents the likelihood of obtaining a sample mean that is at least as extreme as our sample mean in both tails of the distribution if the population mean is 260. That’s our P value!</p>
<p>When a P value is less than or equal to the significance level, you reject the null hypothesis. If we take the P value for our example and compare it to the common significance levels, it matches the previous graphical results. The P value of 0.03112 is statistically significant at an alpha level of 0.05, but not at the 0.01 level.</p>
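<p>The P value shown in the graph can be checked directly against the t-distribution; a short sketch in Python/SciPy:</p>

```python
from scipy import stats

mu0, se, df = 260, 30.8, 24
t_stat = (330.6 - mu0) / se        # 2.29
one_tail = stats.t.sf(t_stat, df)  # area in each shaded tail, roughly 0.0155
p_value = 2 * one_tail

print(round(p_value, 3))           # 0.031: significant at 0.05, but not at 0.01
```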
<p>If we stick to a significance level of 0.05, we can conclude that the average energy cost for the population differs from 260; because our sample mean is higher, the evidence points to costs greater than 260.</p>
<p>A common mistake is to interpret the P-value as the probability that the null hypothesis is true. To understand why this interpretation is incorrect, please read my blog post <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values">How to Correctly Interpret P Values</a>.</p>
Discussion about Statistically Significant Results
<p>A hypothesis test evaluates two mutually exclusive statements about a population to determine which statement is best supported by the sample data. A test result is statistically significant when the sample statistic is unusual enough relative to the null hypothesis that we can reject the null hypothesis for the entire population. “Unusual enough” in a hypothesis test is defined by:</p>
<ul>
<li>The assumption that the null hypothesis is true—the graphs are centered on the null hypothesis value.</li>
<li>The significance level—how far out do we draw the line for the critical region?</li>
<li>Our sample statistic—does it fall in the critical region?</li>
</ul>
<p>Keep in mind that there is no magic significance level that distinguishes between the studies that have a true effect and those that don’t with 100% accuracy. The common alpha values of 0.05 and 0.01 are simply based on tradition. For a significance level of 0.05, expect to obtain sample means in the critical region 5% of the time when <em>the</em> <em>null hypothesis is</em> <em>true</em>. In these cases, you won’t know that the null hypothesis is true but you’ll reject it because the sample mean falls in the critical region. That’s why the significance level is also referred to as an <em>error</em> rate!</p>
<p>This <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/basics/type-i-and-type-ii-error/" target="_blank">type of error</a> doesn’t imply that the experimenter did anything wrong or require any other unusual explanation. The graphs show that when the null hypothesis is true, it is possible to obtain these unusual sample means for no reason other than random sampling error. It’s just luck of the draw.</p>
<p>Significance levels and P values are important tools that help you quantify and control this type of error in a hypothesis test. Using these tools to decide when to reject the null hypothesis increases your chance of making the correct decision.</p>
<p>In my next post, I’ll continue to use this graphical framework to help you understand confidence intervals!</p>
<p>If you'd like to see how I made these graphs, please read: <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-create-a-graphical-version-of-the-1-sample-t-test-in-minitab" target="_blank">How to Create a Graphical Version of the 1-sample t-Test</a>.</p>
Data Analysis, Hypothesis Testing, Learning, Statistics, Statistics Help, Stats
Thu, 19 Mar 2015 12:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-significance-levels-alpha-and-p-values-in-statistics
Jim Frost

P-value Roulette: Making Hypothesis Testing a Winner’s Game
http://blog.minitab.com/blog/rkelly/p-value-roulette-making-hypothesis-testing-a-winner%E2%80%99s-game
<p>Welcome to the Hypothesis Test Casino! The featured game of the house is roulette. But this is no <em>ordinary</em> game of roulette. This is p-value roulette!</p>
<p>Here’s how it works: We have two roulette wheels, the Null wheel and the Alternative wheel. Each wheel has 20 slots (instead of the usual 37 or 38). You get to bet on one slot.</p>
<p><img alt="http://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Edvard_Munch_-_At_the_Roulette_Table_in_Monte_Carlo_-_Google_Art_Project.jpg/256px-Edvard_Munch_-_At_the_Roulette_Table_in_Monte_Carlo_-_Google_Art_Project.jpg" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8647ae2930d63e128d09f0b2cc5cdb87/p_value_roulette.jpg" style="line-height: 20.7999992370605px; border-width: 1px; border-style: solid; margin: 10px 15px; width: 256px; height: 166px; float: right;" /></p>
<p>What happens if the ball lands in the slot you bet on? Well, that depends on which wheel we spin. If we spin the Null wheel, you lose your bet. But if we spin the Alternative wheel, you win!</p>
<p>I’m sorry, but we can’t tell you which wheel we’re spinning.</p>
<p>Doesn’t that sound like a good game?</p>
<p>Not convinced yet? I assure you the odds are in your favor <em>if </em>you choose your slot wisely. Look, I’ll show you a graph of some data from the Null wheel. We spun it 10,000 times and counted how many times the ball landed in each slot. As you can see, each slot is just as likely as any other, with a probability of about 0.05 each. That means there’s a 95% probability the ball won’t land on your slot, so you have only a 5% chance of losing—no matter what—<em>if</em> we happen to spin the Null wheel.</p>
<p><img alt="histogram of p values for null hypothesis" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dc5efcd7001f33a77bea1c635af837e5/histogram_of_p_values_null_hypothesis.png" style="width: 576px; height: 384px;" /></p>
<p>What about that Alternative wheel, you ask? Well, we’ve had quite a few different Alternative wheels over the years. Here’s a graph of some data from one we were spinning last year:</p>
<p><img alt="histogram of p values from alternative hypothesis" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dd0cafe3375f3202adaf3542d15eb9ab/histogram_of_p_values_alternative_hypothesis.png" style="width: 576px; height: 384px;" /></p>
<p>And just a few months ago, we had a different one. Check out the data from this one. It was very, very popular.</p>
<p><img alt=" histogram of p-values from popular alternative hypothesis" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fc6f0ff641e7eb4d3f7750c8163ac968/histogram_of_p_values_alternative_hypothesis_2.png" style="width: 576px; height: 384px;" /></p>
<p>Now that’s what I call an Alternative! People in the know always picked the first slot. You can see why.</p>
<p>I’m not allowed to show you data from the current game. But I assure you the Alternatives all follow this same pattern. They tend to favor those smaller numbers.</p>
<p>So, you’d like to play? Great! Which slot would you like to bet on?</p>
Is this on the level?
<p>No, I don’t really have a casino with two roulette wheels. My graphs are simulated p-values for a <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-is-a-t-test-and-why-is-it-like-telling-a-kid-to-clean-up-that-mess-in-the-kitchen">1-sample t-test</a>. The null hypothesis is that the mean of a process or population is 5. The two-sided alternative is that the mean is different from 5. In my first graph, the null hypothesis was true: I used Minitab to generate random samples of size 20 from a normal distribution with mean 5 and standard deviation of 1. For the other two graphs, the only thing I changed was the mean of the normal distribution I sampled from. For the second graph, the mean was 5.3. For the final graph, the mean was 5.75.</p>
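<p>If you’d like to spin the wheels yourself, the simulation is easy to reproduce outside of Minitab. Here is a minimal sketch in Python with NumPy and SciPy (my choice of tools; the original graphs were made in Minitab):</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def simulate_p_values(true_mean, n_sims=10_000, n=20, null_mean=5.0):
    """Draw repeated samples from N(true_mean, 1) and run a two-sided
    1-sample t-test of H0: mu = null_mean on each sample."""
    samples = rng.normal(true_mean, 1.0, size=(n_sims, n))
    return stats.ttest_1samp(samples, null_mean, axis=1).pvalue

p_null = simulate_p_values(5.0)     # null true: p-values roughly uniform
p_alt = simulate_p_values(5.75)     # alternative true: p-values pile up near 0

reject_null = (p_null < 0.05).mean()   # ~0.05: the Type I error rate
reject_alt = (p_alt < 0.05).mean()     # much higher: the power of the test
```

<p>Under the Null wheel, only about 5% of the p-values fall below 0.05; under this Alternative wheel, the large majority do.</p>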
<p>For just about any hypothesis test you do in Minitab <a href="http://www.minitab.com/products/minitab">Statistical Software</a>, you will see a p-value. Once you understand how p-values work, you will have greater insight into what they are telling you. Let’s see what we can learn about p-values from playing p-value roulette.</p>
<ol>
<li>Just as you don’t know whether we’re spinning the Null or Alternative wheel, you don’t know for sure whether the null hypothesis is true or not. But basing your decision to reject the null hypothesis on the p-value improves your chances of making a good decision.<br />
</li>
<li>If the null hypothesis is true, then any p-value is just as likely as any other. You control the probability of making a Type I error by rejecting only when the p-value falls within a narrow range, typically 0.05 or smaller. A <a href="http://blog.minitab.com/blog/the-stats-cat/understanding-type-1-and-type-2-errors-from-the-feline-perspective-all-mistakes-are-not-equal">Type I error</a> occurs if you incorrectly reject a true null hypothesis.<br />
</li>
<li>If the alternative hypothesis is true, then smaller p-values become more likely and larger p-values become less likely. That’s why you can think of a small p-value as evidence in favor of the alternative hypothesis.<br />
</li>
<li>It is tempting to try to interpret the p-value as the probability that the null hypothesis is true. But that’s not what it is. The null hypothesis is either true, or it’s not. Each time you “spin the wheel” the ball will land in a different slot, giving you a different p-value. But the truth of the null hypothesis—or lack thereof—remains unchanged.<br />
</li>
<li>In the roulette analogy there were different alternative wheels, because there is not usually just a single alternative condition. There are infinitely many mean values that are not equal to 5; my graphs looked at just two of these.<br />
</li>
<li>The probability of rejecting the null hypothesis when the alternative hypothesis is true is called the power of the test. In the 1-sample t-test, the power depends on how different the mean is from the null hypothesis value, relative to the standard error. While you don’t control the true mean, you can reduce the standard error by taking a larger sample. This will give the test greater power.<br />
</li>
</ol>
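<p>Point 6 can be made concrete with a quick calculation. The sketch below (again Python with SciPy, an assumption on my part) computes the power of the two-sided 1-sample t-test from the noncentral t distribution:</p>

```python
import numpy as np
from scipy import stats

def one_sample_t_power(diff, sd, n, alpha=0.05):
    """Power of a two-sided 1-sample t-test: the probability that the
    test statistic exceeds the critical value when the true mean
    differs from the null value by `diff`."""
    df = n - 1
    nc = diff / (sd / np.sqrt(n))            # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return stats.nct.sf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)

# Power for the second simulated wheel: true mean 5.75 vs. null mean 5,
# standard deviation 1, sample size 20 — and again with a larger sample.
power_small_n = one_sample_t_power(0.75, 1.0, 20)
power_large_n = one_sample_t_power(0.75, 1.0, 40)
```

<p>Doubling the sample size shrinks the standard error and raises the power, exactly as point 6 describes.</p>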
You Too Can Be a Winner!
<p>To be a winner at p-value roulette, you need to make sure you are performing the right hypothesis test, and that your data fit the assumptions of that test. Minitab’s <a href="http://www.minitab.com/en-us/products/minitab/assistant/">Assistant menu</a> can help you with that. The Assistant helps you choose the right statistical analysis and provides easy-to-understand guidelines to walk you through data collection and analysis. It then gives you clear graphical output that shows you how to interpret your p-value, while helping you evaluate whether your data are appropriate, so you can trust your results.</p>
Hypothesis Testing, Statistics, Statistics Help, Stats | Thu, 12 Mar 2015 11:00:00 +0000 | http://blog.minitab.com/blog/rkelly/p-value-roulette-making-hypothesis-testing-a-winner%E2%80%99s-game | Rob Kelly

Understanding Hypothesis Tests: Why We Need to Use Hypothesis Tests in Statistics
http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics
<p>Hypothesis testing is an essential procedure in statistics. A hypothesis test evaluates two mutually exclusive statements about a population to determine which statement is best supported by the sample data. When we say that a finding is statistically significant, it’s thanks to a hypothesis test. How do these tests really work and what does statistical significance actually mean?</p>
<p>In this series of three posts, I’ll help you intuitively understand how hypothesis tests work by focusing on concepts and graphs rather than equations and numbers. After all, a key reason to use <a href="http://www.minitab.com/products/minitab/features/">statistical software </a>like Minitab is so you don’t get bogged down in the calculations and can instead focus on understanding your results.</p>
<p>To kick things off in this post, I highlight the rationale for using hypothesis tests with an example.</p>
The Scenario
<p>An economist wants to determine whether the monthly energy cost for families has changed from the previous year, when the mean cost per month was $260. The economist randomly samples 25 families and records their energy costs for the current year. (The data for this example is <a href="http://support.minitab.com/datasets/FamilyEnergyCost.MTW">FamilyEnergyCost</a> and it is just one of the many data set examples that can be found in <a href="http://support.minitab.com/datasets/">Minitab’s Data Set Library</a>.)</p>
<p><img alt="Descriptive statistics for family energy costs" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/135cd05bde9f7f16ab396a8525d2b09c/desc_stats.png" style="width: 302px; height: 87px;" /></p>
<p>I’ll use these descriptive statistics to create a probability distribution plot that shows you the importance of hypothesis tests. Read on!</p>
The Need for Hypothesis Tests
<p>Why do we even need hypothesis tests? After all, we took a random sample and our sample mean of 330.6 is different from 260. That <em>is</em> different, right? Unfortunately, the picture is muddied because we’re looking at a sample rather than the entire population.</p>
<p>Sampling error is the difference between a sample and the entire population. Thanks to sampling error, it’s entirely possible that while our <em>sample </em>mean is 330.6, the population mean could still be 260. Or, to put it another way, if we repeated the experiment, it’s possible that the second sample mean could be close to 260. A hypothesis test helps assess the likelihood of this possibility!</p>
Use the Sampling Distribution to See If Our Sample Mean is Unlikely
<p>For any given random sample, the mean of the sample almost certainly doesn’t equal the true mean of the population due to sampling error. For our example, it’s unlikely that the mean cost for the entire population is exactly 330.6. In fact, if we took multiple random samples of the same size from the same population, we could plot a distribution of the sample means.</p>
<p>A sampling distribution is the distribution of a statistic, such as the mean, that is obtained by repeatedly drawing a large number of samples from a specific population. This distribution allows you to determine the probability of obtaining the sample statistic.</p>
<p>Fortunately, I can create a plot of sample means without collecting many different random samples! Instead, I’ll create a probability distribution plot using the t-distribution, the sample size, and the <a href="http://blog.minitab.com/blog/adventures-in-statistics/assessing-variability-for-quality-improvement" target="_blank">variability</a> in our sample to graph the sampling distribution.</p>
<p>Our goal is to determine whether our sample mean is significantly different from the null hypothesis mean. Therefore, we’ll use the graph to see whether our sample mean of 330.6 is unlikely assuming that the population mean is 260. The graph below shows the expected distribution of sample means.</p>
<p><img alt="Sampling distribution plot for the null hypothesis" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/76699c4c1f2bd6c83b88c1ac8e93aa54/sampling_dist_null.png" style="width: 595px; height: 397px;" /></p>
<p>You can see that the most probable sample mean is 260, which makes sense because we’re assuming that the null hypothesis is true. However, there is a reasonable probability of obtaining a sample mean that ranges from 167 to 352, and even beyond! The takeaway from this graph is that while our sample mean of 330.6 is not the most probable, it’s also not outside the realm of possibility.</p>
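<p>A quick calculation shows where our sample mean sits in this sampling distribution. The sketch below uses Python with SciPy, and the sample standard deviation of 154 is an assumption I read off the graph (260 ± 3 standard errors spans roughly 167 to 352); the actual value is in the dataset’s descriptive statistics:</p>

```python
import numpy as np
from scipy import stats

null_mean = 260.0      # previous year's mean monthly energy cost
sample_mean = 330.6    # observed sample mean
n = 25                 # families sampled
sample_sd = 154.0      # assumed; read off the graph, not stated in the text

se = sample_sd / np.sqrt(n)                       # standard error, ~30.8
t_stat = (sample_mean - null_mean) / se           # distance in standard errors
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)   # two-sided p-value
```

<p>Under these assumptions the sample mean sits a bit more than two standard errors from 260: improbable, but, as the graph shows, not outside the realm of possibility.</p>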
The Role of Hypothesis Tests
<p>We’ve placed our sample mean in the context of all possible sample means while assuming that the null hypothesis is true. Are these results statistically significant?</p>
<p>As you can see, there is no magic place on the distribution curve to make this determination. Instead, we have a continual decrease in the probability of obtaining sample means that are further from the null hypothesis value. Where do we draw the line?</p>
<p>This is where hypothesis tests are useful. A hypothesis test allows us to quantify the probability that our sample mean is unusual. In <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-significance-levels-alpha-and-p-values-in-statistics">my next blog post</a>, I’ll continue to use this graphical framework and add in the significance level and P value to show how hypothesis tests work and what statistical significance means.</p>
<p>If you'd like to see how I made these graphs, please read: <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-create-a-graphical-version-of-the-1-sample-t-test-in-minitab" target="_blank">How to Create a Graphical Version of the 1-sample t-Test</a>.</p>
Data Analysis, Hypothesis Testing, Statistics, Statistics Help, Stats | Thu, 05 Mar 2015 15:00:00 +0000 | http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics | Jim Frost

Choosing Between a Nonparametric Test and a Parametric Test
http://blog.minitab.com/blog/adventures-in-statistics/choosing-between-a-nonparametric-test-and-a-parametric-test
<p>It’s safe to say that most people who use statistics are more familiar with parametric analyses than nonparametric analyses. Nonparametric tests are also called distribution-free tests because they don’t assume that your data follow a specific distribution.</p>
<p>You may have heard that you should use nonparametric tests when your data don’t meet the assumptions of the parametric test, especially the assumption about normally distributed data. That sounds like a nice and straightforward way to choose, but there are additional considerations.</p>
<p>In this post, I’ll help you determine when you should use a:</p>
<ul>
<li>Parametric analysis to test group means.</li>
<li>Nonparametric analysis to test group medians.</li>
</ul>
<p>In particular, I'll focus on an important reason to use nonparametric tests that I don’t think gets mentioned often enough!</p>
Hypothesis Tests of the Mean and Median
<p>Nonparametric tests are like a parallel universe to parametric tests. The table shows related pairs of <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/basics/hypothesis-tests-in-minitab/" target="_blank">hypothesis tests</a> that <a href="http://www.minitab.com/en-us/products/minitab/features/" target="_blank">Minitab statistical software</a> offers.</p>
<table align="center" border="1" cellpadding="5">
	<thead>
		<tr>
			<th>Parametric tests (means)</th>
			<th>Nonparametric tests (medians)</th>
		</tr>
	</thead>
	<tbody>
		<tr><td>1-sample t test</td><td>1-sample Sign, 1-sample Wilcoxon</td></tr>
		<tr><td>2-sample t test</td><td>Mann-Whitney test</td></tr>
		<tr><td>One-Way ANOVA</td><td>Kruskal-Wallis, Mood’s median test</td></tr>
		<tr><td>Factorial DOE with one factor and one blocking variable</td><td>Friedman test</td></tr>
	</tbody>
</table>
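<p>The same pairings exist outside of Minitab. As a sketch (Python with SciPy is my substitution here), the first row of the table looks like this on a small, hypothetical right-skewed sample:</p>

```python
from scipy import stats

# A small, hypothetical right-skewed sample.
data = [2.1, 2.4, 2.7, 2.9, 3.0, 3.3, 3.6, 4.1, 5.8, 9.5]

# Parametric: 1-sample t test of H0: mean = 3.
p_t = stats.ttest_1samp(data, popmean=3.0).pvalue

# Nonparametric counterpart: 1-sample Wilcoxon signed-rank test of
# H0: median = 3, run on the differences from the hypothesized value.
p_wilcoxon = stats.wilcoxon([x - 3.0 for x in data]).pvalue
```

<p>The two tests answer parallel questions, one about the mean and one about the median, which is exactly why the choice between them matters for skewed data.</p>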
Reasons to Use Parametric Tests
<p><strong>Reason 1: Parametric tests can perform well with skewed and nonnormal distributions</strong></p>
<p>This may be a surprise, but parametric tests can perform well with continuous data that are nonnormal if you satisfy these sample size guidelines.</p>
<table align="center" border="1" cellpadding="5">
	<thead>
		<tr>
			<th>Parametric analyses</th>
			<th>Sample size guidelines for nonnormal data</th>
		</tr>
	</thead>
	<tbody>
		<tr><td>1-sample t test</td><td>Greater than 20</td></tr>
		<tr><td>2-sample t test</td><td>Each group should be greater than 15</td></tr>
		<tr><td>One-Way ANOVA</td>
			<td>
				<ul>
					<li>If you have 2-9 groups, each group should be greater than 15.</li>
					<li>If you have 10-12 groups, each group should be greater than 20.</li>
				</ul>
			</td></tr>
	</tbody>
</table>
<p><strong>Reason 2: Parametric tests can perform well when the spread of each group is different</strong></p>
<p>While nonparametric tests don’t assume that your data follow a normal distribution, they do have other assumptions that can be hard to meet. For nonparametric tests that compare groups, a common assumption is that the data for all groups must have the same spread (dispersion). If your groups have a different spread, the nonparametric tests might not provide valid results.</p>
<p>On the other hand, if you use the 2-sample t test or One-Way ANOVA, you can simply go to the <strong>Options</strong> subdialog and uncheck <em>Assume equal variances</em>. Voilà, you’re good to go even when the groups have different spreads!</p>
<p><strong>Reason 3: Statistical power</strong></p>
<p>Parametric tests usually have more <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/power-and-sample-size/what-is-power/" target="_blank">statistical power</a> than nonparametric tests. Thus, you are more likely to detect a significant effect when one truly exists.</p>
Reasons to Use Nonparametric Tests
<p><strong>Reason 1: Your area of study is better represented by the median</strong></p>
<p><img alt="Comparing two skewed distributions" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/7223b01bc095dbd652bd863be5288cfe/mean_or_median.png" style="float: right; width: 200px; height: 181px; margin: 10px 15px;" />This is my favorite reason to use a nonparametric test and the one that isn’t mentioned often enough! The fact that you <em>can</em> perform a parametric test with nonnormal data doesn’t imply that the mean is the best <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/summary-statistics/measures-of-central-tendency/" target="_blank">measure of the central tendency</a> for your data.</p>
<p>For example, the center of a skewed distribution, like income, can be better measured by the median where 50% are above the median and 50% are below. If you add a few billionaires to a sample, the mathematical mean increases greatly even though the income for the typical person doesn’t change.</p>
<p>When your distribution is skewed enough, the mean is strongly affected by changes far out in the distribution’s tail whereas the median continues to more closely reflect the center of the distribution. For these two distributions, a random sample of 100 from each distribution produces means that are significantly different, but medians that are not significantly different.</p>
<p>Two of my colleagues have written excellent blog posts that illustrate this point:</p>
<ul>
<li>Michelle Paret: <a href="http://blog.minitab.com/blog/michelle-paret/using-the-mean-its-not-always-a-slam-dunk" target="_blank">Using the Mean in Data Analysis: It’s Not Always a Slam-Dunk</a></li>
<li>Redouane Kouiden: <a href="http://blog.minitab.com/blog/statistics-for-lean-six-sigma/the-non-parametric-economy-what-does-average-actually-mean" target="_blank">The Non-parametric Economy: What Does Average Actually Mean?</a></li>
</ul>
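<p>A toy example makes the billionaire point concrete. The incomes below are hypothetical, and the snippet uses only Python’s standard library:</p>

```python
from statistics import mean, median

# Hypothetical annual incomes, in thousands of dollars.
incomes = [35, 42, 48, 51, 55, 58, 62, 70, 85, 110]
mean_before, median_before = mean(incomes), median(incomes)

# Add one billionaire (1,000,000 thousand = $1B) to the sample.
incomes.append(1_000_000)
mean_after, median_after = mean(incomes), median(incomes)
```

<p>One extreme value multiplies the mean more than a thousandfold while the median barely moves, so the median still describes the typical person.</p>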
<p><strong>Reason 2: You have a very small sample size</strong></p>
<p>If you don’t meet the sample size guidelines for the parametric tests and you are not confident that you have normally distributed data, you should use a nonparametric test. When you have a really small sample, you might not even be able to ascertain the distribution of your data because the distribution tests will lack sufficient power to provide meaningful results.</p>
<p>In this scenario, you’re in a tough spot with no valid alternative. Nonparametric tests have less power to begin with and it’s a double whammy when you add a small sample size on top of that!</p>
<p><strong>Reason 3: You have ordinal data, ranked data, or outliers that you can’t remove</strong></p>
<p>Typical parametric tests can only assess continuous data, and their results can be seriously affected by outliers. Conversely, some nonparametric tests can handle ordinal data and ranked data without being seriously affected by outliers. Be sure to check the assumptions for the nonparametric test because each one has its own data requirements.</p>
Closing Thoughts
<p>It’s commonly thought that the need to choose between a parametric and nonparametric test occurs when your data fail to meet an assumption of the parametric test. This can be the case when you have both a small sample size and nonnormal data. However, other considerations often play a role because parametric tests can often handle nonnormal data. Conversely, nonparametric tests have strict assumptions that you can’t disregard.</p>
<p>The decision often depends on whether the mean or median more accurately represents the center of your data’s distribution.</p>
<ul>
<li>If the mean accurately represents the center of your distribution and your sample size is large enough, consider a parametric test because they are more powerful.</li>
<li>If the median better represents the center of your distribution, consider the nonparametric test even when you have a large sample.</li>
</ul>
<p>Finally, if you have a very small sample size, you might be stuck using a nonparametric test. Please, collect more data next time if it is at all possible! As you can see, the sample size guidelines aren’t really that large. Your chance of detecting a significant effect when one exists can be very small when you have both a small sample size and you need to use a less efficient nonparametric test!</p>
Hypothesis Testing, Statistics, Statistics Help | Thu, 19 Feb 2015 13:00:00 +0000 | http://blog.minitab.com/blog/adventures-in-statistics/choosing-between-a-nonparametric-test-and-a-parametric-test | Jim Frost

What’s the Probability that Your Favorite Football Team Will Win?
http://blog.minitab.com/blog/customized-data-analysis/what%E2%80%99s-the-probability-that-your-favorite-football-team-will-win
<div>
<p>If you wanted to figure out the probability that your favorite football team will win their next game, how would you do it? My colleague <a href="http://blog.minitab.com/blog/understanding-statistics-and-its-application">Eduardo Santiago</a> and I recently looked at this question, and in this post we'll share how we approached the solution. Let’s start by breaking down this problem:<img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8954fcace8f66a536aca06fad36a4c5a/boy_football_200.png" style="margin: 10px 15px; float: right; width: 200px; height: 200px;" /></p>
<ol>
<li>There are only two possible outcomes: your favorite team wins, or they lose. Ties are a possibility, but they're very rare. So, to simplify things a bit, we’ll assume they are so unlikely that they can be disregarded in this analysis.</li>
<li>There are numerous factors to consider.
<ol style="list-style-type:lower-alpha;">
<li>What will the playing conditions be?</li>
<li>Are key players injured?</li>
<li>Do they match up well with their opponent?</li>
<li>Do they have home-field advantage?</li>
<li>And the list goes on...</li>
</ol>
</li>
</ol>
<p>First, since we assumed the outcome is binary, we can put together a <a href="http://blog.minitab.com/blog/real-world-quality-improvement/using-binary-logistic-regression-to-investigate-high-employee-turnover">Binary Logistic Regression</a> model to predict the probability of a win. Next, we need to find which predictors would be best to include. After <a href="http://www.thepredictiontracker.com/ncaaresults.php" target="_blank">a little research</a>, we found that the betting markets seem to take all of this information into account. Essentially, we are utilizing the wisdom of the masses to find out what people believe will happen. Since the betting markets take these factors into account, we decided to look at the probability of a win, given the spread of an NCAA football game.</p>
Data Collection
<p>In case you are not convinced of how accurately the spreads can determine the outcome of a game, we collected data for every college football game played between 2000 and 2014. The structure of the data is illustrated below. The third column has the spread (or line) provided by casinos in Vegas, and the last column displayed is the actual score differential (vscore – hscore).</p>
<p><strong><em>Note</em></strong><em>: In betting lines, a negative spread indicates how many points you are favored over the opponent. In short, you are giving the opponent a certain number of points. </em></p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/52aaa628ea28b55523232a9b2da6b623/table1.png" style="width: 600px; height: 352px;" /></p>
<p>The original win-or-lose question can then be rephrased as follows: Is the difference between the spreads and the actual score differentials statistically significant?</p>
<p>Since we have two populations that are dependent, we compare them via a paired t test. In other words, both the <em>Spread</em> and <em>scoreDiffer</em> are observations (a priori and a posteriori) for the same game, and they reflect the relative strength of the home team <em>i</em> versus the road team <em>j</em>.</p>
<p>Using <strong>Stat > Basic Statistics > Paired t </strong>in Minitab Statistical Software, we get the output below.</p>
<p style="margin-left: 40px;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/5531697c176006a4057f4ab7b6fda7dc/t_test_output.png" style="width: 500px; height: 189px;" /></p>
<p>Since the p-value is larger than 0.05, we can conclude from the 15 years of data that the average difference between Las Vegas spreads and actual score differentials is not significantly different from zero. In other words, any bias between these two measures of relative team strength is not different from zero, which in lay terms means that <em>on average</em> the error between the Vegas spreads and the actual outcomes is negligible.</p>
<p>It is worth noting that the results above were obtained with a sample size of 10,476 games! So we hope you'll excuse our not including <a href="http://blog.minitab.com/blog/understanding-statistics/how-much-data-do-you-really-need-check-power-and-sample-size">power calculations</a> here.</p>
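<p>The same paired comparison can be sketched in code (Python with SciPy here; the eight games below are hypothetical stand-ins for the real 10,476-game dataset):</p>

```python
from scipy import stats

# Hypothetical (spread, actual score differential) pairs for eight games;
# the real analysis used 10,476 games from 2000 to 2014.
spread     = [-7.0, 3.5, -14.0, 6.5, -3.0, 10.0, -21.0, 1.0]
score_diff = [-10.0, 7.0, -17.0, 3.0, 4.0, 24.0, -14.0, -6.0]

# Paired t-test: is the mean difference between the a priori spread
# and the a posteriori score differential zero?
result = stats.ttest_rel(spread, score_diff)
p_value = result.pvalue
```

<p>As in the Minitab output, a large p-value here would mean no detectable bias between the spreads and the outcomes.</p>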
<p>As a final remark on spreads, the histogram of the differences below shows a couple of interesting things:</p>
<ul>
<li>The average difference between the spreads and score differentials seems to be very close to zero. So don’t get too excited yet, as the spreads cannot be used to predict the exact score differential for a game. Nevertheless, on average the spread will be very close to the score differential.</li>
<li>The standard deviation, however, is 15.5 points. That means that if your favorite team is favored by 3 points (a spread of -3), the outcome will, with high confidence, fall within plus or minus 2 standard deviations of the point estimate: -3 ± 31 points in this case. So your favorite team could win by 34 points, or lose by 28!</li>
</ul>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f8f64fe200b85bcd5753a62737735de3/histogram1.png" style="width: 577px; height: 385px;" /></p>
<p align="center"><em>Figure 1 - Distribution of the differences between scores and spreads</em></p>
The Binary Logistic Regression Model
<p>By this point, we hope you are convinced about how good these spread values could be. To make the output more readable we summarized the data as follows:</p>
</div>
<p style="margin-left: 40px;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/218aa990c975fa2292d84926ba0002f0/table2.png" style="width: 250px; height: 405px;" /></p>
Creating our Binary Logistic Regression Model
<p>After summarizing the data, we used the Binary Fitted Line Plot (new in Minitab 17) to come up with our model. </p>
<p>If you are following along, here are the steps:</p>
<ol>
<li>Go to <strong>Stat > Regression > Binary Fitted Line Plot</strong></li>
<li>Fill out the dialog box as shown below and click <strong>OK</strong>.</li>
</ol>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ae06bb10e129b9c527b07700a57a7e2f/dialog1.png" style="width: 600px; height: 457px;" /></p>
<p><span style="line-height: 1.6;">The steps will produce the following graph:</span></p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e2acef300ee9d1cd605ff089c217c8d5/binary_fitted_line_plot_w1024.png" style="width: 600px; height: 400px;" /></p>
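<p>For readers without Minitab, the same kind of binary logistic model can be fit by maximizing the likelihood directly. Everything in this Python/SciPy sketch is simulated: the slope of 0.133 is an assumption chosen so that a 6-point underdog wins about 31% of the time, roughly matching the fitted plot above; the real model was fit to actual NCAA results.</p>

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(7)

# Simulated games: a negative spread means the team is favored, and the
# assumed true win probability is expit(-0.133 * spread).
spread = rng.uniform(-40, 40, size=2_000)
win = (rng.random(2_000) < expit(-0.133 * spread)).astype(float)

def neg_log_lik(beta):
    """Negative log-likelihood of the model P(win) = expit(b0 + b1 * spread)."""
    p = np.clip(expit(beta[0] + beta[1] * spread), 1e-12, 1 - 1e-12)
    return -np.sum(win * np.log(p) + (1 - win) * np.log(1 - p))

b0, b1 = minimize(neg_log_lik, x0=[0.0, 0.0]).x

# Predicted win probability for a 6-point underdog (spread = +6).
p_underdog_6 = expit(b0 + b1 * 6.0)
```

<p>The fitted slope is negative, as it must be: the larger the spread against you, the lower your chance of winning.</p>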
Interpreting the Plot
<p>If your team is favored to win by 25 points or more, you have a very good chance of winning the game, but what if the spread is much closer?</p>
<p>For the 2014 National Championship, Ohio State was an underdog by 6 points to Oregon. Looking at the Binary Fitted Line Plot, we see that the probability of a 6-point underdog winning the game is close to 31% in college football. </p>
<p>Ohio State ended up beating Oregon by 22 points. Since the differences described in Figure 1 are normally distributed about zero, we can take the spread as given (or known) and compute the probability of an outcome as extreme as, or more extreme than, the one that occurred.</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/977aafa9be7a8e2736b1340bca0b3b62/distribution_plot.png" style="width: 576px; height: 384px;" /></p>
<p>With Ohio State a 6-point underdog and a standard deviation of 15.53, we can use a Probability Distribution Plot to show that Ohio State would win by 22 points or more only 3.6% of the time.</p>
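<p>That tail probability can be computed directly from the normal distribution; here is a Python/SciPy sketch (the 15.53-point standard deviation comes from Figure 1):</p>

```python
from scipy import stats

# Ohio State was a 6-point underdog but won by 22, so the outcome beat
# the spread by 22 - (-6) = 28 points. The 15.53 standard deviation is
# that of the spread-vs-outcome differences in Figure 1.
sd_diff = 15.53
beat_spread_by = 28.0

# Probability of an outcome beating the spread by 28 points or more.
p_upset = stats.norm.sf(beat_spread_by, loc=0, scale=sd_diff)
```

<p>The result is about 0.036, matching the 3.6% shown on the distribution plot.</p>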
<p>Eduardo Santiago and I will be giving a talk on using statistics to rank college football teams at the upcoming <a href="http://www.amstat.org/meetings/csp/2015/" target="_blank">Conference on Statistical Practice</a> in New Orleans. Our talk is February 21 at 2 p.m., and we would love to have you join us. </p>
Fun Statistics, Hypothesis Testing, Regression Analysis | Thu, 12 Feb 2015 13:00:00 +0000 | http://blog.minitab.com/blog/customized-data-analysis/what%E2%80%99s-the-probability-that-your-favorite-football-team-will-win | Daniel Griffith

Statistics: Another Weapon in the Galactic Patrol’s Arsenal
http://blog.minitab.com/blog/statistics-in-the-field/statistics-another-weapon-in-the-galactic-patrol%E2%80%99s-arsenal
<p><em><span style="line-height: 1.6;">by Matthew Barsalou, guest blogger. </span></em></p>
<p>E. E. “Doc” <a href="http://en.wikipedia.org/wiki/E._E._Smith" target="_blank">Smith</a>, one of the greatest authors ever, wrote many classic books such as <a href="http://en.wikipedia.org/wiki/Skylark_%28series%29" target="_blank">The Skylark of Space</a> and his <a href="http://en.wikipedia.org/wiki/Lensman_series" target="_blank">Lensman</a> series. Doc Smith’s imagination knew no limits; his Galactic <a href="http://en.wikipedia.org/wiki/Galactic_Patrol" target="_blank">Patrol</a> had millions of combat fleets under its command and possessed planets turned into movable, armored weapons platforms. Some of the Galactic Patrol’s weapons may be well known. For example, there is the sunbeam, which concentrated the entire output of a sun’s energy into one beam.</p>
<p><img alt="amazing stories featuring E. E. &quot;Doc&quot; Smith" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0d1ef573ea1b75bd2e6364f219ec6a19/docsmithcover.png" style="border-width: 1px; border-style: solid; margin: 10px 15px; float: right; width: 296px; height: 400px;" />The Galactic Patrol also created the negasphere, a planet-sized dark matter/dark energy bomb that could eat through anything. I’ll go out on a limb and assume that they first created a container that could hold such a substance, at least briefly.</p>
<p>When I read about such technology, I always have to wonder “How did they test it?” I can see where Minitab Statistical Software could be very helpful to the Galactic Patrol. How could the Galactic Patrol evaluate smaller, torpedo-sized units of negasphere? Suppose negasphere was created at the time of firing in a space torpedo and needed to be contained for the first 30 seconds after being fired, lest it break containment early and damage the ship that is firing it or rupture the torpedo before it reaches a space pirate.</p>
<p>The table below shows data collected from fifteen samples each of two materials that could be used for negasphere containment. Material 1 has a mean containment time of 33.951 seconds and Material 2 has a mean of 32.018 seconds. But is this difference statistically significant? Does it even matter?</p>
<table style="margin-left: auto; margin-right: auto; text-align: center;">
<thead>
<tr><th><strong>Material 1</strong></th><th><strong>Material 2</strong></th></tr>
</thead>
<tbody>
<tr><td>34.5207</td><td>32.1227</td></tr>
<tr><td>33.0061</td><td>31.9836</td></tr>
<tr><td>32.9733</td><td>31.9975</td></tr>
<tr><td>32.4381</td><td>31.9997</td></tr>
<tr><td>34.1364</td><td>31.9414</td></tr>
<tr><td>36.1568</td><td>32.0403</td></tr>
<tr><td>34.6487</td><td>32.1153</td></tr>
<tr><td>36.6436</td><td>31.9661</td></tr>
<tr><td>35.3177</td><td>32.0670</td></tr>
<tr><td>32.4043</td><td>31.9610</td></tr>
<tr><td>31.3107</td><td>32.0303</td></tr>
<tr><td>34.0913</td><td>32.0146</td></tr>
<tr><td>33.2040</td><td>31.9865</td></tr>
<tr><td>32.5601</td><td>32.0079</td></tr>
<tr><td>35.8556</td><td>32.0328</td></tr>
</tbody>
</table>
<p><span style="line-height: 1.6;">The questions we're asking and the type and distribution of the data we have should determine the types of statistical tests we perform. Many statistical tests for continuous data require an assumption of normality, and this can easily be tested in our <a href="http://www.minitab.com/products/minitab">statistical software</a> by going to <strong>Graph > Probability Plot…</strong> and entering the columns containing the data.</span></p>
<p><span style="line-height: 1.6;"><img alt="probability plot of material 1" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ebd5796caf013f0204dbddc33c06df56/probability_plot1.png" style="width: 581px; height: 388px;" /></span></p>
<p><span style="line-height: 1.6;"><img alt="probability plot of material 2" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a8464e0302753942334c4e11d31482e5/probability_plot2.png" style="width: 580px; height: 388px;" /></span></p>
<p><span style="line-height: 1.6;">The null hypothesis is “the data are normally distributed,” and the resulting P-values are greater than 0.05, so we <a href="http://blog.minitab.com/blog/understanding-statistics/things-statisticians-say-failure-to-reject-the-null-hypothesis">fail to reject the null hypothesis</a>. That means we can evaluate the data using tests that require the data to be normally distributed.</span></p>
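<p>Outside Minitab, a comparable normality check can be sketched in Python with SciPy. One caveat: Minitab's probability plot reports an Anderson–Darling p-value, while the sketch below substitutes the Shapiro–Wilk test, so the p-values will differ somewhat even when the conclusion agrees. The data come from the table above.</p>

```python
from scipy import stats

material_1 = [34.5207, 33.0061, 32.9733, 32.4381, 34.1364,
              36.1568, 34.6487, 36.6436, 35.3177, 32.4043,
              31.3107, 34.0913, 33.2040, 32.5601, 35.8556]
material_2 = [32.1227, 31.9836, 31.9975, 31.9997, 31.9414,
              32.0403, 32.1153, 31.9661, 32.0670, 31.9610,
              32.0303, 32.0146, 31.9865, 32.0079, 32.0328]

# Null hypothesis: the data are normally distributed.
# A p-value above 0.05 means we fail to reject normality.
w1, p1 = stats.shapiro(material_1)
w2, p2 = stats.shapiro(material_2)
print(f"Material 1: W = {w1:.3f}, p = {p1:.3f}")
print(f"Material 2: W = {w2:.3f}, p = {p2:.3f}")
```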
<p>To determine whether the mean of Material 1 is indeed greater than the mean of Material 2, we perform a two-sample t-test: go to <strong>Stat > Basic Statistics > 2 Sample t…</strong> and select “Each sample in its own column.” We then choose “Options…” and select “Difference > hypothesized difference.”</p>
<p><img alt="two-sample t-test and ci output" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3e270a93cceb77f6818345bcb41c9110/2_sample_t_test_output.png" style="width: 546px; height: 226px;" /></p>
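<p>As a cross-check outside Minitab, the same one-sided comparison can be sketched with SciPy, using the data from the table above. The sketch uses Welch's version of the test (<code>equal_var=False</code>), which does not assume equal variances, and <code>alternative='greater'</code> to match the “Difference > hypothesized difference” option; the exact numbers may differ slightly from the Minitab output depending on the options chosen there.</p>

```python
from scipy import stats

material_1 = [34.5207, 33.0061, 32.9733, 32.4381, 34.1364,
              36.1568, 34.6487, 36.6436, 35.3177, 32.4043,
              31.3107, 34.0913, 33.2040, 32.5601, 35.8556]
material_2 = [32.1227, 31.9836, 31.9975, 31.9997, 31.9414,
              32.0403, 32.1153, 31.9661, 32.0670, 31.9610,
              32.0303, 32.0146, 31.9865, 32.0079, 32.0328]

# One-sided two-sample t-test: H1 is that the mean of Material 1
# exceeds the mean of Material 2 (Welch's test, unequal variances).
res = stats.ttest_ind(material_1, material_2,
                      equal_var=False, alternative='greater')
print(f"t = {res.statistic:.2f}, p = {res.pvalue:.4f}")
```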
<p><span style="line-height: 1.6;">The P-value for the two-sample t-test is less than 0.05, so we can conclude there is a statistically significant difference between the materials. But the two-sample t-test does not give us a complete picture of the situation, so we should look at the data by going to <strong>Graph > Individual Value Plot...</strong> and selecting a simple graph for multiple Y’s.</span></p>
<p><span style="line-height: 1.6;"><img alt="individual value plot " src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/96d713980aefd4912a402dc156802788/individual_value_plot1.png" style="width: 583px; height: 391px;" /></span></p>
<p><span style="line-height: 1.6;">The mean of Material 1 may be higher, but our biggest concern is identifying a material that does not fail in 30 seconds or less. Material 2 appears to have far less variation, and we can assess this by performing an F-test: go to <strong>Stat > Basic Statistics > 2 Variances…</strong> and select “Each sample in its own column.” Then choose “Options…” and select “Ratio > hypothesized ratio.” The data are normally distributed, so put a checkmark next to “Use test and confidence intervals based on normal distribution.”</span></p>
<p><img alt="two variances test output" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/28aa53d2ec2582e3b41e29fb5f55331f/two_variances_test_output.png" style="width: 482px; height: 563px;" /></p>
<p>The P-value is less than 0.05, so we can conclude the evidence supports the alternative hypothesis that the variance of the first material is greater than the variance of the second material. Having already looked at a graph of the data, this should come as no surprise.</p>
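<p>SciPy has no single function for the classical two-variance F-test, but it is just a ratio of sample variances compared against an F distribution. A minimal sketch, assuming the same containment-time data as above:</p>

```python
import numpy as np
from scipy import stats

material_1 = [34.5207, 33.0061, 32.9733, 32.4381, 34.1364,
              36.1568, 34.6487, 36.6436, 35.3177, 32.4043,
              31.3107, 34.0913, 33.2040, 32.5601, 35.8556]
material_2 = [32.1227, 31.9836, 31.9975, 31.9997, 31.9414,
              32.0403, 32.1153, 31.9661, 32.0670, 31.9610,
              32.0303, 32.0146, 31.9865, 32.0079, 32.0328]

# F-test of two variances: under H0 (equal variances, normal data),
# the ratio of sample variances follows an F distribution with
# (n1 - 1, n2 - 1) degrees of freedom.
f_stat = np.var(material_1, ddof=1) / np.var(material_2, ddof=1)
p_value = stats.f.sf(f_stat, len(material_1) - 1, len(material_2) - 1)
print(f"F = {f_stat:.1f}, one-sided p = {p_value:.3g}")
```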
<p>No statistical software program can tell us which material to choose, but Minitab can provide us with the information needed to make an informed decision. The objective is to exceed a lower specification limit of 30 seconds and the lower variability of Material 2 will achieve this better than the higher mean value for Material 1. Material 2 looks good, but the penalty for a wrong decision could be lost space ships if the negasphere breaches its containment too soon, so we must be certain.</p>
<p>The Galactic Patrol has millions of ships, so a failure rate of even one per million would be unacceptably high. We should therefore perform a capability study by going to <strong>Stat > Quality Tools > Capability Analysis > Normal…</strong> Enter the column containing the data for Material 1, use the same column for the subgroup size, and then enter a lower specification of 30. Repeat for Material 2.</p>
<p><img alt="process capability for material 1" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9c10d14f155707770eb3688aec834ca2/process_capability_report1.png" style="width: 635px; height: 476px;" /></p>
<p><img alt="Process Capability for Material 2" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/eaf12e0874c393730037cf504a90fa8f/process_capability_report2.png" style="width: 638px; height: 476px;" /></p>
<p><span style="line-height: 1.6;">Looking at the Minitab-generated capability studies, we can see that Material 1 can be expected to fail thousands of times per million uses, but Material 2 is not expected to fail at all. In spite of the higher mean, the Galactic Patrol should use Material 2 for the negasphere torpedoes. </span></p>
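<p>The "expected failures per million" figure can be approximated outside Minitab by integrating a fitted normal distribution below the 30-second limit. The sketch below uses the overall sample mean and standard deviation, so it corresponds to an "overall" (long-term) capability estimate rather than Minitab's within-subgroup estimate, and its numbers will differ somewhat from the report above.</p>

```python
import statistics
from scipy.stats import norm

material_1 = [34.5207, 33.0061, 32.9733, 32.4381, 34.1364,
              36.1568, 34.6487, 36.6436, 35.3177, 32.4043,
              31.3107, 34.0913, 33.2040, 32.5601, 35.8556]
material_2 = [32.1227, 31.9836, 31.9975, 31.9997, 31.9414,
              32.0403, 32.1153, 31.9661, 32.0670, 31.9610,
              32.0303, 32.0146, 31.9865, 32.0079, 32.0328]
LSL = 30.0  # lower specification limit, in seconds

def expected_ppm_below(data, lsl):
    """Expected failures per million below the lower spec limit,
    assuming a normal distribution fitted with the overall sample
    mean and sample standard deviation."""
    z = (lsl - statistics.mean(data)) / statistics.stdev(data)
    return norm.cdf(z) * 1_000_000

ppm1 = expected_ppm_below(material_1, LSL)
ppm2 = expected_ppm_below(material_2, LSL)
print(f"Material 1: ~{ppm1:.0f} PPM below spec")
print(f"Material 2: ~{ppm2:.2g} PPM below spec")
```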
<div>
<p style="line-height: 20.7999992370605px;"><strong>About the Guest Blogger</strong></p>
<p style="line-height: 20.7999992370605px;"><em><a href="https://www.linkedin.com/pub/matthew-barsalou/5b/539/198" target="_blank">Matthew Barsalou</a> is a statistical problem resolution Master Black Belt at <a href="http://www.3k-warner.de/" target="_blank">BorgWarner</a> Turbo Systems Engineering GmbH. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with emphasis in international business, and has a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is author of the books <a href="http://www.amazon.com/Root-Cause-Analysis-Step---Step/dp/148225879X/ref=sr_1_1?ie=UTF8&qid=1416937278&sr=8-1&keywords=Root+Cause+Analysis%3A+A+Step-By-Step+Guide+to+Using+the+Right+Tool+at+the+Right+Time" target="_blank">Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time</a>, <a href="http://asq.org/quality-press/display-item/index.html?item=H1472" target="_blank">Statistics for Six Sigma Black Belts</a> and <a href="http://asq.org/quality-press/display-item/index.html?item=H1473&xvl=76115763" target="_blank">The ASQ Pocket Guide to Statistics for Six Sigma Black Belts</a>.</em></p>
</div>
Data AnalysisHypothesis TestingStatisticsTue, 03 Feb 2015 13:00:00 +0000http://blog.minitab.com/blog/statistics-in-the-field/statistics-another-weapon-in-the-galactic-patrol%E2%80%99s-arsenalGuest BloggerAnalyzing Qualitative Data, part 1: Pareto, Pie, and Stacked Bar Charts
http://blog.minitab.com/blog/applying-statistics-in-quality-projects/analyzing-qualitative-data-part-1-pareto-pie-and-stacked-bar-charts
<p>In several previous blogs, I have discussed the use of statistics for <a href="http://blog.minitab.com/blog/applying-statistics-in-quality-projects/using-nonparametric-analysis-to-visually-manage-durations-in-service-processes">quality improvement in the service sector</a>. Understandably, services account for a very large part of the economy. Lately, when meeting with several people from financial companies, I realized that one of the problems they faced was that they were collecting large amounts of "qualitative" data: types of product, customer profiles, different subsidiaries, several customer requirements, etc.</p>
<p>There are several ways to process such qualitative data. Qualitative data points may still be counted, and once they have been counted they may be quantitatively (numerically) analyzed using statistical methods.</p>
<p>I will focus on the analysis of qualitative data using a simple and obvious example. In this case, we would like to analyze mistakes on invoices made during a period of several weeks by three employees (anonymously identified).</p>
<p><img src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/545c0823fc7368e795585c38424891d9/quali1.jpg" style="width: 288px; height: 273px;" /></p>
<p>I will present three different ways to analyze such qualitative data (counts). In this post, I will cover:</p>
<ol>
<li>A very simple graphical approach based on bar charts to display counts (stacked and clustered bars), Pareto charts, and pie charts.</li>
</ol>
<p>Then, in my next post, I will demonstrate: </p>
<ol start="2">
<li> A more complex approach for testing statistical significance using a Chi-square test.<br />
</li>
<li> An even more complex multivariate approach (using correspondence analysis).</li>
</ol>
<p>Again, the main purpose of this example is to show several ways to analyze qualitative data. Quantitative data represent numeric values such as the number of grams, dollars, newtons, etc., whereas qualitative data may represent text values such as different colours, types of defects or different employees.</p>
<p>The <a href="http://www.minitab.com/en-us/products/minitab/assistant/">Assistant</a> in Minitab 17 provides a great breakdown of two main data types: </p>
<p><img alt="" spellcheck="true" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/2fd46235529df11ab90d53efa677b706/quali2.jpg" style="width: 586px; height: 316px; border-width: 1px; border-style: solid;" /></p>
Charts and Diagrams with Qualitative Data
<p>I first created a pie chart using the Minitab Assistant (<strong>Assistant > Graphical Analysis</strong>) as well as a stacked bar chart on counts (from the graph menu of Minitab, select <strong>Graph > Bar Charts</strong>) to describe the proportion of each type of mistake according to the day of the week.</p>
<p><img alt="" spellcheck="true" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/15ec9831d178df8fc0cbaddab0975c89/pie_chart_of_mistake_by_day___summary_report.jpg" style="width: 478px; height: 358px; border-width: 1px; border-style: solid;" /></p>
<p>In the pie charts above, the proportion of mistake types seems to be fairly similar across the different days of the week.</p>
<p> <img alt="" spellcheck="true" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/4b92a1293aff3f424d5a6f751653fb17/quali3.jpg" style="width: 403px; height: 302px; border-width: 1px; border-style: solid;" /></p>
<p>The stacked bar chart above shows that the number of mistakes also seems to be very stable and uniform across the days of the week.</p>
<p><img alt="" spellcheck="true" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/c23dcf3e01cedf8aaad5bad176437ed2/quali4.jpg" style="width: 426px; height: 330px;" /></p>
<p>Now let's create a stacked bar chart on counts to analyze mistakes by employee. In this second graph, shown above, large variations in the number of errors occur across employees. The distribution of errors also seems to be very different, with more “Product” errors associated with employee A.</p>
Qualitative Data in a Pareto Chart
<p><img alt="" spellcheck="true" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/30893b16e7ab4a75024498b7c3cf9fdf/pareto_chart_of_mistake_by_person___diagnostic_report.jpg" style="width: 768px; height: 547px;" /></p>
<p>Above we see <span style="line-height: 1.6;"><span><a href="http://blog.minitab.com/blog/understanding-statistics/explaining-quality-statistics-so-your-boss-will-understand-pareto-charts">Pareto charts</a></span> created using the Minitab Assistant: an overall Pareto chart and additional Pareto charts, one for each employee. Again, it's easy to identify the large number of “product” mistakes (red columns) for employee A.</span></p>
<span style="line-height: 1.6;">Stacked Bar Charts of Qualitative Data</span>
<p><img alt="" spellcheck="true" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/79589c080171780e682cbd69d3353a0e/quali6.jpg" style="width: 426px; height: 347px;" /></p>
<p><span style="line-height: 20.7999992370605px;">Mistake counts are represented as percentages in the s</span><span style="line-height: 1.6;">tacked bar chart above. For each employee the error types are summed up to obtain 100% (within each employee's column). This provides a clearer understanding of how each employee's mistakes are distributed. Again, the high percentage of “Product” errors (in yellow) for employee A is very noticeable, but also note the high percentage, proportionately, of “Address” mistakes (blue areas) for employee C.</span></p>
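<p>The counting and within-column-percentage steps behind these charts can be sketched with pandas. The records below are hypothetical stand-ins for the invoice data, not the actual counts from the worksheet:</p>

```python
import pandas as pd

# Hypothetical records mirroring the invoice-mistake example:
# each row is one observed mistake (employee, mistake type).
records = [
    ("A", "Product"), ("A", "Product"), ("A", "Price"),
    ("B", "Address"), ("B", "Price"), ("B", "Product"),
    ("C", "Address"), ("C", "Address"), ("C", "Price"),
]
df = pd.DataFrame(records, columns=["Employee", "Mistake"])

# Count the qualitative data: a cross-tabulation of mistake type
# by employee.
counts = pd.crosstab(df["Mistake"], df["Employee"])

# Within-employee percentages (each employee's column sums to 100%),
# as shown in the percent-stacked bar chart.
percents = pd.crosstab(df["Mistake"], df["Employee"],
                       normalize="columns") * 100
print(counts)
print(percents.round(1))
```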
<p><img alt="" spellcheck="true" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/9da688410bcb56f516061a4a26e64dfe/quali7.jpg" style="width: 434px; height: 346px;" /></p>
<p>The stacked bar chart above displays changes in the number of errors and in error types by week (time trends). Notice that in the last three weeks of the period, only product and address issues occurred: error types apparently shifted toward “product” and “address” errors at the end of the period.</p>
Different Views of the Data Give a More Complete Picture
<p>These diagrams do provide a clear picture of mistake occurrences according to employees, error types and weeks. However, as you've seen, it takes several graphs to provide a good understanding of the issue.</p>
<p>This is still a subjective approach, though: several people seated around the same table, looking at these same graphs, might interpret them differently, and in some cases this could result in endless discussions.</p>
<p>Therefore we would also like to use a more scientific and rigorous approach: the Chi-square test. <a href="http://blog.minitab.com/blog/applying-statistics-in-quality-projects/analyzing-qualitative-data-part-2-chi-square-and-multivariate-analysis">We'll cover that in my next post</a>. </p>
<p> </p>
Data AnalysisHypothesis TestingQuality ImprovementSix SigmaStatisticsStatsWed, 28 Jan 2015 13:00:00 +0000http://blog.minitab.com/blog/applying-statistics-in-quality-projects/analyzing-qualitative-data-part-1-pareto-pie-and-stacked-bar-chartsBruno ScibiliaWhat Are T Values and P Values in Statistics?
http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-are-t-values-and-p-values-in-statistics
<p>If you’re not a statistician, looking through statistical output can sometimes make you feel a bit like <em>Alice in</em> <em>Wonderland. </em>Suddenly, you step into a fantastical world where strange and mysterious phantasms appear out of nowhere. </p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/6f4053a89257952fef0b9998547dffe2/tweedle_tweedledum.jpg" style="line-height: 20.7999992370605px; float: right; width: 248px; height: 255px; margin: 10px 15px;" /></p>
<p>For example, consider the T and P in your t-test results.</p>
<p>“Curiouser and curiouser!” you might exclaim, like Alice, as you gaze at your output.</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/1e5a4c064f43f19169121222402e4560/t_test_results_one_sided.jpg" style="width: 467px; height: 121px;" /></p>
<p>What are these values, really? Where do they come from? Even if you’ve used the p-value to interpret the statistical significance of your results<span style="line-height: 20.7999992370605px;"> </span><span style="line-height: 20.7999992370605px;">umpteen times</span><span style="line-height: 1.6;">, its actual origin may remain murky to you.</span></p>
T & P: The Tweedledee and Tweedledum of a T-test
<p>T and P are inextricably linked. They go arm in arm, like Tweedledee and Tweedledum. Here's why.</p>
<p>When you perform a t-test, you're usually trying to find evidence of a significant difference between population means (2-sample t) or between the population mean and a hypothesized value (1-sample t). <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-is-a-t-test-and-why-is-it-like-telling-a-kid-to-clean-up-that-mess-in-the-kitchen">The t-value measures the size of the difference relative to the variation in your sample data</a>. Put another way, T is simply the calculated difference represented in units of standard error. The greater the magnitude of T (it can be either positive or negative), the greater the evidence <em>against </em>the null hypothesis that there is no significant difference. The closer T is to 0, the more likely there isn't a significant difference.</p>
<p>Remember, the t-value in your output is calculated from only one sample from the entire population. If you took repeated random samples of data from the same population, you'd get slightly different t-values each time, due to random sampling error (which is really not a mistake of any kind; it's just the random variation expected in the data).</p>
<p>How different could you expect the t-values from many random samples from the same population to be? And how does the t-value from your sample data compare to those expected t-values?</p>
<p>You can use a t-distribution to find out.</p>
Using a t-distribution to calculate probability
<p>For the sake of illustration, assume that you're using a 1-sample t-test to determine whether the population mean is greater than a hypothesized value, such as 5, based on a sample of 20 observations, as shown in the above t-test output.</p>
<ol>
<li>In Minitab, choose <strong>Graph > Probability Distribution Plot</strong>.</li>
<li>Select <strong>View Probability</strong>, then click <strong>OK</strong>.</li>
<li>From <strong>Distribution</strong>, select <strong>t</strong>.</li>
<li>In <strong>Degrees of freedom</strong>, enter <em>19</em>. (For a 1-sample t test, the degrees of freedom equals the sample size minus 1).</li>
<li>Click <strong>Shaded Area</strong>. Select <strong>X Value</strong>. Select <strong>Right Tail</strong>.</li>
<li> In <strong>X Value</strong>, enter 2.8 (the t-value), then click <strong>OK</strong>.</li>
</ol>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/bc5183a42a169d45632fd4f6c0b153b3/distribution_plot_t_2.8" style="width: 576px; height: 384px;" /></p>
<p>The highest part (peak) of the distribution curve shows you where you can expect most of the t-values to fall. Most of the time, you’d expect to get t-values close to 0. That makes sense, right? Because if you randomly select representative samples from a population, the mean of most of those random samples from the population should be close to the overall population mean, making their differences (and thus the calculated t-values) close to 0.</p>
T values, P values, and poker hands
<p>T values of larger magnitudes (either negative or positive) are less likely. The far left and right "tails" of the distribution curve represent instances of obtaining extreme values of t, far from 0. For example, the shaded region represents the probability of obtaining a t-value of 2.8 or greater. Imagine a magical dart that could be thrown to land randomly anywhere under the distribution curve. What's the chance it would land in the shaded region? The calculated probability is 0.005712.....which rounds to 0.006...which is...the p-value obtained in the t-test results! <img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/5633b267494c2017d6d7c7544247d57d/poker_picture.jpg" style="float: right; width: 200px; height: 164px; margin: 10px 15px;" /></p>
<p>In other words, the probability of obtaining a t-value of 2.8 or higher, when sampling from the same population (here, a population with a hypothesized mean of 5), is approximately 0.006.</p>
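<p>That shaded-tail probability can be reproduced directly from the t-distribution's survival function. A minimal sketch in Python with SciPy:</p>

```python
from scipy.stats import t

t_value = 2.8
df = 19  # sample size of 20 minus 1

# Probability of obtaining a t-value of 2.8 or greater:
# the shaded right tail of the t-distribution.
p_one_tailed = t.sf(t_value, df)
print(f"P(T >= 2.8) = {p_one_tailed:.6f}")
```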
<p>How likely is that? Not very! For comparison, the probability of being dealt 3-of-a-kind in a 5-card poker hand is over three times as high (≈ 0.021).</p>
<p>Given that the probability of obtaining a t-value this high or higher when sampling from this population is so low, what’s more likely? It’s more likely this sample doesn’t come from this population (with the hypothesized mean of 5). It's much more likely that this sample comes from a different population, one with a mean greater than 5.</p>
<p>To wit: Because the p-value is very low (< alpha level), you reject the null hypothesis and conclude that there's a statistically significant difference.</p>
<p>In this way, T and P are inextricably linked. Consider them simply different ways to quantify the "extremeness" of your results under the null hypothesis. You can’t change the value of one without changing the other.</p>
<p>The larger the absolute value of the t-value, the smaller the p-value, and the greater the evidence against the null hypothesis. (You can verify this by entering lower and higher t-values for the t-distribution in step 6 above.)</p>
Try this two-tailed follow up...
<p>The t-distribution example shown above is based on a one-tailed t-test to determine whether the mean of the population is greater than a hypothesized value. Therefore the t-distribution example shows the probability associated with the t-value of 2.8 only in one direction (the right tail of the distribution).</p>
<p>How would you use the t-distribution to find the p-value associated with a t-value of 2.8 for two-tailed t-test (in both directions)?</p>
<p><strong>Hint:</strong> In Minitab, adjust the options in step 5 to find the probability for both tails. If you don't have a copy of Minitab, download a free <a href="http://it.minitab.com/en-us/products/minitab/free-trial.aspx" target="_blank">30-day trial version</a>.</p>
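<p>One way to check your answer: because the t-distribution is symmetric, the two-tailed p-value is simply twice the one-tailed probability. A sketch with SciPy:</p>

```python
from scipy.stats import t

# Two-tailed p-value: the probability in BOTH tails beyond |t| = 2.8,
# with 19 degrees of freedom.
p_two_tailed = 2 * t.sf(2.8, 19)
print(f"Two-tailed p = {p_two_tailed:.6f}")
```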
Hypothesis TestingTue, 27 Jan 2015 13:10:00 +0000http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-are-t-values-and-p-values-in-statisticsPatrick RunkelA Minitab Holiday Tale: Featuring the Two Sample t-Test
http://blog.minitab.com/blog/statistics-in-the-field/a-minitab-holiday-tale-featuring-the-two-sample-t-test
<p><em><span style="line-height: 1.6;">by Matthew Barsalou, guest blogger</span></em></p>
<p>Aaron and Billy are two very competitive—and not always well-behaved—eight-year-old twin brothers. They constantly strive to outdo each other, no matter what the subject. If the boys are given a piece of pie for dessert, they each automatically want to make sure that their own piece of pie is bigger than the other’s piece of pie. This causes much exasperation, aggravation and annoyance for their parents. Especially when it happens in a restaurant (although the restaurant situation has improved, since they have been asked not to return to most local restaurants).</p>
<p><img alt="A bag of coal" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d2ccbe9f7c8e887281272ae49854893f/bag_of_coal.jpg" style="border-width: 1px; border-style: solid; margin: 10px 15px; float: right; width: 200px; height: 200px;" />Sending the boys to their rooms never helped. The two would just compete to see who could stay in their room longer. This Christmas their parents were at wits' ends, and they decided the boys needed to be taught a lesson so they could grow up to be upstanding citizens. Instead of the new bicycles the boys were going to get—and probably just race till they crashed anyway—their parents decided to give them each a bag of coal.</p>
<p>An astute reader might ask, “But what does this have to do with <a href="http://www.minitab.com/products/minitab">Minitab</a>?” Well, dear reader, the boys need to figure out who got the most coal. Immediately upon opening their packages, the boys carefully weighed each piece of coal and entered the data into Minitab.</p>
<p><span style="line-height: 1.6;">Then they selected <strong>Stat > Basic Statistics > Display Descriptive Statistics</strong> and used the "Statistics" options dialog to select the metrics they wanted, including the sum of the weights they'd entered:</span></p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dacaebac62e3cc4c2e29329d0a779720/descriptivestatistics.png" style="width: 600px; height: 208px;" /></p>
<p><span style="line-height: 1.6;">Billy quickly saw that he had the most coal, and yelled, “I have 279.383 ounces and you only have 272.896 ounces, and the mean of my pieces of coal is more than the mean of yours. Mine weigh more, so our parents must love me more.” </span></p>
<p><span style="line-height: 1.6;">“Not so fast,” said Aaron. “You may have a higher mean value, but is the difference statistically significant?” There was only one thing left for the boys to do: perform a <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/t-for-2-should-i-use-a-paired-t-or-a-2-sample-t">two sample t-test</a>.</span></p>
<p><span style="line-height: 1.6;">In Minitab, Aaron selected </span><strong><span style="line-height: 1.6;">Stat > Basic Statistics > 2-Sample t…</span></strong></p>
<p>The boys left the default values at a confidence level of 95.0 and a hypothesized difference of 0. The alternative hypothesis was “Difference ≠ hypothesized difference” because the only question they were asking was “Is there a statistically significant difference?” between the two data sets.</p>
<p>The two troublemakers also selected “Graphs” and checked the options to display an individual value plot and a boxplot. They knew they should look at their data. Having the graphs available would also make it easier for them to communicate their results to higher authorities, in this case, their poor parents.</p>
<p><img alt="Individual Value Plot of Coal" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/bf541d8df2461a8edff9060789394b00/individual_value_plot_of_coal.png" style="width: 577px; height: 385px;" /></p>
<p><img alt="Boxplot of Coal" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8945d7a038de654d008f68dc0a8886d3/boxplot_of_coal.png" style="width: 577px; height: 385px;" /></p>
<p>Both the individual value plots and boxplots showed that Aaron's bag of coal had the pieces with the highest individual weights, but he also had the pieces with the least weight. So the values for his Christmas coal were scattered across a wider range than the values for Billy’s Christmas coal. But was there really a difference?</p>
<p>Billy went running for his tables of Student’s t-scores so he could interpret the resulting t-value of -0.71. Aaron simply looked at the resulting p-value of 0.481. The p-value was greater than 0.05, so the boys could not conclude there was a true difference in the weight of their Christmas "presents."</p>
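<p>Billy's table lookup and Aaron's p-value are two views of the same decision: reject the null hypothesis when |t| exceeds the two-sided critical value, or equivalently when p &lt; 0.05. The story doesn't give the sample sizes, so the degrees of freedom in the sketch below are only illustrative; with |t| = 0.71 the conclusion is the same for any degrees of freedom, because the two-sided 5% critical value never drops below about 1.96.</p>

```python
from scipy.stats import t

t_value = -0.71  # t-statistic from the boys' 2-sample t-test
alpha = 0.05

# Table-lookup view: fail to reject H0 whenever |t| is below the
# two-sided critical value. The degrees of freedom are illustrative;
# the story does not give the sample sizes.
for df in (10, 20, 30, 60):
    critical = t.ppf(1 - alpha / 2, df)
    print(f"df = {df:2d}: reject if |t| > {critical:.3f}")
    assert abs(t_value) < critical  # -0.71 never clears the bar
```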
<p><img alt="600" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/549762a9cb277536a76baedba32617d3/2_sample_t_test_coal.png" style="width: 683px; height: 305px;" /></p>
<p><span style="line-height: 1.6;">The boys dutifully reported the results, with illustrative graphs, each demanding that they get a little more to best the other. Clearly, receiving coal for Christmas had done nothing to reduce their level of competitiveness. Their parents realized the boys were probably not going to grow up to be upstanding citizens, but they may at least become good statisticians.</span></p>
<p>Happy Holidays.</p>
<p> </p>
<p style="line-height: 20.7999992370605px;"><strong>About the Guest Blogger</strong></p>
<p style="line-height: 20.7999992370605px;"><em><a href="https://www.linkedin.com/pub/matthew-barsalou/5b/539/198" target="_blank">Matthew Barsalou</a> is a statistical problem resolution Master Black Belt at <a href="http://www.3k-warner.de/" target="_blank">BorgWarner</a> Turbo Systems Engineering GmbH. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with emphasis in international business, and has a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is author of the books <a href="http://www.amazon.com/Root-Cause-Analysis-Step---Step/dp/148225879X/ref=sr_1_1?ie=UTF8&qid=1416937278&sr=8-1&keywords=Root+Cause+Analysis%3A+A+Step-By-Step+Guide+to+Using+the+Right+Tool+at+the+Right+Time" target="_blank">Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time</a>, <a href="http://asq.org/quality-press/display-item/index.html?item=H1472" target="_blank">Statistics for Six Sigma Black Belts</a> and <a href="http://asq.org/quality-press/display-item/index.html?item=H1473&xvl=76115763" target="_blank">The ASQ Pocket Guide to Statistics for Six Sigma Black Belts</a>.</em></p>
Fun StatisticsHypothesis TestingStatisticsTue, 23 Dec 2014 13:00:00 +0000http://blog.minitab.com/blog/statistics-in-the-field/a-minitab-holiday-tale-featuring-the-two-sample-t-testGuest BloggerAre Preseason Football or Basketball Rankings More Accurate?
http://blog.minitab.com/blog/the-statistics-game/are-preseason-football-or-basketball-rankings-more-accurate
<p>College basketball season tips off today, and for the second straight season Kentucky is the #1 ranked preseason team in the AP poll. Last year Kentucky did not live up to that ranking in the regular season, going 24-10 and earning a lowly 8 seed in the NCAA tournament. But then, in the tournament, they overachieved and made a run all the way to the championship game...before losing to Connecticut.</p>
<p>In football, Florida State was the AP poll preseason #1 football team. While they are currently still undefeated, they aren't quite playing like the #1 team in the country. So this made me wonder, which preseason rankings are more accurate, football or basketball?</p>
<p>I gathered <a href="//cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/File/1d3961db92c5ba14bc90b2b8323b95f8/preseason_basketball_vs__football_rankings.MTW">data</a> from the last 10 seasons, and recorded the top 10 teams in the preseason AP poll for both football and basketball. Then I recorded the difference between their preseason ranking and their final ranking. Both sports had 10 teams that weren’t ranked or receiving votes in the final poll, so I gave all of those teams a final ranking of 40.</p>
Creating a Histogram to Compare Two Distributions
<p>Let’s start with a histogram to look at the distributions of the differences. (It's always a good idea to look at the distribution of your data when you're starting an analysis, whether you're working with quality improvement data or sports data for yourself.) </p>
<p>You can create this graph in Minitab <a href="http://www.minitab.com/products/minitab">Statistical Software</a> by selecting <strong>Graph > Histograms</strong>, choosing "With Groups" in the dialog box, and using the Basketball Difference and Football Difference columns as the graph variables:</p>
<p><img alt="Histogram" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/53055c57978dbfa85d28688cc816c98a/histogram_of_basketball_difference__football_difference.jpg" style="width: 720px; height: 480px;" /></p>
<p>The differences in the rankings appear to be pretty similar. Most of the data falls toward the left side of the histogram, meaning that in most cases the difference between the preseason and final ranking is pretty small.</p>
Conducting a Mann-Whitney Hypothesis Test on Two Medians
<p>We can further investigate the data by performing a hypothesis test. Because the data is heavily skewed, I’ll use <a href="http://blog.minitab.com/blog/the-statistics-game/do-the-data-really-say-female-named-hurricanes-are-more-deadly">a Mann-Whitney test</a>. This compares the medians of two samples with similarly-shaped distributions, as opposed to a <a href="http://blog.minitab.com/blog/understanding-statistics/guidelines-and-how-tos-for-the-2-sample-t-test">2-sample t test</a>, which compares the means. <span style="line-height: 20.7999992370605px;">The median is the middle value of the data. Half the observations are less than or equal to it, and half the observations are greater than or equal to it.</span><span style="line-height: 20.7999992370605px;"> </span></p>
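The post runs the test in Minitab, but the same comparison can be sketched in Python with SciPy. The rank-difference values below are made up for illustration; they are not the post's dataset:

```python
from scipy.stats import mannwhitneyu

# Hypothetical preseason-vs-final rank differences (not the post's actual data)
basketball_diff = [0, 1, 2, 2, 3, 4, 5, 7, 12, 30]
football_diff = [0, 1, 1, 3, 4, 5, 6, 9, 15, 30]

# Two-sided Mann-Whitney U test comparing the two samples
stat, p = mannwhitneyu(basketball_diff, football_diff, alternative="two-sided")
print(f"U = {stat}, p = {p:.3f}")
```

A large p-value here, as in the post, would mean there isn't enough evidence to call one poll more accurate than the other.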
<p>To perform this test in our statistical software, we select <strong>Stat > Nonparametrics > Mann-Whitney</strong>, then choose the appropriate columns for our first and second sample: </p>
<p><img alt="Mann-Whitney Test" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/1a1f239841b82e60170e6ecbc8077d4b/mann_whitney.jpg" style="width: 689px; height: 241px;" /></p>
<p>The basketball rankings have a smaller median difference than the football rankings. However, when we examine the <a href="http://blog.minitab.com/blog/understanding-statistics/three-things-the-p-value-cant-tell-you-about-your-hypothesis-test">p-value</a> we see that this difference is not statistically significant. There is not enough evidence to conclude that one preseason poll is more accurate than the other.</p>
<p>But what about the best teams? I grouped each of the top 3 ranked teams and looked at the median difference between their preseason and final rank.</p>
<p><img alt="Bar Chart" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/692a3db40dd5d3b4c20d539f92395629/bar_chart.jpg" style="width: 720px; height: 480px;" /></p>
<p>The preseason AP basketball poll has a smaller difference for the #1 and #3 ranked teams. But the football poll is better for the #2 team, having an impressive median value of 1. Overall, both polls are relatively good, as neither has a median value greater than 6. And the differences are close enough that we can’t conclude that one is more accurate than the other.</p>
What Does It Mean for the Teams?
<p>While the odds are against both Kentucky and Florida State to finish the season ranked #1 in their respective polls, previous seasons indicate that they’re still likely to finish as one of the top teams. This is better news for Kentucky, as being one of the top teams means they’ll easily make the NCAA basketball tournament and get a high seed. However, Florida State must finish as one of the top 4 teams, or else they’ll miss out on the football postseason completely.</p>
<p>So while we can’t conclude one poll is better than the other, teams at the top of the AP basketball poll are clearly much more likely to reach the postseason than teams at the top of the football poll.</p>
Data AnalysisFun StatisticsHypothesis TestingStatistics in the NewsFri, 14 Nov 2014 15:03:33 +0000http://blog.minitab.com/blog/the-statistics-game/are-preseason-football-or-basketball-rankings-more-accurateKevin RudyComparing the College Football Playoff Top 25 and the Preseason AP Poll
http://blog.minitab.com/blog/the-statistics-game/comparing-the-college-football-playoff-top-25-and-the-preseason-ap-poll
<p>The college football playoff committee waited until the end of October to release their first top 25 rankings. One of the reasons for waiting so far into the season was that the committee would rank the teams based on actual games and wouldn’t be influenced by preseason rankings.</p>
<p>At least, that was the idea.</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/8ac74acf42052d068b6cd0eeec32f609/cfb_playoff.jpg" style="line-height: 20.7999992370605px; float: right; width: 300px; height: 187px;" /></p>
<p>Earlier this year, I found that the <a href="http://blog.minitab.com/blog/the-statistics-game/has-the-college-football-playoff-already-been-decided">final AP poll was correlated with the preseason AP poll</a>. That is, if team A was ranked ahead of team B in the preseason and they had the same number of losses, team A was still usually ranked ahead of team B. The biggest exception was SEC teams, who were able to regularly jump ahead of teams (with the same number of losses) ranked ahead of them in the preseason.</p>
<p>If the final AP poll can be influenced by preseason expectations, could the college football playoff committee be influenced, too? Let’s compare their first set of rankings to the preseason AP poll to find out.</p>
Comparing the Ranks
<p>There are currently 17 different teams in the committee’s top 25 that have just one loss. I <a href="//cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/File/26e7c8d8d8eee4fe2dfa26dc3d6e3c54/preseason_ap_vs__cfb_playoff_rankings.MTW">recorded the order</a> they are ranked in the committee’s poll and their order in the AP preseason poll. Below is an individual value plot of the data that shows each team’s preseason rank versus their current rank.</p>
<p><img alt="IVP" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/4098bab194a586865d3861f854d65627/ivp.jpg" style="width: 600px; height: 400px;" /></p>
<p>Teams on the diagonal line haven’t moved up or down since the preseason. Although Notre Dame is the only team to fall directly on the line, most teams aren’t too far off.</p>
<p>Teams below the line have jumped teams that were ranked ahead of them in the preseason. The biggest winner is actually not an SEC team, it’s TCU. Before the season, 13 of the current one-loss teams were ranked ahead of TCU, but now there are only 4. On the surface TCU seems to counter the idea that only SEC teams can drastically move up from their preseason ranking. However, of the 9 teams TCU jumped, only one (Georgia) is from the SEC. And the only other team to jump up more than 5 spots is Mississippi—who of course is from the SEC. So I wouldn’t conclude that the CFB playoff committee rankings behave differently than the AP poll quite yet.</p>
<p>Teams above the line have been passed by teams that had been ranked behind them in the preseason. Ohio State is the biggest loser, with 9 different teams passing them. Part of this can be explained by the fact that they have the worst loss (a home loss to a 4-4 Virginia Tech team). But another factor is that the preseason AP poll was released before anybody knew Buckeye quarterback Braxton Miller would miss the entire season. Had voters known that, Ohio State probably wouldn’t have been ranked so high to begin with. </p>
<p>Overall, 10 teams have moved up or down from their preseason spot by 3 spots or less. The correlation between the two polls is 0.571, which indicates a positive association between the preseason AP poll and the current CFB playoff rankings. That is, teams ranked higher in the preseason poll tend to be ranked higher in the playoff rankings.</p>
Concordant and Discordant Pairs
<p>We can take this analysis a step further by looking at the concordant and discordant pairs. A pair is concordant if the observations are in the same direction. A pair is discordant if the observations are in opposite directions. This will let us compare teams to each other two at a time.</p>
<p>For example, let’s compare Auburn and Mississippi. In the preseason, Auburn was ranked 3 (out of the 17 one-loss teams) and Mississippi was ranked 10. In the playoff rankings, Auburn is ranked 1 and Mississippi is ranked 2. This pair is concordant, since in both cases Auburn is ranked higher than Mississippi. But if you compare Alabama and Mississippi, you’ll see Alabama was ranked higher in the preseason, but Mississippi is ranked higher in the playoff rankings. That pair is discordant.</p>
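The pairwise comparison described above is easy to sketch in code. The (preseason rank, playoff rank) pairs below are hypothetical, not the post's 17-team dataset:

```python
from itertools import combinations

# Hypothetical (preseason_rank, playoff_rank) pairs for five teams
ranks = [(1, 4), (2, 3), (3, 1), (10, 2), (12, 5)]

concordant = discordant = 0
for (pre_a, cur_a), (pre_b, cur_b) in combinations(ranks, 2):
    # Concordant: both polls order this pair of teams the same way
    if (pre_a - pre_b) * (cur_a - cur_b) > 0:
        concordant += 1
    else:
        discordant += 1

print(concordant, discordant)  # 5 concordant, 5 discordant of 10 pairs
```

With 17 teams there are 17 × 16 / 2 = 136 such pairs, which is where the post's count comes from.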
<p>When we compare every team, we end up with 136 pairs. How many of those are concordant? Our <a href="http://www.minitab.com/products/minitab">favorite statistical software</a> has the answer: </p>
<p><img alt="Measures of Concordance" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/5f281abfa1e06d5cda492e17b3f9746b/concordance.jpg" style="width: 663px; height: 176px;" /></p>
<p>There are 96 concordant pairs, which is just over 70%. So most of the time, if a team ranked higher in the preseason poll, they are ranked higher in the playoff rankings. And consider this: of the one-loss teams, the top 4 ranked preseason teams were Alabama, Oregon, Auburn, and Michigan St. Currently, the top 4 one-loss teams are Auburn, Mississippi, Oregon, and Alabama. That’s only one new team—which just so happens to be from the SEC.</p>
<p>That’s bad news for non-SEC teams that started the season ranked low, like Arizona, Notre Dame, Nebraska, and Kansas State. It's going to be hard for them to jump teams with the same record, especially if those teams are from the SEC. Just look at Alabama’s résumé so far. Their best win is over West Virginia and they lost to #4 Mississippi. Is that <em>really </em>better than Kansas State, who lost to #3 Auburn and beat Oklahoma <em>on the road</em>? If you simply changed the name on Alabama’s uniform to Utah and had them unranked to start the season, would they still be ranked three spots higher than Kansas State? I doubt it.</p>
<p>The good news is that there are still many games left to play. Most of these one-loss teams will lose at least one more game. But with 4 teams making the playoff this year, odds are we'll see multiple teams with the same record vying for the last playoff spot. And if this college football playoff ranking is any indication, outside the SEC, the teams that were highly thought of in the preseason will have the edge.</p>
Fun StatisticsHypothesis TestingFri, 31 Oct 2014 13:04:57 +0000http://blog.minitab.com/blog/the-statistics-game/comparing-the-college-football-playoff-top-25-and-the-preseason-ap-pollKevin RudyUsing Data Analysis to Maximize Webinar Attendance
http://blog.minitab.com/blog/michelle-paret/using-data-analysis-to-maximize-webinar-attendance
<p>We like to host webinars, and our customers and prospects like to attend them. But when our webinar vendor moved from a pay-per-person pricing model to a pay-per-webinar pricing model, we wanted to find out how to maximize registrations and thereby minimize our costs.<img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/8a6733d3b0516b7f1c7ad80ea753d430/mtbnewspromos_w640.jpeg" style="width: 400px; height: 273px; float: right; border-width: 1px; border-style: solid; margin: 10px 15px;" /></p>
<p>We collected webinar data on the following variables:</p>
<ul>
<li>Webinar topic</li>
<li>Day of week</li>
<li>Time of day – 11 a.m. or 2 p.m.</li>
<li>Newsletter promotion – no promotion, newsletter article, newsletter sidebar</li>
<li>Number of registrants</li>
<li>Number of attendees</li>
</ul>
<p>Once we'd collected our data, it was time to analyze it and answer some key questions using <a href="http://www.minitab.com/products/minitab/">Minitab Statistical Software</a>.</p>
Should we use registrant or attendee counts for the analysis?
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/4d9fa1e3c73606627d2ca1ec34b620e2/scatterplot_w640.jpeg" style="width: 300px; height: 197px; margin: 10px 15px; float: left;" /></p>
<p>First we needed to decide what we would use to measure our results: the number of people who signed up, or the number of people who actually attended the webinar. This question really boils down to answering the question, “Can I trust my data?”</p>
<p>Our data collection system for webinar registrants is much more accurate than our data collection system for webinar attendees. This is due to customer behavior and their willingness to share contact information, in addition to the automated database processes that connect our webinar vendor data with our own database. So, for a period of time, I manually collected the attendee data directly from our webinar vendor to see how it correlated with the easily-accessible and accurate registration data. The scatterplot above shows the results.</p>
<p>With a <a href="http://blog.minitab.com/blog/understanding-statistics/no-matter-how-strong-correlation-still-doesnt-imply-causation">correlation coefficient </a>of 0.929 and a p-value of 0.000, there was a strong positive linear relationship between the registrations and attendee counts. If registrations are high, then attendance is also high. If registrations are low, then attendance is also low. I concluded that I could use the registration data—which is both easily accessible and extremely reliable—to conduct my analysis.</p>
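The same check can be sketched with SciPy's Pearson correlation. The registrant and attendee counts below are invented for illustration; the post's actual data gave r = 0.929:

```python
from scipy.stats import pearsonr

# Hypothetical registrant and attendee counts for eight webinars
registrants = [120, 85, 200, 150, 95, 180, 60, 140]
attendees = [55, 40, 95, 70, 42, 88, 25, 66]

# Pearson correlation coefficient and its p-value
r, p = pearsonr(registrants, attendees)
print(f"r = {r:.3f}, p = {p:.4f}")
```

A coefficient near 1 with a small p-value, as in the post, justifies using the more reliable registration counts as a stand-in for attendance.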
Should we consider data for the last 6 years?
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/5e73f48b852c7afc17762f28bf8887cf/i_mr_chart_of_registrants_w640.jpeg" style="width: 400px; height: 263px; margin: 10px 15px; float: left;" />We’ve been collecting webinar data for 6 years, but that doesn’t mean we can treat the last 6 years of data as one homogeneous population.</p>
<p>A lot can change in a 6-year time period. Perhaps there was a change in the webinar process that affected registrations. To determine whether or not I should use all of the data, I used an Individuals and Moving Range (I-MR, also referred to as X-MR) <a href="http://blog.minitab.com/blog/understanding-statistics/how-create-and-read-an-i-mr-control-chart">control chart</a> to evaluate the process stability of webinar registrations over time.</p>
<p>The graph revealed a single point on the MR chart that flagged as out-of-control. I looked more closely at this point and verified that the data was accurate and that this webinar belonged with the larger population. Based on this information, I decided to proceed with analyzing all 6 years of data together. (Note there is some clustering of points due to promotions, but again the goal here was to determine if we could use data over a 6-year time period.)</p>
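The I-MR limits Minitab draws can be computed by hand using the standard constants for a moving range of size 2 (2.66 = 3/d2 and 3.267 = D4). The registrant counts below are hypothetical:

```python
# Individuals (I) and Moving Range (MR) control limits,
# using the standard constants for moving ranges of two points.
# Hypothetical webinar registrant counts, in time order:
x = [110, 95, 130, 120, 105, 140, 115, 100, 125, 135]

xbar = sum(x) / len(x)
moving_ranges = [abs(a - b) for a, b in zip(x[1:], x[:-1])]
mr_bar = sum(moving_ranges) / len(moving_ranges)

# I-chart limits: mean +/- 2.66 * average moving range
i_ucl, i_lcl = xbar + 2.66 * mr_bar, xbar - 2.66 * mr_bar
# MR-chart upper limit: 3.267 * average moving range
mr_ucl = 3.267 * mr_bar

# Flag any individual observations outside the control limits
out_of_control = [xi for xi in x if xi > i_ucl or xi < i_lcl]
print(xbar, mr_bar, i_ucl, i_lcl, mr_ucl, out_of_control)
```

If no points (or only an explainable few) fall outside the limits, the process can reasonably be treated as stable, which is the conclusion the post reaches for its 6 years of data.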
What variables impact registrations?
<p>I performed an ANOVA using Minitab's General Linear Model tool to find out which factors—topic, day of week, time of day, or newsletter promotion—significantly affect webinar registrations.<img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/3758d3d03a604bab9921ad9f94663dc8/main_effects_plot_for_registrants_w640.jpeg" style="width: 400px; height: 263px; float: right; margin: 10px 15px;" /></p>
<p>The ANOVA results revealed that the day of week, time of day, and webinar topic <em>do not</em> affect webinar registrations, but the newsletter promotion type <em>does</em> (p-value = 0.000).</p>
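The post fits a General Linear Model with several factors in Minitab; as a simplified, hypothetical sketch of the same idea for the one significant factor (promotion type), a one-way ANOVA with SciPy on made-up registrant counts looks like this:

```python
from scipy.stats import f_oneway

# Hypothetical registrant counts by newsletter promotion type
no_promotion = [60, 72, 65, 58, 70]
sidebar = [68, 75, 70, 66, 73]
article = [110, 125, 118, 130, 115]

# One-way ANOVA: do the group means differ?
f_stat, p = f_oneway(no_promotion, sidebar, article)
print(f"F = {f_stat:.2f}, p = {p:.5f}")
```

A small p-value says at least one promotion type differs; a follow-up multiple-comparison procedure such as Tukey's (which the post uses next) identifies which one.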
<p>So which webinar promotion type maximizes webinar registrations?</p>
<p>Using Minitab to conduct <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/keep-that-special-someone-happy-when-you-perform-multiple-comparisons">Tukey comparisons</a>, we can see that registrations for webinars promoted in the newsletter sidebar space were not significantly different from webinars that weren't promoted at all.</p>
<p>However, webinars that were promoted in the newsletter <em>article </em>space resulted in significantly more registrations than both the sidebar promotions and no promotions.</p>
<p>From this analysis, we concluded that we still had the flexibility to offer webinars at various times and days of the week, and we could continue to vary webinar topics based on customer demand and other factors. To maximize webinar attendance and minimize webinar cost, we needed to focus our efforts on promoting the webinars in our newsletter, utilizing the article space.</p>
<p>But over the past year, we’ve started to actively promote our webinars via other channels as well, so next up is some more data analysis—using Minitab—to figure out what marketing channels provide the best results…</p>
Data AnalysisHypothesis TestingRegression AnalysisStatisticsFri, 17 Oct 2014 12:00:00 +0000http://blog.minitab.com/blog/michelle-paret/using-data-analysis-to-maximize-webinar-attendanceMichelle ParetWith the Assistant, You Won't Have to Stop and Get Directions about Directional Hypotheses
http://blog.minitab.com/blog/statistics-and-quality-improvement/with-the-assistant-you-wont-have-to-stop-and-get-directions-about-directional-hypotheses
<p>I got lost a lot as a child. I got lost at malls, at museums, Christmas markets, and everywhere else you could think of. Had it been in fashion to tether children to their parents at the time, I'm sure my mother would have. As an adult, I've gotten used to using a GPS device to keep me from getting lost.</p>
<p><span style="line-height: 20.7999992370605px;">The Assistant in Minitab is like your GPS for statistics. The Assistant is there to provide you with directions so that you don't get lost. One particular area where it's easy to get lost is with directional hypotheses.</span><img alt="Wait... is my hypothesis the other direction?" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/25dd42362071d2aafc3bfc85f78f5f22/hypothesis_bubble_w640.jpeg" style="line-height: 20.7999992370605px; width: 480px; height: 350px; border-width: 1px; border-style: solid; margin: 10px 15px;" /></p>
What Is a Directional Hypothesis?
<p>When you do a <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/basics/what-is-a-hypothesis-test/">statistical hypothesis test</a>, you have a null hypothesis and an alternative hypothesis. Directional hypotheses refer to two types of alternative hypotheses that you can usually choose. The common alternative hypotheses are these three:</p>
<ul>
<li>The value that you want to test is greater than a target.</li>
<li>The value that you want to test is different from a target.</li>
<li>The value that you want to test is less than a target.</li>
</ul>
<p>If you select an alternative hypothesis with "greater than" or "less than" in it, then you've chosen a directional hypothesis. When you choose a directional hypothesis, you get a one-sided test.</p>
<p>What does it look like to choose a one-sided test, and why would you? Let's consider an example.</p>
Choosing Whether to Use a One-sided Test or a Two-sided Test
<p>Suppose new production equipment is installed at a factory that should increase the rate of production for electrical panels. Concern exists that the change could increase the percentage of electrical panels that require rework before shipping. A quality team prepares to conduct a hypothesis test to determine whether statistical evidence supports this concern. The historical rework rate is 1%.</p>
<p>At this point, you would usually choose an alternative hypothesis. Maybe you remember hearing that you should think about whether to use a one-sided test or a two-sided test, or you may not even know how a test can have a side.</p>
<p>To keep from getting lost, you use your GPS. To keep from getting confused about statistics, you can use the Assistant. The Assistant uses clear and simple language. The Assistant doesn't ask you about "directional hypotheses" or "one-sided tests." Instead, the Assistant asks the question, "What do you want to determine?"</p>
<p><img alt="Is the % defective of Panels greater than .01? Is the % defective of Panels less than .01? Is the % defective of Panels different from .01?" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/b090980e5b08184e7b70b96b9cb05489/test_setup_in_assistant.png" style="width: 573px; height: 198px;" /></p>
<p>In this scenario, it's easy to see why the team would want to determine whether the percent is greater than 1. By performing the one-sided test for whether the percentage is greater than 1, the team can determine if there is enough statistical evidence to conclude that the percentage increased. If the percentage increased, then the concern is justified.</p>
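The Assistant's "greater than" question corresponds to a one-sided test on a proportion. As a rough sketch with SciPy's exact binomial test, using a hypothetical sample of 1,200 panels of which 18 needed rework, against the historical 1% rate:

```python
from scipy.stats import binomtest

# One-sided test: is the rework proportion greater than the historical 1%?
# Counts are hypothetical, not from the post.
result = binomtest(k=18, n=1200, p=0.01, alternative="greater")
print(f"p-value = {result.pvalue:.4f}")
```

If the p-value falls below the chosen significance level, there is evidence that the rework rate has increased; otherwise the team cannot conclude that the new equipment made things worse.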
<p>In practical terms, you should consider what it means to limit your decision to whether there is evidence for an increase. A one-sided test of whether the percentage increased will never show a statistically significant decrease in the percentage of boards that require rework. Evidence of a decrease in the number of defectives might guide the quality team to investigate the reasons for the unforeseen benefit.</p>
Why Use a One-sided Test?
<p>Given this possible concern about whether a one-sided test excludes important information from the result, why would you ever use one? The best answer is that you use a one-sided test when the one-sided test tells you everything that you need to know.</p>
<p>In the example about the electrical panels, the quality team might feel completely secure in assuming that the new equipment will not result in a decrease in the percentage of panels that require rework. If so, then a test that also checks for a decrease adds nothing. The team needs only to determine whether there is a problem with increased defectives to solve.</p>
The Assistant Gets Even Better
<p>While a p-value for a one-sided test can be useful, more analysis can help you make better decisions. For example, in the electrical panel example, if the team finds a statistically significant increase, it will be important to know what the percentage increase is. <a href="http://www.minitab.com/en-us/products/minitab/assistant/">The Assistant</a> produces several reports with your hypothesis tests that help you get as much information as you can from your data. The report card verifies your analysis by providing assumption checks and identifying any concerns that you should be aware of. The diagnostic report helps you further understand your analysis by providing additional detail. The summary report helps you to draw the correct conclusions and explain those conclusions to others. The series of reports includes a variety of other statistics and analyses. That way, you have everything that you need to interpret your results with confidence.</p>
<p><img alt="The % defective of Panels is not significantly greater than the target (p > 0.05)" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/75f280df482574a3aee75ee65741b5c4/1_sample___defective_test_for_panels___summary_report_w640.png" style="width: 480px; height: 360px;" /></p>
<p>The image of the face in the crowd without the thought bubble is by <a href="https://www.flickr.com/photos/akbarsyah/">_Imaji_</a> and is licensed under <a href="https://creativecommons.org/licenses/by/2.0/">this creative commons license</a>.</p>
Hypothesis TestingWed, 15 Oct 2014 18:52:23 +0000http://blog.minitab.com/blog/statistics-and-quality-improvement/with-the-assistant-you-wont-have-to-stop-and-get-directions-about-directional-hypothesesCody Steele