Hypothesis Testing | Minitab
Blog posts and articles about hypothesis testing, especially in the course of Lean Six Sigma quality improvement projects.
http://blog.minitab.com/blog/hypothesis-testing-2/rss
Fri, 26 Aug 2016 04:51:00 +0000

Data Not Normal? Try Letting It Be, with a Nonparametric Hypothesis Test
http://blog.minitab.com/blog/understanding-statistics/data-not-normal-try-letting-it-be-with-a-nonparametric-hypothesis-test
<p>So the data you nurtured, that you worked so hard to format and make useful, failed the normality test.</p>
<img alt="not-normal" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c6e92e8046f3fcee28e7cf505fb77005/data_freak_flag_300.jpg" style="line-height: 20.8px; width: 300px; height: 293px; margin: 10px 15px; float: right;" />
<p>Time to face the truth: despite your best efforts, that data set is <em>never </em>going to measure up to the assumption you may have been trained to fervently look for.</p>
<p>Your data's lack of normality seems to make it poorly suited for analysis. Now what?</p>
<p>Take it easy. Don't get uptight. Just let your data be what they are, go to the <strong>Stat </strong>menu in Minitab Statistical Software, and choose "Nonparametrics."</p>
<p style="margin-left: 40px;"><img alt="nonparametrics menu" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fbebf763ac6bd92b40c0d241b7c4029c/nonparametrics_menu.png" style="width: 367px; height: 309px;" /></p>
<p>If you're stymied by your data's lack of normality, nonparametric statistics might help you find answers. And if the word "nonparametric" looks like five syllables' worth of trouble, don't be intimidated—it's just a big word that usually refers to "tests that don't assume your data follow a normal distribution."</p>
<p>In fact, nonparametric statistics don't assume your data follow <em>any distribution at all</em>. The following table lists common parametric tests, their equivalent nonparametric tests, and the main characteristics of each.</p>
<p style="margin-left: 40px;"><img alt="correspondence table for parametric and nonparametric tests" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/4a69043809861f5187be271de67f8161/parametric_correspondence_table.png" style="width: 661px; height: 488px;" /></p>
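<p>The post works in Minitab's menus, but the same parametric/nonparametric pairing from the table can be sketched in Python with SciPy. The sample numbers below are made up purely for illustration:</p>

```python
# Illustrative only: a parametric test next to its nonparametric counterpart
# in Python/SciPy (the post itself uses Minitab's menus).
from scipy import stats

# Two small, hypothetical samples; the second is shifted upward.
a = [5.1, 4.9, 6.2, 5.8, 5.5, 5.0, 6.0, 5.3]
b = [6.8, 7.1, 6.5, 7.4, 6.9, 7.2, 6.6, 7.0]

# Parametric: the 2-sample t-test compares means and assumes normality.
t_stat, t_p = stats.ttest_ind(a, b)

# Nonparametric counterpart: the Mann-Whitney test compares the samples
# using ranks, without assuming any particular distribution.
u_stat, u_p = stats.mannwhitneyu(a, b, alternative="two-sided")

print(t_p, u_p)  # both small here, since the samples barely overlap
```

<p>For well-separated samples like these, both tests agree; the interesting differences appear with skewed data, outliers, or small samples, as discussed below.</p>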
<p>Nonparametric analyses free your data from the straitjacket of the <span style="line-height: 20.8px;">normality </span><span style="line-height: 1.6;">assumption. So choosing a nonparametric analysis is sort of like removing your data from a stifling, </span><a href="https://www.verywell.com/the-asch-conformity-experiments-2794996" style="line-height: 1.6;" target="_blank">conformist environment</a><span style="line-height: 1.6;">, and putting it into </span><a href="https://en.wikipedia.org/wiki/Utopia" style="line-height: 1.6;" target="_blank">a judgment-free, groovy idyll</a><span style="line-height: 1.6;">, where your data set can just be what it is, with no hassles about its unique and beautiful shape. How cool is </span><em style="line-height: 1.6;">that</em><span style="line-height: 1.6;">, man? Can you dig it?</span></p>
<p>Of course, it's not <em>quite </em>that carefree. Just like the 1960s encompassed both <a href="https://en.wikipedia.org/wiki/Woodstock" target="_blank">Woodstock</a> and <a href="https://en.wikipedia.org/wiki/Altamont_Free_Concert" target="_blank">Altamont</a>, so nonparametric tests offer both compelling advantages and serious limitations.</p>
Advantages of Nonparametric Tests
<p>Both parametric and nonparametric tests draw inferences about populations based on samples, but parametric tests focus on sample parameters like the mean and the standard deviation, and make various assumptions about your data—for example, that it follows a normal distribution, and that samples include a minimum number of data points.</p>
<p>In contrast, nonparametric tests are unaffected by the distribution of your data. Nonparametric tests also accommodate many conditions that parametric tests do not handle, including small sample sizes, ordered outcomes, and outliers.</p>
<p>Consequently, they can be used in a wider range of situations and with more types of data than traditional parametric tests. Many people also feel that nonparametric analyses are more intuitive.</p>
Drawbacks of Nonparametric Tests
<p><span style="line-height: 20.8px;">But nonparametric tests are not </span><em style="line-height: 20.8px;">completely </em><span style="line-height: 20.8px;">free from assumptions—they do require data to be an independent random sample, for example.</span></p>
<p>And nonparametric tests aren't a cure-all. For starters, they typically have less <a href="http://blog.minitab.com/blog/starting-out-with-statistical-software/how-powerful-am-i-power-and-sample-size-in-minitab">statistical power</a> than parametric equivalents. Power is the probability that you will correctly reject the null hypothesis when it is false. That means you have an increased chance of making a Type II error with these tests.</p>
<p>In practical terms, that means nonparametric tests are <em>less </em>likely to detect an effect or association when one really exists.</p>
<p>So if you want to draw conclusions with the same confidence level you'd get using an equivalent parametric test, you will need larger sample sizes. </p>
<p>Nonparametric tests are not a one-size-fits-all solution for non-normal data, but they can yield good answers in situations where parametric statistics just won't work.</p>
Is Parametric or Nonparametric the Right Choice for You?
<p>I've briefly outlined differences between parametric and nonparametric hypothesis tests, looked at which tests are equivalent, and considered some of their advantages and disadvantages. If you're waiting for me to tell you which direction you should choose...well, all I can say is, "It depends..." But I can give you some established rules of thumb to consider when you're looking at the specifics of your situation.</p>
<p>Keep in mind that <strong>nonnormality does not immediately disqualify your data for a parametric test</strong>. What's your sample size? <span style="line-height: 20.8px;">As long as a certain minimum sample size is met, most parametric tests will be </span><a href="http://blog.minitab.com/blog/fun-with-statistics/forget-statistical-assumptions-just-check-the-requirements" style="line-height: 20.8px;">robust to the normality assumption</a><span style="line-height: 20.8px;">. </span><span style="line-height: 1.6;">For example, the Assistant in Minitab (which uses Welch's t-test) points out that </span><span style="line-height: 1.6;">while the 2-sample t-test is based on the assumption that the data are normally distributed, this assumption is not critical when the sample sizes are at least 15. And Bonnett's 2-sample standard deviation test performs well for nonnormal data even when sample sizes are as small as 20. </span></p>
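<p>As a rough illustration of that robustness claim (my own simulation, not from the Assistant), we can feed a 2-sample t-test strongly skewed data with the null hypothesis true and watch its false-alarm rate stay near the nominal 5%:</p>

```python
# Simulation sketch: with n = 15 per group, the 2-sample t-test holds its
# Type I error rate close to 5% even for strongly skewed (exponential) data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n, alpha = 4000, 15, 0.05
rejections = 0
for _ in range(n_sims):
    # Both samples come from the SAME skewed population, so the null is true.
    x = rng.exponential(scale=1.0, size=n)
    y = rng.exponential(scale=1.0, size=n)
    if stats.ttest_ind(x, y).pvalue < alpha:
        rejections += 1

type1_rate = rejections / n_sims
print(type1_rate)  # should land near the nominal 0.05
```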
<p><span style="line-height: 1.6;">In addition, while they may not require normal data, many nonparametric tests have other assumptions that you can’t disregard.</span> For example, t<span style="line-height: 20.8px;">he Kruskal-Wallis test assumes your samples come from populations that have similar shapes and equal variances. </span><span style="line-height: 1.6;">And the 1-sample Wilcoxon test does not assume a particular population distribution, but it does assume the distribution is symmetrical. </span></p>
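<p>For illustration, here is how those two nonparametric tests look in Python with SciPy, using hypothetical numbers; the assumptions noted above still apply to data like these:</p>

```python
# Hypothetical numbers, just to show the nonparametric calls that correspond
# to the tests named above (the post uses Minitab's menus instead).
from scipy import stats

g1 = [12, 15, 11, 14, 13, 16]
g2 = [22, 25, 21, 24, 23, 26]
g3 = [12, 14, 13, 15, 11, 16]

# Kruskal-Wallis: compares three or more groups, but still assumes the
# groups' distributions have similar shapes and spreads.
h_stat, kw_p = stats.kruskal(g1, g2, g3)

# 1-sample Wilcoxon: tests a hypothesized median (here, 5.0), but assumes
# the underlying distribution is symmetric.
sample = [4.8, 5.1, 5.3, 4.9, 5.6, 5.2, 4.7, 5.4]
w_stat, w_p = stats.wilcoxon([x - 5.0 for x in sample])

print(kw_p, w_p)  # g2 clearly differs; the Wilcoxon test is inconclusive
```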
<p><span style="line-height: 1.6;">In most cases, your choice between parametric and nonparametric tests ultimately comes down to sample size, and whether the center of your data's distribution is better reflected by the mean or the median.</span></p>
<ul>
<li>If the mean accurately represents the center of your distribution and your sample size is large enough, a parametric test offers you better accuracy and more power. </li>
<li>If your sample size is small, you'll likely need to go with a nonparametric test. But if the median better represents the center of your distribution, a nonparametric test may be a better option even for a large sample.</li>
</ul>
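<p>A tiny example of the mean-versus-median point, using made-up numbers:</p>

```python
# Illustration only: in skewed data, the median can describe the "typical"
# value better than the mean, which is pulled toward the outlier.
import statistics

# Hypothetical home prices (in $1000s): mostly modest, one mansion.
prices = [140, 150, 155, 160, 165, 170, 175, 180, 190, 950]

mean_price = statistics.mean(prices)      # dragged upward by the outlier
median_price = statistics.median(prices)  # resistant to the outlier

print(mean_price, median_price)  # 243.5 vs. 167.5
```

<p>A test built around the mean would describe a "typical" home as costing far more than most of them actually do; a median-based nonparametric test avoids that distortion.</p>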
Data Analysis | Hypothesis Testing | Statistics | Statistics Help
Mon, 22 Aug 2016 12:00:00 +0000
Eston Martz

Have You Accidentally Done Statistics?
http://blog.minitab.com/blog/statistics-and-quality/have-you-accidentally-done-statistics
<p>Have you ever accidentally done statistics? Not all of us can (or would want to) be “stat nerds,” but the word “statistics” shouldn’t be scary. In fact, we all analyze things that happen to us every day. Sometimes we don’t realize that we are compiling data and analyzing it, but that’s exactly what we are doing. Yes, there are advanced statistical concepts that can be difficult to understand—but there are many concepts that we use every day that we don’t realize are statistics.</p>
<p>I consider myself a student of baseball, so my example of unknowingly performing statistical procedures concerns my own experiences playing that game.</p>
<p>My baseball career ended as a 5’7” college freshman walk-on. When I realized that my ceiling as a catcher was a lot lower than my 6’0”-6’5” teammates I hung up my spikes. As an adult, while finishing my degree in Business Statistics, I had the opportunity to shadow a couple of scouts from the Major League Baseball Scouting Bureau. Yes, I’ve seen <a href="http://blog.minitab.com/blog/the-statistics-game/moneyball-shows-the-power-of-statistics"><em>Moneyball </em></a>and I know that traditional scouting methods are reputed to conflict with the methods of stat nerds like myself, but as a former player I wanted to see what these scouts were looking at. </p>
<p><img alt="baseball statistics" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/076e1f8132a222e6204e393eb0d3e9a2/baseball_stats.jpg" style="width: 278px; height: 313px; margin: 10px 15px; float: right;" />On my first day with the scouts, I found out they were traditional baseball guys. They didn’t believe data could assess how good a player is any better than observation could, and ultimately they didn't think statistics were important to what they do. </p>
<p>I found their thinking to be a little off, and a little funny. Although they didn’t believe in statistics, the tools they use for their jobs actually quantify a player's attributes. I watched as they used a radar gun to measure pitch speed, a stopwatch to measure running speed, and a notepad to record their measurements (they didn’t realize they were compiling data). As one of the scouts talked with me about how statistics might be brought into baseball, he was making a dot plot by hand of the pitcher's pitches by speed to find the pitcher's velocity distribution.</p>
<p style="margin-left: 40px;"><img height="343" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/b51a0c86-e2dd-456e-878a-4196c7381c3a/File/8361f15f80b379a88187b539c124cad0/8361f15f80b379a88187b539c124cad0.png" width="514" /></p>
<p>After I explained to him that he was unknowingly creating a dot plot (like the one I created for Raisel Iglesias using Minitab, and which has a <a href="http://support.minitab.com/minitab/17/topic-library/basic-statistics-and-graphs/summary-statistics/measures-of-central-tendency/">bimodal distribution</a>), we started talking about grading players’ skills. The scouts would grade each player's hitting, power, running, arm strength, and fielding ability. They used a numeric grading system from 20-80 for each of the characteristics, with 20 being the lowest, 50 being average, and 80 being elite. After they compiled this data, they would analyze it to assign the players grades, and they would create a report with these grades to convey to others what they saw in the player.</p>
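<p>As a sketch of how such grades relate to ordinary statistics (my own illustration, not the scouts' actual procedure): the 20-80 scouting scale is often described as 50 for league average with 10 points per standard deviation, which is just a rescaled z-score. The league mean and standard deviation below are hypothetical:</p>

```python
# Hypothetical sketch: mapping a measurement to the 20-80 scouting scale
# via its z-score (50 = average, 10 points per standard deviation).
def scout_grade(value, population_mean, population_sd):
    """Rescale a z-score to the 20-80 scale, capped at the scale's limits."""
    z = (value - population_mean) / population_sd
    return max(20, min(80, round(50 + 10 * z)))

# Hypothetical fastball velocities (mph), assuming league mean 92, sd 2.5.
print(scout_grade(97, 92, 2.5))  # two sd above average -> 70
print(scout_grade(92, 92, 2.5))  # exactly average -> 50
print(scout_grade(86, 92, 2.5))  # well below average -> 26
```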
<p style="margin-left: 40px;"><img height="401" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/b51a0c86-e2dd-456e-878a-4196c7381c3a/File/a57bd643816872de2fee895f303c0ddc/a57bd643816872de2fee895f303c0ddc.png" width="602" /></p>
<p>I was amazed at how these scouts—true, old-school baseball guys who said stats weren’t important for their jobs—were compiling data and analyzing it for their reports. </p>
<p>A few of the other statistical ideas the scouts were (accidentally) concerned with included the sample size of observations of a player, comparison analysis, and predicting where a player falls within their physical development (regression).</p>
<p>Like the baseball scouts, many of us are unwittingly doing statistics. Just like these scouts, we run into data all day long without recognizing that we can compile and analyze it. At work we worry about customer satisfaction, wait time, average transaction value, cost ratios, efficiency, etc. And while many people get intimidated when we use the word "statistics," we don’t need advanced degrees to embrace observing, compiling data, and making solid decisions based on our analysis.</p>
<p>So, are <em>you </em>accidentally doing statistics? If you want to get beyond accidentally doing statistics and analyze a little more deliberately, Minitab has many tools, like the <a href="http://www.minitab.com/products/minitab/assistant/">Assistant menu</a> and the StatGuide, to help you on your stats journey.</p>
Data Analysis | Fun Statistics | Hypothesis Testing | Statistics | Statistics in the News | Stats
Tue, 02 Aug 2016 12:00:00 +0000
Joseph Hartsock

One-Sample t-test: Calculating the t-statistic is not really a bear
http://blog.minitab.com/blog/marilyn-wheatleys-blog/one-sample-t-test-calculating-the-t-statistic-is-not-really-a-bear
<p>While some posts in our Minitab blog focus on <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests-t-values-and-t-distributions">understanding t-tests and t-distributions</a>, this post will focus more simply on how to hand-calculate the t-value for a one-sample t-test (and how to replicate the p-value that Minitab gives us). </p>
<p>The formulas used in this post are available within <a href="http://www.minitab.com/en-us/products/minitab/">Minitab Statistical Software</a> by choosing the following menu path: <strong>Help</strong> > <strong>Methods and Formulas</strong> > <strong>Basic Statistics</strong> > <strong>1-sample t</strong>.</p>
<p>The null and three alternative hypotheses for a one-sample t-test are shown below:</p>
<p style="margin-left: 40px;"><img border="0" height="184" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/553bfcce02e2394b13b5175655c99df6/553bfcce02e2394b13b5175655c99df6.png" width="368" /></p>
<p>The default alternative hypothesis is the last one listed: the true population mean is not equal to the hypothesized value, and this is the option used in this example.</p>
<p><img alt="bear" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/88db51bd8ccbfcbb306372bb65fa4902/bear.jpg" style="margin: 10px 15px; float: right; width: 400px; height: 290px;" />To understand the calculations, we’ll use a sample data set available within Minitab. The name of the dataset is <strong>Bears.MTW</strong>, because the calculation is not a huge bear to wrestle (plus who can resist a dataset with that name?). The path to access the sample data from within Minitab depends on the version of the software. </p>
<p>For the current version of Minitab, <a href="http://www.minitab.com/en-us/products/minitab/whats-new/">Minitab 17.3.1</a>, the sample data is available by choosing <strong>Help</strong> > <strong>Sample Data</strong>.</p>
<p>For previous versions of Minitab, the data set is available by choosing <strong>File</strong> > <strong>Open Worksheet</strong> and clicking the <strong>Look in Minitab Sample Data folder</strong> button at the bottom of the window.</p>
<p>For this example, we will use column C2, titled Age, in the Bears.MTW data set, and we will test the hypothesis that the average age of bears is 40. First, we’ll use <strong>Stat</strong> > <strong>Basic Statistics</strong> > <strong>1-sample t</strong> to test the hypothesis:</p>
<p style="margin-left: 40px;"><img border="0" height="315" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/d3336e100a9a4a91501ed1206c8e807f/d3336e100a9a4a91501ed1206c8e807f.png" width="400" /></p>
<p>After clicking <strong>OK</strong> above we see the following results in the session window:</p>
<p style="margin-left: 40px;"><img border="0" height="118" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/e62a2a776614c60eff0dd6383f66e5f5/e62a2a776614c60eff0dd6383f66e5f5.png" width="464" /></p>
<p>With a high p-value of 0.361, we don’t have enough evidence to conclude that the average age of bears is significantly different from 40. </p>
<p>Now we’ll see how to calculate the T value above by hand.</p>
<p>The T value (0.92) shown above is calculated using the following formula in Minitab:</p>
<p style="margin-left: 40px;"><img border="0" height="172" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/701f9c0efa98a38fb397f3c3ec459b66/701f9c0efa98a38fb397f3c3ec459b66.png" width="247" /></p>
<p>The output from the 1-sample t test above gives us all the information we need to plug the values into our formula:</p>
<p style="margin-left: 40px;">Sample mean: 43.43</p>
<p style="margin-left: 40px;">Sample standard deviation: 34.02</p>
<p style="margin-left: 40px;">Sample size: 83</p>
<p>We also know that our target or hypothesized value for the mean is 40.</p>
<p>Using the numbers above to calculate the t-statistic we see:</p>
<p style="margin-left: 40px;">t = (43.43-40)/(34.02/√83) = <strong>0.918542</strong><br />
(which rounds to 0.92, as shown in Minitab’s 1-sample t-test output)</p>
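<p>If you'd like to replicate the calculation outside of Minitab, here is the same arithmetic in Python, using SciPy only for the p-value (the summary statistics are taken from the output above):</p>

```python
# Replicating the hand calculation: t-statistic from the summary numbers,
# then the two-tailed p-value from the t-distribution.
import math
from scipy import stats

sample_mean = 43.43
sample_sd = 34.02
n = 83
hypothesized_mean = 40

# t = (sample mean - hypothesized mean) / (standard error of the mean)
t_value = (sample_mean - hypothesized_mean) / (sample_sd / math.sqrt(n))

# Two-tailed p-value: twice the upper-tail area, with n - 1 degrees of freedom
p_value = 2 * stats.t.sf(t_value, df=n - 1)

print(t_value, p_value)  # about 0.9185 and 0.361, matching Minitab's output
```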
<p>Now, we <em>could </em>dust off a statistics textbook and use it to compare our calculated t of 0.918542 to the corresponding critical value in a t-table, but that seems like a pretty big bear to wrestle when we can easily get the p-value from Minitab instead. To do that, I’ve used <strong>Graph</strong> > <strong>Probability Distribution Plot</strong> > <strong>View Probability</strong>:</p>
<p style="margin-left: 40px;"><img border="0" height="382" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/e43510dc233e71f22b93f190deb5e523/e43510dc233e71f22b93f190deb5e523.png" width="419" /></p>
<p>In the dialog above, we’re using the t distribution with 82 degrees of freedom (we had an N = 83, so the degrees of freedom for a 1-sample t-test is N-1). Next, I’ve selected the <strong>Shaded Area</strong> tab:</p>
<p style="margin-left: 40px;"><img border="0" height="383" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/e36572b6cead5cf393763d880b6f229a/e36572b6cead5cf393763d880b6f229a.png" width="414" /></p>
<p>In the dialog box above, we’re defining the shaded area by the X value (the calculated t-statistic), and I’ve typed in the t-value we calculated in the <strong>X value</strong> field. This was a 2-tailed test, so I’ve selected <strong>Both Tails</strong> in the dialog above.</p>
<p>After clicking <strong>OK</strong> in the window above, we see:</p>
<p style="margin-left: 40px;"><img border="0" height="384" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/a12abfcbe5ecea6902e4a138e96a53a6/a12abfcbe5ecea6902e4a138e96a53a6.png" width="576" /></p>
<p>We add together the probabilities from both tails, 0.1805 + 0.1805, which equals 0.361 – the same p-value that Minitab gave us for the 1-sample t-test. </p>
<p>That wasn’t so bad—not a difficult bear to wrestle at all!</p>
Data Analysis | Fun Statistics | Hypothesis Testing | Learning | Statistics | Statistics Help | Stats
Wed, 27 Jul 2016 17:57:00 +0000
Marilyn Wheatley

Understanding Analysis of Variance (ANOVA) and the F-test
http://blog.minitab.com/blog/adventures-in-statistics/understanding-analysis-of-variance-anova-and-the-f-test
<p>Analysis of variance (ANOVA) can determine whether the means of three or more groups are different. ANOVA uses F-tests to statistically test the equality of means. In this post, I’ll show you how ANOVA and F-tests work using a one-way ANOVA example.</p>
<p>But wait a minute...have you ever stopped to wonder why you’d use an analysis of <em>variance</em> to determine whether <em>means</em> are different? I'll also show how variances provide information about means.</p>
<p>As in my posts about <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests:-1-sample,-2-sample,-and-paired-t-tests" target="_blank">understanding t-tests</a>, I’ll focus on concepts and graphs rather than equations to explain ANOVA F-tests.</p>
What are F-statistics and the F-test?
<p>The F-test is named after its test statistic, F, which was named in honor of Sir Ronald Fisher. The F-statistic is simply a ratio of two variances. Variances are a measure of dispersion, or how far the data are scattered from the mean. Larger values represent greater dispersion.</p>
<img alt="F is for F-test" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2176eecdb5dee3586bf90f5dc2ca0007/f.gif" style="line-height: 20.8px; margin: 10px 15px; float: right; width: 200px; height: 221px;" />
<p>Variance is the square of the standard deviation. For us humans, standard deviations are easier to understand than variances because they’re in the same units as the data rather than squared units. However, many analyses actually use variances in the calculations.</p>
<p>F-statistics are based on the ratio of mean squares. The term “<a href="http://support.minitab.com/minitab/17/topic-library/modeling-statistics/anova/anova-statistics/understanding-mean-squares/" target="_blank">mean squares</a>” may sound confusing but it is simply an estimate of population variance that accounts for the <a href="http://support.minitab.com/minitab/17/topic-library/basic-statistics-and-graphs/introductory-concepts/basic-concepts/df/" target="_blank">degrees of freedom (DF)</a> used to calculate that estimate.</p>
<p>Despite being a ratio of variances, you can use F-tests in a wide variety of situations. Unsurprisingly, the F-test can assess the equality of variances. However, by changing the variances that are included in the ratio, the F-test becomes a very flexible test. For example, you can use F-statistics and F-tests to <a href="http://blog.minitab.com/blog/adventures-in-statistics/what-is-the-f-test-of-overall-significance-in-regression-analysis" target="_blank">test the overall significance for a regression model</a>, to compare the fits of different models, to test specific regression terms, and to test the equality of means.</p>
Using the F-test in One-Way ANOVA
<p>To use the F-test to determine whether group means are equal, it’s just a matter of including the correct variances in the ratio. In one-way ANOVA, the F-statistic is this ratio:</p>
<p style="margin-left: 40px;"><strong>F = variation between sample means / variation within the samples</strong></p>
<p>The best way to understand this ratio is to walk through a one-way ANOVA example.</p>
<p>We’ll analyze four samples of plastic to determine whether they have different mean strengths. You can download the <a href="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/File/a8a9c678090ccac0f3be61be91cf8012/plasticstrength.mtw">sample data</a> if you want to follow along. (If you don't have Minitab, you can download a <a href="http://www.minitab.com/en-us/products/minitab/free-trial/" target="_blank">free 30-day trial</a>.) I'll refer back to the one-way ANOVA output as I explain the concepts.</p>
<p>In Minitab, choose <strong>Stat > ANOVA > One-Way ANOVA...</strong> In the dialog box, choose "Strength" as the response, and "Sample" as the factor. Press OK, and Minitab's Session Window displays the following output: </p>
<p style="margin-left: 40px;"><img alt="Output for Minitab's one-way ANOVA" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/42587221b52ed940d53478106c134ebc/1way_swo.png" style="width: 315px; height: 322px;" /></p>
Numerator: Variation Between Sample Means
<p>One-way ANOVA has calculated a mean for each of the four samples of plastic. The group means are: 11.203, 8.938, 10.683, and 8.838. These group means are distributed around the overall mean for all 40 observations, which is 9.915. If the group means are clustered close to the overall mean, their variance is low. However, if the group means are spread out further from the overall mean, their variance is higher.</p>
<p>Clearly, if we want to show that the group means are different, it helps if the means are further apart from each other. In other words, we want higher variability among the means.</p>
<p>Imagine that we perform two different one-way ANOVAs where each analysis has four groups. The graph below shows the spread of the means. Each dot represents the mean of an entire group. The further the dots are spread out, the higher the value of the variability in the numerator of the F-statistic.</p>
<p style="margin-left: 40px;"><img alt="Dot plot that shows high and low variability between group means" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/f9a100946675098ca09c4440a7907230/group_means_dot_plot.png" style="width: 576px; height: 86px;" /></p>
<p>What value do we use to measure the variance between sample means for the plastic strength example? In the one-way ANOVA output, we’ll use the adjusted mean square (Adj MS) for Factor, which is 14.540. Don’t try to interpret this number because it won’t make sense. It’s the sum of the squared deviations divided by the factor DF. Just keep in mind that the further apart the group means are, the larger this number becomes.</p>
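<p>To see where that 14.540 comes from, we can rebuild it from the group means quoted above (with 40 observations in 4 groups, each sample has 10 observations):</p>

```python
# Rebuilding the Adj MS for Factor from the numbers quoted in the post.
group_means = [11.203, 8.938, 10.683, 8.838]
overall_mean = 9.915
n_per_group = 10
df_factor = len(group_means) - 1  # 3 DF for the factor

# Sum of squared deviations of the group means from the overall mean,
# weighted by group size, then divided by the factor DF.
ss_between = n_per_group * sum((m - overall_mean) ** 2 for m in group_means)
ms_factor = ss_between / df_factor

print(round(ms_factor, 3))  # 14.544, matching the output's 14.540 up to rounding
```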
Denominator: Variation Within the Samples
<p>We also need an estimate of the variability within each sample. To calculate this variance, we need to calculate how far each observation is from its group mean for all 40 observations. Technically, it is the sum of the squared deviations of each observation from its group mean divided by the error DF.</p>
<p>If the observations for each group are close to the group mean, the variance within the samples is low. However, if the observations for each group are further from the group mean, the variance within the samples is higher.</p>
<p style="margin-left: 40px;"><img alt="Plot that shows high and low variability within groups" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/9ef2eae1cf6bba97ccb1b664356d0d0a/within_group_dplot.png" style="width: 576px; height: 384px;" /></p>
<p>In the graph, the panel on the left shows low variation in the samples while the panel on the right shows high variation. The more spread out the observations are from their group mean, the higher the value in the denominator of the F-statistic.</p>
<p>If we’re hoping to show that the means are different, it's good when the within-group variance is low. You can think of the within-group variance as the background noise that can obscure a difference between means.</p>
<p>For this one-way ANOVA example, the value that we’ll use for the variance within samples is the Adj MS for Error, which is 4.402. It is considered “error” because it is the variability that is not explained by the factor.</p>
The F-Statistic: Variation Between Sample Means / Variation Within the Samples
<p>The F-statistic is the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/basics/what-is-a-test-statistic/" target="_blank">test statistic</a> for F-tests. In general, an F-statistic is a ratio of two quantities that are expected to be roughly equal under the null hypothesis, which produces an F-statistic of approximately 1.</p>
<p>The F-statistic incorporates both measures of variability discussed above. Let's take a look at how these measures can work together to produce low and high F-values. Look at the graphs below and compare the width of the spread of the group means to the width of the spread within each group.</p>
<img alt="Graph that shows sample data that produce a low F-value" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/a8faab4bb32bf1a1f5864d34d96e8d56/low_f_dplot.png" style="width: 350px; height: 233px;" />
<img alt="Graph that shows sample data that produce a high F-value" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/054b86eb1e48803baba2cff9c78028ab/high_f_dplot.png" style="width: 350px; height: 233px;" />
<p>The low F-value graph shows a case where the group means are close together (low variability) relative to the variability within each group. The high F-value graph shows a case where the variability of group means is large relative to the within group variability. In order to reject the null hypothesis that the group means are equal, we need a high F-value.</p>
<p>For our plastic strength example, we'll use the Factor Adj MS for the numerator (14.540) and the Error Adj MS for the denominator (4.402), which gives us an F-value of 3.30.</p>
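<p>Reproducing that ratio and its tail probability in Python, using the mean squares from the output above:</p>

```python
# F = variation between sample means / variation within the samples,
# then the probability of an F at least this large under the null hypothesis.
from scipy import stats

ms_factor = 14.540   # Adj MS for Factor (between-group variation)
ms_error = 4.402     # Adj MS for Error (within-group variation)
df_num, df_den = 3, 36

f_value = ms_factor / ms_error
p_value = stats.f.sf(f_value, df_num, df_den)  # upper-tail area of F(3, 36)

print(round(f_value, 2), p_value)  # F = 3.30; p is the shaded ~3.1% area
```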
<p>Is our F-value high enough? A single F-value is hard to interpret on its own. We need to place our F-value into a larger context before we can interpret it. To do that, we’ll use the F-distribution to calculate probabilities.</p>
F-distributions and Hypothesis Testing
<p>For one-way ANOVA, the ratio of the between-group variability to the within-group variability follows an <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/probability-distributions-and-random-data/distributions/f-distribution/" target="_blank">F-distribution</a> when the null hypothesis is true.</p>
<p>When you perform a one-way ANOVA for a single study, you obtain a single F-value. However, if we drew multiple random samples of the same size from the same population and performed the same one-way ANOVA, we would obtain many F-values and we could plot a distribution of all of them. This type of distribution is known as a <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/introductory-concepts/basic-concepts/sampling-distribution/" target="_blank">sampling distribution</a>.</p>
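<p>We can sketch that sampling distribution with a small simulation (mine, not the post's): draw many sets of four same-sized samples from a single population, so the null hypothesis is true, run a one-way ANOVA on each, and collect the F-values:</p>

```python
# Simulating the F sampling distribution under the null hypothesis.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_sims, groups, n_per_group = 3000, 4, 10

f_values = []
for _ in range(n_sims):
    # All four groups come from the SAME population, so the null is true.
    samples = [rng.normal(0, 1, n_per_group) for _ in range(groups)]
    f_values.append(stats.f_oneway(*samples).statistic)

# The fraction of simulated F-values beyond the observed 3.30 should match
# the theoretical tail area of the F(3, 36) distribution.
empirical_tail = np.mean(np.array(f_values) > 3.30)
theoretical_tail = stats.f.sf(3.30, 3, 36)
print(empirical_tail, theoretical_tail)
```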
<p>Because the F-distribution assumes that the null hypothesis is true, we can place the F-value from our study in the F-distribution to determine how consistent our results are with the null hypothesis and to calculate probabilities.</p>
<p>The probability that we want to calculate is the probability of observing an F-statistic that is at least as high as the value that our study obtained. That probability allows us to determine how common or rare our F-value is under the assumption that the null hypothesis is true. If the probability is low enough, we can conclude that our data is inconsistent with the null hypothesis. The evidence in the sample data is strong enough to reject the null hypothesis for the entire population.</p>
<p>This probability that we’re calculating is also known as the p-value!</p>
<p>To plot the F-distribution for our plastic strength example, I’ll use Minitab’s <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/graphs/graphs-of-distributions/probability-distribution-plots/probability-distribution-plot/" target="_blank">probability distribution plots</a>. In order to graph the F-distribution that is appropriate for our specific design and sample size, we'll need to specify the correct number of DF. Looking at our one-way ANOVA output, we can see that we have 3 DF for the numerator and 36 DF for the denominator.</p>
<p><img alt="Probability distribution plot for an F-distribution with a probability" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/6303a2314437d8fcf2f72d9a56b1293a/f_distribution_probability.png" style="width: 576px; height: 384px;" /></p>
<p>The graph displays the distribution of F-values that we'd obtain if the null hypothesis is true and we repeat our study many times. The shaded area represents the probability of observing an F-value that is at least as large as the F-value our study obtained. F-values fall within this shaded region about 3.1% of the time when the null hypothesis is true. This probability is low enough to reject the null hypothesis using the common <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests:-significance-levels-alpha-and-p-values-in-statistics" target="_blank">significance level</a> of 0.05. We can conclude that not all the group means are equal.</p>
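If you'd like to sanity-check that shaded-area probability outside Minitab, here's a rough Python sketch that numerically integrates the F-distribution's density using only the standard library (a statistics library would normally do this in a single call; the integration bounds and step count here are illustrative choices):

```python
import math

def f_pdf(x, d1, d2):
    # probability density of the F-distribution with (d1, d2) degrees of freedom
    num = math.sqrt((d1 * x) ** d1 * d2 ** d2 / (d1 * x + d2) ** (d1 + d2))
    beta = math.gamma(d1 / 2) * math.gamma(d2 / 2) / math.gamma((d1 + d2) / 2)
    return num / (x * beta)

def upper_tail(f_value, d1, d2, upper=100.0, steps=20000):
    # P(F >= f_value) under the null: trapezoid-rule integration of the density
    h = (upper - f_value) / steps
    total = 0.5 * (f_pdf(f_value, d1, d2) + f_pdf(upper, d1, d2))
    for i in range(1, steps):
        total += f_pdf(f_value + i * h, d1, d2)
    return total * h

f_value = 14.540 / 4.402              # Factor Adj MS / Error Adj MS, about 3.30
p_value = upper_tail(f_value, 3, 36)  # 3 numerator DF, 36 denominator DF
print(round(p_value, 3))              # about 0.031 -- the shaded area above
```

The approximation reproduces the roughly 3.1% upper-tail area shown in the plot, which is below the 0.05 significance level.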
<p><a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values" target="_blank">Learn how to correctly interpret the p-value.</a></p>
Assessing Means by Analyzing Variation
<p>ANOVA uses the F-test to determine whether the variability between group means is larger than the variability of the observations within the groups. If that ratio is sufficiently large, you can conclude that not all the means are equal.</p>
<p>This brings us back to why we analyze variation to make judgments about means. Think about the question: "Are the group means different?" You are implicitly asking about the variability of the means. After all, if the group means <em>don't</em> vary, or don't vary by more than random chance allows, then you can't say the means are different. And that's why you use analysis of variance to test the means.</p>
ANOVAData AnalysisHypothesis TestingLearningStatistics HelpWed, 18 May 2016 12:00:00 +0000http://blog.minitab.com/blog/adventures-in-statistics/understanding-analysis-of-variance-anova-and-the-f-testJim FrostAn Overview of Discriminant Analysis
http://blog.minitab.com/blog/starting-out-with-statistical-software/an-overview-of-discriminant-analysis
<p>Among the most underutilized statistical tools in Minitab, and I think in general, are multivariate tools. Minitab offers a number of different multivariate tools, including principal component analysis, factor analysis, <span><a href="http://blog.minitab.com/blog/quality-data-analysis-and-statistics/cluster-analysis-tips-part-2">clustering</a></span>, and more. In this post, my goal is to give you a better understanding of the multivariate tool called discriminant analysis, and how it can be used.</p>
<p>Discriminant analysis is used to classify observations into two or more groups when you have a sample with known groups. Essentially, it's a way to handle a classification problem, where two or more groups, clusters, or populations are known up front, and one or more new observations are placed into one of these known classifications based on their measured characteristics. Discriminant analysis can also be used to investigate how variables contribute to group separation.</p>
<p>An area where this is especially useful is species classification. We'll use that as an example to explore how this all works. If you want to follow along and you don't already have Minitab, you can get it <a href="http://www.minitab.com/products/minitab/free-trial/">free for 30 days</a>. </p>
Discriminant Analysis in Action
<img alt="Arctic wolf" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/43484b551c0cc2eacb1b848678d666be/wolf.jpg" style="line-height: 20.8px; margin: 10px 15px; float: right; width: 241px; height: 300px;" />
<div>
<p>I have a <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9429cbd678e906f6bbbda0793aa859f6/discrimdata.mtw">data set</a> with variables containing data on both Rocky Mountain and Arctic wolves. We already know which species each observation belongs to; the main goal of this analysis is to find out how the data we have contribute to the groupings, and then to use this information to help us classify new individuals. </p>
<p>In Minitab, we set up our worksheet to be column-based like usual. We have a column denoting the species of wolf, as well as 9 other columns containing measurements for each individual on a number of different features.</p>
<p>Once we have our continuous predictors and a group identifier column in our worksheet, we can go to <strong>Stat > Multivariate > Discriminant Analysis</strong>. Here's how we'd fill out the dialog:</p>
<p style="margin-left: 40px;"><img alt="dialog" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/732ead34-1005-4470-b034-d7f8b87fabcf/Image/bbfff731ce2f30923c064a73324dba1e/discrimdia.png" style="width: 448px; height: 336px;" /></p>
<p>'Groups' is where you enter the column that identifies which group each observation falls into. In this case, "Location" is the species ID column. Our predictors, in my case X1-X9, represent the measurements of the individual wolves for each of 9 characteristics; we'll use these to determine which characteristics drive the groupings.</p>
<p>One note before we click OK: we're using a linear discriminant function for simplicity. This assumes that the covariance matrices are equal for all groups, something we can verify using Bartlett's test (also available in Minitab). Once we have our dialog filled out, we can click OK and see our results.</p>
Using the Linear Discriminant Function to Classify New Observations
<p>One of the most important parts of the output we get is called the Linear Discriminant Function. In our example, it looks like this:</p>
<p style="margin-left: 40px;"><img alt="function" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/732ead34-1005-4470-b034-d7f8b87fabcf/Image/a3f3b5199c25010c69d3b19843c31b0e/function.PNG" style="width: 303px; height: 208px;" /></p>
<p>This is the function we will use to classify new observations into groups. Using these coefficients, we can determine which group provides the best fit for a new individual's measurements. Minitab can do this for us in the "Options" subdialog. For example, let's say we had an observation with a certain vector of measurements (X1,...,X9). If we ask Minitab to predict its group membership, we get output like this:</p>
<p style="margin-left: 40px;"><img alt="pred" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/732ead34-1005-4470-b034-d7f8b87fabcf/Image/49873dcbc94d8aa1ae75a45474aaf147/predic.PNG" style="width: 421px; height: 119px;" /></p>
<p>This gives us the probability that a particular new observation falls into either of our groups. In our case, it was an easy one: the probability that it belongs to the AR species was 1. We're reasonably sure, based on the data, that this is the case. In some cases, you may get probabilities much closer to each other, meaning the classification isn't as clear cut.</p>
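To make the mechanics concrete, here's a small Python sketch of how classification with a linear discriminant function works. The constants and coefficients below are made up for illustration (Minitab's output reports the real ones); with equal covariance matrices and equal prior probabilities, the posterior probabilities are proportional to the exponentiated discriminant scores:

```python
import math

# Hypothetical linear discriminant functions -- these constants and
# coefficients are made up for illustration, not taken from the output above.
functions = {
    "AR": {"const": -20.0, "coefs": [1.0, 0.5, 0.2]},
    "RM": {"const": -18.0, "coefs": [0.6, 0.8, 0.1]},
}

def classify(x):
    # score for each group: constant + sum of coefficient * measurement
    scores = {g: f["const"] + sum(c * xi for c, xi in zip(f["coefs"], x))
              for g, f in functions.items()}
    # with equal covariance matrices and equal priors, posterior
    # probabilities are proportional to the exponentiated scores
    m = max(scores.values())
    weights = {g: math.exp(s - m) for g, s in scores.items()}
    total = sum(weights.values())
    probs = {g: w / total for g, w in weights.items()}
    return max(scores, key=scores.get), probs

group, probs = classify([30.0, 4.0, 5.0])  # a new individual's measurements
print(group)  # AR, with probability near 1 for this made-up individual
```

The new observation is assigned to the group with the highest score, and the probabilities tell you how clear-cut that assignment is.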
<p>I hope this gives you some idea of the usefulness of discriminant analysis, and how you can use it in Minitab to make decisions.</p>
</div>
Data AnalysisHypothesis TestingStatisticsMon, 16 May 2016 12:00:00 +0000http://blog.minitab.com/blog/starting-out-with-statistical-software/an-overview-of-discriminant-analysisEric HeckmanTests of 2 Standard Deviations? Side Effects May Include Paradoxical Dissociations
http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/tests-of-2-standard-deviations-side-effects-may-include-paradoxical-dissociations
<p>Once upon a time, when people wanted to compare the standard deviations of two samples, they had two handy tests available, the F-test and Levene's test.</p>
<p>Statistical lore has it that the F-test is so named because <a href="##footnote">it so frequently fails you.1</a> Although the F-test is suitable for data that are normally distributed, its sensitivity to departures from <span><a href="http://blog.minitab.com/blog/the-statistical-mentor/anderson-darling-ryan-joiner-or-kolmogorov-smirnov-which-normality-test-is-the-best">normality</a></span> limits when and where it can be used.</p>
<p><a name="#back"></a>Levene’s test was developed as an antidote to the F-test's extreme sensitivity to nonnormality. However, Levene's test is sometimes accompanied by a troubling side effect: paradoxical dissociations. To see what I mean, take a look at these results from an actual test of 2 standard deviations that I actually ran in Minitab 16 using actual data that I actually made up:</p>
<p style="margin-left: 40px;"><img alt="Ratio of the standard deviations in Release 16" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/313db9f57725eeb074002df423c4415e/16_ratio.jpg" style="width: 286px; height: 99px;" /></p>
<p>Nothing surprising so far. The ratio of the standard deviations from samples 1 and 2 (s1/s2) is 1.414 / 1.575 = 0.898. This ratio is our best "point estimate" for the ratio of the standard deviations from populations 1 and 2 (Ps1/Ps2).</p>
<p>Note that the ratio is less than 1, which suggests that Ps2 is greater than Ps1. </p>
<p>Now, let's have a look at the confidence interval (CI) for the population ratio. The CI gives us a range of likely values for the ratio Ps1/Ps2. The CI below labeled "Continuous" is the one calculated using Levene's method:</p>
<p style="margin-left: 40px;"><img alt="Confidence interval for the ratio in Release 16" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/aee886880d52d5aed7150abd242b5d61/16_ci.jpg" style="width: 338px; height: 114px;" /></p>
<p>What in Gauss' name is going on here?!? The range of likely values for Ps1/Ps2—1.046 to 1.566—doesn't include the point estimate of 0.898?!? In fact, the CI suggests that Ps1/Ps2 is <em>greater</em> than 1. Which suggests that Ps1 is actually <em>greater</em> than Ps2.</p>
<p>But the point estimate suggests the exact opposite! Which suggests that something odd is going on here. Or that I might be losing my mind (which wouldn't be that odd). Or both.</p>
<p>As it turns out, the very elements that make Levene's test robust to departures from normality also leave the test susceptible to paradoxical dissociations like this one. You see, Levene's test isn't <em>actually</em> based on the standard deviation. Instead, the test is based on a statistic called the <em>mean absolute deviation from the median</em>, or MADM. The MADM is much less affected by nonnormality and outliers than is the standard deviation. And even though the MADM and the standard deviation of a sample can be very different, the <em>ratio</em> MADM1/MADM2 is nevertheless a good approximation for the <em>ratio</em> Ps1/Ps2.</p>
<p>However, in extreme cases, outliers can affect the sample standard deviations so much that s1/s2 can fall completely outside of Levene's CI. And that's when you're left with an awkward and confusing case of paradoxical dissociation.</p>
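A toy Python sketch (with made-up data) shows the mechanism: an outlier drags the sample standard deviations and the MADM by different amounts, so the s1/s2 point estimate and the MADM-based ratio underlying Levene's CI need not agree:

```python
import statistics as st

def madm(data):
    # mean absolute deviation from the median: the statistic behind Levene's test
    med = st.median(data)
    return sum(abs(x - med) for x in data) / len(data)

sample1 = [4.1, 5.0, 5.9, 4.8, 5.2, 4.6]
sample2 = [4.3, 5.1, 5.7, 4.9, 5.3, 19.0]   # one extreme outlier

sd_ratio = st.stdev(sample1) / st.stdev(sample2)   # s1/s2, dragged down by the outlier
madm_ratio = madm(sample1) / madm(sample2)         # the basis of Levene's CI

print(round(sd_ratio, 3), round(madm_ratio, 3))    # the two ratios disagree
```

With data this extreme, a CI built around one ratio can easily exclude a point estimate built from the other.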
<p>Fortunately (and this may be the first and last time that you'll ever hear this next phrase), our statisticians have made things a lot less awkward. One of the brave folks in Minitab's R&D department toiled against all odds, and at considerable personal peril, to solve this enigma. The result, which has been incorporated into Minitab 17, is an effective, elegant, and non-enigmatic test that we call Bonett's test.</p>
<p style="margin-left: 40px;"><span style="line-height: 1.6;"><img alt="Confidence interval in Release 17" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/3c014cdea970a3f1f6a540119ef3b533/bonnet_results.jpg" style="width: 310px; height: 170px;" /></span></p>
<p>Like Levene's test, Bonett's test can be used with nonnormal data. But <em>unlike </em>Levene's test, Bonett's test is actually based on the actual standard deviations of the actual samples. Which means that Bonett's test is not subject to the same awkward and confusing paradoxical dissociations that can accompany Levene's test. And I don't know about you, but I try to avoid paradoxical dissociations whenever I can. (Especially as I get older, ... I just don't bounce back the way I used to.) </p>
<p>When you compare two standard deviations in Minitab 17, you get a handy graphical report that quickly and clearly summarizes the results of your test, including the point estimate and the CI from Bonett's test. Which means no more awkward and confusing paradoxical dissociations.</p>
<p style="margin-left: 40px;"><img alt="Summary plot in Release 17" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/b785749b3292df1aa6d32abe4e430b63/17_summary_plot.jpg" style="width: 578px; height: 386px;" /></p>
<p><span style="line-height: 1.6;">------------------------------------------------------------</span></p>
<p><a name="#footnote"> </a></p>
<p>1 So, that bit about the name of the F-test—I kind of made that up. Fortunately, there is a better source of information for the genuinely curious. Our white paper, <a href="http://support.minitab.com/en-us/minitab/17/bonetts_method_two_variances.pdf">Bonett's Method</a>, includes all kinds of details about these tests and comparisons between the CIs calculated with each. Enjoy.</p>
<p><em><a href="##back">return to text of post</a></em></p>
Hypothesis TestingStatisticsStatsWed, 11 May 2016 12:00:00 +0000http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/tests-of-2-standard-deviations-side-effects-may-include-paradoxical-dissociationsGreg FoxUnderstanding t-Tests: 1-sample, 2-sample, and Paired t-Tests
http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests%3A-1-sample%2C-2-sample%2C-and-paired-t-tests
<p>In statistics, t-tests are a type of hypothesis test that allows you to compare means. They are called t-tests because each t-test boils your sample data down to one number, the t-value. If you understand how t-tests calculate t-values, you’re well on your way to understanding how these tests work.</p>
<p>In this series of posts, I'm focusing on concepts rather than equations to show how t-tests work. However, this post includes two simple equations that I’ll work through using the analogy of a signal-to-noise ratio.</p>
<p><a href="http://www.minitab.com/products/minitab/" target="_blank">Minitab statistical software</a> offers the 1-sample t-test, paired t-test, and the 2-sample t-test. Let's look at how each of these t-tests reduce your sample data down to the t-value.</p>
How 1-Sample t-Tests Calculate t-Values
<p>Understanding this process is crucial to understanding how t-tests work. I'll show you the formula first, and then I’ll explain how it works.</p>
<p style="margin-left: 40px;"><img alt="formula to calculate t for a 1-sample t-test" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/dbbda42fec926eef96a56c22ed462458/formula_1t.png" style="width: 142px; height: 88px;" /></p>
<p>Please notice that the formula is a ratio. A common analogy is that the t-value is the signal-to-noise ratio.</p>
<strong>Signal (a.k.a. the effect size)</strong>
<p>The numerator is the signal. You simply take the sample mean and subtract the null hypothesis value. If your sample mean is 10 and the null hypothesis is 6, the difference, or signal, is 4.</p>
<p>If there is no difference between the sample mean and null value, the signal in the numerator, as well as the value of the entire ratio, equals zero. For instance, if your sample mean is 6 and the null value is 6, the difference is zero.</p>
<p>As the difference between the sample mean and the null hypothesis mean increases in either the positive or negative direction, the strength of the signal increases.</p>
<div style="float: right; width: 325px; margin: 15px 0px 15px 15px;"><img alt="Photo of a packed stadium to illustrate high background noise" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/695f063e8d38c2bc9c5fa61637ef6327/crowd.jpg" style="width: 325px; height: 244px; margin-bottom:5px;" /><br />
<em>Lots of noise can overwhelm the signal.</em></div>
<strong>Noise</strong>
<p>The denominator is the noise. The equation in the denominator is a measure of variability known as the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/tests-of-means/what-is-the-standard-error-of-the-mean/" target="_blank">standard error of the mean</a>. This statistic indicates how accurately your sample estimates the mean of the population. A larger number indicates that your sample estimate is less precise because it has more random error.</p>
<p>This random error is the “noise.” When there is more noise, you expect to see larger differences between the sample mean and the null hypothesis value <em>even when the null hypothesis is true</em>. We include the noise factor in the denominator because we must determine whether the signal is large enough to stand out from it.</p>
<strong>Signal-to-Noise ratio</strong>
<p>Both the signal and noise values are in the units of your data. If your signal is 6 and the noise is 2, your t-value is 3. This t-value indicates that the difference is 3 times the size of the standard error. However, if the difference is the same size (6) but your data have more variability, so that the noise is also 6, your t-value is only 1. The signal is at the same scale as the noise.</p>
<p>In this manner, t-values allow you to see how distinguishable your signal is from the noise. Relatively large signals and low levels of noise produce larger t-values. If the signal does not stand out from the noise, it’s likely that the observed difference between the sample estimate and the null hypothesis value is due to random error in the sample rather than a true difference at the population level.</p>
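The 1-sample formula above is simple enough to sketch in Python with only the standard library (the sample data and null value below are made up for illustration):

```python
import math
import statistics as st

def one_sample_t(data, null_mean):
    # signal: the difference between the sample mean and the null value
    signal = st.mean(data) - null_mean
    # noise: the standard error of the mean, s / sqrt(n)
    noise = st.stdev(data) / math.sqrt(len(data))
    return signal / noise

# sample mean is 10 and the null value is 6, so the signal is 4
t = one_sample_t([8, 10, 9, 11, 12], null_mean=6)
print(round(t, 2))  # 5.66
```

Here the signal of 4 is about 5.7 times the size of the standard error, so it stands well clear of the noise.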
A Paired t-test Is Just A 1-Sample t-Test
<p>Many people are confused about when to use a paired t-test and how it works. I’ll let you in on a little secret. The paired t-test and the 1-sample t-test are actually the same test in disguise! As we saw above, a 1-sample t-test compares one sample mean to a null hypothesis value. A paired t-test simply calculates the difference between paired observations (e.g., before and after) and then performs a 1-sample t-test on the differences.</p>
<p>You can test this with <a href="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/File/946c3f4725847e714e7fcc9664ae67b2/paired_t_test.mtw">this data set</a> to see how all of the results are identical, including the mean difference, t-value, p-value, and confidence interval of the difference.</p>
<p style="margin-left: 40px;"><img alt="Minitab worksheet with paired t-test example" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/02fbcdbbf62fec3823123fbcc818b11f/paired_t_worksheet.png" style="width: 229px; height: 223px;" /><img alt="paired t-test output" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/170d6d4fa1fbbb1bf4f5aa56b1783b5f/paired_t_swo.png" style="width: 518px; height: 196px;" /></p>
<p style="margin-left: 40px;"><img alt="1-sample t-test output" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/08d652fb45599fc1ac247181a935c471/1t_difc_swo.png" style="width: 504px; height: 115px;" /></p>
<p>Understanding that the paired t-test simply performs a 1-sample t-test on the paired differences can really help you understand how the paired t-test works and when to use it. You just need to figure out whether it makes sense to calculate the difference between each pair of observations.</p>
<p>For example, let’s assume that “before” and “after” represent test scores, and there was an intervention in between them. If the before and after scores in each row of the example worksheet represent the same subject, it makes sense to calculate the difference between the scores in this fashion—the paired t-test is appropriate. However, if the scores in each row are for different subjects, it doesn’t make sense to calculate the difference. In this case, you’d need to use another test, such as the 2-sample t-test, which I discuss below.</p>
<p>Using the paired t-test simply saves you the step of having to calculate the differences before performing the t-test. You just need to be sure that the paired differences make sense!</p>
<p>When it is appropriate to use a paired t-test, it can be more powerful than a 2-sample t-test. For more information, go to <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/tests-of-means/why-use-paired-t/" target="_blank">Why should I use a paired t-test?</a></p>
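The "secret" above can be sketched in a few lines of Python (with made-up before/after scores): compute the row-wise differences, then run the same 1-sample calculation on them against a null value of zero:

```python
import math
import statistics as st

def one_sample_t(data, null_mean=0):
    signal = st.mean(data) - null_mean
    noise = st.stdev(data) / math.sqrt(len(data))
    return signal / noise

before = [10, 12, 9, 11]   # made-up paired scores
after = [12, 15, 10, 13]

# the paired t-test is just a 1-sample t-test on the paired differences
diffs = [a - b for a, b in zip(after, before)]   # [2, 3, 1, 2]
t = one_sample_t(diffs)    # null hypothesis: the mean difference is zero
print(round(t, 3))  # 4.899
```

Running a paired t-test on the before/after columns and a 1-sample t-test on the differences gives the same t-value.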
How 2-Sample t-Tests Calculate t-Values
<p>The 2-sample t-test takes your sample data from two groups and boils it down to the t-value. The process is very similar to the 1-sample t-test, and you can still use the analogy of the signal-to-noise ratio. Unlike the paired t-test, the 2-sample t-test requires independent groups for each sample.</p>
<p>Here is the formula, followed by some discussion.</p>
<p style="margin-left: 40px;"><img alt="formula to calculate t for a 2-sample t-test" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/276994cf179b4997ce6097d1f4462363/formula_2t.png" style="width: 102px; height: 54px;" /></p>
<p>For the 2-sample t-test, the numerator is again the signal, which is the difference between the means of the two samples. For example, if the mean of group 1 is 10, and the mean of group 2 is 4, the difference is 6.</p>
<p>The default null hypothesis for a 2-sample t-test is that the two groups are equal. You can see in the equation that when the two groups are equal, the difference (and the entire ratio) also equals zero. As the difference between the two groups grows in either a positive or negative direction, the signal becomes stronger.</p>
<p>In a 2-sample t-test, the denominator is still the noise, but Minitab can use two different values. You can either assume that the variability in both groups is equal or not equal, and Minitab uses the corresponding estimate of the variability. Either way, the principle remains the same: you are comparing your signal to the noise to see how much the signal stands out.</p>
<p>Just like with the 1-sample t-test, for any given difference in the numerator, as you increase the noise value in the denominator, the t-value becomes smaller. To determine that the groups are different, you need a t-value that is large.</p>
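The 2-sample version can be sketched the same way; this Python snippet uses the unequal-variances form of the standard error and made-up group data:

```python
import math
import statistics as st

def two_sample_t(group1, group2):
    # signal: the difference between the two group means
    signal = st.mean(group1) - st.mean(group2)
    # noise: combined standard error, without assuming equal variances
    noise = math.sqrt(st.variance(group1) / len(group1)
                      + st.variance(group2) / len(group2))
    return signal / noise

# group means are 11 and 5, so the signal is 6
t = two_sample_t([10, 12, 9, 11, 13], [4, 6, 3, 5, 7])
print(round(t, 1))  # 6.0
```

With these toy groups the combined standard error happens to be 1, so the t-value equals the raw difference in means.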
What Do t-Values Mean?
<p>Each type of t-test uses a procedure to boil all of your sample data down to one value, the t-value. The calculations compare your sample mean(s) to the null hypothesis and incorporate both the sample size and the variability in the data. A t-value of 0 indicates that the sample results exactly equal the null hypothesis. In statistics, we call the difference between the sample estimate and the null hypothesis the effect size. As this difference increases, the absolute value of the t-value increases.</p>
<p>That’s all nice, but what does a t-value of, say, 2 really mean? From the discussion above, we know that a t-value of 2 indicates that the observed difference is twice the size of the variability in your data. However, we use t-tests to evaluate hypotheses rather than just figuring out the signal-to-noise ratio. We want to determine whether the effect size is statistically significant.</p>
<p>To see how we get from t-values to assessing hypotheses and determining statistical significance, read the other post in this series, <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests-t-values-and-t-distributions">Understanding t-Tests: t-values and t-distributions</a>.</p>
Data AnalysisHypothesis TestingLearningStatistics HelpWed, 04 May 2016 12:00:00 +0000http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests%3A-1-sample%2C-2-sample%2C-and-paired-t-testsJim FrostUnderstanding t-Tests: t-values and t-distributions
http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests-t-values-and-t-distributions
<p>T-tests are handy <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics" target="_blank">hypothesis tests</a> in statistics when you want to compare means. You can compare a sample mean to a hypothesized or target value using a one-sample t-test. You can compare the means of two groups with a two-sample t-test. If you have two groups with paired observations (e.g., before and after measurements), use the paired t-test.</p>
<img alt="Output that shows a t-value" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/efd51d69e3947d70197143b735e0c51d/t_value_swo.png" style="line-height: 20.8px; float: right; width: 400px; height: 57px; margin: 10px 15px; border-width: 1px; border-style: solid;" />
<p>How do t-tests work? How do t-values fit in? In this series of posts, I’ll answer these questions by focusing on concepts and graphs rather than equations and numbers. After all, a key reason to use <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">statistical software like Minitab</a> is so you don’t get bogged down in the calculations and can instead focus on understanding your results.</p>
<p>In this post, I will explain t-values, t-distributions, and how t-tests use them to calculate probabilities and assess hypotheses.</p>
What Are t-Values?
<p>T-tests are called t-tests because the test results are all based on t-values. T-values are an example of what statisticians call test statistics. A test statistic is a standardized value that is calculated from sample data during a hypothesis test. The procedure that calculates the test statistic compares your data to what is expected under the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/basics/null-and-alternative-hypotheses/" target="_blank">null hypothesis</a>.</p>
<p>Each type of t-test uses a specific procedure to boil all of your sample data down to one value, the t-value. The calculations behind t-values compare your sample mean(s) to the null hypothesis and incorporate both the sample size and the variability in the data. A t-value of 0 indicates that the sample results exactly equal the null hypothesis. As the difference between the sample data and the null hypothesis increases, the absolute value of the t-value increases.</p>
<p>Assume that we perform a t-test and it calculates a t-value of 2 for our sample data. What does that even mean? I might as well have told you that our data equal 2 fizbins! We don’t know if that’s common or rare when the null hypothesis is true.</p>
<p>By itself, a t-value of 2 doesn’t really tell us anything. T-values are not in the units of the original data, or anything else we’d be familiar with. We need a larger context in which we can place individual t-values before we can interpret them. This is where t-distributions come in.</p>
What Are t-Distributions?
<p>When you perform a t-test for a single study, you obtain a single t-value. However, if we drew multiple random samples of the same size from the same population and performed the same t-test, we would obtain many t-values and we could plot a distribution of all of them. This type of distribution is known as a <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/introductory-concepts/basic-concepts/sampling-distribution/" target="_blank">sampling distribution</a>.</p>
<p>Fortunately, the properties of t-distributions are well understood in statistics, so we can plot them without having to collect many samples! A specific t-distribution is defined by its <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/introductory-concepts/basic-concepts/df/" target="_blank">degrees of freedom (DF)</a>, a value closely related to sample size. Therefore, different t-distributions exist for every sample size. You can graph t-distributions using Minitab’s <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/graphs/graphs-of-distributions/probability-distribution-plots/probability-distribution-plot/" target="_blank">probability distribution plots</a>.</p>
<p>T-distributions assume that you draw repeated random samples from a population where the null hypothesis is true. You place the t-value from your study in the t-distribution to determine how consistent your results are with the null hypothesis.</p>
<p style="margin-left: 40px;"><img alt="Plot of t-distribution" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/d628e56f0380e0edcf575502a670ed31/t_dist_20_df.png" style="width: 576px; height: 384px;" /></p>
<p>The graph above shows a t-distribution that has 20 degrees of freedom, which corresponds to a sample size of 21 in a one-sample t-test. It is a symmetric, bell-shaped distribution that is similar to the normal distribution, but with thicker tails. This graph plots the probability density function (PDF), which describes the likelihood of each t-value.</p>
<p>The peak of the graph is right at zero, which indicates that obtaining a sample value close to the null hypothesis is the most likely outcome. That makes sense because t-distributions assume that the null hypothesis is true. T-values become less likely as you get further away from zero in either direction. In other words, when the null hypothesis is true, you are less likely to obtain a sample that is very different from the null hypothesis.</p>
<p>Our t-value of 2 indicates a positive difference between our sample data and the null hypothesis. The graph shows that there is a reasonable probability of obtaining a t-value from -2 to +2 when the null hypothesis is true. Our t-value of 2 is an unusual value, but we don’t know exactly <em>how </em>unusual. Our ultimate goal is to determine whether our t-value is unusual enough to warrant rejecting the null hypothesis. To do that, we'll need to calculate the probability.</p>
Using t-Values and t-Distributions to Calculate Probabilities
<p>The foundation behind any hypothesis test is being able to take the test statistic from a specific sample and place it within the context of a known probability distribution. For t-tests, if you take a t-value and place it in the context of the correct t-distribution, you can calculate the probabilities associated with that t-value.</p>
<p>A probability allows us to determine how common or rare our t-value is under the assumption that the null hypothesis is true. If the probability is low enough, we can conclude that the effect observed in our sample is inconsistent with the null hypothesis. The evidence in the sample data is strong enough to reject the null hypothesis for the entire population.</p>
<p>Before we calculate the probability associated with our t-value of 2, there are two important details to address.</p>
<p>First, we’ll actually use the t-values of +2 and -2 because we’ll perform a two-tailed test. A two-tailed test is one that can test for differences in both directions. For example, a two-tailed 2-sample t-test can determine whether the difference between group 1 and group 2 is statistically significant in either the positive or negative direction. A one-tailed test can only assess one of those directions.</p>
<p>Second, we can only calculate a non-zero probability for a range of t-values. As you’ll see in the graph below, a range of t-values corresponds to a proportion of the total area under the distribution curve, which is the probability. The probability for any specific point value is zero because a single point has no width and therefore produces no area under the curve.</p>
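<p>The point-versus-range distinction is easy to check numerically. Here is a minimal sketch using Python's SciPy library (my own tooling assumption; the post itself uses Minitab): the probability over a narrow interval of t-values is positive, while a zero-width "interval" contributes no area.</p>

```python
from scipy import stats

df = 20

# Probability comes from the area under the PDF over a *range* of t-values.
p_range = stats.t.cdf(2.1, df) - stats.t.cdf(1.9, df)  # narrow interval around 2
p_point = stats.t.cdf(2.0, df) - stats.t.cdf(2.0, df)  # zero-width "interval"

print(p_range, p_point)
```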
<p>With these points in mind, we’ll shade the area of the curve that has t-values greater than 2 and t-values less than -2.</p>
<p><img alt="T-distribution with a shaded area that represents a probability" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/5e124a2c8139681afec706799ebabcec/t_dist_prob.png" style="width: 576px; height: 384px;" /></p>
<p>The graph displays the probability for observing a difference from the null hypothesis that is at least as extreme as the difference present in our sample data while assuming that the null hypothesis is actually true. Each of the shaded regions has a probability of 0.02963, for a total probability of 0.05926. When the null hypothesis is true, the t-value falls within these regions nearly 6% of the time.</p>
<p>This probability has a name that you might have heard of—it’s called the p-value! While the probability of our t-value falling within these regions is fairly low, it’s not low enough to reject the null hypothesis using the common <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-significance-levels-alpha-and-p-values-in-statistics" target="_blank">significance level</a> of 0.05.</p>
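<p>If you'd like to verify the shaded area without Minitab, a quick sketch in Python with SciPy (again, a tooling assumption on my part) reproduces the two-tailed probability:</p>

```python
from scipy import stats

# Two-tailed probability for a t-value of 2 with 20 degrees of freedom:
# the area in the right tail beyond +2, doubled to cover the left tail beyond -2.
t_value = 2
df = 20
p_value = 2 * stats.t.sf(t_value, df)  # sf (survival function) = 1 - cdf

print(round(p_value, 5))  # close to the 0.05926 total shown in the graph
```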
<p><a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values" target="_blank">Learn how to correctly interpret the p-value.</a></p>
t-Distributions and Sample Size
<p>As mentioned above, t-distributions are defined by the DF, which are closely associated with sample size. As the DF increases, the probability density in the tails decreases and the distribution becomes more tightly clustered around the central value. The graph below depicts t-distributions with 5 and 30 degrees of freedom.</p>
<p><img alt="Comparison of t-distributions with different degrees of freedom" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/5220dc6347611a230e89b70de904b034/t_dist_comp_df.png" style="width: 576px; height: 384px;" /></p>
<p>The t-distribution with fewer degrees of freedom has thicker tails. This occurs because the t-distribution is designed to reflect the added uncertainty associated with analyzing small samples. In other words, if you have a small sample, the probability that the sample statistic will be further away from the null hypothesis is greater even when the null hypothesis is true.</p>
<p>Small samples are more likely to be unusual, and this affects the probability associated with any given t-value. For 5 and 30 degrees of freedom, a t-value of 2 in a two-tailed test has p-values of 10.2% and 5.4%, respectively. Large samples are better!</p>
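<p>The effect of degrees of freedom on these probabilities can be checked the same way; this is again a SciPy sketch of my own, not something from the original post:</p>

```python
from scipy import stats

# Two-tailed p-values for the same t-value of 2 under different degrees of freedom;
# the thicker tails at 5 DF make a t-value of 2 noticeably less unusual.
p_values = {df: 2 * stats.t.sf(2, df) for df in (5, 30)}
for df, p in p_values.items():
    print(f"df={df}: two-tailed p = {p:.3f}")
# df=5 gives roughly 0.102 (10.2%); df=30 gives roughly 0.055 (5.4%)
```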
<p>I’ve shown how t-values and t-distributions work together to produce probabilities. To see how each type of t-test works and actually calculates the t-values, read the other post in this series, <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests:-1-sample,-2-sample,-and-paired-t-tests">Understanding t-Tests: 1-sample, 2-sample, and Paired t-Tests</a>.</p>
<p>If you'd like to learn how the ANOVA F-test works, read my post, <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-analysis-of-variance-anova-and-the-f-test">Understanding Analysis of Variance (ANOVA) and the F-test</a>.</p>
Data AnalysisHypothesis TestingLearningStatistics HelpWed, 20 Apr 2016 12:00:00 +0000http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests-t-values-and-t-distributionsJim FrostBest Way to Analyze Likert Item Data: Two Sample T-Test versus Mann-Whitney
http://blog.minitab.com/blog/adventures-in-statistics/best-way-to-analyze-likert-item-data%3A-two-sample-t-test-versus-mann-whitney
<p><img alt="Worksheet that shows Likert data" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/6b1cf78b969699ed58febb026d32051d/likert_worksheet.png" style="float: right; width: 162px; height: 265px; margin: 10px 15px;" />Five-point Likert scales are commonly associated with surveys and are used in a wide variety of settings. You’ve run into the Likert scale if you’ve ever been asked whether you strongly agree, agree, neither agree nor disagree, disagree, or strongly disagree about something. The worksheet to the right shows what five-point Likert data look like when you have two groups.</p>
<p>Because Likert item data are discrete, ordinal, and have a limited range, there’s been a longstanding dispute about the most valid way to analyze Likert data. The basic choice is between <a href="http://blog.minitab.com/blog/adventures-in-statistics/choosing-between-a-nonparametric-test-and-a-parametric-test" target="_blank">a parametric test and a nonparametric test</a>. The pros and cons for each type of test are generally described as the following:</p>
<ul>
<li>Parametric tests, such as the 2-sample t-test, assume a normal, continuous distribution. However, with a sufficient sample size, t-tests are robust to departures from normality.</li>
<li>Nonparametric tests, such as the Mann-Whitney test, do not assume a normal or a continuous distribution. However, there are concerns about a lower ability to detect a difference when one truly exists.</li>
</ul>
<p>What’s the better choice? This is a real-world decision that users of <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">statistical software</a> have to make when they want to analyze Likert data.</p>
<p>Over the years, a number of studies have tried to answer this question. However, they’ve tended to look at a limited number of potential distributions for the Likert data, which causes the generalizability of the results to suffer. Thanks to increases in computing power, simulation studies can now thoroughly assess a wide range of distributions.</p>
<p>In this blog post, I highlight a simulation study conducted by de Winter and Dodou* that compares the capabilities of the two sample t-test and the Mann-Whitney test to analyze five-point Likert items for two groups. Is it better to use one analysis or the other?</p>
<p>The researchers identified a diverse set of 14 distributions that are representative of actual Likert data. The computer program drew independent pairs of samples to test all possible combinations of the 14 distributions. All in all, 10,000 random samples were generated for each of the 98 distribution combinations! The pairs of samples were analyzed using both the two sample t-test and the Mann-Whitney test to compare how well each test performed. The study also assessed different sample sizes.</p>
<p>The results show that for all pairs of distributions the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/basics/type-i-and-type-ii-error/" target="_blank">Type I (false positive) error rates</a> are very close to the target amounts. In other words, if you use either analysis and your results are statistically significant, you don’t need to be overly concerned about a false positive.</p>
<p>The results also show that for most pairs of distributions, the difference between the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/power-and-sample-size/what-is-power/" target="_blank">statistical power</a> of the two tests is trivial. In other words, if a difference truly exists at the population level, either analysis is equally likely to detect it. The concerns about the Mann-Whitney test having less power in this context appear to be unfounded.</p>
<p>I do have one caveat. There are a few pairs of specific distributions where there is a power difference between the two tests. If you perform both tests on the same data and they disagree (one is significant and the other is not), you can look at a table in the article to help you determine whether a difference in statistical power might be an issue. This power difference affects only a small minority of the cases.</p>
<p>Generally speaking, the choice between the two analyses is a tie. If you need to compare two groups of five-point Likert data, it usually doesn’t matter which analysis you use. Both tests almost always provide the same protection against false negatives and always provide the same protection against false positives. These patterns hold true for sample sizes of 10, 30, and 200 per group.</p>
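<p>To try both analyses on Likert-style data yourself, here is a hedged sketch in Python with SciPy (the study and the post use Minitab; the group probabilities below are made up for illustration):</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical five-point Likert responses for two groups (values 1-5);
# group 2 is sampled from a slightly more "agree"-leaning distribution.
group1 = rng.choice([1, 2, 3, 4, 5], size=30, p=[0.10, 0.20, 0.30, 0.25, 0.15])
group2 = rng.choice([1, 2, 3, 4, 5], size=30, p=[0.05, 0.10, 0.25, 0.30, 0.30])

t_stat, t_p = stats.ttest_ind(group1, group2)  # parametric 2-sample t-test
u_stat, u_p = stats.mannwhitneyu(group1, group2, alternative="two-sided")  # nonparametric

print(f"t-test p = {t_p:.3f}, Mann-Whitney p = {u_p:.3f}")
```

<p>Per the simulation results above, the two p-values will usually lead you to the same conclusion.</p>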
<p>*de Winter, J.C.F. and D. Dodou (2010), Five-Point Likert Items: t test versus Mann-Whitney-Wilcoxon, <em>Practical Assessment, Research and Evaluation</em>, 15(11).</p>
Data AnalysisHypothesis TestingStatisticsStatistics HelpWed, 06 Apr 2016 12:00:00 +0000http://blog.minitab.com/blog/adventures-in-statistics/best-way-to-analyze-likert-item-data%3A-two-sample-t-test-versus-mann-whitneyJim FrostThe American Statistical Association's Statement on the Use of P Values
http://blog.minitab.com/blog/adventures-in-statistics/the-american-statistical-associations-statement-on-the-use-of-p-values
<p>P values have been around for nearly a century and they’ve been the subject of criticism since their origins. In recent years, the debate over P values has risen to a fever pitch. In particular, there are serious fears that P values are misused to such an extent that it has actually damaged science.</p>
<p>In March 2016, spurred on by the growing concerns, the American Statistical Association (ASA) did something that it has never done before and took an official position on a statistical practice—how to use P values. The ASA tapped a group of 20 experts who discussed this over the course of many months. Despite facing complex issues and many heated disagreements, this group managed to reach a consensus on specific points and produce the <a href="http://amstat.tandfonline.com/doi/pdf/10.1080/00031305.2016.1154108" target="_blank">ASA Statement on Statistical Significance and P-values</a>.</p>
<p>I’ve written previously about my concerns over how P values have been misused and misinterpreted. My opinion is that P values are powerful tools but they need to be used and interpreted correctly. P value calculations incorporate the effect size, sample size, and variability of the data into a single number that objectively tells you how consistent your data are with the null hypothesis. You can read my case for the power of P values in my <a href="http://blog.minitab.com/blog/adventures-in-statistics/banned-p-values-and-confidence-intervals-a-rebuttal-part-1" target="_blank">rebuttal to a journal that banned them</a>.</p>
<p>The ASA statement contains the following six principles on how to use P values, which are remarkably aligned with my own. Let’s take a look at what they came up with.</p>
<ol>
<li>P-values can indicate how incompatible the data are with a specified statistical model.</li>
<li>P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.</li>
</ol>
<p>I discuss these ideas in my post <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values" target="_blank">How to Correctly Interpret P Values</a>. It turns out that the common misconception stated in principle #2 creates the illusion of substantially more evidence against the null hypothesis than is justified. There are a number of reasons <a href="http://blog.minitab.com/blog/adventures-in-statistics/why-are-p-value-misunderstandings-so-common" target="_blank">why this type of P value misunderstanding is so common</a>. In reality, a P value is a probability about your sample data and not about the truth of a hypothesis.</p>
<ol>
<li value="3">Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.</li>
</ol>
<p>In statistics, we’re working with samples to describe a complex reality. Attempting to discover the truth based on an oversimplified process of comparing a single P value to an arbitrary significance level is destined to have problems. False positives, false negatives, and otherwise fluky results are bound to happen.</p>
<p>Using P values in conjunction with a significance level to decide when to reject the null hypothesis increases your chance of making the correct decision. However, there is no magical threshold that distinguishes between the studies that have a true effect and those that don’t with 100% accuracy. You can see a graphical representation of why this is the case in my post <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics" target="_blank">Why We Need to Use Hypothesis Tests</a>.</p>
<p>When Sir Ronald Fisher introduced P values, he never intended for them to be the deciding factor in such a rigid process. Instead, Fisher considered them to be just one part of a process that incorporates scientific reasoning, experimentation, statistical analysis and replication to lead to scientific conclusions.</p>
<p>According to Fisher, “A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance.”</p>
<p>In other words, don’t expect a <em>single</em> study to provide a definitive answer. No single P value can divine the truth about reality by itself.</p>
<ol>
<li value="4">Proper inference requires full reporting and transparency.</li>
</ol>
<p>If you don’t know the full context of a study, you can’t properly interpret a carefully selected subset of the results. Data dredging, cherry picking, significance chasing, data manipulation, and other forms of p-hacking can make it impossible to draw the proper conclusions from selectively reported findings. You must know the full details about all data collection choices, how many and which analyses were performed, and all P values.</p>
<p><img alt="Comic about jelly beans causing acne with selective reporting of the results" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/22099bc252d3630a4876f579c1b83778/jelly_bean_comic.png" style="line-height: 20.8px; width: 500px; height: 1387px; margin: 10px 15px;" /></p>
<div>In the <a href="http://xkcd.com/882/" target="_blank">XKCD comic</a> about jelly beans, if you didn’t know about the post hoc decision to subdivide the data and the 20 insignificant test results, you’d be pretty convinced that green jelly beans cause acne!</div>
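<p>The jelly-bean problem is easy to simulate. The sketch below (Python with SciPy, my own illustration rather than anything from the comic or the ASA statement) runs 20 tests where the null hypothesis is true in every one:</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# 20 independent tests where the null hypothesis is true in every one
# (no jelly bean color affects acne): each test compares two random
# samples drawn from the *same* population.
false_positives = 0
for _ in range(20):
    a = rng.normal(size=30)
    b = rng.normal(size=30)
    if stats.ttest_ind(a, b).pvalue < 0.05:
        false_positives += 1

# With 20 tests at alpha = 0.05, the chance of at least one false positive
# is 1 - 0.95**20, or about 64%.
print(false_positives)
```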
<ol>
<li value="5">A p-value, or statistical significance, does not measure the size of an effect or the importance of an effect.</li>
<li value="6">By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.</li>
</ol>
<p>I cover these ideas, and more, in my <a href="http://blog.minitab.com/blog/adventures-in-statistics/five-guidelines-for-using-p-values">Five Guidelines for Using P Values</a>. P-values don’t tell you the size or importance of the effect. An effect can be statistically significant but trivial in the real world. This is the difference between <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/introductory-concepts/p-value-and-significance-level/practical-significance/" target="_blank">statistical significance and practical significance</a>. The analyst should supplement P values with other statistics, such as effect sizes and confidence intervals, to convey the importance of the effect.</p>
<p>Researchers need to apply their scientific judgment about the plausibility of the hypotheses, results of similar studies, proposed mechanisms, proper experimental design, and so on. Expert knowledge transforms statistics from numbers into meaningful, trustworthy findings.</p>
Data AnalysisHypothesis TestingLearningStatisticsStatistics in the NewsWed, 23 Mar 2016 12:00:00 +0000http://blog.minitab.com/blog/adventures-in-statistics/the-american-statistical-associations-statement-on-the-use-of-p-valuesJim FrostDo Actors Wait Longer than Actresses for Oscars? A Comparison Between Academy Award Winners
http://blog.minitab.com/blog/statistics-and-more/do-actors-wait-longer-than-actresses-for-oscars-a-comparison-between-academy-award-winners
<p>I am a bit of an Oscar fanatic. Every year after the ceremony, I religiously go online to find out who won the awards and listen to their acceptance speeches. This year, I was <em>so </em>chuffed to learn that Leonardo DiCaprio won his first Oscar for his performance in <em>The Revenant</em> at the 88th Academy Awards—after five nominations in previous ceremonies. As a longtime DiCaprio fan, I still remember going to the cinema when <em>Titanic </em>was released, and returning four more times. Every time, I could not hold back my tears and used up all the tissues I'd brought with me!<img alt="this year's winner..." src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a51cc79cd412237ef2f241d69e7e83ec/dicaprio.png" style="margin: 10px 15px; float: right; width: 190px; height: 250px;" /></p>
<p>Compared to his <em>Titanic </em>costar Kate Winslet, who won the Best Actress award in 2009 (aged 33), Leonardo waited 7 more years (20 years since his first nomination) before his turn came. I can name several actresses—Gwyneth Paltrow, Hilary Swank, and Jennifer Lawrence come immediately to mind—who obtained the award at younger ages. However, it appears that few young actors have received the Academy Award in recent years. This makes me wonder whether Oscar-winning actors tend to be older than Oscar-winning actresses.</p>
<p>To investigate, I collected data on the dates of past Academy Awards ceremonies and the birthdays of the winning actors and actresses. From these, I calculated the age of each winner on their Oscar-winning night. Below is a screenshot of some of the data.</p>
<p><img alt="oscars data" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ef494be723f2d7d3bb55d8f055124ad1/oscar1.png" style="width: 564px; height: 390px;" /></p>
<p>I used <a href="http://www.minitab.com/products/minitab">Minitab Statistical Software</a> to create a time series plot of the data, shown below.</p>
<p><img alt="time series plot" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2d5d4f84cb67fbe6fd41aee118f40c6a/oscar2.png" style="width: 550px; height: 367px;" /></p>
<p><span style="line-height: 1.6;">The plot suggests that there is usually a substantial age difference between the Best Actor and Best Actress winners. There are more years when the Best Actor winner is much older than the best actress winner (blue dots above red dots) than years where the winning actress is older. Some examples:</span></p>
<p style="margin-left: 40px;">1987: Paul Newman (62.1726), Marlee Matlin (21.5973)</p>
<p style="margin-left: 40px;">1989: Dustin Hoffman (51.6329), Jodie Foster (26.3507)</p>
<p style="margin-left: 40px;">1990: Daniel Day-Lewis (32.9068), Jessica Tandy (80.8000)</p>
<p style="margin-left: 40px;">1992: Anthony Hopkins (54.2466), Jodie Foster (29.3616)</p>
<p style="margin-left: 40px;">1998: Jack Nicholson (60.9178), Helen Hunt (34.7699)</p>
<p style="margin-left: 40px;">2011: Colin Firth (50.4658), Natalie Portman (29.7205)</p>
<p style="margin-left: 40px;">2013: Daniel Day-Lewis (55.8247), Jennifer Lawrence (22.5288)</p>
<p><span style="line-height: 1.6;">There are not many occasions when both the Best Actor and Best Actress are in their 30s, 40s, 50s, etc.</span></p>
<p><a href="http://blog.minitab.com/blog/cpammer/planning-a-trip-to-disney-world%3A-using-statistics-to-keep-it-in-the-green">Conditional formatting</a>, introduced with the release of Minitab 17.2, is what I am going to use to identify any repeats in the data.</p>
<p style="margin-left: 40px;"><img alt="conditional formatting" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9c70ddd1b5378dd004dac75a6dafaf31/oscar3.png" style="width: 505px; height: 213px;" /></p>
<p>Minitab applies the following conditional formatting to the data set:</p>
<p style="margin-left: 40px;"><img alt="conditional formatting" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/bfff88d02adb86b588b2ee63fc9d41a4/oscar4.png" style="width: 547px; height: 570px;" /></p>
<p>For the Best Actor award, Daniel Day-Lewis received the award on three occasions, while <span style="line-height: 20.8px;">Marlon Brando, Gary Cooper, Tom Hanks, Dustin Hoffman, Fredric March, Jack Nicholson, </span><span style="line-height: 20.8px;">Sean Penn, and Spencer Tracy each</span><span style="line-height: 1.6;"> won the award twice.</span></p>
<p>For the Best Actress category, Katharine Hepburn won four times. <span style="line-height: 20.8px;">Ingrid Bergman, Bette Davis, Olivia de Havilland, Sally Field, Jane Fonda, Jodie Foster, </span><span style="line-height: 20.8px;">Glenda Jackson, Vivien Leigh, Luise Rainer, Meryl Streep, Hilary Swank, and Elizabeth Taylor each</span><span style="line-height: 1.6;"> received the award twice.</span></p>
<p>Winners below the age of 30 could be regarded as obtaining the award at an early stage of their careers. Using the conditional formatting again, I can quickly identify the actors and actresses in the data who are in this group.</p>
<p><img alt="conditional formatting" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a1397cfe27b867b0fae4b1da7271a945/oscar5.png" style="width: 496px; height: 210px;" /></p>
<p><span style="line-height: 1.6;">As shown below, a lot more actresses than actors obtain the award before the age of 30.</span></p>
<p><img alt="conditional formatted data" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2dd1d117449fee623b47ce7c0062bb7d/oscar5a.png" style="width: 649px; height: 432px;" /></p>
<p>To get a better comparison, I am going to remove the repeats (with the help of the highlighted cells) for actors and actresses who won more than once and only take into account their age at first win. This gives data from 79 Best Actor and 74 Best Actress winners. I am going to use <a href="http://www.minitab.com/products/minitab/assistant/">the Assistant</a> to carry out a comparison using the 2-sample t-test.</p>
<p><img alt="Assistant 2-sample t test" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/eab880dbb38bb33bbfb452693513610f/oscar6.png" style="width: 641px; height: 495px;" /></p>
<p><span style="line-height: 1.6;">Apart from generating easy-to-interpret output, the Assistant also has the advantage of carrying out a powerful t-test even with unequal sample sizes using the Welch approach.</span></p>
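<p>Outside the Assistant, the Welch approach is available in most statistical software. Here is a minimal sketch in Python with SciPy, using made-up ages rather than the actual Oscar data:</p>

```python
from scipy import stats

# Hypothetical first-win ages for illustration only -- not the actual Oscar data.
actor_ages = [44.1, 39.5, 51.2, 47.8, 42.3, 55.0, 38.9, 46.4]
actress_ages = [29.7, 33.2, 41.5, 26.9, 35.8, 30.4, 38.1, 28.6]

# Welch's t-test: equal_var=False drops the equal-variances assumption,
# so unequal group sizes and spreads are handled correctly.
result = stats.ttest_ind(actor_ages, actress_ages, equal_var=False)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```

<p>With the large gap in these made-up groups, the p-value comes out well below 0.05.</p>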
<p><img alt="Report Card" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ac73bee979d25f49f4221170e9769956/oscar7.png" style="line-height: 1.6; width: 650px; height: 488px;" /></p>
<p>The Report Card indicates that we have sufficient data and the assumptions of the t-test are fulfilled. However, Minitab also detects some unusual data, which I will look into further.</p>
<p><img alt="2 sample t test diagnostic report" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/384df9dc72ef6868ed050f65d663e1bc/oscar8.png" style="width: 650px; height: 487px;" /></p>
<p><span style="line-height: 1.6;">Using the brush, the following unusual data are identified.</span></p>
<p style="margin-left: 40px;"><strong>Best Actor: </strong><br />
<span style="line-height: 1.6;">John Wayne (62.8658)</span><br />
<span style="line-height: 1.6;">Henry Fonda (76.8685)</span></p>
<p>These winners were considerably older, as the majority of the actor winners were in their 40s and 50s.</p>
<p style="margin-left: 40px;"><strong>Best Actress:</strong><br />
<span style="line-height: 1.6;">Marie Dressler (63.0027)</span><br />
<span style="line-height: 1.6;">Geraldine Page (61.3342)</span><br />
<span style="line-height: 1.6;">Jessica Tandy (80.8000)</span><br />
<span style="line-height: 1.6;">Helen Mirren (61.5863)</span></p>
<p>These winners were considerably older, as the majority of the actress winners were in their late 30s and 40s.</p>
<p><img alt="2-sample t test summary report" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8d4c39828bcad9ffdc0e151771078244/oscar9.png" style="width: 650px; height: 488px;" /></p>
<p>The Summary Report provides the key output of the t-test. The mean age of Best Actor is 43.746, while the mean age of Best Actress is 35. The p-value of the test is very small (<0.001). This means that we have enough evidence to suggest that, on average, the Best Actor winner is older than the Best Actress winner.</p>
<p><span style="line-height: 1.6;">I will leave it to others to speculate (and perhaps even use data to explore) why this apparent age gap exists. However, whatever their ages, we all enjoy seeing these Oscar winners' amazing performances on the big screen!</span></p>
<p><span style="font-size: 8px; line-height: 1.6;">Photograph of Leonardo DiCaprio by <a href="https://www.flickr.com/photos/phototoday2008/11933209533/" target="_blank">See Li</a>, used under Creative Commons 2.0. </span></p>
Fun StatisticsHypothesis TestingStatistics in the NewsMon, 07 Mar 2016 13:00:00 +0000http://blog.minitab.com/blog/statistics-and-more/do-actors-wait-longer-than-actresses-for-oscars-a-comparison-between-academy-award-winnersEugenie ChungHow to Compare Regression Slopes
http://blog.minitab.com/blog/adventures-in-statistics/how-to-compare-regression-lines-between-different-models
<p>If you perform linear regression analysis, you might need to compare different regression lines to see if their constants and slope coefficients are different. Imagine there is an established relationship between X and Y. Now, suppose you want to determine whether that relationship has changed. Perhaps there is a new context, process, or some other qualitative change, and you want to determine whether that affects the relationship between X and Y.</p>
<p>For example, you might want to assess whether the relationship between the height and weight of football players is significantly different than the same relationship in the general population.</p>
<p>You can graph the regression lines to visually compare the slope coefficients and constants. However, you should also statistically test the differences. <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics" target="_blank">Hypothesis testing</a> helps separate the true differences from the random differences caused by sampling error so you can have more confidence in your findings.</p>
<p>In this blog post, I’ll show you how to compare a relationship between different regression models and determine whether the differences are statistically significant. Fortunately, these tests are easy to do using <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">Minitab statistical software</a>.</p>
<p>In the example I’ll use throughout this post, there is an input variable and an output variable for a hypothetical process. We want to compare the relationship between these two variables under two different conditions. Here is the <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/File/569a0e7d067944f6f9147434794efcd6/comparingregressionmodels.MPJ">Minitab project file</a> with the data.</p>
Comparing Constants in Regression Analysis
<p>When the <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-how-to-interpret-the-constant-y-intercept" target="_blank">constants</a> (or y intercepts) in two different regression equations are different, this indicates that the two regression lines are shifted up or down on the Y axis. In the scatterplot below, you can see that the Output from Condition B is consistently higher than Condition A for any given Input value. We want to determine whether this vertical shift is statistically significant.</p>
<p><img alt="Scatterplot with two regression lines that have different constants." src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/2ed27f4204515bac9d9674c16fa0c0f7/scatter_constant_dift.png" style="width: 576px; height: 384px;" /></p>
<p>To test the difference between the constants, we just need to include a <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/introductory-concepts/data-concepts/cat-quan-variable/" target="_blank">categorical variable</a> that identifies the qualitative attribute of interest in the model. For our example, I have created a variable for the condition (A or B) associated with each observation.</p>
<p>To fit the model in Minitab, I’ll use: <strong>Stat > Regression > Regression > Fit Regression Model</strong>. I’ll include <em>Output</em> as the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/regression-and-correlation/regression-models/what-are-response-and-predictor-variables/" target="_blank">response variable</a>, <em>Input</em> as the continuous <a href="http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/regression-and-correlation/regression-models/what-are-response-and-predictor-variables/" target="_blank">predictor</a>, and <em>Condition</em> as the categorical predictor.</p>
<p>In the regression analysis output, we’ll first check the coefficients table.</p>
<p style="margin-left: 40px;"><img alt="Coefficients table that shows that the constants are different" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/23657868f2cf893d216d05d3400ab9e6/coeff_constant_dift.png" style="width: 369px; height: 117px;" /></p>
<p>This table shows us that the relationship between Input and Output is statistically significant because the p-value for Input is 0.000.</p>
<p>The <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-interpret-regression-analysis-results-p-values-and-coefficients" target="_blank">coefficient</a> for Condition is 10 and its <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-interpret-regression-analysis-results-p-values-and-coefficients" target="_blank">p-value</a> is significant (0.000). The coefficient tells us that the vertical distance between the two regression lines in the scatterplot is 10 units of Output. The p-value tells us that this difference is statistically significant—you can reject the null hypothesis that the distance between the two constants is zero. You can also see the difference between the two constants in the regression equation table below.</p>
<p style="margin-left: 40px;"><img alt="Regression equation table that shows constants that are different" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/a879996e37ebb05a297721e695a71943/equ_constant_dift.png" style="width: 305px; height: 113px;" /></p>
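Outside Minitab, the same constant-shift test amounts to fitting ordinary least squares with an indicator (dummy) variable for Condition and reading the t-test on its coefficient. A minimal sketch with simulated data (my own illustrative numbers, not the post's data set, with a true shift of 10 built in):

```python
import numpy as np
from scipy import stats

# Simulated stand-in for the post's data: both conditions share a
# slope, but Condition B's line sits 10 units higher (a true shift).
rng = np.random.default_rng(42)
n = 50
x = rng.uniform(0, 20, 2 * n)
cond = np.repeat([0.0, 1.0], n)                # 0 = Condition A, 1 = B
y = 5 + 1.8 * x + 10 * cond + rng.normal(0, 2, 2 * n)

# Ordinary least squares for y = b0 + b1*x + b2*cond
X = np.column_stack([np.ones_like(x), x, cond])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
dof = len(y) - X.shape[1]
cov = (resid @ resid / dof) * np.linalg.inv(X.T @ X)
p_vals = 2 * stats.t.sf(np.abs(beta / np.sqrt(np.diag(cov))), dof)

# beta[2] estimates the vertical shift between the two lines;
# its p-value tests the null hypothesis that the shift is zero.
print(f"shift estimate = {beta[2]:.2f}, p = {p_vals[2]:.4g}")
```

The coefficient on the indicator plays the same role as the Condition coefficient in the Minitab output: it estimates the vertical distance between the two regression lines.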
Comparing Coefficients in Regression Analysis
<p>When two slope coefficients are different, a one-unit change in a predictor is associated with different mean changes in the response. In the scatterplot below, it appears that a one-unit increase in Input is associated with a greater increase in Output in Condition B than in Condition A. We can <em>see</em> that the slopes look different, but we want to be sure this difference is statistically significant.</p>
<p><img alt="Scatterplot that shows two slopes that are different" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/200c12087fdf7eecd9b773d9ce213020/scatter_slope_dift.png" style="width: 576px; height: 384px;" /></p>
<p>How do you statistically test the difference between regression coefficients? It sounds like it might be complicated, but it is actually very simple. We can even use the same Condition variable that we did for testing the constants.</p>
<p>We need to determine whether the coefficient for Input depends on the Condition. In statistics, when we say that the effect of one variable depends on another variable, that’s an interaction effect. All we need to do is include the interaction term for Input*Condition!</p>
<p>In Minitab, you can specify interaction terms by clicking the <strong>Model</strong> button in the main regression dialog box. After I fit the regression model with the interaction term, we obtain the following coefficients table:</p>
<p style="margin-left: 40px;"><img alt="Coefficients table that shows different slopes" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/f06eff56f2266d0ff7e3919aa1292285/coeff_slope_dift.png" style="width: 410px; height: 154px;" /></p>
<p>The table shows us that the interaction term (Input*Condition) is statistically significant (p = 0.000). Consequently, we reject the null hypothesis and conclude that the difference between the two coefficients for Input (below, 1.5359 and 2.0050) does not equal zero. We also see that the main effect of Condition is not significant (p = 0.093), which indicates that the difference between the two constants is not statistically significant.</p>
<p style="margin-left: 40px;"><img alt="Regression equation table that shows different slopes" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/d5e5142c0ff13645d1dacc3e2c0bee27/equ_coeff_dift.png" style="width: 295px; height: 105px;" /></p>
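The interaction test can also be reproduced outside Minitab by adding a product column, Input times Condition, to the design matrix; the t-test on that column tests whether the two slopes differ. A sketch with simulated data (illustrative values chosen to mimic slopes of roughly 1.5 and 2.0):

```python
import numpy as np
from scipy import stats

# Simulated data: Condition A has slope 1.5 and Condition B 2.0,
# roughly mirroring the estimates in the post's output.
rng = np.random.default_rng(1)
n = 50
x = rng.uniform(0, 20, 2 * n)
cond = np.repeat([0.0, 1.0], n)
y = 5 + (1.5 + 0.5 * cond) * x + rng.normal(0, 2, 2 * n)

# Model: y = b0 + b1*x + b2*cond + b3*(x*cond); b3 is the interaction.
X = np.column_stack([np.ones_like(x), x, cond, x * cond])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
dof = len(y) - X.shape[1]
cov = (resid @ resid / dof) * np.linalg.inv(X.T @ X)
p_vals = 2 * stats.t.sf(np.abs(beta / np.sqrt(np.diag(cov))), dof)

slope_a = beta[1]                 # slope under Condition A
slope_b = beta[1] + beta[3]       # slope under Condition B
print(f"slope A = {slope_a:.3f}, slope B = {slope_b:.3f}, "
      f"interaction p = {p_vals[3]:.4g}")
```

A significant p-value on the interaction column is exactly the evidence that a one-unit change in Input has different effects under the two conditions.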
<p>It is easy to compare and test the differences between the constants and coefficients in regression models by including a categorical variable. These tests are useful when you can see differences between regression models and you want to defend your conclusions with p-values.</p>
<p>If you're learning about regression, read my <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-tutorial-and-examples">regression tutorial</a>!</p>
Data AnalysisHypothesis TestingRegression AnalysisStatistics HelpWed, 13 Jan 2016 13:00:00 +0000http://blog.minitab.com/blog/adventures-in-statistics/how-to-compare-regression-lines-between-different-modelsJim FrostChecking the “Naughty” or “Nice” Assessment with Attribute Agreement Analysis
http://blog.minitab.com/blog/using-data-and-statistics/checking-the-naughty-or-nice-assessment-with-attribute-agreement-analysis
<p><span style="line-height: 1.6;">Each year Santa’s Elves have to take all the information provided by family, friends and teachers to determine if all the children of the world have been “Naughty” or “Nice.” This is no small task, as according to the website </span><a href="http://www.santafaqs.com/" style="line-height: 1.6;">www.santafaqs.com</a><span style="line-height: 1.6;"> Santa delivers over 5 billion presents per year. </span></p>
<p><span style="line-height: 1.6;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0b9beb7f6ce672b36e9141b1bc3d3826/elf_classifying.png" style="margin: 10px 15px; float: right; width: 200px; height: 194px;" />Not only is it a large task in terms of size, but it is critical that the Elves have a consistent approach to this assessment. Santa does not want to give presents to naughty children, but he is adamant that he would rather mistakenly give a present to a naughty child than run the risk of <em>not</em> giving a present to a nice child. </span></p>
<p>For this reason, every summer Santa trains all his staff on separating people into the “Naughty” and “Nice” categories, and then he gives them a final test on a set of characters whose behaviour category is already known. For each of these 50 characters, Santa gives the Elves details of their behaviour as reported by their family, friends and work colleagues, and they give them a Naughty or Nice grade. To set up and analyse his new Elf recruits’ performance, Santa uses an <span><a href="http://blog.minitab.com/blog/understanding-statistics/got-good-judgment-prove-it-with-attribute-agreement-analysis">Attribute Agreement Analysis</a></span>.</p>
<p>The full list of characters and their grades can be seen in this Minitab project file: <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/File/edccea06e97c99500398e5f26bf71e23/elf_test.MPJ">elf-test.mpj</a>. If you don't already have Minitab and you'd like to give Attribute Agreement Analysis a try with this data set, you can <a href="http://www.minitab.com/en-us/products/minitab/free-trial/">download the free 30-day trial</a>. </p>
<p><img src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/006aec47ff21e0f2c53ff20bb3c8aaf7/naughty_1.png" style="border-width: 0px; border-style: solid; width: 282px; height: 330px;" /></p>
<p><span style="line-height: 1.6;">The first thing Santa has to do is create an Attribute Agreement Worksheet. This ensures that each Elf evaluates all the characters in a random order, and it creates a Minitab worksheet that includes the expected category (Naughty or Nice) for each person, so that Santa or one of his helpers can quickly enter the Elves’ assessments. </span></p>
<p>To avoid any pre-judgement the Elves do not see the name of the person they are assessing—only their Sample No and the information from family and friends.</p>
<p>The steps he follows are:</p>
<ol>
<li><strong>File > Open Project > Elf-Test.mpj</strong></li>
<li><strong>Assistant > Measurement System Analysis (MSA) > Attribute Agreement Worksheet</strong></li>
</ol>
<p>Santa completes the dialog box as follows and clicks OK. He then prints off the collection datasheets and gets the new Elves to assess the information for each of the people on the list and categorise them as Naughty or Nice. Once he has this information, it is entered into the Minitab worksheet.</p>
<p><img alt="Attribute Agreement Analysis worksheet" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/367d2c66939af738133c5e34ef72dabf/naughty_2.png" style="border-width: 0px; border-style: solid; width: 585px; height: 418px;" /></p>
<p>Once Santa has collected all this data, he runs the Attribute Agreement Analysis in the Assistant and gets the following results:</p>
<p><img src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/058a59ecbcb6d9b30f6bb90f896e7f9e/naughty_3.png" style="border-width: 0px; border-style: solid; width: 496px; height: 137px;" /></p>
<p><img src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/cd34f968224601cfacac3a635117c05b/naughty_4.png" style="border-width: 0px; border-style: solid; width: 567px; height: 575px;" /></p>
<p>Santa is happy with the overall error rate. However, he is very concerned that the percentage of Nice people being rated as Naughty is higher than the overall error rate. This means that there are some good people who may not get presents. This is not acceptable, so he uses another report produced by Minitab to investigate which people are being misclassified.</p>
<p><img src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a16a40bd7ae48ca184dc665b6c3727f3/naughty_5.png" style="border-width: 0px; border-style: solid; width: 549px; height: 318px;" /></p>
<p>This chart shows which samples were misclassified as Naughty.</p>
<p>Santa is worried because every Elf said person 26 was Naughty when the standard was Nice. When Santa looks at the Elf-Test Worksheet, he can see that person 26 was Sherlock Holmes. Santa checks the information on him and can see why the Elves think he is naughty: he smokes and the neighbours have complained that he plays his violin (badly) at all hours of the day and night. Santa provides extra training to the Elves to help them realise that musicians only improve if they practise regularly, so the neighbours will have to suffer.</p>
<p>Characters 24, 40 and 49 (Little Red Riding Hood, Stuart Little and Shrek, respectively) were only misclassified once apiece, so Santa wants to investigate which Elves made the wrong decision in these cases. Again, he uses one of the standard reports the Assistant produces.</p>
<p><img src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e960ad692f6856d825475bfc1373de29/naughty_6.png" style="border-width: 0px; border-style: solid; width: 545px; height: 344px;" /></p>
<p>From this report, Santa can see that Berry is the strictest elf—and the one who has made the most mistakes classifying Nice people as Naughty. For this reason, Santa decides to reassign Berry to the reindeer welfare department.</p>
<p>Jingle and Sparkle are now full time Niceness monitors, and Santa is sure—thanks to his training program and the Attribute Agreement Analysis completed in Minitab—that <em>everyone</em> will get the presents they deserve this year.</p>
<p>If, like Santa, you have to make qualitative assessments on your products or services, an Attribute Agreement Analysis is a good way to verify and improve the performance of your assessors.</p>
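For readers without Minitab, the heart of an agreement-with-standard assessment is a table of misclassification rates per appraiser. A minimal sketch in Python (the elves' ratings below are invented for illustration; they are not the post's worksheet):

```python
# Hypothetical ratings (invented names and data, not the post's
# worksheet): each elf rates the same people, whose true category
# ("standard") is already known.
standard = ["Nice", "Nice", "Naughty", "Nice", "Naughty", "Nice"]
ratings = {
    "Jingle": ["Nice", "Nice", "Naughty", "Nice", "Naughty", "Nice"],
    "Berry":  ["Naughty", "Nice", "Naughty", "Naughty", "Naughty", "Nice"],
}

# Positions where the standard says "Nice": the costly mistake is
# rating these people "Naughty".
nice_idx = [i for i, s in enumerate(standard) if s == "Nice"]

summary = {}
for elf, rated in ratings.items():
    errors = sum(r != s for r, s in zip(rated, standard))
    nice_as_naughty = sum(rated[i] == "Naughty" for i in nice_idx)
    summary[elf] = (errors / len(standard), nice_as_naughty / len(nice_idx))
    print(f"{elf}: overall error rate {summary[elf][0]:.0%}, "
          f"Nice rated Naughty {summary[elf][1]:.0%}")
```

Breaking the error rate down by mistake type is what lets Santa see that an acceptable overall error rate can still hide an unacceptable Nice-rated-Naughty rate.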
Fun StatisticsHypothesis TestingQuality ImprovementWed, 23 Dec 2015 13:00:00 +0000http://blog.minitab.com/blog/using-data-and-statistics/checking-the-naughty-or-nice-assessment-with-attribute-agreement-analysisGillian GroomWhy Are P Value Misunderstandings So Common?
http://blog.minitab.com/blog/adventures-in-statistics/why-are-p-value-misunderstandings-so-common
<p><img alt="Danger thin ice sign" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/694cbaccbcb94c40ba77ec6a967994d7/thin_ice_sign.jpg" style="float: right; width: 225px; height: 300px; margin: 15px 10px;" />I’ve written a fair bit about P values: <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values" target="_blank">how to correctly interpret P values</a>, <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-significance-levels-alpha-and-p-values-in-statistics" target="_blank">a graphical representation of how they work</a>, <a href="http://blog.minitab.com/blog/adventures-in-statistics/five-guidelines-for-using-p-values" target="_blank">guidelines for using P values</a>, and why the <a href="http://blog.minitab.com/blog/adventures-in-statistics/banned-p-values-and-confidence-intervals-a-rebuttal-part-1" target="_blank">P value ban in one journal is a mistake</a>. Along the way, I’ve received many questions about P values, but the questions from one reader stand out.</p>
<p>This reader asked, <em>why</em> is it so easy to interpret P values incorrectly? Why is the common misinterpretation <em>so</em> pervasive? And, what can be done about it? He wasn’t sure if these were fair questions, but I think they are. Let’s answer them!</p>
The Correct Way to Interpret P Values
<p>First, to make sure we’re on the same page, here’s the correct definition of P values.</p>
<p>The P value is the probability of obtaining an effect at least as extreme as the one in your sample data, assuming the truth of the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/basics/null-and-alternative-hypotheses/" target="_blank">null hypothesis</a>. In other words, if the null hypothesis is true, the P value is the probability of obtaining sample data at least as extreme as yours. It answers the question: are your sample data unusual if the null hypothesis is true?</p>
<p>If you’re thinking that the P value is the probability that the null hypothesis is true, the probability that you’re making a mistake if you reject the null, or anything else along these lines, that’s the most common misunderstanding. You should click the links above to learn how to correctly interpret P values.</p>
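A quick way to internalize the correct definition is to simulate it: generate many data sets for which the null hypothesis really is true, and count how often they yield a test statistic at least as extreme as the one you observed. That proportion should agree with the analytic P value. A sketch with arbitrary illustrative numbers:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(0.5, 1, 20)                # illustrative observed data
t_obs, p_obs = stats.ttest_1samp(sample, 0)    # H0: population mean is 0

# Simulate a world where H0 is true (mean really is 0), and count how
# often that world produces a t statistic at least as extreme as ours.
t_null = np.array([stats.ttest_1samp(rng.normal(0, 1, 20), 0).statistic
                   for _ in range(20000)])
p_sim = np.mean(np.abs(t_null) >= abs(t_obs))
print(f"analytic p = {p_obs:.3f}, simulated p = {p_sim:.3f}")
```

Note what the simulation conditions on: the null being true. Nothing in it tells you the probability that the null *is* true, which is exactly why the common misinterpretation doesn't hold.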
Historical Circumstances Helped Make P Values Confusing
<p>This problem is nearly a century old and goes back to two very antagonistic camps from the early days of hypothesis testing: Fisher's measures of evidence approach (P values) and the Neyman-Pearson error rate approach (alpha). Fisher believed in inductive reasoning, which is the idea that we can use sample data to learn about a population. On the other side, the Neyman-Pearson methodology does not allow analysts to learn from individual studies. Instead, the results only apply to a long series of tests.</p>
<p>Courses and textbooks have mushed these disparate approaches together into the standard hypothesis-testing procedure that is known and taught today. This procedure <em>seems </em>like a seamless combination but it's really a muddled, Frankenstein's-monster combination of sometimes-contradictory methods that has promoted the confusion. The end result of this fusion is that P values are incorrectly entangled with the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/basics/type-i-and-type-ii-error/" target="_blank">Type I error rate</a>. Fisher tried to clarify this misunderstanding for decades, but to no avail.</p>
P Values Aren’t What We <em>Really </em>Want to Know
<p>The common misconception describes what we'd <em>really</em> like to know. We’d <em>loooove</em> to know the probability that a hypothesis is correct, or the probability that we’re making a mistake. What we get instead is the probability of our <em>observation</em>, which just isn’t as useful.</p>
<p>It would be great if we could take evidence solely from a sample and determine the probability that the sample is wrong. Unfortunately, that's not possible—for logical reasons when you think about it. Without outside information, a sample can’t tell you whether it’s representative of the population.</p>
<p>P values are based exclusively on information contained within a sample. Consequently, P values can't answer the question that we most want answered, but there seems to be an irresistible temptation towards interpreting it that way.</p>
P Values Have a Convoluted Definition
<p>The correct definition of a P value is fairly convoluted. The definition is based on the probability of observing what you actually did observe (huh?), but in a hypothetical context (a true null hypothesis), and it includes strange wording about results that are at least as extreme as what you observed. It's hard to understand all of that without a lot of study. It's just not intuitive.</p>
<p>Unfortunately, there is no simple <em>and</em> accurate definition that can help counteract the pressures to believe in the common misinterpretation. In fact, the incorrect definition <em>sounds</em> so much simpler than the correct definition. Shoot, <a href="http://fivethirtyeight.com/features/not-even-scientists-can-easily-explain-p-values/" target="_blank">not even scientists can explain P values</a>! And, so the misconceptions live on.</p>
What Can Be Done?
<p>Historical circumstances have conspired to confuse the issue. We have a natural tendency to want P values to mean something else. And, there is no simple yet correct definition for P values that can counteract the common misunderstandings. No wonder this has been a problem for a long time!</p>
<p>Fisher tried in vain to correct this misinterpretation but didn't have much luck. As for myself, I hope to point out that what may seem like a semantic difference between the correct and incorrect definitions actually equates to a huge difference.</p>
<p>Using the incorrect definition is likely to come back to bite you! If you think a P value of 0.05 equates to a 5% chance of a mistake, boy, are you in for a big surprise—because it’s often around 26%! Instead, based on middle-of-the-road assumptions, you’ll need a P value around 0.0027 to achieve an error rate of about 5%. However, <a href="http://blog.minitab.com/blog/adventures-in-statistics/not-all-p-values-are-created-equal" target="_blank">not all P values are created equal</a> in terms of the error rate.</p>
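The gap between a P value and an error rate can be made concrete with a simulation. The assumptions below are my own middle-of-the-road choices (half of the tested hypotheses are truly null, and the rest carry a modest real effect), not the post's exact calculation, but they show the same phenomenon: among results with p just under 0.05, a large share are true nulls.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_exp, n, effect = 100_000, 30, 0.5

# Illustrative assumptions: half of all tested hypotheses are truly
# null; the other half carry a real effect of 0.5 SD.
null_true = rng.random(n_exp) < 0.5
means = np.where(null_true, 0.0, effect)
data = rng.normal(means[:, None], 1.0, (n_exp, n))

# One-sample t-test of H0: mean = 0 for every simulated experiment.
t = data.mean(axis=1) / (data.std(axis=1, ddof=1) / np.sqrt(n))
p = 2 * stats.t.sf(np.abs(t), n - 1)

# Among "significant" results with p just under 0.05, how many were
# actually rejections of a true null?
hit = (p > 0.04) & (p < 0.05)
false_alarm = null_true[hit].mean()
print(f"Share of p ~ 0.05 results where the null was true: {false_alarm:.0%}")
```

The exact share depends on the assumed effect size and the proportion of true nulls, but under reasonable choices it lands far above the 5% that the misinterpretation would suggest.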
<p>I also think that P values are easier for most people to understand graphically than through the tricky definition and the math. So, I wrote a series of blog posts that graphically show <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics" target="_blank">why we need hypothesis testing and how it works</a>.</p>
<p>I have no reason to expect that I'll have any more impact than Fisher did himself, but it's an attempt!</p>
Hypothesis TestingLearningStatisticsThu, 10 Dec 2015 13:00:00 +0000http://blog.minitab.com/blog/adventures-in-statistics/why-are-p-value-misunderstandings-so-commonJim FrostWhy You Should Use Non-parametric Tests when Analyzing Data with Outliers
http://blog.minitab.com/blog/applying-statistics-in-quality-projects/why-you-should-use-non-parametric-tests-when-analyzing-data-with-outliers
<p>There are many reasons why a distribution might not be normal/Gaussian. A non-normal pattern might be caused by several distributions being mixed together, by a drift over time, by one or more outliers, by asymmetrical behavior, by out-of-control points, and so on.</p>
<p>I recently collected the scores of three different teams (the Blue team, the Yellow team and the Pink team) after a laser tag game session one Saturday afternoon. The three teams represented three different groups of friends wishing to spend their afternoon tagging players from competing teams. Gengiz Khan turned out to be the best player, followed by Tarantula and Desert Fox.</p>
One-Way ANOVA
<p>In this post, I will focus on team performances, not on single individuals. I decided to compare the average scores of each team. The best tool I could possibly think of was a one-way ANOVA using the Minitab <a href="http://www.minitab.com/products/minitab/assistant/">Assistant</a> (with a continuous Y response and three sample means to compare).</p>
<p>To assess statistical significance, the differences <em>between </em>team averages are compared to the <em>within </em>(team) variability. A large between-team variability compared to a small within-team variability (the error term) means that the differences between teams are statistically significant.</p>
<p>In this comparison (see the output from the Assistant below), the <a href="http://blog.minitab.com/blog/understanding-statistics/what-can-you-say-when-your-p-value-is-greater-than-005">P value was 0.053, just above the usual 0.05</a> threshold. The P value is the probability of observing differences between sample means at least this large if only random causes were at work. A p-value above 0.05, therefore, indicates that such differences could plausibly arise from random causes alone. Because of that, the differences are not considered to be statistically significant (there is "not enough evidence that there are significant differences," according to the comments in the Minitab Assistant). But the result remains somewhat ambiguous, since the p-value is very close to the significance limit (0.05).</p>
<p>Note that the variability within the Blue team seems to be much larger (see the confidence interval plot in the means comparison chart below) than for the other two groups. This is not a cause for concern in this case, since the Minitab Assistant uses the <a href="http://blog.minitab.com/blog/adventures-in-statistics/did-welchs-anova-make-fishers-classic-one-way-anova-obsolete">Welch method of ANOVA</a>, which does not require or assume equal variances within groups.</p>
<p><img height="468" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/8457900262f468b76f8d2a4f28027c2d/8457900262f468b76f8d2a4f28027c2d.png" width="624" /></p>
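For reference, Welch's F statistic weights each group by n/s², so unequal group variances never need to be pooled. A from-scratch sketch (the team scores below are invented to mimic the post's setup, since the original worksheet values aren't reproduced here):

```python
import numpy as np
from scipy import stats

def welch_anova(*groups):
    """Welch's one-way ANOVA: compares group means without
    assuming equal variances. Returns (F statistic, p-value)."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])
    w = n / v                                  # weight each group by n / s^2
    grand = np.sum(w * m) / np.sum(w)          # weighted grand mean
    tmp = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1)) / (k ** 2 - 1)
    f_stat = (np.sum(w * (m - grand) ** 2) / (k - 1)) / (1 + 2 * (k - 2) * tmp)
    df2 = 1 / (3 * tmp)                        # Welch's adjusted denominator df
    return f_stat, stats.f.sf(f_stat, k - 1, df2)

# Invented team scores (not the post's worksheet); Blue holds the outlier.
blue   = [2400, 2450, 2500, 2550, 2600, 2650, 2700, 2750, 9500]
yellow = [975, 1100, 800, 1300, 600, 900, 1050, 700, 1200, 950]
pink   = [-450, -200, -600, -100, -300, -500, -250, -400, -350,
          -550, -150, -700, -50]

f_stat, p_value = welch_anova(blue, yellow, pink)
print(f"Welch F = {f_stat:.2f}, p = {p_value:.4g}")
```

With exactly two groups, this statistic reduces to the square of Welch's t-test, with the same Welch-Satterthwaite degrees of freedom.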
Outliers and Normality
<p>When looking at the distribution of individual data (below), one point seems to be an outlier, or at least a suspect extreme value (marked in red). This is Gengiz Khan, the best player. In my worksheet, the scores have been entered from the best to the worst (not in time order). This is why we can see a downward trend in the chart on the right side of the diagnostic report (see below).</p>
<p><img height="468" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/7a7f0889d207c5dab409c4f32fb33d85/7a7f0889d207c5dab409c4f32fb33d85.png" width="624" /></p>
<p>The Report Card (see below) from the Minitab Assistant shows that normality might be an issue (the yellow triangle is a warning sign) because the sample sizes are quite small. We need to check normality within each team. The second warning sign is due to the unusual/extreme data (the score in row 1), which may bias our analysis.</p>
<p><img height="500" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/c1b6e895a14806b48ce1389d3b1d283b/c1b6e895a14806b48ce1389d3b1d283b.png" width="666" /></p>
<p><span style="line-height: 20.8px;">Following the suggestion from the warning signal in the Minitab Assistant Report Card, </span>I decided to run a normality test. I performed a separate normality test for each team in order not to mix different distributions together.</p>
<p>A low P value in the normal probability plot (see below) signals a significant departure from normality. This p-value is below 0.05 for the Blue team. The points located along the normal probability plot line represent “normal,” common, random variations. The points at the upper or lower extreme, which are distant from the line, represent unusual values or outliers. The non-normal behavior in the probability plot of the blue team is clearly due to the outlier on the right side of the normal probability plot line.</p>
<p><img height="384" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/0a2a24b82115014a0e981e63b0628f5b/0a2a24b82115014a0e981e63b0628f5b.png" width="576" /></p>
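The per-team normality check can also be done outside Minitab. Minitab's probability plots are typically paired with an Anderson-Darling test; the sketch below substitutes SciPy's Shapiro-Wilk test as a stand-in, using invented team scores in which Blue contains one extreme value:

```python
from scipy import stats

# Invented team scores (not the post's worksheet); Blue's top scorer
# is an extreme value relative to the rest of the team.
teams = {
    "Blue":   [2400, 2450, 2500, 2550, 2600, 2650, 2700, 2750, 9500],
    "Yellow": [975, 1100, 800, 1300, 600, 900, 1050, 700, 1200, 950],
    "Pink":   [-450, -200, -600, -100, -300, -500, -250, -400, -350,
               -550, -150, -700, -50],
}

# Test each team separately so different distributions aren't mixed.
results = {name: stats.shapiro(scores).pvalue for name, scores in teams.items()}
for name, p in results.items():
    verdict = ("departs from normality" if p < 0.05
               else "no evidence against normality")
    print(f"{name}: Shapiro-Wilk p = {p:.4f} -> {verdict}")
```

As in the post, a single extreme value in an otherwise tight group is enough to push that group's p-value below 0.05.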
<p>Should we remove this value (Gengiz Khan’s score) in the Blue group and rerun the analysis without him?</p>
<p>Even though Gengiz Khan is more experienced and talented than the other team members, there are no particular reasons why he should be removed—he is certainly part of the Blue team. There are probably many other talented laser game players around. If another additional laser game session takes place in the future, there will probably still be a large difference between Gengiz Khan and the rest of his team.</p>
<p>The problem is that this extreme value tends to inflate the within-group variability. Because there is a much larger within-team variability for the blue team, differences <em>between </em>groups when they are compared to the residual / within variability do not appear to be significant, causing the p-value to move just above the significance threshold.</p>
A Non-parametric Solution
<p>One possible solution is to use a non-parametric approach. Non-parametric techniques are based on ranks, or medians. Ranks represent the relative position of an individual in comparison to others, but are not affected by extreme values (whereas a mean is sensitive to outlier values). Ranks and medians are more “robust” to outliers.</p>
<p>I used the Kruskal-Wallis test (see the correspondence table between parametric and non-parametric tests below). The p-value (see the output below) is now significant (less than 0.05), and the conclusion is completely different. We can consider that the differences are significant.</p>
<pre style="margin-left: 40px;"><strong>Kruskal-Wallis Test: Score versus Team</strong>

Kruskal-Wallis Test on Score

Team      N  Median  Ave Rank      Z
Blue      9  2550.0      23.7   2.72
Pink     13  -450.0      11.6  -2.44
Yellow   10   975.0      16.4  -0.06
Overall  32              16.5

H = 8.86  DF = 2  <strong>P = 0.012</strong>
H = 8.87  DF = 2  <strong>P = 0.012</strong> (adjusted for ties)</pre>
<p style="margin-left: 40px;"><img height="384" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/ed3231b9ceab5d16a6a5d5bb0ce43973/ed3231b9ceab5d16a6a5d5bb0ce43973.png" width="576" /></p>
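The same test is available outside Minitab; for example, `scipy.stats.kruskal` returns the tie-adjusted H statistic and its p-value. A sketch using invented scores in the spirit of the post's data (not the actual worksheet):

```python
from scipy import stats

# Invented scores: three teams with clearly different typical scores,
# and one extreme high scorer on the Blue team.
blue   = [2400, 2450, 2500, 2550, 2600, 2650, 2700, 2750, 9500]
yellow = [975, 1100, 800, 1300, 600, 900, 1050, 700, 1200, 950]
pink   = [-450, -200, -600, -100, -300, -500, -250, -400, -350,
          -550, -150, -700, -50]

# Kruskal-Wallis works on ranks, so the extreme top score cannot
# inflate the within-team variability the way it does for ANOVA.
h_stat, p_value = stats.kruskal(blue, yellow, pink)
print(f"H = {h_stat:.2f}, p = {p_value:.4g}")
```

Because the outlier only contributes its rank, the comparison of team medians stays sensitive even when one score is wildly extreme.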
<p>See below the correspondence table for parametric and non-parametric tests:</p>
<p><img alt="correspondence table" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/4a69043809861f5187be271de67f8161/parametric_correspondence_table.png" style="width: 661px; height: 488px;" /></p>
Conclusion
<p>Outliers do happen and removing them is not always straightforward. One nice thing about non-parametric tests is that they are more robust to such outliers. However, this does not mean that non-parametric tests should be used in any circumstance. When there are no outliers and the distribution is normal, standard parametric tests (T tests or ANOVA) are more powerful. </p>
Data AnalysisHypothesis TestingLearningStatisticsStatsMon, 07 Dec 2015 13:02:00 +0000http://blog.minitab.com/blog/applying-statistics-in-quality-projects/why-you-should-use-non-parametric-tests-when-analyzing-data-with-outliersBruno Scibilia