Hypothesis Testing | Minitab
Blog posts and articles about hypothesis testing, especially in the course of Lean Six Sigma quality improvement projects.
http://blog.minitab.com/blog/hypothesis-testing-2/rss
Sun, 30 Aug 2015 22:27:17 +0000
Chi-Square Analysis: Powerful, Versatile, Statistically Objective
http://blog.minitab.com/blog/michelle-paret/chi-square-analysis-powerful-versatile-statistically-objective
<p style="line-height: 20.7999992370605px;">To make objective decisions about the processes that are critical to your organization, you often need to examine categorical data. You may know how to use a t-test or ANOVA when you’re comparing measurement data (like weight, length, <span style="line-height: 1.6;">revenue, </span><span style="line-height: 1.6;">and so on), but do you know how to compare attribute or count data? It’s easy to do with <a href="http://www.minitab.com/products/minitab">statistical software</a> like Minitab. </span></p>
<p style="line-height: 20.7999992370605px;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/60bfd1eb8d2c2c3689bce89ea55453ab/chisquare_onevariable_w1024.jpeg" style="line-height: 20.7999992370605px; width: 350px; height: 230px; float: right; margin: 10px 15px;" /></p>
<p style="line-height: 20.7999992370605px;">One person may look at this bar chart and decide that each production line had the same <span style="line-height: 1.6;">proportion of defects. But another person may focus on the small difference between the bars and decide that one of the lines has outperformed the others. Without an appropriate statistical analysis, how can you know which person is right?</span></p>
<p style="line-height: 20.7999992370605px;">When time, money, and quality depend on your answers, you can’t rely on subjective visual assessments alone. To answer questions like these with statistical objectivity, you can use a Chi-Square analysis.</p>
Which Analysis Is Right for Me?
<p style="line-height: 20.7999992370605px;">Minitab offers three Chi-Square tests. The appropriate analysis depends on the number of variables that you want to examine. And for all three options, the data can be formatted either as raw data or summarized counts.</p>
<strong>Chi-Square Goodness-of-Fit Test – 1 Variable</strong>
<p style="line-height: 20.7999992370605px;">Use Minitab’s <strong>Stat > Tables > Chi-Square Goodness-of-Fit Test (One Variable)</strong> when you have just one variable.</p>
<p style="line-height: 20.7999992370605px;">The Chi-Square Goodness-of-Fit Test can test if the proportions for all groups are equal. It can also be used to test if the proportions for groups are equal to specific values. For example:</p>
<ul style="line-height: 20.7999992370605px;">
<li>A bottle cap manufacturer operates three production lines and records the number of defective caps for each line. The manufacturer uses the <strong>Chi-Square Goodness-of-Fit Test</strong> to determine if the proportion of defects is equal across all three lines.</li>
<li>A bottle cap manufacturer operates three production lines and records the number of defective caps for each line. One line runs at high speed and produces twice as many caps as each of the other two lines, which run at a slower speed. The manufacturer uses the <strong>Chi-Square Goodness-of-Fit Test</strong> to determine if the number of defects for each line is proportional to the volume of caps it produces.</li>
</ul>
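<p>As a rough illustration of what the goodness-of-fit test computes, here is a minimal Python sketch of both bottle cap scenarios. The defect counts are hypothetical (they are not from this post), the function names are made up for the example, and the 50% / 25% / 25% split assumes the fast line produces twice as many caps as each slow line. The P value uses a closed form that is only valid for an even number of degrees of freedom, which happens to cover the three-line case.</p>

```python
import math

def chi_square_gof(observed, expected_proportions):
    """Return (statistic, degrees of freedom) for a chi-square goodness-of-fit test."""
    total = sum(observed)
    expected = [total * p for p in expected_proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1

def chi_square_sf_even_df(statistic, df):
    """Upper-tail chi-square probability (closed form, valid for even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even degrees of freedom")
    half = statistic / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms

# Hypothetical defect counts for three production lines (not from the article).
defects = [30, 42, 48]

# Scenario 1: are defects spread equally across the three lines?
stat_equal, df = chi_square_gof(defects, [1 / 3, 1 / 3, 1 / 3])
p_equal = chi_square_sf_even_df(stat_equal, df)

# Scenario 2: line 1 makes twice the volume of each other line,
# so the assumed expected split is 50% / 25% / 25%.
stat_volume, _ = chi_square_gof(defects, [0.5, 0.25, 0.25])
p_volume = chi_square_sf_even_df(stat_volume, df)

print(f"equal proportions:   chi2={stat_equal:.2f}, p={p_equal:.4f}")
print(f"volume-proportional: chi2={stat_volume:.2f}, p={p_volume:.2e}")
```

<p>With these made-up counts, the same data are consistent with equal proportions but not with defects that track production volume, which shows why the choice of expected proportions matters.</p>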
<strong>Chi-Square Test for Association – 2 Variables</strong>
<p style="line-height: 20.7999992370605px;">Use Minitab’s <strong>Stat > Tables > Chi-Square Test for Association</strong> when you have two variables.</p>
<p style="line-height: 20.7999992370605px;">The Chi-Square Test for Association can tell you if there’s an association between two variables. In other words, it can test whether two variables are independent. For example:</p>
<ul style="line-height: 20.7999992370605px;">
<li>A paint manufacturer operates two production lines across three shifts and records the number of defective units per line per shift. The manufacturer uses the <strong>Chi-Square Test for Association</strong> to determine if the defect rates are similar across all shifts and production lines. Or, are certain lines during certain shifts more prone to defects?</li>
<li>A credit card billing center records the type of billing error that is made, as well as the type of form that is used. The billing center uses a Chi-Square Test to determine whether certain types of errors are related to certain forms.</li>
</ul>
<p style="line-height: 20.7999992370605px;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/7af9e9b2ee624e7d912393d7debe7f1b/chisquare_twovariables_w1024.jpeg" style="width: 500px; height: 329px;" /></p>
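<p>To show the arithmetic behind a test for association, the sketch below works through a two-line, three-shift table in Python. The counts are hypothetical and the function name is an illustration, not part of Minitab. The expected count for each cell is what you would see if line and shift were independent, and the P value shortcut applies only because this table has 2 degrees of freedom.</p>

```python
import math

def chi_square_association(table):
    """Return (statistic, df) for a chi-square test of association on a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical defect counts: rows are production lines, columns are shifts 1 to 3.
defects = [
    [20, 30, 50],  # line A
    [30, 30, 40],  # line B
]

statistic, df = chi_square_association(defects)
# For df = 2 the chi-square upper-tail probability is exactly exp(-statistic / 2).
p_value = math.exp(-statistic / 2) if df == 2 else None

print(f"chi2={statistic:.3f}, df={df}, p={p_value:.3f}")
```

<p>A large P value here would mean the data give no strong evidence that the pattern of defects across shifts differs between the two lines.</p>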
<strong>Cross Tabulation and Chi-Square – 2 or more variables</strong>
<p style="line-height: 20.7999992370605px;">Use Minitab’s <strong>Stat > Tables > Cross Tabulation and Chi-Square </strong>when you have two or more variables.</p>
<p style="line-height: 20.7999992370605px;">If you simply want to test for associations between two variables, you can use either <strong>Cross Tabulation and Chi-Square</strong> or <strong>Chi-Square Test for Association</strong>. However, <span><a href="http://blog.minitab.com/blog/understanding-statistics/using-cross-tabulation-and-chi-square-the-survey-says">Cross Tabulation and Chi-Square</a></span> also lets you control for the effect of additional variables. Here’s an example:</p>
<ul style="line-height: 20.7999992370605px;">
<li>A dairy processing plant records information about each defective milk carton that it produces. The plant uses a Cross Tabulation and Chi-Square analysis to look for dependencies between the defect types and the machine that produces the carton, while controlling for any shift effect. Perhaps a particular filling machine is prone to a certain type of defect, but only during the first shift.</li>
</ul>
<p style="line-height: 20.7999992370605px;">This analysis also offers advanced options. For example, if your categories are ordinal (good, better, best or small, medium, large) you can include a special test for concordance.</p>
Conducting a Chi-Square Analysis in Minitab
<p style="line-height: 20.7999992370605px;">Each of these analyses is easy to run in Minitab. For more examples that include step-by-step instructions, just navigate to the Chi-Square menu of your choice and then click Help > example.</p>
<p style="line-height: 20.7999992370605px;">It can be tempting to make subjective assessments about a given set of data, their makeup, and possible interdependencies, but why risk an error in judgment when you can be sure with a Chi-Square test?</p>
<p style="line-height: 20.7999992370605px;">Whether you’re interested in one variable, two variables, or more, a Chi-Square analysis can help you make a clear, statistically sound assessment.</p>
Data Analysis, Hypothesis Testing, Lean Six Sigma, Quality Improvement, Six Sigma, Statistics, Statistics Help
Thu, 27 Aug 2015 12:33:39 +0000
Michelle Paret
The Null Hypothesis: Always “Busy Doing Nothing”
http://blog.minitab.com/blog/using-data-and-statistics/the-null-hypothesis-always-busy-doing-nothing
<p>The 1949 film <a href="http://www.imdb.com/title/tt0041259/" target="_blank"><em>A Connecticut Yankee in King Arthur's Court</em></a> includes the song “Busy Doing Nothing,” and this could be written about the <a href="http://blog.minitab.com/blog/understanding-statistics/things-statisticians-say-failure-to-reject-the-null-hypothesis">Null Hypothesis</a> as it is used in statistical analyses. </p>
<p>The words to the song go:</p>
<p style="margin-left: 40px;"><em>We're busy doin' nothin'<br />
<span style="line-height: 1.6;">Workin' the whole day through<br />
Tryin' to find lots of things not to do </span></em></p>
<p><span style="line-height: 1.6;">And that summarises the role of the Null Hypothesis perfectly. Let me explain why.</span></p>
<span style="line-height: 1.6;">What's the Question?</span>
<p>Before doing any statistical analysis—in fact even before we collect any data—we need to define what problem and/or question we need to answer. Once we have this, we can then work on defining our Null and Alternative Hypotheses.</p>
<p>The null hypothesis is always the option that maintains the status quo and results in the least amount of disruption, hence it is “Busy Doin’ Nothin'”. </p>
<p>When the data we observe would be very unlikely if the Null Hypothesis were true, we reject the Null Hypothesis. Then we have to take some action, and we are no longer “Doin’ Nothin'”.</p>
<p>Let’s have a look at how this works in practice with some common examples.</p>
<table>
<tr><th>Question</th><th>Null Hypothesis</th></tr>
<tr><td>Do the chocolate bars I am selling weigh 100g?</td><td>Chocolate Weight = 100g<br />
<br />
If I am giving my customers the right size chocolate bars I don’t need to make changes to my chocolate packing process.</td></tr>
<tr><td>Are the diameters of my bolts normally distributed?</td><td>Bolt diameters are normally distributed.<br />
<br />
If my bolt diameters are normally distributed I can use any statistical techniques that use the standard normal approach.</td></tr>
<tr><td>Does the weather affect how my strawberries grow?</td><td>Number of hours sunshine has no effect on strawberry yield<br />
<br />
Amount of rain has no effect on strawberry yield<br />
<br />
Temperature has no effect on strawberry yield</td></tr>
</table>
<p>Note that the last instance in the table, investigating if weather affects the growth of my strawberries, is a bit more complicated. That's because I needed to define some metrics to measure the weather. Once I decided that the weather was a combination of sunshine, rain and temperature, I established my null hypotheses. These all assume that none of these factors affects the strawberry yield. I only need to control the sunshine, temperature and rain if the data give strong evidence against these null hypotheses.</p>
Is Your Null Hypothesis Suitably Inactive?
<p><span style="line-height: 1.6;">So in conclusion, in order to be “Busy Doin’ Nothin’”, your Null Hypothesis has to be as follows:</span></p>
<ul>
<li>A logical question.</li>
<li>Focused on one objective.</li>
<li>Requires action only if <a href="http://blog.minitab.com/blog/michelle-paret/alphas-p-values-confidence-intervals-oh-my">the p-value</a> is low (typically below 5%).</li>
</ul>
Hypothesis Testing, Statistics
Wed, 12 Aug 2015 12:00:00 +0000
Gillian Groom
Lessons from a Statistical Analysis Gone Wrong, part 1
http://blog.minitab.com/blog/understanding-statistics/lessons-from-a-statistical-analysis-gone-wrong-part-3-v2
<p style="line-height: 18.9090900421143px;">I don't like the taste of crow. That's a shame, because I'm about to eat a huge helping of it. </p>
<p style="line-height: 18.9090900421143px;">I'm going to tell you how I messed up an analysis. But in the process, I learned some new lessons and was reminded of some older ones I should remember to apply more carefully. </p>
This Failure Starts in a Victory
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3e3a70cd6b6094eda21615f6eee14c0f/pharoah.jpg" style="line-height: 18.9090900421143px; border-width: 1px; border-style: solid; margin: 10px 15px; float: right; width: 280px; height: 296px;" /></p>
<p style="line-height: 18.9090900421143px;"><span style="line-height: 18.9090900421143px;">My mistake originated in the 2015 Triple Crown victory of American Pharoah. I'm no racing enthusiast, but I knew this horse had ended almost four decades of Triple Crown disappointments, and that was exciting. </span><span style="line-height: 18.9090900421143px;">I'd never seen a </span><a href="http://blog.minitab.com/blog/the-statistics-game/triple-crown-odds-ill-have-another" style="line-height: 18.9090900421143px;">Triple Crown</a><span style="line-height: 18.9090900421143px;"> won before. It hadn't happened since 1978. </span></p>
<p style="line-height: 18.9090900421143px;">So when an acquaintance asked to contribute a guest post to the Minitab Blog that compared American Pharoah with previous Triple Crown contenders, including the record-shattering Secretariat, who took the Triple Crown in 1973, I eagerly accepted. </p>
<p style="line-height: 18.9090900421143px;">In reviewing the post, I checked and replicated the contributor's analysis. It was a fun post, and I was excited about publishing it. But a few days after it went live, I had to remove it: the analysis was not acceptable. </p>
<p style="line-height: 18.9090900421143px;">To explain how I made my mistake, I'll need to review that analysis. </p>
Comparing American Pharoah and Secretariat
<p style="line-height: 18.9090900421143px;"><span style="line-height: 18.9090900421143px;">In the post, we used Minitab's </span><a href="http://www.minitab.com/products/minitab/" style="line-height: 18.9090900421143px;">statistical software</a><span style="line-height: 18.9090900421143px;"> to compare Secretariat's performance to other winners of Triple Crown races. </span></p>
<p style="line-height: 18.9090900421143px;">Since 1926, the Belmont Stakes has been the longest of the three races at 1.5 miles. The analysis began by charting 89 years of winning horse times<span style="line-height: 1.6;">:</span><span style="line-height: 18.9090900421143px;"> </span></p>
<p style="line-height: 18.9090900421143px;"><img src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ad64da996c235ee5ff8cb4c3cef66292/belmont1.png" style="width: 500px; height: 334px;" /></p>
<p style="line-height: 18.9090900421143px;"><span style="line-height: 1.6;">Only two data points were outside of the I-chart's control limits:</span></p>
<ul style="line-height: 18.9090900421143px;">
<li>The fastest winner, Secretariat's 1973 time of 144 seconds</li>
<li>The slowest winner, High Echelon's 1970 time of 154 seconds</li>
</ul>
<p style="line-height: 18.9090900421143px;">The average winning time was 148.81 seconds, which Secretariat beat by more than 4 seconds. </p>
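<p>For readers who want to see where I-chart control limits come from, here is a small Python sketch using the common moving-range formula: the center line plus or minus 2.66 times the average moving range. The winning times below are hypothetical stand-ins, not the actual Belmont Stakes data, and the function name is only illustrative.</p>

```python
from statistics import mean

def individuals_chart_limits(values):
    """Return (center line, lower limit, upper limit) for an individuals (I) chart."""
    center = mean(values)
    # Moving range: absolute difference between each pair of consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    average_moving_range = mean(moving_ranges)
    # 2.66 = 3 / d2, where d2 = 1.128 for moving ranges of two observations.
    half_width = 2.66 * average_moving_range
    return center, center - half_width, center + half_width

# Hypothetical winning times in seconds (not the real race data).
times = [148, 149, 148, 149, 148, 149, 142, 149, 148, 149, 148, 149]

center, lower, upper = individuals_chart_limits(times)
out_of_control = [t for t in times if t < lower or t > upper]

print(f"center={center:.2f}, limits=({lower:.2f}, {upper:.2f})")
print("points outside the limits:", out_of_control)
```

<p>In this made-up series only the unusually fast 142-second time falls below the lower limit, which mirrors how a single exceptional race can stand out on the chart.</p>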
Applying a Capability Approach to the Race Data
<p style="line-height: 18.9090900421143px;">Next, the analysis approached the data from a capability perspective: Secretariat's time was used as a lower spec limit, and the analysis sought to assess the probability of another horse beating that time. </p>
<p style="line-height: 18.9090900421143px;">The way you assess capability depends on the distribution of your data, and a normality test in Minitab showed this data to be nonnormal<span style="line-height: 18.9090900421143px;">. </span></p>
<p style="line-height: 18.9090900421143px;"><img src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/89d338659c8cace002fe777633a238cf/belmont2.png" style="width: 500px; height: 334px;" /></p>
<p style="line-height: 18.9090900421143px;"><span style="line-height: 18.9090900421143px;">When you run Minitab's normal capability analysis, you can elect to apply the Johnson transformation, which can automatically transform many nonnormal distributions before the capability analysis is performed. This is an extremely convenient feature, but here's where I made my mistake. </span></p>
<p style="line-height: 18.9090900421143px;">Running the capability analysis with Johnson transformation, using Secretariat's 144-second time as a lower spec limit, produced the following output:</p>
<p style="line-height: 18.9090900421143px;"><img src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0f3d2967b87743714821fa47e6bd999d/belmont4.png" style="width: 500px; height: 375px;" /></p>
<p style="line-height: 18.9090900421143px;">The analysis found a 0.36% chance of any horse beating Secretariat's time, making it very unlikely indeed. </p>
<p>The same method was applied to Kentucky Derby and Preakness data. </p>
<p style="line-height: 18.9090900421143px;"><img src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6268cce550e1f97de81d0889da797814/belmont5.png" style="width: 500px; height: 375px;" /></p>
<p style="line-height: 18.9090900421143px;">We found a 5.54% chance of a horse beating Secretariat's Kentucky Derby time.</p>
<p style="line-height: 18.9090900421143px;"><img src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/21fda483b790f76e051ddb22359cbfe2/belmont6.png" style="width: 500px; height: 375px;" /></p>
<p style="line-height: 18.9090900421143px;">We found a 3.5% probability of a horse beating Secretariat's Preakness time.</p>
<p style="line-height: 18.9090900421143px;">Despite the billions of dollars and countless hours of effort spent trying to make thoroughbred horses faster over the past 43 years, no one has yet beaten “Big Red,” as Secretariat was known. So the analysis indicated that American Pharoah may be a great horse, but he is no Secretariat. </p>
<p style="line-height: 18.9090900421143px;"><span style="line-height: 1.6;">That conclusion may well be true...but it turns out we can't use <em>this</em> analysis to make that assertion. </span></p>
My Mistake Is Discovered, and the Analysis Unravels
<p style="line-height: 18.9090900421143px;">Here's where I start chewing those crow feathers. A day or so after sharing the post about American Pharoah, a reader sent the following comment: </p>
<p style="line-height: 18.9090900421143px; margin-left: 40px;"><em>Why does Minitab allow a Johnson Transformation on this data when using <strong>Quality Tools > Capability Analysis > Normal > Transform</strong>, but does not allow a transformation when using <strong>Quality Tools > Johnson Transformation</strong>? Or could I be doing something wrong? </em></p>
<p style="line-height: 18.9090900421143px;">Interesting question. Honestly, i<span style="line-height: 18.9090900421143px;">t hadn't even occurred to me to try to run the Johnson transformation on the data by itself. </span></p>
<p style="line-height: 18.9090900421143px;"><span style="line-height: 18.9090900421143px;">But if the Johnson Transformation worked when performed as part of the capability analysis, it ought to work when applied outside of that analysis, too. </span></p>
<p style="line-height: 18.9090900421143px;"><span style="line-height: 18.9090900421143px;">I suspected the person who asked th</span><span style="line-height: 18.9090900421143px;">is question might have just checked a wrong option in the dialog box. </span>So I tried running the Johnson Transformation on the data by itself.</p>
<p style="line-height: 18.9090900421143px;">The following <span style="line-height: 18.9090900421143px;">note appeared in Minitab's session window: </span></p>
<p style="line-height: 18.9090900421143px;"><img alt="no transformation is made" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2892f3daa1549df56defbdd4fe9dc48a/no_transformation.gif" style="line-height: 18.9090900421143px; width: 500px; height: 55px;" /></p>
<p style="line-height: 18.9090900421143px;">Uh oh. </p>
<p style="line-height: 18.9090900421143px;">Our reader <em>hadn't</em> done anything wrong, but it looked like I had made an error somewhere. But where?</p>
<p style="line-height: 18.9090900421143px;">I'll show you exactly where I made my mistake in <a href="http://blog.minitab.com/blog/understanding-statistics/lessons-from-a-statistical-analysis-gone-wrong-part-2">my next post.</a> </p>
<p style="font-size: 9px;">Photo of American Pharoah used under Creative Commons license 2.0. Source: Maryland GovPics <a href="https://www.flickr.com/people/64018555@N03" target="_blank">https://www.flickr.com/people/64018555@N03</a> </p>
Data Analysis, Fun Statistics, Hypothesis Testing, Statistics, Statistics in the News
Tue, 14 Jul 2015 12:00:00 +0000
Eston Martz
Time of Game: Are MLB Games Getting Any Shorter?
http://blog.minitab.com/blog/starting-out-with-statistical-software/time-of-game-are-mlb-games-getting-any-shorter
<p>Over the past few years, the average length of an MLB game has been steadily increasing. We can create a quick time series plot in Minitab <a href="http://www.minitab.com/products/minitab">Statistical Software</a> to display this:</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/732ead34-1005-4470-b034-d7f8b87fabcf/Image/5d6c7b2edfd1611a6daddbf93f4deb76/lengthofgame.jpg" style="width: 576px; height: 384px;" /></p>
<p><span style="line-height: 1.6;">As games have been lasting longer, there's been a feeling shared by many that this was a negative. Games seemed to drag on, with a lot of unnecessary stoppages and breaks. </span></p>
<p><span style="line-height: 1.6;"><img alt="game lasts into the night" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/354e61083ab5a1abf9eff587c412ccea/diamond.jpg" style="margin: 10px 15px; float: right; width: 300px; height: 230px;" />To combat this trend, and to try to speed up games to make them more accessible to casual fans, a few different rules have gone into effect this year to help increase the pace of games. First, the batter is now required to keep one foot in the batter's box at all times (with a few exceptions). Additionally, there is a clock that runs between innings and pitching changes to make sure that the game restarts in a timely manner.</span></p>
<p>But are these rules having an effect at all? We can look at the time of game data for games played in the first month of the season, and see if the games have been any shorter. We can use a 1-sample t-test within Minitab to determine if the average game length is less than <span><a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-significance-levels-alpha-and-p-values-in-statistics">a certain hypothesized value</a></span>; in our case, we can look and see if it's less than last year's average. </p>
<p>I have created a data set that has game time for every game played so far in 2015 (up through April 29). In Minitab, we can go to <strong>Stat > Basic Statistics > 1-Sample t...</strong> and fill out the dialog box as follows:</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/732ead34-1005-4470-b034-d7f8b87fabcf/Image/5be1455e1a6e1ab7452329871f801a00/dialog.png" /></p>
<p>We check the box to perform our hypothesis test. The hypothesized mean we're testing against is the average time of game (in minutes) from 2014, which was 187.8.</p>
<p>Now we want to click 'Options' and change our hypothesis to "less than." Why? A one-tailed test allots all of our alpha to determining significance in one specific direction. In a one-tailed test, we are testing the possibility of a relationship in one direction and ignoring the possibility of a relationship in the other. Statistically, by <em>not </em>looking for an effect in one direction, we have more power to detect an effect in the other direction. In this case, we are ignoring the possibility that the mean time of games may be greater than last year's. </p>
<p>Here are our results:</p>
<p style="margin-left: 40px;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/732ead34-1005-4470-b034-d7f8b87fabcf/Image/783aab154c24fd0d5dcc63be5ee7e007/ttest.PNG" style="width: 531px; height: 130px;" /></p>
<p>Look at the mean, and the upper bound. The mean time of games played so far is about 177 minutes, almost a full 10 minutes shorter! The upper bound indicates that we are 95% confident that the true mean is less than 180 minutes, clocking in at under 3 hours.</p>
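<p>If you are curious how a one-sided 1-sample t-test is calculated, the Python sketch below shows the steps. The game lengths are hypothetical (the data set used for the output above is not reproduced here), and the function names are illustrative. The P value is found by numerically integrating the t density with Simpson's rule, which is a simple stand-in for what statistical software does with more refined methods.</p>

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=2000):
    """CDF of Student's t, using Simpson's rule on the density between 0 and |t|."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

def one_sample_t_less(sample, hypothesized_mean):
    """Test H0: mean = hypothesized_mean against H1: mean < hypothesized_mean."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    t_stat = (mean(sample) - hypothesized_mean) / standard_error
    p_value = t_cdf(t_stat, n - 1)  # lower-tail area for a "less than" test
    return t_stat, p_value

# Hypothetical game lengths in minutes (not the article's data set).
game_minutes = [170, 182, 175, 190, 168, 177, 181, 173]

# 187.8 minutes is the 2014 average quoted in the article.
t_stat, p_value = one_sample_t_less(game_minutes, 187.8)
print(f"t={t_stat:.2f}, one-sided p={p_value:.4f}")
```

<p>Because the alternative is "less than," only the lower tail counts toward the P value, which is the one-tailed choice described earlier.</p>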
<p>Based on early season results, it appears that the new rules are serving their intended purpose. </p>
<p>What hypotheses—sports-related or otherwise<span style="line-height: 18.9090900421143px;">—could you use a 1-sample t-test to examine?</span></p>
Data Analysis, Fun Statistics, Hypothesis Testing, Statistics in the News
Fri, 15 May 2015 12:00:00 +0000
Eric Heckman
Banned: P Values and Confidence Intervals! A Rebuttal, Part 2
http://blog.minitab.com/blog/adventures-in-statistics/banned-p-values-and-confidence-intervals-a-rebuttal-part-2
<p>In <a href="http://blog.minitab.com/blog/adventures-in-statistics/banned-p-values-and-confidence-intervals-a-rebuttal-part-1">my previous post</a>, I wrote about the hypothesis testing ban in the <em>Journal of Basic and Applied Social Psychology.</em> I showed how P values and confidence intervals provide important information that descriptive statistics alone don’t provide. In this post, I'll cover the editors’ concerns about hypothesis testing and how to avoid the problems they describe.</p>
<p>The editors describe hypothesis testing as "invalid" and the significance level of 0.05 as a “crutch” for weak data. They claim that it is a bar that is “too easy to pass and sometimes serves as an excuse for lower quality research.” They also bemoan the fact that sometimes the initial study obtains a significant P value but follow-up replication studies can fail to obtain significant results.</p>
<p>Ouch, right?</p>
<p>Their arguments against hypothesis testing focus on the following:</p>
<ol>
<li>You can’t determine the probability that either the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/basics/null-and-alternative-hypotheses/" target="_blank">null hypothesis or the alternative hypothesis</a> is true.</li>
<li>Studies that attempt to replicate previous significant findings do not always obtain significant results.</li>
</ol>
<p>These issues are nothing new and aren't show stoppers for hypothesis testing. In fact, I believe using them to ban null hypothesis testing represents a basic misunderstanding of both how to correctly use hypothesis test results and how the scientific process works.</p>
P Values Are Frequently Misinterpreted and This Leads to Problems
<p>P values are not "invalid" but they do answer a different question than what many readers realize. There is a common misconception that the P value represents the probability that the null hypothesis is true. Under this mistaken understanding, a P value of 0.04 would indicate there is a 4% probability of a false positive when you reject the null hypothesis. This is <strong>WRONG</strong>!</p>
<p>The question that a P value <em>actually</em> answers is: <em>If </em>the null hypothesis is true, are my data unusual?</p>
<p>The correct interpretation for a P value of 0.04 is that <em>if the null hypothesis is true</em>, you would obtain the observed effect or more in 4% of the studies due to random sampling error. In other words, the observed sample results are unlikely if there truly is no effect in the population.</p>
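<p>One way to check this interpretation is a small simulation. The Python sketch below runs many hypothetical studies in which the null hypothesis is true by construction and counts how often the P value is 0.04 or less. The z-test with a known standard deviation, the sample size, and the number of studies are all simplifying assumptions made for the example.</p>

```python
import math
import random
from statistics import NormalDist, mean

def two_sided_z_p_value(sample, null_mean, sigma):
    """P value of a two-sided z-test when the population standard deviation is known."""
    z = (mean(sample) - null_mean) / (sigma / math.sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))

random.seed(1)                      # fixed seed so the run is repeatable
studies, sample_size = 10_000, 20   # illustrative choices

p_values = []
for _ in range(studies):
    # The null hypothesis is true here: the population mean really is 0.
    sample = [random.gauss(0, 1) for _ in range(sample_size)]
    p_values.append(two_sided_z_p_value(sample, null_mean=0, sigma=1))

share_at_or_below_004 = sum(p <= 0.04 for p in p_values) / studies
print(f"Share of null-true studies with p <= 0.04: {share_at_or_below_004:.3f}")
```

<p>The share should land close to 4%. That is a statement about how often such data arise when the null is true, not about how likely the null is to be true given the data.</p>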
<p>The actual false positive rate associated with a P value of 0.04 depends on a variety of factors but it is typically at least 23%. Unfortunately, the common misconception creates the illusion of substantially more evidence against the null hypothesis than is justified. You actually need a P value around 0.0027 to achieve an error rate of around 4.5%, which is close to the rate that many mistakenly attribute to a P value of 0.05.</p>
<p>The higher-than-expected false positive rate is the basis behind the editors’ criticisms that P values near 0.05 are a “crutch” and “too easy to pass.” However, this is due to misinterpretation rather than a problem with P values. The answer isn’t to ban P values, but to learn how to correctly interpret and use the results.</p>
Failure to Replicate
<p>The common illusion described above ties into the second issue of studies that fail to replicate significant findings. If the false positive rate is higher than expected, it makes sense that the number of follow-up studies that can’t replicate the previously significant results will also be higher than expected.</p>
<p>Another related common misunderstanding is that once you obtain a significant P value, you have a proven effect. Trafimow claims in an earlier editorial that once a significant effect is published, "it becomes sacred." This claim misrepresents the scientific method because there is no magic significance level that distinguishes between the studies that have a true effect and those that don’t with 100% accuracy.</p>
<p>A P value near 0.05 simply indicates that the result is worth another look, but it’s nothing you can hang your hat on by itself. Instead, it’s all about repeated testing to lower the error rate to an acceptable level.</p>
<p>You <em>always</em> need repeated testing to prove the truth of an effect!</p>
How to Use Hypothesis Tests Correctly
<div style="float: right; width: 250px; margin: 25px 25px;">
<p><img alt="water filter" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/f9f46589d93374e78dc6c9dcbfdcade5/soma_carafe_w1024.jpeg" style="float: right; width: 250px; height: 177px;" /> <em>Keep filtering until the results are clean and clear!</em></p>
</div>
<p>How does replication work with hypothesis tests and the false positive rate? Simulation studies show that the lower the P value, the greater the reduction in the probability that the null hypothesis is true from the beginning of the experiment to the end.</p>
<p>With this in mind, think of hypothesis tests as a filter that allows you to progressively lower the probability that the null hypothesis is true each time you obtain significant results. With repeated testing, we can filter out the false positives, as I illustrate below.</p>
<p>We generally don’t know the probability that a null hypothesis is true, but I’ll run through a hypothetical scenario based on the simulation studies. Let’s assume that initially there is a 50% chance that the null hypothesis is true. You perform the first experiment and obtain significant results. Let’s say this reduces the probability that the null is true down to 25%. Another study tests the same hypothesis, obtains significant results, and lowers the probability of a true null hypothesis even further to 10%.</p>
<p>Wash, rinse, and repeat! Eventually the probability that the null is true becomes a tiny value. This shows why significant results need to be replicated in order to become trustworthy findings.</p>
<p>The actual rate of reduction can be faster or slower than the example above. It depends on various factors including the initial probability of a true null hypothesis and the exact P value of each experiment. I used conservative P values near 0.05.</p>
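<p>The filtering idea can be sketched in a few lines of Python. This is only an illustration under a loud assumption: each significant study multiplies the odds that the null is true by one third. That factor is not from the simulation studies. It is chosen because it reproduces the hypothetical 50%, 25%, 10% sequence in the scenario above.</p>

```python
def update_null_probability(prob_null, odds_multiplier):
    """Scale the odds that the null is true, then convert back to a probability."""
    odds = prob_null / (1 - prob_null)
    new_odds = odds * odds_multiplier
    return new_odds / (1 + new_odds)

# Assumption for illustration only: each significant study cuts the odds of a
# true null to one third. Real factors depend on the P value and the study.
ODDS_MULTIPLIER = 1 / 3

probability = 0.50                  # assumed starting probability that the null is true
history = [probability]
for _ in range(4):                  # four significant studies in a row
    probability = update_null_probability(probability, ODDS_MULTIPLIER)
    history.append(probability)

print([round(p, 3) for p in history])
```

<p>Under this assumption the probability of a true null falls from 50% to 25%, then 10%, then about 3.6% and 1.2%, so a handful of consistent replications is enough to make it small.</p>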
<p>Of course, there’s always the possibility that the initial significant finding won’t be replicated. <em>This is a normal part of the scientific process and not a problem. </em>You won’t know for sure until a subsequent study tries to replicate a significant result!</p>
<p>Reality is complex and we’re trying to model it with samples. Conclusively proving a hypothesis with a single study is unlikely. So, don’t expect it!</p>
<p style="margin-left: 40px;">"A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance." <br />
<em><span style="line-height: 1.6;">—Sir Ronald A. Fisher, original developer of P values.</span></em></p>
Don't Blame the P Values for Poor Quality Studies
<p>You can’t look at a P value to determine the quality level of a study. The overall quality depends on <em>many </em>factors that occur well before the P value is calculated. A P value is just the end result of a long process.</p>
<p>The factors that affect the quality of a study include the following: theoretical considerations, experimental design, variables measured, sampling technique, sample size, measurement precision and accuracy, data cleaning, and the modeling method.</p>
<p>Any of these factors can doom a study before a P value is even calculated!</p>
<p>The blame that the editors place on P values for low quality research appearing in their journal is misdirected. This is a peer-reviewed journal and it’s the reviewers’ job to assess the quality of each study and publish only those with merit.</p>
Four Key Points!
<ol>
<li>Hypothesis test results such as P values and confidence intervals provide important information in addition to descriptive statistics.</li>
<li>But you need to interpret them correctly.</li>
<li>Significant results must be replicated to be trustworthy.</li>
<li>To evaluate the quality of a study, you must assess the entire process rather than the P value.</li>
</ol>
How to Avoid Common Problems with Hypothesis Test Results
<p>Hypothesis tests and statistical output such as P values and confidence intervals are powerful tools. Like any tool, you need to use them correctly to obtain good results. Don't ban the tools. Instead, change the bad practices that surround them. <span style="line-height: 1.6;">Please follow these links for more details and references.</span></p>
<p><a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values" target="_blank">How to Correctly Interpret P Values</a>: Just as the title says, this post helps you to correctly interpret P values and avoid the mistakes associated with the incorrect interpretations.</p>
<p><a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics" target="_blank">Understanding Hypothesis Tests</a>: The graphical approach in this series of three posts provides a more intuitive understanding of how hypothesis testing works and what statistical significance truly means.</p>
<p><a href="http://blog.minitab.com/blog/adventures-in-statistics/not-all-p-values-are-created-equal" target="_blank">Not all P Values are Created Equal</a>: If you want to better understand the false positive rate associated with different P values and the factors that affect it, this post is for you! This post also shows you how lower P values reduce the probability of a true null hypothesis.</p>
<p><a href="http://blog.minitab.com/blog/adventures-in-statistics/five-guidelines-for-using-p-values" target="_blank">Five Guidelines for Using P Values</a>: The journal editors raise issues about how P values can be abused. These are real issues when P values are used incorrectly. However, there’s no need to banish them! This post provides simple guidelines for how to navigate these issues and avoid common problems.</p>
<p><em>The photo of the water filter is by the Wikimedia user TheMadBullDog and used under this <a href="http://creativecommons.org/licenses/by-sa/3.0/deed.en" target="_blank">Creative Commons license</a>.</em></p>
Data Analysis, Hypothesis Testing, Learning, Statistics, Statistics Help, Stats
Thu, 14 May 2015 12:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics/banned-p-values-and-confidence-intervals-a-rebuttal-part-2
Jim Frost
Improving Recycling Processes at Rose-Hulman, Part III
http://blog.minitab.com/blog/real-world-quality-improvement/improving-recycling-processes-at-rose-hulman-part-iii
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/fa7a4559e547be217d5fa38f61c978c1/landfill.jpg" style="float: right; width: 350px; height: 253px; margin: 10px 15px;" />In previous posts, I discussed the results of a recycling project done by Six Sigma students at Rose-Hulman Institute of Technology last spring. (If you’re playing catch up, you can read <a href="http://blog.minitab.com/blog/real-world-quality-improvement/a-little-trash-talk3a-improving-recycling-processes-at-rose-hulman" target="_blank">Part I</a> and <a href="http://blog.minitab.com/blog/real-world-quality-improvement/a-little-trash-talk%3A-improving-recycling-processes-at-rose-hulman%2C-part-ii" target="_blank">Part II</a>.)</p>
<p>The students did an awesome job reducing the amount of recycling that was thrown into the normal trash cans across all of the institution’s academic buildings. At the end of the spring quarter (2014), recyclable items made up 24% of the trash (by weight). At the beginning of that quarter the figure was 36%, so you can see that they were very successful in reducing this percentage!</p>
<p>The fall quarter (2015) brought a new set of Six Sigma students to Rose-Hulman who were just as dedicated to reducing the amount of recycling thrown into normal trash cans, and I want to cover their success in this post, as well as some of the neat statistical methods they used when completing their project.</p>
Fall 2015 goals
<p>This time around, the students wanted to at least maintain, or improve on, the percentage that the spring quarter (2014) students were able to achieve. They set out with a specific goal to reduce the amount of recycling in the trash to 20% by weight.</p>
<p>In order to further reduce the recyclables in the academic buildings in fall 2015, the standard “Define, Measure, Analyze, Improve, Control” (DMAIC) methodology of Six Sigma was once again implemented. The main project goal focused on standardizing the recycling process within the buildings, and their plan to reduce the amount of recyclables focused on optimizing the operating procedure for collecting recyclables in all academic building areas (excluding classrooms) where trash and recycling are collected.</p>
<p>Many of the same DMAIC tools that were used by spring 2014 students were also used here, including—<a href="http://support.minitab.com/quality-companion/3/help-and-how-to/run-projects/brainstorming/ct-tree/" target="_blank">Critical to Quality Diagrams</a>, <a href="http://support.minitab.com/quality-companion/3/help-and-how-to/run-projects/maps/process-map/" target="_blank">Process Maps</a>, <a href="http://blog.minitab.com/blog/real-world-quality-improvement/spicy-statistics-and-attribute-agreement-analysis" target="_blank">Attribute Agreement Analysis</a>, <a href="http://blog.minitab.com/blog/marilyn-wheatleys-blog/evaluating-a-gage-study-with-one-part-v2" target="_blank">Gage R&R</a>, Statistical Plots, <a href="http://blog.minitab.com/blog/adventures-in-software-development/risk-based-testing-at-minitab-using-quality-companions-fmea" target="_blank">FMEA</a>, <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-tutorial-and-examples" target="_blank">Regression</a>—among many others.</p>
Making and measuring improvements
<p>The spring 2014 initiative added recycling bins to every classroom, which created a measurable improvement. The fall 2015 effort focused on improvement through <em>standardization of operation</em>. For example, many areas in the academic buildings suffer from random placement and arrangement of trash cans and recycling bins. The students thought standardization of bin areas (one trash, one plastic/aluminum recycling, and one paper recycling) would lessen the confusion of recycling, and clear signage and stickers on identically shaped trash cans and recycling bins would be better visual cues of where to place waste of both kinds.</p>
<p>For fall 2015, there were seven teams, and they were assigned different academic building floors (not including classrooms) and common areas. Unlike the spring 2014 data collection, the teams did not combine the trash from their assigned areas. They treated each recycling station as a unique data point.</p>
<p>After implementing the improvements to standardize the bins, the teams collected data for four days across twenty-nine total stations. Thus, there were a total of 116 fall 2015 improvement percentages. The fall 2015 students used the post-improvement percentage of recyclables in the trash from spring 2014 (24%) as their baseline for determining improvement in fall 2015.</p>
<p>The descriptive statistics for the percentage of recyclables (by weight) in the trash were as follows:</p>
<p><img src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/5c77690aaaff21d0b33eb5083f82074e/descriptive_stats.jpg" style="border-width: 0px; border-style: solid; width: 550px; height: 67px;" /></p>
<p>Below, the students put together a histogram and a boxplot of the data using <a href="http://www.minitab.com/products/minitab/features/" target="_blank">Minitab Statistical Software</a>. Over half of the station measurements (61 out of 116) showed less than 5% recyclables in the trash, and 46 of the 116 showed no recyclables at all. The third quartile (16.6%) meant that 75% of the measurements had less than 16.6% recyclables. The descriptive statistics above showed that the sample mean was much larger than the sample median. The graphs confirmed why: the data are strongly positively skewed, which pulls the mean above the median.</p>
<p><img src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/4e730181a9288e531ff9caf69a347dd0/histogram.jpg" style="border-width: 0px; border-style: solid; width: 624px; height: 206px;" /></p>
<p>Even though the 116 data points didn’t follow a normal distribution and there was a large mound of 0’s as part of the distribution from collection spots that had no recyclables, the students trusted that the <a href="http://blog.minitab.com/blog/understanding-statistics/how-the-central-limit-theorem-works" target="_blank">Central Limit Theorem</a> with a sample size of 116 would generate a sampling distribution of the means that was normally distributed. Because of the large sample size and unknown standard deviation, they used a <em>t</em> distribution to create a 95% confidence interval for the true mean percentage of recyclables in the trash for fall 2015.</p>
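<p>A quick simulation gives a feel for why the students could lean on the Central Limit Theorem here. The sketch below does not use the students' data. It assumes a made-up zero-inflated, right-skewed distribution (about 40% zeros, the rest exponential with a mean of 23) and compares the skewness of individual values with the skewness of means of 116 values.</p>

```python
import random
import statistics

def skewness(values):
    """Population skewness: the average cubed z-score."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def draw_percentage(rng):
    """Hypothetical zero-inflated, right-skewed percentage of recyclables."""
    if rng.random() < 0.4:
        return 0.0
    return min(rng.expovariate(1 / 23.0), 100.0)

rng = random.Random(1)  # fixed seed so the run is repeatable

# Skewness of individual measurements
raw = [draw_percentage(rng) for _ in range(5000)]

# Skewness of the means of 2,000 samples of size 116
sample_means = [
    statistics.fmean([draw_percentage(rng) for _ in range(116)])
    for _ in range(2000)
]

print(f"skewness of raw values:   {skewness(raw):.2f}")
print(f"skewness of sample means: {skewness(sample_means):.2f}")
```

<p>Under these assumptions the individual values are heavily skewed, while the sample means are close to symmetric, which is the behavior the <em>t</em> interval relies on.</p>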
<p>Also using Minitab, they constructed the 95% confidence interval:</p>
<p><img src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/2ccf17f68f0055c32282c2020f2c9108/one_sample_t.jpg" style="border-width: 0px; border-style: solid; width: 423px; height: 48px;" /></p>
<p>The 95% confidence interval meant that the students were 95% certain that the interval [9.94, 18.22] contains the true mean percentage of recyclables in the trash for fall 2015. At an alpha level equal to 0.025, they were able to reject the null hypothesis, where H0: μ = 24% versus Ha: μ &lt; 24%, because 24% was not contained in the two-sided 95% confidence interval. (Remember that 24% was the mean percentage of recyclables in trash after the spring 2014 improvement phase.) By the same reasoning, the null hypothesis H0: μ = 20% versus Ha: μ &lt; 20% was also rejected, because 20% lies above the interval's upper limit of 18.22. This meant that they had met their goal to reduce the percentage of recyclables in the trash to below 20% for this project!</p>
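<p>The link between the interval and the two tests can be checked from the reported numbers alone. The sketch below works backward from the interval [9.94, 18.22] and n = 116. The critical value of 1.981 is an assumption (roughly the 97.5th percentile of a <em>t</em> distribution with 115 degrees of freedom), so the recovered standard error is approximate.</p>

```python
import math

ci_low, ci_high = 9.94, 18.22  # 95% confidence interval reported above
n = 116                        # number of measurements
t_crit = 1.981                 # assumed critical value for 115 degrees of freedom

mean = (ci_low + ci_high) / 2      # center of the interval
margin = (ci_high - ci_low) / 2    # half-width of the interval
std_error = margin / t_crit        # approximate standard error of the mean
std_dev = std_error * math.sqrt(n) # approximate sample standard deviation

def t_statistic(null_value):
    """t statistic for H0: mu = null_value."""
    return (mean - null_value) / std_error

print(f"mean = {mean:.2f}, std error = {std_error:.2f}, std dev = {std_dev:.1f}")
for null_value in (24.0, 20.0):
    t = t_statistic(null_value)
    reject = t < -t_crit  # lower-tailed test at alpha = 0.025
    print(f"H0: mu = {null_value:.0f}%  t = {t:.2f}  reject: {reject}")
```

<p>Both hypothesized values sit above the upper limit of the interval, so both lower-tailed tests reject at the 0.025 level.</p>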
Continuing to analyze the data
<p>The students also subgrouped their data by collection day. Each day consisted of data from 29 recycling stations. The comparative boxplots and individual value plots below show the percentage of recyclables in the trash across the four collection dates. (The horizontal dotted line in the boxplot is the mean from spring 2014’s post-improvement data.)</p>
<p><img src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/664e8bf0f443d278376e71a70817e727/ivp.jpg" style="border-width: 0px; border-style: solid; width: 624px; height: 207px;" /></p>
<p>Though all four collection days have sample means less than 24%, it’s obvious from the boxplots that the first three collection days are clearly below 24%, and the medians from all four days are less than 11%. The individual value plots reveal the large number of 0’s on each day, which represented collection spots that had no recyclables. Both graphs display the positively skewed nature of the data. Because of the positive skewness, each day’s mean is much larger than its median.</p>
How capable was the process?
<p>Next, the students ran a <a href="http://blog.minitab.com/blog/real-world-quality-improvement/using-statistics-to-show-your-boss-process-improvements" target="_blank">process capability analysis</a> for the seven areas where trash was collected over four days:</p>
<p><img src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/8f9b85a55164f9e957809a8be1eef1c0/process_cap.jpg" style="border-width: 0px; border-style: solid; width: 465px; height: 347px;" /></p>
<p>The process capability indices were Pp = 0.48 and Ppk = 0.42. (The Pp value corresponds to a 1.44 Sigma Level, while the Ppk value corresponds to a 1.26 Sigma Level.) Recall that the previous Ppk value after improvements in <a href="http://blog.minitab.com/blog/real-world-quality-improvement/a-little-trash-talk%3A-improving-recycling-processes-at-rose-hulman%2C-part-ii" target="_blank">spring 2014</a> was 0.22. The fall 2015 index is almost double that value!</p>
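<p>The sigma levels quoted above follow from multiplying each capability index by 3. A minimal sketch of that arithmetic, using only the index values stated in the post:</p>

```python
def sigma_level(capability_index):
    """Sigma level implied by a capability index (three sigma per unit of index)."""
    return 3 * capability_index

pp, ppk = 0.48, 0.42      # fall 2015 indices from the post
ppk_spring = 0.22         # spring 2014 post-improvement Ppk from the post

print(f"Pp  = {pp}: sigma level = {sigma_level(pp):.2f}")
print(f"Ppk = {ppk}: sigma level = {sigma_level(ppk):.2f}")
print(f"Ppk improvement ratio = {ppk / ppk_spring:.2f}")
```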
<p>The students knew that they still needed to account for the total weight of the trash and recyclables by calculating the percentage of recyclables per station. Some collection stations with the highest percentage of recyclables had the lowest total weight, while some stations with the lowest percentage of recyclables had the highest total weight. Instead of strictly using a capability index to indicate their improvement, they incorporated a <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-tutorial-and-examples" target="_blank">regression</a> model for the trash weight versus the total weight of trash and recyclables to show that the percentage of recyclables in the trash was less than 20%.</p>
<p>The 95% confidence interval for the true mean slope of the regression line was [0.856, 0.954]. The students were 95% certain that the trash weight was somewhere between 0.856 and 0.954 of the total weight of the collection. Hence, the recycling weight was between 0.046 and 0.144 of the total weight. Even the upper end of this range is below 20%! From this, they were able to state through yet another type of analysis that there was a statistically significant improvement over the spring 2014 recycling project, and that they met their goal of reducing the percentage of recyclables in the trash to below 20%. Compared to the spring 2014 project where 24% of the trash was recyclables, the fall 2015 students saved <em>at least</em> 4% more recyclables from ending up in the local landfill!</p>
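<p>Taking the slope interval [0.856, 0.954] at face value, the recyclable share is one minus the trash share, so the interval's limits swap places when converted. A short sketch of that conversion:</p>

```python
slope_low, slope_high = 0.856, 0.954  # 95% CI for trash weight / total weight
goal = 0.20                           # project goal for the recyclable share

# A high trash share means a low recyclable share, so the limits swap
recyclable_low = 1 - slope_high
recyclable_high = 1 - slope_low

print(f"recyclable share between {recyclable_low:.3f} and {recyclable_high:.3f}")
print(f"entire interval below goal: {recyclable_high < goal}")
```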
<p>For even more on this topic, be sure to check out Rose-Hulman student Peter Olejnik’s blog posts on how he and the recycling project team at the school used regression to evaluate project results:</p>
<p><a href="http://blog.minitab.com/blog/statistics-in-the-field/using-regression-to-evaluate-project-results%2C-part-1" target="_blank">Using Regression to Evaluate Project Results, part 1</a></p>
<p><a href="http://blog.minitab.com/blog/statistics-in-the-field/using-regression-to-evaluate-project-results%2C-part-2" target="_blank">Using Regression to Evaluate Project Results, part 2</a></p>
<p><em>Many thanks to Dr. Diane Evans for her contributions to this post!</em></p>
Data Analysis, Fun Statistics, Hypothesis Testing, Lean Six Sigma, Learning, Six Sigma, Statistics, Stats
Fri, 08 May 2015 12:00:00 +0000
http://blog.minitab.com/blog/real-world-quality-improvement/improving-recycling-processes-at-rose-hulman-part-iii
Carly Barry
Banned: P Values and Confidence Intervals! A Rebuttal, Part 1
http://blog.minitab.com/blog/adventures-in-statistics/banned-p-values-and-confidence-intervals-a-rebuttal-part-1
<p>Banned! In February 2015, editor David Trafimow and associate editor Michael Marks of the <em>Journal of Basic and Applied Social Psychology</em> <a href="http://www.tandfonline.com/doi/full/10.1080/01973533.2015.1012991#abstract" target="_blank">declared</a> that the null hypothesis statistical testing procedure is invalid. They promptly banned P values, confidence intervals, and hypothesis testing from the journal.</p>
<p>The journal now requires descriptive statistics and effect sizes. They also encourage large sample sizes, but they don’t require them.</p>
<p>This is the first of two posts in which I focus on the ban. In this post, I’ll start by showing how hypothesis testing provides crucial information that descriptive statistics alone just can't convey. In my next post, I’ll explain the editors' rationale for the ban—and why I disagree with them.</p>
P Values and Confidence Intervals Are Valuable!
<p>It’s really easy to show how P values and confidence intervals are valuable. Take a look at the graph below and determine which study found a true treatment effect and which one didn’t. The difference between the treatment group and the control group is the effect size, which is what the editors want authors to focus on.</p>
<p><img alt="Bar chart that compares the effect size of two studies" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/94164d874cc69ffe3763cf5cee64d47b/banned_pvalues.png" style="width: 576px; height: 384px;" /></p>
<p>Can you tell? The truth is that the results from both of these studies could represent either a true treatment effect or a random fluctuation due to sampling error.</p>
<p>So, how do you know? There are three factors at play.</p>
<ul>
<li><strong>Effect size</strong>: The larger the effect size, the less likely it is to be a random fluctuation. Clearly, Study A has a larger effect size. The large effect seems significant, but it’s not enough by itself.</li>
<li><strong>Sample size</strong>: A larger sample size allows you to detect smaller effects. If the sample size for Study B is large enough, its smaller treatment effect may very well be real.</li>
<li><strong>Variability in the data</strong>: The greater the variability, the more likely you’ll see large differences between the experimental groups due to random sampling error. If the variability in Study A is large enough, its larger difference may be attributable to random error rather than a treatment effect.</li>
</ul>
<p>The effect size from either study could be meaningful, or not, depending on the other factors. As you can see, there are scenarios where the larger effect size in Study A can be random error while the smaller effect size in Study B can be a true treatment effect.</p>
<p>Presumably, these statistics will all be reported under the journal's new focus on effect size and descriptive statistics. However, assessing different combinations of effect sizes, sample sizes, and variability gets fairly complicated. The ban forces journal readers to use a subjective eyeball approach to determine whether the difference is a true effect. And this is just for <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/tests-of-means/why-use-2-sample-t/" target="_blank">comparing two means</a>, which is about as simple as it can get! (How the heck would you even perform multiple regression analysis with only descriptive statistics?!)</p>
<p>Wouldn’t it be nice if there was some sort of statistic that incorporated all of these factors and rolled them into one objective number?</p>
<p>Hold on . . . that’s the P value! The <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values" target="_blank">P value</a> provides an objective standard for everyone assessing the results from a study.</p>
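<p>To make the interplay of the three factors concrete, here is a small sketch with invented numbers for Study A and Study B (the post does not give their sample sizes or standard deviations). It computes a Welch-style <em>t</em> statistic from summary statistics and, as a simplification, converts it to a two-sided P value with a normal approximation, which is rough for small samples.</p>

```python
import math
from statistics import NormalDist

def welch_t(mean_diff, sd_treat, sd_ctrl, n_treat, n_ctrl):
    """t statistic for a difference in means, allowing unequal variances."""
    std_error = math.sqrt(sd_treat**2 / n_treat + sd_ctrl**2 / n_ctrl)
    return mean_diff / std_error

def approx_two_sided_p(t):
    """Normal approximation to the two-sided P value (rough for small n)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

# Hypothetical Study A: large effect, high variability, small samples
t_a = welch_t(mean_diff=10, sd_treat=25, sd_ctrl=25, n_treat=10, n_ctrl=10)
# Hypothetical Study B: small effect, low variability, large samples
t_b = welch_t(mean_diff=3, sd_treat=5, sd_ctrl=5, n_treat=200, n_ctrl=200)

p_a = approx_two_sided_p(t_a)
p_b = approx_two_sided_p(t_b)
print(f"Study A: t = {t_a:.2f}, approximate P = {p_a:.3f}")
print(f"Study B: t = {t_b:.2f}, approximate P = {p_b:.2g}")
```

<p>With these made-up inputs, the larger effect in Study A is not significant while the smaller effect in Study B is, which is exactly the scenario that a bar chart of effect sizes cannot reveal.</p>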
<p>Now, let’s consider two different experiments that have studied the same treatment and have come up with the following two estimates of the effect size.</p>
<table>
<tr><th>Effect Size Study C</th><th>Effect Size Study D</th></tr>
<tr><td>10</td><td>10</td></tr>
</table>
<p>Which estimate is better? It is pretty hard to say which 10 is better, right? Wouldn’t it be nice if there was a procedure that incorporated the effect size, sample size, and variability to provide a range of probable values <em>and</em> indicate the precision of the estimate?</p>
<p>Oh wait . . . that’s the confidence interval!</p>
<p>If we create the <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-confidence-intervals-and-confidence-levels" target="_blank">confidence intervals</a> for Study C [-5, 25] and Study D [8, 12], we gain some very valuable information. The confidence interval for Study C is both very wide and contains 0. This estimate is imprecise, and we can't rule out the possibility of no treatment effect. We're not learning anything from this study. On the other hand, the estimate from Study D is both very precise and statistically significant.</p>
<p>The two studies produced the same point estimate of the effect size, but the confidence interval shows that they're actually very different.</p>
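<p>As a rough illustration of how the same point estimate can produce such different intervals, the sketch below builds a normal-approximation confidence interval from an estimate, a standard deviation, and a sample size. The standard deviations and sample sizes are invented so that the results land near the intervals quoted for Study C and Study D. They are not taken from any real study.</p>

```python
import math
from statistics import NormalDist

def mean_ci(estimate, sd, n, confidence=0.95):
    """Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sd / math.sqrt(n)
    return estimate - half_width, estimate + half_width

# Hypothetical inputs: same estimate, very different precision
study_c = mean_ci(10, sd=30, n=16)    # noisy data, small sample
study_d = mean_ci(10, sd=10, n=100)   # less noise, larger sample

for name, (low, high) in (("Study C", study_c), ("Study D", study_d)):
    contains_zero = low <= 0 <= high
    print(f"{name}: [{low:.2f}, {high:.2f}]  contains 0: {contains_zero}")
```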
<p>Focusing solely on effect sizes and descriptive statistics is inadequate. P values and confidence intervals contribute truly important information that descriptive statistics alone can’t provide. That's why banning them is a mistake.</p>
<p><a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics">See a graphical explanation of how hypothesis tests work</a>.</p>
<p>If you'd like to see some fun examples of hypothesis tests in action, check out my posts about the Mythbusters!</p>
<ul>
<li><a href="http://blog.minitab.com/blog/adventures-in-statistics/busting-the-mythbusters-are-yawns-contagious">Busting the Mythbusters with Statistics: Are Yawns Contagious?</a></li>
<li><a href="http://blog.minitab.com/blog/adventures-in-statistics/using-hypothesis-tests-to-bust-myths-about-the-battle-of-the-sexes">Using Hypothesis Tests to Bust Myths about the Battle of the Sexes</a></li>
</ul>
<p>The editors do raise some legitimate concerns about the hypothesis testing process. In <a href="http://blog.minitab.com/blog/adventures-in-statistics/banned-p-values-and-confidence-intervals-a-rebuttal-part-2">part two</a>, I assess their arguments and explain why I believe a ban still is not justified.</p>
Data Analysis, Hypothesis Testing, Learning, Statistics, Statistics Help, Stats
Thu, 30 Apr 2015 12:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics/banned-p-values-and-confidence-intervals-a-rebuttal-part-1
Jim Frost
No Horsing Around with the Poisson Distribution, Troops
http://blog.minitab.com/blog/quality-data-analysis-and-statistics/no-horsing-around-with-the-poisson-distribution-troops
<p>In 1898, Russian economist Ladislaus Bortkiewicz published his first statistics book entitled <em>Das Gesetz der kleinen Zahlen</em><em>,</em> in which he included an example that eventually became famous for illustrating the Poisson distribution. <img alt="horses" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d76b2523d4819d498f66c0a10250df7b/horses.jpg" style="margin: 10px 15px; float: right; width: 250px; height: 154px;" /></p>
<p><span style="line-height: 18.9090900421143px;">Bortkiewicz </span>researched the annual deaths by horse kicks in the Prussian Army from 1875 to 1894. Data was recorded from 14 different army corps, with one being the Guard Corps. (According to one Wikipedia article on the subject, the Guard Corps may have been responsible for Prussia’s elite Guard units.) Let's take a closer look at his data and see what Minitab has to say using a Poisson goodness-of-fit test.</p>
<p>Here's the data set (thank you, <a href="http://www.math.uah.edu/stat/data/HorseKicks.html" target="_blank">University of Alabama in Huntsville</a>):</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/823a75a6dcf75897e05eaaa468d7350c/data_set.PNG" style="width: 997px; height: 466px;" /><br />
</p>
What Is the Poisson Distribution?
<p>As a review, the Poisson distribution is a discrete probability distribution for the counts of events that occur randomly in a given interval of time or space. The Poisson distribution only has one parameter, which is called lambda (or mean). To divert your attention just a little bit before we run our goodness-of-fit test, let’s look at how the distribution changes with different values of lambda. <span style="line-height: 1.6;">Go to </span><strong style="line-height: 1.6;">Graph > </strong><strong style="line-height: 1.6;">Probability Distribution Plot > </strong><strong style="line-height: 1.6;">View Single</strong><span style="line-height: 1.6;">. Select <em>Poisson </em>from the Distribution drop-down and enter in <em>.5</em> for the mean, then press <em>OK</em>:</span></p>
<p align="center"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/770927a9991955cf769dc73e508af2f4/pic1.png" style="width: 415px; height: 334px;" /></p>
<p>After I created my first plot, I created 3 more probability distribution plots with lambda at 2, 4, and 10. I then used Minitab’s Layout Tool under the <a href="http://blog.minitab.com/blog/marilyn-wheatleys-blog/getting-the-most-out-of-your-text-data-part-iii">Editor Menu</a> to combine the four graphs.</p>
<p>As lambda increases, the graphs begin to resemble a normally distributed curve:</p>
<p style="text-align: center;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/3e77aef5baac0edd39721bdf9a57a0de/pic2.png" style="width: 577px; height: 385px;" /></p>
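<p>If you don't have Minitab handy, the same comparison can be sketched in a few lines of Python. The snippet computes the Poisson probabilities directly from the formula and prints a crude text histogram for each lambda. The skewness of a Poisson distribution is 1 divided by the square root of lambda, so it shrinks toward 0 (the skewness of a normal curve) as lambda grows.</p>

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

for lam in (0.5, 2, 4, 10):
    skew = 1 / math.sqrt(lam)  # theoretical skewness of a Poisson distribution
    print(f"\nlambda = {lam} (skewness = {skew:.2f})")
    for k in range(0, 21):
        prob = poisson_pmf(k, lam)
        if prob >= 0.005:  # skip bars too small to draw
            print(f"{k:2d} {'#' * round(prob * 100)} {prob:.3f}")
```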
Does This Data Follow a Poisson Distribution?
<p><span style="line-height: 1.6;">Interesting, right? But let's get back on track and test if the overall data obtained by Bortkiewicz follows a Poisson distribution. </span></p>
<p><span style="line-height: 1.6;">I first had to stack the data from 14 columns into one column. This is done via </span><strong style="line-height: 1.6;">Data > Stack > Columns…</strong></p>
<p style="text-align: center;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/7a29a555220937423f326e10d1fe2475/pic3.png" style="width: 461px; height: 335px;" /></p>
<p><span style="line-height: 1.6;">With the data stacked, I went to</span><strong style="line-height: 1.6;"> Stat > Basic Statistics > Goodness-of-Fit for Poisson…, </strong><span style="line-height: 1.6;">filling out the dialog as shown below:</span></p>
<p align="center"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/b8dd1fc71f1cccbc277b3b35770b037e/pic4.png" style="width: 434px; height: 334px;" /></p>
<p>After I clicked OK, Minitab delivered the following results:</p>
<p style="text-align: center;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/b6e13edc14af23b6c5f1b27b1616e066/results.PNG" style="width: 339px; height: 216px;" /></p>
<p><span style="line-height: 1.6;">The Poisson mean, or lambda, is 0.70. This means that we can expect, on average, 0.70 deaths per corps per year. If I knew of these statistics and served in the army corps at that time, I would have treated my horse like gold. Anything my horse wants, it gets.</span></p>
<p>Further down you’ll see a table showing the observed counts and the expected counts for the number of deaths by horse kick. The expected counts mirror the observed counts pretty well. To further validate the claim that this data can be modeled by a Poisson distribution, we can use the p-value for the Goodness-of-Fit Test in the last section of the output.</p>
<p>The hypotheses for the Chi-Square Goodness-of-Fit test for Poisson are:</p>
<p style="margin-left: 40px;">H0: The data follow a Poisson distribution</p>
<p style="margin-left: 40px;">H1: The data do not follow a Poisson distribution</p>
<p>We are going to use an alpha level of 0.05. Since our p-value is greater than our alpha, we can say that we do not have enough evidence to reject the null hypothesis, which is that the horse kick deaths per year follow a Poisson distribution.</p>
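<p>For readers who want to see the mechanics, here is a sketch of the same kind of test in Python. The frequency table is the commonly cited tabulation of Bortkiewicz's 280 corps-years (it reproduces the mean of 0.70), not numbers read from the Minitab output above. Pooling 3 or more deaths into one category is my assumption to keep expected counts above 5, so the statistic may differ from Minitab's. With 2 degrees of freedom, the chi-square p-value has the simple closed form exp(-x/2).</p>

```python
import math

# Commonly cited tabulation: deaths per corps-year -> number of corps-years
observed = {0: 144, 1: 91, 2: 32, 3: 11, 4: 2}
n = sum(observed.values())                         # total corps-years
lam = sum(k * c for k, c in observed.items()) / n  # estimated Poisson mean

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Pool 3 or more deaths into one category (assumption, see lead-in)
obs_cells = [observed[0], observed[1], observed[2], observed[3] + observed[4]]
probs = [poisson_pmf(k, lam) for k in range(3)]
probs.append(1 - sum(probs))                       # P(X >= 3)
exp_cells = [n * q for q in probs]

chi_square = sum((o - e) ** 2 / e for o, e in zip(obs_cells, exp_cells))
df = len(obs_cells) - 1 - 1     # categories - 1 - one estimated parameter
p_value = math.exp(-chi_square / 2)  # exact upper-tail area when df == 2

print(f"lambda = {lam:.2f}, chi-square = {chi_square:.3f}, df = {df}")
print(f"p-value = {p_value:.3f}, reject at 0.05: {p_value < 0.05}")
```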
<p>The chart below shows how close the expected and observed values for deaths are to each other. </p>
<p style="text-align: center;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/8200b5c574e8824a563089db792909e8/pic5.png" style="width: 577px; height: 385px;" /></p>
<p><span style="line-height: 1.6;">I've been thinking about what other data could have been collected to serve as potential predictors if we wanted to do a Poisson regression. We could then see if there were any significant relationships between our horse kick death counts and some factor of interest. Maybe corps location or horse breed could have been documented? Given that the space or unit of time is considered one year, that specific location or breed would have to be the same value for the entire length of that time. For example, Corps 14 in 1893 must have remained entirely in “Location A” during that year, or every horse in a particular corps must be of the same breed for a particular year.</span></p>
<p>According to <a href="http://equusmagazine.com/article/whyhorseskick_012307-8294">equusmagazine.com</a>, horses kick for six reasons:</p>
<ul>
<li>"I feel threatened."</li>
<li>"I feel good."</li>
<li>"I hurt."</li>
<li>"I feel frustrated."</li>
<li>"Back off."</li>
<li>"I'm the boss around here."</li>
</ul>
<p>Wouldn’t this have made for a great categorical variable?</p>
Data Analysis, Fun Statistics, Hypothesis Testing, Statistics, Statistics Help
Tue, 14 Apr 2015 12:00:00 +0000
http://blog.minitab.com/blog/quality-data-analysis-and-statistics/no-horsing-around-with-the-poisson-distribution-troops
Andy Cheshire
Understanding Hypothesis Tests: Confidence Intervals and Confidence Levels
http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-confidence-intervals-and-confidence-levels
<p>In this series of posts, I show how hypothesis tests and confidence intervals work by focusing on concepts and graphs rather than equations and numbers. </p>
<p>Previously, I used graphs to <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-significance-levels-alpha-and-p-values-in-statistics" target="_blank">show what statistical significance really means</a>. In this post, I’ll explain both confidence intervals and confidence levels, and how they’re closely related to P values and significance levels.</p>
How to Correctly Interpret Confidence Intervals and Confidence Levels
<p><img alt="Illustration of confidence levels" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/a9bd1376510c8289a0daf15f5bcd376f/ci.gif" style="float: right; width: 327px; height: 224px;" />A confidence interval is a range of values that is likely to contain an unknown population parameter. If you draw a random sample many times, a certain percentage of the confidence intervals will contain the population mean. This percentage is the confidence level.</p>
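<p>A small simulation can make the idea of a confidence level concrete. The sketch below is only an illustration: the population mean and standard deviation are made-up values (in a real study they are unknown), and the critical t-value of 2.064 assumes a sample of 25 and a 95% confidence level. It draws many random samples, builds an interval from each, and counts how often the interval contains the population mean.</p>

```python
import random
import statistics

# Hypothetical population values, chosen only for this illustration.
POP_MEAN = 300.0
POP_SD = 150.0
N = 25            # sample size
T_CRIT = 2.064    # critical t-value for 24 degrees of freedom, 95% confidence
TRIALS = 5000     # number of repeated random samples


def confidence_interval(sample):
    """Return (lower, upper) limits of the t-interval for the mean."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return mean - T_CRIT * se, mean + T_CRIT * se


rng = random.Random(1)  # fixed seed so the run is repeatable
hits = 0
for _ in range(TRIALS):
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    lower, upper = confidence_interval(sample)
    if lower <= POP_MEAN <= upper:
        hits += 1

coverage = hits / TRIALS
print(f"Share of intervals that contain the population mean: {coverage:.3f}")
```

<p>The share should land close to 0.95. Each individual interval either contains the population mean or it does not; the 95% describes the procedure over many samples.</p>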
<p>Most frequently, you’ll use confidence intervals to bound the mean or standard deviation, but you can also obtain them for regression coefficients, proportions, rates of occurrence (Poisson), and for the differences between populations.</p>
<p>Just as there is a common <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values" target="_blank">misconception of how to interpret P values</a>, there’s a common misconception of how to interpret confidence intervals. In this case, the confidence level is<em> <strong>not </strong></em>the probability that a specific confidence interval contains the population parameter.</p>
<p>The confidence level represents the theoretical ability of the analysis to produce accurate intervals if you are able to assess <em>many intervals</em> and you know the value of the population parameter. For a <em>specific</em> confidence interval from one study, the interval either contains the population value or it does not—there’s no room for probabilities other than 0 or 1. And you can't choose between these two possibilities because you don’t know the value of the population parameter.</p>
<p style="margin-left: 40px;">"The parameter is an unknown constant and no probability statement concerning its value may be made." <br />
<em><span style="line-height: 1.6;">—Jerzy Neyman, original developer of confidence intervals.</span></em></p>
<p>This will be easier to understand after we discuss the graph below . . .</p>
<p>With this in mind, how <em>do</em> you interpret confidence intervals?</p>
<p>Confidence intervals serve as good estimates of the population parameter because the procedure tends to produce intervals that contain the parameter. A confidence interval consists of the point estimate (the most likely value) and a margin of error around that point estimate. The margin of error indicates the amount of uncertainty that surrounds the sample estimate of the population parameter.</p>
<p>In this vein, you can use confidence intervals to assess the precision of the sample estimate. For a specific variable, a narrower confidence interval [90 110] suggests a more precise estimate of the population parameter than a wider confidence interval [50 150].</p>
Confidence Intervals and the Margin of Error
<p>Let’s move on to see how confidence intervals account for that margin of error. To do this, we’ll use the same tools that we’ve been using to understand hypothesis tests. I’ll create a <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/introductory-concepts/basic-concepts/sampling-distribution/" target="_blank">sampling distribution</a> using <a href="http://blog.minitab.com/blog/adventures-in-statistics/graphing-distributions-with-probability-distribution-plots" target="_blank">probability distribution plots</a>, the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/probability-distributions-and-random-data/distributions/t-distribution/" target="_blank">t-distribution</a>, and the variability in our data. We'll base our confidence interval on the <a href="http://support.minitab.com/datasets/FamilyEnergyCost.MTW">energy cost data set</a> that we've been using.</p>
<p>When we looked at <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-significance-levels-alpha-and-p-values-in-statistics" target="_blank">significance levels</a>, the graphs displayed a sampling distribution centered on the null hypothesis value, and the outer 5% of the distribution was shaded. For confidence intervals, we need to shift the sampling distribution so that it is centered on the sample mean and shade the middle 95%.</p>
<p><img alt="Probability distribution plot that illustrates how a confidence interval works" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/80de5f2397507752d74ffff86fbd94ea/ci_sample_mean.png" style="width: 576px; height: 384px;" /></p>
<p>The shaded area shows the range of sample means that you’d obtain 95% of the time using our sample mean as the point estimate of the population mean. This range [267 394] is our 95% confidence interval.</p>
<p>Using the graph, it’s easier to understand how a specific confidence interval represents the margin of error, or the amount of uncertainty, around the point estimate. The sample mean is the most likely value for the population mean given the information that we have. However, the graph shows it would not be unusual at all for other random samples drawn from the same population to obtain different sample means within the shaded area. These other likely sample means all suggest different values for the population mean. Hence, the interval represents the inherent uncertainty that comes with using sample data.</p>
<p>You can use these graphs to calculate probabilities for specific values. However, notice that you can’t place the population mean on the graph because that value is unknown. Consequently, you can’t calculate probabilities for the population mean, just as Neyman said!</p>
Why P Values and Confidence Intervals Always Agree About Statistical Significance
<p>You can use either P values or confidence intervals to determine whether your results are statistically significant. If a hypothesis test produces both, these results will agree.</p>
<p>The confidence level is equivalent to 1 – the alpha level. So, if your significance level is 0.05, the corresponding confidence level is 95%.</p>
<ul>
<li>If the P value is less than your significance (alpha) level, the hypothesis test is statistically significant.</li>
<li>If the confidence interval does not contain the null hypothesis value, the results are statistically significant.</li>
<li>If the P value is less than alpha, the confidence interval will not contain the null hypothesis value.</li>
</ul>
<p>For our example, the P value (0.031) is less than the significance level (0.05), which indicates that our results are statistically significant. Similarly, our 95% confidence interval [267 394] does not include the null hypothesis mean of 260 and we draw the same conclusion.</p>
<p>To understand why the results always agree, let’s recall how both the significance level and confidence level work.</p>
<ul>
<li>The significance level defines the distance the sample mean must be from the null hypothesis to be considered statistically significant.</li>
<li>The confidence level defines the distance between the confidence limits and the sample mean.</li>
</ul>
<p>Both the significance level and the confidence level define a distance from a limit to a mean. Guess what? The distances in both cases are exactly the same!</p>
<p>The distance equals the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/basics/what-is-a-critical-value/" target="_blank">critical t-value</a> * <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/tests-of-means/what-is-the-standard-error-of-the-mean/" target="_blank">standard error of the mean</a>. For our energy cost example data, the distance works out to be $63.57.</p>
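<p>The arithmetic behind this distance is short enough to check by hand or in a few lines of code. The sketch below assumes the values reported elsewhere in this series for the energy cost example (sample mean 330.6, SE mean 30.8, and a critical t-value of 2.064 for 24 degrees of freedom); the variable names are my own.</p>

```python
SAMPLE_MEAN = 330.6   # sample mean of the energy cost data
NULL_MEAN = 260.0     # null hypothesis mean
SE_MEAN = 30.8        # standard error of the mean from the 1-sample t output
T_CRIT = 2.064        # critical t-value, 24 degrees of freedom, alpha = 0.05

# The same distance is used by the hypothesis test and the confidence interval.
margin = T_CRIT * SE_MEAN

ci_lower = SAMPLE_MEAN - margin
ci_upper = SAMPLE_MEAN + margin

# Test view: is the sample mean farther than the margin from the null value?
significant_by_test = abs(SAMPLE_MEAN - NULL_MEAN) > margin
# Interval view: does the interval exclude the null value?
significant_by_ci = not (ci_lower <= NULL_MEAN <= ci_upper)

print(f"Distance: {margin:.2f}")
print(f"95% confidence interval: [{ci_lower:.0f} {ci_upper:.0f}]")
print(f"Significant by test: {significant_by_test}, by interval: {significant_by_ci}")
```

<p>Both checks compare the same two means against the same distance, which is why they cannot disagree.</p>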
<p>Imagine this discussion between the null hypothesis mean and the sample mean:</p>
<p><strong>Null hypothesis mean, hypothesis test representative</strong>: Hey buddy! I’ve found that you’re statistically significant because you’re more than $63.57 away from me!</p>
<p><strong>Sample mean, confidence interval representative</strong>: Actually, I’m significant because <em>you’re</em> more than $63.57 away from <em>me</em>!</p>
<p>Very agreeable, aren’t they? And, they always will agree as long as you compare the correct pairs of P values and confidence intervals. If you compare the incorrect pair, you can get conflicting results, as shown by common mistake #1 in this <a href="http://blog.minitab.com/blog/real-world-quality-improvement/3-common-and-dangerous-statistical-misconceptions" target="_blank">post</a>.</p>
Closing Thoughts
<p>In statistical analyses, there tends to be a greater focus on P values and simply detecting a significant effect or difference. However, a statistically significant effect is not necessarily meaningful in the real world. For instance, the effect might be too small to be of any practical value.</p>
<p>It’s important to pay attention to both the magnitude and the precision of the estimated effect. That’s why I'm rather fond of confidence intervals. They allow you to assess these important characteristics along with the statistical significance. You'd like to see a narrow confidence interval where the entire range represents an effect that is meaningful in the real world.</p>
<p>For more about confidence intervals, read my post where I <a href="http://blog.minitab.com/blog/adventures-in-statistics/when-should-i-use-confidence-intervals-prediction-intervals-and-tolerance-intervals">compare them to tolerance intervals and prediction intervals</a>.</p>
<p>If you'd like to see how I made the probability distribution plot, please read: <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-create-a-graphical-version-of-the-1-sample-t-test-in-minitab">How to Create a Graphical Version of the 1-sample t-Test</a>.</p>
Data Analysis, Hypothesis Testing, Learning, Statistics, Statistics Help, Stats
Thu, 02 Apr 2015 12:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-confidence-intervals-and-confidence-levels
Jim Frost
How Could You Benefit from a Box-Cox Transformation?
http://blog.minitab.com/blog/applying-statistics-in-quality-projects/how-could-you-benefit-from-a-box-cox-transformation
<p>Imagine that you are watching a race and that you are located close to the finish line. When the first and fastest runners complete the race, the differences in times between them will probably be quite small.</p>
<p>Now wait until the last runners arrive and consider their finishing times. For these slowest runners, the differences in completion times will be extremely large. This is due to the fact that for longer racing times a small difference in speed will have a significant impact on completion times, whereas for the fastest runners, small differences in speed will have a small (but decisive) impact on arrival times.</p>
<p>This phenomenon is called “<a href="http://blog.minitab.com/blog/adventures-in-statistics/curing-heteroscedasticity-with-weighted-regression-in-minitab-statistical-software">heteroscedasticity</a>” (non-constant variance). In this example, the amount of variation depends on the average value (small variations for shorter completion times, large variations for longer times).</p>
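<p>To see why the same difference in speed matters more for slow runners, here is a quick calculation. The race distance and the speeds are invented for the illustration; only the pattern matters.</p>

```python
DISTANCE_KM = 10.0  # hypothetical race distance


def finish_minutes(speed_kmh):
    """Finishing time in minutes for a runner at a constant speed."""
    return DISTANCE_KM / speed_kmh * 60


# Two fast runners and two slow runners, each pair 0.5 km/h apart.
fast_gap = finish_minutes(19.5) - finish_minutes(20.0)
slow_gap = finish_minutes(7.5) - finish_minutes(8.0)

print(f"Gap between the fast runners: {fast_gap:.2f} minutes")
print(f"Gap between the slow runners: {slow_gap:.2f} minutes")
```

<p>With these made-up numbers, the fast runners finish less than a minute apart while the slow runners finish about five minutes apart, even though the speed difference is identical.</p>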
<p><img alt="" spellcheck="true" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/cb6a4d6b498b3525a6d18d579db20557/race.JPG" style="width: 781px; height: 120px;" /></p>
<p>The distribution of these running times will probably not follow the familiar bell-shaped curve (a.k.a. the normal distribution). The resulting distribution will be asymmetrical with a longer tail on the right side. This is because there's small variability on the left side with a short tail for smaller running times, and larger variability for longer running times on the right side, hence the longer tail.</p>
<p><img alt="" spellcheck="true" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/a70fc75a0255884e65f231ac072d42c2/distribution_plot.jpg" style="width: 578px; height: 344px;" /></p>
<p>Why does this matter?</p>
<ul>
<li>Model bias and spurious interactions: If you are performing a regression or a design of experiments (any statistical modelling), this asymmetrical behavior may lead to a bias in the model. If a factor has a significant effect on the average speed, because the variability is much larger for a larger average running time, many factors will seem to have a stronger effect when the mean is larger. This is not due, however, to a true factor effect but rather to an increased amount of variability that affects all factor effect estimates when the mean gets larger. This will probably generate spurious interactions due to a non-constant variation, resulting in a very complex model with many (spurious and unrealistic) interactions.</li>
<li>If you are performing a standard capability analysis, this analysis is based on the normality assumption. A substantial departure from normality will bias your capability estimates.</li>
</ul>
The Box-Cox Transformation
<p>One solution to this is to transform your data into normality using a Box-Cox transformation. Minitab will select the best mathematical function for this data transformation. The objective is to obtain transformed data that follow a normal distribution and have a constant variance.</p>
<p>Consider the asymmetrical distribution below:</p>
<p><img alt="" spellcheck="true" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/796a7b0d27c6613ac17f983c839701e5/transformed_distribution.jpg" style="width: 515px; height: 326px;" /></p>
<p> <img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/9b0c5998839682f685055bb5168ab540/log_function.JPG" style="width: 437px; height: 313px;" /></p>
<p>If a logarithmic transformation is applied to this distribution, the differences between smaller values will be expanded (because the slope of the logarithmic function is steeper when values are small) whereas the differences between larger values will be reduced (because the slope of the logarithmic function is very moderate for larger values). If you inflate differences on the left tail and reduce differences on the right tail, the result is a more symmetrical, approximately normal distribution with a variance that is roughly constant (whatever the mean). This is the reason why in the <a href="http://www.minitab.com/products/minitab">Minitab Assistant</a>, a Box-Cox transformation is suggested whenever this is possible for non-normal data, and why in the <span style="line-height: 18.9090900421143px;">Minitab </span><span style="line-height: 1.6;">regression or DOE (design of experiments) dialog boxes, the Box-Cox transformation is an option that anyone may consider if needed to transform residual data into normality.</span></p>
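<p>The stretching and compressing effect of the logarithm is easy to verify with two pairs of values that are the same distance apart on the original scale. The numbers below are arbitrary and chosen only to illustrate the point.</p>

```python
import math

small_pair = (10.0, 20.0)    # two small values, 10 units apart
large_pair = (100.0, 110.0)  # two large values, also 10 units apart


def gap(pair):
    """Difference between the two values on the original scale."""
    return pair[1] - pair[0]


def log_gap(pair):
    """Difference between the two values after a natural log transformation."""
    return math.log(pair[1]) - math.log(pair[0])


print(f"Original gaps: {gap(small_pair):.1f} and {gap(large_pair):.1f}")
print(f"Log-scale gaps: {log_gap(small_pair):.3f} and {log_gap(large_pair):.3f}")
```

<p>On the log scale, the gap between the small values is several times larger than the gap between the large values, which is how the long right tail gets pulled in.</p>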
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/fc4ec75ca192d81dabf8556ebdf751e8/transformation.JPG" style="width: 430px; height: 611px;" /></p>
<p>The diagram above illustrates how, thanks to a Box-Cox transformation, performed by the Minitab Assistant (in a capability analysis), an asymmetrical distribution has been transformed into a normal symmetrical distribution (with a successful normality test).</p>
Box-Cox Transformation and Variable Scale
<p>Note that Minitab will search for the best transformation function, which may not necessarily be a logarithmic transformation.</p>
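<p>To give a feel for what searching over transformations means, the sketch below uses the textbook Box-Cox family, where a parameter lambda selects the function and lambda = 0 corresponds to the natural log. The selection rule here (pick the lambda whose transformed data has skewness closest to zero) is a simplification I chose for the illustration and is not a description of how Minitab chooses lambda. The data are simulated right-skewed values, not real running times.</p>

```python
import math
import random
import statistics


def box_cox(x, lam):
    """Textbook Box-Cox transformation of a positive value x."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam


def skewness(values):
    """Sample skewness: roughly 0 for symmetrical data, positive for a long right tail."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


rng = random.Random(7)  # fixed seed so the run is repeatable
# Simulated right-skewed data (lognormal), standing in for running times.
times = [rng.lognormvariate(4.0, 0.8) for _ in range(5000)]

candidates = [-1.0, -0.5, 0.0, 0.5, 1.0]
skew_by_lambda = {
    lam: skewness([box_cox(t, lam) for t in times]) for lam in candidates
}
# Simplified criterion: the lambda that leaves the least skewness.
best_lambda = min(candidates, key=lambda lam: abs(skew_by_lambda[lam]))

for lam in candidates:
    print(f"lambda = {lam:+.1f}: skewness = {skew_by_lambda[lam]:+.2f}")
print(f"Least skewed: lambda = {best_lambda}")
```

<p>For this simulated data the log (lambda = 0) comes out on top, but for other data sets a square root (0.5) or a reciprocal (-1) could do better, which is why a search is useful.</p>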
<p>As a result of this transformation, the physical scale of your variable may be altered. When looking at a capability graph after transformation, you may not recognize the typical values of your variable on the transformed scale. However, the estimated Ppk and Pp capability indices will be reliable and based on a normal distribution. Similarly, in a regression model, you need to be aware that the coefficients will be modified because they now describe the transformed response, although the transformation is obviously useful to remove spurious interactions and to identify the factors that are really significant.</p>
Data Analysis, Design of Experiments, Hypothesis Testing, Learning, Quality Improvement, Regression Analysis, Statistics, Statistics Help, Stats
Mon, 30 Mar 2015 12:00:00 +0000
http://blog.minitab.com/blog/applying-statistics-in-quality-projects/how-could-you-benefit-from-a-box-cox-transformation
Bruno Scibilia
How to Create a Graphical Version of the 1-sample t-Test in Minitab
http://blog.minitab.com/blog/adventures-in-statistics/how-to-create-a-graphical-version-of-the-1-sample-t-test-in-minitab
<p>This is a companion post for a series of blog posts about understanding hypothesis tests. In this series, I create a graphical equivalent to a 1-sample t-test and confidence interval to help you understand how it works more intuitively.</p>
<p>This post focuses entirely on the steps required to create the graphs. It’s a fairly technical and task-oriented post designed for those who need to create the graphs for illustrative purposes. If you’d instead like to gain a better understanding of the concepts behind the graphs, please see the following posts:</p>
<ul>
<li><a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics" target="_blank">Understanding Hypothesis Tests: Why We Need to Use Hypothesis Tests</a></li>
<li><a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-significance-levels-alpha-and-p-values-in-statistics" target="_blank">Understanding Hypothesis Tests: The Significance Level and P Values</a></li>
<li><a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-confidence-intervals-and-confidence-levels" target="_blank">Understanding Hypothesis Tests: Confidence Intervals and Confidence Levels</a></li>
</ul>
<p>To create the following graphs, we’ll use Minitab’s <a href="http://blog.minitab.com/blog/adventures-in-statistics/graphing-distributions-with-probability-distribution-plots" target="_blank">probability distribution plots</a> in conjunction with several statistics obtained from the 1-sample t output. If you’d like more information about the formulas that are involved, you can find them in Minitab at: <strong>Help > Methods and Formulas > Basic Statistics > 1-Sample t</strong>.</p>
<p>The data for this example is <a href="http://support.minitab.com/datasets/FamilyEnergyCost.MTW">FamilyEnergyCost</a> and it is just one of the many data set examples that can be found in <a href="http://support.minitab.com/datasets/">Minitab’s Data Set Library</a>. We’ll perform the regular 1-sample t-test with a null hypothesis mean of 260, and then graphically recreate the results. </p>
<p><img alt="1-sample t-test output from Minitab statistical software" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/e26965956d7d682888dd0c749e10f7af/1t_swo.png" style="width: 485px; height: 123px;" /></p>
How to Graph the Two-Tailed Critical Region for a Significance Level of 0.05
<p>To create a graphical equivalent to a 1-sample t-test, we’ll need to graph the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/probability-distributions-and-random-data/distributions/t-distribution/" target="_blank">t-distribution</a> using the correct number of degrees of freedom. For a 1-sample t-test, the degrees of freedom equals the sample size minus 1. So, that’s 24 degrees of freedom for our sample of 25.</p>
<ol>
<li>In Minitab, choose: <strong>Graph > Probability Distribution Plot > View Probability</strong>.</li>
<li>In <strong>Distribution</strong>, select <strong>t</strong>.</li>
<li>In <strong>Degrees of freedom</strong>, enter <em>24</em>.</li>
<li>Click the <strong>Shaded Area</strong> tab.</li>
<li>In <strong>Define Shaded Area By</strong>, select <strong>Probability</strong> and <strong>Both Tails</strong>.</li>
<li>In <strong>Probability</strong>, enter <em>0.05</em>.</li>
<li>Click <strong>OK</strong>.</li>
</ol>
<p>You should see this graph.</p>
<p><img alt="Probability distribution plot of t-values" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/4201bc7734483312e6056e023f12272b/t_value_plot_crtical_region.png" style="width: 576px; height: 384px;" /></p>
<p>This graph shows the distribution of t-values for a sample of our size with the t-values for the end points of the critical region. The <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/tests-of-means/what-is-a-t-value/" target="_blank">t-value</a> for our sample mean is 2.29 and it falls within the critical region.</p>
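<p>If you want to check the end points of the critical region outside of Minitab, the sketch below is one way to do it with nothing but basic math. It integrates the t-distribution density numerically (Simpson's rule) and then searches for the t-value that leaves 0.05 in the two tails combined. The function names are my own, and this is an illustration rather than the method Minitab uses internally.</p>

```python
import math


def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def critical_t(alpha, df):
    """Two-tailed critical value found by bisection on the tail area."""
    low, high = 0.0, 10.0
    for _ in range(60):
        mid = (low + high) / 2
        if 2 * upper_tail(mid, df) > alpha:
            low = mid   # too much area in the tails: move outward
        else:
            high = mid  # too little area in the tails: move inward
    return (low + high) / 2


DF = 24          # sample size of 25 minus 1
T_SAMPLE = 2.29  # t-value for our sample mean from the 1-sample t output

t_crit = critical_t(0.05, DF)
in_critical_region = abs(T_SAMPLE) > t_crit

print(f"Critical region begins at t = -{t_crit:.3f} and +{t_crit:.3f}")
print(f"Sample t-value falls in the critical region: {in_critical_region}")
```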
<p>For my blog posts, I thought displaying the x-axis in the same units as our measurement variable (energy costs) would make the graph easier to understand. To do this, we need to transform the x-axis scale from t-values to energy costs.</p>
<p>Transforming the t-values to energy costs for a distribution centered on the null hypothesis mean requires a simple calculation:</p>
<p style="margin-left: 40px;">Energy Cost = Null Hypothesis Mean + (t-value * SE Mean)</p>
<p>We’ll use the null hypothesis value that we entered in the dialog box (260) and the SE Mean value that appears in the 1-sample t-test output (30.8). We need to calculate the energy cost values for all of the t-values that will appear on the x-axis (-4 to +4).</p>
<p>For example, a t-value of 1 corresponds to an energy cost of 290.8 (260 + (1 * 30.8)). A t-value of zero corresponds to the null hypothesis value, which is 260.</p>
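<p>Rather than working out each label by hand, you could generate all nine values at once. This is just a convenience sketch of the formula above; the function name is my own.</p>

```python
NULL_MEAN = 260.0  # null hypothesis value entered in the dialog box
SE_MEAN = 30.8     # SE Mean from the 1-sample t-test output


def t_to_energy_cost(t_value, center=NULL_MEAN, se_mean=SE_MEAN):
    """Energy Cost = center + (t-value * SE Mean)."""
    return center + t_value * se_mean


# One label for each major tick on the x-axis, from t = -4 to t = +4.
tick_labels = {t: round(t_to_energy_cost(t), 1) for t in range(-4, 5)}

for t, cost in tick_labels.items():
    print(f"t = {t:+d}  ->  energy cost = {cost}")
```

<p>You can then round these values however you like before typing them into the <strong>Labels</strong> tab.</p>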
<p>Next, we need to replace the t-values with the energy cost equivalents.</p>
<ol>
<li>Choose <strong>Editor > Select Item > X Scale</strong>.</li>
<li>Choose <strong>Editor > Edit X Scale</strong>.</li>
<li>In <strong>Major Tick Position</strong>, choose <strong>Number of Ticks</strong> and enter <em>9</em>.</li>
<li>Click the <strong>Show</strong> tab and check the <strong>Low</strong> check box for <strong>Major ticks</strong> and <strong>Major tick labels</strong>.</li>
<li>Click the <strong>Labels</strong> tab of the dialog box that appears. Enter the energy cost values that you calculated as shown below. I use rounded values to keep the x-axis tidy. Click <strong>OK</strong>.</li>
</ol>
<p><img alt="Dialog box for showing the transformed values on the x-scale" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/f96d38fa628633a7e4ff3a12b44483c0/edit_scale_dialog.png" style="width: 400px; height: 348px;" /></p>
<p>You should see this graph. To clean up the x-axis, I had to delete the t-values that were still showing from before. Simply click each t-value once and press the <strong>Delete</strong> key.</p>
<p><img alt="Probability distribution plot of t-distribution with the x-scale transformed to energy costs" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/69de6d5f41cf0703764895b379c5d3fb/t_value_plot_crtical_region2.png" style="width: 576px; height: 384px;" /></p>
<p>Let’s add a reference line to show where our sample mean falls within the sampling distribution and critical region. The trick here is that the x-axis still uses t-values despite displaying the energy costs. We need to use the t-value for our sample mean that appears in the 1-sample t output (2.29).</p>
<ol>
<li>Choose <strong>Editor > Add > Reference Lines</strong>.</li>
<li>In <strong>Show reference lines at X values</strong>, enter<em> 2.29.</em></li>
<li>Click <strong>OK</strong>.</li>
<li>Double click the <em>2.29</em> that now appears on the graph.</li>
<li>In the dialog box that appears, enter <em>330.6</em> in <strong>Text</strong>.</li>
<li>Click <strong>OK</strong>.</li>
</ol>
<p>After editing the title and the x-axis label, you should have a graph similar to the one below.</p>
<p><img alt="Probability distribution plot with two-tailed critical region for a significance level of 0.05" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/212878044412db4ec165745b18c010e8/sig_level_05.png" style="width: 576px; height: 384px;" /></p>
How to Graph the P Value for a 1-sample t-Test
<p>To do this, we’ll duplicate the graph we created above and then modify it. This allows us to reuse some of the work that we’ve already done.</p>
<ol>
<li>Make sure the graph we created is selected.</li>
<li>Choose <strong>Editor > Duplicate Graph</strong>.</li>
<li>Double click the blue distribution curve on the graph.</li>
<li>Click the <strong>Shaded Area</strong> tab in the dialog box that appears.</li>
<li>In <strong>Define Shaded Area By</strong>, select <strong>X Value</strong> and <strong>Both Tails</strong>.</li>
<li>In <strong>X value</strong>, enter <em>2.29</em>.</li>
<li>Click <strong>OK</strong>.</li>
</ol>
<p>You’ll need to edit the graph title and delete some extra numbers on the x-axis. After these edits, you should have a graph similar to this one.</p>
<p><img alt="Probability distribution plot that displays the p-value for our sample mean" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/4a599dfe53a1c065837de772a5b157fb/p_value.png" style="width: 576px; height: 384px;" /></p>
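<p>The shaded area in this graph is the P value, and you can approximate it yourself by adding up the area in both tails beyond the sample t-value. The sketch below does this with a simple numerical integration (Simpson's rule) of the t-distribution density. The function names are my own and the approach is only an illustration, not Minitab's internal calculation.</p>

```python
import math


def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


DF = 24          # degrees of freedom for a sample of 25
T_SAMPLE = 2.29  # t-value for our sample mean

# Two-tailed test: the area beyond +2.29 plus the equal area beyond -2.29.
p_value = 2 * upper_tail(abs(T_SAMPLE), DF)
print(f"Two-tailed P value: {p_value:.3f}")
```

<p>The result should be close to the P value of 0.031 reported for this example; small differences can arise because the t-value is rounded to 2.29.</p>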
How to Graph the Confidence Interval for a 1-sample t-test
<p>To graphically recreate the confidence interval, we’ll need to start from scratch for this graph. </p>
<ol>
<li>In Minitab, choose: <strong>Graph > Probability Distribution Plot > View Probability</strong>.</li>
<li>In <strong>Distribution</strong>, select <strong>t</strong>.</li>
<li>In <strong>Degrees of freedom</strong>, enter <em>24</em>.</li>
<li>Click the <strong>Shaded Area</strong> tab.</li>
<li>In <strong>Define Shaded Area By</strong>, select <strong>Probability</strong> and <strong>Middle</strong>.</li>
<li>Enter <em>0.025</em> in both <strong>Probability 1</strong> and <strong>Probability 2</strong>.</li>
<li>Click <strong>OK</strong>.</li>
</ol>
<p>Your graph should look like this:</p>
<p><img alt="Probability distribution plot that represents a confidence interval with t-values" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/9ec475422d97330eb236d76f37ee576a/ci_t_values.png" style="width: 576px; height: 384px;" /></p>
<p>Like before, we’ll need to transform the x-axis into energy costs. For this graph, I’ll only display the x-values for the end points of the confidence interval and the sample mean. So, we need to convert the three t-values of -2.064, 0, 2.064.</p>
<p>The equation to transform the t-values to energy costs for a distribution centered on the sample mean is:</p>
<p style="margin-left: 40px;">Energy Cost = Sample Mean + (t-value * SE Mean)</p>
<p>We obtain the following rounded values that represent the lower confidence limit, sample mean, and upper confidence limit: 267, 330.6, 394.</p>
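<p>Here is the same conversion as a few lines of code, using the sample mean (330.6) and SE Mean (30.8) from the 1-sample t output. The variable names are my own.</p>

```python
SAMPLE_MEAN = 330.6
SE_MEAN = 30.8
T_VALUES = [-2.064, 0.0, 2.064]  # lower limit, center, upper limit

# Energy Cost = Sample Mean + (t-value * SE Mean)
lower_limit, center, upper_limit = (SAMPLE_MEAN + t * SE_MEAN for t in T_VALUES)

print(f"Lower confidence limit: {lower_limit:.0f}")
print(f"Sample mean: {center:.1f}")
print(f"Upper confidence limit: {upper_limit:.0f}")
```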
<p>Simply double click the values in the x-axis to edit each individual label. Replace the t-value with the energy cost value. After editing the graph title, you should have a visual representation of the confidence interval that looks like this. I rounded the values for the confidence limits.</p>
<p><img alt="Probability distribution plot that displays a visual representation of a 95% confidence interval around the sample mean" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/80de5f2397507752d74ffff86fbd94ea/ci_sample_mean.png" style="width: 576px; height: 384px;" /></p>
Consider Using Minitab's Command Language
<p>When I create multiple graphs that involve many steps, I generally use Minitab's command language. This may sound daunting if you're not familiar with using this command language. However, Minitab makes this easier for you.</p>
<p>After you create one graph, choose <strong>Editor > Copy Command Language</strong>, and paste it into a text editor, such as Notepad. Save the file with the extension *.mtb and you have a Minitab Exec file. This Exec file contains all of the edits you made. Now, you can easily create similar graphs simply by modifying the parts that you want to change.</p>
<p>You can also get help for the command language right in Minitab. First, make sure the command prompt is enabled by choosing <strong>Editor > Enable Commands</strong>. At the prompt, type <em>help dplot</em>, and Minitab displays the help specific to probability distribution plots!</p>
<p>To run an exec file, choose <strong>File > Other Files > Run an Exec</strong>. Click <strong>Select File</strong> and browse to the file you saved. Here are the MTB files for my graphs for the <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/File/e4994557f813b872b03687363259faa2/prob_plot_alpha.mtb">critical region</a>, <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/File/3a0e83912f4826db06ee8c0777a5cf73/prob_plot_p.mtb">P value</a>, and <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/File/00f225aa85ce31c80cfae24c1704fa1c/ci_sample.mtb">confidence interval</a>.</p>
<p>Happy graphing!</p>
Hypothesis Testing, Six Sigma
Wed, 25 Mar 2015 12:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics/how-to-create-a-graphical-version-of-the-1-sample-t-test-in-minitab
Jim Frost
Understanding Hypothesis Tests: Significance Levels (Alpha) and P values in Statistics
http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-significance-levels-alpha-and-p-values-in-statistics
<p>What do significance levels and P values mean in hypothesis tests? What <em>is </em>statistical significance anyway? In this post, I’ll continue to focus on concepts and graphs to help you gain a more intuitive understanding of how hypothesis tests work in statistics.</p>
<p>To bring it to life, I’ll add the significance level and P value to the graph in my previous post in order to perform a graphical version of the 1 sample t-test. It’s easier to understand when you can see what statistical significance truly means!</p>
<p>Here’s where we left off in <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics" target="_blank">my last post</a>. We want to determine whether our sample mean (330.6) indicates that this year's average energy cost is significantly different from last year’s average energy cost of $260.</p>
<p><img alt="Descriptive statistics for the example" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/135cd05bde9f7f16ab396a8525d2b09c/desc_stats.png" style="width: 302px; height: 87px;" /></p>
<p><img alt="Probability distribution plot for our example" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/76699c4c1f2bd6c83b88c1ac8e93aa54/sampling_dist_null.png" style="width: 595px; height: 397px;" /></p>
<p>The <a href="http://blog.minitab.com/blog/adventures-in-statistics/graphing-distributions-with-probability-distribution-plots" target="_blank">probability distribution plot</a> above shows the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/introductory-concepts/basic-concepts/sampling-distribution/" target="_blank">distribution of sample means</a> we’d obtain under the assumption that the null hypothesis is true (population mean = 260) and we repeatedly drew a large number of random samples.</p>
<p>I left you with a question: where do we draw the line for statistical significance on the graph? Now we'll add in the significance level and the P value, which are the decision-making tools we'll need.</p>
<p>We'll use these tools to test the following hypotheses:</p>
<ul>
<li>Null hypothesis: The population mean equals the hypothesized mean (260).</li>
<li>Alternative hypothesis: The population mean differs from the hypothesized mean (260).</li>
</ul>
What Is the Significance Level (Alpha)?
<p>The significance level, also denoted as alpha or α, is the probability of rejecting the null hypothesis when it is true. For example, a significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference.</p>
<p>These types of definitions can be hard to understand because of their technical nature. A picture makes the concepts much easier to comprehend!</p>
<p>The significance level determines how far out from the null hypothesis value we'll draw that line on the graph. To graph a significance level of 0.05, we need to shade the 5% of the distribution that is furthest away from the null hypothesis.</p>
<p><img alt="Probability plot that shows the critical regions for a significance level of 0.05" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/212878044412db4ec165745b18c010e8/sig_level_05.png" style="width: 595px; height: 397px;" /></p>
<p>In the graph above, the two shaded areas are equidistant from the null hypothesis value and each area has a probability of 0.025, for a total of 0.05. In statistics, we call these shaded areas the <em>critical region</em> for a two-tailed test. If the population mean is 260, we’d expect to obtain a sample mean that falls in the critical region 5% of the time. The critical region defines how far away our sample statistic must be from the null hypothesis value before we can say it is unusual enough to reject the null hypothesis.</p>
<p>Our sample mean (330.6) falls within the critical region, which indicates it is statistically significant at the 0.05 level.</p>
<p>We can also see if it is statistically significant using the other common significance level of 0.01.</p>
<p><img alt="Probability plot that shows the critical regions for a significance level of 0.01" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/8744b853f28396be001c2ee9678a9c14/sig_level_01.png" style="width: 595px; height: 397px;" /></p>
<p>The two shaded areas each have a probability of 0.005, which adds up to a total probability of 0.01. This time our sample mean does not fall within the critical region and we fail to reject the null hypothesis. This comparison shows why you need to choose your significance level before you begin your study. It protects you from choosing a significance level because it conveniently gives you significant results!</p>
<p>Thanks to the graph, we were able to determine that our results are statistically significant at the 0.05 level without using a P value. However, when you use the numeric output produced by <a href="http://www.minitab.com/en-us/products/minitab/features/" target="_blank">statistical software</a>, you’ll need to compare the P value to your significance level to make this determination.</p>
What Are P values?
<p>A P value is the probability of obtaining an effect at least as extreme as the one in your sample data, assuming the null hypothesis is true.</p>
<p>This definition of P values, while technically correct, is a bit convoluted. It’s easier to understand with a graph!</p>
<p>To graph the P value for our example data set, we need to determine the distance between the sample mean and the null hypothesis value (330.6 - 260 = 70.6). Next, we can graph the probability of obtaining a sample mean that is at least as extreme in both tails of the distribution (260 +/- 70.6).</p>
<p><img alt="Probability plot that shows the p-value for our sample mean" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/4a599dfe53a1c065837de772a5b157fb/p_value.png" style="width: 595px; height: 397px;" /></p>
<p>In the graph above, the two shaded areas each have a probability of 0.01556, for a total probability of 0.03112. This probability represents the likelihood of obtaining a sample mean that is at least as extreme as our sample mean in both tails of the distribution if the population mean is 260. That’s our P value!</p>
<p>When a P value is less than or equal to the significance level, you reject the null hypothesis. If we take the P value for our example and compare it to the common significance levels, it matches the previous graphical results. The P value of 0.03112 is statistically significant at an alpha level of 0.05, but not at the 0.01 level.</p>
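<p>If you'd like to check these numbers outside of Minitab, here is a minimal Python sketch of the same two-tailed 1-sample t-test. The sample standard deviation of 154.2 is an assumption on my part: it is consistent with the P value reported above, but it does not appear in the text. The helper names <code>t_pdf</code>, <code>t_upper_tail</code>, and <code>t_critical</code> are made up for this illustration, and the tail areas come from numerically integrating the t density, so treat the results as approximate.</p>

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on the density over [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_critical(alpha, df):
    """Two-tailed critical t value for significance level alpha (bisection)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 2 * t_upper_tail(mid, df) > alpha:
            lo = mid  # tails still too large: move further out
        else:
            hi = mid
    return (lo + hi) / 2


null_mean = 260.0
sample_mean = 330.6
n = 25
sample_sd = 154.2  # assumption: not stated in the text

se = sample_sd / math.sqrt(n)
df = n - 1
t_stat = (sample_mean - null_mean) / se
p_value = 2 * t_upper_tail(abs(t_stat), df)
print(f"t = {t_stat:.3f}, two-tailed P value = {p_value:.5f}")

decisions = {}
for alpha in (0.05, 0.01):
    margin = t_critical(alpha, df) * se
    lower, upper = null_mean - margin, null_mean + margin
    decisions[alpha] = sample_mean <= lower or sample_mean >= upper
    print(f"alpha = {alpha}: critical region is below {lower:.1f} "
          f"or above {upper:.1f}; reject null: {decisions[alpha]}")
```

<p>Under these assumptions, the critical-region check and the P value comparison should always agree, because they are two views of the same calculation.</p>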
<p>If we stick to a significance level of 0.05, we can conclude that the average energy cost for the population differs from 260. Because our sample mean is higher than 260, the data point toward an increase.</p>
<p>A common mistake is to interpret the P value as the probability that the null hypothesis is true. To understand why this interpretation is incorrect, please read my blog post <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values">How to Correctly Interpret P Values</a>.</p>
Discussion about Statistically Significant Results
<p>A hypothesis test evaluates two mutually exclusive statements about a population to determine which statement is best supported by the sample data. A test result is statistically significant when the sample statistic is unusual enough relative to the null hypothesis that we can reject the null hypothesis for the entire population. “Unusual enough” in a hypothesis test is defined by:</p>
<ul>
<li>The assumption that the null hypothesis is true—the graphs are centered on the null hypothesis value.</li>
<li>The significance level—how far out do we draw the line for the critical region?</li>
<li>Our sample statistic—does it fall in the critical region?</li>
</ul>
<p>Keep in mind that there is no magic significance level that distinguishes between the studies that have a true effect and those that don’t with 100% accuracy. The common alpha values of 0.05 and 0.01 are simply based on tradition. For a significance level of 0.05, expect to obtain sample means in the critical region 5% of the time when <em>the</em> <em>null hypothesis is</em> <em>true</em>. In these cases, you won’t know that the null hypothesis is true but you’ll reject it because the sample mean falls in the critical region. That’s why the significance level is also referred to as an <em>error</em> rate!</p>
<p>This <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/basics/type-i-and-type-ii-error/" target="_blank">type of error</a> doesn’t imply that the experimenter did anything wrong or require any other unusual explanation. The graphs show that when the null hypothesis is true, it is possible to obtain these unusual sample means for no reason other than random sampling error. It’s just luck of the draw.</p>
<p>Significance levels and P values are important tools that help you quantify and control this type of error in a hypothesis test. Using these tools to decide when to reject the null hypothesis increases your chance of making the correct decision.</p>
<p>In my next post, I’ll continue to use this graphical framework to help you <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-confidence-intervals-and-confidence-levels">understand confidence intervals and confidence levels</a>!</p>
<p>If you'd like to see how I made these graphs, please read: <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-create-a-graphical-version-of-the-1-sample-t-test-in-minitab" target="_blank">How to Create a Graphical Version of the 1-sample t-Test</a>.</p>
Data Analysis, Hypothesis Testing, Learning, Statistics, Statistics Help, Stats
Thu, 19 Mar 2015 12:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-significance-levels-alpha-and-p-values-in-statistics
Jim Frost
P-value Roulette: Making Hypothesis Testing a Winner’s Game
http://blog.minitab.com/blog/rkelly/p-value-roulette-making-hypothesis-testing-a-winner%E2%80%99s-game
<p>Welcome to the Hypothesis Test Casino! The featured game of the house is roulette. But this is no <em>ordinary</em> game of roulette. This is p-value roulette!</p>
<p>Here’s how it works: We have two roulette wheels, the Null wheel and the Alternative wheel. Each wheel has 20 slots (instead of the usual 37 or 38). You get to bet on one slot.</p>
<p><img alt="http://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Edvard_Munch_-_At_the_Roulette_Table_in_Monte_Carlo_-_Google_Art_Project.jpg/256px-Edvard_Munch_-_At_the_Roulette_Table_in_Monte_Carlo_-_Google_Art_Project.jpg" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8647ae2930d63e128d09f0b2cc5cdb87/p_value_roulette.jpg" style="line-height: 20.7999992370605px; border-width: 1px; border-style: solid; margin: 10px 15px; width: 256px; height: 166px; float: right;" /></p>
<p>What happens if the ball lands in the slot you bet on? Well, that depends on which wheel we spin. If we spin the Null wheel, you lose your bet. But if we spin the Alternative wheel, you win!</p>
<p>I’m sorry, but we can’t tell you which wheel we’re spinning.</p>
<p>Doesn’t that sound like a good game?</p>
<p>Not convinced yet? I assure you the odds are in your favor <em>if </em>you choose your slot wisely. Look, I’ll show you a graph of some data from the Null wheel. We spun it 10,000 times and counted how many times the ball landed in each slot. As you can see each slot is just as likely as any other, with a probability of about 0.05 each. That means there’s a 95% probability the ball won’t land on your slot, so you have only a 5% chance of losing—no matter what—<em>if</em> we happen to spin the Null wheel.</p>
<p><img alt="histogram of p values for null hypothesis" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dc5efcd7001f33a77bea1c635af837e5/histogram_of_p_values_null_hypothesis.png" style="width: 576px; height: 384px;" /></p>
<p>What about that Alternative wheel, you ask? Well, we’ve had quite a few different Alternative wheels over the years. Here’s a graph of some data from one we were spinning last year:</p>
<p><img alt="histogram of p values from alternative hypothesis" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dd0cafe3375f3202adaf3542d15eb9ab/histogram_of_p_values_alternative_hypothesis.png" style="width: 576px; height: 384px;" /></p>
<p>And just a few months ago, we had a different one. Check out the data from this one. It was very, very popular.</p>
<p><img alt=" histogram of p-values from popular alternative hypothesis" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fc6f0ff641e7eb4d3f7750c8163ac968/histogram_of_p_values_alternative_hypothesis_2.png" style="width: 576px; height: 384px;" /></p>
<p>Now that’s what I call an Alternative! People in the know always picked the first slot. You can see why.</p>
<p>I’m not allowed to show you data from the current game. But I assure you the Alternatives all follow this same pattern. They tend to favor those smaller numbers.</p>
<p>So, you’d like to play? Great! Which slot would you like to bet on?</p>
Is this on the level?
<p>No, I don’t really have a casino with two roulette wheels. My graphs are simulated p-values for a <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-is-a-t-test-and-why-is-it-like-telling-a-kid-to-clean-up-that-mess-in-the-kitchen">1-sample t-test</a>. The null hypothesis is that the mean of a process or population is 5. The two-sided alternative is that the mean is different from 5. In my first graph, the null hypothesis was true: I used Minitab to generate random samples of size 20 from a normal distribution with mean 5 and standard deviation of 1. For the other two graphs, the only thing I changed was the mean of the normal distribution I sampled from. For the second graph, the mean was 5.3. For the final graph, the mean was 5.75.</p>
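<p>You can spin the wheels yourself with a short simulation. The sketch below is written in Python rather than Minitab, so it is my own approximation of the setup described above and not the original procedure. It uses 2,000 spins per wheel (instead of 10,000) to keep it quick, a fixed random seed, and hypothetical helper names such as <code>spin_wheel</code>. Each p-value is sorted into one of 20 slots of width 0.05.</p>

```python
import math
import random
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=200):
    """P(T > t) for t >= 0, using Simpson's rule on the density over [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def one_sample_p_value(sample, null_mean):
    """Two-sided p-value of a 1-sample t-test."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    t_stat = (statistics.mean(sample) - null_mean) / se
    return 2 * t_upper_tail(abs(t_stat), n - 1)


def spin_wheel(true_mean, spins, rng, n=20, sd=1.0, null_mean=5.0):
    """Share of p-values landing in each of 20 slots of width 0.05."""
    counts = [0] * 20
    for _ in range(spins):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        p = one_sample_p_value(sample, null_mean)
        counts[min(int(p * 20), 19)] += 1
    return [c / spins for c in counts]


rng = random.Random(1)  # fixed seed so the run is repeatable
first_slot = {}
for true_mean in (5.0, 5.3, 5.75):
    shares = spin_wheel(true_mean, 2000, rng)
    first_slot[true_mean] = shares[0]
    print(f"true mean {true_mean}: share of p-values below 0.05 = {shares[0]:.3f}")
```

<p>With the true mean at 5, the first slot should collect roughly 5% of the spins, just like every other slot. As the true mean moves away from 5, more and more spins pile up in that first slot.</p>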
<p>For just about any hypothesis test you do in Minitab <a href="http://www.minitab.com/products/minitab">Statistical Software</a>, you will see a p-value. Once you understand how p-values work, you will have greater insight into what they are telling you. Let’s see what we can learn about p-values from playing p-value roulette.</p>
<ol>
<li>Just as you didn’t know whether you were spinning the Null or Alternative wheel, you don’t know for sure whether the null hypothesis is true or not. But basing your decision to reject the null hypothesis on the p-value favors your chance of making a good decision.<br />
</li>
<li>If the null hypothesis is true, then any p-value is just as likely as any other. You control the probability of making a Type I error by rejecting only when the p-value falls within a narrow range, typically 0.05 or smaller. A <a href="http://blog.minitab.com/blog/the-stats-cat/understanding-type-1-and-type-2-errors-from-the-feline-perspective-all-mistakes-are-not-equal">Type I error</a> occurs if you incorrectly reject a true null hypothesis.<br />
</li>
<li>If the alternative hypothesis is true, then smaller p-values become more likely and larger p-values become less likely. That’s why you can think of a small p-value as evidence in favor of the alternative hypothesis.<br />
</li>
<li>It is tempting to try to interpret the p-value as the probability that the null hypothesis is true. But that’s not what it is. The null hypothesis is either true, or it’s not. Each time you “spin the wheel” the ball will land in a different slot, giving you a different p-value. But the truth of the null hypothesis—or lack thereof—remains unchanged.<br />
</li>
<li>In the roulette analogy there were different alternative wheels, because there is not usually just a single alternative condition. There are infinitely many mean values that are not equal to 5; my graphs looked at just two of these.<br />
</li>
<li>The probability of rejecting the null hypothesis when the alternative hypothesis is true is called the power of the test. In the 1-sample t-test, the power depends on how different the mean is from the null hypothesis value, relative to the standard error. While you don’t control the true mean, you can reduce the standard error by taking a larger sample. This will give the test greater power.<br />
</li>
</ol>
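<p>To put rough numbers on the last point, here is a small sketch of how power grows with sample size. It uses a normal approximation to the t-test, which is a simplification I am introducing here and which tends to be a little optimistic for small samples. The function name <code>approx_power</code> is hypothetical.</p>

```python
import math
from statistics import NormalDist


def approx_power(true_mean, null_mean, sd, n, alpha=0.05):
    """Approximate power of a two-sided 1-sample test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Distance between the true mean and the null value, in standard errors.
    shift = (true_mean - null_mean) / (sd / math.sqrt(n))
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


for n in (20, 40, 80):
    power = approx_power(true_mean=5.3, null_mean=5.0, sd=1.0, n=n)
    print(f"n = {n}: approximate power = {power:.2f}")
```

<p>When the true mean equals the null value, the same formula returns the significance level, which is the Type I error rate from point 2.</p>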
You Too Can Be a Winner!
<p>To be a winner at p-value roulette, you need to make sure you are performing the right hypothesis test, and that your data fit the assumptions of that test. Minitab’s <a href="http://www.minitab.com/en-us/products/minitab/assistant/">Assistant menu</a> can help you with that. The Assistant helps you choose the right statistical analysis and provides easy-to-understand guidelines to walk you through data collection and analysis. Then it gives you clear graphical output to let you know how to interpret your p-value, while helping you evaluate whether your data are appropriate, so you can trust your results.</p>
Hypothesis Testing, Statistics, Statistics Help, Stats
Thu, 12 Mar 2015 11:00:00 +0000
http://blog.minitab.com/blog/rkelly/p-value-roulette-making-hypothesis-testing-a-winner%E2%80%99s-game
Rob Kelly
Understanding Hypothesis Tests: Why We Need to Use Hypothesis Tests in Statistics
http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics
<p>Hypothesis testing is an essential procedure in statistics. A hypothesis test evaluates two mutually exclusive statements about a population to determine which statement is best supported by the sample data. When we say that a finding is statistically significant, it’s thanks to a hypothesis test. How do these tests really work and what does statistical significance actually mean?</p>
<p>In this series of three posts, I’ll help you intuitively understand how hypothesis tests work by focusing on concepts and graphs rather than equations and numbers. After all, a key reason to use <a href="http://www.minitab.com/products/minitab/features/">statistical software </a>like Minitab is so you don’t get bogged down in the calculations and can instead focus on understanding your results.</p>
<p>To kick things off in this post, I highlight the rationale for using hypothesis tests with an example.</p>
The Scenario
<p>An economist wants to determine whether the monthly energy cost for families has changed from the previous year, when the mean cost per month was $260. The economist randomly samples 25 families and records their energy costs for the current year. (The data for this example is <a href="http://support.minitab.com/datasets/FamilyEnergyCost.MTW">FamilyEnergyCost</a> and it is just one of the many data set examples that can be found in <a href="http://support.minitab.com/datasets/">Minitab’s Data Set Library</a>.)</p>
<p><img alt="Descriptive statistics for family energy costs" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/135cd05bde9f7f16ab396a8525d2b09c/desc_stats.png" style="width: 302px; height: 87px;" /></p>
<p>I’ll use these descriptive statistics to create a probability distribution plot that shows you the importance of hypothesis tests. Read on!</p>
The Need for Hypothesis Tests
<p>Why do we even need hypothesis tests? After all, we took a random sample and our sample mean of 330.6 is different from 260. That <em>is</em> different, right? Unfortunately, the picture is muddied because we’re looking at a sample rather than the entire population.</p>
<p>Sampling error is the difference between a sample and the entire population. Thanks to sampling error, it’s entirely possible that while our <em>sample </em>mean is 330.6, the population mean could still be 260. Or, to put it another way, if we repeated the experiment, it’s possible that the second sample mean could be close to 260. A hypothesis test helps assess the likelihood of this possibility!</p>
Use the Sampling Distribution to See If Our Sample Mean is Unlikely
<p>For any given random sample, the mean of the sample almost certainly doesn’t equal the true mean of the population due to sampling error. For our example, it’s unlikely that the mean cost for the entire population is exactly 330.6. In fact, if we took multiple random samples of the same size from the same population, we could plot a distribution of the sample means.</p>
<p>A <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/introductory-concepts/basic-concepts/sampling-distribution/" target="_blank">sampling distribution</a> is the distribution of a statistic, such as the mean, that is obtained by repeatedly drawing a large number of samples from a specific population. This distribution allows you to determine the probability of obtaining the sample statistic.</p>
<p>Fortunately, I can create a plot of sample means without collecting many different random samples! Instead, I’ll create a <a href="http://blog.minitab.com/blog/adventures-in-statistics/graphing-distributions-with-probability-distribution-plots" target="_blank">probability distribution plot</a> using the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/probability-distributions-and-random-data/distributions/t-distribution/" target="_blank">t-distribution</a>, the sample size, and the <a href="http://blog.minitab.com/blog/adventures-in-statistics/assessing-variability-for-quality-improvement" target="_blank">variability</a> in our sample to graph the sampling distribution.</p>
<p>Our goal is to determine whether our sample mean is significantly different from the null hypothesis mean. Therefore, we’ll use the graph to see whether our sample mean of 330.6 is unlikely assuming that the population mean is 260. The graph below shows the expected distribution of sample means.</p>
<p><img alt="Sampling distribution plot for the null hypothesis" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/76699c4c1f2bd6c83b88c1ac8e93aa54/sampling_dist_null.png" style="width: 595px; height: 397px;" /></p>
<p>You can see that the most probable sample mean is 260, which makes sense because we’re assuming that the null hypothesis is true. However, there is a reasonable probability of obtaining a sample mean that ranges from 167 to 352, and even beyond! The takeaway from this graph is that while our sample mean of 330.6 is not the most probable, it’s also not outside the realm of possibility.</p>
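<p>If you prefer brute force to theory, you can approximate this picture by actually drawing many samples. The Python sketch below assumes a normally distributed population with mean 260 and a standard deviation of 154.2. Both the normal shape and the 154.2 value are assumptions for illustration, since the text does not state them, and real energy costs may well be skewed.</p>

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
null_mean = 260.0
population_sd = 154.2   # assumption: not stated in the text
n = 25                  # families per sample

# Draw 5,000 random samples of 25 families and keep each sample mean.
sample_means = []
for _ in range(5000):
    sample = [rng.gauss(null_mean, population_sd) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))

center = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)
share_within = sum(167 <= m <= 352 for m in sample_means) / len(sample_means)

print(f"center of the sample means: {center:.1f}")
print(f"spread (standard error):    {spread:.1f}")
print(f"share between 167 and 352:  {share_within:.3f}")
```

<p>The sample means should cluster around 260 with a spread close to 154.2 / √25, and nearly all of them should land between 167 and 352.</p>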
The Role of Hypothesis Tests
<p>We’ve placed our sample mean in the context of all possible sample means while assuming that the null hypothesis is true. Are these results statistically significant?</p>
<p>As you can see, there is no magic place on the distribution curve to make this determination. Instead, we have a continual decrease in the probability of obtaining sample means that are further from the null hypothesis value. Where do we draw the line?</p>
<p>This is where hypothesis tests are useful. A hypothesis test allows us to quantify the probability that our sample mean is unusual. In <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-significance-levels-alpha-and-p-values-in-statistics">my next blog post</a>, I’ll continue to use this graphical framework and add in the significance level and P value to show how hypothesis tests work and what statistical significance means.</p>
<p>If you'd like to see how I made these graphs, please read: <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-create-a-graphical-version-of-the-1-sample-t-test-in-minitab" target="_blank">How to Create a Graphical Version of the 1-sample t-Test</a>.</p>
Data Analysis, Hypothesis Testing, Statistics, Statistics Help, Stats
Thu, 05 Mar 2015 17:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics
Jim Frost
Choosing Between a Nonparametric Test and a Parametric Test
http://blog.minitab.com/blog/adventures-in-statistics/choosing-between-a-nonparametric-test-and-a-parametric-test
<p>It’s safe to say that most people who use statistics are more familiar with parametric analyses than nonparametric analyses. Nonparametric tests are also called distribution-free tests because they don’t assume that your data follow a specific distribution.</p>
<p>You may have heard that you should use nonparametric tests when your data don’t meet the assumptions of the parametric test, especially the assumption about normally distributed data. That sounds like a nice and straightforward way to choose, but there are additional considerations.</p>
<p>In this post, I’ll help you determine when you should use a:</p>
<ul>
<li>Parametric analysis to test group means.</li>
<li>Nonparametric analysis to test group medians.</li>
</ul>
<p>In particular, I'll focus on an important reason to use nonparametric tests that I don’t think gets mentioned often enough!</p>
Hypothesis Tests of the Mean and Median
<p>Nonparametric tests are like a parallel universe to parametric tests. The table shows related pairs of <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/basics/hypothesis-tests-in-minitab/" target="_blank">hypothesis tests</a> that <a href="http://www.minitab.com/en-us/products/minitab/features/" target="_blank">Minitab statistical software</a> offers.</p>
<table>
<thead>
<tr>
<th style="text-align: center;">Parametric tests (means)</th>
<th style="text-align: center;">Nonparametric tests (medians)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center;">1-sample t test</td>
<td style="text-align: center;">1-sample Sign, 1-sample Wilcoxon</td>
</tr>
<tr>
<td style="text-align: center;">2-sample t test</td>
<td style="text-align: center;">Mann-Whitney test</td>
</tr>
<tr>
<td style="text-align: center;">One-Way ANOVA</td>
<td style="text-align: center;">Kruskal-Wallis, Mood’s median test</td>
</tr>
<tr>
<td style="text-align: center;">Factorial DOE with one factor and one blocking variable</td>
<td style="text-align: center;">Friedman test</td>
</tr>
</tbody>
</table>
Reasons to Use Parametric Tests
<p><strong>Reason 1: Parametric tests can perform well with skewed and nonnormal distributions</strong></p>
<p>This may be a surprise but parametric tests can perform well with continuous data that are nonnormal if you satisfy these sample size guidelines.</p>
<table>
<thead>
<tr>
<th style="text-align: center;">Parametric analyses</th>
<th style="text-align: center;">Sample size guidelines for nonnormal data</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center;">1-sample t test</td>
<td style="text-align: center;">Greater than 20</td>
</tr>
<tr>
<td style="text-align: center;">2-sample t test</td>
<td style="text-align: center;">Each group should be greater than 15</td>
</tr>
<tr>
<td style="text-align: center;">One-Way ANOVA</td>
<td>
<ul>
<li style="text-align: center;">If you have 2-9 groups, each group should be greater than 15.</li>
<li style="text-align: center;">If you have 10-12 groups, each group should be greater than 20.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p><strong>Reason 2: Parametric tests can perform well when the spread of each group is different</strong></p>
<p>While nonparametric tests don’t assume that your data follow a normal distribution, they do have other assumptions that can be hard to meet. For nonparametric tests that compare groups, a common assumption is that the data for all groups must have the same spread (dispersion). If your groups have a different spread, the nonparametric tests might not provide valid results.</p>
<p>On the other hand, if you use the 2-sample t test or One-Way ANOVA, you can simply go to the <strong>Options</strong> subdialog and uncheck <em>Assume equal variances</em>. Voilà, you’re good to go even when the groups have different spreads!</p>
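<p>For the curious, a 2-sample t test that does not assume equal variances is commonly computed with Welch's method, and I am assuming that is what happens behind this option. The Python sketch below shows that calculation on made-up measurements from two hypothetical production lines with very different spreads. The function name <code>welch_t</code> and the data are purely illustrative.</p>

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and its approximate degrees of freedom."""
    # Squared standard error contributed by each group.
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t_stat = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t_stat, df


# Hypothetical data: similar centers, very different spreads.
line_a = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2]
line_b = [18.0, 22.5, 19.1, 23.4, 17.2, 21.8]

t_stat, df = welch_t(line_a, line_b)
print(f"t = {t_stat:.3f}, degrees of freedom = {df:.2f}")
```

<p>Notice that the degrees of freedom fall well below the pooled value of 10. That is the price the test pays for not assuming equal spreads.</p>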
<p><strong>Reason 3: Statistical power</strong></p>
<p>Parametric tests usually have more <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/power-and-sample-size/what-is-power/" target="_blank">statistical power</a> than nonparametric tests. Thus, you are more likely to detect a significant effect when one truly exists.</p>
Reasons to Use Nonparametric Tests
<p><strong>Reason 1: Your area of study is better represented by the median</strong></p>
<p><img alt="Comparing two skewed distributions" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/7223b01bc095dbd652bd863be5288cfe/mean_or_median.png" style="float: right; width: 200px; height: 181px; margin: 10px 15px;" />This is my favorite reason to use a nonparametric test and the one that isn’t mentioned often enough! The fact that you <em>can</em> perform a parametric test with nonnormal data doesn’t imply that the mean is the best <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/summary-statistics/measures-of-central-tendency/" target="_blank">measure of the central tendency</a> for your data.</p>
<p>For example, the center of a skewed distribution, like income, can be better measured by the median where 50% are above the median and 50% are below. If you add a few billionaires to a sample, the mathematical mean increases greatly even though the income for the typical person doesn’t change.</p>
<p>When your distribution is skewed enough, the mean is strongly affected by changes far out in the distribution’s tail whereas the median continues to more closely reflect the center of the distribution. For these two distributions, a random sample of 100 from each distribution produces means that are significantly different, but medians that are not significantly different.</p>
<p>Two of my colleagues have written excellent blog posts that illustrate this point:</p>
<ul>
<li>Michelle Paret: <a href="http://blog.minitab.com/blog/michelle-paret/using-the-mean-its-not-always-a-slam-dunk" target="_blank">Using the Mean in Data Analysis: It’s Not Always a Slam-Dunk</a></li>
<li>Redouane Kouiden: <a href="http://blog.minitab.com/blog/statistics-for-lean-six-sigma/the-non-parametric-economy-what-does-average-actually-mean" target="_blank">The Non-parametric Economy: What Does Average Actually Mean?</a></li>
</ul>
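<p>The billionaire example is easy to try for yourself. In this quick Python sketch the income figures are invented purely for illustration.</p>

```python
import statistics

# Hypothetical annual incomes for seven typical earners.
incomes = [32_000, 41_000, 45_000, 52_000, 58_000, 63_000, 75_000]
# The same sample after two hypothetical billionaires join it.
with_billionaires = incomes + [1_000_000_000, 2_000_000_000]

for label, data in (("original sample", incomes),
                    ("with billionaires", with_billionaires)):
    print(f"{label}: mean = {statistics.mean(data):,.0f}, "
          f"median = {statistics.median(data):,.0f}")
```

<p>The mean jumps by several orders of magnitude, while the median barely moves.</p>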
<p><strong>Reason 2: You have a very small sample size</strong></p>
<p>If you don’t meet the sample size guidelines for the parametric tests and you are not confident that you have normally distributed data, you should use a nonparametric test. When you have a really small sample, you might not even be able to ascertain the distribution of your data because the distribution tests will lack sufficient power to provide meaningful results.</p>
<p>In this scenario, you’re in a tough spot with no valid alternative. Nonparametric tests have less power to begin with and it’s a double whammy when you add a small sample size on top of that!</p>
<p><strong>Reason 3: You have ordinal data, ranked data, or outliers that you can’t remove</strong></p>
<p>Typical parametric tests can only assess continuous data and the results can be significantly affected by outliers. Conversely, some nonparametric tests can handle ordinal data and ranked data, and are not seriously affected by outliers. Be sure to check the assumptions for the nonparametric test because each one has its own data requirements.</p>
Closing Thoughts
<p>It’s commonly thought that the need to choose between a parametric and nonparametric test occurs when your data fail to meet an assumption of the parametric test. This can be the case when you have both a small sample size and nonnormal data. However, other considerations often play a role because parametric tests can often handle nonnormal data. Conversely, nonparametric tests have strict assumptions that you can’t disregard.</p>
<p>The decision often depends on whether the mean or median more accurately represents the center of your data’s distribution.</p>
<ul>
<li>If the mean accurately represents the center of your distribution and your sample size is large enough, consider a parametric test because parametric tests are more powerful.</li>
<li>If the median better represents the center of your distribution, consider the nonparametric test even when you have a large sample.</li>
</ul>
<p>Finally, if you have a very small sample size, you might be stuck using a nonparametric test. Please, collect more data next time if it is at all possible! As you can see, the sample size guidelines aren’t really that large. Your chance of detecting a significant effect when one exists can be very small when you have a small sample size and also need to use a less efficient nonparametric test!</p>
Hypothesis Testing, Statistics, Statistics Help
Thu, 19 Feb 2015 13:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics/choosing-between-a-nonparametric-test-and-a-parametric-test
Jim Frost