Hypothesis Testing | Minitab
Blog posts and articles about hypothesis testing, especially in the course of Lean Six Sigma quality improvement projects.
http://blog.minitab.com/blog/hypothesis-testing-2/rss
Tue, 25 Jul 2017 00:41:11 +0000
Gleaning Insights from Election Data with Basic Statistical Tools
http://blog.minitab.com/blog/statistics-and-more/gleaning-insights-from-election-data-with-basic-statistical-tools
<p>One of the biggest pieces of international news last year was the so-called "Brexit" referendum, in which a majority of voters in the United Kingdom cast their ballots to leave the European Union (EU).</p>
<p><img alt="Polling station in the United Kingdom" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ccd0e2bc94b8749ffd9e4ecf49ccd179/polling_station.jpg" style="width: 300px; height: 244px; margin: 10px 15px; float: right;" />That outcome shocked the world. Follow-up media coverage has asserted that the younger generation prefers to remain in the EU since that means more opportunities on the continent. The older generation, on the other hand, prefers to leave the EU.</p>
<p>As a statistician, I wanted to look at the data to see what I could find out about the Brexit vote, and recently the BBC <a href="http://www.bbc.co.uk/news/uk-politics-38762034">published an article</a> that included some detailed data.</p>
<p>In this post, I'll use Minitab Statistical Software to explore the data from the BBC site along with the <a href="https://www.electoralcommission.org.uk/find-information-by-subject/elections-and-referendums/past-elections-and-referendums/eu-referendum">data from the Electoral Commission website</a>. I hope this exploration will give you some ideas about how you might use publicly available data to get insights about your customers or other aspects of your business.</p>
<p>The Electoral Commission data contains the voting details of all 382 voting areas in the United Kingdom. It includes information on voter turnout, the percent who voted to leave the EU, and the percent who voted to remain. (If you'd like to follow along, open the <a href="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d851330cd1b38a9afba9cf524c3353e7/brexitdata1.mtw">BrexitData1</a> and <a href="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/eb505bc52b3837fa15673329836d76d3/brexitdata2.mtw">BrexitData2</a> worksheets in Minitab 18. If you don't already have Minitab, you can <a href="http://www.minitab.com/products/minitab/free-trial/">download the 30-day trial</a>.)</p>
<p>I began by creating scatterplots (in Minitab, go to <strong>Graph > Scatterplot...</strong>) of the percentage of voter turnout against the percentage of the population that voted to leave for each region, as shown below.</p>
<p style="margin-left: 40px;"><img alt="Scatterplot of Brexit Voter Data1" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/54eb0e04d859e1401f55def43f23bb24/brexit_scatterplot_1.png" style="width: 577px; height: 385px;" /></p>
<p style="margin-left: 40px;"><img alt="Scatterplot of Brexit Voter Data, #2" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/01aa20381fe1d7e4cb3098af1eba36f4/brexit_scatterplot_2.png" style="width: 577px; height: 385px;" /></p>
<p>According to commentators, areas with high voter turnout had a tendency to vote to leave, as the elderly were more likely to turn up to vote. There is also a perceptible difference between the plots for the different areas.</p>
<p>To make this easier to analyze, I created an indicator variable called “decided to leave” in my Minitab worksheet. This variable takes the value of 1 if the area voted to leave the EU, and takes the value 0 otherwise. Tallying the number of areas in each region that voted to leave or remain (<strong>Stat > Tables > Tally Individual Variables...</strong>) yields the following:</p>
<p style="margin-left: 40px;"><img alt="Tabulated Brexit Statistics: Region, Decided to Leave" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e2223885372be6070d6a403cba2a1604/tabulated_statistics_region_decided_to_leave.jpg" style="width: 503px; height: 426px;" /></p>
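<p>For readers without Minitab, the indicator-and-tally step can be sketched in a few lines of Python. This is only an illustration: the rows below are made-up placeholders rather than the Electoral Commission figures, and the rule "more than 50% voted to leave" is my assumption about how the “decided to leave” indicator is defined.</p>

```python
from collections import Counter

# Hypothetical rows: (region, area, percent voting to leave). Not the real data.
areas = [
    ("London", "Area A", 39.8),
    ("London", "Area B", 52.1),
    ("North East", "Area C", 61.3),
    ("North East", "Area D", 57.5),
    ("Scotland", "Area E", 38.0),
]

def decided_to_leave(pct_leave):
    """Indicator: 1 if more than half of the voters chose leave, else 0 (assumed rule)."""
    return 1 if pct_leave > 50 else 0

# Count areas by (region, indicator), similar in spirit to a tally table.
tally = Counter((region, decided_to_leave(pct)) for region, _, pct in areas)

for (region, flag), count in sorted(tally.items()):
    print(f"{region:<12} decided_to_leave={flag}  areas={count}")
```

<p>Each printed row corresponds to one cell of the tally: a region, the value of the indicator, and the number of areas that fall into that combination.</p>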
<p>There are indeed regional differences. For example, London and Scotland voted strongly to remain while the North East and North West voted strongly to leave. So, do we see greater voter turnout in the regions that voted to leave? Looking at the average turnout in each region (using <strong>Stat > Basic Statistics > Display Descriptive Statistics...</strong>), we have the following:</p>
<p style="margin-left: 40px;"><img alt="Brexit Data - Descriptive Statistics" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/de2ce324872b704f7bd93a2ff2954d8a/descriptive_statistics_percent_turnout.jpg" style="width: 380px; height: 346px;" /></p>
<p>Surprisingly, the average turnout of regions that voted strongly to leave is not very different from the turnout of regions that voted strongly to remain. For example, the average turnout was 69.817% in London, compared to 70.739% in the North West.</p>
<p>The data set analyzed in the BBC article contains localised voting data supplied to the BBC by councils which counted the EU referendum. This data is more detailed than the regional data from the Electoral Commission, and it includes a detailed breakdown of how the people in individual electoral wards voted.</p>
<p>The BBC asked all the counting areas for these figures. Three councils did not reply. The remaining missing data could be due to any of the following reasons:</p>
<ul>
<li>The council refused to give the information to the BBC.</li>
<li>No geographical information was available because all ballot boxes were mixed before counting.</li>
<li>The council conducted a number of mini-counts that combined ballot boxes in a way that does not correspond to individual wards.</li>
</ul>
<p>For those wards that have voting data, I also gathered the following information from the last census for each area.</p>
<ul>
<li>Percent of population in an area with level 4 qualification or higher. This includes individuals with a higher certificate/diploma, foundation degree, undergraduate degree, or master’s degree up to a doctorate. I will call this variable “degree” to represent individuals holding degrees or equivalent qualification.</li>
<li>Percentage of young people (age 18-29) in an area.</li>
<li>Percentage of middle-aged (age 30-59) in an area.</li>
<li>Percentage of elderly (age 65 or above) in an area.</li>
</ul>
<p>There is some difference in how some wards are defined between this data set and the data from the last census, perhaps due to changes in ward boundaries. Thus, for some wards, it was not possible to match the corresponding percentages of different age groups and degree holders. Therefore, some areas had to be omitted from my analysis, leaving me with data from a total of 1,069 wards.</p>
<p>With the exception of Scotland, Northern Ireland, and Wales, I have data from wards in all regions of the UK. The number of measurements from each region appears below.</p>
<p style="margin-left: 40px;"><img alt="Brexit Data, Descriptive Statistics N" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6c992d074b0c8616f39ebced04f84a8d/descriptive_statistics_n_brexit_data.png" style="width: 418px; height: 312px;" /></p>
<p>As with the Electoral Commission data, let’s begin by looking at some graphs. Below is a scatterplot of the percentage voting to leave against the percent of the population with a degree in an area.</p>
<p style="margin-left: 40px;"><img alt="Scatterplot of Brexit Data: Leave % vs. Degree" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3f343dda6be65f6b25e657f39129b540/brexit_scatterplot_3.png" style="width: 577px; height: 385px;" /></p>
<p>As you can see, the higher the percentage of people in an area who had a degree, the lower the percentage of the population that voted to leave. However, there are exceptions. For example, in Osterley and Spring Grove in Hounslow, 63.41% voted to leave even though the percentage of degree holders is relatively high, at 37.5566%. However, the area has a small proportion of young adults, at 19.3538%.</p>
<p>Let's look at the voting behaviour for different age groups. I created scatterplots of the percentage that voted to leave against different age groups.</p>
<p>The next plot shows percentage that voted to leave against the percentage of young people (age 18-29) in an area:</p>
<p style="margin-left: 40px;"><img alt="Scatterplot of Brexit Data: Leave% vs Young" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/83203f8cfa58ea8a30e6e3a0b39d5d23/brexit_scatterplot_4.png" style="width: 577px; height: 385px;" /></p>
<p>Areas with a higher percentage of young people appear to have a smaller percentage of people who voted to leave.</p>
<p>The following plot shows the percentage of the population that voted to leave against the percentage of elderly residents:</p>
<p style="margin-left: 40px;"><img alt="Scatterplot of Brexit Data: Leave% vs. Elderly" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/708bbc59b66d9eb6028472cb3e7feb1f/brexit_scatterplot_5.png" style="width: 577px; height: 385px;" /></p>
<p>This plot shows the opposite of the situation in the previous one: areas with a higher proportion of elderly residents voted more strongly to leave.</p>
<p>These scatterplots support what’s being said in pieces such as the article on the BBC's website. However, in statistics, we like to verify that the relationship is significant. Let’s look at the correlation coefficients (<strong>Stat > Basic Statistics > Correlation...</strong>).</p>
<p style="margin-left: 40px;"><img alt="Brexit Data: Correlation - Leave%, Degree, Young, Elderly" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a3eaedc8c0404005e8caec42699d8ad1/correlation_coefficients.jpg" style="width: 340px; height: 416px;" /></p>
<p>The correlation output in Minitab includes a <a href="http://blog.minitab.com/blog/adventures-in-statistics-2/how-to-correctly-interpret-p-values">p-value</a>. If the p-value is less than the chosen significance level, the correlation coefficient is significantly different from 0; in other words, there is evidence that a correlation exists. We selected an alpha value (or significance level) of 0.05, and every p-value in the output is below it, so all the coefficients calculated above are significant and there are correlations between these factors.</p>
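<p>To make the logic of "compare the p-value with alpha" concrete, here is a minimal Python sketch. It does not reproduce Minitab's calculation: instead of the usual t-based p-value it uses a permutation test, which asks how often shuffled data produce a correlation at least as strong as the observed one. The ward percentages are invented for illustration, and the function names are my own.</p>

```python
import random
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_p_value(x, y, n_perm=5000, seed=1):
    """Two-sided permutation p-value for H0: no correlation between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)  # break any real link between x and y
        if abs(pearson_r(x, y_shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical ward-level percentages (not the BBC or census data).
degree = [12.0, 18.5, 22.1, 27.4, 31.0, 36.2, 41.8, 47.5, 52.3, 58.9]
leave = [71.2, 66.0, 64.3, 58.1, 55.7, 49.9, 46.2, 38.5, 35.1, 27.8]

alpha = 0.05
r = pearson_r(degree, leave)
p = permutation_p_value(degree, leave)
print(f"r = {r:.3f}, permutation p-value = {p:.4f}, significant: {p < alpha}")
```

<p>The decision rule is the same as in the Minitab output: the correlation is called significant only when the p-value falls below the chosen alpha.</p>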
<p>Thus, the proportion of degree holders in an area is strongly and negatively associated with the percentage voting to leave. The proportion of elderly residents in an area, on the other hand, is strongly and positively associated with it.</p>
<p>Going a step further, I fit a regression model (<strong>Stat > Regression > Regression > Fit Regression Model...</strong>) that links the percent voting to leave with the proportion of degree holders and different age groups.</p>
<p style="margin-left: 40px;"><img alt="Brexit Data Regression: Leave% vs Degree, Young, Middle-age, Elderly" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d9f5483603ddb35ffdea02a3c0f856e6/brexit_regression_output.jpg" style="width: 695px; height: 655px;" /></p>
<p>While there is no need to use the equation to make a prediction, we can still get some interesting information from the results.</p>
<p>The different age groups and proportion of degree holders all have an effect on the percentage voting to leave. The coefficient for the “degree” term is negative, which implies that for each one-point increase in the percent of degree holders, the percentage voting to leave drops by 1.4095 points, with the other terms held fixed. On the other hand, for a one-point increase in the percentage of elderly, the percentage voting to leave increases by 1.2732 points. In addition, there is a significant interaction between the percentage of degree holders and young people: every unit increase in this interaction term increases the percent voting to leave by only 0.00641.</p>
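<p>One way to read the interaction is that the slope for “degree” is not constant but shifts slightly with the share of young people. The short Python sketch below works that out from the two coefficients quoted above. It assumes the interaction enters the model as a plain product of the two percentages (Degree × Young); the full regression output is only shown as an image, so treat this as an illustration rather than a restatement of the fitted model.</p>

```python
# Coefficients quoted in the text. Assumption: the interaction is the plain
# product Degree x Young of the two percentages.
COEF_DEGREE = -1.4095
COEF_DEGREE_X_YOUNG = 0.00641

def degree_slope(young_pct):
    """Change in Leave% per one-point rise in Degree, at a given Young%."""
    return COEF_DEGREE + COEF_DEGREE_X_YOUNG * young_pct

for young_pct in (10.0, 19.3538, 40.0):
    print(f"Young = {young_pct:7.4f}%  ->  slope for Degree = {degree_slope(young_pct):.4f}")
```

<p>Under that assumption the slope stays clearly negative across the whole range of young-adult percentages, which is consistent with the interaction being significant but small.</p>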
<p>The results I obtained when I analyzed the data with Minitab support the commonly held view that younger voters preferred to remain in the EU, while older voters preferred to leave. The analysis also underscores the complicated politics surrounding Brexit, a reality that became apparent in the recent general election. One thing seems certain now that Brexit talks are imminent: balancing the needs and desires of the people from different age groups and backgrounds will be a tremendous task.</p>
Data Analysis, Government, Hypothesis Testing, Regression Analysis, Statistics, Statistics in the News
Mon, 26 Jun 2017 12:44:51 +0000
http://blog.minitab.com/blog/statistics-and-more/gleaning-insights-from-election-data-with-basic-statistical-tools
Eugenie Chung
The Null Hypothesis: Always “Busy Doing Nothing”
http://blog.minitab.com/blog/using-data-and-statistics/the-null-hypothesis-always-busy-doing-nothing
<p>The 1949 film <a href="http://www.imdb.com/title/tt0041259/" target="_blank"><em>A Connecticut Yankee in King Arthur's Court</em></a> includes the song “Busy Doing Nothing,” and this could be written about the <a href="http://blog.minitab.com/blog/understanding-statistics/things-statisticians-say-failure-to-reject-the-null-hypothesis">Null Hypothesis</a> as it is used in statistical analyses. </p>
<p>The words to the song go:</p>
<p style="margin-left: 40px;"><em>We're busy doin' nothin'<br />
<span style="line-height: 1.6;">Workin' the whole day through<br />
Tryin' to find lots of things not to do </span></em></p>
<p><span style="line-height: 1.6;">And that summarises the role of the Null Hypothesis perfectly. Let me explain why.</span></p>
<span style="line-height: 1.6;">What's the Question?</span>
<p>Before doing any statistical analysis—in fact even before we collect any data—we need to define what problem and/or question we need to answer. Once we have this, we can then work on defining our Null and Alternative Hypotheses.</p>
<p>The null hypothesis is always the option that maintains the status quo and results in the least amount of disruption, hence it is “Busy Doin’ Nothin'”. </p>
<p>When the data we observe would be very unlikely if the Null Hypothesis were true (that is, when the p-value is very low), we reject the Null Hypothesis. Then we will have to take some action and we will no longer be “Doin’ Nothin’”.</p>
<p>Let’s have a look at how this works in practice with some common examples.</p>
<table>
<tr><th>Question</th><th>Null Hypothesis</th></tr>
<tr><td>Do the chocolate bars I am selling weigh 100g?</td><td>Chocolate Weight = 100g<br />
<br />
If I am giving my customers the right size chocolate bars I don’t need to make changes to my chocolate packing process.</td></tr>
<tr><td>Are the diameters of my bolts normally distributed?</td><td>Bolt diameters are normally distributed.<br />
<br />
If my bolt diameters are normally distributed I can use any statistical techniques that use the standard normal approach.</td></tr>
<tr><td>Does the weather affect how my strawberries grow?</td><td>Number of hours sunshine has no effect on strawberry yield<br />
<br />
Amount of rain has no effect on strawberry yield<br />
<br />
Temperature has no effect on strawberry yield</td></tr>
</table>
<p>Note that the last instance in the table, investigating if weather affects the growth of my strawberries, is a bit more complicated. That's because I needed to define some metrics to measure the weather. Once I decided that the weather was a combination of sunshine, rain and temperature, I established my null hypotheses. These all assume that none of these factors impact the strawberry yield. I only need to control the sunshine, temperature and rain if the data give strong evidence against the hypothesis that they have no effect.</p>
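<p>The chocolate bar question from the table can be sketched as a one-sample t-test in Python. The ten weights below are invented for illustration, and the critical value 2.262 is the standard two-sided t-table value for alpha = 0.05 with 9 degrees of freedom; with a different sample size you would need a different critical value (or software that reports the p-value directly).</p>

```python
import statistics

# Hypothetical weights in grams of 10 sampled chocolate bars (not real data).
weights = [99.2, 100.4, 100.1, 99.7, 100.9, 99.5, 100.3, 99.8, 100.6, 99.9]
target = 100.0  # null hypothesis: mean chocolate weight = 100 g

n = len(weights)
mean = statistics.fmean(weights)
sd = statistics.stdev(weights)  # sample standard deviation
t_stat = (mean - target) / (sd / n ** 0.5)

# Two-sided critical value of Student's t for alpha = 0.05 and n - 1 = 9
# degrees of freedom, taken from a standard t table.
T_CRITICAL = 2.262

reject_null = abs(t_stat) > T_CRITICAL
if reject_null:
    print(f"t = {t_stat:.3f}: reject the null and review the packing process")
else:
    print(f"t = {t_stat:.3f}: fail to reject the null and carry on doing nothing")
```

<p>With these made-up weights the test statistic is well inside the critical value, so the null hypothesis stays in place and no change to the packing process is called for.</p>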
Is Your Null Hypothesis Suitably Inactive?
<p><span style="line-height: 1.6;">So in conclusion, in order to be “Busy Doin’ Nothin’”, your Null Hypothesis has to be as follows:</span></p>
<ul>
<li>A logical question.</li>
<li>Focused on one objective.</li>
<li>Requires action only if <a href="http://blog.minitab.com/blog/michelle-paret/alphas-p-values-confidence-intervals-oh-my">the p-value</a> is low (typically below 5%).</li>
</ul>
Hypothesis Testing, Statistics
Fri, 26 May 2017 12:00:00 +0000
http://blog.minitab.com/blog/using-data-and-statistics/the-null-hypothesis-always-busy-doing-nothing
Gillian Groom
How Can a Similar P-Value Mean Different Things?
http://blog.minitab.com/blog/understanding-statistics/how-can-a-similar-p-value-mean-different-things
<p>One highlight of writing for and editing the Minitab Blog is the opportunity to read your responses and answer your questions. Sometimes, to my chagrin, you point out that we've made a mistake. However, I'm particularly grateful for those comments, because they permit us to correct inadvertent errors.</p>
<p><img alt="opposites" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/23935a16810d11052612b597b5c0d6f2/opposites.jpg" style="width: 302px; height: 219px; float: right; border-width: 0px; border-style: solid; margin: 10px 15px;" />I feared I had an opportunity to fix just such an error when I saw this comment appear on one of our older blog posts:</p>
<p style="margin-left: 40px;"><em>You said a p-value greater than 0.05 gives a good fit. However, in another post, you say the p-value should be below 0.05 if the result is significant. Please, check it out!</em></p>
<p>You ever get a chill down your back when you realize you goofed? That's what I felt when I read that comment. <em>Oh no,</em> I thought. <em>If the p-value is greater than 0.05, the results of a test certainly wouldn't be significant. Did I overlook an error that basic? </em></p>
<p>Before beating myself up about it, I decided to check out the posts in question. After reviewing them, I realized I wouldn't need to put on the hairshirt after all. But the question reminded me about the importance of a fundamental idea. </p>
It Starts with the Hypothesis
<p>If you took an introductory statistics course at some point, you probably recall the instructor telling the class how important it is to formulate your hypotheses clearly. <em>Excellent </em>advice.</p>
<p>However, many commonly used statistical tools formulate their hypotheses in ways that don't quite match. That's what this sharp-eyed commenter noticed and pointed out.</p>
<p>The writer of the first post detailed how to use Minitab to <a href="http://blog.minitab.com/blog/meredith-griffith/identifying-the-distribution-of-your-data">identify the distribution of your data</a>, and in her example pointed out that a p-value greater than 0.05 meant that the data were a good fit for a given distribution. The writer of the second post (yours truly) commented on the alarming tendency to use deceptive language to <a href="http://blog.minitab.com/blog/understanding-statistics/what-can-you-say-when-your-p-value-is-greater-than-005">describe a high p-value as if it indicated statistical significance</a>.</p>
<p>To put it in plain language, my colleague's post cited the high p-value as an indicator of a positive result. And my post chided people who cite a high p-value as an indicator of a positive result. </p>
<p>Now, what's so confusing about that? </p>
Don't Forget What You're Actually Testing
<p>You can see where this looks like a contradiction, but to my relief, the posts were consistent. The appearance of contradiction stemmed from the hypotheses discussed in the two posts. Let's take a look. </p>
<p>My colleague presented this graph, output from the Individual Distribution Identification:</p>
<p style="margin-left: 40px;"><img alt="Probability Plot" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/809b9e485f550984b027a447a547eb76/distribution_id_plot_for_weight_graph1.jpg" style="width: 576px; height: 384px;" /></p>
<p>The individual distribution identification is a kind of hypothesis test, and so the p-value helps you determine whether or not to reject the null hypothesis.</p>
<p>Here, the null hypothesis is "The data follow a normal distribution," and the alternative hypothesis would be "The data DO NOT follow a normal distribution." If the p-value is over 0.05, we will <em>fail to reject</em> the null hypothesis and treat the normal distribution as an adequate fit for the data.</p>
<p>Just have a look at that p-value:</p>
<p style="margin-left: 40px;"><img alt="P value" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fe3a6a4411f22f6352b1a32c65920449/p_value.gif" style="width: 103px; height: 88px;" /></p>
<p><em>That's </em>a high p-value. And for this test, that means we can conclude the normal distribution fits the data. So if we're checking these data for the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/quality-tools/control-charts/data/normality-assumptions-for-control-charts/">assumption of normality</a>, this high p-value is good. </p>
<p>But more often we're looking for a low p-value. In a t-test, the null hypothesis might be "The sample means ARE NOT different," and the alternative hypothesis, "The sample means ARE different." Seen this way, the value or arrangement of the hypotheses is the opposite of that in the distribution identification. </p>
<p>Hence, the apparent contradiction. But in both cases a p-value greater than 0.05 means we fail to reject the null hypothesis. We're interpreting the p-value in each test the same way.</p>
<p>However, because the connotations of "good" and "bad" are different in the two examples, how we talk about these respective p-values appears contradictory—until we consider exactly what the null and alternative hypotheses are saying. </p>
<p>And that's a point I was happy to be reminded of. </p>
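<p>The point can be boiled down to a tiny Python sketch: the decision rule never changes, only the wording of the null hypothesis does. The p-values here are hypothetical, and the function name is my own.</p>

```python
def decide(p_value, alpha=0.05):
    """Same rule for every test: reject H0 only when p < alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# Hypothetical p-values, for illustration only.
tests = [
    ("Distribution identification", "The data follow a normal distribution", 0.42),
    ("Two-sample t-test", "The sample means are not different", 0.42),
]

for name, null_hypothesis, p in tests:
    print(f"{name}: H0 = '{null_hypothesis}', p = {p} -> {decide(p)}")
```

<p>Both lines print the same decision. Whether that decision is welcome news depends entirely on what the null hypothesis says: a normal fit we were hoping for in the first case, and no detectable difference in the second.</p>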
Fun Statistics, Hypothesis Testing, Statistics, Statistics Help, Stats
Fri, 05 May 2017 12:00:00 +0000
http://blog.minitab.com/blog/understanding-statistics/how-can-a-similar-p-value-mean-different-things
Eston Martz
Which Statistical Error Is Worse: Type 1 or Type 2?
http://blog.minitab.com/blog/understanding-statistics/which-statistical-error-is-worse%3A-type-1-or-type-2
<p>People can make mistakes when they test a hypothesis with statistical analysis. Specifically, they can make either Type I or Type II errors.</p>
<p>As you analyze your own data and test hypotheses, understanding the difference between Type I and Type II errors is extremely important, because there's a risk of making each type of error in every analysis, and the amount of risk is in your control. </p>
<p><img alt="What's the worst that could happen? " src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/7a1f815604183cc05ca6569a0480edb2/trainwreck.png" style="width: 350px; height: 381px; margin: 10px 15px; float: right; border-width: 1px; border-style: solid;" />So if you're testing a hypothesis about a safety or quality issue that could affect people's lives, or a project that might save your business millions of dollars, which type of error has more serious or costly consequences? Is there one type of error that's more important to control than another? </p>
<p>Before we attempt to answer that question, let's review what these errors are. </p>
The Null Hypothesis and Type 1 and 2 Errors
<div>When statisticians refer to Type I and Type II errors, we're talking about the two ways we can make a mistake regarding the null hypothesis (H0). The null hypothesis is the default position, akin to the idea of "innocent until proven guilty." We begin any hypothesis test with the assumption that the null hypothesis is correct. </div>
<div> </div>
<div>We commit a Type 1 error if we reject the null hypothesis when it is true. This is a false positive, like a fire alarm that rings when there's no fire.</div>
<div> </div>
<div>A Type 2 error happens if we <a href="http://blog.minitab.com/blog/understanding-statistics/why-shrewd-experts-fail-to-reject-the-null-every-time">fail to reject the null</a> when it is not true. This is a false negative—like an alarm that fails to sound when there <em>is </em>a fire.</div>
<div> </div>
<div>It's easier to understand in the table below, which you'll see a version of in every statistical textbook:</div>
<div> </div>
<div>
<table>
<tr><th>Reality</th><th><em>Null (H0) not rejected</em></th><th><em>Null (H0) rejected</em></th></tr>
<tr><td>Null (H0) is true.</td><td>Correct conclusion.</td><td><strong><span style="color: rgb(255, 0, 0);">Type 1 error</span></strong></td></tr>
<tr><td>Null (H0) is false.</td><td><strong><span style="color: rgb(255, 0, 0);">Type 2 error</span></strong></td><td>Correct conclusion.</td></tr>
</table>
</div>
<p>These errors relate to the statistical concepts of risk, significance, and power.</p>
Reducing the Risk of Statistical Errors
<p>Statisticians call the risk, or probability, of making a Type I error "alpha," aka "significance level." In other words, it's your willingness to risk rejecting the null when it's true. Alpha is commonly set at 0.05, which is a 5 percent chance of rejecting the null when it is true. The lower the alpha, the less your risk of rejecting the null incorrectly. In life-or-death situations, for example, an alpha of 0.01 reduces the chance of a Type I error to just 1 percent.<br />
<br />
A Type 2 error relates to the concept of "power," and the probability of making this error is referred to as "beta"; power is 1 minus beta. We can reduce our risk of making a Type II error by making sure our test has enough power, which depends on whether the sample size is sufficiently large to detect a difference when it exists.</p>
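<p>A small simulation makes alpha and power tangible. The Python sketch below is an illustration under simplifying assumptions of my own: normally distributed data, a known standard deviation of 1 (so a z-test applies), and a true shift of 0.5 when the null is false. It estimates how often the test rejects the null when the null is true (the Type 1 error rate) and when it is false (the power) for two sample sizes.</p>

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()

def rejects_null(sample, mu0, sigma, alpha):
    """Two-sided z-test of H0: mean = mu0, with sigma assumed known."""
    z = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return p_value < alpha

def rejection_rate(true_mean, n, mu0=0.0, sigma=1.0, alpha=0.05,
                   trials=4000, seed=7):
    """Share of simulated samples in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = sum(
        rejects_null([rng.gauss(true_mean, sigma) for _ in range(n)],
                     mu0, sigma, alpha)
        for _ in range(trials)
    )
    return rejections / trials

type_1_rate = rejection_rate(true_mean=0.0, n=20)  # H0 is true
power_small = rejection_rate(true_mean=0.5, n=10)  # H0 is false, small sample
power_large = rejection_rate(true_mean=0.5, n=50)  # H0 is false, larger sample

print(f"Type 1 error rate (alpha = 0.05): {type_1_rate:.3f}")
print(f"Power with n = 10: {power_small:.3f} (beta = {1 - power_small:.3f})")
print(f"Power with n = 50: {power_large:.3f} (beta = {1 - power_large:.3f})")
```

<p>The Type 1 error rate should sit close to the chosen alpha regardless of sample size, while the power rises, and beta falls, as the sample gets larger.</p>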
The Default Argument for "Which Error Is Worse"
<p>Let's return to the question of which error, Type 1 or Type 2, is worse. The go-to example to help people think about this is a defendant accused of a crime that demands an extremely harsh sentence.</p>
<p>The null hypothesis is that the defendant is innocent. Of course you wouldn't want to let a guilty person off the hook, but most people would say that sentencing an innocent person to such punishment is a worse consequence.</p>
<p>Hence, many textbooks and instructors will say that the Type 1 (false positive) is worse than a Type 2 (false negative) error. The rationale boils down to the idea that if you stick to the status quo or default assumption, at least you're not making things <em>worse</em>. </p>
<p>And in many cases, that's true. But like so much in statistics, in application it's not really so black or white. The analogy of the defendant is great for teaching the concept, but when we try to make it a rule of thumb for which type of error is worse in practice, it falls apart.</p>
So Which Type of Error Is Worse, Already?
<p>I'm sorry to disappoint you, but as with so many things in life and statistics, the honest answer to this question has to be, "It depends."</p>
<p>In one instance, the Type I error may have consequences that are less acceptable than those from a Type II error. In another, the Type II error could be the more costly of the two. And sometimes, as Dan Smith pointed out in <em><a href="http://magazine.amstat.org/blog/2013/11/01/mathmyopia/" target="_blank">Significance</a> </em>a few years back with respect to Six Sigma and quality improvement, "neither" is the only answer to which error is worse: </p>
<p style="margin-left: 40px;"><em>Most Six Sigma students are going to use the skills they learn in the context of business. In business, whether we cost a company $3 million by suggesting an alternative process when there is nothing wrong with the current process or we fail to realize $3 million in gains when we should switch to a new process but fail to do so, the end result is the same. The company failed to capture $3 million in additional revenue. </em></p>
Look at the Potential Consequences
<p>Since there's not a clear rule of thumb about whether Type 1 or Type 2 errors are worse, our best option when using data to test a hypothesis is to look very carefully at the fallout that might follow both kinds of errors. Several experts suggest using a table like the one below to detail the consequences for a Type 1 and a Type 2 error in your particular analysis. </p>
<div>
<table>
<tr><th>Null</th><th><em>Type 1 Error: H0 true, but rejected</em></th><th><em>Type 2 Error: H0 false, but not rejected</em></th></tr>
<tr><td><em>Medicine A does not relieve Condition B.</em></td><td>Medicine A does not relieve Condition B, but is not eliminated as a treatment option.</td><td>Medicine A relieves Condition B, but is eliminated as a treatment option.</td></tr>
<tr><td><strong>Consequences</strong></td><td>Patients with Condition B who receive Medicine A get no relief. They may experience worsening condition and/or side effects, up to and including death. Litigation possible.</td><td>A viable treatment remains unavailable to patients with Condition B. Development costs are lost. Profit potential is eliminated.</td></tr>
</table>
</div>
<p>Whatever your analysis involves, understanding the difference between Type 1 and Type 2 errors, and considering and mitigating their respective risks as appropriate, is always wise. For each type of error, make sure you've answered this question: "What's the worst that could happen?" </p>
<p>To explore this topic further, check out this article on <a href="http://www.minitab.com/en-us/Published-Articles/Minitab-s-Power-and-Sample-Size-Tools/">using power and sample size calculations</a> to balance your risk of a type 2 error and testing costs, or this blog post about <span><a href="http://blog.minitab.com/blog/statistics-and-quality-improvement/understanding-alpha-alleviates-alarm">considering the appropriate alpha</a></span> for your particular test. </p>
Hypothesis Testing, Statistics, Statistics Help, Stats
Wed, 08 Mar 2017 19:24:15 +0000
http://blog.minitab.com/blog/understanding-statistics/which-statistical-error-is-worse%3A-type-1-or-type-2
Eston Martz
P-value Roulette: Making Hypothesis Testing a Winner’s Game
http://blog.minitab.com/blog/rkelly/p-value-roulette-making-hypothesis-testing-a-winner%E2%80%99s-game
<p>Welcome to the Hypothesis Test Casino! The featured game of the house is roulette. But this is no <em>ordinary</em> game of roulette. This is p-value roulette!</p>
<p>Here’s how it works: We have two roulette wheels, the Null wheel and the Alternative wheel. Each wheel has 20 slots (instead of the usual 37 or 38). You get to bet on one slot.</p>
<p><img alt="http://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Edvard_Munch_-_At_the_Roulette_Table_in_Monte_Carlo_-_Google_Art_Project.jpg/256px-Edvard_Munch_-_At_the_Roulette_Table_in_Monte_Carlo_-_Google_Art_Project.jpg" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8647ae2930d63e128d09f0b2cc5cdb87/p_value_roulette.jpg" style="line-height: 20.7999992370605px; border-width: 1px; border-style: solid; margin: 10px 15px; width: 256px; height: 166px; float: right;" /></p>
<p>What happens if the ball lands in the slot you bet on? Well, that depends on which wheel we spin. If we spin the Null wheel, you lose your bet. But if we spin the Alternative wheel, you win!</p>
<p>I’m sorry, but we can’t tell you which wheel we’re spinning.</p>
<p>Doesn’t that sound like a good game?</p>
<p>Not convinced yet? I assure you the odds are in your favor <em>if </em>you choose your slot wisely. Look, I’ll show you a graph of some data from the Null wheel. We spun it 10,000 times and counted how many times the ball landed in each slot. As you can see each slot is just as likely as any other, with a probability of about 0.05 each. That means there’s a 95% probability the ball won’t land on your slot, so you have only a 5% chance of losing—no matter what—<em>if</em> we happen to spin the Null wheel.</p>
<p><img alt="histogram of p values for null hypothesis" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dc5efcd7001f33a77bea1c635af837e5/histogram_of_p_values_null_hypothesis.png" style="width: 576px; height: 384px;" /></p>
<p>What about that Alternative wheel, you ask? Well, we’ve had quite a few different Alternative wheels over the years. Here’s a graph of some data from one we were spinning last year:</p>
<p><img alt="histogram of p values from alternative hypothesis" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dd0cafe3375f3202adaf3542d15eb9ab/histogram_of_p_values_alternative_hypothesis.png" style="width: 576px; height: 384px;" /></p>
<p>And just a few months ago, we had a different one. Check out the data from this one. It was very, very popular.</p>
<p><img alt=" histogram of p-values from popular alternative hypothesis" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fc6f0ff641e7eb4d3f7750c8163ac968/histogram_of_p_values_alternative_hypothesis_2.png" style="width: 576px; height: 384px;" /></p>
<p>Now that’s what I call an Alternative! People in the know always picked the first slot. You can see why.</p>
<p>I’m not allowed to show you data from the current game. But I assure you the Alternatives all follow this same pattern. They tend to favor those smaller numbers.</p>
<p>So, you’d like to play? Great! Which slot would you like to bet on?</p>
Is this on the level?
<p>No, I don’t really have a casino with two roulette wheels. My graphs are simulated p-values for a <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-is-a-t-test-and-why-is-it-like-telling-a-kid-to-clean-up-that-mess-in-the-kitchen">1-sample t-test</a>. The null hypothesis is that the mean of a process or population is 5. The two-sided alternative is that the mean is different from 5. In my first graph, the null hypothesis was true: I used Minitab to generate random samples of size 20 from a normal distribution with mean 5 and standard deviation of 1. For the other two graphs, the only thing I changed was the mean of the normal distribution I sampled from. For the second graph, the mean was 5.3. For the final graph, the mean was 5.75.</p>
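<p>If you'd like to spin the wheels yourself outside of Minitab, here is a minimal sketch of the same kind of simulation in Python, using only the standard library. The function names, the 2,000 simulated samples, and the random seed are my own assumptions for illustration; the p-value is computed by numerically integrating the t density rather than by calling a statistics package, so treat it as a rough stand-in for Minitab's 1-sample t-test, not a replacement.</p>

```python
import math
import random
import statistics


def two_sided_t_p_value(t_stat, df, steps=200):
    """Two-sided p-value for a t statistic, via Simpson's rule on the t density."""
    upper = abs(t_stat)
    if upper == 0.0:
        return 1.0
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central_area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central_area)


def one_sample_t_p_value(sample, null_mean):
    """p-value of a two-sided 1-sample t-test of H0: mean == null_mean."""
    n = len(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    t_stat = (statistics.fmean(sample) - null_mean) / std_err
    return two_sided_t_p_value(t_stat, n - 1)


def simulate_p_values(true_mean, n_sims=2000, n=20, null_mean=5.0, sd=1.0, seed=1):
    """Draw n_sims normal samples of size n and return one p-value per sample."""
    rng = random.Random(seed)
    return [
        one_sample_t_p_value([rng.gauss(true_mean, sd) for _ in range(n)], null_mean)
        for _ in range(n_sims)
    ]


def first_slot_share(p_values, slot_width=0.05):
    """Share of p-values that land in the first 'slot' (0 to slot_width)."""
    return sum(p <= slot_width for p in p_values) / len(p_values)


if __name__ == "__main__":
    # Null wheel (mean 5) and the two Alternative wheels (means 5.3 and 5.75).
    for true_mean in (5.0, 5.3, 5.75):
        share = first_slot_share(simulate_p_values(true_mean))
        print(f"true mean {true_mean}: share of p-values <= 0.05 is {share:.3f}")
```

<p>With the Null wheel, the share in the first slot should hover near 0.05. As the true mean moves away from 5, that share should climb, which is the pattern the histograms show.</p>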
<p>For just about any hypothesis test you do in Minitab <a href="http://www.minitab.com/products/minitab">Statistical Software</a>, you will see a p-value. Once you understand how p-values work, you will have greater insight into what they are telling you. Let’s see what we can learn about p-values from playing p-value roulette.</p>
<ol>
<li>Just as you didn’t know whether you were spinning the Null or Alternative wheel, you don’t know for sure whether the null hypothesis is true or not. But basing your decision to reject the null hypothesis on the p-value improves your chance of making a good decision.<br />
</li>
<li>If the null hypothesis is true, then any p-value is just as likely as any other. You control the probability of making a Type I error by rejecting only when the p-value falls within a narrow range, typically 0.05 or smaller. A <a href="http://blog.minitab.com/blog/the-stats-cat/understanding-type-1-and-type-2-errors-from-the-feline-perspective-all-mistakes-are-not-equal">Type I error</a> occurs if you incorrectly reject a true null hypothesis.<br />
</li>
<li>If the alternative hypothesis is true, then smaller p-values become more likely and larger p-values become less likely. That’s why you can think of a small p-value as evidence in favor of the alternative hypothesis.<br />
</li>
<li>It is tempting to try to interpret the p-value as the probability that the null hypothesis is true. But that’s not what it is. The null hypothesis is either true, or it’s not. Each time you “spin the wheel” the ball will land in a different slot, giving you a different p-value. But the truth of the null hypothesis—or lack thereof—remains unchanged.<br />
</li>
<li>In the roulette analogy there were different alternative wheels, because there is not usually just a single alternative condition. There are infinitely many mean values that are not equal to 5; my graphs looked at just two of these.<br />
</li>
<li>The probability of rejecting the null hypothesis when the alternative hypothesis is true is called the power of the test. In the 1-sample t-test, the power depends on how different the mean is from the null hypothesis value, relative to the standard error. While you don’t control the true mean, you can reduce the standard error by taking a larger sample. This will give the test greater power.<br />
</li>
</ol>
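<p>To make the last point about power concrete, here is a rough sketch that approximates the power of a two-sided test of the mean. It assumes the standard deviation is known and uses the normal distribution, so it is an approximation I've chosen for simplicity and not the exact calculation for a t-test (which uses the noncentral t distribution and gives somewhat lower power for small samples). The function name and the sample sizes in the example are hypothetical.</p>

```python
import math
from statistics import NormalDist


def approx_power(true_mean, null_mean, sd, n, alpha=0.05):
    """Approximate power of a two-sided test of the mean (normal approximation)."""
    z = NormalDist()
    std_err = sd / math.sqrt(n)                 # shrinks as the sample grows
    shift = (true_mean - null_mean) / std_err   # difference in standard-error units
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    for n in (20, 40, 80):
        print(f"n = {n}: approximate power at mean 5.3 is {approx_power(5.3, 5.0, 1.0, n):.2f}")
```

<p>When the true mean equals the null value, the function returns alpha itself, which matches the Null wheel: the chance of a rejection is just the Type I error rate.</p>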
You Too Can Be a Winner!
<p>To be a winner at p-value roulette, you need to make sure you are performing the right hypothesis test, and that your data fit the assumptions of that test. Minitab’s <a href="http://www.minitab.com/en-us/products/minitab/assistant/">Assistant menu</a> can help you with that. The Assistant helps you choose the right statistical analysis and provides easy-to-understand guidelines to walk you through data collection and analysis. Then it gives you clear graphical output to let you know how to interpret your p-value, while helping you evaluate whether your data are appropriate, so you can trust your results.</p>
Hypothesis Testing, Statistics, Statistics Help, Stats
Mon, 06 Mar 2017 13:00:00 +0000
http://blog.minitab.com/blog/rkelly/p-value-roulette-making-hypothesis-testing-a-winner%E2%80%99s-game
Rob Kelly
Three Common P-Value Mistakes You'll Never Have to Make
http://blog.minitab.com/blog/understanding-statistics/three-common-p-value-mistakes-youll-never-have-to-make
<p>Statistics can be challenging, especially if you're not analyzing data and interpreting the results every day. <a href="http://www.minitab.com/products/minitab/" title="statistical software for analyzing quality data">Statistical software</a> makes things easier by handling the arduous mathematical work involved in statistics. But ultimately, we're responsible for correctly interpreting and communicating what the results of our analyses show.</p>
<p>The p-value is probably the most frequently cited statistic. We use p-values to interpret the results of regression analysis, hypothesis tests, and many other methods. Every introductory statistics student and every Lean Six Sigma Green Belt learns about p-values. </p>
<p>Yet this common statistic is misinterpreted so often that at least one scientific journal has abandoned its use.</p>
What Does a P-value Tell You?
<p>Typically, a p-value is defined as "the probability of observing an effect at least as extreme as the one in your sample data, <em>if the <span><a href="http://blog.minitab.com/blog/understanding-statistics/why-shrewd-experts-fail-to-reject-the-null-every-time">null hypothesis</a></span> is true</em>." Thus, the only question a p-value can answer is this one:</p>
<p><em>How likely is it that I would get the data I have, assuming the null hypothesis is true?</em></p>
<p>If your p-value is less than your selected <span><a href="http://blog.minitab.com/blog/adventures-in-statistics-2/understanding-hypothesis-tests%3A-significance-levels-alpha-and-p-values-in-statistics">alpha level</a></span> (typically 0.05), you <em>reject the null hypothesis</em> in favor of the alternative hypothesis. If the p-value is above your alpha value, you <em>fail to reject</em> the null hypothesis. It's important to note that the null hypothesis is never accepted; we can only <em>reject </em>or <em>fail to reject</em> it. </p>
The P-Value in a 2-Sample t-Test
<p>Consider a typical hypothesis test: say, a 2-sample t-test of the mean weight of boxes of cereal filled on production lines at two different facilities. The listed package weight is 14 oz. We collect and weigh 50 boxes from each line to check whether both lines fill boxes to the same mean weight. </p>
<p>Our null hypothesis is that the two means are equal. Our alternative hypothesis is that they are <em>not </em>equal. </p>
<p>To run this test in Minitab, we enter our data in a worksheet and select <strong>Stat > Basic Statistics > 2-Sample T-test</strong>. If you'd like to follow along, you can download the <a href="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2edc594cf40ec4931e5cd0021df6703e/cereal_weight.mtw">data</a> and, if you don't already have it, get the <a href="http://www.minitab.com/products/minitab/free-trial/">30-day trial of Minitab</a>. In the t-test dialog box, select<em> Both samples are in one column</em> from the drop-down menu, and choose "Weight" for Samples, and "Facility" for Sample IDs.</p>
<p style="margin-left: 40px;"><img alt="t test for the mean" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1a090752bef395f3b227511c6e57946d/dialog.png" style="width: 424px; height: 296px;" /></p>
<p>Minitab gives us the following output, and I've highlighted the p-value for the hypothesis test:</p>
<p style="margin-left: 40px;"><img alt="t-test output" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3b27f14d1859460a1875c81384c52ccb/t_test_output.png" style="width: 544px; height: 222px;" /></p>
<p>So we have a p-value of 0.029, which is less than our selected alpha value of 0.05. Therefore, we reject the null hypothesis that the means of Line A and Line B are equal. Note also that while the evidence indicates the means are different, that difference is estimated at 0.338 oz—a pretty small amount of cereal. </p>
<p>So far, so good. But this is the point at which trouble often starts.</p>
Three Frequent Misstatements about P-Values
<p>The p-value of 0.029 means we reject the null hypothesis that the means are equal. But that doesn't mean any of the following statements are accurate:</p>
<ol>
<li><strong>"There is 2.9% probability the means are the same, and 97.1% probability they are different." </strong><br />
We don't know that at all. The p-value only says that <strong><em>if </em></strong>the null hypothesis is true, the sample data collected would exhibit a difference this large or larger only 2.9% of the time. Remember that the p-value doesn't tell you anything <em>directly </em>about what you've seen. Instead, it tells you the <em>odds </em>of seeing it. </li>
<br />
<li><strong>"The p-value is low, which indicates there's an important difference in the means." </strong><br />
Based on the 0.029 p-value shown above, we can conclude that a statistically significant difference between the means exists. But the estimated size of that difference is less than a half-ounce, and won't matter to customers. A p-value may indicate a difference exists, but it tells you nothing about its practical impact.</li>
<br />
<li><strong>"The low p-value shows the alternative hypothesis is true."</strong><br />
A low p-value provides statistical evidence to reject the null hypothesis, but that doesn't prove the truth of the alternative hypothesis. If your alpha level is 0.05 and the null hypothesis is actually true, there's still a 5% chance you will incorrectly reject it. Or to put it another way, if a jury fails to convict a defendant, it doesn't prove the defendant is <em>innocent</em>: it only means the prosecution failed to prove the defendant's guilt beyond a reasonable doubt. </li>
</ol>
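<p>To see why the first misstatement fails, it can help to work through some arithmetic. The sketch below uses Bayes' rule to compute how often the null hypothesis is true among tests that reject it. The inputs (the share of tested null hypotheses that are true, and the power of the test) are hypothetical assumptions, and nothing in a single t-test output tells you what they are. Note also that this is the probability given a rejection at the alpha level, not given one specific p-value such as 0.029.</p>

```python
def prob_null_given_rejection(prior_null, alpha, power):
    """Share of rejections that are false alarms, by Bayes' rule.

    prior_null: assumed share of tested null hypotheses that are really true
    alpha:      chance of rejecting when the null is true
    power:      chance of rejecting when the alternative is true
    """
    false_alarms = alpha * prior_null
    true_detections = power * (1 - prior_null)
    return false_alarms / (false_alarms + true_detections)


if __name__ == "__main__":
    # Hypothetical scenarios: same alpha and power, different prior shares.
    for prior_null in (0.5, 0.9):
        share = prob_null_given_rejection(prior_null, alpha=0.05, power=0.8)
        print(f"prior share of true nulls {prior_null}: {share:.3f} of rejections are false alarms")
```

<p>The answer moves a great deal as the assumed inputs change, which is exactly why a p-value on its own cannot be read as the probability that the means are the same.</p>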
<p>These misinterpretations happen frequently enough to be a concern, but that doesn't mean that we shouldn't use p-values to help interpret data. The p-value remains a very useful tool, as long as we're interpreting and communicating its significance accurately.</p>
P-Value Results in Plain Language
<p>It's one thing to keep all of this straight if you're doing data analysis and statistics all the time. It's another thing if you only analyze data occasionally and need to do many other things in between, like most of us. "Use it or lose it" is certainly true about statistical knowledge, which could well be another factor that contributes to misinterpreted p-values. </p>
<p>If you're leery of that happening to you, a good way to avoid that possibility is to use the Assistant in Minitab to perform your analyses. If you haven't used it yet, the Assistant menu guides you through your analysis from start to finish. The dialog boxes and output are all in plain language, so it's easy to figure out what you need to do and what the results mean, even if it's been a while since your last analysis. (But even expert statisticians tell us they like using the Assistant because the output is so clear and easy to understand, regardless of an audience's statistical background.) </p>
<p>So let's redo the analysis above using the Assistant, to see what that output looks like and how it can help you avoid misinterpreting your results—or having them be misunderstood by others!</p>
<p>Start by selecting <strong>Assistant > Hypothesis Test...</strong> from the Minitab menu. Note that a window pops up to explain exactly what a hypothesis test does. </p>
<p style="margin-left: 40px;"><img alt="assistant hypothesis test" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f26601f26db3576a7cf2b5bc3178f9ca/assistant_hypothesis_test.png" style="width: 420px; height: 252px;" /></p>
<p>The Assistant asks what we're trying to do, and gives us three options to choose from.</p>
<p style="margin-left: 40px;"><img alt="hypothesis test chooser" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fba2ee28b10063e1c5f0f00eb77db1b2/assistant_hypothesis_test_chooser.png" style="width: 600px; height: 472px;" /></p>
<p>We know we want to compare a sample from Line A with a sample from Line B, but what if we can't remember which of the 5 available tests is the appropriate one in this situation? We can get guidance by clicking "Help Me Choose."</p>
<p style="margin-left: 40px;"><img alt="help me choose the right hypothesis test" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/51bb23fbb44603efff50fe4fa1d9dbd1/assistant_hypothesis_test_decision_tree.png" style="width: 700px; height: 551px;" /></p>
<p>The choices on the diagram direct us to the appropriate test. In this case, we choose continuous data instead of attribute (and even if we'd forgotten the difference, clicking on the diamond would explain it). We're comparing two means instead of two standard deviations, and we're measuring two different sets of items since our boxes came from different production lines. </p>
<p>Now we know what test to use, but suppose you want to make sure you don't miss anything that's important about the test, like requirements that must be met? Click the "more..." link and you'll get those details. </p>
<p style="margin-left: 40px;"><img alt="more info about the 2-Sample t-Test" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1b4f09a2438b0aaef14e8da6564524cf/assistant_hypothesis_test_more_info.png" style="width: 700px; height: 526px;" /></p>
<p>Now we can proceed to the Assistant's dialog box. Again, statistical jargon is minimized and everything is put in straightforward language. We just need to answer a few questions, as shown. Note that the Assistant even lets us tell it how big a difference needs to be for us to consider it practically important. In this case, we'll enter 2 ounces.</p>
<p style="margin-left: 40px;"><img alt="Assistant 2-sample t-Test dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/994d9172bf788282258f765d4d08aefa/assistant_hypothesis_test_dialog.png" style="width: 641px; height: 495px;" /></p>
<p>When we press OK, the Assistant performs the t-test and delivers three reports. The first of these is a summary report, which includes summary statistics, confidence intervals, histograms of both samples, and more. And interpreting the results couldn't be more straightforward than what we see in the top left quadrant of the diagram. In response to the question, "Do the means differ?" we can see that p-value of 0.029 marked on the bar, very far toward the "Yes" end of the scale. </p>
<p style="margin-left: 40px;"><img alt="2-Sample t-Test summary report" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8927b8bc833551678715f68149dd18ad/assistant_hypothesis_test_summary.png" style="width: 700px; height: 526px;" /></p>
<p>Next is the Diagnostic Report, which provides additional information about the test. </p>
<p style="margin-left: 40px;"><img alt="2-Sample t-Test diagnostic report" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6467a0be0ba60329f2be282e14b9be33/assistant_hypothesis_test_diagnostic.png" style="width: 700px; height: 526px;" /></p>
<p>In addition to letting us check for outliers, the diagnostic report shows us the size of the observed difference, as well as the chances that our test could detect a practically significant difference of 2 oz. </p>
<p>The final piece of output the Assistant provides is the report card, which flags any problems or concerns about the test that we would need to be aware of. In this case, all of the boxes are green and checked (instead of red and x'ed). </p>
<p style="margin-left: 40px;"><img alt="2-Sample t-Test report card" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0e4cd0dce832a8251701f8175de9a037/assistant_hypothesis_test_report_card.png" style="width: 700px; height: 526px;" /></p>
<p>When you're not doing statistics all the time, the Assistant makes it a breeze to find the right analysis for your situation and to make sure you interpret your results the right way. Using it is a great way to make sure you're not attaching too much, or too little, importance to the results of your analyses.</p>
Hypothesis Testing, Statistics, Statistics Help, Stats
Wed, 22 Feb 2017 14:00:00 +0000
http://blog.minitab.com/blog/understanding-statistics/three-common-p-value-mistakes-youll-never-have-to-make
Eston Martz
Chi-Square Analysis: Powerful, Versatile, Statistically Objective
http://blog.minitab.com/blog/michelle-paret/chi-square-analysis-powerful-versatile-statistically-objective
<p style="line-height: 20.7999992370605px;">To make objective decisions about the processes that are critical to your organization, you often need to examine categorical data. You may know how to use a t-test or ANOVA when you’re comparing measurement data (like weight, length, <span style="line-height: 1.6;">revenue, </span><span style="line-height: 1.6;">and so on), but do you know how to compare attribute or count data? It’s easy to do with <a href="http://www.minitab.com/products/minitab">statistical software</a> like Minitab. </span></p>
<p style="line-height: 20.7999992370605px;"><img alt="failures per production line" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/19b2bd8557279d21284a23e2174fef88/chisquare_onevariable_revision.jpg" style="line-height: 20.8px; width: 400px; height: 267px; float: right; margin: 10px 15px;" /></p>
<p style="line-height: 20.7999992370605px;">One person may look at this bar chart and decide that the production lines performed similarly<span style="line-height: 1.6;">. But another person may focus on the small difference between the bars and decide that one of the lines has outperformed the others. Without an appropriate statistical analysis, how can you know which person is right?</span></p>
<p style="line-height: 20.7999992370605px;">When time, money, and quality depend on your answers, you can’t rely on subjective visual assessments alone. To answer questions like these with statistical objectivity, you can use a Chi-Square analysis.</p>
Which Analysis Is Right for Me?
<p style="line-height: 20.7999992370605px;">Minitab offers three Chi-Square tests. The appropriate analysis depends on the number of variables that you want to examine. And for all three options, the data can be formatted either as raw data or summarized counts.</p>
<strong>Chi-Square Goodness-of-Fit Test – 1 Variable</strong>
<p style="line-height: 20.7999992370605px;">Use Minitab’s <strong>Stat > Tables > Chi-Square Goodness-of-Fit Test (One Variable)</strong> when you have just one variable.</p>
<p style="line-height: 20.7999992370605px;">The Chi-Square Goodness-of-Fit Test can test if the proportions for all groups are equal. It can also be used to test if the proportions for groups are equal to specific values. For example:</p>
<ul style="line-height: 20.7999992370605px;">
<li>A bottle cap manufacturer operates three production lines and records the number of defective caps for each line. The manufacturer uses the <strong>Chi-Square Goodness-of-Fit Test</strong> to determine if the proportion of defectives is equal across all three lines.</li>
<li>A bottle cap manufacturer operates three production lines and records the number of defective caps and the total number produced for each line. One line runs at high speed and produces twice as many caps as the other two lines that run at a slower speed. The manufacturer uses the <strong>Chi-Square Goodness-of-Fit Test</strong> to determine if the number of defective units for each line is proportional to the volume of caps it produces.</li>
</ul>
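<p style="line-height: 20.7999992370605px;">If you're curious about what happens behind the menu, here is a minimal sketch of the goodness-of-fit calculation for the two bottle cap scenarios. The defect counts are made up for illustration, and the function names are my own. The p-value shortcut used here is only valid for 2 degrees of freedom (three groups), where the chi-square tail probability has a simple closed form.</p>

```python
import math


def chi_square_gof(observed, expected_props=None):
    """Return (chi-square statistic, degrees of freedom) for a goodness-of-fit test."""
    k = len(observed)
    if expected_props is None:
        expected_props = [1 / k] * k  # default: all groups equally likely
    total = sum(observed)
    expected = [total * p for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, k - 1


def chi_square_p_value_df2(stat):
    """Upper-tail probability of a chi-square variable with exactly 2 df."""
    return math.exp(-stat / 2)


if __name__ == "__main__":
    defects = [30, 18, 12]  # hypothetical defect counts for lines 1, 2, 3

    # Scenario 1: are defects spread equally across the three lines?
    stat, df = chi_square_gof(defects)
    print(f"equal shares: chi-square = {stat:.2f}, df = {df}, p = {chi_square_p_value_df2(stat):.3f}")

    # Scenario 2: line 1 makes twice as many caps, so expect half of all defects there.
    stat, df = chi_square_gof(defects, [0.5, 0.25, 0.25])
    print(f"by volume:    chi-square = {stat:.2f}, df = {df}, p = {chi_square_p_value_df2(stat):.3f}")
```

<p style="line-height: 20.7999992370605px;">The same counts can look unusual under one set of expected proportions and unremarkable under another, so it pays to decide which question you are asking before you run the test.</p>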
<strong>Chi-Square Test for Association – 2 Variables</strong>
<p style="line-height: 20.7999992370605px;">Use Minitab’s <strong>Stat > Tables > Chi-Square Test for Association</strong> when you have two variables.</p>
<p style="line-height: 20.7999992370605px;">The Chi-Square Test for Association can tell you if there’s an association between two variables. In other words, it can test if two variables are independent or not. For example:</p>
<ul style="line-height: 20.7999992370605px;">
<li>A paint manufacturer operates two production lines across three shifts and records the number of defective units per line per shift. The manufacturer uses the <strong>Chi-Square Test for Association</strong> to determine if the percent defective is similar across all shifts and production lines. Or, are certain lines during certain shifts more prone to issues?<br />
<br />
<img alt="Defectives per line per shift" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8f78b557ef93b1390b79b866787d5503/chisquare_twovariables_revision.jpg" style="width: 600px; height: 400px;" /><br />
<br />
</li>
<li>A call center randomly samples 100 incoming calls each day of the week for each of its three locations, for a total of 1500 calls. They then record the number of abandoned calls per location per day. The call center uses a Chi-Square Test to determine if there is any association between location and day of the week with respect to abandoned calls.</li>
</ul>
<p style="margin-left: 40px;"><img alt="call center data" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e60774e6ddac893694e7b8a1a39a47b4/callcenterdata.jpg" style="width: 265px; height: 133px;" /><br />
</p>
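<p style="line-height: 20.7999992370605px;">For a sense of the arithmetic behind a test for association, here is a minimal sketch for a table of counts such as defectives by line and shift. The counts are hypothetical and the function names are mine. The expected count in each cell is its row total times its column total divided by the grand total. The p-value helper relies on a closed form that only exists for even degrees of freedom, which covers a 2 x 3 table (2 df) and a 3 x 5 table (8 df) but not every table you might build.</p>

```python
import math


def chi_square_association(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi_square_sf_even_df(stat, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = stat / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


if __name__ == "__main__":
    # Hypothetical defectives: rows are lines A and B, columns are shifts 1 to 3.
    defectives = [[20, 10, 10],
                  [10, 20, 20]]
    stat, df = chi_square_association(defectives)
    print(f"chi-square = {stat:.2f}, df = {df}, p = {chi_square_sf_even_df(stat, df):.3f}")
```

<p style="line-height: 20.7999992370605px;">A table whose rows are exact multiples of each other gives a statistic of zero, because every observed count matches its expected count under independence.</p>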
<strong>Cross Tabulation and Chi-Square – 2 or more variables</strong>
<p style="line-height: 20.7999992370605px;">Use Minitab’s <strong>Stat > Tables > Cross Tabulation and Chi-Square </strong>when you have two or more variables.</p>
<p style="line-height: 20.7999992370605px;">If you simply want to test for associations between two variables, you can use either <strong>Cross Tabulation and Chi-Square</strong> or <strong>Chi-Square Test for Association</strong>. However, <span><a href="http://blog.minitab.com/blog/understanding-statistics/using-cross-tabulation-and-chi-square-the-survey-says">Cross Tabulation and Chi-Square</a></span> also lets you control for the effect of additional variables. Here’s an example:</p>
<ul style="line-height: 20.7999992370605px;">
<li>A tire manufacturer records the number of failed tires for four different tire sizes across two production lines and three shifts. The plant uses a Cross Tabulation and Chi-Square analysis to look for failure dependencies between the tire sizes and production lines, while controlling for any shift effect. Perhaps a particular production line for a certain tire size is more prone to failures, but only during the first shift.</li>
</ul>
<p style="line-height: 20.7999992370605px;">This analysis also offers advanced options. For example, if your categories are ordinal (good, better, best or small, medium, large) you can include a special test for concordance.</p>
Conducting a Chi-Square Analysis in Minitab
<p style="line-height: 20.7999992370605px;">Each of these analyses is easy to run in Minitab. For more examples that include step-by-step instructions, just navigate to the Chi-Square menu of your choice and then click Help > example.</p>
<p style="line-height: 20.7999992370605px;">It can be tempting to make subjective assessments about a given set of data, their makeup, and possible interdependencies, but why risk an error in judgment when you can be sure with a Chi-Square test?</p>
<p style="line-height: 20.7999992370605px;">Whether you’re interested in one variable, two variables, or more, a Chi-Square analysis can help you make a clear, statistically sound assessment.</p>
Data Analysis, Hypothesis Testing, Lean Six Sigma, Manufacturing, Quality Improvement, Six Sigma, Statistics, Statistics Help
Fri, 17 Feb 2017 13:16:00 +0000
http://blog.minitab.com/blog/michelle-paret/chi-square-analysis-powerful-versatile-statistically-objective
Michelle Paret
Common Assumptions about Data Part 3: Stability and Measurement Systems
http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-3-stability-and-measurement-systems
<p><img alt="Cart before the horse" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/8230e7c2bc193a831158677a70eb0146/chile_road_sign_po_4.svg" style="width: 101px; height: 101px; float: right; margin: 10px 15px;" />In Parts <span><a href="http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-1-random-samples-and-statistical-independence">1</a></span> and <span><a href="http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-2-normality-and-equal-variance">2</a></span> of this blog series, I wrote about how statistical inference uses data from a sample of individuals to reach conclusions about the whole population. That’s a very powerful tool, but you must check your assumptions when you make statistical inferences. Violating any of these assumptions can result in false positives or false negatives, thus invalidating your results. </p>
<p>The common data assumptions are: random samples, independence, normality, equal variance, stability, and that your measurement system is accurate and precise. I addressed random samples and statistical independence in Part 1, and normality and equal variance in Part 2. Now let’s consider the assumptions of stability and measurement systems.</p>
What Is the Assumption of Stability?
<p>A stable process is one in which the inputs and conditions are consistent over time. When a process is stable, it is said to be “in control.” This means the sources of variation are consistent over time, and the process does not exhibit unpredictable variation. In contrast, if a process is unstable and changing over time, the sources of variation are inconsistent and unpredictable. As a result of the instability, you cannot be confident in your statistical test results.</p>
<p>Use one of the various types of <span><a href="http://blog.minitab.com/blog/understanding-statistics/what-control-chart-should-i-use">control charts</a></span> available in Minitab <a href="http://www.minitab.com/products/minitab/">Statistical Software</a> to assess the stability of your data set. The Assistant menu can walk you through the choices to select the appropriate control chart based on your data and subgroup size. You can get advice about collecting and using data by clicking the “more” link.</p>
<p><img alt="Choose a Control Chart" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/6ec77f5dbc070eb0c2070ce6bcf8144c/1_control_chart.png" style="border-width: 0px; border-style: solid; width: 474px; height: 338px; margin: 10px 15px;" /></p>
<p><img alt="I-MR Control Chart" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/3d69fc444cd5dd09a962a11e645a3a2e/2_control_chart.png" style="border-width: 0px; border-style: solid; width: 474px; height: 338px; margin: 10px 15px;" /></p>
<p>In addition to preparing the control chart, Minitab tests for out-of-control or non-random patterns based on the <a href="http://blog.minitab.com/blog/statistics-in-the-field/using-the-nelson-rules-for-control-charts-in-minitab">Nelson Rules</a> and provides an assessment in easy-to-read Summary and Stability reports. Depending on the control chart selected, the Report Card automatically checks stability, normality, amount of data, and correlation, and suggests alternative charts for further analysis of your data.</p>
<p><img alt="Report Card" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/195741e519156b95ee5feee8b521041f/3_control_chart.jpg" style="border-width: 0px; border-style: solid; width: 464px; height: 348px; margin: 10px 15px;" /></p>
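<p>To give a feel for what a stability check involves, here is a minimal sketch of the limits for an individuals (I) chart, using the textbook formula: the center line plus or minus three times the average moving range divided by 1.128. It flags only points beyond those limits, which is the first of the Nelson Rules. The function names and the measurements are hypothetical, and Minitab's own charts apply more tests and options than this sketch does.</p>

```python
import statistics


def individuals_chart_limits(values):
    """Return (lower limit, center line, upper limit) for an individuals chart."""
    center = statistics.fmean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = statistics.fmean(moving_ranges)
    sigma_hat = mr_bar / 1.128  # 1.128 is the d2 constant for ranges of 2 points
    return center - 3 * sigma_hat, center, center + 3 * sigma_hat


def points_beyond_limits(values):
    """Indexes of points outside the control limits (Nelson Rule 1 only)."""
    lower, _, upper = individuals_chart_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


if __name__ == "__main__":
    # Hypothetical measurements with a jump at the end.
    measurements = [10, 11, 10, 11, 10, 11, 10, 11, 10, 20]
    lower, center, upper = individuals_chart_limits(measurements)
    print(f"LCL = {lower:.2f}, center = {center:.2f}, UCL = {upper:.2f}")
    print("out-of-control points at indexes:", points_beyond_limits(measurements))
```

<p>If any points are flagged, the process may not be stable, and results from hypothesis tests on those data deserve extra caution.</p>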
What Is the Assumption for Measurement Systems?
<p>All the other assumptions I’ve described “assume” the data reflects reality. But does it?</p>
<p>The <span><a href="http://blog.minitab.com/blog/understanding-statistics/explaining-quality-statistics-so-my-boss-will-understand-measurement-systems-analysis-msa">measurement system</a> </span>is one potential source of variability when measuring a product or process. When a measurement system is poor, you lose the ability to truthfully “see” process performance. A poor measurement system leads to incorrect conclusions and flawed implementation. </p>
<p>Minitab can perform a Gage R&R test for both measurement and appraisal data, depending on your measurement system. You can use the Assistant in Minitab to help you select the most appropriate test based on the type of measurement system you have.</p>
<p><img alt="Choose a MSA" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/3ff089fcee9ab280c8e8d1da1c56d610/4_msa.png" style="border-width: 0px; border-style: solid; width: 474px; height: 345px; margin: 10px 15px;" /></p>
<p>There are two assumptions that should be satisfied when performing a Gage R&R for measurement data: </p>
<ol>
<li>The measurement device should be calibrated.</li>
<li>The parts to be measured should be selected from a stable process and cover approximately 80% of the possible operating range. </li>
</ol>
<p>When using a measurement device make sure it is properly calibrated and check for linearity, bias, and stability over time. The device should produce accurate measurements, compared to a standard value, through the entire range of measurements and throughout the life of the device. Many companies have a metrology or calibration department responsible for calibrating and maintaining gauges. </p>
<p>Both these assumptions must be satisfied. If they are not, you cannot be sure that your data accurately reflect reality. And that means you’ll risk not understanding the sources of variation that influence your process outcomes. </p>
The Real Reason You Need to Check the Assumptions
<p>Collecting and analyzing data requires a lot of time and effort on your part. After all the work you put into your analysis, you want to be able to reach correct conclusions. Some analyses are robust to departures from these assumptions, but take the safe route and check! You want to be confident you can tell whether observed differences between data samples are simply due to chance, or if the populations are indeed different! </p>
<p>It’s easy to put the cart before the horse and just plunge into the data collection and analysis, but it’s much wiser to take the time to understand which data assumptions apply to the statistical tests you will be using, and plan accordingly.</p>
<p>Thank you for reading my blog. I hope this information helps you with your data analysis mission!</p>
Data AnalysisHypothesis TestingQuality ImprovementStatisticsMon, 05 Dec 2016 13:00:00 +0000http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-3-stability-and-measurement-systemsBonnie K. StoneCommon Assumptions about Data (Part 2: Normality and Equal Variance)
http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-2-normality-and-equal-variance
<p>In Part 1 of this <a href="http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-1-random-samples-and-statistical-independence">blog</a> series, I wrote about how statistical inference uses data from a sample of individuals to reach conclusions about the whole population. That’s a very powerful tool, but you must check your assumptions when you make statistical inferences. Violating any of these assumptions can result in false positives or false negatives, thus invalidating your results. <img alt="Horse and Cart sign" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/8230e7c2bc193a831158677a70eb0146/chile_road_sign_po_4.svg" style="width: 101px; height: 101px; margin: 10px 15px; float: right;" /></p>
<p>The common data assumptions are: random samples, independence, normality, equal variance, stability, and that your measurement system is accurate and precise.</p>
<p>I addressed random samples and statistical independence last time. Now let’s consider the assumptions of Normality and Equal Variance.</p>
What Is the Assumption of Normality?
<p>Before you perform a statistical test, you should find out the distribution of your data. If you don’t, you risk selecting an inappropriate statistical test. Many statistical methods start with the assumption your data follow the normal distribution, including the 1- and 2-Sample t tests, Process Capability, I-MR, and ANOVA. If you don’t have normally distributed data, you might use an <a href="http://blog.minitab.com/blog/understanding-statistics/data-not-normal-try-letting-it-be-with-a-nonparametric-hypothesis-test">equivalent non-parametric test</a> based on the median instead of the mean, or try the Box-Cox or Johnson Transformation to transform your non-normal data into a normal distribution.</p>
<p align="center"><img alt="Normal and Skewed Curves" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/01451195cce5757849948e3871c28187/1_curves.png" style="border-width: 0px; border-style: solid; width: 554px; height: 179px; margin: 10px 15px;" /></p>
<p>Keep in mind, however, that many statistical tools based on the assumption of normality do not actually <em>require</em> normally distributed data if the sample sizes are at least 15 or 20. If sample sizes are less than 15 and the data are not normally distributed, though, the p-value may be inaccurate and you should interpret the results with caution.</p>
<p>There are several methods to determine normality in Minitab, and I’ll discuss two of the tools in this post: the Normality Test and the Graphical Summary. </p>
<p>Minitab’s Normality Test will generate a probability plot and perform a one-sample hypothesis test to determine whether the population from which you draw your sample is non-normal. The null hypothesis states that the population is normal. The alternative hypothesis states that the population is non-normal.</p>
<p>Choose <strong>Stat > Basic Statistics > Normality Test</strong></p>
<p align="center"><img alt="Probability Plot" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/9681575c2cdb6cfebde643d73a5e5ca0/3_probability_plot.png" style="border-width: 0px; border-style: solid; width: 599px; height: 350px; margin: 10px 15px;" /></p>
<p>When evaluating the distribution fit for the normality test:</p>
<ul>
<li>The plotted points will roughly form a straight line. Some departure from the straight line at the tails may be okay as long as it stays within the confidence limits.</li>
<li>The plotted points should fall close to the fitted distribution line and pass the “fat pencil” test. Imagine a "fat pencil" lying on top of the fitted line: If it covers all the data points on the plot, the data are probably normal.</li>
<li>The associated Anderson-Darling statistic will be small.</li>
<li>The associated p-value will be larger than your chosen α-level (commonly chosen levels for α include 0.05 and 0.10).</li>
</ul>
<p>The Anderson-Darling statistic is a measure of how far the plot points fall from the fitted line in a probability plot. The statistic is a weighted squared distance from the plot points to the fitted line with larger weights in the tails of the distribution. For a specified data set and distribution, the better the distribution fits the data, the smaller this statistic will be.</p>
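<p>To make that description concrete, here is a minimal Python sketch of the textbook Anderson-Darling formula for a normal fit. It is an illustration only: the function name and the two made-up data sets are my own assumptions, it uses the sample mean and standard deviation as the fitted parameters, and it does not compute a p-value, so Minitab's output may differ in detail.</p>

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling_normal(data):
    """Anderson-Darling statistic for a normal fit (hypothetical helper).

    Uses the sample mean and standard deviation as the fitted parameters.
    """
    x = sorted(data)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    cdf = [fitted.cdf(v) for v in x]
    total = 0.0
    for i in range(1, n + 1):
        # The log terms blow up near 0 and 1, which is what gives
        # the tails of the distribution their larger weight.
        total += (2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1.0 - cdf[n - i]))
    return -n - total / n

# Two made-up data sets: idealized normal quantiles and idealized
# right-skewed (exponential) quantiles, 50 points each.
n = 50
probs = [(i - 0.5) / n for i in range(1, n + 1)]
bell_shaped = [NormalDist().inv_cdf(p) for p in probs]
right_skewed = [-math.log(1.0 - p) for p in probs]

ad_bell = anderson_darling_normal(bell_shaped)
ad_skewed = anderson_darling_normal(right_skewed)
print("bell-shaped data:", round(ad_bell, 3))
print("right-skewed data:", round(ad_skewed, 3))
```

<p>The bell-shaped data should produce a much smaller statistic than the skewed data, which matches the rule of thumb that a better fit gives a smaller Anderson-Darling value.</p>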
<p>Minitab’s Descriptive Statistics with the Graphical Summary will generate a nice visual display of your data and calculate the Anderson-Darling statistic and its p-value. The graphical summary displays four graphs: a histogram of the data with an overlaid normal curve, a boxplot, and 95% confidence intervals for both the mean and the median.</p>
<p>Choose <strong>Stat > Basic Statistics > Graphical Summary</strong></p>
<p><img alt="Normality Test" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/363dab1dcf97061dd0075ab38aae2ee3/2_normality_test.png" style="text-align: -webkit-center; border-width: 0px; border-style: solid; width: 583px; height: 306px; margin: 10px 15px;" />When interpreting a graphical summary report for normality: </p>
<ul>
<li>The data will be displayed as a histogram. Look for how your data is distributed (normal or skewed), how the data is spread across the graph, and if there are outliers.</li>
<li>The associated Anderson-Darling statistic will be small.</li>
<li>The associated p-value will be larger than your chosen α-level (commonly chosen levels for α include 0.05 and 0.10).</li>
</ul>
<p>For some processes, such as time and cycle data, the data will never be normally distributed. Non-normal data are fine for some statistical methods, but make sure your data satisfy the <a href="http://blog.minitab.com/blog/fun-with-statistics/forget-statistical-assumptions-just-check-the-requirements">requirements</a> for your particular analysis.</p>
What Is the Assumption of Equal Variance?
<p>In simple terms, variance refers to the spread or scatter of the data. Statistical tests such as analysis of variance (ANOVA) assume that although different samples can come from populations with different means, those populations have the same variance. Equal variance (homoscedasticity) means the variances are approximately the same across the samples. Unequal variances (heteroscedasticity) can affect the Type I error rate and lead to false positives. If you are comparing two or more sample means, as in the 2-Sample t-test and ANOVA, a significantly different variance could overshadow the differences between means and lead to incorrect conclusions. </p>
<p>Minitab offers several methods to test for equal variances. Consult <a href="http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/anova/basics/understanding-test-for-equal-variances/">Minitab Help</a> to decide which method to use based on the type of data you have. You can also use the Minitab Assistant to check this assumption for you. (Tip: When using the Assistant, click “more” to see data collection tips and important information about how Minitab calculates your results.)</p>
<p align="center"><img alt="Hypothesis Assistant" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/cd958e1efe31a3a0c3acdc818971100c/4_hypothesis_assistant.png" style="border-width: 0px; border-style: solid; width: 402px; height: 318px; margin: 10px 15px;" /></p>
<p>After the analysis is performed, check the Diagnostic Report for the test interpretation and the Report Card for alerts to unusual data points or assumptions that were not met. (Tip: When performing the 2-Sample t test and ANOVA, the Assistant takes a more conservative approach and uses calculations that do not depend on the assumption of equal variance.)</p>
<p><img alt="Assistant Reports" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/a1d56fd284c40360bc62f96e04e69e59/5_assistant_reports.png" style="border-width: 0px; border-style: solid; width: 656px; height: 245px; margin: 10px 15px;" /></p>
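<p>For readers curious about what a calculation that does not depend on equal variance can look like, here is a minimal Python sketch of Welch's t-test, a widely used unequal-variance method. Treat it as an illustration under assumptions: I am not claiming this is the exact procedure behind the Assistant, and the function name and sample values are made up.</p>

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom (hypothetical helper)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Each sample contributes its own variance; nothing is pooled.
    va = variance(sample_a) / n_a
    vb = variance(sample_b) / n_b
    t_value = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t_value, df

# Made-up samples whose variances differ by a factor of 4.
group_1 = [1, 2, 3, 4, 5]
group_2 = [2, 4, 6, 8, 10]

t_value, df = welch_t(group_1, group_2)
print("variance ratio:", variance(group_2) / variance(group_1))
print("t =", round(t_value, 4), "df =", round(df, 4))
```

<p>Because the two variances are kept separate, the degrees of freedom come out as a non-integer value that is smaller than the pooled-variance figure of 8.</p>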
The Real Reason You Need to Check the Assumptions
<p>You will be putting a lot of time and effort into collecting and analyzing data. After all the work you put into the analysis, you want to be able to reach correct conclusions. Some analyses are robust to departures from these assumptions, but take the safe route and check! You want to be confident that you can tell whether observed differences between data samples are simply due to chance, or if the populations are indeed different! </p>
<p>It’s easy to put the cart before the horse and just plunge into the data collection and analysis, but it’s much wiser to take the time to understand which data assumptions apply to the statistical tests you will be using, and plan accordingly.</p>
<p>In my next blog post, I will review the <a href="http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-3-stability-and-measurement-systems">common assumptions about stability and the measurement system</a>. </p>
Data AnalysisHypothesis TestingStatisticsStatistics HelpStatsMon, 07 Nov 2016 15:36:00 +0000http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-2-normality-and-equal-varianceBonnie K. StoneWhat Are T Values and P Values in Statistics?
http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-are-t-values-and-p-values-in-statistics
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/6f4053a89257952fef0b9998547dffe2/tweedle_tweedledum.jpg" style="line-height: 20.8px; float: right; width: 248px; height: 255px; margin: 10px 15px;" /></p>
<p>If you’re not a statistician, looking through statistical output can sometimes make you feel a bit like <em>Alice in Wonderland</em>. Suddenly, you step into a fantastical world where strange and mysterious phantasms appear out of nowhere. </p>
<p>For example, consider the T and P in your t-test results.</p>
<p>“Curiouser and curiouser!” you might exclaim, like Alice, as you gaze at your output.</p>
<p><img alt="One-Sample T test output" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/1e5a4c064f43f19169121222402e4560/t_test_results_one_sided.jpg" style="width: 467px; height: 121px;" /></p>
<p>What are these values, really? Where do they come from? Even if you’ve used the p-value to interpret the statistical significance of your results umpteen times, its actual origin may remain murky to you.</p>
T & P: The Tweedledee and Tweedledum of a T-test
<p>T and P are inextricably linked. They go arm in arm, like Tweedledee and Tweedledum. Here's why.</p>
<p>When you perform a t-test, you're usually trying to find evidence of a significant difference between population means (2-sample t) or between the population mean and a hypothesized value (1-sample t). <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-is-a-t-test-and-why-is-it-like-telling-a-kid-to-clean-up-that-mess-in-the-kitchen">The t-value measures the size of the difference relative to the variation in your sample data</a>. Put another way, T is simply the calculated difference represented in units of standard error. The greater the magnitude of T (it can be either positive or negative), the greater the evidence <em>against </em>the null hypothesis that there is no significant difference. The closer T is to 0, the more likely there isn't a significant difference.</p>
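<p>Here is a minimal Python sketch of that idea for a 1-sample t-test: the difference between the sample mean and the hypothesized value, divided by the standard error. The function name and the five data values are made up for illustration; they are not the data behind the output shown above.</p>

```python
import math
from statistics import mean, stdev

def one_sample_t(data, hypothesized_mean):
    """t-value: the observed difference expressed in units of standard error."""
    n = len(data)
    standard_error = stdev(data) / math.sqrt(n)
    return (mean(data) - hypothesized_mean) / standard_error

# Made-up sample; the hypothesized mean of 5 matches the example in this post.
sample = [4, 5, 6, 7, 8]
t_value = one_sample_t(sample, 5)
print(round(t_value, 3))
```

<p>The sample mean here is 6, so the difference from 5 is 1 unit, and that difference works out to about 1.4 standard errors.</p>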
<p>Remember, the t-value in your output is calculated from only one sample from the entire population. If you took repeated random samples of data from the same population, you'd get slightly different t-values each time, due to random sampling error (which is really not a mistake of any kind; it's just the random variation expected in the data).</p>
<p>How different could you expect the t-values from many random samples from the same population to be? And how does the t-value from your sample data compare to those expected t-values?</p>
<p>You can use a t-distribution to find out.</p>
Using a t-distribution to calculate probability
<p>For the sake of illustration, assume that you're using a 1-sample t-test to determine whether the population mean is greater than a hypothesized value, such as 5, based on a sample of 20 observations, as shown in the above t-test output.</p>
<ol>
<li>In Minitab, choose <strong>Graph > Probability Distribution Plot</strong>.</li>
<li>Select <strong>View Probability</strong>, then click <strong>OK</strong>.</li>
<li>From <strong>Distribution</strong>, select <strong>t</strong>.</li>
<li>In <strong>Degrees of freedom</strong>, enter <em>19</em>. (For a 1-sample t test, the degrees of freedom equals the sample size minus 1).</li>
<li>Click <strong>Shaded Area</strong>. Select <strong>X Value</strong>. Select <strong>Right Tail</strong>.</li>
<li>In <strong>X Value</strong>, enter 2.8 (the t-value), then click <strong>OK</strong>.</li>
</ol>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/bc5183a42a169d45632fd4f6c0b153b3/distribution_plot_t_2.8" style="width: 576px; height: 384px;" /></p>
<p>The highest part (peak) of the distribution curve shows you where you can expect most of the t-values to fall. Most of the time, you’d expect to get t-values close to 0. That makes sense, right? Because if you randomly select representative samples from a population, the mean of most of those random samples from the population should be close to the overall population mean, making their differences (and thus the calculated t-values) close to 0.</p>
T values, P values, and poker hands
<p>T values of larger magnitudes (either negative or positive) are less likely. The far left and right "tails" of the distribution curve represent instances of obtaining extreme values of t, far from 0. For example, the shaded region represents the probability of obtaining a t-value of 2.8 or greater. Imagine a magical dart that could be thrown to land randomly anywhere under the distribution curve. What's the chance it would land in the shaded region? The calculated probability is 0.005712.....which rounds to 0.006...which is...the p-value obtained in the t-test results! <img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/5633b267494c2017d6d7c7544247d57d/poker_picture.jpg" style="float: right; width: 200px; height: 164px; margin: 10px 15px;" /></p>
<p>In other words, the probability of obtaining a t-value of 2.8 or higher, when sampling from the same population (here, a population with a hypothesized mean of 5), is approximately 0.006.</p>
<p>How likely is that? Not very! For comparison, the probability of being dealt 3-of-a-kind in a 5-card poker hand is over three times as high (≈ 0.021).</p>
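<p>If you'd like to check the poker figure, the count is simple combinatorics. This short Python sketch is only a worked example of that arithmetic, with variable names of my own choosing: pick the rank of the triple, pick 3 of its 4 suits, then pick two other distinct ranks with any suit each.</p>

```python
from math import comb

# Hands with exactly three of a kind (no full house, no four of a kind).
three_of_a_kind_hands = 13 * comb(4, 3) * comb(12, 2) * 4 ** 2
all_hands = comb(52, 5)

poker_probability = three_of_a_kind_hands / all_hands
p_value = 0.005712  # right-tail probability quoted in this post

print(three_of_a_kind_hands, "of", all_hands, "hands")
print("probability:", round(poker_probability, 4))
print("ratio to the p-value:", round(poker_probability / p_value, 1))
```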
<p>Given that the probability of obtaining a t-value this high or higher when sampling from this population is so low, what’s more likely? It’s more likely this sample doesn’t come from this population (with the hypothesized mean of 5). It's much more likely that this sample comes from a different population, one with a mean greater than 5.</p>
<p>To wit: Because the p-value is very low (< alpha level), you reject the null hypothesis and conclude that there's a statistically significant difference.</p>
<p>In this way, T and P are inextricably linked. Consider them simply different ways to quantify the "extremeness" of your results under the null hypothesis. You can’t change the value of one without changing the other.</p>
<p>The larger the absolute value of the t-value, the smaller the p-value, and the greater the evidence against the null hypothesis. (You can verify this by entering lower and higher t values for the t-distribution in step 6 above.)</p>
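<p>If you don't have the probability distribution plot handy, the same right-tail areas can be approximated with a few lines of Python. This is a rough sketch, not Minitab's method: the function names are my own, and it integrates the t-distribution's density numerically with Simpson's rule rather than using a statistics library.</p>

```python
import math

def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_right_tail(t_value, df, steps=2000):
    """Approximate P(T >= t_value) using Simpson's rule on [0, t_value].

    steps must be even. Half of the total area lies to the right of 0,
    so the tail is 0.5 minus the area between 0 and t_value.
    """
    h = t_value / steps
    total = t_pdf(0.0, df) + t_pdf(t_value, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 - total * h / 3

# 19 degrees of freedom, as in the example; try t-values around 2.8.
for t in (2.0, 2.8, 3.5):
    print("t =", t, "right-tail probability =", round(t_right_tail(t, 19), 4))
```

<p>The value for 2.8 should land close to the 0.006 shown in the t-test output, and the probabilities shrink as the t-value grows.</p>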
Try this two-tailed follow up...
<p>The t-distribution example shown above is based on a one-tailed t-test to determine whether the mean of the population is greater than a hypothesized value. Therefore the t-distribution example shows the probability associated with the t-value of 2.8 only in one direction (the right tail of the distribution).</p>
<p>How would you use the t-distribution to find the p-value associated with a t-value of 2.8 for a two-tailed t-test (in both directions)?</p>
<p><strong>Hint:</strong> In Minitab, adjust the options in step 5 to find the probability for both tails. If you don't have a copy of Minitab, download a free <a href="http://www.minitab.com/en-us/products/minitab/free-trial/" target="_blank">30-day trial version</a>.</p>
Hypothesis TestingFri, 04 Nov 2016 12:10:00 +0000http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-are-t-values-and-p-values-in-statisticsPatrick RunkelProblems Using Data Mining to Build Regression Models, Part Two
http://blog.minitab.com/blog/adventures-in-statistics-2/problems-using-data-mining-to-build-regression-models-part-two
<p>Data mining can be helpful in the exploratory phase of an analysis. If you're in the early stages and you're just figuring out which predictors are potentially correlated with your response variable, data mining can help you identify candidates. However, there are problems associated with using data mining to select variables.</p>
<p>In my <a href="http://blog.minitab.com/blog/adventures-in-statistics/problems-using-data-mining-to-build-regression-models" target="_blank">previous post</a>, we used data mining to settle on the following model and graphed one of the relationships between the response (C1) and a predictor (C7). It all looks great! The only problem is that all of these data are randomly generated! No true relationships are present. </p>
<p style="margin-left: 40px;"><img alt="Regression output for data mining example" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/24e98167e2dfd848b346292af371acf3/regression_swo.png" style="width: 364px; height: 278px;" /></p>
<p style="margin-left: 40px;"><img alt="Scatter plot for data mining example" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/6e4dfb991b33031738756d4b2d1c77e4/scatterplot.png" style="width: 576px; height: 384px;" /></p>
<p>If you didn't already know there was no true relationship between these variables, these results could lead you to a very inaccurate conclusion.</p>
<p>Let's explore how these problems happen, and how to avoid them.</p>
Why <em>Do </em>These Problems Occur with Data Mining?
<p>The problem with data mining is that you fit many different models, trying lots of different variables, and you pick your final model based mainly on statistical significance, rather than being guided by theory.</p>
<p>What's wrong with that approach? The problem is that every statistical test you perform has a chance of a false positive. A false positive in this context means that the <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values" target="_blank">p-value</a> is statistically significant but there really is no relationship between the variables at the population level. If you set the <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests:-significance-levels-alpha-and-p-values-in-statistics" target="_blank">significance level at 0.05</a>, you can expect that in 5% of the cases where the null hypothesis is true, you'll have a false positive.</p>
<p>Because of this false positive rate, if you analyze many different models with many different variables you will inevitably find false positives. And if you're guided mainly by statistical significance, you'll leave the false positives in your model. If you keep going with this approach, you'll fill your model with these false positives. That’s exactly what happened in our example. We had 100 candidate predictor variables and the stepwise procedure literally dredged through hundreds and hundreds of potential models to arrive at our final model.</p>
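<p>A back-of-the-envelope calculation shows how quickly this adds up. The Python sketch below makes a simplifying assumption that is not literally true of a stepwise procedure: it treats each of the 100 candidate predictors as one independent test of a true null hypothesis at the 0.05 significance level. The function name is my own.</p>

```python
def chance_of_any_false_positive(alpha, n_tests):
    """Probability of at least one false positive among independent tests
    when every null hypothesis is actually true."""
    return 1 - (1 - alpha) ** n_tests

alpha = 0.05
n_predictors = 100  # candidate predictors in the example

expected_false_positives = alpha * n_predictors
any_false_positive = chance_of_any_false_positive(alpha, n_predictors)

print("expected false positives:", round(expected_false_positives, 1))
print("chance of at least one:", round(any_false_positive, 4))
```

<p>Under those assumptions you would expect about 5 spurious "significant" predictors, and finding at least one is close to certain.</p>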
<p>As we’ve seen, data mining problems can be hard to detect. The numeric results and graph all look great. However, these results don’t represent true relationships but instead are chance correlations that are bound to occur with enough opportunities.</p>
<p>If I had to name my favorite R-squared, it would be <a href="http://blog.minitab.com/blog/adventures-in-statistics/multiple-regession-analysis-use-adjusted-r-squared-and-predicted-r-squared-to-include-the-correct-number-of-variables" target="_blank">predicted R-squared</a>, without a doubt. However, even predicted R-squared can't detect all problems. Ultimately, even though the predicted R-squared is moderate for our model, the ability of this model to predict accurately for an entirely new data set is practically zero.</p>
Theory, the Alternative to Data Mining
<p>Data mining can have a role in the exploratory stages of an analysis. However, for all variables that you identify through data mining, you should perform a confirmation study using newly collected data to verify the relationships in the new sample. Failure to do so can be very costly. Just imagine if we had made decisions based on the model above!</p>
<p>An alternative to data mining is to use theory as a guide in terms of both the models you fit and the evaluation of your results. Look at what others have done and incorporate those findings when building your model. Before beginning the regression analysis, develop an idea of what the important variables are, along with their expected relationships, coefficient signs, and effect magnitudes.</p>
<p>Building on the results of others makes it easier both to collect the correct data and to specify the best regression model without the need for data mining. The difference is the process by which you fit and evaluate the models. When you’re guided by theory, you reduce the number of models you fit and you assess properties beyond just statistical significance.</p>
<p>Theoretical considerations should not be discarded based solely on statistical measures.</p>
<ul>
<li>Compare the coefficient signs to theory. If any of the signs contradict theory, investigate and either change your model or explain the inconsistency.</li>
<li>Use <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">Minitab statistical software</a> to create factorial plots based on your model to see if all the effects match theory.</li>
<li>Compare the <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit" target="_blank">R-squared</a> for your study to those of similar studies. If your R-squared is very different from those in similar studies, it's a sign that your model may have a problem.</li>
</ul>
<p>If you’re interested in learning more about these issues, read my post about <a href="http://blog.minitab.com/blog/adventures-in-statistics/beware-of-phantom-degrees-of-freedom-that-haunt-your-regression-models">how using too many <em>phantom</em> degrees of freedom is related to data mining problems</a>.</p>
Data AnalysisHypothesis TestingLearningRegression AnalysisStatisticsStatistics HelpWed, 19 Oct 2016 12:00:00 +0000http://blog.minitab.com/blog/adventures-in-statistics-2/problems-using-data-mining-to-build-regression-models-part-twoJim FrostWhy Shrewd Experts "Fail to Reject the Null" Every Time
http://blog.minitab.com/blog/understanding-statistics/why-shrewd-experts-fail-to-reject-the-null-every-time
<p><img alt="nulls angels: the toughest statisticians around!" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/d2c0571a-acbd-48c7-84f4-222276c293fe/Image/509959f8406d59b3bb31f686aeb3b6b0/nulls_angels.jpg" style="margin: 10px 15px; float: right; width: 175px; height: 198px;" />I watched an old <a href="https://en.wikipedia.org/wiki/The_Wild_Angels" target="_blank">motorcycle flick from the 1960s</a> the other night, and I was struck by the bikers' slang. They had a language all their own. Just like statisticians, whose manner of speaking often confounds those who aren't hep to the lingo of data analysis.</p>
<p>It got me thinking...what if there were an all-statistician biker gang? Call them the Nulls Angels. Imagine them in their colors, tearing across the countryside, analyzing data and asking the people they encounter on the road about whether they "fail to reject the null hypothesis."</p>
<p>If you point out how strange that phrase sounds, the Nulls Angels will <em>know</em> you're not cool...and not very aware of statistics.</p>
<p>Speaking purely as an editor, I acknowledge that "failing to reject the null hypothesis" <em>is</em> cringe-worthy. "Failing to reject" seems like an overly complicated equivalent to <em>accept</em>. At minimum, it's clunky phrasing.</p>
<p>But it turns out those rough-and-ready statisticians in the Nulls Angels have good reason to talk like that. From a <em>statistical</em> perspective, it's undeniably accurate—and replacing "failure to reject" with "accept" would just be wrong.</p>
What <em>Is </em>the Null Hypothesis, Anyway?
<p>Hypothesis tests include one- and two-sample t-tests, tests for association, tests for normality, and many more. (All of these tests are available under the <strong>Stat</strong><span> menu in Minitab <a href="http://www.minitab.com">statistical software</a>. Or, if you want a little more <a href="http://www.minitab.com/en-us/products/minitab/assistant">statistical guidance</a>, the Assistant can lead you through common hypothesis tests step-by-step.)</span></p>
<p>A hypothesis test examines two propositions: the null hypothesis (or H0 for short), and the alternative (H1). The <em>alternative </em>hypothesis is what we hope to support. We presume that the null hypothesis is true, unless the data provide sufficient evidence that it is not.</p>
<p>You've heard the phrase "Innocent until proven guilty." That means the defendant's innocence is taken for granted until guilt is proved. In statistics, the null hypothesis is taken for granted until the alternative is proved true.</p>
So Why Do We "Fail to Reject" the Null Hypothesis?
<p>That brings up the issue of "proof."</p>
<p>The degree of statistical evidence we need in order to “prove” the alternative hypothesis is the <a href="http://blog.minitab.com/blog/michelle-paret/alphas-p-values-confidence-intervals-oh-my">confidence level</a>. The confidence level is 1 minus our risk of committing a Type I error, which occurs when you incorrectly reject a null hypothesis that's true. Statisticians call this risk alpha, and also refer to it as the significance level. The typical alpha of 0.05 corresponds to a 95% confidence level: we're accepting a 5% chance of rejecting the null even if it is true. (In life-or-death matters, we might <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/alpha-male-vs-alpha-female">lower the risk of a Type I error to 1% or less</a>.)</p>
<p>Regardless of the alpha level we choose, any hypothesis test has only two possible outcomes:</p>
<ol>
<li><strong>Reject the null hypothesis</strong> and conclude that the alternative hypothesis is true at the 95% confidence level (or whatever level you've selected).<br />
</li>
<li><strong>Fail to reject the null hypothesis</strong> and conclude that <em>not</em> enough evidence is available to suggest the null is false at the 95% confidence level.</li>
</ol>
<p>We often use a <a href="http://blog.minitab.com/blog/understanding-statistics/three-things-the-p-value-cant-tell-you-about-your-hypothesis-test">p-value</a> to decide if the data support the null hypothesis or not. If the test's p-value is less than our selected alpha level, we reject the null. Or, as statisticians say, "When the p-value's low, the null must go."</p>
<p>This still doesn't explain <em>why</em> a statistician won't "accept the null hypothesis." Here's the bottom line: failing to reject the null hypothesis does not prove the null hypothesis <em>is</em> true. That's because a hypothesis test does not determine <em>which</em> hypothesis is true, or even which is most likely: it <em>only</em> assesses whether evidence exists to reject the null hypothesis.</p>
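<p>Written as a tiny Python sketch, the decision rule has exactly two branches, and neither of them says "accept." The function name and the example p-values are made up for illustration.</p>

```python
def hypothesis_test_decision(p_value, alpha=0.05):
    """Return the only two conclusions a hypothesis test can reach."""
    if p_value < alpha:
        return "reject the null hypothesis"
    # Not enough evidence against the null; this is not proof that it is true.
    return "fail to reject the null hypothesis"

print(hypothesis_test_decision(0.006))
print(hypothesis_test_decision(0.20))
```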
<img alt="&quot;My hypothesis is Null until proven Alternative, sir!&quot;" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/d2c0571a-acbd-48c7-84f4-222276c293fe/Image/a07b85370986a3dd126ac4d021775d13/trial.jpg" style="border-width: 1px; border-style: solid; margin: 10px 15px; float: right; width: 300px; height: 200px;" />
"Null Until Proved Alternative"
<p>Hark back to "innocent until proven guilty." As the data analyst, you are the judge. The hypothesis test is the trial, and the null hypothesis is the defendant. The alternative hypothesis is the prosecution, which needs to make its case <em>beyond a reasonable doubt</em> (say, with 95% certainty).</p>
<p>If the trial evidence does not show the defendant is guilty, neither has it proved that the defendant <em>is</em> innocent. However, based on the available evidence, you can't reject that <em>possibility</em>. So how would you announce your verdict?</p>
<p>"Not guilty."</p>
<p>That phrase is perfect: "Not guilty" doesn't say the defendant <em>is</em> innocent, because that has not been proved. It just says the prosecution couldn't convince the judge to abandon the assumption of innocence.</p>
<p>So "failure to reject the null" is the statistical equivalent of "not guilty." In a trial, the burden of proof falls to the prosecution. When analyzing data, the entire burden of proof falls to your sample data. "Not guilty" does not mean "innocent," and "failing to reject" the null hypothesis is quite distinct from "accepting" it. </p>
<p>So if a group of marauding statisticians in their Nulls Angels leathers ever asks, keep yourself in their good graces, and show that you know "failing to reject the null" is not "accepting the null."</p>
Fun Statistics, Hypothesis Testing, Statistics, Statistics Help
Mon, 03 Oct 2016 12:00:00 +0000
http://blog.minitab.com/blog/understanding-statistics/why-shrewd-experts-fail-to-reject-the-null-every-time
Eston Martz
Descriptive vs. Inferential Statistics: When Is a P-value Superfluous?
http://blog.minitab.com/blog/statistics-and-quality-data-analysis/descriptive-vs-inferential-statistics-when-is-a-p-value-superfluous
<p>True or false: When comparing a parameter for two sets of measurements, you should always use a hypothesis test to determine whether the difference is statistically significant.</p>
<p>The answer? (<em>drumroll...</em>) True!</p>
<p>...and False!</p>
<p>To understand this paradoxical answer, you need to keep in mind the difference between samples, populations, and descriptive and inferential statistics. </p>
Descriptive Statistics and Populations
<p>Consider the fictional countries of Glumpland and Dolmania.</p>
<p style="text-align: center;"><img alt="Welcome to Glumpland!" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c1f88e0e6d3e4e55684392ec5a8069e8/glumpland.jpg" style="width: 350px; height: 232px;" /></p>
<img alt="wkshet" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/47e5470dd8123218763ac3666f64bbdd/glumpland_dolmania_wkshet.jpg" style="line-height: 20.8px; width: 222px; height: 579px; float: right;" />
<p>The population of Glumpland is 8,442,012. The population of Dolmania is 6,977,201. For each country, the age of every citizen (to the nearest tenth) <a href="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/080981611ba11403dc8fde411e81d150/glumpland_and_dolmania_ages.mpj">is recorded in a cell of a Minitab worksheet</a>. </p>
<p>Using <strong>Stat > Basic Statistics > Display Descriptive Statistics</strong> we can quickly calculate the mean age of each country.</p>
<p style="margin-left: 40px;"><img alt="desc stats" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/1a791dd23ba85673193f20c2c9971fa4/mean_age_glump_and_dol.jpg" style="width: 316px; height: 96px;" /></p>
<p>It looks like Dolmanians are, on average, more youthful than Glumplanders. But is this difference in means statistically significant?</p>
<p>To find out, we might be tempted to evaluate these data using a <span><a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests%3A-1-sample%2C-2-sample%2C-and-paired-t-tests">2-sample t-test</a></span>.</p>
<p>Except for one thing: there's absolutely no point in doing that.</p>
<p>That's because these calculated means <em>are</em> the means of the entire populations. So we already know that the population means differ.</p>
<p>Another example. Suppose a baseball player gets 213 hits in 680 at bats in 2015, and 178 hits in 532 at bats in 2016.</p>
<p>Would you need a 2-proportions test to determine whether the difference in batting averages (.313 vs .335) is statistically significant? Of course not.</p>
<p>You've already calculated the proportions using all the data for the entire two seasons. There's nothing more to extrapolate. And yet you often see a hypothesis test applied in this type of situation, in the mistaken belief that if there's no p-value, the results aren't "solid" or "statistical" enough.</p>
<p>But if you've collected every possible piece of data for a population, that's about as solid as you can get!</p>
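<p>For the baseball example, the whole "analysis" is a pair of divisions. A minimal sketch using the hit and at-bat counts quoted above (the variable names are only for illustration):</p>

```python
# Hits and at bats for the two complete seasons quoted in the text.
seasons = {2015: (213, 680), 2016: (178, 532)}

# A batting average is simply hits divided by at bats.
averages = {year: hits / at_bats for year, (hits, at_bats) in seasons.items()}

for year, avg in averages.items():
    print(f"{year}: {avg:.3f}")

difference = averages[2016] - averages[2015]
print(f"difference: {difference:.3f}")
```

<p>Because every at bat from both seasons is included, these are descriptive facts about the two seasons, and there is no sampling uncertainty left for a p-value to quantify.</p>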
Inferential Statistics and Random Samples
<p>Now suppose that draconian budget cuts have made it infeasible to track and record the age of every resident in Glumpland and Dolmania. <span style="line-height: 1.6;">What can they do? </span></p>
<p><span style="line-height: 1.6;">Quite a lot, actually. They can apply inferential statistics, which is based on random sampling, to make reliable estimates without those millions of data values they don't have.</span></p>
<p>To see how it works, use <strong>Calc > Random Data > Sample from columns</strong> in Minitab. Randomly sample 50 values from the more than 8.4 million values in column C1, which includes the ages of the entire population of Glumpland. Then use descriptive statistics to calculate the mean of the sample.</p>
<p>Here are the results for one random sample of 50:</p>
<p style="margin-left: 40px;"><strong>Descriptive Statistics: GPLND (50)</strong><br />
<span style="line-height: 1.6;">Variable Mean</span><br />
<span style="line-height: 1.6;">GPLND(50) 52.37</span></p>
<p>The sample mean, 52.37, is slightly less than the true mean age of 53 for the entire population of Glumpland. What about another random sample of 50?</p>
<p style="margin-left: 40px;"><strong>Descriptive Statistics: GPLND (50) </strong><br />
<span style="line-height: 1.6;">Variable Mean</span><br />
<span style="line-height: 1.6;">GPLND(50) 54.11</span></p>
<p>Hmm. This sample mean of 54.11 slightly <em>overshoots</em> the true population mean of 53.</p>
<p>Even though the sample estimates are in the ballpark of the true population mean, we're seeing some variation. <span style="line-height: 1.6;">How much variation can we expect? Using descriptive statistics alone, we have no inkling of how "close" a sample estimate might be to the truth. </span></p>
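<p>If you don't have Minitab at hand, the same experiment can be imitated in Python. This sketch does not use the Glumpland worksheet: it builds a hypothetical stand-in population of 100,000 normally distributed ages with a mean near 53 and an assumed standard deviation of 10, then draws several random samples of 50.</p>

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical stand-in population (NOT the real worksheet): mean near 53, SD 10.
population = [random.gauss(53, 10) for _ in range(100_000)]
true_mean = statistics.fmean(population)

# Draw five random samples of 50 (without replacement) and record each mean.
sample_means = [statistics.fmean(random.sample(population, 50)) for _ in range(5)]

print(f"population mean: {true_mean:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean: {m:.2f} (off by {m - true_mean:+.2f})")
```

<p>Each sample mean lands near the population mean, but never in exactly the same place, which is the sample-to-sample variation described above.</p>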
Enter...the Confidence Interval
<p>To quantify the precision of a sample estimate for the population, we can use a powerful tool in inferential statistics: the confidence interval.</p>
<p>Suppose you take random samples of size 5, 10, 20, 50, and 100 from Glumpland and Dolmania using <strong>Calc > Random Data > Sample from columns</strong>. Then use <strong>Graph > Interval Plot > Multiple Ys</strong> to display the 95% confidence intervals for the mean of each sample.</p>
<p>Here's what the interval plots look like for the random samples in my worksheet.</p>
<p style="margin-left: 40px;"><img alt="interval plot Glumpland" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/262031cc398ee9d48031fe1f43b38bdf/interval_plot_of_glumpland.jpg" style="line-height: 20.8px; width: 576px; height: 384px;" /></p>
<p style="margin-left: 40px;"><img alt="Interval plot Dolmania" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/75440d94eaff64a63e338b480029945b/interval_plot_of_dolmania.jpg" style="width: 576px; height: 384px;" /></p>
<p>Your plots will look different based on your random samples, but you should notice a similar pattern: The sample mean estimates (the blue dots) tend to vary more from the population mean as the sample sizes decrease. To compensate for this, the intervals "stretch out" more and more, to ensure the same 95% overall probability of "capturing" the true population mean.</p>
<p>The larger samples produce narrower intervals. In fact, using only 50-100 data values, we can closely estimate the mean of over 8.4 million values, and get a general sense of how precise the estimate is likely to be. That's the incredible power of random sampling and inferential statistics!</p>
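<p>The narrowing of the intervals can be reproduced with a short Python sketch. It assumes a hypothetical population (mean 53, standard deviation 10) rather than the actual worksheet, and it hard-codes the two-sided 95% t critical values from a standard t table for the five sample sizes used here.</p>

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

# Two-sided 95% t critical values for df = n - 1, taken from a standard t table.
T_CRIT_95 = {5: 2.776, 10: 2.262, 20: 2.093, 50: 2.010, 100: 1.984}


def mean_ci(sample):
    """Return (mean, lower, upper) for a 95% t-interval of the mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    margin = T_CRIT_95[n] * statistics.stdev(sample) / math.sqrt(n)
    return mean, mean - margin, mean + margin


# Hypothetical stand-in population, not the Glumpland data.
population = [random.gauss(53, 10) for _ in range(100_000)]

for n in T_CRIT_95:
    mean, lower, upper = mean_ci(random.sample(population, n))
    print(f"n={n:3d}  mean={mean:6.2f}  95% CI=({lower:6.2f}, {upper:6.2f})  width={upper - lower:5.2f}")
```

<p>With different seeds the individual numbers change, but the widths tend to shrink roughly in proportion to one over the square root of the sample size.</p>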
<p>To display side-by-side confidence intervals of the mean estimates for Glumpland and Dolmania, you can use an interval plot with groups.</p>
<p style="margin-left: 40px;"><img alt="interval plot side by side" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/9e6348c87befdaf6434dbe80e8257516/interval_plot_of_age_side_by_side.jpg" style="width: 576px; height: 384px;" /></p>
<p>Now, you might be tempted to use these results to infer whether there's a statistically significant difference in the mean age of the populations of Glumpland and Dolmania. But don't. Confidence intervals can be misleading for that purpose: two individual 95% intervals can overlap even when the difference between the means is statistically significant.</p>
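<p>A small worked example shows why eyeballing two separate intervals can mislead. The means and standard errors below are invented for illustration, and the sketch uses the normal critical value 1.96, which assumes reasonably large samples.</p>

```python
import math

Z_95 = 1.96  # two-sided 95% normal critical value (large-sample assumption)

# Invented summary numbers for two groups: means and standard errors.
mean_a, se_a = 50.0, 1.0
mean_b, se_b = 53.0, 1.0

# Individual 95% confidence intervals for each mean.
ci_a = (mean_a - Z_95 * se_a, mean_a + Z_95 * se_a)
ci_b = (mean_b - Z_95 * se_b, mean_b + Z_95 * se_b)
intervals_overlap = ci_a[1] >= ci_b[0] and ci_b[1] >= ci_a[0]

# 95% confidence interval for the difference between the means.
diff = mean_b - mean_a
se_diff = math.sqrt(se_a**2 + se_b**2)
ci_diff = (diff - Z_95 * se_diff, diff + Z_95 * se_diff)
difference_significant = not (ci_diff[0] <= 0 <= ci_diff[1])

print(f"CI A: ({ci_a[0]:.2f}, {ci_a[1]:.2f})")
print(f"CI B: ({ci_b[0]:.2f}, {ci_b[1]:.2f})")
print(f"intervals overlap: {intervals_overlap}")
print(f"CI for difference: ({ci_diff[0]:.2f}, {ci_diff[1]:.2f})")
print(f"difference significant at 0.05: {difference_significant}")
```

<p>Here the two intervals overlap, yet the interval for the difference excludes zero. The standard error of a difference is smaller than the sum of the two individual standard errors, so the overlap check is too conservative.</p>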
<p>For that, we need another powerful tool of inferential statistics...</p>
Enter...the Hypothesis Test and P-value
<p>The 2-sample t-test is used to determine whether there is a statistically significant difference in the means of the populations from which the two random samples were drawn. The following table shows the t-test results for each pair of same-sized samples from Glumpland and Dolmania. As the sample size increases, notice what happens to the p-value and the confidence interval for the difference between the population means.</p>
<p style="margin-left: 40px;"><img alt="t tests" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/7c1bf45756a7fb621094086e5350fef9/2_sample_t_test.jpg" style="width: 526px; height: 757px;" /></p>
<p>Again, the confidence intervals tend to get wider as the samples get smaller. With smaller samples, we're less certain of the precision of the estimate for the difference.</p>
<p>In fact, only for the two largest random samples (N=50 and N=100) is the p-value less than a 0.05 level of significance, allowing us to conclude that the mean ages of Glumplanders and Dolmanians are statistically different. For the three smallest samples (N=20, N=10, N=5), the p-value is greater than 0.05, and the confidence interval for each of these small samples includes 0. Therefore, we cannot conclude that there is a difference in the population means.</p>
<p>But remember, we already know that the true population means actually <em>do</em> differ by 5.4 years. We just can't statistically "prove" it with the small samples. That's why statisticians bristle when someone says, "The p-value is not less than 0.05. Therefore, there's no significant difference between the groups." There might very well be. So it's safer to say, especially with small samples, "<em>we don't have enough evidence </em>to conclude that there's a significant difference between the groups."</p>
<p>It's not just a matter of nit-picky semantics. It's simply the truth, as you can see when you take random samples of various sizes from the same known populations and test them for a difference.</p>
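<p>To see why sample size matters so much, here is a sketch of the t-statistic and degrees of freedom for a 2-sample t-test that does not assume equal variances (Welch's form). The 5.4-year difference comes from the text; the means of 53.0 and 47.6 and the standard deviation of 10 are assumptions chosen for illustration, and real random samples would scatter around these idealized values.</p>

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for a two-sample t-test without assuming equal variances."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Assumed summary values: a fixed 5.4-year difference and SD = 10 in both groups.
for n in (5, 10, 20, 50, 100):
    t, df = welch_t(53.0, 10.0, n, 47.6, 10.0, n)
    print(f"n={n:3d} per group  t={t:5.2f}  df={df:6.1f}")
```

<p>The difference between the means never changes, but the t-statistic grows with the square root of the sample size. Comparing each t-value with a t table at the listed degrees of freedom gives the test decision: under these assumed values, the smaller samples simply don't carry enough information to detect a difference that really exists.</p>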
Wrap-up
<p>If you have a random sample, you should always accompany estimates of statistical parameters with a confidence interval and p-value, whenever possible. Without them, there's no way to know whether you can safely extrapolate to the entire population. But if you already know every value of the population, you're good to go. You don't need a p-value, a t-test, or a CI, any more than you need a clue to determine what's inside a box if you already know what's in it.</p>
Data Analysis, Hypothesis Testing, Learning, Statistics
Fri, 23 Sep 2016 12:08:00 +0000
http://blog.minitab.com/blog/statistics-and-quality-data-analysis/descriptive-vs-inferential-statistics-when-is-a-p-value-superfluous
Patrick Runkel
Creating Value from Your Data
http://blog.minitab.com/blog/applying-statistics-in-quality-projects/creating-value-from-your-data
<p>There may be huge potential benefits waiting in the data in your servers. These data may be used for many different purposes. Better data allows better decisions, of course. Banks, insurance firms, and telecom companies already own a large amount of data about their customers. These resources are useful for building a more personal relationship with each customer.</p>
<p>Some organizations already use data from agricultural fields to build complex and customized models based on a very extensive number of input variables (soil characteristics, weather, plant types, etc.) in order to improve crop yields. Airline companies and large hotel chains use dynamic pricing models to improve their yield management. Data is increasingly being referred to as the new “gold mine” of the 21st century.</p>
<p>A couple of factors underlie the rising prominence of data (and, therefore, data analysis):</p>
<p><img alt="Afficher l'image d'origine" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/File/de034e63187d191e1666721fa12a8880/de034e63187d191e1666721fa12a8880.png" style="width: 283px; height: 212px; margin: 10px 15px; float: right;" /></p>
Huge volumes of data
<p><span style="line-height: 1.6;">Data acquisition has never been easier (sensors in manufacturing plants, sensors in connected objects, data from internet usage and web clicks, credit cards, loyalty cards, Customer Relationship Management databases, satellite images, etc.), and data can be stored at costs that are lower than ever before (huge storage capacity is now available on the cloud and elsewhere). The amount of data being collected is not only huge, it is growing exponentially.</span></p>
Unprecedented velocity
<p>Connected devices, like our smart phones, provide data in almost real time and it can be processed very quickly. It is now possible to react to any change…almost immediately.</p>
Incredible variety
<p>The data collected is not restricted to billing information; every source of data is potentially valuable for a business. Not only is numeric data being collected on a massive scale, but so is unstructured data such as videos and pictures, in a large variety of situations.</p>
<p>But the explosion of data available to us is prompting every business to wrestle with an extremely complicated problem:</p>
How can we create value from these resources?
<p>Very simple methods, such as counting words used in queries submitted to company web sites, can provide good insight into the general mood of your customers and how it evolves. Web vendors often use simple statistical correlations to suggest a purchase just after a customer buys a product online. Very simple descriptive statistics are also useful.</p>
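<p>As an illustration of how little machinery the word-counting idea needs, here is a minimal Python sketch. The queries are made up for the example, and the function name <code>word_counts</code> is hypothetical.</p>

```python
import re
from collections import Counter


def word_counts(queries):
    """Count how often each word appears across a list of query strings."""
    words = (w for q in queries for w in re.findall(r"[a-z']+", q.lower()))
    return Counter(words)


# Made-up customer queries, for illustration only.
queries = [
    "Where is my refund",
    "refund not received",
    "late delivery refund",
    "track my delivery",
]

for word, count in word_counts(queries).most_common(3):
    print(word, count)
```

<p>Tracking how such counts change from week to week is one simple way to follow how customer concerns evolve.</p>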
<p>Just imagine what could be achieved with advanced regression models or powerful multivariate statistical techniques, which can be applied easily with <a href="http://www.minitab.com/products/minitab/">statistical software packages like Minitab</a>.</p>
A simple example of the benefits of analyzing an enormous database
<p>Let's consider an example of how one company benefited from analyzing a very large database.</p>
<p>Many steps are needed (security and safety checks, cleaning the cabin, etc.) before a plane can depart. Since delays negatively impact customer perceptions and also affect productivity, airline companies routinely collect a very large amount of data related to flight delays and times required to perform tasks before departure. Some times are collected automatically; others are recorded manually.</p>
<p>A major worldwide airline company intended to use this data to identify the crucial milestones among a very large number of preparation steps, and which ones often triggered delays in departure times. The company used Minitab's <span><a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-smackdown-stepwise-versus-best-subsets">stepwise regression analysis</a></span> to quickly focus on the few variables that played a major role among a large number of potential inputs. Many variables turned out to be statistically significant, but two among them clearly seemed to make a major contribution (X6 and X10).</p>
<table style="margin-left: 40px;">
<caption>Analysis of Variance</caption>
<tr><th>Source</th><th>DF</th><th>Seq SS</th><th><span style="color: rgb(0, 0, 128);">Contribution</span></th><th>Adj SS</th><th>Adj MS</th><th>F-Value</th><th>P-Value</th></tr>
<tr><td>X6</td><td>1</td><td>337394</td><td><strong><span style="color: rgb(0, 0, 128);">53.54%</span></strong></td><td>2512</td><td>2512.2</td><td>29.21</td><td>0.000</td></tr>
<tr><td>X10</td><td>1</td><td>112911</td><td><strong><span style="color: rgb(0, 0, 128);">17.92%</span></strong></td><td>66357</td><td>66357.1</td><td>771.46</td><td>0.000</td></tr>
</table>
<p>When huge databases are used, statistical analyses may become overly sensitive and <a href="http://blog.minitab.com/blog/the-stats-cat/sample-size-statistical-power-and-the-revenge-of-the-zombie-salmon-the-stats-cat">detect even very small differences</a> (due to the large sample and power of the analysis). P-values often tend to be quite small (p &lt; 0.05) for a large number of predictors.</p>
<p>However, in Minitab, if you click Results in the regression dialogue box and select Expanded tables, the contribution from each variable is displayed. Taken together, X6 and X10 accounted for more than 70% of the overall variability (53.54% + 17.92%) and had the largest F values by far; the contributions from the remaining factors were much smaller. The airline then ran a residual analysis to validate the final model. </p>
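<p>The Contribution column can be checked with simple arithmetic. The sketch below assumes that each contribution is the term's sequential sum of squares divided by the total sum of squares. The total is not shown in the excerpt, so the code back-calculates the total implied by each row and confirms that the two rows agree.</p>

```python
# Seq SS and Contribution (%) for the two rows shown in the ANOVA table.
rows = {"X6": (337394, 53.54), "X10": (112911, 17.92)}

# Assumption: Contribution = Seq SS / Total SS, so Total SS = Seq SS / Contribution.
implied_totals = {name: seq_ss / (pct / 100) for name, (seq_ss, pct) in rows.items()}
for name, total in implied_totals.items():
    print(f"{name}: implied total SS = {total:,.0f}")

# Combined contribution of the two tabulated terms.
combined = sum(pct for _, pct in rows.values())
print(f"combined contribution: {combined:.2f}%")
```

<p>Both rows imply a total sum of squares of roughly 630,000, which supports the assumed definition, and the two tabulated contributions add up to 71.46%.</p>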
<p>In addition, a Principal Component Analysis (<a href="http://blog.minitab.com/blog/applying-statistics-in-quality-projects/use-statistics-to-better-understand-your-customers">PCA, a multivariate technique</a>) was performed in Minitab to describe the relations between the most important predictors and the response. Milestones were expected to be strongly correlated to the subsequent steps.</p>
<p style="margin-left: 40px;"><img src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/File/c023d71140ea4ee2b5b22480712a55a4/c023d71140ea4ee2b5b22480712a55a4.png" /></p>
<p>The graph above is a Loading Plot from a principal component analysis. Lines that go in the same direction and are close to one another indicate how the variables may be grouped. Variables are visually grouped together according to their statistical correlations and how closely they are related.</p>
<p>A group of nine variables turned out to be strongly correlated to the most important inputs (X6 and X10) and to the final delay times (Y). Delays at the X6 stage obviously affected the X7 and X8 stages (subsequent operations), and delays from X10 affected the subsequent X11 and X12 operations.</p>
Conclusion
<p>This analysis provided simple rules that this airline's crews can follow in order to avoid delays, making passengers' next flight more pleasant. </p>
<p>The airline can repeat this analysis periodically to search for the next most important causes of delays. Such an approach can propel innovation and help organizations replace traditional and intuitive decision-making methods with data-driven ones.</p>
<p>What's more, the use of data to make things better is not restricted to the corporate world. More and more public administrations and non-governmental organizations are making large, open databases easily accessible to communities and to virtually anyone. </p>
ANOVA, Automotive, Data Analysis, Government, Health Care Quality Improvement, Healthcare, Hypothesis Testing, Manufacturing, Medical Devices, Mining, Regression Analysis, Services, Statistics, Statistics in the News
Tue, 06 Sep 2016 13:19:00 +0000
http://blog.minitab.com/blog/applying-statistics-in-quality-projects/creating-value-from-your-data
Bruno Scibilia
Sunny Day for a Statistician vs. Dark Day for A Householder with Solar Panels
http://blog.minitab.com/blog/using-data-and-statistics/sunny-day-for-a-statistician-vs-dark-day-for-a-householder-with-solar-panels
<p>In 2011 we had solar panels fitted on our property. In the last few months we have noticed a few problems with the inverter (the equipment that converts the electricity generated by the panels from DC to AC, and manages the transfer of unused electricity to the power company). It was shutting down at various times throughout the day, typically when it was very sunny, resulting in no electricity being generated.<img alt="solar panels" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0ee09d62f414b4bd79601d23995458bf/solar.jpg" style="width: 400px; height: 267px; margin: 10px 15px; float: right;" /></p>
<p>I contacted the inverter manufacturer for some help to diagnose the problem. They asked me to download their monitoring app, called Sunny Portal. I did this and started a communication process with the inverter via Bluetooth, which not only showed me the error code but also delivered a time series of the electricity generated by the hour since the panels were installed.</p>
<p>I thought I had gone to statistician heaven! By using this data, I could establish if this problem was significantly reducing the amount of electricity generated and, consequently, reducing the amount of cash I was being paid for generating electricity. </p>
<p>The Sunny Portal does have some basic bar charts to plot <span><a href="http://blog.minitab.com/blog/real-world-quality-improvement/3-ways-to-examine-data-over-time">time series</a></span> by the month, day, and 5-minute interval; however, each chart automatically works out the scale according to the data, so it is difficult to compare time periods. </p>
<div>
<p><strong>Top Minitab Tip</strong>: If you want to compare multiple charts measuring the same thing for different time periods or groups, make sure the Y-axis scales are the same. In many Minitab graphs and charts, if you select the Multiple Graphs button you will be given the option to select the same Y-axis scale.</p>
</div>
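<p>Outside Minitab, the same tip applies to any plotting tool: work out one set of Y-axis limits from all the series before drawing the individual charts. A minimal sketch, with a hypothetical helper name and made-up daily values:</p>

```python
def shared_y_limits(series_list, pad=0.05):
    """Return one (low, high) pair that covers every series, plus optional padding."""
    values = [v for series in series_list for v in series]
    low, high = min(values), max(values)
    margin = (high - low) * pad
    return low - margin, high + margin


# Made-up daily generation figures for two periods.
june_2015 = [9.1, 12.4, 15.0, 7.3]
june_2016 = [4.2, 6.8, 5.5, 3.9]

low, high = shared_y_limits([june_2015, june_2016])
print(f"use Y-axis limits {low:.2f} to {high:.2f} on both charts")
```

<p>Applying the same limits to each chart makes a lower bar mean a lower value, whichever chart it appears on.</p>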
Getting the Data into Minitab
<p>I realized that I could output the data to text files, which meant I could use my statistical skills and Minitab to answer my questions. For each month between Sept 2011 and June 2016 I exported a file like the example shown below. For each day I have the date, the cumulative units generated since the inverter was commissioned, and the daily generation.</p>
<p style="margin-left: 40px;"><img src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/65ceccab-4e9a-4eba-8ce7-73b8b3d4d078/File/06a0cad69d2d8bd7cc169fb1ccb039fc/06a0cad69d2d8bd7cc169fb1ccb039fc.png" /></p>
<p>These were easily read into Minitab, using <strong>File > Open</strong>, specifying the first row of data as row 9, and changing the delimiter from comma to semicolon. </p>
<p style="margin-left: 40px;"><img src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/65ceccab-4e9a-4eba-8ce7-73b8b3d4d078/File/a533773b174b01e721e6bae8f3240cdb/a533773b174b01e721e6bae8f3240cdb.png" style="line-height: 20.8px;" /></p>
<p>I read all of these monthly files into individual Minitab worksheets and then used <strong>Data > Stack Worksheets</strong> to create a single worksheet that contained all the data. </p>
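<p>For readers who want to do the import step in code, here is a rough Python equivalent of reading the monthly exports and stacking them. The exact file layout is an assumption based on the description above: eight header rows, then semicolon-delimited rows holding the date, the cumulative units, and the daily units, with a dot as the decimal separator. The sample file contents are invented.</p>

```python
import csv


def read_export(text, first_data_row=9):
    """Parse one monthly export: skip the header rows, split fields on semicolons."""
    data_lines = text.splitlines()[first_data_row - 1:]
    rows = []
    for fields in csv.reader(data_lines, delimiter=";"):
        if len(fields) < 3:
            continue  # ignore blank or malformed lines
        date, cumulative, daily = (f.strip() for f in fields[:3])
        rows.append({"date": date, "cumulative": float(cumulative), "daily": float(daily)})
    return rows


# Two invented monthly files: eight header lines followed by data rows.
header = "\n".join(f"header line {i}" for i in range(1, 9))
may = header + "\n30/05/2016;14990.2;6.1\n31/05/2016;14995.0;4.8"
june = header + "\n01/06/2016;15001.3;6.3\n02/06/2016;15004.9;3.6"

# "Stacking" is just concatenating the rows from every file.
stacked = [row for text in (may, june) for row in read_export(text)]
print(len(stacked), "rows;", "first:", stacked[0])
```

<p>If the real export used a decimal comma or a different column order, the parsing line would need adjusting accordingly.</p>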
Creating and Reviewing the Time Series Plots
<p>Using <strong>Graph > Time Series Plot, </strong>I created the following time series plots. To get each year in different colours, I double-clicked on an individual data point in the chart, chose the "Groups" tab in the Edit Symbols dialog box, and put Year as the grouping variable.</p>
<p style="margin-left: 40px;"><img src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/65ceccab-4e9a-4eba-8ce7-73b8b3d4d078/File/d8735a9c83b4d1fab3b48b1d850cab38/d8735a9c83b4d1fab3b48b1d850cab38.png" style="line-height: 20.8px;" /></p>
<p>Looking at this plot, it was clear that the most electricity is generated in the summer months and least in the winter months, but it was not easy to identify if the amount of electricity generated had been declining. I needed to consider another analytical approach.</p>
<p>Since I have only noticed this problem in the last six months (Jan to June 2016), I decided to compare the electricity generated in the first six months of the year for the years 2012–2016. I did this using <strong>Assistant > Hypothesis Tests > One-Way ANOVA</strong>. The descriptive results were as follows:</p>
<p style="margin-left: 40px;"><img src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/65ceccab-4e9a-4eba-8ce7-73b8b3d4d078/File/535915f4f060684ccbfb1bf1cf34475b/535915f4f060684ccbfb1bf1cf34475b.png" style="line-height: 1.6;" /></p>
<p>Just looking at the summary statistics, I can clearly see that the average number of units generated per day for the first six months of 2016 is much lower, at 5.71 units, than in the previous years, which range between 8.15 in 2012 and 9.22 in 2014. However, by using the results from the one-way ANOVA, I can work out whether 2016 is <em>significantly </em>worse than previous years. </p>
<p>From this chart, you can see that the p-value is less than 0.001. Hence, we can conclude that not all the group means are equal. By using the Means Comparison Chart, shown below, I can also see that 2016 is significantly lower than all the other years.</p>
<p style="margin-left: 40px;"><img src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/65ceccab-4e9a-4eba-8ce7-73b8b3d4d078/File/c3604a8dd269c552d10b231ad9e28f50/c3604a8dd269c552d10b231ad9e28f50.png" /></p>
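<p>To show what a one-way ANOVA computes, here is a minimal sketch of the classic F-statistic: the variation between group means divided by the variation within groups. This is the textbook equal-variance version, which may not be the exact method the Minitab Assistant applies, and the three small groups are toy numbers rather than my solar data.</p>

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a classic one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.fmean(all_values)
    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: scatter of values around their own group mean.
    ss_within = sum((v - statistics.fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Toy data: three groups of three values each.
f_value, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F = {f_value:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

<p>A large F-value means the group means differ by more than the day-to-day scatter within each group would explain; the p-value is then read from the F distribution with those degrees of freedom.</p>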
<p>However, you might be thinking that the first six months of 2016 in England were darker than an average year, and that there has been significantly less UV light. This might be a fair point, so to check it I looked at data produced by the UK Met Office (<a href="http://www.metoffice.gov.uk/climate/uk/summaries/anomalygraphs">www.metoffice.gov.uk/climate/uk/summaries/anomalygraphs</a>). These charts, called anomaly graphs, compare the sunshine levels by month for particular years to the average sunshine levels for the previous decade.</p>
<p>The results for 2016 and 2012, the two worst years for average electricity generated per day, are as follows: </p>
<p style="margin-left: 40px;"><img src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/65ceccab-4e9a-4eba-8ce7-73b8b3d4d078/File/2a6f05e175bfc75a8fdf9ccb91037eef/2a6f05e175bfc75a8fdf9ccb91037eef.png" /></p>
<p style="margin-left: 40px;"><img src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/65ceccab-4e9a-4eba-8ce7-73b8b3d4d078/File/1803e9942cfdad869abf51aad522a874/1803e9942cfdad869abf51aad522a874.png" /></p>
<p>When I compare Met Office data for the amount of sunshine in the first six months of 2016 in England (red bar) with 2012, the second-worst year according to my summary statistics, I can see that only Jan and March were better in 2012. It should also be noted that you generate more electricity when there are more daylight hours. So a bad June has a bigger influence on electricity generated than a bad January, and June in 2012 was worse than 2016.</p>
<p>Consequently, I can see that the English weather cannot be blamed for the lower electricity generation figures and the fault is with my inverter. The next steps are to determine when this problem with the inverter started, and estimate what it has cost. </p>
<p>After I shared my results, the helpdesk at the manufacturer identified the problem with the inverter: it had been set up with German power grid settings, and apparently the UK grid has more voltage fluctuation. The settings were changed on 15th July, and I'm looking forward to collecting more data and analyzing it in Minitab to determine whether this problem has been solved.</p>
ANOVA, Data Analysis, Fun Statistics, Hypothesis Testing, Statistics
Fri, 26 Aug 2016 12:00:00 +0000
http://blog.minitab.com/blog/using-data-and-statistics/sunny-day-for-a-statistician-vs-dark-day-for-a-householder-with-solar-panels
Gillian Groom