Hypothesis Testing | MinitabBlog posts and articles about hypothesis testing, especially in the course of Lean Six Sigma quality improvement projects.
http://blog.minitab.com/blog/hypothesis-testing-2/rss
Tue, 27 Sep 2016 15:28:46 +0000
FeedCreator 1.7.3
Descriptive vs. Inferential Statistics: When Is a P-value Superfluous?
http://blog.minitab.com/blog/statistics-and-quality-data-analysis/descriptive-vs-inferential-statistics-when-is-a-p-value-superfluous
<p>True or false: When comparing a parameter for two sets of measurements, you should always use a hypothesis test to determine whether the difference is statistically significant.</p>
<p>The answer? (<em>drumroll...</em>) True!</p>
<p>...and False!</p>
<p>To understand this paradoxical answer, you need to keep in mind the difference between samples, populations, and descriptive and inferential statistics. </p>
Descriptive Statistics and Populations
<p>Consider the fictional countries of Glumpland and Dolmania.</p>
<p style="text-align: center;"><img alt="Welcome to Glumpland!" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c1f88e0e6d3e4e55684392ec5a8069e8/glumpland.jpg" style="width: 350px; height: 232px;" /></p>
<img alt="wkshet" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/47e5470dd8123218763ac3666f64bbdd/glumpland_dolmania_wkshet.jpg" style="line-height: 20.8px; width: 222px; height: 579px; float: right;" />
<p>The population of Glumpland is 8,442,012. The population of Dolmania is 6,977,201. For each country, the age of every citizen (to the nearest tenth), <a href="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/080981611ba11403dc8fde411e81d150/glumpland_and_dolmania_ages.mpj">is recorded in a cell of a Minitab worksheet</a>. </p>
<p>Using <strong>Stat > Basic Statistics > Display Descriptive Statistics</strong> we can quickly calculate the mean age of each country.</p>
<p style="margin-left: 40px;"><img alt="desc stats" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/1a791dd23ba85673193f20c2c9971fa4/mean_age_glump_and_dol.jpg" style="width: 316px; height: 96px;" /></p>
<p>It looks like Dolmanians are, on average, more youthful than Glumplanders. But is this difference in means statistically significant?</p>
<p>To find out, we might be tempted to evaluate these data using a <span><a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests%3A-1-sample%2C-2-sample%2C-and-paired-t-tests">2-sample t-test</a></span>.</p>
<p>Except for one thing: there's absolutely no point in doing that.</p>
<p>That's because these calculated means <em>are</em> the means of the entire populations. So we already know that the population means differ.</p>
<p>Another example. Suppose a baseball player gets 213 hits in 680 at bats in 2015, and 178 hits in 532 at bats in 2016.</p>
<p>Would you need a 2-proportions test to determine whether the difference in batting averages (.313 vs .335) is statistically significant? Of course not.</p>
<p>You've already calculated the proportions using all the data for the entire two seasons. There's nothing more to extrapolate. And yet you often see a hypothesis test applied in this type of situation, in the mistaken belief that if there's no p-value, the results aren't "solid" or "statistical" enough.</p>
<p>But if you've collected every possible piece of data for a population, that's about as solid as you can get!</p>
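<p>The arithmetic behind those batting averages is just division over the complete seasons. A minimal sketch, using the numbers from the example above:</p>

```python
# Season totals are the entire "population" of at bats, so the batting
# averages below are population parameters, not sample estimates --
# there is nothing left to infer, and no p-value is needed.
hits_2015, at_bats_2015 = 213, 680
hits_2016, at_bats_2016 = 178, 532

avg_2015 = hits_2015 / at_bats_2015
avg_2016 = hits_2016 / at_bats_2016

print(round(avg_2015, 3))  # 0.313
print(round(avg_2016, 3))  # 0.335
```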
Inferential Statistics and Random Samples
<p>Now suppose that draconian budget cuts have made it infeasible to track and record the age of every resident in Glumpland and Dolmania. <span style="line-height: 1.6;">What can they do? </span></p>
<p><span style="line-height: 1.6;">Quite a lot, actually. They can apply inferential statistics, which is based on random sampling, to make reliable estimates without those millions of data values they don't have.</span></p>
<p>To see how it works, use <strong>Calc > Random Data > Sample from columns</strong> in Minitab. Randomly sample 50 values from the 8,442,012 values in column C1, which contains the ages of the entire population of Glumpland. Then use descriptive statistics to calculate the mean of the sample.</p>
<p>Here are the results for one random sample of 50:</p>
<p style="margin-left: 40px;"><strong>Descriptive Statistics: GPLND (50)</strong><br />
<span style="line-height: 1.6;">Variable Mean</span><br />
<span style="line-height: 1.6;">GPLND(50) 52.37</span></p>
<p>The sample mean, 52.37, is slightly less than the true mean age of 53 for the entire population of Glumpland. What about another random sample of 50?</p>
<p style="margin-left: 40px;"><strong>Descriptive Statistics: GPLND (50) </strong><br />
<span style="line-height: 1.6;">Variable Mean</span><br />
<span style="line-height: 1.6;">GPLND(50) 54.11</span></p>
<p>Hmm. This sample mean of 54.11 slightly <em>overshoots</em> the true population mean of 53.</p>
<p>Even though the sample estimates are in the ballpark of the true population mean, we're seeing some variation. <span style="line-height: 1.6;">How much variation can we expect? Using descriptive statistics alone, we have no inkling of how "close" a sample estimate might be to the truth. </span></p>
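<p>The same sampling experiment is easy to sketch outside Minitab. Here is a minimal Python version; the age column below is a hypothetical stand-in for the Glumpland worksheet, simulated to match its population mean of 53:</p>

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical stand-in for the Glumpland age column (true mean 53)
population = rng.normal(loc=53, scale=12, size=100_000)

# Two independent random samples of 50, as in the example above
sample1 = rng.choice(population, size=50, replace=False)
sample2 = rng.choice(population, size=50, replace=False)

# Each sample mean lands near 53, but the two estimates differ
print(round(sample1.mean(), 2), round(sample2.mean(), 2))
```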
Enter...the Confidence Interval
<p>To quantify the precision of a sample estimate for the population, we can use a powerful tool in inferential statistics: the confidence interval.</p>
<p>Suppose you take random samples of size 5, 10, 20, 50, and 100 from Glumpland and Dolmania using <strong>Calc > Random Data > Sample from columns</strong>. Then use <strong>Graph > Interval Plot > Multiple Ys</strong> to display the 95% confidence intervals for the mean of each sample.</p>
<p>Here's what the interval plots look like for the random samples in my worksheet.</p>
<p style="margin-left: 40px;"><img alt="interval plot Glumpland" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/262031cc398ee9d48031fe1f43b38bdf/interval_plot_of_glumpland.jpg" style="line-height: 20.8px; width: 576px; height: 384px;" /></p>
<p style="margin-left: 40px;"><img alt="Interval plot Dolmania" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/75440d94eaff64a63e338b480029945b/interval_plot_of_dolmania.jpg" style="width: 576px; height: 384px;" /></p>
<p>Your plots will look different based on your random samples, but you should notice a similar pattern: The sample mean estimates (the blue dots) tend to vary more from the population mean as the sample sizes decrease. To compensate for this, the intervals "stretch out" more and more, to ensure the same 95% overall probability of "capturing" the true population mean.</p>
<p>The larger samples produce narrower intervals. In fact, using only 50-100 data values, we can closely estimate the mean of over 8.4 million values, and get a general sense of how precise the estimate is likely to be. That's the incredible power of random sampling and inferential statistics!</p>
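<p>The widening pattern can be reproduced with a quick simulation. This sketch (again using a hypothetical age column with mean 53, not the real worksheet) computes the 95% t-interval for the mean at each sample size:</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Hypothetical stand-in for the Glumpland age column (true mean 53)
population = rng.normal(loc=53, scale=12, size=100_000)

widths = {}
for n in (5, 10, 20, 50, 100):
    sample = rng.choice(population, size=n, replace=False)
    # 95% t-interval for the mean: xbar +/- t* times s/sqrt(n)
    half = stats.t.ppf(0.975, df=n - 1) * sample.std(ddof=1) / np.sqrt(n)
    widths[n] = 2 * half
    print(n, round(sample.mean() - half, 1), round(sample.mean() + half, 1))
```

The smallest samples produce the widest intervals: the t multiplier grows and the standard error shrinks only as the square root of n.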
<p>To display side-by-side confidence intervals of the mean estimates for Glumpland and Dolmania, you can use an interval plot with groups.</p>
<p style="margin-left: 40px;"><img alt="interval plot side by side" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/9e6348c87befdaf6434dbe80e8257516/interval_plot_of_age_side_by_side.jpg" style="width: 576px; height: 384px;" /></p>
<p>Now, you might be tempted to use these results to infer whether there's a statistically significant difference in the mean age of the populations of Glumpland and Dolmania. But don't. Confidence intervals can be misleading for that purpose.</p>
<p>For that, we need another powerful tool of inferential statistics...</p>
Enter...the Hypothesis Test and P-value
<p>The 2-sample t-test is used to determine whether there is a statistically significant difference in the means of the populations from which the two random samples were drawn. The following table shows the t-test results for each pair of same-sized samples from Glumpland and Dolmania. As the sample size increases, notice what happens to the p-value and the confidence interval for the difference between the population means.</p>
<p style="margin-left: 40px;"><img alt="t tests" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/7c1bf45756a7fb621094086e5350fef9/2_sample_t_test.jpg" style="width: 526px; height: 757px;" /></p>
<p>Again, the confidence intervals tend to get wider as the samples get smaller. With smaller samples, we're less certain of the precision of the estimate for the difference.</p>
<p>In fact, only for the two largest random samples (N=50 and N=100) is the p-value less than a 0.05 level of significance, allowing us to conclude that the mean ages of Glumplanders and Dolmanians are statistically different. For the three smallest samples (N=20, N=10, N=5), the p-value is greater than 0.05, and the confidence interval for each of these small samples includes 0. Therefore, we cannot conclude that there is a difference in the population means.</p>
<p>But remember, we already know that the true population means actually <em>do</em> differ by 5.4 years. We just can't statistically "prove" it with the small samples. That's why statisticians bristle when someone says, "The p-value is not less than 0.05. Therefore, there's no significant difference between the groups." There might very well be. So it's safer to say, especially with small samples, "<em>we don't have enough evidence </em>to conclude that there's a significant difference between the groups."</p>
<p>It's not just a matter of nit-picky semantics. It's simply the truth, as you can see when you take random samples of various sizes from the same known populations and test them for a difference.</p>
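<p>You can try this yourself with Welch's 2-sample t-test in Python. The populations below are hypothetical, built to match the example (true means of 53 and 47.6, i.e., a 5.4-year difference); with small samples, the test will often fail to detect that very real difference:</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Hypothetical populations matching the example: true means differ by 5.4
glumpland = rng.normal(loc=53.0, scale=12.0, size=100_000)
dolmania = rng.normal(loc=47.6, scale=12.0, size=100_000)

pvals = {}
for n in (5, 10, 20, 50, 100):
    g = rng.choice(glumpland, size=n, replace=False)
    d = rng.choice(dolmania, size=n, replace=False)
    # Welch's t-test: does not assume equal population variances
    t_stat, p = stats.ttest_ind(g, d, equal_var=False)
    pvals[n] = p
    print(n, round(p, 3))
```

A p-value above 0.05 here never means "the populations are the same"; we built them to differ. It only means that sample lacked the evidence to show it.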
Wrap-up
<p>If you have a random sample, you should always accompany estimates of statistical parameters with a confidence interval and p-value, whenever possible. Without them, there's no way to know whether you can safely extrapolate to the entire population. But if you already know every value of the population, you're good to go. You don't need a p-value, a t-test, or a CI—any more than you need a clue to determine what's inside a box, if you already know what's in it.</p>
Data Analysis, Hypothesis Testing, Learning, Statistics
Fri, 23 Sep 2016 12:08:00 +0000
http://blog.minitab.com/blog/statistics-and-quality-data-analysis/descriptive-vs-inferential-statistics-when-is-a-p-value-superfluous
Patrick Runkel
Creating Value from Your Data
http://blog.minitab.com/blog/applying-statistics-in-quality-projects/creating-value-from-your-data
<p>There may be huge potential benefits waiting in the data on your servers, and these data can be used for many different purposes. Better data allows better decisions, of course. Banks, insurance firms, and telecom companies already own a large amount of data about their customers, and these resources are useful for building a more personal relationship with each customer.</p>
<p>Some organizations already use data from agricultural fields to build complex and customized models based on a very extensive number of input variables (soil characteristics, weather, plant types, etc.) in order to improve crop yields. Airline companies and large hotel chains use dynamic pricing models to improve their yield management. Data is increasingly being referred to as the new “gold mine” of the 21st century.</p>
<p>A couple of factors underlie the rising prominence of data (and, therefore, data analysis):</p>
<p><img alt="Afficher l'image d'origine" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/File/de034e63187d191e1666721fa12a8880/de034e63187d191e1666721fa12a8880.png" style="width: 283px; height: 212px; margin: 10px 15px; float: right;" /></p>
Huge volumes of data
<p><span style="line-height: 1.6;">Data acquisition has never been easier (sensors in manufacturing plants, sensors in connected objects, data from internet usage and web clicks, from credit cards, loyalty cards, Customer Relations Management databases, satellite images, etc.), and it can easily be stored at costs that are lower than ever before (huge storage capacity is now available in the cloud and elsewhere). The amount of data being collected is not only huge, it is growing exponentially.</span></p>
Unprecedented velocity
<p>Connected devices, like our smart phones, provide data in almost real time and it can be processed very quickly. It is now possible to react to any change…almost immediately.</p>
Incredible variety
<p>The data collected is not restricted to billing information; every source of data is potentially valuable for a business. Not only is numeric data being collected on a massive scale, but also unstructured data such as videos, pictures, etc., in a large variety of situations.</p>
<p>But the explosion of data available to us is prompting every business to wrestle with an extremely complicated problem:</p>
How can we create value from these resources?
<p>Very simple methods, such as counting the words used in queries submitted to company web sites, provide good insight into the general mood of your customers and how it evolves over time. Simple statistical correlations are often used by web vendors to suggest a purchase just after a customer buys a product on the web. Very simple descriptive statistics are also useful.</p>
<p>Just imagine what could be achieved with advanced regression models or powerful multivariate statistical techniques, which can be applied easily with <a href="http://www.minitab.com/products/minitab/">statistical software packages like Minitab</a>.</p>
A simple example of the benefits of analyzing an enormous database
<p>Let's consider an example of how one company benefited from analyzing a very large database.</p>
<p>Many steps are needed (security and safety checks, cleaning the cabin, etc.) before a plane can depart. Since delays negatively impact customer perceptions and also affect productivity, airline companies routinely collect a very large amount of data related to flight delays and the times required to perform tasks before departure. Some times are automatically collected; others are manually recorded.</p>
<p>A major worldwide airline company intended to use this data to identify the crucial milestones among a very large number of preparation steps, and which ones often triggered delays in departure times. The company used Minitab's <span><a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-smackdown-stepwise-versus-best-subsets">stepwise regression analysis</a></span> to quickly focus on the few variables that played a major role among a large number of potential inputs. Many variables turned out to be statistically significant, but two among them clearly seemed to make a major contribution (X6 and X10).</p>
<p style="margin-left: 40px;">Analysis of Variance</p>
<p style="margin-left: 40px;">Source DF Seq SS <strong><span style="color: rgb(0, 0, 128);">Contribution </span></strong> Adj SS Adj MS F-Value P-Value</p>
<p style="margin-left: 40px;"><span style="line-height: 1.6;"> X6 1 337394 </span><span style="line-height: 1.6; color: rgb(0, 0, 128);"><strong>53.54%</strong></span><span style="line-height: 1.6;"> 2512 2512.2 29.21 0.000</span></p>
<p style="margin-left: 40px;"><span style="line-height: 1.6;"> X10 1 112911 </span><strong style="line-height: 1.6;"><span style="color: rgb(0, 0, 128);"> 17.92%</span> </strong><span style="line-height: 1.6;"> 66357 66357.1 771.46 0.000</span></p>
<p>When huge databases are used, statistical analyses may become overly sensitive and <a href="http://blog.minitab.com/blog/the-stats-cat/sample-size-statistical-power-and-the-revenge-of-the-zombie-salmon-the-stats-cat">detect even very small differences</a> (due to the large sample and power of the analysis). P values often tend to be quite small (p < 0.05) for a large number of predictors.</p>
<p>However, in Minitab, if you click Results in the regression dialog box and select Expanded tables, the contribution of each variable is displayed. X6 and X10, considered together, contributed more than 80% of the overall variability (with the largest F-values by far); the contributions of the remaining factors were much smaller. The airline then ran a residual analysis to cross-validate the final model. </p>
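<p>The idea of screening many candidate predictors can be sketched with ordinary least squares in Python. This is plain OLS, not Minitab's stepwise procedure, and the data are simulated: 12 hypothetical preparation steps X1–X12, of which only X6 and X10 actually drive the delay:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical preparation-step data: only X6 and X10 drive the delay
n = 5_000
X = rng.normal(size=(n, 12))
y = 5.0 * X[:, 5] + 3.0 * X[:, 9] + rng.normal(size=n)

# Fit ordinary least squares (intercept plus 12 slopes)
design = np.column_stack([np.ones(n), X])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
for i, c in enumerate(coefs[1:], start=1):
    print(f"X{i}: {c:+.2f}")  # X6 and X10 stand out; the rest sit near 0
```

With a sample this large, the two influential steps are unmistakable in the fitted coefficients, just as they were in the airline's contribution table.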
<p>In addition, a Principal Component Analysis (<a href="http://blog.minitab.com/blog/applying-statistics-in-quality-projects/use-statistics-to-better-understand-your-customers">PCA, a multivariate technique</a>) was performed in Minitab to describe the relations between the most important predictors and the response. Milestones were expected to be strongly correlated to the subsequent steps.</p>
<p style="margin-left: 40px;"><img src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/File/c023d71140ea4ee2b5b22480712a55a4/c023d71140ea4ee2b5b22480712a55a4.png" /></p>
<p>The graph above is a loading plot from a principal component analysis. Lines that point in the same direction and are close to one another indicate variables that may be grouped: the plot clusters variables visually according to how strongly they are correlated.</p>
<p>A group of nine variables turned out to be strongly correlated to the most important inputs (X6 and X10) and to the final delay times (Y). Delays at the X6 stage obviously affected the X7 and X8 stages (subsequent operations), and delays from X10 affected the subsequent X11 and X12 operations.</p>
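<p>A loading plot is built from the loadings (eigenvectors) of a principal component analysis. This sketch shows the mechanics on simulated milestones, where delays at one step propagate to the next two, mirroring the X6 → X7, X8 pattern described above:</p>

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical milestones: delays at x6 propagate to x7 and x8;
# "other" is an unrelated preparation step
n = 2_000
x6 = rng.normal(size=n)
x7 = x6 + 0.3 * rng.normal(size=n)
x8 = x6 + 0.3 * rng.normal(size=n)
other = rng.normal(size=n)

data = np.column_stack([x6, x7, x8, other])
centered = data - data.mean(axis=0)
# PCA loadings via SVD of the centered data matrix
_, _, vt = np.linalg.svd(centered, full_matrices=False)
loadings = vt[0]
print(np.round(loadings, 2))  # x6, x7, x8 load together; "other" near 0
```

On a loading plot, the three correlated milestones would appear as lines pointing in the same direction, with the unrelated step pointing elsewhere.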
Conclusion
<p>This analysis provided simple rules that this airline's crews can follow in order to avoid delays, making passengers' next flight more pleasant. </p>
<p>The airline can repeat this analysis periodically to search for the next most important causes of delays. Such an approach can propel innovation and help organizations replace traditional and intuitive decision-making methods with data-driven ones.</p>
<p>What's more, the use of data to make things better is not restricted to the corporate world. More and more public administrations and non-governmental organizations are making large, open databases easily accessible to communities and to virtually anyone. </p>
ANOVA, Data Analysis, Hypothesis Testing, Regression Analysis, Statistics, Statistics in the News
Tue, 06 Sep 2016 13:19:00 +0000
http://blog.minitab.com/blog/applying-statistics-in-quality-projects/creating-value-from-your-data
Bruno Scibilia
Sunny Day for A Statistician vs. Dark Day for A Householder with Solar Panels
http://blog.minitab.com/blog/using-data-and-statistics/sunny-day-for-a-statistician-vs-dark-day-for-a-householder-with-solar-panels
<p>In 2011 we had solar panels fitted on our property. In the last few months we have noticed a few problems with the inverter (the equipment that converts the electricity generated by the panels from DC to AC, and manages the transfer of unused electricity to the power company). It was shutting down at various times throughout the day, typically when it was very sunny, resulting in no electricity being generated.<img alt="solar panels" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0ee09d62f414b4bd79601d23995458bf/solar.jpg" style="width: 400px; height: 267px; margin: 10px 15px; float: right;" /></p>
<p>I contacted the inverter manufacturer for some help to diagnose the problem. They asked me to download their monitoring app, called Sunny Portal. I did this and started a communication process with the inverter via Bluetooth, which not only showed me the error code but also delivered a time series of the electricity generated by the hour since the panels were installed.</p>
<p>I thought I had gone to statistician heaven! By using this data, I could establish if this problem was significantly reducing the amount of electricity generated and, consequently, reducing the amount of cash I was being paid for generating electricity. </p>
<p>The Sunny Portal does have some basic bar charts to plot <span><a href="http://blog.minitab.com/blog/real-world-quality-improvement/3-ways-to-examine-data-over-time">time series</a></span> by month, day, and 5-minute interval; however, each chart automatically scales according to the data, so it is difficult to compare time periods. </p>
<div>
<p><strong>Top Minitab Tip</strong>: If you want to compare multiple charts measuring the same thing for different time periods or groups, make sure the Y-axis scales are the same. In many Minitab graphs and charts, if you select the Multiple Graphs button you will be given the option to select the same Y-axis scale.</p>
</div>
Getting the Data into Minitab
<p>I realized that I could output the data to text files, which meant I could use my statistical skills and Minitab to answer my questions. For each month between Sept 2011 and June 2016 I exported a file like the example shown below. For each day I have the date, the cumulative units generated since the inverter was commissioned, and the daily generation.</p>
<p style="margin-left: 40px;"><img src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/65ceccab-4e9a-4eba-8ce7-73b8b3d4d078/File/06a0cad69d2d8bd7cc169fb1ccb039fc/06a0cad69d2d8bd7cc169fb1ccb039fc.png" /></p>
<p>These were easily read into Minitab, using <strong>File > Open</strong>, specifying the first row of data as row 9, and changing the delimiter from comma to semicolon. </p>
<p style="margin-left: 40px;"><img src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/65ceccab-4e9a-4eba-8ce7-73b8b3d4d078/File/a533773b174b01e721e6bae8f3240cdb/a533773b174b01e721e6bae8f3240cdb.png" style="line-height: 20.8px;" /></p>
<p>I read all of these monthly files into individual Minitab worksheets and then used <strong>Data > Stack Worksheets</strong> to create a single worksheet that contained all the data. </p>
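<p>For anyone doing the same outside Minitab, the import logic translates directly to pandas. The file content below is a hypothetical two-row excerpt matching the description above: 8 header lines, then semicolon-separated date, cumulative units, and daily units:</p>

```python
import io
import pandas as pd

# Hypothetical excerpt of one monthly Sunny Portal export file
raw = "header line\n" * 8 + (
    "01/09/2011;1200.5;14.2\n"
    "02/09/2011;1213.9;13.4\n"
)

# Skip the 8 header lines and use ";" as the delimiter
df = pd.read_csv(io.StringIO(raw), sep=";", skiprows=8,
                 names=["date", "cumulative_kwh", "daily_kwh"])
print(df["daily_kwh"].mean())
```

Monthly files read this way can then be combined with <code>pd.concat</code>, the equivalent of <strong>Data > Stack Worksheets</strong>.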
Creating and Reviewing the Time Series Plots
<p>Using <strong>Graph > Time Series Plot, </strong>I created the following time series plots. To get each year in different colours, I double-clicked on an individual data point in the chart, chose the "Groups" tab in the Edit Symbols dialog box, and put Year as the grouping variable.</p>
<p style="margin-left: 40px;"><img src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/65ceccab-4e9a-4eba-8ce7-73b8b3d4d078/File/d8735a9c83b4d1fab3b48b1d850cab38/d8735a9c83b4d1fab3b48b1d850cab38.png" style="line-height: 20.8px;" /></p>
<p>Looking at this plot, it was clear that the most electricity is generated in the summer months and least in the winter months, but it was not easy to identify if the amount of electricity generated had been declining. I needed to consider another analytical approach.</p>
<p>Since I have only noticed this problem in the last six months (January to June 2016), I decided to compare the electricity generated in the first six months of the year for the years 2012–2016. I did this using <strong>Assistant > Hypothesis Tests > One-Way ANOVA</strong>. The descriptive results were as follows:</p>
<p style="margin-left: 40px;"><img src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/65ceccab-4e9a-4eba-8ce7-73b8b3d4d078/File/535915f4f060684ccbfb1bf1cf34475b/535915f4f060684ccbfb1bf1cf34475b.png" style="line-height: 1.6;" /></p>
<p>Just looking at the summary statistics, I can clearly see that the average electric units generated per day for the first six months of 2016 is much lower, at 5.71 units, than it was in previous years, which range between 8.15 in 2012 and 9.22 in 2014. However, by using the results from the one-way ANOVA, I can work out whether 2016 is <em>significantly</em> worse than previous years. </p>
<p>From this chart, you can see that the p-value is less than 0.001. Hence, we can conclude that not all the group means are equal. By using the Means Comparison Chart, shown below, I can also see that 2016 is significantly lower than all the other years.</p>
<p style="margin-left: 40px;"><img src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/65ceccab-4e9a-4eba-8ce7-73b8b3d4d078/File/c3604a8dd269c552d10b231ad9e28f50/c3604a8dd269c552d10b231ad9e28f50.png" /></p>
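<p>The same kind of comparison can be sketched with a one-way ANOVA in Python. The daily-generation values below are simulated to echo the summary statistics above (yearly means between 5.71 and 9.22 units/day), not the actual inverter data:</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Hypothetical Jan-Jun daily generation for each year (about 180 days),
# with yearly means echoing the summary statistics in the post
means = {2012: 8.15, 2013: 8.60, 2014: 9.22, 2015: 8.80, 2016: 5.71}
samples = {y: rng.normal(loc=m, scale=3.0, size=180)
           for y, m in means.items()}

# One-way ANOVA: are all the yearly means equal?
f_stat, p_value = stats.f_oneway(*samples.values())
print(p_value < 0.001)  # True: at least one yearly mean differs
```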
<p>However, you might be thinking that the first six months of 2016 in England were darker than in an average year, with significantly less UV light. That is a fair point, so to check it I looked at data produced by the UK Met Office (<a href="http://www.metoffice.gov.uk/climate/uk/summaries/anomalygraphs">www.metoffice.gov.uk/climate/uk/summaries/anomalygraphs</a>). These charts, called anomaly graphs, compare the sunshine levels by month for particular years to the average sunshine levels for the previous decade.</p>
<p>The results for 2016 and 2012, the two worst years for average electricity generated per day, are as follows: </p>
<p style="margin-left: 40px;"><img src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/65ceccab-4e9a-4eba-8ce7-73b8b3d4d078/File/2a6f05e175bfc75a8fdf9ccb91037eef/2a6f05e175bfc75a8fdf9ccb91037eef.png" /></p>
<p style="margin-left: 40px;"><img src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/65ceccab-4e9a-4eba-8ce7-73b8b3d4d078/File/1803e9942cfdad869abf51aad522a874/1803e9942cfdad869abf51aad522a874.png" /></p>
<p>When I compare Met Office data for the amount of sunshine in the first six months of 2016 in England (red bar) with 2012, the second-worst year according to the summary statistics, I can see that only January and March were better in 2012. It should also be noted that you generate more electricity when there are more daylight hours, so a bad June has a bigger influence on electricity generated than a bad January, and June 2012 was worse than June 2016.</p>
<p>Consequently, I can see that the English weather cannot be blamed for the lower electricity generation figures and the fault is with my inverter. The next steps are to determine when this problem with the inverter started, and estimate what it has cost. </p>
<p>After I shared my results, the helpdesk at the manufacturer identified the problem with the inverter: it had been set up with German power grid settings, and apparently the UK grid has more voltage fluctuation. The settings were changed on 15th July, and I'm looking forward to collecting more data and analyzing it in Minitab to determine whether the problem has been solved.</p>
ANOVA, Data Analysis, Fun Statistics, Hypothesis Testing, Statistics
Fri, 26 Aug 2016 12:00:00 +0000
http://blog.minitab.com/blog/using-data-and-statistics/sunny-day-for-a-statistician-vs-dark-day-for-a-householder-with-solar-panels
Gillian Groom
Data Not Normal? Try Letting It Be, with a Nonparametric Hypothesis Test
http://blog.minitab.com/blog/understanding-statistics/data-not-normal-try-letting-it-be-with-a-nonparametric-hypothesis-test
<p>So the data you nurtured, that you worked so hard to format and make useful, failed the normality test.</p>
<img alt="not-normal" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c6e92e8046f3fcee28e7cf505fb77005/data_freak_flag_300.jpg" style="line-height: 20.8px; width: 300px; height: 293px; margin: 10px 15px; float: right;" />
<p>Time to face the truth: despite your best efforts, that data set is <em>never </em>going to measure up to the assumption you may have been trained to fervently look for.</p>
<p>Your data's lack of normality seems to make it poorly suited for analysis. Now what?</p>
<p>Take it easy. Don't get uptight. Just let your data be what they are, go to the <strong>Stat </strong>menu in Minitab Statistical Software, and choose "Nonparametrics."</p>
<p style="margin-left: 40px;"><img alt="nonparametrics menu" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fbebf763ac6bd92b40c0d241b7c4029c/nonparametrics_menu.png" style="width: 367px; height: 309px;" /></p>
<p>If you're stymied by your data's lack of normality, nonparametric statistics might help you find answers. And if the word "nonparametric" looks like five syllables' worth of trouble, don't be intimidated—it's just a big word that usually refers to "tests that don't assume your data follow a normal distribution."</p>
<p>In fact, nonparametric statistics don't assume your data follow <em>any distribution at all</em>. The following table lists common parametric tests, their equivalent nonparametric tests, and the main characteristics of each.</p>
<p style="margin-left: 40px;"><img alt="correspondence table for parametric and nonparametric tests" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/4a69043809861f5187be271de67f8161/parametric_correspondence_table.png" style="width: 661px; height: 488px;" /></p>
<p>Nonparametric analyses free your data from the straitjacket of the <span style="line-height: 20.8px;">normality </span><span style="line-height: 1.6;">assumption. So choosing a nonparametric analysis is sort of like removing your data from a stifling, </span><a href="https://www.verywell.com/the-asch-conformity-experiments-2794996" style="line-height: 1.6;" target="_blank">conformist environment</a><span style="line-height: 1.6;">, and putting it into </span><a href="https://en.wikipedia.org/wiki/Utopia" style="line-height: 1.6;" target="_blank">a judgment-free, groovy idyll</a><span style="line-height: 1.6;">, where your data set can just be what it is, with no hassles about its unique and beautiful shape. How cool is </span><em style="line-height: 1.6;">that</em><span style="line-height: 1.6;">, man? Can you dig it?</span></p>
<p>Of course, it's not <em>quite </em>that carefree. Just like the 1960s encompassed both <a href="https://en.wikipedia.org/wiki/Woodstock" target="_blank">Woodstock</a> and <a href="https://en.wikipedia.org/wiki/Altamont_Free_Concert" target="_blank">Altamont</a>, so nonparametric tests offer both compelling advantages and serious limitations.</p>
Advantages of Nonparametric Tests
<p>Both parametric and nonparametric tests draw inferences about populations based on samples, but parametric tests focus on sample parameters like the mean and the standard deviation, and make various assumptions about your data—for example, that it follows a normal distribution, and that samples include a minimum number of data points.</p>
<p>In contrast, nonparametric tests are unaffected by the distribution of your data. Nonparametric tests also accommodate many conditions that parametric tests do not handle, including small sample sizes, ordered outcomes, and outliers.</p>
<p>Consequently, they can be used in a wider range of situations and with more types of data than traditional parametric tests. Many people also feel that nonparametric analyses are more intuitive.</p>
Drawbacks of Nonparametric Tests
<p><span style="line-height: 20.8px;">But nonparametric tests are not </span><em style="line-height: 20.8px;">completely </em><span style="line-height: 20.8px;">free from assumptions—they do require data to be an independent random sample, for example.</span></p>
<p>And nonparametric tests aren't a cure-all. For starters, they typically have less <a href="http://blog.minitab.com/blog/starting-out-with-statistical-software/how-powerful-am-i-power-and-sample-size-in-minitab">statistical power</a> than parametric equivalents. Power is the probability that you will correctly reject the null hypothesis when it is false. That means you have an increased chance of making a Type II error with these tests.</p>
<p>In practical terms, that means nonparametric tests are <em>less </em>likely to detect an effect or association when one really exists.</p>
<p>So if you want to draw conclusions with the same confidence level you'd get using an equivalent parametric test, you will need larger sample sizes. </p>
<p>Nonparametric tests are not a one-size-fits-all solution for non-normal data, but they can yield good answers in situations where parametric statistics just won't work.</p>
Is Parametric or Nonparametric the Right Choice for You?
<p>I've briefly outlined differences between parametric and nonparametric hypothesis tests, looked at which tests are equivalent, and considered some of their advantages and disadvantages. If you're waiting for me to tell you which direction you should choose...well, all I can say is, "It depends..." But I can give you some established rules of thumb to consider when you're looking at the specifics of your situation.</p>
<p>Keep in mind that <strong>nonnormal data does not immediately disqualify your data from a parametric test</strong>. What's your sample size? <span style="line-height: 20.8px;">As long as a certain minimum sample size is met, most parametric tests will be </span><a href="http://blog.minitab.com/blog/fun-with-statistics/forget-statistical-assumptions-just-check-the-requirements" style="line-height: 20.8px;">robust to the normality assumption</a><span style="line-height: 20.8px;">. </span><span style="line-height: 1.6;">For example, the Assistant in Minitab (which uses Welch's t-test) points out that </span><span style="line-height: 1.6;">while the 2-sample t-test is based on the assumption that the data are normally distributed, this assumption is not critical when the sample sizes are at least 15. And Bonett's 2-sample standard deviation test performs well for nonnormal data even when sample sizes are as small as 20. </span></p>
<p><span style="line-height: 1.6;">In addition, while they may not require normal data, many nonparametric tests have other assumptions that you can’t disregard.</span> For example, t<span style="line-height: 20.8px;">he Kruskal-Wallis test assumes your samples come from populations that have similar shapes and equal variances. </span><span style="line-height: 1.6;">And the 1-sample Wilcoxon test does not assume a particular population distribution, but it does assume the distribution is symmetrical. </span></p>
<p><span style="line-height: 1.6;">In most cases, your choice between parametric and nonparametric tests ultimately comes down to sample size, and whether the center of your data's distribution is better reflected by the mean or the median.</span></p>
<ul>
<li>If the mean accurately represents the center of your distribution and your sample size is large enough, a parametric test offers you better accuracy and more power. </li>
<li>If your sample size is small, you'll likely need to go with a nonparametric test. But if the median better represents the center of your distribution, a nonparametric test may be a better option even for a large sample.</li>
</ul>
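As a rough illustration of the trade-off described above, here is how a parametric two-sample comparison and its nonparametric counterpart look side by side in Python's SciPy library. (The two small samples below are invented for illustration and are not data from this post.)

```python
# Hypothetical small samples -- group_a contains an outlier -- compared
# with a parametric 2-sample t-test and the nonparametric Mann-Whitney U test.
from scipy import stats

group_a = [1.2, 1.9, 2.1, 2.4, 2.8, 3.1, 3.5, 4.0, 9.5]   # contains an outlier
group_b = [2.0, 2.6, 3.0, 3.3, 3.9, 4.2, 4.8, 5.1, 5.6]

t_stat, t_p = stats.ttest_ind(group_a, group_b)            # assumes normality
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"t-test p-value:       {t_p:.3f}")
print(f"Mann-Whitney p-value: {u_p:.3f}")
```

Because the t-test works on means and the Mann-Whitney test works on ranks, the two tests weight the outlier very differently, which is exactly the mean-versus-median judgment call described in the bullets above.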
<p> </p>
Data AnalysisHypothesis TestingStatisticsStatistics HelpMon, 22 Aug 2016 12:00:00 +0000http://blog.minitab.com/blog/understanding-statistics/data-not-normal-try-letting-it-be-with-a-nonparametric-hypothesis-testEston MartzHave You Accidentally Done Statistics?
http://blog.minitab.com/blog/statistics-and-quality/have-you-accidentally-done-statistics
<p>Have you ever accidentally done statistics? Not all of us can (or would want to) be “stat nerds,” but the word “statistics” shouldn’t be scary. In fact, we all analyze things that happen to us every day. Sometimes we don’t realize that we are compiling data and analyzing it, but that’s exactly what we are doing. Yes, there are advanced statistical concepts that can be difficult to understand—but there are many concepts that we use every day that we don’t realize are statistics.</p>
<p>I consider myself a student of baseball, so my example of unknowingly performing statistical procedures concerns my own experiences playing that game.</p>
<p>My baseball career ended as a 5’7” college freshman walk-on. When I realized that my ceiling as a catcher was a lot lower than my 6’0”-6’5” teammates, I hung up my spikes. As an adult, while finishing my degree in Business Statistics, I had the opportunity to shadow a couple of scouts from the Major League Baseball Scouting Bureau. Yes, I’ve seen <a href="http://blog.minitab.com/blog/the-statistics-game/moneyball-shows-the-power-of-statistics"><em>Moneyball </em></a>and I know that traditional scouting methods are reputed to conflict with the methods of stat nerds like myself, but as a former player I wanted to see what these scouts were looking at. </p>
<p><img alt="baseball statistics" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/076e1f8132a222e6204e393eb0d3e9a2/baseball_stats.jpg" style="width: 278px; height: 313px; margin: 10px 15px; float: right;" />My first day with the scouts I found out they were traditional baseball guys. They didn’t believe data could tell how good a player is better than observation could, and ultimately they didn't think statistics were important to what they do. </p>
<p>I found their thinking to be a little off, and a little funny. Although they didn’t believe in statistics, the tools they use for their jobs actually quantify a player's attributes. I watched as they used a radar gun to measure pitch speed, a stopwatch to measure running speed, and a notepad to record their measurements (they didn’t realize they were compiling data). While one of the scouts was asking me how statistics would be brought into baseball, he was making a dot plot by hand of the pitcher's pitches by speed to find the pitcher's velocity distribution.</p>
<p style="margin-left: 40px;"><img height="343" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/b51a0c86-e2dd-456e-878a-4196c7381c3a/File/8361f15f80b379a88187b539c124cad0/8361f15f80b379a88187b539c124cad0.png" width="514" /></p>
<p>After I explained that he was unknowingly creating a dot plot (like the one I created for Raisel Iglesias using Minitab, and which has a <a href="http://support.minitab.com/minitab/17/topic-library/basic-statistics-and-graphs/summary-statistics/measures-of-central-tendency/">bimodal distribution</a>) we started talking about grading players’ skills. The scouts would grade how players hit, their power, how they run, arm strength, and fielding ability. They used a numeric grading system from 20-80 for each of the characteristics, with 20 being the lowest, 50 being average, and 80 being elite. After compiling this data they would analyze it to assign each player grades, and they would create a report with these grades to convey to others what they saw in the player.</p>
<p style="margin-left: 40px;"><img height="401" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/b51a0c86-e2dd-456e-878a-4196c7381c3a/File/a57bd643816872de2fee895f303c0ddc/a57bd643816872de2fee895f303c0ddc.png" width="602" /></p>
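A tally like the scout's hand-drawn dot plot takes only a few lines of Python. (The pitch speeds below are made up; this is just a sketch of the idea, not the Iglesias data.)

```python
# Tally hypothetical pitch speeds (mph) and print one dot per pitch
# at each speed -- a text version of a hand-drawn dot plot.
from collections import Counter

speeds = [88, 89, 89, 90, 90, 90, 91, 95, 96, 96, 97, 97, 97, 98]
tally = Counter(speeds)

for mph in sorted(tally):
    print(f"{mph} mph | {'.' * tally[mph]}")
```

A two-humped pattern in the tally, like the one above, is the bimodal shape you'd expect from a pitcher who mixes a fastball with an off-speed pitch.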
<p>I was amazed at how these scouts—true, old-school baseball guys who said stats weren’t important for their jobs—were compiling data and analyzing it for their reports. </p>
<p>A few of the other statistical ideas the scouts were (accidentally) concerned about included the sample size of observations of a player, comparison analysis, and predicting where a player falls within their physical development (regression).</p>
<p>Like the baseball scouts, many of us are unwittingly doing statistics. Just like these scouts, we run into data all day long without recognizing that we can compile and analyze it. In work we worry about customer satisfaction, wait time, average transaction value, cost ratios, efficiency, etc. And while many people get intimidated when we use the word "statistics," we don’t need advanced degrees to embrace observing, compiling data, and making solid decisions based on our analysis.</p>
<p>So, are <em>you </em>accidentally doing statistics? If you want to move beyond accidentally doing statistics and analyze a little more deliberately, Minitab has many tools, like the <a href="http://www.minitab.com/products/minitab/assistant/">Assistant menu</a> and StatGuide, to help you on your stats journey.</p>
Data AnalysisFun StatisticsHypothesis TestingStatisticsStatistics in the NewsStatsTue, 02 Aug 2016 12:00:00 +0000http://blog.minitab.com/blog/statistics-and-quality/have-you-accidentally-done-statisticsJoseph HartsockOne-Sample t-test: Calculating the t-statistic is not really a bear
http://blog.minitab.com/blog/marilyn-wheatleys-blog/one-sample-t-test-calculating-the-t-statistic-is-not-really-a-bear
<p>While some posts in our Minitab blog focus on <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests-t-values-and-t-distributions">understanding t-tests and t-distributions</a>, this post will focus more simply on how to hand-calculate the t-value for a one-sample t-test (and how to replicate the p-value that Minitab gives us). </p>
<p>The formulas used in this post are available within <a href="http://www.minitab.com/en-us/products/minitab/">Minitab Statistical Software</a> by choosing the following menu path: <strong>Help</strong> > <strong>Methods and Formulas</strong> > <strong>Basic Statistics</strong> > <strong>1-sample t</strong>.</p>
<p>The null and three alternative hypotheses for a one-sample t-test are shown below:</p>
<p style="margin-left: 40px;"><img border="0" height="184" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/553bfcce02e2394b13b5175655c99df6/553bfcce02e2394b13b5175655c99df6.png" width="368" /></p>
<p>The default alternative hypothesis is the last one listed: the true population mean is not equal to the hypothesized mean, and this is the option used in this example.</p>
<p><img alt="bear" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/88db51bd8ccbfcbb306372bb65fa4902/bear.jpg" style="margin: 10px 15px; float: right; width: 400px; height: 290px;" />To understand the calculations, we’ll use a sample data set available within Minitab. The name of the dataset is <strong>Bears.MTW</strong>, because the calculation is not a huge bear to wrestle (plus who can resist a dataset with that name?). The path to access the sample data from within Minitab depends on the version of the software. </p>
<p>For the current version of Minitab, <a href="http://www.minitab.com/en-us/products/minitab/whats-new/">Minitab 17.3.1</a>, the sample data is available by choosing <strong>Help</strong> > <strong>Sample Data</strong>.</p>
<p>For previous versions of Minitab, the data set is available by choosing <strong>File</strong> > <strong>Open Worksheet</strong> and clicking the <strong>Look in Minitab Sample Data folder</strong> button at the bottom of the window.</p>
<p>For this example, we will use column C2, titled Age, in the Bears.MTW data set, and we will test the hypothesis that the average age of bears is 40. First, we’ll use <strong>Stat</strong> > <strong>Basic Statistics</strong> > <strong>1-sample t</strong> to test the hypothesis:</p>
<p style="margin-left: 40px;"><img border="0" height="315" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/d3336e100a9a4a91501ed1206c8e807f/d3336e100a9a4a91501ed1206c8e807f.png" width="400" /></p>
<p>After clicking <strong>OK</strong> above we see the following results in the session window:</p>
<p style="margin-left: 40px;"><img border="0" height="118" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/e62a2a776614c60eff0dd6383f66e5f5/e62a2a776614c60eff0dd6383f66e5f5.png" width="464" /></p>
<p>With a high p-value of 0.361, we don’t have enough evidence to conclude that the average age of bears is significantly different from 40. </p>
<p>Now we’ll see how to calculate the T value above by hand.</p>
<p>The formula for the T value (0.92) shown above is calculated using the following formula in Minitab:</p>
<p style="margin-left: 40px;"><img border="0" height="172" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/701f9c0efa98a38fb397f3c3ec459b66/701f9c0efa98a38fb397f3c3ec459b66.png" width="247" /></p>
<p>The output from the 1-sample t test above gives us all the information we need to plug the values into our formula:</p>
<p style="margin-left: 40px;">Sample mean: 43.43</p>
<p style="margin-left: 40px;">Sample standard deviation: 34.02</p>
<p style="margin-left: 40px;">Sample size: 83</p>
<p>We also know that our target or hypothesized value for the mean is 40.</p>
<p>Using the numbers above to calculate the t-statistic we see:</p>
<p style="margin-left: 40px;">t = (43.43 - 40)/(34.02/√83) = <strong>0.918542</strong><br />
(which rounds to 0.92, as shown in Minitab’s 1-sample t-test output)</p>
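The same arithmetic is easy to check in a few lines of Python, using only the summary statistics from the output above:

```python
# Recompute the 1-sample t-statistic from the reported summary statistics:
# sample mean 43.43, sample sd 34.02, n = 83, hypothesized mean 40.
import math

sample_mean, sample_sd, n = 43.43, 34.02, 83
hypothesized_mean = 40

t = (sample_mean - hypothesized_mean) / (sample_sd / math.sqrt(n))
print(round(t, 6))  # 0.918542, which rounds to 0.92
```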
<p>Now, we <em>could </em>dust off a statistics textbook and use it to compare our calculated t of 0.918542 to the corresponding critical value in a t-table, but that seems like a pretty big bear to wrestle when we can easily get the p-value from Minitab instead. To do that, I’ve used <strong>Graph</strong> > <strong>Probability Distribution Plot</strong> > <strong>View Probability</strong>:</p>
<p style="margin-left: 40px;"><img border="0" height="382" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/e43510dc233e71f22b93f190deb5e523/e43510dc233e71f22b93f190deb5e523.png" width="419" /></p>
<p>In the dialog above, we’re using the t distribution with 82 degrees of freedom (we had an N = 83, so the degrees of freedom for a 1-sample t-test is N-1). Next, I’ve selected the <strong>Shaded Area</strong> tab:</p>
<p style="margin-left: 40px;"><img border="0" height="383" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/e36572b6cead5cf393763d880b6f229a/e36572b6cead5cf393763d880b6f229a.png" width="414" /></p>
<p>In the dialog box above, we’re defining the shaded area by the X value (the calculated t-statistic), and I’ve typed in the t-value we calculated in the <strong>X value</strong> field. This was a 2-tailed test, so I’ve selected <strong>Both Tails</strong> in the dialog above.</p>
<p>After clicking <strong>OK</strong> in the window above, we see:</p>
<p style="margin-left: 40px;"><img border="0" height="384" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/a12abfcbe5ecea6902e4a138e96a53a6/a12abfcbe5ecea6902e4a138e96a53a6.png" width="576" /></p>
<p>We add together the probabilities from both tails, 0.1805 + 0.1805, which equals 0.361 – the same p-value that Minitab gave us for the 1-sample t-test. </p>
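If you'd rather skip the probability plot, the same two-tailed p-value can be computed directly, for example with SciPy's t-distribution (not part of the original post's Minitab workflow):

```python
# Two-tailed p-value: twice the upper-tail area of a t-distribution
# with n - 1 = 82 degrees of freedom at the calculated t-statistic.
from scipy import stats

t_value, df = 0.918542, 82
p_value = 2 * stats.t.sf(t_value, df)   # sf = 1 - cdf (upper tail)
print(f"{p_value:.3f}")  # ~0.361
```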
<p>That wasn’t so bad—not a difficult bear to wrestle at all!</p>
Data AnalysisFun StatisticsHypothesis TestingLearningStatisticsStatistics HelpStatsWed, 27 Jul 2016 17:57:00 +0000http://blog.minitab.com/blog/marilyn-wheatleys-blog/one-sample-t-test-calculating-the-t-statistic-is-not-really-a-bearMarilyn WheatleyUnderstanding Analysis of Variance (ANOVA) and the F-test
http://blog.minitab.com/blog/adventures-in-statistics/understanding-analysis-of-variance-anova-and-the-f-test
<p>Analysis of variance (ANOVA) can determine whether the means of three or more groups are different. ANOVA uses F-tests to statistically test the equality of means. In this post, I’ll show you how ANOVA and F-tests work using a one-way ANOVA example.</p>
<p>But wait a minute...have you ever stopped to wonder why you’d use an analysis of <em>variance</em> to determine whether <em>means</em> are different? I'll also show how variances provide information about means.</p>
<p>As in my posts about <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests:-1-sample,-2-sample,-and-paired-t-tests" target="_blank">understanding t-tests</a>, I’ll focus on concepts and graphs rather than equations to explain ANOVA F-tests.</p>
What are F-statistics and the F-test?
<p>The F-test is named after its test statistic, F, which was named in honor of Sir Ronald Fisher. The F-statistic is simply a ratio of two variances. Variances are a measure of dispersion, or how far the data are scattered from the mean. Larger values represent greater dispersion.</p>
<img alt="F is for F-test" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2176eecdb5dee3586bf90f5dc2ca0007/f.gif" style="line-height: 20.8px; margin: 10px 15px; float: right; width: 200px; height: 221px;" />
<p>Variance is the square of the standard deviation. For us humans, standard deviations are easier to understand than variances because they’re in the same units as the data rather than squared units. However, many analyses actually use variances in the calculations.</p>
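The square relationship is easy to confirm with Python's standard-library statistics module (the data values below are arbitrary):

```python
# Variance is the square of the standard deviation.
import statistics

data = [4.1, 5.3, 6.2, 4.8, 5.9]
sd = statistics.stdev(data)       # sample standard deviation
var = statistics.variance(data)   # sample variance

print(f"sd = {sd:.3f}, variance = {var:.3f}, sd squared = {sd ** 2:.3f}")
```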
<p>F-statistics are based on the ratio of mean squares. The term “<a href="http://support.minitab.com/minitab/17/topic-library/modeling-statistics/anova/anova-statistics/understanding-mean-squares/" target="_blank">mean squares</a>” may sound confusing but it is simply an estimate of population variance that accounts for the <a href="http://support.minitab.com/minitab/17/topic-library/basic-statistics-and-graphs/introductory-concepts/basic-concepts/df/" target="_blank">degrees of freedom (DF)</a> used to calculate that estimate.</p>
<p>Despite being a ratio of variances, you can use F-tests in a wide variety of situations. Unsurprisingly, the F-test can assess the equality of variances. However, by changing the variances that are included in the ratio, the F-test becomes a very flexible test. For example, you can use F-statistics and F-tests to <a href="http://blog.minitab.com/blog/adventures-in-statistics/what-is-the-f-test-of-overall-significance-in-regression-analysis" target="_blank">test the overall significance for a regression model</a>, to compare the fits of different models, to test specific regression terms, and to test the equality of means.</p>
Using the F-test in One-Way ANOVA
<p>To use the F-test to determine whether group means are equal, it’s just a matter of including the correct variances in the ratio. In one-way ANOVA, the F-statistic is this ratio:</p>
<p style="margin-left: 40px;"><strong>F = variation between sample means / variation within the samples</strong></p>
<p>The best way to understand this ratio is to walk through a one-way ANOVA example.</p>
<p>We’ll analyze four samples of plastic to determine whether they have different mean strengths. You can download the <a href="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/File/a8a9c678090ccac0f3be61be91cf8012/plasticstrength.mtw">sample data</a> if you want to follow along. (If you don't have Minitab, you can download a <a href="http://www.minitab.com/en-us/products/minitab/free-trial/" target="_blank">free 30-day trial</a>.) I'll refer back to the one-way ANOVA output as I explain the concepts.</p>
<p>In Minitab, choose <strong>Stat > ANOVA > One-Way ANOVA...</strong> In the dialog box, choose "Strength" as the response, and "Sample" as the factor. Press OK, and Minitab's Session Window displays the following output: </p>
<p style="margin-left: 40px;"><img alt="Output for Minitab's one-way ANOVA" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/42587221b52ed940d53478106c134ebc/1way_swo.png" style="width: 315px; height: 322px;" /></p>
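For readers working outside Minitab, the same kind of analysis can be run with SciPy's `f_oneway`. The four small groups below are made up for illustration; the post's actual plastic-strength data lives in the downloadable worksheet.

```python
# A one-way ANOVA on hypothetical data for four groups, analogous to
# (but not the same numbers as) the plastic-strength example.
from scipy import stats

sample1 = [10.9, 11.5, 10.8, 11.2, 11.7]
sample2 = [8.7, 9.1, 8.9, 9.3, 8.6]
sample3 = [10.4, 10.9, 10.5, 11.0, 10.6]
sample4 = [8.9, 8.6, 9.0, 8.8, 8.7]

f_stat, p_value = stats.f_oneway(sample1, sample2, sample3, sample4)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```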
Numerator: Variation Between Sample Means
<p>One-way ANOVA has calculated a mean for each of the four samples of plastic. The group means are: 11.203, 8.938, 10.683, and 8.838. These group means are distributed around the overall mean for all 40 observations, which is 9.915. If the group means are clustered close to the overall mean, their variance is low. However, if the group means are spread out further from the overall mean, their variance is higher.</p>
<p>Clearly, if we want to show that the group means are different, it helps if the means are further apart from each other. In other words, we want higher variability among the means.</p>
<p>Imagine that we perform two different one-way ANOVAs where each analysis has four groups. The graph below shows the spread of the means. Each dot represents the mean of an entire group. The further the dots are spread out, the higher the value of the variability in the numerator of the F-statistic.</p>
<p style="margin-left: 40px;"><img alt="Dot plot that shows high and low variability between group means" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/f9a100946675098ca09c4440a7907230/group_means_dot_plot.png" style="width: 576px; height: 86px;" /></p>
<p>What value do we use to measure the variance between sample means for the plastic strength example? In the one-way ANOVA output, we’ll use the adjusted mean square (Adj MS) for Factor, which is 14.540. Don’t try to interpret this number because it won’t make sense. It’s the sum of the squared deviations divided by the factor DF. Just keep in mind that the further apart the group means are, the larger this number becomes.</p>
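That numerator can be reconstructed from the group means shown in the output. Assuming 10 observations per group (40 observations across 4 samples, consistent with the factor DF of 3), a small rounding difference from Minitab's 14.540 is expected because the printed group means are rounded:

```python
# Factor Adj MS: n-per-group times the sum of squared deviations of the
# group means from the overall mean, divided by the factor DF.
group_means = [11.203, 8.938, 10.683, 8.838]
overall_mean = sum(group_means) / len(group_means)   # 9.9155, matching 9.915
n_per_group = 10

ss_between = n_per_group * sum((m - overall_mean) ** 2 for m in group_means)
ms_between = ss_between / (len(group_means) - 1)     # factor DF = 3
print(round(ms_between, 3))  # ~14.54
```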
Denominator: Variation Within the Samples
<p>We also need an estimate of the variability within each sample. To calculate this variance, we need to calculate how far each observation is from its group mean for all 40 observations. Technically, it is the sum of the squared deviations of each observation from its group mean divided by the error DF.</p>
<p>If the observations for each group are close to the group mean, the variance within the samples is low. However, if the observations for each group are further from the group mean, the variance within the samples is higher.</p>
<p style="margin-left: 40px;"><img alt="Plot that shows high and low variability within groups" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/9ef2eae1cf6bba97ccb1b664356d0d0a/within_group_dplot.png" style="width: 576px; height: 384px;" /></p>
<p>In the graph, the panel on the left shows low variation in the samples while the panel on the right shows high variation. The more spread out the observations are from their group mean, the higher the value in the denominator of the F-statistic.</p>
<p>If we’re hoping to show that the means are different, it's good when the within-group variance is low. You can think of the within-group variance as the background noise that can obscure a difference between means.</p>
<p>For this one-way ANOVA example, the value that we’ll use for the variance within samples is the Adj MS for Error, which is 4.402. It is considered “error” because it is the variability that is not explained by the factor.</p>
The F-Statistic: Variation Between Sample Means / Variation Within the Samples
<p>The F-statistic is the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/basics/what-is-a-test-statistic/" target="_blank">test statistic</a> for F-tests. In general, an F-statistic is a ratio of two quantities that are expected to be roughly equal under the null hypothesis, which produces an F-statistic of approximately 1.</p>
<p>The F-statistic incorporates both measures of variability discussed above. Let's take a look at how these measures can work together to produce low and high F-values. Look at the graphs below and compare the width of the spread of the group means to the width of the spread within each group.</p>
<img alt="Graph that shows sample data that produce a low F-value" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/a8faab4bb32bf1a1f5864d34d96e8d56/low_f_dplot.png" style="width: 350px; height: 233px;" />
<img alt="Graph that shows sample data that produce a high F-value" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/054b86eb1e48803baba2cff9c78028ab/high_f_dplot.png" style="width: 350px; height: 233px;" />
<p>The low F-value graph shows a case where the group means are close together (low variability) relative to the variability within each group. The high F-value graph shows a case where the variability of group means is large relative to the within group variability. In order to reject the null hypothesis that the group means are equal, we need a high F-value.</p>
<p>For our plastic strength example, we'll use the Factor Adj MS for the numerator (14.540) and the Error Adj MS for the denominator (4.402), which gives us an F-value of 3.30.</p>
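In code form, the F-value is just one division of the two adjusted mean squares from the output:

```python
# F = variation between sample means / variation within the samples,
# using the Adj MS values from the one-way ANOVA output.
ms_factor = 14.540   # between-group mean square
ms_error = 4.402     # within-group (error) mean square

f_value = ms_factor / ms_error
print(round(f_value, 2))  # 3.3
```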
<p>Is our F-value high enough? A single F-value is hard to interpret on its own. We need to place our F-value into a larger context before we can interpret it. To do that, we’ll use the F-distribution to calculate probabilities.</p>
F-distributions and Hypothesis Testing
<p>For one-way ANOVA, the ratio of the between-group variability to the within-group variability follows an <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/probability-distributions-and-random-data/distributions/f-distribution/" target="_blank">F-distribution</a> when the null hypothesis is true.</p>
<p>When you perform a one-way ANOVA for a single study, you obtain a single F-value. However, if we drew multiple random samples of the same size from the same population and performed the same one-way ANOVA, we would obtain many F-values and we could plot a distribution of all of them. This type of distribution is known as a <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/introductory-concepts/basic-concepts/sampling-distribution/" target="_blank">sampling distribution</a>.</p>
<p>Because the F-distribution assumes that the null hypothesis is true, we can place the F-value from our study in the F-distribution to determine how consistent our results are with the null hypothesis and to calculate probabilities.</p>
<p>The probability that we want to calculate is the probability of observing an F-statistic that is at least as high as the value that our study obtained. That probability allows us to determine how common or rare our F-value is under the assumption that the null hypothesis is true. If the probability is low enough, we can conclude that our data is inconsistent with the null hypothesis. The evidence in the sample data is strong enough to reject the null hypothesis for the entire population.</p>
<p>This probability that we’re calculating is also known as the p-value!</p>
<p>To plot the F-distribution for our plastic strength example, I’ll use Minitab’s <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/graphs/graphs-of-distributions/probability-distribution-plots/probability-distribution-plot/" target="_blank">probability distribution plots</a>. In order to graph the F-distribution that is appropriate for our specific design and sample size, we'll need to specify the correct number of DF. Looking at our one-way ANOVA output, we can see that we have 3 DF for the numerator and 36 DF for the denominator.</p>
<p><img alt="Probability distribution plot for an F-distribution with a probability" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/6303a2314437d8fcf2f72d9a56b1293a/f_distribution_probability.png" style="width: 576px; height: 384px;" /></p>
<p>The graph displays the distribution of F-values that we'd obtain if the null hypothesis is true and we repeat our study many times. The shaded area represents the probability of observing an F-value that is at least as large as the F-value our study obtained. F-values fall within this shaded region about 3.1% of the time when the null hypothesis is true. This probability is low enough to reject the null hypothesis using the common <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests:-significance-levels-alpha-and-p-values-in-statistics" target="_blank">significance level</a> of 0.05. We can conclude that not all the group means are equal.</p>
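The shaded upper-tail area can also be computed directly from an F-distribution, for example with SciPy (an alternative to the probability plot, not the post's Minitab workflow):

```python
# Probability of an F-value at least as large as 3.30 under an
# F-distribution with 3 numerator and 36 denominator degrees of freedom.
from scipy import stats

p_value = stats.f.sf(3.30, dfn=3, dfd=36)   # upper-tail area
print(f"{p_value:.3f}")  # about 0.031
```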
<p><a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values" target="_blank">Learn how to correctly interpret the p-value.</a></p>
Assessing Means by Analyzing Variation
<p>ANOVA uses the F-test to determine whether the variability between group means is larger than the variability of the observations within the groups. If that ratio is sufficiently large, you can conclude that not all the means are equal.</p>
<p><span style="line-height: 20.8px;">This brings us back to why we analyze variation to make judgments about means. </span>Think about the question: "Are the group means different?" You are implicitly asking about the variability of the means. After all, if the group means <em>don't </em>vary, or don't vary by more than random chance allows, then you can't say the means are different. And that's why you use analysis of variance to test the means.</p>
ANOVAData AnalysisHypothesis TestingLearningStatistics HelpWed, 18 May 2016 12:00:00 +0000http://blog.minitab.com/blog/adventures-in-statistics/understanding-analysis-of-variance-anova-and-the-f-testJim FrostAn Overview of Discriminant Analysis
http://blog.minitab.com/blog/starting-out-with-statistical-software/an-overview-of-discriminant-analysis
<p>Among the most underutilized statistical tools in Minitab, and I think in general, are multivariate tools. Minitab offers a number of different multivariate tools, including principal component analysis, factor analysis, <span><a href="http://blog.minitab.com/blog/quality-data-analysis-and-statistics/cluster-analysis-tips-part-2">clustering</a></span>, and more. In this post, my goal is to give you a better understanding of the multivariate tool called discriminant analysis, and how it can be used.</p>
<p>Discriminant analysis is used to classify observations into two or more groups if you have a sample with known groups. Essentially, it's a way to handle a classification problem, where two or more groups, clusters, or populations are known up front, and one or more new observations are placed into one of these known classifications based on the measured characteristics. Discriminant analysis can also be used to investigate how variables contribute to group separation.</p>
<p>An area where this is especially useful is species classification. We'll use that as an example to explore how this all works. If you want to follow along and you don't already have Minitab, you can get it <a href="http://www.minitab.com/products/minitab/free-trial/">free for 30 days</a>. </p>
Discriminant Analysis in Action
<img alt="Arctic wolf" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/43484b551c0cc2eacb1b848678d666be/wolf.jpg" style="line-height: 20.8px; margin: 10px 15px; float: right; width: 241px; height: 300px;" />
<div>
<p>I have a <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9429cbd678e906f6bbbda0793aa859f6/discrimdata.mtw">data set</a> with variables containing data on both Rocky Mountain and Arctic wolves. We already know which species each observation belongs to; the main goal of this analysis is to find out how the data we have contribute to the groupings, and then to use this information to help us classify new individuals. </p>
<p>In Minitab, we set up our worksheet to be column-based like usual. We have a column denoting the species of wolf, as well as 9 other columns containing measurements for each individual on a number of different features.</p>
<p>Once we have our continuous predictors and a group identifier column in our worksheet, we can go to <strong>Stat > Multivariate > Discriminant Analysis</strong>. Here's how we'd fill out the dialog:</p>
<p style="margin-left: 40px;"><img alt="dialog" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/732ead34-1005-4470-b034-d7f8b87fabcf/Image/bbfff731ce2f30923c064a73324dba1e/discrimdia.png" style="width: 448px; height: 336px;" /></p>
<p>'Groups' is where you enter the column that identifies which group each observation falls into. In this case, "Location" is the species ID column. Our predictors, in my case X1-X9, represent the measurements of the individual wolves for each of 9 categories; we'll use these to determine which characteristics drive the groupings.</p>
<p>One note before we click OK: we're using a linear discriminant function for simplicity. This assumes that the covariance matrices are equal for all groups, something we can verify using Bartlett's test (also available in Minitab). Once we have our dialog filled out, we can click OK and see our results.</p>
Using the Linear Discriminant Function to Classify New Observations
<p>One of the most important parts of the output we get is called the Linear Discriminant Function. In our example, it looks like this:</p>
<p style="margin-left: 40px;"><img alt="function" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/732ead34-1005-4470-b034-d7f8b87fabcf/Image/a3f3b5199c25010c69d3b19843c31b0e/function.PNG" style="width: 303px; height: 208px;" /></p>
<p>This is the function we will use to classify new observations into groups. Using these coefficients, we can determine which group provides the best fit for a new individual's measurements; Minitab does this through the "Options" subdialog. For example, suppose we enter a new observation with a vector of measurements (X1,...,X9). We get output like this:</p>
<p style="margin-left: 40px;"><img alt="pred" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/732ead34-1005-4470-b034-d7f8b87fabcf/Image/49873dcbc94d8aa1ae75a45474aaf147/predic.PNG" style="width: 421px; height: 119px;" /></p>
<p>This will give us the probability that a particular new observation falls into either of our groups. In our case, it was an easy one. The probability that it belongs to the AR species was 1. We're reasonably sure, based on the data, that this is the case. In some cases, you may get probabilities much closer to each other, meaning the classification isn't as clear cut.</p>
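Under the hood, the classification step works roughly like the sketch below. The coefficients, measurements, and group names are made up for illustration (and trimmed to three predictors instead of nine); the real values come from Minitab's Linear Discriminant Function output. With equal priors and a shared covariance matrix, the posterior probabilities are a softmax of the groups' linear scores:

```python
import math

# Hypothetical linear discriminant functions for two groups. In practice,
# each group's constant and coefficients come from Minitab's output table.
# score_g(x) = constant_g + sum_j coef_g[j] * x[j]
functions = {
    "AR": {"constant": -65.9, "coefs": [0.4, 1.2, -0.3]},
    "RM": {"constant": -58.2, "coefs": [0.3, 0.9, 0.1]},
}

def classify(x):
    """Score a new observation with each group's linear discriminant
    function and convert the scores to posterior probabilities
    (valid under equal priors and a shared covariance matrix)."""
    scores = {g: f["constant"] + sum(c * v for c, v in zip(f["coefs"], x))
              for g, f in functions.items()}
    m = max(scores.values())                      # for numeric stability
    expd = {g: math.exp(s - m) for g, s in scores.items()}
    total = sum(expd.values())
    return {g: e / total for g, e in expd.items()}

probs = classify([10.0, 42.0, 7.5])   # a hypothetical new individual
best = max(probs, key=probs.get)      # group with the highest probability
```

The group whose function yields the highest score (equivalently, the highest posterior probability) is the predicted classification for the new observation.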
<p>I hope this gives you some idea of the usefulness of discriminant analysis, and how you can use it in Minitab to make decisions.</p>
</div>
Data AnalysisHypothesis TestingStatisticsMon, 16 May 2016 12:00:00 +0000http://blog.minitab.com/blog/starting-out-with-statistical-software/an-overview-of-discriminant-analysisEric HeckmanTests of 2 Standard Deviations? Side Effects May Include Paradoxical Dissociations
http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/tests-of-2-standard-deviations-side-effects-may-include-paradoxical-dissociations
<p>Once upon a time, when people wanted to compare the standard deviations of two samples, they had two handy tests available, the F-test and Levene's test.</p>
<p>Statistical lore has it that the F-test is so named because <a href="#footnote">it so frequently fails you.1</a> Although the F-test is suitable for data that are normally distributed, its sensitivity to departures from <span><a href="http://blog.minitab.com/blog/the-statistical-mentor/anderson-darling-ryan-joiner-or-kolmogorov-smirnov-which-normality-test-is-the-best">normality</a></span> limits when and where it can be used.</p>
<p><a name="back"></a>Levene’s test was developed as an antidote to the F-test's extreme sensitivity to nonnormality. However, Levene's test<span style="line-height: 1.6;"> is sometimes accompanied by a troubling side effect: paradoxical </span>dissociations<span style="line-height: 1.6;">. To see what I mean, take a look at these results from an </span><span style="line-height: 1.6;">actual </span><span style="line-height: 1.6;">test of 2 standard deviations that I actually ran in Minitab 16 using actual data that I actually made up:</span></p>
<p style="margin-left: 40px;"><img alt="Ratio of the standard deviations in Release 16" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/313db9f57725eeb074002df423c4415e/16_ratio.jpg" style="width: 286px; height: 99px;" /></p>
<p>Nothing surprising so far. The ratio of the standard deviations from samples 1 and 2 (s1/s2) is <span style="line-height: 20.8px;">1.414 / 1.575 = 0.898. This ratio is </span>our best "point estimate" for the ratio of the standard deviations from populations 1 and 2 (Ps1/Ps2).</p>
<p>Note that the ratio is less than 1, which suggests that Ps2 is greater than Ps1. </p>
<p>Now, let's have a look at the confidence interval (CI) for the population ratio. The CI gives us a range of likely values for the ratio of Ps1/Ps2. The CI <span style="line-height: 20.8px;">below</span><span style="line-height: 1.6;"> labeled "Continuous" is the one calculated using Levene's method:</span></p>
<p style="margin-left: 40px;"><img alt="Confidence interval for the ratio in Release 16" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/aee886880d52d5aed7150abd242b5d61/16_ci.jpg" style="width: 338px; height: 114px;" /></p>
<p><span style="line-height: 1.6;">What in Gauss' name is going on here?!? The range of likely values for Ps1/Ps2—1.046 to 1.566—doesn't include the point estimate of 0.898?!? In fact, the CI suggests that Ps1/Ps2 is </span><em style="line-height: 1.6;">greater </em><span style="line-height: 1.6;">than 1. Which suggests that Ps1 is actually </span><em style="line-height: 20.8px;">greater </em><span style="line-height: 1.6;">than Ps2. </span></p>
<p><span style="line-height: 1.6;">But the point estimate suggests the exact opposite! Which suggests that </span><span style="line-height: 20.8px;">something odd is going on here. Or that</span><span style="line-height: 1.6;"> I might be losing my mind (which wouldn't be that odd). Or both.</span></p>
<p>As it turns out, the very elements that make Levene's test robust to departures from normality also leave the test susceptible to paradoxical dissociations like this one. You see, Levene's test isn't <em>actually </em>based on the standard deviation. Instead, the test is based on a statistic called the <em>mean absolute deviation from the median</em>, or MADM. The MADM is much less affected by nonnormality and outliers than is the standard deviation. And even though the MADM and the <span style="line-height: 20.8px;">standard deviation of a sample </span>can be very different, the <em>ratio </em>of MADM1/MADM2 is nevertheless a good approximation for the <em>ratio </em>of Ps1/Ps2. </p>
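To see this robustness in action, here's a minimal Python sketch (the data are illustrative only, not the samples from the output above) comparing how a single outlier affects the standard deviation versus the MADM:

```python
import statistics as st

def madm(xs):
    """Mean absolute deviation from the median -- the robust spread
    statistic that Levene's test is based on."""
    med = st.median(xs)
    return sum(abs(x - med) for x in xs) / len(xs)

clean = [4.1, 4.8, 5.0, 5.2, 5.9]
with_outlier = clean + [25.0]      # one extreme value added

sd_clean, sd_out = st.stdev(clean), st.stdev(with_outlier)
madm_clean, madm_out = madm(clean), madm(with_outlier)
```

Running this, the outlier inflates the sample standard deviation proportionally more than it inflates the MADM. That gap is exactly how s1/s2 can drift outside a CI built from MADM ratios, producing the paradoxical dissociation described above.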
<p><span style="line-height: 1.6;">However, in extreme cases, outliers can affect the sample standard deviations so much that s1/s2 can fall completely outside of Levene's CI. And that's when you're left with an awkward and confusing case of paradoxical dissociation. </span></p>
<p><span style="line-height: 1.6;">Fortunately (and this may be the first and last time that you'll ever hear this next phrase), our </span><span style="line-height: 1.6;">statisticians have made things a lot less awkward. </span><span style="line-height: 1.6;">One of the brave folks in Minitab's R&D department toiled against all odds, and at considerable personal peril to solve this enigma. The result, which has been incorporated into Minitab 17, is an effective, elegant, and </span>non-enigmatic<span style="line-height: 1.6;"> test that we call Bonett's test. </span></p>
<p style="margin-left: 40px;"><span style="line-height: 1.6;"><img alt="Confidence interval in Release 17" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/3c014cdea970a3f1f6a540119ef3b533/bonnet_results.jpg" style="width: 310px; height: 170px;" /></span></p>
<p>Like Levene's test, Bonett's test can be used with nonnormal data. But <em>unlike </em>Levene's test, Bonett's test is actually based on the actual standard deviations of the actual samples. Which means that Bonett's test is not subject to the same awkward and confusing paradoxical dissociations that can accompany Levene's test. And I don't know about you, but I try to avoid paradoxical dissociations whenever I can. (Especially as I get older, ... I just don't bounce back the way I used to.) </p>
<p><span style="line-height: 20.8px;">When you compare two standard deviations in Minitab 17, you get a handy graphical report </span><span style="line-height: 20.8px;">that quickly and clearly summarizes the results of your test, including the point estimate and the CI from Bonett's test. Which means n</span><span style="line-height: 20.8px;">o more awkward and confusing paradoxical dissociations. </span></p>
<p style="margin-left: 40px;"><img alt="Summary plot in Release 17" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/b785749b3292df1aa6d32abe4e430b63/17_summary_plot.jpg" style="width: 578px; height: 386px;" /></p>
<p><span style="line-height: 1.6;">------------------------------------------------------------</span></p>
<p><a name="footnote"> </a></p>
<p>1 So, that bit about the name of the F-test—I kind of made that up. Fortunately, there is a better source of information for the genuinely curious. Our white paper, <a href="http://support.minitab.com/en-us/minitab/17/bonetts_method_two_variances.pdf">Bonett's Method</a>, includes all kinds of details about these tests and comparisons between the CIs calculated with each. Enjoy.</p>
<p> <br />
<em><a href="#back">return to text of post</a></em></p>
Hypothesis TestingStatisticsStatsWed, 11 May 2016 12:00:00 +0000http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/tests-of-2-standard-deviations-side-effects-may-include-paradoxical-dissociationsGreg FoxUnderstanding t-Tests: 1-sample, 2-sample, and Paired t-Tests
http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests%3A-1-sample%2C-2-sample%2C-and-paired-t-tests
<p>In statistics, t-tests are a type of hypothesis test that allows you to compare means. They are called t-tests because each t-test boils your sample data down to one number, the t-value. If you understand how t-tests calculate t-values, you’re well on your way to understanding how these tests work.</p>
<p>In this series of posts, I'm focusing on concepts rather than equations to show how t-tests work. However, this post includes two simple equations that I’ll work through using the analogy of a signal-to-noise ratio.</p>
<p><a href="http://www.minitab.com/products/minitab/" target="_blank">Minitab statistical software</a> offers the 1-sample t-test, paired t-test, and the 2-sample t-test. Let's look at how each of these t-tests reduce your sample data down to the t-value.</p>
How 1-Sample t-Tests Calculate t-Values
<p>Understanding this process is crucial to understanding how t-tests work. I'll show you the formula first, and then I’ll explain how it works.</p>
<p style="margin-left: 40px;"><img alt="formula to calculate t for a 1-sample t-test" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/dbbda42fec926eef96a56c22ed462458/formula_1t.png" style="width: 142px; height: 88px;" /></p>
<p>Please notice that the formula is a ratio. A common analogy is that the t-value is the signal-to-noise ratio.</p>
<strong>Signal (a.k.a. the effect size)</strong>
<p>The numerator is the signal. You simply take the sample mean and subtract the null hypothesis value. If your sample mean is 10 and the null hypothesis is 6, the difference, or signal, is 4.</p>
<p>If there is no difference between the sample mean and null value, the signal in the numerator, as well as the value of the entire ratio, equals zero. For instance, if your sample mean is 6 and the null value is 6, the difference is zero.</p>
<p>As the difference between the sample mean and the null hypothesis mean increases in either the positive or negative direction, the strength of the signal increases.</p>
<div style="float: right; width: 325px; margin: 15px 0px 15px 15px;"><img alt="Photo of a packed stadium to illustrate high background noise" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/695f063e8d38c2bc9c5fa61637ef6327/crowd.jpg" style="width: 325px; height: 244px; margin-bottom:5px;" /><br />
<em>Lots of noise can overwhelm the signal.</em></div>
<strong>Noise</strong>
<p>The denominator is the noise. The equation in the denominator is a measure of variability known as the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/tests-of-means/what-is-the-standard-error-of-the-mean/" target="_blank">standard error of the mean</a>. This statistic indicates how accurately your sample estimates the mean of the population. A larger number indicates that your sample estimate is less precise because it has more random error.</p>
<p>This random error is the “noise.” When there is more noise, you expect to see larger differences between the sample mean and the null hypothesis value <em>even when the null hypothesis is true</em>. We include the noise factor in the denominator because we must determine whether the signal is large enough to stand out from it.</p>
<strong>Signal-to-Noise ratio</strong>
<p>Both the signal and noise values are in the units of your data. If your signal is 6 and the noise is 2, your t-value is 3. This t-value indicates that the difference is 3 times the size of the standard error. However, if the difference is the same 6 but your data are noisier, with a standard error of 6, your t-value is only 1: the signal is the same size as the noise.</p>
<p>In this manner, t-values allow you to see how distinguishable your signal is from the noise. Relatively large signals and low levels of noise produce larger t-values. If the signal does not stand out from the noise, it’s likely that the observed difference between the sample estimate and the null hypothesis value is due to random error in the sample rather than a true difference at the population level.</p>
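The ratio described above can be sketched in a few lines of Python (a hand-rolled illustration with made-up data, not Minitab's implementation):

```python
import math
import statistics as st

def one_sample_t(sample, null_mean):
    """t = (sample mean - null value) / standard error of the mean."""
    n = len(sample)
    signal = st.mean(sample) - null_mean        # the effect size
    noise = st.stdev(sample) / math.sqrt(n)     # standard error of the mean
    return signal / noise

data = [10.2, 9.5, 10.8, 10.1, 9.9, 10.4, 10.6, 9.8]
t = one_sample_t(data, null_mean=10.0)   # modest signal relative to the noise
```

If the sample mean equals the null value, the numerator (and hence the whole ratio) is zero; as the data become noisier, the same signal produces a smaller t-value.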
A Paired t-test Is Just A 1-Sample t-Test
<p>Many people are confused about when to use a paired t-test and how it works. I’ll let you in on a little secret. The paired t-test and the 1-sample t-test are actually the same test in disguise! As we saw above, a 1-sample t-test compares one sample mean to a null hypothesis value. A paired t-test simply calculates the difference between paired observations (e.g., before and after) and then performs a 1-sample t-test on the differences.</p>
<p>You can test this with <a href="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/File/946c3f4725847e714e7fcc9664ae67b2/paired_t_test.mtw">this data set</a> to see how all of the results are identical, including the mean difference, t-value, p-value, and confidence interval of the difference.</p>
<p style="margin-left: 40px;"><img alt="Minitab worksheet with paired t-test example" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/02fbcdbbf62fec3823123fbcc818b11f/paired_t_worksheet.png" style="width: 229px; height: 223px;" /><img alt="paired t-test output" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/170d6d4fa1fbbb1bf4f5aa56b1783b5f/paired_t_swo.png" style="width: 518px; height: 196px;" /></p>
<p style="margin-left: 40px;"><img alt="1-sample t-test output" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/08d652fb45599fc1ac247181a935c471/1t_difc_swo.png" style="width: 504px; height: 115px;" /></p>
<p>Understanding that the paired t-test simply performs a 1-sample t-test on the paired differences can really help you understand how the paired t-test works and when to use it. You just need to figure out whether it makes sense to calculate the difference between each pair of observations.</p>
<p>For example, let’s assume that “before” and “after” represent test scores, and there was an intervention in between them. If the before and after scores in each row of the example worksheet represent the same subject, it makes sense to calculate the difference between the scores in this fashion—the paired t-test is appropriate. However, if the scores in each row are for different subjects, it doesn’t make sense to calculate the difference. In this case, you’d need to use another test, such as the 2-sample t-test, which I discuss below.</p>
<p>Using the paired t-test simply saves you the step of having to calculate the differences before performing the t-test. You just need to be sure that the paired differences make sense!</p>
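That equivalence is easy to sketch in code. This hand-rolled illustration (with made-up scores, not the worksheet's data) computes the paired t-value by literally running a 1-sample t-test on the differences:

```python
import math
import statistics as st

def one_sample_t(sample, null_mean=0.0):
    """1-sample t-value: (mean - null value) / standard error."""
    n = len(sample)
    se = st.stdev(sample) / math.sqrt(n)
    return (st.mean(sample) - null_mean) / se

def paired_t(before, after):
    """A paired t-test is a 1-sample t-test on the per-subject
    differences, with a null hypothesis of zero mean difference."""
    diffs = [b - a for b, a in zip(before, after)]
    return one_sample_t(diffs, null_mean=0.0)

before = [88, 83, 76, 90, 84, 79]
after  = [91, 89, 80, 92, 83, 88]
t = paired_t(before, after)
# identical to running one_sample_t on the differences directly
```

The t-value (and therefore the p-value and CI) from `paired_t` matches a 1-sample t-test on the column of differences, which is exactly what the Minitab output comparison above shows.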
<p>When it is appropriate to use a paired t-test, it can be more powerful than a 2-sample t-test. For more information, go to <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/tests-of-means/why-use-paired-t/" target="_blank">Why should I use a paired t-test?</a></p>
How 2-Sample t-Tests Calculate t-Values
<p>The 2-sample t-test takes your sample data from two groups and boils it down to the t-value. The process is very similar to the 1-sample t-test, and you can still use the analogy of the signal-to-noise ratio. Unlike the paired t-test, the 2-sample t-test requires independent groups for each sample.</p>
<p>Here's the formula, followed by some discussion.</p>
<p style="margin-left: 40px;"><img alt="formula to calculate t for a 2-sample t-test" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/276994cf179b4997ce6097d1f4462363/formula_2t.png" style="width: 102px; height: 54px;" /></p>
<p>For the 2-sample t-test, the numerator is again the signal, which is the difference between the means of the two samples. For example, if the mean of group 1 is 10, and the mean of group 2 is 4, the difference is 6.</p>
<p>The default null hypothesis for a 2-sample t-test is that the two groups are equal. You can see in the equation that when the two groups are equal, the difference (and the entire ratio) also equals zero. As the difference between the two groups grows in either a positive or negative direction, the signal becomes stronger.</p>
<p>In a 2-sample t-test, the denominator is still the noise, but Minitab can use two different values. You can either assume that the variability in both groups is equal or not equal, and Minitab uses the corresponding estimate of the variability. Either way, the principle remains the same: you are comparing your signal to the noise to see how much the signal stands out.</p>
<p>Just like with the 1-sample t-test, for any given difference in the numerator, as you increase the noise value in the denominator, the t-value becomes smaller. To determine that the groups are different, you need a t-value that is large.</p>
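Here's a minimal sketch of the 2-sample t-value under both variability assumptions (illustrative data and a hand-rolled calculation; Minitab performs the equivalent computation internally):

```python
import math
import statistics as st

def two_sample_t(x, y, equal_variances=True):
    """t = (mean1 - mean2) / noise, where the noise term depends on
    whether the two groups are assumed to share a common variance."""
    n1, n2 = len(x), len(y)
    diff = st.mean(x) - st.mean(y)                # the signal
    v1, v2 = st.variance(x), st.variance(y)
    if equal_variances:
        # pooled estimate of the common variance
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    else:
        # unequal variances: each group contributes its own variance
        se = math.sqrt(v1 / n1 + v2 / n2)
    return diff / se

group1 = [10.1, 9.4, 10.9, 10.3, 9.8]
group2 = [8.7, 9.1, 8.4, 9.0, 8.8]
t_pooled = two_sample_t(group1, group2)
t_unequal = two_sample_t(group1, group2, equal_variances=False)
```

Either way, the structure is the same as the 1-sample case: a difference in the numerator divided by an estimate of its variability in the denominator.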
What Do t-Values Mean?
<p>Each type of t-test uses a procedure to boil all of your sample data down to one value, the t-value. The calculations compare your sample mean(s) to the null hypothesis and incorporate both the sample size and the variability in the data. A t-value of 0 indicates that the sample results exactly equal the null hypothesis. In statistics, we call the difference between the sample estimate and the null hypothesis the effect size. As this difference increases, the absolute value of the t-value increases.</p>
<p>That’s all nice, but what does a t-value of, say, 2 really mean? From the discussion above, we know that a t-value of 2 indicates that the observed difference is twice the size of the variability in your data. However, we use t-tests to evaluate hypotheses rather than just figuring out the signal-to-noise ratio. We want to determine whether the effect size is statistically significant.</p>
<p>To see how we get from t-values to assessing hypotheses and determining statistical significance, read the other post in this series, <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests-t-values-and-t-distributions">Understanding t-Tests: t-values and t-distributions</a>.</p>
Data AnalysisHypothesis TestingLearningStatistics HelpWed, 04 May 2016 12:00:00 +0000http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests%3A-1-sample%2C-2-sample%2C-and-paired-t-testsJim FrostUnderstanding t-Tests: t-values and t-distributions
http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests-t-values-and-t-distributions
<p>T-tests are handy <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics" target="_blank">hypothesis tests</a> in statistics when you want to compare means. You can compare a sample mean to a hypothesized or target value using a one-sample t-test. You can compare the means of two groups with a two-sample t-test. If you have two groups with paired observations (e.g., before and after measurements), use the paired t-test.</p>
<img alt="Output that shows a t-value" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/efd51d69e3947d70197143b735e0c51d/t_value_swo.png" style="line-height: 20.8px; float: right; width: 400px; height: 57px; margin: 10px 15px; border-width: 1px; border-style: solid;" />
<p>How do t-tests work? How do t-values fit in? In this series of posts, I’ll answer these questions by focusing on concepts and graphs rather than equations and numbers. After all, a key reason to use <a href="http://www.minitab.com/products/minitab">statistical software like </a><a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">Minitab</a> is so you don’t get bogged down in the calculations and can instead focus on understanding your results.</p>
<p>In this post, I will explain t-values, t-distributions, and how t-tests use them to calculate probabilities and assess hypotheses.</p>
What Are t-Values?
<p>T-tests are called t-tests because the test results are all based on t-values. T-values are an example of what statisticians call test statistics. A test statistic is a standardized value that is calculated from sample data during a hypothesis test. The procedure that calculates the test statistic compares your data to what is expected under the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/basics/null-and-alternative-hypotheses/" target="_blank">null hypothesis</a>.</p>
<p>Each type of t-test uses a specific procedure to boil all of your sample data down to one value, the t-value. The calculations behind t-values compare your sample mean(s) to the null hypothesis and incorporate both the sample size and the variability in the data. A t-value of 0 indicates that the sample results exactly equal the null hypothesis. As the difference between the sample data and the null hypothesis increases, the absolute value of the t-value increases.</p>
<p>Assume that we perform a t-test and it calculates a t-value of 2 for our sample data. What does that even mean? I might as well have told you that our data equal 2 fizbins! We don’t know if that’s common or rare when the null hypothesis is true.</p>
<p>By itself, a t-value of 2 doesn’t really tell us anything. T-values are not in the units of the original data, or anything else we’d be familiar with. We need a larger context in which we can place individual t-values before we can interpret them. This is where t-distributions come in.</p>
What Are t-Distributions?
<p>When you perform a t-test for a single study, you obtain a single t-value. However, if we drew multiple random samples of the same size from the same population and performed the same t-test, we would obtain many t-values and we could plot a distribution of all of them. This type of distribution is known as a <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/introductory-concepts/basic-concepts/sampling-distribution/" target="_blank">sampling distribution</a>.</p>
<p>Fortunately, the properties of t-distributions are well understood in statistics, so we can plot them without having to collect many samples! A specific t-distribution is defined by its <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/introductory-concepts/basic-concepts/df/" target="_blank">degrees of freedom (DF)</a>, a value closely related to sample size. Therefore, different t-distributions exist for every sample size. <span style="line-height: 20.8px;">You can graph t-distributions u</span><span style="line-height: 1.6;">sing Minitab’s </span><a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/graphs/graphs-of-distributions/probability-distribution-plots/probability-distribution-plot/" style="line-height: 1.6;" target="_blank">probability distribution plots</a><span style="line-height: 1.6;">.</span></p>
<p>T-distributions assume that you draw repeated random samples from a population where the null hypothesis is true. You place the t-value from your study in the t-distribution to determine how consistent your results are with the null hypothesis.</p>
<p style="margin-left: 40px;"><img alt="Plot of t-distribution" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/d628e56f0380e0edcf575502a670ed31/t_dist_20_df.png" style="width: 576px; height: 384px;" /></p>
<p>The graph above shows a t-distribution that has 20 degrees of freedom, which corresponds to a sample size of 21 in a one-sample t-test. It is a symmetric, bell-shaped distribution that is similar to the normal distribution, but with thicker tails. This graph plots the probability density function (PDF), which describes the likelihood of each t-value.</p>
<p>The peak of the graph is right at zero, which indicates that obtaining a sample value close to the null hypothesis is most likely. That makes sense because t-distributions assume that the null hypothesis is true. T-values become less likely as you get further away from zero in either direction. In other words, when the null hypothesis is true, you are less likely to obtain a sample that is very different from the null hypothesis.</p>
<p>Our t-value of 2 indicates a positive difference between our sample data and the null hypothesis. The graph shows that there is a reasonable probability of obtaining a t-value from -2 to +2 when the null hypothesis is true. Our t-value of 2 is an unusual value, but we don’t know exactly <em>how </em>unusual. Our ultimate goal is to determine whether our t-value is unusual enough to warrant rejecting the null hypothesis. To do that, we'll need to calculate the probability.</p>
Using t-Values and t-Distributions to Calculate Probabilities
<p>The foundation behind any hypothesis test is being able to take the test statistic from a specific sample and place it within the context of a known probability distribution. For t-tests, if you take a t-value and place it in the context of the correct t-distribution, you can calculate the probabilities associated with that t-value.</p>
<p>A probability allows us to determine how common or rare our t-value is under the assumption that the null hypothesis is true. If the probability is low enough, we can conclude that the effect observed in our sample is inconsistent with the null hypothesis. The evidence in the sample data is strong enough to reject the null hypothesis for the entire population.</p>
<p>Before we calculate the probability associated with our t-value of 2, there are two important details to address.</p>
<p>First, we’ll actually use the t-values of +2 and -2 because we’ll perform a two-tailed test. A two-tailed test is one that can test for differences in both directions. For example, a two-tailed 2-sample t-test can determine whether the difference between group 1 and group 2 is statistically significant in either the positive or negative direction. A one-tailed test can only assess one of those directions.</p>
<p>Second, we can only calculate a non-zero probability for a range of t-values. As you’ll see in the graph below, a range of t-values corresponds to a proportion of the total area under the distribution curve, which is the probability. The probability for any specific point value is zero because it does not produce an area under the curve.</p>
<p>With these points in mind, we’ll shade the area of the curve that has t-values greater than 2 and t-values less than -2.</p>
<p><img alt="T-distribution with a shaded area that represents a probability" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/5e124a2c8139681afec706799ebabcec/t_dist_prob.png" style="width: 576px; height: 384px;" /></p>
<p>The graph displays the probability for observing a difference from the null hypothesis that is at least as extreme as the difference present in our sample data while assuming that the null hypothesis is actually true. Each of the shaded regions has a probability of 0.02963, which sums to a total probability of 0.05926. When the null hypothesis is true, the t-value falls within these regions nearly 6% of the time.</p>
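As a check, the probability for t-values beyond ±2 with 20 degrees of freedom can be reproduced by numerically integrating the t-distribution's probability density function. This is a hand-rolled sketch; statistical software computes these probabilities exactly:

```python
import math

def t_pdf(t, df):
    """Probability density function of Student's t-distribution."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t_value, df, upper=50.0, steps=20000):
    """Approximate the two-tailed p-value P(|T| >= t_value) by
    numerically integrating the upper tail (trapezoidal rule) and
    doubling it. The tail beyond `upper` is negligible for moderate df."""
    t_value = abs(t_value)
    h = (upper - t_value) / steps
    area = 0.5 * (t_pdf(t_value, df) + t_pdf(upper, df))
    for i in range(1, steps):
        area += t_pdf(t_value + i * h, df)
    return 2 * area * h

p = two_tailed_p(2, 20)   # ≈ 0.0593 for a t-value of 2 with 20 DF
```

Doubling the single-tail area reflects the two-tailed test: extreme t-values in either direction count as evidence against the null hypothesis.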
<p>This probability has a name that you might have heard of—it’s called the p-value! While the probability of our t-value falling within these regions is fairly low, it’s not low enough to reject the null hypothesis using the common <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-significance-levels-alpha-and-p-values-in-statistics" target="_blank">significance level</a> of 0.05.</p>
<p><a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values" target="_blank">Learn how to correctly interpret the p-value.</a></p>
t-Distributions and Sample Size
<p>As mentioned above, t-distributions are defined by the DF, which are closely associated with sample size. As the DF increases, the probability density in the tails decreases and the distribution becomes more tightly clustered around the central value. The graph below depicts t-distributions with 5 and 30 degrees of freedom.</p>
<p><img alt="Comparison of t-distributions with different degrees of freedom" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/5220dc6347611a230e89b70de904b034/t_dist_comp_df.png" style="width: 576px; height: 384px;" /></p>
<p>The t-distribution with fewer degrees of freedom has thicker tails. This occurs because the t-distribution is designed to reflect the added uncertainty associated with analyzing small samples. In other words, if you have a small sample, the probability that the sample statistic will be further away from the null hypothesis is greater even when the null hypothesis is true.</p>
<p>Small samples are more likely to be unusual. This affects the probability associated with any given t-value. For 5 and 30 degrees of freedom, a t-value of 2 in a two-tailed test has p-values of 10.2% and 5.4%, respectively. Large samples are better!</p>
<p>I’ve shown how t-values and t-distributions work together to produce probabilities. To see how each type of t-test works and actually calculates the t-values, read the other post in this series, <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests:-1-sample,-2-sample,-and-paired-t-tests">Understanding t-Tests: 1-sample, 2-sample, and Paired t-Tests</a>.</p>
<p>If you'd like to learn how the ANOVA F-test works, read my post, <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-analysis-of-variance-anova-and-the-f-test">Understanding Analysis of Variance (ANOVA) and the F-test</a>.</p>
Data AnalysisHypothesis TestingLearningStatistics HelpWed, 20 Apr 2016 12:00:00 +0000http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests-t-values-and-t-distributionsJim FrostBest Way to Analyze Likert Item Data: Two Sample T-Test versus Mann-Whitney
http://blog.minitab.com/blog/adventures-in-statistics/best-way-to-analyze-likert-item-data%3A-two-sample-t-test-versus-mann-whitney
<p><img alt="Worksheet that shows Likert data" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/6b1cf78b969699ed58febb026d32051d/likert_worksheet.png" style="float: right; width: 162px; height: 265px; margin: 10px 15px;" />Five-point Likert scales are commonly associated with surveys and are used in a wide variety of settings. You’ve run into the Likert scale if you’ve ever been asked whether you strongly agree, agree, neither agree nor disagree, disagree, or strongly disagree about something. The worksheet to the right shows what five-point Likert data look like when you have two groups.</p>
<p>Because Likert item data are discrete, ordinal, and have a limited range, there’s been a longstanding dispute about the most valid way to analyze Likert data. The basic choice is between <a href="http://blog.minitab.com/blog/adventures-in-statistics/choosing-between-a-nonparametric-test-and-a-parametric-test" target="_blank">a parametric test and a nonparametric test</a>. The pros and cons for each type of test are generally described as the following:</p>
<ul>
<li>Parametric tests, such as the 2-sample t-test, assume a normal, continuous distribution. However, with a sufficient sample size, t-tests are robust to departures from normality.</li>
<li>Nonparametric tests, such as the Mann-Whitney test, do not assume a normal or a continuous distribution. However, there are concerns about a lower ability to detect a difference when one truly exists.</li>
</ul>
<p>What’s the better choice? This is a real-world decision that users of <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">statistical software</a> have to make when they want to analyze Likert data.</p>
<p>Over the years, a number of studies have tried to answer this question. However, they’ve tended to look at only a limited number of potential distributions for the Likert data, which limits the generalizability of their results. Thanks to increases in computing power, simulation studies can now thoroughly assess a wide range of distributions.</p>
<p>In this blog post, I highlight a simulation study conducted by de Winter and Dodou* that compares the capabilities of the two sample t-test and the Mann-Whitney test to analyze five-point Likert items for two groups. Is it better to use one analysis or the other?</p>
<p>The researchers identified a diverse set of 14 distributions that are representative of actual Likert data. The computer program drew independent pairs of samples to test all possible combinations of the 14 distributions. All in all, 10,000 random samples were generated for each of the 98 distribution combinations! The pairs of samples were then analyzed using both the two sample t-test and the Mann-Whitney test to compare how well each test performs. The study also assessed different sample sizes.</p>
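A miniature version of this kind of simulation can be sketched in Python with SciPy. The two response distributions, sample size, and trial count below are illustrative stand-ins, not the 14 distributions or settings from the actual study:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
levels = np.arange(1, 6)  # five-point Likert responses

# Two hypothetical Likert response distributions (illustrative only)
probs_a = [0.10, 0.20, 0.30, 0.25, 0.15]
probs_b = [0.05, 0.15, 0.25, 0.30, 0.25]

n, trials, alpha = 30, 1000, 0.05
reject_t = reject_mw = 0
for _ in range(trials):
    a = rng.choice(levels, size=n, p=probs_a)
    b = rng.choice(levels, size=n, p=probs_b)
    if stats.ttest_ind(a, b).pvalue < alpha:
        reject_t += 1
    if stats.mannwhitneyu(a, b).pvalue < alpha:
        reject_mw += 1

power_t = reject_t / trials    # estimated power of the 2-sample t-test
power_mw = reject_mw / trials  # estimated power of the Mann-Whitney test
```

With distributions as similar in shape as these, the two power estimates typically land close together, echoing the study's conclusion.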
<p>The results show that for all pairs of distributions the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/basics/type-i-and-type-ii-error/" target="_blank">Type I (false positive) error rates</a> are very close to the target amounts. In other words, if you use either analysis and your results are statistically significant, you don’t need to be overly concerned about a false positive.</p>
<p>The results also show that for most pairs of distributions, the difference between the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/power-and-sample-size/what-is-power/" target="_blank">statistical power</a> of the two tests is trivial. In other words, if a difference truly exists at the population level, either analysis is equally likely to detect it. The concerns about the Mann-Whitney test having less power in this context appear to be unfounded.</p>
<p>I do have one caveat. There are a few pairs of specific distributions where there is a power difference between the two tests. If you perform both tests on the same data and they disagree (one is significant and the other is not), you can look at a table in the article to help you determine whether a difference in statistical power might be an issue. This power difference affects only a small minority of the cases.</p>
<p>Generally speaking, the choice between the two analyses is a tie. If you need to compare two groups of five-point Likert data, it usually doesn’t matter which analysis you use. Both tests almost always provide the same protection against false negatives and always provide the same protection against false positives. These patterns hold true for sample sizes of 10, 30, and 200 per group.</p>
<p>*de Winter, J.C.F. and D. Dodou (2010), Five-Point Likert Items: t test versus Mann-Whitney-Wilcoxon, <em>Practical Assessment, Research and Evaluation</em>, 15(11).</p>
Data AnalysisHypothesis TestingStatisticsStatistics HelpWed, 06 Apr 2016 12:00:00 +0000http://blog.minitab.com/blog/adventures-in-statistics/best-way-to-analyze-likert-item-data%3A-two-sample-t-test-versus-mann-whitneyJim FrostThe American Statistical Association's Statement on the Use of P Values
http://blog.minitab.com/blog/adventures-in-statistics/the-american-statistical-associations-statement-on-the-use-of-p-values
<p>P values have been around for nearly a century and they’ve been the subject of criticism since their origins. In recent years, the debate over P values has risen to a fever pitch. In particular, there are serious fears that P values are misused to such an extent that it has actually damaged science.</p>
<p>In March 2016, spurred on by the growing concerns, the American Statistical Association (ASA) did something it had never done before and took an official position on a statistical practice—how to use P values. The ASA tapped a group of 20 experts who discussed this over the course of many months. Despite facing complex issues and many heated disagreements, this group managed to reach a consensus on specific points and produce the <a href="http://amstat.tandfonline.com/doi/pdf/10.1080/00031305.2016.1154108" target="_blank">ASA Statement on Statistical Significance and P-values</a>.</p>
<p>I’ve written previously about my concerns over how P values have been misused and misinterpreted. My opinion is that P values are powerful tools but they need to be used and interpreted correctly. P value calculations incorporate the effect size, sample size, and variability of the data into a single number that objectively tells you how consistent your data are with the null hypothesis. You can read my case for the power of P values in my <a href="http://blog.minitab.com/blog/adventures-in-statistics/banned-p-values-and-confidence-intervals-a-rebuttal-part-1" target="_blank">rebuttal to a journal that banned them</a>.</p>
<p><span style="line-height: 1.6;">The ASA statement contains the following six principles on how to use P values, which</span><span style="line-height: 20.8px;"> </span><span style="line-height: 20.8px;">are remarkably aligned with my own. </span><span style="line-height: 20.8px;">Let’s take a look at what they came up with.</span></p>
<ol>
<li>P-values can indicate how incompatible the data are with a specified statistical model.</li>
<li>P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.</li>
</ol>
<p>I discuss these ideas in my post <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values" target="_blank">How to Correctly Interpret P Values</a>. It turns out that the common misconception stated in principle #2 creates the illusion of substantially more evidence against the null hypothesis than is justified. There are a number of reasons <a href="http://blog.minitab.com/blog/adventures-in-statistics/why-are-p-value-misunderstandings-so-common" target="_blank">why this type of P value misunderstanding is so common</a>. In reality, a P value is a probability about your sample data and not about the truth of a hypothesis.</p>
<ol>
<li value="3">Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.</li>
</ol>
<p>In statistics, we’re working with samples to describe a complex reality. Attempting to discover the truth based on an oversimplified process of comparing a single P value to an arbitrary significance level is destined to have problems. False positives, false negatives, and otherwise fluky results are bound to happen.</p>
<p>Using P values in conjunction with a significance level to decide when to reject the null hypothesis increases your chance of making the correct decision. However, there is no magical threshold that distinguishes between the studies that have a true effect and those that don’t with 100% accuracy. You can see a graphical representation of why this is the case in my post <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics" target="_blank">Why We Need to Use Hypothesis Tests</a>.</p>
<p>When Sir Ronald Fisher introduced P values, he never intended for them to be the deciding factor in such a rigid process. Instead, Fisher considered them to be just one part of a process that incorporates scientific reasoning, experimentation, statistical analysis and replication to lead to scientific conclusions.</p>
<p>According to Fisher, “A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance.”</p>
<p>In other words, don’t expect a <em>single</em> study to provide a definitive answer. No single P value can divine the truth about reality by itself.</p>
<ol>
<li value="4">Proper inference requires full reporting and transparency.</li>
</ol>
<p>If you don’t know the full context of a study, you can’t properly interpret a carefully selected subset of the results. Data dredging, cherry picking, significance chasing, data manipulation, and other forms of p-hacking can make it impossible to draw the proper conclusions from selectively reported findings. You must know the full details about all data collection choices, how many and which analyses were performed, and all P values.</p>
<p><img alt="Comic about jelly beans causing acne with selective reporting of the results" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/22099bc252d3630a4876f579c1b83778/jelly_bean_comic.png" style="line-height: 20.8px; width: 500px; height: 1387px; margin: 10px 15px;" /></p>
<div><span style="line-height: 1.6;">In the </span><a href="http://xkcd.com/882/" style="line-height: 1.6;" target="_blank">XKCD comic</a><span style="line-height: 1.6;"> about jelly beans, if you didn’t know about the post hoc decision to subdivide the data and the 20 insignificant test results, you’d be pretty convinced that green jelly beans cause acne!</span></div>
<ol>
<li value="5">A p-value, or statistical significance, does not measure the size of an effect or the importance of an effect.</li>
<li value="6">By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.</li>
</ol>
<p>I cover these ideas, and more, in my <a href="http://blog.minitab.com/blog/adventures-in-statistics/five-guidelines-for-using-p-values">Five Guidelines for Using P Values</a>. P-values don’t tell you the size or importance of the effect. An effect can be statistically significant but trivial in the real world. This is the difference between <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/introductory-concepts/p-value-and-significance-level/practical-significance/" target="_blank">statistical significance and practical significance</a>. The analyst should supplement P values with other statistics, such as effect sizes and confidence intervals, to convey the importance of the effect.</p>
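As a sketch of that advice in Python (the two groups of measurements here are made up for illustration), a p-value can be reported alongside the estimated difference, a 95% confidence interval, and Cohen's d:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
group_a = rng.normal(10.0, 2.0, 40)  # hypothetical measurements
group_b = rng.normal(11.5, 2.0, 40)

t_res = stats.ttest_ind(group_a, group_b)  # the p-value alone

# Effect size and 95% confidence interval for the mean difference
diff = group_b.mean() - group_a.mean()
df = len(group_a) + len(group_b) - 2
pooled_var = (((len(group_a) - 1) * group_a.var(ddof=1)
             + (len(group_b) - 1) * group_b.var(ddof=1)) / df)
se = np.sqrt(pooled_var * (1 / len(group_a) + 1 / len(group_b)))
margin = stats.t.ppf(0.975, df) * se
ci = (diff - margin, diff + margin)

# Cohen's d: the difference in pooled standard-deviation units
cohens_d = diff / np.sqrt(pooled_var)
```

Reporting `diff`, `ci`, and `cohens_d` next to the p-value conveys how large the effect is, not just whether it is statistically significant.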
<p>Researchers need to apply their scientific judgment about the plausibility of the hypotheses, results of similar studies, proposed mechanisms, proper experimental design, and so on. Expert knowledge transforms statistics from numbers into meaningful, trustworthy findings.</p>
Data AnalysisHypothesis TestingLearningStatisticsStatistics in the NewsWed, 23 Mar 2016 12:00:00 +0000http://blog.minitab.com/blog/adventures-in-statistics/the-american-statistical-associations-statement-on-the-use-of-p-valuesJim FrostDo Actors Wait Longer than Actresses for Oscars? A Comparison Between Academy Award Winners
http://blog.minitab.com/blog/statistics-and-more/do-actors-wait-longer-than-actresses-for-oscars-a-comparison-between-academy-award-winners
<p>I am a bit of an Oscar fanatic. Every year after the ceremony, I religiously go online to find out who won the awards and listen to their acceptance speeches. This year, I was <em>so </em>chuffed to learn that Leonardo DiCaprio won his first Oscar for his performance in <em>The Revenant</em> at the 88th Academy Awards—after five nominations in previous ceremonies. As a longtime DiCaprio fan, I still remember going to the cinema when <em>Titanic </em>was released, and returning four more times. Every time, I could not hold back my tears and used up all the tissues I'd brought with me!<img alt="this year's winner..." src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a51cc79cd412237ef2f241d69e7e83ec/dicaprio.png" style="margin: 10px 15px; float: right; width: 190px; height: 250px;" /></p>
<p>Compared to his <em>Titanic </em>costar Kate Winslet, who won the Best Actress award in 2009 (aged 33), Leonardo waited 7 more years (20 years since his first nomination) before his turn came. I can name several actresses—Gwyneth Paltrow, Hilary Swank, and Jennifer Lawrence come immediately to mind—who obtained the award at younger ages. However, it appears that few young actors have received the Academy Award in recent years. This makes me wonder whether Oscar-winning actors tend to be older than Oscar-winning actresses.</p>
<p>To investigate, I collected data of the dates of past Academy Awards ceremonies and the birthdays of the winning actors and actresses. From these, I calculated the age of the winners on their Oscar-winning night. Below is a screenshot of some of the data.</p>
<p><img alt="oscars data" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ef494be723f2d7d3bb55d8f055124ad1/oscar1.png" style="width: 564px; height: 390px;" /></p>
<p>I used <a href="http://www.minitab.com/products/minitab">Minitab Statistical Software</a> to create a time series plot of the data, shown below.</p>
<p><img alt="time series plot" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2d5d4f84cb67fbe6fd41aee118f40c6a/oscar2.png" style="width: 550px; height: 367px;" /></p>
<p><span style="line-height: 1.6;">The plot suggests that there is usually a substantial age difference between the Best Actor and Best Actress winners. There are more years when the Best Actor winner is much older than the Best Actress winner (blue dots above red dots) than years when the winning actress is older. Some examples:</span></p>
<p style="margin-left: 40px;">1992: Anthony Hopkins (54.2466), Jodie Foster (29.3616)</p>
<p style="margin-left: 40px;">1987: Paul Newman (62.1726), Marlee Matlin (21.5973)</p>
<p style="margin-left: 40px;">1989: Dustin Hoffman (51.6329), Jodie Foster (26.3507)</p>
<p style="margin-left: 40px;">1990: Daniel Day-Lewis (32.9068), Jessica Tandy (80.8000)</p>
<p style="margin-left: 40px;">1998: Jack Nicholson (60.9178), Helen Hunt (34.7699)</p>
<p style="margin-left: 40px;">2011: Colin Firth (50.4658), Natalie Portman (29.7205)</p>
<p style="margin-left: 40px;">2013: Daniel Day-Lewis (55.8247), Jennifer Lawrence (22.5288)</p>
<p><span style="line-height: 1.6;">There are not many occasions when both the Best Actor and Best Actress are in their 30s, 40s, 50s, etc.</span></p>
<p><a href="http://blog.minitab.com/blog/cpammer/planning-a-trip-to-disney-world%3A-using-statistics-to-keep-it-in-the-green">Conditional formatting</a>, introduced in Minitab 17.2, is what I am going to use to identify any repeat winners in the data.</p>
<p style="margin-left: 40px;"><img alt="conditional formatting" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9c70ddd1b5378dd004dac75a6dafaf31/oscar3.png" style="width: 505px; height: 213px;" /></p>
<p>Minitab applies the following conditional formatting to the data set:</p>
<p style="margin-left: 40px;"><img alt="conditional formatting" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/bfff88d02adb86b588b2ee63fc9d41a4/oscar4.png" style="width: 547px; height: 570px;" /></p>
<p>For the Best Actor award, Daniel Day-Lewis received the award on three occasions, while <span style="line-height: 20.8px;">Marlon Brando, Gary Cooper, Tom Hanks, Dustin Hoffman, Fredric March, Jack Nicholson, </span><span style="line-height: 20.8px;">Sean Penn, and Spencer Tracy each</span><span style="line-height: 1.6;"> won the award twice.</span></p>
<p>For the Best Actress category, Katharine Hepburn won four times. <span style="line-height: 20.8px;">Ingrid Bergman, Bette Davis, Olivia de Havilland, Sally Field, Jane Fonda, Jodie Foster, </span><span style="line-height: 20.8px;">Glenda Jackson, Vivien Leigh, Luise Rainer, Meryl Streep, Hilary Swank, and Elizabeth Taylor each</span><span style="line-height: 1.6;"> received the award twice.</span></p>
<p>Winners below the age of 30 could be regarded as obtaining the award at an early stage of their careers. Using the conditional formatting again, I can quickly identify the actors and actresses in the data who are in this group.</p>
<p><img alt="conditional formatting" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a1397cfe27b867b0fae4b1da7271a945/oscar5.png" style="width: 496px; height: 210px;" /></p>
<p><span style="line-height: 1.6;">As shown below, a lot more actresses than actors obtain the award before the age of 30.</span></p>
<p><img alt="conditional formatted data" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2dd1d117449fee623b47ce7c0062bb7d/oscar5a.png" style="width: 649px; height: 432px;" /></p>
<p><span style="line-height: 1.6;">To get a better comparison, I am going to remove the repeats (with the help of the highlighted cells) for actors and actresses who won more than once, taking into account only their age at first win. This gives data from 79 Best Actor and 74 Best Actress winners. I am going to use <a href="http://www.minitab.com/products/minitab/assistant/">the Assistant</a> to carry out a comparison using the 2-sample t-test.</span></p>
<p><img alt="Assistant 2-sample t test" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/eab880dbb38bb33bbfb452693513610f/oscar6.png" style="width: 641px; height: 495px;" /></p>
<p><span style="line-height: 1.6;">Apart from generating easy-to-interpret output, the Assistant also has the advantage of carrying out a powerful t-test even with unequal sample sizes using the Welch approach.</span></p>
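For readers working outside Minitab, the Welch test is one option away in SciPy. The ages below are randomly generated placeholders, not the actual Oscar data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
actors = rng.normal(44, 9, 79)      # placeholder ages, 79 first wins
actresses = rng.normal(35, 11, 74)  # placeholder ages, 74 first wins

# equal_var=False requests Welch's t-test, which tolerates unequal
# variances and unequal sample sizes
res = stats.ttest_ind(actors, actresses, equal_var=False)
```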
<p><img alt="Report Card" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ac73bee979d25f49f4221170e9769956/oscar7.png" style="line-height: 1.6; width: 650px; height: 488px;" /></p>
<p><span style="line-height: 1.6;">The Report Card indicates that we have sufficient data and the assumptions of the t-test are fulfilled. However, Minitab also detects some unusual data, which I will look into further.</span></p>
<p><img alt="2 sample t test diagnostic report" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/384df9dc72ef6868ed050f65d663e1bc/oscar8.png" style="width: 650px; height: 487px;" /></p>
<p><span style="line-height: 1.6;">Using the brush, the following unusual data are identified.</span></p>
<p style="margin-left: 40px;"><strong>Best Actor: </strong><br />
<span style="line-height: 1.6;">John Wayne (62.8658)</span><br />
<span style="line-height: 1.6;">Henry Fonda (76.8685)</span></p>
<p>These winners were considerably older, as the majority of the actor winners are in their 40s and 50s.</p>
<p style="margin-left: 40px;"><strong>Best Actress:</strong><br />
<span style="line-height: 1.6;">Marie Dressler (63.0027)</span><br />
<span style="line-height: 1.6;">Geraldine Page (61.3342)</span><br />
<span style="line-height: 1.6;">Jessica Tandy (80.8000)</span><br />
<span style="line-height: 1.6;">Helen Mirren (61.5863)</span></p>
<p>These winners were considerably older, as the majority of the actress winners were in their late 30s and 40s.</p>
<p><img alt="2-sample t test summary report" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8d4c39828bcad9ffdc0e151771078244/oscar9.png" style="width: 650px; height: 488px;" /></p>
<p><span style="line-height: 1.6;">The Summary Report provides the key output of the t-test. The mean age of the Best Actor winners is 43.746, while the mean age of the Best Actress winners is 35. The p-value of the test is very small (<0.001). This means that we have enough evidence to suggest that, on average, the Best Actor winner is older than the Best Actress winner.</span></p>
<p><span style="line-height: 1.6;">I will leave it to others to speculate (and perhaps even use data to explore) why this apparent age gap exists. However, whatever their ages, we all enjoy seeing these Oscar winners' amazing performances on the big screen!</span></p>
<p><span style="font-size: 8px; line-height: 1.6;">Photograph of Leonardo DiCaprio by <a href="https://www.flickr.com/photos/phototoday2008/11933209533/" target="_blank">See Li</a>, used under Creative Commons 2.0. </span></p>
Fun StatisticsHypothesis TestingStatistics in the NewsMon, 07 Mar 2016 13:00:00 +0000http://blog.minitab.com/blog/statistics-and-more/do-actors-wait-longer-than-actresses-for-oscars-a-comparison-between-academy-award-winnersEugenie ChungHow to Compare Regression Slopes
http://blog.minitab.com/blog/adventures-in-statistics/how-to-compare-regression-lines-between-different-models
<p>If you perform linear regression analysis, you might need to compare different regression lines to see if their constants and slope coefficients are different. Imagine there is an established relationship between X and Y. Now, suppose you want to determine whether that relationship has changed. Perhaps there is a new context, process, or some other qualitative change, and you want to determine whether that affects the relationship between X and Y.</p>
<p>For example, you might want to assess whether the relationship between the height and weight of football players is significantly different than the same relationship in the general population.</p>
<p>You can graph the regression lines to visually compare the slope coefficients and constants. However, you should also statistically test the differences. <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics" target="_blank">Hypothesis testing</a> helps separate the true differences from the random differences caused by sampling error so you can have more confidence in your findings.</p>
<p>In this blog post, I’ll show you how to compare a relationship between different regression models and determine whether the differences are statistically significant. Fortunately, these tests are easy to do using <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">Minitab statistical software</a>.</p>
<p>In the example I’ll use throughout this post, there is an input variable and an output variable for a hypothetical process. We want to compare the relationship between these two variables under two different conditions. Here is the <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/File/569a0e7d067944f6f9147434794efcd6/comparingregressionmodels.MPJ">Minitab project file</a> with the data.</p>
Comparing Constants in Regression Analysis
<p>When the <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-how-to-interpret-the-constant-y-intercept" target="_blank">constants</a> (or y intercepts) in two different regression equations are different, this indicates that the two regression lines are shifted up or down on the Y axis. In the scatterplot below, you can see that the Output from Condition B is consistently higher than Condition A for any given Input value. We want to determine whether this vertical shift is statistically significant.</p>
<p><img alt="Scatterplot with two regression lines that have different constants." src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/2ed27f4204515bac9d9674c16fa0c0f7/scatter_constant_dift.png" style="width: 576px; height: 384px;" /></p>
<p>To test the difference between the constants, we just need to include a <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/introductory-concepts/data-concepts/cat-quan-variable/" target="_blank">categorical variable</a> that identifies the qualitative attribute of interest in the model. For our example, I have created a variable for the condition (A or B) associated with each observation.</p>
<p>To fit the model in Minitab, I’ll use: <strong>Stat > Regression > Regression > Fit Regression Model</strong>. I’ll include <em>Output</em> as the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/regression-and-correlation/regression-models/what-are-response-and-predictor-variables/" target="_blank">response variable</a>, <em>Input</em> as the continuous <a href="http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/regression-and-correlation/regression-models/what-are-response-and-predictor-variables/" target="_blank">predictor</a>, and <em>Condition</em> as the categorical predictor.</p>
<p>In the regression analysis output, we’ll first check the coefficients table.</p>
<p style="margin-left: 40px;"><img alt="Coefficients table that shows that the constants are different" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/23657868f2cf893d216d05d3400ab9e6/coeff_constant_dift.png" style="width: 369px; height: 117px;" /></p>
<p>This table shows us that the relationship between Input and Output is statistically significant because the p-value for Input is 0.000.</p>
<p>The <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-interpret-regression-analysis-results-p-values-and-coefficients" target="_blank">coefficient</a> for Condition is 10 and its <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-interpret-regression-analysis-results-p-values-and-coefficients" target="_blank">p-value</a> is significant (0.000). The coefficient tells us that the vertical distance between the two regression lines in the scatterplot is 10 units of Output. The p-value tells us that this difference is statistically significant—you can reject the null hypothesis that the distance between the two constants is zero. You can also see the difference between the two constants in the regression equation table below.</p>
<p style="margin-left: 40px;"><img alt="Regression equation table that shows constants that are different" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/a879996e37ebb05a297721e695a71943/equ_constant_dift.png" style="width: 305px; height: 113px;" /></p>
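The same model can be fit outside Minitab. Here is a sketch using Python's statsmodels (assumed to be installed), with simulated data in which Condition B shifts the line up by 10 units:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: same slope, Condition B shifted up by 10 units
rng = np.random.default_rng(0)
n = 50
inp = np.tile(np.linspace(0, 10, n), 2)
cond = np.repeat(["A", "B"], n)
out = 2.0 * inp + np.where(cond == "B", 10.0, 0.0) + rng.normal(0, 1, 2 * n)

df = pd.DataFrame({"Output": out, "Input": inp, "Condition": cond})

# C(Condition) adds the categorical term; its coefficient estimates the
# vertical shift between the two regression lines
model = smf.ols("Output ~ Input + C(Condition)", data=df).fit()
shift = model.params["C(Condition)[T.B]"]  # estimated shift, near 10
```

A significant p-value for the `C(Condition)[T.B]` term plays the same role as the Condition p-value in the Minitab coefficients table.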
Comparing Coefficients in Regression Analysis
<p>When two slope coefficients are different, a one-unit change in a predictor is associated with different mean changes in the response. In the scatterplot below, it appears that a one-unit increase in Input is associated with a greater increase in Output in Condition B than in Condition A. We can <em>see</em> that the slopes look different, but we want to be sure this difference is statistically significant.</p>
<p><img alt="Scatterplot that shows two slopes that are different" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/200c12087fdf7eecd9b773d9ce213020/scatter_slope_dift.png" style="width: 576px; height: 384px;" /></p>
<p>How do you statistically test the difference between regression coefficients? It sounds like it might be complicated, but it is actually very simple. We can even use the same Condition variable that we did for testing the constants.</p>
<p>We need to determine whether the coefficient for Input depends on the Condition. In statistics, when we say that the effect of one variable depends on another variable, that’s an interaction effect. All we need to do is include the interaction term for Input*Condition!</p>
<p>In Minitab, you can specify interaction terms by clicking the <strong>Model</strong> button in the main regression dialog box. After I fit the regression model with the interaction term, we obtain the following coefficients table:</p>
<p style="margin-left: 40px;"><img alt="Coefficients table that shows different slopes" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/f06eff56f2266d0ff7e3919aa1292285/coeff_slope_dift.png" style="width: 410px; height: 154px;" /></p>
<p>The table shows us that the interaction term (Input*Condition) is statistically significant (p = 0.000). Consequently, we reject the null hypothesis and conclude that the difference between the two coefficients for Input (below, 1.5359 and 2.0050) does not equal zero. We also see that the main effect of Condition is not significant (p = 0.093), which indicates that difference between the two constants is not statistically significant.</p>
<p style="margin-left: 40px;"><img alt="Regression equation table that shows different slopes" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/d5e5142c0ff13645d1dacc3e2c0bee27/equ_coeff_dift.png" style="width: 295px; height: 105px;" /></p>
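The interaction model can be sketched the same way in Python's statsmodels (assumed to be installed). The data below are simulated with slopes of 1.5 and 2.0 for the two conditions:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: slope 1.5 under Condition A, slope 2.0 under Condition B
rng = np.random.default_rng(1)
n = 50
inp = np.tile(np.linspace(0, 10, n), 2)
cond = np.repeat(["A", "B"], n)
slope = np.where(cond == "B", 2.0, 1.5)
out = slope * inp + rng.normal(0, 1, 2 * n)

df = pd.DataFrame({"Output": out, "Input": inp, "Condition": cond})

# The '*' operator expands to Input + C(Condition) + the interaction term;
# the interaction coefficient estimates the difference between the slopes
model = smf.ols("Output ~ Input * C(Condition)", data=df).fit()
slope_diff = model.params["Input:C(Condition)[T.B]"]  # near 0.5
```

A significant p-value for the `Input:C(Condition)[T.B]` term is the evidence that the slopes differ, mirroring the Input*Condition row in the Minitab output.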
<p>It is easy to compare and test the differences between the constants and coefficients in regression models by including a categorical variable. These tests are useful when you can see differences between regression models and you want to defend your conclusions with p-values.</p>
<p>If you're learning about regression, read my <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-tutorial-and-examples">regression tutorial</a>!</p>
Data AnalysisHypothesis TestingRegression AnalysisStatistics HelpWed, 13 Jan 2016 13:00:00 +0000http://blog.minitab.com/blog/adventures-in-statistics/how-to-compare-regression-lines-between-different-modelsJim Frost