Data Analysis Software | Minitab
Blog posts and articles with tips for using statistical software to analyze data for quality improvement.
http://blog.minitab.com/blog/data-analysis-software/rss
Mon, 02 May 2016 05:51:21 +0000
What's a Moving Range, and How Is It Calculated?
http://blog.minitab.com/blog/marilyn-wheatleys-blog/whats-a-moving-range-and-how-is-it-calculated
<p>We often receive questions about moving ranges because they're used in various tools in our <a href="http://www.minitab.com/products/minitab">statistical software</a>, including control charts and capability analysis when data is not collected in subgroups. In this post, I'll explain what a moving range is, and how a moving range and average moving range are calculated.</p>
<p>A moving range measures how variation changes over time when data are collected as individual measurements rather than in subgroups.</p>
<p>If we collect individual measurements and need to plot the data on a control chart, or assess the capability of a process, we need a way to estimate the variation over time. But when we have individual observations, there are no subgroups, so we cannot calculate a standard deviation for each subgroup. In such cases, the average moving range across all consecutive observations is an alternative way to estimate process variation.</p>
<p>Consider the 10 random data points plotted in the graph below:</p>
<p style="margin-left: 40px;"><img height="369" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/7b447a0adb4a6e3a23fee5a34ab07563/7b447a0adb4a6e3a23fee5a34ab07563.png" width="624" /></p>
<p>A moving range is the distance or difference between consecutive points. For example, MR1 in the graph below represents the first moving range, MR2 represents the second moving range, and so forth:</p>
<p style="margin-left: 40px;"><img height="414" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/041539e9131ddbfb6cae7517ec190ab8/041539e9131ddbfb6cae7517ec190ab8.png" width="624" /></p>
<p>The difference between the first and second points (MR1) is 0.704, and that’s a positive number since the first point has a lower value than the second. The second moving range, MR2, is the difference between the second point (21.0494) and the third (19.6375), and that’s a negative number (-1.4119), since the third point has a lower value than the second. If we continue that way, we’ll have 9 moving ranges for our 10 data points.</p>
<p>In Minitab, a moving range is easy to compute by "lagging" the data. Continuing the example with the 10 data points above, I can use <strong>Stat</strong> > <strong>Time Series</strong> > <strong>Lag</strong>, and then complete the dialog box as shown below:</p>
<p style="margin-left: 40px;"><img alt="a" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/2b125f53827fb9cc7aec8b2a300845a7/capture.PNG" style="width: 557px; height: 330px;" /></p>
<p>Clicking <strong>OK</strong> in the dialog above will shift the data in C1 down by one row and store the results in C4. Now we can use <strong>Calc</strong> > <strong>Calculator</strong> to subtract C4 from C1 and calculate all the moving ranges:</p>
<p style="margin-left: 40px;"><img alt="b" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/070834223bef3007c9621c940ff3a195/capture.PNG" style="width: 563px; height: 380px;" /></p>
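<p>Outside Minitab, the same lag-and-subtract idea can be sketched in a few lines of Python. This is only an illustration: the post states just the second and third data values (21.0494 and 19.6375), and the first value used here (20.3454) is inferred from the stated difference of 0.704. The list and the helper names <code>lag</code> and <code>signed_moving_ranges</code> are therefore assumptions, not the actual worksheet.</p>

```python
# First three points only; 20.3454 is inferred from MR1 = 0.704 (assumption).
data = [20.3454, 21.0494, 19.6375]


def lag(values, k=1):
    """Shift values down by k rows, padding the top with None (missing)."""
    return [None] * k + values[:len(values) - k]


def signed_moving_ranges(values):
    """Subtract the lagged column from the original, skipping missing rows."""
    lagged = lag(values)
    return [cur - prev for cur, prev in zip(values, lagged) if prev is not None]


mrs = signed_moving_ranges(data)
print([round(mr, 4) for mr in mrs])
```

<p>The first difference comes out positive and the second negative, matching the signs described for MR1 and MR2.</p>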
<p>To calculate the average moving range, we need to use the absolute value of the moving ranges we calculated above. We’ll take a look at how to do that later. </p>
<p>When Minitab calculates the average of a moving range, the calculation also includes an <a href="http://support.minitab.com/en-us/minitab/17/topic-library/quality-tools/capability-analyses/data-and-data-assumptions/unbiasing-constants/">unbiasing constant</a>. The formula used to calculate the moving range is:</p>
<p style="margin-left: 40px;"><img alt="equation" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/a5a46a4ff68b1425bbd155792d20a701/a5a46a4ff68b1425bbd155792d20a701.png" style="border-width: 0px; border-style: solid; width: 624px; height: 140px;" /></p>
<p>The table of unbiasing constants is available within Minitab and <a href="http://support.minitab.com/en-us/minitab-express/1/help-and-how-to/control-charts/how-to/variables-data-in-subgroups/xbar-r-chart/methods-and-formulas/unbiasing-constants-d2-d3-and-d4/">on this page</a>.</p>
<p>We’ve already done most of the work. To finish, we’ll find the right value of d2 in the table linked above, and use Minitab’s calculator to get the answer. We need the value of d2 that corresponds to a moving range of length 2 (that’s the number of points in each moving range calculation, but don’t worry, I’ll explain more about the length of the moving range later):</p>
<p style="margin-left: 40px;"><img border="0" height="179" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/2caa9e4eec046f281a834976260d3f8c/2caa9e4eec046f281a834976260d3f8c.png" width="173" /></p>
<p>Now back to Minitab, and we can use <strong>Calc</strong> > <strong>Calculator</strong> to get our answer:</p>
<p style="margin-left: 40px;"><img alt="c" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/f3eaf58a9007d6420c44b559206206eb/capture.PNG" style="width: 604px; height: 386px;" /></p>
<p>Using the formula above, we’re telling Minitab to use the absolute values (ABS calculator command) in C5 to calculate the mean, and then divide that by our unbiasing constant value of 1.128.</p>
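<p>As a rough cross-check of that calculator step, here is a minimal Python sketch: the mean of the absolute moving ranges divided by the unbiasing constant 1.128. It assumes only the three data values recoverable from the text (the first one inferred from the stated 0.704 difference), so it will not reproduce the result for the full 10-point worksheet. The function name is hypothetical.</p>

```python
from statistics import mean

D2_LENGTH_2 = 1.128  # unbiasing constant for a moving range of length 2


def estimate_from_moving_ranges(values, d2=D2_LENGTH_2):
    """Mean of the absolute consecutive differences, divided by d2."""
    abs_mrs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(abs_mrs) / d2


# Three points only; 20.3454 is inferred from MR1 = 0.704 (assumption).
data = [20.3454, 21.0494, 19.6375]
estimate = estimate_from_moving_ranges(data)
print(round(estimate, 4))
```

<p>Taking absolute values first matters: averaging the signed differences would let positive and negative moving ranges cancel each other out.</p>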
<p>Now to check our results against Minitab, we can use <strong>Stat </strong>> <strong>Control Charts</strong> > <strong>Variables Charts for Individuals</strong> > <strong>I-MR</strong> and enter our original data column:</p>
<p style="margin-left: 40px;"><img border="0" height="334" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/0c80b992ef94f8d021aa1ebfc5bbc594/0c80b992ef94f8d021aa1ebfc5bbc594.png" width="507" /></p>
<p>Next, choose <strong>I-MR Options</strong> > <strong>Storage</strong>, and check the box next to <strong>Standard deviations</strong>, then click <strong>OK</strong> in each dialog box:</p>
<p style="margin-left: 40px;"><img alt="d" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/c4b545c37882980e3f690ad046f63626/capture.PNG" style="width: 582px; height: 440px;" /></p>
<p>The results show the same average moving range value we calculated, <strong>0.602627</strong>. </p>
<p>In this case, because we used a moving range of length 2, the average moving range gives us an estimate of the average distance between our consecutive individual data points. A moving range of length 2 is Minitab’s default, but that can be changed by clicking the <strong>I-MR Options</strong> button in the I-MR chart dialog, and then choosing the <strong>Estimate</strong> tab:</p>
<p style="margin-left: 40px;"><img border="0" height="438" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/3e03c57905dc63ff5be0971285a4d518/3e03c57905dc63ff5be0971285a4d518.png" width="442" /></p>
<p>Here we can type in a different value (let’s use 3 as an example), and Minitab will use that number of points to estimate the moving ranges. If we did that for the calculations above, we’d have to make two adjustments:</p>
<ol>
<li>
<p>We’d need to choose the correct value for the unbiasing constant, d2, that corresponds with a moving range length of 3:</p>
<p><img alt="t" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/94764a32eec04329f8dfdd4d73219214/capture.PNG" style="width: 173px; height: 182px;" /></p>
</li>
<li>We’d have to adjust the number of points used for our moving ranges from 2 to 3. Using the same random data as before:</li>
</ol>
<p style="margin-left: 40px;"><img border="0" height="248" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/bf32b968b0788bc21e03920397ccefe4/bf32b968b0788bc21e03920397ccefe4.png" width="71" /></p>
<p style="margin-left: 40px;">With three data points, we’ll use just the highest and the lowest values from the first 3 rows, so MR1 will be 21.0494 – 19.6375 = 1.4119.</p>
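<p>The general rule, highest minus lowest within each window of consecutive points, can be sketched in Python as follows. The data list again holds only the three values recoverable from the text (the first is inferred from the 0.704 difference). The 1.693 entry is the standard tabulated d2 value for length 3 and should be checked against the table linked above. The function name is an assumption.</p>

```python
# Standard d2 constants for moving ranges of length 2 and 3.
D2 = {2: 1.128, 3: 1.693}


def moving_ranges(values, length=2):
    """Max minus min over each window of `length` consecutive points."""
    windows = [values[i:i + length] for i in range(len(values) - length + 1)]
    return [max(w) - min(w) for w in windows]


# Three points only; 20.3454 is inferred from MR1 = 0.704 (assumption).
data = [20.3454, 21.0494, 19.6375]

mr3 = moving_ranges(data, length=3)
print([round(mr, 4) for mr in mr3])
```

<p>With a length of 2, the same function returns the absolute differences between consecutive points, so one helper covers both cases.</p>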
<p><span style="line-height: 1.6;">If you’ve enjoyed this post, check out some of our other blog </span><a href="http://blog.minitab.com/blog/control-charts" style="line-height: 1.6;">posts about control charts</a><span style="line-height: 1.6;">.</span></p>
Fri, 29 Apr 2016 12:00:00 +0000
http://blog.minitab.com/blog/marilyn-wheatleys-blog/whats-a-moving-range-and-how-is-it-calculated
Marilyn Wheatley
Manipulating Your Survey Data in Minitab
http://blog.minitab.com/blog/statistics-and-quality/manipulating-your-survey-data-in-minitab
<p>As a recent graduate from Arizona State University with a degree in Business Statistics, I had the opportunity to work with students from different areas of study and help analyze data from various projects for them.</p>
<p><img alt="survey symbol" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3b2a7f4c85707a09177d3da12dbaa009/online_survey_icon_or_logo_svg.png" style="margin: 10px 15px; float: right; width: 300px; height: 300px;" />One particular group asked for help analyzing online survey data they had gathered from other students, and they wanted to see if their new student program was beneficial. I would describe this request as them giving us a "pile of data" and saying, "Tell us what you can find out." </p>
<p>There were numerous problems with this "pile of data" because it wasn't organized, in part because of the way the survey itself was set up. (Our statistics professor later told us that she asked this group to come in because she'd looked at their data before they presented it to us and she wanted to see how we would perform with a "real-world" situation.)</p>
<p>Unfortunately, the statistics department didn't have a time machine that would enable us to go back and set up the survey to have better data that was more organized (I guess if we <em>did </em>have a time machine there would be no need for predictive analytics), but we did have <a href="http://www.minitab.com/products/minitab/">Minitab and its tools</a> to help with the importing of data, reviewing the data, and putting it in a format that is best for analyzing. </p>
<p>So let’s assume you have a pile of survey data that is:</p>
<ul>
<li>Unbiased</li>
<li>Taken from a random sample</li>
<li>Taken from the appropriate audience</li>
<li>Collected from enough respondents</li>
</ul>
<p><span style="line-height: 1.6;">Many online survey tools allow you to download your data to a .csv or Excel file, which would be perfect to <span>import into Minitab</span>. </span></p>
<p><span style="line-height: 1.6;">In fact, Minitab 17.3 introduced a new dialog box that shows you the data before it is opened, so you can modify the data type, include or exclude certain columns, and see how many rows the data contain. Within the options of that same dialog box, you can choose what is done with missing data points and missing data rows. All of these new functions let you bring a "pile of data" into Minitab a little cleaner and with less headache.</span></p>
<p style="margin-left: 40px;"><img alt="open survey data dialog" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/b51a0c86-e2dd-456e-878a-4196c7381c3a/File/c5319276614d905f12f38eca2f3a6343/c5319276614d905f12f38eca2f3a6343.png" style="width: 669px; height: 570px;" /> </p>
<p><span style="line-height: 1.6;">Once the data is in Minitab, reviewing it is essential to uncover any irregularities that may be hiding in the data before analysis. The information icon in the Project Manager bar shows each column's name, column ID, row count, number of missing data points, and data type. This lets you quickly scan the columns and confirm that the online data came in correctly by checking the row count, any missing-data irregularities, and the data type. </span></p>
<p style="margin-left: 40px;"><img alt="data" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/b51a0c86-e2dd-456e-878a-4196c7381c3a/File/637ee7794419e3ad489f4a98c96cbc3c/637ee7794419e3ad489f4a98c96cbc3c.png" style="width: 396px; height: 342px;" /></p>
<p>Minitab also has numerous tools to format the data before analysis, including <a href="http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-datas-a-mess2c-part-3">coding, sorting and splitting worksheets</a>. </p>
<p>For example, occasionally survey data will use “0” in the place of a non-response. This can be a problem because any data analysis will make this a data point when it probably shouldn't be. Minitab can find those “0”s and replace them with missing data to remove them from your worksheet so they won't throw off your analysis (<strong>Editor > Find and Replace > Replace</strong>).</p>
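<p>To see why this matters, here is a small Python sketch of the same idea. The survey answers are made up for illustration (a 1 to 5 scale where 0 marks a non-response), and <code>None</code> stands in for Minitab's missing-value symbol.</p>

```python
from statistics import mean

# Hypothetical answers on a 1-5 scale; 0 marks a non-response.
responses = [4, 0, 5, 3, 0, 2]

# Recode every 0 as missing, then analyze only the real answers.
cleaned = [None if r == 0 else r for r in responses]
answered = [r for r in cleaned if r is not None]

print(mean(responses))  # zeros treated as data pull the mean down
print(mean(answered))   # mean of the actual answers only
```

<p>The recode is only safe when 0 is not a legitimate answer on the survey's scale, so it is worth checking the questionnaire first.</p>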
<p>Before analysis you can also sort your data (<strong>Data > Sort</strong>) by choosing the column you would like to sort by, and you can create a new worksheet from the sorted data. I also really like the Split and Subset Worksheet options, which help when you have a lot of data and it would be easier to analyze smaller sections of it (<strong>Data > Split Worksheet</strong> and <strong>Data > Subset Worksheet</strong>).</p>
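<p>For readers who think in code, sorting, splitting, and subsetting can be sketched in Python like this. The rows and column names (<code>year</code>, <code>score</code>) are hypothetical, and the sketch shows the concepts only, not how Minitab implements them.</p>

```python
from collections import defaultdict

# Hypothetical survey rows.
rows = [
    {"year": "Senior", "score": 4},
    {"year": "Freshman", "score": 2},
    {"year": "Senior", "score": 5},
    {"year": "Freshman", "score": 3},
]

# Sort: a new list ordered by one column; the original stays untouched.
sorted_rows = sorted(rows, key=lambda r: r["score"])

# Split: one group of rows per distinct value of a column.
split = defaultdict(list)
for r in rows:
    split[r["year"]].append(r)

# Subset: keep only the rows that meet a condition.
subset = [r for r in rows if r["score"] >= 4]

print([r["score"] for r in sorted_rows], sorted(split), len(subset))
```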
<p>These are just a few tools that allow you to import data and then prepare the data without having to go back and forth between your spreadsheet software and statistical software. So when you have someone drop off a "pile of data," see how you can use your Minitab tools to shovel through and find the gems that are lying beneath the surface.</p>
Data Analysis, Statistics
Tue, 26 Apr 2016 12:00:00 +0000
http://blog.minitab.com/blog/statistics-and-quality/manipulating-your-survey-data-in-minitab
Joseph Hartsock
3 Tips for Importing Excel Data into Minitab
http://blog.minitab.com/blog/michelle-paret/3-tips-for-importing-excel-data-into-minitab
<p>Getting your data from Excel into <a href="http://www.minitab.com/products/minitab/">Minitab Statistical Software</a> for analysis is easy, especially if you keep the following tips in mind.</p>
Copy and Paste
<p><span style="line-height: 20.8px;">To paste into Minitab, you can either right-click in the worksheet and choose </span><strong style="line-height: 20.8px;">Paste Cells</strong><span style="line-height: 20.8px;"> or you can use </span><strong style="line-height: 20.8px;">Control-V</strong><span style="line-height: 20.8px;">. </span>Minitab allows for 1 row of column headers, so if you have a single row of column info (or no column header info), then you can quickly copy and paste an entire sheet at once. However, if you have multiple rows of descriptive text at the top of your Excel file, then use the following steps:</p>
<p><em> Step 1</em> - Choose a single row for your column headers and paste it into Minitab. </p>
<p><em> Step 2</em> - Go back to your Excel file to copy all of the actual data over.</p>
<p>And if you have any summary info at the end of your Excel file, you'll want to exclude that too, just like any extraneous column header info.</p>
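<p>The same "one header row, data only" rule can be sketched in Python for a file exported as text. The sample content, the row positions, and the constant names below are all hypothetical; a real file would need its own values.</p>

```python
import csv
import io

# Hypothetical export: two descriptive lines, one header row,
# two data rows, and a summary row at the end.
raw = """Quarterly export
Generated by a hypothetical tool
Part,Length
A,10.2
B,9.8
Total,20.0
"""

lines = list(csv.reader(io.StringIO(raw)))

HEADER_ROW = 2    # index of the single row kept as column names
SUMMARY_ROWS = 1  # number of trailing summary rows to leave out

header = lines[HEADER_ROW]
data = lines[HEADER_ROW + 1:len(lines) - SUMMARY_ROWS]

print(header)
print(data)
```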
<p><img alt="Excel to Minitab" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/951006fe8ebf8bfde86486660018fbe0/excel_to_mtb.jpg" style="width: 650px; height: 379px;" /></p>
Importing Lots of Data
<p><img alt="File Open dialog" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/75e6b833214b1e9cbda4e6056a2fde43/file_open_menu.jpg" style="line-height: 20.8px; width: 253px; height: 359px; margin: 10px 15px; float: right;" /></p>
<p>Copy/paste is ideal when you have only a few Excel sheets. But what if you have lots of <span style="line-height: 1.6;">sheets? In this case, try using </span><strong style="line-height: 1.6;">File > Open</strong><span style="line-height: 1.6;">. Another advantage of </span><strong style="line-height: 1.6;">File > Open</strong><span style="line-height: 1.6;"> is the additional import options, should you need them. For example, you can specify which sheets </span><span style="line-height: 1.6;">and rows to include. And there are even options to handle messy data issues, such as case mismatches and </span><a href="http://blog.minitab.com/blog/michelle-paret/how-to-remove-leading-or-trailing-spaces-from-a-data-set" style="line-height: 1.6;">leading and trailing spaces</a><span style="line-height: 1.6;">.</span></p>
<div>
Fixing Column Formats
<p>Minitab has 3 column formats: numeric, text, and date/time. Text columns are noted with a <strong>-T</strong> and date/time columns are noted with a <strong>-D</strong>, while numeric columns appear without such an indicator. Why does column format matter? It matters because certain graphs and analyses are only available for certain formats. For example, if you want to create a time series plot, Minitab will not allow you to use a text column. If you bring data over from Excel and the format does not reflect the type of data in a given column, just right-click in the column and choose <strong>Format Column</strong> to select the right type, such as <strong>Automatic numeric</strong>.</p>
<p><img alt="column formats" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/350de8d0fc91e01d485bc1f124a28148/column_format.jpg" style="width: 645px; height: 332px;" /></p>
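<p>Conceptually, changing a text column to numeric means checking that every non-blank cell parses as a number. A minimal Python sketch of that check follows. The function names are hypothetical, and the sketch does not claim to mirror how <strong>Format Column</strong> works internally.</p>

```python
def to_numeric_column(column):
    """Convert text cells to floats; blank cells become None (missing)."""
    converted = []
    for cell in column:
        text = cell.strip()
        converted.append(float(text) if text else None)
    return converted


def looks_numeric(column):
    """True if every non-blank cell can be read as a number."""
    try:
        to_numeric_column(column)
    except ValueError:
        return False
    return True


print(looks_numeric(["1.5", " 2", ""]))  # blanks and stray spaces are fine
print(looks_numeric(["1.5", "n/a"]))     # a text entry blocks the conversion
```

<p>A single stray text entry such as "n/a" is often the reason an otherwise numeric column arrives as text.</p>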
<p><span style="line-height: 1.6;">Once you import your data and it's properly formatted, you can then use the </span><strong style="line-height: 1.6;">Stat</strong><span style="line-height: 1.6;">, </span><strong style="line-height: 1.6;">Graph</strong><span style="line-height: 1.6;">, and </span><strong style="line-height: 1.6;">Assistant</strong><span style="line-height: 1.6;"> menus to start analyzing it. And if you need help running a particular analysis, just </span><a href="http://www.minitab.com/contact-us" style="line-height: 1.6;">contact Minitab Technical Support</a><span style="line-height: 1.6;">. This outstanding service is free and is staffed with statisticians, so don't hesitate to give them a call.</span></p>
</div>
Data Analysis
Fri, 22 Apr 2016 12:00:00 +0000
http://blog.minitab.com/blog/michelle-paret/3-tips-for-importing-excel-data-into-minitab
Michelle Paret
Best Way to Analyze Likert Item Data: Two Sample T-Test versus Mann-Whitney
http://blog.minitab.com/blog/adventures-in-statistics/best-way-to-analyze-likert-item-data%3A-two-sample-t-test-versus-mann-whitney
<p><img alt="Worksheet that shows Likert data" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/6b1cf78b969699ed58febb026d32051d/likert_worksheet.png" style="float: right; width: 162px; height: 265px; margin: 10px 15px;" />Five-point Likert scales are commonly associated with surveys and are used in a wide variety of settings. You’ve run into the Likert scale if you’ve ever been asked whether you strongly agree, agree, neither agree nor disagree, disagree, or strongly disagree about something. The worksheet to the right shows what five-point Likert data look like when you have two groups.</p>
<p>Because Likert item data are discrete, ordinal, and have a limited range, there’s been a longstanding dispute about the most valid way to analyze Likert data. The basic choice is between <a href="http://blog.minitab.com/blog/adventures-in-statistics/choosing-between-a-nonparametric-test-and-a-parametric-test" target="_blank">a parametric test and a nonparametric test</a>. The pros and cons for each type of test are generally described as the following:</p>
<ul>
<li>Parametric tests, such as the 2-sample t-test, assume a normal, continuous distribution. However, with a sufficient sample size, t-tests are robust to departures from normality.</li>
<li>Nonparametric tests, such as the Mann-Whitney test, do not assume a normal or a continuous distribution. However, there are concerns about a lower ability to detect a difference when one truly exists.</li>
</ul>
<p>What’s the better choice? This is a real-world decision that users of <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">statistical software</a> have to make when they want to analyze Likert data.</p>
<p>Over the years, a number of studies have tried to answer this question. However, they’ve tended to look at a limited number of potential distributions for the Likert data, which causes the generalizability of the results to suffer. Thanks to increases in computing power, simulation studies can now thoroughly assess a wide range of distributions.</p>
<p>In this blog post, I highlight a simulation study conducted by de Winter and Dodou* that compares the capabilities of the two sample t-test and the Mann-Whitney test to analyze five-point Likert items for two groups. Is it better to use one analysis or the other?</p>
<p>The researchers identified a diverse set of 14 distributions that are representative of actual Likert data. The computer program drew independent pairs of samples to test all possible combinations of the 14 distributions. All in all, 10,000 random samples were generated for each of the 98 distribution combinations! The pairs of samples were analyzed using both the two sample t-test and the Mann-Whitney test to compare how well each test performs. The study also assessed different sample sizes.</p>
<p>The results show that for all pairs of distributions the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/basics/type-i-and-type-ii-error/" target="_blank">Type I (false positive) error rates</a> are very close to the target amounts. In other words, if you use either analysis and your results are statistically significant, you don’t need to be overly concerned about a false positive.</p>
<p>The results also show that for most pairs of distributions, the difference between the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/power-and-sample-size/what-is-power/" target="_blank">statistical power</a> of the two tests is trivial. In other words, if a difference truly exists at the population level, either analysis is equally likely to detect it. The concerns about the Mann-Whitney test having less power in this context appear to be unfounded.</p>
<p>I do have one caveat. There are a few pairs of specific distributions where there is a power difference between the two tests. If you perform both tests on the same data and they disagree (one is significant and the other is not), you can look at a table in the article to help you determine whether a difference in statistical power might be an issue. This power difference affects only a small minority of the cases.</p>
<p>Generally speaking, the choice between the two analyses is a tie. If you need to compare two groups of five-point Likert data, it usually doesn’t matter which analysis you use. Both tests almost always provide the same protection against false negatives and always provide the same protection against false positives. These patterns hold true for sample sizes of 10, 30, and 200 per group.</p>
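<p>To make the comparison concrete, the sketch below computes the test statistic behind each analysis in plain Python. It rests on several assumptions: the two groups of Likert responses are made up, the t statistic uses the unequal-variance (Welch) form, and p-values are left out because they need distribution tables or a statistics package. It shows what each test measures and is not a reproduction of the study.</p>

```python
from math import sqrt
from statistics import mean, variance


def welch_t_statistic(a, b):
    """Difference in means divided by its estimated standard error."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def mann_whitney_u(a, b):
    """Count pairs where a value from `a` beats one from `b`; ties count 0.5."""
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in a
        for y in b
    )


# Hypothetical five-point Likert responses for two groups.
group_a = [4, 5, 3, 4, 5, 4]
group_b = [2, 3, 3, 1, 4, 2]

print(round(welch_t_statistic(group_a, group_b), 3))
print(mann_whitney_u(group_a, group_b), "out of", len(group_a) * len(group_b))
```

<p>The t statistic works with the means and variances, while U uses only the ordering of the responses. That is why the Mann-Whitney test does not need a normal or continuous distribution.</p>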
<p>*de Winter, J.C.F. and D. Dodou (2010), Five-Point Likert Items: t test versus Mann-Whitney-Wilcoxon, <em>Practical Assessment, Research and Evaluation</em>, 15(11).</p>
Data Analysis, Hypothesis Testing, Statistics, Statistics Help
Wed, 06 Apr 2016 12:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics/best-way-to-analyze-likert-item-data%3A-two-sample-t-test-versus-mann-whitney
Jim Frost
Are You Putting the Data Cart Before the Horse? Best Practices for Prepping Data for Analysis, ...
http://blog.minitab.com/blog/meredith-griffith/are-you-putting-the-data-cart-before-the-horse-best-practices-for-prepping-data-for-analysis%2C-part-1
<p>Most of us have heard a backwards way of completing a task, or doing something in the conventionally wrong order, described as “putting the cart before the horse.” That’s because a horse pulling a cart is much more efficient than a horse pushing a cart.</p>
<p><img alt="cart before horse" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ec1fbea4785510ea0e0a9997c1669c68/cart_horse.png" style="margin: 10px 15px; float: right; width: 350px; height: 206px;" />This saying may be especially true in the world of statistics. Focusing on a statistical tool or analysis before checking out the condition of your data is one way you may be putting the cart before the horse. You may then find yourself trying to force your data to fit an analysis, particularly when the data has not been set up properly. It’s far more efficient to first make sure your <a href="http://blog.minitab.com/blog/understanding-statistics/the-single-most-important-question-in-every-statistical-analysis">data are reliable</a> and then allow your questions of interest to guide you to the right analysis.</p>
<p>Spending a little quality time with your data up front can save you from wasting a lot of time on an analysis that either can’t work—or can’t be trusted.</p>
<p>As a quality practitioner, you’re likely to be involved in many activities—establishing quality requirements for external suppliers, monitoring product quality, reviewing product specifications and ensuring they are met, improving process efficiency, and much more.</p>
<p>All of these tasks will involve data collection and statistical analysis with software such as Minitab. For example, suppose you need to perform a <a href="http://blog.minitab.com/blog/meredith-griffith/fundamentals-of-gage-rr">Gage R&R</a> study to verify your measurement systems are valid, or you need to understand how machine failures impact downtime.</p>
<p>Rather than jumping right into the analysis, you will be at an advantage if you take time to look at your data. Ask yourself questions such as:</p>
<ul>
<li>What problem am I trying to solve?</li>
<li>Is my data set up in a way that will be useful to answering my question?</li>
<li>Did I make any mistakes while recording my data?</li>
</ul>
<p>Utilizing process knowledge can also help you answer questions about your data and identify data entry errors. A focus on preparing and exploring your data prior to an analysis will not only save you time in the long run, but will help you obtain reliable results.</p>
<p>So then, where to begin with best practices for prepping data for an analysis? Let’s look no further than your data.</p>
Clean your data before you analyze it
<p>Let’s assume you already know what problem you’re trying to solve with your data. For instance, you are the area supervisor of a manufacturing facility, and you’ve been experiencing lower productivity than usual on the machines in your area and want to understand why. You have collected data on these machines, recording the amount of time a machine was out of operation, the reason for the machine being down, the shift number when the machine went down, and the speed of the machine when it went down.</p>
<p>The first step toward answering your question is to ensure your data are clean. Cleaning your data before you begin an analysis can save time by preventing rework, such as reformatting data or correcting data entry errors, after you’ve already begun the analysis. Data cleaning is also essential to ensure your analyses and results—and the decisions you make—are reliable.</p>
<p>With the <a href="https://www.minitab.com/en-us/support/minitab/minitab-17.3.1-update/" style="line-height: 20.8px;">latest update to Minitab 17</a><span style="line-height: 20.8px;">, an improved data import helps you identify and correct case mismatches, fix improperly formatted columns, represent missing data accurately and in a manner that is recognized by the software, remove blank rows and extra spaces, and more. When importing your data, you see a preview of your data as a reminder to ensure it’s in the best possible state before it finds its way into Minitab. This preview helps you spot mistakes you have made in your data collection, and automatically corrects mistakes you don’t notice or that are difficult to find in large data sets.</span></p>
<p><img alt="Data Import" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/dae6c7b7-fc22-4616-9d65-f04909c20ab1/Image/b1c679056c60ac2fa82f37e1f1de406b/data_import.jpg" style="width: 775px; height: 655px;" /></p>
<p><em>Minitab offers a data import dialog that helps you quickly clean and format your data before importing into the software, ensuring your data are trustworthy and allowing you to get to your analysis sooner.</em></p>
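<p>A case mismatch is easy to picture in code. The Python sketch below groups text entries that differ only by capitalization or surrounding spaces. The "reason for downtime" values are hypothetical, and the approach is only one plausible way to flag such entries, not a description of Minitab's import logic.</p>

```python
from collections import defaultdict

# Hypothetical "reason for downtime" entries typed in by different operators.
reasons = ["Jam", "jam", "Power", "JAM ", "power", "Setup"]

# Group the raw spellings under a normalized key
# (surrounding spaces removed, case ignored).
variants = defaultdict(set)
for value in reasons:
    variants[value.strip().casefold()].add(value)

# A key with more than one raw spelling signals a mismatch to fix.
mismatches = {key: sorted(v) for key, v in variants.items() if len(v) > 1}
print(mismatches)
```

<p>Left uncorrected, the three spellings of "jam" would be counted as three separate reasons in any tally or chart.</p>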
<p><span style="line-height: 20.8px;">If you’d rather copy and paste your data from Excel, Minitab will ensure you paste your data in the right place. For instance, if your data have column names and you accidentally paste your data into the first row of the worksheet, your data will all be formatted as text—even when the data following your column names are numeric! With </span><a href="https://www.minitab.com/en-us/products/minitab/whats-new/" style="line-height: 20.8px;">Minitab 17.3</a><span style="line-height: 20.8px;">, you will receive an alert that your data is in the wrong place, and Minitab will automatically move your data where it belongs. This alert ensures your data are formatted properly, preventing you from running into the problem during an analysis and saving you time manually correcting every improperly formatted column.</span></p>
<p><img alt="Copy Paste Warning" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/dae6c7b7-fc22-4616-9d65-f04909c20ab1/Image/5df941ffaa491a0072261aef075a19d6/copy_paste_warning.jpg" style="width: 431px; height: 299px;" /></p>
<p><em>Pasting your Excel data in the first row of a Minitab worksheet will trigger this warning, which safeguards against improperly formatted columns.</em></p>
<p><span style="line-height: 1.6;">This is only the beginning! Minitab makes it quick and painless to begin exploring and visualizing your data, offering more insights and ease once you get to the analysis. If you’d like to learn additional best practices for prepping your data for any analysis, stay tuned for my next post where I’ll offer tips for exploring and drawing insights from your data!</span></p>
Data Analysis, Statistics
Wed, 30 Mar 2016 14:05:04 +0000
http://blog.minitab.com/blog/meredith-griffith/are-you-putting-the-data-cart-before-the-horse-best-practices-for-prepping-data-for-analysis%2C-part-1
Meredith Griffith
How to Remove Leading or Trailing Spaces from a Data Set
http://blog.minitab.com/blog/michelle-paret/how-to-remove-leading-or-trailing-spaces-from-a-data-set
<p>Leading and trailing spaces in a data set are like termites in your house. If you don’t realize they are there and you don’t get rid of them, they’re going to wreak havoc.</p>
<p><img alt="keyboard" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/cc148fcf427d6e92fba27f00bba3968c/keyboard.jpg" style="margin: 10px 15px; float: right; width: 274px; height: 184px;" />Here are a few easy ways to remove these pesky characters with <a href="http://www.minitab.com/products/minitab/">Minitab Statistical Software</a> prior to analysis.</p>
Data Import
<p>If you’re importing data from Excel, a text file, or some other file type:</p>
<ol>
<li>Choose <strong>File > Open</strong> and select your Excel file, text file, etc.</li>
<li>Click <strong>Options</strong> and select <em>Remove nonprintable characters and extra spaces</em>.</li>
<li>Click <strong>OK</strong>.</li>
</ol>
<p>Note: This feature was introduced in Minitab 17.3. If you have an older version of Minitab 17, use <strong>Help > Check for Updates</strong>. If you have Minitab 16 or earlier—or you don't have Minitab at all—you can download a <a href="http://www.minitab.com/products/minitab/free-trial/">free 30-day trial</a>.</p>
The Calculator
<p>Suppose you already have your data in Minitab, located in column C1:</p>
<ol>
<li>Choose <strong>Calc > Calculator</strong>.</li>
<li>In <strong>Store result in variable</strong>, enter a blank column (e.g. <em>C5</em>), or you can overwrite an existing column.</li>
<li>In <strong>Expression</strong>, enter <em>TRIM(C1)</em>.</li>
<li>Click <strong>OK</strong>.</li>
</ol>
<p>If you also want to remove all non-printable characters using the Calculator, <em>CLEAN</em> is available as well.</p>
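<p>If you want to sanity-check the same kind of cleanup outside Minitab, the idea can be sketched in a few lines of Python. This is only an illustration and not Minitab's implementation: the helper names <em>trim</em> and <em>clean</em> are hypothetical, the sample values are made up, and the exact rules Minitab applies may differ in detail.</p>

```python
def trim(value: str) -> str:
    """Remove leading and trailing whitespace (hypothetical helper)."""
    return value.strip()


def clean(value: str) -> str:
    """Drop non-printable characters such as tabs or control codes (hypothetical helper)."""
    return "".join(ch for ch in value if ch.isprintable())


# Made-up category labels with stray spaces and one control character.
raw = ["  Line A", "Line B  ", " Line\x07 C "]
tidy = [trim(clean(v)) for v in raw]
print(tidy)
```

<p>Cleaning first and trimming second matters in this sketch, because removing a control character can expose a space that then needs trimming.</p>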
<p><img alt="Calculator" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/bce689217238ff90e42bbde154308b74/calculator.jpg" style="width: 350px; height: 310px;" /></p>
<p><span style="line-height: 1.6;">And that’s all there is to it.</span></p>
Data Analysis
Fri, 25 Mar 2016 12:00:00 +0000
http://blog.minitab.com/blog/michelle-paret/how-to-remove-leading-or-trailing-spaces-from-a-data-set
Michelle Paret
Gage R&R Metrics: What Do They All Mean?
http://blog.minitab.com/blog/starting-out-with-statistical-software/gage-rr-metrics%3A-what-do-they-all-mean
<p>When you analyze a Gage R&R study in <a href="http://www.minitab.com/products/minitab/">statistical software</a>, your results can be overwhelming. There are a lot of statistics listed in Minitab's Session Window—what do they all mean, and are they telling you the same thing?</p>
<p>If you don't know where to start, it can be hard to figure out what the analysis is telling you, especially if your measurement system is giving you some numbers you'd think are good, and others that might not be. I'm going to focus on three different statistics that are often confused when <span><a href="http://blog.minitab.com/blog/meredith-griffith/fundamentals-of-gage-rr">reading Gage R&R output</a></span>. </p>
<p>The first thing to look at is the %Study Variation and the %Contribution.</p>
<p style="margin-left: 40px;"><img alt="gage r&R output" src="https://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/be2a9d9d311b9fad9b00eacdd73abff5/gage2.png" style="width: 618px; height: 404px;" /></p>
<p>You could look at either of them, as they are both telling you the same thing, just in a different way. By definition, the %Contribution for a source is 100 times the variance component for that source divided by the Total Variation variance component. This calculation has the benefit of making all of your sources of variability add up to 100%, which can make things easy to interpret.</p>
<p>The %Study Variation does not sum up to 100% like %Contribution, but it does have other benefits. %Contribution is based on the variance component that is specific to the values you observed in your study, not what the population of values might be. In contrast, the %Study Variation, by taking 6*standard deviation, extrapolates out over the entire population of values (based on the observed values, of course).</p>
<p>The bottom line is that both %Study Variation and %Contribution are telling you, in simple terms, about the percentage of variation in your process attributable to a particular source.</p>
<p>What about %Tolerance? What does <em>that </em>allow us to look at? While %StudyVar and %Contribution compare the variation from a particular source to the total variation, the %Tolerance compares the amount of variation from a source to a specified tolerance spread. This can lead to seemingly conflicting results, such as getting a low %StudyVar while having a high %Tolerance. In this case, your gage system may be introducing low levels of variability compared to other sources, but the amount of variation is still too much based on your spec limits. The %Tolerance column may be more important to you in this case, as it's more specific to your actual product and its spec limits. </p>
<p>So, a short summary:</p>
<p><strong>%Contribution: </strong>The percentage of variation due to the source compared to the total variation, but with the added benefit that all sources will sum to 100%.</p>
<p><strong>%StudyVar:</strong> The <span style="line-height: 20.8px;">percentage of variation due to the source compared to the total variation, but with the added benefit of extrapolating beyond your specific data values. </span></p>
<p><strong>%Tolerance:</strong> The percentage of variation due to the source compared to your specified tolerance range.</p>
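<p>To make the three definitions concrete, here is a minimal Python sketch. The variance components and the tolerance are made-up numbers, not values from the output shown earlier, and the function name <em>gage_metrics</em> is hypothetical. The multiplier of 6 follows the "6*standard deviation" mentioned above; writing %Tolerance as six standard deviations divided by the tolerance width is my reading of the description, so treat it as an assumption.</p>

```python
import math

# Hypothetical variance components (not taken from the output shown above).
var_components = {
    "Total Gage R&R": 0.09,
    "Part-to-Part": 0.91,
}
tolerance = 8.0  # hypothetical spec width: upper spec limit minus lower spec limit

total_var = sum(var_components.values())


def gage_metrics(var_source, total_var, tolerance, k=6.0):
    """Return (%Contribution, %StudyVar, %Tolerance) for one source of variation."""
    sd_source = math.sqrt(var_source)
    sd_total = math.sqrt(total_var)
    pct_contribution = 100 * var_source / total_var        # ratio of variances
    pct_study_var = 100 * (k * sd_source) / (k * sd_total)  # ratio of k*SD spreads
    pct_tolerance = 100 * (k * sd_source) / tolerance       # spread vs. spec width
    return pct_contribution, pct_study_var, pct_tolerance


for source, var in var_components.items():
    contrib, study, tol = gage_metrics(var, total_var, tolerance)
    print(f"{source:15s} %Contribution={contrib:5.1f}  %StudyVar={study:5.1f}  %Tolerance={tol:5.1f}")
```

<p>With these made-up numbers, the gage source accounts for about 9% of the contribution but about 30% of the study variation, which is why the %StudyVar column does not add up to 100%. Shrinking the hypothetical tolerance raises %Tolerance without changing the other two columns.</p>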
<p>The %StudyVar is perhaps more reliant on having a good-quality study and can be used when your goal is improving the measurement system. On the other hand, %Tolerance can be used when the focus is on the measurement system being able to do its job and classify parts as in or out of spec.</p>
<p>Each of these statistics provides valuable information, and how you weigh each of them largely depends on what you're looking to get out of your study.</p>
Lean Six Sigma, Project Tools, Quality Improvement
Mon, 21 Mar 2016 12:00:00 +0000
http://blog.minitab.com/blog/starting-out-with-statistical-software/gage-rr-metrics%3A-what-do-they-all-mean
Eric Heckman
What a Trip to the Dentist Taught Us about Automation
http://blog.minitab.com/blog/meredith-griffith/what-a-trip-to-the-dentist-taught-us-about-automation
<p>After my husband’s most recent visit to the dentist, he returned home cavity-free...and with a $150 electric toothbrush in hand. </p>
<p><span style="line-height: 1.6;">I wanted details.</span></p>
<p>It began innocently. His dreaded trip to the dentist ended in high praise for no cavities and only a warning to floss more. That prompted my programming-and-automation-obsessed husband, still in the chair, to exclaim, "I wish there was a way to automate this whole process—the brushing and the flossing."</p>
<p><img alt="Teeth" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/935fceb17af86287ddad7b098f1ab0cf/teeth.png" style="margin: 10px 15px; float: right; width: 296px; height: 273px;" />Next thing you know, he’s swiping the credit card (to "earn miles for our next flight," he says) and walking out with a nice Philips Sonicare DiamondClean Sonic Electric Toothbrush.</p>
<p>From this anecdote, you’d think I was sitting beside him as his teeth-cleaning proceeded. I merely received the story secondhand from our dental hygienist the very next day when I went in for my own visit. But I digress.</p>
<p>When my husband exclaimed his desire to automate a process that very few humans enjoy doing, our dentist was pleased to tell him that this toothbrush comes close. Granted, the toothbrush can’t <em>completely </em>automate these tasks: it still requires the user to be present. However, our dentist offered the following points to consider:</p>
<ol>
<li>The toothbrush does most of the brushing for you (with the exception of moving your hand so that you brush all your teeth).</li>
<li>The bristles automatically move, reaching crevices between teeth that no manual tooth-brushing ever could.</li>
<li>Because of point #2, plaque buildup will decrease and gum health will improve.</li>
<li>Because of point #3, flossing won’t be a strict daily requirement.</li>
</ol>
<p>Sold.</p>
<p>The dentist's points give us a nice framework for thinking about automation. An automated solution might not be perfect. But an automated solution should:</p>
<p style="margin-left: 40px;">a. make a task easier and more efficient (brushing hard-to-reach places more effectively)<br />
b. <span style="line-height: 1.6;">require less of your time (reduces the need to floss daily), and <br />
c. </span><span style="line-height: 1.6;">save you money (better tooth and gum health and fewer fillings equate to cost savings). </span></p>
<p><span style="line-height: 1.6;">Who wouldn’t buy into that idea?</span></p>
<p>Automated solutions can turn feelings of boredom over performing tedious tasks into feelings of excitement. Why? Because automation removes the need to perform repetitive tasks that we know how to do but might not particularly enjoy, helps us see results faster, and incites us to implement change sooner. This can translate into business efficiency and increased profit.</p>
<p>The mere <em>idea</em> of automating the task of brushing teeth and the results he might experience incited my husband to think about tooth-brushing differently, and prompted the decision to purchase this custom solution (the electric toothbrush) before even implementing it in his daily habits; imagine the changes and process improvements that might occur once the automated solution is in place. Perhaps a report of no cavities for several visits <em>in a row</em> and an extra lump of cash for him to spend on me!</p>
<p>Just as Philips (and other manufacturers) developed an electric toothbrush as a custom solution to automate difficult or tedious aspects of brushing teeth, Minitab has created custom statistical solutions and has automated processes for numerous customers in various industries, including manufacturing, pharmaceutical, medical devices, and healthcare.</p>
<p>Did you know that Minitab is not merely an out-of-the-box statistical software package? Behind the software interface is a powerful statistical and graphical engine that can integrate with a customer’s workflow and provide a unique solution tailored to that customer’s industry-specific problem. Minitab’s engine can communicate with a customer’s databases, applications, and other programs such as Excel, in order to automatically perform analyses and provide output relevant to the customer’s needs.</p>
<p>One interesting example that comes to mind is a project our custom development consultants tackled for a pharmaceutical company. This company was responding to an FDA warning letter and needed to assess the quality of hundreds of active ingredients in a particular drug. They needed to analyze data collected for each ingredient using Minitab’s <a href="http://blog.minitab.com/blog/starting-out-with-statistical-software/starting-out-with-capability-analysis">capability analysis tool</a>, and create a report detailing the result of the analysis in order to show the FDA that their drug was stable and safe for consumption—but they needed to perform the same analysis and create the same report hundreds of times over.</p>
<p>Our custom development consultants used Minitab’s engine to access the customer’s data in Excel, automatically perform capability analysis on each active ingredient in the drug, and create custom reports detailing the quality level of each ingredient and a few additional pieces of output that the FDA wanted to see. Automating this work saved a tremendous amount of time, energy, and money, and ultimately helped the pharmaceutical company respond to the FDA warning letter in a timely manner.</p>
<p>Of course, Minitab’s <a href="https://www.minitab.com/en-us/services/custom-development/">custom solutions</a> can take on many forms, including custom reports as mentioned in the pharmaceutical example above, real-time dashboard solutions, and alert systems (I’ll save details on that one for the second installment of this blog series, where we’ll hear about more of my husband’s shenanigans pertaining to online bill payments).</p>
<p><span style="line-height: 1.6;">We live in a world of innovation and creativity; automated solutions touch on both ideas. If we can automate aspects of brushing our teeth, then surely we can automate a business process or task to help you become more efficient, save time, reduce costs, and see results sooner. If you’d like to learn how Minitab can help you, contact us at </span><a href="mailto:customdev@minitab.com" style="line-height: 1.6;">customdev@minitab.com</a><span style="line-height: 1.6;">.</span></p>
<p>My hope is that after reading this blog post, you see the relevance and value of automation—whether brushing your teeth, performing the same statistical analyses, or creating custom reports. And the power of automation extends far beyond these simple examples! So if I’ve piqued your interest, stay tuned for Part 2 of this series to hear more lessons learned by my husband in his automation endeavors!</p>
Automation
Wed, 02 Mar 2016 13:00:00 +0000
http://blog.minitab.com/blog/meredith-griffith/what-a-trip-to-the-dentist-taught-us-about-automation
Meredith Griffith
Five Reasons Why Your R-squared Can Be Too High
http://blog.minitab.com/blog/adventures-in-statistics/five-reasons-why-your-r-squared-can-be-too-high
<p>I’ve written about R-squared before and I’ve concluded that it’s not as intuitive as it seems at first glance. It can be a misleading statistic because <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit" target="_blank">a high R-squared is not always good and a low R-squared is not always bad</a>. I’ve even said that <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-high-should-r-squared-be-in-regression-analysis" target="_blank">R-squared is overrated</a> and that <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-how-to-interpret-s-the-standard-error-of-the-regression" target="_blank">the standard error of the estimate (S)</a> can be more useful.</p>
<p>Even though I haven’t always been enthusiastic about R-squared, that’s not to say it isn’t useful at all. For instance, if you perform a study and notice that similar studies generally obtain a notably higher or lower R-squared, you should investigate why yours is different because there might be a problem.</p>
<p>In this blog post, I look at five reasons why your R-squared can be too high. This isn’t a comprehensive list, but it covers some of the more common reasons.</p>
Is a High R-squared Value a Problem?
<p><img alt="Very high R-squared" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/072d9c1d584683676849dd76d9802993/highr_sq.png" style="float: right; width: 216px; height: 80px;" />A very high R-squared value is not necessarily a problem. Some processes can have R-squared values that are in the high 90s. These are often physical processes where you can obtain precise measurements and there's low process noise.</p>
<p>You'll have to use your subject area knowledge to determine whether a high R-squared is problematic. Are you modeling something that is inherently predictable? Or, not so much? If you're measuring a physical process, an R-squared of 0.9 might not be surprising. However, if you're predicting human behavior, that's way too high!</p>
<p>Compare your study to similar studies to determine whether your R-squared is in the right ballpark. If your R-squared is too high, consider the following possibilities. To determine whether any apply to your model specifically, you'll have to use your subject area knowledge, information about how you fit the model, and data-specific details.</p>
Reason 1: R-squared is a biased estimate
<p><img alt="bathroom scale" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/410bbc59cf4450a2d8bbc20373c683a4/overweight_w1024.jpeg" style="float: right; width: 200px; height: 300px;" />The R-squared in your regression output is a biased estimate based on your sample—it tends to be too high. This bias is a reason why some practitioners don’t use R-squared at all but use adjusted R-squared instead.</p>
<p>R-squared is like a broken bathroom scale that tends to read too high. No one wants that! Researchers have long recognized that regression’s optimization process takes advantage of chance correlations in the sample data and inflates the R-squared.</p>
<p>Adjusted R-squared does what you’d do with that broken bathroom scale. If you knew the scale was consistently too high, you’d reduce it by an appropriate amount to produce a weight that is correct on average.</p>
<p>Adjusted R-squared does this by comparing the sample size to the number of terms in your regression model. Regression models that have many samples per term produce a better R-squared estimate and require less shrinkage. Conversely, models that have few samples per term require more shrinkage to correct the bias.</p>
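<p>The shrinkage described above can be sketched with the usual textbook formula for adjusted R-squared. This is a minimal illustration rather than a description of how any particular package computes it: the function name <em>adjusted_r_squared</em> is hypothetical, <em>p</em> counts predictor terms without the intercept, and the numbers are made up.</p>

```python
def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Shrink R-squared using n observations and p predictor terms (intercept not counted)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than model terms")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Same R-squared and number of terms, different sample sizes (made-up numbers).
for n in (15, 50, 500):
    print(n, round(adjusted_r_squared(0.80, n, p=5), 3))
```

<p>With an R-squared of 0.80 and five terms, the adjusted value is roughly 0.69 for 15 observations but stays close to 0.80 for 500, which matches the idea that more samples per term require less shrinkage.</p>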
<p>For more information, read my posts about <a href="http://blog.minitab.com/blog/adventures-in-statistics/multiple-regession-analysis-use-adjusted-r-squared-and-predicted-r-squared-to-include-the-correct-number-of-variables" target="_blank">Adjusted R-squared</a> and <a href="http://blog.minitab.com/blog/adventures-in-statistics/r-squared-shrinkage-and-power-and-sample-size-guidelines-for-regression-analysis" target="_blank">R-squared shrinkage</a>.</p>
Reason 2: You might be overfitting your model
<p>An overfit model is one that is too complicated for your data set. You’ve included too many terms in your model compared to the number of observations. When this happens, the regression model becomes tailored to fit the quirks and random noise in your specific sample rather than reflecting the overall population. If you drew another sample, it would have its own quirks, and your original overfit model would not likely fit the new data.</p>
<p>Adjusted R-squared doesn't always catch this, but <a href="http://blog.minitab.com/blog/adventures-in-statistics/multiple-regession-analysis-use-adjusted-r-squared-and-predicted-r-squared-to-include-the-correct-number-of-variables" target="_blank">predicted R-squared</a> often does. Read my post about <a href="http://blog.minitab.com/blog/adventures-in-statistics/the-danger-of-overfitting-regression-models" target="_blank">the dangers of overfitting your model</a>.</p>
Reason 3: Data mining and chance correlations
<p>If you fit many models, you will find variables that appear to be significant but they are correlated only by chance. While your final model might not be too complex for the number of observations (Reason 2), problems occur when you fit many different models to arrive at the final model. Data mining can produce <a href="http://blog.minitab.com/blog/adventures-in-statistics/four-tips-on-how-to-perform-a-regression-analysis-that-avoids-common-problems" target="_blank">high R-squared values even with entirely random data</a>!</p>
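<p>A small simulation shows the effect of searching through many candidates. The sketch below is deliberately simpler than a real model-building procedure: it only tries one random predictor at a time against a random response and keeps the best fit. The sample sizes, the seed, and the helper name <em>r_squared</em> are all arbitrary choices for illustration.</p>

```python
import random


def r_squared(x, y):
    """R-squared of a simple linear regression of y on x (the squared correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


random.seed(1)  # arbitrary seed so the run is repeatable
n_obs, n_candidates = 20, 200

# The response and every candidate predictor are pure noise, unrelated by construction.
y = [random.gauss(0, 1) for _ in range(n_obs)]
candidates = [[random.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_candidates)]

scores = [r_squared(x, y) for x in candidates]
print("typical R-squared:", round(sum(scores) / len(scores), 3))
print("best R-squared after searching:", round(max(scores), 3))
```

<p>The best candidate will generally look far stronger than a typical one, even though none of the predictors has any real relationship with the response.</p>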
<p>Before performing regression analysis, you should already have an idea of what the important variables are along with their relationships, coefficient signs, and effect magnitudes based on previous research. Unfortunately, recent trends have moved away from this approach thanks to large, readily available databases and automated procedures that build regression models.</p>
<p>For more information, read my post about using <a href="http://blog.minitab.com/blog/adventures-in-statistics/beware-of-phantom-degrees-of-freedom-that-haunt-your-regression-models" target="_blank">too many phantom degrees of freedom</a>.</p>
Reason 4: Trends in Panel (Time Series) Data
<p>If you have time series data and your response variable and a predictor variable both have significant trends over time, this can produce very high R-squared values. You might try a <a href="http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/time-series/basics/time-series-analyses-in-minitab/" target="_blank">time series analysis</a>, or including time-related variables in your regression model, such as <a href="http://support.minitab.com/en-us/minitab/17/topic-library/minitab-environment/calculator-and-matrices/column-calculator-functions/lag-function/" target="_blank">lagged</a> and/or <a href="http://support.minitab.com/en-us/minitab/17/topic-library/minitab-environment/calculator-and-matrices/column-calculator-functions/differences-function/" target="_blank">differenced</a> variables. Conveniently, these analyses and functions are all available in <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">Minitab statistical software</a>.</p>
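<p>To see why shared trends inflate R-squared, and why differencing helps, here is a minimal Python sketch. The two series are simulated so that they have nothing in common except an upward trend; the slopes, noise level, seed, and helper names <em>r_squared</em> and <em>difference</em> are assumptions made for the illustration.</p>

```python
import random


def r_squared(x, y):
    """R-squared of a simple linear regression of y on x (the squared correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def difference(series):
    """First differences: each value minus the one before it."""
    return [b - a for a, b in zip(series, series[1:])]


random.seed(7)  # arbitrary seed so the run is repeatable

# Two series that share nothing except an upward trend over time.
x = [1.0 * i + random.gauss(0, 1) for i in range(100)]
y = [2.0 * i + random.gauss(0, 1) for i in range(100)]

print("R-squared on raw levels:", round(r_squared(x, y), 3))
print("R-squared on differences:", round(r_squared(difference(x), difference(y)), 3))
```

<p>On the raw levels the trend dominates and the R-squared is very high. After differencing, the trend becomes a constant and only the unrelated noise is left to be explained.</p>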
Reason 5: Form of a Variable
<p>It's possible that you're including different forms of the same variable for both the response variable and a predictor variable. For example, if the response variable is temperature in Celsius and you include a predictor variable of temperature in some other scale, you'd get an R-squared of nearly 100%! That's an obvious example, but the same thing can happen more subtly.</p>
<p>For more information about regression models, read my post about <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-choose-the-best-regression-model">How to Choose the Best Regression Model</a>.</p>
Data Analysis, Regression Analysis, Statistics, Statistics Help
Wed, 24 Feb 2016 13:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics/five-reasons-why-your-r-squared-can-be-too-high
Jim Frost
Mind the Gap
http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/mind-the-gap
<p><span style="line-height: 1.6;"><img alt="Mind the gap" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/686c659773c05351ff2e3d7608e17984/subwayip3.jpg" style="float: right; width: 336px; height: 168px; margin: 10px 15px;" />Mind the gap. It's an important concept to bear in mind whilst traveling on the Tube in London, the T in Boston, the Metro in Washington, D.C., etc. But how many of us remember to mind the gap when we create an interval plot in Minitab Statistical Software? Not too many of us, I'd wager. And it's a shame, too.</span></p>
<p>When you travel on the subway, minding the gap means giving thoughtful consideration to the space between the platform and the train. On the subway, minding the gap can make the difference between these two very different views of the subway station:</p>
<p><img alt="Bad view of subway" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/d66944dc44479e5de0cbefe80ec188cd/subwaybadview.jpg" style="line-height: 1.6; width: 146px; height: 220px;" /><span style="line-height: 1.6;"> </span><img alt="Nice view of subway" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/e0d2c8a78ea25db3522d762b37f3c9e2/subwayniceview_medium.jpg" style="line-height: 1.6; width: 331px; height: 220px;" /></p>
<p>When you make an interval plot in Minitab, minding the gap means giving thoughtful consideration to the space between groups on the x-axis. For interval plots, minding the gap can make the difference between these two very different views of your data:</p>
<p><img alt="Plain view of data" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/11d00d4165de5508c8b7cb9ae9fadb33/plainviewofdata.jpg" style="width: 260px; height: 174px;" /> <img alt="Awesome view of data" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/0f7fe05a6b132c3477aa7cc2ce526cbb/awesomeviewofdata.jpg" style="width: 260px; height: 174px;" /></p>
<p>Allow me to demonstrate with an example. If you like, you can download the data file, <a href="http://support.minitab.com/en-us/datasets/anova-data-sets/moisture-content/">PercentMoisture.MTW</a> from our data set library and follow along. (You can get the free 30-day trial of Minitab <a href="http://it.minitab.com/products/minitab/free-trial.aspx">here</a> if you don't already have the software.) Technicians at a food company collected these data to try to figure out the best combination of time and temperature to bake cereal grains to minimize their moisture content. </p>
<p>Interval plots are useful because they summarize your data and allow you to simultaneously compare the means (represented by the points or symbols) and the variability (represented by the interval bars) for each sample or group. (To see more interval plots in action, check out these other blog posts: <a href="http://blog.minitab.com/blog/understanding-statistics/seven-alternatives-to-pie-charts">Seven Alternatives to Pie Charts</a> and <a href="http://blog.minitab.com/blog/fun-with-statistics/when-even-cupid-isnt-accurate-enough-interval-plots-and-olympic-finals">When Even Cupid Isn't Accurate Enough</a>.) </p>
<p>Creating a basic interval plot in Minitab is simple. Just select <strong>Graph > Interval Plot</strong>. Then choose the <strong>One Y, With Groups</strong> option, enter the data as follows, and click <strong>OK</strong>. (For the sake of space in this article, I renamed the columns "Time" and "Temp".)</p>
<p><img alt="Creating the interval plot" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/403a5bb5dd1a85c5cdb6317fa50b043f/initialdb.jpg" style="width: 360px; height: 168px;" /> </p>
<p><img alt="Basic interval plot" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/00ef81c34648e57be92f9151a7938db6/initialip.jpg" style="width: 360px; height: 240px;" /></p>
<p>The nice thing about interval plots is that multiple levels of multiple factors can be represented by different positions on the x-axis. But the unfortunate thing about interval plots is that multiple levels of multiple factors are represented by different positions on the x-axis.</p>
<p>All the information is there, but it's hard to see how one group relates to the next. For example, to compare the results for the 130-degree oven temperature across the different oven times, you need to compare the 2nd interval bar to the 5th <span style="line-height: 20.8px;">interval bar </span>and the 8th <span style="line-height: 20.8px;">interval bar</span>. You end up going from one similar-looking bar to another and another, and that seldom ends well. </p>
<p>To make the different oven temperatures stand out more, you can add a little color. Just double-click one of the symbols to open the <em>Edit Mean Symbols</em> dialog box. Click the <em>Groups </em>tab, enter the temperature variable, and click <strong>OK</strong>. </p>
<p><img alt="Grouping the symbols" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/047d352b6f366120350b9883c9f4a118/groupsdb.jpg" style="width: 360px; height: 147px;" /> </p>
<p><img alt="Interval plot with grouped symbols" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/b4560e6f14caa1ef4c3a9f2efc38d197/ipwithgroups.jpg" style="width: 360px; height: 240px;" /></p>
<p>To help make the grouping even clearer, you can connect the dots. Right-click the graph and choose <strong>Add > Data Display</strong>, then select <strong>Mean connect line</strong> and click <strong>OK</strong>.</p>
<p><img alt="Adding mean connect lines" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/01ef5650cc8067423f804beff3d45992/dbconnectline.jpg" style="width: 360px; height: 207px;" /> </p>
<p><img alt="Interval plot with mean connect lines" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/a4db0ce40482f00310be78c21044907a/ipwithconnectlines.jpg" style="width: 360px; height: 240px;" /></p>
<p>Now it's much easier to identify and compare the results for the different oven temperatures. But here is where we really start to mind that gap. By which I mean that we start to give thoughtful consideration to the space between the <span style="line-height: 20.8px;">oven-</span>time groups on the x-axis. And by which I also mean that we mind these gaps because they are annoying and we want them to go away. But we need not worry, because that's one gap we can shrink easily.</p>
<p>Double-click the x-axis to open the <em>Edit Scale</em> dialog box. Notice the <strong>Gap within clusters</strong> setting. A setting of –1 means that the intervals for all levels of <span style="line-height: 20.8px;">oven </span>temperature at each level of <span style="line-height: 20.8px;">oven </span>time will be at the same location on the x-axis. Change the setting to –1 and the gap is closed. </p>
<p>And while we're at it, let's make the tick labels for temperature go away as well because they are redundant with the legend, and because the legend conveys the same information. And because if we don't, those labels would appear on top of each other, which looks pretty weird. </p>
<p><img alt="Removing the gap" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/a8bf53a30ab3e6f202cb1584578f5185/dbremovinggap.jpg" style="width: 275px; height: 263px;" /> <img alt="Removing labels" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/dc04c4f2dc9da480fec6e5f7d9ff5383/dbremovelabels.jpg" style="line-height: 1.6; width: 300px; height: 156px;" /><span style="line-height: 1.6;"> </span></p>
<p><img alt="Interval plot with no more gap!" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/c1b5587fbb5c69394f085cd590be70a7/ipsansgap.jpg" style="width: 360px; height: 240px;" /></p>
<p>Awesome! The plot looks much better without the big gaps. Although, perhaps a little gap would make it easier to see the individual intervals more clearly. If we change that gap to –0.85, then everything is groovy.</p>
<p><img alt="Interval plot with tasteful gap" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/06e3d9586005800f4eef141a3c6b4503/ipwithgroovygap.jpg" style="width: 360px; height: 240px;" /></p>
<p>Now that's a gap I don't mind at all! Now it's really easy to compare the results for different oven temperatures within and across the different oven times. The interval plot suggests that to minimize moisture content, we want to use the 90-minute oven time, but we don't want to use the 125-degree oven temperature. </p>
<p>As you can see, the interval plot is an easy and fast way to get a good idea of which differences could be important. But remember, the interval plot can’t tell us which effects or which differences are statistically significant or not. For that, we need to conduct an <a href="http://support.minitab.com/minitab/17/topic-library/modeling-statistics/anova/basics/what-is-anova/">analysis of variance (ANOVA)</a>.</p>
<p>Spoiler alert: I already ran an ANOVA on these data and it confirms what we gleaned from the interval plot. The main effects for both time and temperature are significant. (The interaction effect is not quite significant at the 0.05-level.) Tukey comparisons show that 90 minutes in the oven reduces moisture significantly better than either 30 minutes or 60 minutes in the oven. Tukey comparisons also show that a 125-degree oven is significantly worse at reducing moisture than either a 130-degree oven or a 135-degree oven. The effects of the 135-degree oven are not significantly different from the 130-degree oven, so we can probably save some energy and just use 130 degrees to desiccate our wild oats. </p>
<p><em style="box-sizing: border-box; font-family: 'Segoe UI', Frutiger, 'Frutiger Linotype', 'Dejavu Sans', 'Helvetica Neue', Tahoma, Arial, sans-serif; line-height: 15px; color: rgb(77, 79, 81); font-size: 10px;">Credit for the <a href="https://www.flickr.com/photos/thomasclaveirole/1414940422/">subway tunnel photo</a> goes to Thomas Claveirole.</em><em style="box-sizing: border-box; font-family: 'Segoe UI', Frutiger, 'Frutiger Linotype', 'Dejavu Sans', 'Helvetica Neue', Tahoma, Arial, sans-serif; line-height: 15px; color: rgb(77, 79, 81); font-size: 10px;"> </em><em style="box-sizing: border-box; font-family: 'Segoe UI', Frutiger, 'Frutiger Linotype', 'Dejavu Sans', 'Helvetica Neue', Tahoma, Arial, sans-serif; line-height: 15px; color: rgb(77, 79, 81); font-size: 10px;"> Credit for the <a href="https://www.flickr.com/photos/36217981@N02/14008362659/">subway station photo</a> goes to Tim Adams<span style="box-sizing: border-box;">. Both are available under Creative Commons License 2.0. </span></em></p>
Data Analysis, Statistics
Tue, 23 Feb 2016 13:00:00 +0000
http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/mind-the-gap
Greg Fox
How to Calculate BX Life, Part 2
http://blog.minitab.com/blog/meredith-griffith/how-to-calculate-bx-life-part-2
<p><span style="line-height: 1.6;">When I wrote <a href="http://blog.minitab.com/blog/meredith-griffith/how-to-calculate-b10-life-with-statistical-software">How to Calculate B10 Life with Statistical Software</a></span><span style="line-height: 1.6;">, I promised a follow-up blog post that would describe how to compute any “BX” lifetime. In this post I’ll follow through on that promise, and in a third blog post in this series, I will explain why BX life is one of the best measures you can use in your reliability analysis.</span></p>
<p>As a refresher, B10 life refers to the time at which 10% of the population has failed. To put it another way, it is the point in time at which the reliability of the population is 90%. Let’s revisit our pacemaker battery example from part 1 of this blog series. Here's <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/File/126181c5eca45c380dfed332ee3c3c7d/pacemakerbatterylife.MTW">the data</a>.</p>
<p><img alt="Data" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/dae6c7b7-fc22-4616-9d65-f04909c20ab1/Image/87f5a262f6fa042461f047c74c26a72a/table1.jpg" style="width: 177px; height: 243px;" /></p>
<p>Recall that we found the B10 life of pacemaker batteries to be 6.36 years. Another way to interpret this value is to say that 6.36 years is the time by which 10% of the population of pacemaker batteries will have failed. This information is useful in establishing a realistic warranty period for a product so that customers are covered through a product’s 90% reliability period, and so the manufacturer won’t have to incur extra cost by replacing an excess of the product during the warranty period.</p>
<p>But perhaps a particular product has additional reliability requirements a manufacturer wishes to monitor, such as B15 life. Or perhaps we would like to know when half of the population will fail—its B50 life. Both B10 and B50 life are industry standards for measuring the life expectancy of an automotive engine, for instance. This is where BX life calculations become even more useful—and Minitab makes it incredibly easy to compute and interpret those values. (If you don't already have Minitab and you'd like to follow along, <a href="http://www.minitab.com/products/minitab/free-trial/">download the free trial</a>.)</p>
Calculating BX Life
<p>In Minitab, choose <strong>Stat > Reliability/Survival > Distribution Analysis (Right Censoring) > Parametric Distribution Analysis</strong> and set up the main dialog the same way we did in <a href="http://blog.minitab.com/blog/meredith-griffith/how-to-calculate-b10-life-with-statistical-software">Part 1</a>:</p>
<p><img alt="Parametric Distribution Analysis - Main Dialog" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/dae6c7b7-fc22-4616-9d65-f04909c20ab1/Image/d0669f86f85b236ba2a3adcef520a994/dialog1.jpg" style="width: 507px; height: 345px;" /></p>
<p>Press the "Censor" button and fill out the subdialog as follows: </p>
<p><img alt="Censor Subdialog" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/dae6c7b7-fc22-4616-9d65-f04909c20ab1/Image/d319e5438d544d051aa997adeb14f271/dialog3.jpg" style="width: 426px; height: 313px;" /></p>
<p>When you press OK, Minitab analyzes the distribution of your data and, by default, displays a Table of Percentiles in the session window. We can use this table for measures such as B50 life because it reports a variety of percentiles by default, including the 50th percentile, which is the time by which 50% of the population is expected to fail.</p>
<p><img alt="Table of Percentiles for B50 Life" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/dae6c7b7-fc22-4616-9d65-f04909c20ab1/Image/1e6e0c6abf084578a9c2993e6c09f530/table2.jpg" style="width: 536px; height: 482px;" /></p>
<p>We see that 50% of the population of pacemaker batteries will fail by 9.735 years. But what if we want to compute B15 life? This percentile does not display by default in the Table of Percentiles.</p>
<p>Revisiting the Parametric Distribution Analysis dialog (pressing CTRL-E is a Minitab shortcut that will bring up your most recently completed dialog), we can click the ‘Estimate’ button to specify what “BX” life we want. In the section titled ‘Estimate percentiles for these additional percents,’ entering the number 15 will give us the B15 life for pacemaker batteries.</p>
<p><img alt="Estimate Subdialog" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/dae6c7b7-fc22-4616-9d65-f04909c20ab1/Image/c84a41ca235867a883d886d754e6fc5d/dialog2.jpg" style="width: 508px; height: 447px;" /></p>
<p>Click OK through the dialogs, and we see that a row of output for the 15th percentile is now included in the Table of Percentiles.</p>
<p><img alt="Table of Percentiles for B15 Life" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/dae6c7b7-fc22-4616-9d65-f04909c20ab1/Image/1ac702357a0ce4437048ec6aa470ba1f/table3.jpg" style="width: 313px; height: 47px;" /></p>
<p>It’s as simple as that!</p>
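<p>If you are curious about the arithmetic behind a BX value, here is a minimal Python sketch. It assumes the failure times follow a Weibull distribution, which is my assumption for illustration and not something stated above. The function names and the shape and scale values are hypothetical, so the printed numbers will not match the pacemaker output.</p>

```python
import math


def weibull_bx_life(x_percent, shape, scale):
    """Time by which x_percent of units have failed under a Weibull model."""
    if not 0 < x_percent < 100:
        raise ValueError("x_percent must be between 0 and 100 (exclusive)")
    p = x_percent / 100.0
    # Invert the Weibull CDF F(t) = 1 - exp(-(t / scale) ** shape).
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


def weibull_reliability(t, shape, scale):
    """Fraction of units expected to still be working at time t."""
    return math.exp(-((t / scale) ** shape))


# Hypothetical parameters, NOT the values fitted to the pacemaker data.
shape, scale = 2.0, 10.0
for x in (10, 15, 50):
    t = weibull_bx_life(x, shape, scale)
    r = weibull_reliability(t, shape, scale)
    print(f"B{x} life: {t:.3f} years (reliability at that time: {r:.2f})")
```

<p>The loop shows the relationship described earlier: the reliability at the BX life is always 100% minus X%.</p>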
<p>If you’ve never used BX life as a reliability metric, and you’re wondering just how and why these can be some of the best measures of reliability, stay tuned for my final post in this series!</p>
Quality Improvement, Reliability Analysis, Six Sigma
Fri, 05 Feb 2016 13:00:00 +0000
http://blog.minitab.com/blog/meredith-griffith/how-to-calculate-bx-life-part-2
Meredith Griffith
How to Analyze Like a Citizen Data Scientist in Flint
http://blog.minitab.com/blog/statistics-and-quality-improvement/how-to-analyze-like-a-citizen-data-scientist-in-flint
<p><img alt="The Citizen's Bank Weather Ball in Flint, Michigan" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/f0a4660c3136750443aede3c2be41c52/6109589699_98d685d0d5_z.jpg" style="width: 200px; height: 133px; float: right; border-width: 1px; border-style: solid; margin: 10px 15px;" />If you follow the news in the United States then you’ve heard that there’s a water crisis in Flint, Michigan. Although there’s going to continue to be debate about how much ethics played a role in the data collection practices, it’s worthwhile to at least be ready to perform the correct analysis on the data when you have it. Here’s how you can use Minitab to be like a citizen data scientist in Flint, and see for yourself what the data indicate.</p>
<p>Let’s start with the Environmental Protection Agency’s (EPA) <a href="http://www.epa.gov/dwreginfo/lead-and-copper-rule">Lead and Copper Rule</a>. The EPA says that a water system needs to act when “lead concentrations exceed an action level of 15 ppb” in more than 10% of samples. The statistic that separates the highest 10% of the samples from the rest is called the 90th percentile.</p>
<p><a href="http://www.ecfr.gov/cgi-bin/text-idx?SID=531617f923c3de2cbf5d12ae4663f56d&mc=true&node=sp40.23.141.i&rgn=div6#se40.23.141_186">The applicable Code of Federal Regulations</a> (CFR) does not prescribe a random sample to characterize the entire water system. Instead, the CFR suggests that those who administer the water system should select sampling sites based on the likelihood of contamination. In particular, those who administer the system should prefer sampling sites that meet these two criteria:</p>
<p style="margin-left:.5in;">(i) Contain copper pipes with lead solder installed after 1982 or contain lead pipes; and/or</p>
<p style="margin-left:.5in;">(ii) Are served by a lead service line.</p>
<p>Clearly, we are not dealing with a random sample—that's because the goal is not to characterize the entire system, but to better understand the worst contamination risks. In this context we're characterizing only the sites that we sample, which we suspect contain the highest lead results in the system. The CFR suggests taking samples from at least 60 sites for a system the size of Flint’s.</p>
<p>The <a href="http://flintwaterstudy.org/2015/12/complete-dataset-lead-results-in-tap-water-for-271-flint-samples/" target="_blank">data we’ll work with</a> were collected through an effort organized by <a href="http://flintwaterstudy.org/about-page/about-us/" target="_blank">an independent research team at Virginia Tech</a>. The data contain 271 samples from 269 different locations, which exceeds the minimum recommended sample size. Because we’re looking for the 90th percentile, the calculation amounts to roughly counting down 271/10 ≈ 27 data points from the maximum. The CFR references the use of “first draw” tap samples, so we’ll pay attention to that column in the Virginia Tech data.</p>
A Quick Calculation of the 90th Percentile
<p>Once the data’s in <a href="http://www.minitab.com/products/minitab">Minitab Statistical Software</a>, the fastest way to calculate the 90th percentile is with Minitab’s calculator. Try this:</p>
<ol>
<li>Choose <strong>Calc > Calculator</strong>.</li>
<li>In <strong>Store result in variable</strong>, enter <em>90th percentile</em>.</li>
<li>In <strong>Expression</strong>, enter <em>percentile ('PB Bottle 1 (ppb) – First Draw', 0.9)</em>. Click <strong>OK</strong>.</li>
</ol>
<p>Minitab stores the value 26.944. Because this value is greater than 15, you are now ready to make <a href="http://flintwaterstudy.org/information-for-flint-residents/results-for-citizen-testing-for-lead-300-kits/" target="_blank">strongly-worded statements urging people to take measures to protect themselves from lead exposure</a>.</p>
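<p>To see what a percentile calculation like this does under the hood, here is a minimal Python sketch. It uses one common definition, the value at position p(n + 1) in the sorted data with linear interpolation. I am assuming that rule for illustration, and Minitab's calculator may interpolate differently. The readings are made up and are not the Virginia Tech data.</p>

```python
import math

ACTION_LEVEL_PPB = 15.0  # EPA action level quoted in the Lead and Copper Rule


def percentile(values, p):
    """Quantile for 0 < p < 1 at position p * (n + 1), linearly interpolated."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one value")
    w = p * (n + 1)      # 1-based position in the sorted data
    k = math.floor(w)    # whole part of the position
    if k < 1:
        return data[0]
    if k >= n:
        return data[-1]
    frac = w - k         # how far to move toward the next sorted value
    return data[k - 1] + frac * (data[k] - data[k - 1])


# Hypothetical first-draw lead readings in ppb, for illustration only.
sample_ppb = [0.3, 0.5, 1.2, 2.0, 2.4, 3.1, 4.8, 6.5, 9.9, 12.0, 18.7, 31.0, 42.5]
p90 = percentile(sample_ppb, 0.9)
print(f"90th percentile: {p90:.2f} ppb")
print(f"Exceeds the action level: {p90 > ACTION_LEVEL_PPB}")
```

<p>With only 13 made-up readings, the 90th percentile falls between the two largest values, which is the same "count down from the maximum" idea described above.</p>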
Communicating the 90th Percentile on a Graph
<p>But if you’re really going to communicate your results, it’s nice to have a graph available. A simple bar chart might do:</p>
<p><img alt="Bar chart of the actual 90th percentile and the action limit." src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/44a161f0fc39a3b030b9895a11313c1f/bar_chart.png" style="border-width: 0px; border-style: solid; width: 576px; height: 384px;" /></p>
<p>However, you can show the data in more detail with a <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/graphs/graphs-of-distributions/histograms/histogram/">histogram</a>.</p>
<ol>
<li>Choose <strong>Graph > Histogram</strong>.</li>
<li>Select <strong>Simple</strong>. Click <strong>OK</strong>.</li>
<li>In <strong>Graph variables</strong>, enter <em>'PB Bottle 1 (ppb) – First Draw'</em>.</li>
<li>Click <strong>Scale</strong>.</li>
<li>Select the <strong>Reference Lines</strong> tab.</li>
<li>In <strong>Show reference lines at data values</strong>, enter <em>15 26.9</em>. Click <strong>OK</strong> twice.</li>
</ol>
<p><img alt="Histogram showing the 90th percentile exceeds the action limit of 15 parts per billion." src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/a6d9b14bf5031621ac62f922b0d68466/histogram.png" style="border-width: 0px; border-style: solid; width: 576px; height: 384px;" /></p>
<p>Histograms divide the sample values into intervals called bins. The height of each bar represents the number of observations in that bin: the taller the bar, the more observations in that interval. The reference lines on the graph show the action limit of 15 ppb and the actual value of the 90th percentile. Because the 90th percentile lies above the action limit, the graph shows that the action limit is exceeded.</p>
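<p>The binning idea is simple enough to sketch in a few lines of Python. This is only an illustration with hypothetical readings and a bin width I picked arbitrarily. It also counts the share of samples above 15 ppb, which is another way to state the "more than 10% of samples" rule.</p>

```python
from collections import Counter


def bin_counts(values, bin_width):
    """Count observations per bin; bin k covers [k * width, (k + 1) * width)."""
    counts = Counter(int(v // bin_width) for v in values)
    return {(k * bin_width, (k + 1) * bin_width): counts[k] for k in sorted(counts)}


def share_above(values, limit):
    """Fraction of observations strictly greater than the limit."""
    return sum(v > limit for v in values) / len(values)


# Hypothetical first-draw lead readings in ppb, for illustration only.
sample_ppb = [0.3, 0.5, 1.2, 2.0, 2.4, 3.1, 4.8, 6.5, 9.9, 12.0, 18.7, 31.0, 42.5]

# A crude text histogram: one '#' per observation in the bin.
for (low, high), count in bin_counts(sample_ppb, 5).items():
    print(f"{low:>3} to {high:<3} {'#' * count}")

print(f"Share of samples above 15 ppb: {share_above(sample_ppb, 15):.0%}")
```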
Gather Your Data
<p>In April of 2015, then-mayor of Flint Dayne Walling reported that he and his family “drink and use the Flint water everyday, at home, work, and schools.” It’s easy for me to believe that the mayor’s personal experience with water that was not dangerous affected his judgment about the situation. The zip code for the mayor’s office in Flint is 48502. The news bureau for WNEM TV 5, <a href="http://www.wnem.com/story/29511581/flints-mayor-drinks-water-from-tap" target="_blank">one place where Mayor Walling drank tap water on TV</a>, is in the same zip code. The citizen data scientists who analyzed the Flint data knew that the geographically-limited sample being shown on TV and Twitter wasn't good enough. Instead, they collected data from 269 different locations around Flint and found that lead was a serious problem.</p>
<p>Of course, collecting that data was no small task: the data scientists estimate that gathering, preparing, and analyzing water samples ended up costing about $180,000, not including volunteer labor. If you’d like to donate towards offsetting the costs and future efforts, check out the <a href="http://flintwaterstudy.org/2016/01/the-flintwaterstudy-research-support-fundraiser/" target="_blank">Flint Water Study Research Support Fundraiser</a>.</p>
<p>If you’d like to support residents in Flint, consider volunteering for or contributing to the <a href="http://www.unitedwaygenesee.org/civicrm/contribute/transact?reset=1&id=5" target="_blank">United Way of Genesee County’s Flint Water Fund</a> which “has sourced more than 11,000 filters systems and 5,000 replacement filters, ongoing sources of bottled water to the Food Bank of Eastern Michigan and also supports a dedicated driver for daily distribution.”</p>
<p>The attention brought to Flint <a href="http://www.theguardian.com/environment/2016/jan/22/water-lead-content-tests-us-authorities-distorting-flint-crisis" target="_blank">has called into question the water testing done in other municipalities in the United States</a>. If you’re concerned about the potential for lead in your own water, the EPA notes that <a href="http://www.epa.gov/lead/protect-your-family#testdw" target="_blank">lead testing kits are available in home improvement stores</a> that can be sent to laboratories for analysis.</p>
<p>The citation for the referenced data set is: FlintWaterStudy.org (2015)<strong> “Lead Results from Tap Water Sampling in Flint, MI during the Flint Water Crisis.”</strong> This link provides the data as a Minitab worksheet: <a href="https://app.compendium.com/api/post_attachments/3d9b8ce9-c0ce-45ed-a759-3da70816d238/view">lead_results_from_tap_water_sampling_in_flint__mi_during_the_flint_water_crisis.MTW</a></p>
<p><em>The image of the Citizen's Bank Weather Ball is by the <a href="https://www.flickr.com/photos/michigancommunities/6109589699">Michigan Municipal League</a> and is licensed under <a href="https://creativecommons.org/licenses/by-nd/2.0/">this Creative Commons License</a></em>.</p>
Statistics in the News
Mon, 01 Feb 2016 13:00:00 +0000
http://blog.minitab.com/blog/statistics-and-quality-improvement/how-to-analyze-like-a-citizen-data-scientist-in-flint
Cody Steele
How to Compare Regression Slopes
http://blog.minitab.com/blog/adventures-in-statistics/how-to-compare-regression-lines-between-different-models
<p>If you perform linear regression analysis, you might need to compare different regression lines to see if their constants and slope coefficients are different. Imagine there is an established relationship between X and Y. Now, suppose you want to determine whether that relationship has changed. Perhaps there is a new context, process, or some other qualitative change, and you want to determine whether that affects the relationship between X and Y.</p>
<p>For example, you might want to assess whether the relationship between the height and weight of football players is significantly different than the same relationship in the general population.</p>
<p>You can graph the regression lines to visually compare the slope coefficients and constants. However, you should also statistically test the differences. <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics" target="_blank">Hypothesis testing</a> helps separate the true differences from the random differences caused by sampling error so you can have more confidence in your findings.</p>
<p>In this blog post, I’ll show you how to compare a relationship between different regression models and determine whether the differences are statistically significant. Fortunately, these tests are easy to do using <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">Minitab statistical software</a>.</p>
<p>In the example I’ll use throughout this post, there is an input variable and an output variable for a hypothetical process. We want to compare the relationship between these two variables under two different conditions. Here is the <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/File/569a0e7d067944f6f9147434794efcd6/comparingregressionmodels.MPJ">Minitab project file</a> with the data.</p>
Comparing Constants in Regression Analysis
<p>When the <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-how-to-interpret-the-constant-y-intercept" target="_blank">constants</a> (or y intercepts) in two different regression equations are different, this indicates that the two regression lines are shifted up or down on the Y axis. In the scatterplot below, you can see that the Output from Condition B is consistently higher than Condition A for any given Input value. We want to determine whether this vertical shift is statistically significant.</p>
<p><img alt="Scatterplot with two regression lines that have different constants." src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/2ed27f4204515bac9d9674c16fa0c0f7/scatter_constant_dift.png" style="width: 576px; height: 384px;" /></p>
<p>To test the difference between the constants, we just need to include a <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/introductory-concepts/data-concepts/cat-quan-variable/" target="_blank">categorical variable</a> that identifies the qualitative attribute of interest in the model. For our example, I have created a variable for the condition (A or B) associated with each observation.</p>
<p>To fit the model in Minitab, I’ll use: <strong>Stat > Regression > Regression > Fit Regression Model</strong>. I’ll include <em>Output</em> as the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/regression-and-correlation/regression-models/what-are-response-and-predictor-variables/" target="_blank">response variable</a>, <em>Input</em> as the continuous <a href="http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/regression-and-correlation/regression-models/what-are-response-and-predictor-variables/" target="_blank">predictor</a>, and <em>Condition</em> as the categorical predictor.</p>
<p>In the regression analysis output, we’ll first check the coefficients table.</p>
<p style="margin-left: 40px;"><img alt="Coefficients table that shows that the constants are different" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/23657868f2cf893d216d05d3400ab9e6/coeff_constant_dift.png" style="width: 369px; height: 117px;" /></p>
<p>This table shows us that the relationship between Input and Output is statistically significant because the p-value for Input is 0.000.</p>
<p>The <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-interpret-regression-analysis-results-p-values-and-coefficients" target="_blank">coefficient</a> for Condition is 10 and its <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-interpret-regression-analysis-results-p-values-and-coefficients" target="_blank">p-value</a> is significant (0.000). The coefficient tells us that the vertical distance between the two regression lines in the scatterplot is 10 units of Output. The p-value tells us that this difference is statistically significant—you can reject the null hypothesis that the distance between the two constants is zero. You can also see the difference between the two constants in the regression equation table below.</p>
<p style="margin-left: 40px;"><img alt="Regression equation table that shows constants that are different" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/a879996e37ebb05a297721e695a71943/equ_constant_dift.png" style="width: 305px; height: 113px;" /></p>
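<p>For readers who like to see the mechanics, here is a minimal Python sketch of what the Condition coefficient measures. It fits two parallel lines, with one common slope and a separate constant for each condition, by least squares. The data and the function name are hypothetical. I built the data with a vertical shift of 10 to mirror the example, but they are not the data in the project file. The sketch only estimates coefficients. Testing significance also requires standard errors and a t-distribution, which Minitab reports for you.</p>

```python
from collections import defaultdict


def fit_parallel_lines(x, y, group):
    """Least-squares fit with one common slope and one constant per group."""
    by_group = defaultdict(list)
    for xi, yi, gi in zip(x, y, group):
        by_group[gi].append((xi, yi))

    sxy = sxx = 0.0
    means = {}
    for g, pts in by_group.items():
        x_bar = sum(px for px, _ in pts) / len(pts)
        y_bar = sum(py for _, py in pts) / len(pts)
        means[g] = (x_bar, y_bar)
        # Pool the within-group variation to estimate the shared slope.
        sxy += sum((px - x_bar) * (py - y_bar) for px, py in pts)
        sxx += sum((px - x_bar) ** 2 for px, _ in pts)

    slope = sxy / sxx
    constants = {g: y_bar - slope * x_bar for g, (x_bar, y_bar) in means.items()}
    return slope, constants


# Hypothetical data: same slope in both conditions, B shifted up by 10.
inputs = [1, 2, 3, 4, 5] * 2
condition = ["A"] * 5 + ["B"] * 5
noise = [0.2, -0.1, 0.0, -0.2, 0.1] * 2
base = {"A": 5.0, "B": 15.0}
outputs = [base[c] + 2.0 * xi + e for xi, c, e in zip(inputs, condition, noise)]

slope, constants = fit_parallel_lines(inputs, outputs, condition)
shift = constants["B"] - constants["A"]  # plays the role of the Condition coefficient
print(f"Common slope: {slope:.3f}")
print(f"Vertical shift between the lines: {shift:.3f}")
```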
Comparing Coefficients in Regression Analysis
<p>When two slope coefficients are different, a one-unit change in a predictor is associated with different mean changes in the response. In the scatterplot below, it appears that a one-unit increase in Input is associated with a greater increase in Output in Condition B than in Condition A. We can <em>see</em> that the slopes look different, but we want to be sure this difference is statistically significant.</p>
<p><img alt="Scatterplot that shows two slopes that are different" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/200c12087fdf7eecd9b773d9ce213020/scatter_slope_dift.png" style="width: 576px; height: 384px;" /></p>
<p>How do you statistically test the difference between regression coefficients? It sounds like it might be complicated, but it is actually very simple. We can even use the same Condition variable that we did for testing the constants.</p>
<p>We need to determine whether the coefficient for Input depends on the Condition. In statistics, when we say that the effect of one variable depends on another variable, that’s an interaction effect. All we need to do is include the interaction term for Input*Condition!</p>
<p>In Minitab, you can specify interaction terms by clicking the <strong>Model</strong> button in the main regression dialog box. After I fit the regression model with the interaction term, we obtain the following coefficients table:</p>
<p style="margin-left: 40px;"><img alt="Coefficients table that shows different slopes" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/f06eff56f2266d0ff7e3919aa1292285/coeff_slope_dift.png" style="width: 410px; height: 154px;" /></p>
<p>The table shows us that the interaction term (Input*Condition) is statistically significant (p = 0.000). Consequently, we reject the null hypothesis and conclude that the difference between the two coefficients for Input (below, 1.5359 and 2.0050) does not equal zero. We also see that the main effect of Condition is not significant (p = 0.093), which indicates that the difference between the two constants is not statistically significant.</p>
<p style="margin-left: 40px;"><img alt="Regression equation table that shows different slopes" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/d5e5142c0ff13645d1dacc3e2c0bee27/equ_coeff_dift.png" style="width: 295px; height: 105px;" /></p>
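<p>Here is a minimal Python sketch of why the interaction term captures the difference in slopes. With a 0/1 indicator for Condition (an assumption about the coding), a model containing Input, Condition, and Input*Condition fits a separate line for each condition. The interaction coefficient is then the difference between the two slopes, and the Condition coefficient is the difference between the two constants. The data and function names are hypothetical, and the sketch does not compute p-values.</p>

```python
def fit_line(x, y):
    """Ordinary least-squares constant and slope for a single group."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def interaction_model_coefficients(x, y, condition, baseline="A", other="B"):
    """Coefficients of the interaction model, assuming 0/1 coding with `baseline` as 0."""
    def subset(label):
        xs = [xi for xi, c in zip(x, condition) if c == label]
        ys = [yi for yi, c in zip(y, condition) if c == label]
        return xs, ys

    a0, a1 = fit_line(*subset(baseline))
    b0, b1 = fit_line(*subset(other))
    return {
        "Constant": a0,
        "Input": a1,
        "Condition": b0 - a0,        # difference between the constants
        "Input*Condition": b1 - a1,  # difference between the slopes
    }


# Hypothetical data: same constant, but a steeper slope in condition B.
inputs = [1, 2, 3, 4, 5, 6] * 2
condition = ["A"] * 6 + ["B"] * 6
noise = [0.3, -0.2, 0.1, -0.1, 0.2, -0.3] * 2
true_slope = {"A": 1.5, "B": 2.0}
outputs = [10.0 + true_slope[c] * xi + e for xi, c, e in zip(inputs, condition, noise)]

for term, value in interaction_model_coefficients(inputs, outputs, condition).items():
    print(f"{term:<16} {value:8.4f}")
```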
<p>It is easy to compare and test the differences between the constants and coefficients in regression models by including a categorical variable. These tests are useful when you can see differences between regression models and you want to defend your conclusions with p-values.</p>
<p>If you're learning about regression, read my <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-tutorial-and-examples">regression tutorial</a>!</p>
Data Analysis, Hypothesis Testing, Regression Analysis, Statistics Help
Wed, 13 Jan 2016 13:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics/how-to-compare-regression-lines-between-different-models
Jim Frost
How to Add an "Update Data from My Database" Button to a Minitab Menu or Toolbar
http://blog.minitab.com/blog/understanding-statistics/how-to-add-an-update-data-from-my-database-button-to-a-minitab-menu-or-toolbar
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/874a002935f56765a5225f0d3e900bfc/320px_darpa_big_data_1_.jpg" style="margin: 15px 10px; float: right; width: 320px; height: 207px;" />Many of us have data stored in a database or file that we need to analyze on a regular basis. If you're in that situation and you're using Minitab Statistical Software, here's how you can save some time and effort by automating the process.</p>
<p>When you're finished, instead of using <strong>File > Query Database (ODBC)</strong> each time you want to perform analysis on the most up-to-date set of data, you can add a button to a menu or toolbar that will update the data. To do this you will need to:</p>
<p>A. <a href="http://support.minitab.com/en-us/minitab/17/topic-library/minitab-environment/input-output/import-data-from-a-database-with-odbc/create-a-reusable-odbc-exec-file/">Create an Exec (.MTB) file</a> that retrieves the data and replaces the current data.<br />
B. Add a shortcut to that file to either a menu or toolbar.</p>
Creating an Exec (.MTB) file
<p>First, I'll create a Minitab script or "exec" that pulls in new data to my worksheet. This is easier than it might sound. </p>
<p>1. Use <strong>File > Query Database (ODBC)</strong> to import the desired data. I have several fields that need to be updated, so I can just use <strong>File > Query Database (ODBC)</strong> repeatedly to pull required fields from multiple tables.</p>
<p>2. Open the <strong>History </strong>window by clicking the yellow notepad icon and select the ODBC commands/subcommands.</p>
<p>3. Right-click the selected commands and choose <strong>Save As...</strong></p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/eac084fa5a17d5d8cf647f7afbd12e45/copy_commands.png" style="width: 545px; height: 221px;" /></p>
<p>4. In the <strong>Save As...</strong> dialog box, choose <strong>Exec Files (*.MTB)</strong> from the Save as Type: drop-down. Choose a filename and location. For example, I'm going to save this as GetData.MTB on my desktop.</p>
<p>5. In Minitab, choose <strong>Tools > Notepad</strong>.</p>
<p>6. In Notepad, choose <strong>File > Open</strong>. Change <em>Files of Type</em>: to All Files, and open the .MTB file you just created.</p>
<p>7. Do the following for <em>each </em>ODBC command and corresponding subcommands: </p>
<ul>
<li><span style="line-height: 1.6;">Replace the period (.) at the end of the last subcommand with a semicolon (;).</span></li>
<li><span style="line-height: 1.6;">Add the following line below the last subcommand, including the period. In this example, 'Date' and 'Measurement' are the columns where I want to store the imported data; typically, these share the names of the fields they are imported from:</span></li>
</ul>
<p style="margin-left: 80px;">Columns 'Date' 'Measurement'.</p>
<p style="margin-left: 40px;">For example:</p>
<p style="margin-left: 40px;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d7caa1a84ccc0fc544d5770d983a6088/edit_exe.png" style="width: 600px; height: 279px;" /><br />
Make sure the column names you specify in the Columns subcommand already exist in the Minitab worksheet. You also can use column numbers such as C1 C2, without single-quotes. If you're importing many columns, instead of naming each one individually, you can specify a range like this: Columns C1-C10. </p>
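<p>If you have many exec files to edit, the change in step 7 can be scripted. Here is a minimal Python sketch that does it for text containing a single ODBC command. The function name and the sample command text are hypothetical stand-ins, and the commands Minitab saves from your History window will look different. If your file contains several ODBC commands, you would need to apply the same change to each one.</p>

```python
def add_columns_subcommand(exec_text, columns):
    """Turn the final period into a semicolon and append a Columns subcommand."""
    lines = exec_text.rstrip().splitlines()
    if not lines or not lines[-1].endswith("."):
        raise ValueError("expected the last subcommand to end with a period")

    # A semicolon tells Minitab that another subcommand follows.
    lines[-1] = lines[-1][:-1] + ";"

    # Column names go in single quotes; the final period ends the command.
    quoted = " ".join(f"'{name}'" for name in columns)
    lines.append(f"  Columns {quoted}.")
    return "\n".join(lines) + "\n"


# Illustrative placeholder text, not a real exported exec file.
sample_exec = '''ODBC;
  Connect "DSN=ExampleSource";
  SQLString "SELECT Date, Measurement FROM Readings".
'''

print(add_columns_subcommand(sample_exec, ["Date", "Measurement"]))
```

<p>As with the manual edit, the columns named here must already exist in the worksheet before you run the exec.</p>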
<p>8. Choose <strong>File > Save</strong> and then close Notepad. This exec will run the commands and update my data sheet each time it is run.</p>
<p>But I want to make it even easier. Instead of opening the script when I want to use it, I want to be able to just select it from a menu.</p>
Adding a Shortcut to a Minitab Menu
<p>To add the .MTB file to a menu in Minitab, I do the following:</p>
<p>1. Choose <strong>Tools > Customize</strong>.</p>
<p>2. Click the <strong>Tools </strong>tab.</p>
<p>3. Click <img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ba397856ae9de1bce980dbcad5f8cc36/new_button.png" style="width: 22px; height: 18px;" /> for <strong>New (Insert)</strong> as shown. If you hover the cursor over the button, the ToolTip displays New (Insert).</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0e7673e0bc7fe54646c7d6be300922fc/new_insert_dialog.png" style="width: 447px; height: 380px;" /></p>
<p>4. Enter a name for the button, and then press [Enter]. (For example, enter <em>Get My Data</em>.)</p>
<p>5. Click <img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2f7a4e6186003ef318ec38e305f2fc77/open_button.png" style="width: 14px; height: 16px;" /> to view the Open files dialog box. <span style="line-height: 1.6;">From <strong>Files of type</strong>, choose <strong>All Files (*.*)</strong>, then navigate to the .MTB file and double-click it. The dialog box will look like this:</span></p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6ca1ad04a801fcb1e88f46db1a2bb7e0/new_insert_dialog2.png" style="line-height: 20.8px; width: 447px; height: 380px;" /></p>
<p>6. Click <strong>Close</strong>. <span style="line-height: 1.6;">Now I can run the macro by choosing <strong>Tools > Get My Data</strong>.</span></p>
<p style="margin-left: 40px;"><span style="line-height: 1.6;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0a8c6deb52e205071af7a7613a97265c/new_command_menu.png" style="width: 173px; height: 239px;" /></span></p>
<p>I can also add the macro to a menu other than Tools. </p>
Adding a Button to a Minitab Toolbar
<p>But now that I think about it, I really don't even want to bother with a menu. I'd prefer to just click on a button and have my data updated automatically. It's easy to do. </p>
<p>7. Choose <strong>Tools > Customize</strong>.</p>
<p>8. On the <strong>Commands</strong> tab, under <strong>Categories</strong>, choose <strong>Tools</strong>. Note: If you did not complete steps 5 and 6, the macro will not yet appear in the list.</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/00ae8696cb48ced3a1755b1b30f23419/customize_commands.png" style="width: 447px; height: 380px;" /></p>
<p>9. Click and drag <em>Get My Data</em> to the desired place on a menu or toolbar.</p>
<p>Basically, that's it. However, <span style="line-height: 1.6;">you can change what is displayed on the toolbar by right-clicking the button or text while the </span><strong style="line-height: 1.6;">Tools > Customize</strong><span style="line-height: 1.6;"> dialog box is open. You can select </span><strong style="line-height: 1.6;">Image</strong><span style="line-height: 1.6;">, </span><strong style="line-height: 1.6;">Text</strong><span style="line-height: 1.6;">, or </span><strong style="line-height: 1.6;">Image and Text</strong><span style="line-height: 1.6;">. </span></p>
<p>To change the image that is displayed, choose <strong>Edit Button Image</strong>. To change the text that is displayed, choose <strong>Name Button</strong>. As shown below, I have inserted a red button with a circular arrow in the main toolbar, and named it "Get My Data." </p>
<p style="margin-left: 40px;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f3d102ef59aa0279287cdcccafe79280/new_button_in_toolbar.png" style="width: 137px; height: 119px;" /></p>
<p>Now I can update my data at any time by clicking on the new button. And if you've been following along, so can you! If you don't already have Minitab Statistical Software and you'd like to give it a try, <a href="http://www.minitab.com/en-us/products/minitab/free-trial/">download the free 30-day trial</a>. </p>
Data Analysis, Quality Improvement
Mon, 21 Dec 2015 13:05:00 +0000
http://blog.minitab.com/blog/understanding-statistics/how-to-add-an-update-data-from-my-database-button-to-a-minitab-menu-or-toolbar
Eston Martz
Approaching Statistics as a Language
http://blog.minitab.com/blog/understanding-statistics/approaching-statistics-as-a-language
<p><span style="line-height: 1.6;">Not long ago, I couldn’t abide statistics. I did </span><em style="line-height: 1.6;">respect</em><span style="line-height: 1.6;"> it, but in much the same way a gazelle respects a lion. Most of my early experiences with statistics indicated that close encounters resulted in pain, so I avoided further contact whenever possible.</span></p>
<p>So how is it that today I write about statistics? That’s simple: it merely required completely reinventing the way I thought about and approached the discipline. When I decided to approach it as a language rather than a purely mathematical set of skills, the doors opened.</p>
<p>Why does my experience matter to you? If you're a statistician yourself, you know all too well the typical reactions people have when they learn we work with statistics and data analysis: blank stares, uncomfortable silence, horrible jokes, or some variant of, “Oh, how nice,” followed quickly by, “Excuse me, I'm going somewhere I don’t have to talk about statistics.”</p>
<p><img alt="tower of babel" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c5f78092696aa71ae39edad48a262792/tower_of_babel.png" style="margin: 15px; float: right; width: 300px; height: 247px;" />People react this way because they’re intimidated by statistics. Maybe <em>you're</em> intimidated by statistics. I certainly used to be...I thought it was too hard to understand. I thought that I'd forgotten what I needed to know. Sometimes I suspected that maybe I just wasn't smart enough to get it. Then I realized I was in a Tower of Babel situation: I just didn't speak the <em>language </em>of statistics. </p>
<p>Maybe my experience in actually coming to <em>love</em> statistics will resonate with you. Approaching statistics as a kind of conceptual language—rather than a peculiarly ambivalent branch of mathematics—may offer a path to make data analysis more accessible to more people, or at least help us do a better job of communicating with our fellow humans who <em>don’t</em> love statistics.</p>
Stalked by Statistics?
<p><span style="line-height: 1.6;">Straight out of college, I was hired as a feature writer for a science magazine. A few years later I was editing the magazine myself. But in some respects I felt like a gazelle glimpsing a lion’s tail in the grass: my environment delivered constant reminders that statistics existed. The science journals were full of them, scientists cited statistics constantly, and I needed to write about them in every article I did.</span></p>
<p>I realized I needed to confront my dysfunctional relationship with statistics. So as a seasoned, professional editor, filled with trepidation, I enrolled in a basic statistics course. <span style="line-height: 1.6;">Now I felt like a gazelle trying to tiptoe quietly through the lion’s den. I was terrified but determined to pass, at least. </span><span style="line-height: 1.6;">When I received an A, I couldn’t believe it. What had changed? </span></p>
<p>I realized I no longer saw statistics through a mathematical lens. I had come to recognize statistics as a way to describe, understand, and communicate about the world, just like other languages. </p>
Calculations and Concepts
<p>Once I began thinking of statistics as a language that enriches how we know and experience life, it immediately became less threatening. I enrolled in subsequent statistics courses, and completed a master’s degree in applied statistics almost before I realized it.</p>
<p>Mathematics was a core element of these studies, of course, but I loved that simply solving equations wasn’t the ultimate goal: the <em>meaning</em> of the solution was what counted, and the numbers were just a tool to get there. I had never enjoyed math, but I loved statistics. The difference was that in statistics, doing the math correctly is only the beginning.</p>
<p>The real effort comes next: understanding, interpreting, and communicating the implications of our results, including any conditions, caveats, and shortcomings. Given that statistics deals with probability, every analysis has elements of ambiguity and uncertainty. <span style="line-height: 20.8px;">Our models are never <em>complete</em>. There is always another factor to consider, another way to evaluate and dissect the data, another sample to take, or another method that could be applied.</span> <span style="line-height: 1.6;">That's not unlike the study of literature, where there is always another lens through which to refract the text, another frame of reference through which it can be interpreted. </span></p>
<p>Statisticians know the challenges involved in communicating what it is we do. Many people see statistics as inaccessible, esoteric, and intimidating—and in fairness, many statistical concepts <em>are </em>difficult to grasp. </p>
<p>Maybe it’s incumbent on us to be better translators for this strange language we’ve adopted. One of the ways we've tried to make data analysis more accessible for more people is by adding the Assistant to Minitab Statistical Software, so people can get their <a href="http://www.minitab.com/products/minitab/assistant/">statistical results in plain language</a>. </p>
<p>What else could we, as companies and as individuals, be doing to make more people more comfortable with our data-driven world? </p>
Statistics, Stats
Tue, 15 Dec 2015 13:00:00 +0000
http://blog.minitab.com/blog/understanding-statistics/approaching-statistics-as-a-language
Eston Martz