Data Analysis Software | Minitab
Blog posts and articles with tips for using statistical software to analyze data for quality improvement.
http://blog.minitab.com/blog/data-analysis-software/rss
Wed, 29 Mar 2017 20:53:04 +0000
FeedCreator 1.7.3
What to Do When Your Data's a Mess, part 3
http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-datas-a-mess2c-part-3
<p>Everyone who analyzes data regularly has the experience of getting a worksheet that just isn't ready to use. Previously I wrote about tools you can use to <a href="http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-data-is-a-mess-part-1">clean up and eliminate clutter in your data</a> and <a href="http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-datas-a-mess2c-part-2">reorganize your data</a>. </p>
<p><span style="line-height: 1.6;">In this post, I'm going to highlight tools that help you get the most out of messy data by altering its characteristics.</span></p>
Know Your Options
<p>Many problems with data don't become obvious until you begin to analyze it. A shortcut or abbreviation that seemed to make sense while the data was being collected, for instance, might turn out to be a time-waster in the end. What if abbreviated values in the data set only make sense to the person who collected it? Or what if a column of numeric data accidentally gets coded as text? You can solve problems like these quickly with <a href="http://www.minitab.com/products/minitab">statistical software</a> packages.</p>
Change the Type of Data You Have
<p>Here's an instance where a data entry error resulted in a column of numbers being incorrectly classified as text data. That severely limits the types of analysis you can perform with the data.</p>
<p><img alt="misclassified data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c45b427d3e5e2b5eac4a505ed5c3b24f/misclassified_data.png" style="width: 200px; height: 156px;" /></p>
<p>To fix this, select <strong>Data > Change Data Type</strong> and use the dialog box to choose the column you want to change.</p>
<p><img alt="change data type menu" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/46ece127300500409098383a2e476a9b/text_to_numeric_data.png" style="width: 376px; height: 175px;" /></p>
<p>One click later, and the errant text data has been converted to the desired numeric format:</p>
<p><img alt="numeric data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f1b9df0211f9085e577a41b0e3661b45/numeric_data.png" style="width: 200px; height: 156px;" /></p>
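<p>If you prepare data outside Minitab, the same fix can be sketched in a few lines of Python. This is a minimal illustration, not how Minitab implements <strong>Change Data Type</strong>: it assumes the column arrives as a list of strings, and <code>to_numeric</code> is a hypothetical helper. Entries that cannot be parsed are marked as missing and reported, so a stray comma or letter doesn't go unnoticed.</p>

```python
def to_numeric(values):
    """Convert text entries to floats.

    Returns (numbers, problems): numbers holds a float, or None (missing), for
    each entry, and problems lists (row_index, original_text) pairs for the
    entries that could not be parsed.
    """
    numbers, problems = [], []
    for row, text in enumerate(values):
        try:
            numbers.append(float(text.strip()))
        except ValueError:
            numbers.append(None)  # mark as missing instead of guessing a value
            problems.append((row, text))
    return numbers, problems


# Hypothetical text column; the last entry uses a comma by mistake
column = ["12.5", " 13.1", "12.9", "13,0"]
numbers, problems = to_numeric(column)
print(numbers)   # [12.5, 13.1, 12.9, None]
print(problems)  # [(3, '13,0')]
```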
Make Data More Meaningful by Coding It
<p>When this company collected data on the performance of its different functions across all its locations, it used numbers to represent both locations and units. </p>
<p><img alt="uncoded data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d22a57fe9e9e398bd948e86c0adafe34/uncoded_data.png" style="width: 135px; height: 158px;" /></p>
<p>That may have been a convenient way to record the data, but unless you've memorized what each set of numbers stands for, interpreting the results of your analysis will be a confusing chore. You can make the results easy to understand and communicate by coding the data. </p>
<p>In this case, we select <strong>Data > Code > Numeric to Text...</strong></p>
<p><img alt="code data menu" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c75e46cc190497fd41b0e6736518c0fe/code_data_menu.png" style="width: 384px; height: 255px;" /></p>
<p>And we complete the dialog box as follows, telling the software to replace the numbers with more meaningful information, like the town each facility is located in. </p>
<p><img alt="Code data dialog box" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/cd75c14324187806b8f3a74a3b8996b4/code_data_dialog.png" style="width: 400px; height: 345px;" /></p>
<p>Now you have data columns that can be understood by anyone. When you create graphs and figures, they will be clearly labeled. </p>
<p><img alt="Coded data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/7ff81bdb08170d6d8a4e8547623cf557/coded_data.png" style="width: 161px; height: 200px;" /></p>
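<p>Coding is essentially a lookup table from each number to its label. A minimal Python sketch of the idea follows; the codes and town labels are hypothetical placeholders, since the actual mapping entered in the dialog box isn't reproduced in the text. Codes missing from the table are flagged rather than silently dropped.</p>

```python
# Hypothetical codes; the real labels come from the Code dialog box
LOCATION_CODES = {1: "Town A", 2: "Town B", 3: "Town C"}


def code_numeric_to_text(values, codes):
    """Replace each numeric code with its text label.

    Codes that are not in the table are labeled "Unknown (<code>)" so they
    stand out instead of silently disappearing.
    """
    return [codes.get(value, f"Unknown ({value})") for value in values]


locations = [1, 3, 2, 1, 4]
print(code_numeric_to_text(locations, LOCATION_CODES))
# ['Town A', 'Town C', 'Town B', 'Town A', 'Unknown (4)']
```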
Got the Time?
<p>Dates and times can be very important in looking at performance data and other indicators that might have a cyclical or time-sensitive effect. But the way the date is recorded in your data sheet might not be exactly what you need. </p>
<p>For example, if you wanted to see if the day of the week had an influence on the activities in certain divisions of your company, a list of dates in the MM/DD/YYYY format won't be very helpful. </p>
<p><img alt="date column" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f5b0dd178afbc0352f8dc2d9378e887b/date_column.png" style="width: 240px; height: 223px;" /></p>
<p>You can use <strong>Data > Date/Time > Extract to Text... </strong>to identify the day of the week for each date.</p>
<p><img alt="extract-date-to-text" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/7e6f7e8a87ee8291b9c6d51507092c19/extract_date_to_text.png" style="width: 351px; height: 132px;" /></p>
<p>Now you have a column that lists the day of the week, and you can easily use it in your analysis. </p>
<p><img alt="day column" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dede93c9621917a0cfb54beef121d4e2/day_column.png" style="width: 249px; height: 205px;" /></p>
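<p>The same extraction can be sketched in Python with the standard <code>datetime</code> module, assuming the dates are stored as MM/DD/YYYY text. The helper name <code>day_of_week</code> is hypothetical, and <code>%A</code> returns weekday names in the current locale (English in the default C locale).</p>

```python
from datetime import datetime


def day_of_week(date_text, fmt="%m/%d/%Y"):
    """Parse a date stored as text (MM/DD/YYYY by default) and return its weekday name."""
    return datetime.strptime(date_text, fmt).strftime("%A")


dates = ["03/27/2017", "03/28/2017", "04/01/2017"]
weekdays = [day_of_week(d) for d in dates]
print(weekdays)  # ['Monday', 'Tuesday', 'Saturday']
```

<p>Passing a different <code>fmt</code> string lets the same helper handle other layouts, such as ISO dates (<code>"%Y-%m-%d"</code>).</p>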
Manipulating for Meaning
<p>These tools are commonly seen as a way to correct data-entry errors, but as we've seen, you can use them to make your data sets more meaningful and easier to work with.</p>
<p>There are many other tools available in Minitab's Data menu, including an array of options for arranging, combining, dividing, fine-tuning, rounding, and otherwise massaging your data to make it easier to use. Next time you've got a column of data that isn't quite what you need, try using the Data menu to get it into shape.</p>
Categories: Data Analysis, Statistics, Stats
Published: Tue, 28 Mar 2017 12:00:00 +0000
http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-datas-a-mess2c-part-3
Author: Eston Martz
Trouble Starting an Analysis? Graph Your Data with an Individual Value Plot
http://blog.minitab.com/blog/understanding-statistics/trouble-starting-an-analysis-graph-your-data-with-an-individual-value-plot
<p><span style="line-height: 1.6;">You've collected a bunch of data. It wasn't easy, but you did it. Yep, there it is, right there...just look at all those numbers, right there in neat columns and rows. Congratulations. </span></p>
<p><span style="line-height: 1.6;">I hate to ask...but what are you going to <em>do</em> with your data? </span></p>
<p><span style="line-height: 1.6;">If you're not sure precisely <em>what </em>to do with the data you've got, graphing it is a great way to get some valuable insight and direction. And a good graph to start with is an individual value plot, which you can create in Minitab <a href="http://www.minitab.com/products/minitab">Statistical Software</a> by going to <strong>Graph > Individual Value Plot</strong>. </span></p>
<span style="line-height: 20.7999992370605px;">How can individual value plots help me?</span>
<p><span style="line-height: 1.6;">There are <span><a href="http://blog.minitab.com/blog/understanding-statistics/seven-alternatives-to-pie-charts">other graphs</a></span> you could start with, so what makes the individual value plot such a strong contender? It lets you view important data features, find miscoded values, and identify unusual cases. </span></p>
<p>In other words, taking a look at an individual value plot can help you to choose the appropriate direction for your analysis and to avoid wasted time and frustration.</p>
<p><strong>IDENTIFY INDIVIDUAL VALUES</strong></p>
<p>Many people like to look at their data in boxplots, and you can learn many valuable things from those graphs. Unlike boxplots, individual value plots display all data values and may be more informative than boxplots for small amounts of data.</p>
<p><img alt="boxplot of length" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/49712c7981ed83a0d5e9a678f783cd20/ivp1_boxplot_of_length.png" style="width: 576px; height: 384px;" /></p>
<p>The boxplots for the two batches look identical.</p>
<p><img alt="individual value plot" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/15420a5ce214daebe193faac3fb1d74c/ivp2_individual_value_plot_of_length.png" style="width: 576px; height: 384px;" /></p>
<p>The individual value plot of the same data shows that there are many more values for Batch 1 than for Batch 2.</p>
<p>You can use individual value plots to identify possible outliers and other values of interest. Hover the cursor over any point to see its exact value and position in the worksheet.</p>
<p><img alt="clustered data distribution" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ab3edfd6d0898085a25d56bc0684986b/ivp3_outlier_selected.png" style="width: 573px; height: 380px;" /></p>
<p>Individual value plots can also clearly illustrate characteristics of the data distribution. In this graph, most values are in a cluster between 4 and 10. Minitab can jitter (randomly nudge) the points horizontally, so that one value doesn’t obscure another. You can edit the plot to turn on or turn off jitter.</p>
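<p>Jitter changes only where a point is drawn horizontally, never the value it represents. The Python sketch below illustrates the idea; it is not Minitab's algorithm, and the offset width, random seed, and the helper name <code>jitter_positions</code> are assumptions. Each group is centered on its own x position, and identical values spread out instead of stacking on top of each other.</p>

```python
import random


def jitter_positions(groups, width=0.1, seed=1):
    """Return (label, x, y) points for an individual value plot with jitter.

    groups maps a group label to its list of values. Each group is drawn at
    x = its position in the mapping, nudged by a uniform random offset in
    [-width, width]. The y values are never changed.
    """
    rng = random.Random(seed)
    points = []
    for index, (label, values) in enumerate(groups.items()):
        for y in values:
            x = index + rng.uniform(-width, width)
            points.append((label, x, y))
    return points


# Hypothetical lengths; the two 5.0 values in Batch 1 no longer overlap exactly
points = jitter_positions({"Batch 1": [5.0, 5.0, 6.2], "Batch 2": [5.0]})
for label, x, y in points:
    print(f"{label}: x={x:.3f}, y={y}")
```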
<p><strong>MAKE GROUP COMPARISONS</strong></p>
<p>Because individual value plots display all values for all groups at the same time, they are especially helpful when you compare variables, groups, and even subgroups.</p>
<p><img alt="time vs. shift plot" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e78b561b70c524bbbe1dc8e262f11556/ivp4_individual_value_plot_of_diameter.png" style="width: 576px; height: 384px;" /></p>
<p>This plot shows the diameter of pipes from two lines over four shifts. You can see that the diameters of pipes produced by Line 1 seem to increase in variability across shifts, while the diameters of pipes from Line 2 appear more stable.</p>
<p><strong>SUPPORT OTHER ANALYSES</strong></p>
<p>An individual value plot is one of the built-in graphs that are available with many Minitab statistical analyses. You can easily display an individual value plot while you perform these analyses. In the analysis dialog box, simply click <strong>Graphs</strong> and check <strong>Individual Value Plot</strong>.</p>
<p>Some built-in individual value plots include specific analysis information. For example, the plot that accompanies a 1-sample t-test displays the 95% confidence interval for the mean and the reference value for the null hypothesis mean. These plots give you a graphical representation of the analysis results.</p>
<p><img alt="horizontal plot " src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/034da921e1e89e0d4ba22ce0556517a8/ivp5_individual_value_plot_of_diameter.png" style="width: 576px; height: 384px;" /></p>
<p>This plot accompanies a 1-sample t-test. All of the data values are between 4.5 and 5.75. The reference mean lies outside of the confidence interval, which suggests that the population mean differs from the hypothesized value.</p>
Individual Value Plot: A Case Study
<p>Suppose that salad dressing is bottled by four different machines and that you want to make sure that the bottles are filled correctly to 16 ounces. You weigh 30 samples from each machine. You plan to run an ANOVA to see if the means of the samples from each machine are equal. But, first, you display an individual value plot of the samples to get a better understanding of the data.</p>
<p><img alt="data" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c6542a20ef538f767d7721e64599c8d6/ivp7_data.jpg" style="line-height: 20.8px; width: 192px; height: 200px;" /></p>
<p>Choose <strong>Graph > Individual Value Plot</strong>.<br />
Under <strong>One Y</strong>, choose <strong>With Groups</strong>.<br />
Click <strong>OK</strong>.<br />
In <strong>Graph variables</strong>, enter <em>Weight</em>.<br />
In <strong>Categorical variables for grouping</strong>, enter <em>Machine</em>.<br />
Click <strong>Data View</strong>.<br />
Under <strong>Data Display</strong>, check <strong>Interval bar</strong> and <strong>Mean symbol</strong>.<br />
Click <strong>OK</strong> in each dialog box.</p>
<p><img alt="individual value plot of weight" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fa1af70661f2f3e8ebbba090bee4f5f9/ivp8_individual_value_plot_of_weight.png" style="width: 576px; height: 384px;" /></p>
<p>The mean fill weight is about 16 ounces for Fill2, Fill3, and Fill4, with no suspicious data points. For Fill1, however, the mean appears higher, with a possible outlier at the lower end.</p>
<p>Before you continue with the analysis, you may want to investigate problems with the Fill1 machine.</p>
Putting individual value plots to use
<p>Use Minitab’s individual value plot to get a quick overview of your data before you begin your analysis—especially if you have a small data set or if you want to compare groups. The insight that you gain can help you to decide what to do next and may save you time exploring other paths.</p>
<p>For more information on individual value plots and other Minitab graphs, see <a href="http://support.minitab.com/en-us/minitab/17/">Minitab Help</a>.</p>
Categories: Data Analysis, Statistics, Statistics Help, Stats
Published: Thu, 23 Mar 2017 12:00:00 +0000
http://blog.minitab.com/blog/understanding-statistics/trouble-starting-an-analysis-graph-your-data-with-an-individual-value-plot
Author: Eston Martz
What to Do When Your Data's a Mess, part 2
http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-datas-a-mess2c-part-2
<p><span style="line-height: 1.6;">In my last post, I wrote about making a cluttered data set easier to work with by removing unneeded columns entirely, and by displaying just those columns you want to work with <em>now</em>. But <a href="http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-data-is-a-mess-part-1">too much unneeded data</a> isn't always the problem. </span></p>
<p><span style="line-height: 1.6;">What can you do when someone gives you data that isn't organized the way you need it to be? </span></p>
<p><span style="line-height: 1.6;">That happens for a variety of reasons, but most often it's because the simplest way for people to collect data is in a format that is difficult to work with in a worksheet. Most <a href="http://www.minitab.com/products/minitab">statistical software</a> will accept a wide range of data layouts, but just because a layout is readable doesn't mean it will be easy to analyze.</span></p>
<p><span style="line-height: 1.6;">You may not be in control of how your data were collected, but you can use tools like sorting, stacking, and ordering to put your data into a format that makes sense and is easy for you to use. </span></p>
Decide How You Want to Organize Your Data
<p>Depending on how it's arranged, the same data can be easier to work with and simpler to understand, and can even yield deeper and more sophisticated insights. I can't tell you the best way to organize your specific data set, because that will depend on the types of analysis you want to perform and the nature of the data you're working with. However, I can show you some easy ways to rearrange your data into the form that you select. </p>
Unstack Data to Make Multiple Columns
<p>The data below show concession sales for different types of events held at a local theater. </p>
<p><img alt="stacked data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8ea617d9de8138f26f2da0f3f95f4b88/stackedata.png" style="width: 202px; height: 188px;" /></p>
<p><span style="line-height: 20.7999992370605px;">If we want to perform an analysis that requires each type of event to be in its own column, we can choose <strong>Data > Unstack Columns...</strong> and complete the dialog box as shown: </span></p>
<p><span style="line-height: 20.7999992370605px;"><img alt="unstack columns dialog" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fc098d3ddcbc21fe12602cb45336949c/unstack_columns.png" style="width: 350px; height: 263px;" /> </span></p>
<p>Minitab creates a new worksheet that contains a separate column of Concessions sales data for each type of event:</p>
<p><img alt="Unstacked Data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f24dd4ac29678e25069d299ccc13c535/unstacked_data.png" style="width: 400px; height: 150px;" /></p>
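<p>Conceptually, unstacking groups one column of values by a second column of labels. Here is a minimal Python sketch with hypothetical sales figures and event names (the worksheet values aren't reproduced in the text); <code>unstack</code> is an illustrative helper, not a Minitab command. Note that the resulting columns can have different lengths.</p>

```python
from collections import defaultdict


def unstack(values, subscripts):
    """Split one data column into separate columns, one per subscript value."""
    columns = defaultdict(list)
    for value, key in zip(values, subscripts):
        columns[key].append(value)
    return dict(columns)  # keys keep the order in which each label first appears


# Hypothetical concession sales and the event type for each row
sales = [410, 385, 120, 560, 140]
events = ["Concert", "Play", "Lecture", "Concert", "Lecture"]
print(unstack(sales, events))
# {'Concert': [410, 560], 'Play': [385], 'Lecture': [120, 140]}
```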
Stack Data to Form a Single Column (with Grouping Variable)
<p>A similar tool will help you put data from separate columns into a single column for the type of analysis required. The data below show sales figures for four employees: </p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f546e2611e4fd6fe804de7c0aee3d230/stacked_data.png" style="width: 265px; height: 92px;" /></p>
<p>Select <strong>Data > Stack > Columns...</strong> and select the columns you wish to combine. Checking "Use variable names in subscript column" will create a second column that identifies the person who made each sale. </p>
<p><img alt="Stack columns dialog" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a09dba196e68e5e75d0f248339a53e11/stack_data_dialog.jpg" style="width: 400px; height: 292px;" /></p>
<p>When you press OK, the sales data are stacked into a single column of measurements and ready for analysis, with Employee available as a grouping variable: </p>
<p><img alt="stacked columns" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c26bec8bec9447ab1df6b9ad669d9a1a/stacked_columns.jpg" style="width: 138px; height: 181px;" /></p>
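<p>Stacking is the reverse operation: concatenate the columns and record where each value came from. A minimal Python sketch, with hypothetical employee names and sales figures; the dictionary keys play the role of the column names that end up in the subscript column.</p>

```python
def stack(columns):
    """Combine named columns into one data column plus a subscript column."""
    data, subscripts = [], []
    for name, values in columns.items():
        data.extend(values)
        subscripts.extend([name] * len(values))  # one label per stacked value
    return data, subscripts


# Hypothetical sales figures for two employees
sales_by_employee = {"Employee A": [230, 180], "Employee B": [310]}
data, employee = stack(sales_by_employee)
print(data)      # [230, 180, 310]
print(employee)  # ['Employee A', 'Employee A', 'Employee B']
```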
Sort Data to Make It More Manageable
<p>The following data appear in the worksheet in the order in which individual stores in a chain sent them to the central accounting system.</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/431dcae640fa0855a8db03b14bad3998/unsorted_data.jpg" style="width: 200px; height: 228px;" /></p>
<p>When the data appear in this uncontrolled order, finding an observation for any particular item, or from any specific store, would entail reviewing the entire list. We can fix that problem by selecting <strong>Data > Sort...</strong> and reordering the data by either store or item. </p>
<p><img alt="sorted data by item" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0c982bb11359a001c048cb6c39ab1f60/sorted_data_by_item.jpg" style="width: 221px; height: 246px;" /> <img alt="sorted data by store" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/53e9a3f22b4a959af11952995703d7d4/sorted_data_by_store.jpg" style="width: 209px; height: 248px;" /></p>
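<p>Sorting on more than one column is easiest to picture as sorting on a tuple key: the first element decides the order, and later elements break ties. This Python sketch uses hypothetical rows. One caveat that applies to any text sort: labels such as "Store 10" sort before "Store 2" unless the numbers are stored as numbers.</p>

```python
# Hypothetical rows of (store, item, units), in the order they arrived
rows = [
    ("Store 3", "Widgets", 14),
    ("Store 1", "Gadgets", 9),
    ("Store 2", "Widgets", 11),
    ("Store 1", "Widgets", 7),
    ("Store 2", "Gadgets", 5),
]

# Sort by item, then by store within each item
by_item = sorted(rows, key=lambda row: (row[1], row[0]))

# Sort by store, then by item within each store
by_store = sorted(rows, key=lambda row: (row[0], row[1]))

for row in by_item:
    print(row)
```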
Merge Multiple Worksheets
<p>What if you need to analyze information about the same items, but it was recorded on separate worksheets? For instance, what if one group was gathering historic data about all of a corporation's manufacturing operations while another was working on strategic planning, and your analysis required data from each? </p>
<p><img alt="two worksheets" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f63ed557c91fb6136b28ab43001b48b4/two_worksheets.png" style="width: 350px; height: 327px;" /></p>
<p>You can use <strong>Data > Merge Worksheets</strong> to bring the data together into a single worksheet, using the Division column to match the observations:</p>
<p><img alt="merging worksheets" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/651d3d676a4099a71eb180344d2e8282/merge_worksheets.png" style="width: 393px; height: 363px;" /></p>
<p>You can also choose whether or not <span style="line-height: 20.7999992370605px;">multiple</span><span style="line-height: 1.6;">, missing, or unmatched observations will be included in the merged worksheet. </span></p>
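<p>Merging on a key column amounts to a join. The Python sketch below is a simplified illustration, not Minitab's implementation: it assumes each Division appears at most once in the second worksheet, and the helper name and sample rows are hypothetical. The <code>keep_unmatched</code> flag loosely mirrors the choice of whether unmatched observations are included.</p>

```python
def merge_by_key(left, right, key, keep_unmatched=False):
    """Join two lists of row dicts on a shared key column.

    Assumes each key value appears at most once in `right`. Rows found in
    both worksheets are combined; if keep_unmatched is True, rows found in
    only one worksheet are kept, with the other worksheet's fields absent.
    """
    right_index = {row[key]: row for row in right}
    merged, matched = [], set()
    for row in left:
        other = right_index.get(row[key])
        if other is not None:
            merged.append({**row, **other})
            matched.add(row[key])
        elif keep_unmatched:
            merged.append(dict(row))
    if keep_unmatched:
        merged.extend(dict(row) for row in right if row[key] not in matched)
    return merged


# Hypothetical worksheets sharing a Division column
history = [{"Division": "North", "Output": 120}, {"Division": "South", "Output": 95}]
plans = [{"Division": "North", "Target": 130}, {"Division": "West", "Target": 80}]
print(merge_by_key(history, plans, "Division"))
print(merge_by_key(history, plans, "Division", keep_unmatched=True))
```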
Reorganizing Data for Ease of Use and Clarity
<p>Making changes to the layout of your worksheet does entail a small investment of time, but it can bring big returns in making analyses quicker and easier to perform. The next time you're confronted with raw data that isn't ready to play nice, try some of these approaches to get it under control. </p>
<p>In my next post, I'll share some tips and tricks that can help you get more information out of your data.</p>
Categories: Data Analysis, Statistics, Stats
Published: Wed, 22 Mar 2017 12:00:00 +0000
http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-datas-a-mess2c-part-2
Author: Eston Martz
What to Do When Your Data's a Mess, part 1
http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-data-is-a-mess-part-1
<p>Isn't it great when you get a set of data and it's perfectly organized and ready for you to analyze? I love it when the people who collect the data take special care to format it consistently, arrange it correctly, and eliminate the junk, clutter, and useless information I don't need. </p>
<p><img alt="Messy Data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ad531bc1c0dc575e774b7ecef670b231/messydata.png" style="border-width: 1px; border-style: solid; margin: 10px 15px; width: 250px; height: 248px; float: right;" />You've never received a data set in such perfect condition, you say?</p>
<p>Yeah, me neither. But I can dream, right? </p>
<p><span style="line-height: 1.6;">The truth is, when other people give me data, it's typically not ready to analyze. It's frequently messy, disorganized, and inconsistent. I get big headaches if I try to analyze it without doing a little clean-up work first. </span></p>
<p>I've talked with many people who've shared similar experiences, so I'm writing a series of posts on how to get your data in usable condition. In this first post, I'll talk about some basic methods you can use to make your data easier to work with. </p>
Preparing Data Is a Little Like Preparing Food
<p>I'm not complaining about the people who give me data. In most cases, they aren't statisticians and they have many higher priorities than giving me data in exactly the form I want. </p>
<p>The end result is that getting data is a little bit like getting food: it's not always going to be ready to eat when you pick it up. You don't eat raw chicken, and usually you can't analyze raw data, either. <span style="line-height: 20.7999992370605px;"> </span><span style="line-height: 1.6;">In both cases, you need to prepare it first or the results aren't going to be pretty. </span></p>
<p><span style="line-height: 1.6;">Here are a couple of very basic things to look for when you get a messy data set, and how to handle them. </span></p>
<span style="line-height: 1.6;">Kitchen-Sink Data and Information Overload</span>
<p>Frequently I get a data set that includes a lot of information that I don't need for my analysis. I also get data sets that combine or group information in ways that make analyzing it more difficult. </p>
<p>For example, let's say I needed to analyze data about different types of events that take place at a local theater. Here's my raw data sheet: </p>
<p><img alt="April data sheet" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/14fe4e9930171f54848b589c0e8139d1/april_data_raw.png" style="width: 400px; height: 224px;" /></p>
<p>With each type of event jammed into a single worksheet, it's a challenge to analyze just one event category. What would work better? A separate worksheet for each type of occasion. In Minitab <a href="http://www.minitab.com/products/minitab">Statistical Software</a>, I can go to <strong>Data > Split Worksheet...</strong> and choose the Event column: </p>
<p><img alt="split worksheet" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/69c63e422339f9871ada5a244222dcfc/split_worksheet.png" style="width: 300px; height: 309px;" /></p>
<p>And Minitab will create new worksheets that include only the data for each type of event. </p>
<p><img alt="separate worksheets by event type" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8b97ea00ae39da8cb60e307ebe6140dc/separate_data_sheets.png" style="width: 300px; height: 243px;" /></p>
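<p>Splitting a worksheet keeps whole rows together and groups them by the value in one column. A minimal Python sketch with hypothetical rows (the April data sheet itself isn't reproduced in the text); each resulting list stands in for one new worksheet, and <code>split_rows</code> is an illustrative name.</p>

```python
from collections import defaultdict


def split_rows(rows, by):
    """Group whole rows into separate 'worksheets', one per value of column `by`."""
    sheets = defaultdict(list)
    for row in rows:
        sheets[row[by]].append(row)
    return dict(sheets)


# Hypothetical event records
april = [
    {"Event": "Concert", "Attendance": 450},
    {"Event": "Play", "Attendance": 300},
    {"Event": "Concert", "Attendance": 510},
]
sheets = split_rows(april, "Event")
for event, rows in sheets.items():
    print(event, rows)
```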
<p><span style="line-height: 20.7999992370605px;">Minitab also lets you merge worksheets to </span>combine items provided in separate data files. </p>
<p><span style="line-height: 1.6;">Let's say the data set you've been given contains a lot of columns that you don't need: irrelevant factors, redundant information, and the like. Those items just clutter up your data set, and getting rid of them will make it easier to identify and access the columns of data you actually need. </span><span style="line-height: 20.7999992370605px;">You can delete rows and columns you don't need, or use the</span><strong style="line-height: 20.7999992370605px;"> Data > Erase Variables</strong><span style="line-height: 20.7999992370605px;"> tool to make your worksheet more manageable. </span></p>
<span style="line-height: 1.6;">I Can't See You Right Now...Maybe Later</span>
<p>What if you don't want to actually <em>delete </em>any data, but you only want to see the columns you intend to use? For instance, in the data below, I don't need the Date, Manager, or Duration columns now, but I may have use for them in the future: </p>
<p><img alt="unwanted columns" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/99d785a0b5ff0cbac36f0c6af05b1cac/unwantedcolumns.png" style="width: 400px; height: 225px;" /></p>
<p>I can select and right-click those columns, then use <strong>Column > Hide Selected Columns</strong> to make them disappear. </p>
<p><img alt="hide selected columns" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/00defa2646d5e100873ef2961d374ff0/hideselectedcolumns.png" style="width: 400px; height: 308px;" /></p>
<p>Voila! They're gone from my sight. Note how the displayed columns jump from C1 to C5, indicating that some columns are hidden: </p>
<p><img alt="hidden columns" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a140bb6413744b431460e70f523e5a0b/hiddencolumns.png" style="width: 323px; height: 138px;" /></p>
<p>It's just as easy to bring those columns back into the limelight. When I want them to reappear, I select the C1 and C5 columns, right-click, and choose "Unhide Selected Columns." </p>
<p>Data may arrive in a disorganized and messy state, but you don't need to keep it that way. Getting rid of extraneous information and choosing the elements that are visible can make your work much easier. But that's just the tip of the iceberg. In my next post, I'll cover some more <a href="http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-datas-a-mess2c-part-2">ways to make unruly data behave</a>. </p>
Categories: Data Analysis, Statistics
Published: Wed, 15 Mar 2017 14:52:00 +0000
http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-data-is-a-mess-part-1
Author: Eston Martz
P-value Roulette: Making Hypothesis Testing a Winner’s Game
http://blog.minitab.com/blog/rkelly/p-value-roulette-making-hypothesis-testing-a-winner%E2%80%99s-game
<p>Welcome to the Hypothesis Test Casino! The featured game of the house is roulette. But this is no <em>ordinary</em> game of roulette. This is p-value roulette!</p>
<p>Here’s how it works: We have two roulette wheels, the Null wheel and the Alternative wheel. Each wheel has 20 slots (instead of the usual 37 or 38). You get to bet on one slot.</p>
<p><img alt="http://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Edvard_Munch_-_At_the_Roulette_Table_in_Monte_Carlo_-_Google_Art_Project.jpg/256px-Edvard_Munch_-_At_the_Roulette_Table_in_Monte_Carlo_-_Google_Art_Project.jpg" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8647ae2930d63e128d09f0b2cc5cdb87/p_value_roulette.jpg" style="line-height: 20.7999992370605px; border-width: 1px; border-style: solid; margin: 10px 15px; width: 256px; height: 166px; float: right;" /></p>
<p>What happens if the ball lands in the slot you bet on? Well, that depends on which wheel we spin. If we spin the Null wheel, you lose your bet. But if we spin the Alternative wheel, you win!</p>
<p>I’m sorry, but we can’t tell you which wheel we’re spinning.</p>
<p>Doesn’t that sound like a good game?</p>
<p>Not convinced yet? I assure you the odds are in your favor <em>if </em>you choose your slot wisely. Look, I’ll show you a graph of some data from the Null wheel. We spun it 10,000 times and counted how many times the ball landed in each slot. As you can see, each slot is just as likely as any other, with a probability of about 0.05 each. That means there’s a 95% probability the ball won’t land on your slot, so you have only a 5% chance of losing, no matter what, <em>if</em> we happen to spin the Null wheel.</p>
<p><img alt="histogram of p values for null hypothesis" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dc5efcd7001f33a77bea1c635af837e5/histogram_of_p_values_null_hypothesis.png" style="width: 576px; height: 384px;" /></p>
<p>What about that Alternative wheel, you ask? Well, we’ve had quite a few different Alternative wheels over the years. Here’s a graph of some data from one we were spinning last year:</p>
<p><img alt="histogram of p values from alternative hypothesis" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dd0cafe3375f3202adaf3542d15eb9ab/histogram_of_p_values_alternative_hypothesis.png" style="width: 576px; height: 384px;" /></p>
<p>And just a few months ago, we had a different one. Check out the data from this one. It was very, very popular.</p>
<p><img alt=" histogram of p-values from popular alternative hypothesis" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fc6f0ff641e7eb4d3f7750c8163ac968/histogram_of_p_values_alternative_hypothesis_2.png" style="width: 576px; height: 384px;" /></p>
<p>Now that’s what I call an Alternative! People in the know always picked the first slot. You can see why.</p>
<p>I’m not allowed to show you data from the current game. But I assure you the Alternatives all follow this same pattern. They tend to favor those smaller numbers.</p>
<p>So, you’d like to play? Great! Which slot would you like to bet on?</p>
Is this on the level?
<p>No, I don’t really have a casino with two roulette wheels. My graphs are simulated p-values for a <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-is-a-t-test-and-why-is-it-like-telling-a-kid-to-clean-up-that-mess-in-the-kitchen">1-sample t-test</a>. The null hypothesis is that the mean of a process or population is 5. The two-sided alternative is that the mean is different from 5. In my first graph, the null hypothesis was true: I used Minitab to generate random samples of size 20 from a normal distribution with mean 5 and standard deviation of 1. For the other two graphs, the only thing I changed was the mean of the normal distribution I sampled from. For the second graph, the mean was 5.3. For the final graph, the mean was 5.75.</p>
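<p>For readers who want to reproduce the idea outside Minitab, here is a minimal Python sketch of the same kind of simulation, using only the standard library. It assumes the setup described above (samples of 20 from a normal distribution with standard deviation 1, and a two-sided 1-sample t-test of a mean of 5); it is not the author's Minitab procedure. The function names are hypothetical, the 20 "slots" are simply p-value bins of width 0.05, and the t-distribution p-value uses the closed-form series for integer degrees of freedom given in Abramowitz and Stegun (formulas 26.7.3 and 26.7.4).</p>

```python
import math
import random


def t_two_sided_p(t, df):
    """Two-sided p-value, P(|T| >= |t|), for Student's t with integer df >= 1.

    Closed-form series from Abramowitz & Stegun 26.7.3 (odd df) and
    26.7.4 (even df); `a` below is P(|T| < |t|).
    """
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 1:
        # A = (2/pi) * (theta + sin(theta) * (cos + (2/3) cos^3 + ...)), up to cos^(df-2)
        total = 0.0
        if df > 1:
            term = total = c
            for k in range(3, df - 1, 2):
                term *= (k - 1) / k * c * c
                total += term
        a = 2.0 / math.pi * (theta + s * total)
    else:
        # A = sin(theta) * (1 + (1/2) cos^2 + (1*3)/(2*4) cos^4 + ...), up to cos^(df-2)
        term = total = 1.0
        for k in range(2, df - 1, 2):
            term *= (k - 1) / k * c * c
            total += term
        a = s * total
    return min(1.0, max(0.0, 1.0 - a))


def one_sample_t_p(sample, mu0):
    """Two-sided 1-sample t-test p-value for H0: mean == mu0."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t_two_sided_p(t, n - 1)


def simulate_p_values(true_mean, mu0=5.0, sd=1.0, n=20, reps=10_000, seed=1):
    """p-values from `reps` samples of size n drawn from Normal(true_mean, sd)."""
    rng = random.Random(seed)
    return [one_sample_t_p([rng.gauss(true_mean, sd) for _ in range(n)], mu0)
            for _ in range(reps)]


def slot_counts(p_values, slots=20):
    """Tally p-values into equal-width bins ("slots"): [0, 0.05), [0.05, 0.10), ..."""
    counts = [0] * slots
    for p in p_values:
        counts[min(int(p * slots), slots - 1)] += 1
    return counts


if __name__ == "__main__":
    for true_mean in (5.0, 5.3, 5.75):  # the Null wheel, then the two Alternatives
        ps = simulate_p_values(true_mean)
        share = sum(p < 0.05 for p in ps) / len(ps)
        print(f"mean {true_mean}: share of p < 0.05 = {share:.3f}")
        print("  slot counts:", slot_counts(ps))
```

<p>With the null mean of 5, each slot should collect roughly 5% of the simulated p-values; with means of 5.3 and 5.75, progressively more p-values pile into the first slot. Exact counts depend on the random seed.</p>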
<p>For just about any hypothesis test you do in Minitab <a href="http://www.minitab.com/products/minitab">Statistical Software</a>, you will see a p-value. Once you understand how p-values work, you will have greater insight into what they are telling you. Let’s see what we can learn about p-values from playing p-value roulette.</p>
<ol>
<li>Just as you didn’t know whether you were spinning the Null or Alternative wheel, you don’t know for sure whether the null hypothesis is true. But basing your decision to reject the null hypothesis on the p-value improves your chances of making a good decision.<br />
</li>
<li>If the null hypothesis is true, then any p-value is just as likely as any other. You control the probability of making a Type I error by rejecting only when the p-value falls within a narrow range, typically 0.05 or smaller. A <a href="http://blog.minitab.com/blog/the-stats-cat/understanding-type-1-and-type-2-errors-from-the-feline-perspective-all-mistakes-are-not-equal">Type I error</a> occurs if you incorrectly reject a true null hypothesis.<br />
</li>
<li>If the alternative hypothesis is true, then smaller p-values become more likely and larger p-values become less likely. That’s why you can think of a small p-value as evidence in favor of the alternative hypothesis.<br />
</li>
<li>It is tempting to try to interpret the p-value as the probability that the null hypothesis is true. But that’s not what it is. The null hypothesis is either true, or it’s not. Each time you “spin the wheel” (collect a new sample), the ball can land in a different slot, giving you a different p-value. But the truth of the null hypothesis, or lack thereof, remains unchanged.<br />
</li>
<li>In the roulette analogy there were different alternative wheels, because there is not usually just a single alternative condition. There are infinitely many mean values that are not equal to 5; my graphs looked at just two of these.<br />
</li>
<li>The probability of rejecting the null hypothesis when the alternative hypothesis is true is called the power of the test. In the 1-sample t-test, the power depends on how different the mean is from the null hypothesis value, relative to the standard error. While you don’t control the true mean, you can reduce the standard error by taking a larger sample. This will give the test greater power.<br />
</li>
</ol>
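<p>The last point above can be made concrete with a rough calculation. The sketch below uses a large-sample normal approximation to the power of a two-sided 1-sample test; it is not the exact t-test power Minitab reports, which is a little lower for small samples. The difference is measured in standard errors, σ/√n, so a larger n raises power even though the true mean is unchanged. The function name and inputs are illustrative assumptions.</p>

```python
from statistics import NormalDist


def approx_power(delta, sigma, n, alpha=0.05):
    """Normal approximation to the power of a two-sided 1-sample test when the
    true mean differs from the hypothesized mean by `delta`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    shift = delta / (sigma / n ** 0.5)           # difference in standard errors
    # Reject when |Z| > z_crit, where Z ~ Normal(shift, 1) under the alternative.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Same shift as the second graph (mean 5.3 vs. 5, sigma 1), with larger samples.
    for n in (20, 40, 80):
        print(f"n = {n:3d}: approximate power = {approx_power(0.3, 1.0, n):.2f}")
```

<p>With these inputs the approximate power rises from about 0.27 at n = 20 to about 0.77 at n = 80, and with delta = 0 it equals alpha, the Type I error rate.</p>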
You Too Can Be a Winner!
<p>To be a winner at p-value roulette, you need to make sure you are performing the right hypothesis test, and that your data fit the assumptions of that test. Minitab’s <a href="http://www.minitab.com/en-us/products/minitab/assistant/">Assistant menu</a> can help you with that. The Assistant helps you choose the right statistical analysis and provides easy-to-understand guidelines that walk you through data collection and analysis. Then it gives you clear graphical output that shows how to interpret your p-value, while helping you evaluate whether your data are appropriate, so you can trust your results.</p>
Hypothesis TestingStatisticsStatistics HelpStatsMon, 06 Mar 2017 13:00:00 +0000http://blog.minitab.com/blog/rkelly/p-value-roulette-making-hypothesis-testing-a-winner%E2%80%99s-gameRob KellyCreating and Reading Statistical Graphs: Trickier than You Think
http://blog.minitab.com/blog/understanding-statistics/creating-and-reading-statistical-graphs-trickier-than-you-think
<p>My colleague Cody Steele wrote a post that illustrated <a href="http://blog.minitab.com/blog/statistics-and-quality-improvement/how-painful-does-the-income-gap-look-to-you">how the same set of data can appear to support two contradictory positions</a>. He showed how changing the scale of a graph that displays mean and median household income over time drastically alters the way it can be interpreted, even though there's no change in the data being presented.</p>
<p><img alt="Graph interpretation is tricky, especially if you're doing it quickly" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f594d20f8daa8e00e29380f68010b1cc/hunh.jpg" style="margin: 10px 15px; float: right; width: 200px; height: 200px;" /> When we analyze data, we need to present the results in an objective, honest, and fair way. That's the catch, of course. What's "fair" can be debated...and that leads us straight into "Lies, damned lies, and statistics" territory. </p>
<p><span style="line-height: 20.7999992370605px;">Cody's post got me thinking about the importance of statistical literacy, especially in a mediascape saturated with overhyped news reports about seemingly every new study, not to mention omnipresent "infographics" of frequently dubious origin and intent.</span></p>
<p><span style="line-height: 20.7999992370605px;">As consumers and providers of statistics, can we trust our own impressions of the information we're bombarded with on a daily basis? It's an increasing challenge, even for the statistics-savvy. </span></p>
So Much Data, So Many Graphs, So Little Time
<p>The increased amount of information available, combined with the acceleration of the news cycle to speeds that wouldn't have been dreamed of a decade or two ago, means we have less time available to absorb and evaluate individual items critically. </p>
<p>A half-hour television news broadcast might include several animations, charts, and figures based on the latest research, or polling numbers, or government data. They'll be presented for several seconds at most, then it's on to the next item. </p>
<p>Getting news online is even more rife with opportunities for split-second judgment calls. We scan through the headlines and eyeball the images, searching for stories interesting enough to click on. But with 25 interesting stories vying for your attention, and perhaps just a few minutes before your next appointment, you race through them very quickly. </p>
<p>But when we see graphs for a couple of seconds, do we really absorb their meaning completely and accurately? Or are we susceptible to misinterpretation? </p>
<p>Most of the graphs we see are very simple: bar charts and pie charts predominate. But as statistics educator Dr. Nic points out in <a href="http://learnandteachstatistics.wordpress.com/2012/07/16/tricky_graphs/" style="line-height: 1.6;">this blog post</a>, interpreting even simple bar charts can be a deceptively tricky business. I've adapted her example to demonstrate this below.</p>
Which Chart Shows Greater Variation?
<p>A city surveyed residents of two neighborhoods about the quality of service they get from local government. Respondents were asked to rate local services on a scale of 1 to 10. Their responses were charted using Minitab <a href="http://www.minitab.com/products/minitab">Statistical Software</a>, as shown below. </p>
<p>Take a few seconds to scan the charts, then choose which neighborhood's responses exhibit more variation, Ferndale or Lawnwood?</p>
<p><img alt="Lawnwood Bar Chart" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f88262f2732bc43e8ac0b919d43139a5/lawnwoodbarchart.gif" style="width: 500px; height: 333px;" /></p>
<p><img alt="Ferndale Bar Chart" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/67ee1909a89236e3caac2d11a9d42795/ferndalebarchart.gif" style="width: 500px; height: 333px;" /></p>
<p>Seems pretty straightforward, right? Lawnwood's graph is quite spiky and disjointed, with sharp peaks and valleys. The graph of Ferndale's responses, on the other hand, looks nice and even. Each bar's roughly the same height. </p>
<p>It looks like Lawnwood's responses have more variation. But let's verify that impression with some basic descriptive statistics about each neighborhood's responses:</p>
<p style="margin-left: 40px;"><img alt="Descriptive Statistics for Ferndale and Lawnwood" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1eeed755d2a0baea0939dc7ccecacaea/descriptive_statistics.gif" style="width: 369px; height: 105px;" /></p>
<p>Uh-oh. A glance at the graphs suggested that Lawnwood has more variation, but the analysis demonstrates that Ferndale's variation is, in fact, much higher. How did we get this so wrong?</p>
Frequencies, Values, and Counterintuitive Graphs
<p>The answer lies in how the data were presented. The charts above show frequencies (how many residents gave each rating) rather than the individual responses themselves.</p>
<p><span style="line-height: 1.6;">What if we graph the individual responses for each neighborhood? </span></p>
<p><img alt="Lawnwood Individuals Chart" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d8e91ae6c007e8f5327c54ac3ec65604/lawnwoodindividualsbarchart.gif" style="width: 500px; height: 333px;" /></p>
<p><img alt="Ferndale Individuals Chart" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/4c01c68dbb96e2126a1fd313ee38e001/ferndaleindividualsbarchart.gif" style="width: 500px; height: 333px;" /></p>
<p>In <em>these </em>graphs, it's easy to see that the responses of Ferndale's citizens had much more variation than those of Lawnwood. But unless you appreciate the differences between values and frequencies—and paid careful attention to how the first set of graphs was labeled—a quick look at the earlier graphs could well leave you with the wrong conclusion. </p>
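<p>The difference between bar heights and the values themselves can be checked with a few lines of Python. The ratings below are hypothetical, invented only to mimic the two chart shapes; they are not the city's survey data. A perfectly flat frequency chart (every rating chosen equally often) goes with widely spread ratings, while a spiky frequency chart can come from ratings bunched tightly together.</p>

```python
from collections import Counter
from statistics import stdev

# Hypothetical 1-10 ratings from 30 residents per neighborhood (invented data).
ferndale = [rating for rating in range(1, 11) for _ in range(3)]  # each rating 3 times
lawnwood = [4] * 2 + [5] * 12 + [6] * 14 + [7] * 2               # bunched at 5 and 6


def bar_heights(ratings):
    """Bar heights of a frequency chart: how many residents gave each rating 1-10."""
    tally = Counter(ratings)
    return [tally[r] for r in range(1, 11)]


for name, ratings in (("Ferndale", ferndale), ("Lawnwood", lawnwood)):
    heights = bar_heights(ratings)
    print(f"{name}: bar heights {heights}")
    print(f"  SD of bar heights = {stdev(heights):.2f}, SD of ratings = {stdev(ratings):.2f}")
```

<p>Here Ferndale's bars are perfectly even (their standard deviation is 0), yet its ratings have a standard deviation near 3; Lawnwood's bars are spiky, yet its ratings have a standard deviation below 1.</p>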
Being Responsible
<p>Since you're reading this, you probably both create and consume data analysis. You may generate your own reports and charts at work, and see the results of other people's analyses on the news. We should approach both situations with a certain degree of responsibility. </p>
<p>When looking at graphs and charts produced by others, we need to avoid snap judgments. We need to pay attention to what the graphs really show, and take the time to draw the right conclusions based on how the data are presented. </p>
<p>When sharing our own analyses, we have a responsibility to communicate clearly. In the frequency charts above, the X and Y axes are labeled adequately—but couldn't they be more explicit? Instead of just "Rating," couldn't the label read "Count for Each Rating" or some other, more meaningful description? </p>
<p>Statistical concepts may seem like common knowledge if you've spent a lot of time working with them, but many people aren't clear on ideas like "correlation is not causation" and margins of error, let alone the nuances of statistical assumptions, distributions, and significance levels.</p>
<p>If your audience includes people without a thorough grounding in statistics, are you going the extra mile to make sure the results are understood? For example, many expert statisticians have told us they use <a href="http://www.minitab.com/products/minitab/assistant/">the Assistant</a> in Minitab 17 to present their results precisely because it's designed to communicate the outcome of analysis clearly, even for statistical novices. </p>
<p><span style="line-height: 20.7999992370605px;">If you're already doing everything you can to make statistics accessible to others, kudos to you. </span><span style="line-height: 20.7999992370605px;">And if you're not, why aren't you? </span></p>
Data AnalysisStatisticsStatistics in the NewsStatsWed, 01 Mar 2017 13:30:00 +0000http://blog.minitab.com/blog/understanding-statistics/creating-and-reading-statistical-graphs-trickier-than-you-thinkEston MartzThree Common P-Value Mistakes You'll Never Have to Make
http://blog.minitab.com/blog/understanding-statistics/three-common-p-value-mistakes-youll-never-have-to-make
<p>Statistics can be challenging, especially if you're not analyzing data and interpreting the results every day. <a href="http://www.minitab.com/products/minitab/" title="statistical software for analyzing quality data">Statistical software</a> makes things easier by handling the arduous mathematical work involved in statistics. But ultimately, we're responsible for correctly interpreting and communicating what the results of our analyses show.</p>
<p>The p-value is probably the most frequently cited statistic. We use p-values to interpret the results of regression analysis, hypothesis tests, and many other methods. Every introductory statistics student and every Lean Six Sigma Green Belt learns about p-values. </p>
<p>Yet this common statistic is misinterpreted so often that at least one scientific journal has abandoned its use.</p>
What Does a P-value Tell You?
<p>Typically, a p-value is defined as "the probability of observing an effect at least as extreme as the one in your sample data, <em>if the <span><a href="http://blog.minitab.com/blog/understanding-statistics/why-shrewd-experts-fail-to-reject-the-null-every-time">null hypothesis</a></span> is true</em>." Thus, the only question a p-value can answer is this one:</p>
<p><em>How likely is it that I would get data at least as extreme as the data I have, assuming the null hypothesis is true?</em></p>
<p>If your p-value is less than your selected <span><a href="http://blog.minitab.com/blog/adventures-in-statistics-2/understanding-hypothesis-tests%3A-significance-levels-alpha-and-p-values-in-statistics">alpha level</a></span> (typically 0.05), you <em>reject the null hypothesis</em> in favor of the alternative hypothesis. If the p-value is equal to or greater than your alpha level, you <em>fail to reject</em> the null hypothesis. It's important to note that the null hypothesis is never accepted; we can only <em>reject </em>or <em>fail to reject</em> it. </p>
The P-Value in a 2-Sample t-Test
<p>Consider a typical hypothesis test, say, a 2-sample t-test of the mean weight of boxes of cereal filled at different facilities. The listed package weight is 14 oz, and we want to know whether the two facilities' lines fill boxes to the same mean weight, so we collect and weigh 50 boxes from each facility. </p>
<p>Our null hypothesis is that the two means are equal. Our alternative hypothesis is that they are <em>not </em>equal. </p>
<p>To run this test in Minitab, we enter our data in a worksheet and select <strong>Stat > Basic Statistics > 2-Sample T-test</strong>. If you'd like to follow along, you can download the <a href="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2edc594cf40ec4931e5cd0021df6703e/cereal_weight.mtw">data</a> and, if you don't already have it, get the <a href="http://www.minitab.com/products/minitab/free-trial/">30-day trial of Minitab</a>. In the t-test dialog box, select<em> Both samples are in one column</em> from the drop-down menu, and choose "Weight" for Samples, and "Facility" for Sample IDs.</p>
<p style="margin-left: 40px;"><img alt="t test for the mean" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1a090752bef395f3b227511c6e57946d/dialog.png" style="width: 424px; height: 296px;" /></p>
<p>Minitab gives us the following output, and I've highlighted the p-value for the hypothesis test:</p>
<p style="margin-left: 40px;"><img alt="t-test output" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3b27f14d1859460a1875c81384c52ccb/t_test_output.png" style="width: 544px; height: 222px;" /></p>
<p>So we have a p-value of 0.029, which is less than our selected alpha value of 0.05. Therefore, we reject the null hypothesis that the means of Line A and Line B are equal. Note also that while the evidence indicates the means are different, that difference is estimated at 0.338 oz—a pretty small amount of cereal. </p>
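<p>The arithmetic behind a 2-sample t-test can be sketched from summary statistics. This is a minimal Python sketch, assuming the unequal-variance (Welch) form of the test and a normal approximation for the p-value, which is reasonable with roughly 100 degrees of freedom but too optimistic for small samples. The summary numbers are hypothetical, not the cereal data, and the function names are illustrative. The small <code>decide</code> helper encodes the reject / fail-to-reject rule described earlier.</p>

```python
import math
from statistics import NormalDist


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch (unequal-variance) 2-sample t statistic and Welch-Satterthwaite df."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def approx_two_sided_p(t):
    """Normal approximation to the two-sided t p-value (adequate for large df)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


def decide(p_value, alpha=0.05):
    """We reject H0 or fail to reject it; H0 is never 'accepted'."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


if __name__ == "__main__":
    # Hypothetical summary statistics (oz) for 50 boxes from each line.
    t, df = welch_t(14.10, 0.80, 50, 13.85, 0.75, 50)
    p = approx_two_sided_p(t)
    print(f"t = {t:.2f}, df = {df:.1f}, approx p = {p:.3f} -> {decide(p)}")
```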
<p>So far, so good. But this is the point at which trouble often starts.</p>
Three Frequent Misstatements about P-Values
<p>The p-value of 0.029 means we reject the null hypothesis that the means are equal. But that doesn't mean any of the following statements are accurate:</p>
<ol>
<li><strong>"There is 2.9% probability the means are the same, and 97.1% probability they are different." </strong><br />
We don't know that at all. The p-value only says that <strong><em>if </em></strong>the null hypothesis is true, samples like ours would show a difference this large or larger only 2.9% of the time. The p-value doesn't tell you the probability that either hypothesis is true; instead, it tells you how surprising your data would be if the null hypothesis were true. </li>
<li><strong>"The p-value is low, which indicates there's an important difference in the means." </strong><br />
Based on the 0.029 p-value shown above, we can conclude that a statistically significant difference between the means exists. But the estimated size of that difference is less than a half-ounce, and won't matter to customers. A p-value may indicate a difference exists, but it tells you nothing about its practical impact.</li>
<li><strong>"The low p-value shows the alternative hypothesis is true."</strong><br />
A low p-value provides statistical evidence to reject the null hypothesis, but that doesn't prove the alternative hypothesis is true. If your alpha level is 0.05 and the null hypothesis is actually true, there's a 5% chance you will incorrectly reject it. A courtroom offers a parallel: a guilty verdict means the jury judged the evidence strong enough to reject the presumption of innocence, but it doesn't rule out a wrongful conviction. </li>
</ol>
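<p>The second misstatement is easy to demonstrate: hold a small difference fixed and let the sample size grow. The sketch below uses a normal approximation for a 2-sample comparison with equal group sizes and a common standard deviation; the 0.3 oz difference, the 1 oz standard deviation, and the 0.5 oz practical threshold are hypothetical values chosen for illustration, not figures from the cereal data.</p>

```python
import math
from statistics import NormalDist


def approx_p_for_difference(diff, sd, n_per_group):
    """Approximate two-sided p-value for a 2-sample comparison with equal group
    sizes and a common standard deviation (normal approximation)."""
    se = sd * math.sqrt(2 / n_per_group)
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


DIFF = 0.3                 # hypothetical observed difference in mean weight (oz)
SD = 1.0                   # hypothetical standard deviation of box weights (oz)
PRACTICAL_THRESHOLD = 0.5  # hypothetical smallest difference customers would notice (oz)

for n in (10, 50, 200, 1000):
    p = approx_p_for_difference(DIFF, SD, n)
    print(f"n = {n:4d} per group: p = {p:.2g}, "
          f"practically important: {DIFF >= PRACTICAL_THRESHOLD}")
```

<p>The p-value shrinks steadily as n grows, yet the difference never reaches the practical threshold: only the sample size changed, not the importance of the difference.</p>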
<p>These misinterpretations happen frequently enough to be a concern, but that doesn't mean that we shouldn't use p-values to help interpret data. The p-value remains a very useful tool, as long as we're interpreting and communicating its significance accurately.</p>
P-Value Results in Plain Language
<p>It's one thing to keep all of this straight if you're doing data analysis and statistics all the time. It's another thing if you only analyze data occasionally, and need to do many other things in between, like most of us. "Use it or lose it" is certainly true about statistical knowledge, which could well be another factor that contributes to misinterpreted p-values. </p>
<p>If you're leery of that happening to you, a good way to avoid it is to use the Assistant in Minitab to perform your analyses. If you haven't used it yet, the Assistant menu guides you through your analysis from start to finish. The dialog boxes and output are all in plain language, so it's easy to figure out what you need to do and what the results mean, even if it's been a while since your last analysis. (But even expert statisticians tell us they like using the Assistant because the output is so clear and easy to understand, regardless of an audience's statistical background.) </p>
<p>So let's redo the analysis above using the Assistant, to see what that output looks like and how it can help you avoid misinterpreting your results—or having them be misunderstood by others!</p>
<p>Start by selecting <strong>Assistant > Hypothesis Test...</strong> from the Minitab menu. Note that a window pops up to explain exactly what a hypothesis test does. </p>
<p style="margin-left: 40px;"><img alt="assistant hypothesis test" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f26601f26db3576a7cf2b5bc3178f9ca/assistant_hypothesis_test.png" style="width: 420px; height: 252px;" /></p>
<p>The Assistant asks what we're trying to do, and gives us three options to choose from.</p>
<p style="margin-left: 40px;"><img alt="hypothesis test chooser" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fba2ee28b10063e1c5f0f00eb77db1b2/assistant_hypothesis_test_chooser.png" style="width: 600px; height: 472px;" /></p>
<p>We know we want to compare a sample from Line A with a sample from Line B, but what if we can't remember which of the 5 available tests is the appropriate one in this situation? We can get guidance by clicking "Help Me Choose."</p>
<p style="margin-left: 40px;"><img alt="help me choose the right hypothesis test" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/51bb23fbb44603efff50fe4fa1d9dbd1/assistant_hypothesis_test_decision_tree.png" style="width: 700px; height: 551px;" /></p>
<p>The choices on the diagram direct us to the appropriate test. In this case, we choose continuous data instead of attribute (and even if we'd forgotten the difference, clicking on the diamond would explain it). We're comparing two means instead of two standard deviations, and we're measuring two different sets of items since our boxes came from different production lines. </p>
<p>Now we know what test to use, but suppose you want to make sure you don't miss anything that's important about the test, like requirements that must be met? Click the "more..." link and you'll get those details. </p>
<p style="margin-left: 40px;"><img alt="more info about the 2-Sample t-Test" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1b4f09a2438b0aaef14e8da6564524cf/assistant_hypothesis_test_more_info.png" style="width: 700px; height: 526px;" /></p>
<p>Now we can proceed to the Assistant's dialog box. Again, statistical jargon is minimized and everything is put in straightforward language. We just need to answer a few questions, as shown. Note that the Assistant even lets us tell it how big a difference needs to be for us to consider it practically important. In this case, we'll enter 2 ounces.</p>
<p style="margin-left: 40px;"><img alt="Assistant 2-sample t-Test dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/994d9172bf788282258f765d4d08aefa/assistant_hypothesis_test_dialog.png" style="width: 641px; height: 495px;" /></p>
<p>When we press OK, the Assistant performs the t-test and delivers three reports. The first of these is a summary report, which includes summary statistics, confidence intervals, histograms of both samples, and more. And interpreting the results couldn't be more straightforward than what we see in the top left quadrant of the report. In response to the question, "Do the means differ?" we can see the p-value of 0.029 marked on the bar, very far toward the "Yes" end of the scale. </p>
<p style="margin-left: 40px;"><img alt="2-Sample t-Test summary report" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8927b8bc833551678715f68149dd18ad/assistant_hypothesis_test_summary.png" style="width: 700px; height: 526px;" /></p>
<p>Next is the Diagnostic Report, which provides additional information about the test. </p>
<p style="margin-left: 40px;"><img alt="2-Sample t-Test diagnostic report" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6467a0be0ba60329f2be282e14b9be33/assistant_hypothesis_test_diagnostic.png" style="width: 700px; height: 526px;" /></p>
<p>In addition to letting us check for outliers, the diagnostic report shows us the size of the observed difference, as well as the chances that our test could detect a practically significant difference of 2 oz. </p>
<p>The final piece of output the Assistant provides is the report card, which flags any problems or concerns about the test that we would need to be aware of. In this case, all of the boxes are green and checked (instead of red and x'ed). </p>
<p style="margin-left: 40px;"><img alt="2-Sample t-Test report card" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0e4cd0dce832a8251701f8175de9a037/assistant_hypothesis_test_report_card.png" style="width: 700px; height: 526px;" /></p>
<p>When you're not doing statistics all the time, the Assistant makes it a breeze to find the right analysis for your situation and to make sure you interpret your results the right way. Using it is a great way to make sure you're not attaching too much, or too little, importance to the results of your analyses.</p>
Hypothesis TestingStatisticsStatistics HelpStatsWed, 22 Feb 2017 14:00:00 +0000http://blog.minitab.com/blog/understanding-statistics/three-common-p-value-mistakes-youll-never-have-to-makeEston MartzChi-Square Analysis: Powerful, Versatile, Statistically Objective
http://blog.minitab.com/blog/michelle-paret/chi-square-analysis-powerful-versatile-statistically-objective
<p style="line-height: 20.7999992370605px;">To make objective decisions about the processes that are critical to your organization, you often need to examine categorical data. You may know how to use a t-test or ANOVA when you’re comparing measurement data (like weight, length, revenue, and so on), but do you know how to compare attribute or counts data? It’s easy to do with <a href="http://www.minitab.com/products/minitab">statistical software</a> like Minitab.</p>
<p style="line-height: 20.7999992370605px;"><img alt="failures per production line" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/19b2bd8557279d21284a23e2174fef88/chisquare_onevariable_revision.jpg" style="line-height: 20.8px; width: 400px; height: 267px; float: right; margin: 10px 15px;" /></p>
<p style="line-height: 20.7999992370605px;">One person may look at this bar chart and decide that the production lines performed similarly<span style="line-height: 1.6;">. But another person may focus on the small difference between the bars and decide that one of the lines has outperformed the others. Without an appropriate statistical analysis, how can you know which person is right?</span></p>
<p style="line-height: 20.7999992370605px;">When time, money, and quality depend on your answers, you can’t rely on subjective visual assessments alone. To answer questions like these with statistical objectivity, you can use a Chi-Square analysis.</p>
Which Analysis Is Right for Me?
<p style="line-height: 20.7999992370605px;">Minitab offers three Chi-Square tests. The appropriate analysis depends on the number of variables that you want to examine. And for all three options, the data can be formatted either as raw data or summarized counts.</p>
<strong>Chi-Square Goodness-of-Fit Test – 1 Variable</strong>
<p style="line-height: 20.7999992370605px;">Use Minitab’s <strong>Stat > Tables > Chi-Square Goodness-of-Fit Test (One Variable)</strong> when you have just one variable.</p>
<p style="line-height: 20.7999992370605px;">The Chi-Square Goodness-of-Fit Test can test if the proportions for all groups are equal. It can also be used to test if the proportions for groups are equal to specific values. For example:</p>
<ul style="line-height: 20.7999992370605px;">
<li>A bottle cap manufacturer operates three production lines and records the number of defective caps for each line. The manufacturer uses the <strong>Chi-Square Goodness-of-Fit Test</strong> to determine if the proportion of defectives is equal across all three lines.</li>
<li>A bottle cap manufacturer operates three production lines and records the number of defective caps and the total number produced for each line. One line runs at high speed and produces twice as many caps as the other two lines that run at a slower speed. The manufacturer uses the <strong>Chi-Square Goodness-of-Fit Test</strong> to determine if the number of defective units for each line is proportional to the volume of caps it produces.</li>
</ul>
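<p>Both bottle-cap examples reduce to the same calculation: compare the observed counts with the counts expected under the hypothesized proportions. The Python sketch below uses hypothetical defect counts and computes the p-value with a closed form that holds only for an even number of degrees of freedom (three lines give 2 degrees of freedom). It illustrates the statistic; it is not Minitab's implementation, and the function names are illustrative.</p>

```python
import math


def chi_square_gof(observed, expected_proportions):
    """Pearson chi-square goodness-of-fit statistic and its degrees of freedom."""
    total = sum(observed)
    expected = [total * p for p in expected_proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


def chi_square_sf_even_df(x, df):
    """P(X >= x) for chi-square with even df:
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)**k / k!"""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Hypothetical defective counts for lines 1-3.
    # (a) Is the number of defectives the same on all three lines?
    stat, df = chi_square_gof([30, 22, 38], [1 / 3, 1 / 3, 1 / 3])
    print(f"equal proportions: chi-square = {stat:.3f}, df = {df}, "
          f"p = {chi_square_sf_even_df(stat, df):.3f}")
    # (b) Line 1 makes twice as many caps as lines 2 and 3, so expect a 2:1:1 split.
    stat, df = chi_square_gof([48, 20, 22], [0.5, 0.25, 0.25])
    print(f"2:1:1 proportions: chi-square = {stat:.3f}, df = {df}, "
          f"p = {chi_square_sf_even_df(stat, df):.3f}")
```

<p>Only the expected proportions change between the two scenarios; the statistic and its degrees of freedom are computed the same way.</p>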
<strong>Chi-Square Test for Association – 2 Variables</strong>
<p style="line-height: 20.7999992370605px;">Use Minitab’s <strong>Stat > Tables > Chi-Square Test for Association</strong> when you have two variables.</p>
<p style="line-height: 20.7999992370605px;">The Chi-Square Test for Association can tell you if there’s an association between two variables. In other words, it can test if two variables are independent or not. For example:</p>
<ul style="line-height: 20.7999992370605px;">
<li>A paint manufacturer operates two production lines across three shifts and records the number of defective units per line per shift. The manufacturer uses the <strong>Chi-Square Test for Association</strong> to determine if the percent defective is similar across all shifts and production lines. Or, are certain lines during certain shifts more prone to issues?<br />
<br />
<img alt="Defectives per line per shift" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8f78b557ef93b1390b79b866787d5503/chisquare_twovariables_revision.jpg" style="width: 600px; height: 400px;" /><br />
<br />
</li>
<li>A call center randomly samples 100 incoming calls each weekday for each of its three locations, for a total of 1500 calls. They then record the number of abandoned calls per location per day. The call center uses a Chi-Square Test for Association to determine if there is any association between location and day of the week with respect to abandoned calls.</li>
</ul>
<p style="margin-left: 40px;"><img alt="call center data" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e60774e6ddac893694e7b8a1a39a47b4/callcenterdata.jpg" style="width: 265px; height: 133px;" /><br />
</p>
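<p>For the test for association, the expected count in each cell is its row total times its column total divided by the grand total, and the degrees of freedom are (rows − 1) × (columns − 1). The sketch below applies this to a hypothetical 2-line × 3-shift table of defective counts shaped like the paint example; the counts are invented, the function names are illustrative, and the p-value again uses the closed form for even degrees of freedom (2 here, or 8 for a 3-location × 5-day table).</p>

```python
import math


def chi_square_association(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Expected count for a cell = row total * column total / grand total;
    degrees of freedom = (r - 1) * (c - 1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(table[0]) - 1)


def chi_square_sf_even_df(x, df):
    """P(X >= x) for chi-square with even df (closed form)."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Hypothetical defective counts: rows = production lines, columns = shifts 1-3.
    defectives = [[12, 15, 30],
                  [14, 13, 16]]
    stat, df = chi_square_association(defectives)
    print(f"chi-square = {stat:.3f}, df = {df}, "
          f"p = {chi_square_sf_even_df(stat, df):.3f}")
```

<p>A table whose rows are exact multiples of each other gives a statistic of 0, which is what "no association" looks like in this calculation.</p>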
<strong>Cross Tabulation and Chi-Square – 2 or more variables</strong>
<p style="line-height: 20.7999992370605px;">Use Minitab’s <strong>Stat > Tables > Cross Tabulation and Chi-Square </strong>when you have two or more variables.</p>
<p style="line-height: 20.7999992370605px;">If you simply want to test for associations between two variables, you can use either <strong>Cross Tabulation and Chi-Square</strong> or <strong>Chi-Square Test for Association</strong>. However, <span><a href="http://blog.minitab.com/blog/understanding-statistics/using-cross-tabulation-and-chi-square-the-survey-says">Cross Tabulation and Chi-Square</a></span> also lets you control for the effect of additional variables. Here’s an example:</p>
<ul style="line-height: 20.7999992370605px;">
<li>A tire manufacturer records the number of failed tires for four different tire sizes across two production lines and three shifts. The plant uses a Cross Tabulation and Chi-Square analysis to look for failure dependencies between the tire sizes and production lines, while controlling for any shift effect. Perhaps a particular production line for a certain tire size is more prone to failures, but only during the first shift.</li>
</ul>
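<p>Controlling for a third variable amounts to building a separate two-way table for each of its levels. The Python sketch below does that for the tire example, assuming each failed tire is logged as a record with hypothetical <code>size</code>, <code>line</code>, and <code>shift</code> fields, and then computes a Pearson chi-square statistic within each shift. It illustrates the idea of a layered cross tabulation only; it is not how Minitab implements the analysis, and the records are made up.</p>

```python
from collections import Counter


def layered_tables(records, row_key, col_key, layer_key):
    """Count records in a row-by-column table for each level of the layer variable."""
    row_levels = sorted({r[row_key] for r in records})
    col_levels = sorted({r[col_key] for r in records})
    tables = {}
    for layer in sorted({r[layer_key] for r in records}):
        counts = Counter((r[row_key], r[col_key]) for r in records if r[layer_key] == layer)
        # Counter returns 0 for combinations that never occur in this layer.
        tables[layer] = [[counts[(row, col)] for col in col_levels] for row in row_levels]
    return row_levels, col_levels, tables


def pearson_statistic(table):
    """Pearson chi-square statistic; cells with zero expected count are skipped."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected > 0:
                statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical failure log: one record per failed tire.
failures = [
    {"size": "15in", "line": "A", "shift": 1},
    {"size": "15in", "line": "A", "shift": 1},
    {"size": "16in", "line": "B", "shift": 1},
    {"size": "15in", "line": "B", "shift": 2},
    {"size": "16in", "line": "A", "shift": 2},
]
sizes, lines, by_shift = layered_tables(failures, "size", "line", "shift")
for shift, table in by_shift.items():
    print(f"shift {shift}: {table}, chi-square = {pearson_statistic(table):.3f}")
```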
<p style="line-height: 20.7999992370605px;">This analysis also offers advanced options. For example, if your categories are ordinal (good, better, best; or small, medium, large), you can include a special test for concordance.</p>
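<p>One widely used concordance measure for ordinal tables is Goodman-Kruskal gamma, which compares the numbers of concordant and discordant pairs of observations. Below is a minimal Python sketch, assuming rows and columns are both listed in increasing order (for example, small, medium, large). Minitab may report additional or differently computed measures, so treat this as an illustration of the idea rather than a reproduction of its output; the counts are hypothetical.</p>

```python
def goodman_kruskal_gamma(table):
    """Goodman-Kruskal gamma for a table whose rows and columns are ordered categories.

    A pair of observations is concordant when one is higher on both variables,
    and discordant when it is higher on one variable but lower on the other.
    Pairs tied on either variable are ignored.
    """
    concordant = discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):  # strictly higher row category
                for m in range(n_cols):
                    if m > j:
                        concordant += table[i][j] * table[k][m]
                    elif m < j:
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical counts: rows = tire size (small, medium, large),
# columns = rating (good, better, best).
ratings = [[20, 10, 5],
           [8, 15, 10],
           [4, 9, 19]]
print(f"gamma = {goodman_kruskal_gamma(ratings):.3f}")
```

<p>Gamma ranges from -1 to +1: values near +1 mean higher rows tend to go with higher columns, values near -1 mean the opposite, and values near 0 indicate no ordinal association. The function raises a division error if every pair is tied.</p>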
Conducting a Chi-Square Analysis in Minitab
<p style="line-height: 20.7999992370605px;">Each of these analyses is easy to run in Minitab. For more examples that include step-by-step instructions, just navigate to the Chi-Square menu of your choice and then click Help > example.</p>
<p style="line-height: 20.7999992370605px;">It can be tempting to make subjective assessments about a given set of data, their makeup, and possible interdependencies, but why risk an error in judgment when a Chi-Square test can give you an objective answer?</p>
<p style="line-height: 20.7999992370605px;">Whether you’re interested in one variable, two variables, or more, a Chi-Square analysis can help you make a clear, statistically sound assessment.</p>
Data AnalysisHypothesis TestingLean Six SigmaManufacturingQuality ImprovementSix SigmaStatisticsStatistics HelpFri, 17 Feb 2017 13:16:00 +0000http://blog.minitab.com/blog/michelle-paret/chi-square-analysis-powerful-versatile-statistically-objectiveMichelle ParetA Field Guide to Statistical Distributions
http://blog.minitab.com/blog/statistics-in-the-field/a-field-guide-to-statistical-distributions
<p><em><span style="line-height: 1.6;">by Matthew Barsalou, guest blogger. </span></em></p>
<p>The old saying “if it walks like a duck, quacks like a duck and looks like a duck, then it must be a duck” may be appropriate in bird watching; however, the same idea can’t be applied when observing a statistical distribution. The dedicated ornithologist is often armed with binoculars and a field guide to the local birds, and this should be sufficient. A statologist (I just made the word up, feel free to use it), on the other hand, is ill-equipped for the visual identification of his or her targets.</p>
Normal, Student's t, Chi-Square, and F Distributions
<p>Notice the upper two distributions in figure 1. The <span><a href="http://blog.minitab.com/blog/fun-with-statistics/normal-the-kevin-bacon-of-distributions">normal distribution</a></span> and Student’s t distribution may appear similar. However, the standard normal distribution has a single fixed shape, while the shape of <a href="http://blog.minitab.com/blog/michelle-paret/guinness-t-tests-and-proving-a-pint-really-does-taste-better-in-ireland">Student’s t distribution</a> depends on its degrees of freedom, which are n-1 when it is used to analyze a sample mean. This may appear to be a minor difference, but when n is small, Student’s t distribution has noticeably heavier tails (and a lower central peak) than the normal distribution. Student’s t distribution approaches the normal distribution as the sample size increases, but for any finite sample size it never exactly matches the shape of the normal distribution.</p>
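<p>The difference is easy to see numerically. This short Python sketch evaluates the standard normal density and Student's t density, written out from their textbook formulas using only the standard library, at the center and in the tail for several degrees of freedom. The particular degrees of freedom and evaluation points are arbitrary choices for illustration.</p>

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def t_pdf(x, df):
    """Student's t density with df degrees of freedom."""
    # Normalizing constant computed on the log scale to avoid overflow for large df.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


print(f"normal    : f(0) = {normal_pdf(0):.4f}, f(3) = {normal_pdf(3):.5f}")
for df in (2, 5, 27, 1000):
    print(f"t, df={df:<4}: f(0) = {t_pdf(0, df):.4f}, f(3) = {t_pdf(3, df):.5f}")
```

<p>With few degrees of freedom, the t density is lower at the center and much higher at x = 3; as the degrees of freedom grow, both values move toward the normal values without ever matching them exactly.</p>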
<p>Observe the Chi-square and F distributions in the lower half of figure 1. The shapes of the distributions can vary, and even the most astute observer will not be able to differentiate between them by eye. Many distributions can be sneaky like that. It is a part of their nature that we must accept, as we can’t change it.</p>
<p align="center"><img alt="Distribution Field Guide Figure 1" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b5c12365f066b6ca3d255bcd458314e1/distribution_field_guide_1.gif" style="width: 605px; height: 352px;" /><em><span style="line-height: 1.6;">Figure 1</span></em></p>
Binomial, Hypergeometric, Poisson, and Laplace Distributions
<p>Notice the distributions illustrated in figure 2. A bird watcher may suddenly encounter four birds sitting in a tree; a quick check of a reference book may help to determine that each is a different species. The same can’t always be said for statistical distributions. <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-and-using-discrete-distributions">Observe the binomial distribution, hypergeometric distribution and Poisson distribution</a>. We can’t even be sure the three are not the same distribution. If they appear together with a Laplace distribution, an observer may conclude “one of these does not appear to be the same as the others.” But they <em>are</em> all different, which our eyes alone may fail to tell us.</p>
<p align="center"><img alt="Distribution Field Guide Figure 2" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b9011bf86767f49c3e7ec47c76d20631/distribution_field_guide_2.gif" style="width: 605px; height: 352px;" /><em><span style="line-height: 1.6;">Figure 2</span></em></p>
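<p>One reason these three can look alike is that, with suitable parameters, they put almost the same probability on each count. The Python sketch below compares a binomial distribution (n = 20, p = 0.1), a hypergeometric distribution (drawing 20 items from a lot of 200 that contains 20 defectives), and a Poisson distribution, all with the same mean of 2. These parameter values are arbitrary, chosen only to illustrate the resemblance.</p>

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial distribution with n trials and success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeometric_pmf(k, population, successes, draws):
    """P(X = k) successes when drawing without replacement from a finite population."""
    return (math.comb(successes, k) * math.comb(population - successes, draws - k)
            / math.comb(population, draws))


def poisson_pmf(k, mean):
    """P(X = k) for a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)


print(" k  binomial  hypergeom  poisson")
for k in range(8):
    print(f"{k:2d}  {binomial_pmf(k, 20, 0.1):.4f}    "
          f"{hypergeometric_pmf(k, 200, 20, 20):.4f}     {poisson_pmf(k, 2.0):.4f}")
```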
Weibull, Cauchy, Loglogistic, and Logistic Distributions
<p>Suppose we observe the four distributions in figure 3. What are they? Could you tell if they were not labeled? We must identify them correctly before we can do anything with them. One is a Weibull distribution, but all four could conceivably be various Weibull distributions. The shape of the Weibull distribution varies based upon the shape parameter (κ) and scale parameter (λ). The Weibull distribution is a useful but potentially devious distribution that can be much like the double-barred finch, which may be mistaken for an owl upon first glance.</p>
<p align="center"><img alt="Distribution Field Guide Figure 3" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2b606d88ff9ae159f94dcac04748c3e2/distribution_field_guide_3.gif" style="width: 605px; height: 351px;" /><em><span style="line-height: 1.6;">Figure 3</span></em></p>
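<p>The Weibull distribution's ability to change shape comes straight from its density function. The Python sketch below writes out the density with shape κ and scale λ, plus an optional threshold (location) parameter like the one in a 3-parameter Weibull; the parameter values printed are arbitrary. With κ = 1 the Weibull reduces to an exponential distribution, while larger values of κ give increasingly bell-shaped curves.</p>

```python
import math


def weibull_pdf(x, shape, scale, threshold=0.0):
    """Weibull density; defined here as zero at or below the threshold."""
    if x <= threshold:
        return 0.0
    z = (x - threshold) / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-z**shape)


def weibull_cdf(x, shape, scale, threshold=0.0):
    """Weibull cumulative distribution function."""
    if x <= threshold:
        return 0.0
    return 1.0 - math.exp(-(((x - threshold) / scale) ** shape))


# Same scale, different shapes: a decreasing curve, an exponential, and a bell-like hump.
for shape in (0.5, 1.0, 3.5):
    densities = [round(weibull_pdf(x, shape, scale=2.0), 3) for x in (0.5, 1.0, 2.0, 3.0, 4.0)]
    print(f"shape={shape}: {densities}")
```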
<p>Attempting to visually identify a statistical distribution can be very risky. Many distributions, such as the Chi-Square and F distributions, change shape drastically based on the number of degrees of freedom. Figure 4 shows various shapes for the Chi-Square, F, and Weibull distributions. Figure 4 also compares a standard normal distribution (with a standard deviation of one) to a t distribution with 27 degrees of freedom; notice how the shapes overlap to the point where it is no longer possible to tell the two distributions apart.</p>
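<p>To see how strongly the degrees of freedom drive the shape, the Python sketch below evaluates the chi-square density on a fine grid and reports where its peak falls. For k degrees of freedom the mode is at max(k - 2, 0), so the curve changes from one that is highest at zero into a hump that moves to the right as k grows. The grid spacing and the values of k are arbitrary choices for illustration.</p>

```python
import math


def chi2_pdf(x, k):
    """Chi-square density with k degrees of freedom (x > 0), computed on the log scale."""
    log_density = ((k / 2 - 1) * math.log(x) - x / 2
                   - (k / 2) * math.log(2) - math.lgamma(k / 2))
    return math.exp(log_density)


def grid_mode(k, upper=60.0, step=0.01):
    """Location of the largest density value on a grid over (0, upper]."""
    grid = [step * i for i in range(1, int(upper / step) + 1)]
    return max(grid, key=lambda x: chi2_pdf(x, k))


for k in (1, 2, 4, 10, 30):
    print(f"k={k:2d}: peak near x = {grid_mode(k):.2f}")
```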
<p>Although there is no definitive Field Guide to Statistical Distributions to guide us, there are formulas available to correctly identify statistical distributions. We can also use <a href="http://www.minitab.com/products/minitab">Minitab Statistical Software</a> to identify our distribution.</p>
<p align="center"><img alt="Distribution Field Guide Figure 4" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/aa4be49733e980c8c7e26395c5e8262a/distribution_field_guide_4.gif" style="width: 605px; height: 351px;" /><em style="line-height: 1.6;">Figure 4</em></p>
<p>Go to <strong>Stat > Quality Tools > Individual Distribution Identification...</strong> and enter the column containing the data and the subgroup size. The results can be observed in either the session window (figure 5) or the graphical outputs shown in figures 6 through 9.</p>
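<p>In this case, the 3-parameter Weibull distribution has a p-value of 0.364, well above a typical alpha-level of 0.05, so the data give no evidence against it and we can conclude that a 3-parameter Weibull distribution is an adequate model for the data.</p>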
<p align="center"><img alt="Distribution Field Guide Figure 5" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/29448180c3ff01cae81cfaf250a60115/distribution_field_guide_5.gif" style="width: 547px; height: 739px;" /></p>
<p align="center"><em>Figure 5</em></p>
<p style="text-align: center;"><img alt="Distribution Field Guide Figure 6" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/781c7a83b14261ae062c63a07479b10d/distribution_field_guide_6.png" style="width: 576px; height: 384px;" /><em style="line-height: 1.6;">Figure 6</em></p>
<p style="text-align: center;"><img alt="Distribution Field Guide Figure 7" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fcf5a7b56b859e6861ae8d96e8273fe1/distribution_field_guide_7.png" style="width: 576px; height: 384px;" /><em><span style="line-height: 1.6;">Figure 7</span></em></p>
<p style="text-align: center;"><em><img alt="Distribution Field Guide Figure 8" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a13530fb7ec7ee8e3fe90143772eefbc/distribution_field_guide_8.png" style="width: 576px; height: 384px;" /><span style="line-height: 1.6;">Figure 8</span></em></p>
<p style="text-align: center;"><em><img alt="Distribution Field Guide Figure 9" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6f28cb199afaee379ccc2244a955557f/distribution_field_guide_9.png" style="width: 576px; height: 384px;" /><span style="line-height: 1.6;">Figure 9</span></em></p>
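<p>Minitab's distribution identification reports goodness-of-fit statistics such as the Anderson-Darling (AD) statistic, where smaller values indicate a closer fit. As a rough illustration of that idea only, the Python sketch below computes the Anderson-Darling statistic for data against any fitted cumulative distribution function. The sample values are hypothetical, the normal fit simply uses the sample mean and standard deviation, and the sketch does not compute the p-values Minitab reports, which depend on the candidate distribution and on how its parameters were estimated.</p>

```python
import math
from statistics import NormalDist, fmean, stdev


def anderson_darling(data, cdf):
    """Anderson-Darling statistic A^2 for data against a fully specified CDF.

    The CDF must return values strictly between 0 and 1 at every data point.
    """
    x = sorted(data)
    n = len(x)
    total = 0.0
    for i in range(1, n + 1):
        f_low = cdf(x[i - 1])   # F(x_(i)), the i-th smallest value
        f_high = cdf(x[n - i])  # F(x_(n+1-i)), the i-th largest value
        total += (2 * i - 1) * (math.log(f_low) + math.log(1.0 - f_high))
    return -n - total / n


# Hypothetical measurements compared against a normal fit.
sample = [9.8, 10.4, 10.1, 9.6, 10.9, 10.2, 9.9, 10.6, 10.0, 10.3]
normal_fit = NormalDist(fmean(sample), stdev(sample))
print(f"A^2 (normal fit) = {anderson_darling(sample, normal_fit.cdf):.4f}")
```

<p>Passing a different fitted CDF (for example, a Weibull CDF with estimated parameters) lets you compare candidate distributions on the same data.</p>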
<div>
<p style="line-height: 20.7999992370605px;"><strong>About the Guest Blogger</strong></p>
<p style="line-height: 20.7999992370605px;"><em><a href="https://www.linkedin.com/pub/matthew-barsalou/5b/539/198" target="_blank">Matthew Barsalou</a> is a statistical problem resolution Master Black Belt at <a href="http://www.3k-warner.de/" target="_blank">BorgWarner</a> Turbo Systems Engineering GmbH. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with emphasis in international business, and a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is the author of the books <a href="http://www.amazon.com/Root-Cause-Analysis-Step---Step/dp/148225879X/ref=sr_1_1?ie=UTF8&qid=1416937278&sr=8-1&keywords=Root+Cause+Analysis%3A+A+Step-By-Step+Guide+to+Using+the+Right+Tool+at+the+Right+Time" target="_blank">Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time</a>, <a href="http://asq.org/quality-press/display-item/index.html?item=H1472" target="_blank">Statistics for Six Sigma Black Belts</a>, and <a href="http://asq.org/quality-press/display-item/index.html?item=H1473&xvl=76115763" target="_blank">The ASQ Pocket Guide to Statistics for Six Sigma Black Belts</a>.</em></p>
</div>
Fun StatisticsStatisticsStatistics HelpStatsFri, 10 Feb 2017 13:00:00 +0000http://blog.minitab.com/blog/statistics-in-the-field/a-field-guide-to-statistical-distributionsGuest BloggerStatistical Tools for Process Validation, Stage 2: Process Qualification
http://blog.minitab.com/blog/michelle-paret/statistical-tools-for-process-validation-stage-2-process-qualification
<p>In its industry guidance to companies that manufacture drugs and biological products for people and animals, the Food and Drug Administration (FDA) recommends three stages for process validation.<img alt="Process Validation Stages" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/26c294a2e9b5b993bfd0f571be11113d/processvalidationstages.jpg" style="width: 220px; height: 235px; margin: 10px 15px; float: right;" /> While my last post covered <a href="http://blog.minitab.com/blog/michelle-paret/statistical-tools-for-process-validation,-stage-1:-process-design">statistical tools for the Process Design stage</a>, here we will focus on the statistical techniques typically utilized for the second stage, Process Qualification.</p>
Stage 2: Process Qualification
<p>During this stage, the process design is evaluated to determine if it is capable of reproducible commercial manufacture. Successful completion of Stage 2 is necessary before commercial distribution.</p>
<span style="color:#008080;"><strong>Example: Evaluate Acceptance Criteria with Capability Analysis</strong></span>
<p>Suppose the active ingredient amount in a tranquilizer needs to be between 360 and 370 mg/mL and you need to assess the quality level, where a minimum Cpk of 1.33 is defined as the acceptance criterion. To assess process performance and determine if measurements are within specification, you can use capability analysis, available in <a href="http://www.minitab.com/products/minitab/">Minitab Statistical Software</a>.</p>
<p>Five samples are randomly selected from 50 batches and the amount of active ingredient is measured. The data is then analyzed relative to the 360 mg/mL minimum and 370 mg/mL maximum.</p>
<p style="margin-left: 40px;"><img alt="Process Capability" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/c48fee09e2caab2f6499c5e1ee74a867/processcapability.jpg" style="width: 400px; height: 300px; margin-left: 15px; margin-right: 15px;" /></p>
<p>The capability analysis reveals a Cpk of 0.53, which fails to meet the acceptance criterion of 1.33. The active ingredient amounts for this tranquilizer are not acceptable. So how can we improve it? The <a href="http://blog.minitab.com/blog/michelle-paret/how-to-improve-cpk">Cp value</a> of 1.41 and the graph both reveal that, although the variability is acceptable with respect to the width of the specification limits, the process average needs to be shifted to a higher mg/mL in order to achieve an acceptable Cpk.</p>
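<p>The relationship between Cp and Cpk explains that diagnosis. Cp compares the specification width with six standard deviations and ignores where the process is centered, while Cpk uses the distance from the process mean to the nearer specification limit. The Python sketch below implements these textbook formulas and, as a rough consistency check, backs out the standard deviation and mean implied by the reported Cp of 1.41 and Cpk of 0.53. Because those reported values are rounded, the implied mean (about 361.9 mg/mL) is only approximate, and Minitab's own estimate of the within-subgroup standard deviation is not reproduced here.</p>

```python
def cp_cpk(mean, sigma, lsl, usl):
    """Return (Cp, Cpk) for a process with the given mean and standard deviation."""
    cp = (usl - lsl) / (6 * sigma)
    cpk = min(usl - mean, mean - lsl) / (3 * sigma)
    return cp, cpk


LSL, USL = 360.0, 370.0          # mg/mL specification limits from the example
REPORTED_CP, REPORTED_CPK = 1.41, 0.53

# Cp fixes sigma; Cpk then fixes the distance from the mean to the nearer limit.
implied_sigma = (USL - LSL) / (6 * REPORTED_CP)
# The mean sits near the lower limit, since the text says it must shift higher.
implied_mean = LSL + 3 * REPORTED_CPK * implied_sigma
print(f"implied sigma = {implied_sigma:.3f}, implied mean = {implied_mean:.2f}")

# Centering the process at the midpoint would raise Cpk to equal Cp.
cp, cpk = cp_cpk((LSL + USL) / 2, implied_sigma, LSL, USL)
print(f"centered: Cp = {cp:.2f}, Cpk = {cpk:.2f}, meets 1.33: {cpk >= 1.33}")
```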
<span style="color:#008080;"><strong>Example: Conduct Variation Analysis across Batches</strong></span>
<p>Suppose we want to assess content uniformity, a critical quality characteristic, across 3 batches at 10 locations. To visualize the intra-batch (within-batch) variation and the inter-batch (between-batch) variation, we can create boxplots for each batch.</p>
<p>A boxplot can help us visually assess both the intra- and inter-batch variation, and identify any outliers. This specific graph shows a homogeneous dispersion of measurements both within each batch and between batches. And there are no <a href="http://blog.minitab.com/blog/michelle-paret/how-to-identify-outliers-and-get-rid-of-them">outliers</a>, which Minitab would flag with an asterisk (*). </p>
<p><img alt="Boxplot" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/f7f242d7f0f0ea0b793c91c4cf710ca8/boxplots.jpg" style="width: 400px; height: 267px; margin-left: 15px; margin-right: 15px;" /></p>
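<p>The asterisks on a boxplot come from a simple rule: points more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile are flagged. The Python sketch below applies that rule using the standard library's <code>statistics.quantiles</code>; its default "exclusive" method is one common quartile convention, and assuming it matches Minitab's quartile calculation is an assumption. The data are hypothetical.</p>

```python
from statistics import quantiles


def iqr_outliers(values):
    """Return the values lying below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _, q3 = quantiles(values, n=4)   # default method="exclusive"
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical content-uniformity results (% of label claim) for one batch.
batch = [98.7, 99.4, 100.2, 99.9, 100.8, 101.1, 99.6, 100.4, 100.0, 94.2]
print(iqr_outliers(batch))
```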
<p>Although boxplots are useful tools to conduct a visual assessment, we can also use an equal variances test to assess statistically whether the within-batch variation differs significantly from batch to batch. The test reveals a p-value greater than an alpha-level of 0.05 (or whatever alpha-level you prefer), so there is no significant evidence that the batch variances differ, which supports the conclusion that there is consistency between batches.</p>
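<p>Equal variances tests come in several versions. One robust choice is Levene's test in its median-centered (Brown-Forsythe) form, which runs a one-way ANOVA on each observation's absolute deviation from its group median. The Python sketch below computes only that test statistic and its degrees of freedom; converting it to a p-value requires the F distribution, which is left to statistical software. The batch data are hypothetical, and which test variants Minitab reports depends on its version and options.</p>

```python
from statistics import fmean, median


def brown_forsythe(groups):
    """Levene's statistic using deviations from group medians (Brown-Forsythe variant).

    Returns (W, df1, df2); under equal variances W approximately follows F(df1, df2).
    """
    deviations = [[abs(x - median(g)) for x in g] for g in groups]
    k = len(groups)
    n_total = sum(len(d) for d in deviations)
    grand_mean = fmean([z for d in deviations for z in d])
    group_means = [fmean(d) for d in deviations]
    # One-way ANOVA on the absolute deviations.
    between = sum(len(d) * (m - grand_mean) ** 2 for d, m in zip(deviations, group_means))
    within = sum((z - m) ** 2 for d, m in zip(deviations, group_means) for z in d)
    w = (n_total - k) / (k - 1) * between / within
    return w, k - 1, n_total - k


# Hypothetical content-uniformity results for three batches.
batches = [[99.1, 100.3, 98.8, 100.9, 99.7],
           [100.2, 99.5, 101.0, 99.9, 100.6],
           [98.9, 100.1, 99.4, 100.8, 99.8]]
w, df1, df2 = brown_forsythe(batches)
print(f"W = {w:.3f} with ({df1}, {df2}) degrees of freedom")
```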
<span style="color:#008080;"><strong>Example: Various Applications for Tolerance Intervals</strong></span>
<p>Another useful tool for Process Qualification is the tolerance interval. This tool has multiple applications. For example, tolerance intervals can be used to compare your process to specifications, profile the outcome of a process, or establish acceptance criteria.</p>
<p>For a given product characteristic, a tolerance interval provides a range of values that likely covers a specified proportion of the population (for example, 95%) for a specified confidence level (like 99%).</p>
<p>For example, suppose we want to know how the active ingredient values in the manufacturing process compare to our specification limits. Based on a dose-response study, the limits are 360 to 370 mg/mL.</p>
<p><img alt="Tolerance Interval" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/9a8a69e60f00528975c7b40d52fb8206/toleranceinterval.jpg" style="width: 400px; height: 267px; margin-left: 15px; margin-right: 15px;" /></p>
<p>For this particular data set, Minitab reveals that we can be 99% confident that at least 95% of the units will be between 362.272 and 367.468 mg/mL. These bounds fall inside the requirements of 360 to 370 mg/mL, so we can conclude with high confidence that the process variation is less than the allowable variation, defined by the specification limits.</p>
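<p>For normally distributed data, a two-sided tolerance interval has the form mean ± k·s, where the factor k depends on the sample size, the coverage, and the confidence level. The Python sketch below approximates k with Howe's method, using the Wilson-Hilferty approximation for the chi-square quantile it needs. This is a swapped-in approximation, not necessarily the calculation Minitab performs, so its factors can differ slightly from Minitab's, especially for very small samples.</p>

```python
import math
from statistics import NormalDist, fmean, stdev


def chi2_quantile_wh(prob, df):
    """Wilson-Hilferty approximation to the chi-square quantile with lower-tail probability prob."""
    z = NormalDist().inv_cdf(prob)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


def howe_k_factor(n, coverage, confidence):
    """Approximate two-sided normal tolerance factor (Howe's method)."""
    z = NormalDist().inv_cdf((1.0 + coverage) / 2.0)
    chi2_low = chi2_quantile_wh(1.0 - confidence, n - 1)
    return z * math.sqrt((n - 1) * (1.0 + 1.0 / n) / chi2_low)


def tolerance_interval(data, coverage=0.95, confidence=0.99):
    """Approximate two-sided normal tolerance interval: mean +/- k * s."""
    k = howe_k_factor(len(data), coverage, confidence)
    center, s = fmean(data), stdev(data)
    return center - k * s, center + k * s


# The factor for a 30-tablet sample at 99% coverage and 99% confidence:
print(f"k = {howe_k_factor(30, 0.99, 0.99):.3f}")
```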
<p>Or perhaps we need to assess content uniformity using 99% confidence and 99% coverage. We sample 30 tablets and use Minitab to calculate a tolerance interval: a range that, with 99% confidence, contains the content uniformity values of at least 99% of the tablets.</p>
<p>And that’s how you can use various statistical tools to support Process Qualification. In the final post in this series, we’ll explore the Continued Process Verification stage!</p>
Capability AnalysisData AnalysisQuality ImprovementStatisticsStatistics HelpStatsFri, 03 Feb 2017 13:00:00 +0000http://blog.minitab.com/blog/michelle-paret/statistical-tools-for-process-validation-stage-2-process-qualificationMichelle ParetHow to Use Data to Understand and Resolve Differences in Opinion, Part 3
http://blog.minitab.com/blog/understanding-statistics/how-to-use-data-to-understand-and-resolve-differences-in-opinion-part-3
<p>In the first part of this series, we saw how <a href="http://blog.minitab.com/blog/understanding-statistics/how-to-use-data-to-understand-and-resolve-differences-in-opinion-part-1">conflicting opinions about a subjective factor</a> can create business problems. In part 2, we used Minitab's Assistant feature to <a href="http://blog.minitab.com/blog/understanding-statistics/how-to-use-data-to-understand-and-resolve-differences-in-opinion-part-2">set up an attribute agreement analysis study</a> that will provide a better understanding of where and when such disagreements occur. </p>
<p>We asked four loan application reviewers to reject or approve 30 selected applications, two times apiece. Now that we've collected that data, we can analyze it. If you'd like to follow along, you can download the data set <a href="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6999a9517a572ff2fc3df681c36b3e44/loan_application_attribute_agreement_analysis.mtw">here</a>.</p>
<p>As is so often the case, you don't need statistical software to do this analysis, but with 240 data points to contend with, a computer and software such as <a href="http://www.minitab.com/products/minitab">Minitab</a> will make it much easier. </p>
Entering the Attribute Agreement Analysis Study Data
<p>Last time, we showed that the only data we need to record is whether each appraiser approved or rejected the sample application in each case. Using the data collection forms and the worksheet generated by Minitab, it's very easy to fill in the Results column of the worksheet. </p>
<p style="margin-left: 40px;"><img alt="attribute agreement analysis worksheet data entry" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/41387399781177418d2cc236755a4f41/attribute_agreement_worksheet_data_entry.png" style="width: 448px; height: 324px;" /></p>
Analyzing the Attribute Agreement Analysis Data
<p>The next step is to use statistics to better understand how well the reviewers agree with each other's assessments, and how consistently they judge the same application when they evaluate it again. Choose <strong>Assistant > Measurement Systems Analysis (MSA)...</strong> and press the <em>Attribute Agreement Analysis</em> button to bring up the appropriate dialog box: </p>
<p style="margin-left: 40px;"><img alt="attribute agreement analysis assistant selection" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/36dfe0d806a026f66083efd4e4e8e3be/assistant_msa_dialog.png" style="width: 500px; height: 393px;" /></p>
<p>The resulting dialog couldn't be easier to fill out. Assuming you used the Assistant to create your worksheet, just select the columns that correspond to each item in the dialog box, as shown: </p>
<p style="margin-left: 40px;"><img alt="attribute agreement analysis dialog box" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a3b70ac28bd2783a6414e6f3ed6583ae/attribute_agreement_analysis_dialog.png" style="width: 500px; height: 285px;" /></p>
<p>If you set up your worksheet manually, or renamed the columns, just choose the appropriate column for each item. Select the value for good or acceptable items ("Accept," in this case), then press OK to analyze the data. </p>
Interpreting the Results of the Attribute Agreement Analysis
<p>Minitab's Assistant generates four reports as part of its attribute agreement analysis. The first is a summary report, shown below: </p>
<p style="margin-left: 40px;"><img alt="attribute agreement analysis summary report" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b7d0dab58d734d058f6a34830d81f0af/attribute_agreement_analysis_summary_report.png" style="width: 600px; height: 471px;" /></p>
<p>The green bar at top left of the report indicates that overall, the error rate of the application reviewers is 15.8%. That's not as bad as it could be, but it certainly indicates that there's room for improvement! The report also shows that 13% of the time, the reviewers rejected applications that should be accepted, and they accepted applications that should be rejected 18% of the time. In addition, the reviewers rated the same item two different ways almost 22% of the time.</p>
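<p>These summary rates are simple proportions that can be computed directly from the study records. The Python sketch below shows one way to do it, assuming each record holds a reviewer, an application, a trial number, the reviewer's rating, and the standard (the loan officers' rating). The record layout, the exact definitions (for example, counting an item as "rated both ways" when a reviewer's two ratings differ), and the tiny data set are hypothetical; the Assistant's report card describes how Minitab itself calculates its percentages.</p>

```python
from collections import defaultdict


def agreement_summary(records):
    """Summarize attribute agreement records.

    Each record is (appraiser, item, trial, rating, standard), where ratings
    and standards are "Accept" or "Reject".
    """
    total = len(records)
    errors = sum(r[3] != r[4] for r in records)
    accept_std = [r for r in records if r[4] == "Accept"]
    reject_std = [r for r in records if r[4] == "Reject"]
    # Collect the distinct ratings each appraiser gave each item across trials.
    ratings = defaultdict(set)
    for appraiser, item, _trial, rating, _std in records:
        ratings[(appraiser, item)].add(rating)
    mixed = sum(len(r) > 1 for r in ratings.values())
    return {
        "error rate": errors / total,
        "accept items rejected": sum(r[3] == "Reject" for r in accept_std) / len(accept_std),
        "reject items accepted": sum(r[3] == "Accept" for r in reject_std) / len(reject_std),
        "rated both ways": mixed / len(ratings),
    }


# Hypothetical records: one reviewer, two applications, two trials each.
study = [
    ("Reviewer A", "App 1", 1, "Accept", "Accept"),
    ("Reviewer A", "App 1", 2, "Reject", "Accept"),
    ("Reviewer A", "App 2", 1, "Reject", "Reject"),
    ("Reviewer A", "App 2", 2, "Reject", "Reject"),
]
for name, rate in agreement_summary(study).items():
    print(f"{name}: {rate:.0%}")
```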
<p>The bar graph in the lower left indicates that Javier and Julia have the lowest accuracy percentages among the reviewers at 71.7% and 78.3%, respectively. Jim has the highest accuracy, with 96%, followed by Jill at 90%.</p>
<p>The second report from the Assistant, shown below, provides a graphic summary of the accuracy rates for the analysis.</p>
<p style="margin-left: 40px;"><img alt="attribute agreement analysis accuracy report" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e0797dc24089c1624fdc9cd4f881b391/attribute_agreement_analysis_accuracy_report.png" style="width: 600px; height: 471px;" /></p>
<p>This report illustrates the 95% confidence intervals for each reviewer in the top left, and further breaks them down by standard (accept or reject) in the graphs on the right side of the report. When two intervals don't overlap, the corresponding accuracy rates are likely to be different. We can see that Javier and Jim have different overall accuracy percentages. In addition, Javier and Jim have different accuracy percentages when it comes to assessing those applications that should be rejected. However, most of the other confidence intervals overlap, suggesting that the reviewers share similar abilities. Javier clearly has the most room for improvement, but none of the reviewers are performing terribly when compared to the others. </p>
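<p>The accuracy intervals in this report are confidence intervals for a proportion. As a rough illustration, the Python sketch below computes Wilson score intervals for accuracy counts consistent with the reported percentages, assuming each reviewer made 60 assessments (30 applications rated twice): 43 of 60 correct is 71.7% (Javier), 47 of 60 is 78.3% (Julia), and 54 of 60 is 90% (Jill). Minitab may use a different interval (for example, an exact binomial interval), so these limits will not match its report exactly, and checking for overlap is only an informal comparison.</p>

```python
import math
from statistics import NormalDist


def wilson_interval(successes, trials, confidence=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = z / denom * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials))
    return center - half, center + half


def overlaps(a, b):
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


accuracy_counts = {"Javier": 43, "Julia": 47, "Jill": 54}   # correct out of 60
intervals = {name: wilson_interval(c, 60) for name, c in accuracy_counts.items()}
for name, (low, high) in intervals.items():
    print(f"{name:6s}: {accuracy_counts[name] / 60:.1%}  95% CI ({low:.1%}, {high:.1%})")
print("Javier vs Jill overlap:", overlaps(intervals["Javier"], intervals["Jill"]))
```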
<p>The Assistant's third report shows the most frequently misclassified items, and individual reviewers' misclassification rates:</p>
<p style="margin-left: 40px;"><img alt="attribute agreement analysis misclassification report" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/42c982c39fb1eb3835fe1382d4ccc1f0/attribute_agreement_analysis_misclassification_report.png" style="width: 600px; height: 471px;" /></p>
<p>This report shows that App 9 gave the reviewers the most difficulty, as it was misclassified almost 80% of the time. (A check of the application revealed that this was indeed a borderline application, so the fact that it proved challenging is not surprising.) Among the reject applications that were mistakenly accepted, App 5 was misclassified about half of the time. </p>
<p>The individual appraiser misclassification graphs show that Javier and Julia both misclassified acceptable applications as rejects about 20% of the time, but Javier accepted "reject" applications nearly 40% of the time, compared to roughly 20% for Julia. However, Julia rated items both ways nearly 40% of the time, compared to 30% for Javier. </p>
<p>The last item produced as part of the Assistant's analysis is the report card:</p>
<p style="margin-left: 40px;"><img alt="attribute agreement analysis report card" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/01e31b93979be4847836cfa28fc176bc/attribute_agreement_analysis_report_card.png" style="width: 600px; height: 471px;" /></p>
<p>This report card provides general information about the analysis, including how accuracy percentages are calculated. It also can alert you to potential problems with your analysis (for instance, an imbalance between the numbers of acceptable and rejectable items being evaluated); in this case, there are no alerts we need to be concerned about. </p>
Moving Forward from the Attribute Agreement Analysis
<p>The results of this attribute agreement analysis give the bank a clear indication of how the reviewers can improve their overall accuracy. Based on the results, the loan department provided additional training for Javier and Julia (who also were the least experienced reviewers on the team), and also conducted a general review session for all of the reviewers to refresh their understanding about which factors on an application were most important. </p>
<p>However, training may not always solve problems with inconsistent assessments. In many cases, the criteria on which decisions should be based are either unclear or nonexistent. "Use your common sense" is not a defined guideline! In this case, the loan officers decided to create very specific checklists that the reviewers could refer to when they encountered borderline cases. </p>
<p>After the additional training sessions were complete and the new tools were implemented, the bank conducted a second attribute agreement analysis, which verified improvements in the reviewers' accuracy. </p>
<p>If your organization is challenged by honest disagreements over "judgment calls," an attribute agreement analysis may be just the tool you need to get everyone back on the same page. </p>
Data AnalysisLean Six SigmaQuality ImprovementSix SigmaStatisticsMon, 30 Jan 2017 13:04:00 +0000http://blog.minitab.com/blog/understanding-statistics/how-to-use-data-to-understand-and-resolve-differences-in-opinion-part-3Eston MartzHow to Use Data to Understand and Resolve Differences in Opinion, Part 2
http://blog.minitab.com/blog/understanding-statistics/how-to-use-data-to-understand-and-resolve-differences-in-opinion-part-2
<p>Previously, I discussed how business <a href="http://blog.minitab.com/blog/understanding-statistics/how-to-use-data-to-understand-and-resolve-differences-in-opinion-part-1">problems arise when people have conflicting opinions about a subjective factor</a>, such as whether something is the right color or not, or whether a job applicant is qualified for a position. The key to resolving such honest disagreements and handling future decisions more consistently is a statistical tool called attribute agreement analysis. In this post, we'll cover how to set up and conduct an attribute agreement analysis. </p>
Does This Applicant Qualify, or Not?
<p>A busy loan office for a major financial institution processed many applications each day. A team of four reviewers inspected each application and categorized it as Approved, in which case it went on to a loan officer for further handling, or Rejected, in which case the applicant received a polite note declining to fulfill the request. <img alt="filling out an application" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/64918ddbce0abb888c85031374765491/filling_out_paper.png" style="width: 300px; height: 223px; margin: 10px 15px; float: right; border-width: 1px; border-style: solid;" /></p>
<p>The loan officers began noticing inconsistency in approved applications, so the bank decided to conduct an attribute agreement analysis on the application reviewers.</p>
<p>Two outcomes were possible: </p>
<p style="margin-left: 40px;"><strong>1. The reviewers make the right choice most of the time.</strong> If this is the case, loan officers can be confident that the reviewers do a good job, rejecting risky applicants and approving applicants with potential to be good borrowers. </p>
<p style="margin-left: 40px;"><strong>2. The reviewers too often choose incorrectly.</strong> In this case, the loan officers might not be focusing their time on the best applications, and some qualified applicants may be incorrectly rejected. </p>
<p>One particularly useful thing about an attribute agreement analysis: even if reviewers make the wrong choice too often, the results will indicate where the reviewers make mistakes. The bank can then use that information to help improve the reviewers' performance. </p>
The Basic Structure of an Attribute Agreement Analysis
<p>A typical attribute agreement analysis asks individual appraisers to evaluate multiple samples, which have been selected to reflect the range of variation they are likely to observe. The appraisers review each sample item several times, so the analysis reveals not only how well individual appraisers agree with each other, but also how consistently each appraiser evaluates the same item. </p>
<p>For this study, the loan officers selected 30 applications, half of which the officers agreed should receive approval and half of which should be rejected. These included both obvious and borderline applications. </p>
<p>Next, each of the four reviewers was asked to approve or reject the 30 applications two times. These evaluation sessions took place one week apart, to make it less likely they would remember how they'd classified them the first time. The applications were randomly ordered each time.</p>
<p>The reviewers did not know how the applications had been rated by the loan officers. In addition, they were asked not to talk about the applications until after the analysis was complete, to avoid biasing one another. </p>
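<p>A randomized design like this one is easy to generate. The Python sketch below builds a run order in which every reviewer sees all 30 applications in each of two sessions, shuffled independently for each reviewer and session, similar in spirit to the randomized worksheet Minitab can create. The function name, the placeholder reviewer names, the seed, and the row layout are assumptions for illustration.</p>

```python
import random


def attribute_study_run_order(appraisers, items, trials, seed=None):
    """Return (session, appraiser, item) rows with items shuffled per appraiser and session."""
    rng = random.Random(seed)
    rows = []
    for trial in range(1, trials + 1):
        for appraiser in appraisers:
            order = list(items)
            rng.shuffle(order)   # a fresh random order for each reviewer and session
            rows.extend((trial, appraiser, item) for item in order)
    return rows


reviewers = ["Reviewer A", "Reviewer B", "Reviewer C", "Reviewer D"]
applications = [f"App {i}" for i in range(1, 31)]
worksheet = attribute_study_run_order(reviewers, applications, trials=2, seed=1)
print(len(worksheet))   # 240 rows: 4 reviewers x 30 applications x 2 sessions
print(worksheet[:3])
```

<p>Fixing the seed makes the run order reproducible, which helps if the data collection forms need to be reprinted.</p>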
Using Software to Set Up the Attribute Agreement Analysis
<p>You don't <em>need </em>to use software to perform an Attribute Agreement Analysis, but a program like <a href="http://www.minitab.com/products/minitab">Minitab</a> makes it easier to plan the study, gather the data, and analyze the data once you have it. There are two ways to set up your study in Minitab. </p>
<p>The first way is to go to <strong>Stat > Quality Tools > Create Attribute Agreement Analysis Worksheet...</strong> as shown here: </p>
<p style="margin-left: 40px;"><img alt="create attribute agreement analysis worksheet" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3925c8ebdb8bb03a73638f78dfbc0c3d/attribute_agreement_stat_menu.png" style="width: 510px; height: 495px;" /></p>
<p>This option calls up an easy-to-follow dialog box that will set up your study, randomize the order of reviewer evaluations, and permit you to print out data collection forms for each evaluation session. </p>
<p>But it's even easier to use Minitab's Assistant. In the menu, select <strong>Assistant > Measurement Systems Analysis...</strong>, then click the <em>Attribute Agreement Worksheet</em> button:</p>
<p style="margin-left: 40px;"><img alt="Assistant MSA Dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/78f07ea29a7339b689361b91f602de45/assistant_msa_dialog1.png" style="width: 500px; height: 393px;" /></p>
<p>That brings up the following dialog box, which walks you through setting up your worksheet and printing out data collection forms, if desired. For this analysis, the Assistant dialog box is filled out as shown here: </p>
<p style="margin-left: 40px;"><img alt="Create Attribute Agreement Analysis Worksheet" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/bae016bb52b30ab527fc82f471ae056f/attribute_agreement_setup_dialog.png" style="width: 500px; height: 492px;" /></p>
<p>After you press OK, Minitab creates a worksheet for you and gives you the option to print out data collection forms for each reviewer and each trial. As you can see in the "Test Items" column below, Minitab randomizes the order of the observed items in each trial automatically, and the worksheet is arranged so you need only enter the reviewers' judgments in the "Results" column. </p>
<p style="margin-left: 40px;"><img alt="attribute agreement analysis worksheet" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/697bab496ec1eebf6dfb10ba4a27b15f/attribute_worksheet.png" style="width: 451px; height: 475px;" /></p>
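<p>Minitab builds this worksheet for you, but it can help to see what the randomization involves. The sketch below gives each reviewer every application once per trial, in a fresh random order, with an empty result field to fill in later. The <code>WorksheetRow</code> class and <code>build_worksheet</code> function are hypothetical, and Minitab's actual column layout and randomization scheme may differ.</p>

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorksheetRow:
    reviewer: str
    trial: int
    test_item: int    # which application the reviewer sees next
    result: str = ""  # to be filled in later with "approve" or "reject"


def build_worksheet(reviewers: List[str], n_items: int, n_trials: int,
                    seed: Optional[int] = None) -> List[WorksheetRow]:
    """Give every reviewer every item once per trial, in a fresh random order."""
    rng = random.Random(seed)
    rows: List[WorksheetRow] = []
    for trial in range(1, n_trials + 1):
        for reviewer in reviewers:
            order = list(range(1, n_items + 1))
            rng.shuffle(order)  # new random order for each reviewer and trial
            rows.extend(WorksheetRow(reviewer, trial, item) for item in order)
    return rows


# The study in this post: 4 reviewers, 30 applications, 2 trials.
sheet = build_worksheet([f"Reviewer {k}" for k in range(1, 5)], 30, 2, seed=2017)
print(len(sheet))  # 4 reviewers x 30 items x 2 trials = 240 rows
for row in sheet[:5]:
    print(row)
```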
<p>In my next post, we'll analyze the data collected in this attribute agreement analysis. </p>
Data Analysis, Lean Six Sigma, Quality Improvement, Six Sigma, Statistics
Mon, 23 Jan 2017 13:03:00 +0000
http://blog.minitab.com/blog/understanding-statistics/how-to-use-data-to-understand-and-resolve-differences-in-opinion-part-2
Eston Martz
DMAIC Tools and Techniques: The Measure Phase
http://blog.minitab.com/blog/michelle-paret/dmaic-tools-and-techniques%3A-the-measure-phase
<p>In my last post on <a href="http://blog.minitab.com/blog/michelle-paret/dmaic-tools-and-techniques:-the-define-phase">DMAIC tools for the Define phase</a>, we reviewed various graphs and stats typically used to <em>define</em> project goals and customer deliverables. Let’s now move along to the tools you can use in <a href="http://www.minitab.com/products/minitab/">Minitab Statistical Software</a> to conduct the Measure phase.</p>
Measure Phase Methodology
<p>The goal of this phase is to <em>measure</em> the process to determine its current performance and quantify the problem. This includes validating the measurement system and establishing a baseline process capability (i.e., sigma level).</p>
I. Tools for Continuous Data
<strong><img alt="Gage RandR" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/6ff6ed7f4c0940a9eb1a548487b72b2b/gagerr.jpg" style="width: 350px; height: 263px; float: right; margin: 10px 15px;" /></strong>
Gage R&R
<p>Before you analyze your data, you should first make sure you can trust it, which is why successful Lean Six Sigma projects begin the Measure phase with Gage R&R. This measurement systems analysis tool assesses whether measurements are both <a href="http://blog.minitab.com/blog/michelle-paret/do-you-know-the-truth-about-gage-repeatability-and-reproducibility">repeatable and reproducible</a>. Minitab offers Gage R&R studies for both <a href="http://blog.minitab.com/blog/michelle-paret/a-simple-guide-to-gage-randr-for-destructive-testing">destructive and non-destructive tests</a>.</p>
<p>Minitab location:<strong> </strong><strong><em>Stat > Quality Tools > Gage Study > Gage R&R Study</em></strong> OR <strong><em>Assistant > Measurement Systems Analysis</em>.</strong></p>
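<p>Gage R&R splits measurement variation into repeatability (the same operator measuring the same part repeatedly) and reproducibility (differences between operators). The sketch below uses the standard two-way crossed ANOVA method to estimate those variance components from a hypothetical <code>data[part][operator]</code> list of repeat readings. It always keeps the part-by-operator interaction term (Minitab can drop that term when it is not significant) and truncates negative variance estimates at zero, so treat it as an illustration rather than a replica of Minitab's output.</p>

```python
from statistics import mean
from typing import Dict, List


def gage_rr_anova(data: List[List[List[float]]]) -> Dict[str, float]:
    """Variance components for a crossed study: data[part][operator] = repeat readings."""
    p, o, r = len(data), len(data[0]), len(data[0][0])
    if p < 2 or o < 2 or r < 2:
        raise ValueError("need at least 2 parts, 2 operators, and 2 replicates")

    grand = mean(x for part in data for cell in part for x in cell)
    part_means = [mean(x for cell in part for x in cell) for part in data]
    oper_means = [mean(x for part in data for x in part[j]) for j in range(o)]
    cell_means = [[mean(cell) for cell in part] for part in data]

    # Sums of squares for the two-way crossed ANOVA with interaction.
    ss_part = o * r * sum((m - grand) ** 2 for m in part_means)
    ss_oper = p * r * sum((m - grand) ** 2 for m in oper_means)
    ss_int = r * sum(
        (cell_means[i][j] - part_means[i] - oper_means[j] + grand) ** 2
        for i in range(p) for j in range(o)
    )
    ss_err = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(p) for j in range(o) for x in data[i][j]
    )

    ms_part = ss_part / (p - 1)
    ms_oper = ss_oper / (o - 1)
    ms_int = ss_int / ((p - 1) * (o - 1))
    ms_err = ss_err / (p * o * (r - 1))

    # Expected-mean-square estimates; negative estimates are truncated at zero.
    repeatability = ms_err
    interaction = max(0.0, (ms_int - ms_err) / r)
    operator = max(0.0, (ms_oper - ms_int) / (p * r))
    part_to_part = max(0.0, (ms_part - ms_int) / (o * r))

    reproducibility = operator + interaction
    total_gage = repeatability + reproducibility
    total = total_gage + part_to_part
    return {
        "repeatability": repeatability,
        "reproducibility": reproducibility,
        "gage_rr": total_gage,
        "part_to_part": part_to_part,
        "pct_contribution_gage_rr": 100.0 * total_gage / total,
    }


# Hypothetical study: 2 parts x 2 operators x 2 repeat readings.
readings = [
    [[10.0, 12.0], [10.0, 12.0]],  # part 1, operators A and B
    [[20.0, 22.0], [20.0, 22.0]],  # part 2, operators A and B
]
print(gage_rr_anova(readings))
```

<p>In this toy data the two operators agree perfectly, so all of the gage variation comes from repeatability. A real study would use more parts (typically around 10) so the part-to-part estimate is meaningful.</p>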
Gage Linearity and Bias
<p>When assessing the validity of our data, we need to consider both <a href="http://blog.minitab.com/blog/real-world-quality-improvement/accuracy-vs-precision-whats-the-difference">precision and accuracy</a>. While Gage R&R assesses precision, it’s Gage Linearity and Bias that tells us whether our measurements are accurate or biased.</p>
<p>Minitab location: <em><strong>Stat > Quality Tools > Gage Study > Gage Linearity and Bias Study</strong>.</em></p>
<p style="margin-left: 40px;"><img alt="Gage Linearity and Bias" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/85a16583e2d97dd638b6ff21071a61dd/gage_linearity_and_bias.jpg" style="width: 350px; height: 263px;" /></p>
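<p>Bias is the difference between a measured value and a known reference value, and linearity describes whether that bias changes across the measurement range. A minimal sketch, assuming you have reference values for a set of parts and repeated measurements of each, computes the average bias, the bias at each reference value, and a least-squares line of bias against reference value; a slope near zero suggests the bias is constant across the range. The data are hypothetical, and Minitab's report includes further statistics (such as p-values) that this sketch omits.</p>

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def linearity_and_bias(reference: List[float], measured: List[float]) -> Dict[str, object]:
    """Average bias, bias per reference value, and the bias-vs-reference line."""
    biases = [m - ref for ref, m in zip(reference, measured)]

    per_reference = defaultdict(list)
    for ref, b in zip(reference, biases):
        per_reference[ref].append(b)

    # Ordinary least squares: bias = intercept + slope * reference.
    x_bar, y_bar = mean(reference), mean(biases)
    sxx = sum((x - x_bar) ** 2 for x in reference)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(reference, biases))
    slope = sxy / sxx
    return {
        "average_bias": y_bar,
        "bias_by_reference": {ref: mean(bs) for ref, bs in sorted(per_reference.items())},
        "slope": slope,
        "intercept": y_bar - slope * x_bar,
    }


# Hypothetical gage that reads increasingly high at larger reference values.
reference = [2.0, 2.0, 4.0, 4.0, 6.0, 6.0, 8.0, 8.0]
measured = [2.0, 2.0, 5.0, 5.0, 8.0, 8.0, 11.0, 11.0]
print(linearity_and_bias(reference, measured))
```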
Distribution Identification
<p>Many statistical tools and p-values assume that your data follow a specific distribution, commonly the normal distribution, so it’s good practice to assess the distribution of your data before analyzing it. And if your data don’t follow a normal distribution, don’t worry: there are various <a href="http://www.minitab.com/en-us/lp/Non-Normal-Data-Tips-And-Tricks">techniques for analyzing non-normal data</a>.</p>
<p>Minitab location: <strong><em>Stat > Basic Statistics > Normality Test</em></strong> OR <strong><em>Stat > Quality Tools > Individual Distribution Identification.</em></strong></p>
<p style="margin-left: 40px;"><img alt="Distribution Identification" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/1e6b3763f36f991cf5cf1eb142b0f8d0/distribution_id_plot.jpg" style="width: 350px; height: 233px;" /></p>
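<p>One of the tests Minitab's Normality Test offers is Anderson-Darling, which measures how far the data's empirical distribution departs from a fitted normal distribution; larger values indicate a poorer fit. The sketch below computes the statistic by hand, using the mean and standard deviation estimated from the data, along with a commonly used small-sample adjustment. It stops short of the p-value that Minitab reports, and both data sets are constructed purely for illustration.</p>

```python
import math
from statistics import NormalDist, mean, stdev
from typing import List, Tuple


def anderson_darling_normal(data: List[float]) -> Tuple[float, float]:
    """Return (A^2, adjusted A^2) for a normal fit with estimated mean and std dev."""
    x = sorted(data)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    eps = 1e-12  # clamp CDF values so extreme points never produce log(0)
    cdf = [min(max(fitted.cdf(v), eps), 1.0 - eps) for v in x]
    # A^2 = -n - (1/n) * sum_{i=1..n} (2i - 1) * [ln F(x_i) + ln(1 - F(x_{n+1-i}))]
    total = sum(
        (2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1.0 - cdf[n - i]))
        for i in range(1, n + 1)
    )
    a2 = -n - total / n
    return a2, a2 * (1 + 0.75 / n + 2.25 / n ** 2)


# Constructed data: evenly spaced normal quantiles vs. right-skewed exponential quantiles.
n_points = 40
normal_like = [NormalDist(50, 5).inv_cdf((k - 0.5) / n_points) for k in range(1, n_points + 1)]
skewed = [-10 * math.log(1 - (k - 0.5) / n_points) for k in range(1, n_points + 1)]
print(anderson_darling_normal(normal_like)[1], anderson_darling_normal(skewed)[1])
```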
Capability Analysis
<p>Capability analysis is arguably the crux of “Six Sigma” because it’s the tool for calculating your sigma level. Is your process at a 1 Sigma, 2 Sigma, etc.? It reveals just how good or bad a process is relative to specification limit(s). And in the Measure phase, it’s important to use this tool to establish a baseline before making any improvements.</p>
<p>Minitab location: <strong><em>Stat > Quality Tools > Capability Analysis/Sixpack</em><em> </em></strong>OR <strong><em>Assistant > Capability Analysis.</em></strong></p>
<p style="margin-left: 40px;"><img alt="Process Capability Analysis" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/7f8e9183ad3a5b3ee66e0fadca51aea4/process_capability_sixpack_report.jpg" style="width: 350px; height: 263px;" /></p>
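<p>A minimal sketch of the normal-theory calculations behind a capability analysis, assuming roughly normal data and two-sided specification limits. It uses the overall sample standard deviation, which corresponds to the Pp and Ppk indices; Minitab's Cp and Cpk instead use a within-subgroup estimate of sigma, so the numbers will differ for real subgrouped data. <code>Z_bench</code> is the standard normal value with the same total out-of-spec probability, one common way to express a sigma level (conventions differ on whether to add a 1.5-sigma shift). The function name and data are hypothetical.</p>

```python
from statistics import NormalDist, mean, stdev
from typing import Dict, List


def overall_capability(data: List[float], lsl: float, usl: float) -> Dict[str, float]:
    """Normal-theory capability indices using the overall (sample) standard deviation."""
    mu, s = mean(data), stdev(data)
    z = NormalDist()
    p_below = z.cdf((lsl - mu) / s)        # expected fraction below LSL
    p_above = 1.0 - z.cdf((usl - mu) / s)  # expected fraction above USL
    p_out = p_below + p_above
    return {
        "Pp": (usl - lsl) / (6 * s),
        "Ppk": min(mu - lsl, usl - mu) / (3 * s),
        "PPM_total": 1e6 * p_out,
        "Z_bench": z.inv_cdf(1.0 - p_out),
    }


# Hypothetical baseline data (mean 10, std dev 1) with specification limits of 7 and 13.
# A real baseline would use far more observations.
baseline = [9.0, 9.0, 10.0, 11.0, 11.0]
print(overall_capability(baseline, lsl=7.0, usl=13.0))
```

<p>With the process centered and the limits three standard deviations away on each side, Pp and Ppk both equal 1 and roughly 2,700 parts per million are expected out of specification.</p>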
II. Tools for Categorical (Attribute) Data
Attribute Agreement Analysis
<strong><img alt="Attribute Agreement Analysis" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/5d1759e9ef4da886e677bb7c2a7b2c79/attribute_agreement_analysis.jpg" style="width: 300px; height: 233px; float: right; margin: 10px 15px;" /></strong>
<p>Like Gage R&R and Gage Linearity and Bias studies mentioned above for continuous measurements, this tool helps you <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/the-lady-tasting-beer-evaluating-a-gono-go-gage-part-ii">assess if you can trust categorical measurements</a>, such as pass/fail ratings. This tool is available for <a href="http://blog.minitab.com/blog/adventures-in-statistics-2/understanding-and-using-discrete-distributions">binary, ordinal, and nominal data types</a>.</p>
<p>Minitab location: <strong><em>Stat > Quality Tools > Attribute Agreement Analysis</em> </strong>OR <strong><em>Assistant > Measurement Systems Analysis.</em></strong></p>
Capability Analysis (Binomial and Poisson)
<p>If you’re counting the number of defective items, where each item is classified as either pass/fail, go/no-go, etc., and you want to compute parts per million (PPM) defective, then you can use binomial capability analysis to assess the current state of the process.</p>
<p>Or if you’re counting the number of defects, where each item can have multiple flaws, then you can use Poisson capability analysis to establish your baseline performance.</p>
<p>Minitab location:<em> <strong>Stat > Quality Tools > Capability Analysis</strong></em> OR <strong><em>Assistant > Capability Analysis.</em></strong></p>
<p style="margin-left: 40px;"><img alt="Binomial Process Capability" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/4aad5a79836d8105d3adba60388b16b1/binomial_process_capability.jpg" style="width: 350px; height: 263px;" /></p>
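<p>The baseline numbers for both cases come from simple ratios. The sketch below computes the overall proportion defective (and its PPM and an equivalent Z value, the standard normal quantile at 1 minus the proportion defective) for pass/fail data, and the mean defects per unit (DPU) for defect counts. The function names and sample data are hypothetical, and Minitab's binomial and Poisson reports also include control charts and confidence intervals that this sketch leaves out.</p>

```python
from statistics import NormalDist
from typing import Dict, List


def binomial_capability(defectives: List[int], sample_sizes: List[int]) -> Dict[str, float]:
    """Pass/fail data: overall proportion defective and its equivalent Z value."""
    p = sum(defectives) / sum(sample_sizes)
    if not 0.0 < p < 1.0:
        raise ValueError("Z is undefined when the proportion defective is 0 or 1")
    return {
        "pct_defective": 100.0 * p,
        "ppm_defective": 1e6 * p,
        "process_z": NormalDist().inv_cdf(1.0 - p),
    }


def poisson_capability(defects: List[int], units: List[int]) -> Dict[str, float]:
    """Defect counts: mean defects per unit (DPU) across all samples."""
    return {"mean_dpu": sum(defects) / sum(units)}


# Hypothetical baseline: three samples of 100 items, and three samples of 10 units.
print(binomial_capability([2, 3, 5], [100, 100, 100]))
print(poisson_capability([4, 6, 5], [10, 10, 10]))
```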
Variation is Everywhere
<p>As I mentioned in my last post on the Define phase, Six Sigma projects can vary. Not every project uses the same tool set, so the tools above serve only as a guide to the types of analyses you may need. There are also other tools to consider, such as flowcharts to map the process, which you can create using Minitab’s cousin, <a href="http://www.minitab.com/products/quality-companion/">Quality Companion</a>.</p>
Capability Analysis, Data Analysis, Lean Six Sigma, Project Tools, Quality Improvement, Six Sigma, Statistics, Stats
Wed, 18 Jan 2017 13:00:00 +0000
http://blog.minitab.com/blog/michelle-paret/dmaic-tools-and-techniques%3A-the-measure-phase
Michelle Paret
How to Use Data to Understand and Resolve Differences in Opinion, Part 1
http://blog.minitab.com/blog/understanding-statistics/how-to-use-data-to-understand-and-resolve-differences-in-opinion-part-1
<p>People frequently have different opinions. Usually that's fine—if everybody thought the same way, life would be pretty boring—but many business decisions are based on opinion. And when different people in an organization reach different conclusions about the same business situation, problems follow. </p>
<img alt="difference of opinion" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ad85e799b88c440d589cfc6b82caef8f/honest_disagreement.png" style="width: 300px; height: 200px; margin: 10px 15px; float: right;" />
<div>
<p>Inconsistency and poor quality result when people being asked to make yes / no, pass / fail, and similar decisions don't share the same opinions, or base their decisions on divergent standards. Consider the following examples. </p>
<p style="margin-left: 40px;"><strong>Manufacturing:</strong> Is this part acceptable? </p>
<p style="margin-left: 40px;"><strong>Billing and Purchasing:</strong> Are we paying or charging an appropriate amount for this project? </p>
<p style="margin-left: 40px;"><strong>Lending:</strong> Does this person qualify for a new credit line? </p>
<p style="margin-left: 40px;"><strong>Supervising:</strong> Is this employee's performance satisfactory or unsatisfactory? </p>
<p style="margin-left: 40px;"><strong>Teaching:</strong> Are essays being graded consistently by teaching assistants?</p>
<p>It's easy to see how differences in judgment can have serious impacts. I wrote about a situation encountered by the recreational equipment manufacturer <a href="http://www.minitab.com/burley">Burley</a>. Pass/fail decisions of inspectors at a manufacturing facility in China began to conflict with those of inspectors at Burley's U.S. headquarters. To make sure no products reached the market unless the company's strict quality standards were met, Burley acted quickly to ensure that inspectors at both facilities were making consistent decisions about quality evaluations. </p>
Sometimes We <em>Can't </em>Just Agree to Disagree
<p>The challenge is that people can have honest differences of opinion about, well, nearly everything—including different aspects of quality. So how do you get people to make business decisions based on a common viewpoint, or standard?</p>
<p>Fortunately, there's a statistical tool that can help businesses and other organizations figure out how, where, and why people evaluate the same thing in different ways. From there, problematic inconsistencies can be minimized. Also, inspectors and others who need to make tough judgment calls can be confident they are basing their decisions on a clearly defined, agreed-upon set of standards. </p>
<p>That statistical tool is called "Attribute Agreement Analysis," and using it is easier than you might think—especially with <a href="http://www.minitab.com/products/minitab">data analysis software such as Minitab</a>. </p>
What Does "Attribute Agreement Analysis" Mean?
<p>Statistical terms can be confusing, but "attribute agreement analysis" is exactly what it sounds like: a tool that helps you gather and <em>analyze </em>data about how much <em>agreement </em>individuals have on a given <em>attribute</em>.</p>
<p>So, what is an attribute? Basically, any characteristic that entails a <span><a href="http://blog.minitab.com/blog/understanding-statistics/got-good-judgment-prove-it-with-attribute-agreement-analysis">judgment call</a></span>, or requires us to classify items as <em>this </em>or <em>that</em>. We can't measure an attribute with an objective scale like a ruler or thermometer. The following statements concern such attributes:</p>
<ul>
<li>This soup is <strong><a href="http://www.minitab.com/products/minitab/quick-start/soup/">spicy</a></strong>.</li>
<li>The bill for that repair is <strong>low</strong>. </li>
<li>That dress is <strong>red</strong>. </li>
<li>The carpet is <strong>rough</strong>. </li>
<li>That part is <strong>acceptable</strong>. </li>
<li>This candidate is <strong>unqualified</strong>. </li>
</ul>
<p>Attribute agreement analysis uses data to understand how different people assess a particular item's attribute and how consistently the same person assesses the same item on multiple occasions, and it compares both to the "right" assessment. </p>
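<p>Raw percent agreement can look good simply because people agree by chance, especially when there are only two categories such as pass and fail. Kappa statistics, which appear in Minitab's attribute agreement output, correct for that. Below is a from-scratch sketch of Fleiss' kappa for several raters and nominal categories: 1 means perfect agreement, 0 means no better than chance, and negative values mean worse than chance. The input layout and the inspectors' ratings are hypothetical.</p>

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[str]]) -> float:
    """ratings[i] lists every rating given to item i; each item needs the same count."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]
    categories = sorted({label for item in ratings for label in item})

    # Observed agreement: share of agreeing rater pairs on each item, averaged.
    p_bar = sum(
        (sum(c[cat] ** 2 for cat in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_items

    # Chance agreement from the overall share of ratings in each category.
    shares = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_e = sum(s ** 2 for s in shares)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating is in one category")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical: three inspectors rate four parts as "pass" or "fail".
parts = [
    ["pass", "pass", "pass"],
    ["fail", "fail", "fail"],
    ["pass", "pass", "fail"],
    ["fail", "fail", "fail"],
]
print(round(fleiss_kappa(parts), 3))
```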
<img alt="pass-fail" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/5868c6194234ef965e9320d10c7dab9e/pass_fail.png" style="width: 252px; height: 204px; margin: 10px 15px; float: right;" />
<p>This method can be applied to any situation where people need to appraise or rate things. In a typical quality improvement scenario, you might take a number of manufactured parts and ask multiple inspectors to assess each part more than once. The parts being inspected should include a roughly equal mix of good and bad items, which have been identified by an expert such as a senior inspector or supervisor. </p>
<p>In my next post, we'll look at an example from the financial industry to see how a loan department used this statistical method to make sure that applications for loans were accepted or rejected appropriately and consistently. </p>
</div>
Data Analysis, Lean Six Sigma, Manufacturing, Project Tools, Quality Improvement, Services, Six Sigma, Statistics
Mon, 16 Jan 2017 13:00:00 +0000
http://blog.minitab.com/blog/understanding-statistics/how-to-use-data-to-understand-and-resolve-differences-in-opinion-part-1
Eston Martz
Statistical Tools for Process Validation, Stage 1: Process Design
http://blog.minitab.com/blog/michelle-paret/statistical-tools-for-process-validation%2C-stage-1%3A-process-design
<p>Process validation is vital to the success of companies that manufacture drugs and biological products for people and animals. According to the FDA guidelines published by the U.S. Department of Health and Human Services:<img alt="Process Validation Stages" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/26c294a2e9b5b993bfd0f571be11113d/processvalidationstages.jpg" style="width: 280px; height: 299px; float: right; margin: 10px 15px;" /></p>
<p style="margin-left: 40px;"><em>“Process validation is defined as the collection and evaluation of data, from the process design stage through commercial production, which establishes scientific evidence that a process is capable of consistently delivering quality product.”<br />
— Food and Drug Administration</em></p>
<p>The FDA recommends three stages for process validation. In this 3-part series, we will briefly explore the stage goals and the types of activities and statistical techniques typically conducted within each. For complete FDA guidelines, see <a href="http://www.fda.gov" target="_blank">www.fda.gov</a>. </p>
Stage 1: Process Design
<p>The goal of this stage is to design a process suitable for routine commercial manufacturing that can consistently deliver a product that meets its quality attributes. Within Process Design, it is important to demonstrate an understanding of the process and to characterize how it responds to various inputs.</p>
Example: Identify Critical Process Parameters with DOE
<p>Suppose you need to identify the critical process parameters for an immediate-release tablet. There are three process input variables that you want to examine: filler%, disintegrant%, and particle size. You want to find which inputs and input settings will maximize the dissolution percentage at 30 minutes.</p>
<p>To conduct this analysis, you can use <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/design-of-experiment-doe:-searching-for-a-selfie-fountain-of-youth">design of experiments</a> (DOE). DOE provides an efficient data collection strategy in which inputs are adjusted simultaneously to identify whether relationships exist between the inputs and the output(s). Once you collect the data and analyze it to identify the important inputs, you can then use DOE to pinpoint optimal settings.</p>
<strong>Running the Experiment</strong>
<p>The first step in DOE is to identify the inputs and corresponding input ranges you want to explore. The next step is to use statistical software, such as <a href="http://www.minitab.com">Minitab</a>, to create an experimental design that serves as your data collection plan.</p>
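<p>As a rough illustration of what such a data collection plan contains, the sketch below builds a two-level full factorial design for the three inputs in this example and randomizes the run order. The low and high settings are hypothetical (only particle size 10 and disintegrant 1% appear in the post), and the design actually used in this example may differ, so treat this as a generic sketch of the idea rather than the study's design.</p>

```python
import itertools
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical (low, high) settings for each input.
FACTORS: Dict[str, Tuple[float, float]] = {
    "particle_size": (10.0, 30.0),
    "disintegrant_pct": (1.0, 4.0),
    "filler_pct": (25.0, 40.0),
}


def full_factorial(factors: Dict[str, Tuple[float, float]],
                   seed: Optional[int] = None) -> List[dict]:
    """Every low/high combination of the factors, in randomized run order."""
    names = list(factors)
    runs = []
    for coded in itertools.product((-1, 1), repeat=len(names)):
        settings = {name: factors[name][0 if c < 0 else 1] for name, c in zip(names, coded)}
        runs.append({"coded": coded, **settings})
    random.Random(seed).shuffle(runs)  # randomize to guard against time-related effects
    for order, run in enumerate(runs, start=1):
        run["run_order"] = order
    return runs


for run in full_factorial(FACTORS, seed=1):
    print(run)
```

<p>Three factors at two levels give 2 × 2 × 2 = 8 runs; replicates or center points would add more.</p>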
<p>According to the design shown below, we first want to use a particle size of 10, disintegrant at 1%, and filler (MCC) at 33.3%, and then record the corresponding average dissolution% using six tablets from a batch:</p>
<p style="margin-left: 40px;"><img alt="DOE Experiment" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/92bb269cff6b75b9a700a7e19bec78d2/doe_experiment.jpg" style="width: 250px; height: 189px;" /></p>
<strong>Analyzing the Data</strong>
<p>Using Minitab’s DOE analysis and p-values, we are ready to identify which X's are critical. Based on the bars that cross the red significance line, we can conclude that particle size and disintegrant% significantly affect the dissolution%, as does the interaction between these two factors. Filler% is not significant.</p>
<p style="margin-left: 40px;"><img alt="Pareto chart" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/2b32669ef0ad0071a038fa7b5ffa25b7/paretochart.jpg" style="width: 350px; height: 233px;" /></p>
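<p>For a two-level factorial design coded as -1/+1, each effect is the average response at the high level minus the average at the low level, and interaction effects use the product of the coded columns. The sketch below estimates all main and interaction effects and lists them largest first, the ordering a Pareto chart of effects displays. It ranks raw effects rather than the standardized effects (effect divided by its standard error) that Minitab compares with the significance line, and the responses come from a made-up model, not from this study.</p>

```python
import itertools
from math import prod
from statistics import mean
from typing import Dict, List, Sequence


def estimate_effects(coded_runs: Sequence[Sequence[int]], responses: Sequence[float],
                     names: List[str]) -> Dict[str, float]:
    """Main and interaction effects for a two-level factorial, largest magnitude first."""
    effects = {}
    for size in range(1, len(names) + 1):
        for combo in itertools.combinations(range(len(names)), size):
            # Sign of this term in each run: the product of the coded levels involved.
            signs = [prod(run[i] for i in combo) for run in coded_runs]
            high = [y for s, y in zip(signs, responses) if s > 0]
            low = [y for s, y in zip(signs, responses) if s < 0]
            effects["*".join(names[i] for i in combo)] = mean(high) - mean(low)
    return dict(sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True))


names = ["particle_size", "disintegrant_pct", "filler_pct"]
runs = list(itertools.product((-1, 1), repeat=3))
# Made-up responses: filler has no influence, and there is a particle*disintegrant interaction.
dissolution = [80 - 5 * a + 3 * b + 2 * a * b for a, b, _ in runs]
for term, effect in estimate_effects(runs, dissolution, names).items():
    print(f"{term:40s} {effect:6.2f}")
```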
<strong>Optimizing Product Quality</strong>
<p>Now that we've identified the critical X's, we're ready to determine the optimal settings for those inputs. Using a contour plot, we can easily identify the process window for the particle size and disintegrant% settings needed to achieve a percent dissolution of 80% or greater.</p>
<img alt="Contour plot" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/89f74b68916fd451deca51e832a72591/doe_contourplot.jpg" style="width: 350px; height: 233px;" />
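<p>A contour plot draws the fitted model's predicted response over a grid of two factor settings. As a sketch of how a process window can be read from such a model, the code below evaluates a hypothetical fitted equation in coded units (-1 = low setting, +1 = high setting) on a grid and keeps the points predicted to reach at least 80% dissolution. The coefficients are invented for illustration; for a two-level design in coded units, each coefficient equals half the corresponding effect.</p>

```python
from typing import List, Tuple

# Hypothetical fitted model in coded units:
# dissolution = b0 + b1*particle + b2*disintegrant + b12*particle*disintegrant
COEFS = {"b0": 80.0, "b1": -5.0, "b2": 3.0, "b12": 2.0}


def predict(particle: float, disintegrant: float) -> float:
    c = COEFS
    return (c["b0"] + c["b1"] * particle + c["b2"] * disintegrant
            + c["b12"] * particle * disintegrant)


def process_window(target: float = 80.0, steps: int = 5) -> List[Tuple[float, float]]:
    """Grid points (coded particle, coded disintegrant) predicted to meet the target."""
    grid = [-1.0 + 2.0 * i / (steps - 1) for i in range(steps)]
    return [(p, d) for p in grid for d in grid if predict(p, d) >= target]


for point in process_window():
    print(point)
```

<p>A finer grid (larger <code>steps</code>) traces the boundary of the window more smoothly, which is essentially what the shaded regions of a contour plot show.</p>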
<p>And that's how you can use design of experiments to conduct the Process Design stage. Next in this series, we'll look at the statistical tools and techniques commonly used for Process Qualification!</p>
Data Analysis, Design of Experiments, Health Care Quality Improvement, Healthcare, Medical Devices, Statistics
Fri, 13 Jan 2017 13:00:00 +0000
http://blog.minitab.com/blog/michelle-paret/statistical-tools-for-process-validation%2C-stage-1%3A-process-design
Michelle Paret