Minitab | MinitabBlog posts and articles about using Minitab software in quality improvement projects, research, and more.
http://blog.minitab.com/blog/minitab/rss
Sun, 30 Apr 2017 03:12:21 +0000

Understanding Qualitative, Quantitative, Attribute, Discrete, and Continuous Data Types
http://blog.minitab.com/blog/understanding-statistics/understanding-qualitative-quantitative-attribute-discrete-and-continuous-data-types
<p>"Data! Data! Data! I can't make bricks without clay."<br />
— Sherlock Holmes, in Arthur Conan Doyle's <em>The Adventure of the Copper Beeches</em></p>
<p>Whether you're the world's greatest detective trying to crack a case or a person trying to solve a problem at work, you're going to need information. Facts. <em>Data</em>, as Sherlock Holmes says. </p>
<p><img alt="jujubes" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/96d7c87addccc11b6072d6dfa38d0039/jujubes.jpg" style="line-height: 20.7999992370605px; margin: 10px 15px; float: right; width: 200px; height: 200px;" /></p>
<p>But not all data is created equal, especially if you plan to analyze it as part of a quality improvement project.</p>
<p>If you're using Minitab Statistical Software, you can access the Assistant to <a href="http://www.minitab.com/products/minitab/assistant">guide you through your analysis step-by-step</a>, and help identify the type of data you have.</p>
<p>But it's still important to have at least a basic understanding of the different types of data, and the kinds of questions you can use them to answer. </p>
<p>In this post, I'll provide a basic overview of the types of data you're likely to encounter, and we'll use a box of my favorite candy—<a href="http://en.wikipedia.org/wiki/Jujube_(confectionery)" target="_blank">Jujubes</a>—to illustrate how we can gather these different kinds of data, and what types of analysis we might use it for. </p>
The Two Main Flavors of Data: Qualitative and Quantitative
<p>At the highest level, two kinds of data exist: <em><strong>quantitative</strong></em> and <em><strong>qualitative</strong></em>.</p>
<p><strong><em>Quantitative</em> </strong>data deals with numbers and things you can measure objectively: dimensions such as height, width, and length. Temperature and humidity. Prices. Area and volume.</p>
<p><strong><em>Qualitative </em></strong>data deals with characteristics and descriptors that can't be easily measured, but can be observed subjectively—such as smells, tastes, textures, attractiveness, and color. </p>
<p>Broadly speaking, when you measure something and give it a number value, you create quantitative data. When you classify or judge something, you create qualitative data. So far, so good. But this is just the highest level of data: there are also different types of quantitative and qualitative data.</p>
Quantitative Flavors: Continuous Data and Discrete Data
<p>There are two types of quantitative data, which is also referred to as numeric data: <em><strong>continuous</strong></em> and <em><strong>discrete</strong></em>. As a general rule, <em>counts</em> are discrete and <em>measurements</em> are continuous.</p>
<p><strong><em>Discrete </em></strong>data is a count that can't be made more precise. Typically it involves integers. For instance, the number of children (or adults, or pets) in your family is discrete data, because you are counting whole, indivisible entities: you can't have 2.5 kids, or 1.3 pets.</p>
<p><strong><em>Continuous</em> </strong>data, on the other hand, could be divided and reduced to finer and finer levels. For example, you can measure the height of your kids at progressively more precise scales—meters, centimeters, millimeters, and beyond—so height is continuous data.</p>
<p>If I tally the number of individual Jujubes in a box, that number is a piece of discrete data.</p>
<p style="margin-left: 40px;"><img alt="a count of jujubes is discrete data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f5e3c44269356903cf156c065b10746a/jujubes_count_tally.jpg" style="width: 200px; height: 200px;" /></p>
<p>If I use a scale to measure the weight of each Jujube, or the weight of the entire box, that's continuous data.</p>
<p style="margin-left: 40px;"><span style="line-height: 1.6;"><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d11051162c9e2375e531ac589fd5a20e/jujube_weight_continuous_data.jpg" style="width: 200px; height: 200px;" /></span></p>
<p>Continuous data can be used in many different kinds of <a href="http://blog.minitab.com/blog/understanding-statistics/what-statistical-hypothesis-test-should-i-use">hypothesis tests</a>. For example, to assess the accuracy of the weight printed on the Jujubes box, we could measure 30 boxes and perform a 1-sample t-test. </p>
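<p>Outside Minitab, a test like that can be sketched in a few lines of Python with SciPy. The 155 g label claim and the box weights below are made-up numbers for illustration; here the boxes are assumed to run a little light.</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical weights (in grams) of 30 Jujubes boxes; the label claims 155 g.
claimed_weight = 155.0
weights = rng.normal(loc=153.5, scale=2.0, size=30)

# 1-sample t-test: does the mean box weight differ from the label claim?
t_stat, p_value = stats.ttest_1samp(weights, popmean=claimed_weight)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

A small p-value here would suggest the printed weight is inaccurate, which is exactly the question the 1-sample t-test answers.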
<p>Some analyses use continuous and discrete quantitative data at the same time. For instance, we could perform a <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-tutorial-and-examples">regression analysis</a> to see if the weight of Jujube boxes (continuous data) is correlated with the number of Jujubes inside (discrete data). </p>
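<p>A regression like the one just described might be sketched as follows; the piece counts, the roughly 2.9 g per piece, and the noise level are all invented for illustration.</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical sample: number of Jujubes per box (discrete) and box weight (continuous)
counts = rng.integers(48, 57, size=25)            # pieces per box
weights = 2.9 * counts + rng.normal(0, 1.5, 25)   # ~2.9 g per piece plus noise

# Simple linear regression: does box weight track the piece count?
result = stats.linregress(counts, weights)
print(f"slope = {result.slope:.2f} g/piece, R^2 = {result.rvalue**2:.3f}")
```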
Qualitative Flavors: Binomial Data, Nominal Data, and Ordinal Data
<p>When you classify or categorize something, you create <em>qualitative</em> or <em>attribute</em> data. There are three main kinds of qualitative data.</p>
<p><em><strong>Binomial</strong></em> data places things in one of two mutually exclusive categories: right/wrong, true/false, or accept/reject.</p>
<p>Occasionally, I'll get a box of Jujubes that contains a couple of individual pieces that are either too hard or too dry. If I went through the box and classified each piece as "Good" or "Bad," that would be binomial data. I could use this kind of data to develop a statistical model to predict how frequently I can expect to get a bad Jujube.</p>
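<p>That kind of prediction can be sketched with a binomial model. The figures below (12 bad pieces in a 600-piece sample, 55 pieces per box) are made-up numbers for illustration.</p>

```python
from scipy import stats

# Hypothetical inspection: 12 bad pieces found among 600 Jujubes
bad, total = 12, 600
p_bad = bad / total   # estimated probability a piece is bad: 0.02

# With 55 pieces per box, model the number of bad pieces per box as binomial
box_size = 55
dist = stats.binom(n=box_size, p=p_bad)

# Chance a box contains at least one bad Jujube
p_at_least_one = 1 - dist.pmf(0)
print(f"P(at least one bad piece per box) = {p_at_least_one:.3f}")
```

With these numbers, roughly two boxes in three would contain at least one bad piece.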
<p>When collecting <em><strong>unordered </strong></em>or <em><strong>nominal </strong></em>data, we assign individual items to named categories that do not have an implicit or natural value or rank. If I went through a box of Jujubes and recorded the color of each in my worksheet, that would be nominal data. </p>
<p style="margin-left: 40px;"><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ce64d648ac395d5c8098985caabc754f/jujubes_sorted_nominal_data.jpg" style="width: 200px; height: 97px;" /></p>
<p>This kind of data can be used in many different ways—for instance, I could use <a href="http://blog.minitab.com/blog/understanding-statistics/chi-square-analysis-of-halloween-and-friday-the-13th-is-there-a-slasher-movie-gender-gap">chi-square analysis</a> to see if there are statistically significant differences in the amounts of each color in a box. </p>
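<p>A minimal version of that chi-square analysis, with hypothetical color counts, might look like this in Python. By default, SciPy's <code>chisquare</code> tests the observed counts against equal expected proportions for each category.</p>

```python
from scipy import stats

# Hypothetical color counts from one box of Jujubes (5 colors)
observed = [14, 9, 11, 12, 10]   # e.g., red, orange, yellow, green, violet

# Chi-square goodness-of-fit test against equal proportions of each color
chi2, p_value = stats.chisquare(observed)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```

A large p-value, as with these counts, would give no evidence that the colors are unevenly represented.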
<p>We also can have <strong><em>ordered</em></strong> or <em><strong>ordinal</strong></em> data, in which items are assigned to categories that do have some kind of implicit or natural order, such as "Short, Medium, or Tall." Another example is a survey question that asks us to rate an item on a 1 to 10 scale, with 10 being the best. This implies that 10 is better than 9, which is better than 8, and so on.</p>
<p>The uses for ordinal data are a matter of some debate among statisticians. Everyone agrees it's appropriate for creating bar charts, but beyond that the answer to the question "What should I do with my ordinal data?" is "It depends." Here's a post from another blog that offers an excellent summary of the <a href="http://learnandteachstatistics.wordpress.com/2013/07/08/ordinal/" target="_blank">considerations involved</a>.</p>
Additional Resources about Data and Distributions
<p>For more fun statistics you can do with candy, check out this article (PDF format): <a href="http://www.minitab.com/uploadedFiles/Content/Academic/sweetening_statistics.pdf">Statistical Concepts: What M&M's Can Teach Us.</a> </p>
<p>For a deeper exploration of the probability distributions that apply to different types of data, check out my colleague Jim Frost's posts about <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-and-using-discrete-distributions">understanding and using discrete distributions</a> and <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-identify-the-distribution-of-your-data-using-minitab">how to identify the distribution of your data</a>.</p>
Data Analysis, Stats | Fri, 28 Apr 2017 13:00:00 +0000 | Eston Martz

Making the World a Little Brighter with Monte Carlo Simulation
http://blog.minitab.com/blog/understanding-statistics/making-the-world-a-little-brighter-with-monte-carlo-simulation
<p>If you have a process that isn’t meeting specifications, using the Monte Carlo simulation and optimization tool in Companion by Minitab can help. Here’s how you, as a chemical technician for a paper products company, could use Companion to optimize a chemical process and ensure it consistently delivers a paper product that meets brightness standards.</p>
<p><img alt="paper" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/10698132d6bd3c9311e06aa9a29465cb/paper.png" style="margin: 10px 15px; float: right; width: 240px; height: 213px;" />The brightness of Perfect Papyrus Company’s new copier paper needs to be at least 84 on the TAPPI brightness scale. The important process inputs are the bleach concentration of the solution used to treat the pulp, and the processing temperature. The relationship is explained by this equation:</p>
<p style="margin-left: 40px;">Brightness = 70.37 + 44.4 Bleach + 0.04767 Temp – 64.3 Bleach*Bleach</p>
<p>Bleach concentration follows a normal distribution with a mean of 0.25 and a standard deviation of 0.0095 percent. Temperature also follows a normal distribution, with a mean of 145 and a standard deviation of 15.3 degrees C.</p>
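<p>As a rough cross-check of what Companion automates, the same simulation can be sketched in plain NumPy using only the equation and distributions above. The one-sided capability statistic is computed here as (mean - LSL) / (3 * sd); Companion's exact figures may differ slightly from one random run to the next.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Input distributions given above
bleach = rng.normal(0.25, 0.0095, n)   # % concentration
temp = rng.normal(145, 15.3, n)        # degrees C

# Transfer function for paper brightness
brightness = 70.37 + 44.4 * bleach + 0.04767 * temp - 64.3 * bleach**2

# Capability against the lower spec limit of 84
lsl = 84.0
pct_below = (brightness < lsl).mean() * 100
cpk = (brightness.mean() - lsl) / (3 * brightness.std(ddof=1))
print(f"{pct_below:.1f}% below spec, Cpk = {cpk:.3f}")
```

This reproduces the numbers reported below: roughly 31% of simulated boxes fall under the brightness spec, with a Cpk near 0.16.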
Building your process model
<p>To assess the process capability, you can enter the parameter information, transfer function, and specification limit into Companion's straightforward interface, and instantly run 50,000 simulations.</p>
<p style="margin-left: 40px;"><img alt="paper brightness monte carlo simulation" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f5a7c5f019ffa2fd71be622d9b07672a/paper_brightness1.png" style="width: 750px; height: 597px;" /></p>
Understanding your results
<p style="margin-left: 40px;"><img alt="monte carlo simulation output" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b36f79dc4dcfffb8b143c9a2e63148f3/paper_brightness2.png" style="width: 750px; height: 597px;" /></p>
<p>The process capability index (Cpk) is 0.162, far short of the minimum standard of 1.33. Companion also indicates that under current conditions, you can expect the paper’s brightness to fall below standards about 31.5% of the time.</p>
Finding optimal input settings
<p>Companion's smart workflow guides you to the next step for improving your process: optimizing your inputs.</p>
<p style="margin-left: 40px;"><img alt="parameter optimization" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dd5c9986aa27ecc57b1717ffca351940/paper_brightness3.png" style="width: 700px; height: 72px;" /></p>
<p>You set the goal—in this case, maximizing the brightness of the paper—and enter the high and low values for your inputs.</p>
<p style="margin-left: 40px;"><img alt="optimization dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/bff42a08b3fa2c0a202a331214ce8110/paper_brightness4.png" style="width: 700px; height: 423px; border-width: 1px; border-style: solid;" /></p>
Simulating the new process
<p>After finding the optimal input settings in the ranges you specified, Companion presents the simulated results for the recommended process changes.</p>
<p style="margin-left: 40px;"><img alt="optimized process output" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3e39b83a408bb004b742604ab28f6293/paper_brightness5.png" style="width: 750px; height: 780px;" /></p>
<p>The results indicate that if the bleach amount were set to approximately 0.3 percent and the temperature to 160 degrees, the percentage outside of specification would be reduced to about 2%, with a Cpk of 0.687. Much better, but not good enough.</p>
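<p>Plugging the rounded settings (0.3 percent bleach, 160 degrees) back into a plain NumPy version of the simulation reproduces results close to these; small differences are expected because Companion's optimal settings aren't exactly the rounded values.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Re-simulate at the recommended settings: means shifted, spreads unchanged
bleach = rng.normal(0.30, 0.0095, n)
temp = rng.normal(160, 15.3, n)
brightness = 70.37 + 44.4 * bleach + 0.04767 * temp - 64.3 * bleach**2

lsl = 84.0
pct_below = (brightness < lsl).mean() * 100
cpk = (brightness.mean() - lsl) / (3 * brightness.std(ddof=1))
print(f"{pct_below:.2f}% below spec, Cpk = {cpk:.3f}")
```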
Understanding variability
<p>To further improve the paper brightness, Companion’s smart workflow suggests that you next perform a sensitivity analysis.</p>
<p style="margin-left: 40px;"><img alt="sensitivity analysis" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f4244a7a487d7f7b1804a4a59320ce4f/paper_brightness6.png" style="width: 700px; height: 85px;" /></p>
<p>Companion’s unique graphic presentation of the sensitivity analysis gives you more insight into how the variation of your inputs influences the percentage of your output that doesn’t meet specifications.</p>
<p style="margin-left: 40px;"><img alt="sensitivity analysis of paper brightness" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/733213f84fee78a3e9fb67faf2675b74/paper_brightness7a.png" style="width: 700px; height: 531px;" /></p>
<p>The blue line representing temperature indicates that variation in this factor has a greater impact on your process than variation in bleach concentration, so you run another simulation to visualize the brightness after reducing temperature variation by 50%.</p>
<p style="margin-left: 40px;"><img alt="final paper brightness model simulation" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/00c7d686aee7979422eba1f389118fc9/paper_brightness8.png" style="width: 750px; height: 848px;" /></p>
<p>The simulation shows that reducing the variability will result in 0.000 percent of the paper falling out of spec, with a Cpk of 1.34. Thanks to you, the outlook for the Perfect Papyrus Company’s new copier paper is looking very bright.</p>
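<p>The same final step can be checked in a plain NumPy sketch by halving the temperature standard deviation (15.3 to 7.65 degrees) at the optimized settings; the defect rate collapses to essentially zero and Cpk lands near the value reported above.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Optimized means, with the temperature standard deviation cut in half
bleach = rng.normal(0.30, 0.0095, n)
temp = rng.normal(160, 15.3 / 2, n)
brightness = 70.37 + 44.4 * bleach + 0.04767 * temp - 64.3 * bleach**2

pct_below = (brightness < 84.0).mean() * 100
cpk = (brightness.mean() - 84.0) / (3 * brightness.std(ddof=1))
print(f"{pct_below:.3f}% below spec, Cpk = {cpk:.2f}")
```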
Getting great results
<p>Figuring out how to improve a process is easier when you have the right tool to do it. With Monte Carlo simulation to assess process capability, Parameter Optimization to identify optimal settings, and Sensitivity Analysis to pinpoint exactly where to reduce variation, Companion can help you get there.</p>
<p>To try the Monte Carlo simulation tool, as well as Companion’s more than 100 other tools for executing and reporting quality projects, learn more and get the free 30-day trial version for you and your team at <a href="http://companionbyminitab.com">companionbyminitab.com</a>.</p>
Monte Carlo, Monte Carlo Simulation | Wed, 26 Apr 2017 12:00:00 +0000 | Eston Martz

Understanding Monte Carlo Simulation with an Example
http://blog.minitab.com/blog/adventures-in-statistics-2/understanding-monte-carlo-simulation-with-an-example
<p>As someone who has collected and analyzed real data for a living, the idea of using simulated data for a Monte Carlo simulation sounds a bit odd. How can you improve a real product with simulated data? In this post, I’ll help you understand the methods behind Monte Carlo simulation and walk you through a simulation example using Companion by Minitab.</p>
<p><img alt="Process capability chart" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/8b31c0befc7c93d3b4ceeea2bc8479e8/main_image.png" style="line-height: 20.7999992370605px; float: right; width: 300px; height: 241px; margin: 10px 15px;" /></p>
<p>Companion by Minitab is a software platform that combines a desktop app for executing quality projects with a web dashboard that makes reporting on your entire quality initiative virtually effortless. Among the first-in-class tools in the desktop app is a Monte Carlo simulation tool that makes this method extremely accessible.</p>
What Is Monte Carlo Simulation?
<p>The Monte Carlo method uses repeated random sampling to generate simulated data to use with a mathematical model. This model often comes from a statistical analysis, such as a <a href="http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/doe/basics/what-is-a-designed-experiment/">designed experiment</a> or a <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-tutorial-and-examples">regression analysis</a>.</p>
<p>Suppose you study a process and use statistics to model it like this:</p>
<p style="margin-left: 40px;"><img alt="Regression equation for the process" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/174c81a027515c63241c34903d579ee6/regression_equation.png" style="width: 576px; height: 83px;" /></p>
<p>With this type of linear model, you can enter the process input values into the equation and predict the process output. However, in the real world, the inputs won’t each be a single fixed value, thanks to variability. Unfortunately, this input variability causes variability and defects in the output.</p>
<p>To design a better process, you could collect a mountain of data in order to determine how input variability relates to output variability under a variety of conditions. However, if you understand the typical distribution of the input values and you have an equation that models the process, you can easily generate a vast amount of simulated input values and enter them into the process equation to produce a simulated distribution of the process outputs.</p>
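<p>The mechanism in the previous paragraph can be sketched in a few lines of NumPy. The transfer function and input distributions below are invented stand-ins, since the real equation would come from your own regression output.</p>

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50_000

# Hypothetical transfer function standing in for a fitted regression model:
#   strength = 12.0 + 3.5 * material + 0.8 * pressure
material = rng.normal(6.0, 0.25, n)    # assumed input distribution
pressure = rng.normal(20.0, 1.5, n)    # assumed input distribution
strength = 12.0 + 3.5 * material + 0.8 * pressure

# The simulated output distribution answers capability questions directly
print(f"mean = {strength.mean():.1f}, sd = {strength.std(ddof=1):.2f}")
print(f"P(strength < 45) = {(strength < 45).mean():.3f}")
```

Changing the means or standard deviations of the inputs and re-running is exactly the "what if" exercise the next paragraph describes.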
<p>You can also easily change these input distributions to answer "what if" types of questions. That's what Monte Carlo simulation is all about. In the example we are about to work through, we'll change both the mean and standard deviation of the simulated data to improve the quality of a product.</p>
<p>Today, simulated data is routinely used in situations where resources are limited or gathering real data would be too expensive or impractical.</p>
How Can Monte Carlo Simulation Help You?
<p>With Companion by Minitab, engineers can easily perform a Monte Carlo analysis in order to:</p>
<ul>
<li>Simulate product results while accounting for the variability in the inputs</li>
<li>Optimize process settings</li>
<li>Identify critical-to-quality factors</li>
<li>Find a solution to reduce defects</li>
</ul>
<p>Along the way, Companion interprets simulation results and provides step-by-step guidance to help you find the best possible solution for reducing defects. I'll show you how to accomplish all of this right now!</p>
Step-by-Step Example of Monte Carlo Simulation
<p>A materials engineer for a building products manufacturer is developing a new insulation product. The engineer performed an experiment and used statistics to analyze process factors that could impact the insulating effectiveness of the product. (The data for this DOE is just one of the many data set examples that can be found in <a href="http://support.minitab.com/en-us/datasets/">Minitab’s Data Set Library</a>.) For this Monte Carlo simulation example, we’ll use the regression equation shown above, which describes the statistically significant factors involved in the process.</p>
<p>Let's open Companion by Minitab's desktop app (if you don't already have it, you can <a href="http://www.minitab.com/products/companion/try-it-free/">try Companion free</a> for 30 days). Open or start a new project, then right-click on the project Roadmap™ to insert the Monte Carlo Simulation tool.</p>
<p style="margin-left: 40px;"><img alt="insert monte carlo simulation" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/5c535f39df60b03ecfedee627de78c42/companion_insert_monte_carlo.png" style="width: 350px; height: 299px;" /></p>
<p><strong>Step 1: Define the Process Inputs and Outputs</strong></p>
<p>The first thing we need to do is to define the inputs and the distribution of their values. The process inputs are listed in the regression output and the engineer is familiar with the typical mean and standard deviation of each variable. For the output, we simply copy and paste the regression equation that describes the process from <a href="http://www.minitab.com/products/minitab/features/">Minitab statistical software</a> right into Companion's Monte Carlo tool!</p>
<p>When the Monte Carlo tool opens, we are presented with these entry fields:</p>
<p style="margin-left: 40px;"><img alt="Setup the process inputs and outputs" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a9d206b562248523a19361c6ed8d68ac/monte_carlo_dialog_1.png" style="width: 700px; height: 233px; border-width: 0px; border-style: solid;" /></p>
<p>It's an easy matter to enter the information about the inputs and outputs for the process as shown.</p>
<p style="margin-left: 40px;"><img alt="Setup the input values and the output equation" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a86d77de3e6e79ef9483dca610ea2af7/monte_carlo_dialog_2.png" style="width: 800px; height: 510px;" /></p>
<p>Verify your model with the above diagram and then click <strong>Simulate</strong> in the application ribbon.</p>
<p style="margin-left: 40px;"><img alt="perform the monte carlo simulation" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/cecd73581290afdf570f6b4432f92033/monte_carlo_dialog_3.png" style="width: 473px; height: 277px;" /></p>
<p><em><strong>Initial Simulation Results</strong></em></p>
<p>After you click <strong>Simulate</strong>, Companion very quickly runs 50,000 simulations by default, though you can specify a higher or lower number of simulations. </p>
<p style="margin-left: 40px;"><img alt="Initial simulation results" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/5800795d00dbdb15ca6fb00cbcc85582/monte_carlo_output1.png" style="width: 750px; height: 375px; border-width: 0px; border-style: solid;" /></p>
<p>Companion interprets the results for you using output that is typical for <a href="http://support.minitab.com/en-us/minitab/17/topic-library/quality-tools/capability-analyses/basics/uses-of-capability-analysis/" target="_blank">capability analysis</a>—a capability histogram, percentage of defects, and the Ppk statistic. Companion correctly points out that our Ppk is below the generally accepted minimum of 1.33.</p>
<p><em><strong>Step-by-Step Guidance for the Monte Carlo Simulation</strong></em></p>
<p>But Companion doesn’t just run the simulation and then let you figure out what to do next. Instead, Companion has determined that our process is not satisfactory and presents you with a smart sequence of steps to improve the process capability.</p>
<p>How is it smart? Companion knows that it is generally <a href="http://blog.minitab.com/blog/adventures-in-statistics/quality-improvement-controlling-variability-more-difficult-than-the-mean">easier to control the mean than the variability</a>. Therefore, the next step that Companion presents is <strong>Parameter Optimization</strong>, which finds the mean settings that minimize the number of defects while still accounting for input variability.</p>
<p style="margin-left: 40px;"><img alt="Next steps leading to parameter optimization" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f7a9208c5d0bd683b879fe5e47b86ba5/monte_carlo_parameter_optimization.png" style="width: 750px; height: 78px;" /></p>
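<p>Conceptually, parameter optimization is a search over candidate mean settings while the input spreads stay fixed. A toy version of that search, using a hypothetical quadratic model rather than the real regression equation, might look like this:</p>

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

# Fixed noise draws, re-used for every candidate so the search is stable
eps_a = rng.normal(0, 0.25, n)
eps_b = rng.normal(0, 1.5, n)

def pct_defective(mean_a, mean_b, lsl=72.0):
    """Defect rate of a hypothetical process y = 40 + 6a - 0.5a^2 + 0.8b
    when the input means shift but their spreads stay fixed."""
    a = mean_a + eps_a
    b = mean_b + eps_b
    y = 40.0 + 6.0 * a - 0.5 * a**2 + 0.8 * b
    return (y < lsl).mean()

# Grid search over allowed mean settings, keeping the lowest defect rate
candidates = [(pct_defective(a, b), a, b)
              for a in np.linspace(4.0, 8.0, 17)
              for b in np.linspace(18.0, 22.0, 9)]
rate, best_a, best_b = min(candidates)
print(f"lowest defect rate {rate:.3%} at mean_a = {best_a:.2f}, mean_b = {best_b:.1f}")
```

Companion's parameter optimization is far more sophisticated than a grid search, but the idea is the same: find mean settings that minimize defects while still accounting for input variability.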
<p><strong>Step 2: Define the Objective and Search Range for Parameter Optimization</strong></p>
<p>At this stage, we want Companion to find an optimal combination of mean input settings to minimize defects. After you click <strong>Parameter Optimization</strong>, you'll need to specify your goal and use your process knowledge to define a reasonable search range for the input variables.</p>
<p style="margin-left: 40px;"><img alt="Setup for parameter optimization" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ed51e72e05c366214832195727bfe1b9/monte_carlo_parameter_optimization_dialog.png" style="width: 750px; height: 478px;" /></p>
<p>And, here are the simulation results!</p>
<p style="margin-left: 40px;"><img alt="Results of the parameter optimization" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/509bc51406b7f0a9187214c3231daf69/monte_carlo_parameter_optimization_results_1.png" style="width: 750px; height: 376px; border-width: 1px; border-style: solid;" /></p>
<p>At a glance, we can tell that the percentage of defects is way down. We can also see the optimal input settings in the table. However, our Ppk statistic is still below the generally accepted minimum value. Fortunately, Companion has a recommended next step to further improve the capability of our process.</p>
<p style="margin-left: 40px;"><img alt="Next steps leading to a sensitivity analysis" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fd4a5321a6f3e59e33111414a2cca9f6/monte_carlo_parameter_optimization_next.png" style="width: 750px; height: 106px; border-width: 0px; border-style: solid;" /></p>
<p><strong>Step 3: Control the Variability to Perform a Sensitivity Analysis</strong></p>
<p>So far, we've improved the process by optimizing the mean input settings. That reduced defects greatly, but we still have more to do in the Monte Carlo simulation. Now, we need to reduce the variability in the process inputs in order to further reduce defects.</p>
<p>Reducing variability is typically more difficult. Consequently, you don't want to waste resources controlling the standard deviation for inputs that won't reduce the number of defects. Fortunately, Companion includes an innovative graph that helps you identify the inputs where controlling the variability will produce the largest reductions in defects.</p>
<p style="margin-left: 40px;"><img alt="Setup for the sensitivity analysis" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1ac0aad99b18d9906744787cc88aa017/monte_carlo_sensitivity_dialog_1.png" style="width: 750px; height: 569px;" /></p>
<p>In this graph, look for inputs with sloped lines because reducing these standard deviations can reduce the variability in the output. Conversely, you can ease tolerances for inputs with a flat line because they don't affect the variability in the output.</p>
<p>In our graph, the slopes are fairly equal. Consequently, we'll try reducing the standard deviations of several inputs. You'll need to use process knowledge in order to identify realistic reductions. To change a setting, you can either click the points on the lines, or use the pull-down menu in the table.</p>
<p><strong>Final Monte Carlo Simulation Results</strong></p>
<p style="margin-left: 40px;"><img alt="Results of the sensitivity analysis" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/99ea718e1b2a5b1248e79a5faae25e98/monte_carlo_sensitivity_output.png" style="width: 750px; height: 632px; border-width: 0px; border-style: solid;" /></p>
<p>Success! We've reduced the number of defects in our process and our Ppk statistic is 1.34, which is above the benchmark value. The assumptions table shows us the new settings and standard deviations for the process inputs that we should try. If we ran <strong>Parameter Optimization</strong> again, it would center the process and I'm sure we'd have even fewer defects.</p>
<p>To improve our process, Companion guided us on a smart sequence of steps during our Monte Carlo simulation:</p>
<ol>
<li>Simulate the original process</li>
<li>Optimize the mean settings</li>
<li>Strategically reduce the variability</li>
</ol>
<p>If you want to try Monte Carlo simulation for yourself, get <a href="http://www.minitab.com/products/companion/try-it-free/">the free trial of Companion by Minitab</a>!</p>
Monte Carlo, Monte Carlo Simulation, Project Tools, Quality Improvement, Statistics, Statistics Help | Tue, 25 Apr 2017 12:00:00 +0000 | Jim Frost

How Could You Benefit from Between / Within Control Charts?
http://blog.minitab.com/blog/applying-statistics-in-quality-projects/how-could-you-benefit-from-between-within-control-charts
<p>Choosing the right type of subgroup in a control chart is crucial. In a rational subgroup, the variability within a subgroup should reflect only common-cause, random, short-term variability and represent the “normal,” “typical” behavior of the process, whereas differences between subgroups are used to detect drifts in the process over time (due to “special” or “assignable” causes). Variation <em>within</em> subgroups is therefore used to estimate the natural process standard deviation and to calculate the 3-sigma control chart limits.</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/4b56682b3ffb627a10403f1cc65cae2a/parts_box_thumb.jpg" style="margin: 10px 15px; float: right; width: 200px; height: 200px;" />In some cases, however, <a href="http://blog.minitab.com/blog/quality-data-analysis-and-statistics/a-rational-look-at-subgrouping">identifying the correct rational subgroup is not easy</a>. This is the case, for example, when parts are manufactured in batches, as they are in the automotive and semiconductor industries.</p>
<p>Batches of parts might seem to represent ideal subgroups, or at least a self-evident way to organize subgroups, for Statistical Process Control (SPC) monitoring. However, this is not always the right approach. When batches <em>aren't</em> a good choice for rational subgroups, control chart limits may become too narrow or too wide.</p>
Control Limits May Be Too Narrow
<p>Since batches are often manufactured at the same time on the same equipment, the variability within batches is often much smaller than the overall variability. In this case, the within-subgroup variability is not really representative and underestimates the natural process variability. Since within-subgroups variability is used to calculate the control chart limits, these limits may become unrealistically close to one another, which ultimately generates a large number of false alarms.</p>
<p style="margin-left: 40px;"><img alt="Too Narrow" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/4b4ef00c5f88c1b318eb2fd780644292/within_w640.jpeg" style="width: 500px; height: 196px;" /></p>
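<p>The narrowing effect is easy to demonstrate numerically. In this hypothetical Python sketch (illustrative data only, not Minitab's implementation), the batch-to-batch shifts are ten times larger than the within-batch noise, so Xbar limits computed from the average within-subgroup range hug the center line and almost every subgroup mean becomes a false alarm:</p>

```python
# Hypothetical illustration: Xbar limits from within-subgroup variation only.
# Batch means shift a lot (sd = 0.5), but parts within a batch barely vary
# (sd = 0.05), so limits based on R-bar are unrealistically tight.
import numpy as np

rng = np.random.default_rng(1)
subgroups = [rng.normal(loc=rng.normal(10, 0.5), scale=0.05, size=5)
             for _ in range(20)]          # 20 batches of 5 parts

means = np.array([s.mean() for s in subgroups])
rbar = np.mean([s.max() - s.min() for s in subgroups])

d2, n = 2.326, 5                          # d2 constant for subgroup size 5
sigma_within = rbar / d2                  # within-subgroup sigma estimate
center = means.mean()
lcl = center - 3 * sigma_within / np.sqrt(n)
ucl = center + 3 * sigma_within / np.sqrt(n)

alarms = int(np.sum((means < lcl) | (means > ucl)))
print(f"limits: [{lcl:.3f}, {ucl:.3f}]  out-of-control subgroups: {alarms}/20")
```

<p>Swapping the two standard deviations in the sketch reproduces the opposite problem: inflated within-subgroup variation and limits that are too wide.</p>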
Control Limits May Be Too Wide
<p>On the other hand, suppose that within batches a systematic difference exists between the first two parts and the rest of the batch. In this case, the within-batch variability will include this systematic difference, which will inflate the within-subgroup standard deviation. Note that the between-subgroup variability is not affected by this systematic difference, and remember that only the within-subgroup variance is used to estimate the SPC limits. In this situation, the distance between the control limits becomes too wide, so the chart will not allow you to identify drifts quickly.</p>
<p style="margin-left: 40px;"><img alt="Too wide" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/6a5007d472adb265562595880f6f18f8/between_w640.jpeg" style="width: 500px; height: 241px;" /></p>
<p>For example, in an injection mold with several cavities, when groups of parts molded at the same time but in different cavities are used as subgroups, systematic differences between cavities in the same mold will necessarily inflate the within-subgroup variability.</p>
I-MR-R/S Between/Within charts
<p>When we encounter these situations in practice, using SPC charts can become more difficult and less efficient. An obvious solution is to consider within- and between-subgroup sources of variability separately. In Minitab, if you go to <strong>Stat > Control Charts > Variables Charts for Subgroups...</strong>, you will find <a href="http://support.minitab.com/minitab/17/topic-library/quality-tools/control-charts/understanding-variables-control-charts/what-is-an-i-mr-r-s-between-within-chart/"><em>I-MR-R/S Between/Within Charts</em></a> to cover these types of issues.</p>
<p style="margin-left: 40px;"><img alt="between Within" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/2911cd73ce7a75c8f9e20f026a9ea255/between_within_w640.jpeg" style="width: 640px; height: 368px;" /></p>
<p>Between/within charts are commonly used in the semiconductor industry, for example. Wafers are manufactured in batches (usually 25 wafers in a batch), and these batches are treated as subgroups in practice.</p>
<p>Using I-MR-R/S (Between/Within) charts lets you monitor differences between subgroups with the I-MR charts while also monitoring within-subgroup variation with the R/S chart. Thus, this chart provides a full and coherent picture of the <em>overall</em> process variability. As a result, identifying the right rational subgrouping scheme is not as crucial as it is when using standard Xbar-R or Xbar-S control charts.</p>
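<p>A rough sketch of the between/within idea (a simplified Python illustration, not Minitab's exact computation): chart the subgroup means as individuals, estimating sigma from their moving ranges so that between-batch variation sets its own limits, and track the within-subgroup spread separately:</p>

```python
# Hypothetical wafer-batch data: 30 lots of 25 wafers, with lot-to-lot
# shifts (sd = 1.0) much larger than within-lot noise (sd = 0.2).
import numpy as np

rng = np.random.default_rng(7)
batches = [rng.normal(loc=rng.normal(50, 1.0), scale=0.2, size=25)
           for _ in range(30)]

means = np.array([b.mean() for b in batches])
stdevs = np.array([b.std(ddof=1) for b in batches])

# "I chart" part: sigma from moving ranges of the lot means, so the
# limits reflect between-lot variation rather than within-lot noise.
mr = np.abs(np.diff(means))
sigma_between = mr.mean() / 1.128         # d2 for a moving range of size 2
i_lcl = means.mean() - 3 * sigma_between
i_ucl = means.mean() + 3 * sigma_between

# "S chart" part: the within-lot standard deviations get their own chart.
print(f"I-chart limits on lot means: [{i_lcl:.2f}, {i_ucl:.2f}]")
print(f"average within-lot stdev: {stdevs.mean():.3f}")
```

<p>Because each source of variability gets its own chart and its own limits, neither chart's limits are distorted by the other source.</p>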
Conclusion
<p>We've all encountered ideas that seem simple in theory, but reality is often more complex than we expect. I-MR-R/S Between/Within control charts are a very flexible and efficient tool that makes it much easier to account for complexities in process variability. They enable you to monitor within- and between-subgroup sources of variability separately.</p>
<p>If selecting the right rational subgroups is a challenge when you use control charts, this approach can minimize the number of false alarms you experience, while permitting you to react as quickly as possible to true “special” causes.</p>
AutomotiveData AnalysisLearningManufacturingQuality ImprovementSix SigmaStatisticsStatistics HelpStatsFri, 21 Apr 2017 12:00:00 +0000http://blog.minitab.com/blog/applying-statistics-in-quality-projects/how-could-you-benefit-from-between-within-control-chartsBruno ScibiliaDo Executives See the Impact of Quality Projects?
http://blog.minitab.com/blog/understanding-statistics/do-executives-see-the-impact-of-quality-projects
<p>Do your executives see how your quality initiatives affect the bottom line? Perhaps they would more often if they had accessible insights on the performance, and ultimately the overall impact, of improvement projects. </p>
<p>For example, 60% of the organizations surveyed by the American Society for Quality in their 2016 Global State of Quality study say <a href="http://asq.org/members/pdf/discoveries-2016.pdf" target="_blank"><em>they don’t know or don’t measure the financial impact of quality</em></a>.</p>
<p>Evidence shows company leaders just don't have good access to the kind of information they need about their quality improvement initiatives.</p>
<p>The 2013 <a href="http://asq.org/global-state-of-quality/" target="_blank"><em>ASQ Global State of Quality</em></a> study indicated that more than half of executives get updates about quality only once a quarter, or even less often. You can bet they make decisions that impact quality much more frequently than that.</p>
<p>Even for organizations that are working hard to assess the impact of quality, communicating that impact effectively to C-level executives is a huge challenge. The 2013 report revealed that the higher people rise in an organization's leadership, the less often they receive reports about quality metrics. Only 2% of senior executives get daily quality reports, compared to 33% of front-line staff members. </p>
<p>A quarter of the senior executives reported getting quality metrics <em>only on an annual basis</em>. That's a huge problem, and it resonates across all industries. The Juran Institute, which specializes in training, certification, and consulting on quality management globally, also concluded that <a href="http://info.juran.com/hubfs/documents/9002%20The%20No.%201%20Reason%20Why%20Performance%20Improvement%20Programs%20Fail.pdf" target="_blank">a lack of management support is the No. 1 reason quality improvement initiatives fail</a>.</p>
<p><img alt="reporting on quality initiatives is difficult" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6424af008c86077421fd08bd84732739/reporting.jpg" style="margin: 10px 15px; width: 202px; height: 202px; float: right;" /></p>
<p>Quality practitioners are a dedicated, hard-working lot, and their task is challenging and frequently thankless. Their successes should be understood and recognized. But their efforts don't appear to be reaching C-level executive offices as often as they deserve. </p>
<p>Why <em>do</em> so many leaders get so few reports about their quality programs?</p>
5 Factors that Make Reporting on Quality Programs Impossible
<p>In fairness to everyone involved, from the practitioner to the executive, piecing together the full picture of quality in a company is daunting. Practitioners tell us that even in organizations with robust, mature quality programs, assessing the cumulative impact of an initiative can be difficult, and sometimes impossible. The reasons include:</p>
<strong>Scattered, Inaccessible Project Data</strong>
<p>Individual teams are very good at capturing and reporting their results, but a large company may have thousands of simultaneous quality projects. Just gathering the critical information from all of those projects and putting it into a form leaders can use is a monumental task. </p>
<strong>Disparate Project Applications and Documents</strong>
<p>Teams typically use an array of different applications to create charters, process maps, <a href="http://blog.minitab.com/blog/understanding-statistics/four-more-tips-for-making-the-most-of-value-stream-maps">value stream maps</a>, and other documents. So the project record becomes a mix of files from many different applications. Adding to the confusion, the latest versions of some documents may reside on several different computers, so project leaders often need to track multiple versions of a document to keep the official project record current. </p>
<strong>Inconsistent Metrics Across Projects </strong>
<p>Results and metrics aren’t always measured the same way from one team's project to another. If one team measures apples and the next team measures oranges, their results can't be evaluated or aggregated as if they were equivalent. </p>
<strong>Ineffective and Ill-suited Tracking</strong>
<p>Many organizations have tried quality tracking methods ranging from homegrown project databases to full-featured project portfolio management (PPM) systems. But homegrown systems often become a burden to maintain, while off-the-shelf solutions created for IT or other business functions don’t effectively support projects involving continuous quality improvement methods like Lean and Six Sigma. </p>
<strong>Too Little Time </strong>
<p>Reporting on projects can be a burden. There are only so many hours in the day, and busy team members need to prioritize. Copying and pasting information from project documents into an external system seems like non-value-added time, so it's easy to see why putting the latest information into the system gets low priority—if it happens at all.</p>
Reporting on Quality Shouldn't Be So Difficult
<p>Given the complexity of the task, and the systemic and human factors involved in improving quality, it's not hard to see why many organizations struggle with knowing how well their initiatives are doing. </p>
<p>But for quality professionals and leaders, the challenge is to make sure that reporting on results becomes a critical step in every individual project, and that all projects are using consistent metrics. Teams that can do that will find their results getting more attention and more credit for how they affect the bottom line. </p>
<p>These findings in the ASQ reports dramatically underscore problems we at Minitab have been focusing on recently. In fact, our <a href="http://www.minitab.com/products/companion/">Companion by Minitab</a> software tackles many of these factors head-on. </p>
<p>Companion takes a desktop app that provides a complete set of integrated tools for completing projects, and combines it with a cloud-based project storage system and web-based dashboard. For teams, the desktop app makes it easier to complete projects—and since project data is centrally stored and rolls up to the dashboard automatically, reporting on projects is <em>literally</em> effortless.</p>
<p>For executives, managers, and stakeholders, Companion delivers unprecedented and unparalleled insight into the progress, performance, and bottom-line impact of the organization’s entire quality initiative, or any individual piece of it. </p>
<p>Regardless of the tools they use, this issue—how to ensure the results of quality improvement initiatives are understood throughout an organization—is one that every practitioner is likely to grapple with in their career. </p>
<p>How will<em> you </em>make sure the results of your work reach your organization's decision-makers? </p>
Lean Six SigmaProject ToolsQuality ImprovementWed, 19 Apr 2017 12:00:00 +0000http://blog.minitab.com/blog/understanding-statistics/do-executives-see-the-impact-of-quality-projectsEston MartzWhat Do Ventilated Shelf Installation and Measurement Systems Analysis Have in Common?
http://blog.minitab.com/blog/quality-business/what-do-ventilated-shelf-installation-and-measurement-systems-analysis-have-in-common
<p><img alt="Ventilated Shelf" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/3c89dfbf7dc1971b6031bf669ac45625/ventilated_shelf.jpg" style="margin: 10px 15px; float: left; width: 148px; height: 113px;" />Have you ever tried to install ventilated shelving in a closet? You know: the heavy-duty, white- or gray-colored vinyl-coated wire shelving? The one that allows you to get organized, more efficient with space, and is strong and maintenance-free? Yep, that’s the one. Did I mention this stuff is strong? As in, <em>really </em>hard to cut? </p>
<p>It seems like a simple 4-step project. Measure the closet, go to the store, buy the shelving, and install it when you get home. Simple, right? Yeah, it sounded good in my head!</p>
<p>The lessons I learned in this project underscore the value of doing measurement system analysis in your quality improvement projects, with <a href="http://www.minitab.com/products/minitab/">statistical software such as Minitab</a>. Whatever you're trying to accomplish, if you don't get reliable measurements or data, the task is going to become more challenging.</p>
<p align="center"><img alt="Before Process Map" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/56432bf83b2d106b61f39f1ab76a8495/before_process_map.png" style="width: 600px; height: 145px; margin: 10px 15px;" /></p>
<p>Well, it turned out to be more complicated and involved a lot of rework. Did I mention that this shelving is made of heavy-gauge steel that is nearly impossible to cut with ordinary tools? So, my simple 4-step process turned into a 7-step process with lots of rework (multiple trips to the store to have the shelves re-cut).</p>
<p>My actual process looked more like this!</p>
<p><img alt="After Process Map" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/060c401c2db87e8176130b7e464441fb/after_process_map.png" style="width: 750px; height: 230px; margin: 10px 15px;" /></p>
<p>All the sources of variation from Measurement Systems Analysis (MSA) apply here: Repeatability, Reproducibility, Bias, Linearity, and Stability. Let’s review these terms and see how I could have done better at measuring the closet, the first time.</p>
<p align="center"><img alt="Components of Measurement Error" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/23550db0dc88ec22e9b5409ba6d6a592/components_of_measurement_error.png" style="width: 750px; height: 456px; margin: 10px 15px;" /></p>
<p>When it was time to measure the closet, I had a few measuring-device choices hanging around my garage: a yardstick, a cloth tape measure, and a steel tape measure. </p>
<p><strong>Bias</strong> examines the difference between the observed average measurement and a reference or master value. It answers the question: "How accurate is my gage when compared to a reference value?" Unless there is visible damage, all three of these measuring devices should be acceptable for my shelf project.</p>
<p><strong>Stability</strong> is the change in bias over time. Measurement stability represents the total variation in measurements obtained on the same part measured over time, also known as drift. It is important to assess stability on an ongoing basis. While calibrations and <span><a href="http://blog.minitab.com/blog/meredith-griffith/fundamentals-of-gage-rr">gage studies</a></span> provide some information about changes in the measurement system, neither provides information on what is happening to the measurement process over time. But unless there is visible damage, all three of these measuring devices should be acceptable for use.</p>
<p><strong>Linearity</strong> examines how accurate your measurements are through the expected range of the measurements. It answers the question: "Does my gage have the same accuracy across all reference values?" If you use the yardstick or steel tape measure, then the answer might be “yes” because of their solid construction. But the cloth tape measure could stretch when extended, making it less reliable at longer lengths. Examine the cloth measuring tape for evidence of stretching or wear. If damage is present, do not use the measuring device.</p>
<p><strong>Repeatability</strong> represents the variation that occurs when the same appraiser measures the same part with the same device. This is best represented with the advice “Measure twice, cut once!” In my case, if I had measured the closet width multiple times, I would have realized I was getting a different answer each time and therefore needed to take better care when measuring. Then I could have gotten more accurate measurements for each shelf. </p>
<p><strong>Reproducibility</strong> represents the variation that occurs when different appraisers measure the same part with the same device. In my case, if I'd asked my son to measure the same locations that I just measured, I would have discovered that we got different answers: I should have accounted for the mounting brackets in my measurements. (The fact that he <em>did </em>is why he’s in school to become a Mechanical Engineer.)</p>
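<p>To make these last two terms concrete, here is a deliberately simplified sketch (an averages-style shortcut with invented measurements, not a full ANOVA Gage R&amp;R) of how the two components could be estimated when two appraisers each measure the same closet width three times:</p>

```python
# Invented data: my three measurements and my son's three measurements
# of the same closet width, in inches.
import numpy as np

dad = np.array([55.2, 55.4, 55.1])
son = np.array([54.7, 54.8, 54.7])

# Repeatability: same appraiser, same part, same device -- pooled
# within-appraiser variation across the repeated measurements.
repeatability_sd = np.sqrt((dad.var(ddof=1) + son.var(ddof=1)) / 2)

# Reproducibility: different appraisers, same part, same device --
# variation between the appraisers' averages.
reproducibility_sd = np.std([dad.mean(), son.mean()], ddof=1)

print(f"repeatability sd:   {repeatability_sd:.3f} in")
print(f"reproducibility sd: {reproducibility_sd:.3f} in")
```

<p>In this made-up example the gap between appraisers dwarfs the re-measurement noise, which is exactly the pattern that reveals a missed factor like the mounting brackets.</p>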
<p>In summary, my afternoon shelf installation project ended up taking two days to complete, resulting in multiple trips to the store, a lot of frustration for me, and late dinners for my family because I was too busy to cook! </p>
<p>My lessons learned from this project are:</p>
<ol>
<li>Don’t assume your closet walls are exactly parallel at the top, middle and bottom of the closet. Instead, measure at each location where a shelf is to be installed. Remember the Rule of Thumb for Gage R&R: take measurements representing the entire range of process variation.</li>
<li>Apply the Gage R&R sources of measurement error when measuring:
<ol style="list-style-type:lower-alpha;">
<li>Visually inspect the measuring device before using to verify it is in good condition.</li>
<li>Measure twice, cut once. (Repeatability)</li>
<li>Ask my family for assistance in measuring. (Reproducibility)</li>
</ol>
</li>
<li>Did you know that you can purchase a laser measure for about $30 these days? If only I had known…</li>
<li>Consider hiring a professional because this project was harder than it originally seemed.</li>
</ol>
Quality ImprovementMon, 17 Apr 2017 15:03:00 +0000http://blog.minitab.com/blog/quality-business/what-do-ventilated-shelf-installation-and-measurement-systems-analysis-have-in-commonBonnie K. StoneStatistical Fun … at the Grocery Store?
http://blog.minitab.com/blog/real-world-quality-improvement/statistical-fun-at-the-grocery-store
<p><img alt="Grocery Store" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/207c0625cdf8e1ab603302087b39a340/store_w640.jpeg" style="float: right; width: 300px; height: 225px; border-width: 1px; border-style: solid; margin: 10px 15px;" />Grocery shopping. For some, it's the most dreaded household activity. For others, it's fun, or perhaps just a “necessary evil.”</p>
<p>Personally, I enjoy it! My co-worker, Ginger, a content manager here at Minitab, opened my eyes to something that made me love grocery shopping even more: she shared the data behind her family’s shopping trips. Being something of a data nerd, I really geeked out over the ability to analyze spending habits at the grocery store!</p>
<p>So how did she collect her data? What I find especially interesting is that Ginger didn’t have to save her receipts or manually transfer any information from her receipts onto a spreadsheet. As a loyal <a href="http://www.wegmans.com/webapp/wcs/stores/servlet/HomepageView?storeId=10052&catalogId=10002&langId=-1&clear=true" target="_blank">Wegmans</a> grocery store shopper, Ginger was able to access over a year’s worth of her receipts just by signing up for a Wegmans.com account and using her ‘shoppers club’ card. The data she had access to includes the date, time of day, and total spent for each trip, as well as each item purchased, the grocery store department the item came from (e.g., dairy, produce, frozen foods), and whether a discount was applied. As long as she used her card for purchases, it was tracked and accessible via her Wegmans.com account. Cool stuff!</p>
<p>Ginger created a Minitab worksheet with her grocery receipt data from Wegmans for a several-month period, and shared it with <a href="http://blog.minitab.com/blog/michelle-paret" target="_blank">Michelle</a> and me to see what kinds of Minitab analysis we could do and what we might be able to uncover about her shopping habits.</p>
<strong>Using Time Series Plots to See Trends</strong>
<p>Time series plots are great for evaluating patterns and behavior in data over time, so a time series plot was a natural first step in helping us look for any initial trends in Ginger’s shopping behavior. Here’s how her Minitab worksheet looked:</p>
<p><img alt="Minitab Worksheet" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/8ed0839953e95e4330f7b73e7dc37270/data.jpg" style="border-width: 0px; border-style: solid; width: 414px; height: 202px;" /></p>
<p>And here’s a time series plot that shows her spending over time:</p>
<p><img alt="Time Series Plot in Minitab" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/65ac4649f82649bdb1ec4c6eccac7e77/time_series_plot.jpg" style="border-width: 0px; border-style: solid; width: 576px; height: 384px;" /></p>
<p>To create this time series plot in Minitab, we navigated to <strong>Graph > Time Series Plot</strong>. It was easy to see that Ginger’s spending appears random over time, filled with several higher dollar orders (likely her weekly bulk trip to stock up) and several smaller orders (things forgotten or extras needed throughout the week). There doesn’t appear to be a trend or pattern. Almost all of her spending remained under $200 per trip, which is pretty good considering that many of her trips looked to be weekly bulk orders to feed her family of four. There were also very few outlier points with extremely high spending away from her consistent habit of spending between $100 and $150 three to four times per month.</p>
<p>However, you’ll notice that the graph above isn’t the simplest to read. To make it easier to zero in on monthly spending habits, we used the graph paneling feature in Minitab to divide the graph into more manageable pieces:</p>
<p><img alt="Minitab TIme Series Plot - Paneled " src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/2f2b9a07284b4ac07794c786cd6d9f3c/time_series_plot_of_total_2.jpg" style="border-width: 0px; border-style: solid; width: 576px; height: 384px;" /></p>
<p>The paneled graph makes it even easier to see that Ginger’s spending appears to be random, but consistently random! For more on paneling, check out this help topic on <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/graphs/graphs-that-compare-groups/bar-charts/create-a-bar-chart-with-groups-on-separate-panels/" target="_blank">Graph Paneling</a>.</p>
<strong>Visualizing Spending Data by Day of the Week</strong>
<p>To chart grocery spending by day of the week, we created a simple boxplot in Minitab (<strong>Graph > Boxplot</strong>):</p>
<p><img alt="Minitab Box Plot" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/1cd949605cf917f6a16e132293d268b6/boxplot_of_total.jpg" style="border-width: 0px; border-style: solid; width: 576px; height: 384px;" /></p>
<p>It’s pretty easy to see that Ginger’s higher-spending trips took place on Saturdays, Sundays, Mondays, and Tuesdays, with the greatest spread of spending (high, low, and in-between) occurring on Tuesdays. Wednesday appeared to be a low-spending day, with what looks to be quick trips to pick up just a few items.</p>
<p>How about the number of trips occurring each day of the week? To see this, we created a simple bar chart in Minitab (<strong>Graph > Bar Chart</strong>):</p>
<p><img alt="Minitab Bar Chart" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/b57c6328854ad12e179ecf52542498ae/chart_of_day.jpg" style="border-width: 0px; border-style: solid; width: 576px; height: 384px;" /></p>
<p>The highest number of Ginger’s trips to Wegmans occurred on Sunday (35) and Saturday (26), which isn’t really a surprise considering that many people do the majority of their grocery shopping on the weekends when they have time off from work. It’s also neat to see that many of her trips occurring on Wednesday and Thursday were likely smaller dollar trips (according to our box plot from earlier in the post). I can definitely relate to those pesky mid-week trips to get items forgotten earlier in the week!</p>
<strong>Visualizing Spending Data by Department</strong>
<p>And finally, what grocery store department does Ginger purchase the most items from? To figure this out, we created a Pareto chart in Minitab (<strong>Stat > Quality Tools > Pareto Chart</strong>):</p>
<p><img alt="Minitab Pareto Chart" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/2c289ce64af01388bba4d1d22008e994/pareto_chart_of_department.jpg" style="border-width: 0px; border-style: solid; width: 576px; height: 384px;" /></p>
<p>You can see that the highest number of items purchased is classified under OTHER, which we found to be a catch-all for items that don’t fit neatly into any of the other categories. In looking through the raw data with the item descriptions classified as OTHER, I found everything from personal care items like toothbrushes, to paper plates, and other specialty food items. The GROCERY category is another ambiguous category, but it seems as if this category is largely made up of items like canned and convenience foods (think apple sauces, cereal, crackers, etc.). The rest of the categories (dairy, produce, beverages) seem pretty self-explanatory.</p>
<p>The Pareto analysis is helpful because it can bring perspective to the types of foods being bought. Healthier items will likely be in the produce and dairy categories, so it’s good to see that these categories have high counts and percents in the Pareto above.</p>
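<p>Under the hood, a Pareto chart is just the category counts sorted in descending order with a running cumulative percentage. Here is a quick sketch using made-up department counts (not Ginger's actual numbers):</p>

```python
# Hypothetical item counts per department, just to show the arithmetic
# behind the bars and the cumulative-percent line of a Pareto chart.
counts = {"OTHER": 120, "GROCERY": 95, "PRODUCE": 80,
          "DAIRY": 60, "BEVERAGES": 25}

total = sum(counts.values())
cumulative = 0
for dept, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    cumulative += n
    print(f"{dept:10s} {n:4d} items  cumulative {100 * cumulative / total:5.1f}%")
```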
<strong>Grocery stores love data, too. </strong>
<p>It’s certainly no surprise that grocery stores love to track consumer buying behaviors through store discount cards. This helps stores to better target consumers and offer them promotions they are more likely to take advantage of. But it’s also great that grocery stores like Wegmans are sharing the wealth and giving consumers the ability to easily access their own spending data and draw their own conclusions! </p>
<p><strong>Do you analyze your spending at the grocery store? If so, how do you do it?</strong></p>
<p><span style="font-size:x-small;"><em>Top photo courtesy of Ginger MacRae</em>. <em>Yes, those are her actual groceries!</em></span></p>
Fun StatisticsStatisticsFri, 14 Apr 2017 13:17:00 +0000http://blog.minitab.com/blog/real-world-quality-improvement/statistical-fun-at-the-grocery-storeCarly BarryR-Squared: Sometimes, a Square is just a Square
http://blog.minitab.com/blog/statistics-and-quality-data-analysis/r-squared-sometimes-a-square-is-just-a-square
<p><img alt="rsquare" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/85ce10f546bd53d18ba69912862812ac/rsquarebest.jpg" style="width: 250px; float: right; height: 247px; margin: 10px 15px;" />If you regularly perform regression analysis, you know that R2 is a statistic used to evaluate the fit of your model. You may even know the standard definition of R2: <em>the percentage of variation in the response that is explained by the model. </em></p>
<p>Fair enough. With <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">Minitab Statistical Software</a> doing all the heavy lifting to calculate your R2 values, that may be all you ever need to know.</p>
<p>But if you’re like me, you like to crack things open to see what’s inside. Understanding the essential nature of a statistic helps you demystify it and interpret it more accurately.</p>
R-squared: Where Geometry Meets Statistics
<p>So where <em>does </em> this mysterious R-squared value come from? To find the formula in Minitab, choose<strong> Help > Methods and Formulas</strong>. Click<strong> General statistics > Regression > Regression > R-sq</strong>.</p>
<p><img alt="rsqare" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/25c1b2591db7f6b7cc34f2405fb07a1e/rsquare_no_annotation.jpg" style="width: 342px; height: 104px" /></p>
<p>Some spooky, wacky-looking symbols in there. Statisticians use those to make your knees knock together.</p>
<p>But all the formula really says is: “R-squared is a bunch of squares added together, divided by another bunch of squares added together, subtracted from 1.“</p>
<p><img alt="rsquare annotation" height="113" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/d67d32e7e7b6ee83b244f5a680ea394a/rsquare_annotation_w640.jpeg" width="506" /></p>
<p><em>What</em> bunch of squares, you ask?</p>
<p><img alt="square dance guys" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/017a00856d05924f11812e2e1e26ea41/square_dance_3.jpg" style="width: 460px; height: 299px" /></p>
<p>No, not them.</p>
SS Total: Total Sum of Squares
<p>First consider the "bunch of squares" on the bottom of the fraction. Suppose your data is shown on the scatterplot below:</p>
<p><img alt="original data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/a60bf6e6a3f1ebb09859d68536438410/scatterplot_of_y_vs_x.jpg" style="width: 576px; height: 384px" /></p>
<p>(Only 4 data values are shown to keep the example simple. Hopefully you have more data than this for your actual regression analysis!)</p>
<p>Now suppose you add a line to show the mean (average) of all your data points:</p>
<p><img alt="scatterplot with line" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/9b87c2062abf94036e60d67a2ea7a4ab/scatterplot_of_y_vs_x_with_line.jpg" style="width: 576px; height: 384px" /></p>
<p>The line y = mean of Y is sometimes referred to as the “trivial model” because it doesn’t contain any predictor (X) variables, just a constant. How well would this line model your data points?</p>
<p>One way to quantify this is to measure the vertical distance from the line to each data point. That tells you how much the line “misses” each data point. This distance can be used to construct the sides of a square on each data point.</p>
<p><img alt="pinksquares" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/624ab41f8f2cfbaee30fa0e96d619f69/scatterplot_of_y_vs_x_pink.jpg" style="width: 576px; height: 384px" /></p>
<p>If you add up the pink areas of all those squares for all your data points you get the total sum of squares (SS Total), the bottom of the fraction.</p>
<p><img alt="SS Total" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/babb38258faeb3eeffcf297614c3a2db/r2_formula_ss_total.jpg" style="width: 556px; height: 149px" /></p>
SS Error: Error Sum of Squares
<p>Now consider the model you obtain using regression analysis.</p>
<p><img alt="regression model" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/11d77cb1244bea8639363e59dfc9dd25/scatterplot_of_y_vs_x_regression.jpg" style="width: 576px; height: 384px" /></p>
<p>Again, quantify the "errors" of this model by measuring the vertical distance of each data value from the regression line and squaring it.</p>
<p><img alt="ss error graph" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/44bb7b18d43254e721ec367b07eae546/scatterplot_of_y_vs_x_ss_error.jpg" style="width: 576px; height: 384px" /></p>
<p>If you add the green areas of these squares you get the SS Error, the top of the fraction.</p>
<p><img alt="ss error formula" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/b2e227513ca63f20339b7fcd36985cc2/r2_formula_ss_error_w640.jpeg" style="width: 656px; height: 148px" /></p>
<p>So R2 basically just compares the errors of your regression model to the errors you’d have if you just used the mean of Y to model your data.</p>
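<p>As a quick illustration, that comparison can be sketched in a few lines of Python. The (observed, fitted) values below are made up for the sketch; they are not the data from the plots above.</p>

```python
# Minimal sketch of the R-squared calculation described above:
# R^2 = 1 - (SS Error / SS Total).
def r_squared(y, y_pred):
    mean_y = sum(y) / len(y)
    # "Pink squares": squared distances from each point to the mean line
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    # "Green squares": squared distances from each point to the fitted line
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, y_pred))
    return 1 - ss_error / ss_total

y      = [2.0, 3.5, 4.5, 6.0]   # hypothetical observed values
y_pred = [2.2, 3.3, 4.7, 5.8]   # hypothetical fitted values from a regression line
print(r_squared(y, y_pred))
```

<p>Note that feeding the trivial model into the same function (a prediction equal to the mean of Y for every point) makes SS Error equal SS Total, so R-squared comes out exactly 0.</p>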
R-Squared for Visual Thinkers
<p> </p>
<p><img alt="rsquare final" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/69a3a2540f6601c26d025fe495d175d5/rsquare_final_w640.jpeg" style="width: 640px; height: 313px" /></p>
<p>The smaller the errors in your regression model (the green squares) in relation to the errors in the model based on only the mean (pink squares), the closer the fraction is to 0, and the closer R2 is to 1 (100%).</p>
<p>That’s the case shown here. The green squares are much smaller than the pink squares. So the R2 for the regression line is 91.4%.</p>
<p>But if the errors in your regression model are about the same size as the errors in the trivial model that uses only the mean, the areas of the pink squares and the green squares will be similar, making the fraction close to 1, and the R2 close to 0. </p>
<p>That means your model isn't producing a "tight fit" for your data, generally speaking. You’re getting about the same size errors you’d get if you simply used the mean to describe all your data points! </p>
R-squared in Practice
<p>Now you know exactly what R2 is. People have different opinions about <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-high-should-r-squared-be-in-regression-analysis" target="_blank">how critical the R-squared value is in regression analysis</a>. My view? No single statistic ever tells the whole story about your data. But that doesn't invalidate the statistic. It's always a good idea to evaluate your data using a variety of statistics. Then interpret the composite results based on the context and objectives of your specific application. If you understand how a statistic is actually calculated, you'll better understand its strengths and limitations.</p>
Related link
<p>Want to see how another commonly used analysis, the t-test, really works? Read <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-is-a-t-test-and-why-is-it-like-telling-a-kid-to-clean-up-that-mess-in-the-kitchen" target="_blank">this post</a> to learn how the t-test measures the "signal" to the "noise" in your data.</p>
Regression AnalysisThu, 13 Apr 2017 13:06:00 +0000http://blog.minitab.com/blog/statistics-and-quality-data-analysis/r-squared-sometimes-a-square-is-just-a-squarePatrick RunkelGauging Gage Part 3: How to Sample Parts
http://blog.minitab.com/blog/fun-with-statistics/gauging-gage-part-3-how-to-sample-parts
<p>In <a href="http://blog.minitab.com/blog/fun-with-statistics/gauging-gage-part-1-is-10-parts-enough">Parts 1</a> and <a href="http://blog.minitab.com/blog/fun-with-statistics/gauging-gage-part-2-are-3-operators-or-2-replicates-enough">2 of Gauging Gage</a> we looked at the numbers of parts, operators, and replicates used in a Gage R&R Study and how accurately we could estimate %Contribution based on the choice for each. In doing so, I hoped to provide you with valuable and interesting information, but mostly I hoped to make you like me. I mean like me so much that if I told you that you were doing something flat-out wrong and had been for years and probably screwed some things up, you would hear me out and hopefully just revert back to being indifferent towards me.</p>
<p>For the third (and maybe final) installment, I want to talk about something that drives me crazy. It really gets under my skin. I see it all of the time, maybe more often than not. You might even do it. If you do, I'm going to try to convince you that you are very, very wrong. If you're an instructor, you may even have to contact past students with groveling apologies and admit you steered them wrong. And that's the best-case scenario. Maybe instead of admitting error, you will post scathing comments on this post insisting I am wrong and maybe even insulting me despite the evidence I provide here that I am, in fact, right.</p>
<p>Let me ask you a question:</p>
When you choose parts to use in a Gage R&R Study, how do you choose them?
<p>If your answer to that question required any more than a few words—and it can be done in one word—then I'm afraid you may have been making a very popular but very bad decision. If you're in that group, I bet you're already reciting your rebuttal in your head now, without even hearing what I have to say. You've had this argument before, haven't you? Consider whether your response was some variation on the following popular schemes:</p>
<ol>
<li>Sample parts at regular intervals across the range of measurements typically seen</li>
<li>Sample parts at regular intervals across the process tolerance (lower spec to upper spec)</li>
<li>Sample randomly but pull a part from outside of either spec</li>
</ol>
<p>#1 is wrong. #2 is wrong. #3 is wrong.</p>
<p>You see, the statistics you use to qualify your measurement system are all reported relative to the part-to-part variation, and none of the schemes I just listed accurately estimates your true part-to-part variation. The answer to the question that would have provided the most reasonable estimate?</p>
<p>"Randomly."</p>
<p>But enough with the small talk—this is a statistics blog, so let's see what the statistics say.</p>
<p>In Part 1 I described a simulated Gage R&R experiment, which I will repeat here using the standard design of 10 parts, 3 operators, and 2 replicates. The difference is that in only one set of 1,000 simulations will I randomly pull parts, and we'll consider that our baseline. The other schemes I will simulate are as follows:</p>
<ol>
<li>An "exact" sampling - while not practical in real life, this pulls parts corresponding to the 5th, 15th, 25th, ..., and 95th percentiles of the underlying normal distribution and forms a (nearly) "exact" normal distribution as a means of seeing how much the randomness of sampling affects our estimates.</li>
<li>Parts are selected uniformly (at equal intervals) across a typical range of parts seen in production (from the 5th to the 95th percentile).</li>
<li>Parts are selected uniformly (at equal intervals) across the range of the specs, in this case assuming the process is centered with a Ppk of 1.</li>
<li>8 of the 10 parts are selected randomly, and then one part each is used that lies one-half of a standard deviation outside of the specs.</li>
</ol>
<p>Keep in mind that we know with absolute certainty that the underlying %Contribution is 5.88325%.</p>
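<p>To make the four schemes concrete, here is a small Python sketch of how each set of 10 parts could be generated. The constants are illustrative assumptions for the sketch, not the simulation code behind the graphs: a standard normal process (mean 0, standard deviation 1) with specs at +/-3, so that a centered process has Ppk = 1.</p>

```python
import random
from statistics import NormalDist

# Assumed process: standard normal, specs at +/-3 (centered, Ppk = 1)
process = NormalDist(mu=0, sigma=1)
lsl, usl = -3, 3

# 1. "Exact": parts at the 5th, 15th, 25th, ..., 95th percentiles
exact = [process.inv_cdf(p / 100) for p in range(5, 100, 10)]

# 2. Uniform across the typical part range (5th to 95th percentile)
lo, hi = process.inv_cdf(0.05), process.inv_cdf(0.95)
uniform_range = [lo + i * (hi - lo) / 9 for i in range(10)]

# 3. Uniform across the spec range (lower spec to upper spec)
uniform_specs = [lsl + i * (usl - lsl) / 9 for i in range(10)]

# 4. 8 random parts, plus one part half a standard deviation outside each spec
outside = [random.gauss(0, 1) for _ in range(8)] + [lsl - 0.5, usl + 0.5]
```

<p>Running a simulated Gage R&R on parts drawn each way, many times over, is what produces the comparisons that follow.</p>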
Random Sampling for Gage
<p>Let's use "random" as the default to compare to, which, as you recall from Parts 1 and 2, already does not provide a particularly accurate estimate:</p>
<p style="margin-left:40px"><img alt="Pct Contribution with Random Sampling" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/46889f0e-f0a5-4b4a-8a19-2d2b8dce6087/Image/af91c4815469651cc698c3aa7d980c61/histogram_of_10_pctcontribution.gif" style="height:384px; width:576px" /></p>
<p>On several occasions I've had people tell me that you can't just sample randomly because you might get parts that don't really match the underlying distribution. </p>
Sample 10 Parts that Match the Distribution
<p>So let's compare the results of random sampling from above with our results if we could magically pull 10 parts that follow the underlying part distribution almost perfectly, thereby eliminating the effect of randomness:</p>
<p style="margin-left:40px"><img alt="Random vs Exact" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/46889f0e-f0a5-4b4a-8a19-2d2b8dce6087/Image/f2b7c1cc6c3cede482e7251b2b55f28e/random_vs_exact.gif" style="height:384px; width:576px" /></p>
<p>There's obviously something to the idea that the randomness that comes from random sampling has a big impact on our estimate of %Contribution...the "exact" distribution of parts shows much less skewness and variation and is considerably less likely to incorrectly reject the measurement system. To be sure, implementing an "exact" sample scheme is impossible in most cases...since you don't yet know how much measurement error you have, there's no way to know that you're pulling an exact distribution. What we have here is a statistical version of chicken-and-the-egg!</p>
Sampling Uniformly across a Typical Range of Values
<p>Let's move on...next up, we will compare the random scheme to scheme #2, sampling uniformly across a typical range of values:</p>
<p style="margin-left:40px"><img alt="Random vs Uniform Range" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/46889f0e-f0a5-4b4a-8a19-2d2b8dce6087/Image/d8e9f2f7a24a62457a2d517914baef73/random_vs_uniformrange.gif" style="height:384px; width:576px" /></p>
<p>So here we have a different situation: there is a very clear reduction in variation, but also a very clear bias. So while pulling parts uniformly across the typical part range gives much more consistent estimates, those estimates are likely telling you that the measurement system is much better than it really is.</p>
Sampling Uniformly across the Spec Range
<p>How about collecting uniformly across the range of the specs?</p>
<p style="margin-left:40px"><img alt="Random vs Uniform Specs" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/46889f0e-f0a5-4b4a-8a19-2d2b8dce6087/Image/5da456e491792be021485c0e9a514298/random_vs_uniformspecs.gif" style="height:384px; width:576px" /></p>
<p>This scheme results in an even more extreme bias: qualifying this measurement system becomes a certainty, and in some cases it would even be classified as excellent. Needless to say, it does not result in an accurate assessment.</p>
Selectively Sampling Outside the Spec Limits
<p>Finally, how about that scheme where most of the points are taken randomly but just one part is pulled from just outside of each spec limit? Surely just taking 2 of the 10 points from outside of the spec limits wouldn't make a substantial difference, right?</p>
<p style="margin-left:40px"><img alt="Random vs OOS" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/46889f0e-f0a5-4b4a-8a19-2d2b8dce6087/Image/c0821d19873a65162535d231799052ce/random_vs_oos.gif" style="height:384px; width:576px" /></p>
<p>Actually those two points make a huge difference and render the study's results meaningless! This process had a Ppk of 1 - a higher-quality process would make this result even more extreme. Clearly this is not a reasonable sampling scheme.</p>
<strong>Why These Sampling Schemes?</strong>
<p>If you were taught to sample randomly, you might be wondering why so many people would use one of these other schemes (or similar ones). They actually all have something in common that explains their use: all of them allow a practitioner to assess the measurement system across a range of possible values. After all, if you almost always produce values between 8.2 and 8.3 and the process goes out of control, how do you know that you can adequately measure a part at 8.4 if you never evaluated the measurement system at that point?</p>
<p>Those that choose these schemes for that reason are smart to think about that issue, but just aren't using the right tool for it. Gage R&R evaluates your measurement system's ability to measure relative to the current process. To assess your measurement system across a range of potential values, the correct tool to use is a "Bias and Linearity Study" which is found in the Gage Study menu in Minitab. This tool establishes for you whether you have bias across the entire range (consistently measuring high or low) or bias that depends on the value measured (for example, measuring smaller parts larger than they are and larger parts smaller than they are).</p>
<p>To really assess a measurement system, I advise performing both a Bias and Linearity Study as well as a Gage R&R.</p>
<strong>Which Sampling Scheme to Use?</strong>
<p>In the beginning I suggested that a random scheme be used but then clearly illustrated that the "exact" method provides even better results. Using an exact method requires you to know the underlying distribution from having enough previous data (somewhat reasonable although existing data include measurement error) as well as to be able to measure those parts accurately enough to ensure you're pulling the right parts (not too feasible...if you know you can measure accurately, why are you doing a Gage R&R?). In other words, it isn't very realistic.</p>
<p>So for the majority of cases, the best we can do is to sample randomly. But we can do a reality check after the fact by looking at the average measurement for each of the parts chosen and verifying that the distribution seems reasonable. If you have a process that typically shows normality and your sample shows unusually high skewness, there's a chance you pulled an unusual sample and may want to pull some additional parts and supplement the original experiment.</p>
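<p>That after-the-fact reality check is easy to script. Here is a minimal Python sketch; the per-part averages and the skewness cutoff of 1 are illustrative assumptions, not values from the simulations above.</p>

```python
# Reality check on a random sample: compute the skewness of the per-part
# average measurements and flag an unusual-looking sample.
def skewness(data):
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical average measurement for each of the 10 sampled parts
part_means = [9.8, 10.1, 10.0, 10.2, 9.9, 10.3, 9.7, 10.0, 10.1, 9.9]

if abs(skewness(part_means)) > 1:  # arbitrary rule-of-thumb cutoff
    print("Sample looks unusually skewed; consider pulling additional parts.")
else:
    print("Sample distribution looks reasonable.")
```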
<p>Thanks for humoring me and please post scathing comments below!</p>
<p><a href="http://blog.minitab.com/blog/fun-with-statistics/gauging-gage-part-1-is-10-parts-enough">see Part I of this series</a><br />
<a href="http://blog.minitab.com/blog/fun-with-statistics/gauging-gage-part-2-are-3-operators-or-2-replicates-enough">see Part II of this series</a></p>
AutomotiveGovernmentHealth Care Quality ImprovementHealthcareLean Six SigmaManufacturingMedical DevicesMiningQuality ImprovementServicesWed, 12 Apr 2017 13:39:00 +0000http://blog.minitab.com/blog/fun-with-statistics/gauging-gage-part-3-how-to-sample-partsJoel SmithWhy Is Continuous Data "Better" than Categorical or Discrete Data?
http://blog.minitab.com/blog/understanding-statistics/why-is-continuous-data-better-than-categorical-or-discrete-data
<p>Earlier, I wrote about the <a href="http://blog.minitab.com/blog/understanding-statistics/understanding-qualitative-quantitative-attribute-discrete-and-continuous-data-types">different types of data</a> statisticians typically encounter. In this post, we're going to look at why, when given a choice in the matter, we prefer to analyze continuous data rather than categorical/attribute or discrete data. </p>
<p>As a reminder, when we assign something to a group or give it a name, we have created <strong>attribute </strong>or <strong>categorical </strong>data. If we count something, like defects, we have gathered <strong>discrete </strong>data. And if we can measure something to a (theoretically) infinite degree, we have <strong>continuous </strong>data.</p>
<p>Or, to put in bullet points: </p>
<ul>
<li><strong>Categorical </strong>= naming or grouping data</li>
<li><strong>Discrete </strong>= count data</li>
<li><strong>Continuous</strong> = measurement data</li>
</ul>
<p>A <a href="http://www.minitab.com/products/minitab" style="font-size: 13px; line-height: 18.9090900421143px;">statistical software package</a><span style="font-size: 13px; line-height: 18.9090900421143px;"> like Minitab is extremely powerful and can tell us many valuable things</span><span style="font-size: 13px; line-height: 18.9090900421143px;">—as long as we're able to feed it good numbers. Without numbers, we have no analyses or graphs. Even categorical or</span><span style="font-size: 13px; line-height: 18.9090900421143px;"> attribute data needs to be converted into numeric form by counting before we can analyze it. </span></p>
What Makes Numeric Data Discrete or Continuous?
<p>At this point, you may be thinking, "Wait a minute—we can't <em>really </em>measure <em>anything </em>infinitely, so isn't measurement data actually discrete, too?" That's a fair question. </p>
<p>If you're a strict literalist, the answer is "yes"—when we measure a property that's continuous, like height or distance, we are <i>de facto </i>making a discrete assessment. When we collect a lot of those discrete measurements, it's the amount of detail they contain that will dictate whether we can treat the collection as discrete or continuous.</p>
<p>I like to think of it as a question of scale. Say <span style="line-height: 1.6;">I want to measure the weight of 16-ounce cereal boxes coming off a production line, and I want to be sure that the weight of each box is at least 16 ounces, but no more than 1/2 ounce over that. </span></p>
<p><span style="line-height: 1.6;">With a scale calibrated to whole pounds, all I can do is put every box into one of three categories: less than a pound, 1 pound, or more than a pound. </span></p>
<p>With a scale that can distinguish ounces, I will be able to measure with a bit more accuracy just how close to a pound the individual boxes are. I'm getting nearer to continuous data, but there are still only 16 gradations within each pound. </p>
<p>But if I measure with a scale capable of distinguishing 1/1000th of an ounce, I will have quite a wide scale—a <em>continuum</em>—of potential values between pounds. The individual boxes could have any value between 0.000 and 1.999 pounds. The scale of these measurements is fine enough to be analyzed with powerful statistical tools made for continuous data. </p>
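<p>A short Python sketch makes the effect of scale resolution concrete. The "true" box weights below (in pounds) are made up for illustration:</p>

```python
# How the same true weights look when recorded at three scale resolutions.
true_lbs = [0.93, 1.01, 1.06, 0.99]  # hypothetical true weights, in pounds

whole_pounds = [round(w) for w in true_lbs]                  # 1-lb scale
ounces       = [round(w * 16) / 16 for w in true_lbs]        # 1-oz scale
milli_oz     = [round(w * 16000) / 16000 for w in true_lbs]  # 1/1000-oz scale

print(whole_pounds)  # every box just reads "1 pound"
print(ounces)        # differences between boxes start to appear
print(milli_oz)      # fine enough to treat as continuous
```

<p>At whole-pound resolution, every one of these boxes records identically; at the finest resolution, each box gets its own value on a near-continuum.</p>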
What Can I Do with Continuous Data that I Can't Do with Discrete?
<p>Not all data points are equally valuable, and you can glean a lot more insight from 100 points of continuous data than you can from 100 points of attribute or count data. <span style="line-height: 18.9090900421143px;">How does this finer degree of detail affect what we can learn from a set of data?</span><span style="line-height: 18.9090900421143px;"> It's easy to see. </span></p>
<p>Let's start with the simplest kind of data, attribute data that rates the weight of a cereal box as good or bad. For 100 boxes of cereal, any that are under 1 pound are classified as bad, so each box can have one of only two values.</p>
<p>We can create a bar chart or a pie chart to visualize this data, and that's about it:</p>
<p><img alt="Attribute Data Bar Chart" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9a3aaad00a1a5858433f17bfd121f465/attribute_data_bar_chart.png" style="width: 576px; height: 384px;" /></p>
<p>If we bump up the precision of our scale to differentiate between boxes that are over and under 1 pound, we can put each box of cereal into one of three categories. Here's what that looks like in a pie chart:</p>
<p><img alt="pie chart of count data" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ae87a08eae95accccbc82b97fe3f0ced/pie_chart_of_count_data.png" style="width: 576px; height: 384px;" /></p>
<p>This gives us a little bit more insight—we now see that we are overfilling more boxes than we are underfilling—but there is still a very limited amount of information we can extract from the data. </p>
<p>If we measure each box to the nearest ounce, we open the door to using methods for continuous data, and get a still better picture of what's going on. We can see that, on average, the boxes weigh 1 pound. But there's high variability, with a standard deviation of 0.9. There's also a wide range in our data, with observed values from 12 to 20 ounces: </p>
<p><img alt="graphical summary of ounce data" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/26b4e51027b7afa154d0e6e3f14ab8e9/summary_statistics_for_ounces.png" style="width: 575px; height: 431px;" /></p>
<p>If I measure the boxes with a scale capable of differentiating thousandths of an ounce, more options for analysis open up. For example, now that the data are fine enough to distinguish half-ounces (and then some), I can perform a capability analysis to see if my process is even capable of consistently delivering boxes that fall between 16 and 16.5 ounces. I'll use the Assistant in Minitab to do it, selecting <strong>Assistant > Capability Analysis</strong>: </p>
<p><img alt="capability analysis for thousandths" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0b0a37d1515c25b2e1d8d633b09da447/capability_analysis_for_thousandths___summary_report.png" style="width: 575px; height: 431px;" /></p>
<p>The analysis has revealed that my process isn't capable of meeting specifications. Looks like I have some work to do...but the Assistant also gives me an I-MR control chart, which reveals where and when my process is going out of spec, so I can start looking for root causes.</p>
<p><img alt="IMR Chart" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/df4a5f568e1d931ddcb96404fd888547/imr_chart.png" style="width: 575px; height: 224px;" /></p>
<p>If I were only looking at attribute data, I might think my process was just fine. Continuous data has allowed me to see that I can make the process better, and given me a rough idea where to start. <span style="line-height: 1.6;">By making changes and collecting additional continuous data, I'll be able to conduct hypothesis tests, analyze sources of variances, and more. </span></p>
Some Final Advantages of Continuous Over Discrete Data
<p>Does this mean discrete data is no good at all? Of course not—we are concerned with many things that can't be measured effectively except through discrete data, such as opinions and demographics. But when you can get it, continuous data is the better option. The table below lays out the reasons why. </p>
<table>
<tr><th><strong>Continuous Data</strong></th><th><strong>Discrete Data</strong></th></tr>
<tr><td>Inferences can be made with few data points—valid analysis can be performed with small samples.</td><td>More data points (a larger sample) needed to make an equivalent inference.</td></tr>
<tr><td>Smaller samples are usually less expensive to gather.</td><td>Larger samples are usually more expensive to gather.</td></tr>
<tr><td>High sensitivity (how close to or far from a target)</td><td>Low sensitivity (good/bad, pass/fail)</td></tr>
<tr><td>Variety of analysis options that can offer insight into the sources of variation</td><td>Limited options for analysis, with little indication of sources of variation</td></tr>
</table>
<p>I hope this very basic overview has effectively illustrated why you should opt for continuous data over discrete data whenever you can get it. </p>
Data AnalysisStatisticsStatistics HelpFri, 07 Apr 2017 12:00:00 +0000http://blog.minitab.com/blog/understanding-statistics/why-is-continuous-data-better-than-categorical-or-discrete-dataEston MartzHow to Improve Cpk
http://blog.minitab.com/blog/michelle-paret/how-to-improve-cpk
<p><span style="line-height: 1.6;">You run a capability analysis and your Cpk is bad. Now what? </span></p>
<p><span style="line-height: 1.6;">First, let’s start by defining what “bad” is. In simple terms, the smaller the Cpk, the more defects you have. So the larger your Cpk is, the better. </span><span style="line-height: 1.6;">Many practitioners use a Cpk of 1.33 as the gold standard, so we’ll treat that as the gold standard here, too.</span></p>
<p>Suppose we collect some data and run a capability analysis using <a href="http://www.minitab.com/products/minitab/">Minitab Statistical Software</a>. The results reveal a Cpk of 0.35 with a corresponding DPMO (defects per million opportunities) of more than 140,000. Not good. So how can we improve it? There are two ways to figure that out:</p>
#1 Look at the Graph
<p><strong style="line-height: 20.8px;">Example 1: </strong><span style="line-height: 20.8px;">The Cpk for Diameter1 is 0.35, which is well below 1.33. This means we have a lot of measurements that are out of spec. </span></p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/ca492b411b474ef0a6fd89ae25713cce/process_capability_report_for_diameter1_w1024.jpeg" style="width: 400px; height: 296px; margin-left: 5px; margin-right: 5px;" /></p>
<p>Using the graph, we can see that the data—represented by the blue histogram—is not centered <span style="line-height: 1.6;">between the spec limits shown in red. Fortunately, variability does not appear to be an issue since the histogram and corresponding normal curve can physically fit between the specification limits.</span></p>
<p style="margin-left: 40px;"><em>Q: How can we improve Cpk?</em></p>
<p style="margin-left: 40px;"><em>A: Center the process by moving the mean closer to 100 – halfway between the spec limits </em><em style="line-height: 20.8px;">–</em><em> without increasing the variation.</em></p>
<p> </p>
<p><span style="line-height: 1.6;"><strong>Example 2: </strong>In the analysis for Diameter2, we see a meager Cpk of only 0.41. Fortunately, the data is </span><span style="line-height: 1.6;">centered relative to</span><span style="line-height: 1.6;"> the spec limits. However, the histogram and corresponding </span><span style="line-height: 1.6;">normal curve extend beyond the specs.</span></p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/52c8fbb956a0bc411db98e20ec236820/process_capability_report_for_diameter2_w1024.jpeg" style="line-height: 20.8px; margin-left: 5px; margin-right: 5px; width: 400px; height: 296px;" /></p>
<p style="margin-left: 40px;"><em>Q: How can we improve Cpk?</em></p>
<p style="margin-left: 40px;"><em>A: Reduce the variability, while maintaining the same average.</em></p>
<p> </p>
<p><strong>Example 3: </strong>In the analysis for Diameter3, we can see that the process is not centered between the specs. To make matters worse, the histogram and corresponding normal curve are wider than the tolerance <span style="line-height: 1.6;">(i.e. the distance between the spec limits),</span> which indicates that there’s also too much variability.</p>
<p><em><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/0b7c4891bd5afe50f0bda0548666e0d0/process_capability_report_for_diameter3_w1024.jpeg" style="margin-left: 5px; margin-right: 5px; width: 400px; height: 296px;" /></em></p>
<p style="margin-left: 40px;"><em>Q</em><em style="line-height: 1.6;">: How can we improve Cpk?</em></p>
<p style="margin-left: 40px;"><em>A. Shift the mean closer to 100 to center the process AND reduce the variation.</em></p>
<p> </p>
#2 Compare Cp to Cpk
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/89bd0f1821c52297eb8aeab6efae8428/caringarage.jpg" style="line-height: 20.8px; margin-left: 5px; margin-right: 5px; float: right; width: 250px; height: 193px;" /></p>
<p><span style="line-height: 1.6;">Cp is similar to Cpk in that the smaller the number, the worse the process, and we can use the same 1.33 gold standard. However, the two statistics and <a href="http://blog.minitab.com/blog/statistics-in-the-field/learning-process-capability-with-a-catapult-part-2">their corresponding formulas</a> differ in that Cp only compares the spread of the data to the tolerance width, and </span><span style="line-height: 1.6;">does <em>not</em> account for whether or not the process is actually centered between the spec limits.</span></p>
<p>Interpreting Cp is much like asking “will my car fit in the garage?” where the data is your car and the spec limits are the walls of your garage. We’re not accounting for whether or not you’re a crappy driver and can actually drive straight and center the car—we’re just looking at whether or not your car is narrow enough to physically fit.</p>
<p><strong>Example 1: </strong>The analysis for Diameter1 has a Cp of 1.64, which is very good. Because Cp is good, we know the variation is acceptable—we can physically fit our car in the garage. However, Cpk, which does account for whether or not the process is centered, is <em>awful</em>, at only 0.35.</p>
<p><em> Q: How can we improve Cpk?</em></p>
<p><em> A: Shift the mean to center the process between the specs, without increasing the variation.</em></p>
<p><strong>Example 2: </strong>The analysis for Diameter 2 shows that Cp = 0.43 and Cpk = 0.41. Because Cp is bad, we know there’s too much variation—our car cannot physically fit in the garage. And because the Cp and Cpk values are similar, this tells us that the process is fairly centered.</p>
<p><em style="line-height: 20.8px;"> Q: How can we improve Cpk?</em></p>
<p style="line-height: 20.8px;"><em> A: Reduce the variation, while maintaining the same average.</em></p>
<p style="line-height: 20.8px;"><strong>Example 3: </strong>The analysis for Diameter 3 has a Cp = 0.43 and Cpk = -0.23. Because Cp is bad, we know there’s too much variation. And because Cp is not even close to Cpk, we know that the process is also off center.</p>
<p style="line-height: 20.8px;"><em> Q: How can we improve Cpk?</em></p>
<p style="line-height: 20.8px;"><em> A. Shift the mean AND reduce the variation.</em></p>
<p> </p>
And for a 3rd way...
<p>Whether you look at a capability analysis graph or compare the Cp and Cpk statistics, you’re going to arrive at the same conclusion regarding how to improve your results. And if you want yet another way to figure out how to improve Cpk, you can also look at the mean and standard deviation—but for now, I’ll spare you the math lesson and stick with #1 and #2 above.</p>
<p>In summary:</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/6aa0804ba89d8dc383d912663fd91f95/summarygrid.jpg" style="width: 644px; height: 210px;" /></p>
AutomotiveCapability AnalysisData AnalysisLean Six SigmaLearningManufacturingQuality ImprovementSix SigmaStatisticsStatsWed, 05 Apr 2017 12:00:00 +0000http://blog.minitab.com/blog/michelle-paret/how-to-improve-cpkMichelle ParetGauging Gage Part 2: Are 3 Operators or 2 Replicates Enough?
http://blog.minitab.com/blog/fun-with-statistics/gauging-gage-part-2-are-3-operators-or-2-replicates-enough
<p>In Part 1 of Gauging Gage, I looked at how adequate a <a href="http://blog.minitab.com/blog/fun-with-statistics/gauging-gage-part-1-is-10-parts-enough">sampling of 10 parts is for a Gage R&R Study</a> and provided some advice based on the results.</p>
<p>Now I want to turn my attention to the other two factors in the standard Gage experiment: 3 operators and 2 replicates. Specifically, what if instead of increasing the number of parts in the experiment (my previous post demonstrated you would need an unfeasible increase in parts), you increased the number of operators or number of replicates?</p>
<p>In this study, we are only interested in the effect on our estimate of overall Gage variation. Obviously, increasing operators would give you a better estimate of the operator term and reproducibility, and increasing replicates would get you a better estimate of repeatability. But I want to look at the overall impact on your assessment of the measurement system.</p>
Operators
<p>First we will look at operators. Using the same simulation engine I described in Part 1, this time I did two different simulations. In one, I increased the number of operators to 4 and continued using 10 parts and 2 replicates (for a total of 80 runs); in the other, I increased to 4 operators and still used 2 replicates, but decreased the number of parts to 8 to get back close to the original experiment size (64 runs compared to the original 60).</p>
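<p>The simulation engine itself isn't shown in the post. As a rough, hypothetical re-creation in Python (not Minitab's actual engine), the sketch below simulates the three operator scenarios; because no operator effect or interaction is simulated, a simple parts-only variance-component estimate of %Contribution is a fair stand-in:</p>

```python
import numpy as np

rng = np.random.default_rng(2)

def pct_contribution(n_parts, n_ops, n_reps, sd_part=1.0, sd_gage=0.25):
    # One simulated study: no operator effect, no interaction, so a one-way
    # (parts-only) variance-component estimate is a fair stand-in.
    m = n_ops * n_reps
    data = (rng.normal(0, sd_part, n_parts)[:, None]
            + rng.normal(0, sd_gage, (n_parts, m)))
    mse = data.var(axis=1, ddof=1).mean()        # repeatability (gage) estimate
    msp = m * data.mean(axis=1).var(ddof=1)      # between-part mean square
    var_part = max((msp - mse) / m, 0.0)
    return 100 * mse / (mse + var_part)          # estimated %Contribution

scenarios = {"10 parts, 3 operators": (10, 3, 2),
             "10 parts, 4 operators": (10, 4, 2),
             "8 parts, 4 operators": (8, 4, 2)}
spread = {}
for label, args in scenarios.items():
    estimates = np.array([pct_contribution(*args) for _ in range(1000)])
    spread[label] = estimates.std()
    print(f"{label}: std dev of %Contribution estimates = {spread[label]:.2f}")
```

<p>Running the sketch, you should see the 8-part design yield the widest spread of estimates, consistent with the comparison below.</p>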
<p>Here is a comparison of the standard experiment and each of these scenarios:</p>
<p style="margin-left:40px"><img alt="Operator Comparisons" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/46889f0e-f0a5-4b4a-8a19-2d2b8dce6087/Image/ab84f3d0ae2d826f47786930ee54c611/operator_comparisons.gif" style="height:384px; width:576px" /></p>
<p style="margin-left:40px"><img alt="Operator Descriptive Stats" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/46889f0e-f0a5-4b4a-8a19-2d2b8dce6087/Image/bc864992dcfd882e2c6066496b79ce19/operators_desc.GIF" style="height:68px; width:524px" /></p>
<p>It may not be obvious in the graph, but increasing to 4 operators while decreasing to 8 parts actually <em>increased</em> the variation in %Contribution seen...so despite requiring 4 more runs, this is the poorer choice. And the experiment that involved 4 operators but maintained 10 parts (a total of 80 runs) showed no significant improvement over the standard study.</p>
Replicates
<p>Now let's look at replicates in the same manner we looked at parts. In one run of simulations we will increase replicates to 3 while continuing to use 10 parts and 3 operators (90 runs); in another we will increase replicates to 3 and keep 3 operators, but reduce parts to 7 to compensate (63 runs).</p>
<p>Again we compare the standard experiment to each of these scenarios:</p>
<p style="margin-left:40px"><img alt="Replicate Comparisons" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/46889f0e-f0a5-4b4a-8a19-2d2b8dce6087/Image/f5d14793691f9ad2d39a598ca41e9945/replicate_comparisons.gif" style="height:384px; width:576px" /></p>
<p style="margin-left:40px"><img alt="Replicates Descriptive Statistics" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/46889f0e-f0a5-4b4a-8a19-2d2b8dce6087/Image/1c08fe3733c316f67e904e532c6b3e6e/replicates_desc.GIF" style="height:71px; width:528px" /></p>
<p>Here we see the same pattern as with operators. Increasing to 3 replicates while compensating by reducing to 7 parts (for a total of 63 runs) significantly increases the variation in %Contribution seen. And increasing to 3 replicates while maintaining 10 parts shows no improvement.</p>
<strong>Conclusions about Operators and Replicates in Gage Studies</strong>
<p>As stated above, we're only looking at the effect of these changes to the overall estimate of measurement system error. So while increasing to 4 operators or 3 replicates either showed no improvement in our ability to estimate %Contribution or actually made it worse, you may have a situation where you are willing to sacrifice that in order to get more accurate estimates of the individual components of measurement error. In that case, one of these designs might actually be a better choice.</p>
<p>For most situations, however, if you're able to collect more data, then increasing the number of parts used remains your best choice.</p>
<p>But how do we select those parts? I'll talk about that in my next post!</p>
<p><a href="http://blog.minitab.com/blog/fun-with-statistics/gauging-gage-part-1-is-10-parts-enough">see Part I of this series</a><br />
<a href="http://blog.minitab.com/blog/fun-with-statistics/gauging-gage-part-3-how-to-sample-parts">see Part III of this series</a></p>
AutomotiveData AnalysisGovernmentHealth Care Quality ImprovementHealthcareLean Six SigmaManufacturingMedical DevicesMiningServicesSix SigmaStatisticsStatsTue, 04 Apr 2017 12:00:00 +0000http://blog.minitab.com/blog/fun-with-statistics/gauging-gage-part-2-are-3-operators-or-2-replicates-enoughJoel Smith5 Simple Steps to Conduct Capability Analysis with Non-Normal Data
http://blog.minitab.com/blog/statistics-in-the-field/5-simple-steps-to-conduct-capability-analysis-with-non-normal-data
<p><em>by Kevin Clay, guest blogger</em></p>
<p>In transactional or service processes, we often deal with lead-time data, and usually that data does not follow the normal distribution.<img alt="why be normal" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/5cc1ae3ccb859112d24d54586a3067cb/why_be_normal.png" style="width: 169px; height: 158px; margin: 10px 15px; float: right;" /></p>
<p>Consider a Lean Six Sigma project to reduce the lead time required to install an information technology solution at a customer site. It should take no more than 30 days—working 10 hours per day Monday–Friday—to complete, test and certify the installation. Following the standard process, the target lead time should be around 24 days.</p>
<p>Twenty-four days may be the target, but we know customer satisfaction increases as we complete the installation faster. We need to understand our baseline capability to meet that demand, so we can perform a <span><a href="http://blog.minitab.com/blog/understanding-statistics/the-easiest-way-to-do-capability-analysis">capability analysis</a></span>.</p>
<p>We know our data should fit a non-normal (positively skewed) distribution. It should resemble a ski slope, like the picture below:</p>
<p style="margin-left: 40px;"><img alt="ski slope distribution" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2085d835d6dd108f3ce8220de23da5b2/image003.jpg" style="width: 400px; height: 240px;" /></p>
<p>In this post, I will cover five simple steps to understand the capability of a non-normal process to meet customer demands.</p>
1. Collect data
<p>First we must gather data from the process. In this scenario, we are collecting sample data. We pull 100 samples that cover the full range of variation that occurs in the process.</p>
<p>In this case the full range of variation comes from three installation teams. We will take at least 30 data points from each team.</p>
2. Identify the Shape of the Distribution
<p>We know that the data should fit a non-normal distribution. As Lean Six Sigma practitioners, we must back that assumption with data. In this case, we can conduct a normality test to check for non-normality.</p>
<p>We are using Minitab as the statistical analysis tool, and our data are available in <a href="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9743c0f88d5ac591f033e809278b50df/lead_time_data.mtw">this worksheet</a>. (If you want to follow along and don't already have it, download the <a href="http://www.minitab.com/en-us/products/minitab/free-trial/">free Minitab trial</a>.)</p>
<p>From the menu, select <strong>Stat > Basic Statistics > Normality Test...</strong></p>
<p>Populate the “Variable:” field with LeadTime, and click OK as shown:</p>
<p style="margin-left: 40px;"><img alt="normality test dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/7daf5c4264d13540ae84453e276d24d5/image004.jpg" style="border-width: 0px; border-style: solid; width: 400px; height: 309px;" /></p>
<p>You should get the following Probability Plot:</p>
<p style="margin-left: 40px;"><img alt="probability plot of lead time" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/375c30b4b4583fab4cc81913a42e441b/pvalue.png" style="border-width: 0px; border-style: solid; width: 500px; height: 333px;" /></p>
<p>Since the p-value (outlined in yellow in the picture above) is less than 0.05, we reject the null hypothesis of normality at the 0.05 significance level and conclude that the data do not follow a normal distribution.</p>
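<p>If you want to reproduce the same decision rule outside Minitab, here is a minimal sketch in Python with SciPy, using hypothetical simulated lead-time data; note that Minitab's default Anderson-Darling test is swapped for a Shapiro-Wilk test here, since the latter returns a p-value directly:</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Hypothetical right-skewed lead times, in days (simulated for illustration)
lead_time = rng.exponential(scale=8.0, size=100) + 2

w, p = stats.shapiro(lead_time)
print(f"Shapiro-Wilk p-value = {p:.4f}")
if p < 0.05:
    print("p < 0.05: reject normality and treat the lead-time data as non-normal")
```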
3. Verify Stability
<p>In a Lean Six Sigma project, we might find the answer to our problem anywhere on the DMAIC roadmap. Belts need to learn to look for the signals all throughout the project.</p>
<p>In this case, signals can come from instability in our process. They show up as red dots on a control chart.</p>
<p>To see if this lead time process is stable, we will run an I-MR chart. In Minitab, select <strong>Stat > Control Charts > Variables Charts for Individuals > I-MR...</strong></p>
<p>Populate “Variables:” with “LeadTime” in the dialog as shown below:</p>
<p style="margin-left: 40px;"><img alt="I-MR Chart dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1868b197223418928f8ea9fab155c31d/image006.jpg" style="border-width: 0px; border-style: solid; width: 400px; height: 281px;" /></p>
<p>Press OK, and you'll get the following “I-MR Chart of LeadTime”:</p>
<p style="margin-left: 40px;"><img alt="I-MR Chart of Lead Time" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/4155ef20cc8e75c5d30ed8d1037c9656/image007.jpg" style="border-width: 0px; border-style: solid; width: 500px; height: 345px;" /></p>
<p>The I-MR Chart shows two signals of instability (shown as red dots), appearing on both the Individuals chart at the top of the graph and the Moving Range chart at the bottom.</p>
<p>These data points indicate abnormal variation, and their cause should be investigated. These signals could offer great insight into the problem you are trying to solve. Once you have identified and resolved the causes of these points, you can collect additional data or remove the points from the data set.</p>
<p>In this scenario, we will leave the two points in the data set.</p>
4. What Non-Normal Distribution Does the Data Best Fit?
<p>There are several non-normal data distributions that the data could fit, so we will use a tool in Minitab to show us which distribution fits the data best. Open the “Individual Distribution Identification” dialog by going to <strong>Stat > Quality Tools > Individual Distribution Identification…</strong></p>
<p>Populate “Single column:” and “Subgroup size:” as follows:</p>
<p style="margin-left: 40px;"><img alt="individual distribution identification dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6ff509b390b27bd604613923a553aa9c/image008.jpg" style="border-width: 0px; border-style: solid; width: 400px; height: 359px;" /></p>
<p>Minitab will output the four graphs shown below. Each graph includes four different distributions:</p>
<p style="margin-left: 40px;"><img alt="probability ID plots 1" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3ff912b3c896757b27193fbdb1ae9f32/image009.jpg" style="border-width: 0px; border-style: solid; width: 500px; height: 334px;" /></p>
<p style="margin-left: 40px;"><img alt="probability id plots 2" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/4d3600e9567cef0a24c393222bc0adfe/image010.jpg" style="border-width: 0px; border-style: solid; width: 500px; height: 334px;" /></p>
<p style="margin-left: 40px;"><img alt="probability id plots 3" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/82eba57cf4457de60b01898132c7a5ac/image011.jpg" style="border-width: 0px; border-style: solid; width: 500px; height: 334px;" /></p>
<p style="margin-left: 40px;"><img alt="probability id plots 4" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8e9eca8a98a451fb1f74d1401a7f593c/image012.jpg" style="border-width: 0px; border-style: solid; width: 500px; height: 334px;" /></p>
<p>Pick the distribution with the largest p-value (excluding the Johnson Transformation and the Box-Cox Transformation). In this scenario, the exponential distribution fits the data best.</p>
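<p>Minitab ranks the candidate distributions using Anderson-Darling statistics. As a rough analogue outside Minitab, the sketch below fits a few candidates with SciPy on hypothetical simulated data and compares Kolmogorov-Smirnov p-values; the p-values are optimistic because the parameters are estimated from the same sample, so treat this as a screening step only:</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
lead_time = rng.exponential(scale=8.0, size=100)  # hypothetical lead-time data

fits = {}
for name, dist in [("exponential", stats.expon),
                   ("Weibull", stats.weibull_min),
                   ("lognormal", stats.lognorm)]:
    # Fix location at zero for stability, since lead times are positive
    params = dist.fit(lead_time, floc=0)
    ks_stat, p = stats.kstest(lead_time, dist.cdf, args=params)
    fits[name] = p
    print(f"{name:12s} KS p-value = {p:.3f}")

print("best-fitting candidate:", max(fits, key=fits.get))
```

<p>With data simulated from an exponential distribution, the exponential family should rank at or near the top.</p>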
5. What Is the Process Capability?
<p>Now that we know the distribution that best fits these data, we can perform the non-normal capability analysis. In Minitab, select <strong>Stat > Quality Tools > Capability Analysis > Nonnormal…</strong></p>
<p>Populate the “Capability Analysis (Nonnormal Distribution)” dialog box as seen below. Make sure to select “Exponential” next to Fit distribution. Then click “Options”.</p>
<p style="margin-left: 40px;"><img alt="capability analysis dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/7bc6c94ade960602af1ab6fab2c70a3b/image013.jpg" style="border-width: 0px; border-style: solid; width: 400px; height: 336px;" /></p>
<p>Fill in the “Capability Analysis (Non Normal Distribution): Options” dialog box with the following:</p>
<p style="margin-left: 40px;"><img alt="capability analysis options dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9fee6e77df67ecac9358193e0bd0f2bd/image014.jpg" style="border-width: 0px; border-style: solid; width: 400px; height: 237px;" /></p>
<p>We chose “Percents” over “Parts Per Million” because in this scenario it would take years to produce one million outputs (one data point per installation).</p>
<p>Click OK in both the Options and main dialog boxes, and you should get the following “Process Capability Report for LeadTime”:</p>
<p style="margin-left: 40px;"><img alt="process capability of lead time" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2d32c03a666aa3e4b7fd613f24d4c2ce/image015.jpg" style="border-width: 0px; border-style: solid; width: 500px; height: 375px;" /></p>
<p>We interpret the results of a non-normal capability analysis just as we do an analysis done on data with a normal distribution.</p>
<p>Capability is determined by comparing the width of the process variation (VOP) to the width of the specification (VOC). We would like the process spread to be smaller than, and contained within, the specification spread.</p>
<p>That’s clearly not the case with these data.</p>
<p>The Overall Capability index on the right side of the graph depicts how the process is performing relative to the specification limits.</p>
<p>To quickly determine whether the process is capable, compare Ppk with your minimum requirement for the indices. Most quality professionals consider 1.33 to be a minimum requirement for a capable process. A value less than 1 is usually considered unacceptable.</p>
<p>With a Ppk of 0.23, it seems our IT Installation Groups have work ahead to get their process to meet customer specifications. At least these data offer a clear understanding of how much the process can be improved!</p>
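<p>For non-normal data, Minitab computes capability indices from percentiles of the fitted distribution rather than from the mean plus or minus three standard deviations. Here is a minimal sketch of that percentile method in Python with SciPy, using hypothetical exponential lead-time data and, as an assumption for illustration, only an upper spec of 30 days:</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
lead_time = rng.exponential(scale=8.0, size=100)  # hypothetical lead times (days)
usl = 30.0                                        # assumed upper spec limit

# Percentile method for an upper spec only:
#   Ppk = (USL - median) / (99.865th percentile - median)
loc, scale = stats.expon.fit(lead_time, floc=0)   # fix location at zero
median = stats.expon.ppf(0.5, loc, scale)
p99865 = stats.expon.ppf(0.99865, loc, scale)
ppk = (usl - median) / (p99865 - median)
print(f"Ppk (percentile method) = {ppk:.2f}")
```

<p>Because the 99.865th percentile of a skewed distribution sits far out in the long tail, even a spec limit well above the median can produce a low Ppk.</p>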
<p> </p>
<strong style="box-sizing: border-box; line-height: 21px;">About the Guest Blogger:</strong>
<p style="box-sizing: border-box; font-family: 'Segoe UI', Frutiger, 'Frutiger Linotype', 'Dejavu Sans', 'Helvetica Neue', Tahoma, Arial, sans-serif; line-height: 21px; color: rgb(77, 79, 81); margin: 1em 0px; font-size: 14px;"><em style="box-sizing: border-box; line-height: 21px;">Kevin Clay is a Master Black Belt and President and CEO of Six Sigma Development Solutions, Inc., certified as an Accredited Training Organization with the International Association of Six Sigma Certification (IASSC). For more information visit <a href="http://www.sixsigmadsi.com/">www.sixsigmadsi.com</a></em><em style="box-sizing: border-box; line-height: 21px;"> or contact Kevin at 866-922-6566 or <a href="mailto:kclay@sixsigmadsi.com">kclay@sixsigmadsi.com</a>.</em></p>
<p style="box-sizing: border-box; font-family: 'Segoe UI', Frutiger, 'Frutiger Linotype', 'Dejavu Sans', 'Helvetica Neue', Tahoma, Arial, sans-serif; line-height: 21px; color: rgb(77, 79, 81); margin: 1em 0px; font-size: 14px;"> </p>
<p style="box-sizing: border-box; font-family: 'Segoe UI', Frutiger, 'Frutiger Linotype', 'Dejavu Sans', 'Helvetica Neue', Tahoma, Arial, sans-serif; line-height: 21px; color: rgb(77, 79, 81); margin: 1em 0px; font-size: 14px;"><strong style="box-sizing: border-box; line-height: 21px;">Would you like to publish a guest post on the Minitab Blog? Contact <a href="mailto:publicrelations@minitab.com?subject=Guest%20Blogger" style="box-sizing: border-box; line-height: 21px; color: rgb(66, 139, 202); font-weight: normal; background: 0px 0px; text-decoration: none;">publicrelations@minitab.com</a>.</strong></p>
Capability AnalysisControl ChartsLean Six SigmaQuality ImprovementSix SigmaStatisticsFri, 31 Mar 2017 12:00:00 +0000http://blog.minitab.com/blog/statistics-in-the-field/5-simple-steps-to-conduct-capability-analysis-with-non-normal-dataGuest BloggerGauging Gage Part 1: Is 10 Parts Enough?
http://blog.minitab.com/blog/fun-with-statistics/gauging-gage-part-1-is-10-parts-enough
<p>"You take 10 parts and have 3 operators measure each 2 times."</p>
<p>This standard approach to a Gage R&R experiment is so common, so accepted, so ubiquitous that few people ever question whether it is effective. Obviously one could look at whether 3 is an adequate number of operators or 2 an adequate number of replicates, but in this first of a series of posts about "Gauging Gage," I want to look at 10. Just 10 parts. How accurately can you assess your measurement system with 10 parts?</p>
Assessing a Measurement System with 10 Parts
<p>I'm going to use a simple scenario as an example. I'm going to simulate the results of 1,000 Gage R&R Studies with the following underlying characteristics:</p>
<ol>
<li>There are no operator-to-operator differences, and no operator*part interaction.</li>
<li>The measurement system variance and part-to-part variance used would result in a %Contribution of 5.88%, between the popular guidelines of <1% is excellent and >9% is poor.</li>
</ol>
<p>So—no looking ahead here—based on my 1,000 simulated Gage studies, what do you think the distribution of %Contribution looks like across all studies? Specifically, do you think it is centered near the true value (5.88%), or do you think the distribution is skewed, and if so, how much do you think the estimates vary?</p>
<p>Go ahead and think about it...I'll just wait here for a minute.</p>
<p>Okay, ready?</p>
<p>Here is the distribution, with the guidelines and true value indicated:</p>
<p style="margin-left:40px"><img alt="PctContribution for 10 Parts" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/46889f0e-f0a5-4b4a-8a19-2d2b8dce6087/Image/af91c4815469651cc698c3aa7d980c61/histogram_of_10_pctcontribution.gif" style="height:384px; width:576px" /></p>
<p>The good news is that the distribution is roughly centered on the true value.</p>
<p>However, the distribution is highly skewed—a decent number of observations estimated %Contribution to be at least double the true value with one estimating it at about SIX times the true value! And the variation is huge. In fact, about 1 in 4 gage studies would have resulted in failing this gage.</p>
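<p>The simulation behind this histogram can be re-created in miniature. The sketch below is a hypothetical stand-in for the actual simulation engine, written in Python; it exploits the fact that with no operator effect or interaction, a parts-only variance-component estimate of %Contribution is adequate:</p>

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_study(n_parts=10, n_ops=3, n_reps=2, sd_part=1.0, sd_gage=0.25):
    # True %Contribution = 0.25**2 / (0.25**2 + 1.0**2) = 5.88%
    m = n_ops * n_reps
    data = (rng.normal(0, sd_part, n_parts)[:, None]
            + rng.normal(0, sd_gage, (n_parts, m)))
    mse = data.var(axis=1, ddof=1).mean()        # repeatability estimate
    msp = m * data.mean(axis=1).var(ddof=1)      # between-part mean square
    var_part = max((msp - mse) / m, 0.0)
    return 100 * mse / (mse + var_part)          # estimated %Contribution

estimates = np.array([simulate_study() for _ in range(1000)])
print(f"median = {np.median(estimates):.2f}%, "
      f"95% of estimates fall in ({np.percentile(estimates, 2.5):.2f}%, "
      f"{np.percentile(estimates, 97.5):.2f}%)")
```

<p>The wide interval between the 2.5th and 97.5th percentiles mirrors the skew and spread visible in the histogram above.</p>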
<p>Now a standard gage study is no small undertaking—a total of 60 data points must be collected, and once randomization and "masking" of the parts are done it can be quite tedious (and possibly annoying to the operators). So just how many parts would be needed for a more accurate assessment of %Contribution?</p>
Assessing a Measurement System with 30 Parts
<p>I repeated 1,000 simulations, this time using 30 parts (if you're keeping score, that's 180 data points). And then for kicks, I went ahead and did 100 parts (that's 600 data points). So now consider the same questions from before for these counts—mean, skewness, and variation.</p>
<p>Mean is probably easy: if it was centered before, it's probably centered still.</p>
<p>So let's really look at skewness and how much we were able to reduce variation:</p>
<p style="margin-left:40px"><img alt="10 30 100 Parts" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/46889f0e-f0a5-4b4a-8a19-2d2b8dce6087/Image/2a6885f40fda396703a0176a030ae332/histogram_of_10_30_100_parts.gif" style="height:384px; width:576px" /></p>
<p>Skewness and variation have clearly decreased, but I suspect you thought variation would have decreased more than it did. Keep in mind that %Contribution is affected by your estimates of repeatability and reproducibility as well, so you can only tighten this distribution so much by increasing the number of parts. But even using 30 parts—an enormous experiment to undertake—still results in this gage failing 7% of the time!</p>
<p>So what is a quality practitioner to do?</p>
<p>I have two recommendations for you. First, let's talk about %Process. Oftentimes the measurement system we are evaluating has been in place for some time and we are simply verifying its effectiveness. In this case, rather than relying on your small sampling of parts to estimate the overall variation, you can use the historical standard deviation as your estimate and eliminate much of the variation caused by the small sample of parts. Just enter your historical standard deviation in the Options subdialog in Minitab:</p>
<p style="margin-left:40px"><img alt="Options Subdialog" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/46889f0e-f0a5-4b4a-8a19-2d2b8dce6087/Image/cc286906ae0d7171affa92523707f722/options_dialog.png" style="height:422px; width:456px" /></p>
<p>Then your output will include an additional column of information called %Process. This column is the equivalent of the %StudyVar column, but using the historical standard deviation (which comes from a much larger sample) instead of the overall standard deviation estimated from the data collected in your experiment:</p>
<p style="margin-left:40px"><img alt="Percent Process" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/46889f0e-f0a5-4b4a-8a19-2d2b8dce6087/Image/a7c6669b4b10272aac25b420e67d561c/pctprocess_output.GIF" style="height:130px; width:462px" /></p>
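<p>The difference between %StudyVar and %Process comes down to the denominator: both divide a source's standard deviation by an estimate of overall variation, but %Process swaps in the historical standard deviation. A sketch with hypothetical numbers:</p>

```python
# Hypothetical standard deviations, chosen only to illustrate the denominators
sd_gage = 0.25         # total gage SD estimated by the study
sd_total_study = 1.05  # overall SD estimated from the 10 sampled parts
sd_historical = 1.30   # long-run process SD from historical data

pct_studyvar = 100 * sd_gage / sd_total_study  # denominator from the study itself
pct_process = 100 * sd_gage / sd_historical    # denominator from history
print(f"%StudyVar = {pct_studyvar:.1f}%   %Process = {pct_process:.1f}%")
```

<p>Because the historical standard deviation comes from far more data, %Process removes much of the part-sampling noise from the assessment.</p>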
<p>My second recommendation is to include confidence intervals in your output. This can be done in the <em>Conf Int </em>subdialog:</p>
<p style="margin-left:40px"><img alt="Conf Int subdialog" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/46889f0e-f0a5-4b4a-8a19-2d2b8dce6087/Image/f3433eb79f2aacded976c9a2d7733e00/conf_int_dialog.gif" style="height:191px; width:381px" /></p>
<p>Including confidence intervals in your output doesn't inherently improve the wide variation of estimates the standard gage study provides, but it does force you to recognize just how much uncertainty there is in your estimate. For example, consider this output from the gageaiag.mtw sample dataset in Minitab with confidence intervals turned on:</p>
<p style="margin-left:40px"><img alt="Output with CIs" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/46889f0e-f0a5-4b4a-8a19-2d2b8dce6087/Image/68bc06ad673742b3f76922b1c31d813a/output_with_cis.GIF" style="height:162px; width:520px" /></p>
<p>For some processes you might accept this gage based on the %Contribution being less than 9%. But for most processes you need to trust your data, and the 95% CI of (2.14, 66.18) is a red flag that you shouldn't be very confident you have an acceptable measurement system.</p>
<p>So the next time you run a Gage R&R Study, put some thought into how many parts you use and how confident you are in your results!</p>
<p><a href="http://blog.minitab.com/blog/fun-with-statistics/gauging-gage-part-2-are-3-operators-or-2-replicates-enough">see Part II of this series</a><br />
<a href="http://blog.minitab.com/blog/fun-with-statistics/gauging-gage-part-3-how-to-sample-parts">see Part III of this series</a></p>
AutomotiveData AnalysisGovernmentHealthcareLean Six SigmaManufacturingMedical DevicesMiningQuality ImprovementServicesSix SigmaStatisticsStatsWed, 29 Mar 2017 15:31:00 +0000http://blog.minitab.com/blog/fun-with-statistics/gauging-gage-part-1-is-10-parts-enoughJoel SmithWhat to Do When Your Data's a Mess, part 3
http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-datas-a-mess2c-part-3
<p>Everyone who analyzes data regularly has the experience of getting a worksheet that just isn't ready to use. Previously I wrote about tools you can use to <a href="http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-data-is-a-mess-part-1">clean up and eliminate clutter in your data</a> and <a href="http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-datas-a-mess2c-part-2">reorganize your data</a>. </p>
<p><span style="line-height: 1.6;">In this post, I'm going to highlight tools that help you get the most out of messy data by altering its characteristics.</span></p>
Know Your Options
<p>Many problems with data don't become obvious until you begin to analyze it. A shortcut or abbreviation that seemed to make sense while the data was being collected, for instance, might turn out to be a time-waster in the end. What if abbreviated values in the data set only make sense to the person who collected it? Or a column of numeric data accidentally gets coded as text? You can solve those problems quickly with <a href="http://www.minitab.com/products/minitab">statistical software</a> packages.</p>
Change the Type of Data You Have
<p>Here's an instance where a data entry error resulted in a column of numbers being incorrectly classified as text data. This will severely limit the types of analysis that can be performed using the data.</p>
<p><img alt="misclassified data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c45b427d3e5e2b5eac4a505ed5c3b24f/misclassified_data.png" style="width: 200px; height: 156px;" /></p>
<p>To fix this, select <strong>Data > Change Data Type</strong> and use the dialog box to choose the column you want to change.</p>
<p><img alt="change data type menu" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/46ece127300500409098383a2e476a9b/text_to_numeric_data.png" style="width: 376px; height: 175px;" /></p>
<p>One click later, and the errant text data has been converted to the desired numeric format:</p>
<p><img alt="numeric data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f1b9df0211f9085e577a41b0e3661b45/numeric_data.png" style="width: 200px; height: 156px;" /></p>
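<p>The same one-step fix exists outside Minitab. For example, in Python with pandas (the column name and values here are hypothetical):</p>

```python
import pandas as pd

# Hypothetical measurement column accidentally stored as text
df = pd.DataFrame({"Torque": ["12.1", "11.8", "12.4", "12.0"]})
print(df["Torque"].dtype)                    # object, i.e. text

df["Torque"] = pd.to_numeric(df["Torque"])   # raises if a value isn't numeric
print(df["Torque"].dtype)                    # float64, ready for analysis
```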
Make Data More Meaningful by Coding It
<p>When this company collected data on the performance of its different functions across all its locations, it used numbers to represent both locations and units. </p>
<p><img alt="uncoded data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d22a57fe9e9e398bd948e86c0adafe34/uncoded_data.png" style="width: 135px; height: 158px;" /></p>
<p>That may have been a convenient way to record the data, but unless you've memorized what each set of numbers stands for, interpreting the results of your analysis will be a confusing chore. You can make the results easy to understand and communicate by coding the data. </p>
<p>In this case, we select <strong>Data > Code > Numeric to Text...</strong></p>
<p><img alt="code data menu" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c75e46cc190497fd41b0e6736518c0fe/code_data_menu.png" style="width: 384px; height: 255px;" /></p>
<p>And we complete the dialog box as follows, telling the software to replace the numbers with more meaningful information, like the town each facility is located in. </p>
<p><img alt="Code data dialog box" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/cd75c14324187806b8f3a74a3b8996b4/code_data_dialog.png" style="width: 400px; height: 345px;" /></p>
<p>Now you have data columns that can be understood by anyone. When you create graphs and figures, they will be clearly labeled. </p>
<p><img alt="Coded data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/7ff81bdb08170d6d8a4e8547623cf557/coded_data.png" style="width: 161px; height: 200px;" /></p>
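<p>The pandas equivalent of this recoding is a dictionary lookup with <em>map</em> (the codes and town names below are hypothetical):</p>

```python
import pandas as pd

df = pd.DataFrame({"Location": [1, 2, 3, 1, 2]})           # numeric codes
labels = {1: "State College", 2: "Chicago", 3: "Phoenix"}  # hypothetical names
df["Location"] = df["Location"].map(labels)
print(df["Location"].tolist())
# ['State College', 'Chicago', 'Phoenix', 'State College', 'Chicago']
```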
Got the Time?
<p>Dates and times can be very important in looking at performance data and other indicators that might have a cyclical or time-sensitive effect. But the way the date is recorded in your data sheet might not be exactly what you need. </p>
<p>For example, if you wanted to see if the day of the week had an influence on the activities in certain divisions of your company, a list of dates in the MM/DD/YYYY format won't be very helpful. </p>
<p><img alt="date column" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f5b0dd178afbc0352f8dc2d9378e887b/date_column.png" style="width: 240px; height: 223px;" /></p>
<p>You can use <strong>Data > Date/Time > Extract to Text... </strong>to identify the day of the week for each date.</p>
<p><img alt="extract-date-to-text" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/7e6f7e8a87ee8291b9c6d51507092c19/extract_date_to_text.png" style="width: 351px; height: 132px;" /></p>
<p>Now you have a column that lists the day of the week, and you can easily use it in your analysis. </p>
<p><img alt="day column" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dede93c9621917a0cfb54beef121d4e2/day_column.png" style="width: 249px; height: 205px;" /></p>
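<p>Outside Minitab, the same extraction is a one-liner in pandas (the dates below are hypothetical):</p>

```python
import pandas as pd

df = pd.DataFrame({"Date": ["03/01/2017", "03/02/2017", "03/04/2017"]})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")  # parse MM/DD/YYYY
df["Day"] = df["Date"].dt.day_name()                        # extract weekday
print(df["Day"].tolist())  # ['Wednesday', 'Thursday', 'Saturday']
```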
Manipulating for Meaning
<p>These tools are commonly seen as a way to correct data-entry errors, but as we've seen, you can use them to make your data sets more meaningful and easier to work with.</p>
<p>There are many other tools available in Minitab's Data menu, including an array of options for arranging, combining, dividing, fine-tuning, rounding, and otherwise massaging your data to make it easier to use. Next time you've got a column of data that isn't quite what you need, try using the Data menu to get it into shape.</p>
<p> </p>
<p> </p>
Data AnalysisStatisticsStatsTue, 28 Mar 2017 12:00:00 +0000http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-datas-a-mess2c-part-3Eston Martz