Minitab | Minitab
Blog posts and articles about using Minitab software in quality improvement projects, research, and more.
http://blog.minitab.com/blog/minitab/rss
Sun, 26 Feb 2017 21:22:40 +0000
Gage Linearity and Bias: Wake Up and Smell Your Measuring System
http://blog.minitab.com/blog/statistics-and-quality-data-analysis/gage-linearity-and-bias%3A-wake-up-and-smell-your-measuring-system
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/b074146c1d3c9fde7367970e1220eb76/extra_large_coffee_cup.jpg" style="float: right; width: 250px; height: 251px; border-width: 1px; border-style: solid; margin: 10px 15px;" />Right now I’m enjoying my daily dose of morning joe. As the steam rises off the cup, the dark rich liquid triggers a powerful enzyme cascade that jump-starts my brain and central nervous system, delivering potent glints of perspicacity into the dark crevices of my still-dormant consciousness.</p>
<p>Feels good, yeah! But is it good for me? Let’s see what the studies say…</p>
<ul>
<li>Drinking more than 4 cups of coffee per day is associated with a <a href="http://www.ncbi.nlm.nih.gov/pubmed/22591295" target="_blank">higher risk of death from all causes</a></li>
<li>Drinking coffee is <em>inversely</em> associated with the mortality risk, with those drinking 4 cups a day having the <a href="http://www.ncbi.nlm.nih.gov/pubmed/25156996" target="_blank">lowest risk of death from all causes</a></li>
<li>Drinking 2 to 4 cups of coffee a day is associated with a <a href="http://www.ncbi.nlm.nih.gov/pubmed/22422331" target="_blank">higher risk of cardiovascular disease</a></li>
<li>Drinking 3.5 cups of coffee per day is associated with a <a href="http://www.ncbi.nlm.nih.gov/pubmed/24201300" target="_blank">lower risk of cardiovascular disease</a></li>
</ul>
<p>Hmm. These are just a few results from copious studies on coffee consumption. But already I'm having a hard time processing the information.</p>
<p>Maybe another cup of coffee would help. Er...uh...maybe not.</p>
The pivotal question you should ask before you perform any analysis
<p>There are a host of possible explanations that might help explain these seemingly contradictory study results.</p>
<p>Perhaps the studies utilized different study designs, different statistical methodologies, different survey techniques, different confounding variables, different clinical endpoints, or different populations. Perhaps the physiological effects of coffee are modulated by the dynamic interplay of a complex array of biomechanisms that are differently triggered in each individual based on their unique, dynamic phenotype-genotype profiles.</p>
<p>Or perhaps...just perhaps...there's something even more fundamental at play. The proverbial elephant in the room of any statistical analysis. The essential, pivotal question upon which all your results rest...</p>
<p><em>"What am I measuring? And how well am I actually measuring what I think I'm measuring?"</em></p>
Measurement system analysis helps ensure that your study isn't doomed from the start.
<p>A <span><a href="http://blog.minitab.com/blog/understanding-statistics/explaining-quality-statistics-so-my-boss-will-understand-measurement-systems-analysis-msa">measurement systems analysis (MSA)</a></span> evaluates the consistency and accuracy of a measuring system. MSA helps you determine whether you can trust your data <em>before</em> you use a statistical analysis to identify trends and patterns, test hypotheses, or make other general inferences.</p>
<p>MSA is frequently used for quality control in the manufacturing industry. In that context, the measuring system typically includes the data collection procedures, the tools and equipment used to measure (the "gage"), and the operators who measure.</p>
<p>Coffee consumption studies don't employ a conventional measuring system. Often, they rely on self-reported data from people who answer questionnaires about their life-style habits, such as "How many cups of coffee do you drink in a typical day?" So the measuring "system," loosely speaking, is every respondent who estimates the number of cups they drink. Despite this, could MSA uncover potential issues with measurements collected from such a survey? </p>
<p><strong>Caveat:</strong> What follows is an exploratory exercise performed with a small set of nonrandom data for illustrative purposes only. To see standard MSA scenarios and examples, including sample data sets, go to Minitab's <a href="http://support.minitab.com/en-us/datasets" target="_blank">online dataset library</a> and select the category <em>Measurement systems analysis</em>.</p>
<img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/a0bfc9c6ba82871132fcf40fa785416e/cups.jpg" style="width: 200px; height: 149px; float: right; border-width: 1px; border-style: solid; margin: 10px 15px;" />
Gage Linearity and Bias: "Houston, we have a problem..."
<p>For this experiment (I can't call it a study), I collected different coffee cups from the cupboard of our department lunchroom (see image at right). Then I poured different amounts of liquid into each cup and asked people to tell me how full the cup was. The actual amount of liquid was 0.50 cup, 0.75 cup, or 1 cup, as measured using a standard measuring cup.</p>
<p>To evaluate the estimated "measurements" in relation to the actual reference values, I performed a gage linearity and bias study (<strong>Stat > Quality Tools > Gage Study > Gage Linearity and Bias Study</strong>). The results are shown below.</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/206aa287a18fca42fa11e31ccce49879/gage_linearity_and_bias_for_coffee_cups.jpg" style="width: 960px; height: 720px;" /></p>
<p><strong>Note:</strong> A gage linearity and bias study evaluates whether a measurement system has bias when compared to a known standard. It also assesses linearity—the difference in average bias through the expected operating range of the measuring device. For this example, I didn't enter an estimate of process variation, so the results don't include linearity estimates.</p>
<p>The Y axis shows the amount of bias, which is the difference between the observed measurement using the gage and the reference or master value. For this study, bias is the volume reported for a coffee cup minus the actual volume measured using a standard cup. If the measurements perfectly match the reference values, the data points on the graph should fall along the line bias = 0, with a slope of 0.</p>
<p>That's obviously not the case here. The estimated measurements for all three reference values show considerable negative bias. That is, when using the coffee cups in our department lunchroom as "gages", every person's estimated measurement was much smaller than the actual amount of liquid. Not a surprise, because the coffee cups are larger than a standard cup. (There are coffee cups that hold about one standard cup, by the way, such as <a href="https://images.search.yahoo.com/images/view;_ylt=AwrB8pnRUeJUSDAAEtKJzbkF;_ylu=X3oDMTIyMWVjZG80BHNlYwNzcgRzbGsDaW1nBG9pZAM2YWI0Y2YxYTczMzhjM2Y3MTdmZWFiN2RlNjA0MjFiOQRncG9zAzUEaXQDYmluZw--?.origin=&back=https%3A%2F%2Fimages.search.yahoo.com%2Fyhs%2Fsearch%3F_adv_prop%3Dimage%26va%3Dteema%2Bcoffee%2Bcup%2B0%252C22%2Bl%26fr%3Dyhs-mozilla-001%26hsimp%3Dyhs-001%26hspart%3Dmozilla%26tab%3Dorganic%26ri%3D5&w=450&h=450&imgurl=media-cache-ec0.pinimg.com%2F736x%2F3d%2Fed%2F13%2F3ded134470900ba66c746161838bcbc0.jpg&rurl=http%3A%2F%2Fpinterest.com%2Fpin%2F228346643577633964%2F&size=+9.9KB&name=%3Cb%3ETeema%3C%2Fb%3E+%3Cb%3ECoffee%3C%2Fb%3E+%3Cb%3ECup%3C%2Fb%3E+-+Kaj+Franck+-+Iittala+-+RoyalDesign.com&p=teema+coffee+cup+0%2C22+l&oid=6ab4cf1a7338c3f717feab7de60421b9&fr2=&fr=yhs-mozilla-001&tt=%3Cb%3ETeema%3C%2Fb%3E+%3Cb%3ECoffee%3C%2Fb%3E+%3Cb%3ECup%3C%2Fb%3E+-+Kaj+Franck+-+Iittala+-+RoyalDesign.com&b=0&ni=336&no=5&ts=&tab=organic&sigr=11cjp5aa4&sigb=14o2u26h3&sigi=12djkvq95&sigt=12elhiqco&sign=12elhiqco&.crumb=NFCYiF44SGZ&fr=yhs-mozilla-001&hsimp=yhs-001&hspart=mozilla" target="_blank">the cup that I use every morning</a>. But most Americans don't drink from coffee cups this small. It was designed back in the '50s, when most things—houses, grocery carts, cheeseburgers—were made in more modest proportions).</p>
<p>The Gage Bias table shows that the average bias increases as the amount of liquid increases. And even though this was a small sample, the bias was statistically significant (P < 0.001). Importantly, notice that the bias wasn't consistent at each reference value—there is a considerable range of bias among the estimates at each reference value.</p>
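<p>To make the bias calculation concrete, here is a minimal Python sketch of the same idea: bias is each estimate minus its reference value, averaged per reference value, and the slope of bias against reference value hints at linearity. The data below are invented for illustration and are not the measurements from this experiment; the function names are hypothetical, and this is not how Minitab computes its output.</p>

```python
from statistics import mean

# Hypothetical data: (reference volume in cups, estimated volume in cups).
# These numbers are invented for illustration; they are NOT the post's data.
measurements = [
    (0.50, 0.35), (0.50, 0.25), (0.50, 0.40),
    (0.75, 0.50), (0.75, 0.40), (0.75, 0.60),
    (1.00, 0.60), (1.00, 0.50), (1.00, 0.75),
]


def average_bias_by_reference(data):
    """Return {reference value: mean(observed - reference)}."""
    grouped = {}
    for reference, observed in data:
        grouped.setdefault(reference, []).append(observed - reference)
    return {ref: mean(biases) for ref, biases in sorted(grouped.items())}


def bias_slope(data):
    """Least-squares slope of bias versus reference value."""
    refs = [ref for ref, _ in data]
    biases = [obs - ref for ref, obs in data]
    ref_mean, bias_mean = mean(refs), mean(biases)
    sxy = sum((r - ref_mean) * (b - bias_mean) for r, b in zip(refs, biases))
    sxx = sum((r - ref_mean) ** 2 for r in refs)
    return sxy / sxx


avg_bias = average_bias_by_reference(measurements)
slope = bias_slope(measurements)

for ref, bias in avg_bias.items():
    print(f"reference {ref:.2f} cup: average bias {bias:+.3f} cup")
print(f"slope of bias vs. reference: {slope:+.3f}")
```

<p>A slope near 0 would suggest the bias is about the same across the measuring range; a slope away from 0 suggests the bias changes with the size of what is being measured.</p>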
<p>Despite its obvious limitations, this informal, exploratory analysis provides some grounds for speculation.</p>
<p>What does "one cup of coffee" actually mean in studies that use self-reported data? What about categories such as 1-2 cups, or 2-4 cups? If it's not clear what x cups of coffee actually refers to, what do we make of risk estimates that are specifically associated with x number of cups of coffee? Or meta-analyses that combine self-reported coffee consumption data from different countries (equating one Japanese "cup of coffee", say, with one Australian "cup of coffee"?)</p>
<p>Of course, perfect data sets don't exist. And it's possible that some studies may manage to identify valid overall trends and correlations associated with increasing/decreasing coffee consumption.</p>
<p>Still, let's just say that a self-reported "cup of coffee" might best be served not with cream and sugar, but with a large grain of salt.</p>
So before you start brewing your data...
<p>And before you rush off to calculate p-values...it's worth taking the extra time and effort to make sure that you're actually measuring what you <em>think</em> you're measuring.</p>
Fun Statistics, Statistics, Statistics Help, Stats
Fri, 24 Feb 2017 13:14:00 +0000
Patrick Runkel
Three Common P-Value Mistakes You'll Never Have to Make
http://blog.minitab.com/blog/understanding-statistics/three-common-p-value-mistakes-youll-never-have-to-make
<p>Statistics can be challenging, especially if you're not analyzing data and interpreting the results every day. <a href="http://www.minitab.com/products/minitab/" title="statistical software for analyzing quality data">Statistical software</a> makes things easier by handling the arduous mathematical work involved in statistics. But ultimately, we're responsible for correctly interpreting and communicating what the results of our analyses show.</p>
<p>The p-value is probably the most frequently cited statistic. We use p-values to interpret the results of regression analysis, hypothesis tests, and many other methods. Every introductory statistics student and every Lean Six Sigma Green Belt learns about p-values. </p>
<p>Yet this common statistic is misinterpreted so often that at least one scientific journal has abandoned its use.</p>
What Does a P-value Tell You?
<p>Typically, a p-value is defined as "the probability of observing an effect at least as extreme as the one in your sample data—<em>if the <span><a href="http://blog.minitab.com/blog/understanding-statistics/why-shrewd-experts-fail-to-reject-the-null-every-time">null hypothesis</a></span> is true</em>." Thus, the only question a p-value can answer is this one:</p>
<p><em>How likely is it that I would get the data I have, assuming the null hypothesis is true?</em></p>
<p>If your p-value is less than your selected <span><a href="http://blog.minitab.com/blog/adventures-in-statistics-2/understanding-hypothesis-tests%3A-significance-levels-alpha-and-p-values-in-statistics">alpha level</a></span> (typically 0.05), you <em>reject the null hypothesis</em> in favor of the alternative hypothesis. If the p-value is above your alpha value, you <em>fail to reject</em> the null hypothesis. It's important to note that the null hypothesis is never accepted; we can only <em>reject </em>or <em>fail to reject</em> it. </p>
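<p>One way to see what "assuming the null hypothesis is true" means is a permutation test, sketched below in Python. If the two groups really have the same mean, the group labels are interchangeable, so we can shuffle them and count how often a difference at least as large as the observed one turns up. This is a swapped-in technique for illustration, not the t-test Minitab runs, and the weights and function names are hypothetical.</p>

```python
import random
from statistics import mean


def permutation_p_value(sample_a, sample_b, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the labels carry no information,
        # so any reshuffling of them is equally plausible.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject only when p is below alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


# Hypothetical box weights in ounces, invented for illustration.
line_a = [14.1, 14.3, 13.9, 14.4, 14.2, 14.5, 14.0, 14.3]
line_b = [13.8, 13.9, 14.0, 13.7, 14.1, 13.6, 13.9, 13.8]

p = permutation_p_value(line_a, line_b)
print(p, decide(p))
```

<p>Note that <code>decide</code> never returns "accept the null hypothesis"; the only two outcomes are rejecting or failing to reject.</p>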
The P-Value in a 2-Sample t-Test
<p>Consider a typical hypothesis test—say, a 2-sample t-test of the mean weight of boxes of cereal filled at different facilities. We collect and weigh 50 boxes from each facility to confirm that the mean weight for each line's boxes is the listed package weight of 14 oz. </p>
<p>Our null hypothesis is that the two means are equal. Our alternative hypothesis is that they are <em>not </em>equal. </p>
<p>To run this test in Minitab, we enter our data in a worksheet and select <strong>Stat > Basic Statistics > 2-Sample T-test</strong>. If you'd like to follow along, you can download the <a href="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2edc594cf40ec4931e5cd0021df6703e/cereal_weight.mtw">data</a> and, if you don't already have it, get the <a href="http://www.minitab.com/products/minitab/free-trial/">30-day trial of Minitab</a>. In the t-test dialog box, select<em> Both samples are in one column</em> from the drop-down menu, and choose "Weight" for Samples, and "Facility" for Sample IDs.</p>
<p style="margin-left: 40px;"><img alt="t test for the mean" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1a090752bef395f3b227511c6e57946d/dialog.png" style="width: 424px; height: 296px;" /></p>
<p>Minitab gives us the following output, and I've highlighted the p-value for the hypothesis test:</p>
<p style="margin-left: 40px;"><img alt="t-test output" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3b27f14d1859460a1875c81384c52ccb/t_test_output.png" style="width: 544px; height: 222px;" /></p>
<p>So we have a p-value of 0.029, which is less than our selected alpha value of 0.05. Therefore, we reject the null hypothesis that the means of Line A and Line B are equal. Note also that while the evidence indicates the means are different, that difference is estimated at 0.338 oz—a pretty small amount of cereal. </p>
<p>So far, so good. But this is the point at which trouble often starts.</p>
Three Frequent Misstatements about P-Values
<p>The p-value of 0.029 means we reject the null hypothesis that the means are equal. But that doesn't mean any of the following statements are accurate:</p>
<ol>
<li><strong>"There is 2.9% probability the means are the same, and 97.1% probability they are different." </strong><br />
We don't know that at all. The p-value only says that <strong><em>if </em></strong>the null hypothesis is true, the sample data collected would exhibit a difference this large or larger only 2.9% of the time. Remember that the p-value doesn't tell you anything <em>directly </em>about what you've seen. Instead, it tells you the <em>odds </em>of seeing it. </li>
<br />
<li><strong>"The p-value is low, which indicates there's an important difference in the means." </strong><br />
Based on the 0.029 p-value shown above, we can conclude that a statistically significant difference between the means exists. But the estimated size of that difference is less than a half-ounce, and won't matter to customers. A p-value may indicate a difference exists, but it tells you nothing about its practical impact.</li>
<br />
<li><strong>"The low p-value shows the alternative hypothesis is true."</strong><br />
A low p-value provides statistical evidence to reject the null hypothesis—but that doesn't prove the truth of the alternative hypothesis. If your alpha level is 0.05 and the null hypothesis is actually true, there's a 5% chance you will incorrectly reject it. Or to put it another way, if a jury fails to convict a defendant, it doesn't prove the defendant is <em>innocent</em>: it only means the prosecution failed to prove the defendant's guilt beyond a reasonable doubt. </li>
</ol>
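<p>The link between the alpha level and false rejections can be checked with a small simulation. The Python sketch below repeatedly draws two samples from the <em>same</em> population, so the null hypothesis is true by construction, and counts how often the test rejects anyway. To stay within the standard library it uses a z-test with a known standard deviation in place of a t-test; the mean of 14 oz, the standard deviation of 1 oz, and the function names are all assumptions for illustration.</p>

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(sample_a, sample_b, sigma):
    """Two-sided p-value for equal means, assuming a known common sigma."""
    n_a, n_b = len(sample_a), len(sample_b)
    standard_error = sigma * (1 / n_a + 1 / n_b) ** 0.5
    z = (mean(sample_a) - mean(sample_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_rejection_rate(n_trials=2000, n=50, alpha=0.05, seed=1):
    """Fraction of trials that reject a null hypothesis that is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        # Both samples come from the same population: mean 14, sd 1.
        a = [rng.gauss(14.0, 1.0) for _ in range(n)]
        b = [rng.gauss(14.0, 1.0) for _ in range(n)]
        if z_test_p_value(a, b, sigma=1.0) < alpha:
            rejections += 1
    return rejections / n_trials


rate = false_rejection_rate()
print(f"rejected a true null hypothesis in {rate:.1%} of trials")
```

<p>The rate should land near the alpha level. It describes how often the procedure errs when the null hypothesis is true; it says nothing about the probability that the null hypothesis is true for any particular data set.</p>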
<p>These misinterpretations happen frequently enough to be a concern, but that doesn't mean that we shouldn't use p-values to help interpret data. The p-value remains a very useful tool, as long as we're interpreting and communicating its significance accurately.</p>
P-Value Results in Plain Language
<p>It's one thing to keep all of this straight if you're doing data analysis and statistics all the time. It's another thing if you only analyze data occasionally, and need to do many other things in between—like most of us. "Use it or lose it" is certainly true about statistical knowledge, which could well be another factor that contributes to misinterpreted p-values. </p>
<p>If you're leery of that happening to you, a good way to avoid that possibility is to use the Assistant in Minitab to perform your analyses. If you haven't used it yet, the Assistant menu guides you through your analysis from start to finish. The dialog boxes and output are all in plain language, so it's easy to figure out what you need to do and what the results mean, even if it's been a while since your last analysis. (But even expert statisticians tell us they like using the Assistant because the output is so clear and easy to understand, regardless of an audience's statistical background.) </p>
<p>So let's redo the analysis above using the Assistant, to see what that output looks like and how it can help you avoid misinterpreting your results—or having them be misunderstood by others!</p>
<p>Start by selecting <strong>Assistant > Hypothesis Test...</strong> from the Minitab menu. Note that a window pops up to explain exactly what a hypothesis test does. </p>
<p style="margin-left: 40px;"><img alt="assistant hypothesis test" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f26601f26db3576a7cf2b5bc3178f9ca/assistant_hypothesis_test.png" style="width: 420px; height: 252px;" /></p>
<p>The Assistant asks what we're trying to do, and gives us three options to choose from.</p>
<p style="margin-left: 40px;"><img alt="hypothesis test chooser" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fba2ee28b10063e1c5f0f00eb77db1b2/assistant_hypothesis_test_chooser.png" style="width: 600px; height: 472px;" /></p>
<p>We know we want to compare a sample from Line A with a sample from Line B, but what if we can't remember which of the 5 available tests is the appropriate one in this situation? We can get guidance by clicking "Help Me Choose."</p>
<p style="margin-left: 40px;"><img alt="help me choose the right hypothesis test" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/51bb23fbb44603efff50fe4fa1d9dbd1/assistant_hypothesis_test_decision_tree.png" style="width: 700px; height: 551px;" /></p>
<p>The choices on the diagram direct us to the appropriate test. In this case, we choose continuous data instead of attribute (and even if we'd forgotten the difference, clicking on the diamond would explain it). We're comparing two means instead of two standard deviations, and we're measuring two different sets of items since our boxes came from different production lines. </p>
<p>Now we know what test to use, but suppose you want to make sure you don't miss anything that's important about the test, like requirements that must be met? Click the "more..." link and you'll get those details. </p>
<p style="margin-left: 40px;"><img alt="more info about the 2-Sample t-Test" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1b4f09a2438b0aaef14e8da6564524cf/assistant_hypothesis_test_more_info.png" style="width: 700px; height: 526px;" /></p>
<p>Now we can proceed to the Assistant's dialog box. Again, statistical jargon is minimized and everything is put in straightforward language. We just need to answer a few questions, as shown. Note that the Assistant even lets us tell it how big a difference needs to be for us to consider it practically important. In this case, we'll enter 2 ounces.</p>
<p style="margin-left: 40px;"><img alt="Assistant 2-sample t-Test dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/994d9172bf788282258f765d4d08aefa/assistant_hypothesis_test_dialog.png" style="width: 641px; height: 495px;" /></p>
<p>When we press OK, the Assistant performs the t-test and delivers three reports. The first of these is a summary report, which includes summary statistics, confidence intervals, histograms of both samples, and more. And interpreting the results couldn't be more straightforward than what we see in the top left quadrant of the diagram. In response to the question, "Do the means differ?" we can see that p-value of 0.029 marked on the bar, very far toward the "Yes" end of the scale. </p>
<p style="margin-left: 40px;"><img alt="2-Sample t-Test summary report" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8927b8bc833551678715f68149dd18ad/assistant_hypothesis_test_summary.png" style="width: 700px; height: 526px;" /></p>
<p>Next is the Diagnostic Report, which provides additional information about the test. </p>
<p style="margin-left: 40px;"><img alt="2-Sample t-Test diagnostic report" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6467a0be0ba60329f2be282e14b9be33/assistant_hypothesis_test_diagnostic.png" style="width: 700px; height: 526px;" /></p>
<p>In addition to letting us check for outliers, the diagnostic report shows us the size of the observed difference, as well as the chances that our test could detect a practically significant difference of 2 oz. </p>
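<p>The "chances that our test could detect" a given difference is the test's power. As a rough illustration, the Python sketch below approximates power for a two-sided, two-sample comparison using the normal distribution. It is not the calculation the Assistant performs; the standard deviation of 1 oz is an assumed value, and the function name is hypothetical.</p>

```python
from statistics import NormalDist


def approximate_power(difference, sigma, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test.

    Assumes both groups share a known standard deviation `sigma`
    and have `n_per_group` observations each.
    """
    standard_normal = NormalDist()
    standard_error = sigma * (2 / n_per_group) ** 0.5
    critical_z = standard_normal.inv_cdf(1 - alpha / 2)
    shift = abs(difference) / standard_error
    # Probability that the test statistic falls in either rejection region.
    return (standard_normal.cdf(shift - critical_z)
            + standard_normal.cdf(-shift - critical_z))


# 50 boxes per facility as in the example; sigma = 1 oz is an assumption.
power_for_2_oz = approximate_power(difference=2.0, sigma=1.0, n_per_group=50)
power_for_small = approximate_power(difference=0.2, sigma=1.0, n_per_group=50)
print(f"power to detect 2 oz:   {power_for_2_oz:.3f}")
print(f"power to detect 0.2 oz: {power_for_small:.3f}")
```

<p>Under these assumptions a 2 oz difference would almost never be missed, while a much smaller difference often would be. That is the reason to decide on a practically important difference before looking at the p-value.</p>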
<p>The final piece of output the Assistant provides is the report card, which flags any problems or concerns about the test that we would need to be aware of. In this case, all of the boxes are green and checked (instead of red and x'ed). </p>
<p style="margin-left: 40px;"><img alt="2-Sample t-Test report card" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0e4cd0dce832a8251701f8175de9a037/assistant_hypothesis_test_report_card.png" style="width: 700px; height: 526px;" /></p>
<p>When you're not doing statistics all the time, the Assistant makes it a breeze to find the right analysis for your situation and to make sure you interpret your results the right way. Using it is a great way to make sure you're not attaching too much, or too little, importance to the results of your analyses.</p>
Hypothesis Testing, Statistics, Statistics Help, Stats
Wed, 22 Feb 2017 14:00:00 +0000
Eston Martz
Using Designed Experiments (DOE) to Minimize Moisture Loss
http://blog.minitab.com/blog/marilyn-wheatleys-blog/using-designed-experiments-doe-to-minimize-moisture-loss
<p><img alt="cake!" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3a2e0cd50ad9f291b12ae2c44ad20740/image001.png" style="margin: 10px 15px; float: right; width: 259px; height: 221px; border-width: 1px; border-style: solid;" /></p>
<p>As a person who loves baking (and eating) cakes, I find it bothersome to go through all the effort of baking a cake when the end result is too dry for my taste. For that reason, I decided to use a designed experiment in Minitab to help me reduce the moisture loss in baked chocolate cakes, and find the optimal settings of my input factors to produce a moist baked chocolate cake. I’ll share the details of the design and the results in this post.</p>
Choosing Input Factors for the Designed Experiment
<p>Because I like to use premixed chocolate cake mixes, I decided to use two of my favorite cake mix brands for the experiment. For the purpose of this post, I’ll call the brands A and B. Thinking about what could impact the loss of moisture, it is likely that the baking time and the oven temperature will affect the results. Therefore, the factors or inputs that I decided to use for the experiment are:</p>
<ol>
<li>Cake mix brand: A or B (categorical data)</li>
<li>Oven temperature: 350 or 380 degrees Fahrenheit (continuous data)</li>
<li>Baking time: 38 or 46 minutes (continuous data)</li>
</ol>
Measuring the Response
<p>Next, I needed a way to measure the moisture loss. For this experiment, I used an electronic food scale to weigh each cake (in the same baking pan) before and after baking, and then used those weights in conjunction with the formula below to calculate the percent of moisture lost for each cake:</p>
<p style="margin-left: 40px;">% Moisture Loss = 100 × (initial weight – final weight) / initial weight</p>
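<p>As a quick check of the formula, here is a minimal Python sketch. The function name and the example weights (in grams) are hypothetical, not values from the experiment.</p>

```python
def percent_moisture_loss(initial_weight, final_weight):
    """Percent of the pre-bake weight lost during baking.

    Both weights must be in the same units and include the same pan.
    """
    if initial_weight <= 0:
        raise ValueError("initial_weight must be positive")
    return 100 * (initial_weight - final_weight) / initial_weight


# Hypothetical example: 1000 g before baking, 915 g after.
loss = percent_moisture_loss(1000.0, 915.0)
print(f"{loss:.1f}% moisture loss")
```

<p>Because the pan is weighed with the cake both times, its weight cancels in the numerator but not in the denominator, so using the same pan for every cake keeps the results comparable.</p>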
Designing the Experiment
<p>For this experiment, I decided to construct a 2<sup>3</sup> full factorial design with <span><a href="http://blog.minitab.com/blog/statistics-and-quality-improvement/gummi-bear-doe-what-do-the-center-points-show">center points</a></span> to detect any possible curvature in the response surface. Since the cake mix brand is categorical and therefore has no center point between brand A and brand B, the number of center points will be doubled for that factor. Because of this, I’d have to bake 10 cakes which, even for me, is too many in a single day. Therefore, I decided to run the experiment over two days. Because differences between the days on which the data was collected could potentially introduce additional variation, I decided to add a block to the design to account for any potential variation due to the day. </p>
<p>To create my design in Minitab, I use <strong>Stat</strong> > <strong>DOE</strong> > <strong>Factorial </strong>> <strong>Create Factorial Design</strong>:</p>
<p style="margin-left: 40px;"><img alt="select create factorial design" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0d1ac44b06547b9f76c2e99c595ae00f/image002.png" style="width: 626px; height: 380px;" /></p>
<p>Minitab 17 makes it easy to enter the details of the design. First, I selected 3 as the number of factors:</p>
<p style="margin-left: 40px;"><img alt="select three factors" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1777d43538f54cb016bf57b0344a77d1/image5.png" style="width: 437px; height: 357px;" /></p>
<p>Next, I clicked on the <strong>Designs</strong> button above. In the Designs window, I can tell Minitab what type of design I’d like to use with my 3 factors:</p>
<p style="margin-left: 40px;"><img alt="select type of design" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1805bac7a3e385c09d2826f1d09b8018/image009.png" style="width: 411px; height: 343px;" /></p>
<p>In the window above, I’ve selected a full 2<sup>3</sup> design, and also added 2 blocks (to account for variation between days), and 1 center point per block. After making the selections and clicking <strong>OK</strong> in the above window, I clicked on the <strong>Factors</strong> button in the main window to enter the details about each of my factors:</p>
<p style="margin-left: 40px;"><img alt="factors" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b7aa416129f0f8aba1554d5cb6b8370f/image012.jpg" style="width: 448px; height: 309px;" /></p>
<p>Because center points are doubled for categorical factors, and because this design has two blocks, the final design will have a total of 4 center points. After clicking <strong>OK</strong> in the window above, I ended up with the design shown below with 12 runs:</p>
<p style="margin-left: 40px;"><img alt="design data" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/72a0fb4471ee598062521446134d3ea9/image013.png" style="width: 624px; height: 385px;" /></p>
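<p>The run count can be reproduced with a short Python sketch: 8 corner runs from the 2<sup>3</sup> factorial, plus 1 center point per block for each brand. The block assignment below splits the corner runs by the sign of the three-factor interaction, which is a common textbook choice; I'm assuming it here rather than reading it from Minitab's output, and the run order is not randomized as it would be in practice.</p>

```python
from itertools import product

BRANDS = ("A", "B")   # categorical factor
TEMPS = (350, 380)    # oven temperature, degrees Fahrenheit
TIMES = (38, 46)      # baking time, minutes


def build_design():
    """Return a list of runs for a blocked 2^3 design with center points."""
    runs = []
    # Corner points: every combination of low (-1) and high (+1) levels.
    for brand_code, temp_code, time_code in product((-1, 1), repeat=3):
        # Assumed blocking rule: split on the sign of the ABC interaction.
        block = 1 if brand_code * temp_code * time_code == -1 else 2
        runs.append({
            "block": block,
            "point_type": "corner",
            "brand": BRANDS[(brand_code + 1) // 2],
            "temp_f": TEMPS[(temp_code + 1) // 2],
            "time_min": TIMES[(time_code + 1) // 2],
        })
    # Center points: brand has no midpoint, so each block gets one
    # center point per brand at the middle temperature and time.
    for block in (1, 2):
        for brand in BRANDS:
            runs.append({
                "block": block,
                "point_type": "center",
                "brand": brand,
                "temp_f": sum(TEMPS) / 2,
                "time_min": sum(TIMES) / 2,
            })
    return runs


design = build_design()
for run in sorted(design, key=lambda r: r["block"]):
    print(run)
```

<p>Each block ends up with 4 corner runs and 2 center points, so each baking day covers 6 cakes.</p>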
Performing the Experiment and Analyzing the Data
<p>After spending an entire weekend baking cakes and calculating the moisture loss for each one, I entered the data into Minitab for the analysis. I also brought in a lot of cake to share with my colleagues at Minitab!</p>
<p style="margin-left: 40px;"><img alt="data" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/62abd4151d7a51f0b75a78a0fc539686/image015.jpg" style="width: 115px; height: 397px;" /></p>
<p>With the moisture loss for each of my 12 cakes recorded in column C8 in the experiment worksheet, I’m ready to analyze the results. </p>
<p>In Minitab, I used <strong>Stat</strong> > <strong>DOE</strong> > <strong>Factorial </strong>> <strong>Analyze Factorial Design... </strong>and then entered the Moisture Loss column in the <strong>Responses</strong> field:</p>
<p style="margin-left: 40px;"><img alt="Analyze factorial DOE" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b98a37790ba3d7fba3ca589323d05546/image016.png" style="width: 588px; height: 395px;" /></p>
<p>In the window above, I also clicked on <strong>Terms </strong>to make sure I’m only including the main effects and two-way interactions. After clicking <strong>OK</strong> in each window, Minitab produced a Pareto chart of the standardized effects that I could use to reduce my model:</p>
<p style="margin-left: 40px;"><img alt="pareto of standardized effects" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1effe05221099d792d412b813f7a8fc7/image018.jpg" style="width: 624px; height: 408px;" /></p>
<p>I can see from the above graph that the main effects (A, B and C) all significantly impact the moisture of the cake, since the bars that represent those terms on the graph extend beyond the red vertical reference line. None of the two-way interactions (AB, AC and BC) is significant.</p>
<p>I can also see the same information in the ANOVA table in Minitab’s session window:</p>
<p style="margin-left: 40px;"><img alt="ANOVA results" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6ea3e4f4ef27464ff0345d2cb066f862/image020.jpg" style="width: 624px; height: 264px;" /></p>
<p>In the above ANOVA table, we can see that the cake mix brand, oven temp, and baking time are all significant since their p-values are lower than my alpha of 0.05. </p>
<p>We can also see that all of the 2-way interactions have p-values higher than 0.05, so I’ll conclude that those interactions are not significant and should be removed from the model.</p>
<p>Interestingly, the p-value for the <a href="http://blog.minitab.com/blog/statistics-and-quality-improvement/gummi-bear-doe-what-do-the-blocks-mean">blocks</a> <strong><em>is</em></strong> significant (with a p-value of 0.01). This indicates that there was indeed a difference between the two days on which the data was collected, and that the difference affected the results. I'm glad I accounted for that additional variation by including a block in my design!</p>
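<p>For readers curious about what sits behind the bars on the Pareto chart, a main effect in a two-level design is the average response at a factor's high level minus the average at its low level. The Python sketch below computes that for the 8 corner runs only; it ignores blocks and center points, does not compute the standardized effects or p-values Minitab reports, and uses invented moisture-loss values.</p>

```python
from itertools import product
from statistics import mean

FACTORS = ("brand", "temp", "time")

# Coded corner runs in the order (brand, temp, time); -1 = low, +1 = high.
coded_runs = list(product((-1, 1), repeat=3))

# Hypothetical % moisture loss for each run, invented for illustration.
responses = [6.0, 7.1, 7.0, 8.2, 6.9, 8.0, 8.1, 9.0]


def main_effect(runs, y, factor_index):
    """Mean response at the high level minus mean response at the low level."""
    high = [yi for run, yi in zip(runs, y) if run[factor_index] == 1]
    low = [yi for run, yi in zip(runs, y) if run[factor_index] == -1]
    return mean(high) - mean(low)


effects = {
    name: main_effect(coded_runs, responses, index)
    for index, name in enumerate(FACTORS)
}
for name, effect in effects.items():
    print(f"{name}: {effect:+.3f}")
```

<p>A positive effect here means moisture loss goes up when the factor moves from its low level to its high level, so the low level would be preferred for a moister cake.</p>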
Analyzing the Reduced Model
<p>To analyze my reduced model, I can go back to <strong>Stat</strong> > <strong>DOE</strong> > <strong>Factorial </strong>> <strong>Analyze Factorial Design</strong>. This time when I click the <strong>Terms</strong> button I’ll keep only the main effects, and remove the two-way interactions. Minitab displays the following ANOVA table for the reduced model:</p>
<p style="margin-left: 40px;"><img alt="ANOVA for reduced model" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/534952331cb248e0bcfe5c0a4b809fe8/image021.png" style="width: 604px; height: 334px;" /></p>
<p>The table shows that all the terms I’ve included (mix brand, oven temp, and baking time) are significant since all the p-values for these terms are lower than 0.05. We can also see that the <a href="http://blog.minitab.com/blog/michelle-paret/doe-center-points-what-they-are-why-theyre-useful">test for curvature based on the center points</a> is not significant (p-value = 0.587), so there is no evidence of curvature and a linear relationship between the three factors and moisture loss appears adequate.</p>
<p>The <a href="http://blog.minitab.com/blog/adventures-in-statistics-2/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit">r-squared</a>, r-squared adjusted, and r-squared predicted are all quite high, so this model seems to be a very good fit to the data.</p>
Checking the Residuals
<p>Now I can take a look at the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/regression-and-correlation/residuals-and-residual-plots/residual-plots-in-minitab/">residual plots to make sure all the model assumptions</a> for my model have been met:</p>
<p style="margin-left: 40px;"><img alt="residual plots" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f2bc2c1186031a146fe223d2de5bec60/image023.png" style="border-width: 0px; border-style: solid; width: 576px; height: 384px;" /></p>
<p>The residuals in the graph above appear to be normally distributed. The residuals versus fits graph appears to show the points are randomly scattered above and below 0 (which indicates constant variance), and the residuals versus order graph doesn’t suggest any patterns that could be due to the order in which the data was collected. </p>
<p>Now that I'm confident the assumptions for the model have been met, I’ll use this model to determine the optimal settings of my factors so that going forward <strong><em>all</em></strong> the cakes I make will be moist and fabulous! </p>
Optimizing the Response
<p>I can use Minitab’s <strong>Response Optimizer</strong> and my model to tell me exactly what combination of cake mix brand, oven temperature, and baking time I’ll want to use to get the moistest cake. I select <strong>Stat</strong> > <strong>DOE</strong> > <strong>Factorial</strong> > <strong>Response Optimizer</strong>:</p>
<p style="margin-left: 40px;"><img alt="response optimizer dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2ab9d90741212f95eee4bed85829adab/image024.png" style="border-width: 1px; border-style: solid; width: 581px; height: 485px;" /> </p>
<p>In the above window, I can tell Minitab what my goal is. In this case, I want to know what input settings to use so that the moisture loss will be <em>minimized</em>. Therefore, I choose <strong>Minimize</strong> above and then click <strong>OK</strong>:</p>
<p style="margin-left: 40px;"><img alt="response optimizer" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c342bff7acfff4e9217b113ce4459acd/image026.png" style="border-width: 0px; border-style: solid; width: 576px; height: 384px;" /></p>
<p>In the above graph, the optimal settings for my factors are marked in red near the top. Using the model that I’ve fit to my data, Minitab is telling me that I can use <strong>Brand B</strong> with an oven temperature of <strong>350</strong> and a baking time of <strong>38</strong> minutes to minimize the moisture loss. Using those values for the inputs, I can expect the moisture loss will be approximately <strong>3.3034</strong>, which is quite low compared to the moisture loss for the cakes collected as part of the experiment.</p>
<p>Success! Now I can use these optimal settings, and I’ll never waste my time baking a dry cake again.</p>
<p>If you’ve enjoyed this post about DOE, you may also like to read some of our <a href="http://blog.minitab.com/?blog_id=41ce27e8-906e-4260-8bd9-178236c098c4&search_terms=DOE&button-submit.x=0&button-submit.y=0&button-submit=Submit">other DOE blog posts</a>.</p>
Design of Experiments | Fun Statistics | Statistics
Mon, 20 Feb 2017 13:00:00 +0000
http://blog.minitab.com/blog/marilyn-wheatleys-blog/using-designed-experiments-doe-to-minimize-moisture-loss
Marilyn Wheatley
Chi-Square Analysis: Powerful, Versatile, Statistically Objective
http://blog.minitab.com/blog/michelle-paret/chi-square-analysis-powerful-versatile-statistically-objective
<p style="line-height: 20.7999992370605px;">To make objective decisions about the processes that are critical to your organization, you often need to examine categorical data. You may know how to use a t-test or ANOVA when you’re comparing measurement data (like weight, length, <span style="line-height: 1.6;">revenue, </span><span style="line-height: 1.6;">and so on), but do you know how to compare attribute or counts data? It’s easy to do with <a href="http://www.minitab.com/products/minitab">statistical software</a> like Minitab. </span></p>
<p style="line-height: 20.7999992370605px;"><img alt="failures per production line" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/19b2bd8557279d21284a23e2174fef88/chisquare_onevariable_revision.jpg" style="line-height: 20.8px; width: 400px; height: 267px; float: right; margin: 10px 15px;" /></p>
<p style="line-height: 20.7999992370605px;">One person may look at this bar chart and decide that the production lines performed similarly<span style="line-height: 1.6;">. But another person may focus on the small difference between the bars and decide that one of the lines has outperformed the others. Without an appropriate statistical analysis, how can you know which person is right?</span></p>
<p style="line-height: 20.7999992370605px;">When time, money, and quality depend on your answers, you can’t rely on subjective visual assessments alone. To answer questions like these with statistical objectivity, you can use a Chi-Square analysis.</p>
Which Analysis Is Right for Me?
<p style="line-height: 20.7999992370605px;">Minitab offers three Chi-Square tests. The appropriate analysis depends on the number of variables that you want to examine. And for all three options, the data can be formatted either as raw data or summarized counts.</p>
<strong>Chi-Square Goodness-of-Fit Test – 1 Variable</strong>
<p style="line-height: 20.7999992370605px;">Use Minitab’s <strong>Stat > Tables > Chi-Square Goodness-of-Fit Test (One Variable)</strong> when you have just one variable.</p>
<p style="line-height: 20.7999992370605px;">The Chi-Square Goodness-of-Fit Test can test if the proportions for all groups are equal. It can also be used to test if the proportions for groups are equal to specific values. For example:</p>
<ul style="line-height: 20.7999992370605px;">
<li>A bottle cap manufacturer operates three production lines and records the number of defective caps for each line. The manufacturer uses the <strong>Chi-Square Goodness-of-Fit Test</strong> to determine if the proportion of defectives is equal across all three lines.</li>
<li>A bottle cap manufacturer operates three production lines and records the number of defective caps and the total number produced for each line. One line runs at high speed and produces twice as many caps as the other two lines that run at a slower speed. The manufacturer uses the <strong>Chi-Square Goodness-of-Fit Test</strong> to determine if the number of defective units for each line is proportional to the volume of caps it produces.</li>
</ul>
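<p style="line-height: 20.7999992370605px;">To make the two bottle cap scenarios concrete, here is a minimal Python sketch of the goodness-of-fit calculation. The defective counts are hypothetical, and the function names are just labels chosen for this illustration. The p-value helper uses a closed form that is exact only when the degrees of freedom are even, which covers the three-line case (2 degrees of freedom).</p>

```python
import math


def chi_square_gof(observed, proportions):
    """Return (statistic, degrees of freedom) for a goodness-of-fit test."""
    total = sum(observed)
    expected = [total * p for p in proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


def chi_square_p_value_even_df(statistic, df):
    """Upper-tail p-value; this closed form is exact only for even df."""
    if df % 2 != 0:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = statistic / 2
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


# Hypothetical defective counts for three production lines.
defectives = [30, 24, 36]

# Scenario 1: are defectives spread equally across the three lines?
stat_equal, df = chi_square_gof(defectives, [1 / 3, 1 / 3, 1 / 3])
p_equal = chi_square_p_value_even_df(stat_equal, df)

# Scenario 2: line 1 produces twice as many caps, so expect a 2:1:1 split.
stat_volume, _ = chi_square_gof(defectives, [0.5, 0.25, 0.25])
p_volume = chi_square_p_value_even_df(stat_volume, df)

print(f"Equal proportions:   chi-square = {stat_equal:.2f}, p = {p_equal:.4f}")
print(f"Volume-proportional: chi-square = {stat_volume:.2f}, p = {p_volume:.4f}")
```

<p style="line-height: 20.7999992370605px;">The same counts can look unremarkable against one set of expected proportions and surprising against another, which is why the hypothesized proportions matter.</p>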
<strong>Chi-Square Test for Association – 2 Variables</strong>
<p style="line-height: 20.7999992370605px;">Use Minitab’s <strong>Stat > Tables > Chi-Square Test for Association</strong> when you have two variables.</p>
<p style="line-height: 20.7999992370605px;">The Chi-Square Test for Association can tell you if there’s an association between two variables. In other words, it can test if two variables are independent or not. For example:</p>
<ul style="line-height: 20.7999992370605px;">
<li>A paint manufacturer operates two production lines across three shifts and records the number of defective units per line per shift. The manufacturer uses the <strong>Chi-Square Test for Association</strong> to determine if the percent defective is similar across all shifts and production lines. Or, are certain lines during certain shifts more prone to issues?<br />
<br />
<img alt="Defectives per line per shift" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8f78b557ef93b1390b79b866787d5503/chisquare_twovariables_revision.jpg" style="width: 600px; height: 400px;" /><br />
<br />
</li>
<li>A call center randomly samples 100 incoming calls each day of the week for each of its three locations, for a total of 1500 calls. They then record the number of abandoned calls per location per day. The call center uses a Chi-Square Test to determine if there is any association between location and day of the week with respect to missed calls.</li>
</ul>
<p style="margin-left: 40px;"><img alt="call center data" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e60774e6ddac893694e7b8a1a39a47b4/callcenterdata.jpg" style="width: 265px; height: 133px;" /><br />
</p>
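<p style="line-height: 20.7999992370605px;">The arithmetic behind a test for association can be sketched in a few lines of Python. The table below is hypothetical (two production lines by three shifts), not data from the examples above. Each expected count is the row total times the column total divided by the grand total, which is what independence between the two variables would predict.</p>

```python
import math


def chi_square_association(table):
    """Return (statistic, df) for a test of independence on a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical defective counts: rows = production lines, columns = shifts 1 to 3.
defectives = [
    [20, 30, 50],
    [30, 30, 40],
]

statistic, df = chi_square_association(defectives)

# For df = 2 only, the upper-tail p-value has the closed form exp(-x / 2).
p_value = math.exp(-statistic / 2)
print(f"chi-square = {statistic:.3f}, df = {df}, p = {p_value:.3f}")
```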
<strong>Cross Tabulation and Chi-Square – 2 or more variables</strong>
<p style="line-height: 20.7999992370605px;">Use Minitab’s <strong>Stat > Tables > Cross Tabulation and Chi-Square </strong>when you have two or more variables.</p>
<p style="line-height: 20.7999992370605px;">If you simply want to test for associations between two variables, you can use either <strong>Cross Tabulation and Chi-Square</strong> or <strong>Chi-Square Test for Association</strong>. However, <span><a href="http://blog.minitab.com/blog/understanding-statistics/using-cross-tabulation-and-chi-square-the-survey-says">Cross Tabulation and Chi-Square</a></span> also lets you control for the effect of additional variables. Here’s an example:</p>
<ul style="line-height: 20.7999992370605px;">
<li>A tire manufacturer records the number of failed tires for four different tire sizes across two production lines and three shifts. The plant uses a Cross Tabulation and Chi-Square analysis to look for failure dependencies between the tire sizes and production lines, while controlling for any shift effect. Perhaps a particular production line for a certain tire size is more prone to failures, but only during the first shift.</li>
</ul>
<p style="line-height: 20.7999992370605px;">This analysis also offers advanced options. For example, if your categories are ordinal (good, better, best or small, medium, large) you can include a special test for concordance.</p>
Conducting a Chi-Square Analysis in Minitab
<p style="line-height: 20.7999992370605px;">Each of these analyses is easy to run in Minitab. For more examples that include step-by-step instructions, just navigate to the Chi-Square menu of your choice and then click Help > example.</p>
<p style="line-height: 20.7999992370605px;">It can be tempting to make subjective assessments about a given set of data, their makeup, and possible interdependencies, but why risk an error in judgment when you can be sure with a Chi-Square test?</p>
<p style="line-height: 20.7999992370605px;">Whether you’re interested in one variable, two variables, or more, a Chi-Square analysis can help you make a clear, statistically sound assessment.</p>
Data Analysis | Hypothesis Testing | Lean Six Sigma | Quality Improvement | Six Sigma | Statistics | Statistics Help
Fri, 17 Feb 2017 13:16:00 +0000
http://blog.minitab.com/blog/michelle-paret/chi-square-analysis-powerful-versatile-statistically-objective
Michelle Paret
10 Tips to Increase your Minitab Efficiency
http://blog.minitab.com/blog/statistics-in-the-field/10-tips-to-increase-your-minitab-efficiency
<p>by Rehman Khan, guest blogger</p>
<p>There are many articles giving <span><a href="http://blog.minitab.com/blog/quality-data-analysis-and-statistics/tips-and-tricks-from-minitabs-technical-support-team">Minitab tips</a></span> already, so to be different I have done mine in the style of my <a href="http://www.rmksixsigma.com/" target="_blank">books</a>, which use example-based learning. All ten tips are shown using a single example.</p>
<p>If you don’t already know these 10 tips you will get much more benefit if you work along with the example. You don’t need to download any files to work along—although, if you don’t have Minitab already, you may want to <a href="http://www.minitab.com/products/minitab/free-trial/">download the free 30-day trial</a>. </p>
<p>First I will list my 10 tips, and then as we go through the examples I will highlight where they are going to be used. The 10 tips are</p>
<ol>
<li>Using the Auto-Fill function.</li>
<li>Making patterned text data.</li>
<li>Using Set Base when generating random data.</li>
<li>Using the Edit Last Dialog Box function.</li>
<li>Clearing a menu.</li>
<li>Setting the order of a categorical axis on a graph.</li>
<li>Updating a graph.</li>
<li>Making a Similar Graph, which is especially useful when formatting has been changed.</li>
<li>Using the Layout Tool.</li>
<li>Using Conditional Formatting.</li>
</ol>
<p>We are going to generate 3 columns of data, each with 30 rows. Column C1 is called Shift; it relates to a production process that runs a Morning, Afternoon and Night shift. Columns C2 and C3 are two yield values, which are recorded for each shift. Remember this blog is about learning 10 useful Minitab tips rather than learning to examine the data for this process.</p>
<p>First, start a new Minitab project and type in the column headings shown. Then type ‘Morning’, ‘Afternoon’ and then ‘Night’ in successive cells after shift.</p>
<p style="margin-left: 40px;"><img alt="3 columns of data" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/eb57f3f9d011e2514e9ad2d12ca68c7a/image002.jpg" style="width: 222px; height: 109px;" /></p>
Tip 1: Using the Autofill function
<p>We could copy-and-paste the first three cells 9 times to make our 30 rows of data. But instead, we can highlight the three cells and then grab the Fill Handle and drag that down to Auto Fill our text. Try that now but don’t go too far down.</p>
<p style="margin-left: 40px;"><img alt="autofill" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/84adbb3368b869da756f396ccf18a770/image005.png" style="width: 236px; height: 273px;" /></p>
Tip 2: Making Patterned Text data
<p>Another way of getting Minitab to do all the laborious typing is to use the Make Patterned Data command. Select <strong>Calc > Make Patterned Data> Text Values...</strong> Complete the menu as shown and click OK.</p>
<p style="margin-left: 40px;"><img alt="patterned data dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/51ffc4940432d77e92d6c7a37524a4b4/image008.jpg" style="width: 395px; height: 308px; border-width: 1px; border-style: solid;" /> </p>
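<p>If you ever need the same column outside Minitab, the idea carries over directly. Here is a small Python sketch that repeats the three shift names until there are 30 rows. It assumes the pattern is the whole sequence repeated, as in the Auto Fill example; the variable names are my own.</p>

```python
from itertools import cycle, islice

shift_names = ["Morning", "Afternoon", "Night"]

# Repeat the whole sequence until we have 30 rows (10 full repeats).
shift = list(islice(cycle(shift_names), 30))

print(len(shift))
print(shift[:6])
```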
<p>Now we will randomly generate two columns of yield data to simulate the performance of the shifts. However, random data is not as random as you might think.</p>
Tip 3: Using Set Base when generating random data.
<p style="margin-left: 40px;"><img alt="set base" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/86ccd130d45e4a8c2a9c5c6215294d89/image010.jpg" style="width: 329px; height: 123px; border-width: 1px; border-style: solid;" /><img alt="rows of data to generate" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/7fc1c5ad417283ac28fa6dc025087854/image012.jpg" style="width: 265px; height: 214px; border-width: 1px; border-style: solid;" /></p>
<p>The Set Base command fixes the starting point of Minitab’s Random number generator. So even though we are generating random data on different machines at different times, using the same starting point ensures that Minitab will give you the same random data that it gives me. Select <strong>Calc > Set Base...</strong> Then enter 3 as the Base for the random number generator and click OK.</p>
<p>We are going to use a Uniform distribution for Yield1. Go to <strong>Calc > Random Data > Uniform…</strong> then complete the menu as shown and then click OK. Because we used the Set Base command, all of our randomly generated data will be the same!</p>
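<p>The same principle applies to any pseudo-random number generator, not just Minitab’s. Here is a rough Python sketch. The lower and upper endpoints (90 and 100) are assumptions on my part, since the dialog settings are only shown in the screenshot, and Python’s generator is not Minitab’s, so the values will not match the worksheet. What carries over is the behaviour: same seed, same data.</p>

```python
import random


def make_yield(seed, rows=30, low=90.0, high=100.0):
    """Generate `rows` uniform values from a generator started at `seed`.

    The endpoints are assumed for illustration; they are not taken from the post.
    """
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(rows)]


first = make_yield(3)
second = make_yield(3)     # same starting point, so the same "random" data
different = make_yield(4)  # a different starting point gives different data

print(first == second)
print(first == different)
```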
Tip 4: Using the Edit Last Dialog Box function
<p style="margin-left: 40px;"><img alt="Edit last dialog - CTRL+E" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/776fd8dd28913115ca01294e9e620b17/image014.jpg" style="width: 288px; height: 85px; border-width: 1px; border-style: solid;" /></p>
<p>This is <em>the</em> must-know tip for Minitab. To quickly navigate to the last dialog box you had open, press the control key and then press ‘e’, written as ‘ctrl+e’. Alternatively, press the edit last dialog icon in the tool bar as shown. This should have re-opened the dialog box we used to generate 30 rows of random data from the uniform distribution.</p>
Tip 5: Clearing a dialog
<p>To completely clear a dialog box, just press F3. This is very useful, since it will clear sub-dialogs as well. Press F3 to clear the menu now. Complete the menu as you did to create the Yield1 data, but this time store the data in column Yield2. Your data should look like the screenshot shown.</p>
<p style="margin-left: 40px;"><img alt="randomly generated data" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/67fb7572306561f937a3fe5b76e9d21a/image016.jpg" style="width: 227px; height: 166px; border-width: 1px; border-style: solid;" /></p>
<p style="margin-left: 40px;"><img alt="bar graph" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/4cc8706f928cbbecbcfba346bd10245b/image018.jpg" style="width: 229px; height: 208px; border-width: 1px; border-style: solid;" /></p>
<p>For the next part of the demonstration we need to produce a graph. Go to <strong>Graph > Bar Chart...</strong></p>
<p>From the Bar Represents drop-down menu, select ‘A Function of a Variable.’ Ensure that One Y Simple is selected and then click OK. Complete the dialog box as the screenshot shows. Then click OK to produce the chart.</p>
Tip 6: Setting the order of a categorical axis on a graph
<p>Notice that the chart’s X-axis labels are in alphabetical order. This is the default for Minitab but we sometimes need to change the order to be user-friendly.</p>
<p style="margin-left: 40px;"><img alt="12 Tip6 Value order" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/42d0a3452bd8bcba6a75cd3bad6d982f/image020.jpg" style="width: 520px; height: 337px; border-width: 1px; border-style: solid;" /> </p>
<p>To change the order of the x-axis variables, go to the worksheet and place the active cursor anywhere in column C1. Right-click and then select <strong>Column Properties > Value Order…</strong> There are a number of options available, but we will set the radio button for Value Order to ‘Order of occurrence in the worksheet’ as shown.</p>
<p style="margin-left: 40px;"><img alt="value order for C1" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/4b534539b007d62b3e7a2dae5433b342/image022.jpg" style="width: 206px; height: 111px; border-width: 1px; border-style: solid;" /></p>
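<p>To see the difference between the two orderings, here is a short Python sketch. It only illustrates the concept and is not how Minitab stores value order internally.</p>

```python
shift = ["Morning", "Afternoon", "Night"] * 10

# Default behaviour described above: categories in alphabetical order.
alphabetical = sorted(set(shift))

# "Order of occurrence": keep each category the first time it appears.
order_of_occurrence = list(dict.fromkeys(shift))

print(alphabetical)
print(order_of_occurrence)
```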
Tip 7: Updating a graph.
<p>We can now either recreate our graph or update it. Look in the top-left corner of the graph. The yellow warning triangle and circular blue arrows mean that the graph is out of sync with the data in the worksheet. To update the graph, right-click on it and then select Update Graph Now. You also have the option of keeping the graph automatically updated. Note: some graphs cannot be updated; they must be recreated when data are changed.</p>
<p style="margin-left: 40px;"><img alt="update graph" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/724ee8836519dd2a8979232342d8b192/image024.png" style="width: 431px; height: 430px;" /></p>
<p style="margin-left: 40px;"><img alt="formatted graph" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0be7e8b9d1bf6b6d5072a53a4ca29465/image026.jpg" style="width: 520px; height: 334px; border-width: 1px; border-style: solid;" /></p>
Tip 8: Making a Similar Graph, which is especially useful when formatting has been changed.
<p>The next graph shown is the same as the one you should have open, but I have changed the format to meet a fictitious company standard. If I need to make similar graphs and not repeat the formatting adjustments every time, there is a shortcut for doing this. First, ensure the graph is selected by left clicking on it. Then go to <strong>Editor > Make Similar Graph…</strong> </p>
<p>The dialog allows basic changes to the graph. Change ‘Yield1’ to ‘Yield2’ in the new variable column.</p>
<p style="margin-left: 40px;"><img alt="17 Tip8 Make Similar Graph" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1a069ad84a55f8d008bf75ddf341656c/image028.jpg" style="width: 304px; height: 52px;" /></p>
<p>Then Click OK to produce the similar graph for column Yield2.</p>
<p style="margin-left: 40px;"><img alt="Similar Graph" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/497ecabc0b3dcbc3f4fd204bd092fe2d/image030.jpg" style="width: 520px; height: 336px; border-width: 1px; border-style: solid;" /></p>
Tip 9: Using the Layout Tool
<p>We already have two graphs. To demonstrate our next tip, let’s also make two boxplots for Yield1 and Yield2, respectively. If we want to display these four graphs on the same plot we can use the Layout Tool to make a multi-graph plot using existing graphs. Select any graph and then go to <strong>Editor > Layout Tool…</strong></p>
<p>On the top-left of the layout tool we can change how many plots are shown. We will use a 2x2 layout but Minitab can go up to 9x9.</p>
<p style="margin-left: 40px;"><img alt="arrange graphs dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a743eac456b9e31e3caaf3a2d3bfc2ae/image032.png" style="width: 600px; height: 380px; border-width: 1px; border-style: solid;" /></p>
<p>The Layout Tool is easy to use, but is best learned through a bit of experimentation. Try arranging and ordering the graphs in different ways. When you are done click on the Finish button.</p>
<p style="margin-left: 40px;"><img alt="ordered graphs" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d2971cf4174bca312f09ea98f1dd4fc3/image034.png" style="width: 600px; height: 391px; border-width: 1px; border-style: solid;" /></p>
<p>Note that you can change the formatting of the new plot just as you can adjust other graphs.</p>
Tip 10: Using Conditional Formatting
<p>For the final tip, I am going to give you a brief introduction to the conditional formatting tools added in Minitab 17.</p>
<p>If I want to quickly identify my best performers (which we’ll define as those with a Yield1 greater than 95) in the Project Window, I can use Conditional Formatting. Go to <strong>Data > Conditional Formatting > Highlight Cell > Greater Than...</strong></p>
<p style="margin-left: 40px;"><img alt="21 Tip10 Conditional Formatting" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/51effc0aa93bd84b54985bce74295490/image036.jpg" style="width: 297px; height: 89px; border-width: 1px; border-style: solid;" /></p>
<p>Complete the dialog as shown, then click OK. You’ll see the results in the data sheet. Conditional Formatting is very useful when sanitizing large data files.</p>
<p style="margin-left: 40px;"><img alt="conditional formatting" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9c3b9c92990fb56c4b8536cee91157e3/image038.png" style="width: 231px; height: 246px; border-width: 1px; border-style: solid;" /></p>
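<p>The rule itself is a simple strict comparison. As a hedged illustration in Python, with made-up yield values rather than the worksheet data:</p>

```python
# Hypothetical Yield1 values; the worksheet in the post will differ.
yield1 = [96.2, 91.5, 99.0, 94.8, 95.0, 97.3]
threshold = 95

# Row numbers (starting at 1, like a worksheet) whose value exceeds the threshold.
highlighted_rows = [row for row, value in enumerate(yield1, start=1) if value > threshold]

print(highlighted_rows)
```

<p>Note that a value of exactly 95 is not highlighted, because the rule is "greater than" rather than "greater than or equal to".</p>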
<p>I hope you have enjoyed learning about my 10 favorite Minitab tricks, and that you find them helpful the next time you’re analyzing your own data!</p>
<p><strong>About the Guest Blogger…</strong></p>
<p><em>Rehman Khan is the author of <a href="https://www.amazon.com/Six-Sigma-Statistics-using-Minitab17/dp/1539155056/ref=asap_bc?ie=UTF8">Six Sigma Statistics using Minitab 17</a> and also <a href="https://www.amazon.com/gp/product/1118307577/ref=s9_psimh_gw_p14_d0_i2?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=center-2&pf_rd_r=02HDWYRTS5GZ1V8MKAXR&pf_rd_t=101&pf_rd_p=1389517282&pf_rd_i=507846">Problem Solving and Data Analysis using Minitab</a>. Recently he has started his own YouTube channel called <a href="https://www.youtube.com/channel/UCfKKqW1ENFSSrTTGQRK9pQw">RMK Six Sigma</a>. Rehman is a SigmaPro Master Black Belt and Chartered Chemical Engineer. He works for FMC Chemicals Ltd in the UK as a Manufacturing Excellence Engineer.</em></p>
<p> </p>
Data Analysis | Statistics
Wed, 15 Feb 2017 13:00:00 +0000
http://blog.minitab.com/blog/statistics-in-the-field/10-tips-to-increase-your-minitab-efficiency
Guest Blogger
3 Things a Histogram Can Tell You
http://blog.minitab.com/blog/michelle-paret/3-things-a-histogram-can-tell-you
<p>Histograms are one of the <a href="http://blog.minitab.com/blog/real-world-quality-improvement/seven-basic-quality-tools-to-keep-in-your-back-pocket">most common graphs</a> used to display numeric data. Anyone who takes a statistics course is likely to learn about the histogram, and for good reason: histograms are easy to understand and can instantly tell you a lot about your data.</p>
<p>Here are three of the most important things you can learn by looking at a histogram. </p>
Shape—Mirror, Mirror, On the Wall…
<p>If the left side of a histogram resembles a mirror image of the right side, then the data are said to be symmetric. In this case, the mean (or average) is a good approximation for the center of the data. And we can therefore safely utilize <a href="http://www.minitab.com/products/minitab/">statistical tools</a> that use the mean to analyze our data, such as t-tests.</p>
<p>If the data are <em>not</em> symmetric, then the data are either left-skewed or right-skewed. If the data are skewed, then the <a href="http://blog.minitab.com/blog/michelle-paret/using-the-mean-its-not-always-a-slam-dunk">mean may not provide a good estimate</a> for the center of the data or represent where most of the data fall. In this case, you should consider using the median to evaluate the center of the data, rather than the mean.</p>
<p style="margin-left:.5in;"><strong><em>Did you know...</em></strong></p>
<p style="margin-left:.5in;"><em>If the data are left-skewed, then the mean is typically LESS THAN the median. </em></p>
<p style="margin-left:.5in;"><em>If the data are right-skewed, then the mean is typically GREATER THAN the median.</em></p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/7a38ca8d58c107c2d0317664c7086835/skewedhistograms.jpg" style="width: 549px; height: 202px; margin-left: 50px; margin-right: 50px;" /></p>
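<p>A quick way to convince yourself of the mean-versus-median rule is to try it on a small data set. The Python sketch below uses made-up values with one long right tail, then mirrors them to create a left-skewed version.</p>

```python
from statistics import mean, median

# Hypothetical right-skewed data: most values are low, with one long right tail.
right_skewed = [30, 32, 35, 36, 38, 40, 41, 45, 60, 150]

# Mirror the values to get a left-skewed version of the same shape.
left_skewed = [160 - x for x in right_skewed]

for label, data in (("right-skewed", right_skewed), ("left-skewed", left_skewed)):
    print(f"{label}: mean = {mean(data):.1f}, median = {median(data):.1f}")
```

<p>The single extreme value drags the mean toward the tail, while the median stays with the bulk of the data.</p>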
Span—A Little or a Lot?
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/49e90b007df84912ee1731a947d9ab54/histogrambin.jpg" style="line-height: 20.7999992370605px; float: right; width: 400px; height: 267px; margin: 10px 15px;" /><span style="line-height: 1.6;">Suppose you have a data set that contains the salaries of people who work at your organization. It would be interesting to know where the minimum and maximum values fall, and where you are relative to those values. Because histograms use bins to display data—where a bin represents a given range of values—you can’t see exactly what the specific values are for the minimum and maximum, like you can on an </span><a href="http://blog.minitab.com/blog/real-world-quality-improvement/three-ways-individual-value-plots-can-help-you-analyze-data" style="line-height: 1.6;">individual value plot</a><span style="line-height: 1.6;">. However, you can still observe an approximation for the range and see how spread out the data are. And you can answer questions such as "Is there a little bit of variability in my organization's salaries, or a lot?"</span></p>
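<p>To illustrate why bins give only an approximate range, here is a minimal Python sketch that groups hypothetical salaries (in thousands) into bins of width 10. The bin width and the salary values are assumptions chosen for the example.</p>

```python
from collections import Counter

# Hypothetical salaries in thousands; not real data.
salaries = [31, 34, 38, 42, 45, 47, 49, 53, 58, 72]
bin_width = 10

# Each value falls in the bin that starts at the multiple of bin_width at or below it.
counts = Counter((s // bin_width) * bin_width for s in salaries)

# A text histogram: one row per non-empty bin.
for start in sorted(counts):
    print(f"{start}-{start + bin_width}: " + "#" * counts[start])

# The histogram reveals only the bin edges, not the exact minimum and maximum.
approx_range = (min(counts), max(counts) + bin_width)
exact_range = (min(salaries), max(salaries))
print(approx_range, exact_range)
```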
<div>
Outliers (and the ozone layer)
<p>Outliers can be described as extremely low or high values that do not fall near any other data points. Sometimes outliers represent unusual cases. Other times they represent data entry errors, or perhaps data that does not belong with the other data of interest. Whatever the case may be, outliers can easily be identified using a histogram and should be <a href="http://blog.minitab.com/blog/statistics-and-quality-improvement/how-to-use-brushing-to-investigate-outliers-on-a-graph">investigated</a> as they can reveal interesting information about your data. </p>
<img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/50fc0c8ff8f8cdc3848d529db961134a/histogramoutlier.jpg" style="line-height: 20.7999992370605px; width: 400px; height: 267px; margin-left: 50px; margin-right: 50px;" />
<p><span style="line-height: 1.6;">Rewind to the mid-1980s when scientists reported depleting ozone levels above Antarctica. NASA’s Goddard Space Flight Center had studied atmospheric ozone levels, but surprisingly didn’t discover the issue. Why? The analysis they used automatically eliminated any Dobson readings below 180 units because ozone levels that low were thought to be impossible.</span></p>
<div> </div>
</div>
Learning | Statistics | Stats
Mon, 13 Feb 2017 13:15:00 +0000
http://blog.minitab.com/blog/michelle-paret/3-things-a-histogram-can-tell-you
Michelle Paret
A Field Guide to Statistical Distributions
http://blog.minitab.com/blog/statistics-in-the-field/a-field-guide-to-statistical-distributions
<p><em><span style="line-height: 1.6;">by Matthew Barsalou, guest blogger. </span></em></p>
<p>The old saying “if it walks like a duck, quacks like a duck and looks like a duck, then it must be a duck” may be appropriate in bird watching; however, the same idea can’t be applied when observing a statistical distribution. The dedicated ornithologist is often armed with binoculars and a field guide to the local birds and this should be sufficient. A statologist (I just made the word up, feel free to use it) on the other hand, is ill-equipped for the visual identification of his or her targets.</p>
Normal, Student's t, Chi-Square, and F Distributions
<p>Notice the upper two distributions in figure 1. The <span><a href="http://blog.minitab.com/blog/fun-with-statistics/normal-the-kevin-bacon-of-distributions">normal distribution</a></span> and Student’s t distribution may appear similar. However, the shape of the standard normal distribution is fixed, whereas the shape of <a href="http://blog.minitab.com/blog/michelle-paret/guinness-t-tests-and-proving-a-pint-really-does-taste-better-in-ireland">Student’s t distribution</a> depends on its degrees of freedom, n-1. This may appear to be a minor difference, but when n is small, Student’s t distribution has noticeably heavier tails and a lower peak. Student’s t distribution approaches the normal distribution as the sample size increases, but for any finite sample size it never exactly matches the shape of the normal distribution.</p>
<p>Observe the Chi-square and F distribution in the lower half of figure 1. The shapes of the distributions can vary and even the most astute observer will not be able to differentiate between them by eye. Many distributions can be sneaky like that. It is a part of their nature that we must accept as we can’t change it.</p>
<p align="center"><img alt="Distribution Field Guide Figure 1" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b5c12365f066b6ca3d255bcd458314e1/distribution_field_guide_1.gif" style="width: 605px; height: 352px;" /><em><span style="line-height: 1.6;">Figure 1</span></em></p>
Binomial, Hypergeometric, Poisson, and Laplace Distributions
<p>Notice the distributions illustrated in figure 2. A bird watcher may suddenly encounter four birds sitting in a tree; a quick check of a reference book may help to determine that they are all of a different species. The same can’t always be said for statistical distributions. <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-and-using-discrete-distributions">Observe the binomial distribution, hypergeometric distribution and Poisson distribution</a>. We can’t even be sure the three are not the same distribution. If they are together with a Laplace distribution, an observer may conclude “one of these does not appear to be the same as the others.” But they <em>are </em>all different, which our eyes alone may fail to tell us.</p>
<p align="center"><img alt="Distribution Field Guide Figure 2" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b9011bf86767f49c3e7ec47c76d20631/distribution_field_guide_2.gif" style="width: 605px; height: 352px;" /><em><span style="line-height: 1.6;">Figure 2</span></em></p>
Weibull, Cauchy, Loglogistic, and Logistic Distributions
<p>Suppose we observe the four distributions in figure 3. What are they? Could you tell if they were not labeled? We must identify them correctly before we can do anything with them. One is a Weibull distribution, but all four could conceivably be various Weibull distributions. The shape of the Weibull distribution varies based upon the shape parameter (κ) and scale parameter (λ). The Weibull distribution is a useful but potentially devious distribution that can be much like the double-barred finch, which may be mistaken for an owl upon first glance.</p>
<p align="center"><img alt="Distribution Field Guide Figure 3" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2b606d88ff9ae159f94dcac04748c3e2/distribution_field_guide_3.gif" style="width: 605px; height: 351px;" /><em><span style="line-height: 1.6;">Figure 3</span></em></p>
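<p>To get a feel for how much the Weibull shape parameter changes the curve, here is a minimal Python sketch that evaluates the two-parameter Weibull density at a few points. The function name <code>weibull_pdf</code>, the shape values, and the evaluation points are illustrative assumptions, not values taken from figure 3.</p>

```python
import math

def weibull_pdf(x, shape, scale):
    """Two-parameter Weibull density (shape = kappa, scale = lambda).

    Sketch only: x <= 0 is treated as density 0 for simplicity, although
    the true density at exactly 0 depends on the shape parameter.
    """
    if x <= 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

# Hypothetical shape values, scale fixed at 1, to show how the curve changes.
points = (0.25, 0.5, 1.0, 1.5, 2.0)
for shape in (0.5, 1.0, 1.5, 5.0):
    row = "  ".join(f"{weibull_pdf(x, shape, 1.0):.3f}" for x in points)
    print(f"shape={shape}: {row}")
```

<p>With a shape of 1 the Weibull reduces to the exponential distribution, a shape below 1 gives a steadily decreasing curve, and a larger shape gives a pronounced hump, which is why four quite different-looking curves could all be Weibull.</p>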
<p>Attempting to visually identify a statistical distribution can be very risky. Many distributions such as the Chi-Square and F distribution change shape drastically based on the number of degrees of freedom. Figure 4 shows various shapes for the Chi-Square, F distribution and the Weibull distribution. Figure 4 also compares a standard normal distribution with a standard deviation of one to a t distribution with 27 degrees of freedom; notice how the shapes overlap to the point where it is no longer possible to tell the two distributions apart.</p>
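<p>The overlap between the standard normal and a t distribution with 27 degrees of freedom can also be checked numerically. The sketch below is one way to do it in plain Python: it measures the largest vertical gap between the two density curves on a grid. The helper names, the grid range, and the comparison case of 3 degrees of freedom are assumptions made for illustration.</p>

```python
import math

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def t_pdf(x, df):
    """Student's t density with df degrees of freedom (log-gamma for stability)."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def max_gap(df, lo=-5.0, hi=5.0, steps=1000):
    """Largest absolute difference between the two densities on an even grid."""
    xs = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return max(abs(t_pdf(x, df) - normal_pdf(x)) for x in xs)

# 27 df is the case from figure 4; 3 df is an illustrative small-sample contrast.
for df in (3, 27):
    print(f"df={df}: largest density gap = {max_gap(df):.4f}")
```

<p>The gap at 27 degrees of freedom is a small fraction of the gap at 3, which is consistent with the two curves being indistinguishable by eye.</p>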
<p>Although there is no definitive Field Guide to Statistical Distributions to guide us, there are formulas available to correctly identify statistical distributions. We can also use <a href="http://www.minitab.com/products/minitab">Minitab Statistical Software</a> to identify our distribution.</p>
<p align="center"><img alt="Distribution Field Guide Figure 4" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/aa4be49733e980c8c7e26395c5e8262a/distribution_field_guide_4.gif" style="width: 605px; height: 351px;" /><em style="line-height: 1.6;">Figure 4</em></p>
<p>Go to <strong>Stat > Quality Tools > Individual Distribution Identification...</strong> and enter the column containing the data and the subgroup size. The results can be observed in either the session window (figure 5) or the graphical outputs shown in figures 6 through 9.</p>
<p>In this case, the 3-parameter Weibull distribution is a plausible fit: its p value of 0.364 is well above the usual 0.05 threshold, so the goodness-of-fit test gives no evidence against it.</p>
<p align="center"><img alt="Distribution Field Guide Figure 5" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/29448180c3ff01cae81cfaf250a60115/distribution_field_guide_5.gif" style="width: 547px; height: 739px;" /></p>
<p align="center"><em>Figure 5</em></p>
<p> </p>
<p style="text-align: center;"><img alt="Distribution Field Guide Figure 6" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/781c7a83b14261ae062c63a07479b10d/distribution_field_guide_6.png" style="width: 576px; height: 384px;" /><em style="line-height: 1.6;">Figure 6</em></p>
<p style="text-align: center;"><img alt="Distribution Field Guide Figure 7" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fcf5a7b56b859e6861ae8d96e8273fe1/distribution_field_guide_7.png" style="width: 576px; height: 384px;" /><em><span style="line-height: 1.6;">Figure 7</span></em></p>
<p style="text-align: center;"><em><img alt="Distribution Field Guide Figure 8" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a13530fb7ec7ee8e3fe90143772eefbc/distribution_field_guide_8.png" style="width: 576px; height: 384px;" /><span style="line-height: 1.6;">Figure 8</span></em></p>
<p style="text-align: center;"><em><img alt="Distribution Field Guide Figure 9" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6f28cb199afaee379ccc2244a955557f/distribution_field_guide_9.png" style="width: 576px; height: 384px;" /><span style="line-height: 1.6;">Figure 9</span></em></p>
<div> </div>
<div>
<p style="line-height: 20.7999992370605px;"><strong>About the Guest Blogger</strong></p>
<p style="line-height: 20.7999992370605px;"><em><a href="https://www.linkedin.com/pub/matthew-barsalou/5b/539/198" target="_blank">Matthew Barsalou</a> is a statistical problem resolution Master Black Belt at <a href="http://www.3k-warner.de/" target="_blank">BorgWarner</a> Turbo Systems Engineering GmbH. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with emphasis in international business, and has a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is author of the books <a href="http://www.amazon.com/Root-Cause-Analysis-Step---Step/dp/148225879X/ref=sr_1_1?ie=UTF8&qid=1416937278&sr=8-1&keywords=Root+Cause+Analysis%3A+A+Step-By-Step+Guide+to+Using+the+Right+Tool+at+the+Right+Time" target="_blank">Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time</a>, <a href="http://asq.org/quality-press/display-item/index.html?item=H1472" target="_blank">Statistics for Six Sigma Black Belts</a> and <a href="http://asq.org/quality-press/display-item/index.html?item=H1473&xvl=76115763" target="_blank">The ASQ Pocket Guide to Statistics for Six Sigma Black Belts</a>.</em></p>
</div>
<p> </p>
Fun Statistics | Statistics | Statistics Help | Stats | Fri, 10 Feb 2017 13:00:00 +0000 | http://blog.minitab.com/blog/statistics-in-the-field/a-field-guide-to-statistical-distributions | Guest Blogger
How to Compute Probabilities
http://blog.minitab.com/blog/understanding-statistics/how-to-compute-probabilities
<p>Have you ever wanted to know the odds of something happening, or not happening? </p>
<p>It's the kind of question that students are frequently asked to calculate by hand in introductory statistics classes, and going through that exercise is a good way to become familiar with the mathematical formulas that underlie probability (and hence, all of statistics). </p>
<p>But let's be honest: when class is over, most people don't take the time to calculate those probabilities—at least, not by hand. Some people even resort to "just making it up." Needless to say, we at Minitab are firmly opposed to just making it up.</p>
<p>The good news is that determining the real odds of something happening doesn't have to be hard work! If you don't want to calculate the probabilities by hand, just let a statistical <a href="http://www.minitab.com/products/minitab">software package such as Minitab</a> do it for you. </p>
Computing Binomial Probabilities
<p>Let's look at how to compute binomial probabilities. The process we'll go through is similar for any of the 24 distributions Minitab includes.</p>
<p>We use the binomial distribution to characterize a process with two outcomes—for example, if a part passes or fails inspection, if a candidate wins or loses an election, or if a coin lands on heads or tails. This distribution is used frequently in quality control, opinion surveys, medical research, and insurance.<img alt="coin flip" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/953bf4a98a5808dad03fda290f11a88e/coin_flip.png" style="width: 169px; height: 134px; margin: 10px 15px; float: right;" /></p>
<p>Suppose I want to know the probability of getting a certain number of heads in 10 tosses of a fair coin. I need to calculate the odds for a binomial distribution with 10 trials (n=10) and probability of success p=0.5.</p>
<p>To compute the probability of exactly 8 successes, select <strong>Calc > Probability Distributions > Binomial...</strong></p>
<p style="margin-left: 40px;"><img alt="binomial distribution" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/041fa22a2672686aac53cecb668a7ab5/binomial_distribution.png" style="width: 600px; height: 432px;" /></p>
<p>Choose “probability” in the dialog, then enter the number of trials (10) and the probability of success (0.5) for “event probability." If we wanted to calculate the odds for more than one number of events, we could enter them in a worksheet column. But since for now we just want the probability of getting exactly 8 heads in 10 tosses, choose the "Input Constant" option, enter 8, and press OK. </p>
<p style="margin-left: 40px;"><img alt="binomial probability" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/95a4f98f7f1e162646e60ab79ae77866/binomial_dialog.png" style="width: 409px; height: 348px;" /></p>
<p>The following output appears in the session window. It tells us that if we toss a fair coin with a 50% probability of landing on heads, the odds of getting exactly 8 heads out of 10 tosses are only about 4.4%.</p>
<p style="margin-left: 40px;"><img alt="binomial probability out" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/624c6455b0d90790823f988f4b809c51/probability_density_output.png" style="width: 264px; height: 98px;" /></p>
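<p>If you want to double-check this result outside Minitab, the same number follows directly from the binomial formula. This is a minimal Python sketch; the function name <code>binomial_pmf</code> is an assumption for illustration and has nothing to do with how Minitab computes the value internally.</p>

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 8 heads in 10 tosses of a fair coin: 45 ways out of 1024 outcomes.
print(round(binomial_pmf(8, 10, 0.5), 6))
```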
<p>What if we wanted to know the <a href="http://blog.minitab.com/blog/understanding-statistics/difference-between-probability-and-cumulative-probability">cumulative probability</a> of getting 8 heads in 10 tosses? Cumulative probability is the odds of one, two, or more events taking place. The word to remember is "or," because that's what cumulative probability tells you. What are the chances that when you toss this coin 10 times, you'll get 8 <em>or fewer</em> heads? That's cumulative probability.</p>
<p>To compute cumulative probabilities, select “cumulative probability” in the binomial distribution dialog.</p>
<p style="margin-left: 40px;"><img alt="binomial cumulative probability dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c3fde22b12f74361e8a4026ea4cc98d5/cumulative_probability_dialog.png" style="width: 409px; height: 348px;" /></p>
<p>The probability of 8 or fewer successes is P(X ≤ 8) = 0.989258, or about 98.9%:</p>
<p style="margin-left: 40px;"><img alt="binomial cumulative probability output" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/baee599e36c8531d505ba428c3d5b5dd/cumulative_probability_output.png" style="width: 279px; height: 106px;" /></p>
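<p>The cumulative probability is simply the sum of the individual probabilities for 0 through 8 heads. A short Python sketch under the same assumptions as before (the function name <code>binomial_cdf</code> is hypothetical):</p>

```python
from math import comb

def binomial_cdf(k, n, p):
    """Probability of k or fewer successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# 8 or fewer heads in 10 tosses: everything except 9 heads and 10 heads.
print(round(binomial_cdf(8, 10, 0.5), 6))
```

<p>Because only the outcomes with 9 or 10 heads are excluded, the result equals 1 minus 11/1024.</p>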
Creating a Table of Probabilities
<p>We can also use Minitab to calculate a full table of probabilities. In the worksheet, enter all of the values of the number of successes in a column. For example, for a series of 10 tosses, you would enter 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. Next we'll select <strong>Calc > Probability Distributions > Binomial...</strong> again, but this time choose “Input column” and select C1 instead of using the "Input constant." Specify a different column for storage and press OK.</p>
<p style="margin-left: 40px;"><img alt="binomial distribution probability table dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b63126f12e719d128095d62dcc193734/binomial_table.png" style="width: 409px; height: 348px;" /></p>
<p>The probabilities appear in column C2:</p>
<p style="margin-left: 40px;"><img alt="binomial distribution probability table output" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/aa62ae48b404b598168ede075d781527/cumulative_binomial_output.png" style="width: 171px; height: 318px;" /></p>
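<p>A comparable table can be sketched in Python by looping over the same values of the number of successes. The function name <code>binomial_table</code> and the two-column layout (individual and cumulative probability) are assumptions chosen for illustration; they are not meant to reproduce the exact worksheet shown above.</p>

```python
from math import comb

def binomial_table(n, p, ks):
    """Return (k, P(X = k), P(X <= k)) for each requested number of successes."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    return [(k, pmf[k], sum(pmf[: k + 1])) for k in ks]

# Same inputs as the worksheet column: 1 through 10 heads in 10 fair tosses.
for k, exact, cumulative in binomial_table(10, 0.5, range(1, 11)):
    print(f"{k:>2}  {exact:.6f}  {cumulative:.6f}")
```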
Visualizing the Probabilities
<p>Suppose you want to see the distribution of these probabilities in a graph. Select <strong>Graph > Bar Charts...</strong>, then in the dialog box choose View Single. </p>
<p style="margin-left: 40px;"><img alt="bar chart selection dialog box" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3a6f4126665880d11ac6d03cd68f10b7/probability_plot_dialog.png" style="width: 369px; height: 207px;" /></p>
<p>Just complete the dialog as shown:</p>
<p style="margin-left: 40px;"><img alt="bar chart creation dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1925cd666a8b8dd4532b0c33f5b0f6bb/probability_plot_dialog2.png" style="width: 425px; height: 344px;" /></p>
<p>When you press OK, Minitab produces this bar chart: </p>
<p style="margin-left: 40px;"><img alt="bar chart of binomial probabilities" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/16fd41f81936085700b4868ea6b7306c/probability_distribution_plot.png" style="width: 578px; height: 386px;" /></p>
<p>If you need to know the precise value for a given number of events, just hover over that column and Minitab displays the details:</p>
<p style="margin-left: 40px;"><img alt="edit graph dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0b8d8116d1be4ecf5cc413347908848e/probability_distribution_plot_detail.png" style="width: 243px; height: 216px; border-width: 1px; border-style: solid;" /></p>
<p>As you can see, using Minitab to check and graph the probabilities of different events is not difficult. I hope knowing this increases the odds that the next time you wonder about the likelihood of an event, you'll be able to find it quickly and accurately!</p>
Statistics | Statistics Help | Stats | Wed, 08 Feb 2017 13:00:00 +0000 | http://blog.minitab.com/blog/understanding-statistics/how-to-compute-probabilities | Eston Martz
How Taguchi Designs Differ from Factorial Designs
http://blog.minitab.com/blog/applying-statistics-in-quality-projects/how-taguchi-designs-differ-from-factorial-designs
<p>Genichi Taguchi is famous for his pioneering methods of robust quality engineering. One of the major contributions that he made to quality improvement methods is Taguchi designs.<img alt="Genichi Taguchi" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/4aca280ed32b81d1371695bf81ad6afb/taguchi.png" style="width: 190px; height: 191px; margin: 10px 15px; float: right;" /></p>
<p>Designed experiments were first used by agronomists during the last century. This method seemed highly theoretical at first, and was initially restricted to agronomy. Taguchi made the designed experiment approach more accessible to practitioners in the manufacturing industry.</p>
<p>Thanks partly to him, <a href="http://blog.minitab.com/blog/real-world-quality-improvement/leveraging-designed-experiments-doe-for-success">Design of Experiments (DOE)</a> has become quite popular in many companies, and these methods are widely taught in universities and engineering schools. In this blog post, I would like to describe differences between Taguchi DOEs and standard Factorial DOEs.</p>
Taguchi Designs
<p>Both Taguchi designs and Factorial designs are available in the DOE menu in <a href="http://www.minitab.com/products/minitab/">Minitab </a>Statistical Software. To select a design go to <strong>Stat > DOE</strong>.</p>
<p>Many Taguchi designs are based on Factorial designs (2-level designs and Plackett & Burman designs, as well as factorial designs with more than 2 levels). Taguchi’s L8 design, for example, is actually a standard 2<sup>3</sup> (8-run) factorial design.</p>
<p>Taguchi's designs are usually highly fractionated, which makes them very attractive to practitioners. Doing a half-fraction, quarter-fraction or eighth-fraction of a full factorial design greatly reduces costs and time needed for a designed experiment.</p>
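<p>To make the run counts concrete, the following Python sketch enumerates a two-level full factorial in coded -1/+1 units and counts the runs left after fractionating. The function names are assumptions for illustration, and the sketch only counts runs; it does not choose which runs a particular fraction keeps.</p>

```python
from itertools import product

def full_factorial(n_factors):
    """All runs of a two-level full factorial, coded as -1 / +1."""
    return list(product((-1, 1), repeat=n_factors))

def fractional_runs(n_factors, p):
    """Number of runs in a 2^(k-p) fraction of a two-level design."""
    return 2 ** (n_factors - p)

# Three factors at two levels give the 8 runs of a 2^3 design.
for run in full_factorial(3):
    print(run)

# Eight factors: full design versus half, quarter, eighth and sixteenth fractions.
print([fractional_runs(8, p) for p in range(5)])
```

<p>For eight factors, the full design needs 256 runs, while a sixteenth fraction needs only 16, which is the size of the L16 design discussed below.</p>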
<p>The drawback of a fractionated design is that some interactions may be confounded with other effects. It is important to consider carefully the role of potential confounders and aliases. Failure to take account of such confounded effects can result in erroneous conclusions and misunderstandings.</p>
<p>When using a Taguchi design, one needs to guess which interactions are most likely to be significant—even before any experiment is performed. Taguchi created several linear graphs to help practitioners select the interactions they want to study, based on their prior process knowledge.</p>
Example from a two-level, eight-factor L16 Taguchi design:
<p><img src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/File/4896fb7233e717576af07f896547fd67/4896fb7233e717576af07f896547fd67.png" /></p>
<p>Linear graphs are not displayed in Minitab, but factor allocation and interaction selection are based on Taguchi linear graphs. Suppose that factor A is allocated to column 1 of the orthogonal array, factor B to column 2, C to column 4, D to column 8, E to column 7, F to column 11, G to column 13, and H to column 14 (as shown in the Minitab dialog box above, which matches the corresponding Taguchi linear graph below). With this design, one may select the AB, AC, AD, AE, AF, AG, AH interactions. It is not possible to analyze the remaining interactions, since they are confounded with the selected interactions.</p>
<p><img height="292" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/File/1e6de17908f3f1decab6853498d05e59/1e6de17908f3f1decab6853498d05e59.png" width="725" /></p>
<p>Taguchi suggested several other linear graphs for an L16 design (a 16-run factorial design):</p>
<p><img height="234" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/File/8910a7ccfba6b1d52e0b93a7c32fae2a/8910a7ccfba6b1d52e0b93a7c32fae2a.png" width="755" /></p>
Standard Fractional Factorial Designs
<p>In a standard factorial (non-Taguchi) design, identifying the interactions most likely to be significant is based on alias / confounding "chains." The same alias chains apply to Taguchi designs, but are not displayed. Practitioners may not necessarily be aware that some interaction effects are confounded. However, when you use the factorial design functionality in Minitab the alias chains are clearly displayed showing the confounding pattern:</p>
<p style="margin-left: 40px;"><strong>AB + CG + DH + EF </strong></p>
<p style="margin-left: 40px;"><strong>AC + BG + DF + EH </strong></p>
<p style="margin-left: 40px;"><strong>AD + BH + CF + EG </strong></p>
<p style="margin-left: 40px;"><strong>AE + BF + CH + DG </strong></p>
<p style="margin-left: 40px;"><strong>AF + BE + CD + GH </strong></p>
<p style="margin-left: 40px;"><strong>AG + BC + DE + FH </strong></p>
<p style="margin-left: 40px;"><strong>AH + BD + CE + FG </strong></p>
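<p>Alias chains like these can be derived from the design generators. The Python sketch below groups two-factor interactions that reduce to the same combination of the four base factors. The generators E = BCD, F = ACD, G = ABC and H = ABD are an assumption: they were chosen because they reproduce the seven chains listed above, and the post itself does not state which generators were used.</p>

```python
from itertools import combinations

# Base factors A-D get one bit each; E-H are defined by generator words.
# ASSUMPTION: these generators are illustrative, picked to match the chains above.
BASE = {"A": 0b0001, "B": 0b0010, "C": 0b0100, "D": 0b1000}
GENERATORS = {"E": "BCD", "F": "ACD", "G": "ABC", "H": "ABD"}

def word_mask(word):
    """Bitmask of a word in the base factors; squared letters cancel via XOR."""
    mask = 0
    for letter in word:
        mask ^= BASE[letter]
    return mask

FACTORS = dict(BASE)
for name, word in GENERATORS.items():
    FACTORS[name] = word_mask(word)

def alias_chains():
    """Group two-factor interactions that share the same base-factor word."""
    groups = {}
    for f1, f2 in combinations(sorted(FACTORS), 2):
        key = FACTORS[f1] ^ FACTORS[f2]
        groups.setdefault(key, []).append(f1 + f2)
    return sorted(" + ".join(chain) for chain in groups.values())

for chain in alias_chains():
    print(chain)
```

<p>Each of the 28 two-factor interactions falls into one of seven groups of four, so only one interaction effect per group can be estimated separately.</p>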
<p>Confounding patterns are a lot more complex for 3-level and 4-level designs.</p>
<p>In the factorial design menu, the diagram below displays the designs that are available and their resolution (level of confounding). In Minitab, you can quickly access the table of factorial designs shown below by selecting <strong>Stat > DOE > Factorial > Create Factorial Design...</strong> and clicking "Display Available Designs."</p>
<p style="margin-left: 40px;"><img alt="table of available designs" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/5d894e571b90ba171444e0228edf1137/table_of_available_designs.png" style="width: 466px; height: 366px;" /></p>
<p>Red (Resolution III) designs should be avoided (because main effects are confounded with two-factor interactions). In experiments that use Yellow (Resolution IV) designs, two-factor interactions are confounded with other two-factor interactions. These popular designs provide a good compromise between the amount of information obtained and costs (number of experimental runs). The higher resolution designs (in green) offer high quality—with limited or no confounding—at higher costs.</p>
The Pareto and “Heredity” Principles
<p>In a Resolution IV (yellow region) design, main effects are not confounded with two-factor interactions. Often, after an experiment has been performed, the experimenter discovers that only a few of the many effects investigated turn out to be important (the “<a href="http://blog.minitab.com/blog/michelle-paret/fast-food-and-identifying-the-vital-few">Pareto rule</a>”).</p>
<p>When two interactions are confounded with one another, the interaction that is the most likely to be significant is the one containing factors whose main effects are themselves significant (based on the so called “heredity” or “hierarchy” principle). These principles are extremely useful to identify the interactions most likely to be important. We can expect only a few effects to be statistically significant, and we can focus on the interactions containing factors whose main effects are themselves significant.</p>
Two Approaches to Selecting Which Interactions Are Important
<p>Taguchi designs are based on prior selection of the most likely interactions, whereas in standard fractional factorial designs, the interactions are selected later on, after the initial results from the designed experiments have been analyzed. The way in which interactions are selected clearly differs between the two approaches.</p>
<p>According to Taguchi, optimizing a process is not sufficient: making processes and products more robust to quality issues and environmental noise is crucial. In this strategy, designed experiments clearly play a central role.</p>
Design of Experiments | Quality Improvement | Mon, 06 Feb 2017 13:00:00 +0000 | http://blog.minitab.com/blog/applying-statistics-in-quality-projects/how-taguchi-designs-differ-from-factorial-designs | Bruno Scibilia
Statistical Tools for Process Validation, Stage 2: Process Qualification
http://blog.minitab.com/blog/michelle-paret/statistical-tools-for-process-validation-stage-2-process-qualification
<p>In its industry guidance to companies that manufacture drugs and biological products for people and animals, the Food and Drug Administration (FDA) recommends three stages for process validation.<img alt="Process Validation Stages" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/26c294a2e9b5b993bfd0f571be11113d/processvalidationstages.jpg" style="width: 220px; height: 235px; margin: 10px 15px; float: right;" /> While my last post covered <a href="http://blog.minitab.com/blog/michelle-paret/statistical-tools-for-process-validation,-stage-1:-process-design">statistical tools for the Process Design stage</a>, here we will focus on the statistical techniques typically utilized for the second stage, Process Qualification.</p>
Stage 2: Process Qualification
<p>During this stage, the process design is evaluated to determine if it is capable of reproducible commercial manufacture. Successful completion of Stage 2 is necessary before commercial distribution.</p>
<span style="color:#008080;"><strong>Example: Evaluate Acceptance Criteria with Capability Analysis</strong></span>
<p>Suppose the active ingredient amount in a tranquilizer needs to be between 360 and 370 mg/mL and you need to assess the quality level, where a minimum Cpk of 1.33 is defined as the acceptance criteria. To assess process performance and determine if measurements are within specification, you can use capability analysis, available in <a href="http://www.minitab.com/products/minitab/">Minitab Statistical Software</a>.</p>
<p>Five samples are randomly selected from 50 batches and the amount of active ingredient is measured. The data is then analyzed relative to the 360 mg/mL minimum and 370 mg/mL maximum.</p>
<p style="margin-left: 40px;"><img alt="Process Capability" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/c48fee09e2caab2f6499c5e1ee74a867/processcapability.jpg" style="width: 400px; height: 300px; margin-left: 15px; margin-right: 15px;" /></p>
<p>The capability analysis reveals a Cpk of 0.53, which fails to meet the acceptance criteria of 1.33. The active ingredient amounts for this tranquilizer are not acceptable. So how can we improve it? The <a href="http://blog.minitab.com/blog/michelle-paret/how-to-improve-cpk">Cp value</a> of 1.41 and the graph both reveal that, although the variability is acceptable with respect to the width of the specification limits, the process average needs to be shifted to a higher mg/mL in order to achieve an acceptable Cpk.</p>
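<p>The relationship between Cp and Cpk can be sketched with the textbook formulas: Cp compares the specification width to six standard deviations, while Cpk uses the distance from the process mean to the nearest specification limit. In the Python sketch below, the function name and the mean and standard deviation values are hypothetical and are not the data behind the graph above; Minitab also estimates the standard deviation from within-subgroup variation, which this sketch simply takes as an input.</p>

```python
def capability(mean, sigma, lsl, usl):
    """Return (Cp, Cpk) from a process mean, standard deviation and spec limits."""
    cp = (usl - lsl) / (6 * sigma)
    cpk = min(usl - mean, mean - lsl) / (3 * sigma)
    return cp, cpk

# Hypothetical processes with the same spread, judged against 360-370 mg/mL.
LSL, USL = 360.0, 370.0
for label, mean in (("centered", 365.0), ("off-center", 362.0)):
    cp, cpk = capability(mean, sigma=1.2, lsl=LSL, usl=USL)
    print(f"{label}: Cp = {cp:.2f}, Cpk = {cpk:.2f}")
```

<p>Both hypothetical processes have the same Cp because their spread is identical, but the off-center one has a much lower Cpk. That is the pattern described above: acceptable variability, but a process average that needs to move.</p>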
<span style="color:#008080;"><strong>Example: Conduct Variation Analysis across Batches</strong></span>
<p>Suppose we want to assess content uniformity, a critical quality characteristic, across 3 batches at 10 locations. To visualize the intra-batch (within-batch) variation and the inter-batch (between-batch) variation, we can create boxplots for each batch.</p>
<p>A boxplot can help us visually assess both the intra- and inter-batch variation, and identify any outliers. This specific graph shows a homogeneous dispersion of measurements both within each batch and between batches. And there are no <a href="http://blog.minitab.com/blog/michelle-paret/how-to-identify-outliers-and-get-rid-of-them">outliers</a>, which Minitab would flag with an asterisk (*). </p>
<p><img alt="Boxplot" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/f7f242d7f0f0ea0b793c91c4cf710ca8/boxplots.jpg" style="width: 400px; height: 267px; margin-left: 15px; margin-right: 15px;" /></p>
<p>Although boxplots are useful tools to conduct a visual assessment, we can also statistically assess whether the variation differs significantly from batch to batch using an equal variances test. The test reveals a p-value greater than an alpha-level of 0.05 (or whatever alpha-level you prefer), so we find no significant difference in variation, which is consistent with the batches being uniform.</p>
<span style="color:#008080;"><strong>Example: Various Applications for Tolerance Intervals</strong></span>
<p>Another useful tool for Process Qualification is the tolerance interval. This tool has multiple applications. For example, tolerance intervals can be used to compare your process to specifications, profile the outcome of a process, or establish acceptance criteria.</p>
<p>For a given product characteristic, a tolerance interval provides a range of values that likely covers a specified proportion of the population (for example, 95%) for a specified confidence level (like 99%).</p>
<p>For example, suppose we want to know how the active ingredient values in the manufacturing process compare to our specification limits. Based on a dose-response study, the limits are 360 to 370 mg/mL.</p>
<p><img alt="Tolerance Interval" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/9a8a69e60f00528975c7b40d52fb8206/toleranceinterval.jpg" style="width: 400px; height: 267px; margin-left: 15px; margin-right: 15px;" /></p>
<p>For this particular data set, Minitab reveals that we can be 99% confident that 95% of the units will be between 362.272 and 367.468 mg/mL. The process bounds therefore indicate that we can meet the requirements of 360 to 370, and we can conclude with high confidence that the process variation is less than the allowable variation, defined by the specification limits.</p>
<p>Or perhaps we need to assess content uniformity using 99% confidence and 99% coverage. We sample 30 tablets and calculate a tolerance interval, revealing that we can be 99% certain that 99% of the tablets will have a content uniformity within some range, calculated using Minitab.</p>
<p>And that’s how you can use various statistical tools to support Process Qualification. In the final post in this series, we’ll explore the Continued Process Verification stage!</p>
Capability Analysis | Data Analysis | Quality Improvement | Statistics | Statistics Help | Stats | Fri, 03 Feb 2017 13:00:00 +0000 | http://blog.minitab.com/blog/michelle-paret/statistical-tools-for-process-validation-stage-2-process-qualification | Michelle Paret
Five Ways to Make Your Control Charts More Effective
http://blog.minitab.com/blog/applying-statistics-in-quality-projects/five-ways-to-make-your-control-charts-more-effective
<p>Have you ever wished your control charts were better? More effective and user-friendly? Easier to understand and act on? In this post, I'll share some simple ways to make SPC monitoring more effective in Minitab.</p>
Common Problems with SPC Control Charts
<p><img alt="manufacturing line SPC" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/afd4836f46aac6afa5d8f6c780319ecf/assembly_line_clean_room.jpg" style="border-width: 1px; border-style: solid; margin: 10px 15px; width: 200px; height: 200px; float: right;" />I worked for several years in a large manufacturing plant in which control charts played a very important role. Thousands of SPC (Statistical Process Control) charts were used to monitor processes, contamination in clean rooms, product thicknesses and shapes, and critical equipment process parameters. Process engineers regularly checked the control charts of the processes they were responsible for. Operators were expected to stop using equipment as soon as an out-of-control alert appeared and to report the incident to their team leader.</p>
<p>But some of the problems we faced had little to do with statistics. For example, comments entered by the operators were often not explicit at all. Control chart limits were not updated regularly and were sometimes no longer appropriate because the process had changed over time. Also, there was confusion about the difference between control limits and specification limits, so even when drifts from the target were clearly identifiable, some process engineers were reluctant to take action as long as their data remained within specifications.</p>
<p>Other problems could be solved with a better knowledge of statistics. For example, some processes were cyclical in nature, and therefore the way subgroups were defined was critical. Also, since the production was based on small batches of similar parts, the within-batch variability was often much smaller than the between-batch variability (simply because the parts within a batch had been processed under very similar conditions). This led to inappropriate control limits when standard X-bar control charts were used.</p>
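<p>To see why within-batch variation can produce misleading X-bar limits, here is a minimal Python sketch. The batch data are hypothetical, and the sigma estimate is a plain pooled standard deviation without the unbiasing constants that statistical software typically applies, so treat it as an illustration rather than a reproduction of Minitab's calculation.</p>

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical data: five batches of three parts each.
# Parts within a batch are very similar; batches differ from one another.
batches = [
    [9.9, 10.0, 10.1],
    [11.9, 12.0, 12.1],
    [8.9, 9.0, 9.1],
    [11.4, 11.5, 11.6],
    [8.4, 8.5, 8.6],
]


def xbar_limits_from_within(batches):
    """Return (LCL, UCL) for batch means using only within-batch variation."""
    n = len(batches[0])
    grand_mean = mean(mean(b) for b in batches)
    # Pooled within-batch standard deviation (equal batch sizes assumed).
    sigma_within = sqrt(mean(variance(b) for b in batches))
    half_width = 3 * sigma_within / sqrt(n)
    return grand_mean - half_width, grand_mean + half_width


lcl, ucl = xbar_limits_from_within(batches)
flagged = [i for i, b in enumerate(batches) if not lcl <= mean(b) <= ucl]
print(f"LCL={lcl:.3f}, UCL={ucl:.3f}, batches outside limits: {flagged}")
```

<p>Because the limits reflect only the small within-batch spread, ordinary batch-to-batch differences are all flagged as out of control in this toy example.</p>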
<p><img alt="Red chart" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/d3d7f5be5a9dc341bedc846891ae8923/red_i_chart_of_data.jpg" style="width: 395px; height: 283px;" /></p>
Five Ways to Make SPC Monitoring Control Charts More Effective
<p>Let's look at some simple ways to make SPC monitoring more effective in Minitab. In addition to creating standard control charts, you can use Minitab to:</p>
<ol>
<li>Import data quickly to identify drifts as soon as possible.</li>
<li>Create Pareto charts to prevent special causes from reoccurring.</li>
<li>Account for atypical periods to avoid inflating your control limits.</li>
<li>Visually manage SPC alerts to quickly identify the out-of-control points.</li>
<li>Choose the right type of charts for your process.</li>
</ol>
1. Identify drifts as soon as possible.
<p>To ensure that your control charts are up to date in Minitab, you can right-click on them and choose “Automatically update Graphs.” However, Minitab is not always available on the shop floor, so the input data often must be saved in an Excel file or in a database.</p>
<p>Suppose that the measurement system generates an XML, Excel or text file, and that this data needs to be reconfigured and manipulated in order to be processed in an SPC chart in Minitab. You can automate these steps using a Minitab macro.</p>
<p>This macro might automatically retrieve data from an XML file, a text file, or a database (using Minitab's ODBC “Open Database Connectivity” functionality) into a Minitab worksheet. It might also transpose rows into columns, stack columns, or merge several files into one. This macro would enable you to obtain a continuously updated Minitab worksheet -- and consequently a continuously updated control chart.</p>
<p>You could easily launch the macro just by clicking on a customized icon or menu in Minitab (see the image below) in order to update the resulting control chart.</p>
<p><img alt="SPC Tool Bar" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/24ab8cb1bf84f3866eee6ae062532984/spc_tool_bar.JPG" style="width: 608px; height: 127px;" /></p>
<p>Alternatively, if the macro is named Startup.mac, it will launch whenever you launch Minitab. If you're using Minitab to enable process operators or engineers to monitor control charts, you could also customize Minitab's toolbars and icons in order to show only the relevant toolbars and icons and focus on SPC.</p>
<p>The product support section of our website has <a href="http://support.minitab.com/minitab/17/topic-library/minitab-environment/interface/customize-the-minitab-interface/customize-menus-toolbars-and-shortcut-keys/">information on adding a button to a menu or toolbar that will update data from a file or a database</a>.</p>
2. Create Pareto charts to prevent special causes from reoccurring.
<p>Statistical Process Control may be used to distinguish the true root causes of quality problems (the so-called special causes) from the surrounding process noise (the so-called common causes). The root causes of quality issues need to be truly understood in order to prevent recurrence.</p>
<p>A Pareto chart of the causes for out-of-control points might be very useful to identify which special causes occur most frequently.</p>
<p>Comments can be entered in a column of the Minitab worksheet for each out-of-control point. These comments should be standardized for each type of problem. A list of keywords displayed in the Minitab worksheet would help operators enter meaningful keywords, instead of comments that differ each time. Then a Pareto chart could be used to identify the 20% of causes that generate 80% of your problems, based on the (standardized) comments entered in the worksheet.</p>
<p><img alt="Pareto" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/884d2caa91a99f824b3d5072d63662f9/pareto_comments.jpg" style="width: 455px; height: 293px;" /></p>
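<p>The counting behind such a Pareto chart can be sketched in a few lines of Python. The comment keywords below are hypothetical; in practice they would come from the standardized comment column of the worksheet.</p>

```python
from collections import Counter

# Hypothetical standardized comments, one per out-of-control point.
comments = (
    ["Tool wear"] * 6
    + ["Wrong setup"] * 3
    + ["Raw material"] * 2
    + ["Operator change"] * 1
)


def pareto_table(comments):
    """Return (cause, count, cumulative percent) rows, most frequent first."""
    counts = Counter(comments)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for cause, count in counts.most_common():
        cumulative += count
        rows.append((cause, count, 100.0 * cumulative / total))
    return rows


rows = pareto_table(comments)
for cause, count, cum_pct in rows:
    print(f"{cause:<16} {count:>3} {cum_pct:6.1f}%")
```

<p>The cumulative percentage column is what lets you see how few causes account for most of the out-of-control points.</p>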
<p>Comments can even be displayed in the SPC chart by using the annotation toolbar. Click on the T (text) icon of the Graph Annotation toolbar.</p>
3. Account for atypical periods to avoid inflating your control limits.
<p>Atypical periods (due to measurement issues, outliers, or a quality crisis) may artificially inflate your control chart limits. In Minitab, control limits may be calculated according to a reference period (one with standard, stable/predictable behavior), or the atypical period may be omitted so that control limits are not affected.</p>
<p>In Minitab, go to Options in the control chart dialog box, look for the Estimate tab and select the subgroups to be omitted (atypical behavior, outliers), or use only some specified subgroups to set reference periods. Although the atypical period will still be displayed on the control chart, it won't affect the way your control limits are estimated.</p>
<p><img alt="Untypical" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/f89ba400982c0be3f8d88f3696306e67/untypical_period.jpg" style="width: 433px; height: 253px;" /></p>
<p>If a reference period has been selected, you will probably need to update it after a certain period of time to ensure that this selection is still relevant.</p>
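<p>As a rough illustration of the effect, the following Python sketch computes individuals-chart limits from the average moving range (divided by the constant 1.128 for moving ranges of length 2), once from all the data and once from a reference period only. The data values and the choice of reference period are hypothetical, and the function is a simplified stand-in, not Minitab's implementation.</p>

```python
from statistics import mean

# Hypothetical individual measurements; points 8 to 10 form an atypical period.
values = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 9.9, 10.0,
          13.5, 14.0, 13.8, 10.0, 10.1, 9.9]


def i_chart_limits(data):
    """Return (LCL, center, UCL) for an individuals chart.

    Sigma is estimated as the average moving range divided by 1.128.
    """
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    sigma = mean(moving_ranges) / 1.128
    center = mean(data)
    return center - 3 * sigma, center, center + 3 * sigma


all_lcl, all_center, all_ucl = i_chart_limits(values)
# Assumed reference period: the first eight, stable observations.
ref_lcl, ref_center, ref_ucl = i_chart_limits(values[:8])

print(f"All data:         {all_lcl:.2f} to {all_ucl:.2f}")
print(f"Reference period: {ref_lcl:.2f} to {ref_ucl:.2f}")
```

<p>The limits based on the reference period are much tighter, because the jumps into and out of the atypical period no longer inflate the average moving range.</p>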
4. Visually manage SPC alerts to quickly identify out-of-control points.
<p>If the number of control charts you deal with is very large, and you need to quickly identify processes that are drifting away from the target, you could display all control charts in a Tile format (go to <strong>Window > Tile</strong>). When the latest data (i.e., the last row of the worksheet) generates an out-of-control warning, you can have the control chart become completely red, as shown in the picture below:</p>
<p><img alt="Red chart" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/fca03b6571dcc13d05d665ccf4463d17/control_charts_w640.jpeg" style="width: 640px; height: 365px;" /></p>
<p>You can do this by going to <strong>Tools > Options</strong>. Select “Control Charts and Quality Tools” on the list, then choose <strong>Other</strong>. Under the words “When last row of data causes a new test failure for any point,” check the box that says “Change color of chart.” Note that the color will change according to the last row (latest <em>single value</em>), not according to the latest subgroup, so this option is more effective when collecting individual values.</p>
5. Choose the right type of charts for your process.
<p>When it comes to control charts, one size does not fit all. That's why you'll see a wide array of options when you select <strong>Stat > Control Charts</strong>. Be sure that you're matching the control chart you're using to the type of data and information you want to monitor. For example, if your subgroups are based on batches of products, I-MR-R/S (within/between) charts are probably best suited to monitor your process.</p>
<p>If you're not sure <a href="http://blog.minitab.com/blog/understanding-statistics/what-control-chart-should-i-use">which control chart to use</a>, you can get details about each type from the Help menu in Minitab, or try using the Assistant menu to direct you to the best chart for your situation.</p>
Control Charts, Lean Six Sigma, Quality Improvement, Six Sigma, Stats
Wed, 01 Feb 2017 13:00:00 +0000
http://blog.minitab.com/blog/applying-statistics-in-quality-projects/five-ways-to-make-your-control-charts-more-effective
Bruno Scibilia
How to Use Data to Understand and Resolve Differences in Opinion, Part 3
http://blog.minitab.com/blog/understanding-statistics/how-to-use-data-to-understand-and-resolve-differences-in-opinion-part-3
<p>In the first part of this series, we saw how <a href="http://blog.minitab.com/blog/understanding-statistics/how-to-use-data-to-understand-and-resolve-differences-in-opinion-part-1">conflicting opinions about a subjective factor</a> can create business problems. In part 2, we used Minitab's Assistant feature to set up an attribute agreement analysis study that will provide a better understanding of where and when such disagreements occur.</p>
<p>We asked four loan application reviewers to reject or approve 30 selected applications, two times apiece. Now that we've collected that data, we can analyze it. If you'd like to follow along, you can download the data set <a href="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6999a9517a572ff2fc3df681c36b3e44/loan_application_attribute_agreement_analysis.mtw">here</a>.</p>
<p>As is so often the case, you don't need statistical software to do this analysis—but with 240 data points to contend with, a computer and software such as <a href="http://www.minitab.com/products/minitab">Minitab</a> will make it much easier. </p>
Entering the Attribute Agreement Analysis Study Data
<p>Last time, we showed that the only data we need to record is whether each appraiser approved or rejected the sample application in each case. Using the data collection forms and the worksheet generated by Minitab, it's very easy to fill in the Results column of the worksheet. </p>
<p style="margin-left: 40px;"><img alt="attribute agreement analysis worksheet data entry" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/41387399781177418d2cc236755a4f41/attribute_agreement_worksheet_data_entry.png" style="width: 448px; height: 324px;" /></p>
Analyzing the Attribute Agreement Analysis Data
<p>The next step is to use statistics to better understand how well the reviewers agree with each other's assessments, and how consistently they judge the same application when they evaluate it again. Choose <strong>Assistant > Measurement Systems Analysis (MSA)...</strong> and press the <em>Attribute Agreement Analysis</em> button to bring up the appropriate dialog box: </p>
<p style="margin-left: 40px;"><img alt="attribute agreement analysis assistant selection" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/36dfe0d806a026f66083efd4e4e8e3be/assistant_msa_dialog.png" style="width: 500px; height: 393px;" /></p>
<p>The resulting dialog couldn't be easier to fill out. Assuming you used the Assistant to create your worksheet, just select the columns that correspond to each item in the dialog box, as shown: </p>
<p style="margin-left: 40px;"><img alt="attribute agreement analysis dialog box" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a3b70ac28bd2783a6414e6f3ed6583ae/attribute_agreement_analysis_dialog.png" style="width: 500px; height: 285px;" /></p>
<p>If you set up your worksheet manually, or renamed the columns, just choose the appropriate column for each item. Select the value for good or acceptable items—"Accept," in this case—then press OK to analyze the data. </p>
Interpreting the Results of the Attribute Agreement Analysis
<p>Minitab's Assistant generates four reports as part of its attribute agreement analysis. The first is a summary report, shown below: </p>
<p style="margin-left: 40px;"><img alt="attribute agreement analysis summary report" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b7d0dab58d734d058f6a34830d81f0af/attribute_agreement_analysis_summary_report.png" style="width: 600px; height: 471px;" /></p>
<p>The green bar at top left of the report indicates that overall, the error rate of the application reviewers is 15.8%. That's not as bad as it could be, but it certainly indicates that there's room for improvement! The report also shows that 13% of the time, the reviewers rejected applications that should be accepted, and they accepted applications that should be rejected 18% of the time. In addition, the reviewers rated the same item two different ways almost 22% of the time.</p>
<p>The bar graph in the lower left indicates that Javier and Julia have the lowest accuracy percentages among the reviewers at 71.7% and 78.3%, respectively. Jim has the highest accuracy, with 96.7%, followed by Jill at 90%.</p>
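<p>The arithmetic behind these percentages is simple enough to check by hand or with a short Python sketch. The counts of correct ratings below are not taken from the data set; they are back-calculated assumptions that happen to reproduce the reported accuracy percentages and the 15.8% overall error rate, given 60 ratings per reviewer (30 applications, rated twice).</p>

```python
# Assumed counts of correct ratings out of 60 per reviewer,
# inferred from the reported percentages (not read from the worksheet).
correct = {"Javier": 43, "Julia": 47, "Jill": 54, "Jim": 58}
RATINGS_PER_REVIEWER = 30 * 2  # 30 applications, each rated twice

accuracy = {
    name: 100 * count / RATINGS_PER_REVIEWER
    for name, count in correct.items()
}

total_ratings = RATINGS_PER_REVIEWER * len(correct)
total_errors = total_ratings - sum(correct.values())
overall_error_rate = 100 * total_errors / total_ratings

for name, pct in accuracy.items():
    print(f"{name:<7} {pct:5.1f}% accurate")
print(f"Overall error rate: {overall_error_rate:.1f}%")
```

<p>Under these assumed counts, 38 of the 240 ratings are wrong, which is where an overall error rate of about 15.8% would come from.</p>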
<p>The second report from the Assistant, shown below, provides a graphic summary of the accuracy rates for the analysis.</p>
<p style="margin-left: 40px;"><img alt="attribute agreement analysis accuracy report" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e0797dc24089c1624fdc9cd4f881b391/attribute_agreement_analysis_accuracy_report.png" style="width: 600px; height: 471px;" /></p>
<p>This report illustrates the 95% confidence intervals for each reviewer in the top left, and further breaks them down by standard (accept or reject) in the graphs on the right side of the report. When two intervals don't overlap, the corresponding accuracy percentages are likely to be different. We can see that Javier and Jim have different overall accuracy percentages. In addition, Javier and Jim have different accuracy percentages when it comes to assessing those applications that should be rejected. However, most of the other confidence intervals overlap, suggesting that the reviewers share similar abilities. Javier clearly has the most room for improvement, but none of the reviewers is performing terribly when compared to the others. </p>
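<p>To make the overlap check concrete, here is a hedged Python sketch that computes 95% Wilson score intervals for two reviewers. Two caveats apply: the counts (43 and 58 correct out of 60) are assumptions inferred from the reported percentages, and the Wilson interval is a swapped-in method chosen because it needs only the standard library. Minitab's report may use a different interval method, so the bounds need not match the graph.</p>

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Return the Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Assumed counts of correct ratings out of 60 (inferred, not from the data set).
javier = wilson_interval(43, 60)
jim = wilson_interval(58, 60)

print(f"Javier: {javier[0]:.3f} to {javier[1]:.3f}")
print(f"Jim:    {jim[0]:.3f} to {jim[1]:.3f}")
print("Intervals overlap:", javier[1] >= jim[0])
```

<p>With these inputs the two intervals do not overlap, which is consistent with the conclusion that Javier and Jim differ in overall accuracy.</p>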
<p>The Assistant's third report shows the most frequently misclassified items, and individual reviewers' misclassification rates:</p>
<p style="margin-left: 40px;"><img alt="attribute agreement analysis misclassification report" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/42c982c39fb1eb3835fe1382d4ccc1f0/attribute_agreement_analysis_misclassification_report.png" style="width: 600px; height: 471px;" /></p>
<p>This report shows that App 9 gave the reviewers the most difficulty, as it was misclassified almost 80% of the time. (A check of the application revealed that this was indeed a borderline application, so the fact that it proved challenging is not surprising.) Among the reject applications that were mistakenly accepted, App 5 was misclassified about half of the time. </p>
<p>The individual appraiser misclassification graphs show that Javier and Julia both misclassified acceptable applications as rejects about 20% of the time, but Javier accepted "reject" applications nearly 40% of the time, compared to roughly 20% for Julia. However, Julia rated items both ways nearly 40% of the time, compared to 30% for Javier. </p>
<p>The last item produced as part of the Assistant's analysis is the report card:</p>
<p style="margin-left: 40px;"><img alt="attribute agreement analysis report card" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/01e31b93979be4847836cfa28fc176bc/attribute_agreement_analysis_report_card.png" style="width: 600px; height: 471px;" /></p>
<p>This report card provides general information about the analysis, including how accuracy percentages are calculated. It also can alert you to potential problems with your analysis (for instance, an imbalance between the numbers of acceptable and rejectable items being evaluated); in this case, there are no alerts we need to be concerned about. </p>
Moving Forward from the Attribute Agreement Analysis
<p>The results of this attribute agreement analysis give the bank a clear indication of how the reviewers can improve their overall accuracy. Based on the results, the loan department provided additional training for Javier and Julia (who also were the least experienced reviewers on the team), and also conducted a general review session for all of the reviewers to refresh their understanding about which factors on an application were most important. </p>
<p>However, training may not always solve problems with inconsistent assessments. In many cases, the criteria on which decisions should be based are either unclear or nonexistent. "Use your common sense" is not a defined guideline! In this case, the loan officers decided to create very specific checklists that the reviewers could refer to when they encountered borderline cases. </p>
<p>After the additional training sessions were complete and the new tools were implemented, the bank conducted a second attribute agreement analysis, which verified improvements in the reviewers' accuracy. </p>
<p>If your organization is challenged by honest disagreements over "judgment calls," an attribute agreement analysis may be just the tool you need to get everyone back on the same page. </p>
Data Analysis, Lean Six Sigma, Quality Improvement, Six Sigma, Statistics
Mon, 30 Jan 2017 13:04:00 +0000
http://blog.minitab.com/blog/understanding-statistics/how-to-use-data-to-understand-and-resolve-differences-in-opinion-part-3
Eston Martz
So Why Is It Called "Regression," Anyway?
http://blog.minitab.com/blog/statistics-and-quality-data-analysis/so-why-is-it-called-regression-anyway
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/9c1470fb31867e73c62035d681904ea9/regression_image_real_2.jpg" style="width: 275px; height: 190px; float: right; margin: 10px 15px;" />Did you ever wonder why statistical analyses and concepts often have such weird, cryptic names?</p>
<p>One conspiracy theory points to the workings of a secret committee called the ICSSNN. The International Committee for Sadistic Statistical Nomenclature and Numerophobia was formed solely to befuddle and subjugate the masses. Its mission: To select the most awkward, obscure, and confusing name possible for each statistical concept.</p>
<p>A whistle-blower recently released the following transcript of a secretly recorded ICSSNN meeting:</p>
<p>"This statistical analysis seems pretty straightforward…"</p>
<p>“What does it do?”</p>
<p>“It describes the relationship between one or more 'input' variables and an 'output' variable. It gives you an equation to predict values for the 'output' variable, by plugging in values for the input variables."</p>
<p>“Oh dear. That sounds disturbingly transparent.”</p>
<p>“Yes. We need to fix that—call it something grey and nebulous. What do you think of 'regression'?”</p>
<p>“What’s 'regressive' about it?”</p>
<p>“Nothing at all. That’s the point!”</p>
<p>“<em>Re</em>-<em>gres</em>-<em>sion</em>. It does sound intimidating. I’d be afraid to try that alone.”</p>
<p>“Are you sure it’s completely unrelated to anything? Sounds a lot like 'digression.' Maybe it’s what happens when you add up umpteen sums of squares…you forget what you were talking about.”</p>
<p>“Maybe it makes you regress and relive your traumatic memories of high school math…until you revert to a fetal position?”</p>
<p>“No, no. It’s not connected with anything concrete at all.”</p>
<p>“Then it’s perfect!”</p>
<div>
<p> “I don’t know...it only has 3 syllables. I’d feel better if it were at least 7 syllables and hyphenated.”<br />
<br />
“I agree. Phonetically, it’s too easy…people are even likely to pronounce it correctly. Could we add a uvular fricative, or an interdental retroflex followed by a sustained turbulent trill?”</p>
The Real Story: How Regression Got Its Name
<p>Conspiracy theories aside, the term “regression” in statistics was probably not a result of the workings of the ICSSNN. Instead, the term is usually attributed to Sir Francis Galton.</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/ce9c6ec5cf3a4d4be86e4b14efceac6e/galton.jpg" style="float: right; width: 150px; height: 204px; margin: 10px 15px;" />Galton was a 19th century English Victorian who wore many hats: explorer, inventor, meteorologist, anthropologist, and—most important for the field of statistics—an inveterate measurement nut. You might call him a statistician’s statistician. Galton just couldn’t stop measuring anything and everything around him.</p>
<p>During a meeting of the Royal Geographical Society, Galton devised a way to roughly quantify boredom: he counted the number of fidgets of the audience in relation to the number of breaths he took (he didn’t want to attract attention using a timepiece). Galton then converted the results on a time scale to obtain a mean rate of 1 fidget per minute per person. Decreases or increases in the rate could then be used to gauge audience interest levels. (That mean fidget rate was calculated in 1885. I’d guess the mean fidget rate is astronomically higher today—especially if glancing at an electronic device counts as a fidget.)</p>
<p>Galton also noted the importance of considering sampling bias in his fidget experiment:</p>
<p><em>“These observations should be confined to persons of middle age. Children are rarely still, while elderly philosophers will sometimes remain rigid for minutes.”</em></p>
<p>But I regress…</p>
<p>Galton was also keenly interested in heredity. In one experiment, he collected data on the heights of 205 sets of parents with adult children. To make male and female heights directly comparable, he rescaled the female heights, multiplying them by a factor of 1.08. Then he calculated the average of the two parents' heights (which he called the “mid-parent height”) and divided the parents into groups based on the range of their mid-parent heights. The results are shown below, replicated on a Minitab graph.</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/ffe53a1d16d9af0934c678bcd5cd558a/galton_graph_1_w1024.jpeg" style="width: 600px; height: 469px;" /></p>
<p>For each group of parents, Galton then measured the heights of their adult children and plotted their median heights on the same graph.</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/00ee5df21329bceb64b34d127f71f789/galton_graph_2_w1024.jpeg" style="width: 600px; height: 506px;" /></p>
<p>Galton fit a line to each set of heights, and added a reference line to show the average adult height (68.25 inches).</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/8af2e39c47cf6d337cf33922a623635a/galton_graph_3_w1024.jpeg" style="width: 600px; height: 495px;" /></p>
<p>Like most statisticians, Galton was all about deviance. So he represented his results in terms of deviance from the average adult height.</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/329474034fe47e0e67cf40d4fadc0bef/galton_graph_4_w1024.jpeg" style="width: 600px; height: 472px;" /></p>
<p>Based on these results, Galton concluded that as heights of the parents deviated from the average height (that is, as they became taller or shorter than the average adult), their children tended to be less extreme in height. That is, the heights of the children <em>regressed</em> to the average height of an adult.</p>
<p>He calculated the rate of regression as 2/3 of the deviance value. So if the average height of the two parents was, say, 3 inches taller than the average adult height, their children would tend to be (on average) approximately 2/3*3 = 2 inches taller than the average adult height.</p>
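<p>Galton's rule of thumb is easy to express as a small calculation. The Python sketch below uses the figures quoted in this post (the 1.08 scaling factor, the 68.25-inch average, and the 2/3 rate); the function names and the example parent heights are hypothetical.</p>

```python
AVERAGE_ADULT_HEIGHT = 68.25  # inches, the reference line in Galton's graph
FEMALE_SCALE = 1.08           # factor Galton applied to female heights
REGRESSION_RATE = 2 / 3       # Galton's estimated rate of regression


def mid_parent_height(father, mother):
    """Average of the father's height and the rescaled mother's height."""
    return (father + FEMALE_SCALE * mother) / 2


def predicted_child_height(mid_parent):
    """Expected adult child height under Galton's 2/3 rule."""
    deviation = mid_parent - AVERAGE_ADULT_HEIGHT
    return AVERAGE_ADULT_HEIGHT + REGRESSION_RATE * deviation


# Parents whose mid-parent height is 3 inches above average.
tall_mid_parent = AVERAGE_ADULT_HEIGHT + 3
child = predicted_child_height(tall_mid_parent)
print(f"Predicted child height: {child:.2f} inches "
      f"({child - AVERAGE_ADULT_HEIGHT:.2f} above average)")

# Hypothetical couple: father 72 inches, mother 65 inches.
print(f"Mid-parent height: {mid_parent_height(72, 65):.2f} inches")
```

<p>The first example reproduces the case in the text: a mid-parent height 3 inches above average corresponds to children about 2 inches above average.</p>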
<p>Galton published his results in a paper called “<em>Regression towards Mediocrity in Hereditary Stature.</em>”</p>
<p>So here’s the irony: The term regression, as Galton used it, didn't refer to the statistical procedure he used to determine the fit lines for the plotted data points. In fact, Galton didn’t even use the least-squares method that we now most commonly associate with the term “regression.” (The least-squares method had already been developed some 80 years previously by Gauss and Legendre, but wasn’t called “regression” yet.) In his study, Galton just "eyeballed" the data values to draw the fit line.</p>
<p>For Galton, “regression” referred only to the tendency of extreme data values to "revert" to the overall mean value. In a biological sense, this meant a tendency for offspring to revert to average size ("mediocrity") as their parentage became more extreme in size. In a statistical sense, it meant that, with repeated sampling, a variable that is measured to have an extreme value the first time tends to be closer to the mean when you measure it a second time. </p>
<p>Later, as he and other statisticians built on the methodology to quantify correlation relationships and to fit lines to data values, the term “regression” became associated with the statistical analysis that we now call regression. But it was just by chance that Galton's original results using a fit line happened to show a <em>regression</em> of heights. If his study had shown increasing deviance of children's heights from the average compared to their parents, perhaps we'd be calling it "progression" instead.</p>
<p>So, you see, there’s nothing particularly “regressive” about a regression analysis.</p>
<p>And <em>that </em>makes the ICSSNN <em>very</em> happy. <img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/fba2883ce387826f291937af897d1024/winker_3.jpg" style="width: 20px; height: 20px;" /></p>
Don't Regress....<em>Progress</em>
<p>Never let <span><a href="http://blog.minitab.com/blog/understanding-statistics/10-statistical-terms-designed-to-confuse-non-statisticians">intimidating terminology</a></span> deter you from using a statistical analysis. The sign on the door is often much scarier than what's behind it. Regression is an intuitive, practical statistical tool with broad and powerful applications.</p>
<p>If you’ve never performed a regression analysis before, a good place to start is the Minitab Assistant. See Jim Frost’s post on <a href="http://blog.minitab.com/blog/adventures-in-statistics/multiple-regression-analysis-and-response-optimization-examples-using-the-assistant-in-minitab-17" target="_blank">using the Assistant to perform a multiple regression analysis.</a> Jim has also compiled a <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-tutorial-and-examples" target="_blank">helpful compendium of blog posts on regression</a>.</p>
<p>And don’t forget Minitab Help. In Minitab, choose <strong>Help > Help</strong>. Then click <strong>Tutorials > Regression</strong>, or <strong>Stat Menu > Regression</strong>.</p>
Sources
<p>Bulmer, M. <em>Francis Galton: Pioneer of Heredity and Biometry.</em> Johns Hopkins University Press, 2003.</p>
<p>Davis, L. J. <em>Obsession: A History.</em> University of Chicago Press, 2008.</p>
<p>Galton, F. “Regression towards Mediocrity in Hereditary Stature.” <a href="http://galton.org/essays/1880-1889/galton-1886-jaigi-regression-stature.pdf">http://galton.org/essays/1880-1889/galton-1886-jaigi-regression-stature.pdf</a></p>
<p>Gillham, N. W. <em>A Life of Sir Francis Galton. </em>Oxford University Press, 2001.</p>
<p>Gould, S. J. <em>The Mismeasure of Man.</em> W. W. Norton, 1996.</p>
</div>
Fun Statistics, Learning, Regression Analysis, Statistics
Fri, 27 Jan 2017 13:00:00 +0000
http://blog.minitab.com/blog/statistics-and-quality-data-analysis/so-why-is-it-called-regression-anyway
Patrick Runkel
The Empirical CDF, Part 1: What's a CDF?
http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/the-empirical-cdf-part-1-whats-a-cumulative-distribution-function
<p>'Twas the season for toys recently, and Christmas day found me playing around with a classic, the Etch-a-Sketch. As I noodled with the knobs, I had a sudden flash of recognition: my drawing reminded me of the Empirical CDF Plot in <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">Minitab Statistical Software</a>. Did you just ask, "What's a CDF plot? And what's so empirical about it?" Both very good questions. Let's start with the first, and we'll save that second question for a future post.<img alt="etch-a-sketch" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/76570a14fdb86c06064023036dd40547/etchasketch.jpg" style="width: 300px; height: 248px; margin: 10px 15px; float: right;" /></p>
<p>The acronym CDF stands for Cumulative Distribution Function. If, like me, you're a big fan of failures, then you might be familiar with the cumulative failure plot that you can create with some Reliability/Survival tools in Minitab. (For an entertaining and offbeat example, check out this excellent post, <a href="http://blog.minitab.com/blog/fun-with-statistics/what-i-learned-from-treating-childbirth-as-a-failure" target="_blank">What I Learned from Treating Childbirth as a Failure</a>.) The cumulative failure plot is a CDF.</p>
<p>Even if you're not a fan of failure plots and CDFs, you're likely very familiar with the CDF's famous cousin, the PDF or Probability Density Function. The classic "bell curve" is no more (and no less) than a PDF of a normal distribution.</p>
<p>For example, here's a histogram with a fitted normal PDF for <a href="http://support.minitab.com/en-us/datasets/capability-data-sets/connector-pin-lengths/" target="_blank">PinLength.MTW</a>, from Minitab's online Data Set Library.</p>
<p style="margin-left: 40px;"><img alt="zzz" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/6052805143cdc9ba385a756d763cb14b/image1.png" style="width: 370px; height: 250px;" /></p>
<p>To create this plot, do the following:</p>
<ol>
<li>Download the data file, <a href="http://support.minitab.com/en-us/datasets/capability-data-sets/connector-pin-lengths/" target="_blank">PinLength.MTW</a>, and open it in Minitab.</li>
<li>Choose <strong>Graph > Histogram > With Fit</strong>, and click <strong>OK</strong>.</li>
<li>In <strong>Graph variables</strong>, enter <em>Length</em>.</li>
<li>Click the <strong>Scale</strong> button.</li>
<li>On the <strong>Y-Scale Type</strong> tab, choose <strong>Percent</strong>.</li>
<li>Click <strong>OK </strong>in each dialog box.</li>
</ol>
<p>The data are from a sample of 100 connector pins. The histogram and fitted line show that the lengths of the pins (shown on the x-axis) roughly follow a normal distribution with a mean of 19.26 and a standard deviation of 2.154. You can get the specifics for each bin of the histogram by hovering over the corresponding bar.</p>
<p style="margin-left: 40px;"><img alt="zzz" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/ec002dc01014710c2c4ea5c2e2461bc9/image2.png" style="width: 300px; height: 198px;" /></p>
<p>The height of each bar represents the percentage of observations in the sample that fall within the specified lengths. For example, the fifth bar is the tallest. Hovering over the fifth bar reveals that 18% of the pins have lengths that fall between 18.5 mm and 19.5 mm. Remember that for a moment.</p>
<p>Now let's try something a little different.</p>
<ol>
<li>Double-click the y-axis.</li>
<li>On the <strong>Type</strong> tab, select <strong>Accumulate values across bins</strong>.</li>
<li>Click <strong>OK</strong>.</li>
</ol>
<p style="margin-left: 40px;"><img alt="zzz" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/47f6642ec0e5f99e1b3bb35fef0d21e3/image3.png" style="width: 304px; height: 265px;" /> </p>
<p style="margin-left: 40px;"><img alt="zzz" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/d6e1aa7795624bf8ced35da04130209b/image4.png" style="width: 370px; height: 250px;" /></p>
<p>It looks very different, but it's the exact same data. The difference is that the bar heights now represent <em>cumulative</em> percentages. In other words, each bar represents the percentage of pins with the specified lengths <em>or smaller</em>.</p>
<p style="margin-left: 40px;"><img alt="zzz" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/48616bd270aabb9c1619790f488551b3/image5.png" style="width: 308px; height: 201px;" /><img alt="zzz" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/6a4318376314ccea6634e122567da52a/image6.png" style="width: 289px; height: 201px;" /></p>
<p>For example, the height of the fifth bar indicates that 55% of the pin lengths are less than 19.5 mm. The height of the fourth bar indicates that 37% of pin lengths are 18.5 or less. The difference in height between the 2 bars is 18, which tells us that 18% of the pins have lengths between 18.5 and 19.5. Which, if you remember, we already knew from our first graph. So the cumulative bars look different, but it's just another way of conveying the same information.</p>
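<p>If you'd like to see that arithmetic spelled out, here is a minimal Python sketch. Only the fifth bin (18%) and the cumulative totals of 37% and 55% come from the graphs above; the other bin percentages, and the number of bins, are hypothetical placeholders chosen so that everything adds up to 100%.</p>

```python
from itertools import accumulate

# Percent of pins in each histogram bin, left to right.
# Only the fifth value (18) is taken from the post; the rest are
# hypothetical placeholders that sum to 100.
bin_percents = [3, 7, 12, 15, 18, 16, 13, 9, 5, 2]

# Accumulating across bins turns bar heights into running totals.
cumulative = list(accumulate(bin_percents))

fourth_bar = cumulative[3]   # percent of pins at 18.5 or less
fifth_bar = cumulative[4]    # percent of pins below 19.5

print(fourth_bar, fifth_bar, fifth_bar - fourth_bar)
```

<p>Subtracting one cumulative bar from the next recovers the original bar height, which is why the two graphs carry the same information.</p>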
<p>You may have also noticed that the fitted line no longer looks like a bell curve. That's because when we changed to a cumulative y-axis, Minitab changed the fitted line from a PDF to... you guessed it, a cumulative distribution function (CDF). Like the cumulative bars, the cumulative distribution function represents the cumulative percentage of observations that have values less than or equal to X. Basically, the CDF of a distribution gives us the <a href="http://blog.minitab.com/blog/understanding-statistics/difference-between-probability-and-cumulative-probability">cumulative probabilities</a> from the PDF of the same distribution.</p>
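<p>In symbols, and assuming the fitted curve is the usual normal density with mean μ and standard deviation σ (19.26 and 2.154 for the pin data), the relationship between the two functions can be written like this:</p>

```latex
F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt,
\qquad
f(t) = \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left(-\frac{(t-\mu)^{2}}{2\sigma^{2}}\right)
```

<p>Here f is the PDF and F is the CDF: the height of the CDF at x is the area under the PDF to the left of x.</p>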
<p>I'll show you what I mean. Choose <strong>Graph > Probability Distribution Plot > View Probability</strong>, and click <strong>OK</strong>. Then enter the parameters and x-value as shown here, and click OK.</p>
<p style="margin-left: 40px;"><img alt="zzz" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/d46f65baffc43b5c5b528ba140e9574f/image7.png" style="width: 387px; height: 216px;" /><img alt="zzz" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/4cec943a971c38b1a599f3774cc98b54/image8.png" style="width: 371px; height: 290px;" /></p>
<p style="margin-left: 40px;"><img alt="zzz" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/aa8497118e28cebc4b505365d33f95be/image9.png" style="width: 370px; height: 250px;" /></p>
<p>The "Left Tail" probabilities are cumulative probabilities. The plot tells us that the probability of obtaining a random value that is less than or equal to 16 is about 0.065. That's another way of saying that 6.5% of the values in this hypothetical population are less than or equal to 16.</p>
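<p>You can check that left-tail probability outside of Minitab, too. This is just a sketch using <code>NormalDist</code> from Python's standard <code>statistics</code> module with the mean and standard deviation quoted above; the variable names are my own.</p>

```python
from statistics import NormalDist

# Normal distribution with the fitted parameters from the histogram.
pin_lengths = NormalDist(mu=19.26, sigma=2.154)

# Cumulative (left-tail) probability of a value of 16 or less.
left_tail = pin_lengths.cdf(16)

print(f"P(X <= 16) = {left_tail:.3f}")
```

<p>The result should land at about 0.065, in line with the shaded area on the plot.</p>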
<p>Now we can create a CDF using the same parameters:</p>
<ol>
<li>Choose <strong>Graph > Empirical CDF > Single</strong> and click <strong>OK</strong>.</li>
<li>In <strong>Graph variables</strong>, enter <em>Length</em>.</li>
<li>Click the <strong>Distribution </strong>button.</li>
<li>On the <strong>Data Display</strong> tab, select <strong>Distribution fit only</strong>.</li>
<li>Click <strong>OK</strong>, then click the <strong>Scale </strong>button.</li>
<li>On the <strong>Percentile Lines</strong> tab, under <strong>Show percentile lines at data values</strong>, enter <em>16</em>.</li>
</ol>
<p style="margin-left: 40px;"><img alt="zzz" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/a37d56fe41d8623fe3d975309a54b569/image10.png" style="width: 370px; height: 250px;" /></p>
<p>The CDF tells us that 6.5% of the values in this distribution are less than or equal to 16, as did the PDF.</p>
<p>Let's try another. Double-click the shaded area on the PDF and change x to 19.26, which is the mean of the distribution.</p>
<p style="margin-left: 40px;"><img alt="zzz" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/a7b3d59b5a621d47622f424f0dd364e7/image11.png" style="width: 370px; height: 250px;" /></p>
<p>Naturally, because we're dealing with a perfect theoretical normal distribution here, half of the values in the hypothetical population are less than or equal to the mean. You can also visualize this on the CDF by adding another percentile line. Click the CDF and choose <strong>Editor > Add > Percentile Lines</strong>. Then enter 19.26 under <strong>Show percentile lines at data values</strong>.</p>
<p style="margin-left: 40px;"><img alt="zzz" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/8a1631f3919dd72f300dc5c8658fd789/image12.png" style="width: 370px; height: 250px;" /></p>
<p>There's a little bit of rounding error, but the CDF tells us the same thing that we learned from the PDF, namely that 50% of the values in the distribution are less than or equal to the mean.</p>
<p>Finally, let's input a probability and determine the associated x-value. Double-click the shaded area on the PDF, but this time enter a probability of 0.95 as shown:</p>
<p style="margin-left: 40px;"><img alt="zzz" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/5c659d341a23d4847124633dc66551e0/image13.png" style="width: 249px; height: 321px;" /></p>
<p style="margin-left: 40px;"><img alt="zzz" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/77567bc0699b301691909ac1ba826854/image14.png" style="width: 370px; height: 250px;" /></p>
<p>The PDF shows that the x-value that is associated with a cumulative probability of 0.95 is 22.80. Now right-click the CDF and choose <strong>Add > Percentile Lines</strong>. This time, under <strong>Show percentile lines at Y values</strong>, enter <em>95</em> for 95%. </p>
<p style="margin-left: 40px;"><img alt="zzz" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/2778478ca329cd736a2fb49792a40278/image16.png" style="width: 370px; height: 250px;" /></p>
<p>Once again, other than a little rounding error, the CDF tells us the same thing as the PDF.</p>
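<p>Going from a probability back to an x-value means reading the CDF in reverse. As a rough cross-check, here is a small Python sketch that uses the inverse CDF of a normal distribution with the same mean and standard deviation; again, the variable names are mine and not anything from Minitab.</p>

```python
from statistics import NormalDist

# Same theoretical normal distribution as before.
pin_lengths = NormalDist(mu=19.26, sigma=2.154)

# Inverse CDF: the x-value with 95% of the distribution at or below it.
x_95 = pin_lengths.inv_cdf(0.95)

# Feeding that x-value back into the CDF should return 0.95.
round_trip = pin_lengths.cdf(x_95)

print(f"x at 95% = {x_95:.2f}")
print(f"CDF at that x = {round_trip:.2f}")
```

<p>The x-value comes out at roughly 22.80, matching the plots apart from rounding.</p>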
<p>For most people (maybe everyone?), the PDF is an easier way to visualize the shape of a distribution. But the nice thing about the CDF is that there's no need to look up probabilities for each x-value individually: all of the x-values in the distribution and the associated cumulative probabilities are right there on the curve.</p>
Data Analysis, Fun Statistics, Learning, Statistics, Statistics Help, Stats | Wed, 25 Jan 2017 13:05:00 +0000 | http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/the-empirical-cdf-part-1-whats-a-cumulative-distribution-function | Greg Fox
How to Use Data to Understand and Resolve Differences in Opinion, Part 2
http://blog.minitab.com/blog/understanding-statistics/how-to-use-data-to-understand-and-resolve-differences-in-opinion-part-2
<p>Previously, I discussed how business <a href="http://blog.minitab.com/blog/understanding-statistics/how-to-use-data-to-understand-and-resolve-differences-in-opinion-part-1">problems arise when people have conflicting opinions about a subjective factor</a>, such as whether something is the right color or not, or whether a job applicant is qualified for a position. The key to resolving such honest disagreements and handling future decisions more consistently is a statistical tool called attribute agreement analysis. In this post, we'll cover how to set up and conduct an attribute agreement analysis. </p>
Does This Applicant Qualify, or Not?
<p>A busy loan office for a major financial institution processed many applications each day. A team of four reviewers inspected each application and categorized it as Approved, in which case it went on to a loan officer for further handling, or Rejected, in which case the applicant received a polite note declining to fulfill the request. <img alt="filling out an application" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/64918ddbce0abb888c85031374765491/filling_out_paper.png" style="width: 300px; height: 223px; margin: 10px 15px; float: right; border-width: 1px; border-style: solid;" /></p>
<p>The loan officers began noticing inconsistency in approved applications, so the bank decided to conduct an attribute agreement analysis on the application reviewers.</p>
<p>Two outcomes were possible: </p>
<p style="margin-left: 40px;"><strong>1. The reviewers make the right choice most of the time.</strong> If this is the case, loan officers can be confident that the reviewers do a good job, rejecting risky applicants and approving applicants with potential to be good borrowers. </p>
<p style="margin-left: 40px;"><strong>2. The reviewers too often choose incorrectly.</strong> In this case, the loan officers might not be focusing their time on the best applications, and some qualified applicants may be rejected incorrectly. </p>
<p>One particularly useful thing about an attribute agreement analysis: even if reviewers make the wrong choice too often, the results will indicate where the reviewers make mistakes. The bank can then use that information to help improve the reviewers' performance. </p>
The Basic Structure of an Attribute Agreement Analysis
<p>A typical attribute agreement analysis asks individual appraisers to evaluate multiple samples, which have been selected to reflect the range of variation they are likely to observe. The appraisers review each sample item several times, so the analysis reveals not only how well individual appraisers agree with each other, but also how consistently each appraiser evaluates the same item. </p>
<p>For this study, the loan officers selected 30 applications, half of which the officers agreed should receive approval and half of which should be rejected. These included both obvious and borderline applications. </p>
<p>Next, each of the four reviewers was asked to approve or reject the 30 applications two times. These evaluation sessions took place one week apart, to make it less likely that the reviewers would remember how they'd classified the applications the first time. The applications were randomly ordered each time.</p>
<p>The reviewers did not know how the applications had been rated by the loan officers. In addition, they were asked not to talk about the applications until after the analysis was complete, to avoid biasing one another. </p>
Using Software to Set Up the Attribute Agreement Analysis
<p>You don't <em>need </em>to use software to perform an attribute agreement analysis, but a program like <a href="http://www.minitab.com/products/minitab">Minitab</a> makes it easier to plan the study, gather the data, and analyze the data once you have it. There are two ways to set up your study in Minitab. </p>
<p>The first way is to go to <strong>Stat > Quality Tools > Create Attribute Agreement Analysis Worksheet...</strong> as shown here: </p>
<p style="margin-left: 40px;"><img alt="create attribute agreement analysis worksheet" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3925c8ebdb8bb03a73638f78dfbc0c3d/attribute_agreement_stat_menu.png" style="width: 510px; height: 495px;" /></p>
<p>This option calls up an easy-to-follow dialog box that will set up your study, randomize the order of reviewer evaluations, and permit you to print out data collection forms for each evaluation session. </p>
<p>But it's even easier to use Minitab's Assistant. In the menu, select <strong>Assistant > Measurement Systems Analysis...</strong>, then click the <em>Attribute Agreement Worksheet</em> button:</p>
<p style="margin-left: 40px;"><img alt="Assistant MSA Dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/78f07ea29a7339b689361b91f602de45/assistant_msa_dialog1.png" style="width: 500px; height: 393px;" /></p>
<p>That brings up the following dialog box, which walks you through setting up your worksheet and printing out data collection forms, if desired. For this analysis, the Assistant dialog box is filled out as shown here: </p>
<p style="margin-left: 40px;"><img alt="Create Attribute Agreement Analysis Worksheet" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/bae016bb52b30ab527fc82f471ae056f/attribute_agreement_setup_dialog.png" style="width: 500px; height: 492px;" /></p>
<p>After you press OK, Minitab creates a worksheet for you and gives you the option to print out data collection forms for each reviewer and each trial. As you can see in the "Test Items" column below, Minitab randomizes the order of the observed items in each trial automatically, and the worksheet is arranged so you need only enter the reviewers' judgments in the "Results" column. </p>
<p style="margin-left: 40px;"><img alt="attribute agreement analysis worksheet" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/697bab496ec1eebf6dfb10ba4a27b15f/attribute_worksheet.png" style="width: 451px; height: 475px;" /></p>
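<p>If you aren't using software that builds the worksheet for you, the same kind of layout can be sketched in a few lines of Python. This is only an illustration of the study design described above (4 reviewers, 30 applications, 2 trials, a fresh random order each time), not a description of how Minitab randomizes. The reviewer labels, the "Run Order" column, and the seed are assumptions of mine; "Test Items" and "Results" echo the column names mentioned above.</p>

```python
import random


def build_worksheet(reviewers, n_items, n_trials, seed=None):
    """Return one row per reviewer, trial, and item, in randomized item order."""
    rng = random.Random(seed)  # a seed makes the run order reproducible
    rows = []
    for trial in range(1, n_trials + 1):
        for reviewer in reviewers:
            items = list(range(1, n_items + 1))
            rng.shuffle(items)  # new random order for every reviewer and trial
            for run_order, item in enumerate(items, start=1):
                rows.append({
                    "Trial": trial,
                    "Reviewer": reviewer,
                    "Run Order": run_order,
                    "Test Items": item,
                    "Results": None,  # filled in during the evaluation session
                })
    return rows


# Hypothetical reviewer labels; the post does not name the reviewers.
reviewers = ["Reviewer 1", "Reviewer 2", "Reviewer 3", "Reviewer 4"]
worksheet = build_worksheet(reviewers, n_items=30, n_trials=2, seed=1)

print(len(worksheet))
for row in worksheet[:3]:
    print(row)
```

<p>Each reviewer sees all 30 applications once per trial, so the full worksheet has 4 × 30 × 2 = 240 rows waiting for a judgment.</p>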
<p>In my next post, we'll analyze the data collected in this attribute agreement analysis. </p>
Data Analysis, Lean Six Sigma, Quality Improvement, Six Sigma, Statistics | Mon, 23 Jan 2017 13:03:00 +0000 | http://blog.minitab.com/blog/understanding-statistics/how-to-use-data-to-understand-and-resolve-differences-in-opinion-part-2 | Eston Martz