Data Analysis Software | Minitab
Blog posts and articles with tips for using statistical software to analyze data for quality improvement.
http://blog.minitab.com/blog/data-analysis-software/rss
Tue, 26 Sep 2017 09:06:59 +0000
The Easiest Way to Do Multiple Regression Analysis
http://blog.minitab.com/blog/understanding-statistics/the-easiest-way-to-do-multiple-regression-analysis
<p>Maybe you're just getting started with analyzing data. Maybe you're reasonably knowledgeable about statistics, but it's been a long time since you did a particular analysis and you feel a little bit rusty. In either case, the <a href="http://www.minitab.com/en-us/products/minitab/assistant/" target="_blank">Assistant menu</a> in Minitab Statistical Software gives you an interactive guide from start to finish. It will help you choose the right tool quickly, analyze your data properly, and even interpret the results appropriately. </p>
<p>One type of analysis many practitioners struggle with is multiple regression analysis, particularly an analysis that aims to optimize a response by finding the best levels for different variables. In this post, we'll use the Assistant to complete a multiple regression analysis and optimize the response.</p>
Identifying the Right Type of Regression
<p>In our example, we'll use a <a href="https://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/File/73fbd67dcfeb8300ce5855cceea6339d/heatflux2.MTW">data set</a> based on some solar energy research. Scientists found the position of focal points could be used to predict total heat flux. The goal of our analysis will be to use the Assistant to find the ideal position for these focal points. </p>
<p>When you select <strong>Assistant > Regression </strong>in Minitab, the software presents you with an interactive decision tree. If you need more explanation about a decision point, just click on the diamonds to see detailed information and examples.</p>
<p><img alt="Minitab's Assistant menu interactive decision tree" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/7d815f53a9ce234f845857081eb4737a/asstmenuregressionguide_w640.gif" style="width: 640px; height: 482px;" /></p>
<p>This data set has three X variables, or predictors, and we're looking to fit a model and optimize the response. For this goal, the tree leads to the Optimize Response button located at the bottom right. Clicking that button brings up a simple dialog box to complete.</p>
<p>HeatFlux is the response variable. The X variables are the focal points located in each direction: East, South, and North. Based on previous knowledge, we know we should use 234 as the target heat flux value, but we could also ask the Assistant to maximize or minimize the response. Because we checked the box labeled "Fit 2-way interactions and quadratic terms," the Assistant also will check for curvature and interactions.</p>
<p><img alt="Minitab's Assistant menu dialog box" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/ad64420b4b38e63f03ec8962a8b4bfdb/asst_dialog.gif" style="width: 539px; height: 436px;" /></p>
<p>When we press "OK," the Assistant quickly generates a regression model for the X variables using <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-smackdown-stepwise-versus-best-subsets" target="_blank">stepwise regression</a>. It presents the results in a series of reports written in plain, easy-to-follow language. </p>
Summary Report
<p><img alt="Multiple regression summary report for Minitab's Assistant" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f28d0c08d0b880903cfbf8ecc0daedfb/multiple_regression_for_heatflux___summary_report_w640.png" style="width: 640px; height: 480px;" /></p>
<p>This Summary Report delivers the "big picture" about the analysis and its results. With a p-value less than 0.001, this report shows that the regression model is statistically significant, with an <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit">R-squared value</a> of 96.15%! The comments window shows which X variables the model includes: East, South, and North, as well as interaction terms. To <a href="http://blog.minitab.com/blog/adventures-in-statistics/curve-fitting-with-linear-and-nonlinear-regression">model curvature</a>, the model also includes several polynomial terms.</p>
Effects Report
<p><img alt="Effects report for Minitab's Assistant menu" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d5a072b05e09ca463a7de37f0d828b98/multiple_regression_for_heatflux___effects_report_w640.gif" style="width: 640px; height: 480px;" /></p>
<p>The Effects Report shows all of the interaction and main effects included in the model. The presence of curved lines indicates the Assistant used a polynomial term to fit a curve.</p>
<p>In this report, the East*South interaction is significant. This means the effect of one variable on heat flux varies based on the other variable. If South has a low setting (31.84), heat flux is reduced by increasing East. But if South is set high (40.55), the heat flux increases as East gets higher.</p>
Diagnostic Report
<p><img alt="Multiple regression diagnostic report for Minitab's Assistant" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/a433b435aa412515d81a8d01fb6b09ff/report3_diagnostic_w640.gif" style="width: 640px; height: 479px;" /></p>
<p>The Diagnostic Report shows you the plot of <a href="http://blog.minitab.com/blog/adventures-in-statistics/why-you-need-to-check-your-residual-plots-for-regression-analysis" target="_blank">residuals</a> versus fitted values, and indicates any unusual points that ought to be investigated. This report has flagged two points, but these are not necessarily problematic, since based on the criteria for large residuals we'd expect roughly 5% of the observations to be flagged. The report also identifies two points that had unusual X values; clicking the points reveals which worksheet row they are in.</p>
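<p>The "roughly 5%" figure can be sanity-checked with a short calculation. The sketch below assumes a residual is flagged when its standardized value falls outside ±2 and that residuals are approximately normal; both are assumptions made for illustration, not a statement of the Assistant's exact rule.</p>

```python
from statistics import NormalDist


def expected_flag_rate(cutoff=2.0):
    """Share of standard-normal residuals whose absolute value exceeds `cutoff`.

    The cutoff of 2 is an assumed flagging rule, used only for illustration.
    """
    return 2.0 * (1.0 - NormalDist().cdf(cutoff))


rate = expected_flag_rate()
print(f"Expected share of flagged points: {rate:.1%}")
```

<p>Under those assumptions the expected share is a little under 5%, so a couple of flagged points in a modest data set is not surprising by itself.</p>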
Model Building Report
<p><img alt="Multiple regression model building report for Minitab's Assistant" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/d98aa49eb97ef8cfffc920bae32fbdd9/report4_modelbuilding_w640.gif" style="width: 640px; height: 479px;" /></p>
<p>The Model Building Report details how the Assistant arrived at the final regression model. It also contains the regression equation, identifies the variables that contribute the most information, and indicates whether the X variables are correlated. In this model, North contributes the most information. Even though East is not significant on its own, the Assistant includes it because it is part of a higher-order term.</p>
<p>This is a good opportunity to point out how the Assistant helps ensure that an analysis is done in the best way. For example, the Assistant uses standardized X variables to create the regression model. That's because <a href="http://blog.minitab.com/blog/adventures-in-statistics/what-are-the-effects-of-multicollinearity-and-when-can-i-ignore-them" target="_blank">standardizing the X variables removes most of the correlation</a> between linear and higher-order terms, which reduces the chance of adding these terms to your model if they aren't needed. However, the Assistant still displays the final model in natural (unstandardized) units.</p>
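<p>To see why standardizing helps, compare the correlation between a predictor and its square before and after standardizing. This is a minimal sketch in plain Python; the eleven evenly spaced values are hypothetical and are not taken from the heat flux worksheet.</p>

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def standardize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


# Hypothetical predictor values, evenly spaced from 30 to 40.
x = [float(v) for v in range(30, 41)]
raw_corr = pearson(x, [v ** 2 for v in x])

z = standardize(x)
std_corr = pearson(z, [v ** 2 for v in z])

print(f"corr(x, x^2) in raw units:      {raw_corr:.4f}")
print(f"corr(z, z^2) after standardizing: {std_corr:.4f}")
```

<p>With these symmetric values the raw correlation is close to 1, while the standardized correlation is essentially 0. Real data are rarely perfectly symmetric, so in practice the correlation is reduced rather than eliminated.</p>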
Prediction and Optimization Report
<p><img alt="Multiple regression prediction and optmization report for Minitab's Assistant" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/beebc6c3460b0ea96ee4c7f93bc0891d/report5_prediction_w640.gif" style="width: 640px; height: 479px;" /></p>
<p>The Assistant's Prediction and Optimization Report provides solutions for obtaining the targeted heat flux value of 234. The optimal settings for the focal points have been identified as East 37.82, South 31.84, and North 16.01. The model predicts that these settings will deliver a heat flux of 234, with a prediction interval of 216 to 252. The Assistant also provides alternate solutions you may want to consider, particularly in cases where specialized subject area expertise might be critical.</p>
Report Card
<p><img alt="Multiple regression report card for Minitab's Assistant" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3bc83fb6b4c620c40ecc1a3cbc97dbb8/multiple_regression_for_heatflux___report_card_w640.png" style="width: 640px; height: 480px;" /></p>
<p>Finally, the Report Card helps you avoid missing potential problems that could make your results unreliable. In this case, the report suggests collecting a larger sample and investigating the unusual residuals. It also shows that normality is not an issue for these data, and it provides a helpful reminder to validate the model's optimal values by doing confirmation runs.</p>
<p>The Assistant's methods are based on established statistical practice, guidelines in the literature, and simulations performed by Minitab's statisticians. You can read the technical white paper for <a href="http://support.minitab.com/en-us/minitab/17/Assistant_Multiple_Regression.pdf" target="_blank">Multiple Regression in the Assistant</a> if you would like all the details.</p>
Data Analysis, Regression Analysis, Statistics Help
Tue, 29 Aug 2017 13:57:00 +0000
Eston Martz
Control Charts Are Good for So Much More than SPC!
http://blog.minitab.com/blog/understanding-statistics/controls-charts-are-good-for-so-much-more-than-spc
<p>Control charts take data about your process and plot it so you can distinguish between common-cause and special-cause variation. Knowing the difference is important because it permits you to address potential problems without over-controlling your process. </p>
<p>Control charts are fantastic for assessing the stability of a process. Is the process mean unstable, too low, or too high? Is observed variability a natural part of the process, or could it be caused by specific sources? By answering these questions, control charts let you dedicate your actions to where you can make the most impact.</p>
<p>Assessing whether your process is stable is valuable in itself, but it is also a necessary first step in <a href="http://blog.minitab.com/blog/understanding-statistics/i-think-i-can-i-know-i-can-a-high-level-overview-of-process-capability-analysis" target="_blank">capability analysis</a>. Your process has to be stable before you can measure its capability. You can predict the performance of a stable process and therefore improve its capability. If your process is unstable, by definition it is unpredictable.</p>
<p>Control charts are commonly applied to business processes, but they have great benefits beyond Six Sigma and statistical process control (SPC). In fact, control charts can reveal information that would otherwise be very difficult to uncover.</p>
Other Processes That Need to Be In Control
<p>Let's consider processes beyond those we encounter in business. Instability and excessive variation can cause problems in many other kinds of processes. </p>
<ul>
<li>A test process that causes subjects to experience an impact of 6 times their body weight.</li>
<li>A teacher's process to help students learn the material, as measured by test scores.</li>
<li><a href="http://blog.minitab.com/blog/real-world-quality-improvement/control-charts-keep-blood-sugar-in-check" target="_blank">A diabetic's process for maintaining blood sugar levels</a>.</li>
</ul>
<p>The first example stems from a colleague's <a href="http://blog.minitab.com/blog/adventures-in-statistics/quality-improvement-controlling-variability-more-difficult-than-the-mean" target="_blank">research.</a> The researchers had middle-school students jump 30 times from 24-inch steps every other school day to see if it increased their bone density. Treatment was defined as the subjects experiencing an impact of 6 body weights, but the research team didn't quite hit the mark.</p>
<p>My colleague conducted a pilot study and graphed the results in an Xbar-S chart.</p>
<p><img alt="Xbar-S chart of ground reaction forces for pilot study" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/e721bd172aa55d5ec9976e81990f1293/xbars_grf_w1024.jpeg" style="width: 576px; height: 384px;" /></p>
<p>The fact that the S chart (on the bottom) is in control means each subject has a consistent landing style with impacts of a consistent magnitude—the variability is in control.</p>
<p>But the Xbar chart (at the top) is clearly out of control, indicating that even though the overall mean (6.141) exceeds the target, individual subjects have very different means. Some are consistently hard landers while others are consistently soft landers. The control chart suggests that the variability is not natural process variation (common cause) but rather due to differences among the participants (special cause variation).</p>
<p>The researchers addressed this by training the subjects how to land. They also had a nurse observe all future jumping sessions. These actions reduced the variability to the point that impacts were consistently greater than 6 body weights.</p>
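<p>The limits on an Xbar-S chart like the one in this example can be sketched in a few lines. The code below uses the textbook Sbar method with the c4 unbiasing constant; Minitab's default estimation settings may differ, and the jump data are hypothetical values invented for illustration (one subgroup per subject, impacts in body weights).</p>

```python
import math
import statistics


def c4(n):
    """Unbiasing constant for the sample standard deviation of n values."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2.0) / math.gamma((n - 1) / 2.0)


def xbar_s_limits(subgroups):
    """Three-sigma limits for the Xbar and S charts (equal subgroup sizes)."""
    n = len(subgroups[0])
    means = [statistics.mean(g) for g in subgroups]
    sds = [statistics.stdev(g) for g in subgroups]
    center = statistics.mean(means)
    s_bar = statistics.mean(sds)
    c = c4(n)
    half_width = 3.0 * s_bar / (c * math.sqrt(n))
    spread = 3.0 * math.sqrt(1.0 - c * c) / c
    return {
        "means": means,
        "center": center,
        "xbar_lcl": center - half_width,
        "xbar_ucl": center + half_width,
        "s_bar": s_bar,
        "s_lcl": max(0.0, 1.0 - spread) * s_bar,
        "s_ucl": (1.0 + spread) * s_bar,
    }


# Hypothetical impacts: each inner list is one subject's five jumps.
jumps = [
    [5.0, 5.2, 4.9, 5.1, 5.0],
    [6.1, 6.0, 6.2, 5.9, 6.0],
    [7.1, 7.0, 6.9, 7.2, 7.0],
    [6.4, 6.5, 6.6, 6.5, 6.4],
]
limits = xbar_s_limits(jumps)

# Subjects whose mean impact falls outside the Xbar limits.
out_of_control = [
    i for i, m in enumerate(limits["means"])
    if m < limits["xbar_lcl"] or m > limits["xbar_ucl"]
]
print("Xbar limits:", round(limits["xbar_lcl"], 3), round(limits["xbar_ucl"], 3))
print("Subjects outside the Xbar limits:", out_of_control)
```

<p>Because the Xbar limits are built from the within-subject variation, subjects who land consistently harder or softer than the rest fall outside them even when every subject is individually consistent, which is the pattern described above.</p>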
Control Charts as a Prerequisite for Statistical Hypothesis Tests
<p>Control charts can verify that a process is stable, as required for capability analysis. But control charts can also be used in a similar way to test assumptions for <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics" target="_blank">hypothesis tests</a>.</p>
<p>Specifically, the measurements used in a hypothesis test are assumed to be stable, though this assumption is often overlooked. This assumption parallels the requirement for stability in capability analysis: if your measurements are not stable, inferences based on those measurements will not be reliable.</p>
<p>Let’s assume that we’re comparing test scores between group A and group B. We’ll use this <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/File/6053477fc294de59d5b3837389daab3a/groupcomparison.MTW">data set</a> to perform a 2-sample t-test as shown below.</p>
<p style="margin-left: 40px;"><img alt="two sample t-test results" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/95d19923cf0680676324db57e3df0ef7/two_sample_t_test_output.png" style="width: 355px; height: 555px;" /></p>
<p>The results indicate that group A has a higher mean and that the difference is statistically significant. We’re not assuming equal variances, so it's not a problem that Group B has a slightly higher standard deviation. We also have enough observations per group that normality is not a concern. Concluding that group A has a higher mean than group B seems safe. </p>
<p>But wait a minute...let's look at each group in an I-MR chart. </p>
<p><img alt="I-MR chart for group A" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/cef240bbb760bb6760ddcbc33e446be9/imr_a.png" style="width: 576px; height: 384px;" /></p>
<p><img alt="I-MR chart of group B" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/e4bd53da7831826959be94540b7ab0a2/imr_b.png" style="width: 576px; height: 384px;" /></p>
<p>Group A's chart shows stable scores. But group B's chart indicates that the scores are unstable, with multiple out-of-control points and a clear negative trend. Even though these data satisfy the other assumptions, we can't make a valid comparison between a stable group and an unstable group! </p>
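<p>An individuals and moving range (I-MR) check of this kind can be sketched as follows. The constants 2.66 and 3.267 are the standard values for moving ranges of length 2; the scores are hypothetical, chosen to show a downward trend, and are not the values in the linked worksheet.</p>

```python
import statistics


def imr_limits(values):
    """Control limits for an individuals chart and its moving range chart."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = statistics.mean(moving_ranges)
    center = statistics.mean(values)
    return {
        "center": center,
        "lcl": center - 2.66 * mr_bar,   # 2.66 = 3 / d2, with d2 = 1.128
        "ucl": center + 2.66 * mr_bar,
        "mr_bar": mr_bar,
        "mr_ucl": 3.267 * mr_bar,        # D4 for a moving range of length 2
    }


# Hypothetical test scores that drift downward over time.
scores = [90, 89, 91, 88, 86, 85, 83, 82, 80, 78, 77, 75]
limits = imr_limits(scores)

# Positions of scores that fall outside the individuals limits.
flagged = [
    i for i, v in enumerate(scores)
    if v < limits["lcl"] or v > limits["ucl"]
]
print("Individuals limits:", round(limits["lcl"], 2), round(limits["ucl"], 2))
print("Out-of-control positions:", flagged)
```

<p>A steady trend produces small moving ranges and therefore tight limits, so the early and late observations end up outside them. That is one way an unstable group shows up on the chart.</p>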
<p>This is not the only type of problem you can detect with control charts. They also can test for a variety of patterns in your data, and for out-of-control variability.</p>
Different Types of Control Charts
<p>An I-MR chart can assess process stability when your data don’t have subgroups. The Xbar-S chart, the first one in this post, assesses process stability when your data do have subgroups.</p>
<p>Other control charts are ideal for other types of data. For example, the U Chart and Laney U’ Chart use the Poisson distribution. The P Chart and Laney P’ Chart use the binomial distribution. </p>
<p>In <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank" title="Minitab 16 Statistical Software">Minitab Statistical Software</a>, you can get step-by-step guidance in control chart selection by going to <strong>Assistant > Control Charts</strong>. The Assistant will help you with everything from determining your data type, to ensuring it meets assumptions, to interpreting your results.</p>
Control Charts
Thu, 24 Aug 2017 13:59:00 +0000
Eston Martz
What's the Difference between Confidence, Prediction, and Tolerance Intervals?
http://blog.minitab.com/blog/understanding-statistics/whats-the-difference-between-confidence-prediction-and-tolerance-intervals
<p>In statistics, as in life, absolute certainty is rare. That's why statisticians often can't provide a result that is as specific as we might like; instead, they provide the results of an analysis as a range, within which the data suggest the true answer lies.</p>
<p>Most of us are familiar with "confidence intervals," but that's just one of several different kinds of intervals we can use to characterize the results of an analysis. Sometimes, confidence intervals are not the best option. Let's look at the characteristics of some different types of intervals, and consider when and where they should be used. Specifically, we'll look at confidence intervals, prediction intervals, and tolerance intervals. </p>
An Overview of Confidence Intervals
<p><img alt="Illustration of confidence level for confidence intervals" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/a9bd1376510c8289a0daf15f5bcd376f/ci.gif" style="float: right; width: 327px; height: 224px;" />A confidence interval refers to a range of values that is likely to contain the value of an unknown population parameter, such as the mean, based on data sampled from that population.</p>
<p>Collected randomly, two samples from a given population are unlikely to have identical confidence intervals. But if the population is sampled again and again, a certain percentage of those confidence intervals will contain the unknown population parameter. The percentage of these confidence intervals that contain this parameter is the confidence level of the interval.</p>
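<p>The idea of a confidence level as a long-run hit rate can be illustrated with a small simulation. This sketch draws repeated samples from a normal population with an assumed mean and standard deviation (both hypothetical) and counts how often a simple normal-approximation interval contains the true mean.</p>

```python
import random
import statistics
from statistics import NormalDist


def coverage(true_mean=1250.0, true_sd=85.0, n=30, trials=2000, level=0.95, seed=1):
    """Fraction of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        mean = statistics.mean(sample)
        half_width = z * statistics.stdev(sample) / n ** 0.5
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / trials


observed = coverage()
print(f"Share of intervals containing the true mean: {observed:.3f}")
```

<p>The share should land near the nominal 95%. It tends to fall slightly short here because the sketch uses a normal multiplier rather than a t multiplier, a simplification chosen to keep the example dependency-free.</p>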
<p>Confidence intervals are most frequently used to express the population mean or standard deviation, but they also can be calculated for proportions, regression coefficients, occurrence rates (Poisson), and for the differences between populations in hypothesis tests.</p>
<p>If we measure the life of a random sample of light bulbs and Minitab calculates 1230–1265 hours as the 95% confidence interval, that means we can be 95% confident the mean for the population of bulbs falls between 1230 and 1265 hours.</p>
<p>In relation to the parameter of interest, confidence intervals only assess sampling error—the inherent error in estimating a population characteristic from a sample. Larger sample sizes will decrease the sampling error, and result in smaller (narrower) confidence intervals. If you could sample the entire population, the confidence interval would have a width of 0: there would be no sampling error, since you have obtained the actual parameter for the entire population! </p>
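<p>The effect of sample size on interval width is easy to see numerically. The sketch below uses a normal-approximation half-width and a hypothetical sample standard deviation of 85 hours; only the pattern matters, not the specific numbers.</p>

```python
from statistics import NormalDist


def ci_half_width(sample_sd, n, level=0.95):
    """Half-width of a normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z * sample_sd / n ** 0.5


# Hypothetical standard deviation of 85 hours at several sample sizes.
for n in (25, 100, 400, 1600):
    print(n, round(ci_half_width(85.0, n), 2))
```

<p>Each fourfold increase in sample size cuts the half-width in half, because the width shrinks with the square root of the sample size.</p>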
<p>In addition, confidence intervals only provide information about the mean, standard deviation, or whatever your parameter of interest happens to be. They tell you nothing about how the individual values are distributed.</p>
<p>What does that mean in practical terms? It means that the confidence interval has some serious limitations. In this example, we can be 95% confident that the mean of the light bulbs will fall between 1230 and 1265 hours. But that 95% confidence interval does not indicate that 95% of the bulbs will fall in that range. To draw a conclusion like that requires a different type of interval...</p>
An Overview of Prediction Intervals
<p>A prediction interval is a confidence interval for <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-predict-with-minitab-using-bmi-to-predict-the-body-fat-percentage-part-1" target="_blank">predictions</a> derived from <a href="http://blog.minitab.com/blog/adventures-in-statistics/linear-or-nonlinear-regression-that-is-the-question" target="_blank">linear and nonlinear regression models</a>. There are two types of prediction intervals.</p>
Confidence interval of the prediction
<p>Given specified settings of the predictors in a model, the confidence interval of the prediction is a range likely to contain the mean response. Like regular confidence intervals, the confidence interval of the prediction represents a range for the mean, not the distribution of individual data points.</p>
<p>With respect to the light bulbs, we could test how different manufacturing techniques (Slow or Quick) and filaments (A or B) affect bulb life. After fitting a model, we can use <a href="http://www.minitab.com/products/minitab">statistical software</a> to forecast the life of bulbs made using filament A under the Quick method.</p>
<p>If the confidence interval of the prediction is 1400–1450 hours, we can be 95% confident that the <em>mean </em>life for bulbs made under those conditions falls within that range. However, this interval doesn't tell us anything about how the lives of <em>individual </em>bulbs are distributed. </p>
Prediction interval
<p>A prediction interval is a range that is likely to contain the response value of an individual new observation under specified settings of your predictors.</p>
<p>If Minitab calculates a prediction interval of 1350–1500 hours for a bulb produced under the conditions described above, we can be 95% confident that the lifetime of a new bulb produced with those settings will fall within that range.</p>
<p>You'll note the prediction interval is wider than the confidence interval of the prediction. This will always be true, because additional uncertainty is involved when we want to predict a single response rather than a mean response.</p>
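<p>The reason the prediction interval is always wider shows up directly in the standard errors. This sketch fits a simple linear regression with a single continuous predictor, which is a simplification of the light bulb example (whose predictors are categorical), and the data are hypothetical.</p>

```python
import math


def interval_standard_errors(xs, ys, x0):
    """Standard errors for the mean response and for a new observation at x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (n - 2)                      # residual variance
    leverage = 1.0 / n + (x0 - x_bar) ** 2 / sxx
    se_mean = math.sqrt(s2 * leverage)      # for the confidence interval of the prediction
    se_new = math.sqrt(s2 * (1.0 + leverage))  # for the prediction interval
    return se_mean, se_new


# Hypothetical predictor settings and bulb lifetimes in hours.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1400.0, 1390.0, 1425.0, 1440.0, 1430.0, 1460.0]
se_mean, se_new = interval_standard_errors(xs, ys, x0=3.5)
print(round(se_mean, 2), round(se_new, 2))
```

<p>Both intervals use the same t multiplier, so the extra residual variance term inside the square root is what makes the prediction interval wider at every setting of the predictor.</p>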
An Overview of Tolerance Intervals
<p>A tolerance interval is a range likely to contain a defined proportion of a population. To calculate tolerance intervals, you must stipulate the proportion of the population and the desired confidence level—the probability that the named proportion is actually included in the interval. This is easier to understand when you look at an example.</p>
Tolerance interval example
<p>To assess how long their bulbs last, the light bulb company samples 100 bulbs randomly and records how long they last in <a href="//cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/File/c4ab0558e6b5c4e7f6b759528067d9d0/lightbulb.MTW">this worksheet</a>.</p>
<p>To use this data to calculate tolerance intervals, go to <strong>Stat > Quality Tools > Tolerance Intervals </strong>in Minitab. (If you don't already have it, download the <a href="http://www.minitab.com/products/minitab/free-trial/">free 30-day trial of Minitab</a> and follow along!) Under <strong>Data</strong>, choose <em>Samples in columns</em>. In the text box, enter <em>Hours</em>. Then click <strong>OK</strong>. </p>
<p style="margin-left: 40px;"><img alt="Example of a tolerance interval" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/dd1bef8ea49f03e5362ef705a0a43107/ti.gif" style="width: 576px; height: 384px;" /></p>
<p>The normality test indicates that these data follow the normal distribution, so we can use the Normal interval (1060, 1435). The bulb company can be 95% confident that at least 95% of all bulbs will last between 1060 and 1435 hours. </p>
How tolerance intervals compare to confidence intervals
<p>As we mentioned earlier, the width of a confidence interval depends entirely on sampling error. The closer the sample comes to including the entire population, the smaller the width of the confidence interval, until it approaches zero.</p>
<p>But a tolerance interval's width is based not only on sampling error, but also variance in the population. As the sample size approaches the entire population, the sampling error diminishes and the estimated percentiles approach the true population percentiles.</p>
<p>Minitab calculates the data values that correspond to the estimated 2.5th and 97.5th percentiles (97.5 - 2.5 = 95) to determine the interval in which 95% of the population falls. You can get more details about percentiles and population proportions <a href="http://blog.minitab.com/blog/adventures-in-statistics/the-graphical-benefits-of-identifying-the-distribution-of-your-data" target="_blank">here</a>.</p>
<p>Of course, because we are using a sample, the percentile estimates will have error. Since we can't say that a tolerance interval truly contains the specified proportion with 100% confidence, tolerance intervals have a confidence level, too.</p>
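<p>For normally distributed data, a two-sided tolerance interval has the form mean ± k × standard deviation, where k grows with the required confidence. The sketch below approximates k using Howe's formula together with the Wilson-Hilferty approximation for the chi-square quantile. These are stand-in approximations chosen to avoid external libraries; they are not necessarily the method Minitab uses.</p>

```python
import math
from statistics import NormalDist


def chi2_lower_quantile(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile at probability p."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3


def tolerance_factor(n, coverage=0.95, confidence=0.95):
    """Approximate two-sided normal tolerance factor k (Howe's approximation)."""
    df = n - 1
    z = NormalDist().inv_cdf((1.0 + coverage) / 2.0)
    chi2 = chi2_lower_quantile(1.0 - confidence, df)
    return z * math.sqrt(df * (1.0 + 1.0 / n) / chi2)


k = tolerance_factor(100)
z_only = NormalDist().inv_cdf(0.975)
print(f"k for n=100, 95% coverage, 95% confidence: {k:.3f}")
print(f"multiplier ignoring sampling error:        {z_only:.3f}")
```

<p>For a sample of 100, k comes out noticeably larger than the 1.96 you would use if the mean and standard deviation were known exactly. That extra width is the price of the confidence level, and it shrinks toward 1.96 as the sample size grows.</p>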
How tolerance intervals are used
<p>Tolerance intervals are very useful when you want to predict a range of likely outcomes based on sampled data.</p>
<p>In quality improvement, practitioners generally require that a process output (such as the life of a light bulb) falls within spec limits. By comparing client requirements to tolerance limits that cover a specified proportion of the population, tolerance intervals can detect excessive variation. A tolerance interval wider than the client's requirements may indicate that product variation is too high.</p>
<p><a href="http://it.minitab.com/en-us/products/minitab/free-trial.aspx">Minitab statistical software</a> makes obtaining these intervals easy, regardless of which one you need to use for your data.</p>
Statistics, Statistics Help
Tue, 22 Aug 2017 13:58:00 +0000
Eston Martz
Flight of the Chickens: A Statistical Bedtime Story, Part 1
http://blog.minitab.com/blog/statistics-in-the-field/flight-of-the-chickens-a-statistical-bedtime-story-part-1
<p><em>by Matthew Barsalou, guest blogger</em></p>
<p>Once upon a time, in the Kingdom of Wetzlar, there was a farm with over a thousand chickens, two pigs, and a cow. The chickens were well treated, but a few rabble-rousers among them got the rest of the chickens worked up. These trouble-making chickens <em>looked </em>almost like the other chickens, but in fact they were <em>evil </em>chickens. </p>
<img alt="chickens" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/99db35580e3f59363035593e45311e89/image001.jpg" style="width: 331px; height: 248px;" />
<p style="font-size: 9px; text-align: center;"><em>By HerbertT - Eigenproduktion, CC BY-SA 3.0, <a href="https://commons.wikimedia.org/w/index.php?curid=962579">https://commons.wikimedia.org/w/index.php?curid=962579</a></em></p>
<p>Hidden among the good chickens and the evil chickens was Sid. Sid was not like other chickens. He was a secret spy for The Swan of the Lahn, who ruled Wetzlar and was concerned about the infiltration of evil chickens. Sid was also a duck. That's right, a duck disguised as a chicken. Sid knew who the evil chickens were, and sent regular reports on their activities back to Wetzlar.</p>
<img alt="duck" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/29a7e9ed327615ff06708ca7d629e12b/image004.jpg" style="width: 273px; height: 182px;" />
<p style="font-size: 9px;">Mallard drake by <a href="https://commons.wikimedia.org/wiki/File:Mallard_drake_.02.jpg">Bert de tilly</a></p>
<p>One stormy and dark night, an evil chicken snuck out with an enormous basket of beautiful hand-painted eggs to throw at the two pigs and the cow. Sid snuck out into the pouring rain and took a sample of 18 of the eggs. The intrepid duck spy was familiar with a previous study of 157 eggs, which showed that the mean of those eggs was <a href="http://archive.org/stream/standarddeviatio195atwo/standarddeviatio195atwo_djvu.txt" target="_blank">57.079 grams</a> with a standard deviation of 2.30 grams. Sid was determined to find out if the mean of his current sample had a statistically significant difference from the mean of the previous study.</p>
<img alt="https://upload.wikimedia.org/wikipedia/commons/thumb/1/10/Easter_eggs_-_straw_decoration.jpg/1024px-Easter_eggs_-_straw_decoration.jpg" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ea1fe788c97d1f12d8f000fc568f0ad9/image005.jpg" style="text-align: -webkit-center; border-width: 0px; border-style: solid; width: 306px; height: 229px;" />
<p style="font-size: 9px;">By Jan Kameníček - Own work, Public Domain, <a href="https://commons.wikimedia.org/w/index.php?curid=732984" target="_blank">https://commons.wikimedia.org/w/index.php?curid=732984</a></p>
<p>If you'd like to recreate Sid's analysis, download his <a href="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a84e66ffcbe9f854a779cdacfc914915/flightofthechickens.mtw">data set</a> and, if you need it, the <a href="http://www.minitab.com/products/minitab/free-trial">free trial of Minitab</a> 18 Statistical Software. We will need to use summarized data, since we have the actual values only for Sid's sample and not the full data set from the previous study. Go to <strong>Stat > Basic Statistics > Display Descriptive Statistics...</strong> and select the column containing the data as the Variable. Click on Graphs and select Individual value plot to view a graph of the data.</p>
<p style="margin-left: 40px;"><img alt="descriptive statistics dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f67868a217eeb27025975888d27ff118/descro_tove_doa_pg.png" style="width: 527px; height: 354px;" /></p>
<p>Click OK twice and Minitab will create an individual value plot of the data and the mean and standard deviation will appear in the session window with the rest of the descriptive statistics.</p>
<p align="center"><img alt="individual value plot of eggs" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/72d17166ab9bf6fb9bd400f1fa864d02/image008.png" style="border-width: 0px; border-style: solid; width: 576px; height: 384px;" /></p>
<p> </p>
<p align="center"><img alt="Descriptive Statistics" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b9ef39d053dfd80c026f6fdfafcb3193/descriptive_statistics_eggs.png" style="border-width: 0px; border-style: solid; width: 646px; height: 151px;" /></p>
<p> </p>
<p>We can see that the sample mean is 57.315 and the standard deviation is 2.439. Now we can perform a 2-sample t-test to compare the means by going to <strong>Stat > Basic Statistics > 2-Sample t... </strong>and selecting Summarized data in the drop-down menu. Enter the sample size of 18, sample mean of 57.315, and standard deviation of 2.439 under Sample 1, and enter the sample size of 157, mean of 57.079, and the population standard deviation of 2.30 under Sample 2.</p>
<p style="margin-left: 40px;"><img alt="two-sample t test dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/23cbaedfab65fc90c48259b8dbbad0e0/2_sample_t_dialog.png" style="width: 424px; height: 296px;" /></p>
<p>Then click OK.</p>
<p style="margin-left: 40px;"><img alt="t-test output" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dfaecdaefcbd971827c75e8ad15ebd7f/t_test_output_2.png" style="border-width: 0px; border-style: solid; width: 327px; height: 555px;" /></p>
<p>The p-value is greater than 0.05, so we can conclude there is no statistically significant difference between the means of the eggs the evil chickens planned to throw and the eggs in the previous study.</p>
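<p>For readers without Minitab, the same kind of comparison can be sketched from the summary statistics alone. The sketch below assumes the test does not pool the variances (Welch's approach) and approximates the two-sided p-value by numerically integrating the t density. The function names are illustrative, and the result may differ slightly from the Minitab output depending on the options chosen in the dialog.</p>

```python
import math

def welch_t_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Return (t, df) for a 2-sample t-test that does not assume equal variances."""
    v1 = sd1 ** 2 / n1  # variance of the first sample mean
    v2 = sd2 ** 2 / n2  # variance of the second sample mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value using Simpson's rule on the t density over [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 1 - 2 * area

# Summary statistics quoted in the post.
t_stat, dof = welch_t_from_summary(18, 57.315, 2.439, 157, 57.079, 2.30)
p_value = two_sided_p(t_stat, dof)
print(f"t = {t_stat:.3f}, df = {dof:.1f}, p = {p_value:.3f}")
```

<p>With these inputs the t statistic is small and the p-value is well above 0.05, which agrees with the conclusion drawn from the Minitab output.</p>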
<p>Unfortunately, Sid made a critical mistake. The first step in an analysis is to ask the right question. Sid's statistics were correct, but he asked the wrong question: “Is the mean of the second sample different from the mean of the first sample with an alpha of 0.05?” </p>
<p>What he <em>should </em>have asked was, “What will happen when the pigs and the cow get hit by eggs?” The weight of the eggs was irrelevant; what mattered was the consequences of the pigs and cow being pummeled with eggs.</p>
<p>If Sid had prepared a report for The Swan of the Lahn that only said the eggs collected by the evil chickens weighed the same as eggs in the earlier study, the Swan would conclude that the process had not changed. But had the right question been answered, the correct conclusion would have been, “Trouble may be brewing.”</p>
<p><img alt="Swan" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3c61f8f1ca1091cd188b9716190ed656/image013.jpg" style="text-align: -webkit-center; border-width: 0px; border-style: solid; width: 399px; height: 301px;" /></p>
<p><span style="text-align: -webkit-center;">By Dick Daniels (http://carolinabirds.org/) - Own work, CC BY-SA 3.0, </span><a href="https://commons.wikimedia.org/w/index.php?curid=11053305" style="text-align: -webkit-center;">https://commons.wikimedia.org/w/index.php?curid=11053305</a></p>
<p>Trouble did indeed result when the evil chickens put their egg-throwing plan into action. As darkness fell, first the cow and then the pigs were bombarded by egg after messy egg.</p>
<p>The cow simply ate the eggs. But the pigs, holding <em>all </em>the chickens to be responsible, were outraged. They rampaged and terrorized the poor chickens all that night. By midnight, the muddy fields were full of pig prints and feathers were ruffled in the chicken coop. </p>
<p>One of the evil chickens seized on the traumatized crowd's passions, and demanded of the others, “How can we live like this?” The evil chickens soon convinced the others that they would all be happier if they moved to the high-walled village of Wetzlar beside the Lahn River. The chickens began to march into the stormy night.</p>
<p><em><a href="http://blog.minitab.com/blog/statistics-in-the-field/flight-of-the-chickens-a-statistical-bedtime-story-part-2">Continued in Part 2</a></em></p>
<p> </p>
<p><strong>About the Guest Blogger</strong></p>
<div>
<p><em><a href="https://www.linkedin.com/pub/matthew-barsalou/5b/539/198" target="_blank">Matthew Barsalou</a> is a statistical problem resolution Master Black Belt at <a href="http://www.3k-warner.de/" target="_blank">BorgWarner</a> Turbo Systems Engineering GmbH. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with emphasis in international business, and has a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is author of the books </em><a href="http://www.amazon.com/Root-Cause-Analysis-Step---Step/dp/148225879X/ref=sr_1_1?ie=UTF8&qid=1416937278&sr=8-1&keywords=Root+Cause+Analysis%3A+A+Step-By-Step+Guide+to+Using+the+Right+Tool+at+the+Right+Time" target="_blank">Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time</a><em>, </em><a href="http://asq.org/quality-press/display-item/index.html?item=H1472" target="_blank">Statistics for Six Sigma Black Belts</a><em> and </em><a href="http://asq.org/quality-press/display-item/index.html?item=H1473&xvl=76115763" target="_blank">The ASQ Pocket Guide to Statistics for Six Sigma Black Belts</a><em>.</em></p>
</div>
<div style="clear:both;"> </div>
Fun Statistics, Statistics, Statistics Help
Tue, 15 Aug 2017 13:59:00 +0000
http://blog.minitab.com/blog/statistics-in-the-field/flight-of-the-chickens-a-statistical-bedtime-story-part-1
Guest Blogger
5 More Critical Six Sigma Tools: A Quick Guide
http://blog.minitab.com/blog/understanding-statistics/5-more-critical-six-sigma-tools-a-quick-guide
<p>The Six Sigma quality improvement methodology has lasted for decades because it gets results. Companies in every country around the world, and in every industry, have used this logical, step-by-step method to improve the quality of their processes, products, and services. And they've saved billions of dollars along the way.</p>
<p>However, Six Sigma involves a good deal of statistics and data analysis, which makes many people uneasy. Individuals who are new to quality improvement often feel intimidated by the statistical aspects.</p>
<p>Don't be intimidated. Data analysis may be a critical component of improving quality, but the good news is that most of the analyses we use in Six Sigma aren't hard to understand, even if statistics isn't something you're comfortable with.</p>
<p>Just getting familiar with the tools used in Six Sigma is a good way to get started on your quality journey. In my last post, I offered a rundown of 5 tools that crop up in most Six Sigma projects. In this post, I'll review 5 more common statistical tools, and explain what they do and why they’re important in Six Sigma.</p>
1. t-Tests
<p><img alt="t-Tests" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9836f7ec0e12d309f6a3472557a5f424/5_more_six_sigma_tools_t_tests.jpg" style="width: 600px; height: 395px;" /></p>
<p>We use t-tests to compare the average of a sample to a target value, or to the average of another sample. For example, a company that sells beverages in 16-oz. containers can use a 1-sample t-test to determine if the production line’s average fill is on or off target. If you buy flavored syrup from two suppliers and want to determine if there’s a difference in the average volume of their respective shipments, you can use a 2-sample t-test to compare the two suppliers. </p>
2. ANOVA
<p><img alt="ANOVA" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/56cc203b4012c25d4fa4e28fc96787f3/5_more_six_sigma_tools_anova.jpg" style="width: 600px; height: 395px;" /></p>
<p>Where t-tests compare a mean to a target, or two means to each other, ANOVA—which is short for Analysis of Variance—lets you compare more than two means. For example, ANOVA can show you if average production volumes across 3 shifts are equal. You can also use ANOVA to analyze means for more than 1 variable. For example, you can simultaneously compare the means for 3 shifts and the means for 2 manufacturing locations. </p>
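<p>To make the idea concrete, here is a minimal sketch of the F statistic behind a one-way ANOVA. The production volumes for the three shifts are invented for illustration, and the function name is hypothetical.</p>

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical production volumes for three shifts.
shift_1 = [10, 12, 11]
shift_2 = [14, 15, 16]
shift_3 = [11, 10, 12]
f_stat, df_b, df_w = one_way_anova_f([shift_1, shift_2, shift_3])
print(f"F = {f_stat:.2f} on ({df_b}, {df_w}) degrees of freedom")
```

<p>A large F statistic means the shift averages differ by more than the variation within each shift would explain. Statistical software converts F into a p-value for you.</p>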
3. Regression
<p><img alt="Regression" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/54e06038732315d016e9703a866d74f0/5_more_six_sigma_tools_regression.jpg" style="width: 600px; height: 395px;" /></p>
<p>Regression helps you determine whether there's a relationship between an output and one or more input factors. For instance, you can use regression to examine if there is a relationship between a company’s marketing expenditures and its sales revenue. When a relationship between the variables exists, you can use the regression equation to describe that relationship and predict future output values for given input values.</p>
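<p>As a rough sketch of what a regression equation looks like in practice, the code below fits a straight line by least squares and uses it to predict a new value. The spending and revenue figures are made up, and the helper name is hypothetical.</p>

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: marketing spend and sales revenue (same units).
spend = [1, 2, 3, 4, 5]
revenue = [12, 14, 17, 19, 23]

intercept, slope = fit_line(spend, revenue)
predicted = intercept + slope * 6  # predicted revenue for a spend of 6
print(f"revenue = {intercept:.2f} + {slope:.2f} * spend")
print(f"predicted revenue at spend 6: {predicted:.2f}")
```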
4. DOE (Design of Experiments)
<p><img alt="DOE" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/558c592fd82aafe591c2d087d49bfa4c/5_more_six_sigma_tools_doe.jpg" style="width: 600px; height: 395px;" /><br />
Regression and ANOVA are most often used for data that’s already been collected. In contrast, Design of Experiments (DOE) gives you an efficient strategy for collecting your data. It permits you to change or adjust multiple factors simultaneously to identify if relationships exist between inputs and outputs. Once you collect the data and identify the important inputs, you can then use DOE to determine the optimal settings for each factor. </p>
5. Control Charts
<p><img alt="Control Charts" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e7cd9d3ebc70528d9c617d8b3980be8f/5_more_six_sigma_tools_control_charts.jpg" style="width: 600px; height: 395px;" /></p>
<p>Every process has some natural, inherent variation, but a stable (and therefore predictable) process is a hallmark of quality products and services. It's important to know when a process goes beyond the normal, natural variation, because it can indicate a problem that needs to be resolved. A control chart distinguishes “special-cause” variation from acceptable, natural variation. These charts graph data over time and flag out-of-control data points, so you can detect unusual variability and take action when necessary. Control charts also help you ensure that you sustain process improvements into the future. </p>
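<p>One common version of this idea is the individuals (I) chart, where the control limits come from the average moving range. The sketch below uses invented measurements and checks only the simplest rule (a point beyond the control limits). Statistical software typically offers additional tests for special causes.</p>

```python
def individuals_chart_limits(values):
    """Return (center, lcl, ucl) for an individuals (I) chart."""
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_mr = sum(moving_ranges) / len(moving_ranges)
    # 2.66 = 3 / d2, where d2 = 1.128 for moving ranges of two points.
    return center, center - 2.66 * avg_mr, center + 2.66 * avg_mr

def out_of_control_points(values):
    """Indexes of points that fall outside the control limits."""
    _, lcl, ucl = individuals_chart_limits(values)
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

# Hypothetical measurements collected in time order.
measurements = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 12.5, 10.0, 9.9, 10.1]
center, lcl, ucl = individuals_chart_limits(measurements)
flagged = out_of_control_points(measurements)
print(f"center = {center:.2f}, LCL = {lcl:.2f}, UCL = {ucl:.2f}")
print("out-of-control indexes:", flagged)
```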
<p><strong>Conclusion</strong></p>
<p>Any organization can benefit from Six Sigma projects, and those benefits are based on data analysis. Many Six Sigma projects are completed by practitioners who are highly skilled but not expert statisticians. Still, a basic understanding of common Six Sigma statistics, combined with easy-to-use statistical software, will let you handle these statistical tasks and analyze your data with confidence. </p>
Lean Six Sigma, Six Sigma
Thu, 10 Aug 2017 13:58:00 +0000
http://blog.minitab.com/blog/understanding-statistics/5-more-critical-six-sigma-tools-a-quick-guide
Eston Martz
5 Critical Six Sigma Tools: A Quick Guide
http://blog.minitab.com/blog/understanding-statistics/5-critical-six-sigma-tools-a-quick-guide
<p>Six Sigma is a quality improvement method that businesses have used for decades—because it gets results. A Six Sigma project follows a clearly defined series of steps, and companies in every industry in every country around the world have used this method to resolve problems. Along the way, they've saved billions of dollars.</p>
<p>But Six Sigma relies heavily on statistics and data analysis, and many people new to quality improvement feel intimidated by the statistical aspects.</p>
<p>You needn't be intimidated. While it's true that data analysis is critical in improving quality, the majority of analyses in Six Sigma are not hard to understand, even if you’re not very knowledgeable about statistics.</p>
<p>Familiarizing yourself with these tools is a great place to start. This post briefly explains 5 statistical tools used in Six Sigma, what they do, and why they’re important.</p>
1. Pareto Chart
<p><img alt="Pareto Chart" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/014b0ef2847e14b49bd9d18adeb9b309/5_six_sigma_tools_pareto.jpg" style="width: 600px; height: 395px;" /></p>
<p>The Pareto Chart stems from an idea called the Pareto Principle, which asserts that about 80% of outcomes result from 20% of the causes. It's easy to think of examples even in our personal lives. For instance, you may wear 20% of your clothes 80% of the time, or listen to 20% of the music in your library 80% of the time.</p>
<p>The Pareto chart helps you visualize how this principle applies to data you've collected. It is a specialized type of bar chart designed to distinguish the “critical few” causes from the “trivial many” enabling you to focus on the most important issues. For example, if you collect data about defect types each time one occurs, a Pareto chart reveals which types are most frequent, so you can focus energy on solving the most pressing problems. </p>
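<p>The numbers behind a Pareto chart are simple to compute: sort the categories by frequency and track the cumulative percentage. Here is a minimal sketch with made-up defect counts. The category names and the function name are illustrative only.</p>

```python
def pareto_table(counts):
    """Return rows of (category, count, cumulative percent), largest first."""
    total = sum(counts.values())
    rows = []
    running = 0
    for category, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((category, count, 100 * running / total))
    return rows

# Hypothetical defect tallies.
defects = {
    "dent": 30,
    "scratch": 50,
    "other": 3,
    "misalignment": 12,
    "discoloration": 5,
}

for category, count, cumulative in pareto_table(defects):
    print(f"{category:15s} {count:3d} {cumulative:6.1f}%")
```

<p>In this invented example, the top two defect types account for most of the total, which is the "critical few" pattern the chart is designed to reveal.</p>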
2. Histogram
<p><img alt="Histogram" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2bb10c9c739b156c8753d81b2a63cc16/5_six_sigma_tools_histogram.jpg" style="width: 600px; height: 395px;" /></p>
<p>A histogram is a graphical snapshot of numeric, continuous data. Histograms enable you to quickly identify the center and spread of your data. They show you where most of the data fall, as well as the minimum and maximum values. A histogram also reveals whether your data are bell-shaped, and can help you find unusual data points and outliers that may need further investigation. </p>
3. Gage R&R
<p><img alt="gage R&R" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/53a58036bcacc1abbbe345a171bd3cc8/5_six_sigma_tools_gage.jpg" style="width: 600px; height: 444px;" /></p>
<p>Accurate measurements are critical. Would you want to weigh yourself with a scale you know is unreliable? Would you keep using a thermometer that never shows the right temperature? If you can't measure a process accurately, you can't improve it, which is where <span><a href="http://blog.minitab.com/blog/meredith-griffith/fundamentals-of-gage-rr">Gage R&R</a></span> comes in. This tool helps you determine if your continuous numeric measurements (such as weight, diameter, and pressure) are repeatable, meaning the same person gets consistent results when measuring the same part repeatedly, and reproducible, meaning different operators get consistent results when measuring the same part.</p>
4. Attribute Agreement Analysis
<p><img alt="Attribute" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a248d1a75f744990aea0ce8414219166/5_six_sigma_tools_attribute.jpg" style="width: 600px; height: 395px;" /><br />
Another tool for making sure you can trust your data is attribute agreement analysis. Where Gage R&R assesses the repeatability and reproducibility of numeric measurements, attribute agreement analysis assesses categorical ratings, such as Pass or Fail. This tool shows whether people rating these categories agree with a known standard, with other appraisers, and with themselves. </p>
5. Process Capability
<p><img alt="Capability" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/155f21ef5bb7ddf08d0af4cce5425340/5_six_sigma_tools_capability.jpg" style="width: 600px; height: 444px;" /></p>
<p>Nearly every process has an acceptable lower and/or upper bound. For example, a supplier's parts can’t be too large or too small, wait times can’t extend beyond an acceptable threshold, and fill weights need to exceed a specified minimum. Capability analysis shows you how well your process meets specifications and provides insight into how you can improve a poor process. Frequently cited capability metrics include Cpk, Ppk, defects per million opportunities (DPMO), and Sigma level. </p>
Conclusion
<p>Six Sigma can bring significant benefits to any business, but reaping those benefits requires the collection and analysis of data so you can understand opportunities for improvement and make significant and sustainable changes.</p>
<p>The success of Six Sigma projects often depends on practitioners who are highly skilled experts in many fields, but not statistics. But with a basic understanding of the most commonly used Six Sigma statistics and easy-to-use statistical software, you can handle the statistical tasks associated with improving quality, and analyze your data with confidence. </p>
<p> </p>
<p> </p>
Lean Six Sigma, Six Sigma
Tue, 08 Aug 2017 13:58:00 +0000
http://blog.minitab.com/blog/understanding-statistics/5-critical-six-sigma-tools-a-quick-guide
Eston Martz
Poisson Data: Examining the Number Deaths in an Episode of Game of Thrones
http://blog.minitab.com/blog/the-statistics-game/poisson-data-examining-the-number-deaths-in-an-episode-of-game-of-thrones
<p><img alt="Game of Thrones" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/d11b4341996f340e24132eb12253d8e5/game_of_thrones.jpg" style="float: right; width: 250px; height: 141px; margin: 10px 15px; border-width: 1px; border-style: solid;" />There may not be a situation more perilous than being a character on <a href="http://www.hbo.com/game-of-thrones" target="_blank"><em>Game of Thrones</em></a>. Warden of the North, Hand of the King, and apparent protagonist of the entire series? Off with your head before the end of the first season! Last male heir of a royal bloodline? Here, have a pot of molten gold poured on your head! Invited to a wedding? Well, you probably know what happens at weddings in the show. </p>
<p>So what do all these gruesome deaths have to do with statistics? They are data that come from a <a href="http://blog.minitab.com/blog/fun-with-statistics/poisson-processes-and-probability-of-poop">Poisson distribution</a>.</p>
<p>Data from a Poisson distribution describe the number of times an event occurs in a finite observation space. For example, a Poisson distribution can describe the number of defects in the mechanical system of an airplane, the number of calls to a call center, or in our case it can describe the number of deaths in an episode of Game of Thrones.</p>
Goodness-of-Fit Test for Poisson
<p>If you're not certain whether your data follow a Poisson distribution, you can use <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">Minitab Statistical Software</a> to perform a goodness-of-fit test. If you don't already use Minitab and you'd like to follow along with this analysis, download the <a href="http://www.minitab.com/products/minitab/free-trial/">free 30-day trial</a>.</p>
<p>I collected the <a href="http://genius.com/Game-of-thrones-list-of-game-of-thrones-deaths-annotated" target="_blank">number of deaths for each episode</a> of Game of Thrones (as of this writing, 57 episodes have aired), and put them in a Minitab worksheet. Then I went to <strong>Stat > Basic Statistics > Goodness-of-Fit Test for Poisson </strong>to determine whether the data follow a Poisson distribution. You can get the data I used <a href="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f73acb13fa520a25583149f8b780a31c/game_of_thrones_deaths.mtw">here</a>. </p>
<p style="margin-left: 40px;"><img alt="Goodness-of-Fit Test for Poisson Distribution " src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/0c9dcb9ecb6eb644109d86e3501143b3/gof_test_poisson.jpg" style="width: 492px; height: 417px;" /></p>
<p>Before we interpret the p-value, we see that we have a problem. Three of the categories have an expected value less than 5. If the expected value for any category is less than 5, the results of the test may not be valid. To fix our problem, we can combine categories to achieve the minimum expected count. In fact, we see that Minitab actually already started doing this by combining all episodes with 7 or more deaths.</p>
<p>So we'll just continue by making the highest category 6 or more deaths, and the lowest category 1 or 0 deaths. To do this, I created a new column with the categories 1, 2, 3, 4, 5 and 6. Then I made a frequency column that contained the number of occurrences for each category. For example, the "1" category is a combination of episodes with 0 deaths and 1 death, so there were 15 occurrences. Then I ran the analysis again with the new categories.</p>
<p style="margin-left: 40px;"><img alt="Goodness-of-Fit Test for Poisson Distribution " src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/93551e38ce5c4cc5321c249fee184e24/gof_test_poisson_2.jpg" style="width: 420px; height: 323px;" /></p>
<p>Now that all of our categories have expected counts greater than 5, we can examine the p-value. If the p-value is less than the significance level (usually 0.05 works well), you can conclude that the data do not follow a Poisson distribution. But in this case the p-value is 0.228, which is greater than 0.05. Therefore, we cannot conclude that the data do not follow the Poisson distribution, and can continue with analyses that assume the data follow a Poisson distribution. </p>
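<p>The expected counts that drive this decision can be approximated by hand. The sketch below assumes a mean of 3.2 deaths per episode (the rounded rate reported later in this post, so the values will not match Minitab's output exactly) and 57 episodes. It shows which categories fall below an expected count of 5, and what happens after combining categories the way described above.</p>

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k events for a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def expected_counts(mean, n, max_k):
    """Expected counts for 0..max_k-1, plus a final 'max_k or more' category."""
    counts = [n * poisson_pmf(k, mean) for k in range(max_k)]
    counts.append(n - sum(counts))  # everything left over is the upper tail
    return counts

N_EPISODES = 57
MEAN_DEATHS = 3.2  # assumption: rounded rate, not the exact sample mean

# Categories 0, 1, ..., 6, and "7 or more".
raw = expected_counts(MEAN_DEATHS, N_EPISODES, 7)
too_small = [k for k, e in enumerate(raw) if e < 5]

# Combine 0 with 1, and 6 with "7 or more", as in the post.
combined = [raw[0] + raw[1], raw[2], raw[3], raw[4], raw[5], raw[6] + raw[7]]

print("categories with expected count below 5:", too_small)
print("combined expected counts:", [round(e, 1) for e in combined])
```

<p>Under this assumption, three categories start out with expected counts below 5, and every combined category ends up above 5, which matches the pattern described in the post.</p>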
Confidence Interval for 1-Sample Poisson Rate
<p>When you have data that come from a Poisson distribution, you can use <strong>Stat > Basic Statistics > 1-Sample Poisson Rate</strong> to get a rate of occurrence and calculate a range of values that is likely to include the population rate of occurrence. We'll perform the analysis on our data.</p>
<p style="margin-left: 40px;"><img alt="1-Sample Poisson Rate" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/259b9b0cb11fed7e5b7467703f7037ad/1_poisson_rate.jpg" style="width: 489px; height: 133px;" /></p>
<p>The rate of occurrence tells us that on average there are about 3.2 deaths per episode on <em>Game of Thrones</em>. If our 57 episodes were a sample from a much larger population of <em>Game of Thrones</em> episodes, the confidence interval would tell us that we can be 95% confident that the population rate of deaths per episode is between 2.8 and 3.7.</p>
<p>The length of observation lets you specify a value to represent the rate of occurrence in a more useful form. For example, suppose instead of deaths per episode, you want to determine the number of deaths per season. There are 10 episodes per season. So because an individual episode represents 1/10 of a season, 0.1 is the value we will use for the length of observation. </p>
<p style="margin-left: 40px;"><img alt="1-Sample Poisson Rate" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/b6fa9d2e740aacc86d4223ea75487d95/1_poisson_rate_season.jpg" style="width: 495px; height: 106px;" /></p>
<p>With a different length of observation, we see that there are about 32 deaths per season with a confidence interval ranging from 28 to 37.</p>
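<p>The effect of the length of observation is plain arithmetic: dividing a per-episode value by 0.1 expresses it per season. This short sketch applies that rescaling to the rounded estimate and interval quoted above. The function name is hypothetical.</p>

```python
def rescale_rate(per_observation_value, length_of_observation):
    """Express a per-observation rate in units of a longer observation period."""
    return per_observation_value / length_of_observation

LENGTH = 0.1  # one episode is 1/10 of a 10-episode season

# Rounded per-episode values quoted in the post: estimate, lower, upper.
per_episode = {"rate": 3.2, "lower": 2.8, "upper": 3.7}
per_season = {name: round(rescale_rate(value, LENGTH), 1)
              for name, value in per_episode.items()}
print(per_season)
```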
Poisson Regression
<p>The last thing we'll do with our Poisson data is perform a regression analysis. In Minitab, go to <strong>Stat > Regression > Poisson Regression > Fit Poisson Model</strong> to perform a Poisson regression analysis. We'll look at whether we can use the episode number (1 through 10) to predict how many deaths there will be in that episode.</p>
<p style="margin-left: 40px;"><img alt="Poisson Regression" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/0540d6716d13c4de50421155038b2c03/poisson_regression.jpg" style="width: 402px; height: 238px;" /></p>
<p>The first thing we'll look at is the p-value for the predictor (episode). The p-value is 0.042, which is less than 0.05, so we can conclude that there is a statistically significant association between the episode number and the number of deaths. However, the Deviance R-Squared value is only 18.14%, which means that the episode number explains only 18.14% of the variation in the number of deaths per episode. So while an association exists, it's not very strong. Even so, we can use the coefficients to determine how the episode number affects the number of deaths. </p>
<p style="margin-left: 40px;"><img alt="Poisson Regression" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/adb7514fd7892c3b8591895321c96918/poisson_regression_2.jpg" style="width: 241px; height: 227px;" /></p>
<p>The episode number was entered as a categorical variable, so the coefficients show how each episode number affects the number of deaths relative to episode number 1. A positive coefficient indicates that episode number is likely to have more deaths than episode 1. A negative coefficient indicates that episode number is likely to have fewer deaths than episode 1.</p>
<p>We see that each season usually starts slow, as 7 of the 9 episode numbers have positive coefficients. Episodes 8, 9, and 10 have the highest coefficients, meaning that relative to the first episode of the season they have the greatest number of deaths. So even though our model won't be great at predicting the exact number of deaths for each episode, it's clear that the show ends each season with a bang.</p>
<p>So, if you're a <em>Game of Thrones</em> viewer you should brace yourself, because death is coming. Or, as they would say in Essos:</p>
<p><em>Valar morghulis.</em></p>
Data Analysis, Fun Statistics, Statistics, Statistics in the News
Tue, 18 Jul 2017 12:03:00 +0000
http://blog.minitab.com/blog/the-statistics-game/poisson-data-examining-the-number-deaths-in-an-episode-of-game-of-thrones
Kevin Rudy
Cp and Cpk: Two Process Perspectives, One Process Reality
http://blog.minitab.com/blog/statistics-and-quality-data-analysis/cp-and-cpk-two-process-perspectives-one-process-reality
<p>It’s usually not a good idea to rely solely on a single statistic to draw conclusions about your process. Do that, and you could fall into the clutches of the “duck-rabbit” illusion shown here:</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/4ee77630a518a133bd8146e5e96b3e28/cpk_cp_cropped.jpg" style="line-height: 18.9090900421143px; margin: 10px 15px; width: 353px; height: 183px;" /></p>
<p>If you fix your eyes solely on the duck, you’ll miss the rabbit—and vice-versa.</p>
<p><span style="line-height: 18.9090900421143px;">If you're using <a href="http://www.minitab.com/products/minitab">Minitab Statistical Software</a> for capability analysis, t</span>he capability indices Cp and Cpk are good examples of this. If you focus on only one measure, and ignore the other, you might miss seeing something critical about the performance of your process. </p>
Cp: A Tale of Two Tails
<p>Cp is a ratio of the specification spread to the process spread. The process spread is often defined as the 6-sigma spread of the process (that is, 6 times the within-subgroup standard deviation). Higher Cp values indicate a more capable process.</p>
<p>When the specification spread is considerably greater than the process spread, Cp is high.</p>
<p style="margin-left: 40px;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/73367467a6c16919e9bc030f1a63c913/cp_high.jpg" style="width: 328px; height: 213px;" /></p>
<p>When the specification spread is less than the process spread, Cp is low.</p>
<p style="margin-left: 40px;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/2818c4502396e8b64627502918ddd08f/cp_low.jpg" style="width: 319px; height: 217px;" /></p>
<p>By using the 6-sigma process spread, Cp incorporates information about both tails of the process data. But there’s something Cp doesn’t do—it doesn’t tell you anything about the location of the process data.</p>
<p>For example, the following two processes have about the same Cp value (≈ 3):</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/228e5e4d4b24aa4eb760c3db8f36a5a0/cp_same.jpg" style="width: 582px; height: 210px;" /></p>
<p>Obviously, Process B has a serious issue with its location in relation to the spec limits that Cp just can't "see."</p>
Cpk: Location, Location, Location!
<p>Like Cp, Cpk is a ratio that relates the specification to the process spread. But unlike Cp, Cpk compares the distance from the process mean to the closest specification limit with half the spread of the process (often, the 3-sigma spread).</p>
<p>When the distance from the mean to the nearest specification limit is considerably greater than the one-sided process spread, Cpk is high.</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/34c73f8708aaf7af89d37ac2e38ee8cf/cpk_high.jpg" style="width: 318px; height: 216px;" /></p>
<p>When the distance from the mean to the nearest specification limit is less than the one-sided process spread, Cpk is low.</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/4b0494897f6986f25d7f6c638eb01191/cpk_low.jpg" style="width: 326px; height: 214px;" /></p>
<p>Notice how the location of the process <em>does</em> affect the Cpk value—by virtue of its being calculated using the process mean.</p>
<p>Yet there's something important that Cpk doesn't do. Because it's a "worst-case" estimate that uses only the nearest specification limit, Cpk can't "see" how the process is performing on the other side.</p>
<p>For example, the following two processes have about the same Cpk value (≈ 0.9):</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/7ef5c1fb343200b07f0914b4db5d813e/cpk_same.jpg" style="width: 554px; height: 208px;" /><br />
Notice that Process X has nonconforming parts in relation to both spec limits, while Process Y has nonconforming parts in relation to only the upper spec limit (USL). But Cpk can't "see" any difference between these two processes.</p>
<p>To get the two-sided picture of each process, in relation to both spec limits, you can look at Cp, which would be higher for Process Y than for Process X.</p>
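<p>Both blind spots are easy to reproduce with the definitions given above. In this sketch the spec limits, means, and standard deviations are invented to mimic the two pairs of processes in the figures. They are not the values behind the actual graphs.</p>

```python
def cp(lsl, usl, sigma):
    """Specification spread divided by the 6-sigma process spread."""
    return (usl - lsl) / (6 * sigma)

def cpk(lsl, usl, mean, sigma):
    """Distance from the mean to the nearest spec limit, over the 3-sigma spread."""
    return min(usl - mean, mean - lsl) / (3 * sigma)

# Hypothetical processes A and B: same spread, different location.
a = {"lsl": 0, "usl": 18, "mean": 9, "sigma": 1}
b = {"lsl": 0, "usl": 18, "mean": 17, "sigma": 1}

# Hypothetical processes X and Y: similar Cpk, different spread.
x = {"lsl": 0, "usl": 10, "mean": 5.0, "sigma": 1.85}
y = {"lsl": 0, "usl": 10, "mean": 7.3, "sigma": 1.0}

for name, p in [("A", a), ("B", b), ("X", x), ("Y", y)]:
    cp_value = cp(p["lsl"], p["usl"], p["sigma"])
    cpk_value = cpk(p["lsl"], p["usl"], p["mean"], p["sigma"])
    print(f"Process {name}: Cp = {cp_value:.2f}, Cpk = {cpk_value:.2f}")
```

<p>Processes A and B share a Cp of 3, yet B's Cpk drops sharply because its mean sits close to the upper limit. Processes X and Y share a Cpk of about 0.9, yet Y's Cp is much higher because only one of its tails is in trouble.</p>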
Summing Up: Look for Ducks, Rabbits, and Other Critters as Well
<p>Avoid getting too fixated on any single statistic. If you have both a lower and upper specification limit for your process, Cp and Cpk each might “know” something about your process that the other one doesn’t. That “something” could be critical to fully understand how your process is performing.</p>
<p>To see a concrete example of how Cp and Cpk work together, using real data from the National Renewable Energy Laboratory, see <a href="http://blog.minitab.com/blog/statistics-and-quality-improvement/process-capability-statistics-cp-and-cpk-working-together" target="_blank">this post by Cody Steele</a>.</p>
<p>By the way, the potential "blind spot" for Cp and Cpk also applies to Pp and Ppk. The only difference is that the process spread for those indices is calculated using the overall standard deviation, instead of the within-subgroup standard deviation. For more on that distinction, see <a href="http://blog.minitab.com/blog/michelle-paret/process-capability-statistics-cpk-vs-ppk" target="_blank">this post by Michelle Paret</a>.</p>
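<p>To make the overall-versus-within distinction concrete, here is a rough Python sketch with made-up subgroup data. It uses a simple pooled standard deviation as the within-subgroup estimate; that choice is an assumption for illustration, since statistical software typically offers several estimation methods and may apply unbiasing constants that are omitted here.</p>

```python
import math
import statistics


def overall_std(subgroups):
    """Sample standard deviation of all observations, ignoring subgroups."""
    values = [x for group in subgroups for x in group]
    return statistics.stdev(values)


def pooled_within_std(subgroups):
    """Pooled standard deviation: variation inside subgroups only."""
    num = sum((len(g) - 1) * statistics.variance(g) for g in subgroups)
    den = sum(len(g) - 1 for g in subgroups)
    return math.sqrt(num / den)


# Hypothetical subgroups whose means drift over time.
subgroups = [
    [9.8, 10.1, 10.0, 9.9],
    [10.6, 10.4, 10.5, 10.7],
    [9.3, 9.5, 9.4, 9.2],
]

print("within :", round(pooled_within_std(subgroups), 3))
print("overall:", round(overall_std(subgroups), 3))
```

<p>Because the subgroup means shift, the overall standard deviation is larger than the within-subgroup one, so indices based on the overall value (Pp, Ppk) would come out lower than Cp and Cpk for this data.</p>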
<p>And if you’re interested in other optical and statistical illusions, check out <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/optical-illusions-zen-koans-and-simpsons-paradox" target="_blank">this post on Simpson's paradox</a>.</p>
Capability Analysis, Quality Improvement, Statistics
Thu, 29 Jun 2017 17:03:00 +0000
http://blog.minitab.com/blog/statistics-and-quality-data-analysis/cp-and-cpk-two-process-perspectives-one-process-reality
Patrick Runkel
Gleaning Insights from Election Data with Basic Statistical Tools
http://blog.minitab.com/blog/statistics-and-more/gleaning-insights-from-election-data-with-basic-statistical-tools
<p>One of the biggest pieces of international news last year was the so-called "Brexit" referendum, in which a majority of voters in the United Kingdom cast their ballots to leave the European Union (EU).</p>
<p><img alt="Polling station in the United Kingdom" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ccd0e2bc94b8749ffd9e4ecf49ccd179/polling_station.jpg" style="width: 300px; height: 244px; margin: 10px 15px; float: right;" />That outcome shocked the world. Follow-up media coverage has asserted that the younger generation prefers to remain in the EU since that means more opportunities on the continent. The older generation, on the other hand, prefers to leave the EU.</p>
<p>As a statistician, I wanted to look at the data to see what I could find out about the Brexit vote, and recently the BBC <a href="http://www.bbc.co.uk/news/uk-politics-38762034">published an article</a> that included some detailed data.</p>
<p>In this post, I'll use Minitab Statistical Software to explore the data from the BBC site along with the <a href="https://www.electoralcommission.org.uk/find-information-by-subject/elections-and-referendums/past-elections-and-referendums/eu-referendum">data from the Electoral Commission website</a>. I hope this exploration will give you some ideas about how you might use publicly available data to get insights about your customers or other aspects of your business.</p>
<p>The Electoral Commission data contains the voting details of all 382 counting areas in the United Kingdom. It includes information on voter turnout, the percent who voted to leave the EU, and the percent who voted to remain. (If you'd like to follow along, open the <a href="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d851330cd1b38a9afba9cf524c3353e7/brexitdata1.mtw">BrexitData1</a> and <a href="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/eb505bc52b3837fa15673329836d76d3/brexitdata2.mtw">BrexitData2</a> worksheets in Minitab 18. If you don't already have Minitab, you can <a href="http://www.minitab.com/products/minitab/free-trial/">download the 30-day trial</a>.)</p>
<p>I began by creating scatterplots (in Minitab, go to <strong>Graph > Scatterplot...</strong>) of the percentage of voter turnout against the percentage of the population that voted to leave for each region, as shown below.</p>
<p style="margin-left: 40px;"><img alt="Scatterplot of Brexit Voter Data1" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/54eb0e04d859e1401f55def43f23bb24/brexit_scatterplot_1.png" style="width: 577px; height: 385px;" /></p>
<p style="margin-left: 40px;"><img alt="Scatterplot of Brexit Voter Data, #2" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/01aa20381fe1d7e4cb3098af1eba36f4/brexit_scatterplot_2.png" style="width: 577px; height: 385px;" /></p>
<p>According to commentators, areas with high voter turnout had a tendency to vote to leave, as the elderly were more likely to turn up to vote. There is also a perceptible difference between the plots for the different areas.</p>
<p>To make this easier to analyze, I created an indicator variable called “decided to leave” in my Minitab worksheet. This variable takes the value of 1 if the area voted to leave the EU, and takes the value 0 otherwise. Tallying the number of areas in each region that voted to leave or remain (<strong>Stat > Tables > Tally Individual Variables...</strong>) yields the following:</p>
<p style="margin-left: 40px;"><img alt="Tabulated Brexit Statistics: Region, Decided to Leave" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e2223885372be6070d6a403cba2a1604/tabulated_statistics_region_decided_to_leave.jpg" style="width: 503px; height: 426px;" /></p>
<p>There are indeed regional differences. For example, London and Scotland voted strongly to remain while North East and North West voted strongly to leave. So, do we see greater voter turnout in the regions that voted to leave? Looking at the average turnout in each region (using <strong>Stat > Display Descriptive Statistics...</strong>), we have the following:</p>
<p style="margin-left: 40px;"><img alt="Brexit Data - Descriptive Statistics" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/de2ce324872b704f7bd93a2ff2954d8a/descriptive_statistics_percent_turnout.jpg" style="width: 380px; height: 346px;" /></p>
<p>Surprisingly, the average turnout of regions that voted strongly to leave is not very different from the turnout of regions that voted strongly to remain. For example, the average turnout was 69.817% in London, compared to 70.739% in the North West.</p>
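<p>For readers who want to reproduce the same steps outside Minitab, the indicator variable, the tally, and the regional averages can be sketched in a few lines of Python. The rows below are invented placeholders, not values from the worksheets, and the 50% cutoff for "decided to leave" is my assumption about how the indicator is defined.</p>

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical rows: (region, area, percent_leave, percent_turnout)
areas = [
    ("London", "Area A", 39.0, 70.1),
    ("London", "Area B", 55.2, 68.4),
    ("North West", "Area C", 61.3, 71.0),
    ("North West", "Area D", 48.7, 69.5),
    ("Scotland", "Area E", 35.4, 66.2),
]

tally = Counter()            # counts per (region, decided_to_leave)
turnout = defaultdict(list)  # turnout percentages per region

for region, _area, pct_leave, pct_turnout in areas:
    # Indicator variable: 1 if the area voted to leave, 0 otherwise.
    decided_to_leave = 1 if pct_leave > 50 else 0
    tally[(region, decided_to_leave)] += 1
    turnout[region].append(pct_turnout)

for region in sorted(turnout):
    print(
        region,
        "| leave:", tally[(region, 1)],
        "| remain:", tally[(region, 0)],
        "| mean turnout:", round(mean(turnout[region]), 2),
    )
```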
<p>The data set analyzed in the BBC article contains localised voting data supplied to the BBC by councils which counted the EU referendum. This data is more detailed than the regional data from the Electoral Commission, and it includes a detailed breakdown of how the people in individual electoral wards voted.</p>
<p>The BBC asked all the counting areas for these figures. Three councils did not reply. The remaining missing data could be due to any of the following reasons:</p>
<ul>
<li>The council refused to give the information to the BBC.</li>
<li>No geographical information was available because all ballot boxes were mixed before counting.</li>
<li>The council conducted a number of mini-counts that combined ballot boxes in a way that does not correspond to individual wards.</li>
</ul>
<p>For those wards that have voting data, I also gathered the following information from the last census for each area.</p>
<ul>
<li>Percent of population in an area with level 4 qualification or higher. This includes individuals with a higher certificate/diploma, foundation degree, undergraduate degree, or master’s degree up to a doctorate. I will call this variable “degree” to represent individuals holding degrees or equivalent qualification.</li>
<li>Percentage of young people (age 18-29) in an area.</li>
<li>Percentage of middle-aged (age 30-59) in an area.</li>
<li>Percentage of elderly (age 65 or above) in an area.</li>
</ul>
<p>There is some difference in how some wards are defined between this data set and the data from the last census, perhaps due to changes in ward boundaries. Thus, for some wards, it was not possible to match the corresponding percentages of different age groups and degree holders. Therefore, some areas had to be omitted from my analysis, leaving me with data from a total of 1,069 wards.</p>
<p>With the exception of Scotland, Northern Ireland, and Wales, I have data from wards in all regions of the UK. The number of measurements from each region appears below.</p>
<p style="margin-left: 40px;"><img alt="Brexit Data, Descriptive Statistics N" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6c992d074b0c8616f39ebced04f84a8d/descriptive_statistics_n_brexit_data.png" style="width: 418px; height: 312px;" /></p>
<p>As with the Electoral Commission data, let’s begin by looking at some graphs. Below is a scatterplot of the percentage voting to leave against the percent of the population with a degree in an area.</p>
<p style="margin-left: 40px;"><img alt="Scatterplot of Brexit Data: Leave % vs. Degree" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3f343dda6be65f6b25e657f39129b540/brexit_scatterplot_3.png" style="width: 577px; height: 385px;" /></p>
<p>As you can see, the higher the percentage of people in an area who had a degree, the lower the percentage of the population that voted to leave. There are exceptions, however. For example, in Osterley and Spring Grove in Hounslow, 63.41% voted to leave even though a relatively high 37.5566% of residents hold degrees. That area also has a small proportion of young adults, at 19.3538%.</p>
<p>Let's look at the voting behaviour for different age groups. I created scatterplots of the percentage that voted to leave against different age groups.</p>
<p>The next plot shows the percentage that voted to leave against the percentage of young people (age 18-29) in an area:</p>
<p style="margin-left: 40px;"><img alt="Scatterplot of Brexit Data: Leave% vs Young" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/83203f8cfa58ea8a30e6e3a0b39d5d23/brexit_scatterplot_4.png" style="width: 577px; height: 385px;" /></p>
<p>Areas with a higher percentage of young people appear to have a smaller percentage of people who voted to leave.</p>
<p>The following plot shows the percentage of the population that voted to leave against the percentage of elderly residents:</p>
<p style="margin-left: 40px;"><img alt="Scatterplot of Brexit Data: Leave% vs. Elderly" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/708bbc59b66d9eb6028472cb3e7feb1f/brexit_scatterplot_5.png" style="width: 577px; height: 385px;" /></p>
<p>This plot shows the opposite of the situation in the previous one: areas with a higher proportion of elderly residents voted more strongly to leave.</p>
<p>These scatterplots support what’s being said in pieces such as the article on the BBC's website. However, in statistics, we like to verify that the relationship is significant. Let’s look at the correlation coefficients (<strong>Stat > Basic Statistics > Correlation...</strong>).</p>
<p style="margin-left: 40px;"><img alt="Brexit Data: Correlation - Leave%, Degree, Young, Elderly" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a3eaedc8c0404005e8caec42699d8ad1/correlation_coefficients.jpg" style="width: 340px; height: 416px;" /></p>
<p>The correlation output in Minitab includes a <a href="http://blog.minitab.com/blog/adventures-in-statistics-2/how-to-correctly-interpret-p-values">p-value</a>. If the p-value is less than the chosen significance level, it tells you the correlation coefficient is significantly different from 0—in other words, a correlation exists. Since we selected an alpha value (or significance level) of 0.05, we can say that all the coefficients calculated above are significant and that there are correlations between these factors.</p>
<p>Thus, a higher proportion of degree holders in an area is strongly associated with a lower percentage voting to leave. On the other hand, a higher proportion of elderly residents in an area is strongly associated with a higher percentage voting to leave.</p>
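<p>As a rough illustration of what a correlation test does, here is a Python sketch that computes a Pearson correlation coefficient and an approximate two-sided p-value. The usual test refers the statistic to a t distribution with n − 2 degrees of freedom; this sketch substitutes a normal approximation, which is only reasonable for large samples such as the 1,069 wards here. The data points and the r values passed to the p-value function are hypothetical.</p>

```python
import math
from statistics import NormalDist, mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for H0: correlation = 0, using a normal
    approximation to the t statistic (assumes large n and |r| < 1)."""
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Hypothetical toy data: percent with a degree vs. percent voting leave.
degree = [15, 22, 30, 38, 45]
leave = [66, 60, 55, 47, 43]
print("r on toy data:", round(pearson_r(degree, leave), 3))

# Hypothetical correlations evaluated at the sample size used in the post.
ALPHA = 0.05
for r in (-0.5, 0.03):
    p = approx_p_value(r, n=1069)
    print(r, "significant" if p < ALPHA else "not significant")
```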
<p>Going a step further, I fit a regression model (<strong>Stat > Regression > Regression > Fit Regression Model...</strong>) that links the percent voting to leave with the proportion of degree holders and different age groups.</p>
<p style="margin-left: 40px;"><img alt="Brexit Data Regression: Leave% vs Degree, Young, Middle-age, Elderly" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d9f5483603ddb35ffdea02a3c0f856e6/brexit_regression_output.jpg" style="width: 695px; height: 655px;" /></p>
<p>While there is no need to use the equation to make a prediction, we can still get some interesting information from the results.</p>
<p>The different age groups and proportion of degree holders all have an impact on the percentage voting to leave. The coefficient for the “degree” term is negative, and this implies for each unit increase in the percent of degree holders, the percentage voting to leave drops by 1.4095. On the other hand, for a unit increase in the percentage of elderly, the percentage voting to leave increases by 1.2732. In addition, there is a significant interaction between the percentage of degree holders and young people: Every unit increase in this interaction term only increases the percent voting to leave by 0.00641.</p>
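<p>One way to read an interaction term is that it changes the slope of one predictor depending on the level of the other. The Python sketch below uses only the two coefficients quoted above and assumes the interaction enters the model as the product of the degree and young percentages, with no other terms involving degree. The young percentages in the loop are arbitrary illustration values.</p>

```python
# Coefficients quoted in the text; the intercept and the remaining terms
# of the fitted equation are not needed for this calculation.
B_DEGREE = -1.4095
B_DEGREE_X_YOUNG = 0.00641


def degree_effect(young_pct):
    """Change in percent voting leave per one-point increase in degree
    holders, evaluated at a given percentage of young people."""
    return B_DEGREE + B_DEGREE_X_YOUNG * young_pct


for young_pct in (10, 20, 30):
    print(young_pct, round(degree_effect(young_pct), 4))
```

<p>Under those assumptions, the effect of degree holders stays negative across realistic values of the young percentage; the interaction only softens it slightly.</p>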
<p>The results I obtained when I analyzed the data with Minitab support the commonly held view that younger voters preferred to remain in the EU, while older voters preferred to leave. The analysis also underscores the complicated politics surrounding Brexit, a reality that became apparent in the recent general election. One thing seems certain now that Brexit talks are imminent: balancing the needs and desires of the people from different age groups and backgrounds will be a tremendous task.</p>
Data Analysis, Government, Hypothesis Testing, Regression Analysis, Statistics, Statistics in the News
Mon, 26 Jun 2017 12:44:51 +0000
http://blog.minitab.com/blog/statistics-and-more/gleaning-insights-from-election-data-with-basic-statistical-tools
Eugenie Chung
Need to Validate Minitab per FDA Guidelines? Get Minitab's Validation Kit
http://blog.minitab.com/blog/understanding-statistics/need-to-validate-minitab-per-fda-guidelines-get-minitabs-validation-kit
<p>Last week I was fielding questions on social media about Minitab 18, the latest version of our statistical software. Almost as soon as the new release was announced, we received a question that comes up often from people in pharmaceutical and medical device companies:</p>
<p><img alt="pills" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a6d342c977e9c879a9dafcb9a1f739b8/pills_thumbg.png" style="width: 200px; height: 200px; border-width: 1px; border-style: solid; margin: 10px 15px; float: right;" />"Is Minitab 18 FDA-validated?"</p>
How Software Gets Validated
<p>That's a great question. To satisfy U.S. Food and Drug Administration (FDA) regulatory requirements, many firms—including those in the pharmaceutical and medical device industries—must validate their data analysis software. That can be a big hassle, so to make this process easier, Minitab offers a <a href="https://www.minitab.com/support/software-validation/">Validation Kit</a>.</p>
<p>We conduct extremely rigorous and extensive internal testing of Minitab Statistical Software to assure the numerical accuracy and reliability of all statistical output. Details on our software testing procedures can be found in the validation kit. The kit also includes an automated macro script to generate various statistical and graphical analyses on your machine. You can then compare your results to the provided output file that we have validated internally to ensure that the results on your machine match the validated results.</p>
Intended Use
<p>FDA regulations state that the <em>purchaser</em> must validate software used in production or as part of a quality system for the “intended use” of the software. FDA’s Code of Federal Regulations Title 21 Part 820.70(i) lays it out:</p>
<p style="margin-left:40px;"><em>“When computers or automated data processing systems are used as part of production or the quality system, the manufacturer shall validate computer software for its intended use according to an established protocol.”</em></p>
<p>FDA provides additional guidance for medical device makers in Section 6.3 of “Validation of Automated Process Equipment and Quality System Software” in the Principles of Software Validation; Final Guidance for Industry and FDA Staff, January 11, 2002.</p>
<p style="margin-left:40px;"><em>“The device manufacturer is responsible for ensuring that the product development methodologies used by the off-the-shelf (OTS) software developer are appropriate and sufficient for the device manufacturer's intended use of that OTS software. For OTS software and equipment, the device manufacturer may or may not have access to the vendor's software validation documentation. If the vendor can provide information about their system requirements, software requirements, validation process, and the results of their validation, the medical device manufacturer can use that information as a beginning point for their required validation documentation.”</em></p>
<p>Validation for intended use consists of mapping the software requirements to test cases, where each requirement is traced to a test case. Test cases can contain:</p>
<ul>
<li>A test case description. For example, <em>Validate capability analysis for Non-Normal Data.</em></li>
<li>Steps for execution. For example, go to <strong>Stat > Quality Tools > Capability Analysis > Nonnormal</strong> and enter the column to be evaluated and select the appropriate distribution.</li>
<li>Test results (with screen shots).</li>
<li>Test pass/fail determination.</li>
<li>Tester signature and date.</li>
</ul>
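<p>If you track validation work in code or a script rather than on paper, the elements listed above map naturally onto a small record type. The following Python sketch is purely illustrative: the class, the field names, and the requirement IDs are hypothetical and are not part of Minitab's validation kit or of any FDA template.</p>

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ValidationTestCase:
    requirement_id: str            # requirement this test case traces to
    description: str
    steps: List[str]
    results: str = ""              # e.g. notes or screenshot file names
    passed: Optional[bool] = None  # None until the test has been run
    tester: str = ""
    signed_on: Optional[date] = None


def untraced_requirements(requirement_ids, test_cases):
    """Return the requirements that no test case traces to."""
    covered = {tc.requirement_id for tc in test_cases}
    return [r for r in requirement_ids if r not in covered]


# Hypothetical requirement IDs and a single test case.
requirements = ["REQ-001", "REQ-002"]
cases = [
    ValidationTestCase(
        requirement_id="REQ-001",
        description="Validate capability analysis for non-normal data",
        steps=[
            "Open Stat > Quality Tools > Capability Analysis > Nonnormal",
            "Enter the column to be evaluated",
            "Select the appropriate distribution",
        ],
    )
]

print("Requirements without a test case:", untraced_requirements(requirements, cases))
```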
An Example
<p>There is good reason for the “intended use” guidance when it comes to validation. Here is an example:</p>
<p>Company XYZ is using Minitab to estimate the probability of a defective part in a manufacturing process. If the size of Part X exceeds 10, the product is considered defective. They use Minitab to perform a capability analysis by selecting <strong>Stat > Quality Tools > Capability Analysis > Normal</strong>.</p>
<p>In the following graph, the Ppk (1.32) and PPM (37 defects per million) are satisfactory.</p>
<p><img alt="Not Validated for Non-Normal Capability Analysis" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3e39dbd6b080eb43dfa86975c27d72d4/process_capability_report_for_factor_x.png" style="border: 0px; vertical-align: middle; width: 600px; height: 444px;" /></p>
<p>However, these good numbers would mislead the manufacturer into believing this is a good process. Minitab's calculations are correct, but this data is non-normal, so normal capability analysis was the wrong procedure to use.</p>
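<p>To see why the normal-based numbers hang together, note that with a single upper spec limit and a normal model, Ppk fixes the expected fraction beyond the limit. The Python sketch below shows that relationship; it assumes the data are normal and that Ppk is determined by the upper spec limit, which is exactly the assumption that fails for this data set.</p>

```python
from statistics import NormalDist


def ppm_above_usl(ppk):
    """Expected parts per million above the upper spec limit, assuming a
    normal distribution where the limit sits 3 * Ppk standard deviations
    above the mean."""
    z = 3 * ppk
    return (1 - NormalDist().cdf(z)) * 1_000_000


print(round(ppm_above_usl(1.32), 1))
```

<p>For a Ppk of 1.32, this lands in the neighborhood of the 37 PPM shown in the report, so the output is internally consistent. The issue is the distributional assumption, not the arithmetic.</p>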
<p>Fortunately, Minitab also offers non-normal capability analysis. As shown in the next graph, if we choose <strong>Stat > Quality Tools > Capability Analysis > Nonnormal</strong> and select an appropriate distribution (in this case, <span><a href="http://blog.minitab.com/blog/understanding-statistics/weibull-wobble-process-capability-analysis-with-nonnormal-data">Weibull</a></span>), we find that the Ppk (1.0) and PPM (1343 defects per million) are actually not acceptable:</p>
<p><img alt="Validated for Non Normal Capability Analysis" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/da2ef17e694260fe68ea862f61369888/nonnormal_process_capability_for_factor_x.png" style="border: 0px; vertical-align: middle; width: 600px; height: 444px;" /></p>
<p>Thoroughly identifying, documenting, and validating all intended uses of the software helps protect both businesses that make FDA-regulated products and the people who ultimately use them.</p>
Software Validation Resources from Minitab
<p>To download Minitab's software validation kit, visit <a href="http://www.minitab.com/support/software-validation/">http://www.minitab.com/support/software-validation/</a></p>
<p>In addition to details regarding our testing procedures and a macro script for comparing your results to our validated results, the kit also includes software lifecycle information.</p>
<p>Additional information about validating Minitab relative to the FDA guideline CFR Title 21 Part 11 is available at this link:</p>
<p><a href="http://it.minitab.com/support/answers/answer.aspx?id=2588">http://it.minitab.com/support/answers/answer.aspx?id=2588</a></p>
<p>If you have any questions about our software validation process, please <a href="http://www.minitab.com/contact-us">contact us</a>.</p>
Manufacturing, Medical Devices, Project Tools, Quality Improvement
Fri, 16 Jun 2017 12:00:00 +0000
http://blog.minitab.com/blog/understanding-statistics/need-to-validate-minitab-per-fda-guidelines-get-minitabs-validation-kit
Eston Martz
A Swiss Army Knife for Analyzing Data
http://blog.minitab.com/blog/understanding-statistics/a-swiss-army-knife-for-analyzing-data
<p>Easy access to the right tools makes any task easier. That simple idea has made the Swiss Army knife essential for adventurers: just one item in your pocket gives you instant access to dozens of tools when you need them. </p>
<p><img alt="swiss army knife" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c696dbfa168de2c0272bd81d6bec5a9a/swiss_army_knife.png" style="width: 250px; height: 161px; margin: 10px 15px; float: right;" />If your current adventures include analyzing data, the multifaceted Editor menu in <a href="http://www.minitab.com/products/minitab">Minitab Statistical Software</a> is just as essential.</p>
Minitab’s Dynamic Editor Menu
<p>Whether you’re organizing a data set, sifting through Session window output, or perfecting a graph, the Editor menu adapts so that you never have to search for the perfect tool.</p>
<p>The Editor menu only contains tools that apply to the task you're engaged in. When you’re working with a data set, the menu contains only items for use in the worksheet. When a graph is active, the menu contains only graph-related tools. You get the idea.</p>
Graphing
<p>When a graph window is active, the Editor menu contains over a dozen graph tools. Here are a few of them.</p>
<p style="margin-left: 40px;"><img alt="editor menu for graphs" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/37928ac19a835ac5e18654c6b4df1fc7/editor_menu_graphs.png" style="line-height: 20.8px; width: 383px; height: 409px;" /></p>
<p><strong>ADD</strong></p>
<p>Use <strong>Editor > Add</strong> to add reference lines, labels, subtitles, and much more to your graphs. The contents of the Add submenu will change depending on the type of graph you're editing.</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/676698418ef6540f9e8ba4ef7a01a189/editor_graph_add_ref_lines.jpg" style="width: 522px; height: 387px;" /></p>
<p><strong>MAKE SIMILAR GRAPH</strong></p>
<p>The editing features in Minitab graphs make it easy to create a graph that looks just right. But it may not be easy to reproduce that look a few hours (or a few months) later.</p>
<p>With most graphs, you can use <strong>Editor > Make Similar Graph</strong> to produce another graph with the same edits, but with new variables.</p>
<p><img alt="make similar graph dialog" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3563cb93afb5f1c8f89da8007088be6e/editor_make_similar_graph.jpg" style="width: 688px; height: 386px;" /></p>
Entering data and organizing your worksheet
<p>When a worksheet is active, the Editor menu contains tools to manipulate both the layout and contents of your worksheet. You can add column descriptions; insert cells, columns or rows; and much more, including the items below.</p>
<p><strong>VALUE ORDER</strong></p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8bb6a840126fa8b13598366b10de5880/editor_value_order.jpg" style="width: 477px; height: 409px;" /></p>
<p>By default, Minitab displays text data alphabetically in output. But sometimes a different order is more appropriate (for example, “Before” then “After”, instead of alphabetical order). Use <strong>Editor > Column > Value Order</strong> to ensure that your graphs and other output appear the way that you intend.</p>
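<p>The same idea shows up whenever you sort categories in code. As a small analogy in Python (not a description of how Minitab implements value order), you can rank labels by their position in an explicit list instead of relying on alphabetical order:</p>

```python
# Desired display order, analogous to a user-specified value order.
VALUE_ORDER = ["Before", "After"]

labels = ["After", "Before", "After", "Before"]

# Alphabetical order puts "After" first.
alphabetical = sorted(set(labels))

# Rank each label by its position in VALUE_ORDER instead.
rank = {value: position for position, value in enumerate(VALUE_ORDER)}
custom = sorted(set(labels), key=rank.__getitem__)

print(alphabetical)
print(custom)
```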
<p><strong>ASSIGN FORMULA TO COLUMN</strong></p>
<p><img alt="editor menu assign formula" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3a2ba0b681ebb327efc35ffb0b4d2d50/editor_menu_assign_formula.png" style="width: 512px; height: 345px;" /></p>
<p>You can assign a formula to a worksheet column that updates when you add or change data.</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/afa18b7fc813a88a4839a1ce02539d20/editor_formula_dialog.jpg" style="width: 500px; height: 376px;" /></p>
Session window
<p>As the repository for output, the Session window is already an important component of any Minitab project, but the Editor menu makes it even more powerful. </p>
<p><strong>SHOW COMMAND LINE</strong></p>
<p>For example, most users rely on menus to run analyses, but you can extend the functionality of Minitab and save time on routine tasks with Minitab macros. If you select the "Show Command Line" option, you'll see the command language generated with each analysis, which opens the door to macro writing.</p>
<p><img alt="editor-menu-show-command-line" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e0ca0518f8b0f5a558226c3075533393/editor_menu_command_line.png" style="width: 466px; height: 258px;" /></p>
<p>In previous versions of Minitab, the Command Line appeared in the Session window. In Minitab 18, the Command Line appears in a separate pane, which keeps the Session window output clean and displays all of the commands together. The new Command Line pane is highlighted in the screen shot below:</p>
<p><img alt="graph with command pane" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/655ce14d79b4e7fdc8ebd983d2640c04/graph_with_command_pane.png" style="width: 800px; height: 327px; border-width: 1px; border-style: solid;" /></p>
<p><strong>NEXT COMMAND / PREVIOUS COMMAND / EXPAND ALL / COLLAPSE ALL</strong></p>
<p>After you run several analyses, you may have a great deal of output in your Session window. This group of items makes it easy to find the results that you want, regardless of project size.</p>
<p>Next Command and Previous Command will take you forward or back one step from the currently selected location in your output.</p>
<p style="margin-left: 40px;"><img alt="editor menu - next command, expand or collapse all" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/20f8b1f65052f9051f1f29aeee7bac9d/editor_menu_next_collapse.png" style="width: 237px; height: 255px;" /></p>
<p>Expand All and Collapse All capitalize on a new feature in Minitab 18's redesigned Session window. Now you can select individual components of your output and choose whether to display all of the output (Expanded), or only the output title (Collapsed). Here's an example of an expanded output item:</p>
<p><img alt="expanded session window item" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/706ed8d7c96c02e2d3ed59b058af1589/expanded_session_item.png" style="width: 803px; height: 440px;" />And here's how the same output item appears when collapsed:</p>
<p><img alt="collapsed session item" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3f8d321c228c7805bbd7f7062c533bd6/collapsed_session_item.png" style="width: 823px; height: 114px; border-width: 1px; border-style: solid;" /></p>
<p>When you have a lot of output items in the Session window, the "Collapse All" function lets you scroll through them quickly and find exactly the piece of your analysis you need at any given moment. </p>
Graph brushing
<p>Graph exploration sometimes calls for <span><a href="http://blog.minitab.com/blog/statistics-and-quality-improvement/how-to-use-brushing-to-investigate-outliers-on-a-graph">graph brushing, which is a powerful way to learn more about the points on a graph that interest you</a></span>. Here are two of the specialized tools in the Editor menu when you are in “brushing mode”.</p>
<p><strong>SET ID VARIABLES</strong></p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/95a90bfe1151233233e195760b9468ae/editor_set_id_variables.jpg" style="width: 579px; height: 387px;" /></p>
<p>It’s easy to spot an outlier on a graph, but do you know why it’s an outlier? Setting ID variables allows you to see all of the information that your dataset contains for an individual observation, so that you can uncover the factors that are associated with its abnormality.</p>
<p><strong>CREATE INDICATOR VARIABLE</strong></p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/600632b4bae6d6d64edb934279dfee70/editor_create_indicator.jpg" style="width: 439px; height: 266px;" /></p>
<p>As you brush points on a graph, an indicator variable “tags” the observations in the worksheet. This enables you to identify these points of interest when you return to the worksheet.</p>
Putting the Dynamic Editor Menu to Use
<p>Working on a Minitab project can feel like many jobs rolled into one—data wrestler, graph creator, statistical output producer. Each task has its own challenges, but in every case you can reach for the Editor menu to locate the right tools.</p>
Data Analysis, Statistics, Statistics Help, Stats
Wed, 14 Jun 2017 12:00:00 +0000
http://blog.minitab.com/blog/understanding-statistics/a-swiss-army-knife-for-analyzing-data
Eston Martz
Companion by Minitab: Deep Dive into the Desktop App (Part 2)
http://blog.minitab.com/blog/quality-business/companion-by-minitab-deep-dive-into-the-desktop-app
<p>Companion by Minitab® is our software for executing and reporting on quality improvement projects. It has two components, a <em>desktop app</em> and a <em>web app</em>. As practitioners use the Companion desktop app to do project work, their project information automatically rolls up to Companion’s web app dashboard, where stakeholders can see graphical summaries and reports. Since the dashboard updates automatically, teams are freed to complete critical tasks instead of creating reports or entering data in a separate system.</p>
<p>Previously, I offered an overview of the <a href="http://blog.minitab.com/blog/quality-business/companion-by-minitab%3A-desktop-app-and-web-app-terminology-part-1">whole Companion platform</a>. In this blog, I will explore the desktop app in depth, and in a future blog, I will <a href="http://blog.minitab.com/blog/quality-business/companion-by-minitab-deep-dive-into-the-web-app">explore the web app</a>.</p>
<p align="center"><img alt="Companion Big Picture" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/a20d6dc65a9cad271c0f9992f33de2f6/1_companion_big_picture.png" style="width: 490px; height: 237px; margin: 10px 15px;" /></p>
The Companion Desktop Application
<p>Companion's desktop application provides tools and forms that are used by the project owners and practitioners to execute projects efficiently and consistently. Using consistent methodologies, forms, and metrics allows teams working on projects to devote more of their time to critical, value-added project tasks. </p>
<p>The desktop app delivers a comprehensive set of integrated project tools, in an easy-to-use interface.</p>
<ul>
<li>The Project Manager is a window that provides access to high-level project data. It also includes the Roadmap™, which shows the phases and specific tools used to organize and complete projects.</li>
<li>The workspace is where team members work with individual tools. The workspace always displays the currently active tool.</li>
</ul>
<p align="center"><img alt="Desktop UI" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/6af4cffcfb3269018f9dd478cbcbb56d/3_desktop_interface.png" style="width: 1070px; height: 517px; margin: 10px 15px;" /></p>
The Project Manager
<p>The Project Manager offers instant access to project data and tools. The Management Section includes the following components:</p>
<p><img alt="Management Forms" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/74682fabded29ea48612de5ca80fe997/2_dmaic_project_manager.png" style="margin-left: 12px; margin-right: 12px; float: left; width: 350px; height: 275px; border-width: 1px; border-style: solid;" /></p>
<p><strong>Project Today:</strong><br />
Provides a snapshot of overall project status, health, and phases.</p>
<p><strong>Project Charter: </strong><br />
Defines the project and its benefits, and is updated as the project progresses.</p>
<p><strong>Financial Data: </strong><br />
Records the project’s financial impact in terms of annualized or monthly hard and soft savings.</p>
<p><strong>Team Members and Roles: </strong><br />
Compiles contact and role information for each member of the project team. Easily imports contacts from Microsoft Outlook and from your Companion subscription user list.</p>
<p><strong>Tasks: </strong><br />
Outlines the actions required to complete the project. Enables team leaders to identify and assign responsibilities, set priorities, and establish due dates.</p>
Roadmap™
<p><img alt="Roadmaps" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/709af88b9ab2de89c58b4e96b32a6175/4_roadmap.png" style="margin: 10px 15px; float: right; width: 400px; height: 300px; border-width: 1px; border-style: solid;" />Companion’s Roadmap™ feature gives teams a clear path to execute and document each phase of their projects. The Companion desktop app includes predefined Roadmap™ templates based on common continuous improvement methodologies, including DMAIC, Kaizen, QFD, CDOV, PDCA, and Just Do It. </p>
<p>Each Roadmap contains phases, and each phase contains the tools appropriate to it. However, because every project is different, users can easily add or remove tools as needed. Built-in guidance for each tool further helps practitioners complete their tasks in a timely manner. </p>
<p>Since many organizations use their own methods, metrics, and KPIs, we’ve made it simple to create or customize a Roadmap™ for your organization’s unique approach to improvement. </p>
Powerful Project Tools, All in One Place
<p>Companion’s desktop app includes a full set of easy-to-use tools, such as:</p>
<p><img alt="Insert Tool" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/54ea6c012c353cf6040292302fd00b44/5_insert_tool.png" style="margin: 10px 15px; float: right; width: 425px; height: 322px; border-width: 1px; border-style: solid;" /></p>
<ul>
<li>Value stream map</li>
<li>FMEA</li>
<li>Process map</li>
<li>Brainstorming</li>
<li>Monte Carlo simulation</li>
<li>And many more</li>
</ul>
<p>As teams add specific tools to their project file, they appear within the selected phases of a Roadmap™. You can even customize or build tools from scratch (Blank Form) for processes or methods unique to your organization.</p>
Data Sharing in Forms and Tools
<p>The tools within the Companion desktop app are smart and integrated. Information you add in one tool can be used in other tools, so you only need to type it once—no more redundant entry of the same information into multiple documents and applications!</p>
<p>For example, as you complete a C&E Matrix, you can import the variables you previously added to a process map. And as you rate the importance of the inputs relative to the outputs in the matrix, Companion calculates the results to build a Pareto chart on the fly. You can easily create forms that include your own custom charts and calculations, too.</p>
<p align="center"><img alt="CE Matrix" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/8efc9d94b3b4b7717fadd740537e84be/6_ce_matrix.png" style="width: 530px; height: 627px; border-width: 1px; border-style: solid; margin: 10px 15px;" /></p>
Monte Carlo Simulation Tool
<p><a href="http://www.minitab.com/en-us/products/companion/" target="_blank">Companion by Minitab®</a> contains a powerful Monte Carlo simulation tool. With its easy-to-use interface and guided workflow, this tool helps engineers and process improvement practitioners quickly simulate product results. It also provides step-by-step guidance for optimization, so you can determine the settings for process inputs that result in acceptable outputs. </p>
<p>The results are easy to understand and next steps are identified. The tool includes Parameter Optimization to find the optimal settings for your input parameters to improve results and reduce defects. It also includes Sensitivity Analysis to quickly identify and quantify the factors driving variation. By using these to pinpoint exactly where to reduce variation, you can quickly get your process where it needs to be.</p>
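<p>To make the idea concrete, here is a minimal Python sketch of a Monte Carlo simulation with a crude sensitivity check. It is not how Companion implements the tool. The transfer function (area = length × width), the input distributions, and the spec limits are all hypothetical values chosen for illustration.</p>

```python
import random

# Hypothetical transfer function: area = length * width.
# All nominal values, standard deviations, and spec limits are made up.
LOWER_SPEC, UPPER_SPEC = 19.5, 20.5

def simulate(n, sd_length=0.05, sd_width=0.02, seed=1):
    """Draw n simulated outputs from normally distributed inputs."""
    rng = random.Random(seed)
    return [rng.gauss(10.0, sd_length) * rng.gauss(2.0, sd_width) for _ in range(n)]

def fraction_out_of_spec(values):
    """Share of simulated outputs that fall outside the spec limits."""
    bad = sum(1 for v in values if v < LOWER_SPEC or v > UPPER_SPEC)
    return bad / len(values)

baseline = fraction_out_of_spec(simulate(20000))

# Crude sensitivity check: halve one input's variation at a time
# and see which change reduces the out-of-spec fraction the most.
half_length = fraction_out_of_spec(simulate(20000, sd_length=0.025))
half_width = fraction_out_of_spec(simulate(20000, sd_width=0.01))

print(f"baseline out of spec:          {baseline:.4f}")
print(f"length variation cut in half:  {half_length:.4f}")
print(f"width variation cut in half:   {half_width:.4f}")
```

<p>In this made-up example, tightening the width input helps far more than tightening the length input, which is the kind of conclusion a sensitivity analysis is meant to surface.</p>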
<p style="text-align: center;"><img alt="Monte Carlo Simulation" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/a6f5e65fda86bb02826c5fbb92e39a80/7_mc_simulation.png" style="width: 626px; height: 360px; border-width: 1px; border-style: solid; margin: 10px 15px;" /></p>
<p><a href="http://www.minitab.com/en-us/products/companion/" target="_blank">Companion by Minitab's</a> desktop application is an excellent tool that can propel your projects to success. It gives you the tools for executing projects all in one place, Roadmaps to guide your teams through the appropriate problem-solving process, interconnected forms to eliminate redundant data entry—and because it automatically updates the Companion dashboard, it even makes project reporting completely effortless. Literally.</p>
<p>I believe Companion is the best tool on the market for efficient project execution and summarizing the project work. Why wouldn’t you want to give your people the best tools to make difficult problem solving and reporting easier?</p>
<p>My next post will provide <a href="http://blog.minitab.com/blog/quality-business/companion-by-minitab-deep-dive-into-the-web-app">a deep dive into Companion's web app</a>. You can also visit our site for more information about <a href="http://www.minitab.com/en-us/products/companion/" target="_blank">Companion by Minitab®</a> or to download a 30-day free trial for your entire team.</p>
Lean Six Sigma, Project Tools, Quality Improvement, Six Sigma
Fri, 09 Jun 2017 15:16:00 +0000
http://blog.minitab.com/blog/quality-business/companion-by-minitab-deep-dive-into-the-desktop-app
Bonnie K. Stone
See the New Features and Enhancements in Minitab 18 Statistical Software
http://blog.minitab.com/blog/understanding-statistics/see-the-new-features-and-enhancements-in-minitab-18-statistical-software
<p>It's a very exciting time at Minitab's offices around the world because we've just announced the availability of Minitab® 18 Statistical Software.</p>
<p><a href="http://www.minitab.com/products/minitab/whats-new/"><img alt="What's new in Minitab 18?" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/64e5153baa676ef27cd0672296ff2a2d/minitab18_whatsnew_blue_twitter.png" style="width: 400px; height: 200px; margin: 10px 15px; float: right;" /></a>Data is everywhere today, but to use it to make sound, strategic business decisions, you need to have tools that turn that data into knowledge and insights. We've designed Minitab 18 to do exactly that. </p>
<p>We've incorporated a lot of <a href="http://www.minitab.com/products/minitab/whats-new/">new features</a>, made some great enhancements and put a lot of energy into developing a tool that will make getting insight from your data faster and easier than ever before, and we're excited to get feedback from you about the new release. </p>
<p>The advanced capabilities we've added to Minitab 18 include tools for measurement systems analysis, statistical modeling, and Design of Experiments (DOE). With Minitab 18, it’s much easier to test how a large number of factors influence process output, and to get more accurate results from models with both fixed and random factors.</p>
<p>We'll delve into more detail about these features in the coming weeks, but today I wanted to give you a quick overview of some of the most exciting additions and improvements. You can also check out one of our <a href="http://www.minitab.com/en-us/products/minitab/webinars/">upcoming webinars</a> to see the new features demonstrated. Then I hope you'll check them out for yourself—you can <a href="http://www.minitab.com/products/minitab/free-trial/">get Minitab 18 free for 30 days</a>.</p>
Updated Session Window
<img alt="updated session window in Minitab 18" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9d32d30bb81f568e265afc3549b9569e/whats_new_1_1_.png" style="width: 300px; height: 208px; margin: 10px 15px; float: right;" />
<p>The first thing longtime Minitab users are likely to notice when they launch Minitab 18 is the enhancements we've made to the Session window, which contains the output of all your analyses. </p>
<div>The Session window looks better, and also now includes the ability to:</div>
<ul>
<li>Specify the number of significant digits (decimal places) in your output</li>
<li>Go directly to graphs by clicking links in the output</li>
<li>Expand and collapse analyses for easier navigation</li>
<li>Zoom in and out </li>
</ul>
<img alt="sort worksheets in Minitab 18's project manager" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b5ba230cd2917d387da0b0eb9a59986e/whats_new_2_1_.png" style="width: 200px; height: 139px; margin: 10px 15px; float: right;" />
Sort Worksheets in the Project Manager
<p>We've also added the option to sort the worksheets in your project by title or in chronological order, so you can manage and work with your data in the Project Manager more easily.</p>
Definitive Screening Designs
<p>Many businesses need to determine which inputs make the biggest impact on the output of a process. When you have a lot of inputs, as most processes do, this can be a huge challenge. Standard experimental methods can be costly and time-consuming, and may not be able to distinguish main effects from the two-way interactions that occur between inputs.</p>
<p>That challenge is answered in Minitab 18 with Definitive Screening Designs, a type of designed experiment that minimizes the number of experimental runs required, but still lets you identify important inputs without confounding main effects and two-way interactions.</p>
Restricted Maximum Likelihood (REML) Estimation
<p>Another feature we've added to Minitab 18 is restricted maximum likelihood (REML) estimation. This is an advanced statistical method that improves inferences and predictions while minimizing bias for mixed models, which include both fixed and random factors.</p>
New Distributions for Tolerance Intervals
<p>With Minitab 18 we've made it easy to calculate statistical tolerance intervals for nonnormal data with distributions including the Weibull, lognormal, exponential, and more.</p>
Effects Plots for Designed Experiments (DOE)
<p>In another enhancement to our Design of Experiments (DOE) functionality, we've added effects plots for general factorial and response surface designs, so you can visually identify significant X’s.</p>
Historical Standard Deviation in Gage R&R
<p>If you're doing the measurement system analysis method known as Gage R&R, Minitab 18 lets you enter a historical (user-specified) process standard deviation for use in the relevant calculations.</p>
Response Optimizer for GLM
<p>When you use the <a href="http://blog.minitab.com/blog/statistics-support/wave-a-magic-wand-over-your-doe-analyses">response optimizer</a> for the general linear model (GLM), you can include both your factors and covariates to find optimal process settings.</p>
Output in Table Format to Word and Excel
<p>The Session window output can be imported into Word and Excel in table format, which lets you easily customize the appearance of your results.</p>
Command Line Pane
<p>Many people use Minitab's command line to expand the software's functionality. With Minitab 18, we've made it easy to keep commands separate from the Session output with a docked command line pane. </p>
Updated Version of Quality Trainer
<p>Finally, it's worth mentioning that the release of Minitab 18 is complemented by a new version of <a href="http://www.minitab.com/products/quality-trainer/">Quality Trainer by Minitab®</a>, our e-learning course. It teaches you how to solve real-world quality improvement challenges with statistics and Minitab, and lets you refresh that knowledge anytime. If you haven't tried it yet, you can check out a sample chapter now. </p>
<p>We hope you'll try the latest Minitab release! And when you do, please be sure to let us know what you think: we love to get your feedback and input about what we've done right, and what we can make better! Send your comments to feedback@minitab.com. </p>
Data Analysis, Insights, Lean Six Sigma, Six Sigma, Statistics, Statistics Help, Stats
Wed, 07 Jun 2017 17:09:00 +0000
http://blog.minitab.com/blog/understanding-statistics/see-the-new-features-and-enhancements-in-minitab-18-statistical-software
Eston Martz
Doing Gage R&R at the Microscopic Level
http://blog.minitab.com/blog/statistics-in-the-field/doing-gage-randr-at-the-microscopic-level
<p><em>by Dan Wolfe, guest blogger</em></p>
<p>How would you measure a hole that was allowed to vary one tenth the size of a human hair? What if the warmth from holding the part in your hand could take the measurement from good to bad? These are the types of problems that must be dealt with when measuring at the micron level.</p>
<img alt="a 10-micron fiber" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/bcf2f768b2e355d2d8e3113a404ceb36/320px_cfaser_haarrp_1_.jpg" style="width: 320px; height: 182px; margin: 10px 15px; float: right;" />
<div>
<p>As a Six Sigma professional, that was the challenge I was given when Tenneco entered into high-precision manufacturing. In Six Sigma projects, “gage studies” and “Measurement System Analysis (MSA)” are used to make sure measurements are reliable and repeatable. It’s tough to imagine doing that type of analysis without <a href="http://www.minitab.com/products/minitab">statistical software</a> like Minitab.</p>
<div>Tenneco, the company I work for, creates and supplies clean air and ride performance products and systems for cars and commercial vehicles. Tenneco has revenues of $7.4 billion annually, and we expect to grow as stricter vehicle emission regulations take effect in most markets worldwide over the next five years.</div>
<p>We have an active and established Six Sigma community as part of the “Tenneco Global Process Excellence” program, and Minitab is an integral part of training and project work at Tenneco.</p>
Verifying Measurement Systems
<p>Verifying the measurement systems we use in precision manufacturing and assembly is just one instance of how we use Minitab to make data-driven decisions and drive continuous improvement.</p>
<p>Even the smallest of features need to meet specifications. Tolerance ranges on the order of 10 to 20 microns require special processes not only for manufacturing, but also measurement. You can imagine how quickly the level of complexity grows when you consider the fact that we work with multiple suppliers from multiple countries for multiple components.</p>
<p>To gain agreement between suppliers and Tenneco plants on the measurement value of a part, we developed a process to work through the verification of high-precision, high-accuracy measurement systems such as coordinate measuring machines (CMM) and vision systems.</p>
<p>The following <a href="http://blog.minitab.com/blog/understanding-statistics/sipoc-alypse-now">SIPOC (Supplier, Input, Process, Output, Customer)</a> process map shows the basic flow of the gage correlation process for new technology.</p>
<p><a href="//cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/File/4bc224d356f7f41309744ee0da2b7988/sipoc_large.jpg"><img alt="sipoc " src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6e031650e769d3c9c6b233a23bbf6199/sipoc_sm.jpg" style="width: 438px; height: 377px;" /></a></p>
What If a Gage Study Fails?
<p>If any of the gage studies fail to be approved, we launch a problem-solving process. For example, in many cases, the Type 1 results do not agree at the two locations. Given these very small tolerance ranges, seemingly small differences can have significant practical impact on the measurement value. One difference was resolved when the ambient temperature in a CMM lab was found to be out of the expected range. Another was traced to two vision systems that did not use the same lens type.</p>
<p>Below is an example of a series of Type 1 gage studies performed to diagnose a repeatability issue on a vision system. It shows the effect of part replacement (taking the part out of the measurement device, then setting it up again) before each measurement and the bias created by handling the part.</p>
<p>For this study, we took the results of 25 measurements made when simply letting the part sit in the machine and compared them with 25 measurements made when taking the part out and setting it up again between measurements. The analysis shows that picking the part up, handling it, and resetting it in the machine changes the measurement value. This was found to be <a href="http://blog.minitab.com/blog/the-stats-cat/sample-size-statistical-power-and-the-revenge-of-the-zombie-salmon-the-stats-cat">statistically significant, but not <em>practically </em>significant</a>. Knowing the results of this study helps our process and design engineers understand how to interpret the values given to them by the measurement labs, and gives them some perspective on the considerations of the part and measurement processes.</p>
<p>The two graphs below show Type 1 studies done with versus without replacement of the part. There is a bias between the two studies. A test for equal variance shows a difference in variance between the two methods.</p>
<p><img alt="Type 1 Gage Study with Replacement" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/de9796cecba9ea3102fc659f6f4bcfe4/type1gagestudy_withreplacement.jpg" style="width: 572px; height: 384px;" /></p>
<p><img alt="Type 1 Gage Study without Replacement" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/cf8cd8158947fc22bd88087123a1b87f/type1gagestudy_withoutreplacement.jpg" style="width: 568px; height: 383px;" /></p>
<p>As the scatterplot below illustrates, the study done WITH REPLACEMENT has a higher standard deviation. The difference is statistically significant, but still practically acceptable.</p>
<p><img alt="With Replacement vs. Without Replacement" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/380d8e62f6fbc15aac9f8299bed42924/scatterplot.jpg" style="width: 570px; height: 372px;" /></p>
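<p>For readers who want to see the arithmetic behind a Type 1 gage study, the following Python sketch computes the usual summary statistics (bias, Cg, Cgk, and the t statistic for bias) for two sets of repeated measurements. The measurement values, reference value, and tolerance are hypothetical and are not the Tenneco data. The 20% of tolerance and 6-standard-deviation spread are commonly used defaults, assumed here rather than taken from the study above.</p>

```python
import math
import statistics

def type1_gage_study(measurements, reference, tolerance, k_percent=20.0, spread=6.0):
    """Summary statistics for a Type 1 gage study on one reference part."""
    n = len(measurements)
    mean = statistics.mean(measurements)
    s = statistics.stdev(measurements)          # repeatability (sample std dev)
    bias = mean - reference
    # Cg compares a share of the tolerance with the measurement spread.
    cg = (k_percent / 100.0 * tolerance) / (spread * s)
    # Cgk also penalizes bias relative to the reference value.
    cgk = (k_percent / 200.0 * tolerance - abs(bias)) / (spread / 2.0 * s)
    t_bias = math.sqrt(n) * bias / s            # t statistic for H0: bias = 0
    return {"mean": mean, "stdev": s, "bias": bias, "cg": cg, "cgk": cgk, "t_bias": t_bias}

# Hypothetical measurements in millimeters; tolerance of 0.020 mm (20 microns).
REFERENCE, TOLERANCE = 5.0000, 0.020
without_replacement = [5.0001, 5.0002, 5.0000, 5.0001, 5.0002,
                       5.0001, 5.0000, 5.0001, 5.0002, 5.0001]
with_replacement = [5.0004, 5.0001, 5.0006, 5.0002, 5.0007,
                    5.0003, 5.0000, 5.0005, 5.0008, 5.0004]

results_without = type1_gage_study(without_replacement, REFERENCE, TOLERANCE)
results_with = type1_gage_study(with_replacement, REFERENCE, TOLERANCE)

# Simple ratio of sample variances; a formal equal-variance test would add a p-value.
variance_ratio = results_with["stdev"] ** 2 / results_without["stdev"] ** 2

for label, r in (("without", results_without), ("with", results_with)):
    print(f"{label:8s} bias={r['bias']:.5f} stdev={r['stdev']:.5f} "
          f"Cg={r['cg']:.2f} Cgk={r['cgk']:.2f} t={r['t_bias']:.2f}")
print(f"variance ratio (with / without): {variance_ratio:.1f}")
```

<p>In this sketch, replacing the part between measurements increases both the bias and the standard deviation, which lowers Cg and Cgk. Whether the resulting values are acceptable is a practical judgment that depends on the tolerance.</p>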
<p>Minitab’s gage study features are a critical part of the gage correlation process we have developed. Minitab has been integrated into Tenneco’s Six Sigma program since it began in 2000.</p>
<p>The powerful analysis and convenient graphing tools are being used daily by our Six Sigma resources for these types of gage studies, problem-solving efforts, quality projects, and many other uses at Tenneco.</p>
<p> </p>
<p><strong>About the Guest Blogger</strong>:</p>
<p>Dan Wolfe is a Certified Lean Six Sigma Master Black Belt at Tenneco. He has led projects in Engineering, Supply Chain, Manufacturing and Business Processes. In 2006 he was awarded the Tenneco CEO award for Six Sigma. As a Master Black Belt he has led training waves, projects and the development of business process design tools since 2007. Dan holds a BSME from The Ohio State University, an MSME from Oakland University, and a degree from the Chrysler Institute of Engineering for Automotive Engineering.</p>
<p> </p>
<p><em style="border: 0px; margin: 0px; padding: 0px;"><strong style="border: 0px; margin: 0px; padding: 0px;">Would you like to publish a guest post on the Minitab Blog? Contact <a href="mailto:publicrelations@minitab.com?subject=I%20Would%20Like%20to%20Be%20a%20Guest%20Blogger" style="border-width: 0px 0px 0.1em; border-bottom-style: dotted; border-bottom-color: rgb(0, 47, 97); margin: 0px; padding: 0px; color: rgb(0, 47, 97); text-decoration: none;">publicrelations@minitab.com</a>. </strong></em></p>
</div>
Automotive, Manufacturing, Quality Improvement, Six Sigma
Tue, 06 Jun 2017 12:00:00 +0000
http://blog.minitab.com/blog/statistics-in-the-field/doing-gage-randr-at-the-microscopic-level
Guest Blogger
Reducing the Phone Bill with Statistical Analysis
http://blog.minitab.com/blog/understanding-statistics/reducing-the-phone-bill-with-statistical-analysis
<p>One of the most memorable presentations at the inaugural Minitab Insights conference reminded me that data analysis and quality improvement methods aren't only useful in our work and businesses: they can make our home life better, too. </p>
<p><img alt="you won't believe how cheap my phone bill is now! " src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8bd65d370741a7670c0c17d88ea157d2/phone_kid.jpg" style="width: 291px; height: 225px; float: right; margin: 10px 15px;" />The presenter, a continuous improvement training program manager at an aviation company in the midwestern United States, told attendees how he used Minitab Statistical Software, and some simple quality improvement tools, to reduce his phone bill.</p>
<p>He took the audience back to 2003, when his family first obtained their cell phones. For a few months, everything was fine. Then the April bill arrived, and it was more than they expected. The family had used too many minutes. </p>
<p>The same thing happened again in May. In June, the family went over the number of minutes allocated in their phone plan again, for the third month in a row. Something had to change!</p>
Defining the Problem
<p>His wife summed up the problem this way: "There is a problem with our cell phone plan, because the current minutes are not enough for the family members over the past three months." </p>
<p>He wasn't sure that "too few minutes" was the real problem. But instead of arguing, he applied his quality improvement training to find common ground. He and his wife agreed that the previous three months' bills were too high, and that the family had gone over the plan minutes for an unknown reason. Based on their areas of agreement, they revised the initial problem statement: </p>
<p style="margin-left: 40px;"><em>There is a problem with our cell phone usage, and this is known because the minutes are over the plan for the past 3 months, leading to a strain on the family budget.</em></p>
<p>They further agreed that before taking further action—like switching to a costlier plan with more minutes—they needed to identify the root cause of the overage. </p>
Using Data to Find the Root Cause(s)
<img alt="pie chart of phone usage" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a60a867c650a9998e5abcad46b6f68c0/phone_usage_1.png" style="width: 200px; height: 250px; margin: 10px 15px; float: right;" />
<div>
<p>At this point, he downloaded the family's phone logs from their cell phone provider and began using <a href="http://www.minitab.com/products/minitab/">Minitab Statistical Software</a> to analyze the data. First, he used a simple pie chart to look at who was using the most minutes. Since he also had a work-provided cell phone, it wasn't surprising to see that his wife used 4 minutes for each minute of the family plan he used. </p>
<p>Since his wife used 75% of the family's minutes, he looked more closely for patterns and insights in her call data. He created time series plots of her daily and individual call minutes, and created I-MR and Xbar-S charts to assess the stability of her calling process over time. </p>
<p style="margin-left: 40px;"><img alt="I-MR chart of daily phone minutes" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1c1bba795b4182fbabcde66f1e0623bf/phone_usage_2_i_mr.png" style="width: 500px; height: 333px; border-width: 1px; border-style: solid;" /></p>
<p style="margin-left: 40px;"><img alt="Xbar-S Chart of Daily Minutes Per Week" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c525f6527170e396bfc1f513794cd2b7/phone_usage_3_xbar_s.png" style="width: 500px; height: 334px; border-width: 1px; border-style: solid;" /></p>
<p>He also subgrouped calls by day of the week and displayed them in a boxplot. </p>
<p style="margin-left: 40px;"><img alt="Boxplot of daily minutes used" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/714c6ead49902f9f657ae36b4a33b90b/phone_usage_4_boxplot.png" style="width: 500px; height: 332px; border-width: 1px; border-style: solid;" /></p>
<p>These analyses revealed that daily minute usage did contain some "<a href="http://blog.minitab.com/blog/understanding-statistics/control-charts-show-you-variation-that-matters">special cause variation</a>," shown in the I-MR chart. They also showed that, compared to other days of the week, Thursdays had higher average daily minutes and greater variance. </p>
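<p>As a rough illustration of how an I-MR chart flags special cause variation, the Python sketch below computes the standard control limits for individual values and moving ranges and lists any points beyond the limits. The daily minutes are hypothetical, not the presenter's data, and only the most basic test (a point outside the control limits) is applied.</p>

```python
import statistics

def imr_limits(values):
    """Control limits for an I-MR chart using moving ranges of length 2."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = statistics.mean(moving_ranges)
    center = statistics.mean(values)
    return {
        "center": center,
        # 2.66 = 3 / d2, where d2 = 1.128 for moving ranges of length 2
        "i_ucl": center + 2.66 * mr_bar,
        "i_lcl": center - 2.66 * mr_bar,
        # 3.267 = D4 for moving ranges of length 2
        "mr_ucl": 3.267 * mr_bar,
    }

def out_of_control(values):
    """Indexes of individual values that fall outside the control limits."""
    limits = imr_limits(values)
    return [i for i, v in enumerate(values)
            if v > limits["i_ucl"] or v < limits["i_lcl"]]

# Hypothetical daily call minutes with one unusually heavy day.
daily_minutes = [30, 35, 28, 32, 110, 31, 29, 33, 30, 34, 27, 31]

limits = imr_limits(daily_minutes)
flagged = out_of_control(daily_minutes)
print(f"center={limits['center']:.1f}  UCL={limits['i_ucl']:.1f}  LCL={limits['i_lcl']:.1f}")
print("days flagged as special cause:", flagged)
```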
<p>Creating a Pareto chart of his wife's phone calls provided further insight. </p>
<p style="margin-left: 40px;"><img alt="Pareto chart of number called" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/cdf58a1aa4f7d633243d64c989a1306b/phone_usage_5_pareto.png" style="width: 500px; height: 334px; border-width: 1px; border-style: solid;" /></p>
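<p>The calculation behind a Pareto chart like the one above is simple enough to sketch in a few lines of Python: total the minutes per number called, sort from largest to smallest, and track the cumulative percentage. The call log below is hypothetical, and the labels are made up for illustration.</p>

```python
from collections import defaultdict

def pareto_table(calls):
    """Return (label, total minutes, cumulative percent) rows, largest first."""
    totals = defaultdict(float)
    for label, minutes in calls:
        totals[label] += minutes
    grand_total = sum(totals.values())
    rows = []
    cumulative = 0.0
    for label, minutes in sorted(totals.items(), key=lambda item: item[1], reverse=True):
        cumulative += minutes
        rows.append((label, minutes, 100.0 * cumulative / grand_total))
    return rows

# Hypothetical call log: (number called, minutes used).
call_log = [
    ("best friend", 42), ("mother (land line)", 25), ("best friend", 38),
    ("school", 6), ("mother (land line)", 30), ("dentist", 4), ("school", 5),
]

rows = pareto_table(call_log)
for label, minutes, cumulative_percent in rows:
    print(f"{label:20s} {minutes:6.0f} min  {cumulative_percent:6.1f}% cumulative")
```

<p>Reading the table from the top down shows how few numbers account for most of the minutes, which is where the investigation should start.</p>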
<p>The Minitab analysis helped them see where and when most of their minutes were going. But as experienced professionals know, sometimes the numbers alone don't tell the entire story. So the family discussed the results to put those numbers in context and to see where some improvements might be possible.</p>
<p>The most commonly called number belonged to his wife's best friend, who used a different cell phone provider than the family did. This explained the Thursday calls, because every weekend his wife and her friend took turns shopping garage sales on opposite sides of town to get clothes for their children. They did their coordination on Thursday evenings.</p>
<p>Calls to her girlfriend could have been free if they just used the same provider, but the presenter's family didn't want to change, and it wasn't fair to expect the other family to change. But while a few calls to her girlfriend may have been costing a few dollars, the family was saving many more dollars on clothes for the kids. </p>
<p>Given the complete context, this was a situation where the calls were paying for themselves, so the family moved on to the next most frequently called number: the presenter's mother's land line.</p>
<p>His wife spoke very frequently with his mother to arrange childcare and other matters. His mother had a cell phone from the same provider, so calls to the cell phone should be free. Why, then, was his wife calling the land line? "Because," his wife informed him, "your mother never answers her cell phone." </p>
Addressing the Root Cause
<p>The next morning, the presenter visited his mother and eventually he steered the conversation to her cell phone. "I just love using the cell phone on weekends," his mother told him. "I use it to call my old friends during breakfast, and since it's the weekend the minutes are free!" </p>
<p>When he asked how she liked using the cell phone during the week, his mother's face darkened. "I hate using the cell phone during the week," she declared. "The phone rings all the time, but when I answer there's never anyone on the line!" </p>
<p>This seemed strange. To get some more insight, her son worked with her to create a spaghetti diagram that showed her typical movements during the weekday when her cell phone rang. That diagram, shown below, revealed two important things.</p>
<p style="margin-left: 40px;"><img alt="spaghetti diagram" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b3955e2dfdccdfce57bc529e7ba2f1dd/phone_usage_6_spaghetti_diagram.jpg" style="width: 500px; height: 386px;" /></p>
<p>First, it showed that his mother loved watching television during the day. But second, and more important when it came to using the cell phone, his mother needed to get up from her chair, walk into the dining room, and retrieve her cell phone—which she always kept on the dining room table—in order to answer it. </p>
<p>Her cell phone automatically sent callers to voice mail after three rings. But it took his mother longer than three rings to get from her chair to the phone. What's more, since she never learned to use the voice mail ("Son, there is no answering machine connected to this phone!"), his mother almost exclusively used the cell phone to make outgoing calls. </p>
<p>Now that the real root cause underlying this major drain on the family's cell phone minutes was known, a potential solution could be devised and tested. In this case, rather than force his mother to start using voicemail, he came up with an elegant and simple alternative: </p>
<p style="margin-left: 40px;"><strong>Job Instructions for Mom:</strong></p>
<p style="margin-left: 40px;">When receiving call on weekday:</p>
<ul>
<li style="margin-left: 40px;">Go to cell phone.</li>
<li style="margin-left: 40px;">Pick up phone.</li>
<li style="margin-left: 40px;">Press green button twice.</li>
<li style="margin-left: 40px;">Wait for person who called to answer phone.</li>
</ul>
<p>After a few test calls to make sure his mother was comfortable with the new protocol, they tested the new system for a month. </p>
The Results
<p>To recap, solving this problem required four steps. First, the presenter and his wife needed to clearly define the problem. Second, they used statistical software to get insight into the problem from the available data. Third, a spaghetti diagram revealed the root cause. Fourth, a set of simple job instructions provided a viable solution to test. And the outcome? </p>
<p style="margin-left: 40px;"><img alt="Bar Chart of Phone Bills" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/262a6433521414d59edc6b86fd38efcd/phone_usage_7_bar_chart.png" style="width: 500px; height: 333px; border-width: 1px; border-style: solid;" /></p>
<p>As the bar graph shows, July's minutes were well within their plan's allotment. In that month's Pareto chart, what had been the second-largest bar dropped to near zero. His mother enjoyed her cell phone much more, and his wife was able to arrange child care with just one call. </p>
<p>And to this day, when the presenter wants to talk to his mother, he: </p>
<p style="margin-left: 40px;">1. Calls her cell phone.<br />
2. Lets it ring 3 times.<br />
3. Hangs up.<br />
4. Waits for her return call.</p>
<p>Happily, this solution turned out to be very sustainable, as the monthly minutes remained within the family's allowance and budget for quite some time...and then his daughter got a cell phone, and texting issues began.</p>
<p>Where could you apply data analysis to get more insight into the challenges you face? </p>
</div>
Insights, Lean Six Sigma, Quality Improvement, Six Sigma
Wed, 10 May 2017 13:04:00 +0000
http://blog.minitab.com/blog/understanding-statistics/reducing-the-phone-bill-with-statistical-analysis
Eston Martz