Minitab | Minitab
Blog posts and articles about using Minitab software in quality improvement projects, research, and more.
http://blog.minitab.com/blog/minitab/rss
Mon, 23 Oct 2017 16:48:12 +0000
Fighting Wildfires with Statistical Analysis
http://blog.minitab.com/blog/statistics-in-the-field/fighting-wildfires-with-statistical-analysis
<p><img alt="Smoke Jumper" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c4bd8e92976a81935bb457008727857d/smokejumper_a.jpg" style="width: 194px; height: 288px; float: right;" />Wildfires in California have killed at least 40 people and burned more than 217,000 acres in the past few weeks. Nearly 8,000 firefighters are trying to contain the blazes with the aid of more than 800 firetrucks, 70 helicopters and 30 planes.</p>
<p>In remote areas difficult to access by firetruck, smokejumpers may be needed to parachute in to fight the fires. But danger looms before a smokejumper even confronts a fire.</p>
<p>In statistics, we ask: “<a href="http://blog.minitab.com/blog/understanding-statistics/the-single-most-important-question-in-every-statistical-analysis" rel="noreferrer" target="_blank">Can you trust your data?</a>”</p>
<p>For a smokejumper, the critical initial question is: “Can you trust your parachute?”</p>
Smokejumping + Statistics = Technical Fire Management
<p>When they’re not battling wildfires, many smokejumpers pursue advanced studies in fields like fire management, ecology, forestry, and engineering in the off-season.</p>
<p>At Washington Institute, smokejumpers and other students in the Technical Fire Management (TFM) program apply quantitative methods — often using Minitab Statistical Software — to evaluate alternative solutions to fire management problems.</p>
<p>“The students in this program are mid-career wildland firefighters who want to become fire managers, i.e., transition from a technical career path to a professional path in the federal government,” said Dr. Robert Loveless, a statistics instructor for TFM.</p>
<p>As part of the program, the students must complete a project in wildland fire management. One primary analysis tool for these projects is statistics.</p>
<p>“Many students have no, or a limited, background in any college-level coursework,” Loveless noted. "So, teaching stats can be a real challenge."</p>
<p>Minitab often helps students overcome that challenge.</p>
<p>“Most students find using Minitab to be easy and intuitive,” Dr. Loveless said. That helps them focus on their research objectives without getting lost in tedious calculations or a complex software interface.</p>
<p>For his TFM project, Steve Stroud used Minitab to evaluate the relationship between the age, the number of jumps used, and the permeability of a smokejumper’s parachute.</p>
<p>The permeability of a parachute is a key measure of its performance. Repeated use and handling cause the nylon fabric to degrade, increasing its permeability. If permeability becomes too high, the chute opens more slowly, the speed of descent increases, and the chute becomes less responsive to steering maneuvers.</p>
<p>These are not things you want to happen when you’re skydiving over the hot zone of a raging wildfire.</p>
99% Confidence Intervals for Parachute Permeability Found Using Minitab
<p>Stroud sampled 70 smokejumper parachutes and recorded their age, number of jumps, and the permeability of cells within each parachute.</p>
<p>Permeability is measured as the airflow through the fabric in cubic feet of air per one square foot per minute (CFM). For a new parachute, industry standards dictate that the CFM should be less than 3.0 CFM. The chute can be safely used until its average permeability exceeds 12.0 CFM, at which time it’s considered unsafe and should be removed from service.</p>
<p>Using the descriptive statistics command in Minitab, the study determined:</p>
<ul>
<li>Smokejumpers could be 99% confident that the mean permeability in unused parachutes (0-10 years old, with no jumps) was between 1.99 and 2.31 CFM, well within industry standards.</li>
<li>Only one unused parachute, an outlier, had a cell with a CFM greater than 3.0 (3.11). Although never used in jumps, this parachute was 10 years old and had been packed and repacked repeatedly.</li>
<li>For used parachutes (0-10 years old, with between 1 and 140 jumps), smokejumpers could be 99% confident that the mean permeability of the parachutes was between 4.23 and 4.61 CFM. The maximum value in the sample, 9.88 CFM, was also well below the upper limit of 12.0 CFM.</li>
</ul>
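<p>The confidence intervals above come from Minitab's descriptive statistics. As a rough illustration of the arithmetic behind such an interval, here is a minimal Python sketch. The readings are hypothetical, not data from the study, and the function uses a normal approximation rather than the t distribution a statistics package would normally apply, so it will be slightly too narrow for small samples.</p>

```python
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, confidence=0.99):
    """Return (lower, upper) bounds for the mean.

    Uses a normal approximation (an assumption made to keep the sketch
    dependency-free); a t-based interval is wider for small samples.
    """
    n = len(values)
    center = mean(values)
    standard_error = stdev(values) / n ** 0.5
    # Two-sided interval: put half of the leftover probability in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return center - margin, center + margin


# Hypothetical permeability readings in CFM, for illustration only.
readings = [2.0, 2.3, 1.9, 2.2, 2.1, 2.4, 2.0, 2.2, 2.1, 2.3]
low, high = mean_confidence_interval(readings)
print(f"99% CI for mean permeability: {low:.2f} to {high:.2f} CFM")
```

<p>Raising the confidence level widens the interval, which is why a 99% interval is a conservative choice for safety equipment.</p>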
Regression Analysis to Estimate Parachute Service Life
<p>The service life for the smokejumper parachutes was 10 years at that time. However, this duration was based on a purchase schedule used by the U.S. military for a different type of parachute with different usage. Smokejumpers use a special rectangular parachute made of pressurized fabric airfoil.</p>
<p><img alt="Drogue" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fa184dc8d3f486c1c142375af10317a5/drogue.jpg" style="width: 640px; height: 357px;" /></p>
<p>Stroud wanted to determine a working service life appropriate for the expected use and wear of smokejumper chutes. Using Minitab’s regression analysis, he developed a model to predict the permeability of smokejumper parachutes based on number of jumps and age (in years). (A logarithmic transformation was used to stabilize the nonconstant variance shown by the Minitab residual plots.)</p>
<p>-------------------------------------------------------------------------</p>
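Regression Analysis: logPerm versus logJumps, logAge
<p>The regression equation is logPerm = 0.388 + 0.198 logJumps + 0.170 logAge</p>
<table>
<tr><th>Predictor</th><th>Coef</th><th>SE Coef</th><th>T</th><th>P</th></tr>
<tr><td>Constant</td><td>0.38794</td><td>0.02859</td><td>13.57</td><td>0.000</td></tr>
<tr><td>logJumps</td><td>0.197808</td><td>0.007920</td><td>24.97</td><td>0.000</td></tr>
<tr><td>logAge</td><td>0.17021</td><td>0.01704</td><td>9.99</td><td>0.000</td></tr>
</table>
<p>S = 0.196473 R-Sq = 76.6% R-Sq(adj) = 76.4%</p>
Analysis of Variance
<table>
<tr><th>Source</th><th>DF</th><th>SS</th><th>MS</th><th>F</th><th>P</th></tr>
<tr><td>Regression</td><td>2</td><td>56.201</td><td>28.100</td><td>727.96</td><td>0.000</td></tr>
<tr><td>Residual Error</td><td>446</td><td>17.216</td><td>0.039</td><td></td><td></td></tr>
<tr><td>Total</td><td>448</td><td>73.417</td><td></td><td></td><td></td></tr>
</table>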
<p>-----------------------------------------------------------------------</p>
<p>Both predictors, the number of jumps (log) and age of the parachute (log), were statistically significant (P &lt; 0.001). The coefficient for logJumps (0.197808) was greater than the coefficient for logAge (0.17021), indicating that number of jumps is a stronger predictor of the permeability of a parachute than its age. The R-Sq value indicates the model explains approximately 77% of the variation in the log of parachute permeability.</p>
<p>Using the fitted model, the permeability of the chutes can be predicted for a given number of jumps and age. Based on 99% prediction intervals for new observations, the study concluded that the service life of chutes could be extended safely to 20 years and/or 300 jumps before the permeability of any single parachute cell reached an upper prediction limit of 12 CFM.</p>
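<p>To show how the fitted equation turns a number of jumps and an age into a predicted permeability, here is a minimal Python sketch. It makes two assumptions that the output above does not confirm: that the logs are natural logs, and that an upper limit can be approximated as the point prediction plus a normal multiple of S on the log scale. That shortcut ignores uncertainty in the coefficients, so it is not the same calculation as Minitab's prediction interval.</p>

```python
import math
from statistics import NormalDist

# Coefficients and S copied from the regression output above.
INTERCEPT = 0.38794
COEF_LOG_JUMPS = 0.197808
COEF_LOG_AGE = 0.17021
RESIDUAL_SD = 0.196473


def predicted_log_permeability(jumps, age_years):
    """Fitted value on the log scale (natural logs are an assumption)."""
    if jumps <= 0 or age_years <= 0:
        raise ValueError("jumps and age_years must be positive to take logs")
    return (INTERCEPT
            + COEF_LOG_JUMPS * math.log(jumps)
            + COEF_LOG_AGE * math.log(age_years))


def predict_permeability(jumps, age_years):
    """Point prediction in CFM, back-transformed from the log scale."""
    return math.exp(predicted_log_permeability(jumps, age_years))


def approximate_upper_limit(jumps, age_years, confidence=0.99):
    """Rough upper bound for a single new observation (hypothetical helper)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.exp(predicted_log_permeability(jumps, age_years) + z * RESIDUAL_SD)


print(f"Point prediction: {predict_permeability(300, 20):.1f} CFM")
print(f"Approximate upper limit: {approximate_upper_limit(300, 20):.1f} CFM")
```

<p>Under these assumptions, the approximate upper limit at 300 jumps and 20 years lands close to the 12 CFM safety threshold, which is in line with the study's conclusion.</p>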
<p>By adopting this extended service life, Stroud estimated they could save over $700,000 in budget costs over a period of 20 years, while still ensuring the safety of the chutes.</p>
<p>Stroud’s TFM student research project, completed in 2010, provided the impetus for further investigation and potential policy change in two federal agencies.</p>
<p><i><b>Photo credits:</b> Smokejumper photos courtesy of <a href="http://www.spotfireimages.com/" rel="noreferrer" target="_blank">Mike McMillan</a></i></p>
<p><i>Want to experience what it feels like to smokejump over a wildfire while remaining safely at your seat? Check out the incredible smokejumping video clips at <a href="http://www.spotfireimages.com/" rel="noreferrer" target="_blank">http://www.spotfireimages.com/</a></i></p>
Data Analysis, Regression Analysis, Statistics
Thu, 19 Oct 2017 11:08:00 +0000
Guest Blogger
3 Ways to Gain Buy-In for Continuous Improvement
http://blog.minitab.com/blog/statistics-in-the-field/3-ways-to-gain-buy-in-for-continuous-improvement
<div style="clear:both;">
<p paraeid="{3130d88d-6fa0-42eb-8d14-3264349ed8ed}{175}" paraid="1158853640"><span xml:lang="EN-US">Research out of the Juran Institute, which specializes in training, certification, and consulting on quality management globally, reveals that only </span><a href="http://info.juran.com/hubfs/documents/9002%20The%20No.%201%20Reason%20Why%20Performance%20Improvement%20Programs%20Fail.pdf" rel="noreferrer" target="_blank"><span xml:lang="EN-US">30 percent of improvement initiatives succeed</span></a><span xml:lang="EN-US">. </span> </p>
</div>
<div style="clear:both;">
<p paraeid="{3130d88d-6fa0-42eb-8d14-3264349ed8ed}{194}" paraid="1131428731"><span xml:lang="EN-US">And why do these initiatives fail so frequently? This research concludes that </span><a href="http://info.juran.com/hubfs/documents/9002%20The%20No.%201%20Reason%20Why%20Performance%20Improvement%20Programs%20Fail.pdf" rel="noreferrer" target="_blank"><span xml:lang="EN-US">a lack of management support is the No. 1 reason quality improvement initiatives fail</span></a><span xml:lang="EN-US">. But this is certainly not a problem isolated to just continuous improvement, as other types of strategic initiatives across the organization face similar challenges. Surveys of C-level executives by the Economist Intelligence Unit concur—sharing that lack of </span><span xml:lang="EN-US">leadership</span> <span xml:lang="EN-US">buy-in and support can stop the success of many strategic initiatives.</span> </p>
</div>
<div style="clear:both;">
<strong><span xml:lang="EN-US">Why Else Do Quality Initiatives Fail? </span></strong>
</div>
<div style="clear:both;">
<p paraeid="{3130d88d-6fa0-42eb-8d14-3264349ed8ed}{206}" paraid="1516713042"><span xml:lang="EN-US">Evidence shows that company leaders just don't have good access to the kind of information they need about their quality improvement initiatives. </span> </p>
</div>
<div style="clear:both;">
<p paraeid="{3130d88d-6fa0-42eb-8d14-3264349ed8ed}{214}" paraid="638300113"><span xml:lang="EN-US">Even for organizations that are working hard to assess the impact of quality, communicating impacts effectively to C-level executives is a huge challenge. The 2013 </span><span xml:lang="EN-US">ASQ <em>Global State of Quality</em></span><span xml:lang="EN-US"> report revealed that the higher people rise in an organization's leadership, the less often they receive reports about quality metrics. Only 2% of senior executives get daily quality reports, compared to 33% of front-line staff members. </span> </p>
</div>
<div style="clear:both;">
<p paraeid="{3130d88d-6fa0-42eb-8d14-3264349ed8ed}{239}" paraid="1761127251"><span xml:lang="EN-US">So why do so many leaders get so few reports about their quality programs? Scattered and inaccessible project data make it difficult to piece together the full picture of quality </span><span xml:lang="EN-US">initiatives and their impact </span><span xml:lang="EN-US">in a company. Because an array of applications </span><span xml:lang="EN-US">is</span><span xml:lang="EN-US"> often used to create charts, process maps, </span><a href="http://blog.minitab.com/blog/understanding-statistics/four-more-tips-for-making-the-most-of-value-stream-maps" rel="noreferrer" target="_blank"><span xml:lang="EN-US">value stream maps</span></a><span xml:lang="EN-US">, and other documents, it can be very time-consuming to keep track of multiple versions of a document </span><span xml:lang="EN-US">and</span> <span xml:lang="EN-US">keep the official project records current</span><span xml:lang="EN-US"> and accessible to all key stakeholders</span><span xml:lang="EN-US">.</span> </p>
</div>
<div style="clear:both;">
<p paraeid="{3130d88d-6fa0-42eb-8d14-3264349ed8ed}{245}" paraid="1358980612"><span xml:lang="EN-US">On top of the difficulty of piecing together data from multiple applications, inconsistent metrics across projects can make it impossible to evaluate results in an equivalent manner. And even when organizations try quality tracking methods, such as homegrown project databases or even full-featured PPM systems, these systems become a burden to maintain or end up not effectively supporting the needs of continuous quality improvement methods like Lean and Six Sigma.</span> </p>
</div>
<div style="clear:both;">
<strong><span xml:lang="EN-US">Overcoming Limited Visibility</span> </strong>
</div>
<div style="clear:both;">
<p paraeid="{b9bb73d8-2e46-4c12-8e24-5e2b2920658d}{24}" paraid="1545941106"><span xml:lang="EN-US">Are there ways to overcome the limited visibility stakeholders have into their company's quality initiatives? </span><span xml:lang="EN-US">For</span><span xml:lang="EN-US"> successful strategic initiatives, it has been identified that planning and good communication are drivers for success. These drivers also positively impact successful continuous improvement projects.</span></p>
</div>
<div>
<p paraeid="{b9bb73d8-2e46-4c12-8e24-5e2b2920658d}{78}" paraid="2124158710" style="clear: both; margin-left: 40px;"><span xml:lang="EN-US"><strong>1. Ensure efficiency</strong>. </span><a href="http://www.minitab.com/products/companion/" rel="noreferrer" target="_blank"><span xml:lang="EN-US">Utilize a complete platform for managin</span><span xml:lang="EN-US">g</span><span xml:lang="EN-US"> your continuous improvement program to reduce inefficiencies</span></a><span xml:lang="EN-US">. Using one platform to track </span><span xml:lang="EN-US">milestones, KPIs, and documents address</span><span xml:lang="EN-US">es</span><span xml:lang="EN-US"> redundancies </span><span xml:lang="EN-US">of gathering key metrics </span><span xml:lang="EN-US">from various sources </span><span xml:lang="EN-US">needed to report on projects</span><span xml:lang="EN-US">,</span> <span xml:lang="EN-US">saving </span><span xml:lang="EN-US">teams </span><span xml:lang="EN-US">hours of valuable time</span><span xml:lang="EN-US">. Looking past the current project at hand, one platfor</span><span xml:lang="EN-US">m can also make it easy to quickly replicate processes such as roadmaps and templates that</span><span xml:lang="EN-US"> were useful </span><span xml:lang="EN-US">in previous</span> <span xml:lang="EN-US">quality initiatives</span><span xml:lang="EN-US">.</span> </p>
<p paraeid="{b9bb73d8-2e46-4c12-8e24-5e2b2920658d}{99}" paraid="258607942" style="clear: both; margin-left: 40px;"><span xml:lang="EN-US"><strong>2. Aim for consistency. </strong>Centralize your storage by making all relevant documents accessible to all team members and stakeholders. </span><span xml:lang="EN-US">As teams grow and projects become more complex, the benefit of having all </span><span xml:lang="EN-US">team</span> <span xml:lang="EN-US">members align</span><span xml:lang="EN-US">ed</span><span xml:lang="EN-US"> can prevent confusion and reduce the number of back and forth emails that tend</span><span xml:lang="EN-US"> to happen. </span> </p>
<p paraeid="{b9bb73d8-2e46-4c12-8e24-5e2b2920658d}{130}" paraid="1896462313" style="clear: both; margin-left: 40px;"><span xml:lang="EN-US"><strong>3. Real-time visibility for all.</strong> </span><span xml:lang="EN-US">Visibility </span><span xml:lang="EN-US">in</span><span xml:lang="EN-US">to </span><span xml:lang="EN-US">the progress of your quality </span><span xml:lang="EN-US">project </span><span xml:lang="EN-US">facilitates the day</span><span xml:lang="EN-US">-</span><span xml:lang="EN-US">to</span><span xml:lang="EN-US">-</span><span xml:lang="EN-US">day management </span><span xml:lang="EN-US">of </span><span xml:lang="EN-US">tracking results and address</span><span xml:lang="EN-US">ing</span><span xml:lang="EN-US"> any challenges.</span> <span xml:lang="EN-US">Utilize dashboards to provide a quick "snapshot" of your project's progress. Cloud-based capabilit</span><span xml:lang="EN-US">ies</span><span xml:lang="EN-US"> take your dashboard to the next level, instantly communicating real-time results. </span> </p>
<div style="clear:both;">
<strong><span xml:lang="EN-US">Drive for Excellence</span> </strong>
</div>
<div style="clear:both;">
<p paraeid="{b9bb73d8-2e46-4c12-8e24-5e2b2920658d}{162}" paraid="1739063668"><span xml:lang="EN-US">For quality professionals and leaders, the challenge is to make sure that reporting on results becomes a critical step in each project and that all projects are using consistent metrics that are easily accessible. Teams that can do this will find </span><span xml:lang="EN-US">reporting on their</span> <span xml:lang="EN-US">results </span><span xml:lang="EN-US">a manageable task</span><span xml:lang="EN-US">—</span><span xml:lang="EN-US">facilitating</span> <span xml:lang="EN-US">the needed </span><span xml:lang="EN-US">visibility to all key stakeholders </span><span xml:lang="EN-US">that's </span><span xml:lang="EN-US">necessary for</span><span xml:lang="EN-US"> leadership buy-in. </span> </p>
</div>
</div>
<div style="clear:both;">
<p paraeid="{b9bb73d8-2e46-4c12-8e24-5e2b2920658d}{208}" paraid="140150838"><span xml:lang="EN-US">The efficiency, consistency</span><span xml:lang="EN-US">,</span><span xml:lang="EN-US"> and visibility needed to </span><span xml:lang="EN-US">successfully </span><span xml:lang="EN-US">manage quality projects is a common challenge we hear from our </span><span xml:lang="EN-US">customers.</span> <span xml:lang="EN-US">In fact, </span><span xml:lang="EN-US">our</span> <a href="http://www.minitab.com/products/companion/" rel="noreferrer" target="_blank"><span xml:lang="EN-US">Companion by Minitab</span></a><span xml:lang="EN-US"> software </span><span xml:lang="EN-US">was created to </span><span xml:lang="EN-US">tackle</span><span xml:lang="EN-US"> th</span><span xml:lang="EN-US">ese</span><span xml:lang="EN-US"> issue</span><span xml:lang="EN-US">s</span><span xml:lang="EN-US"> head-on</span><span xml:lang="EN-US"> based on</span><span xml:lang="EN-US"> our</span> <span xml:lang="EN-US">customer </span><span xml:lang="EN-US">feedback</span><span xml:lang="EN-US">. For executives, managers, and stakeholders, Companion delivers unprecedented and unparalleled insight into the progress, performance, and bottom-line impact of the organization’s entire quality initiative, or any individual piece of it. </span><span xml:lang="EN-US"><a href="http://www.minitab.com/products/companion/" rel="noreferrer" target="_blank">Learn more about Companion by Minitab, and try it free for 30 days</a>.</span> </p>
</div>
<p></p>
Tue, 19 Sep 2017 01:39:36 +0000
Guest Blogger
How to Avoid Overfitting Your Regression Model
http://blog.minitab.com/blog/understanding-statistics/how-to-avoid-overfitting-your-regression-model
<p>Overfitting a model is a real problem you need to beware of when performing regression analysis. An overfit model results in misleading <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-interpret-regression-analysis-results-p-values-and-coefficients" target="_blank">regression coefficients, p-values</a>, and <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit" target="_blank">R-squared</a> statistics. Nobody wants that, so let's examine what overfit models are, and how to avoid falling into the overfitting trap.</p>
<p>Put simply, an overfit model is too complex for the data you're analyzing. Rather than reflecting the entire population, an overfit regression model is perfectly suited to the noise, anomalies, and random features of the specific sample you've collected. When that happens, the overfit model is unlikely to fit another random sample drawn from the same population, which would have its own quirks.</p>
<p>A good model should fit not just the sample you have, but any new samples you collect from the same population. </p>
<p>For an example of the dangers of overfitting regression models, take a look at this fitted line plot:</p>
<img alt="Example of an overfit regression model" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/a284ba0ea6c3bf8f6dcec4e7a9d5f1f2/overfitlineplotnoequ.gif" style="width: 576px; height: 384px;" />
<div>
<p>Even though this model looks like it explains a lot of variation in the response, it's too complicated for this sample data. In the population, there is no true relationship between the predictor and this response, as is explained in detail <a href="http://blog.minitab.com/blog/adventures-in-statistics/multiple-regession-analysis-use-adjusted-r-squared-and-predicted-r-squared-to-include-the-correct-number-of-variables" target="_blank">here.</a></p>
Basics of Inferential Statistics
<p>For more insight into the problems with overfitting, let's review a basic concept of inferential statistics, in which we try to draw conclusions about a population from a random sample. The sample data is used to provide unbiased estimates of population parameters and relationships, and also in testing hypotheses about the population.</p>
<p>In inferential statistics, the size of your sample affects the amount of information you can glean about the population. If you want to learn more, you need larger sample sizes. Trying to wrest too much information from a small sample isn't going to work very well.</p>
<p>For example, with a sample size of 20, you could probably get a good estimate of a single population mean. But estimating two population means with a total sample size of 20 is a riskier proposition. If you want to estimate three or more population means with that same sample, any conclusions you draw are going to be pretty sketchy. </p>
<p>In other words, trying to learn too much from a sample leads to results that aren't as reliable as we'd like. In this example, as the number of observations per parameter decreases from 20 to 10 to 6.7 and beyond, the parameter estimates become less reliable. A new sample would likely yield different parameter estimates.</p>
How Sample Size Relates to an Overfit Model
<p>Similarly, overfitting a regression model results from trying to estimate too many parameters from too small a sample. In regression, a single sample is used to estimate the coefficients for <em>all</em> of the terms in the model. That includes every predictor, interaction, and polynomial term. As a result, the number of terms you can safely accommodate depends on the size of your sample. </p>
<p>Larger samples permit more complex models, so if the question or process you're investigating is very complicated, you'll need a sample size large enough to support that complexity. With an inadequate sample size, your model won't be trustworthy.</p>
<p>So your sample needs enough observations for each term. In multiple linear regression, 10-15 observations per term is a good rule of thumb. A model with two predictors and an interaction, therefore, would require 30 to 45 observations—perhaps more if you have high multicollinearity or a small effect size. </p>
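<p>The rule of thumb is simple arithmetic, and a small Python sketch makes it easy to apply to other model sizes. The function names here are hypothetical, and the 10-15 range is the guideline quoted above, not a hard requirement.</p>

```python
def observations_needed(n_terms, per_term_low=10, per_term_high=15):
    """Return the (low, high) sample-size range suggested by the rule of thumb."""
    return n_terms * per_term_low, n_terms * per_term_high


def observations_per_term(sample_size, n_terms):
    """Return how many observations are available for each estimated term."""
    return sample_size / n_terms


# Two predictors plus one interaction = three terms.
print(observations_needed(3))

# The earlier example: 20 observations spread over 1, 2, or 3 parameters.
for n_terms in (1, 2, 3):
    print(n_terms, round(observations_per_term(20, n_terms), 1))
```

<p>If the ratio for your planned model falls well below 10, either collect more data or consider a simpler model.</p>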
Avoiding Overfit Models
<p>You can detect overfit through cross-validation—determining how well your model fits new observations. Partitioning your data is one way to assess how the model fits observations that weren't used to estimate the model.</p>
<p>For linear models, <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">Minitab</a> calculates <a href="http://blog.minitab.com/blog/adventures-in-statistics/multiple-regession-analysis-use-adjusted-r-squared-and-predicted-r-squared-to-include-the-correct-number-of-variables" target="_blank">predicted R-squared</a>, a cross-validation method that doesn't require a separate sample. To calculate predicted R-squared, Minitab systematically removes each observation from the data set, estimates the regression equation, and determines how well the model predicts the removed observation.</p>
<p>A model that performs poorly at predicting the removed observations probably conforms to the specific data points in the sample, and can't be generalized to the full population. </p>
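<p>The leave-one-out idea behind predicted R-squared can be sketched in a few lines of Python. This is an illustration of the general technique for a one-predictor model with hypothetical data. It is not Minitab's implementation, which handles multiple terms and is likely computed more efficiently.</p>

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def total_sum_of_squares(ys):
    y_bar = mean(ys)
    return sum((y - y_bar) ** 2 for y in ys)


def r_squared(xs, ys):
    """Ordinary R-squared: every point is used to fit and to evaluate."""
    intercept, slope = fit_line(xs, ys)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - sse / total_sum_of_squares(ys)


def predicted_r_squared(xs, ys):
    """Refit without each point in turn, then predict the point left out."""
    press = 0.0  # prediction error sum of squares
    for i in range(len(xs)):
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (intercept + slope * xs[i])) ** 2
    return 1 - press / total_sum_of_squares(ys)


# Hypothetical data with a nearly linear relationship.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
print(f"R-squared: {r_squared(xs, ys):.4f}")
print(f"Predicted R-squared: {predicted_r_squared(xs, ys):.4f}")
```

<p>Predicted R-squared can never exceed ordinary R-squared for a least squares fit. A large gap between the two is the warning sign of overfitting.</p>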
<p>The best solution to an overfitting problem is avoidance. Identify the important variables and think about the model that you are likely to specify, then plan ahead to collect a sample large enough to handle all predictors, interactions, and polynomial terms your response variable might require. </p>
<p>Jim Frost offers some good advice about selecting a model in <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-choose-the-best-regression-model">How to Choose the Best Regression Model</a>. Also, check out his post about how <a href="http://blog.minitab.com/blog/adventures-in-statistics/beware-of-phantom-degrees-of-freedom-that-haunt-your-regression-models" target="_blank">too many phantom degrees of freedom</a> can lead to overfitting, too. </p>
</div>
Regression Analysis, Statistics
Thu, 31 Aug 2017 13:57:00 +0000
Eston Martz
The Easiest Way to Do Multiple Regression Analysis
http://blog.minitab.com/blog/understanding-statistics/the-easiest-way-to-do-multiple-regression-analysis
<p>Maybe you're just getting started with analyzing data. Maybe you're reasonably knowledgeable about statistics, but it's been a long time since you did a particular analysis and you feel a little bit rusty. In either case, the <a href="http://www.minitab.com/en-us/products/minitab/assistant/" target="_blank">Assistant menu</a> in Minitab Statistical Software gives you an interactive guide from start to finish. It will help you choose the right tool quickly, analyze your data properly, and even interpret the results appropriately. </p>
<p>One type of analysis many practitioners struggle with is multiple regression analysis, particularly an analysis that aims to optimize a response by finding the best levels for different variables. In this post, we'll use the Assistant to complete a multiple regression analysis and optimize the response.</p>
Identifying the Right Type of Regression
<p>In our example, we'll use a <a href="https://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/File/73fbd67dcfeb8300ce5855cceea6339d/heatflux2.MTW">data set</a> based on some solar energy research. Scientists found the position of focal points could be used to predict total heat flux. The goal of our analysis will be to use the Assistant to find the ideal position for these focal points. </p>
<p>When you select <strong>Assistant > Regression </strong>in Minitab, the software presents you with an interactive decision tree. If you need more explanation about a decision point, just click on the diamonds to see detailed information and examples.</p>
<p><img alt="Minitab's Assistant menu interactive decision tree" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/7d815f53a9ce234f845857081eb4737a/asstmenuregressionguide_w640.gif" style="width: 640px; height: 482px;" /></p>
<p>This data set has three X variables, or predictors, and we're looking to fit a model and optimize the response. For this goal, the tree leads to the Optimize Response button located at the bottom right. Clicking that button brings up a simple dialog box to complete.</p>
<p>HeatFlux is the response variable. The X variables are the focal points located in three directions: East, South, and North. Based on previous knowledge, we know we should use 234 as the target heat flux value, but we could also ask the Assistant to maximize or minimize the response. Because we checked the box labeled "Fit 2-way interactions and quadratic terms," the Assistant also will check for curvature and interactions.</p>
<p><img alt="Minitab's Assistant menu dialog box" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/ad64420b4b38e63f03ec8962a8b4bfdb/asst_dialog.gif" style="width: 539px; height: 436px;" /></p>
<p>When we press "OK," the Assistant quickly generates a regression model for the X variables using <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-smackdown-stepwise-versus-best-subsets" target="_blank">stepwise regression</a>. It presents the results in a series of reports written in plain, easy-to-follow language. </p>
Summary Report
<p><img alt="Multiple regression summary report for Minitab's Assistant" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f28d0c08d0b880903cfbf8ecc0daedfb/multiple_regression_for_heatflux___summary_report_w640.png" style="width: 640px; height: 480px;" /></p>
<p>This Summary Report delivers the "big picture" about the analysis and its results. With a p-value less than 0.001, this report shows that the regression model is statistically significant, with an <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit">R-squared value</a> of 96.15%! The comments window shows which X variables the model includes: East, South, and North, as well as interaction terms. To <a href="http://blog.minitab.com/blog/adventures-in-statistics/curve-fitting-with-linear-and-nonlinear-regression">model curvature</a>, the model also includes several polynomial terms.</p>
Effects Report
<p><img alt="Effects report for Minitab's Assistant menu" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d5a072b05e09ca463a7de37f0d828b98/multiple_regression_for_heatflux___effects_report_w640.gif" style="width: 640px; height: 480px;" /></p>
<p>The Effects Report shows all of the interaction and main effects included in the model. The presence of curved lines indicates the Assistant used a polynomial term to fit a curve.</p>
<p>In this report, the East*South interaction is significant. This means the effect of one variable on heat flux varies based on the other variable. If South has a low setting (31.84), heat flux is reduced by increasing East. But if South is set high (40.55), the heat flux increases as East gets higher.</p>
Diagnostic Report
<p><img alt="Multiple regression diagnostic report for Minitab's Assistant" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/a433b435aa412515d81a8d01fb6b09ff/report3_diagnostic_w640.gif" style="width: 640px; height: 479px;" /></p>
<p>The Diagnostic Report shows you the plot of <a href="http://blog.minitab.com/blog/adventures-in-statistics/why-you-need-to-check-your-residual-plots-for-regression-analysis" target="_blank">residuals</a> versus fitted values, and indicates any unusual points that ought to be investigated. This report has flagged two points, but these are not necessarily problematic, since based on the criteria for large residuals we'd expect roughly 5% of the observations to be flagged. The report also identifies two points that had unusual X values; clicking the points reveals which worksheet row they are in.</p>
Model Building Report
<p><img alt="Multiple regression model building report for Minitab's Assistant" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/d98aa49eb97ef8cfffc920bae32fbdd9/report4_modelbuilding_w640.gif" style="width: 640px; height: 479px;" /></p>
<p>The Model Building Report details how the Assistant arrived at the final regression model. It also contains the regression equation, identifies the variables that contribute the most information, and indicates whether the X variables are correlated. In this model, North contributes the most information. Even though East is not significant, since it is part of a higher-order term the Assistant includes it.</p>
<p>This is a good opportunity to point out how the Assistant helps ensure that an analysis is done in the best way. For example, the Assistant uses standardized X variables to create the regression model. That's because <a href="http://blog.minitab.com/blog/adventures-in-statistics/what-are-the-effects-of-multicollinearity-and-when-can-i-ignore-them" target="_blank">standardizing the X variables removes most of the correlation</a> between linear and higher-order terms, which reduces the chance of adding these terms to your model if they aren't needed. However, the Assistant still displays the final model in natural (unstandardized) units.</p>
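<p>To see why standardizing helps, here is a minimal Python sketch. It is not the Assistant's own code, and the <code>east</code> values are made up for illustration. It compares the correlation between a predictor and its square before and after standardizing.</p>

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def standardize(xs):
    """Subtract the mean and divide by the sample standard deviation."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return [(x - m) / sd for x in xs]

# Hypothetical predictor values, evenly spaced from 31 to 41 (not the real data).
east = [31.0 + 0.5 * i for i in range(21)]

raw_r = pearson(east, [x ** 2 for x in east])   # linear term vs. squared term
z = standardize(east)
std_r = pearson(z, [v ** 2 for v in z])         # same comparison after standardizing

print(f"raw correlation:          {raw_r:.4f}")
print(f"standardized correlation: {std_r:.4f}")
```

<p>With evenly spaced values like these, the raw correlation is close to 1 and the standardized correlation is essentially 0. Real data are rarely this symmetric, so expect most of the correlation to disappear rather than all of it.</p>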
Prediction and Optimization Report
<p><img alt="Multiple regression prediction and optimization report for Minitab's Assistant" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/beebc6c3460b0ea96ee4c7f93bc0891d/report5_prediction_w640.gif" style="width: 640px; height: 479px;" /></p>
<p>The Assistant's Prediction and Optimization Report provides solutions for obtaining the targeted heat flux value of 234. The optimal settings for the focal points have been identified as East 37.82, South 31.84, and North 16.01. The model predicts that these settings will deliver a heat flux of 234, with a prediction interval of 216 to 252. But the Assistant provides alternate solutions you may want to consider, particularly in cases where specialized subject area expertise might be critical.</p>
Report Card
<p><img alt="Multiple regression report card for Minitab's Assistant" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3bc83fb6b4c620c40ecc1a3cbc97dbb8/multiple_regression_for_heatflux___report_card_w640.png" style="width: 640px; height: 480px;" /></p>
<p>Finally, the Report Card helps you catch potential problems that could make your results unreliable. In this case, the report suggests collecting a larger sample and investigating the unusual residuals. It also shows that normality is not an issue for these data, and it provides a helpful reminder to validate the model's optimal values by doing confirmation runs.</p>
<p>The Assistant's methods are based on established statistical practice, guidelines in the literature, and simulations performed by Minitab's statisticians. You can read the technical white paper for <a href="http://support.minitab.com/en-us/minitab/17/Assistant_Multiple_Regression.pdf" target="_blank">Multiple Regression in the Assistant</a> if you would like all the details.</p>
Data Analysis, Regression Analysis, Statistics Help
Tue, 29 Aug 2017 13:57:00 +0000
http://blog.minitab.com/blog/understanding-statistics/the-easiest-way-to-do-multiple-regression-analysis
Eston Martz
Controls Charts Are Good for So Much More than SPC!
http://blog.minitab.com/blog/understanding-statistics/controls-charts-are-good-for-so-much-more-than-spc
<p>Control charts take data about your process and plot it so you can distinguish between common-cause and special-cause variation. Knowing the difference is important because it permits you to address potential problems without over-controlling your process. </p>
<p>Control charts are fantastic for assessing the stability of a process. Is the process mean unstable, too low, or too high? Is observed variability a natural part of the process, or could it be caused by specific sources? By answering these questions, control charts let you dedicate your actions to where you can make the most impact.</p>
<p>Assessing whether your process is stable is valuable in itself, but it is also a necessary first step in <a href="http://blog.minitab.com/blog/understanding-statistics/i-think-i-can-i-know-i-can-a-high-level-overview-of-process-capability-analysis" target="_blank">capability analysis</a>. Your process has to be stable before you can measure its capability. You can predict the performance of a stable process and therefore improve its capability. If your process is unstable, by definition it is unpredictable.</p>
<p>Control charts are commonly applied to business processes, but they have great benefits beyond Six Sigma and statistical process control (SPC). In fact, control charts can reveal information that would otherwise be very difficult to uncover.</p>
Other Processes That Need to Be In Control
<p>Let's consider processes beyond those we encounter in business. Instability and excessive variation can cause problems in many other kinds of processes. </p>
<ul>
<li>A test process that causes subjects to experience an impact of 6 times their body weight.</li>
<li>A teacher's process to help students learn the material, as measured by test scores.</li>
<li><a href="http://blog.minitab.com/blog/real-world-quality-improvement/control-charts-keep-blood-sugar-in-check" target="_blank">A diabetic's process for maintaining blood sugar levels</a>.</li>
</ul>
<p>The first example stems from a colleague's <a href="http://blog.minitab.com/blog/adventures-in-statistics/quality-improvement-controlling-variability-more-difficult-than-the-mean" target="_blank">research.</a> The researchers had middle-school students jump 30 times from 24-inch steps every other school day to see if it increased their bone density. Treatment was defined as the subjects experiencing an impact of 6 body weights, but the research team didn't quite hit the mark.</p>
<p>My colleague conducted a pilot study and graphed the results in an Xbar-S chart.</p>
<p><img alt="Xbar-S chart of ground reaction forces for pilot study" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/e721bd172aa55d5ec9976e81990f1293/xbars_grf_w1024.jpeg" style="width: 576px; height: 384px;" /></p>
<p>The fact that the S chart (on the bottom) is in control means each subject has a consistent landing style with impacts of a consistent magnitude—the variability is in control.</p>
<p>But the Xbar chart (at the top) is clearly out of control, indicating that even though the overall mean (6.141) exceeds the target, individual subjects have very different means. Some are consistently hard landers while others are consistently soft landers. The control chart suggests that the variability is not natural process variation (common cause) but rather due to differences among the participants (special cause variation).</p>
<p>The researchers addressed this by training the subjects how to land. They also had a nurse observe all future jumping sessions. These actions reduced the variability to the point that impacts were consistently greater than 6 body weights.</p>
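<p>If you want to see the arithmetic behind an Xbar-S chart like the one above, the following Python sketch may help. It uses the textbook S-bar method with the standard constants for subgroups of size 5 (A3 = 1.427, B3 = 0, B4 = 2.089). The jump data are invented for illustration, and Minitab's default estimate of the standard deviation may differ from this simple version.</p>

```python
import statistics

# Standard control chart constants for subgroups of size 5.
A3, B3, B4 = 1.427, 0.0, 2.089

def xbar_s_limits(subgroups):
    """Return (LCL, center, UCL) for the Xbar chart and the S chart."""
    means = [statistics.mean(g) for g in subgroups]
    sds = [statistics.stdev(g) for g in subgroups]
    grand_mean = statistics.mean(means)
    s_bar = statistics.mean(sds)
    xbar = (grand_mean - A3 * s_bar, grand_mean, grand_mean + A3 * s_bar)
    s = (B3 * s_bar, s_bar, B4 * s_bar)
    return means, sds, xbar, s

def flagged(points, lcl, ucl):
    """Indexes of points outside the control limits."""
    return [i for i, p in enumerate(points) if p < lcl or p > ucl]

# Hypothetical impacts (in body weights): one row per subject, five jumps each.
subjects = [
    [5.1, 5.3, 5.0, 5.2, 5.4],
    [7.0, 7.2, 6.9, 7.1, 7.3],
    [6.0, 6.2, 5.9, 6.1, 6.3],
    [4.8, 5.0, 4.7, 4.9, 5.1],
    [7.4, 7.6, 7.3, 7.5, 7.7],
]

means, sds, xbar_lim, s_lim = xbar_s_limits(subjects)
xbar_out = flagged(means, xbar_lim[0], xbar_lim[2])
s_out = flagged(sds, s_lim[0], s_lim[2])

print("Xbar chart out-of-control subjects:", xbar_out)
print("S chart out-of-control subjects:   ", s_out)
```

<p>In this made-up data set each subject lands consistently (the S chart flags nobody), but most subjects' means fall outside the Xbar limits, which is the same pattern the pilot study showed.</p>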
Control Charts as a Prerequisite for Statistical Hypothesis Tests
<p>Control charts can verify that a process is stable, as required for capability analysis. But control charts can be used similarly to test assumptions for <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics" target="_blank">hypothesis tests</a>.</p>
<p>Specifically, the measurements used in a hypothesis test are assumed to be stable, though this assumption is often overlooked. This assumption parallels the requirement for stability in capability analysis: if your measurements are not stable, inferences based on those measurements will not be reliable.</p>
<p>Let’s assume that we’re comparing test scores between group A and group B. We’ll use this <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/File/6053477fc294de59d5b3837389daab3a/groupcomparison.MTW">data set</a> to perform a 2-sample t-test as shown below.</p>
<p style="margin-left: 40px;"><img alt="two sample t-test results" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/95d19923cf0680676324db57e3df0ef7/two_sample_t_test_output.png" style="width: 355px; height: 555px;" /></p>
<p>The results indicate that group A has a higher mean and that the difference is statistically significant. We’re not assuming equal variances, so it's not a problem that Group B has a slightly higher standard deviation. We also have enough observations per group that normality is not a concern. Concluding that group A has a higher mean than group B seems safe. </p>
<p>But wait a minute...let's look at each group in an I-MR chart. </p>
<p><img alt="I-MR chart for group A" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/cef240bbb760bb6760ddcbc33e446be9/imr_a.png" style="width: 576px; height: 384px;" /></p>
<p><img alt="I-MR chart of group B" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/e4bd53da7831826959be94540b7ab0a2/imr_b.png" style="width: 576px; height: 384px;" /></p>
<p>Group A's chart shows stable scores. But group B's chart indicates that the scores are unstable, with multiple out-of-control points and a clear negative trend. Even though these data satisfy the other assumptions, we can't make a valid comparison between a stable group and an unstable group!</p>
<p>This is not the only type of problem you can detect with control charts. They also can test for a variety of patterns in your data, and for out-of-control variability.</p>
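<p>The limits on an I-MR chart come from the average moving range. Here is a rough Python sketch using the standard constants (2.66 for the individuals chart and 3.267 for the moving range chart). The two score lists are hypothetical stand-ins, not the values in the worksheet linked above, and the sketch only checks points against the limits rather than running the additional pattern tests Minitab offers.</p>

```python
def imr_limits(values):
    """Return (LCL, center, UCL) for the individuals and moving range charts."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = sum(values) / len(values)
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    individuals = (center - 2.66 * mr_bar, center, center + 2.66 * mr_bar)
    moving_range = (0.0, mr_bar, 3.267 * mr_bar)
    return individuals, moving_range

def out_of_control(values):
    """Indexes of individual values outside the individuals chart limits."""
    (lcl, _, ucl), _ = imr_limits(values)
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

# Hypothetical test scores in time order.
stable_scores = [80, 82, 79, 81, 80, 83, 78, 81, 80, 82]
drifting_scores = [95, 93, 92, 90, 88, 85, 83, 79, 78, 75]

stable_flags = out_of_control(stable_scores)
drifting_flags = out_of_control(drifting_scores)

print("stable group flags:  ", stable_flags)
print("drifting group flags:", drifting_flags)
```

<p>A steady downward drift produces small moving ranges and therefore tight limits, so points at both ends of the series get flagged even though no single score looks extreme on its own.</p>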
Different Types of Control Charts
<p>An I-MR chart can assess process stability when your data don’t have subgroups. The Xbar-S chart, the first one in this post, assesses process stability when your data do have subgroups.</p>
<p>Other control charts are ideal for other types of data. For example, the U Chart and Laney U’ Chart use the Poisson distribution. The P Chart and Laney P’ Chart use the binomial distribution. </p>
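<p>As a rough illustration of how those distributions enter the picture, this Python sketch computes the usual three-sigma limits for a P chart (binomial) and a U chart (Poisson) with equal sample sizes. The counts are invented, and the Laney versions, which adjust for overdispersion, are not covered here.</p>

```python
import math

def p_chart_limits(defectives, sample_size):
    """Three-sigma limits for the proportion defective (binomial model)."""
    p_bar = sum(defectives) / (len(defectives) * sample_size)
    half = 3 * math.sqrt(p_bar * (1 - p_bar) / sample_size)
    return max(0.0, p_bar - half), p_bar, min(1.0, p_bar + half)

def u_chart_limits(defect_counts, units_per_sample):
    """Three-sigma limits for defects per unit (Poisson model)."""
    u_bar = sum(defect_counts) / (len(defect_counts) * units_per_sample)
    half = 3 * math.sqrt(u_bar / units_per_sample)
    return max(0.0, u_bar - half), u_bar, u_bar + half

# Hypothetical data: defective items per sample of 100, and defects per sample of 10 units.
p_lcl, p_center, p_ucl = p_chart_limits([4, 6, 5, 3, 7, 5], sample_size=100)
u_lcl, u_center, u_ucl = u_chart_limits([12, 9, 15, 11, 13], units_per_sample=10)

print(f"P chart: LCL={p_lcl:.4f}  center={p_center:.4f}  UCL={p_ucl:.4f}")
print(f"U chart: LCL={u_lcl:.4f}  center={u_center:.4f}  UCL={u_ucl:.4f}")
```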
<p>In <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank" title="Minitab 16 Statistical Software">Minitab Statistical Software</a>, you can get step-by-step guidance in control chart selection by going to <strong>Assistant > Control Charts</strong>. The Assistant will help you with everything from determining your data type, to ensuring it meets assumptions, to interpreting your results.</p>
Control Charts
Thu, 24 Aug 2017 13:59:00 +0000
http://blog.minitab.com/blog/understanding-statistics/controls-charts-are-good-for-so-much-more-than-spc
Eston Martz
What's the Difference between Confidence, Prediction, and Tolerance Intervals?
http://blog.minitab.com/blog/understanding-statistics/whats-the-difference-between-confidence-prediction-and-tolerance-intervals
<p>In statistics, as in life, absolute certainty is rare. That's why statisticians often can't provide a result that is as specific as we might like; instead, they provide the results of an analysis as a range, within which the data suggest the true answer lies.</p>
<p>Most of us are familiar with "confidence intervals," but that's just one of several different kinds of intervals we can use to characterize the results of an analysis. Sometimes, confidence intervals are not the best option. Let's look at the characteristics of some different types of intervals, and consider when and where they should be used. Specifically, we'll look at confidence intervals, prediction intervals, and tolerance intervals.</p>
An Overview of Confidence Intervals
<p><img alt="Illustration of confidence level for confidence intervals" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/a9bd1376510c8289a0daf15f5bcd376f/ci.gif" style="float: right; width: 327px; height: 224px;" />A confidence interval refers to a range of values that is likely to contain the value of an unknown population parameter, such as the mean, based on data sampled from that population.</p>
<p>Collected randomly, two samples from a given population are unlikely to have identical confidence intervals. But if the population is sampled again and again, a certain percentage of those confidence intervals will contain the unknown population parameter. The percentage of these confidence intervals that contain this parameter is the confidence level of the interval.</p>
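<p>A quick simulation makes the idea of a confidence level concrete. The Python sketch below repeatedly samples from a normal population and counts how often the 95% interval for the mean captures the true mean. The population values are made up, and for simplicity the sketch assumes the population standard deviation is known, so it uses a z-interval rather than the t-interval you would normally get from software.</p>

```python
import random
import statistics

def coverage(true_mean=1250.0, sigma=90.0, n=30, trials=2000, level=0.95, seed=1):
    """Fraction of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5          # known-sigma interval (an assumption)
    hits = 0
    for _ in range(trials):
        sample_mean = statistics.fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        if abs(sample_mean - true_mean) <= half_width:
            hits += 1
    return hits / trials

observed = coverage()
print(f"share of intervals containing the true mean: {observed:.3f}")
```

<p>The share should land close to 0.95, though it will wobble a little from one seed to another.</p>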
<p>Confidence intervals are most frequently used to express the population mean or standard deviation, but they also can be calculated for proportions, regression coefficients, occurrence rates (Poisson), and for the differences between populations in hypothesis tests.</p>
<p>If we measure the life of a random sample of light bulbs and Minitab calculates 1230 to 1265 hours as the 95% confidence interval, we can be 95% confident that the mean for the population of bulbs falls between 1230 and 1265 hours.</p>
<p>In relation to the parameter of interest, confidence intervals only assess sampling error—the inherent error in estimating a population characteristic from a sample. Larger sample sizes will decrease the sampling error, and result in smaller (narrower) confidence intervals. If you could sample the entire population, the confidence interval would have a width of 0: there would be no sampling error, since you have obtained the actual parameter for the entire population! </p>
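<p>The effect of sample size is easy to check numerically. In this small Python sketch, which uses a normal approximation and a hypothetical standard deviation of 90 hours, the half-width of the interval shrinks in proportion to one over the square root of the sample size.</p>

```python
import math
from statistics import NormalDist

def ci_half_width(sd, n, level=0.95):
    """Half-width of a normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * sd / math.sqrt(n)

# Hypothetical standard deviation of bulb life, in hours.
for n in (25, 100, 400, 1600):
    print(f"n = {n:5d}  half-width = {ci_half_width(90.0, n):6.2f} hours")
```

<p>Quadrupling the sample size cuts the width in half, so the interval narrows quickly at first and more slowly as the sample grows.</p>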
<p>In addition, confidence intervals only provide information about the mean, standard deviation, or whatever your parameter of interest happens to be. They tell you nothing about how the individual values are distributed.</p>
<p>What does that mean in practical terms? It means that the confidence interval has some serious limitations. In this example, we can be 95% confident that the mean of the light bulbs will fall between 1230 and 1265 hours. But that 95% confidence interval does not indicate that 95% of the bulbs will fall in that range. To draw a conclusion like that requires a different type of interval...</p>
An Overview of Prediction Intervals
<p>A prediction interval is a confidence interval for <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-predict-with-minitab-using-bmi-to-predict-the-body-fat-percentage-part-1" target="_blank">predictions</a> derived from <a href="http://blog.minitab.com/blog/adventures-in-statistics/linear-or-nonlinear-regression-that-is-the-question" target="_blank">linear and nonlinear regression models</a>. There are two types of prediction intervals.</p>
Confidence interval of the prediction
<p>Given specified settings of the predictors in a model, the confidence interval of the prediction is a range likely to contain the mean response. Like regular confidence intervals, the confidence interval of the prediction represents a range for the mean, not the distribution of individual data points.</p>
<p>With respect to the light bulbs, we could test how different manufacturing techniques (Slow or Quick) and filaments (A or B) affect bulb life. After fitting a model, we can use <a href="http://www.minitab.com/products/minitab">statistical software</a> to forecast the life of bulbs made using filament A under the Quick method.</p>
<p>If the confidence interval of the prediction is 1400–1450 hours, we can be 95% confident that the <em>mean </em>life for bulbs made under those conditions falls within that range. However, this interval doesn't tell us anything about how the lives of <em>individual </em>bulbs are distributed. </p>
Prediction interval
<p>A prediction interval is a range that is likely to contain the response value of an individual new observation under specified settings of your predictors.</p>
<p>If Minitab calculates a prediction interval of 1350–1500 hours for a bulb produced under the conditions described above, we can be 95% confident that the lifetime of a new bulb produced with those settings will fall within that range.</p>
<p>You'll note the prediction interval is wider than the confidence interval of the prediction. This will always be true, because additional uncertainty is involved when we want to predict a single response rather than a mean response.</p>
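<p>To see where the extra width comes from, consider the simplest case: a regression with one continuous predictor. This is an assumption made for illustration, since the bulb example above uses categorical factors, and the data below are invented. The Python sketch computes the standard error for the mean response and for a single new observation at the same predictor value.</p>

```python
import math

def fit_line(xs, ys):
    """Least-squares line plus the pieces needed for interval standard errors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))               # residual standard deviation
    return intercept, slope, s, mx, sxx

def standard_errors(xs, ys, x0):
    """Standard errors for the mean response and for one new observation at x0."""
    _, _, s, mx, sxx = fit_line(xs, ys)
    leverage = 1 / len(xs) + (x0 - mx) ** 2 / sxx
    se_mean = s * math.sqrt(leverage)          # confidence interval of the prediction
    se_new = s * math.sqrt(1 + leverage)       # prediction interval
    return se_mean, se_new

# Hypothetical data: a process setting and the resulting bulb life in hours.
setting = [1, 2, 3, 4, 5, 6]
life = [1410, 1422, 1431, 1445, 1452, 1467]

se_mean, se_new = standard_errors(setting, life, x0=3.5)
print(f"standard error, mean response:   {se_mean:.2f}")
print(f"standard error, new observation: {se_new:.2f}")
```

<p>Both intervals use the same t multiplier, so the extra "1 +" under the square root, which accounts for the scatter of individual values around the mean, is what makes the prediction interval wider.</p>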
An Overview of Tolerance Intervals
<p>A tolerance interval is a range likely to contain a defined proportion of a population. To calculate tolerance intervals, you must stipulate the proportion of the population and the desired confidence level—the probability that the named proportion is actually included in the interval. This is easier to understand when you look at an example.</p>
Tolerance interval example
<p>To assess how long their bulbs last, the light bulb company samples 100 bulbs randomly and records how long they last in <a href="//cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/File/c4ab0558e6b5c4e7f6b759528067d9d0/lightbulb.MTW">this worksheet</a>.</p>
<p>To use this data to calculate tolerance intervals, go to <strong>Stat > Quality Tools > Tolerance Intervals </strong>in Minitab. (If you don't already have it, download the <a href="http://www.minitab.com/products/minitab/free-trial/">free 30-day trial of Minitab</a> and follow along!) Under <strong>Data</strong>, choose <em>Samples in columns</em>. In the text box, enter <em>Hours</em>. Then click <strong>OK</strong>. </p>
<p style="margin-left: 40px;"><img alt="Example of a tolerance interval" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/dd1bef8ea49f03e5362ef705a0a43107/ti.gif" style="width: 576px; height: 384px;" /></p>
<p>The normality test indicates that these data follow the normal distribution, so we can use the Normal interval (1060, 1435). The bulb company can be 95% confident that at least 95% of all bulbs will last between 1060 and 1435 hours.</p>
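<p>For the curious, a two-sided normal tolerance interval has the form mean ± k × standard deviation, where k depends on the sample size, the proportion covered, and the confidence level. The Python sketch below uses Howe's approximation for k together with the Wilson-Hilferty approximation for the chi-square quantile. These are swapped-in textbook approximations chosen to keep the example dependency-free, and Minitab's own calculation may differ. The mean and standard deviation are hypothetical values, not taken from the worksheet.</p>

```python
import math
from statistics import NormalDist

def chi2_quantile_wh(q, df):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(q)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3

def howe_k(n, proportion=0.95, confidence=0.95):
    """Approximate two-sided tolerance factor for a normal population."""
    df = n - 1
    z = NormalDist().inv_cdf((1 + proportion) / 2)
    chi2 = chi2_quantile_wh(1 - confidence, df)
    return math.sqrt(df * (1 + 1 / n) * z * z / chi2)

def tolerance_interval(mean, sd, n, proportion=0.95, confidence=0.95):
    """Return (lower, upper) limits of the approximate tolerance interval."""
    k = howe_k(n, proportion, confidence)
    return mean - k * sd, mean + k * sd

k = howe_k(100)
# Hypothetical sample statistics for 100 bulbs, in hours.
lower, upper = tolerance_interval(mean=1248.0, sd=84.0, n=100)

print(f"k factor for n = 100: {k:.3f}")
print(f"approximate tolerance interval: {lower:.0f} to {upper:.0f} hours")
```

<p>Notice that k is larger than the 1.96 you would use if the mean and standard deviation were known exactly. That extra margin is the price of estimating them from a sample.</p>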
How tolerance intervals compare to confidence intervals
<p>As we mentioned earlier, the width of a confidence interval depends entirely on sampling error. The closer the sample comes to including the entire population, the smaller the width of the confidence interval, until it approaches zero.</p>
<p>But a tolerance interval's width is based not only on sampling error, but also variance in the population. As the sample size approaches the entire population, the sampling error diminishes and the estimated percentiles approach the true population percentiles.</p>
<p>Minitab calculates the data values that correspond to the estimated 2.5th and 97.5th percentiles (97.5 - 2.5 = 95) to determine the interval in which 95% of the population falls. You can get more details about percentiles and population proportions <a href="http://blog.minitab.com/blog/adventures-in-statistics/the-graphical-benefits-of-identifying-the-distribution-of-your-data" target="_blank">here</a>.</p>
<p>Of course, because we are using a sample, the percentile estimates will have error. Since we can't say that a tolerance interval truly contains the specified proportion with 100% confidence, tolerance intervals have a confidence level, too.</p>
How tolerance intervals are used
<p>Tolerance intervals are very useful when you want to predict a range of likely outcomes based on sampled data.</p>
<p>In quality improvement, practitioners generally require that a process output (such as the life of a light bulb) falls within spec limits. By comparing client requirements to tolerance limits that cover a specified proportion of the population, tolerance intervals can detect excessive variation. A tolerance interval wider than the client's requirements may indicate that product variation is too high.</p>
<p><a href="http://it.minitab.com/en-us/products/minitab/free-trial.aspx">Minitab statistical software</a> makes obtaining these intervals easy, regardless of which one you need to use for your data.</p>
Statistics, Statistics Help
Tue, 22 Aug 2017 13:58:00 +0000
http://blog.minitab.com/blog/understanding-statistics/whats-the-difference-between-confidence-prediction-and-tolerance-intervals
Eston Martz
Flight of the Chickens: A Statistical Bedtime Story, Part 2
http://blog.minitab.com/blog/statistics-in-the-field/flight-of-the-chickens-a-statistical-bedtime-story-part-2
<p><em>by Matthew Barsalou, guest blogger</em></p>
<p>At the end of <a href="http://blog.minitab.com/blog/statistics-in-the-field/flight-of-the-chickens-a-statistical-bedtime-story-part-1">the first part of this story</a>, a group of evil trouble-making chickens had convinced all of their fellow chickens to march on the walled city of Wetzlar, where, said the evil chickens, they all would be much happier than they were on the farm. </p>
<p>The chickens marched through the night and arrived at Wetzlar on the Lahn as the sun came up. “Let us in!” demanded the chickens.</p>
<p><img alt="https://upload.wikimedia.org/wikipedia/commons/a/aa/Wetzlarskyline.jpg" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3dd478e1d22595f46e61e30633cc1cc8/image015.jpg" style="text-align: -webkit-center; border-width: 0px; border-style: solid; width: 603px; height: 452px;" /></p>
<p align="center" style="font-size:9px">By Krusto - self made by Krusto, CC BY 2.0 de, https://commons.wikimedia.org/w/index.php?curid=1712222</p>
<p>"No," said the Swan of the Lahn, the ruler of Wetzlar.</p>
<p>The chickens spent the day trying to force open the gates of Wetzlar. One chicken snuck off to meet with a goose known for dealing in antiques such as lamps, chairs, and main battle tanks. The chicken returned by early evening driving a slightly used T-55 tank.</p>
<p><img alt="https://upload.wikimedia.org/wikipedia/commons/thumb/6/68/T-54A_Panzermuseum_Thun.jpg/1280px-T-54A_Panzermuseum_Thun.jpg" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b9f47d36ed923bb9a8d4957b50aaed3d/image017.jpg" style="text-align: -webkit-center; border-width: 0px; border-style: solid; width: 627px; height: 388px;" /></p>
<p style="font-size:9px">By Sandstein - Own work, CC BY 3.0, <a href="https://commons.wikimedia.org/w/index.php?curid=5069466">https://commons.wikimedia.org/w/index.php?curid=5069466</a></p>
<p>Sid, the undercover duck who'd infiltrated the flock to spy on the evil chickens for the Swan of the Lahn, realized he needed to do something, and fast. So he looked up the amount of fuel used for the distance driven for 47 T-55s, then performed a regression analysis to determine how far this one could go if it had full fuel tanks.</p>
<p>To recreate Sid's analysis in Minitab, download his <a href="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a84e66ffcbe9f854a779cdacfc914915/flightofthechickens.mtw">data set</a> (and the trial version of <a href="http://www.minitab.com/products/minitab/free-trial">Minitab 18</a>, if you need it) and go to <strong>Stat > Regression > Fit Regression Model...,</strong> and select Distance as the Response and Fuel as the Continuous predictor. Then click on Graphs and select Four in one.</p>
<p><img alt="regression dialog box" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/7897aea8776afdf207dfd87f6f5d151b/regression_dialog.png" style="width: 600px; height: 396px;" /></p>
<p>Click OK twice, and Minitab produces the following output: </p>
<p style="margin-left: 40px;"><img alt="residual plots" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/27b58becfc3954e9d298592ddbc4b0e5/image020.png" style="border-width: 0px; border-style: solid; width: 576px; height: 384px;" /></p>
<p style="margin-left: 40px;"><img alt="Regression Analysis Distance vs Fuel" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/752f79e57ba64ac2348a4f07e721f5f0/regression_output_18.png" style="width: 472px; height: 691px;" /></p>
<p>We can see that there is a statistically significant relationship between fuel used and distance traveled. The R-squared statistic indicates that the amount of fuel used explains 95.28% of the variability in distance traveled. There seems to be something odd with the order of the data, as seen in the Four in One graph. The Session window output shows three unusual values, two of which have large residuals. This is an indication that these data are not perfect for a regression analysis.</p>
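<p>If you'd like to see what sits behind the R-squared value, here is a small Python sketch that fits a least-squares line and computes R-squared as the share of variability the line explains. The fuel and distance numbers are invented for illustration and are not Sid's data.</p>

```python
def fit_and_r_squared(xs, ys):
    """Least-squares intercept, slope, and R-squared for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return intercept, slope, 1 - sse / sst

# Hypothetical fuel used (liters) and distance traveled (km) for five tanks.
fuel = [100, 200, 300, 400, 500]
distance = [55, 98, 160, 195, 262]

intercept, slope, r_squared = fit_and_r_squared(fuel, distance)
print(f"Distance = {intercept:.2f} + {slope:.4f} * Fuel")
print(f"R-squared = {r_squared:.2%}")
```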
<p>However, better data would not matter in this case, since Sid the Duck once again investigated the wrong question. Predicting how far the tank could travel was irrelevant, given it had already arrived at the town. The real question Sid should have asked was, “Can a T-55 round penetrate the gates of Wetzlar?”</p>
<p><img alt="Gates of Wetzlar" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/5c7810ee0247cead4fd90134fc27817c/image023.jpg" style="text-align: -webkit-center; border-width: 0px; border-style: solid; width: 477px; height: 318px;" /></p>
<p style="font-size:9px"><span style="text-align: -webkit-center;">By Peter Haas /, CC BY-SA 3.0, </span><a href="https://commons.wikimedia.org/w/index.php?curid=28496059" style="text-align: -webkit-center;">https://commons.wikimedia.org/w/index.php?curid=28496059</a></p>
<p>The chicken inside the tank huffed and puffed and fired the main gun directly at the gates of Wetzlar, but the round simply bounced off. He fired again and again, but the rounds just bounced off again and again. Eventually, the T-55 broke down—as they are known to do—so the chickens gathered in force and attempted to knock the gates down by running into them.</p>
<p>But a gate that can survive a tank’s main gun round will not budge when rammed by chickens, no matter how determined they are.</p>
<p>The Swan of Wetzlar had had enough by this time, so boiling chicken soup with noodles and vegetables was poured onto the chickens. This was too much for the chickens, so they fled.</p>
<p><img alt="Chicken Noodle Soup.jpg" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c12f89615912f1c1867393240da3c76a/image025.jpg" style="text-align: -webkit-center; border-width: 0px; border-style: solid; width: 292px; height: 219px;" /></p>
<p style="font-size:9px"><span style="text-align: -webkit-center;">Chicken Noodle Soup by </span><a href="https://commons.wikimedia.org/wiki/File:Chicken_Noodle_Soup.jpg" style="text-align: -webkit-center;">Hoyabird8</a><span style="text-align: -webkit-center;"> at English Wikipedia</span></p>
<p>Unfortunately, the road they had followed had washed out in the heavy rains so the only route home was through the Bird Mountains...the<em> terribly misnamed</em> Bird Mountains, which could more accurately be called the Hungry Foxes Everywhere Mountains. </p>
<p><img alt="Hungry Foxes Everywhere Mountains" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/99127e2feebe508d0b16544ac123f305/image027.jpg" style="text-align: -webkit-center; border-width: 0px; border-style: solid; width: 364px; height: 243px;" /></p>
<p style="font-size:9px"><span style="text-align: -webkit-center;">Von Pulv - Eigenes Werk, CC BY-SA 3.0, </span><a href="https://commons.wikimedia.org/w/index.php?curid=15979924" style="text-align: -webkit-center;">https://commons.wikimedia.org/w/index.php?curid=15979924</a></p>
<p>The chickens fled into the forests of the Bird Mountains. Knowing the dangers of these forests, the evil chickens let the other chickens lead so that they would encounter the foxes first. However, the evil chickens failed to consider that foxes are, as they say, sly as foxes. The foxes of the woefully misnamed Bird Mountains waited till the chickens were well into their territory, and then fell upon those in the rear—the evil chickens.</p>
<p>Evil chickens nonetheless taste like chicken, and the foxes feasted.</p>
<p><img alt="hungry fox" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e81feb7a838f17127a03bb8c7774b747/image029.jpg" style="text-align: -webkit-center; border-width: 0px; border-style: solid; width: 384px; height: 276px;" /></p>
<p style="font-size:9px"><span style="text-align: -webkit-center;">By Foto: Jonn Leffmann, CC BY 3.0, </span><a href="https://commons.wikimedia.org/w/index.php?curid=21536441" style="text-align: -webkit-center;">https://commons.wikimedia.org/w/index.php?curid=21536441</a></p>
<p>Upon returning with the flock to the farm, Sid suspected the evil chickens had been decimated, so he did a survey. Originally, 647 out of the population of 1,541 chickens were evil, so he randomly sampled 175 chickens and found only 22 of these chickens were evil. Sid wanted to know if the new proportion of evil chickens was less than the old proportion, so he did a one-tailed two-proportion test.</p>
<p>To do this in Minitab, go to <strong>Stat > Basic Statistics > 2 Proportions...</strong> and select Summarized data in the drop-down menu. Enter 22 for the number of events and 175 for the number of trials under Sample 1 and enter 647 for the number of events and 1,541 for the number of trials under Sample 2. Click on Options and select Difference < hypothesized difference...</p>
<p style="margin-left: 40px;"><img alt="2 proportions test dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/5c5778bdb7360c9974fe8d8e0f059a36/2_proportions_dialog.png" style="width: 499px; height: 400px;" /></p>
<p>Then click OK twice. </p>
<p style="margin-left: 40px;"><img alt="2 proportions test output" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e5cfd2f156bdf33a539975586475c0e0/2_proportions_output.png" style="width: 348px; height: 599px;" /></p>
<p>The resulting p-value is less than 0.05, so Sid can conclude there is a statistically significant difference between the samples.</p>
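<p>Sid's test can be approximated by hand with a normal-approximation z-test. The Python sketch below uses the counts from the story and a standard error based on the two separate sample proportions. That choice is an assumption, and Minitab's output may also report a pooled or exact test, so the numbers need not match the screenshot digit for digit.</p>

```python
import math
from statistics import NormalDist

def two_proportion_lower_tail(x1, n1, x2, n2):
    """z statistic and one-sided p-value for H1: p1 - p2 < 0."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # unpooled standard error
    z = (p1 - p2) / se
    return z, NormalDist().cdf(z)

# Counts from the story: 22 of 175 sampled now, 647 of 1,541 originally.
z, p_value = two_proportion_lower_tail(22, 175, 647, 1541)

print(f"z = {z:.2f}")
print(f"one-sided p-value = {p_value:.3g}")
```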
<p>After returning home, the chickens were confused. Just days earlier all had been well, and suddenly they had found themselves in such an adventure. The remaining evil chickens weren't confused, since they had instigated it all. But they were understandably upset with how things had ended, and they began to argue and blame each other for the failure. Both recriminations and feathers flew.</p>
<p>Evil chickens started turning each other in to the farmer, which resulted in a weight gain for the farmer and an even greater reduction in the proportion of evil chickens in the flock. Sid surreptitiously arranged a few “accidents” to dispatch the remaining evil chickens.</p>
<p>The pigs eventually forgave the innocent chickens for the egg-throwing incident, and the remaining chickens lived happily ever after. The cow spent the rest of her life hoping for another dinner of eggs.</p>
<p>As for Sid, his next assignment was the infiltration of a rabbit den. How a duck disguised himself as a rabbit is a tale for another time.</p>
<p><img alt="Rabbit burrow" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e218aafacbf66d1da87cf6fb0b2c2ddb/image033.jpg" style="text-align: -webkit-center; border-width: 0px; border-style: solid; width: 464px; height: 348px;" /></p>
<p align="center" style="font-size:9px">By Brammers - Own work, Public Domain, <a href="https://commons.wikimedia.org/w/index.php?curid=7862100">https://commons.wikimedia.org/w/index.php?curid=7862100</a></p>
<p><strong>There is a moral to this story: If you need help with statistics, call a statistician...<em>not </em>a duck.</strong></p>
<p> </p>
<p><strong>About the Guest Blogger</strong></p>
<div>
<p><em><a href="https://www.linkedin.com/pub/matthew-barsalou/5b/539/198" target="_blank">Matthew Barsalou</a> is a statistical problem resolution Master Black Belt at <a href="http://www.3k-warner.de/" target="_blank">BorgWarner</a> Turbo Systems Engineering GmbH. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with emphasis in international business, and has a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is author of the books </em><a href="http://www.amazon.com/Root-Cause-Analysis-Step---Step/dp/148225879X/ref=sr_1_1?ie=UTF8&qid=1416937278&sr=8-1&keywords=Root+Cause+Analysis%3A+A+Step-By-Step+Guide+to+Using+the+Right+Tool+at+the+Right+Time" target="_blank">Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time</a><em>, </em><a href="http://asq.org/quality-press/display-item/index.html?item=H1472" target="_blank">Statistics for Six Sigma Black Belts</a><em> and </em><a href="http://asq.org/quality-press/display-item/index.html?item=H1473&xvl=76115763" target="_blank">The ASQ Pocket Guide to Statistics for Six Sigma Black Belts</a><em>.</em></p>
</div>
<div style="clear:both;"> </div>
Fun StatisticsStatisticsThu, 17 Aug 2017 13:59:00 +0000http://blog.minitab.com/blog/statistics-in-the-field/flight-of-the-chickens-a-statistical-bedtime-story-part-2Guest BloggerFlight of the Chickens: A Statistical Bedtime Story, Part 1
http://blog.minitab.com/blog/statistics-in-the-field/flight-of-the-chickens-a-statistical-bedtime-story-part-1
<p><em>by Matthew Barsalou, guest blogger</em></p>
<p>Once upon a time, in the Kingdom of Wetzlar, there was a farm with over a thousand chickens, two pigs, and a cow. The chickens were well treated, but a few rabble-rousers among them got the rest of the chickens worked up. These trouble-making chickens <em>looked </em>almost like the other chickens, but in fact they were <em>evil </em>chickens. </p>
<img alt="chickens" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/99db35580e3f59363035593e45311e89/image001.jpg" style="width: 331px; height: 248px;" />
<p style="font-size: 9px; text-align: center;"><em>By HerbertT - Eigenproduktion, CC BY-SA 3.0, <a href="https://commons.wikimedia.org/w/index.php?curid=962579">https://commons.wikimedia.org/w/index.php?curid=962579</a></em></p>
<p>Hidden among the good chickens and the evil chickens was Sid. Sid was not like other chickens. He was a secret spy for The Swan of the Lahn, who ruled Wetzlar and was concerned about the infiltration of evil chickens. Sid was also a duck. That's right, a duck disguised as a chicken. Sid knew who the evil chickens were, and sent regular reports on their activities back to Wetzlar.</p>
<img alt="duck" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/29a7e9ed327615ff06708ca7d629e12b/image004.jpg" style="width: 273px; height: 182px;" />
<p style="font-size: 9px;">Mallard drake by <a href="https://commons.wikimedia.org/wiki/File:Mallard_drake_.02.jpg">Bert de tilly</a></p>
<p>One stormy and dark night, an evil chicken snuck out with an enormous basket of beautiful hand-painted eggs to throw at the two pigs and the cow. Sid snuck out into the pouring rain and took a sample of 18 of the eggs. The intrepid duck spy was familiar with a previous study of 157 eggs, which showed that the mean weight of those eggs was <a href="http://archive.org/stream/standarddeviatio195atwo/standarddeviatio195atwo_djvu.txt" target="_blank">57.079 grams</a> with a standard deviation of 2.30 grams. Sid was determined to find out whether the mean weight of his sample differed significantly from the mean of the previous study.</p>
<img alt="https://upload.wikimedia.org/wikipedia/commons/thumb/1/10/Easter_eggs_-_straw_decoration.jpg/1024px-Easter_eggs_-_straw_decoration.jpg" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ea1fe788c97d1f12d8f000fc568f0ad9/image005.jpg" style="text-align: -webkit-center; border-width: 0px; border-style: solid; width: 306px; height: 229px;" />
<p style="font-size: 9px;">By Jan Kameníček - Own work, Public Domain, <a href="https://commons.wikimedia.org/w/index.php?curid=732984" target="_blank">https://commons.wikimedia.org/w/index.php?curid=732984</a></p>
<p>If you'd like to recreate Sid's analysis, download his <a href="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a84e66ffcbe9f854a779cdacfc914915/flightofthechickens.mtw">data set</a> and, if you need it, the <a href="http://www.minitab.com/products/minitab/free-trial">free trial of Minitab</a> 18 Statistical Software. We will need to use summarized data, since we have the actual values only for Sid's sample and not the full data set from the previous study. Go to <strong>Stat > Basic Statistics > Display Descriptive Statistics...</strong> and select the column containing the data as the Variable. Click on Graphs and select Individual value plot to view a graph of the data.</p>
<p style="margin-left: 40px;"><img alt="descriptive statistics dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f67868a217eeb27025975888d27ff118/descro_tove_doa_pg.png" style="width: 527px; height: 354px;" /></p>
<p>Click OK twice and Minitab will create an individual value plot of the data and the mean and standard deviation will appear in the session window with the rest of the descriptive statistics.</p>
<p align="center"><img alt="individual value plot of eggs" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/72d17166ab9bf6fb9bd400f1fa864d02/image008.png" style="border-width: 0px; border-style: solid; width: 576px; height: 384px;" /></p>
<p> </p>
<p align="center"><img alt="Descriptive Statistics" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b9ef39d053dfd80c026f6fdfafcb3193/descriptive_statistics_eggs.png" style="border-width: 0px; border-style: solid; width: 646px; height: 151px;" /></p>
<p> </p>
<p>We can see that the sample mean is 57.315 and the standard deviation is 2.439, so now we can perform a 2-sample t-test to compare the means by going to <strong>Stat > Basic Statistics > 2-Sample t... </strong>and selecting Summarized data in the drop-down menu. Enter the sample size of 18, sample mean of 57.315, and standard deviation of 2.439 under Sample 1, and enter the sample size of 157, mean of 57.079, and standard deviation of 2.30 from the previous study under Sample 2.</p>
<p style="margin-left: 40px;"><img alt="two-sample t test dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/23cbaedfab65fc90c48259b8dbbad0e0/2_sample_t_dialog.png" style="width: 424px; height: 296px;" /></p>
<p>Then click OK.</p>
<p style="margin-left: 40px;"><img alt="t-test output" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dfaecdaefcbd971827c75e8ad15ebd7f/t_test_output_2.png" style="border-width: 0px; border-style: solid; width: 327px; height: 555px;" /></p>
<p>The p-value is greater than 0.05, so we cannot conclude that there is a statistically significant difference between the mean weight of the eggs the evil chickens planned to throw and the mean weight of the eggs in the previous study.</p>
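<p>For readers without Minitab, the same comparison can be sketched from the summary statistics alone. The Python below is a minimal illustration, not Minitab's implementation: it assumes the unequal-variance (Welch) form of the 2-sample t-test, the function names are hypothetical, and the p-value is approximated by numerically integrating the t density.</p>

```python
import math

def welch_t_from_stats(n1, mean1, sd1, n2, mean2, sd2):
    """Return (t, df) for a 2-sample t-test without assuming equal variances."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Approximate two-sided p-value with Simpson's rule on [0, |t|]."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * central

# Summary statistics quoted in the article: Sid's sample vs. the previous study
t_stat, dof = welch_t_from_stats(18, 57.315, 2.439, 157, 57.079, 2.30)
p_value = two_sided_p(t_stat, dof)
print(f"t = {t_stat:.3f}, df = {dof:.1f}, p = {p_value:.3f}")
```

<p>The t statistic is small relative to its standard error, so the approximate p-value lands well above 0.05, in line with the conclusion drawn from the Minitab output.</p>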
<p>Unfortunately, Sid made a critical mistake. The first step in an analysis is to ask the right question. Sid's statistics were correct, but he asked the wrong question: “Is the mean of the second sample different from the mean of the first sample with an alpha of 0.05?” </p>
<p>What he <em>should </em>have asked was, “What will happen when the pigs and the cow get hit by eggs?” The weight of the eggs was irrelevant; what mattered was the consequences of the pigs and cow being pummeled with eggs.</p>
<p>If Sid had prepared a report for The Swan of the Lahn that only said the eggs collected by the evil chickens weighed the same as eggs in the earlier study, the Swan would conclude that the process had not changed. But had the right question been answered, the correct conclusion would have been, “Trouble may be brewing.”</p>
<p><img alt="Swan" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3c61f8f1ca1091cd188b9716190ed656/image013.jpg" style="text-align: -webkit-center; border-width: 0px; border-style: solid; width: 399px; height: 301px;" /></p>
<p><span style="text-align: -webkit-center;">By Dick Daniels (http://carolinabirds.org/) - Own work, CC BY-SA 3.0, </span><a href="https://commons.wikimedia.org/w/index.php?curid=11053305" style="text-align: -webkit-center;">https://commons.wikimedia.org/w/index.php?curid=11053305</a></p>
<p>Trouble did indeed result when the evil chickens put their egg-throwing plan into action. As darkness fell, first the cow and then the pigs were bombarded by egg after messy egg.</p>
<p>The cow simply ate the eggs. But the pigs, holding <em>all </em>the chickens to be responsible, were outraged. They rampaged and terrorized the poor chickens all that night. By midnight, the muddy fields were full of pig prints and feathers were ruffled in the chicken coop. </p>
<p>One of the evil chickens seized on the traumatized crowd's passions, and demanded of the others, “How can we live like this?” The evil chickens soon convinced the others that they would all be happier if they moved to the high-walled village of Wetzlar beside the Lahn River. The chickens began to march into the stormy night.</p>
<p><em><a href="http://blog.minitab.com/blog/statistics-in-the-field/flight-of-the-chickens-a-statistical-bedtime-story-part-2">Continued in Part 2</a></em></p>
<p> </p>
<p><strong>About the Guest Blogger</strong></p>
<div>
<p><em><a href="https://www.linkedin.com/pub/matthew-barsalou/5b/539/198" target="_blank">Matthew Barsalou</a> is a statistical problem resolution Master Black Belt at <a href="http://www.3k-warner.de/" target="_blank">BorgWarner</a> Turbo Systems Engineering GmbH. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with emphasis in international business, and has a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is author of the books </em><a href="http://www.amazon.com/Root-Cause-Analysis-Step---Step/dp/148225879X/ref=sr_1_1?ie=UTF8&qid=1416937278&sr=8-1&keywords=Root+Cause+Analysis%3A+A+Step-By-Step+Guide+to+Using+the+Right+Tool+at+the+Right+Time" target="_blank">Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time</a><em>, </em><a href="http://asq.org/quality-press/display-item/index.html?item=H1472" target="_blank">Statistics for Six Sigma Black Belts</a><em> and </em><a href="http://asq.org/quality-press/display-item/index.html?item=H1473&xvl=76115763" target="_blank">The ASQ Pocket Guide to Statistics for Six Sigma Black Belts</a><em>.</em></p>
</div>
<div style="clear:both;"> </div>
Fun StatisticsStatisticsStatistics HelpTue, 15 Aug 2017 13:59:00 +0000http://blog.minitab.com/blog/statistics-in-the-field/flight-of-the-chickens-a-statistical-bedtime-story-part-1Guest Blogger5 More Critical Six Sigma Tools: A Quick Guide
http://blog.minitab.com/blog/understanding-statistics/5-more-critical-six-sigma-tools-a-quick-guide
<p>The Six Sigma quality improvement methodology has lasted for decades because it gets results. Companies in every country around the world, and in every industry, have used this logical, step-by-step method to improve the quality of their processes, products, and services. And they've saved billions of dollars along the way.</p>
<p>However, Six Sigma involves a good deal of statistics and data analysis, which makes many people uneasy. Individuals who are new to quality improvement often feel intimidated by the statistical aspects.</p>
<p>Don't be intimidated. Data analysis may be a critical component of improving quality, but the good news is that most of the analyses we use in Six Sigma aren't hard to understand, even if statistics isn't something you're comfortable with.</p>
<p>Just getting familiar with the tools used in Six Sigma is a good way to get started on your quality journey. In my last post, I offered a rundown of 5 tools that crop up in most Six Sigma projects. In this post, I'll review 5 more common statistical tools, and explain what they do and why they’re important in Six Sigma.</p>
1. t-Tests
<p><img alt="t-Tests" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9836f7ec0e12d309f6a3472557a5f424/5_more_six_sigma_tools_t_tests.jpg" style="width: 600px; height: 395px;" /></p>
<p>We use t-tests to compare the average of a sample to a target value, or to the average of another sample. For example, a company that sells beverages in 16-oz. containers can use a 1-sample t-test to determine if the production line’s average fill is on or off target. If you buy flavored syrup from two suppliers and want to determine if there’s a difference in the average volume of their respective shipments, you can use a 2-sample t-test to compare the two suppliers. </p>
2. ANOVA
<p><img alt="ANOVA" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/56cc203b4012c25d4fa4e28fc96787f3/5_more_six_sigma_tools_anova.jpg" style="width: 600px; height: 395px;" /></p>
<p>Where t-tests compare a mean to a target, or two means to each other, ANOVA—which is short for Analysis of Variance—lets you compare more than two means. For example, ANOVA can show you if average production volumes across 3 shifts are equal. You can also use ANOVA to analyze means for more than 1 variable. For example, you can simultaneously compare the means for 3 shifts and the means for 2 manufacturing locations. </p>
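<p>As a rough illustration of what one-way ANOVA computes, the sketch below builds the F statistic from between-group and within-group variation. The shift data and the function name are hypothetical, made up purely for the example.</p>

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical production volumes for 3 shifts
shifts = [[10, 12, 11], [14, 15, 16], [20, 21, 19]]
f_stat, df_b, df_w = one_way_anova_f(shifts)
print(f"F = {f_stat:.2f} on ({df_b}, {df_w}) degrees of freedom")
```

<p>A large F statistic means the shift means differ by more than the variation within shifts would explain; statistical software then converts F into a p-value.</p>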
3. Regression
<p><img alt="Regression" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/54e06038732315d016e9703a866d74f0/5_more_six_sigma_tools_regression.jpg" style="width: 600px; height: 395px;" /></p>
<p>Regression helps you determine whether there's a relationship between an output and one or more input factors. For instance, you can use regression to examine if there is a relationship between a company’s marketing expenditures and its sales revenue. When a relationship between the variables exists, you can use the regression equation to describe that relationship and predict future output values for given input values.</p>
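<p>For a single input, the regression equation can be sketched with ordinary least squares. The marketing and sales figures below are hypothetical, as is the function name; the point is only to show how a fitted line turns an input into a predicted output.</p>

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: marketing spend (x) and sales revenue (y), same units
spend = [1, 2, 3, 4, 5]
revenue = [2.1, 3.9, 6.2, 7.8, 10.0]
intercept, slope = fit_line(spend, revenue)

# Predict revenue for a new spend value using the fitted equation
predicted = intercept + slope * 6
print(f"revenue = {intercept:.2f} + {slope:.2f} * spend; prediction at 6: {predicted:.2f}")
```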
4. DOE (Design of Experiments)
<p><img alt="DOE" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/558c592fd82aafe591c2d087d49bfa4c/5_more_six_sigma_tools_doe.jpg" style="width: 600px; height: 395px;" /><br />
Regression and ANOVA are most often used for data that’s already been collected. In contrast, Design of Experiments (DOE) gives you an efficient strategy for collecting your data. It permits you to change or adjust multiple factors simultaneously to identify if relationships exist between inputs and outputs. Once you collect the data and identify the important inputs, you can then use DOE to determine the optimal settings for each factor. </p>
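<p>To make the idea of changing several factors at once concrete, here is a minimal sketch of a two-level full factorial design. The factor names and levels are hypothetical, and real designs usually also randomize run order and may use fractional designs to save runs.</p>

```python
from itertools import product

# Hypothetical factors, each with a low and a high setting
factors = {
    "temperature": (150, 180),
    "pressure": (30, 50),
    "time": (10, 20),
}

# Every combination of levels: 2 x 2 x 2 = 8 experimental runs
runs = [dict(zip(factors, levels)) for levels in product(*factors.values())]

for number, run in enumerate(runs, start=1):
    print(number, run)
```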
5. Control Charts
<p><img alt="Control Charts" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e7cd9d3ebc70528d9c617d8b3980be8f/5_more_six_sigma_tools_control_charts.jpg" style="width: 600px; height: 395px;" /></p>
<p>Every process has some natural, inherent variation, but a stable (and therefore predictable) process is a hallmark of quality products and services. It's important to know when a process goes beyond the normal, natural variation, because it can indicate a problem that needs to be resolved. A control chart distinguishes “special-cause” variation from acceptable, natural variation. These charts graph data over time and flag out-of-control data points, so you can detect unusual variability and take action when necessary. Control charts also help you ensure that you sustain process improvements into the future. </p>
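<p>One common control chart, the individuals (I) chart, can be sketched as follows. The measurements are hypothetical, and the sketch applies only the basic "point beyond the control limits" rule, using the standard moving-range constant 2.66; statistical software typically offers additional tests.</p>

```python
def individuals_chart_limits(values):
    """Return (lcl, center, ucl) for an individuals chart."""
    center = sum(values) / len(values)
    # Average moving range between consecutive observations
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    # 2.66 = 3 / d2, with d2 = 1.128 for moving ranges of size 2
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

# Hypothetical measurements collected over time
measurements = [10, 12, 11, 13, 12, 11, 10, 12, 25, 11]
lcl, center, ucl = individuals_chart_limits(measurements)

# Flag the positions of points outside the control limits
flagged = [i for i, x in enumerate(measurements) if x < lcl or x > ucl]
print(f"LCL = {lcl:.2f}, center = {center:.2f}, UCL = {ucl:.2f}, flagged = {flagged}")
```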
<p><strong>Conclusion</strong></p>
<p>Any organization can benefit from Six Sigma projects, and those benefits are based on data analysis. Many Six Sigma projects are completed by practitioners who are highly skilled but not expert statisticians. Even so, a basic understanding of common Six Sigma statistics, combined with easy-to-use statistical software, will let you handle these statistical tasks and analyze your data with confidence. </p>
Lean Six SigmaSix SigmaThu, 10 Aug 2017 13:58:00 +0000http://blog.minitab.com/blog/understanding-statistics/5-more-critical-six-sigma-tools-a-quick-guideEston Martz5 Critical Six Sigma Tools: A Quick Guide
http://blog.minitab.com/blog/understanding-statistics/5-critical-six-sigma-tools-a-quick-guide
<p>Six Sigma is a quality improvement method that businesses have used for decades—because it gets results. A Six Sigma project follows a clearly defined series of steps, and companies in every industry in every country around the world have used this method to resolve problems. Along the way, they've saved billions of dollars.</p>
<p>But Six Sigma relies heavily on statistics and data analysis, and many people new to quality improvement feel intimidated by the statistical aspects.</p>
<p>You needn't be intimidated. While it's true that data analysis is critical in improving quality, the majority of analyses in Six Sigma are not hard to understand, even if you’re not very knowledgeable about statistics.</p>
<p>Familiarizing yourself with these tools is a great place to start. This post briefly explains 5 statistical tools used in Six Sigma, what they do, and why they’re important.</p>
1. Pareto Chart
<p><img alt="Pareto Chart" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/014b0ef2847e14b49bd9d18adeb9b309/5_six_sigma_tools_pareto.jpg" style="width: 600px; height: 395px;" /></p>
<p>The Pareto Chart stems from an idea called the Pareto Principle, which asserts that about 80% of outcomes result from 20% of the causes. It's easy to think of examples even in our personal lives. For instance, you may wear 20% of your clothes 80% of the time, or listen to 20% of the music in your library 80% of the time.</p>
<p>The Pareto chart helps you visualize how this principle applies to data you've collected. It is a specialized type of bar chart designed to distinguish the “critical few” causes from the “trivial many” enabling you to focus on the most important issues. For example, if you collect data about defect types each time one occurs, a Pareto chart reveals which types are most frequent, so you can focus energy on solving the most pressing problems. </p>
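<p>The numbers behind a Pareto chart are simple to compute: sort the categories by frequency and accumulate their percentages. The defect counts below are hypothetical and serve only to show the calculation.</p>

```python
# Hypothetical defect counts by type
defects = {"Scratch": 45, "Dent": 30, "Misalignment": 15, "Stain": 7, "Other": 3}

total = sum(defects.values())
ordered = sorted(defects.items(), key=lambda item: item[1], reverse=True)

cumulative_percent = []
running = 0
for name, count in ordered:
    running += count
    cumulative_percent.append(100 * running / total)
    print(f"{name:<14}{count:>4}{cumulative_percent[-1]:>8.1f}%")
```

<p>In this made-up example, the two most frequent defect types account for 75% of all defects, which is where improvement effort would go first.</p>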
2. Histogram
<p><img alt="Histogram" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2bb10c9c739b156c8753d81b2a63cc16/5_six_sigma_tools_histogram.jpg" style="width: 600px; height: 395px;" /></p>
<p>A histogram is a graphical snapshot of numeric, continuous data. Histograms enable you to quickly identify the center and spread of your data. They show you where most of the data fall, as well as the minimum and maximum values. A histogram also reveals if your data are bell-shaped or not, and can help you find unusual data points and outliers that may need further investigation. </p>
3. Gage R&R
<p><img alt="gage R&R" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/53a58036bcacc1abbbe345a171bd3cc8/5_six_sigma_tools_gage.jpg" style="width: 600px; height: 444px;" /></p>
<p>Accurate measurements are critical. Would you want to weigh yourself with a scale you know is unreliable? Would you keep using a thermometer that never shows the right temperature? If you can't measure a process accurately, you can't improve it, which is where <span><a href="http://blog.minitab.com/blog/meredith-griffith/fundamentals-of-gage-rr">Gage R&R</a></span> comes in. This tool helps you determine if your continuous numeric measurements, such as weight, diameter, and pressure, are both repeatable (the same person gets consistent results when measuring the same part repeatedly) and reproducible (different operators get consistent results when measuring the same part).</p>
4. Attribute Agreement Analysis
<p><img alt="Attribute" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a248d1a75f744990aea0ce8414219166/5_six_sigma_tools_attribute.jpg" style="width: 600px; height: 395px;" /><br />
Another tool for making sure you can trust your data is attribute agreement analysis. Where Gage R&R assesses the repeatability and reproducibility of numeric measurements, attribute agreement analysis assesses categorical ratings, such as Pass or Fail. This tool shows whether people rating these categories agree with a known standard, with other appraisers, and with themselves. </p>
5. Process Capability
<p><img alt="Capability" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/155f21ef5bb7ddf08d0af4cce5425340/5_six_sigma_tools_capability.jpg" style="width: 600px; height: 444px;" /></p>
<p>Nearly every process has an acceptable lower and/or upper bound. For example, a supplier's parts can’t be too large or too small, wait times can’t extend beyond an acceptable threshold, and fill weights need to exceed a specified minimum. Capability analysis shows you how well your process meets specifications and provides insight into how you can improve a poor process. Frequently cited capability metrics include Cpk, Ppk, defects per million opportunities (DPMO), and Sigma level. </p>
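<p>As a small worked example, Cpk compares the distance from the process mean to the nearer specification limit with three standard deviations. The process figures below are hypothetical; in practice, Cpk uses the within-subgroup standard deviation while Ppk uses the overall standard deviation.</p>

```python
def cpk(mean, sd, lsl, usl):
    """Capability index: distance to the nearer spec limit in units of 3 sd."""
    upper = (usl - mean) / (3 * sd)
    lower = (mean - lsl) / (3 * sd)
    return min(upper, lower)

# Hypothetical process: mean 10.1, standard deviation 0.2, specs 9.4 to 10.6
index = cpk(mean=10.1, sd=0.2, lsl=9.4, usl=10.6)
print(f"Cpk = {index:.2f}")
```

<p>Here the mean sits closer to the upper limit, so the upper side determines the index. Many organizations treat a Cpk below about 1.33 as a sign the process needs improvement, though the threshold is a convention rather than a rule.</p>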
Conclusion
<p>Six Sigma can bring significant benefits to any business, but reaping those benefits requires the collection and analysis of data so you can understand opportunities for improvement and make significant and sustainable changes.</p>
<p>The success of Six Sigma projects often depends on practitioners who are highly skilled experts in many fields, but not statistics. But with a basic understanding of the most commonly used Six Sigma statistics and easy-to-use statistical software, you can handle the statistical tasks associated with improving quality, and analyze your data with confidence. </p>
<p> </p>
<p> </p>
Lean Six SigmaSix SigmaTue, 08 Aug 2017 13:58:00 +0000http://blog.minitab.com/blog/understanding-statistics/5-critical-six-sigma-tools-a-quick-guideEston MartzHow to Estimate the Probability of a No-Show using Binary Logistic Regression
http://blog.minitab.com/blog/using-data-and-statistics/how-to-estimate-the-probability-of-a-no-show-using-binary-logistic-regression
<p>In April 2017, overbooking of flight seats hit the headlines when a United Airlines customer was dragged off a flight. A TED talk by Nina Klietsch gives a good, but simplistic explanation of why overbooking is so attractive to airlines.</p>
<p>Overbooking is not new to the airlines; these strategies were officially sanctioned by The American Civil Aeronautics Board in 1965, and since that time complex statistical models have been researched and developed to set the ticket pricing and overbooking strategies to deliver maximum revenue to the airlines. </p>
<img alt="airline travel" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dd1b2a0560ece3771b8dce0b31ca7b4f/airline_passenger.png" style="width: 300px; height: 198px; margin: 10px 15px; float: right;" />
<div>
<p>In this blog, I would like to look at one aspect of this: the probability of a no-show. In Klietsch’s talk, she assumed that the probability of a no-show (a customer not turning up for a flight) is identical for all customers. In reality, this is not true—factors such as time of day, price, time since booking, and whether a traveler is alone or in a group will impact the probability of a no show.</p>
<div>By using this information about our customers, we can predict the probability of a no-show using <a href="http://blog.minitab.com/blog/marilyn-wheatleys-blog/coffee-or-tea-analyzing-categorical-data-with-minitab-v2">binary logistic regression</a>. This type of modeling is common to many services and industries. Some of the applications, in addition to predicting no-shows, include:</div>
<ul>
<li>Credit scores: What is the probability of default? </li>
<li>Marketing offers: What are the chances you'll buy a product based on a specific offer?</li>
<li>Quality: What is the probability of a part failing?</li>
<li>Human resources: What is the sickness absence rate likely to be? </li>
</ul>
<p>In all cases, your outcome (the event you are predicting) is discrete and can be split into two separate groups; for example, purchase/no purchase, pass/fail, or show/no show. Using the characteristics of your customers or parts as predictors you can use this modeling technique to predict the outcome.</p>
<p><img alt="cereal purchase worksheet" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/be467ae943684f0a147b4995f4b1d1ea/cereal_purchase_data.png" style="margin: 10px 15px; float: right; border-width: 1px; border-style: solid;" /> Let’s look at an example. I was unable to find any airline data, so I am illustrating this with one of our Minitab sample data sets, <a href="http://support.minitab.com/datasets/regression-data-sets/cereal-purchases/">Cerealpurchase.mtw</a>.</p>
<p>In this example, a food company surveys consumers to measure the effectiveness of their television ad in getting viewers to buy their cereal. The Bought column has the value 1 if the respondent purchased the cereal, and the value 0 if not. In addition to asking if respondents have seen the ad, the survey also gathers data on the household income and the number of children, which the company also believes might influence the purchase of this cereal.</p>
<p>Using <strong>Stat > Regression > Binary Logistic Regression</strong>,<strong> </strong>I entered the details of the response I wanted to predict, <strong>Bought,</strong> and the value in the Response Event which indicated a purchase. I then entered the Continuous predictor, <strong>Income </strong>and the Categorical predictors <strong>Children </strong>and <strong>ViewAd. </strong>My completed dialog box looks like this: </p>
<p style="margin-left: 40px;"><img alt="binary logistic regression dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a32342393b5bc4ae7c684dcced3674fa/binary_dialog.png" style="width: 574px; height: 314px; border-width: 1px; border-style: solid;" /></p>
<p>After pressing OK, Minitab performs the analysis and displays the results in the Session window. From this table at the top of the output I can see that the researchers surveyed a sample of 71 customers, of which 22 purchased the cereal.</p>
<p style="margin-left: 40px;"><img alt="response information" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/5be946269b2221d79fd670b0bd0dc99d/binary_output_1.png" style="width: 312px; height: 165px;" /></p>
<p>With Logistic regression, the output features a Deviance Table instead of an Analysis of Variance Table. The calculations and test statistics used with this type of data are different, but we still use the P-value on the far right to determine which factors have an effect on our response.</p>
<p style="margin-left: 40px;"><img alt="deviance table" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c474f30c96355fab2a5d7a9228636b3e/deviance_table.png" style="width: 430px; height: 232px;" /></p>
<p>As we would when using other regression methods, we are going to reduce the model by eliminating non-significant terms one at a time. In this case, as highlighted above, Income is not significant. We can simply press Ctrl-E to recall the last dialog box, remove the Income term from the model, and rerun the analysis. Minitab returns the following results: </p>
<p style="margin-left: 40px;"><img alt="deviance table" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8e02f72a6d7627468824742132cabf8e/deviance_table_2.png" style="width: 441px; height: 224px;" /></p>
<p>After removing Income, we can see that both Children and ViewAd are significant at the 0.05 significance level. This could be good news for the Marketing Department, as it clearly indicates that viewing the ad did influence the decision to buy. However, from this table it is not possible to see if this effect is positive or negative.</p>
<p>To understand this, we need to look at another part of the output. In Binary Logistic Regression, we are trying to estimate the probability of an event. To do this we use the Odds Ratio, which compares the odds of two events by dividing the odds of success under condition A by the odds of success under condition B. </p>
<p style="margin-left: 40px;"><img alt="Odds Ratio" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/905f9b30cf4722086fd56cd3a84bf3b8/odds_ratio.png" style="width: 367px; height: 192px;" /></p>
<p>In this example, the Odds Ratio for Children tells us that the odds of purchasing the cereal for respondents who reported having children are 5.1628 times the odds for those who did not. The good news for the Marketing Department is that the odds of purchase for customers who viewed the ad were 3.0219 times the odds for those who did not. If the Odds Ratio were less than 1, we would conclude that seeing the advert reduces sales! </p>
<p><img alt="storage dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/4626c90540439bd12cc8d214dd927cb4/binary_logistic_regression_storage.png" style="margin: 10px 15px; float: right;" /> The other way to look at these results is to calculate the probability of purchase and analyse this. </p>
<p>It is easy to calculate the probability of a sale by clicking on the <strong>Storage </strong>button in the <strong>Binary Logistic Regression </strong>dialog box and checking the box labeled <strong>Fits (event probabilities)</strong>. This will store the probability of purchase in the worksheet.</p>
<p style="margin-left: 40px;"><img alt="data with stored fits" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/16a008b51b973ea4c1f39d6960d99c2d/data.png" style="width: 359px; height: 236px;" /></p>
<p>Using the fits data, we can produce a table summarizing the Probability of Purchase for all the combinations of Children and ViewAd, as follows:</p>
<p style="margin-left: 40px;"><img alt="tabulated statistics" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/7bf3901e06faf493312623f31e8deabe/tabulated_statistics.png" style="width: 416px; height: 371px;" /></p>
<p>In the rows we have the Children indicator, and in the columns we have the ViewAd indicator. In each cell the top number is the probability of cereal purchase, and the bottom number is the count of customers observed in each of the groups. </p>
<p>Based on this table, customers with children who have seen the ad have a 51% chance of purchase, whereas customers without children who have not seen the ad have a 6% chance of purchase.</p>
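<p>The link between the odds ratios and the probabilities in the table can be sketched in a few lines of Python. This is an illustration only, not Minitab's calculation: it assumes a model with just the Children and ViewAd terms, takes the rounded 6% baseline from the table as its starting point, and uses made-up function names, so the results are approximate.</p>

```python
# Values quoted in the article's output; the 6% baseline is rounded,
# so the probabilities computed here are approximate.
ODDS_RATIO_CHILDREN = 5.1628
ODDS_RATIO_VIEW_AD = 3.0219
BASELINE_PROBABILITY = 0.06  # no children, did not see the ad


def probability_to_odds(p):
    """Convert a probability into odds, p / (1 - p)."""
    return p / (1.0 - p)


def odds_to_probability(odds):
    """Convert odds back into a probability, odds / (1 + odds)."""
    return odds / (1.0 + odds)


def purchase_probability(has_children, viewed_ad):
    """Scale the baseline odds by each odds ratio that applies."""
    odds = probability_to_odds(BASELINE_PROBABILITY)
    if has_children:
        odds *= ODDS_RATIO_CHILDREN
    if viewed_ad:
        odds *= ODDS_RATIO_VIEW_AD
    return odds_to_probability(odds)


if __name__ == "__main__":
    for has_children in (False, True):
        for viewed_ad in (False, True):
            p = purchase_probability(has_children, viewed_ad)
            print(f"children={has_children!s:5} viewed_ad={viewed_ad!s:5} p={p:.2f}")
```

<p>Because odds ratios multiply odds rather than probabilities, the probability for customers with children who saw the ad comes out at roughly one half, not 5.1628 × 3.0219 times the baseline probability.</p>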
<p>Now let's bring this back to our airline example. Using the information about their customers' demographics and flight preferences, an airline can use binary logistic regression to estimate the probabilities of a “no-show” for a whole plane and then determine by how much they should overbook seats. Of course, no model is perfect, and as we saw with United, getting it wrong can have severe consequences. </p>
</div>
Regression Analysis
Thu, 03 Aug 2017 13:57:00 +0000
http://blog.minitab.com/blog/using-data-and-statistics/how-to-estimate-the-probability-of-a-no-show-using-binary-logistic-regression
Gillian Groom
3 Keys to Getting Reliable Data
http://blog.minitab.com/blog/understanding-statistics/3-keys-to-getting-reliable-data
<p><em>Can you trust your data? </em></p>
<p><img alt="disk" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6e0d1a81bdbaadb7aef1d687501264d4/disk.png" style="width: 250px; height: 261px; float: right; margin: 10px 15px;" />That's the very first question we need to ask when we perform a statistical analysis. If the data's no good, it doesn't matter what statistical methods we employ, nor how much expertise we have in analyzing data. If we start with bad data, we'll end up with unreliable results. <em>Garbage in, garbage out, </em>as they say.</p>
<p>So, <em>can </em>you trust your data? Are you positive? Because, let's admit it, many of us forget to ask that question altogether, or respond too quickly and confidently.</p>
<p>You can’t just assume you have good data; you need to <em>know </em>you do. That may require a little bit more work up front, but the energy you spend getting good data will pay off in the form of better decisions and bigger improvements.</p>
<p>Here are 3 critical actions you can take to maximize your chance of getting data that will lead to correct conclusions. </p>
1: Plan How, When, and What to Measure—and Who Will Do It
<p>Failing to plan is a great way to get unreliable data. That’s because a solid plan is the key to successful data collection. Asking why you’re gathering data at the very start of a project will help you pinpoint the data you really need. A data collection plan should clarify:</p>
<ul>
<li>What data will be collected</li>
<li>Who will collect it</li>
<li>When it will be collected</li>
<li>Where it will be collected</li>
<li>How it will be collected</li>
</ul>
<p>Answering these questions in advance will put you well on your way to getting meaningful data.</p>
2: Test Your Measurement System
<p>Many quality improvement projects require measurement data for factors like weight, diameter, or length and width. Not verifying the accuracy of your measurements practically guarantees that your data—and thus your results—are not reliable.</p>
<p>A branch of statistics called <span><a href="http://blog.minitab.com/blog/adventures-in-statistics-2/three-measurement-system-analysis-questions-to-ask-before-you-take-a-single-measurement">Measurement System Analysis</a></span> lets you quickly assess and improve your measurement system so you can be sure you’re collecting data that is accurate and precise.</p>
<p>When gathering quantitative data, Gage Repeatability and Reproducibility (R&R) analysis confirms that instruments and operators are measuring parts consistently.</p>
<p>If you’re grading parts or identifying defects, an Attribute Agreement Analysis verifies that different evaluators are making judgments consistent with each other and with established standards.</p>
<p>If you do not examine your measurement system, you’re much more likely to add variation and inconsistency to your data that can wind up clouding your analysis.</p>
3: Beware of Confounding or Lurking Variables
<p>As you collect data, be careful to avoid introducing unintended and unaccounted-for variables. These “lurking” variables can make even the most carefully collected data unreliable—and such hidden factors often are insidiously difficult to detect.</p>
<p>A well-known example involves World War II-era bombing runs. Analysis showed that accuracy increased when bombers encountered enemy fighters, confounding all expectations. But a key variable hadn’t been factored in: weather conditions. On cloudy days, accuracy was terrible because the bombers couldn’t spot landmarks, and the enemy didn’t bother scrambling fighters.</p>
<p>Suppose that data for your company’s key product shows a much larger defect rate for items made by the second shift than items made by the first.</p>
<p style="margin-left: 40px;"><img alt="defects per shift" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1eceb685455e3bed234c0b635ef60dc9/defects_per_shift.jpg" style="width: 576px; height: 377px;" /></p>
<p>Given only this information, your boss might suggest a training program for the second shift, or perhaps even more drastic action.</p>
<p>But could something else be going on? Your raw materials come from three different suppliers.</p>
<p>What does the defect rate data look like if you include the supplier along with the shift?</p>
<p style="margin-left: 40px;"><img alt="Defects per Shift per Supplier" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ef2c2d3749d20b117106b4c122388846/defects_per_shift_with_supplier.jpg" style="width: 544px; height: 383px;" /></p>
<p>Now you can see that defect rates for both shifts are higher when using supplier 2’s materials. Not accounting for this confounding factor almost led to an expensive “solution” that probably would do little to reduce the overall defect rate.</p>
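<p>The same check is easy to sketch outside Minitab. The Python below uses invented production counts (they are not the data behind the charts above) and a hypothetical helper, <code>defect_rates</code>, to compute the defect rate first by shift alone and then by shift and supplier together.</p>

```python
from collections import defaultdict

# Hypothetical production records: (shift, supplier, units made, defects).
RECORDS = [
    ("first", "supplier 1", 400, 8),
    ("first", "supplier 2", 100, 10),
    ("first", "supplier 3", 500, 10),
    ("second", "supplier 1", 100, 2),
    ("second", "supplier 2", 800, 80),
    ("second", "supplier 3", 100, 2),
]


def defect_rates(records, key):
    """Return the defect rate for each group produced by key(record)."""
    units = defaultdict(int)
    defects = defaultdict(int)
    for record in records:
        group = key(record)
        units[group] += record[2]
        defects[group] += record[3]
    return {group: defects[group] / units[group] for group in units}


# Grouping by shift alone hides the supplier effect.
by_shift = defect_rates(RECORDS, key=lambda r: r[0])
# Grouping by shift and supplier exposes it.
by_shift_and_supplier = defect_rates(RECORDS, key=lambda r: (r[0], r[1]))

if __name__ == "__main__":
    for group, rate in by_shift.items():
        print(group, f"{rate:.1%}")
    for group, rate in by_shift_and_supplier.items():
        print(group, f"{rate:.1%}")
```

<p>In this made-up data the second shift looks worse overall only because it uses far more of supplier 2’s material; within each supplier, the two shifts have identical defect rates.</p>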
Take the Time to Get Data You Can Trust…
<p>Nobody sets out to waste time or sabotage their efforts by not collecting good data. But it’s all too easy to get problem data even when you’re being careful! When you collect data, be sure to spend the little bit of time it takes to make sure your data is truly trustworthy. </p>
Data Analysis, Quality Improvement, Statistics
Tue, 01 Aug 2017 13:56:00 +0000
http://blog.minitab.com/blog/understanding-statistics/3-keys-to-getting-reliable-data
Eston Martz
Sunny Day for A Statistician and A Householder – An Update
http://blog.minitab.com/blog/using-data-and-statistics/sunny-day-for-a-statistician-and-a-householder%E2%80%93an-update
<p>We had solar panels fitted on our property in 2011. Last year, we had a few problems with the equipment. It was shutting down at various times throughout the day, typically when it was very sunny, resulting in no electricity being generated.</p>
<img alt="solar panels" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0ee09d62f414b4bd79601d23995458bf/solar.jpg" style="width: 400px; height: 267px; margin: 10px 15px; float: right;" />
<div>
<div>
<p>In summer 2016, I completed a statistical analysis in Minitab to confirm my suspicions that my solar panels were not working as well as they did when they were installed. Details of this first analysis can be found <a href="http://blog.minitab.com/blog/using-data-and-statistics/sunny-day-for-a-statistician-vs-dark-day-for-a-householder-with-solar-panels">here.</a></p>
<p>After completing this analysis, I spoke to the company that manufactured our inverter, and they identified a problem. On the 15th of July 2016, an engineer set up our inverter (the equipment that converts the solar energy from DC to AC) with the correct settings.</p>
<p>I now have the data for the months Jan–June 2017, which I am going to compare to the first six months for the years 2012–2016 to see if this fix has solved the problem.</p>
<p>I am going to use the same analysis as last year, the one-way analysis of variance via the Assistant: <strong>Assistant > Hypothesis Tests > One-Way ANOVA</strong>.</p>
<p>The updated descriptive results were as follows:</p>
<p><img alt="solar energy output from Minitab 18" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0c9657d141afd560725c5ac35f758e4e/solar_1.png" style="width: 798px; height: 224px;" /></p>
<p>Just looking at the summary statistics above, I can clearly see that the average electric units generated per day for the first six months of 2017, at 8.13, is higher than the 5.69 generated per day in 2016. </p>
<p>In my analysis of last year’s data, I found that 2016 was significantly worse than the previous years. Using the results from this latest one-way ANOVA, and reviewing the Means Comparison Chart, shown below, I am hoping to see that 2017’s performance is as good as the years 2012–2015, when there were no problems with the solar-generating equipment. </p>
<p><img alt="solar energy means comparison chart" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/afebf341c3ca0eede9efd18393eb236a/solar_2.png" style="width: 561px; height: 507px;" /></p>
<p>The great news is that the engineer’s fix has worked: the amount of electricity generated in the first six months of 2017 is significantly better than in 2016 and not significantly different to that generated in the years 2012–2015. </p>
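<p>For readers curious about what sits underneath a one-way ANOVA, here is a minimal Python sketch of the F statistic, the ratio of between-group to within-group variation. The daily outputs are invented for illustration and are not my solar data, the function name is my own, and the sketch stops short of the p-value and the pairwise comparisons that the Assistant reports.</p>

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


# Hypothetical daily output (units per day), NOT the author's data.
output_by_year = {
    "2015": [8.4, 7.9, 8.6, 8.1],
    "2016": [5.9, 5.4, 6.0, 5.5],
    "2017": [8.0, 8.3, 7.8, 8.4],
}

if __name__ == "__main__":
    f_stat, df_b, df_w = one_way_anova_f(list(output_by_year.values()))
    print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

<p>A large F value says the yearly means differ by more than the day-to-day scatter would explain; deciding which years differ still needs a follow-up comparison such as the Means Comparison Chart.</p>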
<p>A sunny day for a householder and a statistician!</p>
</div>
</div>
ANOVA, Data Analysis, Statistics
Thu, 27 Jul 2017 12:00:00 +0000
http://blog.minitab.com/blog/using-data-and-statistics/sunny-day-for-a-statistician-and-a-householder%E2%80%93an-update
Gillian Groom
How to Eliminate False Alarms on P and U Control Charts
http://blog.minitab.com/blog/understanding-statistics/how-to-eliminate-false-alarms-on-p-and-u-control-charts
<p>All processes have variation, some of which is inherent in the process, and isn't a reason for concern. But when processes show unusual variation, it may indicate a change or a "special cause" that requires your attention. </p>
<p>Control charts are the primary tool quality practitioners use to detect special cause variation and distinguish it from natural, inherent process variation. These charts graph process data against an upper and a lower control limit. To put it simply, when a data point goes beyond the limits on the control chart, investigation is probably warranted.</p>
Traditional Control Charts and False Alarms
<p>It seems so straightforward. But veteran control chart users can tell you about “false alarms,” instances where data points went outside the control limits, and those limits fell close to the mean—even though the process <em>was </em>in statistical control.</p>
<p>The attribute charts, the <a href="http://blog.minitab.com/blog/understanding-statistics/what-control-chart-should-i-use">traditional P and U control charts</a> we use to monitor defectives and defects, are particularly prone to false alarms due to a phenomenon known as overdispersion. That problem had been known for decades before quality engineer David Laney solved it by devising P' and U' charts. </p>
<p>The P' and U' charts avoid false alarms so only important process deviations are detected. In contrast to the traditional charts, which assume a defective or defect rate remains constant, P' and U' charts assume that no process has a truly constant rate, and account for that when calculating control limits.</p>
<p>That's why the P' and U' charts deliver a more reliable indication of whether the process is really in control, or not.</p>
<p>Minitab's control chart capabilities include P' and U' charts, and the software also includes a diagnostic tool that identifies situations where you need to use them. When you choose the right chart, you can be confident any special-cause variation you're observing truly exists.</p>
The Cause of Control Chart False Alarms
<p>When you have too much variation, or “overdispersion,” in your process data, false alarms can result—especially with data collected in large subgroups. The larger the subgroups, the narrower the control limits on a traditional P or U chart. But those artificially tight control limits can make points on a traditional P chart appear out of control, even if they aren't.</p>
<p>However, too little variation, or “underdispersion,” in your process data also can lead to problems. Underdispersion can result in artificially wide control limits on a traditional P chart or U chart. Under that scenario, some points that appear to be in control could well be ones you <em>should </em>be concerned about.</p>
<p>If your data is affected by overdispersion or underdispersion, you need to use a P' or U' chart to reliably distinguish common-cause from special-cause variation. </p>
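<p>To see why large subgroups squeeze the limits, here is a minimal Python sketch of the standard 3-sigma limits for a traditional P chart, p̄ ± 3√(p̄(1 − p̄)/n). The overall proportion of 0.05 and the subgroup sizes are hypothetical values chosen only for illustration.</p>

```python
import math


def p_chart_limits(p_bar, subgroup_size):
    """Traditional 3-sigma P chart limits for one subgroup size."""
    sigma = math.sqrt(p_bar * (1.0 - p_bar) / subgroup_size)
    lower = max(0.0, p_bar - 3.0 * sigma)  # a proportion cannot go below 0
    upper = min(1.0, p_bar + 3.0 * sigma)  # or above 1
    return lower, upper


if __name__ == "__main__":
    p_bar = 0.05  # hypothetical overall proportion defective
    for n in (50, 500, 2500, 50000):
        lower, upper = p_chart_limits(p_bar, n)
        print(f"n={n:>6}: LCL={lower:.4f}  UCL={upper:.4f}  width={upper - lower:.4f}")
```

<p>Because the subgroup size sits under the square root in the denominator, the limits keep tightening as n grows, so even small real-world drift in the rate can push points outside them.</p>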
Detecting Overdispersion and Underdispersion
<p>If you aren't sure whether or not your process data has over- or underdispersion, the P Chart or U Chart Diagnostic in Minitab can test it and tell you if you need to use a Laney P' or U' chart.</p>
<p>Choose <strong>Stat > Control Charts > Attributes Charts > P Chart Diagnostic</strong> or <strong>Stat > Control Charts > Attributes Charts > U Chart Diagnostic</strong>. </p>
<p><img alt="P Chart Diagnostic" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ab78f289474fe548db3deb7fa8581075/diagnostic_menu.png" style="width:575px;height:323px;" /></p>
<p>The following dialog appears:</p>
<p style="margin-left: 40px;"><img alt="P chart diagnostic" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/464df821310229bb5a439ca94db4f151/p_diagnostic_dialog_en_1_.jpg" style="width:400px;height:265px;" /></p>
<p>Enter the worksheet column that contains the number of defectives under "Variables." If all of your samples were collected using the same subgroup size, enter that number in Subgroup sizes. Alternatively, identify the appropriate column if your subgroup sizes varied.</p>
<p>Let’s run this test on the <a href="https://support.minitab.com/datasets/control-charts-data-sets/defective-records-data/" target="_blank">DefectiveRecords.MTW</a> from Minitab's sample data sets. This data set features very large subgroups, each having an average of about 2,500 observations.</p>
<p>The diagnostic for the P chart gives the following output:</p>
<p><img alt="P Chart Diagnostic" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e9a9ae99c4c095e3af4ca390a8ebacd2/p_chart_diagnostic_for_defectives.png" style="width:576px;height:384px;" /></p>
<p>Below the plot, check out the ratio of observed to expected variation. If the ratio is greater than the 95% upper limit that appears below it, your data are affected by overdispersion. Underdispersion is a concern if the ratio is less than 60%. Either way, a Laney P' chart will be a more reliable option than the traditional P chart.</p>
Creating a P' Chart
<p>To make a P' chart, go to <strong>Stat > Control Charts > Attributes Charts > Laney P'</strong>. Minitab will generate the following chart for the DefectiveRecords.MTW data. </p>
<p><img alt="P' Chart of Defectives" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/67bf34d0435c1b131ddbe7a0ea5b26a6/laney_p____chart_of_defectives.png" style="width:576px;height:384px;" /></p>
<p>This P' chart shows a stable process with no out-of-control points. But create a traditional P chart with this data, and several of the subgroups appear to be out of control, thanks to the artificially narrow limits caused by the overdispersion. </p>
<p><img alt="P Chart of Defectives" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/361a23fa2f3d3553de39d5f211786302/p_chart_of_defectives.png" style="width:576px;height:384px;" /></p>
<p>So why do these data points appear to be out of control on the P but not the P' chart? It’s the way each defines and calculates process variation. The Laney P' chart control limits account for the overdispersion when calculating the variation and eliminate these false alarms.</p>
<p>The Laney P' chart calculations include within-subgroup variation as well as the variation <em>between </em>subgroups to adjust for overdispersion or underdispersion.</p>
<p>If over- or underdispersion is not a problem, the P' chart is comparable to a traditional P chart. But the P' chart expands the control limits where overdispersion exists, ensuring that only important deviations are identified as out of control. And in the case of underdispersion, the P' chart calculations result in narrower control limits.</p>
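<p>The adjustment can be sketched in Python as follows. This follows the commonly published form of Laney's method (standardize each subgroup proportion, estimate the between-subgroup variation from the average moving range of those z-values, then scale the binomial limits by it); it is not taken from Minitab's source, and the counts are invented rather than drawn from the sample data set.</p>

```python
import math


def laney_p_prime_limits(defectives, sizes):
    """Return (p_bar, sigma_z, limits) using Laney's P' adjustment."""
    p_bar = sum(defectives) / sum(sizes)
    # Within-subgroup (binomial) standard deviation for each subgroup.
    sigma_p = [math.sqrt(p_bar * (1.0 - p_bar) / n) for n in sizes]
    # Standardize each subgroup proportion.
    z = [(d / n - p_bar) / s for d, n, s in zip(defectives, sizes, sigma_p)]
    # Between-subgroup variation: average moving range of z divided by 1.128.
    moving_ranges = [abs(b - a) for a, b in zip(z, z[1:])]
    sigma_z = (sum(moving_ranges) / len(moving_ranges)) / 1.128
    limits = [
        (max(0.0, p_bar - 3.0 * s * sigma_z), min(1.0, p_bar + 3.0 * s * sigma_z))
        for s in sigma_p
    ]
    return p_bar, sigma_z, limits


# Hypothetical counts, not the values in Minitab's sample data set.
DEFECTIVES = [100, 150, 90, 140, 80, 125, 105, 130]
SIZES = [2500] * 8

if __name__ == "__main__":
    p_bar, sigma_z, limits = laney_p_prime_limits(DEFECTIVES, SIZES)
    print(f"p_bar={p_bar:.4f} sigma_z={sigma_z:.2f}")
    print(f"first subgroup limits: {limits[0][0]:.4f} to {limits[0][1]:.4f}")
```

<p>A sigma_z above 1 signals overdispersion and widens the limits; a value below 1 narrows them. With these made-up counts, two subgroups fall outside the traditional P chart limits but none fall outside the P' limits.</p>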
<p>To learn more about the statistical foundation underlying the Laney P' and U' charts, read <a href="http://www.minitab.com/en-us/published-articles/On-the-Charts--A-Conversation-with-David-Laney/">On the Charts: A Conversation with David Laney</a>.</p>
Control Charts
Tue, 25 Jul 2017 13:58:00 +0000
http://blog.minitab.com/blog/understanding-statistics/how-to-eliminate-false-alarms-on-p-and-u-control-charts
Eston Martz
Area Graphs: An Underutilized Tool
http://blog.minitab.com/blog/starting-out-with-statistical-software/area-graphs-an-underutilized-tool
<p>In my time at Minitab, I’ve gotten a good understanding of what types of graphs users create. Everyone knows about histograms, bar charts, and time series plots. Even relatively less familiar plots like the interval plot and <span><a href="http://blog.minitab.com/blog/understanding-statistics/trouble-starting-an-analysis-graph-your-data-with-an-individual-value-plot">individual value plot</a></span> are still used quite often. However, one of the most underutilized graphs we have available is the area graph. If you’re not familiar with an Area Graph, here’s the example from the Minitab help menu of what it looks like:</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/732ead34-1005-4470-b034-d7f8b87fabcf/Image/67c9fc3399dc4a8a5c72d2ace452db62/areagraph1.png" style="width: 366px; height: 245px;" /></p>
<p>As you can see, an area graph is a great way to be able to view multiple time series trends in one plot, especially if those plots form a part of one whole. There are numerous ways this can be used to visualize things. Anytime you are interested in multiple series that make up a whole, an area graph can do the job. You could use it to show enrollment rates by gender, precipitation rates by county, population totals by city, etc.</p>
<p>I’m going to show you how to go about creating one in <a href="http://www.minitab.com/products/minitab">Minitab</a>. First, we need to put our data in our worksheet. For this graph, we need each of the series, or sections, in a separate column. An additional constraint on this graph is that we need all of the columns to be of equal length, so be sure that’s the case. In our example we will use <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/File/8ad6bf30e2b5eed510c2bd1e19f52e1c/areagraphblogdata.mtw">sales data</a> from different regional branches, and show that an area graph can be an improvement over a simple time series plot.</p>
<p>Once the data is in your worksheet, go to <strong>Graph > Time Series Plot</strong> and look at the data in a basic time series plot. As you can see, there are a few challenges with interpreting this plot. </p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/732ead34-1005-4470-b034-d7f8b87fabcf/Image/dc1239167766d00e1eeea7a8e1227414/areagraph2.png" style="width: 577px; height: 385px;" /></p>
<p>First, the plot looks extremely messy. While it gives a good look at the sales from the individual branches, it is very hard to track an individual branch through time. And it’s not much better to look at 4 (or more) separate individual plots, because it then makes it harder to compare. Additionally, when you make separate plots, an important piece of information is lost: total sales. For example, in August, Philadelphia, London, and Seattle had a total sales increase, while New York had its worst month of the year. Was this an overall gain or overall loss? We can’t really tell from individual plots. </p>
<p>Instead, let’s look at an Area Graph. You can find this by going to <strong>Graph > Area Graph</strong>, and entering the series the same way as we did the time series plot. Take a look at our output below:</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/732ead34-1005-4470-b034-d7f8b87fabcf/Image/f5f5d94367127534af22a16b9dc364d1/areagraph3.png" style="width: 577px; height: 385px;" /></p>
<p>For starters, it looks much cleaner. We are able to see clear trends in the overall pattern. We can see that overall sales spiked in August, answering our question from above. We can use this to evaluate trends in multiple series, <em>as well as</em> the contribution of each series to the total quantity. We get all the information about total sales month-to-month, as well as the individual series for each location, in one plot, instead of in the messy, hard-to-read Time Series plot we created first.</p>
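<p>Under the hood, an area graph simply stacks each series on top of the previous ones, so the top edge of the last band is the total. Here is a minimal Python sketch of that stacking, using invented sales figures (not the worksheet linked above) and a hypothetical helper named <code>stack_series</code>.</p>

```python
# Hypothetical monthly sales per branch (same length for every series).
SALES = {
    "New York": [12, 14, 9],
    "Philadelphia": [7, 8, 11],
    "London": [10, 9, 13],
    "Seattle": [5, 6, 8],
}


def stack_series(series_by_name):
    """Return ({name: upper boundary per period}, per-period totals)."""
    lengths = {len(values) for values in series_by_name.values()}
    if len(lengths) != 1:
        raise ValueError("all series must have the same length")
    boundaries = {}
    running = [0] * lengths.pop()
    for name, values in series_by_name.items():
        # Each band sits on top of the running total of the bands below it.
        running = [r + v for r, v in zip(running, values)]
        boundaries[name] = running
    return boundaries, running


if __name__ == "__main__":
    boundaries, totals = stack_series(SALES)
    for name, upper in boundaries.items():
        print(f"{name:<13} upper edge: {upper}")
    print("totals:", totals)
```

<p>In this made-up example New York drops in the last month while the other branches rise, and the totals show straight away that overall sales still went up. The equal-length check mirrors the worksheet requirement mentioned earlier.</p>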
<p>Next time you need to evaluate multiple series together, consider taking a look at the Area Graph to get a cleaner picture of your data!</p>
Data Analysis, Statistics, Stats
Thu, 20 Jul 2017 12:00:00 +0000
http://blog.minitab.com/blog/starting-out-with-statistical-software/area-graphs-an-underutilized-tool
Eric Heckman