Data Analysis Software | Minitab
Blog posts and articles with tips for using statistical software to analyze data for quality improvement.
http://blog.minitab.com/blog/data-analysis-software/rss
Tue, 25 Oct 2016 19:09:51 +0000 | FeedCreator 1.7.3

Common Assumptions about Data (Part 1: Random Samples and Statistical Independence)
http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-1-random-samples-and-statistical-independence
<p><img alt="horse before the cart road sign" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/cc91865a3d4df6456934528866576a1b/horse_warning_sign.png" style="margin: 10px 15px; float: right; width: 120px; height: 120px;" /></p>
<p>Statistical inference uses data from a sample of individuals to reach conclusions about the whole population. It’s a very <span>powerful tool</span>. But as the saying goes, “With great power comes great responsibility!” When attempting to make inferences from sample data, you must check your assumptions. Violating any of these assumptions can result in false positives or false negatives, thus invalidating your results. In other words, you run the risk that your results are wrong, that your conclusions are wrong, and hence that the solutions you implement won’t solve the problem (unless you’re <em>really</em> lucky!).</p>
<p>You’ve heard the joke about <a href="https://www.goodreads.com/quotes/192478-you-should-never-assume-you-know-what-happens-when-you">what happens when you assume</a>? For this post, let’s instead ask “What happens when you fail to check your assumptions?” After all, we’re human—and humans assume things all the time. Suppose, for example, I want to schedule a phone meeting with you and I’m in the U.S. Eastern time zone. It’s easy for me to assume that everyone is in the same time zone, but you may really be in California, or Australia. What would happen if I called a meeting at 2:00 p.m. but didn’t specify the time zone? Unless you checked, you might be early or late to the meeting, or miss it entirely! </p>
<p>The good news is that when it comes to the assumptions in statistical analysis, Minitab has your back. Minitab 17 has even more features to help you verify and validate the needed statistical analysis assumptions before you finalize your conclusion. When you use <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/angst-over-anova-assumptions-ask-the-assistant">the Assistant in Minitab</a>, the software will identify the appropriate assumptions for your analysis, provide guidance to help you develop robust data collection plans, check the assumptions when you analyze your data, and let you know the results in an easy-to-understand Report Card and Diagnostic Report.</p>
<p>The common data assumptions are: Random Samples, Independence, Normality, Equal Variance, Stability, and that your Measurement System is accurate and precise. In this post, we’ll address Random Samples and Statistical Independence.</p>
What Is the Assumption of Random Samples?
<p>A sample is random when each data point in your population has an equal chance of being included in the sample; therefore <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/collecting-random-data-isnt-monkey-business">selection of any individual happens by chance, rather than by choice</a>. This reduces the chance that differences in materials or conditions strongly bias results. Random samples are more likely to be representative of the population; therefore you can be more confident with your statistical inferences with a random sample. </p>
<p>There is no test that assures random sampling has occurred. Following good sampling techniques will help to ensure your samples are random. Here are some common approaches to making sure a sample is randomly created:</p>
<ul>
<li>Using a random number table or feature in Minitab (Figure 1).</li>
<li>Systematic selection (every nth unit or at specific times during the day).</li>
<li>Sequential selection (taken in sequence for destructive testing, etc.).</li>
<li>Avoiding the use of judgement or convenience to select samples.</li>
</ul>
<p><img alt="Minitab dialog boxes" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9f122a1ab01790cdc5f6530ec3a90a14/assumptions_dialog_box.png" style="border-width: 0px; border-style: solid; width: 700px; height: 421px;" /></p>
<p><em>Figure 1. Random Data Generator in Minitab 17</em></p>
<p>Non-random samples introduce bias and can result in incorrect interpretations.</p>
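Outside Minitab, the first two approaches above are easy to sketch in code. Here is a minimal Python illustration of simple random versus systematic selection; the lot size, sample size, and unit IDs are made-up values for illustration:

```python
import random

# Hypothetical lot of 500 serialized units; the IDs are illustrative.
population = [f"unit-{i:03d}" for i in range(1, 501)]

# Simple random sample of 20 units: every unit has an equal chance
# of selection, so inclusion happens by chance rather than by choice.
random.seed(42)  # fixed seed so the draw is reproducible
sample = random.sample(population, k=20)

# Systematic selection for comparison: every 25th unit in sequence.
systematic = population[::25]

print(len(sample), len(systematic))
```

Because `random.sample` draws without replacement and weights every unit equally, judgment and convenience play no role in which units end up in the sample.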
What Is the Assumption of Statistical Independence?
<p>Statistical independence is a critical assumption for many statistical tests, such as the 2-sample t test and ANOVA. Independence means the value of one observation does not influence or affect the value of other observations. Independent data items are not connected with one another in any way (unless you account for it in your model). This includes the observations in both the “between” and “within” groups in your sample. Non-independent observations introduce bias and can make your statistical test give too many false positives. </p>
<p>Following good sampling techniques will help to ensure your samples are independent. Common sources of non-independence include:</p>
<ul>
<li>Observations that are close together in time.</li>
<li>Observations that are close together in space or nested.</li>
<li>Observations that are somehow related.</li>
</ul>
<p>Minitab can test for independence using the Chi-Square Test for Association, which is designed to determine if the distribution of observations for one variable is similar for all categories of the second variable. </p>
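To make that check concrete, here is a hand-rolled Python sketch of the statistic behind the test; the contingency table of defect counts by shift is invented for illustration (`scipy.stats.chi2_contingency` computes the same statistic):

```python
# Hypothetical table: defect counts by shift (rows) and defect type
# (columns). All values are made up for illustration.
observed = [[20, 15, 25],   # shift 1
            [18, 16, 26]]   # shift 2

rows = [sum(r) for r in observed]
cols = [sum(c) for c in zip(*observed)]
total = sum(rows)

# Chi-square statistic: sum of (observed - expected)^2 / expected,
# where expected assumes the two variables are independent.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = rows[i] * cols[j] / total
        chi2 += (o - e) ** 2 / e

dof = (len(observed) - 1) * (len(observed[0]) - 1)

# The 95th percentile of chi-square with 2 df is about 5.99; a
# statistic well below that gives no evidence of association.
print(f"chi-square = {chi2:.3f} on {dof} df")
```

Here the small statistic (about 0.16) says the defect-type distribution looks similar across shifts, consistent with independence.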
The Real Reason You Need to Check the Assumptions
<p>You will be putting a lot of time and effort into collecting and analyzing data. After all the work you put into the analysis, you want to be able to reach correct conclusions. You want to be confident that you can tell whether observed differences between data samples are simply due to chance, or if the populations are indeed different! </p>
<p>It’s easy to put the cart before the horse and just plunge in to the data collection and analysis, but it’s much wiser to take the time to understand which data assumptions apply to the statistical tests you will be using, and plan accordingly.</p>
<p>In my next blog post, I will review the Normality and Equal Variance assumptions. </p>
Data Analysis | Statistics | Statistics Help | Stats
Mon, 24 Oct 2016 12:00:00 +0000
http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-1-random-samples-and-statistical-independence
Bonnie K. Stone

Improving Cash Flow and Cutting Costs at Bank Branch Offices
http://blog.minitab.com/blog/understanding-statistics/improving-cash-flow-and-cutting-costs-at-bank-branch-offices
<p>Every day, thousands of people withdraw extra cash for daily expenses. Each transaction may be small, but the total amount of cash dispersed over hundreds or thousands of daily transactions can be very high. But every bank branch has a fixed cash flow, which must be set without knowing what each customer will need on a given day. This creates a challenge for financial entities. Customers expect their local bank office to have adequate cash on hand, so how can a bank confidently ensure each branch has enough funds to handle transactions without keeping too much in reserve?</p>
<p><img alt="Grupo Mutual" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b2366c2da44cd861775ebab6c6d07e55/grupo_mutual_logo_200w_1_.png" style="width: 200px; height: 95px; margin: 10px 15px; float: right;" />A quality project team led by Jean Carlos Zamora and Francisco Aguilar tackled that problem at Grupo Mutual, a financial entity in Costa Rica.</p>
<p>When the project began, each of Grupo Mutual's 55 branches kept additional cash in a vault to avoid having insufficient funds. But without a clear understanding of daily needs, some branches often ran out of cash anyway, while others had significant unused reserves.</p>
<p>When a branch ran short, it created high costs for the company and gave customers three undesirable options: receive the funds as an electronic transfer, wait 1–3 days for consignment, or travel to the main branch to withdraw their cash. Having the right amount of cash in each branch vault would reduce costs and maintain customer satisfaction.</p>
<p>Using <a href="http://www.minitab.com/products/minitab">Minitab Statistical Software</a> and Lean Six Sigma methods, the team set out to determine the optimal amount of currency to store at each branch to avoid both a negative cash flow and idle funds. The team followed the five-phase <a href="http://blog.minitab.com/blog/real-world-quality-improvement/dmaic-vs-dmadv-vs-dfss">DMAIC (Define, Measure, Analyze, Improve, and Control)</a> method. In the Define phase, they set the goal: creating an efficient process that transferred cash from idle vaults to branches that needed it most.</p>
<p>In the Measure phase, the team analyzed two years' worth of cash-flow data from the 55 branches. “Managing the databases and analyzing about 2,000 data points from each of the 55 branches was our biggest challenge,” says Jean-Carlos Zamora Mora, project leader and improvement specialist at Grupo Mutual. “Minitab played a very important part in addressing this issue. It reduced the analysis time by helping us identify where to focus our efforts to improve our process.” </p>
<p>The Analyze phase began with an analysis of variance (ANOVA) to explore how the banks’ cash flow varied from month to month. The team used Minitab to identify which months were different from one another, and grouped similar months together to streamline the analysis. </p>
<p>The team next used control charts to graph the data over time and assess whether or not the process was stable, in preparation for conducting capability analysis. To choose the right control chart and create comprehensive summaries of the results, the team used the Minitab Assistant.</p>
<p style="margin-left: 40px;"><img alt="grupo mutual i-mr chart" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2d9ac9b2597c592e5be5b779bae85076/grupo_mutual_i_mr_chart_1_.png" style="width: 585px; height: 432px;" /></p>
<p>The team then performed a capability analysis of each group’s current cash flow to determine whether customer transactions matched the services provided, and establish the percentage of cash used at each branch.</p>
<p style="margin-left: 40px;"><img alt="grupo mutual capability analysis" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f0e25ef8282111550e8fe8733eb889de/grupo_mutual_capability_analysis_1_.png" style="width: 586px; height: 439px;" /></p>
<p>The analysis revealed that, in total, the vaults contained more funds than the branches needed to operate effectively, but excessive circulation of the money caused some branches to overdraw their vaults while others stored cash that went unused. </p>
<p>“We found a positive cash balance at 95% of the branches,” says Zamora Mora. “The analysis showed the cash on hand to meet customer needs exceeded the requirements by over 200%, so we suddenly had lots of money to invest.” </p>
<p>The analysis gave the team the confidence to move forward with the Improve phase: implementing real-time control charts that enabled management to check each branch’s cash balance throughout the day. Managers could now quickly move cash from branches with excess cash to those needing additional funds, and make more strategic cash flow decisions.</p>
<p>The team found that being able to answer objections with data helped secure buy-in from skeptical stakeholders. “Throughout this project, we encountered questions and situations that could have jeopardized our team’s credibility and our likelihood of success,” recalls Zamora Mora. “But the accuracy and reliability of our data analysis with Minitab was overpowering.” </p>
<p>The changes made during the project increased cash usage by 40% and slashed remittance costs by 60%. The new process also cut insurance costs and shrank risks associated with storing and transporting cash. Overall, the project increased revenue by $1.1 million. </p>
<p>To read a more detailed account of this project, <a href="https://www.minitab.com/Case-Studies/Grupo-Mutual/">click here</a>. </p>
Capability Analysis | Lean Six Sigma | Quality Improvement
Fri, 21 Oct 2016 12:00:00 +0000
http://blog.minitab.com/blog/understanding-statistics/improving-cash-flow-and-cutting-costs-at-bank-branch-offices
Eston Martz

Problems Using Data Mining to Build Regression Models, Part Two
http://blog.minitab.com/blog/adventures-in-statistics/problems-using-data-mining-to-build-regression-models-part-two
<p>Data mining can be helpful in the exploratory phase of an analysis. If you're in the early stages and you're just figuring out which predictors are potentially correlated with your response variable, data mining can help you identify candidates. However, there are problems associated with using data mining to select variables.</p>
<p>In my <a href="http://blog.minitab.com/blog/adventures-in-statistics/problems-using-data-mining-to-build-regression-models" target="_blank">previous post</a>, we used data mining to settle on the following model and graphed one of the relationships between the response (C1) and a predictor (C7). It all looks great! The only problem is that all of these data are randomly generated! No true relationships are present. </p>
<p style="margin-left: 40px;"><img alt="Regression output for data mining example" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/24e98167e2dfd848b346292af371acf3/regression_swo.png" style="width: 364px; height: 278px;" /></p>
<p style="margin-left: 40px;"><img alt="Scatter plot for data mining example" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/6e4dfb991b33031738756d4b2d1c77e4/scatterplot.png" style="width: 576px; height: 384px;" /></p>
<p>If you didn't already know there was no true relationship between these variables, these results could lead you to a very inaccurate conclusion.</p>
<p>Let's explore how these problems happen, and how to avoid them.</p>
Why <em>Do </em>These Problems Occur with Data Mining?
<p>The problem with data mining is that you fit many different models, trying lots of different variables, and you pick your final model based mainly on statistical significance, rather than being guided by theory.</p>
<p>What's wrong with that approach? The problem is that every statistical test you perform has a chance of a false positive. A false positive in this context means that the <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values" target="_blank">p-value</a> is statistically significant but there really is no relationship between the variables at the population level. If you set the <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests:-significance-levels-alpha-and-p-values-in-statistics" target="_blank">significance level at 0.05</a>, you can expect that in 5% of the cases where the null hypothesis is true, you'll have a false positive.</p>
<p>Because of this false positive rate, if you analyze many different models with many different variables you will inevitably find false positives. And if you're guided mainly by statistical significance, you'll leave the false positives in your model. If you keep going with this approach, you'll fill your model with these false positives. That’s exactly what happened in our example. We had 100 candidate predictor variables and the stepwise procedure literally dredged through hundreds and hundreds of potential models to arrive at our final model.</p>
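You can watch this false-positive machinery at work with a small simulation (a pure-Python sketch; the sample size and predictor count are arbitrary): test a random response against 100 unrelated random predictors, one at a time, and count how many clear the 0.05 bar by chance alone.

```python
import math
import random

random.seed(1)
n_obs, n_predictors = 50, 100

# A random response and 100 random predictors with NO true relationship.
y = [random.gauss(0, 1) for _ in range(n_obs)]
predictors = [[random.gauss(0, 1) for _ in range(n_obs)]
              for _ in range(n_predictors)]

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Significance of each simple regression via the correlation t-test.
# Two-sided critical t at alpha = 0.05 with 48 df is about 2.011.
t_crit = 2.011
false_positives = 0
for x in predictors:
    r = pearson_r(x, y)
    t = r * math.sqrt((n_obs - 2) / (1 - r * r))
    if abs(t) > t_crit:
        false_positives += 1

# With a 5% false positive rate, roughly 5 of 100 unrelated
# predictors will look "significant" purely by chance.
print(f"{false_positives} of {n_predictors} predictors look significant")
```

A data-mining procedure guided only by p-values would happily keep those lucky predictors in the model, even though every relationship here is pure noise.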
<p>As we’ve seen, data mining problems can be hard to detect. The numeric results and graph all look great. However, these results don’t represent true relationships but instead are chance correlations that are bound to occur with enough opportunities.</p>
<p>If I had to name my favorite R-squared, it would be <a href="http://blog.minitab.com/blog/adventures-in-statistics/multiple-regession-analysis-use-adjusted-r-squared-and-predicted-r-squared-to-include-the-correct-number-of-variables" target="_blank">predicted R-squared</a>, without a doubt. However, even predicted R-squared can't detect all problems. Ultimately, even though the predicted R-squared is moderate for our model, the ability of this model to predict accurately for an entirely new data set is practically zero.</p>
Theory, the Alternative to Data Mining
<p>Data mining can have a role in the exploratory stages of an analysis. However, for all variables that you identify through data mining, you should perform a confirmation study using newly collected data to verify the relationships in the new sample. Failure to do so can be very costly. Just imagine if we had made decisions based on the model above!</p>
<p>An alternative to data mining is to use theory as a guide in terms of both the models you fit and the evaluation of your results. Look at what others have done and incorporate those findings when building your model. Before beginning the regression analysis, develop an idea of what the important variables are, along with their expected relationships, coefficient signs, and effect magnitudes.</p>
<p>Building on the results of others makes it easier both to collect the correct data and to specify the best regression model without the need for data mining. The difference is the process by which you fit and evaluate the models. When you’re guided by theory, you reduce the number of models you fit and you assess properties beyond just statistical significance.</p>
<p>Theoretical considerations should not be discarded based solely on statistical measures.</p>
<ul>
<li>Compare the coefficient signs to theory. If any of the signs contradict theory, investigate and either change your model or explain the inconsistency.</li>
<li>Use <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">Minitab statistical software</a> to create factorial plots based on your model to see if all the effects match theory.</li>
<li>Compare the <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit" target="_blank">R-squared</a> for your study to those of similar studies. If your R-squared is very different than those in similar studies, it's a sign that your model may have a problem.</li>
</ul>
<p>If you’re interested in learning more about these issues, read my post about <a href="http://blog.minitab.com/blog/adventures-in-statistics/beware-of-phantom-degrees-of-freedom-that-haunt-your-regression-models">how using too many <em>phantom</em> degrees of freedom is related to data mining problems</a>.</p>
<p> </p>
Data Analysis | Hypothesis Testing | Learning | Regression Analysis | Statistics | Statistics Help
Wed, 19 Oct 2016 12:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics/problems-using-data-mining-to-build-regression-models-part-two
Jim Frost

Minitab 17 and Minitab Express: A Comparison of Software Features
http://blog.minitab.com/blog/marilyn-wheatleys-blog/minitab-17-and-minitab-express-a-comparison-of-software-features
<p><span style="line-height: 1.6;">Since the release of Minitab Express in 2014, we’ve often received questions in technical support about the differences between Express and Minitab 17. In this post, I’ll attempt to provide a comparison between these two Minitab products.</span></p>
What Is Minitab 17?
<p>Minitab 17 is an all-in-one graphical and statistical analysis package that includes basic analysis tools such as hypothesis testing, regression, and ANOVA. Additionally, Minitab 17 includes more advanced features such as reliability analysis, multivariate tools, design of experiments (DOE), and quality tools such as gage R&R and capability analysis. A full list of features that are included in Minitab 17 is available on this <a href="http://www.minitab.com/en-us/products/minitab/features-list/">page</a>. </p>
What Is Minitab Express?
<p>Minitab Express is a more basic all-in-one software package for graphical and statistical analysis, designed for students and professors teaching introductory statistics courses. Minitab Express includes statistical analysis options such as hypothesis testing, regression, and ANOVA, but does not include many of the other advanced features that are available in Minitab 17. A full list of the features that are included in Minitab Express is available <a href="http://www.minitab.com/en-us/products/express/features-list/">here</a>.</p>
Key Differences
<strong><em>Supported Operating Systems</em></strong>
<p>One main difference between the two packages is that Minitab 17 is a Windows-only application (however, Minitab 17 can be installed on Mac OS X using one of the options described <a href="http://support.minitab.com/en-us/installation/frequently-asked-questions/other/minitab-companion-on-mac/">here</a>). System requirements for Minitab 17 are available <a href="http://www.minitab.com/en-us/products/minitab/system-requirements/">here</a>. </p>
<p>Minitab Express is available for both Windows and Mac OS X. The system requirements for Minitab Express are available <a href="http://www.minitab.com/en-us/products/express/system-requirements/">here</a>.</p>
<strong><em>The Interface</em></strong>
<p>While the menu options for both versions of the software are located at the top and the worksheet/data windows are below, there are several differences in the interface. The first screen shot below is for Minitab 17, while the next two screen shots are for Minitab Express:</p>
<p style="margin-left: 40px;"><br />
<strong>Minitab 17:</strong><img alt="Minitab 17 Interface" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f054ba83a85abb6245445502feb2ce86/minitab17interface.png" style="width: 800px; height: 481px;" /></p>
<p style="margin-left: 40px;"><strong>Minitab Express for Windows:</strong><img alt="Express for Windows" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/280aa535dde18d42aaf42eb517fbb9fe/expressforwindowsinterface.png" style="width: 800px; height: 571px;" /></p>
<strong>Minitab Express for OS X:</strong><img alt="Minitab Express for OS X Interface" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/177920cdc081cd8d77458ccf3318d192/expressforosxinterface.png" style="width: 800px; height: 529px;" />
<em><strong>Comparison of Commonly Used Features</strong></em>
<p>In addition to cosmetic differences in appearance, the table below compares the features that are available in both versions:</p>
<table align="center" border="1" cellpadding="5">
<tr>
<th>Feature</th>
<th>Minitab 17<br />(Windows)</th>
<th>Minitab Express<br />(Windows &amp; Mac OS X)</th>
</tr>
<tr><td>Assistant menu</td><td align="center">✓</td><td align="center"> </td></tr>
<tr><td>Graphs</td><td align="center">✓</td><td align="center">✓</td></tr>
<tr><td>Probability distributions</td><td align="center">✓</td><td align="center">✓</td></tr>
<tr><td>Summary statistics</td><td align="center">✓</td><td align="center">✓</td></tr>
<tr><td>Hypothesis tests</td><td align="center">✓</td><td align="center">✓</td></tr>
<tr><td>One-Way ANOVA</td><td align="center">✓</td><td align="center">✓</td></tr>
<tr><td>Two-Way ANOVA</td><td align="center">✓</td><td align="center">✓</td></tr>
<tr><td>ANOVA with &gt; 2 factors</td><td align="center">✓</td><td align="center"> </td></tr>
<tr><td>Linear regression</td><td align="center">✓</td><td align="center">✓</td></tr>
<tr><td>Logistic regression</td><td align="center">✓</td><td align="center">✓</td></tr>
<tr><td>Nonlinear regression</td><td align="center">✓</td><td align="center"> </td></tr>
<tr><td>Design of experiments</td><td align="center">✓</td><td align="center"> </td></tr>
<tr><td>Control charts</td><td align="center">✓</td><td align="center">✓</td></tr>
<tr><td>Gage R&R</td><td align="center">✓</td><td align="center"> </td></tr>
<tr><td>Capability analysis</td><td align="center">✓</td><td align="center"> </td></tr>
<tr><td>Reliability</td><td align="center">✓</td><td align="center"> </td></tr>
<tr><td>Multivariate</td><td align="center">✓</td><td align="center"> </td></tr>
<tr><td>Time series</td><td align="center">✓</td><td align="center"> </td></tr>
<tr><td>Nonparametric tests</td><td align="center">✓</td><td align="center">✓</td></tr>
<tr><td>Equivalence tests</td><td align="center">✓</td><td align="center"> </td></tr>
<tr><td>Power and sample size</td><td align="center">✓</td><td align="center"> </td></tr>
</table>
<p>Although many of the same features are available in both packages, Minitab 17 has many graph editing options that are not available in Minitab Express. For many of the tests that are available in both packages, Minitab 17 allows more control over the results and has more options than Minitab Express. You can see a more detailed comparison <a href="http://www.minitab.com/academic/comparison/">here</a>. </p>
<p>I hope this post is useful in evaluating the two versions of Minitab. For any questions about either software package, we are more than happy to help here in <a href="http://www.minitab.com/en-us/support/">technical support</a>.</p>
Statistics | Stats
Mon, 17 Oct 2016 12:00:00 +0000
http://blog.minitab.com/blog/marilyn-wheatleys-blog/minitab-17-and-minitab-express-a-comparison-of-software-features
Marilyn Wheatley

Why You Should Celebrate Healthcare Quality Week
http://blog.minitab.com/blog/real-world-quality-improvement/ways-to-celebrate-healthcare-quality-week
<p>October 16–22 is National Healthcare Quality Week, started by the National Association for Healthcare Quality to increase awareness of healthcare quality programs and to highlight the work of healthcare quality professionals and their influence on improved patient care outcomes.</p>
<img alt="healthcare quality week logo" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/71359d037d4643f7c534b6b2e17a074e/hqw.jpg" style="width: 250px; height: 70px; float: right; margin: 10px 15px;" />
<p>This event deserves your attention because the quality of healthcare affects every one of us, and so does the cost of that care. Whether it's as a patient, a quality practitioner, or a health care provider, we all have a stake in learning what people are doing to improve the quality of care and, at the same time, working to make it more efficient and ultimately affordable.</p>
<div>In honor of the celebration, I wanted to point you to a few resources we have on hand that not only acknowledge the great work of healthcare quality professionals around the world, but also show off the tactics and tools they use to keep patients safe and care affordable.</div>
<p>Kudos to not only those in the field of healthcare quality, but all who work in healthcare to improve the experience of patients everywhere (thank you!).</p>
Q&As with Healthcare Quality Professionals
<p><a href="https://www.minitab.com/News/Getting-Better-All-The-Time/" target="_blank">Getting Better All the Time</a></p>
<p>As the corporate director of process excellence at Citrus Valley Health Partners, Denise Ronquillo plays a key role in improving quality and ensuring that patients receive excellent and safe care. Over the past two years, she and her colleagues have achieved substantial successes while overcoming resistance and skepticism, and are beginning to see a new culture of quality emerge in their organization.</p>
<p><a href="https://www.minitab.com/News/This-Isn-t-a-Game-We-re-Playing/" target="_blank">This Isn’t a Game We’re Playing</a></p>
<p>Quality improvement is something healthcare providers <em>have</em> to do, says Dr. Sandy Fogel, surgical quality officer at Carilion Clinic.</p>
<p><a href="https://www.minitab.com/en-us/News/Healthcare-Quality--Making-a-Difference-with-Data--A-Conversation-with-Dr--William-H--Woodall/" target="_blank">Healthcare Quality: Making a Difference with Data</a></p>
<p>How can statistics and data analysis help improve outcomes in healthcare? William H. Woodall, professor of statistics at Virginia Tech, has been focused on that question for over ten years.</p>
Blog Posts about Quality Improvement in Health Care
<p><a href="http://blog.minitab.com/blog/understanding-statistics/a-six-sigma-healthcare-project-part-1-examining-factors-with-a-pareto-chart" target="_blank">A Six Sigma Healthcare Project</a></p>
<p>Follow along with a series of blog posts on the application of binary logistic regression in a healthcare Six Sigma project, which had a goal of attracting and retaining more patients in a hospital's cardiac rehabilitation program.</p>
<p><a href="http://blog.minitab.com/blog/michelle-paret/monitoring-rare-events-with-g-charts" target="_blank">Monitoring Rare Events with G and T Charts</a></p>
<p>These charts make it easy to assess the stability of processes that involve rare events and have low defect rates.</p>
<p><a href="http://blog.minitab.com/blog/meredith-griffith/exploring-healthcare-data-part-1" target="_blank">Exploring Healthcare Data</a></p>
<p>Learn several tips for exploring and visualizing your healthcare data in a way that will prepare you for a formal analysis.</p>
Case Studies about Health Care Quality Improvement Projects
<p><a href="https://www.minitab.com/en-us/Case-Studies/Cathay-General-Hospital/" target="_blank">Cathay General Hospital</a></p>
<p>During an assessment of its angioplasty process for patients suffering from heart attacks, Cathay General Hospital in Taipei, Taiwan, used Minitab to analyze data, which helped them introduce new treatment options that shortened patients’ hospital stays and saved medical resources.</p>
<p><a href="https://www.minitab.com/Case-Studies/Riverview-Hospital-Association/" target="_blank">Riverview Hospital Association</a></p>
<p>The Riverview Hospital Association Lean Six Sigma team performed data analysis to identify patient groups who were scoring lower on patient satisfaction survey questions. This allowed the team to target process improvement efforts to specific patient populations.</p>
<p><a href="https://www.minitab.com/en-us/Case-Studies/Franciscan-Hospital-for-Children/" target="_blank">Franciscan Children’s Hospital</a></p>
<p>With the help of Lean Six Sigma and Minitab software, Franciscan Hospital for Children was able to analyze information about its processes and make data-driven decisions that increased dental operating room efficiency and enabled doctors to see more kids.</p>
<p><em>For more on how data analysis and Minitab can be used in healthcare, visit <a href="http://www.minitab.com/healthcare" target="_blank">www.minitab.com/healthcare</a>. </em></p>
Health Care Quality Improvement | Fri, 14 Oct 2016 12:00:00 +0000 | http://blog.minitab.com/blog/real-world-quality-improvement/ways-to-celebrate-healthcare-quality-week | Carly Barry
Do You Know the Truth about Gage Repeatability and Reproducibility?
http://blog.minitab.com/blog/michelle-paret/do-you-know-the-truth-about-gage-repeatability-and-reproducibility
<p>The ultimate goal of most quality improvement projects is clear: reducing the number of defects, improving a response, or making a change that benefits your customers.</p>
<p>We often want to jump right in and start gathering and analyzing data so we can solve the problems. Checking your measurement systems first, with methods like attribute agreement analysis or Gage R&R, may seem like a needless waste of time. </p>
<p>But the truth is that a Gage R&R Study is a critical step in <em>any </em>statistical analysis involving continuous data. That's because it allows you to determine if your measurement system for that data is adequate or not. If your measurement system isn’t capable of producing reliable measurements, then any analysis you conduct with those measurements is likely meaningless.</p>
<p>So let’s get to the “R&R” part of <span><a href="http://blog.minitab.com/blog/meredith-griffith/fundamentals-of-gage-rr">Gage R&R</a></span>—Repeatability and Reproducibility.</p>
<p>Suppose we’re measuring pencils with a ruler (which is an excellent hands-on activity you can use to teach Gage R&R). We want to determine if our measurement system can adequately measure the length of these pencils. To conduct a Gage R&R Study, we randomly select 10 pencils and 3 people—Abe, Brenda, and Charlie. Each person measures each pencil 2 times, using the same ruler. This gives us a total of 10 x 3 x 2 = 60 measurements.</p>
<p><img alt="parts and operators" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/c7685645ea8140d6ba67b1496ba57624/parts_and_ops.png" style="width: 548px; height: 293px;" /></p>
Repeatability
<p>Repeatability represents the variation observed when the same operator measures the same part multiple times with the same device. In other words, when Abe repeatedly measures the same pencil with the same ruler, will his measurements be consistent? If he measures 16.8 cm the first time, is he going to measure 16.8 cm the next time he measures that same pencil?</p>
Reproducibility
<p>Reproducibility represents the variation observed when DIFFERENT operators measure the same part multiple times with the same device. In other words, if Abe measures a pencil at 16.8 cm in length, will Brenda also measure 16.8 cm for that same pencil? And what about Charlie?</p>
<p><strong>Helpful Hint: </strong>To remember the difference between repeatability and reproducibility, note that reproducibility includes an ‘o’ – think ‘<strong>o</strong>’ for the variability across “<strong>o</strong>perators.”</p>
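To see how repeatability and reproducibility fall out of the numbers, here is a minimal Python sketch of the crossed Gage R&R ANOVA method on simulated data mimicking the pencil study above (10 parts × 3 operators × 2 replicates). The data, the `statsmodels` usage, and the method-of-moments variance formulas are illustrative assumptions, not Minitab's output:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(1)
parts, operators, reps = 10, 3, 2
true_len = rng.normal(17.0, 1.0, parts)    # part-to-part variation
op_bias = rng.normal(0.0, 0.05, operators) # operator bias -> reproducibility
rows = []
for p in range(parts):
    for o in range(operators):
        for r in range(reps):
            # measurement noise -> repeatability
            y = true_len[p] + op_bias[o] + rng.normal(0, 0.03)
            rows.append({"part": p, "operator": o, "length": y})
df = pd.DataFrame(rows)

# Two-way ANOVA with the part x operator interaction
fit = smf.ols("length ~ C(part) * C(operator)", data=df).fit()
tab = anova_lm(fit, typ=2)
ms = tab["sum_sq"] / tab["df"]  # mean squares

# Method-of-moments variance components for a crossed study
var_repeat = ms["Residual"]
var_inter = max(0.0, (ms["C(part):C(operator)"] - ms["Residual"]) / reps)
var_oper = max(0.0, (ms["C(operator)"] - ms["C(part):C(operator)"]) / (parts * reps))
var_reprod = var_oper + var_inter
var_part = max(0.0, (ms["C(part)"] - ms["C(part):C(operator)"]) / (operators * reps))
gage_rr = var_repeat + var_reprod
pct_rr = 100 * np.sqrt(gage_rr / (gage_rr + var_part))  # % study variation from the gage
```

With measurement noise this small relative to part-to-part variation, the %GRR comes out low, i.e., the simulated ruler-and-operator system is adequate.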
Answering Important Questions
<p>Gage R&R can help you answer questions such as:</p>
<ul>
<li>Is my measurement system capable of discriminating between parts?</li>
<li>Is the variability in my measurement system small compared with the manufacturing process variability?</li>
<li>How much variability in my measurement system is caused by differences between operators?</li>
</ul>
<p>And if your measurement system isn't great, you can also use Gage R&R to determine where the weaknesses are. For example, perhaps a study reveals that while repeatability is good, the reproducibility is poor. You can use Gage R&R to dig deeper and figure out why different operators reported different readings.</p>
<p>To easily set up your Gage R&R data collection plan and analyze the corresponding data to assess your measurement system, check out <a href="http://www.minitab.com/products/minitab">Minitab Statistical Software</a> and its <strong>Stat > Quality Tools > Gage Study</strong> and <strong>Assistant > Measurement Systems Analysis</strong> features.</p>
Data Analysis | Fun Statistics | Lean Six Sigma | Quality Improvement | Six Sigma | Statistics | Statistics Help | Stats | Fri, 07 Oct 2016 12:00:00 +0000 | http://blog.minitab.com/blog/michelle-paret/do-you-know-the-truth-about-gage-repeatability-and-reproducibility | Michelle Paret
5 More Powerful Insights from Noted Quality Leaders
http://blog.minitab.com/blog/understanding-statistics/5-more-powerful-insights-from-noted-quality-leaders
<p>We hosted our first-ever Minitab Insights conference in September, and if you were among the attendees, you already know the caliber of the speakers and the value of the information they shared. Experts from a wide range of industries offered a lot of great lessons about how they use data analysis to improve business practices and solve a variety of problems.<img alt="tips from Minitab Insights 2016" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/394dfef193debd958deb2011edaaac16/insights_takeaways1.gif" style="width: 354px; height: 250px; margin: 10px 15px; float: right;" /></p>
<p>I blogged earlier about <a href="http://blog.minitab.com/blog/understanding-statistics/5-powerful-insights-from-noted-quality-leaders">five key takeaways</a> gleaned from the sessions at the Minitab Insights 2016 conference. But that was just the tip of the iceberg, and participants learned many more helpful things that are well worth sharing. So here are five <em>more </em>helpful, challenging, and thought-provoking ideas and suggestions that we heard during the event.</p>
Improve Your Skills while Improving Yourself!
<p>Everyone has personal goals they'd like to achieve, such as getting fit, changing a habit, or writing a book. Rod Toro, deployment leader at <a href="http://www.minitab.com/en-us/Case-Studies/Edward-Jones/?cta=6675">Edward Jones</a>, explained how challenging himself and his team to apply Lean and Six Sigma tools to their personal goals has helped them better understand the underlying principles of quality improvement, personalize their learning to gain deeper insights, and expand their ability to apply quality methods in a variety of circumstances and situations. </p>
We Can't Claim the Null Hypothesis Is True.
<p>Minitab technical training specialist Scott Kowalski reminded us that when we test a hypothesis with statistics, "<span><a href="http://blog.minitab.com/blog/understanding-statistics/things-statisticians-say-failure-to-reject-the-null-hypothesis">failing to reject the null</a></span>" does not prove that the null hypothesis <em>is </em>true. It only means we don't have enough evidence to reject it. We need to keep this in mind when we interpret our results, and to be careful how we explain our findings to others. We also need to be sure our hypotheses are clearly stated, and that we've selected the appropriate test for our task!</p>
Outliers Won't Just Be Ignored, So You'd Better Investigate Them.
<p>We've all seen them in our data: those <a href="http://blog.minitab.com/blog/michelle-paret/how-to-identify-outliers-and-get-rid-of-them">troublesome observations</a> that just don't want to belong, lurking off in the margins, maybe with one or two other loners. It can be tempting to ignore or just delete those observations, but Larry Bartkus, senior distinguished engineer at Edwards Lifesciences, provided vivid illustrations of the drastic impact outliers can have on the results of an analysis. He also reminded us of the value of slowing down to question our assumptions, looking at the data in several ways, and trying to understand <em>why </em>our data is the way it is. </p>
Attribute Agreement Analysis Is Just One Option.
<p>When we need to assess how well an attribute measurement system performs, attribute agreement analysis is the go-to method—but Thomas Rust, reliability engineer at Autoliv, demonstrated that many more options are available. In encouraging quality practitioners to "break the attribute paradigm," Rust detailed four innovative ways to assess an attribute measurement system: measure an underlying variable; attribute measurement of a variable product; variable measurement of an attribute product; and attribute measurement of an attribute product.</p>
Minitab Users Do Great Things.
<p>More than anything else, what we took away from Minitab Insights 2016 was an even greater appreciation for the people who are using our software in innovative ways—to increase the quality of the products we use every day, to raise the level of service we receive from businesses and organizations, to increase the efficiency and safety of our healthcare providers, and so much more.</p>
<p>Watch for more stories and ideas from the Minitab Insights conference in future issues of Minitab News, and on the Minitab Blog.</p>
Data Analysis | Insights | Lean Six Sigma | Project Tools | Quality Improvement | Six Sigma | Statistics | Wed, 05 Oct 2016 12:00:00 +0000 | http://blog.minitab.com/blog/understanding-statistics/5-more-powerful-insights-from-noted-quality-leaders | Eston Martz
Why Shrewd Experts "Fail to Reject the Null" Every Time
http://blog.minitab.com/blog/understanding-statistics/why-shrewd-experts-fail-to-reject-the-null-every-time
<p><img alt="nulls angels: the toughest statisticians around!" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/d2c0571a-acbd-48c7-84f4-222276c293fe/Image/509959f8406d59b3bb31f686aeb3b6b0/nulls_angels.jpg" style="margin: 10px 15px; float: right; width: 175px; height: 198px;" />I watched an old <a href="https://en.wikipedia.org/wiki/The_Wild_Angels" target="_blank">motorcycle flick from the 1960s</a> the other night, and I was struck by the bikers' slang. They had a language all their own. Just like statisticians, whose manner of speaking often confounds those who aren't hep to the lingo of data analysis.</p>
<p>It got me thinking...what if there were an all-statistician biker gang? Call them the Nulls Angels. Imagine them in their colors, tearing across the countryside, analyzing data and asking the people they encounter on the road about whether they "fail to reject the null hypothesis."</p>
<p>If you point out how strange that phrase sounds, the Nulls Angels will <em>know</em> you're not cool...and not very aware of statistics.</p>
<p>Speaking purely as an editor, I acknowledge that "failing to reject the null hypothesis" <em>is</em> cringe-worthy. "Failing to reject" seems like an overly complicated equivalent to <em>accept</em>. At minimum, it's clunky phrasing.</p>
<p>But it turns out those rough-and-ready statisticians in the Nulls Angels have good reason to talk like that. From a <em>statistical</em> perspective, it's undeniably accurate—and replacing "failure to reject" with "accept" would just be wrong.</p>
What <em>Is </em>the Null Hypothesis, Anyway?
<p>Hypothesis tests include one- and two-sample t-tests, tests for association, tests for normality, and many more. (All of these tests are available under the <strong>Stat</strong><span> menu in Minitab <a href="http://www.minitab.com">statistical software</a>. Or, if you want a little more <a href="http://www.minitab.com/en-us/products/minitab/assistant">statistical guidance</a>, the Assistant can lead you through common hypothesis tests step-by-step.)</span></p>
<p>A hypothesis test examines two propositions: the null hypothesis (or H0 for short), and the alternative (H1). The <em>alternative </em>hypothesis is what we hope to support. We presume that the null hypothesis is true, unless the data provide sufficient evidence that it is not.</p>
<p>You've heard the phrase "Innocent until proven guilty." That means innocence is assumed until guilt is proven. In statistics, the null hypothesis is taken for granted until the alternative is proven true.</p>
So Why Do We "Fail to Reject" the Null Hypothesis?
<p>That brings up the issue of "proof."</p>
<p>The degree of statistical evidence we need in order to “prove” the alternative hypothesis is the <a href="http://blog.minitab.com/blog/michelle-paret/alphas-p-values-confidence-intervals-oh-my">confidence level</a>. The confidence level is 1 minus our risk of committing a Type I error, which occurs when you incorrectly reject the null hypothesis when it's true. Statisticians call this risk alpha, and also refer to it as the significance level. The typical alpha of 0.05 corresponds to a 95% confidence level: we're accepting a 5% chance of rejecting the null even if it is true. (In life-or-death matters, we might <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/alpha-male-vs-alpha-female">lower the risk of a Type I error to 1% or less</a>.)</p>
<p>Regardless of the alpha level we choose, any hypothesis test has only two possible outcomes:</p>
<ol>
<li><strong>Reject the null hypothesis</strong> and conclude that the alternative hypothesis is true at the 95% confidence level (or whatever level you've selected).<br />
</li>
<li><strong>Fail to reject the null hypothesis</strong> and conclude that <em>not</em> enough evidence is available to suggest the null is false at the 95% confidence level.</li>
</ol>
<p>We often use a <a href="http://blog.minitab.com/blog/understanding-statistics/three-things-the-p-value-cant-tell-you-about-your-hypothesis-test">p-value</a> to decide if the data support the null hypothesis or not. If the test's p-value is less than our selected alpha level, we reject the null. Or, as statisticians say, "When the p-value's low, the null must go."</p>
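The decision rule above can be sketched in a few lines of Python using SciPy's one-sample t-test. The data are simulated and hypothetical; the point is only how the p-value is compared against alpha, and why the second branch is worded "fail to reject" rather than "accept":

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Hypothetical measurements from a process with a true mean near 50
sample = rng.normal(loc=50.2, scale=5, size=20)

# H0: population mean = 50    H1: population mean != 50
t_stat, p_value = stats.ttest_1samp(sample, popmean=50)

alpha = 0.05  # 5% Type I error risk, i.e., 95% confidence level
if p_value < alpha:
    verdict = "reject the null"
else:
    # Not proof that H0 is true -- only insufficient evidence against it
    verdict = "fail to reject the null"
```

Note that even though the simulated population mean is not exactly 50, a small sample can easily land in the "fail to reject" branch: absence of evidence is not evidence that the null is true.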
<p>This still doesn't explain <em>why</em> a statistician won't "accept the null hypothesis." Here's the bottom line: failing to reject the null hypothesis does not mean the null hypothesis <em>is</em> true. That's because a hypothesis test does not determine <em>which</em> hypothesis is true, or even which is most likely: it <em>only</em> assesses whether evidence exists to reject the null hypothesis.</p>
<img alt=""My hypothesis is Null until proven Alternative, sir!" " src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/d2c0571a-acbd-48c7-84f4-222276c293fe/Image/a07b85370986a3dd126ac4d021775d13/trial.jpg" style="border-width: 1px; border-style: solid; margin: 10px 15px; float: right; width: 300px; height: 200px;" />"Null Until Proven Alternative"
<p>Hark back to "innocent until proven guilty." As the data analyst, you are the judge. The hypothesis test is the trial, and the null hypothesis is the defendant. The alternative hypothesis is the prosecution, which needs to make its case <em>beyond a reasonable doubt</em> (say, with 95% certainty).</p>
<p>If the trial evidence does not show the defendant is guilty, neither has it proved that the defendant <em>is</em> innocent. However, based on the available evidence, you can't reject that <em>possibility</em>. So how would you announce your verdict?</p>
<p>"Not guilty."</p>
<p>That phrase is perfect: "Not guilty" doesn't say the defendant <em>is</em> innocent, because that has not been proven. It just says the prosecution couldn't convince the judge to abandon the assumption of innocence.</p>
<p>So "failure to reject the null" is the statistical equivalent of "not guilty." In a trial, the burden of proof falls to the prosecution. When analyzing data, the entire burden of proof falls to your sample data. "Not guilty" does not mean "innocent," and "failing to reject" the null hypothesis is quite distinct from "accepting" it. </p>
<p>So if a group of marauding statisticians in their Nulls Angels leathers ever asks, keep yourself in their good graces, and show that you know "failing to reject the null" is not "accepting the null."</p>
Fun Statistics | Hypothesis Testing | Statistics | Statistics Help | Mon, 03 Oct 2016 12:00:00 +0000 | http://blog.minitab.com/blog/understanding-statistics/why-shrewd-experts-fail-to-reject-the-null-every-time | Eston Martz
How to Save a Failing Regression with PLS
http://blog.minitab.com/blog/statistics-and-quality-improvement/fix-problems-in-regression-analysis-with-partial-least-squares
<p>Face it, you love regression analysis as much as I do. Regression is one of the most satisfying analyses in <a href="http://www.minitab.com/en-US/products/minitab/free-trial/">Minitab</a>: get some predictors that should have a relationship to a response, go through a model selection process, interpret fit statistics like adjusted R2 and predicted R2, and make predictions. Yes, regression really is quite wonderful.</p>
<p>Except when it’s not. Dark, seedy corners of the data world exist, lying in wait to make regression confusing or impossible. Good old ordinary least squares regression, to be specific.</p>
<p>For instance, sometimes you have a lot of <em>detail</em> in your data, but not a lot of data. Want to see what I mean?</p>
<ol>
<li>In Minitab, choose <strong>Help > Sample Data...</strong></li>
<li>Open Soybean.mtw.</li>
</ol>
<p><img alt="Soybeans" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/e9bae86907cd8194ecf16b7622cf98bb/edamame_by_zesmerelda_in_chicago.jpg" style="float: right; width: 200px; height: 133px; border-width: 1px; border-style: solid; margin: 10px 15px;" />The data has 88 variables about soybeans, the results of near-infrared (NIR) spectroscopy at different wavelengths. But the data contains only 60 measurements, and the data are arranged to save 6 measurements for validation runs.</p>
A Limit on Coefficients
<p>With ordinary least squares regression, you can estimate only as many coefficients as the data have samples. Thus the traditional method, satisfactory in most cases, would let you estimate only 53 coefficients for variables plus a constant coefficient.</p>
<p>This could leave you wondering about whether any of the other possible terms might have information that you need.</p>
Multicollinearity
<p>The NIR measurements are also highly collinear with each other. This <a href="http://blog.minitab.com/blog/understanding-statistics/handling-multicollinearity-in-regression-analysis">multicollinearity</a> complicates using statistical significance to choose among the variables to include in the model.</p>
<p>When the data have more variables than samples, especially when the predictor variables are highly collinear, it’s a good time to consider partial least squares regression.</p>
How to Perform Partial Least Squares Regression
<p>Try these steps if you want to follow along in Minitab Statistical Software using the soybean data:</p>
<ol>
<li>Choose <strong>Stat > Regression > Partial Least Squares</strong>.</li>
<li>In <strong>Responses</strong>, enter <em>Fat</em>.</li>
<li>In <strong>Model</strong>, enter <em>‘1’-‘88’</em>.</li>
<li>Click <strong>Options</strong>.</li>
<li>Under <strong>Cross-Validation</strong>, select <strong>Leave-one-out</strong>. Click OK.</li>
<li>Click <strong>Results</strong>.</li>
<li>Check <strong>Coefficients</strong>. Click <strong>OK </strong>twice.</li>
</ol>
<p>One of the great things about partial least squares regression is that it forms components and then does ordinary least squares regression with them. Thus the results include statistics that are familiar. For example, <a href="http://blog.minitab.com/blog/adventures-in-statistics/multiple-regession-analysis-use-adjusted-r-squared-and-predicted-r-squared-to-include-the-correct-number-of-variables">predicted R2</a> is the criterion that Minitab uses to choose the number of components.</p>
<p style="margin-left: 40px;"><br />
<img alt="Minitab selects the model with the highest predicted R-squared." src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/12f2493e350eb84a657035b915a5f45f/model_selection.gif" style="width: 476px; height: 194px;" /></p>
<p>Each of the 9 components in the model that maximizes the predicted R2 value is a complex linear combination of all 88 of the variables. So although the ANOVA table shows that you’re using only 9 degrees of freedom for the regression, the analysis uses information from all of the data.</p>
<p style="margin-left: 40px;"><img alt="The regression uses 9 degrees of freedom." src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/ce90634261a6cd8994f8e72682473d74/anova.gif" style="width: 381px; height: 113px;" /></p>
<p> The full list of standardized coefficients shows the relative importance of each predictor in the model. (I’m only showing a portion here because the table is 88 rows long.)</p>
<p style="margin-left: 40px;"><br />
<img alt="Each variable has a standardized coefficient." src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/b881dc2c5a4b26fa7330a0dbd9e70c8a/coefficients.gif" style="width: 255px; height: 284px;" /></p>
<p>Ordinary least squares regression is a great tool that’s allowed people to make lots of good decisions over the years. But there are times when it’s not satisfying. Got too much detail in your data? Partial least squares regression could be the answer.</p>
<p>Want more partial least squares regression now? Check out how <a href="http://www.minitab.com/en-US/Case-Studies/Unifi-Manufacturing-Inc/">Unifi used partial least squares to improve their processes faster</a>.</p>
<span style="color:#a9a9a9;">The image of the soybeans is by Tammy Green </span><span style="color:#a9a9a9;">and is licensed for reuse under this</span> <a href="http://creativecommons.org/licenses/by-sa/2.0/deed.en">Creative Commons License</a>.
Data Analysis | Regression Analysis | Statistics | Wed, 28 Sep 2016 12:00:00 +0000 | http://blog.minitab.com/blog/statistics-and-quality-improvement/fix-problems-in-regression-analysis-with-partial-least-squares | Cody Steele
Validating Process Changes with Design of Experiments (DOE)
http://blog.minitab.com/blog/real-world-quality-improvement/validating-process-changes-with-design-of-experiments-doe
<p>We’ve got a plethora of <a href="https://www.minitab.com/en-us/company/case-studies/" target="_blank">case studies</a> showing how businesses from different industries solve problems and implement solutions with data analysis. Take a look for ideas about how you can use data analysis to ensure excellence at your business!</p>
<p>Boston Scientific, one of the world’s leading developers of medical devices, is just one organization that has shared its story. A team at its Heredia, Costa Rica facility was able to assess and validate a packaging process, which resulted in a streamlined process and a cost-saving redesign of the packaging.</p>
<p>Below is a brief look at how they did it, but you can also take a look at the full case study at <a href="https://www.minitab.com/Case-Studies/Boston-Scientific/" target="_blank">https://www.minitab.com/Case-Studies/Boston-Scientific/</a>.</p>
Their Challenge
<p><img alt="guidewires in pouch" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/03b6326dcb90a56ca905abbc2526f38c/guidewires.jpg" style="width: 233px; height: 174px; float: right;" />Boston Scientific Heredia evaluates its operations regularly, to maintain process efficiency and contribute to affordable healthcare by reducing costs. At this facility, one packaging engineer led an effort to streamline packaging for guidewires—which are used during procedures such as catheter placement or endoscopic diagnoses—with the introduction of a new, smaller plastic pouch.</p>
<p>Using smaller and different packaging materials for their guidewires would substantially reduce material costs, but the company needed to prove that the new pouches would work with their sealing process, which creates a barrier that keeps the guidewires sterile.</p>
How Data Analysis Helped
<p>To ensure that the seal strength for the smaller pouches met or exceeded standards, they evaluated the process and identified several important factors, such as the temperature of the sealing system. They then used a statistical method called <a href="http://blog.minitab.com/blog/doe" target="_blank">Design of Experiments (DOE)</a> to determine how each of the variables affected the quality of the pouch seal.</p>
<p>The DOE revealed which factors were most critical. Below is a Minitab <a href="http://blog.minitab.com/blog/understanding-statistics/when-to-use-a-pareto-chart" target="_blank">Pareto Chart</a> that identified the factors that significantly affect seal strength: front temperature, rear temperature, and their respective two-way interaction.</p>
<p><img alt="https://www.minitab.com/uploadedImages/Content/Case_Studies/EffectsParetoforAveragePull.jpg" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/abd4d05d00cf48c8b22ecc37e1264e93/pareto_chart.jpg" style="border-width: 0px; border-style: solid; width: 600px; height: 400px;" /></p>
<p>Armed with this knowledge, the team devised optimal process settings to ensure the new pouches had strong seals. To verify the effectiveness of the improved process, they used a statistical tool called capability analysis, which demonstrates whether or not a process meets specifications and can produce good results:</p>
<p><img alt="https://www.minitab.com/uploadedImages/Content/Case_Studies/ProcessCapabilityofHighSettings-SealStrength.jpg" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/c4d94b5ee153c1d3e38757565a5d24c2/process_capa.jpg" style="border-width: 0px; border-style: solid; width: 600px; height: 400px;" /></p>
Results
<p>The analysis showed that guidewires packaged using the new, optimal process settings met, and even exceeded, the minimum seal strength requirements.</p>
<p>With the new pouches, Boston Scientific has saved more than $330,000. “At the end of the day,” a key team member noted, “the more money we save, the more additional savings we can pass on to the people we serve.”</p>
<p><em>For another example of how Boston Scientific uses data analysis to ensure the safety and reliability of its products, read <a href="https://www.minitab.com/Case-Studies/Boston-Scientific-Heredia/" target="_blank">Pulling Its Weight: Tensile Testing Challenge Speeds Regulatory Approval for Boston Scientific</a>, a story about how the company used Minitab Statistical Software to confirm the equivalency of its catheter’s pull-wire strength to previous testing results, and eliminate the need to perform test method validation by leveraging its existing tension testing standard.</em></p>
Data Analysis | Design of Experiments | Quality Improvement | Statistics | Stats | Mon, 26 Sep 2016 12:00:00 +0000 | http://blog.minitab.com/blog/real-world-quality-improvement/validating-process-changes-with-design-of-experiments-doe | Carly Barry
Problems Using Data Mining to Build Regression Models
http://blog.minitab.com/blog/adventures-in-statistics/problems-using-data-mining-to-build-regression-models
<p><img alt="Picture of mining truck filled with numbers" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/644d98694f1e6fec63d4f1db6b61a074/data_mining_crop.jpg" style="width: 250px; height: 171px; float: right; margin: 10px 15px;" />Data mining uses algorithms to explore correlations in data sets. An automated procedure sorts through large numbers of variables and includes them in the model based on statistical significance alone. No thought is given to whether the variables and the signs and magnitudes of their coefficients make theoretical sense.</p>
<p>We tend to think of data mining in the context of big data, with its huge databases and servers stuffed with information. However, it can also occur on the smaller scale of a research study.</p>
<p>The comment below is a real one that illustrates this point.</p>
<blockquote>“Then, I moved to the Regression menu and there I could add all the terms I wanted and more. Just for fun, I added many terms and performed backward elimination. Surprisingly, some terms appeared significant and my R-squared Predicted shot up. To me, your concerns are all taken care of with R-squared Predicted. If the model can still predict without the data point, then that's good.”</blockquote>
<p>Comments like this are common and emphasize the temptation to select regression models by trying as many different combinations of variables as possible and seeing which model produces the best-looking statistics. The overall gist of this type of comment is, "What could possibly be wrong with using data mining to build a regression model if the end results are that all the p-values are significant and the various types of R-squared values are all high?"</p>
<p>In this blog post, I’ll illustrate the problems associated with using data mining to build a regression model in the context of a smaller-scale analysis.</p>
An Example of Using Data Mining to Build a Regression Model
<p>My first order of business is to prove to you that data mining can have severe problems. I really want to bring the problems to life so you'll be leery of using this approach. Fortunately, this is simple to accomplish because I can use data mining to make it appear that a set of randomly generated <a href="http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/regression-and-correlation/regression-models/what-are-response-and-predictor-variables/" target="_blank">predictor variables</a> explains most of the changes in a randomly generated <a href="http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/regression-and-correlation/regression-models/what-are-response-and-predictor-variables/" target="_blank">response variable</a>!</p>
<p>To do this, I’ll create a worksheet in Minitab statistical software that has 100 columns, each of which contains 30 rows of entirely random data. In Minitab, you can use <strong>Calc > Random Data > Normal</strong> to create your own worksheet with random data, or you can use <a href="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/File/c740effad4cc27dc6580093ea6c070fd/randomdata.mtw">this worksheet</a> that I created for the data mining example below. (If you don’t have Minitab and want to try this out, <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">get the free 30 day trial!</a>)</p>
<p>Next, I’ll perform <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-smackdown-stepwise-versus-best-subsets" target="_blank">stepwise regression</a> using column 1 as the response variable and the other 99 columns as the potential predictor variables. This scenario produces a situation where stepwise regression is forced to dredge through 99 variables to see what sticks, which is a key characteristic of data mining.</p>
<p>When I perform stepwise regression, the procedure adds 28 variables that explain 100% of the variance! Because we only have 30 observations, we’re clearly overfitting the model. Overfitting the model is a different problem that also inflates R-squared, which you can read about in my post about <a href="http://blog.minitab.com/blog/adventures-in-statistics/the-danger-of-overfitting-regression-models" target="_blank">the dangers of overfitting models</a>.</p>
<p>I’m specifically addressing the problems of data mining in this post, so I don’t want a model that is also overfit. To avoid an overfit model, a good rule of thumb is to include no more than one term for each 10 observations. We have 30 observations, so I’ll include only the first three variables that the stepwise procedure adds to the model: C7, C77, and C95. The output for the first three steps is below.</p>
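<p>The same experiment can be sketched in plain Python. This is a simplified greedy forward selection on R-squared, not Minitab's actual stepwise procedure (which uses alpha-to-enter and alpha-to-remove tests), but it demonstrates the same effect: R-squared climbs steadily even though both sides are pure noise.</p>

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(30, 99))   # 99 candidate predictors, 30 rows of noise
y = rng.normal(size=30)         # response: also pure noise

def r_squared(X_sub, y):
    # R-squared from an ordinary least-squares fit with an intercept.
    A = np.column_stack([np.ones(len(y)), X_sub])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

selected, history = [], []
for _ in range(3):  # rule of thumb: about one term per 10 observations
    best = max((j for j in range(99) if j not in selected),
               key=lambda j: r_squared(X[:, selected + [j]], y))
    selected.append(best)
    history.append(r_squared(X[:, selected], y))

print(selected, [round(r, 3) for r in history])  # R-squared climbs on noise
```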
<p style="margin-left: 40px;"><img alt="Stepwise regression output" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/e4fb01237dd0c8b34496dde3cc28b517/stepwise_swo.png" style="width: 498px; height: 251px;" /></p>
<p>Under step 3, we can see that all of the <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-interpret-regression-analysis-results-p-values-and-coefficients" target="_blank">coefficient p-values</a> are statistically significant. The <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit" target="_blank">R-squared</a> value of 67.54% can either be good or mediocre depending on your field of study. In a real study, there are likely to be some real effects mixed in that would boost the R-squared even higher. We can also look at <a href="http://blog.minitab.com/blog/adventures-in-statistics/multiple-regession-analysis-use-adjusted-r-squared-and-predicted-r-squared-to-include-the-correct-number-of-variables" target="_blank">the adjusted and predicted R-squared values</a> and neither one suggests a problem.</p>
<p>If we look at the model building process of steps 1 - 3, we see that at each step all of the R-squared values increase. That’s what we like to see. For good measure, let’s graph the relationship between the predictor (C7) and the response (C1). After all, seeing is believing, right?</p>
<p style="margin-left: 40px;"><img alt="Scatterplot of two variables in regression model" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/6e4dfb991b33031738756d4b2d1c77e4/scatterplot.png" style="width: 576px; height: 384px;" /></p>
<p>This graph looks good too! It sure appears that as C7 increases, C1 tends to increase, which agrees with the positive <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-interpret-regression-analysis-results-p-values-and-coefficients" target="_blank">regression coefficient</a> in the output. If we didn’t know better, we’d think that we have a good model!</p>
<p>This example answers the question posed at the beginning: what could possibly be wrong with this approach? Data mining can produce deceptive results. The statistics and graph all look good but these results are based on entirely random data with absolutely no real effects. Our regression model suggests that random data explain other random data even though that's impossible. Everything looks great but we have a lousy model.</p>
The problems associated with using data mining are real, but how the heck do they happen? And, how do you avoid them? <a href="http://blog.minitab.com/blog/adventures-in-statistics/problems-using-data-mining-to-build-regression-models-part-two">Read my next post</a> to learn the answers to these questions!ANOVAData AnalysisRegression AnalysisStatisticsStatistics HelpStatsWed, 21 Sep 2016 12:00:00 +0000http://blog.minitab.com/blog/adventures-in-statistics/problems-using-data-mining-to-build-regression-modelsJim FrostWhen to Use a Pareto Chart
http://blog.minitab.com/blog/understanding-statistics/when-to-use-a-pareto-chart
<p>I confess: I'm not a natural-born decision-maker. Some people—my wife, for example—can assess even very complex situations, consider the options, and confidently choose a way forward. Me? I get anxious about deciding what to eat for lunch. So you can imagine what it used to <span style="line-height: 1.6;">be like when I needed to confront a really big decision or problem. My approach, to paraphrase the Byrds, was "Re: everything, churn, churn, churn."<img alt="question to answer" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1b29ab96a420030f3551f71a26773259/question.jpg" style="width: 250px; height: 181px; margin: 10px 15px; float: right;" /></span></p>
<p>Thank heavens for Pareto charts.</p>
What Is a Pareto Chart, and How Do You Use It?
<p>A Pareto chart is a basic quality tool that helps you identify the most frequent defects, complaints, or any other factor you can <strong>count </strong>and <strong>categorize</strong>. The chart takes its name from Vilfredo Pareto, originator of the "80/20 rule," which postulates that, roughly speaking, 20 percent of the people own 80 percent of the wealth. Or, in quality terms, 80 percent of the losses come from 20 percent of the causes.</p>
<p><span style="line-height: 20.8px;">You can use a Pareto chart any time you have data that are broken down into categories, and you can count how often each category occurs. As children, most of us learned how to use this kind of data to make a bar chart:</span></p>
<p style="margin-left: 40px;"><img alt="bar chart" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/90e6067d7f0a1f4f738462290a05f439/bar_chart.png" style="width: 576px; height: 384px;" /></p>
<p>A Pareto chart is just a bar chart that arranges the bars (counts) from largest to smallest, from left to right. The categories or factors symbolized by the bigger bars on the left are more important than those on the right.</p>
<p style="margin-left: 40px;"><img alt="Pareto Chart" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/bf0be8506cc30954165e854f24f0ed7d/pareto.png" style="width: 576px; height: 384px;" /></p>
<p>By ordering the bars from largest to smallest, a Pareto chart helps you visualize which factors comprise the 20 percent that are most critical—the "vital few"—and which are the "trivial many."</p>
<p>A cumulative percentage line helps you judge the added contribution of each category. If a Pareto effect exists, the cumulative line rises steeply for the first few defect types and then levels off. In cases where the bars are approximately the same height, the cumulative percentage line makes it easier to compare categories.</p>
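<p>With counted, categorized data, picking out the "vital few" from the cumulative percentages takes only a few lines of code. The category names and counts below are hypothetical, and the 80% cutoff is just the usual rule of thumb:</p>

```python
# Hypothetical defect counts; any counted, categorized data would do.
counts = {"scratches": 95, "dents": 45, "misprints": 20,
          "chips": 12, "smudges": 8}

ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
total = sum(counts.values())

vital_few, running = [], 0
for category, n in ordered:
    vital_few.append(category)
    running += n
    if running / total >= 0.80:   # stop once ~80% is accounted for
        break

print(vital_few)  # → ['scratches', 'dents', 'misprints']
```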
<p>It's common sense to focus on the "vital few" factors. In the quality improvement arena, Pareto charts help teams direct their efforts where they can make the biggest impact. By taking a big problem and breaking it down into smaller pieces, a Pareto chart reveals where our efforts will create the most improvement.</p>
<p>If a Pareto chart seems rather basic, well, it is. But like a simple machine, its very simplicity makes the Pareto chart applicable to a very wide range of situations, both within and beyond quality improvement.</p>
Use a Pareto Chart Early in Your Quality Improvement Process
<p>At the leadership or management level, Pareto charts can be used at the start of a new round of quality improvement to figure out what business problems are responsible for the most complaints or losses, and dedicate improvement resources to those. Collecting and examining data like that can often result in surprises and upend an organization's "conventional wisdom." For example, leaders at one company believed that the majority of customer complaints involved product defects. But when they saw the complaint data in a Pareto chart, it showed that many more people complained about shipping delays. Perhaps the impression that defects caused the most complaints arose because the relatively few people who received defective products tended to complain very loudly—but since more customers were affected by shipping delays, the company's energy was better devoted to solving that problem.</p>
Use a Pareto Chart Later in Your Quality Improvement Process
<p>Once a project has been identified, and a team assembled to improve the problem, a Pareto chart can help the team select the appropriate areas to focus on. This is important because most business problems are big and multifaceted. For instance, shipping delays may occur for a wide variety of reasons, from mechanical breakdowns and accidents to data-entry mistakes and supplier issues. If there are many possible causes a team could focus on, it's smart to collect data about which categories account for the biggest number of incidents. That way, the team can choose a direction based on the numbers and not the team's "gut feeling."</p>
Use a Pareto Chart to Build Consensus
<p>Pareto charts also can be very helpful in resolving conflicts, particularly if a project involves many moving parts or crosses over many different units or work functions. Team members may have sharp disagreements about how to proceed, either because they wish to defend their own departments or because they honestly believe they <em>know </em>where the problem lies. For example, a hospital project improvement team was stymied in reducing operating room delays because the anesthesiologists blamed the surgeons, while the surgeons blamed the anesthesiologists. When the project team collected data and displayed it in a Pareto chart, it turned out that neither group accounted for a large proportion of the delays, and the team was able to stop finger-pointing. Even if the chart had indicated that one group or the other was involved in a significantly greater proportion of incidents, helping the team members see which types of delays were most 'vital' could be used to build consensus.</p>
Use Pareto Charts Outside of Quality Improvement Projects
<p>Their simplicity also makes <span><a href="http://blog.minitab.com/blog/real-world-quality-improvement/pareto-chart-power">Pareto charts</a> a valuable tool for making decisions beyond the world of quality improvement. By helping you visualize the relative importance of various categories, you can use them to prioritize customer needs, opportunities for training or investment—even your choices for lunch.</span></p>
How to Create a Pareto Chart
<p>Creating a Pareto chart is not difficult, even without statistical software. Of course, if you're using <a href="http://www.minitab.com/products/minitab/">Minitab</a>, the software will do all this for you automatically—create a Pareto chart by selecting <strong style="line-height: 1.6;">Stat > Quality Tools > Pareto Chart...</strong> or by selecting <strong style="line-height: 1.6;">Assistant > Graphical Analysis > Pareto Chart</strong>. You can collect raw data, in which each observation is recorded in a separate row of your worksheet, or summary data, in which you tally observation counts for each category.</p>
<p><strong>1. Gather Raw Data about Your Problem</strong></p>
<p>Be sure you collect a random sample that fully represents your process. For example, if you are counting the number of items returned to an electronics store in a given month, and you have multiple locations, you should not gather data from just one store and use it to make decisions about all locations. (If you want to compare the most important defects for different stores, you can show separate charts for each one side-by-side.)</p>
<p><strong>2. Tally Your Data</strong></p>
<p>Add up the observations in each of your categories.</p>
<p><strong>3. Label your horizontal and vertical axes.</strong></p>
<p>Divide the horizontal axis into equal-width segments, one per category, and label the categories in order from largest to smallest. On the vertical axis, use round numbers that slightly exceed your top category count, and include your measurement unit.</p>
<p><strong>4. Draw your category bars.</strong></p>
<p>Using your vertical axis, draw bars for each category that correspond to their respective counts. Keep the width of each bar the same.</p>
<p><strong>5. Add cumulative counts and lines.</strong></p>
<p>As a final step, you can list the cumulative counts along the horizontal axis and draw a cumulative line over the top of your bars. Each category's cumulative count is the count for that category PLUS the total count of the preceding categories. If you want to add a line, draw a right axis and label it from 0 to 100%, lined up with the grand total on the left axis. Above the right edge of each category, mark a point at the cumulative total, then connect the points.</p>
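<p>The tallying and cumulative arithmetic in steps 2 through 5 can be sketched in Python (the categories and counts are made up for illustration):</p>

```python
# Step 2: tallied counts per category (hypothetical data).
counts = {"shipping delay": 60, "defect": 25, "billing error": 10, "other": 5}

# Step 3: order categories from largest to smallest.
ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

# Step 5: each cumulative count is this category's count plus all before it.
total = sum(counts.values())
cumulative, running = [], 0
for category, n in ordered:
    running += n
    cumulative.append((category, n, running, round(100 * running / total, 1)))

for row in cumulative:
    print(row)
# The last cumulative percentage is always 100%.
```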
Data AnalysisLean Six SigmaProject ToolsQuality ImprovementStatisticsWed, 14 Sep 2016 12:00:00 +0000http://blog.minitab.com/blog/understanding-statistics/when-to-use-a-pareto-chartEston MartzControl Chart Tutorials and Examples
http://blog.minitab.com/blog/understanding-statistics/control-chart-tutorials-and-examples
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3989007af54bf1e996aeee86c8cec497/control_chart_wow.jpg" style="border-width: 1px; border-style: solid; margin: 10px 15px; float: right; width: 288px; height: 173px;" />The other day I was talking with a friend about control charts, and I wanted to share an example one of my colleagues wrote on the Minitab Blog. Looking back through the index for "control charts" reminded me just how much material we've published on this topic.</p>
<p>Whether you're just getting started with control charts, or you're an old hand at statistical process control, you'll find some valuable information and food for thought in our control-chart related posts. </p>
Different Types of Control Charts
<p>One of the first things you learn in statistics is that when it comes to data, there's no one-size-fits-all approach. To get the most useful and reliable information from your analysis, you need to select the type of method that best suits the type of data you have.</p>
<p>The same is true with control charts. While there are a few charts that are used very frequently, a wide range of options is available, and selecting the right chart can make the difference between actionable information and false (or missed) alarms.</p>
<p><a href="http://blog.minitab.com/blog/understanding-statistics/what-control-chart-should-i-use">What Control Chart Should I Use?</a> offers a brief overview of the most common charts and a discussion of how to use the Assistant to help you choose the right one for your situation. And if you're a control chart neophyte and you want more background on why we use them, check out <a href="http://blog.minitab.com/blog/understanding-statistics/control-charts-show-you-variation-that-matters" itemprop="url">Control Charts Show You Variation that Matters.</a></p>
<p itemprop="headline">We extol the virtues of a less commonly used chart in <a href="http://blog.minitab.com/blog/fun-with-statistics/an-ode-to-the-ewma-control-chart" itemprop="url">Beyond the "Regular Guy" Control Charts: An Ode to the EWMA Chart</a>, and explain how to use control charts to track rare events in <a href="http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/using-g-whiz-charts-to-track-elusive-affirmations-from-almost-adolescents" itemprop="url">Using G-Whiz Charts to Track Elusive Affirmations from Almost Adolescents</a>.</p>
<p itemprop="headline">In <a href="http://blog.minitab.com/blog/adventures-in-software-development/the-laney-p-chart-and-minitab-software-development" itemprop="url">Using the Laney P' Control Chart in Minitab Software Development</a>, Dawn Keller discusses the distinction between P' charts and their cousins, described by Tammy Serensits in <a href="http://blog.minitab.com/blog/the-statistics-of-science/p-and-u-charts-and-limburger-cheese-a-smelly-combination" itemprop="url">P and U Charts and Limburger Cheese: A Smelly Combination</a>.</p>
<p itemprop="headline">And it's good to remember that things aren't always as complicated as they seem, and sometimes a simple solution can be just as effective as a more complicated approach. See why in <a href="http://blog.minitab.com/blog/understanding-statistics/take-it-easy-create-a-run-chart" itemprop="url">Take It Easy: Create a Run Chart. </a></p>
Control Chart Tutorials
<p itemprop="headline">Many of our Minitab bloggers have talked about the process of choosing, creating, and interpreting control charts under specific conditions. If you have data that can't be collected in subgroups, you may want to learn about <a href="http://blog.minitab.com/blog/understanding-statistics/how-create-and-read-an-i-mr-control-chart" itemprop="url">How to Create and Read an I-MR Control Chart</a>. </p>
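<p>For reference, the control limits behind an I-MR chart follow the standard textbook formulas: the individuals limits sit at the mean plus or minus 2.66 times the average moving range, and the moving-range upper limit is 3.267 times the average moving range (the constants come from d<sub>2</sub> = 1.128 for moving ranges of size 2). This is a minimal sketch of those formulas, not Minitab's implementation:</p>

```python
import numpy as np

def imr_limits(x):
    """Individuals and moving-range chart limits (subgroup size 1)."""
    x = np.asarray(x, dtype=float)
    mr = np.abs(np.diff(x))          # moving ranges of consecutive points
    mr_bar = mr.mean()
    center = x.mean()
    return {
        "I":  (center - 2.660 * mr_bar, center, center + 2.660 * mr_bar),
        "MR": (0.0, mr_bar, 3.267 * mr_bar),
    }

limits = imr_limits([10.2, 9.8, 10.5, 10.1, 9.9, 10.3, 10.0])
```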
<p itemprop="headline">If you do have data collected in subgroups, you'll want to understand why, when it comes to <a href="http://blog.minitab.com/blog/michelle-paret/control-charts-subgroup-size-matters" itemprop="url">Control Charts, Subgroup Size Matters</a>.</p>
<p itemprop="headline">It's often useful to look at control chart data in calendar-based increments, and taking the monthly approach is discussed in the series <a href="http://blog.minitab.com/blog/understanding-statistics/creating-a-chart-to-compare-month-to-month-change" itemprop="url">Creating a Chart to Compare Month-to-Month Change</a> and <a href="http://blog.minitab.com/blog/understanding-statistics/creating-charts-to-compare-month-to-month-change-part-2" itemprop="url">Creating Charts to Compare Month-to-Month Change, part 2</a>.</p>
<p itemprop="headline">If you want to see the difference your process improvements have made, check out <a href="http://blog.minitab.com/blog/real-world-quality-improvement/analyzing-a-process-before-and-after-improvement-historical-control-charts-with-stages" itemprop="url">Analyzing a Process Before and After Improvement: Historical Control Charts with Stages</a> and <a href="http://blog.minitab.com/blog/starting-out-with-statistical-software/setting-the-stage-accounting-for-process-changes-in-a-control-chart" itemprop="url">Setting the Stage: Accounting for Process Changes in a Control Chart</a>. </p>
<p itemprop="headline">While the basic idea of control charting is very simple, interpreting real-world control charts can be a little tricky. If you're using <a href="http://www.minitab.com/products/minitab">Minitab 17</a>, be sure to check out this post about a great new feature in the Assistant: <a href="http://blog.minitab.com/blog/statistics-and-quality-improvement/the-stability-report-for-control-charts-in-minitab-17-includes-example-patterns" itemprop="url">The Stability Report for Control Charts in Minitab 17 includes Example Patterns.</a></p>
<p itemprop="headline">Finally, one of our expert statistical trainers offers his suggestions about <a href="http://blog.minitab.com/blog/applying-statistics-in-quality-projects/five-ways-to-make-your-control-charts-more-effective" itemprop="url">Five Ways to Make Your Control Charts More Effective</a>.</p>
Control Chart Examples
<p itemprop="headline">Control charts are most frequently used for quality improvement and assurance, but they can be applied to almost any situation that involves variation.</p>
<p itemprop="headline">My favorite example of applying the lessons of quality improvement in business to your personal life involves Bill Howell, who applied his Six Sigma expertise to the (successful) management of his diabetes. Find out how he uses <a href="http://blog.minitab.com/blog/real-world-quality-improvement/control-charts-keep-blood-sugar-in-check" itemprop="url">Control Charts to Keep Blood Sugar in Check</a>.</p>
<p itemprop="headline">Some of our bloggers have applied control charts to their personal passions, including holiday candies in <a href="http://blog.minitab.com/blog/real-world-quality-improvement/control-charts-rational-subgrouping-and-marshmallow-peeps" itemprop="url">Control Charts: Rational Subgrouping and Marshmallow Peeps!</a> and bicycling in <a href="http://blog.minitab.com/blog/statistics-for-lean-six-sigma/the-problem-with-p-charts-out-of-control-cycle-laneys" itemprop="url">The Problem With P-Charts: Out-of-control Cycle LaneYs!</a>.</p>
<p itemprop="headline">If you're into sports, see how control charts can reveal <a href="http://blog.minitab.com/blog/the-statistical-mentor/when-should-nhl-goalies-get-pulled" itemprop="url">When NHL Goalies </a><a href="http://blog.minitab.com/blog/the-statistical-mentor/when-should-nhl-goalies-get-pulled" itemprop="url">Should </a><a href="http://blog.minitab.com/blog/the-statistical-mentor/when-should-nhl-goalies-get-pulled" itemprop="url">Get Pulled.</a> Or look to the cosmos to consider <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/signal-to-noise-detecting-extraterrestrials-and-special-causes" itemprop="url">Signal to Noise: Detecting Extraterrestrials and Special Causes</a>. And finally, compulsive readers like myself might be interested to see how relevant control charts are to literature, too, as Cody Steele illustrates in <a href="http://blog.minitab.com/blog/statistics-and-quality-improvement/laney-p-prime-charts-show-how-poe-creates-intensity-in-the-fall-of-the-house-of-usher" itemprop="url">Laney P' Charts Show How Poe Creates Intensity in "The Fall of the House of Usher."</a></p>
<p itemprop="headline">How are <em>you </em>using control charts?</p>
Quality ImprovementMon, 12 Sep 2016 12:00:00 +0000http://blog.minitab.com/blog/understanding-statistics/control-chart-tutorials-and-examplesEston MartzCreating Value from Your Data
http://blog.minitab.com/blog/applying-statistics-in-quality-projects/creating-value-from-your-data
<p>There may be huge potential benefits waiting in the data on your servers, and these data can be put to many different uses. Better data allows better decisions, of course. Banks, insurance firms, and telecom companies already own a large amount of data about their customers, and these resources are useful for building a more personal relationship with each customer.</p>
<p>Some organizations already use data from agricultural fields to build complex, customized models based on a very large number of input variables (soil characteristics, weather, plant types, etc.) in order to improve crop yields. Airline companies and large hotel chains use dynamic pricing models to improve their yield management. Data is increasingly referred to as the new “gold mine” of the 21st century.</p>
<p>Three factors underlie the rising prominence of data (and, therefore, data analysis):</p>
<p><img alt="Afficher l'image d'origine" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/File/de034e63187d191e1666721fa12a8880/de034e63187d191e1666721fa12a8880.png" style="width: 283px; height: 212px; margin: 10px 15px; float: right;" /></p>
Huge volumes of data
<p><span style="line-height: 1.6;">Data acquisition has never been easier (sensors in manufacturing plants, sensors in connected objects, data from internet usage and web clicks, from credit cards, loyalty cards, Customer Relationship Management databases, satellite images, etc.), and data can be stored at costs that are lower than ever before, thanks to the huge storage capacity now available in the cloud and elsewhere. The amount of data being collected is not only huge, it is growing exponentially.</span></p>
Unprecedented velocity
<p>Connected devices, like our smartphones, provide data in almost real time, and it can be processed very quickly. It is now possible to react to changes almost immediately.</p>
Incredible variety
<p>The data collected need not be restricted to billing information; every source of data is potentially valuable for a business. Numeric data is being collected on a massive scale, but so is unstructured data such as videos and pictures, in a wide variety of situations.</p>
<p>But the explosion of data available to us is prompting every business to wrestle with an extremely complicated problem:</p>
How can we create value from these resources?
<p>Very simple methods, such as counting the words used in queries submitted to company web sites, provide good insight into the general mood of your customers and how it is changing. Simple statistical correlations are often used by web vendors to suggest a related purchase just after a customer buys a product. Very simple descriptive statistics are also useful.</p>
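<p>The word-counting idea really is as simple as it sounds; Python's standard library does it in a couple of lines (the queries here are invented):</p>

```python
from collections import Counter

queries = ["refund not received", "late delivery", "refund status",
           "delivery tracking broken", "where is my refund"]

# Tally every word across all queries; the most common words hint
# at what is on customers' minds.
word_counts = Counter(word for q in queries for word in q.split())
print(word_counts.most_common(3))  # "refund" tops the list
```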
<p>Just imagine what could be achieved with advanced regression models or powerful multivariate statistical techniques, which can be applied easily with <a href="http://www.minitab.com/products/minitab/">statistical software packages like Minitab</a>.</p>
A simple example of the benefits of analyzing an enormous database
<p>Let's consider an example of how one company benefited from analyzing a very large database.</p>
<p><span style="line-height: 20.8px;">Many steps are needed (security and safety checks, cleaning the cabin, etc.) before a plane can depart.</span><span style="line-height: 20.8px;"> Since d</span><span style="line-height: 20.8px;">elays negatively impact customer perceptions and also affect productivity, a</span><span style="line-height: 1.6;">irline companies routinely collect a very large amount of data related to flight delays and times required to perform tasks before departure. Some times are automatically collected, others are manually recorded.</span></p>
<p>A major worldwide airline company intended to use this data to identify the crucial milestones among a very large number of preparation steps, and which ones often triggered delays in departure times. The company used Minitab's <span><a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-smackdown-stepwise-versus-best-subsets">stepwise regression analysis</a></span> to quickly focus on the few variables that played a major role among a large number of potential inputs. Many variables turned out to be statistically significant, but two among them clearly seemed to make a major contribution (X6 and X10).</p>
<pre style="margin-left: 40px;">
Analysis of Variance

Source  DF  Seq SS  <strong><span style="color: rgb(0, 0, 128);">Contribution</span></strong>  Adj SS   Adj MS  F-Value  P-Value
  X6     1  337394        <strong><span style="color: rgb(0, 0, 128);">53.54%</span></strong>    2512   2512.2    29.21    0.000
  X10    1  112911        <strong><span style="color: rgb(0, 0, 128);">17.92%</span></strong>   66357  66357.1   771.46    0.000
</pre>
<p>When huge databases are used, statistical analyses may become overly sensitive and <a href="http://blog.minitab.com/blog/the-stats-cat/sample-size-statistical-power-and-the-revenge-of-the-zombie-salmon-the-stats-cat">detect even very small differences</a> (due to the large sample and power of the analysis). P values often tend to be quite small (p < 0.05) for a large number of predictors.</p>
<p>However, in Minitab, if you click Results in the regression dialog box and select Expanded tables, the contribution of each variable is displayed. X6 and X10, considered together, contributed more than 70% of the overall variability (with the largest F-values by far); the contributions of the remaining factors were much smaller. The airline then ran a residual analysis to cross-validate the final model. </p>
<p>In addition, a Principal Component Analysis (<a href="http://blog.minitab.com/blog/applying-statistics-in-quality-projects/use-statistics-to-better-understand-your-customers">PCA, a multivariate technique</a>) was performed in Minitab to describe the relations between the most important predictors and the response. Milestones were expected to be strongly correlated to the subsequent steps.</p>
<p style="margin-left: 40px;"><img src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/File/c023d71140ea4ee2b5b22480712a55a4/c023d71140ea4ee2b5b22480712a55a4.png" /></p>
<p>The graph above is a loading plot from a principal component analysis. Lines that point in the same direction and lie close to one another indicate variables that are strongly correlated and can therefore be grouped together.</p>
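<p>For the curious, the loadings behind a plot like this are the eigenvectors of the correlation matrix scaled by the square roots of their eigenvalues, which makes each loading the correlation between a variable and a component. Here is a NumPy-only sketch on synthetic data; since the airline's data isn't public, two made-up latent "delay sources" drive six observed steps:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in: two latent delay sources driving six observed steps.
latent = rng.normal(size=(200, 2))
noise = 0.3 * rng.normal(size=(200, 6))
X = latent @ rng.normal(size=(2, 6)) + noise

# PCA on the correlation matrix: standardize, then eigendecompose.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
corr = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]            # largest components first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Loadings: correlations between each variable and each component.
loadings = eigvecs * np.sqrt(eigvals)

# Variables whose first-two-component loadings point the same way are
# the ones a loading plot draws close together.
print(loadings[:, :2].round(2))
```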
<p>A group of nine variables turned out to be strongly correlated to the most important inputs (X6 and X10) and to the final delay times (Y). Delays at the X6 stage obviously affected the X7 and X8 stages (subsequent operations), and delays from X10 affected the subsequent X11 and X12 operations.</p>
Conclusion
<p>This analysis provided simple rules that this airline's crews can follow in order to avoid delays, making passengers' next flight more pleasant. </p>
<p>The airline can repeat this analysis periodically to search for the next most important causes of delays. Such an approach can propel innovation and help organizations replace traditional and intuitive decision-making methods with data-driven ones.</p>
<p>What's more, the use of data to make things better is not restricted to the corporate world. More and more public administrations and non-governmental organizations are making large, open databases easily accessible to communities and to virtually anyone. </p>
ANOVAData AnalysisHypothesis TestingRegression AnalysisStatisticsStatistics in the NewsTue, 06 Sep 2016 13:19:00 +0000http://blog.minitab.com/blog/applying-statistics-in-quality-projects/creating-value-from-your-dataBruno Scibilia