Data Analysis Software | Minitab
Blog posts and articles with tips for using statistical software to analyze data for quality improvement.
http://blog.minitab.com/blog/data-analysis-software/rss
Fri, 20 Jan 2017 15:56:29 +0000
DMAIC Tools and Techniques: The Measure Phase
http://blog.minitab.com/blog/michelle-paret/dmaic-tools-and-techniques%3A-the-measure-phase
<p>In my last post on <a href="http://blog.minitab.com/blog/michelle-paret/dmaic-tools-and-techniques:-the-define-phase">DMAIC tools for the Define phase</a>, we reviewed various graphs and stats typically used to <em>define</em> project goals and customer deliverables. Let’s now move along to the tools you can use in <a href="http://www.minitab.com/products/minitab/">Minitab Statistical Software</a> to conduct the Measure phase.</p>
Measure Phase Methodology
<p>The goal of this phase is to <em>measure</em> the process to determine its current performance and quantify the problem. This includes validating the measurement system and establishing a baseline process capability (i.e., sigma level).</p>
I. Tools for Continuous Data
<strong><img alt="Gage RandR" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/6ff6ed7f4c0940a9eb1a548487b72b2b/gagerr.jpg" style="width: 350px; height: 263px; float: right; margin: 10px 15px;" /></strong>
Gage R&R
<p>Before you analyze your data, you should first make sure you can trust it, which is why successful Lean Six Sigma projects begin the Measure phase with Gage R&R. This measurement systems analysis tool assesses if measurements are both <a href="http://blog.minitab.com/blog/michelle-paret/do-you-know-the-truth-about-gage-repeatability-and-reproducibility">repeatable and reproducible</a>. And there are Gage R&R studies available in Minitab for both <a href="http://blog.minitab.com/blog/michelle-paret/a-simple-guide-to-gage-randr-for-destructive-testing">destructive and non-destructive tests</a>.</p>
<p>Minitab location:<strong> </strong><strong><em>Stat > Quality Tools > Gage Study > Gage R&R Study</em></strong> OR <strong><em>Assistant > Measurement Systems Analysis</em>.</strong></p>
Gage Linearity and Bias
<p>When assessing the validity of our data, we need to consider both <a href="http://blog.minitab.com/blog/real-world-quality-improvement/accuracy-vs-precision-whats-the-difference">precision and accuracy</a>. While Gage R&R assesses precision, it’s Gage Linearity and Bias that tells us if our measurements are accurate or are biased.</p>
<p>Minitab location: <em><strong>Stat > Quality Tools > Gage Study > Gage Linearity and Bias Study</strong>.</em></p>
<p style="margin-left: 40px;"><img alt="Gage Linearity and Bias" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/85a16583e2d97dd638b6ff21071a61dd/gage_linearity_and_bias.jpg" style="width: 350px; height: 263px;" /></p>
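<p>To make the precision-versus-accuracy distinction concrete, here is a minimal Python sketch rather than Minitab's own calculation. The reference value and the readings are hypothetical, and the sketch covers bias at a single reference value only; a full linearity study would repeat this across the measurement range.</p>

```python
from statistics import mean, stdev

# Hypothetical data: one reference part with a known value,
# measured five times on the same gage.
reference_value = 10.00
measurements = [10.02, 10.05, 10.03, 10.04, 10.01]

# Accuracy: the systematic offset between the average reading and the reference.
bias = mean(measurements) - reference_value

# Precision: how tightly the repeated readings cluster around their own average.
spread = stdev(measurements)

print(f"bias = {bias:.3f}, spread = {spread:.4f}")
```

<p>A gage can have a small spread and still carry a large bias, which is why both checks are needed.</p>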
Distribution Identification
<p>Many statistical tools and p-values assume that your data follow a specific distribution, commonly the normal distribution, so it’s good practice to assess the distribution of your data before analyzing it. And if your data don’t follow a normal distribution, do not fear as there are various <a href="http://www.minitab.com/en-us/lp/Non-Normal-Data-Tips-And-Tricks">techniques for analyzing non-normal data</a>.</p>
<p>Minitab location: <strong><em>Stat > Basic Statistics > Normality Test</em></strong> OR <strong><em>Stat > Quality Tools > Individual Distribution Identification.</em></strong></p>
<p style="margin-left: 40px;"><img alt="Distribution Identification" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/1e6b3763f36f991cf5cf1eb142b0f8d0/distribution_id_plot.jpg" style="width: 350px; height: 233px;" /></p>
Capability Analysis
<p>Capability analysis is arguably the crux of “Six Sigma” because it’s the tool for calculating your sigma level. Is your process at a 1 Sigma, 2 Sigma, etc.? It reveals just how good or bad a process is relative to specification limit(s). And in the Measure phase, it’s important to use this tool to establish a baseline before making any improvements.</p>
<p>Minitab location: <strong><em>Stat > Quality Tools > Capability Analysis/Sixpack</em><em> </em></strong>OR <strong><em>Assistant > Capability Analysis.</em></strong></p>
<p style="margin-left: 40px;"><img alt="Process Capability Analysis" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/7f8e9183ad3a5b3ee66e0fadca51aea4/process_capability_sixpack_report.jpg" style="width: 350px; height: 263px;" /></p>
II. Tools for Categorical (Attribute) Data
Attribute Agreement Analysis
<strong><img alt="Attribute Agreement Analysis" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/5d1759e9ef4da886e677bb7c2a7b2c79/attribute_agreement_analysis.jpg" style="width: 300px; height: 233px; float: right; margin: 10px 15px;" /></strong>
<p>Like Gage R&R and Gage Linearity and Bias studies mentioned above for continuous measurements, this tool helps you <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/the-lady-tasting-beer-evaluating-a-gono-go-gage-part-ii">assess if you can trust categorical measurements</a>, such as pass/fail ratings. This tool is available for <a href="http://blog.minitab.com/blog/adventures-in-statistics-2/understanding-and-using-discrete-distributions">binary, ordinal, and nominal data types</a>.</p>
<p>Minitab location: <strong><em>Stat > Quality Tools > Attribute Agreement Analysis</em> </strong>OR <strong><em>Assistant > Measurement Systems Analysis.</em></strong></p>
Capability Analysis (Binomial and Poisson)
<p>If you’re counting the number of defective items, where each item is classified as either pass/fail, go/no-go, etc., and you want to compute parts per million (PPM) defective, then you can use binomial capability analysis to assess the current state of the process.</p>
<p>Or if you’re counting the number of defects, where each item can have multiple flaws, then you can use Poisson capability analysis to establish your baseline performance.</p>
<p>Minitab location:<em> <strong>Stat > Quality Tools > Capability Analysis</strong></em> OR <strong><em>Assistant > Capability Analysis.</em></strong></p>
<p style="margin-left: 40px;"><img alt="Binomial Process Capability" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/4aad5a79836d8105d3adba60388b16b1/binomial_process_capability.jpg" style="width: 350px; height: 263px;" /></p>
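<p>The difference between counting defectives and counting defects comes down to simple arithmetic. The sketch below uses hypothetical counts and plain Python; it shows only the headline rates, not the full Minitab capability report.</p>

```python
def ppm_defective(defectives, inspected):
    """Binomial view: each inspected item is either defective or not."""
    return defectives * 1_000_000 / inspected


def defects_per_unit(defects, units):
    """Poisson view: a single unit can carry several defects."""
    return defects / units


# Hypothetical counts, for illustration only.
print(ppm_defective(defectives=12, inspected=4000))  # 3000.0
print(defects_per_unit(defects=45, units=300))       # 0.15
```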
Variation is Everywhere
<p>As I mentioned in my last post on the Define phase, Six Sigma projects can vary. Not every project uses the same tool set, so the tools above merely serve as a guide to the types of analyses you may need to use. And there are other tools to consider, such as flowcharts to map the process, which you can complete using Minitab’s cousin, <a href="http://www.minitab.com/products/quality-companion/">Quality Companion</a>.</p>
Capability Analysis, Data Analysis, Lean Six Sigma, Project Tools, Quality Improvement, Six Sigma, Statistics, Stats
Wed, 18 Jan 2017 13:00:00 +0000
http://blog.minitab.com/blog/michelle-paret/dmaic-tools-and-techniques%3A-the-measure-phase
Michelle Paret
How to Use Data to Understand and Resolve Differences in Opinion, Part 1
http://blog.minitab.com/blog/understanding-statistics/how-to-use-data-to-understand-and-resolve-differences-in-opinion-part-1
<p>People frequently have different opinions. Usually that's fine—if everybody thought the same way, life would be pretty boring—but many business decisions are based on opinion. And when different people in an organization reach different conclusions about the same business situation, problems follow. </p>
<img alt="difference of opinion" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ad85e799b88c440d589cfc6b82caef8f/honest_disagreement.png" style="width: 300px; height: 200px; margin: 10px 15px; float: right;" />
<div>
<p>Inconsistency and poor quality result when people being asked to make yes / no, pass / fail, and similar decisions don't share the same opinions, or base their decisions on divergent standards. Consider the following examples. </p>
<p style="margin-left: 40px;"><strong>Manufacturing:</strong> Is this part acceptable? </p>
<p style="margin-left: 40px;"><strong>Billing and Purchasing:</strong> Are we paying or charging an appropriate amount for this project? </p>
<p style="margin-left: 40px;"><strong>Lending:</strong> Does this person qualify for a new credit line? </p>
<p style="margin-left: 40px;"><strong>Supervising:</strong> Is this employee's performance satisfactory or unsatisfactory? </p>
<p style="margin-left: 40px;"><strong>Teaching:</strong> Are essays being graded consistently by teaching assistants?</p>
<p>It's easy to see how differences in judgment can have serious impacts. I wrote about a situation encountered by the recreational equipment manufacturer <a href="http://www.minitab.com/burley">Burley</a>. Pass/fail decisions of inspectors at a manufacturing facility in China began to conflict with those of inspectors at Burley's U.S. headquarters. To make sure no products reached the market unless the company's strict quality standards were met, Burley acted quickly to ensure that inspectors at both facilities were making consistent decisions about quality evaluations. </p>
Sometimes We <em>Can't </em>Just Agree to Disagree
<p>The challenge is that people can have honest differences of opinion about, well, nearly everything—including different aspects of quality. So how do you get people to make business decisions based on a common viewpoint, or standard?</p>
<p>Fortunately, there's a statistical tool that can help businesses and other organizations figure out how, where, and why people evaluate the same thing in different ways. From there, problematic inconsistencies can be minimized. Also, inspectors and others who need to make tough judgment calls can be confident they are basing their decisions on a clearly defined, agreed-upon set of standards. </p>
<p>That statistical tool is called "Attribute Agreement Analysis," and using it is easier than you might think—especially with <a href="http://www.minitab.com/products/minitab">data analysis software such as Minitab</a>. </p>
What Does "Attribute Agreement Analysis" Mean?
<p>Statistical terms can be confusing, but "attribute agreement analysis" is exactly what it sounds like: a tool that helps you gather and <em>analyze </em>data about how much <em>agreement </em>individuals have on a given <em>attribute</em>.</p>
<p>So, what is an attribute? Basically, any characteristic that entails a <span><a href="http://blog.minitab.com/blog/understanding-statistics/got-good-judgment-prove-it-with-attribute-agreement-analysis">judgment call</a></span>, or requires us to classify items as <em>this </em>or <em>that</em>. We can't measure an attribute with an objective scale like a ruler or thermometer. The following statements concern such attributes:</p>
<ul>
<li>This soup is <strong><a href="http://www.minitab.com/products/minitab/quick-start/soup/">spicy</a></strong>.</li>
<li>The bill for that repair is <strong>low</strong>. </li>
<li>That dress is <strong>red</strong>. </li>
<li>The carpet is <strong>rough</strong>. </li>
<li>That part is <strong>acceptable</strong>. </li>
<li>This candidate is <strong>unqualified</strong>. </li>
</ul>
<p>Attribute agreement analysis uses data to understand how different people assess a particular item's attribute, how consistently the same person assesses the same item on multiple occasions, and how both compare to the "right" assessment. </p>
<img alt="pass-fail" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/5868c6194234ef965e9320d10c7dab9e/pass_fail.png" style="width: 252px; height: 204px; margin: 10px 15px; float: right;" />
<p>This method can be applied to any situation where people need to appraise or rate things. In a typical quality improvement scenario, you might take a number of manufactured parts and ask multiple inspectors to assess each part more than once. The parts being inspected should include a roughly equal mix of good and bad items, which have been identified by an expert such as a senior inspector or supervisor. </p>
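<p>The three comparisons described above can be sketched in a few lines of Python. The parts, inspectors, and ratings here are hypothetical, and the sketch reports simple percent agreement only; Minitab's analysis also reports kappa statistics, which are not reproduced here.</p>

```python
# Hypothetical study: six parts, two inspectors, two trials each.
# standard[i] is the expert's verdict for part i.
standard = ["pass", "fail", "pass", "fail", "pass", "fail"]
ratings = {
    "Inspector A": [
        ["pass", "fail", "pass", "fail", "pass", "fail"],  # trial 1
        ["pass", "fail", "pass", "pass", "pass", "fail"],  # trial 2
    ],
    "Inspector B": [
        ["pass", "pass", "pass", "fail", "fail", "fail"],  # trial 1
        ["pass", "pass", "pass", "fail", "pass", "fail"],  # trial 2
    ],
}


def within_appraiser(trials):
    """Share of parts one appraiser rated identically on every trial."""
    per_part = list(zip(*trials))
    return sum(len(set(p)) == 1 for p in per_part) / len(per_part)


def versus_standard(trials, standard):
    """Share of parts where every trial matched the expert's verdict."""
    per_part = list(zip(*trials))
    hits = sum(all(r == s for r in p) for p, s in zip(per_part, standard))
    return hits / len(per_part)


def between_appraisers(ratings):
    """Share of parts on which all appraisers agreed across all trials."""
    all_trials = [t for trials in ratings.values() for t in trials]
    per_part = list(zip(*all_trials))
    return sum(len(set(p)) == 1 for p in per_part) / len(per_part)


for name, trials in ratings.items():
    print(name, within_appraiser(trials), versus_standard(trials, standard))
print("All appraisers:", between_appraisers(ratings))
```

<p>In this made-up data, each inspector is fairly consistent with themselves, yet the two inspectors agree with each other on only half of the parts, which is exactly the kind of gap the analysis is meant to expose.</p>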
<p>In my next post, we'll look at an example from the financial industry to see how a loan department used this statistical method to make sure that applications for loans were accepted or rejected appropriately and consistently. </p>
</div>
Data Analysis, Lean Six Sigma, Project Tools, Quality Improvement, Six Sigma, Statistics
Mon, 16 Jan 2017 13:00:00 +0000
http://blog.minitab.com/blog/understanding-statistics/how-to-use-data-to-understand-and-resolve-differences-in-opinion-part-1
Eston Martz
Statistical Tools for Process Validation, Stage 1: Process Design
http://blog.minitab.com/blog/michelle-paret/statistical-tools-for-process-validation%2C-stage-1%3A-process-design
<p>Process validation is vital to the success of companies that manufacture drugs and biological products for people and animals. According to the FDA guidelines published by the U.S. Department of Health and Human Services:<img alt="Process Validation Stages" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/26c294a2e9b5b993bfd0f571be11113d/processvalidationstages.jpg" style="width: 280px; height: 299px; float: right; margin: 10px 15px;" /></p>
<p style="margin-left: 40px;"><em>“Process validation is defined as the collection and evaluation of data, from the process design stage through commercial production, which establishes scientific evidence that a process is capable of consistently delivering quality product.”<br />
— Food and Drug Administration</em></p>
<p>The FDA recommends three stages for process validation. In this 3-part series, we will briefly explore the stage goals and the types of activities and statistical techniques typically conducted within each. For complete FDA guidelines, see <a href="http://www.fda.gov" target="_blank">www.fda.gov</a>. </p>
Stage 1: Process Design
<p>The goal of this stage is to design a process suitable for routine commercial manufacturing that can consistently deliver a product that meets its quality attributes. It is important to demonstrate an understanding of the process and characterize how it responds to various inputs within Process Design.</p>
Example: Identify Critical Process Parameters with DOE
<p>Suppose you need to identify the critical process parameters for an immediate-release tablet. There are three process input variables that you want to examine: filler%, disintegrant%, and particle size. You want to find which inputs and input settings will maximize the dissolution percentage at 30 minutes.</p>
<p>To conduct this analysis, you can use <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/design-of-experiment-doe:-searching-for-a-selfie-fountain-of-youth">design of experiments</a> (DOE). DOE provides an efficient data collection strategy, during which inputs are simultaneously adjusted, to identify if relationships exist between inputs and output(s). Once you collect the data and analyze it to identify important inputs, you can then use DOE to pinpoint optimal settings.</p>
<strong>Running the Experiment</strong>
<p>The first step in DOE is to identify the inputs and corresponding input ranges you want to explore. The next step is to use statistical software, such as <a href="http://www.minitab.com">Minitab</a>, to create an experimental design that serves as your data collection plan.</p>
<p>According to the design shown below, we first want to use a particle size of 10, a disintegrant level of 1%, and a filler (MCC) level of 33.3%, and then record the corresponding average dissolution% using six tablets from a batch:</p>
<p style="margin-left: 40px;"><img alt="DOE Experiment" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/92bb269cff6b75b9a700a7e19bec78d2/doe_experiment.jpg" style="width: 250px; height: 189px;" /></p>
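<p>For readers who want to see what a data collection plan looks like in code, here is a minimal Python sketch of a two-level full factorial design. It is an assumption that a full factorial is appropriate here; the first level of each factor is the setting quoted for the first run above, while the second level is a hypothetical placeholder and not a value from the original study.</p>

```python
from itertools import product
import random

# First level: setting quoted for the first run. Second level: hypothetical.
factors = {
    "particle_size": (10, 40),
    "disintegrant_pct": (1.0, 5.0),
    "filler_mcc_pct": (33.3, 66.7),
}


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


design = full_factorial(factors)

# Randomize the run order to guard against time-related nuisance effects.
# The fixed seed only makes this illustration repeatable.
run_order = design[:]
random.Random(1).shuffle(run_order)

for run in run_order:
    print(run)
```

<p>Three factors at two levels give eight runs; statistical software adds options such as replicates, center points, and fractional designs on top of this basic idea.</p>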
<strong>Analyzing the Data</strong>
<p>Using Minitab’s DOE analysis and p-values, we are ready to identify which X's are critical. Based on the bars that cross the red significance line, we can conclude that particle size and disintegrant% significantly affect the dissolution%, as does the interaction between these two factors. Filler% is not significant.</p>
<p style="margin-left: 40px;"><img alt="Pareto chart" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/2b32669ef0ad0071a038fa7b5ffa25b7/paretochart.jpg" style="width: 350px; height: 233px;" /></p>
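<p>The idea behind the chart can be illustrated with a small Python sketch that estimates main effects and one interaction from a coded two-level design. The dissolution values are hypothetical and were made up to mimic the pattern described above; the sketch computes raw effects only and does not perform the significance test that the Pareto chart is based on.</p>

```python
from itertools import product

# Coded 2^3 design: -1 = low, +1 = high.
# Columns are (particle size, disintegrant %, filler %).
runs = list(product((-1, 1), repeat=3))

# Hypothetical average dissolution % for each run, in the same order as `runs`.
dissolution = [88, 87, 92, 93, 70, 71, 84, 83]


def effect(column):
    """Average response at +1 minus average response at -1 for a contrast column."""
    high = [y for c, y in zip(column, dissolution) if c == 1]
    low = [y for c, y in zip(column, dissolution) if c == -1]
    return sum(high) / len(high) - sum(low) / len(low)


size = [r[0] for r in runs]
disintegrant = [r[1] for r in runs]
filler = [r[2] for r in runs]
# An interaction column is the elementwise product of its parent columns.
size_x_disintegrant = [a * b for a, b in zip(size, disintegrant)]

print("particle size:      ", effect(size))                 # -13.0
print("disintegrant %:     ", effect(disintegrant))         # 9.0
print("filler %:           ", effect(filler))               # 0.0
print("size x disintegrant:", effect(size_x_disintegrant))  # 4.0
```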
<strong>Optimizing Product Quality</strong>
<p>Now that we've identified the critical X's, we're ready to determine the optimal settings for those inputs. Using a contour plot, we can easily identify the process window for the particle size and disintegrant% settings needed to achieve a percent dissolution of 80% or greater.</p>
<img alt="Contour plot" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/89f74b68916fd451deca51e832a72591/doe_contourplot.jpg" style="width: 350px; height: 233px;" />
<p>And that's how you can use design of experiments to conduct the Process Design stage. Next in this series, we'll look at the statistical tools and techniques commonly used for Process Qualification!</p>
Data Analysis, Design of Experiments, Statistics
Fri, 13 Jan 2017 13:00:00 +0000
http://blog.minitab.com/blog/michelle-paret/statistical-tools-for-process-validation%2C-stage-1%3A-process-design
Michelle Paret
Strangest Capability Study: Super-Zooper-Flooper-Do Broom Boom
http://blog.minitab.com/blog/statistics-in-the-field/strangest-capability-study%3A-super-zooper-flooper-do-broom-boom
<p><em>by Matthew Barsalou, guest blogger</em></p>
<p>The great Dr. Seuss tells of Mr. Plunger, who is the custodian at Diffendoofer School on the corner of Dinkzoober and Dinzott in the town of Dinkerville. The good Mr. Plunger “<a href="http://www.seussville.com/books/book_detail.php?isbn=9780679890089" target="_blank">keeps the whole school clean</a>” using a supper-zooper-flooper-do.</p>
<p>Unfortunately, Dr. Seuss fails to tell us where the supper-zooper-flooper-do came from and if the production process was capable.</p>
<p><img alt="supper-zooper" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/05c32de7c03ea0d764f792f96ec3e8aa/supper_zooper_400w.jpg" style="width: 400px; height: 300px; margin: 10px 15px; float: right;" />Let’s assume the broom boom length was the most critical dimension on the supper-zooper-flooper-do. The broom boom length drawing calls for a length of 55.0 mm with a tolerance of +/- 0.5 mm. The quality engineer has checked three supper-zooper-flooper-do broom booms and all were in specification, so he concludes that there is no reason to worry about the process producing out-of-specification parts. But we know this is not true. Perhaps the fourth supper-zooper-flooper-do broom boom <em>will </em>be out of specification. Or maybe the 1,000th.</p>
<p>It’s time for a capability study, but don’t fire up your <a href="http://www.minitab.com/products/minitab">Minitab Statistical Software</a> just yet. First we need to plan the capability study. Each day the supper-zooper-flooper-do factory produces supper-zooper-flooper-do broom booms with a change in broom boom material batch every 50th part. A capability study should have a minimum of 100 values and 25 subgroups. The subgroups should be rational: that means the variability within each subgroup should be less than the variability between subgroups. We can anticipate more variation between material batches than within a material batch so we will use the batches as subgroups, with a sample size of four.</p>
<p>Once the <a href="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/666380b753452544acf5ebfa58ac27e6/supper_zooper_worksheet.mtw">data</a> has been collected, we can crank up our Minitab and perform a capability study by going to <strong>Stat > Quality Tools > Capability Analysis > Normal</strong>. Enter the column containing the measurement values. Then either enter the column containing the subgroup or type the size of the subgroup. Enter the lower specification limit and the upper specification limit, and click OK.</p>
<p style="margin-left: 40px;"><img alt="Process Capability Report for Broom Boom Length" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/81e504d93a72524caaf009f90fdc031c/process_capability_report_boom_length.png" style="border-width: 0px; border-style: solid; width: 605px; height: 454px;" /></p>
<p>We now have the results for the supper-zooper-flooper-do broom boom lengths, but can we trust our results? A capability study has requirements that must be met. We should have a minimum of 100 values and 25 subgroups, which we have. But the data should also be normally distributed and in a state of statistical control; otherwise, we either need to transform the data, or identify the distribution of the data and perform a capability study for nonnormal data.</p>
<p>Dr. Seuss has never discussed transforming data so perhaps we should be hesitant if the data do not fit a distribution. Before performing a transformation, we should determine if there is a reason the data do not fit any distribution.</p>
<p>We can use the Minitab Capability Sixpack to determine if the data is normally distributed and in a state of statistical control. Go to <strong>Stat > Quality Tools > Capability Sixpack > Normal</strong>. Enter the column containing the measurement values. Then either enter the column containing the subgroup or type the size of the subgroup. Enter the lower specification limit and the upper specification limit and click OK.</p>
<p style="margin-left: 40px;"><img alt="Process Capability Sixpack Report for Broom Boom Length" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8743bc7fe1898b69990e2de7594a7934/process_capability_sixpack_boom_length.png" style="border-width: 0px; border-style: solid; width: 605px; height: 454px;" /></p>
<p>There are no out-of-control points in the control chart, the normal probability plot follows a straight line, and the p-value is greater than 0.05, so we fail to reject the null hypothesis that the data follow a normal distribution. The data is suitable for a capability study.</p>
<p>The within-subgroup variation reflects short-term capability and is indicated by <span><a href="http://blog.minitab.com/blog/statistics-and-quality-improvement/process-capability-statistics-cp-and-cpk-working-together">Cp and Cpk</a></span>. The overall variation, which also includes the variability between subgroups, reflects long-term capability and is given as Pp and Ppk. Cp and Cpk fail to account for the variability that will occur between batches; Pp and Ppk tell us what we can expect from the process over time.</p>
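<p>To see why the two sets of indices can differ, here is a minimal Python sketch of the two standard deviations behind them. The lengths are hypothetical, and the within-subgroup estimate is a plain pooled standard deviation; Minitab offers several estimation methods, so its reported values may differ.</p>

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical broom boom lengths (mm): three material batches, four parts each.
subgroups = [
    [55.1, 55.2, 55.1, 55.2],
    [55.3, 55.2, 55.3, 55.2],
    [55.0, 55.1, 55.0, 55.1],
]

# Within-subgroup sigma (feeds Cp and Cpk): pooled standard deviation.
# With equal subgroup sizes this is the square root of the average variance.
sigma_within = sqrt(mean(variance(g) for g in subgroups))

# Overall sigma (feeds Pp and Ppk): sample standard deviation of all values,
# which also picks up the shifts between batches.
sigma_overall = stdev([x for g in subgroups for x in g])

print(f"within = {sigma_within:.4f}, overall = {sigma_overall:.4f}")
```

<p>Because the batch averages drift in this made-up data, the overall sigma is larger than the within-subgroup sigma, so Pp and Ppk would come out lower than Cp and Cpk.</p>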
<p>Both Cp and Pp compare the spread of the process to the width of the specification limits. In this case, a Cp of 1.63 tells us the spread of the data is much narrower than the width of the specification limits, and that is a good thing. But Cp and Pp alone are not sufficient, because they ignore where the process is centered. Cpk and Ppk also account for how far the process mean sits from the nearer specification limit. There is an upper and lower Cpk and Ppk; however, we are generally only concerned with the lower of the two values.</p>
<p>In the supper-zooper-flooper-do broom boom length example, a Cpk of 1.10 is an indication that the process is off center. The Cp is 1.63, so we can reduce the number of potentially out-of-specification supper-zooper-flooper-do broom booms if we shift the process mean down to center the process while maintaining the current variation. This is a fortunate situation as it is often easier to shift the process mean than to reduce the process variation.</p>
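<p>The arithmetic behind this conclusion can be sketched in Python using the standard Cp and Cpk formulas. The specification limits come from the drawing; the process mean and within-subgroup sigma below are hypothetical values chosen only so that the indices land near the reported Cp of 1.63 and Cpk of 1.10.</p>

```python
def capability(mean, sigma, lsl, usl):
    """Return (Cp, Cpk) for a process mean, sigma, and specification limits."""
    cp = (usl - lsl) / (6 * sigma)
    cpu = (usl - mean) / (3 * sigma)  # capability against the upper limit
    cpl = (mean - lsl) / (3 * sigma)  # capability against the lower limit
    return cp, min(cpu, cpl)


LSL, USL = 54.5, 55.5  # 55.0 mm +/- 0.5 mm, from the drawing

# Hypothetical process values, not taken from the original worksheet.
sigma_within = 0.1022
process_mean = 55.163

cp, cpk = capability(process_mean, sigma_within, LSL, USL)
print(f"as measured: Cp = {cp:.2f}, Cpk = {cpk:.2f}")

# Shift the mean to the target while keeping the same variation.
cp_centered, cpk_centered = capability(55.0, sigma_within, LSL, USL)
print(f"centered:    Cp = {cp_centered:.2f}, Cpk = {cpk_centered:.2f}")
```

<p>Centering leaves Cp unchanged and raises Cpk until it equals Cp, which is why Cp is sometimes described as the capability the process could reach if it were perfectly centered.</p>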
<p>Once improvements are implemented and verified, we can be sure that the next supper-zooper-flooper-do the Diffendoofer School purchases for Mr. Plunger will have a broom boom that is in specification if only common cause variation is present.</p>
<p> </p>
<div>
<p><strong>About the Guest Blogger</strong></p>
<p><em><a href="https://www.linkedin.com/pub/matthew-barsalou/5b/539/198" target="_blank">Matthew Barsalou</a> is a statistical problem resolution Master Black Belt at <a href="http://www.3k-warner.de/" target="_blank">BorgWarner</a> Turbo Systems Engineering GmbH. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with an emphasis in international business, and a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is the author of the books <a href="http://www.amazon.com/Root-Cause-Analysis-Step---Step/dp/148225879X/ref=sr_1_1?ie=UTF8&qid=1416937278&sr=8-1&keywords=Root+Cause+Analysis%3A+A+Step-By-Step+Guide+to+Using+the+Right+Tool+at+the+Right+Time" target="_blank">Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time</a>, <a href="http://asq.org/quality-press/display-item/index.html?item=H1472" target="_blank">Statistics for Six Sigma Black Belts</a> and <a href="http://asq.org/quality-press/display-item/index.html?item=H1473&xvl=76115763" target="_blank">The ASQ Pocket Guide to Statistics for Six Sigma Black Belts</a>.</em></p>
</div>
<div style="clear:both;"> </div>
Capability Analysis, Fun Statistics
Mon, 09 Jan 2017 13:04:00 +0000
http://blog.minitab.com/blog/statistics-in-the-field/strangest-capability-study%3A-super-zooper-flooper-do-broom-boom
Guest Blogger
How to Make Your Statistical Software Fit You Perfectly
http://blog.minitab.com/blog/understanding-statistics/how-to-make-your-statistical-software-fit-you-perfectly
<p>Did you ever get a pair of jeans or a shirt that you liked, but didn't quite fit you perfectly? That happened to me a few months ago. The jeans looked good, and they were very well made, but it took a while before I was comfortable wearing them.</p>
<p>I much prefer it when I can get a pair with a perfect fit, that feel like I was born in them, with no period of "adjustment." <img alt="jeans" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f66ff501555082011e6457aac70ea720/jeans.jpg" style="width: 250px; height: 184px; margin: 10px 15px; float: right;" /></p>
<p>So which pair do you think I wear more often...the older pair that fits me like a glove, or the newer ones that aren't quite as comfortable? You already know the answer, because I'll bet <em>you </em>have a favorite pair of jeans, too. </p>
<p>So what does all this have to do with statistical software? Just this: if you can get statistical software that's perfectly matched to how you're going to use it, you're going to feel more comfortable, confident, and at ease from the second you open it. </p>
<p>We do strive to make Minitab Statistical Software very easy to use from the first time you launch it. Our roots lie in providing tools that make data analysis easier, and that's still our mission today. But we know a little bit of tailoring can make a garment that feels very good into one that feels <em>great</em>. </p>
<p>So if you want to tailor your Minitab software to fit you <em>perfectly</em>, we also make that easy—even if you have multiple people using Minitab on the same computer. </p>
A Set of Statistical Tools Made Just for You (or Me)
<p>If you're like most people, you want software that gives you the options you want, when you want them. You want a menu that has everything organized just the way you like it. And while we're at it, how about a toolbar that gives you immediate access to the tools you know you'll be using most frequently? </p>
<p>We don't think that's too much to ask. </p>
<p>In my job, I frequently need to perform a series of analyses on data about marketing and online traffic. It's easy enough to access those tools through Minitab's default menus, but one day I realized I didn't even need to do that—I could just make myself a menu in Minitab that includes the tools I use most frequently. </p>
<img alt="customize statistical software menu" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b475d15563643b83a5f32c071a0870cf/customize.jpg" style="width: 479px; height: 208px; margin: 10px 15px; float: right;" />
<p>Taking this thought from idea to execution was a breeze. I simply right-clicked on the menu bar and selected the "Customize" option. </p>
<div>That brought up the dialog box shown below. All I had to do was select the "New Menu" command and drag it from the "Commands" window to the menu bar, and voilà! A new menu. </div>
<div> </div>
<div style="margin-left: 40px;"><img alt="customize dialog box" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b5c665d4aebcb2c0ea1500d72a9dfd7f/customize_dialog.jpg" style="width: 447px; height: 375px;" /></div>
<div>
<p>From there, a right-click and the "Rename Button" command let me rename my new menu "Eston's Tools." I was then able to simply drag and drop the tools I use most frequently from the customization dialog box into my new menu: </p>
<p style="margin-left: 40px; "><img alt="customized statistics menu" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9cf58f81de2ecc232bf2296e0f87bc9a/estonsmenu.jpg" style="width: 264px; height: 246px;" /></p>
<p>Pretty nifty. I could even customize the icons, were I inclined to do so. </p>
</div>
<p>There are many more ways you can <a href="http://support.minitab.com/en-us/minitab/17/topic-library/minitab-environment/interface/customize-the-minitab-interface/customize-menus-toolbars-and-shortcut-keys/">customize Minitab to suit your needs</a>, including the creation of <span><a href="http://blog.minitab.com/blog/starting-out-with-statistical-software/have-it-your-way-how-to-create-a-custom-toolbar-in-minitab">customized toolbars</a></span> and individual profiles, which are great if you share your computer with someone who would like to have Minitab customized to <em>their </em>preferences, too. </p>
<p>Let us know what you've done to customize Minitab so it fits <em>you </em>perfectly!</p>
Data Analysis, Project Tools, Statistics
Tue, 03 Jan 2017 13:00:00 +0000
http://blog.minitab.com/blog/understanding-statistics/how-to-make-your-statistical-software-fit-you-perfectly
Eston Martz
The Difference Between Right-, Left- and Interval-Censored Data
http://blog.minitab.com/blog/michelle-paret/the-difference-between-right-left-and-interval-censored-data
<p><a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/reliability-and-survival-the-high-stakes-of-product-performance">Reliability analysis</a> is the perfect tool for calculating the proportion of items that you can expect to survive for a specified period of time under identical operating conditions. Light bulbs—or lamps—are a classic example. Want to calculate the number of light bulbs expected to fail within 1000 hours? Reliability analysis can help you answer this type of question.</p>
<p>But to conduct the analysis properly, we need to understand the difference between the three types of censoring.</p>
What is censored data?
<p>When you perform reliability analysis, you may not have exact failure times for all items. In fact, lifetime data are often "censored." Using the light bulb example, perhaps not all the light bulbs have failed by the time your study ends. The time data for those bulbs that have not yet failed are referred to as censored.</p>
<img alt="baby" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/913ae1dbf78dd9728367bf0dead44f45/baby.jpg" style="width: 250px; height: 244px; margin: 10px 15px; float: right;" />
<p>It is important to include the censored observations in your analysis because the fact that these items have not yet failed has a big impact on your reliability estimates.</p>
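<p>To see why, here is a minimal Python sketch of the Kaplan-Meier estimator, one common way to estimate survival when some observations are censored. The bulb lifetimes are hypothetical values chosen only for illustration, and the function name <code>kaplan_meier</code> is an assumption of this sketch, not something taken from Minitab.</p>

```python
def kaplan_meier(times, failed):
    """Return [(time, estimated survival probability)] at each failure time.

    times  -- observed time for each unit (failure time or last time seen)
    failed -- True if the unit failed at that time, False if it was censored
    """
    data = sorted(zip(times, failed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        leaving = 0
        # Group every unit observed at time t (failures and censored alike).
        while i < len(data) and data[i][0] == t:
            failures += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if failures:
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
        # Censored units leave the risk set without counting as failures.
        at_risk -= leaving
    return curve


# Hypothetical study: six bulbs, stopped at 1000 hours with three still burning.
hours = [400, 650, 800, 1000, 1000, 1000]
failed = [True, True, True, False, False, False]

with_censored = kaplan_meier(hours, failed)
# Same data with the censored bulbs (wrongly) thrown away.
without_censored = kaplan_meier(hours[:3], failed[:3])

print("keeping censored bulbs: ", with_censored)
print("dropping censored bulbs:", without_censored)
```

<p>Dropping the three bulbs that were still burning makes it look as if nothing survives past 800 hours, while keeping them as censored observations gives a much higher survival estimate at that point.</p>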
Right-censored data
<p>Let’s move from light bulbs to newborns, inspired by my colleague who’s at the “you’re <em>still </em>here?” stage of pregnancy.</p>
<p>Suppose you’re conducting a study on pregnancy duration. You’re ready to complete the study and run your analysis, but some women in the study are still pregnant, so you don’t know exactly how long their pregnancies will last. These observations would be <em>right-censored</em>. The “failure,” or birth in this case, will occur after the recorded time.</p>
<p style="margin-left: 40px;"><img alt="Right censored" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/c75961d3d78018da3800683ab233c989/right_censored.png" style="width: 291px; height: 241px;" /></p>
Left-censored data
<p>Now suppose you survey some women in your study at the 250-day mark, but they already had their babies. You know they had their babies before 250 days, but don’t know <em>exactly </em>when. These are therefore <em>left-censored</em> observations, where the “failure” occurred before a particular time.</p>
<p style="margin-left: 40px;"><img alt="Left censored" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/7279d0487d0b3d08120e224456bafc2f/left_censored.png" style="width: 237px; height: 242px;" /></p>
Interval-censored data
<p>If we don’t know exactly when some babies were born but we know it was within some interval of time, these observations would be <em>interval-censored</em>. We know the “failure” occurred within some given time period. For example, we might survey expectant mothers every 7 days and then count the number who had a baby within that given week.</p>
<p style="margin-left: 40px;"><img alt="Interval censored" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/deb69487d6f3256172beefe22b4ecbf6/intervalcensored.png" style="width: 253px; height: 241px;" /></p>
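<p>One way to keep the three types straight is to record every observation as a pair of bounds on the event time. The Python sketch below assumes that layout; the <code>Observation</code> type, the <code>classify</code> helper, and all day counts other than the 250-day survey and the 7-day interval are hypothetical and for illustration only.</p>

```python
import math
from typing import NamedTuple


class Observation(NamedTuple):
    lower: float  # the birth happened after this day (0 if unknown)
    upper: float  # the birth happened on or before this day (inf if unknown)


def classify(obs: Observation) -> str:
    """Name the censoring type implied by an observation's bounds."""
    if obs.lower == obs.upper:
        return "exact"
    if math.isinf(obs.upper):
        return "right-censored"
    if obs.lower == 0:
        return "left-censored"
    return "interval-censored"


observations = [
    Observation(268, 268),       # birth observed on day 268
    Observation(280, math.inf),  # still pregnant when the study ended on day 280
    Observation(0, 250),         # already delivered when surveyed at day 250
    Observation(266, 273),       # delivered between two weekly surveys
]

labels = [classify(obs) for obs in observations]
for obs, label in zip(observations, labels):
    print(obs, "->", label)
```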
<p>Once you set up your data, running the analysis is easy with <a href="http://www.minitab.com/products/minitab/">Minitab Statistical Software</a>. For more information on how to run the analysis and interpret your results, see <a href="http://blog.minitab.com/blog/fun-with-statistics/what-i-learned-from-treating-childbirth-as-a-failure">this blog post</a>, which—coincidentally—is baby-related, too.</p>
Lean Six SigmaQuality ImprovementReliability AnalysisSix SigmaWed, 07 Dec 2016 14:03:00 +0000http://blog.minitab.com/blog/michelle-paret/the-difference-between-right-left-and-interval-censored-dataMichelle ParetCommon Assumptions about Data Part 3: Stability and Measurement Systems
http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-3-stability-and-measurement-systems
<p><img alt="Cart before the horse" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/8230e7c2bc193a831158677a70eb0146/chile_road_sign_po_4.svg" style="width: 101px; height: 101px; float: right; margin: 10px 15px;" />In Parts <span><a href="http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-1-random-samples-and-statistical-independence">1</a></span> and <span><a href="http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-2-normality-and-equal-variance">2</a></span> of this blog series, I wrote about how statistical inference uses data from a sample of individuals to reach conclusions about the whole population. That’s a very powerful tool, but you must check your assumptions when you make statistical inferences. Violating any of these assumptions can result in false positives or false negatives, thus invalidating your results. </p>
<p>The common data assumptions are: random samples, independence, normality, equal variance, stability, and that your measurement system is accurate and precise. I addressed random samples and statistical independence in Part 1, and normality and equal variance in Part 2. Now let’s consider the assumptions of stability and measurement systems.</p>
What Is the Assumption of Stability?
<p>A stable process is one in which the inputs and conditions are consistent over time. When a process is stable, it is said to be “in control.” This means the sources of variation are consistent over time, and the process does not exhibit unpredictable variation. In contrast, if a process is unstable and changing over time, the sources of variation are inconsistent and unpredictable. As a result of the instability, you cannot be confident in your statistical test results.</p>
<p>Use one of the various types of <span><a href="http://blog.minitab.com/blog/understanding-statistics/what-control-chart-should-i-use">control charts</a></span> available in Minitab <a href="http://www.minitab.com/products/minitab/">Statistical Software</a> to assess the stability of your data set. The Assistant menu can walk you through the choices to select the appropriate control chart based on your data and subgroup size. You can get advice about collecting and using data by clicking the “more” link.</p>
<p><img alt="Choose a Control Chart" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/6ec77f5dbc070eb0c2070ce6bcf8144c/1_control_chart.png" style="border-width: 0px; border-style: solid; width: 474px; height: 338px; margin: 10px 15px;" /></p>
<p><img alt="I-MR Control Chart" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/3d69fc444cd5dd09a962a11e645a3a2e/2_control_chart.png" style="border-width: 0px; border-style: solid; width: 474px; height: 338px; margin: 10px 15px;" /></p>
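<p>For readers curious about what sits behind an I-MR chart, the Python sketch below computes the center lines and control limits using the textbook constants for a moving range of length 2. The function name <code>imr_limits</code> and the sample values are hypothetical, and Minitab offers other ways to estimate variation, so treat this as an approximation of the idea and not a reproduction of the software's output.</p>

```python
def imr_limits(values):
    """Compute textbook I-MR chart limits from individual measurements."""
    if len(values) < 2:
        raise ValueError("need at least two observations")

    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = sum(values) / len(values)
    mr_bar = sum(moving_ranges) / len(moving_ranges)

    # 2.66 is 3 / d2 with d2 = 1.128, and 3.267 is D4, both for ranges of 2 points.
    return {
        "i_center": center,
        "i_lcl": center - 2.66 * mr_bar,
        "i_ucl": center + 2.66 * mr_bar,
        "mr_center": mr_bar,
        "mr_lcl": 0.0,
        "mr_ucl": 3.267 * mr_bar,
    }


# Hypothetical measurements, in time order.
limits = imr_limits([10.0, 12.0, 11.0, 13.0, 12.0])
for name, value in limits.items():
    print(f"{name}: {value:.3f}")
```

<p>Points that fall outside these limits are the first sign that the process may not be stable.</p>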
<p>In addition to preparing the control chart, Minitab tests for out-of-control or non-random patterns based on the <a href="http://blog.minitab.com/blog/statistics-in-the-field/using-the-nelson-rules-for-control-charts-in-minitab">Nelson Rules</a> and provides an assessment in easy-to-read Summary and Stability reports. The Report Card, depending on the control chart selected, will automatically check your assumptions of stability, normality, amount of data, and correlation, and will suggest alternative charts to further analyze your data.</p>
<p><img alt="Report Card" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/195741e519156b95ee5feee8b521041f/3_control_chart.jpg" style="border-width: 0px; border-style: solid; width: 464px; height: 348px; margin: 10px 15px;" /></p>
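<p>As a rough illustration of what such pattern tests look for, the Python sketch below implements the first two Nelson Rules: a point more than three standard deviations from the center line, and nine points in a row on the same side of it. The function names and data are hypothetical, and the center line and sigma are passed in as given values; this is not Minitab's implementation.</p>

```python
def beyond_three_sigma(values, center, sigma):
    """Rule 1: indices of points more than 3 sigma from the center line."""
    return [i for i, x in enumerate(values) if abs(x - center) > 3 * sigma]


def run_same_side(values, center, run_length=9):
    """Rule 2: indices where a run of run_length points on one side is reached."""
    flagged = []
    run = 0
    previous_side = 0
    for i, x in enumerate(values):
        side = (x > center) - (x < center)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == previous_side:
            run += 1
        else:
            run = 1 if side != 0 else 0
        previous_side = side
        if run >= run_length:
            flagged.append(i)
    return flagged


# Hypothetical data: one extreme point, then nine points in a row above center.
data = [10.2, 9.8, 13.5, 9.9] + [10.5] * 9
rule1 = beyond_three_sigma(data, center=10.0, sigma=1.0)
rule2 = run_same_side(data, center=10.0)
print("Rule 1 flags:", rule1)
print("Rule 2 flags:", rule2)
```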
What Is the Assumption for Measurement Systems?
<p>All the other assumptions I’ve described “assume” the data reflects reality. But does it?</p>
<p>The <span><a href="http://blog.minitab.com/blog/understanding-statistics/explaining-quality-statistics-so-my-boss-will-understand-measurement-systems-analysis-msa">measurement system</a> </span>is one potential source of variability when measuring a product or process. When a measurement system is poor, you lose the ability to truthfully “see” process performance. A poor measurement system leads to incorrect conclusions and flawed implementation. </p>
<p>Minitab can perform a Gage R&R test for both measurement and appraisal data, depending on your measurement system. You can use the Assistant in Minitab to help you select the most appropriate test based on the type of measurement system you have.</p>
<p><img alt="Choose a MSA" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/3ff089fcee9ab280c8e8d1da1c56d610/4_msa.png" style="border-width: 0px; border-style: solid; width: 474px; height: 345px; margin: 10px 15px;" /></p>
<p>There are two assumptions that should be satisfied when performing a Gage R&R for measurement data: </p>
<ol>
<li>The measurement device should be calibrated.</li>
<li>The parts to be measured should be selected from a stable process and cover approximately 80% of the possible operating range. </li>
</ol>
<p>When using a measurement device, make sure it is properly calibrated and check for linearity, bias, and stability over time. The device should produce accurate measurements, compared to a standard value, through the entire range of measurements and throughout the life of the device. Many companies have a metrology or calibration department responsible for calibrating and maintaining gauges. </p>
<p>Both these assumptions must be satisfied. If they are not, you cannot be sure that your data accurately reflect reality. And that means you’ll risk not understanding the sources of variation that influence your process outcomes. </p>
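<p>A simple bias check compares repeated measurements of a reference part against its known standard value. The Python sketch below computes the estimated bias and a one-sample t statistic; the function name <code>bias_summary</code>, the reference value, and the readings are all hypothetical, and a full gage study in Minitab covers more than this one number.</p>

```python
import math
import statistics


def bias_summary(measurements, reference):
    """Return (bias, t statistic) for repeated readings of a reference part."""
    n = len(measurements)
    if n < 2:
        raise ValueError("need at least two measurements")
    spread = statistics.stdev(measurements)
    if spread == 0:
        raise ValueError("measurements show no variation; t statistic undefined")

    # Bias is the average reading minus the known standard value.
    bias = statistics.fmean(measurements) - reference
    # One-sample t statistic for the hypothesis that the true bias is zero.
    t_stat = bias / (spread / math.sqrt(n))
    return bias, t_stat


# Hypothetical: five readings of a reference part whose standard value is 5.00.
bias, t_stat = bias_summary([5.02, 5.01, 5.03, 5.02, 5.02], reference=5.00)
print(f"bias = {bias:.4f}, t = {t_stat:.2f}")
```

<p>A large t statistic relative to the t distribution with n - 1 degrees of freedom suggests the device reads consistently high or low and should be recalibrated before it is used in a study.</p>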
The Real Reason You Need to Check the Assumptions
<p>Collecting and analyzing data requires a lot of time and effort on your part. After all the work you put into your analysis, you want to be able to reach correct conclusions. Some analyses are robust to departures from these assumptions, but take the safe route and check! You want to be confident you can tell whether observed differences between data samples are simply due to chance, or if the populations are indeed different! </p>
<p>It’s easy to put the cart before the horse and just plunge into the data collection and analysis, but it’s much wiser to take the time to understand which data assumptions apply to the statistical tests you will be using, and plan accordingly.</p>
<p>Thank you for reading my blog. I hope this information helps you with your data analysis mission!</p>
Data AnalysisHypothesis TestingQuality ImprovementStatisticsMon, 05 Dec 2016 13:00:00 +0000http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-3-stability-and-measurement-systemsBonnie K. StoneThe Joy of Playing in Endless Backyards with Statistics
http://blog.minitab.com/blog/adventures-in-statistics-2/the-joy-of-playing-in-endless-backyards-with-statistics
<p>Dear Readers,</p>
<p><img alt="Jim Frost" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/1ae3640a9bb3396a48ee4478020340d5/avatar.png" style="width: 131px; height: 186px; float: right; margin: 10px 15px;" />As 2016 comes to a close, it’s time to reflect on the passage of time and changes. As I’m sure you’ve guessed, I love statistics and analyzing data! I also love talking and writing about it. In fact, I’ve been writing statistical blog posts for over five years, and it’s been an absolute blast. John Tukey, the renowned statistician, once said, “The best thing about being a statistician is that you get to play in everyone’s backyard.” I enthusiastically agree!</p>
<p>However, when I first started writing the blog, I wondered about being able to keep up a constant supply of fresh blog posts. And, when I first mentioned to some non-statistician friends that I’d be writing a statistical blog, I noticed a certain lack of enthusiasm. For instance, I heard a variety of comments like, “So, you’ll be writing things along the lines of 9 out of 10 dentists recommend . . .” Would readers even be interested in what I had to say about statistics?</p>
<p>It turns out that with a curious mind, statistical knowledge, data, and a powerful tool like <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">Minitab statistical software</a>, the possibilities are endless. You <em>can</em> play in a wide variety of fascinating backyards! </p>
<p>The most surprising statistic is that <a href="http://blog.minitab.com/blog/adventures-in-statistics" target="_blank">my blog posts</a> have received over 5.5 million views in the past year alone. Never in my wildest dreams did I imagine so many readers when I wrote <a href="http://blog.minitab.com/blog/adventures-in-statistics/three-measurement-system-analysis-questions-to-ask-before-you-take-a-single-measurement" target="_blank">my first post</a>! It’s a real testament to the growing importance of data analysis that so many people are interested in a blog dedicated to statistics. Thank you all for reading!</p>
Endless Backyards . . .
<p><img alt="Dolphin" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/f9c1d0c9fbd374b7272f5ee2ee2716c0/dolphin.jpg" style="width: 225px; height: 150px; float: right; margin: 10px 15px;" />Some of the topics I've written about are out of this world. I’ve assessed <a href="http://blog.minitab.com/blog/adventures-in-statistics/using-statistics-to-analyze-words" target="_blank">dolphin communications</a> and compared them to the search for extraterrestrial intelligence, and analyzed <a href="http://blog.minitab.com/blog/adventures-in-statistics/exoplanet-statistics-and-the-search-for-earth-twins" target="_blank">exoplanet data</a> in the search for the Earth’s twin! (As an aside, my analysis showed that my writing style is similar to dolphin communications. I'll take that as a compliment!)</p>
<p>For more Earthly subjects, I’ve studied the relationship between <a href="http://blog.minitab.com/blog/adventures-in-statistics/size-matters-metabolic-rate-and-longevity" target="_blank">mammal size and their metabolic rate and longevity</a>. I’ve analyzed raw research data to assess the <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-effective-are-flu-shots" target="_blank">effectiveness of flu shots</a> first hand. I’ve downloaded economic data to assess patterns in both the <a href="http://blog.minitab.com/blog/adventures-in-statistics/reassessing-gdp-growth-part-1" target="_blank">U.S. GDP</a> and <a href="http://blog.minitab.com/blog/adventures-in-statistics/us-job-growth-assessing-the-numbers-and-making-predictions" target="_blank">U.S. job growth</a>. For a Thanksgiving Day post, I analyzed world income data to answer the question of <a href="http://blog.minitab.com/blog/adventures-in-statistics/statistically-how-thankful-should-we-be-a-look-at-global-income-distributions-part-1" target="_blank">how thankful we should be statistically</a>. As for <a href="http://blog.minitab.com/blog/adventures-in-statistics/when-is-easter-for-the-next-2086-years" target="_blank">Easter</a>, I can tell you the date on which it falls in any of 2,517 years, along with which dates are the most and least common.</p>
<p><img alt="Mythbusters" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/7b3b8859da99d60dd3e9c7932faefba3/mythbusters.jpg" style="width: 225px; height: 149px; float: right; margin: 10px 15px;" />In the world of politics, I’ve used data to <a href="http://blog.minitab.com/blog/adventures-in-statistics/predicting-the-us-presidential-election-evaluating-two-models-part-one" target="_blank">predict the 2012 U.S. Presidential election</a>, <a href="http://blog.minitab.com/blog/adventures-in-statistics/statistical-analyses-of-the-house-freedom-caucus-and-the-search-for-a-new-speaker" target="_blank">analyzed the House Freedom Caucus and the search for the new Speaker of the House</a>, assessed the <a href="http://blog.minitab.com/blog/adventures-in-statistics/great-presidents-revisited-does-history-provide-a-different-perspective" target="_blank">factors that make a great President</a>, and even <a href="http://blog.minitab.com/blog/adventures-in-statistics/using-the-solution-desirability-matrix-to-help-mitt-romney-choose-the-vp-candidate" target="_blank">helped Mitt Romney pick a running mate</a>. Everyone talks about the weather, so of course I had to <a href="http://blog.minitab.com/blog/adventures-in-statistics/are-atlantas-winters-getting-colder-and-snowier" target="_blank">analyze that</a>. My family loves the Mythbusters and it was fun applying statistical analyses to some of the myths that they tested (<a href="http://blog.minitab.com/blog/adventures-in-statistics/busting-the-mythbusters-are-yawns-contagious" target="_blank">here</a> and <a href="http://blog.minitab.com/blog/adventures-in-statistics/using-hypothesis-tests-to-bust-myths-about-the-battle-of-the-sexes" target="_blank">here</a>). That's my family and me meeting them in the picture to the right!</p>
<p>Some of my posts have even been a bit surreal. I took my turn at attempting to explain the statistical illusion of the <a href="http://blog.minitab.com/blog/adventures-in-statistics/the-monty-hall-problem-and-the-importance-of-checking-your-assumptions" target="_blank">infamous Monty Hall problem</a>. I’ve compared <a href="http://blog.minitab.com/blog/adventures-in-statistics/world-travel-bumpy-roads-and-adjusting-your-graph-scales" target="_blank">world travel to adjusting scales in graphs</a> (seriously). I wrote a true story about how <a href="http://blog.minitab.com/blog/adventures-in-statistics/lessons-in-quality-during-a-long-and-strange-journey-home" target="_blank">I drove a plane load of passengers 200 miles to their homes</a> in the context of <img alt="ghost hunting" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/51587c9ccc575874d23335f607e520a0/nightshot.jpg" style="width: 225px; height: 127px; float: right; margin: 10px 15px;" />quality improvement! For Halloween-themed posts, I showed how to go <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-be-a-ghost-hunter-with-a-statistical-mindset" target="_blank">ghost hunting with a statistical mindset</a> and how <a href="http://blog.minitab.com/blog/adventures-in-statistics/beware-of-phantom-degrees-of-freedom-that-haunt-your-regression-models" target="_blank">regression models can be haunted by phantom degrees of freedom</a>. I analyzed the <a href="http://blog.minitab.com/blog/adventures-in-statistics/using-data-analysis-to-assess-fatality-rates-in-star-trek-the-original-series" target="_blank">fatality rates in the original Star Trek TV series</a>. I explored how some people can <a href="http://blog.minitab.com/blog/adventures-in-statistics/the-odds-of-finding-a-four-leaf-clover-revisited-how-do-some-people-find-so-many" target="_blank">find so many four leaf clovers despite their rarity</a>. 
And, I wondered whether <a href="http://blog.minitab.com/blog/adventures-in-statistics/can-a-statistician-say-that-age-is-just-a-number" target="_blank">a statistician can say that age is just a number</a>.</p>
<p>See, not a mention of those dentists...well, not until now. By this point, 9 out of 10 dentists are probably feeling neglected!</p>
Helping Others Perform Their Own Analyses
<p>I’ve also written many posts aimed at helping those who are learning and performing statistical analyses. I described <a href="http://blog.minitab.com/blog/adventures-in-statistics/working-at-the-edge-of-human-knowledge-part-one" target="_blank">why statistics is cool</a> based on my own personal experiences and how the whole <a href="http://blog.minitab.com/blog/adventures-in-statistics/why-statistics-is-important" target="_blank">field of statistics is growing in importance</a>. I showed how <a href="http://blog.minitab.com/blog/adventures-in-statistics/why-anecdotal-evidence-is-unreliable" target="_blank">anecdotal evidence is unreliable</a> and explained why it fails so badly. And, I took a look forward at how <a href="http://blog.minitab.com/blog/adventures-in-statistics/expanding-the-role-of-statistics-to-areas-traditionally-dominated-by-expert-judgment" target="_blank">statistical analyses are expanding into areas traditionally ruled by expert judgement</a>.</p>
<p>I zoomed in to cover the details about how to perform and interpret statistical analyses. Some might think that covering the nitty gritty of statistical best practices is boring. Yet, you’d be surprised by the lively discussions we’ve had. We’ve had heated debates and philosophical discussions about <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values" target="_blank">how to correctly interpret p-values</a> and what <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests:-significance-levels-alpha-and-p-values-in-statistics" target="_blank">statistical significance</a> does and does not tell you. This reached a fever pitch when a psychology journal actually <a href="http://blog.minitab.com/blog/adventures-in-statistics/banned-p-values-and-confidence-intervals-a-rebuttal-part-1" target="_blank">banned p-values</a>!</p>
<p><img alt="Regression residuals" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/58964ccf1cb00ead2ee1735ca54886d9/residual_illustration.gif" style="width: 221px; height: 149px; float: right; border-width: 0px; border-style: solid; margin: 10px 15px;" />We had our difficult questions and surprising topics to grapple with. <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-high-should-r-squared-be-in-regression-analysis" target="_blank">How high should R-squared be</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/choosing-between-a-nonparametric-test-and-a-parametric-test" target="_blank">Should I use a parametric or nonparametric analysis</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-interpret-a-regression-model-with-low-r-squared-and-low-p-values" target="_blank">How is it possible that a regression model can have significant variables but still have a low R-squared</a>? I even had the nerve to suggest that <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-how-to-interpret-s-the-standard-error-of-the-regression" target="_blank">R-squared is overrated</a>! And, I made the unusual case that control charts are also <a href="http://blog.minitab.com/blog/adventures-in-statistics/control-charts-not-just-for-statistical-process-control-spc-anymore" target="_blank">very important outside the realm of quality improvement</a>. Then, there is the whole frequentist versus Bayesian debate, but let’s not go there!</p>
<p>However, it’s true that not all topics about how to perform statistical analyses are riveting. I still love these topics. The world is becoming an increasingly data-driven place, and to produce trustworthy results, you must analyze your data correctly. After all, it’s surprisingly easy to <a href="http://blog.minitab.com/blog/adventures-in-statistics/applied-regression-analysis-how-to-present-and-use-the-results-to-avoid-costly-mistakes-part-1" target="_blank">make a costly mistake</a> if you don’t know what you’re doing.</p>
<p><img alt="F-distribution with probability" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/6303a2314437d8fcf2f72d9a56b1293a/f_distribution_probability.png" style="width: 250px; height: 167px; float: right; margin: 10px 15px;" />A data-driven world requires an analyst to understand seemingly esoteric details such as: the <a href="http://blog.minitab.com/blog/adventures-in-statistics/curve-fitting-with-linear-and-nonlinear-regression" target="_blank">different methods of fitting curves</a>, <a href="http://blog.minitab.com/blog/adventures-in-statistics/the-danger-of-overfitting-regression-models" target="_blank">the dangers of overfitting your model</a>, <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit" target="_blank">assessing goodness-of-fit</a>, <a href="http://blog.minitab.com/blog/adventures-in-statistics/why-you-need-to-check-your-residual-plots-for-regression-analysis" target="_blank">checking your residual plots</a>, and how to check for and correct <a href="http://blog.minitab.com/blog/adventures-in-statistics/what-are-the-effects-of-multicollinearity-and-when-can-i-ignore-them" target="_blank">multicollinearity</a> and <a href="http://blog.minitab.com/blog/adventures-in-statistics/curing-heteroscedasticity-with-weighted-regression-in-minitab-statistical-software" target="_blank">heteroscedasticity</a>. How do you <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-choose-the-best-regression-model" target="_blank">choose the best model</a>? Do you need to <a href="http://blog.minitab.com/blog/adventures-in-statistics/when-is-it-crucial-to-standardize-the-variables-in-a-regression-model" target="_blank">standardize your variables</a> before performing the analysis? 
Maybe you need a <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-tutorial-and-examples" target="_blank">regression tutorial</a>?</p>
<p>You may need to know <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-identify-the-distribution-of-your-data-using-minitab" target="_blank">how to identify the distribution of your data</a>. And just <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests:-why-we-need-to-use-hypothesis-tests-in-statistics" target="_blank">how do hypothesis tests work</a> anyway? <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-analysis-of-variance-anova-and-the-f-test" target="_blank">F-tests</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests-t-values-and-t-distributions" target="_blank">T-tests</a>? How do you <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-test-your-discrete-distribution" target="_blank">test discrete data</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/when-should-i-use-confidence-intervals-prediction-intervals-and-tolerance-intervals" target="_blank">Should you use a confidence interval, prediction interval, or a tolerance interval</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/use-random-assignment-in-experiments-to-combat-confounding-variables" target="_blank">How do you know when X causes a change in Y</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/confound-it-some-more-how-a-factor-that-wasnt-there-hampered-my-analysis" target="_blank">Is a confounding variable distorting your results</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/repeated-measures-designs-benefits-challenges-and-an-anova-example" target="_blank">What are the pros and cons of using a repeated measures design</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/did-welchs-anova-make-fishers-classic-one-way-anova-obsolete" target="_blank">Fisher’s or Welch’s ANOVA</a>? 
<a href="http://blog.minitab.com/blog/adventures-in-statistics/the-power-of-multivariate-anova-manova" target="_blank">ANOVA or MANOVA</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/linear-or-nonlinear-regression-that-is-the-question" target="_blank">Linear or nonlinear regression?</a></p>
<p>These may not be “sexy” topics but they are the meat and potatoes of being able to draw sound conclusions from your data. And, based on numerous blog comments, they have been well received by many people. In fact, the most rewarding aspect of writing blog posts has been the interactions I've had with all of you. I've communicated with literally hundreds and hundreds of students learning statistics and practitioners performing statistics in the field. I’ve had the pleasure of learning how you use statistical analyses, understanding the difficulties you face, and helping you resolve those issues.</p>
<p>It's been an amazing journey and I hope that my blog posts have allowed you to see statistics through my eyes―as a key that can unlock discoveries that are trapped in your data. After all, that's the reason why I titled my blog <em>Adventures in Statistics</em>. Discovery is a bumpy road. There can be statistical challenges en route, but even those can be interesting, and perhaps even rewarding, to resolve. Sometimes it is the <a href="http://blog.minitab.com/blog/adventures-in-statistics/the-mysteries-of-variability-and-power" target="_blank">perplexing mystery in your data that prompts you to play detective and leads you to surprising new discoveries</a>!</p>
<p>To close out the old year, it's good to remember that change is constant. There are bound to be many new and exciting adventures in the New Year. I wish you all the best in your endeavors. </p>
<p>“We will open the book. Its pages are blank. We are going to put words on them ourselves. The book is called Opportunity and its first chapter is New Year's Day.” <em>― Edith Lovejoy Pierce </em></p>
<p>May you all find happiness in 2017! Onward and upward!</p>
<p>Jim</p>
Data Analysis, Statistics, Statistics Help, Stats
Wed, 30 Nov 2016 15:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics-2/the-joy-of-playing-in-endless-backyards-with-statistics
Jim Frost
Mutant Trees Lay Waste to the Landscape and Reveal Mother Nature's Lean Design
http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/mutant-trees-lay-waste-to-the-landscape-and-reveal-mother-natures-lean-design
<p>The season of change is upon us here at Minitab's World Headquarters. The air is crisp and clear and the landscape is ablaze in vibrant fall colors. As I drove to work one recent morning, I couldn't help but soak in the beauty surrounding me and think, "Too bad everything they taught me as a kid was a lie."</p>
<p><img alt="fall trees" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c2cb2bd427165df25e0ca2b38ef59381/trees.jpg" style="width: 208px; height: 182px; margin: 10px 15px; float: right;" />You see, as a boy growing up in New Hampshire, I was told that the sublime beauty of autumn was just a happy accident. As the days become shorter, the trees succumb to their own version of seasonal affective disorder; they stop producing chlorophyll because... well, what's the point? As a result of this photosynthetic funk, the green begins to drain from the leaves and the less pragmatic pigments prevail, if briefly.</p>
<p>But thanks to mutant trees, I now know the truth. Or at least one possible explanation. I refer, of course, to the findings of Hoch, Singsaas, and McCown, in their 2003 paper, "<a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC281624/" target="_blank">Resorption Protection. Anthocyanins Facilitate Nutrient Recovery in Autumn by Shielding Leaves from Potentially Damaging Light Levels.</a>"</p>
<p>In truth, I shouldn't say that what I learned as a kid was a <em>lie</em>. The theory of autumn by chromatic attrition might still be true to some extent. But I was intrigued to discover recently that newer theories posit a more adaptive role for the annual display. For example, one theory suggests that the bright displays evolved to inform potentially injurious insects that they are barking up the wrong tree. (For more information, see Archetti and Brown 2004, "<a href="http://harvardforest.fas.harvard.edu/sites/harvardforest.fas.harvard.edu/files/leaves/Archetti_%20Brown_2004.pdf" target="_blank">The coevolution theory of autumn colours</a>".)</p>
<p>But most interesting to me was the discovery that red pigments aren't just late-season hold-outs—production of these pigments is actually ramped up in the fall. Obviously, the "Accidental Autumn" explanation doesn't hold in this case. In their paper, Hoch and colleagues present evidence that anthocyanins, which produce red fall colors, actually help trees prepare for winter.</p>
<p>Here's where the mutants come in. The theory is that the anthocyanins act as a kind of sunblock to protect the leaves while the tree recovers valuable nutrients from the leaves before sending them downward and duffward.</p>
<p>To test this theory, the scientists sampled leaves from normal (wild) trees and from mutant trees that possessed superhuman powers. Well, actually, all trees possess superhuman powers because all trees can produce food from sunlight. (I've yet to meet a human who can do that.) But in this case, affected trees had a mutation that prevented them from producing anthocyanins and turning red in the fall. </p>
<p>It's always easier to understand what your data are showing you when you can look at the results of your analysis in a graph. I used <a href="http://www.minitab.com/products/minitab">Minitab Statistical Software</a> to create a couple of graphs that illustrate some of the results shared in the paper. </p>
Before and after nitrogen levels
<p>The scientists measured the nitrogen levels in the leaves before and after the period when the trees normally recover as much of that valuable nutrient as they can. This graph shows the before and after nitrogen levels for mutant and wild-type specimens of 3 different tree species. The graph shows that the nitrogen levels in the leaves tend to drop more for the wild trees, indicating that they are more successful at recovering the nitrogen than the mutant trees. </p>
<p style="margin-left: 40px;"><img alt="Line plot of before and after nitrogen levels" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/d0620bb01ef55623402cd4b603e3f861/lineplotbeforeafter.jpg" style="width: 459px; height: 306px;" /></p>
Resorption efficiency
<p>This <span><a href="http://blog.minitab.com/blog/quality-data-analysis-and-statistics/bar-charts-decoded">bar chart</a></span> shows the same data, but expressed as "Resorption Efficiency," which is just the percent change between the before and after nitrogen levels. The graph suggests that the lack of anthocyanins hampered the ability of the mutant trees to recover the nitrogen from their leaves. </p>
<p style="margin-left: 40px;"><img alt="Bar chart of resorption efficiency" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/f42ddf69c4e21e9804b053dabef3623c/barchartresorptionefficiency.jpg" style="width: 459px; height: 306px;" /></p>
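<p>If you want to compute resorption efficiency yourself, here is a minimal Python sketch. The function name, the sign convention (positive when nitrogen drops), and the sample nitrogen values are my own assumptions for illustration; they are not data from the paper.</p>

```python
def resorption_efficiency(before: float, after: float) -> float:
    """Percent of leaf nitrogen recovered between the two measurements."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Hypothetical nitrogen levels (arbitrary units), NOT values from the paper.
wild = resorption_efficiency(before=2.0, after=0.8)
mutant = resorption_efficiency(before=2.0, after=1.4)
print(f"wild: {wild:.1f}%, mutant: {mutant:.1f}%")
```

<p>With these made-up numbers, the wild-type tree recovers a larger share of its nitrogen, which is the pattern the bar chart shows.</p>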
<p>So, rather than simply accepting seasonal spikes in scrap waste, it appears that mother nature is a much better quality engineer than we had given her credit for. In addition to dazzling us with some beautiful color before winter sets in, those brilliant reds are actually adding value to the process by helping to reduce waste.</p>
<p>My newfound appreciation for nature's lean genius inspired me to do a little exploring around Minitab's World Headquarters and capture some images of industrious anthocyanins hard at work improving plant profitability. Along with some cows. If you've never had the opportunity to see trees do this—and even if you have—perhaps you'll enjoy the images shared below. </p>
<p>Happy Autumn! </p>
<p><em>Corn rows weave under undulating clouds</em><br />
<img alt="Harvest has come" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/bbd7dc445005ffda4874c3ae424ab730/maze_2.jpg" style="width: 500px; height: 378px;" /></p>
<p><em>Rusty barns rest after the harvest</em><br />
<img alt="Barn" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/292dfb266a78f7fae35a28648b6b33a4/barn__enhanced.jpg" style="width: 500px; height: 262px;" /></p>
<p><em>Rustling stalks spread from road to ridge</em><br />
<img alt="Ridge and meadow" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/f1f734dd3153923e80039eab70dba9e9/ridge_and_meadow.jpg" style="font-size: 13px; width: 500px; height: 269px;" /></p>
<p><i>Heifers</i><em> forage contentedly under a calm fall sky</em><br />
<img alt="Cows" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/46f84b64ecc05fb461d3c9fe7c67d5c5/cows.jpg" style="font-size: 13px; width: 500px; height: 545px;" /></p>
<p><em>Autumn finery frames the fabled Beaver Stadium </em><br />
<img alt="Fabled Beaver Stadium" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/5a6bde627d438a786ca7edaf37f2ca27/stadium_framed_by_field_and_tree.jpg" style="width: 500px; height: 555px;" /></p>
<p><em>Scenic splendor surrounds majestic Mount Nittany </em><br />
<img alt="Mount Nittany" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/4abbb7f7b6a09d5e1a3e92b9205511ee/flaming_frame__bright.jpg" style="width: 500px; height: 258px;" /></p>
<p><em>Wary hawk takes wing amid wild autumn hues</em><br />
<img alt="Hawk on the wing" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/a1fd4a9278394d7917dd4928c6b09c13/soar_2.jpg" style="width: 500px; height: 528px;" /></p>
<p><em>Opportunistic apparitions hang around to haunt passers by</em><br />
<img alt="Ghosts" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/3c8869125bd9b7f17deef5a9506018c0/ghosts2.jpg" style="width: 500px; height: 209px;" /></p>
<p><em>Minitab World Headquarters looms large on the landscape</em><br />
<img alt="Minitab World Headquarters" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/a7e45f704289b992aa7f69eb39a92a8d/peeper_tab.jpg" style="width: 500px; height: 293px;" /></p>
Fun Statistics, Statistics, Statistics in the News
Fri, 18 Nov 2016 13:00:00 +0000
http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/mutant-trees-lay-waste-to-the-landscape-and-reveal-mother-natures-lean-design
Greg Fox
How Effective Are Flu Shots?
http://blog.minitab.com/blog/adventures-in-statistics-2/how-effective-are-flu-shots
<p><img alt="Influenza virus" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/9786f693e9bfb040dea4b7d56bf5c60e/influenza_virus.jpg" style="float: right; width: 175px; height: 256px; margin: 10px 15px;" />Once again, with the arrival of autumn, it's time for a flu shot.</p>
<p>I get a flu shot every year even though I know they’re not perfect. I figure they’re a relatively easy and inexpensive way to reduce the chance of having a miserable week.</p>
<p>I’ve heard on various news media that their effectiveness is about 60%. But what does 60% effectiveness mean, exactly? How much does this actually reduce the chances that I’ll get the flu in any given year? I'm going to explore this and go beyond the news media simplification and present you with very clear answers to these questions. Quite frankly, some of the results were not what I expected.</p>
We’ll Find Our Answers in Randomized, Controlled Trials (RCTs)
<p>I’m a numbers guy. I use numbers to understand the world. My background is in research, so when I want to understand an issue, I look at the primary research. If I can understand the researchers’ methodology, the data they collect, and how they draw their conclusions, I’ll understand the issue at a deeper, more fundamental level than news reports typically provide. </p>
<p>To understand flu shot effectiveness, I’m only going to assess double-blind, randomized controlled trials, the gold standard. These studies are more expensive to conduct but provide better results than observational studies. (I discuss the differences between these two types of studies in my post about the <a href="http://blog.minitab.com/blog/adventures-in-statistics/statistics-that-affect-you-are-vitamin-supplements-really-harmful" target="_blank">benefits of vitamins</a>.)</p>
<p>The two influenza vaccination studies I’ll look at satisfy the above criteria and are listed in a section of references for health professionals on the CDC’s <a href="http://www.cdc.gov/flu/professionals/vaccination/effectivenessqa.htm#references" target="_blank">website</a>. Presumably these studies make a good case, using trusted data. Along the way, we’ll use Minitab <a href="http://www.minitab.com/products/minitab">statistical software</a> to analyze their data for ourselves.</p>
Defining the Effectiveness of Flu Shots
<p>Flu shots contain vaccine for three influenza viruses that researchers predict will be the most common in a given flu season. However, plenty of other viruses (flu and otherwise) also are circulating and can make you sick. Many illnesses with flu-like symptoms are incorrectly attributed to the flu.</p>
<p>Consequently, the best studies use a lab to identify the specific virus that infects each of their sick subjects. These studies only count the subjects with confirmed cases of the three types of influenza virus. Effectiveness is defined as the reduction in these three influenza viruses among those who were vaccinated compared to those who were not vaccinated.</p>
The Two Studies of the Flu Vaccine
<p>It’s time to dig into the data! For me, this is where it gets exciting. You can hear about effectiveness on TV, but this is where it all comes from: counts of sick people in the experimental groups.</p>
The Beran Study
<p>The Beran et al. study<sup>1</sup> assesses the 2006/2007 flu season and tracks its subjects from September to May. Subjects in this study range from 18 to 64 years old.</p>
<table style="text-align: center;">
<tr><th>Treatment</th><th>Flu count</th><th>Group size</th></tr>
<tr><td>Shot</td><td>49</td><td>5103</td></tr>
<tr><td>Placebo</td><td>74</td><td>2549</td></tr>
</table>
<p>Because we want to compare the proportions between two groups, we’ll use the Two Proportions test in Minitab. To do this yourself, in Minitab, go to <strong>Stat > Basic Statistics > 2 Proportions</strong>. In the dialog, choose <strong>Summarized data</strong> and enter the data from the table above. Click<strong> OK</strong>, and you get the results below:</p>
<p style="margin-left: 40px;"><img alt="Minitab's Two Proportions test for the flu data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/a3f37da27803215fb8fa3cd85d0b7924/flustudyberan.gif" style="width: 471px; height: 209px;" /></p>
<p>The p-value of 0.000 tells us that there is a significant difference between the two groups. The estimated difference between the vaccinated group and the placebo group is 1.9 percentage points. Because this is an RCT, it's fairly safe to assume that the vaccination caused the difference between the groups. However, outside of a randomized experiment, it's not wise to assume causality.</p>
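<p>If you don't have Minitab handy, you can roughly cross-check these numbers with a few lines of Python. This is only a sketch: it uses a pooled normal-approximation z-test, which is my assumption and may not match the exact method or defaults behind Minitab's output, so expect small differences in the test statistic.</p>

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided z-test for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p1, p2, z, p_value


# Beran et al. counts from the table above: shot group first, then placebo.
p_shot, p_placebo, z, p_value = two_proportion_z_test(49, 5103, 74, 2549)
print(f"shot: {p_shot:.6f}, placebo: {p_placebo:.6f}")
print(f"difference: {p_shot - p_placebo:.4f}, z = {z:.2f}, p = {p_value:.2g}")
```

<p>The two sample proportions should match the Sample p column, and the p-value should be far below 0.05, consistent with the 0.000 that Minitab displays.</p>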
<p>The vaccine effectiveness (or efficacy) is a relative reduction in risk between the two groups. You simply take the relative risk ratio of (vaccinated proportion/unvaccinated proportion) and subtract that from 1. We can get the proportion for each group from the Sample p column in Minitab’s output:</p>
<p style="margin-left: 40px;">1 - (0.009602/0.029031) = 0.669</p>
<p>This study finds a 66.9% vaccine efficacy for the flu shot compared to the placebo.</p>
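<p>The same arithmetic can be written as a small Python function. The function and argument names are mine, chosen for illustration; the counts are the Beran et al. figures from the table above.</p>

```python
def vaccine_efficacy(cases_vaccinated, n_vaccinated, cases_placebo, n_placebo):
    """Return 1 - relative risk, i.e. the relative reduction in risk."""
    risk_vaccinated = cases_vaccinated / n_vaccinated
    risk_placebo = cases_placebo / n_placebo
    relative_risk = risk_vaccinated / risk_placebo
    return 1 - relative_risk


beran = vaccine_efficacy(49, 5103, 74, 2549)
print(f"Beran et al.: {beran:.1%}")  # prints "Beran et al.: 66.9%"
```

<p>Working from the raw counts rather than the rounded proportions avoids compounding rounding error, although here both routes give 66.9%.</p>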
The Monto Study
<p>The Monto et al. study<sup>2</sup> assesses the 2007/2008 flu season and tracks its subjects from January to April. Subjects in this study range from 18 to 49 years old.</p>
<table style="text-align: center;">
<tr><th>Treatment</th><th>Flu count</th><th>Group size</th></tr>
<tr><td>Shot</td><td>28</td><td>813</td></tr>
<tr><td>Placebo</td><td>35</td><td>325</td></tr>
</table>
<p>We’ll do the Two Proportions test again for this study. This time, enter the numbers from the above table into the dialog.</p>
<p style="margin-left: 40px;"><img alt="Minitab's Two Proportions test for the flu data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/89fb6b907bf288375b5faaeba5ff85ee/flustudymonto.gif" style="width: 469px; height: 210px;" /></p>
<p>Again, the p-value indicates that there is a significant difference between the two groups. The estimated difference between the vaccinated group and the placebo group is 7.3 percentage points. Let's calculate the effectiveness:</p>
<p style="margin-left: 40px;">1 – (0.034440/0.107692) = 0.680</p>
<p>This study finds a 68.0% vaccine efficacy for the flu shot compared to the placebo.</p>
Conclusions So Far
<p>We’ve looked at the data from two gold-standard studies and have drawn the same conclusions that you commonly hear on the news. Flu shots significantly reduce the number of influenza infections, and they are about 68% effective.</p>
<p>However, looking at the data and analyses myself, I have new insights. Specifically, the low number of influenza cases in the placebo group for each study caught my eye, and that’s what we’re looking at next.</p>
What It Means for You: Relative versus Absolute Risk
<p>If you’re like me, the 68% effective statistic isn’t too helpful. The problem is that it is a relative comparison of risk, not an absolute assessment of risk. To illustrate the difference, consider which type of assessment is more useful:</p>
<ol>
<li><strong>Relative assessment:</strong> Your car is travelling half as fast as another car, but you don’t know the true speed of either car.<br />
</li>
<li><strong>Absolute assessment:</strong> Your car is travelling at 30 MPH and the other car is travelling at 60 MPH.</li>
</ol>
<p>Clearly, #2 is much more useful. Similarly, it would be more helpful to know the absolute risk of catching the flu if you get the shot versus not getting it!</p>
Vaccine effectiveness is a relative risk
<p>Vaccine effectiveness doesn’t tell you the exact risk of catching the flu for either group. Instead, it involves dividing one proportion by the other for the relative risk. In fact, as you should recall, effectiveness is one minus the relative risk, which makes it even <em>harder</em> to interpret. An effectiveness of 67% indicates that a vaccinated person has about one-third the risk of contracting the flu as a non-vaccinated person.</p>
<p>Unfortunately, using these numbers, we don’t know the absolute risk for anyone!</p>
The group proportions are the absolute risks
<p>We can estimate the absolute risk from the studies by looking at the proportion for each group in the Minitab output, and subtracting to calculate the absolute reduction. I’ll summarize this information below as percentages and even add in the results for two more flu seasons from another study that the CDC references (Bridges et al.<sup>3</sup>):</p>
<table style="text-align: center;">
<tr><th>Flu season</th><th>Placebo (%)</th><th>Flu Shot (%)</th><th>% Point Reduction</th></tr>
<tr><td>1997/98</td><td>4.4</td><td>2.2</td><td>2.2</td></tr>
<tr><td>1998/99</td><td>10.0</td><td>1.0</td><td>9.0</td></tr>
<tr><td>2006/07</td><td>2.9</td><td>1.0</td><td>1.9</td></tr>
<tr><td>2007/08</td><td>10.8</td><td>3.4</td><td>7.4</td></tr>
<tr><td><strong>Average</strong></td><td><strong>7.0</strong></td><td><strong>1.9</strong></td><td><strong>5.1</strong></td></tr>
</table>
<p>Notice how the risk of getting the flu varies by flu season? The differences are not surprising because the studies use different samples and the flu seasons have different influenza viruses.</p>
<p>So let’s look at the average of these four flu seasons. If you aren’t vaccinated, you have a 7.0% chance of getting the flu. However, if you do get the flu shot, your risk is about 1.9%, which is a reduction of 5.1 percentage points.</p>
<p>Hmm. A "5.1 percentage point reduction" doesn’t sound nearly as impressive as "67% effectiveness"! Both statistics are based on the same data, but I think the estimate of absolute risk is a more useful way to present the results.</p>
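<p>To see the two views side by side, here is a short Python sketch that recomputes the averages from the rounded percentages in the table. The variable names are mine. Because it starts from rounded table values, the per-season relative reductions it prints can differ slightly from the efficacy figures calculated earlier from the raw counts.</p>

```python
# Flu rates (%) per season from the table above: (placebo, flu shot).
seasons = {
    "1997/98": (4.4, 2.2),
    "1998/99": (10.0, 1.0),
    "2006/07": (2.9, 1.0),
    "2007/08": (10.8, 3.4),
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


for season, (placebo, shot) in seasons.items():
    absolute = placebo - shot      # percentage points
    relative = 1 - shot / placebo  # proportion of the placebo risk removed
    print(f"{season}: absolute {absolute:.1f} points, relative {relative:.0%}")

avg_placebo = mean(p for p, _ in seasons.values())
avg_shot = mean(s for _, s in seasons.values())
absolute_reduction = avg_placebo - avg_shot
print(f"average: placebo {avg_placebo:.1f}%, shot {avg_shot:.1f}%, "
      f"reduction {absolute_reduction:.1f} percentage points")
```

<p>The relative figures look large in every season, while the absolute reductions depend heavily on how common the flu was among the unvaccinated that year.</p>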
Closing Thoughts about the Flu Shot Data
<p>I was surprised by the results. While I knew flu shots were not perfect, I always got them because I thought they reduced my risk by more than what the CDC-recommended studies actually show. Even if you aren’t vaccinated, your risk of getting the flu isn’t too high.</p>
<p>That probably explains why a number of people have told me that while they never get flu shots, they can’t remember having the flu!</p>
<p>These more subtle results made me wonder about flu vaccinations on a societal scale. Could the flu vaccine possibly reduce flu cases enough to save sufficient money (lost workdays, doctor and drug costs, etc.) to pay for the vaccinations?</p>
<p>Bridges et al. conducted a cost-benefit analysis in their study. For the two flu seasons where they tracked flu vaccinations, infections, and expenditures, the vaccinations actually <em>increased </em>net societal costs. It would’ve been cheaper overall not to get vaccinated!</p>
<p>In light of this, I wasn’t surprised when I read an <a href="http://www.cnn.com/2013/01/17/health/flu-vaccine-policy/index.html?hpt=hp_bn12" target="_blank">article</a> on CNN.com that said, outside of the U.S. and Canada, other countries do not strongly encourage all of their citizens above 6 months to get the flu shot. According to the article, “global health experts say the data aren’t there yet to support this kind of vaccination policy, nor is there enough money.”</p>
<p>I understand this viewpoint better now.</p>
<p><strong><em>However, I’m not trying to talk anyone out of getting a flu shot.</em></strong> I’m on the fence myself. While the risk of getting the flu in any given year is fairly small, if you regularly get the flu shot, you’ll probably spare yourself a week of misery at some point! You should always consult a medical professional to determine the best decision for your specific situation.</p>
<p>In another post, I look at the <a href="http://blog.minitab.com/blog/adventures-in-statistics/flu-shot-followup-assessing-the-long-term-benefits-of-flu-vaccination">long-term benefits of flu vaccinations</a>.</p>
<p><strong>References</strong></p>
<p>1. Beran J, Vesikari T, Wertzova V, Karvonen A, Honegr K, Lindblad N, Van Belle P, Peeters M, Innis BL, Devaster JM. Efficacy of inactivated split-virus influenza vaccine against culture-confirmed influenza in healthy adults: a prospective, randomized, placebo-controlled trial. J Infect Dis 2009;200(12):1861-9</p>
<p>2. Monto AS, Ohmit SE, Petrie JG, Johnson E, Truscon R, Teich E, Rotthoff J, Boulton M, Victor JC. Comparative efficacy of inactivated and live attenuated influenza vaccines. N Engl J Med. 2009;361(13):1260-7</p>
<p>3. Bridges CB, Thompson WW, Meltzer MI, Reeve GR, Talamonti WJ, Cox NJ, Lilac HA, Hall H, Klimov A, Fukuda K. Effectiveness and cost-benefit of influenza vaccination of healthy working adults: A randomized controlled trial. JAMA. 2000;284(13):1655-63</p>
Data Analysis
Wed, 09 Nov 2016 12:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics-2/how-effective-are-flu-shots
Jim Frost
8 Expert Tips for Excellent Designed Experiments (DOE)
http://blog.minitab.com/blog/understanding-statistics/8-expert-tips-for-excellent-designed-experiments-doe
<p>If your work involves quality improvement, you've at least <em>heard</em> of Design of Experiments (DOE). You probably know it's the most efficient way to optimize and improve your process. But many of us find DOE intimidating, especially if it's not a tool we use often. How do you select an appropriate design, and ensure you've got the right number of factors and levels? And after you've gathered your data, how do you pick the right model for your analysis?</p>
<p><img alt="gauge" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a43ef5d7bf55aac81f8e316f48e5f40e/gauge.png" style="width: 300px; height: 212px; margin: 10px 15px; float: right;" />One way to get started with DOE is the Assistant in Minitab Statistical Software. When you have many factors to evaluate, the Assistant will walk you through <span>a <a href="http://blog.minitab.com/blog/understanding-statistics/applying-doe-for-great-grilling-part-1">DOE to identify which factors matter the most (screening designs)</a></span>. Then the Assistant can guide you through <span>a <a href="http://blog.minitab.com/blog/understanding-statistics/applying-doe-for-great-grilling-part-2">designed experiment to fine-tune the important factors for maximum impact (optimization designs)</a></span>. </p>
<p>If you're comfortable enough to skip the Assistant, but still have some questions about whether you're approaching your DOE the right way, consider the following tips from Minitab's technical trainers. These veterans have done a host of designed experiments, both while working with Minitab customers and in their careers before they became Minitab trainers. </p>
1. Identify the right variable space to study with exploratory runs.
<p>Performing exploratory runs before doing the main experiment can help you identify the settings of your process as performance moves from good to bad. This can help you choose the variable space in which your experiment will yield the most beneficial results. </p>
2. Spread control runs throughout the experiment to measure process stability.
<div style="text-align: left">Since <a href="http://blog.minitab.com/blog/michelle-paret/doe-center-points-what-they-are-why-theyre-useful">center-point runs are usually near-normal operating conditions</a>, they can act as a control to check process performance. By spacing center points evenly through the design, these observations serve as an indicator of the stability of your process—or lack thereof—during the experiment. </div>
3. Identify the biggest problems with Pareto analysis.
<div style="text-align: left">A Pareto chart of product load or defect levels can help you identify which problem to fix that will result in the highest return to your business. Focusing on problems with high business impact improves support for your experiment by raising its priority among all potential improvement projects.</div>
<div style="text-align: left; margin-left: 40px;"><img alt="Pareto Chart of the Effects" src="https://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/4c725391e378fa9b080480d6430d4199/pareto_chart.bmp" style="width: 577px; height: 385px;" /></div>
4. Improve power by expanding the range of input settings.
<div style="text-align: left;">Test the largest range of input variable settings that is physically possible. Even if you think they are far away from the “sweet spot,” this technique will allow you to use the experiment to understand your process so that you can find the optimal settings.</div>
<div style="text-align: left; margin-left: 40px;"><br />
<img alt="Maximizing your variable space can help you discover new insights about your process. " src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/7ade9a7affca6b322f225b3d2fa21186/expand_range_of_settings.gif" style="margin: 10px; border: thin solid; width: 640px; height: 429px;" title="Maximizing your variable space can help you discover new insights about your process. " /></div>
5. Fractionate to save runs, focusing on Resolution V designs.
<p>In many cases, <a href="http://blog.minitab.com/blog/applying-statistics-in-quality-projects/design-of-experiments-fractionating-and-folding-a-doe">it's beneficial to choose a design with ½ or ¼ of the runs of a full factorial</a>. Fractionating confounds some effects with each other, but Resolution V designs limit the damage: main effects and two-way interactions are confounded only with higher-order interactions, so you can still estimate all main effects and two-way interactions. Conducting fewer runs keeps experiment costs low.</p>
<img alt="Choosing the right fractional factorial helps reduce the size of your experiment while minimizing the level of confounding of effects. " src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/cee9a8474f5f880fb7e9c51b59f7b393/available_factorial_designs.png" style="margin: 10px; width: 448px; height: 177px;" title="Choosing the right fractional factorial helps reduce the size of your experiment while minimizing the level of confounding of effects. " />
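<p>To make the idea concrete, here is a minimal Python sketch that builds a half fraction of a five-factor, two-level design. The choice of five factors and the generator E = ABCD are my assumptions for illustration (this is a standard Resolution V construction); Minitab generates these designs for you, and this sketch is not a description of how Minitab does it.</p>

```python
from itertools import product


def half_fraction_2_5():
    """16-run half fraction of a 2^5 design using the generator E = ABCD."""
    runs = []
    for a, b, c, d in product((-1, 1), repeat=4):
        e = a * b * c * d  # defining relation I = ABCDE, hence Resolution V
        runs.append((a, b, c, d, e))
    return runs


design = half_fraction_2_5()
print(f"{len(design)} runs instead of {2 ** 5}")
for run in design[:4]:
    print(run)
```

<p>Each row gives the coded low (-1) and high (+1) settings for factors A through E, and the design needs only 16 of the 32 full-factorial runs.</p>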
6. Improve the power of your experiment with replicates.
<p>Power is the probability of detecting an effect on the response, if that effect exists. The number of replicates affects your experiment's power. To increase the chance that you will be successful identifying the inputs that affect your response, add replicates to your experiment to increase its power.</p>
<p style="margin-left: 40px;"><img alt="Power is a function of the number of replicates. " src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ea56cb24542fe92b4e850c840c823652/replicates_and_power.png" style="margin: 10px; width: 430px; height: 373px;" title="Power is a function of the number of replicates. " /></p>
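<p>For a rough feel of how replicates drive power, here is a Python sketch based on a normal approximation. It assumes a known process standard deviation and a single two-level factor effect, and the effect size, standard deviation, and design size are hypothetical. Minitab's power calculations for factorial designs use a different, more exact method, so treat these numbers as illustrative only.</p>

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect, sigma, runs_per_level, alpha=0.05):
    """Normal-approximation power for detecting a two-level factor effect."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    se = sigma * sqrt(2 / runs_per_level)  # SE of the difference in level means
    shift = abs(effect) / se
    # Two-sided test: probability of landing in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical 2^3 full factorial: 8 base runs, so 4 runs per level per replicate.
for replicates in (1, 2, 3, 4):
    n = 4 * replicates
    power = approx_power(effect=1.0, sigma=1.0, runs_per_level=n)
    print(f"{replicates} replicate(s): power is about {power:.2f}")
```

<p>Under these assumptions, each added replicate shrinks the standard error of the effect estimate, so the estimated power climbs steadily.</p>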
7. Improve power by using quantitative measures for your response.
<p>Reducing defects is the primary goal of most experiments, so it makes sense that defect counts are often used as a response. But defect counts are a very expensive and unresponsive output to measure. Instead, try measuring a quantitative indicator related to your defect level. Doing this can decrease your sample size dramatically and improve the power of your experiment. </p>
8. Study all variables of interest and all key responses.
<p>Factorial designs let you take a comprehensive approach to studying all potential input variables. Removing a factor from the experiment slashes your chance of determining its importance to zero. With the tools available in <a href="http://www.minitab.com/products/minitab">statistical software such as Minitab</a> to help, you shouldn't let fear of complexity cause you to omit potentially important input variables. </p>
<p>Do you have any DOE tips to add to this list?</p>
Design of Experiments, Lean Six Sigma, Quality Improvement, Six Sigma, Statistics
Wed, 02 Nov 2016 13:27:00 +0000
http://blog.minitab.com/blog/understanding-statistics/8-expert-tips-for-excellent-designed-experiments-doe
Eston Martz
Common Assumptions about Data (Part 1: Random Samples and Statistical Independence)
http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-1-random-samples-and-statistical-independence
<p><img alt="horse before the cart road sign" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/cc91865a3d4df6456934528866576a1b/horse_warning_sign.png" style="margin: 10px 15px; float: right; width: 120px; height: 120px;" /></p>
<p>Statistical inference uses data from a sample of individuals to reach conclusions about the whole population. It’s a very <span>powerful tool</span>. But as the saying goes, “With great power comes great responsibility!” When attempting to make inferences from sample data, you must check your assumptions. Violating any of these assumptions can result in false positives or false negatives, thus invalidating your results. In other words, you run the risk that your results are wrong, that your conclusions are wrong, and hence that the solutions you implement won’t solve the problem (unless you’re <em>really</em> lucky!).</p>
<p>You’ve heard the joke about <a href="https://www.goodreads.com/quotes/192478-you-should-never-assume-you-know-what-happens-when-you">what happens when you assume</a>? For this post, let’s instead ask “What happens when you fail to check your assumptions?” After all, we’re human, and humans assume things all the time. Suppose, for example, I want to schedule a phone meeting with you and I’m in the U.S. Eastern time zone. It’s easy for me to assume that everyone is in the same time zone, but you’re really in California, or Australia. What would happen if I called a meeting at 2:00 p.m. but didn’t specify the time zone? Unless you checked, you might be early or late to the meeting, or miss it entirely! </p>
<p>The good news is that when it comes to the assumptions in statistical analysis, Minitab has your back. Minitab 17 has even more features to help you verify and validate the needed statistical analysis assumptions before you finalize your conclusion. When you use <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/angst-over-anova-assumptions-ask-the-assistant">the Assistant in Minitab</a>, the software will identify the appropriate assumptions for your analysis, provide guidance to help you develop robust data collection plans, check the assumptions when you analyze your data, and let you know the results in an easy-to-understand Report Card and Diagnostic Report.</p>
<p>The common data assumptions are: random samples, independence, normality, equal variance, stability, and that your measurement system is accurate and precise. In this post, we’ll address random samples and statistical independence.</p>
What Is the Assumption of Random Samples?
<p>A sample is random when each data point in your population has an equal chance of being included in the sample; therefore <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/collecting-random-data-isnt-monkey-business">selection of any individual happens by chance, rather than by choice</a>. This reduces the chance that differences in materials or conditions strongly bias results. Random samples are more likely to be representative of the population; therefore you can be more confident with your statistical inferences with a random sample. </p>
<p>There is no test that assures random sampling has occurred. Following good sampling techniques will help to ensure your samples are random. Here are some common approaches to making sure a sample is randomly created:</p>
<ul>
<li>Using a random number table or feature in Minitab (Figure 1).</li>
<li>Systematic selection (every nth unit or at specific times during the day).</li>
<li>Sequential selection (taken in sequence for destructive testing, etc.).</li>
<li>Avoiding the use of judgement or convenience to select samples.</li>
</ul>
<p><img alt="Minitab dialog boxes" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9f122a1ab01790cdc5f6530ec3a90a14/assumptions_dialog_box.png" style="border-width: 0px; border-style: solid; width: 700px; height: 421px;" /></p>
<p><em>Figure 1. Random Data Generator in Minitab 17</em></p>
<p>Non-random samples introduce bias and can result in incorrect interpretations.</p>
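<p>As a rough illustration outside Minitab, the first two approaches in the list above can be sketched in a few lines of Python. The function names and the unit IDs are hypothetical and chosen only for this example; this is not how Minitab generates random data.</p>

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k items without replacement; every item has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, k)


def systematic_sample(population, step, seed=None):
    """Take every step-th item, starting from a randomly chosen offset."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return population[start::step]


if __name__ == "__main__":
    units = list(range(1, 101))  # hypothetical unit IDs 1..100
    print(simple_random_sample(units, 10, seed=1))
    print(systematic_sample(units, 10, seed=1))
```

<p>Passing a seed makes the draw reproducible, which is handy when you need to document exactly which units were selected.</p>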
What Is the Assumption of Statistical Independence?
<p>Statistical independence is a critical assumption for many statistical tests, such as the 2-sample t test and ANOVA. Independence means the value of one observation does not influence or affect the value of other observations. Independent data items are not connected with one another in any way (unless you account for it in your model). This includes the observations in both the “between” and “within” groups in your sample. Non-independent observations introduce bias and can make your statistical test give too many false positives. </p>
<p>Following good sampling techniques will help to ensure your samples are independent. Common sources of non-independence include:</p>
<ul>
<li>Observations that are close together in time.</li>
<li>Observations that are close together in space or nested.</li>
<li>Observations that are somehow related.</li>
</ul>
<p>Minitab can test whether two categorical variables are independent using the Chi-Square Test for Association, which determines whether the distribution of observations for one variable is similar across all categories of the second variable. </p>
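<p>To make the idea concrete, here is a minimal sketch of the Pearson chi-square statistic for a two-way table of counts. The counts are invented for illustration, and the function is a hand-rolled approximation of the calculation, not Minitab's implementation.</p>

```python
def chi_square_association(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


if __name__ == "__main__":
    # Hypothetical counts: rows are two machines, columns are pass/fail.
    stat, dof = chi_square_association([[30, 10], [20, 40]])
    print(round(stat, 2), dof)  # 16.67 1
```

<p>With 1 degree of freedom, the critical value at the 0.05 significance level is about 3.84, so a statistic of 16.67 would indicate an association between the two variables in this made-up table.</p>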
The Real Reason You Need to Check the Assumptions
<p>You will be putting a lot of time and effort into collecting and analyzing data. After all the work you put into the analysis, you want to be able to reach correct conclusions. You want to be confident that you can tell whether observed differences between data samples are simply due to chance, or if the populations are indeed different! </p>
<p>It’s easy to put the cart before the horse and just plunge in to the data collection and analysis, but it’s much wiser to take the time to understand which data assumptions apply to the statistical tests you will be using, and plan accordingly.</p>
<p>In my next blog post, I will <a href="http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-2-normality-and-equal-variance">review the Normality and Equal Variance assumptions</a>. </p>
Data Analysis, Statistics, Statistics Help, Stats
Mon, 24 Oct 2016 12:00:00 +0000
http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-1-random-samples-and-statistical-independence
Bonnie K. Stone
Improving Cash Flow and Cutting Costs at Bank Branch Offices
http://blog.minitab.com/blog/understanding-statistics/improving-cash-flow-and-cutting-costs-at-bank-branch-offices
<p>Every day, thousands of people withdraw extra cash for daily expenses. Each transaction may be small, but the total amount of cash disbursed over hundreds or thousands of daily transactions can be very high. But every bank branch has a fixed cash flow, which must be set without knowing what each customer will need on a given day. This creates a challenge for financial entities. Customers expect their local bank office to have adequate cash on hand, so how can a bank confidently ensure each branch has enough funds to handle transactions without keeping too much in reserve?</p>
<p><img alt="Grupo Mutual" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b2366c2da44cd861775ebab6c6d07e55/grupo_mutual_logo_200w_1_.png" style="width: 200px; height: 95px; margin: 10px 15px; float: right;" />A quality project team led by Jean Carlos Zamora and Francisco Aguilar tackled that problem at Grupo Mutual, a financial entity in Costa Rica.</p>
<p>When the project began, each of Grupo Mutual's 55 branches kept additional cash in a vault to avoid having insufficient funds. But without a clear understanding of daily needs, some branches often ran out of cash anyway, while others had significant unused reserves.</p>
<p>When a branch ran short, it created high costs for the company and gave customers three undesirable options: receive the funds as an electronic transfer, wait 1–3 days for consignment, or travel to the main branch to withdraw their cash. Having the right amount of cash in each branch vault would reduce costs and maintain customer satisfaction.</p>
<p>Using <a href="http://www.minitab.com/products/minitab">Minitab Statistical Software</a> and Lean Six Sigma methods, the team set out to determine the optimal amount of currency to store at each branch to avoid both a negative cash flow and idle funds. The team followed the five-phase <a href="http://blog.minitab.com/blog/real-world-quality-improvement/dmaic-vs-dmadv-vs-dfss">DMAIC (Define, Measure, Analyze, Improve, and Control)</a> method. In the Define phase, they set the goal: creating an efficient process that transferred cash from idle vaults to branches that needed it most.</p>
<p>In the Measure phase, the team analyzed two years' worth of cash-flow data from the 55 branches. “Managing the databases and analyzing about 2,000 data points from each of the 55 branches was our biggest challenge,” says Jean-Carlos Zamora Mora, project leader and improvement specialist at Grupo Mutual. “Minitab played a very important part in addressing this issue. It reduced the analysis time by helping us identify where to focus our efforts to improve our process.” </p>
<p>The Analyze phase began with an analysis of variance (ANOVA) to explore how the banks’ cash flow varied per month. They used Minitab to identify which months were different from one another, and grouped similar months together to streamline the analysis. </p>
<p>The team next used control charts to graph the data over time and assess whether or not the process was stable, in preparation for conducting capability analysis. To choose the right control chart and create comprehensive summaries of the results, the team used the Minitab Assistant.</p>
<p style="margin-left: 40px;"><img alt="grupo mutual i-mr chart" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2d9ac9b2597c592e5be5b779bae85076/grupo_mutual_i_mr_chart_1_.png" style="width: 585px; height: 432px;" /></p>
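<p>For readers curious about what sits behind an I-MR chart, the sketch below computes the usual control limits from the average moving range, using the standard constants 2.66 for the individuals chart and 3.267 for the moving range chart. The data values are hypothetical and are not Grupo Mutual's figures.</p>

```python
def imr_limits(values):
    """Compute center line and control limits for an individuals and moving range chart."""
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    center = sum(values) / len(values)
    return {
        "center": center,
        "lcl": center - 2.66 * mr_bar,  # 2.66 = 3 / d2, with d2 = 1.128 for ranges of 2
        "ucl": center + 2.66 * mr_bar,
        "mr_bar": mr_bar,
        "mr_ucl": 3.267 * mr_bar,       # upper limit for the moving range chart
    }


if __name__ == "__main__":
    daily_balance = [10, 12, 11, 13, 12]  # hypothetical daily cash balances
    limits = imr_limits(daily_balance)
    out_of_control = [x for x in daily_balance if not limits["lcl"] <= x <= limits["ucl"]]
    print(limits, out_of_control)
```

<p>Points outside the limits are only one signal of instability; statistical software typically applies additional tests for trends and runs as well.</p>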
<p>The team then performed a capability analysis of each group’s current cash flow to determine whether customer transactions matched the services provided, and establish the percentage of cash used at each branch.</p>
<p style="margin-left: 40px;"><img alt="grupo mutual capability analysis" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f0e25ef8282111550e8fe8733eb889de/grupo_mutual_capability_analysis_1_.png" style="width: 586px; height: 439px;" /></p>
<p>The analysis revealed that, in total, the vaults contained more than the necessary funds each branch needed to operate effectively, but excessive circulation of the money caused some to overdraw their vaults while others stored cash that was not utilized. </p>
<p>“We found a positive cash balance at 95% of the branches,” says Zamora Mora. “The analysis showed the cash on hand to meet customer needs exceeded the requirements by over 200%, so we suddenly had lots of money to invest.” </p>
<p>The analysis gave the team the confidence to move forward with the Improve phase: implementing real-time control charts that enabled management to check each branch’s cash balance throughout the day. Managers could now quickly move cash from branches with excess cash to those needing additional funds, and make more strategic cash flow decisions.</p>
<p>The team found that being able to answer objections with data helped secure buy-in from skeptical stakeholders. “Throughout this project, we encountered questions and situations that could have jeopardized our team’s credibility and our likelihood of success,” recalls Zamora Mora. “But the accuracy and reliability of our data analysis with Minitab was overpowering.” </p>
<p>The changes made during the project increased cash usage by 40% and slashed remittance costs by 60%. The new process also cut insurance costs and shrank risks associated with storing and transporting cash. Overall, the project increased revenue by $1.1 million. </p>
<p>To read a more detailed account of this project, <a href="https://www.minitab.com/Case-Studies/Grupo-Mutual/">click here</a>. </p>
Capability Analysis, Lean Six Sigma, Quality Improvement
Fri, 21 Oct 2016 12:00:00 +0000
http://blog.minitab.com/blog/understanding-statistics/improving-cash-flow-and-cutting-costs-at-bank-branch-offices
Eston Martz
Problems Using Data Mining to Build Regression Models, Part Two
http://blog.minitab.com/blog/adventures-in-statistics-2/problems-using-data-mining-to-build-regression-models-part-two
<p>Data mining can be helpful in the exploratory phase of an analysis. If you're in the early stages and you're just figuring out which predictors are potentially correlated with your response variable, data mining can help you identify candidates. However, there are problems associated with using data mining to select variables.</p>
<p>In my <a href="http://blog.minitab.com/blog/adventures-in-statistics/problems-using-data-mining-to-build-regression-models" target="_blank">previous post</a>, we used data mining to settle on the following model and graphed one of the relationships between the response (C1) and a predictor (C7). It all looks great! The only problem is that all of these data are randomly generated! No true relationships are present. </p>
<p style="margin-left: 40px;"><img alt="Regression output for data mining example" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/24e98167e2dfd848b346292af371acf3/regression_swo.png" style="width: 364px; height: 278px;" /></p>
<p style="margin-left: 40px;"><img alt="Scatter plot for data mining example" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/6e4dfb991b33031738756d4b2d1c77e4/scatterplot.png" style="width: 576px; height: 384px;" /></p>
<p>If you didn't already know there was no true relationship between these variables, these results could lead you to a very inaccurate conclusion.</p>
<p>Let's explore how these problems happen, and how to avoid them.</p>
Why <em>Do </em>These Problems Occur with Data Mining?
<p>The problem with data mining is that you fit many different models, trying lots of different variables, and you pick your final model based mainly on statistical significance, rather than being guided by theory.</p>
<p>What's wrong with that approach? The problem is that every statistical test you perform has a chance of a false positive. A false positive in this context means that the <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values" target="_blank">p-value</a> is statistically significant but there really is no relationship between the variables at the population level. If you set the <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests:-significance-levels-alpha-and-p-values-in-statistics" target="_blank">significance level at 0.05</a>, you can expect that in 5% of the cases where the null hypothesis is true, you'll have a false positive.</p>
<p>Because of this false positive rate, if you analyze many different models with many different variables you will inevitably find false positives. And if you're guided mainly by statistical significance, you'll leave the false positives in your model. If you keep going with this approach, you'll fill your model with these false positives. That’s exactly what happened in our example. We had 100 candidate predictor variables and the stepwise procedure literally dredged through hundreds and hundreds of potential models to arrive at our final model.</p>
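<p>A quick back-of-the-envelope calculation shows how fast this adds up. The sketch below assumes the tests are independent, which is a simplification (stepwise procedures reuse the same data), so treat the numbers as a rough guide rather than an exact rate.</p>

```python
def prob_at_least_one_false_positive(alpha, n_tests):
    """Chance of one or more false positives across n independent tests of true nulls."""
    return 1 - (1 - alpha) ** n_tests


def expected_false_positives(alpha, n_tests):
    """Average number of false positives when every null hypothesis is true."""
    return alpha * n_tests


if __name__ == "__main__":
    for n in (1, 10, 100):
        p = prob_at_least_one_false_positive(0.05, n)
        print(n, round(p, 3), expected_false_positives(0.05, n))
```

<p>With 100 candidate predictors and a significance level of 0.05, you would expect about 5 false positives on average, and the chance of seeing at least one is above 99%.</p>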
<p>As we’ve seen, data mining problems can be hard to detect. The numeric results and graph all look great. However, these results don’t represent true relationships but instead are chance correlations that are bound to occur with enough opportunities.</p>
<p>If I had to name my favorite R-squared, it would be <a href="http://blog.minitab.com/blog/adventures-in-statistics/multiple-regession-analysis-use-adjusted-r-squared-and-predicted-r-squared-to-include-the-correct-number-of-variables" target="_blank">predicted R-squared</a>, without a doubt. However, even predicted R-squared can't detect all problems. Ultimately, even though the predicted R-squared is moderate for our model, the ability of this model to predict accurately for an entirely new data set is practically zero.</p>
Theory, the Alternative to Data Mining
<p>Data mining can have a role in the exploratory stages of an analysis. However, for all variables that you identify through data mining, you should perform a confirmation study using newly collected data to verify the relationships in the new sample. Failure to do so can be very costly. Just imagine if we had made decisions based on the model above!</p>
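<p>Here is a small simulation of why a confirmation study matters. It is a simplified stand-in for the stepwise example, not a reproduction of it: the response and all predictors are pure random noise, the "mining" step just picks the predictor with the strongest sample correlation, and the sample sizes and function names are assumptions made for this sketch.</p>

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mine_then_confirm(n_obs=30, n_predictors=100, seed=0):
    """Pick the best of many noise predictors, then re-check it on fresh noise."""
    rng = random.Random(seed)

    def draw():
        return [rng.gauss(0, 1) for _ in range(n_obs)]

    response = draw()
    predictors = [draw() for _ in range(n_predictors)]
    in_sample = [pearson_r(p, response) for p in predictors]
    best = max(range(n_predictors), key=lambda i: abs(in_sample[i]))
    # Confirmation study: because nothing is truly related, newly collected
    # values of the chosen predictor and the response are just fresh noise.
    confirm_r = pearson_r(draw(), draw())
    return best, in_sample[best], confirm_r


if __name__ == "__main__":
    best, mined_r, confirm_r = mine_then_confirm()
    print(f"predictor {best}: mined r = {mined_r:.2f}, confirmation r = {confirm_r:.2f}")
```

<p>The mined correlation tends to look respectable because it is the largest of 100 chance correlations, while the correlation in the new sample has no such selection advantage and will usually sit much closer to zero.</p>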
<p>An alternative to data mining is to use theory as a guide in terms of both the models you fit and the evaluation of your results. Look at what others have done and incorporate those findings when building your model. Before beginning the regression analysis, develop an idea of what the important variables are, along with their expected relationships, coefficient signs, and effect magnitudes.</p>
<p>Building on the results of others makes it easier both to collect the correct data and to specify the best regression model without the need for data mining. The difference is the process by which you fit and evaluate the models. When you’re guided by theory, you reduce the number of models you fit and you assess properties beyond just statistical significance.</p>
<p>Theoretical considerations should not be discarded based solely on statistical measures.</p>
<ul>
<li>Compare the coefficient signs to theory. If any of the signs contradict theory, investigate and either change your model or explain the inconsistency.</li>
<li>Use <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">Minitab statistical software</a> to create factorial plots based on your model to see if all the effects match theory.</li>
<li>Compare the <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit" target="_blank">R-squared</a> for your study to those of similar studies. If your R-squared is very different from those in similar studies, it's a sign that your model may have a problem.</li>
</ul>
<p>If you’re interested in learning more about these issues, read my post about <a href="http://blog.minitab.com/blog/adventures-in-statistics/beware-of-phantom-degrees-of-freedom-that-haunt-your-regression-models">how using too many <em>phantom</em> degrees of freedom is related to data mining problems</a>.</p>
Data Analysis, Hypothesis Testing, Learning, Regression Analysis, Statistics, Statistics Help
Wed, 19 Oct 2016 12:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics-2/problems-using-data-mining-to-build-regression-models-part-two
Jim Frost