Data Analysis Software | Minitab
Blog posts and articles with tips for using statistical software to analyze data for quality improvement.
http://blog.minitab.com/blog/data-analysis-software/rss
Fri, 09 Dec 2016 09:35:08 +0000
The Difference Between Right-, Left- and Interval-Censored Data
http://blog.minitab.com/blog/michelle-paret/the-difference-between-right-left-and-interval-censored-data
<p><a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/reliability-and-survival-the-high-stakes-of-product-performance">Reliability analysis</a> is the perfect tool for calculating the proportion of items that you can expect to survive for a specified period of time under identical operating conditions. Light bulbs—or lamps—are a classic example. Want to calculate the number of light bulbs expected to fail within 1000 hours? Reliability analysis can help you answer this type of question.</p>
<p>But to conduct the analysis properly, we need to understand the difference between the three types of censoring.</p>
What is censored data?
<p>When you perform reliability analysis, you may not have exact failure times for all items. In fact, lifetime data are often "censored." Using the light bulb example, perhaps not all the light bulbs have failed by the time your study ends. The time data for those bulbs that have not yet failed are referred to as censored.</p>
<img alt="baby" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/913ae1dbf78dd9728367bf0dead44f45/baby.jpg" style="width: 250px; height: 244px; margin: 10px 15px; float: right;" />
<p>It is important to include the censored observations in your analysis because the fact that these items have not yet failed has a big impact on your reliability estimates.</p>
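<p>To see why the censored observations matter, here is a minimal Python sketch (not Minitab output) that compares a Kaplan-Meier survival estimate that keeps right-censored bulbs with a naive estimate that drops them. The bulb lifetimes and the function name <code>kaplan_meier</code> are hypothetical and chosen only for illustration.</p>

```python
def kaplan_meier(times, failed):
    """Return [(t, S(t))] at each distinct failure time.

    times  -- observed time for each item (failure time, or time the study ended)
    failed -- True if the item failed at that time, False if it is right-censored
    """
    survival = 1.0
    curve = []
    for t in sorted({t for t, f in zip(times, failed) if f}):
        # Items still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Items that failed exactly at time t.
        failures = sum(1 for u, f in zip(times, failed) if f and u == t)
        survival *= 1 - failures / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical data: hours on test for 6 bulbs; the study ended at 1000 hours.
hours = [400, 700, 900, 1000, 1000, 1000]
failed = [True, True, True, False, False, False]

with_censored = kaplan_meier(hours, failed)
# Naive approach: throw away the bulbs that were still burning at the end.
failures_only = kaplan_meier(hours[:3], failed[:3])

print("keeping censored bulbs:", with_censored[-1])
print("dropping censored bulbs:", failures_only[-1])
```

<p>In this made-up example, keeping the three censored bulbs gives an estimated 50% chance of surviving past 900 hours, while dropping them drives the estimate to 0%.</p>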
Right-censored data
<p>Let’s move from light bulbs to newborns, inspired by my colleague who’s at the “you’re <em>still </em>here?” stage of pregnancy.</p>
<p>Suppose you’re conducting a study on pregnancy duration. You’re ready to complete the study and run your analysis, but some women in the study are still pregnant, so you don’t know exactly how long their pregnancies will last. These observations would be <em>right-censored</em>. The “failure,” or birth in this case, will occur after the recorded time.</p>
<p style="margin-left: 40px;"><img alt="Right censored" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/c75961d3d78018da3800683ab233c989/right_censored.png" style="width: 291px; height: 241px;" /></p>
Left-censored data
<p>Now suppose you survey some women in your study at the 250-day mark, but they have already had their babies. You know they gave birth before day 250, but you don’t know <em>exactly</em> when. These are therefore <em>left-censored</em> observations, where the “failure” occurred before a particular time.</p>
<p style="margin-left: 40px;"><img alt="Left censored" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/7279d0487d0b3d08120e224456bafc2f/left_censored.png" style="width: 237px; height: 242px;" /></p>
Interval-censored data
<p>If we don’t know exactly when some babies were born but we know it was within some interval of time, these observations would be <em>interval-censored</em>. We know the “failure” occurred within some given time period. For example, we might survey expectant mothers every 7 days and then count the number who had a baby within that given week.</p>
<p style="margin-left: 40px;"><img alt="Interval censored" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/deb69487d6f3256172beefe22b4ecbf6/intervalcensored.png" style="width: 253px; height: 241px;" /></p>
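<p>One way to keep the three censoring types straight is to record each observation as a lower and an upper bound on the unknown event time. The sketch below is an illustrative layout only, not a description of how Minitab stores its data; the function name <code>censoring_type</code> and the day counts are hypothetical.</p>

```python
def censoring_type(lower, upper):
    """Classify an observation by the bounds on its unknown event time.

    lower -- last time the event was known NOT to have happened (None if unknown)
    upper -- first time the event was known to have happened (None if not yet seen)
    """
    if lower is not None and upper is not None:
        if lower > upper:
            raise ValueError("lower bound exceeds upper bound")
        return "exact" if lower == upper else "interval-censored"
    if lower is not None:
        return "right-censored"   # event happens some time after `lower`
    if upper is not None:
        return "left-censored"    # event happened some time before `upper`
    raise ValueError("need at least one bound")


# Hypothetical pregnancy-duration observations, in days.
observations = [
    (270, None),  # still pregnant at day 270 when the study ended
    (None, 250),  # had already given birth when surveyed at day 250
    (266, 273),   # gave birth sometime between two weekly surveys
    (268, 268),   # birth recorded on the exact day
]
for lower, upper in observations:
    print(lower, upper, censoring_type(lower, upper))
```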
<p>Once you set up your data, running the analysis is easy with <a href="http://www.minitab.com/products/minitab/">Minitab Statistical Software</a>. For more information on how to run the analysis and interpret your results, see <a href="http://blog.minitab.com/blog/fun-with-statistics/what-i-learned-from-treating-childbirth-as-a-failure">this blog post</a>, which—coincidentally—is baby-related, too.</p>
Lean Six Sigma, Quality Improvement, Reliability Analysis, Six Sigma
Wed, 07 Dec 2016 14:03:00 +0000
Michelle Paret
Common Assumptions about Data Part 3: Stability and Measurement Systems
http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-3-stability-and-measurement-systems
<p><img alt="Cart before the horse" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/8230e7c2bc193a831158677a70eb0146/chile_road_sign_po_4.svg" style="width: 101px; height: 101px; float: right; margin: 10px 15px;" />In Parts <span><a href="http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-1-random-samples-and-statistical-independence">1</a></span> and <span><a href="http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-2-normality-and-equal-variance">2</a></span> of this blog series, I wrote about how statistical inference uses data from a sample of individuals to reach conclusions about the whole population. That’s a very powerful tool, but you must check your assumptions when you make statistical inferences. Violating any of these assumptions can result in false positives or false negatives, thus invalidating your results. </p>
<p>The common data assumptions are: random samples, independence, normality, equal variance, stability, and an accurate and precise measurement system. I addressed random samples and statistical independence in Part 1, and normality and equal variance in Part 2. Now let’s consider the assumptions of stability and measurement systems.</p>
What Is the Assumption of Stability?
<p>A stable process is one in which the inputs and conditions are consistent over time. When a process is stable, it is said to be “in control.” This means the sources of variation are consistent over time, and the process does not exhibit unpredictable variation. In contrast, if a process is unstable and changing over time, the sources of variation are inconsistent and unpredictable. As a result of the instability, you cannot be confident in your statistical test results.</p>
<p>Use one of the various types of <span><a href="http://blog.minitab.com/blog/understanding-statistics/what-control-chart-should-i-use">control charts</a></span> available in Minitab <a href="http://www.minitab.com/products/minitab/">Statistical Software</a> to assess the stability of your data set. The Assistant menu can walk you through the choices to select the appropriate control chart based on your data and subgroup size. You can get advice about collecting and using data by clicking the “more” link.</p>
<p><img alt="Choose a Control Chart" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/6ec77f5dbc070eb0c2070ce6bcf8144c/1_control_chart.png" style="border-width: 0px; border-style: solid; width: 474px; height: 338px; margin: 10px 15px;" /></p>
<p><img alt="I-MR Control Chart" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/3d69fc444cd5dd09a962a11e645a3a2e/2_control_chart.png" style="border-width: 0px; border-style: solid; width: 474px; height: 338px; margin: 10px 15px;" /></p>
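<p>For readers who want to see the arithmetic behind an individuals (I) chart, here is a minimal Python sketch. It uses the standard textbook approach of estimating sigma from the average moving range divided by the constant 1.128; it is not Minitab's code, and the function name and measurements are hypothetical.</p>

```python
def individuals_chart_limits(values):
    """Return (LCL, center line, UCL) for an individuals (I) chart.

    Sigma is estimated as the average moving range of consecutive points
    divided by d2 = 1.128, the standard constant for ranges of two points.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    sigma = mr_bar / 1.128
    return center - 3 * sigma, center, center + 3 * sigma


# Hypothetical measurements in time order; the last one is a deliberate spike.
data = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 10.0, 9.7, 10.1, 13.5]
lcl, center, ucl = individuals_chart_limits(data)

# Points outside the 3-sigma limits suggest the process is not stable.
out_of_control = [(i, x) for i, x in enumerate(data) if x < lcl or x > ucl]
print(round(lcl, 3), round(center, 3), round(ucl, 3))
print(out_of_control)
```

<p>With these made-up numbers, only the final point falls outside the limits, which is the kind of signal a control chart is designed to surface.</p>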
<p>In addition to preparing the control chart, Minitab tests for out-of-control or non-random patterns based on the <a href="http://blog.minitab.com/blog/statistics-in-the-field/using-the-nelson-rules-for-control-charts-in-minitab">Nelson Rules</a> and provides an assessment in easy-to-read Summary and Stability reports. Depending on the control chart selected, the Report Card automatically checks your assumptions about stability, normality, amount of data, and correlation, and suggests alternative charts to further analyze your data.</p>
<p><img alt="Report Card" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/195741e519156b95ee5feee8b521041f/3_control_chart.jpg" style="border-width: 0px; border-style: solid; width: 464px; height: 348px; margin: 10px 15px;" /></p>
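<p>As an illustration of the kind of pattern test mentioned above, the following Python sketch checks one of the Nelson Rules: nine points in a row on the same side of the center line. It is a simplified stand-in, not Minitab's implementation; the function name, the center line of 10.0, and the data are hypothetical.</p>

```python
def nine_in_a_row(values, center, run_length=9):
    """Return the indices at which a run of `run_length` consecutive points
    on the same side of the center line is reached or extended."""
    flagged = []
    run = 0
    last_side = 0
    for i, x in enumerate(values):
        side = (x > center) - (x < center)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == last_side:
            run += 1
        else:
            # A point on the line, or a switch of sides, restarts the run.
            run = 1 if side != 0 else 0
        last_side = side
        if run >= run_length:
            flagged.append(i)
    return flagged


# Hypothetical data: two ordinary points, then nine in a row above 10.0.
data = [10.2, 9.9] + [10.1] * 9 + [9.8]
print(nine_in_a_row(data, center=10.0))
```

<p>None of these points is far from the center line, so a simple 3-sigma check would pass them; the run itself is the signal that the process may have shifted.</p>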
What Is the Assumption for Measurement Systems?
<p>All the other assumptions I’ve described “assume” the data reflects reality. But does it?</p>
<p>The <span><a href="http://blog.minitab.com/blog/understanding-statistics/explaining-quality-statistics-so-my-boss-will-understand-measurement-systems-analysis-msa">measurement system</a> </span>is one potential source of variability when measuring a product or process. When a measurement system is poor, you lose the ability to truthfully “see” process performance. A poor measurement system leads to incorrect conclusions and flawed implementation. </p>
<p>Minitab can perform a Gage R&R test for both measurement and appraisal data, depending on your measurement system. You can use the Assistant in Minitab to help you select the most appropriate test based on the type of measurement system you have.</p>
<p><img alt="Choose a MSA" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/3ff089fcee9ab280c8e8d1da1c56d610/4_msa.png" style="border-width: 0px; border-style: solid; width: 474px; height: 345px; margin: 10px 15px;" /></p>
<p>There are two assumptions that should be satisfied when performing a Gage R&R for measurement data: </p>
<ol>
<li>The measurement device should be calibrated.</li>
<li>The parts to be measured should be selected from a stable process and cover approximately 80% of the possible operating range. </li>
</ol>
<p>When using a measurement device, make sure it is properly calibrated, and check for linearity, bias, and stability over time. The device should produce accurate measurements, compared to a standard value, through the entire range of measurements and throughout the life of the device. Many companies have a metrology or calibration department responsible for calibrating and maintaining gauges. </p>
<p>Both these assumptions must be satisfied. If they are not, you cannot be sure that your data accurately reflect reality. And that means you’ll risk not understanding the sources of variation that influence your process outcomes. </p>
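<p>The bias and linearity checks mentioned above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a full gage study and not Minitab's procedure: bias is taken as the average reading minus the known reference value, and linearity is summarized by the least-squares slope of bias against reference value. The reference standards and readings are hypothetical.</p>

```python
from statistics import mean


def bias(measurements, reference):
    """Average reading minus the known reference value."""
    return mean(measurements) - reference


def linearity_slope(references, biases):
    """Least-squares slope of bias versus reference value.

    A slope near zero suggests the bias is roughly constant over the range.
    """
    x_bar, y_bar = mean(references), mean(biases)
    sxx = sum((x - x_bar) ** 2 for x in references)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(references, biases))
    return sxy / sxx


# Hypothetical study: three reference standards, each measured three times.
readings = {
    2.0: [2.1, 2.0, 2.2],
    6.0: [6.0, 6.1, 5.9],
    10.0: [9.8, 9.9, 9.7],
}
references = list(readings)
biases = [bias(readings[ref], ref) for ref in references]
slope = linearity_slope(references, biases)

for ref, b in zip(references, biases):
    print(f"reference {ref}: bias {b:+.3f}")
print(f"linearity slope: {slope:+.4f}")
```

<p>In this made-up data the device reads slightly high at the low end and slightly low at the high end, so the slope is negative. Whether a slope of that size matters depends on your tolerance, which is a judgment this sketch does not make.</p>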
The Real Reason You Need to Check the Assumptions
<p>Collecting and analyzing data requires a lot of time and effort on your part. After all the work you put into your analysis, you want to be able to reach correct conclusions. Some analyses are robust to departures from these assumptions, but take the safe route and check! You want to be confident you can tell whether observed differences between data samples are simply due to chance, or if the populations are indeed different! </p>
<p>It’s easy to put the cart before the horse and just plunge into data collection and analysis, but it’s much wiser to take the time to understand which data assumptions apply to the statistical tests you will be using, and plan accordingly.</p>
<p>Thank you for reading my blog. I hope this information helps you with your data analysis mission!</p>
Data Analysis, Hypothesis Testing, Quality Improvement, Statistics
Mon, 05 Dec 2016 13:00:00 +0000
Bonnie K. Stone
The Joy of Playing in Endless Backyards with Statistics
http://blog.minitab.com/blog/adventures-in-statistics/the-joy-of-playing-in-endless-backyards-with-statistics
<p>Dear Readers,</p>
<p><img alt="Jim Frost" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/1ae3640a9bb3396a48ee4478020340d5/avatar.png" style="width: 131px; height: 186px; float: right; margin: 10px 15px;" />As 2016 comes to a close, it’s time to reflect on the passage of time and changes. As I’m sure you’ve guessed, I love statistics and analyzing data! I also love talking and writing about it. In fact, I’ve been writing statistical blog posts for over five years, and it’s been an absolute blast. John Tukey, the renowned statistician, once said, “The best thing about being a statistician is that you get to play in everyone’s backyard.” I enthusiastically agree!</p>
<p>However, when I first started writing the blog, I wondered about being able to keep up a constant supply of fresh blog posts. And, when I first mentioned to some non-statistician friends that I’d be writing a statistical blog, I noticed a certain lack of enthusiasm. For instance, I heard a variety of comments like, “So, you’ll be writing things along the lines of 9 out of 10 dentists recommend . . .” Would readers even be interested in what I had to say about statistics?</p>
<p>It turns out that with a curious mind, statistical knowledge, data, and a powerful tool like <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">Minitab statistical software</a>, the possibilities are endless. You <em>can</em> play in a wide variety of fascinating backyards! </p>
<p>The most surprising statistic is that <a href="http://blog.minitab.com/blog/adventures-in-statistics" target="_blank">my blog posts</a> have received over 5.5 million views in the past year alone. Never in my wildest dreams did I imagine so many readers when I wrote <a href="http://blog.minitab.com/blog/adventures-in-statistics/three-measurement-system-analysis-questions-to-ask-before-you-take-a-single-measurement" target="_blank">my first post</a>! It’s a real testament to the growing importance of data analysis that so many people are interested in a blog dedicated to statistics. Thank you all for reading!</p>
Endless Backyards . . .
<p><img alt="Dolphin" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/f9c1d0c9fbd374b7272f5ee2ee2716c0/dolphin.jpg" style="width: 225px; height: 150px; float: right; margin: 10px 15px;" />Some of the topics I've written about are out of this world. I’ve assessed <a href="http://blog.minitab.com/blog/adventures-in-statistics/using-statistics-to-analyze-words" target="_blank">dolphin communications</a> and compared them to the search for extraterrestrial intelligence, and I’ve analyzed <a href="http://blog.minitab.com/blog/adventures-in-statistics/exoplanet-statistics-and-the-search-for-earth-twins" target="_blank">exoplanet data</a> in the search for Earth’s twin! (As an aside, my analysis showed that my writing style is similar to dolphin communications. I'll take that as a compliment!)</p>
<p>For more Earthly subjects, I’ve studied the relationship between <a href="http://blog.minitab.com/blog/adventures-in-statistics/size-matters-metabolic-rate-and-longevity" target="_blank">mammal size and their metabolic rate and longevity</a>. I’ve analyzed raw research data to assess the <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-effective-are-flu-shots" target="_blank">effectiveness of flu shots</a> first hand. I’ve downloaded economic data to assess patterns in both the <a href="http://blog.minitab.com/blog/adventures-in-statistics/reassessing-gdp-growth-part-1" target="_blank">U.S. GDP</a> and <a href="http://blog.minitab.com/blog/adventures-in-statistics/us-job-growth-assessing-the-numbers-and-making-predictions" target="_blank">U.S. job growth</a>. For a Thanksgiving Day post, I analyzed world income data to answer the question of <a href="http://blog.minitab.com/blog/adventures-in-statistics/statistically-how-thankful-should-we-be-a-look-at-global-income-distributions-part-1" target="_blank">how thankful we should be statistically</a>. As for <a href="http://blog.minitab.com/blog/adventures-in-statistics/when-is-easter-for-the-next-2086-years" target="_blank">Easter</a>, I can tell you the date on which it falls in any of 2,517 years, along with which dates are the most and least common.</p>
<p><img alt="Mythbusters" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/7b3b8859da99d60dd3e9c7932faefba3/mythbusters.jpg" style="width: 225px; height: 149px; float: right; margin: 10px 15px;" />In the world of politics, I’ve used data to <a href="http://blog.minitab.com/blog/adventures-in-statistics/predicting-the-us-presidential-election-evaluating-two-models-part-one" target="_blank">predict the 2012 U.S. Presidential election</a>, <a href="http://blog.minitab.com/blog/adventures-in-statistics/statistical-analyses-of-the-house-freedom-caucus-and-the-search-for-a-new-speaker" target="_blank">analyzed the House Freedom Caucus and the search for the new Speaker of the House</a>, assessed the <a href="http://blog.minitab.com/blog/adventures-in-statistics/great-presidents-revisited-does-history-provide-a-different-perspective" target="_blank">factors that make a great President</a>, and even <a href="http://blog.minitab.com/blog/adventures-in-statistics/using-the-solution-desirability-matrix-to-help-mitt-romney-choose-the-vp-candidate" target="_blank">helped Mitt Romney pick a running mate</a>. Everyone talks about the weather, so of course I had to <a href="http://blog.minitab.com/blog/adventures-in-statistics/are-atlantas-winters-getting-colder-and-snowier" target="_blank">analyze that</a>. My family loves the Mythbusters, and it was fun applying statistical analyses to some of the myths that they tested (<a href="http://blog.minitab.com/blog/adventures-in-statistics/busting-the-mythbusters-are-yawns-contagious" target="_blank">here</a> and <a href="http://blog.minitab.com/blog/adventures-in-statistics/using-hypothesis-tests-to-bust-myths-about-the-battle-of-the-sexes" target="_blank">here</a>). That's my family and me meeting them in the picture to the right!</p>
<p>Some of my posts have even been a bit surreal. I took my turn at attempting to explain the statistical illusion of the <a href="http://blog.minitab.com/blog/adventures-in-statistics/the-monty-hall-problem-and-the-importance-of-checking-your-assumptions" target="_blank">infamous Monty Hall problem</a>. I’ve compared <a href="http://blog.minitab.com/blog/adventures-in-statistics/world-travel-bumpy-roads-and-adjusting-your-graph-scales" target="_blank">world travel to adjusting scales in graphs</a> (seriously). I wrote a true story about how <a href="http://blog.minitab.com/blog/adventures-in-statistics/lessons-in-quality-during-a-long-and-strange-journey-home" target="_blank">I drove a plane load of passengers 200 miles to their homes</a> in the context of <img alt="ghost hunting" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/51587c9ccc575874d23335f607e520a0/nightshot.jpg" style="width: 225px; height: 127px; float: right; margin: 10px 15px;" />quality improvement! For Halloween-themed posts, I showed how to go <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-be-a-ghost-hunter-with-a-statistical-mindset" target="_blank">ghost hunting with a statistical mindset</a> and how <a href="http://blog.minitab.com/blog/adventures-in-statistics/beware-of-phantom-degrees-of-freedom-that-haunt-your-regression-models" target="_blank">regression models can be haunted by phantom degrees of freedom</a>. I analyzed the <a href="http://blog.minitab.com/blog/adventures-in-statistics/using-data-analysis-to-assess-fatality-rates-in-star-trek-the-original-series" target="_blank">fatality rates in the original Star Trek TV series</a>. I explored how some people can <a href="http://blog.minitab.com/blog/adventures-in-statistics/the-odds-of-finding-a-four-leaf-clover-revisited-how-do-some-people-find-so-many" target="_blank">find so many four leaf clovers despite their rarity</a>. 
And I wondered whether <a href="http://blog.minitab.com/blog/adventures-in-statistics/can-a-statistician-say-that-age-is-just-a-number" target="_blank">a statistician can say that age is just a number</a>.</p>
<p>See, not a mention of those dentists...well, not until now. By this point, 9 out of 10 dentists are probably feeling neglected!</p>
Helping Others Perform Their Own Analyses
<p>I’ve also written many posts aimed at helping those who are learning and performing statistical analyses. I described <a href="http://blog.minitab.com/blog/adventures-in-statistics/working-at-the-edge-of-human-knowledge-part-one" target="_blank">why statistics is cool</a> based on my own personal experiences and how the whole <a href="http://blog.minitab.com/blog/adventures-in-statistics/why-statistics-is-important" target="_blank">field of statistics is growing in importance</a>. I showed how <a href="http://blog.minitab.com/blog/adventures-in-statistics/why-anecdotal-evidence-is-unreliable" target="_blank">anecdotal evidence is unreliable</a> and explained why it fails so badly. And, I took a look forward at how <a href="http://blog.minitab.com/blog/adventures-in-statistics/expanding-the-role-of-statistics-to-areas-traditionally-dominated-by-expert-judgment" target="_blank">statistical analyses are expanding into areas traditionally ruled by expert judgement</a>.</p>
<p>I zoomed in to cover the details about how to perform and interpret statistical analyses. Some might think that covering the nitty gritty of statistical best practices is boring. Yet, you’d be surprised by the lively discussions we’ve had. We’ve had heated debates and philosophical discussions about <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values" target="_blank">how to correctly interpret p-values</a> and what <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests:-significance-levels-alpha-and-p-values-in-statistics" target="_blank">statistical significance</a> does and does not tell you. This reached a fever pitch when a psychology journal actually <a href="http://blog.minitab.com/blog/adventures-in-statistics/banned-p-values-and-confidence-intervals-a-rebuttal-part-1" target="_blank">banned p-values</a>!</p>
<p><img alt="Regression residuals" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/58964ccf1cb00ead2ee1735ca54886d9/residual_illustration.gif" style="width: 221px; height: 149px; float: right; border-width: 0px; border-style: solid; margin: 10px 15px;" />We had our difficult questions and surprising topics to grapple with. <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-high-should-r-squared-be-in-regression-analysis" target="_blank">How high should R-squared be</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/choosing-between-a-nonparametric-test-and-a-parametric-test" target="_blank">Should I use a parametric or nonparametric analysis</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-interpret-a-regression-model-with-low-r-squared-and-low-p-values" target="_blank">How is it possible that a regression model can have significant variables but still have a low R-squared</a>? I even had the nerve to suggest that <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-how-to-interpret-s-the-standard-error-of-the-regression" target="_blank">R-squared is overrated</a>! And, I made the unusual case that control charts are also <a href="http://blog.minitab.com/blog/adventures-in-statistics/control-charts-not-just-for-statistical-process-control-spc-anymore" target="_blank">very important outside the realm of quality improvement</a>. Then, there is the whole frequentist versus Bayesian debate, but let’s not go there!</p>
<p>However, it’s true that not all topics about how to perform statistical analyses are riveting. I still love these topics. The world is becoming an increasingly data-driven place, and to produce trustworthy results, you must analyze your data correctly. After all, it’s surprisingly easy to <a href="http://blog.minitab.com/blog/adventures-in-statistics/applied-regression-analysis-how-to-present-and-use-the-results-to-avoid-costly-mistakes-part-1" target="_blank">make a costly mistake</a> if you don’t know what you’re doing.</p>
<p><img alt="F-distribution with probability" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/6303a2314437d8fcf2f72d9a56b1293a/f_distribution_probability.png" style="width: 250px; height: 167px; float: right; margin: 10px 15px;" />A data-driven world requires an analyst to understand seemingly esoteric details such as: the <a href="http://blog.minitab.com/blog/adventures-in-statistics/curve-fitting-with-linear-and-nonlinear-regression" target="_blank">different methods of fitting curves</a>, <a href="http://blog.minitab.com/blog/adventures-in-statistics/the-danger-of-overfitting-regression-models" target="_blank">the dangers of overfitting your model</a>, <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit" target="_blank">assessing goodness-of-fit</a>, <a href="http://blog.minitab.com/blog/adventures-in-statistics/why-you-need-to-check-your-residual-plots-for-regression-analysis" target="_blank">checking your residual plots</a>, and how to check for and correct <a href="http://blog.minitab.com/blog/adventures-in-statistics/what-are-the-effects-of-multicollinearity-and-when-can-i-ignore-them" target="_blank">multicollinearity</a> and <a href="http://blog.minitab.com/blog/adventures-in-statistics/curing-heteroscedasticity-with-weighted-regression-in-minitab-statistical-software" target="_blank">heteroscedasticity</a>. How do you <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-choose-the-best-regression-model" target="_blank">choose the best model</a>? Do you need to <a href="http://blog.minitab.com/blog/adventures-in-statistics/when-is-it-crucial-to-standardize-the-variables-in-a-regression-model" target="_blank">standardize your variables</a> before performing the analysis? 
Maybe you need a <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-tutorial-and-examples" target="_blank">regression tutorial</a>?</p>
<p>You may need to know <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-identify-the-distribution-of-your-data-using-minitab" target="_blank">how to identify the distribution of your data</a>. And just <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests:-why-we-need-to-use-hypothesis-tests-in-statistics" target="_blank">how do hypothesis tests work</a> anyway? <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-analysis-of-variance-anova-and-the-f-test" target="_blank">F-tests</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests-t-values-and-t-distributions" target="_blank">T-tests</a>? How do you <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-test-your-discrete-distribution" target="_blank">test discrete data</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/when-should-i-use-confidence-intervals-prediction-intervals-and-tolerance-intervals" target="_blank">Should you use a confidence interval, prediction interval, or a tolerance interval</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/use-random-assignment-in-experiments-to-combat-confounding-variables" target="_blank">How do you know when X causes a change in Y</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/confound-it-some-more-how-a-factor-that-wasnt-there-hampered-my-analysis" target="_blank">Is a confounding variable distorting your results</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/repeated-measures-designs-benefits-challenges-and-an-anova-example" target="_blank">What are the pros and cons of using a repeated measures design</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/did-welchs-anova-make-fishers-classic-one-way-anova-obsolete" target="_blank">Fisher’s or Welch’s ANOVA</a>? 
<a href="http://blog.minitab.com/blog/adventures-in-statistics/the-power-of-multivariate-anova-manova" target="_blank">ANOVA or MANOVA</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/linear-or-nonlinear-regression-that-is-the-question" target="_blank">Linear or nonlinear regression?</a></p>
<p>These may not be “sexy” topics but they are the meat and potatoes of being able to draw sound conclusions from your data. And, based on numerous blog comments, they have been well received by many people. In fact, the most rewarding aspect of writing blog posts has been the interactions I've had with all of you. I've communicated with literally hundreds and hundreds of students learning statistics and practitioners performing statistics in the field. I’ve had the pleasure of learning how you use statistical analyses, understanding the difficulties you face, and helping you resolve those issues.</p>
<p>It's been an amazing journey and I hope that my blog posts have allowed you to see statistics through my eyes―as a key that can unlock discoveries that are trapped in your data. After all, that's the reason why I titled my blog <em>Adventures in Statistics</em>. Discovery is a bumpy road. There can be statistical challenges en route, but even those can be interesting, and perhaps even rewarding, to resolve. Sometimes it is the <a href="http://blog.minitab.com/blog/adventures-in-statistics/the-mysteries-of-variability-and-power" target="_blank">perplexing mystery in your data that prompts you to play detective and leads you to surprising new discoveries</a>!</p>
<p>To close out the old year, it's good to remember that change is constant. There are bound to be many new and exciting adventures in the New Year. I wish you all the best in your endeavors. </p>
<p>“We will open the book. Its pages are blank. We are going to put words on them ourselves. The book is called Opportunity and its first chapter is New Year's Day.” <em>― Edith Lovejoy Pierce </em></p>
<p>May you all find happiness in 2017! Onward and upward!</p>
<p>Jim</p>
Data Analysis, Statistics, Statistics Help, Stats
Wed, 30 Nov 2016 15:00:00 +0000
Jim Frost
Mutant Trees Lay Waste to the Landscape and Reveal Mother Nature's Lean Design
http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/mutant-trees-lay-waste-to-the-landscape-and-reveal-mother-natures-lean-design
<p>The season of change is upon us here at Minitab's World Headquarters. The air is crisp and clear and the landscape is ablaze in vibrant fall colors. As I drove to work one recent morning, I couldn't help but soak in the beauty surrounding me and think, "Too bad everything they taught me as a kid was a lie."</p>
<p><img alt="fall trees" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c2cb2bd427165df25e0ca2b38ef59381/trees.jpg" style="width: 208px; height: 182px; margin: 10px 15px; float: right;" />You see, as a boy growing up in New Hampshire, I was told that the sublime beauty of autumn was just a happy accident. As the days become shorter, the trees succumb to their own version of seasonal affective disorder; they stop producing chlorophyll because... well, what's the point? As a result of this photosynthetic funk, the green begins to drain from the leaves and the less pragmatic pigments prevail, if briefly.</p>
<p>But thanks to mutant trees, I now know the truth. Or at least one possible explanation. I refer, of course, to the findings of Hoch, Singsaas, and McCown, in their 2003 paper, "<a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC281624/" target="_blank">Resorption Protection. Anthocyanins Facilitate Nutrient Recovery in Autumn by Shielding Leaves from Potentially Damaging Light Levels.</a>"</p>
<p>In truth, I shouldn't say that what I learned as a kid was a <em>lie</em>. The theory of autumn by chromatic attrition might still be true to some extent. But I was intrigued to discover recently that newer theories posit a more adaptive role for the annual display. For example, one theory suggests that the bright displays evolved to inform potentially injurious insects that they are barking up the wrong tree. (For more information, see Archetti and Brown 2004, "<a href="http://harvardforest.fas.harvard.edu/sites/harvardforest.fas.harvard.edu/files/leaves/Archetti_%20Brown_2004.pdf" target="_blank">The coevolution theory of autumn colours</a>".)</p>
<p>But most interesting to me was the discovery that red pigments aren't just late-season hold-outs—production of these pigments is actually ramped up in the fall. Obviously, the "Accidental Autumn" explanation doesn't hold in this case. In their paper, Hoch and colleagues present evidence that anthocyanins, which produce red fall colors, actually help trees prepare for winter.</p>
<p>Here's where the mutants come in. The theory is that the anthocyanins act as a kind of sunblock to protect the leaves while the tree recovers valuable nutrients from the leaves before sending them downward and duffward.</p>
<p>To test this theory, the scientists sampled leaves from normal (wild) trees and from mutant trees that possessed superhuman powers. Well, actually, all trees possess superhuman powers because all trees can produce food from sunlight. (I've yet to meet a human who can do that.) But in this case, affected trees had a mutation that prevented them from producing anthocyanins and turning red in the fall. </p>
<p>It's always easier to understand what your data are showing you when you can look at the results of your analysis in a graph. I used <a href="http://www.minitab.com/products/minitab">Minitab Statistical Software</a> to create a couple of graphs that illustrate some of the results shared in the paper. </p>
Before and after nitrogen levels
<p>The scientists measured the nitrogen levels in the leaves before and after the period when the trees normally recover as much of that valuable nutrient as they can. This graph shows the before and after nitrogen levels for mutant and wild-type specimens of 3 different tree species. The graph shows that the nitrogen levels in the leaves tend to drop more for the wild trees, indicating that they are more successful at recovering the nitrogen than the mutant trees. </p>
<p style="margin-left: 40px;"><img alt="Line plot of before and after nitrogen levels" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/d0620bb01ef55623402cd4b603e3f861/lineplotbeforeafter.jpg" style="width: 459px; height: 306px;" /></p>
Resorption efficiency
<p>This <span><a href="http://blog.minitab.com/blog/quality-data-analysis-and-statistics/bar-charts-decoded">bar chart</a></span> shows the same data, but expressed as "Resorption Efficiency," which is just the percent change between the before and after nitrogen levels. The graph suggests that the lack of anthocyanins hampered the ability of the mutant trees to recover the nitrogen from their leaves. </p>
<p style="margin-left: 40px;"><img alt="Bar chart of resorption efficiency" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/f42ddf69c4e21e9804b053dabef3623c/barchartresorptionefficiency.jpg" style="width: 459px; height: 306px;" /></p>
<p>So, rather than simply accepting seasonal spikes in scrap waste, it appears that Mother Nature is a much better quality engineer than we had given her credit for. In addition to dazzling us with some beautiful color before winter sets in, those brilliant reds are actually adding value to the process by helping to reduce waste.</p>
<p>My newfound appreciation for nature's lean genius inspired me to do a little exploring around Minitab's World Headquarters and capture some images of industrious anthocyanins hard at work improving plant profitability. Along with some cows. If you've never had the opportunity to see trees do this—and even if you have—perhaps you'll enjoy the images shared below. </p>
<p>Happy Autumn! </p>
<p><em>Corn rows weave under undulating clouds</em><br />
<img alt="Harvest has come" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/bbd7dc445005ffda4874c3ae424ab730/maze_2.jpg" style="width: 500px; height: 378px;" /></p>
<p><em>Rusty barns rest after the harvest</em><br />
<img alt="Barn" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/292dfb266a78f7fae35a28648b6b33a4/barn__enhanced.jpg" style="width: 500px; height: 262px;" /></p>
<p><em>Rustling stalks spread from road to ridge</em><br />
<img alt="Ridge and meadow" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/f1f734dd3153923e80039eab70dba9e9/ridge_and_meadow.jpg" style="font-size: 13px; width: 500px; height: 269px;" /></p>
<p><em>Heifers forage contentedly under a calm fall sky</em><br />
<img alt="Cows" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/46f84b64ecc05fb461d3c9fe7c67d5c5/cows.jpg" style="font-size: 13px; width: 500px; height: 545px;" /></p>
<p><em>Autumn finery frames the fabled Beaver Stadium </em><br />
<img alt="Fabled Beaver Stadium" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/5a6bde627d438a786ca7edaf37f2ca27/stadium_framed_by_field_and_tree.jpg" style="width: 500px; height: 555px;" /></p>
<p><em>Scenic splendor surrounds majestic Mount Nittany </em><br />
<img alt="Mount Nittany" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/4abbb7f7b6a09d5e1a3e92b9205511ee/flaming_frame__bright.jpg" style="width: 500px; height: 258px;" /></p>
<p><em>Wary hawk takes wing amid wild autumn hues</em><br />
<img alt="Hawk on the wing" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/a1fd4a9278394d7917dd4928c6b09c13/soar_2.jpg" style="width: 500px; height: 528px;" /></p>
<p><em>Opportunistic apparitions hang around to haunt passersby</em><br />
<img alt="Ghosts" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/3c8869125bd9b7f17deef5a9506018c0/ghosts2.jpg" style="width: 500px; height: 209px;" /></p>
<p><em>Minitab World Headquarters looms large on the landscape</em><br />
<img alt="Minitab World Headquarters" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/a7e45f704289b992aa7f69eb39a92a8d/peeper_tab.jpg" style="width: 500px; height: 293px;" /></p>
Fun Statistics, Statistics, Statistics in the News
Fri, 18 Nov 2016 13:00:00 +0000
http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/mutant-trees-lay-waste-to-the-landscape-and-reveal-mother-natures-lean-design
Greg Fox
How Effective Are Flu Shots?
http://blog.minitab.com/blog/adventures-in-statistics/how-effective-are-flu-shots
<p><img alt="Influenza virus" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/9786f693e9bfb040dea4b7d56bf5c60e/influenza_virus.jpg" style="float: right; width: 175px; height: 256px; margin: 10px 15px;" />Once again, with the arrival of autumn, it's time for a flu shot.</p>
<p>I get a flu shot every year even though I know they’re not perfect. I figure they’re a relatively easy and inexpensive way to reduce the chance of having a miserable week.</p>
<p>I’ve heard on various news media that their effectiveness is about 60%. But what does 60% effectiveness mean, exactly? How much does this actually reduce the chances that I’ll get the flu in any given year? I'm going to explore this and go beyond the news media simplification and present you with very clear answers to these questions. Quite frankly, some of the results were not what I expected.</p>
We’ll Find Our Answers in Randomized, Controlled Trials (RCTs)
<p>I’m a numbers guy. I use numbers to understand the world. My background is in research, so when I want to understand an issue, I look at the primary research. If I can understand the researchers’ methodology, the data they collect, and how they draw their conclusions, I’ll understand the issue at a deeper, more fundamental level than news reports typically provide. </p>
<p>To understand flu shot effectiveness, I’m only going to assess double-blind, randomized controlled trials, the gold standard. These studies are more expensive to conduct but provide better results than observational studies. (I discuss the differences between these two types of studies in my post about the <a href="http://blog.minitab.com/blog/adventures-in-statistics/statistics-that-affect-you-are-vitamin-supplements-really-harmful" target="_blank">benefits of vitamins</a>.)</p>
<p>The two influenza vaccination studies I’ll look at satisfy the above criteria and are listed in a section of references for health professionals on the CDC’s <a href="http://www.cdc.gov/flu/professionals/vaccination/effectivenessqa.htm#references" target="_blank">website</a>. Presumably these studies make a good case, using trusted data. Along the way, we’ll use Minitab <a href="http://www.minitab.com/products/minitab">statistical software</a> to analyze their data for ourselves.</p>
Defining the Effectiveness of Flu Shots
<p>Flu shots contain vaccine for three influenza viruses that researchers predict will be the most common in a given flu season. However, plenty of other viruses (flu and otherwise) also are circulating and can make you sick. Many illnesses with flu-like symptoms are incorrectly attributed to the flu.</p>
<p>Consequently, the best studies use a lab to identify the specific virus that infects each of their sick subjects. These studies only count the subjects with confirmed cases of the three types of influenza virus. Effectiveness is defined as the reduction in these three influenza viruses among those who were vaccinated compared to those who were not vaccinated.</p>
The Two Studies of the Flu Vaccine
<p>It’s time to dig into the data! For me, this is where it gets exciting. You can hear about effectiveness on TV, but this is where it all comes from: counts of sick people in the experimental groups.</p>
The Beran Study
<p>The Beran et al. study<sup>1</sup> assesses the 2006/2007 flu season and tracks its subjects from September to May. Subjects in this study range from 18 to 64 years old.</p>
<table style="text-align: center;">
<tr><th>Treatment</th><th>Flu count</th><th>Group size</th></tr>
<tr><td>Shot</td><td>49</td><td>5103</td></tr>
<tr><td>Placebo</td><td>74</td><td>2549</td></tr>
</table>
<p>Because we want to compare the proportions between two groups, we’ll use the Two Proportions test in Minitab. To do this yourself, in Minitab, go to <strong>Stat > Basic Statistics > 2 Proportions</strong>. In the dialog, choose <strong>Summarized data</strong> and enter the data from the table above. Click <strong>OK</strong>, and you get the results below:</p>
<p style="margin-left: 40px;"><img alt="Minitab's Two Proportions test for the flu data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/a3f37da27803215fb8fa3cd85d0b7924/flustudyberan.gif" style="width: 471px; height: 209px;" /></p>
<p>The p-value of 0.000 tells us that there is a significant difference between the two groups. The estimated difference between the vaccinated group and the placebo group is 1.9 percentage points. Because this is an RCT, it's fairly safe to assume that the vaccination caused the difference between the groups. However, outside of a randomized experiment, it's not wise to assume causality.</p>
<p>The vaccine effectiveness (or efficacy) is a relative reduction in risk between the two groups. You simply take the relative risk ratio of (vaccinated proportion/unvaccinated proportion) and subtract that from 1. We can get the proportion for each group from the Sample p column in Minitab’s output:</p>
<p style="margin-left: 40px;">1 - (0.009602/0.029031) = 0.669</p>
<p>This study finds a 66.9% vaccine efficacy for the flu shot compared to the placebo.</p>
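<p>If you don't have Minitab handy, the same arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name <code>two_proportion_summary</code> is made up for this example, and the p-value comes from a pooled two-sided z-test (a normal approximation), which is an assumption on my part and may not be the exact method behind the output shown above.</p>

```python
import math


def two_proportion_summary(events_treated, n_treated, events_control, n_control):
    """Compare two proportions and report the relative risk reduction (efficacy)."""
    p_treated = events_treated / n_treated
    p_control = events_control / n_control
    difference = p_control - p_treated

    # Pooled two-sided z-test (normal approximation); an assumed method for illustration.
    pooled = (events_treated + events_control) / (n_treated + n_control)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_treated + 1 / n_control))
    z = difference / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2))

    # Efficacy = 1 - (vaccinated proportion / unvaccinated proportion).
    efficacy = 1 - p_treated / p_control
    return {
        "p_treated": p_treated,
        "p_control": p_control,
        "difference": difference,
        "p_value": p_value,
        "efficacy": efficacy,
    }


# Counts from the Beran et al. table: 49 of 5103 vaccinated, 74 of 2549 placebo.
beran = two_proportion_summary(49, 5103, 74, 2549)
print(f"Difference: {beran['difference']:.3f}")
print(f"Efficacy:   {beran['efficacy']:.3f}")
print(f"P-value:    {beran['p_value']:.2g}")
```

<p>Calling the same function with a different study's counts reuses the calculation without retyping the formula.</p>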
The Monto Study
<p>The Monto et al. study<sup>2</sup> assesses the 2007/2008 flu season and tracks its subjects from January to April. Subjects in this study range from 18 to 49 years old.</p>
<table style="text-align: center;">
<tr><th>Treatment</th><th>Flu count</th><th>Group size</th></tr>
<tr><td>Shot</td><td>28</td><td>813</td></tr>
<tr><td>Placebo</td><td>35</td><td>325</td></tr>
</table>
<p>We’ll do the Two Proportions test again for this study. This time, enter the numbers from the above table into the dialog.</p>
<p style="margin-left: 40px;"><img alt="Minitab's Two Proportions test for the flu data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/89fb6b907bf288375b5faaeba5ff85ee/flustudymonto.gif" style="width: 469px; height: 210px;" /></p>
<p>Again, the p-value indicates that there is a significant difference between the two groups. The estimated difference between the vaccinated group and the placebo group is 7.3 percentage points. Let's calculate the effectiveness:</p>
<p style="margin-left: 40px;">1 – (0.034440/0.107692) = 0.680</p>
<p>This study finds a 68.0% vaccine efficacy for the flu shot compared to the placebo.</p>
Conclusions So Far
<p>We’ve looked at the data from two gold-standard studies and have drawn the same conclusions that you commonly hear on the news. Flu shots significantly reduce the number of influenza infections, and they are about 68% effective.</p>
<p>However, looking at the data and analyses myself, I have new insights. Specifically, the low number of influenza cases in the placebo group for each study caught my eye, and that’s what we’re looking at next.</p>
What It Means for You: Relative versus Absolute Risk
<p>If you’re like me, the 68% effective statistic isn’t too helpful. The problem is that it is a relative comparison of risk, not an absolute assessment of risk. To illustrate the difference, consider which type of assessment is more useful:</p>
<ol>
<li><strong>Relative assessment:</strong> Your car is travelling half as fast as another car, but you don’t know the true speed of either car.</li>
<li><strong>Absolute assessment:</strong> Your car is travelling at 30 MPH and the other car is travelling at 60 MPH.</li>
</ol>
<p>Clearly, #2 is much more useful. Similarly, it would be more helpful to know the absolute risk of catching the flu if you get the shot versus not getting it!</p>
Vaccine effectiveness is a relative risk
<p>Vaccine effectiveness doesn’t tell you the exact risk of catching the flu for either group. Instead, it involves dividing one proportion by the other for the relative risk. In fact, as you should recall, effectiveness is one minus the relative risk, which makes it even <em>harder</em> to interpret. 67% effectiveness indicates that a vaccinated person has about one-third the risk of contracting the flu that a non-vaccinated person has.</p>
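<p>Written out as a formula, the relationship looks like this. The symbols are my own shorthand rather than notation from the studies: VE is vaccine effectiveness, RR is the relative risk, and p<sub>vaccinated</sub> and p<sub>unvaccinated</sub> are the flu proportions in each group.</p>

```latex
\[
\mathrm{VE} \;=\; 1 - \mathrm{RR} \;=\; 1 - \frac{p_{\text{vaccinated}}}{p_{\text{unvaccinated}}},
\qquad
\mathrm{VE} = 0.67 \;\Longrightarrow\; \mathrm{RR} = 1 - 0.67 = 0.33 \approx \tfrac{1}{3}
\]
```

<p>Notice that neither proportion appears on its own in the result, which is why the formula alone can't give you an absolute risk.</p>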
<p>Unfortunately, using these numbers, we don’t know the absolute risk for anyone!</p>
The group proportions are the absolute risks
<p>We can estimate the absolute risk from the studies by looking at the proportion for each group in the Minitab output, and subtracting to calculate the absolute reduction. I’ll summarize this information below as percentages and even add in the results for two more flu seasons from another study that the CDC references (Bridges et al.<sup>3</sup>):</p>
<table style="text-align: center;">
<tr><th>Flu season</th><th>Placebo (%)</th><th>Flu Shot (%)</th><th>% Point Reduction</th></tr>
<tr><td>1997/98</td><td>4.4</td><td>2.2</td><td>2.2</td></tr>
<tr><td>1998/99</td><td>10.0</td><td>1.0</td><td>9.0</td></tr>
<tr><td>2006/07</td><td>2.9</td><td>1.0</td><td>1.9</td></tr>
<tr><td>2007/08</td><td>10.8</td><td>3.4</td><td>7.4</td></tr>
<tr><td><strong>Average</strong></td><td><strong>7.0</strong></td><td><strong>1.9</strong></td><td><strong>5.1</strong></td></tr>
</table>
<p>Notice how the risk of getting the flu varies by flu season? The differences are not surprising because the studies use different samples and the flu seasons have different influenza viruses.</p>
<p>So let’s look at the average of these four flu seasons. If you aren’t vaccinated, you have a 7.0% chance of getting the flu. However, if you do get the flu shot, your risk is about 1.9%, which is a reduction of 5.1 percentage points.</p>
<p>Hmm. The "5.1 percentage point reduction" doesn’t sound nearly as impressive as the "67% effectiveness"! Both statistics are based on the same data, but I think the estimate of absolute risk is a more useful way to present the results.</p>
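<p>The averages are easy to verify. Here is a small Python sketch that recomputes them from the rounded percentages in the four-season summary; the variable names are just for illustration, and because the inputs are already rounded, the results are only good to about one decimal place.</p>

```python
# Flu rates in percent for each season: (placebo, flu shot), from the summary table.
seasons = {
    "1997/98": (4.4, 2.2),
    "1998/99": (10.0, 1.0),
    "2006/07": (2.9, 1.0),
    "2007/08": (10.8, 3.4),
}

# Absolute risk reduction per season, in percentage points.
reductions = {name: placebo - shot for name, (placebo, shot) in seasons.items()}

avg_placebo = sum(placebo for placebo, _ in seasons.values()) / len(seasons)
avg_shot = sum(shot for _, shot in seasons.values()) / len(seasons)
avg_reduction = sum(reductions.values()) / len(reductions)

for name, reduction in reductions.items():
    print(f"{name}: {reduction:.1f} percentage points")
print(f"Average placebo risk:  {avg_placebo:.1f}%")
print(f"Average flu shot risk: {avg_shot:.1f}%")
print(f"Average reduction:     {avg_reduction:.1f} percentage points")
```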
Closing Thoughts about the Flu Shot Data
<p>I was surprised by the results. While I knew flu shots were not perfect, I always got them because I thought they reduced my risk by more than what the CDC-recommended studies actually show. Even if you aren’t vaccinated, your risk of getting the flu isn’t too high.</p>
<p>That probably explains why a number of people have told me that while they never get flu shots, they can’t remember having the flu!</p>
<p>These more subtle results made me wonder about flu vaccinations on a societal scale. Could the flu vaccine possibly reduce flu cases enough to save sufficient money (lost workdays, doctor and drug costs, etc.) to pay for the vaccinations?</p>
<p>Bridges et al. conducted a cost-benefit analysis in their study. For the two flu seasons where they tracked flu vaccinations, infections, and expenditures, the vaccinations actually <em>increased</em> net societal costs. It would’ve been cheaper overall not to get vaccinated!</p>
<p>In light of this, I wasn’t surprised when I read an <a href="http://www.cnn.com/2013/01/17/health/flu-vaccine-policy/index.html?hpt=hp_bn12" target="_blank">article</a> on CNN.com that said, outside of the U.S. and Canada, other countries do not strongly encourage all of their citizens above 6 months to get the flu shot. According to the article, “global health experts say the data aren’t there yet to support this kind of vaccination policy, nor is there enough money.”</p>
<p>I understand this viewpoint better now.</p>
<p><strong><em>However, I’m not trying to talk anyone out of getting a flu shot.</em></strong> I’m on the fence myself. While the risk of getting the flu in any given year is fairly small, if you regularly get the flu shot, you’ll probably spare yourself a week of misery at some point! You should always consult a medical professional to determine the best decision for your specific situation.</p>
<p>In another post, I look at the <a href="http://blog.minitab.com/blog/adventures-in-statistics/flu-shot-followup-assessing-the-long-term-benefits-of-flu-vaccination">long-term benefits of flu vaccinations</a>.</p>
<p><strong>References</strong></p>
<p>1. Beran J, Vesikari T, Wertzova V, Karvonen A, Honegr K, Lindblad N, Van Belle P, Peeters M, Innis BL, Devaster JM. Efficacy of inactivated split-virus influenza vaccine against culture-confirmed influenza in healthy adults: a prospective, randomized, placebo-controlled trial. J Infect Dis 2009;200(12):1861-9</p>
<p>2. Monto AS, Ohmit SE, Petrie JG, Johnson E, Truscon R, Teich E, Rotthoff J, Boulton M, Victor JC. Comparative efficacy of inactivated and live attenuated influenza vaccines. N Engl J Med. 2009;361(13):1260-7</p>
<p>3. Bridges CB, Thompson WW, Meltzer MI, Reeve GR, Talamonti WJ, Cox NJ, Lilac HA, Hall H, Klimov A, Fukuda K. Effectiveness and cost-benefit of influenza vaccination of healthy working adults: A randomized controlled trial. JAMA. 2000;284(13):1655-63</p>
Data Analysis
Wed, 09 Nov 2016 12:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics/how-effective-are-flu-shots
Jim Frost
8 Expert Tips for Excellent Designed Experiments (DOE)
http://blog.minitab.com/blog/understanding-statistics/8-expert-tips-for-excellent-designed-experiments-doe
<p>If your work involves quality improvement, you've at least <em>heard</em> of Design of Experiments (DOE). You probably know it's the most efficient way to optimize and improve your process. But many of us find DOE intimidating, especially if it's not a tool we use often. How do you select an appropriate design, and ensure you've got the right number of factors and levels? And after you've gathered your data, how do you pick the right model for your analysis?</p>
<p><img alt="gauge" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a43ef5d7bf55aac81f8e316f48e5f40e/gauge.png" style="width: 300px; height: 212px; margin: 10px 15px; float: right;" />One way to get started with DOE is the Assistant in Minitab Statistical Software. When you have many factors to evaluate, the Assistant will walk you through <span>a <a href="http://blog.minitab.com/blog/understanding-statistics/applying-doe-for-great-grilling-part-1">DOE to identify which factors matter the most (screening designs)</a></span>. Then the Assistant can guide you through <span>a <a href="http://blog.minitab.com/blog/understanding-statistics/applying-doe-for-great-grilling-part-2">designed experiment to fine-tune the important factors for maximum impact (optimization designs)</a></span>. </p>
<p>If you're comfortable enough to skip the Assistant, but still have some questions about whether you're approaching your DOE the right way, consider the following tips from Minitab's technical trainers. These veterans have done a host of designed experiments, both while working with Minitab customers and in their careers before they became Minitab trainers.</p>
1. Identify the right variable space to study with exploratory runs.
<p>Performing exploratory runs before doing the main experiment can help you identify the settings of your process as performance moves from good to bad. This can help you determine the variable space in which to conduct your experiment so that it yields the most beneficial results.</p>
2. Spread control runs throughout the experiment to measure process stability.
<div style="text-align: left">Since <a href="http://blog.minitab.com/blog/michelle-paret/doe-center-points-what-they-are-why-theyre-useful">center-point runs are usually near-normal operating conditions</a>, they can act as a control to check process performance. By spacing center points evenly through the design, these observations serve as an indicator of the stability of your process—or lack thereof—during the experiment. </div>
3. Identify the biggest problems with Pareto analysis.
<div style="text-align: left">A Pareto chart of product load or defect levels can help you identify which problem to fix that will result in the highest return to your business. Focusing on problems with high business impact improves support for your experiment by raising its priority among all potential improvement projects.</div>
<div style="text-align: left; margin-left: 40px;"><img alt="Pareto Chart of the Effects" src="https://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/4c725391e378fa9b080480d6430d4199/pareto_chart.bmp" style="width: 577px; height: 385px;" /></div>
4. Improve power by expanding the range of input settings.
<div style="text-align: left;">Test the largest range of input variable settings that is physically possible. Even if you think they are far away from the “sweet spot,” this technique will allow you to use the experiment to understand your process so that you can find the optimal settings.</div>
<div style="text-align: left; margin-left: 40px;"><br />
<img alt="Maximizing your variable space can help you discover new insights about your process. " src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/7ade9a7affca6b322f225b3d2fa21186/expand_range_of_settings.gif" style="margin: 10px; border: thin solid; width: 640px; height: 429px;" title="Maximizing your variable space can help you discover new insights about your process. " /></div>
5. Fractionate to save runs, focusing on Resolution V designs.
<p>In many cases, <a href="http://blog.minitab.com/blog/applying-statistics-in-quality-projects/design-of-experiments-fractionating-and-folding-a-doe">it's beneficial to choose a design with ½ or ¼ of the runs of a full factorial</a>. Even though effects could be confounded or confused with each other, Resolution V designs minimize the impact of this confounding, which allows you to estimate all main effects and two-way interactions. Conducting fewer runs keeps experiment costs low.</p>
<img alt="Choosing the right fractional factorial helps reduce the size of your experiment while minimizing the level of confounding of effects. " src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/cee9a8474f5f880fb7e9c51b59f7b393/available_factorial_designs.png" style="margin: 10px; width: 448px; height: 177px;" title="Choosing the right fractional factorial helps reduce the size of your experiment while minimizing the level of confounding of effects. " />
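<p>To make the idea concrete, here is a minimal Python sketch of a half fraction for five two-level factors. The factor labels A through E and the function name are hypothetical; the sketch uses the standard generator E = ABCD, which gives a Resolution V design with 16 runs in place of the 32 a full factorial would need. In practice you would let your statistical software build and randomize the design.</p>

```python
from itertools import product
from math import prod

FACTORS = ["A", "B", "C", "D", "E"]  # hypothetical factor labels


def half_fraction_five_factors():
    """Build a 2^(5-1) design in coded units (-1 = low, +1 = high)."""
    runs = []
    for a, b, c, d in product((-1, 1), repeat=4):
        e = a * b * c * d  # generator E = ABCD, so the defining relation is I = ABCDE
        runs.append((a, b, c, d, e))
    return runs


design = half_fraction_five_factors()
print(f"{len(design)} runs instead of {2 ** len(FACTORS)}")
print("  ".join(FACTORS))
for run in design:
    print(" ".join(f"{level:+d}" for level in run))
```

<p>Because the shortest word in the defining relation has five letters, main effects and two-way interactions are not confounded with each other, only with higher-order interactions.</p>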
6. Improve the power of your experiment with replicates.
<p>Power is the probability of detecting an effect on the response, if that effect exists. The number of replicates affects your experiment's power. To increase the chance that you will be successful identifying the inputs that affect your response, add replicates to your experiment to increase its power.</p>
<p style="margin-left: 40px;"><img alt="Power is a function of the number of replicates. " src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ea56cb24542fe92b4e850c840c823652/replicates_and_power.png" style="margin: 10px; width: 430px; height: 373px;" title="Power is a function of the number of replicates. " /></p>
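<p>The relationship between replicates and power can be sketched with a rough calculation. The Python below is an approximation under stated assumptions, not the calculation Minitab performs: it treats the standard deviation as known, uses a two-sided z-test, and assumes a two-level factorial with 8 base runs. The function name and the example effect size are invented for illustration.</p>

```python
from statistics import NormalDist


def approx_power(effect, sigma, replicates, base_runs=8, alpha=0.05):
    """Approximate power to detect one effect in a two-level factorial.

    Assumes a known standard deviation and a two-sided z-test, which
    simplifies the t-based calculation used by statistical software.
    """
    total_runs = replicates * base_runs
    # An effect is the difference between two means of total_runs / 2 runs each.
    std_error = sigma * (4 / total_runs) ** 0.5
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(effect) / std_error
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Power to detect an effect of one standard deviation with 1 to 5 replicates.
powers = {r: approx_power(effect=1.0, sigma=1.0, replicates=r) for r in range(1, 6)}
for r, power in powers.items():
    print(f"{r} replicate(s): power = {power:.2f}")
```

<p>Each added replicate shrinks the standard error of the effect estimate, so the same effect stands out more clearly from the noise.</p>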
7. Improve power by using quantitative measures for your response.
<p>Reducing defects is the primary goal of most experiments, so it makes sense that defect counts are often used as a response. But defect counts are a very expensive and unresponsive output to measure. Instead, try measuring a quantitative indicator related to your defect level. Doing this can decrease your sample size dramatically and improve the power of your experiment. </p>
8. Study all variables of interest and all key responses.
<p>Factorial designs let you take a comprehensive approach to studying all potential input variables. Removing a factor from the experiment slashes your chance of determining its importance to zero. With the tools available in <a href="http://www.minitab.com/products/minitab">statistical software such as Minitab</a> to help, you shouldn't let fear of complexity cause you to omit potentially important input variables. </p>
<p>Do you have any DOE tips to add to this list?</p>
Design of Experiments, Lean Six Sigma, Quality Improvement, Six Sigma, Statistics
Wed, 02 Nov 2016 13:27:00 +0000
http://blog.minitab.com/blog/understanding-statistics/8-expert-tips-for-excellent-designed-experiments-doe
Eston Martz
Common Assumptions about Data (Part 1: Random Samples and Statistical Independence)
http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-1-random-samples-and-statistical-independence
<p><img alt="horse before the cart road sign" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/cc91865a3d4df6456934528866576a1b/horse_warning_sign.png" style="margin: 10px 15px; float: right; width: 120px; height: 120px;" /></p>
<p>Statistical inference uses data from a sample of individuals to reach conclusions about the whole population. It’s a very <span>powerful tool</span>. But as the saying goes, “With great power comes great responsibility!” When attempting to make inferences from sample data, you must check your assumptions. Violating any of these assumptions can result in false positives or false negatives, thus invalidating your results. In other words, you run the risk that your results are wrong, that your conclusions are wrong, and hence that the solutions you implement won’t solve the problem (unless you’re <em>really</em> lucky!).</p>
<p>You’ve heard the joke about <a href="https://www.goodreads.com/quotes/192478-you-should-never-assume-you-know-what-happens-when-you">what happens when you assume</a>? For this post, let’s instead ask “What happens when you fail to check your assumptions?” After all, we’re human, and humans assume things all the time. Suppose, for example, I want to schedule a phone meeting with you and I’m in the U.S. Eastern time zone. It’s easy for me to assume that everyone is in the same time zone, but you’re really in California, or Australia. What would happen if I called a meeting at 2:00 p.m. but didn’t specify the time zone? Unless you checked, you might be early or late to the meeting, or miss it entirely!</p>
<p>The good news is that when it comes to the assumptions in statistical analysis, Minitab has your back. Minitab 17 has even more features to help you verify and validate the needed statistical analysis assumptions before you finalize your conclusion. When you use <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/angst-over-anova-assumptions-ask-the-assistant">the Assistant in Minitab</a>, the software will identify the appropriate assumptions for your analysis, provide guidance to help you develop robust data collection plans, check the assumptions when you analyze your data, and let you know the results in an easy-to-understand Report Card and Diagnostic Report.</p>
<p>The common data assumptions are: random samples, independence, normality, equal variance, stability, and that your measurement system is accurate and precise. In this post, we’ll address random samples and statistical independence.</p>
What Is the Assumption of Random Samples?
<p>A sample is random when each data point in your population has an equal chance of being included in the sample; therefore <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/collecting-random-data-isnt-monkey-business">selection of any individual happens by chance, rather than by choice</a>. This reduces the chance that differences in materials or conditions strongly bias results. Random samples are more likely to be representative of the population, so you can be more confident in the statistical inferences you draw from them.</p>
<p>There is no test that assures random sampling has occurred. Following good sampling techniques will help to ensure your samples are random. Here are some common approaches to making sure a sample is randomly created:</p>
<ul>
<li>Using a random number table or feature in Minitab (Figure 1).</li>
<li>Systematic selection (every nth unit or at specific times during the day).</li>
<li>Sequential selection (taken in sequence for destructive testing, etc.).</li>
<li>Avoiding the use of judgement or convenience to select samples.</li>
</ul>
<p><img alt="Minitab dialog boxes" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9f122a1ab01790cdc5f6530ec3a90a14/assumptions_dialog_box.png" style="border-width: 0px; border-style: solid; width: 700px; height: 421px;" /></p>
<p><em>Figure 1. Random Data Generator in Minitab 17</em></p>
<p>Non-random samples introduce bias and can result in incorrect interpretations.</p>
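<p>As a rough illustration of the first two approaches in the list above, here is a minimal Python sketch. It is not Minitab's random data feature; the function names, the population of 100 numbered parts, and the seed are all assumptions made up for the example.</p>

```python
import random

def simple_random_sample(units, k, seed=None):
    """Draw k units without replacement; every unit has an equal chance."""
    rng = random.Random(seed)
    return rng.sample(units, k)

def systematic_sample(units, step, seed=None):
    """Take every step-th unit, starting from a randomly chosen offset."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return units[start::step]

# Hypothetical population: 100 parts labeled 1..100.
population = list(range(1, 101))
srs = simple_random_sample(population, 10, seed=1)
sys_sample = systematic_sample(population, 10, seed=1)
```

<p>Fixing the seed makes the draw repeatable for an audit; leave it out when you want a fresh sample each time.</p>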
What Is the Assumption of Statistical Independence?
<p>Statistical independence is a critical assumption for many statistical tests, such as the 2-sample t test and ANOVA. Independence means the value of one observation does not influence or affect the value of other observations. Independent data items are not connected with one another in any way (unless you account for it in your model). This includes the observations in both the “between” and “within” groups in your sample. Non-independent observations introduce bias and can make your statistical test give too many false positives. </p>
<p>Following good sampling techniques will help to ensure your samples are independent. Common sources of non-independence include:</p>
<ul>
<li>Observations that are close together in time.</li>
<li>Observations that are close together in space or nested.</li>
<li>Observations that are somehow related.</li>
</ul>
<p>Minitab can test for independence using the Chi-Square Test for Association, which is designed to determine if the distribution of observations for one variable is similar for all categories of the second variable. </p>
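<p>To show the arithmetic behind a chi-square test for association, here is a small Python sketch that computes the statistic from a table of observed counts. The counts, the row and column meanings, and the function name are hypothetical, and the p-value shortcut shown applies only to a table with 1 degree of freedom.</p>

```python
import math

def chi_square_association(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows are two shifts, columns are pass/fail.
observed = [[30, 10],
            [20, 40]]
stat, dof = chi_square_association(observed)

# For 1 degree of freedom only, the upper-tail p-value has a closed form.
p_value = math.erfc(math.sqrt(stat / 2)) if dof == 1 else None
```

<p>A small p-value suggests the distribution of one variable differs across the categories of the other, which is evidence against independence of the two variables.</p>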
The Real Reason You Need to Check the Assumptions
<p>You will be putting a lot of time and effort into collecting and analyzing data. After all the work you put into the analysis, you want to be able to reach correct conclusions. You want to be confident that you can tell whether observed differences between data samples are simply due to chance, or if the populations are indeed different! </p>
<p>It’s easy to put the cart before the horse and just plunge in to the data collection and analysis, but it’s much wiser to take the time to understand which data assumptions apply to the statistical tests you will be using, and plan accordingly.</p>
<p>In my next blog post, I will <a href="http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-2-normality-and-equal-variance">review the Normality and Equal Variance assumptions</a>. </p>
Data Analysis, Statistics, Statistics Help, Stats
Mon, 24 Oct 2016 12:00:00 +0000
http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-1-random-samples-and-statistical-independence
Bonnie K. Stone
Improving Cash Flow and Cutting Costs at Bank Branch Offices
http://blog.minitab.com/blog/understanding-statistics/improving-cash-flow-and-cutting-costs-at-bank-branch-offices
<p>Every day, thousands of people withdraw extra cash for daily expenses. Each transaction may be small, but the total amount of cash disbursed over hundreds or thousands of daily transactions can be very high. But every bank branch has a fixed cash flow, which must be set without knowing what each customer will need on a given day. This creates a challenge for financial entities. Customers expect their local bank office to have adequate cash on hand, so how can a bank confidently ensure each branch has enough funds to handle transactions without keeping too much in reserve?</p>
<p><img alt="Grupo Mutual" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b2366c2da44cd861775ebab6c6d07e55/grupo_mutual_logo_200w_1_.png" style="width: 200px; height: 95px; margin: 10px 15px; float: right;" />A quality project team led by Jean Carlos Zamora and Francisco Aguilar tackled that problem at Grupo Mutual, a financial entity in Costa Rica.</p>
<p>When the project began, each of Grupo Mutual's 55 branches kept additional cash in a vault to avoid having insufficient funds. But without a clear understanding of daily needs, some branches often ran out of cash anyway, while others had significant unused reserves.</p>
<p>When a branch ran short, it created high costs for the company and gave customers three undesirable options: receive the funds as an electronic transfer, wait 1–3 days for consignment, or travel to the main branch to withdraw their cash. Having the right amount of cash in each branch vault would reduce costs and maintain customer satisfaction.</p>
<p>Using <a href="http://www.minitab.com/products/minitab">Minitab Statistical Software</a> and Lean Six Sigma methods, the team set out to determine the optimal amount of currency to store at each branch to avoid both a negative cash flow and idle funds. The team followed the five-phase <a href="http://blog.minitab.com/blog/real-world-quality-improvement/dmaic-vs-dmadv-vs-dfss">DMAIC (Define, Measure, Analyze, Improve, and Control)</a> method. In the Define phase, they set the goal: creating an efficient process that transferred cash from idle vaults to branches that needed it most.</p>
<p>In the Measure phase, the team analyzed two years' worth of cash-flow data from the 55 branches. “Managing the databases and analyzing about 2,000 data points from each of the 55 branches was our biggest challenge,” says Jean-Carlos Zamora Mora, project leader and improvement specialist at Grupo Mutual. “Minitab played a very important part in addressing this issue. It reduced the analysis time by helping us identify where to focus our efforts to improve our process.” </p>
<p>The Analyze phase began with an analysis of variance (ANOVA) to explore how the banks’ cash flow varied per month. They used Minitab to identify which months were different from one another, and grouped similar months together to streamline the analysis.</p>
<p>The team next used control charts to graph the data over time and assess whether or not the process was stable, in preparation for conducting capability analysis. To choose the right control chart and create comprehensive summaries of the results, the team used the Minitab Assistant.</p>
<p style="margin-left: 40px;"><img alt="grupo mutual i-mr chart" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2d9ac9b2597c592e5be5b779bae85076/grupo_mutual_i_mr_chart_1_.png" style="width: 585px; height: 432px;" /></p>
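<p>For readers curious about how the limits on an individuals (I) chart are typically calculated, here is a minimal Python sketch using the common moving-range method. The cash balances are invented for illustration and are not Grupo Mutual's data; the function name is also an assumption.</p>

```python
from statistics import mean

def individuals_chart_limits(values):
    """Return (LCL, center line, UCL) for an individuals (I) chart."""
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    mr_bar = mean(moving_ranges)
    # 2.66 = 3 / d2, where d2 = 1.128 for moving ranges of span 2.
    half_width = 2.66 * mr_bar
    return center - half_width, center, center + half_width

# Hypothetical daily cash balances for one branch (arbitrary units).
balances = [52, 48, 50, 55, 47, 51, 49, 53, 50, 70]
lcl, center, ucl = individuals_chart_limits(balances)
out_of_control = [x for x in balances if x < lcl or x > ucl]
```

<p>Points that fall outside the limits are candidates for investigation before any capability analysis, since capability estimates assume a stable process.</p>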
<p>The team then performed a capability analysis of each group’s current cash flow to determine whether customer transactions matched the services provided, and establish the percentage of cash used at each branch.</p>
<p style="margin-left: 40px;"><img alt="grupo mutual capability analysis" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f0e25ef8282111550e8fe8733eb889de/grupo_mutual_capability_analysis_1_.png" style="width: 586px; height: 439px;" /></p>
<p>The analysis revealed that, in total, the vaults contained more than the necessary funds each branch needed to operate effectively, but excessive circulation of the money caused some to overdraw their vaults while others stored cash that was not utilized. </p>
<p>“We found a positive cash balance at 95% of the branches,” says Zamora Mora. “The analysis showed the cash on hand to meet customer needs exceeded the requirements by over 200%, so we suddenly had lots of money to invest.” </p>
<p>The analysis gave the team the confidence to move forward with the Improve phase: implementing real-time control charts that enabled management to check each branch’s cash balance throughout the day. Managers could now quickly move cash from branches with excess cash to those needing additional funds, and make more strategic cash flow decisions.</p>
<p>The team found that being able to answer objections with data helped secure buy-in from skeptical stakeholders. “Throughout this project, we encountered questions and situations that could have jeopardized our team’s credibility and our likelihood of success,” recalls Zamora Mora. “But the accuracy and reliability of our data analysis with Minitab was overpowering.” </p>
<p>The changes made during the project increased cash usage by 40% and slashed remittance costs by 60%. The new process also cut insurance costs and shrank risks associated with storing and transporting cash. Overall, the project increased revenue by $1.1 million.</p>
<p>To read a more detailed account of this project, <a href="https://www.minitab.com/Case-Studies/Grupo-Mutual/">click here</a>. </p>
Capability Analysis, Lean Six Sigma, Quality Improvement
Fri, 21 Oct 2016 12:00:00 +0000
http://blog.minitab.com/blog/understanding-statistics/improving-cash-flow-and-cutting-costs-at-bank-branch-offices
Eston Martz
Problems Using Data Mining to Build Regression Models, Part Two
http://blog.minitab.com/blog/adventures-in-statistics/problems-using-data-mining-to-build-regression-models-part-two
<p>Data mining can be helpful in the exploratory phase of an analysis. If you're in the early stages and you're just figuring out which predictors are potentially correlated with your response variable, data mining can help you identify candidates. However, there are problems associated with using data mining to select variables.</p>
<p>In my <a href="http://blog.minitab.com/blog/adventures-in-statistics/problems-using-data-mining-to-build-regression-models" target="_blank">previous post</a>, we used data mining to settle on the following model and graphed one of the relationships between the response (C1) and a predictor (C7). It all looks great! The only problem is that all of these data are randomly generated! No true relationships are present. </p>
<p style="margin-left: 40px;"><img alt="Regression output for data mining example" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/24e98167e2dfd848b346292af371acf3/regression_swo.png" style="width: 364px; height: 278px;" /></p>
<p style="margin-left: 40px;"><img alt="Scatter plot for data mining example" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/6e4dfb991b33031738756d4b2d1c77e4/scatterplot.png" style="width: 576px; height: 384px;" /></p>
<p>If you didn't already know there was no true relationship between these variables, these results could lead you to a very inaccurate conclusion.</p>
<p>Let's explore how these problems happen, and how to avoid them.</p>
Why <em>Do </em>These Problems Occur with Data Mining?
<p>The problem with data mining is that you fit many different models, trying lots of different variables, and you pick your final model based mainly on statistical significance, rather than being guided by theory.</p>
<p>What's wrong with that approach? The problem is that every statistical test you perform has a chance of a false positive. A false positive in this context means that the <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values" target="_blank">p-value</a> is statistically significant but there really is no relationship between the variables at the population level. If you set the <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests:-significance-levels-alpha-and-p-values-in-statistics" target="_blank">significance level at 0.05</a>, you can expect that in 5% of the cases where the null hypothesis is true, you'll have a false positive.</p>
<p>Because of this false positive rate, if you analyze many different models with many different variables you will inevitably find false positives. And if you're guided mainly by statistical significance, you'll leave the false positives in your model. If you keep going with this approach, you'll fill your model with these false positives. That’s exactly what happened in our example. We had 100 candidate predictor variables and the stepwise procedure literally dredged through hundreds and hundreds of potential models to arrive at our final model.</p>
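<p>The arithmetic behind that claim can be sketched in a few lines of Python. This simplified calculation assumes each candidate predictor gets one independent test and that every null hypothesis is true; a stepwise procedure runs many more, correlated tests, so treat the numbers as a rough guide only. The function names are made up for the example.</p>

```python
def expected_false_positives(n_tests, alpha):
    """Average number of significant results when no real effects exist."""
    return n_tests * alpha

def prob_at_least_one_false_positive(n_tests, alpha):
    """Assumes the tests are independent and every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests

n_predictors = 100   # candidate predictors, as in the example above
alpha = 0.05         # significance level
expected = expected_false_positives(n_predictors, alpha)
p_any = prob_at_least_one_false_positive(n_predictors, alpha)
```

<p>Under these assumptions you would expect about 5 spurious predictors out of 100, and the chance of seeing at least one is above 99%.</p>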
<p>As we’ve seen, data mining problems can be hard to detect. The numeric results and graph all look great. However, these results don’t represent true relationships but instead are chance correlations that are bound to occur with enough opportunities.</p>
<p>If I had to name my favorite R-squared, it would be <a href="http://blog.minitab.com/blog/adventures-in-statistics/multiple-regession-analysis-use-adjusted-r-squared-and-predicted-r-squared-to-include-the-correct-number-of-variables" target="_blank">predicted R-squared</a>, without a doubt. However, even predicted R-squared can't detect all problems. Ultimately, even though the predicted R-squared is moderate for our model, the ability of this model to predict accurately for an entirely new data set is practically zero.</p>
Theory, the Alternative to Data Mining
<p>Data mining can have a role in the exploratory stages of an analysis. However, for all variables that you identify through data mining, you should perform a confirmation study using newly collected data to verify the relationships in the new sample. Failure to do so can be very costly. Just imagine if we had made decisions based on the model above!</p>
<p>An alternative to data mining is to use theory as a guide in terms of both the models you fit and the evaluation of your results. Look at what others have done and incorporate those findings when building your model. Before beginning the regression analysis, develop an idea of what the important variables are, along with their expected relationships, coefficient signs, and effect magnitudes.</p>
<p>Building on the results of others makes it easier both to collect the correct data and to specify the best regression model without the need for data mining. The difference is the process by which you fit and evaluate the models. When you’re guided by theory, you reduce the number of models you fit and you assess properties beyond just statistical significance.</p>
<p>Theoretical considerations should not be discarded based solely on statistical measures.</p>
<ul>
<li>Compare the coefficient signs to theory. If any of the signs contradict theory, investigate and either change your model or explain the inconsistency.</li>
<li>Use <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">Minitab statistical software</a> to create factorial plots based on your model to see if all the effects match theory.</li>
<li>Compare the <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit" target="_blank">R-squared</a> for your study to those of similar studies. If your R-squared is very different from those in similar studies, it's a sign that your model may have a problem.</li>
</ul>
<p>If you’re interested in learning more about these issues, read my post about <a href="http://blog.minitab.com/blog/adventures-in-statistics/beware-of-phantom-degrees-of-freedom-that-haunt-your-regression-models">how using too many <em>phantom</em> degrees of freedom is related to data mining problems</a>.</p>
<p> </p>
Data Analysis, Hypothesis Testing, Learning, Regression Analysis, Statistics, Statistics Help
Wed, 19 Oct 2016 12:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics/problems-using-data-mining-to-build-regression-models-part-two
Jim Frost
Minitab 17 and Minitab Express: A Comparison of Software Features
http://blog.minitab.com/blog/marilyn-wheatleys-blog/minitab-17-and-minitab-express-a-comparison-of-software-features
<p><span style="line-height: 1.6;">Since the release of Minitab Express in 2014, we’ve often received questions in technical support about the differences between Express and Minitab 17. In this post, I’ll attempt to provide a comparison between these two Minitab products.</span></p>
What Is Minitab 17?
<p>Minitab 17 is an all-in-one graphical and statistical analysis package that includes basic analysis tools such as hypothesis testing, regression, and ANOVA. Additionally, Minitab 17 includes more advanced features such as reliability analysis, multivariate tools, design of experiments (DOE), and quality tools such as gage R&R and capability analysis. A full list of features that are included in Minitab 17 is available on this <a href="http://www.minitab.com/en-us/products/minitab/features-list/">page</a>.</p>
What Is Minitab Express?
<p>Minitab Express is a more basic all-in-one software package for graphical and statistical analysis, designed for students and professors teaching introductory statistics courses. Minitab Express includes statistical analysis options such as hypothesis testing, regression, and ANOVA, but does not include many of the other advanced features that are available in Minitab 17. A full list of the features that are included in Minitab Express is available <a href="http://www.minitab.com/en-us/products/express/features-list/">here</a>.</p>
Key Differences
<strong><em>Supported Operating Systems</em></strong>
<p>One main difference between the two packages is that Minitab 17 is a Windows-only application (however, Minitab 17 can be installed on Mac OS X using one of the options described <a href="http://support.minitab.com/en-us/installation/frequently-asked-questions/other/minitab-companion-on-mac/">here</a>). System requirements for Minitab 17 are available <a href="http://www.minitab.com/en-us/products/minitab/system-requirements/">here</a>. </p>
<p>Minitab Express is available for both Windows and Mac OS X. The system requirements for Minitab Express are available <a href="http://www.minitab.com/en-us/products/express/system-requirements/">here</a>.</p>
<strong><em>The Interface</em></strong>
<p>While the menu options for both versions of the software are located at the top and the worksheet/data window is below, there are several differences in the interface. The first screen shot below is for Minitab 17, while the next two screen shots are for Minitab Express:</p>
<p style="margin-left: 40px;"><br />
<strong>Minitab 17:</strong><img alt="Minitab 17 Interface" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f054ba83a85abb6245445502feb2ce86/minitab17interface.png" style="width: 800px; height: 481px;" /></p>
<p style="margin-left: 40px;"><strong>Minitab Express for Windows:</strong><img alt="Express for Windows" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/280aa535dde18d42aaf42eb517fbb9fe/expressforwindowsinterface.png" style="width: 800px; height: 571px;" /></p>
<p style="margin-left: 40px;"><strong>Minitab Express for OS X:</strong><img alt="Minitab Express for OS X Interface" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/177920cdc081cd8d77458ccf3318d192/expressforosxinterface.png" style="width: 800px; height: 529px;" /></p>
<em><strong>Comparison of Commonly Used Features</strong></em>
<p>Beyond the cosmetic differences in appearance, the two versions differ in the features they offer. The table below compares commonly used features in each version:</p>
<table>
<tr><th align="center">Feature</th><th align="center">Minitab 17 (Windows)</th><th align="center">Minitab Express (Windows & Mac OS X)</th></tr>
<tr><td style="text-align: center;">Assistant menu</td><td style="text-align: center;">•</td><td style="text-align: center;"> </td></tr>
<tr><td style="text-align: center;">Graphs</td><td style="text-align: center;">•</td><td style="text-align: center;">•</td></tr>
<tr><td style="text-align: center;">Probability distributions</td><td style="text-align: center;">•</td><td style="text-align: center;">•</td></tr>
<tr><td style="text-align: center;">Summary statistics</td><td style="text-align: center;">•</td><td style="text-align: center;">•</td></tr>
<tr><td style="text-align: center;">Hypothesis tests</td><td style="text-align: center;">•</td><td style="text-align: center;">•</td></tr>
<tr><td style="text-align: center;">One-Way ANOVA</td><td style="text-align: center;">•</td><td style="text-align: center;">•</td></tr>
<tr><td style="text-align: center;">Two-Way ANOVA</td><td style="text-align: center;">•</td><td style="text-align: center;">•</td></tr>
<tr><td style="text-align: center;">ANOVA with > 2 factors</td><td style="text-align: center;">•</td><td style="text-align: center;"> </td></tr>
<tr><td style="text-align: center;">Linear regression</td><td style="text-align: center;">•</td><td style="text-align: center;">•</td></tr>
<tr><td style="text-align: center;">Logistic regression</td><td style="text-align: center;">•</td><td style="text-align: center;">•</td></tr>
<tr><td style="text-align: center;">Nonlinear regression</td><td style="text-align: center;">•</td><td style="text-align: center;"> </td></tr>
<tr><td style="text-align: center;">Design of experiments</td><td style="text-align: center;">•</td><td style="text-align: center;"> </td></tr>
<tr><td style="text-align: center;">Control charts</td><td style="text-align: center;">•</td><td style="text-align: center;">•</td></tr>
<tr><td style="text-align: center;">Gage R&R</td><td style="text-align: center;">•</td><td style="text-align: center;"> </td></tr>
<tr><td style="text-align: center;">Capability analysis</td><td style="text-align: center;">•</td><td style="text-align: center;"> </td></tr>
<tr><td style="text-align: center;">Reliability</td><td style="text-align: center;">•</td><td style="text-align: center;"> </td></tr>
<tr><td style="text-align: center;">Multivariate</td><td style="text-align: center;">•</td><td style="text-align: center;"> </td></tr>
<tr><td style="text-align: center;">Time series</td><td style="text-align: center;">•</td><td style="text-align: center;"> </td></tr>
<tr><td style="text-align: center;">Nonparametric tests</td><td style="text-align: center;">•</td><td style="text-align: center;">•</td></tr>
<tr><td style="text-align: center;">Equivalence tests</td><td style="text-align: center;">•</td><td style="text-align: center;"> </td></tr>
<tr><td style="text-align: center;">Power and sample size</td><td style="text-align: center;">•</td><td style="text-align: center;"> </td></tr>
</table>
<p>Although many of the same features are available in both packages, Minitab 17 has many graph editing options that are not available in Minitab Express. For many of the tests that are available in both packages, Minitab 17 allows more control over the results and has more options than Minitab Express. You can see a more detailed comparison <a href="http://www.minitab.com/academic/comparison/">here</a>.</p>
<p>I hope this post is useful in evaluating the two versions of Minitab. For any questions about either software package, we are more than happy to help here in <a href="http://www.minitab.com/en-us/support/">technical support</a>.</p>
Statistics, Stats
Mon, 17 Oct 2016 12:00:00 +0000
http://blog.minitab.com/blog/marilyn-wheatleys-blog/minitab-17-and-minitab-express-a-comparison-of-software-features
Marilyn Wheatley
Why You Should Celebrate Healthcare Quality Week
http://blog.minitab.com/blog/real-world-quality-improvement/ways-to-celebrate-healthcare-quality-week
<p>October 16–22 is National Healthcare Quality Week, started by the National Association for Healthcare Quality to increase awareness of healthcare quality programs and to highlight the work of healthcare quality professionals and their influence on improved patient care outcomes.</p>
<img alt="healthcare quality week logo" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/71359d037d4643f7c534b6b2e17a074e/hqw.jpg" style="width: 250px; height: 70px; float: right; margin: 10px 15px;" />
<p>This event deserves your attention because the quality of healthcare affects every one of us, and so does the cost of that care. Whether it's as a patient, a quality practitioner, or a health care provider, we all have a stake in learning what people are doing to improve the quality of care and, at the same time, working to make it more efficient and ultimately affordable.</p>
<div>In honor of the celebration, I wanted to point you to a few resources we have on hand that not only acknowledge the great work of healthcare quality professionals around the world, but also show off the tactics and tools they use to keep patients safe and care affordable.</div>
<p>Kudos to not only those in the field of healthcare quality, but all who work in healthcare to improve the experience of patients everywhere (thank you!).</p>
Q&As with Healthcare Quality Professionals
<p><a href="https://www.minitab.com/News/Getting-Better-All-The-Time/" target="_blank">Getting Better All the Time</a></p>
<p>As the corporate director of process excellence at Citrus Valley Health Partners, Denise Ronquillo plays a key role in improving quality and ensuring that patients receive excellent and safe care. Over the past two years, she and her colleagues have achieved substantial successes while overcoming resistance and skepticism, and are beginning to see a new culture of quality emerge in their organization.</p>
<p><a href="https://www.minitab.com/News/This-Isn-t-a-Game-We-re-Playing/" target="_blank">This Isn’t a Game We’re Playing</a></p>
<p>Quality improvement is something healthcare providers <em>have</em> to do, says Dr. Sandy Fogel, surgical quality officer at Carilion Clinic.</p>
<p><a href="https://www.minitab.com/en-us/News/Healthcare-Quality--Making-a-Difference-with-Data--A-Conversation-with-Dr--William-H--Woodall/" target="_blank">Healthcare Quality: Making a Difference with Data</a></p>
<p>How can statistics and data analysis help improve outcomes in healthcare? William H. Woodall, professor of statistics at Virginia Tech, has been focused on that question for over ten years.</p>
Blog Posts about Quality Improvement in Health Care
<p><a href="http://blog.minitab.com/blog/understanding-statistics/a-six-sigma-healthcare-project-part-1-examining-factors-with-a-pareto-chart" target="_blank">A Six Sigma Healthcare Project</a></p>
<p>Follow along with a series of blog posts on the application of binary logistic regression in a healthcare Six Sigma project, which had a goal of attracting and retaining more patients in a hospital's cardiac rehabilitation program.</p>
<p><a href="http://blog.minitab.com/blog/michelle-paret/monitoring-rare-events-with-g-charts" target="_blank">Monitoring Rare Events with G and T Charts</a></p>
<p>These charts make it easy to assess the stability of processes that involve rare events and have low defect rates.</p>
<p><a href="http://blog.minitab.com/blog/meredith-griffith/exploring-healthcare-data-part-1" target="_blank">Exploring Healthcare Data</a></p>
<p>Learn several tips for exploring and visualizing your healthcare data in a way that will prepare you for a formal analysis.</p>
Case Studies about Health Care Quality Improvement Projects
<p><a href="https://www.minitab.com/en-us/Case-Studies/Cathay-General-Hospital/" target="_blank">Cathay General Hospital</a></p>
<p>During an assessment of its angioplasty process for patients suffering from heart attacks, Cathay General Hospital in Taipei, Taiwan used Minitab to analyze data to help them introduce new treatment options that led to a decrease in the patients’ hospital stay and increased savings in medical resources.</p>
<p><a href="https://www.minitab.com/Case-Studies/Riverview-Hospital-Association/" target="_blank">Riverview Hospital Association</a></p>
<p>The Riverview Hospital Association Lean Six Sigma team performed data analysis to identify patient groups who were scoring lower on patient satisfaction survey questions. This allowed the team to target process improvement efforts to specific patient populations.</p>
<p><a href="https://www.minitab.com/en-us/Case-Studies/Franciscan-Hospital-for-Children/" target="_blank">Franciscan Children’s Hospital</a></p>
<p>With the help of Lean Six Sigma and Minitab software, Franciscan Hospital for Children was able to analyze information about its processes and make data-driven decisions that increased dental operating room efficiency and enabled doctors to see more kids.</p>
<p><em>For more on how data analysis and Minitab can be used in healthcare, visit <a href="http://www.minitab.com/healthcare" target="_blank">www.minitab.com/healthcare</a>. </em></p>
Health Care Quality Improvement
Fri, 14 Oct 2016 12:00:00 +0000
http://blog.minitab.com/blog/real-world-quality-improvement/ways-to-celebrate-healthcare-quality-week
Carly Barry
Do You Know the Truth about Gage Repeatability and Reproducibility?
http://blog.minitab.com/blog/michelle-paret/do-you-know-the-truth-about-gage-repeatability-and-reproducibility
<p>The ultimate goal of most quality improvement projects is clear: reducing the number of defects, improving a response, or making a change that benefits your customers.</p>
<p>We often want to jump right in and start gathering and analyzing data so we can solve the problems. Checking your measurement systems first, with methods like attribute agreement analysis or Gage R&R, may seem like a needless waste of time. </p>
<p>But the truth is that a Gage R&R Study is a critical step in <em>any </em>statistical analysis involving continuous data. That's because it allows you to determine if your measurement system for that data is adequate or not. If your measurement system isn’t capable of producing reliable measurements, then any analysis you conduct with those measurements is likely meaningless.</p>
<p>So let’s get to the “R&R” part of <span><a href="http://blog.minitab.com/blog/meredith-griffith/fundamentals-of-gage-rr">Gage R&R</a></span>—Repeatability and Reproducibility.</p>
<p>Suppose we’re measuring pencils with a ruler (which is an excellent hands-on activity you can use to teach Gage R&R). We want to determine if our measurement system can adequately measure the length of these pencils. To conduct a Gage R&R Study, we randomly select 10 pencils and 3 people—Abe, Brenda, and Charlie. Each person measures each pencil 2 times, using the same ruler. This gives us a total of 10 x 3 x 2 = 60 measurements.</p>
<p><img alt="parts and operators" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/c7685645ea8140d6ba67b1496ba57624/parts_and_ops.png" style="width: 548px; height: 293px;" /></p>
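<p>To make the study layout concrete, here is a short Python sketch that builds the 60-run data collection plan described above. Running the measurements in a fully randomized order is an assumption of this sketch, as are the function name, the pencil labels, and the seed; it is not a description of how Minitab generates its worksheet.</p>

```python
import itertools
import random

def gage_study_plan(parts, operators, replicates, seed=None):
    """Build every part/operator/replicate combination in random run order."""
    runs = [
        {"part": part, "operator": op, "replicate": rep}
        for part, op, rep in itertools.product(
            parts, operators, range(1, replicates + 1)
        )
    ]
    # Shuffle so that measurement order is not tied to part or operator.
    random.Random(seed).shuffle(runs)
    return runs

parts = [f"Pencil {i}" for i in range(1, 11)]   # 10 hypothetical pencil labels
operators = ["Abe", "Brenda", "Charlie"]
plan = gage_study_plan(parts, operators, replicates=2, seed=42)
```

<p>Each pencil and operator pair appears exactly twice in the plan, which is what lets the study separate repeat-to-repeat variation from operator-to-operator variation.</p>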
Repeatability
<p>Repeatability represents the variation observed when the same operator measures the same part multiple times with the same device. In other words, when Abe repeatedly measures the same pencil with the same ruler, will his measurements be consistent? If he measures 16.8 cm the first time, is he going to measure 16.8 cm the next time he measures that same pencil?</p>
Reproducibility
<p>Reproducibility represents the variation observed when DIFFERENT operators measure the same part multiple times with the same device. In other words, if Abe measures a pencil at 16.8 cm in length, will Brenda also measure 16.8 cm for that same pencil? And what about Charlie?</p>
<p><strong>Helpful Hint: </strong>To remember the difference between repeatability and reproducibility, note that reproducibility includes an ‘o’ – think ‘<strong>o</strong>’ for the variability across “<strong>o</strong>perators.”</p>
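<p>To make the two terms concrete, here is a minimal Python sketch that estimates both variance components from a balanced, crossed study using the textbook two-way random-effects ANOVA formulas. The function name <code>gage_rr_components</code>, the nested-dictionary layout, and the pencil readings are all invented for illustration, and the sketch is not meant to reproduce the output of any particular software.</p>

```python
from statistics import mean

def gage_rr_components(data):
    """data[part][operator] is a list of repeat measurements (balanced, crossed).

    Returns (repeatability_variance, reproducibility_variance).
    """
    parts = list(data)
    operators = list(data[parts[0]])
    p, o = len(parts), len(operators)
    r = len(data[parts[0]][operators[0]])

    cell = {(i, j): mean(data[i][j]) for i in parts for j in operators}
    part_mean = {i: mean(cell[i, j] for j in operators) for i in parts}
    oper_mean = {j: mean(cell[i, j] for i in parts) for j in operators}
    grand = mean(cell.values())

    # Sums of squares for error (within cell), operator, and part-by-operator.
    ss_error = sum((x - cell[i, j]) ** 2
                   for i in parts for j in operators for x in data[i][j])
    ss_oper = p * r * sum((oper_mean[j] - grand) ** 2 for j in operators)
    ss_inter = r * sum((cell[i, j] - part_mean[i] - oper_mean[j] + grand) ** 2
                       for i in parts for j in operators)

    ms_error = ss_error / (p * o * (r - 1))
    ms_oper = ss_oper / (o - 1)
    ms_inter = ss_inter / ((p - 1) * (o - 1))

    # Repeatability: same operator, same part, same device.
    repeatability = ms_error
    # Reproducibility: operator effect plus operator-by-part interaction,
    # with negative estimates clipped to zero.
    var_oper = max(0.0, (ms_oper - ms_inter) / (p * r))
    var_inter = max(0.0, (ms_inter - ms_error) / r)
    return repeatability, var_oper + var_inter

lengths = [15.0, 16.8, 17.5, 18.1]                     # invented pencil lengths, cm
biases = {"Abe": 0.0, "Brenda": 0.1, "Charlie": 0.2}   # invented operator offsets, cm
data = {part: {op: [length + b, length + b] for op, b in biases.items()}
        for part, length in enumerate(lengths)}
repeatability, reproducibility = gage_rr_components(data)
print(repeatability, reproducibility)
```

<p>In this made-up data set every operator repeats their own reading exactly, so the repeatability variance is zero, while the constant offsets between operators show up entirely as reproducibility (about 0.01 cm squared).</p>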
Answering Important Questions
<p>Gage R&R can help you answer questions such as:</p>
<ul>
<li>Is my measurement system capable of discriminating between parts?</li>
<li>Is the variability in my measurement system small compared with the manufacturing process variability?</li>
<li>How much variability in my measurement system is caused by differences between operators?</li>
</ul>
<p>And if your measurement system isn't great, you can also use Gage R&R to determine where the weaknesses are. For example, perhaps a study reveals that while repeatability is good, the reproducibility is poor. You can use Gage R&R to dig deeper and figure out why different operators reported different readings.</p>
<p>To easily set up your Gage R&R data collection plan and analyze the corresponding data to assess your measurement system, check out <a href="http://www.minitab.com/products/minitab">Minitab Statistical Software</a> and its <strong>Stat > Quality Tools > Gage Study</strong> and <strong>Assistant > Measurement Systems Analysis</strong> features.</p>
Data AnalysisFun StatisticsLean Six SigmaQuality ImprovementSix SigmaStatisticsStatistics HelpStatsFri, 07 Oct 2016 12:00:00 +0000http://blog.minitab.com/blog/michelle-paret/do-you-know-the-truth-about-gage-repeatability-and-reproducibilityMichelle Paret5 More Powerful Insights from Noted Quality Leaders
http://blog.minitab.com/blog/understanding-statistics/5-more-powerful-insights-from-noted-quality-leaders
<p>We hosted our first-ever Minitab Insights conference in September, and if you were among the attendees, you already know the caliber of the speakers and the value of the information they shared. Experts from a wide range of industries offered a lot of great lessons about how they use data analysis to improve business practices and solve a variety of problems.<img alt="tips from Minitab Insights 2016" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/394dfef193debd958deb2011edaaac16/insights_takeaways1.gif" style="width: 354px; height: 250px; margin: 10px 15px; float: right;" /></p>
<p>I blogged earlier about <a href="http://blog.minitab.com/blog/understanding-statistics/5-powerful-insights-from-noted-quality-leaders">five key takeaways</a> gleaned from the sessions at the Minitab Insights 2016 conference. But that was just the tip of the iceberg, and participants learned many more helpful things that are well worth sharing. So here are five <em>more </em>helpful, challenging, and thought-provoking ideas and suggestions that we heard during the event.</p>
Improve Your Skills while Improving Yourself!
<p>Everyone has personal goals they'd like to achieve, such as getting fit, changing a habit, or writing a book. Rod Toro, deployment leader at <a href="http://www.minitab.com/en-us/Case-Studies/Edward-Jones/?cta=6675">Edward Jones</a>, explained how challenging himself and his team to apply Lean and Six Sigma tools to their personal goals has helped them better understand the underlying principles of quality improvement, personalize their learning and gain deeper insights, and expand their ability to apply quality methods in a variety of circumstances and situations.</p>
We Can't Claim the Null Hypothesis Is True.
<p>Minitab technical training specialist Scott Kowalski reminded us that when we test a hypothesis with statistics, "<span><a href="http://blog.minitab.com/blog/understanding-statistics/things-statisticians-say-failure-to-reject-the-null-hypothesis">failing to reject the null</a></span>" does not prove that the null hypothesis <em>is </em>true. It only means we don't have enough evidence to reject it. We need to keep this in mind when we interpret our results, and to be careful how we explain our findings to others. We also need to be sure our hypotheses are clearly stated, and that we've selected the appropriate test for our task!</p>
Outliers Won't Just Be Ignored, So You'd Better Investigate Them.
<p>We've all seen them in our data: those <a href="http://blog.minitab.com/blog/michelle-paret/how-to-identify-outliers-and-get-rid-of-them">troublesome observations</a> that just don't want to belong, lurking off in the margins, maybe with one or two other loners. It can be tempting to ignore or just delete those observations, but Larry Bartkus, senior distinguished engineer at Edwards Lifesciences, provided vivid illustrations of the drastic impact outliers can have on the results of an analysis. He also reminded us of the value in slowing down our assumptions, looking at the data in several ways, and trying to understand <em>why </em>our data is the way it is. </p>
Attribute Agreement Analysis Is Just One Option.
<p>When we need to assess how well an attribute measurement system performs, attribute agreement analysis is the go-to method—but Thomas Rust, reliability engineer at Autoliv, demonstrated that many more options are available. In encouraging quality practitioners to "break the attribute paradigm," Rust detailed four innovative ways to assess an attribute measurement system: measure an underlying variable; attribute measurement of a variable product; variable measurement of an attribute product; and attribute measurement of an attribute product.</p>
Minitab Users Do Great Things.
<p>More than anything else, what we took away from Minitab Insights 2016 was an even greater appreciation for the people who are using our software in innovative ways—to increase the quality of the products we use every day, to raise the level of service we receive from businesses and organizations, to increase the efficiency and safety of our healthcare providers, and so much more.</p>
<p>Watch for more stories and ideas from the Minitab Insights conference in future issues of Minitab News, and on the Minitab Blog.</p>
Data AnalysisInsightsLean Six SigmaProject ToolsQuality ImprovementSix SigmaStatisticsWed, 05 Oct 2016 12:00:00 +0000http://blog.minitab.com/blog/understanding-statistics/5-more-powerful-insights-from-noted-quality-leadersEston MartzWhy Shrewd Experts "Fail to Reject the Null" Every Time
http://blog.minitab.com/blog/understanding-statistics/why-shrewd-experts-fail-to-reject-the-null-every-time
<p><img alt="nulls angels: the toughest statisticians around!" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/d2c0571a-acbd-48c7-84f4-222276c293fe/Image/509959f8406d59b3bb31f686aeb3b6b0/nulls_angels.jpg" style="margin: 10px 15px; float: right; width: 175px; height: 198px;" />I watched an old <a href="https://en.wikipedia.org/wiki/The_Wild_Angels" target="_blank">motorcycle flick from the 1960s</a> the other night, and I was struck by the bikers' slang. They had a language all their own. Just like statisticians, whose manner of speaking often confounds those who aren't hep to the lingo of data analysis.</p>
<p>It got me thinking...what if there were an all-statistician biker gang? Call them the Nulls Angels. Imagine them in their colors, tearing across the countryside, analyzing data and asking the people they encounter on the road about whether they "fail to reject the null hypothesis."</p>
<p>If you point out how strange that phrase sounds, the Nulls Angels will <em>know</em> you're not cool...and not very aware of statistics.</p>
<p>Speaking purely as an editor, I acknowledge that "failing to reject the null hypothesis" <em>is</em> cringe-worthy. "Failing to reject" seems like an overly complicated equivalent to <em>accept</em>. At minimum, it's clunky phrasing.</p>
<p>But it turns out those rough-and-ready statisticians in the Nulls Angels have good reason to talk like that. From a <em>statistical</em> perspective, it's undeniably accurate—and replacing "failure to reject" with "accept" would just be wrong.</p>
What <em>Is </em>the Null Hypothesis, Anyway?
<p>Hypothesis tests include one- and two-sample t-tests, tests for association, tests for normality, and many more. (All of these tests are available under the <strong>Stat</strong><span> menu in Minitab <a href="http://www.minitab.com">statistical software</a>. Or, if you want a little more <a href="http://www.minitab.com/en-us/products/minitab/assistant">statistical guidance</a>, the Assistant can lead you through common hypothesis tests step-by-step.)</span></p>
<p>A hypothesis test examines two propositions: the null hypothesis (or H0 for short), and the alternative (H1). The <em>alternative </em>hypothesis is what we hope to support. We presume that the null hypothesis is true, unless the data provide sufficient evidence that it is not.</p>
<p>You've heard the phrase "Innocent until proven guilty." That means the defendant's innocence is taken for granted until guilt is proved. In statistics, the null hypothesis is taken for granted until the alternative is proved true.</p>
So Why Do We "Fail to Reject" the Null Hypothesis?
<p>That brings up the issue of "proof."</p>
<p>The degree of statistical evidence we need in order to “prove” the alternative hypothesis is the <a href="http://blog.minitab.com/blog/michelle-paret/alphas-p-values-confidence-intervals-oh-my">confidence level</a>. The confidence level is 1 minus our risk of committing a Type I error, which occurs when you incorrectly reject a null hypothesis that's true. Statisticians call this risk alpha, and also refer to it as the significance level. The typical alpha of 0.05 corresponds to a 95% confidence level: we're accepting a 5% chance of rejecting the null even if it is true. (In life-or-death matters, we might <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/alpha-male-vs-alpha-female">lower the risk of a Type I error to 1% or less</a>.)</p>
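<p>A small simulation can illustrate what that 5% risk means. The Python sketch below repeatedly tests a null hypothesis that really is true and counts how often it gets rejected anyway. The one-sample z-test with a known standard deviation, the sample size, and the number of trials are all assumptions chosen to keep the example short.</p>

```python
import random
from statistics import NormalDist, mean

def z_test_p_value(sample, null_mean, sigma):
    """Two-sided p-value for a one-sample z-test with known sigma."""
    z = (mean(sample) - null_mean) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def type_i_error_rate(alpha=0.05, n=20, trials=4000, seed=0):
    """Share of tests that reject a TRUE null (the population mean really is 0)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(sample, null_mean=0.0, sigma=1.0) < alpha:
            rejections += 1  # a Type I error: the null was true but we rejected it
    return rejections / trials

rate = type_i_error_rate()
print(f"Rejected a true null in {rate:.1%} of tests")
```

<p>The observed rejection rate should land close to the chosen alpha, which is the sense in which a 95% confidence level accepts a 5% chance of a Type I error.</p>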
<p>Regardless of the alpha level we choose, any hypothesis test has only two possible outcomes:</p>
<ol>
<li><strong>Reject the null hypothesis</strong> and conclude that the alternative hypothesis is true at the 95% confidence level (or whatever level you've selected).<br />
</li>
<li><strong>Fail to reject the null hypothesis</strong> and conclude that <em>not</em> enough evidence is available to suggest the null is false at the 95% confidence level.</li>
</ol>
<p>We often use a <a href="http://blog.minitab.com/blog/understanding-statistics/three-things-the-p-value-cant-tell-you-about-your-hypothesis-test">p-value</a> to decide whether the data provide enough evidence to reject the null hypothesis. If the test's p-value is less than our selected alpha level, we reject the null. Or, as statisticians say, "When the p-value's low, the null must go."</p>
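<p>The decision rule itself fits in a few lines of Python. The function name <code>decide</code> and the example p-values are made up for illustration; the point is that the rule has only two possible outputs, and neither of them is "accept."</p>

```python
def decide(p_value, alpha=0.05):
    """Reject the null only when the p-value is below the chosen alpha."""
    if p_value < alpha:
        return "reject the null"
    return "fail to reject the null"

# Hypothetical p-values from three different tests.
for p in (0.003, 0.049, 0.20):
    print(p, "->", decide(p))

# The same p-value can lead to a different decision under a stricter alpha.
print(0.03, "->", decide(0.03, alpha=0.01))
```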
<p>This still doesn't explain <em>why</em> a statistician won't "accept the null hypothesis." Here's the bottom line: failing to reject the null hypothesis does not prove the null hypothesis <em>is</em> true. That's because a hypothesis test does not determine <em>which</em> hypothesis is true, or even which is most likely: it <em>only</em> assesses whether evidence exists to reject the null hypothesis.</p>
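<p>One way to see this is to simulate a situation where the null hypothesis is false but the sample is too small to show it. In the Python sketch below, the true mean is 0.3 while the null claims it is 0; the z-test, the sample size of 10, and the effect size are assumptions picked for illustration.</p>

```python
import random
from statistics import NormalDist, mean

def z_test_p_value(sample, null_mean, sigma):
    """Two-sided p-value for a one-sample z-test with known sigma."""
    z = (mean(sample) - null_mean) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def fail_to_reject_rate(true_mean, n, alpha=0.05, trials=4000, seed=0):
    """Share of tests that fail to reject H0: mean = 0 when the true mean differs."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if z_test_p_value(sample, null_mean=0.0, sigma=1.0) >= alpha:
            failures += 1  # not enough evidence, even though the null is false
    return failures / trials

small_sample = fail_to_reject_rate(true_mean=0.3, n=10)
large_sample = fail_to_reject_rate(true_mean=0.3, n=200)
print(f"n=10:  failed to reject in {small_sample:.0%} of tests")
print(f"n=200: failed to reject in {large_sample:.0%} of tests")
```

<p>Under these assumed settings the small-sample test fails to reject a false null most of the time, while the larger sample rejects it far more often. "Accepting" the null after the small study would therefore be a mistake.</p>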
<img alt="&quot;My hypothesis is Null until proven Alternative, sir!&quot;" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/d2c0571a-acbd-48c7-84f4-222276c293fe/Image/a07b85370986a3dd126ac4d021775d13/trial.jpg" style="border-width: 1px; border-style: solid; margin: 10px 15px; float: right; width: 300px; height: 200px;" />"Null Until Proved Alternative"
<p>Hark back to "innocent until proven guilty." As the data analyst, you are the judge. The hypothesis test is the trial, and the null hypothesis is the defendant. The alternative hypothesis is the prosecution, which needs to make its case <em>beyond a reasonable doubt</em> (say, with 95% certainty).</p>
<p>If the trial evidence does not show the defendant is guilty, neither has it proved that the defendant <em>is</em> innocent. However, based on the available evidence, you can't reject that <em>possibility</em>. So how would you announce your verdict?</p>
<p>"Not guilty."</p>
<p>That phrase is perfect: "Not guilty" doesn't say the defendant <em>is</em> innocent, because that has not been proved. It just says the prosecution couldn't convince the judge to abandon the assumption of innocence.</p>
<p>So "failure to reject the null" is the statistical equivalent of "not guilty." In a trial, the burden of proof falls to the prosecution. When analyzing data, the entire burden of proof falls to your sample data. "Not guilty" does not mean "innocent," and "failing to reject" the null hypothesis is quite distinct from "accepting" it. </p>
<p>So if a group of marauding statisticians in their Nulls Angels leathers ever asks, keep yourself in their good graces, and show that you know "failing to reject the null" is not "accepting the null."</p>
Fun StatisticsHypothesis TestingStatisticsStatistics HelpMon, 03 Oct 2016 12:00:00 +0000http://blog.minitab.com/blog/understanding-statistics/why-shrewd-experts-fail-to-reject-the-null-every-timeEston Martz