Minitab | Minitab
Blog posts and articles about using Minitab software in quality improvement projects, research, and more.
http://blog.minitab.com/blog/minitab/rss
Fri, 09 Dec 2016 09:37:05 +0000
Creating Charts to Compare Month-to-Month Change, part 2
http://blog.minitab.com/blog/understanding-statistics/creating-charts-to-compare-month-to-month-change-part-2
<p>A member of <a href="http://www.linkedin.com/groups?gid=166220">Minitab's LinkedIn group</a> asked how to create a chart to monitor change by month, specifically comparing last year's data to this year's data. My last post showed how to do this using an <a href="http://blog.minitab.com/blog/understanding-statistics/creating-a-chart-to-compare-month-to-month-change">Individuals Chart of the differences</a> between this year's and last year's data. Here's another approach suggested by a participant in the group. </p>
Applying Statistical Thinking
<p>An individuals chart of the differences between this year's data and last year's might not be our best approach. Another approach is to look at all of the data together. We'll put this year's and last year's data into a single column and see how it looks in an individuals chart. (Want to play along? Here's my <a href="//cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/File/a1edf070e6d2c23fddef97de7fbbd6e0/ichart2.MTW">data set</a>, and if you don't already have Minitab, download the free <a href="http://www.minitab.com/en-us/products/minitab/free-trial/">30-day trial</a> version.)</p>
<p>We'll choose <strong style="border: 0px; margin: 0px; padding: 0px;">Stat > Control Charts > Variables Charts for Individuals > Individuals...</strong> and choose the "2 years" column in my datasheet as the variable. Minitab creates the following I chart: </p>
<p><img alt="i chart of two years" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/d2c0571a-acbd-48c7-84f4-222276c293fe/Image/5d141486c87195b234ea96c779560021/i_chart_of_two_years.gif" style="width: 576px; height: 384px;" /></p>
<p>Now we can examine all of the data sequentially and ask some questions about it. Are there outliers? The data seem remarkably consistent, but those points in December (12 and 24) warrant more investigation as potential sources of special cause variation. If an investigation revealed a cause showing that these points should be disregarded, they could be removed from the calculations for the center line and control limits, or removed from the chart altogether.</p>
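<p>For readers who want to see the arithmetic behind an individuals chart, here is a minimal Python sketch. It assumes the common textbook approach of estimating sigma from the average moving range divided by 1.128; the data values and the function names are hypothetical, and Minitab's own options and handling of omitted points may differ.</p>

```python
from statistics import mean

# Hypothetical monthly values for two consecutive years (points 1-24).
# These numbers are made up for illustration; they are not the post's data set.
two_years = [52, 49, 51, 50, 53, 48, 50, 51, 49, 52, 50, 61,
             51, 50, 49, 52, 50, 51, 48, 50, 52, 49, 51, 70]


def individuals_limits(values, exclude=()):
    """Return (LCL, center line, UCL) for an individuals chart.

    `exclude` lists 1-based point numbers to omit from the calculations.
    A moving range is used only when both of its points are retained.
    """
    kept = [v for i, v in enumerate(values, start=1) if i not in exclude]
    moving_ranges = [
        abs(values[i] - values[i - 1])
        for i in range(1, len(values))
        if i not in exclude and (i + 1) not in exclude
    ]
    center = mean(kept)
    sigma_hat = mean(moving_ranges) / 1.128  # d2 constant for ranges of size 2
    return center - 3 * sigma_hat, center, center + 3 * sigma_hat


def points_outside(values, lcl, ucl):
    """Return the 1-based numbers of points beyond the control limits."""
    return [i for i, v in enumerate(values, start=1) if v < lcl or v > ucl]


# Limits from all 24 points, then limits with both Decembers left out.
lcl_all, center_all, ucl_all = individuals_limits(two_years)
lcl_trim, center_trim, ucl_trim = individuals_limits(two_years, exclude={12, 24})

flagged_all = points_outside(two_years, lcl_all, ucl_all)
flagged_trim = points_outside(two_years, lcl_trim, ucl_trim)
```

<p>With these made-up numbers, leaving the two December points out of the calculations tightens the limits, which is why the decision to disregard a point should follow an investigation rather than precede it.</p>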
<p>What about seasonality, or a trend over the sequence? Neither issue affects this data set, but if they did, we could detrend or deseasonalize the data and chart the residuals to gain more insight into how the data are changing month-to-month. </p>
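<p>As a rough illustration of what "detrend or deseasonalize and chart the residuals" means, the sketch below removes a straight-line trend by least squares and removes a simple additive seasonal pattern by subtracting each month's average. Both functions are hypothetical helpers written for this example, not Minitab procedures, and real seasonal adjustment is usually more involved.</p>

```python
from statistics import mean


def detrend(values):
    """Fit a least-squares line to values against 1..n.

    Returns (slope, intercept, residuals).
    """
    n = len(values)
    xs = list(range(1, n + 1))
    x_bar = mean(xs)
    y_bar = mean(values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, values)]
    return slope, intercept, residuals


def deseasonalize(values, period=12):
    """Subtract the average of each seasonal position (e.g. each month)."""
    seasonal_means = [mean(values[p::period]) for p in range(period)]
    return [v - seasonal_means[i % period] for i, v in enumerate(values)]


# Made-up series with a perfect upward trend of 2 units per month.
trend_series = [10 + 2 * x for x in range(1, 13)]
slope, intercept, residuals = detrend(trend_series)

# Made-up series with a repeating 3-period pattern plus a level shift.
seasonal_residuals = deseasonalize([1, 2, 3, 3, 4, 5], period=3)
```

<p>The residuals from either step could then be placed in a single column and charted the same way as the raw data.</p>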
I-MR Chart
<p>Instead of an Individuals chart, one participant in the group suggested using an I-MR chart, which provides both the individuals chart and a moving-range chart. We can use the same single column of data, then examine the resulting I-MR chart for indications of special cause variation. "If not, there's no real reason to believe one year was different than another," this participant suggests. </p>
<p>Another thing you can do with most of the control charts in Minitab is establish stages. For example, if we want to look for differences between years, we can add a column of data (call it "Year") to our worksheet that labels each data point by year (2012 or 2013). Now when we select <strong>Stat > Control Charts > Variables Charts for Individuals > I-MR...</strong>, we'll open the Options dialog and select the Stages tab. </p>
<p><img alt="I-MR Chart stage dialog" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/d2c0571a-acbd-48c7-84f4-222276c293fe/Image/aefdeed4518854809218a9fec5b0598e/imr_stage_dialog.gif" style="width: 446px; height: 437px;" /></p>
<p>As shown above, we'll enter the "Year" column to define the stages. Minitab produces the following I-MR chart:</p>
<p><img alt="I-MR Chart with Stages" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/d2c0571a-acbd-48c7-84f4-222276c293fe/Image/025f60fd707c99ed1ab8ec75c22f6de2/2_yr_imr_chart.gif" style="width: 576px; height: 384px;" /> </p>
<p>This I-MR chart displays the data in two distinct phases by year, so we can easily see if there are any points from 2013 that are outside the limits for 2012. That would indicate a significant difference. In this case, it looks like the only point outside the control limits for 2012 is that for December 2013, and we already know there's something we need to investigate for the December data.</p>
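<p>The staged comparison described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the yearly values are invented, the function name is hypothetical, and the constants 2.66 and 3.267 are the standard textbook factors for moving ranges of size 2 rather than anything taken from Minitab's output.</p>

```python
from statistics import mean

# Hypothetical monthly values; not the data set used in the post.
year_2012 = [52, 49, 51, 50, 53, 48, 50, 51, 49, 52, 50, 61]
year_2013 = [51, 50, 49, 52, 50, 51, 48, 50, 52, 49, 51, 70]


def imr_limits(values):
    """Compute I chart and MR chart limits for one stage of data."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = mean(moving_ranges)
    center = mean(values)
    return {
        "i_lcl": center - 2.66 * mr_bar,   # 2.66 is approximately 3 / 1.128
        "i_center": center,
        "i_ucl": center + 2.66 * mr_bar,
        "mr_center": mr_bar,
        "mr_ucl": 3.267 * mr_bar,          # D4 factor for ranges of size 2
    }


# One set of limits per stage, as a staged chart would show.
limits_by_stage = {"2012": imr_limits(year_2012), "2013": imr_limits(year_2013)}

# Which 2013 months fall outside the limits calculated from 2012 alone?
base = limits_by_stage["2012"]
outside_2012_limits = [
    month
    for month, value in enumerate(year_2013, start=1)
    if not base["i_lcl"] <= value <= base["i_ucl"]
]
```

<p>With these invented numbers, the only 2013 month outside the 2012 limits is month 12, which mirrors the kind of result discussed above.</p>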
Time Series Plot
<p>For the purposes of visual comparison, some members of the Minitab group on LinkedIn advocate the use of a time series plot. To create this graph, we'll need two columns in the data sheet, one for this year's data and one for last year's. Then we'll choose <strong>Graph > Time Series Plot > Multiple</strong> and select the "Last Year" and "This Year" columns for our series. Minitab gives us the following plot: </p>
<p><img alt="Time Series Plot" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/d2c0571a-acbd-48c7-84f4-222276c293fe/Image/08e7518ccdbd1c6d044a902adb141a82/timeseriesplot1.gif" style="width: 576px; height: 384px;" /></p>
<p>Because this year's and last year's data are plotted in parallel, it's very easy to see where and by how much they differ over time.</p>
<p>Most of the months appear to be quite close for these data, but once again this graph gives us a dramatic visual representation of the difference between the December data points, not just as compared to the rest of the year, but compared to each other from last year to this. </p>
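<p>What the eye picks out from the two parallel lines can also be put into numbers. The short sketch below subtracts last year's value from this year's for each month and finds the month with the largest gap; the values are hypothetical stand-ins for the "Last Year" and "This Year" columns.</p>

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical values standing in for the two worksheet columns.
last_year = [52, 49, 51, 50, 53, 48, 50, 51, 49, 52, 50, 61]
this_year = [51, 50, 49, 52, 50, 51, 48, 50, 52, 49, 51, 70]

# Month-by-month change: this year's value minus last year's.
differences = [b - a for a, b in zip(last_year, this_year)]

# The month whose change is largest in absolute size.
largest_gap = max(zip(MONTHS, differences), key=lambda pair: abs(pair[1]))
```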
<p>Oh, and here's a neat Minitab trick: what if you'd rather have the Index values of 1, 2, 3...12 in the graph above appear as the names of the months? Very easy! Just double-click on the X axis, which brings up the Edit Scale dialog box. Click on the <strong>Time </strong>tab and fill it out as follows: </p>
<p><img alt="Edit the time scale of your graph" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/d2c0571a-acbd-48c7-84f4-222276c293fe/Image/093a955f82d749e99f3f49e55b606abf/edit_scale.gif" style="width: 477px; height: 376px;" /></p>
<p>(Note that our data start with January, so we use 1 for our starting value. If your data started with the month of February, you'd choose to start with 2, etc.) Now we just click OK, and Minitab automatically updates the graph to include the names of the months: </p>
<p><img alt="Time Series Plot with Months" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/d2c0571a-acbd-48c7-84f4-222276c293fe/Image/5c503c5d623e8f348e3aedbecc7e8e2d/time_series_2.gif" style="width: 576px; height: 384px;" /></p>
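<p>The idea behind the starting value is simple enough to sketch in a few lines of Python. The function below is a hypothetical helper, not part of Minitab: it maps index positions 1, 2, 3... to month names, beginning at whichever month the data start with and wrapping around after December.</p>

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def month_labels(n_points, start_month=1):
    """Return month names for n_points consecutive months.

    start_month is 1 for January, 2 for February, and so on.
    """
    return [MONTHS[(start_month - 1 + i) % 12] for i in range(n_points)]


january_start = month_labels(12)                  # data beginning in January
february_start = month_labels(3, start_month=2)   # data beginning in February
```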
The Value of Different Angles
<p>One thing I see again and again on the Minitab LinkedIn group is how a simple question -- how can I look at change from month to month between years? -- can be approached from many different angles. </p>
<p>What's nice about using statistical software is that we have speed and power to quickly and easily follow up on all of these angles, and see what different things each approach can tell us about our data. </p>
Data Analysis, Quality Improvement, Statistics
Thu, 08 Dec 2016 20:00:00 +0000
http://blog.minitab.com/blog/understanding-statistics/creating-charts-to-compare-month-to-month-change-part-2
Eston Martz
The Difference Between Right-, Left- and Interval-Censored Data
http://blog.minitab.com/blog/michelle-paret/the-difference-between-right-left-and-interval-censored-data
<p><a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/reliability-and-survival-the-high-stakes-of-product-performance">Reliability analysis</a> is the perfect tool for calculating the proportion of items that you can expect to survive for a specified period of time under identical operating conditions. Light bulbs—or lamps—are a classic example. Want to calculate the number of light bulbs expected to fail within 1000 hours? Reliability analysis can help you answer this type of question.</p>
<p>But to conduct the analysis properly, we need to understand the difference between the three types of censoring.</p>
What is censored data?
<p>When you perform reliability analysis, you may not have exact failure times for all items. In fact, lifetime data are often "censored." Using the light bulb example, perhaps not all the light bulbs have failed by the time your study ends. The time data for those bulbs that have not yet failed are referred to as censored.</p>
<img alt="baby" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/913ae1dbf78dd9728367bf0dead44f45/baby.jpg" style="width: 250px; height: 244px; margin: 10px 15px; float: right;" />
<p>It is important to include the censored observations in your analysis because the fact that these items have not yet failed has a big impact on your reliability estimates.</p>
Right-censored data
<p>Let’s move from light bulbs to newborns, inspired by my colleague who’s at the “you’re <em>still </em>here?” stage of pregnancy.</p>
<p>Suppose you’re conducting a study on pregnancy duration. You’re ready to complete the study and run your analysis, but some women in the study are still pregnant, so you don’t know exactly how long their pregnancies will last. These observations would be <em>right-censored</em>. The “failure,” or birth in this case, will occur after the recorded time.</p>
<p style="margin-left: 40px;"><img alt="Right censored" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/c75961d3d78018da3800683ab233c989/right_censored.png" style="width: 291px; height: 241px;" /></p>
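<p>To see why right-censored observations should stay in the analysis, consider this small Python sketch of the Kaplan-Meier product-limit estimator, a standard nonparametric method for right-censored data. The durations are invented, the function is a bare-bones illustration, and it is not meant to reproduce any particular Minitab analysis.</p>

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each distinct event time.

    durations: days on study for each woman
    observed:  True if the birth was recorded, False if right-censored
    """
    event_times = sorted({t for t, seen in zip(durations, observed) if seen})
    survival = 1.0
    curve = []
    for t in event_times:
        # Everyone still on study at time t, censored or not, is at risk.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, seen in zip(durations, observed) if seen and d == t)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical pregnancy durations in days; False marks women who were
# still pregnant when the study ended (right-censored).
durations = [266, 270, 275, 280, 280, 285]
observed = [True, True, False, True, False, True]

km_curve = kaplan_meier(durations, observed)
km_past_280 = dict(km_curve)[280]

# Naive alternative: throw away the censored women entirely.
event_only = [d for d, seen in zip(durations, observed) if seen]
naive_past_280 = sum(1 for d in event_only if d > 280) / len(event_only)
```

<p>In this made-up example, the estimate of the proportion still pregnant after 280 days is about 0.44 when the censored women are included and 0.25 when they are discarded, so dropping them understates how long pregnancies last.</p>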
Left-censored data
<p>Now suppose you survey some women in your study at the 250-day mark, but they have already had their babies. You know they had their babies before 250 days, but don’t know <em>exactly </em>when. These are therefore <em>left-censored</em> observations, where the “failure” occurred before a particular time.</p>
<p style="margin-left: 40px;"><img alt="Left censored" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/7279d0487d0b3d08120e224456bafc2f/left_censored.png" style="width: 237px; height: 242px;" /></p>
Interval-censored data
<p>If we don’t know exactly when some babies were born but we know it was within some interval of time, these observations would be <em>interval-censored</em>. We know the “failure” occurred within some given time period. For example, we might survey expectant mothers every 7 days and then count the number who had a baby within that given week.</p>
<p style="margin-left: 40px;"><img alt="Interval censored" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/deb69487d6f3256172beefe22b4ecbf6/intervalcensored.png" style="width: 253px; height: 241px;" /></p>
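<p>One way to keep the three types of censoring straight is to record each observation as a pair of bounds on the unknown event time. The sketch below uses a hypothetical encoding in which <code>None</code> stands for an unknown bound; the layout your software expects may differ, so treat this as an illustration of the definitions rather than a data-entry guide.</p>

```python
def classify(start, end):
    """Classify an observation from the bounds on its event time.

    start: latest time the event was known NOT to have happened (or None)
    end:   earliest time the event was known to have happened (or None)
    """
    if start is None and end is None:
        raise ValueError("at least one bound is required")
    if start is None:
        return "left-censored"       # event happened some time before `end`
    if end is None:
        return "right-censored"      # event will happen some time after `start`
    if start == end:
        return "exact"               # event time is known
    return "interval-censored"       # event happened between the two bounds


# Hypothetical observations, in days of pregnancy.
observations = [
    (270, 270),    # birth recorded on day 270
    (280, None),   # still pregnant at day 280 when the study ended
    (None, 250),   # already had the baby when surveyed at day 250
    (266, 273),    # had the baby between two weekly surveys
]
labels = [classify(start, end) for start, end in observations]
```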
<p>Once you set up your data, running the analysis is easy with <a href="http://www.minitab.com/products/minitab/">Minitab Statistical Software</a>. For more information on how to run the analysis and interpret your results, see <a href="http://blog.minitab.com/blog/fun-with-statistics/what-i-learned-from-treating-childbirth-as-a-failure">this blog post</a>, which—coincidentally—is baby-related, too.</p>
Lean Six Sigma, Quality Improvement, Reliability Analysis, Six Sigma
Wed, 07 Dec 2016 14:03:00 +0000
http://blog.minitab.com/blog/michelle-paret/the-difference-between-right-left-and-interval-censored-data
Michelle Paret
Common Assumptions about Data Part 3: Stability and Measurement Systems
http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-3-stability-and-measurement-systems
<p><img alt="Cart before the horse" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/8230e7c2bc193a831158677a70eb0146/chile_road_sign_po_4.svg" style="width: 101px; height: 101px; float: right; margin: 10px 15px;" />In Parts <span><a href="http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-1-random-samples-and-statistical-independence">1</a></span> and <span><a href="http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-2-normality-and-equal-variance">2</a></span> of this blog series, I wrote about how statistical inference uses data from a sample of individuals to reach conclusions about the whole population. That’s a very powerful tool, but you must check your assumptions when you make statistical inferences. Violating any of these assumptions can result in false positives or false negatives, thus invalidating your results. </p>
<p>The common data assumptions are: random samples, independence, normality, equal variance, stability, and that your measurement system is accurate and precise. I addressed random samples and statistical independence in Part 1, and normality and equal variance in Part 2. Now let’s consider the assumptions of stability and measurement systems.</p>
What Is the Assumption of Stability?
<p>A stable process is one in which the inputs and conditions are consistent over time. When a process is stable, it is said to be “in control.” This means the sources of variation are consistent over time, and the process does not exhibit unpredictable variation. In contrast, if a process is unstable and changing over time, the sources of variation are inconsistent and unpredictable. As a result of the instability, you cannot be confident in your statistical test results.</p>
<p>Use one of the various types of <span><a href="http://blog.minitab.com/blog/understanding-statistics/what-control-chart-should-i-use">control charts</a></span> available in Minitab <a href="http://www.minitab.com/products/minitab/">Statistical Software</a> to assess the stability of your data set. The Assistant menu can walk you through the choices to select the appropriate control chart based on your data and subgroup size. You can get advice about collecting and using data by clicking the “more” link.</p>
<p><img alt="Choose a Control Chart" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/6ec77f5dbc070eb0c2070ce6bcf8144c/1_control_chart.png" style="border-width: 0px; border-style: solid; width: 474px; height: 338px; margin: 10px 15px;" /></p>
<p><img alt="I-MR Control Chart" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/3d69fc444cd5dd09a962a11e645a3a2e/2_control_chart.png" style="border-width: 0px; border-style: solid; width: 474px; height: 338px; margin: 10px 15px;" /></p>
<p>In addition to preparing the control chart, Minitab tests for out-of-control or non-random patterns based on the <a href="http://blog.minitab.com/blog/statistics-in-the-field/using-the-nelson-rules-for-control-charts-in-minitab">Nelson Rules</a> and provides an assessment in easy-to-read Summary and Stability reports. Depending on the control chart selected, the Report Card automatically checks your assumptions about stability, normality, amount of data, and correlation, and suggests alternative charts to further analyze your data.</p>
<p><img alt="Report Card" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/195741e519156b95ee5feee8b521041f/3_control_chart.jpg" style="border-width: 0px; border-style: solid; width: 464px; height: 348px; margin: 10px 15px;" /></p>
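<p>To give a feel for what pattern tests of this kind look for, here is a minimal Python sketch of two of the Nelson rules: a single point more than three standard deviations from the center line, and nine or more points in a row on the same side of the center line. The center line and sigma are supplied as assumed inputs, the data are invented, and the functions are illustrative only; the software applies more tests than these two and lets you configure them.</p>

```python
def rule_1(values, center, sigma):
    """Flag points more than 3 sigma from the center line (1-based numbers)."""
    return [i for i, v in enumerate(values, start=1)
            if abs(v - center) > 3 * sigma]


def rule_2(values, center, run_length=9):
    """Flag each point that completes a run of `run_length` or more
    consecutive points on the same side of the center line."""
    flagged = []
    run = 0
    previous_side = 0
    for i, v in enumerate(values, start=1):
        side = (v > center) - (v < center)   # +1 above, -1 below, 0 on the line
        if side != 0 and side == previous_side:
            run += 1
        else:
            run = 1 if side != 0 else 0
        previous_side = side
        if run >= run_length:
            flagged.append(i)
    return flagged


# Hypothetical process data with an assumed center line of 10 and sigma of 1.
values = [10.5, 9.5, 10.2, 10.4, 10.1, 10.3, 10.6, 10.2, 10.1, 10.4, 10.3, 14.0]
beyond_limits = rule_1(values, center=10.0, sigma=1.0)
long_runs = rule_2(values, center=10.0)
```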
What Is the Assumption for Measurement Systems?
<p>All the other assumptions I’ve described “assume” the data reflects reality. But does it?</p>
<p>The <span><a href="http://blog.minitab.com/blog/understanding-statistics/explaining-quality-statistics-so-my-boss-will-understand-measurement-systems-analysis-msa">measurement system</a> </span>is one potential source of variability when measuring a product or process. When a measurement system is poor, you lose the ability to truthfully “see” process performance. A poor measurement system leads to incorrect conclusions and flawed implementation. </p>
<p>Minitab can perform a Gage R&R test for both measurement and appraisal data, depending on your measurement system. You can use the Assistant in Minitab to help you select the most appropriate test based on the type of measurement system you have.</p>
<p><img alt="Choose a MSA" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/3ff089fcee9ab280c8e8d1da1c56d610/4_msa.png" style="border-width: 0px; border-style: solid; width: 474px; height: 345px; margin: 10px 15px;" /></p>
<p>There are two assumptions that should be satisfied when performing a Gage R&R for measurement data: </p>
<ol>
<li>The measurement device should be calibrated.</li>
<li>The parts to be measured should be selected from a stable process and cover approximately 80% of the possible operating range. </li>
</ol>
<p>When using a measurement device, make sure it is properly calibrated and check for linearity, bias, and stability over time. The device should produce accurate measurements, compared to a standard value, through the entire range of measurements and throughout the life of the device. Many companies have a metrology or calibration department responsible for calibrating and maintaining gauges. </p>
<p>Both these assumptions must be satisfied. If they are not, you cannot be sure that your data accurately reflect reality. And that means you’ll risk not understanding the sources of variation that influence your process outcomes. </p>
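<p>The bias and linearity checks mentioned above boil down to simple arithmetic, sketched here in Python. Bias is the average reading minus the known standard value, and linearity asks whether that bias changes across the measurement range. The readings and function names are hypothetical, and a formal study in statistical software would also report significance tests that this sketch leaves out.</p>

```python
from statistics import mean


def bias_by_reference(measurements):
    """Average reading minus the standard value, for each standard."""
    return {ref: mean(readings) - ref for ref, readings in measurements.items()}


def linearity_slope(bias):
    """Least-squares slope of bias against the reference value.

    A slope near zero suggests the bias is constant across the range.
    """
    refs = list(bias)
    ref_bar = mean(refs)
    bias_bar = mean(bias.values())
    sxx = sum((r - ref_bar) ** 2 for r in refs)
    sxy = sum((r - ref_bar) * (b - bias_bar) for r, b in bias.items())
    return sxy / sxx


# Hypothetical repeated readings of three reference standards.
measurements = {
    10.0: [10.1, 10.2, 10.0],
    20.0: [20.3, 20.2, 20.4],
    30.0: [30.5, 30.4, 30.6],
}
bias = bias_by_reference(measurements)
slope = linearity_slope(bias)
```

<p>In this invented example the bias grows from about 0.1 at the low end to about 0.5 at the high end, so the device reads progressively higher as the measured value increases.</p>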
The Real Reason You Need to Check the Assumptions
<p>Collecting and analyzing data requires a lot of time and effort on your part. After all the work you put into your analysis, you want to be able to reach correct conclusions. Some analyses are robust to departures from these assumptions, but take the safe route and check! You want to be confident you can tell whether observed differences between data samples are simply due to chance, or if the populations are indeed different! </p>
<p>It’s easy to put the cart before the horse and just plunge into the data collection and analysis, but it’s much wiser to take the time to understand which data assumptions apply to the statistical tests you will be using, and plan accordingly.</p>
<p>Thank you for reading my blog. I hope this information helps you with your data analysis mission!</p>
Data Analysis, Hypothesis Testing, Quality Improvement, Statistics
Mon, 05 Dec 2016 13:00:00 +0000
http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-3-stability-and-measurement-systems
Bonnie K. Stone
The Joy of Playing in Endless Backyards with Statistics
http://blog.minitab.com/blog/adventures-in-statistics/the-joy-of-playing-in-endless-backyards-with-statistics
<p>Dear Readers,</p>
<p><img alt="Jim Frost" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/1ae3640a9bb3396a48ee4478020340d5/avatar.png" style="width: 131px; height: 186px; float: right; margin: 10px 15px;" />As 2016 comes to a close, it’s time to reflect on the passage of time and changes. As I’m sure you’ve guessed, I love statistics and analyzing data! I also love talking and writing about it. In fact, I’ve been writing statistical blog posts for over five years, and it’s been an absolute blast. John Tukey, the renowned statistician, once said, “The best thing about being a statistician is that you get to play in everyone’s backyard.” I enthusiastically agree!</p>
<p>However, when I first started writing the blog, I wondered about being able to keep up a constant supply of fresh blog posts. And, when I first mentioned to some non-statistician friends that I’d be writing a statistical blog, I noticed a certain lack of enthusiasm. For instance, I heard a variety of comments like, “So, you’ll be writing things along the lines of 9 out of 10 dentists recommend . . .” Would readers even be interested in what I had to say about statistics?</p>
<p>It turns out that with a curious mind, statistical knowledge, data, and a powerful tool like <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">Minitab statistical software</a>, the possibilities are endless. You <em>can</em> play in a wide variety of fascinating backyards! </p>
<p>The most surprising statistic is that <a href="http://blog.minitab.com/blog/adventures-in-statistics" target="_blank">my blog posts</a> have received over 5.5 million views in the past year alone. Never in my wildest dreams did I imagine so many readers when I wrote <a href="http://blog.minitab.com/blog/adventures-in-statistics/three-measurement-system-analysis-questions-to-ask-before-you-take-a-single-measurement" target="_blank">my first post</a>! It’s a real testament to the growing importance of data analysis that so many people are interested in a blog dedicated to statistics. Thank you all for reading!</p>
Endless Backyards . . .
<p><img alt="Dolphin" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/f9c1d0c9fbd374b7272f5ee2ee2716c0/dolphin.jpg" style="width: 225px; height: 150px; float: right; margin: 10px 15px;" />Some of the topics I've written about are out of this world. I’ve assessed <a href="http://blog.minitab.com/blog/adventures-in-statistics/using-statistics-to-analyze-words" target="_blank">dolphin communications</a> and compared them to the search for extraterrestrial intelligence, and I’ve analyzed <a href="http://blog.minitab.com/blog/adventures-in-statistics/exoplanet-statistics-and-the-search-for-earth-twins" target="_blank">exoplanet data</a> in the search for the Earth’s twin! (As an aside, my analysis showed that my writing style is similar to dolphin communications. I'll take that as a compliment!)</p>
<p>For more Earthly subjects, I’ve studied the relationship between <a href="http://blog.minitab.com/blog/adventures-in-statistics/size-matters-metabolic-rate-and-longevity" target="_blank">mammal size and their metabolic rate and longevity</a>. I’ve analyzed raw research data to assess the <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-effective-are-flu-shots" target="_blank">effectiveness of flu shots</a> first hand. I’ve downloaded economic data to assess patterns in both the <a href="http://blog.minitab.com/blog/adventures-in-statistics/reassessing-gdp-growth-part-1" target="_blank">U.S. GDP</a> and <a href="http://blog.minitab.com/blog/adventures-in-statistics/us-job-growth-assessing-the-numbers-and-making-predictions" target="_blank">U.S. job growth</a>. For a Thanksgiving Day post, I analyzed world income data to answer the question of <a href="http://blog.minitab.com/blog/adventures-in-statistics/statistically-how-thankful-should-we-be-a-look-at-global-income-distributions-part-1" target="_blank">how thankful we should be statistically</a>. As for <a href="http://blog.minitab.com/blog/adventures-in-statistics/when-is-easter-for-the-next-2086-years" target="_blank">Easter</a>, I can tell you the date on which it falls in any of 2,517 years, along with which dates are the most and least common.</p>
<p><img alt="Mythbusters" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/7b3b8859da99d60dd3e9c7932faefba3/mythbusters.jpg" style="width: 225px; height: 149px; float: right; margin: 10px 15px;" />In the world of politics, I’ve used data to <a href="http://blog.minitab.com/blog/adventures-in-statistics/predicting-the-us-presidential-election-evaluating-two-models-part-one" target="_blank">predict the 2012 U.S. Presidential election</a>, <a href="http://blog.minitab.com/blog/adventures-in-statistics/statistical-analyses-of-the-house-freedom-caucus-and-the-search-for-a-new-speaker" target="_blank">analyzed the House Freedom Caucus and the search for the new Speaker of the House</a>, assessed the <a href="http://blog.minitab.com/blog/adventures-in-statistics/great-presidents-revisited-does-history-provide-a-different-perspective" target="_blank">factors that make a great President</a>, and even <a href="http://blog.minitab.com/blog/adventures-in-statistics/using-the-solution-desirability-matrix-to-help-mitt-romney-choose-the-vp-candidate" target="_blank">helped Mitt Romney pick a running mate</a>. Everyone talks about the weather, so of course I had to <a href="http://blog.minitab.com/blog/adventures-in-statistics/are-atlantas-winters-getting-colder-and-snowier" target="_blank">analyze that</a>. My family loves the Mythbusters, and it was fun applying statistical analyses to some of the myths that they tested (<a href="http://blog.minitab.com/blog/adventures-in-statistics/busting-the-mythbusters-are-yawns-contagious" target="_blank">here</a> and <a href="http://blog.minitab.com/blog/adventures-in-statistics/using-hypothesis-tests-to-bust-myths-about-the-battle-of-the-sexes" target="_blank">here</a>). That's my family and me meeting them in the picture to the right!</p>
<p>Some of my posts have even been a bit surreal. I took my turn at attempting to explain the statistical illusion of the <a href="http://blog.minitab.com/blog/adventures-in-statistics/the-monty-hall-problem-and-the-importance-of-checking-your-assumptions" target="_blank">infamous Monty Hall problem</a>. I’ve compared <a href="http://blog.minitab.com/blog/adventures-in-statistics/world-travel-bumpy-roads-and-adjusting-your-graph-scales" target="_blank">world travel to adjusting scales in graphs</a> (seriously). I wrote a true story about how <a href="http://blog.minitab.com/blog/adventures-in-statistics/lessons-in-quality-during-a-long-and-strange-journey-home" target="_blank">I drove a plane load of passengers 200 miles to their homes</a> in the context of <img alt="ghost hunting" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/51587c9ccc575874d23335f607e520a0/nightshot.jpg" style="width: 225px; height: 127px; float: right; margin: 10px 15px;" />quality improvement! For Halloween-themed posts, I showed how to go <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-be-a-ghost-hunter-with-a-statistical-mindset" target="_blank">ghost hunting with a statistical mindset</a> and how <a href="http://blog.minitab.com/blog/adventures-in-statistics/beware-of-phantom-degrees-of-freedom-that-haunt-your-regression-models" target="_blank">regression models can be haunted by phantom degrees of freedom</a>. I analyzed the <a href="http://blog.minitab.com/blog/adventures-in-statistics/using-data-analysis-to-assess-fatality-rates-in-star-trek-the-original-series" target="_blank">fatality rates in the original Star Trek TV series</a>. I explored how some people can <a href="http://blog.minitab.com/blog/adventures-in-statistics/the-odds-of-finding-a-four-leaf-clover-revisited-how-do-some-people-find-so-many" target="_blank">find so many four leaf clovers despite their rarity</a>. 
And I wondered whether <a href="http://blog.minitab.com/blog/adventures-in-statistics/can-a-statistician-say-that-age-is-just-a-number" target="_blank">a statistician can say that age is just a number</a>.</p>
<p>See, not a mention of those dentists...well, not until now. By this point, 9 out of 10 dentists are probably feeling neglected!</p>
Helping Others Perform Their Own Analyses
<p>I’ve also written many posts aimed at helping those who are learning and performing statistical analyses. I described <a href="http://blog.minitab.com/blog/adventures-in-statistics/working-at-the-edge-of-human-knowledge-part-one" target="_blank">why statistics is cool</a> based on my own personal experiences and how the whole <a href="http://blog.minitab.com/blog/adventures-in-statistics/why-statistics-is-important" target="_blank">field of statistics is growing in importance</a>. I showed how <a href="http://blog.minitab.com/blog/adventures-in-statistics/why-anecdotal-evidence-is-unreliable" target="_blank">anecdotal evidence is unreliable</a> and explained why it fails so badly. And, I took a look forward at how <a href="http://blog.minitab.com/blog/adventures-in-statistics/expanding-the-role-of-statistics-to-areas-traditionally-dominated-by-expert-judgment" target="_blank">statistical analyses are expanding into areas traditionally ruled by expert judgement</a>.</p>
<p>I zoomed in to cover the details about how to perform and interpret statistical analyses. Some might think that covering the nitty gritty of statistical best practices is boring. Yet, you’d be surprised by the lively discussions we’ve had. We’ve had heated debates and philosophical discussions about <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values" target="_blank">how to correctly interpret p-values</a> and what <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests:-significance-levels-alpha-and-p-values-in-statistics" target="_blank">statistical significance</a> does and does not tell you. This reached a fever pitch when a psychology journal actually <a href="http://blog.minitab.com/blog/adventures-in-statistics/banned-p-values-and-confidence-intervals-a-rebuttal-part-1" target="_blank">banned p-values</a>!</p>
<p><img alt="Regression residuals" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/58964ccf1cb00ead2ee1735ca54886d9/residual_illustration.gif" style="width: 221px; height: 149px; float: right; border-width: 0px; border-style: solid; margin: 10px 15px;" />We had our difficult questions and surprising topics to grapple with. <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-high-should-r-squared-be-in-regression-analysis" target="_blank">How high should R-squared be</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/choosing-between-a-nonparametric-test-and-a-parametric-test" target="_blank">Should I use a parametric or nonparametric analysis</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-interpret-a-regression-model-with-low-r-squared-and-low-p-values" target="_blank">How is it possible that a regression model can have significant variables but still have a low R-squared</a>? I even had the nerve to suggest that <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-how-to-interpret-s-the-standard-error-of-the-regression" target="_blank">R-squared is overrated</a>! And, I made the unusual case that control charts are also <a href="http://blog.minitab.com/blog/adventures-in-statistics/control-charts-not-just-for-statistical-process-control-spc-anymore" target="_blank">very important outside the realm of quality improvement</a>. Then, there is the whole frequentist versus Bayesian debate, but let’s not go there!</p>
<p>However, it’s true that not all topics about how to perform statistical analyses are riveting. I still love these topics. The world is becoming an increasingly data-driven place, and to produce trustworthy results, you must analyze your data correctly. After all, it’s surprisingly easy to <a href="http://blog.minitab.com/blog/adventures-in-statistics/applied-regression-analysis-how-to-present-and-use-the-results-to-avoid-costly-mistakes-part-1" target="_blank">make a costly mistake</a> if you don’t know what you’re doing.</p>
<p><img alt="F-distribution with probability" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/6303a2314437d8fcf2f72d9a56b1293a/f_distribution_probability.png" style="width: 250px; height: 167px; float: right; margin: 10px 15px;" />A data-driven world requires an analyst to understand seemingly esoteric details such as: the <a href="http://blog.minitab.com/blog/adventures-in-statistics/curve-fitting-with-linear-and-nonlinear-regression" target="_blank">different methods of fitting curves</a>, <a href="http://blog.minitab.com/blog/adventures-in-statistics/the-danger-of-overfitting-regression-models" target="_blank">the dangers of overfitting your model</a>, <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit" target="_blank">assessing goodness-of-fit</a>, <a href="http://blog.minitab.com/blog/adventures-in-statistics/why-you-need-to-check-your-residual-plots-for-regression-analysis" target="_blank">checking your residual plots</a>, and how to check for and correct <a href="http://blog.minitab.com/blog/adventures-in-statistics/what-are-the-effects-of-multicollinearity-and-when-can-i-ignore-them" target="_blank">multicollinearity</a> and <a href="http://blog.minitab.com/blog/adventures-in-statistics/curing-heteroscedasticity-with-weighted-regression-in-minitab-statistical-software" target="_blank">heteroscedasticity</a>. How do you <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-choose-the-best-regression-model" target="_blank">choose the best model</a>? Do you need to <a href="http://blog.minitab.com/blog/adventures-in-statistics/when-is-it-crucial-to-standardize-the-variables-in-a-regression-model" target="_blank">standardize your variables</a> before performing the analysis? 
Maybe you need a <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-tutorial-and-examples" target="_blank">regression tutorial</a>?</p>
<p>You may need to know <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-identify-the-distribution-of-your-data-using-minitab" target="_blank">how to identify the distribution of your data</a>. And just <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests:-why-we-need-to-use-hypothesis-tests-in-statistics" target="_blank">how do hypothesis tests work</a> anyway? <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-analysis-of-variance-anova-and-the-f-test" target="_blank">F-tests</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests-t-values-and-t-distributions" target="_blank">T-tests</a>? How do you <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-test-your-discrete-distribution" target="_blank">test discrete data</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/when-should-i-use-confidence-intervals-prediction-intervals-and-tolerance-intervals" target="_blank">Should you use a confidence interval, prediction interval, or a tolerance interval</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/use-random-assignment-in-experiments-to-combat-confounding-variables" target="_blank">How do you know when X causes a change in Y</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/confound-it-some-more-how-a-factor-that-wasnt-there-hampered-my-analysis" target="_blank">Is a confounding variable distorting your results</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/repeated-measures-designs-benefits-challenges-and-an-anova-example" target="_blank">What are the pros and cons of using a repeated measures design</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/did-welchs-anova-make-fishers-classic-one-way-anova-obsolete" target="_blank">Fisher’s or Welch’s ANOVA</a>? 
<a href="http://blog.minitab.com/blog/adventures-in-statistics/the-power-of-multivariate-anova-manova" target="_blank">ANOVA or MANOVA</a>? <a href="http://blog.minitab.com/blog/adventures-in-statistics/linear-or-nonlinear-regression-that-is-the-question" target="_blank">Linear or nonlinear regression?</a></p>
<p>These may not be “sexy” topics but they are the meat and potatoes of being able to draw sound conclusions from your data. And, based on numerous blog comments, they have been well received by many people. In fact, the most rewarding aspect of writing blog posts has been the interactions I've had with all of you. I've communicated with literally hundreds and hundreds of students learning statistics and practitioners performing statistics in the field. I’ve had the pleasure of learning how you use statistical analyses, understanding the difficulties you face, and helping you resolve those issues.</p>
<p>It's been an amazing journey and I hope that my blog posts have allowed you to see statistics through my eyes―as a key that can unlock discoveries that are trapped in your data. After all, that's the reason why I titled my blog <em>Adventures in Statistics</em>. Discovery is a bumpy road. There can be statistical challenges en route, but even those can be interesting, and perhaps even rewarding, to resolve. Sometimes it is the <a href="http://blog.minitab.com/blog/adventures-in-statistics/the-mysteries-of-variability-and-power" target="_blank">perplexing mystery in your data that prompts you to play detective and leads you to surprising new discoveries</a>!</p>
<p>To close out the old year, it's good to remember that change is constant. There are bound to be many new and exciting adventures in the New Year. I wish you all the best in your endeavors. </p>
<p>“We will open the book. Its pages are blank. We are going to put words on them ourselves. The book is called Opportunity and its first chapter is New Year's Day.” <em>― Edith Lovejoy Pierce </em></p>
<p>May you all find happiness in 2017! Onward and upward!</p>
<p>Jim</p>
Data Analysis, Statistics, Statistics Help, Stats
Wed, 30 Nov 2016 15:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics/the-joy-of-playing-in-endless-backyards-with-statistics
Jim Frost
A Six Sigma Master Black Belt in the Kitchen
http://blog.minitab.com/blog/statistics-in-the-field/a-six-sigma-master-black-belt-in-the-kitchen
<p><em>by Matt Barsalou, guest blogger</em></p>
<p>I know that Thanksgiving is always on the fourth Thursday in November, but somehow I failed to notice it was fast approaching until the Monday before Thanksgiving. This led to frantically sending a last-minute invitation, and a hunt for a turkey.</p>
<p>I live in Germany and this greatly complicated the matter. Not only is Thanksgiving not celebrated, but also actual turkeys are rather difficult to find.</p>
<p><img alt="turkey" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d365f1dd99c0b0d21406cb33b21e451c/turkey.jpg" style="width: 270px; height: 210px; margin: 10px 15px; float: right;" /></p>
<p>I looked at a large grocery store’s website and found 15 types of cat and dog food that <em>contain </em>turkey, but the only human food I could find was one jar of baby food.</p>
<p>Close, but not close enough. I wanted a whole turkey, not turkey puree.</p>
<p>The situation was even more complicated due to language: Germans have one word for a male turkey and a different word for a female turkey. I did not realize there was a difference, so I wound up only looking for a male turkey. My conversation with the store clerk would sound like this if it were translated into English, where there is only <em>one </em>word commonly used for turkey:</p>
<p style="margin-left: 40px;"><strong>Me:</strong> Do you carry turkey?</p>
<p style="margin-left: 40px;"><strong>Clerk:</strong> No. We only have turkey.</p>
<p style="margin-left: 40px;"><strong>Me:</strong> I don’t need turkey. I’m looking for turkey.</p>
<p style="margin-left: 40px;"><strong>Clerk: </strong>Sorry, we don’t carry turkey, but we have turkey if you want it.</p>
<p style="margin-left: 40px;"><strong>Me: </strong>No thank you. I need turkey, not turkey.</p>
<p>Eventually, I figured out what happened and returned to buy the biggest female turkey they had. It weighed 5 pounds.</p>
<p>This was not the first time I had cooked a turkey. My first attempt resulted in The Great Turkey Fireball of 1998. (Cooking tip: Don’t spray turkey juice onto the oven burner.) My second attempt resulted in a turkey that still had ice in it after five hours in the oven. (Life hack: The inside of a turkey is a good place to keep ice from melting.)</p>
<p>This year, to be safe, I contacted an old friend who explained how to properly cook a turkey, but I was told I would need to figure out the cooking time on my own. This was not a problem...or so I thought. I looked online and found turkey <a href="http://homecooking.about.com/od/foodinformation/fl/Turkey-Roasting-Times-Chart.htm">cooking times</a> for a stuffed turkey, but my turkey was too light to be included in the table.</p>
Graphing the Data
<p>I may not know much about cooking, but I do know statistics, so I decided to run a regression analysis to determine the correct cooking time for my bird. The weights and times were in a table for ranges so I selected the times that corresponded to the low and high weight ranges and entered the data into a Minitab worksheet as shown in Figure 1.</p>
<p style="margin-left: 40px;"><img alt="worksheet 1" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dbba52c9d91a81c9072dd2a84c9268fa/image001.png" style="text-align: -webkit-center; width: 362px; height: 302px;" /></p>
<p style="margin-left: 40px;"><em><strong>Figure 1: Worksheet with weight and times</strong></em></p>
<p>I like to look at my data before I analyze it so I created a scatterplot to see how time compares to weight. Go to <strong>Graph > Scatter Plot </strong>and select <em>Simple</em>. Enter Time as the Y variable and Weight as the X variable.</p>
<p>Visually, it looks as if there may be a relationship between weight and cooking time (see Fig. 2), so I then performed a regression analysis.</p>
<p style="margin-left: 40px;"><img alt="scatterplot of time vs. weight" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/99742492888a9fabca3f6a1aaa00bf52/image002.png" style="text-align: -webkit-center; width: 576px; height: 384px;" /></p>
<p style="margin-left: 40px;"><em><strong>Figure 2: Scatter plot of weight and times</strong></em></p>
Performing Regression Analysis
<p>Go to <strong>Stat > Regression > Regression > Fit Regression Model...</strong> and select Time for the response and Weight as the continuous predictor. Click on <em>Graphs </em>and select <em>Four in One</em>, then OK out of the dialog boxes.</p>
<p>The P-value is less than 0.05 and the r-squared (adjusted) is 97.04%, so it looks like I have a good model for time versus weight (see Fig. 3).</p>
<p style="margin-left: 40px;"><img alt="regression analysis of time versus weight" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d3e5093cbf0572a6525cb57ea4477d0a/image003.png" style="width: 632px; height: 546px;" /></p>
<p style="margin-left: 40px;"><em><strong>Figure 3: Session window for regression analysis for time versus weight</strong></em></p>
<p>The residual plots for time shown in Figure 4 include a normal probability plot with residuals that look like they are normally distributed. My data did not need to follow the normal distribution, but the residuals should. But something seemed odd to me when I looked at the other three plots. Suddenly, I was not so sure my model was as good as I thought it was.</p>
<p style="margin-left: 40px;"><img alt="Residual Plots for Time" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d9e22fa32ce8c4452e7b14bf501c3adc/image004.png" style="text-align: -webkit-center; width: 576px; height: 384px;" /></p>
<p style="margin-left: 40px;"><em><strong>Figure 4: Residual plots for time</strong></em></p>
Regression Analysis with the Assistant
<p>I then used the Minitab Assistant to perform another regression analysis. Since I was uncertain about my first model, I could use the reports generated by the Assistant to better assess my data and the resulting analysis.</p>
<p>Go to <strong>Assistant > Regression</strong> and select <em>Simple Regression</em>. Select Time for the Y column and Weight for the X column and select OK.</p>
<p>The first report provided by the Minitab Assistant is the summary report, shown in Figure 5. The report indicates a statistically significant relationship between time and weight using an alpha of 0.05. It also tells me that 99.8% of the variability in time can be explained by weight. This does not match my previous results and I can see why: I previously fit a straight-line model, and the Minitab Assistant identified a quadratic model for the data.</p>
<p>The regression equation is Y = 0.9281 + 0.3738X − 0.005902X<sup>2</sup>. For my 5-pound turkey:</p>
<p align="center">Time = 0.9281 + 0.3738(5) − 0.005902(5<sup>2</sup>) =</p>
<p align="center">0.9281 + 1.869 − 0.14755 =</p>
<p align="center">2.7971 − 0.14755 = 2.650 hours</p>
<p>That means the cooking time is about 2 hours and 39 minutes.</p>
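<p>The substitution above can be checked with a few lines of Python. This is a minimal sketch rather than Minitab output: the function names are illustrative, and the squared term is read as 0.005902 times the square of the weight.</p>

```python
def predicted_hours(weight_lb):
    """Evaluate the quadratic equation quoted in the post for a weight in pounds."""
    return 0.9281 + 0.3738 * weight_lb - 0.005902 * weight_lb ** 2


def as_hours_minutes(hours):
    """Convert decimal hours to (whole hours, minutes rounded to the nearest minute)."""
    total_minutes = round(hours * 60)
    return divmod(total_minutes, 60)


hours = predicted_hours(5)
h, m = as_hours_minutes(hours)
print(f"{hours:.3f} hours = {h} h {m} min")
```

<p>With the coefficients as quoted, the quadratic term for a 5-pound bird is 0.005902 × 25 = 0.14755, so the prediction works out to roughly 2.65 hours.</p>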
<p style="margin-left: 40px;"><img alt="regression for time vs. weight summary report" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/72f3557cc7a6508c104936076cd34dd1/image005.png" style="text-align: -webkit-center; width: 605px; height: 454px;" /></p>
<p style="margin-left: 40px;"><em><strong>Figure 5: Summary report for time versus weight</strong></em></p>
<p>Figure 6 depicts the model selection report, which includes a plot of the quadratic model and the r-squared (adjusted) for both the quadratic model and a linear model.</p>
<p style="margin-left: 40px;"><img alt="regression model selection report" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e3a4152754cadfd92ed92bc5c964b439/image006.png" style="text-align: -webkit-center; width: 605px; height: 454px;" /></p>
<p style="margin-left: 40px;"><em><strong>Figure 6: Model Selection report for time versus weight</strong></em></p>
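<p>To see why a quadratic model can earn a higher r-squared (adjusted) than a straight line, here is a rough sketch in plain Python. It is not how the Assistant works internally. The weights are hypothetical, not the values from my worksheet, and the times are generated from the quadratic equation quoted in the summary report, so the curvature is there by construction.</p>

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial coefficients [b0, b1, ...] via the normal equations."""
    k = degree + 1
    # Augmented matrix [X'X | X'y] for the polynomial terms 1, x, x^2, ...
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)]
         + [sum(y * x ** i for x, y in zip(xs, ys))]
         for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    coeffs = [0.0] * k
    for i in reversed(range(k)):
        known = sum(a[i][j] * coeffs[j] for j in range(i + 1, k))
        coeffs[i] = (a[i][k] - known) / a[i][i]
    return coeffs


def adjusted_r_squared(xs, ys, degree):
    """Adjusted R-squared for a polynomial fit of the given degree."""
    coeffs = fit_polynomial(xs, ys, degree)
    fitted = [sum(b * x ** i for i, b in enumerate(coeffs)) for x in xs]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    n, p = len(ys), degree  # p counts predictor terms, excluding the intercept
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical weights in pounds (an assumption, not the author's data).
weights = [8, 12, 14, 18, 20, 24]
# Times computed from the quoted equation, so the data are exactly quadratic.
times = [0.9281 + 0.3738 * w - 0.005902 * w ** 2 for w in weights]

linear = adjusted_r_squared(weights, times, 1)
quadratic = adjusted_r_squared(weights, times, 2)
print(f"linear: {linear:.4f}, quadratic: {quadratic:.4f}")
```

<p>Because the adjusted statistic penalizes each extra term, the quadratic model only comes out ahead when the added curvature explains enough of the remaining variation to pay for itself.</p>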
<p>The diagnostic report in Figure 7 is used to assess the residuals, and guidance on interpreting the report is provided on the right side.</p>
<p style="margin-left: 40px;"><img alt="regression for time vs weight diagnostic report" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/80fc423213aa7dff193dd510165b8109/image007.png" style="text-align: -webkit-center; width: 605px; height: 454px;" /></p>
<p style="margin-left: 40px;"><em><strong>Figure 7: Diagnostic report for time versus weight</strong></em></p>
<p>The prediction report in Figure 8 shows the prediction plot with the 95% prediction interval.</p>
<p align="center" style="margin-left: 40px;"><img alt="regression for time vs weight prediction report" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/688056af5b668e9e4bcb898d96ff9c53/image008.png" style="width: 605px; height: 454px;" /></p>
<p style="margin-left: 40px;"><em><strong>Figure 8: Prediction report for time versus weight</strong></em></p>
<p>The report card shown in Figure 9 helps us to assess the suitability of the data. Here, I saw a problem: my sample size was only six. Minitab still provided me with results, but it warned me that the estimate for the strength of the relationship may not be very precise due to the low number of values I used. Minitab recommended I use 40 or more values. My data did not include any unusual data points, but using fewer than 15 values means the P-value could be incorrect if the residuals were not normally distributed.</p>
<p style="margin-left: 40px;"><img alt="regression for time vs weight report card" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e36058ed576bd59fa8a99ab922851718/image009.png" style="text-align: -webkit-center; width: 605px; height: 454px;" /></p>
<p style="margin-left: 40px;"><em><strong>Figure 9: Report card for time versus weight</strong></em></p>
<p>It looks like my calculated cooking time may not be as accurate as I’d like it to be, but I don’t think it will be too far off since the relationship between weights and cooking time is so strong.</p>
<p>It is important to remember not to extrapolate beyond the data set when taking actions based on a regression model. My turkey weighs less than the lowest value used in the model, but I’m going to need to risk it. In such a situation, statistics alone will not provide us an answer on a platter (with stuffing and side items such as cranberry sauce and candied yams), but we can use the knowledge gained from the study to help us when making judgment calls based on expert knowledge or previous experience. I expect my turkey to be finished in around two and a half to three hours, but I plan to use a thermometer to help ensure I achieve the correct cooking time.</p>
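<p>A simple guard can flag when a prediction falls outside the range of the data used to fit a model. The sketch below is only an illustration: the function name is made up, and the weights are hypothetical rather than the values in my worksheet.</p>

```python
def is_extrapolation(x_new, x_observed):
    """Return True when x_new lies outside the range of the observed predictor values."""
    return not (min(x_observed) <= x_new <= max(x_observed))


# Hypothetical weights in pounds used to fit a model (an assumption, not the author's data).
fitted_weights = [8, 12, 14, 18, 20, 24]

print(is_extrapolation(5, fitted_weights))   # True: lighter than any fitted weight
print(is_extrapolation(16, fitted_weights))  # False: inside the fitted range
```

<p>A flag like this does not say how wrong the prediction is. It only signals that the model is being used where it has no data to back it up.</p>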
<p>But first, it looks like I am going to need to perform a <a href="http://blog.minitab.com/blog/statistics-and-quality-improvement/gummi-bear-measurement-systems-analysis-type-1-gage-study">Type 1 Gage Study</a> analysis, once I figure out how to use my kitchen thermometer.</p>
<p> </p>
<div>
<p><strong>About the Guest Blogger</strong></p>
<p><em><a href="https://www.linkedin.com/pub/matthew-barsalou/5b/539/198" target="_blank">Matthew Barsalou</a> is a statistical problem resolution Master Black Belt at <a href="http://www.3k-warner.de/" target="_blank">BorgWarner</a> Turbo Systems Engineering GmbH. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with emphasis in international business, and has a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is author of the books <a href="http://www.amazon.com/Root-Cause-Analysis-Step---Step/dp/148225879X/ref=sr_1_1?ie=UTF8&qid=1416937278&sr=8-1&keywords=Root+Cause+Analysis%3A+A+Step-By-Step+Guide+to+Using+the+Right+Tool+at+the+Right+Time" target="_blank">Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time</a>, <a href="http://asq.org/quality-press/display-item/index.html?item=H1472" target="_blank">Statistics for Six Sigma Black Belts</a> and <a href="http://asq.org/quality-press/display-item/index.html?item=H1473&xvl=76115763" target="_blank">The ASQ Pocket Guide to Statistics for Six Sigma Black Belts</a>.</em></p>
</div>
<div> </div>
Data Analysis, Fun Statistics, Statistics, Statistics in the News
Wed, 30 Nov 2016 13:00:00 +0000
http://blog.minitab.com/blog/statistics-in-the-field/a-six-sigma-master-black-belt-in-the-kitchen
Guest Blogger
Creating a Chart to Compare Month-to-Month Change
http://blog.minitab.com/blog/understanding-statistics/creating-a-chart-to-compare-month-to-month-change
<p><a href="http://www.linkedin.com/groups?gid=166220">Minitab's LinkedIn group</a> is a good place to ask questions and get input from people with experience analyzing data and doing statistics in a wide array of professions. For example, one member asked this question:</p>
<p style="margin-left: 40px;"><em>I am trying to create a chart that can monitor change by month. I have [last year's] data and want to compare it to [this year's] data...what chart should I use, and can I auto-update it? Thank you. </em></p>
<p>As usual when a question is asked, the Minitab user community responded with some great information and helpful suggestions. Participants frequently go above and beyond, answering not just the question being asked, but raising issues that the question implies. For instance, one of our regular commenters responded thus: </p>
<p style="margin-left: 40px;"><em><span class="comment-body" data-li-comment-text="">There are two ways to answer this inquiry...by showing you a solution to the specific question you asked or by applying statistical thinking arguments such as described by Donald Wheeler </span></em><span class="comment-body" data-li-comment-text="">et al</span><em><span class="comment-body" data-li-comment-text=""> and applying a solution that gives the most instructive interpretation to the data.</span></em></p>
<p>In this and subsequent posts, I'd like to take a closer look at the various suggestions group members made, because each has merits. First up: a simple individuals chart of differences, with some cool tricks for instant updating as new data becomes available. </p>
Individuals Chart of Differences
<p>An easy way to monitor change month-by-month is to use an individuals chart. Here's how to do it in Minitab Statistical Software, and if you'd like to play along, here's the <a href="//cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/File/98924ae77a8184f851ebce861747a5dc/ichart.MTW">data set</a> I'm using. If you don't already have Minitab, download the free <a href="http://www.minitab.com/en-us/products/minitab/free-trial/">30-day trial</a> version.</p>
<p>I need four columns in the data sheet: month name, this year's data, last year's data, and one for the difference between this year and last. I'm going to right-click on the Diff column, and then select <strong>Formulas > Assign Formula to Column...</strong>, which gives me the dialog box below. I'll complete it with a simple subtraction formula, but depending on your situation a different formula might be called for:</p>
<p style="margin-left: 40px;"> <img alt="assign formula to column" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/d2c0571a-acbd-48c7-84f4-222276c293fe/Image/ae9401a308db20a087039ad3667d2391/assign_formula_to_column.gif" style="width: 423px; height: 320px;" /></p>
<p><span class="comment-body" data-li-comment-text="">With this formula assigned, as I enter the data for this year and last year, the difference between them will be calculated on the fly. </span></p>
<p style="margin-left: 40px;"><img alt="data set" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/d2c0571a-acbd-48c7-84f4-222276c293fe/Image/937353b7902e55c01f1f3f7b77931aca/dataset.gif" style="width: 270px; height: 280px;" /></p>
<p><span class="comment-body" data-li-comment-text="">Now I can create an Individuals Chart, or I Chart, of the differences. I choose <strong>Stat > Control Charts > Variables Charts for Individuals > Individuals...</strong> and simply choose the Diff column as my variable. Minitab creates the following graph of the differences between last year's data and this year's data: </span></p>
<p style="margin-left: 40px;"><span class="comment-body" data-li-comment-text=""><img alt="Individuals Chart" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/d2c0571a-acbd-48c7-84f4-222276c293fe/Image/4cfe765b1b83fd17ddd3191ae64d9df9/i_chart.gif" style="width: 576px; height: 384px;" /></span><br />
</p>
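<p>For readers curious about what sits behind the chart, here is a rough Python sketch of the usual textbook calculation for an individuals chart: the center line is the mean, and the control limits are the mean plus or minus 2.66 times the average moving range. The monthly figures are hypothetical, the function names are illustrative, and Minitab's own estimate may differ in detail.</p>

```python
def individuals_chart_limits(values):
    """Return (center, lcl, ucl) for an individuals chart.

    Uses the average moving range of length 2 and the constant 2.66,
    which is 3 / d2 with d2 = 1.128 for ranges of two observations.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return center, center - 2.66 * mr_bar, center + 2.66 * mr_bar


def out_of_control(values):
    """Return the indexes of points that fall outside the control limits."""
    _, lcl, ucl = individuals_chart_limits(values)
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


# Hypothetical monthly figures, January through September (not the data set linked above).
this_year = [52, 48, 55, 50, 49, 53, 51, 47, 54]
last_year = [50, 49, 51, 50, 48, 50, 52, 49, 51]

# Same idea as the Diff column formula: this year minus last year.
diff = [t - l for t, l in zip(this_year, last_year)]

center, lcl, ucl = individuals_chart_limits(diff)
print(f"center={center:.4f}, LCL={lcl:.4f}, UCL={ucl:.4f}")
print("points outside limits:", out_of_control(diff))
```

<p>This sketch only checks for points beyond the control limits. Minitab offers additional tests for special causes that are not reproduced here.</p>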
Updating the Individuals Chart Automatically
<p>Now, you'll notice that when I started, I only had this year's data through September. What happens when I need to update it for the whole year? Easy - I can return to the data sheet in January to add in the data from the last quarter. As I do, my Diff column uses its assigned formula (indicated by the little green cross in the column header) to calculate the differences: </p>
<p style="margin-left: 40px;"><img alt="auto-updated worksheet" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ec88080d9335cb6ff7c750d52fd70334/diff.jpg" style="width: 270px; height: 300px;" /></p>
<p><br />
<span class="comment-body" data-li-comment-text="">Now if I look at the I-chart I created earlier, I see a big yellow dot in the top-left corner.</span></p>
<p style="margin-left: 40px;"><span class="comment-body" data-li-comment-text=""><img alt="automatic update for an individuals chart" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/d2c0571a-acbd-48c7-84f4-222276c293fe/Image/bca558faf76f317dd0eeae7342f23354/auto_update.gif" style="width: 230px; height: 92px;" /></span></p>
<p>When I right-click on that yellow dot and choose "Automatic Updates," as shown in the image above, Minitab <a href="http://blog.minitab.com/blog/the-statistics-of-science/minitab-and-excel-making-the-data-connection">automatically updates</a> my Individuals chart with the information from the final three months of the year: </p>
<p style="margin-left: 40px;"><img alt="automatically updated i chart" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/d2c0571a-acbd-48c7-84f4-222276c293fe/Image/4582cd3d3bc31b137cfbcf6f2d6872f3/i_chart2.gif" style="width: 576px; height: 384px;" /></p>
<p>Whoa! It looks like we might have some special-cause variation happening in that last month of the year...but at least I can use the time I've saved by automatically updating this chart to start investigating that! </p>
<p>In my next post, we'll try another way to look at monthly differences, again following the suggestions offered by the good people on Minitab's LinkedIn group. </p>
<p> </p>
Data Analysis, Statistics
Mon, 28 Nov 2016 12:59:00 +0000
http://blog.minitab.com/blog/understanding-statistics/creating-a-chart-to-compare-month-to-month-change
Eston Martz
Giving Thanks for the Minitab Assistant
http://blog.minitab.com/blog/real-world-quality-improvement/giving-thanks-for-the-minitab-assistant
<p><img alt="mashed potatoes with gravy" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/85c4796d7e0b9a5fa6a8711db65a1f8c/potatoes.jpg" style="width: 236px; height: 293px; float: right; border-width: 1px; border-style: solid; margin: 10px 15px;" />This week we’re celebrating the annual Thanksgiving holiday in the United States, which is not only a good time to reflect on the things we’re grateful for, but it’s also a good time to stuff yourself with turkey, mashed potatoes, green bean casserole, and the usual suspects that find their way to the Thanksgiving table!</p>
<p>While I’m of course very thankful for my family, friends, home, etc., I’m also thankful for some features in Minitab that have made my life easier (especially as someone who is not a trained statistician or professional data analyst!). I look at these features like the yummy homemade gravy on top of my mashed potatoes—while the gravy isn’t really a necessity, it sure makes those potatoes taste better! The <a href="https://www.minitab.com/products/minitab/assistant/" target="_blank">Assistant menu</a> is one Minitab feature that makes statistics “taste” better to me because it makes many concepts clearer and my results understandable.</p>
<p>But what <em>is</em> the Assistant? It’s a built-in menu within Minitab that contains interactive decision trees to help you choose the right tool, and then walks you through your analysis step-by-step. It includes guidelines to ensure your analysis is successful, has a simplified interface that is easy to follow, and provides comprehensive reports and interpretation of your output that you can use to present and share your results.</p>
<p>You can find the Assistant menu to the right of the Help menu:</p>
<p><img alt="Assistant menu" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f36168fa47a59a282572aff95a0aabb6/menubar.jpg" style="width: 562px; height: 57px;" /></p>
<p>You’ll see that it includes options for Measurement Systems Analysis, Capability Analysis, Graphical Analysis, Hypothesis Tests, Regression, DOE, Before/After Capability Analysis, Before/After Control Charts, and Control Charts:</p>
<p><img alt="http://www.minitab.com/uploadedImages/Content/Products/Minitab_Statistical_Software/Quick_Start/QS1-1-AsstGraphAnalysis.jpg" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/2271d1330d2ed7b4191773ff9cb7b49a/assistant2.jpg" style="width: 315px; height: 189px;" /></p>
<p>For example, if you choose Graphical Analysis, you’ll see the following screen with graphing options for your particular objective:</p>
<p><img alt="Chooser" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/be40e91d387da859bb592eb1d2f3e9c0/assis_3.jpg" style="width: 370px; height: 292px;" /> </p>
<p>And if you’re left not knowing which graphing option is right for your objective, the “Help me choose” link will take you to a flow chart with decision trees to lead you in the right direction:</p>
<p><img alt="Assistant Chooser" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/55d77c42759e2ab66e01da9142b828b4/assis4.jpg" style="width: 369px; height: 290px;" /> </p>
<p>Pretty cool, right? And once you’ve selected the chart that’s right for you, you’ll have access to guidelines for collecting your data and using your chart.</p>
<p>For more on how the Assistant works, I recommend checking out our Quick Start exercises, which walk you through some examples using the Assistant:</p>
<p><a href="http://www.minitab.com/products/minitab/quick-start/titanic/" target="_blank">Would You Survive a Voyage on the Titanic?</a></p>
<p><a href="http://www.minitab.com/products/minitab/quick-start/soup/" target="_blank">Is the Soup Too Spicy?</a></p>
<p><a href="http://www.minitab.com/products/minitab/quick-start/getting-to-work/" target="_blank">Can You Bike to Work on Time?</a></p>
Help That’s Really Helpful
<p>One other Minitab feature I’m very thankful for is the built-in Help content, which includes concise overviews of major statistical topics, guidance for setting up your data, information on methods and formulas, comprehensive guidance for completing dialog boxes, and easy-to-follow examples.</p>
<p>And that’s not all—Minitab’s built-in help options also include:</p>
<ul>
<li><strong>StatGuide:</strong> After you analyze your data, the StatGuide helps you interpret statistical graphs and tables in a practical, straightforward way. To access the StatGuide, just right-click on your output, press Shift+F1 on the keyboard, or click the StatGuide icon in the toolbar:</li>
</ul>
<p><img alt="stat guide" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/70cf62eeabb6f8878a0fef59c1666b0f/menubar2.jpg" style="border-width: 0px; border-style: solid; width: 543px; height: 53px;" /></p>
<ul>
<li><strong>Tutorials:</strong> For a refresher on statistical tasks, take a look at tutorials (<strong>Help</strong> > <strong>Tutorials</strong>), which include an overview of data requirements, step-by-step instructions, and guidance on interpreting the results.</li>
</ul>
<p>Like including some extra gravy to liven up those Thanksgiving mashed potatoes, the Assistant and built-in Help options in Minitab make statistics taste better to me! Happy Thanksgiving! </p>
Statistics, Statistics Help
Mon, 21 Nov 2016 13:00:00 +0000
http://blog.minitab.com/blog/real-world-quality-improvement/giving-thanks-for-the-minitab-assistant
Carly Barry
Mutant Trees Lay Waste to the Landscape and Reveal Mother Nature's Lean Design
http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/mutant-trees-lay-waste-to-the-landscape-and-reveal-mother-natures-lean-design
<p>The season of change is upon us here at Minitab's World Headquarters. The air is crisp and clear and the landscape is ablaze in vibrant fall colors. As I drove to work one recent morning, I couldn't help but soak in the beauty surrounding me and think, "Too bad everything they taught me as a kid was a lie."</p>
<p><img alt="fall trees" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c2cb2bd427165df25e0ca2b38ef59381/trees.jpg" style="width: 208px; height: 182px; margin: 10px 15px; float: right;" />You see, as a boy growing up in New Hampshire, I was told that the sublime beauty of autumn was just a happy accident. As the days become shorter, the trees succumb to their own version of seasonal affective disorder; they stop producing chlorophyll because... well, what's the point? As a result of this photosynthetic funk, the green begins to drain from the leaves and the less pragmatic pigments prevail, if briefly.</p>
<p>But thanks to mutant trees, I now know the truth. Or at least one possible explanation. I refer, of course, to the findings of Hoch, Singsaas, and McCown, in their 2003 paper, "<a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC281624/" target="_blank">Resorption Protection. Anthocyanins Facilitate Nutrient Recovery in Autumn by Shielding Leaves from Potentially Damaging Light Levels.</a>"</p>
<p>In truth, I shouldn't say that what I learned as a kid was a <em>lie</em>. The theory of autumn by chromatic attrition might still be true to some extent. But I was intrigued to discover recently that newer theories posit a more adaptive role for the annual display. For example, one theory suggests that the bright displays evolved to inform potentially injurious insects that they are barking up the wrong tree. (For more information, see Archetti and Brown 2004, "<a href="http://harvardforest.fas.harvard.edu/sites/harvardforest.fas.harvard.edu/files/leaves/Archetti_%20Brown_2004.pdf" target="_blank">The coevolution theory of autumn colours</a>".)</p>
<p>But most interesting to me was the discovery that red pigments aren't just late-season hold-outs—production of these pigments is actually ramped up in the fall. Obviously, the "Accidental Autumn" explanation doesn't hold in this case. In their paper, Hoch and colleagues present evidence that anthocyanins, which produce red fall colors, actually help trees prepare for winter.</p>
<p>Here's where the mutants come in. The theory is that the anthocyanins act as a kind of sunblock to protect the leaves while the tree recovers valuable nutrients from the leaves before sending them downward and duffward.</p>
<p>To test this theory, the scientists sampled leaves from normal (wild) trees and from mutant trees that possessed superhuman powers. Well, actually, all trees possess superhuman powers because all trees can produce food from sunlight. (I've yet to meet a human who can do that.) But in this case, affected trees had a mutation that prevented them from producing anthocyanins and turning red in the fall. </p>
<p>It's always easier to understand what your data are showing you when you can look at the results of your analysis in a graph. I used <a href="http://www.minitab.com/products/minitab">Minitab Statistical Software</a> to create a couple of graphs that illustrate some of the results shared in the paper. </p>
Before and after nitrogen levels
<p>The scientists measured the nitrogen levels in the leaves before and after the period when the trees normally recover as much of that valuable nutrient as they can. This graph shows the before and after nitrogen levels for mutant and wild-type specimens of 3 different tree species. The graph shows that the nitrogen levels in the leaves tend to drop more for the wild trees, indicating that they are more successful at recovering the nitrogen than the mutant trees. </p>
<p style="margin-left: 40px;"><img alt="Line plot of before and after nitrogen levels" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/d0620bb01ef55623402cd4b603e3f861/lineplotbeforeafter.jpg" style="width: 459px; height: 306px;" /></p>
Resorption efficiency
<p>This <span><a href="http://blog.minitab.com/blog/quality-data-analysis-and-statistics/bar-charts-decoded">bar chart</a></span> shows the same data, but expressed as "Resorption Efficiency," which is just the percent change between the before and after nitrogen levels. The graph suggests that the lack of anthocyanins hampered the ability of the mutant trees to recover the nitrogen from their leaves. </p>
<p style="margin-left: 40px;"><img alt="Bar chart of resorption efficiency" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/f42ddf69c4e21e9804b053dabef3623c/barchartresorptionefficiency.jpg" style="width: 459px; height: 306px;" /></p>
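<p>To make the calculation concrete, here is a minimal Python sketch of resorption efficiency as the percent change between the before and after nitrogen levels. The function name and the nitrogen values are hypothetical and chosen only for illustration; they are not taken from the paper.</p>

```python
def resorption_efficiency(before: float, after: float) -> float:
    """Percent of leaf nitrogen recovered between the two measurements."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Hypothetical nitrogen levels (arbitrary units), NOT values from the paper.
wild = resorption_efficiency(before=2.0, after=0.8)
mutant = resorption_efficiency(before=2.0, after=1.4)
print(f"wild: {wild:.1f}%  mutant: {mutant:.1f}%")
```

<p>With these made-up numbers, the wild-type leaf gives up a larger share of its nitrogen than the mutant leaf, which is the pattern the bar chart shows.</p>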
<p>So, rather than simply accepting seasonal spikes in scrap waste, it appears that mother nature is a much better quality engineer than we had given her credit for. In addition to dazzling us with some beautiful color before winter sets in, those brilliant reds are actually adding value to the process by helping to reduce waste.</p>
<p>My newfound appreciation for nature's lean genius inspired me to do a little exploring around Minitab's World Headquarters and capture some images of industrious anthocyanins hard at work improving plant profitability. Along with some cows. If you've never had the opportunity to see trees do this—and even if you have—perhaps you'll enjoy the images shared below. </p>
<p>Happy Autumn! </p>
<p><em>Corn rows weave under undulating clouds</em><br />
<img alt="Harvest has come" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/bbd7dc445005ffda4874c3ae424ab730/maze_2.jpg" style="width: 500px; height: 378px;" /></p>
<p><em>Rusty barns rest after the harvest</em><br />
<img alt="Barn" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/292dfb266a78f7fae35a28648b6b33a4/barn__enhanced.jpg" style="width: 500px; height: 262px;" /></p>
<p><em>Rustling stalks spread from road to ridge</em><br />
<img alt="Ridge and meadow" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/f1f734dd3153923e80039eab70dba9e9/ridge_and_meadow.jpg" style="font-size: 13px; width: 500px; height: 269px;" /></p>
<p><em>Heifers forage contentedly under a calm fall sky</em><br />
<img alt="Cows" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/46f84b64ecc05fb461d3c9fe7c67d5c5/cows.jpg" style="font-size: 13px; width: 500px; height: 545px;" /></p>
<p><em>Autumn finery frames the fabled Beaver Stadium </em><br />
<img alt="Fabled Beaver Stadium" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/5a6bde627d438a786ca7edaf37f2ca27/stadium_framed_by_field_and_tree.jpg" style="width: 500px; height: 555px;" /></p>
<p><em>Scenic splendor surrounds majestic Mount Nittany </em><br />
<img alt="Mount Nittany" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/4abbb7f7b6a09d5e1a3e92b9205511ee/flaming_frame__bright.jpg" style="width: 500px; height: 258px;" /></p>
<p><em>Wary hawk takes wing amid wild autumn hues</em><br />
<img alt="Hawk on the wing" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/a1fd4a9278394d7917dd4928c6b09c13/soar_2.jpg" style="width: 500px; height: 528px;" /></p>
<p><em>Opportunistic apparitions hang around to haunt passers by</em><br />
<img alt="Ghosts" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/3c8869125bd9b7f17deef5a9506018c0/ghosts2.jpg" style="width: 500px; height: 209px;" /></p>
<p><em>Minitab World Headquarters looms large on the landscape</em><br />
<img alt="Minitab World Headquarters" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/a7e45f704289b992aa7f69eb39a92a8d/peeper_tab.jpg" style="width: 500px; height: 293px;" /></p>
Fun Statistics, Statistics, Statistics in the News
Fri, 18 Nov 2016 13:00:00 +0000
http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/mutant-trees-lay-waste-to-the-landscape-and-reveal-mother-natures-lean-design
Greg Fox
Correct Case Mismatches in Minitab, Fast
http://blog.minitab.com/blog/statistics-and-quality-improvement/correct-case-mismatches-in-minitab-fast
<p>Data entry errors often occur in data sets so large that finding and correcting the errors by hand is impractical. Fortunately, Minitab includes tools that make it easy to get your data into shape, so that you can proceed to getting the answers you need.</p>
<img alt="tropical forest" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1224f3419c2de117d3ff61292117dba8/forest.jpg" style="width: 250px; height: 198px; margin: 10px 15px; float: right;" />
<p>Let’s say, for example, that you were going to look at the <a href="http://datadryad.org/resource/doi:10.5061/dryad.234">Global Wood Density Database</a>. It’s an exciting piece of work if you’re into wood density. Chave et al. called it “the largest compilation of wood density data to date, encompassing 8412 taxa, 1683 genera, 191 families” (2009). Kindly, it’s provided at datadryad.org as an Excel file.</p>
<p>As it turns out, there’s a minor error in the Region column (at least as of this writing). You’d probably hardly notice it, but there’s a case mismatch. A total of 4,182 rows are given the region South America (tropical) while 9 rows of the dataset are given the region South America (Tropical). This is the kind of thing that can cause problems in your analysis. If you suspect such an error exists, or just want to verify that it <em>doesn't</em>, it would be a real chore to pore through 4,191 rows of data in search of mismatches.</p>
<p>Fortunately, you could find them by doing a quick tally in Minitab.</p>
Find It
<ol>
<li>Choose <strong>Stat > Tables > Tally Individual Variables</strong>.</li>
<li>In <strong>Variables</strong>, enter <em>Region</em>. Click <strong>OK</strong>.</li>
</ol>
<p>In the output table, you can spot the case mismatch at the bottom.</p>
<p style="margin-left: 40px;"><img alt="9 instances of South America (Tropical) use an upper-case t, while the other 4182 use a lower-case t." src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/a87305c674b503a48151ce56930cafed/casemismatch.jpg" style="width: 304px; height: 348px;" /></p>
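<p>Outside Minitab, the same idea can be sketched in a few lines of Python: tally every spelling, then flag values that differ only by capitalization. This is an illustrative sketch, not Minitab's implementation. The function name is hypothetical, and the toy column reuses the two counts from the tally plus a few made-up rows.</p>

```python
from collections import Counter, defaultdict


def find_case_mismatches(values):
    """Return {lowercased value: Counter of spellings} for values that
    appear with more than one capitalization."""
    spellings = defaultdict(Counter)
    for value in values:
        spellings[value.lower()][value] += 1
    return {key: counts for key, counts in spellings.items() if len(counts) > 1}


# Toy stand-in for the Region column, using the counts from the tally.
region = (
    ["South America (tropical)"] * 4182
    + ["South America (Tropical)"] * 9
    + ["Europe"] * 5  # made-up rows, included only so one value has no mismatch
)
mismatches = find_case_mismatches(region)
for key, counts in mismatches.items():
    print(key, dict(counts))
```

<p>Only the South America entry is reported, because it is the only value that appears with two different capitalizations.</p>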
Fix It
<p>Fixing case mismatches is extremely easy in Minitab. Try this:</p>
<ol>
<li>Choose <strong>Data > Recode > To Text</strong>.</li>
<li>In <strong>Recode values in the following columns</strong>, enter <em>Region</em>.</li>
<li>In <strong>Method</strong>, select <strong>Recode individual values</strong>.</li>
<li>In the table that appears, scroll down to find the case mismatch. Then, in the <strong>Recoded value</strong> column, change <em>South America (Tropical)</em> so that it uses a lower-case t.</li>
<li>In <strong>Storage location for the recoded columns</strong>, select <strong>In the original columns</strong>. Click <strong>OK</strong>.</li>
</ol>
<p>The summary shows you the 9 instances that were changed.</p>
<p style="margin-left: 40px;"><img alt="The summary shows the 9 rows with the upper-case t were corrected." src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/a6e80066c3ccd3ffdfa3296803be7693/recodesummary.jpg" style="width: 445px; height: 203px;" /></p>
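<p>For readers who want to see the logic of a recode spelled out, here is a hedged Python sketch. It assumes that the most frequent spelling should win, which matches the manual choice made here (lower-case t) but is my assumption rather than a documented Minitab rule. The function name is hypothetical.</p>

```python
from collections import Counter


def correct_case_mismatches(values):
    """Recode each value to the most frequent spelling among its case variants."""
    counts = Counter(values)
    canonical = {}
    for spelling, n in counts.items():
        key = spelling.lower()
        # Keep whichever spelling of this value has been seen most often.
        if key not in canonical or n > counts[canonical[key]]:
            canonical[key] = spelling
    return [canonical[v.lower()] for v in values]


# Toy column built from the counts reported in the tally.
region = ["South America (tropical)"] * 4182 + ["South America (Tropical)"] * 9
fixed = correct_case_mismatches(region)
changed = sum(old != new for old, new in zip(region, fixed))
print(changed, "rows recoded")
```

<p>On this toy column the sketch recodes the 9 upper-case rows and leaves the other 4,182 untouched, mirroring the summary.</p>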
Fix It Before It’s Even a Problem
<p>If you’re opening an Excel file, Minitab can fix case mismatches before you even know that they’re a problem. When you open a saved copy of the Global Wood Density Database in Minitab, you’re presented with options for opening an Excel file. Try this:</p>
<ol>
<li>Choose <strong>File > Open</strong> and select the Excel file from your file system.</li>
<li>Click the tab titled <strong>Data</strong>, the name of the sheet with the data in the original Excel file.</li>
<li>Select <strong>Data has column names</strong>.</li>
<li>Click <strong>Options</strong>.</li>
<li>In <strong>Text columns</strong>, select <strong>Correct case mismatches</strong>. Click <strong>OK</strong> twice.</li>
</ol>
<p>If you tally the Region column now, the correction to the column is already done.</p>
<p style="margin-left: 40px;"><img alt="All 4191 instances of South America (tropical) use a lower-case t." src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/9c2cecb06a411111ea72c638375d867b/casematch.jpg" style="width: 298px; height: 326px;" /></p>
Wrap-up
<p>To get the answers you need from your data, the data themselves have to be clean enough to analyze. Minitab provides a number of tools you can use to get your data ready faster, so that you can get on to the insights. Ready for more? Check out <a href="http://blog.minitab.com/blog/michelle-paret/3-tips-for-importing-excel-data-into-minitab">3 Tips for Importing Excel Data into Minitab</a>.</p>
References
<p>Chave J, Coomes DA, Jansen S, Lewis SL, Swenson NG, Zanne AE (2009). Towards a worldwide wood economics spectrum. Ecology Letters 12(4): 351-366. <a href="http://dx.doi.org/10.1111/j.1461-0248.2009.01285.x">http://dx.doi.org/10.1111/j.1461-0248.2009.01285.x</a></p>
<p>Zanne AE, Lopez-Gonzalez G, Coomes DA, Ilic J, Jansen S, Lewis SL, Miller RB, Swenson NG, Wiemann MC, Chave J (2009). Data from: Towards a worldwide wood economics spectrum. Dryad Digital Repository. <a href="http://dx.doi.org/10.5061/dryad.234">http://dx.doi.org/10.5061/dryad.234</a></p>
Data Analysis, Statistics, Statistics Help
Wed, 16 Nov 2016 13:00:00 +0000
http://blog.minitab.com/blog/statistics-and-quality-improvement/correct-case-mismatches-in-minitab-fast
Cody Steele
How to Avoid Messing Up Your Pareto Charts
http://blog.minitab.com/blog/understanding-statistics/how-to-avoid-messing-up-your-pareto-charts
<p>Pareto charts are a special type of bar chart you can use to prioritize almost anything. This makes them very useful in making sound decisions. For example, if you have several possible quality improvement projects, but not enough time or people to do them all now, you can use a Pareto chart to identify which projects have the most potential for making meaningful improvement.</p>
<p>Pareto charts look somewhat similar to regular bar charts. In their simplest form, you collect counts (for different categories of defects, for example). Then each bar is ordered according to size or frequency, so you can determine which categories comprise the "vital few" that you should care about, and which are the "trivial many" and therefore less worthy of your attention.</p>
<p>In the example below, taken from a <a href="http://blog.minitab.com/blog/understanding-statistics/a-six-sigma-healthcare-project-part-1-examining-factors-with-a-pareto-chart">Six Sigma healthcare project</a>, you can see why Pareto charts are great for seeing where the largest gains might be made as you focus your improvement efforts.</p>
<p style="margin-left: 40px;"><img alt="Pareto chart example" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/4418104d45e4556ff52270cc56aec5c8/pareto_chart_of_reason.png" style="width: 576px; height: 384px;" /></p>
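<p>The arithmetic behind a Pareto chart is simple enough to sketch in Python: sort the categories by count, largest first, and accumulate a running percentage. The defect names and counts below are hypothetical, and they are not the data behind the healthcare example.</p>

```python
def pareto_table(counts):
    """Sort categories by count (largest first) and add cumulative percent."""
    total = sum(counts.values())
    rows, running = [], 0
    for category, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((category, count, 100.0 * running / total))
    return rows


# Hypothetical defect counts, NOT the data behind the chart in this post.
defects = {"Chip": 30, "Dent": 10, "Scratch": 45, "Blister": 15}
rows = pareto_table(defects)
for category, count, cum_pct in rows:
    print(f"{category:<8} {count:>3} {cum_pct:6.1f}%")
```

<p>In this made-up example the first two categories account for 75% of all defects, which is the kind of "vital few" signal a Pareto chart is meant to surface.</p>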
<p style="line-height: 20.8px;">Pareto charts are easy to understand and use. But, like any statistical tool, there are some things you need to keep in mind when you create and interpret the Pareto chart. Here are the top concerns to watch for so that you can get the most benefit from these simple but powerful tools. </p>
<ul>
<li style="line-height: 20.8px;">If you only collect data from a brief period of time, you may reach incorrect conclusions. This is particularly true if <a href="http://blog.minitab.com/blog/understanding-statistics/how-create-and-read-an-i-mr-control-chart">your process is unstable</a>. When the process is not in control, the causes may be unstable and the vital few problems may change from week to week. So collecting data for a single day may not truly reflect your whole process. If your data are not reliable, or aren't truly representative of your population, your Pareto chart will give you a distorted picture of the distribution of defects and causes.<br />
</li>
<li style="line-height: 20.8px;">On the other hand, you don't want to collect data over too long a time period, either. Data collected during long periods of time may include changes that affect counts or frequencies, but shouldn't be included as causes. Check the data for stratification, or changes in the distribution of frequencies or counts over time.<br />
</li>
<li style="line-height: 20.8px;">Select the categories you will measure carefully. If your initial Pareto analysis does not yield useful results, make sure that your categories are meaningful and that your "other" category is not too large.<br />
</li>
<li style="line-height: 20.8px;"><a href="https://it.minitab.com/products/quality-trainer/glossary/glossarycontent/files/Pareto_chart_def.htm">Weighted Pareto charts</a> can be particularly useful in many situations. But weighting criteria need to be selected with care. For example, if the expense of certain defects is higher than others, cost may be a more meaningful basis for prioritization than number of occurrences.<br />
</li>
<li style="line-height: 20.8px;">Be clear about your ultimate goal, and choose your focus appropriately. Focusing on the problems that happen most often should cut the amount of rework that needs to happen. But focusing on problems with the highest cost should maximize an improvement project's financial benefits.<br />
</li>
<li style="line-height: 20.8px;">Use common sense. The purpose of conducting a Pareto analysis is to identify where you might get the most "bang for your buck" in quality improvement, but you shouldn't ignore small, easily solved problems until all larger problems are solved. </li>
</ul>
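<p>The point about weighting can be illustrated with a short Python sketch that ranks the same defects two ways: by how often they occur, and by their total cost (count times cost per defect). All of the names, counts, and costs here are hypothetical.</p>

```python
def rank_defects(counts, unit_costs):
    """Return category names ranked by frequency and by total cost."""
    by_count = sorted(counts, key=counts.get, reverse=True)
    by_cost = sorted(counts, key=lambda c: counts[c] * unit_costs[c], reverse=True)
    return by_count, by_cost


# Hypothetical counts and per-defect costs, for illustration only.
counts = {"Scratch": 45, "Chip": 30, "Blister": 15, "Dent": 10}
unit_costs = {"Scratch": 2.0, "Chip": 5.0, "Blister": 20.0, "Dent": 12.0}

by_count, by_cost = rank_defects(counts, unit_costs)
print("Ranked by frequency:", by_count)
print("Ranked by cost:     ", by_cost)
```

<p>Here the most frequent defect turns out to be the cheapest overall, so the two rankings point to different projects. Which one to follow depends on whether the goal is less rework or lower cost.</p>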
<p style="line-height: 20.8px;">Do you have any tips or suggestions for using Pareto charts effectively? Please share them in the comments! </p>
Data Analysis, Quality Improvement, Statistics
Mon, 14 Nov 2016 13:00:00 +0000
http://blog.minitab.com/blog/understanding-statistics/how-to-avoid-messing-up-your-pareto-charts
Eston Martz
Picking the Perfect Plot to Communicate Your Data
http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/picking-the-perfect-plot-to-communicate-your-data
<p>At the inaugural Minitab Insights Conference in September, presenters Benjamin Turcan and Jennifer Berner discussed <a href="http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/5-questions-to-ask-before-you-present-statistical-results">how to present data effectively</a>. Among the considerations they discussed was choosing the right graph.</p>
<p>Different graphs are good for different things. Of course, opinions about which graph is best can, and do, differ. Dotplot devotees might decide that they are demonstrably advantageous for all applications. On the other hand, determined dotplot detractors might beg to differ, and declare that they are decidedly good for nothing. (The dotplots that is, not the devotees. But I digress.)</p>
<p>In their presentation, Turcan and Berner divided the many uses for graphs into four broad categories:</p>
<ol>
<li>Examining relationships between variables.</li>
<li>Comparing groups.</li>
<li>Assessing how the parts comprise the whole.</li>
<li>Looking at how values are distributed.</li>
</ol>
<p>In this post I'll explore some examples of how Minitab's many marvelous graphs match up with this matrix.</p>
Examining relationships between variables
<p><img alt="Bubble plot" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/5ab0ae04699d1ac64484efba1be41204/bubblegraph.png" style="width: 209px; height: 119px; margin: 10px 15px; float: left; border-width: 1px; border-style: solid;" />Is your scrap rate higher on days with higher humidity? Do hospital admissions increase or decrease when the weather gets warmer? Does your pulse pound faster the more trips you make to the coffee machine? </p>
<p>Questions like these involve examining pairs of measurements. For example, you might record the high temperature each day as well as the number of patients admitted to a hospital, and then use one of the following graphs to look for a pattern. </p>
<p><strong>Scatterplot and Fitted Line Plot</strong></p>
<p>The following post shows how to use both a scatterplot and a fitted line plot to good effect: <a href="http://blog.minitab.com/blog/the-statistics-game/march-madnesswith-minitab" target="_blank">March Madness…with Minitab</a>.</p>
<div>
<p><strong>Matrix Plot</strong></p>
<p>What if you want to evaluate several different pairs of variables? Instead of creating a bunch of separate scatterplots, you can use Minitab's convenient Matrix Plot functionality as discussed in this fine post, <a href="http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/the-matrix-its-a-complex-plot" target="_blank">The Matrix, It's a Complex Plot</a>.</p>
<p><strong>Contour Plot, 3D Scatterplot, 3D Surface Plot, and Bubble Plot</strong></p>
<p>Minitab also includes several graphs that allow you to explore the relationships among three variables at the same time, such as those discussed in <a href="http://blog.minitab.com/blog/real-world-quality-improvement/3-ways-to-graph-3-variables-in-minitab">3 Ways to Graph 3 Variables in Minitab</a> and <a href="http://blog.minitab.com/blog/starting-out-with-statistical-software/introducing-the-bubble-plot" target="_blank">Introducing the Bubble Plot</a>.</p>
Comparing groups
<p><img alt="Line plot" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/ef962ecb3d4a0d55a373ea9b4ae25a23/line_plot.png" style="width: 218px; height: 127px; margin: 10px 15px; float: right;" />Which shift produces the most scrap? Is it the same every day of the week, or does the first shift generate the most scrap on Mondays, and the last shift generates the most scrap on Fridays? Which wing of a hospital has the most empty beds? Is that the same for all four seasons of the year, or is the ER most crowded in the winter, while the maternity ward is most crowded in the spring?</p>
<p>These are the kinds of questions you can answer by comparing measurements across groups. The following graphs are well suited for this purpose.</p>
<p><strong>Bar Chart</strong></p>
<p>The following post shows how to use a bar chart to compare the means of different groups: <a href="http://blog.minitab.com/blog/starting-out-with-statistical-software/investigating-starfighters-with-bar-charts3a-function-of-a-variable">Investigating Starfighters with Bar Charts: Function of a Variable</a>.</p>
<p><em>Fun fact: </em>Did you know that Minitab's Bar Chart feature can create both a bar chart and a column chart? By default, Minitab orients the bars vertically. But you can easily flip (or "transpose") the axes to display the bars horizontally. Just double-click an axis and choose <strong>Transpose value and category scales</strong>. (For more helpful information on customizing axes, see <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/graph-options/graph-framework-elements/modifying-graph-scales/" target="_blank">Modifying graph scales</a>.)</p>
<p><strong>Line Plot</strong></p>
<p>Another way to visualize differences between groups is with a line plot, as shown in this post: <a href="http://blog.minitab.com/blog/understanding-statistics/how-to-explore-interactions-with-line-plots">How to Explore Interactions with Line Plots</a>.</p>
Assessing how the parts comprise the whole
<p><img alt="Pie chart" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/6e100b0a0828d55bd6e8b8fece69cdb1/pie_chart.jpg" style="width: 346px; height: 207px; float: left; margin: 10px 15px; border-width: 1px; border-style: solid;" />Are scratches, chips, and blisters all equally likely to mar the surface of a new car that rolls off your assembly line? Or is one defect more common than the others?</p>
<p>Do customers seem to call for help with each of your products equally often? Or does one of the products prove more troublesome than the others?</p>
<p>The following graphs can help you break down a variable into its constituent categories. </p>
<p><strong>Pie Chart, Stacked Bar Chart, Pareto Chart</strong></p>
<p>The post <a href="http://blog.minitab.com/blog/applying-statistics-in-quality-projects/analyzing-qualitative-data-part-1-pareto-pie-and-stacked-bar-charts" target="_blank">Analyzing Qualitative Data, part 1: Pareto, Pie, and Stacked Bar Charts</a> does a good job of comparing the relative merits of these useful plots.</p>
<p><strong>Area Graph</strong></p>
<p>As the post <a href="http://blog.minitab.com/blog/starting-out-with-statistical-software/area-graphs-an-underutilized-tool" target="_blank">Area Graphs: An Underutilized Tool</a> describes, an area graph is a great way to view multiple time series when each series is part of one whole. </p>
Looking at how values are distributed
<p><img alt="Histogram" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/279158ac94f9afdefdfc076d1cf47c48/histogramoutlier.jpg" style="width: 232px; height: 140px; margin: 10px 15px; float: right; border-width: 1px; border-style: solid;" />What is the range of values in my sample? Are the data distributed the same way this time as they were last time? Are there any unusual points that I should investigate?</p>
<p>The following graphs can help you answer these questions. </p>
<p><strong>Histogram and Dotplot</strong></p>
<p>For continuous data, you can use a histogram or a dotplot to look at the distribution. For examples, check out <a href="http://blog.minitab.com/blog/michelle-paret/3-things-a-histogram-can-tell-you" target="_blank">3 Things a Histogram Can Tell You</a> and <a href="http://blog.minitab.com/blog/real-world-quality-improvement/managing-diabetes-with-six-sigma-and-statistics-part-i" target="_blank">Managing Diabetes with Six Sigma and Statistics, Part I</a>.</p>
<p><strong>Bar Chart </strong></p>
<p>For discrete data, you can use a bar chart to look at the relative frequencies for each category. For example, see <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/lost-baggage-its-all-relative">Analyzing Data about Lost Baggage: It’s All Relative (Frequency)</a>.</p>
What Are Your Go-To Graphs?
<p>These are just some possibilities of how you can use the many graphs available in Minitab Statistical Software to learn about your data and help present what you learn to others. You can find many other great examples on the Minitab Blog.</p>
<p>What are the graphs you like to use when presenting different kinds of data? Let us know in the comments! </p>
</div>
Data Analysis, Hypothesis Testing, Quality Improvement, Statistics, Statistics Help
Fri, 11 Nov 2016 13:00:00 +0000
http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/picking-the-perfect-plot-to-communicate-your-data
Greg Fox
How Effective Are Flu Shots?
http://blog.minitab.com/blog/adventures-in-statistics/how-effective-are-flu-shots
<p><img alt="Influenza virus" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/9786f693e9bfb040dea4b7d56bf5c60e/influenza_virus.jpg" style="float: right; width: 175px; height: 256px; margin: 10px 15px;" />Once again, with the arrival of autumn, it's time for a flu shot.</p>
<p>I get a flu shot every year even though I know they’re not perfect. I figure they’re a relatively easy and inexpensive way to reduce the chance of having a miserable week.</p>
<p>I’ve heard on various news media that their effectiveness is about 60%. But what does 60% effectiveness mean, exactly? How much does this actually reduce the chances that I’ll get the flu in any given year? I'm going to go beyond the news media simplification and present clear answers to these questions. Quite frankly, some of the results were not what I expected.</p>
We’ll Find Our Answers in Randomized, Controlled Trials (RCTs)
<p>I’m a numbers guy. I use numbers to understand the world. My background is in research, so when I want to understand an issue, I look at the primary research. If I can understand the researchers’ methodology, the data they collect, and how they draw their conclusions, I’ll understand the issue at a deeper, more fundamental level than news reports typically provide. </p>
<p>To understand flu shot effectiveness, I’m only going to assess double-blind, randomized controlled trials, the gold standard. These studies are more expensive to conduct but provide better results than observational studies. (I discuss the differences between these two types of studies in my post about the <a href="http://blog.minitab.com/blog/adventures-in-statistics/statistics-that-affect-you-are-vitamin-supplements-really-harmful" target="_blank">benefits of vitamins</a>.)</p>
<p>The two influenza vaccination studies I’ll look at satisfy the above criteria and are listed in a section of references for health professionals on the CDC’s <a href="http://www.cdc.gov/flu/professionals/vaccination/effectivenessqa.htm#references" target="_blank">website</a>. Presumably these studies make a good case, using trusted data. Along the way, we’ll use Minitab <a href="http://www.minitab.com/products/minitab">statistical software</a> to analyze their data for ourselves.</p>
Defining the Effectiveness of Flu Shots
<p>Flu shots contain vaccine for three influenza viruses that researchers predict will be the most common in a given flu season. However, plenty of other viruses (flu and otherwise) also are circulating and can make you sick. Many illnesses with flu-like symptoms are incorrectly attributed to the flu.</p>
<p>Consequently, the best studies use a lab to identify the specific virus that infects each of their sick subjects. These studies only count the subjects with confirmed cases of the three types of influenza virus. Effectiveness is defined as the reduction in confirmed cases of these three influenza viruses among those who were vaccinated compared to those who were not vaccinated.</p>
The Two Studies of the Flu Vaccine
<p>It’s time to dig into the data! For me, this is where it gets exciting. You can hear about effectiveness on TV, but this is where it all comes from: counts of sick people in the experimental groups.</p>
The Beran Study
<p>The Beran et al. study<sup>1</sup> assesses the 2006/2007 flu season and tracks its subjects from September to May. Subjects in this study range from 18 to 64 years old.</p>
<table style="text-align: center;">
<tr><th>Treatment</th><th>Flu count</th><th>Group size</th></tr>
<tr><td>Shot</td><td>49</td><td>5103</td></tr>
<tr><td>Placebo</td><td>74</td><td>2549</td></tr>
</table>
<p>Because we want to compare the proportions between two groups, we’ll use the Two Proportions test in Minitab. To do this yourself, in Minitab, go to <strong>Stat > Basic Statistics > 2 Proportions</strong>. In the dialog, choose <strong>Summarized data</strong> and enter the data from the table above. Click <strong>OK</strong>, and you get the results below:</p>
<p style="margin-left: 40px;"><img alt="Minitab's Two Proportions test for the flu data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/a3f37da27803215fb8fa3cd85d0b7924/flustudyberan.gif" style="width: 471px; height: 209px;" /></p>
<p>The p-value of 0.000 (that is, less than 0.0005 when rounded to three decimals) tells us that there is a significant difference between the two groups. The estimated difference between the vaccinated group and the placebo group is 1.9 percentage points. Because this is a randomized controlled trial, it's fairly safe to assume that the vaccination caused the difference between the groups. However, outside of a randomized experiment, it's not wise to assume causality.</p>
<p>The vaccine effectiveness (or efficacy) is a relative reduction in risk between the two groups. You simply take the relative risk ratio of (vaccinated proportion/unvaccinated proportion) and subtract that from 1. We can get the proportion for each group from the Sample p column in Minitab’s output:</p>
<p style="margin-left: 40px;">1 - (0.009602/0.029031) = 0.669</p>
<p>This study finds a 66.9% vaccine efficacy for the flu shot compared to the placebo.</p>
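<p>For readers who want to check the arithmetic outside of Minitab, here is a minimal Python sketch that recomputes the sample proportions, the absolute difference, and the efficacy from the Beran counts. The function names are made up for illustration, and the p-value uses a pooled normal-approximation z-test, which is an assumption and may not match the exact method behind the Minitab output shown above.</p>

```python
import math

def vaccine_efficacy(cases_vax, n_vax, cases_placebo, n_placebo):
    """Return (p_vax, p_placebo, absolute difference, efficacy = 1 - relative risk)."""
    p_vax = cases_vax / n_vax
    p_placebo = cases_placebo / n_placebo
    relative_risk = p_vax / p_placebo
    return p_vax, p_placebo, p_placebo - p_vax, 1 - relative_risk

def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value from a pooled two-proportion z-test (assumed method)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal
    return math.erfc(abs(z) / math.sqrt(2))

# Counts from the Beran study table: 49 of 5103 (shot), 74 of 2549 (placebo)
p_vax, p_placebo, diff, efficacy = vaccine_efficacy(49, 5103, 74, 2549)
p_value = two_proportion_p_value(49, 5103, 74, 2549)

print(f"Shot: {p_vax:.6f}  Placebo: {p_placebo:.6f}")
print(f"Difference: {diff * 100:.1f} percentage points")
print(f"Efficacy: {efficacy:.3f}")
print(f"p-value below 0.0005: {p_value < 0.0005}")
```

<p>Swapping in the counts from another study's table gives that study's proportions and efficacy in the same way.</p>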
The Monto Study
<p>The Monto et al. study<sup>2</sup> assesses the 2007/2008 flu season and tracks its subjects from January to April. Subjects in this study range from 18 to 49 years old.</p>
<table style="text-align: center;">
<tr><th>Treatment</th><th>Flu count</th><th>Group size</th></tr>
<tr><td>Shot</td><td>28</td><td>813</td></tr>
<tr><td>Placebo</td><td>35</td><td>325</td></tr>
</table>
<p>We’ll do the Two Proportions test again for this study. This time, enter the numbers from the above table into the dialog.</p>
<p style="margin-left: 40px;"><img alt="Minitab's Two Proportions test for the flu data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/89fb6b907bf288375b5faaeba5ff85ee/flustudymonto.gif" style="width: 469px; height: 210px;" /></p>
<p>Again, the p-value indicates that there is a significant difference between the two groups. The estimated difference between the vaccinated group and the placebo group is 7.3 percentage points. Let's calculate the effectiveness:</p>
<p style="margin-left: 40px;">1 - (0.034440/0.107692) = 0.680</p>
<p>This study finds a 68.0% vaccine efficacy for the flu shot compared to the placebo.</p>
Conclusions So Far
<p>We’ve looked at the data from two gold-standard studies and have drawn the same conclusions that you commonly hear on the news. Flu shots significantly reduce the number of influenza infections, and they are about 68% effective.</p>
<p>However, looking at the data and analyses myself, I have new insights. Specifically, the low number of influenza cases in the placebo group for each study caught my eye, and that’s what we’re looking at next.</p>
What It Means for You: Relative versus Absolute Risk
<p>If you’re like me, the 68% effective statistic isn’t too helpful. The problem is that it is a relative comparison of risk, not an absolute assessment of risk. To illustrate the difference, consider which type of assessment is more useful:</p>
<ol>
<li><strong>Relative assessment:</strong> Your car is travelling half as fast as another car, but you don’t know the true speed of either car.<br />
</li>
<li><strong>Absolute assessment:</strong> Your car is travelling at 30 MPH and the other car is travelling at 60 MPH.</li>
</ol>
<p>Clearly, #2 is much more useful. Similarly, it would be more helpful to know the absolute risk of catching the flu if you get the shot versus not getting it!</p>
Vaccine effectiveness is a relative risk
<p>Vaccine effectiveness doesn’t tell you the exact risk of catching the flu for either group. Instead, it involves dividing one proportion by the other for the relative risk. In fact, as you'll recall, effectiveness is one minus the relative risk, which makes it even <em>harder</em> to interpret. 67% effectiveness indicates that a vaccinated person has about one-third the risk of contracting the flu as a non-vaccinated person.</p>
<p>Unfortunately, using these numbers, we don’t know the absolute risk for anyone!</p>
The group proportions are the absolute risks
<p>We can estimate the absolute risk from the studies by looking at the proportion for each group in the Minitab output, and subtracting to calculate the absolute reduction. I’ll summarize this information below as percentages and even add in the results for two more flu seasons from another study that the CDC references (Bridges et al.<sup>3</sup>):</p>
<table style="text-align: center;">
<tr><th>Flu season</th><th>Placebo (%)</th><th>Flu Shot (%)</th><th>% Point Reduction</th></tr>
<tr><td>1997/98</td><td>4.4</td><td>2.2</td><td>2.2</td></tr>
<tr><td>1998/99</td><td>10.0</td><td>1.0</td><td>9.0</td></tr>
<tr><td>2006/07</td><td>2.9</td><td>1.0</td><td>1.9</td></tr>
<tr><td>2007/08</td><td>10.8</td><td>3.4</td><td>7.4</td></tr>
<tr><td><strong>Average</strong></td><td><strong>7.0</strong></td><td><strong>1.9</strong></td><td><strong>5.1</strong></td></tr>
</table>
<p>Notice how the risk of getting the flu varies by flu season? The differences are not surprising because the studies use different samples and the flu seasons have different influenza viruses.</p>
<p>So let’s look at the average of these four flu seasons. If you aren’t vaccinated, you have a 7.0% chance of getting the flu. However, if you do get the flu shot, your risk is about 1.9%, which is a reduction of 5.1 percentage points.</p>
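<p>The averages quoted here can be reproduced with a few lines of Python. This is only a sketch: the dictionary name and layout are illustrative, and the percentages are the rounded values from the four flu seasons summarized in this post.</p>

```python
# Percent of each group with confirmed influenza: (placebo, flu shot)
seasons = {
    "1997/98": (4.4, 2.2),
    "1998/99": (10.0, 1.0),
    "2006/07": (2.9, 1.0),
    "2007/08": (10.8, 3.4),
}

n = len(seasons)
avg_placebo = sum(placebo for placebo, _ in seasons.values()) / n
avg_shot = sum(shot for _, shot in seasons.values()) / n
avg_reduction = avg_placebo - avg_shot  # absolute reduction, in percentage points

print(f"Average placebo risk:  {avg_placebo:.1f}%")
print(f"Average flu shot risk: {avg_shot:.1f}%")
print(f"Average reduction:     {avg_reduction:.1f} percentage points")
```

<p>Note that this is a simple unweighted average across seasons; it does not account for the different group sizes in each study.</p>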
<p>Hmm. The "5.1 percentage point reduction" doesn’t sound nearly as impressive as the "67% effectiveness!" Both statistics are based on the same data, but I think the estimate of absolute risk is a more useful way to present the results.</p>
Closing Thoughts about the Flu Shot Data
<p>I was surprised by the results. While I knew flu shots were not perfect, I always got them because I thought they reduced my risk by more than what the CDC-recommended studies actually show. Even if you aren’t vaccinated, your risk of getting the flu isn’t too high.</p>
<p>That probably explains why a number of people have told me that while they never get flu shots, they can’t remember having the flu!</p>
<p>These more subtle results made me wonder about flu vaccinations on a societal scale. Could the flu vaccine possibly reduce flu cases enough to save sufficient money (lost workdays, doctor and drug costs, etc) to pay for the vaccinations?</p>
<p>Bridges et al. conducted a cost-benefit analysis in their study. For the two flu seasons where they tracked flu vaccinations, infections, and expenditures, the vaccinations actually <em>increase </em>net societal costs. It would’ve been cheaper overall not to get vaccinated!</p>
<p>In light of this, I wasn’t surprised when I read an <a href="http://www.cnn.com/2013/01/17/health/flu-vaccine-policy/index.html?hpt=hp_bn12" target="_blank">article</a> on CNN.com that said, outside of the U.S. and Canada, other countries do not strongly encourage all of their citizens above 6 months to get the flu shot. According to the article, “global health experts say the data aren’t there yet to support this kind of vaccination policy, nor is there enough money.”</p>
<p>I understand this viewpoint better now.</p>
<p><strong><em>However, I’m not trying to talk anyone out of getting a flu shot.</em></strong> I’m on the fence myself. While the risk of getting the flu in any given year is fairly small, if you regularly get the flu shot, you’ll probably spare yourself a week of misery at some point! You should always consult a medical professional to determine the best decision for your specific situation.</p>
<p>In another post, I look at the <a href="http://blog.minitab.com/blog/adventures-in-statistics/flu-shot-followup-assessing-the-long-term-benefits-of-flu-vaccination">long-term benefits of flu vaccinations</a>.</p>
<p><strong>References</strong></p>
<p>1. Beran J, Vesikari T, Wertzova V, Karvonen A, Honegr K, Lindblad N, Van Belle P, Peeters M, Innis BL, Devaster JM. Efficacy of inactivated split-virus influenza vaccine against culture-confirmed influenza in healthy adults: a prospective, randomized, placebo-controlled trial. J Infect Dis 2009;200(12):1861-9</p>
<p>2. Monto AS, Ohmit SE, Petrie JG, Johnson E, Truscon R, Teich E, Rotthoff J, Boulton M, Victor JC. Comparative efficacy of inactivated and live attenuated influenza vaccines. N Engl J Med. 2009;361(13):1260-7</p>
<p>3. Bridges CB, Thompson WW, Meltzer MI, Reeve GR, Talamonti WJ, Cox NJ, Lilac HA, Hall H, Klimov A, Fukuda K. Effectiveness and cost-benefit of influenza vaccination of healthy working adults: A randomized controlled trial. JAMA. 2000;284(13):1655-63</p>
Data Analysis
Wed, 09 Nov 2016 12:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics/how-effective-are-flu-shots
Jim Frost
Common Assumptions about Data (Part 2: Normality and Equal Variance)
http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-2-normality-and-equal-variance
<p>In Part 1 of this <a href="http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-1-random-samples-and-statistical-independence">blog</a> series, I wrote about how statistical inference uses data from a sample of individuals to reach conclusions about the whole population. That’s a very powerful tool, but you must check your assumptions when you make statistical inferences. Violating any of these assumptions can result in false positives or false negatives, thus invalidating your results. <img alt="Horse and Cart sign" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/8230e7c2bc193a831158677a70eb0146/chile_road_sign_po_4.svg" style="width: 101px; height: 101px; margin: 10px 15px; float: right;" /></p>
<p>The common data assumptions are: random samples, independence, normality, equal variance, stability, and that your measurement system is accurate and precise.</p>
<p>I addressed random samples and statistical independence last time. Now let’s consider the assumptions of Normality and Equal Variance.</p>
What Is the Assumption of Normality?
<p>Before you perform a statistical test, you should find out the distribution of your data. If you don’t, you risk selecting an inappropriate statistical test. Many statistical methods start with the assumption your data follow the normal distribution, including the 1- and 2-Sample t tests, Process Capability, I-MR, and ANOVA. If you don’t have normally distributed data, you might use an <a href="http://blog.minitab.com/blog/understanding-statistics/data-not-normal-try-letting-it-be-with-a-nonparametric-hypothesis-test">equivalent non-parametric test</a> based on the median instead of the mean, or try the Box-Cox or Johnson Transformation to transform your non-normal data into a normal distribution.</p>
<p align="center"><img alt="Normal and Skewed Curves" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/01451195cce5757849948e3871c28187/1_curves.png" style="border-width: 0px; border-style: solid; width: 554px; height: 179px; margin: 10px 15px;" /></p>
<p>But keep in mind that many statistical tools based on the assumption of normality do not actually <em>require</em> normally distributed data if the sample sizes are at least 15 or 20. But if sample sizes are less than 15 and the data are not normally distributed, the p-value may be inaccurate and you should interpret the results with caution.</p>
<p>There are several methods to determine normality in Minitab, and I’ll discuss two of the tools in this post: the Normality Test and the Graphical Summary. </p>
<p>Minitab’s Normality Test will generate a probability plot and perform a one-sample hypothesis test to determine whether the population from which you draw your sample is non-normal. The null hypothesis states that the population is normal. The alternative hypothesis states that the population is non-normal.</p>
<p>Choose <strong>Stat > Basic Statistics > Normality Test</strong></p>
<p align="center"><img alt="Normality Test" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/363dab1dcf97061dd0075ab38aae2ee3/2_normality_test.png" style="border-width: 0px; border-style: solid; width: 583px; height: 306px; margin: 10px 15px;" /></p>
<p>When evaluating the distribution fit for the normality test:</p>
<ul>
<li>The plotted points will roughly form a straight line. Some departure from the straight line at the tails may be okay as long as it stays within the confidence limits.</li>
<li>The plotted points should fall close to the fitted distribution line and pass the “fat pencil” test. Imagine a "fat pencil" lying on top of the fitted line: If it covers all the data points on the plot, the data are probably normal.</li>
<li>The associated Anderson-Darling statistic will be small.</li>
<li>The associated p-value will be larger than your chosen α-level (commonly chosen levels for α include 0.05 and 0.10).</li>
</ul>
<p>The Anderson-Darling statistic is a measure of how far the plot points fall from the fitted line in a probability plot. The statistic is a weighted squared distance from the plot points to the fitted line with larger weights in the tails of the distribution. For a specified data set and distribution, the better the distribution fits the data, the smaller this statistic will be.</p>
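<p>To make the idea of a "weighted squared distance" concrete, here is a rough Python sketch of the textbook Anderson-Darling statistic for a normal distribution fitted with the sample mean and standard deviation. The function name is invented for this example, and it does not attempt to reproduce Minitab's exact output or its p-value calculation.</p>

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling_normal(data):
    """Textbook A-squared statistic against a normal fitted to the data."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    cdf = [fitted.cdf(x) for x in xs]
    # The log terms grow large in the tails, which is where the extra weight comes from
    total = sum(
        (2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1 - cdf[n - i]))
        for i in range(1, n + 1)
    )
    return -n - total / n

# Illustrative data: evenly spaced normal quantiles, and a right-skewed version of them
normal_like = [NormalDist().inv_cdf((i - 0.5) / 30) for i in range(1, 31)]
skewed = [math.exp(z) for z in normal_like]

a2_normal = anderson_darling_normal(normal_like)
a2_skewed = anderson_darling_normal(skewed)
print(f"A-squared, normal-like data: {a2_normal:.3f}")
print(f"A-squared, skewed data:      {a2_skewed:.3f}")
```

<p>The skewed data produce the larger statistic, which is the pattern described above: the worse the normal distribution fits, the larger the value.</p>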
<p>Minitab’s Descriptive Statistics with the Graphical Summary will generate a nice visual display of your data and calculate the Anderson-Darling statistic and its p-value. The graphical summary displays four graphs: a histogram of the data with an overlaid normal curve, a boxplot, and 95% confidence intervals for both the mean and the median.</p>
<p>Choose <strong>Stat > Basic Statistics > Graphical Summary</strong></p>
<p align="center"><img alt="Probability Plot" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/9681575c2cdb6cfebde643d73a5e5ca0/3_probability_plot.png" style="border-width: 0px; border-style: solid; width: 599px; height: 350px; margin: 10px 15px;" /></p>
<p>When interpreting a graphical summary report for normality: </p>
<ul>
<li>The data will be displayed as a histogram. Look for how your data is distributed (normal or skewed), how the data is spread across the graph, and if there are outliers.</li>
<li>The associated Anderson-Darling statistic will be small.</li>
<li>The associated p-value will be larger than your chosen α-level (commonly chosen levels for α include 0.05 and 0.10).</li>
</ul>
<p>For some processes, such as time and cycle data, the data will never be normally distributed. Non-normal data are fine for some statistical methods, but make sure your data satisfy the <a href="http://blog.minitab.com/blog/fun-with-statistics/forget-statistical-assumptions-just-check-the-requirements">requirements</a> for your particular analysis.</p>
What Is the Assumption of Equal Variance?
<p>In simple terms, variance refers to the data spread or scatter. Statistical tests, such as analysis of variance (ANOVA), assume that although different samples can come from populations with different means, they have the same variance. Equal variances (homoscedasticity) means that the variances are approximately the same across the samples. Unequal variances (heteroscedasticity) can affect the Type I error rate and lead to false positives. If you are comparing two or more sample means, as in the 2-Sample t-test and ANOVA, a significantly different variance could overshadow the differences between means and lead to incorrect conclusions. </p>
<p>Minitab offers several methods to test for equal variances. Consult <a href="http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/anova/basics/understanding-test-for-equal-variances/">Minitab Help</a> to decide which method to use based on the type of data you have. You can also use the Minitab Assistant to check this assumption for you. (Tip: When using the Assistant, click “more” to see data collection tips and important information about how Minitab calculates your results.)</p>
<p align="center"><img alt="Hypothesis Assistant" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/cd958e1efe31a3a0c3acdc818971100c/4_hypothesis_assistant.png" style="border-width: 0px; border-style: solid; width: 402px; height: 318px; margin: 10px 15px;" /></p>
<p>After the analysis is performed, check the Diagnostic Report for the test interpretation and the Report Card for alerts to unusual data points or assumptions that were not met. (Tip: When performing the 2-Sample t test and ANOVA, the Assistant takes a more conservative approach and uses calculations that do not depend on the assumption of equal variance.)</p>
<p><img alt="Assistant Reports" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/a1d56fd284c40360bc62f96e04e69e59/5_assistant_reports.png" style="border-width: 0px; border-style: solid; width: 656px; height: 245px; margin: 10px 15px;" /></p>
The Real Reason You Need to Check the Assumptions
<p>You will be putting a lot of time and effort into collecting and analyzing data. After all the work you put into the analysis, you want to be able to reach correct conclusions. Some analyses are robust to departures from these assumptions, but take the safe route and check! You want to be confident that you can tell whether observed differences between data samples are simply due to chance, or if the populations are indeed different! </p>
<p>It’s easy to put the cart before the horse and just plunge in to the data collection and analysis, but it’s much wiser to take the time to understand which data assumptions apply to the statistical tests you will be using, and plan accordingly.</p>
<p>In my next blog post, I will review the <a href="http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-3-stability-and-measurement-systems">common assumptions about stability and the measurement system</a>. </p>
Data Analysis, Hypothesis Testing, Statistics, Statistics Help, Stats
Mon, 07 Nov 2016 15:36:00 +0000
http://blog.minitab.com/blog/quality-business/common-assumptions-about-data-part-2-normality-and-equal-variance
Bonnie K. Stone
What Are T Values and P Values in Statistics?
http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-are-t-values-and-p-values-in-statistics
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/6f4053a89257952fef0b9998547dffe2/tweedle_tweedledum.jpg" style="line-height: 20.8px; float: right; width: 248px; height: 255px; margin: 10px 15px;" /></p>
<p>If you’re not a statistician, looking through statistical output can sometimes make you feel a bit like <em>Alice in</em> <em>Wonderland. </em>Suddenly, you step into a fantastical world where strange and mysterious phantasms appear out of nowhere. </p>
<p>For example, consider the T and P in your t-test results.</p>
<p>“Curiouser and curiouser!” you might exclaim, like Alice, as you gaze at your output.</p>
<p><img alt="One-Sample T test output" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/1e5a4c064f43f19169121222402e4560/t_test_results_one_sided.jpg" style="width: 467px; height: 121px;" /></p>
<p>What are these values, really? Where do they come from? Even if you’ve used the p-value to interpret the statistical significance of your results umpteen times, its actual origin may remain murky to you.</p>
T & P: The Tweedledee and Tweedledum of a T-test
<p>T and P are inextricably linked. They go arm in arm, like Tweedledee and Tweedledum. Here's why.</p>
<p>When you perform a t-test, you're usually trying to find evidence of a significant difference between population means (2-sample t) or between the population mean and a hypothesized value (1-sample t). <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-is-a-t-test-and-why-is-it-like-telling-a-kid-to-clean-up-that-mess-in-the-kitchen">The t-value measures the size of the difference relative to the variation in your sample data</a>. Put another way, T is simply the calculated difference represented in units of standard error. The greater the magnitude of T (it can be either positive or negative), the greater the evidence <em>against </em>the null hypothesis that there is no significant difference. The closer T is to 0, the more likely there isn't a significant difference.</p>
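<p>As a small illustration of "the difference in units of standard error," the Python sketch below computes a 1-sample t-value by hand. The five observations and the function name are hypothetical; they are not the data behind the output shown in this post.</p>

```python
import math
from statistics import mean, stdev

def one_sample_t(data, hypothesized_mean):
    """t = (sample mean - hypothesized mean) / (sample std dev / sqrt(n))."""
    n = len(data)
    standard_error = stdev(data) / math.sqrt(n)
    return (mean(data) - hypothesized_mean) / standard_error

# Hypothetical sample, tested against a hypothesized mean of 5
sample = [5.1, 5.4, 4.9, 5.6, 5.3]
t_value = one_sample_t(sample, 5)
print(f"t = {t_value:.3f}")
```

<p>Here the sample mean sits a little more than two standard errors above 5, so the t-value is a little more than 2.</p>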
<p>Remember, the t-value in your output is calculated from only one sample from the entire population. If you took repeated random samples of data from the same population, you'd get slightly different t-values each time, due to random sampling error (which is really not a mistake of any kind–it's just the random variation expected in the data).</p>
<p>How different could you expect the t-values from many random samples from the same population to be? And how does the t-value from your sample data compare to those expected t-values?</p>
<p>You can use a t-distribution to find out.</p>
Using a t-distribution to calculate probability
<p>For the sake of illustration, assume that you're using a 1-sample t-test to determine whether the population mean is greater than a hypothesized value, such as 5, based on a sample of 20 observations, as shown in the above t-test output.</p>
<ol>
<li>In Minitab, choose <strong>Graph > Probability Distribution Plot</strong>.</li>
<li>Select <strong>View Probability</strong>, then click <strong>OK</strong>.</li>
<li>From <strong>Distribution</strong>, select <strong>t</strong>.</li>
<li>In <strong>Degrees of freedom</strong>, enter <em>19</em>. (For a 1-sample t test, the degrees of freedom equals the sample size minus 1).</li>
<li>Click <strong>Shaded Area</strong>. Select <strong>X Value</strong>. Select <strong>Right Tail</strong>.</li>
<li> In <strong>X Value</strong>, enter 2.8 (the t-value), then click <strong>OK</strong>.</li>
</ol>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/bc5183a42a169d45632fd4f6c0b153b3/distribution_plot_t_2.8" style="width: 576px; height: 384px;" /></p>
<p>The highest part (peak) of the distribution curve shows you where you can expect most of the t-values to fall. Most of the time, you’d expect to get t-values close to 0. That makes sense, right? Because if you randomly select representative samples from a population, the mean of most of those random samples from the population should be close to the overall population mean, making their differences (and thus the calculated t-values) close to 0.</p>
T values, P values, and poker hands
<p>T values of larger magnitudes (either negative or positive) are less likely. The far left and right "tails" of the distribution curve represent instances of obtaining extreme values of t, far from 0. For example, the shaded region represents the probability of obtaining a t-value of 2.8 or greater. Imagine a magical dart that could be thrown to land randomly anywhere under the distribution curve. What's the chance it would land in the shaded region? The calculated probability is 0.005712.....which rounds to 0.006...which is...the p-value obtained in the t-test results! <img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/5633b267494c2017d6d7c7544247d57d/poker_picture.jpg" style="float: right; width: 200px; height: 164px; margin: 10px 15px;" /></p>
<p>In other words, the probability of obtaining a t-value of 2.8 or higher, when sampling from the same population (here, a population with a hypothesized mean of 5), is approximately 0.006.</p>
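<p>If you don't have Minitab handy, the same right-tail probability can be approximated with nothing but Python's standard library. This sketch numerically integrates the t-distribution's density using Simpson's rule; the function names and the number of integration steps are arbitrary choices for illustration, not anything Minitab does internally.</p>

```python
import math

def t_pdf(x, df):
    """Probability density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_right_tail(t_value, df, steps=2000):
    """P(T >= t_value) for t_value >= 0, using Simpson's rule on [0, t_value]."""
    h = t_value / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t_value, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area_zero_to_t = total * h / 3
    # The distribution is symmetric about 0, so half the area lies to the right of 0
    return 0.5 - area_zero_to_t

p = t_right_tail(2.8, 19)
print(f"P(T >= 2.8) with 19 degrees of freedom is about {p:.4f}")
```

<p>The result should agree with the shaded-area probability in the distribution plot to several decimal places.</p>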
<p>How likely is that? Not very! For comparison, the probability of being dealt 3-of-a-kind in a 5-card poker hand is over three times as high (≈ 0.021).</p>
<p>Given that the probability of obtaining a t-value this high or higher when sampling from this population is so low, what’s more likely? It’s more likely this sample doesn’t come from this population (with the hypothesized mean of 5). It's much more likely that this sample comes from a different population, one with a mean greater than 5.</p>
<p>To wit: Because the p-value is very low (< alpha level), you reject the null hypothesis and conclude that there's a statistically significant difference.</p>
<p>In this way, T and P are inextricably linked. Consider them simply different ways to quantify the "extremeness" of your results under the null hypothesis. You can’t change the value of one without changing the other.</p>
<p>The larger the absolute value of the t-value, the smaller the p-value, and the greater the evidence against the null hypothesis. (You can verify this by entering lower and higher t values for the t-distribution in step 6 above.)</p>
Try this two-tailed follow up...
<p>The t-distribution example shown above is based on a one-tailed t-test to determine whether the mean of the population is greater than a hypothesized value. Therefore the t-distribution example shows the probability associated with the t-value of 2.8 only in one direction (the right tail of the distribution).</p>
<p>How would you use the t-distribution to find the p-value associated with a t-value of 2.8 for a two-tailed t-test (in both directions)?</p>
<p><strong>Hint:</strong> In Minitab, adjust the options in step 5 to find the probability for both tails. If you don't have a copy of Minitab, download a free <a href="http://www.minitab.com/en-us/products/minitab/free-trial/" target="_blank">30-day trial version</a>.</p>
Hypothesis Testing
Fri, 04 Nov 2016 12:10:00 +0000
http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-are-t-values-and-p-values-in-statistics
Patrick Runkel
Simulating the U.S. Presidential Election of 2016
http://blog.minitab.com/blog/adventures-in-statistics/simulating-the-us-presidential-election-of-2016
<p>Regardless of who you support in the upcoming U.S. election, we can all agree that it’s been a very bumpy ride! It’s been a particularly chaotic election cycle. Wouldn’t it be nice if we could peek into the future and see potential election results right now? That’s what we'll do in this post!<img alt="clinton and trump" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/54dbb8d14ef02df09c3ebd5773cf3eea/clinton_trump.jpg" style="width: 300px; height: 149px; margin: 10px 15px; float: right;" /></p>
<p>In 2012, I used binary logistic regression to <a href="http://blog.minitab.com/blog/adventures-in-statistics/predicting-the-us-presidential-election-evaluating-two-models-part-one" target="_blank">predict that President Obama would be reelected for a second term</a>. That model requires that an incumbent is running for reelection. With no incumbent this time, I’ll need another approach. I’ve decided to use a <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-monte-carlo-simulation-with-an-example" target="_blank">Monte Carlo simulation</a>.</p>
<p>By simulating the election 100,000 times, we can examine the distribution of outcomes to determine probabilities for the election winner and to determine which states are the most important to win.</p>
Using Monte Carlo Simulation for the Election
<p>Monte Carlo simulations use a mathematical model to create simulated data for a system or a process in order to evaluate outcomes. I’ll simulate the upcoming election 100,000 times so we can determine which outcomes are more common or rare.</p>
<p>Imagine if we flip 50 coins. Basic probability tells us we should expect 25 heads and 25 tails, but while that is the most likely outcome, it happens only 11% of the time. There is a distribution of other outcomes around the most likely outcome.</p>
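<p>The 11% figure comes straight from the binomial distribution, and a couple of lines of Python confirm it. The variable names are just for illustration.</p>

```python
import math

# Number of ways to get exactly 25 heads in 50 flips, out of 2**50 equally likely sequences
p_25_heads = math.comb(50, 25) / 2 ** 50
print(f"P(exactly 25 heads in 50 fair flips) = {p_25_heads:.4f}")
```

<p>So even the single most likely outcome turns up in only about one of every nine sets of 50 flips.</p>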
<p>The Monte Carlo simulation essentially treats the election as if we were flipping 51 coins (the states plus the District of Columbia). However, we’re using funny coins. For one thing, they have Donald Trump on one side and Hillary Clinton on the other! Also, these coins don’t necessarily have a 50/50 probability, and the probability changes over time. Currently, the Texas coin has 93% chance of showing Trump while the Wisconsin coin has an 80% chance of showing Clinton. The Florida coin, which is very important in our simulation, happens to be very balanced. It has a 51.1% chance of showing Clinton and 48.9% chance of showing Trump.</p>
<p>The U.S. Presidential election awards electoral votes to the winner of each state and the District of Columbia. The winner of a state gets all of that state's electoral votes, and the number of votes varies by population. When a candidate obtains 270 or more electoral votes, he or she wins the election.</p>
<p>I’ll have each state and Washington, D.C., flip their coin 100,000 times using the probabilities that <a href="http://projects.fivethirtyeight.com/2016-election-forecast/?ex_cid=rrpromo" target="_blank">Nate Silver calculated</a> on November 2, 2016. The transfer equation for this simulation awards the electoral votes to the winner of each state.</p>
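<p>To show the mechanics of the coin-flipping approach, here is a toy Python sketch. It is not the simulation used for this post: it includes only the three states whose probabilities are quoted above, pairs them with their 2016 electoral vote counts, and treats each state as an independent coin flip, which is an assumption. The function name, seed, and data layout are all invented for illustration. A full version would need all 51 coins and the 270-vote threshold.</p>

```python
import random

# (electoral votes, probability that Clinton wins the state)
# Probabilities are the ones quoted in this post; Texas is 93% Trump, so 7% Clinton.
STATES = {
    "Texas": (38, 0.07),
    "Wisconsin": (10, 0.80),
    "Florida": (29, 0.511),
}

def simulate(states, n_sims=100_000, seed=1):
    """Flip each state's weighted coin n_sims times; return Clinton's vote totals and state wins."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    totals = []
    state_wins = dict.fromkeys(states, 0)
    for _ in range(n_sims):
        votes = 0
        for name, (electoral_votes, p_clinton) in states.items():
            if rng.random() < p_clinton:
                votes += electoral_votes  # transfer equation: winner takes all of the state's votes
                state_wins[name] += 1
        totals.append(votes)
    return totals, state_wins

totals, state_wins = simulate(STATES)
for name, wins in state_wins.items():
    print(f"{name}: Clinton wins {wins / len(totals):.1%} of simulations")
print(f"Clinton's votes from these three states range from {min(totals)} to {max(totals)}")
```

<p>With enough simulated elections, each state's share of wins settles close to its coin's probability, and the list of totals gives the distribution of outcomes to examine.</p>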
Simulation Results for the Presidential Election
<p><img alt="Distribution of simulated electoral votes for Hillary Clinton" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/ef7280793878a49c9df9b3b7140e654d/evs_clinton_histo.png" style="width: 576px; height: 384px;" /></p>
<p>The simulation results show that Hillary Clinton currently has the advantage. Over the 100,000 simulated elections, Clinton’s electoral votes range from 149 to 412, with the most likely value of 301. In 95% of the simulated results, Clinton’s electoral votes fall within the range of 247 to 355. Clinton obtains at least 270 electoral votes in 87% of the simulated elections.</p>
<p>While the simulation gives Clinton an overall 87% chance of winning, the probabilities change as candidates win specific states. For example, Florida is a crucial state in this election because it has the largest single-state impact on a candidate’s probability of winning the election.</p>
<p style="margin-left: 40px;"><img alt="Probability of winning the election based on the winner of Florida" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/dfef97a9cc9b4a2bee78b048f50c00f0/flwinner_pie.png" style="width: 580px; height: 387px;" /></p>
<p>The pie chart shows the probabilities of winning based on the winner of Florida. If Trump doesn’t win Florida, he is essentially out of the race. In simulated elections where Clinton wins Florida, Trump wins the election only 2.5% of the time.</p>
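<p>Conditional figures like these come from splitting the simulated elections by who won Florida and then counting election winners within each group. Here is a short Python sketch of that split. The function name and the toy counts are hypothetical and are not the post's results.</p>

```python
from collections import Counter


def win_rates_by_state_winner(runs):
    """Share of election wins for each candidate, grouped by who won the state.

    `runs` is a list of (state_winner, election_winner) pairs,
    one pair per simulated election.
    """
    counts = {}
    for state_winner, election_winner in runs:
        counts.setdefault(state_winner, Counter())[election_winner] += 1
    return {
        state_winner: {
            candidate: n / sum(tally.values()) for candidate, n in tally.items()
        }
        for state_winner, tally in counts.items()
    }


# Hypothetical toy runs: (Florida winner, election winner).
toy_runs = (
    [("Clinton", "Clinton")] * 38
    + [("Clinton", "Trump")] * 2
    + [("Trump", "Clinton")] * 30
    + [("Trump", "Trump")] * 10
)
rates = win_rates_by_state_winner(toy_runs)
# Share of runs Trump wins overall when Clinton carries Florida.
print(rates["Clinton"]["Trump"])
```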
Using Binary Logistic Regression to Dig Deeper into the Simulation
<p>We can also use binary logistic regression to probe our simulated results. Binary logistic regression produces <a href="http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/regression-and-correlation/logistic-regression/what-is-the-odds-ratio/" target="_blank">odds ratios</a> that help us identify the states which have the greatest impact on a candidate's probability of winning the election.</p>
<p>Here, an odds ratio represents the odds of winning the election if a candidate wins a given state divided by the odds of winning the election if a candidate loses that state. The larger the odds ratio, the more important the state is to win. Among the battleground states, the odds ratios span a wide range, from Florida at 137.3 to Iowa at 2.7. The list below shows the top 10 battleground states.</p>
<table>
<thead>
<tr><th style="text-align: center;">State</th><th style="text-align: center;">Odds ratio</th></tr>
</thead>
<tbody>
<tr><td style="text-align: center;">Florida</td><td style="text-align: center;">137.3</td></tr>
<tr><td style="text-align: center;">Pennsylvania</td><td style="text-align: center;">29.7</td></tr>
<tr><td style="text-align: center;">Ohio</td><td style="text-align: center;">23.3</td></tr>
<tr><td style="text-align: center;">Georgia</td><td style="text-align: center;">15.8</td></tr>
<tr><td style="text-align: center;">Michigan</td><td style="text-align: center;">15.3</td></tr>
<tr><td style="text-align: center;">North Carolina</td><td style="text-align: center;">13.4</td></tr>
<tr><td style="text-align: center;">Virginia</td><td style="text-align: center;">9.2</td></tr>
<tr><td style="text-align: center;">Arizona</td><td style="text-align: center;">6.8</td></tr>
<tr><td style="text-align: center;">Wisconsin</td><td style="text-align: center;">5.5</td></tr>
<tr><td style="text-align: center;">Colorado</td><td style="text-align: center;">4.6</td></tr>
</tbody>
</table>
<p>The list is pretty cool because it quantifies the importance of each state, and the top states match those you hear about most frequently in the news media.</p>
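<p>To illustrate the odds ratio definition, here is a small Python sketch that computes an unadjusted odds ratio from a 2x2 table of simulated outcomes. The counts are hypothetical. Note that this is a simpler calculation than binary logistic regression: if the regression model includes several states at once, its odds ratios are adjusted for the other states, so this sketch would not be expected to reproduce the values in the list.</p>

```python
def odds(wins, losses):
    """Odds of winning the election: wins divided by losses."""
    return wins / losses


def odds_ratio(win_with, lose_with, win_without, lose_without):
    """Unadjusted odds ratio from a 2x2 table of simulated elections.

    win_with / lose_with: election wins and losses in runs where the
    candidate carried the state.
    win_without / lose_without: the same counts in runs where the
    candidate lost the state.
    """
    return odds(win_with, lose_with) / odds(win_without, lose_without)


# Hypothetical counts, not taken from the post:
# carried the state in 500 runs (won the election in 400, lost in 100),
# lost the state in 500 runs (won the election in 100, lost in 400).
example = odds_ratio(400, 100, 100, 400)
print(example)  # (400 / 100) / (100 / 400) = 16.0
```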
What to Watch for on Election Night
<p>This simulation indicates that Hillary Clinton is favored to win the election. Consequently, I’m going to focus on what it will take for Donald Trump to win. The five most important states can indicate the direction that the entire election is headed. As an added benefit, these states are mostly in the Eastern time zone, so you can use them to gain an earlier idea of who will ultimately win and how close the election is likely to be.</p>
<p>Here’s how to read the table below. I start out with the assumption that Trump wins Florida because otherwise he has only a 2.5% chance of winning. For each subsequent row in the table, I add in the next state from the top 5 in which he has the greatest probability of winning. The percentage in parentheses is his chance of winning that newly added state, and the second column is his chance of winning the election if he wins every state in the row. For example, the second row shows that Trump has an 83.9% chance of winning Georgia and, if he wins both Florida and Georgia, he has a 26.9% chance of winning the election.</p>
<p>Each additional row after Georgia represents a state that is harder for Trump to win. Trump has to win at least four of these states to have a greater than 50% chance of winning the election.</p>
<table>
<thead>
<tr><th style="text-align: center;">Trump States</th><th style="text-align: center;">Chance of Trump Winning</th><th style="text-align: center;">Most Likely Electoral Votes</th></tr>
</thead>
<tbody>
<tr><td style="text-align: center;">Florida (48.9%)</td><td style="text-align: center;">23.9%</td><td style="text-align: center;">285 Clinton</td></tr>
<tr><td style="text-align: center;">FL + GA (83.9%)</td><td style="text-align: center;">26.9%</td><td style="text-align: center;">283 Clinton</td></tr>
<tr><td style="text-align: center;">FL + GA + OH (61.2%)</td><td style="text-align: center;">37.2%</td><td style="text-align: center;">276 Clinton</td></tr>
<tr><td style="text-align: center;">FL + GA + OH + PA (22%)</td><td style="text-align: center;">70.5%</td><td style="text-align: center;">278 Trump</td></tr>
<tr><td style="text-align: center;">FL + GA + OH + PA + MI (21.2%)</td><td style="text-align: center;">91.9%</td><td style="text-align: center;">291 Trump</td></tr>
</tbody>
</table>
<p>The table gets tough for Trump starting in the fourth row, where he needs to win Pennsylvania. However, if he wins Florida, Georgia, and Ohio (which is not an extremely unlikely combination), he'll have a 37% chance of winning the election. In this specific scenario, the electoral vote is likely to be closer than many might expect because Clinton's most likely number of electoral votes is 276. Of course, there is a margin of error around this most likely value, which is why Trump has a chance to win.</p>
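<p>A table of cumulative scenarios like this one can be built by filtering the simulated elections to those where the candidate wins every state in the scenario, then counting how often he also wins the election. The Python sketch below shows the idea. The data layout, function name, and four toy runs are hypothetical and do not reproduce the figures in the table.</p>

```python
def scenario_table(runs, ordered_states, candidate="Trump"):
    """Condition on `candidate` winning each growing prefix of ordered_states.

    `runs` is a list of dicts with two keys:
      "state_winners": dict mapping state -> winner of that state
      "election_winner": overall winner of that simulated election
    Returns a list of (scenario_states, matching_run_count, win_share).
    """
    rows = []
    for k in range(1, len(ordered_states) + 1):
        scenario = ordered_states[:k]
        # Keep only the runs where the candidate carries every scenario state.
        matching = [
            run for run in runs
            if all(run["state_winners"][s] == candidate for s in scenario)
        ]
        wins = sum(run["election_winner"] == candidate for run in matching)
        share = wins / len(matching) if matching else None
        rows.append((tuple(scenario), len(matching), share))
    return rows


# Hypothetical toy runs for illustration only.
toy_runs = [
    {"state_winners": {"FL": "Trump", "GA": "Trump"}, "election_winner": "Trump"},
    {"state_winners": {"FL": "Trump", "GA": "Trump"}, "election_winner": "Clinton"},
    {"state_winners": {"FL": "Trump", "GA": "Clinton"}, "election_winner": "Clinton"},
    {"state_winners": {"FL": "Clinton", "GA": "Trump"}, "election_winner": "Clinton"},
]
rows = scenario_table(toy_runs, ["FL", "GA"])
for scenario, n_runs, share in rows:
    print(" + ".join(scenario), n_runs, share)
```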
<p>In short, right now it is difficult for Trump to win, but it is entirely possible that the election will be a squeaker! Watching these key states will give you a forecast of where the race is headed.</p>
<p>There are a few caveats for these results. The probabilities for winning the election are based on simulated results. The underlying state probabilities are based on the status of the race on November 2, and these can change by Election Day. Additionally, early voting has already begun in a number of states, and some of those votes were cast when the state probabilities were different from what they are now.</p>
<p>Despite these caveats, this Monte Carlo simulation shows the overall state of the race and which states are most important for a candidate’s chances of winning.</p>
Monte Carlo Simulation | Regression Analysis | Statistics in the News | Thu, 03 Nov 2016 13:06:00 +0000 | http://blog.minitab.com/blog/adventures-in-statistics/simulating-the-us-presidential-election-of-2016 | Jim Frost