Minitab | Minitab
Blog posts and articles about using Minitab software in quality improvement projects, research, and more.
http://blog.minitab.com/blog/minitab/rss
Fri, 06 Mar 2015 03:53:30 +0000
FeedCreator 1.7.3

Understanding Hypothesis Tests: Why We Need to Use Hypothesis Tests in Statistics
http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics
<p>Hypothesis testing is an essential procedure in statistics. A hypothesis test evaluates two mutually exclusive statements about a population to determine which statement is best supported by the sample data. When we say that a finding is statistically significant, it’s thanks to a hypothesis test. How do these tests really work and what does statistical significance actually mean?</p>
<p>In this series of three posts, I’ll help you intuitively understand how hypothesis tests work by focusing on concepts and graphs rather than equations and numbers. After all, a key reason to use <a href="http://www.minitab.com/products/minitab/features/">statistical software </a>like Minitab is so you don’t get bogged down in the calculations and can instead focus on understanding your results.</p>
<p>To kick things off in this post, I highlight the rationale for using hypothesis tests with an example.</p>
The Scenario
<p>An economist wants to determine whether the monthly energy cost for families has changed from the previous year, when the mean cost per month was $260. The economist randomly samples 25 families and records their energy costs for the current year. (The data for this example is <a href="http://support.minitab.com/datasets/FamilyEnergyCost.MTW">FamilyEnergyCost</a>, one of the many sample data sets in <a href="http://support.minitab.com/datasets/">Minitab’s Data Set Library</a>.)</p>
<p><img alt="Descriptive statistics for family energy costs" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/135cd05bde9f7f16ab396a8525d2b09c/desc_stats.png" style="width: 302px; height: 87px;" /></p>
<p>I’ll use these descriptive statistics to create a probability distribution plot that shows you the importance of hypothesis tests. Read on!</p>
The Need for Hypothesis Tests
<p>Why do we even need hypothesis tests? After all, we took a random sample and our sample mean of 330.6 is different from 260. That <em>is</em> different, right? Unfortunately, the picture is muddied because we’re looking at a sample rather than the entire population.</p>
<p>Sampling error is the difference between a sample and the entire population. Thanks to sampling error, it’s entirely possible that while our <em>sample </em>mean is 330.6, the population mean could still be 260. Or, to put it another way, if we repeated the experiment, it’s possible that the second sample mean could be close to 260. A hypothesis test helps assess the likelihood of this possibility!</p>
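A quick simulation makes sampling error concrete. The population standard deviation used below ($150) is an invented value purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population: mean cost $260/month, with an assumed
# standard deviation of $150 (invented for this sketch).
pop_mean, pop_sd, n = 260, 150, 25

# Five random samples of 25 families each give five different means,
# even though every sample comes from the same population.
sample_means = [rng.normal(pop_mean, pop_sd, n).mean() for _ in range(5)]
print([round(m, 1) for m in sample_means])
```

Each repetition of the experiment lands on a different sample mean, which is exactly why a single sample mean of 330.6 cannot, by itself, rule out a population mean of 260.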
Use the Sampling Distribution to See If Our Sample Mean is Unlikely
<p>For any given random sample, the mean of the sample almost certainly doesn’t equal the true mean of the population due to sampling error. For our example, it’s unlikely that the mean cost for the entire population is exactly 330.6. In fact, if we took multiple random samples of the same size from the same population, we could plot a distribution of the sample means.</p>
<p>A sampling distribution is the distribution of a statistic, such as the mean, that is obtained by repeatedly drawing a large number of samples from a specific population. This distribution allows you to determine the probability of obtaining the sample statistic.</p>
<p>Fortunately, I can create a plot of sample means without collecting many different random samples! Instead, I’ll create a probability distribution plot using the t-distribution, the sample size, and the <a href="http://blog.minitab.com/blog/adventures-in-statistics/assessing-variability-for-quality-improvement" target="_blank">variability</a> in our sample to graph the sampling distribution.</p>
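In code, that sampling distribution can be sketched directly. The sample standard deviation appears only in the descriptive-statistics image, so s = 154 below is back-calculated from the graph's spread and should be treated as an assumption:

```python
import numpy as np
from scipy import stats

n, null_mean = 25, 260
s = 154                      # assumed sample standard deviation
se = s / np.sqrt(n)          # standard error of the mean

# Under the null hypothesis, sample means follow a t-distribution
# (df = n - 1) centered at 260 and scaled by the standard error.
lo, hi = stats.t.interval(0.95, df=n - 1, loc=null_mean, scale=se)
print(round(lo, 1), round(hi, 1))
```

With these assumptions, the middle 95% of sample means under the null hypothesis spans roughly 196 to 324, consistent with the wide spread visible in the sampling distribution plot.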
<p>Our goal is to determine whether our sample mean is significantly different from the null hypothesis mean. Therefore, we’ll use the graph to see whether our sample mean of 330.6 is unlikely assuming that the population mean is 260. The graph below shows the expected distribution of sample means.</p>
<p><img alt="Sampling distribution plot for the null hypothesis" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/76699c4c1f2bd6c83b88c1ac8e93aa54/sampling_dist_null.png" style="width: 595px; height: 397px;" /></p>
<p>You can see that the most probable sample mean is 260, which makes sense because we’re assuming that the null hypothesis is true. However, there is a reasonable probability of obtaining a sample mean that ranges from 167 to 352, and even beyond! The takeaway from this graph is that while our sample mean of 330.6 is not the most probable, it’s also not outside the realm of possibility.</p>
The Role of Hypothesis Tests
<p>We’ve placed our sample mean in the context of all possible sample means while assuming that the null hypothesis is true. Are these results statistically significant?</p>
<p>As you can see, there is no magic place on the distribution curve to make this determination. Instead, we have a continual decrease in the probability of obtaining sample means that are further from the null hypothesis value. Where do we draw the line?</p>
<p>This is where hypothesis tests are useful. A hypothesis test allows us to quantify the probability that our sample mean is unusual. In my next blog post, I’ll continue to use this graphical framework and add in the significance level and P-value to show how hypothesis tests work and what statistical significance means.</p>
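As a preview of that quantification, the tail probability of our sample mean under the null hypothesis can be computed directly (again assuming a sample standard deviation of 154, which is not quoted in the text):

```python
import numpy as np
from scipy import stats

n, null_mean, sample_mean = 25, 260, 330.6
s = 154                              # assumed sample standard deviation
se = s / np.sqrt(n)

t_stat = (sample_mean - null_mean) / se
# Two-sided probability of a sample mean at least this far from 260,
# if the population mean really were 260.
p_two_sided = 2 * stats.t.sf(abs(t_stat), df=n - 1)
print(round(t_stat, 2), round(p_two_sided, 3))
```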
Data Analysis | Hypothesis Testing | Statistics | Statistics Help | Stats
Thu, 05 Mar 2015 13:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics
Jim Frost

How to Be a Billionaire, Revealed by Pareto Charts
http://blog.minitab.com/blog/statistics-and-quality-improvement/how-to-be-a-billionaire-revealed-by-pareto-charts
<p>Forbes ranked the world’s billionaires for 2015 this week, which gives us a good opportunity to have fun with some data. After all, when you’re talking about billions, the most fun you can have is to see how big the number can get.</p>
<p><img alt="hundred dollar bills" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/512e332f22afe6ae5355cf00bc5b100d/1024px_money_cash.jpg" style="line-height: 20.7999992370605px; float: right; width: 200px; height: 133px; border-width: 1px; border-style: solid; margin: 10px 15px;" /></p>
<p>If you copy and paste Forbes’ data directly, you’ll find it a bit messy for analysis. For example, the sources of wealth for the billionaires include Telcom, Telecom, Telecom Equipment, Telecom Services, Telecommunications, and Telecoms. And that’s <em>after</em> <a href="http://support.minitab.com/en-us/minitab/17/topic-library/minitab-environment/calculator-and-matrices/text-calculator-functions/upper-lower-and-proper-functions/">you make sure that the capitalization is consistent</a>. My cleaning has not been too rigorous, but I think it’s enough to get started.</p>
Who to Work For
<p>Forbes ascribes the wealth of most of the billionaires on the list to industries, but supplies company names for some of the more familiar brands. Among those that I recognized as brands, here are the companies that support the most billionaires:</p>
<p><img alt="Companies listed as the source of wealth for the most billionaires" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/95575268dfd72a79705757dab330f4ff/who_to_work_for.png" style="width: 577px; height: 385px;" /></p>
<p>Although Mark Zuckerberg is the face of Facebook, Facebook’s created 7 other billionaires in its lifespan. Of course, that’s still not as many as Cargill, but Cargill seems to have an unfair edge in terms of diversification. Cargill includes everything from Crisco Vegetable Oil to Black River Asset Management LLC among its products.</p>
Where to Work
<p>You might look to see where most of the billionaires are, but then you would probably see only that countries with larger populations have more billionaires than countries with smaller populations. The results are harder to anticipate when you look at the average wealth of the billionaires in each country.</p>
<p><img alt="Countries with the wealthiest average billionaires" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/44fca4e3ede62b6bcab47431d3a28a10/what_country_to_work_in.png" style="width: 577px; height: 385px;" /></p>
<p>If you expected to see the United States, China, and Russia, they’re absent from the Top 10 list of countries in terms of the mean wealth of billionaires in the country. Mexico, influenced heavily by the wealth of Carlos Slim Helu, leads the list of countries where the mean wealth of billionaires is highest. Among these nations, France has the most billionaires with 46.</p>
What to Do
<p>Different countries are stronger in different industries. Forbes lists the source of wealth for four Finnish billionaires as “Elevators” and for three French billionaires as “Cheese.” But worldwide, there’s a clear winner.</p>
<p><img alt="Industries cited as the source of wealth for the most billionaires" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/fbaa2781235d503727b76bc8a36adcc7/what_job_to_get.png" style="width: 577px; height: 385px;" /></p>
<p>Employment numbers bode well for math and science jobs, but real estate is the industry that’s produced the most current billionaires. It’s not clear how many the Diversified category might add to any of the other sources, but I can imagine that many diversified billionaires make money from real estate and investments.</p>
Wrap Up
<p>So would being a realtor for Cargill in Mexico really put you on the path to being a billionaire? I’ll probably never be able to tell you from experience. But we can certainly dream.</p>
Bonus
<p>I used <a href="http://support.minitab.com/en-us/minitab/17/topic-library/quality-tools/quality-tools/pareto-chart-basics/">Pareto charts</a> to show these data. If you're ready for more, check out <a href="http://blog.minitab.com/blog/understanding-statistics/explaining-quality-statistics-so-your-boss-will-understand-pareto-charts">how to explain quality statistics so your boss will understand</a>.</p>
<p>The image of cash is by <a href="http://2bgr8stock.deviantart.com/art/Money-Cash-113445826">Amanda</a> and is licensed under this <a href="http://creativecommons.org/licenses/by/3.0/deed.en">Creative Commons license</a>.</p>
Statistics in the News
Wed, 04 Mar 2015 18:04:00 +0000
http://blog.minitab.com/blog/statistics-and-quality-improvement/how-to-be-a-billionaire-revealed-by-pareto-charts
Cody Steele

The Statistical Saga of Baby’s Weight
http://blog.minitab.com/blog/real-world-quality-improvement/the-statistical-saga-of-baby%E2%80%99s-weight
<p>Many things have shocked me since having my first baby back in August. I didn’t think it was possible to be so tired that it actually <em>hurt</em>, and I also didn’t think that changing 10+ diapers a day would actually be the norm (or that needing to perform 10+ outfit changes was even possible, let alone necessary). I also didn’t think that we’d fall in love so hard with the little guy. What a wonderful, rewarding experience it is to be a parent!</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/8d0131cf08c4d15fcede44bc26a4e091/photo_w1024.jpeg" style="line-height: 20.7999992370605px; float: right; width: 300px; height: 400px; border-width: 1px; border-style: solid; margin: 10px 15px;" /></p>
<p>That’s enough mushy talk for now. Let’s get back to the surprises involved in having a newborn. Another shock we experienced those first few days stemmed from the weight loss our son experienced. I certainly didn’t imagine that my <em>perfect</em> newborn would lose so much weight those first couple of days! After all, he was born at a very healthy 8 pounds 3 ounces, and I was doing all I could those first couple of days to ensure he was fed every 2 hours, on the dot. I didn’t know that newborn weight loss was even a thing, let alone a very common thing.</p>
<p>Here’s where things get cloudy and pretty crazy (please be sure to imagine my <em>very</em> ugly cry here, due to the aforementioned sleep deprivation). We took our son to his first doctor’s appointment a few days after his birthday, which included a weight check. According to the doctor, things weren’t looking good and he had lost “too much” weight. Our pediatrician followed what is known as the “10 percent rule of thumb” for breastfed babies, which basically means that a 7-10 percent weight loss after birth is considered normal. Our son had 12 ounces of weight loss, or about 9.2 percent of his total weight—the higher end of “normal.” But in my sleep-deprived mind, that 12 ounces became more than 1 pound of lost weight, and I was calling in all the troops to assess what was going wrong.</p>
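For the record, that percentage is simple ounce arithmetic on the numbers from the post:

```python
birth_oz = 8 * 16 + 3           # 8 lb 3 oz = 131 ounces at birth
loss_pct = 12 / birth_oz * 100  # the actual 12-ounce loss
print(round(loss_pct, 1))       # → 9.2 percent
```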
<p>I only wish that one of the troops I called on had been this cool newborn weight tool, known as <a href="https://www.newbornweight.org/" target="_blank">NEWT</a>. Folks at the Penn State College of Medicine and the Penn State Hershey Children’s Hospital developed a “growth chart” for infant weight loss in the first few days of a baby’s life that mimics the percentile approach commonly used by pediatricians for plotting the height, weight, and head circumference of children. (Before making the tool, the doctors knew they needed a large set of data for NEWT to be statistically sound. You can read more about how they got this data and implemented NEWT <a href="http://lancasteronline.com/news/local/newborn-weight-tool-developed-at-penn-state-hershey-medical-center/article_3211c9b6-7748-11e4-baff-b3f3db99957d.html" target="_blank">here</a>.)</p>
<p>Let’s take a look at where his weight loss fell on NEWT’s continuum:</p>
<p><img src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/047231a81219b9fd81afa1601ed97a3d/newt_w1024.jpeg" style="border-width: 0px; border-style: solid; width: 584px; height: 288px;" /></p>
<p>Now, I can definitely see our doctor’s cause for concern. After all, according to NEWT, results that tend toward higher percentile levels may provide early identification of adverse weight loss conditions. Our son's weight loss at about 61 hours after birth (see the light blue dot) fell just outside the 75th percentile.</p>
<p>However, since our son is a breastfed baby, his weight loss of 9.2% at three days old was still considered normal by most pediatricians, albeit on the higher end of normal (which NEWT also shows nicely). The doctors who created NEWT brought up a good point in the <a href="http://lancasteronline.com/news/local/newborn-weight-tool-developed-at-penn-state-hershey-medical-center/article_3211c9b6-7748-11e4-baff-b3f3db99957d.html" target="_blank">article</a> regarding the “10 percent rule of thumb”: a weight loss of 10 percent can matter a lot, or not at all, depending on <em>when</em> and at <em>what rate</em> it occurs.</p>
<p>But…at 3 days postpartum, I was <em>convinced</em> I heard the doctor say our son had lost 16 ounces of weight, which equates to a much scarier 12.2% weight loss. <em>Yikes!</em> Sleep deprivation does crazy things to people. Like most first-time parents, I wanted my baby to be, above all things, healthy and <em>normal</em>. The 12.2% weight loss my tired brain had fabricated wasn’t normal, but his actual weight loss (9.2%) wasn’t far from normal at all. </p>
<p>This all ended quite well, as two days later we headed back to the doctor for another weight check, and our son ended up gaining a whopping 9 ounces—putting his weight almost back to his birth weight. Our doctor likes to see breastfed babies reach their birth weight again about one week after their birthday. So we were right on track!</p>
<p>Since his weight has been a sore spot for me, I’ve been charting it using a <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/graphs/graphs-of-time-series/time-series-plots/time-series-plot/" target="_blank">Time Series Plot</a> in Minitab in time increments that have followed his doctor appointment schedule (2 weeks, 2 months, 4 months, etc.):</p>
<p><img src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/2999bb01b7a15e494137f542b32313a1/time_series_plot_w1024.jpeg" style="border-width: 0px; border-style: solid; width: 624px; height: 415px;" /></p>
<p>Moving past his initial newborn weight loss, I’m watching for dips and hoping for a steady climb. You can see that the little guy has been doing just fine gaining weight so far, and we may even want to call him the “big” guy now!</p>
<p>As a parent, I’m very thankful for statistics and statistical tools like NEWT and Minitab!</p>
Data Analysis | Statistics | Statistics in the News | Stats
Tue, 03 Mar 2015 13:00:00 +0000
http://blog.minitab.com/blog/real-world-quality-improvement/the-statistical-saga-of-baby%E2%80%99s-weight
Carly Barry

The Falling Child Project: Using Binary Logistic Regression to Predict Borewell Rescue Success
http://blog.minitab.com/blog/statistics-in-the-field/the-falling-child-project-%3A-using-binary-logistic-regression-to-predict-borewell-rescue-success
<p><em>by Lion "Ari" Ondiappan Arivazhagan, guest blogger. </em></p>
<p>An alarming number of borewell accidents, especially involving little children, have occurred across India in the recent past. This is the second of a series of articles on Borewell accidents in India. In the first installment of the series, I used the <a href="http://blog.minitab.com/blog/statistics-in-the-field/using-the-g-chart-control-chart-for-rare-events-to-predict-borewell-accidents">G-chart in Minitab</a> Statistical Software to predict the probabilities of innocent children falling into open borewells, which are sunk by farmers for agricultural and drinking water, while playing in the fields.</p>
<p>In this article, I will use the power of predictive analytics to predict the probability of successfully rescuing a trapped child based on the inputs of the child's age and gender using <a href="http://blog.minitab.com/blog/fun-with-statistics/analyzing-titanic-survival-rates-part-ii-v1">Binary Logistic Regression</a>.</p>
<p>In Minitab, we can use <strong>Stat > Regression > Binary Logistic Regression</strong> to create models when the response of interest (Rescue, in this case) is <em>binary</em> and only takes two values: successful or unsuccessful. </p>
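For readers outside Minitab, the same kind of model can be sketched in Python. Everything below is fabricated for illustration: the data-generating relationship, the sample values, and the use of scikit-learn are all assumptions, not the NGO's actual data or the author's method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)

# Fabricated stand-in for the accident data: age in years,
# gender (1 = male, 0 = female), outcome (True = successful rescue).
n = 62
age = rng.integers(1, 9, size=n)
gender = rng.integers(0, 2, size=n)
logit = -0.5 + 0.3 * age - 0.4 * gender         # invented relationship
outcome = rng.random(n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([age, gender])
model = LogisticRegression().fit(X, outcome)

# Predicted probability of a successful rescue for a 2-year-old boy
# versus a 2-year-old girl.
p_boy, p_girl = model.predict_proba([[2, 1], [2, 0]])[:, 1]
print(round(p_boy, 2), round(p_girl, 2))
```

As in Minitab, the fitted model turns a binary response into predicted probabilities for any combination of the predictors.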
<p>Borewell accidents data collected and provided by The Falling Child Project (www.fallingchild.org), a non-governmental organization (NGO) based in the United States, has been used for this predictive analysis.</p>
<p>Part of the raw data provided by the NGO is shown in Table 1 below. A total of 62 borewell accident cases in India have been documented from 2001 to January 2015.</p>
<p><img alt="data" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/07bbd31b1482ae1d91c814a04c9ffe2d/170c6d6_1_.jpg" style="width: 461px; height: 285px;" /></p>
<p>As part of the analysis, Minitab will predict probabilities for the events you are interested in, based on your model. The predicted probabilities for unsuccessful events versus the Predicted Age and Predicted Gender are shown in the scatterplot below.</p>
<p><img alt="scatterplot of events" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/836e2dfc2fc0061df235fc2bfd594b41/30987c8_1_.png" style="width: 577px; height: 385px;" /></p>
<p>We can predict, with 70% confidence, that the probability of an unsuccessful rescue is 15% higher for a male child of age 2 than for a female child of the same age. However, it is surprising to note that above age 5, girls have about a 10% higher chance of an unsuccessful rescue attempt than boys.</p>
<p><img alt="output" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/34a90bfdffb4d41a10d71a9770c87622/37a7c0c_1_.jpg" style="width: 512px; height: 288px;" /></p>
<p>I should note that one outlier, a male of age 60, was replaced with a male of age 6 to reduce the outlier's undue influence on the analysis and output.</p>
<strong>Inferences</strong>
<p>From the Binary Logistic Regression analysis above, we can predict that boys of age 5 and above have a greater chance of being successfully rescued than girls of the same age. Although the analysis indicates a P-value of 0.736 for the interaction term, hinting that there is little interaction between the child's age and gender in the predicted probabilities, the overall model's P-value is a reasonable 0.291, hinting at a moderate confidence level of roughly 70% in the model.</p>
<p>However, the scatterplot of predicted probabilities shown above paints a different picture. Age 5 seems to be a critical age beyond which girls have lower chances of being rescued alive than boys do.</p>
<p>My goal in performing this analysis and sharing my findings is to be helpful to the rescue teams that plan these rescue efforts, so that they can increase the chances of successfully rescuing every trapped child, boy or girl.</p>
<p> </p>
<p><strong>About the Guest Blogger:</strong></p>
<p><em>Ondiappan "Ari" Arivazhagan is an honors graduate in civil/structural engineering from the University of Madras. He is a certified PMP, PMI-SP, and PMI-RMP from the Project Management Institute. He is also a Master Black Belt in Lean Six Sigma and has studied Business Analytics at IIM Bangalore. He has 30 years of professional global project management experience in various countries and almost 14 years of teaching/training experience in project management, analytics, risk management, and Lean Six Sigma. He is the Founder-CEO of the International Institute of Project Management (IIPM), Chennai, and can be reached at <a href="mailto:askari@iipmchennai.com?subject=Minitab%20Blog%20Reader" style="box-sizing: border-box; color: rgb(66, 139, 202); text-decoration: none; background: 0px 0px;" target="_blank">askari@iipmchennai.com</a>.</em></p>
<p><em>An earlier version of this article was published on LinkedIn.</em></p>
<p> </p>
Regression Analysis | Statistics in the News
Mon, 02 Mar 2015 13:00:00 +0000
http://blog.minitab.com/blog/statistics-in-the-field/the-falling-child-project-%3A-using-binary-logistic-regression-to-predict-borewell-rescue-success
Guest Blogger

How Good is Kentucky…Really?
http://blog.minitab.com/blog/the-statistics-game/how-good-is-kentucky-really
<p><img alt="UK" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3f0d236982326c8fbb4f7eb2c72915e2/uk.png" style="float: right; width: 272px; height: 200px; margin: 10px 15px;" />As I’m sure you’ve heard by now, Kentucky is really good at basketball. They're the only team in the country without a loss, and they have a realistic shot at becoming the first team to win the championship with an undefeated record since the 1976 Indiana Hoosiers. Under any ranking system you want to use, Kentucky is clearly the #1 team in college basketball.</p>
<p>Well, <em>almost</em> any ranking system.</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/0f87a651e7e20fd8843b9a59643963d0/rpi_1.jpg" style="line-height: 1.6; width: 468px; height: 46px;" /></p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/5e24c7817dc7414a3a51a7e4a38b04b4/rpi_2.jpg" style="width: 366px; height: 90px;" /></p>
<p>All right, I have to confess that those rankings are from February 23. But still, Kansas had to lose <em>six</em> games before Kentucky finally moved ahead of them in the RPI. (Just a friendly reminder to ignore the RPI when filling out your brackets in March.)</p>
<p>Back to the question at hand: Kentucky is really good, but just how much better are they than top-ranked teams from previous years? To answer this question, I’ll use Minitab <a href="http://www.minitab.com/products/minitab">Statistical Software</a> to dig into the Pythagorean rating from <a href="http://kenpom.com/" target="_blank">kenpom.com</a>. Basically, this rating estimates what a team’s winning percentage "should" be based on the number of points they scored and allowed. Currently Kentucky’s Pythagorean rating is 0.9794. Last year, the #1 ranked team in the Pomeroy Ratings (Louisville) had a Pythagorean rating of 0.952. So even though both teams were ranked #1, we see that Kentucky is better due to the higher rating.</p>
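kenpom's actual rating is built from adjusted offensive and defensive efficiencies, so the sketch below is only an approximation: the generic points-based Pythagorean expectation, with an assumed exponent of 11.5 (a value commonly used for college basketball).

```python
def pythagorean_rating(points_scored, points_allowed, exponent=11.5):
    """Expected winning percentage from points scored and allowed."""
    return points_scored**exponent / (
        points_scored**exponent + points_allowed**exponent
    )

# A team outscoring opponents roughly 75-54 per game rates close to
# Kentucky's 0.979 under this approximation.
print(round(pythagorean_rating(75, 54), 3))
```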
Comparing Kentucky to Previous #1 Teams
<p>So how does Kentucky stack up to previous teams? I took the top ranked team in the Pomeroy Ratings for every year since 2002 (since that’s as far back as the ratings go). I also took the ratings <em>before</em> the NCAA tournament, to best represent the point in the season that Kentucky is currently at.</p>
<p><img alt="IVP" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/a374e238ff8aa9d88de446546c6f61db/individual_value_plot_of_pythagorean_rating.jpg" style="width: 650px; height: 433px;" /></p>
<p>The <a href="http://blog.minitab.com/blog/real-world-quality-improvement/three-ways-individual-value-plots-can-help-you-analyze-data">individual value plot</a> above makes it plain how much higher Kentucky’s rating is than #1 teams in previous years. In fact, from 2002-2014, the #1 ranked team in the Pomeroy Ratings had an average rating of 0.9614 with a standard deviation of 0.0084. That makes Kentucky’s rating more than 2 standard deviations higher than the average #1 team. Impressive.</p>
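That "more than 2 standard deviations" figure is a plain standardized score using the mean and standard deviation just quoted:

```python
mean_rating, sd_rating = 0.9614, 0.0084  # 2002-2014 #1 teams, pre-tournament
kentucky = 0.9794

z = (kentucky - mean_rating) / sd_rating
print(round(z, 2))  # → 2.14 standard deviations above the average #1 team
```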
How Will it Affect their Odds of Winning the Tournament?
<p>The great thing about the Pythagorean ratings is that you can use them to calculate the probability one team has of beating another. So let’s see how different ratings change the probability of Kentucky going on a hypothetical run through the NCAA tournament. I noted where Kentucky is in the latest Joe Lunardi mock bracket, and obtained the Pythagorean ratings of the 6 teams they would have to face (assuming the higher seed advanced in each round). Then I calculated the probability of beating each team with a rating equal to the average #1 team, the previous high rating (Illinois in 2005), and the rating Kentucky currently has.</p>
<table align="center" border="1" cellpadding="4">
<tr><th>Opponent</th><th>Average #1 Team</th><th>Illinois 2005</th><th>2015 Kentucky</th></tr>
<tr><td>Sacramento St</td><td>97%</td><td>98%</td><td>98%</td></tr>
<tr><td>Ohio St</td><td>76%</td><td>81%</td><td>86%</td></tr>
<tr><td>Notre Dame</td><td>78%</td><td>83%</td><td>87%</td></tr>
<tr><td>Wisconsin</td><td>58%</td><td>65%</td><td>73%</td></tr>
<tr><td>Gonzaga</td><td>57%</td><td>64%</td><td>71%</td></tr>
<tr><td>Virginia</td><td>47%</td><td>54%</td><td>63%</td></tr>
<tr><td><strong>Win the Championship</strong></td><td>9%</td><td>15%</td><td>24%</td></tr>
</table>
<p>Kentucky’s chances of winning the championship are 15 percentage points higher than the average #1 team, and 9 percentage points higher than the team that previously had the highest Pythagorean rating. But you’ll notice that their overall chance of winning is still only 1 out of 4...pretty low for what could be the greatest team ever. Of course, part of this is because I simply advanced the highest seed in each game, and that ended up being a brutal path. After Sacramento State, the remaining 5 teams are all in the Pomeroy Top 20 with 3 of them being in the top 6! And Virginia is so good they would actually be favored against the average #1 team!</p>
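The post doesn't name the formula used to turn two Pythagorean ratings into a head-to-head probability, but Bill James's log5 formula is the standard pairing for these ratings and reproduces the Kentucky-versus-Virginia entry in the table almost exactly; treat that pairing as an assumption.

```python
def win_prob(rating_a, rating_b):
    """log5: probability that team A beats team B, given each
    team's Pythagorean rating."""
    return (rating_a - rating_a * rating_b) / (
        rating_a + rating_b - 2 * rating_a * rating_b
    )

kentucky, virginia = 0.9794, 0.9661   # ratings quoted in the post
p_uva = win_prob(kentucky, virginia)
print(round(p_uva, 2))                # close to the 63% in the table

# A title run is six such games, so the championship probability is
# the product of six win probabilities (opponent ratings here are
# hypothetical, not the actual 2015 Pomeroy numbers).
opponents = [0.60, 0.90, 0.91, 0.95, 0.95, 0.96]
p_title = 1.0
for r in opponents:
    p_title *= win_prob(kentucky, r)
print(round(p_title, 2))
```

Chaining the six games is what drags even a dominant team's title probability well below its single-game win probabilities.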
<p>But we know that upsets happen in the tournament. So what would their probability look like with a slightly easier path? Let’s take the teams #1 seed Florida would have had to beat to win the 2014 tournament. The ratings for our 6 opponents come from the final 2014 Pomeroy Ratings.</p>
<table align="center" border="1" cellpadding="4">
<tr><th>Opponent</th><th>Average #1 Team</th><th>Illinois 2005</th><th>2015 Kentucky</th></tr>
<tr><td>Albany</td><td>97%</td><td>97%</td><td>98%</td></tr>
<tr><td>Pitt</td><td>76%</td><td>82%</td><td>86%</td></tr>
<tr><td>UCLA</td><td>75%</td><td>80%</td><td>85%</td></tr>
<tr><td>Dayton</td><td>86%</td><td>89%</td><td>92%</td></tr>
<tr><td>Connecticut</td><td>71%</td><td>77%</td><td>83%</td></tr>
<tr><td>Kentucky</td><td>73%</td><td>79%</td><td>84%</td></tr>
<tr><td><strong>Win the Championship</strong></td><td>25%</td><td>35%</td><td>46%</td></tr>
</table>
<p>Against an easier tournament run, Kentucky’s chances of winning are 21 percentage points greater than the average #1 team's. That's huge! Kentucky will most likely be the biggest favorite ever in this year's NCAA tournament. But even against weaker competition (only 1 of these 6 teams finished in the Pomeroy Top 10, and that team was only #8), Kentucky <em>still</em> has slightly less than a 50% chance of winning the championship. And that’s with their probability of winning each individual game at 83% or higher!</p>
<p>This just shows how hard it is to win 6 straight games in a single elimination tournament. And Kentucky’s path might look a little closer to the first table. Because……..well……..</p>
Kentucky Has Company
<p>Remember how Virginia would actually be favored against the average #1 team? That’s because despite being ranked #2 behind Kentucky, they are a very strong team. Not only is their Pythagorean rating higher than every other #2 ranked team from 2002-2014, it’s higher than 7 of the 13 #1-ranked teams!</p>
<p>I decided to look at every team in the top 10. I collected the ratings of the top 10 teams in the Pomeroy ratings (right before the NCAA tournament) from 2002-2014. For each ranking (1 through 10) I calculated the average rating, the third quartile, and the highest rating. For example, from 2002-2014 the average rating for the #2 ranked team was 0.9522, the third quartile was 0.9568, and the highest was 0.9589.</p>
<p>I then took the ratings of the current teams ranked in the Pomeroy top 10, and subtracted the values of teams from previous years. Virginia is currently the #2 ranked team with a rating of 0.9661. Since their rating is the highest of any #2 ranked team to come before them, their difference will be positive for the mean, third quartile, and the highest. Here are the results for the entire Pomeroy top 10, displayed in an Individual Value Plot created in <a href="http://www.minitab.com/products/minitab">Minitab Statistical Software</a>:</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/e66a11786b4271716b0014058fb8fb4d/ivp__.jpg" style="width: 650px; height: 433px;" /></p>
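The construction of each point-triplet in the plot can be sketched from the numbers the post quotes for the #2 slot (historical mean 0.9522, third quartile 0.9568, maximum 0.9589) and Virginia's current 0.9661 rating. This is only one ranking slot; repeating it for ranks 1 through 10 yields the full plot:

```python
# Difference between the current #2 team's rating and the historical
# summary statistics for the #2 slot, as described in the post.
historical_no2 = {"mean": 0.9522, "Q3": 0.9568, "max": 0.9589}  # 2002-2014 values from the post
virginia = 0.9661  # current #2-ranked team's Pomeroy rating

diffs = {stat: virginia - value for stat, value in historical_no2.items()}
for stat, d in diffs.items():
    print(f"Virginia vs. historical {stat}: {d:+.4f}")
```

All three differences come out positive, matching the post's observation that Virginia's rating tops every previous #2-ranked team.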
<p>Every team currently in the Pomeroy top 10 has a rating higher than the average for their ranking. Oklahoma (#9) is the only team with a rating less than the third quartile of the previous teams at its ranking. And, shockingly, 8 of the 10 teams in the top 10 have the highest rating of any similarly ranked team to come before them. The top-ranked teams are stacked. If you’re hoping this is the year that a 16 seed beats a 1 seed, don’t hold your breath.</p>
<p>Kentucky will still be an overwhelming favorite, but the data indicate that even against weaker teams their chances of winning would still be lower than 50%...and Kentucky’s top contenders this year are anything but weak. So don't think this is going to be a cakewalk for the Wildcats.</p>
<p>After all, they don't call it March Madness for nothing.</p>
Data AnalysisFun StatisticsStatisticsStatistics in the NewsFri, 27 Feb 2015 15:54:00 +0000http://blog.minitab.com/blog/the-statistics-game/how-good-is-kentucky-reallyKevin RudyCreating a New Metric with Gage R&R, part 2
http://blog.minitab.com/blog/understanding-statistics/creating-a-new-metric-with-gage-rr-part-2
<p style="line-height: 20.7999992370605px;">In my previous post, I showed you <a href="http://blog.minitab.com/blog/understanding-statistics/creating-a-new-metric-with-gage-rr-part-1">how to set up data collection for a gage R&R analysis</a> using the Assistant in Minitab 17. In this case, the goal of the gage R&R study is to test whether a new tool provides an effective metric for assessing resident supervision in a medical facility. </p>
<p style="line-height: 20.7999992370605px;"><span style="line-height: 20.7999992370605px;">As noted in that post, I'm drawing on one of my favorite bloggers about health care quality, David Kashmer of the Business Model Innovation in Surgery blog, and specifically his</span><span style="line-height: 20.7999992370605px;"> column "</span><a href="http://www.surgicalbusinessmodelinnovation.com/statistical-process-control/how-to-measure-a-process-when-theres-no-metric/" style="line-height: 20.7999992370605px;" target="_blank">How to Measure a Process When There's No Metric</a><span style="line-height: 20.7999992370605px;">." </span></p>
An Effective Measure of Resident Supervision?
<p style="line-height: 20.7999992370605px;">In one scenario Kashmer presents, state regulators and hospital staff disagree about a health system's ability to oversee residents. In the absence of an established way to measure resident <span style="line-height: 20.7999992370605px;">supervision</span><span style="line-height: 20.7999992370605px;">, the staff devises a tool that uses a 0 to 10 scale to rate resident supervision. </span></p>
<p style="line-height: 20.7999992370605px;">Now we're going to analyze the Gage R&R data to test how effectively and reliably the new tool <span style="line-height: 20.7999992370605px;">measures what we want it to measure</span><span style="line-height: 20.7999992370605px;">. The analysis will evaluate whether different people who use the tool </span><span style="line-height: 20.7999992370605px;">(the gauge)</span><span style="line-height: 20.7999992370605px;"> </span><span style="line-height: 20.7999992370605px;">reach the same conclusion (reproducibility) and do it consistently (repeatability). </span></p>
<p style="line-height: 20.7999992370605px;">To get data, three evaluators used the tool to assess each of 20 charts three times each, and recorded their score for each chart in the worksheet we produced earlier. (You can download the completed worksheet <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/File/02131d16de689b5864576174e86da023/gage_resident_supervision.MTW">here</a> if you're following along in Minitab.) </p>
<p style="line-height: 20.7999992370605px;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/96cfa33c2344135d665b93e2c637b017/data_sheet.gif" style="width: 332px; height: 381px;" /></p>
<p>Now we're ready to analyze the data. </p>
Evaluating the Ability to Measure Accurately
<p style="line-height: 20.7999992370605px;">Once again, we can turn to the Assistant in Minitab Statistical Software to help us. If you're not already using it, you can <a href="http://it.minitab.com/products/minitab/free-trial.aspx">download a 30-day trial version</a> for free so you can follow along. Start by selecting <strong>Assistant > Measurement Systems Analysis...</strong> from the menu: </p>
<p style="line-height: 20.7999992370605px;"><img alt="measurement systems analysis " src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/10b2080fd1ed8b3e1337e7838fd85313/assistant_msa.gif" style="width: 345px; height: 258px;" /></p>
<p style="line-height: 20.7999992370605px;">In my earlier post, we used the Assistant to set up this study and make it easy to collect the data we need. Now that we've gathered the data, we can follow the Assistant's decision tree to the "Analyze Data" option. </p>
<p style="line-height: 20.7999992370605px;"><img alt="measurement systems analysis decision tree for analysis" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f5fd33e7322a966327a9a5dc659294b6/gage_dialog_analyze.gif" style="width: 600px; height: 450px;" /></p>
<p style="line-height: 20.7999992370605px;">Selecting the right items for the Assistant's Gage R&R dialog box couldn't be easier—when you use the datasheet the Assistant generated, just enter "Operators" for Operators, "Parts" for Parts, and "Score" for Measurements. </p>
<p style="line-height: 20.7999992370605px;"><img alt="gage R&R analysis dialog box" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b65b3a39837f9875c9748c46d12498f8/grnr_analysis_dialog.png" style="line-height: 20.7999992370605px; width: 600px; height: 397px;" /></p>
<p style="line-height: 20.7999992370605px;"><span style="line-height: 20.7999992370605px;">Before we press OK, though, we need to tell the Assistant how to estimate process variation. When Gage R&R is performed in a manufacturing context, historic data about the amount of variation in the output of the process being studied is usually available. Since this is the first time we're analyzing the performance of the new tool for measuring the quality of resident supervision, we don't have an historical standard deviation</span><span style="line-height: 20.7999992370605px;">, so </span><span style="line-height: 20.7999992370605px;">we will tell the Assistant to estimate the variation from the data we're analyzing. </span></p>
<p style="line-height: 20.7999992370605px;"><span style="line-height: 20.7999992370605px;"><img alt="gage r&r variation calculation options" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0e55e7fca7d27eeaa22484359e996a8a/grnr_analysis_variation.png" style="width: 546px; height: 110px;" /></span></p>
<p style="line-height: 20.7999992370605px;"><span style="line-height: 20.7999992370605px;">The Assistant also asks for an upper or lower specification limit, or tolerance width</span><span style="line-height: 20.7999992370605px;">, which is the distance from the upper spec limit to the lower spec limit</span><span style="line-height: 20.7999992370605px;">. Minitab uses this to calculate %Tolerance, an optional statistic used to determine whether the measurement system can adequately sort good from bad parts—or in this case, good from bad supervision. For the sake of this example, let's say in designing the instrument you have selected a level of 5.0 as the minimum acceptable score. </span></p>
<p style="line-height: 20.7999992370605px;"><span style="line-height: 20.7999992370605px;"><img alt="gage r and r process tolerance" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/81c00cbc50239082a77d7e3e4c522afe/grnr_analysis_dialog_tolerance.png" style="width: 536px; height: 178px;" /> </span></p>
<p style="line-height: 20.7999992370605px;">When we press OK, the Assistant analyzes the data and presents a Summary Report, a Variation Report, and a Report Card for its analysis. The Summary Report gives us the bottom line about how well the new measurement system works. </p>
<p style="line-height: 20.7999992370605px;">The first item we see is a bar graph that answers the question, "Can you adequately assess process performance?" The Assistant's analysis of the data tells us that the system we're using to measure resident supervision can indeed assess the <span style="line-height: 20.7999992370605px;">resident supervision </span><span style="line-height: 20.7999992370605px;">process. </span></p>
<p style="line-height: 20.7999992370605px;"><img alt="gage R&R summary" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/36cd05b806ba3399849a5a7f562b6892/gage_r_r_summary_report.png" style="width: 600px; height: 471px;" /></p>
<p style="line-height: 20.7999992370605px;">The second bar graph answers the question "Can you sort good parts from bad?" In this case, we're evaluating patient supervision rather than parts, but the Analysis shows that the system is able to distinguish charts that indicate acceptable resident supervision from those that do not. </p>
<p style="line-height: 20.7999992370605px;">For both of these charts, less than 10% of the observed variation in the data could be attributed to the measurement system itself—a very good result.</p>
Measuring the "Unmeasurable"
<p style="line-height: 20.7999992370605px;">I can't count the number of times I've heard people say that they can't gather or analyze data about a situation because "it can't be measured." In most cases, that's just not true. Where a factor of interest—"service quality," say—is tough to measure <em>directly</em>, we can usually find measurable indicator variables that can at least give us some insight into our performance. </p>
<p style="line-height: 20.7999992370605px;">I hope this example, though simplified from what you're likely to encounter in the real world, shows how it's possible to demonstrate the effectiveness of a measurement system when one doesn't already exist. Even for outcomes that seem hard to quantify, we can create measurement systems to give us valuable data, which we can then use to make improvements. </p>
<p style="line-height: 20.7999992370605px;">What kinds of outcomes would you like to be able to measure in your profession? Could you use Gage R&R or another form of measurement system analysis to get started? </p>
Data AnalysisQuality ImprovementStatisticsStatistics HelpThu, 26 Feb 2015 13:00:00 +0000http://blog.minitab.com/blog/understanding-statistics/creating-a-new-metric-with-gage-rr-part-2Eston MartzCreating a New Metric with Gage R&R, part 1
http://blog.minitab.com/blog/understanding-statistics/creating-a-new-metric-with-gage-rr-part-1
<p>One of my favorite bloggers about the application of statistics in health care is David Kashmer, an MD and MBA who runs and writes for the <a href="http://www.surgicalbusinessmodelinnovation.com/" target="_blank">Business Model Innovation in Surgery</a> blog. If you have an interest in how quality improvement methods like Lean and Six Sigma can be applied to healthcare, check it out. </p>
<p>A while back, Dr. Kashmer penned a column called "<a href="http://www.surgicalbusinessmodelinnovation.com/statistical-process-control/how-to-measure-a-process-when-theres-no-metric/" target="_blank">How to Measure a Process When There's No Metric</a>," in which he discusses how you can use the measurement systems analysis method called Gage R&R (or gauge R&R) to create your own measurement tools and validate them as useful metrics. (I select the term “useful” here deliberately: a metric you’ve devised could be very <em>useful </em>in helping you assess your situation, but might not meet requirements set by agencies, auditors, or other concerned parties.) </p>
<p>I thought I would use this post to show you how you can use the Assistant in Minitab Statistical Software to <span style="line-height: 20.7999992370605px;">do this</span><span style="line-height: 1.6;">.</span></p>
How Well Are You Supervising Residents?
<p>Kashmer posits a scenario in which state regulators assert that your health system's ability to oversee residents is poor, but your team believes residents are well supervised. You want to assess the situation with data, but you lack an established way to measure the quality of resident supervision. What to do?</p>
<p>Kashmer says, "You decide to design a tool for your organization. You pull a sample of charts and look for commonalities that seem to display excellent supervision versus poor supervision."</p>
<p>So you work with your team to come up with a tool that uses a 0 to 10 scale to rate resident supervision<span style="line-height: 20.7999992370605px;">, based on various factors appearing on a chart</span>. But how do you know if the tool will actually help you assess the quality of resident supervision? </p>
<p>This is where gage R&R comes in. The gage refers to the tool or instrument you're testing, and the R&R stands for reproducibility and repeatability. The analysis will tell you whether different people who use your tool to assess resident supervision (the gauge) will reach the same conclusion (reproducibility) and do it consistently (repeatability). </p>
Collecting Data to Evaluate the Ability to Measure Accurately
<p>We're going to use the Assistant in Minitab Statistical Software to help us. If you're not already using it, you can <a href="http://it.minitab.com/products/minitab/free-trial.aspx">download a 30-day trial version</a> for free so you can follow along. Start by selecting <strong>Assistant > Measurement Systems Analysis...</strong> from the menu: </p>
<p><img alt="measurement systems analysis " src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/10b2080fd1ed8b3e1337e7838fd85313/assistant_msa.gif" style="width: 345px; height: 258px;" /></p>
<p>Follow the decision tree...</p>
<p><img alt="measurement systems analysis decision tree" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/5f5055a500745c69c183056582dc41a6/msa_decision_tree.gif" style="width: 600px; height: 450px;" /></p>
<p>If you're not sure about what you need to do in a gage R&R, clicking the <strong><em>more...</em></strong> link gives you requirements, assumptions, and guidelines to follow: </p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/62bf793d1ea0e4bbbdad53ffb70783e5/gager_rassumptions.gif" style="width: 600px; height: 454px;" /></p>
<p>After a look at the requirements, you decide you will have three evaluators use your new tool to assess each of 20 charts 3 times, and so you complete the dialog box thus: </p>
<p><img alt="MSA dialog box" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/067ec3e5997c68e845d4061a8251b862/msa_dialog.gif" style="width: 500px; height: 401px;" /></p>
<p style="line-height: 20.7999992370605px;">When you press "OK," the Assistant asks if you'd like to print worksheets you can use to easily gather your data:</p>
<p><img alt="gage R&R data collection form" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ffb7215f1d123d17cca1eaa3c11211d8/msa_gage_r_r_data_collection_form.gif" style="line-height: 20.7999992370605px; width: 400px; height: 430px;" /></p>
<p>Minitab also creates a datasheet for the analysis. All you need to do is enter the data you collect in the "Measurements" column:</p>
<p><img alt="worksheet" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/cfd21a76a9cab42059705c67e54e5fdc/gage_r_r_worksheet.gif" style="line-height: 20.7999992370605px; width: 355px; height: 357px;" /></p>
<p>Note that the Assistant automatically randomizes the order in which each evaluator will examine the charts in each of their three judging sessions. </p>
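As a rough illustration of the kind of randomized collection plan the Assistant builds (this is a stand-in sketch, not Minitab's actual randomization algorithm): each of the 3 evaluators scores all 20 charts once per session, in a freshly shuffled order, across 3 sessions:

```python
# Build a gage R&R data-collection plan: 3 operators x 3 sessions x 20 parts,
# with the part order randomized independently within each session.
import random

random.seed(1)  # fixed seed so the sheet is reproducible
operators = ["A", "B", "C"]
parts = list(range(1, 21))  # the 20 charts
sessions = 3

rows = []
for op in operators:
    for session in range(1, sessions + 1):
        order = parts[:]
        random.shuffle(order)  # each session gets its own random order
        for run, part in enumerate(order, start=1):
            rows.append((op, session, run, part))

print(len(rows))  # 3 operators x 3 sessions x 20 charts = 180 measurement rows
```

Each evaluator still sees every chart exactly three times; only the order within each session changes, which helps keep order effects out of the repeatability estimate.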
<p>Now we're ready to gather the data to verify the effectiveness of our new metric for assessing the quality of resident supervision. Come back for Part 2, where we'll <a href="http://blog.minitab.com/blog/understanding-statistics/creating-a-new-metric-with-gage-rr-part-2">analyze the collected data</a>! </p>
Health Care Quality ImprovementLean Six SigmaSix SigmaWed, 25 Feb 2015 13:00:00 +0000http://blog.minitab.com/blog/understanding-statistics/creating-a-new-metric-with-gage-rr-part-1Eston MartzGage Linearity and Bias: Wake Up and Smell Your Measuring System
http://blog.minitab.com/blog/statistics-and-quality-data-analysis/gage-linearity-and-bias%3A-wake-up-and-smell-your-measuring-system
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/b074146c1d3c9fde7367970e1220eb76/extra_large_coffee_cup.jpg" style="float: right; width: 250px; height: 251px; border-width: 1px; border-style: solid; margin: 10px 15px;" />Right now I’m enjoying my daily dose of morning joe. As the steam rises off the cup, the dark rich liquid triggers a powerful enzyme cascade that jump-starts my brain and central nervous system, delivering potent glints of perspicacity into the dark crevices of my still-dormant consciousness.</p>
<p>Feels good, yeah! But is it good for me? Let’s see what the studies say…</p>
<ul>
<li>Drinking more than 4 cups of coffee per day is associated with a <a href="http://www.ncbi.nlm.nih.gov/pubmed/22591295" target="_blank">higher risk of death from all causes</a></li>
<li>Drinking coffee is <em>inversely</em> associated with the mortality risk, with those drinking 4 cups a day having the <a href="http://www.ncbi.nlm.nih.gov/pubmed/25156996" target="_blank">lowest risk of death from all causes</a></li>
<li>Drinking 2 to 4 cups of coffee a day is associated with a <a href="http://www.ncbi.nlm.nih.gov/pubmed/22422331" target="_blank">higher risk of cardiovascular disease</a></li>
<li>Drinking 3.5 cups of coffee per day is associated with a <a href="http://www.ncbi.nlm.nih.gov/pubmed/24201300" target="_blank">lower risk of cardiovascular disease</a></li>
</ul>
<p>Hmm. These are just a few results from copious studies on coffee consumption. But already I'm having a hard time processing the information.</p>
<p>Maybe another cup of coffee would help. Er...uh...maybe not.</p>
The pivotal question you should ask before you perform any analysis
<p>There are a host of possible explanations that might help explain these seemingly contradictory study results.</p>
<p>Perhaps the studies utilized different study designs, different statistical methodologies, different survey techniques, different confounding variables, different clinical endpoints, or different populations. Perhaps the physiological effects of coffee are modulated by the dynamic interplay of a complex array of biomechanisms that are differently triggered in each individual based on their unique, dynamic phenotype-genotype profiles.</p>
<p>Or perhaps...just perhaps...there's something even more fundamental at play. The proverbial elephant in the room of any statistical analysis. The essential, pivotal question upon which all your results rest...</p>
<p><em>"What am I measuring? And how well am I actually measuring what I think I'm measuring?"</em></p>
Measurement system analysis helps ensure that your study isn't doomed from the start.
<p>A <span><a href="http://blog.minitab.com/blog/understanding-statistics/explaining-quality-statistics-so-my-boss-will-understand-measurement-systems-analysis-msa">measurement systems analysis (MSA)</a></span> evaluates the consistency and accuracy of a measuring system. MSA helps you determine whether you can trust your data <em>before</em> you use a statistical analysis to identify trends and patterns, test hypotheses, or make other general inferences.</p>
<p>MSA is frequently used for quality control in the manufacturing industry. In that context, the measuring system typically includes the data collection procedures, the tools and equipment used to measure (the "gage"), and the operators who measure.</p>
<p>Coffee consumption studies don't employ a conventional measuring system. Often, they rely on self-reported data from people who answer questionnaires about their lifestyle habits, such as "How many cups of coffee do you drink in a typical day?" So the measuring "system," loosely speaking, is every respondent who estimates the number of cups they drink. Despite this, could MSA uncover potential issues with measurements collected from such a survey? </p>
<p><strong>Caveat:</strong> What follows is an exploratory exercise performed with a small set of nonrandom data for illustrative purposes only. To see standard MSA scenarios and examples, including sample data sets, go to Minitab's <a href="http://support.minitab.com/en-us/datasets" target="_blank">online dataset library</a> and select the category <em>Measurement systems analysis</em>.</p>
<img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/a0bfc9c6ba82871132fcf40fa785416e/cups.jpg" style="width: 200px; height: 149px; float: right; border-width: 1px; border-style: solid; margin: 10px 15px;" />Gage Linearity and Bias: "Houston, we have a problem..."
<p>For this experiment (I can't call it a study), I collected different coffee cups from the cupboard of our department lunchroom (see image at right). Then I poured different amounts of liquid into each cup and asked people to tell me how full the cup was. The actual amount of liquid was 0.50 cup, 0.75 cup, or 1 cup, as measured using a standard measuring cup.</p>
<p>To evaluate the estimated "measurements" in relation to the actual reference values, I performed a gage linearity and bias study (<strong>Stat > Quality Tools > Gage Study > Gage Linearity and Bias Study</strong>). The results are shown below.</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/206aa287a18fca42fa11e31ccce49879/gage_linearity_and_bias_for_coffee_cups.jpg" style="width: 960px; height: 720px;" /></p>
<p><strong>Note:</strong> A gage linearity and bias study evaluates whether a measurement system has bias when compared to a known standard. It also assesses linearity—the difference in average bias through the expected operating range of the measuring device. For this example, I didn't enter an estimate of process variation, so the results don't include linearity estimates.</p>
<p>The Y axis shows the amount of bias (in this particular case, the amount estimated by each person using the different coffee cups minus the known amount of water, the standard). If the measurements perfectly matched the reference values, the data points on the graph would fall along the line bias = 0, with a slope of 0.</p>
<p>That's obviously not the case here. The estimated measurements for all three reference values show considerable negative bias. That is, when using the coffee cups in our department lunchroom as "gages", every person's estimated measurement was much smaller than the actual amount of liquid. Not a surprise, because the coffee cups are larger than a standard cup. (There are coffee cups that hold about one standard cup, by the way, such as <a href="https://images.search.yahoo.com/images/view;_ylt=AwrB8pnRUeJUSDAAEtKJzbkF;_ylu=X3oDMTIyMWVjZG80BHNlYwNzcgRzbGsDaW1nBG9pZAM2YWI0Y2YxYTczMzhjM2Y3MTdmZWFiN2RlNjA0MjFiOQRncG9zAzUEaXQDYmluZw--?.origin=&back=https%3A%2F%2Fimages.search.yahoo.com%2Fyhs%2Fsearch%3F_adv_prop%3Dimage%26va%3Dteema%2Bcoffee%2Bcup%2B0%252C22%2Bl%26fr%3Dyhs-mozilla-001%26hsimp%3Dyhs-001%26hspart%3Dmozilla%26tab%3Dorganic%26ri%3D5&w=450&h=450&imgurl=media-cache-ec0.pinimg.com%2F736x%2F3d%2Fed%2F13%2F3ded134470900ba66c746161838bcbc0.jpg&rurl=http%3A%2F%2Fpinterest.com%2Fpin%2F228346643577633964%2F&size=+9.9KB&name=%3Cb%3ETeema%3C%2Fb%3E+%3Cb%3ECoffee%3C%2Fb%3E+%3Cb%3ECup%3C%2Fb%3E+-+Kaj+Franck+-+Iittala+-+RoyalDesign.com&p=teema+coffee+cup+0%2C22+l&oid=6ab4cf1a7338c3f717feab7de60421b9&fr2=&fr=yhs-mozilla-001&tt=%3Cb%3ETeema%3C%2Fb%3E+%3Cb%3ECoffee%3C%2Fb%3E+%3Cb%3ECup%3C%2Fb%3E+-+Kaj+Franck+-+Iittala+-+RoyalDesign.com&b=0&ni=336&no=5&ts=&tab=organic&sigr=11cjp5aa4&sigb=14o2u26h3&sigi=12djkvq95&sigt=12elhiqco&sign=12elhiqco&.crumb=NFCYiF44SGZ&fr=yhs-mozilla-001&hsimp=yhs-001&hspart=mozilla" target="_blank">the cup that I use every morning</a>. But most Americans don't drink from coffee cups this small. It was designed back in the '50s, when most things—houses, grocery carts, cheeseburgers—were made in more modest proportions).</p>
<p>The Gage Bias table shows that the average bias increases in magnitude as the amount of liquid increases. And even though this was a small sample, the bias was statistically significant (P < 0.001). Importantly, notice that the bias wasn't consistent at each reference value—there is a considerable range of bias among the estimates at each reference value.</p>
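The bias calculation itself is simple: subtract the reference value from each estimate, then test whether the average bias differs from zero. A minimal sketch with invented guesses (not the author's actual lunchroom data) for the 0.75-cup reference:

```python
# Bias = estimated measurement - reference value; a one-sample t-statistic
# for the mean bias against 0 mirrors what the gage bias table reports.
# The estimates below are hypothetical, for illustration only.
from math import sqrt
from statistics import mean, stdev

reference = 0.75  # actual cups of liquid poured
estimates = [0.40, 0.45, 0.50, 0.42, 0.38, 0.48]  # people's guesses (invented)

bias = [x - reference for x in estimates]
mean_bias = mean(bias)
t_stat = mean_bias / (stdev(bias) / sqrt(len(bias)))  # one-sample t vs. 0

print(f"mean bias = {mean_bias:.3f}, t = {t_stat:.2f}")  # strongly negative
```

With every guess well below the reference, the mean bias is sharply negative and the t-statistic is far from zero, just as in the cup study.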
<p>Despite its obvious limitations, this informal, exploratory analysis provides some grounds for speculation.</p>
<p>What does "one cup of coffee" actually mean in studies that use self-reported data? What about categories such as 1-2 cups, or 2-4 cups? If it's not clear what x cups of coffee actually refers to, what do we make of risk estimates that are specifically associated with x number of cups of coffee? Or meta-analyses that combine self-reported coffee consumption data from different countries (equating one Japanese "cup of coffee", say, with one Australian "cup of coffee"?)</p>
<p>Of course, perfect data sets don't exist. And it's possible that some studies may manage to identify valid overall trends and correlations associated with increasing/decreasing coffee consumption.</p>
<p>Still, let's just say that a self-reported "cup of coffee" might best be served not with cream and sugar, but with a large grain of salt.</p>
So before you start brewing your data...
<p>And before you rush off to calculate p-values...it's worth taking the extra time and effort to make sure that you're actually measuring what you <em>think</em> you're measuring.</p>
Fun StatisticsStatisticsStatistics HelpStatsTue, 24 Feb 2015 13:00:00 +0000http://blog.minitab.com/blog/statistics-and-quality-data-analysis/gage-linearity-and-bias%3A-wake-up-and-smell-your-measuring-systemPatrick RunkelA Mommy’s Look at Scoliosis…A Study in Correlation
http://blog.minitab.com/blog/adventures-in-software-development/a-mommy%E2%80%99s-look-at-scoliosis%E2%80%A6a-study-in-correlation
<p><em>Juvenile Idiopathic Scoliosis.</em> That was the diagnosis given to my then 8-year-old daughter last January. In short, it means that she’s young (under 10), she exhibits an abnormal amount of spinal curvature, and there’s no identified cause (aside from some bad luck).<img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1ce5308b-c6c1-4dde-9593-741c6cebc514/Image/9e474dc07c4c9371120a9f65c93965bf/spine.jpg" style="width: 184px; height: 274px; float: right; border-width: 1px; border-style: solid; margin: 10px 15px;" /></p>
<p>Emilia’s x-rays indicated an S-shaped curve with 26 degrees at its largest curvature. To look at my healthy, active daughter, you’d never notice. However, on an x-ray, 26 degrees is quite alarming.</p>
<p>We learned quickly that the goal with scoliosis is to minimize further curvature, thereby preventing surgery. The typical solution: a brace. And, given her young age, it could be up to 5 years of wear.</p>
<p>Because Emilia was right on the edge of “bracing,” we had a decision to make: do we brace her now or wait and see? She’s our daughter and we want to do everything we can to support her. We definitely want to prevent surgery but we also want her to live an active life doing all of the things she loves: swimming, skiing, etc. How could we be sure wearing a brace will actually prevent curve progression? Does a relationship between brace wear and non-progression even exist? </p>
<p>A colleague, <a href="http://blog.minitab.com/blog/meredith-griffith/how-to-calculate-b10-life-with-statistical-software">Meredith Griffith</a>, found a particular study conducted at the Texas Scottish Rite Hospital for Children and reported by <em>The Journal of Bone and Joint Surgery</em>. A sample of 100 patients with curves between 25 and 45 degrees were each fitted with a brace containing a heat sensor for measuring the number of hours of brace wear. Once all patients reached skeletal maturity, doctors compared the number of hours of brace wear with the patient’s curve progression. Specifically, doctors were interested in a curve progression greater than or equal to 6 degrees as an indicator of brace treatment failure.</p>
<p>Based on this study, 82% of patients who wore the brace for more than twelve hours per day experienced successful brace treatment (<6 degrees of curve progression)! This result was more pronounced in patients who were less skeletally mature at the time of the study—indicating that earlier exposure to brace treatment offers a higher chance that the patient will experience minimal to no curve progression. It is also notable that as the hours of brace wear decrease, the rate of successful treatment decreases: those wearing a brace 7 to 12 hours per day showed a 61% treatment success rate, while those who wore the brace fewer than 7 hours per day showed only a 31% success rate. Ultimately, doctors found that a strong relationship—statistically speaking, a strong positive correlation—between hours of brace wear and non-progression exists. So as the number of hours of wear increases, so does the probability of non-progression, and vice versa.</p>
<p>The saying goes: “<a href="http://blog.minitab.com/blog/understanding-statistics/no-matter-how-strong-correlation-still-doesnt-imply-causation">Correlation does not imply causation.</a>” So although we cannot assume that wearing a brace for 24 hours each day throughout childhood development will yield no curve progression, we can assign probabilities or likelihoods to non-progression based on the hours of wear. Understanding the likelihood of non-progression will equip parents to make valid, data-driven decisions. </p>
<p>Emilia, being our level-headed, data-driven child, made the decision on her own: “Sounds like we need to brace it,” she told the doctor. I love that kid.</p>
<p>And so we did. Emilia wears a brace about 20 hours a day. She manages her time in it and it hasn’t slowed her down. She continues to be on the downhill race team, the swim team, and does everything else a 9-year-old does. Our adventure with scoliosis is a marathon and not a sprint, as our doctor would say. She has days where she doesn’t get a full 20 hours, but we manage and she always gets at least 12 hours of wear. She has a great attitude about it and wonderfully supportive friends.</p>
<p>And the results? At her 6-month checkup, her curvature measured 22 degrees. While there is measurement variation, the reading does indicate that it didn’t progress. As our spinal surgeon told us, “No indication of progression and the rest, well, that’s just gravy.” That’s fancy doctor-speak for “We’re going to Disney to celebrate!” And we did.</p>
<p>Good Job, Emilia! You are a rock star!</p>
Statistics | Mon, 23 Feb 2015 13:00:00 +0000 | Dawn Keller | http://blog.minitab.com/blog/adventures-in-software-development/a-mommy%E2%80%99s-look-at-scoliosis%E2%80%A6a-study-in-correlation

Crossed Gage R&R: How are the Variance Components Calculated?
http://blog.minitab.com/blog/marilyn-wheatleys-blog/crossed-gage-rr%3A-how-are-the-variance-components-calculated
<p>In technical support, we often receive questions about <span><a href="http://blog.minitab.com/blog/michelle-paret/gage-this-or-gage-that-how-the-number-of-distinct-categories-relates-to-the-study-variation">Gage R&R</a></span> and how Minitab calculates the amount of variation that is attributable to the various sources in a measurement system.</p>
<p>This post will focus on how the variance components are calculated for a crossed Gage R&R using the ANOVA table, and how we can obtain the %Contribution, StdDev, Study Var and %Study Var shown in the Gage R&R output. For this example, we will accept all of Minitab’s default values for the calculations.</p>
<p>The sample data used in this post is available within Minitab by navigating to <strong>File</strong> > <strong>Open Worksheet</strong>, and then clicking the <strong>Look in Minitab Sample Data folder</strong> button at the bottom of the dialog box. (If you're not already using Minitab, <a href="http://it.minitab.com/products/minitab/free-trial.aspx">get the free 30-day trial</a>.) The name of the sample data set is <strong>Gageaiag.MTW</strong>. For this data set, 10 parts were selected that represent the expected range of the process variation. Three operators measured the 10 parts, three times per part, in a random order.</p>
<p>To see the Gage R&R ANOVA tables in Minitab, we use <strong>Stat</strong> > <strong>Quality Tools</strong> > <strong>Gage Study</strong> > <strong>Gage R&R Study (Crossed)</strong>, and then complete the dialog box as shown below:</p>
<p><img alt="" spellcheck="true" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/48aecbaca4f0bc91fe82c149a6ebe99b/pic1.png" style="border-width: 1px; border-style: solid; width: 892px; height: 353px;" /></p>
<p>Minitab 17’s default alpha to remove the Part*Operator interaction is 0.05. Since the p-value for the interaction in the first ANOVA table is 0.974 (much greater than 0.05), Minitab removes the interaction and shows a second ANOVA table with no interaction.</p>
<p>To calculate the Variance Components, we turn to Minitab’s Methods and Formulas section: <strong>Help</strong> > <strong>Methods and Formulas </strong>> <strong>Measurement systems analysis</strong> > <strong>Gage R&R Study (Crossed)</strong>, and then choose <strong>VarComp for ANOVA method</strong> under <strong>Gage R&R table</strong>.</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/fa381e07132def3c0d6a8e34df7be583/pic2.PNG" style="width: 803px; height: 706px;" /></p>
<p>There are two parts to this section of Methods and formulas. The first provides the formulas used when the Operator*Part interaction is part of the model. In this example, the Operator*Part interaction was not significant and was removed. Therefore we use the formulas for the reduced model:</p>
<p><img alt="" spellcheck="true" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/227f3bb695f73fe56009f77ae34d46d8/pic3.png" style="border-width: 1px; border-style: solid; width: 535px; height: 311px;" /></p>
<p>The variance components section of the crossed Gage R&R output is shown below so we can compare our hand calculations to Minitab’s results:</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/a0c0882e1f3c94ff2d6a01df0acceb1f/pic4.png" style="border-width: 1px; border-style: solid; width: 309px; height: 169px;" /></p>
<p>We will do the hand calculations using the reduced ANOVA table for each source of variation:</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/2831d91347a6936ad7f3d2fa2fb8bd70/pic5.png" style="border-width: 1px; border-style: solid; width: 368px; height: 118px;" /></p>
<p>Repeatability is estimated as the Mean Square (MS column) for Repeatability in the ANOVA table, so the estimate for <u>Repeatability</u> is <strong>0.03997</strong>.</p>
<p>We can see the formula for Operator above. The number of replicates is the number of times each operator measured each part. We had 10 parts in this study, and each operator measured each of the 10 parts 3 times, so the denominator for the Operator calculation is 10*3. The numerator is the MS Operator – MS repeatability, so the formula for the variance component for the <u>Operator</u> is (1.58363-0.03997)/(10*3) = 1.54366/30 = <strong>0.0514553.</strong></p>
<p>Next, Methods and Formulas shows how to calculate the Part-to-Part variation. The b represents the number of operators (in this study we had 3), and n represents the number of replicates (that is also 3 since each operator measured each part 3 times). So the denominator for the Part-to-Part variation is 3*3, and the numerator is MS Part – MS Repeatability. Therefore, the <u>Part-to-Part</u> variation is (9.81799-0.03997)/(3*3) = <strong>1.08645</strong>.</p>
<p><u>Reproducibility</u> is easy, since it is the same as the variance component for Operator that we previously calculated: <strong>0.0514553</strong>.</p>
<p>For the last two calculations, we’re just adding the variance components for the sources that we previously calculated:</p>
<p><u>Total Gage R&R</u> = Repeatability + Reproducibility = 0.03997 + 0.0514553 = <strong>0.0914253</strong>.</p>
<p><u>Total Variation</u> = Total Gage R&R + Part-to-Part = 0.0914253 + 1.08645 = <strong>1.17788</strong>.</p>
<p>Notice that the Total Variation is the sum of all the variance components. The variances are additive so the total is just the sum of the other sources.</p>
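<p>To tie these steps together, here is a short Python sketch of the same hand calculations. The mean squares and study dimensions are the ones quoted above; this just mirrors the arithmetic in Methods and Formulas, not Minitab's internals.</p>

```python
# Variance components for the crossed Gage R&R (reduced model, since the
# Operator*Part interaction was removed), from the ANOVA mean squares.
ms_part = 9.81799           # MS for Part
ms_operator = 1.58363       # MS for Operator
ms_repeatability = 0.03997  # MS for Repeatability

n_parts = 10      # parts in the study
n_operators = 3   # operators
n_replicates = 3  # times each operator measured each part

repeatability = ms_repeatability
operator = (ms_operator - ms_repeatability) / (n_parts * n_replicates)
part_to_part = (ms_part - ms_repeatability) / (n_operators * n_replicates)
reproducibility = operator  # with no interaction, reproducibility = operator
total_gage_rr = repeatability + reproducibility
total_variation = total_gage_rr + part_to_part

print(f"Operator:       {operator:.7f}")       # 0.0514553
print(f"Part-to-Part:   {part_to_part:.5f}")   # 1.08645
print(f"Total Gage R&R: {total_gage_rr:.7f}")  # 0.0914253
```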
<p>The %Contribution of VarComp column is calculated using the variance components: the VarComp for each source is divided by the Total Variation:</p>
<table style="margin: 0 auto; border-collapse: collapse; text-align: center;">
<tr><th style="padding: 2px 12px;">Source</th><th style="padding: 2px 12px;">VarComp</th><th style="padding: 2px 12px;">Calculation</th><th style="padding: 2px 12px;">%Contribution</th></tr>
<tr><td style="text-align: left;"><strong>Total Gage R&R</strong></td><td>0.0914253</td><td>0.0914253/1.17788*100</td><td>7.76185</td></tr>
<tr><td style="text-align: left;"><strong>&nbsp;&nbsp;Repeatability</strong></td><td>0.03997</td><td>0.03997/1.17788*100</td><td>3.39338</td></tr>
<tr><td style="text-align: left;"><strong>&nbsp;&nbsp;Reproducibility</strong></td><td>0.0514553</td><td>0.0514553/1.17788*100</td><td>4.36847</td></tr>
<tr><td style="text-align: left;"><strong>&nbsp;&nbsp;&nbsp;&nbsp;Operator</strong></td><td>0.0514553</td><td>0.0514553/1.17788*100</td><td>4.36847</td></tr>
<tr><td style="text-align: left;"><strong>Part-To-Part</strong></td><td>1.08645</td><td>1.08645/1.17788*100</td><td>92.2377</td></tr>
<tr><td style="text-align: left;">Total Variation</td><td>1.17788</td><td>1.17788/1.17788*100</td><td>100</td></tr>
</table>
<p><span style="line-height: 1.6;">Now that we’ve replicated the Variance components output, we can use these values to re-create the last table in Minitab’s Gage R&R output:</span></p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/0bfe7807f260f4798ce577f55125c1c7/pic6.png" style="border-width: 1px; border-style: solid; width: 389px; height: 128px;" /></p>
<p>The StdDev column is simple: we just take the square root of each value in the VarComp column. The Total Variation value in the StdDev column is the square root of the corresponding VarComp value (it is not the sum of the standard deviations):</p>
<table style="margin: 0 auto; border-collapse: collapse; text-align: center;">
<tr><th style="padding: 2px 12px;">Source</th><th style="padding: 2px 12px;">VarComp</th><th style="padding: 2px 12px;">Square Root of VarComp = StdDev</th><th style="padding: 2px 12px;">6 x StdDev = Study Var</th></tr>
<tr><td style="text-align: left;"><strong>Total Gage R&R</strong></td><td>0.0914253</td><td>0.302366</td><td>1.81420</td></tr>
<tr><td style="text-align: left;"><strong>&nbsp;&nbsp;Repeatability</strong></td><td>0.03997</td><td>0.199925</td><td>1.19955</td></tr>
<tr><td style="text-align: left;"><strong>&nbsp;&nbsp;Reproducibility</strong></td><td>0.0514553</td><td>0.226838</td><td>1.36103</td></tr>
<tr><td style="text-align: left;"><strong>&nbsp;&nbsp;&nbsp;&nbsp;Operator</strong></td><td>0.0514553</td><td>0.226838</td><td>1.36103</td></tr>
<tr><td style="text-align: left;"><strong>Part-To-Part</strong></td><td>1.08645</td><td>1.04233</td><td>6.25397</td></tr>
<tr><td style="text-align: left;"><strong>Total Variation</strong></td><td>1.17788</td><td>1.08530</td><td>6.51181</td></tr>
</table>
<p><span style="line-height: 1.6;">Finally, the %Study Var column is calculated by dividing the Study Var for each source by the Study Var value in the Total Variation row. For example, the %Study Var for Repeatability is 1.19955/6.51181*100 = 18.4211%.</span></p>
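<p>For readers who want to check the arithmetic end to end, here is a small Python sketch that rebuilds the %Contribution, StdDev, Study Var, and %Study Var columns from the VarComp values quoted above. This is an illustration of the formulas, not Minitab's code.</p>

```python
import math

# VarComp values from the Gage R&R output above.
sources = [
    ("Total Gage R&R", 0.0914253),
    ("Repeatability", 0.03997),
    ("Reproducibility", 0.0514553),
    ("Operator", 0.0514553),
    ("Part-To-Part", 1.08645),
    ("Total Variation", 1.17788),
]

total_varcomp = 1.17788
total_study_var = 6 * math.sqrt(total_varcomp)

for name, varcomp in sources:
    pct_contribution = 100 * varcomp / total_varcomp   # %Contribution
    std_dev = math.sqrt(varcomp)                       # StdDev
    study_var = 6 * std_dev                            # Study Var
    pct_study_var = 100 * study_var / total_study_var  # %Study Var
    print(f"{name:<16} {pct_contribution:8.4f} {std_dev:.6f} "
          f"{study_var:.5f} {pct_study_var:8.4f}")
```

Running this reproduces, for example, the Repeatability row: %Contribution 3.39338, StdDev 0.199925, Study Var 1.19955, and %Study Var 18.4211.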
<p>I hope this post helps you understand where these numbers come from in a Gage R&R. Let’s just be glad that we have Minitab to do the calculations behind the scenes so we don’t have to do this by hand every time!</p>
Statistics, Statistics Help | Fri, 20 Feb 2015 13:00:00 +0000 | Marilyn Wheatley | http://blog.minitab.com/blog/marilyn-wheatleys-blog/crossed-gage-rr%3A-how-are-the-variance-components-calculated

Choosing Between a Nonparametric Test and a Parametric Test
http://blog.minitab.com/blog/adventures-in-statistics/choosing-between-a-nonparametric-test-and-a-parametric-test
<p>It’s safe to say that most people who use statistics are more familiar with parametric analyses than nonparametric analyses. Nonparametric tests are also called distribution-free tests because they don’t assume that your data follow a specific distribution.</p>
<p>You may have heard that you should use nonparametric tests when your data don’t meet the assumptions of the parametric test, especially the assumption about normally distributed data. That sounds like a nice and straightforward way to choose, but there are additional considerations.</p>
<p>In this post, I’ll help you determine when you should use a:</p>
<ul>
<li>Parametric analysis to test group means.</li>
<li>Nonparametric analysis to test group medians.</li>
</ul>
<p>In particular, I'll focus on an important reason to use nonparametric tests that I don’t think gets mentioned often enough!</p>
Hypothesis Tests of the Mean and Median
<p>Nonparametric tests are like a parallel universe to parametric tests. The table shows related pairs of <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/basics/hypothesis-tests-in-minitab/" target="_blank">hypothesis tests</a> that <a href="http://www.minitab.com/en-us/products/minitab/features/" target="_blank">Minitab statistical software</a> offers.</p>
<table style="margin: 0 auto; border-collapse: collapse; text-align: center;">
<tr><th style="padding: 2px 12px;">Parametric tests (means)</th><th style="padding: 2px 12px;">Nonparametric tests (medians)</th></tr>
<tr><td>1-sample t test</td><td>1-sample Sign, 1-sample Wilcoxon</td></tr>
<tr><td>2-sample t test</td><td>Mann-Whitney test</td></tr>
<tr><td>One-Way ANOVA</td><td>Kruskal-Wallis, Mood’s median test</td></tr>
<tr><td>Factorial DOE with one factor and one blocking variable</td><td>Friedman test</td></tr>
</table>
Reasons to Use Parametric Tests
<p><strong>Reason 1: Parametric tests can perform well with skewed and nonnormal distributions</strong></p>
<p>This may be a surprise, but parametric tests can perform well with continuous data that are nonnormal if you satisfy these sample size guidelines.</p>
<table style="margin: 0 auto; border-collapse: collapse; text-align: center;">
<tr><th style="padding: 2px 12px;">Parametric analyses</th><th style="padding: 2px 12px;">Sample size guidelines for nonnormal data</th></tr>
<tr><td>1-sample t test</td><td>Greater than 20</td></tr>
<tr><td>2-sample t test</td><td>Each group should be greater than 15</td></tr>
<tr><td>One-Way ANOVA</td><td style="text-align: left;">
<ul>
<li>If you have 2-9 groups, each group should be greater than 15.</li>
<li>If you have 10-12 groups, each group should be greater than 20.</li>
</ul>
</td></tr>
</table>
<p><strong>Reason 2: Parametric tests can perform well when the spread of each group is different</strong></p>
<p>While nonparametric tests don’t assume that your data follow a normal distribution, they do have other assumptions that can be hard to meet. For nonparametric tests that compare groups, a common assumption is that the data for all groups must have the same spread (dispersion). If your groups have a different spread, the nonparametric tests might not provide valid results.</p>
<p>On the other hand, if you use the 2-sample t test or One-Way ANOVA, you can simply go to the <strong>Options</strong> subdialog and uncheck <em>Assume equal variances</em>. Voilà, you’re good to go even when the groups have different spreads!</p>
<p><strong>Reason 3: Statistical power</strong></p>
<p>Parametric tests usually have more <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/power-and-sample-size/what-is-power/" target="_blank">statistical power</a> than nonparametric tests. Thus, you are more likely to detect a significant effect when one truly exists.</p>
Reasons to Use Nonparametric Tests
<p><strong>Reason 1: Your area of study is better represented by the median</strong></p>
<p><img alt="Comparing two skewed distributions" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/7223b01bc095dbd652bd863be5288cfe/mean_or_median.png" style="float: right; width: 200px; height: 181px; margin: 10px 15px;" />This is my favorite reason to use a nonparametric test and the one that isn’t mentioned often enough! The fact that you <em>can</em> perform a parametric test with nonnormal data doesn’t imply that the mean is the best <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/summary-statistics/measures-of-central-tendency/" target="_blank">measure of the central tendency</a> for your data.</p>
<p>For example, the center of a skewed distribution, like income, can be better measured by the median where 50% are above the median and 50% are below. If you add a few billionaires to a sample, the mathematical mean increases greatly even though the income for the typical person doesn’t change.</p>
<p>When your distribution is skewed enough, the mean is strongly affected by changes far out in the distribution’s tail whereas the median continues to more closely reflect the center of the distribution. For these two distributions, a random sample of 100 from each distribution produces means that are significantly different, but medians that are not significantly different.</p>
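<p>As a quick illustration of this point, here is a small simulation of my own (the lognormal “income” data below are made up for illustration, not the distributions pictured): adding a few billionaires to a skewed sample barely moves the median but inflates the mean.</p>

```python
import random
import statistics

random.seed(1)
# Simulated right-skewed "incomes" (hypothetical numbers, for illustration).
incomes = [random.lognormvariate(10, 0.5) for _ in range(1000)]

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)

incomes += [1_000_000_000] * 3  # add a few billionaires to the sample

mean_ratio = statistics.mean(incomes) / mean_before
median_ratio = statistics.median(incomes) / median_before
print(f"mean grew by a factor of about {mean_ratio:.0f}")
print(f"median grew by a factor of about {median_ratio:.3f}")
```

With these numbers the mean grows by a factor of over a hundred while the median changes by a fraction of a percent, which is exactly why the median better represents the typical person in skewed data.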
<p>Two of my colleagues have written excellent blog posts that illustrate this point:</p>
<ul>
<li>Michelle Paret: <a href="http://blog.minitab.com/blog/michelle-paret/using-the-mean-its-not-always-a-slam-dunk" target="_blank">Using the Mean in Data Analysis: It’s Not Always a Slam-Dunk</a></li>
<li>Redouane Kouiden: <a href="http://blog.minitab.com/blog/statistics-for-lean-six-sigma/the-non-parametric-economy-what-does-average-actually-mean" target="_blank">The Non-parametric Economy: What Does Average Actually Mean?</a></li>
</ul>
<p><strong>Reason 2: You have a very small sample size</strong></p>
<p>If you don’t meet the sample size guidelines for the parametric tests and you are not confident that you have normally distributed data, you should use a nonparametric test. When you have a really small sample, you might not even be able to ascertain the distribution of your data because the distribution tests will lack sufficient power to provide meaningful results.</p>
<p>In this scenario, you’re in a tough spot with no valid alternative. Nonparametric tests have less power to begin with and it’s a double whammy when you add a small sample size on top of that!</p>
<p><strong>Reason 3: You have ordinal data, ranked data, or outliers that you can’t remove</strong></p>
<p>Typical parametric tests can only assess continuous data, and their results can be significantly affected by outliers. Conversely, some nonparametric tests can handle ordinal data and ranked data, and they are not seriously affected by outliers. Be sure to check the assumptions for the nonparametric test, because each one has its own data requirements.</p>
Closing Thoughts
<p>It’s commonly thought that the need to choose between a parametric and nonparametric test occurs when your data fail to meet an assumption of the parametric test. This can be the case when you have both a small sample size and nonnormal data. However, other considerations often play a role because parametric tests can often handle nonnormal data. Conversely, nonparametric tests have strict assumptions that you can’t disregard.</p>
<p>The decision often depends on whether the mean or median more accurately represents the center of your data’s distribution.</p>
<ul>
<li>If the mean accurately represents the center of your distribution and your sample size is large enough, consider a parametric test because they are more powerful.</li>
<li>If the median better represents the center of your distribution, consider the nonparametric test even when you have a large sample.</li>
</ul>
<p>Finally, if you have a very small sample size, you might be stuck using a nonparametric test. Please, collect more data next time if it is at all possible! As you can see, the sample size guidelines aren’t really that large. Your chance of detecting a significant effect when one exists can be very small when you have both a small sample size and you need to use a less efficient nonparametric test!</p>
Hypothesis Testing, Statistics, Statistics Help | Thu, 19 Feb 2015 13:00:00 +0000 | Jim Frost | http://blog.minitab.com/blog/adventures-in-statistics/choosing-between-a-nonparametric-test-and-a-parametric-test

Using Regression to Evaluate Project Results, part 2
http://blog.minitab.com/blog/statistics-in-the-field/using-regression-to-evaluate-project-results%2C-part-2
<p>In part 1 of this post, I covered how Six Sigma students at Rose-Hulman Institute of Technology cleaned up and <a href="http://blog.minitab.com/blog/statistics-in-the-field/using-regression-to-evaluate-project-results%2C-part-1">prepared project data for a regression analysis</a>. Now we're ready to start our analysis. We’ll detail the steps in that process and what we can learn from our results.</p>
<img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/34f7a59b815a9a51c5a54e75b4041853/plastic.png" style="border-width: 1px; border-style: solid; margin: 10px 15px; float: right; width: 200px; height: 200px;" />
What Factors Are Important?
<p>We collected data about 11 factors we believe could be significant:</p>
<ul>
<li>Whether the date of collection was a Monday or a Tuesday</li>
<li>The number of trashcans in a team's area</li>
<li>The ratio of recycle bins to trash cans</li>
<li>Number of plastic cups and bottles collected</li>
<li>Number of Java City (a coffee shop on campus) cups collected</li>
<li>Number of paper sheets collected</li>
<li>Number of newspapers collected</li>
<li>Number of glass bottles collected</li>
<li>Number of aluminum cans collected</li>
<li>Number of cardboard items collected</li>
<li>Whether the data was collected pre or post improvement</li>
</ul>
<p>Just because we collected data about 11 factors doesn't mean that they are all important. Any good regression model should attempt to keep the number of factors down to a minimum. So how do we go about finding out which factors are important? The easiest way is to use Minitab's Best Subsets regression tool! Best Subsets evaluates and gives you important descriptive statistics about the regression models that can be formed from the different combinations of factors. The resulting output table lists the number of factors in each model, R2 and adjusted R2, and also tells us which factors are included in each model.</p>
<p><img alt="best subsets regression analysis" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0fbd4228fd006e529bcd6ae1088998d5/best_subsets_regression.png" style="width: 504px; height: 930px;" /></p>
<p align="center"><em>Results of the Best Subsets</em></p>
Looking at Adjusted R2
<p>The output from the Best Subsets analysis gave us quite a lot of potential models we could use. Which one should we choose? We used two components to narrow down the options. The first was the adjusted R2 values, since this statistic takes into account the number of variables used. We want this value to be as high as possible. When we plot the adjusted R2 values against the number of factors in each model, we see a point where adding additional factors has diminishing returns. For this set of data, that point was at five factors.</p>
<p><img alt="scatterplot-of-r2-vs-variables" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/167b5b1d19db1b7e1c6a1970b842596e/scatterplot_of_r_2.png" style="width: 577px; height: 385px;" /></p>
<p>Notice how at 5 variables and beyond the adjusted R-squared value hits a plateau? That’s our point of diminishing returns!</p>
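<p>If you want to see the idea behind this scan, here is a rough Python sketch on synthetic data (the predictors and response below are invented for illustration; in practice Minitab's Best Subsets does this evaluation for you): fit every combination of predictors and rank the models by adjusted R-squared.</p>

```python
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
n = 80
X = rng.normal(size=(n, 6))
# Only predictors 0 and 2 actually drive the (synthetic) response.
y = 2 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=n)

def adjusted_r2(X_sub, y):
    """Ordinary least squares with an intercept; penalizes extra predictors."""
    n_obs, p = X_sub.shape
    A = np.column_stack([np.ones(n_obs), X_sub])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    ss_res = float(((y - A @ beta) ** 2).sum())
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - p - 1)

scores = [
    (adjusted_r2(X[:, list(subset)], y), subset)
    for k in range(1, 7)
    for subset in combinations(range(6), k)
]
best_score, best_subset = max(scores)
print(best_subset)  # the two informative predictors should appear here
```

Because adjusted R-squared penalizes each added predictor, the junk predictors buy little or nothing, and the winning subset contains the genuinely informative factors.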
The Factors that Always Seem to Appear
<p>The second component we considered was which factors consistently appeared in the top models generated. If these factors keep appearing in the top models, we reasoned, there's a good chance they’re significant.</p>
<p>When we look at the results from our Best Subsets, we find that five factors are consistently chosen by the algorithm: The number of plastics, paper, newspaper, aluminum, and the effect of the improvement efforts.</p>
<p>Identifying those five factors enables us to generate our final model.</p>
Verifying the Final Model
<p>Great! So we went through all this and got ourselves a model. Now we are ready to make conclusions, right? Not quite. We still need to ensure that the model we’ve created adheres to the assumptions that are associated with regression analysis. If our model does not meet these assumptions, then we can't make any definitive conclusions. Luckily for us, the process doesn't change from before.</p>
<p>As before, first we need to check whether the mean error is zero and the data is homoscedastic.</p>
<p align="center"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b02632530203f0f07679753e0602fb22/scatterplots_7.png" style="width: 800px; height: 532px;" /><br />
<img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1407f6f93f61d70b11ef64c135480571/scatterplots_8.png" style="width: 800px; height: 268px;" /><br />
<em>Plots used to verify regression assumptions.</em></p>
<p><span style="line-height: 1.6;">As before, the plots indicate that we have no reason to assume that the data is not IID. Moving on, we check whether the residuals are normally distributed.</span></p>
<p style="text-align: center;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d57722f8f2dc4bb195d142ba9124ee70/probability_plot_of_tres1.png" style="width: 577px; height: 385px;" /><em><span style="line-height: 1.6;">Normality plot</span></em></p>
<p><span style="line-height: 1.6;">Last but not least, we continue to assume that the teams can count, and therefore that there is no error in our predictor values.</span></p>
<p>It appears that this new model does in fact meet the regression assumptions. The final model created from this data is:</p>
<p align="center" style="margin-left: 40px;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d92a497cfc18e5be0b70aab115b2cdf1/final_regression_model.png" style="width: 597px; height: 45px; float: left;" /></p>
<p align="center"> </p>
Final Results: What Have We Learned?
<p>At the end of all of this, we determined our regression model, all ready to go and verified. But what does this single equation we created tell us? What can we use it for?</p>
<p>For starters, we now have an accurate model that we can use to predict the weight of recyclables disposed of in the trash, based on just five factors. This is nice, as we can predict the weight of recyclables from various areas simply by looking at which items are present in the trash!</p>
<p>We also learned that of the 11 factors we started with, only five of them have a significant relationship with the weight of the recyclables. Plastic cups and bottles, sheets of paper, newspapers, and aluminum cans were found to be significant contributors to the total weight of recyclables disposed of in the trash. This is important to know, since it tells us what to focus on in future efforts.</p>
<p>The last factor that was found to be significant was the effect of the improvement phase of our project. More importantly, if you look at the equation for the final model, this factor has a negative constant associated with it. This tells us that our efforts have been successful, as the effort was statistically significant and in a manner that <em>decreased</em> the amount of recyclables thrown away in the trash.</p>
<p>Now that wasn’t too bad, was it? With regression and a little help from <a href="http://www.minitab.com/products/minitab">Minitab</a>, there was no chance our data analysis efforts would go to waste!</p>
<p> </p>
<p style="line-height: 20.7999992370605px;"><strong>About the Guest Blogger</strong></p>
<p style="line-height: 20.7999992370605px;"><em>Peter Olejnik is a graduate student at the Rose-Hulman Institute of Technology in Terre Haute, Indiana. He holds a bachelor’s degree in mechanical engineering and his professional interests include controls engineering and data analysis.</em></p>
<div> </div>
Regression Analysis | Tue, 17 Feb 2015 13:00:00 +0000 | Guest Blogger | http://blog.minitab.com/blog/statistics-in-the-field/using-regression-to-evaluate-project-results%2C-part-2

Are Big 10 Basketball Referees Biased Towards Winning Teams?
http://blog.minitab.com/blog/the-statistics-game/are-big-10-basketball-referees-biased-towards-winning-teams
<p>Over the weekend Penn State men's basketball coach Pat Chambers had <a href="http://espn.go.com/video/clip?id=12330724" target="_blank">some strong words</a> about a foul that went against his team in a 76-73 loss against Maryland. Chambers called it “The worst call I’ve ever seen in my entire life,” and he wasn’t alone in his thinking. Even sports media members with no affiliation to Penn State agreed with him.</p>
<p style="margin-left: 40px;"><img alt="Jay Bilas Tweet" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/63685fde5f957594d494a313bb7e230e/bilas.jpg" style="width: 500px; height: 194px;" /></p>
<p style="margin-left: 40px;"><img alt="Dan Dakich Tweet" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/e776cc355b2fd77c360be363cb33aded/dakich.jpg" style="width: 500px; height: 179px;" /></p>
<p>This wasn't the first time this season Chambers has criticized the referees. After a game against Michigan State, Chambers said he thought the teams and coaches affected how the referees called the end of close games.</p>
<p>"At some point, this thing's got to stop, it's got to switch, I'm not a Hall of Fame coach," Chambers said. "Nothing against Tom (Izzo), nothing against John Beilein, nothing against all these other guys, but it's got to stop."</p>
<p>This quote goes along the same thinking of the tweet by Dan Dakich. In a close game, will the refs be biased towards the more established coach or team? Now, I'm not saying the refs are having a secret meeting before the game, going "Okay, Michigan St is an elite program, if this game is close we need to make sure they win." But if you're a ref being yelled at by Tom Izzo/Bo Ryan/Thad Matta and on the other side is well, Pat Chambers, who are you going to listen to?</p>
<p>One way to answer this question is to look at the outcome of close games. Specifically, games decided by 2 possessions or less or that go into overtime (and from here on out, I'll just refer to these as "close games"). We've previously seen that <a href="http://blog.minitab.com/blog/the-statistics-game/analyzing-luck-in-college-basketball-part-ii" target="_blank">the outcome in close games is pretty random</a>. In the long run, you'll win about half your close games and lose about half of them. However, if the referees are consistently making calls that go against you (or for you) it's possible that your winning percentage could deviate from .500.</p>
Penn State's Big 10 Record in Close Games
<p>Since Pat Chambers has coached Penn State, they are 8-19 (.296) in close Big 10 games. And in close out-of-conference (OOC) games under Chambers (which are usually against traditionally weaker schools), they are 12-5 (.706). That's a huge difference! It's pretty easy to see why Chambers is so upset. However, we're dealing with some pretty small samples. So before we jump to any conclusions, we better increase our sample size. </p>
<p>I collected data for every close Penn State game since 2002. I got my data from <a href="http://kenpom.com" target="_blank">kenpom.com</a>, which only goes back to 2002; that's why I chose that year. The following table has the results.</p>
<table>
<tr><th>Type of Game</th><th>Wins</th><th>Losses</th><th>Winning Percentage</th></tr>
<tr><td>Out-of-Conference</td><td>28</td><td>18</td><td>.609</td></tr>
<tr><td>Big 10</td><td>38</td><td>58</td><td>.396</td></tr>
</table>
<p>We see that in 96 close Big 10 games, Penn State wasn't even able to win 40% of them! And if they were actually missing some sort of "clutch gene" (or whatever Skip Bayless would say), we would expect to see the same thing in their OOC games. However, they've actually <em>won</em> a majority of their close OOC games! </p>
<p>Let's see if we can chalk this up to random variation. Assuming that their actual probability in close games is .500, what is the probability that Penn State would win 38 or fewer games in 96 tries?</p>
<p><img alt="Distribution Plot" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/ba1297b0a2b7f1b2675be0e450c2837d/penn_state_dist_plot.jpg" style="width: 576px; height: 384px;" /></p>
<p>If their close games were truly random, the probability that Penn State would win 38 or fewer games is only 2.6%. That's low enough to conclude it didn't happen by chance. But if the reason is actually the refs giving preferential treatment to more established teams and coaches, we would expect to see similar results for other Big 10 teams.</p>
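<p>That tail probability can be reproduced outside Minitab. Here is a minimal sketch using SciPy's binomial distribution, assuming (as the plot does) that each close game is an independent 50/50 trial:</p>

```python
from scipy.stats import binom

# P(X <= 38) for X ~ Binomial(n=96, p=0.5): the chance Penn State
# wins 38 or fewer of 96 close games if close games were truly 50/50.
p_penn_state = binom.cdf(38, n=96, p=0.5)
print(f"P(38 or fewer wins in 96) = {p_penn_state:.3f}")
```

<p>The result is about 0.026, matching the 2.6% shown in the distribution plot.</p>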
Examining the Entire Big 10
<p>If the refs are making calls against Penn State in close games, are they doing similar things to other lowly Big 10 teams? And on the flip side, are the better teams in the Big 10 seeing an increase in the number of close games they win? First, let's just look at who the best and worst Big 10 teams have been since 2002. Here is each team's winning percentage in <em>only Big 10 conference games</em> since 2002. (Because I went back to 2002, Nebraska, Maryland, and Rutgers are not included.)</p>
<table>
<tr><th>Teams</th><th>Wins</th><th>Losses</th><th>Winning Percentage</th></tr>
<tr><td>Wisconsin</td><td>167</td><td>67</td><td>.714</td></tr>
<tr><td>Michigan St</td><td>157</td><td>77</td><td>.671</td></tr>
<tr><td>Ohio St</td><td>153</td><td>82</td><td>.651</td></tr>
<tr><td>Illinois</td><td>133</td><td>102</td><td>.566</td></tr>
<tr><td>Purdue</td><td>123</td><td>112</td><td>.523</td></tr>
<tr><td>Michigan</td><td>119</td><td>116</td><td>.506</td></tr>
<tr><td>Indiana</td><td>117</td><td>118</td><td>.498</td></tr>
<tr><td>Iowa</td><td>99</td><td>135</td><td>.423</td></tr>
<tr><td>Minnesota</td><td>97</td><td>138</td><td>.413</td></tr>
<tr><td>Northwestern</td><td>75</td><td>159</td><td>.321</td></tr>
<tr><td>Penn St</td><td>61</td><td>174</td><td>.260</td></tr>
</table>
<p><span style="line-height: 1.6;">We see that Wisconsin, Michigan St, and Ohio State have been the premier Big 10 teams since 2002. And it's no surprise that they have 3 of the most established coaches in the conference with Bo Ryan, Tom Izzo, and Thad Matta. On the flip side, Iowa, Minnesota, Northwestern, and Penn State all have Big 10 winning percentages significantly under .500. And naturally, these four programs have been through a combined 12 different coaches since 2002. So if referees were biased towards winning programs, we would expect Wisconsin, Michigan St, and Ohio State to have won a higher percentage of close games at the expense of Iowa, Minnesota, Northwestern, and Penn St. But if there is no bias, we would expect each team to have a winning percentage close to .500.</span></p>
<p>The following individual plot shows the percentage of close Big 10 games each team has won since 2002.</p>
<p><img alt="Individual Value Plot" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/8f01fa435c472630b847e28203775c23/ivp_big_10.jpg" style="width: 576px; height: 384px;" /></p>
<p><span style="line-height: 1.6;">The 4 perennial losers in the Big 10 also just happen to be the 4 teams that have had the worst "luck" in close games. But is it really bad luck, or could officiating be giving the benefit of the doubt to the more established team/coach? And if it's the latter, could Bo Ryan be the greatest referee manipulator ever? Wisconsin has won a ridiculous 56 of 87 close Big 10 games since 2002. If you assume their true chance of winning a close game is .500, the probability that they would win 56 or more games in 87 tries is...well, it's low.</span></p>
<p><img alt="Distribution Plot" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/478271baacfafcd4f0ff2bda42fefe51/wis_dp.jpg" style="width: 576px; height: 384px;" /></p>
<p><span style="line-height: 1.6;">That's less than half a percent! The chances of that are about 1 in 207! Is Wisconsin getting really, </span><em style="line-height: 1.6;">really</em><span style="line-height: 1.6;"> lucky, or is something else going on?</span></p>
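<p>Wisconsin's "it's low" probability can be pinned down with the same binomial logic, this time as an upper tail (again assuming close games are independent 50/50 trials):</p>

```python
from scipy.stats import binom

# P(X >= 56) for X ~ Binomial(n=87, p=0.5): the chance Wisconsin
# wins 56 or more of 87 close games by luck alone.
# sf(55) = P(X > 55) = P(X >= 56) for a discrete distribution.
p_wisconsin = binom.sf(55, n=87, p=0.5)
print(f"P(56 or more wins in 87) = {p_wisconsin:.4f}")
```

<p>This comes out just under half a percent, which is where the "about 1 in 207" figure comes from.</p>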
Breaking Down the Top Versus the Bottom
<p>Granted, we should expect to see some variation in the distribution of the teams. If you take 11 different coins and flip each one 90 times, all 11 coins are not going to have heads come up 45 times. Wisconsin has an insanely high winning percentage in close games, but Michigan St and Ohio St have winning percentages right around .500. And Northwestern doesn't seem to have had as many close games go against them as Minnesota or Iowa, despite having a lower overall winning percentage than both teams.</p>
<p>So let's break down the close games between only our top and bottom teams. After all, if the theory is that the established coaches/programs get calls over the perennial losers, we should look at games played specifically between those teams. So here is how the top 3 Big 10 teams have fared in close games against Penn St, Iowa, and Minnesota. And don't worry Northwestern, we'll get to you in a minute.</p>
<table>
<tr><th>Team</th><th>Wins</th><th>Losses</th><th>Winning Percentage</th></tr>
<tr><td>Wisconsin</td><td>19</td><td>11</td><td>.633</td></tr>
<tr><td>Michigan St</td><td>14</td><td>9</td><td>.609</td></tr>
<tr><td>Ohio St</td><td>12</td><td>8</td><td>.600</td></tr>
<tr><td>Total</td><td>45</td><td>28</td><td>.616</td></tr>
</table>
<p>Michigan State and Ohio State didn't have winning percentages significantly higher than .500 in close games when compared to the entire Big 10, but when you just look at games against the weaker teams, they both have winning percentages around .600.<span style="line-height: 1.6;"> In total, our top 3 teams have won 45 out of 73 close games against Penn State, Minnesota, and Iowa. So is this significantly greater than .500?</span></p>
<p><img alt="1 Proportion Test" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/a7ee8c9995687895e938322075f1584d/1_proportion.jpg" style="width: 376px; height: 143px;" /></p>
<p><span style="line-height: 1.6;">The p-value, which is 0.03, is lower than the common significance level of 0.05, which means we can conclude the top 3 teams win more than 50% of their close games against the bottom of the Big 10. There is definitely a decent case to be made that Big 10 referees give the favorable calls to the better team at the end of close games.</span></p>
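<p>The 1-proportion test can be approximated with SciPy's exact binomial test; this mirrors the logic of the Minitab output, though the two tools are not guaranteed to agree digit for digit:</p>

```python
from scipy.stats import binomtest

# One-sided exact test: did the top 3 teams beat the bottom teams in
# close games at a rate greater than 0.5? (45 wins in 73 games)
result = binomtest(45, n=73, p=0.5, alternative='greater')
print(f"p-value = {result.pvalue:.3f}")
```

<p>The p-value lands right around 0.03, below the usual 0.05 significance level.</p>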
Hey, What About Northwestern?
<p>If there is one Big 10 team associated with losing even more than Penn State, it's Northwestern. Even though Penn State has a worse winning percentage in Big 10 games since 2002, the Nittany Lions have at least had some success. They won the NIT in 2009, made the NCAA tournament in 2011, and even reached the Sweet Sixteen in 2001. Meanwhile, Northwestern hasn't made the NCAA Tournament since..........well, ever. And yet they haven't fared poorly in close Big 10 games, winning 36 out of 78 (.462). Assuming their true chance of winning close games is .500, the probability of them winning 36 or fewer games out of 78 is 28.58%. That's not nearly uncommon enough to conclude it's significantly lower than .500. So has Northwestern ruined our referee theory?</p>
<p>Not at all. In fact, they're going to drive the final nail in the coffin.</p>
<p>Remember how Michigan State and Ohio State didn't appear to win a higher percentage of their close games until we looked at how they fared against only the worst teams? Well Northwestern doesn't appear to lose a lower percentage of their close games.........until we only look at their games against the top 3 teams.</p>
<p>2-13</p>
<p>Yep, that's Northwestern's record in close games against Wisconsin, Ohio State, and Michigan State since 2002. <em>Two and thirteen</em>! With a record that poor, I was sure I could find a game where the refs might have played a part. It took a 30 second internet search to find not one, but two. And in the same season!</p>
<p>Jan 29th, 2011: #1 Ohio St 58 - Northwestern 57<br />
Mar 11th, 2011: #1 Ohio St 67 - Northwestern 61 (OT)</p>
<p>Coming into both games, Ohio State was ranked #1 in the country. They also had one of the best players in the country in Jared Sullinger. The first game was played at Northwestern, while the second was played at a neutral site during the Big 10 tournament. So you don't have to worry about ref bias due to home-court advantage.</p>
<p>So what happened during these two games? </p>
<table>
<tr><th></th><th>Northwestern Free Throws Attempted</th><th>Jared Sullinger Free Throws Attempted</th></tr>
<tr><td>Game 1</td><td>11</td><td>10</td></tr>
<tr><td>Game 2</td><td>18</td><td>18</td></tr>
<tr><td>Total</td><td>29</td><td>28</td></tr>
</table>
<p>Sullinger had almost as many free throw attempts as the entire Northwestern team. Overall, Ohio State had 52 free throw attempts to Northwestern's 29. Even more astonishing, 18 of Sullinger's attempts occurred during the final 5 minutes of the game or in overtime. Close game between Northwestern and the #1 ranked team with one of the best players in the country? It's no longer looking like much of a surprise which way the fouls went.</p>
<p>When we add Northwestern's wins and losses to our previous table of the top teams versus the bottom teams, we get the following.</p>
<table>
<tr><th>Team</th><th>Wins</th><th>Losses</th><th>Winning Percentage</th></tr>
<tr><td>Wisconsin</td><td>20</td><td>12</td><td>.625</td></tr>
<tr><td>Michigan St</td><td>18</td><td>9</td><td>.667</td></tr>
<tr><td>Ohio St</td><td>20</td><td>9</td><td>.690</td></tr>
<tr><td>Total</td><td>58</td><td>30</td><td>.659</td></tr>
</table>
<p><span style="line-height: 1.6;">When the top 3 teams in the Big 10 since 2002 have played a close game against the bottom 4 teams, they've won about 66% of the time. That's pretty B1G. So what do you think? Do refs actually favor established teams and coaches at the end of close games?</span></p>
<p>Make the call.</p>
<p> </p>
Statistics in the NewsMon, 16 Feb 2015 20:37:00 +0000http://blog.minitab.com/blog/the-statistics-game/are-big-10-basketball-referees-biased-towards-winning-teamsKevin RudyUsing Regression to Evaluate Project Results, part 1
http://blog.minitab.com/blog/statistics-in-the-field/using-regression-to-evaluate-project-results%2C-part-1
<p><em>By Peter Olejnik, guest blogger.</em></p>
<p>Previous posts on the Minitab Blog have discussed the work of the Six Sigma students at <a href="http://rose-hulman.edu/">Rose-Hulman Institute of Technology</a> to reduce the quantities of recyclables that wind up in the trash. Led by Dr. Diane Evans, these students continue to make an important impact on their community.</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/64e636cef00078d6a7b28245f62f8759/recyclables.png" style="border-width: 1px; border-style: solid; margin: 10px 15px; float: right; width: 200px; height: 200px;" />As with any Six Sigma process, the results of the work need to be evaluated. A simple two-sample t-test could be performed, but it gives us a very limited amount of information: only whether there is a difference between the before and after improvement data. But what if we want to know whether a certain item or factor affects the amount of recyclables disposed of? What if we want to know how much of an effect important factors have? What if we want to create a predictive model that can estimate the weight of the recyclables without the use of a scale?</p>
<p>Sounds like a lot of work, right? But actually, with the use of regression analysis tools in Minitab Statistical Software, it's quite easy!</p>
<p>In this two-part blog post, I’ll share with you how my team used regression analysis to identify and model the factors that are important in making sure recyclables are handled appropriately. </p>
Preparing Your Data for Regression Analysis
<p><span style="line-height: 1.6;">All the teams involved in this project collected a substantial amount of data. But some of this data is somewhat subjective. Also, this data has been recorded in a manner that is geared toward people, and not necessarily for analysis by computers. To start doing analysis in Minitab, all of our data points need to be quantifiable and in long format.</span></p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9111cc378a9d3a50ad02e8f920d5af6d/data_as_inserted_by_teams_w1024.png" style="width: 800px; height: 485px;" /></p>
<p align="center"><em>The Data as Inserted by the Six Sigma Teams</em></p>
<p align="center"><img alt="data after conversion" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/06c69397a0dc317ee13067c8d9f47ab7/data_after_conversion_w1024.png" style="width: 800px; height: 383px;" /></p>
<p align="center"><em><span style="line-height: 1.6;">The Data, After Conversion into Long Format and Quantifiable Values</span></em></p>
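<p>A wide-to-long reshape like this one can also be scripted. A minimal sketch with pandas, where the column names and values are hypothetical stand-ins for the teams' actual sheet:</p>

```python
import pandas as pd

# Hypothetical wide-format data: one row per area, one column per day.
wide = pd.DataFrame({
    "Area": ["Moench Mailroom", "Olin Lobby"],
    "Day1": [1.2, 0.8],
    "Day2": [0.9, 1.1],
})

# Reshape to long format: one row per (area, day) observation.
long = wide.melt(id_vars="Area", var_name="Day", value_name="Weight_lb")
print(long)
```

<p>Long format puts each observation on its own row, which is what regression routines expect.</p>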
<p><span style="line-height: 1.6;">Now that we have all this data in a computer-friendly format, we need to identify and eliminate any extreme outliers, since they can distort our final model. First we create a regression model with all of the factors included. As part of this, we generate the residuals from the data vs. the fit. For our analysis, we utilized deleted T-residuals, which are less affected by the skew of an outlier than regular T-residuals, making them a better indicator. They can be displayed in Minitab in the same manner as any other residual. Looking at these residuals, those with absolute values above 4 were removed. A new fit was then created, and the process was repeated until no outliers remained.</span></p>
Satisfying the Assumptions for Regression Analysis
<p>Once the outliers have been eliminated, we need to verify the regression assumptions for our data to ensure that the analysis conducted is valid. We need to satisfy five assumptions:</p>
<ol>
<li style="margin-left: 0.5in;">The mean value of the errors is zero.</li>
<li style="margin-left: 0.5in;">The variance of the errors is even and consistent (or “homoscedastic”) through the data.</li>
<li style="margin-left: 0.5in;">The data is independent and identically distributed (IID).</li>
<li style="margin-left: 0.5in;">The errors are normally distributed.</li>
<li style="margin-left: 0.5in;">There is negligible variance in the predictor values.</li>
</ol>
<p><span style="line-height: 1.6;">For our third assumption, we know that the data points should be IID, because each area’s daily trash collection should have no effect on that of other areas or the next day’s collection. We have no reason to suspect otherwise. The fifth assumption is also believed to have been met, as we have no reason to suspect that there is variance in the predictor value. This means that only three of the five assumptions still need to be checked.</span></p>
<p>Our first and second assumptions can be checked simply by plotting the deleted T-residuals against the individual factors, as well as against the fitted values, and visually inspecting the plots.</p>
<p><br />
<img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/4db6a7f0e802ff26b7641baf76b3326b/scatterplots_1.png" style="width: 800px; height: 537px;" /><br />
<img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/93f6a521a7017db9f74588d7d0a28c98/scatterplots_2.png" style="width: 800px; height: 532px;" /><br />
<img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/79b06f78d420550190a39e49e045765c/scatterplots_3.png" style="width: 800px; height: 532px;" /></p>
<p align="center"><em><span style="line-height: 1.6;">Plots used to verify regression assumptions.</span></em></p>
<p>When looking over the scatter plots, it looks like these two assumptions are met. Checking the fourth assumption is just as easy. All that needs to be done is to run a normality test on the deleted T-residuals.</p>
<p align="center"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ac5e0f64d22e22a5703b493d4a2e54a3/probability_plot_of_deleted_t_residuals.png" style="width: 577px; height: 385px;" /><br />
<em style="line-height: 1.6;">Normality plot of deleted T-residuals</em></p>
<p><span style="line-height: 1.6;">It appears that our residuals are not normally distributed, as seen by the p-value of our test. This is problematic, as it means any analysis we conduct would be invalid. Fortunately, all is not lost: we can perform a Box-Cox analysis on our results. This will tell us if the response variable needs to be raised to a power.</span></p>
<p align="center"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6539dba1ff25f65482ba8501a0216d3a/box_cox_transformation_analysis.png" style="width: 577px; height: 385px;" /><br />
<em>Box-Cox analysis of the data</em></p>
<p><span style="line-height: 1.6;">The results of this analysis indicate that the response variable should be raised to the power 0.75. A new model and residuals can be generated from this transformed response variable, and the assumptions can be checked again.</span></p>
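<p>A Box-Cox analysis can be run outside Minitab as well. A sketch with SciPy on hypothetical positive response data (the lambda it estimates depends on the data, so it won't necessarily be 0.75):</p>

```python
import numpy as np
from scipy.stats import boxcox
from scipy.special import inv_boxcox

# Hypothetical positive response data (Box-Cox requires y > 0).
rng = np.random.default_rng(2)
y = rng.lognormal(mean=1.0, sigma=0.5, size=100)

# boxcox estimates the power (lambda) that makes the transformed
# data most nearly normal, and returns the transformed values.
y_trans, lam = boxcox(y)
print(f"estimated lambda = {lam:.2f}")

# The transform is invertible, so predictions can be mapped back
# to the original units.
assert np.allclose(inv_boxcox(y_trans, lam), y)
```

<p>The invertibility matters in practice: the regression is fit on the transformed response, but predictions are reported back in pounds.</p>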
<p align="center"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c2eac5fd8a48d5476d19510ca69687bd/scatterplots_4.png" style="width: 800px; height: 533px;" /><br />
<img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/42763813beb394fda03a7eaa6ecb53b4/scatterplots_5.png" style="width: 800px; height: 532px;" /><br />
<img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1eeafd4ef3c00837b425530c40b1a3d3/scatterplots_6.png" style="width: 800px; height: 533px;" /><em><span style="line-height: 1.6;">New plots used to verify regression assumptions for the revised model.</span></em></p>
<p><span style="line-height: 1.6;">The residuals again appear to be homoscedastic and centered about zero.</span></p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/42df464e7acd7f5b309e6dac0608ba3a/normality_plot.png" style="width: 577px; height: 385px;" /></p>
<p align="center"><em>Normality plot on deleted T-residuals</em></p>
<p>The residuals now are normally distributed. <span style="line-height: 1.6;">Our data is now prepped and ready for analysis!</span></p>
<p>The second part of this post will detail the regression analysis. </p>
<p> </p>
<p><strong>About the Guest Blogger</strong></p>
<p><em>Peter Olejnik is a graduate student at the Rose-Hulman Institute of Technology in Terre Haute, Indiana. He holds a bachelor’s degree in mechanical engineering and his professional interests include controls engineering and data analysis.</em></p>
<p> </p>
Regression AnalysisMon, 16 Feb 2015 13:00:00 +0000http://blog.minitab.com/blog/statistics-in-the-field/using-regression-to-evaluate-project-results%2C-part-1Guest BloggerA Little Trash Talk: Improving Recycling Processes at Rose-Hulman, Part II
http://blog.minitab.com/blog/real-world-quality-improvement/a-little-trash-talk%3A-improving-recycling-processes-at-rose-hulman%2C-part-ii
<p><span style="line-height: 1.6;">I left off last with a </span><a href="http://blog.minitab.com/blog/real-world-quality-improvement/a-little-trash-talk3a-improving-recycling-processes-at-rose-hulman" style="line-height: 1.6;" target="_blank">post</a><span style="line-height: 1.6;"> outlining how the Six Sigma students at Rose-Hulman were working on a project to reduce the amount of recycling thrown in the normal trash cans in all of the academic buildings at the institution.</span></p>
<p>Using the <a href="http://blog.minitab.com/blog/real-world-quality-improvement/dmaic-vs-dmadv-vs-dfss" target="_blank">DMAIC</a> methodology for completing improvement projects, they had already defined the problem at hand: how could the amount of recycling that’s thrown in the normal trash cans be reduced? They collected baseline data for the types of recyclables thrown into the trash, including their weights and frequencies. In order to brainstorm ideas to improve recycling efforts at Rose-Hulman and to determine causes for the lack of recycling in the first place, the students created <a href="http://blog.minitab.com/blog/understanding-statistics/five-types-of-fishbone-diagrams" target="_blank">fishbone diagrams</a>.</p>
Implementing Improvements
<p>The students then entered the ‘Improve’ phase of the project and formed a list of recommended actions based on the variables they could control to motivate recycling practices in a four-week time frame. The short time constraint was fixed due to the length of an academic quarter.</p>
<p>This list of actions included the following:</p>
<ul>
<li>Placing a recycling bin next to each and every trash can throughout the academic buildings, including classrooms.</li>
<li>Constructing and displaying posters next to or on recycling bins indicating which items are and are not recyclable on campus:</li>
</ul>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/f2c16c8a3b53884cd996ae306074f582/recycle.jpg" style="width: 560px; height: 348px;" /></p>
<ul>
<li>Informing campus about Rose-Hulman recycling policies, as well as the current percentage of recyclables on campus (by weight), determined during the Measure phase. (The information was shared with the entire campus via an email and an article in the school newspaper by Dr. Evans.)</li>
<li>Encouraging good recycling habits through creative posters, contests, incentives, and using concepts related to “<a href="http://www.thefuntheory.com/" target="_blank">The Fun Theory</a>.” Fun theory is used to change people’s behaviors through making activities fun. For example, the class discussed ways to make recycling bins produce amusing sounds when items are placed in it.</li>
</ul>
<p>The students implemented many of these improvements and then gathered post-improvement data at the end of four weeks during four fixed collection periods.</p>
Analyzing Pre-Improvement vs. Post-Improvement Data
<p>There were a total of 15 areas in the academic buildings where recycling data was collected. Fifteen student teams were assigned one of these areas for the entire project, collecting data during the pre- and post-improvement phases. There are a total of 60 data points for both phases.</p>
<p>The teams compared pre-improvement and post-improvement statistics for the percentage of recyclables in the trash with Minitab (using Stat > Basic Statistics > Display Descriptive Statistics in the software):</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/b4217833edb2c6803d89942897093085/descriptivestats.JPG" style="width: 781px; height: 232px;" /></p>
<p><span style="line-height: 1.6;">Some highlights of this analysis:</span></p>
<ul>
<li>The mean percentage of recyclables in trash decreased from 37% to 24%, which is a reduction of 35%.</li>
<li>The median percentage of recyclables in trash decreased from 31% to 17%, which is a reduction of 45%.</li>
<li>The total average weight of recyclables in trash over the baseline period (4 days) decreased from 84.3 pounds with a standard deviation of approximately 7.89 pounds to 45.9 pounds with a standard deviation of approximately 5.19 pounds during the improvement period, which is a reduction in the total average weight of 46%.</li>
<li>The mean recyclable weight for all areas decreased from 1.405 pounds to 0.765 pounds, which is a reduction of 46%.</li>
</ul>
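<p>These reductions can be checked directly from the reported figures; note that the two weight reductions both work out to 46%:</p>

```python
# Percent reductions, recomputed from the pre/post figures.
def pct_reduction(before, after):
    return round(100 * (before - after) / before)

print(pct_reduction(37, 24))        # mean % recyclables in trash
print(pct_reduction(31, 17))        # median % recyclables in trash
print(pct_reduction(84.3, 45.9))    # total average weight (lb)
print(pct_reduction(1.405, 0.765))  # mean recyclable weight (lb)
```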
<p>They were also able to view the improvements graphically with boxplots in Minitab:</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/87e11c170f9d5e1e1085bd8b6a3c61bd/boxplots.JPG" style="width: 830px; height: 286px;" /></p>
<p><em><span style="line-height: 1.6;">Boxplots of the percentage of recyclables during the four collection periods in the pre-improvement phase (left plot) and the four collection periods in the post-improvement phase (right plot).</span></em></p>
<p>Although it is not apparent in these boxplots that the mean percentage of recyclables (the circles with the crossbars) has decreased in the improvement phase, it is obvious that the median percentage of recyclables (line within the boxplot) has decreased.</p>
<p>In addition, the students used Minitab plots to track changes in percentage of recyclables in the trash <em>per area</em>, both pre and post-improvement: </p>
<p align="center"><strong><img src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/fceda9f381ab639aae8e8e087cba0b0f/plot.jpg" style="border-width: 0px; border-style: solid; width: 602px; height: 401px;" /></strong></p>
<p><em>Plot of the mean percentage of recyclables in the trash by academic building area for both pre and post-improvement phases. The mean is averaged over the four collection times in each phase.</em></p>
<p><span style="line-height: 1.6;">These plots helped the students to graphically see gaps between the percentages of recyclables collected pre and post-improvement by area. Given the location of each academic area, the changes between pre and post means were justifiable and informative.</span></p>
<p>And in order to statistically determine if the true mean percentage of recyclables post-improvement was significantly less than the true mean percentage of recyclables pre-improvement, the students ran a paired t-test for all 60 data points, pairing by area and day. See below for the Minitab output for this test:</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/661b0f3db3b30fa71345a9d495b2b41e/t_test.JPG" style="width: 780px; height: 140px;" /></p>
<p><span style="line-height: 1.6;">With a t-test statistic of 4.66, it is evident that the recycling improvements made a difference! They ran a paired <em>t</em>-test since the pre- and post-improvement recyclable percentages were linked by area and day. They did not need to check for normality of the paired differences since they had <em>n</em> = 60 data points, which is large enough for the t-test to be robust to non-normality.</span></p>
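<p>A paired t-test of this form can be sketched with SciPy; the five pre/post pairs below are hypothetical stand-ins for the class's 60:</p>

```python
from scipy.stats import ttest_rel

# Hypothetical paired percentages (pre vs. post) for five areas.
pre  = [42.0, 38.5, 30.1, 51.2, 27.4]
post = [30.2, 25.0, 28.9, 35.6, 20.1]

# Paired t-test: are post-improvement percentages lower on average?
# Pairing by area removes area-to-area variation from the comparison.
result = ttest_rel(pre, post, alternative='greater')
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```

<p>Pairing is the key design choice here: each area serves as its own control, so a two-sample test would needlessly throw away that structure.</p>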
<p>After collecting baseline data, the students had created a Pareto Chart to display the type of trash (and recyclables) found in the regular trash cans. They also created a Pareto Chart for the post-improvement data—you can see both below to compare (pre-improvement – left, post-improvement – right):</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/3edfc7284f094d71fa90a356e50ccf26/pareto.JPG" style="width: 820px; height: 284px;" /><span style="line-height: 1.6;">Plastics were the most common recyclable item in the trash both pre- and post-improvement, and overall, besides the Java City coffee cups <em>increasing</em> </span><em style="line-height: 1.6;">post</em><span style="line-height: 1.6;">-improvement, the other categories saw a noticeable decrease post-improvement compared to pre-improvement.</span></p>
<p>To complete their pre- and post-improvement analysis, the students also ran a capability study in Minitab to determine the pre and post-improvement capability of recyclables in the trash. Post-improvement, both their Pp and Ppk values improved.</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/0e554f5242ca484743f2ce779212376e/processcap.JPG" style="width: 841px; height: 319px;" /></p>
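<p>The Pp and Ppk indices themselves are simple to compute from first principles. This generic sketch uses hypothetical post-improvement data and assumed spec limits of 0% and 40%, not the class's actual specification:</p>

```python
import numpy as np

def pp_ppk(data, lsl, usl):
    """Overall capability: Pp compares spec width to process spread;
    Ppk also penalizes a mean that sits off-center between the specs."""
    mean, s = np.mean(data), np.std(data, ddof=1)
    pp = (usl - lsl) / (6 * s)
    ppk = min((usl - mean) / (3 * s), (mean - lsl) / (3 * s))
    return pp, ppk

# Hypothetical post-improvement percentages with assumed specs 0-40%.
post = [24.0, 17.5, 30.2, 21.1, 18.4, 26.7, 15.9, 22.3]
pp, ppk = pp_ppk(post, lsl=0.0, usl=40.0)
print(f"Pp = {pp:.2f}, Ppk = {ppk:.2f}")
```

<p>Ppk is always at most Pp, with equality only when the process mean is exactly centered between the limits.</p>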
<strong style="line-height: 1.2;">R</strong><strong style="line-height: 1.2;">esults</strong><strong style="line-height: 1.2;"> and Future Improvement Efforts</strong>
<p>Of the 15 areas (Spring Quarter 2014) that collected pre-improvement and post-improvement data over the span of two four-day collection periods, only two areas had an increased percentage of recyclables in the trash after the improvements were made. These two areas had “special causes” associated, which can be explained.</p>
<p>One area with increased recyclables <em>after</em> improvements was the Moench Mailroom. <span style="line-height: 1.6;">The Moench Mailroom area is next to the campus mailroom where students pick up their daily mail, graded homework assignments, etc., in their mail slots. It was evident during post-improvement trash collection that a student had emptied an entire quarter's worth of mail, including junk mail, magazines, and assignments, into the trash can by the mailroom. Since the student's name was on the mail and assignments, it was clear that the recyclables discarded in the trash were from this one student. He certainly threw off that area's post-improvement data!</span></p>
<p>Although the improvement efforts were short-term, the students saw their efforts significantly decrease the percentage of recyclables being discarded in the normal trash cans at the academic buildings. At the beginning of Spring Quarter 2014, 36% of the trash (by weight) consisted of recyclable items. At the end of Spring Quarter 2014, after the improvement phase, only 24% of the trash (by weight) consisted of recyclable items!</p>
<p>They were not only able to decrease the carbon footprint of their school and aid in their school’s sustainability program, but the increase in recycling also has the potential to create revenue for the school down the road (if they choose to recycle aluminum cans or sell paper, for example).</p>
<p>Dr. Evans and the students have shared their results with the campus community and plan to work with the administration to publish their results, which will hopefully highlight why these improvement efforts should stick around long-term. <span style="line-height: 1.6;">Way to go Dr. Evans and Rose-Hulman Six Sigma Students!</span></p>
<p><em>Many thanks to Dr. Evans for her contributions to this post! </em></p>
Data AnalysisLean Six SigmaStatisticsFri, 13 Feb 2015 13:00:00 +0000http://blog.minitab.com/blog/real-world-quality-improvement/a-little-trash-talk%3A-improving-recycling-processes-at-rose-hulman%2C-part-iiCarly Barry