Minitab | Minitab
Blog posts and articles about using Minitab software in quality improvement projects, research, and more.
http://blog.minitab.com/blog/minitab/rss
Fri, 26 Aug 2016 10:08:23 +0000
What the Heck Are Sums of Squares in Regression?
http://blog.minitab.com/blog/marilyn-wheatleys-blog/what-the-heck-are-sums-of-squares-in-regression
<p>In regression, "sums of squares" are used to represent variation. In this post, we’ll use some sample data to walk through these calculations.</p>
<p><img alt="squares" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/bc08f61dab265a6e4a481df9e66e54e8/squares.jpg" style="width: 250px; height: 250px; margin: 10px 15px; float: right;" />The sample data used in this post is available within <a href="http://www.minitab.com/en-us/products/minitab/">Minitab</a> by choosing <strong>Help</strong> > <strong>Sample Data</strong>, or <strong>File</strong> > <strong>Open Worksheet</strong> > <strong>Look in Minitab Sample Data folder</strong> (depending on your version of Minitab). The dataset is called <strong>ResearcherSalary.MTW</strong>, and contains data on salaries for researchers in a pharmaceutical company.</p>
<p>For this example, we will use the salary data in C1 as Y, the response variable, and the years of experience in C4 as X, the predictor variable.</p>
<p>First, we can run our data through Minitab to see the results: <strong>Stat</strong> > <strong>Regression</strong> > <strong>Fitted Line Plot</strong>. The salary is the Y variable, and the years of experience is our X variable. The regression output will tell us about the relationship between years of experience and salary after we complete the dialog box as shown below, and then click <strong>OK</strong>:</p>
<p style="margin-left: 40px;"><img alt="fitted line plot dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3c5f16c5e876000a5fe3941e6622dec3/sum_of_squares1.png" style="border-width: 0px; border-style: solid; width: 437px; height: 232px;" /></p>
<p>In the window above, I’ve also clicked the <strong>Storage</strong> button and selected the box next to <strong>Coefficients</strong> to store the coefficients from the regression equation in the worksheet. When we click <strong>OK</strong> in the window above, Minitab gives us two pieces of output:</p>
<p style="margin-left: 40px;"><img alt="fitted line plot and output" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/52211666f2a49c79d4127965964934da/sum_of_squares2.png" style="border-width: 0px; border-style: solid; width: 624px; height: 282px;" /></p>
<p>On the left side above we see the regression equation and the ANOVA (Analysis of Variance) table, and on the right side we see a graph that shows us the relationship between years of experience on the horizontal axis and salary on the vertical axis. Both the right and left side of the output above are conveying the same information. We can clearly see from the graph that as the years of experience increase, the salary increases, too (so years of experience and salary are positively correlated). For this post, we’ll focus on the SS (Sums of Squares) column in the Analysis of Variance table.</p>
Calculating the Regression Sum of Squares
<p>We see an SS value of 5086.02 in the Regression line of the ANOVA table above. That value represents the amount of variation in the salary that is attributable to the number of years of experience, based on this sample. Here's where that number comes from. </p>
<ol>
<li>Calculate the average response value (the salary). In Minitab, I’m using <strong>Stat</strong> > <strong>Basic Statistics</strong> > <strong>Store Descriptive Statistics</strong>:</li>
</ol>
<p style="margin-left: 40px;"><img alt="dialog boxes" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1d4099c8c098df54b6b8c9a4756e978c/sum_of_squares3.png" style="border-width: 0px; border-style: solid; width: 624px; height: 378px;" /></p>
<p><span style="line-height: 1.6;">In addition to entering the Salary as the variable, I’ve clicked </span><strong style="line-height: 1.6;">Statistics</strong><span style="line-height: 1.6;"> to make sure only </span><strong style="line-height: 1.6;">Mean</strong><span style="line-height: 1.6;"> is selected, and I’ve also clicked </span><strong style="line-height: 1.6;">Options</strong><span style="line-height: 1.6;"> and checked the box next to </span><strong style="line-height: 1.6;">Store a row of output for each row of input</strong><span style="line-height: 1.6;">. As a result, Minitab will store a value of 82.9514 (the average salary) in C5 35 times:</span></p>
<p style="margin-left: 40px;"><img alt="data" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2cafda0a9754841e093f614ce337d0ec/sum_of_squares4.png" style="border-width: 0px; border-style: solid; width: 359px; height: 252px;" /></p>
<ol>
<li value="2">Next, we will use the regression equation that Minitab gave us to calculate the fitted values. The fitted values are the salaries that our regression equation would predict, given the number of years of experience. </li>
</ol>
<p style="margin-left: 40px;">Our regression equation is <strong>Salary = 60.70 + 2.169*Years</strong>, so for every year of experience, we expect the salary to increase by 2.169. </p>
<p style="margin-left: 40px;">The first row in the Years column in our sample data is 11, so if we use 11 in our equation we get 60.70 + 2.169*11 = 84.559. So with 11 years of experience, our regression equation tells us the expected salary is about 84.6, or roughly $84,600 with salaries recorded in thousands of dollars. </p>
<p style="margin-left: 40px;">Rather than calculating this for every row in our worksheet manually, we can use Minitab’s calculator: <strong>Calc</strong> > <strong>Calculator </strong>(I used the stored coefficients in the worksheet to include more decimals in the regression equation that I’ve typed into the calculator):</p>
<p style="margin-left: 40px;"><img alt="calculator" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/5c11d157d8fb68745a47a8d5bbf12bc7/sum_of_squares5.png" style="border-width: 0px; border-style: solid; width: 423px; height: 381px;" /></p>
<p style="margin-left: 40px;">After clicking <strong>OK</strong> in the window above, Minitab will store the predicted salary value for every year in column C6. <strong>NOTE</strong>: <em>In the regression graph we obtained, the red regression line represents the values we’ve just calculated in C6.</em></p>
<ol>
<li value="3">Now that we have the average salary in C5 and the predicted values from our equation in C6, we can calculate the Sums of Squares for the Regression (the 5086.02). We’ll use <strong>Calc</strong> > <strong>Calculator</strong> again, and this time we will subtract the average salary from the predicted values, square those differences, and then add all of those squared differences together:</li>
</ol>
<p style="margin-left: 40px;"><img alt="calculator" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e43bec2e97d1e3552fa68b5babbc9162/sum_of_squares6.png" style="border-width: 0px; border-style: solid; width: 426px; height: 384px;" /></p>
<p style="margin-left: 40px;">We square all the differences because some of the predicted values from our equation are lower than the average, so those differences would be negative. If we summed the positive and negative differences as they are, they would cancel each other out. But because we square them first, every observation contributes to the total.</p>
<p style="margin-left: 40px;">We have just calculated the Sum of Squares for the regression by summing the squared values. Our results should match what we’d seen in the regression output previously:</p>
<p style="margin-left: 40px;"><img alt="output" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ad62e21671cf488d1c6be83420b92979/sum_of_squares7.png" style="border-width: 0px; border-style: solid; width: 624px; height: 134px;" /></p>
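<p style="margin-left: 40px;">The three steps above can also be sketched outside Minitab. The Python snippet below is only an illustration: the five data points are made up (they are not the ResearcherSalary.MTW values), and the variable names are my own. It fits the line by least squares, computes the fitted values, and sums the squared differences between the fitted values and the mean.</p>

```python
from statistics import mean

# Made-up illustration data, NOT the ResearcherSalary.MTW values.
years = [1, 2, 3, 4, 5]          # predictor X (years of experience)
salary = [62, 64, 65, 64, 65]    # response Y (salary, in thousands)

# Step 1: the average response value.
x_bar, y_bar = mean(years), mean(salary)

# Least-squares slope and intercept for a single predictor.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, salary))
s_xx = sum((x - x_bar) ** 2 for x in years)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Step 2: fitted values from the regression equation.
fitted = [intercept + slope * x for x in years]

# Step 3: subtract the mean, square, and add everything up.
deviations = [f - y_bar for f in fitted]
ss_regression = sum(d ** 2 for d in deviations)

print("sum of raw deviations:", round(sum(deviations), 10))
print("regression sum of squares:", round(ss_regression, 4))
```

<p style="margin-left: 40px;">With this toy data the raw deviations sum to zero (up to floating-point rounding), which is why they are squared before being added.</p>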
Calculating the Error Sum of Squares
<p>The Error Sum of Squares is the variation in the salary that is not explained by number of years of experience. For example, the additional variation in the salary could be due to the person’s gender, number of publications, or other variables that are not part of this model. Any variation that is not explained by the predictors in the model becomes part of the error term.</p>
<ol>
<li>To calculate the error sum of squares we will use the calculator (<strong>Calc </strong>> <strong>Calculator</strong>) again to subtract the fitted values (the salaries predicted by our regression equation) from the observed response (the actual salaries): </li>
</ol>
<p style="margin-left: 40px;"><img alt="calculator" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dfcb4fce9d5b4ac05179749c02cd4199/sum_of_squares8.png" style="border-width: 0px; border-style: solid; width: 424px; height: 383px;" /></p>
<p style="margin-left: 0.5in;">In C9, Minitab will store the differences between the actual salaries and what our equation predicted.</p>
<ol>
<li value="2">Because we’re calculating sums of squares again, we’re going to square all the values we stored in C9, and then add them up to come up with the sum of squares for error:</li>
</ol>
<p style="margin-left: 40px;"><img alt="calculator" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/246ff836925c61a1e944a27f0964c213/sum_of_squares9.png" style="border-width: 0px; border-style: solid; width: 422px; height: 381px;" /></p>
<p style="margin-left: 0.5in;">When we click <strong>OK</strong> in the calculator window above, we see that our calculated sum of squares for error matches Minitab’s output:</p>
<p style="margin-left: 0.5in;"><img alt="output" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/be5bb46afeb785da264c08e2c4ce50f5/sum_of_squares10.png" style="border-width: 0px; border-style: solid; width: 621px; height: 128px;" /></p>
<p style="margin-left: 0.5in;">Finally, the total Sum of Squares is calculated by adding the Regression and Error SS together: 5086.02 + 1022.61 = 6108.63.</p>
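<p style="margin-left: 0.5in;">The same bookkeeping can be checked in a few lines of Python. This is a sketch on made-up data (not the ResearcherSalary.MTW values), with names of my own choosing. It computes the residuals, the error sum of squares, and confirms that the regression and error sums of squares add up to the total.</p>

```python
from statistics import mean

# Made-up illustration data, NOT the ResearcherSalary.MTW values.
years = [1, 2, 3, 4, 5]          # predictor X
salary = [62, 64, 65, 64, 65]    # response Y, in thousands

x_bar, y_bar = mean(years), mean(salary)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(years, salary))
         / sum((x - x_bar) ** 2 for x in years))
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * x for x in years]

# Step 1: observed response minus fitted value.
residuals = [y - f for y, f in zip(salary, fitted)]

# Step 2: square the residuals and add them up.
ss_error = sum(r ** 2 for r in residuals)

# Regression and total sums of squares, for the final check.
ss_regression = sum((f - y_bar) ** 2 for f in fitted)
ss_total = sum((y - y_bar) ** 2 for y in salary)

print("SS error:     ", round(ss_error, 4))
print("SS regression:", round(ss_regression, 4))
print("SS total:     ", round(ss_total, 4))
```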
<p>I hope you’ve enjoyed this post, and that it helps demystify what sums of squares are. If you’d like to read more about regression, you may like some of <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-tutorial-and-examples">Jim Frost</a>’s <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-tutorial-and-examples">regression tutorials</a>.</p>
ANOVA, Data Analysis, Learning, Regression Analysis, Statistics, Statistics Help, Stats
Wed, 24 Aug 2016 12:02:00 +0000
Marilyn Wheatley
Data Not Normal? Try Letting It Be, with a Nonparametric Hypothesis Test
http://blog.minitab.com/blog/understanding-statistics/data-not-normal-try-letting-it-be-with-a-nonparametric-hypothesis-test
<p>So the data you nurtured, that you worked so hard to format and make useful, failed the normality test.</p>
<img alt="not-normal" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c6e92e8046f3fcee28e7cf505fb77005/data_freak_flag_300.jpg" style="line-height: 20.8px; width: 300px; height: 293px; margin: 10px 15px; float: right;" />
<p>Time to face the truth: despite your best efforts, that data set is <em>never </em>going to measure up to the assumption you may have been trained to fervently look for.</p>
<p>Your data's lack of normality seems to make it poorly suited for analysis. Now what?</p>
<p>Take it easy. Don't get uptight. Just let your data be what they are, go to the <strong>Stat </strong>menu in Minitab Statistical Software, and choose "Nonparametrics."</p>
<p style="margin-left: 40px;"><img alt="nonparametrics menu" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fbebf763ac6bd92b40c0d241b7c4029c/nonparametrics_menu.png" style="width: 367px; height: 309px;" /></p>
<p>If you're stymied by your data's lack of normality, nonparametric statistics might help you find answers. And if the word "nonparametric" looks like five syllables' worth of trouble, don't be intimidated—it's just a big word that usually refers to "tests that don't assume your data follow a normal distribution."</p>
<p>In fact, nonparametric statistics don't assume your data follow <em>any distribution at all</em>. The following table lists common parametric tests, their equivalent nonparametric tests, and the main characteristics of each.</p>
<p style="margin-left: 40px;"><img alt="correspondence table for parametric and nonparametric tests" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/4a69043809861f5187be271de67f8161/parametric_correspondence_table.png" style="width: 661px; height: 488px;" /></p>
<p>Nonparametric analyses free your data from the straitjacket of the <span style="line-height: 20.8px;">normality </span><span style="line-height: 1.6;">assumption. So choosing a nonparametric analysis is sort of like removing your data from a stifling, </span><a href="https://www.verywell.com/the-asch-conformity-experiments-2794996" style="line-height: 1.6;" target="_blank">conformist environment</a><span style="line-height: 1.6;">, and putting it into </span><a href="https://en.wikipedia.org/wiki/Utopia" style="line-height: 1.6;" target="_blank">a judgment-free, groovy idyll</a><span style="line-height: 1.6;">, where your data set can just be what it is, with no hassles about its unique and beautiful shape. How cool is </span><em style="line-height: 1.6;">that</em><span style="line-height: 1.6;">, man? Can you dig it?</span></p>
<p>Of course, it's not <em>quite </em>that carefree. Just like the 1960s encompassed both <a href="https://en.wikipedia.org/wiki/Woodstock" target="_blank">Woodstock</a> and <a href="https://en.wikipedia.org/wiki/Altamont_Free_Concert" target="_blank">Altamont</a>, so nonparametric tests offer both compelling advantages and serious limitations.</p>
Advantages of Nonparametric Tests
<p>Both parametric and nonparametric tests draw inferences about populations based on samples, but parametric tests focus on population parameters like the mean and the standard deviation, and make various assumptions about your data: for example, that they follow a normal distribution, and that samples include a minimum number of data points.</p>
<p>In contrast, nonparametric tests are unaffected by the distribution of your data. Nonparametric tests also accommodate many conditions that parametric tests do not handle, including small sample sizes, ordered outcomes, and outliers.</p>
<p>Consequently, they can be used in a wider range of situations and with more types of data than traditional parametric tests. Many people also feel that nonparametric analyses are more intuitive.</p>
Drawbacks of Nonparametric Tests
<p><span style="line-height: 20.8px;">But nonparametric tests are not </span><em style="line-height: 20.8px;">completely </em><span style="line-height: 20.8px;">free from assumptions—they do require data to be an independent random sample, for example.</span></p>
<p>And nonparametric tests aren't a cure-all. For starters, they typically have less <a href="http://blog.minitab.com/blog/starting-out-with-statistical-software/how-powerful-am-i-power-and-sample-size-in-minitab">statistical power</a> than parametric equivalents. Power is the probability that you will correctly reject the null hypothesis when it is false. That means you have an increased chance of making a Type II error with these tests.</p>
<p>In practical terms, that means nonparametric tests are <em>less </em>likely to detect an effect or association when one really exists.</p>
<p>So if you want to draw conclusions with the same confidence level you'd get using an equivalent parametric test, you will need larger sample sizes. </p>
<p>Nonparametric tests are not a one-size-fits-all solution for non-normal data, but they can yield good answers in situations where parametric statistics just won't work.</p>
Is Parametric or Nonparametric the Right Choice for You?
<p>I've briefly outlined differences between parametric and nonparametric hypothesis tests, looked at which tests are equivalent, and considered some of their advantages and disadvantages. If you're waiting for me to tell you which direction you should choose...well, all I can say is, "It depends..." But I can give you some established rules of thumb to consider when you're looking at the specifics of your situation.</p>
<p>Keep in mind that <strong>nonnormal data does not immediately disqualify your data for a parametric test</strong>. What's your sample size? <span style="line-height: 20.8px;">As long as a certain minimum sample size is met, most parametric tests will be </span><a href="http://blog.minitab.com/blog/fun-with-statistics/forget-statistical-assumptions-just-check-the-requirements" style="line-height: 20.8px;">robust to the normality assumption</a><span style="line-height: 20.8px;">. </span><span style="line-height: 1.6;">For example, the Assistant in Minitab (which uses Welch's t-test) points out that </span><span style="line-height: 1.6;">while the 2-sample t-test is based on the assumption that the data are normally distributed, this assumption is not critical when the sample sizes are at least 15. And Bonett's 2-sample standard deviation test performs well for nonnormal data even when sample sizes are as small as 20. </span></p>
<p><span style="line-height: 1.6;">In addition, while they may not require normal data, many nonparametric tests have other assumptions that you can’t disregard.</span> For example, t<span style="line-height: 20.8px;">he Kruskal-Wallis test assumes your samples come from populations that have similar shapes and equal variances. </span><span style="line-height: 1.6;">And the 1-sample Wilcoxon test does not assume a particular population distribution, but it does assume the distribution is symmetrical. </span></p>
<p><span style="line-height: 1.6;">In most cases, your choice between parametric and nonparametric tests ultimately comes down to sample size, and whether the center of your data's distribution is better reflected by the mean or the median.</span></p>
<ul>
<li>If the mean accurately represents the center of your distribution and your sample size is large enough, a parametric test offers you better accuracy and more power. </li>
<li>If your sample size is small, you'll likely need to go with a nonparametric test. But if the median better represents the center of your distribution, a nonparametric test may be a better option even for a large sample.</li>
</ul>
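<p>To see why the mean-versus-median question matters, here is a small Python sketch. The numbers are invented purely for illustration. It shows how a single extreme value drags the mean far from the bulk of the data while the median barely moves.</p>

```python
from statistics import mean, median

# Invented values for illustration only.
typical = [48, 50, 51, 52, 53, 55, 56]
with_outlier = typical + [400]   # one extreme observation added

for label, data in [("typical", typical), ("with outlier", with_outlier)]:
    print(f"{label:>12}: mean = {mean(data):.3f}, median = {median(data):.1f}")
```

<p>In the second sample the mean lands well above seven of the eight values, so the median is arguably the better description of the center, which is the kind of situation where a nonparametric test may be the better fit.</p>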
Data Analysis, Hypothesis Testing, Statistics, Statistics Help
Mon, 22 Aug 2016 12:00:00 +0000
Eston Martz
To Infinity and Beyond with the Geometric Distribution
http://blog.minitab.com/blog/rkelly/to-infinity-and-beyond-with-the-geometric-distribution
<p><span style="font-size: 13px; line-height: 1.6;">See if this sounds fair to you. I flip a coin.</span></p>
<img alt="pennies" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a918781b135680886606bf9c4f58c36f/pennies.jpg" style="line-height: 20.8px; width: 300px; height: 150px; float: right; margin: 10px 15px;" />
<p style="margin-left: 40px;"><strong>Heads: </strong>You win $1.<br />
<strong style="line-height: 1.6;">Tails: </strong><span style="line-height: 1.6;">You pay me $1.</span></p>
<p>You may not like games of chance, but you have to admit it seems like a fair game. At least, assuming the coin is a normal, balanced coin, and assuming I’m not a sleight-of-hand magician who can control the coin.</p>
<p><span style="line-height: 1.6;">How about this next game?</span></p>
<p style="margin-left: 40px;">You pay me $2 to play.<br />
<span style="line-height: 1.6;">I flip a coin over and over until it comes up heads.</span><br />
<span style="line-height: 1.6;">Your winnings are the total number of flips.</span></p>
<p>So if the first flip comes up heads, you only get back $1. That’s a net loss of $1. If it comes up tails on the first flip and heads on the second flip, you get back $2, and we’re even. If it comes up tails on the first two flips and then heads on the third flip, you get $3, for a net profit of $1. If it takes more flips than that, your profit is greater.</p>
<p>It’s not quite as obvious in this case, but this would be considered a fair game if each coin flip has an equal chance of heads or tails. That’s because the expected value (or mean) of the number of flips is 2, which matches the $2 price to play. The total number of flips follows a geometric distribution with parameter p = ½, and the expected value is 1/p.</p>
<p style="margin-left: 40px;"><img alt="geometric distribution with p=0.5" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/037562ad542357e18ef0cf1071510f86/geometric_distribution.png" style="width: 576px; height: 384px;" /></p>
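<p>If the 1/p result seems mysterious, a quick numerical check may help. The Python sketch below (function name is my own) adds up the first few terms of the expected-value sum, k times the probability that the first head arrives on flip k, using exact fractions. The partial sums close in on 2.</p>

```python
from fractions import Fraction

p = Fraction(1, 2)  # probability of heads on each flip

def truncated_mean(n_terms):
    """Sum of k * P(first head on flip k) for k = 1..n_terms."""
    return sum(k * (1 - p) ** (k - 1) * p for k in range(1, n_terms + 1))

for n in (1, 2, 5, 10, 20, 50):
    print(n, float(truncated_mean(n)))
```

<p>The gap between the partial sum and 2 shrinks rapidly, so the sum converges to 1/p = 2.</p>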
<p>Now it gets really interesting. What about this next game?</p>
<p style="margin-left: 40px;">You pay me $<em>x</em> to play.<br />
<span style="line-height: 1.6;">I flip a coin over and over until it comes up heads.</span><br />
<span style="line-height: 1.6;">Your winnings start at $1, but double with every flip.</span></p>
<p>This is a lot like the previous game with two important differences. First, I haven’t told you how much you have to pay to play. I just called it <strong>x </strong>dollars. Second, the winnings grow faster with the number of flips. It starts off the same with $1 for one flip and $2 for two flips. But then it goes to $4, then $8, then $16. If the first head comes up on the eighth flip, you win $128. And it just keeps getting better from there.</p>
<p>So what’s a fair price to play this game? Well, let’s consider the expected value of the winnings. It’s $∞. You read that right. It’s infinity dollars! So you shouldn’t be too worried about what x is, right? No matter what price I set, you should be eager to pay it. Right? Right?</p>
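<p>Here is a short Python sketch of why the expected value blows up (the function name is my own). The first head arrives on flip k with probability (1/2)<sup>k</sup> and pays 2<sup>k − 1</sup> dollars, so every possible outcome contributes exactly half a dollar to the expected winnings. Cap the game at any number of flips and the expected value is half that number; remove the cap and it grows without bound.</p>

```python
from fractions import Fraction

def truncated_expected_winnings(max_flips):
    """Expected winnings if outcomes beyond max_flips are ignored."""
    return sum(
        Fraction(1, 2 ** k) * 2 ** (k - 1)   # P(first head on flip k) * payout
        for k in range(1, max_flips + 1)
    )

for cap in (1, 2, 10, 100, 1000):
    print(cap, truncated_expected_winnings(cap))
```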
<p>I’m going to go out on a limb and guess that maybe you would not just let me name my price. Sure, you’d admit that it’s worth more than $2. But would you pay $10? $50? $100,000,000? If the fair price is the expected winnings, then any of these prices should be reasonable. But I’m guessing you would draw the line somewhere short of $100, and maybe even less than $10.</p>
<p>This fascinating little conundrum is known as the Saint Petersburg paradox. Wikipedia tells me <a href="https://en.wikipedia.org/wiki/St._Petersburg_paradox">it goes by that name because</a> it was addressed in the <em>Commentaries of the Imperial Academy of Science of Saint Petersburg</em> back in 1738 by that pioneer of probability theory, Daniel Bernoulli.</p>
<p>The paradox is that while theory tells us that no price is too high to pay to play this game, nobody is willing to pay very much at all to play it.</p>
<p>What’s more, even if you decide what you're willing to pay, you won't find any casinos that even offer this game, because the ultimate outcome is just as unpredictable for the house as it is for the player.</p>
<p>The paradox has been discussed from various angles over the years. One reason I find it so interesting is that it forces me to think carefully about things that are easy to take for granted about the mean of a distribution.</p>
<p>Such as…</p>
<p><strong>The mean is a measure of central tendency.</strong></p>
<p>This is one of the first things we learn in statistics. The mean is in some sense a central value, which the data tends to vary around. It’s the balancing point of the distribution. But when the mean is infinite, this interpretation goes out the window. Now every possible value in the distribution is less than the mean. That’s not very central!</p>
<p><strong>The sample mean approaches the population mean.</strong></p>
<p>One of the most powerful results in statistics is the law of large numbers. Roughly speaking, it tells us that as your sample size grows, you can expect the sample average to approach the mean of the distribution you are sampling from. I think this is a good reason to treat the mean winnings as the fair price for playing the game. If you play repeatedly at the fair price your average profit approaches zero. But here’s the catch: the law of large numbers assumes the mean of the distribution is finite. So we lose one of the key justifications of treating the mean as the fair price when it’s infinite.</p>
<p><strong>The central limit theorem.</strong></p>
<p>Another extremely important result in statistics is <a href="http://blog.minitab.com/blog/rkelly/weight-for-it-a-healthy-application-of-the-central-limit-theorem">the central limit theorem</a>, which I wrote about in a previous blog post. It tells us that the average of a large sample has an approximate normal distribution centered at the population mean, with a standard deviation that shrinks as the sample size grows. But the central limit theorem requires not only a finite mean but a finite standard deviation. I’m sorry to tell you that if the mean of the distribution is infinite, then so is the standard deviation. So not only do we lack a finite mean that our average winnings can gravitate toward, we don’t have a nicely behaved standard deviation to narrow down the variability of our winnings.</p>
<p><span style="line-height: 1.6;">Let’s end by using Minitab to simulate these two games where the payoff is tied to the number of flips until heads comes up. I generated 10,000 random values from the geometric distribution. The two graphs show the running average of the winnings in the two games. In the first case, we have expected winnings of 2, and we see the average stabilizes near 2 pretty quickly. </span></p>
<p style="margin-left: 40px;"><img alt="Time Series Plot of Expected Game Winnings - $2" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/720fad39f34d838883d660c040bc5d7a/time_series_plot_of_expected_game_winnings___2.png" style="border-width: 0px; border-style: solid; width: 576px; height: 384px;" /></p>
<p><span style="line-height: 20.8px;">In the second case, we have infinite expected winnings, and the average does not stabilize.</span></p>
<p style="margin-left: 40px;"><img alt="Time Series Plot of Infinite Expected Game Winnings " src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/5fd506cc3a7500f81dd459724c2c2889/time_series_plot_of_expected_game_winnings____.png" style="border-width: 0px; border-style: solid; width: 576px; height: 384px;" /></p>
<p><span style="line-height: 1.6;">If you'd like to do some simulation on this paradox yourself, here's how to do it in <a href="http://www.minitab.com/products/minitab">Minitab</a>. First, use <strong>Calc > Make Patterned Data > Simple Set of Numbers...</strong> to make a column with the numbers 1 to 10,000. Next, open <strong>Calc > Random Data > Geometric...</strong> to create a separate column of 10,000 random data points from the geometric distribution, using .5 as the Event Probability. </span></p>
<p><span style="line-height: 1.6;">Now we can compute the running average of the random geometric data in C2 with Minitab's Calculator, using the PARS function. PARS is short for “partial sum.” In each row it stores the sum of the data up to and including that row. To get the running average of a game where the expected winnings are $2, divide the partial sums by C1, which just contains the row numbers:</span></p>
<p style="margin-left: 40px;"><span style="line-height: 1.6;"><img alt="calculator with formula for running average" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8946ac7a66ea1bf865128c2a50a1c56a/calculator_1.png" style="width: 433px; height: 383px;" /></span></p>
<p><span style="line-height: 1.6;">The computation for the game with infinite mean is the same, except that the winnings double in value when C2 increases by 1. Therefore, we take the partial sums of 2<sup>(C2 – 1)</sup> instead of just C2, and divide each by C1. That formula is entered in the calculator as shown below: </span></p>
<p style="margin-left: 40px;"><img alt="calculator with running average formula for game with infinite winnings" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d79436e6fdd2e6dc52f3448683e4a868/calculator_2.png" style="width: 433px; height: 383px;" /></p>
<p>Finally, select <strong>Time Series Analysis > Time Series Plot...</strong> and plot the running average of the games with expected winnings of $2 and $<span style="line-height: 20.8px;">∞. </span></p>
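<p>For readers without Minitab at hand, the same simulation can be sketched in Python. This is not the Minitab procedure itself, just an approximation of it under my own assumptions: the seed is arbitrary, the helper name is mine, and <code>itertools.accumulate</code> plays the role of the PARS partial sums.</p>

```python
import random
from itertools import accumulate

def flips_until_heads(rng, p=0.5):
    """Number of flips up to and including the first head."""
    flips = 1
    while rng.random() >= p:   # tails happens with probability 1 - p
        flips += 1
    return flips

rng = random.Random(2016)      # arbitrary seed so the run is repeatable
n = 10_000
rows = range(1, n + 1)                                  # plays the role of C1
flips = [flips_until_heads(rng) for _ in range(n)]      # plays the role of C2

# Running average = partial sum divided by the row number.
running_avg_linear = [s / i for s, i in zip(accumulate(flips), rows)]
running_avg_doubling = [
    s / i for s, i in zip(accumulate(2 ** (f - 1) for f in flips), rows)
]

print("game with mean 2, final running average:  ", running_avg_linear[-1])
print("doubling game, final running average:     ", running_avg_doubling[-1])
print("doubling game, largest single payout seen:", 2 ** (max(flips) - 1))
```

<p>Rerunning with different seeds should leave the first running average close to 2, while the second one jumps around from run to run depending on whether a long streak of tails happened to occur.</p>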
<p>So, how much <em>would </em>you pay to play this game? </p>
Data Analysis, Fun Statistics, Statistics, Stats
Fri, 19 Aug 2016 12:00:00 +0000
Rob Kelly
How to Calculate BX Life, Part 2b: Handling Triangular Matrix Data
http://blog.minitab.com/blog/meredith-griffith/how-to-calculate-bx-life-handling-triangular-matrix-data
<span style="font-size: 13px; line-height: 1.6;">I thought 3 posts would capture all the thoughts I had about B10 Life. That is, until this question appeared on the Minitab LinkedIn group:</span>
<p style="margin-left: 40px;"><img alt="pic1" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/dae6c7b7-fc22-4616-9d65-f04909c20ab1/Image/f06ea25c49405cfc937bbade2c19275c/pic1.jpg" style="width: 572px; height: 103px;" /></p>
<p>In case you missed it, my first post, <a href="http://blog.minitab.com/blog/meredith-griffith/how-to-calculate-b10-life-with-statistical-software">How to Calculate B10 Life with Statistical Software</a>, explains what B10 life is and how Minitab calculates this value. My second post, <a href="http://blog.minitab.com/blog/meredith-griffith/how-to-calculate-bx-life-part-2">How to Calculate BX Life, Part 2</a>, shows how to compute any BX life in Minitab. But before I round out my BX life blog series with rationale for why BX life is one of the best measures for reliability, I thought I’d take this opportunity to address the LinkedIn question—as you might wonder the same thing.</p>
B10 Life and Warranty Analysis
<p>BX Life can be a useful metric for establishing warranty periods for products. Why? Because it indicates the time by which X% of items in a population will have failed. So a manufacturer might set a warranty period after a product’s B10 life, for instance, with the goal of minimizing the number of customers who will <a href="http://blog.minitab.com/blog/understanding-statistics/how-to-predict-warranty-claims">take advantage of the warranty</a> should the product they purchase fail within the warranty period. Naturally, someone doing warranty analysis in Minitab would want to compute this value too! But looking at raw reliability field data, which are recorded in the form of a triangular matrix, it’s not obvious how to compute B10 life!</p>
Warranty Input in Triangular Matrices
<p>It’s common to keep track of reliability field data in the form of number of items shipped and number of items returned from a particular shipment over time. And when several shipments are made at different dates and their corresponding returns noted, the recorded data are in the form of a triangular matrix.</p>
<p>Minitab has a tool that helps you convert shipping and warranty return data from matrix form into a standard reliability data form of failures. </p>
Convert your data from a matrix form for easy analysis!
<p>To demonstrate, let’s start with a new example and new data. If you’d like to follow along and you’re using <a href="http://www.minitab.com/products/minitab/whats-new/">Minitab 17.3</a>, navigate to <strong>Help > Sample Data</strong> and select the Compressor.MTW file.</p>
<p style="margin-left: 40px;"><img alt="pic1" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/dae6c7b7-fc22-4616-9d65-f04909c20ab1/Image/5f379e4e6795ad8d466926a882826e28/pic1.jpg" style="width: 423px; height: 130px;" /></p>
<p><span style="line-height: 1.6;">Here is what the data looks like:</span></p>
<p style="margin-left: 40px;"><img alt="pic3" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/dae6c7b7-fc22-4616-9d65-f04909c20ab1/Image/18a3643f21a50455704b0d8ad52e6fcc/pic3.jpg" style="width: 807px; height: 308px;" /></p>
<p>From here, you can use Minitab’s Pre-Process Warranty Data to reshape your data from triangular matrix format into interval censoring format. Select <strong>Stat > Reliability/Survival > Warranty Analysis > Pre-Process Warranty Data. </strong>For “Shipment (sale) column,” enter <em>Ship. </em>For “Return (failure) columns,” enter <em>Month1-Month12.</em> Click <strong>OK</strong>.</p>
<p style="margin-left: 40px;"><img alt="pic4" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/dae6c7b7-fc22-4616-9d65-f04909c20ab1/Image/4a00473b37bbb068fa2bce70a904c9f9/pic4.jpg" style="width: 543px; height: 363px;" /></p>
<p><span style="line-height: 1.6;">The Pre-Process step creates </span><em style="line-height: 1.6;">Start time, End time, </em><span style="line-height: 1.6;">and </span><em style="line-height: 1.6;">Frequencies </em><span style="line-height: 1.6;">columns in your worksheet! </span></p>
<p style="margin-left: 40px;"><img alt="pic5" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/dae6c7b7-fc22-4616-9d65-f04909c20ab1/Image/29bf052fa69b8cb01c4b2f3e3f348920/pic5.jpg" style="width: 207px; height: 572px;" /></p>
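<p>For readers who want to see the idea behind this reshaping step outside Minitab, here is a minimal Python sketch. It is not Minitab's implementation: the function name <code>preprocess_warranty</code>, the toy numbers, and the conventions (units ship at the start of a period, a return in calendar period j from shipment i has an age between j - i and j - i + 1, and unreturned units are right-censored) are all assumptions made for illustration.</p>

```python
from collections import Counter


def preprocess_warranty(shipped, returns):
    """Reshape a triangular shipment/return matrix into interval-censored rows.

    shipped[i]    -- units shipped at the start of period i (assumed convention)
    returns[i][j] -- units from shipment i returned during calendar period j
                     (entries with j < i are unused and should be 0)

    Returns a list of ((start_time, end_time), frequency) pairs.
    end_time is None for right-censored units that had not failed
    by the end of the last observed period.
    """
    n_periods = len(returns[0])
    freq = Counter()
    for i, (n_shipped, row) in enumerate(zip(shipped, returns)):
        for j in range(i, n_periods):
            if row[j]:
                age = j - i  # whole periods in service before the return period
                freq[(age, age + 1)] += row[j]
        survivors = n_shipped - sum(row[i:])
        if survivors:
            # Still working after n_periods - i periods in service
            freq[(n_periods - i, None)] += survivors
    # Failures first (by start time), then right-censored rows
    return sorted(freq.items(), key=lambda kv: (kv[0][1] is None, kv[0][0]))


# Hypothetical data: 3 shipments observed over 3 periods
shipped = [100, 80, 60]
returns = [
    [2, 3, 1],
    [0, 4, 2],
    [0, 0, 5],
]
for (start, end), count in preprocess_warranty(shipped, returns):
    print(start, end, count)
```

<p>Each output row plays the role of one line of the <em>Start time</em>, <em>End time</em>, and <em>Frequencies</em> columns; the exact time conventions Minitab uses may differ from the ones assumed in this sketch.</p>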
<p><span style="line-height: 1.6;">You can now use these columns to obtain BX life using </span><strong style="line-height: 1.6;">Stat > Reliability/Survival > Distribution Analysis (Arbitrary Censoring) > Parametric Distribution Analysis</strong><span style="line-height: 1.6;">. Enter </span><em style="line-height: 1.6;">Start time</em><span style="line-height: 1.6;"> in “Start variables,” </span><em style="line-height: 1.6;">End time </em><span style="line-height: 1.6;">in “End variables,” and </span><em style="line-height: 1.6;">Frequencies</em><span style="line-height: 1.6;"> in “Frequency columns (optional).” </span><span style="line-height: 1.6;">Also, make sure you have the appropriate assumed distribution selected. We’ll assume the Weibull distribution fits our data.</span></p>
<p style="margin-left: 40px;"><img alt="pic6" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/dae6c7b7-fc22-4616-9d65-f04909c20ab1/Image/dd0362a3f2b69e280eaa9fcd06532ac9/pic6.jpg" style="width: 497px; height: 358px;" /></p>
<p>Click the Estimate button to enter percents to be estimated in addition to what’s provided in the default output (In our case, let’s ask for B15 Life—so enter a 15 in “Estimate percentiles for these additional percents”). </p>
<p style="margin-left: 40px;"><img alt="pic7" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/dae6c7b7-fc22-4616-9d65-f04909c20ab1/Image/6f2426e01509a75b4d39ae9b1780a4aa/pic7.jpg" style="width: 508px; height: 447px;" /></p>
<p><span style="line-height: 1.6;">When we <strong>OK </strong>out of these dialogs, Minitab performs the analysis. Among the output Minitab provides is our handy Table of Percentiles, including our value for B15 life—or the time at which 15% of the items in our population will fail.</span></p>
<p style="margin-left: 40px;"><img alt="pic8" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/dae6c7b7-fc22-4616-9d65-f04909c20ab1/Image/f104d016a2b21c8aec8d9b54f0ac7338/pic8.jpg" style="width: 487px; height: 400px;" /></p>
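<p>If you want to sanity-check a percentile like this by hand, the Weibull case has a closed form: solving 1 - exp(-(t / scale)^shape) = X/100 for t gives the BX life. The sketch below assumes a two-parameter Weibull; the function name <code>weibull_bx_life</code> and the shape and scale values are hypothetical and are not taken from the Compressor output.</p>

```python
import math


def weibull_bx_life(x_percent, shape, scale):
    """Time by which x_percent of units are expected to have failed,
    assuming a two-parameter Weibull(shape, scale) failure-time distribution."""
    if not 0 < x_percent < 100:
        raise ValueError("x_percent must be strictly between 0 and 100")
    p = x_percent / 100.0
    # Invert the Weibull CDF: F(t) = 1 - exp(-(t / scale) ** shape)
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


# Hypothetical parameter estimates, for illustration only
shape, scale = 1.5, 60.0
print("B15 life:", weibull_bx_life(15, shape, scale))
```

<p>In practice you would plug in the shape and scale estimates that Minitab reports for your own data.</p>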
<p><span style="line-height: 1.6;">And there you have it!</span></p>
<p>Collecting warranty data and doing <a href="http://blog.minitab.com/blog/understanding-statistics/how-to-predict-warranty-claims">warranty analysis</a> in Minitab shouldn’t prevent you from using reliability tools and metrics, such as BX life. In fact, letting Minitab reshape your data through the Pre-Process Warranty Data tool only makes your life easier when you dive into your reliability analysis!</p>
<p>Now, I promise, we’re well on our way to rounding out this series of posts, and in my next installment we'll look at the reasons BX life is a good metric to have in your reliability tool belt.</p>
Data Analysis, Quality Improvement, Reliability Analysis, Statistics
Wed, 17 Aug 2016 12:00:00 +0000
http://blog.minitab.com/blog/meredith-griffith/how-to-calculate-bx-life-handling-triangular-matrix-data
Meredith Griffith
Taking a Stratified Sample in Minitab Statistical Software
http://blog.minitab.com/blog/statistics-and-quality-improvement/taking-a-stratified-sample-in-minitab-statistical-software
<p>The Centers for Medicare and Medicaid Services (CMS) updated their star ratings on July 27. It turns out that the list of hospitals provides a great way to look at how easy it is to get random samples from data within Minitab.</p>
<img alt="Roper Hospital in Charleston, South Carolina" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/de5c1ce8073920674f2fec9ac9f5383b/316_calhoun.JPG" style="line-height: 20.8px; width: 275px; height: 258px; float: right; margin: 10px 15px;" />
<p>Say for example, that you wanted to look at the association between the government’s new star ratings and the safety rating scores provided by <a href="http://www.hospitalsafetyscore.org/" target="_blank">hospitalsafetyscore.org</a>. The CMS score is about overall quality, which includes components that aren't explicitly about safety, such as the quality of the communication between patients and doctors.</p>
<p>The safety score judges patient safety, using components like how often patients begin antibiotics before surgery and whether the process by which doctors order medications is reliable.</p>
<p>The CMS score gives out 1 to 5 stars. The safety score gives out A through F grades. The two measures aren't supposed to be duplicates, but it would be interesting to know whether there's an association between being a safer hospital and being a higher-quality hospital.</p>
<p>The government, kindly, provides the ability to <a href="https://data.medicare.gov/Hospital-Compare/Hospital-General-Information/xubh-q36u" target="_blank">download all 4,788 rows of data in their star ratings</a>, but hospitalsafetyscore.org prefers to provide information by location so that potential patients can quickly examine hospitals near them or find a particular hospital. To compare the star ratings and the safety scores, we need both values.</p>
<p>One solution would be to search hospitalsafetyscore.org for the names of all 4,788 hospitals in the government’s database and record all the scores we found. (Though even if we did this, we wouldn't find all of them. For example, hospitals in Maryland aren't required to provide the data hospitalsafetyscore.org uses.) However, searching 4,788 hospitals is time-consuming.</p>
<p>A faster solution is to study the relationship using a sample of the data. We’ll use the government’s star score data as our sampling frame.</p>
A simple random sample
<p>It’s easy to get a simple random sample in Minitab. If you already have the government's star data in Minitab, you can try this (or, you can skip getting it from the government and use <a href="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/File/ecd89f26fb1e3e698220eed8d675f8e1/cms_star_ratings_july27_2016.mtw">this Minitab worksheet version</a> I created):</p>
<ol>
<li>Choose <strong>Calc > Random Data > Sample From Columns</strong>.</li>
<li>In <strong>Number of Rows to Sample</strong>, enter <em>50</em>.</li>
<li>In<strong> From columns</strong>, enter <em>c1-c29</em>. That lets you get all of the information from a row of data into your new sample.</li>
<li>In <strong>Store samples in</strong>, enter <em>c30-c58</em>. Click <strong>OK</strong>.</li>
<li>Copy the column headers from the original data to the sample data.</li>
</ol>
<p>Now you have a sample of 50 hospitals in which each row of the original data set was equally likely to be chosen.</p>
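<p>The same idea can be sketched in a few lines of Python. This is only an illustration, not what Minitab runs internally: the helper name <code>simple_random_sample</code>, the stand-in rows, and the choice to sample without replacement are assumptions.</p>

```python
import random


def simple_random_sample(rows, n, seed=None):
    """Draw n rows without replacement; every row is equally likely to be picked."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(rows, n)


# Hypothetical stand-in for the 4,788-row worksheet: one dict per hospital
hospitals = [{"id": i} for i in range(4788)]
sample = simple_random_sample(hospitals, 50, seed=1)
print(len(sample))
```

<p>Because whole rows are sampled, all of the columns for a chosen hospital travel together, which is the purpose of entering every column in the Minitab dialog.</p>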
A stratified sample
<p>Of course, not every simple random sample that you draw will be representative, especially if your sample is small. For example, in the government’s star rating, only 2.82% of the hospitals that have a star rating achieved 5 stars (102 hospitals). Even worse, nearly 25% of the hospitals in the data don't have a star rating at all (1,171 hospitals).</p>
<p>If we do a hypergeometric probability calculation on a sample of size 50, assuming 102 events in a population of 3,617 rated hospitals, we find that roughly 24% of the random samples we could take would have 0 hospitals that achieved 5 stars. A simple random sample without any 5-star hospitals could tell us about the general association, but wouldn’t give us much information about what safety ratings to expect for hospitals that achieved 5-star rank.</p>
<p>One way to fix the problem would be to take a larger simple random sample. If you take a sample of size 100 instead of a sample of size 50, then the probability that you don’t get any 5-star hospitals is almost down to 5%. Another method would be to modify your sampling scheme to make sure that you get some hospitals with every star rating into your sample. To do that, you break your population down into different groups, or strata. Then you take a simple random sample from each stratum. At the end, you combine your multiple simple random samples to form your final sample.</p>
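<p>The hypergeometric calculation mentioned above is easy to reproduce. The sketch below uses only the counts quoted in the text (102 five-star hospitals among 3,617 rated hospitals); the function name <code>prob_no_events</code> is made up for this example.</p>

```python
from math import comb


def prob_no_events(population, events, sample_size):
    """Hypergeometric probability that a simple random sample drawn without
    replacement contains none of the 'event' items."""
    return comb(population - events, sample_size) / comb(population, sample_size)


for n in (50, 100):
    print(n, round(prob_no_events(3617, 102, n), 3))
```

<p>With these counts the probability comes out at roughly 0.24 for a sample of 50 and a little above 0.05 for a sample of 100.</p>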
<p>The exact way that you determine how many observations to take from each stratum depends on your goals, but let’s say that for this case, we’re going to get 10 hospitals for each star rating. We start by dividing the data:</p>
<ol>
<li>Choose <strong>Data > Split Worksheet</strong>.</li>
<li>In<strong> By variables</strong>, enter <em>‘Hospital overall rating’</em>. Click <strong>OK</strong>.</li>
</ol>
<p>Now, we have separate worksheets with the hospitals that achieved each number of stars. We repeat the simple random-sampling process on each worksheet so that we have a sample of 10 from each ranking.</p>
<p>Now we want to combine those samples from the different star rating data.</p>
<ol>
<li>Choose <strong>Data > Stack Worksheets</strong>.</li>
<li>Move the worksheets with the star rating data from <strong>Available Worksheets</strong> to <strong>Worksheets to stack</strong>.</li>
<li>Name the new worksheet and click <strong>OK</strong>.</li>
</ol>
<p>If you’d like the worksheet to be just your final sample, you can go one step further.</p>
<ol>
<li>Choose <strong>Data > Copy > Columns to Columns</strong>.</li>
<li>In <strong>Copy from Columns</strong>, enter <em>c30-c58</em>.</li>
<li>Name the new worksheet.</li>
<li>Click <strong>Subset the data</strong>.</li>
<li>Select <strong>Rows that match</strong> and click <strong>Condition</strong>.</li>
<li>In <strong>Condition</strong>, enter<em> c42 <>’*’</em>. Click <strong>OK</strong> in all 3 dialog boxes.</li>
</ol>
<p>Now you have a worksheet with 50 hospitals, 10 for each star rating.</p>
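<p>The split, sample, and stack steps above can be sketched as a single function. This is a rough Python equivalent under stated assumptions, not a description of Minitab's internals: the function name <code>stratified_sample</code>, the <code>"rating"</code> key, and the toy data are hypothetical, and hospitals with no rating are represented by <code>None</code>.</p>

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, per_stratum, seed=None):
    """Split rows into strata by rows[key], draw a simple random sample of
    per_stratum rows from each stratum, and stack the results."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        if row[key] is not None:  # skip rows with a missing rating
            strata[row[key]].append(row)
    sample = []
    for level in sorted(strata):
        # random.sample raises ValueError if a stratum has fewer rows than requested
        sample.extend(rng.sample(strata[level], per_stratum))
    return sample


# Toy data: ratings 1 to 5, with every sixth hospital unrated
hospitals = [{"id": i, "rating": (i % 6) or None} for i in range(600)]
sample = stratified_sample(hospitals, "rating", 10, seed=1)
print(len(sample))
```

<p>Equal allocation (10 per stratum) matches the choice made in this post; other goals might call for sampling each stratum in proportion to its size.</p>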
Hospital Data
<p>At hospitalsafetyscore.org, I was able to find safety ratings for 30 of the hospitals in my sample of hospitals with government star ratings. I have a little bit of concern because I was more likely to find safety ratings for hospitals with lower star ratings than for those with higher star ratings, but I did find at least 4 hospitals in each category. Because I'm interested in the relationship between the scores and not in evaluating individual hospitals, I can proceed with my smaller sample size to see if I can get a rough idea about the relationship.</p>
<p>My sample data suggest a relationship between the safety score and the star rating from the government. If we treat the variables as ordinal, the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/tables/other-statistics-and-tests/what-are-spearman-s-rho-and-pearson-s-r-for-ordinal-categories/">Spearman's rho</a> that measures their correlation is about 0.73 and significantly different from 0. We would not expect perfect agreement because the two ratings are intended to measure different constructs. Still, in the stratified sample, we can see that no 1-star hospital achieved a safety score better than a C and that no 5-star hospital had a safety rating less than a B.</p>
<p style="margin-left: 40px;"><img alt="As the overall rating from the government increases, so does the safety score." src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/e9f968ef7ca5c21af1753ff75c959fdc/scatterplot_of_safety_score_vs_hospital_overall_rating.png" style="width: 576px; height: 384px;" /></p>
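<p>For readers curious about what sits behind a Spearman's rho, one common way to compute it is to replace each variable by its ranks (giving tied values their average rank) and then take the ordinary Pearson correlation of the ranks. The sketch below follows that approach; the function names and the small data set are invented for illustration and are not the hospital data from this post.</p>

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical ordinal data: stars 1-5, safety grades coded F=1 ... A=5
stars = [1, 1, 2, 3, 3, 4, 5, 5]
grades = [2, 3, 3, 3, 4, 4, 4, 5]
print(round(spearman_rho(stars, grades), 2))
```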
<p>Ready for more on Minitab? Read about the role Minitab played in helping <a href="http://www.minitab.com/en-us/case-studies/Akron-Childrens-Hospital-Home-Care/">Akron Children's Hospital reduce costs while improving patient care</a>.</p>
<em>The image of Roper-Saint Francis Hospital in Charleston, South Carolina, is by <a href="https://commons.wikimedia.org/wiki/Special:Contributions/ProfReader">ProfReader</a> and is licensed under <a href="https://creativecommons.org/licenses/by-sa/3.0/deed.en">this Creative Commons License</a>.</em>
Data Analysis, Health Care Quality Improvement, Statistics, Statistics in the News
Mon, 15 Aug 2016 12:00:00 +0000
http://blog.minitab.com/blog/statistics-and-quality-improvement/taking-a-stratified-sample-in-minitab-statistical-software
Cody Steele
Process Capability Statistics: Cpk vs. Ppk
http://blog.minitab.com/blog/michelle-paret/process-capability-statistics-cpk-vs-ppk
<p><img alt="" src="https://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/03230c63626f49695aa379570363684b/wham.jpg" style="border-bottom: 1px solid; border-left: 1px solid; margin: 10px; width: 200px; float: right; height: 160px; border-top: 1px solid; border-right: 1px solid" />Back when I used to work in Minitab Tech Support, customers often asked me, “What’s the difference between Cpk and Ppk?” It’s a good question, especially since many practitioners default to using Cpk while overlooking Ppk altogether. It’s like the '80s pop duo Wham!, where Cpk is George Michael and Ppk is that other guy.</p>
<p>Poofy hairdos styled with mousse, shoulder pads, and leg warmers aside, let’s start by defining rational subgroups and then explore the difference between Cpk and Ppk.</p>
<strong>Rational Subgroups</strong>
<p>A rational subgroup is a group of measurements produced under the same set of conditions. <a href="http://www.qualitydigest.com/inside/twitter-ed/importance-proper-spc-subgroup-sampling-technique.html">Subgroups</a> are meant to represent a snapshot of your process. Therefore, the measurements that make up a subgroup should be taken from a similar point in time. For example, if you sample 5 items every hour, your subgroup size would be 5.</p>
<strong>Formulas, Definitions, Etc.</strong>
<p>The goal of capability analysis is to ensure that a process is capable of meeting customer specifications, and we use capability statistics such as Cpk and Ppk to make that assessment. If we look at the formulas for Cpk and Ppk for normal (distribution) process capability, we can see they are nearly identical:</p>
<p><img alt="" src="https://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/23b1ac9902a3e8d40e80afd432e81728/cpk_ppk_formulas_w640.png" style="border-bottom: 2px solid; border-left: 2px solid; width: 500px; height: 139px; border-top: 2px solid; border-right: 2px solid" /></p>
<p>The only difference lies in the denominator for the Upper and Lower statistics: Cpk is calculated using the WITHIN standard deviation, while Ppk uses the OVERALL standard deviation. Without boring you with the details surrounding the formulas for the standard deviations, think of the within standard deviation as the average of the subgroup standard deviations, while the overall standard deviation represents the variation of all the data. This means that:</p>
<strong>Cpk:</strong>
<ul>
<li>Only accounts for the variation WITHIN the subgroups</li>
<li>Does not account for the shift and drift between subgroups</li>
<li>Is sometimes referred to as the <em>potential </em>capability because it represents the potential your process has to produce parts within spec, presuming there is no variation between subgroups (i.e., over time)</li>
</ul>
<strong>Ppk:</strong>
<ul>
<li>Accounts for the OVERALL variation of all measurements taken</li>
<li>Theoretically includes both the variation within subgroups and also the shift and drift between them</li>
<li>Is where you are at the end of the proverbial day</li>
</ul>
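<p>To make the contrast concrete, here is a small Python sketch that computes both statistics from subgrouped data. It is a simplification under stated assumptions rather than a reproduction of Minitab's calculation: the within standard deviation is estimated as a plain pooled standard deviation (no unbiasing constant), and the function name <code>cpk_ppk</code>, the data, and the spec limits are hypothetical.</p>

```python
import math
import statistics


def cpk_ppk(subgroups, lsl, usl):
    """Return (Cpk, Ppk) for normally distributed, subgrouped data.

    Both use min(USL - mean, mean - LSL) / (3 * sigma); they differ only in
    which sigma is used: within-subgroup for Cpk, overall for Ppk.
    """
    all_values = [x for group in subgroups for x in group]
    mean = statistics.fmean(all_values)

    # Overall: ordinary sample standard deviation of all the data
    overall_sd = statistics.stdev(all_values)

    # Within: pooled standard deviation (assumed estimator for this sketch)
    ss_within = sum(
        (x - statistics.fmean(group)) ** 2 for group in subgroups for x in group
    )
    df_within = sum(len(group) - 1 for group in subgroups)
    within_sd = math.sqrt(ss_within / df_within)

    def capability(sd):
        return min(usl - mean, mean - lsl) / (3 * sd)

    return capability(within_sd), capability(overall_sd)


# Hypothetical process that drifts upward between subgroups
subgroups = [[9.9, 10.0, 10.1], [10.9, 11.0, 11.1], [11.9, 12.0, 12.1]]
cpk, ppk = cpk_ppk(subgroups, lsl=8.0, usl=14.0)
print(round(cpk, 2), round(ppk, 2))
```

<p>Because the drift between subgroups shows up only in the overall standard deviation, Cpk comes out far larger than Ppk for data like these.</p>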
Examples of the Difference Between Cpk and Ppk
<p>For illustration, let's consider a <a href="//cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/File/f48e0364d0307b1f9e301d5c855e8ebc/cpk_vs_ppk_data.MTW">data set</a> where 5 measurements were taken every day for 10 days.</p>
<strong>Example 1 - Similar Cpk and Ppk</strong>
<p><img alt="similar Cpk and Ppk" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1e5abc0b6276898999e52ff2e1b36803/similar_cpk_ppk.png" style="width: 744px; height: 175px;" /></p>
<p>As the graph on the left side shows, there is not a lot of shift and drift between subgroups compared to the variation within the subgroups themselves. Therefore, the within and overall standard deviations are similar, which means Cpk and Ppk are similar, too (at 1.13 and 1.07, respectively).</p>
<strong>Example 2 - Different Cpk and Ppk</strong>
<p><img alt="different Cpk and Ppk" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d5ab0c5f12cde5fdc8f046bae771bb2d/different_cpk_ppk.png" style="width: 744px; height: 166px;" /></p>
<p>In this example, I used the same data and subgroup size, but I shifted the data around, moving it into different subgroups. (Of course we would never want to move data into different subgroups in practice – I’ve just done it here to illustrate a point.)</p>
<p>Since we used the same data, the overall standard deviation and Ppk did not change. But that’s where the similarities end.</p>
<p>Look at the Cpk statistic. It’s 3.69, which is much better than the 1.13 we got before. Looking at the subgroups plot, can you tell why Cpk increased? The graph shows that the points within each subgroup are much closer together than before. Earlier I mentioned that we can think of the within standard deviation as the average of the subgroup standard deviations. So less variability <em>within </em>each subgroup equals a smaller within standard deviation. And that gives us a higher Cpk.</p>
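<p>The effect of regrouping can be mimicked with made-up numbers. In the sketch below, the same 20 hypothetical values are split into subgroups of 5 in two ways: in their original order, and after sorting so that similar values land in the same subgroup. The pooled within standard deviation used here is an assumption for illustration, not necessarily the estimator Minitab uses.</p>

```python
import math
import statistics


def pooled_within_sd(subgroups):
    """Pooled standard deviation across subgroups (within-subgroup variation only)."""
    ss = sum((x - statistics.fmean(g)) ** 2 for g in subgroups for x in g)
    df = sum(len(g) - 1 for g in subgroups)
    return math.sqrt(ss / df)


def chunk(values, size):
    """Split a list into consecutive subgroups of the given size."""
    return [values[i:i + size] for i in range(0, len(values), size)]


# Hypothetical measurements, deliberately scrambled
values = [10 + ((7 * i) % 10) / 10 for i in range(20)]

original = chunk(values, 5)           # values as they arrived
regrouped = chunk(sorted(values), 5)  # same values, similar ones grouped together

for name, groups in (("original", original), ("regrouped", regrouped)):
    flat = [x for g in groups for x in g]
    print(name, round(pooled_within_sd(groups), 3), round(statistics.stdev(flat), 3))
```

<p>The overall standard deviation is the same for both groupings, so Ppk would not move, while the smaller within standard deviation of the regrouped data would inflate Cpk.</p>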
<strong>To Ppk or Not to Ppk</strong>
<p>And here is where the danger lies in only reporting Cpk and forgetting about Ppk like it’s George Michael’s lesser-known bandmate (no offense to whoever he may be). We can see from the examples above that Cpk only tells us part of the story, so the next time you <span><a href="http://blog.minitab.com/blog/marilyn-wheatleys-blog/high-cpk-and-a-funny-looking-histogram-is-my-process-really-that-amazing">examine process capability</a></span>, consider both your Cpk and your Ppk. And if the process is stable with little variation over time, the two statistics should be about the same anyway.</p>
<p>(Note: It is possible, and okay, to get a Ppk that is larger than Cpk, especially with a subgroup size of 1, but I’ll leave explanation for another day.)</p>
Capability Analysis, Quality Improvement
Fri, 12 Aug 2016 12:00:00 +0000
http://blog.minitab.com/blog/michelle-paret/process-capability-statistics-cpk-vs-ppk
Michelle Paret
When Should You Mistrust Statistics?
http://blog.minitab.com/blog/understanding-statistics/when-should-you-mistrust-statistics
<p>Figures lie, so they say, and liars figure. A recent post at Ben Orlin's always-amusing mathwithbaddrawings.com blog nicely encapsulates why so many people feel wary about <em>anything</em> related to statistics and data analysis. Do <a href="http://mathwithbaddrawings.com/2016/07/13/why-not-to-trust-statistics/" target="_blank">take a moment to check it out</a>; it's a fast read.</p>
<p><img alt="ask about the mean" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8fc29ffef41c87b6f279a8092ae09a0d/ask_about_the_mean.gif" style="margin: 10px 15px; float: right; width: 238px; height: 273px;" />In all of the scenarios Orlin offers in his post, the statistical statements are completely accurate, but the person offering the statistics is committing a lie of omission by not putting the statement in context. Holding back critical information prevents an audience from making an accurate assessment of the situation.</p>
<p>Ethical data analysts know better.</p>
<p>Unfortunately, unethical data analysts know <a href="https://www.theguardian.com/science/2016/jul/17/politicians-dodgy-statistics-tricks-guide">how to spin outcomes</a> to put them in the most flattering, if not the most direct, light. Done deliberately, that's the sort of behavior that leads many people to mistrust statistics completely.</p>
Lessons for People Who Consume Statistics
<p>So, where does this leave us as consumers of statistics? <em>Should </em>we mistrust statistics? The first question to ask is whether we trust the people who deliver statistical pronouncements. I believe most people try to do the right thing.</p>
<p>However, we all know that it's easy—all <em>too </em>easy—for humans to <a href="http://blog.minitab.com/blog/understanding-statistics/lessons-from-a-statistical-analysis-gone-wrong-part-3-v2">make mistakes</a>. And since statistics can be confusing, and not everyone who wants or needs to analyze data is a trained statistician, great potential exists for erroneous conclusions and interpretive blunders.</p>
<p>Bottom line: whether their intentions are good or bad, people often cite statistics in ways that may be statistically correct, but practically misleading. So how can you avoid getting fooled?</p>
<p>The solution is simple, and it's one most statisticians internalized long ago, but doesn't necessarily occur to people who haven't spent much time in the data trenches:</p>
<p><strong><em>Always </em>look at the underlying distribution of the data. </strong></p>
<p>Especially if the statistic in question pertains to something extremely important to you—like mean salary at your company, for example—ask about the distribution of the data if those details aren't volunteered. If you're told the mean or median as a number, are you also given a <a href="http://blog.minitab.com/blog/michelle-paret/using-the-mean-its-not-always-a-slam-dunk">histogram, boxplot, or individual value plot that lets you see how the data are arranged</a>? My colleague Michelle Paret wrote an excellent post about this. </p>
<p>If someone is trying to keep the distribution of the data a mystery, then the ultimate <em>meaning </em>of parameters like mean, median, or mode is also unknown...and your mistrust is warranted.</p>
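<p>A tiny example shows why a single summary number can mislead. The salaries below are invented for illustration (in thousands of dollars), with one very large value standing in for a highly paid executive.</p>

```python
import statistics

# Hypothetical salaries in thousands of dollars; one extreme value at the end
salaries = [40, 42, 45, 47, 50, 52, 55, 400]

print("mean:  ", statistics.mean(salaries))
print("median:", statistics.median(salaries))
```

<p>Seven of the eight people earn far less than the mean, which is exactly what a histogram or individual value plot of these data would make obvious.</p>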
Lessons for People Who Produce Statistics
<p>As purveyors and producers of statistics, who need to communicate results with people who aren't statistically savvy, what lessons can we take from this? After reading the Math with Bad Drawings blog, I thought about it and came up with two rules of thumb.</p>
<p><strong>1. Don't use statistics to obscure or deflect attention from a situation.</strong></p>
<p>Most people do not deliberately set out to distort the truth or mislead others. <span style="line-height: 20.8px;">Most people would never use the mean to support one conclusion when they know the median supports a far different story.</span><span style="line-height: 20.8px;"> </span><span style="line-height: 20.8px;">Our conscience rebels when we set out to deceive others.</span><span style="line-height: 20.8px;"> </span><span style="line-height: 1.6;">I'm usually willing to ascribe even </span><a href="http://blog.minitab.com/blog/understanding-statistics/imprisoned-by-statistics%3A-how-poor-data-collection-and-analysis-convicted-an-innocent-nurse" style="line-height: 1.6;">the most horrendous analysis</a><span style="line-height: 1.6;"> to gross incompetence rather than outright malice. On the other hand, I've read far too many papers and reports that torture language to </span><a href="http://blog.minitab.com/blog/understanding-statistics/what-can-you-say-when-your-p-value-is-greater-than-005" style="line-height: 1.6;">mischaracterize statistical findings</a><span style="line-height: 1.6;">.</span></p>
<p>Sometimes we don't get the outcomes we expected. <span style="line-height: 20.8px;">Statisticians aren't responsible for what the data show—but we are responsible for making sure we've performed appropriate analyses, satisfied checks and assumptions, and that we have </span><a href="http://blog.minitab.com/blog/understanding-statistics/the-single-most-important-question-in-every-statistical-analysis" style="line-height: 20.8px;">trustworthy data</a><span style="line-height: 20.8px;">. </span><span style="line-height: 1.6;">It should go without saying that we are ethically compelled to report our results honestly, and... </span></p>
<p><strong>2. Provide all of the information the audience needs to make informed decisions.</strong></p>
<p>When we present the results of an analysis, we need to be thorough. We need to offer all of the information and context that will enable our audience to reach confident conclusions. We need to use straightforward language that helps people tune in, and avoid jargon that makes listeners turn off.</p>
<p>That doesn't mean that every presentation we make needs to be laden with formulas and extended explanations of probability theory; often the bottom line is all a situation requires. When you're addressing experts, you don't need to cover the introductory material. But if we suspect an audience needs some background to fully appreciate the results of an analysis, we should provide it. </p>
<p>There are many approaches to communicating statistical results clearly. One of the easiest ways to present the full context of an analysis in plain language is to use <a href="http://www.minitab.com/products/minitab/assistant/">the Assistant in Minitab</a>. As many expert statisticians have told us, the Assistant doesn't just guide you through an analysis, it also explains the output thoroughly and without resorting to jargon.</p>
<p>And when statistics are clear, they're easier to trust.</p>
<p> </p>
<p><em>Bad drawing by Ben Orlin, via <a href="http://www.mathwithbaddrawings.com">www.mathwithbaddrawings.com</a></em></p>
<p> </p>
Statistics, Statistics in the News, Stats
Wed, 10 Aug 2016 12:00:00 +0000
http://blog.minitab.com/blog/understanding-statistics/when-should-you-mistrust-statistics
Eston Martz
Correlation: What It Shows You (and What It Doesn't)
http://blog.minitab.com/blog/starting-out-with-statistical-software/correlation%3A-what-it-shows-you-and-what-it-doesnt
<p><span style="line-height: 1.6;">Often, when we start analyzing new data, one of the very first things we look at is whether certain pairs of variables are correlated. </span><span style="line-height: 20.8px;">Correlation can tell if two variables have a linear relationship, and the strength of that relationship. </span><span style="line-height: 1.6;">This makes sense as a starting point, since we're usually looking for relationships and correlation is an easy way to get a quick handle on the data set we're working with. </span></p>
<div style="float: right; width: 405px; margin: 15px 0px 15px 20px;"><img alt="sharks and ice cream" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6c9a2d97fb95cc34172430f71ae718ab/sharks_ice_cream.jpg" style="width: 400px; height: 216px; margin: 10px 15px; float: right;" /> <em>We'll talk about the correlation between these two factors later in the post.</em></div>
<span style="line-height: 1.6;">What Is Correlation?</span>
<p><span style="line-height: 1.6;">How do we define <span><a href="http://blog.minitab.com/blog/understanding-statistics/no-matter-how-strong-correlation-still-doesnt-imply-causation">correlation</a></span>? </span><span style="line-height: 20.8px;">We can think of it in terms of a simple question: when X increases, what does Y tend to do? In general, if Y tends to increase along with X, there's a positive relationship. If Y decreases as X increases, that's a negative relationship. </span></p>
<p><span style="line-height: 1.6;">Correlation is defined numerically by a <em>correlation coefficient</em>. This value ranges from -1 to 1. A coefficient of -1 is perfect negative linear correlation: a straight line trending downward. A +1 coefficient is, conversely, perfect positive linear correlation. A correlation of 0 is no linear correlation at all. </span></p>
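<p>The post does not spell out a formula. Assuming the usual Pearson sample correlation coefficient is meant, it can be written for n paired observations (x_i, y_i) with sample means x̄ and ȳ as:</p>

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```

<p>The numerator is positive when X and Y tend to be above or below their means together, and negative when one tends to be high while the other is low, which matches the verbal description above.</p>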
<p><span style="line-height: 1.6;">Making a scatterplot in <a href="http://www.minitab.com/products/minitab/">Minitab</a> can give you a quick visualization of the correlation between variables, and you can get the correlation coefficient by going to <strong>Stat > Basic Statistics > Correlation...</strong> Here are a few examples of data sets that a correlation coefficient can accurately assess. </span></p>
<p style="margin-left: 40px;"><span style="line-height: 1.6;"><img alt="+ corr" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/732ead34-1005-4470-b034-d7f8b87fabcf/Image/fa27c0ba9cbc4d5bf213c273cd109ca4/scatterplot_of_c3_vs_c4.jpg" style="width: 800px; height: 533px;" /></span></p>
<p>This graph shows a positive correlation of 0.7, which is fairly close to 1. As you can see from the scatterplot, it's a fairly strong linear relationship. As the values of X tend to increase, Y tends to increase as well. Below is a similar plot, but here the relationship shows a negative direction.</p>
<p style="margin-left: 40px;"><img alt="nega" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/732ead34-1005-4470-b034-d7f8b87fabcf/Image/5ffb305c6102329bb03e6c7fb06982b5/negative.jpg" style="width: 800px; height: 533px;" /></p>
Correlation's Limits
<p>However, there are some drawbacks and limitations to simple linear correlation. A correlation coefficient can only tell whether your two variables have a linear relationship. Take, for example, the following chart, which has a correlation coefficient of about 0; we can pretty easily see that there isn't much of a relationship at all:</p>
<p style="margin-left: 40px;"><img alt="norel" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/732ead34-1005-4470-b034-d7f8b87fabcf/Image/35b15c026f4a692279663cdc28d0014c/no_relat.jpg" style="width: 800px; height: 533px;" /></p>
<p>However, now take a look at this graph, in which there is an obvious relationship, but not a linear one. Notice that the correlation coefficient is also 0 in this case:</p>
<p style="margin-left: 40px;"><img alt="nonl" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/732ead34-1005-4470-b034-d7f8b87fabcf/Image/903ab33f31dc3f4bd3f2fc71b6adc5f5/nonline.jpg" style="width: 800px; height: 533px;" /></p>
<p>This is what you have to keep in mind when interpreting correlations. <span style="line-height: 20.8px;">The correlation coefficient will only detect </span><em style="line-height: 20.8px;">linear</em><span style="line-height: 20.8px;"> relationships. </span><span style="line-height: 1.6;">Just because the correlation coefficient is near 0, it doesn't mean that there isn't </span><em style="line-height: 1.6;">some</em><span style="line-height: 1.6;"> type of relationship there. </span></p>
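<p>The point about nonlinear relationships is easy to check numerically. The sketch below uses a made-up data set where Y is exactly X squared, so Y is completely determined by X, yet the linear correlation coefficient comes out at zero. The helper <code>pearson_r</code> is a hypothetical name for the standard Pearson formula.</p>

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient (textbook formula)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfect parabola, symmetric around x = 0.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(f"r = {r:.3f}")  # the relationship is exact, but it is not linear
```

<p>The positive and negative halves of the parabola cancel each other out in the cross-product sum, which is why a scatterplot is worth checking before trusting a near-zero coefficient.</p>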
<p>The other thing to remember is something most of us hear soon after we begin exploring data—that correlation does not imply causation. Just because X and Y are correlated in some way does not mean that X <em>causes</em> a change in Y, or vice versa.</p>
<p>Here's my favorite example for this. If we look at two variables, shark attacks and ice cream sales, we know intuitively that there's no way one variable has a cause-and-effect impact on the other. However, both shark attacks and ice cream sales will have greater numbers in summer months, so they will be strongly correlated with each other. Be careful not to fall into this trap with your data!</p>
<p>Correlation has a lot of benefits, and it is still a good starting point in a number of different cases, but it's important to know its limitations as well. </p>
Data AnalysisStatisticsStatistics HelpStatsMon, 08 Aug 2016 12:08:00 +0000http://blog.minitab.com/blog/starting-out-with-statistical-software/correlation%3A-what-it-shows-you-and-what-it-doesntEric HeckmanAnalyzing the History of Olympic Events with Time Series
http://blog.minitab.com/blog/the-statistics-game/analyzing-the-history-of-olympic-events-with-time-series
<p><img alt="Olympics" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/a8081eeca606f0a3351825d49270d062/olympics.jpg" style="width: 320px; height: 226px; float: right;" />The Olympic games are about to begin in Rio de Janeiro. Over the next 16 days, more than 11,000 athletes from 206 countries will be competing in 306 different events. That's the most events ever in any Olympic games. It's almost twice as many events as there were 50 years ago, and exactly three times as many as there were 100 years ago.</p>
<p>Since the number of Olympic events has changed over time, this makes it a great data set for a <a href="http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/time-series/basics/what-is-a-time-series/" target="_blank">time series analysis</a>.</p>
<p>A time series is a sequence of observations over regularly spaced intervals of time. The first step when analyzing time series data is to create a time series plot to look for trends and seasonality. A trend is a long-term tendency of a series to increase or decrease. Seasonality is the periodic fluctuation in the time series within a certain period—for example, sales for a store might increase every year in November and December. Here is a time series plot of the number of Olympic events since 1896.</p>
<p style="margin-left: 40px;"><img alt="Time Series Plot" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/51ef7f79b2b625a967eaa72a402c437a/histogram_of_olympic_events.jpg" style="width: 576px; height: 384px;" /></p>
<p>There is clearly an upward trend, but no seasonal pattern. The data is also a little choppy at the beginning. Part of the explanation is that the data points are not evenly spaced. Most Olympic games are 4 years apart, but a few of them are just 2 years apart, and during World War I and World War II there were 8-year and 12-year gaps, respectively. Since <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/time-series-plots-theres-gold-in-them-thar-hills">time series data</a> should be evenly spaced over time, we'll only look at data from 1948 on, when the Olympics started being held every 4 years without any interruptions.</p>
<p style="margin-left: 40px;"><img alt="Time Series Plot" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/f77b40bf156b5fc850fc6b96881d8808/time_series_plot_of_olympic_events.jpg" style="width: 576px; height: 384px;" /></p>
<p>Now that we have an evenly spaced series that clearly exhibits a trend, we can use a trend analysis in Minitab Statistical Software to model the data. With a trend analysis, you can use four different types of models: linear, quadratic, exponential growth, and s-curve. We'll analyze our data using both the linear and s-curve models. An additional time series analysis you can use when your data exhibit a trend is double exponential smoothing, so we'll use that method too. </p>
<p style="margin-left: 40px;"><img alt="Trend Analysis" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/fcf72a68cbdcce5a5a6284f5e5581a48/trend_analysis_linear.jpg" style="width: 576px; height: 384px;" /></p>
<p style="margin-left: 40px;"><img alt="Trend Analysis" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/dc6786e4cf82be102d53d42eba48c2fd/trend_analysis_s_curve.jpg" style="width: 576px; height: 384px;" /></p>
<p style="margin-left: 40px;"><img alt="Double Exponential Smoothing" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/a1f20460303b913fcb75a67b41b05eea/double_exponential_smoothing_plot_for_olympic_events.jpg" style="width: 576px; height: 384px;" /></p>
<p>You can use the accuracy measures (MAPE, MAD, MSD) to compare the fits of different time series models. For all three of these statistics, smaller values usually indicate a better-fitting model. If a single model does not have the lowest values for all three statistics, MAPE is usually the preferred measurement.</p>
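<p>For readers who want to see what goes into these three numbers, here is a rough sketch using their usual definitions: MAPE is the average of |actual - fitted| / |actual| times 100, MAD is the average absolute error, and MSD is the average squared error. The series and the two sets of fitted values below are invented for illustration and are not the Olympic data.</p>

```python
def mape(actual, fitted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, fitted)) / len(actual)

def mad(actual, fitted):
    """Mean absolute deviation, in the units of the data."""
    return sum(abs(a - f) for a, f in zip(actual, fitted)) / len(actual)

def msd(actual, fitted):
    """Mean squared deviation; squaring penalizes large misses more heavily."""
    return sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual)

# Hypothetical series and fitted values from two candidate models.
actual = [100, 110, 125, 130, 150]
fits = {
    "model_1": [98, 112, 120, 135, 148],
    "model_2": [105, 105, 125, 125, 150],
}

for name, fitted in fits.items():
    print(name,
          f"MAPE={mape(actual, fitted):.2f}",
          f"MAD={mad(actual, fitted):.2f}",
          f"MSD={msd(actual, fitted):.2f}")
```

<p>In this made-up case the measures disagree: one model has the lower MAPE and MSD while the other has the lower MAD, which is exactly the situation where a tie-breaking guideline comes into play.</p>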
<p>For the time series of Olympic event data, the s-curve model has the lowest values of MAPE and MAD, while the double exponential smoothing method has the lowest value for MSD. Based on the "MAPE breaks all ties" guideline, it appears that the s-curve model is the one we want to use.</p>
<p>However, accuracy measures shouldn't be the sole criterion you use to select a model. It's also important to examine the fit of the model, especially at the end of the series. And if the last 5 Olympics are any indication, it appears that the trend of adding large quantities of events to the Olympic Games is coming to an end. In the last 16 years, only 6 events have been added.</p>
<p>The double exponential smoothing model appears to have adjusted for this change, whereas the two trend analysis models have not. Given this additional consideration, the double exponential smoothing model is the one we should pick, especially if we want to use it to forecast future observations.</p>
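<p>To see why double exponential smoothing can react to a flattening series, here is a minimal sketch of Holt's linear method, the standard form of double exponential smoothing. Everything specific here is an assumption for illustration: the function name <code>holt_forecast</code>, the simple starting values (first observation for the level, first difference for the trend), the weights of 0.5, and the invented series. Minitab chooses its starting values and optimal weights differently, so its numbers would not match these.</p>

```python
def holt_forecast(series, alpha, gamma, horizon):
    """Holt's linear (double) exponential smoothing.

    alpha weights the level update, gamma weights the trend update.
    Returns forecasts for 1..horizon periods past the end of the series.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]               # assumed starting level
    trend = series[1] - series[0]   # assumed starting trend
    for y in series[1:]:
        prev_level = level
        # New level: blend the observation with the one-step-ahead prediction.
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # New trend: blend the latest change in level with the old trend.
        trend = gamma * (level - prev_level) + (1 - gamma) * trend
    return [level + h * trend for h in range(1, horizon + 1)]

# Hypothetical series: grows by 20 per period, then nearly levels off.
series = [100, 120, 140, 160, 162, 163, 164]
forecasts = holt_forecast(series, alpha=0.5, gamma=0.5, horizon=2)
print(forecasts)
print("per-period growth in forecast:", forecasts[1] - forecasts[0])
```

<p>Because the trend estimate is updated at every step, the forecast growth per period ends up well below the early slope of 20, whereas a single straight line fitted to the whole series would keep projecting the average slope forward.</p>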
<p>And now that we've settled on a model, we can sit back, relax, and watch all 918 medals be won. Let the games begin!</p>
Fun StatisticsStatisticsStatistics in the NewsFri, 05 Aug 2016 12:00:00 +0000http://blog.minitab.com/blog/the-statistics-game/analyzing-the-history-of-olympic-events-with-time-seriesKevin RudyAnalyzing the Jaywalking Habits of New England Wildlife
http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/analyzing-the-jaywalking-habits-of-new-england-wildlife
<p>My recent beach vacation began with the kind of unfortunate incident that we all dread: killing a distant relative. </p>
<p>It was about 3 a.m. My two sons, our dog, and I had been on the road since about 7 p.m. the previous day to get to our beach house on Plum Island, Massachusetts. Google Maps said our exit was coming up and that we were only about 15 minutes away from our palace. Buoyed by that projection, I sat a little taller in my seat.</p>
<p><em>Is that the salty sea air filling my nostrils?</em> I thought to myself. <em style="line-height: 1.6;">Is that a refreshing ocean breeze cooling the air?</em></p>
<p>And then:</p>
<p><em>Is that a f—</em><strong>thumpity bump bump bump</strong>—<em>ox that just disappeared under my car?!<img alt="fox" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/575d15b5218b039455c7f2c8aac31cf8/fox.jpg" style="width: 300px; height: 232px; margin: 10px 15px; float: right;" /></em></p>
<p>"I think that was a fox, dad." My son answered my question before I could ask it.</p>
<p>"That's what I thought, too. Darn. Kind of ironic. And not in a good way."</p>
<p>"Yeah, way to go, dad," my other son added. </p>
<p>Everyone's a critic.</p>
<p>The irony is that my last name is Fox. And I've always kind of identified with the handsome, intelligent, and resourceful creatures. I couldn't feel too bad though; there was nothing I could have done about it. The poor critter had been sprinting across the highway. No sooner had its small frame popped into the glow of my headlamps than it had disappeared into the empty void under our feet. Oh well, at least for him it was over quickly. And at least we were almost to the beach. </p>
<p>Before I could ponder this potential omen too long, we came to our exit. There, in the middle of the exit ramp, were the 2-dimensional remains of what looked to have been, in life, another fox. Apparently, we were traveling through an area of dense foxes. By which I mean that there was a high density of foxes in the area, not that the foxes in the area were highly dense. Although, truth be told, I was feeling a little dense myself at the time. Did I mention that it was now 3 a.m.? </p>
<p>We continued onto a back-country road that Google Maps promised would lead us to our beach house. <em>Is that a marsh off to the left just up ahead?</em> I thought to myself. <em><span style="line-height: 1.6;">Are those sea grasses waving in the gentle breeze?</span></em></p>
<p><em>Is that the oil light glowing on my console?</em></p>
<p>"Oh crap."</p>
<p>Stepping out of the car, I noticed the smells of sea air and motor oil mingling with <span style="line-height: 1.6;">the scent of the forest. I had hoped that the warning light was just an electrical glitch. However, a casual inspection confirmed that the oil that should be inside the engine had been working its way outside of the engine, where it is considerably less effective. I was reminded of the words of a noted transportation engineer, "If I push 'er any further cap'n, the engine's gonna blow!" </span></p>
<p>This sentiment was echoed by the tow truck driver as well. As he descended from his cab to assess the scene, he exclaimed, "You left quite a trail. Can't be much oil left in that engine." I told him what had happened. He scratched his chin and asked, "Did you say a fox? That's funny because I towed a customer last week who hit a fox in a rental car. Busted the oil filter. What do you know?" </p>
<p>As we stood by the side of the road waiting for our taxi, dawn's first light broke slowly over the marsh, the birds began singing to greet the new day, and the mosquitoes worked persistently to move sizable quantities of blood from inside of our bodies to outside of our bodies. Where it is considerably less effective. Even so, it was kind of a nice moment. Moved by a surprising sense of peace, I turned to my sons.</p>
<p>"I think I know what this all means. I think that perhaps my spirit animal appeared in physical form to test me. To remind me that—to a large extent—happiness is a choice. And if I allow circumstance to rob me of my happiness, that, too, is a choice."</p>
<p>"Spirit animal, huh?" As he spoke I could actually hear my son's eyes rolling back in his head.</p>
<p>My other son chimed in, "If he wasn't a spirit before, he is now."</p>
<p>Everyone's a critic. </p>
<p>The rest of our vacation went swimmingly. (Pun intended.) In the end, the momentary hassle and added expense of the incident didn't detract at all from our enjoyment of the trip. However, I was curious about the confluence of jaywalking wildlife, so I started doing a little research and learned that some states are actively collecting data on such accidents. I found that Massachusetts has a web page where you can report animal collisions, so I contributed my data for the cause.</p>
<p>I also found out that California and Maine actually enlist and train "<a href="http://blog.minitab.com/blog/statistics-and-quality-improvement/how-to-analyze-like-a-citizen-data-scientist-in-flint">citizen scientists</a>" to peruse roadways in a coordinated effort to determine where animals are most frequently hit, and what kinds of animals are hit in each location. This is important data, because animal crossings represent a significant hazard to motorists and wildlife alike. Knowing what kinds of animals are frequently hit in different locations can help authorities focus efforts to introduce culverts, bridges, and other means of safe passage for critters so they can get where they need to go safely, without venturing onto the black top.</p>
<p>You can read the details of a three-year Maine study and explore an interactive map on the <a href="http://maineaudubon.org/wildlife-habitat/wildlife-road-watch/" target="_blank">Maine Audubon web site</a>. I thought it might be interesting to create a few graphs in <a href="http://www.minitab.com/products/minitab/">Minitab Statistical Software</a> to bring the roadkill data to life, so to speak. (Pun intended. Ill-advised, perhaps, but intended.) </p>
<p><span style="line-height: 1.6;">The first thing I noticed was that collisions with foxes are definitely not that unusual. The following bar chart shows the number of each species found during the data collection. </span></p>
<p style="margin-left: 40px;"><img alt="Bar chart of counts by species" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/6b21e6f4c1867d659d460ed5d3436843/bar_chart_species.jpg" style="width: 501px; height: 335px;" /></p>
<p>The web site also gives data for whether the animals found during data collection were alive or dead. As this stacked bar chart makes clear, animals with wings fare much better than earthbound critters when they encounter an automobile.</p>
<p style="margin-left: 40px;"><img alt="Stacked bar chart by animal group" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/f8453349d49984a2ea5790e6aa15e997/stacked_bar_chart_by_group.jpg" style="width: 450px; height: 301px;" /></p>
<p>The same trend is clear in this pie chart. The red slice in each pie shows the proportion of animals that survived the encounter. For birds, the red slice is much bigger than the blue slice. </p>
<p style="margin-left: 40px;"><img alt="Pie chart of dead vs. live by group" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/31109f9f0bc72820a1c5b80eece285e9/pie_chat_dead_by_grp.jpg" style="width: 519px; height: 347px;" /></p>
<p>Next time I encounter a spirit animal, or any animal on the road, I hope it has wings. </p>
Data AnalysisStatisticsWed, 03 Aug 2016 12:00:00 +0000http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/analyzing-the-jaywalking-habits-of-new-england-wildlifeGreg FoxHave You Accidentally Done Statistics?
http://blog.minitab.com/blog/statistics-and-quality/have-you-accidentally-done-statistics
<p>Have you ever accidentally done statistics? Not all of us can (or would want to) be “stat nerds,” but the word “statistics” shouldn’t be scary. In fact, we all analyze things that happen to us every day. Sometimes we don’t realize that we are compiling data and analyzing it, but that’s exactly what we are doing. Yes, there are advanced statistical concepts that can be difficult to understand—but there are many concepts that we use every day that we don’t realize are statistics.</p>
<p>I consider myself a student of baseball, so my example of unknowingly performing statistical procedures concerns my own experiences playing that game.</p>
<p>My baseball career ended as a 5’7” college freshman walk-on. When I realized that my ceiling as a catcher was a lot lower than that of my 6’0”-6’5” teammates, I hung up my spikes. As an adult, while finishing my degree in Business Statistics, I had the opportunity to shadow a couple of scouts from the Major League Baseball Scouting Bureau. Yes, I’ve seen <a href="http://blog.minitab.com/blog/the-statistics-game/moneyball-shows-the-power-of-statistics"><em>Moneyball </em></a>and I know that traditional scouting methods are reputed to conflict with the methods of stat nerds like myself, but as a former player I wanted to see what these scouts were looking at. </p>
<p><img alt="baseball statistics" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/076e1f8132a222e6204e393eb0d3e9a2/baseball_stats.jpg" style="width: 278px; height: 313px; margin: 10px 15px; float: right;" />My first day with the scouts I found out they were traditional baseball guys. They didn’t believe data could tell how good a player is better than observation could, and ultimately they didn't think statistics were important to what they do. </p>
<p>I found their thinking to be a little off, and a little funny. Although they didn’t believe in statistics, the tools they use for their jobs actually quantify a player's attributes. I watched as they used a radar gun to measure pitch speed, a stopwatch to measure running speed, and a notepad to record their measurements (they didn’t realize they were compiling data). As one of the scouts was conversing with me, asking how statistics are going to be brought into baseball, he was making a dot plot by hand of the pitcher's pitches by speed to find the velocity distribution of the pitcher.</p>
<p style="margin-left: 40px;"><img height="343" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/b51a0c86-e2dd-456e-878a-4196c7381c3a/File/8361f15f80b379a88187b539c124cad0/8361f15f80b379a88187b539c124cad0.png" width="514" /></p>
<p>After I explained to him that he was unknowingly creating a dot plot (like the one I created for Raisel Iglesias using Minitab, and which has a <a href="http://support.minitab.com/minitab/17/topic-library/basic-statistics-and-graphs/summary-statistics/measures-of-central-tendency/">bimodal distribution</a>), we started talking about grading players’ skills. The scouts would grade how players hit, their power, how they run, arm strength, and fielding ability. They used a numeric grading system from 20-80 for each of the characteristics, with 20 being the lowest, 50 being average, and 80 being elite. After they compiled this data they would give the players grades through analysis, and they would create a report with these grades to convey to others what they saw in the player.</p>
<p style="margin-left: 40px;"><img height="401" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/b51a0c86-e2dd-456e-878a-4196c7381c3a/File/a57bd643816872de2fee895f303c0ddc/a57bd643816872de2fee895f303c0ddc.png" width="602" /></p>
<p>I was amazed at how these scouts—true, old-school baseball guys who said stats weren’t important for their jobs—were compiling data and analyzing it for their reports. </p>
<p>A few of the other statistical ideas the scouts were (accidentally) concerned about included the sample size of observations of a player, comparison analysis, and predicting where a player falls within their physical development (regression).</p>
<p>Like the baseball scouts, many of us are unwittingly doing statistics. Just like these scouts, we run into data all day long without recognizing that we can compile and analyze it. In work we worry about customer satisfaction, wait time, average transaction value, cost ratios, efficiency, etc. And while many people get intimidated when we use the word "statistics," we don’t need advanced degrees to embrace observing, compiling data, and making solid decisions based on our analysis.</p>
<p>So, are <em>you </em>accidentally doing statistics? If you want to get beyond accidentally doing statistics and analyze a little more deliberately, Minitab has many tools, like the <a href="http://www.minitab.com/products/minitab/assistant/">Assistant menu</a> and Stat Guide, to help you on your stats journey.</p>
Data AnalysisFun StatisticsHypothesis TestingStatisticsStatistics in the NewsStatsTue, 02 Aug 2016 12:00:00 +0000http://blog.minitab.com/blog/statistics-and-quality/have-you-accidentally-done-statisticsJoseph HartsockAll About Run Charts
http://blog.minitab.com/blog/real-world-quality-improvement/all-about-run-charts
<p>I blogged a few months back about three different Minitab tools you can use to <a href="http://blog.minitab.com/blog/real-world-quality-improvement/3-ways-to-examine-data-over-time" target="_blank">examine your data over time</a>. Did you know that you can also use a simple run chart to display how your process data changes over time? Of course those “changes” could be evidence of special-cause variation, which a run chart can help you see.</p>
What’s special-cause variation, and how’s it different from common-cause variation?
<p>You know that variation occurs in all processes, and common-cause variation is just that—a natural part of any process. Special-cause variation comes from outside the system and causes recognizable patterns, shifts, or trends in the data. A run chart shows graphically whether special causes are affecting your process.</p>
<p>A process is in control when special causes of variation have been eliminated.</p>
How can I create a run chart in Minitab?
<p>It’s easy! Follow along with this example:</p>
<p>Suppose you want to be sure the widgets your company makes are within the correct weight specifications requested by your customer. You’ve collected a data set that contains weight measurements from the injection molding process used to create the widgets (<em>Open the worksheet WEIGHT.MTW that’s included with Minitab’s sample data sets—in Minitab 17.3, open <strong>Help > Sample Data</strong></em>).</p>
<p>To evaluate the variation in weight measurements, you create a run chart in Minitab:</p>
<ol>
<li>Choose <strong>Stat > Quality Tools > Run Chart</strong></li>
<li>In <strong>Single column</strong>, enter <em>Weight</em></li>
<li>In <strong>Subgroup size</strong>, enter 1. Click <strong>OK</strong>.</li>
</ol>
<p>Here’s what Minitab creates for you:</p>
<p><img alt="Minitab Run Chart" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/2fdb7dc188ca1d11b4694b28ab7d6c9a/run_chart.jpg" style="border-width: 1px; border-style: solid; width: 475px; height: 317px;" /></p>
<p>*Note that Minitab plots the value of each data point in the order that they were collected and draws a horizontal reference line at the median.</p>
What does my run chart tell me about my data?
<p>You can examine the run chart to see if there are any obvious patterns, but Minitab includes two tests for randomness that provide information about non-random variation due to trends, oscillation, mixtures, and clustering in your data. Such patterns indicate that the variation observed is due to special-cause variation.</p>
<p>In the example above, because the approximate p-values for clustering, mixtures, trends, and oscillation are all greater than the significance level of 0.05, there’s no indication of special-cause variation or non-randomness. The data appear to be randomly distributed with no temporal patterns, but to be certain, you should examine the tests for runs about the median and runs up or down. However, it looks as if the variation in widget weights will be acceptable to your customer.</p>
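<p>For the curious, the first of those two tests (runs about the median) can be sketched in a few lines. The version below counts runs on either side of the median and uses the standard normal approximation for the number of runs: fewer runs than expected points toward clustering, more runs than expected points toward mixtures. Treat the details as assumptions: the function name is made up, points that fall exactly on the median are simply dropped, and Minitab's own handling of ties and its exact p-value calculation may differ. The runs up-or-down test for trends and oscillation is not covered here.</p>

```python
import math
from statistics import median

def runs_about_median(data):
    """Return (observed runs, expected runs, p_clustering, p_mixtures)."""
    center = median(data)
    # Assumption: points exactly on the median are dropped (conventions vary).
    above = [x > center for x in data if x != center]
    n_above = sum(above)
    n_below = len(above) - n_above
    if n_above == 0 or n_below == 0:
        raise ValueError("need points on both sides of the median")
    n = n_above + n_below
    # A new run starts every time consecutive points switch sides.
    runs = 1 + sum(1 for a, b in zip(above, above[1:]) if a != b)
    expected = 1 + 2 * n_above * n_below / n
    variance = (2 * n_above * n_below * (2 * n_above * n_below - n)
                / (n ** 2 * (n - 1)))
    if variance == 0:
        raise ValueError("too few points for the normal approximation")
    z = (runs - expected) / math.sqrt(variance)
    p_clustering = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # P(Z <= z)
    p_mixtures = 1 - p_clustering                          # P(Z >= z)
    return runs, expected, p_clustering, p_mixtures

# Hypothetical data: the first series hugs one side, then the other (clustering);
# the second series flips sides at every point (mixture-like behavior).
print(runs_about_median([1, 1, 1, 1, 9, 9, 9, 9]))
print(runs_about_median([1, 9, 1, 9, 1, 9, 1, 9]))
```

<p>With a significance level of 0.05, the first made-up series flags clustering and the second flags mixtures; a well-behaved process would show p-values above 0.05 for both.</p>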
Tell me more about these nonrandom patterns that can be identified by a run chart …
<p>There are four basic patterns of nonrandomness that a run chart will detect—mixture, cluster, oscillating, and trend patterns.</p>
<p>A mixture is characterized by an absence of points near the center line:</p>
<p style="margin-left: 40px;"><img alt="http://support.minitab.com/en-us/minitab/17/runchart_mixture.png" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/30fa7153c35060bbc2052b15bd185a12/mixture.jpg" style="border-width: 0px; border-style: solid; width: 160px; height: 106px;" /></p>
<p>Clusters are groups of points in one area of the chart:</p>
<p style="margin-left: 40px;"><img alt="http://support.minitab.com/en-us/minitab/17/runchart_cluster.png" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/3747f096223c04fa4ab94211b1313c88/cluster.jpg" style="border-width: 0px; border-style: solid; width: 160px; height: 106px;" /></p>
<p>Oscillation occurs when the data fluctuates up and down:</p>
<p style="margin-left: 40px;"><img alt="http://support.minitab.com/en-us/minitab/17/runchart_oscillation.png" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/9ade689c96679266debf736dca994ba2/osci.jpg" style="border-width: 0px; border-style: solid; width: 160px; height: 106px;" /></p>
<p>A trend is a sustained drift in the data, either up or down:</p>
<p style="margin-left: 40px;"><img alt="http://support.minitab.com/en-us/minitab/17/runchart_trend.png" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/fbc9c62b96f14792622c6185fb115adc/trend.jpg" style="border-width: 0px; border-style: solid; width: 160px; height: 106px;" /></p>
<p>To learn more about what these patterns can tell you about your data, visit <a href="http://support.minitab.com/en-us/minitab/17/topic-library/quality-tools/quality-tools/run-chart-basics/" target="_blank">run chart basics</a> on Minitab 17 Support. </p>
Lean Six SigmaLearningProject ToolsStatisticsMon, 01 Aug 2016 12:00:00 +0000http://blog.minitab.com/blog/real-world-quality-improvement/all-about-run-chartsCarly BarryModel Fit: Don't be Blinded by Numerical Fundamentalism
http://blog.minitab.com/blog/statistics-and-quality-data-analysis/model-fit%3A-dont-be-blinded-by-numerical-fundamentalism
<p>Statistics is all about modelling. <span style="line-height: 1.6;">But that doesn’t mean strutting down the catwalk with a pouty expression. </span></p>
<p>It means we’re often looking for a mathematical form that best describes relationships between variables in a population, which we can then use to estimate or predict data values, based on known probability distributions.</p>
<p>To aid in the search and selection of a “top model,” we often utilize calculated indices for model fit.</p>
<p>In a time series trend analysis, for example, mean absolute percentage error (MAPE) is used to compare the fit of different time series models. Smaller values of MAPE indicate a better fit.<br />
<br />
You can see that in the following two trend analysis plots:</p>
<p style="margin-left: 40px;"><img alt="low Mape" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/8790e6ca562a37a0585c3dd48ad9ae51/trend_analysis_plot_for_number_low_mape.jpg" style="width: 1008px; height: 641px;" /></p>
<p style="margin-left: 40px;"><img alt="high MAPE" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/86ecb297078661b2448f275926a09907/trend_analysis_plot_for_number_high_mape.jpg" style="width: 1003px; height: 654px;" /></p>
<p>The MAPE value is much lower in the top plot for Model A (9.37) than it is in the bottom plot for Model B (24.84). So Model A fits its data better than Model B fits its dat—ah…er, wait…that doesn’t seem right… I mean… Model B <em>looks</em> like a closer fit, doesn’t it…hmmm…do I have it backwards…what the...???</p>
Step back from the numbers!
<p>Statistical indices for model fit can be great tools, but they work best when interpreted<span style="line-height: 1.6;"> using a broad, flexible attitude, rather than a narrow, dogmatic approach. Here are a few tips to make sure you're getting the big picture:</span><span style="line-height: 1.6;"> </span></p>
<ul>
<li>
Look at your data
</li>
</ul>
<p>No, don't just look. Gaze lovingly. Stare rudely. Peer penetratingly. Because it's too easy to get carried away by calculated stats. If you graphically examine your data carefully, you can make sure that what you see, on the graph, is what you get, with the statistics. Looking at the data for these two trend models, you <em>know</em> the MAPE value isn’t telling the whole story.</p>
<ul>
<li>
Understand the metric
</li>
</ul>
<p>MAPE measures the absolute percentage error in the model. To do that, it divides the absolute error of the model by the actual data values. Why is that important to know? If there are data values close to 0, dividing by those very small fractional values greatly inflates the value of MAPE.</p>
<p>That’s what’s going on in Model B. To see this, look what happens when you add 200 to each value in the data set for Model B—and fit the same model. </p>
<p style="margin-left: 40px;"><img alt="MAPE lowest" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/dd91f6dfcd96a7e7b980dc3a361787f8/trend_plot_lowest_mape.jpg" style="width: 1005px; height: 638px;" /></p>
<p>Same trend, same fit, but now the absolute percentage of error is more than 25 times lower (0.94611) than it was with the data that included values close to 0—and nearly 10 times lower than the MAPE value in Model A. That result makes more sense, and is consistent with the model fit shown on the graph.</p>
<ul>
<li>
Examine multiple measures
</li>
</ul>
<p>MAPE is often considered the go-to measurement for the fit of time series models. But notice that there are two other measures of model error in the trend plots: MAD (mean absolute deviation) and MSD (mean squared deviation). Notice that in both trend plots for Model B, those values are low<span style="line-height: 20.8px;">—</span>and identical. They’re not affected by values close to 0. </p>
<p>Examining multiple measures helps ensure you won't be hoodwinked by a quirk for a single measure.</p>
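<p>The effect described above is easy to reproduce. In this sketch, a made-up series sits close to 0 and the fitted values miss every point by the same small amount. Adding 200 to everything leaves the errors untouched, so MAD and MSD stay put while MAPE collapses. The data are invented, and the sketch assumes the refitted model's values simply shift by the same constant, which holds for a trend model with an intercept.</p>

```python
def mape(actual, fitted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, fitted)) / len(actual)

def mad(actual, fitted):
    """Mean absolute deviation."""
    return sum(abs(a - f) for a, f in zip(actual, fitted)) / len(actual)

def msd(actual, fitted):
    """Mean squared deviation."""
    return sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual)

# Hypothetical series with values near 0; every fitted value is off by 0.5.
actual = [0.5, 1.0, 2.0, 4.0, 8.0]
fitted = [1.0, 1.5, 2.5, 4.5, 8.5]

# Same data and same fit, shifted up by 200.
shifted_actual = [a + 200 for a in actual]
shifted_fitted = [f + 200 for f in fitted]

for label, a, f in [("original", actual, fitted),
                    ("shifted", shifted_actual, shifted_fitted)]:
    print(f"{label:>8}: MAPE={mape(a, f):.3f}  MAD={mad(a, f):.3f}  MSD={msd(a, f):.3f}")
```

<p>The size of each miss never changes; only the denominator in the percentage does. That is why a large MAPE on data near 0 says more about the scale of the data than about the quality of the fit.</p>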
<ul>
<li>
Interpret within the context
</li>
</ul>
<p>Generally you’re safest using measures of fit to compare the fits of candidate models for a single data set. Comparing model fits across different data sets, in different contexts, leads to invalid comparisons. That’s why you should be wary of blanket generalizations (and you’ll hear them), such as “every regression model should have an R-squared of at least 70%.” It really depends on what you’re modelling, and what you’re using the model for. For more on that, read <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-high-should-r-squared-be-in-regression-analysis" target="_blank">this post </a><span style="line-height: 20.8px;"><a href="http://blog.minitab.com/blog/adventures-in-statistics/how-high-should-r-squared-be-in-regression-analysis" target="_blank">by Jim Frost</a> </span>on R-squared.</p>
Finally, a good model is more than just a perfect fit
<p>Don't let small numerical differences in model fit be your be-all and end-all. There are other important practical considerations, as shown by these models.</p>
<p><img alt="simple" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/693a6f508ff2572a6b8a00295cc8d10d/simple_elegant2.jpg" style="float: left; width: 364px; height: 230px;" /><img alt="complex" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/5c4f7aa899a2124d83cdc7ca1bf1f7ac/complicated_model.jpg" style="width: 373px; height: 284px; float: right;" /></p>
Data Analysis, Statistics
Fri, 29 Jul 2016 12:00:00 +0000
http://blog.minitab.com/blog/statistics-and-quality-data-analysis/model-fit%3A-dont-be-blinded-by-numerical-fundamentalism
Patrick Runkel
One-Sample t-test: Calculating the t-statistic is not really a bear
http://blog.minitab.com/blog/marilyn-wheatleys-blog/one-sample-t-test-calculating-the-t-statistic-is-not-really-a-bear
<p>While some posts in our Minitab blog focus on <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests-t-values-and-t-distributions">understanding t-tests and t-distributions</a>, this post focuses more simply on how to hand-calculate the t-value for a one-sample t-test (and how to replicate the p-value that Minitab gives us).</p>
<p>The formulas used in this post are available within <a href="http://www.minitab.com/en-us/products/minitab/">Minitab Statistical Software</a> by choosing the following menu path: <strong>Help</strong> > <strong>Methods and Formulas</strong> > <strong>Basic Statistics</strong> > <strong>1-sample t</strong>.</p>
<p>The null and three alternative hypotheses for a one-sample t-test are shown below:</p>
<p style="margin-left: 40px;"><img border="0" height="184" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/553bfcce02e2394b13b5175655c99df6/553bfcce02e2394b13b5175655c99df6.png" width="368" /></p>
<p>The default alternative hypothesis is the last one listed: the true population mean is not equal to the hypothesized mean. This is the option used in this example.</p>
<p><img alt="bear" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/88db51bd8ccbfcbb306372bb65fa4902/bear.jpg" style="margin: 10px 15px; float: right; width: 400px; height: 290px;" />To understand the calculations, we’ll use a sample data set available within Minitab. The name of the dataset is <strong>Bears.MTW</strong>, because the calculation is not a huge bear to wrestle (plus who can resist a dataset with that name?). The path to access the sample data from within Minitab depends on the version of the software. </p>
<p>For the current version of Minitab, <a href="http://www.minitab.com/en-us/products/minitab/whats-new/">Minitab 17.3.1</a>, the sample data is available by choosing <strong>Help</strong> > <strong>Sample Data</strong>.</p>
<p>For previous versions of Minitab, the data set is available by choosing <strong>File</strong> > <strong>Open Worksheet</strong> and clicking the <strong>Look in Minitab Sample Data folder</strong> button at the bottom of the window.</p>
<p>For this example, we will use column C2, titled Age, in the Bears.MTW data set, and we will test the hypothesis that the average age of bears is 40. First, we’ll use <strong>Stat</strong> > <strong>Basic Statistics</strong> > <strong>1-sample t</strong> to test the hypothesis:</p>
<p style="margin-left: 40px;"><img border="0" height="315" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/d3336e100a9a4a91501ed1206c8e807f/d3336e100a9a4a91501ed1206c8e807f.png" width="400" /></p>
<p>After clicking <strong>OK</strong> above we see the following results in the session window:</p>
<p style="margin-left: 40px;"><img border="0" height="118" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/e62a2a776614c60eff0dd6383f66e5f5/e62a2a776614c60eff0dd6383f66e5f5.png" width="464" /></p>
<p>With a high p-value of 0.361, we don’t have enough evidence to conclude that the average age of bears is significantly different from 40. </p>
<p>Now we’ll see how to calculate the T value above by hand.</p>
<p>The T value (0.92) shown above is calculated using the following formula in Minitab:</p>
<p style="margin-left: 40px;"><img border="0" height="172" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/701f9c0efa98a38fb397f3c3ec459b66/701f9c0efa98a38fb397f3c3ec459b66.png" width="247" /></p>
<p>The output from the 1-sample t test above gives us all the information we need to plug the values into our formula:</p>
<p style="margin-left: 40px;">Sample mean: 43.43</p>
<p style="margin-left: 40px;">Sample standard deviation: 34.02</p>
<p style="margin-left: 40px;">Sample size: 83</p>
<p>We also know that our target or hypothesized value for the mean is 40.</p>
<p>Using the numbers above to calculate the t-statistic we see:</p>
<p style="margin-left: 40px;">t = (43.43 - 40) / (34.02 / √83) = <strong>0.918542</strong><br />
(which rounds to 0.92, as shown in Minitab’s 1-sample t-test output)</p>
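<p>If you want to double-check the arithmetic outside Minitab, the same calculation can be sketched in a few lines of Python. This is only an illustration: the function name <code>one_sample_t</code> is my own, not a Minitab command, and the inputs are the rounded summary statistics listed above. The t-statistic is the difference between the sample mean and the hypothesized mean, divided by the standard error of the mean (the sample standard deviation over the square root of the sample size).</p>

```python
import math


def one_sample_t(sample_mean, sample_sd, n, hypothesized_mean):
    """t-statistic for a one-sample t-test, computed from summary statistics."""
    standard_error = sample_sd / math.sqrt(n)  # standard error of the mean
    return (sample_mean - hypothesized_mean) / standard_error


# Summary statistics for the Age column, as reported in the output above.
t_value = one_sample_t(sample_mean=43.43, sample_sd=34.02, n=83, hypothesized_mean=40)
print(round(t_value, 6))
```

<p>Because the inputs are rounded to two decimal places, the result can differ slightly in the last digits from what Minitab computes internally from the raw data.</p>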
<p>Now, we <em>could </em>dust off a statistics textbook and use it to compare our calculated t of 0.918542 to the corresponding critical value in a t-table, but that seems like a pretty big bear to wrestle when we can easily get the p-value from Minitab instead. To do that, I’ve used <strong>Graph</strong> > <strong>Probability Distribution Plot</strong> > <strong>View Probability</strong>:</p>
<p style="margin-left: 40px;"><img border="0" height="382" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/e43510dc233e71f22b93f190deb5e523/e43510dc233e71f22b93f190deb5e523.png" width="419" /></p>
<p>In the dialog above, we’re using the t distribution with 82 degrees of freedom (we had an N = 83, so the degrees of freedom for a 1-sample t-test is N-1). Next, I’ve selected the <strong>Shaded Area</strong> tab:</p>
<p style="margin-left: 40px;"><img border="0" height="383" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/e36572b6cead5cf393763d880b6f229a/e36572b6cead5cf393763d880b6f229a.png" width="414" /></p>
<p>In the dialog box above, we’re defining the shaded area by the X value (the calculated t-statistic), and I’ve typed in the t-value we calculated in the <strong>X value</strong> field. This was a 2-tailed test, so I’ve selected <strong>Both Tails</strong> in the dialog above.</p>
<p>After clicking <strong>OK</strong> in the window above, we see:</p>
<p style="margin-left: 40px;"><img border="0" height="384" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/a12abfcbe5ecea6902e4a138e96a53a6/a12abfcbe5ecea6902e4a138e96a53a6.png" width="576" /></p>
<p>Adding together the probabilities from both tails, 0.1805 + 0.1805, gives 0.361, the same p-value that Minitab gave us for the 1-sample t test.</p>
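<p>The two-tailed p-value can also be approximated without a probability plot. The sketch below is not how Minitab computes it; it numerically integrates the density of the t distribution with Simpson's rule, using only Python's standard library. The function names and the number of integration steps are my own choices for illustration.</p>

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t_value, df, steps=2000):
    """Two-tailed p-value: 1 minus the area between -|t| and +|t|.

    The area from 0 to |t| is approximated with Simpson's rule
    (steps must be even) and doubled, because the density is symmetric.
    """
    upper = abs(t_value)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights: 4 on odd, 2 on even points
        total += weight * t_pdf(i * h, df)
    central_half = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * central_half


# N = 83 observations, so the test has N - 1 = 82 degrees of freedom.
p_value = two_tailed_p(0.918542, df=83 - 1)
print(round(p_value, 3))
```

<p>Doubling the one-sided area is the same step as adding the two shaded tails in the probability distribution plot.</p>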
<p>That wasn’t so bad—not a difficult bear to wrestle at all!</p>
Data Analysis, Fun Statistics, Hypothesis Testing, Learning, Statistics, Statistics Help, Stats
Wed, 27 Jul 2016 17:57:00 +0000
http://blog.minitab.com/blog/marilyn-wheatleys-blog/one-sample-t-test-calculating-the-t-statistic-is-not-really-a-bear
Marilyn Wheatley
On Paying Bills, Marriage, and Alert Systems
http://blog.minitab.com/blog/meredith-griffith/on-paying-bills-marriage-and-alert-systems
<p>When I blogged about <a href="http://blog.minitab.com/blog/meredith-griffith/what-a-trip-to-the-dentist-taught-us-about-automation">automation</a> back in March, I made my husband out to be an automation guru. Well, he certainly is. But what you don’t know about my husband is that while he loves to automate everything in his life, sometimes he drops the ball. He’s human; even I have to cut him a break every now and then.</p>
<p>On the other hand, instances of hypocrisy in his behavior tend to make for a good story. So here we are again.</p>
<span style="line-height: 1.2;">On Paying Bills</span>
<p>When we married 5 years ago and began combining our bank accounts, I learned a few things about my husband. Nothing that I haven’t already shared with you. Because he loves automation, it came as no surprise to me that all his accounts resided in a single online repository (mint.com) where he could view his net worth—assets such as his home and car value, and debts including the loan left on his home and bills and credit card expenses that needed to be paid. He’d also made sure to automate the payment of all loans, utility bills, and credit cards—and the respective account would notify him when a payment was made.</p>
<p>This mint.com account served as one dashboard view of all possible accounts he would otherwise have to access independently to see statements and make payments. It was genius! </p>
<p><img alt="mint" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/dae6c7b7-fc22-4616-9d65-f04909c20ab1/Image/299516c1c0685e413532648e7a185d6e/mint.jpg" style="width: 1000px; height: 563px;" /></p>
<p>He could set up savings goals, budgets, email alerts for credit card payment reminders and notification of payment, suspicious account activity, and just about any other miscellaneous charge or activity or change in spending habits. It really did make life easier.</p>
<p>Until I entered the picture.</p>
<span style="line-height: 1.2;">On Marriage</span>
<p>We married, I synced my bank accounts, and we combined cash. I scoured his historical data to observe spending habits—areas where we could save money (Taco Bell topped the ‘high spending’ for the Food/Dining category). As I began poking around his accounts, I noticed a monthly fee his Chase Freedom Visa credit card was charging him. I asked him about the fee; he pleaded ignorance. When I investigated further, I discovered that he’d been charged this fee for <em>years</em>, since he first got the credit card.</p>
<p>I researched online and discovered that other cardholders had complained of being erroneously enrolled in a protection program when they first got their Chase Freedom card, and were being charged a similar fee of varying amounts monthly. Turns out this monthly fee was a percentage of monthly spending—and the Chase Freedom Visa credit card incentivized a cardholder to make all his purchases with that card, given its offer of 5% cash back on all purchases at the time.</p>
<p>Needless to say, I wanted that money back. Within a few minutes, we were on the phone with Chase disputing the program enrollment and monthly charges. They acknowledged their error and refunded us the money lost over a span of several years.</p>
<p>The lesson in all of this? Marry someone who’s not afraid to dig through your historical data.</p>
On Alert Systems
<p>More seriously, automating processes or workflows is incredibly helpful, but without the proper attention and alert systems in place, you may still encounter holes in the story. Automation and alerts must go hand-in-hand to be effective—and as a consumer of the information you’re automating, you still must be invested enough to look at the big picture.</p>
<p>For my husband, the beauty in automating his bill payments and aggregating all his accounts on mint.com was the time saved: time he'd otherwise spend paying bills separately and checking cash flows in multiple accounts. But he failed to set up alerts about important aspects of the process he was automating, and he failed to check in on his process from time to time. Mint.com provides an incredibly useful dashboard to give you the big picture overview of your accounts and your net worth; it also provides a plethora of alert options that save a consumer from having to dig for red flags <em>after</em> the undesirable event has become a regular occurrence in the process (as I did). But unless you check the status of the system and use its full automation potential, the system is only as good as its inputs until you revisit it or tweak it.</p>
<p>This is just one piece of the puzzle. Alert systems offer so much more!</p>
<ol>
<li><strong>Awareness</strong>: alerts set through mint.com for miscellaneous fees would have revealed the credit card program my husband had been erroneously enrolled in.</li>
<li><strong>Immediate Feedback</strong>: the first time a fee was charged, he could have taken immediate action, rather than waiting years for his wife to discover the charge (manually, mind you).</li>
<li><strong>Time Saver</strong>: automating bill pay and combining all accounts into a single repository already saves time in reviewing accounts and paying bills in various locations. An alert system would also have saved me a lot of time digging through my husband’s financial data to understand the origin of the fee Chase was charging him.</li>
<li><strong>Money Saver</strong>: while we <em>were </em>refunded all the money charged in monthly fees by Chase, an alert system would have been a more foolproof way to save money in the first place. Alerts are also effective in ensuring bill pay occurs on time, notifying you when a statement has been prepared, when the bill is due, and when the bill has been paid.</li>
</ol>
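<p>To make the idea of "immediate feedback" concrete, here is a small Python sketch of the kind of rule such an alert might apply: flag a fee the first time it shows up from a payee you haven't approved. Everything in it is hypothetical. The transaction fields, payee names, dates, and amounts are invented for illustration, and this is not how mint.com implements its alerts.</p>

```python
def first_time_fee_alerts(transactions, approved_payees):
    """Return one alert message per unapproved payee, at its first fee.

    `transactions` is assumed to be a chronologically ordered list of dicts
    with the hypothetical keys 'date', 'payee', 'category', and 'amount'.
    """
    already_flagged = set()
    alerts = []
    for tx in transactions:
        payee = tx["payee"]
        is_unexpected_fee = tx["category"] == "fee" and payee not in approved_payees
        if is_unexpected_fee and payee not in already_flagged:
            already_flagged.add(payee)
            alerts.append(f"{tx['date']}: new fee of {tx['amount']:.2f} from {payee}")
    return alerts


# Invented example data.
transactions = [
    {"date": "2011-01-03", "payee": "Electric Utility", "category": "bill", "amount": 82.10},
    {"date": "2011-01-15", "payee": "Card Protection Program", "category": "fee", "amount": 4.75},
    {"date": "2011-02-15", "payee": "Card Protection Program", "category": "fee", "amount": 5.20},
]

for message in first_time_fee_alerts(transactions, approved_payees={"Annual Membership"}):
    print(message)
```

<p>The rule fires once, on the first charge, which is the point: the problem surfaces in the first month rather than years later.</p>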
<p>As process engineers or quality managers in the manufacturing world, you are very close to your process and its inputs. You want to know when something goes wrong, right when it happens. You don’t want a consumer to discover a flaw in a part or product you manufactured and sold years before, only to be faced with product recalls, customer reimbursements, time and money invested to re-manufacture and replace the defective product for unhappy customers, and in some cases, lawsuits. The stakes are high.</p>
<p>Minitab offers a solution to this pain point in its Real-Time SPC dashboard. The dashboard is completely powered by Minitab Statistical Software, taking the graphs and output you know and love and placing them on customized dashboard views that show the current state of your processes. The dashboard gives you a big picture view of your processes across all your production sites, for instance, and highlights where improvements can be made. You can incorporate any graph or analysis you want—such as histograms, control charts, or process capability analysis. You can automatically generate quality reports about your processes, and set up any alert that will help you respond to defects faster.</p>
<p><img alt="qualityDashboard" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/dae6c7b7-fc22-4616-9d65-f04909c20ab1/Image/c9c6bb0f36670d640bf29072a830b9d5/qualitydashboard.jpg" style="width: 900px; height: 651px;" /></p>
<p><img alt="spcDashboard" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/dae6c7b7-fc22-4616-9d65-f04909c20ab1/Image/27347695ab637e3931fe251860d12079/spcdashboard.jpg" style="line-height: 1.6; width: 900px; height: 665px;" /></p>
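<p>As a rough illustration of what an alert built on a control chart does, the Python sketch below computes limits for an individuals (I) chart from baseline data and flags new measurements that fall outside them. It is a simplified sketch with invented numbers, not the Real-Time SPC dashboard's implementation: it estimates sigma from the average moving range divided by 1.128 and applies only the basic "beyond 3 sigma" rule.</p>

```python
def individuals_chart_limits(baseline):
    """Lower limit, center line, and upper limit for an individuals (I) chart."""
    center = sum(baseline) / len(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    average_moving_range = sum(moving_ranges) / len(moving_ranges)
    sigma = average_moving_range / 1.128  # unbiasing constant d2 for ranges of 2 points
    return center - 3 * sigma, center, center + 3 * sigma


def out_of_control(points, lower, upper):
    """Return (index, value) pairs for points outside the control limits."""
    return [(i, x) for i, x in enumerate(points) if x < lower or x > upper]


# Invented baseline measurements from a stable process.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
lower, center, upper = individuals_chart_limits(baseline)

# Invented new measurements; each flagged point would trigger an alert.
new_points = [10.0, 10.1, 11.5, 9.9]
for index, value in out_of_control(new_points, lower, upper):
    print(f"Alert: point {index} = {value} is outside [{lower:.3f}, {upper:.3f}]")
```

<p>The value of running a check like this as each measurement arrives is the same as in the credit card story: you hear about the problem when it happens, not after it has become routine.</p>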
<p><span style="line-height: 1.6;">In the case of my marriage, alert systems are certainly practical from a financial standpoint. But in the world of manufacturing, ensuring alerts are set up around your automated processes has far-reaching implications, as the time- and money-saving elements of alert systems greatly impact a company’s bottom line. To learn more about how Minitab can help you, contact us at </span><a href="mailto:sales@minitab.com" style="line-height: 1.6;">Sales@minitab.com</a><span style="line-height: 1.6;">.</span></p>
<p>And if you’ve ever thought twice about whether or not you should marry, let this story be an encouragement to you—you may actually find a spouse who can make you richer.</p>
Automation, Data Analysis, Quality Improvement, Six Sigma
Mon, 25 Jul 2016 12:00:00 +0000
http://blog.minitab.com/blog/meredith-griffith/on-paying-bills-marriage-and-alert-systems
Meredith Griffith