Minitab | Minitab
Blog posts and articles about using Minitab software in quality improvement projects, research, and more.
http://blog.minitab.com/blog/minitab/rss
Sun, 20 Aug 2017 23:13:49 +0000
FeedCreator 1.7.3
Flight of the Chickens: A Statistical Bedtime Story, Part 2
http://blog.minitab.com/blog/statistics-in-the-field/flight-of-the-chickens-a-statistical-bedtime-story-part-2
<p><em>by Matthew Barsalou, guest blogger</em></p>
<p>At the end of <a href="http://blog.minitab.com/blog/statistics-in-the-field/flight-of-the-chickens-a-statistical-bedtime-story-part-1">the first part of this story</a>, a group of evil trouble-making chickens had convinced all of their fellow chickens to march on the walled city of Wetzlar, where, said the evil chickens, they all would be much happier than they were on the farm. </p>
<p>The chickens marched through the night and arrived at Wetzlar on the Lahn as the sun came up. “Let us in!” demanded the chickens.</p>
<p><img alt="https://upload.wikimedia.org/wikipedia/commons/a/aa/Wetzlarskyline.jpg" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3dd478e1d22595f46e61e30633cc1cc8/image015.jpg" style="text-align: -webkit-center; border-width: 0px; border-style: solid; width: 603px; height: 452px;" /></p>
<p align="center" style="font-size:9px">By Krusto - self made by Krusto, CC BY 2.0 de, https://commons.wikimedia.org/w/index.php?curid=1712222</p>
<p>"No," said the Swan of the Lahn, the ruler of Wetzlar.</p>
<p>The chickens spent the day trying to force open the gates of Wetzlar. One chicken snuck off to meet with a goose known for dealing in antiques such as lamps, chairs, and main battle tanks. The chicken returned by early evening driving a slightly used T-55 tank.</p>
<p><img alt="https://upload.wikimedia.org/wikipedia/commons/thumb/6/68/T-54A_Panzermuseum_Thun.jpg/1280px-T-54A_Panzermuseum_Thun.jpg" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b9f47d36ed923bb9a8d4957b50aaed3d/image017.jpg" style="text-align: -webkit-center; border-width: 0px; border-style: solid; width: 627px; height: 388px;" /></p>
<p style="font-size:9px">By Sandstein - Own work, CC BY 3.0, <a href="https://commons.wikimedia.org/w/index.php?curid=5069466">https://commons.wikimedia.org/w/index.php?curid=5069466</a></p>
<p>Sid, the undercover duck who'd infiltrated the flock to spy on the evil chickens for the Swan of the Lahn, realized he needed to do something, and fast. So he looked up the amount of fuel used for the distance driven for 47 T-55s, then performed a regression analysis to determine how far this one could go if it had full fuel tanks.</p>
<p>To recreate Sid's analysis in Minitab, download his <a href="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a84e66ffcbe9f854a779cdacfc914915/flightofthechickens.mtw">data set</a> (and the trial version of <a href="http://www.minitab.com/products/minitab/free-trial">Minitab 18</a>, if you need it) and go to <strong>Stat > Regression > Fit Regression Model...,</strong> and select Distance as the Response and Fuel as the Continuous predictor. Then click on Graphs and select Four in one.</p>
<p><img alt="regression dialog box" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/7897aea8776afdf207dfd87f6f5d151b/regression_dialog.png" style="width: 600px; height: 396px;" /></p>
<p>Click OK twice, and Minitab produces the following output: </p>
<p style="margin-left: 40px;"><img alt="residual plots" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/27b58becfc3954e9d298592ddbc4b0e5/image020.png" style="border-width: 0px; border-style: solid; width: 576px; height: 384px;" /></p>
<p style="margin-left: 40px;"><img alt="Regression Analysis Distance vs Fuel" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/752f79e57ba64ac2348a4f07e721f5f0/regression_output_18.png" style="width: 472px; height: 691px;" /></p>
<p>We can see that there is a statistically significant relationship between fuel used and distance traveled. The R-squared statistic indicates that the amount of fuel used explains 95.28% of the variability in distance traveled. There seems to be something odd with the order of the data, as seen in the Four in one graph. The Session window output shows three unusual values, two of which have large residuals. This is an indication that these data are not ideal for a regression analysis.</p>
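<p>For readers who want to see what a simple regression like this computes, here is a minimal sketch in Python. The fuel and distance values and the tank capacity are made-up placeholders, not Sid's data set, and the function name <code>fit_line</code> is ours. It fits a least-squares line and reports R-squared, which is only part of what Minitab's output contains.</p>

```python
# Simple least-squares fit of distance on fuel.
# All numbers below are hypothetical placeholders, NOT the values in Sid's data set.

def fit_line(x, y):
    """Return (slope, intercept, r_squared) for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # share of variability in y explained by x
    return slope, intercept, r_squared

fuel = [100.0, 200.0, 300.0, 400.0, 500.0]    # hypothetical fuel used
distance = [52.0, 98.0, 155.0, 201.0, 247.0]  # hypothetical distance driven

slope, intercept, r_squared = fit_line(fuel, distance)
full_tank = 960.0                             # hypothetical tank capacity
predicted = intercept + slope * full_tank     # note: this extrapolates beyond the data

print(f"Distance = {intercept:.2f} + {slope:.4f} * Fuel, R-sq = {r_squared:.2%}")
print(f"Predicted distance on a full tank: {predicted:.1f}")
```

<p>A sketch like this does not produce the residual plots or flag unusual observations, which is where the problems with these data showed up.</p>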
<p>However, better data would not matter in this case, since Sid the Duck once again investigated the wrong question. Predicting how far the tank could travel was irrelevant, given it had already arrived at the town. The real question Sid should have asked was, “Can a T-55 round penetrate the gates of Wetzlar?”</p>
<p><img alt="Gates of Wetzlar" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/5c7810ee0247cead4fd90134fc27817c/image023.jpg" style="text-align: -webkit-center; border-width: 0px; border-style: solid; width: 477px; height: 318px;" /></p>
<p style="font-size:9px"><span style="text-align: -webkit-center;">By Peter Haas /, CC BY-SA 3.0, </span><a href="https://commons.wikimedia.org/w/index.php?curid=28496059" style="text-align: -webkit-center;">https://commons.wikimedia.org/w/index.php?curid=28496059</a></p>
<p>The chicken inside the tank huffed and puffed and fired the main gun directly at the gates of Wetzlar, but the round simply bounced off. He fired again and again, but the rounds just bounced off again and again. Eventually, the T-55 broke down—as they are known to do—so the chickens gathered in force and attempted to knock the gates down by running into them.</p>
<p>But a gate that can survive a tank’s main gun round will not budge when rammed by chickens, no matter how determined they are.</p>
<p>The Swan of the Lahn had had enough by this time, so boiling chicken soup with noodles and vegetables was poured onto the chickens. This was too much for the chickens, so they fled.</p>
<p><img alt="Chicken Noodle Soup.jpg" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c12f89615912f1c1867393240da3c76a/image025.jpg" style="text-align: -webkit-center; border-width: 0px; border-style: solid; width: 292px; height: 219px;" /></p>
<p style="font-size:9px"><span style="text-align: -webkit-center;">Chicken Noodle Soup by </span><a href="https://commons.wikimedia.org/wiki/File:Chicken_Noodle_Soup.jpg" style="text-align: -webkit-center;">Hoyabird8</a><span style="text-align: -webkit-center;"> at English Wikipedia</span></p>
<p>Unfortunately, the road they had followed had washed out in the heavy rains so the only route home was through the Bird Mountains...the<em> terribly misnamed</em> Bird Mountains, which could more accurately be called the Hungry Foxes Everywhere Mountains. </p>
<p><img alt="Hungry Foxes Everywhere Mountains" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/99127e2feebe508d0b16544ac123f305/image027.jpg" style="text-align: -webkit-center; border-width: 0px; border-style: solid; width: 364px; height: 243px;" /></p>
<p style="font-size:9px"><span style="text-align: -webkit-center;">Von Pulv - Eigenes Werk, CC BY-SA 3.0, </span><a href="https://commons.wikimedia.org/w/index.php?curid=15979924" style="text-align: -webkit-center;">https://commons.wikimedia.org/w/index.php?curid=15979924</a></p>
<p>The chickens fled into the forests of the Bird Mountains. Knowing the dangers of these forests, the evil chickens let the other chickens lead so that they would encounter the foxes first. However, the evil chickens failed to consider that foxes are, as they say, sly as foxes. The foxes of the woefully misnamed Bird Mountains waited till the chickens were well into their territory, and then fell upon those in the rear—the evil chickens.</p>
<p>Evil chickens nonetheless taste like chicken, and the foxes feasted.</p>
<p><img alt="hungry fox" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e81feb7a838f17127a03bb8c7774b747/image029.jpg" style="text-align: -webkit-center; border-width: 0px; border-style: solid; width: 384px; height: 276px;" /></p>
<p style="font-size:9px"><span style="text-align: -webkit-center;">By Foto: Jonn Leffmann, CC BY 3.0, </span><a href="https://commons.wikimedia.org/w/index.php?curid=21536441" style="text-align: -webkit-center;">https://commons.wikimedia.org/w/index.php?curid=21536441</a></p>
<p>Upon returning with the flock to the farm, Sid suspected the evil chickens had been decimated, so he did a survey. Originally, 647 out of the population of 1,541 chickens were evil, so he randomly sampled 175 chickens and found only 22 of these chickens were evil. Sid wanted to know if the new proportion of evil chickens was less than the older proportion, so he did a one-tailed two-proportion test.</p>
<p>To do this in Minitab, go to <strong>Stat > Basic Statistics > 2 Proportions...</strong> and select Summarized data in the drop-down menu. Enter 22 for the number of events and 175 for the number of trials under Sample 1, and enter 647 for the number of events and 1,541 for the number of trials under Sample 2. Click on Options and select Difference < hypothesized difference...</p>
<p style="margin-left: 40px;"><img alt="2 proportions test dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/5c5778bdb7360c9974fe8d8e0f059a36/2_proportions_dialog.png" style="width: 499px; height: 400px;" /></p>
<p>Then click OK twice. </p>
<p style="margin-left: 40px;"><img alt="2 proportions test output" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e5cfd2f156bdf33a539975586475c0e0/2_proportions_output.png" style="width: 348px; height: 599px;" /></p>
<p>The resulting p-value is less than 0.05, so Sid can conclude that the proportion of evil chickens is now significantly lower than it was originally.</p>
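<p>The arithmetic behind a test like this can be sketched in Python. This is a minimal sketch, assuming the normal-approximation z-test with separate (unpooled) variance estimates; Minitab may use a different method or also report an exact test, so its numbers need not match exactly. The function name <code>two_proportion_z</code> is ours.</p>

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """One-tailed (lower) z-test for H1: p1 < p2, using unpooled standard errors.

    This is an assumed textbook form of the test, not a copy of Minitab's routine.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2) / se
    p_value = NormalDist().cdf(z)  # lower-tail probability under the standard normal
    return p1, p2, z, p_value

# Sample 1: 22 evil chickens out of 175 sampled; Sample 2: 647 out of 1,541 originally.
p1, p2, z, p_value = two_proportion_z(22, 175, 647, 1541)
print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, z = {z:.2f}, p-value = {p_value:.3g}")
```

<p>A large negative z value corresponds to a very small lower-tail p-value, which is what lets Sid reject the hypothesis that the proportion has not dropped.</p>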
<p>After returning home, the chickens were confused. Just days earlier all had been well, and suddenly they had found themselves in such an adventure. The remaining evil chickens weren't confused, since they had instigated it all. But they were understandably upset with how things had ended, and they began to argue and blame each other for the failure. Both recriminations and feathers flew.</p>
<p>Evil chickens started turning each other in to the farmer, which resulted in a weight gain for the farmer and an even greater reduction in the proportion of evil chickens in the flock. Sid surreptitiously arranged a few “accidents” to dispatch the remaining evil chickens.</p>
<p>The pigs eventually forgave the innocent chickens for the egg-throwing incident, and the remaining chickens lived happily ever after. The cow spent the rest of her life hoping for another dinner of eggs.</p>
<p>As for Sid, his next assignment was the infiltration of a rabbit den. How a duck disguised himself as a rabbit is a tale for another time.</p>
<p><img alt="Rabbit burrow" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e218aafacbf66d1da87cf6fb0b2c2ddb/image033.jpg" style="text-align: -webkit-center; border-width: 0px; border-style: solid; width: 464px; height: 348px;" /></p>
<p align="center" style="font-size:9px">By Brammers - Own work, Public Domain, <a href="https://commons.wikimedia.org/w/index.php?curid=7862100">https://commons.wikimedia.org/w/index.php?curid=7862100</a></p>
<p><strong>There is a moral to this story: If you need help with statistics, call a statistician...<em>not </em>a duck.</strong></p>
<p> </p>
<p><strong>About the Guest Blogger</strong></p>
<div>
<p><em><a href="https://www.linkedin.com/pub/matthew-barsalou/5b/539/198" target="_blank">Matthew Barsalou</a> is a statistical problem resolution Master Black Belt at <a href="http://www.3k-warner.de/" target="_blank">BorgWarner</a> Turbo Systems Engineering GmbH. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with emphasis in international business, and a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is author of the books </em><a href="http://www.amazon.com/Root-Cause-Analysis-Step---Step/dp/148225879X/ref=sr_1_1?ie=UTF8&qid=1416937278&sr=8-1&keywords=Root+Cause+Analysis%3A+A+Step-By-Step+Guide+to+Using+the+Right+Tool+at+the+Right+Time" target="_blank">Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time</a><em>, </em><a href="http://asq.org/quality-press/display-item/index.html?item=H1472" target="_blank">Statistics for Six Sigma Black Belts</a><em> and </em><a href="http://asq.org/quality-press/display-item/index.html?item=H1473&xvl=76115763" target="_blank">The ASQ Pocket Guide to Statistics for Six Sigma Black Belts</a><em>.</em></p>
</div>
<div style="clear:both;"> </div>
Fun Statistics, Statistics
Thu, 17 Aug 2017 13:59:00 +0000
http://blog.minitab.com/blog/statistics-in-the-field/flight-of-the-chickens-a-statistical-bedtime-story-part-2
Guest Blogger
Flight of the Chickens: A Statistical Bedtime Story, Part 1
http://blog.minitab.com/blog/statistics-in-the-field/flight-of-the-chickens-a-statistical-bedtime-story-part-1
<p><em>by Matthew Barsalou, guest blogger</em></p>
<p>Once upon a time, in the Kingdom of Wetzlar, there was a farm with over a thousand chickens, two pigs, and a cow. The chickens were well treated, but a few rabble-rousers among them got the rest of the chickens worked up. These trouble-making chickens <em>looked </em>almost like the other chickens, but in fact they were <em>evil </em>chickens. </p>
<img alt="chickens" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/99db35580e3f59363035593e45311e89/image001.jpg" style="width: 331px; height: 248px;" />
<p style="font-size: 9px; text-align: center;"><em>By HerbertT - Eigenproduktion, CC BY-SA 3.0, <a href="https://commons.wikimedia.org/w/index.php?curid=962579">https://commons.wikimedia.org/w/index.php?curid=962579</a></em></p>
<p>Hidden among the good chickens and the evil chickens was Sid. Sid was not like other chickens. He was a secret spy for The Swan of the Lahn, who ruled Wetzlar and was concerned about the infiltration of evil chickens. Sid was also a duck. That's right, a duck disguised as a chicken. Sid knew who the evil chickens were, and sent regular reports on their activities back to Wetzlar.</p>
<img alt="duck" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/29a7e9ed327615ff06708ca7d629e12b/image004.jpg" style="width: 273px; height: 182px;" />
<p style="font-size: 9px;">Mallard drake by <a href="https://commons.wikimedia.org/wiki/File:Mallard_drake_.02.jpg">Bert de tilly</a></p>
<p>One stormy and dark night, an evil chicken snuck out with an enormous basket of beautiful hand-painted eggs to throw at the two pigs and the cow. Sid snuck out into the pouring rain and took a sample of 18 of the eggs. The intrepid duck spy was familiar with a previous study of 157 eggs, which showed that the mean weight of those eggs was <a href="http://archive.org/stream/standarddeviatio195atwo/standarddeviatio195atwo_djvu.txt" target="_blank">57.079 grams</a> with a standard deviation of 2.30 grams. Sid was determined to find out if the mean of his current sample had a statistically significant difference from the mean of the previous study.</p>
<img alt="https://upload.wikimedia.org/wikipedia/commons/thumb/1/10/Easter_eggs_-_straw_decoration.jpg/1024px-Easter_eggs_-_straw_decoration.jpg" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ea1fe788c97d1f12d8f000fc568f0ad9/image005.jpg" style="text-align: -webkit-center; border-width: 0px; border-style: solid; width: 306px; height: 229px;" />
<p style="font-size: 9px;">By Jan Kameníček - Own work, Public Domain, <a href="https://commons.wikimedia.org/w/index.php?curid=732984" target="_blank">https://commons.wikimedia.org/w/index.php?curid=732984</a></p>
<p>If you'd like to recreate Sid's analysis, download his <a href="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a84e66ffcbe9f854a779cdacfc914915/flightofthechickens.mtw">data set</a> and, if you need it, the <a href="http://www.minitab.com/products/minitab/free-trial">free trial of Minitab</a> 18 Statistical Software. We will need to use summarized data, since we have actual values only for Sid's sample and not the full data set from the previous study. Go to <strong>Stat > Basic Statistics > Display Descriptive Statistics...</strong> and select the column containing the data as the Variable. Click on Graphs and select Individual value plot to view a graph of the data.</p>
<p style="margin-left: 40px;"><img alt="descriptive statistics dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f67868a217eeb27025975888d27ff118/descro_tove_doa_pg.png" style="width: 527px; height: 354px;" /></p>
<p>Click OK twice and Minitab will create an individual value plot of the data and the mean and standard deviation will appear in the session window with the rest of the descriptive statistics.</p>
<p align="center"><img alt="individual value plot of eggs" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/72d17166ab9bf6fb9bd400f1fa864d02/image008.png" style="border-width: 0px; border-style: solid; width: 576px; height: 384px;" /></p>
<p> </p>
<p align="center"><img alt="Descriptive Statistics" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b9ef39d053dfd80c026f6fdfafcb3193/descriptive_statistics_eggs.png" style="border-width: 0px; border-style: solid; width: 646px; height: 151px;" /></p>
<p> </p>
<p>We can see that the sample mean is 57.315 and the standard deviation is 2.439, so now we can perform a 2-sample t-test to compare the means by going to <strong>Stat > Basic Statistics > 2-Sample t... </strong>and selecting Summarized data in the drop-down menu. Enter the sample size of 18, sample mean of 57.315, and standard deviation of 2.439 under Sample 1, and enter the sample size of 157, mean of 57.079, and standard deviation of 2.30 under Sample 2.</p>
<p style="margin-left: 40px;"><img alt="two-sample t test dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/23cbaedfab65fc90c48259b8dbbad0e0/2_sample_t_dialog.png" style="width: 424px; height: 296px;" /></p>
<p>Then click OK.</p>
<p style="margin-left: 40px;"><img alt="t-test output" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dfaecdaefcbd971827c75e8ad15ebd7f/t_test_output_2.png" style="border-width: 0px; border-style: solid; width: 327px; height: 555px;" /></p>
<p>The p-value is greater than 0.05, so we can conclude there is no statistically significant difference between the means of the eggs the evil chickens planned to throw and the eggs in the previous study.</p>
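<p>The same summarized-data comparison can be sketched in Python. This is a minimal sketch, assuming the unpooled (Welch) form of the 2-sample t-test; the function name <code>welch_t</code> is ours. Because Python's standard library has no t distribution, the sketch stops at the t statistic and degrees of freedom rather than reproducing Minitab's p-value.</p>

```python
from math import sqrt

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for a 2-sample t-test from summary statistics.

    Assumes unequal variances (Welch); df uses the Welch-Satterthwaite formula.
    """
    v1 = sd1 ** 2 / n1  # squared standard error contributed by sample 1
    v2 = sd2 ** 2 / n2  # squared standard error contributed by sample 2
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Sample 1: Sid's 18 eggs; Sample 2: the previous study of 157 eggs.
t_stat, df = welch_t(57.315, 2.439, 18, 57.079, 2.30, 157)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

<p>With roughly 20 degrees of freedom, the two-sided critical value at an alpha of 0.05 is about 2.09, so a t statistic this close to zero is consistent with finding no significant difference.</p>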
<p>Unfortunately, Sid made a critical mistake. The first step in an analysis is to ask the right question. Sid's statistics were correct, but he asked the wrong question: “Is the mean of the second sample different from the mean of the first sample with an alpha of 0.05?” </p>
<p>What he <em>should </em>have asked was, “What will happen when the pigs and the cow get hit by eggs?” The weight of the eggs was irrelevant; what mattered was the consequences of the pigs and cow being pummeled with eggs.</p>
<p>If Sid had prepared a report for The Swan of the Lahn that only said the eggs collected by the evil chickens weighed the same as eggs in the earlier study, the Swan would conclude that the process had not changed. But had the right question been answered, the correct conclusion would have been, “Trouble may be brewing.”</p>
<p><img alt="Swan" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3c61f8f1ca1091cd188b9716190ed656/image013.jpg" style="text-align: -webkit-center; border-width: 0px; border-style: solid; width: 399px; height: 301px;" /></p>
<p><span style="text-align: -webkit-center;">By Dick Daniels (http://carolinabirds.org/) - Own work, CC BY-SA 3.0, </span><a href="https://commons.wikimedia.org/w/index.php?curid=11053305" style="text-align: -webkit-center;">https://commons.wikimedia.org/w/index.php?curid=11053305</a></p>
<p>Trouble did indeed result when the evil chickens put their egg-throwing plan into action. As darkness fell, first the cow and then the pigs were bombarded by egg after messy egg.</p>
<p>The cow simply ate the eggs. But the pigs, holding <em>all </em>the chickens to be responsible, were outraged. They rampaged and terrorized the poor chickens all that night. By midnight, the muddy fields were full of pig prints and feathers were ruffled in the chicken coop. </p>
<p>One of the evil chickens seized on the traumatized crowd's passions, and demanded of the others, “How can we live like this?” The evil chickens soon convinced the others that they would all be happier if they moved to the high-walled village of Wetzlar beside the Lahn River. The chickens began to march into the stormy night.</p>
<p><em><a href="http://blog.minitab.com/blog/statistics-in-the-field/flight-of-the-chickens-a-statistical-bedtime-story-part-2">Continued in Part 2</a></em></p>
<p> </p>
<div style="clear:both;"> </div>
Fun Statistics, Statistics, Statistics Help
Tue, 15 Aug 2017 13:59:00 +0000
http://blog.minitab.com/blog/statistics-in-the-field/flight-of-the-chickens-a-statistical-bedtime-story-part-1
Guest Blogger
5 More Critical Six Sigma Tools: A Quick Guide
http://blog.minitab.com/blog/understanding-statistics/5-more-critical-six-sigma-tools-a-quick-guide
<p>The Six Sigma quality improvement methodology has lasted for decades because it gets results. Companies in every country around the world, and in every industry, have used this logical, step-by-step method to improve the quality of their processes, products, and services. And they've saved billions of dollars along the way.</p>
<p>However, Six Sigma involves a good deal of statistics and data analysis, which makes many people uneasy. Individuals who are new to quality improvement often feel intimidated by the statistical aspects.</p>
<p>Don't be intimidated. Data analysis may be a critical component of improving quality, but the good news is that most of the analyses we use in Six Sigma aren't hard to understand, even if statistics isn't something you're comfortable with.</p>
<p>Just getting familiar with the tools used in Six Sigma is a good way to get started on your quality journey. In my last post, I offered a rundown of 5 tools that crop up in most Six Sigma projects. In this post, I'll review 5 more common statistical tools, and explain what they do and why they’re important in Six Sigma.</p>
1. t-Tests
<p><img alt="t-Tests" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9836f7ec0e12d309f6a3472557a5f424/5_more_six_sigma_tools_t_tests.jpg" style="width: 600px; height: 395px;" /></p>
<p>We use t-tests to compare the average of a sample to a target value, or to the average of another sample. For example, a company that sells beverages in 16-oz. containers can use a 1-sample t-test to determine if the production line’s average fill is on or off target. If you buy flavored syrup from two suppliers and want to determine if there’s a difference in the average volume of their respective shipments, you can use a 2-sample t-test to compare the two suppliers. </p>
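<p>As a rough illustration of the 1-sample case, here is a minimal Python sketch that computes the t statistic for a set of fill volumes against a 16-oz. target. The fill values are hypothetical, and the function name <code>one_sample_t</code> is ours.</p>

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, target):
    """Return (t, df) comparing the sample mean of data to a target value."""
    n = len(data)
    standard_error = stdev(data) / sqrt(n)  # stdev() is the sample standard deviation
    t = (mean(data) - target) / standard_error
    return t, n - 1

# Hypothetical fill volumes in ounces, for illustration only.
fills = [16.02, 15.98, 16.05, 15.97, 16.01, 16.03, 15.99, 16.04]
t_stat, df = one_sample_t(fills, 16.0)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

<p>The further the t statistic is from zero, the stronger the evidence that the average fill is off target; statistical software converts it to a p-value.</p>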
2. ANOVA
<p><img alt="ANOVA" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/56cc203b4012c25d4fa4e28fc96787f3/5_more_six_sigma_tools_anova.jpg" style="width: 600px; height: 395px;" /></p>
<p>Where t-tests compare a mean to a target, or two means to each other, ANOVA—which is short for Analysis of Variance—lets you compare more than two means. For example, ANOVA can show you if average production volumes across 3 shifts are equal. You can also use ANOVA to analyze means for more than 1 variable. For example, you can simultaneously compare the means for 3 shifts and the means for 2 manufacturing locations. </p>
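<p>To make the one-factor case concrete, here is a minimal Python sketch of the F statistic that one-way ANOVA is built on, using hypothetical production volumes for 3 shifts. The function name <code>one_way_f</code> is ours, and the sketch does not compute a p-value.</p>

```python
from statistics import mean

def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on a list of groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    ss_between = 0.0  # variation of the group means around the grand mean
    ss_within = 0.0   # variation of the values around their own group mean
    for group in groups:
        group_mean = mean(group)
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        ss_within += sum((value - group_mean) ** 2 for value in group)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical units produced per day on each of 3 shifts.
shifts = [
    [102, 98, 101, 99],
    [105, 107, 104, 106],
    [100, 97, 99, 98],
]
f_stat, df_b, df_w = one_way_f(shifts)
print(f"F = {f_stat:.2f} on ({df_b}, {df_w}) degrees of freedom")
```

<p>A large F means the differences between shift averages are large relative to the variation within each shift.</p>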
3. Regression
<p><img alt="Regression" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/54e06038732315d016e9703a866d74f0/5_more_six_sigma_tools_regression.jpg" style="width: 600px; height: 395px;" /></p>
<p>Regression helps you determine whether there's a relationship between an output and one or more input factors. For instance, you can use regression to examine if there is a relationship between a company’s marketing expenditures and its sales revenue. When a relationship between the variables exists, you can use the regression equation to describe that relationship and predict future output values for given input values.</p>
4. DOE (Design of Experiments)
<p><img alt="DOE" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/558c592fd82aafe591c2d087d49bfa4c/5_more_six_sigma_tools_doe.jpg" style="width: 600px; height: 395px;" /><br />
Regression and ANOVA are most often used for data that’s already been collected. In contrast, Design of Experiments (DOE) gives you an efficient strategy for collecting your data. It permits you to change or adjust multiple factors simultaneously to identify if relationships exist between inputs and outputs. Once you collect the data and identify the important inputs, you can then use DOE to determine the optimal settings for each factor. </p>
5. Control Charts
<p><img alt="Control Charts" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e7cd9d3ebc70528d9c617d8b3980be8f/5_more_six_sigma_tools_control_charts.jpg" style="width: 600px; height: 395px;" /></p>
<p>Every process has some natural, inherent variation, but a stable (and therefore predictable) process is a hallmark of quality products and services. It's important to know when a process goes beyond the normal, natural variation, because it can indicate a problem that needs to be resolved. A control chart distinguishes “special-cause” variation from acceptable, natural variation. These charts graph data over time and flag out-of-control data points, so you can detect unusual variability and take action when necessary. Control charts also help you ensure that you sustain process improvements into the future. </p>
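<p>As one illustration, the limits for an individuals chart can be sketched in a few lines of Python. This is a minimal sketch, assuming the common moving-range method, where the limits sit at the mean plus or minus 2.66 times the average moving range. The data and function names are hypothetical, and real control chart software applies additional tests beyond points outside the limits.</p>

```python
from statistics import mean

def individuals_limits(baseline):
    """Return (lcl, center, ucl) for an individuals chart from baseline data."""
    center = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    mr_bar = mean(moving_ranges)
    # 2.66 is the standard constant 3 / d2, with d2 = 1.128 for moving ranges of size 2.
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

def flag_points(data, lcl, ucl):
    """Return the positions of points that fall outside the control limits."""
    return [i for i, x in enumerate(data) if x < lcl or x > ucl]

# Hypothetical measurements from a stable period, then some new observations.
baseline = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.1, 9.7]
lcl, center, ucl = individuals_limits(baseline)
new_points = [10.0, 10.4, 12.5, 9.9]
print(f"LCL = {lcl:.2f}, center = {center:.2f}, UCL = {ucl:.2f}")
print("Out-of-control positions:", flag_points(new_points, lcl, ucl))
```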
<p><strong>Conclusion</strong></p>
<p>Any organization can benefit from Six Sigma projects, and those benefits are based on data analysis. Many Six Sigma projects are completed by practitioners who are highly skilled but not expert statisticians. Even so, a basic understanding of common Six Sigma statistics, combined with easy-to-use statistical software, will let you handle these statistical tasks and analyze your data with confidence. </p>
Lean Six Sigma, Six Sigma
Thu, 10 Aug 2017 13:58:00 +0000
http://blog.minitab.com/blog/understanding-statistics/5-more-critical-six-sigma-tools-a-quick-guide
Eston Martz
5 Critical Six Sigma Tools: A Quick Guide
http://blog.minitab.com/blog/understanding-statistics/5-critical-six-sigma-tools-a-quick-guide
<p>Six Sigma is a quality improvement method that businesses have used for decades—because it gets results. A Six Sigma project follows a clearly defined series of steps, and companies in every industry in every country around the world have used this method to resolve problems. Along the way, they've saved billions of dollars.</p>
<p>But Six Sigma relies heavily on statistics and data analysis, and many people new to quality improvement feel intimidated by the statistical aspects.</p>
<p>You needn't be intimidated. While it's true that data analysis is critical in improving quality, the majority of analyses in Six Sigma are not hard to understand, even if you’re not very knowledgeable about statistics.</p>
<p>Familiarizing yourself with these tools is a great place to start. This post briefly explains 5 statistical tools used in Six Sigma, what they do, and why they’re important.</p>
1. Pareto Chart
<p><img alt="Pareto Chart" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/014b0ef2847e14b49bd9d18adeb9b309/5_six_sigma_tools_pareto.jpg" style="width: 600px; height: 395px;" /></p>
<p>The Pareto Chart stems from an idea called the Pareto Principle, which asserts that about 80% of outcomes result from 20% of the causes. It's easy to think of examples even in our personal lives. For instance, you may wear 20% of your clothes 80% of the time, or listen to 20% of the music in your library 80% of the time.</p>
<p>The Pareto chart helps you visualize how this principle applies to data you've collected. It is a specialized type of bar chart designed to distinguish the “critical few” causes from the “trivial many,” enabling you to focus on the most important issues. For example, if you collect data about defect types each time one occurs, a Pareto chart reveals which types are most frequent, so you can focus energy on solving the most pressing problems. </p>
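<p>The table behind a Pareto chart is easy to sketch in Python: count each defect type, sort from most to least frequent, and add a cumulative percentage. The defect log below is hypothetical, and the function name <code>pareto_table</code> is ours.</p>

```python
from collections import Counter

def pareto_table(defects):
    """Return rows of (category, count, cumulative percent), most frequent first."""
    counts = Counter(defects).most_common()
    total = sum(count for _, count in counts)
    rows = []
    running = 0
    for category, count in counts:
        running += count
        rows.append((category, count, 100.0 * running / total))
    return rows

# Hypothetical defect log, one entry per defect observed.
defect_log = (
    ["scratch"] * 42 + ["dent"] * 27 + ["misaligned label"] * 9
    + ["wrong color"] * 5 + ["other"] * 3
)
for category, count, cumulative in pareto_table(defect_log):
    print(f"{category:<18}{count:>4}{cumulative:>8.1f}%")
```

<p>The categories at the top of the table, where the cumulative percentage climbs fastest, are the “critical few.”</p>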
2. Histogram
<p><img alt="Histogram" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2bb10c9c739b156c8753d81b2a63cc16/5_six_sigma_tools_histogram.jpg" style="width: 600px; height: 395px;" /></p>
<p>A histogram is a graphical snapshot of numeric, continuous data. Histograms enable you to quickly identify the center and spread of your data. They show you where most of the data fall, as well as the minimum and maximum values. A histogram also reveals whether your data are bell-shaped, and can help you find unusual data points and outliers that may need further investigation.</p>
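<p>To make the idea concrete, here is a rough sketch of the binning step a histogram performs. The measurements and the bin width are invented, and <code>histogram_counts</code> is a hypothetical helper rather than anything from Minitab.</p>

```python
def histogram_counts(values, bin_width):
    """Count values in equal-width bins, starting at the minimum value."""
    low = min(values)
    counts = {}
    for v in values:
        index = int((v - low) // bin_width)   # which bin the value falls in
        left_edge = low + index * bin_width   # label the bin by its left edge
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical fill weights in grams, invented for illustration only.
weights = [498, 499, 500, 500, 501, 501, 501, 502, 503, 510]

for left_edge, count in histogram_counts(weights, bin_width=2).items():
    print(f"{left_edge:>5} | {'#' * count}")
```

<p>The isolated bar at the high end is the kind of unusual point a histogram makes easy to spot.</p>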
3. Gage R&R
<p><img alt="gage R&R" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/53a58036bcacc1abbbe345a171bd3cc8/5_six_sigma_tools_gage.jpg" style="width: 600px; height: 444px;" /></p>
<p>Accurate measurements are critical. Would you want to weigh yourself with a scale you know is unreliable? Would you keep using a thermometer that never shows the right temperature? If you can't measure a process accurately, you can't improve it, which is where <span><a href="http://blog.minitab.com/blog/meredith-griffith/fundamentals-of-gage-rr">Gage R&R</a></span> comes in. This tool helps you determine whether your continuous numeric measurements (such as weight, diameter, and pressure) are both repeatable, meaning consistent when the same person repeatedly measures the same part, and reproducible, meaning consistent when different operators measure the same part.</p>
4. Attribute Agreement Analysis
<p><img alt="Attribute" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a248d1a75f744990aea0ce8414219166/5_six_sigma_tools_attribute.jpg" style="width: 600px; height: 395px;" /><br />
Another tool for making sure you can trust your data is attribute agreement analysis. Where Gage R&R assesses the repeatability and reproducibility of numeric measurements, attribute agreement analysis evaluates categorical assessments, such as Pass or Fail. This tool shows whether people rating these categories agree with a known standard, with other appraisers, and with themselves.</p>
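<p>Agreement between two appraisers is often summarized with percent agreement and a kappa statistic, which discounts the agreement expected by chance. The sketch below uses Cohen's kappa for two appraisers. That choice, the ratings, and the function names are assumptions made for illustration, and the result may not match what any particular software package reports.</p>

```python
def percent_agreement(a, b):
    """Share of items on which two appraisers gave the same rating."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: agreement beyond what chance alone would produce."""
    n = len(a)
    observed = percent_agreement(a, b)
    categories = set(a) | set(b)
    # Chance agreement: product of each appraiser's marginal proportions.
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    return (observed - expected) / (1 - expected)


# Hypothetical Pass/Fail ratings of the same 10 parts, invented for illustration.
appraiser_1 = ["Pass"] * 6 + ["Fail"] * 4
appraiser_2 = ["Pass"] * 5 + ["Fail"] * 4 + ["Pass"]

print(f"Percent agreement: {percent_agreement(appraiser_1, appraiser_2):.0%}")
print(f"Cohen's kappa:     {cohens_kappa(appraiser_1, appraiser_2):.2f}")
```

<p>Kappa comes out lower than the raw percent agreement here because two appraisers who each pass most parts will agree fairly often by chance alone.</p>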
5. Process Capability
<p><img alt="Capability" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/155f21ef5bb7ddf08d0af4cce5425340/5_six_sigma_tools_capability.jpg" style="width: 600px; height: 444px;" /></p>
<p>Nearly every process has an acceptable lower and/or upper bound. For example, a supplier's parts can’t be too large or too small, wait times can’t extend beyond an acceptable threshold, and fill weights need to exceed a specified minimum. Capability analysis shows you how well your process meets specifications and provides insight into how you can improve a poor process. Frequently cited capability metrics include Cpk, Ppk, defects per million opportunities (DPMO), and Sigma level.</p>
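<p>The Cpk and Ppk metrics share one textbook form: the distance from the process center to the nearest specification limit, divided by three standard deviations. The sketch below assumes that form. The function name, data, and specification limits are hypothetical. Because it uses the overall sample standard deviation, the result corresponds to Ppk; Cpk would use a within-subgroup estimate of sigma instead.</p>

```python
from statistics import mean, stdev


def capability_index(center, sigma, lsl=None, usl=None):
    """Distance from center to the nearest spec limit, in units of 3 sigma."""
    distances = []
    if usl is not None:
        distances.append(usl - center)
    if lsl is not None:
        distances.append(center - lsl)
    if not distances:
        raise ValueError("Provide at least one specification limit.")
    return min(distances) / (3 * sigma)


# Hypothetical fill weights (grams) and spec limits, invented for illustration.
weights = [502.1, 499.8, 501.2, 500.5, 498.9, 501.7, 500.2, 499.5]

# Overall standard deviation, so this is a Ppk-style index.
ppk = capability_index(mean(weights), stdev(weights), lsl=495.0, usl=505.0)
print(f"Ppk = {ppk:.2f}")
```

<p>Passing only <code>lsl</code> or only <code>usl</code> covers the one-sided cases mentioned above, such as a minimum fill weight or a maximum wait time.</p>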
Conclusion
<p>Six Sigma can bring significant benefits to any business, but reaping those benefits requires the collection and analysis of data so you can understand opportunities for improvement and make significant and sustainable changes.</p>
<p>The success of Six Sigma projects often depends on practitioners who are highly skilled experts in many fields, but not statistics. But with a basic understanding of the most commonly used Six Sigma statistics and easy-to-use statistical software, you can handle the statistical tasks associated with improving quality, and analyze your data with confidence. </p>
Lean Six Sigma, Six Sigma
Tue, 08 Aug 2017 13:58:00 +0000
http://blog.minitab.com/blog/understanding-statistics/5-critical-six-sigma-tools-a-quick-guide
Eston Martz
How to Estimate the Probability of a No-Show using Binary Logistic Regression
http://blog.minitab.com/blog/using-data-and-statistics/how-to-estimate-the-probability-of-a-no-show-using-binary-logistic-regression
<p>In April 2017, overbooking of flight seats hit the headlines when a United Airlines customer was dragged off a flight. A TED talk by Nina Klietsch gives a good but simplistic explanation of why overbooking is so attractive to airlines.</p>
<p>Overbooking is not new to the airlines. These strategies were officially sanctioned by the U.S. Civil Aeronautics Board in 1965, and since then complex statistical models have been researched and developed to set ticket pricing and overbooking strategies that deliver maximum revenue to the airlines.</p>
<img alt="airline travel" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dd1b2a0560ece3771b8dce0b31ca7b4f/airline_passenger.png" style="width: 300px; height: 198px; margin: 10px 15px; float: right;" />
<div>
<p>In this post, I would like to look at one aspect of this: the probability of a no-show. In her talk, Klietsch assumed that the probability of a no-show (a customer not turning up for a flight) is identical for all customers. In reality, this is not true: factors such as time of day, price, time since booking, and whether a traveler is alone or in a group will affect the probability of a no-show.</p>
<div>By using this information about our customers, we can predict the probability of a no-show using <a href="http://blog.minitab.com/blog/marilyn-wheatleys-blog/coffee-or-tea-analyzing-categorical-data-with-minitab-v2">binary logistic regression</a>. This type of modeling is common to many services and industries. Some of the applications, in addition to predicting no-shows, include:</div>
<ul>
<li>Credit scores: What is the probability of default? </li>
<li>Marketing offers: What are the chances you'll buy a product based on a specific offer?</li>
<li>Quality: What is the probability of a part failing?</li>
<li>Human resources: What is the sickness absence rate likely to be? </li>
</ul>
<p>In all cases, your outcome (the event you are predicting) is discrete and can be split into two separate groups; for example, purchase/no purchase, pass/fail, or show/no-show. Using the characteristics of your customers or parts as predictors, you can use this modeling technique to predict the outcome.</p>
<p><img alt="cereal purchase worksheet" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/be467ae943684f0a147b4995f4b1d1ea/cereal_purchase_data.png" style="margin: 10px 15px; float: right; border-width: 1px; border-style: solid;" /> Let’s look at an example. I was unable to find any airline data, so I am illustrating this with one of our Minitab sample data sets, <a href="http://support.minitab.com/datasets/regression-data-sets/cereal-purchases/">Cerealpurchase.mtw</a>.</p>
<p>In this example, a food company surveys consumers to measure the effectiveness of their television ad in getting viewers to buy their cereal. The Bought column has the value 1 if the respondent purchased the cereal, and the value 0 if not. In addition to asking if respondents have seen the ad, the survey also gathers data on the household income and the number of children, which the company also believes might influence the purchase of this cereal.</p>
<p>Using <strong>Stat > Regression > Binary Logistic Regression</strong>, I entered the details of the response I wanted to predict, <strong>Bought</strong>, and the value in the Response Event that indicated a purchase. I then entered the Continuous predictor, <strong>Income</strong>, and the Categorical predictors <strong>Children</strong> and <strong>ViewAd</strong>. My completed dialog box looks like this:</p>
<p style="margin-left: 40px;"><img alt="binary logistic regression dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a32342393b5bc4ae7c684dcced3674fa/binary_dialog.png" style="width: 574px; height: 314px; border-width: 1px; border-style: solid;" /></p>
<p>After pressing OK, Minitab performs the analysis and displays the results in the Session window. From the table at the top of the output, I can see that the researchers surveyed a sample of 71 customers, of whom 22 purchased the cereal.</p>
<p style="margin-left: 40px;"><img alt="response information" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/5be946269b2221d79fd670b0bd0dc99d/binary_output_1.png" style="width: 312px; height: 165px;" /></p>
<p>With logistic regression, the output features a Deviance Table instead of an Analysis of Variance Table. The calculations and test statistics used with this type of data are different, but we still use the P-value on the far right to determine which factors have an effect on our response.</p>
<p style="margin-left: 40px;"><img alt="deviance table" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c474f30c96355fab2a5d7a9228636b3e/deviance_table.png" style="width: 430px; height: 232px;" /></p>
<p>As we would when using other regression methods, we are going to reduce the model by eliminating non-significant terms one at a time. In this case, as highlighted above, Income is not significant. We can simply press Ctrl-E to recall the last dialog box, remove the Income term from the model, and rerun the analysis. Minitab returns the following results: </p>
<p style="margin-left: 40px;"><img alt="deviance table" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8e02f72a6d7627468824742132cabf8e/deviance_table_2.png" style="width: 441px; height: 224px;" /></p>
<p>After removing Income, we can see that both Children and ViewAd are significant at the 0.05 significance level. This could be good news for the Marketing Department, as it clearly indicates that viewing the ad did influence the decision to buy. However, from this table it is not possible to see whether this effect is positive or negative.</p>
<p>To understand this, we need to look at another part of the output. In Binary Logistic Regression, we are trying to estimate the probability of an event. To do this we use the Odds Ratio, which compares the odds of two events by dividing the odds of success under condition A by the odds of success under condition B. </p>
<p style="margin-left: 40px;"><img alt="Odds Ratio" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/905f9b30cf4722086fd56cd3a84bf3b8/odds_ratio.png" style="width: 367px; height: 192px;" /></p>
<p>In this example, the Odds Ratio for Children tells us that the odds of purchasing the cereal for respondents who reported having children are 5.1628 times the odds for those who did not. The good news for the Marketing Department is that the odds of purchase for customers who viewed the ad were 3.0219 times the odds for those who did not. If the Odds Ratio were less than 1, we would conclude that seeing the advert reduces sales!</p>
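<p>An odds ratio multiplies odds, not probabilities, so it helps to see the conversion spelled out. The short sketch below applies the ViewAd odds ratio from the output above to a baseline purchase probability. The 20% baseline is a hypothetical value chosen only for illustration, and the helper functions are not part of Minitab.</p>

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


def odds_ratio(p_a, p_b):
    """Odds of the event under condition A divided by the odds under B."""
    return odds(p_a) / odds(p_b)


def apply_odds_ratio(baseline_p, ratio):
    """Probability that results from multiplying the baseline odds by ratio."""
    new_odds = odds(baseline_p) * ratio
    return new_odds / (1 + new_odds)


baseline = 0.20           # hypothetical purchase probability without the ad
view_ad_ratio = 3.0219    # odds ratio for ViewAd reported in the output

with_ad = apply_odds_ratio(baseline, view_ad_ratio)
print(f"Without ad: {baseline:.1%}, with ad: {with_ad:.1%}")
```

<p>The resulting probability is well below three times the baseline, which is why an odds ratio of about 3 should not be read as "three times the probability."</p>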
<p><img alt="storage dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/4626c90540439bd12cc8d214dd927cb4/binary_logistic_regression_storage.png" style="margin: 10px 15px; float: right;" /> The other way to look at these results is to calculate the probability of purchase and analyse this. </p>
<p>It is easy to calculate the probability of a sale by clicking on the <strong>Storage </strong>button in the <strong>Binary Logistic Regression </strong>dialog box and checking the box labeled <strong>Fits (event probabilities)</strong>. This will store the probability of purchase in the worksheet.</p>
<p style="margin-left: 40px;"><img alt="data with stored fits" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/16a008b51b973ea4c1f39d6960d99c2d/data.png" style="width: 359px; height: 236px;" /></p>
<p>Using the fits data, we can produce a table summarizing the Probability of Purchase for all the combinations of Children and ViewAd, as follows:</p>
<p style="margin-left: 40px;"><img alt="tabulated statistics" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/7bf3901e06faf493312623f31e8deabe/tabulated_statistics.png" style="width: 416px; height: 371px;" /></p>
<p>In the rows we have the Children indicator, and in the columns we have the ViewAd indicator. In each cell the top number is the probability of cereal purchase, and the bottom number is the count of customers observed in each of the groups. </p>
<p>Based on this table, customers with children who have seen the ad have a 51% chance of purchase, whereas customers without children who have not seen the ad have a 6% chance of purchase.</p>
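<p>These fitted probabilities come from the logistic model equation, in which each predictor adds its coefficient to the log-odds. The sketch below rebuilds that equation from figures quoted in this post, so treat it as an approximation. The coefficients are back-calculated as the natural logs of the reported odds ratios, the intercept is back-calculated from the rounded 6% cell, and the model is assumed to contain only the Children and ViewAd main effects.</p>

```python
import math


def logistic(z):
    """Map log-odds to a probability."""
    return 1 / (1 + math.exp(-z))


# Back-calculated from rounded figures in the text, not copied from Minitab output.
B_CHILDREN = math.log(5.1628)        # log of the Children odds ratio
B_VIEW_AD = math.log(3.0219)         # log of the ViewAd odds ratio
INTERCEPT = math.log(0.06 / 0.94)    # log-odds of the 6% no-children, no-ad cell


def purchase_probability(children, view_ad):
    """Fitted purchase probability for 0/1 indicators, main effects only."""
    z = INTERCEPT + B_CHILDREN * children + B_VIEW_AD * view_ad
    return logistic(z)


for children in (0, 1):
    for view_ad in (0, 1):
        p = purchase_probability(children, view_ad)
        print(f"Children={children} ViewAd={view_ad}: {p:.2f}")
```

<p>For customers with children who saw the ad, this reconstruction gives a probability close to one half, in line with the 51% in the table. The small gap is expected because the 6% baseline is rounded.</p>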
<p>Now let's bring this back to our airline example. Using the information about their customers' demographics and flight preferences, an airline can use binary logistic regression to estimate the probabilities of a “no-show” for a whole plane and then determine by how much they should overbook seats. Of course, no model is perfect, and as we saw with United, getting it wrong can have severe consequences. </p>
<p> </p>
</div>
Regression Analysis
Thu, 03 Aug 2017 13:57:00 +0000
http://blog.minitab.com/blog/using-data-and-statistics/how-to-estimate-the-probability-of-a-no-show-using-binary-logistic-regression
Gillian Groom
3 Keys to Getting Reliable Data
http://blog.minitab.com/blog/understanding-statistics/3-keys-to-getting-reliable-data
<p><em>Can you trust your data? </em></p>
<p><img alt="disk" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6e0d1a81bdbaadb7aef1d687501264d4/disk.png" style="width: 250px; height: 261px; float: right; margin: 10px 15px;" />That's the very first question we need to ask when we perform a statistical analysis. If the data's no good, it doesn't matter what statistical methods we employ, nor how much expertise we have in analyzing data. If we start with bad data, we'll end up with unreliable results. <em>Garbage in, garbage out, </em>as they say.</p>
<p>So, <em>can </em>you trust your data? Are you positive? Because, let's admit it, many of us forget to ask that question altogether, or respond too quickly and confidently.</p>
<p>You can’t just assume you have good data; you need to <em>know </em>you do. That may require a little bit more work up front, but the energy you spend getting good data will pay off in the form of better decisions and bigger improvements.</p>
<p>Here are 3 critical actions you can take to maximize your chance of getting data that will lead to correct conclusions. </p>
1: Plan How, When, and What to Measure—and Who Will Do It
<p>Failing to plan is a great way to get unreliable data. That’s because a solid plan is the key to successful data collection. Asking why you’re gathering data at the very start of a project will help you pinpoint the data you really need. A data collection plan should clarify:</p>
<ul>
<li>What data will be collected</li>
<li>Who will collect it</li>
<li>When it will be collected</li>
<li>Where it will be collected</li>
<li>How it will be collected</li>
</ul>
<p>Answering these questions in advance will put you well on your way to getting meaningful data.</p>
2: Test Your Measurement System
<p>Many quality improvement projects require measurement data for factors like weight, diameter, or length and width. Not verifying the accuracy of your measurements practically guarantees that your data—and thus your results—are not reliable.</p>
<p>A branch of statistics called <span><a href="http://blog.minitab.com/blog/adventures-in-statistics-2/three-measurement-system-analysis-questions-to-ask-before-you-take-a-single-measurement">Measurement System Analysis</a></span> lets you quickly assess and improve your measurement system so you can be sure you’re collecting data that is accurate and precise.</p>
<p>When gathering quantitative data, Gage Repeatability and Reproducibility (R&R) analysis confirms that instruments and operators are measuring parts consistently.</p>
<p>If you’re grading parts or identifying defects, an Attribute Agreement Analysis verifies that different evaluators are making judgments consistent with each other and with established standards.</p>
<p>If you do not examine your measurement system, you’re much more likely to add variation and inconsistency to your data that can wind up clouding your analysis.</p>
3: Beware of Confounding or Lurking Variables
<p>As you collect data, be careful to avoid introducing unintended and unaccounted-for variables. These “lurking” variables can make even the most carefully collected data unreliable—and such hidden factors often are insidiously difficult to detect.</p>
<p>A well-known example involves World War II-era bombing runs. Analysis showed that accuracy increased when bombers encountered enemy fighters, confounding all expectations. But a key variable hadn’t been factored in: weather conditions. On cloudy days, accuracy was terrible because the bombers couldn’t spot landmarks, and the enemy didn’t bother scrambling fighters.</p>
<p>Suppose that data for your company’s key product shows a much larger defect rate for items made by the second shift than items made by the first.</p>
<p style="margin-left: 40px;"><img alt="defects per shift" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1eceb685455e3bed234c0b635ef60dc9/defects_per_shift.jpg" style="width: 576px; height: 377px;" /></p>
<p>Given only this information, your boss might suggest a training program for the second shift, or perhaps even more drastic action.</p>
<p>But could something else be going on? Your raw materials come from three different suppliers.</p>
<p>What does the defect rate data look like if you include the supplier along with the shift?</p>
<p style="margin-left: 40px;"><img alt="Defects per Shift per Suppleir" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ef2c2d3749d20b117106b4c122388846/defects_per_shift_with_supplier.jpg" style="width: 544px; height: 383px;" /></p>
<p>Now you can see that defect rates for both shifts are higher when using supplier 2’s materials. Not accounting for this confounding factor almost led to an expensive “solution” that probably would do little to reduce the overall defect rate.</p>
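<p>The same effect is easy to reproduce with a small grouped summary. The counts below are invented so that each supplier has the same defect rate on both shifts. They are not the data behind the charts above, and <code>defect_rates</code> is a hypothetical helper.</p>

```python
from collections import defaultdict

# Hypothetical production records: (shift, supplier, units_made, defects).
records = [
    (1, 1, 400, 8), (1, 2, 100, 10), (1, 3, 400, 8),
    (2, 1, 100, 2), (2, 2, 400, 40), (2, 3, 100, 2),
]


def defect_rates(rows, key):
    """Defect rate per group, where key(shift, supplier) names the group."""
    units = defaultdict(int)
    defects = defaultdict(int)
    for shift, supplier, n_units, n_defects in rows:
        group = key(shift, supplier)
        units[group] += n_units
        defects[group] += n_defects
    return {group: defects[group] / units[group] for group in units}


by_shift = defect_rates(records, lambda shift, supplier: shift)
by_shift_and_supplier = defect_rates(records, lambda shift, supplier: (shift, supplier))

print("By shift:", {k: f"{v:.1%}" for k, v in by_shift.items()})
print("By shift and supplier:",
      {k: f"{v:.1%}" for k, v in by_shift_and_supplier.items()})
```

<p>In this made-up data, the second shift looks much worse overall only because it used far more of supplier 2's material. Within each supplier, the two shifts perform identically.</p>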
Take the Time to Get Data You Can Trust…
<p>Nobody sets out to waste time or sabotage their efforts by not collecting good data. But it’s all too easy to get problem data even when you’re being careful! When you collect data, be sure to spend the little bit of time it takes to make sure your data is truly trustworthy.</p>
Data Analysis, Quality Improvement, Statistics
Tue, 01 Aug 2017 13:56:00 +0000
http://blog.minitab.com/blog/understanding-statistics/3-keys-to-getting-reliable-data
Eston Martz
Sunny Day for A Statistician and A Householder – An Update
http://blog.minitab.com/blog/using-data-and-statistics/sunny-day-for-a-statistician-and-a-householder%E2%80%93an-update
<p>We had solar panels fitted on our property in 2011. Last year, we had a few problems with the equipment. It was shutting down at various times throughout the day, typically when it was very sunny, resulting in no electricity being generated.</p>
<img alt="solar panels" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0ee09d62f414b4bd79601d23995458bf/solar.jpg" style="width: 400px; height: 267px; margin: 10px 15px; float: right;" />
<div>
<div>
<p>In summer 2016, I completed a statistical analysis in Minitab to confirm my suspicions that my solar panels were not working as well as they did when they were installed. Details of this first analysis can be found <a href="http://blog.minitab.com/blog/using-data-and-statistics/sunny-day-for-a-statistician-vs-dark-day-for-a-householder-with-solar-panels">here.</a></p>
<p>After completing this analysis, I spoke to the company that manufactured our inverter, and they identified a problem. On the 15th of July 2016, an engineer set up our inverter (the equipment that converts the solar energy from DC to AC) with the correct settings.</p>
<p>I now have the data for the months Jan–June 2017, which I am going to compare to the first six months for the years 2012–2016 to see if this fix has solved the problem.</p>
<p>I am going to use the same analysis as last year, the one-way analysis of variance via the Assistant: <strong>Assistant > Hypothesis Tests > One-Way ANOVA</strong>.</p>
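<p>For readers curious about what a one-way ANOVA compares, here is a bare-bones sketch of the textbook F statistic: variation between group means relative to variation within groups. The daily readings are invented, and the Assistant may use a variant of the test that does not assume equal variances, so this is not a reproduction of Minitab's calculation.</p>

```python
from statistics import mean


def one_way_anova_f(groups):
    """Textbook one-way ANOVA F statistic (assumes equal variances)."""
    all_values = [v for group in groups for v in group]
    grand_mean = mean(all_values)
    k = len(groups)          # number of groups (years)
    n = len(all_values)      # total number of observations
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical daily units generated, a few readings per year (invented).
units_by_year = {
    2015: [8.4, 7.9, 8.8, 8.1, 7.6],
    2016: [5.9, 5.2, 6.1, 5.5, 5.8],
    2017: [8.2, 7.8, 8.5, 8.0, 8.3],
}

f_stat = one_way_anova_f(list(units_by_year.values()))
print(f"F = {f_stat:.1f}")
```

<p>A large F statistic means the yearly averages differ by more than the day-to-day scatter within each year would explain.</p>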
<p>The updated descriptive results were as follows:</p>
<p><img alt="solar energy output from Minitab 18" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0c9657d141afd560725c5ac35f758e4e/solar_1.png" style="width: 798px; height: 224px;" /></p>
<p>Just looking at the summary statistics above, I can clearly see that the average electric units generated per day for the first six months of 2017, at 8.13, is higher than the 5.69 generated per day in 2016. </p>
<p>In my analysis of last year’s data, I found that 2016 was significantly worse than the previous years. Using the results from this latest one-way ANOVA, and reviewing the Means Comparison Chart shown below, I am hoping to see that 2017’s performance is as good as the years 2012–2015, when there were no problems with the solar-generating equipment.</p>
<p><img alt="solar energy means comparison chart" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/afebf341c3ca0eede9efd18393eb236a/solar_2.png" style="width: 561px; height: 507px;" /></p>
<p>The great news is that the engineer's fix has worked: the amount of electricity generated in the first six months of 2017 is significantly better than in 2016 and not significantly different from that generated in the years 2012–2015.</p>
<p>A sunny day for a householder and a statistician!</p>
<p> </p>
</div>
</div>
ANOVA, Data Analysis, Statistics
Thu, 27 Jul 2017 12:00:00 +0000
http://blog.minitab.com/blog/using-data-and-statistics/sunny-day-for-a-statistician-and-a-householder%E2%80%93an-update
Gillian Groom
How to Eliminate False Alarms on P and U Control Charts
http://blog.minitab.com/blog/understanding-statistics/how-to-eliminate-false-alarms-on-p-and-u-control-charts
<p>All processes have variation, some of which is inherent in the process, and isn't a reason for concern. But when processes show unusual variation, it may indicate a change or a "special cause" that requires your attention. </p>
<p>Control charts are the primary tool quality practitioners use to detect special cause variation and distinguish it from natural, inherent process variation. These charts graph process data against an upper and a lower control limit. To put it simply, when a data point goes beyond the limits on the control chart, investigation is probably warranted.</p>
Traditional Control Charts and False Alarms
<p>It seems so straightforward. But veteran control chart users can tell you about “false alarms,” instances where data points fell outside control limits that sat close to the mean, even though the process <em>was </em>in statistical control.</p>
<p>The attribute charts, the <a href="http://blog.minitab.com/blog/understanding-statistics/what-control-chart-should-i-use">traditional P and U control charts</a> we use to monitor defectives and defects, are particularly prone to false alarms due to a phenomenon known as overdispersion. That problem had been known for decades before quality engineer David Laney solved it by devising P' and U' charts.</p>
<p>The P' and U' charts avoid false alarms so only important process deviations are detected. In contrast to the traditional charts, which assume a defective or defect rate remains constant, P' and U' charts assume that no process has a truly constant rate, and account for that when calculating control limits.</p>
<p>That's why the P' and U' charts deliver a more reliable indication of whether the process is really in control, or not.</p>
<p>Minitab's control chart capabilities include P' and U' charts, and the software also includes a diagnostic tool that identifies situations where you need to use them. When you choose the right chart, you can be confident any special-cause variation you're observing truly exists.</p>
The Cause of Control Chart False Alarms
<p>When you have too much variation, or “overdispersion,” in your process data, false alarms can result—especially with data collected in large subgroups. The larger the subgroups, the narrower the control limits on a traditional P or U chart. But those artificially tight control limits can make points on a traditional P chart appear out of control, even if they aren't.</p>
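<p>The narrowing follows directly from the standard P chart formula, in which the limits sit three binomial standard deviations from the average proportion and that standard deviation shrinks as the subgroup size grows. The sketch below assumes this textbook formula. The 5% average proportion and the subgroup sizes are hypothetical.</p>

```python
import math


def p_chart_limits(p_bar, n):
    """Traditional 3-sigma P chart limits for subgroup size n."""
    sigma = math.sqrt(p_bar * (1 - p_bar) / n)
    lower = max(0.0, p_bar - 3 * sigma)
    upper = min(1.0, p_bar + 3 * sigma)
    return lower, upper


p_bar = 0.05  # hypothetical average proportion defective

for n in (50, 500, 2500):
    lower, upper = p_chart_limits(p_bar, n)
    print(f"n={n:>5}: LCL={lower:.4f}  UCL={upper:.4f}  width={upper - lower:.4f}")
```

<p>With subgroups in the thousands, the limits hug the center line so tightly that ordinary fluctuation in the underlying rate can push points outside them.</p>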
<p>However, too little variation, or “underdispersion,” in your process data also can lead to problems. Underdispersion can result in artificially wide control limits on a traditional P chart or U chart. Under that scenario, some points that appear to be in control could well be ones you <em>should </em>be concerned about.</p>
<p>If your data is affected by overdispersion or underdispersion, you need to use a P' or U' chart to reliably distinguish common-cause from special-cause variation.</p>
Detecting Overdispersion and Underdispersion
<p>If you aren't sure whether or not your process data has over- or underdispersion, the P Chart or U Chart Diagnostic in Minitab can test it and tell you if you need to use a Laney P' or U' chart.</p>
<p>Choose <strong>Stat > Control Charts > Attributes Charts > P Chart Diagnostic</strong> or <strong>Stat > Control Charts > Attributes Charts > U Chart Diagnostic</strong>. </p>
<p><img alt="P Chart Diagnostic" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ab78f289474fe548db3deb7fa8581075/diagnostic_menu.png" style="width:575px;height:323px;" /></p>
<p>The following dialog appears:</p>
<p style="margin-left: 40px;"><img alt="P chart diagnostic" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/464df821310229bb5a439ca94db4f151/p_diagnostic_dialog_en_1_.jpg" style="width:400px;height:265px;" /></p>
<p>Enter the worksheet column that contains the number of defectives under "Variables." If all of your samples were collected using the same subgroup size, enter that number in Subgroup sizes. Alternatively, identify the appropriate column if your subgroup sizes varied.</p>
<p>Let’s run this test on the <a href="https://support.minitab.com/datasets/control-charts-data-sets/defective-records-data/" target="_blank">DefectiveRecords.MTW</a> from Minitab's sample data sets. This data set features very large subgroups, each having an average of about 2,500 observations.</p>
<p>The diagnostic for the P chart gives the following output:</p>
<p><img alt="P Chart Diagnostic" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e9a9ae99c4c095e3af4ca390a8ebacd2/p_chart_diagnostic_for_defectives.png" style="width:576px;height:384px;" /></p>
<p>Below the plot, check out the ratio of observed to expected variation. If the ratio is greater than the 95% upper limit that appears below it, your data are affected by overdispersion. Underdispersion is a concern if the ratio is less than 60%. Either way, a Laney P' chart will be a more reliable option than the traditional P chart.</p>
Creating a P' Chart
<p>To make a P' chart, go to <strong>Stat > Control Charts > Attributes Charts > Laney P'</strong>. Minitab will generate the following chart for the DefectiveRecords.MTW data.</p>
<p><img alt="P' Chart of Defectives" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/67bf34d0435c1b131ddbe7a0ea5b26a6/laney_p____chart_of_defectives.png" style="width:576px;height:384px;" /></p>
<p>This P' chart shows a stable process with no out-of-control points. But create a traditional P chart with this data, and several of the subgroups appear to be out of control, thanks to the artificially narrow limits caused by the overdispersion. </p>
<p><img alt="P Chart of Defectives" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/361a23fa2f3d3553de39d5f211786302/p_chart_of_defectives.png" style="width:576px;height:384px;" /></p>
<p>So why do these data points appear to be out of control on the P chart but not on the P' chart? The difference lies in how each chart defines and calculates process variation. The Laney P' chart control limits account for the overdispersion when calculating the variation and eliminate these false alarms.</p>
<p>The Laney P' chart calculations include within-subgroup variation as well as the variation <em>between </em>subgroups to adjust for overdispersion or underdispersion.</p>
<p>If over- or underdispersion is not a problem, the P' chart is comparable to a traditional P chart. But the P' chart expands the control limits where overdispersion exists, ensuring that only important deviations are identified as out of control. And in the case of underdispersion, the P' chart calculations result in narrower control limits.</p>
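<p>The adjustment can be sketched following Laney's published approach: convert each subgroup proportion to a z-score using the binomial standard deviation, estimate the between-subgroup variation (sigma z) from the average moving range of those z-scores, and scale the traditional limits by sigma z. The data below are invented, the function name is hypothetical, and Minitab's implementation may differ in its details.</p>

```python
import math


def laney_p_prime_limits(defectives, sizes):
    """Return (p_bar, sigma_z, per-subgroup (LCL, UCL)) for a Laney P' chart."""
    p_bar = sum(defectives) / sum(sizes)
    # Within-subgroup (binomial) standard deviation for each subgroup.
    sigma_p = [math.sqrt(p_bar * (1 - p_bar) / n) for n in sizes]
    # Standardize each subgroup proportion.
    z = [(d / n - p_bar) / s for d, n, s in zip(defectives, sizes, sigma_p)]
    # Between-subgroup variation: average moving range of z divided by 1.128.
    moving_ranges = [abs(b - a) for a, b in zip(z, z[1:])]
    sigma_z = (sum(moving_ranges) / len(moving_ranges)) / 1.128
    limits = [
        (max(0.0, p_bar - 3 * s * sigma_z), min(1.0, p_bar + 3 * s * sigma_z))
        for s in sigma_p
    ]
    return p_bar, sigma_z, limits


# Hypothetical defective counts from subgroups of 2,500 records each (invented).
defectives = [100, 140, 90, 150, 110, 135, 95, 145]
sizes = [2500] * len(defectives)

p_bar, sigma_z, limits = laney_p_prime_limits(defectives, sizes)
print(f"p-bar = {p_bar:.4f}, sigma_z = {sigma_z:.2f}")
print(f"First subgroup limits: {limits[0][0]:.4f} to {limits[0][1]:.4f}")
```

<p>A sigma z near 1 leaves the limits essentially unchanged, a value above 1 widens them (overdispersion), and a value below 1 narrows them (underdispersion).</p>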
<p>To learn more about the statistical foundation underlying the Laney P' and U' charts, read <a href="http://www.minitab.com/en-us/published-articles/On-the-Charts--A-Conversation-with-David-Laney/">On the Charts: A Conversation with David Laney</a>.</p>
Control Charts
Tue, 25 Jul 2017 13:58:00 +0000
http://blog.minitab.com/blog/understanding-statistics/how-to-eliminate-false-alarms-on-p-and-u-control-charts
Eston Martz
Area Graphs: An Underutilized Tool
http://blog.minitab.com/blog/starting-out-with-statistical-software/area-graphs-an-underutilized-tool
<p>In my time at Minitab, I’ve gotten a good understanding of what types of graphs users create. Everyone knows about histograms, bar charts, and time series plots. Even relatively less familiar plots like the interval plot and <span><a href="http://blog.minitab.com/blog/understanding-statistics/trouble-starting-an-analysis-graph-your-data-with-an-individual-value-plot">individual value plot</a></span> are still used quite often. However, one of the most underutilized graphs we have available is the area graph. If you’re not familiar with an Area Graph, here’s the example from the Minitab help menu of what it looks like:</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/732ead34-1005-4470-b034-d7f8b87fabcf/Image/67c9fc3399dc4a8a5c72d2ace452db62/areagraph1.png" style="width: 366px; height: 245px;" /></p>
<p>As you can see, an area graph is a great way to view multiple time series trends in one plot, especially if those series form parts of one whole. Any time you are interested in multiple series that make up a whole, an area graph can do the job. You could use it to show enrollment rates by gender, precipitation rates by county, population totals by city, etc.</p>
<p>I’m going to show you how to go about creating one in <a href="http://www.minitab.com/products/minitab">Minitab</a>. First, we need to put our data in our worksheet. For this graph, we need each of the series, or sections, in a separate column. An additional constraint on this graph is that we need all of the columns to be of equal length, so be sure that’s the case. In our example we will use <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/File/8ad6bf30e2b5eed510c2bd1e19f52e1c/areagraphblogdata.mtw">sales data</a> from different regional branches, and show that an area graph can be an improvement over a simple time series plot.</p>
<p>Once it’s in your worksheet, we can go to <strong>Graph > Time Series Plot</strong>, and look at the data in a basic time series plot. As you can see, there are a few challenges with interpreting this plot. </p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/732ead34-1005-4470-b034-d7f8b87fabcf/Image/dc1239167766d00e1eeea7a8e1227414/areagraph2.png" style="width: 577px; height: 385px;" /></p>
<p>First, the plot looks extremely messy. While it shows the sales from the individual branches, it is very hard to track any one branch through time. Looking at 4 (or more) separate plots is not much better, because that makes the branches harder to compare. Additionally, when you make separate plots, an important piece of information is lost: total sales. For example, in August, Philadelphia, London, and Seattle each had a sales increase, while New York had its worst month of the year. Was this an overall gain or an overall loss? We can’t really tell from individual plots. </p>
<p>Instead, let’s look at an Area Graph. You can find this by going to <strong>Graph > Area Graph</strong>, and entering the series the same way as we did the time series plot. Take a look at our output below:</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/732ead34-1005-4470-b034-d7f8b87fabcf/Image/f5f5d94367127534af22a16b9dc364d1/areagraph3.png" style="width: 577px; height: 385px;" /></p>
<p>For starters, it looks much cleaner. We are able to see clear trends in the overall pattern. We can see that overall sales spiked in August, answering our question from above. We can use this to evaluate trends in multiple series, <em>as well as</em> the contribution of each series to the total quantity. We get all the information about total sales month-to-month, as well as the individual series for each location, in one plot, instead of in the messy, hard-to-read Time Series plot we created first.</p>
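<p>If you are curious what an area graph is actually drawing, here is a minimal sketch in Python. It assumes the usual stacked construction, where each band's upper edge is the running sum of the series beneath it, so the top edge is the overall total. The sales figures are made up for illustration and are not the values in the worksheet linked above.</p>

```python
from itertools import accumulate

# Hypothetical monthly sales (NOT the blog's data set), one list per branch.
months = ["Jun", "Jul", "Aug"]
sales = {
    "New York":     [30, 32, 20],
    "Philadelphia": [12, 13, 20],
    "London":       [15, 15, 22],
    "Seattle":      [10, 11, 16],
}

def stacked_boundaries(series):
    """Return the upper edge of each stacked band, month by month."""
    columns = zip(*series.values())                    # one tuple of branch values per month
    per_month = [list(accumulate(col)) for col in columns]
    # Transpose back so each branch maps to its band edge across the months.
    return dict(zip(series, map(list, zip(*per_month))))

boundaries = stacked_boundaries(sales)
totals = list(boundaries.values())[-1]                 # top band's edge is the overall total

for month, total in zip(months, totals):
    print(f"{month}: total sales = {total}")
```

<p>In this made-up example New York drops in August while the other three branches rise, and the top edge of the stack shows directly whether the total went up or down.</p>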
<p>Next time you need to evaluate multiple series together, consider taking a look at the Area Graph to get a cleaner picture of your data!</p>
Data AnalysisStatisticsStatsThu, 20 Jul 2017 12:00:00 +0000http://blog.minitab.com/blog/starting-out-with-statistical-software/area-graphs-an-underutilized-toolEric HeckmanPoisson Data: Examining the Number Deaths in an Episode of Game of Thrones
http://blog.minitab.com/blog/the-statistics-game/poisson-data-examining-the-number-deaths-in-an-episode-of-game-of-thrones
<p><img alt="Game of Thrones" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/d11b4341996f340e24132eb12253d8e5/game_of_thrones.jpg" style="float: right; width: 250px; height: 141px; margin: 10px 15px; border-width: 1px; border-style: solid;" />There may not be a situation more perilous than being a character on <a href="http://www.hbo.com/game-of-thrones" target="_blank"><em>Game of Thrones</em></a>. Warden of the North, Hand of the King, and apparent protagonist of the entire series? Off with your head before the end of the first season! Last male heir of a royal bloodline? Here, have a pot of molten gold poured on your head! Invited to a wedding? Well, you probably know what happens at weddings in the show. </p>
<p>So what do all these gruesome deaths have to do with statistics? They are data that come from a <a href="http://blog.minitab.com/blog/fun-with-statistics/poisson-processes-and-probability-of-poop">Poisson distribution</a>.</p>
<p>Data from a Poisson distribution describe the number of times an event occurs in a finite observation space. For example, a Poisson distribution can describe the number of defects in the mechanical system of an airplane, the number of calls to a call center, or in our case it can describe the number of deaths in an episode of Game of Thrones.</p>
Goodness-of-Fit Test for Poisson
<p>If you're not certain whether your data follow a Poisson distribution, you can use <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">Minitab Statistical Software</a> to perform a goodness-of-fit test. If you don't already use Minitab and you'd like to follow along with this analysis, download the <a href="http://www.minitab.com/products/minitab/free-trial/">free 30-day trial</a>.</p>
<p>I collected the <a href="http://genius.com/Game-of-thrones-list-of-game-of-thrones-deaths-annotated" target="_blank">number of deaths for each episode</a> of Game of Thrones (as of this writing, 57 episodes have aired), and put them in a Minitab worksheet. Then I went to <strong>Stat > Basic Statistics > Goodness-of-Fit Test for Poisson </strong>to determine whether the data follow a Poisson distribution. You can get the data I used <a href="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f73acb13fa520a25583149f8b780a31c/game_of_thrones_deaths.mtw">here</a>. </p>
<p style="margin-left: 40px;"><img alt="Goodness-of-Fit Test for Poisson Distribution " src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/0c9dcb9ecb6eb644109d86e3501143b3/gof_test_poisson.jpg" style="width: 492px; height: 417px;" /></p>
<p>Before we interpret the p-value, we see that we have a problem. Three of the categories have an expected value less than 5. If the expected value for any category is less than 5, the results of the test may not be valid. To fix our problem, we can combine categories to achieve the minimum expected count. In fact, we see that Minitab actually already started doing this by combining all episodes with 7 or more deaths.</p>
<p>So we'll just continue by making the highest category 6 or more deaths, and the lowest category 1 or 0 deaths. To do this, I created a new column with the categories 1, 2, 3, 4, 5 and 6. Then I made a frequency column that contained the number of occurrences for each category. For example, the "1" category is a combination of episodes with 0 deaths and 1 death, so there were 15 occurrences. Then I ran the analysis again with the new categories.</p>
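<p>To see why combining categories helps, here is a rough sketch of the expected counts under a Poisson distribution. It assumes a mean of 3.2 deaths per episode (the approximate rate reported later in this post) and 57 episodes. The function names are mine, and Minitab's own output may differ slightly because it uses the exact sample mean.</p>

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def expected_counts(lam, n, max_k):
    """Expected counts for 0..max_k-1 events, plus one final 'max_k or more' bin."""
    counts = [n * poisson_pmf(k, lam) for k in range(max_k)]
    counts.append(n - sum(counts))          # everything left over goes in the top bin
    return counts

LAM = 3.2   # assumption: approximate deaths per episode
N = 57      # episodes aired at the time of writing

original = expected_counts(LAM, N, 7)       # bins: 0, 1, ..., 6, and 7 or more
# Merge 0 with 1, and 6 with "7 or more", as described above.
combined = [original[0] + original[1], *original[2:6], original[6] + original[7]]

small_before = sum(c < 5 for c in original)
small_after = sum(c < 5 for c in combined)
print(f"bins with expected count < 5: {small_before} before, {small_after} after")
```

<p>Under these assumptions, the bins for 0 deaths, 6 deaths, and 7 or more deaths fall below 5 before combining, and none do afterward.</p>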
<p style="margin-left: 40px;"><img alt="Goodness-of-Fit Test for Poisson Distribution " src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/93551e38ce5c4cc5321c249fee184e24/gof_test_poisson_2.jpg" style="width: 420px; height: 323px;" /></p>
<p>Now that all of our categories have expected counts greater than 5, we can examine the p-value. If the p-value is less than the significance level (usually 0.05 works well), you can conclude that the data do not follow a Poisson distribution. But in this case the p-value is 0.228, which is greater than 0.05. Therefore, we cannot conclude that the data do not follow the Poisson distribution, and can continue with analyses that assume the data follow a Poisson distribution. </p>
Confidence Interval for 1-Sample Poisson Rate
<p>When you have data that come from a Poisson distribution, you can use <strong>Stat > Basic Statistics > 1-Sample Poisson Rate</strong> to get a rate of occurrence and calculate a range of values that is likely to include the population rate of occurrence. We'll perform the analysis on our data.</p>
<p style="margin-left: 40px;"><img alt="1-Sample Poisson Rate" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/259b9b0cb11fed7e5b7467703f7037ad/1_poisson_rate.jpg" style="width: 489px; height: 133px;" /></p>
<p>The rate of occurrence tells us that on average there are about 3.2 deaths per episode on <em>Game of Thrones</em>. If our 57 episodes were a sample from a much larger population of <em>Game of Thrones</em> episodes, the confidence interval would tell us that we can be 95% confident that the population rate of deaths per episode is between 2.8 and 3.7.</p>
<p>The length of observation lets you specify a value to represent the rate of occurrence in a more useful form. For example, suppose instead of deaths per episode, you want to determine the number of deaths per season. There are 10 episodes per season. So because an individual episode represents 1/10 of a season, 0.1 is the value we will use for the length of observation. </p>
<p style="margin-left: 40px;"><img alt="1-Sample Poisson Rate" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/b6fa9d2e740aacc86d4223ea75487d95/1_poisson_rate_season.jpg" style="width: 495px; height: 106px;" /></p>
<p>With a different length of observation, we see that there are about 32 deaths per season with a confidence interval ranging from 28 to 37.</p>
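<p>The effect of the length of observation is simple arithmetic, sketched below. Two loud assumptions: the total of 183 deaths is a hypothetical figure chosen only to be consistent with a rate of about 3.2 per episode, and the interval uses a normal approximation, which is not necessarily the method Minitab uses, so the limits will not match the output above exactly.</p>

```python
import math

def poisson_rate_ci(total_events, n_obs, length=1.0, z=1.96):
    """Rate per unit of observation, with an approximate 95% interval.

    length is the size of one observation in the units you want to report,
    e.g. 0.1 if each observation is one episode and you want a per-season rate.
    """
    exposure = n_obs * length
    rate = total_events / exposure
    half_width = z * math.sqrt(total_events) / exposure   # normal approximation
    return rate, rate - half_width, rate + half_width

TOTAL_DEATHS = 183   # hypothetical total, not taken from the worksheet

per_episode = poisson_rate_ci(TOTAL_DEATHS, 57)
per_season = poisson_rate_ci(TOTAL_DEATHS, 57, length=0.1)

print("per episode: {:.1f} ({:.1f}, {:.1f})".format(*per_episode))
print("per season:  {:.0f} ({:.0f}, {:.0f})".format(*per_season))
```

<p>Changing the length of observation from 1 to 0.1 multiplies the rate and both interval limits by 10. Nothing else about the analysis changes.</p>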
Poisson Regression
<p>The last thing we'll do with our Poisson data is perform a regression analysis. In Minitab, go to <strong>Stat > Regression > Poisson Regression > Fit Poisson Model</strong> to perform a Poisson regression analysis. We'll look at whether we can use the episode number (1 through 10) to predict how many deaths there will be in that episode.</p>
<p style="margin-left: 40px;"><img alt="Poisson Regression" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/0540d6716d13c4de50421155038b2c03/poisson_regression.jpg" style="width: 402px; height: 238px;" /></p>
<p>The first thing we'll look at is the p-value for the predictor (episode). The p-value is 0.042, which is less than 0.05, so we can conclude that there is a statistically significant association between the episode number and the number of deaths. However, the Deviance R-Squared value is only 18.14%, which means that the episode number explains only 18.14% of the variation in the number of deaths per episode. So while an association exists, it's not very strong. Even so, we can use the coefficients to determine how the episode number affects the number of deaths. </p>
<p style="margin-left: 40px;"><img alt="Poisson Regression" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/adb7514fd7892c3b8591895321c96918/poisson_regression_2.jpg" style="width: 241px; height: 227px;" /></p>
<p>The episode number was entered as a categorical variable, so the coefficients show how each episode number affects the number of deaths relative to episode number 1. A positive coefficient indicates that episode number is likely to have more deaths than episode 1. A negative coefficient indicates that episode number is likely to have fewer deaths than episode 1.</p>
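<p>To put a coefficient on a more intuitive scale, you can exponentiate it. The sketch below assumes the model uses a natural log link, so that exp(coefficient) is the ratio of expected deaths relative to episode 1. The coefficient values are hypothetical and are not the ones in the output above.</p>

```python
import math

def rate_ratio(coefficient):
    """Expected count relative to the reference level, assuming a log link."""
    return math.exp(coefficient)

# Hypothetical coefficients for illustration only.
coefficients = {"episode 5": -0.20, "episode 9": 0.60}

for level, coef in coefficients.items():
    print(f"{level}: {rate_ratio(coef):.2f} times the expected deaths of episode 1")
```

<p>A coefficient of 0 gives a ratio of exactly 1, which is why the sign of the coefficient tells you whether an episode number tends to be deadlier than the season opener.</p>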
<p>We see that each season usually starts slow, as 7 of the 9 episode numbers have positive coefficients. Episodes 8, 9, and 10 have the highest coefficients, meaning that relative to the first episode of the season they have the greatest number of deaths. So even though our model won't be great at predicting the exact number of deaths for each episode, it's clear that the show ends each season with a bang.</p>
<p>So, if you're a <em>Game of Thrones</em> viewer you should brace yourself, because death is coming. Or, as they would say in Essos:</p>
<p><em>Valar morghulis.</em></p>
Data AnalysisFun StatisticsStatisticsStatistics in the NewsTue, 18 Jul 2017 12:03:00 +0000http://blog.minitab.com/blog/the-statistics-game/poisson-data-examining-the-number-deaths-in-an-episode-of-game-of-thronesKevin RudySealing Up Patient Safety with Monte Carlo Simulation
http://blog.minitab.com/blog/understanding-statistics/sealing-up-patient-safety-with-monte-carlo-simulation
<p>If you have a process that isn’t meeting specifications, using the Monte Carlo simulation and optimization tools in <a href="http://companionbyminitab.com">Companion by Minitab</a> can help. Here’s how you, as an engineer in the medical device industry, could use Companion to improve a packaging process and help ensure patient safety.</p>
<p><img alt="sealed bags" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ae6757ada46a7ca1efd229a5136c679f/sealed_bags.png" style="margin: 10px 15px; float: right; width: 258px; height: 183px;" />Your product line at AlphaGamma Medical Devices is shipped in heat-sealed packages with a minimum seal strength requirement of 13.5 newtons per square millimeter (N/mm<sup>2</sup>). Meeting this specification is critical, because when a seal fails, the product inside is no longer sterile, which puts patients at risk.</p>
<p>Seal strength depends on the temperature of the sealing device and the sealing time. The relationship between the two factors is expressed by the model:</p>
<p>Seal Strength = 9.64 + 0.003*Temp + 4.0021*Time + 0.000145*Temp*Time</p>
<p>Currently, your packages are sealed at an average temperature of 120 degrees Celsius, with a standard deviation of 25.34. The mean sealing time is 1.5 seconds, with a standard deviation of 0.5. Both parameters follow a normal distribution.</p>
Building your process model
<p>To assess the process capability under the current conditions, you can enter the parameters, transfer function, and specification limits into Companion’s straightforward interface, verify that the model diagram matches your process, and then instantly simulate 50,000 package seals using your current input conditions.</p>
<img alt="seal strength model monte carlo simulation" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/20152d62bb190da9adf41f64cd0dc396/seal_strength1.png" style="border: 1px solid; line-height: 22.4px; width: 750px; height: 539px;" />
Understanding your results
<img alt="monte carlo output for seal strength simulation round 1" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ef3a1f0d4322dd9a947c6b7d5330916e/seal_strength2.png" style="border: 0px none; line-height: 22.4px; width: 750px; height: 609px;" />
<p>The process performance measurement (Cpk) for your process is 0.42, far less than the minimum standard of 1.33. Under the current conditions, more than 10% of your seals will fail to meet specification.</p>
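<p>For readers who want to see the idea behind the simulation, here is a minimal sketch in plain Python. It is not how Companion works internally. It simply draws 50,000 temperature and time values from the normal distributions described above, pushes them through the transfer function, and computes a one-sided capability index against the lower specification limit as (mean − LSL) / (3 × standard deviation). The function names and the random seed are my own choices.</p>

```python
import random
import statistics

LSL = 13.5   # minimum seal strength from the specification

def seal_strength(temp, time):
    """Transfer function relating temperature and time to seal strength."""
    return 9.64 + 0.003 * temp + 4.0021 * time + 0.000145 * temp * time

def simulate(n=50_000, temp_mean=120.0, temp_sd=25.34,
             time_mean=1.5, time_sd=0.5, seed=1):
    """Simulate n seals and return (capability index, percent below LSL)."""
    rng = random.Random(seed)               # fixed seed so the run is repeatable
    strengths = [
        seal_strength(rng.gauss(temp_mean, temp_sd), rng.gauss(time_mean, time_sd))
        for _ in range(n)
    ]
    mean = statistics.fmean(strengths)
    sd = statistics.stdev(strengths)
    index = (mean - LSL) / (3 * sd)         # one-sided index: only a lower limit exists
    pct_out = 100 * sum(s < LSL for s in strengths) / n
    return index, pct_out

index, pct_out = simulate()
print(f"capability index = {index:.2f}, percent out of spec = {pct_out:.1f}%")
```

<p>Passing a smaller <code>time_sd</code> or different means to <code>simulate</code> is a quick way to explore what-if scenarios before turning to the optimization and sensitivity tools described next.</p>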
Finding optimal input settings
<p>Companion’s intuitive workflow guides you to the next step: optimizing your inputs.</p>
<img alt="parameter optimzation" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dd5c9986aa27ecc57b1717ffca351940/paper_brightness3.png" style="border: 0px none; line-height: 22.4px; width: 700px; height: 72px;" />
<p>You set the goal—in this case, minimizing the percent out of spec—and enter high and low values for your inputs. Companion does the rest.</p>
<img alt="defining optimization objectives" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/00d8e90f539441dbf799188c4b44cfb2/seal_strength4.png" style="border: 0px none; line-height: 22.4px; width: 750px; height: 478px;" />
Simulating the new process
<p>After finding the optimal input settings in the ranges you specified, Companion presents the simulated results of the recommended process changes.</p>
<img alt="second seal strength model monte carlo simulation" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/bd4faa4966b29d0639d086d7730cf35f/seal_strength5.png" style="border: 0px none; line-height: 22.4px; width: 750px; height: 811px;" />
<p>The simulation shows the new settings would reduce the amount of faulty seals to less than 1% with a Ppk of 0.78—an improvement, but still shy of the 1.33 Ppk standard.</p>
Understanding variability
<p>To further improve the package sealing process, Companion then suggests that you perform a sensitivity analysis.</p>
<img alt="sensitivity analysis" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f4244a7a487d7f7b1804a4a59320ce4f/paper_brightness6.png" style="border: 0px none; line-height: 22.4px; width: 700px; height: 85px;" />
<p>Companion’s unique graphic presentation of the sensitivity analysis provides you with insight into how the variation of your inputs influences seal strength.</p>
<img alt="sensitivity analysis results" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/13d48b485042169f5695e22ca8125f5b/seal_strength7.png" style="border: 0px none; line-height: 22.4px; width: 750px; height: 569px;" />
<p>The blue line representing time indicates that this input’s variability has more of an impact on percent out of spec than temperature does. The blue line also indicates how much of an impact you can expect to see: in this case, reducing the variability in time by 50% will reduce the percent out of spec to about 0 percent. Based on these results, you run another simulation to visualize the strength of your seals using the 50% variation reduction in time.</p>
<img alt="third monte carlo model simulation for seal strength" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d63092260b0a97afea2cde89bb804c70/seal_strength8.png" style="border: 0px none; line-height: 22.4px; width: 750px; height: 717px;" />
<p>The simulation shows that reducing the variability will result in a Ppk of 1.55, with 0% of your seals out of spec, and you’ve just helped AlphaGamma Medical Devices bolster its reputation for excellent quality.</p>
Getting great results
<p>Figuring out how to improve a process is easier when you have the right tool to do it. With Monte Carlo simulation to assess process capability, Parameter Optimization to identify optimal settings, and Sensitivity Analysis to pinpoint exactly where to reduce variation, Companion can help you get there.</p>
<p>To try the Monte Carlo simulation tool, as well as Companion's more than 100 other tools for executing and reporting quality projects, learn more and get the free 30-day trial version for you and your team at <a href="http://companionbyminitab.com">companionbyminitab.com</a>.</p>
Medical DevicesMonte CarloMonte Carlo SimulationFri, 14 Jul 2017 12:00:00 +0000http://blog.minitab.com/blog/understanding-statistics/sealing-up-patient-safety-with-monte-carlo-simulationEston MartzHow Many Samples Do You Need to Be Confident Your Product Is Good?
http://blog.minitab.com/blog/the-statistical-mentor/how-many-samples-do-you-need-to-be-confident-your-product-is-good
<p>How many samples do you need to be “95% confident that at least 95% (or even 99%) of your product is good”?</p>
<p>The answer depends on the type of response variable you are using, <a href="http://blog.minitab.com/blog/understanding-statistics/understanding-qualitative-quantitative-attribute-discrete-and-continuous-data-types">categorical or continuous</a>. The type of response will dictate whether you'll use:</p>
<ol>
<li><strong>Attribute Sampling:</strong> Determine the sample size for a categorical response that classifies each unit as Good or Bad (or, perhaps, In-spec or Out-of-spec).<br />
</li>
<li><strong>Variables Sampling:</strong> Determine the sample size for a continuous measurement that follows a Normal distribution.</li>
</ol>
<p>The attribute sampling approach is valid regardless of the underlying distribution of the data. The variables sampling approach has a strict normality assumption, but requires fewer samples.</p>
<p>In this blog post, I'll focus on the attribute approach.</p>
Attribute Sampling
<p>A simple formula gives you the sample size required to make a 95% confidence statement about the probability an item will be in-spec when your sample of size <em>n</em> has zero defects.</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fad9313f89693a9b0360530962ab7cd3/samples1.png" style="width: 129px; height: 35px;" />, where the reliability is the probability of an in-spec item.</p>
<p>For a reliability of 0.95 or 95%, <img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1ddd6b2ce480f2ff3e7c0592aaf961c5/samples2.png" style="width: 216px; height: 35px;" /></p>
<p>For a reliability of 0.99 or 99%, <img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f5d653516c2331614ec90cdd5d572cc5/samples3.png" style="width: 177px; height: 35px;" /></p>
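<p>If you would rather let a script do the arithmetic, here is a small sketch. Because the formula above is shown as an image, I am assuming it is the standard zero-defect form, n = ln(1 − confidence) / ln(reliability), rounded up to the next whole unit. The function name is mine.</p>

```python
import math

def zero_defect_sample_size(reliability, confidence=0.95):
    """Smallest n such that n defect-free units support the reliability claim."""
    return math.ceil(math.log(1 - confidence) / math.log(reliability))

for reliability in (0.95, 0.99):
    n = zero_defect_sample_size(reliability)
    print(f"reliability {reliability:.0%}: n = {n}")
```

<p>Rounding up matters here: a fractional sample is not possible, and rounding down would leave the confidence just short of 95%.</p>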
<p>Of course, if you don't feel like calculating this manually, you can use the <strong>Stat > Basic Statistics > 1 Proportion</strong> dialog box in Minitab to see the reliability levels for different sample sizes. </p>
<p style="margin-left: 40px;"><img alt="one-sample-proportion" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/241a39744cce5b363be898d42c309393/samples4.png" style="width: 412px; height: 320px;" /></p>
<p style="margin-left: 40px;"><img alt="1-proportion test" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/05c501e3d682fe296cb0928db01db8fa/1_proportion_output.jpg" style="width: 330px; height: 108px;" /></p>
<p style="margin-left: 40px;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1f0265155c51089ec36018af8434e97f/samples6.png" style="width: 412px; height: 320px;" /></p>
<p style="margin-left: 40px;"><img alt="1-proportion output" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/11c30b3d7c6b10211dc74381cc86fe99/1_proportion_output2.jpg" style="width: 346px; height: 113px;" /></p>
<p>These two sampling plans are really just <span><a href="http://blog.minitab.com/blog/applying-statistics-in-quality-projects/attribute-acceptance-sampling-for-an-acceptance-number-of-0">C=0 Acceptance Sampling plans</a></span> with an infinite lot size. The same sample sizes can be generated using <strong>Stat > Quality Tools > Acceptance Sampling by Attributes</strong> by:</p>
<ol>
<li>Setting RQL at 5% for 95% reliability or 1% for 99% reliability.</li>
<li>Setting the Consumer’s Risk (β) at 0.05, which results in a 95% confidence level.</li>
<li>Setting AQL at an arbitrary value lower than the RQL, such as 0.1%.</li>
<li>Setting Producer’s Risk (α) at an arbitrary high value, such as 0.5 (note, <span style="line-height: 20.7999992370605px;">α</span> must be less than 1-<span style="line-height: 20.7999992370605px;">β</span> to run).</li>
</ol>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1725fe6e26887a420fc96723e12c89a6/samples8.png" style="width: 493px; height: 402px;" /></p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8f1b12c461612a6ca2e7eeadcec8880c/samples9.png" style="width: 577px; height: 385px;" /></p>
<p>By changing RQL to 1%, the following C=0 plan can be obtained:</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1ddaf928b4ab3e01a4723e0d7893ecf9/samples10.png" style="width: 577px; height: 385px;" /></p>
<p>If you want to make the same confidence statements while allowing 1 or more defects in your sample, the sample size required will be larger. For example, allowing 1 defect in the sample will require a sample size of 93 for the 95% reliability statement. This is a C=1 sampling plan. It can be generated, in this case, by lowering the Producer’s risk to 0.05.</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a8c64d321a4a5b592ea04a0c354df462/samples11.png" style="line-height: 20.7999992370605px; width: 493px; height: 402px;" /><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/74b9611081d2b7f50e849128670c2c72/samples12.png" style="line-height: 1.6; width: 577px; height: 385px;" /></p>
<p><span style="line-height: 1.6;">As you can see, the sample size for an acceptance number of 0 is much smaller—in this case, raising the acceptance number from 0 to 1 has raised the sample size from 59 to 93.</span></p>
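<p>The same sample sizes can be checked with a short search, sketched below. It assumes a binomial model (an effectively infinite lot) and looks for the smallest n at which a lot running at the RQL defect rate would be accepted no more than 5% of the time. This is my own illustration of the idea, not Minitab's algorithm, and the function names are hypothetical.</p>

```python
import math

def acceptance_probability(n, c, defect_rate):
    """Probability of finding c or fewer defects in a sample of n (binomial)."""
    return sum(
        math.comb(n, k) * defect_rate**k * (1 - defect_rate) ** (n - k)
        for k in range(c + 1)
    )

def min_sample_size(c, rql, consumer_risk=0.05):
    """Smallest n for which a lot at the RQL is accepted with probability <= consumer_risk."""
    n = c + 1
    while acceptance_probability(n, c, rql) > consumer_risk:
        n += 1
    return n

for c in (0, 1):
    print(f"C={c}, RQL=5%: n = {min_sample_size(c, 0.05)}")
```

<p>Allowing even one defect forces a noticeably larger sample, because the plan has to rule out the chance that a poor lot slips through with a single lucky defect.</p>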
<p><span style="line-height: 20.8px;">Check out this post </span>for <span><a href="http://blog.minitab.com/blog/applying-statistics-in-quality-projects/attribute-acceptance-sampling-for-an-acceptance-number-of-0">more information about acceptance sampling</a></span>. </p>
Reliability AnalysisWed, 12 Jul 2017 12:00:00 +0000http://blog.minitab.com/blog/the-statistical-mentor/how-many-samples-do-you-need-to-be-confident-your-product-is-goodJim ColtonGetting the Most Out of Your Text Data Part III
http://blog.minitab.com/blog/marilyn-wheatleys-blog/getting-the-most-out-of-your-text-data-part-iii
<p>The two previous posts in this series focused on manipulating data using Minitab’s <a href="http://blog.minitab.com/blog/marilyn-wheatleys-blog/getting-the-most-out-of-your-text-data-part-1">calculator</a> and the <a href="http://blog.minitab.com/blog/marilyn-wheatleys-blog/getting-the-most-out-of-your-text-data-part-2">Data menu</a>.</p>
<p><img alt="text data manipulation" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/4f4beea32033918a81791a588e080ff5/wordcloud.png" style="margin: 10px 15px; float: right; width: 300px; height: 150px;" />In this third and final post, we continue to explore helpful features for working with text data and will focus on some features in Minitab’s Editor menu.</p>
Using the Editor Menu
<p>The <span><a href="http://blog.minitab.com/blog/understanding-statistics/a-swiss-army-knife-for-analyzing-data">Editor menu</a></span> is unique in that the options displayed depend on what is currently active (worksheet, graph, or session window). In this blog post, we’ll focus on some of the options available when a worksheet is active. Here's the Editor menu in Minitab 18:</p>
<p style="margin-left: 40px;"><img alt="Minitab 18 Editor Menu" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/35c2dae084658b85de67bf18a1a472fe/18_editor_menu.png" style="width: 225px; height: 345px;" /></p>
<p>There is also some overlap between the Data menu, which was the focus of my previous post, and the Editor menu: the same Conditional Formatting options can now be accessed from either menu.</p>
<p>Let's consider some examples using features from Find and Replace (Find/Replace Formatted Cell Value), Cell Properties (Comment, Highlight & Custom Formats), Column Properties (Value Order) and Subset Worksheet (Custom Subset).</p>
Find and Replace
<p>This section of the Editor menu includes options for <em>Find Format</em> and <em>Replace Formatted Cell Value</em>—either selection will display the <strong>Find Format and Replace Value</strong> dialog box:</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/ece515ded69ebe481c1ed410bb75afd0/capture.PNG" style="width: 667px; height: 397px;" /></p>
<p>We can toggle between the two options by using the <strong>Find</strong> and the <strong>Replace</strong> tabs at the top.</p>
<p>Both of these options could be useful when making changes to a worksheet that has been formatted using the new conditional formatting options discussed in the previous post in this series.</p>
<p>For example, if we’ve applied conditional formatting to a worksheet to highlight cells with the value ‘Low’ in green, we could use the Replace tab to find all the cells that are green and replace the values in the cells with new values. For example, we can replace the ‘Low’ values that are marked in green with the new value ‘Insignificant’:</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/866aa00739bd70f91947c03a99c97256/capture.PNG" style="width: 717px; height: 297px;" /></p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/3c1185808f65161b72f121f8b7d42bc5/capture.PNG" style="width: 319px; height: 288px;" /></p>
Cell Properties
<p>The new Cell Properties option in the Editor menu provides options for adding a <strong>Comment</strong> to the selected cell, to <strong>Highlight</strong> specific cells in the worksheet, and to create <strong>Custom Formats. </strong>These same options can be accessed by right-clicking on a cell and choosing <strong>Cell Properties:</strong></p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/f9ac417a96739f2699d0b90b9649bbf4/capture.PNG" style="width: 795px; height: 461px;" /></p>
<p>The ability to add a comment to a specific cell is new. In previous versions of <a href="http://www.minitab.com/products/minitab">Minitab</a> it was possible to add a comment to a worksheet or column only. Now we can select the <strong>Comment</strong> option to add a comment to a cell:</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/8392dbec9d49663541738d353eee321f/capture.PNG" style="width: 439px; height: 323px;" /></p>
<p>Notice that the top of the window confirms where the comment will be added. In the example above, it will be in C3 in row 6.</p>
<p>Similar to conditional formatting, we can use the <strong>Highlight</strong> options to highlight only the selected cell or cells:</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/ea4a4719b20c97b12a8fcc7507f6aa2d/capture.PNG" style="width: 387px; height: 382px;" /></p>
<p>Finally, the <strong>Custom Formats</strong> option allows us flexibility in terms of the fill color, font color, and style for the selected cell or cells:</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/07c56b61765c435bb6a24d24910bcc31/capture.PNG" style="width: 655px; height: 440px;" /></p>
Column Properties
<p>This option in the Editor menu allows us to control the order of text strings in a column. For example, if I create a bar chart using <strong>Graph > Bar Chart > Counts of Unique Values</strong> using the data in column 1 below, the default output is in alphabetical order:</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/e11c8a620ae30b4e95aeb770da96e2d7/capture.PNG" style="width: 597px; height: 336px;" /></p>
<p>In some cases, it would be more intuitive to display the bars beginning with Low, then Medium, then High. That is where the Editor menu can help.</p>
<p>First, we click in any cell in column 1 so that the column we want to modify is active, then we select <strong>Editor > Column Properties > Value Order</strong>:</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/06017297115cf248941eb5b042e59768/capture.PNG" style="width: 794px; height: 355px;" /></p>
<p>To change the alphabetical order default, we select the radio button next to <strong>User-specified order</strong>, and then edit the order under <strong>Define an order</strong> and click OK. Now the default order will be Low, Medium, High, and we can update our bar chart to reflect that change:</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/57f3b2237951b632cf32a29f05e19cd9/capture.PNG" style="width: 514px; height: 385px;" /></p>
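<p>For readers who think in code, the same idea can be sketched outside Minitab. The Python snippet below is only an analogy and not how Minitab implements Value Order: the <code>priorities</code> data and the names <code>value_order</code> and <code>user_specified</code> are made up for illustration. It counts the unique text values, then lists them first alphabetically and then in a user-specified order.</p>

```python
from collections import Counter

# Made-up text column; the article's actual worksheet data is not reproduced here.
priorities = ["High", "Low", "Medium", "Low", "High", "Medium", "Low"]

# Counts of unique values, as a bar chart of counts would show them.
counts = Counter(priorities)

# Default behaviour described in the post: categories in alphabetical order.
alphabetical = sorted(counts)

# User-specified order: sort the categories by their position in a list we define.
value_order = ["Low", "Medium", "High"]
user_specified = sorted(counts, key=value_order.index)

# Crude text "bar chart" in the user-specified order.
for label in user_specified:
    print(f"{label:<7}{'#' * counts[label]}")
```

<p>Sorting by position in an explicit list keeps the ordering rule in one place, which loosely mirrors defining the order once for the column rather than for each graph.</p>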
Subset Worksheet
<p>One of the best new enhancements to the Editor menu gives us the ability to quickly and easily create a subset of a worksheet without having to manually type a formula into the calculator.</p>
<p>For example, we may want to create a new worksheet that excludes items that are marked as Low priority. To do that, we can use <strong>Editor > Subset Worksheet > Custom Subset</strong>:</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/2612106879badb196bb0988e61bba4b9/capture.PNG" style="width: 624px; height: 613px;" /></p>
<p>In this example, we’re telling Minitab that we want to use a condition when we subset: we want to <strong>Exclude</strong> rows that match our condition. Our condition is based on the column <strong>Priority</strong>. When we enter that text column, Minitab automatically shows all the unique values in that column. We select <strong>Low</strong> as the value we want to exclude from the new worksheet, and then click <strong>OK</strong>. It’s that simple: there is no need to guess whether to type single or double quotes in the subset condition!</p>
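<p>The logic of that dialog can be sketched in a few lines of Python. This is an illustration under assumptions, not Minitab's implementation: the <code>worksheet</code> rows and the helper <code>subset_excluding</code> are hypothetical. The sketch lists the unique values in a text column and builds a new worksheet that excludes the rows matching the chosen value.</p>

```python
# Made-up worksheet: each row is a dict keyed by column name.
worksheet = [
    {"Item": "A", "Priority": "High"},
    {"Item": "B", "Priority": "Low"},
    {"Item": "C", "Priority": "Medium"},
    {"Item": "D", "Priority": "Low"},
    {"Item": "E", "Priority": "High"},
]


def subset_excluding(rows, column, excluded_values):
    """Return copies of the rows whose value in `column` is not excluded."""
    excluded = set(excluded_values)
    return [dict(row) for row in rows if row[column] not in excluded]


# The unique values in the column, similar to the list the dialog displays.
unique_values = sorted({row["Priority"] for row in worksheet})

# New worksheet without the Low-priority rows; the original is left unchanged.
new_worksheet = subset_excluding(worksheet, "Priority", ["Low"])

print(unique_values)
for row in new_worksheet:
    print(row)
```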
<p>I hope this series of posts on working with text data has been useful. If you have an older version of Minitab and would like to use the new features described in these posts, you can download and install the <a href="http://it.minitab.com/products/minitab/free-trial.aspx">free 30-day trial</a> and check it out!</p>
Data Analysis, Fun Statistics, Learning, Project Tools, Quality Improvement, Six Sigma
Mon, 10 Jul 2017 12:00:00 +0000
http://blog.minitab.com/blog/marilyn-wheatleys-blog/getting-the-most-out-of-your-text-data-part-iii
Marilyn Wheatley
What Does It Mean When Your Probability Plot Has Clusters?
http://blog.minitab.com/blog/the-statistical-mentor/what-does-it-mean-when-your-probability-plot-has-clusters
<p><span style="line-height: 1.6;">Have you ever had a probability plot that looks like this?</span></p>
<p><img alt="Probability Plot of Patient Weight Before and After Surgery" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/885af8cd49c44b34709a529d3c4a78dd/probability_plot.png" style="width: 550px; height: 366px;" /></p>
<p>The probability plot above is based on patient weight (in pounds) after surgery minus patient weight (again, in pounds) before surgery.</p>
<p>The red line appears to go through the data, indicating a <a href="http://blog.minitab.com/blog/fun-with-statistics/normal-the-kevin-bacon-of-distributions">good fit to the Normal</a>, but there are clusters of plotting points at the same measured value. This occurs on a probability plot when there are many ties in the data. If the true measurement can take on any value (in other words, if the variable is continuous), then the cause of the clusters on the probability plot is poor measurement resolution.</p>
<p>The Anderson-Darling Normality test typically rejects normality when there is poor measurement resolution. In a previous blog post (<span><a href="http://blog.minitab.com/blog/the-statistical-mentor/normality-tests-and-rounding">Normality Tests and Rounding</a></span>) I recommended using the Ryan-Joiner test in this scenario. The Ryan-Joiner test generally does not reject normality due to poor measurement resolution. </p>
<p>In this example, the Ryan-Joiner p-value is above 0.10. A probability plot that supports using a Normal distribution would be helpful to confirm the Ryan-Joiner test results. How can we see a probability plot of the true weight differences? Simulation can be used to show how the true weight differences might look on a probability plot.</p>
<p>The weight differences were rounded to the nearest pound. In effect, we want to add a random value from -0.5 to +0.5 to each rounded value to get a simulated measurement. The steps are as follows:</p>
<ol>
<li>Store simulated noise values from -0.5 to +0.5 in a column using <strong>Calc > Random Data > Uniform</strong>.</li>
<li>Use <strong>Calc > Calculator</strong> to add the noise column to the original column of data.</li>
<li>Create a normal probability plot using <strong>Stat > Basic Statistics > Normality Test</strong>.</li>
<li>Repeat steps 1-3 several times if you want to see how the results are affected by the simulated values.</li>
</ol>
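<p>The steps above can also be sketched in Python using only the standard library. This is a rough illustration, not the Minitab procedure itself: the <code>rounded</code> data are made up, the function names are my own, and the correlation between the sorted data and approximate normal scores is used as a numeric stand-in for the straightness of the probability plot (similar in spirit to the Ryan-Joiner statistic, but without a p-value).</p>

```python
import random
from statistics import NormalDist, mean


def add_uniform_noise(values, rng, half_width=0.5):
    """Steps 1-2: add noise drawn uniformly from [-half_width, +half_width]."""
    return [v + rng.uniform(-half_width, half_width) for v in values]


def probability_plot_correlation(values):
    """Step 3 (numeric stand-in): correlate sorted data with normal scores."""
    ordered = sorted(values)
    n = len(ordered)
    standard_normal = NormalDist()
    # Approximate normal scores from the plotting positions (i - 3/8) / (n + 1/4).
    scores = [standard_normal.inv_cdf((i - 0.375) / (n + 0.25))
              for i in range(1, n + 1)]
    mean_score, mean_value = mean(scores), mean(ordered)
    sxy = sum((s - mean_score) * (v - mean_value) for s, v in zip(scores, ordered))
    sxx = sum((s - mean_score) ** 2 for s in scores)
    syy = sum((v - mean_value) ** 2 for v in ordered)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(2017)  # fixed seed so the sketch is repeatable

# Made-up stand-in for the article's data: differences rounded to whole pounds.
rounded = [round(rng.gauss(-12, 4)) for _ in range(100)]

# Step 4: repeat to see how much the result depends on the simulated noise.
for iteration in range(1, 4):
    simulated = add_uniform_noise(rounded, rng)
    correlation = probability_plot_correlation(simulated)
    print(f"iteration {iteration}: correlation = {correlation:.4f}")
```

<p>A correlation close to 1 corresponds to plotting points that fall close to a straight line. Because the noise is random, the value changes slightly from one iteration to the next, which is the reason for step 4.</p>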
<p>The resulting graph from one iteration of these steps is shown below. It suggests that the Normal distribution is a good model for the difference in weights for this surgery.</p>
<p><img alt="Probability plot with simulated measurements" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9035bd61788a55da21c2fb4c437a0fc1/probability_plot_simulated_measurements.png" style="width: 550px; height: 369px;" /></p>
Data Analysis, Health Care Quality Improvement, Statistics
Fri, 07 Jul 2017 12:00:00 +0000
http://blog.minitab.com/blog/the-statistical-mentor/what-does-it-mean-when-your-probability-plot-has-clusters
Jim Colton
What Is the Difference between Linear and Nonlinear Equations in Regression Analysis?
http://blog.minitab.com/blog/adventures-in-statistics-2/what-is-the-difference-between-linear-and-nonlinear-equations-in-regression-analysis
<p><img alt="Fourier nonlinear function" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/e9de64f1c065b86f718856ead9a23e57/fourier1.png" style="float: right; width: 180px; height: 193px;" />Previously, I’ve written about <a href="http://blog.minitab.com/blog/adventures-in-statistics/linear-or-nonlinear-regression-that-is-the-question" target="_blank">when to choose nonlinear regression</a> and <a href="http://blog.minitab.com/blog/adventures-in-statistics/curve-fitting-with-linear-and-nonlinear-regression" target="_blank">how to model curvature with both linear and nonlinear regression</a>. Since then, I’ve received several comments expressing confusion about what differentiates nonlinear equations from linear equations. This confusion is understandable because both types can model curves.</p>
<p>So, if it’s not the ability to model a curve, what <em>is</em> the difference between a linear and nonlinear regression equation?</p>
Linear Regression Equations
<p>Linear regression requires a linear model. No surprise, right? But what does that really mean?</p>
<p>A model is linear when each term is either a <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-how-to-interpret-the-constant-y-intercept" target="_blank">constant</a> or the product of a <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/introductory-concepts/basic-concepts/parameters/" target="_blank">parameter</a> and a <a href="http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/regression-and-correlation/regression-models/what-are-response-and-predictor-variables/" target="_blank">predictor variable</a>. A linear equation is constructed by adding the results for each term. This constrains the equation to just one basic form:</p>
<p>Response = constant + parameter * predictor + ... + parameter * predictor</p>
<p>Y = b<sub>0</sub> + b<sub>1</sub>X<sub>1</sub> + b<sub>2</sub>X<sub>2</sub> + ... + b<sub>k</sub>X<sub>k</sub></p>
<p>In statistics, a regression equation (or function) is linear when it is linear in the parameters. While the equation must be linear in the parameters, you can transform the predictor variables in ways that produce curvature. For instance, you can include a squared variable to produce a U-shaped curve.</p>
<p>Y = b<sub>0</sub> + b<sub>1</sub>X<sub>1</sub> + b<sub>2</sub>X<sub>1</sub><sup>2</sup></p>
<p>This model is still linear in the parameters <em>even though the predictor variable is squared</em>. You can also use log and inverse functional forms that are linear in the parameters to produce different types of curves.</p>
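<p>To make this concrete, here is a minimal Python sketch, assuming made-up data and hypothetical helper names (<code>solve</code>, <code>fit_linear_model</code>). It fits a model with a squared term by ordinary least squares. The squared predictor is simply one more column in the design matrix, so the same linear machinery estimates every parameter.</p>

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear_model(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up, noise-free data generated from Y = 2 + 3*X - 0.5*X^2.
x = [0, 1, 2, 3, 4, 5, 6]
y = [2 + 3 * xi - 0.5 * xi ** 2 for xi in x]

# Each row holds the terms that multiply b0, b1 and b2: 1, X and X squared.
design = [[1.0, xi, xi ** 2] for xi in x]

b0, b1, b2 = fit_linear_model(design, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

<p>The fitted curve is a parabola, yet nothing in the estimation step is nonlinear: each parameter multiplies a known column of numbers.</p>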
<p>Here is an example of a linear regression model that uses a squared term to fit the <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-predict-with-minitab-using-bmi-to-predict-the-body-fat-percentage-part-1" target="_blank">curved relationship between BMI and body fat percentage</a>.</p>
<p><img alt="Linear model with squared term" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/6152b3a741106f4e66c3bca3c7df128d/fittedlp_bmi.gif" style="width: 576px; height: 384px;" /></p>
Nonlinear Regression Equations
<p>While a linear equation has one basic form, nonlinear equations can take many different forms. The easiest way to determine whether an equation is nonlinear is to focus on the term “nonlinear” itself. Literally, it’s not linear. If the equation doesn’t meet the criteria above for a linear equation, it’s nonlinear.</p>
<p>That covers many different forms, which is why nonlinear regression provides the most flexible curve-fitting functionality. Here are several examples from <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">Minitab’s</a> nonlinear function catalog. Thetas represent the parameters and X represents the predictor in the nonlinear functions. Unlike linear regression, these functions can have more than one parameter per predictor variable.</p>
<table>
<tr><td><strong>Nonlinear function</strong></td><td><strong>One possible shape</strong></td></tr>
<tr><td>Power (convex): Theta1 * X^Theta2</td><td><img alt="Power function in nonlinear regression" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/093cc20ff4c83278340d5e0ef46dabb3/power.png" style="width: 194px; height: 195px;" /></td></tr>
<tr><td>Weibull growth: Theta1 + (Theta2 - Theta1) * exp(-Theta3 * X^Theta4)</td><td><img alt="Weibull growth function in nonlinear regression" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/b31955558cd15b73be6f7ecf0606df8e/weibull_growth.png" style="width: 194px; height: 193px;" /></td></tr>
<tr><td>Fourier: Theta1 * cos(X + Theta4) + Theta2 * cos(2*X + Theta4) + Theta3</td><td><img alt="Fourier function for nonlinear regression" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/e9de64f1c065b86f718856ead9a23e57/fourier1.png" style="width: 180px; height: 193px;" /></td></tr>
</table>
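<p>One informal way to see the difference is a superposition check: for a model that is linear in its parameters, evaluating it at the sum of two parameter vectors gives the sum of the two separate evaluations. The Python sketch below applies that check to a quadratic linear model and to the three functions listed above. The parameter values, the test points, and the helper <code>is_linear_in_parameters</code> are my own assumptions. A failed check shows that a model is nonlinear in its parameters, while a passed check at one pair of vectors is only supporting evidence, not a proof.</p>

```python
import math


def quadratic(x, b):
    """Linear model with a squared term: b0 + b1*X + b2*X^2."""
    return b[0] + b[1] * x + b[2] * x ** 2


def power(x, t):
    """Power: Theta1 * X^Theta2."""
    return t[0] * x ** t[1]


def weibull_growth(x, t):
    """Weibull growth: Theta1 + (Theta2 - Theta1) * exp(-Theta3 * X^Theta4)."""
    return t[0] + (t[1] - t[0]) * math.exp(-t[2] * x ** t[3])


def fourier(x, t):
    """Fourier: Theta1*cos(X + Theta4) + Theta2*cos(2*X + Theta4) + Theta3."""
    return t[0] * math.cos(x + t[3]) + t[1] * math.cos(2 * x + t[3]) + t[2]


def is_linear_in_parameters(model, pa, pb, xs, tol=1e-9):
    """Superposition check: model(x, pa + pb) == model(x, pa) + model(x, pb)."""
    summed = [a + b for a, b in zip(pa, pb)]
    return all(
        abs(model(x, summed) - (model(x, pa) + model(x, pb))) <= tol
        for x in xs
    )


# Arbitrary illustrative parameter vectors and predictor values.
params_a = [1.0, 2.0, 0.5, 1.5]
params_b = [0.5, 1.0, 0.3, 0.7]
xs = [0.5, 1.0, 2.0, 3.5]

for model in (quadratic, power, weibull_growth, fourier):
    result = is_linear_in_parameters(model, params_a, params_b, xs)
    print(f"{model.__name__}: passes superposition check = {result}")
```

<p>Each model reads only the parameters it needs from the four-element vectors, so the same two vectors can be reused for all four functions.</p>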
<p>Here is an example of a nonlinear regression model of the <a href="http://blog.minitab.com/blog/adventures-in-statistics/linear-or-nonlinear-regression-that-is-the-question" target="_blank">relationship between density and electron mobility</a>.</p>
<p><img alt="Nonlinear regression model for electron mobility" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/54fabed19e3bb42ac0cba35afd56da24/flpnonlinear.gif" style="width: 502px; height: 335px;" /></p>
<p>The nonlinear equation is so long that it doesn't fit on the graph:</p>
<p style="margin-left: 40px;">Mobility = (1288.14 + 1491.08 * Density Ln + 583.238 * Density Ln^2 + 75.4167 * Density Ln^3) / (1 + 0.966295 * Density Ln + 0.397973 * Density Ln^2 + 0.0497273 * Density Ln^3)</p>
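<p>Written as code, the fitted equation is a ratio of two cubic polynomials in Density Ln. The Python function below simply transcribes the coefficients shown above. The function name <code>mobility</code> is my own, and the sketch does not guard against predictor values where the denominator is zero. The parameters in the denominator are what make the model nonlinear: they cannot be rearranged into a sum of parameter-times-predictor terms.</p>

```python
def mobility(density_ln):
    """Evaluate the fitted rational function for a given Density Ln value."""
    x = density_ln
    numerator = 1288.14 + 1491.08 * x + 583.238 * x ** 2 + 75.4167 * x ** 3
    denominator = 1 + 0.966295 * x + 0.397973 * x ** 2 + 0.0497273 * x ** 3
    return numerator / denominator


# At Density Ln = 0 every term except the two constants drops out.
print(mobility(0.0))
```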
<p>Linear and nonlinear regression are actually named after the functional form of the models that each analysis accepts. I hope the distinction between linear and nonlinear equations is clearer and that you understand how it’s possible for linear regression to model curves! It also explains why you’ll see R-squared displayed for some curvilinear models even though <a href="http://blog.minitab.com/blog/adventures-in-statistics/why-is-there-no-r-squared-for-nonlinear-regression" target="_blank">it’s impossible to calculate R-squared for nonlinear regression</a>.</p>
<p>If you're learning about regression, read my <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-tutorial-and-examples" target="_blank">regression tutorial</a>!</p>
Regression Analysis
Thu, 06 Jul 2017 12:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics-2/what-is-the-difference-between-linear-and-nonlinear-equations-in-regression-analysis
Jim Frost