Minitab | Minitab
Blog posts and articles about using Minitab software in quality improvement projects, research, and more.
http://blog.minitab.com/blog/minitab/rss
Sat, 27 Dec 2014 15:45:52 +0000
A Minitab Holiday Tale: Featuring the Two Sample t-Test
http://blog.minitab.com/blog/statistics-in-the-field/a-minitab-holiday-tale-featuring-the-two-sample-t-test
<p><em><span style="line-height: 1.6;">by Matthew Barsalou, guest blogger</span></em></p>
<p>Aaron and Billy are two very competitive—and not always well-behaved—eight-year-old twin brothers. They constantly strive to outdo each other, no matter what the subject. If the boys are given a piece of pie for dessert, they each automatically want to make sure that their own piece of pie is bigger than the other’s piece of pie. This causes much exasperation, aggravation and annoyance for their parents. Especially when it happens in a restaurant (although the restaurant situation has improved, since they have been asked not to return to most local restaurants).</p>
<p><img alt="A bag of coal" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d2ccbe9f7c8e887281272ae49854893f/bag_of_coal.jpg" style="border-width: 1px; border-style: solid; margin: 10px 15px; float: right; width: 200px; height: 200px;" />Sending the boys to their rooms never helped. The two would just compete to see who could stay in their room longer. This Christmas their parents were at their wits' end, and they decided the boys needed to be taught a lesson so they could grow up to be upstanding citizens. Instead of the new bicycles the boys were going to get (and would probably just race till they crashed anyway), their parents decided to give them each a bag of coal.</p>
<p>An astute reader might ask, “But what does this have to do with <a href="http://www.minitab.com/products/minitab">Minitab</a>?” Well, dear reader, the boys need to figure out who got the most coal. Immediately upon opening their packages, the boys carefully weighed each piece of coal and entered the data into Minitab.</p>
<p><span style="line-height: 1.6;">Then they selected <strong>Stat > Basic Statistics > Display Descriptive Statistics</strong> and used the "Statistics" options dialog to select the metrics they wanted, including the sum of the weights they'd entered:</span></p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dacaebac62e3cc4c2e29329d0a779720/descriptivestatistics.png" style="width: 600px; height: 208px;" /></p>
<p><span style="line-height: 1.6;">Billy quickly saw that he had the most coal, and yelled, “I have 279.383 ounces and you only have 272.896 ounces. Our parents must love me more.” </span></p>
<p><span style="line-height: 1.6;">“Not so fast,” said Aaron. “You may have more, but is the difference statistically significant?” There was only one thing left for the boys to do: perform a <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/t-for-2-should-i-use-a-paired-t-or-a-2-sample-t">two sample t-test</a>.</span></p>
<p><span style="line-height: 1.6;">In Minitab, Aaron selected </span><strong><span style="line-height: 1.6;">Stat > Basic Statistics > 2-Sample t…</span></strong></p>
<p>The boys left the default values at a confidence level of 95.0 and a hypothesized difference of 0. The alternative hypothesis was “Difference ≠ hypothesized difference” because the only question they were asking was “Is there a statistically significant difference?” between the two data sets.</p>
<p>The two troublemakers also selected “Graphs” and checked the options to display an individual value plot and a boxplot. They knew they should look at their data. Having the graphs available would also make it easier for them to communicate their results to higher authorities, in this case, their poor parents.</p>
<p><img alt="Individual Value Plot of Coal" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/bf541d8df2461a8edff9060789394b00/individual_value_plot_of_coal.png" style="width: 577px; height: 385px;" /></p>
<p><img alt="Boxplot of Coal" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8945d7a038de654d008f68dc0a8886d3/boxplot_of_coal.png" style="width: 577px; height: 385px;" /></p>
<p>Both the individual value plots and boxplots showed that Aaron's bag of coal had pieces with the highest individual weights. But he also had the pieces with the least weight. So the values for his Christmas coal were scattered across a wider range than the values for Billy’s Christmas coal. But was there really a difference?</p>
<p>Billy went running for his tables of Student’s t-scores so he could interpret the resulting t-value of -0.71. Aaron simply looked at the resulting p-value of 0.481. The p-value was greater than 0.05, so the boys could not conclude there was a true difference in the weight of their Christmas "presents."</p>
<p><img alt="600" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/549762a9cb277536a76baedba32617d3/2_sample_t_test_coal.png" style="width: 683px; height: 305px;" /></p>
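<p>For readers who want to see the arithmetic behind a two-sample t-test, here is a minimal Python sketch. It uses Welch's form of the test, which does not assume equal variances; whether that matches the options the boys left selected is an assumption. The coal weights below are made up for illustration and are not the boys' data, and the function names are hypothetical.</p>

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and |t| (Simpson's rule)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return max(0.0, 1 - 2 * area_zero_to_t)


def welch_t_test(a, b):
    """Return (t, df, p) for a two-sample t-test without assuming equal variances."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df, two_sided_p(t, df)


# Hypothetical coal weights in ounces (NOT the data from the story)
aaron = [10.2, 14.8, 9.1, 16.3, 11.7, 8.4, 15.0, 12.6]
billy = [12.9, 13.4, 12.1, 13.8, 12.5, 13.1, 12.7, 13.6]

t, df, p = welch_t_test(aaron, billy)
print(f"t = {t:.2f}, df = {df:.1f}, p = {p:.3f}")
print("significant at 0.05" if p < 0.05 else "not significant at 0.05")
```

<p>The decision rule is the one Aaron used: if the p-value is above 0.05, the data do not support concluding that the means differ. Because the sketch relies on <code>math.gamma</code>, it suits small samples like these rather than very large degrees of freedom.</p>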
<p><span style="line-height: 1.6;">The boys dutifully reported the results, with illustrative graphs, each demanding that they get a little more to best the other. Clearly, receiving coal for Christmas had done nothing to reduce their level of competitiveness. Their parents realized the boys were probably not going to grow up to be upstanding citizens, but they may at least become good statisticians.</span></p>
<p>Happy Holidays.</p>
<p><strong>About the Guest Blogger</strong></p>
<p><em><a href="https://www.linkedin.com/pub/matthew-barsalou/5b/539/198" target="_blank">Matthew Barsalou</a> is an engineering quality expert in <a href="http://www.3k-warner.de/" target="_blank">BorgWarner</a> Turbo Systems Engineering GmbH’s global engineering excellence department. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with an emphasis in international business, and a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is the author of the books <a href="http://www.amazon.com/Root-Cause-Analysis-Step---Step/dp/148225879X/ref=sr_1_1?ie=UTF8&qid=1416937278&sr=8-1&keywords=Root+Cause+Analysis%3A+A+Step-By-Step+Guide+to+Using+the+Right+Tool+at+the+Right+Time" target="_blank">Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time</a>, <a href="http://asq.org/quality-press/display-item/index.html?item=H1472" target="_blank">Statistics for Six Sigma Black Belts</a> and <a href="http://asq.org/quality-press/display-item/index.html?item=H1473&xvl=76115763" target="_blank">The ASQ Pocket Guide to Statistics for Six Sigma Black Belts</a>.</em></p>
Fun Statistics, Hypothesis Testing, Statistics
Tue, 23 Dec 2014 13:00:00 +0000
Guest Blogger

Understanding Qualitative, Quantitative, Attribute, Discrete, and Continuous Data Types
http://blog.minitab.com/blog/understanding-statistics/understanding-qualitative-quantitative-attribute-discrete-and-continuous-data-types
<p>"Data! Data! Data! I can't make bricks without clay."<br />
— Sherlock Holmes, in Arthur Conan Doyle's <em>The Adventure of the Copper Beeches</em></p>
<p>Whether you're the world's greatest detective trying to crack a case or a person trying to solve a problem at work, you're going to need information. Facts. <em>Data</em>, as Sherlock Holmes says. </p>
<p><img alt="jujubes" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/96d7c87addccc11b6072d6dfa38d0039/jujubes.jpg" style="line-height: 20.7999992370605px; margin: 10px 15px; float: right; width: 200px; height: 200px;" /></p>
<p>But not all data is created equal, especially if you plan to analyze it as part of a quality improvement project.</p>
<p>If you're using Minitab Statistical Software, you can access the Assistant to <a href="http://www.minitab.com/products/minitab/assistant">guide you through your analysis step-by-step</a>, and help identify the type of data you have.</p>
<p>But it's still important to have at least a basic understanding of the different types of data, and the kinds of questions you can use them to answer. </p>
<p>In this post, I'll provide a basic overview of the types of data you're likely to encounter, and we'll use a box of my favorite candy—<a href="http://en.wikipedia.org/wiki/Jujube_(confectionery)" target="_blank">Jujubes</a>—to illustrate how we can gather these different kinds of data, and what types of analysis we might use it for. </p>
The Two Main Flavors of Data: Qualitative and Quantitative
<p>At the highest level, two kinds of data exist: <em><strong>quantitative</strong></em> and <em><strong>qualitative</strong></em>.</p>
<p><strong><em>Quantitative</em> </strong>data deals with numbers and things you can measure objectively: dimensions such as height, width, and length. Temperature and humidity. Prices. Area and volume.</p>
<p><strong><em>Qualitative </em></strong>data deals with characteristics and descriptors that can't be easily measured, but can be observed subjectively—such as smells, tastes, textures, attractiveness, and color. </p>
<p>Broadly speaking, when you measure something and give it a number value, you create quantitative data. When you classify or judge something, you create qualitative data. So far, so good. But this is just the highest level of data: there are also different types of quantitative and qualitative data.</p>
Quantitative Flavors: Continuous Data and Discrete Data
<p>There are two types of quantitative data, which is also referred to as numeric data: <em><strong>continuous</strong></em> and <em><strong>discrete</strong></em>. As a general rule, <em>counts</em> are discrete and <em>measurements</em> are continuous.</p>
<p><strong><em>Discrete </em></strong>data is a count that can't be made more precise. Typically it involves integers. For instance, the number of children (or adults, or pets) in your family is discrete data, because you are counting whole, indivisible entities: you can't have 2.5 kids, or 1.3 pets.</p>
<p><strong><em>Continuous</em> </strong>data, on the other hand, could be divided and reduced to finer and finer levels. For example, you can measure the height of your kids at progressively more precise scales—meters, centimeters, millimeters, and beyond—so height is continuous data.</p>
<p>If I tally the number of individual Jujubes in a box, that number is a piece of discrete data.</p>
<p style="margin-left: 40px;"><img alt="a count of jujubes is discrete data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f5e3c44269356903cf156c065b10746a/jujubes_count_tally.jpg" style="width: 200px; height: 200px;" /></p>
<p><span style="line-height: 1.6;">If I use a scale to measure the weight of each Jujube, or the weight of the entire box, that's continuous data. </span></p>
<p style="margin-left: 40px;"><span style="line-height: 1.6;"><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d11051162c9e2375e531ac589fd5a20e/jujube_weight_continuous_data.jpg" style="width: 200px; height: 200px;" /></span></p>
<p>Continuous data can be used in many different kinds of <a href="http://blog.minitab.com/blog/understanding-statistics/what-statistical-hypothesis-test-should-i-use">hypothesis tests</a>. For example, to assess the accuracy of the weight printed on the Jujubes box, we could measure 30 boxes and perform a 1-sample t-test. </p>
<p>Some analyses use continuous and discrete quantitative data at the same time. For instance, we could perform a <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-tutorial-and-examples">regression analysis</a> to see if the weight of Jujube boxes (continuous data) is correlated with the number of Jujubes inside (discrete data). </p>
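<p>As a rough illustration of the regression idea, the Python sketch below fits a least-squares line and computes the Pearson correlation between a discrete count and a continuous weight. The box counts and weights are invented for the example, and <code>fit_line</code> is a hypothetical helper, not a Minitab function.</p>

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares slope and intercept of y on x, plus the Pearson correlation r."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Hypothetical data: Jujubes per box (discrete) and box weight in ounces (continuous)
jujube_counts = [58, 60, 61, 63, 65]
box_weights = [5.4, 5.5, 5.6, 5.8, 5.9]

slope, intercept, r = fit_line(jujube_counts, box_weights)
print(f"weight = {intercept:.2f} + {slope:.3f} * count, r = {r:.3f}")
```

<p>A value of r near 1 would indicate that heavier boxes tend to hold more pieces; a full regression analysis would also report whether the slope is statistically significant.</p>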
Qualitative Flavors: Binary Data, Nominal Data, and Ordinal Data
<p>When you classify or categorize something, you create <em>qualitative</em> or <em>attribute</em> data. There are three main kinds of qualitative data.</p>
<p><em><strong>Binary </strong></em>data place things in one of two mutually exclusive categories: right/wrong, true/false, or accept/reject. </p>
<p>Occasionally, I'll get a box of Jujubes that contains a couple of individual pieces that are either too hard or too dry. If I went through the box and classified each piece as "Good" or "Bad," that would be binary data. I could use this kind of data to develop a statistical model to predict how frequently I can expect to get a bad Jujube.</p>
<p>When collecting <em><strong>unordered </strong></em>or <em><strong>nominal </strong></em>data, we assign individual items to named categories that do not have an implicit or natural value or rank. If I went through a box of Jujubes and recorded the color of each in my worksheet, that would be nominal data. </p>
<p style="margin-left: 40px;"><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ce64d648ac395d5c8098985caabc754f/jujubes_sorted_nominal_data.jpg" style="width: 200px; height: 97px;" /></p>
<p>This kind of data can be used in many different ways. For instance, I could use <a href="http://blog.minitab.com/blog/understanding-statistics/chi-square-analysis-of-halloween-and-friday-the-13th-is-there-a-slasher-movie-gender-gap">chi-square analysis</a> to see if there are statistically significant differences in the amounts of each color in a box. </p>
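<p>To make the chi-square idea concrete, here is a small Python sketch of a goodness-of-fit statistic that compares observed color counts against the assumption that every color is equally likely. The colors and counts are hypothetical, and the equal-proportions null hypothesis is an assumption made for the example.</p>

```python
def chi_square_gof(observed):
    """Chi-square goodness-of-fit statistic against equal expected counts."""
    total = sum(observed.values())
    expected = total / len(observed)  # same expected count for every category
    statistic = sum((count - expected) ** 2 / expected for count in observed.values())
    degrees_of_freedom = len(observed) - 1
    return statistic, degrees_of_freedom


# Hypothetical color counts for one box (nominal data)
color_counts = {"red": 12, "green": 9, "yellow": 10, "orange": 8, "violet": 11}

statistic, df = chi_square_gof(color_counts)
CRITICAL_VALUE_DF4 = 9.488  # upper 5% point of the chi-square distribution, 4 df

print(f"chi-square = {statistic:.2f} with {df} degrees of freedom")
print("colors differ" if statistic > CRITICAL_VALUE_DF4 else "no evidence of a difference")
```

<p>With these made-up counts the statistic is 1.00, far below the critical value, so there would be no evidence that the colors appear in unequal amounts.</p>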
<p>We also can have <strong><em>ordered </em></strong>or <em><strong>ordinal </strong></em>data, in which items are assigned to categories that do have some kind of implicit or natural order, such as "Short, Medium, or Tall." <span style="line-height: 1.6;">Another example is a survey question that asks us to rate an item on a 1 to 10 scale, with 10 being the best. This implies that 10 is better than 9, which is better than 8, and so on. </span></p>
<p>The use of ordered data is a matter of some debate among statisticians. Everyone agrees it's appropriate for creating bar charts, but beyond that the answer to the question "What should I do with my ordinal data?" is "It depends." Here's a post from another blog that offers an excellent summary of the <a href="http://learnandteachstatistics.wordpress.com/2013/07/08/ordinal/" target="_blank">considerations involved</a>. </p>
Additional Resources about Data and Distributions
<p>For more fun statistics you can do with candy, check out this article (PDF format): <a href="http://www.minitab.com/uploadedFiles/Content/Academic/sweetening_statistics.pdf">Statistical Concepts: What M&M's Can Teach Us.</a> </p>
<p>For a deeper exploration of the probability distributions that apply to different types of data, check out my colleague Jim Frost's posts about <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-and-using-discrete-distributions">understanding and using discrete distributions</a> and <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-identify-the-distribution-of-your-data-using-minitab">how to identify the distribution of your data</a>.</p>
Fun Statistics, Learning, Statistics Help
Fri, 19 Dec 2014 13:00:00 +0000
Eston Martz

An Unauthorized Biography of the Stem-And-Leaf Plot - Part II: A New Leaf
http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/an-unauthorized-biography-of-the-stem-and-leaf-plot-part-ii%3A-a-new-leaf
<p>At the end of <a href="http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/an-unauthorized-biography-of-the-stem-and-leaf-plot2c-part-i3a-a-stem-by-any-other-name">my previous post</a>, aspiring statisticians Woodrow "Woody" Stem and August "Russell" Leaf, creators of the famed Stem-and-Leaf plot, were in bad shape. They had beaten each other statsless after an argument about the challenge given to them by their mentor, Dr. Histeaux Graham. That challenge: to devise a simple, yet elegant way to examine the distribution of values in a sample.</p>
<p>After their fateful bout of pugilism, Woody convalesced at the renowned <em>Saint Tukey Center for Post Hoc Health</em>. There, he continued to refine his theory of equally spaced intervals (or "bins," as he did eventually resort to calling them). His approach was great for evaluating the <em>range</em> of the sample—the proverbial minimum and maximum. But alas! How could he examine the values betwixt these values!?! The problem vexed him greatly.</p>
<p>Russell sought care from the Gaussian order of the Brothers of Functional Likelihood. Their fabled monastery provided a quiet place for Russell to rest and think. And the brothers of the order also brewed a mean stout! (Their porter...meh. But the stout was absolutely to recover for!) In the solitude and the dampitude of the large stone structure, Russell whiled away the days dipping his quill, creating random samples, sorting them, and meticulously copying the values onto artisanal parchment. (To this day, Russell's data sets are highly prized for their marginally ornate illustrations. [Or is that 'ornate margin illustrations'? I'm not sure. I'd have to look it up. So let's go with 'marginally ornate.'])</p>
<p><img alt="" src="https://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/50298e03565c09daa8cec9928f91d0df/snplot_artistic_rendering_mini.jpg" style="float: left; width: 76px; height: 117px;" /></p>
<p>Russell bemused himself with art, he abused himself with stout, but he never disabused himself of one central notion: In order to understand a sample, one needed to know each value therein. Russell's methods did indeed give him a deep understanding of each datum, but alas! How could he shine light on the nature of the distribution: the clumps, the gaps, the peaks, the tails, and, perchance, the outliers!?! The problem vexed him greatly. Until that fateful day...</p>
An Almighty Wind
<p>It was a cloudy afternoon, sometime after both protagonists had regained the power of ambulation. Woody, still weak from his prolonged requiescence, sought the rejuvenation one can only derive from abundant fresh air. Strolling the local park, his head immersed in a dense cloud of smoke from his trusty briar, Woody chanced to glimpse a lone figure on a bench. The man appeared to be creating random samples, sorting them, and meticulously copying the values onto artisanal parchment. "What a dolt," Woody thought, and decided to go over and give the man a piece of his mind. And, in a way, that's just what he did.</p>
<p>Historians differ on exactly what happened next. Some say that as the two men—once friends, now bitter rivals—met each other's gaze, the clouds and smoke parted and a blinding shaft of light engulfed them both. (Mind you, this was before sunglasses.) The men froze, slack-jawed at the spectacle. A sudden gust wrested the pipe from Woody's weak fingers. The pipe flew up in the draft and struck first Woody, then Russell, thwacking each upside the head. It was as if God himself had grown weary of their stubbornness and had reached down from the heavens to give each of his beloved, but wayward children a dope slap. (I'm not sure what the other historians say about this event. I'd have to look it up, so let's go with dope slap.)</p>
<p><img alt="" src="https://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/d4367867f15d0e005549fb400af5e71a/snplot_from_minitab.jpg" style="float: right; width: 353px; height: 318px; margin: 10px 15px;" /></p>
<p>As the memory of their past friendship slowly returned to our heroes, each was suddenly able to look beyond his own foibles and prejudice to grasp the value-added synergies that might be realized from collaboration and cooperation, vis-à-vis cheating on their assignment. To this day, the creation that was inspired in that singular moment still bears the names of its creators.</p>
<p>(Some historians have also claimed that in the moment of epiphany, Woody's ever-present chocolate snack tumbled from his hand and came to rest in a bowl filled with a peanut-based butter substitute that Russell was known to enjoy. The result was said to have been quite tasty. However, these claims have never been substantiated.)</p>
<p>Amazing as this story is, some today have never heard of the Stem-and-Leaf plot. You see, Dr. Graham eventually appropriated (read 'stole') the ideas of his charges and created his own graph which he named after himself: Dr. Histeaux Graham's Magic Distribution Tonic. The name was later shortened to Histeaux Graham's Plot, and finally Histogram. (It was also known for a time by other names, such as Histeaux's Odyssey, the Graham Tracker, and That One With The Bars.)</p>
<p><u>Example of Dr. Graham's Tracker Plot (a.k.a. Histogram)</u></p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/5f7bd484a6fbafbe931ff7f37363b05c/grahamtracker.jpg" style="width: 289px; height: 193px;" /></p>
<p>Eventually, with the advent of plotting machines, computers, and later the interwebs, the venerable Stem-and-Leaf plot fell into widespread disuse. It's just too easy these days to <a href="http://support.minitab.com/minitab/17/topic-library/basic-statistics-and-graphs/graphs/graphs-of-distributions/histograms/create-a-histogram-with-frequency-data/">create a histogram</a>. And many practitioners feel the histogram connotes a more sophisticated esthetic than does the cruder, but way-easier-to-make-by-hand Stem-and-Leaf plot.</p>
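<p>For the curious, here is a minimal Python sketch of how a stem-and-leaf plot can be built. It combines Woody's equally spaced bins (the stems) with Russell's insistence on keeping every value (the leaves). It assumes a non-empty list of non-negative integers with the last digit as the leaf; the function name and sample values are made up.</p>

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf plot for non-negative integers."""
    leaves = defaultdict(list)
    for value in sorted(values):
        leaves[value // 10].append(value % 10)  # stem = tens and up, leaf = last digit

    lowest, highest = min(leaves), max(leaves)
    width = len(str(highest))
    lines = []
    # Include empty stems so that gaps in the distribution stay visible
    for stem in range(lowest, highest + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{str(stem).rjust(width)} | {row}".rstrip())
    return lines


# Hypothetical sample
sample = [12, 15, 21, 21, 34, 47, 48, 49]
print("\n".join(stem_and_leaf(sample)))
```

<p>Each row works like a histogram bar turned on its side, except that the individual values can still be read back from the digits.</p>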
<p>But don't feel too bad for Dr. Stem and Dr. Leaf, gentle readers. For at least their crowning achievement is still known and revered, if only in statistical circles. </p>
Fun Statistics
Wed, 17 Dec 2014 13:00:00 +0000
Greg Fox

How to Calculate B10 Life with Statistical Software
http://blog.minitab.com/blog/meredith-griffith/how-to-calculate-b10-life-with-statistical-software
<p><span style="line-height: 1.6;">Over the last year or so I’ve heard a lot of people asking, “How can I calculate B10 life in Minitab?” Despite being a statistician and industrial engineer (mind you, one who has never been </span><em style="line-height: 1.6;">in</em><span style="line-height: 1.6;"> the field like the customers asking this question) and having taken a reliability engineering course, I’d never heard of B10 life. So I did some research.</span></p>
<p>The B10 life metric originated in the ball and roller bearing industry, but has become a metric used across a variety of industries today. It’s particularly useful in establishing warranty periods for a product. The “BX” or “Bearing Life” nomenclature, which refers to the time at which X% of items in a population will fail, speaks to these roots.</p>
<p>So then, B10 life is the time at which 10% of units in a population will fail. Alternatively, you can think of it as the 90% reliability of a population at a specific point in its lifetime—or the point in time when an item has a 90% probability of survival. The B10 life metric became popular among ball and roller bearing makers due to the industry’s strict requirement that no more than 10% of bearings in a given batch fail by a specific time due to fatigue failure. </p>
<p>Now that I know what the term means, I can tell people who ask that <a href="http://blog.minitab.com/blog/fun-with-statistics/what-i-learned-from-treating-childbirth-as-a-failure">Minitab’s reliability analysis</a> can easily compute this metric. (In fact, our <a href="http://www.minitab.com/products/minitab">statistical software</a> can compute any “BX” lifetime—but we’ll save that for another blog post.) B10 life is also known as the 10th percentile and can be found in Minitab’s Table of Percentiles output, which is displayed in Minitab’s session window.</p>
<p><img alt="B10 Life - Table of Percentiles" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1ac7bbfe20e1a18c284babde45ce84af/b10life_image1.png" style="width: 461px; height: 324px;" /></p>
<p>And unlike other reliability metrics, B10 life directly correlates the maximum allowable percentile of failures (or the minimum allowable reliability) with an application-specific life point in time.</p>
<p>So we can get the B10 life metric by looking at the Table of Percentiles in Minitab’s session window output. But you might still be asking two questions: how do I create this table, and how do I interpret it?</p>
<img alt="You can't just put one of these into a pacemaker, after all! " src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6c60a30de1566a4cc65dbb03c730680e/batteries.jpg" style="border-width: 1px; border-style: solid; margin: 10px 15px; float: right; width: 180px; height: 180px;" />
Finding B10 Life, Step by Step
<p>Suppose we have tracked and recorded the battery lifetimes over a certain number of years for 1,970 pacemakers. The reliability of pacemakers is critical, because patients’ lives depend on these devices!</p>
<p>We observed exact failure times—defined as the time at which a low battery signal was detected—for 1,019 of those pacemakers. The remaining 951 pacemakers never warned of a low battery, so they “survived.”</p>
<p>Our data is organized as follows:</p>
<p><img alt="B10 Life - Data Organization" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/90b7e4084faeee5bbe82a2abd7ff7c7e/b10life_image2.png" style="width: 183px; height: 264px;" /></p>
<p>When we have both observed failures and units surviving beyond a given time, we call the data “right-censored.” And we know from process knowledge that the <a href="http://blog.minitab.com/blog/understanding-statistics/why-the-weibull-distribution-is-always-welcome">Weibull distribution</a> best describes the lifetime of these pacemaker batteries. Knowing this information will help us use Minitab’s reliability analysis correctly.</p>
Setting Up the Reliability Analysis
<p>Because we have right-censored data and we know our distribution, we are ready to access Minitab’s <strong>Stat > Reliability/Survival > Distribution Analysis (Right Censoring) > Parametric Distribution Analysis </strong>menu to compute the B10 life.</p>
<p>We want to know the batteries’ reliability (their probability of survival) at different times, so our variable of interest is the number of years a pacemaker battery has survived. In the Parametric Distribution Analysis dialog, you’ll notice the Weibull distribution is already selected as the assumed distribution. We’ll leave this default setting since we know the Weibull distribution best describes battery lifetimes.</p>
<p><img alt="B10 Life Metric - Right Censoring" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/646dd8ee5563d9748c90545d8f0a9fa0/b_10_life_image_3.png" style="width: 507px; height: 345px;" /></p>
<p>We also know whether the number in the ‘Years’ column was an exact failure time or a censored time (beyond which the battery survived). We must account for the censored data. By clicking the button labeled ‘Censor’, we can include a censoring column that contains values indicating whether the pacemaker survived or failed at the recorded time. In our Minitab worksheet, “Failed or Survived” is the censoring column. Our censoring value is ‘S’, which stands for ‘Survived’, indicating no failure was observed during the pacemaker battery tracking period.</p>
<p><img alt="B10 Life - censoring column" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6042f1f9401bec3fe3c11ac335dfb834/b_10_life_image_4.png" style="width: 426px; height: 313px;" /></p>
Interpreting the Table of Percentiles and B10 Life
<p>Once we click OK through all dialogs to carry out the analysis, Minitab outputs the Table of Percentiles, where we can find our B10 life:</p>
<p> <img alt="B10 Life - Corresponding Percentile" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2394c488f49f9f0e2615ae743926cde9/b_10_life_image_5a.png" style="width: 191px; height: 240px;" /></p>
<p>Where the Percent column displays 10, the corresponding Percentile value tells us that the B10 life of pacemaker batteries is 6.36 years—or, to put it another way, 6.36 years is the time at which 10% of the population of pacemaker batteries will fail.</p>
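<p>Once a Weibull distribution has been fitted, any “BX” life is simply a percentile of that distribution. The Python sketch below shows the relationship, assuming the usual two-parameter Weibull with a shape and a scale parameter. The parameter values are hypothetical: they are not the estimates from the pacemaker analysis, and the function names are invented for illustration.</p>

```python
import math


def weibull_bx_life(shape, scale, x_percent=10.0):
    """Time by which x_percent of units are expected to fail (two-parameter Weibull)."""
    fraction_failed = x_percent / 100.0
    return scale * (-math.log(1.0 - fraction_failed)) ** (1.0 / shape)


def weibull_reliability(t, shape, scale):
    """Probability that a unit survives beyond time t."""
    return math.exp(-((t / scale) ** shape))


# Hypothetical fitted parameters (NOT the values from the pacemaker example)
shape, scale = 2.0, 10.0

b10 = weibull_bx_life(shape, scale)
print(f"B10 life: {b10:.2f} years")
print(f"Reliability at B10: {weibull_reliability(b10, shape, scale):.2f}")
```

<p>Evaluating the reliability at the B10 life returns 0.90, which is the "90% probability of survival" reading of the same number. Changing <code>x_percent</code> gives other BX lives, such as B5 or B50.</p>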
<p>There we have it! The next time you are looking to compute the B10 life of a product, and perhaps seeking to establish suitable warranty periods, you need look no further than Minitab’s reliability tools and the Table of Percentiles.</p>
Reliability Analysis, Statistics
Mon, 15 Dec 2014 13:00:00 +0000
Meredith Griffith

Analyzing the Final College Football Playoff Poll
http://blog.minitab.com/blog/the-statistics-game/analyzing-the-final-college-football-playoff-poll
<p><img alt="College Football Playoff" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/8ac74acf42052d068b6cd0eeec32f609/cfb_playoff.jpg" style="width: 259px; height: 194px; float: right;" />Throughout the college football season, I’ve been looking at the influence of the preseason AP Poll on rankings later in the season. Each analysis found a positive association between preseason rankings and the current rankings. That is, between top-ranked teams with a similar number of losses, teams ranked higher in the preseason are also ranked higher in current polls. The biggest exception is SEC teams, who were able to consistently jump over non-SEC teams who ranked higher in the preseason.</p>
<p>Now that we have the final college football playoff poll, let’s do one more analysis to see if the final rankings correlate with the preseason AP poll.</p>
The Top 6 Teams
<p>We’ll look at the top 6 teams that were vying for a playoff spot: Alabama, Oregon, Florida State, Ohio State, TCU, and Baylor. First we’ll look at an <a href="http://blog.minitab.com/blog/real-world-quality-improvement/three-ways-individual-value-plots-can-help-you-analyze-data">individual value plot</a> showing each team’s preseason rank versus their final rank.</p>
<p><img alt="Individual Value Plot" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/0f7d9e5c2d05a2c7f9f8ee1451430d13/individual_value_plot_of_top_final.jpg" style="width: 720px; height: 480px;" /></p>
<p>The only change from the preseason rankings is that Alabama and Oregon jumped ahead of Florida State. Other than that the final ranking of the teams is exactly the same as it was in the preseason. In addition, the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/tables/other-statistics-and-tests/what-are-spearman-s-rho-and-pearson-s-r-for-ordinal-categories/">correlation coefficient</a> is 0.83 and there are 13 <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/tables/data-and-table-layouts/what-are-concordant-and-discordant-pairs/">concordant pairs</a> to only 2 discordant pairs. These statistics further show that teams ranked higher in the preseason will also be ranked higher in the final poll. </p>
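<p>These two statistics can be checked by hand. The Python sketch below assumes we only use the relative order of the six teams within this group (1 = highest), as described in the paragraph above, rather than their actual poll numbers. The helper names are mine, not Minitab's.</p>

```python
from itertools import combinations

# Relative order within the group of six teams, per the description above.
preseason = {"Florida State": 1, "Alabama": 2, "Oregon": 3,
             "Ohio State": 4, "Baylor": 5, "TCU": 6}
final = {"Alabama": 1, "Oregon": 2, "Florida State": 3,
         "Ohio State": 4, "Baylor": 5, "TCU": 6}

def spearman_rho(x, y):
    """Spearman's rho for two rankings of the same items with no ties."""
    n = len(x)
    d_squared = sum((x[team] - y[team]) ** 2 for team in x)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

def count_pairs(x, y):
    """Count pairs of items ordered the same way (concordant) or oppositely (discordant)."""
    concordant = discordant = 0
    for a, b in combinations(x, 2):
        sign = (x[a] - x[b]) * (y[a] - y[b])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return concordant, discordant

rho = spearman_rho(preseason, final)
concordant, discordant = count_pairs(preseason, final)
print(round(rho, 2), concordant, discordant)  # 0.83 13 2
```

<p>With 6 teams there are 15 possible pairs. The only two discordant pairs are Florida State versus Alabama and Florida State versus Oregon.</p>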
<p><img alt="Cross Tabulation" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/0cb8814207722a349d34d89b7aad5679/correlation.jpg" style="width: 688px; height: 250px;" /></p>
<p>But to the committee’s credit, they did drop Florida State. This was unprecedented since Florida State went undefeated and Alabama and Oregon both lost a game. But all season long, Florida State has played close games against average teams. And whether you’re winning or losing, playing so many close games against average competition means you yourself are an average team. If this were the BCS era, the championship game would be between Florida State and Alabama, and we would all be howling about how Oregon got left out. But for most of the season, Oregon and Alabama have looked like the two best teams, and both will get their chance to show it on the field.</p>
<p>But are Alabama and Oregon <em>really</em> the two best teams, or is it simply confirmation bias? Before the season it was clear that pollsters (and I’m sure fans, too) thought Alabama and Oregon were two of the best teams. So by winning they simply confirmed a belief we already had. Imagine if Arizona had gone 12-1 and won the Pac-12 instead of Oregon, and Oklahoma went 11-1 and won the Big 12 instead of TCU or Baylor. Oklahoma started the year ranked #4 while Arizona was unranked. Any doubt the Pac-12 would have been the conference left out of the playoff in that scenario?</p>
<p>The college football playoff committee did set a few new precedents. As I noted, they dropped an undefeated major conference team behind one-loss teams for what has to be the first time ever. (I found it ironic that soon after the playoff committee did this, the AP and Coaches Polls did the same thing. They would have never done that before.) The committee also showed they wouldn’t be locked in to the prior week’s poll. In previous seasons, a team would only drop late in the season after a loss. But the fact that TCU dropped from #3 to #6 in the final two polls shows the committee isn’t locked into last week’s thinking. Whether you agree with the decision to drop TCU or not, this new way of thinking is refreshing. (And for what it's worth, TCU, the <a href="http://www.usatoday.com/sports/ncaaf/sagarin/">Sagarin Ratings</a> still have you at #2.)</p>
<p>But when it comes to the top-ranked teams, it’s still better to confirm our preseason expectation that you’re one of the best teams than to have to overturn a preseason expectation that you’re not. So Baylor, next year make sure to hire the PR firm <em>before</em> the season starts!</p>
Fun Statistics
Statistics in the News
Fri, 12 Dec 2014 13:00:00 +0000
http://blog.minitab.com/blog/the-statistics-game/analyzing-the-final-college-football-playoff-poll
Kevin Rudy
Which Is Better, Stepwise Regression or Best Subsets Regression?
http://blog.minitab.com/blog/adventures-in-statistics/which-is-better%2C-stepwise-regression-or-best-subsets-regression
<p>Stepwise regression and best subsets regression are both automatic tools that help you identify useful predictors during the exploratory stages of model building for linear regression. These two procedures use different methods and present you with different output.</p>
<p>An obvious question arises. Does one procedure pick the true model more often than the other? I’ll tackle that question in this post.</p>
<p><img alt="Sign: which way?" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/6a52ca2a2c3220b54af553ef4b571389/dscn8976_w1024.jpeg" style="line-height: 20.7999992370605px; float: right; width: 225px; height: 174px; border-width: 1px; border-style: solid; margin: 10px 15px;" /></p>
<p>First, a quick refresher about the two procedures and their different results:</p>
<ul>
<li>Stepwise regression presents you with a single model constructed using the <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-interpret-regression-analysis-results-p-values-and-coefficients" target="_blank">p-values</a> of the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/regression-and-correlation/regression-models/what-are-response-and-predictor-variables/" target="_blank">predictor variables</a></li>
<li>Best subsets regression assesses all possible models and displays a subset along with their <a href="http://blog.minitab.com/blog/adventures-in-statistics/multiple-regession-analysis-use-adjusted-r-squared-and-predicted-r-squared-to-include-the-correct-number-of-variables" target="_blank">adjusted R-squared</a> and Mallows’ Cp values</li>
</ul>
<p>The key benefit of the stepwise procedure is the simplicity of the single model. Best subsets does not pick a final model for you but it does present you with multiple models and information to help you choose the final model. For more details, read this post where I <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-smackdown-stepwise-versus-best-subsets" target="_blank">compare stepwise regression to best subsets regression</a> and present examples using both analyses.</p>
Determining the Better Model Selection Method
<p>A study by Olejnik, Mills, and Keselman* compares how often stepwise regression, best subsets regression using the lowest Mallows’ Cp, and best subsets regression using the highest adjusted R-squared select the true model.</p>
<p>The authors assessed 32 conditions that differed by the number of candidate variables, number of authentic variables, sample size, and level of <a href="http://blog.minitab.com/blog/adventures-in-statistics/what-are-the-effects-of-multicollinearity-and-when-can-i-ignore-them" target="_blank">multicollinearity</a>. For each condition, the authors created 1,000 computer-generated datasets and analyzed them with both stepwise and best subsets to determine how often each procedure selected the correct model.</p>
<p>And, the winner is...stepwise regression!! Congratulations! Well, sort of, as we’ll see.</p>
<p>Best subsets regression using the lowest Mallows’ Cp is a very close second. The overall difference between Mallows’ Cp and stepwise selection is less than 3%. The adjusted R-squared criterion performed much worse than either stepwise or Mallows’ Cp.</p>
<p>However, before we pop open the champagne to celebrate stepwise regression’s victory, there’s a huge caveat to reveal. </p>
<p>Stepwise selection <span style="line-height: 20.7999992370605px;">usually </span><span style="line-height: 1.6;">did not identify the <em>correct</em> model. Gasp! </span></p>
Digging into the Results
<p>Let’s look at the results more closely to see how well stepwise selection performs and what affects its performance. I’ll only cover stepwise selection, but the results for Mallows’ Cp are essentially tied and follow the same patterns. I’ll give my thoughts on the matter at the end.</p>
<p>In the results below, stepwise regression identifies the correct model <em>if</em> it selects all of the authentic predictors and excludes all of the noise predictors.</p>
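<p>To make that scoring rule concrete, here is a small Python sketch. The predictor names and the list of selections are invented for illustration; they are not data from the study.</p>

```python
def is_correct_model(selected, authentic):
    """True only if every authentic predictor is selected and no noise predictor is."""
    return set(selected) == set(authentic)

def percent_correct(selections, authentic):
    """Percentage of simulated datasets for which the procedure picked the true model."""
    hits = sum(is_correct_model(s, authentic) for s in selections)
    return 100.0 * hits / len(selections)

# Hypothetical example: X1 and X2 are authentic, X3 is noise.
authentic = {"X1", "X2"}
selections = [
    {"X1", "X2"},        # correct
    {"X1"},              # missed an authentic predictor
    {"X1", "X2", "X3"},  # included a noise predictor
    {"X2", "X1"},        # correct (order does not matter)
]
pct = percent_correct(selections, authentic)
print(pct)  # 50.0
```

<p>Note how strict the rule is: a model that finds every authentic predictor but also lets in a single noise predictor still counts as a miss.</p>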
<p><strong>Best case scenario</strong></p>
<p>In the study, stepwise regression performs the best when there are four candidate variables, three of which are authentic; there is zero correlation between the predictors; and there is an extra-large sample size of 500 observations. For this case, the stepwise procedure selects the correct model 84% of the time. Unfortunately, this is not a realistic scenario and the accuracy diminishes from here.</p>
<p><strong>Number of candidate predictors and number of authentic predictors</strong></p>
<p>The study looks at scenarios where there are either 4 or 8 candidate predictors. It is harder to choose the correct model when there are more candidates simply because there are more possible models to choose from. The same pattern holds true for the number of authentic predictors.</p>
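<p>A quick way to see why more candidates make the job harder is to count the possible models. The Python sketch below enumerates every subset of a set of candidate predictors; the predictor names are placeholders, and counting the empty (intercept-only) model is my own convention here.</p>

```python
from itertools import combinations

def candidate_models(predictors):
    """Return every subset of the predictors, including the empty (intercept-only) set."""
    models = []
    for size in range(len(predictors) + 1):
        models.extend(combinations(predictors, size))
    return models

four = candidate_models(["X1", "X2", "X3", "X4"])
eight = candidate_models([f"X{i}" for i in range(1, 9)])
print(len(four), len(eight))  # 16 256
```

<p>Doubling the number of candidates from 4 to 8 multiplies the number of possible models by 16, and only one of those models is the true one.</p>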
<p>The table below shows the results for models with no multicollinearity and a good sample size (100-120 observations). Notice the decrease in the percent correct as both the number of candidates and number of authentic predictors increase.</p>
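<table>
<tr><th>Candidate predictors</th><th>Authentic predictors</th><th>% Correct model</th></tr>
<tr><td>4</td><td>1</td><td>62.7</td></tr>
<tr><td>4</td><td>2</td><td>54.3</td></tr>
<tr><td>4</td><td>3</td><td>34.4</td></tr>
<tr><td>8</td><td>2</td><td>31.3</td></tr>
<tr><td>8</td><td>4</td><td>12.7</td></tr>
<tr><td>8</td><td>6</td><td>1.1</td></tr>
</table>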
<p><strong>Multicollinearity</strong></p>
<p>The study varies multicollinearity to determine how correlated predictors affect the ability of stepwise regression to choose the correct model. When predictors are correlated, it’s harder to determine the individual effect each one has on the response variable. The study set the correlation between predictors to 0, 0.2, and 0.6.</p>
<p>The table below shows the results for models with a good sample size (100-120 observations). As correlation increases, the percent correct decreases.</p>
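<table>
<tr><th>Candidate predictors</th><th>Authentic predictors</th><th>Correlation</th><th>% Correct model</th></tr>
<tr><td>4</td><td>2</td><td>0.0</td><td>54.3</td></tr>
<tr><td>4</td><td>2</td><td>0.2</td><td>43.1</td></tr>
<tr><td>4</td><td>2</td><td>0.6</td><td>15.7</td></tr>
<tr><td>8</td><td>4</td><td>0.0</td><td>12.7</td></tr>
<tr><td>8</td><td>4</td><td>0.2</td><td>1.0</td></tr>
<tr><td>8</td><td>4</td><td>0.6</td><td>0.4</td></tr>
</table>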
<p><strong>Sample size</strong></p>
<p>The study uses two sample sizes to see how that influences the ability to select the correct model. The size of the smaller samples is calculated to achieve 0.80 power, which amounts to 100-120 observations. These sample sizes are consistent with good practices and can be considered a good sample size.</p>
<p>The very large sample size is 500 observations, roughly 4 to 5 times the size that you need to achieve the benchmark power of 0.80.</p>
<p>The table below shows that a very large sample size improves the ability of stepwise regression to choose the correct model. When choosing your sample size, you may want to consider a larger sample than what the power and sample size calculations suggest in order to improve the variable selection process.</p>
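<table>
<tr><th>Candidate predictors</th><th>Authentic predictors</th><th>Correlation</th><th>% Correct - good sample size</th><th>% Correct - very large sample</th></tr>
<tr><td>4</td><td>2</td><td>0.0</td><td>54.3</td><td>72.1</td></tr>
<tr><td>4</td><td>2</td><td>0.2</td><td>43.1</td><td>72.9</td></tr>
<tr><td>4</td><td>2</td><td>0.6</td><td>15.7</td><td>69.2</td></tr>
<tr><td>8</td><td>4</td><td>0.0</td><td>12.7</td><td>53.9</td></tr>
<tr><td>8</td><td>4</td><td>0.2</td><td>1.0</td><td>39.5</td></tr>
<tr><td>8</td><td>4</td><td>0.6</td><td>0.4</td><td>1.8</td></tr>
</table>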
<p><strong>Closing Thoughts</strong></p>
<p>Stepwise regression generally can’t pick the true model. This is true even with the small number of candidate predictors that this study looks at. In the real world, researchers often have many more candidates, which lowers the chances even further.</p>
<p>Reality is complex and we should not expect that an automated algorithm can figure it out for us. After all, the stepwise algorithm follows simple rules and it knows nothing about the underlying process or subject area. However, stepwise regression <em>can</em> get you to the right ballpark. At a glance, you’ll have a rough idea of what is going on in your data.</p>
<p>It’s up to you to get from the rough idea to the correct model. To do this, you’ll need to use your expertise, theory, and common sense rather than relying solely on simplistic model selection rules.</p>
<p>For tips about how to do this, read my post <a href="http://blog.minitab.com/blog/adventures-in-statistics/four-tips-on-how-to-perform-a-regression-analysis-that-avoids-common-problems">Four Tips on How to Perform a Regression Analysis that Avoids Common Problems</a>.</p>
<p>If you're learning about regression, read my <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-tutorial-and-examples">regression tutorial</a>!</p>
<p>*Stephen Olejnik, Jamie Mills, and Harvey Keselman, “Using Wherry’s Adjusted R2 and Mallow’s Cp for Model Selection from All Possible Regressions”, <em>The Journal of Experimental Education</em>, 2000, 68(4), 365-380.</p>
Regression Analysis
Thu, 11 Dec 2014 16:03:11 +0000
http://blog.minitab.com/blog/adventures-in-statistics/which-is-better%2C-stepwise-regression-or-best-subsets-regression
Jim Frost
The World-Famous Disappearing-Reappearing-Analysis-Settings Act
http://blog.minitab.com/blog/statistics-and-quality-improvement/the-world-famous-disappearing-reappearing-analysis-settings-act
<p>Sure, Minitab Statistical Software is powerful and easy to use, but did you know that it’s also magic? One of the illusions that Minitab can perform is the world famous disappearing-reappearing-analysis-settings act. Of course, as with many illusions, it’s not so hard once you know the trick. In this case, it’s downright easy once you know about Minitab project files.</p>
<p><img alt="The statue of liberty" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/145817ecdac35066aeeea5b9fb106d99/statue_present.png" style="line-height: 20.7999992370605px; float: right; width: 250px; height: 180px; border-width: 1px; border-style: solid; margin: 10px 15px;" /></p>
<p>If you’ve done any work in Minitab you may very well have saved a project file and been grateful that <span>your data, graphs, and statistical tables could all be saved together in a single file</span>. But, it’s just as amazing that Minitab can remember exactly how you did your analysis the last time.</p>
<p>Imagine that you routinely <a href="http://blog.minitab.com/blog/understanding-statistics/i-think-i-can-i-know-i-can-a-high-level-overview-of-process-capability-analysis">run a capability analysis</a> on the same process. The first time you did the analysis, you changed several of the options to get the output that you wanted. When you open Minitab the next time, you want to perform the same analysis on a new data set. Having a saved project makes it easy. Try it for yourself if you want, following the steps below. Begin by downloading <a href="http://it.minitab.com/products/minitab/free-trial.aspx">our free trial</a> if you don't already have our statistical software, then download worksheets <a href="http://support.minitab.com/en-us/datasets/Basil.MTW">Basil.MTW</a> and <a href="http://support.minitab.com/en-us/datasets/Basil2.MTW">Basil2.MTW</a>.</p>
Introduce your Assistant
<ol>
<li>Open the Basil.MTW worksheet.</li>
<li>Choose <strong>Stat > Quality Tools > Capability Analysis > Multiple Variables (Normal)</strong>.</li>
<li>In <strong>Variables</strong>, enter <em>T1H1 T1H2</em>.</li>
<li>In <strong>Subgroup sizes</strong>, enter <em>4</em>.</li>
<li>In <strong>Lower spec</strong>, enter 2.</li>
<li>In <strong>Upper spec</strong>, enter 8.</li>
<li>Click <strong>Graphs</strong>.</li>
<li>Uncheck <strong>Normal probability plot</strong>. Click <strong>OK</strong>.</li>
<li>Click <strong>Options</strong>.</li>
<li>Under <strong>Display</strong>, select <strong>Benchmark Z’s (σ level)</strong> and check <strong>Include confidence intervals</strong>.</li>
<li>Click <strong>OK </strong>twice.</li>
</ol>
<p>The capability analysis is in your project file.</p>
<img alt="Statue not visible" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/7640a1717783092e715bc8a9321145ed/david_copperfield_statue_gone.jpg" style="width: 250px; height: 188px; float: right; border-width: 1px; border-style: solid; margin: 10px 15px;" />Presto, they’re gone!
<ol>
<li>Close Minitab. When asked if you want to save changes to the project, click <strong>Yes</strong>.</li>
<li>Name the file and click <strong>Save</strong>.</li>
</ol>
<p>Minitab Statistical Software is closed. The settings for your analysis are nowhere to be found.</p>
Abracadabra—they’re back!
<ol>
<li>Reopen the project file that you saved.</li>
<li>Open the Basil2.MTW worksheet.</li>
<li>Choose <strong>Stat > Quality Tools > Capability Analysis > Multiple Variables (Normal)</strong>.</li>
</ol>
<p>The settings from your previous analysis have reappeared! All you have to do to complete the capability analysis, with all of your customizations, is click <strong>OK</strong>.</p>
Bask in the applause from the audience
<p>Keeping all of the parts of your analysis in one place is a great feature of Minitab’s project files. For people who routinely repeat the same analysis, the fact that the project file also remembers the settings that you used for your analysis is a fantastic time saver.</p>
<p>Whether you repeat an analysis weekly, quarterly, or even annually, Minitab’s ready to pick up right where you left off. This might not be quite as astounding as David Copperfield making the Statue of Liberty disappear and reappear, but if you want to get your statistical results fast and easy, it’s the best kind of magic.</p>
<p>Ready for more? Project files and many other fundamental features of Minitab are explained in the online <a href="http://support.minitab.com/en-us/minitab/17/getting-started/">Getting Started Guide</a>.</p>
Project Tools
Statistics Help
Wed, 10 Dec 2014 15:42:00 +0000
http://blog.minitab.com/blog/statistics-and-quality-improvement/the-world-famous-disappearing-reappearing-analysis-settings-act
Cody Steele
How Cpk and Ppk Are Calculated, part 2
http://blog.minitab.com/blog/marilyn-wheatleys-blog/how-cpk-and-ppk-are-calculated2c-part-2
<p>Minitab's capability analysis output gives you estimates of the capability indices Ppk and Cpk, and we receive many questions about the difference between them. Some of my colleagues have taken other approaches to explain the difference between Ppk and Cpk, so I wanted to show you how they differ by detailing precisely how each one is calculated. </p>
<p><span style="line-height: 1.6;">When you're using <a href="http://www.minitab.com/products/minitab">statistical software</a> like Minitab, you don't need to do these calculations by hand, but I also want to lift the lid off the "black box" to show you what Minitab does behind the scenes to provide these figures. </span></p>
<p><span style="line-height: 1.6;">In my previous post, we saw <a href="http://blog.minitab.com/blog/marilyn-wheatleys-blog/how-cpk-and-ppk-are-calculated2c-part-1">how Ppk is calculated</a>. This time, we'll go through the calculation of Cpk, using the same sample data set in Minitab.</span><span style="line-height: 1.6;"> Go to <strong>File > Open Worksheet</strong>, click the "Look in Minitab Sample Data folder" button at the bottom, and open the dataset named CABLE.MTW.</span></p>
Calculating Within-Subgroup Standard Deviation
<p>Where Ppk uses the overall standard deviation, Cpk uses the within-subgroup standard deviation. Calculating Cpk is easy once we have an estimate of the within-subgroup standard deviation. The default method in Minitab for the within-subgroup calculation is the pooled standard deviation. The formula for this calculation from Methods and formulas is:</p>
<p><img alt="formula for pooled standard deviation" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e40d989c5285189e341d5ab615b9bfe0/pooledsd.png" style="width: 642px; height: 376px;" /></p>
<p>This looks a little intimidating, but you’ll see it’s not so bad if we take it one step at a time.</p>
<p>First, we’ll calculate Sp. For this example, the subgroup size is fixed at 5. We’ll begin with a clean worksheet containing only the Diameter data in C1.</p>
<p>We need to estimate the mean of the data in each subgroup and store those values in the worksheet. To do that, we’ll create a column that defines our subgroups using <strong>Calc > Make Patterned Data > Simple Set of Numbers</strong>, and then completing the dialog box as shown below:</p>
<p><img alt="subgroups" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/98f4740c6516242f060b4f22b0fa43ca/subgroup.png" style="width: 638px; height: 360px;" /></p>
<p>With 100 data points and 5 points in each subgroup, we have 20 subgroups.</p>
<p>Now we can use our new column containing the subgroups to calculate the mean of each subgroup, using <strong>Stat > Basic Statistics > Store Descriptive Statistics</strong>. We complete the dialog box like in the example below, entering the <em>Diameter </em>column under Variables and the <em>Subgroup </em>column as the By variable:</p>
<p><img alt="descriptive statistics" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9a50586cefdd8896b974097384cf37b4/descr_stats.png" style="width: 600px; height: 309px;" /></p>
<p><span style="line-height: 1.6;">We then click Options and choose <strong>Store a row of output for each row of input</strong>, uncheck <strong>Store distinct values of By variables</strong>, and then click OK in each dialog box. Now column C3 will show the average of each subgroup; the first 5 rows from C1 were used to calculate the mean of those first 5 rows, and that same mean value is displayed in the first 5 rows of C3.</span></p>
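<p>For readers who like to see the logic outside of the dialog boxes, the two steps so far (labeling the subgroups, then storing each subgroup's mean next to every row) can be sketched in Python as follows. The function names and the toy values are mine; they are not the CABLE.MTW data.</p>

```python
from statistics import mean

def subgroup_labels(n_rows, subgroup_size):
    """Label consecutive rows 1, 1, ..., 2, 2, ... like a patterned Subgroup column."""
    return [i // subgroup_size + 1 for i in range(n_rows)]

def rowwise_subgroup_means(values, labels):
    """Return one mean per row, repeating each subgroup's mean for all of its rows."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    means = {label: mean(vals) for label, vals in groups.items()}
    return [means[label] for label in labels]

# 100 rows in subgroups of 5 gives 20 subgroups.
labels = subgroup_labels(100, 5)
print(len(set(labels)))

# Toy data (not the cable diameters): two subgroups of 5.
toy_values = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 10.0, 10.0, 10.0, 10.0]
toy_means = rowwise_subgroup_means(toy_values, subgroup_labels(10, 5))
print(toy_means)
```

<p>Repeating the mean on every row is what makes the next step easy: each measurement sits right beside the subgroup mean it needs to be compared with.</p>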
<p>We will now use these values to calculate the numerator for Sp using <strong>Calc > Calculator</strong>:</p>
<p><img alt="numerator for Sp" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/53aad02dfe93c8744f272a3ec3dabb76/calculator2.png" style="width: 609px; height: 400px;" /></p>
<p>We are summing the squared differences between each measurement and its subgroup mean. The Numerator column in the Minitab worksheet will show <strong>0.02735</strong> using the formula above.</p>
<p>Next, we calculate the denominator for Sp, which is the subgroup size minus 1, summed over all subgroups. Since we have a constant subgroup size of 5, and a total of 20 subgroups, an easy way to enter this in the calculator is:</p>
<p><img alt="denominator for Sp" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/22ec46ed79ede93c79af6e8288e8224b/calculator3.png" style="width: 593px; height: 396px;" /></p>
<p>Now with the numerator and denominator for Sp stored in the worksheet, we take the square root of Numerator/Denominator:</p>
<p><img alt="square root of numerator/denominator" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/29fe9f85f7fb535497948f3c5eb38451/calculator4.png" style="width: 590px; height: 394px;" /></p>
<p>Notice that the Sp value 0.0184899 is the estimate of the subgroup standard deviation if we tell Minitab NOT to use the unbiasing constant, C4, by clicking the Estimate button in the Normal Capability Analysis dialog box and then unchecking <strong>Use unbiasing constants</strong>. </p>
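<p>The Sp arithmetic above is easy to verify. Here is a minimal Python sketch that plugs in the numerator (0.02735) and denominator (20 subgroups times 5 − 1) from this post, along with a general <code>pooled_sd</code> helper of my own naming that works from raw subgroups.</p>

```python
from math import sqrt

def pooled_sd(subgroups):
    """Pooled standard deviation without the unbiasing constant."""
    numerator = 0.0   # squared deviations from each subgroup's own mean
    denominator = 0   # (subgroup size - 1), summed over all subgroups
    for subgroup in subgroups:
        m = sum(subgroup) / len(subgroup)
        numerator += sum((x - m) ** 2 for x in subgroup)
        denominator += len(subgroup) - 1
    return sqrt(numerator / denominator)

# Values reported in this post: numerator 0.02735, 20 subgroups of size 5.
sp_from_text = sqrt(0.02735 / (20 * (5 - 1)))
print(round(sp_from_text, 7))

# Toy check with two small subgroups (not the cable data).
print(pooled_sd([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
```

<p>Because the numerator shown in the worksheet is rounded, the result agrees with 0.0184899 to the displayed precision rather than to every decimal place.</p>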
<p>Now to finish calculating the within-subgroup standard deviation using C4 (the default), we can look up C4 in the table that is linked in Methods and Formulas under the Methods heading.</p>
<p>The C4 value we need is C4 for (d + 1). As defined in Methods and formulas, d is the sum of (subgroup size – 1); in our case the subgroup size is fixed at 5, so 20*(5-1) = 80. If d = 80, we add 1 and get 81, so we look up N = 81 in the C4 column of unbiasing constants:</p>
<p><img alt="unbiasing constants" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/400dffc756ede4ae4f272cc10fb7a256/c4.png" style="width: 532px; height: 82px;" /></p>
<p><span style="line-height: 1.6;">We enter 0.996880 in column C7 in the worksheet and use it in the calculator to get the </span><span style="line-height: 20.7999992370605px;">pooled within-subgroup standard deviation</span><span style="line-height: 1.6;">:</span></p>
<p><span style="line-height: 1.6;"><img alt="within subgroup standard deviation" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d22f12738d7eade1bef550a3bfb061c1/sdwithin.png" style="width: 595px; height: 400px;" /></span></p>
<p> We can see that this value matches the output from our initial capability analysis graph.</p>
<p><img alt="initial graph" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6b18e5ea5e14c5f5992cc335766e505f/initial_graph.png" style="width: 171px; height: 139px;" /></p>
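<p>If you don't have the table of unbiasing constants handy, C4 can also be computed directly. The Python sketch below uses the standard closed-form expression for c4 based on the gamma function; that formula is my addition and is not quoted from Methods and Formulas. The Sp value and the subgroup counts come from the steps above.</p>

```python
from math import exp, lgamma, sqrt

def c4(n):
    """Unbiasing constant c4(n) = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2)."""
    # lgamma keeps the calculation stable for larger n.
    return sqrt(2.0 / (n - 1)) * exp(lgamma(n / 2.0) - lgamma((n - 1) / 2.0))

d = 20 * (5 - 1)           # sum of (subgroup size - 1) over 20 subgroups
c4_value = c4(d + 1)       # look up C4 at d + 1 = 81
sp = 0.0184899             # pooled standard deviation from the earlier step
sigma_within = sp / c4_value

print(round(c4_value, 6), round(sigma_within, 6))
```

<p>Dividing by a constant slightly less than 1 nudges the estimate upward, which corrects for the tendency of a sample standard deviation to underestimate the true value.</p>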
Calculating Cpk
<p>Finally, we use our within-subgroup standard deviation to calculate CPU and CPL. <span style="line-height: 1.6;">Cpk is the lesser of CPU and CPL, and we find these two formulas in <strong>Methods and Formulas</strong>:</span></p>
<p><img alt="formula for CPL" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6ecec9d3f48c1f1209985461f453017c/cpl.png" style="width: 391px; height: 178px;" /><img alt="formula for cpu" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dc08335c4a2e31422ec35c4bbad332e7/cpu.png" style="width: 369px; height: 180px;" /></p>
<p><span style="line-height: 1.6;">We calculate CPL and CPU as shown below using the calculator and the mean of the data that we previously calculated:</span></p>
<p><img alt="calculate cpl and cpu" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/cef4860ce67f56b0eded93d66f2a9fbc/calculator5.png" style="width: 600px; height: 464px;" /></p>
<p><span style="line-height: 1.6;">Since Cpk is the lesser of the two resulting values, Cpk is 0.83. That matches the Cpk value in Minitab’s capability output:</span></p>
<p><img alt="process capability for diameter" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fa1311e847201a8b17e8993d6d4cd889/capability_for_diameter.png" style="width: 600px; height: 363px;" /></p>
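<p>The CPL and CPU calculations can be wrapped up in a few lines of Python. This is only a sketch: the function name is mine, and the mean and standard deviation in the example call are hypothetical round numbers, not the values from CABLE.MTW. The spec limits of 0.5 and 0.6 are the ones used for the cable data.</p>

```python
def cpk(mean, sigma_within, lsl, usl):
    """Return (CPL, CPU, Cpk) using the within-subgroup standard deviation."""
    cpl = (mean - lsl) / (3 * sigma_within)
    cpu = (usl - mean) / (3 * sigma_within)
    return cpl, cpu, min(cpl, cpu)

# Hypothetical mean and sigma, chosen for easy arithmetic.
cpl, cpu, cpk_value = cpk(mean=0.54, sigma_within=0.02, lsl=0.5, usl=0.6)
print(cpl, cpu, cpk_value)
```

<p>Because Cpk takes the smaller of the two, it always reflects the spec limit that the process mean sits closer to.</p>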
<p>As long as you're using Minitab, you won't need to calculate Ppk and Cpk by hand. But I hope seeing the calculations Minitab uses to get these capability indices provides some insight into the differences between them! </p>
Data Analysis
Quality Improvement
Statistics
Tue, 09 Dec 2014 13:00:00 +0000
http://blog.minitab.com/blog/marilyn-wheatleys-blog/how-cpk-and-ppk-are-calculated2c-part-2
Marilyn Wheatley
How Cpk and Ppk Are Calculated, part 1
http://blog.minitab.com/blog/marilyn-wheatleys-blog/how-cpk-and-ppk-are-calculated2c-part-1
<p>In technical support, we frequently receive calls from Minitab users who have questions about the differences between Cpk and Ppk. </p>
<p>Michelle Paret already wrote a great post about the <a href="http://blog.minitab.com/blog/michelle-paret/process-capability-statistics-cpk-vs-ppk">differences between Cpk and Ppk</a>, but it also helps to have a better understanding of the math behind these numbers. So in this post I will show you how to calculate Ppk using <a href="http://www.minitab.com/products/minitab">Minitab’s </a>default settings when the subgroup size is greater than 1. Then, in my next post, I’ll show you how to calculate Cpk.</p>
<p><strong>Default Capability Methods</strong></p>
<p>For data that follow the normal distribution, we use a Normal Capability Analysis (<strong>Stat</strong> > <strong>Quality Tools</strong> > <strong>Capability Analysis</strong> > <strong>Normal</strong>). If we click the <strong>Estimate</strong> button in the dialog box that comes up, we can see the default methods used in Minitab:</p>
<p><img alt="Capability Analysis dialog" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/52275bb70147375161326eab0f2bd313/normal_capability_formulas.png" style="width: 486px; height: 331px;" /></p>
<p>From the dialog box above, we can see that Minitab 17 uses the <strong>Pooled standard deviation</strong> when subgroup sizes are greater than 1. To see details of the formulas used, we can click the <strong>Help</strong> button in the lower-left corner. Then, from the Help window shown below, click the <strong>see also</strong> link, and then choose <strong>Methods and formulas</strong>:</p>
<p><img alt="See Also" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/29192d844342b339fd74c237791d03e3/see_also.png" style="width: 500px; height: 105px;" /></p>
<p><span style="line-height: 1.6;">The Methods and Formulas section shows the formulas Minitab uses. If we click </span><strong style="line-height: 1.6;">Estimating standard deviation</strong><span style="line-height: 1.6;">, we can find the formulas for the pooled standard deviation. Under the </span><strong style="line-height: 1.6;">Potential capability</strong><span style="line-height: 1.6;"> heading we find the formulas for Cpk, and under </span><strong style="line-height: 1.6;">Overall capability</strong><span style="line-height: 1.6;"> we find the formulas for Ppk.</span></p>
<p><img alt="Normal capability formulas" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/25d7bb4bf86d8f71581e45258abec264/normal_capability_3.png" style="width: 550px; height: 316px;" /></p>
<p>To illustrate the calculation of Ppk and Cpk, we’ll use a sample data set available in Minitab. Go to <strong>File</strong> > <strong>Open Worksheet</strong>, click the <strong>Look in Minitab Sample Data folder</strong> button at the bottom, and open the dataset named <strong>CABLE.MTW</strong>.</p>
<p>This sample data set is from a manufacturer of cable wire. The cable diameters were measured in subgroups of 5 and entered in the Minitab worksheet. The lower spec limit for the cable diameter is 0.5 and the upper spec limit is 0.6.</p>
Calculating the Mean and Overall Standard Deviation
<p>If we use the information above to complete the Capability dialog box as shown below (accepting the default settings for the within-subgroup estimation method), Minitab gives us estimates of Cpk and Ppk:</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/71b6cb372d266a437650ac2961d8c364/normal_capability_4.png" style="width: 600px; height: 369px;" /></p>
<p>But how exactly does Minitab arrive at these numbers? That's what we're going to find out. </p>
<p>Calculating Ppk is easier than Cpk, since Ppk is based on the overall standard deviation of the data instead of the within-subgroup standard deviation. We'll see the formula for that in my next post, but to obtain the overall standard deviation we can use <strong>Stat</strong> > <strong>Basic Statistics</strong> > <strong>Store Descriptive Statistics</strong> and store the standard deviation and the mean (which we’ll also need) in the worksheet:</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8f83bde3a5c94a2b86177fc5da9084dc/normal_capability_5.png" style="width: 620px; height: 379px;" /></p>
<p>(Be sure to click the <strong>Statistics</strong> button in the dialog box above to make sure only Mean and Standard Deviation are selected before clicking OK.)</p>
Calculating Ppk
<p>To get Ppk (which is the lesser of PPU and PPL), we need to calculate PPU and PPL using the standard deviation and mean shown above, following the formulas shown in Methods and Formulas:</p>
<p><img alt="PPU Formula" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/bd3b1753ddfc6b8f0f294ebcb56bf14e/ppu.png" style="width: 385px; height: 223px;" /><img alt="PPL Formula" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1b032a7193a7975b9d8b91ebd0dc3bdc/ppl.png" style="width: 385px; height: 230px;" /></p>
<p>In Minitab, we use <strong>Calc</strong> > <strong>Calculator</strong> to enter the formulas and store them in the worksheet:<br />
<br />
<img alt="calculator" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/981642c29835fcbb147bf452a74c8e6d/calculator.png" style="width: 550px; height: 653px;" /></p>
<p>The formulas above give us a PPU of 0.922719 and PPL of 0.800701. Since Ppk is the lesser of these two values, the Ppk is 0.80.</p>
<p>You'll note that this matches the results from the Minitab capability analysis output shown earlier:</p>
<p><img alt="Ppk" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f0c15293f5a3009d73400b379c275fe0/ppk.png" style="width: 220px; height: 127px;" /></p>
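<p>If you'd like to check the arithmetic outside Minitab, here is a minimal Python sketch of the same calculation. It assumes the overall standard deviation is the plain sample standard deviation (n - 1 divisor, no unbiasing constant), and the function name <code>ppk</code> and the three diameters are made up for illustration; they are not the CABLE.MTW values.</p>

```python
from statistics import mean, stdev

def ppk(data, lsl, usl):
    """Return (PPU, PPL, Ppk) from the overall mean and standard deviation."""
    xbar = mean(data)
    s = stdev(data)  # assumption: plain sample standard deviation (n - 1 divisor)
    ppu = (usl - xbar) / (3 * s)  # distance from the mean to the upper spec limit
    ppl = (xbar - lsl) / (3 * s)  # distance from the mean to the lower spec limit
    return ppu, ppl, min(ppu, ppl)  # Ppk is the lesser of PPU and PPL

# Hypothetical diameters for illustration only
sample = [0.52, 0.54, 0.56]
ppu, ppl, value = ppk(sample, lsl=0.5, usl=0.6)
print(f"PPU={ppu:.3f}  PPL={ppl:.3f}  Ppk={value:.3f}")
```

<p>Because this sample sits closer to the lower spec limit, PPL comes out smaller than PPU and therefore becomes the Ppk, which is the same pattern we saw with the cable data.</p>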
<p>Of course, it's easier to use Minitab's capability output than it is to do these calculations manually, but my goal is to lift the lid off the "black box" and give you an appreciation for what Minitab does behind the scenes to provide these figures. In my next post, we'll see how Cpk is calculated. </p>
<p> </p>
Data Analysis, Quality Improvement, Statistics, Statistics Help
Mon, 08 Dec 2014 13:00:00 +0000
http://blog.minitab.com/blog/marilyn-wheatleys-blog/how-cpk-and-ppk-are-calculated2c-part-1
Marilyn Wheatley
What Does It Mean When Your Probability Plot Has Clusters?
http://blog.minitab.com/blog/the-statistical-mentor/what-does-it-mean-when-your-probability-plot-has-clusters
<p><span style="line-height: 1.6;">Have you ever had a probability plot that looks like this?</span></p>
<p><img alt="Probability Plot of Patient Weight Before and After Surgery" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/885af8cd49c44b34709a529d3c4a78dd/probability_plot.png" style="width: 550px; height: 366px;" /></p>
<p>The probability plot above is based on patient weight (in pounds) after surgery minus patient weight (again, in pounds) before surgery.</p>
<p>The red line appears to go through the data, indicating a <a href="http://blog.minitab.com/blog/fun-with-statistics/normal-the-kevin-bacon-of-distributions">good fit to the Normal</a>, but there are clusters of plotting points at the same measured value. This occurs on a probability plot when there are many ties in the data. If the true measurement can take on any value (in other words, if the variable is continuous), then the cause of the clusters on the probability plot is poor measurement resolution.</p>
<p>The Anderson-Darling Normality test typically rejects normality when there is poor measurement resolution. In a previous blog post (<span><a href="http://blog.minitab.com/blog/the-statistical-mentor/normality-tests-and-rounding">Normality Tests and Rounding</a></span>) I recommended using the Ryan-Joiner test in this scenario. The Ryan-Joiner test generally does not reject normality due to poor measurement resolution. </p>
<p>In this example, the Ryan-Joiner p-value is above 0.10. A probability plot that supports using a Normal distribution would be helpful to confirm the Ryan-Joiner test results. How can we see a probability plot of the true weight differences? Simulation can be used to show how the true weight differences might look on a probability plot.</p>
<p>The weight differences were rounded to the nearest pound. In effect, we want to add a random value from -0.5 to +0.5 to each value to get a simulated measurement. The steps are as follows:</p>
<ol>
<li>Store simulated noise values from -0.5 to +0.5 in a column using <strong>Calc > Random Data > Uniform</strong>.</li>
<li>Use <strong>Calc > Calculator</strong> to add the noise column to the original column of data.</li>
<li>Create a normal probability plot using <strong>Stat > Basic Statistics > Normality Test</strong>.</li>
<li>Repeat steps 1-3 several times if you want to see how the results are affected by the simulated values.</li>
</ol>
<p>The resulting graph from one iteration of these steps is shown below. It suggests that the Normal distribution is a good model for the difference in weights for this surgery.</p>
<p><img alt="Probability plot with simulated measurements" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9035bd61788a55da21c2fb4c437a0fc1/probability_plot_simulated_measurements.png" style="width: 550px; height: 369px;" /></p>
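<p>For readers who want to try the idea outside Minitab, steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the function name <code>jitter</code>, the seed, and the rounded weight differences are all hypothetical, and the normality test in step 3 is left to whatever tool you prefer.</p>

```python
import random

def jitter(values, half_width=0.5, seed=None):
    """Add uniform noise in [-half_width, +half_width] to each rounded value."""
    rng = random.Random(seed)  # a seed makes one iteration reproducible
    return [v + rng.uniform(-half_width, half_width) for v in values]

# Hypothetical weight differences, rounded to the nearest pound
rounded = [-3, -3, -2, 0, 0, 0, 1]
simulated = jitter(rounded, seed=1)
for original, value in zip(rounded, simulated):
    print(f"{original:>3} -> {value:.3f}")
```

<p>Calling the function again with a different seed corresponds to step 4: it shows how much the picture depends on the particular simulated noise.</p>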
<p>Minitab will deliver a presentation on <a href="http://www.ihi.org/education/Conferences/Forum2014/Pages/Overview.aspx" target="_blank">Detecting and Analyzing Non-Normal Data at the IHI conference in Orlando, FL</a> on Monday, December 8th 2014. We are also developing a 1-day training course called Detecting and Analyzing Non-Normal Data, to be released in 2015.</p>
Data Analysis, Health Care Quality Improvement, Statistics
Thu, 04 Dec 2014 13:00:00 +0000
http://blog.minitab.com/blog/the-statistical-mentor/what-does-it-mean-when-your-probability-plot-has-clusters
Jim Colton
An Unauthorized Biography of the Stem-And-Leaf Plot, Part I: A Stem by Any Other Name
http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/an-unauthorized-biography-of-the-stem-and-leaf-plot2c-part-i3a-a-stem-by-any-other-name
<p>Greetings fair reader. In the past, I've written several posts with practical tips related to Minitab graphs, such as:</p>
<ul>
<li>How to discuss the sensitive issue of <a href="http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/a-visit-with-my-doctor-part-1-monitoring-cholesterol-and-wait-times">P charts</a> and <a href="http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/control-charts-and-a-visit-with-my-doctor-part-2-its-important-to-talk-to-your-doctor-about-overdispersion" >Laney P' charts</a> with your doctor</li>
<li>How to use a <a href="http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/using-g-whiz-charts-to-track-elusive-affirmations-from-almost-adolescents">G chart</a> to monitor parenting success</li>
<li>How to use a <a href="http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/it-was-the-best-of-times-it-was-the-end-of-times">scatterplot</a> to start your own doomsday cult</li>
</ul>
<p> In this post, I thought I'd take a step back and explore the historical side of Minitab's graphs. Specifically, we'll explore the history behind the dodo bird of the graph kingdom: the Stem-and-Leaf plot. </p>
<p>Recently, I was chatting with Bob (not her real name) in Minitab's excellent <a href="http://support.minitab.com">Technical Support department</a>. Bob recently got a call from a customer who wanted to know how the Stem-and-Leaf plot got its name. Imagine my shock and horror as Bob explained that many budding statisticians don't know this story! (For the record, I personally do not subscribe to the notion that <em>budding </em>is the only successful reproductive strategy for statisticians like myself. There's also cloning.) </p>
<p>For the curious, but uninitiated, here's an example of a Stem-and-Leaf plot:</p>
<p style="margin-left: 40px;"><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/6b2adf3d191bebea0c987a314e0d2a73/snplot_sans_counts.jpg" style="width: 269px; height: 240px;" /></p>
<p>Each digit in the right column represents a single value from the sample. The left column serves as a base of equally spaced intervals (or "bins," as I will later call them). Together, the value in the left column and the digits in the right column give you the data values in each bin. For example, the largest value in the sample of budding rates is 15.3 (bottom row).</p>
<p>At this juncture, experienced readers might ask, "Wait a minute, what about the counts?" I, for one, take no stock in antiquated monarchistic hierarchies, so I tend to leave out the counts. However, you are correct: Stem-and-Leaf plots often include a column of counts on the far left. For example, here is a Stem-and-Leaf plot as it appears in <a href="http://www.minitab.com/products/minitab">Minitab Statistical Software</a>:</p>
<p style="margin-left: 40px;"><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/d4367867f15d0e005549fb400af5e71a/snplot_from_minitab.jpg" style="width: 353px; height: 318px;" /></p>
<p>The left-most column shows the cumulative count of observations from the top to the middle and from the bottom to the middle. In the example, the counts indicate that the top 5 rows include a total of 14 observations, and the bottom 5 rows also contain 14 observations.</p>
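<p>For the mechanically curious, here is a rough Python sketch of how such a plot can be put together. It assumes non-negative values recorded to one decimal place and one row per stem, and it computes each count as the smaller of the running totals from the top and from the bottom. That is a simplification of my own, so the row holding the median may not match Minitab's display. The function names and the data are hypothetical.</p>

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=0.1):
    """Return [(stem, sorted leaves)] with one row per stem, empty rows included."""
    rows = defaultdict(list)
    for v in values:
        # 15.3 with leaf_unit 0.1 becomes 153, i.e. stem 15 and leaf 3
        stem, leaf = divmod(round(v / leaf_unit), 10)
        rows[stem].append(leaf)
    stems = range(min(rows), max(rows) + 1)
    return [(s, sorted(rows.get(s, []))) for s in stems]

def depths(plot):
    """Cumulative counts from whichever end of the plot is nearer."""
    sizes = [len(leaves) for _, leaves in plot]
    total = sum(sizes)
    out, running = [], 0
    for size in sizes:
        from_top = running + size      # observations in this row and above
        from_bottom = total - running  # observations in this row and below
        out.append(min(from_top, from_bottom))
        running += size
    return out

# Hypothetical budding rates
plot = stem_and_leaf([12.1, 12.4, 14.0, 15.3])
for (stem, leaves), depth in zip(plot, depths(plot)):
    print(f"{depth:>3} {stem:>3} | {''.join(map(str, leaves))}")
```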
The Seeds of a Plot
<p>The Stem-and-Leaf plot is sometimes called a "character graph", owing (no doubt) to the colorful and interesting characters who invented it: Dr. Woodrow "Woody" Stem, and Dr. August "Russell" Leaf. (I think they won the Nobel prize for their work. I don't know for sure, but I'd have to look it up, so let's go with that.)</p>
<p>Woody and Russell were aspiring students under the careful tutelage of renowned statistician, Dr. Histeaux Graham. The good professor challenged our heroes, as he did all of his students, to come up with a new and better way to examine the distribution of values in a sample. (Mind you, this was before computers, calculators, or even highly-leveraged derivatives trading.)</p>
<p>Legend has it that our heroes Woody and Russell "got into it" one night at a pub, after a particularly intense lecture by Dr. Graham on the twin scourges of platykurtosis and leptokurtosis. (Mind you, this was before penicillin.)</p>
<p>Woody was adamant that in order to meet Dr. Graham's challenge, they must divide the sample into equal intervals (or "bins," as he would later call them).</p>
<p>Russell insisted that Woody's idea was a load of BS (Basic Statistics). "How can you understand a sample," Russell demanded, "by looking at a bunch of evenly spaced intervals, or 'bins' as you will no doubt resort to calling them?!?"</p>
<p>Sadly, it was at this juncture that our learned gentlemen succumbed to more pugilistic impulses: fisticuffs broke out. (Mind you, this was before boxing gloves, moisturizing cream, or even those convenient Isotoner one-size-fits-most gloves.)</p>
<p>I will spare you the details, but suffice it to say that the damage was significant. Woody looked like he had been worked over with the business end of a boxplot, and Russell's wallis was completely kruskaled. It looked like Dr. Graham's challenge might go unanswered. And it may have, but for an unlikely twist of fate.</p>
<p><u><a href="http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/an-unauthorized-biography-of-the-stem-and-leaf-plot-part-ii%3A-a-new-leaf"><strong><span style="color:#008080;">To be continued ...</span></strong></a></u></p>
Data Analysis, Statistics
Wed, 03 Dec 2014 13:02:00 +0000
http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/an-unauthorized-biography-of-the-stem-and-leaf-plot2c-part-i3a-a-stem-by-any-other-name
Greg Fox
Four Quick Tips for Editing Control Charts
http://blog.minitab.com/blog/quality-data-analysis-and-statistics/four-quick-tips-for-editing-control-charts
<p><span style="line-height: 1.6;">Hi everyone! Over the past month, I fielded some interesting customer calls regarding control chart creation and editing. I wanted to share these potential scenarios with you in hopes that you will find them informative and useful. For these scenarios, I used the XBar-R chart as my template, but you could easily apply them to many of the <a href="http://blog.minitab.com/blog/understanding-statistics/control-chart-tutorials-and-examples">other control charts</a> in Minitab. </span></p>
<strong>Scenario 1: Create a Control Chart with Stages</strong>
<p>Suppose you want to create an XBar-R Chart with stages. Stages show how a process changes over specific time periods. At each stage, Minitab <a href="http://www.minitab.com/products/minitab">Statistical Software</a> recalculates the center line and control limits on the chart by default.</p>
<p align="center"><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/9b347489cec48da784d0531caa260717/blog1a.png" style="width: 452px; height: 172px;" /></p>
<p>You decide to create a two-stage chart. However, you want to use a historical standard deviation of 2 for the first stage, as opposed to letting Minitab calculate it for you. For the second stage, you’ll let Minitab calculate the standard deviation.</p>
<p>You can enter historical estimates for the standard deviation under the Parameters tab under the Xbar-R Options sub-menu:</p>
<p style="text-align: center;"><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/66eafd0c7e3ed55fd36b3c989dacc6e1/blog1a1.png" style="text-align: center; line-height: 1.6; width: 444px; height: 376px;" /></p>
<p>You may be inclined to enter <em>2</em> in the second box and hit OK, but that will set the standard deviation to 2 for both stages. You’ll need to add an asterisk to represent the stage that is not affected by the historical estimate:</p>
<p align="center"><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/5ef238c65d592a2af67bed95db83babe/blog1b.png" style="line-height: 20.7999992370605px; width: 444px; height: 376px;" /></p>
<p>The resulting Xbar-R chart will set the standard deviation to 2 for the first stage, leaving the second stage unaffected.</p>
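<p>To make the effect of that asterisk concrete, here is a small Python sketch of per-stage Xbar limits. It is not Minitab's code: it assumes equal subgroup sizes, uses the textbook Rbar/d2 estimate when no historical value is given, and lets <code>None</code> play the role of the asterisk. The function name and the data are hypothetical.</p>

```python
from math import sqrt
from statistics import mean

# Standard d2 constants for subgroup sizes 2 through 5
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}

def xbar_limits_by_stage(stages, historical_sigma):
    """Return (LCL, center, UCL) for each stage.

    stages: one list of equal-size subgroups per stage.
    historical_sigma: one entry per stage; None means "estimate from the data".
    """
    limits = []
    for subgroups, sigma in zip(stages, historical_sigma):
        n = len(subgroups[0])
        center = mean(mean(g) for g in subgroups)
        if sigma is None:
            rbar = mean(max(g) - min(g) for g in subgroups)
            sigma = rbar / D2[n]  # assumption: Rbar method
        half_width = 3 * sigma / sqrt(n)
        limits.append((center - half_width, center, center + half_width))
    return limits

# Hypothetical two-stage data with subgroups of 4
stage1 = [[10, 12, 11, 13], [9, 11, 10, 12]]
stage2 = [[20, 22, 21, 23], [19, 21, 20, 22]]
for lcl, center, ucl in xbar_limits_by_stage([stage1, stage2], [2, None]):
    print(f"LCL={lcl:.3f}  CL={center:.3f}  UCL={ucl:.3f}")
```

<p>Passing <code>[2, 2]</code> instead would mimic typing a single 2 in the dialog: both stages would then use the historical value.</p>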
<strong>Scenario 2: Showing Control Limits for Different Stages</strong>
<p>Let’s piggyback off of our first scenario and look at an Xbar-R chart with stages:</p>
<p align="center"><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/d78dd4fc738005585ce6eb242ef802dc/blog1c.png" style="line-height: 20.7999992370605px; width: 577px; height: 385px;" /></p>
<p>You'll notice that <span style="line-height: 20.7999992370605px;">only</span><span style="line-height: 1.6;"> the last stage’s control limits are displayed, but you really want the first stage's to be displayed as well. This change can be made in the Xbar-R Options sub-menu, under the Display tab:</span></p>
<p align="center"><img src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/8fbb1fceb178b530bbc642ee06aae151/blog1d.png" style="line-height: 20.7999992370605px; width: 456px; height: 304px;" /></p>
<p>After checking this box and hitting OK a few times, your Xbar-R Chart will show the control limits for all stages. You could set this to be the default behavior under <strong>Tools > Options > Control Charts and Quality Tools > Other</strong>:</p>
<p align="center"><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/08b0f57bddda14f9321125c92f2d4cf3/blog1e.png" style="line-height: 20.7999992370605px; width: 637px; height: 427px;" /></p>
<strong>Scenario 3: Hiding Symbols on Your Control Charts</strong>
<p>On our Xbar-R chart, let's hide the symbols from the top graph (Xbar) by right-clicking on one of the symbols, and going to the Edit Symbols sub-menu…</p>
<p align="center"><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/93f62c65882b181e0d40cfe02d28a523/blog1f.png" style="width: 432px; height: 376px;" /></p>
<p>Choose "None" under "Custom" and select OK. The symbols disappear from the top graph. But what if we have a change of heart and want those symbols back? We right-click on different places on the graph, but all we see are options for “Edit Figure Region…”, “Edit Data Region”, or “Edit Connect Line…” Uh-oh. Have we lost our symbols?</p>
<p>Fortunately, we have not. Go to the Editor menu. (Make sure you have your graph selected prior to doing this. The Editor menu changes dynamically based on what is currently selected in Minitab.) Under <strong>Editor > </strong><strong>Select Item</strong>, select "Symbols." If we had hidden the points from the bottom graph, we would have selected "Symbols 2."</p>
<p align="center"><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/8241a72e4f608e7b20478e38bba1e292/blog1g.png" style="width: 534px; height: 444px;" /></p>
<p>After the symbols have been selected, press CTRL + T to go to the ‘Edit Symbols’ dialog window or simply go to <strong>Editor > </strong><strong>Edit Symbols…</strong></p>
<strong>Scenario 4: Setting Data Labels on Control Charts</strong>
<p>I would like to close this post with a very minor but useful tip. When adding reference lines to a control chart, you can choose whether you want the data label to appear on the lower end or higher end of the line. <span style="line-height: 1.6;">First, add a reference line to your control chart by right-clicking on your chart and selecting </span><strong style="line-height: 1.6;">Add > </strong><strong style="line-height: 1.6;">Reference Lines…</strong></p>
<p>Fill out the dialog as you please by entering a few values, and hit OK to add your reference line to the chart. With the reference line selected, press CTRL+T to open up the Edit Reference Lines dialog. Under the Show tab, you can choose which side of the reference line you want the label to appear on.</p>
<p align="center"><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/9f4ca10a24fb6dad9e86ff1d3ab7a1b1/blog1h.png" style="width: 432px; height: 376px;" /></p>
<p align="center"><strong>Before:</strong></p>
<p align="center"><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/5118e4e23cd44b5b12665d3141fd96f7/blog1i.png" style="width: 562px; height: 116px;" /></p>
<p align="center"><strong>After Changing to ‘Low Side’:</strong></p>
<p align="center"><img src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f7e1af57-c25e-4ec3-a999-2166d525717e/Image/300b52362f3278add60fc0d42ddeb0ca/blog1j.png" style="width: 533px; height: 130px;" /></p>
<p> </p>
<p><span style="line-height: 1.6;">Sharing these tips has helped many people who have contacted us about control charts over the years, and I hope they will help you the next time you find yourself in our control chart menus!</span></p>
<p> </p>
Data Analysis, Quality Improvement
Mon, 01 Dec 2014 13:00:00 +0000
http://blog.minitab.com/blog/quality-data-analysis-and-statistics/four-quick-tips-for-editing-control-charts
Andy Cheshire
Giving Thanks for Ways to Edit a Bar Chart of Pies
http://blog.minitab.com/blog/statistics-and-quality-improvement/giving-thanks-for-ways-to-edit-a-bar-chart-of-pies
<p>My siblings occasionally remind me that because I’m getting older, one day, my metabolism is going to collapse. When that day comes, consuming mass quantities of food will surely lead to the collapse of my body, mind, and soul. But, as that day is coming slowly, on Thanksgiving, I’m an every-pie-kind-of-guy.</p>
<p>Now, I know what you’re thinking. It’s Thanksgiving. I’ve just mentioned pies. We’re going to look at pie charts of pies. If you really want to look at pie charts of pies, go ahead and get it out of your system:</p>
<p><a href="http://cf2s1.cbncdn.com/wp-content/blogs.dir/1/files/2012/11/pies_final.jpg">2012 survey by National Public Radio about pie preferences</a></p>
<p><a href="http://www.livescience.com/33111-favorite-pie-america.html">2008 survey by Schwan’s Consumer Brands North America</a></p>
<p><a href="http://thecreatorsproject.vice.com/blog/a-robot-that-puts-pie-charts-onto-actual-pies">A Robot that Puts Pie Charts onto Actual Pies</a></p>
<p>In this post, we’re going to do something more like this:</p>
<p></p>
<p>At our house, we usually do three pies for Thanksgiving: Pumpkin, Chess, and Pecan. I’m going to use a chart of these to show you the things I’m most thankful you can do after you’ve made your bar chart in Minitab. Let’s say that we start with a chart of the calories per slice.</p>
<p><img alt="The default graph has all blue bars. In this case, the order of the bars is the order from the worksheet." src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/0e3e9622c239cb3ac8587a5168c98f95/default_graph.png" style="border-width: 0px; border-style: solid; width: 576px; height: 384px;" /></p>
Reorder the bars
<p>These bars are presently in the order that they were listed in the worksheet. But I like to eat them in order of difficulty, starting with the pecan and easing towards the pumpkin. This tends to follow the order of the calories, so we can put the pies in descending order.</p>
<ol>
<li>Double-click the bars.</li>
<li>Select the <strong>Chart Options</strong> tab.</li>
<li>In <strong>Order Main X Groups By</strong>, select <strong>Decreasing Y</strong>. Click <strong>OK</strong>.</li>
</ol>
<p><img alt="The pecan pie is on the left because it has the most calories. Other pies follow in descending order." src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/1f20a2cee7c03e5d894dd21791999e57/step_1_ordering.png" style="border-width: 0px; border-style: solid; width: 577px; height: 385px;" /></p>
Add labels that show the y-values
<p>Bar charts are great for making comparisons. Ordering them makes it even clearer which categories are greatest and which are least. But if you want to get precise numbers, you can easily add labels that show the values from the data.</p>
<ol>
<li>Right-click the graph.</li>
<li>Select <strong>Add > Data Labels</strong>. Click <strong>OK</strong>.</li>
</ol>
<p><img alt="The numbers above the bars give the exact number of calories per slice." src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/4b92591995ee62865f6354f8d7ac6215/step_2_labeling.png" style="border-width: 0px; border-style: solid; width: 576px; height: 384px;" /></p>
Accumulate bars
<p>As an every-pie-kind-of-guy, one of the things I might want to know is how many calories I eat when I have a slice of each pie. That’s the kind of situation when it’s helpful to accumulate Y across X.</p>
<ol>
<li>Double-click the bars.</li>
<li>Select the <strong>Chart Options</strong> tab.</li>
<li>In <strong>Percent and Accumulate</strong>, check <strong>Accumulate Y across X</strong>. Click <strong>OK</strong>.</li>
</ol>
<p>The resulting graph shows the number of calories for a slice of pecan, for a slice of pecan and a slice of chess, and for a slice of all 3.</p>
<p><img alt="The right bar shows the number of calories if I eat one slice of each pie." src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/fa96c4e43cfb45b871b91caa0265999e/step_3_accumulate.png" style="border-width: 0px; border-style: solid; width: 576px; height: 384px;" /></p>
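<p>If you want to see what those two options are doing to the numbers, the combination of Decreasing Y and Accumulate Y across X boils down to a sort followed by a running total. Here is a small Python sketch; the calorie figures and the function name are made up for illustration and are not the values from my worksheet.</p>

```python
from itertools import accumulate

def accumulate_descending(values):
    """Sort categories by decreasing Y, then accumulate Y across X."""
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=True)
    running = accumulate(y for _, y in ordered)  # running total, left to right
    return [(label, total) for (label, _), total in zip(ordered, running)]

# Hypothetical calories per slice
calories = {"Pumpkin": 320, "Chess": 450, "Pecan": 500}
for label, total in accumulate_descending(calories):
    print(f"{label:<8} {total}")
```

<p>The last row is the every-pie total, which is the number the right-most bar displays.</p>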
Edit the fill patterns
<p>When you’re making a graph about something like pies, it’s often helpful to use colorful bars that represent the categories in the data. In this case, all you have to do is follow these steps:</p>
<ol>
<li>Click the bars in the graph once to select all of them.</li>
<li>Click one of the bars in the graph once to select only one bar.</li>
<li>Double-click the selected bar to edit the bar.</li>
<li>In <strong>Fill Pattern</strong>, select <strong>Custom</strong>.</li>
<li>From <strong>Background color</strong>, select the color that represents your category. Click <strong>OK</strong>.</li>
</ol>
<p>For example, we could make the pecan bar “chestnut,” the chess bar “gold,” and the pumpkin bar “orange.”</p>
<p><img alt="Colors of the bars are the colors of the pies." src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/8b4d20a4aff1a324e61f847949df3d89/step_4_coloring.png" style="border-width: 0px; border-style: solid; width: 576px; height: 384px;" /></p>
<p>It’s generally best to leave this step for last, because some other editing steps, like changing the order, can change the bar colors.</p>
Wrap up
<p>Very often, editing a graph so that it presents the message that you want is easier once you’re able to see the graph. That makes it wonderful that it’s so easy to edit a graph after you’ve already made it in Minitab. To see even more about what you can do with different types of graphs, check out <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/graph-options/">the list of graph options</a>. And have a Happy Thanksgiving where you are!</p>
Fun Statistics
Wed, 26 Nov 2014 15:16:11 +0000
http://blog.minitab.com/blog/statistics-and-quality-improvement/giving-thanks-for-ways-to-edit-a-bar-chart-of-pies
Cody Steele
Lessons in Quality from Guadalajara and Mexico City
http://blog.minitab.com/blog/understanding-statistics-and-its-application/lessons-in-quality-from-guadalajara-and-mexico-city
<p><img alt="View of Mexico City" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8e5ec9217bc8fbc2ca7a6784a1efcdfa/mexico_df_400w.jpg" style="border-width: 1px; border-style: solid; margin: 10px 15px; float: right; width: 400px; height: 235px;" />Last week, thanks to the collective effort from many people, we held very successful events in Guadalajara and Mexico City, which gave us a unique opportunity to meet with over 300 Spanish-speaking Minitab users. They represented many different industries, including automotive, textile, pharmaceutical, medical devices, oil and gas, electronics, and mining, as well as academic institutions and consultants.</p>
<p>As I listened to my peers Jose Padilla and <a href="http://blog.minitab.com/blog/marilyn-wheatleys-blog">Marilyn Wheatley</a> deliver their presentations, it was interesting to see people's reactions as they learned more about our products and services. Several attendees were particularly pleased to learn more about Minitab's ease-of-use and <a href="http://www.minitab.com/products/minitab/assistant/">step-by-step help with analysis</a> offered by the Assistant menu. I saw others react to demonstrations of Minitab's comprehensive Help system, the use of executables for automation purposes, and several of the tips and tricks discussed throughout our presentations.</p>
<p>We also had multiple conversations on Minitab's flexible licensing options. Several attendees who spend a lot of time on the road were particularly glad to learn about our <a href="http://support.minitab.com/installation/frequently-asked-questions/license-fulfillment/borrow-a-license-of-minitab-companion/">borrowing functionality</a>, which lets you “check out” a license so you can use Minitab software without accessing your organization’s license server.</p>
Acceptance Sampling Plans
<p>There were plenty of technical discussions as well. One interesting question came from a user who asked how Minitab's Acceptance Sampling Plans compare to the <a href="http://asq.org/knowledge-center/ANSI_ASQZ1_4-2008/index.html">ANSI Z1.4</a> standard (a.k.a. MIL-STD 105E). The short answer is that the tables provided by the ANSI Z1.4 are for a specific AQL (Acceptable Quality Level), while implicitly assuming a certain RQL (Rejectable Quality Level) based solely on the lot size. The ANSI Z1.4 is an AQL-based system, while Minitab's acceptance sampling plans give you the flexibility to create a customized sampling scheme for a specific AQL, RQL, or lot size using either the binomial or the hypergeometric distribution.</p>
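<p>To give a feel for what "a customized sampling scheme for a specific AQL and RQL" means, here is a minimal Python sketch that searches for the smallest binomial attributes plan meeting both risk targets. It is only an illustration and is not Minitab's algorithm: the function names, the producer's risk of 0.05, and the consumer's risk of 0.10 are assumptions, and the hypergeometric (finite lot) case is not covered.</p>

```python
from math import comb

def accept_prob(n, c, p):
    """Probability of accepting a lot: at most c defectives in a sample of n."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))

def find_plan(aql, rql, alpha=0.05, beta=0.10, max_n=2000):
    """Smallest sample size n (with acceptance number c) meeting both risks."""
    for n in range(1, max_n + 1):
        for c in range(n + 1):
            if accept_prob(n, c, rql) > beta:
                break  # a larger c only makes acceptance at the RQL more likely
            if accept_prob(n, c, aql) >= 1 - alpha:
                return n, c
    return None  # no plan found within max_n

# Hypothetical quality levels: AQL of 1% defective, RQL of 5% defective
n, c = find_plan(aql=0.01, rql=0.05)
print(f"Sample {n} units; accept the lot if {c} or fewer are defective.")
print(f"P(accept at AQL) = {accept_prob(n, c, 0.01):.3f}")
print(f"P(accept at RQL) = {accept_prob(n, c, 0.05):.3f}")
```

<p>Moving the RQL closer to the AQL makes the search return a much larger sample size, which is the trade-off an AQL-only table keeps out of view.</p>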
Destructive Testing and Gage R&R
<p>Other users had questions about Gage R&R and destructive testing. Practitioners commonly assess a destructive test using Nested Gage R&R; however, this is not always necessary. The main problem with destructive testing is that every part tested is destroyed and thus can only be measured by a single operator. Since the purpose of this type of analysis is to measure the repeatability and reproducibility of the measurement system, one must identify parts that are as homogeneous as possible. Typically, instead of 10 parts, practitioners may use multiple parts from each of 10 batches. If the within-batch variation is small enough, then the parts from each batch can be considered to be "the same" and thus the readings measured by all the operators can be used to produce repeatability and reproducibility measures. The main trick is to have homogeneous units or batches that can give you enough samples to be tested by all operators for all replicates. If this is the case, you can analyze a destructive test with crossed gage R&R.</p>
Control Charts and Subgroup Size
<p>We also had an interesting discussion about the sensitivity of Shewhart <a href="http://blog.minitab.com/blog/understanding-statistics/control-chart-tutorials-and-examples">control charts</a> to the subgroup size. Specifically, one of the attendees asked our recommendation for subgroup size: 4, or 5? </p>
<p>The answer to this intriguing question requires an understanding of the reason why subgroups are recommended. Control charts have limits that are constructed so that if the process is stable, the probability of observing points out of these control limits is very small; this probability is typically referred to as the false alarm rate and it is usually set at 0.0027. This calculation assumes the process is normally distributed, so if we were plotting the individual data as in an Individuals chart, the control limits would be effective to detect an out-of-control situation only if the data came from a normal distribution. To reduce the dependence on normality, Shewhart suggested collecting the data in subgroups, because if we plot the means instead of the individual data, the control limits become less and less sensitive to normality as the subgroup size increases. This is a result of the Central Limit Theorem (CLT), which states that, regardless of the underlying distribution of the data, if we take independent samples and compute the average (or sum) of the observations in each sample, the distribution of these sample means converges to a normal distribution as the sample size increases.</p>
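<p>The 0.0027 figure is simply the probability that a normally distributed point falls more than three standard deviations from the center line. A quick check in Python, using only the standard library (the variable name is mine):</p>

```python
from statistics import NormalDist

# Two-sided tail probability beyond +/- 3 standard deviations
false_alarm_rate = 2 * (1 - NormalDist().cdf(3))
print(f"{false_alarm_rate:.4f}")  # 0.0027
```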
<p>So going back to the original question, what is the recommended subgroup size for building control charts? The answer depends on how skewed the underlying distribution may be. For many distributions, a subgroup size of 5 is sufficient to have the CLT kick in, making our control charts robust to non-normality; however, for extremely skewed distributions like the exponential, the subgroup sizes may need to be much larger than 50. This topic was discussed in a paper by Schilling and Nelson titled "<a href="http://asq.org/qic/display-item/?item=5238">The Effect of Non-normality on the Control Limits of Xbar Charts</a>," published in the Journal of Quality Technology (JQT) back in 1976.</p>
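<p>To see why the exponential distribution is such a hard case, the sketch below computes the false alarm rate of an Xbar chart exactly rather than by simulation. It rests on several assumptions that are mine, not the paper's: the data are Exponential with mean 1, the process mean and standard deviation are known (so the limits are the mean plus or minus 3 sigma divided by the square root of n, not estimated from data), and the function names are hypothetical. It uses the fact that a sum of n such observations follows an Erlang distribution.</p>

```python
import math


def erlang_sf(k, x):
    """P(S > x), where S is the sum of k independent Exponential(1) values."""
    if x <= 0:
        return 1.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= x / i          # builds x**i / i! step by step
        total += term
    return math.exp(-x) * total


def exponential_false_alarm_rate(n):
    """False alarm rate of 3-sigma Xbar limits for Exponential(1) data
    with subgroup size n, assuming the mean and sigma are known."""
    mu = sigma = 1.0
    half_width = 3 * sigma / math.sqrt(n)
    ucl, lcl = mu + half_width, mu - half_width
    above = erlang_sf(n, n * ucl)          # subgroup mean above the UCL
    below = 1.0 - erlang_sf(n, n * lcl)    # zero whenever the LCL is not positive
    return above + below


for n in (1, 5, 50, 200):
    print(n, round(exponential_false_alarm_rate(n), 4))
```

<p>With a subgroup size of 5, the rate under these assumptions is roughly 0.009, more than three times the nominal 0.0027, and it moves toward the nominal value only as the subgroups get much larger.</p>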
Analyzing Variability
<p>We also had a great discussion about modeling variability in a process. One of the attendees, working for McDonald's, was looking for statistical methods for reducing the variation of the weight of apple slices. An apple is cut into 10 slices, and the goal was to minimize the variation in weight so that exactly four slices could be placed in each bag without further rework. This gave me the opportunity to demonstrate how to use the <a href="http://blog.minitab.com/blog/adventures-in-statistics/assessing-variability-for-quality-improvement">Analyze Variability</a> command in Minitab, which happens to be one of the topics we cover in our <a href="http://www.minitab.com/training/courses/#doe-in-practice-manufacturing">DOE in Practice</a> course.</p>
We Love Your Questions
<p>For me and my fellow trainers, there’s nothing better than talking with people who are using Minitab software to solve problems. Sometimes we’re able to provide a quick, helpful answer. Sometimes a question provokes a great discussion about some quality challenge we all have in common. And sometimes a question will lead to a great idea that we’re able to share with our developers and engineers to make our software better. </p>
<p>If you have a question about Minitab, statistics, or quality improvement, please feel free to comment here. And if you use Minitab software, you can always contact our <a href="http://www.minitab.com/support/">customer support</a> team for direct assistance from specialists in IT, statistics, and quality improvement.</p>
Quality ImprovementStatisticsStatistics HelpWed, 19 Nov 2014 13:57:02 +0000http://blog.minitab.com/blog/understanding-statistics-and-its-application/lessons-in-quality-from-guadalajara-and-mexico-cityEduardo SantiagoWhat to Do When Your Data's a Mess, part 3
http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-datas-a-mess2c-part-3
<p>Everyone who analyzes data regularly has the experience of getting a worksheet that just isn't ready to use. Previously I wrote about tools you can use to <a href="http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-data-is-a-mess-part-1">clean up and eliminate clutter in your data</a> and <a href="http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-datas-a-mess2c-part-2">reorganize your data</a>. </p>
<p><span style="line-height: 1.6;">In this post, I'm going to highlight tools that help you get the most out of messy data by altering its characteristics.</span></p>
Know Your Options
<p>Many problems with data don't become obvious until you begin to analyze it. A shortcut or abbreviation that seemed to make sense while the data was being collected, for instance, might turn out to be a time-waster in the end. What if abbreviated values in the data set only make sense to the person who collected it? Or a column of numeric data accidentally gets coded as text? You can solve those problems quickly with <a href="http://www.minitab.com/products/minitab">statistical software</a> packages.</p>
Change the Type of Data You Have
<p>Here's an instance where a data entry error resulted in a column of numbers being incorrectly classified as text data. This will severely limit the types of analysis that can be performed using the data.</p>
<p><img alt="misclassified data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c45b427d3e5e2b5eac4a505ed5c3b24f/misclassified_data.png" style="width: 200px; height: 156px;" /></p>
<p>To fix this, select <strong>Data > Change Data Type</strong> and use the dialog box to choose the column you want to change.</p>
<p><img alt="change data type menu" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/46ece127300500409098383a2e476a9b/text_to_numeric_data.png" style="width: 376px; height: 175px;" /></p>
<p>One click later, and the errant text data has been converted to the desired numeric format:</p>
<p><img alt="numeric data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f1b9df0211f9085e577a41b0e3661b45/numeric_data.png" style="width: 200px; height: 156px;" /></p>
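<p>If you ever need to do the same kind of cleanup outside Minitab, the idea can be sketched in a few lines of Python. This is only an illustration of the concept, not what the Change Data Type command does internally; the function name and the sample values are made up.</p>

```python
def text_to_numeric(values):
    """Convert text entries to floats; entries that are blank or
    cannot be read as numbers become None (a missing value)."""
    converted = []
    for text in values:
        try:
            converted.append(float(text.strip()))
        except ValueError:
            converted.append(None)
    return converted


text_column = ["12.5", " 7", "n/a", "3.25"]   # hypothetical column stored as text
numeric_column = text_to_numeric(text_column)
print(numeric_column)
```

<p>Marking unreadable entries as missing, rather than stopping at the first one, keeps the rows aligned so you can track down the bad entries afterward.</p>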
Make Data More Meaningful by Coding It
<p>When this company collected data on the performance of its different functions across all its locations, it used numbers to represent both locations and units. </p>
<p><img alt="uncoded data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d22a57fe9e9e398bd948e86c0adafe34/uncoded_data.png" style="width: 135px; height: 158px;" /></p>
<p>That may have been a convenient way to record the data, but unless you've memorized what each set of numbers stands for, interpreting the results of your analysis will be a confusing chore. You can make the results easy to understand and communicate by coding the data. </p>
<p>In this case, we select <strong>Data > Code > Numeric to Text...</strong></p>
<p><img alt="code data menu" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c75e46cc190497fd41b0e6736518c0fe/code_data_menu.png" style="width: 384px; height: 255px;" /></p>
<p>And we complete the dialog box as follows, telling the software to replace the numbers with more meaningful information, like the town each facility is located in. </p>
<p><img alt="Code data dialog box" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/cd75c14324187806b8f3a74a3b8996b4/code_data_dialog.png" style="width: 400px; height: 345px;" /></p>
<p>Now you have data columns that can be understood by anyone. When you create graphs and figures, they will be clearly labelled. </p>
<p><img alt="Coded data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/7ff81bdb08170d6d8a4e8547623cf557/coded_data.png" style="width: 161px; height: 200px;" /></p>
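<p>For readers who think in code, coding a column amounts to looking up each value in a table of replacements. Here is a small Python sketch of that idea; the town names and the function name are invented for the example and do not come from the worksheet shown above.</p>

```python
# Hypothetical lookup table: numeric location code -> town name
LOCATION_NAMES = {1: "Springfield", 2: "Riverton", 3: "Lakeside"}


def code_values(values, mapping):
    """Replace each value with its label; values that have no label
    are kept, converted to text, so nothing is silently dropped."""
    return [mapping.get(value, str(value)) for value in values]


location_codes = [1, 3, 2, 4]   # 4 has no entry in the lookup table
location_labels = code_values(location_codes, LOCATION_NAMES)
print(location_labels)
```

<p>Keeping unmatched codes visible makes it obvious when the lookup table is incomplete.</p>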
Got the Time?
<p>Dates and times can be very important in looking at performance data and other indicators that might have a cyclical or time-sensitive effect. But the way the date is recorded in your data sheet might not be exactly what you need. </p>
<p>For example, if you wanted to see if the day of the week had an influence on the activities in certain divisions of your company, a list of dates in the MM/DD/YYYY format won't be very helpful. </p>
<p><img alt="date column" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f5b0dd178afbc0352f8dc2d9378e887b/date_column.png" style="width: 240px; height: 223px;" /></p>
<p>You can use <strong>Data > Date/Time > Extract to Text... </strong>to identify the day of the week for each date.</p>
<p><img alt="extract-date-to-text" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/7e6f7e8a87ee8291b9c6d51507092c19/extract_date_to_text.png" style="width: 351px; height: 132px;" /></p>
<p>Now you have a column that lists the day of the week, and you can easily use it in your analysis. </p>
<p><img alt="day column" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dede93c9621917a0cfb54beef121d4e2/day_column.png" style="width: 249px; height: 205px;" /></p>
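<p>The same extraction can be sketched in Python for dates stored as MM/DD/YYYY text. This is an illustration only, with a made-up function name and made-up dates, and it spells out the day names itself so the result does not depend on the computer's language settings.</p>

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def day_of_week(date_text):
    """Return the weekday name for a date written as MM/DD/YYYY."""
    parsed = datetime.strptime(date_text, "%m/%d/%Y")
    return DAY_NAMES[parsed.weekday()]   # weekday() counts Monday as 0


dates = ["11/17/2014", "11/19/2014"]     # hypothetical date column
days = [day_of_week(d) for d in dates]
print(days)
```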
Manipulating for Meaning
<p>These tools are commonly seen as a way to correct data-entry errors, but as we've seen, you can use them to make your data sets more meaningful and easier to work with.</p>
<p>There are many other tools available in Minitab's Data menu, including an array of options for arranging, combining, dividing, fine-tuning, rounding, and otherwise massaging your data to make it easier to use. Next time you've got a column of data that isn't quite what you need, try using the Data menu to get it into shape.</p>
Data AnalysisStatisticsStatsMon, 17 Nov 2014 13:00:00 +0000http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-datas-a-mess2c-part-3Eston Martz