Data Analysis Software | Minitab
Blog posts and articles with tips for using statistical software to analyze data for quality improvement.
http://blog.minitab.com/blog/data-analysis-software/rss
Sun, 28 May 2017 02:57:32 +0000
FeedCreator 1.7.3
Reducing the Phone Bill with Statistical Analysis
http://blog.minitab.com/blog/understanding-statistics/reducing-the-phone-bill-with-statistical-analysis
<p>One of the most memorable presentations at the inaugural Minitab Insights conference reminded me that data analysis and quality improvement methods aren't only useful in our work and businesses: they can make our home life better, too. </p>
<p><img alt="you won't believe how cheap my phone bill is now! " src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8bd65d370741a7670c0c17d88ea157d2/phone_kid.jpg" style="width: 291px; height: 225px; float: right; margin: 10px 15px;" />The presenter, a continuous improvement training program manager at an aviation company in the midwestern United States, told attendees how he used Minitab Statistical Software, and some simple quality improvement tools, to reduce his phone bill.</p>
<p>He took the audience back to 2003, when his family first obtained their cell phones. For a few months, everything was fine. Then the April bill arrived, and it was more than they expected. The family had used too many minutes. </p>
<p>The same thing happened again in May. In June, the family went over the number of minutes allocated in their phone plan again, for the third month in a row. Something had to change!</p>
Defining the Problem
<p>His wife summed up the problem this way: "There is a problem with our cell phone plan, because the current minutes are not enough for the family members over the past three months." </p>
<p>He wasn't sure that "too few minutes" was the real problem. But instead of arguing, he applied his quality improvement training to find common ground. He and his wife agreed that the previous three months' bills were too high, and that the family had gone over the plan minutes for an unknown reason. Based on their areas of agreement, they revised the initial problem statement: </p>
<p style="margin-left: 40px;"><em>There is a problem with our cell phone usage, and this is known because the minutes are over the plan for the past 3 months, leading to a strain on the family budget.</em></p>
<p>They further agreed that before taking further action—like switching to a costlier plan with more minutes—they needed to identify the root cause of the overage. </p>
Using Data to Find the Root Cause(s)
<img alt="pie chart of phone usage" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a60a867c650a9998e5abcad46b6f68c0/phone_usage_1.png" style="width: 200px; height: 250px; margin: 10px 15px; float: right;" />
<div>
<p>At this point, he downloaded the family's phone logs from their cell phone provider and began using <a href="http://www.minitab.com/products/minitab/">Minitab Statistical Software</a> to analyze the data. First, he used a simple pie chart to look at who was using the most minutes. Since he also had a work-provided cell phone, it wasn't surprising to see that his wife used 4 minutes for each minute of the family plan he used. </p>
<p>Since his wife used 75% of the family's minutes, he looked more closely for patterns and insights in her call data. He created time series plots of her daily and individual call minutes, and created I-MR and Xbar-S charts to assess the stability of her calling process over time. </p>
<p style="margin-left: 40px;"><img alt="I-MR chart of daily phone minutes" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1c1bba795b4182fbabcde66f1e0623bf/phone_usage_2_i_mr.png" style="width: 500px; height: 333px; border-width: 1px; border-style: solid;" /></p>
<p style="margin-left: 40px;"><img alt="Xbar-S Chart of Daily Minutes Per Week" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c525f6527170e396bfc1f513794cd2b7/phone_usage_3_xbar_s.png" style="width: 500px; height: 334px; border-width: 1px; border-style: solid;" /></p>
<p>He also subgrouped calls by day of the week and displayed them in a boxplot. </p>
<p style="margin-left: 40px;"><img alt="Boxplot of daily minutes used" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/714c6ead49902f9f657ae36b4a33b90b/phone_usage_4_boxplot.png" style="width: 500px; height: 332px; border-width: 1px; border-style: solid;" /></p>
<p>These analyses revealed that daily minute usage did contain some "<a href="http://blog.minitab.com/blog/understanding-statistics/control-charts-show-you-variation-that-matters">special cause variation</a>," shown in the I-MR chart. They also showed that, compared to other days of the week, Thursdays had a higher average and greater variance in daily minutes. </p>
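<p>For readers who want to see what sits behind an I-MR chart, here is a minimal sketch in Python. It is not the presenter's analysis: the daily minutes are made up for illustration, the function names are hypothetical, and it only checks for points outside the control limits, whereas statistical software typically offers additional tests for special causes. The constants 2.66 and 3.267 are the standard chart factors for a moving range of two observations.</p>

```python
from statistics import mean


def imr_limits(values):
    """Return center lines and control limits for an I-MR chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    x_bar = mean(values)
    mr_bar = mean(moving_ranges)
    return {
        "I_center": x_bar,
        "I_UCL": x_bar + 2.66 * mr_bar,  # 2.66 = 3 / d2, with d2 = 1.128
        "I_LCL": x_bar - 2.66 * mr_bar,
        "MR_center": mr_bar,
        "MR_UCL": 3.267 * mr_bar,        # D4 factor for subgroups of size 2
    }


def out_of_control(values):
    """Return the indexes of points outside the individuals-chart limits."""
    limits = imr_limits(values)
    return [
        i for i, v in enumerate(values)
        if v > limits["I_UCL"] or v < limits["I_LCL"]
    ]


# Hypothetical daily minutes (NOT the family's real data).
daily_minutes = [20, 22, 19, 21, 20, 23, 18, 21, 22, 20, 19, 21, 80, 20, 22]
limits = imr_limits(daily_minutes)
flagged = out_of_control(daily_minutes)
print(limits)
print("Days flagged as possible special causes:", flagged)
```

<p>A point beyond the limits is only a signal that something unusual may have happened that day. As the rest of this story shows, finding out what happened still takes a conversation.</p>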
<p>Creating a Pareto chart of his wife's phone calls provided further insight. </p>
<p style="margin-left: 40px;"><img alt="Pareto chart of number called" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/cdf58a1aa4f7d633243d64c989a1306b/phone_usage_5_pareto.png" style="width: 500px; height: 334px; border-width: 1px; border-style: solid;" /></p>
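<p>The arithmetic behind a Pareto chart is simple: count each category, sort from largest to smallest, and keep a running cumulative percentage. The sketch below assumes the chart is built from a count of calls per number. The call log and the labels are invented for illustration, and <code>pareto_table</code> is a hypothetical helper, not a Minitab function.</p>

```python
from collections import Counter


def pareto_table(categories):
    """Return (label, count, cumulative percent) rows, largest count first."""
    counts = Counter(categories).most_common()
    total = sum(count for _, count in counts)
    rows = []
    running = 0
    for label, count in counts:
        running += count
        rows.append((label, count, round(100 * running / total, 1)))
    return rows


# Hypothetical call log: one entry per call, labeled by who was called.
calls = (
    ["best friend"] * 12
    + ["mother's land line"] * 8
    + ["school"] * 3
    + ["other"] * 2
)

rows = pareto_table(calls)
for label, count, cumulative in rows:
    print(f"{label:20s} {count:3d} {cumulative:6.1f}%")
```

<p>Reading the table from the top down shows which few numbers account for most of the calls, which is where the family chose to look first.</p>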
<p>The Minitab analysis helped them see where and when most of their minutes were going. But as experienced professionals know, sometimes the numbers alone don't tell the entire story. So the family discussed the results to put those numbers in context and to see where some improvements might be possible.</p>
<p>The most commonly called number belonged to his wife's best friend, who used a different cell phone provider than the family did. This explained the Thursday calls, because every weekend his wife and her friend took turns shopping garage sales on opposite sides of town to get clothes for their children. They did their coordination on Thursday evenings.</p>
<p>Calls to her girlfriend could have been free if they just used the same provider, but the presenter's family didn't want to change, and it wasn't fair to expect the other family to change. But while a few calls to her girlfriend may have been costing a few dollars, the family was saving many more dollars on clothes for the kids. </p>
<p>Given the complete context, this was a situation where the calls were paying for themselves, so the family moved on to the next most frequently called number: the presenter's mother's land line.</p>
<p>His wife spoke very frequently with his mother to arrange childcare and other matters. His mother had a cell phone from the same provider, so calls to the cell phone should be free. Why, then, was his wife calling the land line? "Because," his wife informed him, "your mother never answers her cell phone." </p>
Addressing the Root Cause
<p>The next morning, the presenter visited his mother and eventually he steered the conversation to her cell phone. "I just love using the cell phone on weekends," his mother told him. "I use it to call my old friends during breakfast, and since it's the weekend the minutes are free!" </p>
<p>When he asked how she liked using the cell phone during the week, his mother's face darkened. "I hate using the cell phone during the week," she declared. "The phone rings all the time, but when I answer there's never anyone on the line!" </p>
<p>This seemed strange. To get some more insight, her son worked with her to create a spaghetti diagram that showed her typical movements during the weekday when her cell phone rang. That diagram, shown below, revealed two important things.</p>
<p style="margin-left: 40px;"><img alt="spaghetti diagram" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b3955e2dfdccdfce57bc529e7ba2f1dd/phone_usage_6_spaghetti_diagram.jpg" style="width: 500px; height: 386px;" /></p>
<p>First, it showed that his mother loved watching television during the day. But second, and more important when it came to using the cell phone, his mother needed to get up from her chair, walk into the dining room, and retrieve her cell phone—which she always kept on the dining room table—in order to answer it. </p>
<p>Her cell phone automatically sent callers to voice mail after three rings. But it took his mother longer than three rings to get from her chair to the phone. What's more, since she never learned to use the voice mail ("Son, there is no answering machine connected to this phone!"), his mother almost exclusively used the cell phone to make outgoing calls. </p>
<p>Now that the real root cause underlying this major drain on the family's cell phone minutes was known, a potential solution could be devised and tested. In this case, rather than force his mother to start using voicemail, he came up with an elegant and simple alternative: </p>
<p style="margin-left: 40px;"><strong>Job Instructions for Mom:</strong></p>
<p style="margin-left: 40px;">When receiving call on weekday:</p>
<ul>
<li style="margin-left: 40px;">Go to cell phone.</li>
<li style="margin-left: 40px;">Pick up phone.</li>
<li style="margin-left: 40px;">Press green button twice.</li>
<li style="margin-left: 40px;">Wait for person who called to answer phone.</li>
</ul>
<p>After a few test calls to make sure his mother was comfortable with the new protocol, they tested the new system for a month. </p>
The Results
<p>To recap, solving this problem required four steps. First, the presenter and his wife needed to clearly define the problem. Second, they used statistical software to get insight into the problem from the available data. From there, a spaghetti chart and a set of simple job instructions provided a very viable solution to test. And the outcome? </p>
<p style="margin-left: 40px;"><img alt="Bar Chart of Phone Bills" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/262a6433521414d59edc6b86fd38efcd/phone_usage_7_bar_chart.png" style="width: 500px; height: 333px; border-width: 1px; border-style: solid;" /></p>
<p>As the bar graph shows, July's minutes were well within their plan's allotment. In that month's Pareto chart, what had been the second-largest bar dropped to near zero. His mother enjoyed her cell phone much more, and his wife was able to arrange child care with just one call. </p>
<p>And to this day, when the presenter wants to talk to his mother, he: </p>
<p style="margin-left: 40px;">1. Calls her cell phone.<br />
2. Lets it ring 3 times.<br />
3. Hangs up.<br />
4. Waits for her return call.</p>
<p>Happily, this solution turned out to be very sustainable, as the monthly minutes remained within the family's allowance and budget for quite some time...and then his daughter got a cell phone, and texting issues began.</p>
<p>Where could you apply data analysis to get more insight into the challenges you face? </p>
</div>
Control Charts, Fun Statistics, Quality Improvement
Wed, 10 May 2017 13:04:00 +0000
http://blog.minitab.com/blog/understanding-statistics/reducing-the-phone-bill-with-statistical-analysis
Eston Martz
For Want of an FMEA, the Empire Fell
http://blog.minitab.com/blog/statistics-in-the-field/for-want-of-an-fmea-the-empire-fell
<p><img alt="Don't worry about it, we'll be fine without an FMEA!" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/d7ff51df-2032-4f24-b2b4-eb0eb76d0b01/Image/14c70b8de916788ea0e994086aaa793e/tie_pilot.jpg" style="border-width: 1px; border-style: solid; margin: 10px 15px; float: right; width: 300px; height: 225px;" /><em>by Matthew Barsalou, guest blogger</em></p>
<p align="center"><em>For want of a nail the shoe was lost,</em><br />
<em>For want of a shoe the horse was lost,</em><br />
<em>For want of a horse the rider was lost</em><br />
<em>For want of a rider the battle was lost</em><br />
<em>For want of a battle the kingdom was lost</em><br />
<em>And all for the want of a horseshoe nail. (Lowe, 1980, 50)</em></p>
<p>According to the old nursery rhyme, "For Want of a Nail," an entire kingdom was lost because of the lack of one nail for a horseshoe. The same could be said for the Galactic Empire in Star Wars. The Empire would not have fallen if the technicians who created the first Death Star had done a proper <a href="http://blog.minitab.com/blog/understanding-statistics/fmea-a-much-better-alternative-to-fml">Failure Mode and Effects Analysis (FMEA)</a>.</p>
<p>A group of rebels in <em>Star Wars, Episode IV: A New Hope</em> stole the plans to the Death Star and found a critical weakness that led to the destruction of the entire station. A simple thermal exhaust port was connected to a reactor in a way that permitted an explosion in the exhaust port to start a chain reaction that blew up the entire station. This weakness was known, but considered insignificant because it could only be exploited by small space fighters and the exhaust port was protected by turbolasers and TIE fighters. It was thought that nothing could penetrate the defenses; however, a group of Rebel X-Wing fighters proved that this weakness could be exploited. One proton torpedo fired into the thermal exhaust port started a chain reaction that led to the station reactors and destroyed the entire battle station (Lucas, 1976).</p>
Why the Death Star Needed an FMEA
<p>The Death Star was designed by the engineer Bevil Lemelisk under the command of Grand Moff Wilhuff Tarkin, whose doctrine called for a heavily armed mobile battle station carrying more than 1,000,000 Imperial personnel as well as over 7,000 TIE fighters and 11,000 land vehicles (Smith, 1991). It was constructed in orbit around the penal planet Despayre in the Horuz system of the Outer Rim Territories and was intended to be a key element of the Tarkin Doctrine for controlling the Empire. The current estimate for the cost of building a Death Star is $850,000,000,000,000,000 (Rayfield, 2013).</p>
<p>Such an expensive, resource-consuming project should never be attempted without a design FMEA. The loss of the Death Star could have been prevented with just one <a href="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c9903a6d98d9f8596ded9c2b45d817aa/death_star_fmea.png" target="_blank">properly filled-out FMEA</a> during the design phase:</p>
<p style="margin-left: 40px;"><a href="//cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/d7ff51df-2032-4f24-b2b4-eb0eb76d0b01/File/1259907fab015a0c10e09c7c1709fb33/fmea_death_star_full.jpg" target="_blank"><img alt="FMEA Example" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f9d2203d062bc0a09e41ffcb1e5dcb08/death_star_fmea_partial.png" style="width: 500px; height: 510px;" /></a></p>
<p>The Galactic Empire's engineers frequently built redundancy into the systems on the Empire’s capital ships and space stations; unfortunately, the Death Star's systems were all connected to the main reactor to ensure that power would always be available for each individual system. This interconnectedness resulted in thermal exhaust ports that were directly connected to the main reactor.</p>
<p>The designers knew that an explosion in a thermal exhaust port could reach the main reactor and destroy the entire station, but they were overconfident and believed that limited prevention measures--such as turbolaser towers, shielding that could not prevent the penetration of small space fighters, and wings of TIE fighters--could protect the thermal exhaust ports (Smith, 1991). Such thinking is little different than discovering a design flaw that could lead to injury or death, but deciding to depend upon inspection to prevent anything bad from happening. Bevil Lemelisk could not have ignored this design flaw if he had created an FMEA.</p>
Assigning Risk Priority Numbers to an FMEA
<p>An FMEA can be done with a pencil and paper, although Minitab's <a href="http://www.minitab.com/products/companion" title="process improvement software tools">Companion software for executing and reporting on process improvement</a> has a built-in FMEA form that automates calculations, and shares data with process maps and other forms you'll probably need for your project. </p>
<p>An FMEA uses a Risk Priority Number (RPN) to determine when corrective actions must be taken. RPNs range from 1 to 1,000, and lower numbers are better. The RPN is determined by multiplying severity (S) by occurrence (O) and detection (D).</p>
<p style="margin-left: 40px;">RPN = S x O x D</p>
<p>Severity, occurrence and detection are each evaluated and assigned a number between 1 and 10, with lower numbers being better.</p>
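<p>If you prefer to see the calculation as code, here is a minimal Python sketch of the formula above. The function name <code>rpn</code> and the choice to reject anything other than whole-number scores from 1 to 10 are assumptions made for illustration; this is not how any particular FMEA form is implemented.</p>

```python
def rpn(severity, occurrence, detection):
    """Risk Priority Number: RPN = S x O x D, each score from 1 to 10."""
    scores = (
        ("severity", severity),
        ("occurrence", occurrence),
        ("detection", detection),
    )
    for name, score in scores:
        # Assumption: scores are whole numbers on the 1-10 scale.
        if not isinstance(score, int) or not 1 <= score <= 10:
            raise ValueError(f"{name} must be an integer from 1 to 10, got {score!r}")
    return severity * occurrence * detection


print(rpn(1, 1, 1))     # best possible case
print(rpn(10, 10, 10))  # worst possible case
```

<p>Because each score runs from 1 to 10, the product can never fall outside the range of 1 to 1,000.</p>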
Failure Mode and Effects Analysis Example: Death Star Thermal Exhaust Ports
<p>In the case of the Death Star's thermal exhaust ports, the failure mode would be an explosion in the exhaust port and the resulting effect would be a chain reaction that reaches the reactors. The severity would be rated as 10 because an explosion of the reactors would lead to the loss of the station as well as the loss of all the personnel on board. A 10 for severity is sufficient reason to look into a redesign so that a failure, no matter how improbable, does not result in injury or loss of life.</p>
<p style="margin-left: 40px;"><img alt="FMEA Failure Mode Severity Example" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/570cea9af649e7b273aad1c9eb4c169f/death_star_fmea2.png" style="width: 485px; height: 214px;" /></p>
<p>The potential cause of failure on the Death Star would be attack or sabotage; the designers did not consider this likely to happen, so occurrence is a 3. The main control measure was shielding that would only be effective against attack by large ships. This was rated as a 4 because the Empire believed these measures to be effective.</p>
<p style="margin-left: 40px;"><img alt="Potential Causes and Current Controls" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/02f45dd8de6c4dc5e420f378f9f59cc9/death_star_fmea3.png" style="width: 247px; height: 188px;" /></p>
<p>The resulting RPN would be S x O x D = 10 x 3 x 4 = 120. An RPN of 120 should be sufficient reason to take action, but even a lower RPN would require corrective action because of the high severity rating. The Death Star's RPN may even be too low due to the Empire's overconfidence in the current controls. Corrective actions are definitely needed. </p>
<p style="margin-left: 40px;"><img alt="FMEA Risk Priority Number" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b61201bbd687b5a699a59ac53175214e/death_star_fmea3a.png" style="width: 650px; height: 179px;" /></p>
<p>Corrective actions are easier and cheaper to implement early in the design phase, particularly if the problem is detected before assembly is started. The original Death Star plans could have been modified with little effort before construction started. The shielding could have been improved to prevent any penetration and, more importantly, the interlinks between the systems could have been removed so that a failure of one system, such as an explosion in the thermal exhaust port, does not destroy the entire Death Star. The RPN needs to be reevaluated after corrective actions are implemented and verified; the new Death Star RPN would be 5 x 3 x 2 = 30.</p>
<p style="margin-left: 40px;"><img alt="FMEA Revised Metrics" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3da18764bfce8f650477c8fa5a733982/death_star_fmea4.png" style="width: 650px; height: 204px;" /></p>
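<p>The before-and-after comparison can be sketched in a few lines of Python. The severity, occurrence, and detection scores come from this post; everything else is an assumption made for illustration. In particular, the class name <code>FmeaRow</code>, the RPN action threshold of 100, and the severity threshold of 9 are hypothetical, because organizations set their own action criteria.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FmeaRow:
    """One line of an FMEA form (hypothetical structure for illustration)."""
    label: str
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self):
        # RPN = S x O x D
        return self.severity * self.occurrence * self.detection


def needs_action(row, rpn_threshold=100, severity_threshold=9):
    """Flag a row when the RPN is high OR the severity alone is high.

    Both thresholds are assumptions; they are not taken from the post.
    """
    return row.rpn >= rpn_threshold or row.severity >= severity_threshold


before = FmeaRow("original exhaust port design", 10, 3, 4)
after = FmeaRow("after corrective actions", 5, 3, 2)

for row in (before, after):
    print(f"{row.label}: RPN = {row.rpn}, action needed: {needs_action(row)}")
```

<p>Checking severity separately reflects the point made earlier: a failure that could cost lives deserves attention even when a low occurrence or detection score keeps the RPN small.</p>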
<p>Of course, doing the FMEA would have had more important impacts than just achieving a low number on a piece of paper. Had this step been taken, the Empire could have continued to implement the Tarkin Doctrine, and the Universe would be a much different place today. </p>
Do You Need to Do an FMEA?
<p>A simple truth is demonstrated by the missing nail and the kingdom, as well as the lack of an FMEA and the Death Star: when designing a new product, whether it is an oil rig, a kitchen appliance, or a Death Star, you'll avoid many future problems by performing an FMEA early in the design phase.</p>
<div>
<div><strong>About the Guest Blogger: </strong></div>
<div><em>Matthew Barsalou is an engineering quality expert in BorgWarner Turbo Systems Engineering GmbH’s Global Engineering Excellence department. He has previously worked as a quality manager at an automotive component supplier and as a contract quality engineer at Ford in Germany and Belgium. He possesses a bachelor of science in industrial sciences, a master of liberal studies and a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany.</em></div>
<div><em> </em></div>
</div>
<p><em><strong>Would you like to publish a guest post on the Minitab Blog? Contact <a href="mailto:publicrelations@minitab.com?subject=I%20Would%20Like%20to%20Be%20a%20Guest%20Blogger">publicrelations@minitab.com</a>. </strong></em></p>
<p> </p>
<p><strong>References</strong></p>
<p>Lucas, George. <em>Star Wars, Episode IV: A New Hope</em>. New York: Del Rey, 1976. <a href="http://www.amazon.com/Star-Wars-Episode-IV-Hope/dp/0345341465/ref=sr_1_2?ie=UTF8&qid=1358180992&sr=8-2&keywords=Star+Wars%2C+Episode+IV%3A+A+New+Hope" target="_blank">http://www.amazon.com/Star-Wars-Episode-IV-Hope/dp/0345341465/ref=sr_1_2?ie=UTF8&qid=1358180992&sr=8-2&keywords=Star+Wars%2C+Episode+IV%3A+A+New+Hope</a></p>
<p> Opie, Iona and Opie, Peter. ed. <em>Oxford Dictionary of Nursery Rhymes</em>. Oxford, 1951, 324. Quoted in Lowe, E.J. “For Want of a Nail.” <em>Analysis</em> 40 (January 1980), 50-52. <a href="http://www.jstor.org/stable/3327327">http://www.jstor.org/stable/3327327</a></p>
<p>Rayfield, Jillian. “White House Rejects 'Death Star' Petition.” <em>Salon,</em> January 13, 2013. Accessed January 14, 2013 from <a href="http://www.salon.com/2013/01/13/white_house_rejects_death_star_petition/" target="_blank">http://www.salon.com/2013/01/13/white_house_rejects_death_star_petition/</a></p>
<p>Smith, Bill. ed. <em>Star Wars: Death Star Technical Companion.</em> Honesdale, PA: West End Games, 1991. <a href="http://www.amazon.com/Star-Wars-Death-Technical-Companion/dp/0874311209/ref=sr_1_1?s=books&ie=UTF8&qid=1358181033&sr=1-1&keywords=Star+Wars%3A+Death+Star+Technical+Companion" target="_blank">http://www.amazon.com/Star-Wars-Death-Technical-Companion/dp/0874311209/ref=sr_1_1?s=books&ie=UTF8&qid=1358181033&sr=1-1&keywords=Star+Wars%3A+Death+Star+Technical+Companion</a>.</p>
Project Tools, Quality Improvement
Thu, 04 May 2017 12:00:00 +0000
http://blog.minitab.com/blog/statistics-in-the-field/for-want-of-an-fmea-the-empire-fell
Guest Blogger
Understanding Qualitative, Quantitative, Attribute, Discrete, and Continuous Data Types
http://blog.minitab.com/blog/understanding-statistics/understanding-qualitative-quantitative-attribute-discrete-and-continuous-data-types
<p>"Data! Data! Data! I can't make bricks without clay."<br />
— Sherlock Holmes, in Arthur Conan Doyle's <em>The Adventure of the Copper Beeches</em></p>
<p>Whether you're the world's greatest detective trying to crack a case or a person trying to solve a problem at work, you're going to need information. Facts. <em>Data</em>, as Sherlock Holmes says. </p>
<p><img alt="jujubes" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/96d7c87addccc11b6072d6dfa38d0039/jujubes.jpg" style="line-height: 20.7999992370605px; margin: 10px 15px; float: right; width: 200px; height: 200px;" /></p>
<p>But not all data is created equal, especially if you plan to analyze it as part of a quality improvement project.</p>
<p>If you're using Minitab Statistical Software, you can access the Assistant to <a href="http://www.minitab.com/products/minitab/assistant">guide you through your analysis step-by-step</a>, and help identify the type of data you have.</p>
<p>But it's still important to have at least a basic understanding of the different types of data, and the kinds of questions you can use them to answer. </p>
<p>In this post, I'll provide a basic overview of the types of data you're likely to encounter, and we'll use a box of my favorite candy—<a href="http://en.wikipedia.org/wiki/Jujube_(confectionery)" target="_blank">Jujubes</a>—to illustrate how we can gather these different kinds of data, and what types of analysis we might use it for. </p>
The Two Main Flavors of Data: Qualitative and Quantitative
<p>At the highest level, two kinds of data exist: <em><strong>quantitative</strong></em> and <em><strong>qualitative</strong></em>.</p>
<p><strong><em>Quantitative</em> </strong>data deals with numbers and things you can measure objectively: dimensions such as height, width, and length. Temperature and humidity. Prices. Area and volume.</p>
<p><strong><em>Qualitative </em></strong>data deals with characteristics and descriptors that can't be easily measured, but can be observed subjectively—such as smells, tastes, textures, attractiveness, and color. </p>
<p>Broadly speaking, when you measure something and give it a number value, you create quantitative data. When you classify or judge something, you create qualitative data. So far, so good. But this is just the highest level of data: there are also different types of quantitative and qualitative data.</p>
Quantitative Flavors: Continuous Data and Discrete Data
<p>There are two types of quantitative data, which is also referred to as numeric data: <em><strong>continuous </strong></em>and <em><strong>discrete</strong>. </em>As a general rule, <em>counts </em>are discrete and <em>measurements </em>are continuous.</p>
<p><strong><em>Discrete </em></strong>data is a count that can't be made more precise. Typically it involves integers. For instance, the number of children (or adults, or pets) in your family is discrete data, because you are counting whole, indivisible entities: you can't have 2.5 kids, or 1.3 pets.</p>
<p><strong><em>Continuous</em> </strong>data, on the other hand, could be divided and reduced to finer and finer levels. For example, you can measure the height of your kids at progressively more precise scales—meters, centimeters, millimeters, and beyond—so height is continuous data.</p>
<p>If I tally the number of individual Jujubes in a box, that number is a piece of discrete data. </p>
<p style="margin-left: 40px;"><img alt="a count of jujubes is discrete data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f5e3c44269356903cf156c065b10746a/jujubes_count_tally.jpg" style="width: 200px; height: 200px;" /></p>
<p>If I use a scale to measure the weight of each Jujube, or the weight of the entire box, that's continuous data. </p>
<p style="margin-left: 40px;"><span style="line-height: 1.6;"><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d11051162c9e2375e531ac589fd5a20e/jujube_weight_continuous_data.jpg" style="width: 200px; height: 200px;" /></span></p>
<p>Continuous data can be used in many different kinds of <a href="http://blog.minitab.com/blog/understanding-statistics/what-statistical-hypothesis-test-should-i-use">hypothesis tests</a>. For example, to assess the accuracy of the weight printed on the Jujubes box, we could measure 30 boxes and perform a 1-sample t-test. </p>
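<p>To make the 1-sample t-test concrete, here is a rough Python sketch of the test statistic. The 30 box weights and the labeled weight of 150 grams are invented for illustration, and <code>one_sample_t</code> is a hypothetical helper. The sketch stops at the t-statistic and degrees of freedom; turning those into a p-value is left to statistical software or a t table.</p>

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, target):
    """Return (t statistic, degrees of freedom) for a 1-sample t-test."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two measurements")
    # t = (sample mean - target) / (sample standard deviation / sqrt(n))
    standard_error = stdev(sample) / sqrt(n)
    return (mean(sample) - target) / standard_error, n - 1


# Hypothetical weights (grams) of 30 boxes; NOT real measurements.
weights = [
    149.2, 150.8, 151.1, 148.7, 150.3, 149.9, 150.6, 151.4, 149.5, 150.1,
    148.9, 150.7, 151.0, 149.8, 150.2, 150.9, 149.4, 150.5, 151.3, 149.6,
    150.0, 150.4, 149.1, 151.2, 149.7, 150.8, 149.3, 150.6, 150.1, 149.9,
]
labeled_weight = 150.0  # hypothetical weight printed on the box

t_stat, df = one_sample_t(weights, labeled_weight)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

<p>A t-statistic close to zero means the average box weight is close to the printed weight relative to the box-to-box variation.</p>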
<p>Some analyses use continuous and discrete quantitative data at the same time. For instance, we could perform a <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-tutorial-and-examples">regression analysis</a> to see if the weight of Jujube boxes (continuous data) is correlated with the number of Jujubes inside (discrete data). </p>
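<p>As a sketch of that idea, the Python below fits a least-squares line to box weight (continuous) as a function of Jujube count (discrete) and reports the correlation coefficient. The counts and weights are made up, and <code>simple_linear_fit</code> is a hypothetical helper that does far less than a full regression analysis: no p-values, residual plots, or diagnostics.</p>

```python
from math import sqrt
from statistics import mean


def simple_linear_fit(x, y):
    """Return (slope, intercept, r) for the least-squares line y = a + b*x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


# Hypothetical data: Jujubes per box (count) and box weight in grams.
counts = [58, 60, 61, 59, 62, 60, 63, 57]
box_weights = [147.9, 150.2, 151.0, 149.1, 152.4, 149.8, 153.1, 146.8]

slope, intercept, r = simple_linear_fit(counts, box_weights)
print(f"weight = {intercept:.2f} + {slope:.2f} * count, r = {r:.3f}")
```

<p>The slope estimates how much weight each additional Jujube adds to a box, and a value of r near 1 indicates a strong positive linear relationship.</p>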
Qualitative Flavors: Binary Data, Nominal Data, and Ordinal Data
<p>When you classify or categorize something, you create <em>qualitative</em> or <em>attribute</em> data. There are three main kinds of qualitative data.</p>
<p><em><strong>Binary </strong></em>data place things in one of two mutually exclusive categories: right/wrong, true/false, or accept/reject. </p>
<p>Occasionally, I'll get a box of Jujubes that contains a couple of individual pieces that are either too hard or too dry. If I went through the box and classified each piece as "Good" or "Bad," that would be binary data. I could use this kind of data to develop a statistical model to predict how frequently I can expect to get a bad Jujube.</p>
<p>When collecting <em><strong>unordered </strong></em>or <em><strong>nominal </strong></em>data, we assign individual items to named categories that do not have an implicit or natural value or rank. If I went through a box of Jujubes and recorded the color of each in my worksheet, that would be nominal data. </p>
<p style="margin-left: 40px;"><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ce64d648ac395d5c8098985caabc754f/jujubes_sorted_nominal_data.jpg" style="width: 200px; height: 97px;" /></p>
<p>This kind of data can be used in many different ways—for instance, I could use <a href="http://blog.minitab.com/blog/understanding-statistics/chi-square-analysis-of-halloween-and-friday-the-13th-is-there-a-slasher-movie-gender-gap">chi-square analysis</a> to see if there are statistically significant differences in the amounts of each color in a box. </p>
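<p>Here is a small Python sketch of a chi-square goodness-of-fit statistic for the color counts. It assumes the question is whether every color is equally likely, which is only one possible hypothesis. The colors and counts are invented, and <code>chi_square_equal_proportions</code> is a hypothetical helper that returns the statistic and degrees of freedom without computing a p-value.</p>

```python
def chi_square_equal_proportions(observed):
    """Return (chi-square statistic, degrees of freedom).

    Assumption: under the null hypothesis every category is equally likely.
    """
    k = len(observed)
    if k < 2:
        raise ValueError("need at least two categories")
    total = sum(observed.values())
    expected = total / k
    statistic = sum((count - expected) ** 2 / expected for count in observed.values())
    return statistic, k - 1


# Hypothetical color counts from one box; NOT real data.
color_counts = {"red": 16, "green": 11, "yellow": 9, "orange": 13, "violet": 11}

chi2, df = chi_square_equal_proportions(color_counts)
print(f"chi-square = {chi2:.2f} with {df} degrees of freedom")
```

<p>The larger the statistic, the further the observed counts are from an even split across colors.</p>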
<p>We also can have <strong><em>ordered </em></strong>or <em><strong>ordinal </strong></em>data, in which items are assigned to categories that do have some kind of implicit or natural order, such as "Short, Medium, or Tall." Another example is a survey question that asks us to rate an item on a 1 to 10 scale, with 10 being the best. This implies that 10 is better than 9, which is better than 8, and so on. </p>
<p>The use of ordered data is a matter of some debate among statisticians. Everyone agrees it's appropriate for creating bar charts, but beyond that the answer to the question "What should I do with my ordinal data?" is "It depends." Here's a post from another blog that offers an excellent summary of the <a href="http://learnandteachstatistics.wordpress.com/2013/07/08/ordinal/" target="_blank">considerations involved</a>. </p>
Additional Resources about Data and Distributions
<p>For more fun statistics you can do with candy, check out this article (PDF format): <a href="http://www.minitab.com/uploadedFiles/Content/Academic/sweetening_statistics.pdf">Statistical Concepts: What M&M's Can Teach Us.</a> </p>
<p>For a deeper exploration of the probability distributions that apply to different types of data, check out my colleague Jim Frost's posts about <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-and-using-discrete-distributions">understanding and using discrete distributions</a> and <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-identify-the-distribution-of-your-data-using-minitab">how to identify the distribution of your data</a>.</p>
Data AnalysisStatsFri, 28 Apr 2017 13:00:00 +0000http://blog.minitab.com/blog/understanding-statistics/understanding-qualitative-quantitative-attribute-discrete-and-continuous-data-typesEston MartzUnderstanding Monte Carlo Simulation with an Example
http://blog.minitab.com/blog/adventures-in-statistics-2/understanding-monte-carlo-simulation-with-an-example
<p>As someone who has collected and analyzed real data for a living, I find the idea of using simulated data for a Monte Carlo simulation a bit odd. How can you improve a real product with simulated data? In this post, I’ll help you understand the methods behind Monte Carlo simulation and walk you through a simulation example using Companion by Minitab.</p>
<p><img alt="Process capability chart" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/8b31c0befc7c93d3b4ceeea2bc8479e8/main_image.png" style="line-height: 20.7999992370605px; float: right; width: 300px; height: 241px; margin: 10px 15px;" /></p>
<p>Companion by Minitab is a software platform that combines a desktop app for executing quality projects with a web dashboard that makes reporting on your entire quality initiative literally effortless. Among the first-in-class tools in the desktop app is a Monte Carlo simulation tool that makes this method extremely accessible. </p>
What Is Monte Carlo Simulation?
<p>The Monte Carlo method uses repeated random sampling to generate simulated data to use with a mathematical model. This model often comes from a statistical analysis, such as a <a href="http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/doe/basics/what-is-a-designed-experiment/">designed experiment</a> or a <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-tutorial-and-examples">regression analysis</a>.</p>
<p>Suppose you study a process and use statistics to model it like this:</p>
<p style="margin-left: 40px;"><img alt="Regression equation for the process" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/174c81a027515c63241c34903d579ee6/regression_equation.png" style="width: 576px; height: 83px;" /></p>
<p>With this type of linear model, you can enter the process input values into the equation and predict the process output. However, in the real world, the input values won’t be a single value thanks to variability. Unfortunately, this input variability causes variability and defects in the output.</p>
<p>To design a better process, you could collect a mountain of data in order to determine how input variability relates to output variability under a variety of conditions. However, if you understand the typical distribution of the input values and you have an equation that models the process, you can easily generate a vast amount of simulated input values and enter them into the process equation to produce a simulated distribution of the process outputs.</p>
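<p>If you'd like to see the idea outside of any particular tool, here is a minimal Python sketch. The model coefficients, the input names (<code>temperature</code>, <code>pressure</code>), and their means and standard deviations are all hypothetical, and the inputs are assumed to be normally distributed. This is not the regression equation from the example and not how Companion is implemented.</p>

```python
import random
import statistics


# Hypothetical linear process model (illustrative coefficients only).
def process_output(temperature, pressure):
    return 10.0 + 0.5 * temperature - 2.0 * pressure


# Assumed input distributions: (mean, standard deviation) for each input.
INPUTS = {"temperature": (20.0, 1.0), "pressure": (3.0, 0.2)}


def simulate(n_runs, inputs=INPUTS, seed=1):
    """Draw random inputs and return the list of simulated outputs."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    outputs = []
    for _ in range(n_runs):
        draw = {name: rng.gauss(mean, sd) for name, (mean, sd) in inputs.items()}
        outputs.append(process_output(**draw))
    return outputs


outputs = simulate(50_000)
print(f"mean = {statistics.fmean(outputs):.3f}")
print(f"sd   = {statistics.stdev(outputs):.3f}")
```

<p>Each pass draws one random value per input, feeds the values through the equation, and records the result. The collected results approximate the distribution of the process output.</p>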
<p>You can also easily change these input distributions to answer "what if" types of questions. That's what Monte Carlo simulation is all about. In the example we are about to work through, we'll change both the mean and standard deviation of the simulated data to improve the quality of a product.</p>
<p>Today, simulated data is routinely used in situations where resources are limited or gathering real data would be too expensive or impractical.</p>
How Can Monte Carlo Simulation Help You?
<p>With Companion by Minitab, engineers can easily perform a Monte Carlo analysis in order to:</p>
<ul>
<li>Simulate product results while accounting for the variability in the inputs</li>
<li>Optimize process settings</li>
<li>Identify critical-to-quality factors</li>
<li>Find a solution to reduce defects</li>
</ul>
<p>Along the way, Companion interprets simulation results and provides step-by-step guidance to help you find the best possible solution for reducing defects. I'll show you how to accomplish all of this right now!</p>
Step-by-Step Example of Monte Carlo Simulation
<p>A materials engineer for a building products manufacturer is developing a new insulation product. The engineer performed an experiment and used statistics to analyze process factors that could impact the insulating effectiveness of the product. (The data for this DOE is just one of the many data set examples that can be found in <a href="http://support.minitab.com/en-us/datasets/">Minitab’s Data Set Library</a>.) For this Monte Carlo simulation example, we’ll use the regression equation shown above, which describes the statistically significant factors involved in the process.</p>
<p>Let's open Companion by Minitab's desktop app (if you don't already have it, you can <a href="http://www.minitab.com/products/companion/try-it-free/">try Companion free</a> for 30 days). Open or start a new project, then right-click on the project Roadmap™ to insert the Monte Carlo Simulation tool.</p>
<p style="margin-left: 40px;"><img alt="insert monte carlo simulation" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/5c535f39df60b03ecfedee627de78c42/companion_insert_monte_carlo.png" style="width: 350px; height: 299px;" /></p>
<p><strong>Step 1: Define the Process Inputs and Outputs</strong></p>
<p>The first thing we need to do is to define the inputs and the distribution of their values. The process inputs are listed in the regression output and the engineer is familiar with the typical mean and standard deviation of each variable. For the output, we simply copy and paste the regression equation that describes the process from <a href="http://www.minitab.com/products/minitab/features/">Minitab statistical software</a> right into Companion's Monte Carlo tool!</p>
<p>When the Monte Carlo tool opens, we are presented with these entry fields:</p>
<p style="margin-left: 40px;"><img alt="Setup the process inputs and outputs" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a9d206b562248523a19361c6ed8d68ac/monte_carlo_dialog_1.png" style="width: 700px; height: 233px; border-width: 0px; border-style: solid;" /></p>
<p>It's an easy matter to enter the information about the inputs and outputs for the process as shown.</p>
<p style="margin-left: 40px;"><img alt="Setup the input values and the output equation" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a86d77de3e6e79ef9483dca610ea2af7/monte_carlo_dialog_2.png" style="width: 800px; height: 510px;" /></p>
<p>Verify your model with the above diagram and then click <strong>Simulate</strong> in the application ribbon.</p>
<p style="margin-left: 40px;"><img alt="perform the monte carlo simulation" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/cecd73581290afdf570f6b4432f92033/monte_carlo_dialog_3.png" style="width: 473px; height: 277px;" /></p>
<p><em><strong>Initial Simulation Results</strong></em></p>
<p>After you click <strong>Simulate</strong>, Companion very quickly runs 50,000 simulations by default, though you can specify a higher or lower number of simulations. </p>
<p style="margin-left: 40px;"><img alt="Initial simulation results" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/5800795d00dbdb15ca6fb00cbcc85582/monte_carlo_output1.png" style="width: 750px; height: 375px; border-width: 0px; border-style: solid;" /></p>
<p>Companion interprets the results for you using output that is typical for <a href="http://support.minitab.com/en-us/minitab/17/topic-library/quality-tools/capability-analyses/basics/uses-of-capability-analysis/" target="_blank">capability analysis</a>: a capability histogram, percentage of defects, and the Ppk statistic. Companion correctly points out that our Ppk is below the generally accepted minimum value (1.33 is a common benchmark).</p>
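<p>For readers who want to see what sits behind those two numbers, here is a small Python sketch using the usual textbook definitions: Ppk is the distance from the mean to the nearer specification limit, divided by three overall standard deviations. The sample values and specification limits are hypothetical, and Companion's own calculations may differ in detail.</p>

```python
import statistics


def ppk(values, lsl, usl):
    """Ppk = min(USL - mean, mean - LSL) / (3 * overall standard deviation)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return min(usl - mean, mean - lsl) / (3 * sd)


def percent_out_of_spec(values, lsl, usl):
    """Percentage of values that fall outside the specification limits."""
    defects = sum(1 for v in values if v < lsl or v > usl)
    return 100.0 * defects / len(values)


# Hypothetical simulated outputs and specification limits.
sample = [9.0, 10.0, 11.0, 10.0, 9.5, 10.5, 10.0, 10.0]
LSL, USL = 8.0, 12.5

print(f"Ppk = {ppk(sample, LSL, USL):.2f}")
print(f"% out of spec = {percent_out_of_spec(sample, LSL, USL):.1f}")
```

<p>In practice you would pass in the full list of simulated outputs rather than eight values.</p>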
<p><em><strong>Step-by-Step Guidance for the Monte Carlo Simulation</strong></em></p>
<p>But Companion doesn’t just run the simulation and then let you figure out what to do next. Instead, Companion has determined that our process is not satisfactory and presents you with a smart sequence of steps to improve the process capability.</p>
<p>How is it smart? Companion knows that it is generally <a href="http://blog.minitab.com/blog/adventures-in-statistics/quality-improvement-controlling-variability-more-difficult-than-the-mean">easier to control the mean than the variability</a>. Therefore, the next step that Companion presents is <strong>Parameter Optimization</strong>, which finds the mean settings that minimize the number of defects while still accounting for input variability.</p>
<p style="margin-left: 40px;"><img alt="Next steps leading to parameter optimization" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f7a9208c5d0bd683b879fe5e47b86ba5/monte_carlo_parameter_optimization.png" style="width: 750px; height: 78px;" /></p>
<p><strong>Step 2: Define the Objective and Search Range for Parameter Optimization</strong></p>
<p>At this stage, we want Companion to find an optimal combination of mean input settings to minimize defects. After you click <strong>Parameter Optimization</strong>, you'll need to specify your goal and use your process knowledge to define a reasonable search range for the input variables.</p>
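<p>One simple way to picture this step is a grid search: try each combination of candidate mean settings, simulate the process at those settings, and keep the combination with the fewest defects. The sketch below does that in Python. The model, specification limits, standard deviations, and search grid are all hypothetical, and I'm not claiming this is the search method Companion uses.</p>

```python
import itertools
import random


# Hypothetical model, specification limits, and input standard deviations.
def process_output(temperature, pressure):
    return 10.0 + 0.5 * temperature - 2.0 * pressure


LSL, USL = 13.0, 17.0
INPUT_SD = {"temperature": 1.0, "pressure": 0.2}

# Search ranges that would come from process knowledge (assumed here).
SEARCH = {"temperature": [18.0, 20.0, 22.0, 24.0], "pressure": [2.5, 3.0, 3.5]}


def defect_rate(means, n_runs=5_000, seed=1):
    """Fraction of simulated outputs that fall outside the spec limits."""
    rng = random.Random(seed)  # same seed for every candidate, for a fair comparison
    defects = 0
    for _ in range(n_runs):
        draw = {k: rng.gauss(means[k], INPUT_SD[k]) for k in INPUT_SD}
        if not LSL <= process_output(**draw) <= USL:
            defects += 1
    return defects / n_runs


def optimize_means(search=SEARCH):
    """Return the candidate mean settings with the lowest simulated defect rate."""
    names = list(search)
    candidates = [
        dict(zip(names, combo)) for combo in itertools.product(*search.values())
    ]
    return min(candidates, key=defect_rate)


best = optimize_means()
print(best, f"defect rate = {defect_rate(best):.4f}")
```

<p>Notice that the input standard deviations never change here. Only the means move, which is why this step still accounts for input variability.</p>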
<p style="margin-left: 40px;"><img alt="Setup for parameter optimization" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ed51e72e05c366214832195727bfe1b9/monte_carlo_parameter_optimization_dialog.png" style="width: 750px; height: 478px;" /></p>
<p>And, here are the simulation results!</p>
<p style="margin-left: 40px;"><img alt="Results of the parameter optimization" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/509bc51406b7f0a9187214c3231daf69/monte_carlo_parameter_optimization_results_1.png" style="width: 750px; height: 376px; border-width: 1px; border-style: solid;" /></p>
<p>At a glance, we can tell that the percentage of defects is way down. We can also see the optimal input settings in the table. However, our Ppk statistic is still below the generally accepted minimum value. Fortunately, Companion has a recommended next step to further improve the capability of our process.</p>
<p style="margin-left: 40px;"><img alt="Next steps leading to a sensitivity analysis" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fd4a5321a6f3e59e33111414a2cca9f6/monte_carlo_parameter_optimization_next.png" style="width: 750px; height: 106px; border-width: 0px; border-style: solid;" /></p>
<p><strong>Step 3: Control the Variability to Perform a Sensitivity Analysis</strong></p>
<p>So far, we've improved the process by optimizing the mean input settings. That reduced defects greatly, but we still have more to do in the Monte Carlo simulation. Now, we need to reduce the variability in the process inputs in order to further reduce defects.</p>
<p>Reducing variability is typically more difficult. Consequently, you don't want to waste resources controlling the standard deviation for inputs that won't reduce the number of defects. Fortunately, Companion includes an innovative graph that helps you identify the inputs where controlling the variability will produce the largest reductions in defects.</p>
<p style="margin-left: 40px;"><img alt="Setup for the sensitivity analysis" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1ac0aad99b18d9906744787cc88aa017/monte_carlo_sensitivity_dialog_1.png" style="width: 750px; height: 569px;" /></p>
<p>In this graph, look for inputs with sloped lines because reducing these standard deviations can reduce the variability in the output. Conversely, you can ease tolerances for inputs with a flat line because they don't affect the variability in the output.</p>
<p>In our graph, the slopes are fairly equal. Consequently, we'll try reducing the standard deviations of several inputs. You'll need to use process knowledge in order to identify realistic reductions. To change a setting, you can either click the points on the lines, or use the pull-down menu in the table.</p>
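<p>The logic behind a graph like this can be sketched in a few lines of Python: shrink one input's standard deviation at a time, leave the others alone, and see how much the output's standard deviation drops. The model, means, and standard deviations below are hypothetical, and this is only an approximation of the idea, not Companion's actual algorithm.</p>

```python
import random
import statistics


# Hypothetical model and settings (illustrative values only).
def process_output(temperature, pressure):
    return 10.0 + 0.5 * temperature - 2.0 * pressure


MEANS = {"temperature": 20.0, "pressure": 2.5}
BASE_SD = {"temperature": 1.0, "pressure": 0.2}


def output_sd(sds, n_runs=20_000, seed=1):
    """Standard deviation of the simulated output for the given input SDs."""
    rng = random.Random(seed)
    outputs = [
        process_output(**{k: rng.gauss(MEANS[k], sds[k]) for k in MEANS})
        for _ in range(n_runs)
    ]
    return statistics.stdev(outputs)


def sensitivity(reductions=(0.0, 0.25, 0.5)):
    """Output SD when one input's SD is cut by each fraction, others unchanged."""
    table = {}
    for name in BASE_SD:
        table[name] = []
        for cut in reductions:
            sds = dict(BASE_SD)
            sds[name] = BASE_SD[name] * (1.0 - cut)
            table[name].append(output_sd(sds))
    return table


for name, sds in sensitivity().items():
    print(name, [round(sd, 3) for sd in sds])
```

<p>An input whose row barely changes corresponds to a flat line on the graph. An input whose row drops quickly corresponds to a steep line and is a better place to spend your effort.</p>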
<p><strong>Final Monte Carlo Simulation Results</strong></p>
<p style="margin-left: 40px;"><img alt="Results of the sensitivity analysis" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/99ea718e1b2a5b1248e79a5faae25e98/monte_carlo_sensitivity_output.png" style="width: 750px; height: 632px; border-width: 0px; border-style: solid;" /></p>
<p>Success! We've reduced the number of defects in our process and our Ppk statistic is 1.34, which is above the benchmark value. The assumptions table shows us the new settings and standard deviations for the process inputs that we should try. If we ran <strong>Parameter Optimization</strong> again, it would center the process and I'm sure we'd have even fewer defects.</p>
<p>To improve our process, Companion guided us on a smart sequence of steps during our Monte Carlo simulation:</p>
<ol>
<li>Simulate the original process</li>
<li>Optimize the mean settings</li>
<li>Strategically reduce the variability</li>
</ol>
<p>If you want to try Monte Carlo simulation for yourself, get <a href="http://www.minitab.com/products/companion/try-it-free/">the free trial of Companion by Minitab</a>!</p>
Monte CarloMonte Carlo SimulationProject ToolsQuality ImprovementStatisticsStatistics HelpTue, 25 Apr 2017 12:00:00 +0000http://blog.minitab.com/blog/adventures-in-statistics-2/understanding-monte-carlo-simulation-with-an-exampleJim FrostWhat Do Ventilated Shelf Installation and Measurement Systems Analysis Have in Common?
http://blog.minitab.com/blog/quality-business/what-do-ventilated-shelf-installation-and-measurement-systems-analysis-have-in-common
<p><img alt="Ventilated Shelf" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/3c89dfbf7dc1971b6031bf669ac45625/ventilated_shelf.jpg" style="margin: 10px 15px; float: left; width: 148px; height: 113px;" />Have you ever tried to install ventilated shelving in a closet? You know: the heavy-duty, white- or gray-colored vinyl-coated wire shelving? The one that allows you to get organized, more efficient with space, and is strong and maintenance-free? Yep, that’s the one. Did I mention this stuff is strong? As in, <em>really </em>hard to cut? </p>
<p>It seems like a simple 4-step project. Measure the closet, go to the store, buy the shelving, and install when you get home. Simple, right? Yeah, it sounded good in my head!</p>
<p>The lessons I learned in this project underscore the value of doing measurement system analysis in your quality improvement projects, with <a href="http://www.minitab.com/products/minitab/">statistical software such as Minitab</a>. Whatever you're trying to accomplish, if you don't get reliable measurements or data, the task is going to become more challenging.</p>
<p align="center"><img alt="Before Process Map" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/56432bf83b2d106b61f39f1ab76a8495/before_process_map.png" style="width: 600px; height: 145px; margin: 10px 15px;" /></p>
<p>Well, it turned out to be more complicated and involved a lot of rework. Did I mention that this shelving is made of heavy gauge steel that is nearly impossible to cut with ordinary tools? So, my simple 4-step process turned into a 7-step process with lots of rework (multiple trips to the store to have the shelves re-cut).</p>
<p>My actual process looked more like this!</p>
<p><img alt="After Process Map" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/060c401c2db87e8176130b7e464441fb/after_process_map.png" style="width: 750px; height: 230px; margin: 10px 15px;" /></p>
<p>All the sources of variation from Measurement Systems Analysis (MSA) apply here: Repeatability, Reproducibility, Bias, Linearity, and Stability. Let’s review these terms and see how I could have done better at measuring the closet, the first time.</p>
<p align="center"><img alt="Components of Measurement Error" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/1a474c8c-3979-4eba-b70c-1e5a3f1d6601/Image/23550db0dc88ec22e9b5409ba6d6a592/components_of_measurement_error.png" style="width: 750px; height: 456px; margin: 10px 15px;" /></p>
<p>When it was time to measure the closet, I had a few measuring-device choices hanging around my garage: a yardstick, a cloth tape measure, and a steel tape measure. </p>
<p><strong>Bias</strong> examines the difference between the observed average measurement and a reference or master value. It answers the question: "How accurate is my gage when compared to a reference value?" Unless there is visible damage, all three of these measuring devices should be acceptable for my shelf project.</p>
<p><strong>Stability</strong> is the change in bias over time. Measurement stability represents the total variation in measurements obtained on the same part measured over time, also known as drift. It is important to assess stability on an ongoing basis. While calibrations and <span><a href="http://blog.minitab.com/blog/meredith-griffith/fundamentals-of-gage-rr">gage studies</a></span> provide some information about changes in the measurement system, neither provides information on what is happening to the measurement process over time. But unless there is visible damage, all three of these measuring devices should be acceptable for use.</p>
<p><strong>Linearity</strong> examines how accurate your measurements are through the expected range of the measurements. It answers the question: "Does my gage have the same accuracy across all reference values?" If you use the yardstick or steel tape measure, then the answer might be “yes” because of its solid construction. But the cloth tape measure could stretch when extended, making it less reliable at longer lengths. Examine the cloth measuring tape for evidence of stretching or wear. If damage is present, do not use the measuring device.</p>
<p><strong>Repeatability</strong> represents the variation that occurs when the same appraiser measures the same part with the same device. This is best represented with the advice “Measure twice, cut once!” In my case, if I had measured the closet width multiple times, I would have realized I was getting a different answer each time and therefore needed to take better care when measuring. Then I could have gotten more accurate measurements for each shelf. </p>
<p><strong>Reproducibility</strong> represents the variation that occurs when different appraisers measure the same part with the same device. In my case, if I'd asked my son to measure the same locations that I just measured, I would have discovered that we got different answers: I should have accounted for the mounting brackets in my measurements. (The fact that he <em>did </em>is why he’s in school to become a Mechanical Engineer.)</p>
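<p>To put rough numbers on three of these terms, here is a simplified Python sketch. The closet-width readings and the reference width are invented, and the calculations are deliberately basic. A full Gage R&amp;R study would estimate these components with ANOVA instead.</p>

```python
import statistics

# Hypothetical closet-width readings in inches: two appraisers, three repeats each.
readings = {
    "me": [36.1, 35.8, 36.3],
    "my_son": [35.5, 35.6, 35.4],
}
REFERENCE_WIDTH = 35.5  # assumed true width, for example from a laser measure


def bias(values, reference):
    """Average measurement minus the reference value."""
    return statistics.fmean(values) - reference


def repeatability_sd(by_appraiser):
    """Pooled within-appraiser SD (same person, same device, equal repeats)."""
    variances = [statistics.variance(v) for v in by_appraiser.values()]
    return statistics.fmean(variances) ** 0.5


def reproducibility_sd(by_appraiser):
    """SD of the appraiser averages (different people, same device)."""
    means = [statistics.fmean(v) for v in by_appraiser.values()]
    return statistics.stdev(means)


for name, values in readings.items():
    print(f"bias for {name}: {bias(values, REFERENCE_WIDTH):+.2f} in")
print(f"repeatability SD:   {repeatability_sd(readings):.2f} in")
print(f"reproducibility SD: {reproducibility_sd(readings):.2f} in")
```

<p>In this made-up data, the spread between the two appraisers is larger than the spread within either appraiser, which points to a reproducibility problem.</p>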
<p>In summary, my afternoon shelf installation project ended up taking two days to complete, resulting in multiple trips to the store, a lot of frustration for me, and late dinners for my family because I was too busy to cook! </p>
<p>My lessons learned from this project are:</p>
<ol>
<li>Don’t assume your closet walls are exactly parallel at the top, middle and bottom of the closet. Instead, measure at each location where a shelf is to be installed. Remember the Rule of Thumb for Gage R&R: take measurements representing the entire range of process variation.</li>
<li>Apply the Gage R&R sources of measurement error when measuring:
<ol style="list-style-type:lower-alpha;">
<li>Visually inspect the measuring device before using to verify it is in good condition.</li>
<li>Measure twice, cut once. (Repeatability)</li>
<li>Ask my family for assistance in measuring. (Reproducibility)</li>
</ol>
</li>
<li>Did you know that you can purchase a laser measure for about $30 these days? If only I had known…</li>
<li>Consider hiring a professional because this project was harder than it originally seemed.</li>
</ol>
Quality ImprovementMon, 17 Apr 2017 15:03:00 +0000http://blog.minitab.com/blog/quality-business/what-do-ventilated-shelf-installation-and-measurement-systems-analysis-have-in-commonBonnie K. StoneR-Squared: Sometimes, a Square is just a Square
http://blog.minitab.com/blog/statistics-and-quality-data-analysis/r-squared-sometimes-a-square-is-just-a-square
<p><img alt="rsquare" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/85ce10f546bd53d18ba69912862812ac/rsquarebest.jpg" style="width: 250px; float: right; height: 247px; margin: 10px 15px;" />If you regularly perform regression analysis, you know that R<sup>2</sup> is a statistic used to evaluate the fit of your model. You may even know the standard definition of R<sup>2</sup>: <em>the percentage of variation in the response that is explained by the model. </em></p>
<p>Fair enough. With <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">Minitab Statistical Software</a> doing all the heavy lifting to calculate your R<sup>2</sup> values, that may be all you ever need to know.</p>
<p>But if you’re like me, you like to crack things open to see what’s inside. Understanding the essential nature of a statistic helps you demystify it and interpret it more accurately.</p>
R-squared: Where Geometry Meets Statistics
<p>So where <em>does </em> this mysterious R-squared value come from? To find the formula in Minitab, choose<strong> Help > Methods and Formulas</strong>. Click<strong> General statistics > Regression > Regression > R-sq</strong>.</p>
<p><img alt="rsqare" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/25c1b2591db7f6b7cc34f2405fb07a1e/rsquare_no_annotation.jpg" style="width: 342px; height: 104px" /></p>
<p>Some spooky, wacky-looking symbols in there. Statisticians use those to make your knees knock together.</p>
<p>But all the formula really says is: “R-squared is a bunch of squares added together, divided by another bunch of squares added together, subtracted from 1.”</p>
<p><img alt="rsquare annotation" height="113" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/d67d32e7e7b6ee83b244f5a680ea394a/rsquare_annotation_w640.jpeg" width="506" /></p>
<p><em>What</em> bunch of squares, you ask?</p>
<p><img alt="square dance guys" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/017a00856d05924f11812e2e1e26ea41/square_dance_3.jpg" style="width: 460px; height: 299px" /></p>
<p>No, not them.</p>
SS Total: Total Sum of Squares
<p>First consider the "bunch of squares" on the bottom of the fraction. Suppose your data is shown on the scatterplot below:</p>
<p><img alt="original data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/a60bf6e6a3f1ebb09859d68536438410/scatterplot_of_y_vs_x.jpg" style="width: 576px; height: 384px" /></p>
<p>(Only 4 data values are shown to keep the example simple. Hopefully you have more data than this for your actual regression analysis!)</p>
<p>Now suppose you add a line to show the mean (average) of all your data points:</p>
<p><img alt="scatterplot with line" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/9b87c2062abf94036e60d67a2ea7a4ab/scatterplot_of_y_vs_x_with_line.jpg" style="width: 576px; height: 384px" /></p>
<p>The line y = mean of Y is sometimes referred to as the “trivial model” because it doesn’t contain any predictor (X) variables, just a constant. How well would this line model your data points?</p>
<p>One way to quantify this is to measure the vertical distance from the line to each data point. That tells you how much the line “misses” each data point. This distance can be used to construct the sides of a square on each data point.</p>
<p><img alt="pinksquares" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/624ab41f8f2cfbaee30fa0e96d619f69/scatterplot_of_y_vs_x_pink.jpg" style="width: 576px; height: 384px" /></p>
<p>If you add up the pink areas of all those squares for all your data points you get the total sum of squares (SS Total), the bottom of the fraction.</p>
<p><img alt="SS Total" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/babb38258faeb3eeffcf297614c3a2db/r2_formula_ss_total.jpg" style="width: 556px; height: 149px" /></p>
SS Error: Error Sum of Squares
<p>Now consider the model you obtain using regression analysis.</p>
<p><img alt="regression model" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/11d77cb1244bea8639363e59dfc9dd25/scatterplot_of_y_vs_x_regression.jpg" style="width: 576px; height: 384px" /></p>
<p>Again, quantify the "errors" of this model by measuring the vertical distance of each data value from the regression line and squaring it.</p>
<p><img alt="ss error graph" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/44bb7b18d43254e721ec367b07eae546/scatterplot_of_y_vs_x_ss_error.jpg" style="width: 576px; height: 384px" /></p>
<p>If you add the green areas of these squares you get the SS Error, the top of the fraction.</p>
<p><img alt="ss error formula" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/b2e227513ca63f20339b7fcd36985cc2/r2_formula_ss_error_w640.jpeg" style="width: 656px; height: 148px" /></p>
<p>So R<sup>2</sup> basically just compares the errors of your regression model to the errors you’d have if you just used the mean of Y to model your data.</p>
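<p>If you'd like to check the arithmetic yourself, here is a short Python sketch that fits a least-squares line, adds up both bunches of squares, and takes one minus their ratio. The four data points are hypothetical stand-ins, since the actual values behind the scatterplots above aren't listed.</p>

```python
import statistics

# Four hypothetical (x, y) points, not the values plotted in the article.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single predictor."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sums_of_squares(xs, ys):
    """Return (SS Total, SS Error) for the fitted regression line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = statistics.fmean(ys)
    # SS Total: squared distances from the mean line (the pink squares).
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    # SS Error: squared distances from the regression line (the green squares).
    ss_error = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return ss_total, ss_error


def r_squared(xs, ys):
    ss_total, ss_error = sums_of_squares(xs, ys)
    return 1 - ss_error / ss_total


ss_total, ss_error = sums_of_squares(xs, ys)
print(f"SS Total = {ss_total:.2f}, SS Error = {ss_error:.2f}")
print(f"R-squared = {r_squared(xs, ys):.1%}")
```

<p>For these points the green squares are tiny compared with the pink ones, so the ratio is small and R-squared comes out close to 100%.</p>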
R-Squared for Visual Thinkers
<p><img alt="rsquare final" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/69a3a2540f6601c26d025fe495d175d5/rsquare_final_w640.jpeg" style="width: 640px; height: 313px" /></p>
<p>The smaller the errors in your regression model (the green squares) in relation to the errors in the model based on only the mean (pink squares), the closer the fraction is to 0, and the closer R<sup>2</sup> is to 1 (100%).</p>
<p>That’s the case shown here. The green squares are much smaller than the pink squares. So the R<sup>2</sup> for the regression line is 91.4%.</p>
<p>But if the errors in your regression model are about the same size as the errors in the trivial model that uses only the mean, the areas of the pink squares and the green squares will be similar, making the fraction close to 1, and the R<sup>2</sup> close to 0. </p>
<p>That means that your model isn't producing a "tight fit" for your data, generally speaking. You’re getting about the same size errors you’d get if you simply used the mean to describe all your data points! </p>
R-squared in Practice
<p>Now you know exactly what R<sup>2</sup> is. People have different opinions about <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-high-should-r-squared-be-in-regression-analysis" target="_blank">how critical the R-squared value is in regression analysis</a>. My view? No single statistic ever tells the whole story about your data. But that doesn't invalidate the statistic. It's always a good idea to evaluate your data using a variety of statistics. Then interpret the composite results based on the context and objectives of your specific application. If you understand how a statistic is actually calculated, you'll better understand its strengths and limitations.</p>
Related link
<p>Want to see how another commonly used analysis, the t-test, really works? Read <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-is-a-t-test-and-why-is-it-like-telling-a-kid-to-clean-up-that-mess-in-the-kitchen" target="_blank">this post</a> to learn how the t-test measures the "signal" to the "noise" in your data.</p>
Regression AnalysisThu, 13 Apr 2017 13:06:00 +0000http://blog.minitab.com/blog/statistics-and-quality-data-analysis/r-squared-sometimes-a-square-is-just-a-squarePatrick RunkelWhy Is Continuous Data "Better" than Categorical or Discrete Data?
http://blog.minitab.com/blog/understanding-statistics/why-is-continuous-data-better-than-categorical-or-discrete-data
<p>Earlier, I wrote about the <a href="http://blog.minitab.com/blog/understanding-statistics/understanding-qualitative-quantitative-attribute-discrete-and-continuous-data-types">different types of data</a> statisticians typically encounter. In this post, we're going to look at why, when given a choice in the matter, we prefer to analyze continuous data rather than categorical/attribute or discrete data. </p>
<p>As a reminder, when we assign something to a group or give it a name, we have created <strong>attribute </strong>or <strong>categorical </strong>data. If we count something, like defects, we have gathered <strong>discrete </strong>data. And if we can measure something to a (theoretically) infinite degree, we have <strong>continuous </strong>data.</p>
<p>Or, to put it in bullet points: </p>
<ul>
<li><strong>Categorical </strong>= naming or grouping data</li>
<li><strong>Discrete </strong>= count data</li>
<li><strong>Continuous</strong> = measurement data</li>
</ul>
<p>A <a href="http://www.minitab.com/products/minitab" style="font-size: 13px; line-height: 18.9090900421143px;">statistical software package</a><span style="font-size: 13px; line-height: 18.9090900421143px;"> like Minitab is extremely powerful and can tell us many valuable things</span><span style="font-size: 13px; line-height: 18.9090900421143px;">, as long as we're able to feed it good numbers. Without numbers, we have neither analyses nor graphs. Even categorical or</span><span style="font-size: 13px; line-height: 18.9090900421143px;"> attribute data needs to be converted into numeric form by counting before we can analyze it. </span></p>
What Makes Numeric Data Discrete or Continuous?
<p>At this point, you may be thinking, "Wait a minute, we can't <em>really </em>measure <em>anything </em>infinitely, so isn't measurement data actually discrete, too?" That's a fair question. </p>
<p>If you're a strict literalist, the answer is "yes"—when we measure a property that's continuous, like height or distance, we are <i>de facto </i>making a discrete assessment. When we collect a lot of those discrete measurements, it's the amount of detail they contain that will dictate whether we can treat the collection as discrete or continuous.</p>
<p>I like to think of it as a question of scale. Say <span style="line-height: 1.6;">I want to measure the weight of 16-ounce cereal boxes coming off a production line, and I want to be sure that the weight of each box is at least 16 ounces, but no more than 1/2 ounce over that. </span></p>
<p><span style="line-height: 1.6;">With a scale calibrated to whole pounds, all I can do is put every box into one of three categories: less than a pound, 1 pound, or more than a pound. </span></p>
<p>With a scale that can distinguish ounces, I will be able to measure with a bit more accuracy just how close to a pound the individual boxes are. I'm getting nearer to continuous data, but there are still only 16 degrees between each pound. </p>
<p>But if I measure with a scale capable of distinguishing 1/1000th of an ounce, I will have quite a wide scale—a <em>continuum</em>—of potential values between pounds. The individual boxes could have any value between 0.000 and 1.999 pounds. The scale of these measurements is fine enough to be analyzed with powerful statistical tools made for continuous data. </p>
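<p>The effect of scale resolution can be sketched in a few lines of Python. This is only an illustration: the box weights are made-up values, and the helper names <em>read_scale</em> and <em>distinct_readings</em> are hypothetical, not anything from Minitab.</p>

```python
# Hypothetical true box weights in ounces (made-up values for illustration).
true_weights_oz = [15.62, 15.97, 16.04, 16.21, 16.38, 16.49, 16.73]


def read_scale(weight_oz, resolution_oz):
    """Return the reading of a scale that reports multiples of resolution_oz."""
    return round(weight_oz / resolution_oz) * resolution_oz


def distinct_readings(weights, resolution_oz):
    """Count how many different readings the scale can tell apart."""
    # Rounding to 3 decimals removes floating-point noise before comparing.
    return len({round(read_scale(w, resolution_oz), 3) for w in weights})


pound_scale = distinct_readings(true_weights_oz, 16.0)   # whole pounds
ounce_scale = distinct_readings(true_weights_oz, 1.0)    # whole ounces
fine_scale = distinct_readings(true_weights_oz, 0.001)   # thousandths of an ounce

print(pound_scale, ounce_scale, fine_scale)
```

<p>With these made-up weights, the coarse scales lump most boxes into the same reading, while the fine scale gives every box its own value. That extra detail is what lets us treat the data as continuous.</p>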
What Can I Do with Continuous Data that I Can't Do with Discrete?
<p>Not all data points are equally valuable, and you can glean a lot more insight from 100 points of continuous data than you can from 100 points of attribute or count data. <span style="line-height: 18.9090900421143px;">How does this finer degree of detail affect what we can learn from a set of data?</span><span style="line-height: 18.9090900421143px;"> It's easy to see. </span></p>
<p>Let's start with the simplest kind of data, attribute data that rates the weight of a cereal box as good or bad. For 100 boxes of cereal, any that are under 1 pound are classified as bad, so each box can have one of only two values.</p>
<p>We can create a bar chart or a pie chart to visualize this data, and that's about it:</p>
<p><img alt="Attribute Data Bar Chart" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9a3aaad00a1a5858433f17bfd121f465/attribute_data_bar_chart.png" style="width: 576px; height: 384px;" /></p>
<p>If we bump up the precision of our scale to differentiate between boxes that are over and under 1 pound, we can put each box of cereal into one of three categories. Here's what that looks like in a pie chart:</p>
<p><img alt="pie chart of count data" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ae87a08eae95accccbc82b97fe3f0ced/pie_chart_of_count_data.png" style="width: 576px; height: 384px;" /></p>
<p>This gives us a little bit more insight—we now see that we are overfilling more boxes than we are underfilling—but there is still a very limited amount of information we can extract from the data. </p>
<p>If we measure each box to the nearest ounce, we open the door to using methods for continuous data, and get a still better picture of what's going on. We can see that, on average, the boxes weigh 1 pound. But there's high variability, with a standard deviation of 0.9. There's also a wide range in our data, with observed values from 12 to 20 ounces: </p>
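<p>As a rough sketch of what these summary statistics involve, here is how the mean, standard deviation, and range could be computed with Python's standard library. The nine weights are made-up values chosen only to span 12 to 20 ounces; they are not the data behind the graphical summary, so the standard deviation will not match the one quoted above.</p>

```python
import statistics

# Made-up weights in ounces, recorded to the nearest ounce (illustration only).
weights_oz = [12, 14, 15, 16, 16, 16, 17, 18, 20]

mean_oz = statistics.mean(weights_oz)      # average weight
stdev_oz = statistics.stdev(weights_oz)    # sample standard deviation
lowest, highest = min(weights_oz), max(weights_oz)

print(f"mean = {mean_oz} oz, st. dev. = {stdev_oz:.2f} oz, range = {lowest} to {highest} oz")
```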
<p><img alt="graphical summary of ounce data" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/26b4e51027b7afa154d0e6e3f14ab8e9/summary_statistics_for_ounces.png" style="width: 575px; height: 431px;" /></p>
<p>If I measure the boxes with a scale capable of differentiating thousandths of an ounce, more options for analysis open up. For example, now that the data are fine enough to distinguish half-ounces (and then some), I can perform a capability analysis to see if my process is even capable of consistently delivering boxes that fall between 16 and 16.5 ounces. I'll use the Assistant in Minitab to do it, selecting <strong>Assistant > Capability Analysis</strong>: </p>
<p><img alt="capability analysis for thousandths" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0b0a37d1515c25b2e1d8d633b09da447/capability_analysis_for_thousandths___summary_report.png" style="width: 575px; height: 431px;" /></p>
<p>The analysis has revealed that my process isn't capable of meeting specifications. Looks like I have some work to do...but the Assistant also gives me an I-MR control chart, which reveals where and when my process is going out of spec, so I can start looking for root causes.</p>
<p><img alt="IMR Chart" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/df4a5f568e1d931ddcb96404fd888547/imr_chart.png" style="width: 575px; height: 224px;" /></p>
<p>If I were only looking at attribute data, I might think my process was just fine. Continuous data has allowed me to see that I can make the process better, and given me a rough idea where to start. <span style="line-height: 1.6;">By making changes and collecting additional continuous data, I'll be able to conduct hypothesis tests, analyze sources of variation, and more. </span></p>
Some Final Advantages of Continuous Over Discrete Data
<p>Does this mean discrete data is no good at all? Of course not—we are concerned with many things that can't be measured effectively except through discrete data, such as opinions and demographics. But when you can get it, continuous data is the better option. The table below lays out the reasons why. </p>
<table>
<tr><th>Continuous Data</th><th>Discrete Data</th></tr>
<tr><td>Inferences can be made with few data points; valid analysis can be performed with small samples.</td><td>More data points (a larger sample) needed to make an equivalent inference.</td></tr>
<tr><td>Smaller samples are usually less expensive to gather.</td><td>Larger samples are usually more expensive to gather.</td></tr>
<tr><td>High sensitivity (how close to or far from a target)</td><td>Low sensitivity (good/bad, pass/fail)</td></tr>
<tr><td>Variety of analysis options that can offer insight into the sources of variation</td><td>Limited options for analysis, with little indication of sources of variation</td></tr>
</table>
<p>I hope this very basic overview has effectively illustrated why you should opt for continuous data over discrete data whenever you can get it. </p>
Data Analysis, Statistics, Statistics Help
Fri, 07 Apr 2017 12:00:00 +0000
http://blog.minitab.com/blog/understanding-statistics/why-is-continuous-data-better-than-categorical-or-discrete-data
Eston Martz
How to Improve Cpk
http://blog.minitab.com/blog/michelle-paret/how-to-improve-cpk
<p><span style="line-height: 1.6;">You run a capability analysis and your Cpk is bad. Now what? </span></p>
<p><span style="line-height: 1.6;">First, let’s start by defining what “bad” is. In simple terms, the smaller the Cpk, the more defects you have. So the larger your Cpk is, the better. </span><span style="line-height: 1.6;">Many practitioners use a Cpk of 1.33 as the gold standard, so we’ll use that benchmark here, too.</span></p>
<p>Suppose we collect some data and run a capability analysis using <a href="http://www.minitab.com/products/minitab/">Minitab Statistical Software</a>. The results reveal a Cpk of 0.35 with a corresponding DPMO (defects per million opportunities) of more than 140,000. Not good. So how can we improve it? There are two ways to figure that out:</p>
#1 Look at the Graph
<p><strong style="line-height: 20.8px;">Example 1: </strong><span style="line-height: 20.8px;">The Cpk for Diameter1 is 0.35, which is well below 1.33. This means we have a lot of measurements that are out of spec. </span></p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/ca492b411b474ef0a6fd89ae25713cce/process_capability_report_for_diameter1_w1024.jpeg" style="width: 400px; height: 296px; margin-left: 5px; margin-right: 5px;" /></p>
<p>Using the graph, we can see that the data—represented by the blue histogram—is not centered <span style="line-height: 1.6;">between the spec limits shown in red. Fortunately, variability does not appear to be an issue since the histogram and corresponding normal curve can physically fit between the specification limits.</span></p>
<p style="margin-left: 40px;"><em>Q: How can we improve Cpk?</em></p>
<p style="margin-left: 40px;"><em>A: Center the process by moving the mean closer to 100 – halfway between the spec limits </em><em style="line-height: 20.8px;">–</em><em> without increasing the variation.</em></p>
<p> </p>
<p><span style="line-height: 1.6;"><strong>Example 2: </strong>In the analysis for Diameter2, we see a meager Cpk of only 0.41. Fortunately, the data is </span><span style="line-height: 1.6;">centered relative to</span><span style="line-height: 1.6;"> the spec limits. However, the histogram and corresponding </span><span style="line-height: 1.6;">normal curve extend beyond the specs.</span></p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/52c8fbb956a0bc411db98e20ec236820/process_capability_report_for_diameter2_w1024.jpeg" style="line-height: 20.8px; margin-left: 5px; margin-right: 5px; width: 400px; height: 296px;" /></p>
<p style="margin-left: 40px;"><em>Q: How can we improve Cpk?</em></p>
<p style="margin-left: 40px;"><em>A: Reduce the variability, while maintaining the same average.</em></p>
<p> </p>
<p><strong>Example 3: </strong>In the analysis for Diameter3, we can see that the process is not centered between the specs. To make matters worse, the histogram and corresponding normal curve are wider than the tolerance <span style="line-height: 1.6;">(i.e. the distance between the spec limits),</span> which indicates that there’s also too much variability.</p>
<p><em><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/0b7c4891bd5afe50f0bda0548666e0d0/process_capability_report_for_diameter3_w1024.jpeg" style="margin-left: 5px; margin-right: 5px; width: 400px; height: 296px;" /></em></p>
<p style="margin-left: 40px;"><em>Q</em><em style="line-height: 1.6;">: How can we improve Cpk?</em></p>
<p style="margin-left: 40px;"><em>A: Shift the mean closer to 100 to center the process AND reduce the variation.</em></p>
<p> </p>
#2 Compare Cp to Cpk
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/89bd0f1821c52297eb8aeab6efae8428/caringarage.jpg" style="line-height: 20.8px; margin-left: 5px; margin-right: 5px; float: right; width: 250px; height: 193px;" /></p>
<p><span style="line-height: 1.6;">Cp is similar to Cpk in that the smaller the number, the worse the process, and we can use the same 1.33 gold standard. However, the two statistics and <a href="http://blog.minitab.com/blog/statistics-in-the-field/learning-process-capability-with-a-catapult-part-2">their corresponding formulas</a> differ in that Cp only compares the spread of the data to the tolerance width, and </span><span style="line-height: 1.6;">does <em>not</em> account for whether or not the process is actually centered between the spec limits.</span></p>
<p>Interpreting Cp is much like asking “will my car fit in the garage?” where the data is your car and the spec limits are the walls of your garage. We’re not accounting for whether or not you’re a crappy driver who can’t steer straight and center the car; we’re just looking at whether or not your car is narrow enough to physically fit.</p>
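<p>To make the difference concrete, here is a minimal Python sketch of the standard textbook formulas: Cp divides the tolerance width by six standard deviations, while Cpk divides the distance from the mean to the nearer spec limit by three standard deviations. The spec limits, means, and standard deviation below are hypothetical values, and the sketch simply takes the standard deviation as an input rather than reproducing how Minitab estimates it.</p>

```python
def cp(lsl, usl, sigma):
    """Potential capability: tolerance width compared to the process spread."""
    return (usl - lsl) / (6 * sigma)


def cpk(lsl, usl, mean, sigma):
    """Capability that also penalizes a mean that sits off center."""
    return min(usl - mean, mean - lsl) / (3 * sigma)


# Hypothetical spec limits and process values (not taken from the reports above).
LSL, USL, SIGMA = 98.0, 102.0, 0.5

print(round(cp(LSL, USL, SIGMA), 2))           # the car is narrow enough
print(round(cpk(LSL, USL, 100.0, SIGMA), 2))   # centered: Cpk equals Cp
print(round(cpk(LSL, USL, 101.5, SIGMA), 2))   # off center: Cpk drops, Cp does not
```

<p>Because Cp never looks at the mean, it stays the same in both cases. Only Cpk reacts when the process drifts toward one wall of the garage, and it turns negative if the mean moves outside the spec limits altogether.</p>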
<p><strong>Example 1: </strong>The analysis for Diameter1 has a Cp of 1.64, which is very good. Because Cp is good, we know the variation is acceptable: we can physically fit our car in the garage. However, Cpk, which does account for whether or not the process is centered, is <em>awful</em>, at only 0.35.</p>
<p><em> Q: How can we improve Cpk?</em></p>
<p><em> A: Shift the mean to center the process between the specs, without increasing the variation.</em></p>
<p><strong>Example 2: </strong>The analysis for Diameter2 shows that Cp = 0.43 and Cpk = 0.41. Because Cp is bad, we know there’s too much variation: our car cannot physically fit in the garage. And because the Cp and Cpk values are similar, we know that the process is fairly centered.</p>
<p><em style="line-height: 20.8px;"> Q: How can we improve Cpk?</em></p>
<p style="line-height: 20.8px;"><em> A: Reduce the variation, while maintaining the same average.</em></p>
<p style="line-height: 20.8px;"><strong>Example 3: </strong>The analysis for Diameter3 shows that Cp = 0.43 and Cpk = -0.23. Because Cp is bad, we know there’s too much variation. And because Cp is not even close to Cpk, we know that the process is also off center.</p>
<p style="line-height: 20.8px;"><em> Q: How can we improve Cpk?</em></p>
<p style="line-height: 20.8px;"><em> A: Shift the mean AND reduce the variation.</em></p>
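<p>The reasoning in these three examples can be sketched as a small decision rule. The 1.33 benchmark comes from the discussion above. The cutoff for calling Cp and Cpk "similar" (Cpk at least 90% of Cp) is purely an assumption for illustration, and the function name <em>how_to_improve</em> is hypothetical.</p>

```python
BENCHMARK = 1.33  # the "gold standard" used in this post


def how_to_improve(cp, cpk, benchmark=BENCHMARK, centered_ratio=0.9):
    """Suggest an improvement strategy by comparing Cp to Cpk.

    centered_ratio is an assumed cutoff: the process is treated as
    centered when Cpk is at least that fraction of Cp.
    """
    if cpk >= benchmark:
        return "no action needed"
    spread_ok = cp >= benchmark             # will the car fit in the garage?
    centered = cpk >= centered_ratio * cp   # is the car parked in the middle?
    if not spread_ok and not centered:
        return "shift the mean and reduce the variation"
    if not spread_ok:
        return "reduce the variation"
    return "shift the mean"


# Cp and Cpk values quoted for Diameter1, Diameter2, and Diameter3.
for name, cp_value, cpk_value in [("Diameter1", 1.64, 0.35),
                                  ("Diameter2", 0.43, 0.41),
                                  ("Diameter3", 0.43, -0.23)]:
    print(name, "->", how_to_improve(cp_value, cpk_value))
```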
<p> </p>
And for a 3rd way...
<p>Whether you look at a capability analysis graph or compare the Cp and Cpk statistics, you’re going to arrive at the same conclusion regarding how to improve your results. And if you want yet another way to figure out how to improve Cpk, you can also look at the mean and standard deviation—but for now, I’ll spare you the math lesson and stick with #1 and #2 above.</p>
<p>In summary:</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/6aa0804ba89d8dc383d912663fd91f95/summarygrid.jpg" style="width: 644px; height: 210px;" /></p>
Automotive, Capability Analysis, Data Analysis, Lean Six Sigma, Learning, Manufacturing, Quality Improvement, Six Sigma, Statistics, Stats
Wed, 05 Apr 2017 12:00:00 +0000
http://blog.minitab.com/blog/michelle-paret/how-to-improve-cpk
Michelle Paret
What to Do When Your Data's a Mess, part 3
http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-datas-a-mess2c-part-3
<p>Everyone who analyzes data regularly has the experience of getting a worksheet that just isn't ready to use. Previously I wrote about tools you can use to <a href="http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-data-is-a-mess-part-1">clean up and eliminate clutter in your data</a> and <a href="http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-datas-a-mess2c-part-2">reorganize your data</a>. </p>
<p><span style="line-height: 1.6;">In this post, I'm going to highlight tools that help you get the most out of messy data by altering its characteristics.</span></p>
Know Your Options
<p>Many problems with data don't become obvious until you begin to analyze it. A shortcut or abbreviation that seemed to make sense while the data was being collected, for instance, might turn out to be a time-waster in the end. What if abbreviated values in the data set only make sense to the person who collected it? Or a column of numeric data accidentally gets coded as text? You can solve those problems quickly with <a href="http://www.minitab.com/products/minitab">statistical software</a> packages.</p>
Change the Type of Data You Have
<p>Here's an instance where a data entry error resulted in a column of numbers being incorrectly classified as text data. This will severely limit the types of analysis that can be performed using the data.</p>
<p><img alt="misclassified data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c45b427d3e5e2b5eac4a505ed5c3b24f/misclassified_data.png" style="width: 200px; height: 156px;" /></p>
<p>To fix this, select <strong>Data > Change Data Type</strong> and use the dialog box to choose the column you want to change.</p>
<p><img alt="change data type menu" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/46ece127300500409098383a2e476a9b/text_to_numeric_data.png" style="width: 376px; height: 175px;" /></p>
<p>One click later, and the errant text data has been converted to the desired numeric format:</p>
<p><img alt="numeric data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f1b9df0211f9085e577a41b0e3661b45/numeric_data.png" style="width: 200px; height: 156px;" /></p>
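<p>Outside of Minitab, the same idea can be sketched in plain Python. This is an illustration of the concept only, not what the Change Data Type dialog does internally; the column values and the helper name <em>to_numeric</em> are made up, and treating unparseable entries as missing is an assumption.</p>

```python
def to_numeric(column):
    """Convert text entries to floats; blank or unparseable entries become None."""
    converted = []
    for entry in column:
        try:
            converted.append(float(entry.strip()))
        except ValueError:
            converted.append(None)  # treat as a missing value
    return converted


# A numeric column that was accidentally stored as text (made-up values).
text_column = ["12.5", " 13", "14.25", "", "n/a"]
numeric_column = to_numeric(text_column)
print(numeric_column)
```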
Make Data More Meaningful by Coding It
<p>When this company collected data on the performance of its different functions across all its locations, it used numbers to represent both locations and units. </p>
<p><img alt="uncoded data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d22a57fe9e9e398bd948e86c0adafe34/uncoded_data.png" style="width: 135px; height: 158px;" /></p>
<p>That may have been a convenient way to record the data, but unless you've memorized what each set of numbers stands for, interpreting the results of your analysis will be a confusing chore. You can make the results easy to understand and communicate by coding the data. </p>
<p>In this case, we select <strong>Data > Code > Numeric to Text...</strong></p>
<p><img alt="code data menu" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c75e46cc190497fd41b0e6736518c0fe/code_data_menu.png" style="width: 384px; height: 255px;" /></p>
<p>And we complete the dialog box as follows, telling the software to replace the numbers with more meaningful information, like the town each facility is located in. </p>
<p><img alt="Code data dialog box" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/cd75c14324187806b8f3a74a3b8996b4/code_data_dialog.png" style="width: 400px; height: 345px;" /></p>
<p>Now you have data columns that can be understood by anyone. When you create graphs and figures, they will be clearly labeled. </p>
<p><img alt="Coded data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/7ff81bdb08170d6d8a4e8547623cf557/coded_data.png" style="width: 161px; height: 200px;" /></p>
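<p>Conceptually, coding is just a lookup from each number to a label. Here is a minimal Python sketch of that idea. The town names and location codes are hypothetical, and leaving unmapped codes as their original number (converted to text) is an assumption made for the example.</p>

```python
# Hypothetical lookup table: location code -> town name.
location_names = {1: "Springfield", 2: "Riverton", 3: "Lakeside"}


def code_values(column, lookup):
    """Replace each numeric code with its label; keep unknown codes as text."""
    return [lookup.get(value, str(value)) for value in column]


location_codes = [1, 3, 2, 2, 4]
coded = code_values(location_codes, location_names)
print(coded)
```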
Got the Time?
<p>Dates and times can be very important in looking at performance data and other indicators that might have a cyclical or time-sensitive effect. But the way the date is recorded in your data sheet might not be exactly what you need. </p>
<p>For example, if you wanted to see if the day of the week had an influence on the activities in certain divisions of your company, a list of dates in the MM/DD/YYYY format won't be very helpful. </p>
<p><img alt="date column" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f5b0dd178afbc0352f8dc2d9378e887b/date_column.png" style="width: 240px; height: 223px;" /></p>
<p>You can use <strong>Data > Date/Time > Extract to Text... </strong>to identify the day of the week for each date.</p>
<p><img alt="extract-date-to-text" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/7e6f7e8a87ee8291b9c6d51507092c19/extract_date_to_text.png" style="width: 351px; height: 132px;" /></p>
<p>Now you have a column that lists the day of the week, and you can easily use it in your analysis. </p>
<p><img alt="day column" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dede93c9621917a0cfb54beef121d4e2/day_column.png" style="width: 249px; height: 205px;" /></p>
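<p>The same extraction can be sketched with Python's standard <em>datetime</em> module, assuming the dates are stored as text in MM/DD/YYYY format. The helper name <em>day_of_week</em> is hypothetical. The sample dates are publication dates from this blog, so their weekdays are known.</p>

```python
from datetime import datetime

# weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def day_of_week(date_text):
    """Return the weekday name for a date written as MM/DD/YYYY."""
    parsed = datetime.strptime(date_text, "%m/%d/%Y")
    return DAY_NAMES[parsed.weekday()]


dates = ["03/28/2017", "04/05/2017", "04/07/2017"]
days = [day_of_week(d) for d in dates]
print(days)
```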
Manipulating for Meaning
<p>These tools are commonly seen as a way to correct data-entry errors, but as we've seen, you can use them to make your data sets more meaningful and easier to work with.</p>
<p>There are many other tools available in Minitab's Data menu, including an array of options for arranging, combining, dividing, fine-tuning, rounding, and otherwise massaging your data to make it easier to use. Next time you've got a column of data that isn't quite what you need, try using the Data menu to get it into shape.</p>
Data Analysis, Statistics, Stats
Tue, 28 Mar 2017 12:00:00 +0000
http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-datas-a-mess2c-part-3
Eston Martz
Trouble Starting an Analysis? Graph Your Data with an Individual Value Plot
http://blog.minitab.com/blog/understanding-statistics/trouble-starting-an-analysis-graph-your-data-with-an-individual-value-plot
<p><span style="line-height: 1.6;">You've collected a bunch of data. It wasn't easy, but you did it. Yep, there it is, right there...just look at all those numbers, right there in neat columns and rows. Congratulations. </span></p>
<p><span style="line-height: 1.6;">I hate to ask...but what are you going to <em>do</em> with your data? </span></p>
<p><span style="line-height: 1.6;">If you're not sure precisely <em>what </em>to do with the data you've got, graphing it is a great way to get some valuable insight and direction. And a good graph to start with is an individual value plot, which you can create in Minitab <a href="http://www.minitab.com/products/minitab">Statistical Software</a> by going to <strong>Graph > Individual Value Plot</strong>. </span></p>
<span style="line-height: 20.7999992370605px;">How can individual value plots help me?</span>
<p><span style="line-height: 1.6;">There are <span><a href="http://blog.minitab.com/blog/understanding-statistics/seven-alternatives-to-pie-charts">other graphs</a></span> you could start with, so what makes the individual value plot such a strong contender? The fact that it lets you view important data features, find miscoded values, and identify unusual cases. </span></p>
<p>In other words, taking a look at an individual value plot can help you to choose the appropriate direction for your analysis and to avoid wasted time and frustration.</p>
<p><strong>IDENTIFY INDIVIDUAL VALUES</strong></p>
<p>Many people like to look at their data in boxplots, and you can learn many valuable things from those graphs. Unlike boxplots, individual value plots display all data values and may be more informative than boxplots for small amounts of data.</p>
<p><img alt="boxplot of length" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/49712c7981ed83a0d5e9a678f783cd20/ivp1_boxplot_of_length.png" style="width: 576px; height: 384px;" /></p>
<p>The boxplots for the two variables look identical.</p>
<p><img alt="individual value plot" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/15420a5ce214daebe193faac3fb1d74c/ivp2_individual_value_plot_of_length.png" style="width: 576px; height: 384px;" /></p>
<p>The individual value plot of the same data shows that there are many more values for Batch 1 than for Batch 2.</p>
<p>You can use individual value plots to identify possible outliers and other values of interest. Hover the cursor over any point to see its exact value and position in the worksheet.</p>
<p><img alt="clustered data distribution" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ab3edfd6d0898085a25d56bc0684986b/ivp3_outlier_selected.png" style="width: 573px; height: 380px;" /></p>
<p>Individual value plots can also clearly illustrate characteristics of the data distribution. In this graph, most values are in a cluster between 4 and 10. Minitab can jitter (randomly nudge) the points horizontally, so that one value doesn’t obscure another. You can edit the plot to turn on or turn off jitter.</p>
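<p>To show what jitter means in practice, here is a minimal Python sketch that nudges the horizontal position of repeated points by a small random amount. It is not Minitab's algorithm; the function name <em>jitter</em>, the width, and the fixed seed are all assumptions made for illustration.</p>

```python
import random


def jitter(x_position, n_points, width=0.2, seed=0):
    """Return n_points horizontal positions scattered around x_position.

    Each point is shifted by a random amount of at most width / 2 in
    either direction, so overlapping values become visible side by side.
    """
    rng = random.Random(seed)  # fixed seed keeps the plot reproducible
    half = width / 2
    return [x_position + rng.uniform(-half, half) for _ in range(n_points)]


# Five identical measurements that would otherwise be drawn on top of each other.
x_positions = jitter(x_position=1.0, n_points=5)
print(x_positions)
```

<p>Only the horizontal position changes. The plotted value on the vertical axis stays exactly where it was measured, so jitter does not distort the data.</p>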
<p><strong>MAKE GROUP COMPARISONS</strong></p>
<p>Because individual value plots display all values for all groups at the same time, they are especially helpful when you compare variables, groups, and even subgroups.</p>
<p><img alt="time vs. shift plot" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e78b561b70c524bbbe1dc8e262f11556/ivp4_individual_value_plot_of_diameter.png" style="width: 576px; height: 384px;" /></p>
<p>This plot shows the diameter of pipes from two lines over four shifts. You can see that the diameters of pipes produced by Line 1 seem to increase in variability across shifts, while the diameters of pipes from Line 2 appear more stable.</p>
<p><strong>SUPPORT OTHER ANALYSES</strong></p>
<p>An individual value plot is one of the built-in graphs that are available with many Minitab statistical analyses. You can easily display an individual value plot while you perform these analyses. In the analysis dialog box, simply click Graphs and check Individual Value Plot.</p>
<p>Some built-in individual value plots include specific analysis information. For example, the plot that accompanies a 1-sample t-test displays the 95% confidence interval for the mean and the reference value for the null hypothesis mean. These plots give you a graphical representation of the analysis results.</p>
<p><img alt="horizontal plot " src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/034da921e1e89e0d4ba22ce0556517a8/ivp5_individual_value_plot_of_diameter.png" style="width: 576px; height: 384px;" /></p>
<p>This plot accompanies a 1-sample t-test. All of the data values are between 4.5 and 5.75. The reference mean lies outside of the confidence interval, which suggests that the population mean differs from the hypothesized value.</p>
Individual Value Plot: A Case Study
<p>Suppose that salad dressing is bottled by four different machines and that you want to make sure that the bottles are filled correctly to 16 ounces. You weigh 30 samples from each machine. You plan to run an ANOVA to see if the means of the samples from each machine are equal. But, first, you display an individual value plot of the samples to get a better understanding of the data.</p>
<p><img alt="data" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c6542a20ef538f767d7721e64599c8d6/ivp7_data.jpg" style="line-height: 20.8px; width: 192px; height: 200px;" /></p>
<p>Choose <strong>Graph > Individual Value Plot</strong>.<br />
Under <strong>One Y</strong>, choose <strong>With Groups</strong>.<br />
Click <strong>OK</strong>.<br />
In <strong>Graph variables</strong>, enter <em>Weight</em>.<br />
In <strong>Categorical variables for grouping</strong>, enter <em>Machine</em>.<br />
Click <strong>Data View</strong>.<br />
Under <strong>Data Display</strong>, check <strong>Interval bar</strong> and <strong>Mean symbol</strong>.<br />
Click <strong>OK </strong>in each dialog box.</p>
<p><img alt="individual value plot of weight" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fa1af70661f2f3e8ebbba090bee4f5f9/ivp8_individual_value_plot_of_weight.png" style="width: 576px; height: 384px;" /></p>
<p>The mean fill weight is about 16 ounces for Fill2, Fill3, and Fill4, with no suspicious data points. For Fill1, however, the mean appears higher, with a possible outlier at the lower end.</p>
<p>Before you continue with the analysis, you may want to investigate problems with the Fill1 machine.</p>
Putting individual value plots to use
<p>Use Minitab’s individual value plot to get a quick overview of your data before you begin your analysis—especially if you have a small data set or if you want to compare groups. The insight that you gain can help you to decide what to do next and may save you time exploring other paths.</p>
<p>For more information on individual value plots and other Minitab graphs, see <a href="http://support.minitab.com/en-us/minitab/17/">Minitab Help</a>.</p>
Data Analysis, Statistics, Statistics Help, Stats
Thu, 23 Mar 2017 12:00:00 +0000
http://blog.minitab.com/blog/understanding-statistics/trouble-starting-an-analysis-graph-your-data-with-an-individual-value-plot
Eston Martz
What to Do When Your Data's a Mess, part 2
http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-datas-a-mess2c-part-2
<p><span style="line-height: 1.6;">In my last post, I wrote about making a cluttered data set easier to work with by removing unneeded columns entirely, and by displaying just those columns you want to work with <em>now</em>. But <a href="http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-data-is-a-mess-part-1">too much unneeded data</a> isn't always the problem. </span></p>
<p><span style="line-height: 1.6;">What can you do when someone gives you data that isn't organized the way you need it to be? </span></p>
<p><span style="line-height: 1.6;">That happens for a variety of reasons, but most often it's because the simplest way for people to collect data is with a format that might make it difficult to assess in a worksheet. Most <a href="http://www.minitab.com/products/minitab">statistical software</a> will accept a wide range of data layouts, but just because a layout is readable doesn't mean it will be easy to analyze.</span></p>
<p><span style="line-height: 1.6;">You may not be in control of how your data were collected, but you can use tools like sorting, stacking, and ordering to put your data into a format that makes sense and is easy for you to use. </span></p>
Decide How You Want to Organize Your Data
<p>Depending on how it's arranged, the same data can be easier to work with, simpler to understand, and able to yield deeper and more sophisticated insights. I can't tell you the best way to organize your specific data set, because that will depend on the types of analysis you want to perform, and the nature of the data you're working with. However, I can show you some easy ways to rearrange your data into the form that you select. </p>
Unstack Data to Make Multiple Columns
<p>The data below show concession sales for different types of events held at a local theater. </p>
<p><img alt="stacked data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8ea617d9de8138f26f2da0f3f95f4b88/stackedata.png" style="width: 202px; height: 188px;" /></p>
<p><span style="line-height: 20.7999992370605px;">If we wanted to perform an analysis that requires each type of event to be in its own column, we can choose <strong>Data > Unstack Columns...</strong> and complete the dialog box as shown: </span></p>
<p><span style="line-height: 20.7999992370605px;"><img alt="unstack columns dialog" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fc098d3ddcbc21fe12602cb45336949c/unstack_columns.png" style="width: 350px; height: 263px;" /> </span></p>
<p>Minitab creates a new worksheet that contains a separate column of Concessions sales data for each type of event:</p>
<p><img alt="Unstacked Data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f24dd4ac29678e25069d299ccc13c535/unstacked_data.png" style="width: 400px; height: 150px;" /></p>
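<p>The idea behind unstacking can be sketched in Python: walk through the stacked column and its grouping column together, and collect each value under its group. The event types and sales figures below are made up, and the helper name <em>unstack</em> is hypothetical.</p>

```python
from collections import defaultdict


def unstack(values, groups):
    """Split one stacked column into a separate list for each group label."""
    columns = defaultdict(list)
    for value, group in zip(values, groups):
        columns[group].append(value)
    return dict(columns)


# Made-up concession sales and the event type for each row.
concessions = [410, 385, 620, 590, 305, 450]
event_type = ["Play", "Play", "Concert", "Concert", "Film", "Play"]

unstacked = unstack(concessions, event_type)
print(unstacked)
```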
Stack Data to Form a Single Column (with Grouping Variable)
<p>A similar tool will help you put data from separate columns into a single column for the type of analysis required. The data below show sales figures for four employees: </p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f546e2611e4fd6fe804de7c0aee3d230/stacked_data.png" style="width: 265px; height: 92px;" /></p>
<p>Select <strong>Data > Stack > Columns...</strong> and select the columns you wish to combine. Checking the "Use variable names in subscript column" will create a second column that identifies the person who made each sale. </p>
<p><img alt="Stack columns dialog" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a09dba196e68e5e75d0f248339a53e11/stack_data_dialog.jpg" style="width: 400px; height: 292px;" /></p>
<p>When you press OK, the sales data are stacked into a single column of measurements and ready for analysis, with Employee available as a grouping variable: </p>
<p><img alt="stacked columns" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/c26bec8bec9447ab1df6b9ad669d9a1a/stacked_columns.jpg" style="width: 138px; height: 181px;" /></p>
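<p>The reverse operation is just as easy to picture in code. The sketch below is a rough Python analogue, assuming four made-up employees and sales values; <code>stack</code> is a hypothetical helper, and the second element of each row stands in for the subscript column.</p>

```python
# Hypothetical wide data: one column of sales figures per employee.
wide = {
    "Ann": [210, 185, 240],
    "Bob": [190, 205, 215],
    "Cy": [175, 220, 230],
    "Dee": [260, 198, 207],
}


def stack(columns):
    """Return (value, source column name) rows, one per original cell."""
    return [(value, name) for name, values in columns.items() for value in values]


stacked_sales = stack(wide)
for sale, employee in stacked_sales:
    print(sale, employee)
```

<p>Keeping the column name beside every value is what makes the employee usable as a grouping variable later.</p>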
Sort Data to Make It More Manageable
<p>The following data appear in the worksheet in the order in which individual stores in a chain sent them into the central accounting system.</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/431dcae640fa0855a8db03b14bad3998/unsorted_data.jpg" style="width: 200px; height: 228px;" /></p>
<p>When the data appear in this uncontrolled order, finding an observation for any particular item, or from any specific store, would entail reviewing the entire list. We can fix that problem by selecting <strong>Data > Sort...</strong> and reordering the data by either store or item. </p>
<p><img alt="sorted data by item" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0c982bb11359a001c048cb6c39ab1f60/sorted_data_by_item.jpg" style="width: 221px; height: 246px;" /> <img alt="sorted data by store" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/53e9a3f22b4a959af11952995703d7d4/sorted_data_by_store.jpg" style="width: 209px; height: 248px;" /></p>
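<p>For comparison, here is a minimal Python sketch of the same two orderings. The stores, items, and counts are invented for the example; the only real machinery is the built-in <code>sorted</code> function with <code>operator.itemgetter</code> as the sort key.</p>

```python
from operator import itemgetter

# Hypothetical records in arrival order: (store, item, units sold).
records = [
    ("Store 3", "Socks", 14),
    ("Store 1", "Hats", 9),
    ("Store 2", "Socks", 21),
    ("Store 1", "Socks", 17),
    ("Store 3", "Hats", 5),
]

by_store = sorted(records, key=itemgetter(0, 1))  # store first, then item
by_item = sorted(records, key=itemgetter(1, 0))   # item first, then store

for row in by_item:
    print(row)
```

<p>Sorting on two keys keeps related rows together, so every observation for one item (or one store) sits in a single block.</p>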
Merge Multiple Worksheets
<p>What if you need to analyze information about the same items, but the data were recorded on separate worksheets? For instance, suppose one group gathered historical data about all of a corporation's manufacturing operations while another worked on strategic planning, and your analysis requires data from both. </p>
<p><img alt="two worksheets" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f63ed557c91fb6136b28ab43001b48b4/two_worksheets.png" style="width: 350px; height: 327px;" /></p>
<p>You can use <strong>Data > Merge Worksheets</strong> to bring the data together into a single worksheet, using the Division column to match the observations:</p>
<p><img alt="merging worksheets" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/651d3d676a4099a71eb180344d2e8282/merge_worksheets.png" style="width: 393px; height: 363px;" /></p>
<p>You can also choose whether or not multiple, missing, or unmatched observations will be included in the merged worksheet. </p>
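<p>To make the matching idea concrete, here is a small Python sketch. It assumes two invented tables keyed by division name; <code>merge</code> and its <code>include_unmatched</code> flag are hypothetical names for this example and do not correspond to Minitab's dialog options.</p>

```python
# Hypothetical worksheets, each keyed by division name.
history = {"East": {"plants": 4}, "West": {"plants": 6}, "North": {"plants": 2}}
planning = {"East": {"target": 120}, "West": {"target": 150}, "South": {"target": 90}}


def merge(left, right, include_unmatched=False):
    """Combine rows that share a key; optionally keep rows found in only one table."""
    if include_unmatched:
        keys = left.keys() | right.keys()
    else:
        keys = left.keys() & right.keys()
    return {key: {**left.get(key, {}), **right.get(key, {})} for key in sorted(keys)}


matched_only = merge(history, planning)
everything = merge(history, planning, include_unmatched=True)
print(matched_only)
print(everything)
```

<p>With unmatched rows included, North and South appear in the result with only the fields their own table supplied, which is the kind of gap you would want to know about before analyzing the merged data.</p>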
Reorganizing Data for Ease of Use and Clarity
<p>Making changes to the layout of your worksheet does entail a small investment of time, but it can bring big returns in making analyses quicker and easier to perform. The next time you're confronted with raw data that isn't ready to play nice, try some of these approaches to get it under control. </p>
<p>In my next post, I'll share some tips and tricks that can help you get more information out of your data.</p>
Data AnalysisStatisticsStatsWed, 22 Mar 2017 12:00:00 +0000http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-datas-a-mess2c-part-2Eston MartzWhat to Do When Your Data's a Mess, part 1
http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-data-is-a-mess-part-1
<p>Isn't it great when you get a set of data and it's perfectly organized and ready for you to analyze? I love it when the people who collect the data take special care to make sure to format it consistently, arrange it correctly, and eliminate the junk, clutter, and useless information I don't need. </p>
<p><img alt="Messy Data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ad531bc1c0dc575e774b7ecef670b231/messydata.png" style="border-width: 1px; border-style: solid; margin: 10px 15px; width: 250px; height: 248px; float: right;" />You've never received a data set in such perfect condition, you say?</p>
<p>Yeah, me neither. But I can dream, right? </p>
<p><span style="line-height: 1.6;">The truth is, when other people give me data, it's typically not ready to analyze. It's frequently messy, disorganized, and inconsistent. I get big headaches if I try to analyze it without doing a little clean-up work first. </span></p>
<p>I've talked with many people who've shared similar experiences, so I'm writing a series of posts on how to get your data in usable condition. In this first post, I'll talk about some basic methods you can use to make your data easier to work with. </p>
Preparing Data Is a Little Like Preparing Food
<p>I'm not complaining about the people who give me data. In most cases, they aren't statisticians and they have many higher priorities than giving me data in exactly the form I want. </p>
<p>The end result is that getting data is a little bit like getting food: it's not always going to be ready to eat when you pick it up. You don't eat raw chicken, and usually you can't analyze raw data, either. In both cases, you need to prepare it first or the results aren't going to be pretty. </p>
<p><span style="line-height: 1.6;">Here are a couple of very basic things to look for when you get a messy data set, and how to handle them. </span></p>
Kitchen-Sink Data and Information Overload
<p>Frequently I get a data set that includes a lot of information that I don't need for my analysis. I also get data sets that combine or group information in ways that make analyzing it more difficult. </p>
<p>For example, let's say I needed to analyze data about different types of events that take place at a local theater. Here's my raw data sheet: </p>
<p><img alt="April data sheet" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/14fe4e9930171f54848b589c0e8139d1/april_data_raw.png" style="width: 400px; height: 224px;" /></p>
<p>With each type of event jammed into a single worksheet, it's a challenge to analyze just one event category. What would work better? A separate worksheet for each type of occasion. In Minitab <a href="http://www.minitab.com/products/minitab">Statistical Software</a>, I can go to <strong>Data > Split Worksheet...</strong> and choose the Event column: </p>
<p><img alt="split worksheet" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/69c63e422339f9871ada5a244222dcfc/split_worksheet.png" style="width: 300px; height: 309px;" /></p>
<p>And Minitab will create new worksheets that include only the data for each type of event. </p>
<p><img alt="separate worksheets by event type" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8b97ea00ae39da8cb60e307ebe6140dc/separate_data_sheets.png" style="width: 300px; height: 243px;" /></p>
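<p>The same split can be sketched in Python by grouping whole rows on one column. The rows below are invented, and <code>split_by</code> is a hypothetical helper; the point is only that each row keeps all of its fields and lands in the sheet for its event type.</p>

```python
from collections import defaultdict

# Hypothetical raw sheet: every row keeps all of its fields.
rows = [
    {"Event": "Concert", "Attendance": 410, "Concessions": 1510.0},
    {"Event": "Play", "Attendance": 220, "Concessions": 640.0},
    {"Event": "Concert", "Attendance": 380, "Concessions": 1200.0},
    {"Event": "Film", "Attendance": 95, "Concessions": 310.0},
]


def split_by(rows, column):
    """Return one list of rows ("worksheet") per distinct value in the column."""
    sheets = defaultdict(list)
    for row in rows:
        sheets[row[column]].append(row)
    return dict(sheets)


sheets = split_by(rows, "Event")
for event, sheet in sheets.items():
    print(event, len(sheet), "rows")
```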
<p><span style="line-height: 20.7999992370605px;">Minitab also lets you merge worksheets to </span>combine items provided in separate data files. </p>
<p><span style="line-height: 1.6;">Let's say the data set you've been given contains a lot of columns that you don't need: irrelevant factors, redundant information, and the like. Those items just clutter up your data set, and getting rid of them will make it easier to identify and access the columns of data you actually need. </span><span style="line-height: 20.7999992370605px;">You can delete rows and columns you don't need, or use the</span><strong style="line-height: 20.7999992370605px;"> Data > Erase Variables</strong><span style="line-height: 20.7999992370605px;"> tool to make your worksheet more manageable. </span></p>
I Can't See You Right Now...Maybe Later
<p>What if you don't want to actually <em>delete </em>any data, but you only want to see the columns you intend to use? For instance, in the data below, I don't need the Date, Manager, or Duration columns now, but I may have use for them in the future: </p>
<p><img alt="unwanted columns" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/99d785a0b5ff0cbac36f0c6af05b1cac/unwantedcolumns.png" style="width: 400px; height: 225px;" /></p>
<p>I can select and right-click those columns, then use <strong>Column > Hide Selected Columns</strong> to make them disappear. </p>
<p><img alt="hide selected columns" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/00defa2646d5e100873ef2961d374ff0/hideselectedcolumns.png" style="width: 400px; height: 308px;" /></p>
<p>Voila! They're gone from my sight. Note how the displayed columns jump from C1 to C5, indicating that some columns are hidden: </p>
<p><img alt="hidden columns" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a140bb6413744b431460e70f523e5a0b/hiddencolumns.png" style="width: 323px; height: 138px;" /></p>
<p>It's just as easy to bring those columns back into the limelight. When I want them to reappear, I select the C1 and C5 columns, right-click, and choose "Unhide Selected Columns." </p>
<p>Data may arrive in a disorganized and messy state, but you don't need to keep it that way. Getting rid of extraneous information and choosing the elements that are visible can make your work much easier. But that's just the tip of the iceberg. In my next post, I'll cover some more <a href="http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-datas-a-mess2c-part-2">ways to make unruly data behave</a>. </p>
Data AnalysisStatisticsWed, 15 Mar 2017 14:52:00 +0000http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-data-is-a-mess-part-1Eston MartzP-value Roulette: Making Hypothesis Testing a Winner’s Game
http://blog.minitab.com/blog/rkelly/p-value-roulette-making-hypothesis-testing-a-winner%E2%80%99s-game
<p>Welcome to the Hypothesis Test Casino! The featured game of the house is roulette. But this is no <em>ordinary</em> game of roulette. This is p-value roulette!</p>
<p>Here’s how it works: We have two roulette wheels, the Null wheel and the Alternative wheel. Each wheel has 20 slots (instead of the usual 37 or 38). You get to bet on one slot.</p>
<p><img alt="http://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Edvard_Munch_-_At_the_Roulette_Table_in_Monte_Carlo_-_Google_Art_Project.jpg/256px-Edvard_Munch_-_At_the_Roulette_Table_in_Monte_Carlo_-_Google_Art_Project.jpg" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8647ae2930d63e128d09f0b2cc5cdb87/p_value_roulette.jpg" style="line-height: 20.7999992370605px; border-width: 1px; border-style: solid; margin: 10px 15px; width: 256px; height: 166px; float: right;" /></p>
<p>What happens if the ball lands in the slot you bet on? Well, that depends on which wheel we spin. If we spin the Null wheel, you lose your bet. But if we spin the Alternative wheel, you win!</p>
<p>I’m sorry, but we can’t tell you which wheel we’re spinning.</p>
<p>Doesn’t that sound like a good game?</p>
<p>Not convinced yet? I assure you the odds are in your favor <em>if </em>you choose your slot wisely. Look, I’ll show you a graph of some data from the Null wheel. We spun it 10,000 times and counted how many times the ball landed in each slot. As you can see, each slot is just as likely as any other, with a probability of about 0.05 each. That means there’s a 95% probability the ball won’t land on your slot, so you have only a 5% chance of losing, no matter what, <em>if</em> we happen to spin the Null wheel.</p>
<p><img alt="histogram of p values for null hypothesis" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dc5efcd7001f33a77bea1c635af837e5/histogram_of_p_values_null_hypothesis.png" style="width: 576px; height: 384px;" /></p>
<p>What about that Alternative wheel, you ask? Well, we’ve had quite a few different Alternative wheels over the years. Here’s a graph of some data from one we were spinning last year:</p>
<p><img alt="histogram of p values from alternative hypothesis" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dd0cafe3375f3202adaf3542d15eb9ab/histogram_of_p_values_alternative_hypothesis.png" style="width: 576px; height: 384px;" /></p>
<p>And just a few months ago, we had a different one. Check out the data from this one. It was very, very popular.</p>
<p><img alt=" histogram of p-values from popular alternative hypothesis" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fc6f0ff641e7eb4d3f7750c8163ac968/histogram_of_p_values_alternative_hypothesis_2.png" style="width: 576px; height: 384px;" /></p>
<p>Now that’s what I call an Alternative! People in the know always picked the first slot. You can see why.</p>
<p>I’m not allowed to show you data from the current game. But I assure you the Alternatives all follow this same pattern. They tend to favor those smaller numbers.</p>
<p>So, you’d like to play? Great! Which slot would you like to bet on?</p>
Is this on the level?
<p>No, I don’t really have a casino with two roulette wheels. My graphs are simulated p-values for a <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-is-a-t-test-and-why-is-it-like-telling-a-kid-to-clean-up-that-mess-in-the-kitchen">1-sample t-test</a>. The null hypothesis is that the mean of a process or population is 5. The two-sided alternative is that the mean is different from 5. In my first graph, the null hypothesis was true: I used Minitab to generate random samples of size 20 from a normal distribution with mean 5 and standard deviation of 1. For the other two graphs, the only thing I changed was the mean of the normal distribution I sampled from. For the second graph, the mean was 5.3. For the final graph, the mean was 5.75.</p>
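<p>If you'd like to spin the wheels yourself without Minitab, a rough Python sketch follows. One important swap: to stay within the standard library it uses a z-test that treats the standard deviation of 1 as known, not the 1-sample t-test used for the graphs, so the p-values will differ slightly. The function names, the seed, and the number of runs are all choices made for this example.</p>

```python
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()


def z_test_p_value(sample, null_mean=5.0, sigma=1.0):
    """Two-sided p-value for H0: mean == null_mean, treating sigma as known."""
    z = (mean(sample) - null_mean) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def simulate_p_values(true_mean, runs=10_000, n=20, seed=1):
    """Draw `runs` normal samples of size n and return one p-value per sample."""
    rng = random.Random(seed)
    return [
        z_test_p_value([rng.gauss(true_mean, 1.0) for _ in range(n)])
        for _ in range(runs)
    ]


def share_below(p_values, alpha=0.05):
    """Fraction of p-values that fall in the first 'slot' (below alpha)."""
    return sum(p < alpha for p in p_values) / len(p_values)


for true_mean in (5.0, 5.3, 5.75):
    share = share_below(simulate_p_values(true_mean))
    print(f"true mean {true_mean}: share of p-values below 0.05 = {share:.3f}")
```

<p>When the true mean is 5, roughly one p-value in twenty should land below 0.05; as the true mean moves away from 5, that first slot should fill up quickly.</p>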
<p>For just about any hypothesis test you do in Minitab <a href="http://www.minitab.com/products/minitab">Statistical Software</a>, you will see a p-value. Once you understand how p-values work, you will have greater insight into what they are telling you. Let’s see what we can learn about p-values from playing p-value roulette.</p>
<ol>
<li>Just as you didn’t know whether we were spinning the Null or the Alternative wheel, you don’t know for sure whether the null hypothesis is true or not. But basing your decision to reject the null hypothesis on the p-value favors your chance of making a good decision.<br />
</li>
<li>If the null hypothesis is true, then any p-value is just as likely as any other. You control the probability of making a Type I error by rejecting only when the p-value falls within a narrow range, typically 0.05 or smaller. A <a href="http://blog.minitab.com/blog/the-stats-cat/understanding-type-1-and-type-2-errors-from-the-feline-perspective-all-mistakes-are-not-equal">Type I error</a> occurs if you incorrectly reject a true null hypothesis.<br />
</li>
<li>If the alternative hypothesis is true, then smaller p-values become more likely and larger p-values become less likely. That’s why you can think of a small p-value as evidence in favor of the alternative hypothesis.<br />
</li>
<li>It is tempting to try to interpret the p-value as the probability that the null hypothesis is true. But that’s not what it is. The null hypothesis is either true, or it’s not. Each time you “spin the wheel” the ball will land in a different slot, giving you a different p-value. But the truth of the null hypothesis—or lack thereof—remains unchanged.<br />
</li>
<li>In the roulette analogy there were different alternative wheels, because there is not usually just a single alternative condition. There are infinitely many mean values that are not equal to 5; my graphs looked at just two of these.<br />
</li>
<li>The probability of rejecting the null hypothesis when the alternative hypothesis is true is called the power of the test. In the 1-sample t-test, the power depends on how different the mean is from the null hypothesis value, relative to the standard error. While you don’t control the true mean, you can reduce the standard error by taking a larger sample. This will give the test greater power.<br />
</li>
</ol>
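<p>The last point, that a larger sample buys more power, can be checked with a short calculation. The Python sketch below again assumes a z-test with a known standard deviation of 1, which is a simplification of the t-test discussed above, and <code>z_test_power</code> is a hypothetical helper name.</p>

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_test_power(true_mean, n, null_mean=5.0, sigma=1.0, alpha=0.05):
    """Probability that a two-sided z-test rejects H0 at significance level alpha."""
    critical = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Distance between the true mean and the null value, in standard errors.
    shift = (true_mean - null_mean) / (sigma / n ** 0.5)
    return STD_NORMAL.cdf(-critical + shift) + STD_NORMAL.cdf(-critical - shift)


for n in (10, 20, 50, 100):
    print(f"n = {n:3d}: power to detect a true mean of 5.3 = {z_test_power(5.3, n):.3f}")
```

<p>When the true mean equals the null value, the "power" is simply alpha, the Type I error rate. For any other true mean, it climbs toward 1 as the sample size grows.</p>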
You Too Can Be a Winner!
<p>To be a winner at p-value roulette, you need to make sure you are performing the right hypothesis test, and that your data fit the assumptions of that test. Minitab’s <a href="http://www.minitab.com/en-us/products/minitab/assistant/">Assistant menu</a> can help you with that. The Assistant helps you choose the right statistical analysis and provides easy-to-understand guidelines to walk you through data collection and analysis. Then it gives you clear graphical output to let you know how to interpret your p-value, while helping you evaluate whether your data are appropriate, so you can trust your results.</p>
Hypothesis TestingStatisticsStatistics HelpStatsMon, 06 Mar 2017 13:00:00 +0000http://blog.minitab.com/blog/rkelly/p-value-roulette-making-hypothesis-testing-a-winner%E2%80%99s-gameRob KellyCreating and Reading Statistical Graphs: Trickier than You Think
http://blog.minitab.com/blog/understanding-statistics/creating-and-reading-statistical-graphs-trickier-than-you-think
<p>My colleague Cody Steele wrote a post that illustrated <a href="http://blog.minitab.com/blog/statistics-and-quality-improvement/how-painful-does-the-income-gap-look-to-you">how the same set of data can appear to support two contradictory positions</a>. He showed how changing the scale of a graph that displays mean and median household income over time drastically alters the way it can be interpreted, even though there's no change in the data being presented.</p>
<p><img alt="Graph interpretation is tricky, especially if you're doing it quickly" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f594d20f8daa8e00e29380f68010b1cc/hunh.jpg" style="margin: 10px 15px; float: right; width: 200px; height: 200px;" /> When we analyze data, we need to present the results in an objective, honest, and fair way. That's the catch, of course. What's "fair" can be debated...and that leads us straight into "Lies, damned lies, and statistics" territory. </p>
<p><span style="line-height: 20.7999992370605px;">Cody's post got me thinking about the importance of statistical literacy, especially in a mediascape saturated with overhyped news reports about seemingly every new study, not to mention omnipresent "infographics" of frequently dubious origin and intent.</span></p>
<p><span style="line-height: 20.7999992370605px;">As consumers and providers of statistics, can we trust our own impressions of the information we're bombarded with on a daily basis? It's an increasing challenge, even for the statistics-savvy. </span></p>
So Much Data, So Many Graphs, So Little Time
<p>The increased amount of information available, combined with the acceleration of the news cycle to speeds that wouldn't have been dreamed of a decade or two ago, means we have less time available to absorb and evaluate individual items critically. </p>
<p>A half-hour television news broadcast might include several animations, charts, and figures based on the latest research, or polling numbers, or government data. They'll be presented for several seconds at most, then it's on to the next item. </p>
<p>Getting news online is even more rife with opportunities for split-second judgment calls. We scan through the headlines and eyeball the images, searching for stories interesting enough to click on. But with 25 interesting stories vying for your attention, and perhaps just a few minutes before your next appointment, you race through them very quickly. </p>
<p>But when we see graphs for a couple of seconds, do we really absorb their meaning completely and accurately? Or are we susceptible to misinterpretation? </p>
<p>Most of the graphs we see are very simple: bar charts and pie charts predominate. But as statistics educator Dr. Nic points out in <a href="http://learnandteachstatistics.wordpress.com/2012/07/16/tricky_graphs/" style="line-height: 1.6;">this blog post</a>, interpreting even simple bar charts can be a deceptively tricky business. I've adapted her example to demonstrate this below. </p>
Which Chart Shows Greater Variation?
<p>A city surveyed residents of two neighborhoods about the quality of service they get from local government. Respondents were asked to rate local services on a scale of 1 to 10. Their responses were charted using Minitab <a href="http://www.minitab.com/products/minitab">Statistical Software</a>, as shown below. </p>
<p>Take a few seconds to scan the charts, then choose which neighborhood's responses exhibit more variation: Ferndale or Lawnwood?</p>
<p><img alt="Lawnwood Bar Chart" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f88262f2732bc43e8ac0b919d43139a5/lawnwoodbarchart.gif" style="width: 500px; height: 333px;" /></p>
<p><img alt="Ferndale Bar Chart" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/67ee1909a89236e3caac2d11a9d42795/ferndalebarchart.gif" style="width: 500px; height: 333px;" /></p>
<p>Seems pretty straightforward, right? Lawnwood's graph is quite spiky and disjointed, with sharp peaks and valleys. The graph of Ferndale's responses, on the other hand, looks nice and even. Each bar's roughly the same height. </p>
<p>It looks like Lawnwood's responses have more variation. But let's verify that impression with some basic descriptive statistics about each neighborhood's responses:</p>
<p style="margin-left: 40px;"><img alt="Descriptive Statistics for Ferndale and Lawnwood" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1eeed755d2a0baea0939dc7ccecacaea/descriptive_statistics.gif" style="width: 369px; height: 105px;" /></p>
<p>Uh-oh. A glance at the graphs suggested that Lawnwood has more variation, but the analysis demonstrates that Ferndale's variation is, in fact, much higher. How did we get this so wrong? </p>
Frequencies, Values, and Counterintuitive Graphs
<p><span style="line-height: 1.6;">The answer lies in how the data were presented. The charts above show frequencies, or counts, rather than individual responses. </span></p>
<p><span style="line-height: 1.6;">What if we graph the individual responses for each neighborhood? </span></p>
<p><img alt="Lawnwood Individuals Chart" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d8e91ae6c007e8f5327c54ac3ec65604/lawnwoodindividualsbarchart.gif" style="width: 500px; height: 333px;" /></p>
<p><img alt="Ferndale Individuals Chart" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/4c01c68dbb96e2126a1fd313ee38e001/ferndaleindividualsbarchart.gif" style="width: 500px; height: 333px;" /></p>
<p>In <em>these </em>graphs, it's easy to see that the responses of Ferndale's citizens had much more variation than those of Lawnwood. But unless you appreciate the differences between values and frequencies—and paid careful attention to how the first set of graphs was labeled—a quick look at the earlier graphs could well leave you with the wrong conclusion. </p>
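<p>The difference between the two kinds of spread is easy to reproduce with a few lines of Python. The counts below are invented to mimic a spiky chart and a flat one; they are not the survey data behind the graphs, and the helper names are hypothetical.</p>

```python
from statistics import pstdev

# Hypothetical counts of respondents per rating (1 to 10), invented for illustration.
spiky_counts = {1: 0, 2: 1, 3: 0, 4: 12, 5: 2, 6: 15, 7: 1, 8: 9, 9: 0, 10: 0}
flat_counts = {rating: 4 for rating in range(1, 11)}


def expand(counts):
    """Turn a rating -> count table back into the list of individual ratings."""
    return [rating for rating, count in counts.items() for _ in range(count)]


def spread_of_ratings(counts):
    """Standard deviation of the responses themselves."""
    return pstdev(expand(counts))


def spread_of_bar_heights(counts):
    """Standard deviation of the bar heights, which is what the eye picks up."""
    return pstdev(list(counts.values()))


for name, counts in (("spiky", spiky_counts), ("flat", flat_counts)):
    print(
        f"{name}: spread of ratings = {spread_of_ratings(counts):.2f}, "
        f"spread of bar heights = {spread_of_bar_heights(counts):.2f}"
    )
```

<p>The flat chart has identical bar heights, yet its ratings are spread across the whole 1 to 10 scale. The spiky chart has wildly different bar heights, yet most of its ratings cluster in the middle.</p>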
Being Responsible
<p>Since you're reading this, you probably both create and consume data analysis. You may generate your own reports and charts at work, and see the results of other people's analyses on the news. We should approach both situations with a certain degree of responsibility. </p>
<p>When looking at graphs and charts produced by others, we need to avoid snap judgments. We need to pay attention to what the graphs really show, and take the time to draw the right conclusions based on how the data are presented. </p>
<p>When sharing our own analyses, we have a responsibility to communicate clearly. In the frequency charts above, the X and Y axes are labeled adequately—but couldn't they be more explicit? Instead of just "Rating," couldn't the label read "Count for Each Rating" or some other, more meaningful description? </p>
<p>Statistical concepts may seem like common knowledge if you've spent a lot of time working with them, but many people aren't clear on ideas like "correlation is not causation" and margins of error, let alone the nuances of statistical assumptions, distributions, and significance levels.</p>
<p>If your audience includes people without a thorough grounding in statistics, are you going the extra mile to make sure the results are understood? For example, many expert statisticians have told us they use <a href="http://www.minitab.com/products/minitab/assistant/">the Assistant</a> in Minitab Statistical Software to present their results precisely because it's designed to communicate the outcome of analysis clearly, even for statistical novices. </p>
<p><span style="line-height: 20.7999992370605px;">If you're already doing everything you can to make statistics accessible to others, kudos to you. </span><span style="line-height: 20.7999992370605px;">And if you're not, why aren't you? </span></p>
Data AnalysisStatisticsStatistics in the NewsStatsWed, 01 Mar 2017 13:30:00 +0000http://blog.minitab.com/blog/understanding-statistics/creating-and-reading-statistical-graphs-trickier-than-you-thinkEston MartzThree Common P-Value Mistakes You'll Never Have to Make
http://blog.minitab.com/blog/understanding-statistics/three-common-p-value-mistakes-youll-never-have-to-make
<p>Statistics can be challenging, especially if you're not analyzing data and interpreting the results every day. <a href="http://www.minitab.com/products/minitab/" title="statistical software for analyzing quality data">Statistical software</a> makes things easier by handling the arduous mathematical work involved in statistics. But ultimately, we're responsible for correctly interpreting and communicating what the results of our analyses show.</p>
<p>The p-value is probably the most frequently cited statistic. We use p-values to interpret the results of regression analysis, hypothesis tests, and many other methods. Every introductory statistics student and every Lean Six Sigma Green Belt learns about p-values. </p>
<p>Yet this common statistic is misinterpreted so often that at least one scientific journal has abandoned its use.</p>
What Does a P-value Tell You?
<p>Typically, a p-value is defined as "the probability of observing an effect at least as extreme as the one in your sample data, <em>if the <span><a href="http://blog.minitab.com/blog/understanding-statistics/why-shrewd-experts-fail-to-reject-the-null-every-time">null hypothesis</a></span> is true</em>." Thus, the only question a p-value can answer is this one:</p>
<p><em>How likely is it that I would get the data I have, assuming the null hypothesis is true?</em></p>
<p>If your p-value is less than your selected <span><a href="http://blog.minitab.com/blog/adventures-in-statistics-2/understanding-hypothesis-tests%3A-significance-levels-alpha-and-p-values-in-statistics">alpha level</a></span> (typically 0.05), you <em>reject the null hypothesis</em> in favor of the alternative hypothesis. If the p-value is above your alpha value, you <em>fail to reject</em> the null hypothesis. It's important to note that the null hypothesis is never accepted; we can only <em>reject </em>or <em>fail to reject</em> it. </p>
The P-Value in a 2-Sample t-Test
<p>Consider a typical hypothesis test—say, a 2-sample t-test of the mean weight of boxes of cereal filled at different facilities. We collect and weigh 50 boxes from each facility to confirm that the mean weight for each line's boxes is the listed package weight of 14 oz. </p>
<p>Our null hypothesis is that the two means are equal. Our alternative hypothesis is that they are <em>not </em>equal. </p>
<p>To run this test in Minitab, we enter our data in a worksheet and select <strong>Stat > Basic Statistics > 2-Sample T-test</strong>. If you'd like to follow along, you can download the <a href="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2edc594cf40ec4931e5cd0021df6703e/cereal_weight.mtw">data</a> and, if you don't already have it, get the <a href="http://www.minitab.com/products/minitab/free-trial/">30-day trial of Minitab</a>. In the t-test dialog box, select<em> Both samples are in one column</em> from the drop-down menu, and choose "Weight" for Samples, and "Facility" for Sample IDs.</p>
<p style="margin-left: 40px;"><img alt="t test for the mean" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1a090752bef395f3b227511c6e57946d/dialog.png" style="width: 424px; height: 296px;" /></p>
<p>Minitab gives us the following output, and I've highlighted the p-value for the hypothesis test:</p>
<p style="margin-left: 40px;"><img alt="t-test output" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3b27f14d1859460a1875c81384c52ccb/t_test_output.png" style="width: 544px; height: 222px;" /></p>
<p>So we have a p-value of 0.029, which is less than our selected alpha value of 0.05. Therefore, we reject the null hypothesis that the means of Line A and Line B are equal. Note also that while the evidence indicates the means are different, that difference is estimated at 0.338 oz—a pretty small amount of cereal. </p>
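<p>For readers curious about the arithmetic behind output like this, here is a hedged Python sketch. It computes Welch's form of the two-sample t statistic (no equal-variance assumption, which is my choice for the example) and, to stay within the standard library, approximates the p-value with the normal distribution. The weights are randomly generated stand-ins, not the cereal data linked above, so the numbers will not match Minitab's output.</p>

```python
import random
from statistics import NormalDist, mean, variance


def welch_test(a, b):
    """Return (difference in means, t statistic, approximate two-sided p-value)."""
    diff = mean(a) - mean(b)
    std_error = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    t = diff / std_error
    # Assumption: with about 50 boxes per group, the normal distribution is close
    # to Student's t, so this p-value is an approximation (slightly too small).
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return diff, t, p


rng = random.Random(7)
# Hypothetical box weights in ounces, generated only for this illustration.
facility_a = [rng.gauss(14.2, 0.8) for _ in range(50)]
facility_b = [rng.gauss(13.9, 0.8) for _ in range(50)]

diff, t, p = welch_test(facility_a, facility_b)
print(f"difference = {diff:.3f} oz, t = {t:.2f}, approximate p = {p:.3f}")
```

<p>Reporting the estimated difference next to the p-value keeps the size of the effect in view, not just whether it is statistically significant.</p>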
<p>So far, so good. But this is the point at which trouble often starts.</p>
Three Frequent Misstatements about P-Values
<p>The p-value of 0.029 means we reject the null hypothesis that the means are equal. But that doesn't mean any of the following statements are accurate:</p>
<ol>
<li><strong>"There is 2.9% probability the means are the same, and 97.1% probability they are different." </strong><br />
We don't know that at all. The p-value only says that <strong><em>if </em></strong>the null hypothesis is true, the sample data collected would exhibit a difference this large or larger only 2.9% of the time. Remember that the p-value doesn't tell you anything <em>directly </em>about what you've seen. Instead, it tells you the <em>odds </em>of seeing it. </li>
<br />
<li><strong>"The p-value is low, which indicates there's an important difference in the means." </strong><br />
Based on the 0.029 p-value shown above, we can conclude that a statistically significant difference between the means exists. But the estimated size of that difference is less than a half-ounce, and won't matter to customers. A p-value may indicate a difference exists, but it tells you nothing about its practical impact.</li>
<br />
<li><strong>"The low p-value shows the alternative hypothesis is true."</strong><br />
A low p-value provides statistical evidence to reject the null hypothesis, but that doesn't prove the truth of the alternative hypothesis. If your alpha level is 0.05, there's a 5% chance you will incorrectly reject the null hypothesis when it is actually true. The same caution applies in the other direction: if a jury fails to convict a defendant, it doesn't prove the defendant is <em>innocent</em>: it only means the prosecution failed to prove the defendant's guilt beyond a reasonable doubt. </li>
</ol>
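<p>A quick simulation can make the first and third points concrete. The sketch below is an illustration under stated assumptions, not part of the original analysis: it draws both samples from the same made-up population (mean 16 oz, standard deviation 1 oz), so the null hypothesis is true in every trial, and for simplicity it uses a z-test with a known standard deviation rather than the t-test. The function names are invented for this example.</p>

```python
import random
from statistics import NormalDist, fmean


def z_test_p(a, b, sigma):
    """Two-sided p-value for a difference in means, known common sigma."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(trials=4000, n=20, alpha=0.05, seed=0):
    """Share of trials that reject the null even though it is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Both "lines" come from the same population: the null is true.
        a = [rng.gauss(16.0, 1.0) for _ in range(n)]
        b = [rng.gauss(16.0, 1.0) for _ in range(n)]
        if z_test_p(a, b, sigma=1.0) < alpha:
            rejections += 1
    return rejections / trials


rate = false_positive_rate()
print(f"Rejected a true null hypothesis in {rate:.1%} of trials")
```

<p>The share of rejections should land close to the alpha level. That is the probability of seeing a low p-value <em>given</em> that the null hypothesis is true, which is not the same as the probability that the null hypothesis is true given a low p-value.</p>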
<p>These misinterpretations happen frequently enough to be a concern, but that doesn't mean that we shouldn't use p-values to help interpret data. The p-value remains a very useful tool, as long as we're interpreting and communicating its significance accurately.</p>
P-Value Results in Plain Language
<p>It's one thing to keep all of this straight if you're doing data analysis and statistics all the time. It's another thing if you only analyze data occasionally, and need to do many other things in between, like most of us. "Use it or lose it" is certainly true about statistical knowledge, which could well be another factor that contributes to misinterpreted p-values. </p>
<p>If you're leery of that happening to you, a good way to avoid that possibility is to use the Assistant in Minitab to perform your analyses. If you haven't used it yet, the Assistant menu guides you through your analysis from start to finish. The dialog boxes and output are all in plain language, so it's easy to figure out what you need to do and what the results mean, even if it's been a while since your last analysis. (But even expert statisticians tell us they like using the Assistant because the output is so clear and easy to understand, regardless of an audience's statistical background.) </p>
<p>So let's redo the analysis above using the Assistant, to see what that output looks like and how it can help you avoid misinterpreting your results—or having them be misunderstood by others!</p>
<p>Start by selecting <strong>Assistant > Hypothesis Test...</strong> from the Minitab menu. Note that a window pops up to explain exactly what a hypothesis test does. </p>
<p style="margin-left: 40px;"><img alt="assistant hypothesis test" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f26601f26db3576a7cf2b5bc3178f9ca/assistant_hypothesis_test.png" style="width: 420px; height: 252px;" /></p>
<p>The Assistant asks what we're trying to do, and gives us three options to choose from.</p>
<p style="margin-left: 40px;"><img alt="hypothesis test chooser" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fba2ee28b10063e1c5f0f00eb77db1b2/assistant_hypothesis_test_chooser.png" style="width: 600px; height: 472px;" /></p>
<p>We know we want to compare a sample from Line A with a sample from Line B, but what if we can't remember which of the 5 available tests is the appropriate one in this situation? We can get guidance by clicking "Help Me Choose."</p>
<p style="margin-left: 40px;"><img alt="help me choose the right hypothesis test" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/51bb23fbb44603efff50fe4fa1d9dbd1/assistant_hypothesis_test_decision_tree.png" style="width: 700px; height: 551px;" /></p>
<p>The choices on the diagram direct us to the appropriate test. In this case, we choose continuous data instead of attribute (and even if we'd forgotten the difference, clicking on the diamond would explain it). We're comparing two means instead of two standard deviations, and we're measuring two different sets of items since our boxes came from different production lines. </p>
<p>Now we know what test to use, but suppose you want to make sure you don't miss anything that's important about the test, like requirements that must be met? Click the "more..." link and you'll get those details. </p>
<p style="margin-left: 40px;"><img alt="more info about the 2-Sample t-Test" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1b4f09a2438b0aaef14e8da6564524cf/assistant_hypothesis_test_more_info.png" style="width: 700px; height: 526px;" /></p>
<p>Now we can proceed to the Assistant's dialog box. Again, statistical jargon is minimized and everything is put in straightforward language. We just need to answer a few questions, as shown. Note that the Assistant even lets us tell it how big a difference needs to be for us to consider it practically important. In this case, we'll enter 2 ounces.</p>
<p style="margin-left: 40px;"><img alt="Assistant 2-sample t-Test dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/994d9172bf788282258f765d4d08aefa/assistant_hypothesis_test_dialog.png" style="width: 641px; height: 495px;" /></p>
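<p>The distinction between statistical and practical significance can be written down as two separate checks. The small Python sketch below is only an illustration (the <code>interpret</code> function is hypothetical, not something the Assistant exposes). It uses the numbers from this example: a p-value of 0.029, an estimated difference of 0.338 oz, an alpha of 0.05, and a practical threshold of 2 oz. Comparing the point estimate to the threshold is a simplification; a confidence interval for the difference would give a fuller picture.</p>

```python
def interpret(p_value, estimated_diff, alpha=0.05, practical_diff=2.0):
    """Return (statistically_significant, practically_important)."""
    statistically_significant = p_value < alpha
    practically_important = abs(estimated_diff) >= practical_diff
    return statistically_significant, practically_important


significant, important = interpret(p_value=0.029, estimated_diff=0.338)
print(f"Statistically significant: {significant}")
print(f"Practically important:     {important}")
```

<p>With these inputs the first check passes and the second does not, which is exactly the situation described in the second misstatement above.</p>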
<p>When we press OK, the Assistant performs the t-test and delivers three reports. The first of these is a summary report, which includes summary statistics, confidence intervals, histograms of both samples, and more. And interpreting the results couldn't be more straightforward than what we see in the top left quadrant of the diagram. In response to the question, "Do the means differ?" we can see that p-value of 0.029 marked on the bar, very far toward the "Yes" end of the scale. </p>
<p style="margin-left: 40px;"><img alt="2-Sample t-Test summary report" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8927b8bc833551678715f68149dd18ad/assistant_hypothesis_test_summary.png" style="width: 700px; height: 526px;" /></p>
<p>Next is the Diagnostic Report, which provides additional information about the test. </p>
<p style="margin-left: 40px;"><img alt="2-Sample t-Test diagnostic report" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6467a0be0ba60329f2be282e14b9be33/assistant_hypothesis_test_diagnostic.png" style="width: 700px; height: 526px;" /></p>
<p>In addition to letting us check for outliers, the diagnostic report shows us the size of the observed difference, as well as the chances that our test could detect a practically significant difference of 2 oz. </p>
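<p>To give a feel for what "the chances that our test could detect a difference" means, here is a rough power calculation in Python. It is a sketch based on a normal approximation, not the method Minitab uses, and the standard deviation and sample size passed in are placeholder values chosen for illustration, not figures from the cereal data. The <code>approx_power</code> name is likewise invented for this example.</p>

```python
from statistics import NormalDist


def approx_power(diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of means.

    Uses a normal approximation with a common standard deviation `sd`
    and equal group sizes.
    """
    norm = NormalDist()
    se = sd * (2 / n_per_group) ** 0.5     # standard error of the difference
    z_crit = norm.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = abs(diff) / se                 # true difference in SE units
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


# Placeholder inputs: sd and n_per_group are assumptions for illustration.
for n in (5, 10, 20):
    power = approx_power(diff=2.0, sd=1.5, n_per_group=n)
    print(f"n per line = {n:2d}: approximate power to detect 2 oz = {power:.3f}")
```

<p>Power rises with sample size and with the size of the difference you care about, and falls as the data get noisier.</p>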
<p>The final piece of output the Assistant provides is the report card, which flags any problems or concerns about the test that we would need to be aware of. In this case, all of the boxes are green and checked (instead of red and x'ed). </p>
<p style="margin-left: 40px;"><img alt="2-Sample t-Test report card" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0e4cd0dce832a8251701f8175de9a037/assistant_hypothesis_test_report_card.png" style="width: 700px; height: 526px;" /></p>
<p>When you're not doing statistics all the time, the Assistant makes it a breeze to find the right analysis for your situation and to make sure you interpret your results the right way. Using it is a great way to make sure you're not attaching too much, or too little, importance to the results of your analyses.</p>
Hypothesis Testing, Statistics, Statistics Help, Stats | Wed, 22 Feb 2017 14:00:00 +0000 | http://blog.minitab.com/blog/understanding-statistics/three-common-p-value-mistakes-youll-never-have-to-make | Eston Martz