Minitab | Minitab
Blog posts and articles about using Minitab software in quality improvement projects, research, and more.
http://blog.minitab.com/blog/minitab/rss
Wed, 01 Jun 2016 05:21:01 +0000
FeedCreator 1.7.3
A Six Sigma Healthcare Project, part 1: Examining Factors with a Pareto Chart
http://blog.minitab.com/blog/understanding-statistics/a-six-sigma-healthcare-project-part-1-examining-factors-with-a-pareto-chart
<p>Over the past year I've been able to work with and learn from practitioners and experts who are using data analysis and Six Sigma to improve the quality of healthcare, both in terms of operational efficiency and better patient outcomes. I've been struck by how frequently a very basic analysis can lead to remarkable improvements, but some insights cannot be attained without conducting more sophisticated analyses. One such situation is covered in a 2011 <em>Quality Engineering</em> article on the application of <a href="http://dx.doi.org/10.1080/08982112.2011.553761">binary logistic regression in a healthcare Six Sigma project</a>.</p>
<p>In this series of blog posts, I'll follow the path of the project discussed in that article and show you how to perform the analyses described using Minitab Statistical Software. (I am using simulated data, so my analyses will not match those in the original article.)</p>
The Six Sigma Project Goal
<p>The goal of this Six Sigma project was to attract and retain more patients in a hospital's cardiac rehabilitation program. On being discharged, heart-surgery patients are advised to join this program, which offers psychological support and guidance on a healthy diet and lifestyle. Program participants also have two or three physical therapy sessions per week, for up to 45 sessions.</p>
<p>An average of 33 new patients begin participating in the program per month, and participants attend an average of 29 sessions. But many discharged patients do not enroll in the program, and many who do drop out before they complete it. Greater rates of participation would benefit individual patients' health and increase the hospital's revenues.</p>
<p>The project team identified two critical metrics they might improve:</p>
<ul>
<li>The number of patients participating in the program each month</li>
<li>The number of therapy sessions for each participant</li>
</ul>
<p>The team set a goal to increase the average number of new participants to 36 per month, and to increase the average number of sessions each patient attends to 32.</p>
Available Patient Data
<p>Existing data on the hospital's cardiac patients includes:</p>
<ul>
<li>The distance between each patient's home and the hospital</li>
<li>Patient's age and gender</li>
<li>Whether or not the patient has access to a car</li>
<li>Whether or not the patient participated in the rehabilitation program</li>
</ul>
<p>To illustrate the analyses conducted for this project, we will use a simulated set of data for 500 patients. Download the <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/521a4efabc5f892eef403e8c4a354f9a/cardiacparticipationdata_1.mtw">data set</a> to follow along and try these analyses yourself. If you don't already have Minitab, you can <a href="http://www.minitab.com/products/minitab/free-trial/">download and use our statistical software free for 30 days</a>.</p>
Exploring Why Patients Leave the Program with a Pareto Chart
<p>Encouraging patients who start the program to complete it, or at least to attend a greater number of sessions, has the potential to be a quick and easy "win," so the project team began by looking at why 156 patients who started the program eventually dropped out.</p>
<p>The reasons patients gave for dropping out of the rehabilitation program were placed into several different categories, then visualized with a <span><a href="http://blog.minitab.com/blog/understanding-statistics/explaining-quality-statistics-so-your-boss-will-understand-pareto-charts">Pareto chart</a></span>.</p>
<p>The Pareto chart is a must-have in any analyst’s toolbox. The Pareto principle states that about 80% of outcomes come from about 20% of the possible causes. By plotting the frequencies and corresponding percentages of a categorical variable in descending order, a Pareto chart helps identify the "vital few," the "20%" that really matter, so you can focus your efforts where they can make the most difference.</p>
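<p>The arithmetic behind a Pareto chart is simple enough to sketch outside Minitab. The Python snippet below is only an illustration: the function name <code>pareto_table</code> and the list of reasons are made up for this example and are not the article's data. It counts each category, sorts from most to least frequent, and accumulates the percentages.</p>

```python
from collections import Counter

def pareto_table(observations):
    """Return (category, count, percent, cumulative_percent) rows, largest first."""
    counts = Counter(observations)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for category, count in counts.most_common():
        cumulative += count
        rows.append((category, count, 100 * count / total, 100 * cumulative / total))
    return rows

# Made-up dropout reasons, for illustration only (not the article's data).
reasons = (["Readmitted"] * 8 + ["Schedule conflict"] * 6 +
           ["Medical"] * 3 + ["Own facilities"] * 2 + ["Other"] * 1)

for category, count, pct, cum_pct in pareto_table(reasons):
    print(f"{category:<18} {count:>3} {pct:6.1f}% {cum_pct:6.1f}%")
```

<p>The cumulative column is what lets you read off how many categories it takes to reach roughly 80% of the total.</p>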
<p>To create this chart in Minitab, open <strong>Stat > Quality Tools > Pareto Chart...</strong> From our worksheet of simulated hospital data, select the <em>Reason</em> column as shown:</p>
<p style="margin-left: 40px;"><img alt="Pareto Dialog" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dba69467c17b4b989011c6771b524e7e/pareto_dialog.png" style="width: 500px; height: 241px;" /></p>
<p>When you press <strong>OK</strong>, Minitab creates the following chart:</p>
<p style="margin-left: 40px;"><img alt="Pareto Chart of Reasons" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/4418104d45e4556ff52270cc56aec5c8/pareto_chart_of_reason.png" style="width: 576px; height: 384px;" /></p>
<p>Along the x-axis, Minitab displays the reasons people dropped out of the rehabilitation program, along with the percent of the total and the cumulative percentage each reason accounted for. We can see that some 80% of these patients dropped out of the program for one of the following reasons:</p>
<ul>
<li>They were readmitted to the hospital.</li>
<li>Work or other obligations conflicted with the program schedule.</li>
<li>They could not participate for medical reasons.</li>
<li>They had their own exercise facilities.</li>
</ul>
<p>While encouraging existing participants to complete the program seemed like a good strategy, the Pareto chart shows that most people stop participating due to factors that are beyond the hospital's control. Therefore, rather than focusing on keeping existing participants, the team decided to explore how to attract more new participants.</p>
Getting More Patients to Participate in the Program
<p>Having decided to focus on increasing initial enrollment, the project team next gathered cardiologists, physical therapists, patients, and other stakeholders to brainstorm about the factors that influence participation.</p>
<p>At these brainstorming sessions, many stakeholders insisted that more people would participate in the rehabilitation program if the brochure about it were better. Another suggested solution involved sending a letter to cardiologists encouraging them to be more positive about the program and to mention it to patients at an earlier point in their treatment.</p>
<p>The project team recorded these suggestions, but they were wary of jumping to conclusions that weren't supported by data. They decided to look more closely at the data they had from existing patients before proceeding with any potential solutions.</p>
<p>In part 2, we will review how the team used graphs and basic descriptive statistics to get quick insight into how individual factors influenced patient participation in the program.</p>
Health Care Quality Improvement
Tue, 31 May 2016 12:00:00 +0000
http://blog.minitab.com/blog/understanding-statistics/a-six-sigma-healthcare-project-part-1-examining-factors-with-a-pareto-chart
Eston Martz
A Simple Guide to Multivariate Control Charts
http://blog.minitab.com/blog/applying-statistics-in-quality-projects/a-simple-guide-to-multivariate-control-charts
<p>This is an era of massive data. A huge amount of data is being generated from the web, from customer relations records, and from sensors used in manufacturing (semiconductor, pharmaceutical, petrochemical, and many other industries).</p>
Univariate Control Charts
<p>In the manufacturing industry, critical product characteristics get routinely collected to ensure that all products at every step of the process remain well within specifications. Dedicated univariate <span><a href="http://blog.minitab.com/blog/understanding-statistics/control-chart-tutorials-and-examples">control charts</a></span> are deployed to ensure that any drift gets detected as early as possible to avoid negative effects on the final product performance. Ideally, when a special cause gets identified, the equipment should be immediately stopped until the issue gets resolved.</p>
Monitoring Tool Process Parameters
<p>In modern plants, many manufacturing tools are connected to IT networks so that tool process parameters (pressures, temperatures, etc.) can be collected and stored in real time. Unfortunately, this type of data is often not continuously monitored, although we might expect process parameters to play an important role in final product quality. When a quality incident occurs, data from these numerous upstream process parameters are sometimes retrieved from databases to investigate, after the fact, why the incident took place.</p>
<p>A more efficient approach would be to monitor these process parameters in real time and try to understand how they affect complex manufacturing processes: Which process parameters are really important, and which ones are not? What are their best settings?</p>
Multivariate Control Charts
<p>Monitoring upstream tool parameters might lead to a huge increase in the number of control charts, though. In this context, process engineers might benefit from using multivariate charts, which let you monitor up to seven or eight parameters together in a single chart. Rather than using equipment process parameter data to investigate the causes of previous quality incidents in a fire-fighting mode, this approach would focus on long-term improvements.</p>
<p>Multivariate control charts are based on squared standardized (generalized) multivariate distances from the general mean. In Minitab, Hotelling's T² statistic is used to generate multivariate charts. If you don't already have Minitab and you'd like to try creating some of the charts I'm discussing, you can download the <a href="http://www.minitab.com/products/minitab/free-trial/">free 30-day trial</a>.</p>
<p>An obvious advantage of using multivariate charts is that they enable you to minimize the total number of control charts you need to manage, but there are some additional related benefits involved as well:</p>
<ul>
<li>
<p><strong>Analyzing process parameters jointly</strong>: Many process parameters are related to one another. For example, for a particular process step we might expect the pressure value to be large when the temperature is high. Considering every process parameter separately is not necessarily a good option and might even be misleading. Detecting any mismatch between parameter settings may be very useful.</p>
<p>In the graph below, the Y1 and Y2 parameter values are correlated (high values for Y1 are associated with high values for Y2) so that the red point in the lower right corner appears to be out-of-control (beyond the control ellipse) from a multivariate point of view. From a univariate perspective, this red point remains within the usual fluctuation bounds for both Y1 and Y2, though. This point clearly represents a mismatch between Y1 and Y2. The squared generalized multivariate distance from the red point to the scatterplot mean is unusually large.</p>
</li>
</ul>
<p style="margin-left: 40px;"><img height="421" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/f10e8ed1b4bf1119ff8a150b57a4109b/f10e8ed1b4bf1119ff8a150b57a4109b.png" width="629" /></p>
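<p>To make the idea of a "squared generalized distance" concrete, here is a small Python sketch for two correlated variables. It is a simplification under stated assumptions: the in-control mean, standard deviations, and correlation are hypothetical values treated as known, and the limit is a chi-square quantile with two degrees of freedom. A chart built from estimated parameters, as in practice, would use a different limit.</p>

```python
import math

def t_squared_2d(point, mean, sd, rho):
    """Squared generalized distance of a 2-D point from the mean,
    for variables with standard deviations `sd` and correlation `rho`."""
    z1 = (point[0] - mean[0]) / sd[0]
    z2 = (point[1] - mean[1]) / sd[1]
    return (z1 * z1 - 2 * rho * z1 * z2 + z2 * z2) / (1 - rho * rho)

# Assumed in-control parameters (hypothetical, treated as known).
mean, sd, rho = (0.0, 0.0), (1.0, 1.0), 0.9

# Chi-square limit with 2 degrees of freedom at the 0.27% false-alarm rate.
# For 2 degrees of freedom the quantile has the closed form -2*ln(alpha).
limit = -2 * math.log(0.0027)

consistent = (1.5, 1.5)    # both high: agrees with the positive correlation
mismatch = (1.5, -1.5)     # one high, one low: each within +/-3 sd on its own

for name, p in [("consistent", consistent), ("mismatch", mismatch)]:
    t2 = t_squared_2d(p, mean, sd, rho)
    print(f"{name}: T2 = {t2:.2f}, limit = {limit:.2f}, signal = {t2 > limit}")
```

<p>Both points sit only 1.5 standard deviations from the mean on each axis, so neither would trigger a univariate three-sigma chart, yet the mismatched point lies far outside the multivariate limit because it contradicts the correlation.</p>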
<ul>
<li>
<p><strong>Overall rate of false alarms</strong>: The probability of a false alarm with standard three-sigma limits in a control chart is 0.27%. If 100 independent charts are monitored at the same time, the probability of at least one false alarm rises to about 24% (1 - 0.9973<sup>100</sup>), close to the simple approximation of 27% (0.27% * 100).</p>
<p>However, when numerous variables are monitored simultaneously using a single multivariate chart, the overall (family) rate of false alarms remains close to 0.27%.</p>
</li>
<li>
<p><strong>3-D measurements</strong>: When three-dimensional measurements of a product are taken, the amount of data needed to ensure that all dimensions (X, Y and Z) remain within specifications can get pretty big. But if the product gets damaged in a particular area, it will usually affect more than one dimension, so the three dimensions should not be considered separately from one another. If a multivariate chart simultaneously monitors deviations from the ideal planned X, Y, Z values, their combined effects will be taken into account.</p>
</li>
</ul>
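<p>The family-wise false-alarm rate for many univariate charts can be checked with a few lines of Python. This sketch assumes the charts are independent, which is an idealization; the variable names are chosen for this example only.</p>

```python
alpha = 0.0027      # false-alarm probability of one three-sigma chart
charts = 100

# Probability that at least one of the charts signals, assuming independence.
family_rate = 1 - (1 - alpha) ** charts

# Simple sum of the individual rates; an upper bound (Bonferroni inequality).
upper_bound = alpha * charts

print(f"at least one false alarm: {family_rate:.1%}")
print(f"additive upper bound:     {upper_bound:.1%}")
```

<p>Multiplying 0.27% by the number of charts gives an upper bound; the exact figure under independence is somewhat lower, but either way the rate is far above that of a single chart.</p>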
A Simple Example
<p>Eight process parameters have been monitored using eight univariate Xbar control charts. No out-of-control point has been detected (see below):</p>
<p style="margin-left: 40px;"><img height="384" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/f97a659a70088e92e8d3b7dcf8eccf56/f97a659a70088e92e8d3b7dcf8eccf56.png" width="576" /></p>
<p style="margin-left: 40px;"><img height="384" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/3d1ed649902069763005626f6bd40271/3d1ed649902069763005626f6bd40271.png" width="576" /></p>
<p>The eight control charts above may be replaced by a single multivariate chart that monitors the eight variables simultaneously. Although no out-of-control point had been detected in the univariate charts, subgroup number 12 turns out to be out of control in the multivariate chart:</p>
<p style="margin-left: 40px;"><img height="384" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/56360251e77b469718b37b68ce1efd4c/56360251e77b469718b37b68ce1efd4c.png" width="576" /></p>
<p>To investigate why an out-of-control point (subgroup 12) occurred in the multivariate chart, I used simple graphs (scatterplots) to analyze time trends. Note that for the X3, X4 and X5 parameters, subgroup 12 is positioned far away from the other points.</p>
<p style="margin-left: 40px;"><img height="384" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/f68088341ae349ab41ee69ffa02579f8/f68088341ae349ab41ee69ffa02579f8.png" width="576" /></p>
<p style="margin-left: 40px;"><img height="384" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/250addfd138fcbc17e8a1b421bee3770/250addfd138fcbc17e8a1b421bee3770.png" width="576" /></p>
<p style="margin-left: 40px;"><img height="384" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/5e96d5c274efa3950498207718513346/5e96d5c274efa3950498207718513346.png" width="576" /></p>
Conclusion
<p>When process parameters have no direct critical effect, a dedicated univariate chart is not necessarily required. Multivariate charts enable you to routinely monitor many tool process parameters with fewer charts. The objective would be to better understand whether out-of-control points in a multivariate chart may be used to anticipate quality issues in the product characteristics.</p>
<p>To better control a process, we need to assess how upstream tool parameters affect the final product. Multivariate charts are also very useful to monitor 3-D measurements. Identifying the reason for an out-of-control point in a multivariate chart is a key aspect of using it successfully.</p>
Data Analysis, Quality Improvement, Six Sigma, Stats
Wed, 25 May 2016 13:02:00 +0000
http://blog.minitab.com/blog/applying-statistics-in-quality-projects/a-simple-guide-to-multivariate-control-charts
Bruno Scibilia
Are You Putting the Data Cart Before the Horse? Best Practices for Prepping Data for Analysis, ...
http://blog.minitab.com/blog/meredith-griffith/are-you-putting-the-data-cart-before-the-horse-best-practices-for-prepping-data-for-analysis-part-2
<p>Do you recall my “putting the cart before the horse” analogy in <a href="http://blog.minitab.com/blog/meredith-griffith/are-you-putting-the-data-cart-before-the-horse-best-practices-for-prepping-data-for-analysis,-part-1">part 1</a> of this blog series? The comparison is simple.</p>
<p>We all, at times, put the cart before the horse in relatively innocuous ways, such as eating dessert before dinner, or deciding what to wear before being invited to the party. But performing some tasks in the wrong order, such as running a statistical analysis before you’ve prepared your data, might result in more serious consequences.</p>
<p>Eating your dessert first might merely spoil your appetite for dinner, but performing a statistical analysis on dirty data could have much more serious repercussions—including misleading results, mistaken decisions, or, if you’re lucky enough to catch your mistake before it's too late, costly rework.</p>
<p>Spending quality time with your data up front can prevent you from wasting time and energy on an analysis that either can’t work or can’t be trusted. We began exploring this idea in Part 1 of this best practices series, where I offered some tips for cleaning your data before you <a href="http://blog.minitab.com/blog/statistics-and-quality-improvement/minitab-173-takes-a-quantum-leap-with-data-import">import it into Minitab</a>. The biggest takeaway from <a href="http://blog.minitab.com/blog/meredith-griffith/are-you-putting-the-data-cart-before-the-horse-best-practices-for-prepping-data-for-analysis,-part-1">Part 1</a> is that cleaning your data before you begin an analysis can save time by preventing rework, such as reformatting data or correcting data entry errors, after you’ve already begun the analysis.</p>
<p>So, once our data is clean, what comes next?</p>
Use formatting and highlighting tools to explore and visualize your data
<p>You can use Minitab’s worksheet visualization tools to explore your data. <a href="http://support.minitab.com/en-us/minitab/17/topic-library/minitab-environment/data-and-data-manipulation/conditional-formatting/conditional-formatting-overview/">Conditional formatting</a> in particular brings color to your worksheet and can be used to highlight aspects of your data that you’d like to call attention to quickly.</p>
<p>In our data set, recall that we’ve recorded the amount of time a machine was out of operation, the reason for the machine being down, the shift number during which the machine went down, and the speed of the machine when it went down. Suppose you wish to identify frequently occurring values or points that are out-of-spec or out-of-control. You can use formatting rules to do just that!</p>
<p>In this example, I’ve used one of the statistical rules available in Minitab’s conditional formatting to identify values that are not within spec. The highlighted values may indicate either a data entry error or a valid cause for investigation, and they can help you better understand where to focus your exploration and visualization efforts moving forward.</p>
<p style="margin-left: 40px;"><img alt="Screenshot1" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/dae6c7b7-fc22-4616-9d65-f04909c20ab1/Image/5060fdad6105a8342225682933cfd995/screenshot1.png" style="width: 751px; height: 283px;" /></p>
<p style="margin-left: 40px;"><em>With a simple right-click directly in the Minitab worksheet, you can identify out-of-spec values you may wish to investigate before you begin your analysis.</em></p>
<p>You can also use Cell Properties (available by right-clicking in the worksheet) to highlight individual cells or rows, and add cell comments to draw attention to data that need further investigation, such as out-of-control points, unusual observations, or other data of interest. Rather than removing questionable data right away, you can make a note of it, perhaps by adding a cell comment as a reminder to follow up. Doing this keeps you from the statistically unsound practice of cherry-picking data, and helps ensure you handle the data correctly when it comes time to analyze it.</p>
<p style="margin-left: 40px;"><img alt="Screenshot2" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/dae6c7b7-fc22-4616-9d65-f04909c20ab1/Image/caa44251c2b1470a512b63b2d1d8f1c8/screenshot2.png" style="width: 366px; height: 155px;" /></p>
<p style="margin-left: 40px;"><em>In the Minitab worksheet, you can highlight an entire row to easily visualize all variables associated with particular data, or add a cell comment to an out-of-control point for future reference.</em></p>
Use subsets to uncover insights prior to your analysis
<p>Finally, data subsets are a good way to visualize only the data that is relevant to answering your questions. <a href="https://www.minitab.com/en-us/products/minitab/whats-new/">Minitab 17</a> makes it really easy to subset by right-clicking in the worksheet, and allows you to create subsets based on the data you’ve explored and highlighted with conditional formatting.</p>
<p>For example, suppose you want to understand why machines are experiencing downtime so you can address productivity problems. You can use conditional formatting to identify the most frequent reason for downtime, and then subset your data based on those formatted rows to see how that reason relates to the other variables.</p>
<p style="margin-left: 40px;"><img alt="Screenshot3" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/dae6c7b7-fc22-4616-9d65-f04909c20ab1/Image/1f765d8799e70bac9b4871ae3bfb2cbf/screenshot3.png" style="width: 682px; height: 177px;" /></p>
<p style="margin-left: 40px;"><em>It’s easy to subset your data in Minitab by right-clicking directly within the worksheet.</em></p>
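<p>For readers who think in code, the same flag-then-subset workflow can be sketched in Python. Everything here is hypothetical: the records, the column order, and the specification limits are invented for illustration and do not come from the worksheet shown above.</p>

```python
from collections import Counter

# Hypothetical downtime records: (minutes down, reason, shift, machine speed).
records = [
    (12, "Jam", 1, 98),
    (45, "Jam", 2, 131),
    (8, "Changeover", 1, 102),
    (30, "Jam", 3, 95),
    (15, "Power", 2, 64),
]

# Assumed specification limits for machine speed.
LOWER_SPEC, UPPER_SPEC = 80, 120

# Flag out-of-spec rows rather than deleting them, so they can be reviewed.
out_of_spec = [r for r in records if not LOWER_SPEC <= r[3] <= UPPER_SPEC]

# Find the most frequent downtime reason, then subset on it.
top_reason, top_count = Counter(r[1] for r in records).most_common(1)[0]
subset = [r for r in records if r[1] == top_reason]

print("Out of spec:", out_of_spec)
print("Most frequent reason:", top_reason, f"({top_count} rows)")
print("Subset:", subset)
```

<p>Note that the out-of-spec rows are kept in a separate list for review instead of being dropped, in the same spirit as commenting on a cell rather than deleting it.</p>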
<p>All of the data cleaning and exploration you’ve seen in the worksheet is just the beginning—but consider how much insight you’ve drawn from your data before you’ve visualized it graphically or formally analyzed it!</p>
<p>Taking the time to clean and explore your data before you begin an analysis is well worth the investment. Doing so will help you better understand and answer key questions about your process, lead to a more efficient analysis as you tackle only the most relevant data for answering your questions, and ultimately yield results you can trust.</p>
Data Analysis, Statistics Help, Stats
Wed, 25 May 2016 12:00:00 +0000
http://blog.minitab.com/blog/meredith-griffith/are-you-putting-the-data-cart-before-the-horse-best-practices-for-prepping-data-for-analysis-part-2
Meredith Griffith
See How Easily You Can Do a Box-Cox Transformation in Regression
http://blog.minitab.com/blog/statistics-and-quality-improvement/see-how-easily-you-can-do-a-box-cox-transformation-in-regression
<p><img alt="Translink Ticket Vending Machine found at all train stations in south-east Queensland." src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/5376f9bbd5afc1cf86fd3888d5fb6848/translink_tv_machine.jpg" style="float: right; width: 99px; height: 200px; margin-left: 20px; margin-right: 20px; border-width: 1px; border-style: solid;" />For one reason or another, the response variable in a regression analysis might not satisfy one or more of <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/violations-of-the-assumptions-for-linear-regression-closing-arguments-and-verdict">the assumptions of ordinary least squares regression</a>. The <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/violations-of-the-assumptions-for-linear-regression-the-trial-of-lionel-loosefit-day-1">residuals might follow a skewed distribution</a> or the <a href="http://blog.minitab.com/blog/the-statistics-game/checking-the-assumption-of-constant-variance-in-regression-analyses">residuals might curve as the predictions increase</a>. A common solution when problems arise with the assumptions of ordinary least squares regression is to transform the response variable so that the data do meet the assumptions. Minitab makes the transformation simple by including the Box-Cox button. Try it for yourself and see how easy it is!</p>
<p>The government in Queensland, Australia shares <a href="http://translink.com.au/about-translink/reporting-and-publications/public-transport-performance-data" target="_blank">data about the number of complaints</a> about its public transportation service. </p>
<p>I’m going to use the data set titled “Patronage and Complaints.” I’ll analyze the data a bit more thoroughly later, but for now I want to focus on the transformation. The variables in this data set are the date, the number of passenger trips, the number of complaints about a frequent rider card, and the number of other customer complaints. I'm using the data from the week ending July 7th, 2012, through the week ending December 22nd, 2013. I’m excluding the data for the last week of 2012 because ridership is so much lower compared to other weeks.</p>
<p><span style="line-height: 20.8px;">If you want to follow along, you can download my Minitab <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1a886da629056f3970112c2d241b223a/complaints.mtw">data sheet</a>. If you don't already have it, you can <a href="http://www.minitab.com/products/minitab/free-trial/">download Minitab and use it free for 30 days</a>. </span></p>
<p>Let’s say that we want to use the number of complaints about the frequent rider card as the response variable. The number of other complaints and the date are the predictors. The resulting normal probability plot of the residuals shows an S-curve.</p>
<p><img alt="The residuals do not appear normal." src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/049010d82bc02e0dea0c27d337fbc56b/normplot_of_residuals_for_customer_complaints_on__go_card.png" style="width: 576px; height: 384px;" /></p>
<p>Because we see this pattern, we’d like to go ahead and do the Box-Cox transformation. Try this:</p>
<ol>
<li>Choose <strong>Stat > Regression > Regression > Fit Regression Model</strong>.</li>
<li>In <strong>Responses</strong>, enter the column with the number of complaints on the go card.</li>
<li>In <strong>Continuous Predictors</strong>, enter the columns that contain the other customer complaints and the date.</li>
<li>Click <strong>Options</strong>.</li>
<li>Under <strong>Box-Cox transformation</strong>, select <strong>Optimal λ</strong>.</li>
<li>Click <strong>OK</strong>.</li>
<li>Click <strong>Graphs</strong>.</li>
<li>Select <strong>Individual plots</strong> and check <strong>Normal plot of residuals</strong>.</li>
<li>Click <strong>OK</strong> twice.</li>
</ol>
<p><img alt="The residuals are more normal." src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/56490706c18dfd3444944f7d7073dc30/normplot_of_residuals_for_customer_complaints_on__go_card2.png" style="width: 576px; height: 384px;" /></p>
<p>The probability plot that results is more linear, although it still shows outlying observations where the number of complaints in the response is very high or very low relative to the number of other complaints. You'll still want to check the other regression assumptions, such as <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/dont-be-a-victim-of-statistical-hippopotomonstrosesquipedaliophobia">homoscedasticity</a>.</p>
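<p>If you are curious what a Box-Cox transformation does numerically, the Python sketch below shows the transformation itself and a simple grid search for λ. It is a simplified illustration, not Minitab's procedure: it ignores the predictors and scores each λ by the profile log-likelihood of a single sample, whereas in a regression the likelihood would be based on the residuals of the fitted model. The function names, the sample values, and the candidate list are all assumptions made for this example.</p>

```python
import math

def box_cox(values, lam):
    """Box-Cox transform of positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

def log_likelihood(values, lam):
    """Profile log-likelihood of lambda for a sample with no predictors."""
    n = len(values)
    transformed = box_cox(values, lam)
    mean = sum(transformed) / n
    variance = sum((t - mean) ** 2 for t in transformed) / n
    return -0.5 * n * math.log(variance) + (lam - 1) * sum(math.log(v) for v in values)

def best_lambda(values, candidates):
    """Pick the candidate lambda with the highest profile log-likelihood."""
    return max(candidates, key=lambda lam: log_likelihood(values, lam))

# Made-up right-skewed counts, for illustration only.
complaints = [3, 4, 4, 5, 6, 7, 9, 12, 18, 40]
candidates = [-2, -1, -0.5, 0, 0.5, 1, 2]
lam = best_lambda(complaints, candidates)
print("chosen lambda:", lam)
```

<p>A λ of 1 leaves the shape of the data unchanged, so a chosen value below 1 indicates that compressing the large values makes the sample look more normal.</p>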
<p>So there it is, everything that you need to know to use a Box-Cox transformation on the response in a regression model. Easy, right? Ready for some more? Check out more of <a href="http://blog.minitab.com/blog/adventures-in-statistics/unleash-the-power-of-linear-models-with-minitab-17">the analysis steps that Minitab makes easy</a>.</p>
<span style="color:#a9a9a9;">The image of the Translink vending machine is by Brad Wood </span><span style="color:#a9a9a9;">and is licensed for reuse under this</span> <a href="http://creativecommons.org/licenses/by-sa/3.0/deed.en">Creative Commons License</a>.
Regression Analysis
Mon, 23 May 2016 13:17:00 +0000
http://blog.minitab.com/blog/statistics-and-quality-improvement/see-how-easily-you-can-do-a-box-cox-transformation-in-regression
Cody Steele
Creating a Fishbone Diagram in Minitab
http://blog.minitab.com/blog/marilyn-wheatleys-blog/creating-a-fishbone-diagram-in-minitab
<p>While many Six Sigma practitioners and other quality improvement professionals like to use the <a href="http://www.minitab.com/en-us/support/videos/?vid=qc3bf">Fishbone diagram in Quality Companion</a> for brainstorming because of its ease of use and integration with other <a href="http://www.minitab.com/en-us/products/quality-companion/">Quality Companion</a> tools, some Minitab users need a Fishbone diagram only occasionally. For the more casual user of the Fishbone diagram, Minitab has the right tool to get the job done.</p>
<p>Minitab’s Fishbone (or Cause-and-Effect) diagram can be accessed from the Quality Tools menu:</p>
<p><img border="0" height="482" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/2a2f568c83413e1aeaa5e07e4bc1d595/2a2f568c83413e1aeaa5e07e4bc1d595.png" width="577" /></p>
<p>There are two ways to complete the dialog box and create a Fishbone diagram in Minitab:</p>
<ol>
<li>
<p>By typing the information directly into the Cause-and-Effect dialog window, or</p>
</li>
<li>
<p>By entering the information in the worksheet first and then using the worksheet data to complete the Cause-and-Effect dialog box.</p>
</li>
</ol>
<p>In this post, I’ll walk through examples of how to create a Fishbone diagram using both options, starting with the first option above. Because I’m a baking aficionado, I’ll be using an example related to brainstorming the choice of factors in a cake-baking experiment (where the response is the moisture after baking the cake).</p>
Creating a Fishbone Diagram by Typing Information into the Dialog
<p>First, we’ll start by using the drop-down lists on the left side to tell Minitab that our information is in <strong>Constants</strong> (meaning we will type the information into this dialog box, versus having the data already typed into the worksheet). </p>
<p>For this example, I’ll have four branches in the Fishbone, so I’ve selected <strong>Constants</strong> next to <strong>Branch </strong>1, 2, 3 and 4 below, and then I’ve typed the name of each branch on the right side, under <strong>Label</strong>:</p>
<p style="margin-left: 40px;"><img border="0" height="366" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/66ff3d39a93afc068aced55ecbe20b4f/66ff3d39a93afc068aced55ecbe20b4f.png" width="632" /></p>
<p>As we work through this, we can always click <strong>OK</strong> to see our progress. So far we have:</p>
<p style="margin-left: 40px;"><img border="0" height="354" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/ee1485fa5cf9837995c168badca99180/ee1485fa5cf9837995c168badca99180.png" width="484" /></p>
<p>To go back to the last dialog to keep entering information, press <strong>Ctrl+E</strong> on the keyboard.</p>
<p>Next, I’ve entered the causes in the empty column in the middle. Note that any individual cause that includes multiple words (for example, Day of Week) must be enclosed in double quotes: "Day of Week". Without the double quotes, Minitab will treat each individual word as a separate cause. Multiple causes for the same branch are entered with a space between the causes. For example, to enter Ambient Temperature and Ambient Moisture as causes, I’ll enter:</p>
<p>"Ambient Temperature" "Ambient Moisture"</p>
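<p>The quoting rule resembles the way a command shell splits arguments. As a rough analogy only (this uses Python's standard <code>shlex</code> module and is not Minitab's actual parser), the sketch below shows how quoted and unquoted entries would be split into causes. The function name <code>split_causes</code> is made up for illustration.</p>

```python
import shlex

def split_causes(entry: str) -> list[str]:
    """Split a dialog entry into causes: spaces separate causes unless inside double quotes."""
    return shlex.split(entry)

# Two quoted, multi-word causes stay intact.
print(split_causes('"Ambient Temperature" "Ambient Moisture"'))

# Without quotes, every word becomes its own cause.
print(split_causes('Day of Week'))
```

<p>The first call yields two causes, while the second yields three single-word causes, which is the behavior the quotes are there to prevent.</p>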
<p>After completing the dialog like in the example below, we can click <strong>OK</strong> again to see our progress:</p>
<p style="margin-left: 40px;"><img border="0" height="307" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/1efa13b18989a3aa81150110b1836e13/1efa13b18989a3aa81150110b1836e13.png" width="980" /></p>
<p>Now I’ve used <strong>Ctrl+E </strong>on my keyboard again to return to the dialog box. As a final step, I’m going to add sub-branches to some of my causes. For this example, two of the causes in the ‘Held constant factors’ branch have sub-branches. To add my sub-branches, I’ll click the <strong>Sub…</strong> button below for that particular branch:</p>
<p style="margin-left: 40px;"><img border="0" height="192" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/efa889195944ab483fd4e6823b9bc37f/efa889195944ab483fd4e6823b9bc37f.png" width="515" /></p>
<p>This will bring up the Sub-Branches dialog. Here the names of each of my causes are automatically listed in the <strong>Labels</strong> column. All I need to do is (1) choose <strong>Constants</strong> from the drop-down list and (2) type in the sub-branch labels. Note that the same double-quote rule for sub-branches with multiple words applies here:</p>
<p style="margin-left: 40px;"><img border="0" height="326" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/a6f1842bba09f215088cc5fb1f0a169f/a6f1842bba09f215088cc5fb1f0a169f.png" width="636" /></p>
<p>After completing the dialog above and clicking <strong>OK</strong> in each window, we can see our final graph:</p>
<p style="margin-left: 40px;"><img border="0" height="466" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/345d57486e824f0d074b65a7dc00fa1c/345d57486e824f0d074b65a7dc00fa1c.png" width="699" /></p>
<strong style="line-height: 1.6;">Creating a Fishbone Diagram by Using Data Entered in the Worksheet</strong>
<p>As a first step, I’ll type in my branch labels, effect, and title for my fishbone diagram:</p>
<p style="margin-left: 40px;"><img border="0" height="337" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/31eb5fb9f7623320902fb4a0b317bd47/31eb5fb9f7623320902fb4a0b317bd47.png" width="585" /></p>
<p><span style="line-height: 1.6;">Now I’ll click </span><strong style="line-height: 1.6;">OK </strong><span style="line-height: 1.6;">to see my progress and go to the worksheet to type in my data like in the example below:</span></p>
<p style="margin-left: 40px;"><img border="0" height="135" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/441d7e16383412d8a669f1d7f98a6953/441d7e16383412d8a669f1d7f98a6953.png" style="line-height: 1.6;" width="682" /></p>
<p>Notice that here we don’t need to include double-quotes for any causes or sub-branches that are described with multiple words. Also, note that the branch titles are still typed into the dialog (so the column titles in the columns above are just for my own reference, because Minitab does not use these column titles).</p>
<p>After entering the data in the worksheet, I can use <strong>Ctrl + E</strong> to go back to the dialog box. This time I’ll leave the default option for the Causes (‘In column’) and I’ll select the columns I want to use for each cause:</p>
<p style="margin-left: 40px;"><img border="0" height="354" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/d95b6bbbb80d89e7d303eec1638f9640/d95b6bbbb80d89e7d303eec1638f9640.png" width="1105" /> </p>
<p><span style="line-height: 1.6;">Now I can click </span><strong style="line-height: 1.6;">OK</strong><span style="line-height: 1.6;"> in each dialog box to show the fishbone diagram, which looks just like the one we generated using the first method:</span></p>
<p style="margin-left: 40px;"><span style="line-height: 1.6;"><img alt="Fishbone Diagram" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/345d57486e824f0d074b65a7dc00fa1c/345d57486e824f0d074b65a7dc00fa1c.png" style="width: 699px; height: 466px;" /></span></p>
Data Analysis, Six Sigma
Fri, 20 May 2016 12:00:00 +0000
http://blog.minitab.com/blog/marilyn-wheatleys-blog/creating-a-fishbone-diagram-in-minitab
Marilyn Wheatley
Understanding Bootstrapping and the Central Limit Theorem
http://blog.minitab.com/blog/the-statistics-game/understanding-bootstrapping-and-the-central-limit-theorem
<p>For hundreds of years, people have been improving their situation by pulling themselves up by their bootstraps. Well, now you can improve your statistical knowledge by pulling yourself up by your bootstraps. <a href="http://www.minitab.com/en-us/products/express/">Minitab Express</a> has 7 different bootstrapping analyses that can help you better understand the sampling distribution of your data. </p>
<p>A sampling distribution describes the likelihood of obtaining each possible value of a statistic from a random sample of a population—in other words, what proportion of all random samples of that size will give that value. Bootstrapping is a method that estimates the sampling distribution by taking multiple samples with replacement from a single random sample. These repeated samples are called resamples. Each resample is the same size as the original sample.</p>
<p>The original sample represents the population from which it was drawn. Therefore, the resamples from this original sample represent what we would get if we took many samples from the population. The bootstrap distribution of a statistic, based on the resamples, represents the sampling distribution of the statistic.</p>
Bootstrapping and Running Backs
<p>For example, let’s estimate the sampling distribution of the number of yards per carry for Penn State’s star running back Saquon Barkley. Going through all 182 of his carries from last season seems daunting, so instead I took a random sample of 49 carries and recorded the number of yards he gained for each one. If you want to follow along, you can get the data I used <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/5110d977d9241ebbe2cee476bfe8ae09/runnning_back_data.mtw">here</a>.</p>
<p>Repeated sampling with replacement from these 49 observations mimics what the population might look like. To take a resample, one of the carries is randomly selected from the original sample, the number of yards gained is recorded, and then that observation is put back into the sample. This is done 49 times (the size of the original sample) to complete a single resample.</p>
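<p>For readers who want to see the mechanics outside of Minitab Express, here is a minimal Python sketch of a single resample. The yardage values are made up for illustration (they are not the data from this post), and the function name <code>resample</code> is my own.</p>

```python
import random

# Hypothetical yards gained on a handful of carries (not the data from this post).
original_sample = [2, 5, -1, 3, 12, 0, 4, 7, 1, 25]

def resample(sample, rng):
    """Draw one bootstrap resample: same size as the sample, drawn with replacement."""
    return rng.choices(sample, k=len(sample))

rng = random.Random(1)  # fixed seed so the run is repeatable
one_resample = resample(original_sample, rng)
print(one_resample)
```

<p>Because each draw is put back before the next one, some carries can appear more than once in the resample and others not at all.</p>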
<p>To obtain a single resample, in Minitab Express go to <strong>STATISTICS > Resampling > Bootstrapping > 1-Sample Mean</strong>. Enter the column of data in <strong>Sample</strong>, and enter <em>1</em> for number of resamples. The following individual plot represents a single bootstrap sample taken from the original sample.</p>
<p><strong>Note:</strong> Because Minitab Express randomly selects the bootstrap sample, your results will be different.</p>
<p><img alt="Individual Value Plot" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/48b6384c105057500c979c1892d30d75/barkely_1_resample.png" style="width: 578px; height: 389px;" /></p>
<p>The resample is done by sampling with replacement, so the bootstrap sample will usually not be the same as the original sample. To create a bootstrap distribution, you take many resamples. The following histogram shows the bootstrap distribution for 1,000 resamples of our original sample of 49 carries.</p>
<p><img alt="Bootstrap Histogram" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/fc5751c422823e18d25595b6e9cfdc89/barkely_1000_resamples.png" style="width: 578px; height: 389px;" /></p>
<p>The bootstrap distribution is centered at approximately 5.5, which is an estimate of the population mean for Barkley’s yards per carry. The middle 95% of values from the bootstrapping distribution provide a 95% confidence interval for the population mean. The red reference lines represent the interval, so we can be 95% confident the population mean of Barkley’s yards per carry is between approximately 3.4 and 7.8.</p>
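<p>The same idea can be sketched in a few lines of Python: take many resamples, record the mean of each, and read off the middle 95% of those means. The data below are hypothetical, and the simple order-statistic rule used for the interval is an assumption on my part; Minitab Express may compute its percentiles slightly differently.</p>

```python
import random
import statistics

# Hypothetical yards per carry (not the data from this post).
sample = [2, 5, -1, 3, 12, 0, 4, 7, 1, 25, 3, 6, 2, 9, 4]

def bootstrap_means(sample, n_resamples, rng):
    """Mean of each bootstrap resample (drawn with replacement, same size as the sample)."""
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]

def percentile_interval(values, confidence=0.95):
    """Middle `confidence` share of the values, using simple order statistics."""
    ordered = sorted(values)
    k = int((1 - confidence) / 2 * len(ordered))  # number of values cut from each tail
    return ordered[k], ordered[len(ordered) - 1 - k]

rng = random.Random(42)  # fixed seed so the run is repeatable
means = bootstrap_means(sample, 1000, rng)
low, high = percentile_interval(means)

print(f"center of the bootstrap distribution: {statistics.fmean(means):.2f}")
print(f"95% percentile interval: ({low:.2f}, {high:.2f})")
```

<p>The two printed interval endpoints play the role of the red reference lines in the histogram above.</p>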
Bootstrapping and the Central Limit Theorem
<p>The central limit theorem is a fundamental theorem of probability and statistics. The theorem states that the mean of a random sample from a population with finite variance is approximately normally distributed when the sample size is large, regardless of the shape of the population's distribution. Bootstrapping can be used to easily understand <span><a href="http://blog.minitab.com/blog/understanding-statistics/how-the-central-limit-theorem-works">how the central limit theorem works</a></span>.</p>
<p>For example, consider the distribution of the data for Saquon Barkley’s yards per carry.</p>
<p><img alt="Histogram" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/4464318f71d9f48ee3b83d5f06659e1a/histogram_not_normal.png" style="width: 578px; height: 389px;" /></p>
<p>It’s pretty obvious that the data are nonnormal. But now we’ll create a bootstrap distribution of the means of 10 resamples. </p>
<p><img alt="Bootstrap Histogram" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/5135038cc40d70a35ba7ecf9973f89d0/10_resamples.png" style="width: 578px; height: 389px;" /></p>
<p>The distribution of the means is very different from the distribution of the original data. It looks much closer to a normal distribution. This resemblance increases as the number of resamples increases. With 1,000 resamples, the distribution of the mean of the resamples is approximately normal.</p>
<p><img alt="Bootstrap Histogram" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/fc5751c422823e18d25595b6e9cfdc89/barkely_1000_resamples.png" style="width: 578px; height: 389px;" /></p>
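<p>One rough way to put a number on "closer to normal" is skewness, which is 0 for a perfectly symmetric distribution. The sketch below compares the skewness of a strongly right-skewed sample with the skewness of its bootstrap means. The data are hypothetical, and skewness is only one of several possible measures of non-normality.</p>

```python
import random
import statistics

def skewness(values):
    """Population skewness: third central moment divided by the cubed standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)

# Hypothetical, strongly right-skewed yards per carry (not the data from this post).
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 12, 25, 40]

rng = random.Random(7)  # fixed seed so the run is repeatable
means = [statistics.fmean(rng.choices(sample, k=len(sample))) for _ in range(2000)]

print(f"skewness of the raw data:        {skewness(sample):.2f}")
print(f"skewness of the bootstrap means: {skewness(means):.2f}")
```

<p>With only 14 observations in this made-up sample, the means still keep some skew, but far less than the raw data. A larger original sample would pull the distribution of the means closer to symmetric.</p>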
<p><em><strong>Note:</strong> Bootstrapping is only available in Minitab Express, which is an introductory statistics package meant for students and university professors.</em></p>
Learning
Thu, 19 May 2016 12:00:00 +0000
http://blog.minitab.com/blog/the-statistics-game/understanding-bootstrapping-and-the-central-limit-theorem
Kevin Rudy
Understanding Analysis of Variance (ANOVA) and the F-test
http://blog.minitab.com/blog/adventures-in-statistics/understanding-analysis-of-variance-anova-and-the-f-test
<p>Analysis of variance (ANOVA) can determine whether the means of three or more groups are different. ANOVA uses F-tests to statistically test the equality of means. In this post, I’ll show you how ANOVA and F-tests work using a one-way ANOVA example.</p>
<p>But wait a minute...have you ever stopped to wonder why you’d use an analysis of <em>variance</em> to determine whether <em>means</em> are different? I'll also show how variances provide information about means.</p>
<p>As in my posts about <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests:-1-sample,-2-sample,-and-paired-t-tests" target="_blank">understanding t-tests</a>, I’ll focus on concepts and graphs rather than equations to explain ANOVA F-tests.</p>
What are F-statistics and the F-test?
<p>The F-test is named after its test statistic, F, which was in turn named in honor of Sir Ronald Fisher. The F-statistic is simply a ratio of two variances. Variances are a measure of dispersion, or how far the data are scattered from the mean. Larger values represent greater dispersion.</p>
<img alt="F is for F-test" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2176eecdb5dee3586bf90f5dc2ca0007/f.gif" style="line-height: 20.8px; margin: 10px 15px; float: right; width: 200px; height: 221px;" />
<p>Variance is the square of the standard deviation. For us humans, standard deviations are easier to understand than variances because they’re in the same units as the data rather than squared units. However, many analyses actually use variances in the calculations.</p>
<p>F-statistics are based on the ratio of mean squares. The term “<a href="http://support.minitab.com/minitab/17/topic-library/modeling-statistics/anova/anova-statistics/understanding-mean-squares/" target="_blank">mean squares</a>” may sound confusing but it is simply an estimate of population variance that accounts for the <a href="http://support.minitab.com/minitab/17/topic-library/basic-statistics-and-graphs/introductory-concepts/basic-concepts/df/" target="_blank">degrees of freedom (DF)</a> used to calculate that estimate.</p>
<p>Despite being a ratio of variances, you can use F-tests in a wide variety of situations. Unsurprisingly, the F-test can assess the equality of variances. However, by changing the variances that are included in the ratio, the F-test becomes a very flexible test. For example, you can use F-statistics and F-tests to <a href="http://blog.minitab.com/blog/adventures-in-statistics/what-is-the-f-test-of-overall-significance-in-regression-analysis" target="_blank">test the overall significance for a regression model</a>, to compare the fits of different models, to test specific regression terms, and to test the equality of means.</p>
Using the F-test in One-Way ANOVA
<p>To use the F-test to determine whether group means are equal, it’s just a matter of including the correct variances in the ratio. In one-way ANOVA, the F-statistic is this ratio:</p>
<p style="margin-left: 40px;"><strong>F = variation between sample means / variation within the samples</strong></p>
<p>The best way to understand this ratio is to walk through a one-way ANOVA example.</p>
<p>We’ll analyze four samples of plastic to determine whether they have different mean strengths. You can download the <a href="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/File/a8a9c678090ccac0f3be61be91cf8012/plasticstrength.mtw">sample data</a> if you want to follow along. (If you don't have Minitab, you can download a <a href="http://www.minitab.com/en-us/products/minitab/free-trial/" target="_blank">free 30-day trial</a>.) I'll refer back to the one-way ANOVA output as I explain the concepts.</p>
<p>In Minitab, choose <strong>Stat > ANOVA > One-Way ANOVA...</strong> In the dialog box, choose "Strength" as the response, and "Sample" as the factor. Press OK, and Minitab's Session Window displays the following output: </p>
<p style="margin-left: 40px;"><img alt="Output for Minitab's one-way ANOVA" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/42587221b52ed940d53478106c134ebc/1way_swo.png" style="width: 315px; height: 322px;" /></p>
Numerator: Variation Between Sample Means
<p>One-way ANOVA has calculated a mean for each of the four samples of plastic. The group means are: 11.203, 8.938, 10.683, and 8.838. These group means are distributed around the overall mean for all 40 observations, which is 9.915. If the group means are clustered close to the overall mean, their variance is low. However, if the group means are spread out further from the overall mean, their variance is higher.</p>
<p>Clearly, if we want to show that the group means are different, it helps if the means are further apart from each other. In other words, we want higher variability among the means.</p>
<p>Imagine that we perform two different one-way ANOVAs where each analysis has four groups. The graph below shows the spread of the means. Each dot represents the mean of an entire group. The further the dots are spread out, the higher the value of the variability in the numerator of the F-statistic.</p>
<p style="margin-left: 40px;"><img alt="Dot plot that shows high and low variability between group means" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/f9a100946675098ca09c4440a7907230/group_means_dot_plot.png" style="width: 576px; height: 86px;" /></p>
<p>What value do we use to measure the variance between sample means for the plastic strength example? In the one-way ANOVA output, we’ll use the adjusted mean square (Adj MS) for Factor, which is 14.540. Don’t try to interpret this number because it won’t make sense. It’s the sum of the squared deviations divided by the factor DF. Just keep in mind that the further apart the group means are, the larger this number becomes.</p>
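<p>As a sanity check, the Adj MS for Factor can be approximately rebuilt from the four group means. The sketch below assumes 10 observations per group, which is my inference from 40 observations spread over 4 groups and is not stated in the output.</p>

```python
# Group means reported in the one-way ANOVA output.
group_means = [11.203, 8.938, 10.683, 8.838]

# Assumption: 10 observations per group (40 observations across 4 groups).
n_per_group = 10

# With equal group sizes, the overall mean is the mean of the group means.
grand_mean = sum(group_means) / len(group_means)
factor_df = len(group_means) - 1

# Squared deviations of the group means from the overall mean, weighted by group size.
ss_between = n_per_group * sum((m - grand_mean) ** 2 for m in group_means)
ms_between = ss_between / factor_df

print(f"Adj MS for Factor is roughly {ms_between:.2f}")
```

<p>The result comes out at about 14.54, in line with the 14.540 in the output. The small difference is due to rounding in the reported means.</p>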
Denominator: Variation Within the Samples
<p>We also need an estimate of the variability within each sample. To calculate this variance, we need to calculate how far each observation is from its group mean for all 40 observations. Technically, it is the sum of the squared deviations of each observation from its group mean divided by the error DF.</p>
<p>If the observations for each group are close to the group mean, the variance within the samples is low. However, if the observations for each group are further from the group mean, the variance within the samples is higher.</p>
<p style="margin-left: 40px;"><img alt="Plot that shows high and low variability within groups" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/9ef2eae1cf6bba97ccb1b664356d0d0a/within_group_dplot.png" style="width: 576px; height: 384px;" /></p>
<p>In the graph, the panel on the left shows low variation in the samples while the panel on the right shows high variation. The more spread out the observations are from their group mean, the higher the value in the denominator of the F-statistic.</p>
<p>If we’re hoping to show that the means are different, it's good when the within-group variance is low. You can think of the within-group variance as the background noise that can obscure a difference between means.</p>
<p>For this one-way ANOVA example, the value that we’ll use for the variance within samples is the Adj MS for Error, which is 4.402. It is considered “error” because it is the variability that is not explained by the factor.</p>
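<p>Both pieces of the ratio can be computed directly from raw data. Here is a minimal sketch of that arithmetic; the function name <code>one_way_anova</code> and the tiny data set are made up for illustration and are not the plastic strength data.</p>

```python
from statistics import fmean

def one_way_anova(groups):
    """Return (MS between, MS within, F) for a list of groups of observations."""
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    factor_df = len(groups) - 1
    error_df = len(all_values) - len(groups)

    # Numerator: how far each group mean is from the overall mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Denominator: how far each observation is from its own group mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)

    ms_between = ss_between / factor_df
    ms_within = ss_within / error_df
    return ms_between, ms_within, ms_between / ms_within

# Tiny made-up data set, only to show the arithmetic.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ms_between, ms_within, f_value = one_way_anova(groups)
print(ms_between, ms_within, f_value)
```

<p>In this toy example the group means (2, 3, and 6) are far apart relative to the spread inside each group, so the ratio is large.</p>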
The F-Statistic: Variation Between Sample Means / Variation Within the Samples
<p>The F-statistic is the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/basics/what-is-a-test-statistic/" target="_blank">test statistic</a> for F-tests. In general, an F-statistic is a ratio of two quantities that are expected to be roughly equal under the null hypothesis, which produces an F-statistic of approximately 1.</p>
<p>The F-statistic incorporates both measures of variability discussed above. Let's take a look at how these measures can work together to produce low and high F-values. Look at the graphs below and compare the width of the spread of the group means to the width of the spread within each group.</p>
<img alt="Graph that shows sample data that produce a low F-value" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/a8faab4bb32bf1a1f5864d34d96e8d56/low_f_dplot.png" style="width: 350px; height: 233px;" />
<img alt="Graph that shows sample data that produce a high F-value" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/054b86eb1e48803baba2cff9c78028ab/high_f_dplot.png" style="width: 350px; height: 233px;" />
<p>The low F-value graph shows a case where the group means are close together (low variability) relative to the variability within each group. The high F-value graph shows a case where the variability of group means is large relative to the within group variability. In order to reject the null hypothesis that the group means are equal, we need a high F-value.</p>
<p>For our plastic strength example, we'll use the Factor Adj MS for the numerator (14.540) and the Error Adj MS for the denominator (4.402), which gives us an F-value of 3.30.</p>
<p>Is our F-value high enough? A single F-value is hard to interpret on its own. We need to place our F-value into a larger context before we can interpret it. To do that, we’ll use the F-distribution to calculate probabilities.</p>
F-distributions and Hypothesis Testing
<p>For one-way ANOVA, the ratio of the between-group variability to the within-group variability follows an <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/probability-distributions-and-random-data/distributions/f-distribution/" target="_blank">F-distribution</a> when the null hypothesis is true.</p>
<p>When you perform a one-way ANOVA for a single study, you obtain a single F-value. However, if we drew multiple random samples of the same size from the same population and performed the same one-way ANOVA, we would obtain many F-values and we could plot a distribution of all of them. This type of distribution is known as a <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/introductory-concepts/basic-concepts/sampling-distribution/" target="_blank">sampling distribution</a>.</p>
<p>Because the F-distribution assumes that the null hypothesis is true, we can place the F-value from our study in the F-distribution to determine how consistent our results are with the null hypothesis and to calculate probabilities.</p>
<p>The probability that we want to calculate is the probability of observing an F-statistic that is at least as high as the value that our study obtained. That probability allows us to determine how common or rare our F-value is under the assumption that the null hypothesis is true. If the probability is low enough, we can conclude that our data are inconsistent with the null hypothesis. The evidence in the sample data is strong enough to reject the null hypothesis for the entire population.</p>
<p>This probability that we’re calculating is also known as the p-value!</p>
<p>To plot the F-distribution for our plastic strength example, I’ll use Minitab’s <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/graphs/graphs-of-distributions/probability-distribution-plots/probability-distribution-plot/" target="_blank">probability distribution plots</a>. In order to graph the F-distribution that is appropriate for our specific design and sample size, we'll need to specify the correct number of DF. Looking at our one-way ANOVA output, we can see that we have 3 DF for the numerator and 36 DF for the denominator.</p>
<p><img alt="Probability distribution plot for an F-distribution with a probability" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/6303a2314437d8fcf2f72d9a56b1293a/f_distribution_probability.png" style="width: 576px; height: 384px;" /></p>
<p>The graph displays the distribution of F-values that we'd obtain if the null hypothesis is true and we repeat our study many times. The shaded area represents the probability of observing an F-value that is at least as large as the F-value our study obtained. F-values fall within this shaded region about 3.1% of the time when the null hypothesis is true. This probability is low enough to reject the null hypothesis using the common <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests:-significance-levels-alpha-and-p-values-in-statistics" target="_blank">significance level</a> of 0.05. We can conclude that not all the group means are equal.</p>
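<p>The "repeat our study many times" idea can also be simulated. The sketch below assumes four groups of 10 normally distributed observations drawn from a single population, so the null hypothesis is true by construction, and counts how often the simulated F-value is at least as large as ours. The group size and the normal population are my assumptions for illustration.</p>

```python
import random
from statistics import fmean

def f_statistic(groups):
    """One-way ANOVA F: mean square between groups over mean square within groups."""
    values = [v for g in groups for v in g]
    grand_mean = fmean(values)
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (len(values) - len(groups))
    return ms_between / ms_within

observed_f = 14.540 / 4.402  # Factor Adj MS / Error Adj MS from the output

# Assumption: 4 groups of 10 observations, all from the same normal population.
rng = random.Random(2016)  # fixed seed so the run is repeatable
n_studies = 10_000
hits = 0
for _ in range(n_studies):
    groups = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(4)]
    if f_statistic(groups) >= observed_f:
        hits += 1

print(f"observed F = {observed_f:.2f}")
print(f"share of simulated F-values at least that large: {hits / n_studies:.3f}")
```

<p>The simulated share should land close to the shaded area in the plot, with some run-to-run variation because it is based on random draws.</p>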
<p><a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values" target="_blank">Learn how to correctly interpret the p-value.</a></p>
Assessing Means by Analyzing Variation
<p>ANOVA uses the F-test to determine whether the variability between group means is larger than the variability of the observations within the groups. If that ratio is sufficiently large, you can conclude that not all the means are equal.</p>
<p><span style="line-height: 20.8px;">This brings us back to why we analyze variation to make judgments about means. </span>Think about the question: "Are the group means different?" You are implicitly asking about the variability of the means. After all, if the group means <em>don't </em>vary, or don't vary by more than random chance allows, then you can't say the means are different. And that's why you use analysis of variance to test the means.</p>
Data Analysis, Hypothesis Testing, Learning, Statistics Help
Wed, 18 May 2016 12:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics/understanding-analysis-of-variance-anova-and-the-f-test
Jim Frost
An Overview of Discriminant Analysis
http://blog.minitab.com/blog/starting-out-with-statistical-software/an-overview-of-discriminant-analysis
<p>Among the most underutilized statistical tools in Minitab, and I think in general, are multivariate tools. Minitab offers a number of different multivariate tools, including principal component analysis, factor analysis, <span><a href="http://blog.minitab.com/blog/quality-data-analysis-and-statistics/cluster-analysis-tips-part-2">clustering</a></span>, and more. In this post, my goal is to give you a better understanding of the multivariate tool called discriminant analysis, and how it can be used.</p>
<p>Discriminant analysis is used to classify observations into two or more groups if you have a sample with known groups. Essentially, it's a way to handle a classification problem, where two or more groups, clusters, or populations are known up front, and one or more new observations are placed into one of these known classifications based on the measured characteristics. Discriminant analysis can also be used to investigate how variables contribute to group separation.</p>
<p>An area where this is especially useful is species classification. We'll use that as an example to explore how this all works. If you want to follow along and you don't already have Minitab, you can get it <a href="http://www.minitab.com/products/minitab/free-trial/">free for 30 days</a>. </p>
Discriminant Analysis in Action
<img alt="Arctic wolf" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/43484b551c0cc2eacb1b848678d666be/wolf.jpg" style="line-height: 20.8px; margin: 10px 15px; float: right; width: 241px; height: 300px;" />
<div>
<p>I have a <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9429cbd678e906f6bbbda0793aa859f6/discrimdata.mtw">data set</a> with variables containing data on both Rocky Mountain and Arctic wolves. We already know which species each observation belongs to; the main goal of this analysis is to find out how the data we have contribute to the groupings, and then to use this information to help us classify new individuals. </p>
<p>In Minitab, we set up our worksheet to be column-based like usual. We have a column denoting the species of wolf, as well as 9 other columns containing measurements for each individual on a number of different features.</p>
<p>Once we have our continuous predictors and a group identifier column in our worksheet, we can go to <strong>Stat > Multivariate > Discriminant Analysis</strong>. Here's how we'd fill out the dialog:</p>
<p style="margin-left: 40px;"><img alt="dialog" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/732ead34-1005-4470-b034-d7f8b87fabcf/Image/bbfff731ce2f30923c064a73324dba1e/discrimdia.png" style="width: 448px; height: 336px;" /></p>
<p>'Groups' is where you would enter the column that contains the data on which group the observation falls into. In this case, "Location" is the species ID column. Our predictors, in my case X1-X9, represent the measurements of the individual wolves for each of 9 categories; we'll use these to determine which characteristics determine the groupings.</p>
<p>Some notes before we click OK. First, we're using a Linear discriminant function for simplicity. This makes the assumption that the covariance matrices are equal for all groups. This is something we can verify using Bartlett's Test (also available in Minitab). Once we have our dialog filled out, we can click OK and see our results.</p>
Using the Linear Discriminant Function to Classify New Observations
<p>One of the most important parts of the output we get is called the Linear Discriminant Function. In our example, it looks like this:</p>
<p style="margin-left: 40px;"><img alt="function" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/732ead34-1005-4470-b034-d7f8b87fabcf/Image/a3f3b5199c25010c69d3b19843c31b0e/function.PNG" style="width: 303px; height: 208px;" /></p>
<p>This is the function we will use to classify new observations into groups. Its coefficients let us determine which group provides the best fit for a new individual's measurements. Minitab can do this in the "Options" subdialog. For example, suppose we have a new observation with a certain vector of measurements (X1,...,X9). If we ask Minitab to classify it, we get output like this:</p>
<p style="margin-left: 40px;"><img alt="pred" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/732ead34-1005-4470-b034-d7f8b87fabcf/Image/49873dcbc94d8aa1ae75a45474aaf147/predic.PNG" style="width: 421px; height: 119px;" /></p>
<p>This will give us the probability that a particular new observation falls into either of our groups. In our case, it was an easy one. The probability that it belongs to the AR species was 1. We're reasonably sure, based on the data, that this is the case. In some cases, you may get probabilities much closer to each other, meaning it isn't as clear cut.</p>
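<p>To make the mechanics concrete, here is a minimal Python sketch of classifying with linear discriminant functions. Everything numeric in it is hypothetical: the real coefficients for X1-X9 appear only in the output image, so I use two made-up predictors, made-up coefficients, and a made-up "RM" label for the second group. Converting scores to probabilities with a softmax is the textbook relationship for linear discriminant analysis; I am assuming, not asserting, that Minitab's probabilities follow the same formula.</p>

```python
import math

# Hypothetical linear discriminant functions for two groups and two predictors.
# These constants, coefficients, and the "RM" label are made up for illustration.
functions = {
    "AR": {"constant": -20.0, "coefficients": [1.5, 2.0]},
    "RM": {"constant": -25.0, "coefficients": [2.5, 1.0]},
}

def scores(x):
    """Evaluate each group's linear discriminant function at observation x."""
    return {
        group: f["constant"] + sum(c * xi for c, xi in zip(f["coefficients"], x))
        for group, f in functions.items()
    }

def posterior_probabilities(x):
    """Turn scores into group probabilities (softmax of the scores)."""
    s = scores(x)
    top = max(s.values())  # subtract the maximum for numerical stability
    weights = {g: math.exp(v - top) for g, v in s.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

new_observation = [4.0, 9.0]  # hypothetical measurements
probabilities = posterior_probabilities(new_observation)
predicted = max(probabilities, key=probabilities.get)
print(predicted, probabilities)
```

<p>The observation is assigned to the group with the highest score. When two scores are close, the probabilities are close as well, which is the "not as clear cut" situation described above.</p>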
<p>I hope this gives you some idea of the usefulness of discriminant analysis, and how you can use it in Minitab to make decisions.</p>
</div>
Data Analysis, Hypothesis Testing, Statistics
Mon, 16 May 2016 12:00:00 +0000
http://blog.minitab.com/blog/starting-out-with-statistical-software/an-overview-of-discriminant-analysis
Eric Heckman
Is Stephen Curry the Best NBA Point Guard Ever? Let's Check the Data
http://blog.minitab.com/blog/statistics-in-the-field/is-stephen-curry-the-best-nba-point-guard-ever-lets-check-the-data
<p><em>by Laerte de Araujo Lima, guest blogger </em></p>
<p>The NBA's 2015-16 season will be one for the history books. Not only was it the last season of <a href="http://www.nba.com/lakers/news/160413_kobepresser">Kobe Bryant</a>, who scored 60 points in his final game, but the Golden State Warriors set <a href="http://www.nba.com/news/2015-16-golden-state-warriors-chase-1995-96-chicago-bulls-all-time-wins-record/">a new wins record</a>, beating the previous record set by the 1995-96 Chicago Bulls.</p>
<p><img alt="stephen curry" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/25a3dc0f9e9c615fae1224259b7c0c6f/320px_stephen_curry_vs_washington_2016_1_.jpg" style="width: 320px; height: 216px; margin: 10px 15px; float: right;" />The Warriors seem likely to take this season's NBA title, in large part thanks to the performance of point guard <a href="http://www.nba.com/playerfile/stephen_curry/">Stephen Curry</a>. A lot of my friends are even saying Curry's skill and performance make him the best point guard ever in NBA history, but is it true? Curry’s performance is amazing, and he's the key element of the Warriors’ success, but it seems a little early to define him as the best NBA point guard <em>ever</em>. In the meantime, we can use data to answer another question:</p>
<p>Has any other point guard in NBA history matched Stephen Curry’s performance during their initial seven seasons?</p>
<p>As a fan of both basketball and Six Sigma, I set out to answer this question methodically, following these steps:</p>
1. Define the Sample of Point Guards for the Study
<p>ESPN recently published <a href="http://espn.go.com/nba/story/_/page/nbarankPGs/ranking-top-10-point-guards-ever">their list</a> of the 10 best NBA point guards, which puts Magic Johnson first and Curry fourth. ESPN considers both objective factors (NBA titles, MVP nominations, etc.) and subjective parameters (player vision, charisma, team engagement, etc.) to compare players. In keeping with Six Sigma, I want my analysis to be based on figures and facts; however, ESPN's list makes a good starting point. Here are their rankings:</p>
<ol>
<li>Magic Johnson</li>
<li>Oscar Robertson</li>
<li>John Stockton</li>
<li>Stephen Curry</li>
<li>Isiah Thomas</li>
<li>Chris Paul</li>
<li>Steve Nash</li>
<li>Jason Kidd</li>
<li>Walt Frazier</li>
<li>Bob Cousy</li>
</ol>
2. Define the Data Source
<p>This is the easiest part of the job. The NBA web site is a rich source of data, so we are going to use it to check the regular-season performances of each player in ESPN's list. This keeps the averages comparable across players, because we are going to use the same number of matches per player per season.</p>
3. Define the Critical-to-Quality (CTQ) Factors
<p>In my opinion, the following CTQ factors (based on NBA standards criteria) best characterize point guard performance and how they add value to the team's main target—winning a game:</p>
<table>
<thead>
<tr><th>CTQ</th><th>CTQ Definition</th><th>Rationale</th></tr>
</thead>
<tbody>
<tr><td><strong>PTS</strong></td><td>Average points per game</td><td>Impact of the player on the overall score makes a positive contribution to winning the game.</td></tr>
<tr><td><strong>FG%</strong></td><td>Percentage of successful field goals</td><td>Player efficiency in shooting makes a positive contribution to winning the game.</td></tr>
<tr><td><strong>3P%</strong></td><td>Percentage of successful 3-point field goals</td><td>Player efficiency in 3-point shooting makes a positive contribution to winning the game.</td></tr>
<tr><td><strong>FT%</strong></td><td>Percentage of successful free throws</td><td>Player efficiency in free throws makes a positive contribution to winning the game.</td></tr>
<tr><td><strong>AST</strong></td><td>Average assists per game</td><td>Assisting teammates makes a positive contribution to winning the game.</td></tr>
<tr><td><strong>STL</strong></td><td>Average steals per game</td><td>New ball possession and counterattacks make a positive contribution to winning the game.</td></tr>
<tr><td><strong>MIN</strong></td><td>Average minutes played per game</td><td>Player's strategic importance to the team.<br />
Positive contribution to team strategy.</td></tr>
<tr><td><strong>GS</strong></td><td>Games per season where player is part of the initial 5.</td><td>Initial starts indicate importance in terms of strategy, as well as fewer injuries.</td></tr>
</tbody>
</table>
<p>With the players, critical factors, and the source of data defined, let's dig into the analysis.</p>
4. Ranking Criteria and Methodology
<p>When I opened Minitab Statistical Software to begin looking at each player's average for each CTQ factor, I faced the first challenge in the analysis. Some players did not have the same CTQ measurements in the NBA database. They had played in the NBA’s early years, and the statistics for all CTQ factors weren't available (for example, the 3-point shot didn't exist at the time some players were active). Consequently, I decided to exclude those players from the analysis to avoid discrepancy in the data. That leaves us with this short list:</p>
<ol>
<li>Magic Johnson</li>
<li>John Stockton</li>
<li>Stephen Curry</li>
<li>Isiah Thomas</li>
<li>Chris Paul</li>
<li>Steve Nash</li>
<li>Jason Kidd</li>
</ol>
To compare these players, I used the statistical tool called Analysis of Variance (ANOVA). ANOVA tests the hypothesis that the means of two or more populations are equal. An ANOVA evaluates the importance of one or more factors by comparing the response variable means at the different factor levels. The null hypothesis states that all means are equal, while the alternative hypothesis states that at least one is different.
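<p>To show what the one-way ANOVA computes, here is a minimal Python sketch of the F statistic: the variation between group means divided by the variation within groups. The three samples are made-up numbers, not NBA data, and the function name is my own. Minitab's Assistant also reports the p-value, which this sketch does not calculate.</p>

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F statistic, df between groups, df within groups)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical per-season averages for three players (illustration only).
samples = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova_f(samples)
print(round(f_stat, 3), df_between, df_within)  # about 13.0 with 2 and 6 df
```

<p>A large F statistic means the group means are far apart relative to the scatter inside each group, which is the evidence against the null hypothesis that all means are equal.</p>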
<p>For this analysis, I used the <a href="http://www.minitab.com/products/minitab/assistant/">Assistant</a> in Minitab to perform a one-way ANOVA. To access this tool, select <strong>Assistant > Hypothesis Tests...</strong> and choose One-Way ANOVA.</p>
<p style="margin-left: 80px;"><img alt="The Assistant in Minitab" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/42afcc329cd4cc74808be92ee49931d9/image001.jpg" style="width: 529px; height: 329px;" /></p>
<p>By performing one-way ANOVA for each of the factors, I can position the players based on the average values of their CTQ variables during each of their first seven seasons. After compiling all results, I deployed a <a href="http://asq.org/learn-about-quality/decision-making-tools/overview/decision-matrix.html">Decision Matrix</a> (another Six Sigma tool) to assess all the players, based on the ANOVA results. The ultimate goal is to determine if Curry’s average performance is superior, inferior, or equal to that of the other players.</p>
<p>Let's take a look at the ANOVA results for the individual CTQ factors.</p>
Average Points per Game (PPG)
<p style="margin-left: 40px;"><img alt="Average Points Per Game" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/bf37da89d5aca795b00a58125d00c9db/image002.gif" style="width: 624px; height: 468px;" /></p>
<p>The Assistant's output is designed to be very easy to understand. The blue bar at the top left answers the bottom-line question, "Do the means differ?" The p-value (0.001) is less than the threshold (0.05), telling us that there is a statistically significant difference in means. The intervals displayed on the Means Comparison Chart indicate that Curry and Nash both had huge variation in their average points per game in their first 7 years. Statistically speaking, the only player with an average PPG performance that was significantly different from Curry’s is Kidd; all the others had similar performance in their first 7 seasons.</p>
Percentage of Field Goals per Game (FG%)
<p style="margin-left: 40px;"><img alt="FG% ANOVA Results" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6ce64723b8efe750fcb163443aef7fed/image003.gif" style="width: 624px; height: 468px;" /></p>
<p>As in the previous analysis, the p-value (0.001) is less than the threshold (0.05), telling us that there is a difference in means. However, the interpretation of the analysis is clearer. In terms of statistical significance, Curry’s performance is better than Kidd's (again), but not better than Magic's, and it is similar to that of all the other players.</p>
<p>Again, we see that Nash has tremendous variation in his field-goal percentage, and Kidd exhibits the worst average FG% among these players.</p>
Average Percentage of 3-point Field Goals per Game (3P%)
<p style="margin-left: 40px;"><img alt="3P% ANOVA" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b997d1cc5da7397b9a7f3af002e44f89/image004.gif" style="width: 624px; height: 468px;" /></p>
<p>To my surprise, based on this comparison chart Magic has the <em>worst </em>performance, and the most variation, among the players for this factor. On the other hand, Curry has an extremely high average performance, with small variation, and this is what we see in the Warriors games.</p>
<p>If we take a closer look at the three highest performers in this category, Nash, Stockton, and Curry, we see that Nash's and Curry’s performances are slightly different. Interestingly, the variation in Stockton's data prevents us from being able to conclude that a statistically significant difference exists between his average and those of Curry <em>or </em>Nash.</p>
<p style="margin-left: 40px;"><img alt="3P% ANOVA for Curry, Nash, Stockton" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/93c4b44ac1c49bb7a84571a60567b798/image005.gif" style="width: 624px; height: 468px;" /></p>
<p>As happens in many Six Sigma projects, the results of this factor contradict conventional wisdom: how could Magic Johnson have the lowest average for this factor? I decided to dig a little bit deeper into Magic’s data using the Assistant's Diagnostic Report, which offers a better view of the data's distribution. There, we can see an outlier in Magic's data. According to this analysis, he actually had a season in which he made 0% of his 3-point field goals!</p>
<p style="margin-left: 40px;"><img alt="3PT% Diagnostic Report" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/70ad9430dc5f4ee5e2c98a21eafb7d8b/image007.png" style="width: 623px; height: 467px;" /></p>
<p>I could not believe this, so I double-checked the data at the source. To my surprise, it was correct:</p>
<p style="margin-left: 40px;"><img alt="Magic 0.0" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9d096d542a23ab383658316271a5af5b/image009.png" style="width: 624px; height: 346px; border-width: 1px; border-style: solid;" /></p>
Average Percentage of Free-Throw Field Goals per Game (FT%)
<p style="margin-left: 40px;"><img alt="FT% ANOVA Output" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/bd00d9293ba2bd6e45cae5307b7adea4/image010.gif" style="width: 624px; height: 468px;" /></p>
<p>In the free throw analysis, Curry's performance is similar to that of Nash and Paul, all of whom performed better than the other players. Once again, Kidd (whom I have nothing against!) has the worst performance.</p>
Average Assists per Game (AST)
<p style="margin-left: 40px;"><img alt="AST% ANOVA Output" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/100e59e07307222bc73255c030e00316/image011.gif" style="width: 624px; height: 468px;" /></p>
<p>For this factor, Nash and Curry are both at the end of the queue, with similar performance. It's also clear that while Stockton has both the highest average and small variation in his performance, he's still comparable with Isiah and Magic.</p>
Average Steals per Game (STL)
<p style="margin-left: 40px;"><img alt="STL ANOVA Output" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a1450e5f5b2883a49ad8aa5e7941e2c0/image012.gif" style="width: 624px; height: 468px;" /></p>
<p>Again, the p-value (0.001) is less than the threshold (0.05), telling us that there is a statistically significant difference in means. It is clear that Nash is not a big “stealer” when compared with the other players. It's interesting to see that Curry’s mean performance is better than Nash's and worse than Paul's, but is not statistically significantly different from the mean performance of the remaining players.</p>
Minutes Played per Game (MIN)
<p style="margin-left: 40px;"><img alt="MIN ANOVA Output" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/33b90915031366f591dd73b52e092971/image013.gif" style="width: 624px; height: 468px;" /></p>
<p>For the first time, the ANOVA results have a p-value (0.075) greater than the threshold (0.05), telling us that we cannot conclude there is a statistically significant difference in means. It is clear that Nash's performance has huge variation, indicating that his contribution was very irregular in his first 7 seasons (perhaps due to injuries, adaptation, etc.). Curry's performance shows the next-largest amount of variation after Nash's.</p>
Games Started in the Initial 5 per Season (GS)
<p style="margin-left: 40px;"><img alt="Initial 5 ANOVA Output" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dcb0afca98d1ed138d3d33c07f2f0d7e/image014.gif" style="width: 624px; height: 468px;" /></p>
<p>For this final CTQ, we can see that the p-value (0.006) is less than the threshold (0.05), indicating that the means are different. In this case, Stockton's and Kidd's means differ. Curry’s presence in the initial 5 in his first 7 seasons is not statistically significantly different from that of any of the other players.</p>
<p>Let's take a look at the Diagnostic Report. We can see that Stockton's performance in this CTQ is incredible: he started every game of every season in the initial 5, showing his importance to the team.</p>
<p style="margin-left: 40px;"><img alt="Initial 5 ANOVA Diagnostic Report" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/767e1328424e2c3b332cd5c612d41924/image015.gif" style="width: 624px; height: 468px;" /></p>
Conclusion
<p>Based on the analyses of these criteria, we now have a final outlook based purely on the data. We can use Minitab's <a href="https://blog.minitab.com/blog/statistics-and-quality-improvement/automatically-update-your-conditional-formatting">conditional formatting</a> to highlight the differences between players for the different factors (<strong>&gt;</strong> means "better than", <strong>&lt;</strong> means "worse than", and <strong>=</strong> means "similar").</p>
<p style="margin-left: 40px;"><img alt="Final Outlook - Condition Formatting" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/12c216328e4bf5214e2693f7195cd3c8/image015.png" style="width: 604px; height: 202px;" /></p>
From the analysis, we can conclude that:
<ul>
<li>Considering all of the CTQs, Curry’s overall performance is not superior to that of the other point guards in the study, although he does stand out for some individual factors.</li>
<li>Curry’s PTS is superior only to Kidd's.</li>
<li>In terms of shot efficiency, Curry’s FG% is better than Kidd's but inferior to Magic's, and at the same level as all other players.</li>
<li>Curry’s 3-point performance is amazing, but this analysis shows Stockton’s at the same level.</li>
<li>On the other hand, Curry's FT% is better than that of all the other players, except Paul and Nash.</li>
<li>Curry’s assists per game are inferior to those of all other point guards, except Nash.</li>
<li>For steals, Curry’s mean performance is better than Nash's, worse than Paul's, and not statistically significantly different from the remaining players.</li>
<li>In terms of MIN and GS, Curry's performance is similar to that of the other players.</li>
<li>If we just compare points per game (PTS) and shot efficiency (FG%, FT%, 3P%) separately, Curry’s overall performance is better than Kidd's, for sure. But if we compare the other CTQ factors (AST, STL, MIN, GS) in the same way, Chris Paul has better performance than Curry.</li>
</ul>
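<p>The decision matrix behind these conclusions can be sketched in a few lines of Python. The comparison symbols below are taken from the bullets above, from Curry's point of view. 3P% is left out because the text does not state every pairwise result for it. The equal weighting of factors and the +1/0/-1 scoring are my own assumptions, not part of the original analysis.</p>

```python
OPPONENTS = ["Magic", "Stockton", "Thomas", "Paul", "Nash", "Kidd"]

def compare(default="=", **exceptions):
    """Curry versus each opponent: '>' better, '<' worse, '=' similar."""
    return {name: exceptions.get(name, default) for name in OPPONENTS}

# One row per CTQ factor, as stated in the conclusions (3P% omitted).
matrix = {
    "PTS": compare(Kidd=">"),
    "FG%": compare(Kidd=">", Magic="<"),
    "FT%": compare(">", Paul="=", Nash="="),
    "AST": compare("<", Nash="="),
    "STL": compare(Nash=">", Paul="<"),
    "MIN": compare(),
    "GS": compare(),
}

# Assumed scoring: every factor counts equally.
SCORE = {">": 1, "=": 0, "<": -1}
net = {
    name: sum(SCORE[matrix[ctq][name]] for ctq in matrix)
    for name in OPPONENTS
}

for name in OPPONENTS:
    print(f"Curry vs {name}: {net[name]:+d}")
```

<p>A positive total means Curry comes out ahead of that player on more factors than he trails; a negative total means the opposite. Under these assumptions the tally favors Curry over Kidd and favors Paul over Curry, in line with the last bullet.</p>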
<p>Based on this analysis, perhaps we need a few more seasons' worth of data to compare these players' overall performance and reach a more certain conclusion.</p>
<p> </p>
<p><strong>About the Guest Blogger: </strong></p>
<p><em>Laerte de Araujo Lima is a Supplier Development Manager for Airbus (France). He has previously worked as product quality engineer for Ford (Brazil), a Project Manager in MGI Coutier (Spain), and Quality Manager in IKF-Imerys (Spain). He earned a bachelor's degree in mechanical engineering from the University of Campina Grande (Brazil) and a master's degree in energy and sustainability from the Vigo University (Spain). He has 10 years of experience in applying Lean Six Sigma to product and process development/improvement. To get in touch with Laerte, please follow him on Twitter @laertelima or on</em> <a href="http://www.linkedin.com/pub/laerte-lima/7/46b/443" target="_blank"><strong><em>LinkedIn</em></strong></a><em>.</em></p>
<p> </p>
<p style="font-size:11px;"><em>Photo of Stephen Curry by <a href="https://www.flickr.com/people/27003603@N00">Keith Allison</a>, used under Creative Commons 2.0. </em></p>
Fun StatisticsStatistics in the NewsFri, 13 May 2016 12:00:00 +0000http://blog.minitab.com/blog/statistics-in-the-field/is-stephen-curry-the-best-nba-point-guard-ever-lets-check-the-dataGuest BloggerTests of 2 Standard Deviations? Side Effects May Include Paradoxical Dissociations
http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/tests-of-2-standard-deviations-side-effects-may-include-paradoxical-dissociations
<p>Once upon a time, when people wanted to compare the standard deviations of two samples, they had two handy tests available, the F-test and Levene's test.</p>
<p>Statistical lore has it that the F-test is so named because <a href="##footnote">it so frequently fails you.1</a> Although the F-test is suitable for data that are normally distributed, its sensitivity to departures from <span><a href="http://blog.minitab.com/blog/the-statistical-mentor/anderson-darling-ryan-joiner-or-kolmogorov-smirnov-which-normality-test-is-the-best">normality</a></span> limits when and where it can be used.</p>
<p><a name="#back"></a>Levene’s test was developed as an antidote to the F-test's extreme sensitivity to nonnormality. However, Levene's test<span style="line-height: 1.6;"> is sometimes accompanied by a troubling side effect: paradoxical </span>dissociations<span style="line-height: 1.6;">. To see what I mean, take a look at these results from an </span><span style="line-height: 1.6;">actual </span><span style="line-height: 1.6;">test of 2 standard deviations that I actually ran in Minitab 16 using actual data that I actually made up:</span></p>
<p style="margin-left: 40px;"><img alt="Ratio of the standard deviations in Release 16" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/313db9f57725eeb074002df423c4415e/16_ratio.jpg" style="width: 286px; height: 99px;" /></p>
<p>Nothing surprising so far. The ratio of the standard deviations from samples 1 and 2 (s1/s2) is <span style="line-height: 20.8px;">1.414 / 1.575 = 0.898. This ratio is </span>our best "point estimate" for the ratio of the standard deviations from populations 1 and 2 (Ps1/Ps2).</p>
<p>Note that the ratio is less than 1, which suggests that Ps2 is greater than Ps1. </p>
<p>Now, let's have a look at the confidence interval (CI) for the population ratio. The CI gives us a range of likely values for the ratio of Ps1/Ps2. The CI <span style="line-height: 20.8px;">below</span><span style="line-height: 1.6;"> labeled "Continuous" is the one calculated using Levene's method:</span></p>
<p style="margin-left: 40px;"><img alt="Confidence interval for the ratio in Release 16" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/aee886880d52d5aed7150abd242b5d61/16_ci.jpg" style="width: 338px; height: 114px;" /></p>
<p><span style="line-height: 1.6;">What in Gauss' name is going on here?!? The range of likely values for Ps1/Ps2—1.046 to 1.566—doesn't include the point estimate of 0.898?!? In fact, the CI suggests that Ps1/Ps2 is </span><em style="line-height: 1.6;">greater </em><span style="line-height: 1.6;">than 1. Which suggests that Ps1 is actually </span><em style="line-height: 20.8px;">greater </em><span style="line-height: 1.6;">than Ps2. </span></p>
<p><span style="line-height: 1.6;">But the point estimate suggests the exact opposite! Which suggests that </span><span style="line-height: 20.8px;">something odd is going on here. Or that</span><span style="line-height: 1.6;"> I might be losing my mind (which wouldn't be that odd). Or both.</span></p>
<p>As it turns out, the very elements that make Levene's test robust to departures from normality also leave the test susceptible to paradoxical dissociations like this one. You see, Levene's test isn't <em>actually </em>based on the standard deviation. Instead, the test is based on a statistic called the <em>mean absolute deviation from the median</em>, or MADM. The MADM is much less affected by nonnormality and outliers than is the standard deviation. And even though the MADM and the <span style="line-height: 20.8px;">standard deviation of a sample </span>can be very different, the <em>ratio </em>of MADM1/MADM2 is nevertheless a good approximation for the <em>ratio </em>of Ps1/Ps2. </p>
<p><span style="line-height: 1.6;">However, in extreme cases, outliers can affect the sample standard deviations so much that s1/s2 can fall completely outside of Levene's CI. And that's when you're left with an awkward and confusing case of paradoxical dissociation. </span></p>
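<p>Here is a small Python sketch of how that can happen, using two made-up samples (not the data from the output above). It only compares the two point ratios, s1/s2 and MADM1/MADM2; it does not reproduce Levene's test or its confidence interval. A single outlier in the second sample inflates its standard deviation far more than its MADM, so the two ratios land on opposite sides of 1.</p>

```python
from statistics import median, stdev

def madm(sample):
    """Mean absolute deviation from the median."""
    center = median(sample)
    return sum(abs(x - center) for x in sample) / len(sample)

# Hypothetical data: sample_2 is tightly clustered except for one outlier.
sample_1 = [1, 3, 5, 7, 9, 11, 13]
sample_2 = [6, 6, 7, 7, 8, 8, 25]

sd_ratio = stdev(sample_1) / stdev(sample_2)
madm_ratio = madm(sample_1) / madm(sample_2)

print(f"s1/s2       = {sd_ratio:.3f}")    # below 1
print(f"MADM1/MADM2 = {madm_ratio:.3f}")  # above 1
```

<p>An interval built around the MADM ratio can therefore exclude the ratio of the sample standard deviations entirely.</p>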
<p><span style="line-height: 1.6;">Fortunately (and this may be the first and last time that you'll ever hear this next phrase), our </span><span style="line-height: 1.6;">statisticians have made things a lot less awkward. </span><span style="line-height: 1.6;">One of the brave folks in Minitab's R&D department toiled against all odds, and at considerable personal peril, to solve this enigma. The result, which has been incorporated into Minitab 17, is an effective, elegant, and </span>non-enigmatic<span style="line-height: 1.6;"> test that we call Bonett's test. </span></p>
<p style="margin-left: 40px;"><span style="line-height: 1.6;"><img alt="Confidence interval in Release 17" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/3c014cdea970a3f1f6a540119ef3b533/bonnet_results.jpg" style="width: 310px; height: 170px;" /></span></p>
<p>Like Levene's test, Bonett's test can be used with nonnormal data. But <em>unlike </em>Levene's test, Bonett's test is actually based on the actual standard deviations of the actual samples. Which means that Bonett's test is not subject to the same awkward and confusing paradoxical dissociations that can accompany Levene's test. And I don't know about you, but I try to avoid paradoxical dissociations whenever I can. (Especially as I get older, ... I just don't bounce back the way I used to.) </p>
<p><span style="line-height: 20.8px;">When you compare two standard deviations in Minitab 17, you get a handy graphical report </span><span style="line-height: 20.8px;">that quickly and clearly summarizes the results of your test, including the point estimate and the CI from Bonett's test. Which means n</span><span style="line-height: 20.8px;">o more awkward and confusing paradoxical dissociations. </span></p>
<p style="margin-left: 40px;"><img alt="Summary plot in Release 17" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/8de770ba-a50a-4f6b-9144-9713c3b99f66/Image/b785749b3292df1aa6d32abe4e430b63/17_summary_plot.jpg" style="width: 578px; height: 386px;" /></p>
<p><span style="line-height: 1.6;">------------------------------------------------------------</span></p>
<p><a name="#footnote"> </a></p>
<p>1 So, that bit about the name of the F-test—I kind of made that up. Fortunately, there is a better source of information for the genuinely curious. Our white paper, <a href="http://support.minitab.com/en-us/minitab/17/bonetts_method_two_variances.pdf">Bonett's Method</a>, includes all kinds of details about these tests and comparisons between the CIs calculated with each. Enjoy.</p>
<p> <br />
<em><a href="##back">return to text of post</a></em></p>
<p> </p>
<p> </p>
Hypothesis TestingStatisticsStatsWed, 11 May 2016 12:00:00 +0000http://blog.minitab.com/blog/data-analysis-and-quality-improvement-and-stuff/tests-of-2-standard-deviations-side-effects-may-include-paradoxical-dissociationsGreg Fox3 Ways to Graph 3 Variables in Minitab
http://blog.minitab.com/blog/real-world-quality-improvement/3-ways-to-graph-3-variables-in-minitab
<p>You can use contour plots, 3D scatterplots, and 3D surface plots in Minitab to view three variables in a single plot. These graphs are ideal if you want to see how temperature and humidity <span style="line-height: 20.8px;">affect the drying time of paint, or how horsepower and tire pressure affect a vehicle's fuel efficiency, for example. Ultimately, these three graphs are good choices for helping you to visualize your data and </span><span style="line-height: 1.6;">examine relationships among your three variables. </span></p>
1. Contour Plot
<p>Contour plots display a 3-dimensional relationship in two dimensions, with x- and y-factors (predictors) plotted on the x- and y-scales and response values represented by contours. You can think of a contour plot like a topographical map, in which x-, y-, and z-values are plotted instead of longitude, latitude, and elevation.</p>
<p>For example, this contour plot shows how reheat time (y) and temperature (x) affect the quality (contours) of a frozen entrée (mac-n-cheese, anyone?). The darker regions indicate higher quality. The contour levels reveal a peak centered in the vicinity of 35 minutes (Time) and 425 degrees (Temp). Quality scores in this peak region are greater than 8.</p>
<p style="margin-left:.25in;"><img alt="http://support.minitab.com/en-us/minitab/17/contour_plot_def.png" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/8bc30ce8807975037693ddb4e28dc48a/contour_plot.png" style="width: 360px; height: 240px;" /></p>
<p>To create a contour plot in Minitab, choose <strong>Graph</strong> > <strong>Contour Plot</strong>. Note that you can easily change the number and colors of contour levels by right-clicking in the graph area and choosing <strong>Edit Area</strong>.</p>
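<p>As a rough, text-only illustration of what a contour plot encodes, the Python sketch below evaluates a response on a grid of x- and y-values and maps each value to a band. The quality function is hypothetical: I made it up so that it peaks near 425 degrees and 35 minutes, like the example above. It is not the data behind Minitab's graph.</p>

```python
def quality(temp, time):
    """Hypothetical response surface peaking at 425 degrees, 35 minutes."""
    return 9.0 - ((temp - 425) / 25) ** 2 - ((time - 35) / 5) ** 2

def contour_band(value):
    """Map a response value to a symbol; denser symbols mean higher quality."""
    if value > 8:
        return "#"
    if value > 6:
        return "+"
    if value > 4:
        return "."
    return " "

temps = range(375, 476, 25)  # x-scale: 375, 400, 425, 450, 475
times = range(45, 24, -5)    # y-scale, top row first: 45, 40, 35, 30, 25

# One string per time value, one character per temperature value.
rows = ["".join(contour_band(quality(t, m)) for t in temps) for m in times]
for minutes, row in zip(times, rows):
    print(f"{minutes:>3} | {row}")
```

<p>Each printed character plays the role of a shaded region: the x- and y-positions give the predictor values, and the symbol gives the contour level of the response.</p>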
2. 3D Scatterplot
<p>A 3D scatterplot graphs the actual data values of three continuous variables against each other on the x-, y-, and z-axes. Usually, you would plot predictor variables on the x-axis and y-axis and the response variable on the z-axis.</p>
<p>You can create 3D scatterplots in Minitab by choosing <strong>Graph</strong> > <strong>3D Scatterplot</strong>. Take the frozen entrée example from above—you can plot a simple 3D scatterplot to show how reheat time and temperature affect the quality of the entrée:</p>
<p style="margin-left:.25in;"><img alt="http://support.minitab.com/en-us/minitab/17/3D_scatterplot_simple.png" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/8b91a3cb7177285289ed43d5e03eb49f/3d_scatterplot.png" style="width: 348px; height: 230px;" /></p>
<p>It’s also easy to rotate a 3D scatterplot to view it from different angles. Just click on your plot to activate it, then choose <strong>Tools</strong> > <strong>Toolbars</strong> > <strong>3D Graph Tools</strong>.</p>
3. 3D Surface Plot
<p>Use a 3D surface plot to create a three-dimensional surface based on the x-, y-, and z-variables. The predictor variables are displayed on the x- and y-scales, and the response (z) variable is represented by a smooth surface (in a 3D surface plot) or a grid (in a 3D wireframe plot).</p>
<p>You may be thinking that the 3D surface plot looks very similar to the 3D scatterplot. The only difference between the two is that for the surface plot, Minitab displays a continuous surface or a grid (wireframe plot) of z-values instead of individual data points.</p>
<p>Here’s the frozen entrée data shown on a 3D surface plot:</p>
<p style="margin-left:.25in;"><img alt="http://support.minitab.com/en-us/minitab/17/surface_plot.png" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/227a2cadedcf277890ca2efb4d3e4811/3d_surface_plot.png" style="width: 348px; height: 230px;" /></p>
<p>To build a 3D surface plot in Minitab, choose <strong>Graph</strong> > <strong>3D Surface Plot</strong>. The same instructions above for rotating a 3D scatterplot apply here as well, making it just as easy to view your 3D surface plot from different angles.</p>
Bonus Plot!
<p>It’s your lucky day! Here’s a <strong>bonus fourth way to graph 3 variables in Minitab: </strong>You can also use a bubble plot to explore the relationships among three variables on a single plot. Like a scatterplot, a bubble plot plots a y-variable versus an x-variable. However, the symbols ("bubbles") on this plot vary in size. The area of each bubble represents the value of a third variable. Visit <a href="http://blog.minitab.com/blog/starting-out-with-statistical-software/introducing-the-bubble-plot" target="_blank">this blog post</a> to learn more!</p>
<p>If you want to try your hand at creating these graphs in Minitab and you don't already have it, we offer a <a href="https://www.minitab.com/products/minitab/free-trial/" target="_blank">full trial version—it's free for 30 days</a>!</p>
Data AnalysisStatisticsMon, 09 May 2016 12:00:00 +0000http://blog.minitab.com/blog/real-world-quality-improvement/3-ways-to-graph-3-variables-in-minitabCarly BarryNovel Uses of the Pareto Chart Through Human History
http://blog.minitab.com/blog/statistics-and-quality-data-analysis/novel-uses-of-the-pareto-chart-through-human-history
<p><img alt="bones" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/4a63351bc132ab777b817253ca9d4ddf/mastadon_bones.jpg" style="width: 250px; height: 211px; float: right; margin: 10px 15px;" />The Pareto chart is a graphic representation of the 80/20 rule, also known as the Pareto principle. If you're a quality improvement specialist, you know that the chart is named after the early 20th century economist Vilfredo Pareto, who discovered that roughly 20% of the population in Italy owned about 80% of the property at that time.</p>
<p>You probably also know that the Pareto principle was later adopted and repurposed as a powerful business metric by Dr. Joseph Juran in the 1940s, to identify the "<a href="http://blog.minitab.com/blog/michelle-paret/fast-food-and-identifying-the-vital-few" target="_blank">vital few</a>" issues versus the "trivial many".</p>
<p>But most people don't realize that human use of the Pareto chart goes back much earlier than this. Archeological evidence suggests the chart could date back to the Middle Paleolithic era: using broken-off mastodon bones for bars, and hyena sinews for connect lines, it appears Stone Age humans constructed rudimentary Pareto charts to depict problems as they first began to cook with fire.</p>
<p>Based on the fossilized records, I used our <a href="http://www.minitab.com/products/minitab">statistical software</a> to recreate a Stone Age Pareto chart:</p>
<p style="margin-left: 40px;"><img alt="Pareto Prehistoric" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/7e5eaaf8ab636dfef3f3ef1f7dd191f9/pareto_chart_of_problems_final.jpg" style="width: 576px; height: 384px;" /></p>
<p>Unfortunately, although Paleolithic humans were able to create a rough-hewn version of a Pareto chart, their brains were still too small to interpret it. Moreover, they didn't have <a href="http://blog.minitab.com/blog/understanding-statistics/root-cause-analysis-and-process-improvement-for-patient-safety" target="_blank">follow-up tools to identify the root causes</a> of the "vital few" problems identified by the chart. Early attempts at fishbone diagrams similarly failed because the bones were eaten before the diagram was completed. Thus, it would take another 400,000 years of evolution before humans could fry an egg, over-easy.</p>
<p>Fast forward to about 4500 BP (Before Pareto). Hieroglyphic documents unearthed in the tombs of the great pyramids reveal that Egyptian quality engineers in the sphinx manufacturing industry used Pareto charts to reveal the vital few defects in their product.</p>
<div style="margin-left: 40px;"><img alt="sphinx 3" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/c88263ab1f9a9286faf7ce16c931eaa9/sphinx_4_cropped.jpg" style="width: 230px; height: 347px; float: left;" /> <img alt="pareto sphinx" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/f105ebd41ab2543e548c963a5f02f08f/pareto_chart_of_sphinx_issues_3.jpg" style="width: 523px; height: 350px;" /></div>
<p>Unlike their Stone Age predecessors, Egyptian quality engineers <em>were </em>able to identify root causes of the vital few issues shown on the chart. For example, they found that the poor grade of limestone used to make the sphinx was responsible for most nose and beard breakage. (The engineers recommended using a more durable, high-quality stone for construction. Unfortunately, the chief treasurer deemed this too costly, arguing that it would undercut the Pharaohs' short-term profit margin over the next few centuries.)</p>
<p>Based on the second-highest bar in the chart, product designers also recommended that the design be modified to make the sphinx either more like a lion, or more like a human. However, upper-level priests and viziers noted that the current design was based on Nile delta marketing research that showed that 50% of customers wanted a gigantic lion, while 50% wanted a really, really big person. The current design was deemed a compromise.</p>
<p>None of the recommendations based on the chart were adopted, and the sphinx manufacturing industry went bankrupt soon thereafter.</p>
Novel Applications in the Modern Era
<p>By the 20th century, the Pareto chart had become a quintessential tool for quality improvement in the manufacturing and service industries. However, new applications were still being made in other diverse fields, including social work and psychotherapy.</p>
<p>On February 7, 1959, marriage therapist <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/alpha-male-vs-alpha-female" target="_blank">Dr. Sigma Freud</a> was the first to apply a Pareto chart in the venue of couples counseling. Alfred and Gloria VanderCamp had sought help to save their crumbling marriage of 23 years. But the counseling sessions soon became bogged down in endless recriminations, as each spouted the innumerable, trivial flaws of the other.</p>
<p>To gain insight, the doctor suggested that the VanderCamps track and record each other's defects over a period of one month. Then the Pareto chart could be used to identify the vital few flaws from the trivial many, allowing the couple to focus on important issues in the marriage. The results are shown below.</p>
<p style="margin-left: 40px;"><img alt="Alfred" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/b5ac2db92070d30171148ed2432de64d/pareto_chart_of_alfred_vandercamp_2.jpg" style="width: 576px; height: 384px;" /></p>
<p style="margin-left: 40px;"><img alt="Gloria" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/19c0bd3269027a3a6f9f99248410a287/pareto_chart_of_gloria_vandercamp_2.jpg" style="width: 576px; height: 384px;" /></p>
<p>Although Dr. Freud was hopeful that the VanderCamps could improve their relationship by focusing on vital flaws, her initial application of the Pareto chart overlooked two critical assumptions:</p>
<p style="margin-left: 40px;">1. Pareto charts that track frequency assume that the more frequently something happens, the greater the impact it has on the outcome. If this is not the case, flaws should be scored by severity.</p>
<p style="margin-left: 40px;">2. A Pareto analysis usually illuminates only a snapshot in time, and may not take into account changing conditions.</p>
<p>And so it was. Two months into therapy, Mrs. VanderCamp decided that nothing in life was as important as dancing, and ran off with a ballroom dance instructor.</p>
<p>Several months into the new relationship, upon creating a Pareto chart of her twinkle-toed partner, the former Mrs. VanderCamp was aghast to discover that his flaws, both the vital few and the trivial many, were essentially the same as Alfred's.</p>
<p>She did, however, dance more.</p>
Fun Statistics, Quality Improvement
Fri, 06 May 2016 12:00:00 +0000
http://blog.minitab.com/blog/statistics-and-quality-data-analysis/novel-uses-of-the-pareto-chart-through-human-history
Patrick Runkel
Understanding t-Tests: 1-sample, 2-sample, and Paired t-Tests
http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests%3A-1-sample%2C-2-sample%2C-and-paired-t-tests
<p>In statistics, t-tests are a type of hypothesis test that allows you to compare means. They are called t-tests because each t-test boils your sample data down to one number, the t-value. If you understand how t-tests calculate t-values, you’re well on your way to understanding how these tests work.</p>
<p>In this series of posts, I'm focusing on concepts rather than equations to show how t-tests work. However, this post includes two simple equations that I’ll work through using the analogy of a signal-to-noise ratio.</p>
<p><a href="http://www.minitab.com/products/minitab/" target="_blank">Minitab statistical software</a> offers the 1-sample t-test, paired t-test, and the 2-sample t-test. Let's look at how each of these t-tests reduces your sample data down to the t-value.</p>
How 1-Sample t-Tests Calculate t-Values
<p>Understanding this process is crucial to understanding how t-tests work. I'll show you the formula first, and then I’ll explain how it works.</p>
<p style="margin-left: 40px;"><img alt="formula to calculate t for a 1-sample t-test" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/dbbda42fec926eef96a56c22ed462458/formula_1t.png" style="width: 142px; height: 88px;" /></p>
<p>Please notice that the formula is a ratio. A common analogy is that the t-value is the signal-to-noise ratio.</p>
<strong>Signal (a.k.a. the effect size)</strong>
<p>The numerator is the signal. You simply take the sample mean and subtract the null hypothesis value. If your sample mean is 10 and the null hypothesis is 6, the difference, or signal, is 4.</p>
<p>If there is no difference between the sample mean and null value, the signal in the numerator, as well as the value of the entire ratio, equals zero. For instance, if your sample mean is 6 and the null value is 6, the difference is zero.</p>
<p>As the difference between the sample mean and the null hypothesis mean increases in either the positive or negative direction, the strength of the signal increases.</p>
<div style="float: right; width: 325px; margin: 15px 0px 15px 15px;"><img alt="Photo of a packed stadium to illustrate high background noise" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/695f063e8d38c2bc9c5fa61637ef6327/crowd.jpg" style="width: 325px; height: 244px; margin-bottom:5px;" /><br />
<em>Lots of noise can overwhelm the signal.</em></div>
<strong>Noise</strong>
<p>The denominator is the noise. The equation in the denominator is a measure of variability known as the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/tests-of-means/what-is-the-standard-error-of-the-mean/" target="_blank">standard error of the mean</a>. This statistic indicates how accurately your sample estimates the mean of the population. A larger number indicates that your sample estimate is less precise because it has more random error.</p>
<p>This random error is the “noise.” When there is more noise, you expect to see larger differences between the sample mean and the null hypothesis value <em>even when the null hypothesis is true</em>. We include the noise factor in the denominator because we must determine whether the signal is large enough to stand out from it.</p>
<strong>Signal-to-Noise ratio</strong>
<p>Both the signal and noise values are in the units of your data. If your signal is 6 and the noise is 2, your t-value is 3. This t-value indicates that the difference is 3 times the size of the standard error. However, if there is a difference of the same size but your data have more variability (6), your t-value is only 1. The signal is at the same scale as the noise.</p>
<p>In this manner, t-values allow you to see how distinguishable your signal is from the noise. Relatively large signals and low levels of noise produce larger t-values. If the signal does not stand out from the noise, it’s likely that the observed difference between the sample estimate and the null hypothesis value is due to random error in the sample rather than a true difference at the population level.</p>
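<p>The signal-to-noise ratio described above can be sketched in a few lines of Python. This is only an illustration of the standard 1-sample formula (sample mean minus null value, divided by the standard error of the mean), not Minitab's own code. The function name <code>one_sample_t</code> and the data are hypothetical, chosen so that the sample mean is 10 and the null value is 6, as in the example above.</p>

```python
import math
import statistics


def one_sample_t(sample, null_value):
    """Return (signal, noise, t) for a 1-sample t-test."""
    n = len(sample)
    # Signal: difference between the sample mean and the null hypothesis value.
    signal = statistics.mean(sample) - null_value
    # Noise: standard error of the mean (sample standard deviation / sqrt(n)).
    noise = statistics.stdev(sample) / math.sqrt(n)
    return signal, noise, signal / noise


# Hypothetical data, invented for illustration only.
sample = [8, 12, 9, 11, 10, 13, 7, 10]
signal, noise, t = one_sample_t(sample, null_value=6)
print(f"signal={signal:.3f}  noise={noise:.3f}  t={t:.3f}")
```

<p>Keeping the signal fixed and increasing the spread of the data makes the noise term larger and the t-value smaller, which is the behavior the paragraphs above describe.</p>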
A Paired t-test Is Just A 1-Sample t-Test
<p>Many people are confused about when to use a paired t-test and how it works. I’ll let you in on a little secret. The paired t-test and the 1-sample t-test are actually the same test in disguise! As we saw above, a 1-sample t-test compares one sample mean to a null hypothesis value. A paired t-test simply calculates the difference between paired observations (e.g., before and after) and then performs a 1-sample t-test on the differences.</p>
<p>You can test this with <a href="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/File/946c3f4725847e714e7fcc9664ae67b2/paired_t_test.mtw">this data set</a> to see how all of the results are identical, including the mean difference, t-value, p-value, and confidence interval of the difference.</p>
<p style="margin-left: 40px;"><img alt="Minitab worksheet with paired t-test example" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/02fbcdbbf62fec3823123fbcc818b11f/paired_t_worksheet.png" style="width: 229px; height: 223px;" /><img alt="paired t-test output" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/170d6d4fa1fbbb1bf4f5aa56b1783b5f/paired_t_swo.png" style="width: 518px; height: 196px;" /></p>
<p style="margin-left: 40px;"><img alt="1-sample t-test output" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/08d652fb45599fc1ac247181a935c471/1t_difc_swo.png" style="width: 504px; height: 115px;" /></p>
<p>Understanding that the paired t-test simply performs a 1-sample t-test on the paired differences can really help you understand how the paired t-test works and when to use it. You just need to figure out whether it makes sense to calculate the difference between each pair of observations.</p>
<p>For example, let’s assume that “before” and “after” represent test scores, and there was an intervention in between them. If the before and after scores in each row of the example worksheet represent the same subject, it makes sense to calculate the difference between the scores in this fashion—the paired t-test is appropriate. However, if the scores in each row are for different subjects, it doesn’t make sense to calculate the difference. In this case, you’d need to use another test, such as the 2-sample t-test, which I discuss below.</p>
<p>Using the paired t-test simply saves you the step of having to calculate the differences before performing the t-test. You just need to be sure that the paired differences make sense!</p>
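<p>To make the equivalence concrete, here is a minimal Python sketch that computes a paired t-value by taking the differences and handing them to a 1-sample calculation with a null value of 0. The function names and the before/after scores are hypothetical and are not taken from the downloadable worksheet.</p>

```python
import math
import statistics


def one_sample_t(values, null_value=0.0):
    """t-value for a 1-sample t-test: (mean - null) / standard error."""
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return (statistics.mean(values) - null_value) / standard_error


def paired_t(before, after):
    """Paired t-value: a 1-sample t-test on the paired differences."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for b, a in zip(before, after)]
    return one_sample_t(differences, null_value=0.0)


# Hypothetical test scores; each position is the same subject.
before = [72, 65, 80, 58, 77, 69]
after = [75, 70, 79, 66, 81, 74]

differences = [a - b for b, a in zip(before, after)]
print(paired_t(before, after), one_sample_t(differences))
```

<p>Both calls print the same t-value, because the paired calculation is nothing more than the 1-sample calculation applied to the differences.</p>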
<p>When it is appropriate to use a paired t-test, it can be more powerful than a 2-sample t-test. For more information, go to <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/tests-of-means/why-use-paired-t/" target="_blank">Why should I use a paired t-test?</a></p>
How Two-Sample T-tests Calculate T-Values
<p>The 2-sample t-test takes your sample data from two groups and boils it down to the t-value. The process is very similar to the 1-sample t-test, and you can still use the analogy of the signal-to-noise ratio. Unlike the paired t-test, the 2-sample t-test requires independent groups for each sample.</p>
<p>The formula appears below, followed by some discussion.</p>
<p style="margin-left: 40px;"><img alt="formula to calculate t for a 2-sample t-test" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/276994cf179b4997ce6097d1f4462363/formula_2t.png" style="width: 102px; height: 54px;" /></p>
<p>For the 2-sample t-test, the numerator is again the signal, which is the difference between the means of the two samples. For example, if the mean of group 1 is 10, and the mean of group 2 is 4, the difference is 6.</p>
<p>The default null hypothesis for a 2-sample t-test is that the two groups are equal. You can see in the equation that when the two groups are equal, the difference (and the entire ratio) also equals zero. As the difference between the two groups grows in either a positive or negative direction, the signal becomes stronger.</p>
<p>In a 2-sample t-test, the denominator is still the noise, but Minitab can use two different values. You can either assume that the variability in both groups is equal or not equal, and Minitab uses the corresponding estimate of the variability. Either way, the principle remains the same: you are comparing your signal to the noise to see how much the signal stands out.</p>
<p>Just like with the 1-sample t-test, for any given difference in the numerator, as you increase the noise value in the denominator, the t-value becomes smaller. To determine that the groups are different, you need a t-value that is large.</p>
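<p>The two choices for the denominator can be sketched as follows. The post does not spell out the formulas, so this sketch assumes the standard textbook versions: a pooled standard error when the group variances are assumed equal, and an unpooled (Welch) standard error otherwise. The function name and data are hypothetical. The group means are 10 and 4, matching the example above.</p>

```python
import math
import statistics


def two_sample_t(group1, group2, equal_variances=False):
    """t-value for a 2-sample t-test under either variance assumption."""
    n1, n2 = len(group1), len(group2)
    # Signal: difference between the two sample means.
    signal = statistics.mean(group1) - statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    if equal_variances:
        # Pooled variance weights each group's variance by its degrees of freedom.
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        noise = math.sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        # Unpooled (Welch) standard error uses each group's own variance.
        noise = math.sqrt(v1 / n1 + v2 / n2)
    return signal / noise


# Hypothetical independent groups, invented for illustration only.
group1 = [9, 11, 10, 12, 8]
group2 = [4, 6, 2, 1, 7, 4]

t_unpooled = two_sample_t(group1, group2)
t_pooled = two_sample_t(group1, group2, equal_variances=True)
print(f"unpooled t={t_unpooled:.3f}  pooled t={t_pooled:.3f}")
```

<p>The signal (6) is identical in both cases. Only the noise estimate changes, so the two t-values differ slightly.</p>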
What Do t-Values Mean?
<p>Each type of t-test uses a procedure to boil all of your sample data down to one value, the t-value. The calculations compare your sample mean(s) to the null hypothesis and incorporate both the sample size and the variability in the data. A t-value of 0 indicates that the sample results exactly equal the null hypothesis. In statistics, we call the difference between the sample estimate and the null hypothesis the effect size. As this difference increases, the absolute value of the t-value increases.</p>
<p>That’s all nice, but what does a t-value of, say, 2 really mean? From the discussion above, we know that a t-value of 2 indicates that the observed difference is twice the size of the standard error, our measure of the noise in your data. However, we use t-tests to evaluate hypotheses rather than just figuring out the signal-to-noise ratio. We want to determine whether the effect size is statistically significant.</p>
<p>To see how we get from t-values to assessing hypotheses and determining statistical significance, read the other post in this series, <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests-t-values-and-t-distributions">Understanding t-Tests: t-values and t-distributions</a>.</p>
Data Analysis, Hypothesis Testing, Learning, Statistics Help
Wed, 04 May 2016 12:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests%3A-1-sample%2C-2-sample%2C-and-paired-t-tests
Jim Frost
Exploring Healthcare Data, Part 2
http://blog.minitab.com/blog/meredith-griffith/exploring-healthcare-data-part-2
<p>In the <a href="http://blog.minitab.com/blog/meredith-griffith/exploring-healthcare-data-part-1">first part</a> of this series, we looked at a case study where staff at a hospital used ATP swab tests to test 8 surfaces for bacteria in 10 different hospital rooms across 5 departments. ATP measurements below 400 units pass the swab test, while measurements greater than or equal to 400 units fail the swab test and require further investigation.</p>
<p><img alt="washing hands" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/20208ca1f1030aa16621cfb8b84f947a/washing_hands.jpg" style="margin: 10px 15px; float: right; width: 250px; height: 202px;" />I offered two tips on exploring and visualizing data using graphs, brushing, and conditional formatting.</p>
<ol>
<li><strong>Evaluate the shape of your data.</strong></li>
<li><strong>Identify and investigate outliers.</strong></li>
</ol>
<p>By performing these preliminary explorations on the swab test data, we discovered that the mean ATP measurement would not be effective for testing whether surfaces showed statistically significant differences in contamination levels. This was due to the data being highly skewed by extreme outliers.</p>
<p>We then identified where these unusually high-ATP measurements were discovered in the hospital. These findings provide valuable information for appropriately focusing process improvement efforts on particular hospital rooms, departments, and surfaces within those rooms.</p>
<p>Now that we've seen how much some simple exploration and visualization tools can reveal, let's run through three more tools that will help you explore your own healthcare data in order to draw actionable insights.</p>
<p>If you’d like to follow along and didn't already download the data from the first post, you can <a href="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b7c8abc2ba12bed0e1aa6e69d8e22d06/atp.MPJ">download and explore the data</a> yourself! If you don’t yet have Minitab 17, you can download the <a href="http://www.minitab.com/products/minitab/free-trial/">free, 30-day trial</a>.</p>
Tip #3: Manipulate the data
<p>The swab test data the hospital staff collected and recorded is unstacked—this simply means that all response measurements are contained in multiple columns rather than stacked together in one column. To do additional data visualization and a more formal analysis, you need to reconfigure or manipulate how the data is arranged. We can accomplish this by stacking rows.</p>
<p>The <strong>ATP Stacked.MTW</strong> worksheet in the downloadable Minitab project file above already has the data reshaped for you. But you can manipulate the data on your own using the <strong>ATP Unstacked.MTW</strong> worksheet. Just navigate to <strong>Data > Stack > Rows</strong>, and complete the dialog as shown:</p>
<p style="margin-left: 40px;"><img alt="health care data - stack rows to prepare for analysis" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e5e28f9788efa682ac7b96b7df13846c/health_care_data_2_1.png" style="width: 370px; height: 288px;" /></p>
<p>Stacking all rows of your data and storing the associated column subscripts (or column names) in a separate column will result in all ATP measurements stacked into one column, a separate column containing categories for Surfaces, and another column containing the Room Number.</p>
<p>With stacked data, you are properly set up to perform formal analyses in Minitab—this is an important step as you work with your data, as most Minitab analyses require columns of stacked data. We won’t tackle a formal analysis here, but rest assured that you are set up to do so!</p>
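<p>For readers who think in code, the reshaping that <strong>Data > Stack > Rows</strong> performs can be sketched in plain Python. This is only an illustration of the unstacked-to-stacked idea, not how Minitab implements it. The rows, values, and the <code>stack_rows</code> helper are hypothetical, and only two of the eight surface columns are shown.</p>

```python
# Hypothetical slice of an unstacked worksheet:
# one row per room, one column per surface.
unstacked = [
    {"Room": "101", "Door Knob": 120, "Bed Rails": 560},
    {"Room": "402", "Door Knob": 95, "Bed Rails": None},  # None = missing value
]


def stack_rows(rows, id_column, name_column="Surface", value_column="ATP"):
    """Turn one row per room into one row per (room, surface) measurement."""
    stacked = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every stacked row
            stacked.append({
                id_column: row[id_column],
                name_column: column,   # the former column name becomes a category
                value_column: value,   # all measurements land in a single column
            })
    return stacked


stacked = stack_rows(unstacked, id_column="Room")
for record in stacked:
    print(record)
```

<p>Each stacked record carries the room, the surface name that used to be a column header, and a single ATP value, which is the layout described above.</p>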
Tip #4: Extract information from your original data set
<p>Once your data are stacked, you can use functions available in <strong>Calc > Calculator</strong> and <strong>Data > Recode </strong>to leverage information intrinsic to your original data to create new variables to explore and analyze.</p>
<p>For instance, we know the first character of each room number denotes the department. You can use the ‘left’ function in <strong>Calc > Calculator </strong>to extract the left-most character from the Room column, and store the result in a new column labeled Department. You can do this by filling out the <strong>Calculator </strong>dialog as shown:</p>
<p style="margin-left: 40px;"><img alt="manipulating health care data" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3c02f831773bf08209ca98ea6715b336/health_care_data_2_2.png" style="width: 326px; height: 288px;" /></p>
<p>You also know that ATP measurements below 400 ‘pass’ the ATP swab test. Recoding ranges of ATP values to text to indicate which values ‘Pass’ and which values ‘Fail’ can be useful when visualizing the data. You can do this by filling out the <strong>Data ></strong> <strong>Recode > To Text </strong>dialog as shown:</p>
<p style="margin-left: 40px;"><img alt="health care data dialog box" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3037d5b1320e2e590304dd32f214c68a/health_care_data_2_3.png" style="width: 487px; height: 288px;" /></p>
<p>Finally, you can use this newly extracted data to create a stacked bar chart showing the counts of measurements that failed, passed, or were missing from the ATP swab test across Department and the recoded ATP. Using the <strong>ATP Stacked.MTW </strong>worksheet, navigate to <strong>Graph > Bar Chart > Stack</strong>. Verify that the <strong>Bars represent </strong>drop-down shows the default selection, <em>Counts of unique values</em>. Click <strong>OK.</strong> Select Department and Recoded ATP as Categorical variables, and click <strong>OK.</strong></p>
<p>Minitab produces the following graph:<br />
</p>
<p style="margin-left: 40px;"><img alt="Health care ATP swab test data" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3b5b258c10e1d6223fe04b1252557c6c/health_care_data_2_4.png" style="width: 432px; height: 288px;" /></p>
<p>The bar chart reveals that:</p>
<ul>
<li>Department 4 has the highest count of ATP measurements that failed the swab test.</li>
<li>The sanitation team should consider focusing initial efforts in department 4 as the investigation of problems with room-cleaning procedures continues.</li>
</ul>
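<p>The extraction, recoding, and counting steps in this tip can be sketched in Python as follows. The helper names, the "Missing" label, and the sample rows are hypothetical. The only details taken from the post are that the first character of the room number denotes the department and that measurements below 400 pass while those at or above 400 fail.</p>

```python
from collections import Counter


def department_from_room(room):
    """Mimic a 'left' text function: keep the left-most character of the room number."""
    return str(room)[:1]


def recode_atp(atp, threshold=400):
    """Recode a numeric ATP measurement to text."""
    if atp is None:
        return "Missing"
    return "Pass" if atp < threshold else "Fail"


# Hypothetical stacked rows, invented for illustration only.
stacked = [
    {"Room": "101", "Surface": "Door Knob", "ATP": 120},
    {"Room": "101", "Surface": "Bed Rails", "ATP": 560},
    {"Room": "402", "Surface": "Door Knob", "ATP": 400},
    {"Room": "402", "Surface": "Bed Rails", "ATP": None},
]

for row in stacked:
    row["Department"] = department_from_room(row["Room"])
    row["Recoded ATP"] = recode_atp(row["ATP"])

# Counts of unique (Department, Recoded ATP) pairs: the heights of the stacked bars.
counts = Counter((row["Department"], row["Recoded ATP"]) for row in stacked)
for (department, status), count in sorted(counts.items()):
    print(department, status, count)
```

<p>Note that a measurement of exactly 400 is recoded as a failure, in line with the "greater than or equal to 400" rule.</p>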
Tip #5: Obtain important statistics that describe your data
<p>Now that we’ve manipulated the data in a way that prepares us for more formal analyses, identified which department contains the most contaminated surfaces, and compared the portion of measurements in each department that passed or failed the ATP swab test, we can display descriptive statistics to get an idea of how mean or median bacteria levels differed or varied across surfaces and across departments.</p>
<p>Using the <strong>ATP Stacked.MTW </strong>worksheet, navigate to <strong>Stat > Basic Statistics > Display Descriptive Statistics</strong>. Enter ATP as the <strong>Variable, </strong>Department as the <strong>By variable,</strong> and click <strong>OK. </strong>Press Ctrl + E to re-enter the <strong>Display Descriptive Statistics</strong> dialog, and replace Department with Surface as the <strong>By variable</strong>. Click <strong>OK. </strong>The following output displays in Minitab’s Session Window.</p>
<p style="margin-left: 40px;"><img alt="Health care data descriptive statistics" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/22c23a08935da3670567f419c031d235/health_care_data_2_5.png" style="width: 555px; height: 117px;" /></p>
<p style="margin-left: 40px;"><img alt="Health care data swab tests descriptive statistics" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6f3e2ffedb14dd05c44f62d47d1c25b9/health_care_data_2_6.png" style="width: 560px; height: 149px;" /></p>
<p>The descriptive statistics reveal helpful information:</p>
<ul>
<li>These statistics allow for easy comparison of mean and median ATP measurements as well as the variation of ATP measurements, either by department or by surface.<br />
</li>
<li>Notice that mean ATP measurements are much higher than median ATP measurements for both sets of descriptive statistics. This is because the data are right-skewed. Certain analyses that assume you have normally distributed data—such as t-tests to compare means—might not be the best tool to formally analyze this data. Comparing medians might offer more insight.<br />
</li>
<li>Both sets of descriptive statistics highlight which departments and surfaces to focus on for investigation and process improvement efforts. For instance, department 4 has the highest median ATP presence, while Bed Rails, Phone, and Call Button—the touch points closest to a sick patient in a hospital bed—appear to be the most problematic surfaces to sanitize. Process improvement efforts can begin with this information.</li>
</ul>
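<p>The comparison of means and medians by group can also be sketched in Python. The data below are hypothetical and deliberately include one extreme value to show how a right-skewed group pulls the mean well above the median. The <code>describe_by</code> helper is likewise an assumption made for illustration, not a Minitab function.</p>

```python
import statistics
from collections import defaultdict


def describe_by(rows, by, value="ATP"):
    """Return count, mean, and median of `value` for each level of `by`."""
    groups = defaultdict(list)
    for row in rows:
        if row[value] is not None:  # skip missing measurements
            groups[row[by]].append(row[value])
    return {
        level: {
            "n": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
        }
        for level, values in groups.items()
    }


# Hypothetical measurements, invented for illustration only.
rows = [
    {"Department": "1", "ATP": 80},
    {"Department": "1", "ATP": 120},
    {"Department": "1", "ATP": 200},
    {"Department": "4", "ATP": 150},
    {"Department": "4", "ATP": 300},
    {"Department": "4", "ATP": 420},
    {"Department": "4", "ATP": 2500},  # one extreme outlier
    {"Department": "4", "ATP": None},  # missing measurement
]

summary = describe_by(rows, by="Department")
for level, stats in sorted(summary.items()):
    print(level, stats)
```

<p>In the hypothetical department 4, the single outlier drags the mean far above the median, which is the pattern the second bullet above attributes to right-skewed data.</p>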
What Else Can You Do with Your Data?
<p>What you’ve seen in this two-part blog post is just the beginning. But consider how much of this initial exploration is <em>actionable!</em> By having this foundation for visualizing and manipulating your data, you’ll be well on your way to investigating and testing root causes, and more efficiently performing analyses that yield trustworthy results.</p>
<p>If you’re interested in how other healthcare organizations use Minitab for quality improvement, check out our <a href="http://www.minitab.com/en-us/company/case-studies/?Industry=healthcare&Product=mss">case studies</a>.</p>
Health Care Quality Improvement
Tue, 03 May 2016 12:00:00 +0000
http://blog.minitab.com/blog/meredith-griffith/exploring-healthcare-data-part-2
Meredith Griffith
Exploring Healthcare Data, Part 1
http://blog.minitab.com/blog/meredith-griffith/exploring-healthcare-data-part-1
<p>Working with healthcare-related data often feels different than working with manufacturing data. After all, the common thread among healthcare quality improvement professionals is the motivation to preserve and improve the lives of patients. Whether collecting data on the number of patient falls, patient length-of-stay, bed unavailability, wait times, hospital-acquired infections, or readmissions, human lives are at stake. And so collecting and analyzing data, and trusting your results, in a healthcare setting feels even more critical.</p>
<p><img alt="ATP test" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e6370178caaa6532c6332070281a2b87/atp_test.jpg" style="margin: 10px 15px; float: right; width: 300px; height: 299px; border-width: 1px; border-style: solid;" />Because delivering quality care efficiently is of utmost importance in the healthcare industry, understanding your process, collecting data around that process, and knowing what analysis to perform are key. Using data to drive decisions in your organization sharpens your awareness of that process and of opportunities to improve patient care and cut costs, resulting in better business and better care.</p>
<p>So, in the interest of using data to draw insights and make decisions that have positive impacts, I’d like to offer several tips for exploring and visualizing your healthcare data in a way that will prepare you for a formal analysis. For instance, graphing your data and examining descriptive statistics such as means and medians can tell you a lot about <span><a href="http://blog.minitab.com/blog/michelle-paret/3-things-a-histogram-can-tell-you">how your data are distributed</a></span> and can help you visualize relationships between variables. These preliminary explorations can also reveal unusual observations in your data that should be investigated before you perform a more sophisticated statistical analysis, allowing you to take action quickly when a process, outcome, or adverse event needs attention.</p>
<p>In the first part of this series, I’ll offer two tips on exploring and visualizing data with graphs, brushing, and conditional formatting. In part 2, I’ll offer three more tips focusing on data manipulation and obtaining descriptive statistics.</p>
<p>If you’d like to follow along, you can <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b7c8abc2ba12bed0e1aa6e69d8e22d06/atp.MPJ">download and explore the data</a> yourself! If you don’t yet have Minitab 17, you can download the <a href="http://www.minitab.com/products/minitab/free-trial/">free, 30-day trial</a>.</p>
A Case Study: Ensuring Sound Sanitization Procedures
<p>Let’s look at a case study where a hospital was seeking to examine—and ultimately improve—their room cleaning procedures.</p>
<p>The presence of adenosine triphosphate (ATP) on a surface indicates biological contamination, such as bacteria. Hospitals can use ATP detection systems to check the effectiveness of their sanitization efforts and identify improvement opportunities.</p>
<p>Staff at your hospital used ATP swab tests to test 8 surfaces in 10 different hospital rooms across 5 departments, and recorded the results in a data sheet. ATP measurements below 400 units ‘pass’ the swab test, while measurements greater than or equal to 400 units ‘fail’ the swab test and require further investigation.</p>
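<p>For readers who want to reproduce the pass/fail rule outside Minitab, here is a minimal Python sketch. The 400-unit cutoff comes from the case study; the function name and the sample readings are hypothetical and are not taken from the downloadable data.</p>

```python
# Cutoff from the case study: readings below 400 pass, 400 or above fail.
ATP_FAIL_THRESHOLD = 400


def swab_result(atp_units: float) -> str:
    """Classify one ATP reading as 'pass' or 'fail'."""
    return "pass" if atp_units < ATP_FAIL_THRESHOLD else "fail"


# Hypothetical readings, for illustration only (not from atp.MPJ).
sample_readings = {"Door Knob": 212, "Phone": 400, "Bed Rails": 2350}
results = {surface: swab_result(value) for surface, value in sample_readings.items()}
print(results)
```

<p>Note that a reading of exactly 400 fails, because the rule is "greater than or equal to 400."</p>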
<p>Here is a screenshot of part of the worksheet:</p>
<p style="margin-left: 40px;"><img alt="health care data" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/5c0c3a39bc2399d80b2bd2992107a724/health_care_data_1.png" style="width: 562px; height: 159px;" /></p>
Tip #1: Evaluate the shape of your data
<p>You can use a histogram to graph all eight surfaces that were tested in separate panels of the same graph. This helps you observe and compare the distribution of data across each touch point.</p>
<p>If you’ve downloaded the <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b7c8abc2ba12bed0e1aa6e69d8e22d06/atp.MPJ">data</a>, you can use the <strong>ATP Unstacked.MTW</strong> worksheet to create this same histogram by navigating to <strong>Graph > Histogram > Simple</strong>. In the <strong>Graph Variables </strong>window, select Door Knob, Light Switch, Bed Rails, Call Button, Phone, Bedside Table, Chair, and IV Pole. Click on the <strong>Multiple Graphs </strong>subdialog and select <em>In separate panels of the same graph </em>under <strong>Show Graph Variables. </strong>Click <strong>OK</strong> through all dialogs.</p>
<p style="margin-left: 40px;"><img alt="health care data - histogram" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f85376e8b3b4d8338fb795107b385324/health_care_data_2_histogram.png" style="width: 432px; height: 288px;" /> </p>
<p>These histograms reveal that:</p>
<ul>
<li>For all test areas, the distribution is asymmetrical with some extreme outliers.</li>
<li>Data are all right-skewed.</li>
<li>Data do not appear to be normally distributed.</li>
</ul>
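<p>The right skew seen in the histograms can also be checked numerically. The sketch below computes a simple moment-based skewness in Python; the readings are hypothetical stand-ins for one surface, and statistical packages often apply a small-sample adjustment, so the value may not match a software report exactly.</p>

```python
from statistics import fmean


def skewness(values):
    """Moment-based sample skewness; a positive result suggests right skew."""
    n = len(values)
    center = fmean(values)
    m2 = sum((x - center) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - center) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical ATP readings for one surface across 10 rooms (illustration only).
phone = [120, 150, 180, 210, 260, 310, 390, 450, 900, 2600]
print(round(skewness(phone), 2))
```

<p>A clearly positive value is consistent with a long right tail; a value near zero would suggest a roughly symmetric distribution.</p>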
Tip #2: Identify and investigate outliers
<p>An individual value plot can be used to graph the ATP measurements collected across all eight surfaces. Identifying the outliers is quite easy with this plot.</p>
<p>And again, you can use the <strong>ATP Unstacked.MTW</strong> worksheet to create an individual value plot that looks just like mine. Navigate to <strong>Graph > Individual Value Plot > Multiple Y’s > Simple</strong>, and choose Door Knob, Light Switch, Bed Rails, Call Button, Phone, Bedside Table, Chair, and IV Pole as <strong>Graph variables</strong>. Click <strong>OK</strong>.</p>
<p style="margin-left: 40px;"><img alt="health care data - individual value plot" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/061e038dd122755aa672fc1554156b38/health_care_data_2_individual_value_plot.png" style="width: 432px; height: 288px;" /></p>
<p>This individual value plot reveals that:</p>
<ul>
<li>Extreme outliers are present for ATP measurements on Bed Rails, Call Button, Phone, and Bedside Table.</li>
<li>These extreme values are influencing the mean ATP measured for each surface.</li>
<li>It may be more helpful to analyze differences in medians since the means are skewed by these outliers (judging by the histogram and individual value plot).</li>
</ul>
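<p>To see why the median is the more robust summary here, consider this small Python sketch. The readings are hypothetical, chosen so that a single extreme value sits above an otherwise ordinary set of measurements.</p>

```python
from statistics import mean, median

# Hypothetical ATP readings for one surface (illustration only).
bed_rails = [150, 180, 220, 240, 260, 300, 330, 380, 420, 3100]

# The same readings with the single extreme value (above 2000) set aside.
without_extreme = [x for x in bed_rails if x < 2000]

print("mean:  ", mean(bed_rails), "->", round(mean(without_extreme), 1))
print("median:", median(bed_rails), "->", median(without_extreme))
```

<p>Setting aside one reading moves the mean by several hundred units but shifts the median only slightly, which is the behavior the individual value plot hints at.</p>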
<p>Once the outliers are identified, you can investigate them with Minitab’s brushing tool to uncover more insights. Right-click anywhere in the individual value plot and select <strong>Brush</strong>. Setting ID variables also helps to reveal information about other variables associated with these outliers. To do this, right-click in the graph again and select <strong>Set ID Variables. </strong>Enter Room as the <strong>Variable</strong> and click <strong>OK. </strong>Then click and drag the cursor to form a rectangle around the outliers as shown below.</p>
<p style="margin-left: 40px;"><img alt="health care data - brushing" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a4c1a4168f9e0b8dd06a54d4d04346e5/health_care_data_4_brushing.png" style="width: 489px; height: 240px;" /></p>
<p>Brushing can provide actionable insights:</p>
<ul>
<li>Brushing the extreme outliers on the individual value plot and setting ID variables reveals the room numbers associated with high ATP measurements.</li>
<li>Quickly identifying rooms where surfaces have high levels of ATP enables faster follow-up and investigation on specific surfaces in specific rooms.</li>
</ul>
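<p>Brushing with an ID variable amounts to looking up which rows own the extreme points. A rough Python equivalent is sketched below; the room labels, the values, and the <code>rooms_above</code> helper are all hypothetical, and the rows only mimic the shape of the unstacked worksheet.</p>

```python
# Hypothetical rows shaped like the unstacked worksheet: one row per room,
# one column per surface. Room labels and values are made up for illustration.
rows = [
    {"Room": "A101", "Phone": 310, "Bed Rails": 240},
    {"Room": "B204", "Phone": 2600, "Bed Rails": 180},
    {"Room": "C317", "Phone": 150, "Bed Rails": 3100},
]


def rooms_above(rows, cutoff, id_column="Room"):
    """Return (room, surface, value) for every reading above the cutoff."""
    hits = []
    for row in rows:
        for column, value in row.items():
            if column != id_column and value > cutoff:
                hits.append((row[id_column], column, value))
    return hits


flagged = rooms_above(rows, cutoff=2000)
for room, surface, value in flagged:
    print(f"{room}: {surface} = {value}")
```

<p>The output pairs each extreme reading with its room and surface, which is the same information the brushing palette displays.</p>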
<p>Finally, you can use conditional formatting and other cell properties to investigate and make notes about the outliers. To look at outliers across all surfaces tested, highlight columns C2 through C9, right-click in the worksheet, and select <strong>Conditional Formatting > Statistical > Outlier</strong>. Alternatively, you can highlight only the extreme outliers by right-clicking in the worksheet, selecting <strong>Conditional Formatting > Highlight Cell > Greater Than</strong>, and entering 2000 (a cutoff that, judging by the individual value plot, all of the extreme outliers exceed).</p>
<p>To make notes about individual outliers, right-click on the cell containing the extreme value, select <strong>Cell Properties > Comment</strong>, and enter your cell comment.</p>
<p style="margin-left: 40px;"><img alt="health care data - conditional formatting" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/4fc3e4d262dac484ef45902176ccb4e8/health_care_data_5_conditional_formatting.png" style="width: 450px; height: 384px; border-width: 1px; border-style: solid;" /></p>
<p>Conditional formats and cell properties offer:</p>
<ul>
<li>Quick insight into surfaces and rooms with high ATP measurements.</li>
<li>More efficient investigation of problem areas in order to make process improvements.</li>
</ul>
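<p>If you want to flag outliers programmatically, one common approach is the boxplot rule: mark any value more than 1.5 times the interquartile range beyond the quartiles. The Python sketch below assumes that rule; whether it matches Minitab’s conditional formatting exactly is an assumption, and the readings are hypothetical.</p>

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' quartile method
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [x for x in values if x < low or x > high]


# Hypothetical ATP readings for one surface (illustration only).
bed_rails = [150, 180, 220, 240, 260, 300, 330, 380, 420, 3100]
print(iqr_outliers(bed_rails))
```

<p>Unlike a fixed cutoff such as 2000, this rule adapts to each surface’s own spread, so it can flag unusual readings on surfaces whose typical values are low.</p>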
Visualizations that Lead to Actionable Insights
<p>By exploring and visualizing your data in these preliminary ways, you can see how easy it is to draw initial conclusions before even doing a formal analysis. The data are not normally distributed but are highly skewed by several extreme outliers, which greatly influence the mean ATP measurement recorded for each surface. The histogram suggests that comparing medians instead of means may be a more effective way to determine whether statistically significant differences exist across surfaces. Investigating the outliers both graphically and in the worksheet offers further evidence that analyzing differences in median measurements will be most effective. It is also clear that bed rails, call buttons, phones, and bedside tables are highly contaminated surfaces. One might surmise that this is because these touch points are close to sick patients, who come into contact with them frequently.</p>
<p>You can use these insights to focus your initial process improvement efforts on the most problematic touch points and hospital rooms. In part 2 of this blog post, I’ll share some <a href="http://blog.minitab.com/blog/meredith-griffith/exploring-healthcare-data-part-2">tips for manipulating data, extracting even more information from the data, and displaying descriptive statistics</a> about contamination levels.</p>
Health Care Quality Improvement | Mon, 02 May 2016 12:00:00 +0000 | http://blog.minitab.com/blog/meredith-griffith/exploring-healthcare-data-part-1 | Meredith Griffith