Data Analysis Software | Minitab
Blog posts and articles with tips for using statistical software to analyze data for quality improvement.
http://blog.minitab.com/blog/data-analysis-software/rss
Mon, 27 Jun 2016 09:18:26 +0000
How to Identify Outliers (and Get Rid of Them)
http://blog.minitab.com/blog/michelle-paret/how-to-identify-outliers-and-get-rid-of-them
<p><img alt="an outlier among falcon tubes" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/389155acc918fbd094941685c31a33b8/falcontubes.jpg" style="width: 250px; height: 188px; margin: 10px 15px; border-width: 1px; border-style: solid; float: right;" />An outlier is an observation in a data set that lies a substantial distance from other observations. These unusual observations can have a disproportionate effect on statistical analysis, <a href="http://blog.minitab.com/blog/michelle-paret/using-the-mean-its-not-always-a-slam-dunk">such as the mean</a>, which can lead to misleading results. Outliers can provide useful information about your data or process, so it's important to investigate them. Of course, you have to find them first. </p>
<p>Finding outliers in a data set is easy <span style="line-height: 20.8px;">using </span><a href="http://www.minitab.com/products/minitab/" style="line-height: 20.8px;">Minitab Statistical Software</a><span style="line-height: 1.6;">, and there are a few ways to go about it. </span></p>
Finding Outliers in a Graph
<p><span style="line-height: 1.6;">If you want to identify them graphically and visualize where your outliers are located compared to the rest of your data, you can use </span><strong style="line-height: 1.6;">Graph > Boxplot</strong><span style="line-height: 1.6;">.</span></p>
<p style="margin-left: 40px;"><img alt="Boxplot" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/652993cfab104ddfe0076fd52ab0f5fd/boxplot_of_strength.jpg" style="width: 600px; height: 394px;" /></p>
<p>This boxplot shows a few outliers, each marked with an asterisk. Boxplots are certainly one of the most common ways to visually identify outliers, but there are <a href="http://blog.minitab.com/blog/fun-with-statistics/visualizing-the-greatest-olympic-outlier-of-all-time">other graphs, such as scatterplots and individual value plots</a>, to consider as well.</p>
Finding Outliers in a Worksheet
<p>To highlight outliers directly in the worksheet, you can right-click on your column of data and choose <strong>Conditional Formatting > Statistical > Outlier</strong>. Each outlier in your worksheet will then be highlighted in red, or whatever color you choose.</p>
<p style="margin-left: 40px;"><img alt="Conditional Formatting Menu in Minitab" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/a898ff565350c40dcbad28bf6d878f82/conditionalformattingmenu.jpg" style="width: 549px; height: 145px;" /></p>
Removing Outliers
<p>If you then want to create a new data set that excludes these outliers, that’s easy to do too. Now I’m not suggesting that removing outliers should be done without thoughtful consideration. After all, they may have a story – perhaps a very important story – to tell. However, for those situations where removing outliers is worthwhile, you can first highlight outliers per the Conditional Formatting steps above, then right-click on the column again and use <strong>Subset Worksheet > Exclude Rows with Formatted Cells</strong> to create the new data set.</p>
The Math
<p>If you want to know the mathematics used to identify outliers, let's begin by talking about quartiles, which divide a data set into quarters:</p>
<ul>
<li><em>Q</em>1 (the 1st quartile): 25% of the data are <em>less than</em> or equal to this value</li>
<li><em>Q</em>3 (the 3rd quartile): 25% of the data are <em>greater than</em> or equal to this value</li>
<li>IQR (the interquartile range): the distance between <em>Q</em>1 and <em>Q</em>3, that is, <em>Q</em>3 – <em>Q</em>1; it contains the middle 50% of the data</li>
</ul>
<p>Outliers are then defined as any values that fall outside of:</p>
<p style="margin-left: 40px;"><em>Q</em>1 – (1.5 * IQR)</p>
<p style="margin-left: 40px;">or</p>
<p style="margin-left: 40px;"><em>Q</em>3 + (1.5 * IQR)</p>
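<p>For readers who want to check the arithmetic outside Minitab, here is a minimal Python sketch of the 1.5 * IQR rule described above. The function names and the sample values are made up for illustration, and quartile conventions differ between packages, so the fences may not match Minitab's output exactly.</p>

```python
from statistics import quantiles


def iqr_fences(values, k=1.5):
    """Return the (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # quantiles() with n=4 returns Q1, the median, and Q3.
    # The default method ('exclusive') is one of several quartile conventions.
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return the values that fall below the lower fence or above the upper fence."""
    lower, upper = iqr_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Made-up measurements, not data from this post.
strength = [10, 12, 12, 13, 12, 11, 14, 13, 15, 10, 50]

fences = iqr_fences(strength)
outliers = find_outliers(strength)
print("fences:", fences)
print("outliers:", outliers)
```

<p>Keeping the fences separate from the filtering step makes it easy to report the cutoffs alongside the flagged values.</p>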
<p><span style="line-height: 1.6;">Of course, rather than doing this by hand, you can leave the heavy-lifting up to Minitab and instead focus on what your data are telling you.</span></p>
<p>Don't see these features in your version of Minitab? Choose <strong>Help > Check for Updates </strong>to see if you're using Minitab 17.3.</p>
Data Analysis, Learning, Statistics, Statistics Help, Stats
Wed, 22 Jun 2016 15:00:00 +0000
http://blog.minitab.com/blog/michelle-paret/how-to-identify-outliers-and-get-rid-of-them
Michelle Paret
Using Multivariate Statistical Tools to Analyze Customer and Survey Data
http://blog.minitab.com/blog/applying-statistics-in-quality-projects/using-multivariate-statistical-tools-to-analyze-customer-and-survey-data
<p>Businesses are getting more and more data from existing and potential customers: whenever we click on a web site, for example, it can be recorded in the vendor's database. And whenever we use electronic ID cards to access public transportation or other services, our movements across the city may be analyzed.</p>
<p>In the very near future, connected objects such as cars and electrical appliances will continuously generate data that will provide useful insights regarding user preferences, personal habits, and more. Companies will learn a lot from users and the way their products are being used. This learning process will help them focus on particular niches and improve their products according to customer expectations and profiles.</p>
<p>For example, insurance companies will monitor how motorists are driving connected cars, to adjust insurance premiums according to perceived risks, or to analyze driving behaviors so they can advise motorists how to boost fuel efficiency. No formal survey will be needed, because customers will be continuously surveyed.</p>
<p>Let's look at some statistical tools we can use to create and analyze user profiles, map expectations, study which expectations are related, and so on. I will focus on multivariate tools, which are very efficient methods for analyzing surveys and taking into account a large number of variables. My objective is to provide a very high-level, general overview of the statistical tools that may be used to analyze such survey data.</p>
A Simple Example of Multivariate Analysis
<p>Let us start with a very simple example. The table below presents data some customers have shared about their enjoyment of specific types of food:</p>
<p style="margin-left: 40px;"><img height="134" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/File/386a610e5bb77c8aa7a5adf8ba5adf03/386a610e5bb77c8aa7a5adf8ba5adf03.png" width="532" /></p>
<p>A simple look at the table does not really help us understand preferences. Instead, we can use Simple Correspondence Analysis, a multivariate statistical tool, to visually display expectations.</p>
<p>In Minitab, go to <strong>Stat > Multivariate > Simple Correspondence Analysis...</strong> and enter your data as shown in the dialogue box below. (Also click on "Graphs" and check the box labeled "Symmetric plot showing rows and columns.")</p>
<p style="margin-left: 40px;"><img height="347" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/File/3d7fdd8b981b91398acfbffc1d02f1e4/3d7fdd8b981b91398acfbffc1d02f1e4.png" width="451" /></p>
<p>Minitab creates the following plot: </p>
<p style="margin-left: 40px;"><img height="384" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/File/9e75de185fc35b03062c8f87492d3246/9e75de185fc35b03062c8f87492d3246.png" width="576" /></p>
<p>Looking at the plot, we quickly see that Vegetables tend to be associated with “Disagree” (positioned close to each other in the graph) and Ice cream is positioned close to “Neutral” (they are related to each other). As for Meat and Potatoes, the panel tends either to “Agree” or “Strongly agree.”</p>
<p>We now have a much better understanding of the preferences of our panel, because we know what they tend to like and dislike.</p>
Selecting the Right Type of Tool to Analyze Survey Data
<p>Many multivariate tools are available, so how can you choose the right one to analyze your survey data?</p>
<p>The decision tree below shows which method you might choose according to your objectives and the <a href="http://blog.minitab.com/blog/understanding-statistics/understanding-qualitative-quantitative-attribute-discrete-and-continuous-data-types">type of data you have</a>. For example, we selected correspondence analysis in the<span style="line-height: 1.6;"> </span><span style="line-height: 20.8px;">previous</span><span style="line-height: 20.8px;"> </span><span style="line-height: 1.6;">example because all our variables were categorical, or qualitative in nature.</span></p>
<p style="margin-left: 40px;"><img alt="multivariate diagram 1" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b6150beff1fbc04623fcccdadc0faac4/multivariate_1.gif" style="line-height: 20.8px; width: 624px; height: 464px;" /></p>
<p> </p>
Categorical Data and Prediction of Group Membership (Right Branch)
<p><strong>Clustering</strong><br />
If you have some numerical (or continuous) data and you want to understand how your customers might be grouped / aggregated (from a statistical point of view) into several homogeneous groups, you can use clustering techniques. This could be helpful to define profiles and user groups.</p>
<p><strong>Discriminant Analysis or Logistic Regression (Scoring)</strong><br />
If your individuals already belong to different groups and you want to understand which variables are important to define an existing user group, or predict group membership for new individuals, you can use discriminant analysis, or binary logistic regression (if you only have two groups).</p>
<p><strong>Correspondence Analysis </strong><br />
<span style="line-height: 1.6;">As we saw in the first example, correspondence analysis lets us study relationships between variables that are categorical / qualitative.</span></p>
Numeric or Continuous Data Analysis (Left Branch)
<p><strong>Principal Component Analysis or Factor Analysis</strong><br />
I<span style="line-height: 1.6;">f all your variables are numeric, you can use principal components analysis to understand how variables are related to one another. Factor analysis may be useful to identify an underlying, unknown factor associated with your variables.</span></p>
<p><strong>Item Analysis</strong><br />
This tool was specifically created for survey analysis. Do the items of a survey evaluate similar characteristics? Which items differ from the remaining questions? The objective is to assess the internal consistency of a survey. </p>
<p>These multivariate analyses <em>are </em>computationally intensive, but performing them in Minitab is very user-friendly, and the software produces easy-to-understand graphs (as in the food preference example above).</p>
A Closer Look at Some Specific Multivariate Tools
<p>Let's take a closer look at the tools for numerical survey data analysis. The graph below shows the tools that are available to you and their objectives in each case. These methods are often used to group numeric variables according to similarity. They may also be useful in studying how individuals are positioned according to the main groups of variables, in order to identify user profiles.</p>
<p style="margin-left: 40px;"> <img alt="multivariate diagram 2" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/283b5721982a2c167236a120799ede2c/multivariate_2.gif" style="width: 624px; height: 438px;" /></p>
<p>And now let's look a bit more closely at the tools we can use for analyzing categorical survey data. Again, the diagram below shows the tools that are available to you and their objectives. Many of these tools can be used to study how numeric variables relate to qualitative categories.</p>
<p style="margin-left: 40px;"><img height="430" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/File/58cd6feb678c0f5d2232c304b0173391/58cd6feb678c0f5d2232c304b0173391.png" width="624" /></p>
Conclusion
<p>This is a very general overview of multivariate tools for survey analysis. If you want to go deeper and learn more about these techniques, you can find some resources on the <a href="http://support.minitab.com/minitab/17/topic-library/modeling-statistics/multivariate/basics/multivariate-analyses-in-minitab/">Minitab web site</a>, in the Help menu in Minitab's statistical software, or you can contact <a href="http://www.minitab.com/support/">our technical support team</a>. </p>
Data Analysis, Insights, Learning, Statistics, Statistics Help, Stats
Wed, 15 Jun 2016 12:00:00 +0000
http://blog.minitab.com/blog/applying-statistics-in-quality-projects/using-multivariate-statistical-tools-to-analyze-customer-and-survey-data
Bruno Scibilia
Poisson Data: Examining the Number of Deaths in an Episode of Game of Thrones
http://blog.minitab.com/blog/the-statistics-game/poisson-data-examining-the-number-deaths-in-an-episode-of-game-of-thrones
<p><img alt="Game of Thrones" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/d11b4341996f340e24132eb12253d8e5/game_of_thrones.jpg" style="float: right; width: 250px; height: 141px; margin: 10px 15px; border-width: 1px; border-style: solid;" />There may not be a situation more perilous than being a character on <a href="http://www.hbo.com/game-of-thrones" target="_blank"><em>Game of Thrones</em></a>. Warden of the North, Hand of the King, and apparent protagonist of the entire series? Off with your head before the end of the first season! Last male heir of a royal bloodline? Here, have a pot of molten gold poured on your head! Invited to a wedding? Well, you probably know what happens at weddings in the show. </p>
<p>So what do all these gruesome deaths have to do with statistics? They are data that come from a <a href="http://blog.minitab.com/blog/fun-with-statistics/poisson-processes-and-probability-of-poop">Poisson distribution</a>.</p>
<p>Data from a Poisson distribution describe the number of times an event occurs in a finite observation space. For example, a Poisson distribution can describe the number of defects in the mechanical system of an airplane, the number of calls to a call center, or, in our case, the number of deaths in an episode of <em>Game of Thrones</em>.</p>
Goodness-of-Fit Test for Poisson
<p>If you're not certain whether your data follow a Poisson distribution, you can use <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">Minitab Statistical Software</a> to perform a goodness-of-fit test. If you don't already use Minitab and you'd like to follow along with this analysis, download the <a href="http://www.minitab.com/products/minitab/free-trial/">free 30-day trial</a>.</p>
<p>I collected the <a href="http://genius.com/Game-of-thrones-list-of-game-of-thrones-deaths-annotated" target="_blank">number of deaths for each episode</a> of Game of Thrones (as of this writing, 57 episodes have aired), and put them in a Minitab worksheet. Then I went to <strong>Stat > Basic Statistics > Goodness-of-Fit Test for Poisson </strong>to determine whether the data follow a Poisson distribution. You can get the data I used <a href="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f73acb13fa520a25583149f8b780a31c/game_of_thrones_deaths.mtw">here</a>. </p>
<p style="margin-left: 40px;"><img alt="Goodness-of-Fit Test for Poisson Distribution " src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/0c9dcb9ecb6eb644109d86e3501143b3/gof_test_poisson.jpg" style="width: 492px; height: 417px;" /></p>
<p>Before we interpret the p-value, we see that we have a problem. Three of the categories have an expected value less than 5. If the expected value for any category is less than 5, the results of the test may not be valid. To fix our problem, we can combine categories to achieve the minimum expected count. In fact, we see that Minitab actually already started doing this by combining all episodes with 7 or more deaths.</p>
<p>So we'll just continue by making the highest category 6 or more deaths, and the lowest category 1 or 0 deaths. To do this, I created a new column with the categories 1, 2, 3, 4, 5 and 6. Then I made a frequency column that contained the number of occurrences for each category. For example, the "1" category is a combination of episodes with 0 deaths and 1 death, so there were 15 occurrences. Then I ran the analysis again with the new categories.</p>
<p style="margin-left: 40px;"><img alt="Goodness-of-Fit Test for Poisson Distribution " src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/93551e38ce5c4cc5321c249fee184e24/gof_test_poisson_2.jpg" style="width: 420px; height: 323px;" /></p>
<p>Now that all of our categories have expected counts greater than 5, we can examine the p-value. If the p-value is less than the significance level (usually 0.05 works well), you can conclude that the data do not follow a Poisson distribution. But in this case the p-value is 0.228, which is greater than 0.05. Therefore, we cannot conclude that the data do not follow the Poisson distribution, and can continue with analyses that assume the data follow a Poisson distribution. </p>
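<p>The expected-count check behind this test can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Minitab's implementation: it uses the rounded rate of 3.2 deaths per episode reported later in this post (the test itself estimates the mean from the raw data), so the counts will not match the Minitab output exactly. The function names are made up for the example.</p>

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k events for a Poisson distribution with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)


def expected_counts(lam, n_obs, top):
    """Expected counts for categories 0, 1, ..., top-1 plus a final 'top or more'."""
    probs = [poisson_pmf(k, lam) for k in range(top)]
    probs.append(1.0 - sum(probs))  # everything at or above 'top'
    return [n_obs * p for p in probs]


LAM = 3.2          # assumption: rounded rate of deaths per episode
N_EPISODES = 57    # number of episodes stated in the post

# Categories 0, 1, ..., 6 and "7 or more", as in the first test.
original = expected_counts(LAM, N_EPISODES, top=7)
too_small = [round(e, 2) for e in original if e < 5]

# Combine "0" with "1" and "6" with "7 or more", as described above.
combined = (
    [original[0] + original[1]]
    + original[2:6]
    + [original[6] + original[7]]
)

print("expected counts below 5 before combining:", too_small)
print("expected counts after combining:", [round(e, 2) for e in combined])
```

<p>Combining categories only regroups the expected counts, so their total still equals the number of episodes.</p>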
Confidence Interval for 1-Sample Poisson Rate
<p>When you have data that come from a Poisson distribution, you can use <strong>Stat > Basic Statistics > 1-Sample Poisson Rate</strong> to get a rate of occurrence and calculate a range of values that is likely to include the population rate of occurrence. We'll perform the analysis on our data.</p>
<p style="margin-left: 40px;"><img alt="1-Sample Poisson Rate" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/259b9b0cb11fed7e5b7467703f7037ad/1_poisson_rate.jpg" style="width: 489px; height: 133px;" /></p>
<p>The rate of occurrence tells us that on average there are about 3.2 deaths per episode on <em>Game of Thrones</em>. If our 57 episodes were a sample from a much larger population of <em>Game of Thrones</em> episodes, the confidence interval would tell us that we can be 95% confident that the population rate of deaths per episode is between 2.8 and 3.7.</p>
<p>The length of observation lets you specify a value to represent the rate of occurrence in a more useful form. For example, suppose instead of deaths per episode, you want to determine the number of deaths per season. There are 10 episodes per season. So because an individual episode represents 1/10 of a season, 0.1 is the value we will use for the length of observation. </p>
<p style="margin-left: 40px;"><img alt="1-Sample Poisson Rate" src="https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/b6fa9d2e740aacc86d4223ea75487d95/1_poisson_rate_season.jpg" style="width: 495px; height: 106px;" /></p>
<p>With a different length of observation, we see that there are about 32 deaths per season with a confidence interval ranging from 28 to 37.</p>
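<p>To see how the length of observation rescales the results, here is a rough Python sketch. It assumes a simple normal approximation for the interval, which is not necessarily the method Minitab uses, so the limits differ slightly from the output shown above. The inputs are the rounded rate of 3.2 and the 57 episodes mentioned in this post; the function names are made up.</p>

```python
from math import sqrt
from statistics import NormalDist


def poisson_rate_ci(rate, n_units, confidence=0.95):
    """Approximate CI for a Poisson rate observed over n_units (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sqrt(rate / n_units)  # variance of the estimated rate is rate / n
    return rate - half_width, rate + half_width


def rescale(value, length_of_observation):
    """Convert a per-unit value to a new unit, e.g. 0.1 when one episode is 1/10 of a season."""
    return value / length_of_observation


rate_per_episode = 3.2  # rounded value from the post
lower, upper = poisson_rate_ci(rate_per_episode, n_units=57)

rate_per_season = rescale(rate_per_episode, 0.1)
season_ci = (rescale(lower, 0.1), rescale(upper, 0.1))

print(f"per episode: {rate_per_episode:.1f} ({lower:.2f}, {upper:.2f})")
print(f"per season:  {rate_per_season:.0f} ({season_ci[0]:.1f}, {season_ci[1]:.1f})")
```

<p>The rate and both interval limits are divided by the same length of observation, which is why the per-season figures are simply ten times the per-episode ones.</p>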
Poisson Regression
<p>The last thing we'll do with our Poisson data is perform a regression analysis. In Minitab, go to <strong>Stat > Regression > Poisson Regression > Fit Poisson Model</strong> to perform a Poisson regression analysis. We'll look at whether we can use the episode number (1 through 10) to predict how many deaths there will be in that episode.</p>
<p style="margin-left: 40px;"><img alt="Poisson Regression" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/0540d6716d13c4de50421155038b2c03/poisson_regression.jpg" style="width: 402px; height: 238px;" /></p>
<p>The first thing we'll look at is the p-value for the predictor (episode). The p-value is 0.042, which is less than 0.05, so we can conclude that there is a statistically significant association between the episode number and the number of deaths. However, the Deviance R-Squared value is only 18.14%, which means that the episode number explains only 18.14% of the variation in the number of deaths per episode. So while an association exists, it's not very strong. Even so, we can use the coefficients to determine how the episode number affects the number of deaths. </p>
<p style="margin-left: 40px;"><img alt="Poisson Regression" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/adb7514fd7892c3b8591895321c96918/poisson_regression_2.jpg" style="width: 241px; height: 227px;" /></p>
<p>The episode number was entered as a categorical variable, so the coefficients show how each episode number affects the number of deaths relative to episode number 1. A positive coefficient indicates that episode number is likely to have more deaths than episode 1. A negative coefficient indicates that episode number is likely to have fewer deaths than episode 1.</p>
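<p>One way to read such coefficients is as rate ratios. The sketch below assumes the model uses a natural log link (the usual choice for Poisson regression), in which case exponentiating a coefficient gives the multiplicative change in the expected number of deaths relative to episode 1. The coefficient values are hypothetical and are not taken from the output above.</p>

```python
from math import exp


def rate_ratio(coefficient):
    """Expected count relative to the reference level, assuming a natural log link."""
    return exp(coefficient)


# Hypothetical coefficients, for illustration only.
example_coefficients = {"episode 2": -0.20, "episode 9": 0.90}

for level, coef in example_coefficients.items():
    print(f"{level}: about {rate_ratio(coef):.2f} times the expected deaths of episode 1")
```

<p>A coefficient of 0 corresponds to a ratio of 1, meaning no difference from episode 1.</p>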
<p>We see that each season usually starts slow, as 7 of the 9 episode numbers have positive coefficients. Episodes 8, 9, and 10 have the highest coefficients, meaning relative to the first episode of the season they have the greatest number of deaths. So even though our model won't be great at predicting the exact number of deaths for each episode, it's clear that the show ends each season with a bang.</p>
<p>And considering episode 8 of the current season airs this Sunday, if you're a <em>Game of Thrones</em> viewer you should brace yourself, because death is coming. Or, as they would say in Essos:</p>
<p><em>Valar morghulis.</em></p>
Data Analysis, Fun Statistics, Statistics, Statistics in the News
Fri, 10 Jun 2016 12:03:00 +0000
http://blog.minitab.com/blog/the-statistics-game/poisson-data-examining-the-number-deaths-in-an-episode-of-game-of-thrones
Kevin Rudy
A Six Sigma Healthcare Project, part 4: Predicting Patient Participation with Binary Logistic ...
http://blog.minitab.com/blog/understanding-statistics/a-six-sigma-healthcare-project%2C-part-4%3A-predicting-patient-participation-with-binary-logistic-regression
<p>By looking at the data we have about 500 cardiac patients, we've learned that easy access to the hospital and good transportation are key factors influencing participation in a rehabilitation program.</p>
<p><span style="line-height: 20.8px;"><img alt="monitor" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/96f126b828fb05a099854d278cfba6eb/monitor.jpg" style="margin: 10px 15px; float: right; width: 296px; height: 212px;" />Past data shows that each month, about 15 of the patients discharged after cardiac surgery do not have a car. Providing transportation to the hospital might make these patients more likely to join the rehabilitation program, but the costs of such a service </span><span style="line-height: 1.6;">can't exceed the potential revenue from participation. </span></p>
<p><span style="line-height: 1.6;">We can use </span><a href="http://blog.minitab.com/blog/understanding-statistics/a-six-sigma-healthcare-project-part-3-creating-binary-logistic-regression-models-for-patient-participation" style="line-height: 1.6;">the binary logistic regression model developed in part 3</a><span style="line-height: 1.6;"> to predict probabilities of participation, to identify where </span><span style="line-height: 20.8px;">transportation assistance</span><span style="line-height: 1.6;"> might make the biggest impact, and to develop an estimate of how much we could invest in such assistance. </span></p>
<p>Download the <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/521a4efabc5f892eef403e8c4a354f9a/cardiacparticipationdata_1.mtw">data set</a> to follow along and try these analyses yourself. If you don't already have Minitab, you can <a href="http://www.minitab.com/products/minitab/free-trial/">download and use our statistical software free for 30 days</a>.</p>
Using the Regression Model to Predict Patient Participation
<p>We want to develop some estimates of the probability of participation based on whether or not a patient has access to transportation. The first step is to make some mesh data representing our population. In Minitab, go to <strong>Calc > Create Mesh Data...</strong>, and complete the dialog box as shown below. (The maximum and minimum ranges for Age and Distance are drawn directly from the descriptive statistics for the sample data we used to create our regression model.) </p>
<p style="margin-left: 40px;"><img alt="Make Mesh Data Dialog" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b8437a0b42a63b84e2dcdd65281a3eef/make_mesh_data.png" style="width: 484px; height: 340px;" /></p>
<p>When you press OK, Minitab adds 2 new columns to the worksheet that contain the 200 different combinations of the levels of these factors. Now we'll add two additional columns, one representing patients who have access to a car, and one representing those who don't. Now our worksheet should include four columns of data as shown:</p>
<p style="margin-left: 40px;"><img alt="mesh data in worksheet" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/68996051dec970430f44a13696661e02/mesh_data_worksheet.png" style="width: 255px; height: 191px;" /></p>
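<p>Conceptually, mesh data is just every combination of evenly spaced values of two variables. The Python sketch below illustrates the idea. The ranges and the 20 x 10 grid are hypothetical values chosen only so that the result has 200 rows; the actual settings are the ones in the dialog box above.</p>

```python
from itertools import product


def linspace(lo, hi, steps):
    """Return 'steps' evenly spaced values from lo to hi, inclusive."""
    step = (hi - lo) / (steps - 1)
    return [lo + i * step for i in range(steps)]


def make_mesh(x_values, y_values):
    """Return every (x, y) combination, one tuple per worksheet row."""
    return list(product(x_values, y_values))


# Hypothetical ranges and grid sizes, for illustration only.
ages = linspace(30, 87, 20)
distances = linspace(5, 200, 10)

mesh = make_mesh(ages, distances)
print(len(mesh), "rows; first row:", mesh[0])
```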
<p>Now we'll go to <strong>Stat > Regression > Binary Logistic Regression > Predict...</strong> Minitab remembers the last regression model that was run; to make sure it's the right one, click the "View Model..." button...</p>
<p style="margin-left: 40px;"><img alt="view model" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1b1f96f9fcd549add0aca00a958137d1/view_model.png" style="width: 232px; height: 93px;" /></p>
<p>and confirm that the model displayed is the correct one.</p>
<p style="margin-left: 40px;"><img alt="view model" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fb09621305a2dcc0c898c7ac90eaa79d/view_model.png" style="width: 600px; height: 267px;" /></p>
<p>Next, press the "Predict" button and complete the dialog box using the mesh variables we created, as shown. We can also press the "Storage" button to tell Minitab to store the Fits (the predicted probabilities) for each data point in the worksheet. Note that the column selected for the Mobility term is "Car," so all of these predictions will be based on the equation for patients who have access to a vehicle. </p>
<p style="margin-left: 40px;"><img alt="regression prediction dialog" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a7d5fe724e64117e1a26d18e21203d5a/prediction_dialog_1.png" style="line-height: 20.8px; width: 819px; height: 392px;" /></p>
<p>When you click <strong>OK</strong> through all dialogs, Minitab will add a column of data that shows the predicted probability of participation for patients, assuming they have a vehicle. </p>
<p>Now we'll create the predictions for individuals who don't have cars. Press <strong>CTRL-E</strong> to edit the previous dialog box. This time, for the M<span style="line-height: 1.6;">obility column, select "NoCar."</span></p>
<p style="margin-left: 40px;"><img alt="no car" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f7076a96b1cd37b6d8d38e7e7138631c/prediction_dialog_2.png" style="width: 307px; height: 68px;" /></p>
<p>When you press OK, Minitab recalculates the probabilities for the patients, this time using the equation that assumes they do not have a vehicle. The probabilities of participation for each data point are stored in two columns in the worksheet, which I've renamed PFITS-Car and PFITS-No car. </p>
<p style="margin-left: 40px;"><img alt="pfits" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/165b5cbca4f6edb53c10507d49d4438b/pfits_in_worksheet.png" style="width: 404px; height: 306px;" /></p>
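<p>For readers curious about what the prediction step computes, here is a minimal Python sketch of a binary logistic prediction. It assumes a model with only main effects for age, distance, and mobility, and the coefficients are invented placeholders, not the ones from the model built in part 3.</p>

```python
from math import exp

# Invented placeholder coefficients, for illustration only.
COEF = {"const": 2.0, "age": -0.03, "distance": -0.01, "car": 1.5}


def participation_probability(age, distance, has_car, coef=COEF):
    """Predicted probability from a logistic model with main effects only."""
    logit = (
        coef["const"]
        + coef["age"] * age
        + coef["distance"] * distance
        + (coef["car"] if has_car else 0.0)
    )
    return 1.0 / (1.0 + exp(-logit))


# A few hypothetical (age, distance) rows, as in the mesh data.
rows = [(45, 10), (60, 50), (75, 120)]

for age, distance in rows:
    p_car = participation_probability(age, distance, has_car=True)
    p_no_car = participation_probability(age, distance, has_car=False)
    print(f"age={age}, distance={distance}: car={p_car:.3f}, no car={p_no_car:.3f}")
```

<p>Running the same rows through the function twice, once with and once without a car, mirrors the two prediction passes described above.</p>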
Where Can Providing Transportation Make an Impact?
<p>Now we have estimated probabilities of participation for patients with the same age and distance characteristics, both with and without access to a vehicle. It would be helpful to visualize the differences in these probabilities to see where offering transportation might make the biggest impact in increasing participation rates.</p>
<p>First, we'll use Minitab's calculator to compute the difference in probabilities between having and not having a car. Go to <strong>Calc > Calculator...</strong> and complete the dialog as shown: </p>
<p style="margin-left: 40px;"><img alt="calculator" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/5e7a07e8f7f41d8569535006f9d5debd/calculator.png" style="width: 433px; height: 383px;" /></p>
<p>Now we have a column of data named "Car - NoCar" that contains the probability difference for patients with the same age and distance characteristics both with and without a vehicle. We can use that column to create a contour plot that offers additional insight into the relationships between the likelihood of participation in the rehabilitation program and a patient's age, distance, and mobility. Select <strong>Graph > Contour Plot...</strong> and complete the dialog as shown: </p>
<p style="margin-left: 40px;"><img alt="contour plot dialog box" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/648a12ddd172dee236c96cccc0c1d0bc/contour_plot_dialog.png" style="width: 531px; height: 344px;" /></p>
<p>Minitab produces this contour plot (we have edited the range of colors from the default):</p>
<p style="margin-left: 40px;"><img alt="contour plot" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/132dc089b5f7a9c2fb849f5427b1c927/contourplot.png" style="width: 576px; height: 384px;" /></p>
<p>From this plot we can see the patients for whom transportation assistance is likely to make the most impact. These are the patients whose age and distance characteristics fall within the dark-red-colored area, where access to a vehicle raises the probability of participation by more than 40 percent.</p>
<p>The hospital <em>could </em>use this information to carefully target potential recipients of transportation assistance, but doing so would raise many ethical issues. Instead, the hospital will offer transportation assistance to any potential participant who needs it. The project team decides to calculate the average probability of participation for all patients without access to a vehicle.</p>
<p>To obtain that average, select <strong>Stat > Basic Statistics > Display Descriptive Statistics...</strong> in Minitab, and choose "PFITS-NoCar" as the variable. Click on the "Statistics" button to make sure the Mean is among the descriptive statistics being calculated, and click OK. Minitab will display the descriptive statistics you've selected in the Session Window. </p>
<p style="margin-left: 40px;"><img alt="descriptive statistics" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/e14f812d91728a9e17b8ae2dde0d8f30/pfits_nocar_mean.png" style="width: 290px; height: 89px;" /></p>
<p>According to our binary logistic regression model, the average probability of participation for all patients without a car equals 0.1695, which we will round up to .17. Now we can easily calculate an estimated break-even point for ensuring transport for patients who need it. We have the following information on hand: </p>
<div style="margin-left:40px;">
<table>
<tr><td>Patients per month without a car</td><td>15</td></tr>
<tr><td>Average probability of participation without a car</td><td>.17</td></tr>
<tr><td>Average number of sessions per participant</td><td>29</td></tr>
<tr><td>Revenue per session</td><td>$23</td></tr>
</table>
</div>
<p>Based on these figures, a per-patient maximum for transportation can be calculated as:</p>
<p style="margin-left: 40px;">.17 probability of participation x 29 sessions x $23 per session = $113.39</p>
<p><span style="line-height: 1.6;">Since about 15 discharged cardiac patients each month do not have a car, we can invest at most 15 x $113.39 = $1700.85/month in transportation assistance. </span></p>
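<p>The break-even arithmetic above can be checked with a short Python sketch. It uses only the figures quoted in this post; the variable names are my own, and <code>Decimal</code> is used simply to keep the cents exact.</p>

```python
from decimal import Decimal

# Figures quoted in the post.
p_participation = Decimal("0.17")    # average probability without a car
sessions_per_participant = 29
revenue_per_session = Decimal("23")  # dollars
patients_per_month = 15              # discharged patients without a car

# Expected revenue per car-less patient is the most we can spend on that patient.
per_patient_max = p_participation * sessions_per_participant * revenue_per_session

# Monthly ceiling for transportation assistance.
monthly_max = per_patient_max * patients_per_month

print(per_patient_max)  # 113.39
print(monthly_max)      # 1700.85
```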
<span style="line-height: 1.6;">Implementing Transportation Assistance for Patient Participation</span>
<p>As described in the <a href="http://dx.doi.org/10.1080/08982112.2011.553761" target="_blank">article that inspired this series of posts</a>, the project team evaluated potential improvement options against this economic calculation and developed a process that brought together patients with cars and those without to carpool to sessions. A pilot test of the process proved successful, and most of the car-less patients noted that they would not have participated in the rehabilitation program without the service. </p>
<p>After implementing the new carpool process, the project team revisited the key factors they had considered at the start of the initiative: the number of patients enrolling in the program each month and the average number of sessions participants attended.</p>
<p>The average number of sessions attended remained constant at 29. But patient participation rose from 33 to 45 per month, which exceeded the project goal of increasing participation to 36 patients per month. Additional revenues turned out to be about $96,000 annually.</p>
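<p>The annual revenue figure is consistent with the numbers given earlier. The sketch below assumes (the post does not spell this out) that each additional participant attends the average 29 sessions at $23 per session; the variable names are hypothetical.</p>

```python
# Figures from the post.
participants_before = 33   # new participants per month before the carpool
participants_after = 45    # new participants per month after the carpool
sessions_per_participant = 29
revenue_per_session = 23   # dollars

# Assumption: every additional participant attends the average number of sessions.
extra_per_month = participants_after - participants_before
extra_revenue_per_month = extra_per_month * sessions_per_participant * revenue_per_session
extra_revenue_per_year = extra_revenue_per_month * 12

print(extra_revenue_per_year)  # 96048
```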
Take-Away Lessons from This Project Study
<p>If you've read all four parts of this series, you may recall that at the start of the Six Sigma project, several stakeholders believed that the problem of low participation could be addressed by creating a nicer brochure for the program, and by encouraging surgeons to tell their patients about it at an earlier point in their treatment. </p>
<p>None of those initial ideas wound up being implemented, but the project team succeeded in meeting the project goals by enacting improvements that were supported by their data analysis. For me, this is a core takeaway from this article. </p>
<p>As the authors note, "Often people’s ideas on processes are incorrect, but improvement actions based on these are still being implemented. These actions cause frustrated employees, may not be cost effective, and in the end do not solve the problem."</p>
<p>Thus, the article makes a compelling case for the value of applying data analysis to improve processes in healthcare. "<span style="line-height: 1.6;">Even when a somewhat more advanced technique like logistic regression modeling is required," the authors write, "exploratory graphics such as boxplots and bar charts point the direction toward a valuable solution."</span></p>
Health Care Quality Improvement
Thu, 09 Jun 2016 12:00:00 +0000
http://blog.minitab.com/blog/understanding-statistics/a-six-sigma-healthcare-project%2C-part-4%3A-predicting-patient-participation-with-binary-logistic-regression
Eston Martz
A Six Sigma Healthcare Project, part 3: Creating a Binary Logistic Regression Model for Patient ...
http://blog.minitab.com/blog/understanding-statistics/a-six-sigma-healthcare-project-part-3-creating-binary-logistic-regression-models-for-patient-participation
<p>In part 2 of this series, we used graphs and tables to see <a href="http://blog.minitab.com/blog/understanding-statistics/a-six-sigma-healthcare-project-part-2-visualizing-the-impact-of-individual-factors">how individual factors affected rates of patient participation</a> in a cardiac rehabilitation program. This initial look at the data indicated that ease of access to the hospital was a very important contributor to patient participation.</p>
<p><img alt="physical therapy facility" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/cf2f4a8979304153c3ea8fd5210215e8/rehab_facility.jpg" style="margin: 10px 15px; float: right; width: 320px; height: 211px;" />Given this revelation, a bus or shuttle service for people who do not have cars might be a good way to increase participation, but only if such a service doesn't cost more than the amount of revenue generated by participation.</p>
<p>A good estimate of the probability that a patient will participate will enable us to calculate the break-even point for such a service. We can use regression to develop a statistical model that lets us do just that.</p>
<p>We have a binary response variable, because only two outcomes exist: a patient either participates in the rehabilitation program, or does not. To model these kinds of responses, we need to use a statistical method called "Binary Logistic Regression." This may sound intimidating, but it's really not as scary as it sounds, especially with a statistical software package like Minitab.</p>
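<p>To make the idea less intimidating: a binary logistic model computes a linear combination of the predictors and then converts it to a probability between 0 and 1 with the logistic function. Here is a minimal Python sketch of that conversion; the function name is my own, and this is not Minitab's code.</p>

```python
import math

def logistic(linear_predictor: float) -> float:
    """Convert a linear predictor (log-odds) into a probability between 0 and 1."""
    # Two algebraically equivalent branches keep math.exp from overflowing.
    if linear_predictor >= 0:
        return 1.0 / (1.0 + math.exp(-linear_predictor))
    e = math.exp(linear_predictor)
    return e / (1.0 + e)

# Log-odds of 0 correspond to a 50/50 chance of participating.
print(logistic(0.0))
```

<p>Larger values of the linear predictor always map to higher probabilities, which is why a positive coefficient means a factor raises the chance of participation.</p>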
<p>Download the <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/521a4efabc5f892eef403e8c4a354f9a/cardiacparticipationdata_1.mtw">data set</a> to follow along and try these analyses yourself. If you don't already have Minitab, you can <a href="http://www.minitab.com/products/minitab/free-trial/">download and use our statistical software free for 30 days</a>.</p>
Using Stepwise Binary Logistic Regression to Obtain an Initial Model
<p>First, let's review our data. We know the gender, age, and distance from the hospital for 500 cardiac patients. We also know whether or not they have access to a vehicle ("Mobility") and whether or not they participated in the rehabilitation program after their surgery (coded so that 0 = no, and 1 = yes). </p>
<p style="margin-left: 40px;"><img alt="data" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/98b12f3c127d5370169e1eee44679577/data_snapshot_for_blr.png" style="width: 339px; height: 348px;" /></p>
<p>The process of developing a regression equation that can predict a response based on your data is called "Fitting a model." We'll do this in Minitab by selecting <strong>Stat > Regression > Binary Logistic Regression > Fit Binary Logistic Model...</strong> </p>
<p style="margin-left: 40px;"><img alt="Binary Logistic Regression menu" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/4d475b704b654470760bc8fb561d6547/binary_logistic_regression_menu.png" style="width: 633px; height: 367px;" /></p>
<p>In the dialog box, we need to select the appropriate columns of data for the response we want to predict, and the factors we wish to base the predictions on. In this case, our response variable is "Participation," and we're basing predictions on the continuous factors of "Age" and "Distance," along with the categorical factor "Mobility." </p>
<p style="margin-left: 40px;"><img alt="binary logistic regression dialog 1" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2cefc4466635a596b45a8a7d58472486/binary_logistic_regression_dialog_1.png" style="width: 580px; height: 494px;" /></p>
<p>After selecting the factors, click on the "Model" button. This lets us tell Minitab whether we want to consider interactions and polynomial terms in addition to the main effects of each factor. Complete the Model dialog as shown below. To include the two-way interactions in the model, highlight all the items in the Predictors window, make sure that the “Interactions through order:” drop-down reads “2,” and press the Add button next to it:</p>
<p style="margin-left: 40px;"><img alt="Binary Logistic Regression Dialog 2" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/2687d8f135fa8f4ef8822d47e72e357a/binary_logistic_regression_model_dialog.png" style="width: 520px; height: 542px;" /></p>
<p>Click OK to return to the main dialog, then press the “Coding” button. In this subdialog, we can tell Minitab to automatically standardize the continuous predictors, Age and Distance. There are several reasons you might want to standardize the continuous predictors, and different ways of standardizing depending on your intent.</p>
<p>In this case, we’re going to standardize by subtracting the mean of the predictor from each row of the predictor column, then dividing the difference by the standard deviation of the predictor. This centers the predictors and also places them on a similar scale. This is helpful when a model contains highly correlated predictors and interaction terms, because standardizing helps reduce multicollinearity and improves the precision of the model’s estimated coefficients. To accomplish this, we just need to select that option from the drop-down as shown below:</p>
<p style="margin-left: 40px;"><img alt="Binary Logistic Regression - Coding" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/429d47a6cf7b5a9e63363229512691f2/binary_logistic_regression_coding_dialog.png" style="width: 519px; height: 560px;" /></p>
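<p>The standardization described above (subtract the mean, then divide by the standard deviation) can be sketched in a few lines of Python. The ages are hypothetical, and I am assuming the sample standard deviation is used.</p>

```python
from statistics import mean, stdev

def standardize(values):
    """Center values on their mean and scale by their sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return [(v - m) / s for v in values]

ages = [52, 61, 67, 70, 75]  # hypothetical ages, for illustration only
ages_standardized = standardize(ages)

# After standardizing, the column has mean 0 and standard deviation 1.
print(ages_standardized)
```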
<p>After you click OK to return to the main dialog, press the "Stepwise" button. We use this subdialog to perform a stepwise selection, which is a technique that automatically chooses the best model for your data. Minitab will evaluate several different models by adding and removing various factors, and select the one that appears to provide the best fit for the data set. You can have Minitab provide details about the combination of factors it evaluates at each "step," or just show the recommended model<span style="line-height: 1.6;">.</span></p>
<p style="margin-left: 40px;"><span style="line-height: 1.6;"><img alt="Binary Logistic Regression - stepwise" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f9a8375d2b429a1b7bd475182b6b1461/binary_logistic_regression_dialog_3.png" style="width: 490px; height: 542px;" /> </span></p>
<p>Now click OK to close the Stepwise dialog, and OK again to run the analysis. The output in Minitab's Session window will include details about each potential model, followed by a summary or "deviance" table for the recommended model.</p>
<span style="line-height: 1.6;">Assessing and Refining the Regression Model</span>
<p><span style="line-height: 1.6;">Using software to perform stepwise regression is extremely helpful, but it's always important to check the recommended model to see if it can be refined further. In this case, all of the model terms are significant, and the deviance table's adjusted R<sup>2</sup> indicates that the model explains about 40 percent of the observed variation in the response data. </span></p>
<p style="margin-left: 40px;"><img alt="stepwise regression selected model" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/f6a95017c24e7ed0f315471d685a91c6/output_deviance_table.png" style="width: 520px; height: 289px;" /></p>
<p>We also want to look at the table of coded coefficients immediately below the summary. The final column of the table lists the VIFs, or variance inflation factors, for each term in the model. This is important because VIF values greater than 5–10 can indicate unstable coefficients that are difficult to interpret.</p>
<p>None of these terms have VIF values over 10<span style="line-height: 1.6;">. </span></p>
<p style="margin-left: 40px;"><img alt="variance inflation factors (VIF)" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/bc95899d33a4511e92288e158693e39d/output_coded_coefficients.png" style="width: 307px; height: 182px;" /></p>
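<p>If you're curious where a VIF comes from: for a given term, it is 1 / (1 - R<sup>2</sup>), where R<sup>2</sup> is obtained by regressing that term on all the other terms in the model. The sketch below shows only this final step, and the function name is hypothetical.</p>

```python
def vif(r_squared: float) -> float:
    """Variance inflation factor for a term, given the R-squared obtained by
    regressing that term on the other terms in the model."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in the interval [0, 1)")
    return 1.0 / (1.0 - r_squared)

# A term that is unrelated to the others has a VIF of 1 (no inflation).
print(vif(0.0))
# R-squared values of 0.8 and 0.9 correspond to VIFs of about 5 and 10.
print(vif(0.8), vif(0.9))
```

<p>So the 5–10 guideline amounts to saying that 80 to 90 percent of a term's variation can be explained by the other terms.</p>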
<p>Minitab also performs goodness-of-fit tests that assess how well the model predicts observed data. The first two tests, the deviance and Pearson chi-squared tests, have high p-values, indicating that these tests do not support the conclusion that this model is a poor fit for the data. However, the low p-value for the Hosmer-Lemeshow test indicates that the model could be improved.</p>
<p style="margin-left: 40px;"><img alt="goodness-of-fit tests" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/5bf6a28128eebbed608253c122c3daf6/output_goodness_of_fit_tests.png" style="width: 364px; height: 119px;" /></p>
<p>It may be that our model does not account for curvature that exists in the data. We can ask Minitab to add polynomial terms, which model curvature between predictors and the response, to see if it improves the model. Press CTRL-E to recall the binary logistic regression dialog box, then press the "Model" button. To add the polynomial terms, select Age and Distance in the Predictors window, make sure that "2" appears in the “Terms through order:” drop-down, and press "Add" to add those polynomial terms to the model. An order 2 polynomial is the square of the predictor.</p>
<p style="margin-left: 40px;"><img alt="binary logistic regression dialog 4" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d2cd01d9ac0a49a57885801204896885/model_dialog_adding_polynomial_terms.png" style="width: 520px; height: 542px;" /></p>
<p>You may have noticed that we did not select “Mobility” above. Why? Because that categorical variable is coded with 1’s and 0’s, so the polynomial term would be identical to the term that is already in the model.</p>
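<p>A quick check in Python, using a hypothetical 0/1 column, shows why squaring Mobility adds nothing new:</p>

```python
# Hypothetical Mobility column coded 0 = no vehicle, 1 = vehicle.
mobility = [0, 1, 1, 0, 1]

# Squaring leaves every 0 and 1 unchanged, so the "polynomial" column
# is an exact copy of the original column.
mobility_squared = [m ** 2 for m in mobility]

print(mobility_squared == mobility)  # True
```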
<p>Now press OK all the way out to have Minitab evaluate models that include the polynomial terms. Minitab generates the following output: </p>
<p style="margin-left: 40px;"><img alt="binary logistic regression final model" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ad1c37024ed6de634bb5215f448b2227/model_with_polynomials_deviance_table.png" style="width: 483px; height: 311px;" /></p>
<p>So far, so good: all model terms are significant, and the adjusted R<sup>2</sup> indicates that the new model accounts for 51 percent of the observed variation in the response, compared to the initial model’s 40 percent. The coefficients are also acceptable, with no variance inflation factors above 10:</p>
<p style="margin-left: 40px;"><img alt="VIF" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/173c061a582e9c7e3d00ceba0c65c94d/binary_logistic_regression_model_2_coefficients.png" style="width: 349px; height: 184px;" /></p>
<p>However, the VIFs for Mobility and the Distance*Mobility interaction remain higher than desirable. These terms are moderately correlated, but probably not enough to make the regression results unreliable: </p>
<p style="margin-left: 40px;"><img alt="binary-logistic-regression-model-VIF" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/7935289a46fbd462c1f13d8c901fc94b/model_with_polynomials_coefficients.png" style="width: 313px; height: 188px;" /></p>
<p>The goodness-of-fit tests for this model also look good: the lack of p-values below 0.05 indicates that these tests do not suggest the model is a poor fit for the observed data.</p>
<p style="margin-left: 40px;"><img alt="final-binary-logistic-regression-model-goodness-of-fit-tests" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6d188aa3bda808ab62df9f4ef08f692c/model_with_polynomials_goodness_of_fit_tests.png" style="width: 342px; height: 118px;" /></p>
The Binary Logistic Regression Equations
<p><span style="line-height: 1.6;">This model seems like the best option for predicting the probability of patient participation in the program. Based on the available data, Minitab has calculated the following regression equations, one that predicts the probability of attendance for people who have access to their own transportation, and one for those who do not: </span></p>
<p style="margin-left: 40px;"><img alt="regression equations" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/25c8ca3c2f4b195e446bf18b75fffee8/model_with_polynomials_regression_equations.png" style="width: 533px; height: 112px;" /></p>
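<p>Why two equations? A single model that contains a 0/1 factor such as Mobility, plus its interactions, collapses into one equation per factor level once you plug in the 0 or the 1. The Python sketch below illustrates this with a deliberately simplified model and made-up coefficients. They are placeholders, not the values Minitab estimated, and the real model contains additional terms.</p>

```python
import math

# Hypothetical coefficients for a simplified model (NOT Minitab's estimates).
COEF = {
    "const": -1.0,
    "age": -0.5,
    "distance": -0.8,
    "mobility": 2.0,
    "distance_x_mobility": 0.6,
}

def linear_predictor(age_z: float, distance_z: float, mobility: int) -> float:
    """Log-odds of participation; age_z and distance_z are standardized values."""
    return (COEF["const"]
            + COEF["age"] * age_z
            + COEF["distance"] * distance_z
            + COEF["mobility"] * mobility
            + COEF["distance_x_mobility"] * distance_z * mobility)

def probability(age_z: float, distance_z: float, mobility: int) -> float:
    """Predicted probability of participation for one patient profile."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(age_z, distance_z, mobility)))

# With mobility = 1 the intercept becomes const + mobility and the distance
# slope becomes distance + distance_x_mobility; with mobility = 0 both of
# those extra terms vanish. Hence one equation per group.
p_car = probability(0.0, 0.0, 1)
p_nocar = probability(0.0, 0.0, 0)
print(p_car, p_nocar)
```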
<p><span style="line-height: 1.6;">In the next post, we'll complete this process by <a href="http://blog.minitab.com/blog/understanding-statistics/a-six-sigma-healthcare-project%2C-part-4%3A-predicting-patient-participation-with-binary-logistic-regression">using this model to make predictions about the probability of participation</a> </span><span style="line-height: 20.8px;">in the rehabilitation program </span><span style="line-height: 1.6;">and how much we can afford to invest in transportation to help more cardiac patients. </span></p>
Health Care Quality Improvement
Tue, 07 Jun 2016 12:00:00 +0000
http://blog.minitab.com/blog/understanding-statistics/a-six-sigma-healthcare-project-part-3-creating-binary-logistic-regression-models-for-patient-participation
Eston Martz
3 Ways to Get Up and Running with Statistical Software—Fast
http://blog.minitab.com/blog/real-world-quality-improvement/3-ways-to-get-up-and-running-with-statistical-software%E2%80%94fast
<p><img alt="running" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/6670317de85d2c7240a30951338eadd2/up_and_running.jpg" style="width: 300px; height: 171px; float: right; margin: 10px 15px;" />The last thing you want to do when you purchase a new piece of software is spend an excessive amount of time getting up and running. You’ve probably been ready to use the software since, well, <em>yesterday.</em> Minitab has always focused on making our software easy to use, but many professional software packages do have a steep learning curve.</p>
<p>Whatever package you’re using, here are three things you can do to speed the process of starting to analyze your data with statistical software:</p>
1. Get Technical Support
<p>If you’re having trouble figuring out how to do something in a statistical software package, the makers of the software should be ready to provide the assistance you need.</p>
<p>When you purchase Minitab, whether for a single user or for your entire organization, we offer <a href="https://www.minitab.com/support/" target="_blank">free technical support</a>, by phone or online, to help you install and use the software. We’ve also got quick-start installation guides and an <a href="http://support.minitab.com/installation/" target="_blank">extensive library of installation-related FAQs</a> to browse.</p>
<p>Minitab’s technical support team includes specialists in <span style="line-height: 20.8px;">statistics and quality improvement, as well as</span><span style="line-height: 20.8px;"> </span><span style="line-height: 1.6;">technology, so they can assist with virtually any challenge you encounter while using the software.</span></p>
2. Consult Help
<p>Let’s face it, when a problem arises, the documentation for a lot of software is not all that helpful. That’s why many of us tend to ignore the “Help” menu when we encounter a software-related question. But if you haven’t explored the Help options offered by your statistical software, you should check them out.</p>
<p>Most software packages have some sort of built-in Help content, but our team has taken it a step further by offering truly useful, valuable information within Minitab. That information includes concise overviews of major statistical topics, guidance for setting up your data, information on methods and formulas, comprehensive guidance for completing dialog boxes, and easy-to-follow examples. And that’s just the start. Minitab’s built-in help options also include:</p>
<p style="margin-left:1.0in;"><strong><a href="https://www.minitab.com/products/minitab/assistant/" target="_blank">The Assistant</a>:</strong> You certainly don’t need to be a statistics expert to get the insight you need from your data. Minitab’s Assistant menu interactively guides you through several types of analyses—including Measurement Systems Analysis, Capability Analysis, Hypothesis Tests, Control Charts, DOE and Multiple Regression.</p>
<p style="margin-left:1.0in;"><strong>StatGuide: </strong>After you analyze your data, the built-in StatGuide helps you interpret statistical graphs and tables in a practical, straightforward way. To access the StatGuide, just right-click on your output, press Shift+F1 on the keyboard, or click the StatGuide icon in the toolbar:</p>
<p style="margin-left:1.0in;"><img alt="stat guide" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/acf7d51c157e9d9e29d863a398301135/stat_guide.jpg" style="width: 543px; height: 53px;" /></p>
<p style="margin-left:1.0in;"><strong>Tutorials: </strong>For a refresher on statistical tasks, take a look at built-in tutorials (<strong>Help > Tutorials</strong>), which include an overview of data requirements, step-by-step instructions, and guidance on interpreting the results.</p>
3. Free Web Site Resources
<p>See what kinds of material exist on the web site of your statistical software package. There may be much more there than basic information about the product!</p>
<p><span style="line-height: 1.6;">For instance, at the Minitab web site you can attend </span><a href="https://www.minitab.com/support/webinars/" style="line-height: 1.6;" target="_blank">live webinars</a><span style="line-height: 1.6;">, view </span><a href="https://www.minitab.com/support/videos/" style="line-height: 1.6;" target="_blank">recorded webcasts</a><span style="line-height: 1.6;">, and read step-by-step how-to’s and detailed </span><a href="http://www.minitab.com/articles/" style="line-height: 1.6;" target="_blank">technical articles</a><span style="line-height: 1.6;">. The </span><a href="http://blog.minitab.com/" style="line-height: 1.6;" target="_blank">Minitab Blog</a><span style="line-height: 1.6;"> also offers tips and techniques for using Minitab in quality improvement projects, research, and more.</span></p>
<p>Perhaps my favorite resource on Minitab.com is the <a href="http://support.minitab.com/minitab/17/" target="_blank">Minitab Product Support Section</a>, which features a getting started guide, a topic library with all the various analyses available in Minitab, a free data set library to practice analyses, and a macro library that contains over 100 helpful macros you can use to automate, customize, and extend the functionality of Minitab analyses.</p>
<p><em>Interested in learning more about our pricing and licensing options? Visit <a href="http://www.minitab.com/products/minitab/pricing" target="_blank">http://www.minitab.com/products/minitab/pricing</a> and <a href="https://www.minitab.com/contact-us/" target="_blank">contact us</a> if you have questions.</em></p>
Project Tools, Quality Improvement, Statistics Help, Stats
Mon, 06 Jun 2016 12:01:00 +0000
http://blog.minitab.com/blog/real-world-quality-improvement/3-ways-to-get-up-and-running-with-statistical-software%E2%80%94fast
Carly Barry
Regression versus ANOVA: Which Tool to Use When
http://blog.minitab.com/blog/michelle-paret/regression-versus-anova%3A-which-tool-to-use-when
<p>Suppose you’ve collected data on cycle time, revenue, the dimension of a manufactured part, or some other metric that’s important to you, and you want to see what other variables may be related to it. Now what?</p>
<img alt="thinker" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/a8cd9157d0cc1e5f75d11dc7f67e0fcd/thinkergorilla.jpg" style="line-height: 20.8px; float: left; width: 225px; height: 218px; margin: 10px 15px; border-width: 1px; border-style: solid;" />
<div>
<p>When I graduated from college with my first statistics degree, my diploma was bona fide proof that I'd endured hours and hours of classroom lectures on various statistical topics, including <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-tutorial-and-examples" style="line-height: 1.6;">linear regression</a>, <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-analysis-of-variance-anova-and-the-f-test" style="line-height: 1.6;">ANOVA</a>, and <a href="http://blog.minitab.com/blog/fun-with-statistics/analyzing-titanic-survival-rates-part-ii-v1" style="line-height: 1.6;">logistic regression</a>.</p>
<p>However, there wasn’t a single class that put it all together and explained which tool to use when. I have all of this data for my Y and X's and I want to describe the relationship between them, but what do I do now?</p>
<p>Back then, I wish someone had clearly laid out which regression or ANOVA analysis was most suited for this type of data or that. Let's start with how to choose the right tool for a continuous Y…</p>
Continuous Y, Continuous X(s)
<p>Example:</p>
<p> Y: Weights of adult males</p>
<p> X’s: Age, Height, Minutes of exercise per week</p>
<p>What tool should you use? <strong>Regression</strong></p>
<p>Where’s that in Minitab? <strong>Stat > Regression > Regression > Fit Regression Model</strong></p>
<p> </p>
Continuous Y, Categorical X(s)
<p>Example:</p>
<p> Y: Your Mario Kart Wii score</p>
<p> X’s: Wii controller type (racing wheel or standard), whether you stand or sit while playing, character (Mario, Luigi, Yoshi, Bowser, Peach)</p>
<p>What tool should you use? <strong>ANOVA</strong></p>
<p>Where’s that in Minitab? <strong>Stat > ANOVA > General Linear Model > Fit General Linear Model</strong></p>
<p> </p>
Continuous Y, Continuous AND Categorical X(s)
<p>Example:</p>
<p> Y: Number of hours people sleep per night</p>
<p> X’s: Age, activity prior to sleeping (none, read a book, watch TV, surf the internet), whether or not the person has young children…“I had a bad dream, I'm thirsty, there’s a monster under my bed!”</p>
<p>What tool should you use? You have a choice of using either <strong>ANOVA or Regression</strong></p>
<p>Where’s that in Minitab? <strong>Stat > ANOVA > General Linear Model > Fit General Linear Model </strong><em>or</em> <strong>Stat > Regression > Regression > Fit Regression Model</strong></p>
<p>I personally prefer GLM because it offers <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/keep-that-special-someone-happy-when-you-perform-multiple-comparisons">multiple comparisons</a>, which are useful if you have a significant categorical X with more than 2 levels. For example, suppose activity prior to sleep is significant. Comparisons will tell you which of the 4 levels—none, read a book, watch TV, surf the Internet—are significantly different from one another.</p>
<p>Do people who watch TV sleep, on average, the same as people who surf the Internet, but significantly less than people who do nothing or read? Or, perhaps, are Internet surfers significantly different from the other three categories? Comparisons help you detect these differences.</p>
Categorical Y
<p>If Y is categorical, then you can use logistic regression for your continuous and/or categorical X’s. The 3 types of logistic regression are:</p>
<p> <strong> Binary: </strong> Y with 2 levels (yes/no, pass/fail)</p>
<p> <strong>Ordinal: </strong>Y with more than 2 levels that have a natural order (low/medium/high)</p>
<p> <strong>Nominal</strong>: Y with more than 2 levels that have no order (sedan/SUV/minivan/truck)</p>
<p>So the next time you have a bunch of X’s and a Y and you want to see if there's a relationship between them, here is a summary of which tool to use when:</p>
<p><img alt="Tool Selection Guide" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a038d79708c41277dc075ad297c54d62/regression_versus_anova.jpg" style="width: 757px; height: 446px;" /></p>
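<p>If you like to think in code, the same decision rules can be written as a small lookup. This is only a sketch of the guidance in this post; the function name and the string labels are my own.</p>

```python
def choose_tool(y_type, x_types):
    """Suggest an analysis based on the rules of thumb in this post.

    y_type:  'continuous', 'binary', 'ordinal', or 'nominal'
    x_types: collection containing 'continuous' and/or 'categorical'
    """
    x_types = set(x_types)
    if not x_types or not x_types <= {"continuous", "categorical"}:
        raise ValueError("x_types must contain 'continuous' and/or 'categorical'")

    if y_type == "continuous":
        if x_types == {"continuous"}:
            return "Regression"
        if x_types == {"categorical"}:
            return "ANOVA"
        return "ANOVA (General Linear Model) or Regression"

    # Categorical Y: the kind of logistic regression depends on the levels of Y.
    logistic = {
        "binary": "Binary logistic regression",
        "ordinal": "Ordinal logistic regression",
        "nominal": "Nominal logistic regression",
    }
    if y_type in logistic:
        return logistic[y_type]
    raise ValueError("unknown y_type: " + repr(y_type))

# Hours of sleep (continuous Y) versus age and pre-sleep activity (mixed X's).
print(choose_tool("continuous", {"continuous", "categorical"}))
```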
<p>For step-by-step instructions on how to use General Regression, General Linear Model, or Logistic Regression in Minitab Statistical Software, just navigate to any of these tools in Minitab and click Help in the bottom left corner of the dialog. You will then see ‘example’ located at the top of the Help screen. And Minitab customers can always contact Minitab Technical Support at 814-231-2682 or <a href="http://www.minitab.com/contact-us" style="line-height: 1.6;">www.minitab.com/contact-us</a>. Our Tech Support team is staffed with statisticians, and best of all, accessing them is free!</p>
</div>
ANOVA, Data Analysis, Lean Six Sigma, Quality Improvement, Regression Analysis, Six Sigma, Statistics, Statistics Help, Stats
Thu, 02 Jun 2016 12:00:00 +0000
http://blog.minitab.com/blog/michelle-paret/regression-versus-anova%3A-which-tool-to-use-when
Michelle Paret
A Six Sigma Healthcare Project, part 1: Examining Factors with a Pareto Chart
http://blog.minitab.com/blog/understanding-statistics/a-six-sigma-healthcare-project-part-1-examining-factors-with-a-pareto-chart
<p>Over the past year I've been able to work with and learn from practitioners and experts who are using data analysis and Six Sigma to improve the quality of healthcare, both in terms of operational efficiency and better patient outcomes. I've been struck by how frequently a very basic analysis can lead to remarkable improvements, but some insights cannot be attained without conducting more sophisticated analyses. One such situation is covered in a 2011 <em>Quality Engineering</em> article on the application of <a href="http://dx.doi.org/10.1080/08982112.2011.553761">binary logistic regression in a healthcare Six Sigma project</a>.</p>
<p>In this series of blog posts, I'll follow the path of the project discussed in that article and show you how to perform the analyses described using Minitab Statistical Software. (I am using simulated data, so my analyses will not match those in the original article.)</p>
The Six Sigma Project Goal
<p>The goal of this Six Sigma project was to attract and retain more patients in a hospital's cardiac rehabilitation program. On being discharged, heart-surgery patients are advised to join this program, which offers psychological support and guidance on a healthy diet and lifestyle. Program participants also have two or three physical therapy sessions per week, for up to 45 sessions.</p>
<p>An average of 33 new patients begin participating in the program per month, and participants attend an average of 29 sessions. But many discharged patients do not enroll in the program, and many who do drop out before they complete it. Greater rates of participation would benefit individual patients' health and increase the hospital's revenues.</p>
<p>The project team identified two critical metrics they might improve:</p>
<ul>
<li>The number of patients participating in the program each month</li>
<li>The number of therapy sessions for each participant</li>
</ul>
<p>The team set a goal to increase the average number of new participants to 36 per month, and to increase the average number of sessions each patient attends to 32.</p>
Available Patient Data
<p>Existing data on the hospital's cardiac patients includes:</p>
<ul>
<li>The distance between each patient's home and the hospital</li>
<li>Patient's age and gender</li>
<li>Whether or not the patient has access to a car</li>
<li>Whether or not the patient participated in the rehabilitation program</li>
</ul>
<p>To illustrate the analyses conducted for this project, we will use a simulated set of data for 500 patients. Download the <a href="//cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/521a4efabc5f892eef403e8c4a354f9a/cardiacparticipationdata_1.mtw">data set</a> to follow along and try these analyses yourself. If you don't already have Minitab, you can <a href="http://www.minitab.com/products/minitab/free-trial/">download and use our statistical software free for 30 days</a>.</p>
Exploring Why Patients Leave the Program with a Pareto Chart
<p><span style="line-height: 20.8px;">Encouraging patients who start the program to complete it, or at least to attend a greater number of sessions, has the potential to be a quick and easy "win," </span>so the project team began by looking at why 156 patients who started the program eventually dropped out.</p>
<p>The reasons patients gave for dropping out of the rehabilitation program were placed into several different categories, then visualized with a <span><a href="http://blog.minitab.com/blog/understanding-statistics/explaining-quality-statistics-so-your-boss-will-understand-pareto-charts">Pareto chart</a></span>.</p>
<p><span style="line-height: 20.8px;">The Pareto chart is a must-have in any analyst’s toolbox. </span>The Pareto principle states that about 80% of outcomes come from 20% of the possible causes. <span style="line-height: 1.6;">By plotting the frequencies and corresponding percentages of a categorical variable, a Pareto chart helps identify the "vital few" (the "20%" that really matter), so you can focus your efforts where they can make the most difference.</span></p>
<p>To create this chart in Minitab, open <strong>Stat > Quality Tools > Pareto Chart...</strong> From our worksheet of simulated hospital data, select the <em>Reason</em> column as shown:</p>
<p style="margin-left: 40px;"><img alt="Pareto Dialog" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dba69467c17b4b989011c6771b524e7e/pareto_dialog.png" style="width: 500px; height: 241px;" /></p>
<p>When you press <strong>OK</strong>, Minitab creates the following chart:</p>
<p style="margin-left: 40px;"><img alt="Pareto Chart of Reasons" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/4418104d45e4556ff52270cc56aec5c8/pareto_chart_of_reason.png" style="width: 576px; height: 384px;" /></p>
<p>Along the x-axis, Minitab displays the reasons people dropped out of the rehabilitation program, along with the percent of the total and the cumulative percentage each reason accounted for. We can see that some 80% of these patients dropped out of the program for one of the following reasons:</p>
<ul>
<li>They were readmitted to the hospital.</li>
<li>Work or other obligations conflicted with the program schedule.</li>
<li>They could not participate for medical reasons.</li>
<li>They had their own exercise facilities.</li>
</ul>
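<p>For readers who want to see the arithmetic behind a Pareto chart, here is a minimal sketch in Python. It is not how Minitab builds the chart internally; it only shows the sorting and cumulative-percentage logic. The <code>pareto_table</code> function and the drop-out counts are hypothetical and are not taken from the worksheet.</p>

```python
from collections import Counter


def pareto_table(observations):
    """Return (category, count, percent, cumulative percent) rows, largest count first."""
    counts = Counter(observations)
    total = sum(counts.values())
    rows = []
    running = 0
    for category, count in counts.most_common():
        running += count
        rows.append((category, count, 100 * count / total, 100 * running / total))
    return rows


# Hypothetical drop-out reasons for illustration only (NOT the worksheet data).
reasons = (["Readmitted"] * 8 + ["Schedule conflict"] * 6 + ["Medical"] * 3
           + ["Own facilities"] * 2 + ["Other"] * 1)

for category, count, pct, cum_pct in pareto_table(reasons):
    print(f"{category:<18} {count:>3} {pct:6.1f}% {cum_pct:6.1f}%")
```

<p>Reading down the cumulative column shows where the "vital few" categories end, which is the same information the cumulative line on the chart conveys.</p>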
<p>While encouraging existing participants to complete the program seemed like a good strategy, the Pareto chart shows that most people stop participating due to factors that are beyond the hospital's control. Therefore, rather than focusing on keeping existing participants, the team decided to explore how to attract more new participants.</p>
Getting More Patients to Participate in the Program
<p>Having decided to focus on increasing initial enrollment, the <span style="line-height: 1.6;">project team next gathered cardiologists, physical therapists, patients, and other stakeholders to brainstorm about the factors that influence participation. </span></p>
<p><span style="line-height: 1.6;">At these brainstorming sessions, many stakeholders insisted that more people would participate in the rehabilitation program if the brochure about it were better. Another suggested solution involved sending a letter to cardiologists encouraging them to be more positive about the program and to mention it to patients at an earlier point in their treatment. </span></p>
<p>The project team recorded these suggestions, but they were wary of jumping to conclusions that weren't supported by data. They decided to look more closely at the data they had from existing patients before proceeding with any potential solutions.</p>
<p>In part 2, we will review how the team used <a href="http://blog.minitab.com/blog/understanding-statistics/a-six-sigma-healthcare-project-part-2-visualizing-the-impact-of-individual-factors">graphs and basic descriptive statistics to get quick insight</a> into the influence of individual factors on patient participation in the program.</p>
Health Care Quality Improvement
Tue, 31 May 2016 12:00:00 +0000
http://blog.minitab.com/blog/understanding-statistics/a-six-sigma-healthcare-project-part-1-examining-factors-with-a-pareto-chart
Eston Martz
Is Stephen Curry the Best NBA Point Guard Ever? Let's Check the Data
http://blog.minitab.com/blog/statistics-in-the-field/is-stephen-curry-the-best-nba-point-guard-ever-lets-check-the-data
<p><em>by Laerte de Araujo Lima, guest blogger </em></p>
<p>The NBA's 2015-16 season will be one for the history books. Not only was it the last season of <a href="http://www.nba.com/lakers/news/160413_kobepresser">Kobe Bryant</a>, who scored 60 points in his final game, but the Golden State Warriors set <a href="http://www.nba.com/news/2015-16-golden-state-warriors-chase-1995-96-chicago-bulls-all-time-wins-record/">a new wins record</a>, beating the previous record set by the 1995-96 Chicago Bulls.</p>
<p><img alt="stephen curry" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/25a3dc0f9e9c615fae1224259b7c0c6f/320px_stephen_curry_vs_washington_2016_1_.jpg" style="width: 320px; height: 216px; margin: 10px 15px; float: right;" />The Warriors seem likely to take this season's NBA title, in large part thanks to the performance of point guard <a href="http://www.nba.com/playerfile/stephen_curry/">Stephen Curry</a>. A lot of my friends are even saying Curry's skill and performance make him the best point guard ever in NBA history, but is it true? Curry’s performance is amazing, and he's the key element of the Warriors’ success, but it seems a little early to define him as the best NBA point guard <em>ever</em>. In the meantime, we can use data to answer another question:</p>
<p>Has any other point guard in NBA history matched Stephen Curry’s performance during their initial seven seasons?</p>
<p>As a fan of both basketball and Six Sigma, I set out to answer this question methodically, following these steps:</p>
1. Define the Sample of Point Guards for the Study
<p>ESPN recently published <a href="http://espn.go.com/nba/story/_/page/nbarankPGs/ranking-top-10-point-guards-ever">their list</a> of the 10 best NBA point guards, which puts Magic Johnson first and Curry fourth. ESPN considers both objective factors (NBA titles, MVP nominations, etc.) and subjective parameters (player vision, charisma, team engagement, etc.) to compare players. In keeping with Six Sigma, I want my analysis to be based on facts and figures; however, ESPN's list makes a good starting point. Here are their rankings:</p>
<ol>
<li>Magic Johnson</li>
<li>Oscar Robertson</li>
<li>John Stockton</li>
<li>Stephen Curry</li>
<li>Isiah Thomas</li>
<li>Chris Paul</li>
<li>Steve Nash</li>
<li>Jason Kidd</li>
<li>Walt Frazier</li>
<li>Bob Cousy</li>
</ol>
2. Define the Data Source
<p>This is the easiest part of the job. The NBA web site is a rich source of data, so we are going to use it to check the regular-season performances of each player in ESPN's list. Restricting the study to the regular season keeps the averages balanced across players, because it gives us the same number of games per player per season.</p>
3. Define the Critical-to-Quality (CTQ) Factors
<p>In my opinion, the following CTQ factors (based on NBA standards criteria) best characterize point guard performance and how they add value to the team's main target—winning a game:</p>
<table style="text-align: center; margin: 5px 25px;">
<tr><th>CTQ</th><th>CTQ Definition</th><th>Rationale</th></tr>
<tr><td><strong>PTS</strong></td><td>Average points per game</td><td>Impact of the player on the overall score makes a positive contribution to winning the game.</td></tr>
<tr><td><strong>FG%</strong></td><td>Percentage of successful field goals</td><td>Player efficiency in shooting makes a positive contribution to winning the game.</td></tr>
<tr><td><strong>3P%</strong></td><td>Percentage of successful 3-point field goals</td><td>Player efficiency in shooting from the 3-point line makes a positive contribution to winning the game.</td></tr>
<tr><td><strong>FT%</strong></td><td>Percentage of successful free throws</td><td>Player efficiency in free throws makes a positive contribution to winning the game.</td></tr>
<tr><td><strong>AST</strong></td><td>Average assists per game</td><td>Assisting teammates makes a positive contribution to winning the game.</td></tr>
<tr><td><strong>STL</strong></td><td>Average steals per game</td><td>New ball possession and counterattacks make a positive contribution to winning the game.</td></tr>
<tr><td><strong>MIN</strong></td><td>Average minutes played per game</td><td>Player's strategic importance to the team.<br />
Positive contribution to team strategy.</td></tr>
<tr><td><strong>GS</strong></td><td>Games per season in which the player is part of the initial 5</td><td>Initial starts indicate importance in terms of strategy, as well as fewer injuries.</td></tr>
</table>
<p>With the players, critical factors, and the source of data defined, let's dig into the analysis.</p>
4. Ranking Criteria and Methodology
<p>When I opened Minitab Statistical Software to begin looking at each player's average for each CTQ factor, I faced the first challenge in the analysis. Some players did not have the same CTQ measurements in the NBA database. They had played in the NBA’s early years, and the statistics for all CTQ factors weren't available (for example, the 3-point shot didn't exist at the time some players were active). Consequently, I decided to exclude those players from the analysis to avoid discrepancies in the data. That leaves us with this short list:</p>
<ol>
<li>Magic Johnson</li>
<li>John Stockton</li>
<li>Stephen Curry</li>
<li>Isiah Thomas</li>
<li>Chris Paul</li>
<li>Steve Nash</li>
<li>Jason Kidd</li>
</ol>
<p>To compare these players, I used the statistical tool called Analysis of Variance (ANOVA). ANOVA tests the hypothesis that the means of two or more populations are equal. An ANOVA evaluates the importance of one or more factors by comparing the response variable means at the different factor levels. The null hypothesis states that all means are equal, while the alternative hypothesis states that at least one is different.</p>
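<p>To make the idea concrete, here is a minimal sketch of the F statistic behind a one-way ANOVA, written in plain Python. It is an illustration of the textbook calculation, not the Assistant's implementation, and it stops short of the p-value, which requires the F distribution. The function name <code>one_way_anova_f</code> and the three samples are hypothetical; they are not the players' data.</p>

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F statistic, df between, df within) for a list of samples."""
    k = len(groups)                              # number of factor levels (e.g. players)
    n = sum(len(g) for g in groups)              # total number of observations
    grand_mean = mean(x for g in groups for x in g)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical per-season averages for three players (illustration only).
samples = [[21, 22, 23], [22, 23, 24], [25, 26, 27]]
f_stat, df_b, df_w = one_way_anova_f(samples)
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

<p>A large F means the group means are spread out relative to the variation within each group, which is what leads to a small p-value and a rejection of the null hypothesis that all means are equal.</p>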
<p>For this analysis, I used the <a href="http://www.minitab.com/products/minitab/assistant/">Assistant</a> in Minitab to perform a One-Way ANOVA. To access this tool, select <strong>Assistant > Hypothesis Tests...</strong> and choose One-Way ANOVA.</p>
<p style="margin-left: 80px;"><img alt="The Assistant in Minitab" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/42afcc329cd4cc74808be92ee49931d9/image001.jpg" style="width: 529px; height: 329px;" /></p>
<p>By performing one-way ANOVA for each of the factors, I can position the players based on the average values of their CTQ variables during each of their first seven seasons. After compiling all results, I deployed a <a href="http://asq.org/learn-about-quality/decision-making-tools/overview/decision-matrix.html">Decision Matrix</a> (another Six Sigma tool) to assess all the players, based on the ANOVA results. The ultimate goal is to determine if Curry’s average performance is superior, inferior, or equal to that of the other players.</p>
<p>Let's take a look at the ANOVA results for the individual CTQ factors.</p>
Average Points per Game (PPG)
<p style="margin-left: 40px;"><img alt="Average Points Per Game" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/bf37da89d5aca795b00a58125d00c9db/image002.gif" style="width: 624px; height: 468px;" /></p>
<p>The Assistant's output is designed to be very easy to understand. The blue bar at the top left answers the bottom-line question, "Do the means differ?" The p-value (0.001) is less than the threshold (0.05), telling us that there is a statistically significant difference in means. The intervals displayed on the Means Comparison Chart indicate that Curry and Nash both had huge variation in their average points-per-game in the first 7 years. Statistically speaking, the only player with an average PPG performance that was significantly different from Curry’s is Kidd; all the others had similar performance in their first 7 seasons.</p>
Percentage of Field Goals per Game (FG%)
<p style="margin-left: 40px;"><img alt="FG% ANOVA Results" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/6ce64723b8efe750fcb163443aef7fed/image003.gif" style="width: 624px; height: 468px;" /></p>
<p>As in the previous analysis, the p-value (0.001) is less than the threshold (0.05), telling us that there is a difference in means. However, the interpretation of this analysis is clearer. In terms of statistical significance, Curry’s performance is better than Kidd's (again), but not better than Magic's, and it is similar to that of all the other players.</p>
<p>Again, we see that Nash has tremendous variation in his field-goal percentage, and Kidd exhibits the worst average FG% among these players.</p>
Average Percentage of 3-point Field Goals per Game (3P%)
<p style="margin-left: 40px;"><img alt="3P% ANOVA" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b997d1cc5da7397b9a7f3af002e44f89/image004.gif" style="width: 624px; height: 468px;" /></p>
<p>To my surprise, based on this comparison chart Magic has the <em>worst</em> performance, and the most variation, among the players for this factor. On the other hand, Curry has an extremely high average performance, with small variation, and this is what we see in the Warriors games.</p>
<p>If we take a closer look at the three highest performers in this category, Nash, Stockton, and Curry, we see that Nash and Curry’s performances are slightly different. Interestingly, the variation in Stockton's data prevents us from being able to conclude that a statistically significant difference exists between his average and those of Curry <em>or </em>Nash.</p>
<p style="margin-left: 40px;"><img alt="3P% ANOVA for Curry, Nash, Stockton" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/93c4b44ac1c49bb7a84571a60567b798/image005.gif" style="width: 624px; height: 468px;" /></p>
<p>As happens in many Six Sigma projects, the results of this factor contradict conventional wisdom: how could Magic Johnson have the lowest average for this factor? I decided to dig a little bit deeper into Magic’s data using the Assistant's Diagnostic Report, which offers a better view of the data's distribution. We can see an outlier in Magic's data. According to this analysis, he actually had a season with 0% of 3-point field goals!</p>
<p style="margin-left: 40px;"><img alt="3PT% Diagnostic Report" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/70ad9430dc5f4ee5e2c98a21eafb7d8b/image007.png" style="width: 623px; height: 467px;" /></p>
<p>I could not believe this, so I double-checked the data at the source. To my surprise, it was correct:</p>
<p style="margin-left: 40px;"><img alt="Magic 0.0" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9d096d542a23ab383658316271a5af5b/image009.png" style="width: 624px; height: 346px; border-width: 1px; border-style: solid;" /></p>
Average Percentage of Free-Throw Field Goals per Game (FT%)
<p style="margin-left: 40px;"><img alt="FT% ANOVA Output" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/bd00d9293ba2bd6e45cae5307b7adea4/image010.gif" style="width: 624px; height: 468px;" /></p>
<p>In the free throw analysis, Curry's performance is similar to that of Nash and Paul, all of whom performed better than the other players. Once again, Kidd (whom I have nothing against!) has the worst performance.</p>
Average Assists per Game (AST)
<p style="margin-left: 40px;"><img alt="AST% ANOVA Output" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/100e59e07307222bc73255c030e00316/image011.gif" style="width: 624px; height: 468px;" /></p>
<p>For this factor, both Nash and Curry are at the bottom of the ranking, with similar performance. It's also clear that while Stockton has both the highest average and small variation in his performance, he's still comparable with Isiah and Magic.</p>
Average Steals per Game (STL)
<p style="margin-left: 40px;"><img alt="STL ANOVA Output" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a1450e5f5b2883a49ad8aa5e7941e2c0/image012.gif" style="width: 624px; height: 468px;" /></p>
<p>Again, the p-value (0.001) is less than the threshold (0.05), telling us that there is a statistically significant difference in means. It is clear that Nash is not a big “stealer” when compared with the other players. It's interesting to see that Curry’s mean performance is better than Nash's and worse than Paul's, but is not statistically significantly different from the mean performance of the remaining players.</p>
Minutes Played per Game (MIN)
<p style="margin-left: 40px;"><img alt="MIN ANOVA Output" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/33b90915031366f591dd73b52e092971/image013.gif" style="width: 624px; height: 468px;" /></p>
<p>For the first time, the ANOVA results have a p-value (0.075) greater than the threshold (0.05), telling us that there is no statistically significant difference in means. It is clear that Nash's performance has huge variation, indicating that his contribution was very irregular in the first 7 seasons (perhaps due to injuries, adaptation, etc.). Curry has the next-largest amount of variation after Nash.</p>
Games Started in the Initial 5 per Season (GS)
<p style="margin-left: 40px;"><img alt="Initial 5 ANOVA Output" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/dcb0afca98d1ed138d3d33c07f2f0d7e/image014.gif" style="width: 624px; height: 468px;" /></p>
<p>For this final CTQ, we can see that the p-value (0.006) is less than the threshold (0.05), indicating that the means are different. In this case, Stockton's and Kidd's means differ. Curry’s presence in the initial 5 in the first 7 seasons is not statistically significantly different from that of any of the other players.</p>
<p>Let's take a look at the Diagnostic Report. We can see that Stockton's performance in this CTQ is incredible: he started every game in each of these seasons in the initial 5, showing his importance to the team.</p>
<p style="margin-left: 40px;"><img alt="Initial 5 ANOVA Diagnostic Report" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/767e1328424e2c3b332cd5c612d41924/image015.gif" style="width: 624px; height: 468px;" /></p>
Conclusion
<p>Based on the analyses of these criteria, we now have the final outlook based purely on the data. We can use Minitab's <a href="https://blog.minitab.com/blog/statistics-and-quality-improvement/automatically-update-your-conditional-formatting">conditional formatting</a> to highlight the differences between players for the different factors (<strong>></strong> means "better than", <strong><</strong> means "worse than", and <strong>=</strong> means "similar").</p>
<p style="margin-left: 40px;"><img alt="Final Outlook - Condition Formatting" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/12c216328e4bf5214e2693f7195cd3c8/image015.png" style="width: 604px; height: 202px;" /></p>
<p>From the analysis, we can conclude that:</p>
<ul>
<li>Considering all of the CTQs, Curry’s overall performance is not better than any other point guard in the study, although he does stand out for some individual factors.</li>
<li>Curry’s PTS is superior only to Kidd's.</li>
<li>In terms of shot efficiency, Curry’s FG% is better than Kidd's but inferior to Magic's, and at the same level as all other players.</li>
<li>Curry’s 3-point performance is amazing, but this analysis shows Stockton’s at the same level.</li>
<li>On the other hand, Curry's FT% is better than that of all the other players, except Paul and Nash.</li>
<li>Curry’s assists per game are inferior to those of all other point guards, except Nash.</li>
<li>For steals, Curry’s mean performance is better than Nash's, worse than Paul's, and not statistically significantly different from the remaining players.</li>
<li>In terms of MIN and GS, Curry's performance is similar to that of the other players.</li>
<li>If we just compare points-per-game (PTS) and shot efficiency (FG%, FT%, 3P%) separately, Curry’s overall performance is better than Kidd's, for sure. But if we compare the other CTQ factors (AST, STL, MIN, GS) in the same way, Chris Paul has better performance than Curry.</li>
</ul>
<p>Based on this analysis, perhaps we need a few more seasons' worth of data to compare these players' overall performance and reach a more certain conclusion.</p>
<p> </p>
<p><strong>About the Guest Blogger: </strong></p>
<p><em>Laerte de Araujo Lima is a Supplier Development Manager for Airbus (France). He has previously worked as a product quality engineer for Ford (Brazil), a Project Manager at MGI Coutier (Spain), and a Quality Manager at IKF-Imerys (Spain). He earned a bachelor's degree in mechanical engineering from the University of Campina Grande (Brazil) and a master's degree in energy and sustainability from the University of Vigo (Spain). He has 10 years of experience in applying Lean Six Sigma to product and process development/improvement. To get in touch with Laerte, please follow him on Twitter @laertelima or on</em> <a href="http://www.linkedin.com/pub/laerte-lima/7/46b/443" target="_blank"><strong><em>LinkedIn</em></strong></a><em>.</em></p>
<p> </p>
<p style="font-size:11px;"><em>Photo of Stephen Curry by <a href="https://www.flickr.com/people/27003603@N00">Keith Allison</a>, used under Creative Commons 2.0. </em></p>
Fun Statistics, Statistics in the News
Fri, 13 May 2016 12:00:00 +0000
http://blog.minitab.com/blog/statistics-in-the-field/is-stephen-curry-the-best-nba-point-guard-ever-lets-check-the-data
Guest Blogger
What's a Moving Range, and How Is It Calculated?
http://blog.minitab.com/blog/marilyn-wheatleys-blog/whats-a-moving-range-and-how-is-it-calculated
<p>We often receive questions about moving ranges because they're used in various tools in our <a href="http://www.minitab.com/products/minitab">statistical software</a>, including control charts and capability analysis when data is not collected in subgroups. In this post, I'll explain what a moving range is, and how a moving range and average moving range are calculated.</p>
<p>A moving range measures how variation changes over time when data are collected as individual measurements rather than in subgroups.</p>
<p>If we collect individual measurements and need to plot the data on a control chart, or assess the capability of a process, we need a way to estimate the variation over time. But when we have individual observations, we cannot calculate the standard deviation for each subgroup. In such cases, the average moving range across all subgroups is an alternative way to estimate process variation.</p>
<p>Consider the 10 random data points plotted in the graph below:</p>
<p style="margin-left: 40px;"><img height="369" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/7b447a0adb4a6e3a23fee5a34ab07563/7b447a0adb4a6e3a23fee5a34ab07563.png" width="624" /></p>
<p>A moving range is the distance or difference between consecutive points. For example, MR1 in the graph below represents the first moving range, MR2 represents the second moving range, and so forth:</p>
<p style="margin-left: 40px;"><img height="414" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/041539e9131ddbfb6cae7517ec190ab8/041539e9131ddbfb6cae7517ec190ab8.png" width="624" /></p>
<p>The difference between the first and second points (MR1) is 0.704, and that’s a positive number since the first point has a lower value than the second. The second moving range, MR2, is the difference between the second point (21.0494) and the third (19.6375), and that’s a negative number (-1.4119), since the third point has a lower value than the second. If we continue that way, we’ll have 9 moving ranges for our 10 data points.</p>
<p>In Minitab, a moving range is easy to compute by "lagging" the data. Continuing the example with the 10 data points above, I can use <strong>Stat</strong> > <strong>Time Series</strong> > <strong>Lag</strong>, and then complete the dialog box as shown below:</p>
<p style="margin-left: 40px;"><img alt="a" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/2b125f53827fb9cc7aec8b2a300845a7/capture.PNG" style="width: 557px; height: 330px;" /></p>
<p>Clicking <strong>OK</strong> in the dialog above will shift the data in C1 down by one row and store the results in C4. Now we can use <strong>Calc</strong> > <strong>Calculator</strong> to subtract C4 from C1 and calculate all the moving ranges:</p>
<p style="margin-left: 40px;"><img alt="b" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/070834223bef3007c9621c940ff3a195/capture.PNG" style="width: 563px; height: 380px;" /></p>
<p>To calculate the average moving range, we need to use the absolute value of the moving ranges we calculated above. We’ll take a look at how to do that later. </p>
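<p>The lag-and-subtract steps can also be sketched outside Minitab. The Python below is only an illustration of the same arithmetic; the <code>lag</code> and <code>moving_ranges</code> helpers are hypothetical names. Only the second and third data points (21.0494 and 19.6375) are quoted in this post; the first value here is an assumption, back-calculated from the quoted MR1 of 0.704, and the remaining seven points are left out.</p>

```python
def lag(values, steps=1):
    """Shift a column down by `steps` rows, padding the top with None."""
    return [None] * steps + values[:len(values) - steps]


def moving_ranges(values):
    """Signed difference between each point and the point before it."""
    lagged = lag(values)
    return [current - previous
            for current, previous in zip(values, lagged)
            if previous is not None]


# First value is ASSUMED (21.0494 - 0.704); the other two are quoted in the post.
points = [20.3454, 21.0494, 19.6375]

for i, mr in enumerate(moving_ranges(points), start=1):
    print(f"MR{i} = {mr:.4f}, absolute value = {abs(mr):.4f}")
```

<p>With n data points this produces n - 1 moving ranges, which is why the 10 points in the example give 9 moving ranges.</p>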
<p>When Minitab calculates the average of a moving range, the calculation also includes an <a href="http://support.minitab.com/en-us/minitab/17/topic-library/quality-tools/capability-analyses/data-and-data-assumptions/unbiasing-constants/">unbiasing constant</a>. The formula used in this calculation is:</p>
<p style="margin-left: 40px;"><img alt="equation" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/a5a46a4ff68b1425bbd155792d20a701/a5a46a4ff68b1425bbd155792d20a701.png" style="border-width: 0px; border-style: solid; width: 624px; height: 140px;" /></p>
<p>The table of unbiasing constants is available within Minitab and <a href="http://support.minitab.com/en-us/minitab-express/1/help-and-how-to/control-charts/how-to/variables-data-in-subgroups/xbar-r-chart/methods-and-formulas/unbiasing-constants-d2-d3-and-d4/">on this page</a>.</p>
<p>We’ve already done most of the work. To finish, we’ll find the right value of d2 in the table linked above, and use Minitab’s calculator to get the answer. We need the value of d2 that corresponds to a moving range of length 2 (that’s the number of points in each moving range calculation, but don’t worry, I’ll explain more about the length of the moving range later):</p>
<p style="margin-left: 40px;"><img border="0" height="179" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/2caa9e4eec046f281a834976260d3f8c/2caa9e4eec046f281a834976260d3f8c.png" width="173" /></p>
<p>Now back to Minitab, and we can use <strong>Calc</strong> > <strong>Calculator</strong> to get our answer:</p>
<p style="margin-left: 40px;"><img alt="c" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/f3eaf58a9007d6420c44b559206206eb/capture.PNG" style="width: 604px; height: 386px;" /></p>
<p>Using the formula above, we’re telling Minitab to use the absolute values (ABS calculator command) in C5 to calculate the mean, and then divide that by our unbiasing constant value of 1.128.</p>
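<p>As a rough cross-check of the calculator step, the same computation can be written in a few lines of Python: average the absolute moving ranges, then divide by the unbiasing constant. The function names and the four data points are hypothetical; only the value 1.128 for a moving range of length 2 comes from the text above.</p>

```python
from statistics import mean

D2_LENGTH_2 = 1.128  # unbiasing constant d2 for a moving range of length 2


def average_moving_range(values):
    """Mean of the absolute differences between consecutive points."""
    return mean(abs(b - a) for a, b in zip(values, values[1:]))


def average_moving_range_over_d2(values, d2=D2_LENGTH_2):
    """Average moving range divided by the unbiasing constant."""
    return average_moving_range(values) / d2


# Hypothetical data (NOT the 10 points used in the post).
data = [10.0, 12.0, 11.0, 14.0]

print(f"Average moving range: {average_moving_range(data):.4f}")
print(f"Divided by d2:        {average_moving_range_over_d2(data):.4f}")
```

<p>For these four points the absolute moving ranges are 2, 1, and 3, so the average moving range is 2.0 before the constant is applied.</p>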
<p>Now to check our results against Minitab, we can use <strong>Stat </strong>> <strong>Control Charts</strong> > <strong>Variables Charts for Individuals</strong> > <strong>I-MR</strong> and enter our original data column:</p>
<p style="margin-left: 40px;"><img border="0" height="334" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/0c80b992ef94f8d021aa1ebfc5bbc594/0c80b992ef94f8d021aa1ebfc5bbc594.png" width="507" /></p>
<p>Next, choose <strong>I-MR Options</strong> > <strong>Storage</strong>, and check the box next to <strong>Standard deviations</strong>, then click <strong>OK</strong> in each dialog box:</p>
<p style="margin-left: 40px;"><img alt="d" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/c4b545c37882980e3f690ad046f63626/capture.PNG" style="width: 582px; height: 440px;" /></p>
<p>The stored standard deviation matches the estimate we calculated from the average moving range, <strong>0.602627</strong>. </p>
<p>In this case, because we used a moving range of length 2, the average moving range gives us an estimate of the average distance between our consecutive individual data points. A moving range of length 2 is Minitab’s default, but that can be changed by clicking the <strong>I-MR Options</strong> button in the I-MR chart dialog, and then choosing the <strong>Estimate</strong> tab:</p>
<p style="margin-left: 40px;"><img border="0" height="438" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/3e03c57905dc63ff5be0971285a4d518/3e03c57905dc63ff5be0971285a4d518.png" width="442" /></p>
<p>Here we can type in a different value (let’s use 3 as an example), and Minitab will use that number of points to estimate the moving ranges. If we did that for the calculations above, we’d have to make two adjustments:</p>
<ol>
<li>
<p>We’d need to choose the correct value for the unbiasing constant, d2, that corresponds with a moving range length of 3:</p>
<p><img alt="t" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/94764a32eec04329f8dfdd4d73219214/capture.PNG" style="width: 173px; height: 182px;" /></p>
</li>
<li>We’d have to adjust the number of points used for our moving ranges from 2 to 3. Using the same random data as before:</li>
</ol>
<p style="margin-left: 40px;"><img border="0" height="248" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/File/bf32b968b0788bc21e03920397ccefe4/bf32b968b0788bc21e03920397ccefe4.png" width="71" /></p>
<p style="margin-left: 40px;">With three data points, we’ll use just the highest and the lowest values from the first 3 rows, so MR1 will be 21.0494 – 19.6375 = 1.4119.</p>
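<p>The two adjustments can be sketched in Python as follows. This is an illustration under stated assumptions: only the highest (21.0494) and lowest (19.6375) of the first three values come from the post; the middle value, the remaining values, and the row order are hypothetical. The d2 value of 1.693 for a range of length 3 is the standard tabulated constant.</p>

```python
# Standard unbiasing constants d2, keyed by moving range length.
D2 = {2: 1.128, 3: 1.693}


def moving_ranges(data, length=2):
    """Range (max - min) of every run of `length` consecutive points."""
    windows = (data[i:i + length] for i in range(len(data) - length + 1))
    return [max(w) - min(w) for w in windows]


def sigma_estimate(data, length=2):
    """Average moving range divided by the matching d2 constant."""
    ranges = moving_ranges(data, length)
    return sum(ranges) / len(ranges) / D2[length]


# 19.6375 and 21.0494 are quoted in the post; every other value is hypothetical.
data = [19.6375, 20.1000, 21.0494, 20.3000, 19.9000]
ranges_of_three = moving_ranges(data, length=3)
print(f"MR1 = {ranges_of_three[0]:.4f}")
print(f"sigma estimate (length 3) = {sigma_estimate(data, length=3):.4f}")
```

<p>With a length of 3, five data points yield only three moving ranges, so a longer moving range always leaves fewer ranges to average.</p>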
<p><span style="line-height: 1.6;">If you’ve enjoyed this post, check out some of our other blog </span><a href="http://blog.minitab.com/blog/control-charts" style="line-height: 1.6;">posts about control charts</a><span style="line-height: 1.6;">.</span></p>
Fri, 29 Apr 2016 12:00:00 +0000
http://blog.minitab.com/blog/marilyn-wheatleys-blog/whats-a-moving-range-and-how-is-it-calculated
Marilyn Wheatley
Manipulating Your Survey Data in Minitab
http://blog.minitab.com/blog/statistics-and-quality/manipulating-your-survey-data-in-minitab
<p>As a recent graduate from Arizona State University with a degree in Business Statistics, I had the opportunity to work with students from different areas of study and help analyze data from various projects for them.</p>
<p><img alt="survey symbold" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/3b2a7f4c85707a09177d3da12dbaa009/online_survey_icon_or_logo_svg.png" style="margin: 10px 15px; float: right; width: 300px; height: 300px;" />One particular group asked for help analyzing online survey data they had gathered from other students, and they wanted to see if their new student program was beneficial. I would describe this request as them giving us a "pile of data" and saying, "Tell us what you can find out." </p>
<p>There were numerous problems with this "pile of data" because it wasn't organized, in part because of the way the survey itself was set up. (Our statistics professor later told us that she asked this group to come in because she'd looked at their data before they presented it to us and she wanted to see how we would perform with a "real-world" situation.)</p>
<p>Unfortunately, the statistics department didn't have a time machine that would enable us to go back and set up the survey to have better data that was more organized (I guess if we <em>did </em>have a time machine there would be no need for predictive analytics), but we did have <a href="http://www.minitab.com/products/minitab/">Minitab and its tools</a> to help with the importing of data, reviewing the data, and putting it in a format that is best for analyzing. </p>
<p>So let’s assume you have a pile of survey data that is:</p>
<ul>
<li>Unbiased</li>
<li>Taken from a random sample</li>
<li>Taken from the appropriate audience</li>
<li>Drawn from enough respondents</li>
</ul>
<p><span style="line-height: 1.6;">Many online survey tools allow you to download your data to a .csv or Excel file, which would be perfect to <span>import into Minitab</span>. </span></p>
<p><span style="line-height: 1.6;">In fact, Minitab 17.3 added a dialog box that previews the data before you open it, so you can modify the data type, include or exclude certain columns, and see how many rows the data contain. The options in that same dialog box let you choose how missing data points and missing data rows are handled. These functions let you bring a "pile of data" into Minitab cleaner and with less headache.</span></p>
<p style="margin-left: 40px;"><img alt="open survey data dialog" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/b51a0c86-e2dd-456e-878a-4196c7381c3a/File/c5319276614d905f12f38eca2f3a6343/c5319276614d905f12f38eca2f3a6343.png" style="width: 669px; height: 570px;" /> </p>
<p><span style="line-height: 1.6;">Once the data is in Minitab, reviewing it is essential to uncover any irregularities that may be hiding in the data before analysis. Within the Project Manager Bar, the information icon shows each column's name, ID, row count, number of missing data points, and data type. This lets you quickly scan the columns and confirm that the online data came in correctly by checking the row counts, missing values, and data types. </span></p>
<p style="margin-left: 40px;"><img alt="data" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/b51a0c86-e2dd-456e-878a-4196c7381c3a/File/637ee7794419e3ad489f4a98c96cbc3c/637ee7794419e3ad489f4a98c96cbc3c.png" style="width: 396px; height: 342px;" /></p>
<p>Minitab also has numerous tools to format the data before analysis, including <a href="http://blog.minitab.com/blog/understanding-statistics/what-to-do-when-your-datas-a-mess2c-part-3">coding, sorting and splitting worksheets</a>. </p>
<p>For example, occasionally survey data will use “0” in the place of a non-response. This can be a problem because any data analysis will make this a data point when it probably shouldn't be. Minitab can find those “0”s and replace them with missing data to remove them from your worksheet so they won't throw off your analysis (<strong>Editor > Find and Replace > Replace</strong>).</p>
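<p>To see why this matters, here is a small Python sketch of the same idea outside Minitab. The survey responses are hypothetical, and <code>None</code> stands in for a missing value; the point is only to show how much a "0" non-response code can drag down a simple average.</p>

```python
from statistics import mean

# Hypothetical 1-5 survey responses where 0 was recorded for "no answer".
responses = [4, 0, 5, 3, 0, 2, 4]

# Recode 0 as missing (None), then analyze only the real answers.
cleaned = [None if r == 0 else r for r in responses]
answered = [r for r in cleaned if r is not None]

mean_with_zeros = mean(responses)
mean_without_zeros = mean(answered)
print(f"mean treating 0 as data:    {mean_with_zeros:.2f}")
print(f"mean treating 0 as missing: {mean_without_zeros:.2f}")
```

<p>Here two non-responses out of seven rows are enough to pull the average well below every typical answer, which is the distortion the find-and-replace step avoids.</p>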
<p>Before analysis you can also sort your data (<strong>Data > Sort</strong>) by the column of your choice, and you can create a new worksheet from the sorted data. I also really like the Split and Subset Worksheet options in the event you have a lot of data and it would be easier to look at smaller sections of it for analysis (<strong>Data > Split Worksheet</strong> and <strong>Data > Subset Worksheet</strong>)<strong>.</strong></p>
<p>These are just a few tools that allow you to import data and then prepare the data without having to go back and forth between your spreadsheet software and statistical software. So when you have someone drop off a "pile of data," see how you can use your Minitab tools to shovel through and find the gems that are lying beneath the surface.</p>
Data Analysis, Statistics
Tue, 26 Apr 2016 12:00:00 +0000
http://blog.minitab.com/blog/statistics-and-quality/manipulating-your-survey-data-in-minitab
Joseph Hartsock
3 Tips for Importing Excel Data into Minitab
http://blog.minitab.com/blog/michelle-paret/3-tips-for-importing-excel-data-into-minitab
<p>Getting your data from Excel into <a href="http://www.minitab.com/products/minitab/">Minitab Statistical Software</a> for analysis is easy, especially if you keep the following tips in mind.</p>
Copy and Paste
<p><span style="line-height: 20.8px;">To paste into Minitab, you can either right-click in the worksheet and choose </span><strong style="line-height: 20.8px;">Paste Cells</strong><span style="line-height: 20.8px;"> or you can use </span><strong style="line-height: 20.8px;">Control-V</strong><span style="line-height: 20.8px;">. </span>Minitab allows for 1 row of column headers, so if you have a single row of column info (or no column header info), then you can quickly copy and paste an entire sheet at once. However, if you have multiple rows of descriptive text at the top of your Excel file, then use the following steps:</p>
<p><em> Step 1</em> - Choose a single row for your column headers and paste it into Minitab. </p>
<p><em> Step 2</em> - Go back to your Excel file to copy all of the actual data over.</p>
<p>And if you have any summary info at the end of your Excel file, you'll want to exclude that too, just like any extraneous column header info.</p>
<p><img alt="Excel to Minitab" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/951006fe8ebf8bfde86486660018fbe0/excel_to_mtb.jpg" style="width: 650px; height: 379px;" /></p>
Importing Lots of Data
<p><img alt="File Open dialog" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/75e6b833214b1e9cbda4e6056a2fde43/file_open_menu.jpg" style="line-height: 20.8px; width: 253px; height: 359px; margin: 10px 15px; float: right;" /></p>
<p>Copy/paste is ideal when you have only a few Excel sheets. But what if you have lots of <span style="line-height: 1.6;">sheets? In this case, try using </span><strong style="line-height: 1.6;">File > Open</strong><span style="line-height: 1.6;">. Another advantage of </span><strong style="line-height: 1.6;">File > Open</strong><span style="line-height: 1.6;"> is the additional import options, should you need them. For example, you can specify which sheets </span><span style="line-height: 1.6;">and rows to include. And there are even options to handle messy data issues, such as case mismatches and </span><a href="http://blog.minitab.com/blog/michelle-paret/how-to-remove-leading-or-trailing-spaces-from-a-data-set" style="line-height: 1.6;">leading and trailing spaces</a><span style="line-height: 1.6;">.</span></p>
<div>
Fixing Column Formats
<p>Minitab has 3 column formats: numeric, text, and date/time. Text columns are noted with a <strong>-T</strong> and date/time columns are noted with a <strong>-D</strong>, while numeric columns appear without such an indicator. Why does column format matter? It matters because certain graphs and analyses are only available for certain formats. For example, if you want to create a time series plot, Minitab will not allow you to use a text column. If you bring data over from Excel and the format does not reflect the type of data in a given column, just right-click in the column and choose <strong>Format Column</strong> to select the right type, such as <strong>Automatic numeric</strong>.</p>
<p><img alt="column formats" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/350de8d0fc91e01d485bc1f124a28148/column_format.jpg" style="width: 645px; height: 332px;" /></p>
<p><span style="line-height: 1.6;">Once you import your data and it's properly formatted, you can then use the </span><strong style="line-height: 1.6;">Stat</strong><span style="line-height: 1.6;">, </span><strong style="line-height: 1.6;">Graph</strong><span style="line-height: 1.6;">, and </span><strong style="line-height: 1.6;">Assistant</strong><span style="line-height: 1.6;"> menus to start analyzing it. And if you need help running a particular analysis, just </span><a href="http://www.minitab.com/contact-us" style="line-height: 1.6;">contact Minitab Technical Support</a><span style="line-height: 1.6;">. This outstanding service is free and is staffed with statisticians, so don't hesitate to give them a call.</span></p>
</div>
Data Analysis
Fri, 22 Apr 2016 12:00:00 +0000
http://blog.minitab.com/blog/michelle-paret/3-tips-for-importing-excel-data-into-minitab
Michelle Paret
Understanding t-Tests: t-values and t-distributions
http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests-t-values-and-t-distributions
<p>T-tests are handy <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-why-we-need-to-use-hypothesis-tests-in-statistics" target="_blank">hypothesis tests</a> in statistics when you want to compare means. You can compare a sample mean to a hypothesized or target value using a one-sample t-test. You can compare the means of two groups with a two-sample t-test. If you have two groups with paired observations (e.g., before and after measurements), use the paired t-test.</p>
<img alt="Output that shows a t-value" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/efd51d69e3947d70197143b735e0c51d/t_value_swo.png" style="line-height: 20.8px; float: right; width: 400px; height: 57px; margin: 10px 15px; border-width: 1px; border-style: solid;" />
<p>How do t-tests work? How do t-values fit in? In this series of posts, I’ll answer these questions by focusing on concepts and graphs rather than equations and numbers. After all, a key reason to use <a href="http://www.minitab.com/products/minitab">statistical software like </a><a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">Minitab</a> is so you don’t get bogged down in the calculations and can instead focus on understanding your results.</p>
<p>In this post, I will explain t-values, t-distributions, and how t-tests use them to calculate probabilities and assess hypotheses.</p>
What Are t-Values?
<p>T-tests are called t-tests because the test results are all based on t-values. T-values are an example of what statisticians call test statistics. A test statistic is a standardized value that is calculated from sample data during a hypothesis test. The procedure that calculates the test statistic compares your data to what is expected under the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/basics/null-and-alternative-hypotheses/" target="_blank">null hypothesis</a>.</p>
<p>Each type of t-test uses a specific procedure to boil all of your sample data down to one value, the t-value. The calculations behind t-values compare your sample mean(s) to the null hypothesis and incorporate both the sample size and the variability in the data. A t-value of 0 indicates that the sample results exactly equal the null hypothesis. As the difference between the sample data and the null hypothesis increases, the absolute value of the t-value increases.</p>
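<p>Although this post deliberately stays away from equations, a short Python sketch may help show how the three ingredients combine. It uses the standard one-sample form, t = (sample mean - hypothesized mean) / (sample standard deviation / sqrt(n)); the sample values and the function name are hypothetical.</p>

```python
import math
from statistics import mean, stdev


def one_sample_t_value(sample, hypothesized_mean):
    """t = (sample mean - hypothesized mean) / standard error of the mean."""
    standard_error = stdev(sample) / math.sqrt(len(sample))
    return (mean(sample) - hypothesized_mean) / standard_error


# Hypothetical measurements; the null hypothesis says the mean is 5.0.
sample = [5.1, 4.9, 5.6, 5.3, 5.0, 5.4, 5.2, 5.5]

t_observed = one_sample_t_value(sample, 5.0)
t_at_null = one_sample_t_value(sample, mean(sample))
t_more_data = one_sample_t_value(sample * 2, 5.0)  # same values, twice the sample size

print(f"t against 5.0:             {t_observed:.3f}")
print(f"t when mean equals null:   {t_at_null:.3f}")
print(f"t with twice as much data: {t_more_data:.3f}")
```

<p>The second call shows the t-value of 0 when the sample mean equals the hypothesized value, and the third shows that the same difference yields a larger t-value when it is backed by more data.</p>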
<p>Assume that we perform a t-test and it calculates a t-value of 2 for our sample data. What does that even mean? I might as well have told you that our data equal 2 fizbins! We don’t know if that’s common or rare when the null hypothesis is true.</p>
<p>By itself, a t-value of 2 doesn’t really tell us anything. T-values are not in the units of the original data, or anything else we’d be familiar with. We need a larger context in which we can place individual t-values before we can interpret them. This is where t-distributions come in.</p>
What Are t-Distributions?
<p>When you perform a t-test for a single study, you obtain a single t-value. However, if we drew multiple random samples of the same size from the same population and performed the same t-test, we would obtain many t-values and we could plot a distribution of all of them. This type of distribution is known as a <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/introductory-concepts/basic-concepts/sampling-distribution/" target="_blank">sampling distribution</a>.</p>
<p>Fortunately, the properties of t-distributions are well understood in statistics, so we can plot them without having to collect many samples! A specific t-distribution is defined by its <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/introductory-concepts/basic-concepts/df/" target="_blank">degrees of freedom (DF)</a>, a value closely related to sample size. Therefore, different t-distributions exist for every sample size. <span style="line-height: 20.8px;">You can graph t-distributions u</span><span style="line-height: 1.6;">sing Minitab’s </span><a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/graphs/graphs-of-distributions/probability-distribution-plots/probability-distribution-plot/" style="line-height: 1.6;" target="_blank">probability distribution plots</a><span style="line-height: 1.6;">.</span></p>
<p>T-distributions assume that you draw repeated random samples from a population where the null hypothesis is true. You place the t-value from your study in the t-distribution to determine how consistent your results are with the null hypothesis.</p>
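<p>The repeated-sampling idea can be imitated with a small simulation. The Python sketch below is only an illustration under assumed settings: a normal population with mean 0 and standard deviation 1, a null hypothesis that is true, samples of size 21 (which gives 20 degrees of freedom), and 10,000 repetitions. None of these choices come from Minitab.</p>

```python
import math
import random


def one_sample_t(sample, mu0):
    """One-sample t-value computed from scratch."""
    n = len(sample)
    m = sum(sample) / n
    s2 = sum((x - m) ** 2 for x in sample) / (n - 1)
    return (m - mu0) / math.sqrt(s2 / n)


def simulate_t_values(n=21, reps=10_000, mu0=0.0, seed=42):
    """Draw `reps` samples from a population where the null is true."""
    rng = random.Random(seed)
    return [
        one_sample_t([rng.gauss(mu0, 1.0) for _ in range(n)], mu0)
        for _ in range(reps)
    ]


t_values = simulate_t_values()
average_t = sum(t_values) / len(t_values)
share_extreme = sum(abs(t) > 2 for t in t_values) / len(t_values)

print(f"average t-value: {average_t:.3f}")
print(f"share of t-values beyond -2 or +2: {share_extreme:.3f}")
```

<p>The simulated t-values pile up around zero, and only a small share fall beyond -2 or +2, which is the pattern a t-distribution describes without requiring any actual resampling.</p>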
<p style="margin-left: 40px;"><img alt="Plot of t-distribution" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/d628e56f0380e0edcf575502a670ed31/t_dist_20_df.png" style="width: 576px; height: 384px;" /></p>
<p>The graph above shows a t-distribution that has 20 degrees of freedom, which corresponds to a sample size of 21 in a one-sample t-test. It is a symmetric, bell-shaped distribution that is similar to the normal distribution, but with thicker tails. This graph plots the probability density function (PDF), which describes the likelihood of each t-value.</p>
<p>The peak of the graph is right at zero, which indicates that obtaining a sample value close to the null hypothesis is the most likely. That makes sense because t-distributions assume that the null hypothesis is true. T-values become less likely as you get further away from zero in either direction. In other words, when the null hypothesis is true, you are less likely to obtain a sample that is very different from the null hypothesis.</p>
<p>Our t-value of 2 indicates a positive difference between our sample data and the null hypothesis. The graph shows that there is a reasonable probability of obtaining a t-value from -2 to +2 when the null hypothesis is true. Our t-value of 2 is an unusual value, but we don’t know exactly <em>how </em>unusual. Our ultimate goal is to determine whether our t-value is unusual enough to warrant rejecting the null hypothesis. To do that, we'll need to calculate the probability.</p>
Using t-Values and t-Distributions to Calculate Probabilities
<p>The foundation behind any hypothesis test is being able to take the test statistic from a specific sample and place it within the context of a known probability distribution. For t-tests, if you take a t-value and place it in the context of the correct t-distribution, you can calculate the probabilities associated with that t-value.</p>
<p>A probability allows us to determine how common or rare our t-value is under the assumption that the null hypothesis is true. If the probability is low enough, we can conclude that the effect observed in our sample is inconsistent with the null hypothesis. The evidence in the sample data is strong enough to reject the null hypothesis for the entire population.</p>
<p>Before we calculate the probability associated with our t-value of 2, there are two important details to address.</p>
<p>First, we’ll actually use the t-values of +2 and -2 because we’ll perform a two-tailed test. A two-tailed test is one that can test for differences in both directions. For example, a two-tailed 2-sample t-test can determine whether the difference between group 1 and group 2 is statistically significant in either the positive or negative direction. A one-tailed test can only assess one of those directions.</p>
<p>Second, we can only calculate a non-zero probability for a range of t-values. As you’ll see in the graph below, a range of t-values corresponds to a proportion of the total area under the distribution curve, which is the probability. The probability for any specific point value is zero because it does not produce an area under the curve.</p>
<p>With these points in mind, we’ll shade the area of the curve that has t-values greater than 2 and t-values less than -2.</p>
<p><img alt="T-distribution with a shaded area that represents a probability" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/5e124a2c8139681afec706799ebabcec/t_dist_prob.png" style="width: 576px; height: 384px;" /></p>
<p>The graph displays the probability for observing a difference from the null hypothesis that is at least as extreme as the difference present in our sample data while assuming that the null hypothesis is actually true. Each of the shaded regions has a probability of 0.02963, which sums to a total probability of 0.05926. When the null hypothesis is true, the t-value falls within these regions nearly 6% of the time.</p>
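<p>As a rough cross-check of the shaded area, the Python sketch below integrates the t-distribution's density numerically with Simpson's rule, using only the standard library. It is an approximation written for illustration, not how Minitab computes the probability; the function names and the number of integration steps are arbitrary choices.</p>

```python
import math


def t_pdf(x, df):
    """Probability density function of the t-distribution with `df` DF."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t_value, df, steps=2000):
    """Area beyond -|t| and +|t|, via Simpson's rule (steps must be even)."""
    upper = abs(t_value)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area_zero_to_t = total * h / 3
    # The curve is symmetric, so the two tails are what remains of the total area of 1.
    return 1 - 2 * area_zero_to_t


p_value = two_tailed_p(2, 20)
print(f"one tail:  {p_value / 2:.5f}")
print(f"two tails: {p_value:.5f}")
```

<p>For a t-value of 2 and 20 degrees of freedom, the result should land very close to the 0.05926 quoted above. The same function can be called with other degrees of freedom to see how the probability changes.</p>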
<p>This probability has a name that you might have heard of—it’s called the p-value! While the probability of our t-value falling within these regions is fairly low, it’s not low enough to reject the null hypothesis using the common <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-hypothesis-tests%3A-significance-levels-alpha-and-p-values-in-statistics" target="_blank">significance level</a> of 0.05.</p>
<p><a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values" target="_blank">Learn how to correctly interpret the p-value.</a></p>
t-Distributions and Sample Size
<p>As mentioned above, t-distributions are defined by the DF, which are closely associated with sample size. As the DF increases, the probability density in the tails decreases and the distribution becomes more tightly clustered around the central value. The graph below depicts t-distributions with 5 and 30 degrees of freedom.</p>
<p><img alt="Comparison of t-distributions with different degrees of freedom" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/5220dc6347611a230e89b70de904b034/t_dist_comp_df.png" style="width: 576px; height: 384px;" /></p>
<p>The t-distribution with fewer degrees of freedom has thicker tails. This occurs because the t-distribution is designed to reflect the added uncertainty associated with analyzing small samples. In other words, if you have a small sample, the probability that the sample statistic will be further away from the null hypothesis is greater even when the null hypothesis is true.</p>
<p>Small samples are more likely to be unusual. This affects the probability associated with any given t-value. For 5 and 30 degrees of freedom, a t-value of 2 in a two-tailed test has p-values of about 10.2% and 5.5%, respectively. Large samples are better!</p>
<p>I’ve shown how t-values and t-distributions work together to produce probabilities. To see how each type of t-test works and actually calculates the t-values, read the other post in this series, <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests:-1-sample,-2-sample,-and-paired-t-tests">Understanding t-Tests: 1-sample, 2-sample, and Paired t-Tests</a>.</p>
<p>If you'd like to learn how the ANOVA F-test works, read my post, <a href="http://blog.minitab.com/blog/adventures-in-statistics/understanding-analysis-of-variance-anova-and-the-f-test">Understanding Analysis of Variance (ANOVA) and the F-test</a>.</p>
Data Analysis, Hypothesis Testing, Learning, Statistics Help
Wed, 20 Apr 2016 12:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics/understanding-t-tests-t-values-and-t-distributions
Jim Frost
Best Way to Analyze Likert Item Data: Two Sample T-Test versus Mann-Whitney
http://blog.minitab.com/blog/adventures-in-statistics/best-way-to-analyze-likert-item-data%3A-two-sample-t-test-versus-mann-whitney
<p><img alt="Worksheet that shows Likert data" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/6b1cf78b969699ed58febb026d32051d/likert_worksheet.png" style="float: right; width: 162px; height: 265px; margin: 10px 15px;" />Five-point Likert scales are commonly associated with surveys and are used in a wide variety of settings. You’ve run into the Likert scale if you’ve ever been asked whether you strongly agree, agree, neither agree nor disagree, disagree, or strongly disagree about something. The worksheet to the right shows what five-point Likert data look like when you have two groups.</p>
<p>Because Likert item data are discrete, ordinal, and have a limited range, there’s been a longstanding dispute about the most valid way to analyze Likert data. The basic choice is between <a href="http://blog.minitab.com/blog/adventures-in-statistics/choosing-between-a-nonparametric-test-and-a-parametric-test" target="_blank">a parametric test and a nonparametric test</a>. The pros and cons for each type of test are generally described as the following:</p>
<ul>
<li>Parametric tests, such as the 2-sample t-test, assume a normal, continuous distribution. However, with a sufficient sample size, t-tests are robust to departures from normality.</li>
<li>Nonparametric tests, such as the Mann-Whitney test, do not assume a normal or a continuous distribution. However, there are concerns about a lower ability to detect a difference when one truly exists.</li>
</ul>
<p>What’s the better choice? This is a real-world decision that users of <a href="http://www.minitab.com/en-us/products/minitab/" target="_blank">statistical software</a> have to make when they want to analyze Likert data.</p>
<p>Over the years, a number of studies have tried to answer this question. However, they’ve tended to look at a limited number of potential distributions for the Likert data, which causes the generalizability of the results to suffer. Thanks to increases in computing power, simulation studies can now thoroughly assess a wide range of distributions.</p>
<p>In this blog post, I highlight a simulation study conducted by de Winter and Dodou* that compares the capabilities of the two-sample t-test and the Mann-Whitney test to analyze five-point Likert items for two groups. Is it better to use one analysis or the other?</p>
<p>The researchers identified a diverse set of 14 distributions that are representative of actual Likert data. The computer program drew independent pairs of samples to test all possible combinations of the 14 distributions. All in all, 10,000 random samples were generated for each of the 98 distribution combinations! The pairs of samples were analyzed using both the two-sample t-test and the Mann-Whitney test to compare how well each test performs. The study also assessed different sample sizes.</p>
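<p>To give a feel for what such a simulation involves, here is a much smaller Python sketch. It is not the researchers' program: the single response distribution, the 2,000 repetitions, the pooled-variance form of the t-test, and the normal approximation (with tie correction) for the Mann-Whitney test are all assumptions made for illustration. Both groups are drawn from the same distribution, so every rejection is a false positive.</p>

```python
import math
import random
from statistics import NormalDist

LIKERT = [1, 2, 3, 4, 5]
WEIGHTS = [0.10, 0.20, 0.40, 0.20, 0.10]  # hypothetical, not one of the study's 14
N_PER_GROUP = 30
T_CRIT_DF58 = 2.0017  # approximate two-tailed 5% critical value for 58 DF (t table)
Z_CRIT = NormalDist().inv_cdf(0.975)


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (an assumed variant)."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss = sum((x - m1) ** 2 for x in a) + sum((x - m2) ** 2 for x in b)
    sp2 = ss / (n1 + n2 - 2)
    return 0.0 if sp2 == 0 else (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def mann_whitney_z(a, b):
    """Mann-Whitney U as a z-score, using midranks and a tie correction."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = sorted(a + b)
    rank_of, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    u1 = sum(rank_of[x] for x in a) - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    return 0.0 if var_u == 0 else (u1 - n1 * n2 / 2) / math.sqrt(var_u)


def type_i_error_rates(reps=2000, seed=1):
    """Share of null-true sample pairs that each test declares significant."""
    rng = random.Random(seed)
    t_rejections = mw_rejections = 0
    for _ in range(reps):
        a = rng.choices(LIKERT, WEIGHTS, k=N_PER_GROUP)
        b = rng.choices(LIKERT, WEIGHTS, k=N_PER_GROUP)
        t_rejections += abs(pooled_t(a, b)) > T_CRIT_DF58
        mw_rejections += abs(mann_whitney_z(a, b)) > Z_CRIT
    return t_rejections / reps, mw_rejections / reps


t_rate, mw_rate = type_i_error_rates()
print(f"t-test false positive rate:       {t_rate:.3f}")
print(f"Mann-Whitney false positive rate: {mw_rate:.3f}")
```

<p>A full study repeats this for many pairs of distributions and several sample sizes, and also draws the two groups from different distributions to compare statistical power.</p>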
<p>The results show that for all pairs of distributions the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/hypothesis-tests/basics/type-i-and-type-ii-error/" target="_blank">Type I (false positive) error rates</a> are very close to the target amounts. In other words, if you use either analysis and your results are statistically significant, you don’t need to be overly concerned about a false positive.</p>
<p>The results also show that for most pairs of distributions, the difference between the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/power-and-sample-size/what-is-power/" target="_blank">statistical power</a> of the two tests is trivial. In other words, if a difference truly exists at the population level, either analysis is equally likely to detect it. The concerns about the Mann-Whitney test having less power in this context appear to be unfounded.</p>
<p>I do have one caveat. There are a few pairs of specific distributions where there is a power difference between the two tests. If you perform both tests on the same data and they disagree (one is significant and the other is not), you can look at a table in the article to help you determine whether a difference in statistical power might be an issue. This power difference affects only a small minority of the cases.</p>
<p>Generally speaking, the choice between the two analyses is a tie. If you need to compare two groups of five-point Likert data, it usually doesn’t matter which analysis you use. Both tests almost always provide the same protection against false negatives and always provide the same protection against false positives. These patterns hold true for sample sizes of 10, 30, and 200 per group.</p>
<p>*de Winter, J.C.F. and D. Dodou (2010), Five-Point Likert Items: t test versus Mann-Whitney-Wilcoxon, <em>Practical Assessment, Research and Evaluation</em>, 15(11).</p>
Data Analysis, Hypothesis Testing, Statistics, Statistics Help
Wed, 06 Apr 2016 12:00:00 +0000
http://blog.minitab.com/blog/adventures-in-statistics/best-way-to-analyze-likert-item-data%3A-two-sample-t-test-versus-mann-whitney
Jim Frost
Are You Putting the Data Cart Before the Horse? Best Practices for Prepping Data for Analysis, ...
http://blog.minitab.com/blog/meredith-griffith/are-you-putting-the-data-cart-before-the-horse-best-practices-for-prepping-data-for-analysis%2C-part-1
<p>Most of us have heard a backwards way of completing a task, or doing something in the conventionally wrong order, described as “putting the cart before the horse.” That’s because a horse pulling a cart is much more efficient than a horse pushing a cart.</p>
<p><img alt="cart before horse" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ec1fbea4785510ea0e0a9997c1669c68/cart_horse.png" style="margin: 10px 15px; float: right; width: 350px; height: 206px;" />This saying may be especially true in the world of statistics. Focusing on a statistical tool or analysis before checking out the condition of your data is one way you may be putting the cart before the horse. You may then find yourself trying to force your data to fit an analysis, particularly when the data has not been set up properly. It’s far more efficient to first make sure your <a href="http://blog.minitab.com/blog/understanding-statistics/the-single-most-important-question-in-every-statistical-analysis">data are reliable</a> and then allow your questions of interest to guide you to the right analysis.</p>
<p>Spending a little quality time with your data up front can save you from wasting a lot of time on an analysis that either can’t work—or can’t be trusted.</p>
<p>As a quality practitioner, you’re likely to be involved in many activities—establishing quality requirements for external suppliers, monitoring product quality, reviewing product specifications and ensuring they are met, improving process efficiency, and much more.</p>
<p>All of these tasks will involve data collection and statistical analysis with software such as Minitab. For example, suppose you need to perform a <a href="http://blog.minitab.com/blog/meredith-griffith/fundamentals-of-gage-rr">Gage R&R</a> study to verify your measurement systems are valid, or you need to understand how machine failures impact downtime.</p>
<p>Rather than jumping right into the analysis, you will be at an advantage if you take time to look at your data. Ask yourself questions such as:</p>
<ul>
<li>What problem am I trying to solve?</li>
<li>Is my data set up in a way that will be useful for answering my question?</li>
<li>Did I make any mistakes while recording my data?</li>
</ul>
<p>Utilizing process knowledge can also help you answer questions about your data and identify data entry errors. A focus on preparing and exploring your data prior to an analysis will not only save you time in the long run, but will also help you obtain reliable results.</p>
<p>So then, where to begin with best practices for prepping data for an analysis? Let’s look no further than your data.</p>
Clean your data before you analyze it
<p>Let’s assume you already know what problem you’re trying to solve with your data. For instance, you are the area supervisor of a manufacturing facility, and you’ve been experiencing lower productivity than usual on the machines in your area and want to understand why. You have collected data on these machines, recording the amount of time a machine was out of operation, the reason for the machine being down, the shift number when the machine went down, and the speed of the machine when it went down.</p>
<p>The first step toward answering your question is to ensure your data are clean. Cleaning your data before you begin an analysis can save time by preventing rework, such as reformatting data or correcting data entry errors, after you’ve already begun the analysis. Data cleaning is also essential to ensure your analyses and results—and the decisions you make—are reliable.</p>
<p>With the <a href="https://www.minitab.com/en-us/support/minitab/minitab-17.3.1-update/" style="line-height: 20.8px;">latest update to Minitab 17</a><span style="line-height: 20.8px;">, an improved data import helps you identify and correct case mismatches, fix improperly formatted columns, represent missing data accurately and in a manner that is recognized by the software, remove blank rows and extra spaces, and more. When importing your data, you see a preview of your data as a reminder to ensure it’s in the best possible state before it finds its way into Minitab. This preview helps you spot mistakes you have made in your data collection, and automatically corrects mistakes you don’t notice or that are difficult to find in large data sets.</span></p>
<p><img alt="Data Import" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/dae6c7b7-fc22-4616-9d65-f04909c20ab1/Image/b1c679056c60ac2fa82f37e1f1de406b/data_import.jpg" style="width: 775px; height: 655px;" /></p>
<p><em>Minitab offers a data import dialog that helps you quickly clean and format your data before importing into the software, ensuring your data are trustworthy and allowing you to get to your analysis sooner.</em></p>
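<p>To make the kinds of fixes listed above concrete, here is a minimal Python sketch of the same cleanup steps: trimming extra spaces, dropping blank rows, unifying case mismatches, and marking missing values consistently. It is not how Minitab's import works internally; the column names (<code>Reason</code>, <code>Shift</code>, <code>Downtime</code>), the sample values, and the list of "missing" spellings are all assumptions made for illustration.</p>

```python
import csv
import io

# Hypothetical raw export; column names and values are invented for illustration.
RAW = """Reason,Shift,Downtime
Jam,1,12.5
jam ,2, 8
,,
 JAM,1,N/A
Power,3,30
"""

MISSING_TOKENS = {"", "n/a", "na", "null", "-"}  # assumed spellings of "missing"


def clean_rows(text):
    """Return cleaned rows as dicts; a sketch, not Minitab's import logic."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        # Remove extra spaces around every value.
        row = {key: (value or "").strip() for key, value in row.items()}
        # Skip rows that are entirely blank.
        if not any(row.values()):
            continue
        # Represent missing data consistently as None.
        row = {k: (None if v.lower() in MISSING_TOKENS else v) for k, v in row.items()}
        # Fix case mismatches so "Jam", "jam", and "JAM" are one category.
        if row["Reason"] is not None:
            row["Reason"] = row["Reason"].capitalize()
        # Convert the numeric column from text to float.
        if row["Downtime"] is not None:
            row["Downtime"] = float(row["Downtime"])
        cleaned.append(row)
    return cleaned


rows = clean_rows(RAW)
for r in rows:
    print(r)
```

<p>In this sample, the three spellings of the downtime reason collapse into a single category, the blank row disappears, and the <code>N/A</code> entry becomes a true missing value instead of a stray piece of text in a numeric column.</p>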
<p><span style="line-height: 20.8px;">If you’d rather copy and paste your data from Excel, Minitab will ensure you paste your data in the right place. For instance, if your data have column names and you accidentally paste them into the first data row of the worksheet instead of the column-name row, every column will be formatted as text, even when the values below your column names are numeric! With </span><a href="https://www.minitab.com/en-us/products/minitab/whats-new/" style="line-height: 20.8px;">Minitab 17.3</a><span style="line-height: 20.8px;">, you will receive an alert that your data are in the wrong place, and Minitab will automatically move them where they belong. This alert ensures your data are formatted properly, so you don’t run into the problem during an analysis and then have to correct every improperly formatted column by hand.</span></p>
<p><img alt="Copy Paste Warning" src="http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/dae6c7b7-fc22-4616-9d65-f04909c20ab1/Image/5df941ffaa491a0072261aef075a19d6/copy_paste_warning.jpg" style="width: 431px; height: 299px;" /></p>
<p><em>Pasting your Excel data in the first row of a Minitab worksheet will trigger this warning, which safeguards against improperly formatted columns.</em></p>
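<p>The paste problem described above can be sketched in a few lines of Python. The heuristic here is an assumption chosen for illustration, not Minitab's actual rule: the first pasted row is treated as column names when none of its cells parse as numbers and at least one column beneath it is entirely numeric. The function name <code>promote_header</code> and the sample values are hypothetical.</p>

```python
def promote_header(rows):
    """Split pasted rows into (names, data) when row 0 looks like a header.

    Heuristic (an assumption, not Minitab's rule): the first row is treated
    as column names if none of its cells parse as numbers while at least one
    column below it is entirely numeric.
    """
    def is_number(cell):
        try:
            float(cell)
            return True
        except ValueError:
            return False

    first, rest = rows[0], rows[1:]
    first_is_text = not any(is_number(c) for c in first)
    # For each column, check whether every value below the first row is numeric.
    numeric_cols = [all(is_number(r[i]) for r in rest) for i in range(len(first))]
    if not (rest and first_is_text and any(numeric_cols)):
        return None, rows  # nothing to fix; leave the rows untouched
    # Convert numeric columns to floats and keep text columns as text.
    data = [
        [float(c) if numeric_cols[i] else c for i, c in enumerate(r)]
        for r in rest
    ]
    return first, data


# Hypothetical pasted block: the header landed in the first data row.
pasted = [
    ["Machine", "Speed", "Downtime"],
    ["A", "120", "12.5"],
    ["B", "95", "8"],
]
names, data = promote_header(pasted)
print(names)
print(data)
```

<p>Once the header row is pulled out, the speed and downtime columns can be stored as numbers rather than text, which is the outcome the alert is meant to protect.</p>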
<p><span style="line-height: 1.6;">This is only the beginning! Minitab makes it quick and painless to begin exploring and visualizing your data, so you gain more insight and have an easier time once you get to the analysis. If you’d like to learn additional best practices for prepping your data for any analysis, stay tuned for my next post, where I’ll offer tips for exploring and drawing insights from your data!</span></p>
Data Analysis, Statistics
Wed, 30 Mar 2016 14:05:04 +0000
http://blog.minitab.com/blog/meredith-griffith/are-you-putting-the-data-cart-before-the-horse-best-practices-for-prepping-data-for-analysis%2C-part-1
Meredith Griffith