Minitab  Minitab
Blog posts and articles about using Minitab software in quality improvement projects, research, and more.
http://blog.minitab.com/blog/minitab/rss
Tue, 28 Jun 2016 16:42:33 +0000
FeedCreator 1.7.3

Using the Nelson Rules for Control Charts in Minitab
http://blog.minitab.com/blog/statisticsinthefield/usingthenelsonrulesforcontrolchartsinminitab
<p><em style="lineheight: 1.6;">by Matthew Barsalou, guest blogger</em></p>
<p>Control charts plot your process data to identify and distinguish between common cause and special cause variation. This is important, because identifying the different causes of variation lets you take action to make improvements in your process without <em>over</em>-controlling it.</p>
<p>When you create a control chart, the software you're using should make it easy to see where you may have variation that requires your attention. For example, Minitab Statistical Software automatically flags any control chart data point that is more than three standard deviations from the centerline, as shown in the I chart below.</p>
<div style="marginleft:40px; width:577; fontsize:11px;"><img alt="I Chart of Data  Nelson Rules" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/be7c11b66ec1be76d0ae6cf7ab43f3e4/image002.png" style="lineheight: 20.8px; borderwidth: 0px; borderstyle: solid; width: 576px; height: 384px;" /><br />
I chart example with one out-of-control point.</div>
<p>A data point that is more than three standard deviations from the centerline is one indicator for detecting special-cause variation in a process. There are additional control chart rules introduced by Dr. Lloyd S. Nelson in his October 1984 <em>Journal of Quality Technology </em><a href="http://asq.org/data/subscriptions/jqt_open/1984/oct/jqtv16i4technical.pdf" target="_blank">column</a>. The eight Nelson Rules are shown below, and if you're interested in using them, they can be activated in Minitab.</p>
<div style="marginleft: 40px; width: 550px; fontsize: 11px;"><img alt="Nelson Rules for special cause variation in control charts" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/e7bd87833a33a442c6d171932e6e7480/image003.png" style="width: 545px; height: 691px; borderwidth: 1px; borderstyle: solid;" /><br />
The Nelson rules for tests of special causes. Reprinted with permission from <em>Journal of Quality Technology</em> ©<strong><em>1984</em> ASQ</strong>, asq.org.</div>
<p>To activate the Nelson rules, go to <strong>Stat > Control Charts > Variables Charts for Individuals > Individuals... </strong>and then click on "I Chart Options." Go to the <strong>Tests </strong>tab and place a check mark next to each test you would like to select, or simply use the dropdown menu and select “Perform all tests for special causes,” as shown below.</p>
<p style="marginleft: 40px;"><img alt="Individual Charts Options in Minitab" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/31bbbbe82ef49054f70eb2f474da9a34/image005.png" style="borderwidth: 0px; borderstyle: solid; width: 456px; height: 453px;" /></p>
<p>The resulting session window explains which tests failed.</p>
<p style="marginleft: 40px;"><img alt="session window output" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/f0e44dccacaf2851ea8c590d0291cd6e/image006.png" style="borderwidth: 0px; borderstyle: solid; width: 626px; height: 351px;" /></p>
<p>On the chart itself, the data points that failed each test are identified in red as shown below.</p>
<p style="marginleft: 40px;"><img alt="I chart of data" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/4832d48fb67f7085cdc38af84e6c2658/image009.png" style="lineheight: 1.6; borderwidth: 0px; borderstyle: solid; width: 576px; height: 384px;" /></p>
<p>Simply activating all of the rules is not recommended, because the false positive rate goes up as each additional rule is activated. At some point the control chart will become more sensitive than it needs to be, and corrective actions for <a href="http://blog.minitab.com/blog/understandingstatistics/controlchartsshowyouvariationthatmatters">special causes of variation</a> may be implemented when only common cause variation is present.</p>
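<p>A rough calculation shows why the false positive rate grows. The sketch below assumes an in-control process with independent, normally distributed observations. It computes the chance that a given point triggers Test 1, and the chance that nine specific consecutive points all fall on one side of the center line (Test 2). Adding the two gives a simple upper bound, not an exact combined rate.</p>

```python
from statistics import NormalDist

# Assumption: in-control data are independent and normally distributed.
std_normal = NormalDist()

# Test 1: one point more than 3 standard deviations from the center line.
p_test_1 = 2 * (1 - std_normal.cdf(3))

# Test 2: nine specific consecutive points all above, or all below, the center line.
p_test_2 = 2 * 0.5 ** 9

print(f"Test 1: {p_test_1:.4f}")                              # Test 1: 0.0027
print(f"Test 2: {p_test_2:.4f}")                              # Test 2: 0.0039
print(f"Upper bound, both active: {p_test_1 + p_test_2:.4f}")  # Upper bound, both active: 0.0066
```

<p>Each additional test adds its own chance of a false signal, so the total keeps rising as more tests are switched on.</p>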
<p>Fortunately, Nelson provided detailed guidance on applying his rules for tests of special causes, which is presented below.</p>
<div style="marginleft:40px; width:685px; fontsize:11px;">
<p><img alt="comments on test for special causes" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/f6a2616d41592b9c671ac60cb12c98c1/image010.png" style="lineheight: 1.6; borderwidth: 1px; borderstyle: solid; width: 630px; height: 683px;" /><br />
Comments on tests for special causes. Reprinted with permission from <em>Journal of Quality Technology</em> ©<strong><em>1984</em> ASQ</strong>, asq.org.</p>
</div>
<p>Nelson’s tenth comment is an especially important one, regardless of which tests have been activated. </p>
<p>Minitab, together with the Nelson rules, can be very helpful, but neither can replace or remove the need for the analyst's judgment when assessing a control chart. These rules can, however, assist the analyst in making the proper decision. </p>
<p><strong>About the Guest Blogger</strong></p>
<p><em><a href="https://www.linkedin.com/pub/matthewbarsalou/5b/539/198" target="_blank">Matthew Barsalou</a> is a statistical problem resolution Master Black Belt at <a href="http://www.3kwarner.de/" target="_blank">BorgWarner</a> Turbo Systems Engineering GmbH. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with emphasis in international business, and a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is author of the books <a href="http://www.amazon.com/RootCauseAnalysisStepStep/dp/148225879X/ref=sr_1_1?ie=UTF8&qid=1416937278&sr=81&keywords=Root+Cause+Analysis%3A+A+StepByStep+Guide+to+Using+the+Right+Tool+at+the+Right+Time" target="_blank">Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time</a>, <a href="http://asq.org/qualitypress/displayitem/index.html?item=H1472" target="_blank">Statistics for Six Sigma Black Belts</a> and <a href="http://asq.org/qualitypress/displayitem/index.html?item=H1473&xvl=76115763" target="_blank">The ASQ Pocket Guide to Statistics for Six Sigma Black Belts</a>.</em></p>
Data Analysis
Lean Six Sigma
Quality Improvement
Six Sigma
Statistics
Mon, 27 Jun 2016 13:57:00 +0000
http://blog.minitab.com/blog/statisticsinthefield/usingthenelsonrulesforcontrolchartsinminitab
Guest Blogger

There Is No Such Thing as “Bad” Data: Top Tips to Avoid Bad Analysis
http://blog.minitab.com/blog/usingdataandstatistics/thereisnosuchthingas%E2%80%9Cbad%E2%80%9Ddata%3Atoptipstoavoidbadanalysis
<p><span style="lineheight: 1.6;">You often hear the data being blamed when an analysis is not delivering the answers you wanted or expected. I was recently reminded that the data chosen or collected for a specific analysis is determined by the analyst, so there is no such thing as bad<em> </em><em>data</em>—only bad <em>analysis</em>. </span></p>
<p><span style="lineheight: 1.6;">This made me think about the steps an analyst can take to minimise the risk of producing analysis that fails to answer the questions posed. Here are four tips I think are critical; we'd love to hear your thoughts and tips, too!</span></p>
Tip 1: Diving Is Not Allowed<img alt="no diving!" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/f5e536a36779c9ce32064472b6f1faf7/no_diving_into_the_problem.jpg" style="margin: 10px 15px; float: right; width: 224px; height: 224px;" />
<p>When presented with a business problem to solve, I love to dive straight into analyst mode; however, experience has taught me to resist this temptation at all costs. Before diving in, it's vital to step back, think about the problem, and consider what type of analysis you are going to do. Broadly speaking, there are three distinct types of analysis:</p>
<ul>
<li><strong>Descriptive—exploring what <em>has</em> happened.</strong><br />
The tools you might use for this type include graphical analysis, hypothesis testing, capability, and control charts.<br />
</li>
<li><strong>Predictive—forecasting what will happen next.</strong><em> </em><br />
In this category of analysis, you use techniques such as regression, time series forecasting, and reliability analysis.<br />
</li>
<li><strong>Prescriptive—determining what the business should do next.</strong><em> </em><br />
Techniques in this type of analysis include design of experiments, optimisation, and simulation.</li>
</ul>
<p>Once you have determined the type of analysis you want to do, you can start trying to find existing data or collect new data to complete your analysis.</p>
Tip 2: Reliable Data Is Key
<p>There are three things you need to consider when collecting data for a specific type of analysis. </p>
<ol>
<li><strong>How are you going to measure performance (your response variable)? </strong><br />
Once you have decided this, you need to ensure that this measurement can be collected accurately and precisely. If your measurements are unreliable for any reason, then your analysis and any recommendations also will be unreliable. Measurement system analysis, including <a href="http://blog.minitab.com/blog/meredithgriffith/fundamentalsofgagerr">gage analyses</a> and <a href="http://blog.minitab.com/blog/understandingstatistics/gotgoodjudgmentproveitwithattributeagreementanalysis">attribute agreement analysis</a>, can help with these problems. <br />
</li>
<li><strong>What factors or input parameters might affect your performance?</strong> <br />
These are useful in descriptive analysis for segmenting the results you are seeing, allowing you to highlight opportunities and problems in specific areas of your business. In predictive and prescriptive analysis these are essential for optimising your future business performance.<br />
</li>
<li><strong>What are the potential impacts of this analysis?</strong><br />
Finally, you need to understand the costs, benefits and risks associated with any analysis. This will help you determine how much you are prepared to spend on the analysis itself, and more important, what you are prepared to invest to fix any problems and/or develop new opportunities the analysis reveals.</li>
</ol>
Tip 3: It’s All about the Power
<p>Once you know what kind of analysis you need to do, then you can work out how much data you need to collect. Minitab's <a href="http://support.minitab.com/enus/minitab/17/topiclibrary/basicstatisticsandgraphs/powerandsamplesize/powerandsamplesizeanalysesinminitab/">Power and Sample Size</a> menu is one of the best tools for this, as it allows an analyst to calculate the sample size needed for different types of analyses, under a number of scenarios with a minimal amount of prior knowledge about the data you are going to collect.<img alt="powerandsamplesize" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/348c6247c4636d5b1e90778310101bb6/power_and_sample_size_menu.png" style="width: 441px; height: 473px; margin: 10px 15px; float: right;" /></p>
<p>The decisions you as an analyst need to make are:</p>
<ul>
<li><strong>How big is the effect you need to find? </strong> <br />
Power is the probability of finding an effect if it exists. For example, if you are making bolts that should be 10 mm in diameter on average, maybe a +/- 1 mm difference would result in too many bolts scrapped for being too big or too small. The determination of this effect (or difference) has to be done by someone with process knowledge, because it is a business decision, not a statistical one. However, it <em>is </em>a decision that will impact the sample size.<br />
</li>
<li><strong>How much variation can you expect in your data, measured as a standard deviation?</strong> <br />
You need to decide this because power depends on the ratio of the size of the effect you are looking for to the standard deviation of your data. (If you don’t have a historical standard deviation, you can use the value “1” and enter the differences you are looking for as standard deviations. Typically a one-standard-deviation difference is considered small, and a three-standard-deviation difference large.)<br />
</li>
<li><strong>How powerful do you want your analyses to be? </strong><br />
The power is the probability of finding an effect if there is one to find, and as a minimum this should be 80%. The higher the certainty you want of finding an effect if it exists, the larger the sample you will need. </li>
</ul>
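<p>To show how these three decisions interact, here is a minimal Python sketch of a sample size calculation for a two-sided one-sample test. It uses the textbook normal approximation and assumes a 5% significance level, which the post does not specify. The function name is hypothetical, and Minitab's own calculation for a t-test will generally give a somewhat larger answer for small samples.</p>

```python
from math import ceil
from statistics import NormalDist


def sample_size_one_sample_z(difference, std_dev, power=0.80, alpha=0.05):
    """Approximate sample size for a two-sided one-sample test.

    Normal approximation: n = ((z_(1-alpha/2) + z_power) * sigma / delta)^2,
    rounded up to the next whole observation.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    n = ((z_alpha + z_power) * std_dev / difference) ** 2
    return ceil(n)


# Difference entered in standard deviation units (standard deviation set to 1).
print(sample_size_one_sample_z(difference=1, std_dev=1))             # 8
print(sample_size_one_sample_z(difference=1, std_dev=1, power=0.9))  # 11
print(sample_size_one_sample_z(difference=1, std_dev=2))             # 32
```

<p>Only the ratio of the difference to the standard deviation matters here: doubling the standard deviation roughly quadruples the required sample, and asking for more power also increases it.</p>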
<p>Once you have completed your power and sample size analysis, you are ready to collect your data and analyse it.</p>
Tip 4: Good Analysis Always Has Value
<p>When you start an analysis, you often have an idea of what you expect the results to be, because you have seen some evidence of the problem or opportunity. Consequently, when your ideas or theories are not supported by the analysis, it is easy to become disappointed in the results. If you have followed a rigorous analytical methodology to answer a specific question, then accept the results, present the recommendations (in some cases this will be the recommendation of no change), and move on to the next analysis. Finding out that something is <em>not </em>important to your business performance can be just as important as finding out what the key influencers are!</p>
<p>Do you have additional suggestions for avoiding bad analyses? </p>
Data Analysis
Statistics Help
Fri, 24 Jun 2016 12:00:00 +0000
http://blog.minitab.com/blog/usingdataandstatistics/thereisnosuchthingas%E2%80%9Cbad%E2%80%9Ddata%3Atoptipstoavoidbadanalysis
Gillian Groom

How to Identify Outliers (and Get Rid of Them)
http://blog.minitab.com/blog/michelleparet/howtoidentifyoutliersandgetridofthem
<p><img alt="an outlier among falcon tubes" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/389155acc918fbd094941685c31a33b8/falcontubes.jpg" style="width: 250px; height: 188px; margin: 10px 15px; borderwidth: 1px; borderstyle: solid; float: right;" />An outlier is an observation in a data set that lies a substantial distance from other observations. These unusual observations can have a disproportionate effect on statistical results, <a href="http://blog.minitab.com/blog/michelleparet/usingthemeanitsnotalwaysaslamdunk">such as the mean</a>, which can lead to misleading conclusions. Outliers can also provide useful information about your data or process, so it's important to investigate them. Of course, you have to find them first. </p>
<p>Finding outliers in a data set is easy <span style="lineheight: 20.8px;">using </span><a href="http://www.minitab.com/products/minitab/" style="lineheight: 20.8px;">Minitab Statistical Software</a><span style="lineheight: 1.6;">, and there are a few ways to go about it. </span></p>
Finding Outliers in a Graph
<p><span style="lineheight: 1.6;">If you want to identify them graphically and visualize where your outliers are located compared to the rest of your data, you can use </span><strong style="lineheight: 1.6;">Graph > Boxplot</strong><span style="lineheight: 1.6;">.</span></p>
<p style="marginleft: 40px;"><img alt="Boxplot" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/6060c2dbf5d9449babe268eade74814a/Image/652993cfab104ddfe0076fd52ab0f5fd/boxplot_of_strength.jpg" style="width: 600px; height: 394px;" /></p>
<p>This boxplot shows a few outliers, each marked with an asterisk. Boxplots are certainly one of the most common ways to visually identify outliers, but there are <a href="http://blog.minitab.com/blog/funwithstatistics/visualizingthegreatestolympicoutlierofalltime">other graphs, such as scatterplots and individual value plots</a>, to consider as well.</p>
Finding Outliers in a Worksheet
<p>To highlight outliers directly in the worksheet, you can right-click on your column of data and choose <strong>Conditional Formatting > Statistical > Outlier</strong>. Each outlier in your worksheet will then be highlighted in red, or whatever color you choose.</p>
<p style="marginleft: 40px;"><img alt="Conditional Formatting Menu in Minitab" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/6060c2dbf5d9449babe268eade74814a/Image/a898ff565350c40dcbad28bf6d878f82/conditionalformattingmenu.jpg" style="width: 549px; height: 145px;" /></p>
Removing Outliers
<p>If you then want to create a new data set that excludes these outliers, that’s easy to do too. Now I’m not suggesting that removing outliers should be done without thoughtful consideration. After all, they may have a story – perhaps a very important story – to tell. However, for those situations where removing outliers is worthwhile, you can first highlight outliers per the Conditional Formatting steps above, then right-click on the column again and use <strong>Subset Worksheet > Exclude Rows with Formatted Cells</strong> to create the new data set.</p>
The Math
<p>If you want to know the mathematics used to identify outliers, let's begin by talking about quartiles, which divide a data set into quarters:</p>
<ul>
<li><em>Q</em>1 (the 1st quartile): 25% of the data are <em>less than</em> or equal to this value</li>
<li><em>Q</em>3 (the 3rd quartile): 25% of the data are <em>greater than</em> or equal to this value</li>
<li>IQR (the interquartile range): the distance between <em>Q</em>1 and <em>Q</em>3 (<em>Q</em>3 – <em>Q</em>1); it contains the middle 50% of the data</li>
</ul>
<p>Outliers are then defined as any values that fall below:</p>
<p style="marginleft: 40px;"><em>Q</em>1 – (1.5 * IQR)</p>
<p style="marginleft: 40px;">or above:</p>
<p style="marginleft: 40px;"><em>Q</em>3 + (1.5 * IQR)</p>
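<p>The same rule can be sketched in a few lines of Python. This is an illustration rather than Minitab's code: the function name and data are hypothetical, and the quartiles come from the standard library's "exclusive" method, which is assumed here to be close to, but not necessarily identical to, the quartile definition Minitab uses.</p>

```python
from statistics import quantiles


def iqr_outliers(values):
    """Return the lower fence, upper fence, and values outside the fences."""
    q1, _median, q3 = quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


# Hypothetical data: Q1 = 3, Q3 = 9, so IQR = 6 and the fences are -6 and 18.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]
lower, upper, outliers = iqr_outliers(data)
print(lower, upper, outliers)  # -6.0 18.0 [30]

# A new data set that excludes the flagged values.
cleaned = [v for v in data if v not in outliers]
```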
<p><span style="lineheight: 1.6;">Of course, rather than doing this by hand, you can leave the heavy lifting up to Minitab and instead focus on what your data are telling you.</span></p>
<p>Don't see these features in your version of Minitab? Choose <strong>Help > Check for Updates </strong>to see if you're using Minitab 17.3.</p>
Data Analysis
Learning
Statistics
Statistics Help
Stats
Wed, 22 Jun 2016 15:00:00 +0000
http://blog.minitab.com/blog/michelleparet/howtoidentifyoutliersandgetridofthem
Michelle Paret

2 Reasons 2 Recode Data and How 2 Do It in Less than 2 Minutes
http://blog.minitab.com/blog/statisticsandqualityimprovement/2reasons2recodedataandhow2doitinlessthan2minutes
<p><img alt="convert numeric 2 into to" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/22791f44517c42aa9f28864c95cb4e27/Image/330f8cbc00c7380b8db4a71003640a43/2toto.gif" style="width: 330px; height: 124px; float: right; marginleft: 15px; marginright: 15px;" />It’s not easy to get data ready for analysis. Sometimes, data that include all the details we want aren’t clean enough for analysis. Even stranger, sometimes the exact opposite can be true: Data that are convenient to collect often don’t include the details that we want when we analyze them.</p>
<p>Let’s say that you’re looking at <a href="http://wwwn.cdc.gov/nchs/nhanes/search/datapage.aspx?Component=Examination&CycleBeginYear=2001">the documentation for the National Health and Nutrition Examination Survey (NHANES) from 2001-2002</a>. By convention, the data set uses a symbol for missing values, but some variables have additional numeric codes for data that are missing for a specific reason. For example, <a href="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/22791f44517c42aa9f28864c95cb4e27/File/61a969b0ab7e58d0be51fd1ce7b16d75/aux_b.mtw">one data set records hearing measurements</a> (Audiometry). One variable in this data set is the middle ear pressure in the right ear, which has values from -282 to 180, but also includes these codes:</p>
<ul>
<li><strong>555:</strong> Compliance <=0.2</li>
<li><strong>777:</strong> Refused</li>
<li><strong>888:</strong> Could not obtain</li>
</ul>
<p>Although in some cases knowing how often each of these situations occurs could be important, to analyze the numeric data, you have to change these code values from numbers to something that won’t be analyzed. After all, leaving in a bunch of values that are more than twice what the maximum should be would have a serious effect on the mean of the data set.</p>
<p>In Minitab, try this:</p>
<ol>
<li>Choose <strong>Data > Recode > To Numeric</strong>.</li>
<li>In <strong>Recode values in the following columns</strong>, enter the variables with the specialized missing values. If you’re following along with the NHANES data, the variable is AUXTMEPR.</li>
<li>In <strong>Method</strong>, select <strong>Recode range of values</strong>.</li>
<li>Complete the table with the endpoints and recoded values like this:</li>
</ol>
<table>
<tr><th>Lower endpoint</th><th>Upper endpoint</th><th>Recoded value</th></tr>
<tr><td>555</td><td>556</td><td>*</td></tr>
<tr><td>777</td><td>778</td><td>*</td></tr>
<tr><td>888</td><td>889</td><td>*</td></tr>
</table>
<ol>
<li value="5">In <strong>Endpoints to include</strong>, select <strong>Lower endpoint only</strong>. Click <strong>OK</strong>.</li>
</ol>
<p>The resulting column has missing values instead of the coded values. And that means the statistics that you calculate will now have the correct values.</p>
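<p>For readers who want to see the effect outside Minitab, here is a minimal Python sketch of the same idea. The pressure values are made up for illustration and are not taken from the NHANES file, and the sketch matches the three codes exactly rather than using ranges as the dialog above does.</p>

```python
from statistics import mean

# Codes that mark a value as missing for a specific reason.
MISSING_CODES = {555, 777, 888}


def recode_to_missing(values, codes=MISSING_CODES):
    """Replace the special codes with None so they are left out of statistics."""
    return [None if v in codes else v for v in values]


# Hypothetical middle ear pressure readings, including three coded values.
pressures = [-20, 5, 555, -110, 888, 15, 777]
recoded = recode_to_missing(pressures)
valid = [v for v in recoded if v is not None]

print(round(mean(pressures), 2))  # 301.43, distorted by the codes
print(mean(valid))                # -27.5, based on real measurements only
```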
<p>Recoding can let you prepare data with numeric measurements for correct analysis, but the CDC data sets also often use numeric codes to represent categories. For example, one variable records these codes for the status of an audio exam:</p>
<ul>
<li><strong>1:</strong> Complete</li>
<li><strong>2:</strong> Partial</li>
<li><strong>3:</strong> Not done</li>
</ul>
<p>Another reason to recode your data before analyzing it is so that both the data itself and the values that subsequently appear as categories and on graphs are descriptive. You can recode these numeric codes to text in a similar fashion. Try this:</p>
<ol>
<li>Choose <strong>Data > Recode > To Text</strong>.</li>
<li>In <strong>Recode values in the following columns</strong>, enter the variables with the numeric codes. If you are following along with the NHANES data, the variable is AUAEXSTS.</li>
<li>In <strong>Method</strong>, select <strong>Recode individual values</strong>.</li>
<li>Complete the table with the current values and recoded values like this:</li>
</ol>
<table>
<tr><th>Current value</th><th>Recoded value</th></tr>
<tr><td>1</td><td>Complete</td></tr>
<tr><td>2</td><td>Partial</td></tr>
<tr><td>3</td><td>Not done</td></tr>
</table>
<ol>
<li value="5">Click <strong>OK</strong>.</li>
</ol>
<p>The resulting column has the text labels instead of the numeric codes. When you create graphs, the labels will be descriptive.</p>
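<p>The numeric-to-text recode amounts to a lookup table. A minimal Python sketch, with a hypothetical function name and made-up status codes, might look like this:</p>

```python
from collections import Counter

# Labels for the exam status codes listed above.
STATUS_LABELS = {1: "Complete", 2: "Partial", 3: "Not done"}


def recode_to_text(codes, labels=STATUS_LABELS):
    """Map each numeric code to its label; unknown codes are kept as text."""
    return [labels.get(code, str(code)) for code in codes]


# Hypothetical exam status values.
statuses = recode_to_text([1, 3, 2, 1, 1])
print(statuses)           # ['Complete', 'Not done', 'Partial', 'Complete', 'Complete']
print(Counter(statuses))  # tallies now show descriptive categories
```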
<p>Sometimes, data that are good to collect differ from data that are good to analyze. Sometimes we need more detail in the data that we collect than we need in the data that we analyze, such as when we record the reason that data are missing. Sometimes, we need data that are faster to record than is convenient when we analyze data, so we use abbreviations or codes that aren’t as descriptive as they can be.</p>
<p>Fortunately, Minitab makes it easy for you to balance those needs by making it easy to manipulate your data, with features like recoding. Ready for more? Check out some of the ways that <a href="http://support.minitab.com/enus/minitab/17/topiclibrary/minitabenvironment/dataanddatamanipulation/datamanipulation/mergingworksheets/">Minitab makes it easy to merge different worksheets</a> together.</p>
Data Analysis
Statistics
Mon, 20 Jun 2016 16:01:00 +0000
http://blog.minitab.com/blog/statisticsandqualityimprovement/2reasons2recodedataandhow2doitinlessthan2minutes
Cody Steele

Using Fitness Tracker Data to Make Wise Decisions: Are You Working Out in the Right Zone?
http://blog.minitab.com/blog/statisticsandmore/usingfitnesstrackerdatatomakewisedecisions%3Aareyouworkingoutintherightzone
<img alt="gym" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/5bec6a128445a947de2e0fc768d6c2a4/gym.jpg" style="lineheight: 20.8px; width: 400px; height: 222px; borderwidth: 1px; borderstyle: solid; margin: 10px 15px; float: right;" />
<p><span style="lineheight: 1.6;">Technology is very much part of our lives nowadays. We use our smartphones to have video calls with our friends and family, and watch our favourite TV shows on tablets. Technology has also transformed the fitness industry with the increasing popularity of fitness trackers.</span></p>
<p>Recently, I got myself a fitness watch and it's becoming my favourite gadget. It can track how many steps I’ve taken, my heart rate during a workout, and how many calories I've burned during my workout and over the whole day. Based on the calories burned, I can adjust my diet to ensure I have eaten what I require for the day. I’ve been collecting data from my weekly Zumba sessions, gym workouts and lunchtime walks. After collecting data for over a month, I decided to do some analysis with it using Minitab. Below is a snapshot of the data I collected in Minitab.</p>
<p style="marginleft: 40px;"><img alt="fitbit data" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/1747862b0465debbfc439a30e6f4a9c8/fitbit1.png" style="width: 451px; height: 465px; borderwidth: 1px; borderstyle: solid;" /></p>
<p><span style="lineheight: 1.6;">For each activity, I have the following information:</span></p>
<ul>
<li>Duration of exercise in minutes and seconds</li>
<li>Time spent (rounded to the nearest minute) in the peak/high-intensity exercise heart-rate zone: heart rate greater than 85% of maximum</li>
<li>Time spent (rounded to the nearest minute) in the cardio/medium-to-high-intensity exercise heart-rate zone: heart rate is 70 to 84% of maximum</li>
<li>Time spent (rounded to the nearest minute) in the fat-burn/low-to-medium-intensity exercise heart-rate zone: heart rate is 50 to 69% of maximum</li>
<li>Average heart rate during the session</li>
<li>Total calories burned during the session</li>
</ul>
<p>It appears that a higher average heart rate results in more calories burned, and that calories burned also depend on the time spent in the different heart-rate zones. Let’s check this using correlation coefficients.</p>
<p style="marginleft: 40px;"><img alt="Correlation  Cardio and Calories" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/7d5014f756ab5e93e1e221b3c2b52274/fitbit2.png" style="lineheight: 1.6; width: 535px; height: 371px;" /></p>
<p>As expected, all three variables are <a href="http://blog.minitab.com/blog/understandingstatistics/nomatterhowstrongcorrelationstilldoesntimplycausation">positively correlated</a> with calories burned. However, spending hours on the treadmill is probably not a very good way to burn calories. With the best summer weather just around the corner, I need a more efficient way to exercise to lose the few pounds from my indulgence in the winter months!</p>
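<p>For reference, a Pearson correlation coefficient like those in the output above can be computed as in the sketch below. The numbers are invented for illustration and are not the data from my fitness watch.</p>

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical sessions: minutes in the cardio zone and calories burned.
cardio_minutes = [5, 12, 20, 25, 30]
calories = [120, 180, 260, 300, 350]
r = pearson_r(cardio_minutes, calories)
print(round(r, 3))  # close to +1 for these made-up values
```

<p>A value near +1 means the two variables tend to rise together. It says nothing about whether one causes the other.</p>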
<p>According to research, <span style="lineheight: 20.8px;">exercising at higher intensity</span><span style="lineheight: 1.6;"> can result in more calories burned due to the “afterburn” effect. The afterburn effect is the additional calories burned after intensive exercise. Recently, my local gym introduced 30-minute HIIT (high-intensity interval training) sessions, which I am considering taking. Fitting a regression model using my data will probably help me make the decision.</span></p>
<p>In Minitab, I opened <strong>Stat > Regression > Regression > Fit Regression Model</strong>, and completed the dialog and subdialog boxes as shown below.</p>
<p style="marginleft: 40px;"><img alt="fitbit regression dialog" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/777a2d5e3a81010d3132f1b40e6223ec/fitbit3.png" style="lineheight: 1.6; width: 563px; height: 434px; borderwidth: 1px; borderstyle: solid;" /></p>
<p style="marginleft: 40px;"><img alt="fitbit regression subdialog" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/417e948723db6687b13fef9054216d23/fitbit4.png" style="lineheight: 1.6; width: 505px; height: 543px; borderwidth: 1px; borderstyle: solid;" /></p>
<p>Instead of using a trial-and-error approach to select terms for the model, I will use the stepwise approach to help me identify suitable terms for the model.</p>
<p style="marginleft: 40px;"><img alt="fitbit stepwise regression" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/81a187528b6602b538fe6d8c3be04a12/fitbit5.png" style="lineheight: 1.6; width: 474px; height: 537px; borderwidth: 1px; borderstyle: solid;" /></p>
<p><span style="lineheight: 1.6;">And after I press OK on each of my dialogs, Minitab returns the regression equation:</span></p>
<p style="marginleft: 40px;"><img alt="Regression Equation for Fitbit Data" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/3db47214db3420f9cf4e000fa857386d/fitbit6.png" style="lineheight: 1.6; width: 746px; height: 95px;" /></p>
<p style="marginleft: 40px;"><img alt="fitbit regression model summary" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/b75ba560fa176a327a626e49866d7fbc/fitbit7.png" style="lineheight: 1.6; width: 388px; height: 81px;" /></p>
<p><span style="lineheight: 1.6;">The final model is quite decent, as the three types of R-squared values are all above 80%. This implies I can use this model to make predictions. The regression equation appears complex, but I can use the response optimizer in Minitab 17 to <a href="http://support.minitab.com/minitab/17/topiclibrary/modelingstatistics/usingfittedmodels/responseoptimization/whatisresponseoptimization/">identify optimum settings</a> to achieve my goal.</span></p>
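<p>To illustrate what the three R-squared values measure, here is a simplified Python sketch for a model with a single predictor. My actual model has several terms, so this is not the model fitted above, and the data are hypothetical. Here R-sq is the share of variation explained, R-sq(adj) penalises the number of coefficients, and R-sq(pred) is based on PRESS, the sum of squared leave-one-out prediction errors.</p>

```python
def r_squared_summary(x, y):
    """Return (R-sq, R-sq(adj), R-sq(pred)) for the line y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)     # error sum of squares
    sst = sum((b - mean_y) ** 2 for b in y)  # total sum of squares

    # Leverage of each point; e / (1 - h) is its leave-one-out residual.
    leverages = [1 / n + (a - mean_x) ** 2 / sxx for a in x]
    press = sum((e / (1 - h)) ** 2 for e, h in zip(residuals, leverages))

    p = 2  # estimated coefficients: intercept and slope
    r_sq = 1 - sse / sst
    r_sq_adj = 1 - (sse / (n - p)) / (sst / (n - 1))
    r_sq_pred = 1 - press / sst
    return r_sq, r_sq_adj, r_sq_pred


# Hypothetical sessions: minutes in the cardio zone and calories burned.
minutes = [5, 12, 20, 25, 30, 18]
burned = [120, 185, 250, 310, 340, 255]
for name, value in zip(("R-sq", "R-sq(adj)", "R-sq(pred)"),
                       r_squared_summary(minutes, burned)):
    print(f"{name}: {value:.1%}")
```

<p>R-sq(pred) is the most demanding of the three, because each point is predicted by a model that was fitted without it. A predicted value far below the other two would suggest the model is overfitting.</p>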
<p>There is a common belief that 1 pound of fat (0.45 kilogram) is approximately equal to 3500 calories. Let’s say I aim to burn about 300 calories in each session. This means after about 12 sessions I would have lost <span style="lineheight: 20.8px;">approximately</span><span style="lineheight: 20.8px;"> </span><span style="lineheight: 1.6;">a pound of fat, provided I also had a healthy diet. Since exercising at a higher heart rate tends to burn more calories, I will also aim to maintain an average heart rate between, say, 128 and 148, which for me works out as somewhere between 70-80% of maximum heart rate.</span></p>
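<p>The arithmetic behind these numbers is short enough to check directly. This sketch only restates the figures quoted above.</p>

```python
from math import ceil

CALORIES_PER_POUND = 3500   # rule of thumb quoted above
calories_per_session = 300  # target burn per session

sessions = CALORIES_PER_POUND / calories_per_session
print(round(sessions, 1), ceil(sessions))  # 11.7 12

# Maximum heart rate implied by each end of the target range.
print(round(128 / 0.70), round(148 / 0.80))  # 183 185
```

<p>The two ends of the range imply slightly different maximum heart rates, which is why the 70-80% figure is only approximate.</p>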
<p>With all the conditions above, I open <strong>Stat > Regression > Regression > Response Optimizer</strong> and fill in the dialog boxes as shown in the screenshots below.</p>
<p style="marginleft: 40px;"><img alt="response optimizer for fitbit" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/f96c758fb378e360936b66fbeeac4905/fitbit8.png" style="width: 441px; height: 402px;" /></p>
<p style="marginleft: 40px;"><img alt="response optimizer options for fitbit data" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/f1cda8376d7575e2598085197c145358/fitbit9.png" style="width: 816px; height: 421px; borderwidth: 1px; borderstyle: solid;" /></p>
<p>My target calorie burn rate is 300, and getting above 300 would be a bonus. Hence, I am using 310 as the upper limit.</p>
<p style="marginleft: 40px;"><img alt="fitbit upper limit" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/823cd83073147ecb7d6a2210766eed22/fitbit10.png" style="lineheight: 1.6; width: 677px; height: 526px; borderwidth: 1px; borderstyle: solid;" /></p>
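<p>To give a feel for what a target with an upper limit means, here is a minimal Python sketch of a target-type desirability function, the kind of score a response optimizer uses to rank candidate settings. The target of 300 and the upper limit of 310 come from the text; the lower limit of 290 is a hypothetical value chosen for illustration (the real one appears only in the screenshot), and the linear shape assumes a weight of 1. This is a sketch of the general idea, not Minitab's exact implementation.</p>

```python
def target_desirability(y, lower, target, upper, weight=1.0):
    """Score a response y from 0 (unacceptable) to 1 (exactly on target)."""
    if not lower < target < upper:
        raise ValueError("need lower < target < upper")
    if y < lower or y > upper:
        return 0.0  # outside the acceptable window
    if y <= target:
        # Rises from 0 at the lower limit to 1 at the target.
        return ((y - lower) / (target - lower)) ** weight
    # Falls from 1 at the target to 0 at the upper limit.
    return ((upper - y) / (upper - target)) ** weight


# LOWER is hypothetical; TARGET and UPPER are the values quoted in the text.
LOWER, TARGET, UPPER = 290.0, 300.0, 310.0

for calories in (285, 295, 300, 305, 310):
    print(calories, target_desirability(calories, LOWER, TARGET, UPPER))
```

<p>A predicted burn of exactly 300 calories scores 1, and the score drops toward 0 as the prediction approaches either limit.</p>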
<p><span style="lineheight: 1.6;">I would like to spend no more than 45 minutes per session and hence I am using a maximum of 30 minutes exercising in the cardio zone, and 15 minutes in the fat-burn zone.</span></p>
<p style="marginleft: 40px;"><img alt="Response optimization output for fitbit data" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/92b0236d8c1d43750c1f548cc2a076ce/fitbit11.png" style="width: 477px; height: 517px;" /></p>
<p style="marginleft: 40px;"><img alt="Fitbit optimizer response plot" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/77bf11f044783a4dae5723667eb46b68/fitbit12.png" style="width: 800px; height: 381px;" /></p>
<p><span style="lineheight: 1.6;">To achieve my goal, I need to exercise in the cardio zone for about 21 minutes, exercise in the fat-burn zone for about 15 minutes, and maintain my average heart rate at about 148 for the session.</span></p>
<p>I understand that the HIIT sessions involve very intense bursts of exercise followed by short, sometimes active, recovery periods. This type of training gets and keeps your heart rate up. Based on this, if out of a 30-minute HIIT session I can maintain about 21 minutes in the cardio zone, and spend the rest of the session exercising in the fat-burn zone, I will be close to achieving my goal. I can always supplement this by a few minutes on the exercise bike or cross-trainer after the class. </p>
<p>Another good feature of the response optimizer is that I can evaluate different settings to see how the changes affect the response. Let's consider the days when the HIIT class is not offered and I need to use the machines. I normally go for a longer session on the cross-trainer (20-30 minutes), followed by a quick 10-minute session on the step machine. From past experience, I can easily get into the cardio heart-rate zone when using the cross-trainer. Now I can use the optimizer to predict the calories burned for 30 minutes of working out in the cardio zone and 10 minutes in the fat-burn zone. I will also use a lower average heart rate of 140.</p>
<p>By clicking on the current setup, I can input new settings.</p>
<p style="marginleft: 40px;"><img alt="Fitbit response optimizer new settings" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/672441de49281134d41391fcb29ed411/fitbit13.png" style="width: 800px; height: 403px;" /></p>
<p style="marginleft: 40px;"><img alt="response optimizer for fitbit data cardio heart rate zone" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/a856d5f38f728902ebe99e3fd3974053/fitbit14.png" style="lineheight: 1.6; width: 800px; height: 470px;" /></p>
<p><span style="lineheight: 1.6;">Well, this solution is not too far off from my target of 300 calories burned!</span></p>
<p>It’s turned out to be an enjoyable and informative experience analysing my own fitness data to see what my best workout options are. Taking the data collected by my fitness tracker and doing further analysis on it has definitely helped me to decide on how to exercise wisely and efficiently. </p>
<p> </p>
<p style="fontsize: 9px;">Gym photo by Indigo Fitness Club Zurich, used under Creative Commons 2.0 license. </p>
Fun Statistics
Regression Analysis
Statistics
Statistics Help
Fri, 17 Jun 2016 12:00:00 +0000
http://blog.minitab.com/blog/statisticsandmore/usingfitnesstrackerdatatomakewisedecisions%3Aareyouworkingoutintherightzone
Eugenie Chung

Using Multivariate Statistical Tools to Analyze Customer and Survey Data
http://blog.minitab.com/blog/applyingstatisticsinqualityprojects/usingmultivariatestatisticaltoolstoanalyzecustomerandsurveydata
<p>Businesses are getting more and more data from existing and potential customers: whenever we click on a web site, for example, it can be recorded in the vendor's database. And whenever we use electronic ID cards to access public transportation or other services, our movements across the city may be analyzed.</p>
<p>In the very near future, connected objects such as cars and electrical appliances will continuously generate data that will provide useful insights regarding user preferences, personal habits, and more. Companies will learn a lot from users and the way their products are being used. This learning process will help them focus on particular niches and improve their products according to customer expectations and profiles.</p>
<p>For example, insurance companies will monitor how motorists are driving connected cars, to adjust insurance premiums according to perceived risks, or to analyze driving behaviors so they can advise motorists how to boost fuel efficiency. No formal survey will be needed, because customers will be continuously surveyed.</p>
<p>Let's look at some statistical tools we can use to create and analyze user profiles, map expectations, study which expectations are related, and so on. I will focus on multivariate tools, which are very efficient methods for analyzing surveys and taking into account a large number of variables. My objective is to provide a very high-level, general overview of the statistical tools that may be used to analyze such survey data.</p>
A Simple Example of Multivariate Analysis
<p>Let us start with a very simple example. The table below presents data some customers have shared about their enjoyment of specific types of food:</p>
<p style="marginleft: 40px;"><img height="134" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/31b80fb2db664edfa75374d4c9804ab8/File/386a610e5bb77c8aa7a5adf8ba5adf03/386a610e5bb77c8aa7a5adf8ba5adf03.png" width="532" /></p>
<p>A simple look at the table does not really help us understand preferences. So we can use Simple Correspondence Analysis, a multivariate statistical tool, to display them visually.</p>
<p>In Minitab, go to <strong>Stat > Multivariate > Simple Correspondence Analysis...</strong> and enter your data as shown in the dialog box below. (Also click on "Graphs" and check the box labeled "Symmetric plot showing rows and columns.")</p>
<p style="marginleft: 40px;"><img height="347" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/31b80fb2db664edfa75374d4c9804ab8/File/3d7fdd8b981b91398acfbffc1d02f1e4/3d7fdd8b981b91398acfbffc1d02f1e4.png" width="451" /></p>
<p>Minitab creates the following plot: </p>
<p style="marginleft: 40px;"><img height="384" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/31b80fb2db664edfa75374d4c9804ab8/File/9e75de185fc35b03062c8f87492d3246/9e75de185fc35b03062c8f87492d3246.png" width="576" /></p>
<p>Looking at the plot, we quickly see that Vegetables tend to be associated with “Disagree” (they are positioned close to each other in the graph) and Ice cream is positioned close to “Neutral” (they are related to each other). As for Meat and Potatoes, the panel tends either to “Agree” or “Strongly agree.”</p>
<p>We now have a much better understanding of the preferences of our panel, because we know what they tend to like and dislike.</p>
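<p>The associations the plot reveals can also be seen numerically. The sketch below computes the standardized residuals of a two-way table of counts, which are the quantities a correspondence analysis decomposes; the full analysis additionally applies a singular value decomposition to obtain the plot coordinates, which is omitted here. The counts are hypothetical and are not the values from the table in the post.</p>

```python
import math


def standardized_residuals(table):
    """Return (residuals, total_inertia) for a two-way table of counts.

    Each residual is (observed - expected) / sqrt(expected), where expected
    is the count under independence. Total inertia is chi-square / n.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    residuals = []
    inertia = 0.0
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            r = (observed - expected) / math.sqrt(expected)
            res_row.append(r)
            inertia += r * r / n
        residuals.append(res_row)
    return residuals, inertia


def strongest_level(residual_row, levels):
    """Return the column label with the largest positive residual."""
    best = max(range(len(levels)), key=lambda j: residual_row[j])
    return levels[best]


# Hypothetical counts, NOT the data shown in the post.
foods = ["Vegetables", "Ice cream", "Meat", "Potatoes"]
levels = ["Disagree", "Neutral", "Agree", "Strongly agree"]
counts = [
    [12, 4, 3, 1],
    [3, 11, 4, 2],
    [1, 3, 9, 7],
    [2, 3, 8, 7],
]

residuals, inertia = standardized_residuals(counts)
for food, res_row in zip(foods, residuals):
    print(food, "->", strongest_level(res_row, levels))
print("total inertia:", round(inertia, 3))
```

<p>A large positive residual means a food and a response level occur together more often than independence would predict, which is what places them close together on the symmetric plot.</p>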
Selecting the Right Type of Tool to Analyze Survey Data
<p>Many multivariate tools are available, so how can you choose the right one to analyze your survey data?</p>
<p>The decision tree below shows which method you might choose according to your objectives and the <a href="http://blog.minitab.com/blog/understandingstatistics/understandingqualitativequantitativeattributediscreteandcontinuousdatatypes">type of data you have</a>. For example, we selected correspondence analysis in the<span style="lineheight: 1.6;"> </span><span style="lineheight: 20.8px;">previous</span><span style="lineheight: 20.8px;"> </span><span style="lineheight: 1.6;">example because all our variables were categorical, or qualitative in nature.</span></p>
<p style="marginleft: 40px;"><img alt="multivariate diagram 1" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/b6150beff1fbc04623fcccdadc0faac4/multivariate_1.gif" style="lineheight: 20.8px; width: 624px; height: 464px;" /></p>
<p> </p>
Categorical Data and Prediction of Group Membership (Right Branch)
<p><strong>Clustering</strong><br />
If you have some numerical (or continuous) data and you want to understand how your customers might be grouped / aggregated (from a statistical point of view) into several homogeneous groups, you can use clustering techniques. This could be helpful to define profiles and user groups.</p>
<p><strong>Discriminant Analysis or Logistic Regression (Scoring)</strong><br />
If your individuals already belong to different groups and you want to understand which variables are important to define an existing user group, or predict group membership for new individuals, you can use discriminant analysis, or binary logistic regression (if you only have two groups).</p>
<p><strong>Correspondence Analysis </strong><br />
<span style="lineheight: 1.6;">As we saw in the first example, correspondence analysis lets us study relationships between variables that are categorical / qualitative.</span></p>
Numeric or Continuous Data Analysis (Left Branch)
<p><strong>Principal Component Analysis or Factor Analysis</strong><br />
I<span style="lineheight: 1.6;">f all your variables are numeric, you can use principal components analysis to understand how variables are related to one another. Factor analysis may be useful to identify an underlying, unknown factor associated with your variables.</span></p>
<p><strong>Item Analysis</strong><br />
This tool was specifically created for survey analysis. Do the items of a survey evaluate similar characteristics? Which items differ from the remaining questions? The objective is to assess the internal consistency of a survey. </p>
<p>These techniques <em>are </em>computationally intensive, but performing them in Minitab is very user-friendly, and the software produces easy-to-understand graphs (as in the food preference example above).</p>
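<p>To make the idea of internal consistency in item analysis concrete, here is a minimal Python sketch of Cronbach's alpha, a statistic commonly used for this purpose. The survey scores are hypothetical, and the sketch is meant to illustrate the formula rather than reproduce Minitab's full Item Analysis output.</p>

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_variances = [variance(item) for item in zip(*responses)]
    total_variance = variance([sum(person) for person in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical 1-to-5 ratings: five respondents, three survey items.
ratings = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [3, 3, 2],
    [1, 2, 1],
]
print(round(cronbach_alpha(ratings), 3))
```

<p>Values close to 1 suggest the items move together and plausibly measure the same characteristic; an item that drags alpha down is a candidate for the "differs from the remaining questions" category.</p>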
A Closer Look at Some Specific Multivariate Tools
<p>Let's take a closer look at the tools for numerical survey data analysis. The graph below shows the tools that are available to you and their objectives in each case. These methods are often used to group numeric variables according to similarity. They may also be useful in studying how individuals are positioned according to the main groups of variables in order to identify user profiles.</p>
<p style="marginleft: 40px;"> <img alt="multivariate diagram 2" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/283b5721982a2c167236a120799ede2c/multivariate_2.gif" style="width: 624px; height: 438px;" /></p>
<p>And now let's look a bit more closely at the tools we can use for analyzing categorical survey data. Again, the diagram below shows the tools that are available to you and their objectives. Many of these tools can be used to study how numeric variables relate to qualitative categories.</p>
<p style="marginleft: 40px;"><img height="430" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/31b80fb2db664edfa75374d4c9804ab8/File/58cd6feb678c0f5d2232c304b0173391/58cd6feb678c0f5d2232c304b0173391.png" width="624" /></p>
Conclusion
<p>This is a very general overview of multivariate tools for survey analysis. If you want to go deeper and learn more about these techniques, you can find some resources on the <a href="http://support.minitab.com/minitab/17/topiclibrary/modelingstatistics/multivariate/basics/multivariateanalysesinminitab/">Minitab web site</a>, in the Help menu in Minitab's statistical software, or you can contact <a href="http://www.minitab.com/support/">our technical support team</a>. </p>
Data Analysis
Insights
Learning
Statistics
Statistics Help
Stats
Wed, 15 Jun 2016 12:00:00 +0000
http://blog.minitab.com/blog/applyingstatisticsinqualityprojects/usingmultivariatestatisticaltoolstoanalyzecustomerandsurveydata
Bruno Scibilia

The Matrix, It's a Complex Plot
http://blog.minitab.com/blog/dataanalysisandqualityimprovementandstuff/thematrixitsacomplexplot
<img alt="Welcome to the Matrix" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/8de770baa50a4f6b91449713c3b99f66/Image/7f60a847a90c468d4ee511e3be24ac61/welcome_to_the_matrix_3x2.jpg" style="lineheight: 20.8px; width: 288px; height: 192px; float: right; margin: 10px 15px;" />
<p>Remember the classic science fiction film <em>The Matrix</em>? The dark sunglasses, the leather, computer monitors constantly raining streams of integers (inexplicably in base 10 rather than binary or hexadecimal)? And that mind-blowing plot twist when Neo takes the red pill from Morpheus' outstretched hand? Well to me, there's one thing even <em>more </em>mind-blowing than the plot of <em>The Matrix</em>: the <strong>Matrix Plot</strong>. You know, in Minitab Statistical Software. (Click here to <a href="http://www.minitab.com/enus/products/minitab/" target="_blank">download a free trial</a>.)</p>
<p>Just as Neo and his band of futuristic rebels were constantly barraged with endless streams of data, it seems like we, too, often face large amounts of data that we must make sense of. When faced with such a challenge, a<span style="lineheight: 1.6;"> good place to start is to create some exploratory graphs in Minitab. Previous posts have </span>extolled<span style="lineheight: 1.6;"> the virtues of the <a href="http://blog.minitab.com/blog/understandingstatistics/troublestartingananalysisgraphyourdatawithanindividualvalueplot">Individual Value Plot</a> and <a href="http://blog.minitab.com/blog/statisticsandqualitydataanalysis/getaheadstartunderstandyourdatabeforeyouanalyzeit">Graphical Summary</a> for this purpose. Today, we're going to use the </span><span style="lineheight: 1.6;">oracle of all plots, the Matrix Plot, to </span><span style="lineheight: 1.6;">uncover the secrets of <a href="http://support.minitab.com/enus/datasets/graphsdatasets/automobilespecificationsdata/" target="_blank">automobile specifications data</a>. (Follow the link and scroll to the bottom of the page to download the worksheet.)</span></p>
<p>The data set looks like this:</p>
<p><img alt="AutoSpecs data set" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/8de770baa50a4f6b91449713c3b99f66/Image/f4c4a530f880c1034324419db4afc932/autospecs.jpg" style="width: 654px; height: 350px; marginleft: 20px; marginright: 20px;" /></p>
<p>There's a lot to take in here. The columns look like streams of random numbers...but <em>are </em>they? Time to enter the matrix. <span style="lineheight: 1.6;">A matrix plot is a great exploratory tool because you can throw a bunch of data in it and just see what happens. </span></p>
<p>From Minitab's <strong>Graph</strong> menu, choose <strong>Matrix Plot</strong>. Under <strong>Matrix of plots</strong>, choose <strong>With Groups</strong>, and fill out the dialog box thusly:</p>
<a name="return"></a>
<p><img alt="Click the red pill" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/8de770baa50a4f6b91449713c3b99f66/Image/1896ccb8d218f4b89466d61f12a6f0f8/click_the_red_pill3.jpg" style="lineheight: 20.8px; width: 450px; height: 293px; marginleft: 20px; marginright: 20px;" /></p>
<p>It is at this point that you must make a difficult choice. You can choose the blue <a href="#1">pill</a><a href="#1">1</a> (a.k.a., the <strong>Cancel</strong> button) and go about your business, oblivious to and untroubled by the mind-blowing automotive realities that surround you. Or you can choose the red pill (click <strong>OK</strong>), after which your life will forever be altered by your ability to see into the data, to understand it, and, with practice, to even <a href="#2">control it.</a><a href="#2">2</a><a href="#2"> </a></p>
<p>If you chose the blue pill, <a href="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/b060dd8696f44d475e487202cac2f4a6/steak_597949_960_720_1_.jpg">click here</a>.</p>
<p>If you chose the red pill, read on.</p>
<p><img alt="Matrix plot of auto data" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/8de770baa50a4f6b91449713c3b99f66/Image/ec92277a7c0aab981cf13a15e1e46ba5/matrix_plot_of_autospecs.jpg" style="lineheight: 1.6; width: 576px; height: 384px; marginleft: 20px; marginright: 20px;" /></p>
<p>As you can see, the matrix plot packs a lot of information into a small space. I like to do a couple of things to allow the data to spread out just a little. Remove the graph title by clicking it and pressing <strong>Delete</strong>. Then, choose <strong>Editor > Graph Options</strong>, and select <strong>Don't alternate</strong> (under <strong>Alternate Ticks on Plots</strong>). There, that's a little better:</p>
<p><img alt="Matrix plot of vehicle data without title" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/8de770baa50a4f6b91449713c3b99f66/Image/a6853bb1f617e179440f59ec39be7ea9/matrix_plot_of_autospecs__sans_title.jpg" style="lineheight: 1.6; width: 576px; height: 384px; marginleft: 20px; marginright: 20px;" /></p>
<p>It's a lot to take in, but don't worry. Just as our band of heroes in <em>The Matrix</em> learned to read the endless streams of integers on their monitors, so too will this mass of dots soon make sense to you.</p>
<p>The matrix plot is simply a grid of scatterplots. For example, the leftmost scatterplot in the top row shows City MPG on the y-axis and Hwy MPG on the x-axis. Not surprisingly, there appears to be a very tight relationship between these two variables: vehicles with good city mileage tend to also have good highway mileage. You can tell from the scales that city MPG for all vehicles ranges between about 10 and 55 and that highway MPG ranges between about 19 and 50. From the symbols, you can also easily tell that the hybrid vehicles (red squares) get better mileage than gas-only vehicles (blue dots). </p>
<p>To simplify things, we can remove City MPG and Hwy MPG from the plot and leave just Total MPG (which is just City MPG + Hwy MPG). We can also remove Total Volume (which is Interior Volume + Cargo Volume). </p>
<p>To return to the Matrix Plot dialog box, you can press <strong>Ctrl + E</strong>. (This handy shortcut was #2 in the <a href="http://blog.minitab.com/blog/statisticsandqualitydataanalysis/minitabtipsandtrickstop10countdownfinale">Minitab Tips and Tricks: Top 10 Countdown</a>.) This time, in Graph variables, enter <span style="lineheight: 20.8px;">just columns C6 through C10. </span></p>
<p><img alt="Matrix plot without the redundant variables" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/8de770baa50a4f6b91449713c3b99f66/Image/70081050bf5d37cd6434648ccd9d6f62/matrix_plot_of_autospecs__sans_city__hwy__total_vol.jpg" style="width: 576px; height: 384px; marginleft: 20px; marginright: 20px;" /></p>
<p>(To maximize the space for data, I deleted the title and unalternated the tick marks for this graph like we did for the last one.)</p>
<p>One thing that jumps out is that Safety isn't like the other variables. The other variables are continuous, but the safety ratings take on one of three discrete values: 3, 4, or 5. For discrete variables, the plot looks like an individual value plot. Interestingly, all hybrid vehicles scored a 4 or a 5; the only vehicles to score a 3 were gas-only.</p>
<p>Another thing that jumps out is the outlier in the Retail (price) measurements. While the other vehicles cost under $45,000, one vehicle sells for more than $70,000. Conveniently, we can brush the outlier and quickly see how that vehicle scores on the other measures. (For more information on this powerful tool, see <a href="http://support.minitab.com/enus/minitab/17/topiclibrary/basicstatisticsandgraphs/graphoptions/exploringdataandrevisinggraphs/usingbrushingtoinvestigatedatapoints/">Using brushing to investigate data points</a>.)</p>
<p><img alt="The magic of brushing" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/8de770baa50a4f6b91449713c3b99f66/Image/eaec1e02090b1d9eca7f81ea3b897e60/matrix_plot_of_autospecs__brushed_retail_outlier.jpg" style="width: 612px; height: 448px; marginleft: 20px; marginright: 20px;" /></p>
<p>The brushing palette shows that the outlier is in row 10 of the worksheet. The point for this observation is highlighted in each plot of the matrix. So you can quickly tell, for example, that even though you may have to ransack your kid's college fund to afford this beauty, at least he or she will enjoy the extra passenger room afforded by this luxury vehicle. And they are assured to arrive at their non-college-campus destinations in one piece because this vehicle gets the highest safety rating. However, you may have to pass the hat for gas because it looks like this baby is always thirsty.</p>
<p>Among its other virtues, the high price tag has the added effect of squishing the data for the other vehicles into the low end of the scale and thus making the graph harder to read. Now that I've scratched this rig off my wish list, let's go ahead and remove it from the plot. <span style="lineheight: 1.6;">Again, we use the <strong>Ctrl + E</strong> trick to reopen the dialog box. This time we click the <strong>Data Options</strong> button and specify to exclude row 10 from the graph: </span></p>
<p><img alt="Exclude row 10 from the matrix" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/8de770baa50a4f6b91449713c3b99f66/Image/f4eeef7df7fffefe4288244b5eef93d5/exclude_row_10.jpg" style="width: 288px; height: 200px; marginleft: 20px; marginright: 20px;" /></p>
<p><img alt="Matrix plot without row 10" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/8de770baa50a4f6b91449713c3b99f66/Image/44ac805fd7bed5e51b7c0dcc960ee3f9/matrix_plot_of_autospecs__sans_row_10.jpg" style="lineheight: 20.8px; width: 576px; height: 384px; marginleft: 20px; marginright: 20px;" /></p>
<p>Without the gas-guzzling outlier in the picture, it becomes clear that there is another outlier in town. One of the vehicles has an unusually low interior volume. Again, we can brush this point to see what's going on. </p>
<p><img alt="Brushed outlier in Volume" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/8de770baa50a4f6b91449713c3b99f66/Image/c739422fc78fbd3425b14d7b68a827ac/matrix_plot_of_autospecs__brushed_volume_outlier.jpg" style="width: 576px; height: 384px; marginleft: 20px; marginright: 20px;" /></p>
<p>Brushing shows that this vehicle is about average on the other measures. It doesn't cost less than the others and doesn't seem to get better mileage; it's just cramped on the inside. Not a big selling point. Let's remove this point as well. (This vehicle is in row 15.)</p>
<p><img alt="Final matrix, no outliers" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/8de770baa50a4f6b91449713c3b99f66/Image/7943ee11f929353ec2e4c9fc45df8c64/matrix_plot_of_autospecs__sans_rows_10__15.jpg" style="width: 576px; height: 384px; marginleft: 20px; marginright: 20px;" /></p>
<p>Without the outliers, the overall picture becomes still clearer. In general, it looks like more money does <em>not </em>buy you better gas mileage. The negative relationship between price and mileage is clear for both hybrid and<span style="lineheight: 1.6;"> gas-only vehicles. However, more money <em>does </em>seem to buy you more space. It looks like there is a positive relationship between price and interior volume and between price and cargo volume. </span><span style="lineheight: 1.6;">Bigger vehicles are heavier and generate more wind resistance, so no wonder the more expensive vehicles tend to get worse gas mileage. </span></p>
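<p>The direction of the relationships read off the scatterplots can be backed up with a correlation coefficient. Here is a minimal Python sketch of the Pearson correlation; the five rows of vehicle data are hypothetical stand-ins, not values from the AutoSpecs worksheet.</p>

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical rows, NOT values from the AutoSpecs worksheet.
retail = [18000, 22000, 27000, 33000, 41000]
total_mpg = [75, 68, 60, 55, 47]
interior_volume = [90, 95, 99, 104, 110]

print("price vs mileage:", round(pearson(retail, total_mpg), 2))
print("price vs volume: ", round(pearson(retail, interior_volume), 2))
```

<p>A negative coefficient corresponds to a downward-sloping cloud in the matrix plot and a positive one to an upward-sloping cloud. The plot remains the better first step, because a single number would have hidden the outliers removed above.</p>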
<div><img alt="Pill plots" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/8de770baa50a4f6b91449713c3b99f66/Image/bb0b8cd012900583bee2e05495d647e5/pill_plots.jpg" style="lineheight: 20.8px; width: 144px; height: 96px; float: right; margin: 10px;" /></div>
<div>
<p>I think you'll agree that we have learned a lot about these data since we first entered the matrix just a few mouse clicks ago. <span style="lineheight: 20.8px;">No doubt more time in the matrix will reveal even more insights. A</span><span style="lineheight: 1.6;">ren't you glad you chose the red pill? </span></p>
<p> </p>
<a name="1"></a> <a name="2"></a>
Notes
<p><em>1. The Matrix Plot dialog box featured in this post has been embellished for the purpose of dramatizing this reenactment. In real life, Minitab dialog boxes do not feature pills, or pharmaceutical agents of any kind. No actual dialog boxes or buttons were harmed during the making of this blog post. <a href="#return">[return]</a></em></p>
<p><em>2. OK, so you can't really use a matrix plot to actually change the data in the worksheet. But you *can* use the matrix plot to change how *you see* the data and enable you to reveal more of your data secrets. And isn't that what's important? <a href="#return">[return]</a></em></p>
Acknowledgements
<p><em style="lineheight: 1.6;">Credit for the <a href="https://commons.wikimedia.org/w/index.php?curid=34979655" target="_blank">original pill images</a> goes to W.carter. Pills and <a href="https://pixabay.com/en/steakmeatdiningdinner597949/" target="_blank">steak dinner</a> available under Creative Commons License 2.0 and Creative Commons License 1.0 respectively.</em></p>
<p> </p>
</div>
Data Analysis
Statistics
Tue, 14 Jun 2016 12:00:00 +0000
http://blog.minitab.com/blog/dataanalysisandqualityimprovementandstuff/thematrixitsacomplexplot
Greg Fox

Fitting an ARIMA Model
http://blog.minitab.com/blog/startingoutwithstatisticalsoftware/fittinganarimamodel
<p>Time series data is proving to be very useful these days in a number of different industries. However, fitting a specific model is not always a straightforward process. It requires a good look at the series in question, and possibly trying several different models before identifying the best one. So how do we get there? In this post, I'll take a look at how we can examine our data and get a feel for what models might work in a particular case. </p>
How Does a Time Series Work?
<p>The first thing to note is how time series work in general, and how those concepts apply to fitting the ARIMA model we're going to create.</p>
<p>In general, there are two things we look at when trying to fit a <span><a href="http://blog.minitab.com/blog/realworldqualityimprovement/lookingatpastweatherdatawithminitabtimeseriesplots">time series</a></span> model. One is past values, which is what we use in AR (autoregressive) models. Essentially, we predict what our next point would be based on looking at a certain number of past points. An AR(1) model would forecast future values by looking at 1 past value.</p>
<p>The second thing we can look at is past prediction errors. These are called MA (<span><a href="http://blog.minitab.com/blog/understandingstatistics/theghostpatternahauntingcautionarytaleaboutmovingaverages">moving average</a></span>) models, and an MA(1) model would be predicting future values using 1 past prediction error.</p>
<p>Both of these concepts make sense individually; they're just different approaches to how we predict future points. An ARIMA model uses both of these ideas and allows us to fit one nice model that looks at both past values <em>and </em>past prediction errors. </p>
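<p>To see how past values and past prediction errors combine, here is a minimal Python sketch of one-step-ahead forecasting with one AR term and one MA term. The coefficients <code>phi</code> and <code>theta</code>, the constant <code>c</code>, and the short series are hypothetical. The plus sign on the MA term is one common convention; some software writes the same model with a minus sign, so treat the sign as an assumption.</p>

```python
def arma11_one_step(series, c, phi, theta):
    """One-step-ahead forecasts from y_t = c + phi*y_(t-1) + theta*e_(t-1) + e_t.

    Returns (forecasts, errors, next_forecast): forecasts and errors cover
    every point after the first; next_forecast is for the point after the
    end of the series.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    forecasts, errors = [], []
    prev_error = 0.0  # no prediction error is known before the first forecast
    for prev_value, actual in zip(series, series[1:]):
        forecast = c + phi * prev_value + theta * prev_error
        prev_error = actual - forecast  # this becomes the "past error"
        forecasts.append(forecast)
        errors.append(prev_error)
    next_forecast = c + phi * series[-1] + theta * prev_error
    return forecasts, errors, next_forecast


# Hypothetical data and coefficients, for illustration only.
y = [10.0, 12.0, 11.0]
print(arma11_one_step(y, c=0.0, phi=0.5, theta=0.5))
print(arma11_one_step(y, c=0.0, phi=0.5, theta=0.0))  # AR(1) only
print(arma11_one_step(y, c=0.0, phi=0.0, theta=0.5))  # MA(1) only
```

<p>Setting <code>theta</code> to 0 leaves a pure AR(1) forecast, and setting <code>phi</code> to 0 leaves a pure MA(1) forecast, which is why the combined model can be read as the two approaches added together.</p>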
Example of Fitting a Time Series Model
<p>So let's take a look at an example and see if we can't fit a model. I've randomly created some time series data, and the first thing to do is simply to plot it and see what's happening. Here, I've plotted my series:</p>
<p style="marginleft: 40px;"><img alt="tsplot" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/732ead3410054470b034d7f8b87fabcf/Image/7c4615c2a3906b6eaa6c7a59a4fe7713/time_series_plot.jpg" style="width: 600px; height: 400px;" /></p>
<p>Here are some things to look for. First, a key assumption with these models is that our series has to be stationary. A stationary time series is one whose mean and variance are constant over time. In our case, it's clear that our mean is <em>not </em>constant over time—it's decreasing.</p>
<p>To resolve this, we can take a first difference of our data, and investigate <em>that</em>. In <a href="http://www.minitab.com/products/minitab">Minitab</a>, this can be done by going to <strong>Stat > Time Series > Differences</strong> and taking a difference of lag 1. (This means that we are subtracting each data point from the one that follows it.) </p>
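<p>The differencing step itself is simple enough to sketch in a few lines of Python. The short decreasing series here is hypothetical, and the function only illustrates the calculation described above.</p>

```python
def difference(series, lag=1):
    """Subtract each point from the one `lag` steps after it."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return [later - earlier for earlier, later in zip(series, series[lag:])]


# Hypothetical series with a downward trend.
y = [10, 8, 7, 4]
print(difference(y))  # [-2, -1, -3]
```

<p>The differenced series is one point shorter than the original for a lag of 1, and a steady downward trend turns into values that hover around a constant negative level.</p>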
<p>When we plot this lag 1 difference data, we can see it is now stationary:</p>
<p style="marginleft: 40px;"><img alt="first diff" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/732ead3410054470b034d7f8b87fabcf/Image/eaf19475813233df977889ad10097847/tsdiff.jpg" style="width: 600px; height: 400px;" /></p>
<p>It took one difference to make our data stationary, so we now have one piece of our ARIMA model, the "I", which stands for "Integrated." We know that we have an ARIMA(p,1,q). Now, how do we find the AR term (p) and the MA term (q)? To do that, we need to dive into two plots, namely the ACF and PACF, and this is where it gets tricky.</p>
Interpreting ACF and PACF Plots
<p>The ACF stands for Autocorrelation function, and the PACF for Partial Autocorrelation function. Looking at these two plots together can help us form an idea of what models to fit. Autocorrelation computes and plots the autocorrelations of a time series. Autocorrelation is the correlation between observations of a time series separated by <em>k </em>time units.</p>
<p>Similarly, partial autocorrelations measure the strength of relationship with other terms being accounted for, in this case other terms being the intervening lags present in the model. For example, the partial autocorrelation at lag 4 is the correlation at lag 4, accounting for the correlations at lags 1, 2, and 3. To generate these plots in Minitab, we go to <strong>Stat > Time Series > Autocorrelation</strong> or <strong>Stat > Time Series > Partial Autocorrelation</strong>. I've generated these plots for our simulated data below:</p>
<p style="marginleft: 40px;"><img alt="acf" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/732ead3410054470b034d7f8b87fabcf/Image/666e255a1a9b13b076a4f6178e107759/acf_pacf.jpg" style="width: 600px; height: 400px;" /></p>
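<p>For readers who want to see what sits behind the two plots, here is a minimal Python sketch of the sample autocorrelation and of the partial autocorrelation computed from it with the Durbin-Levinson recursion. It uses the standard textbook formulas and a tiny hypothetical series; it is not claimed to match Minitab's output digit for digit.</p>

```python
def acf(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag."""
    n = len(series)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    dev = [y - mean for y in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant")
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def pacf(series, max_lag):
    """Partial autocorrelations at lags 1 .. max_lag (Durbin-Levinson)."""
    r = [1.0] + acf(series, max_lag)  # r[k] is the autocorrelation at lag k
    phi_prev = []  # coefficients of the AR(k-1) fit from the previous step
    result = []
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = r[1]
            phi_curr = [phi_kk]
        else:
            num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
            phi_kk = num / den
            # Update the lower-order coefficients, then append the new one.
            phi_curr = [
                phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)
            ] + [phi_kk]
        result.append(phi_kk)
        phi_prev = phi_curr
    return result


# Hypothetical series, for illustration only.
y = [1, 2, 3, 4, 5]
print(acf(y, 2))
print(pacf(y, 2))
```

<p>The partial autocorrelation at lag 1 always equals the autocorrelation at lag 1; from lag 2 onward, the recursion removes the part of the correlation already explained by the shorter lags.</p>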
<p>So what do these plots tell us? They each show a clear pattern, but how does that pattern help us to determine what our p and q values will be? Let's notice our patterns. Our PACF slowly tapers to 0, although it has two spikes at lags 1 and 2. On the other side, our ACF shows a tapering pattern, with lags slowly degrading towards 0. The table below can be used to help identify patterns, and what model conclusions we can make about those patterns. </p>
<table>
<tr><th>ACF Pattern</th><th>PACF Pattern</th><th>Conclusion</th></tr>
<tr><td>Tapers to 0 in some fashion</td><td>Nonzero values at first p points; zero values elsewhere</td><td>AR(p) model</td></tr>
<tr><td>Nonzero values at first q points; zero values elsewhere</td><td>Tapers to 0 in some fashion</td><td>MA(q) model</td></tr>
<tr><td>Values that remain close to 1, no tapering off</td><td>Values that remain close to 1, no tapering off</td><td>Symptoms of a nonstationary series. Differencing is most likely needed.</td></tr>
<tr><td>No significant correlations</td><td>No significant correlations</td><td>Random series</td></tr>
</table>
<p><span style="lineheight: 1.6;">If a model contains both AR and MA terms, the interpretation gets trickier. In general, both will taper off to 0. There may still be spikes in the ACF and/or PACF which could lead you to try AR and MA terms of that quantity. However, it usually helps to try a few different models, and based on model diagnostics, choose which one fits best. </span></p>
<p><span style="lineheight: 1.6;">In this case, I used simulated data, so I know the best fit for my model is going to be an ARIMA(1,1,1). However, with real-world data, the answer may not be so obvious, and thus many models may have to be considered before landing on a single choice. </span></p>
<p><span style="lineheight: 1.6;">In my next post, I'll go over some diagnostic measures we can compare between models to see which gives us the best fit. </span></p>
<p> </p>
Data Analysis
Statistics
Mon, 13 Jun 2016 12:01:00 +0000
http://blog.minitab.com/blog/startingoutwithstatisticalsoftware/fittinganarimamodel
Eric Heckman

Poisson Data: Examining the Number of Deaths in an Episode of Game of Thrones
http://blog.minitab.com/blog/thestatisticsgame/poissondataexaminingthenumberdeathsinanepisodeofgameofthrones
<p><img alt="Game of Thrones" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/fe2c58f624104b6fb687d378929b1f9b/Image/d11b4341996f340e24132eb12253d8e5/game_of_thrones.jpg" style="float: right; width: 250px; height: 141px; margin: 10px 15px; borderwidth: 1px; borderstyle: solid;" />There may not be a situation more perilous than being a character on <a href="http://www.hbo.com/gameofthrones" target="_blank"><em>Game of Thrones</em></a>. Warden of the North, Hand of the King, and apparent protagonist of the entire series? Off with your head before the end of the first season! Last male heir of a royal bloodline? Here, have a pot of molten gold poured on your head! Invited to a wedding? Well, you probably know what happens at weddings in the show. </p>
<p>So what do all these gruesome deaths have to do with statistics? They are data that come from a <a href="http://blog.minitab.com/blog/funwithstatistics/poissonprocessesandprobabilityofpoop">Poisson distribution</a>.</p>
<p>Data from a Poisson distribution describe the number of times an event occurs in a finite observation space. For example, a Poisson distribution can describe the number of defects in the mechanical system of an airplane, the number of calls to a call center, or in our case it can describe the number of deaths in an episode of Game of Thrones.</p>
Goodness-of-Fit Test for Poisson
<p>If you're not certain whether your data follow a Poisson distribution, you can use <a href="http://www.minitab.com/enus/products/minitab/" target="_blank">Minitab Statistical Software</a> to perform a goodness-of-fit test. If you don't already use Minitab and you'd like to follow along with this analysis, download the <a href="http://www.minitab.com/products/minitab/freetrial/">free 30-day trial</a>.</p>
<p>I collected the <a href="http://genius.com/Gameofthroneslistofgameofthronesdeathsannotated" target="_blank">number of deaths for each episode</a> of Game of Thrones (as of this writing, 57 episodes have aired), and put them in a Minitab worksheet. Then I went to <strong>Stat > Basic Statistics > Goodness-of-Fit Test for Poisson </strong>to determine whether the data follow a Poisson distribution. You can get the data I used <a href="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/f73acb13fa520a25583149f8b780a31c/game_of_thrones_deaths.mtw">here</a>. </p>
<p style="marginleft: 40px;"><img alt="GoodnessofFit Test for Poisson Distribution " src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/fe2c58f624104b6fb687d378929b1f9b/Image/0c9dcb9ecb6eb644109d86e3501143b3/gof_test_poisson.jpg" style="width: 492px; height: 417px;" /></p>
<p>Before we interpret the p-value, we see that we have a problem. Three of the categories have an expected value less than 5. If the expected value for any category is less than 5, the results of the test may not be valid. To fix our problem, we can combine categories to achieve the minimum expected count. In fact, we see that Minitab actually already started doing this by combining all episodes with 7 or more deaths.</p>
<p>So we'll just continue by making the highest category 6 or more deaths, and the lowest category 1 or 0 deaths. To do this, I created a new column with the categories 1, 2, 3, 4, 5 and 6. Then I made a frequency column that contained the number of occurrences for each category. For example, the "1" category is a combination of episodes with 0 deaths and 1 death, so there were 15 occurrences. Then I ran the analysis again with the new categories.</p>
<p style="marginleft: 40px;"><img alt="GoodnessofFit Test for Poisson Distribution " src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/93551e38ce5c4cc5321c249fee184e24/gof_test_poisson_2.jpg" style="width: 420px; height: 323px;" /></p>
<p>Now that all of our categories have expected counts greater than 5, we can examine the p-value. If the p-value is less than the significance level (usually 0.05 works well), you can conclude that the data do not follow a Poisson distribution. But in this case the p-value is 0.228, which is greater than 0.05. Therefore, we cannot conclude that the data do not follow the Poisson distribution, and can continue with analyses that assume the data follow a Poisson distribution. </p>
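<p>If you're curious what the goodness-of-fit test is doing behind the scenes, the sketch below computes expected counts under a Poisson distribution, flags any that fall below 5, and calculates the Pearson chi-square statistic. The observed counts and the mean are hypothetical (they are not the Game of Thrones data), and the function names are my own. Turning the statistic into a p-value would also require a chi-square distribution, typically with the number of categories minus 2 degrees of freedom when the mean is estimated from the data.</p>

```python
import math


def poisson_pmf(k, mean):
    """P(X = k) for a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def expected_counts(mean, n_obs, top):
    """Expected counts for categories 0, 1, ..., top-1 and 'top or more'."""
    probs = [poisson_pmf(k, mean) for k in range(top)]
    probs.append(1.0 - sum(probs))  # tail probability: top or more events
    return [n_obs * p for p in probs]


def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical observed counts for 0, 1, 2, 3, and "4 or more" events.
observed = [6, 14, 13, 9, 8]
n_obs = sum(observed)
# Mean assumed known here; in practice it is estimated from the raw data.
expected = expected_counts(mean=2.0, n_obs=n_obs, top=4)
too_small = [e for e in expected if e < 5]  # categories that need merging
stat = chi_square_statistic(observed, expected)
print(too_small, round(stat, 3))
```

<p>If <code>too_small</code> is not empty, neighboring categories should be combined and the expected counts recomputed before interpreting the statistic.</p>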
Confidence Interval for 1-Sample Poisson Rate
<p>When you have data that come from a Poisson distribution, you can use <strong>Stat > Basic Statistics > 1-Sample Poisson Rate</strong> to get a rate of occurrence and calculate a range of values that is likely to include the population rate of occurrence. We'll perform the analysis on our data.</p>
<p style="marginleft: 40px;"><img alt="1Sample Poisson Rate" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/fe2c58f624104b6fb687d378929b1f9b/Image/259b9b0cb11fed7e5b7467703f7037ad/1_poisson_rate.jpg" style="width: 489px; height: 133px;" /></p>
<p>The rate of occurrence tells us that on average there are about 3.2 deaths per episode on <em>Game of Thrones</em>. If our 57 episodes were a sample from a much larger population of <em>Game of Thrones</em> episodes, the confidence interval would tell us that we can be 95% confident that the population rate of deaths per episode is between 2.8 and 3.7.</p>
<p>The length of observation lets you specify a value to represent the rate of occurrence in a more useful form. For example, suppose instead of deaths per episode, you want to determine the number of deaths per season. There are 10 episodes per season. So because an individual episode represents 1/10 of a season, 0.1 is the value we will use for the length of observation. </p>
<p style="marginleft: 40px;"><img alt="1Sample Poisson Rate" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/fe2c58f624104b6fb687d378929b1f9b/Image/b6fa9d2e740aacc86d4223ea75487d95/1_poisson_rate_season.jpg" style="width: 495px; height: 106px;" /></p>
<p>With a different length of observation, we see that there are about 32 deaths per season with a confidence interval ranging from 28 to 37.</p>
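<p>The effect of the length of observation is just a rescaling, which a few lines of Python can illustrate. The helper name <code>rescale_rate</code> is hypothetical, and the inputs are the rounded values quoted above, so the results only approximate Minitab's output.</p>

```python
def rescale_rate(rate, length_of_observation):
    """Convert a rate per observation into a rate per larger unit."""
    return rate / length_of_observation


# Values reported above: rate per episode and its 95% confidence limits.
per_episode = {"rate": 3.2, "lower": 2.8, "upper": 3.7}
# One episode is 1/10 of a season, so the length of observation is 0.1.
per_season = {name: round(rescale_rate(value, 0.1), 1)
              for name, value in per_episode.items()}
print(per_season)
```

<p>Both the estimate and the confidence limits are divided by the same value, so the interval keeps its shape and only the units change.</p>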
Poisson Regression
<p>The last thing we'll do with our Poisson data is perform a regression analysis. In Minitab, go to <strong>Stat > Regression > Poisson Regression > Fit Poisson Model</strong> to perform a Poisson regression analysis. We'll look at whether we can use the episode number (1 through 10) to predict how many deaths there will be in that episode.</p>
<p style="marginleft: 40px;"><img alt="Poisson Regression" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/fe2c58f624104b6fb687d378929b1f9b/Image/0540d6716d13c4de50421155038b2c03/poisson_regression.jpg" style="width: 402px; height: 238px;" /></p>
<p>The first thing we'll look at is the p-value for the predictor (episode). The p-value is 0.042, which is less than 0.05, so we can conclude that there is a statistically significant association between the episode number and the number of deaths. However, the Deviance R-Squared value is only 18.14%, which means that the episode number explains only 18.14% of the variation in the number of deaths per episode. So while an association exists, it's not very strong. Even so, we can use the coefficients to determine how the episode number affects the number of deaths. </p>
<p style="marginleft: 40px;"><img alt="Poisson Regression" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/fe2c58f624104b6fb687d378929b1f9b/Image/adb7514fd7892c3b8591895321c96918/poisson_regression_2.jpg" style="width: 241px; height: 227px;" /></p>
<p>The episode number was entered as a categorical variable, so the coefficients show how each episode number affects the number of deaths relative to episode number 1. A positive coefficient indicates that episode number is likely to have more deaths than episode 1. A negative coefficient indicates that episode number is likely to have fewer deaths than episode 1.</p>
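<p>To put a coefficient on a more intuitive scale, you can exponentiate it. Assuming the model uses the natural log link, the result is the factor by which the expected number of deaths changes relative to episode 1. The coefficients in this sketch are hypothetical and are not taken from the output above.</p>

```python
import math


def rate_ratio(coefficient):
    """Multiplicative change in the expected count for a log-link Poisson model."""
    return math.exp(coefficient)


# Hypothetical coefficients relative to the reference level (episode 1).
example_coefficients = {"episode 5": -0.20, "episode 9": 0.50}
for level, coef in example_coefficients.items():
    print(level, round(rate_ratio(coef), 2))
```

<p>A ratio above 1 corresponds to a positive coefficient (more expected deaths than episode 1), and a ratio below 1 corresponds to a negative coefficient.</p>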
<p>We see that each season usually starts slow, as 7 of the 9 episode numbers have positive coefficients. Episodes 8, 9, and 10 have the highest coefficients, meaning relative to the first episode of the season they have the greatest number of deaths. So even though our model won't be great at predicting the exact number of deaths for each episode, it's clear that the show ends each season with a bang.</p>
<p>And considering episode 8 of the current season airs this Sunday, if you're a <em>Game of Thrones</em> viewer you should brace yourself, because death is coming. Or, as they would say in Essos:</p>
<p><em>Valar morghulis.</em></p>
Data Analysis
Fun Statistics
Statistics
Statistics in the News
Fri, 10 Jun 2016 12:03:00 +0000
http://blog.minitab.com/blog/thestatisticsgame/poissondataexaminingthenumberdeathsinanepisodeofgameofthrones
Kevin Rudy

A Six Sigma Healthcare Project, part 4: Predicting Patient Participation with Binary Logistic ...
http://blog.minitab.com/blog/understandingstatistics/asixsigmahealthcareproject%2Cpart4%3Apredictingpatientparticipationwithbinarylogisticregression
<p>By looking at the data we have about 500 cardiac patients, we've learned that easy access to the hospital and good transportation are key factors influencing participation in a rehabilitation program.</p>
<p><span style="lineheight: 20.8px;"><img alt="monitor" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/96f126b828fb05a099854d278cfba6eb/monitor.jpg" style="margin: 10px 15px; float: right; width: 296px; height: 212px;" />Past data shows that each month, about 15 of the patients discharged after cardiac surgery do not have a car. Providing transportation to the hospital might make these patients more likely to join the rehabilitation program, but the costs of such a service </span><span style="lineheight: 1.6;">can't exceed the potential revenue from participation. </span></p>
<p><span style="lineheight: 1.6;">We can use </span><a href="http://blog.minitab.com/blog/understandingstatistics/asixsigmahealthcareprojectpart3creatingbinarylogisticregressionmodelsforpatientparticipation" style="lineheight: 1.6;">the binary logistic regression model developed in part 3</a><span style="lineheight: 1.6;"> to predict probabilities of participation, to identify where </span><span style="lineheight: 20.8px;">transportation assistance</span><span style="lineheight: 1.6;"> might make the biggest impact, and to develop an estimate of how much we could invest in such assistance. </span></p>
<p>Download the <a href="//cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/521a4efabc5f892eef403e8c4a354f9a/cardiacparticipationdata_1.mtw">data set</a> to follow along and try these analyses yourself. If you don't already have Minitab, you can <a href="http://www.minitab.com/products/minitab/freetrial/">download and use our statistical software free for 30 days</a>.</p>
Using the Regression Model to Predict Patient Participation
<p>We want to develop some estimates of the probability of participation based on whether or not a patient has access to transportation. The first step is to make some mesh data representing our population. In Minitab, go to <strong>Calc > Create Mesh Data...</strong>, and complete the dialog box as shown below. (The maximum and minimum ranges for Age and Distance are drawn directly from the descriptive statistics for the sample data we used to create our regression model.) </p>
<p style="marginleft: 40px;"><img alt="Make Mesh Data Dialog" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/b8437a0b42a63b84e2dcdd65281a3eef/make_mesh_data.png" style="width: 484px; height: 340px;" /></p>
<p>When you press OK, Minitab adds 2 new columns to the worksheet that contain the 200 different combinations of the levels of these factors. Now we'll add two additional columns, one representing patients who have access to a car, and one representing those who don't. Now our worksheet should include four columns of data as shown:</p>
<p style="marginleft: 40px;"><img alt="mesh data in worksheet" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/68996051dec970430f44a13696661e02/mesh_data_worksheet.png" style="width: 255px; height: 191px;" /></p>
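<p>Conceptually, mesh data is every combination of evenly spaced values of two variables. The Python sketch below shows the idea; the ranges and the 20 x 10 grid are assumptions chosen only so that the grid has 200 points, and the real settings are the ones entered in the dialog box.</p>

```python
from itertools import product


def make_mesh(start_a, stop_a, n_a, start_b, stop_b, n_b):
    """Return every (a, b) pair on an evenly spaced n_a x n_b grid (n >= 2)."""
    def linspace(start, stop, n):
        step = (stop - start) / (n - 1)
        return [start + i * step for i in range(n)]
    return list(product(linspace(start_a, stop_a, n_a),
                        linspace(start_b, stop_b, n_b)))


# Hypothetical ranges and grid sizes; the real values come from the dialog.
mesh = make_mesh(30, 90, 20, 0, 45, 10)  # (age, distance) pairs
car = ["Car"] * len(mesh)       # mobility column for patients with a car
no_car = ["NoCar"] * len(mesh)  # mobility column for patients without one
print(len(mesh))  # 200
```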
<p>Now we'll go to <strong>Stat > Regression > Binary Logistic Regression > Predict...</strong> Minitab remembers the last regression model that was run; to make sure it's the right one, click the "View Model..." button...</p>
<p style="marginleft: 40px;"><img alt="view model" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/1b1f96f9fcd549add0aca00a958137d1/view_model.png" style="width: 232px; height: 93px;" /></p>
<p>and confirm that the model displayed is the correct one.</p>
<p style="marginleft: 40px;"><img alt="view model" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/fb09621305a2dcc0c898c7ac90eaa79d/view_model.png" style="width: 600px; height: 267px;" /></p>
<p>Next, press the "Predict" button and complete the dialog box using the mesh variables we created, as shown. We can also press the "Storage" button to tell Minitab to store the Fits (the predicted probabilities) for each data point in the worksheet. Note that the column selected for the Mobility term is "Car," so all of these predictions will be based on the equation for patients who have access to a vehicle. </p>
<p style="marginleft: 40px;"><img alt="regression prediction dialog" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/a7d5fe724e64117e1a26d18e21203d5a/prediction_dialog_1.png" style="lineheight: 20.8px; width: 819px; height: 392px;" /></p>
<p>When you click <strong>OK</strong> through all dialogs, Minitab will add a column of data that shows the predicted probability of participation for patients, assuming they have a vehicle. </p>
<p>Now we'll create the predictions for individuals who don't have cars. Press <strong>CTRL+E</strong> to edit the previous dialog box. This time, for the Mobility column, select "NoCar."</p>
<p style="marginleft: 40px;"><img alt="no car" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/f7076a96b1cd37b6d8d38e7e7138631c/prediction_dialog_2.png" style="width: 307px; height: 68px;" /></p>
<p>When you press OK, Minitab recalculates the probabilities for the patients, this time using the equation that assumes they do not have a vehicle. The probabilities of participation for each data point are stored in two columns in the worksheet, which I've renamed PFITSCar and PFITSNo car. </p>
<p style="marginleft: 40px;"><img alt="pfits" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/165b5cbca4f6edb53c10507d49d4438b/pfits_in_worksheet.png" style="width: 404px; height: 306px;" /></p>
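<p>Each stored fit is the model's linear predictor passed through the logistic function. The sketch below shows that calculation for one patient with and without a car. The intercept and coefficients are hypothetical; they are not the fitted model from part 3, which also includes interaction terms and standardized predictors that this simplified version leaves out.</p>

```python
import math


def predicted_probability(intercept, coefs, values):
    """Logistic regression fit: 1 / (1 + exp(-(b0 + sum of b_i * x_i)))."""
    eta = intercept + sum(coefs[name] * values[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients for illustration only.
coefs = {"age": -0.03, "distance": -0.02, "has_car": 1.5}
patient = {"age": 65, "distance": 20}
p_car = predicted_probability(1.0, coefs, {**patient, "has_car": 1})
p_no_car = predicted_probability(1.0, coefs, {**patient, "has_car": 0})
print(round(p_car, 3), round(p_no_car, 3), round(p_car - p_no_car, 3))
```

<p>The last printed value is the kind of difference in probabilities that the next section examines across the whole mesh.</p>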
Where Can Providing Transportation Make an Impact?
<p>Now we have estimated probabilities of participation for patients with the same age and distance characteristics, both with and without access to a vehicle. It would be helpful to visualize the differences in these probabilities to see where offering transportation might make the biggest impact in increasing participation rates.</p>
<p>First, we'll use Minitab's calculator to compute the difference in probabilities between having and not having a car. Go to <strong>Calc > Calculator...</strong> and complete the dialog as shown: </p>
<p style="marginleft: 40px;"><img alt="calculator" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/5e7a07e8f7f41d8569535006f9d5debd/calculator.png" style="width: 433px; height: 383px;" /></p>
<p>Now we have a column of data named "Car - NoCar" that contains the probability difference for patients with the same age and distance characteristics both with and without a vehicle. We can use that column to create a contour plot that offers additional insight into the relationships between the likelihood of participation in the rehabilitation program and a patient's age, distance, and mobility. Select <strong>Graph > Contour Plot...</strong> and complete the dialog as shown: </p>
<p style="marginleft: 40px;"><img alt="contour plot dialog box" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/648a12ddd172dee236c96cccc0c1d0bc/contour_plot_dialog.png" style="width: 531px; height: 344px;" /></p>
<p>Minitab produces this contour plot (we have edited the range of colors from the default):</p>
<p style="marginleft: 40px;"><img alt="contour plot" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/132dc089b5f7a9c2fb849f5427b1c927/contourplot.png" style="width: 576px; height: 384px;" /></p>
<p>From this plot we can see the patients for whom transportation assistance is likely to make the most impact. These are the patients whose age and distance characteristics fall within the dark-red-colored area, where access to a vehicle raises the probability of participation by more than 40 percent.</p>
<p>The hospital <em>could </em>use this information to carefully target potential recipients of transportation assistance, but doing so would raise many ethical issues. Instead, the hospital will offer transportation assistance to any potential participant who needs it. The project team decides to calculate the average probability of participation for all patients without access to a vehicle.</p>
<p>To obtain that average, select <strong>Stat > Basic Statistics > Display Descriptive Statistics...</strong> in Minitab, and choose "PFITSNoCar" as the variable. Click on the "Statistics" button to make sure the Mean is among the descriptive statistics being calculated, and click OK. Minitab will display the descriptive statistics you've selected in the Session Window. </p>
<p style="marginleft: 40px;"><img alt="descriptive statistics" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/e14f812d91728a9e17b8ae2dde0d8f30/pfits_nocar_mean.png" style="width: 290px; height: 89px;" /></p>
<p>According to our binary logistic regression model, the average probability of participation for all patients without a car equals 0.1695, which we will round up to .17. Now we can easily calculate an estimated breakeven point for ensuring transport for patients who need it. We have the following information on hand: </p>
<div style="marginleft:40px;">
<table>
<tr><td>Patients per month without a car</td><td>15</td></tr>
<tr><td>Average probability of participation without a car</td><td>.17</td></tr>
<tr><td>Average number of sessions per participant</td><td>29</td></tr>
<tr><td>Revenue per session</td><td>$23</td></tr>
</table>
</div>
<p>Based on these figures, a per-patient maximum for transportation can be calculated as:</p>
<p style="marginleft: 40px;">.17 probability of participation x 29 sessions x $23 per session = $113.39</p>
<p><span style="lineheight: 1.6;">Since about 15 discharged cardiac patients each month do not have a car, we can invest at most 15 x $113.39 = $1700.85/month in transportation assistance. </span></p>
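<p>The breakeven arithmetic above is easy to check with a few lines of Python. The function name is my own; the inputs are the figures listed above.</p>

```python
def breakeven_per_patient(p_participation, sessions, revenue_per_session):
    """Expected revenue from one patient, which caps the per-patient spend."""
    return p_participation * sessions * revenue_per_session


per_patient = round(breakeven_per_patient(0.17, 29, 23), 2)
monthly_cap = round(15 * per_patient, 2)  # 15 patients per month without a car
print(per_patient, monthly_cap)  # 113.39 1700.85
```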
<span style="lineheight: 1.6;">Implementing Transportation Assistance for Patient Participation</span>
<p>As described in the <a href="http://dx.doi.org/10.1080/08982112.2011.553761" target="_blank">article that inspired this series of posts</a>, the project team evaluated potential improvement options against this economic calculation and developed a process that brought together patients with cars and those without to carpool to sessions. A pilot test of the process proved successful, and most of the carless patients noted that they would not have participated in the rehabilitation program without the service. </p>
<p>After implementing the new carpool process, the project team revisited the key factors they had considered at the start of the initiative: the number of patients enrolling in the program each month, and the average number of sessions participants attended.</p>
<p>After implementing the carpool process, the average number of sessions attended remained constant at 29. But patient participation rose from 33 to 45 per month, which exceeded the project goal of increasing participation to 36 patients per month. Additional revenues turned out to be circa $96,000 annually.</p>
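<p>As a rough check on that revenue figure, we can multiply it out. This sketch assumes the $23 revenue per session used earlier and that each additional participant attends the average 29 sessions; the article's own accounting may differ.</p>

```python
def added_annual_revenue(before, after, sessions, revenue_per_session, months=12):
    """Extra yearly revenue from the increase in monthly participants."""
    return (after - before) * sessions * revenue_per_session * months


extra = added_annual_revenue(before=33, after=45, sessions=29, revenue_per_session=23)
print(extra)  # 96048
```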
Take-Away Lessons from This Project Study
<p>If you've read all four parts of this series, you may recall that at the start of the Six Sigma project, several stakeholders believed that the problem of low participation could be addressed by creating a nicer brochure for the program, and by encouraging surgeons to tell their patients about it at an earlier point in their treatment. </p>
<p>None of those initial ideas wound up being implemented, but the project team succeeded in meeting the project goals by enacting improvements that were supported by their data analysis. For me, this is a core takeaway from this article. </p>
<p>As the authors note, "Often people’s ideas on processes are incorrect, but improvement actions based on these are still being implemented. These actions cause frustrated employees, may not be cost effective, and in the end do not solve the problem."</p>
<p>Thus, the article makes a compelling case for the value of applying data analysis to improve processes in healthcare. "<span style="lineheight: 1.6;">Even when a somewhat more advanced technique like logistic regression modeling is required," the authors write, "exploratory graphics such as boxplots and bar charts point the direction toward a valuable solution."</span></p>
Health Care Quality Improvement
Thu, 09 Jun 2016 12:00:00 +0000
http://blog.minitab.com/blog/understandingstatistics/asixsigmahealthcareproject%2Cpart4%3Apredictingpatientparticipationwithbinarylogisticregression
Eston Martz

A Six Sigma Healthcare Project, part 3: Creating a Binary Logistic Regression Model for Patient ...
http://blog.minitab.com/blog/understandingstatistics/asixsigmahealthcareprojectpart3creatingbinarylogisticregressionmodelsforpatientparticipation
<p>In part 2 of this series, we used graphs and tables to see <a href="http://blog.minitab.com/blog/understandingstatistics/asixsigmahealthcareprojectpart2visualizingtheimpactofindividualfactors">how individual factors affected rates of patient participation</a> in a cardiac rehabilitation program. This initial look at the data indicated that ease of access to the hospital was a very important contributor to patient participation.</p>
<p><img alt="physical therapy facility" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/cf2f4a8979304153c3ea8fd5210215e8/rehab_facility.jpg" style="margin: 10px 15px; float: right; width: 320px; height: 211px;" />Given this revelation, a bus or shuttle service for people who do not have cars might be a good way to increase participation, but only if such a service doesn't cost more than the amount of revenue generated by participation.</p>
<p>A good estimate of the probability that a patient will participate will enable us to calculate the breakeven point for such a service. We can use regression to develop a statistical model that lets us do just that.</p>
<p>We have a binary response variable, because only two outcomes exist: a patient either participates in the rehabilitation program, or does not. To model these kinds of responses, we need to use a statistical method called "Binary Logistic Regression." This may sound intimidating, but it's really not as scary as it sounds, especially with a statistical software package like Minitab.</p>
<p>Download the <a href="//cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/521a4efabc5f892eef403e8c4a354f9a/cardiacparticipationdata_1.mtw">data set</a> to follow along and try these analyses yourself. If you don't already have Minitab, you can <a href="http://www.minitab.com/products/minitab/freetrial/">download and use our statistical software free for 30 days</a>.</p>
Using Stepwise Binary Logistic Regression to Obtain an Initial Model
<p>First, let's review our data. We know the gender, age, and distance from the hospital for 500 cardiac patients. We also know whether or not they have access to a vehicle ("Mobility") and whether or not they participated in the rehabilitation program after their surgery (coded so that 0 = no, and 1 = yes). </p>
<p style="marginleft: 40px;"><img alt="data" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/98b12f3c127d5370169e1eee44679577/data_snapshot_for_blr.png" style="width: 339px; height: 348px;" /></p>
<p>The process of developing a regression equation that can predict a response based on your data is called "Fitting a model." We'll do this in Minitab by selecting <strong>Stat > Regression > Binary Logistic Regression > Fit Binary Logistic Model...</strong> </p>
<p style="marginleft: 40px;"><img alt="Binary Logistic Regression menu" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/4d475b704b654470760bc8fb561d6547/binary_logistic_regression_menu.png" style="width: 633px; height: 367px;" /></p>
<p>In the dialog box, we need to select the appropriate columns of data for the response we want to predict, and the factors we wish to base the predictions on. In this case, our response variable is "Participation," and we're basing predictions on the continuous factors of "Age" and "Distance," along with the categorical factor "Mobility." </p>
<p style="marginleft: 40px;"><img alt="binary logistic regression dialog 1" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/2cefc4466635a596b45a8a7d58472486/binary_logistic_regression_dialog_1.png" style="width: 580px; height: 494px;" /></p>
<p>After selecting the factors, click on the "Model" button. This lets us tell Minitab whether we want to consider interactions and polynomial terms in addition to the main effects of each factor. Complete the Model dialog as shown below. To include the two-way interactions in the model, highlight all the items in the Predictors window, make sure that the “Interactions through order:” dropdown reads “2,” and press the Add button next to it:</p>
<p style="marginleft: 40px;"><img alt="Binary Logistic Regression Dialog 2" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/2687d8f135fa8f4ef8822d47e72e357a/binary_logistic_regression_model_dialog.png" style="width: 520px; height: 542px;" /></p>
<p>Click OK to return to the main dialog, then press the “Coding” button. In this subdialog, we can tell Minitab to automatically standardize the continuous predictors, Age and Distance. There are several reasons you might want to standardize the continuous predictors, and different ways of standardizing depending on your intent.</p>
<p>In this case, we’re going to standardize by subtracting the mean of the predictor from each row of the predictor column, then dividing the difference by the standard deviation of the predictor. This centers the predictors and also places them on a similar scale. This is helpful when a model contains highly correlated predictors and interaction terms, because standardizing helps reduce multicollinearity and improves the precision of the model’s estimated coefficients. To accomplish this, we just need to select that option from the dropdown as shown below:</p>
<p style="marginleft: 40px;"><img alt="Binary Logistic Regression  Coding" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/429d47a6cf7b5a9e63363229512691f2/binary_logistic_regression_coding_dialog.png" style="width: 519px; height: 560px;" /></p>
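<p>The standardization described above is simple to reproduce by hand. Here is a minimal Python sketch, assuming the sample standard deviation is used; the ages are hypothetical and only illustrate the calculation.</p>

```python
from statistics import mean, stdev


def standardize(values):
    """Subtract the mean, then divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical ages, used only to show the calculation.
ages = [52, 60, 65, 71, 77]
z = standardize(ages)
print([round(v, 2) for v in z])
```

<p>After standardizing, the column has a mean of 0 and a standard deviation of 1, which is what puts Age and Distance on a similar scale.</p>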
<p>After you click OK to return to the main dialog, press the "Stepwise" button. We use this subdialog to perform a stepwise selection, which is a technique that automatically chooses the best model for your data. Minitab will evaluate several different models by adding and removing various factors, and select the one that appears to provide the best fit for the data set. You can have Minitab provide details about the combination of factors it evaluates at each "step," or just show the recommended model<span style="lineheight: 1.6;">.</span></p>
<p style="marginleft: 40px;"><span style="lineheight: 1.6;"><img alt="Binary Logistic Regression  stepwise" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/f9a8375d2b429a1b7bd475182b6b1461/binary_logistic_regression_dialog_3.png" style="width: 490px; height: 542px;" /> </span></p>
<p>Now click OK to close the Stepwise dialog, and OK again to run the analysis. The output in Minitab's Session window will include details about each potential model, followed by a summary or "deviance" table for the recommended model.</p>
<span style="lineheight: 1.6;">Assessing and Refining the Regression Model</span>
<p><span style="lineheight: 1.6;">Using software to perform stepwise regression is extremely helpful, but it's always important to check the recommended model to see if it can be refined further. In this case, all of the model terms are significant, and the deviance table's adjusted R2 indicates that the model explains about 40 percent of the observed variation in the response data. </span></p>
<p style="marginleft: 40px;"><img alt="stepwise regression selected model" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/f6a95017c24e7ed0f315471d685a91c6/output_deviance_table.png" style="width: 520px; height: 289px;" /></p>
<p>We also want to look at the table of coded coefficients immediately below the summary. The final column of the table lists the VIFs, or variance inflation factors, for each term in the model. This is important because VIF values greater than 5–10 can indicate unstable coefficients that are difficult to interpret.</p>
<p>None of these terms have VIF values over 10<span style="lineheight: 1.6;">. </span></p>
<p style="marginleft: 40px;"><img alt="variance inflation factors (VIF)" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/bc95899d33a4511e92288e158693e39d/output_coded_coefficients.png" style="width: 307px; height: 182px;" /></p>
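<p>As a rough illustration of what a VIF measures, the sketch below uses the standard definition, 1 / (1 - R<sup>2</sup>), where R<sup>2</sup> comes from regressing one term on the others. To stay short it handles only the two-term case, where R<sup>2</sup> is the squared correlation between the terms. The data and function names are hypothetical, and this is a simplified linear illustration rather than the calculation Minitab performs for this model.</p>

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_pair(x1, x2):
    """VIF shared by both terms in a two-term model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical ages: compare a raw predictor and its square
# with the centered predictor and its square.
age = [40, 50, 60, 70, 80]
age_sq = [a ** 2 for a in age]
centered = [a - 60 for a in age]
centered_sq = [c ** 2 for c in centered]

print("raw VIF:", round(vif_pair(age, age_sq), 1))
print("centered VIF:", round(vif_pair(centered, centered_sq), 1))
```

<p>In this toy example the raw predictor and its square are almost perfectly correlated, so the VIF is far above 10, while centering first brings it down to 1. That is the effect the text attributes to standardizing.</p>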
<p>Minitab also performs goodness-of-fit tests that assess how well the model predicts observed data. The first two tests, the deviance and Pearson chi-squared tests, have high p-values, indicating that these tests do not support the conclusion that this model is a poor fit for the data. However, the low p-value for the Hosmer-Lemeshow test indicates that the model could be improved.</p>
<p style="marginleft: 40px;"><img alt="goodnessoffit tests" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/5bf6a28128eebbed608253c122c3daf6/output_goodness_of_fit_tests.png" style="width: 364px; height: 119px;" /></p>
<p>It may be that our model does not account for curvature that exists in the data. We can ask Minitab to add polynomial terms, which model curvature between predictors and the response, to see if they improve the model. Press Ctrl+E to recall the binary logistic regression dialog box, then press the "Model" button. To add the polynomial terms, select Age and Distance in the Predictors window, make sure that "2" appears in the “Terms through order:” drop-down, and press "Add" to add those polynomial terms to the model. An order 2 polynomial is the square of the predictor.</p>
<p style="marginleft: 40px;"><img alt="binary logistic regression dialog 4" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/d2cd01d9ac0a49a57885801204896885/model_dialog_adding_polynomial_terms.png" style="width: 520px; height: 542px;" /></p>
<p>You may have noticed that we did not select “Mobility” above. Why? Because that categorical variable is coded with 1’s and 0’s, so the polynomial term would be identical to the term that is already in the model.</p>
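<p>A quick check makes the point concrete. The values below are a hypothetical 0/1 column, not the actual Mobility data:</p>

```python
# Hypothetical 0/1 coding for a binary categorical predictor
mobility = [0, 1, 1, 0, 1]

# Squaring leaves every value unchanged, because 0*0 == 0 and 1*1 == 1
mobility_squared = [m ** 2 for m in mobility]

print(mobility_squared == mobility)  # prints True
```

<p>Because the squared column is identical to the original, adding it would give the model no new information.</p>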
<p>Now press OK all the way out to have Minitab evaluate models that include the polynomial terms. Minitab generates the following output:</p>
<p style="marginleft: 40px;"><img alt="binary logistic regression final model" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/ad1c37024ed6de634bb5215f448b2227/model_with_polynomials_deviance_table.png" style="width: 483px; height: 311px;" /></p>
<p>So far, so good: all model terms are significant, and the adjusted R2 indicates that the new model accounts for 51 percent of the observed variation in the response, compared to the initial model’s 40 percent. The coefficients are also acceptable, with no variance inflation factors above 10:</p>
<p style="marginleft: 40px;"><img alt="binarylogisticregressionmodelVIF" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/7935289a46fbd462c1f13d8c901fc94b/model_with_polynomials_coefficients.png" style="width: 313px; height: 188px;" /></p>
<p>However, the VIFs for Mobility and the Distance*Mobility interaction remain higher than desirable:</p>
<p style="marginleft: 40px;"><img alt="VIF" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/173c061a582e9c7e3d00ceba0c65c94d/binary_logistic_regression_model_2_coefficients.png" style="width: 349px; height: 184px;" /></p>
<p>These terms are moderately correlated, but probably not enough to make the regression results unreliable.</p>
<p>The goodness-of-fit tests for this model also look good: the lack of p-values below 0.05 indicates that these tests do not suggest the model is a poor fit for the observed data.</p>
<p style="marginleft: 40px;"><img alt="finalbinarylogisticregressionmodelgoodnessoffittests" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/6d188aa3bda808ab62df9f4ef08f692c/model_with_polynomials_goodness_of_fit_tests.png" style="width: 342px; height: 118px;" /></p>
The Binary Logistic Regression Equations
<p><span style="lineheight: 1.6;">This model seems like the best option for predicting the probability of patient participation in the program. Based on the available data, Minitab has calculated the following regression equations, one that predicts the probability of attendance for people who have access to their own transportation, and one for those who do not: </span></p>
<p style="marginleft: 40px;"><img alt="regression equations" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/25c8ca3c2f4b195e446bf18b75fffee8/model_with_polynomials_regression_equations.png" style="width: 533px; height: 112px;" /></p>
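<p>To show how an equation of this kind turns predictor values into a probability, here is a minimal Python sketch of the logistic transformation. The coefficients and the single "Distance" term are hypothetical placeholders invented for illustration. They are not the values Minitab calculated, which appear only in the output above.</p>

```python
from math import exp


def event_probability(y_prime):
    """Convert a linear predictor Y' into a probability with the logistic function."""
    return exp(y_prime) / (1.0 + exp(y_prime))


def linear_predictor(coefficients, values):
    """Add the constant to the sum of coefficient * value for each term."""
    total = coefficients["Constant"]
    for term, value in values.items():
        total += coefficients[term] * value
    return total


# HYPOTHETICAL coefficients, invented for illustration only
hypothetical = {"Constant": 1.5, "Distance": -0.04}

for km in (10, 40, 80):
    y_prime = linear_predictor(hypothetical, {"Distance": km})
    print(km, round(event_probability(y_prime), 3))
```

<p>With a negative distance coefficient, as assumed here, the predicted probability falls as distance grows. Having two equations, one per transportation group, simply means using a different set of coefficients for each group.</p>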
<p><span style="lineheight: 1.6;">In the next post, we'll complete this process by <a href="http://blog.minitab.com/blog/understandingstatistics/asixsigmahealthcareproject%2Cpart4%3Apredictingpatientparticipationwithbinarylogisticregression">using this model to make predictions about the probability of participation</a> </span><span style="lineheight: 20.8px;">in the rehabilitation program </span><span style="lineheight: 1.6;">and how much we can afford to invest in transportation to help more cardiac patients. </span></p>
Health Care Quality Improvement
Tue, 07 Jun 2016 12:00:00 +0000
http://blog.minitab.com/blog/understandingstatistics/asixsigmahealthcareprojectpart3creatingbinarylogisticregressionmodelsforpatientparticipation
Eston Martz

3 Ways to Get Up and Running with Statistical Software—Fast
http://blog.minitab.com/blog/realworldqualityimprovement/3waystogetupandrunningwithstatisticalsoftware%E2%80%94fast
<p><img alt="running" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/ccb8f6d634644afba43256c623a7b437/Image/6670317de85d2c7240a30951338eadd2/up_and_running.jpg" style="width: 300px; height: 171px; float: right; margin: 10px 15px;" />The last thing you want to do when you purchase a new piece of software is spend an excessive amount of time getting up and running. You’ve probably been ready to use the software since, well, <em>yesterday.</em> Minitab has always focused on making our software easy to use, but many professional software packages do have a steep learning curve.</p>
<p>Whatever package you’re using, here are three things you can do to speed the process of starting to analyze your data with statistical software:</p>
1. Get Technical Support
<p>If you’re having trouble figuring out how to do something in a statistical software package, the makers of the software should be ready to provide the assistance you need.</p>
<p>When you purchase Minitab, whether for a single user or for your entire organization, we offer <a href="https://www.minitab.com/support/" target="_blank">free technical support</a>, by phone or online, to help you install and use the software. We’ve also got quick-start installation guides and an <a href="http://support.minitab.com/installation/" target="_blank">extensive library of installation-related FAQs</a> to browse.</p>
<p>Minitab’s technical support team includes specialists in <span style="lineheight: 20.8px;">statistics and quality improvement, as well as</span><span style="lineheight: 20.8px;"> </span><span style="lineheight: 1.6;">technology, so they can assist with virtually any challenge you encounter while using the software.</span></p>
2. Consult Help
<p>Let’s face it, when a problem arises, the documentation for a lot of software is not all that helpful. That’s why many of us tend to ignore the “Help” menu when we encounter a software-related question. But if you haven’t explored the Help options offered by your statistical software, you should check them out.</p>
<p>Most software has some sort of built-in Help content, but our team has taken it a step further by offering truly useful, valuable information within Minitab. That information includes concise overviews of major statistical topics, guidance for setting up your data, information on methods and formulas, comprehensive guidance for completing dialog boxes, and easy-to-follow examples. And that’s just the start. Minitab’s built-in help options also include:</p>
<p style="marginleft:1.0in;"><strong><a href="https://www.minitab.com/products/minitab/assistant/" target="_blank">The Assistant</a>:</strong> You certainly don’t need to be a statistics expert to get the insight you need from your data. Minitab’s Assistant menu interactively guides you through several types of analyses—including Measurement Systems Analysis, Capability Analysis, Hypothesis Tests, Control Charts, DOE and Multiple Regression.</p>
<p style="marginleft:1.0in;"><strong>StatGuide: </strong>After you analyze your data, the built-in StatGuide helps you interpret statistical graphs and tables in a practical, straightforward way. To access the StatGuide, just right-click on your output, press Shift+F1 on the keyboard, or click the StatGuide icon in the toolbar:</p>
<p style="marginleft:1.0in;"><img alt="stat guide" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/ccb8f6d634644afba43256c623a7b437/Image/acf7d51c157e9d9e29d863a398301135/stat_guide.jpg" style="width: 543px; height: 53px;" /></p>
<p style="marginleft:1.0in;"><strong>Tutorials: </strong>For a refresher on statistical tasks, take a look at built-in tutorials (<strong>Help > Tutorials</strong>), which include an overview of data requirements, step-by-step instructions, and guidance on interpreting the results.</p>
3. Free Web Site Resources
<p>See what kinds of material exist on the web site of your statistical software package. There may be much more there than basic information about the product!</p>
<p><span style="lineheight: 1.6;">For instance, at the Minitab web site you can attend </span><a href="https://www.minitab.com/support/webinars/" style="lineheight: 1.6;" target="_blank">live webinars</a><span style="lineheight: 1.6;">, view </span><a href="https://www.minitab.com/support/videos/" style="lineheight: 1.6;" target="_blank">recorded webcasts</a><span style="lineheight: 1.6;">, and read step-by-step how-to’s and detailed </span><a href="http://www.minitab.com/articles/" style="lineheight: 1.6;" target="_blank">technical articles</a><span style="lineheight: 1.6;">. The </span><a href="http://blog.minitab.com/" style="lineheight: 1.6;" target="_blank">Minitab Blog</a><span style="lineheight: 1.6;"> also offers tips and techniques for using Minitab in quality improvement projects, research, and more.</span></p>
<p>Perhaps my favorite resource on Minitab.com is the <a href="http://support.minitab.com/minitab/17/" target="_blank">Minitab Product Support Section</a>, which features a getting started guide, a topic library with all the various analyses available in Minitab, a free data set library to practice analyses, and a macro library that contains over 100 helpful macros you can use to automate, customize, and extend the functionality of Minitab analyses.</p>
<p><em>Interested in learning more about our pricing and licensing options? Visit <a href="http://www.minitab.com/products/minitab/pricing" target="_blank">http://www.minitab.com/products/minitab/pricing</a> and <a href="https://www.minitab.com/contactus/" target="_blank">contact us</a> if you have questions.</em></p>
Project Tools
Quality Improvement
Statistics Help
Stats
Mon, 06 Jun 2016 12:01:00 +0000
http://blog.minitab.com/blog/realworldqualityimprovement/3waystogetupandrunningwithstatisticalsoftware%E2%80%94fast
Carly Barry

The Life You Improve May Be Your Own: Honing Healthcare with Statistical Data Analysis
http://blog.minitab.com/blog/statisticsandqualitydataanalysis/thelifeyouimprovemaybeyourown%3Ahoninghealthcarewithstatisticaldataanalysis
<p>What does the eyesight of a homeless person have in common with complications from dental anesthesia? Or with reducing side effects from cancer? Or monitoring artificial hip implants?</p>
<p>These are all subjects of recently published studies that use <a href="http://www.minitab.com/products/minitab/">statistical analyses in Minitab</a> to improve healthcare outcomes. And they're a good reminder that when we improve the quality of healthcare for others, we improve it for ourselves.</p>
Vision care for the homeless
<p><img alt="eye" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/ba6a552e3bc04eed9c9aeae3ade49498/Image/2310fb293d97f96dcc70d23cf643a21d/eye_cropped.jpg" style="width: 150px; height: 101px; float: right; margin: 10px 15px;" />A recent retrospective review study was the first to investigate the visual healthcare needs of homeless people in the United Kingdom. Using clinical records of over 1,000 homeless individuals in East London who sought vision care, researchers summarized the demographics of this special-needs population and established baseline reference levels for future studies.</p>
<p>Using t-tests in Minitab, they determined that the homeless population tends to have more eye problems and greater need for visual care than the general population. Although vision problems might appear to be a secondary issue for those facing the constellation of severe, chronic problems often associated with homelessness, researchers point out that even something as simple as a spectacle correction can substantially improve a person's quality of life. <a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4754935/" target="_blank">BMC Health Services Research 2016; 16:54.</a></p>
Reducing complications from local anesthesia in oral surgery
<p style="margin: 0in 0in 8pt;"><img alt="dentistry" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/ba6a552e3bc04eed9c9aeae3ade49498/Image/d7ef97a0f5d3e110b1c0419ad9bd4456/dentistry_croppsed2.jpg" style="width: 150px; height: 93px; float: right; margin: 10px 15px;" />Noting the proven ability of Six Sigma methodology to increase patient compliance and satisfaction, as well as hospital profitability, investigators applied quality improvement tools to identify and reduce the most common complications from local anesthesia in dental and oral surgery.</p>
<p style="margin: 0in 0in 8pt;">They used a Pareto chart to identify the most common complications, and a binomial capability analysis to evaluate the rate of complications before and after implementing remedial measures. The results showed a significant reduction in complications from local anesthesia (pre-improvement % defective 7.99 (95% CI 6.65, 9.51) vs. post-improvement % defective 4.58 (95% CI 3.58, 5.77)). <a href="http://www.jcdr.net/article_fulltext.asp?issn=0973709x&year=2015&volume=9&issue=12&page=ZC34&issn=0973709x&id=6989" target="_blank">Journal Clinical & Diagnostic Research. 2015;9(12) ZC34-ZC38.</a></p>
Exercise, quality of life, and fatigue in breast cancer patients
<p><img alt="woman walking" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/ba6a552e3bc04eed9c9aeae3ade49498/Image/0b09cbbad0b8e97dbc72af01e0050c2c/woman_walking.jpg" style="width: 74px; height: 151px; float: right; margin: 10px 15px;" />Researchers explored associations between physical activity in women with breast cancer and their quality of life and levels of fatigue. Descriptive statistics were used to summarize characteristics of the study group. The nonparametric Kruskal-Wallis test was used to evaluate differences in median scores, and Pearson's chi-square test was used to explore possible associations between categorical variables. </p>
<p>The authors found a significant positive correlation between increased physical activity level and a higher quality of life, as well as less fatigue. Although the study didn't prove a causal connection, their results support other studies that suggest that physical activity may help preserve quality of life and reduce side effects during cancer treatment. <a href="http://www.scielo.br/scielo.php?script=sci_arttext&pid=S010442302016000100038&lng=en&nrm=iso&tlng=en" target="_blank">Rev Assoc Med Bras. 2016, 62(1).</a></p>
Monitoring results of total hip replacement
<p><img alt="hippy" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/ba6a552e3bc04eed9c9aeae3ade49498/Image/a9254abbdc2ffa39c257ab3488038773/hip_pic.jpg" style="width: 120px; height: 118px; float: right; margin: 10px 15px;" />In hip replacement surgery, an important technical factor is the inclination angle of the acetabular component. Variations from the target angle can lead to an increased amount of wear and poorer outcomes after surgery. Therefore, researchers used time-weighted control charts in Minitab, such as CUSUM, <a href="http://blog.minitab.com/blog/funwithstatistics/anodetotheewmacontrolchart">EWMA</a>, and MA charts, to monitor the acetabular inclination angle in the postoperative radiographs of patients who underwent hip replacement surgery. The control charts demonstrated that the surgical process, in relation to the angle achieved, was stable and in control. The researchers noted that the time-weighted control charts helped them make a "faster visual decision." <a href="http://www.hindawi.com/journals/bmri/2015/129610/" target="_blank">Biomed Research International 2015; ID 199610</a>.</p>
Additional Questions
<p style="margin: 0in 0in 8pt;">What other types of quality improvement studies are being published in the fields of health and medicine? What are the overall trends for these studies? And how can the studies themselves be improved?</p>
<p style="margin: 0in 0in 8pt;">We'll look at that in my next post.</p>
Fri, 03 Jun 2016 12:00:00 +0000
http://blog.minitab.com/blog/statisticsandqualitydataanalysis/thelifeyouimprovemaybeyourown%3Ahoninghealthcarewithstatisticaldataanalysis
Patrick Runkel

Regression versus ANOVA: Which Tool to Use When
http://blog.minitab.com/blog/michelleparet/regressionversusanova%3Awhichtooltousewhen
<p>Suppose you’ve collected data on cycle time, revenue, the dimension of a manufactured part, or some other metric that’s important to you, and you want to see what other variables may be related to it. Now what?</p>
<img alt="thinker" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/6060c2dbf5d9449babe268eade74814a/Image/a8cd9157d0cc1e5f75d11dc7f67e0fcd/thinkergorilla.jpg" style="lineheight: 20.8px; float: left; width: 225px; height: 218px; margin: 10px 15px; borderwidth: 1px; borderstyle: solid;" />
<div>
<p>When I graduated from college with my first statistics degree, my diploma was bona fide proof that I'd endured hours and hours of classroom lectures on various statistical topics, including <a href="http://blog.minitab.com/blog/adventuresinstatistics/regressionanalysistutorialandexamples" style="lineheight: 1.6;">linear regression</a>, <a href="http://blog.minitab.com/blog/adventuresinstatistics/understandinganalysisofvarianceanovaandtheftest" style="lineheight: 1.6;">ANOVA</a>, and <a href="http://blog.minitab.com/blog/funwithstatistics/analyzingtitanicsurvivalratespartiiv1" style="lineheight: 1.6;">logistic regression</a>.</p>
<p>However, there wasn’t a single class that put it all together and explained which tool to use when. I have all of this data for my Y and X's and I want to describe the relationship between them, but what do I do now?</p>
<p>Back then, I wish someone had clearly laid out which regression or ANOVA analysis was most suited for this type of data or that. Let's start with how to choose the right tool for a continuous Y…</p>
Continuous Y, Continuous X(s)
<p>Example:</p>
<p> Y: Weights of adult males</p>
<p> X’s: Age, Height, Minutes of exercise per week</p>
<p>What tool should you use? <strong>Regression</strong></p>
<p>Where’s that in Minitab? <strong>Stat > Regression > Regression > Fit Regression Model</strong></p>
<p> </p>
Continuous Y, Categorical X(s)
<p>Example:</p>
<p> Y: Your Mario Kart Wii score</p>
<p> X’s: Wii controller type (racing wheel or standard), whether you stand or sit while playing, character (Mario, Luigi, Yoshi, Bowser, Peach)</p>
<p>What tool should you use? <strong>ANOVA</strong></p>
<p>Where’s that in Minitab? <strong>Stat > ANOVA > General Linear Model > Fit General Linear Model</strong></p>
<p> </p>
Continuous Y, Continuous AND Categorical X(s)
<p>Example:</p>
<p> Y: Number of hours people sleep per night</p>
<p> X’s: Age, activity prior to sleeping (none, read a book, watch TV, surf the internet), whether or not the person has young children…“I had a bad dream, I'm thirsty, there’s a monster under my bed!”</p>
<p>What tool should you use? You have a choice of using either <strong>ANOVA or Regression</strong></p>
<p>Where’s that in Minitab? <strong>Stat > ANOVA > General Linear Model > Fit General Linear Model </strong><em>or</em> <strong>Stat > Regression > Regression > Fit Regression Model</strong></p>
<p>I personally prefer GLM because it offers <a href="http://blog.minitab.com/blog/statisticsandqualitydataanalysis/keepthatspecialsomeonehappywhenyouperformmultiplecomparisons">multiple comparisons</a>, which are useful if you have a significant categorical X with more than 2 levels. For example, suppose activity prior to sleep is significant. Comparisons will tell you which of the 4 levels—none, read a book, watch TV, surf the Internet—are significantly different from one another.</p>
<p>Do people who watch TV sleep, on average, the same as people who surf the Internet, but significantly less than people who do nothing or read? Or, perhaps, are internet surfers significantly different from the other three categories? Comparisons help you detect these differences.</p>
Categorical Y
<p>If Y is categorical, then you can use logistic regression for your continuous and/or categorical X’s. The 3 types of logistic regression are:</p>
<p> <strong> Binary: </strong> Y with 2 levels (yes/no, pass/fail)</p>
<p> <strong>Ordinal: </strong>Y with more than 2 levels that have a natural order (low/medium/high)</p>
<p> <strong>Nominal</strong>: Y with more than 2 levels that have no order (sedan/SUV/minivan/truck)</p>
<p>So the next time you have a bunch of X’s and a Y and you want to see if there's a relationship between them, here is a summary of which tool to use when:</p>
<p><img alt="Tool Selection Guide" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/a038d79708c41277dc075ad297c54d62/regression_versus_anova.jpg" style="width: 757px; height: 446px;" /></p>
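<p>The same selection logic can be written as a small decision function. This Python sketch only restates the rules described in this post. The function name, argument names, and returned labels are my own shorthand, not anything built into Minitab.</p>

```python
def choose_tool(y_type, x_types=(), y_levels=None, y_ordered=False):
    """Suggest an analysis from the type of Y and the types of the X's."""
    if y_type == "continuous":
        kinds = set(x_types)
        if kinds == {"continuous"}:
            return "Regression"
        if kinds == {"categorical"}:
            return "ANOVA"
        if kinds == {"continuous", "categorical"}:
            return "ANOVA or Regression"
        raise ValueError("x_types must contain 'continuous' and/or 'categorical'")
    if y_type == "categorical":
        # Logistic regression handles continuous and/or categorical X's
        if y_levels is None or y_levels < 2:
            raise ValueError("a categorical Y needs at least 2 levels")
        if y_levels == 2:
            return "Binary logistic regression"
        if y_ordered:
            return "Ordinal logistic regression"
        return "Nominal logistic regression"
    raise ValueError("y_type must be 'continuous' or 'categorical'")


# Hours of sleep (continuous Y) vs. age and activity before sleeping
print(choose_tool("continuous", ["continuous", "categorical"]))
# Vehicle type (sedan/SUV/minivan/truck): 4 levels with no natural order
print(choose_tool("categorical", ["continuous"], y_levels=4))
```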
<p>For step-by-step instructions on how to use General Regression, General Linear Model, or Logistic Regression in Minitab Statistical Software, just navigate to any of these tools in Minitab and click Help in the bottom left corner of the dialog. You will then see ‘example’ located at the top of the Help screen. And Minitab customers can always contact Minitab Technical Support at 814-231-2682 or <a href="http://www.minitab.com/contactus" style="lineheight: 1.6;">www.minitab.com/contactus</a>. Our Tech Support team is staffed with statisticians, and best of all, accessing them is free!</p>
</div>
ANOVA
Data Analysis
Lean Six Sigma
Quality Improvement
Regression Analysis
Six Sigma
Statistics
Statistics Help
Stats
Thu, 02 Jun 2016 12:00:00 +0000
http://blog.minitab.com/blog/michelleparet/regressionversusanova%3Awhichtooltousewhen
Michelle Paret

A Six Sigma Healthcare Project, part 2: Visualizing the Impact of Individual Factors
http://blog.minitab.com/blog/understandingstatistics/asixsigmahealthcareprojectpart2visualizingtheimpactofindividualfactors
<p>My previous post covered the initial phases of <a href="http://blog.minitab.com/blog/understandingstatistics/asixsigmahealthcareprojectpart1examiningfactorswithaparetochart">a project to attract and retain more patients in a cardiac rehabilitation program</a>, as described in a 2011 <em>Quality Engineering</em> article. A Pareto chart of the reasons enrolled patients left the program indicated that the hospital could do little to encourage participants to attend a greater number of sessions, so the team focused on increasing initial enrollment from 32 to 36 patients per month. </p>
<p><img alt="heart with stethoscope" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/369642f14676a878cf5adb1852aaa3da/heart_stetho.png" style="borderwidth: 1px; borderstyle: solid; margin: 10px 15px; float: right; width: 250px; height: 228px;" />Stakeholders offered several solutions. Before implementing any improvement strategy, however, the team decided to look at how other individual factors influenced patient participation in the program. <span style="lineheight: 20.8px;">Taking this step can help avoid devoting resources to "fixing" factors that have little impact on the outcome. </span></p>
<p>In this post, we will look at how the team analyzed those individual factors. We have (simulated) data from 500 patients, including<span style="lineheight: 1.6;">:</span></p>
<ul>
<li>Address and distance between each patient's home and hospital</li>
<li>Each patient's age and gender</li>
<li>Whether or not the patient had a car</li>
<li>Whether or not the patient participated in the program</li>
</ul>
<p>Download the <a href="//cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/521a4efabc5f892eef403e8c4a354f9a/cardiacparticipationdata_1.mtw">data set</a> to follow along and try these analyses yourself. If you don't already have Minitab, you can <a href="http://www.minitab.com/products/minitab/freetrial/">download and use our statistical software free for 30 days</a>.</p>
<p>The team used simple statistics and graphs to get some preliminary insight into how these different factors affected whether or not patients decided to participate in the rehabilitation program. </p>
Looking at the Influence of Distance on Patient Participation
<p>The team looked first at the influence of distance on participation using a <a href="http://blog.minitab.com/blog/statisticsandqualitydataanalysis/howtothinkoutsidetheboxplot">boxplot</a>. Also known as a box-and-whisker diagram, the boxplot gives you an indication of your data's general shape, central tendency, and variability with a single glance. Displaying boxplots side-by-side lets you easily compare the central value and spread of the distribution for each group and determine if the data for each group are symmetric about the center.</p>
<p>To create this graph, open the patient data set in Minitab and select <strong>Graph > Boxplot > One Y With Groups</strong>. </p>
<p style="marginleft: 40px;"><img alt="boxplot dialog" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/f73e4ee4b62333f42ca414a64fb4f8ef/boxplot_dialog.png" style="width: 369px; height: 333px;" /></p>
<p>In the dialog box, select "Distance" as the graph variable, choose "Participation" as the categorical variable, and click <strong>OK</strong>. </p>
<p style="marginleft: 40px;"><img alt="Boxplot of Distance dialog" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/0339bfa7eeabfad71ea10e6ce219614f/boxplot_dialog_distance.png" style="width: 531px; height: 344px;" /></p>
<p>Minitab generates the following graph: </p>
<p style="marginleft: 40px;"><img alt="Boxplot of Distance vs. Patient Participation" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/f342cbd7666b4862d2f72ee6f2b7a006/boxplot_of_distance.png" style="width: 576px; height: 384px;" /></p>
<p>The boxplot indicates that patients who live closer to the hospital are more likely to participate in the program. This is valuable, but it would be interesting to know more about the relationship between distance and participation. But because "Participation" is a binary response—a patient either participates, or does not—we can't visualize that relationship directly with graphs that require a <a href="http://blog.minitab.com/blog/understandingstatistics/whyiscontinuousdatabetterthancategoricalordiscretedata">continuous</a> response.</p>
<p>However, to get a bit more insight, the project team <span style="lineheight: 1.6;">divided the patients into groups according to how far away from the hospital they live, then calculated the relative percentage of participation for each group. To do this, select </span><strong style="lineheight: 1.6;">Data > Recode > To Text...</strong><span style="lineheight: 1.6;"> and complete the dialog box using the following groups. The picture below shows only the first five of the seven groups, so here is the complete list: </span></p>
<p style="marginleft: 40px;"><span style="lineheight: 1.6;">Group 1: 0 to 25 km<br />
Group 2: 25 to 35 km<br />
Group 3: 35 to 45 km<br />
Group 4: 45 to 55 km<br />
Group 5: 55 to 65 km<br />
Group 6: 65 to 75 km<br />
Group 7: 75 to 200 km</span></p>
<p style="marginleft: 40px;"><img alt="recode distance" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/969e7bfd3d95b5334ccd96ab32f77c1b/recode_to_text_dialog___distance.png" style="width: 742px; height: 650px;" /></p>
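<p>If you want to reproduce this grouping outside Minitab, the recoding can be sketched in a few lines of Python. The seven ranges come from the list above. How a value that falls exactly on a boundary (25 km, say) is assigned is an assumption here: this sketch puts it in the higher group.</p>

```python
from bisect import bisect_right

# Upper limits (km) of groups 1 through 6; group 7 runs from 75 to 200 km
BOUNDARIES_KM = [25, 35, 45, 55, 65, 75]


def distance_group(km):
    """Return the group number (1-7) for a distance in km."""
    if km < 0 or km > 200:
        raise ValueError("distance must be between 0 and 200 km")
    # Assumption: a value equal to a boundary goes to the higher group
    return bisect_right(BOUNDARIES_KM, km) + 1


# Hypothetical distances, for illustration only
for km in (10, 25, 44.9, 75, 120):
    print(km, "-> Group", distance_group(km))
```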
<p>When you recode the data, Minitab creates new columns of coded data and provides a summary in the Session Window:</p>
<p style="marginleft: 40px;"><img alt="distance group summary" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/7681791faefd04fe64315c56b79aa823/table_of_distance_groups.png" style="width: 233px; height: 145px;" /></p>
<p>Minitab automatically names the new column of data "Recoded Distance," which I've renamed as "Distance Group."</p>
<p>To determine the relative frequency of participation among each group, choose <strong>Stat > Tables > Descriptive Statistics...</strong> In the dialog box, select 'Distance Group' as the variable for rows, and Participation as the variable for columns, as shown. Click on the "Categorical Variables" button and make sure 'Counts' and 'Row percents' are selected, then press <strong>OK </strong>twice. </p>
<p style="marginleft: 40px;"><img alt="table of descriptive statistics for distance dialog" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/688f456f34d58bf024a69b589bcbbf3c/table_of_descriptive_statistics_dialog___dist.png" style="width: 466px; height: 314px;" /></p>
<p>In the session window, Minitab will display a table that shows the total number in each distance group, the number participating, and the relative frequency of participation for each group.</p>
<p style="marginleft: 40px;"><img alt="Tabbed Data" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/0abd6c9142f5a42a30c9ed157db5d6d5/tabbed_statistics_for_distance.png" style="width: 408px; height: 277px;" /></p>
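<p>The counts and row percents in such a table are simple to compute by hand. The sketch below illustrates the arithmetic in Python. The records and the function name <code>row_percents</code> are hypothetical and are not the project's data or Minitab's method.</p>

```python
from collections import Counter

# Hypothetical (distance group, participated) records, for illustration only.
records = [
    (1, True), (1, True), (1, False),
    (2, True), (2, False),
    (3, False), (3, False), (3, True), (3, False),
]


def row_percents(rows):
    """Return {group: (total, participating, percent participating)}."""
    totals = Counter(group for group, _ in rows)
    participating = Counter(group for group, joined in rows if joined)
    return {
        group: (total, participating[group], 100.0 * participating[group] / total)
        for group, total in sorted(totals.items())
    }


table = row_percents(records)
for group, (total, joined, pct) in table.items():
    print(f"Group {group}: {joined} of {total} participating ({pct:.1f}%)")
```

<p>Each row percent is the number participating in a group divided by that group's total, so the percentages are comparable even when the groups differ in size.</p>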
<p>If we enter that information into the Minitab worksheet like this: </p>
<p style="marginleft: 40px;"><img alt="table of descriptive statistics for distance" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/89a2c60e9c64f896eaae95acf9501f12/table_of_descriptive_statistics_worksheet__dist.png" style="width: 149px; height: 180px;" /></p>
<p>we can create a scatterplot that reveals more about the relationship between distance and participation. Select <strong>Graph > Scatterplot...</strong>, and choose "With connect line."</p>
<p style="marginleft: 40px;"><img alt="scatterplot dialog" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/e53245e138ca1c4926c44cc1c077d8de/scatterplot_dist_dialog.png" style="width: 370px; height: 326px;" /></p>
<p>Select 'Part %' as the Y variable and 'Distance Grp' as the X variable, and Minitab creates the following graph, which shows the relationship between distance and participation more clearly:</p>
<p style="marginleft: 40px;"><img alt="scatterplot of participation vs distance" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/7d8195c7885fde95610bed82dea9ab21/scatterplot_of_part___vs_distance_group.png" style="width: 576px; height: 384px;" /></p>
<p>We can see that the percentage of participation is very high among patients who live closest to the hospital, but decreases steadily among groups who live farther than 45 km away. </p>
Looking at the Influence of Age on Patient Participation
<p>We can use the same methods to get initial insight into how age affects a patient's likelihood of participation in the program. The boxplot below indicates age does have some influence on participation: </p>
<p style="marginleft: 40px;"><img alt="Boxplot of Age" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/21190ae61a5e0d82ba73007c9592fde7/boxplot_of_age.png" style="width: 576px; height: 384px;" /></p>
<p>By dividing the patient data into groups based on Age, as we did for Distance, we can create a similar rough scatterplot to enhance our understanding of the relationship between these variables. We’ll divide the data as shown in the table below before using <strong>Stat > Tables > Descriptive Statistics…</strong> to determine the relative participation rates:</p>
<p style="marginleft: 40px;"><img alt="table of age groups" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/ea6f727fa2fe1b74e52fca86d984c103/table_of_age_groups.png" style="width: 221px; height: 161px;" /></p>
<p>The scatterplot of the relative frequency of participation for patients in each Age group again yields greater insight into the relationship between this factor and the likelihood of participation. In this case, a much higher percentage of patients in the younger groups take part. </p>
<p style="marginleft: 40px;"><img alt="Scatterplot of participation vs age group" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/d07e5a473647c2bebe92ad90688e269f/scatterplot_of_participation_vs_age_group.png" style="lineheight: 20.8px; width: 576px; height: 384px;" /></p>
Looking at the Influence of Mobility and Gender on Patient Participation
<p>Because both "Mobility" and "Participation" are binary variables, we can select <strong>Stat > Tables > Descriptive Statistics...</strong> to give us a tabular view of the data. Select "Mobility" as the row, and Participation as the columns, and Minitab will provide the following output, which gives you percentages of participation among those patients who do not own a car and those who do. </p>
<p>We can put these data into a bar chart for a quick visual assessment. Minitab offers several ways to accomplish this easily; I opted to place the table data for each variable into the worksheet as shown here:</p>
<p style="marginleft: 40px;"><img alt="Gender and Mobility Data" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/69a65b6e075feb78814d10ce9e86e080/gender_mobiliyt_table_data.png" style="width: 280px; height: 69px;" /></p>
<p>Now, by selecting <strong>Graph > Bar Chart</strong>, and choosing a simple chart in which "Bars represent values from a table"...</p>
<p style="marginleft: 40px;"><img alt="Bar Chart dialog" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/9fa830b35bd8e0f9d99f87f0b6240d86/bar_chart_dialog.png" style="width: 369px; height: 409px;" /></p>
<p>we can create the following bar charts that show the proportion of those with and without cars who participate in the program, and the proportion of men and women who participate: </p>
<p style="marginleft: 40px;"><img alt="Bar Chart of Gender" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/e301e11bdff356c72b3d6d0864f7c27c/bar_chart___gender.png" style="lineheight: 20.8px; width: 576px; height: 384px;" /></p>
<p style="marginleft: 40px;"><img alt="Participation by Mobility" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/8f2ab35e52d0fcd09da34ebcae577cc6/bar_chart___mobility.png" style="width: 576px; height: 384px;" /> </p>
<p>It appears that gender could have a slight influence on participation, but having a car is clearly an important factor. </p>
<p>An initial look at these factors indicates that access to the hospital is very important in getting people to participate. Offering a bus or shuttle service for people who do not have cars might be a good way to increase participation, but only if the service costs no more than the additional revenue that the higher participation would generate. </p>
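<p>That condition can be written as a simple break-even check. The Python sketch below is only an illustration: the function names and every figure in it are hypothetical, not values from the project.</p>

```python
def shuttle_breaks_even(annual_cost, extra_participants, revenue_per_participant):
    """Return True if the added revenue covers the cost of the service."""
    return extra_participants * revenue_per_participant >= annual_cost


def break_even_participants(annual_cost, revenue_per_participant):
    """Smallest whole number of extra participants needed to cover the cost.

    Expects integer amounts; ceiling division avoids floating-point rounding.
    """
    return -(-annual_cost // revenue_per_participant)


# Hypothetical figures, for illustration only.
annual_cost = 50_000             # yearly cost of running the shuttle
revenue_per_participant = 1_200  # revenue per additional participant

needed = break_even_participants(annual_cost, revenue_per_participant)
print(f"Extra participants needed to break even: {needed}")
```

<p>In practice, the number of extra participants a shuttle would attract is the unknown quantity, which is why an estimate of each patient's probability of joining is useful.</p>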
<p>In the next part of this series, we'll use binary logistic regression, which is not as scary as it might sound, to develop <a href="http://blog.minitab.com/blog/understandingstatistics/asixsigmahealthcareprojectpart3creatingbinarylogisticregressionmodelsforpatientparticipation">a model that will let us predict the probability a patient will join the program</a> based on the influence factors we've looked at. A good estimate of that probability will enable us to calculate the break-even point for such a service. </p>
Health Care Quality Improvement
Wed, 01 Jun 2016 12:00:00 +0000
http://blog.minitab.com/blog/understandingstatistics/asixsigmahealthcareprojectpart2visualizingtheimpactofindividualfactors
Eston Martz