Data Analysis Software  Minitab
Blog posts and articles with tips for using statistical software to analyze data for quality improvement.
http://blog.minitab.com/blog/dataanalysissoftware/rss
Wed, 27 Jul 2016 09:38:36 +0000
FeedCreator 1.7.3

On Paying Bills, Marriage, and Alert Systems
http://blog.minitab.com/blog/meredithgriffith/onpayingbillsmarriageandalertsystems
<p>When I blogged about <a href="http://blog.minitab.com/blog/meredithgriffith/whatatriptothedentisttaughtusaboutautomation">automation</a> back in March, I made my husband out to be an automation guru. Well, he certainly is. But what you don’t know about my husband is that while he loves to automate everything in his life, sometimes he drops the ball. He’s human; even I have to cut him a break every now and then.</p>
<p>On the other hand, instances of hypocrisy in his behavior tend to make for a good story. So here we are again.</p>
<span style="lineheight: 1.2;">On Paying Bills</span>
<p>When we married 5 years ago and began combining our bank accounts, I learned a few things about my husband. Nothing that I haven’t already shared with you. Because he loves automation, it came as no surprise to me that all his accounts resided in a single online repository (mint.com) where he could view his net worth—assets such as his home and car value, and debts including the loan left on his home and bills and credit card expenses that needed to be paid. He’d also made sure to automate the payment of all loans, utility bills, and credit cards—and the respective account would notify him when a payment was made.</p>
<p>This mint.com account served as one dashboard view of all possible accounts he would otherwise have to access independently to see statements and make payments. It was genius! </p>
<p><img alt="mint" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/dae6c7b7fc2246169d65f04909c20ab1/Image/299516c1c0685e413532648e7a185d6e/mint.jpg" style="width: 1000px; height: 563px;" /></p>
<p>He could set up savings goals, budgets, email alerts for credit card payment reminders and notification of payment, suspicious account activity, and just about any other miscellaneous charge or activity or change in spending habits. It really did make life easier.</p>
<p>Until I entered the picture.</p>
<span style="lineheight: 1.2;">On Marriage</span>
<p>We married, I synced my bank accounts, and we combined cash. I scoured his historical data to observe spending habits—areas where we could save money (Taco Bell topped the ‘high spending’ list in the Food/Dining category). As I began poking around his accounts, I noticed a monthly fee his Chase Freedom Visa credit card was charging him. I asked him about the fee; he pleaded ignorance. When I investigated further, I discovered that he’d been charged this fee for <em>years</em>, since he first got the credit card.</p>
<p>I researched online and discovered that other cardholders had complained of being erroneously enrolled in a protection program when they first got their Chase Freedom card, and were being charged a similar fee of varying amounts monthly. Turns out this monthly fee was a percentage of monthly spending—and the Chase Freedom Visa credit card incentivized a cardholder to make all his purchases with that card, given its offer of 5% cash back on all purchases at the time.</p>
<p>Needless to say, I wanted that money back. Within minutes, we were on the phone with Chase disputing the program enrollment and monthly charges. They acknowledged their error and refunded the money we had lost over a span of several years.</p>
<p>The lesson in all of this? Marry someone who’s not afraid to dig through your historical data.</p>
On Alert Systems
<p>More seriously, automating processes or workflows is incredibly helpful, but without the proper attention and alert systems in place, you may still encounter holes in the story. Automation and alerts must go hand-in-hand to be effective—and as a consumer of the information you’re automating, you still must be invested enough to look at the big picture.</p>
<p>For my husband, the beauty of automating his bill payments and aggregating all his accounts on mint.com was saving the time he'd otherwise spend paying bills separately and checking cash flows in multiple accounts. But he failed to set up alerts about important aspects of the process he was automating, and he failed to check in on that process from time to time. Mint.com provides an incredibly useful dashboard that gives you a big-picture overview of your accounts and your net worth; it also provides a plethora of alert options that spare a consumer from digging for red flags <em>after</em> the undesirable event has become a regular occurrence in the process (as I had to). But if you never check the status of the system or use its full alerting potential, the automation simply keeps running on whatever it was set up with, errors included, until you revisit or tweak it.</p>
<p>This is just one piece of the puzzle. Alert systems offer so much more!</p>
<ol>
<li><strong>Awareness</strong>—setting alerts through mint.com with regard to miscellaneous fees would have offered insight about the credit card program my husband had been erroneously enrolled in.</li>
<li><strong>Immediate Feedback</strong>—the first time a fee was charged, he would have been able to take immediate action rather than waiting years later for his wife to discover the charge (manually, mind you).</li>
<li><strong>Time Saver</strong>—aside from automating bill pay and combining all accounts into a single repository for a big-picture view of one’s financial status (which is certainly a time-saver in reviewing accounts and paying bills in various locations), an alert system would have saved me a lot of time in digging through my husband’s financial data to understand the origin of the fee Chase was charging him.</li>
<li><strong>Money Saver</strong>—while we <em>were </em>refunded all the money charged in monthly fees by Chase, clearly an alert system would have been a more foolproof way to save money in the first place. Alerts are also effective in ensuring bill pay occurs on time, notifying you when a statement has been prepared, when the bill is due, and when the bill has been paid.</li>
</ol>
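<p>As a minimal sketch of the "Awareness" and "Immediate Feedback" ideas above, the Python below flags the first occurrence of each distinct fee in a list of transactions. It is not mint.com's alert feature or API: the Transaction record, the "Fees" category label, and the sample transactions are all hypothetical.</p>

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    date: str          # ISO date, e.g. "2011-03-15", so string order is date order
    description: str   # text as it appears on the statement
    category: str      # spending category assigned by the aggregator
    amount: float


# Hypothetical category label; real aggregators use their own category names.
FEE_CATEGORIES = frozenset({"Fees"})


def new_fee_alerts(transactions, fee_categories=FEE_CATEGORIES):
    """Return one alert for the FIRST occurrence of each distinct fee description.

    Repeats of a fee already alerted on are skipped, so you hear about a new
    charge the first time it posts instead of discovering it years later.
    """
    seen = set()
    alerts = []
    for tx in sorted(transactions, key=lambda t: t.date):
        if tx.category in fee_categories and tx.description not in seen:
            seen.add(tx.description)
            alerts.append(f"New fee on {tx.date}: {tx.description} (${tx.amount:.2f})")
    return alerts


# Hypothetical transaction history:
history = [
    Transaction("2011-03-01", "Grocery store", "Food/Dining", 54.10),
    Transaction("2011-03-15", "Payment protection fee", "Fees", 7.25),
    Transaction("2011-04-15", "Payment protection fee", "Fees", 9.80),
]
print(new_fee_alerts(history))
# ['New fee on 2011-03-15: Payment protection fee ($7.25)']
```

<p>Alerting only on the first occurrence keeps the signal small; a real rule would more likely also watch for changes in a fee's amount.</p>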
<p>As process engineers or quality managers in the manufacturing world, you are very close to your process and its inputs. You want to know when something goes wrong, right when it happens. You don’t want a consumer to discover a flaw in a part or product you manufactured and sold years before, only to be faced with product recalls, customer reimbursements, time and money invested to remanufacture and replace the defective product for unhappy customers, and in some cases, lawsuits. The stakes are high.</p>
<p>Minitab offers a solution to this pain point in its Real-Time SPC dashboard. The dashboard is completely powered by Minitab Statistical Software, taking the graphs and output you know and love and placing them on customized dashboard views that show the current state of your processes. The dashboard gives you a big-picture view of your processes across all your production sites, for instance, and highlights where improvements can be made. You can incorporate any graph or analysis you want—such as histograms, control charts, or process capability analysis. You can automatically generate quality reports about your processes, and set up any alert that will help you respond to defects faster.</p>
<p><img alt="qualityDashboard" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/dae6c7b7fc2246169d65f04909c20ab1/Image/c9c6bb0f36670d640bf29072a830b9d5/qualitydashboard.jpg" style="width: 900px; height: 651px;" /></p>
<p><img alt="spcDashboard" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/dae6c7b7fc2246169d65f04909c20ab1/Image/27347695ab637e3931fe251860d12079/spcdashboard.jpg" style="lineheight: 1.6; width: 900px; height: 665px;" /></p>
<p><span style="lineheight: 1.6;">In the case of my marriage, alert systems are certainly practical from a financial standpoint. But in the world of manufacturing, setting up alerts around your automated processes has far-reaching implications, because the time- and money-saving elements of alert systems greatly impact a company’s bottom line. To learn more about how Minitab can help you, contact us at </span><a href="mailto:sales@minitab.com" style="lineheight: 1.6;">Sales@minitab.com</a><span style="lineheight: 1.6;">.</span></p>
<p>And if you’ve ever thought twice about whether or not you should marry, let this story be an encouragement to you—you may actually find a spouse who can make you richer.</p>
Automation
Data Analysis
Quality Improvement
Six Sigma
Mon, 25 Jul 2016 12:00:00 +0000
http://blog.minitab.com/blog/meredithgriffith/onpayingbillsmarriageandalertsystems
Meredith Griffith

Can Regression and Statistical Software Help You Find a Great Deal on a Used Car?
http://blog.minitab.com/blog/understandingstatistics/canregressionandstatisticalsoftwarehelpyoufindagreatdealonausedcar
<p>You need to consider many factors when you’re buying a used car. Once you narrow your choice down to a particular car model, you can get a wealth of information about individual cars on the market through the Internet. How do you navigate through it all to find the best deal? By analyzing the data you have available. </p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/710ce579b4120727bf67e8b48f5965e8/240_used_car_kovacs.jpg" style="lineheight: 20.7999992370605px; borderwidth: 1px; borderstyle: solid; margin: 10px 15px; float: right; width: 240px; height: 240px;" /></p>
<p>Let's look at how this works using <a href="http://blog.minitab.com/blog/understandingstatistics/wejustgotridoffivereasonstofeardataanalysis">the Assistant</a> in Minitab 17. With the Assistant, you can use regression analysis to calculate the expected price of a vehicle based on variables such as year, mileage, whether or not the technology package is included, and whether or not a free Carfax report is included.</p>
<p>And it's probably a lot easier than you think. </p>
<p>A search of a leading Internet auto sales site yielded data about 988 vehicles of a specific make and model. After putting the data into Minitab, we choose <strong>Assistant > Regression…</strong></p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/9e87de993a0daa39e6643b8c6d3aed9c/regression_dialog.png" style="width: 395px; height: 247px;" /></p>
<p>At this point, if you aren’t very comfortable with regression, <a href="http://www.minitab.com/products/minitab/assistant/">the Assistant makes it easy to select the right option for your analysis</a>.</p>
A Decision Tree for Selecting the Right Analysis
<p>We want to explore the relationships between the price of the vehicle and four factors, or X variables. Since we have more than one X variable, and since we're not looking to optimize a response, we want to choose Multiple Regression.</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/bc802d35bfb57ca3b86e061da4fa4b09/regression_decision_tree_w640.png" style="width: 640px; height: 502px;" /></p>
<p>This <a href="//cdn2.content.compendiumblog.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/File/9ecb2280228deb621ee2db7f6fbe300e/used_cars.MTW">data set</a> includes five columns: mileage, the age of the car in years, whether or not it has a technology package, whether or not it includes a free CARFAX report, and, finally, the price of the car.</p>
<p>We don’t know which of these factors may have a significant relationship to the cost of the vehicle, whether there are significant two-way interactions between them, or whether there are quadratic (nonlinear) terms we should include. But we don’t need to know: just fill out the dialog box as shown. </p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/b93a0a755e8e73dc7f681ea4b1965749/regression_dialog_box.png" style="width: 532px; height: 382px;" /></p>
<p>Press OK and the Assistant assesses each potential model and selects the best-fitting one. It also provides a comprehensive set of reports, including a Model Building Report that details how the final model was selected and a Report Card that alerts you to potential problems with the analysis, if there are any.</p>
Interpreting Regression Results in Plain Language
<p>The Summary Report tells us in plain language that there is a significant relationship between the Y and X variables in this analysis, and that the factors in the final model explain 91 percent of the observed variation in price. It confirms that all of the variables we looked at are significant, and that there are significant interactions between them. </p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/746574a27bba821ffab4f77ae1a2931b/multiple_regression_summary_report_w640.png" style="width: 640px; height: 480px;" /></p>
<p>The Model Equations Report contains the final regression models, which can be used to predict the price of a used vehicle. The Assistant provides 2 equations, one for vehicles that include a free CARFAX report, and one for vehicles that do not.</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/58598060212558634d62d75a7045bf0b/regression_equation_w640.png" style="width: 640px; height: 186px;" /></p>
<p>We can tell several interesting things about the price of this vehicle model by reading the equations. First, the average cost for vehicles with a free CARFAX report is about $200 more than the average for vehicles with a paid report ($30,546 vs. $30,354). One plausible explanation is that these cars have clean reports (if they didn't, the sellers probably wouldn’t provide them for free).</p>
<p>Second, each additional mile on the odometer decreases the expected price by roughly 8 cents, while each year added to the car’s age decreases the expected price by $2,357.</p>
<p>The technology package adds, on average, $1,105 to the price of vehicles that have a free CARFAX report, but the package adds $2,774 to vehicles with a paid CARFAX report. Perhaps the sellers of these vehicles hope to use the appeal of the technology package to compensate for some other influence on the asking price. </p>
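<p>To make the equations concrete, here is a minimal Python sketch that plugs in the rounded coefficients quoted above: a constant of $30,546 (free CARFAX) or $30,354 (paid), about 8 cents per mile, $2,357 per year of age, and a technology-package effect of $1,105 or $2,774. It includes only those terms. The Assistant's full Model Equations Report may contain additional terms, so treat the output as illustrative rather than as the fitted model itself. The function name and the example car are hypothetical.</p>

```python
def predicted_price(miles, age_years, tech_package, free_carfax):
    """Rough expected asking price from the rounded coefficients quoted in the post.

    Simplification: only the terms discussed in the text are included; the
    Assistant's full equations may contain further terms.
    """
    intercept = 30_546 if free_carfax else 30_354         # constant of each equation
    tech_bonus = 1_105 if free_carfax else 2_774           # technology package effect
    price = intercept - 0.08 * miles - 2_357 * age_years   # ~8 cents/mile, $2,357/year
    if tech_package:
        price += tech_bonus
    return price


# Hypothetical car: 2 years old, 30,000 miles, technology package, paid CARFAX report.
print(round(predicted_price(30_000, 2, tech_package=True, free_carfax=False)))  # 26014
```

<p>Splitting the constant and the technology-package effect by CARFAX status is how two separate equations express the CARFAX-by-package interaction described above.</p>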
Residuals versus Fitted Values
<p>While these findings are interesting, our goal is to find the car that offers the best value. In other words, we want to find the car whose asking price falls furthest below the asking price predicted by the regression analysis.</p>
<p>For that, we can look at the Assistant’s Diagnostic Report. The report presents a chart of Residuals vs. Fitted Values. If we see obvious patterns in this chart, it can indicate problems with the analysis. In that respect, this chart of Residuals vs. Fitted Values looks fine, but now we’re going to use the chart to identify the best value on the market.</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/d55ae8720ba281bf37135b68b2069434/multiple_regression_diagnostic_report_w640.png" style="width: 640px; height: 480px;" /></p>
<p>In this analysis, the “Fitted Values” are the prices predicted by the regression model. “Residuals” are what you get when you subtract the predicted asking price from the actual asking price, so a large negative residual marks a car listed well below its expected price: exactly the information you’re looking for! The Assistant marks large residuals in red, making them very easy to find. And three of those residuals (which appear in light blue above because we’ve selected them) show asking prices very far below the prices predicted by the regression analysis.</p>
<p>Selecting these data points on the graph reveals that these are vehicles whose data appears in rows 357, 359, and 934 of the data sheet. Now we can revisit those vehicles online to see if one of them is the right vehicle to purchase, or if there’s something undesirable that explains the low asking price. </p>
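<p>Once you have a predicted price for every listing (from the regression equations or from Minitab's stored fitted values), finding the candidates that the residual plot highlights comes down to sorting by residual. The sketch below defines the residual as asking price minus predicted price, so the best-looking deals have the most negative residuals. The function name, row numbers, and prices are hypothetical, and any listing it surfaces still needs a manual check, since a low price can have a good reason.</p>

```python
def rank_bargains(listings, k=3):
    """Return the k listings whose asking price is furthest BELOW the model's prediction.

    listings: iterable of (row, asking_price, predicted_price) tuples.
    Residual = asking - predicted, so the most negative residuals come first.
    """
    residuals = [(row, asking - predicted) for row, asking, predicted in listings]
    residuals.sort(key=lambda item: item[1])
    return residuals[:k]


# Hypothetical rows and prices, for illustration only:
listings = [
    (1, 25_000, 25_500),
    (2, 20_000, 26_000),
    (3, 27_000, 26_000),
    (4, 22_000, 25_000),
]
print(rank_bargains(listings, k=2))  # [(2, -6000), (4, -3000)]
```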
<p>Sure enough, the records for those vehicles reveal that two of them have severe collision damage.</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/5dbbf5aa405d4b2d53ec720657a09556/vehicles.jpg" style="width: 320px; height: 356px;" /></p>
<p>But the remaining vehicle appears to be in pristine condition, and is several thousand dollars less than the price you’d expect to pay, based on this analysis!</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/640bd720a3d1f8b04713aa0ec321a570/nice_car.png" style="width: 254px; height: 189px;" /></p>
<p>With the power of regression analysis and the Assistant, we’ve found a great used car—at a price you know is a real bargain.</p>
Data Analysis
Fun Statistics
Regression Analysis
Statistics
Statistics Help
Fri, 22 Jul 2016 10:00:00 +0000
http://blog.minitab.com/blog/understandingstatistics/canregressionandstatisticalsoftwarehelpyoufindagreatdealonausedcar
Eston Martz

High Cpk and a Funny-Looking Histogram: Is My Process Really that Amazing?
http://blog.minitab.com/blog/marilynwheatleysblog/highcpkandafunnylookinghistogramismyprocessreallythatamazing
<p>Here is a scenario involving process capability that we’ve seen from time to time in Minitab's technical support department. I’m sharing the details in this post so that you’ll know where to look if you encounter a similar situation.</p>
<p>You need to run a capability analysis. You generate the output using <a href="http://www.minitab.com/enus/products/minitab/">Minitab Statistical Software</a>. When you look at the results, the Cpk is huge and the histogram in the output looks strange:</p>
<p style="marginleft: 40px;"><img border="0" height="468" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/f6d0da32ba1d41d4ace1af34dcb51351/File/9549037dc2e0a30a77ab36737edeeb09/9549037dc2e0a30a77ab36737edeeb09.png" width="624" /></p>
<p>What’s going on here? The Cpk seems unrealistic at 42.68, the "within" fit line is tall and narrow, and the bars on the histogram are all smashed down. Yet if we use the exact same data to make a histogram using the Graph menu, we see that things don’t look so bad:</p>
<p style="marginleft: 40px;"><img border="0" height="384" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/f6d0da32ba1d41d4ace1af34dcb51351/File/d111d612a239ac72e49fe7d3fccab0f5/d111d612a239ac72e49fe7d3fccab0f5.png" width="576" /></p>
<p><span style="lineheight: 1.6;">So what explains the odd </span><span style="lineheight: 20.8px;">output for the </span><span style="lineheight: 1.6;">capability analysis?</span></p>
<p>Notice that the ‘within subgroup’ variation in the capability output is represented by the tall dashed line in the middle of the histogram. This is the StDev (Within) shown on the left side of the graph. The within subgroup variation of 0.0777 is very small relative to the overall standard deviation. </p>
<p>So what is causing the within subgroup variation to be so small? Another graph in Minitab can give us the answer: The Capability Sixpack. In the case above, the subgroup size was 1 and Minitab’s Capability Sixpack in <strong>Stat</strong> > <strong>Quality Tools</strong> > <strong>Capability Sixpack</strong> > <strong>Normal</strong> will plot the data on a control chart for individual observations, an I-chart:</p>
<p style="marginleft: 40px;"><img border="0" height="468" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/f6d0da32ba1d41d4ace1af34dcb51351/File/46352209a7f75bbfba20794c925fc897/46352209a7f75bbfba20794c925fc897.png" width="624" /></p>
<p>Hmmm...this could be why, in <a href="http://www.minitab.com/enus/services/training/">Minitab training</a>, our instructors recommend using the Capability Sixpack first.</p>
<p>In the Capability Sixpack above, we can see that the individually plotted values on the I-chart show an upward trend, and it appears that the process is <em>not </em>stable and in control (as <span><a href="http://blog.minitab.com/blog/understandingstatistics/ithinkicaniknowicanahighleveloverviewofprocesscapabilityanalysis">it should be for data used in a capability analysis</a></span>). A closer look at the data in the worksheet clearly reveals that the data was sorted in ascending order:</p>
<p style="marginleft: 40px;"><img border="0" height="265" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/f6d0da32ba1d41d4ace1af34dcb51351/File/44945e3490bf95cfc2c796618195b75e/44945e3490bf95cfc2c796618195b75e.png" width="103" /></p>
<p>Because the within-subgroup variation for data not collected in subgroups is estimated from the <a href="http://blog.minitab.com/blog/marilynwheatleysblog/whatsamovingrangeandhowisitcalculated">moving ranges</a> (the average distance between consecutive points), sorting the data makes the within-subgroup variation very small. That tiny within-subgroup variation produces the very tall, narrow fit line, and that line is what is ‘smashing down’ the bars on the histogram.</p>
<p>We can see this by creating a histogram in the Graph menu and forcing Minitab to use a very small standard deviation (by default, this graph uses the overall standard deviation, the same one used to calculate Ppk): <strong>Graph</strong> > <strong>Histogram </strong>> <strong>Simple</strong>, enter the data, click <strong>Data View</strong>, choose the <strong>Distribution </strong>tab, check <strong>Fit distribution</strong>, and for the Historical StDev enter 0.0777. Then click <strong>OK</strong>, and now we get:</p>
<p style="marginleft: 40px;"><img src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/f6d0da32ba1d41d4ace1af34dcb51351/File/a20807e5892fdb0823bccf0828b5f585/a20807e5892fdb0823bccf0828b5f585.png" /></p>
<p style="marginleft: 40px;"><img src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/f6d0da32ba1d41d4ace1af34dcb51351/File/421afe5e10ba17256da42ac98bf11192/421afe5e10ba17256da42ac98bf11192.png" /></p>
<p>Mystery solved! And if you still don’t believe me, we can get a better looking capability histogram by randomizing the data first (<strong>Calc</strong> > <strong>Random Data</strong> > <strong>Sample From Columns</strong>):</p>
<p style="marginleft: 40px;"><img border="0" height="312" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/f6d0da32ba1d41d4ace1af34dcb51351/File/7f92652636a67fd8f17396bcb52e960c/7f92652636a67fd8f17396bcb52e960c.png" width="397" /></p>
<p>Now if we run the capability analysis using the randomized data in C2 we see:</p>
<p style="marginleft: 40px;"><img border="0" height="468" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/f6d0da32ba1d41d4ace1af34dcb51351/File/5c4c93d603abfb79a1f49b4c412c689b/5c4c93d603abfb79a1f49b4c412c689b.png" width="624" /></p>
<p>A note of caution: I’m <strong><em>not </em></strong>suggesting that the data for a capability analysis should be randomized. The moral of the story is that the data in the worksheet should be entered in the order it was collected so that it is representative of the normal variation in the process (i.e., the data should not be <em>sorted</em>). </p>
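<p>A short Python sketch shows why sorting inflates Cpk. It estimates the within-subgroup sigma for individual observations as the average moving range divided by the standard unbiasing constant d2 = 1.128 (the average-moving-range approach described above; Minitab offers other estimation options too), then computes Cpk = min(USL − mean, mean − LSL) / (3 × sigma within). The measurements and spec limits are hypothetical.</p>

```python
import statistics

D2 = 1.128  # unbiasing constant d2 for moving ranges of length 2


def sigma_within(data):
    """Within-subgroup sigma for individual observations: average moving range / d2."""
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    return statistics.mean(moving_ranges) / D2


def cpk(data, lsl, usl):
    """Cpk = min(USL - mean, mean - LSL) / (3 * within-subgroup sigma)."""
    mean = statistics.mean(data)
    return min(usl - mean, mean - lsl) / (3 * sigma_within(data))


# Hypothetical measurements in the order collected, and hypothetical spec limits.
collected = [10.3, 9.6, 10.1, 9.8, 10.4, 9.7, 10.2, 9.9]
LSL, USL = 8.5, 11.5

for label, data in [("collection order", collected), ("sorted", sorted(collected))]:
    print(
        f"{label:>16}: overall sd = {statistics.stdev(data):.3f}, "
        f"within sd = {sigma_within(data):.3f}, Cpk = {cpk(data, LSL, USL):.2f}"
    )
```

<p>Sorting leaves the overall standard deviation (and therefore Ppk) unchanged, but it shrinks every moving range, so the within-subgroup sigma drops and Cpk balloons, which is the same pattern as the 42.68 Cpk above.</p>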
<p>Too bad our Cpk doesn’t look as amazing as it did before…now it's time to get to <a href="http://blog.minitab.com/blog/michelleparet/howtoimprovecpk">work with Minitab to improve our Cpk</a>!</p>
Capability Analysis
Data Analysis
Lean Six Sigma
Quality Improvement
Reliability Analysis
Wed, 20 Jul 2016 12:00:00 +0000
http://blog.minitab.com/blog/marilynwheatleysblog/highcpkandafunnylookinghistogramismyprocessreallythatamazing
Marilyn Wheatley

DOE Center Points: What They Are & Why They're Useful
http://blog.minitab.com/blog/michelleparet/doecenterpointswhattheyarewhytheyreuseful
<p><a href="http://blog.minitab.com/blog/statisticsandqualitydataanalysis/designofexperimentdoe:searchingforaselfiefountainofyouth">Design of Experiments</a> (DOE) is the perfect tool to efficiently determine if key inputs are related to key outputs. Behind the scenes, DOE is simply a regression analysis. What’s not simple, however, is all of the choices you have to make when planning your experiment. What X’s should you test? What ranges should you select for your X’s? How many replicates should you use? Do you need center points? Etc. <span style="lineheight: 1.6;">So let’s talk about center points.</span></p>
What Are Center Points?
<p>Center points are simply experimental runs where your X’s are set halfway between (i.e., in the center of) the low and high settings. For example, suppose your DOE includes these X’s:</p>
<p><img alt="TimeAndTemp" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/6060c2dbf5d9449babe268eade74814a/Image/a353ac7271581a7dadcf8dac48e33d3f/timeandtemp.jpg" style="width: 300px; height: 80px;" /></p>
<p>The center point would then be set midway at a Temperature of <strong>150 °C</strong> and a Time of <strong>20 seconds</strong>.</p>
<p>And your data collection plan in <a href="http://www.minitab.com/products/minitab/">Minitab Statistical Software</a> might look something like this, with the center points shown in blue:</p>
<p><img alt="Minitab Worksheet" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/6060c2dbf5d9449babe268eade74814a/Image/32105497dbf355948fd66f57cde703bf/minitabworksheet.jpg" style="width: 361px; height: 320px;" /></p>
<p>You can have just 1 center point, or you can collect data at the center point multiple times. This particular design includes 2 experimental runs at the center point. Why pick 2, you may be asking? We’ll talk about that in just a moment.</p>
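<p>As a sketch of how such a plan is laid out, the Python below builds a two-level full factorial in standard order and appends center points at the midpoint of each factor's range. The low and high settings (100 to 200 °C, 10 to 30 seconds) are hypothetical, chosen only so the midpoints match the 150 °C and 20 seconds above; the real settings are in the table image. Minitab can randomize the run order; this sketch does not.</p>

```python
from itertools import product


def factorial_with_centers(factors, n_center):
    """Two-level full factorial plus center points, in standard (unrandomized) order.

    factors: dict mapping factor name -> (low, high) setting.
    Returns a list of runs (dicts): the 2**k corner runs, then n_center runs
    with every factor set halfway between its low and high settings.
    """
    names = list(factors)
    runs = [
        {name: factors[name][level] for name, level in zip(names, levels)}
        for levels in product((0, 1), repeat=len(names))
    ]
    center = {name: (low + high) / 2 for name, (low, high) in factors.items()}
    runs.extend(dict(center) for _ in range(n_center))
    return runs


# Hypothetical low/high settings chosen so the midpoints match the post (150 °C, 20 s).
design = factorial_with_centers({"Temperature": (100, 200), "Time": (10, 30)}, n_center=2)
for run in design:
    print(run)
```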
Why Should You Use Center Points in Your Designed Experiment?
<p>Including center points in a DOE offers many advantages:</p>
<strong><em>1. Is Y versus X linear?</em></strong>
<p>Factorial designs assume there’s a linear relationship between each X and Y. Therefore, if the relationship between any X and Y exhibits curvature, you shouldn’t use a factorial design because the results may mislead you.</p>
<p>So how do you statistically determine if the relationship is linear or not? With center points! If the center point p-value is significant (i.e., less than alpha), then you can conclude that curvature exists and use a response surface DOE (such as a central composite design) to analyze your data. While factorial designs can <em>detect </em>curvature, you have to use a response surface design to <em>model</em> (build an equation for) the curvature.</p>
<p><img alt="Bad Fit Factorial Design" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/6060c2dbf5d9449babe268eade74814a/Image/53c32bd3909d45e8354cb646226163c8/bad_fit.jpg" style="width: 300px; height: 200px; marginleft: 5px; marginright: 5px;" /><img alt="Good Fit Response Surface Design" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/6060c2dbf5d9449babe268eade74814a/Image/253453eaca36868d557cc80e23c8e4de/good_fit.jpg" style="width: 300px; height: 200px;" /></p>
<p>And the good news is that curvature often indicates that your X settings are near an optimum Y, and you've discovered insightful results!</p>
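<p>The idea behind the curvature check can be sketched with the textbook single-degree-of-freedom test: if the relationship is linear, the average response at the corner (factorial) points should match the average at the center. The sum of squares for curvature is SS = nF · nC · (ȳF − ȳC)² / (nF + nC), and comparing it with the pure-error mean square from the replicated center points gives an F statistic on 1 and nC − 1 degrees of freedom. Minitab reports curvature as part of its fitted factorial model, estimating error from the whole model, so its p-value may differ from this simplified version. The responses below are hypothetical.</p>

```python
import statistics


def curvature_check(factorial_y, center_y):
    """Single-degree-of-freedom curvature test for a two-level factorial with center points.

    Textbook form: SS_curv = nF * nC * (mean_F - mean_C)**2 / (nF + nC), compared with
    the pure-error mean square estimated from the replicated center points
    (df = nC - 1). Returns (SS_curvature, F statistic on 1 and nC - 1 df).
    """
    n_f, n_c = len(factorial_y), len(center_y)
    if n_c < 2:
        raise ValueError("need at least 2 center points to estimate pure error")
    diff = statistics.mean(factorial_y) - statistics.mean(center_y)
    ss_curvature = n_f * n_c * diff ** 2 / (n_f + n_c)
    ms_pure_error = statistics.variance(center_y)
    return ss_curvature, ss_curvature / ms_pure_error


# Hypothetical responses: 4 corner runs of a 2^2 design and 3 center points.
ss, f = curvature_check([10, 12, 14, 16], [18, 18.5, 17.5])
print(f"SS_curvature = {ss:.2f}, F = {f:.2f}")  # SS_curvature = 42.86, F = 171.43
```

<p>A large F relative to the F(1, nC − 1) critical value corresponds to a small curvature p-value, the signal to move to a response surface design.</p>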
<strong><em>2. Did you collect enough data?</em></strong>
<p>If you don’t collect enough data, you aren’t going to detect significant X’s even if they truly exist. One way to increase the number of data points in a DOE is to use replicates. However, replicating an entire DOE can be expensive and time-consuming. For example, if you have 3 X’s and want to replicate the design, then you have to increase the number of experimental runs from 8 to 16!</p>
<p>Fortunately, using replicates is just one way to increase power. An alternative way to increase power is to use center points. By adding just a few center points to your design, you can increase the probability of detecting significant X’s, and estimate the variability (or pure error, statistically speaking).</p>
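<p>The run arithmetic in this section can be sketched in a few lines of Python, assuming a two-level full factorial and center points added once to the whole design (a simplification, since center points can also be spread across blocks or replicates). The function name is hypothetical.</p>

```python
def total_runs(k, replicates=1, center_points=0):
    """Runs in a two-level full factorial with k factors, optionally replicated,
    plus center points added once to the whole design."""
    return (2 ** k) * replicates + center_points


print(total_runs(3))                   # 8: one replicate of a 2^3 design
print(total_runs(3, replicates=2))     # 16: replicating the whole design
print(total_runs(3, center_points=2))  # 10: adding two center points instead
```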
Learn More about DOE
<p><span style="lineheight: 1.6;">DOE is a great tool. It tells you a lot about your inputs and outputs and can help you optimize process settings. But it’s only a great tool if you use it the right way. If you want to learn more about DOE, check out our e-learning course <a href="http://www.minitab.com/products/qualitytrainer/">Quality Trainer</a> for $30 US. Or, you can participate in a full-day Factorial Designs course at one of our <a href="http://www.minitab.com/services/training/schedule/">instructor-led training sessions</a>.</span></p>
Data Analysis
Design of Experiments
Lean Six Sigma
Quality Improvement
Six Sigma
Statistics
Fri, 15 Jul 2016 12:00:00 +0000
http://blog.minitab.com/blog/michelleparet/doecenterpointswhattheyarewhytheyreuseful
Michelle Paret

Does Major League Baseball Really Need the Second Half of the Season?
http://blog.minitab.com/blog/thestatisticsgame/doesmajorleaguebaseballreallyneedthesecondhalfoftheseason
<p><img alt="MLB Logo" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/fe2c58f624104b6fb687d378929b1f9b/Image/8fe78a1febf88c009d5cf2943615c4a2/mlb_logo.gif" style="width: 250px; height: 129px; float: right; margin: 10px 15px;" />When you perform a statistical analysis, you want to make sure you collect enough data that your results are reliable. But you also want to avoid wasting time and money collecting more data than you need. So it's important to find an appropriate middle ground when determining your sample size.</p>
<p>Now, technically, the Major League Baseball regular season isn't a statistical analysis. But it does kind of work like one, since the goal of the regular season is to "determine who the best teams are." The National Football League uses a 16-game regular season to determine who the best teams are. Hockey and basketball use 82 games. </p>
<p>Baseball uses 162 games.</p>
<p>So is baseball wasting time collecting more data than it needs? Right now the MLB regular season is about halfway over. So could they just end the regular season now? Will playing another 81 games really have a significant effect on the standings? Let's find out.</p>
How much do MLB standings change in the 2nd half of the season?
<p>I went back through five years of records and noted where each MLB team ranked in its league (American League or National League) on July 8, and then again at the end of the season. We can use this data to look at concordant and discordant pairs, which lets us compare teams to each other two at a time. A pair is concordant if both standings order the two teams the same way (the team ranked higher on July 8 also finished higher), and discordant if the order flips.</p>
<p>For example, let's compare the Astros and Angels from 2015. On July 8th, the Astros were ranked 2nd in the AL and the Angels were ranked 3rd. At the end of the season, Houston was ranked 5th and the Angels were ranked 6th. This pair is concordant since in both cases the Astros were ranked higher than the Angels. But if you compare the Astros and the Yankees, you'll see the Astros were ranked higher on July 8th, but the Yankees were ranked higher at the end of the season. That pair is discordant.</p>
<p>With 30 teams over five seasons, we have 150 team-season records, and comparing every record with every other one gives 150 × 149 / 2 = 11,175 pairs. How many of those are concordant? <a href="http://www.minitab.com/enus/products/minitab/" target="_blank">Minitab Statistical Software</a> has the answer.</p>
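<p>Counting concordant and discordant pairs is easy to sketch in Python. Ranks follow the standings convention (1 is best), but because only the direction of each difference matters, the count comes out the same either way. The Astros and Angels ranks come from the example above; the Yankees' ranks are hypothetical values chosen to match its description. Pairs where either ranking is tied are counted separately, since they are neither concordant nor discordant.</p>

```python
from itertools import combinations


def count_pairs(july_ranks, final_ranks):
    """Count concordant, discordant, and tied pairs between two rankings.

    Position i is one team-season: july_ranks[i] is its rank on July 8 and
    final_ranks[i] its rank at season's end. A pair is concordant when both
    rankings order the two teams the same way, discordant when the order flips,
    and tied when either ranking places the two teams equally.
    """
    concordant = discordant = tied = 0
    for (j1, f1), (j2, f2) in combinations(zip(july_ranks, final_ranks), 2):
        direction = (j1 - j2) * (f1 - f2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        else:
            tied += 1
    return concordant, discordant, tied


# Order: Astros, Angels, Yankees (Yankees' ranks are hypothetical).
july = [2, 3, 4]
final = [5, 6, 3]
print(count_pairs(july, final))  # (1, 2, 0)
```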
<p style="marginleft: 40px;"><img alt="Measures of Concordance" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/fe2c58f624104b6fb687d378929b1f9b/Image/29a56ecd2f92d8adf4e17f8dd54c9765/measures_of_concordance.jpg" style="width: 461px; height: 150px;" /></p>
<p>There are 8,307 concordant pairs, which is just over 74% of the 11,175 pairs. So most of the time, if a team is higher in the standings as of July 8th, it will finish higher in the final standings too. We can also use Spearman's rho and Pearson's r to assess the association between the standings on July 8th and the final standings. Both coefficients range from -1 to +1. The larger the absolute value, the stronger the relationship between the variables, and a value of 0 indicates no monotonic (Spearman) or linear (Pearson) relationship. </p>
<p style="marginleft: 40px;"><img alt="Pearsons r and Spearmans rho" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/fe2c58f624104b6fb687d378929b1f9b/Image/24cba34f8af1f68e9a6ef5647d876695/meaures_of_association.jpg" style="width: 180px; height: 52px;" /></p>
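Both coefficients are simple to compute by hand. The sketch below is plain Python with hypothetical helper names. It computes Spearman's rho the standard way: Pearson's r applied to the ranks, with tied values given their average rank. Because standings are already ranks, the two coefficients should come out close for this kind of data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # i..j are 0-based positions of one tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical July 8 and final ranks for three teams.
print(pearson_r([2, 3, 4], [5, 6, 3]), spearman_rho([2, 3, 4], [5, 6, 3]))
```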
<p>Both values are high and positive, once again indicating that teams ranked higher than other teams on July 8th usually stay that way by the end of the season. So did we do it? Did we show that baseball doesn't really need the 2nd half of their season?</p>
<p>Not quite.</p>
<p>Consider that each league has 15 teams. So a lot of our pairs compare teams that aren't that close together, like the 1st team to the 15th, the 1st team to the 14th, the 2nd team to the 15th, and so on. It's not very surprising that those pairs are concordant. So let's dig a little deeper and compare each team's ranking in July to its ranking at the end of the season. The following <a href="http://blog.minitab.com/blog/michelleparet/3thingsahistogramcantellyou">histogram</a> shows the difference in a team's rank. Positive values mean the team moved up in the standings; negative values mean it fell.</p>
<p style="marginleft: 40px;"><img alt="Histogram" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/fe2c58f624104b6fb687d378929b1f9b/Image/1612d472c3a617bfee1098bc29e00f27/histogram_of_difference.jpg" style="width: 576px; height: 384px;" /></p>
<p>The most common outcome is that a team doesn't move up or down in the standings, as 34 of our observations have a difference of 0. However, there are 150 total observations, so most of the time a team does move up or down. In fact, 55 times a team moved up or down in the standings by 3 or more spots. That's over a third of the time! And there are multiple instances of a team moving 6, 7, or even 8 spots! That doesn't seem to imply that the 2nd half of the season doesn't matter. So what if we narrow the scope of our analysis?</p>
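You only need each team-season's two ranks to reproduce the histogram's summary numbers from your own standings data. In the minimal sketch below (the function names are hypothetical), the final rank is subtracted from the July 8 rank. That way positive values mean the team moved up, as in the histogram. Applied to the post's totals, 55 big moves out of 150 observations is about 37%, which is where "over a third" comes from.

```python
from collections import Counter


def rank_changes(july_ranks, final_ranks):
    """Positive = moved up the standings (rank 1 is best), negative = fell."""
    return [july - final for july, final in zip(july_ranks, final_ranks)]


def summarize_changes(changes, big_move=3):
    """Return (count with no change, count moving big_move+ spots, share moving big_move+)."""
    counts = Counter(changes)
    big = sum(1 for change in changes if abs(change) >= big_move)
    return counts[0], big, big / len(changes)


# The 2015 Astros went from 2nd on July 8 to 5th at season's end.
print(rank_changes([2], [5]))  # [-3]
```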
Looking at the Playoff Teams
<p>We previously noted that the regular season is supposed to determine the best teams. So let's focus on the top of the MLB standings. I took the top 5 teams in each league (since the top 5 teams make the playoffs) on July 8th, and recorded whether they were still a top 5 team (and in the playoffs) at the end of the season. The following pie chart shows the results.</p>
<p style="marginleft: 40px;"><img alt="Pie Chart" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/fe2c58f624104b6fb687d378929b1f9b/Image/be2891f00cde3cab6eadb050a0abeadb/pie_chart_of_playoffs_end.jpg" style="width: 576px; height: 384px;" /></p>
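The pie chart's percentage is a simple retention rate. The sketch below uses a hypothetical helper and follows the post's simplification that the top 5 teams by rank are the playoff teams. One minus its result is the drop-out rate. If the chart covers all 50 July playoff teams (5 teams × 2 leagues × 5 seasons), the post's 28% figure corresponds to 14 teams.

```python
def playoff_retention(july_ranks, final_ranks, cutoff=5):
    """Share of teams ranked in the top `cutoff` on July 8 that are still there at the end."""
    july_top = [i for i, rank in enumerate(july_ranks) if rank <= cutoff]
    stayed = sum(1 for i in july_top if final_ranks[i] <= cutoff)
    return stayed / len(july_top)


# Hypothetical six-team example: the July 5th-place team finishes 6th.
print(playoff_retention([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 6, 5]))  # 0.8
```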
<p>Twenty-eight percent of the time, a team that was in the playoffs in July fell far enough in the standings to drop out. So over a quarter of your playoff teams would be different if the season ended around 82 games. That sounds like a significant effect to me. Finally, let's return to our concordant and discordant pairs. This time, we'll look only at the top half of the standings (the top 8 teams). </p>
<p style="marginleft: 40px;"><img alt="Measures of Concordance" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/fe2c58f624104b6fb687d378929b1f9b/Image/4ed88280224f9eba7726e54b50450f4c/measures_of_concordance_2.jpg" style="width: 468px; height: 194px;" /></p>
<p>This time our percentage of concordant pairs has dropped to 59%, and the values for Spearman's rho and Pearson's r show a weaker association. Teams ranked higher in the 1st half of the season are usually still ranked higher at the end of the season, but there is clearly enough shuffling among the top teams to warrant the 2nd half of the season. So don't worry, baseball fans: your regular season will continue to extend to September.</p>
<p>Because, you know, Major League Baseball <em>totally </em>would have shortened the season if this statistical analysis had suggested doing so!</p>
<p>And if you're looking to determine the appropriate sample size for your own analysis, Minitab offers a wide variety of <a href="http://support.minitab.com/enus/minitab/17/topiclibrary/basicstatisticsandgraphs/powerandsamplesize/powerandsamplesizeanalysesinminitab/">power and sample size analyses</a> that can help you out.</p>
Data Analysis
Fun Statistics
Statistics
Statistics in the News
Fri, 08 Jul 2016 12:00:00 +0000
http://blog.minitab.com/blog/thestatisticsgame/doesmajorleaguebaseballreallyneedthesecondhalfoftheseason
Kevin Rudy

Using Marginal Plots, aka "Stuffed-Crust Charts"
http://blog.minitab.com/blog/dataanalysisandqualityimprovementandstuff/usingmarginalplotsakastuffedcrustcharts
<p><span style="lineheight: 1.6;">In <a href="http://blog.minitab.com/blog/dataanalysisandqualityimprovementandstuff/thematrixitsacomplexplot" target="_blank">my last post</a>, we took the red pill and dove deep into the unarguably fascinating and uncompromisingly compelling world of the matrix plot. I've stuffed this post with information about a topic of marginal interest...the marginal plot.</span></p>
<p>Margins are important. Back in my English composition days, I recall that margins were particularly prized for the inverse linear relationship they maintained with the number of words that one had to string together to complete an assignment. Mathematically, that relationship looks something like this:</p>
<p style="marginleft: 40px;">Bigger margins = fewer words</p>
<p><img alt="stuffed crust" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/62b6ee0d191245cf8e077f414a1e1d2d/stuffed_crust.jpg" style="width: 250px; height: 213px; margin: 10px 15px; float: right;" />In stark contrast to my concept of margins as information-free zones, the marginal plot actually utilizes the margins of a scatterplot to provide timely and important information about your data. Think of the marginal plot as the stuffed-crust pizza of the graph world. Only, instead of extra cheese, you get to bite into extra data. And instead of filling your stomach with carbs and cholesterol, you're filling your brain with data and knowledge. And instead of arriving late and cold because the delivery driver stopped off to canoodle with his girlfriend on his way to your house (<em>even though he's just not sure if the relationship is <span style="lineheight: 20.8px;">really </span>working out: she seems distant lately and he's not sure if it's the constant cologne of consumables about him, or the ever-present film of pizza <span style="lineheight: 1.6;">grease on his car seats, on his clothes, in his ears?)</span></em></p>
<p><span style="lineheight: 1.6;">...anyway, unlike a cold, late pizza, marginal plots are always fresh and hot, because you bake them yourself, in </span><a href="http://www.minitab.com/enus/products/minitab/" style="lineheight: 1.6;" target="_blank">Minitab Statistical Software</a><span style="lineheight: 1.6;">.</span></p>
<p>I tossed some randomly generated data around and came up with this half-baked example. Like the pepperonis on a hastily prepared pie, the points on this plot are mostly piled in the middle, with only a few slices venturing to the edges. In fact, some of those points might be outliers. </p>
<p style="marginleft: 40px;"><img alt="Scatterplot of C1 vs C2" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/8de770baa50a4f6b91449713c3b99f66/Image/1f1e94af3820cd7138eda393fa0b0859/scatterplot_of_c1_vs_c2.jpg" style="width: 360px; height: 240px;" /></p>
<p><span style="lineheight: 20.8px;">If only there were an easy, interesting, and integrated way to assess the data for outliers when we make a scatterplot. </span></p>
<p><span style="lineheight: 20.8px;">Boxplots are a useful way to look for outliers. You could make separate boxplots of each variable, like so:</span></p>
<p style="marginleft: 40px;"><img alt="Boxplot of C1" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/8de770baa50a4f6b91449713c3b99f66/Image/d542dd348a0c357f5e5dc0476bc5ea9f/boxplot_of_c1.jpg" style="width: 360px; height: 240px;" /> <img alt="Boxplot of C2" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/8de770baa50a4f6b91449713c3b99f66/Image/b89d246d7a4e9951d4e0a49be2ad7eaf/boxplot_of_c2.jpg" style="lineheight: 1.6; width: 360px; height: 240px;" /></p>
<p><span style="lineheight: 20.8px;">It's fairly easy to relate the boxplot of C1 to the values plotted on the y-axis of the scatterplot. But it's a little harder to relate the boxplot of C2 to the scatterplot, because the y-axis on the boxplot corresponds to the x-axis on the scatterplot. You can transpose the scales on the boxplot to make the comparison a little easier. Just </span><span style="lineheight: 20.8px;">double-click one of the axes and select <strong>Transpose value and category scales</strong>:</span></p>
<p style="marginleft: 40px;"><img alt="Boxplot of C2, Transposed" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/8de770baa50a4f6b91449713c3b99f66/Image/6c4db0ef1ee268a3c6f38400fd9e1f1c/boxplot_of_c2__transposed.jpg" style="width: 360px; height: 240px;" /></p>
<p>That's a little better. The only thing that would be <em>even better</em> is if you could put each boxplot right up against the scatterplot...if you could stuff the crust of the scatterplot with boxplots, so to speak. Well, guess what? You can! Just choose <strong>Graph > Marginal Plot > With Boxplots</strong>, enter the variables and click <strong>OK</strong>: </p>
<p style="marginleft: 40px;"><img alt="Marginal Plot of C1 vs C2" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/8de770baa50a4f6b91449713c3b99f66/Image/69fdd93c28ebcfd93071ad22af62f407/marginal_plot_of_c1_vs_c2.jpg" style="width: 360px; height: 240px;" /></p>
<p>Not only are the boxplots nestled right up next to the scatterplot, but they also share the same axes as the scatterplot. For example, the outlier (asterisk) on the boxplot of C2 corresponds to the point directly below it on the scatterplot. Looks like that point could be an outlier, so you might want to investigate further. </p>
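The asterisks in Minitab's boxplots flag points that lie more than 1.5 times the interquartile range (IQR) beyond the box. The sketch below shows that rule. The helper names are hypothetical, and the quartile interpolation at position (n + 1)p is an assumption about how Minitab places the quartiles. In a marginal plot you would apply the rule to each variable separately, which is what the two marginal boxplots do.

```python
def quartile(sorted_vals, p):
    """Interpolate at 1-based position (n + 1) * p, clamped to the data range (assumed method)."""
    n = len(sorted_vals)
    pos = (n + 1) * p
    if pos <= 1:
        return sorted_vals[0]
    if pos >= n:
        return sorted_vals[-1]
    lower = int(pos)  # 1-based index of the value just below pos
    frac = pos - lower
    return sorted_vals[lower - 1] + frac * (sorted_vals[lower] - sorted_vals[lower - 1])


def boxplot_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    s = sorted(values)
    q1, q3 = quartile(s, 0.25), quartile(s, 0.75)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical data with one obvious outlier.
print(boxplot_outliers([1, 2, 3, 4, 5, 6, 7, 100]))  # [100]
```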
<p><span style="lineheight: 1.6;">Marginal plots can also help alert you to other important complexities in your data. Here's another half-baked example. Unlike our pizza delivery guy's relationship with his girlfriend, it looks like the relationship between the fake response and the fake predictor represented in this scatterplot really is working out:</span><span style="lineheight: 20.8px;"> </span></p>
<p style="lineheight: 20.8px; marginleft: 40px;"><img alt="Scatterplot of Fake Response vs Fake Predictor" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/8de770baa50a4f6b91449713c3b99f66/Image/a8a2fa08a7a9a917b7130e740c69514d/scatterplot_of_fake_response_vs_fake_predictor.jpg" style="lineheight: 20.8px; width: 360px; height: 240px;" /> </p>
<p style="lineheight: 20.8px;"><span style="lineheight: 1.6;">In fact, i</span><span style="lineheight: 20.8px;">f you use <strong>Stat > Regression > Fitted Line Plot</strong>, the fitted line appears to fit the data nicely. And the regression analysis is highly significant:</span></p>
<p style="lineheight: 20.8px; marginleft: 40px;"><img alt="Fitted Line_ Fake Response versus Fake Predictor" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/8de770baa50a4f6b91449713c3b99f66/Image/aea45ae01a9481d7bf2553e1780521c5/fitted_line__fake_response_versus_fake_predictor.jpg" style="width: 360px; height: 240px;" /></p>
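<p><strong>Regression Analysis: Fake Response versus Fake Predictor</strong></p>
<p>The regression equation is<br />Fake Response = 2.151 + 0.7723 Fake Predictor</p>
<p>S = 2.12304&nbsp;&nbsp; R-Sq = 50.3%&nbsp;&nbsp; R-Sq(adj) = 49.7%</p>
<p>Analysis of Variance</p>
<table>
<tr><th>Source</th><th>DF</th><th>SS</th><th>MS</th><th>F</th><th>P</th></tr>
<tr><td>Regression</td><td>1</td><td>356.402</td><td>356.402</td><td>79.07</td><td>0.000</td></tr>
<tr><td>Error</td><td>78</td><td>351.568</td><td>4.507</td><td></td><td></td></tr>
<tr><td>Total</td><td>79</td><td>707.970</td><td></td><td></td><td></td></tr>
</table>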
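Every summary number in this output follows from the ANOVA table itself:

* S is the square root of the error mean square.
* R-Sq is the regression share of the total sum of squares.
* F is the ratio of the two mean squares.

The small sketch below (with a hypothetical helper name) recomputes them from the table. Clamping a negative adjusted R-Sq to zero is an assumption. It is chosen to match the 0.0% values Minitab reports for the separate fits later in this post.

```python
import math


def regression_summary(ss_reg, ss_err, df_reg, df_err):
    """Recompute S, R-Sq, R-Sq(adj), and F from a simple-regression ANOVA table."""
    ms_reg = ss_reg / df_reg
    ms_err = ss_err / df_err
    ss_total = ss_reg + ss_err
    df_total = df_reg + df_err
    r_sq = ss_reg / ss_total
    # Adjusted R-Sq compares mean squares instead of sums of squares.
    # Clamping at 0 is an assumption, to mirror the 0.0% shown in Minitab output.
    r_sq_adj = max(0.0, 1 - ms_err / (ss_total / df_total))
    return {"S": math.sqrt(ms_err), "R-Sq": r_sq, "R-Sq(adj)": r_sq_adj, "F": ms_reg / ms_err}


summary = regression_summary(356.402, 351.568, 1, 78)
print({name: round(value, 4) for name, value in summary.items()})
```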
<p><span style="lineheight: 1.6;">But wait. If you create a marginal plot instead, you can augment your exploration of these data with histograms and/or dotplots, as I have done below. Looks like there's trouble in </span>paradise:</p>
<p style="marginleft: 40px;"><img alt="Marginal Plot of Fake Response vs Fake Predictor, with Histograms" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/8de770baa50a4f6b91449713c3b99f66/Image/107dfda7466c6690e33aee6e7f3918b6/marginal_plot_of_fake_response_vs_fake_predictor__with_histograms.jpg" style="width: 360px; height: 240px;" /> <img alt="Marginal Plot of Fake Response vs Fake Predictor, with Dotplots" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/8de770baa50a4f6b91449713c3b99f66/Image/8e3044a69af337cfcc15d4aaacd88f9f/marginal_plot_of_fake_response_vs_fake_predictor__with_dotplots.jpg" style="width: 360px; height: 240px;" /></p>
<p><span style="lineheight: 20.8px;">Like the poorly made pepperoni pizza, the points on our plot are distributed unevenly. There appear to be two clumps of points. The distribution of values for the fake predictor is bimodal: that is, it has two distinct peaks. The distribution of values for the response may also be bimodal.</span></p>
<p>Why is this important? Because the <span style="lineheight: 20.8px;">two </span>clumps of toppings may suggest that you have more than one metaphorical cook in the metaphorical pizza kitchen. For example, it could be that Wendy, who is left-handed, started placing the pepperonis carefully on the pie and then got called away, leaving Jimmy, who is right-handed, to quickly and carelessly complete the covering of cured meats. In other words, it could be that the <span style="lineheight: 20.8px;">two </span>clumps of points represent <span style="lineheight: 20.8px;">two </span>very different populations. </p>
<p>When I tossed and stretched the data for this example, I took random samples from two different populations. I used 40 random observations from a normal distribution with a mean of 8 and a standard deviation of 1.5, and 40 random observations from a normal distribution with a mean of 13 and a standard deviation of 1.75. The two clumps of data are truly from <span style="lineheight: 20.8px;">two </span>different populations. To illustrate, I separated the <span style="lineheight: 20.8px;">two </span>populations into two different groups in this scatterplot: </p>
<p style="marginleft: 40px;"> <img alt="Scatterplot with Groups" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/8de770baa50a4f6b91449713c3b99f66/Image/924cfc845dd807e6e2fd57cbbcc0abcb/scatterplot_of_fake_response_vs_fake_predictor_with_groups.jpg" style="width: 360px; height: 240px;" /></p>
<p>This is a classic conundrum that can occur when you do a <span><a href="http://blog.minitab.com/blog/adventuresinstatistics/regressionanalysishowdoiinterpretrsquaredandassessthegoodnessoffit">regression analysis</a></span>. The regression line tries to pass through the center of the data. And because there are two clumps of data, the line tries to pass through the center of each clump. This <em>looks </em>like a relationship between the response and the predictor, but it's just an illusion. If you separate the clumps and analyze each population separately, you discover that there is no relationship at all: </p>
<p style="marginleft: 40px;"><img alt="Fitted Line_ Fake Response 1 versus Fake Predictor 1" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/8de770baa50a4f6b91449713c3b99f66/Image/e5981c9b2604bf841a525926d8282d8f/fitted_line__fake_response_1_versus_fake_predictor_1.jpg" style="width: 360px; height: 240px;" /></p>
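<p><strong>Regression Analysis: Fake Response 1 versus Fake Predictor 1</strong></p>
<p>The regression equation is<br />Fake Response 1 = 9.067 - 0.1600 Fake Predictor 1</p>
<p>S = 1.64688&nbsp;&nbsp; R-Sq = 1.5%&nbsp;&nbsp; R-Sq(adj) = 0.0%</p>
<p>Analysis of Variance</p>
<table>
<tr><th>Source</th><th>DF</th><th>SS</th><th>MS</th><th>F</th><th>P</th></tr>
<tr><td>Regression</td><td>1</td><td>1.609</td><td>1.60881</td><td>0.59</td><td>0.446</td></tr>
<tr><td>Error</td><td>38</td><td>103.064</td><td>2.71221</td><td></td><td></td></tr>
<tr><td>Total</td><td>39</td><td>104.673</td><td></td><td></td><td></td></tr>
</table>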
<p style="marginleft: 40px;"><img alt="Fitted Line_ Fake Response 2 versus Fake Predictor 2" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/8de770baa50a4f6b91449713c3b99f66/Image/e7db8af8c22bd83b72ad559ca5aece86/fitted_line__fake_response_2_versus_fake_predictor_2.jpg" style="width: 360px; height: 240px;" /></p>
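<p><strong>Regression Analysis: Fake Response 2 versus Fake Predictor 2</strong></p>
<p>The regression equation is<br />Fake Response 2 = 12.09 + 0.0532 Fake Predictor 2</p>
<p>S = 1.62074&nbsp;&nbsp; R-Sq = 0.3%&nbsp;&nbsp; R-Sq(adj) = 0.0%</p>
<p>Analysis of Variance</p>
<table>
<tr><th>Source</th><th>DF</th><th>SS</th><th>MS</th><th>F</th><th>P</th></tr>
<tr><td>Regression</td><td>1</td><td>0.291</td><td>0.29111</td><td>0.11</td><td>0.741</td></tr>
<tr><td>Error</td><td>38</td><td>99.818</td><td>2.62679</td><td></td><td></td></tr>
<tr><td>Total</td><td>39</td><td>100.109</td><td></td><td></td><td></td></tr>
</table>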
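A quick simulation reproduces this illusion. The sketch below is a hedged reconstruction, not the author's worksheet. It assumes the predictor and the response were each drawn independently from their group's normal distribution, so any within-group slope is pure noise. The groups are mean 8 with SD 1.5 and mean 13 with SD 1.75, with 40 points each. The helper names and seed are arbitrary. Your coefficients won't match the session output above because the random draws differ.

```python
import random


def simulate_group(n, mean, sd, rng):
    """Draw predictor and response independently, so there is no real within-group link."""
    xs = [rng.gauss(mean, sd) for _ in range(n)]
    ys = [rng.gauss(mean, sd) for _ in range(n)]
    return xs, ys


def fit_line(x, y):
    """Ordinary least squares: return (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rng = random.Random(2016)  # arbitrary seed for repeatability
x1, y1 = simulate_group(40, 8, 1.5, rng)
x2, y2 = simulate_group(40, 13, 1.75, rng)

pooled = fit_line(x1 + x2, y1 + y2)  # both clumps analyzed together
group1 = fit_line(x1, y1)
group2 = fit_line(x2, y2)
print(f"pooled slope {pooled[1]:.3f}, group 1 slope {group1[1]:.3f}, group 2 slope {group2[1]:.3f}")
```

Expect a clearly positive pooled slope next to two slopes near zero. That is the same pattern as in the three regressions above.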
<p>If only our unfortunate pizza delivery technician could somehow use a marginal plot to help him assess the state of his own relationship. But alas, I don't think a marginal plot is going to help with that particular analysis. Where is that guy anyway? I'm getting hungry. </p>
Fun Statistics
Project Tools
Regression Analysis
Statistics
Wed, 06 Jul 2016 12:27:00 +0000
http://blog.minitab.com/blog/dataanalysisandqualityimprovementandstuff/usingmarginalplotsakastuffedcrustcharts
Greg Fox

Applying DOE for Great Grilling, part 2
http://blog.minitab.com/blog/understandingstatistics/applyingdoeforgreatgrillingpart2
<p><img alt="grill" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/111e4a65160cf20662dfb13013408f1f/grill.jpg" style="margin: 10px 15px; width: 202px; height: 202px; lineheight: 18.9px; float: right;" /></p>
<p style="lineheight: 18.9px;"><span style="lineheight: 18.9px;">Design of Experiments is an extremely powerful statistical method, and we added a DOE tool to the Assistant in Minitab 17 to make it more accessible to more people.</span></p>
<p style="lineheight: 18.9px;"><span style="lineheight: 18.9px;">Since it's summer grilling season, I'm applying the Assistant's DOE tool to outdoor cooking.</span><span style="lineheight: 18.9px;"> </span>Earlier, I showed you <a href="http://blog.minitab.com/blog/understandingstatistics/applyingdoeforgreatgrillingpart1">how to set up a designed experiment</a> that will let you optimize how you grill steaks. </p>
<p>If you're not already using it and you want to play along, you can download the <a href="http://it.minitab.com/products/minitab/freetrial.aspx">free 30-day trial version</a> of Minitab Statistical Software.</p>
<p style="lineheight: 18.9px;">Perhaps you are following along, and you've already grilled your steaks according to the experimental plan and recorded the results of your experimental runs. Otherwise, feel free to download my data <a href="//cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/File/a0d8f12f27ee5a981619c2c3af59d524/steaks___asst_doe.MTW">here</a> for the next step: analyzing the results of our experiment. </p>
Analyzing the Results of the Steak Grilling Experiment
<p style="lineheight: 18.9px;">After collecting your data and entering it into Minitab, you should have an experimental worksheet that looks like this: </p>
<p style="lineheight: 18.9px;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/ed29e3c1fb41872df6529e91786215f2/grill_doe_worksheet.png" style="width: 500px; height: 320px;" /></p>
<p style="lineheight: 18.9px;">With your results entered in the worksheet, select <strong>Assistant > DOE > Analyze and Interpret</strong>. As you can see below, the only button you can click is "Fit Linear Model." </p>
<p style="lineheight: 18.9px;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/1ce7cb7744e6fb78c4f5cb74d1903cf6/grill_doe_analyze.png" style="width: 500px; height: 375px;" /></p>
<p style="lineheight: 18.9px;">As you might gather from the flowchart, when it analyzes your data, the Assistant first checks to see if the response exhibits curvature. If it does, the Assistant will prompt you to gather more data so it can fit a quadratic model. Otherwise, the Assistant will fit the linear model and provide the following output. </p>
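A common textbook curvature check for two-level designs with center points compares two averages: the response at the center points and the response at the factorial (corner) points. The sketch below computes that single-degree-of-freedom curvature sum of squares. It also computes an F ratio against pure error from replicated center points. This is an assumption about the general method, not a description of the Assistant's internal calculation. It also ignores complications such as categorical factors, which have no true center level. The function name and data are hypothetical.

```python
import statistics


def curvature_check(factorial_y, center_y):
    """Return (curvature SS, F ratio) comparing corner and center-point averages.

    Needs at least two center points so pure error can be estimated.
    """
    n_f, n_c = len(factorial_y), len(center_y)
    gap = statistics.mean(factorial_y) - statistics.mean(center_y)
    ss_curvature = n_f * n_c * gap ** 2 / (n_f + n_c)  # 1 degree of freedom
    ms_pure_error = statistics.variance(center_y)  # sample variance of center points
    return ss_curvature, ss_curvature / ms_pure_error


# Hypothetical responses: corners average 5, center points average 3.
print(curvature_check([4, 4, 6, 6], [2, 4]))
```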
<p style="lineheight: 18.9px;">When you click the "Fit Linear Model" button, the Assistant automatically identifies your response variable.</p>
<p style="lineheight: 18.9px;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/a851201ceabf727ba38e53ef383d6091/grill_doe_analyze2.png" style="width: 435px; height: 260px;" /></p>
<p style="lineheight: 18.9px;">All you need to do is confirm your response goal (maximizing flavor, in this case) and press OK. The Assistant performs the analysis and provides the results in a series of easy-to-interpret reports. </p>
Understanding the DOE Results
<p style="lineheight: 18.9px;">First, the Assistant offers a summary report that gives you the bottom-line results of the analysis. The Pareto Chart of Effects in the top left shows that Turns, Grill type, and Seasoning are all statistically significant, and there's a significant interaction between Turns and Grill type, too. </p>
<p style="lineheight: 18.9px;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/9ac1a8e009efec8b90fdbb32cfebd1df/grill_doe_results_summary.png" style="width: 751px; height: 563px;" /></p>
<p style="lineheight: 18.9px;">The summary report also shows that the model explains a very high proportion of the variation in flavor, with an R<sup>2</sup> value of 95.75 percent. And the "Comments" window in the lower right corner puts things in plain language: "You can conclude that there is a relationship between Flavor and the factors in the model..."</p>
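A simple calculation sits behind a Pareto chart of effects. In a two-level design with factors coded -1 and +1, a factor's effect is the average response at its high level minus the average at its low level. An interaction's effect uses the product of the coded columns instead. The sketch below shows that calculation on made-up responses generated from a known model, not the steak data. Minitab's Pareto chart plots standardized effects, which also divide by a standard error, so this is a simplified illustration.

```python
from itertools import product


def effect(runs, responses, columns):
    """Effect of one coded factor, or of an interaction when several columns are given."""
    high, low = [], []
    for run, y in zip(runs, responses):
        sign = 1
        for c in columns:
            sign *= run[c]  # product of the coded (-1/+1) levels
        (high if sign > 0 else low).append(y)
    return sum(high) / len(high) - sum(low) / len(low)


# Coded 2x2x2 design; columns 0, 1, 2 stand for Turns, Grill, Seasoning.
runs = list(product([-1, 1], repeat=3))
# Made-up responses from y = 6 + 1.0*A - 0.5*B + 0.25*A*B (no seasoning effect).
responses = [6 + 1.0 * a - 0.5 * b + 0.25 * a * b for a, b, c in runs]

effects = {
    "Turns": effect(runs, responses, [0]),
    "Grill": effect(runs, responses, [1]),
    "Seasoning": effect(runs, responses, [2]),
    "Turns*Grill": effect(runs, responses, [0, 1]),
}
# Sort by absolute size, as a Pareto chart does.
for name, value in sorted(effects.items(), key=lambda item: abs(item[1]), reverse=True):
    print(f"{name:12s} {value:+.2f}")
```

Each effect is twice the corresponding model coefficient (here Turns = 2.0, Grill = -1.0, Turns*Grill = 0.5), because it spans the full distance from -1 to +1.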
<p style="lineheight: 18.9px;">The Assistant's Effects report, shown below, tells you more about the nature of the relationship between the factors in the model and Flavor, with both Interaction Plots and Main Effects plots that illustrate how different experimental settings affect the Flavor response. </p>
<p style="lineheight: 18.9px;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/4a9d3a9939ad51a9326bed0fbd061048/grill_doe_results_effects.png" style="width: 751px; height: 563px;" /></p>
<p style="lineheight: 18.9px;">And if we're looking to make some changes as a result of our experimental results—like selecting an optimal method for grilling steaks in the future—the Prediction and Optimization report gives us the optimal solution (1 turn on a charcoal grill, with Montreal seasoning) and its predicted Flavor response (8.425). </p>
<p style="lineheight: 18.9px;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/f9189c39c79160de4b9c5dbf8f4523ab/grill_doe_results_optimization.png" style="width: 751px; height: 563px;" /></p>
<p style="lineheight: 18.9px;"><span style="lineheight: 1.6;">It also gives us the Top 5 alternative solutions, shown in the bottom right corner, so if there's some reason we can't implement the optimal solution—for instance, if we only have a gas grill—we can still choose the best solution that suits our circumstances. </span></p>
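Once a model is fitted, finding the optimal and alternative settings amounts to predicting the response at each candidate setting and sorting. The sketch below does that with made-up coded coefficients, which are NOT the fitted values from the steak experiment. It checks only the design's corner settings. Since Turns is continuous, a real optimizer could also consider values between 1 and 3. Restricting the candidates, for example to a gas grill only, gives the best setting for your circumstances.

```python
from itertools import product

# Hypothetical coded coefficients for illustration only (not the post's fitted model).
COEFS = {"const": 6.0, "Turns": -1.0, "Grill": 0.5, "Seasoning": 0.4, "Turns*Grill": -0.3}

# Map each actual level to its coded -1/+1 value.
LEVELS = {
    "Turns": {1: -1, 3: 1},
    "Grill": {"Gas": -1, "Charcoal": 1},
    "Seasoning": {"Salt-Pepper": -1, "Montreal": 1},
}


def predict(turns, grill, seasoning):
    """Predicted Flavor from the hypothetical coded linear model."""
    t = LEVELS["Turns"][turns]
    g = LEVELS["Grill"][grill]
    s = LEVELS["Seasoning"][seasoning]
    return (COEFS["const"] + COEFS["Turns"] * t + COEFS["Grill"] * g
            + COEFS["Seasoning"] * s + COEFS["Turns*Grill"] * t * g)


def ranked_settings(allowed_grills=("Gas", "Charcoal")):
    """All allowed corner settings, best predicted Flavor first."""
    candidates = [(predict(t, g, s), t, g, s)
                  for t, g, s in product(LEVELS["Turns"], allowed_grills, LEVELS["Seasoning"])]
    return sorted(candidates, reverse=True)


print(ranked_settings()[:5])          # optimum plus alternatives
print(ranked_settings(("Gas",))[0])   # best choice if only a gas grill is available
```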
<p style="lineheight: 18.9px;">I hope this example illustrates how easy a designed experiment can be when you use the Assistant to create and analyze it, and that designed experiments can be very useful not just in industry or the lab, but also in your everyday life. </p>
<p style="lineheight: 18.9px;">Where could you benefit from analyzing process data to optimize your results? </p>
Design of Experiments
Fun Statistics
Statistics Help
Tue, 05 Jul 2016 12:00:00 +0000
http://blog.minitab.com/blog/understandingstatistics/applyingdoeforgreatgrillingpart2
Eston Martz

Applying DOE for Great Grilling, part 1
http://blog.minitab.com/blog/understandingstatistics/applyingdoeforgreatgrillingpart1
<p>Design of Experiments (DOE) has a reputation for difficulty, and to an extent, this statistical method <em>deserves </em>that reputation. While it's easy to grasp the basic idea—<em>acquire the maximum amount of information from the fewest number of experimental runs</em>—practical application of this tool can quickly become very confusing. </p>
<p><img alt="steaks" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/33d85058b493aff4240dfb9d78aff673/steaks.jpg" style="margin: 10px 15px; width: 250px; height: 250px; float: right;" />Even if you're a longtime user of designed experiments, it's still easy to feel uncertain if it's been a while since you last looked at split-plot designs or needed to choose the appropriate resolution for a fractional factorial design.</p>
<p>But DOE <em>is</em> an extremely powerful and useful tool, so when we launched Minitab 17, we added a DOE tool to the Assistant to make designed experiments more accessible to more people.</p>
<p>Since summer is here at Minitab's world headquarters, I'm going to illustrate how you can use the Assistant's DOE tool to optimize your grilling method. </p>
<p>If you're not already using it and you want to play along, you can download the free 30-day <a href="http://it.minitab.com/products/minitab/freetrial.aspx">trial version of Minitab Statistical Software</a>.</p>
Two Types of Designed Experiments: Screening and Optimizing
<p>To create a designed experiment using the Assistant, open Minitab and select <strong>Assistant > DOE > Plan and Create</strong>. You'll be presented with a decision tree that helps you take a sequential approach to the experimentation process by offering a choice between a screening design and a modeling design.</p>
<p><img alt="DOE Assistant" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/5b585531f6031882fb7880a49700f52c/grill_doe_1.png" style="width: 487px; height: 366px;" /></p>
<p>A <strong>screening design</strong> is important if <span><a href="http://blog.minitab.com/blog/understandingstatistics/whyistheofficecoffeesobadascreeningexperimentnarrowsdownthecriticalfactors">you have a lot of potential factors to consider</a></span> and you want to figure out which ones are important. The Assistant guides you through the process of testing and analyzing the main effects of 6 to 15 factors, and identifies the factors that have the greatest influence on the response.</p>
<p>Once you've identified the critical factors, you can use the <strong>modeling design.</strong> Select this option, and the Assistant guides you through testing and analyzing 2 to 5 critical factors and helps you find optimal settings for your process.</p>
<p>Even if you're an old hand at analyzing designed experiments, you may want to use the Assistant to create designs, since the Assistant lets you print out easy-to-use data collection forms for each experimental run. After you've collected and entered your data, the designs created in the Assistant can also be analyzed using <span style="lineheight: 18.9px;">Minitab's </span><span style="lineheight: 1.6;">core DOE tools available through the <strong>Stat > DOE</strong> menu.</span></p>
<span style="lineheight: 1.6;">Creating a DOE to Optimize How We Grill Steaks</span>
<p>For grilling steaks, there aren't that many variables to consider, so we'll use the Assistant to plan and create a <strong>modeling design</strong> that will optimize our grilling process. Select <strong>Assistant > DOE > Plan and Create</strong>, then click the "Create Modeling Design" button. </p>
<p><span style="lineheight: 1.6;">Minitab brings up an easy-to-follow dialog box; all we need to do is fill it in. </span></p>
<p><span style="lineheight: 1.6;"><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/eb90fd8499ab96a579aa6dd63fa325d2/grill_doe_dialog_1.png" style="width: 461px; height: 500px;" /></span></p>
<p>First we enter the name of our Response and the goal of the experiment. Our response is "Flavor," and the goal is "Maximize the response." Next, we enter our factors. We'll look at three critical variables:</p>
<ul>
<li>Number of turns, a continuous variable with a low value of 1 and high value of 3.</li>
<li>Type of grill, a categorical variable with Gas or Charcoal as options. </li>
<li>Type of seasoning, a categorical variable with Salt-Pepper or Montreal steak seasoning as options. </li>
</ul>
<p>If we wanted to, we could select more than 1 replicate of the experiment. A replicate is simply a complete set of experimental runs, so if we did 3 replicates, we would repeat the full experiment three times. But since this experiment has 16 runs, and neither our budget nor our stomachs are limitless, we'll stick with a single replicate. </p>
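The replicate arithmetic is easy to see in code. The sketch below uses hypothetical names and is not the Assistant's algorithm. It builds a plain two-level full factorial for these three factors and randomizes the run order. Such a design has 2 × 2 × 2 = 8 corner runs per replicate, so 3 replicates would mean 24 such runs. The post reports 16 runs for a single replicate, so the Assistant's design contains more runs than these 8 corners. This sketch doesn't try to reproduce its exact layout.

```python
import itertools
import random

# The three factors and their two levels, as entered in the Assistant dialog.
FACTORS = {
    "Turns": [1, 3],
    "Grill": ["Gas", "Charcoal"],
    "Seasoning": ["Salt-Pepper", "Montreal"],
}


def full_factorial(factors, replicates=1, seed=None):
    """Every combination of factor levels, repeated per replicate, in random run order."""
    names = list(factors)
    base = [dict(zip(names, combo)) for combo in itertools.product(*factors.values())]
    runs = [dict(run) for _ in range(replicates) for run in base]
    random.Random(seed).shuffle(runs)  # randomize run order to guard against drift
    return runs


for number, run in enumerate(full_factorial(FACTORS, seed=7), start=1):
    print(number, run)
```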
<p>When we click OK, the Assistant first asks if we want to print out data collection forms for this experiment: </p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/c4c63c4b5af7a4c6e4f3c4caa327f523/grill_doe_collection_form1.png" style="width: 445px; height: 207px;" /></p>
<p>Choose Yes, and you can print a form that lists each run, the variables and settings, and a space to fill in the response:</p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/06ed8ad486f3a243c4aea352c9738b2c/grill_doe_collection_form2.png" style="borderwidth: 1px; borderstyle: solid; width: 500px; height: 313px;" /></p>
<p>Alternatively, you can just record the results of each run in the worksheet the Assistant creates, which you'll need to do anyway. But having the printed data collection forms can make it much easier to keep track of where you are in the experiment, and exactly what your factor settings should be for each run. </p>
<p>If you've used the Assistant in Minitab for other methods, you know that it seeks to demystify your analysis and make it easy to understand. When you create your experiment, the Assistant gives you a Report Card and Summary Report that explain the steps of the DOE and important considerations, and a summary of your goals and what your analysis will show. </p>
<p><img alt="" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/a767257b9db465b81d6bd1456e5eb508/grill_doe_2_w1024.png" style="width: 650px; height: 439px;" /></p>
<p>Now it's time to cook some steaks, and rate the flavor of each. If you want to do this for real and collect your own data, please do so! <a href="http://blog.minitab.com/blog/understandingstatistics/applyingdoeforgreatgrillingpart2">Tomorrow's post</a> will show how to analyze your data with the Assistant. </p>
Mon, 04 Jul 2016 12:00:00 +0000
http://blog.minitab.com/blog/understandingstatistics/applyingdoeforgreatgrillingpart1
Eston Martz

Those 10 Simple Rules for Using Statistics? They're Not Just for Research
http://blog.minitab.com/blog/understandingstatistics/those10simplerulesforusingstatisticstheyrenotjustforresearch
<p><span style="lineheight: 1.6;">Earlier this month, PLOS.org published an article titled "<a href="http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004961" target="_blank">Ten Simple Rules for Effective Statistical Practice</a>." </span><span style="lineheight: 20.8px;">The 10 rules are good reading for </span><em style="lineheight: 20.8px;">anyone </em><span style="lineheight: 20.8px;">who draws conclusions and makes decisions based on data</span><span style="lineheight: 20.8px;">, whether you're trying to extend the boundaries of scientific knowledge or make good decisions for your business. </span></p>
<p><span style="lineheight: 20.8px;">Carnegie Mellon University's Robert E. Kass and several coauthors</span><span style="lineheight: 20.8px;"> </span><span style="lineheight: 1.6;">devised the rules in response to the increased pressure on scientists and researchers—many, if not most, of whom are <em>not</em> statisticians—to present accurate findings based on sound statistical methods. </span></p>
<p><span style="lineheight: 20.8px;">Since </span><span style="lineheight: 1.6;">the paper and the discussions it has prompted focus on scientists and researchers, it seems worthwhile to consider how the rules might apply to </span><span style="lineheight: 20.8px;">quality practitioners and business decision-makers as well</span><span style="lineheight: 1.6;">. </span><span style="lineheight: 1.6;">In this post, I'll share the 10 rules, some with a few modifications to make them more applicable to anyone who uses data to inform decisions. </span></p>
<img alt="questions" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/d2c0571aacbd48c784f4222276c293fe/Image/36fa08b0c862c669f4e41596fbb76ddd/question_mark_signs.jpg" style="width: 200px; height: 200px; float: right; margin: 10px 15px; borderwidth: 1px; borderstyle: solid;" />1. Statistical Methods Should Enable Data to Answer <span style="color:#FF0000;"><s>Scientific</s> Specific</span> Questions
<p>As the article points out, new or infrequent users of statistics tend to emphasize finding the "right" method to use—often focusing on the structure or format of their data, rather than thinking about how the data might answer an important question. But choosing a method based on the data is putting the cart before the horse. Instead, we should start by clearly identifying the question we're trying to answer. Then we can look for a method that uses the data to answer it. If you haven't already collected your data, so much the better—you have the opportunity to identify and obtain the data you'll need.</p>
2. Signals Always Come With Noise
<p>If you're familiar with <a href="http://blog.minitab.com/blog/understandingstatistics/controlcharttutorialsandexamples">control charts</a> used in statistical process control (SPC) or the Control phase of a Six Sigma DMAIC project, you know that they let you distinguish process variation that matters (special-cause variation) from normal, common-cause variation that doesn't need investigation or correction.</p>
<p style="marginleft: 40px;"><img alt="control chart" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/d2c0571aacbd48c784f4222276c293fe/Image/632a05ec67ddca317eb4bc1f4daabe9a/i_chart_of_ph.gif" style="lineheight: 20.8px; width: 573px; height: 172px;" /><br />
<em style="lineheight: 20.8px;">Control charts are one common tool used to distinguish "noise" from "signal." </em></p>
<p>The same concept applies here: whenever we gather and analyze data, some of what we see in the results will be due to inherent variability. Measures of uncertainty, such as confidence intervals, are important because they help us understand and account for this "noise." </p>
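<p>To make the idea of "noise" concrete, here is a minimal Python sketch (not Minitab's implementation) of a 95% confidence interval for a mean. It assumes a large-sample normal approximation using <code>statistics.NormalDist</code>; for a sample this small, a t-based interval would be somewhat wider. The fill weights are hypothetical values invented for illustration.</p>

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(data, confidence=0.95):
    """Large-sample (z-based) confidence interval for a population mean."""
    n = len(data)
    xbar = mean(data)
    std_err = stdev(data) / math.sqrt(n)            # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return xbar - z * std_err, xbar + z * std_err


# Hypothetical fill weights in grams, invented for illustration
weights = [500.2, 499.8, 500.5, 499.6, 500.1, 500.3, 499.9, 500.0, 500.4, 499.7]
low, high = mean_confidence_interval(weights)
print(f"95% CI for the mean fill weight: ({low:.2f}, {high:.2f}) g")
```

<p>The interval, not the single sample mean, is what communicates how much of the result could be noise: collecting more data shrinks the standard error and therefore the interval.</p>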
3. Plan Ahead, Really Ahead
<p>Say you're starting a DMAIC project. Carefully considering and developing good questions right at the start of a project, in the DEFINE phase, will help you make sure that you're getting the right data in the MEASURE phase. That, in turn, should result in a much smoother and less stressful ANALYZE phase, and probably more successful IMPROVE and CONTROL phases, too. The alternative? You'll have to complete the ANALYZE phase with the data you have, not the data you wish you had. </p>
4. Worry About Data Quality
<p><img alt="gauge" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/b82fc879fa26a76f2b00424550aafe9e/gage.jpg" style="width: 250px; height: 173px; float: right; margin: 10px 15px;" />"Can you trust your data?" My Six Sigma instructor asked us that question so many times, it still flashes through my mind every time I open Minitab. That's good, because he was absolutely right: if you can't trust your data, you shouldn't do anything with it. Many people take it for granted that the data they get is precise and accurate, especially when using automated measuring instruments and similar technology. But how do you <em>know </em>they're measuring precisely and accurately? How do you <em>know </em>your instruments are calibrated properly? If you didn't test it, <em>you don't know</em>. And if you don't know, you can't trust your data. Fortunately, with measurement system analysis methods like <span><a href="http://blog.minitab.com/blog/meredithgriffith/fundamentalsofgagerr">gage R&R</a></span> and <a href="http://blog.minitab.com/blog/understandingstatistics/gotgoodjudgmentproveitwithattributeagreementanalysis">attribute agreement analysis</a>, we never have to leave <span style="lineheight: 20.8px;">data</span><span style="lineheight: 20.8px;"> </span><span style="lineheight: 1.6;">quality to blind faith. </span></p>
5. Statistical Analysis Is More Than a Set of Computations
<p>Statistical techniques are often referred to as "tools," and that's a very apt metaphor. A saw, a plane, and a router all cut wood, but they aren't interchangeable—the end product defines which tool is appropriate for a job. Similarly, you might apply ANOVA, regression, or time series analysis to the same data set, but the right tool depends on what you want to understand. To extend the metaphor further, just as we have circular saws, jigsaws, and miter saws for very specific tasks, each family of statistical methods also includes specialized tools designed to handle particular situations. The point is that we select a tool to <em>assist </em>our analysis, not to <em>define </em>it. </p>
6. Keep it Simple
<p>Many processes are inherently messy. If you've got dozens of input variables and multiple outcomes, analyzing them could require many steps, transformations, and some thorny calculations. Sometimes that degree of complexity is required. But a more complicated analysis isn't always better; in fact, overcomplicating it may make your results less clear and less reliable. It can also make the analysis more difficult than necessary. <span style="lineheight: 20.8px;">You may not </span><em style="lineheight: 20.8px;">need </em><span style="lineheight: 20.8px;">a complex process model that includes 15 factors if you can improve your output by optimizing the three or four most important inputs. </span><span style="lineheight: 1.6;">If you need to improve a process that includes many inputs, </span><a href="http://blog.minitab.com/blog/statisticsandqualityimprovement/createadoescreeningexperimentwiththeassistantinminitab17" style="lineheight: 1.6;">a short screening experiment</a><span style="lineheight: 1.6;"> can help you identify which factors are most critical, and which are not so important. </span></p>
7. Provide Assessments of Variability
<p>No model is perfect. No analysis accounts for all of the observed variation. Every analysis includes a degree of uncertainty. Thus, no statistical finding is 100% certain, and that degree of uncertainty needs to be considered when using statistical results to make decisions. If you're the decision-maker, be sure that you understand the risks of reaching a wrong conclusion based on the analysis at hand. If you're sharing your results with stakeholders and executives, especially if they aren't statistically inclined, make sure you've communicated that degree of risk to them by offering and explaining confidence intervals, margins of error, or other appropriate measures of uncertainty. </p>
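<p>A margin of error is often the easiest measure of uncertainty to explain to a non-statistical audience. The sketch below is a minimal Python example, assuming the simple normal-approximation (Wald) formula for a proportion; that approximation is rough for small samples or proportions near 0 or 1, and it is not necessarily the method Minitab uses. The survey numbers are hypothetical.</p>

```python
import math
from statistics import NormalDist


def proportion_margin_of_error(p_hat, n, confidence=0.95):
    """Normal-approximation (Wald) margin of error for a sample proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical survey: 62 of 200 customers report a late delivery
p_hat = 62 / 200
moe = proportion_margin_of_error(p_hat, 200)
print(f"Late deliveries: {p_hat:.1%} ± {moe:.1%} (95% confidence)")
```

<p>Because the margin shrinks with the square root of the sample size, quadrupling the sample only halves the margin of error, which is a useful fact to share when stakeholders ask for more certainty.</p>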
8. Check Your Assumptions
<p>Different statistical methods are based on different assumptions about the data being analyzed. For instance, many common analyses assume that your data follow a normal distribution. You can check most of these assumptions very quickly using functions like a normality test in your statistical software, but it's easy to forget (or ignore) these steps and dive right into your analysis. However, failing to verify those assumptions can yield results that aren't reliable and shouldn't be used to inform decisions, so don't skip that step. <a href="http://www.minitab.com/products/minitab/assistant/">If you're not sure about the assumptions for a statistical analysis, Minitab's Assistant menu explains them</a>, and can even flag violations of the assumptions before you draw the wrong conclusion from an errant analysis. </p>
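<p>Minitab's normality test offers the Anderson-Darling test, among others. As a rough illustration of what such a test computes, here is a hand-rolled Python sketch of the Anderson-Darling statistic for normality with the mean and standard deviation estimated from the data. It applies the commonly tabulated small-sample adjustment and compares the result to an approximate 5% critical value of 0.752; treat both constants as assumptions to verify against a reference, and use your statistical software for real decisions. The cycle times are hypothetical.</p>

```python
import math
from statistics import NormalDist, mean, stdev

CRITICAL_5_PERCENT = 0.752  # approximate critical value for the adjusted A^2 at alpha = 0.05


def anderson_darling_normal(data):
    """Adjusted Anderson-Darling statistic A*^2 for normality.

    The mean and standard deviation are estimated from the data, and the
    statistic is adjusted by (1 + 0.75/n + 2.25/n^2).
    """
    x = sorted(data)
    n = len(x)
    dist = NormalDist(mean(x), stdev(x))
    eps = 1e-12  # keeps log() finite for extreme points
    cdf = [min(max(dist.cdf(v), eps), 1 - eps) for v in x]
    s = sum((2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1 - cdf[n - i]))
            for i in range(1, n + 1))
    a2 = -n - s / n
    return a2 * (1 + 0.75 / n + 2.25 / n ** 2)


def looks_normal(data):
    """True if the adjusted A^2 is below the approximate 5% critical value."""
    return anderson_darling_normal(data) < CRITICAL_5_PERCENT


# Hypothetical cycle times in minutes; one long delay skews the sample
cycle_times = [4.1, 3.9, 4.3, 4.0, 4.2, 3.8, 4.1, 9.5]
print(round(anderson_darling_normal(cycle_times), 3), looks_normal(cycle_times))
```

<p>A large statistic is a warning that an analysis assuming normality may not be reliable for these data.</p>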
9. <span style="color:#FF0000;">When Possible, <s>Replicate</s> Verify Success!</span>
<p><span style="lineheight: 1.6;">In science, replication of a study, ideally by another, independent scientist, is crucial. It indicates that the first researcher's findings weren't a fluke, and provides more evidence in support of the given hypothesis. Similarly, when a quality project results in great improvements, we can't take it for granted that those benefits are going to be sustained; they need to be verified and confirmed over time. Control charts are probably the most common tool for making sure a project's benefits endure, but depending on the process and the nature of the improvements, hypothesis tests, capability analysis, and other methods also can come into play. </span></p>
10. <span style="color:#FF0000;"><s>Make Your Analysis Reproducible</s> Share How You Did It</span>
<p>In the original 10 Simple Rules article, the authors suggest scientists share their data and explain how they analyzed it so that others can make sure they get the same results. This idea doesn't translate so neatly to the business world, where your data may be proprietary or private for other reasons. But just as science benefits from transparency, the quality profession benefits when we share as much information as we can about our successes. <span style="lineheight: 20.8px;">Of course you can't share your company's secret-sauce formulas with competitors</span><span style="lineheight: 20.8px;">, but </span><span style="lineheight: 1.6;">if you solved a quality challenge in your organization, chances are your experience could help someone facing a similar problem. If a peer in another organization already solved a problem like the one you're struggling with now, wouldn't you like to see if a similar approach might work for you? Organizations like <a href="http://asq.org/index.aspx" target="_blank">ASQ</a> and forums like <a href="https://www.isixsigma.com/" target="_blank">iSixSigma.com</a> help quality practitioners network and share their successes so we can all get better at what we do. And here at Minitab, we love sharing <a href="http://www.minitab.com/company/casestudies/">case studies and examples of how people have solved problems using data analysis</a>, too. </span></p>
<p>How do you think these rules apply to the world of quality and business decision-making? What are <em>your </em>guidelines when it comes to analyzing data? </p>
Data Analysis
Lean Six Sigma
Quality Improvement
Six Sigma
Statistics
Statistics Help
Statistics in the News
Stats
Wed, 29 Jun 2016 12:00:00 +0000
http://blog.minitab.com/blog/understandingstatistics/those10simplerulesforusingstatisticstheyrenotjustforresearch
Eston Martz

Using the Nelson Rules for Control Charts in Minitab
http://blog.minitab.com/blog/statisticsinthefield/usingthenelsonrulesforcontrolchartsinminitab
<p><em style="lineheight: 1.6;">by Matthew Barsalou, guest blogger</em></p>
<p>Control charts plot your process data to identify and distinguish between common cause and special cause variation. This is important, because identifying the different causes of variation lets you take action to make improvements in your process without <em>over</em>controlling it.</p>
<p>When you create a control chart, the software you're using should make it easy to see where you may have variation that requires your attention. For example, Minitab Statistical Software automatically flags any control chart data point that is more than three standard deviations above or below the centerline, like the point in the I chart below.</p>
<div style="marginleft:40px; width:577; fontsize:11px;"><img alt="I Chart of Data  Nelson Rules" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/be7c11b66ec1be76d0ae6cf7ab43f3e4/image002.png" style="lineheight: 20.8px; borderwidth: 0px; borderstyle: solid; width: 576px; height: 384px;" /><br />
I chart example with one out-of-control point.</div>
<p>A data point that is more than three standard deviations from the centerline is one indicator of special-cause variation in a process. Dr. Lloyd S. Nelson introduced additional control chart rules in his October 1984 <em>Journal of Quality Technology </em><a href="http://asq.org/data/subscriptions/jqt_open/1984/oct/jqtv16i4technical.pdf" target="_blank">column</a>. The eight Nelson Rules are shown below; if you're interested in using them, you can activate them in Minitab.</p>
<div style="marginleft: 40px; width: 550px; fontsize: 11px;"><img alt="Nelson Rules for special cause variation in control charts" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/e7bd87833a33a442c6d171932e6e7480/image003.png" style="width: 545px; height: 691px; borderwidth: 1px; borderstyle: solid;" /><br />
The Nelson rules for tests of special causes. Reprinted with permission from <em>Journal of Quality Technology</em> ©<strong><em>1984</em> ASQ</strong>, asq.org.</div>
<p>To activate the Nelson rules, go to <strong>Stat > Control Charts > Variables Charts for Individuals > Individuals... </strong>and then click "I Chart Options." On the <strong>Tests </strong>tab, check the box next to each test you want to apply, or use the drop-down menu and select “Perform all tests for special causes,” as shown below.</p>
<p style="marginleft: 40px;"><img alt="Individual Charts Options in Minitab" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/31bbbbe82ef49054f70eb2f474da9a34/image005.png" style="borderwidth: 0px; borderstyle: solid; width: 456px; height: 453px;" /></p>
<p>The resulting session window explains which tests failed.</p>
<p style="marginleft: 40px;"><img alt="session window output" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/f0e44dccacaf2851ea8c590d0291cd6e/image006.png" style="borderwidth: 0px; borderstyle: solid; width: 626px; height: 351px;" /></p>
<p>On the chart itself, the data points that failed each test are identified in red as shown below.</p>
<p style="marginleft: 40px;"><img alt="I chart of data" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/4832d48fb67f7085cdc38af84e6c2658/image009.png" style="lineheight: 1.6; borderwidth: 0px; borderstyle: solid; width: 576px; height: 384px;" /></p>
<p>Simply activating all of the rules is not recommended: the false positive rate goes up as each additional rule is activated. At some point the control chart becomes more sensitive than it needs to be, and corrective actions for <a href="http://blog.minitab.com/blog/understandingstatistics/controlchartsshowyouvariationthatmatters">special causes of variation</a> may be implemented when only common-cause variation is present.</p>
<p>Fortunately, Nelson also provided detailed guidance on applying his namesake rules correctly. His comments on the tests for special causes are reproduced below.</p>
<div style="marginleft:40px; width:685px; fontsize:11px;">
<p><img alt="comments on test for special causes" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/f6a2616d41592b9c671ac60cb12c98c1/image010.png" style="lineheight: 1.6; borderwidth: 1px; borderstyle: solid; width: 630px; height: 683px;" /><br />
Comments on tests for special causes. Reprinted with permission from <em>Journal of Quality Technology</em> ©<strong><em>1984</em> ASQ</strong>, asq.org.</p>
</div>
<p>Nelson’s tenth comment is an especially important one, regardless of which tests have been activated. </p>
<p>Minitab, together with the Nelson rules, can be very helpful, but neither can replace or remove the need for the analyst's judgment when assessing a control chart. These rules can, however, assist the analyst in making the proper decision. </p>
<p><strong>About the Guest Blogger</strong></p>
<p><em><a href="https://www.linkedin.com/pub/matthewbarsalou/5b/539/198" target="_blank">Matthew Barsalou</a> is a statistical problem resolution Master Black Belt at <a href="http://www.3kwarner.de/" target="_blank">BorgWarner</a> Turbo Systems Engineering GmbH. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with emphasis in international business, and a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is the author of the books <a href="http://www.amazon.com/RootCauseAnalysisStepStep/dp/148225879X/ref=sr_1_1?ie=UTF8&qid=1416937278&sr=81&keywords=Root+Cause+Analysis%3A+A+StepByStep+Guide+to+Using+the+Right+Tool+at+the+Right+Time" target="_blank">Root Cause Analysis: A Step-by-Step Guide to Using the Right Tool at the Right Time</a>, <a href="http://asq.org/qualitypress/displayitem/index.html?item=H1472" target="_blank">Statistics for Six Sigma Black Belts</a>, and <a href="http://asq.org/qualitypress/displayitem/index.html?item=H1473&xvl=76115763" target="_blank">The ASQ Pocket Guide to Statistics for Six Sigma Black Belts</a>.</em></p>
Data Analysis
Lean Six Sigma
Quality Improvement
Six Sigma
Statistics
Mon, 27 Jun 2016 13:57:00 +0000
http://blog.minitab.com/blog/statisticsinthefield/usingthenelsonrulesforcontrolchartsinminitab
Guest Blogger

How to Identify Outliers (and Get Rid of Them)
http://blog.minitab.com/blog/michelleparet/howtoidentifyoutliersandgetridofthem
<p><img alt="an outlier among falcon tubes" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/389155acc918fbd094941685c31a33b8/falcontubes.jpg" style="width: 250px; height: 188px; margin: 10px 15px; borderwidth: 1px; borderstyle: solid; float: right;" />An outlier is an observation in a data set that lies a substantial distance from other observations. These unusual observations can have a disproportionate effect on statistical results, <a href="http://blog.minitab.com/blog/michelleparet/usingthemeanitsnotalwaysaslamdunk">such as the mean</a>, which can lead to misleading conclusions. Outliers can provide useful information about your data or process, so it's important to investigate them. Of course, you have to find them first. </p>
<p>Finding outliers in a data set is easy <span style="lineheight: 20.8px;">using </span><a href="http://www.minitab.com/products/minitab/" style="lineheight: 20.8px;">Minitab Statistical Software</a><span style="lineheight: 1.6;">, and there are a few ways to go about it. </span></p>
Finding Outliers in a Graph
<p><span style="lineheight: 1.6;">If you want to identify them graphically and visualize where your outliers are located compared to the rest of your data, you can use </span><strong style="lineheight: 1.6;">Graph > Boxplot</strong><span style="lineheight: 1.6;">.</span></p>
<p style="marginleft: 40px;"><img alt="Boxplot" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/6060c2dbf5d9449babe268eade74814a/Image/652993cfab104ddfe0076fd52ab0f5fd/boxplot_of_strength.jpg" style="width: 600px; height: 394px;" /></p>
<p>This boxplot shows a few outliers, each marked with an asterisk. Boxplots are certainly one of the most common ways to visually identify outliers, but there are <a href="http://blog.minitab.com/blog/funwithstatistics/visualizingthegreatestolympicoutlierofalltime">other graphs, such as scatterplots and individual value plots</a>, to consider as well.</p>
Finding Outliers in a Worksheet
<p>To highlight outliers directly in the worksheet, you can right-click on your column of data and choose <strong>Conditional Formatting > Statistical > Outlier</strong>. Each outlier in your worksheet will then be highlighted in red, or whatever color you choose.</p>
<p style="marginleft: 40px;"><img alt="Conditional Formatting Menu in Minitab" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/6060c2dbf5d9449babe268eade74814a/Image/a898ff565350c40dcbad28bf6d878f82/conditionalformattingmenu.jpg" style="width: 549px; height: 145px;" /></p>
Removing Outliers
<p>If you then want to create a new data set that excludes these outliers, that’s easy to do too. I’m not suggesting that outliers should be removed without thoughtful consideration. After all, they may have a story – perhaps a very important story – to tell. However, for those situations where removing outliers is worthwhile, you can first highlight outliers per the Conditional Formatting steps above, then right-click on the column again and use <strong>Subset Worksheet > Exclude Rows with Formatted Cells</strong> to create the new data set.</p>
The Math
<p>If you want to know the mathematics used to identify outliers, let's begin by talking about quartiles, which divide a data set into quarters:</p>
<ul>
<li><em>Q</em>1 (the 1st quartile): 25% of the data are <em>less than</em> or equal to this value</li>
<li><em>Q</em>3 (the 3rd quartile): 25% of the data are <em>greater than</em> or equal to this value</li>
<li>IQR (the interquartile range): the difference <em>Q</em>3 – <em>Q</em>1, which spans the middle 50% of the data</li>
</ul>
<p>Outliers are then defined as any values that fall below</p>
<p style="marginleft: 40px;"><em>Q</em>1 – (1.5 * IQR)</p>
<p style="marginleft: 40px;">or above</p>
<p style="marginleft: 40px;"><em>Q</em>3 + (1.5 * IQR)</p>
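<p>The same rule can be written in a few lines of Python. This sketch uses <code>statistics.quantiles</code> with the "exclusive" method, which interpolates at the (n + 1)p positions; we believe this matches the quartile convention Minitab documents, but other software may compute quartiles slightly differently, which can move the fences. The data values are hypothetical.</p>

```python
from statistics import quantiles


def iqr_fences(data):
    """Lower and upper outlier fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def split_outliers(data):
    """Return (kept, outliers) using the 1.5*IQR rule."""
    low, high = iqr_fences(data)
    kept = [x for x in data if low <= x <= high]
    outliers = [x for x in data if x < low or x > high]
    return kept, outliers


strength = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical values
print(iqr_fences(strength))      # (-5.5, 16.5)
print(split_outliers(strength))  # ([1, 2, 3, 4, 5, 6, 7, 8, 9], [100])
```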
<p><span style="lineheight: 1.6;">Of course, rather than doing this by hand, you can leave the heavy lifting to Minitab and instead focus on what your data are telling you.</span></p>
<p>Don't see these features in your version of Minitab? Choose <strong>Help > Check for Updates </strong>to see if you're using Minitab 17.3.</p>
Data Analysis
Learning
Statistics
Statistics Help
Stats
Wed, 22 Jun 2016 15:00:00 +0000
http://blog.minitab.com/blog/michelleparet/howtoidentifyoutliersandgetridofthem
Michelle Paret

Using Multivariate Statistical Tools to Analyze Customer and Survey Data
http://blog.minitab.com/blog/applyingstatisticsinqualityprojects/usingmultivariatestatisticaltoolstoanalyzecustomerandsurveydata
<p>Businesses are getting more and more data from existing and potential customers: whenever we click on a web site, for example, it can be recorded in the vendor's database. And whenever we use electronic ID cards to access public transportation or other services, our movements across the city may be analyzed.</p>
<p>In the very near future, connected objects such as cars and electrical appliances will continuously generate data that will provide useful insights regarding user preferences, personal habits, and more. Companies will learn a lot from users and the way their products are being used. This learning process will help them focus on particular niches and improve their products according to customer expectations and profiles.</p>
<p>For example, insurance companies will monitor how motorists are driving connected cars, to adjust insurance premiums according to perceived risks, or to analyze driving behaviors so they can advise motorists how to boost fuel efficiency. No formal survey will be needed, because customers will be continuously surveyed.</p>
<p>Let's look at some statistical tools we can use to create and analyze user profiles, map expectations, study which expectations are related, and so on. I will focus on multivariate tools, which are very efficient methods for analyzing surveys that involve a large number of variables. My objective is to provide a high-level, general overview of the statistical tools that may be used to analyze such survey data.</p>
A Simple Example of Multivariate Analysis
<p>Let us start with a very simple example. The table below presents data some customers have shared about how much they enjoy specific types of food:</p>
<p style="marginleft: 40px;"><img height="134" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/31b80fb2db664edfa75374d4c9804ab8/File/386a610e5bb77c8aa7a5adf8ba5adf03/386a610e5bb77c8aa7a5adf8ba5adf03.png" width="532" /></p>
<p>Simply looking at the table doesn't make the preferences easy to understand. Instead, we can use Simple Correspondence Analysis, a multivariate statistical tool, to display them visually.</p>
<p>In Minitab, go to <strong>Stat > Multivariate > Simple Correspondence Analysis...</strong> and enter your data as shown in the dialogue box below. (Also click on "Graphs" and check the box labeled "Symmetric plot showing rows and columns.")</p>
<p style="marginleft: 40px;"><img height="347" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/31b80fb2db664edfa75374d4c9804ab8/File/3d7fdd8b981b91398acfbffc1d02f1e4/3d7fdd8b981b91398acfbffc1d02f1e4.png" width="451" /></p>
<p>Minitab creates the following plot: </p>
<p style="marginleft: 40px;"><img height="384" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/31b80fb2db664edfa75374d4c9804ab8/File/9e75de185fc35b03062c8f87492d3246/9e75de185fc35b03062c8f87492d3246.png" width="576" /></p>
<p>Looking at the plot, we quickly see that Vegetables are positioned close to “Disagree,” so the two are associated, and Ice cream sits close to “Neutral.” As for Meat and Potatoes, the panel tends to “Agree” or “Strongly agree.”</p>
<p>We now have a much better understanding of the preferences of our panel, because we know what they tend to like and dislike.</p>
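<p>For readers curious about what happens behind the dialog box, here is a minimal NumPy sketch of simple correspondence analysis: it takes the singular value decomposition of the table's standardized residuals and returns principal coordinates for rows and columns, which is what a symmetric plot displays. It assumes NumPy is installed, and the counts are hypothetical (the post's table is shown only as an image). Axis signs from an SVD are arbitrary, so a plot of these coordinates may be a mirror image of Minitab's.</p>

```python
import numpy as np


def correspondence_analysis(table):
    """Simple correspondence analysis via SVD of standardized residuals.

    Returns row and column principal coordinates (first two dimensions)
    and the principal inertias (squared singular values).
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sv) / np.sqrt(r)[:, None]
    cols = (Vt.T * sv) / np.sqrt(c)[:, None]
    return rows[:, :2], cols[:, :2], sv ** 2


# Hypothetical counts: rows are foods; columns are Strongly disagree,
# Disagree, Neutral, Agree, Strongly agree
foods = ["Vegetables", "Ice cream", "Meat", "Potatoes"]
counts = [
    [10, 25, 8, 5, 2],
    [3, 6, 30, 8, 3],
    [2, 4, 6, 20, 18],
    [1, 5, 7, 22, 15],
]
row_xy, col_xy, inertias = correspondence_analysis(counts)
for name, (x, y) in zip(foods, row_xy):
    print(f"{name:<12} {x:7.3f} {y:7.3f}")
```

<p>The principal inertias add up to the table's chi-square statistic divided by its grand total, which is why points far from the origin represent categories that depart most from independence.</p>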
Selecting the Right Type of Tool to Analyze Survey Data
<p>Many multivariate tools are available, so how can you choose the right one to analyze your survey data?</p>
<p>The decision tree below shows which method you might choose according to your objectives and the <a href="http://blog.minitab.com/blog/understandingstatistics/understandingqualitativequantitativeattributediscreteandcontinuousdatatypes">type of data you have</a>. For example, we selected correspondence analysis in the<span style="lineheight: 1.6;"> </span><span style="lineheight: 20.8px;">previous</span><span style="lineheight: 20.8px;"> </span><span style="lineheight: 1.6;">example because all our variables were categorical, or qualitative in nature.</span></p>
<p style="marginleft: 40px;"><img alt="multivariate diagram 1" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/b6150beff1fbc04623fcccdadc0faac4/multivariate_1.gif" style="lineheight: 20.8px; width: 624px; height: 464px;" /></p>
Categorical Data and Prediction of Group Membership (Right Branch)
<p><strong>Clustering</strong><br />
If you have some numerical (or continuous) data and you want to understand how your customers might be grouped / aggregated (from a statistical point of view) into several homogeneous groups, you can use clustering techniques. This could be helpful to define profiles and user groups.</p>
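<p>k-means is one widely used clustering technique. The Python sketch below is a plain, illustrative version, not Minitab's implementation: it chooses starting centers deterministically by farthest-point selection (an assumption made here so results are repeatable), then alternates between assigning each customer to the nearest center and moving each center to the mean of its members. The customer data are hypothetical; with variables on very different scales, you would normally standardize them first.</p>

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain k-means clustering of numeric tuples.

    Starting centers: the first point, then repeatedly the point farthest
    from its nearest existing center (farthest-point initialization).
    """
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centers


# Hypothetical customers: (monthly spend, visits per month)
customers = [(20, 2), (22, 3), (19, 2), (80, 10), (85, 12), (78, 11)]
labels, centers = kmeans(customers, 2)
print(labels)
print(centers)
```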
<p><strong>Discriminant Analysis or Logistic Regression (Scoring)</strong><br />
If your individuals already belong to different groups and you want to understand which variables are important to define an existing user group, or predict group membership for new individuals, you can use discriminant analysis, or binary logistic regression (if you only have two groups).</p>
<p><strong>Correspondence Analysis </strong><br />
<span style="lineheight: 1.6;">As we saw in the first example, correspondence analysis lets us study relationships between variables that are categorical / qualitative.</span></p>
Numeric or Continuous Data Analysis (Left Branch)
<p><strong>Principal Component Analysis or Factor Analysis</strong><br />
<span style="lineheight: 1.6;">If all your variables are numeric, you can use principal components analysis to understand how the variables are related to one another. Factor analysis may be useful to identify an underlying, unobserved factor associated with your variables.</span></p>
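<p>As an illustration of what principal components analysis computes, here is a minimal NumPy sketch that extracts components from the correlation matrix, which puts all variables on an equal footing regardless of their units. It assumes NumPy is installed; the ratings are hypothetical, and this is a sketch rather than Minitab's exact procedure.</p>

```python
import numpy as np


def pca_correlation(X):
    """Principal components of the correlation matrix.

    Returns eigenvalues (largest first), loadings (eigenvectors as columns),
    and component scores of the standardized data.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each variable
    R = np.corrcoef(X, rowvar=False)                   # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # eigh: R is symmetric
    order = np.argsort(eigvals)[::-1]                  # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals, eigvecs, Z @ eigvecs


# Hypothetical 1-5 ratings from six respondents on three survey items
ratings = [
    [4, 5, 2],
    [3, 4, 3],
    [5, 5, 1],
    [2, 2, 4],
    [1, 2, 5],
    [4, 4, 2],
]
eigvals, loadings, scores = pca_correlation(ratings)
print("Proportion of variance:", np.round(eigvals / eigvals.sum(), 3))
print("Loadings on the first component:", np.round(loadings[:, 0], 3))
```

<p>When one component explains most of the variance, as with these strongly related items, the loadings show which variables move together and in which direction.</p>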
<p><strong>Item Analysis</strong><br />
This tool was specifically created for survey analysis. Do the items of a survey evaluate similar characteristics? Which items differ from the remaining questions? The objective is to assess the internal consistency of a survey. </p>
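<p>Item analysis in Minitab reports Cronbach's alpha, a standard measure of internal consistency. The Python sketch below computes alpha from its usual formula, k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The survey answers are hypothetical.</p>

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])                         # number of items
    items = list(zip(*responses))                 # one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical answers from five respondents to three related questions
answers = [
    [3, 4, 3],
    [5, 5, 4],
    [2, 3, 2],
    [4, 4, 5],
    [1, 2, 1],
]
print(round(cronbach_alpha(answers), 3))
```

<p>Values close to 1 suggest the items measure the same underlying characteristic; recomputing alpha with one item left out is a common way to spot the question that differs from the rest.</p>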
<p>These multivariate analyses <em>are </em>computationally intensive, but performing them in Minitab is very user-friendly, and the software produces easy-to-understand graphs (as in the food preference example above).</p>
A Closer Look at Some Specific Multivariate Tools
<p>Let's take a closer look at the tools for numerical survey data analysis. The graph below shows the tools that are available to you and their objectives in each case. These methods are often used to group numeric variables according to similarity; they may also be useful in studying how individuals are positioned according to the main groups of variables, in order to identify user profiles.</p>
<p style="marginleft: 40px;"> <img alt="multivariate diagram 2" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/283b5721982a2c167236a120799ede2c/multivariate_2.gif" style="width: 624px; height: 438px;" /></p>
<p>And now let's look a bit more closely at the tools we can use for analyzing categorical survey data. Again, the diagram below shows the tools that are available to you and their objectives. Many of these tools can be used to study how numeric variables relate to qualitative categories.</p>
<p style="marginleft: 40px;"><img height="430" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/31b80fb2db664edfa75374d4c9804ab8/File/58cd6feb678c0f5d2232c304b0173391/58cd6feb678c0f5d2232c304b0173391.png" width="624" /></p>
Conclusion
<p>This is a very general overview of multivariate tools for survey analysis. If you want to go deeper and learn more about these techniques, you can find some resources on the <a href="http://support.minitab.com/minitab/17/topiclibrary/modelingstatistics/multivariate/basics/multivariateanalysesinminitab/">Minitab web site</a>, in the Help menu in Minitab's statistical software, or you can contact <a href="http://www.minitab.com/support/">our technical support team</a>. </p>
Data Analysis
Insights
Learning
Statistics
Statistics Help
Stats
Wed, 15 Jun 2016 12:00:00 +0000
http://blog.minitab.com/blog/applyingstatisticsinqualityprojects/usingmultivariatestatisticaltoolstoanalyzecustomerandsurveydata
Bruno Scibilia

Poisson Data: Examining the Number of Deaths in an Episode of Game of Thrones
http://blog.minitab.com/blog/thestatisticsgame/poissondataexaminingthenumberdeathsinanepisodeofgameofthrones
<p><img alt="Game of Thrones" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/fe2c58f624104b6fb687d378929b1f9b/Image/d11b4341996f340e24132eb12253d8e5/game_of_thrones.jpg" style="float: right; width: 250px; height: 141px; margin: 10px 15px; borderwidth: 1px; borderstyle: solid;" />There may not be a situation more perilous than being a character on <a href="http://www.hbo.com/gameofthrones" target="_blank"><em>Game of Thrones</em></a>. Warden of the North, Hand of the King, and apparent protagonist of the entire series? Off with your head before the end of the first season! Last male heir of a royal bloodline? Here, have a pot of molten gold poured on your head! Invited to a wedding? Well, you probably know what happens at weddings in the show. </p>
<p>So what do all these gruesome deaths have to do with statistics? They are data that come from a <a href="http://blog.minitab.com/blog/funwithstatistics/poissonprocessesandprobabilityofpoop">Poisson distribution</a>.</p>
<p>Data from a Poisson distribution describe the number of times an event occurs in a finite observation space. For example, a Poisson distribution can describe the number of defects in the mechanical system of an airplane, the number of calls to a call center, or, in our case, the number of deaths in an episode of <em>Game of Thrones</em>.</p>
Goodness-of-Fit Test for Poisson
<p>If you're not certain whether your data follow a Poisson distribution, you can use <a href="http://www.minitab.com/enus/products/minitab/" target="_blank">Minitab Statistical Software</a> to perform a goodness-of-fit test. If you don't already use Minitab and you'd like to follow along with this analysis, download the <a href="http://www.minitab.com/products/minitab/freetrial/">free 30-day trial</a>.</p>
<p>I collected the <a href="http://genius.com/Gameofthroneslistofgameofthronesdeathsannotated" target="_blank">number of deaths for each episode</a> of <em>Game of Thrones</em> (as of this writing, 57 episodes have aired), and put them in a Minitab worksheet. Then I went to <strong>Stat > Basic Statistics > Goodness-of-Fit Test for Poisson </strong>to determine whether the data follow a Poisson distribution. You can get the data I used <a href="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/f73acb13fa520a25583149f8b780a31c/game_of_thrones_deaths.mtw">here</a>. </p>
<p style="marginleft: 40px;"><img alt="GoodnessofFit Test for Poisson Distribution " src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/fe2c58f624104b6fb687d378929b1f9b/Image/0c9dcb9ecb6eb644109d86e3501143b3/gof_test_poisson.jpg" style="width: 492px; height: 417px;" /></p>
<p>Before we interpret the p-value, we see that we have a problem: three of the categories have an expected count less than 5. If the expected count for any category is less than 5, the results of the test may not be valid. To fix this, we can combine categories to achieve the minimum expected count. In fact, Minitab has already started doing this by combining all episodes with 7 or more deaths into one category.</p>
<p>So we'll continue by making the highest category "6 or more deaths" and the lowest category "0 or 1 deaths." To do this, I created a new column with the categories 1, 2, 3, 4, 5, and 6. Then I made a frequency column that contained the number of occurrences for each category. For example, the "1" category combines episodes with 0 deaths and episodes with 1 death, so it has 15 occurrences. Then I ran the analysis again with the new categories.</p>
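The pooling step can be sketched outside Minitab: compute each category's expected count, n * P(X = k), from a Poisson distribution with the estimated mean, then merge categories at both ends until the end categories reach an expected count of 5. The observed counts below are made up (not the real episode data), the 3.2 mean is the rate reported later in this post, and the function names are hypothetical. With these inputs the sketch arrives at "0 or 1" and "6 or more" as its end categories.

```python
import math

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu), computed in log space."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def expected_counts(n, mu, top):
    """Expected counts for 0, 1, ..., top-1 events plus a final 'top or more' category."""
    probs = [poisson_pmf(k, mu) for k in range(top)]
    probs.append(1.0 - sum(probs))            # upper-tail probability
    return [n * p for p in probs]

def pool_tails(observed, expected, min_expected=5.0):
    """Merge end categories until the lowest and highest have expected counts >= min_expected.

    Labels are (low, high) event counts; high=None marks the open-ended top category.
    Only the tails are pooled, which suits a unimodal Poisson distribution.
    """
    labels = [(k, k) for k in range(len(observed) - 1)] + [(len(observed) - 1, None)]
    obs, exp_c = list(observed), list(expected)
    while len(exp_c) > 1 and exp_c[0] < min_expected:      # pool upward from the bottom
        labels[0:2] = [(labels[0][0], labels[1][1])]
        obs[0:2] = [obs[0] + obs[1]]
        exp_c[0:2] = [exp_c[0] + exp_c[1]]
    while len(exp_c) > 1 and exp_c[-1] < min_expected:     # pool downward from the top
        labels[-2:] = [(labels[-2][0], labels[-1][1])]
        obs[-2:] = [obs[-2] + obs[-1]]
        exp_c[-2:] = [exp_c[-2] + exp_c[-1]]
    return labels, obs, exp_c

# Made-up episode counts for 0, 1, ..., 7 and "8 or more" deaths (not the real data)
observed = [3, 12, 10, 12, 8, 5, 4, 2, 1]
n_episodes = sum(observed)
mean_deaths = 3.2                 # in practice, estimated from the data
expected = expected_counts(n_episodes, mean_deaths, top=len(observed) - 1)
pooled_labels, pooled_obs, pooled_exp = pool_tails(observed, expected)
for label, o, e in zip(pooled_labels, pooled_obs, pooled_exp):
    print(label, o, round(e, 2))
```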
<p style="marginleft: 40px;"><img alt="GoodnessofFit Test for Poisson Distribution " src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/93551e38ce5c4cc5321c249fee184e24/gof_test_poisson_2.jpg" style="width: 420px; height: 323px;" /></p>
<p>Now that all of our categories have expected counts of at least 5, we can examine the p-value. If the p-value is less than the significance level (0.05 usually works well), you can conclude that the data do not follow a Poisson distribution. In this case, however, the p-value is 0.228, which is greater than 0.05. Therefore, we cannot conclude that the data do not follow a Poisson distribution, and we can continue with analyses that assume they do.</p>
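The test itself is a Pearson chi-square comparison of observed and expected counts across the pooled categories. The sketch below assumes one degree of freedom is lost for the estimated Poisson mean (so df = categories - 2) and computes the p-value with a power series for the regularized incomplete gamma function, to stay within the standard library. The counts in the example are hypothetical, not the <em>Game of Thrones</em> results.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df).

    Uses the power series for the regularized lower incomplete gamma function,
    which is adequate for the moderate statistics seen in goodness-of-fit tests.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def poisson_gof(observed, expected, n_estimated=1):
    """Pearson chi-square goodness-of-fit test on already-pooled categories.

    n_estimated: parameters estimated from the data (here the Poisson mean),
    each of which costs one degree of freedom.
    """
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - n_estimated
    return stat, df, chi2_sf(stat, df)

# Hypothetical pooled counts for six categories (not the Game of Thrones data)
stat, df, p = poisson_gof([15, 11, 13, 9, 5, 4], [9.8, 11.9, 12.7, 10.2, 6.5, 5.9])
print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.3f}")
print("conclude not Poisson" if p < 0.05 else "cannot conclude the data are not Poisson")
```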
Confidence Interval for 1-Sample Poisson Rate
<p>When you have data that come from a Poisson distribution, you can use <strong>Stat > Basic Statistics > 1-Sample Poisson Rate</strong> to estimate the rate of occurrence and calculate a range of values that is likely to include the population rate of occurrence. We'll perform the analysis on our data.</p>
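An exact interval for a Poisson rate can be found by inverting the Poisson CDF. The lower limit is the mean at which seeing the observed total or more has probability alpha/2, and the upper limit is the mean at which seeing the total or fewer has probability alpha/2; dividing by the total length of observation gives a rate. The sketch below uses this classic exact (Garwood) method, which we assume is close to what Minitab reports. The `length` parameter plays the role of the length of observation discussed next. The function names and the total of 180 deaths are hypothetical.

```python
import math

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu); terms are built in log space to avoid overflow."""
    if k < 0:
        return 0.0
    if mu == 0:
        return 1.0
    return sum(math.exp(i * math.log(mu) - mu - math.lgamma(i + 1)) for i in range(k + 1))

def bisect_root(f, lo, hi, iters=100):
    """Root of a monotone function f that changes sign on [lo, hi]."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

def poisson_rate_ci(total_count, n_obs, length=1.0, confidence=0.95):
    """Rate and exact confidence interval for a Poisson rate.

    total_count: events summed over all observations (e.g., deaths in all episodes)
    n_obs:       number of observations (e.g., episodes)
    length:      length of observation per observation (0.1 = one tenth of a season)
    """
    alpha = 1 - confidence
    exposure = n_obs * length
    search_hi = total_count + 20 * math.sqrt(total_count + 1) + 20
    if total_count == 0:
        mu_low = 0.0
    else:
        # Mean at which P(X >= total_count) = alpha / 2
        mu_low = bisect_root(lambda mu: 1 - poisson_cdf(total_count - 1, mu) - alpha / 2,
                             0.0, search_hi)
    # Mean at which P(X <= total_count) = alpha / 2
    mu_high = bisect_root(lambda mu: poisson_cdf(total_count, mu) - alpha / 2,
                          0.0, search_hi)
    return total_count / exposure, mu_low / exposure, mu_high / exposure

# Hypothetical total: 180 deaths over 57 episodes
rate, low, high = poisson_rate_ci(180, 57)
print(f"{rate:.2f} deaths per episode, 95% CI {low:.2f} to {high:.2f}")
```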
<p style="marginleft: 40px;"><img alt="1Sample Poisson Rate" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/fe2c58f624104b6fb687d378929b1f9b/Image/259b9b0cb11fed7e5b7467703f7037ad/1_poisson_rate.jpg" style="width: 489px; height: 133px;" /></p>
<p>The rate of occurrence tells us that on average there are about 3.2 deaths per episode on <em>Game of Thrones</em>. If our 57 episodes were a sample from a much larger population of <em>Game of Thrones</em> episodes, the confidence interval would tell us that we can be 95% confident that the population rate of deaths per episode is between 2.8 and 3.7.</p>
<p>The length of observation lets you specify a value to represent the rate of occurrence in a more useful form. For example, suppose instead of deaths per episode, you want to determine the number of deaths per season. There are 10 episodes per season. So because an individual episode represents 1/10 of a season, 0.1 is the value we will use for the length of observation. </p>
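In symbols, assuming the rate is the total count divided by the total length of observation, and writing x_1, ..., x_57 for the deaths in each episode, setting the length of observation to 0.1 simply rescales the per-episode rate:

```latex
\hat{\lambda}_{\text{season}}
  = \frac{\sum_{i=1}^{57} x_i}{57 \times 0.1}
  = 10 \cdot \frac{\sum_{i=1}^{57} x_i}{57}
  = 10\,\hat{\lambda}_{\text{episode}}
  \approx 10 \times 3.2 = 32
```

The interval endpoints scale by the same factor of 10, so 2.8 to 3.7 deaths per episode becomes 28 to 37 deaths per season.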
<p style="marginleft: 40px;"><img alt="1Sample Poisson Rate" src="https://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/fe2c58f624104b6fb687d378929b1f9b/Image/b6fa9d2e740aacc86d4223ea75487d95/1_poisson_rate_season.jpg" style="width: 495px; height: 106px;" /></p>
<p>With a different length of observation, we see that there are about 32 deaths per season with a confidence interval ranging from 28 to 37.</p>
Poisson Regression
<p>The last thing we'll do with our Poisson data is perform a regression analysis. In Minitab, go to <strong>Stat > Regression > Poisson Regression > Fit Poisson Model</strong> to perform a Poisson regression analysis. We'll look at whether we can use the episode number (1 through 10) to predict how many deaths there will be in that episode.</p>
<p style="marginleft: 40px;"><img alt="Poisson Regression" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/fe2c58f624104b6fb687d378929b1f9b/Image/0540d6716d13c4de50421155038b2c03/poisson_regression.jpg" style="width: 402px; height: 238px;" /></p>
<p>The first thing we'll look at is the p-value for the predictor (episode). The p-value is 0.042, which is less than 0.05, so we can conclude that there is a statistically significant association between the episode number and the number of deaths. However, the Deviance R-Squared value is only 18.14%, which means that the episode number explains only 18.14% of the variation in the number of deaths per episode. So while an association exists, it's not very strong. Even so, we can use the coefficients to determine how the episode number affects the number of deaths. </p>
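When the only predictor is a categorical episode number, the fitted value for every episode is just the mean death count for its episode number, which makes the deviance R-squared easy to reproduce. The sketch below assumes the usual definition, 1 minus the ratio of the model's Poisson deviance to that of an intercept-only model. The function names and death counts are hypothetical.

```python
import math
from statistics import fmean

def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum(y*ln(y/mu) - (y - mu)), where y*ln(y/mu) is 0 when y == 0."""
    return 2 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                   for yi, mi in zip(y, mu))

def deviance_r_squared(groups):
    """Deviance R-squared for a Poisson model with one categorical predictor.

    groups: dict mapping category (e.g., episode number) -> list of counts.
    Each observation's fitted value is its category's mean count; the null
    (intercept-only) model fits the overall mean. Undefined if all counts are equal.
    """
    y, fitted = [], []
    for counts in groups.values():
        m = fmean(counts)
        y.extend(counts)
        fitted.extend([m] * len(counts))
    null_fit = [fmean(y)] * len(y)
    return 1 - poisson_deviance(y, fitted) / poisson_deviance(y, null_fit)

# Hypothetical deaths for three episode numbers across three seasons
example = {1: [1, 2, 3], 2: [4, 6, 5], 3: [2, 3, 7]}
print(f"Deviance R-sq = {deviance_r_squared(example):.2%}")
```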
<p style="marginleft: 40px;"><img alt="Poisson Regression" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/fe2c58f624104b6fb687d378929b1f9b/Image/adb7514fd7892c3b8591895321c96918/poisson_regression_2.jpg" style="width: 241px; height: 227px;" /></p>
<p>The episode number was entered as a categorical variable, so the coefficients show how each episode number affects the number of deaths relative to episode number 1. A positive coefficient indicates that the episode is likely to have more deaths than episode 1; a negative coefficient indicates that it is likely to have fewer deaths than episode 1.</p>
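With a log link and indicator coding for a single categorical predictor, the maximum-likelihood fit reproduces each level's mean count, so the coefficient for episode number j works out to ln(mean deaths in episode j / mean deaths in episode 1). Exponentiating a coefficient gives a rate ratio, which is why the sign tells you "more" or "fewer" deaths than episode 1. The sketch below is hypothetical in its names and data, and assumes episode 1 is the reference level, as described above.

```python
import math
from statistics import fmean

def categorical_poisson_coefficients(groups, reference):
    """Coefficients of a Poisson model (log link) with one categorical predictor.

    With indicator coding, the maximum-likelihood fit reproduces each level's mean
    count, so level j's coefficient is ln(mean_j / mean_reference). A level whose
    mean count is 0 has no finite coefficient.
    """
    ref_mean = fmean(groups[reference])
    return {level: math.log(fmean(counts) / ref_mean)
            for level, counts in groups.items() if level != reference}

# Hypothetical deaths per episode, grouped by episode number within the season
deaths = {1: [2, 4], 2: [6, 6], 3: [1, 2]}
coefs = categorical_poisson_coefficients(deaths, reference=1)
rate_ratios = {level: math.exp(c) for level, c in coefs.items()}
print(coefs, rate_ratios)
```

Here episode 2 averages twice the deaths of episode 1 (positive coefficient, rate ratio 2), and episode 3 averages half as many (negative coefficient).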
<p>We see that each season usually starts slowly, as 7 of the 9 episode numbers have positive coefficients. Episodes 8, 9, and 10 have the highest coefficients, meaning that relative to the first episode of the season they have the greatest number of deaths. So even though our model won't be great at predicting the exact number of deaths for each episode, it's clear that the show ends each season with a bang.</p>
<p>And considering episode 8 of the current season airs this Sunday, if you're a <em>Game of Thrones</em> viewer you should brace yourself, because death is coming. Or, as they would say in Essos:</p>
<p><em>Valar morghulis.</em></p>
Data Analysis
Fun Statistics
Statistics
Statistics in the News
Fri, 10 Jun 2016 12:03:00 +0000
http://blog.minitab.com/blog/thestatisticsgame/poissondataexaminingthenumberdeathsinanepisodeofgameofthrones
Kevin Rudy

A Six Sigma Healthcare Project, part 4: Predicting Patient Participation with Binary Logistic ...
http://blog.minitab.com/blog/understandingstatistics/asixsigmahealthcareproject%2Cpart4%3Apredictingpatientparticipationwithbinarylogisticregression
<p>By looking at the data we have about 500 cardiac patients, we've learned that easy access to the hospital and good transportation are key factors influencing participation in a rehabilitation program.</p>
<p><span style="lineheight: 20.8px;"><img alt="monitor" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/96f126b828fb05a099854d278cfba6eb/monitor.jpg" style="margin: 10px 15px; float: right; width: 296px; height: 212px;" />Past data shows that each month, about 15 of the patients discharged after cardiac surgery do not have a car. Providing transportation to the hospital might make these patients more likely to join the rehabilitation program, but the costs of such a service </span><span style="lineheight: 1.6;">can't exceed the potential revenue from participation. </span></p>
<p><span style="lineheight: 1.6;">We can use </span><a href="http://blog.minitab.com/blog/understandingstatistics/asixsigmahealthcareprojectpart3creatingbinarylogisticregressionmodelsforpatientparticipation" style="lineheight: 1.6;">the binary logistic regression model developed in part 3</a><span style="lineheight: 1.6;"> to predict probabilities of participation, to identify where </span><span style="lineheight: 20.8px;">transportation assistance</span><span style="lineheight: 1.6;"> might make the biggest impact, and to develop an estimate of how much we could invest in such assistance. </span></p>
<p>Download the <a href="//cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/521a4efabc5f892eef403e8c4a354f9a/cardiacparticipationdata_1.mtw">data set</a> to follow along and try these analyses yourself. If you don't already have Minitab, you can <a href="http://www.minitab.com/products/minitab/freetrial/">download and use our statistical software free for 30 days</a>.</p>
Using the Regression Model to Predict Patient Participation
<p>We want to develop some estimates of the probability of participation based on whether or not a patient has access to transportation. The first step is to make some mesh data representing our population. In Minitab, go to <strong>Calc > Create Mesh Data...</strong>, and complete the dialog box as shown below. (The minimum and maximum values for Age and Distance are drawn directly from the descriptive statistics for the sample data we used to create our regression model.) </p>
<p style="marginleft: 40px;"><img alt="Make Mesh Data Dialog" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/b8437a0b42a63b84e2dcdd65281a3eef/make_mesh_data.png" style="width: 484px; height: 340px;" /></p>
<p>When you press OK, Minitab adds 2 new columns to the worksheet that contain the 200 different combinations of the levels of these factors. Now we'll add two additional columns, one representing patients who have access to a car, and one representing those who don't. Now our worksheet should include four columns of data as shown:</p>
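Mesh data is simply every combination of evenly spaced levels of each factor. The sketch below builds such a grid in plain Python; the Age and Distance ranges and the 20 x 10 split of the 200 combinations are hypothetical stand-ins for the dialog settings, and the function names are made up.

```python
from itertools import product

def levels(lo, hi, n):
    """n equally spaced values from lo to hi, inclusive."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

def make_mesh(age_range, n_age, dist_range, n_dist):
    """Every (Age, Distance) combination on a regular grid."""
    return list(product(levels(*age_range, n_age), levels(*dist_range, n_dist)))

# Hypothetical settings: 20 Age levels x 10 Distance levels = 200 combinations
mesh = make_mesh((30, 90), 20, (0, 100), 10)
# Constant Mobility columns, one per scenario, like the two columns added to the worksheet
mobility_car = ["Car"] * len(mesh)
mobility_nocar = ["NoCar"] * len(mesh)
print(len(mesh), mesh[0], mesh[-1])
```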
<p style="marginleft: 40px;"><img alt="mesh data in worksheet" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/68996051dec970430f44a13696661e02/mesh_data_worksheet.png" style="width: 255px; height: 191px;" /></p>
<p>Now we'll go to <strong>Stat > Regression > Binary Logistic Regression > Predict...</strong> Minitab remembers the last regression model that was run; to make sure it's the right one, click the "View Model..." button...</p>
<p style="marginleft: 40px;"><img alt="view model" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/1b1f96f9fcd549add0aca00a958137d1/view_model.png" style="width: 232px; height: 93px;" /></p>
<p>and confirm that the model displayed is the correct one.</p>
<p style="marginleft: 40px;"><img alt="view model" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/fb09621305a2dcc0c898c7ac90eaa79d/view_model.png" style="width: 600px; height: 267px;" /></p>
<p>Next, press the "Predict" button and complete the dialog box using the mesh variables we created, as shown. We can also press the "Storage" button to tell Minitab to store the Fits (the predicted probabilities) for each data point in the worksheet. Note that the column selected for the Mobility term is "Car," so all of these predictions will be based on the equation for patients who have access to a vehicle. </p>
<p style="marginleft: 40px;"><img alt="regression prediction dialog" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/a7d5fe724e64117e1a26d18e21203d5a/prediction_dialog_1.png" style="lineheight: 20.8px; width: 819px; height: 392px;" /></p>
<p>When you click <strong>OK</strong> through all dialogs, Minitab will add a column of data that shows the predicted probability of participation for patients, assuming they have a vehicle. </p>
<p>Now we'll create the predictions for individuals who don't have cars. Press <strong>CTRL+E</strong> to edit the previous dialog box. This time, for the M<span style="lineheight: 1.6;">obility column, select "NoCar."</span></p>
<p style="marginleft: 40px;"><img alt="no car" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/f7076a96b1cd37b6d8d38e7e7138631c/prediction_dialog_2.png" style="width: 307px; height: 68px;" /></p>
<p>When you press OK, Minitab recalculates the probabilities for the patients, this time using the equation that assumes they do not have a vehicle. The probabilities of participation for each data point are stored in two columns in the worksheet, which I've renamed PFITSCar and PFITSNo car. </p>
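Mechanically, the Predict step plugs each mesh point into the fitted logistic equation twice: once with Mobility set to Car and once with it set to NoCar. The sketch below shows that mechanic with made-up coefficients (they are not the part 3 model, whose coefficients are not reproduced here), including a hypothetical Distance*Mobility interaction so the gap between the two scenarios varies with distance. For simplicity, the function takes Age and Distance already standardized.

```python
import math

# Hypothetical coefficients on standardized Age and Distance, for illustration only;
# they are NOT the coefficients of the model fitted in part 3.
COEFS = {
    "const": -0.4,
    "age": -0.3,
    "distance": -1.2,
    "car": 0.9,            # Mobility = Car (NoCar is the reference level)
    "distance_car": 0.8,   # Distance*Mobility interaction
}

def p_participate(age_z, dist_z, has_car, b=COEFS):
    """Predicted probability of participation from a logistic model."""
    car = 1.0 if has_car else 0.0
    eta = (b["const"] + b["age"] * age_z + b["distance"] * dist_z
           + b["car"] * car + b["distance_car"] * dist_z * car)
    return 1.0 / (1.0 + math.exp(-eta))

# Score the same (standardized) points under both mobility scenarios
points = [(-1.0, -1.0), (0.0, 0.0), (1.0, 1.5)]
pfits_car = [p_participate(a, d, True) for a, d in points]
pfits_nocar = [p_participate(a, d, False) for a, d in points]
for pt, pc, pn in zip(points, pfits_car, pfits_nocar):
    print(pt, round(pc, 3), round(pn, 3), round(pc - pn, 3))
```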
<p style="marginleft: 40px;"><img alt="pfits" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/165b5cbca4f6edb53c10507d49d4438b/pfits_in_worksheet.png" style="width: 404px; height: 306px;" /></p>
Where Can Providing Transportation Make an Impact?
<p>Now we have estimated probabilities of participation for patients with the same age and distance characteristics, both with and without access to a vehicle. It would be helpful to visualize the differences in these probabilities to see where offering transportation might make the biggest impact in increasing participation rates.</p>
<p>First, we'll use Minitab's calculator to compute the difference in probabilities between having and not having a car. Go to <strong>Calc > Calculator...</strong> and complete the dialog as shown: </p>
<p style="marginleft: 40px;"><img alt="calculator" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/5e7a07e8f7f41d8569535006f9d5debd/calculator.png" style="width: 433px; height: 383px;" /></p>
<p>Now we have a column of data named "Car - NoCar" that contains the probability difference for patients with the same age and distance characteristics both with and without a vehicle. We can use that column to create a contour plot that offers additional insight into the relationships between the likelihood of participation in the rehabilitation program and a patient's age, distance, and mobility. Select <strong>Graph > Contour Plot...</strong> and complete the dialog as shown: </p>
<p style="marginleft: 40px;"><img alt="contour plot dialog box" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/648a12ddd172dee236c96cccc0c1d0bc/contour_plot_dialog.png" style="width: 531px; height: 344px;" /></p>
<p>Minitab produces this contour plot (we have edited the range of colors from the default):</p>
<p style="marginleft: 40px;"><img alt="contour plot" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/132dc089b5f7a9c2fb849f5427b1c927/contourplot.png" style="width: 576px; height: 384px;" /></p>
<p>From this plot we can see the patients for whom transportation assistance is likely to make the most impact. These are the patients whose age and distance characteristics fall within the dark red area, where access to a vehicle raises the probability of participation by more than 0.40 (40 percentage points).</p>
<p>The hospital <em>could </em>use this information to carefully target potential recipients of transportation assistance, but doing so would raise many ethical issues. Instead, the hospital will offer transportation assistance to any potential participant who needs it. The project team decides to calculate the average probability of participation for all patients without access to a vehicle.</p>
<p>To obtain that average, select <strong>Stat > Basic Statistics > Display Descriptive Statistics...</strong> in Minitab, and choose "PFITSNoCar" as the variable. Click on the "Statistics" button to make sure the Mean is among the descriptive statistics being calculated, and click OK. Minitab will display the descriptive statistics you've selected in the Session Window. </p>
<p style="marginleft: 40px;"><img alt="descriptive statistics" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/e14f812d91728a9e17b8ae2dde0d8f30/pfits_nocar_mean.png" style="width: 290px; height: 89px;" /></p>
<p>According to our binary logistic regression model, the average probability of participation for all patients without a car equals 0.1695, which we will round up to .17. Now we can easily calculate an estimated break-even point for ensuring transport for patients who need it. We have the following information on hand: </p>
<div style="marginleft:40px;">
<table>
<tr><td>Patients per month without a car</td><td>15</td></tr>
<tr><td>Average probability of participation without a car</td><td>.17</td></tr>
<tr><td>Average number of sessions per participant</td><td>29</td></tr>
<tr><td>Revenue per session</td><td>$23</td></tr>
</table>
</div>
<p>Based on these figures, a per-patient maximum for transportation can be calculated as:</p>
<p style="marginleft: 40px;">.17 probability of participation x 29 sessions x $23 per session = $113.39</p>
<p><span style="lineheight: 1.6;">Since about 15 discharged cardiac patients each month do not have a car, we can invest at most 15 x $113.39 = $1700.85/month in transportation assistance. </span></p>
<span style="lineheight: 1.6;">Implementing Transportation Assistance for Patient Participation</span>
<p>As described in the <a href="http://dx.doi.org/10.1080/08982112.2011.553761" target="_blank">article that inspired this series of posts</a>, the project team evaluated potential improvement options against this economic calculation and developed a process that brought together patients with cars and those without to carpool to sessions. A pilot test of the process proved successful, and most of the carless patients noted that they would not have participated in the rehabilitation program without the service. </p>
<p>After implementing the new carpool process, the project team revisited the key metrics they had considered at the start of the initiative: the number of patients enrolling in the program each month and the average number of sessions participants attended.</p>
<p>After implementing the carpool process, the average number of sessions attended remained constant at 29, but patient participation rose from 33 to 45 per month, exceeding the project goal of 36 patients per month. The additional revenue came to roughly $96,000 annually.</p>
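The annual figure can be reproduced from numbers in this series, assuming each of the extra patients attends the average of 29 sessions at $23 per session (that assumption is ours; the article's own calculation is not shown here):

```python
# Figures from the text above
participants_before = 33      # patients per month before the carpool process
participants_after = 45       # patients per month after
sessions_per_participant = 29
revenue_per_session = 23      # dollars

extra_patients = participants_after - participants_before
extra_per_month = extra_patients * sessions_per_participant * revenue_per_session
extra_per_year = extra_per_month * 12
print(extra_per_month, extra_per_year)
```

That works out to $8,004 per month, or $96,048 per year, consistent with the roughly $96,000 reported.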
Take-Away Lessons from This Project Study
<p>If you've read all four parts of this series, you may recall that at the start of the <span style="lineheight: 20.8px;"> </span><span style="lineheight: 20.8px;">Six Sigma</span><span style="lineheight: 20.8px;"> </span><span style="lineheight: 1.6;">project, several stakeholders believed that the problem of low participation could be addressed by creating a nicer brochure for the program, and by encouraging surgeons to tell their patients about it at an earlier point in their treatment. </span></p>
<p>None of those initial ideas wound up being implemented, but the project team succeeded in meeting the project goals by enacting improvements that were supported by their data analysis. For me, this is a core takeaway from this article. </p>
<p>As the authors note, "Often people’s ideas on processes are incorrect, but improvement actions based on these are still being implemented. These actions cause frustrated employees, may not be cost effective, and in the end do not solve the problem."</p>
<p>Thus, the article makes a compelling case for the value of applying data analysis to improve processes in healthcare. "<span style="lineheight: 1.6;">Even when a somewhat more advanced technique like logistic regression modeling is required," the authors write, "exploratory graphics such as boxplots and bar charts point the direction toward a valuable solution."</span></p>
Health Care Quality Improvement
Thu, 09 Jun 2016 12:00:00 +0000
http://blog.minitab.com/blog/understandingstatistics/asixsigmahealthcareproject%2Cpart4%3Apredictingpatientparticipationwithbinarylogisticregression
Eston Martz

A Six Sigma Healthcare Project, part 3: Creating a Binary Logistic Regression Model for Patient ...
http://blog.minitab.com/blog/understandingstatistics/asixsigmahealthcareprojectpart3creatingbinarylogisticregressionmodelsforpatientparticipation
<p>In part 2 of this series, we used graphs and tables to see <a href="http://blog.minitab.com/blog/understandingstatistics/asixsigmahealthcareprojectpart2visualizingtheimpactofindividualfactors">how individual factors affected rates of patient participation</a> in a cardiac rehabilitation program. This initial look at the data indicated that ease of access to the hospital was a very important contributor to patient participation.</p>
<p><img alt="physical therapy facility" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/cf2f4a8979304153c3ea8fd5210215e8/rehab_facility.jpg" style="margin: 10px 15px; float: right; width: 320px; height: 211px;" />Given this revelation, a bus or shuttle service for people who do not have cars might be a good way to increase participation, but only if such a service doesn't cost more than the amount of revenue generated by participation.</p>
<p>A good estimate of the probability that patients without a car will participate will enable us to calculate the break-even point for such a service. We can use regression to develop a statistical model that lets us do just that.</p>
<p>We have a binary response variable, because only two outcomes exist: a patient either participates in the rehabilitation program or does not. To model these kinds of responses, we need to use a statistical method called "Binary Logistic Regression." The name may sound intimidating, but the method is quite approachable, especially with a statistical software package like Minitab.</p>
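Binary logistic regression models the probability of participation as p = 1 / (1 + exp(-(b0 + b1*x1 + ...))) and chooses the coefficients by maximum likelihood. As an illustration of that idea (not Minitab's implementation), the sketch below fits the model with Newton-Raphson iterations in plain Python. The function names and the eight-patient data set are hypothetical, and the sketch assumes the data are not perfectly separated.

```python
import math

def solve(a, b):
    """Solve the small linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood coefficients [b0, b1, ...] for P(y=1) = 1 / (1 + exp(-(b0 + b1*x1 + ...))).

    Newton-Raphson: beta <- beta + (X'WX)^-1 X'(y - p), with W = diag(p * (1 - p)).
    Assumes the data are not perfectly separated (otherwise the estimates diverge).
    """
    rows = [[1.0] + list(x) for x in X]          # prepend the intercept column
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for r, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * r[i]
                for j in range(k):
                    hess[i][j] += w * r[i] * r[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Hypothetical data: Mobility (0 = no car, 1 = car) and Participation (0 = no, 1 = yes)
mobility = [[0.0], [0.0], [0.0], [0.0], [1.0], [1.0], [1.0], [1.0]]
participation = [1, 0, 0, 0, 1, 1, 1, 0]
b0, b1 = fit_logistic(mobility, participation)
print(f"intercept = {b0:.4f}, Mobility coefficient = {b1:.4f}")
```

With a single 0/1 predictor, the fit reproduces the observed proportions: the intercept is the log-odds of participation without a car, ln(1/3), and the Mobility coefficient is the difference in log-odds, ln 3 - ln(1/3) = 2 ln 3.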
<p>Download the <a href="//cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/521a4efabc5f892eef403e8c4a354f9a/cardiacparticipationdata_1.mtw">data set</a> to follow along and try these analyses yourself. If you don't already have Minitab, you can <a href="http://www.minitab.com/products/minitab/freetrial/">download and use our statistical software free for 30 days</a>.</p>
Using Stepwise Binary Logistic Regression to Obtain an Initial Model
<p>First, let's review our data. We know the gender, age, and distance from the hospital for 500 cardiac patients. We also know whether or not they have access to a vehicle ("Mobility") and whether or not they participated in the rehabilitation program after their surgery (coded so that 0 = no, and 1 = yes). </p>
<p style="marginleft: 40px;"><img alt="data" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/98b12f3c127d5370169e1eee44679577/data_snapshot_for_blr.png" style="width: 339px; height: 348px;" /></p>
<p>The process of developing a regression equation that can predict a response based on your data is called "Fitting a model." We'll do this in Minitab by selecting <strong>Stat > Regression > Binary Logistic Regression > Fit Binary Logistic Model...</strong> </p>
<p style="marginleft: 40px;"><img alt="Binary Logistic Regression menu" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/4d475b704b654470760bc8fb561d6547/binary_logistic_regression_menu.png" style="width: 633px; height: 367px;" /></p>
<p>In the dialog box, we need to select the appropriate columns of data for the response we want to predict, and the factors we wish to base the predictions on. In this case, our response variable is "Participation," and we're basing predictions on the continuous factors of "Age" and "Distance," along with the categorical factor "Mobility." </p>
<p style="marginleft: 40px;"><img alt="binary logistic regression dialog 1" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/2cefc4466635a596b45a8a7d58472486/binary_logistic_regression_dialog_1.png" style="width: 580px; height: 494px;" /></p>
<p>After selecting the factors, click on the "Model" button. This lets us tell Minitab whether we want to consider interactions and polynomial terms in addition to the main effects of each factor. Complete the Model dialog as shown below. To include the two-way interactions in the model, highlight all the items in the Predictors window, make sure that the “Interactions through order:” dropdown reads “2,” and press the Add button next to it:</p>
<p style="marginleft: 40px;"><img alt="Binary Logistic Regression Dialog 2" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/2687d8f135fa8f4ef8822d47e72e357a/binary_logistic_regression_model_dialog.png" style="width: 520px; height: 542px;" /></p>
<p>Click OK to return to the main dialog, then press the “Coding” button. In this subdialog, we can tell Minitab to automatically standardize the continuous predictors, Age and Distance. There are several reasons you might want to standardize the continuous predictors, and different ways of standardizing depending on your intent.</p>
<p>In this case, we’re going to standardize by subtracting the mean of the predictor from each row of the predictor column, then dividing the difference by the standard deviation of the predictor. This centers the predictors and also places them on a similar scale. This is helpful when a model contains highly correlated predictors and interaction terms, because standardizing helps reduce multicollinearity and improves the precision of the model’s estimated coefficients. To accomplish this, we just need to select that option from the dropdown as shown below:</p>
<p style="marginleft: 40px;"><img alt="Binary Logistic Regression  Coding" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/429d47a6cf7b5a9e63363229512691f2/binary_logistic_regression_coding_dialog.png" style="width: 519px; height: 560px;" /></p>
<p>After you click OK to return to the main dialog, press the "Stepwise" button. We use this subdialog to perform stepwise selection, a technique that automatically searches for a good model: Minitab evaluates a series of models by adding and removing terms, and recommends the one that appears to provide the best fit for the data set. You can have Minitab provide details about the combination of factors it evaluates at each "step," or just show the recommended model<span style="lineheight: 1.6;">.</span></p>
<p style="marginleft: 40px;"><span style="lineheight: 1.6;"><img alt="Binary Logistic Regression  stepwise" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/f9a8375d2b429a1b7bd475182b6b1461/binary_logistic_regression_dialog_3.png" style="width: 490px; height: 542px;" /> </span></p>
<p>Now click OK to close the Stepwise dialog, and OK again to run the analysis. The output in Minitab's Session window will include details about each potential model, followed by a summary or "deviance" table for the recommended model.</p>
<span style="lineheight: 1.6;">Assessing and Refining the Regression Model</span>
<p><span style="lineheight: 1.6;">Using software to perform stepwise regression is extremely helpful, but it's always important to check the recommended model to see if it can be refined further. In this case, all of the model terms are significant, and the deviance table's adjusted R<sup>2</sup> indicates that the model explains about 40 percent of the observed variation in the response data. </span></p>
<p style="marginleft: 40px;"><img alt="stepwise regression selected model" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/f6a95017c24e7ed0f315471d685a91c6/output_deviance_table.png" style="width: 520px; height: 289px;" /></p>
<p>We also want to look at the table of coded coefficients immediately below the summary. The final column of the table lists the VIFs, or variance inflation factors, for each term in the model. This is important because VIF values greater than 5–10 can indicate unstable coefficients that are difficult to interpret.</p>
<p>None of these terms have VIF values over 10<span style="lineheight: 1.6;">. </span></p>
<p style="marginleft: 40px;"><img alt="variance inflation factors (VIF)" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/bc95899d33a4511e92288e158693e39d/output_coded_coefficients.png" style="width: 307px; height: 182px;" /></p>
<p>Minitab also performs goodness-of-fit tests that assess how well the model predicts observed data. The first two tests, the deviance and Pearson chi-square tests, have high p-values, indicating that these tests do not support the conclusion that this model is a poor fit for the data. However, the low p-value for the Hosmer-Lemeshow test indicates that the model could be improved.</p>
<p style="marginleft: 40px;"><img alt="goodnessoffit tests" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/5bf6a28128eebbed608253c122c3daf6/output_goodness_of_fit_tests.png" style="width: 364px; height: 119px;" /></p>
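<p>The Hosmer-Lemeshow test works as follows. It sorts the observations by predicted probability and splits them into groups, commonly 10. It then compares observed and expected event counts in each group using a chi-square statistic with (groups &minus; 2) degrees of freedom. The Python sketch below is a textbook version of that procedure. To stay dependency-free, it uses the closed-form chi-square tail probability that exists for even degrees of freedom, so it assumes an even number of groups. Minitab's grouping and tie handling may differ, so small discrepancies from its output are possible.</p>

```python
import math

def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with positive, even degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires positive, even degrees of freedom")
    half = x / 2.0
    term = total = 1.0
    # Sum (x/2)^k / k! for k = 0 .. df/2 - 1, building each term from the previous one.
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, p-value) for 0/1 outcomes y and predicted probabilities p.

    Assumes every p is strictly between 0 and 1 and that len(y) >= groups.
    """
    n = len(y)
    if n < groups:
        raise ValueError("need at least one observation per group")
    # Sort observation indices by predicted probability (sorted() is stable for ties).
    order = sorted(range(n), key=lambda i: p[i])
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        # Combines the event and non-event cells of the usual 2-by-g table.
        stat += (observed - expected) ** 2 / (expected * (1.0 - expected / n_g))
    return stat, chi2_sf_even_df(stat, groups - 2)
```

<p>A low p-value, as in the first model, means that the observed counts in some groups stray further from the model's expectations than chance alone would easily explain.</p>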
<p>It may be that our model does not account for curvature in the data. We can ask Minitab to add polynomial terms, which model curvature in the relationship between the predictors and the response, to see whether they improve the model. Press CTRL+E to recall the binary logistic regression dialog box, then click the "Model" button. To add the polynomial terms, select Age and Distance in the Predictors window, make sure that "2" appears in the "Terms through order:" drop-down, and click "Add" to add those terms to the model. An order 2 polynomial term is the square of the predictor.</p>
<p style="marginleft: 40px;"><img alt="binary logistic regression dialog 4" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/d2cd01d9ac0a49a57885801204896885/model_dialog_adding_polynomial_terms.png" style="width: 520px; height: 542px;" /></p>
<p>You may have noticed that we did not select “Mobility” above. Why? That categorical variable is coded with 1s and 0s, and squaring those values leaves them unchanged (1<sup>2</sup> = 1 and 0<sup>2</sup> = 0). Its polynomial term would therefore be identical to the Mobility term that is already in the model.</p>
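<p>Here is a small Python sketch of what adding the squared terms does to each patient's row, and of why Mobility is left out. The patient records and the "Age*Age" naming are hypothetical illustrations, not the project's data. The sketch covers only the squared terms described above, and it does not attempt to reproduce everything Minitab's "Terms through order" option may add.</p>

```python
def add_square_terms(rows, names):
    """Return copies of rows with a squared term (e.g. 'Age*Age') for each named predictor."""
    expanded = []
    for row in rows:
        new_row = dict(row)  # copy so the original records are not modified
        for name in names:
            new_row[f"{name}*{name}"] = row[name] ** 2
        expanded.append(new_row)
    return expanded

# Hypothetical patients; values are illustrative only.
patients = [
    {"Age": 64, "Distance": 12.5, "Mobility": 1},
    {"Age": 71, "Distance": 30.0, "Mobility": 0},
    {"Age": 58, "Distance": 4.0, "Mobility": 1},
]

model_rows = add_square_terms(patients, ["Age", "Distance"])

# Squaring a 0/1 indicator changes nothing, so a Mobility*Mobility
# column would duplicate the existing Mobility column exactly.
mobility_duplicates = all(p["Mobility"] ** 2 == p["Mobility"] for p in patients)
```

<p>Because the duplicated column would carry no new information, including it would only make the model's terms perfectly collinear.</p>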
<p>Now press OK all the way out to have Minitab evaluate models that include the polynomial terms. Minitab generates the following output:</p>
<p style="marginleft: 40px;"><img alt="binary logistic regression final model" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/ad1c37024ed6de634bb5215f448b2227/model_with_polynomials_deviance_table.png" style="width: 483px; height: 311px;" /></p>
<p>So far, so good: all model terms are significant, and the adjusted R<sup>2</sup> indicates that the new model accounts for 51 percent of the observed variation in the response, compared to the initial model’s 40 percent.</p>
<p>However, the VIFs for Mobility and the Distance*Mobility interaction remain higher than desirable:</p>
<p style="marginleft: 40px;"><img alt="VIF" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/173c061a582e9c7e3d00ceba0c65c94d/binary_logistic_regression_model_2_coefficients.png" style="width: 349px; height: 184px;" /></p>
<p>Even so, the coefficients are acceptable, with no variance inflation factors above 10. These terms are moderately correlated, but probably not enough to make the regression results unreliable:</p>
<p style="marginleft: 40px;"><img alt="binary logistic regression model VIF" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/7935289a46fbd462c1f13d8c901fc94b/model_with_polynomials_coefficients.png" style="width: 313px; height: 188px;" /></p>
<p>The goodness-of-fit tests for this model also look good: none of the p-values is below 0.05, so these tests do not suggest that the model is a poor fit for the observed data.</p>
<p style="marginleft: 40px;"><img alt="final binary logistic regression model goodness-of-fit tests" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/6d188aa3bda808ab62df9f4ef08f692c/model_with_polynomials_goodness_of_fit_tests.png" style="width: 342px; height: 118px;" /></p>
The Binary Logistic Regression Equations
<p><span style="lineheight: 1.6;">This model seems like the best option for predicting the probability of patient participation in the program. Based on the available data, Minitab has calculated two regression equations, one that predicts the probability of attendance for people who have access to their own transportation and one for those who do not: </span></p>
<p style="marginleft: 40px;"><img alt="regression equations" src="http://cdn.app.compendium.com/uploads/user/458939f4fe084dbcb271efca0f5a2682/479b4fbdf8c040119409f4109cc4c745/Image/25c8ca3c2f4b195e446bf18b75fffee8/model_with_polynomials_regression_equations.png" style="width: 533px; height: 112px;" /></p>
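<p>Each equation gives a linear predictor Y', which is converted to a probability with P(1) = exp(Y') / (1 + exp(Y')). The Python sketch below shows that calculation. The coefficients are hypothetical placeholders, not the values in the equations above, and the term list (Age, Distance, and their squares) is an assumption. The sketch also assumes that Mobility = 1 means the patient has their own transportation. To use it with the real model, substitute the coefficients Minitab reports.</p>

```python
import math

def predicted_probability(coefficients, values):
    """Evaluate P(1) = exp(Y') / (1 + exp(Y')) for one logistic regression equation."""
    # Y' is the linear predictor: the constant plus each coefficient times its term's value.
    y_prime = coefficients["Constant"] + sum(
        coef * values[term] for term, coef in coefficients.items() if term != "Constant")
    # Two algebraically equivalent forms, chosen so math.exp never overflows.
    if y_prime >= 0:
        return 1.0 / (1.0 + math.exp(-y_prime))
    z = math.exp(y_prime)
    return z / (1.0 + z)

# HYPOTHETICAL coefficients, one equation per Mobility level (assumed 1 = own transportation).
EQUATIONS = {
    1: {"Constant": -1.2, "Age": 0.05, "Distance": -0.08,
        "Age*Age": -0.0004, "Distance*Distance": 0.0006},
    0: {"Constant": -2.0, "Age": 0.05, "Distance": -0.12,
        "Age*Age": -0.0004, "Distance*Distance": 0.0006},
}

def patient_values(age, distance):
    """Term values for one patient, including the squared terms."""
    return {"Age": age, "Distance": distance,
            "Age*Age": age ** 2, "Distance*Distance": distance ** 2}

p_with_transport = predicted_probability(EQUATIONS[1], patient_values(60, 10))
p_without_transport = predicted_probability(EQUATIONS[0], patient_values(60, 10))
```

<p>Keeping one coefficient dictionary per Mobility level mirrors the way Minitab presents a separate equation for each level of the categorical predictor.</p>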
<p><span style="lineheight: 1.6;">In the next post, we'll complete this process by <a href="http://blog.minitab.com/blog/understandingstatistics/asixsigmahealthcareproject%2Cpart4%3Apredictingpatientparticipationwithbinarylogisticregression">using this model to make predictions about the probability of participation</a> </span><span style="lineheight: 20.8px;">in the rehabilitation program </span><span style="lineheight: 1.6;">and deciding how much we can afford to invest in transportation to help more cardiac patients. </span></p>
Health Care Quality Improvement
Tue, 07 Jun 2016 12:00:00 +0000
http://blog.minitab.com/blog/understandingstatistics/asixsigmahealthcareprojectpart3creatingbinarylogisticregressionmodelsforpatientparticipation
Eston Martz