Data Analysis Software | MinitabMinitab:Data Analysis Software
http://blog.minitab.com/blog/data-analysis-software/rss
Mon, 01 Sep 2014 18:30:29 +0000
Analyzing NFL Ticket Prices: How Much Would You Pay to See the Green Bay Packers?
http://blog.minitab.com/blog/the-statistical-mentor/analyzing-nfl-ticket-prices3a-how-much-would-you-pay-to-see-the-green-bay-packers
<p><span style="line-height: 1.6;">The 2014-15 NFL season is only days away, and fans all over the country are planning their fall weekends accordingly. In this post, I'm going to use data analysis to answer some questions related to ticket prices, such as:</span></p>
<ul>
<li>Which team is the least/most expensive to watch at home? </li>
<li>Which team is the least/most expensive to watch on the road? </li>
<li>If you are thinking of a road trip, which stadiums offer the largest ticket discount for your team?<img alt="Football stadium crowd" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/5781b1d52907305361bd13535983580b/stadium.jpg" style="float: right; width: 350px; height: 269px; border-width: 1px; border-style: solid; margin: 10px 15px;" />
</li>
</ul>
<p>For dedicated fans, this is far from a trivial matter. As we'll see, fans of one team can get an average 48% discount on road-game tickets, while fans of two other teams will pay, on average, more than double the cost to see their team on the road.</p>
Gathering and Preparing NFL Ticket Price Data
<p>The data I'm analyzing comes from StubHub, an online ticket marketplace owned by eBay. The ESPN website summarizes the number of StubHub tickets available and the minimum StubHub price for each NFL game in 2014: <a href="http://espn.go.com/nfl/schedule/_/seasontype/2/week/1">http://espn.go.com/nfl/schedule/_/seasontype/2/week/1</a></p>
<p><img alt="snapshot of NFL data from ESPN.com" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0844140f9ada17966cf7a63eda771c6a/nfl_data.jpg" style="width: 600px; height: 384px;" /></p>
<p>I did a quick copy-and-paste from ESPN into Excel to put each variable nicely into a column, and then another copy-and-paste to move the data into Minitab <a href="http://www.minitab.com/products/minitab">Statistical Software</a> to prepare it for analysis. I used the Left() and Right() functions in <a href="http://blog.minitab.com/blog/understanding-statistics/three-ways-to-get-more-out-of-your-text-data"><strong>Calc > Calculator</strong></a> to extract the minimum ticket price, the first few letters of the away team name, and the first few letters of the home team name. (Since the summary on ESPN.com only shows the minimum price, the analysis below is based only on the minimum ticket price available for each game.)</p>
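<p>For readers who want to try the same kind of text extraction outside Minitab, here is a minimal Python sketch of Left()- and Right()-style helpers. The helper names and the sample cell values are assumptions made for illustration; the actual ESPN text is laid out differently and would need its own character counts.</p>

```python
def left(text: str, n: int) -> str:
    """Return the first n characters of text (Left()-style)."""
    return text[:n]


def right(text: str, n: int) -> str:
    """Return the last n characters of text (Right()-style)."""
    return text[-n:] if n > 0 else ""


# Hypothetical cell values, not copied from ESPN.
price_cell = "Tickets from $45"
away_team_cell = "Green Bay"

min_price = int(right(price_cell, 2))   # the last two characters hold the price here
away_abbrev = left(away_team_cell, 3)   # first three letters of the team name

print(min_price, away_abbrev)
```

<p>Fixed character counts only work when every cell has the same layout, so a three-digit price would need a wider Right() than a two-digit one.</p>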
Which Is the Most Expensive Team to See on the Road?
<p>The Bar Chart below shows that Green Bay is the most expensive road team to watch play with a 2014 average price of $145 per road game. This is noticeably higher than the other NFL teams. The next closest is San Francisco with an average price of $128 per road game. But catching a Jacksonville road game is a fraction of those costs, averaging $48. </p>
<p><img alt="Bar Chart of Average Minimum Price for Away Team 2014 NFL Season" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b7a4e021aea84b3bf1b37ee619afc93c/avg_min_price_away_team_2014_season.jpg" style="width: 586px; height: 390px;" /></p>
Which Is the Most Expensive Team to See at Home?
<p>The Bar Chart below shows that Chicago is the most expensive team to watch play on their home turf, with a 2014 average price of $175 per home game. Seattle is a close second with an average price of $171 per home game. Seeing Dallas or St. Louis in a home game is a fraction of those costs, averaging just $35. </p>
<p><img alt="Bar Chart of Average Minimum Price for Home Team 2014 NFL Season" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/27f5c44c660d02e27573ca4cd4366632/avg_min_price_home_team_2014_season.jpg" style="width: 580px; height: 387px;" /></p>
Is It Cheaper to See Your Favorite Team on the Road?
<p>Finally, I compared the average home game ticket price to the average road game ticket price for each NFL team.</p>
<p>The road team discount award goes to the Seattle Seahawks. You'll save, on average, 48% watching their games on the road. But if you're a fan of Dallas or Miami, you'll be financially better off watching your team at home—their average price increases more than 110% when they're on the road. One factor that drives this result is the popularity of Dallas and Miami across the country: the higher demand supports their higher road-game price. Also, Dallas' enormous home stadium (AT&T) offers cheap Party Pass seats (which aren't really seats at all, but rather a standing room section). </p>
<p><img alt="Is It Cheaper to See Your Favorite NFL Team on the Road? " src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8063637683d8b39c1fcc4083965e7428/cheaper_on_the_road.jpg" style="width: 582px; height: 387px;" /></p>
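<p>The home-versus-road comparison can be sketched as a simple percent change. This assumes the change is measured relative to the home average, which matches the way the results are worded but is not spelled out in the post. The home prices below come from the text; the road prices are hypothetical values chosen only to land near the quoted percentages.</p>

```python
def road_change_pct(home_avg: float, road_avg: float) -> float:
    """Percent change in average minimum price when the team plays away.

    Negative values are a road discount; positive values are a road premium.
    """
    return (road_avg - home_avg) / home_avg * 100.0


# Home averages are from the post; road averages are hypothetical.
seattle_like = road_change_pct(home_avg=171, road_avg=89)
dallas_like = road_change_pct(home_avg=35, road_avg=74)

print(round(seattle_like))  # about -48, a road discount
print(round(dallas_like))   # above 110, a road premium
```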
<p>One drawback of this analysis is that it doesn't take into account the opponent that each team faces. For example, Chicago may happen to be playing some very popular teams at home in 2014, which drives their home-game ticket prices up for this season.</p>
<p>In a future post, I'll discuss how to adjust for opponents and other variables such as game day and game time.</p>
Tue, 26 Aug 2014 12:00:00 +0000
Jim Colton
Use a Line Plot to Show a Summary Statistic Over Time
http://blog.minitab.com/blog/statistics-and-quality-improvement/use-a-line-plot-to-show-a-summary-statistic-over-time
<p><img alt="Terrorist Attacks, 2013, Concentration and Intensity" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/2ecbcca8429152afb73d991b4a532f5a/start_globalterrorismdatabase_2013terroristattacksconcentrationintensitymap_w640.png" style="width: 500px; height: 216px;" /></p>
<p>If you’re already a strong user of Minitab Statistical Software, then you’re probably familiar with <a href="https://blog.minitab.com/blog/starting-out-with-statistical-software/investigating-starfighters-with-bar-charts3a-function-of-a-variable">how to use bar charts to show means</a>, medians, sums, and other statistics. Bar charts are excellent tools, but they are traditionally used when you want all of your categorical variables to have different sections on the chart. When you want to plot statistics with groups that flow directly from one category to the next, look no further than Minitab’s <a href="http://www.minitab.com/en-us/Support/Tutorials/Minitab-s-Line-Plots/">line plots</a>. I particularly like line plots when I want to use time as a category, because I prefer the connected-line display to separated bars.</p>
<p>I like to illustrate Minitab with data about pleasant subjects: <a href="https://blog.minitab.com/blog/statistics-and-quality-improvement/practicing-data-analysis-get-some-fun-data-into-minitab-v1">poetry</a>, <a href="https://blog.minitab.com/blog/statistics-and-quality-improvement/gummi-bear-measurement-systems-analysis-msa-the-gage-randr-study">candy</a>, and maybe even <a href="https://blog.minitab.com/blog/statistics-and-quality-improvement/process-capability-statistics-cp-and-cpk-working-together">the volume of ethanol in E85 fuel</a>. Data that are about unpleasant subjects also exist, and we can learn from that data too. We’re fortunate to have both the <a href="http://cpost.uchicago.edu/">Chicago Project on Security and Terrorism</a> (CPOST) and the <a href="http://www.start.umd.edu/">National Consortium for the Study of Terrorism and Responses to Terrorism</a> (START) working hard to produce publicly-accessible databases with information about terrorism.</p>
<p>START has been sharing <a href="http://www.start.umd.edu/news/majority-2013-terrorist-attacks-occurred-just-few-countries">analyses of its 2013 data</a> recently. The new data prompted staff from the two institutions to engage in an interesting debate on the Washington Post’s website about whether the Global Terrorism Database (GTD) that START maintains “<a href="http://www.washingtonpost.com/blogs/monkey-cage/wp/2014/08/15/global-terrorism-data-show-that-the-reach-of-terrorism-is-expanding/">exaggerates a recent increase in terrorist activities</a>.” For today, I’m just going to use the GTD to demonstrate a nice line plot in Minitab, which will give a tiny bit of insight into what that debate is about.</p>
<p>When you <a href="http://www.start.umd.edu/gtd/contact/">download the GTD data</a>, you can open one file that has all of the data except for the year 1993. Incident-level data for 1993 was lost, so that year is not included, although you can get country-level totals for numbers of attacks and casualties from the <a href="http://www.start.umd.edu/gtd/downloads/Codebook.pdf">GTD Codebook</a>. Those who maintain the GTD <a href="http://www.start.umd.edu/gtd/using-gtd/">recommend</a> “users should note that differences in levels of attacks and casualties before and after January 1, 1998, before and after April 1, 2008, and before and after January 1, 2012 are at least partially explained by differences in data collection” (START, downloaded August 18th, 2014).</p>
<p>The GTD is great for detail. One of its columns records a 1 if an event was a suicide attack and a 0 if it was not, which makes it easy to sum that column to see the number of suicide attacks per year. Absent from the data is a column that references the changes in methodology, but we can easily add this column in Minitab. Without a methodology column, it’s easy to end up with the <a href="http://www.washingtonpost.com/blogs/monkey-cage/wp/2014/07/21/government-data-exaggerate-the-increase-in-terrorist-attacks/">recently-criticized</a> graph that started the debate between the staff at the two institutions. The graph shows all of the data in the GTD for <a href="http://warontherocks.com/2014/06/infographic-suicide-terrorism-past-and-present/">the number of suicide attacks for each year since 1970</a>. It looks a bit like this:</p>
<p><img alt="The number of suicide attacks increases dramatically in the past two years." src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/87aeffb3b5c3391d19070bc9fbc6f717/all_gtd_data.jpg" style="width: 576px; height: 384px;" /></p>
<p>The message of this graph is that the number of suicide attacks has never been higher. The criticism about the absence of the different methodologies seems fair. So how would we capture the different methodologies in Minitab? With a calculator formula, of course. Try this, if you’re following along:</p>
<ol>
<li>Choose <strong>Calc > Calculator</strong>.</li>
<li>In <strong>Store result in variable</strong>, enter <em>Methodology</em>.</li>
<li>In <strong>Expression</strong>, enter:</li>
</ol>
<p><em>if(iyear < 1998, 1, iyear < 2009, 2, iyear=2009 and imonth < 4, 2, iyear < 2012, 3, 4)</em></p>
<ol>
<li value="4">Click <strong>OK</strong>.</li>
</ol>
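<p>For readers more used to code than to calculator formulas, the expression above can be read as a chain of tests in which the first true condition decides the result. Here is a minimal Python sketch of that logic. The function name is an assumption; the thresholds are copied from the expression as written.</p>

```python
def methodology(iyear: int, imonth: int) -> int:
    """Mirror the nested if() expression: the first true condition wins."""
    if iyear < 1998:
        return 1
    if iyear < 2009:
        return 2
    if iyear == 2009 and imonth < 4:
        return 2
    if iyear < 2012:
        return 3
    return 4


# A few spot checks around the boundaries.
for year, month in [(1997, 12), (2009, 3), (2009, 4), (2012, 1)]:
    print(year, month, methodology(year, month))
```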
<p>Notice that because the GTD uses 3 separate columns to record the dates, I’ve used two conditions to identify the second methodology. With the new column, you can easily divide the data series trends according to the method for counting events. This is where the line plot comes in. The line plot is the easiest way in Minitab to plot a summary statistic with time as a category. You can try it this way:</p>
<ol>
<li>Choose <strong>Graph > Line Plot</strong>.</li>
<li>Select <strong>With Symbols</strong>, <strong>One Y</strong>. Click <strong>OK</strong>.</li>
<li>In <strong>Function</strong>, select <strong>Sum</strong>.</li>
<li>In <strong>Graph variables</strong>, enter <em>suicide</em>.</li>
<li>In <strong>Categorical variable for X-scale grouping</strong>, enter <em>iyear</em>.</li>
<li>In <strong>Categorical variable for legend grouping</strong>, enter <em>Methodology</em>.</li>
</ol>
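<p>Behind the scenes, the steps above amount to summing the suicide column within each combination of year and methodology. A rough Python sketch of that grouping follows. The rows are hypothetical and are not real GTD records; only the column names come from the post.</p>

```python
from collections import defaultdict

# Hypothetical rows: (iyear, Methodology, suicide flag).
rows = [
    (2009, 2, 1), (2009, 2, 0), (2009, 3, 1),
    (2009, 3, 1), (2012, 4, 1), (2012, 4, 0),
]


def sum_by_year_and_method(records):
    """Total the 0/1 suicide flags for each (iyear, Methodology) pair."""
    totals = defaultdict(int)
    for iyear, method, suicide in records:
        totals[(iyear, method)] += suicide
    return dict(totals)


totals = sum_by_year_and_method(rows)
for (iyear, method), count in sorted(totals.items()):
    print(iyear, method, count)
```

<p>A year that spans two methodologies produces two totals, one for each legend group.</p>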
<p>You’ll get a graph that looks a bit like this, though I already edited some labels.</p>
<p><img alt="The last two years, which are dramatically higher in number, have a new methodology." src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/22959d1fc87b72c4ed837c82ef3f1b7a/number_of_attacks_divided.jpg" style="width: 576px; height: 384px;" /></p>
<p>One interesting feature of this line plot is that there are two data points for 2009. Because we’re calling attention to the different methodologies, it’s important to consider that the first quarter and the last 3 quarters of 2009 use different methodologies. In this display, we can see the mixture of methodologies. The fact that the two highest points are from the newest methodology also lends some credence to the question of whether the numbers from 2012 and 2013 should be directly compared to numbers from earlier years. The amount of the increase due to better data collection is not clear.</p>
<p>Interestingly, a line plot that shows the proportion of suicide attacks out of all terrorist attacks presents a different picture about the increase related to the different methodologies. That’s what you get if you make a line plot of the means instead of the sums.</p>
<p><img alt="By proportion, the increase in suicide attacks in the last two years does not look as dramatic." src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/7cd0512a55b4876dac769278eba9d90c/proportion_of_attacks_divided.jpg" style="width: 576px; height: 384px;" /></p>
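<p>The reason the mean works as a proportion is that the suicide column holds only 0s and 1s, so its average is the share of rows equal to 1. A tiny Python check with made-up flags (not GTD data) illustrates the idea.</p>

```python
from statistics import fmean

# Hypothetical 0/1 flags for eight attacks in one year.
flags = [1, 0, 0, 0, 1, 0, 0, 0]

total = sum(flags)          # what the Sum function plots
proportion = fmean(flags)   # what the Mean function plots

print(total, proportion)
```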
<p>Considering which statistics to compute and how to interpret them in conjunction with one another is an important task for people doing data analysis. In the final installment of the series on the Washington Post’s website, GTD staff members note that they do not “rely solely on global aggregate percent change statistics when assessing trends.” The flexibility of the line plot to show different statistics can make the work of considering the data from different perspectives much easier.</p>
<p>We do like to have fun at the Minitab Blog, but we know that there’s serious data in the world too. Whether your application is <a href="http://www.minitab.com/en-us/Case-Studies/Bridgestone/">making tires that keep people safe on the road</a> or <a href="http://www.minitab.com/en-us/Case-Studies/Northern-Sydney-Central-Coast-Health-Service/">helping people recover from wounds</a>, our goal is to give you the best possible tools to make your process improvement efforts successful.</p>
Statistics in the News
Wed, 20 Aug 2014 15:48:18 +0000
Cody Steele
How Accurate are Fantasy Football Rankings? Part II
http://blog.minitab.com/blog/the-statistics-game/how-accurate-are-fantasy-football-rankings-part-ii
<p>Previously, we looked at how accurate fantasy football rankings were <a href="http://blog.minitab.com/blog/the-statistics-game/how-accurate-are-fantasy-football-rankings">for quarterbacks and tight ends</a>. We found out that rankings for quarterbacks were quite accurate, with most of the top-ranked quarterbacks in the preseason finishing in the top 5 at the end of the season. Tight end rankings had more variation, with 36% of the top 5 preseason tight ends (over the last 5 years) actually finishing outside the top 10!</p>
<p><img alt="Cheat Sheet" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/14edab962b5c1df587e395a75459439b/2014_fantasy_football_cheat_sheat.jpg" style="float: right; width: 275px; height: 157px;" />Now it’s time to move our attention to the running backs and wide receivers. Just like before, I went back through the previous 5 seasons and found ESPN’s preseason rankings. For each season I recorded where the top preseason players finished at the end of the season, and also where the top players at the end of the season were ranked before the season started.</p>
<p>With quarterbacks and tight ends, I only looked at the top 5 players. But since more running backs and receivers are drafted, I’ll look at the top 10 players. Now let's analyze the data using <a href="http://www.minitab.com/products/minitab/">Minitab Statistical Software</a>. </p>
How did the top-ranked preseason RBs and WRs finish the season ranked?
<p>Let’s start by looking at how the top-rated preseason players fared at the end of the season. I took the top 10 ranked preseason RBs and WRs for each season from 2009-2013 and recorded where they ranked to finish the season. </p>
<p><img alt="IVP" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/f2db50de83552939b23088e7cb196a7b/ivp_rbs_wrs_preseason_w640.jpeg" style="width: 640px; height: 427px;" /></p>
<p><img alt="Describe" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/85fe1626bed20238726f327e27bb78af/describe_presason_rb_wr_w640.jpeg" style="width: 640px; height: 109px;" /></p>
<p>At first glance, the individual plots show that the spread for running backs and wide receivers appears to be about the same. But the descriptive statistics tell a different story. The 3rd quartile value (Q3) is the most telling. 75% of preseason top 10 running backs finish in the top 18, while that number rises all the way to 28.75 for wide receivers! In fact, 32% of wide receivers ranked in the top 10 in the preseason finished the season outside the <em>top 20</em>, while the same was only true for 24% of running backs. Running backs do have the biggest outlier (when Ryan Grant had a season-ending injury in his first game of 2010 and finished as the 126th ranked running back), but injuries like that are random and impossible to predict. Overall, preseason ranks for running backs are more accurate than for wide receivers.</p>
How were the top-scoring RBs and WRs ranked in the preseason?
<p>Let’s shift our focus to later in the draft. How often can you draft a lower-ranked running back or wide receiver and still have them finish in the top 10?</p>
<p><img alt="IVP" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/6b08619fded42ff700cb050fa03f0033/ivp_wrs_rbs_postseason_w640.jpeg" style="width: 640px; height: 427px;" /></p>
<p><img alt="Describe" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/fe2c58f6-2410-4b6f-b687-d378929b1f9b/Image/c7bc6b2d68572eff76d8a22edfeda563/describe_postseason_rbs_wrs_w640.jpeg" style="width: 640px; height: 108px;" /></p>
<p>Wide receivers have had more players come out of nowhere to be top 10 scorers at the end of the season (Victor Cruz in 2011 and Brandon Lloyd and Stevie Johnson in 2010 were all ranked 87th or worse, yet finished in the top 10). But the descriptive statistics indicate a pretty even distribution otherwise. About half of the top 10 scoring RBs and WRs were <em>not</em> ranked in the top 10 to begin the season. And 25% of players were ranked outside the top 25, yet were still able to finish in the top 10. For both positions, there are frequently lower ranked players that exceed expectations and finish in the top 10.</p>
<p>But if you want one of the <em>best</em> players, say top 3...can you afford to wait or do you need to select a top ranked player early? The following table shows the 3 highest scoring players for each year, with their preseason rank in parentheses.</p>
<table>
<tr><th>Year</th><th>Top Scoring RB</th><th>2nd Highest Scoring RB</th><th>3rd Highest Scoring RB</th><th>Top Scoring WR</th><th>2nd Highest Scoring WR</th><th>3rd Highest Scoring WR</th></tr>
<tr><td>2013</td><td>Jamaal Charles (6)</td><td>LeSean McCoy (10)</td><td>Matt Forte (12)</td><td>Calvin Johnson (1)</td><td>Josh Gordon (42)</td><td>Demaryius Thomas (6)</td></tr>
<tr><td>2012</td><td>Adrian Peterson (10)</td><td>Arian Foster (1)</td><td>Doug Martin (27)</td><td>Calvin Johnson (1)</td><td>Brandon Marshall (12)</td><td>Dez Bryant (15)</td></tr>
<tr><td>2011</td><td>LeSean McCoy (6)</td><td>Ray Rice (5)</td><td>Arian Foster (4)</td><td>Calvin Johnson (5)</td><td>Wes Welker (22)</td><td>Victor Cruz (110)</td></tr>
<tr><td>2010</td><td>Arian Foster (23)</td><td>Adrian Peterson (2)</td><td>Peyton Hillis (63)</td><td>Dwayne Bowe (20)</td><td>Brandon Lloyd (123)</td><td>Greg Jennings (11)</td></tr>
<tr><td>2009</td><td>Chris Johnson (7)</td><td>Adrian Peterson (1)</td><td>Maurice Jones-Drew (3)</td><td>Andre Johnson (2)</td><td>Randy Moss (4)</td><td>Miles Austin (68)</td></tr>
</table>
<p>Since 2009, nine different receivers finished the season in the top 3 despite being ranked outside the preseason top 10. <em>That’s 60%</em>! And two of those players were ranked outside the top 100 in the preseason! But amongst all the inconsistency is Calvin Johnson. He’s the only wide receiver that is listed more than once. And he’s finished as the #1 ranked receiver 3 times in a row!</p>
<p>Meanwhile only 4 running backs (27%) were able to finish in the top 3 despite being ranked outside the preseason top 10. Right now in ESPN’s average draft position, the 10th running back is being drafted with the 19th overall pick. So before the 2nd round of the draft is even over, there is a good chance that the top 3 running backs have already been selected. Compare that to wide receivers, where the 10th receiver is being drafted with the 34th overall pick. So in the middle of the 4th round, a top 3 wide receiver (or even two) could still be on the board!</p>
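<p>The 60% and 27% figures can be checked directly from the preseason ranks in the table. Here is a short Python sketch that does the counting. The list and function names are assumptions; the ranks are copied from the table, 2013 down to 2009.</p>

```python
# Preseason ranks of each season's three highest-scoring players, from the table.
rb_ranks = [6, 10, 12, 10, 1, 27, 6, 5, 4, 23, 2, 63, 7, 1, 3]
wr_ranks = [1, 42, 6, 1, 12, 15, 5, 22, 110, 20, 123, 11, 2, 4, 68]


def outside_top(ranks, cutoff=10):
    """Return (count, rounded percent) of players ranked worse than cutoff."""
    count = sum(1 for rank in ranks if rank > cutoff)
    return count, round(100 * count / len(ranks))


print(outside_top(wr_ranks))  # (9, 60)
print(outside_top(rb_ranks))  # (4, 27)
```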
<p>You can definitely wait to draft a wide receiver. The same can’t be said of running backs.</p>
<p>So how should you use this information in your fantasy football draft?</p>
Focus on Running Backs Early
<p>It’s not that the running back you pick is guaranteed to have a great season, but we just saw that, on average, 10 running backs are being selected before the end of the 2nd round! After that, your chances of picking a top running back start to diminish. At least one of your first two picks should be a running back, if not both!</p>
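<p>To translate an overall pick number into a draft round, divide by the number of teams and round up. The sketch below assumes a 10-team league, which is the league size used in the example later in this post; the function name is hypothetical.</p>

```python
import math


def draft_round(overall_pick: int, teams: int = 10) -> tuple[int, int]:
    """Return (round, pick within that round) for an overall pick number."""
    round_number = math.ceil(overall_pick / teams)
    pick_in_round = (overall_pick - 1) % teams + 1
    return round_number, pick_in_round


print(draft_round(19))  # (2, 9): the 10th running back goes late in round 2
print(draft_round(34))  # (4, 4): the 10th wide receiver goes in round 4
```

<p>With a different league size, pass it as the second argument, for example <em>draft_round(19, 12)</em>.</p>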
<p>However, keep in mind that selecting RB/RB with your first two picks can be a high-variance strategy. Consider that last year, in a 10-team league you could have taken Jamaal Charles and Matt Forte with the 6th and 15th pick respectively. Those players finished as the #1 and #3 RB, and if you didn’t win your fantasy league you definitely made the playoffs. Of course, you could have just as easily picked C. J. Spiller and Stevan Ridley, who finished 31st and 26th. Unless you got really lucky with your later picks, you could say hello to the consolation bracket.</p>
<p>If you want to play it more conservatively, this data analysis pointed out a few other options. We know that quarterbacks are the most consistent position (Aaron Rodgers in 2013 aside), and this year Peyton Manning, Aaron Rodgers, and Drew Brees are the top 3 ranked quarterbacks. Spending an early pick on one of them should give you a consistent scorer who is much less likely to be a bust than an early running back.</p>
<p>Calvin Johnson and Jimmy Graham are also two very consistent players at two very inconsistent positions. Both players have finished in the top 3 at their position for the last 3 years (with Johnson finishing #1 all 3 years). You should feel just fine using your first two picks on one of these players and a running back. But use caution on selecting a different TE or WR with an early pick.</p>
Wait on Your Wide Receivers
<p>Wide receivers have the least accurate preseason rankings. Half of the preseason top 10 finish outside the top 12, and 25% finish <em>outside the top 28!</em> Because of this, there is value to be found later in the draft for wide receivers. Try to identify some wide receivers you like in later rounds, and focus your early picks on other positions.</p>
<p>This example is a bit extreme, but last year in a fantasy draft I spent 4 of my first 5 draft picks on running backs (with Jimmy Graham being the non-running back pick). I was able to do so because I was fine getting Eric Decker (preseason #20) and Antonio Brown (preseason #24) in the 6th and 7th rounds. They finished as the 8th and 6th ranked wide receivers. Obviously I got a little lucky that they were <em>that</em> <em>good</em>, but that’s kind of the point. I like to think of fantasy football picks as lottery tickets. You could hit the jackpot with some players, win a decent amount with others, and have some that are busts. After the first few rounds, wide receivers have a better chance of being winning lottery tickets than other positions.</p>
<p>Now, you don’t have to <em>completely</em> neglect the WR position before the 6th round like in the example above. Just know that you’re putting the odds in your favor by waiting to draft the bulk of your wide receivers.</p>
Who Needs a Backup QB?
<p>One last thing while we’re on the lottery ticket analogy. Let’s say you draft one of the top quarterbacks (Manning, Rodgers, or Brees). Don’t draft a backup quarterback! We already saw that quarterbacks have the most accurate preseason rankings. By the time you draft a backup, it’s unlikely that the lower-ranked player you choose will become a star that you will start each week or be able to use as trade bait. And on your QB’s bye week, you can easily pick somebody up off the waiver wire.</p>
<p>So why waste that pick on somebody with very little upside? Even if you’re picking in the 100s, there is still value to be had! Josh Gordon, Alshon Jeffery, Knowshon Moreno, and Julius Thomas were all ranked outside the preseason top 100 last year, and all turned into great fantasy players! </p>
<p>Want to take this idea to the (slightly crazy) extreme? If you have a late first round pick, try and use your first two picks on Jimmy Graham and one of Manning, Rodgers, or Brees. With your QB and TE position locked up, spend your next 12 picks on nothing but RBs and WRs. Then use your last two picks on a defense and kicker! I know this goes against the advice of focusing on running backs early, but I <em>did </em>say it was a slightly crazy and extreme strategy! If you can get lucky and find a winning lottery ticket with a lower-ranked running back or two (maybe Montee Ball, Ben Tate, Andre Ellington), it <em>could</em> even be a winning strategy. </p>
<p>If you decide to try that draft strategy, let me know how it goes! And whatever strategy you use, good luck with your 2014 fantasy football season!</p>
Fun Statistics
Fri, 15 Aug 2014 15:48:00 +0000
Kevin Rudy
Using the G-Chart Control Chart for Rare Events to Predict Borewell Accidents
http://blog.minitab.com/blog/statistics-in-the-field/using-the-g-chart-control-chart-for-rare-events-to-predict-borewell-accidents
<p><em>by Lion "Ari" Ondiappan Arivazhagan, guest blogger</em></p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/ac11ba7bc8daa85327ad905ba5dc5f96/borewell_screencap.jpg" style="margin: 10px 15px; width: 400px; height: 283px; float: right;" />In India, we've seen this story far too many times in recent years:</p>
<p>Timmanna Hatti, a six-year-old boy, was trapped in a 160-foot borewell for more than 5 days in Sulikeri village of Bagalkot district in Karnataka after falling into the well. Perhaps the most heartbreaking aspect of the situation was the decision of the Bagalkot district administration to stop the rescue operation because the digging work, if continued further, might lead to the collapse of the vertical wall created by the side of the borewell within which Timmanna had struggled for his life.</p>
<p><a href="http://timesofindia.indiatimes.com/city/mysore/8-days-on-boys-body-pulled-out/articleshow/40082590.cms?" target="_blank">Timmanna's body was retrieved from the well 8 days after he fell in</a>. Sadly, this is just one of an alarming number of borewell accidents, especially involving little children, across India in the recent past.</p>
<p>This most recent event prompted me to conduct a preliminary study of borewell accidents across India in the last 8-9 years.</p>
Using Data to Assess Borewell Accidents
<p>My main objective was to find out the possible causes of such accidents and to assess the likelihood of such adverse events based on the data available to date.</p>
<p>This very preliminary study has heightened my awareness of a lot of uncomfortable and dismaying factors involved in these deadly incidents, including the pathetic circumstances of many rural children and carelessness on the part of many borewell contractors and farmers.</p>
<p>In this post, I'll lead you through my analysis, which concludes with the use of a G-chart for the possible prediction of the next such adverse event, based on Geometric distribution probabilities.</p>
Collecting Data on Borewell Accidents
<p>My search of newspaper articles and Google provided details about a total of 34 borewell incidents since 2006. The actual number of incidents may be higher, since many incidents go unreported. The table below shows the total number of borewell cases reported each year between 2006 and 2014.</p>
<p><img alt="Borewell Accident Summary Data" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/9e60f3b9c08b0125a38b30d717e1acb8/borewell_g_chart_table_2.jpg" style="width: 189px; height: 289px;" /></p>
Summary Analysis of the Borewell Accident Data
<p>First, I used Minitab to create a histogram of the data I'd collected, shown below.</p>
<p>A quick review of the histogram reveals that out of 34 reported cases, the highest number of accidents occurred in the years 2007 and 2014.</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/23338847757384f399eb013afe81191f/borewell_histogram_of_accidents.jpg" style="width: 500px; height: 334px;" /></p>
<p>The ages of children trapped in the borewells ranged from 2 years to 9 years. More boys (21) than girls (13) were involved in these incidents.</p>
<p>What hurts most is that, in this modern India, more than 70% of the children did not survive the incident. They died either in the borewell itself or in the hospital after the rescue. Only about 20% of children (7 out of 34) have been rescued successfully. The ultimate status of 10% of the cases reported is not known.</p>
Pie Chart of Borewell Incidents by Indian State
<p>Analysis of a state-wise pie chart, shown below, indicates that Haryana, Gujarat, and Tamil Nadu top the list of the borewell accident states. These three states alone account for more than 50% of the borewell accidents since 2006.</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8466766e4788ea2d73b7d8672692be4d/borewell_pie_chart.jpg" style="width: 500px; height: 334px;" /></p>
Pareto Chart for Vital Causes of Borewell Accidents
<p>I used a <a href="http://blog.minitab.com/blog/michelle-paret/fast-food-and-identifying-the-vital-few">Pareto chart</a> to analyze the various causes of these borewell accidents, which revealed the top causes of these tragedies:</p>
<ol>
<li>Children accidentally falling into open borewell pits while playing in the fields.</li>
<li>Abandoned borewell pits not being properly closed / sealed.<br />
</li>
</ol>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/8012effc2a1aa662d5a276d487e55954/borewell_pareto_chart_w640.jpeg" style="width: 500px; height: 335px;" /></p>
Applying the Geometric Distribution to Rare Adverse Events
<p>There are many different types of control charts, but for rare events, we can use <a href="http://www.minitab.com/products/minitab">Minitab Statistical Software</a> and the G chart. Based on the geometric distribution, the G chart is designed specifically for monitoring rare events. In the geometric distribution, we count the number of opportunities before or until the defect (adverse event) occurs.</p>
<p>The figure below shows the geometric probability distribution of days between such rare events if the probability of the event is 0.01. As you can see, the odds of an event happening 50 or 100 days after the previous one are much higher than the odds of the next event happening 300 or 400 days later.</p>
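<p>As a rough illustration of the shape described above, the following Python sketch evaluates a geometric probability mass function with p = 0.01. It assumes the count is the number of event-free days before the event, so that P(X = k) = (1 − p)^k · p. That counting convention and the helper name geometric_pmf are assumptions made for this sketch, not something taken from Minitab.</p>

```python
def geometric_pmf(k, p):
    """Probability of exactly k event-free days before the next event."""
    return (1.0 - p) ** k * p


p = 0.01  # event probability used for the distribution plot in the text

# Compare short gaps (50, 100 days) with long gaps (300, 400 days).
for days in (50, 100, 300, 400):
    print(f"P(X = {days}) = {geometric_pmf(days, p):.5f}")
```

<p>The printed probabilities shrink steadily as the gap grows, which is the pattern the plot is meant to convey.</p>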
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/1587c05dd9a8d77bcda5be87bb2a748b/borewell_distribution_plot.jpg" style="width: 500px; height: 333px;" /></p>
<p>By using the geometric distribution to plot the number of <a href="http://www.minitab.com/support/tutorials/monitoring-rare-events-with-g-charts/">days between rare events</a>, such as borewell accidents, the G chart can reveal patterns or trends that might enable us to prevent such accidents in the future. In this case, we count the number of days between reported borewell accidents. One key assumption, when counting the number of days between events, is that the rate of accidents per day is fairly constant.</p>
A G-Chart for Prediction of the Next Borewell Accident
<p>Next, I used Minitab to create a G-chart for the analysis of the borewell accident data I collected, shown below.</p>
<p>Although the observations fall within the upper and lower control limits (UCL and LCL), the G chart shows a cluster of observations below the center line (the mean) after the 28th observation and before the 34th observation (the latest event). Overall, the chart indicates an unusually high rate of adverse events (borewell accidents) over the past decade.</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/7571156e97822d68efe18af3225902e5/borewell_g_chart_date_between_events.jpg" style="width: 500px; height: 332px; border-width: 1px; border-style: solid;" /></p>
<p>Descriptive statistics based on the Gaussian distribution for my data show 90.8 days as the mean "days between events." But the G-chart, based on geometric distribution, which is more apt for studying the distribution of adverse events, indicates a Mean (CL) of only 67.2 days as "days between events."</p>
Predicting Days Between Borewell Accidents with a Cumulative Probability Distribution
<p>I used Minitab to compute the cumulative distribution function for the data, using the geometric distribution with probability set at 0.01. This gives us some additional detail about how many incident-free days we're likely to have until the next borewell tragedy strikes:</p>
<p style="margin-left: 40px;"><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/77a56196f91723fca7f7e7222a815573/borewell_output.jpg" style="width: 290px; height: 640px;" /></p>
<p>Based on the above, we can reasonably predict when the next borewell accident is most likely to occur in any of the states included in the data, especially in the states of Haryana, Tamil Nadu, Gujarat, Rajasthan, and Karnataka.</p>
<p>The probabilities are shown below, with the assumption that the sample size and the Gage R&R / Measurement errors of event data reported and collected are adequate and within the allowable limits.</p>
<p><strong>Probability of next borewell event happening in...</strong></p>
<ul>
<li>31 days or less = 0.275020 = 27.5% appx.<br />
</li>
<li>104 days or less = 0.651907 = 65% appx.<br />
</li>
<li>181 days or less = 0.839452 = 84% appx.<br />
</li>
<li>488 days or less = 0.992661 = 99% appx.</li>
</ul>
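<p>The cumulative probabilities listed above can be checked with a few lines of Python. This is a minimal sketch that assumes the geometric distribution counts event-free days before the event, so that P(X ≤ k) = 1 − (1 − p)^(k + 1) with p = 0.01. That convention is an assumption, chosen here because it reproduces the listed values; the helper name geometric_cdf is hypothetical.</p>

```python
def geometric_cdf(k, p):
    """Probability that the next event occurs after at most k event-free days."""
    return 1.0 - (1.0 - p) ** (k + 1)


p = 0.01  # same event probability as in the text

for days in (31, 104, 181, 488):
    print(f"{days} days or less: {geometric_cdf(days, p):.6f}")
```

<p>Under that assumption, the printed values match the four probabilities in the list to six decimal places.</p>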
<p> </p>
<p>My purpose in preparing this study will be fulfilled if enough people take preventive action before the next such adverse event, which this analysis suggests is more than 80% likely to occur within the next 6 months. NGOs, government officials, and individuals all need to take preventive actions, such as sealing all open borewells across India and especially in the five states named above, to prevent many more innocent children from dying while playing.</p>
<p> </p>
<p><strong>About the Guest Blogger:</strong></p>
<p><em>Ondiappan "Ari" Arivazhagan is an honors graduate in civil / structural engineering from the University of Madras. He is a certified PMP, PMI-SP, PMI-RMP from the Project Management Institute. He is also a Master Black Belt in Lean Six Sigma and has studied Business Analytics at IIM, Bangalore. He has 30 years of professional global project management experience in various countries and has almost 14 years of teaching / training experience in project management and Lean Six Sigma. He is the Founder-CEO of International Institute of Project Management (IIPM), Chennai, and can be reached at <a href="mailto:askari@iipmchennai.com?subject=Minitab%20Blog%20Reader" target="_blank">askari@iipmchennai.com</a>.</em></p>
<p><em>An earlier version of this article was published on LinkedIn. </em></p>
Data Analysis, Statistics in the News
Tue, 19 Aug 2014 12:00:00 +0000
http://blog.minitab.com/blog/statistics-in-the-field/using-the-g-chart-control-chart-for-rare-events-to-predict-borewell-accidents
Guest Blogger
How Deadly Is this Ebola Outbreak?
http://blog.minitab.com/blog/the-statistical-mentor/how-deadly-is-this-ebola-outbreak
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/0b8e48b97aed2afbce026c9be82263d0/ebola_sign.png" style="border-width: 1px; border-style: solid; margin: 10px 15px; width: 350px; height: 212px; float: right;" />The current Ebola outbreak in Guinea, Liberia, and Sierra Leone is making headlines around the world, and rightfully so: it's a frightening disease, and last week the World Health Organization reported its spread is outpacing their response. Nearly 900 of the more than 1,600 people infected during this outbreak have died, including some leading medical professionals trying to stanch the outbreak's spread. And yesterday, one of the American doctors who contracted the disease arrived back in the U.S. for treatment.</p>
<p>Many sources state that Ebola virus outbreaks have a case fatality rate of up to 90%, but a look at the data shows that the death rate varies significantly with the Ebola species, case location, and year.</p>
Plotting Ebola Outbreaks Since 1976
<p>Infection with the Ebola virus causes a hemorrhagic fever. Symptoms most commonly appear 8 to 10 days after exposure, and include fever, headache, joint and muscle aches, and weakness. These symptoms quickly escalate to diarrhea, vomiting, stomach pain, lack of appetite, abnormal internal and external bleeding, and organ failure.</p>
<p>The disease first appeared in Africa in 1976, and since then sporadic outbreaks have occurred as indicated in graph 1, which depicts data from the World Health Organization web site. (You can download my Minitab project file, which includes all of the data used in this blog post, <a href="//cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/File/64eb13a3deb8e4b026e24bdefb846038/ebola2.MPJ">here</a>.)</p>
<p><img alt="ebola virus cases per year" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b4f9d7d91f5bde98f7a162ec21b74457/ebola_cases_by_year.png" style="width: 500px; height: 333px;" /></p>
<p>According to the Centers for Disease Control, of the five known species of the Ebola virus, only three have resulted in large outbreaks. The current outbreak is associated with the species Zaire ebolavirus (EBOV). The two other species that have been associated with large outbreaks are Bundibugyo ebolavirus (BDBV) and Sudan ebolavirus (SUDV).</p>
<p>Graphing the outbreak death rate over time can help us understand the impact of species, location, and year. But plotting raw outbreak death rates, as I did above, is not ideal due to the difference in case numbers (sample size) across outbreaks. Let's try a different approach.</p>
Assessing Ebola Outbreaks with Binary Logistic Regression
<p>Fitting a model which accounts for the different sample sizes and <em>then </em>plotting the model predictions over time is more appropriate than simply graphing the raw fatality numbers.</p>
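<p>To see why sample size matters here, consider the following minimal sketch. It uses two hypothetical outbreaks (the counts are invented for illustration, not taken from the WHO data) that share the same raw death rate, and compares the standard error of that rate under a simple binomial model.</p>

```python
import math


def rate_standard_error(deaths, cases):
    """Standard error of an observed death rate under a simple binomial model."""
    rate = deaths / cases
    return math.sqrt(rate * (1.0 - rate) / cases)


# Hypothetical outbreaks: same 50% raw death rate, very different sizes.
small = rate_standard_error(deaths=2, cases=4)
large = rate_standard_error(deaths=200, cases=400)

print(f"4 cases:   rate 0.50, standard error {small:.3f}")
print(f"400 cases: rate 0.50, standard error {large:.3f}")
```

<p>The small outbreak's rate is ten times less precise than the large one's, even though the two points would sit at the same height on a plot of raw rates.</p>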
<p>I put the data into <a href="http://www.minitab.com/products/minitab">Minitab Statistical Software</a> and used binary logistic regression to fit a model with three predictors: year, Ebola virus species, and location of outbreak. I could not fit interactions among these factors because of the limited amount of data available.</p>
<p>All three predictors had <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values">p-values</a> below 0.001, indicating strong statistical significance:</p>
<p style="margin-left: 40px;"><img alt="ebola virus binary logistic regression analysis" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/39820ae9941558d84f39e4f897c588d7/ebola_binary_logistic_regression.gif" style="width: 410px; height: 128px;" /></p>
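<p>For readers curious about what such a fit involves, here is a simplified sketch of binary logistic regression on grouped data (deaths out of cases), fitted with Newton's method. It is not the model above: it uses a single numeric predictor instead of three, the counts are hypothetical values invented for illustration, and the function name is an assumption. It does show how each outbreak is weighted by its number of cases.</p>

```python
import math


def fit_grouped_logistic(x, deaths, cases, iterations=25):
    """Fit logit(p) = b0 + b1 * x to deaths-out-of-cases data by Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, deaths, cases):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = ni * p * (1.0 - p)  # binomial weight: larger outbreaks count more
            r = yi - ni * p         # observed minus expected deaths
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: a time index, with deaths and cases for each outbreak.
period = [0, 1, 2, 3]
deaths = [40, 140, 12, 220]
cases = [50, 200, 20, 400]

b0, b1 = fit_grouped_logistic(period, deaths, cases)
print(f"intercept = {b0:.3f}, slope per period = {b1:.3f}")
```

<p>A negative slope in this toy fit corresponds to a predicted death rate that falls over time. Adding categorical predictors such as species and location would require indicator variables and a larger system of equations, which statistical software handles automatically.</p>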
<p>I also created a scatterplot to illustrate the model's predicted death rates over time:</p>
<p><img alt="ebola scatterplot of predicted death rate vs year" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/a2738293bfffe3d097765a50eef2a602/ebola_predicted_death_rate_scatterplot.png" style="width: 500px; height: 333px;" /></p>
<p>We can draw the following conclusions from the binary logistic regression analysis and the graph above:</p>
<ol>
<li>The death rate from Ebola decreases over time.</li>
<li>The death rate is significantly different across species. After accounting for the effects of location and time, species SUDV and BDBV have lower death rates than EBOV. The current outbreak is EBOV.</li>
<li>The death rate is significantly different across locations. After accounting for the effects of species and time, Gabon, Sudan, and the current outbreak location (Guinea, Sierra Leone, and Liberia) appear to have lower death rates.</li>
</ol>
Assessing the Current EBOV Outbreak with Binary Logistic Regression
<p>The current outbreak has a low death rate relative to previous EBOV outbreaks. Since the current location has not appeared before, we cannot tell whether this decreased death rate is due to improvements in treatment over time, the quality of care available in the location of the outbreak, or some other factor, such as better immunity to the virus in the region.</p>
<p>The graph below shows the EBOV death rate predictions from a binary logistic regression model fit to the EBOV data only.</p>
<p><img alt="ebola scatterplot of predicted death rate vs year - EBOV only" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/477783387835e4af4c0c85742097c41c/ebola_binary_logistic_regression_scatterplot.png" style="width: 500px; height: 333px;" /></p>
<p>The current outbreak is severe in terms of number of cases, but the death rate is lower than expected based on past EBOV outbreaks in different locations.</p>
Seeing the Outbreak Day by Day
<p>One final graph shows the number of new cases per day by location for the current outbreak.</p>
<p><img alt="ebola scatterplot of new cases per day vs. date" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/fcb8517f5fef67229aa4ff3250a2b994/ebola_scatterplot_of_new_cases_by_day_vs_date.png" style="width: 500px; height: 332px;" /></p>
<p>The number of cases per day has fluctuated widely in Guinea, while Liberia and Sierra Leone have both seen an extremely rapid rise in cases per day since mid-July.</p>
<p>This is one graph that will change greatly from day to day as the outbreak runs its course. Let's hope the data quickly return to 0 new cases per day for all locations.</p>
<p> </p>
Statistics in the News
Wed, 06 Aug 2014 12:00:00 +0000
http://blog.minitab.com/blog/the-statistical-mentor/how-deadly-is-this-ebola-outbreak
Jim Colton
Cuckoo for Quality: A Birdseye View of a Classic ANOVA Example
http://blog.minitab.com/blog/statistics-and-quality-data-analysis/cuckoo-for-quality3a-a-birdseye-view-of-a-classic-anova-example
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/05e98b196c6ea1b2f89c0c6e0c00c6ce/cuckoo_eggs.jpg" style="float: right; width: 282px; height: 268px; border-width: 1px; border-style: solid; margin: 10px 15px;" />If you teach statistics or quality statistics, you’re probably already familiar with the cuckoo egg data set.</p>
<p>The common cuckoo has decided that raising baby chicks is a stressful, thankless job. It has better things to do than fill the screeching, gaping maws of cuckoo chicks, day in and day out.</p>
<p>So the mother cuckoo lays her eggs in the nests of other bird species. If the cuckoo egg is similar enough to the eggs of the host bird, in size and color pattern, the host bird may be tricked into incubating the egg and raising the hatchling. (The cuckoo can then fly off to the French Riviera, or punch in to work at a nearby cuckoo clock, or do whatever it is that cuckoos with excess free time do.)</p>
<p>The cuckoo egg data set contains measurements of the lengths of cuckoo eggs that were collected from the nests of 6 different bird species. Using Analysis of Variance (ANOVA), students look for statistical evidence that the mean length of the cuckoo eggs differs depending on the host species. Presumably, that supports the idea that the cuckoo may adapt the length of its eggs to better match those of the host.</p>
Old Data Never Dies... It Just Develops a Rich Patina
<p>Sample data sets have a way of sticking around for a while. The cuckoo egg data predate the production of the Model T Ford! (Apparently no one has measured a cuckoo egg in over 100 years. Either that or cuckoo researchers are jealously guarding their cuckoo egg data in the hopes of becoming eternally famous in the annals of cuckoology.)</p>
<p>The data were originally published in a 1902 article in <em>Biometrika</em> by O.H. Latter. L.H.C. Tippett, an early pioneer in statistical quality control, included the data set in his classic text, <em>The Methods of Statistics</em>, a few decades later.</p>
<p>That's somewhat fitting. Because if you think about it, the cuckoo bird really faces the ultimate quality assurance problem. If its egg is recognized as being different (“defective”) by the host bird, it may be destroyed before it’s hatched. And the end result could be no more cuckoos.</p>
Analyze the Cuckoo Egg Data in Statistical Software
<p>Displaying boxplots and performing ANOVA is the classic 1-2 punch that’s often used to statistically compare groups of data. And that’s how this vintage data set is typically evaluated.</p>
<p>To try this in Minitab Statistical Software, click to <a href="//cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/File/c38943bb6bcf5aca859630a0fa00ce64/cuckoo_egg_data.MPJ" target="_blank">download the data set</a>. (You'll need Release 17, which you can <a href="http://it.minitab.com/en-us/products/minitab/free-trial.aspx" target="_blank">download free</a> for a 30-day trial period.) Then follow the instructions below.</p>
Display Side-by-Side Boxplots
<ol>
<li>In Minitab, choose <strong>Graph > Boxplots</strong>. Under <strong>One Y</strong>, choose <strong>With Groups</strong>, then click <strong>OK</strong>.</li>
<li>Fill out the dialog box as shown below, then click <strong>OK</strong>.<br />
<br />
<img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/2897d47b4c129e39bd343b4e9cb90148/boxplot_dialog.jpg" style="width: 522px; height: 332px;" /></li>
</ol>
<p>Minitab displays the boxplots.</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/bd51ea3b4d6dd127618fdb9744c72ec6/boxplot_of_egg_length.jpg" style="width: 576px; height: 384px;" /></p>
<p>The boxplots suggest that the mean length of the cuckoo eggs may differ slightly among the host species. But are any of the differences statistically significant? The next step is to perform ANOVA to find out.</p>
Perform One-Way ANOVA
<ol>
<li>In Minitab, choose <strong>Stat > ANOVA > One-Way</strong>.</li>
<li>Complete the dialog box as shown below.<br />
<img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/776372a207b56f361f430f0597d8eab2/anova_dialog.jpg" style="width: 498px; height: 387px;" /></li>
<li>Click <strong>Comparisons</strong>, check <strong>Tukey</strong>, then click <strong>OK</strong> in each dialog box.</li>
</ol>
<p>The ANOVA output includes the following results:</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ba6a552e-3bc0-4eed-9c9a-eae3ade49498/Image/f09bb832616c3b1a0ab54124af7ae1e7/interval_plot_of_length_of_cuckoo_egg_vs_nest.jpg" style="width: 576px; height: 384px;" /></p>
<p style="margin-left: 40px;">Analysis of Variance</p>
<p style="margin-left: 40px;">Source DF Adj SS Adj MS F-Value <strong> <span style="color:#FF0000;">P-Value</span></strong><br />
Nest 5 42.94 8.5879 10.39 <strong><span style="color:#FF0000;"> 0.000</span></strong><br />
Error 114 94.25 0.8267<br />
Total 119 137.19</p>
<p style="margin-left: 40px;">Tukey Pairwise Comparisons</p>
<p style="margin-left: 40px;">Grouping Information Using the Tukey Method and 95% Confidence</p>
<p style="margin-left: 40px;">Nest N Mean Grouping<br />
HDGE SPRW 14 23.121 A<br />
TREE PIPIT 15 23.090 A<br />
PIED WTAIL 15 22.903 A B<br />
ROBIN 16 22.575 A B<br />
<strong><span style="color:#FF8C00;">MDW PIPIT 45 22.299 B</span></strong><br />
<strong><span style="color:#0000FF;">WREN 15 21.130 C</span></strong></p>
<p style="margin-left: 40px;">Means that do not share a letter are significantly different.</p>
<p>----------------------------------------------------</p>
<p>The interval plot displays the mean and 95% confidence interval for each group. In the ANOVA table, the p-value is less than the alpha level of 0.05, so you reject the null hypothesis that the means do not differ. The mean egg length is statistically different for at least one group.</p>
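<p>If you want to see how the numbers in the ANOVA table relate to each other, the short Python sketch below recomputes the mean squares and the F-value from the sums of squares and degrees of freedom shown in the output. The variable names are chosen for illustration only.</p>

```python
# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_nest, df_nest = 42.94, 5
ss_error, df_error = 94.25, 114

ms_nest = ss_nest / df_nest      # mean square between nests
ms_error = ss_error / df_error   # mean square within nests
f_value = ms_nest / ms_error     # ratio tested against the F distribution

print(f"MS(Nest)  = {ms_nest:.4f}")
print(f"MS(Error) = {ms_error:.4f}")
print(f"F-value   = {f_value:.2f}")
print(f"Total SS  = {ss_nest + ss_error:.2f}, total DF = {df_nest + df_error}")
```

<p>The results agree with the table up to rounding of the displayed sums of squares. Note that 5 degrees of freedom for Nest corresponds to 6 host species.</p>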
<p>Based on Tukey's multiple comparisons procedure, two groups stand out. The mean length of the cuckoo eggs in the wren nests is significantly smaller than the mean length in all the other nests. The mean length of the eggs in the meadow pipit nests is significantly smaller than the mean length in the hedge sparrow or tree pipit nests.</p>
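<p>The rule "means that do not share a letter are significantly different" can be applied mechanically. The sketch below copies the grouping letters from the Tukey output and lists every pair of nests with no letter in common. The dictionary and the helper name differs are illustrative assumptions.</p>

```python
from itertools import combinations

# Grouping letters copied from the Tukey output.
grouping = {
    "HDGE SPRW": "A",
    "TREE PIPIT": "A",
    "PIED WTAIL": "AB",
    "ROBIN": "AB",
    "MDW PIPIT": "B",
    "WREN": "C",
}


def differs(a, b):
    """Two nests differ significantly when their grouping letters do not overlap."""
    return not set(grouping[a]) & set(grouping[b])


different_pairs = [(a, b) for a, b in combinations(grouping, 2) if differs(a, b)]
for a, b in different_pairs:
    print(f"{a} vs {b}")
```

<p>This lists seven pairs: the wren against each of the other five nests, plus the meadow pipit against the hedge sparrow and the tree pipit.</p>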
<p>With that said, the case of the morphing cuckoo eggs is frequently considered closed. The ANOVA results are said to support the theory that the cuckoo adapts egg length to the host nest.</p>
<p>Bottom line: If you're a mother cuckoo, stay away from ostrich nests.</p>
Post-Analysis Pondering
<p>As alluring and sexy as a <a href="http://blog.minitab.com/blog/michelle-paret/when-a-p-value-might-be-misleading">p-value</a> is to the data-driven mind, it has its dangers. If you're not careful, it can act like a giant door that slams shut on your mind. Its air of finality may prevent you from looking more closely—or more practically—at your results.</p>
<p>Case in point: Most of us know that a wren is smaller than a robin. But what about the other bird species?</p>
<p>Personally, I wouldn’t recognize a pied wagtail or a tree pipit if it dropped a load of statistically significant doo-doo on my shiny bald head.</p>
<p>How big is each bird species—or more to the point, how long, on average, are its eggs? If two species have about the same size egg, then the <em>lack of a significant difference</em> in the ANOVA results would actually <em>support</em> the theory that the cuckoo may adapt its egg length to match the host. Without any indication of whether the lengths of the eggs of these bird species differ significantly to begin with and, if so, <em>how </em>they differ, it's really difficult to determine how ANOVA results will support or contradict the idea of egg-length adaptation by the cuckoo.</p>
<p>Apart from that, there's the issue of practical consequence. Upon closer examination of the confidence intervals, it appears that the actual mean difference itself could be fractions of a millimeter. Does that size difference really matter if you're a host bird? Would it make a difference between the eggs being accepted or rejected?</p>
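<p>To get a feel for the sizes involved, the following sketch subtracts some of the group means reported in the Tukey output. These are point estimates only; the confidence intervals are not reproduced here, and the unit is assumed to be millimeters, as the text implies.</p>

```python
# Group means copied from the Tukey output (egg length, assumed to be in millimeters).
means = {
    "HDGE SPRW": 23.121,
    "TREE PIPIT": 23.090,
    "MDW PIPIT": 22.299,
    "WREN": 21.130,
}


def mean_difference(a, b):
    """Difference in mean cuckoo egg length between two host nests."""
    return round(means[a] - means[b], 3)


for a, b in [("HDGE SPRW", "MDW PIPIT"), ("TREE PIPIT", "MDW PIPIT"), ("MDW PIPIT", "WREN")]:
    print(f"{a} minus {b}: {mean_difference(a, b)}")
```

<p>The two meadow pipit comparisons come out below one millimeter, while the gap between the meadow pipit and wren means is a little over one millimeter.</p>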
<p>Finally, there's the proverbial elephant in the room whenever you perform a statistical analysis. The one that trumpets noisily in the back of an asymptotically conscientious mind: "Assssssumptions!! Asssssumptions!"</p>
<p>How well <em>do</em> the cuckoo egg data satisfy the critical assumptions for ANOVA?</p>
<p>Stay tuned for the next post.</p>
Quality Improvement, Statistics, Statistics Help
Mon, 04 Aug 2014 13:35:00 +0000
http://blog.minitab.com/blog/statistics-and-quality-data-analysis/cuckoo-for-quality3a-a-birdseye-view-of-a-classic-anova-example
Patrick Runkel
“You’ve got a friend” in Minitab Support
http://blog.minitab.com/blog/real-world-quality-improvement/youve-got-a-friend-in-minitab-support
<p>I caught the end of Toy Story over the weekend, which is definitely one of my all-time favorite children’s movies. Now, unfortunately or fortunately, I can’t get Randy Newman's theme song, “You’ve Got a Friend in Me,” out of my head!</p>
<p>It's also got me thinking about the nature of friendship, and how "best friends forever" are supposed to always be there when you need them. And, not to get too maudlin about it, but just like Woody and Buzz eventually realize their friendship, all of us hope the professionals who use our software also realize that “you’ve got a friend” in Minitab.</p>
<p>Now what do I mean by all this “BFF” business? I’m talking about our <a href="http://www.minitab.com/support/" target="_blank">free technical support</a> services (online and by telephone), as well as the plethora of free documentation that’s available online for each of our products. <em>We’re here for you!</em></p>
<p>Be sure to visit the <a href="http://www.minitab.com/support/" target="_blank">Support</a> section of our website to browse the individual support sections that are available for each of our product offerings. From there, you can access the latest software downloads, documentation, and tutorials, and find the answers to all of your questions about software use, statistics, and quality improvement. In fact, there's a lot of great information there even if you're not using our software yet!</p>
<p>And for our latest and greatest release, Minitab 17 Statistical Software, we’ve expanded our online support offerings. Be sure to check out the following:</p>
<strong>1. <u><a href="http://support.minitab.com/minitab/17/getting-started/" target="_blank">Getting Started with Minitab 17</a></u></strong>
<p><em>Getting Started</em> is our user guide that introduces you to some of the most commonly used features and tasks in Minitab—including how to explore your data with graphs, conduct statistical analyses and interpret the results, assess quality using control charts and capability analysis, and design an experiment.</p>
<p>The guide also includes shortcuts and tips for customizing Minitab.</p>
<strong>2. <u><a href="http://support.minitab.com/minitab/17/topic-library/" target="_blank">Topic Library</a></u></strong>
<p>The Minitab 17 Topic Library is a compilation of content from Help, StatGuide™, and Glossary—all of which are also available within the software itself. The library is arranged by statistical area so that you can easily find relevant topics, such as Basic Statistics and Graphs, Quality Tools, and Modeling Statistics (ANOVA, regression, DOE, etc.).</p>
<strong>3. <u><a href="http://support.minitab.com/datasets/" target="_blank">Data Sets</a></u></strong>
<p>We took the best data sets from Minitab 17 Help and made them accessible online. We also made them even more realistic, so you can practice performing analyses and interpretation, explore alternate data layouts, and investigate statistical tools commonly used in your industry.</p>
<strong>4. <u><a href="http://support.minitab.com/minitab/17/macro-library/" target="_blank">Macros Library</a></u></strong>
<p>Our Macros Library includes many macros that allow you to <a href="http://blog.minitab.com/blog/customized-data-analysis/creating-a-custom-report-using-minitab-part-1">automate, customize and repeat an analysis</a> of your choice. You can download the .mac file for each macro we offer.</p>
<strong>5. <u><a href="http://support.minitab.com/minitab/17/technical-papers/" target="_blank">Technical Papers</a></u></strong>
<p>Access technical papers that describe the research conducted to develop the methods and data checks used in the <a href="http://www.minitab.com/products/minitab/assistant/">Assistant</a>, as well as the methodology and supporting research underlying two new analyses in Minitab 17.</p>
<strong>6. <a href="http://www.minitab.com/support/licensing/" target="_blank">Installation</a> and <a href="http://www.minitab.com/support/licensing/" target="_blank">Licensing</a> FAQs</strong>
<p>Browse our troubleshooting solutions to the most common error messages, installation issues, and activation/licensing topics.</p>
The Personal Touch
<p>If you've checked the website and still need help, know that we’re here whenever you need us (a real, live person I might add!). <span style="line-height: 1.6;">Access unlimited phone or online support from experts in statistics, quality improvement, and computer systems by visiting </span><a href="http://www.minitab.com/support/" style="line-height: 1.6;" target="_blank">http://www.minitab.com/support/</a><span style="line-height: 1.6;">.</span></p>
<p>You really do have a friend in Minitab Support!</p>
Statistics Help
Fri, 15 Aug 2014 12:54:37 +0000
http://blog.minitab.com/blog/real-world-quality-improvement/youve-got-a-friend-in-minitab-support
Carly Barry
Two-Way ANOVA in Minitab 17
http://blog.minitab.com/blog/marilyn-wheatleys-blog/two-way-anova-in-minitab-17
<p><span style="line-height: 1.6;">After upgrading to the latest and greatest version of our statistical software, Minitab 17, some users have contacted tech support to ask "Wait a minute, where is that Two-Way ANOVA option in Minitab 17?" </span></p>
<p><span style="line-height: 1.6;">The answer is that it’s not there. That’s right! The Two-Way ANOVA option that was available in Minitab 16 and prior versions was removed from Minitab 17.</span> Why would this feature be removed from the new version? Shouldn’t the new version have more features instead of fewer? </p>
<p>Two-Way ANOVA was removed from Minitab 17 because you can get the same output by using the <a href="http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/anova/basics/what-is-a-general-linear-model/">General Linear Model</a> option in the <a href="http://blog.minitab.com/blog/understanding-statistics/you-dont-need-a-weatherman-using-anova-graphs-and-regression-to-fact-check-the-forecasts">ANOVA</a> menu. Removing the separate Two-Way ANOVA menu choice reduces redundancy and creates a more consistent workflow for the linear models options.</p>
<p>Let's look at an example that shows how to replicate the Two-Way ANOVA output from Minitab 16 using Minitab 17.</p>
<p>The data shown below is a sample dataset used for 2-Way ANOVA in Minitab 16: <em>You as a biologist are studying how zooplankton live in two lakes. You set up twelve tanks in your laboratory, six each with water from one of the two lakes. You add one of three nutrient supplements to each tank and after 30 days you count the zooplankton in a unit volume of water. You use two-way ANOVA to test whether there is significant evidence of interactions and main effects.</em></p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/391474f6a45afe676d1d403b477a9547/1.png" style="width: 242px; height: 309px; border-width: 1px; border-style: solid;" /></p>
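<p>Before looking at the output, it can help to work out the degrees of freedom that a two-way ANOVA with interaction should report for this design. The sketch below assumes a balanced layout of 2 lakes × 3 supplements × 2 tanks per combination; the even split of the twelve tanks is an assumption, since the description does not state it explicitly, and the function name is hypothetical.</p>

```python
def two_way_anova_df(levels_a, levels_b, replicates):
    """Degrees of freedom for a balanced two-factor design with interaction."""
    total_runs = levels_a * levels_b * replicates
    df = {
        "A": levels_a - 1,
        "B": levels_b - 1,
        "A*B": (levels_a - 1) * (levels_b - 1),
    }
    df["Error"] = total_runs - levels_a * levels_b
    df["Total"] = total_runs - 1
    return df


# 2 lakes x 3 supplements x 2 tanks per combination = 12 tanks (balanced split assumed).
zooplankton_df = two_way_anova_df(levels_a=2, levels_b=3, replicates=2)
print(zooplankton_df)
```

<p>The four component values add up to the total of 11, one less than the number of tanks. Without replicate tanks in each combination there would be no error degrees of freedom left to test the interaction.</p>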
<p>The Two-Way ANOVA option in Minitab 16 yields the following output:</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/f9a2412b309c5e35fb7e446ef68c74bd/2.png" style="border-width: 1px; border-style: solid; width: 399px; height: 164px;" /></p>
<p>To replicate the Two-Way ANOVA output from Minitab 16 using Minitab 17, use <strong>Stat</strong> > <strong>ANOVA</strong> > <strong>General Linear Model</strong> > <strong>Fit General Linear Model</strong>:</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/9db638c9ed6424030c6fb1e1c06f115c/3.png" style="border-width: 1px; border-style: solid; width: 471px; height: 274px;" /></p>
<p><span style="line-height: 1.6;">Using GLM, we can enter our response column (Zooplankton) in the </span><strong style="line-height: 1.6;">Responses</strong><span style="line-height: 1.6;"> field and our two factors in the </span><strong style="line-height: 1.6;">Factors</strong><span style="line-height: 1.6;"> field without the need to specify one factor as the row and one as the column factor:</span></p>
<p> </p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/cd2fd2232e010dc13cc154e948bb8cfa/4.png" style="border-width: 1px; border-style: solid; width: 580px; height: 439px;" /></p>
<p> </p>
<p>Minitab 16's Two-Way ANOVA option also shows the two-factor interaction, so in Minitab 17 we need to manually add the interaction by clicking the <strong>Model</strong> button in the GLM dialog box. There we can highlight the factors listed on the left side (step 1 below); when we do that, the <strong>Add</strong> button on the right will become available. To add the interaction, click <strong>Add</strong> (step 2) and the interaction will be shown at the bottom under <strong>Terms in the model</strong> (step 3).</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/596dd69d168cfcac30e0aa77f683ed4f/capture.PNG" style="border-width: 1px; border-style: solid; width: 522px; height: 544px;" /></p>
<p>Click <strong>OK</strong> in the Model dialog box to return to the main GLM dialog.</p>
<p>By default, Minitab 17 will provide more detailed output than Two-Way ANOVA in Minitab 16. To make the results match, we can remove the additional output by clicking the <strong>Results</strong> button within the GLM dialog box. Unchecking the additional options so that only <strong>Analysis of variance</strong> and <strong>Model summary</strong> are selected (as shown below) will make the output match Minitab 16’s Two-Way ANOVA results.</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/06cbefa1d2c4c21fc8323308a74e7467/5.png" style="border-width: 1px; border-style: solid; width: 487px; height: 422px;" /></p>
<p>The results from General Linear Model in Minitab 17 now match the output from Two-Way ANOVA in Minitab 16:</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/f6d0da32-ba1d-41d4-ace1-af34dcb51351/Image/49008c4124ecd0fabb99f70a3afb663e/6.png" style="border-width: 1px; border-style: solid; width: 472px; height: 258px;" /></p>
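<p>For readers who want to check the arithmetic behind such a table outside Minitab, here is a minimal sketch of a balanced two-way ANOVA with the interaction term. It is not Minitab's implementation: the function name <code>two_way_anova</code>, the generic factor labels A and B, and the toy data are all assumptions made for illustration. It only handles balanced designs, and it stops at the F statistics because the Python standard library has no F distribution for p-values.</p>

```python
from itertools import product
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way ANOVA with interaction (illustrative sketch).

    cells maps (a_level, b_level) -> list of replicate responses.
    Returns {term: (df, SS, MS, F)}; F is None for the error row.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # replicates per cell
    if n < 2 or any(len(v) != n for v in cells.values()):
        raise ValueError("need a balanced design with >= 2 replicates per cell")

    grand = mean(y for v in cells.values() for y in v)
    a_mean = {a: mean(y for b in b_levels for y in cells[a, b]) for a in a_levels}
    b_mean = {b: mean(y for a in a_levels for y in cells[a, b]) for b in b_levels}
    cell_mean = {k: mean(v) for k, v in cells.items()}

    # Sums of squares for each main effect, the interaction, and error.
    ss = {
        "A": n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels),
        "B": n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels),
        "A*B": n * sum(
            (cell_mean[a, b] - a_mean[a] - b_mean[b] + grand) ** 2
            for a, b in product(a_levels, b_levels)
        ),
        "Error": sum((y - cell_mean[k]) ** 2 for k, v in cells.items() for y in v),
    }
    df = {
        "A": len(a_levels) - 1,
        "B": len(b_levels) - 1,
        "A*B": (len(a_levels) - 1) * (len(b_levels) - 1),
        "Error": len(a_levels) * len(b_levels) * (n - 1),
    }
    ms_error = ss["Error"] / df["Error"]
    table = {}
    for term in ss:
        ms = ss[term] / df[term]
        f = None if term == "Error" else ms / ms_error
        table[term] = (df[term], ss[term], ms, f)
    return table


# Hypothetical toy data: 2 levels of A x 2 levels of B, 2 replicates per cell.
cells = {
    ("a1", "b1"): [10, 12],
    ("a1", "b2"): [14, 16],
    ("a2", "b1"): [20, 22],
    ("a2", "b2"): [30, 32],
}
table = two_way_anova(cells)
for term, (d, s, m, f) in table.items():
    f_text = "" if f is None else f"{f:8.2f}"
    print(f"{term:6} {d:3d} {s:10.2f} {m:10.2f} {f_text}")
```

<p>Dropping the <code>"A*B"</code> row and folding its sum of squares into the error term corresponds to fitting the model without the interaction, which is what GLM does in Minitab 17 until the interaction is added in the Model dialog.</p>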
<p>If you're wondering how to do something with Minitab, our <a href="http://www.minitab.com/support/">technical support team</a> is always ready to help you. Our technical support representatives are knowledgeable in statistics, quality improvement, and computer systems. Best of all, our assistance is free.<br />
</p>
<p> </p>
Data AnalysisStatisticsStatsWed, 16 Jul 2014 12:00:00 +0000http://blog.minitab.com/blog/marilyn-wheatleys-blog/two-way-anova-in-minitab-17Marilyn WheatleyGuest Post: Did Ma's Diabetes Get Cured by Back Surgery?
http://blog.minitab.com/blog/voice-of-the-customer/guest-post3a-did-mas-diabetes-get-cured-by-back-surgery
<p><strong><em>The Minitab Fan section of the Minitab blog is your chance to share with our readers! We always love to hear how you are using Minitab products for quality improvement projects, Lean Six Sigma initiatives, research and data analysis, and more. If our software has helped you, please <a href="http://blog.minitab.com/blog/landing-pages/share-your-story-about-minitab/n"> share your Minitab story</a>, too!</em></strong></p>
<p>Once my Mom was diagnosed with Diabetes Type II, I began to track her blood sugar readings in Minitab Statistical Software.</p>
<p>I did it three times a day before meals...over weeks, then months, then years. At each doctor's appointment I would take in her 'book' of readings, and I would take my charts, too.</p>
<p>The <a href="http://blog.minitab.com/blog/real-world-quality-improvement/three-ways-individual-value-plots-can-help-you-analyze-data">individual value plot</a> chart was very telling. Her blood sugars increased with each meal during the day. The doctor changed her insulin based on the undeniable visual trends.</p>
<p><img src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/84d328f1-1b81-41d1-aad4-fc00026edd38/Image/570a16b79c624b3e794fefea622d6570_w480.jpeg" /></p>
<p>Then the biggest surprise came. In June 2013, over a year after blood sugar tracking began, she decided to get back surgery to alleviate leg pain. The day after her back surgery, still in the hospital, her blood sugar dropped approx. 75 points. After a few months of this obvious transition and new trend, the doctor removed her from her diabetes medicine and insulin.</p>
<p><img src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/84d328f1-1b81-41d1-aad4-fc00026edd38/Image/40523346df4516a0c168896c5255e06a_w480.jpeg" /></p>
<p>Minitab charts help guide my Mom's health, even in her 80s. She is 86 today, and is still off insulin and diabetes medicine. And many Minitab <a href="http://www.minitab.com/products/minitab">Statistical Software</a> charts are a part of my Mom's health history, kept in her doctor's office records.<br />
<br />
Paul Kelly<br />
Black Belt<br />
Air Products<br />
Trexlertown, Pa.</p>
<p> </p>
Data AnalysisHealth Care Quality ImprovementSix SigmaStatsMon, 14 Jul 2014 12:00:00 +0000http://blog.minitab.com/blog/voice-of-the-customer/guest-post3a-did-mas-diabetes-get-cured-by-back-surgeryMinitab FanThe 6 coolest tools on Minitab's toolbars
http://blog.minitab.com/blog/statistics-and-quality-improvement/the-6-coolest-th-on-minitabs-toolbars
<p>Toolbars are there to make your life easier, but if you don’t take the time to hover over each button and wait for a description, it’s pretty easy to never know that there’s a faster way to do something.</p>
<p>The toolbars in Minitab Statistical Software include some pretty nifty shortcuts. Here are my favorite 6:</p>
<ol>
<li>
<p><img alt="StatGuide Button" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/8944fa16dcbba19e329e595fb9388298/statguide_button.png" style="width: 23px; height: 27px;" /> StatGuide</p>
</li>
</ol>
<p>As soon as you have results in Minitab, the <a href="http://blog.minitab.com/blog/understanding-statistics/hidden-helpers-in-minitab-statistical-software">StatGuide</a> button becomes active on your toolbar. Click the button, and the StatGuide opens directly to guidance for the analysis that you’re looking at. Minitab saves you the time you would have spent looking for information about your results so that you have more time to get things done.</p>
<ol>
<li value="2">
<p><img alt="Edit Last Dialog" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/b742205b500fb457c0c67551860916fd/edit_last_dialog_button.png" style="width: 23px; height: 21px;" /> Edit Last Dialog</p>
</li>
</ol>
<p>To repeat an analysis, either because you want to run it on a different column or because you want to change a setting, all you have to do is click a button. Even better, most of Minitab’s analyses will remember what you entered the last time the dialog box was open. Make the small adjustments you need to make, and you’re ready to perform your new analysis.</p>
<ol>
<li value="3">
<p><img alt="Show Session Folder" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/92dec208bff467280065fc63c36f426c/project_manager_button.png" style="width: 21px; height: 23px;" /> Show Session Folder</p>
</li>
</ol>
<p>When you’ve run several analyses in Minitab, it can be nice to have a quick way to find the results of a particular analysis. Minitab’s project manager is the best way to find the results of an analysis quickly, and that’s why it’s so nice that it’s accessible from the toolbar. Click the button, and you get a list of all of the analyses and graphs in your Minitab project.</p>
<ol>
<li value="4">
<p><img alt="Current Data Window" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/872fea599c8786f805e4990cbab7a8eb/cycle_worksheets_button.png" style="width: 21px; height: 20px;" /> Current Data Window</p>
</li>
</ol>
<p>If you have a lot of worksheets open, you might want to be able to see both your worksheet and your results at the same time. When you click the button, the current worksheet comes to the front, without maximizing to hide your results. Click it again, and the next worksheet comes to the front. Click it again, and the next worksheet comes to the front. You can quickly cycle through the worksheets to find the one that you want, while still being able to see the results from your analysis.</p>
<ol>
<li value="5">
<p><img alt="Assign Formula To Column" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/1067f139e7036a2f3393897fbc76f1b3/add_formula_to_column_button.png" style="width: 23px; height: 22px;" /> Assign Formula To Column</p>
</li>
</ol>
<p>The <a href="http://blog.minitab.com/blog/real-world-quality-improvement/two-tip-tuesday-getting-the-most-out-of-your-text-data-in-minitab">Minitab calculator</a>’s a nice tool, but with the toolbar, you can use it even faster. The best part of all is that when you use the toolbar, you specify which column will have the formula without having to tell the calculator. Plus, when you’re in a complicated series of formulas, the column where the formula goes is not in the list of columns to select, so you can never get a recursive formula error.</p>
<p><img alt="No field to indicate where to store the formula." src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/f340a34e53923ed30389555c88e1aefb/without_column_to_store_in.png" style="float: left; width: 261px; height: 200px;" /><img alt="With a field where you can select where to store the formula." src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/144c0597ac4e208158470d7bc6f100a8/with_column_to_store_in.png" style="width: 282px; height: 200px;" /></p>
<ol>
<li value="6">
<p><img alt="Show Info" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/fcc5c9baee2de507effaab606f07b1ed/info_button.png" style="width: 20px; height: 22px;" /> Show Info</p>
</li>
</ol>
<p>Especially after you open or copy data from Excel, it can be helpful to get a quick snapshot of the columns in your worksheet. When you click the Info button, the Project Manager shows the column names, the lengths, the number of missing values, and the format of the columns. You can investigate why a column that contains numeric data is formatted as text. If any of the columns are the wrong length because of missing values at the end, you know right where to look. You won't have to spend your time scrolling around the worksheet looking for things that are amiss, so that you can get to your analysis faster.</p>
<p><img alt="The Project Manager shows the Id, length, number of missing values, and type for each column." src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/1ed15fa7eafa8e0d393d68f156380ae2/info_window.png" style="width: 410px; height: 172px;" /></p>
Faster than you used to be
<p>The only thing better than doing fearless data analysis is doing fearless data analysis even faster. Minitab’s toolbars come ready with shortcuts that help you analyze your data faster, from generating your results to interpreting them. Of course, the toolbars that everyone uses can’t be perfect for everyone. If you’re feeling emboldened, check out how to <a href="http://support.minitab.com/en-us/minitab/17/topic-library/minitab-environment/interface/customize-the-minitab-interface/customize-menus-toolbars-and-shortcut-keys/">customize the existing toolbars</a> or even to <a href="http://support.minitab.com/en-us/minitab/17/getting-started/customizing-minitab/">create your own toolbars</a>!</p>
Data AnalysisStatisticsStatistics HelpWed, 11 Jun 2014 16:17:00 +0000http://blog.minitab.com/blog/statistics-and-quality-improvement/the-6-coolest-th-on-minitabs-toolbarsCody SteeleGuest Post: Analysis of Road Accidents in Hyderabad
http://blog.minitab.com/blog/voice-of-the-customer/guest-post%3a-analysis-of-road-accidents-in-hyderabad
An Analysis of Road Accidents in Hyderabad, India
<p>The data for this study were obtained from the official website of the Hyderabad Traffic Police (<a href="http://www.htp.gov.in/Default.htm" rel="nofollow">http://www.htp.gov.in/Default.htm</a>). Note that the data for 2014 cover only January through April.</p>
<p>Reviewing the time series plot I obtained using <a href="http://www.minitab.com/products/minitab">Minitab 17</a> indicates that the number of accidents steadily decreased every year from 2011-2013, but there seems to be a rise from January-April 2014.</p>
<p><img src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/84d328f1-1b81-41d1-aad4-fc00026edd38/Image/ad29eb472012769f8ebb3a1c61a7623a_w480.jpeg" /></p>
<img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/b239c867c06151e8bfc2d65f22614195/crash.jpg" style="border-width: 1px; border-style: solid; margin: 10px 15px; float: right; width: 250px; height: 250px;" />
<p>As I was brought up in the city of Hyderabad, my experience has been that the following factors influence road accidents here:</p>
<ul>
<li>Increasing vehicle population leading to heavy traffic during peak hours</li>
<li>Drunken driving</li>
<li>Speed limit violation</li>
<li>Lack of properly laid roads</li>
<li>Violation of traffic and safety rules</li>
<li>Roads getting water logged during rainy season</li>
<li>Using cell phone while driving</li>
<li>Not wearing seat belts</li>
<li>Unwanted hurrying/negligence of the driver</li>
<li>Inattention while backing the vehicle</li>
<li>Not getting a clear picture of surroundings (lack of signage)</li>
<li>Using high beam light</li>
<li>Driving without a helmet</li>
<li>Speed driving on the flyovers and the Outer Ring Road</li>
<li>Tripping of heavy load vehicles in the city during the day time</li>
</ul>
<p>Following is a time series plot of the 852 accidents that took place from January-April 2014 according to the days of the week. This graph clearly indicates that the number of accidents occurring over the weekends is high.</p>
<p><img src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/84d328f1-1b81-41d1-aad4-fc00026edd38/Image/f3e309f8cf3ca640ff4bd561d619fe5e_w480.jpeg" /></p>
<p>The increase in the number of accidents over the weekend is a serious concern which requires attention since these accidents may be preventable by awareness campaigns targeted to the youth of the city.</p>
Conclusion
<p>Based on the results of the above analysis, preventive actions that I believe could be taken by the concerned authorities are:</p>
<ol>
<li>Make citizens aware of the importance of strictly adhering to the traffic rules, and impose fines on those who do not abide by them.</li>
<li>Issue driving licenses only as per age limits, and only after the person clears all the tests.</li>
<li>Inspect vehicles to make sure they are road-worthy.</li>
<li>Increase the number of traffic police in areas of heavy traffic.</li>
<li>Make sure the timers installed at traffic signals function properly.</li>
<li>Analyze the major accident-prone areas scientifically to reduce the rate of occurrence.</li>
<li>Check medians, footpaths, and curvatures carefully.</li>
<li>Use paint to clearly mark humps on the roads.</li>
<li>Remove attention-seeking boards, banners, and advertisements.</li>
</ol>
<p><br />
<strong>Dhatry Yaso Kala</strong><br />
Independent Consultant and Lean Six Sigma Black Belt<br />
Hyderabad, India<br />
</p>
Fri, 13 Jun 2014 12:00:00 +0000http://blog.minitab.com/blog/voice-of-the-customer/guest-post%3a-analysis-of-road-accidents-in-hyderabadMinitab FanThe Five Coolest Things You Can Do When You Right-click a Graph in Minitab Statistical Software
http://blog.minitab.com/blog/statistics-and-quality-improvement/the-five-coolest-things-you-can-do-when-you-right-click-a-graph-in-minitab-statistical-software
<p>Minitab graphs are powerful tools for investigating your process further and removing any doubt about the steps you should take to improve it. With that in mind, you’ll want to know every feature about Minitab graphs that can help you share and communicate your results effectively. While many ways to modify your graph are on the <strong>Editor</strong> menu, some of the best features become available when you right-click your graph.</p>
<p>Here are the five coolest things you can do when you right-click a graph in Minitab Statistical Software.</p>
Send graph to...
<p>Once your graph is ready for your report or presentation, you’ll want to put the graph in your document. Minitab makes this easy because you can right-click your graph and select either <strong>Send Graph to Microsoft Word</strong> or <strong>Send Graph to Microsoft PowerPoint</strong>. With that, you’re all set to go.</p>
<p> <img alt="The right-click menu, with 'Send Graph to Microsoft Word' highlighted." src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/5c2d6174d12ee1a9bfd306566bc0f5f7/context_menu.png" style="width: 170px; height: 341px;" /> </p>
<p>When you use the Minitab menu to transfer your graph to a presentation document, Minitab automatically selects the format that provides the clearest graph. In the case of PowerPoint, Minitab also makes sure that the graph is automatically fit to fill the receiving slide.</p>
StatGuide™
<p>Getting your graph into a report is an important step, but you also want to be ready to explain your results. That’s where Minitab’s <a href="http://blog.minitab.com/blog/understanding-statistics/five-ways-to-get-help-with-statistics">StatGuide</a>™ comes into play. Right-click your graph, and the last menu item is always going to be <strong>StatGuide</strong>. Select <strong>StatGuide</strong> and you’ll be taken directly to a page about the graph that you’re examining. Minitab saves you the time you would have spent looking for information about the output so that you have more time to get things done.</p>
<p><img alt="The residuals versus order plot has a pattern in it." src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/c285b252141dee3ab6b83b869e227850/residuals_vs_order_for_yield.jpg" style="width: 297px; height: 198px;" /><img alt="StatGuide contains information to help you interpret and explain your graph." src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/84b2dbfcab19b03b56a09dddfccdad6c/statguide.png" style="width: 226px; height: 358px;" /></p>
Copy Text
<p>Graphs are excellent tools for exploring and communicating, but that doesn’t mean that you never want to see the exact numbers. Getting the numbers from a graph is as easy as selecting an individual component and choosing <strong>Copy Text</strong>.</p>
<p>For example, you have a boxplot and would like to see the exact statistics for the graph. The tooltip for the boxplot includes the mean, quartiles, minimum, maximum, interquartile range, and sample size. Select the box with a right-click and <strong>Copy Text</strong> is active in the context menu. You can even paste a text box directly onto the graph with the information from the tooltip!</p>
<p><img alt="The tooltip shows the statistics Minitab uses to draw the boxplot." src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/07b066ff8c50b5e6c312521963b9c488/boxplot_tooltip_w640.png" style="width: 384px; height: 243px;" /><img alt="The statistics from the tooltip are pasted directly on the graph." height="243" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/231d2fb919e6facd416289bc46222d5c/boxplot_of_strength.jpg" width="366" /></p>
Switch to worksheet
<p>If you have a lot of graphs open, your Minitab window can sometimes get a little full. On those occasions, the right-click menu makes it easy for you to compare what you see on your graph with what’s in your data. For example, say that you’re looking at a residuals vs. fits plot and you brush the rows with the largest fits. Right-click the graph and choose <strong>Switch to</strong>, and you can quickly match up the brushed rows with the data on your graph. Here, you can easily see that while the catalyst changes between rows 4 and 8, the settings for time and temperature are the same.</p>
<p><img alt="The brushed points are the two largest fits." src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/7ae0ef21bfec4c03b08d64dd9218cdfe/burshed_graph_w640.png" style="width: 500px; height: 334px;" /><img alt="The black dots indicate the brushed points in the worksheet." height="249" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/22791f44-517c-42aa-9f28-864c95cb4e27/Image/1d64afb4426b880c264d573da24ba2d5/worksheet.png" width="569" /></p>
Go to Session Line
<p>Sometimes you have to produce a lot of output in Minitab to understand your data. When you get a mix of lots of statistics and graphs, you’ll want to be able to easily find the Coefficients table for a particular residual plot or the p-value for a t-test shown on a specific individual value plot. Right-clicking a graph can save you again. When you right-click and select <strong>Go to Session Line</strong>, you’re taken to the portion of the session window where the graph was made. Any tables or statistics that Minitab produced at the same time as the graph are right above that point in the session window!</p>
Ready to go
<p>Graphs are an important tool for making sure that everyone understands the results of your data analysis. When you right-click a graph in Minitab, you’ll find a number of tools that make it easier to share and understand your results. The right-click menu is one more step on your path to <a href="http://www.minitab.com/en-us/products/minitab/features/?WT.ac=EN_WIL">fearless data analysis</a>.</p>
Data AnalysisStatisticsStatistics HelpStatsWed, 28 May 2014 15:48:00 +0000http://blog.minitab.com/blog/statistics-and-quality-improvement/the-five-coolest-things-you-can-do-when-you-right-click-a-graph-in-minitab-statistical-softwareCody SteeleCan I Just Delete Some Values to Reduce the Standard Variation in My ANOVA?
http://blog.minitab.com/blog/understanding-statistics/can-i-just-delete-some-values-to-reduce-the-standard-variation-in-my-anova
<p>We received the following question via social media recently:</p>
<p style="margin-left: 40px;"><em>I am using Minitab 17 for ANOVA. I calculated the mean and standard deviation for these 15 values, but the standard deviation is very high. If I delete some values, I can reduce the standard deviation. Is there an option in Minitab that will automatically indicate values that are out of range and delete them so that the standard deviation is low?</em></p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/730b09b8826f9404e8661473c1e91ec0/throwing_out_data.gif" style="margin: 10px 15px; float: right; width: 177px; height: 177px;" />In other words, this person wanted a way to automatically eliminate certain values to lower the standard deviation.</p>
<p>Fortunately, Minitab 17 does <em>not </em>have the functionality that this person was looking for.</p>
<p>Why is that fortunate? Because cherry-picking data isn’t a statistically sound practice. In fact, if you do it <em>specifically</em> to reduce variability, removing data points can amount to fraud.</p>
When <em>Is </em>It OK to Remove Data Points?
<p>So that raises a question: is it <em>ever</em> acceptable to remove data? The answer is yes. If you know, for a fact, that some values in your data were improperly obtained, then it is okay to remove these bad data points. For example, if <a href="http://blog.minitab.com/blog/statistics-and-quality-data-analysis/using-minitab-to-weed-out-bloopers">data entry errors</a> resulted in a few data points from Sample A being entered under Sample B, it would make sense to remove those data points from the analysis of Sample B.</p>
<p>But you may encounter other suggestions for removing data. Some people will use a "trimmed" data set. This means you remove the top and bottom 1-2 samples. Depending upon what the data is, and how you plan to use it, this too can be fraud.</p>
<p>Some people use the term "data cleansing" for removing a few data points from a large data set. The effect on the analysis tends to be minimal. But when the removal changes the end results of an analysis, it again can amount to fraud.</p>
<p>The bottom line? If you don't know for certain that the data points are bad, removing them—especially to change the outcome of an analysis—is virtually impossible to defend.</p>
Finding and Handling Outliers in Your Data
<p>Minitab 17 won't automatically delete values to make your standard deviation small. However, our statistical software does make it easy to identify potential outliers that may be skewing your data, so that you can investigate them. You can access the outlier detection tests at <strong>Stat > Basic Statistics > Outlier Test…</strong></p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/bd0c3a768510bd4f6c68907f8155daad/outlier_menu.gif" style="width: 351px; height: 184px;" /></p>
<p>You can also look at specific statistical measures that indicate the presence of <a href="http://support.minitab.com/minitab/17/topic-library/modeling-statistics/regression-and-correlation/model-assumptions/ways-to-identify-outliers/">outliers in regression and ANOVA</a>.</p>
<p>Of course, before removing any data points you need to make sure that the values are really outliers. First, think about whether those values were collected under the same conditions as the other values. Was there a substitute lab technician working on the day that the potential outliers were collected? If so, did this technician do something differently than the other technicians? Or could the digits in a value be reversed? That is, was 48 recorded as 84?</p>
<p>If you have just one factor in an ANOVA, try using <strong>Assistant > Hypothesis Tests > One-Way ANOVA…</strong> Outliers will be flagged in the output automatically:</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/97f93bd1c0b42f2ff85db7cce08bcc90/assistant_outlier_flagged_w640.png" style="width: 640px; height: 94px;" /></p>
<p>You could then run the analysis again after manually removing outliers as appropriate.</p>
<p>You also can use a boxplot chart to identify outliers:</p>
<p><img alt="Finding Outliers in a Boxplot" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/Image/d1235a60856d3bde4bb8638116845f9a/outlier_boxplot_1_.png" style="width: 260px; height: 173px;" /></p>
<p>As you can see above, Minitab's boxplot uses an asterisk (*) symbol to identify outliers, defined as observations that are at least 1.5 times the interquartile range from the edge of the box. You can easily identify the unwanted data point by clicking on the outlier symbols so you can investigate further. After editing the worksheet you can update the boxplot, perhaps finding more outliers to remove.</p>
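<p>The 1.5 × IQR rule described above can be sketched in a few lines of Python. This is an illustration, not Minitab's code: the function name <code>boxplot_outliers</code> and the 15 sample values are hypothetical, the quartiles use the standard library's <code>"exclusive"</code> method, and values exactly on a fence are treated as not flagged, which is an assumption about boundary handling. The function only reports candidates; it deliberately deletes nothing.</p>

```python
from statistics import quantiles


def boxplot_outliers(values):
    """Return (row, value) pairs lying more than 1.5 * IQR beyond the box.

    Rows are numbered from 1, like worksheet rows. Nothing is removed:
    the result is a list of points to investigate.
    """
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [
        (row, v)
        for row, v in enumerate(values, start=1)
        if v < low_fence or v > high_fence
    ]


# Hypothetical measurements; row 7 mimics a transposed entry (48 typed as 84).
values = [48, 51, 47, 50, 52, 49, 84, 50, 48, 51, 49, 50, 47, 52, 50]
flagged = boxplot_outliers(values)
print(flagged)
```

<p>A flagged row is a prompt to go back to the source of the measurement, in line with the questions above about how the value was collected and recorded.</p>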
Are Your Outliers "Keepers"?
<p>While Minitab won't offer an automated "make my data look acceptable" tool, the software does make it easy to find specific data points that may take the results of your analysis in an inaccurate or unwanted direction.</p>
<p>However, before removing any "bad" data points you should understand their causes and be sure you can avoid recurrence of those causes in the actual process. If the "bad" data could contribute to a more accurate understanding of the actual process, removing them from the calculation will produce wrong results. </p>
Data AnalysisLearningStatisticsStatistics HelpStatsMon, 30 Jun 2014 12:00:00 +0000http://blog.minitab.com/blog/understanding-statistics/can-i-just-delete-some-values-to-reduce-the-standard-variation-in-my-anovaEston MartzCommon Statistical Mistakes You Should Avoid
http://blog.minitab.com/blog/real-world-quality-improvement/common-statistical-mistakes-you-should-avoid
<p>It's all too easy to make mistakes involving statistics. Powerful statistical software can remove a lot of the difficulty surrounding statistical calculation, reducing the risk of mathematical errors—but correctly interpreting the results of an analysis can be even more challenging. </p>
<p>No one knows that better than <a href="http://www.minitab.com/training/trainers/" target="_blank">Minitab's technical trainers</a>. All of our trainers are seasoned statisticians with years of quality improvement experience. They spend most of the year traveling around the country (and around the world) to help people learn to make the best use of Minitab software for analyzing data and improving quality. </p>
<p>A few years ago, Minitab trainers compiled a list of common statistical mistakes—the ones they encountered over and over again. Below are a few of their most commonly observed mistakes that involve drawing an incorrect conclusion from the results of analysis. </p>
Statistical Mistake 1: Misinterpreting Overlapping Confidence Intervals
<p>When comparing multiple means, statistical practitioners are sometimes advised to compare the results from confidence intervals and determine whether the intervals overlap. When 95% confidence intervals for the means of two independent populations don’t overlap, there will indeed be a statistically significant difference between the means (at the 0.05 level of significance). <a href="http://www.minitab.com/en-us/Published-Articles/Some-Misconceptions-about-Confidence-Intervals/" style="font-size: 13px; line-height: 1.6;" target="_blank"><strong>However, the opposite is not necessarily true.</strong></a> CIs may overlap, yet there may be a statistically significant difference between the means.</p>
<p>Take this example:</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/5de2bc0040f8e8aab8b5b97e87c55dd9/ci_plot_w640.jpeg" style="width: 440px; height: 293px;" /><br />
<br />
Two 95% confidence intervals that overlap may be significantly different at the 95% confidence level.</p>
<p>What does the t-test P-value tell us? In this case the P-value is less than 0.05 (0.049 &lt; 0.05), so there is a statistically significant difference between the means, even though the CIs overlap considerably.</p>
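<p>A rough way to see why this happens: two intervals overlap whenever the difference in means is smaller than the sum of the two margins of error, but the test compares that difference with the standard error of the difference, which is smaller than the sum of the individual standard errors. The sketch below uses a large-sample normal approximation rather than the t-test from the example above, and the means and standard errors are hypothetical numbers chosen for illustration.</p>

```python
from math import sqrt
from statistics import NormalDist


def compare_means(mean1, se1, mean2, se2, alpha=0.05):
    """Large-sample (normal approximation) comparison of two independent means.

    Returns both confidence intervals, whether they overlap, and the
    two-sided p-value for the difference in means.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    ci1 = (mean1 - z_crit * se1, mean1 + z_crit * se1)
    ci2 = (mean2 - z_crit * se2, mean2 + z_crit * se2)
    overlap = ci1[0] <= ci2[1] and ci2[0] <= ci1[1]

    # Standard error of the difference: sqrt(se1^2 + se2^2) < se1 + se2.
    se_diff = sqrt(se1 ** 2 + se2 ** 2)
    z_stat = (mean1 - mean2) / se_diff
    p_value = 2 * (1 - std_normal.cdf(abs(z_stat)))
    return ci1, ci2, overlap, p_value


# Hypothetical summary statistics: means 10 and 13, each with standard error 1.
ci1, ci2, overlap, p_value = compare_means(10.0, 1.0, 13.0, 1.0)
print(f"CI 1: ({ci1[0]:.2f}, {ci1[1]:.2f})")
print(f"CI 2: ({ci2[0]:.2f}, {ci2[1]:.2f})")
print(f"Intervals overlap: {overlap}")
print(f"p-value for the difference: {p_value:.3f}")
```

<p>With equal standard errors, any difference between about 2.8 and 3.9 standard errors gives overlapping 95% intervals together with a p-value below 0.05, so the overlap check alone is not a substitute for the test.</p>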
Statistical Mistake 2: Making Incorrect Inferences about the Population
<p>With statistics, we can analyze a small sample to make inferences about the entire population. But there are a few situations where you should avoid making inferences about a population that the sample does not represent:</p>
<ul>
<li>In <a href="http://blog.minitab.com/blog/starting-out-with-statistical-software/starting-out-with-capability-analysis" target="_blank"><strong>capability analysis</strong></a>, data from a single day is sometimes inappropriately used to estimate the capability of the entire manufacturing process.</li>
<li>In <a href="http://support.minitab.com/en-us/minitab/17/topic-library/quality-tools/acceptance-sampling/basics/what-is-acceptance-sampling/" target="_blank"><strong>acceptance sampling</strong></a>, samples are sometimes selected from only one section of the lot and then used to judge the entire lot.</li>
<li>A common and severe case occurs in a <a href="http://blog.minitab.com/blog/understanding-statistics/choosing-the-right-distribution-model-for-reliability-data" target="_blank"><strong>reliability analysis</strong></a> when only the units that failed are included in the analysis, even though the population is all units produced.</li>
</ul>
<p>To avoid these situations, define the population before sampling and take a sample that truly represents the population.</p>
Statistical Mistake 3: Assuming Correlation = Causation
<p>It’s sometimes overused, but “correlation does not imply causation” is a good reminder when you’re dealing with statistics. Correlation between two variables does not mean that one variable causes a change in the other, especially if correlation statistics are the only statistics you are using in your data analysis.</p>
<p>For example, data analysis has shown a strong positive correlation between shirt size and shoe size. As shirt size goes up, so does shoe size. Does this mean that wearing big shirts causes you to wear bigger shoes? Of course not! There could be other “hidden” factors at work here, such as height. (Tall people tend to wear bigger clothes and shoes.)</p>
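<p>The effect of a hidden factor can be illustrated with a small simulation. The sketch below uses synthetic data, not real measurements: it assumes that height drives both shirt size and shoe size, with independent noise added to each. The variable names and noise levels are invented for illustration. It then compares the raw correlation with the partial correlation after controlling for height.</p>

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic, standardized data: height influences both sizes.
height = [rng.gauss(0, 1) for _ in range(5000)]
shirt = [h + rng.gauss(0, 0.5) for h in height]
shoe = [h + rng.gauss(0, 0.5) for h in height]

raw = pearson(shirt, shoe)
adjusted = partial_corr(shirt, shoe, height)

print("Shirt vs. shoe correlation:", round(raw, 3))
print("Same, controlling for height:", round(adjusted, 3))
```

<p>In this simulated setup the raw correlation is strong, but it nearly vanishes once height is accounted for, which is what you would expect if neither size causes the other.</p>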
<p>Take a look at this scatterplot that shows that HIV antibody false negative rates are correlated with patient age:</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/985458eb7adf5fd68aa4ede7da69cb29/scatterplot.jpg" style="width: 576px; height: 384px;" /><br />
<br />
Does this show that the HIV antibody test does not work as well on older patients? Well, maybe …</p>
<p>But you can’t stop there and assume that just because patients are older, age is the factor that is causing them to receive a false negative test result (a false negative is when a patient tests negative on the test, but is confirmed to have the disease).</p>
<p><em>Let’s dig a little deeper.</em> Below you see that patient age and days elapsed between at-risk exposure and test are correlated:</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/5f93449fe79a55047e0636c48bcf751a/scatterplot_2.jpg" style="width: 576px; height: 384px;" /><br />
<br />
Older patients got tested faster … before the HIV antibodies were able to fully develop and show a positive test result.</p>
<p>Keep the idea that “correlation does not imply causation” in your mind when reading some of the many <a href="http://blog.minitab.com/blog/real-world-quality-improvement/cell-phones-and-cancer-correlation-is-not-causation" target="_blank"><strong>studies publicized in the media</strong></a>. Intentionally or not, the media frequently imply that a study has revealed some cause-and-effect relationship, even when the study's authors detail precisely the limitations of their research.</p>
Statistical Mistake 4: Not Distinguishing Between Statistical Significance and Practical Significance
<p>It's important to remember that using statistics, we can find a statistically significant difference that has no discernible effect in the "real world." In other words, just because a difference <em>exists </em>doesn't make the difference <em>important</em>. And you can waste a lot of time and money trying to "correct" a statistically significant difference that doesn't matter. </p>
<p>Let's say you love Tastee-O's cereal. The factory that makes them weighs every cereal box at the end of the filling line using an automated measuring system. Say that 18,000 boxes are filled per shift, with a target fill weight of 360 grams and a standard deviation of 2.5 grams. </p>
<p>Using statistics, the factory can detect a shift of 0.06 grams in the mean fill weight 90% of the time. But just because that 0.06 gram shift is statistically significant doesn't mean it's practically significant. A 0.06 gram difference probably amounts to two or three Tastee-O’s—not enough to make you, the customer, notice or care. </p>
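<p>The 90% figure can be roughly reproduced with a power calculation. The sketch below assumes a two-sided, one-sample z-test at the 0.05 significance level with a known standard deviation. The post does not say which test the factory uses, so treat that as an assumption.</p>

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(shift, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test to detect a mean shift."""
    std_normal = NormalDist()
    se = sigma / sqrt(n)                      # standard error of the mean
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    d = shift / se                            # shift in standard-error units
    # Probability that the test statistic lands in either rejection region.
    return (1 - std_normal.cdf(z_crit - d)) + std_normal.cdf(-z_crit - d)

# Values from the Tastee-O's example: 18,000 boxes, sigma = 2.5 g, shift = 0.06 g.
power = z_test_power(shift=0.06, sigma=2.5, n=18000)
print("Power to detect a 0.06 g shift:", round(power, 3))
```

<p>With so many boxes per shift, the standard error of the mean is under 0.02 grams, which is why such a tiny shift is detectable at all.</p>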
<p>In most hypothesis tests, we know that the null hypothesis is not <em>exactly</em> true. In this case, we don’t expect the mean fill weight to be precisely 360 grams -- we are just trying to see if there is a <em>meaningful</em> difference. Instead of a hypothesis test, the cereal maker could use a confidence interval to see how large the difference might be and decide if action is needed.</p>
Statistical Mistake 5: Stating That You've Proved the Null Hypothesis
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/ccb8f6d6-3464-4afb-a432-56c623a7b437/Image/b8c8b7bfc58720fb6e45d20b5da06df6/coin_flip.JPG" style="float: right; width: 291px; height: 167px; border-width: 1px; border-style: solid; margin-left: 7px; margin-right: 7px;" />In a hypothesis test, you pose a null hypothesis (H0) and an alternative hypothesis (H1). Then you collect data, analyze it, and use statistics to assess whether or not the data support the alternative hypothesis. A p-value above 0.05 indicates “there is not enough evidence to conclude H1 at the .05 significance/alpha level”.</p>
<p>In other words, even if we do not have enough evidence in favor of the alternative hypothesis, the null hypothesis may or may not be true. </p>
<p>For example, we could flip a fair coin 3 times and test:</p>
<p>H0: Proportion of Heads = 0.40 </p>
<p>H1: Proportion of Heads ≠ 0.40</p>
<p>In this case, an exact test is guaranteed to give a p-value higher than 0.05: with only three flips, even the most extreme outcome (three heads) has probability 0.4<sup>3</sup> = 0.064 under H0. Therefore we cannot conclude H1. But not being able to conclude H1 doesn't prove that H0 is correct or true! This is why we say we "fail to reject" the null hypothesis, rather than we "accept" the null hypothesis. </p>
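<p>You can check this by listing every possible outcome of three flips. The sketch below computes an exact two-sided binomial p-value by adding up the probabilities of all outcomes that are no more likely than the observed one. This is one common definition of the exact two-sided p-value, and it is an assumption here; statistical software may use a different one.</p>

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def exact_two_sided_p(k, n, p0):
    """Sum the probabilities of outcomes no more likely than the observed one."""
    probs = [binom_pmf(i, n, p0) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance guards against floating-point ties.
    return sum(q for q in probs if q <= observed * (1 + 1e-9))

# H0: proportion of heads = 0.40, with n = 3 flips.
p_values = {heads: exact_two_sided_p(heads, 3, 0.40) for heads in range(4)}
smallest = min(p_values.values())

for heads, p in p_values.items():
    print(heads, "heads -> p-value", round(p, 3))
print("Smallest possible p-value:", round(smallest, 3))
```

<p>No outcome of three flips produces a p-value below 0.05, so this test can never reject H0, whether or not H0 is true.</p>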
Statistical Mistake 6: Not Seeking the Advice of an Expert
<p>One final mistake we’ll cover here is not knowing when to seek the advice of a statistical expert. Sometimes, employees are placed in statistical training programs with the expectation that they will come out immediately as experienced statisticians. While this training is excellent for basic statistical projects, it’s usually not enough to handle more advanced issues that may come about. After all, most skilled statisticians have had 4-8 years of education in statistics and at least 10 years of real-world experience!</p>
<p>If you’re in need of some help, you can hire a Minitab statistician. Learn more about Minitab’s Mentoring service by visiting <a href="http://www.minitab.com/training/" target="_blank">http://www.minitab.com/training/</a>. </p>
<p><em><a href="http://blog.minitab.com/blog/understanding-statistics" target="_blank">Eston Martz</a> and <a href="http://blog.minitab.com/blog/michelle-paret" target="_blank">Michelle Paret</a> contributed to the content of this post.</em></p>
<p><strong>Tell us in the comments below: Have you ever jumped to the wrong conclusion after looking at statistics? </strong></p>
<p> </p>
Statistics, Statistics Help | Fri, 23 May 2014 14:20:00 +0000 | http://blog.minitab.com/blog/real-world-quality-improvement/common-statistical-mistakes-you-should-avoid | Carly Barry
Gage This or Gage That? How the Number of Distinct Categories Relates to the %Study Variation
http://blog.minitab.com/blog/michelle-paret/gage-this-or-gage-that-how-the-number-of-distinct-categories-relates-to-the-study-variation
<p>We cannot improve what we cannot measure. Therefore, it is critical that we conduct a measurement systems analysis (MSA) before we start analyzing our data to make any kind of decisions.</p>
<p>When conducting an MSA for continuous measurements, we typically use a Gage R&R Study. And in these Gage R&R Studies, we look at output such as the <a href="http://blog.minitab.com/blog/quality-data-analysis-and-statistics/how-to-interpret-gage-output-part-2">percentage study variation</a> (%Study Var, or %SV) and the <a href="http://blog.minitab.com/blog/quality-data-analysis-and-statistics/understanding-your-gage-randr-output">Number of Distinct Categories</a> (ndc) to assess whether our measurement system is adequate.</p>
<p>Looking at these 2 values to assess a measurement system often leads to questions like "Should I look at both values? Will both values simultaneously indicate if my measurement system is poor? Are these 2 values related?" </p>
<p>The answer to all of these questions is "Yes," and here's why.</p>
How Are NDC and %Study Var Related?
<p>To clearly understand how number of distinct categories and percentage study variation are related, first consider how they are mathematically defined:</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/d840d539abbf0f0cc70c3cb03c823cb1/equation1.jpg" style="width: 401px; float: left; height: 72px; margin-left: 50px; margin-right: 50px" /></p>
<p> </p>
<p> </p>
<p><br />
<span face="">where sigma represents the square root of the variance components. </span></p>
<p><span face="">Using substitution, we can express the relationship between ndc and %SV as:</span></p>
<p><span face=""><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/b8624dccb97d74650d8f3389eef2db64/equation2.jpg" style="width: 350px; float: left; height: 152px; margin-left: 50px; margin-right: 50px" /></span></p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p><span face="">The last equation shows that ndc and %SV are inversely related: the larger %SV is, the smaller the ndc is, and vice versa. However, it also suggests that the value of ndc depends not only on %SV, but on the variance components as well.</span></p>
NDC as a Function of %SV
<p>To simplify the equation and represent ndc solely as a function of %SV, we can express the variance components in another way. The total variance is the sum of two variance components, one corresponding to gage repeatability and reproducibility and the other to part-to-part variation:</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/8cdb02ebc3a57a05010fe627dfe8bb45/equation3.jpg" style="width: 222px; float: left; height: 36px; margin-left: 50px; margin-right: 50px" /></p>
<p> </p>
<p> </p>
<p>Solving for sigma-squared for part and dividing each side of the equation by sigma-squared for total yields:</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/cfe7fe6042e0c688436844d14f9c9460/equation4.jpg" style="width: 193px; float: left; height: 73px; margin-left: 50px; margin-right: 50px" /></p>
<p> </p>
<p> </p>
<p> </p>
<p><span face="">Because %SV / 100 = sigma gage / sigma total, the equation above can be rewritten as:</span></p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/89d53946a51b100b7a4573c9677b3cf7/equation6.jpg" style="width: 350px; float: left; height: 82px; margin-left: 50px; margin-right: 50px" /></p>
<p> </p>
<p> </p>
<p> </p>
<p>Substituting this value into the previous equation for ndc gives the following simplified formula:</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/00cecc8f1fff70ad94e17d0c785253b9/equation7.jpg" style="width: 330px; float: left; height: 144px; margin-left: 50px; margin-right: 50px" /></p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
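<p>For readers who cannot see the equation images, the derivation described above can be written out in text form. This is a reconstruction, not a copy of the images. It assumes the usual definitions, with ndc equal to √2 (often printed as 1.41) times the ratio of part to gage standard deviation, and %SV equal to 100 times the ratio of gage to total standard deviation.</p>

```latex
\begin{align*}
\text{ndc} &= \sqrt{2}\,\frac{\sigma_{\text{part}}}{\sigma_{\text{gage}}},
\qquad
\%SV = 100\,\frac{\sigma_{\text{gage}}}{\sigma_{\text{total}}} \\[4pt]
\text{ndc} &= \sqrt{2}\,\frac{\sigma_{\text{part}}}{(\%SV/100)\,\sigma_{\text{total}}}
            = \frac{100\sqrt{2}}{\%SV}\cdot\frac{\sigma_{\text{part}}}{\sigma_{\text{total}}} \\[4pt]
\sigma_{\text{total}}^{2} &= \sigma_{\text{gage}}^{2} + \sigma_{\text{part}}^{2}
\;\Longrightarrow\;
\frac{\sigma_{\text{part}}^{2}}{\sigma_{\text{total}}^{2}}
  = 1 - \frac{\sigma_{\text{gage}}^{2}}{\sigma_{\text{total}}^{2}}
  = 1 - \left(\frac{\%SV}{100}\right)^{2} \\[4pt]
\text{ndc} &= \frac{100\sqrt{2}}{\%SV}\,\sqrt{1 - \left(\frac{\%SV}{100}\right)^{2}}
\end{align*}
```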
<p>This equation clearly shows the relationship between ndc and %SV and can be used to calculate the number of distinct categories for a given percentage study variation. As shown in Table 1, the calculated ndc value is then truncated to obtain a whole number (integer).</p>
<p><img alt="" src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/6060c2db-f5d9-449b-abe2-68eade74814a/Image/394a73d4dd88ac618ceb3fe68a18922b/equation8.jpg" style="width: 270px; float: left; height: 268px; margin-left: 50px; margin-right: 50px" /></p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p>For example, if the calculated value is 15.8, mathematically you are not quite capable of differentiating between 16 categories. Therefore, Minitab <a href="http://www.minitab.com/products/minitab">Statistical Software</a> is conservative and truncates the number of distinct categories to 15. For practical purposes, you can also round the calculated ndc values to obtain the number of distinct categories.</p>
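<p>The calculation and the truncation step can be sketched in a few lines of Python. The function name is hypothetical, and the formula assumes the constant is √2 rather than the rounded 1.41. The %SV values are chosen for illustration and are not a copy of Table 1.</p>

```python
from math import floor, sqrt

def ndc_exact(sv_percent):
    """Untruncated number of distinct categories for a given %SV (0 < %SV < 100)."""
    ratio = sv_percent / 100
    return sqrt(2) * sqrt(1 - ratio ** 2) / ratio

# Compare the exact value with the truncated and the rounded value.
for sv in (5, 10, 20, 28, 30, 32):
    exact = ndc_exact(sv)
    print(f"%SV = {sv:>2}  exact ndc = {exact:.3f}  "
          f"truncated = {floor(exact)}  rounded = {round(exact)}")
```

<p>Truncating and rounding give different whole numbers whenever the fractional part of the exact value is 0.5 or more, which is why it matters which method a guideline is based on.</p>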
Guidelines and Limitations for Evaluating a Measurement System Using NDC
<p>You can evaluate a measurement system by looking only at the number of distinct categories and using the following guidelines (based on the truncation method used by Minitab):</p>
<ul>
<li><strong>≥ 14 distinct categories </strong>– the measurement system is acceptable</li>
<li><strong>4-13 distinct categories </strong>– the measurement system is marginally acceptable, depending on the importance of the application, cost of measurement device, cost of repair, and other factors</li>
<li><strong>≤ 3 distinct categories </strong>– the measurement system is unacceptable and should be improved</li>
</ul>
<p>These guidelines have some limitations. For example, in some cases when the %SV is over 30%, the number of distinct categories is 4. Therefore, a measurement system with 32% study variation, which is unacceptable under the AIAG criteria for %SV, still passes as marginally acceptable under the ndc criteria. To avoid this discrepancy, some authors suggest only accepting a measurement system when it can distinguish between 5 or more categories. Although this fixes the original problem, it makes measurement systems with a 28-30% study variation unacceptable, because their corresponding ndc value equals 4.</p>
<p>To resolve this issue, you can establish more specific guidelines based on the exact calculated ndc values, without truncating or rounding. For example, you could define an unacceptable measurement system based on an ndc < 4.497.</p>
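<p>The difference between the two rules shows up clearly in code. The sketch below uses hypothetical helper names and assumes the constant √2 in the ndc formula. It derives the exact cutoff from a 30% study variation instead of hard-coding the rounded value, so that a system at exactly 30% is not misclassified by rounding.</p>

```python
from math import floor, sqrt

def ndc_exact(sv_percent):
    """Untruncated number of distinct categories for a given %SV (0 < %SV < 100)."""
    ratio = sv_percent / 100
    return sqrt(2) * sqrt(1 - ratio ** 2) / ratio

# Exact ndc that corresponds to a 30% study variation (about 4.497).
EXACT_CUTOFF = ndc_exact(30)

def unacceptable_by_truncated_ndc(sv_percent):
    """Guideline based on the truncated value: 3 or fewer categories."""
    return floor(ndc_exact(sv_percent)) <= 3

def unacceptable_by_exact_ndc(sv_percent):
    """Guideline based on the exact value: below the 30% cutoff."""
    return ndc_exact(sv_percent) < EXACT_CUTOFF

for sv in (28, 30, 32):
    print(f"%SV = {sv}: truncated rule -> {unacceptable_by_truncated_ndc(sv)}, "
          f"exact rule -> {unacceptable_by_exact_ndc(sv)}")
```

<p>Under the truncated rule a 32% system is not flagged, while the exact rule flags it and still leaves a 28% system alone.</p>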
<p>And that is how the number of distinct categories is related to %Study Var.</p>
Data Analysis, Lean Six Sigma, Learning, Quality Improvement, Six Sigma, Statistics, Statistics Help, Stats | Mon, 19 May 2014 12:00:00 +0000 | http://blog.minitab.com/blog/michelle-paret/gage-this-or-gage-that-how-the-number-of-distinct-categories-relates-to-the-study-variation | Michelle Paret