Jim Frost

Data analysis gives you the keys to how to manufacture the best product, provide the best services, or answer an academic research question. I’ll share practical tidbits that may help you do just that. Continue Reading »

Stepwise regression and best subsets regression are both automatic tools that help you identify useful predictors during the exploratory stages of model building for linear regression. These two procedures use different methods and present you with different output. An obvious question arises. Does one procedure pick the true model more often than the other? I’ll tackle that question in this post. Fi... Continue Reading
Analysis of variance (ANOVA) is great when you want to compare the differences between group means. For example, you can use ANOVA to assess how three different alloys are related to the mean strength of a product. However, most ANOVA tests assess one response variable at a time, which can be a big problem in certain situations. Fortunately, Minitab statistical software offers a... Continue Reading
Using a sample to estimate the properties of an entire population is common practice in statistics. For example, the mean from a random sample estimates that parameter for an entire population. In linear regression analysis, we’re used to the idea that the regression coefficients are estimates of the true parameters. However, it’s easy to forget that R-squared (R²) is also an estimate.... Continue Reading
I’ve written about the importance of checking your residual plots when performing linear regression analysis. If you don’t satisfy the assumptions for an analysis, you might not be able to trust the results. One of the assumptions for regression analysis is that the residuals are normally distributed. Typically, you assess this assumption using the normal probability plot of the residuals. Are... Continue Reading
Astronomy is cool! And, it’s gotten even more exciting with the search for exoplanets. You’ve probably heard about newly discovered exoplanets that are extremely different from Earth. These include hot Jupiters, super-cold iceballs, super-heated hellholes, very-low-density puffballs, and ultra-speedy planets that orbit their star in just hours. And then there is PSR J1719-1438 which has the mass... Continue Reading
In my previous post, I described how I was asked to weigh in on the ethics of researchers (DeStefano et al. 2004) who reportedly discarded data and potentially set scientific knowledge back a decade. I assessed the study in question and found that no data was discarded and that the researchers used good statistical practices. In this post, I assess a study by Brian S. Hooker that was... Continue Reading
The other day I received a request from a friend to look into a new study in a peer-reviewed journal that found a link between MMR vaccinations and an increased risk of autism in African American boys. To draw this conclusion, the new study reanalyzed data that was discarded a decade ago by a previous study. My friend wanted to know, from a statistical perspective, was it unethical for the... Continue Reading
Previously, I showed why there is no R-squared for nonlinear regression. Anyone who uses nonlinear regression will also notice that there are no P values for the predictor variables. What’s going on? Just like there are good reasons not to calculate R-squared for nonlinear regression, there are also good reasons not to calculate P values for the coefficients. Why not—and what to use instead—are the... Continue Reading
Previously, I’ve written about when to choose nonlinear regression and how to model curvature with both linear and nonlinear regression. Since then, I’ve received several comments expressing confusion about what differentiates nonlinear equations from linear equations. This confusion is understandable because both types can model curves. So, if it’s not the ability to model a curve, what is the... Continue Reading
In regression analysis, you'd like your regression model to have significant variables and to produce a high R-squared value. This low P value / high R² combination indicates that changes in the predictors are related to changes in the response variable and that your model explains a lot of the response variability. This combination seems to go together naturally. But what if your regression model... Continue Reading
In Minitab, the Assistant menu is your interactive guide to choosing the right tool, analyzing data correctly, and interpreting the results. If you’re feeling a bit rusty with choosing and using a particular analysis, the Assistant is your friend! Previously, I’ve written about the new linear model features in Minitab 17. In this post, I’ll work through a multiple regression analysis example and... Continue Reading
There is high pressure to find low P values. Obtaining a low P value for a hypothesis test is make or break because it can lead to funding, articles, and prestige. Statistical significance is everything! My two previous posts looked at several issues related to P values: P values have a higher than expected false positive rate. The same P value from different studies can correspond to different false... Continue Reading
The interpretation of P values would seem to be fairly standard between different studies. Even if two hypothesis tests study different subject matter, we tend to assume that you can interpret a P value of 0.03 the same way for both tests. A P value is a P value, right? Not so fast! While Minitab statistical software can correctly calculate all P values, it can’t factor in the larger context of the... Continue Reading
The P value is used all over statistics, from t-tests to regression analysis. Everyone knows that you use P values to determine statistical significance in a hypothesis test. In fact, P values often determine what studies get published and what projects get funding. Despite being so important, the P value is a slippery concept that people often interpret incorrectly. How do you interpret P values? In... Continue Reading
One-way ANOVA can detect differences between the means of three or more groups. It’s such a classic statistical analysis that it’s hard to imagine it changing much. However, a revolution has been under way for a while now. Fisher's classic one-way ANOVA, which is taught in Stats 101 courses everywhere, may well be obsolete thanks to Welch’s ANOVA. In this post, I not only want to introduce you to... Continue Reading
Nonlinear regression is a very powerful analysis that can fit virtually any curve. However, it's not possible to calculate a valid R-squared for nonlinear regression. This topic gets complicated because, while Minitab statistical software doesn’t calculate R-squared for nonlinear regression, some other packages do. So, what’s going on? Minitab doesn't calculate R-squared for nonlinear models... Continue Reading
We released Minitab 17 Statistical Software a couple of days ago. Certainly every new release of Minitab is a reason to celebrate. However, I am particularly excited about Minitab 17 from a data analyst’s perspective. If you read my blogs regularly, you’ll know that I’ve extensively used and written about linear models. Minitab 17 has a ton of new features that expand and enhance many types of... Continue Reading
Atlanta was a mess on January 28th, 2014. Thousands were trapped on the roads overnight while others managed to get to roadside stores to camp out. Thousands of students were forced to spend the night in their schools and the National Guard was called in to get them home. Many wondered how less than three inches of snow could cripple the city, particularly when Atlanta had experienced a similar... Continue Reading
I didn’t expect that our family trip to Florida would end with me driving a planeload of passengers nearly 200 miles to their homes, but it did. Yes, it was a long and strange journey home. A journey that started in the tropical warmth of southern Florida and ended the next morning in central Pennsylvania, which felt like the arctic wastelands thanks to the dreaded polar vortex. During this... Continue Reading
R-squared gets all of the attention when it comes to determining how well a linear model fits the data. However, I've stated previously that R-squared is overrated. Is there a different goodness-of-fit statistic that can be more helpful? You bet! Today, I’ll highlight a sorely underappreciated regression statistic: S, or the standard error of the regression. S provides important information that... Continue Reading