
Adventures in Statistics

Thanks to my desire to understand the deeper mechanics that lie behind what we observe in the world, I suppose it’s natural that I love data analysis. Observation is great, but I can only observe a small slice of reality. I really want to understand the larger picture and know how it all works. Data analysis gives you the keys to do just this, whether you are studying how to manufacture the best product, provide the best services, or answer an academic research question.

I’m Jim Frost and I came to Minitab with a background in a wide variety of academic research. My role was the “data/stat...

What Is the Difference between Linear and Nonlinear Equations in Regression Analysis?

Previously, I’ve written about when to choose nonlinear regression and how to model curvature with both linear and nonlinear regression. Since then, I’ve received several comments expressing confusion about what differentiates nonlinear equations from linear equations. This confusion is understandable because both types can model curves.

So, if it’s not the ability to model a curve, what is the difference between a linear and nonlinear regression equation?

Linear Regression Equations

Linear regression requires a linear model. No surprise, right? But what does that really mean?

A model is linear...
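To make the distinction concrete: in regression, "linear" means linear in the parameters, so each term is a parameter multiplied by a predictor or by a fixed transformation of one, such as x². A quadratic model therefore counts as linear even though it draws a curve, while a model such as y = θ1·exp(θ2·x) does not, because θ2 sits inside the exponential. The minimal sketch below (standard-library Python; the function names and data are hypothetical) fits a curved quadratic using nothing more than the linear least-squares normal equations, which works only because the model is linear in b0, b1, and b2.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [b0, b1, ..., b_degree] via the normal equations X'X b = X'y."""
    rows = [[x ** p for p in range(degree + 1)] for x in xs]  # design matrix X
    k = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

xs = [0, 1, 2, 3, 4, 5]
ys = [2 + 0.5 * x - 0.3 * x ** 2 for x in xs]  # curved, noise-free example data
print([round(b, 6) for b in fit_polynomial(xs, ys, 2)])
```

Because the example data contain no noise, the fitted coefficients come back as 2, 0.5, and -0.3. The exponential model cannot be written as a weighted sum of its parameters, so its least-squares fit needs an iterative search instead.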

How to Interpret a Regression Model with Low R-squared and Low P values

In regression analysis, you'd like your regression model to have significant variables and to produce a high R-squared value. This low P value/high R-squared combination indicates that changes in the predictors are related to changes in the response variable and that your model explains a lot of the response variability.

This combination seems to go together naturally. But what if your regression model has significant variables yet explains little of the variability? That is, it has low P values and a low R-squared.

At first glance, this combination doesn’t make sense. Are the significant predictors still...
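The combination is easy to reproduce with simulated data. The sketch below (standard-library Python; the slope of 0.5, noise standard deviation of 10, sample size of 5,000, and function name are arbitrary choices for illustration) generates a response with a genuine but small linear effect buried in noise. The P value speaks to whether the predictor is related to the response, while R-squared speaks to how much of the response variability the model explains, so the two can point in different directions.

```python
import math
import random

def simple_regression(xs, ys):
    """Return slope, intercept, R-squared, and the t statistic for the slope."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    r_squared = 1 - sse / sst
    s = math.sqrt(sse / (n - 2))            # standard error of the regression
    t_stat = slope / (s / math.sqrt(sxx))   # slope divided by its standard error
    return slope, intercept, r_squared, t_stat

random.seed(1)
n = 5000
xs = [random.uniform(0, 10) for _ in range(n)]
# A real but small effect (slope 0.5) buried in large noise (standard deviation 10).
ys = [0.5 * x + random.gauss(0, 10) for x in xs]

slope, intercept, r2, t = simple_regression(xs, ys)
# Two-sided P value from the normal approximation to the t distribution,
# which is adequate only because n is large.
p = math.erfc(abs(t) / math.sqrt(2))
print(f"slope = {slope:.3f}, R-squared = {r2:.3f}, t = {t:.1f}, P = {p:.2g}")
```

Expect an R-squared of roughly 0.02 alongside a P value far below 0.001: the relationship is real, but most of the variation in the response comes from noise the model doesn't explain.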

Multiple Regression Analysis and Response Optimization Examples using the Assistant in Minitab 17

In Minitab, the Assistant menu is your interactive guide to choosing the right tool, analyzing data correctly, and interpreting the results. If you’re feeling a bit rusty with choosing and using a particular analysis, the Assistant is your friend!

Previously, I’ve written about the new linear model features in Minitab 17. In this post, I’ll work through a multiple regression analysis example and optimize the response variable to highlight the new features in the Assistant.

Choose a Regression Analysis

As part of a solar energy test, researchers measured the total heat flux. They found that heat...

Five Guidelines for Using P values

There is high pressure to find low P values. Obtaining a low P value for a hypothesis test is make-or-break because it can lead to funding, articles, and prestige. Statistical significance is everything!

My two previous posts looked at several issues related to P values.

In this post, I’ll look at whether P values are still helpful and provide guidelines on how to use them with these issues in mind.

Sir Ronald A. Fisher

Are P Values Still Valuable?

Given...

Not All P Values are Created Equal

The interpretation of P values would seem to be fairly standard across different studies. Even if two hypothesis tests study different subject matter, we tend to assume that you can interpret a P value of 0.03 the same way for both tests. A P value is a P value, right?

Not so fast! While Minitab statistical software can correctly calculate all P values, it can’t factor in the larger context of the study. You and your common sense need to do that!

In this post, I’ll demonstrate that P values tell us very different things depending on the larger context.

Recap: P Values Are Not the Probability of...
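One standard way to see why context matters, offered here as an illustration rather than as the calculation this post uses: among all results that clear the significance threshold, the share that are false positives depends on how plausible the tested effects were to begin with. The sketch assumes a significance level of 0.05, power of 0.8, and a hypothetical prior share of tested hypotheses with real effects; the function name is made up.

```python
def false_positive_share(prior_true, alpha=0.05, power=0.8):
    """Share of significant results that come from true null hypotheses.

    prior_true: assumed fraction of tested hypotheses where a real effect exists.
    alpha: significance level (chance a true null is declared significant).
    power: chance that a real effect is declared significant.
    """
    true_positives = prior_true * power
    false_positives = (1 - prior_true) * alpha
    return false_positives / (true_positives + false_positives)

for prior in (0.5, 0.1):
    print(f"prior = {prior}: {false_positive_share(prior):.1%} of significant results are false positives")
```

Under these assumptions, about 5.9% of significant results are false positives when half of the tested effects are real, but 36.0% when only one in ten is. The significance threshold is identical in both cases; only the context differs.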

How to Correctly Interpret P Values

The P value is used all over statistics, from t-tests to regression analysis. Everyone knows that you use P values to determine statistical significance in a hypothesis test. In fact, P values often determine what studies get published and what projects get funding.

Despite being so important, the P value is a slippery concept that people often interpret incorrectly. How do you interpret P values?

In this post, I'll help you to understand P values in a more intuitive way and to avoid a very common misinterpretation that can cost you money and credibility.

What Is the Null Hypothesis in Hypothesis...
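A P value is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. A permutation test, which is a swapped-in technique here rather than anything this post describes, makes that definition tangible: if the null hypothesis of no difference holds, the group labels are arbitrary, so shuffling them repeatedly shows how often chance alone produces a difference as large as the observed one. The function name and both samples are hypothetical.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation P value for a difference in means.

    Under the null hypothesis the group labels are interchangeable, so
    reshuffling them shows how often chance alone produces a mean
    difference at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        a, b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(a) / len(a) - sum(b) / len(b))
        if diff >= observed - 1e-12:  # small tolerance guards against float round-off on ties
            extreme += 1
    # Count the observed arrangement itself so the estimate is never exactly 0.
    return (extreme + 1) / (n_permutations + 1)

before = [12.1, 11.8, 12.5, 12.9, 12.2, 12.7]
after = [11.2, 11.5, 11.9, 11.0, 11.6, 11.4]
print(permutation_p_value(before, after))
```

For these made-up samples, only 4 of the 924 possible ways to split the 12 values into two groups of six give a mean difference at least as large as the observed one, so the estimate should land close to 4/924, or about 0.004.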

Did Welch’s ANOVA Make Fisher's Classic One-Way ANOVA Obsolete?

One-way ANOVA can detect differences between the means of three or more groups. It’s such a classic statistical analysis that it’s hard to imagine it changing much.

However, a revolution has been under way for a while now. Fisher's classic one-way ANOVA, which is taught in Stats 101 courses everywhere, may well be obsolete thanks to Welch’s ANOVA.

In this post, I not only want to introduce you to Welch’s ANOVA, but also highlight some interesting research that we perform here at Minitab that guides the implementation of features in our statistical software.

One-Way ANOVA Assumptions

Like any...
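The core difference is that the classic F test pools the group variances, which assumes they are equal, whereas Welch's ANOVA weights each group by n_i / s_i² and adjusts the denominator degrees of freedom. The sketch below computes both statistics from the usual textbook formulas; it is not Minitab's implementation, the function names and data are hypothetical, and turning F into a P value would need an F-distribution CDF, which the Python standard library does not provide.

```python
def group_stats(groups):
    """Sample size, mean, and sample variance (n - 1 denominator) for each group."""
    out = []
    for g in groups:
        n = len(g)
        mean = sum(g) / n
        var = sum((v - mean) ** 2 for v in g) / (n - 1)
        out.append((n, mean, var))
    return out

def classic_anova_f(groups):
    """Fisher's one-way ANOVA F statistic, which pools the group variances."""
    stats = group_stats(groups)
    k = len(stats)
    n_total = sum(n for n, _, _ in stats)
    grand_mean = sum(n * m for n, m, _ in stats) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in stats)
    ss_within = sum((n - 1) * v for n, _, v in stats)
    f = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    return f, k - 1, n_total - k

def welch_anova_f(groups):
    """Welch's F statistic and degrees of freedom; no equal-variance assumption."""
    stats = group_stats(groups)
    k = len(stats)
    w = [n / v for n, _, v in stats]  # weight each group by n_i / s_i^2
    w_sum = sum(w)
    weighted_mean = sum(wi * m for wi, (_, m, _) in zip(w, stats)) / w_sum
    numerator = sum(wi * (m - weighted_mean) ** 2 for wi, (_, m, _) in zip(w, stats)) / (k - 1)
    lam = sum((1 - wi / w_sum) ** 2 / (n - 1) for wi, (n, _, _) in zip(w, stats))
    denominator = 1 + 2 * (k - 2) * lam / (k ** 2 - 1)
    df2 = (k ** 2 - 1) / (3 * lam)
    return numerator / denominator, k - 1, df2

a = [20.1, 21.3, 19.8, 20.6, 21.0]
b = [22.4, 18.9, 25.7, 17.6, 24.8, 21.2]  # much larger spread than a or c
c = [23.0, 23.4, 22.7, 23.9]
print("classic:", classic_anova_f([a, b, c]))
print("Welch:  ", welch_anova_f([a, b, c]))
```

With two groups, Welch's F equals the square of the Welch t statistic and its denominator degrees of freedom match the Welch-Satterthwaite value, which is a handy sanity check.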

Why Is There No R-Squared for Nonlinear Regression?

Nonlinear regression is a very powerful analysis that can fit virtually any curve. However, it's not possible to calculate a valid R-squared for nonlinear regression. This topic gets complicated because, while Minitab statistical software doesn’t calculate R-squared for nonlinear regression, some other packages do.

So, what’s going on?

Minitab doesn't calculate R-squared for nonlinear models because the research literature shows that it is an invalid goodness-of-fit statistic for this type of model. There are bad consequences if you use it in this context.

Why Is It Impossible to Calculate a...
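One well-known reason R-squared breaks down: for a least-squares linear model with an intercept, the total sum of squares splits exactly into SS Regression + SS Error, which is what lets R-squared be read as the proportion of variation explained. For nonlinear least-squares fits, that identity is not guaranteed. The sketch below checks both cases on the same hypothetical data. The one-parameter model y = exp(b·x), the noise values, and the golden-section search are illustrative stand-ins; real nonlinear regression routines use more general iterative algorithms.

```python
import math

# Hypothetical data: an exponential trend plus a fixed pattern of noise.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
noise = [0.5, -0.3, 0.8, -0.6, 0.2, -0.9, 0.4, 0.1, -0.7, 0.6]
ys = [math.exp(0.3 * x) + e for x, e in zip(xs, noise)]
y_bar = sum(ys) / len(ys)
sst = sum((y - y_bar) ** 2 for y in ys)

def sse_for(b):
    """Sum of squared errors of the one-parameter model y = exp(b * x)."""
    return sum((y - math.exp(b * x)) ** 2 for x, y in zip(xs, ys))

def golden_section_min(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function on [lo, hi] by golden-section search."""
    ratio = (math.sqrt(5) - 1) / 2
    c, d = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi, d = d, c
            c = hi - ratio * (hi - lo)
        else:
            lo, c = c, d
            d = lo + ratio * (hi - lo)
    return (lo + hi) / 2

# Nonlinear least-squares fit, then its sums of squares.
b_hat = golden_section_min(sse_for, 0.1, 0.5)
fitted = [math.exp(b_hat * x) for x in xs]
ssr = sum((fv - y_bar) ** 2 for fv in fitted)
sse = sum((y - fv) ** 2 for y, fv in zip(ys, fitted))

# Linear least-squares line with an intercept, for comparison.
x_bar = sum(xs) / len(xs)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
intercept = y_bar - slope * x_bar
line = [intercept + slope * x for x in xs]
lin_ssr = sum((fv - y_bar) ** 2 for fv in line)
lin_sse = sum((y - fv) ** 2 for y, fv in zip(ys, line))

print(f"linear:    SST = {sst:.3f}, SSR + SSE = {lin_ssr + lin_sse:.3f}")
print(f"nonlinear: SST = {sst:.3f}, SSR + SSE = {ssr + sse:.3f}")
```

The linear totals agree, while the nonlinear ones do not, so "1 - SSE/SST" and "SSR/SST" no longer describe the same quantity for the nonlinear fit.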

Unleash the Power of Linear Models with Minitab 17

We released Minitab 17 Statistical Software a couple of days ago. Certainly every new release of Minitab is a reason to celebrate. However, I am particularly excited about Minitab 17 from a data analyst’s perspective. 

If you read my blogs regularly, you’ll know that I’ve extensively used and written about linear models. Minitab 17 has a ton of new features that expand and enhance many types of linear models. I’m thrilled!

In this post, I want to share with my fellow analysts the new linear model features and the benefits that they provide.

New Linear Model Analyses in Minitab 17

We’ve added...

Are Atlanta's Winters Getting Colder and Snowier?

Atlanta was a mess on January 28th, 2014. Thousands were trapped on the roads overnight while others managed to get to roadside stores to camp out. Thousands of students were forced to spend the night in their schools, and the National Guard was called in to get them home. Many wondered how less than three inches of snow could cripple the city, particularly when Atlanta had experienced a similar storm in 2011.

This traumatic event, the recollection of recent snowstorms, and now the current storm prompted some to wonder whether Atlanta has been experiencing more cold and snow than before. How...

Lessons in Quality During a Long and Strange Journey Home

I didn’t expect that our family trip to Florida would end with me driving a plane load of passengers nearly 200 miles to their homes, but it did.

Yes, it was a long and strange journey home. A journey that started in the tropical warmth of southern Florida and ended the next morning in central Pennsylvania, which felt like the arctic wastelands thanks to the dreaded polar vortex.

During this journey, I experienced not only temperature extremes but also extreme differences in the quality of customer care. Working at Minitab, I'm very aware of the quality of service because quality...

Regression Analysis: How to Interpret S, the Standard Error of the Regression

R-squared gets all of the attention when it comes to determining how well a linear model fits the data. However, I've stated previously that R-squared is overrated. Is there a different goodness-of-fit statistic that can be more helpful? You bet!

Today, I’ll highlight a sorely underappreciated regression statistic: S, or the standard error of the regression. S provides important information that R-squared does not.

What is the Standard Error of the Regression (S)?

S becomes smaller when the data points are closer to the line.

In the regression output for Minitab statistical software, you can find...
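S is the square root of the mean squared residual, with n - p in the denominator, where p is the number of estimated parameters (2 for a line with an intercept). It is expressed in the units of the response and is roughly the typical distance between the observed values and the fitted line, which is why it shrinks as the points tighten around the line. In the sketch below (hypothetical data and function name), both data sets share the same underlying line, and the second has deviations four times as large, so its S is exactly four times larger.

```python
import math

def standard_error_of_regression(xs, ys):
    """S = sqrt(SSE / (n - 2)) for a simple linear regression (two fitted parameters)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))

xs = list(range(10))
wiggle = [1, -1, 2, -2, 1, -1, 2, -2, 1, -1]  # fixed pattern of deviations from the line
tight = [3 + 2 * x + 0.5 * w for x, w in zip(xs, wiggle)]  # points close to the line
loose = [3 + 2 * x + 2.0 * w for x, w in zip(xs, wiggle)]  # same line, wider scatter
print(standard_error_of_regression(xs, tight), standard_error_of_regression(xs, loose))
```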

How High Should R-squared Be in Regression Analysis?

Just how high should R-squared be in regression analysis? I hear this question asked quite frequently.

Previously, I showed how to interpret R-squared (R²). I also showed how it can be a misleading statistic because a low R-squared isn’t necessarily bad and a high R-squared isn’t necessarily good.

Clearly, the answer to “how high should R-squared be” is . . . it depends.

In this post, I’ll help you answer this question more precisely. However, bear with me, because my premise is that if you’re asking this question, you’re probably asking the wrong question. I’ll show you which questions you should...
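The point that a high R-squared isn't necessarily good is easy to see when a straight line is fit to curved data. In this sketch the data are a made-up, noise-free y = x² for x = 1 to 10. R-squared comes out at about 0.95, yet the residuals run positive, then negative, then positive again, showing that the line systematically misses the curve.

```python
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]  # clearly curved relationship

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
intercept = y_bar - slope * x_bar
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
r_squared = 1 - sum(r ** 2 for r in residuals) / sum((y - y_bar) ** 2 for y in ys)

print(f"R-squared = {r_squared:.3f}")  # R-squared = 0.950
print(residuals)  # [12.0, 4.0, -2.0, -6.0, -8.0, -8.0, -6.0, -2.0, 4.0, 12.0]
```

The fitted line is y = -22 + 11x, so checking the residual pattern, not just R-squared, is what reveals the misspecified model.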

Regression Analysis Tutorial and Examples

I’ve written a number of blog posts about regression analysis, and I think it’s helpful to collect them in this post to create a regression tutorial. I’ll supplement my own posts with some from my colleagues.

This tutorial covers many aspects of regression analysis including: choosing the type of regression analysis to use, specifying the model, interpreting the results, determining how well the model fits, making predictions, and checking the assumptions. At the end, I include examples of different types of regression analyses.

If you’re learning regression analysis right now, you might want to...

Statistically, How Thankful Should We Be: A Look at Global Income Distributions, part 2

In my previous post, I looked at how personal income levels fit into the global distribution of incomes. However, I’d be the last person to suggest that a higher income guarantees more happiness. After all, I’ve visited a number of developing countries, and as long as their basic needs are met, the people seem to be just as happy and hard-working as people here at home.

So instead of personal income levels, I’d like to assess something more meaningful: global well-being. How does the overall global welfare today compare to 1970? Do more people have their basic needs met? That’s what we’ll look...

Statistically, How Thankful Should We Be: A Look at Global Income Distributions, part 1

In the United States, our Thanksgiving holiday is fast approaching. On this day, we give thanks for the good things in our lives.

For this post, I wanted to quantify how thankful we should be. Ideally, I’d quantify something truly meaningful, like happiness. Unfortunately, most countries are not like Bhutan, which measures gross national happiness and incorporates it into its five-year development plans.

Instead, I’ll focus on something that is more concrete and regularly measured around the world—income. By examining income distributions, I’ll show that you have much to be thankful for,...

Four Tips on How to Perform a Regression Analysis that Avoids Common Problems

In my previous post, I highlighted recent academic research that shows how the presentation style of regression results affects the number of interpretation mistakes. In this post, I present four tips that will help you avoid the more common mistakes of applied regression analysis that I identified in the research literature.

I’ll focus on applied regression analysis, which is used to make decisions rather than just to determine the statistical significance of the predictors. Applied regression analysis emphasizes both the ability to influence the outcome and the precision of the predictions.

Tip...

Applied Regression Analysis: How to Present and Use the Results to Avoid Costly Mistakes, part 2

Applied regression analysis can be a great decision-making tool because you can predict the average outcome given input values. However, predictions are not as simple as plugging numbers into an equation. In my previous post, I showed how a majority of experts vastly underestimated the variability around the predicted outcome in a manner that can lead to costly mistakes.

We also saw how graphing the data is a simple way to avoid these mistakes because it highlights the uncertainty. In this post, I'll explore other techniques that you can use in Minitab statistical software to facilitate good...
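Part of the variability that gets underestimated is the gap between the average outcome and a single outcome. For a simple linear regression, the interval for one new observation (the prediction interval) adds the scatter of individual points around the line to the uncertainty in the fitted mean, so it is much wider than the confidence interval for the mean. The sketch below uses simulated, hypothetical data and a normal critical value in place of the t distribution, an approximation that is reasonable only for fairly large samples; the function name is made up.

```python
import math
import random
from statistics import NormalDist

def intervals_at(xs, ys, x0, level=0.95):
    """Fitted value, approximate confidence interval for the mean response, and
    approximate prediction interval for one new observation at x0."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    s = math.sqrt(sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation to the t critical value
    fit = intercept + slope * x0
    se_mean = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)      # uncertainty in the average
    se_pred = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)  # adds one new point's scatter
    ci = (fit - z * se_mean, fit + z * se_mean)
    pi = (fit - z * se_pred, fit + z * se_pred)
    return fit, ci, pi

random.seed(2)
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [10 + 3 * x + random.gauss(0, 5) for x in xs]
fit, ci, pi = intervals_at(xs, ys, 5.0)
print(f"fit = {fit:.1f}, mean CI = ({ci[0]:.1f}, {ci[1]:.1f}), prediction interval = ({pi[0]:.1f}, {pi[1]:.1f})")
```

With 200 points, the prediction interval here is more than ten times wider than the confidence interval for the mean, which is exactly the spread a decision-maker sees if only the fitted equation is reported.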

Applied Regression Analysis: How to Present and Use the Results to Avoid Costly Mistakes, part 1

Imagine that you’ve studied an empirical problem using linear regression analysis and have settled on a well-specified, actionable model to present to your boss. Or perhaps you’re the boss, using applied regression models to make decisions.

In either case, there’s a good chance a costly mistake is about to occur!

How regression results are presented can lead decision-makers to make bad choices. Emre Soyer and Robin M. Hogarth*, who study behavioral decision-making, found that even experts are frequently tripped up when making decisions based on applied regression models.

In this post, I'll look...

Size Matters: Metabolic Rate and Longevity

John Tukey once said, “The best thing about being a statistician is that you get to play in everyone’s backyard.” I enthusiastically agree!

I frequently enjoy reading and watching science-related material. This invariably raises questions involving other "backyards" that I can better understand using statistics. For instance, see my post about the statistical analysis of dolphin sounds.

The latest topic that grabbed my attention was an apparent error in the BBC program Wonders of Life. In the episode “Size Matters,” Professor Brian Cox presents a graph with a linear regression line that...