The Joy of Playing in Endless Backyards with Statistics
As 2016 comes to a close, it’s time to reflect on the passage of time and changes. As I’m sure you’ve guessed, I love statistics and analyzing data! I also love talking and writing about it. In fact, I’ve been writing statistical blog posts for over five years, and it’s been an absolute blast. John Tukey, the renowned statistician, once said, “The best thing about being a statistician is that you get to play in everyone’s backyard.” I enthusiastically agree!
However, when I first started writing the blog, I wondered about being able to keep up a constant supply of fresh blog posts. And, when I first mentioned to some non-statistician friends that I’d be writing a statistical blog, I noticed a certain lack of enthusiasm. For instance, I heard a variety of comments like, “So, you’ll be writing things along the lines of 9 out of 10 dentists recommend . . .” Would readers even be interested in what I had to say about statistics?
It turns out that with a curious mind, statistical knowledge, data, and a powerful tool like Minitab statistical software, the possibilities are endless. You can play in a wide variety of fascinating backyards!
The most surprising statistic is that my blog posts have received over 5.5 million views in the past year alone. Never in my wildest dreams did I imagine so many readers when I wrote my first post! It’s a real testament to the growing importance of data analysis that so many people are interested in a blog dedicated to statistics. Thank you all for reading!
Endless Backyards . . .
Some of the topics I've written about are out of this world. I’ve assessed dolphin communications and compared them to the search for extraterrestrial intelligence, and I’ve analyzed exoplanet data in the search for Earth’s twin! (As an aside, my analysis showed that my writing style is similar to dolphin communications. I'll take that as a compliment!)
For more Earthly subjects, I’ve studied the relationship between a mammal’s size and its metabolic rate and longevity. I’ve analyzed raw research data to assess the effectiveness of flu shots firsthand. I’ve downloaded economic data to assess patterns in both the U.S. GDP and U.S. job growth. For a Thanksgiving Day post, I analyzed world income data to answer the question of how thankful we should be statistically. As for Easter, I can tell you the date on which it falls in any of 2,517 years, along with which dates are the most and least common.
In the world of politics, I’ve used data to predict the 2012 U.S. Presidential election, analyzed the House Freedom Caucus and the search for the new Speaker of the House, assessed the factors that make a great President, and even helped Mitt Romney pick a running mate. Everyone talks about the weather, so of course I had to analyze that. My family loves the MythBusters, and it was fun applying statistical analyses to some of the myths that they tested. That's my family and me meeting them in the picture to the right!
Some of my posts have even been a bit surreal. I took my turn at attempting to explain the statistical illusion of the infamous Monty Hall problem. I’ve compared world travel to adjusting scales in graphs (seriously). For Halloween-themed posts, I showed how to go ghost hunting with a statistical mindset and how regression models can be haunted by phantom degrees of freedom. I analyzed the fatality rates in the original Star Trek TV series. I explored how some people can find so many four-leaf clovers despite their rarity. And, I wondered whether a statistician can say that age is just a number.
See, not a mention of those dentists...well, not until now. By this point, 9 out of 10 dentists are probably feeling neglected!
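For readers who find the Monty Hall problem mentioned above hard to believe, a quick simulation can be more persuasive than an argument. The sketch below is only an illustration in Python, not the analysis from the original post. The function names, the seed, and the number of trials are all assumptions chosen for the example. It plays the game many times under the standard rules (three doors, and the host always opens a door that hides no car and that the contestant did not pick) and compares staying with switching.

```python
import random


def play(switch, rng):
    """Play one round; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=2016):
    """Estimate the probability of winning for a given strategy."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    print(f"stay:   {win_rate(switch=False):.3f}")
    print(f"switch: {win_rate(switch=True):.3f}")
```

With enough trials, the estimate for staying should settle near 1/3 and the estimate for switching near 2/3, which is the result that feels so wrong at first glance.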
Helping Others Perform Their Own Analyses
I’ve also written many posts aimed at helping those who are learning and performing statistical analyses. I described why statistics is cool based on my own personal experiences and how the whole field of statistics is growing in importance. I showed how anecdotal evidence is unreliable and explained why it fails so badly. And, I took a look forward at how statistical analyses are expanding into areas traditionally ruled by expert judgment.
I zoomed in to cover the details about how to perform and interpret statistical analyses. Some might think that covering the nitty-gritty of statistical best practices is boring. Yet, you’d be surprised by the lively discussions we’ve had. We’ve had heated debates and philosophical discussions about how to correctly interpret p-values and what statistical significance does and does not tell you. This reached a fever pitch when a psychology journal actually banned p-values!
We had our difficult questions and surprising topics to grapple with. How high should R-squared be? Should I use a parametric or nonparametric analysis? How is it possible that a regression model can have significant variables but still have a low R-squared? I even had the nerve to suggest that R-squared is overrated! And, I made the unusual case that control charts are also very important outside the realm of quality improvement. Then, there is the whole frequentist versus Bayesian debate, but let’s not go there!
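One of the questions above, how a model can have a significant variable but a low R-squared, is easy to see with simulated data. The sketch below is a hedged illustration in plain Python rather than anything taken from the original posts. The sample size, true slope, noise level, and seed are all assumed values picked for the example. It generates data with a real but small effect buried in a lot of noise, fits a simple least-squares line, and reports the slope, its t-statistic, and R-squared.

```python
import math
import random


def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (slope, t-statistic, R-squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - sse / syy
    # Standard error of the slope, using n - 2 residual degrees of freedom.
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / se_slope, r_squared


def simulate(n=2000, true_slope=1.0, noise_sd=5.0, seed=2016):
    """Simulate a real effect hidden in noise (all settings are assumptions)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [true_slope * xi + rng.gauss(0, noise_sd) for xi in x]
    return simple_ols(x, y)


if __name__ == "__main__":
    slope, t_stat, r2 = simulate()
    print(f"slope = {slope:.2f}, t = {t_stat:.1f}, R-squared = {r2:.3f}")
```

Under these assumed settings, the t-statistic should be large, so the predictor is clearly significant, while R-squared stays small because most of the variation in the response is noise. Significance speaks to whether the relationship is real. R-squared speaks to how much scatter remains around the fitted line.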
However, it’s true that not all topics about how to perform statistical analyses are riveting. I still love these topics. The world is becoming an increasingly data-driven place, and to produce trustworthy results, you must analyze your data correctly. After all, it’s surprisingly easy to make a costly mistake if you don’t know what you’re doing.
A data-driven world requires an analyst to understand seemingly esoteric details such as the different methods of fitting curves, the dangers of overfitting your model, assessing goodness-of-fit, checking your residual plots, and how to check for and correct multicollinearity and heteroscedasticity. How do you choose the best model? Do you need to standardize your variables before performing the analysis? Maybe you need a regression tutorial?
You may need to know how to identify the distribution of your data. And just how do hypothesis tests work anyway? F-tests? T-tests? How do you test discrete data? Should you use a confidence interval, prediction interval, or a tolerance interval? How do you know when X causes a change in Y? Is a confounding variable distorting your results? What are the pros and cons of using a repeated measures design? Fisher’s or Welch’s ANOVA? ANOVA or MANOVA? Linear or nonlinear regression?
These may not be “sexy” topics but they are the meat and potatoes of being able to draw sound conclusions from your data. And, based on numerous blog comments, they have been well received by many people. In fact, the most rewarding aspect of writing blog posts has been the interactions I've had with all of you. I've communicated with literally hundreds and hundreds of students learning statistics and practitioners performing statistics in the field. I’ve had the pleasure of learning how you use statistical analyses, understanding the difficulties you face, and helping you resolve those issues.
It's been an amazing journey and I hope that my blog posts have allowed you to see statistics through my eyes, as a key that can unlock discoveries that are trapped in your data. After all, that's the reason why I titled my blog Adventures in Statistics. Discovery is a bumpy road. There can be statistical challenges en route, but even those can be interesting, and perhaps even rewarding, to resolve. Sometimes it is the perplexing mystery in your data that prompts you to play detective and leads you to surprising new discoveries!
To close out the old year, it's good to remember that change is constant. There are bound to be many new and exciting adventures in the New Year. I wish you all the best in your endeavors.
“We will open the book. Its pages are blank. We are going to put words on them ourselves. The book is called Opportunity and its first chapter is New Year's Day.” ― Edith Lovejoy Pierce
May you all find happiness in 2017! Onward and upward!