The Longest Drive: Golf and Design of Experiments, Part 2
Step 1 in our DOE problem-solving methodology is to use process experts, literature, or past experiments to characterize the process and define the problem. Since I had little experience with golf myself, this was an important step for me.
This is not an uncommon situation. Experiment designers often find themselves working on processes that they have little or no experience with. For example, a quality engineer might be assigned to try to solve a process problem as part of a cross-functional team in another department, or in a supplier’s facility. Regardless of the situation, it’s never too early to open your mind and ears to all possible information to characterize your process and problem.
Based on contributions we solicited from experienced golfers, a Penn State golf coach, other statisticians who had tackled similar problems, and all the Internet had to offer, we were able to assemble a list of potential inputs for flight distance (Carry), rolling distance (Roll), and the corresponding total drive distance (Total).
1. Golfer – Block
2. Club speed on ball contact – Covariate
3. Contact point on the club face with respect to the center of the club face – Covariate
4. Tilt angle designed into the club face – Factor
5. Club shaft stiffness – Factor
6. Ball characteristics such as hardness, core composition, aerodynamics, etc. – Factor
7. Height of the ball on the tee – Factor
8. Club path and position on contact such as angle, arc and shaft flex – Noise
9. Golfer’s grip strength on the club – Noise
10. Ground surface conditions (incline, firmness, grass height, dampness, etc.) – Noise
11. Air temperature and humidity – Noise
12. Wearing expensive shorts with a Bubba Watson logo – Noise
13. Golfer’s arm length – Noise
14. Club head weight and club length – Noise
15. Ball spin rate and direction coming off the club – Noise
Managing the Inputs
These inputs have been classified into four categories according to how they will be managed in our experiment. Let’s walk through each of the four groups.
Blocking
Each golfer brings a unique style, swing, and athleticism to the game. Because each data point can be traced back to an individual golfer, the variability between golfers will be handled using a technique called blocking. The data from each golfer will be standardized according to the average driving performance of that golfer compared to the other golfers in the study.
Pardon the pun, but blocking essentially handicaps each golfer by their average distance so that all the data can be combined into one analysis without concern about golfer-to-golfer variation. When you set up an experiment, blocking allows you to take advantage of all the resources available to you (three manufacturing lines, two measurement technicians, etc.) without concern for the variability from block to block. Our experimental design and analysis will block on the golfer so we can take advantage of using several different golfers in the experiment.
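As a rough illustration of the idea, the sketch below subtracts each golfer's own average from that golfer's drives before comparing two tee heights. The golfers, tee-height labels, and distances are invented for the example and are not data from this experiment. A full analysis would normally include the block as a term in the model rather than centering by hand.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical drives: (golfer, tee_height, carry_yards). Invented numbers.
drives = [
    ("A", "low", 200.0), ("A", "high", 210.0),
    ("B", "low", 240.0), ("B", "high", 252.0),
    ("C", "low", 170.0), ("C", "high", 178.0),
]

def block_center(rows):
    """Subtract each golfer's mean carry so golfer-to-golfer differences drop out."""
    by_golfer = defaultdict(list)
    for golfer, _, carry in rows:
        by_golfer[golfer].append(carry)
    golfer_mean = {g: mean(v) for g, v in by_golfer.items()}
    return [(g, level, carry - golfer_mean[g]) for g, level, carry in rows]

def carries_at(rows, level):
    """Collect the carry values recorded at one tee height."""
    return [carry for _, lvl, carry in rows if lvl == level]

centered = block_center(drives)

# The estimated tee-height effect, computed on the golfer-centered data.
effect = mean(carries_at(centered, "high")) - mean(carries_at(centered, "low"))

# Spread within one tee height, before and after removing golfer averages.
raw_spread = stdev(carries_at(drives, "low"))
blocked_spread = stdev(carries_at(centered, "low"))

print(f"tee-height effect: {effect:.1f} yards")
print(f"spread at low tee, raw: {raw_spread:.1f}, blocked: {blocked_spread:.1f}")
```

In this balanced toy data set the estimated effect is the same with or without centering. What changes is the scatter around it, which is the point of blocking.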
Analysis of Covariance (Covariates)
Inputs 2 and 3, club speed and club/ball contact location on the club, are noise variables that we cannot control from drive to drive but that have a strong effect on our responses. The important distinction from other noise variables is that we can measure them on each drive. By establishing the average linear relationship between these covariates and our responses, we can mathematically adjust each Carry and Roll measurement for the club speed and club/ball contact location on that drive.
By treating club speed and club/ball contact location as covariates, we can greatly reduce the level of background noise in our experiment data, thus giving us a clearer estimate of the effects of the experimental factors we are studying. Nearly all processes have noise variables that cannot be controlled and contribute to the overall process variation. A key experiment design goal for your study should be to plan to measure these variables during each run so that their effect can be removed from the background variability. This will reduce the overall level of noise you have to deal with in the final analysis.
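The adjustment can be sketched for a single covariate as follows. This is a simplified illustration, not the full analysis of covariance: a real analysis would estimate the covariate slope together with the block and factor effects in one model. The club speeds and carry distances are made-up numbers, and `least_squares_slope` is a helper written just for this example.

```python
from statistics import mean

# Hypothetical measurements for four drives. Invented numbers.
speeds = [95.0, 100.0, 105.0, 110.0]    # club speed at contact, mph
carries = [210.0, 222.0, 231.0, 245.0]  # carry distance, yards

def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

slope = least_squares_slope(speeds, carries)
speed_bar = mean(speeds)

# Shift each carry to what the fitted line predicts at the average club speed.
adjusted = [c - slope * (s - speed_bar) for s, c in zip(speeds, carries)]

print(f"yards of carry per mph of club speed: {slope:.2f}")
print("adjusted carries:", [round(a, 1) for a in adjusted])
```

After the adjustment, the carries vary much less from drive to drive because the part explained by club speed has been removed, while their average is unchanged.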
Noise Variables
Noise variables add to the background variability of our data. This variability can obscure our ability to measure the true effect of our research variables, which are the ones of interest to us.
Imagine seeing Michael Jordan play basketball on one of the rare occasions when he had a bad game. If you'd never seen him play before, you could easily walk out of the game thinking “He is not very good.” Of course, you would have reached the wrong conclusion, even though you used data to make your decision, because your data was affected by background variability. In our DOE, we want to minimize the background variability in order to decrease the probability of reaching an incorrect conclusion.
There are several ways to do this. For example, inputs 8 and/or 15 can result in a very bad drive. We can minimize the impact on our results by discarding any drive that is an obvious slice into the trees. Likewise, as experimenters, we should be attentive to measurements that are strongly influenced by noise variables and should exclude those measurements as outliers. Analysis of residuals is a powerful tool to detect outliers (one which will be described in a later post), but the best time to identify an outlier is when the sample is made or measured. This way, the extraneous circumstances leading to the outlier can be immediately noted and considered in the decision to remove the data point.
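As a very simple stand-in for the residual analysis mentioned above, the sketch below flags drives that sit far from the average of repeated drives at the same settings. The distances are invented, the cutoff of 2 standard deviations is an assumption chosen for illustration, and the residual analysis described in the later post may work differently. A flagged drive is only a candidate for removal: the notes taken when the drive was hit should drive the final decision.

```python
from statistics import mean, stdev

# Hypothetical carry distances (yards) for repeated drives at one factor setting.
# The seventh drive stands in for an obvious slice. Invented numbers.
carries = [230.0, 228.0, 233.0, 231.0, 229.0, 232.0, 170.0, 231.0]

def flag_outliers(values, threshold=2.0):
    """Return indexes of values whose standardized residual exceeds the threshold.

    The residual here is the deviation from the group mean, scaled by the
    sample standard deviation. The threshold is an assumed, adjustable cutoff.
    """
    center = mean(values)
    spread = stdev(values)
    return [i for i, v in enumerate(values) if abs(v - center) / spread > threshold]

suspects = flag_outliers(carries)
print("drives to review:", suspects)
```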
Club head weight and club length could have been factors in this experiment. But the cost of the additional drivers required to study these club properties would have been prohibitive. In addition, club length and weight are fairly well standardized, and are not something every golfer can change on demand. Because of this, club length and club weight were held constant in our experiment to minimize their impact on the variability of the results and to prevent them from unintentionally biasing the measured effects of the factors in the experiment. In your own process experiments, you will have to use your engineering knowledge and practical considerations to determine which inputs should be held constant.
Finally, a noise variable such as input 9, grip strength on the club, will vary from drive to drive within a golfer and this change in grip strength cannot be easily measured. Therefore, grip strength variability will be one of many sources contributing to the overall process variation that we will have to contend with in our experiment. For such variables, the best approach is to remind the experimenters of the importance of consistency in their performance. For your process experiment, standardizing procedures and protocols where possible, along with requiring consistency within an experimenter throughout the study, is a good way to lower your background process variation.
In the end, there will be some unexplained process variation. We had no control over which golfers were going to break out the Bubba Watson golf shorts for this event. Unexplained variation will be quantified and utilized in two ways. First, the amount of unexplained process variability (error) will have to be accurately measured so that it can be compared to the size of the experimental factor effects in the final analysis. This comparison allows us to determine which effects are much larger than the level of error in the data; in other words, which effects are statistically significant. Second, in the experiment design phase, we will have to estimate the level of process variation we expect to see in the experiment so that the number of measurements (sample size = N) needed to detect the factor effects in the midst of the noise can be calculated. This calculation, known as power and sample size, will be illustrated in a future post.
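To give a feel for how the expected noise level drives the sample size, here is a minimal sketch using the textbook normal approximation for comparing two levels of one factor. It is not the calculation used for this experiment, which the future post will cover. The standard deviation, the effect size, and the function name `runs_per_level` are assumptions made for the example.

```python
from math import ceil
from statistics import NormalDist

def runs_per_level(sigma, delta, alpha=0.05, power=0.80):
    """Approximate runs needed at each of two levels to detect a difference delta.

    Uses the normal approximation n = 2 * ((z_alpha + z_power) * sigma / delta)^2
    for a two-sided test. sigma is the assumed standard deviation of the noise.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)

# Assumed values: 10 yards of background noise, 5-yard effect worth detecting.
n = runs_per_level(sigma=10.0, delta=5.0)
print("drives needed at each level:", n)

# Halving the noise cuts the required runs to roughly a quarter.
n_quieter = runs_per_level(sigma=5.0, delta=5.0)
print("with half the noise:", n_quieter)
```

The required number of runs grows with the square of the noise-to-effect ratio, which is why the blocking, covariate, and noise-control steps above pay off so directly.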
In summary, all of our process knowledge was used to generate a list of potential inputs to our responses. The inputs that were not selected to be studied in the experiment were classified as noise variables, blocking variables, or covariates. The blocking variables and covariates will be incorporated into our experiment design and analysis. The noise variables are either held constant during the experiment, carefully monitored to remove obvious outliers, controlled as best we can, or allowed to vary during the experiment because there is no other practical option.
After all of this planning, we haven’t said anything about experiment factors! This is a little-known fact about experiment design: the first key to success is planning to control, limit, account for, and finally measure the variability in your data. In your final analysis, your estimate of the background process variation will enter into the calculation of every test statistic. You have to work hard to get it right!
Experiment Design for Factors
Now that we have a good handle on our process variation, we can move forward to the four research variables (factors) in our experiment: tee height, shaft stiffness, ball quality, and club tilt angle.
One commonly held theory is that a greater tilt angle will give higher loft for longer Carry. However, the velocity spent going up is not going forward, and a longer “hang time” means that air resistance has more time to have a negative effect. In addition, the sharp angle of descent from high loft will lower the Roll. But does the tee height result in higher or lower loft? How do the shaft stiffness and ball quality affect the velocity of the ball off the tee? Of course, a faster ball is going to go farther.
In our experiment, as with your process, there will be many competing theories about how to reach the end goal. What levels of the factors should be tested and in what combinations to get to the right solution? What is the smallest number of tests we can run while still achieving an accurate and comprehensive understanding of our process (including a way to optimize the responses)? In the next post, I will discuss the different options for factor level settings/combinations for our four factors and ultimately, determine our basic experiment plan.