High School Researchers: What Do We Do with All of this Data?
by Colin Courchesne, guest blogger, representing his Governor's School research team.
High-level research opportunities for high school students are rare; however, that was just what the New Jersey Governor’s School of Engineering and Technology provided.
Bringing together the best and brightest rising seniors from across the state, the Governor’s School, or GSET for short, tasks teams of students with completing a research project chosen from a myriad of engineering fields, ranging from biomedical engineering to, in our team's case, industrial engineering.
Tasked with analyzing, comparing, and simulating queue processes at Dunkin’ Donuts and Starbucks, our team of GSET scholars spent five days tirelessly collecting roughly 250 data points on each restaurant. Our data included how much time people spent waiting in line, what type of drinks customers ordered, and how much time they spent waiting for their drinks after ordering.
The students used a computerized interface to collect data about customers in two different coffee shops.
But once the data collection was over, we reached a sort of brick wall. What do we do with all this data? As research debutantes not well versed in the realm of statistics and data analysis, we had no idea how to proceed.
Thankfully, the helping hand of our project mentor, engineer Brandon Theiss, guided us towards Minitab.
Getting Meaning Out of Our Data
Our original, raw data told us nothing. In order to compare data between stores and create accurate process simulations, we needed a way to sort the data, determine descriptive statistics, and assign distributions; it is these very tools that Minitab offered. Getting started was both easy and intuitive.
First, we all managed to download Minitab 17 (thanks to the 30-day trial). Our team then went on to learn the ins and outs of Minitab through both instructional videos on YouTube and helpful written guides, all of which are provided by Minitab. Less than an hour later, we were able to navigate the program with ease.
The nature of the simulations our team intended to create called for us to identify the arrival process for each store, the distribution of customer wait times in line at each restaurant, and the distribution of drink preparation times, broken down by both restaurant and drink type. In order to input this information into our simulation, we also needed the parameters that define each distribution, ranging from alpha and beta values for Gamma distributions to means and standard deviations for Normal distributions.
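To make the idea of distribution parameters concrete, here is a minimal Python sketch of how alpha (shape) and beta (scale) for a Gamma distribution, or a mean and standard deviation for a Normal distribution, could be estimated from a sample. It uses the method of moments, which is a simpler stand-in and not necessarily the estimation method Minitab applies. The `prep_times` values are hypothetical and are not the team's data.

```python
from statistics import mean, stdev, variance

def gamma_moments(sample):
    """Method-of-moments estimates for a Gamma distribution.

    Returns (alpha, beta), where alpha is the shape and beta is the scale,
    chosen so that alpha * beta matches the sample mean and
    alpha * beta**2 matches the sample variance.
    """
    m = mean(sample)
    v = variance(sample)  # sample variance (n - 1 denominator)
    return m * m / v, v / m

def normal_moments(sample):
    """Sample mean and sample standard deviation for a Normal distribution."""
    return mean(sample), stdev(sample)

# Hypothetical drink preparation times in seconds (illustration only).
prep_times = [42, 55, 61, 48, 73, 90, 66, 58, 120, 51, 64, 77]

alpha, beta = gamma_moments(prep_times)
mu, sigma = normal_moments(prep_times)
print(f"Gamma:  alpha={alpha:.3f}, beta={beta:.3f}")
print(f"Normal: mean={mu:.3f}, sd={sigma:.3f}")
```

Whichever pair of parameters belongs to the better-fitting distribution is what a simulation would take as input.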
Thankfully, running the necessary hypothesis tests and calculating each of these parameters was simple. We first used the “Goodness of fit for Poisson” test in order to analyze our arrival rates.
All Necessary Information
Rather than having to fiddle with equations and arrange cells like in Excel, Minitab quickly provided us with all necessary information, including our P-value to determine whether the distribution fit the data as well as parameters for shape and scale.
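For readers curious about what a Poisson goodness-of-fit test computes, the following Python sketch works through the chi-square version by hand. It is an illustration under stated assumptions: the `arrivals_per_minute` counts are hypothetical, the choice to pool counts of 4 or more into one category is arbitrary, and the function name `poisson_gof` is invented for this example. It returns the test statistic and degrees of freedom; the P-value would then come from a chi-square table or statistical software.

```python
import math
from collections import Counter

def poisson_gof(counts, max_category):
    """Chi-square goodness-of-fit statistic for a Poisson model.

    counts: arrivals observed in each equal-length time interval.
    max_category: counts >= this value are pooled into one tail category.
    Returns (lambda_hat, chi_square, degrees_of_freedom).
    """
    n = len(counts)
    lam = sum(counts) / n  # estimated mean arrivals per interval
    observed = Counter(min(c, max_category) for c in counts)

    chi_sq = 0.0
    cumulative = 0.0
    for k in range(max_category + 1):
        if k < max_category:
            p = math.exp(-lam) * lam ** k / math.factorial(k)
            cumulative += p
        else:
            p = 1.0 - cumulative  # probability of the pooled tail
        expected = n * p
        chi_sq += (observed.get(k, 0) - expected) ** 2 / expected

    # categories - 1, minus 1 more because lambda was estimated from the data
    dof = (max_category + 1) - 1 - 1
    return lam, chi_sq, dof

# Hypothetical customer arrivals per one-minute interval (illustration only).
arrivals_per_minute = [0, 1, 2, 1, 3, 2, 1, 0, 2, 4,
                       1, 2, 3, 1, 0, 2, 1, 1, 2, 3]

lam, chi_sq, dof = poisson_gof(arrivals_per_minute, max_category=4)
print(f"lambda={lam:.2f}, chi-square={chi_sq:.3f}, df={dof}")
```

A small statistic relative to the chi-square critical value for the given degrees of freedom means there is no evidence against the Poisson model. In practice, categories with very small expected counts are usually pooled before running the test.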
As for distributions for individual drink preparation times, the process was similarly simple. Using the “Individual Distribution Identification” tool, Minitab ran a series of hypothesis tests, comparing our data against a total of 16 possible distributions. The software output graphs along with P-values and Anderson-Darling values for each distribution, allowing us to graphically and empirically determine the appropriateness of fit.
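The Anderson-Darling value mentioned above measures how far the sorted data sit from a candidate distribution's cumulative curve, with smaller values indicating a closer fit. The Python sketch below computes the basic statistic for two candidates, Normal and Exponential, using parameters estimated from the sample. It is a simplified illustration, not a reproduction of Minitab's procedure (which covers more distributions and reports P-values), and the `prep_times` values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling(sample, cdf):
    """Basic Anderson-Darling statistic A^2 of a sample against a CDF."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        # Pair the i-th smallest value with the i-th largest value.
        total += (2 * i - 1) * (math.log(cdf(x)) + math.log(1.0 - cdf(xs[n - i])))
    return -n - total / n

# Hypothetical drink preparation times in seconds (illustration only).
prep_times = [42, 55, 61, 48, 73, 90, 66, 58, 120, 51, 64, 77]

# Candidate 1: Normal with the sample mean and standard deviation.
normal = NormalDist(mean(prep_times), stdev(prep_times))
ad_normal = anderson_darling(prep_times, normal.cdf)

# Candidate 2: Exponential with the sample mean.
m = mean(prep_times)
ad_exponential = anderson_darling(prep_times, lambda x: 1.0 - math.exp(-x / m))

print(f"Normal AD:      {ad_normal:.3f}")
print(f"Exponential AD: {ad_exponential:.3f}")
```

Comparing the values across candidates gives the same kind of ranking the team describes: the distribution with the lowest Anderson-Darling value is the most plausible fit among those tried.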
Within 3 hours, we had sorted and analyzed all of our data.
Not only was Minitab a fantastic tool for our analysis purposes, but the software also provided us with a graphical platform, a means by which to produce most of the graphs used in our research paper and presentation. Once we determined which distribution to use with what data, we used Minitab to output histograms with fitted data distributions for each set of data points. The ease of use for this feature served to save us time, as a series of simple clicks allowed us to output all 10 of our required histograms at the same time.
The same tools we first used to analyze our data were also used to evaluate the success of our simulations; we ran a Kolmogorov-Smirnov test to determine whether two sets of data (in this case, our observed data and the data output by our simulation) share a common distribution. Like most other features in Minitab, it was extremely easy to use and provided clear and immediate feedback as to the results of the test, both graphically and through the requisite critical and KS values.
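As a rough sketch of what the two-sample Kolmogorov-Smirnov comparison involves, the Python below computes the KS value (the largest gap between the two empirical cumulative distributions) and the usual large-sample critical value. The `observed` and `simulated` lists are hypothetical stand-ins, not the team's data, and the critical-value formula is the standard asymptotic approximation, which may differ slightly from what a given software package reports for small samples.

```python
import math

def ks_two_sample(a, b):
    """Two-sample KS statistic: max gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

def ks_critical_value(n, m, alpha=0.05):
    """Asymptotic critical value for the two-sample KS test."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n + m) / (n * m))

# Hypothetical wait times in seconds (illustration only).
observed = [35, 48, 52, 60, 61, 75, 80, 94, 110, 130]
simulated = [30, 45, 55, 58, 66, 72, 85, 90, 105, 140]

d = ks_two_sample(observed, simulated)
critical = ks_critical_value(len(observed), len(simulated))
print(f"KS value={d:.3f}, critical value={critical:.3f}")
print("same distribution not rejected" if d <= critical else "distributions differ")
```

If the KS value stays below the critical value, the test gives no reason to believe the simulated and observed data come from different distributions.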
Research isn’t always fun. It’s often long, tedious, and amounts to nothing. Thankfully, that wasn’t our case. Using Minitab, our entire analysis process was simple and painless. The software was easy to learn and was able to run any test quickly and efficiently, providing us with both empirical and graphical evidence of the results as well as high-quality graphs which were used throughout our project. It really was a pleasure to work with.
—The GSET COFF[IE] Team, whose members were Kenneth Acquah, Colin Courchesne, Sheela Hanagal, Kenneth Li, and Caroline Potts. The team was mentored by Juilee Malavade and Brandon Theiss, PE. Photo courtesy Colin Courchesne.
About the Guest Blogger:
Colin Courchesne was a scholar in the 2015 New Jersey Governor's School of Engineering and Technology, a summer program for high-achieving high school students. Students in the program complete a set of challenging courses while working in small groups on real-world research and design projects that relate to the field of engineering. Governor’s School students are mentored by professional engineers as well as Rutgers University honors students and professors, and they often work with companies and organizations to solve real engineering problems.