by Matthew Barsalou, guest blogger
A good way to begin researching a topic is with exploratory data analysis (EDA). In his 1977 book Exploratory Data Analysis, John Tukey suggested using EDA to collect and analyze data—not to confirm a hypothesis, but to form a hypothesis that could later be confirmed through other methods.
In some cases, EDA can even eliminate the need for a more in-depth hypothesis test. Here's a case in point.
When I heard about the new Star Trek movie, I started complaining to anybody who would listen (which was not many people) that director J. J. Abrams had used such a young cast in the 2009 Star Trek film.
With a tentative hypothesis of “the new Star Trek films use very young actors and actresses compared to the older Star Trek series,” I decided to look into this further. The first thing I did was collect data to use later in boxplots, which are a part of Tukey’s EDA.
Collecting Data for the Exploratory Analysis
I needed to determine the ages at which each main Star Trek actor first appeared; however, before I started looking for ages, I needed a method to determine whom I should consider as a main character in each series. To select the actors to consider I went to www.StarTrek.com and observed which characters were listed for each Star Trek series. This way I avoided biasing my results by selecting older or younger crewmembers who may not have had as much relevance as others.
The tables below list each actor, the character played, and the episode or movie in which the character first appeared. Each actor's year of birth was taken from his or her entry at the Internet Movie Database. To estimate the person’s age, the year of birth was subtracted from the year of first appearance. These are rough calculations that could be off by a year, because month of birth and month of first appearance were not considered.
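The age estimate described above is a single subtraction per actor. Here is a minimal Python sketch of it, using three rows from Table 1; the function and variable names are illustrative and not part of the original analysis:

```python
def approximate_age(year_of_birth: int, year_of_first_appearance: int) -> int:
    """Year-level age estimate; may be off by one year because months are ignored."""
    return year_of_first_appearance - year_of_birth


# (name, year of birth, year of first appearance), taken from Table 1
rows = [
    ("William Shatner", 1931, 1966),
    ("DeForest Kelley", 1920, 1966),
    ("Walter Koenig", 1936, 1967),
]

ages = {name: approximate_age(born, first) for name, born, first in rows}

for name, age in ages.items():
    print(f"{name}: about {age} years old")
```

Because only years are used, an actor whose birthday falls after the air date would really be one year younger than the estimate.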
Table 1: Star Trek: The Original Series
Name | Character | First appeared in | Year of birth | Year of first appearance | Age +/- 1 year
William Shatner | James T. Kirk | The Man Trap | 1931 | 1966 | 35
Leonard Nimoy | Spock | The Man Trap | 1931 | 1966 | 35
DeForest Kelley | Leonard “Bones” McCoy | The Man Trap | 1920 | 1966 | 46
James Doohan | Montgomery “Scotty” Scott | The Man Trap | 1920 | 1966 | 46
George Takei | Sulu | The Man Trap | 1937 | 1966 | 29
Nichelle Nichols | Uhura | The Man Trap | 1932 | 1966 | 34
Walter Koenig | Pavel Andreievich Chekov | Amok Time | 1936 | 1967 | 31
Table 2: Star Trek: The Next Generation
Name | Character | First appeared in | Year of birth | Year of first appearance | Age
Patrick Stewart | Jean-Luc Picard | Encounter at Farpoint | 1940 | 1987 | 47
Jonathan Frakes | Will Riker | Encounter at Farpoint | 1952 | 1987 | 35
Brent Spiner | Data | Encounter at Farpoint | 1949 | 1987 | 38
LeVar Burton | Geordi La Forge | Encounter at Farpoint | 1957 | 1987 | 30
Michael Dorn | Worf | Encounter at Farpoint | 1952 | 1987 | 35
Marina Sirtis | Deanna Troi | Encounter at Farpoint | 1955 | 1987 | 32
Gates McFadden | Beverly Crusher | Encounter at Farpoint | 1949 | 1987 | 38
Wil Wheaton | Wesley Crusher | Encounter at Farpoint | 1972 | 1987 | 15
Table 3: Star Trek: Deep Space Nine
Name | Character | First appeared in | Year of birth | Year of first appearance | Age
Avery Brooks | Benjamin Sisko | Emissary | 1948 | 1993 | 45
Nana Visitor | Kira Nerys | Emissary | 1957 | 1993 | 36
Rene Auberjonois | Odo | Emissary | 1940 | 1993 | 53
Alexander Siddig | Julian Bashir | Emissary | 1965 | 1993 | 28
Colm Meaney | Miles O’Brien | Emissary | 1953 | 1993 | 40
Terry Farrell | Jadzia Dax | Emissary | 1963 | 1993 | 30
Armin Shimerman | Quark | Emissary | 1949 | 1993 | 44
Cirroc Lofton | Jake Sisko | Emissary | 1978 | 1993 | 15
Michael Dorn | Worf | The Way of the Warrior | 1952 | 1995 | 43
Nicole de Boer | Ezri Dax | Image in the Sand | 1970 | 1998 | 28
Table 4: Star Trek: Voyager
Name | Character | First appeared in | Year of birth | Year of first appearance | Age
Kate Mulgrew | Kathryn Janeway | Caretaker | 1955 | 1995 | 40
Robert Beltran | Chakotay | Caretaker | 1953 | 1995 | 42
Tim Russ | Tuvok | Caretaker | 1956 | 1995 | 39
Robert Duncan McNeill | Tom Paris | Caretaker | 1964 | 1995 | 31
Roxann Dawson | B’Elanna Torres | Caretaker | 1958 | 1995 | 37
Garrett Wang | Harry Kim | Caretaker | 1968 | 1995 | 27
Robert Picardo | The Doctor | Caretaker | 1953 | 1995 | 42
Ethan Phillips | Neelix | Caretaker | 1955 | 1995 | 40
Jennifer Lien | Kes | Caretaker | 1974 | 1995 | 21
Jeri Ryan | Seven of Nine | Scorpion: | 1968 | 1997 | 29
Table 5: Star Trek: Enterprise
Name | Character | First appeared in | Year of birth | Year of first appearance | Age
Scott Bakula | Jonathan Archer | Broken Bow: | 1954 | 2001 | 47
Jolene Blalock | T’Pol | Broken Bow: | 1975 | 2001 | 26
Connor Trinneer | Charles “Trip” Tucker III | Broken Bow: | 1969 | 2001 | 32
Dominic Keating | Malcolm Reed | Broken Bow: | 1962 | 2001 | 39
John Billingsley | Phlox | Broken Bow: | 1960 | 2001 | 41
Linda Park | Hoshi Sato | Broken Bow: | 1978 | 2001 | 23
Anthony Montgomery | Travis Mayweather | Broken Bow: | 1971 | 2001 | 30
Table 6: Star Trek (2009)
Name | Character | First appeared in | Year of birth | Year of first appearance | Age
Chris Pine | James T. Kirk | Star Trek (2009) | 1980 | 2009 | 29
Zachary Quinto | Spock | Star Trek (2009) | 1977 | 2009 | 32
Karl Urban | Leonard “Bones” McCoy | Star Trek (2009) | 1972 | 2009 | 37
Zoe Saldana | Nyota Uhura | Star Trek (2009) | 1978 | 2009 | 31
Simon Pegg | Montgomery “Scotty” Scott | Star Trek (2009) | 1970 | 2009 | 39
John Cho | Hikaru Sulu | Star Trek (2009) | 1972 | 2009 | 37
Anton Yelchin | Pavel Andreievich Chekov | Star Trek (2009) | 1989 | 2009 | 30
EDA: Interpreting the Data with a Boxplot
Simply looking at the results in tables 1 through 6 led me to suspect my hypothesis may have been incorrect, but I still proceeded to create a Minitab boxplot with the data.
The boxplot depicts the ages of the actors and actresses in each Star Trek series as well as in the 2009 reboot. The rectangular boxes represent the middle 50% of each data set, and the vertical lines on top of the boxes represent the upper 25% of the data. The vertical lines below the boxes represent the lower 25% of the data, except in the case of outliers. Outliers are unusually large or small observations and are represented by an asterisk. There is only one outlier in this boxplot: Wil Wheaton as Wesley Crusher in Star Trek: TNG.
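To see why a single point gets flagged, here is a minimal Python sketch that applies the common 1.5 × IQR rule from Tukey's boxplots to the TNG ages listed in Table 2. It rests on two assumptions: that the plot uses this rule, and that Python's default quartile method is close enough to the one Minitab uses. The exact quartile values may differ slightly from Minitab's output.

```python
import statistics

# Ages of the TNG cast as listed in Table 2
tng_ages = [47, 35, 38, 30, 35, 32, 38, 15]

# Quartiles that bound the box (Python's default "exclusive" method)
q1, median, q3 = statistics.quantiles(tng_ages, n=4)
iqr = q3 - q1  # height of the box: the middle 50% of the data

# Assumed outlier rule: anything beyond 1.5 * IQR from the box is an outlier
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [age for age in tng_ages if age < lower_fence or age > upper_fence]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"fences: {lower_fence} to {upper_fence}")
print(f"outliers: {outliers}")
```

Under these assumptions the only age outside the fences is 15, which is the age listed for Wil Wheaton.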
The symbol that looks like a plus sign inside of a small circle is used to represent the average of the data set. The average age of actors and actresses in the 2009 reboot is 33.57 years, and this is just slightly lower than Star Trek: TNG, which had an average of 33.75 years of age. The highest average age was for Star Trek: TOS with an average of 36.57.
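The averages quoted for TOS and TNG can be checked against the ages listed in Tables 1 and 2. This is a minimal Python sketch of that check; the variable names are illustrative and not part of the original Minitab analysis:

```python
from statistics import mean

# Ages as listed in Table 1 (TOS) and Table 2 (TNG)
ages_by_series = {
    "TOS": [35, 35, 46, 46, 29, 34, 31],
    "TNG": [47, 35, 38, 30, 35, 32, 38, 15],
}

# Arithmetic mean per series, rounded to two decimals as in the text
averages = {series: round(mean(ages), 2) for series, ages in ages_by_series.items()}

for series, average in averages.items():
    print(f"{series}: {average}")
```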
What truly stands out in the boxplot is the spread of the data. The ages of the actors in the reboot are less spread out than in any of the other series. This makes sense, as it would not be plausible to use actors or actresses in their 50s or 60s to portray people who are still attending Starfleet Academy.
The hypothesis that originally started this was “the new Star Trek films use very young actors and actresses compared to the older Star Trek series,” but a look at the boxplot in figure one shows that this may not be the case. In fact, there is no reason to proceed to confirmation testing because my hypothesis can be discarded at this point.
It looks like I owe director J. J. Abrams an apology.
Exploratory Data Analysis Raises New Questions
Even a hypothesis that was discarded after performing EDA can lead to the...um...next generation of hypotheses, and new insights. In this case, my new hypothesis could be, “The actors and actresses in Star Trek are not getting younger; I am getting older.” The new hypothesis could also be explored with EDA prior to moving on to more robust methods.
However, in this case, I will not investigate my new hypothesis. I would rather just change the subject.
Photo of Star Trek figures by Miguel Bernas, used under Creative Commons 2.0 license.