# Olympics Boxplot

[7/23/2012 Addition: I've updated these plots using ggplot2 to look nicer.  They can be found here.]

Recently, I saw this pretty cool chart at the Washington Post (I originally saw the chart at this wonderful blog here) about the ages of olympians from the past three olympics.  I commented to myself that I thought it would be more interesting with boxplots of the data, rather than simple ranges, and I also wondered what it would look like if we used data from all of the past olympics.

So, I wrote some R code and began scraping sports-reference.com/olympics to get a data set with all of the olympic athletes from all of the games.  This took me quite some time (and work kept getting in the way), but I eventually got it right and collected the data.

Here are some of the resulting graphs:

Below is a graph of side-by-side boxplots of age for each sport by gender, with blue for male, pink for female, and green for mixed competition.  And no, the 11-year-old female swimmer is not a typo, as I originally thought.

The previous graph was kind of messy, so I’ve sorted this one by median age.  Not surprisingly, female gymnastics and rhythmic gymnastics have the lowest median ages of competitors, while equestrianism has the highest median age, at over 35 years.
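For anyone who wants to try the sorting themselves, here is a minimal sketch in base R. It is not the code behind the graph above: the data frame `athletes`, its columns `sport` and `age`, and the numbers in it are all made up for illustration. The idea is to reorder the factor levels by median age with `reorder()` before calling `boxplot()`.

```r
# Toy data, invented for illustration (not the scraped data set).
athletes <- data.frame(
  sport = c("Gymnastics", "Gymnastics", "Gymnastics",
            "Swimming", "Swimming", "Swimming",
            "Equestrianism", "Equestrianism", "Equestrianism"),
  age   = c(16, 18, 20, 19, 22, 27, 30, 36, 44)
)

# Reorder the sport levels by the median age within each sport.
athletes$sport <- reorder(athletes$sport, athletes$age, FUN = median)

# The levels now run from the lowest median age to the highest.
levels(athletes$sport)

# Horizontal boxplots, one per sport, drawn in median order.
boxplot(age ~ sport, data = athletes, horizontal = TRUE, las = 1,
        xlab = "Age", ylab = "")
```

Because `boxplot()` draws groups in factor-level order, reordering the levels is all that is needed; the plotting call itself does not change.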

The previous two graphs were only for the years 2000-2008, so I re-did the previous graph using data from all of the olympics.  Since the obvious question arising from this graph is “What is roque?”, I have saved you the trouble of googling it by providing a wikipedia link for roque.

This graph is boxplots of age by year with the color representing the host continent.
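Coloring each box by host continent can be sketched in base R as follows. Again, this is only an illustration under assumptions, not the code used for the graph: the data frame `games`, the lookup vector `host`, and the ages are invented, and only three years are shown. The blue-for-Europe and yellow-for-Australia colors follow the correction note below.

```r
# Toy data, invented for illustration (not the scraped data set).
games <- data.frame(
  year = rep(c(1952, 1956, 1960), each = 4),
  age  = c(21, 24, 26, 31, 20, 23, 27, 35, 19, 25, 28, 30)
)

# Hypothetical lookup table from olympic year to host continent.
host <- c("1952" = "Europe", "1956" = "Australia", "1960" = "Europe")

# One color per continent.
continent_col <- c(Europe = "blue", Australia = "yellow")

# boxplot() draws one box per year in sorted order, so build the
# color vector in that same order.
years   <- sort(unique(games$year))
box_col <- unname(continent_col[host[as.character(years)]])

boxplot(age ~ year, data = games, col = box_col,
        xlab = "Year", ylab = "Age")
legend("topleft", legend = names(continent_col), fill = continent_col)
```

Keeping the year-to-continent mapping in one small lookup means that fixing a miscolored year is a one-line change.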

[7/15/2012 Correction: The original post had the 1956 box colored blue for Europe.  However, commenter Mules points out that 1956 should actually be yellow for Australia.  They are correct and the correction has been made.  However, as I point out in response, I'm not totally wrong: The equestrian events had to be held in Stockholm, Sweden due to quarantine restrictions.]

[7/23/2012 Correction: The graph below had some mistakes in it, including an olympian who was over 90.  This was pointed out by Kate, and has been corrected.]

Cheers.

## 27 thoughts on “Olympics Boxplot”

1. So this is very nitpicky, but I like my boxplots to have closed small circles `pch=19` with an alpha `col=rgb(.1,.1,.1,.5)` so I can see when the outliers are thick. Could you post your code to github so I can be a nitpicking “perfectionist” with the charts? ;)

• I think I finally got github working. Here (should) be my Olympics code. Feedback (politely) is welcome.

I really like the idea of CRAPL. And this quote from here sums up my code pretty well:

I kept telling myself that I’d clean it all up and release it some day.

I have to be honest with myself: this clean-up is never going to happen.

2. I’d like to see a graph of sport v age v sex v result (say for the top 5-8 in each event, but if it’s only the medalists [top 3], that’s ok). That would reveal what age range is needed to be competitive in which sports, and exclude the (many) ceremonial entries from countries that don’t really have a serious program in that particular sport (e.g. think Jamaica in the bobsled).

This would help illuminate which events truly require youth over experience/training (and perhaps point out which events don’t require much athleticism).

I’d also like to see a graph adding in muscle v fat measures (e.g. BMI). I think there might be some surprises there (for US folk, anyway ;).

…Dave…

• I’ll post the boxplot with medal winners, as that’s easy to do.

As for the BMI suggestion, I’d like to see that plot, too. But I don’t have that data. If you’d like to send me that data, I’d be happy to make that graph.

• Thanks – should be interesting.

I’ve never run across a reasonable body fat percentage data set for any sport, let alone all olympians. I just tried once again to track one down, and didn’t find anything other than the usual numbers for a few, selected athletes. My initial interest in that stat came from a report that the US did this during a combined training camp in Colo Springs many years ago (and there were some surprises), but I was never able to find that data set, nor even confirm the story.

Upon review, BMI would be a meaningless measure for this purpose (many olympic athletes rate as obese on that scale). Accurate body fat measurements are complex to get (e.g. an MRI), and I’m sure many countries are very protective of their athletes, which makes me fairly doubtful that anyone has this data in one place. Perhaps each country has this data though — however it would be measured by different methodologies, complicating the comparison.

If I find such a data set though, I’ll certainly let you know.

…Dave…

• Very nice. I’m not sure of the sort order – avg of men’s and women’s ages?

Who are the 40+ swimmers? Seems odd.

How are the teams sports represented (again possible masking effects – men’s soccer/football seems a very narrow range to me)?

Women’s cycling age is a surprise to me.

• Men’s soccer (or “football”, as the rest of the world refers to it “incorrectly”):
According to Wikipedia:

Since 1992 male competitors must be under 23 years old, with three over-23 players allowed per squad. The new format allows teams from around the world to compete equally, and African countries have taken particular advantage of this, with Nigeria and Cameroon winning in 1996 and 2000 respectively.

Swimming:
Dara Torres was 41 in 2008 and she won three silver medals:
http://www.sports-reference.com/olympics/athletes/to/dara-torres-1.html

The graphs are sorted by median.

Cheers.

3. I had read before that the oldest Olympian ever was a Swedish shooter named Oscar Swahn and that he was in his 70s. But in the 4th graph it looks like there is someone who is 92 years old from 1932! Is that right? Am I missing something?

ps I have really been enjoying these.