Category Archives: Olympics
Canada managed to win 18 total medals in the 2012 Olympics, while only tallying one gold medal. Previously, I posed the question: has anyone ever won more medals than this with fewer golds? The answer, shockingly, is yes. In 1952, Germany managed to win 24 total medals and exactly ZERO golds (7 silver and 17 bronze). Incredible. Really nice work.
Other countries that have come close include the United Kingdom in 1960 (2G, 6S, 12B), Sweden in 1984 (2G, 11S, 6B), and Cuba in 2008 (2G, 11S, 12B).
Another fun fact: Only 5 times has a team won more than 50 total medals while winning more bronze than silver and more silver than gold. They are Sweden in 1920 (19G, 20S, 24B), the Soviet Union in 1964 (30G, 31S, 35B), West Germany in 1984 (17G, 19S, 23B), Germany in 2000 (13G, 17S, 26B), and, most recently, Russia in 2012 (24G, 26S, 32B).
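For the R nerds, here is a minimal sketch of how those two questions can be phrased as filters on a medal table. The data frame below is an assumption for illustration: it holds only the tallies quoted in this post (plus the United States in 2008 as a counterexample), not the full data set, and the column names are made up.

```r
# Medal tallies copied from the examples quoted in this post; not a complete data set.
medals <- data.frame(
  team   = c("Germany 1952", "United Kingdom 1960", "Sweden 1984",
             "Sweden 1920", "Soviet Union 1964", "West Germany 1984",
             "Germany 2000", "Russia 2012", "United States 2008"),
  gold   = c(0, 2, 2, 19, 30, 17, 13, 24, 36),
  silver = c(7, 6, 11, 20, 31, 19, 17, 26, 38),
  bronze = c(17, 12, 6, 24, 35, 23, 26, 32, 36),
  stringsAsFactors = FALSE
)
medals$total <- medals$gold + medals$silver + medals$bronze

# Question 1: more total medals than Canada's 18 in 2012, with fewer than its 1 gold.
beats_canada <- medals[medals$total > 18 & medals$gold < 1, ]

# Question 2: more than 50 total medals, with bronze > silver > gold.
bottom_heavy <- medals[medals$total > 50 &
                         medals$bronze > medals$silver &
                         medals$silver > medals$gold, ]

print(beats_canada$team)
print(bottom_heavy$team)
```

The United States in 2008 drops out of the second filter because it won more silver than bronze, even though its total is well over 50.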
See you in Brazil in 2016!
I read this story about some badminton players who were thrown out of the Olympics for intentionally losing a match. In the words of Deadspin:
The Chinese team of Wang Xiaoli and Yu Yang and South Koreans Jung Kyung-eun and Kim Ha-na played a farce of a match in which players served into the net on purpose or lazily launched shots out of bounds. The Chinese players’ incentive to lose was a bracket placement that would keep them on the opposite side from the other top-ranked Chinese team, meaning they’d avoid facing them until the finals. The Koreans, having sensed the plot from Wang and Yu, attempted to lose themselves in response.
Wait. What? They were trying to lose the match on purpose to avoid playing a team that they didn’t want to play until the finals? That seems like a totally rational thing to do. The goal at the Olympics is to win a medal. The goal is not to get as high a seed as possible coming out of pool play. If you don’t want situations like this, then don’t play this format. Sure, they were throwing the game, but they weren’t involved in a betting scandal or anything. They were losing on purpose to give themselves, in their minds, the largest probability of winning a medal. Isn’t that the rational thing to do if your goal is to win a medal? And the goal is to win medals, right? Right?
Now, you could look at this as China colluding to try to maximize the number of medals they can win. By avoiding each other in the elimination rounds until the finals, they avoid eliminating each other from medal contention. This might be frowned upon, but again, isn’t it the rational thing for the Chinese team to do? I guess “always trying your hardest” is more of an Olympic ideal than “doing the rational thing”.
First off, for the R nerds out there (if you don’t care about R, skip to the next paragraph): I’m quickly becoming a huge fan of ggplot2. Below is an example of combining a facet grid with pie charts using polar coordinates. Pretty cool. (I know most graphics people hate pie charts, but I think it works nicely, especially to display Olympic medal counts.) My only question about ggplot2 is: how can I color in the background of a plot without changing the pie chart? I’d like to highlight the plots for a country in years where they hosted the Olympics. I’ve tried geom_rect() with unlimited bounds, but this seems to have problems in polar coordinates. Then I started to try using some grobs commands, but they appear to be out of date. Any ggplot2 experts out there have any suggestions?
Below is a plot of the Olympic medals won by year for the top 26 countries (by medal count in the 2008 Olympics) for the years 1952 through 2008. The size of each pie chart is proportional to the number of medals won by a team on a square root scale, and each pie chart shows the breakdown of total medals by type (gold, silver, or bronze). So you can see, for instance, that in 2008 China won about 100 medals (in fact, it was exactly 100), and it’s easy to see that over half of them were gold. You can also see that the United States won slightly more medals than China (110 to be exact), but its medals were nearly evenly distributed across the three types (36 gold, 38 silver, 36 bronze).
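A rough sketch of the facet-grid-of-pies idea, for anyone who wants to try it. This is not the exact code behind the plot: the data frame is a toy subset built from tallies quoted elsewhere on this blog, the column names are made up, and it assumes a ggplot2 version recent enough to have geom_col(). Each pie is a single stacked bar whose width is the square root of the team’s total, wrapped into a circle with coord_polar().

```r
# Toy subset: tallies quoted elsewhere on this blog, in long format
# (one row per team, year, and medal type). Column names are assumptions.
pies <- data.frame(
  country = rep(c("Germany", "Sweden"), each = 6),
  year    = rep(c(1952, 2000, 1920, 1984), each = 3),
  medal   = factor(rep(c("Gold", "Silver", "Bronze"), times = 4),
                   levels = c("Gold", "Silver", "Bronze")),
  count   = c(0, 7, 17,  13, 17, 26,  19, 20, 24,  2, 11, 6),
  stringsAsFactors = FALSE
)

# Total medals per team-year, then the pie radius on a square root scale.
pies$total  <- ave(pies$count, pies$country, pies$year, FUN = sum)
pies$radius <- sqrt(pies$total)

# Only draw the plot if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(pies, aes(x = radius / 2, y = count, fill = medal, width = radius)) +
    geom_col(position = "fill") +          # rescale each bar so its slices sum to 1
    coord_polar(theta = "y") +             # wrap the stacked bar into a pie
    facet_grid(country ~ year) +           # one panel per country and year
    scale_x_continuous(expand = c(0, 0)) + # no hole in the middle of the pie
    scale_fill_manual(values = c(Gold = "gold", Silver = "grey70", Bronze = "sienna3")) +
    labs(x = NULL, y = NULL) +
    theme(axis.text = element_blank(), axis.ticks = element_blank())
  print(p)
}
```

Centering the bar at radius / 2 with a width of radius makes it run from 0 out to the radius, and because the x scale is shared across panels, the pies stay comparable from one panel to the next.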
The graph below is the same as above, but using raw values instead of square root. This demonstrates why I used a square root scale.
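A quick back-of-the-envelope check of why the square root helps, assuming “size” means the radius of the pie. It uses two totals quoted on this blog (110 for the United States in 2008 and 24 for Germany in 1952); the helper name pie_area is made up for the example.

```r
# Totals quoted on this blog: United States 2008 and Germany 1952.
totals <- c(usa_2008 = 110, germany_1952 = 24)

# Area of a pie with the given radius.
pie_area <- function(radius) pi * radius^2

medal_ratio <- totals[["usa_2008"]] / totals[["germany_1952"]]

# Radius proportional to the raw count: area grows with the square of the count.
raw_area_ratio <- pie_area(totals[["usa_2008"]]) /
  pie_area(totals[["germany_1952"]])

# Radius proportional to sqrt(count): area is proportional to the count itself.
sqrt_area_ratio <- pie_area(sqrt(totals[["usa_2008"]])) /
  pie_area(sqrt(totals[["germany_1952"]]))

print(round(c(medals = medal_ratio, raw = raw_area_ratio, sqrt = sqrt_area_ratio), 2))
```

With raw radii, a team with about 4.6 times the medals gets a pie with about 21 times the area, which is why the big teams swamp everyone else. With square root radii, the area ratio matches the medal ratio.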
Here is a graph of the top 26 countries by medal count from the 2008 Olympics, across the years from 1896 through 2008.
So, last Monday I posted some Olympics boxplots, and then I left for the week to go golfing with my father (we play 7 rounds in 4 days, you take your best score on each hole, add them up and declare a winner. I won 66-68 this year. Pops now leads the all-time series 3-2). When I came back, the blog had thousands of hits, which I assumed initially to be a mistake. Turns out, however, those boxplots ended up on the front page of FlowingData, which has way more readers than I do.
While I’m excited to be mentioned on FlowingData, I don’t really think the basic R graphics boxplots live up to the standard set over there (Nathan Yau described my plots as “barebones”, and he’s right). I mean, this post of “Every Idea in History” is just straight up impressive. So since I’ve been trying to learn a new graphing package in R, I decided to update the Olympics plots using ggplot2 to try and look a bit more professional.
In my very limited usage of ggplot2, I have found that it is a little bit harder to use than base graphing and plotting in R. However, I suspect this is only because I am used to one way of plotting in R and that, as I use ggplot2 more often, I won’t believe that I ever used the old way of plotting (it’s like when I switched from Windows to Unix: a little bit harder to learn, but much better). While it’s taking me a bit of time to learn, I also have to say that, once you figure out the code, it’s much easier to get exactly what you want. For instance, when I originally did my Olympics plots I wanted to order the sports by median age and then split each sport by gender. I had quite a difficult time doing this in base graphing in R (and just ended up not doing it altogether), but ggplot2 handled this very easily (the code is at the end).
Below are side-by-side box plots of the ages of Olympic athletes, sorted by the median age of the competitors in each sport, first for the years 2000-2008 and then for all years. Within each sport, the competitors are split by gender into the appropriate number of box plots, so the age distributions of the genders within each sport can be easily compared to one another.
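For anyone curious how the ordering and the gender split fit together, here is a small sketch on made-up data. The athletes data frame and its column names are hypothetical (my scraped data set is not reproduced here), and this is one way to do it rather than a copy of the code behind the plots: reorder() sorts the sports by median age, and mapping gender to fill splits each sport into separate boxes.

```r
# Hypothetical toy data; the real scraped data set is not shown in this post.
set.seed(1)
athletes <- data.frame(
  sport  = rep(c("Equestrianism", "Gymnastics", "Swimming"), each = 40),
  gender = rep(c("F", "M"), times = 60),
  age    = c(rnorm(40, 36, 6), rnorm(40, 18, 2), rnorm(40, 23, 2)),
  stringsAsFactors = FALSE
)

# reorder() sorts the factor levels of sport by the median age within each sport.
athletes$sport <- reorder(athletes$sport, athletes$age, FUN = median)
print(levels(athletes$sport))

# Only draw the plot if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(athletes, aes(x = sport, y = age, fill = gender)) +
    geom_boxplot() +                       # fill = gender gives one box per gender
    scale_fill_manual(name = "",
                      values = c("F" = "pink", "M" = "blue"),
                      labels = c("F" = "Female", "M" = "Male")) +
    xlab("Sport") +
    theme(axis.text.x = element_text(angle = -90))
  print(p)
}
```

Doing the reordering on the factor itself, before plotting, keeps the sort order independent of the plotting call, so the same ordered factor can be reused for other plots.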
Finally, two things I should note that I didn’t mention in the first post:
- I’ve added a small bit of noise to each of the ages so that the outliers can be seen more clearly.
- The data that I am using is not the complete set of Olympians all time, though it is the vast, vast majority of them. When I was scraping, some athletes’ names contained non-standard characters (e.g. é or ü), and these had to be converted to the English alphabet equivalent (e.g. e or u). While I manually corrected many of these, I do not believe that I corrected all of them. So, there are probably a few Olympians missing from my data set, though I believe it is a very, very small number relative to the total number of athletes.
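Both notes above can be sketched in a few lines of base R. This is an illustration under assumptions, not the code I actually ran: the jitter amount of 0.25 is a guess, the names are hypothetical, and the helper to_ascii only handles the two characters mentioned above.

```r
# Ages are recorded in whole years, so identical outliers sit on top of each other.
ages <- c(14, 14, 14, 52, 52, 61)

set.seed(42)
# jitter() adds uniform noise in [-amount, amount]; 0.25 is an assumed value.
noisy_ages <- jitter(ages, amount = 0.25)
print(noisy_ages)

# Map accented letters to their plain English-alphabet equivalents.
# Only the two characters named in the post are handled here; a real scrape
# would need a longer translation table.
to_ascii <- function(x) chartr("\u00e9\u00fc", "eu", x)

athlete_names <- c("Andr\u00e9", "M\u00fcller")  # hypothetical names
print(to_ascii(athlete_names))
```

Keeping the noise well under half a year means a jittered age still rounds back to the recorded one.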
p + scale_fill_manual(name = "", values = c("green", "pink", "blue"), labels = c("B" = "Both", "F" = "Female", "M" = "Male")) +
  xlab("Sport") +
  theme(axis.text.x = element_text(angle = -90)) +
  ggtitle("Age Distribution of Olympic Athletes by Age and Gender: 2000-2008")
[7/23/2012 Addition: I’ve updated these plots using ggplot2 to look nicer. They can be found here.]
Recently, I saw this pretty cool chart at the Washington Post (I originally saw the chart at this wonderful blog here) about the ages of Olympians from the past three Olympics. I commented to myself that I thought it would be more interesting with boxplots of the data, rather than simple ranges, and I also wondered what it would look like if we used data from all of the past Olympics.
So, I wrote some R code and began scraping sports-reference.com/olympics to get a data set with all of the Olympic athletes from all of the games. This took me quite some time (and work kept getting in the way), but I eventually got it right and collected the data.
Here are some of the resulting graphs:
Below is a graph of side-by-side boxplots of age for each sport by gender, with blue for male, pink for female, and green for mixed competition. And no, the 11-year-old female swimmer is not a typo like I originally thought.
The previous graph was kind of messy, so I’ve sorted this one by median age. Not surprisingly, female gymnastics and rhythmic gymnastics have the lowest median ages of competitors, while equestrianism has the highest median age at over 35 years.
The previous two graphs were only for the years 2000-2008, so I re-did the previous graph using data from all of the Olympics. Since the obvious question arising from this graph is “What is roque?”, I have saved you the trouble of googling it by providing a Wikipedia link for roque.
This graph is boxplots of age by year with the color representing the host continent.
[7/15/2012 Correction: The original post had the 1956 box colored blue for Europe. However, commenter Mules points out that 1956 should actually be yellow for Australia. They are correct and the correction has been made. However, as I point out in response, I’m not totally wrong: The equestrian events had to be held in Stockholm, Sweden due to quarantine restrictions.]
[7/23/2012 Correction: The graph below had some mistakes in it, including an olympian who was over 90. This was pointed out by Kate, and has been corrected.]