Category Archives: Sports

Olympic Medals – August 6, 2012

 

 

Cheers.

Olympic Medals: Top ten teams – August 1, 2012

Cheers.

 

Olympic Medals – July 31, 2012

 

 

 

 


Cheers.

Olympic Pie Charts

First off, for the R nerds out there (if you don’t care about R, skip to the next paragraph): I’m quickly becoming a huge fan of ggplot2.  Below is an example of combining a facet grid with pie charts using polar coordinates.  Pretty cool.  (I know most graphics people hate pie charts, but I think it works nicely, especially for displaying Olympic medal counts.)  My only question about ggplot2 is how I can color in the background of a plot without changing the pie chart.  I’d like to highlight the plots for a country in the years when it hosted the Olympics.  I’ve tried geom_rect() with unlimited bounds, but this seems to have problems in polar coordinates.  Then I started to try using some grob commands, but they appear to be out of date.  Any ggplot2 experts out there have any suggestions?
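For anyone who wants to try the facet-grid-plus-polar-coordinates idea, here is a minimal sketch. It is not the code behind the plots in this post. The data frame `medals` and its column names are made up for illustration, the China split is an assumed one that merely matches a total of 100 with more than half gold, and the wedges are drawn with `geom_rect()` so that the radius can be set explicitly.

```r
library(ggplot2)

# Hypothetical toy data: one row per country, year, and medal type.
medals <- data.frame(
  country = rep(c("China", "United States"), each = 3),
  year    = 2008,
  type    = factor(rep(c("Gold", "Silver", "Bronze"), 2),
                   levels = c("Gold", "Silver", "Bronze")),
  count   = c(51, 21, 28, 36, 38, 36)  # China split is an assumption
)

# Total medals per pie, and the radius on a square root scale.
medals$total  <- ave(medals$count, medals$country, medals$year, FUN = sum)
medals$radius <- sqrt(medals$total)

# Each wedge covers a share of the circle: cumulative fractions give its start and end.
medals$frac <- medals$count / medals$total
medals$ymax <- ave(medals$frac, medals$country, medals$year, FUN = cumsum)
medals$ymin <- medals$ymax - medals$frac

# Rectangles in Cartesian space become wedges once y is mapped to the angle.
p <- ggplot(medals) +
  geom_rect(aes(xmin = 0, xmax = radius, ymin = ymin, ymax = ymax, fill = type)) +
  coord_polar(theta = "y") +
  facet_grid(country ~ year) +
  scale_fill_manual(values = c(Gold = "gold", Silver = "grey70", Bronze = "darkorange3"))

print(p)
```

Because the facets share one radial scale, a pie with a smaller `radius` is drawn smaller than its neighbours, which is what makes the panels comparable.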

Below is a plot of the Olympic medals won by year for the top 26 countries (by medal count in the 2008 Olympics) for the years 1952 through 2008.  The size of each pie chart is proportional to the number of medals won by a team on a square root scale, and each pie chart shows the breakdown of total medals by type (gold, silver, or bronze).  So you can see, for instance, that in 2008 China won about 100 medals (in fact, it was exactly 100), and it’s easy to see that over half of them were gold.  You can also see that the United States won slightly more medals than China (110 to be exact), but its medals were split nearly evenly across the three types (36 gold, 38 silver, 36 bronze).
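To see what the square root scale does, here is a small worked example. It assumes that "size" means the radius of the pie, which the post does not spell out. The China and United States totals are the ones quoted above; the "Small team" entry is hypothetical.

```r
# Medal totals: China and United States are from the post; "Small team" is hypothetical.
totals <- c(China = 100, "United States" = 110, "Small team" = 4)

radius_raw  <- totals        # radius proportional to the raw medal count
radius_sqrt <- sqrt(totals)  # radius proportional to the square root of the count

# The area of a circle grows with the radius squared, so compare areas relative to China.
area_ratio_raw  <- (radius_raw  / radius_raw[["China"]])^2
area_ratio_sqrt <- (radius_sqrt / radius_sqrt[["China"]])^2

print(round(area_ratio_raw, 4))
print(round(area_ratio_sqrt, 4))
```

With square root radii the area of each pie is proportional to the medal count itself, so the 4-medal team gets 4% of China's area. With raw radii the area goes with the square of the count, and the same team shrinks to 0.16% of China's area, which is why small teams all but vanish on the raw scale.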

The graph below is the same as above, but using raw values instead of square root.  This demonstrates why I used a square root scale.

Here is a graph of the top 26 countries by medal count from the 2008 Olympics, across the years from 1896 through 2008.


And finally, if you’re interested, here is a graph of all of the countries from 1952 through 2008, but it’s hard to really see anything.  
Cheers.  

Top 8 Tennis Players

Here is a graph of the ATP points for the top 8 tennis players in the world since 2009.  It’s interesting to note that since 2009, Andy Murray has been ranked above each of Nadal, Djokovic, and Federer at some point, but never above them all at once.  Also, over that same time period, none of the players currently ranked 5-8 has ever been ranked higher than any of the top 4 players.

Cheers.

Tennis Number 1’s

About a week ago, Roger Federer won his seventh Wimbledon title, which ties Pete Sampras, and his 17th Grand Slam title overall, extending that record even further.  Less (more?) importantly, Mr. Federer regains the ranking of number 1 in the world by the slimmest of margins (11075 to 11000) over Novak Djokovic.  Here is my updated plot of the 19 players who have reached number 1 since 1990.  Notably (still) absent is Mr. Andy Murray.  But as he said himself after his Wimbledon defeat: “I’m getting closer.”

Note: This graph has been updated as it originally included Andy Murray and Benjamin Becker instead of Boris Becker.  I apologize for the mistake.

Cheers.

Olympics Box Plots: Part 2 / ggplot2 Shoutout

So, last Monday I posted some Olympics boxplots, and then I left for the week to go golfing with my father (We play 7 rounds in 4 days; you take your best score on each hole, add them up, and declare a winner.  I won 66-68 this year. Pops now leads the all-time series 3-2).  When I came back, the blog had thousands of hits, which I initially assumed to be a mistake.  Turns out, however, those boxplots ended up on the front page of FlowingData, which has way more readers than I do.

While I’m excited to be mentioned on FlowingData, I don’t think the basic R graphics boxplots really live up to the standard set over there (Nathan Yau described my plots as “barebones”, and he’s right).  I mean, this post of “Every Idea in History” is just straight up impressive.  So since I’ve been trying to learn a new graphing package in R, I decided to update the Olympics plots using ggplot2 to try and look a bit more professional.

In my very limited usage of ggplot2, I have found that it is a little bit harder to use than base graphing and plotting in R.  However, I suspect this is only because I am used to one way of plotting in R, and that as I use ggplot2 more often I won’t believe that I ever used the old way of plotting (it’s like when I switched from Windows to Unix: a little bit harder to learn, but much better).  While it’s taking me a bit of time to learn, I also have to say that, once you figure out the code, it’s much easier to get exactly what you want.  For instance, when I originally did my Olympics plots I wanted to order the sports by median age and then split each sport by gender.  I had quite a difficult time doing this in base graphing in R (and just ended up not doing it altogether), but ggplot2 handled this very easily (the code is at the end).

Below are side-by-side box plots of the ages of Olympic athletes, sorted by the median age of the competitors in each sport, first for the years 2000-2008 and then for all years.  Within each sport the competitors are split by gender into separate box plots, so the age distributions of the genders within each sport can be easily compared to one another.

Finally, two things I should note that I didn’t mention in the first post:

  • I’ve added a small bit of noise to each of the ages so that the outliers can be seen more clearly.
  • The data that I am using is not the complete set of Olympians of all time, though it is the vast, vast majority of them.  When I was scraping, some athletes’ names contained non-standard characters (e.g. é or ü), and these had to be converted to the English alphabet equivalent (e.g. e or u).  While I manually corrected many of these, I do not believe that I corrected all of them.  So, there are probably a few Olympians missing from my data set, though I believe it is a very, very small number relative to the total number of athletes.
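As a rough illustration of the noise mentioned in the first point, here is a minimal sketch using base R's `jitter()`. The ages and the noise amount of a quarter of a year are made up for the example; the post does not say how much noise was actually added.

```r
set.seed(1)  # fixed seed so the example is reproducible

# Hypothetical ages: integer-valued, so identical ages would plot on top of each other.
ages <- c(11, 16, 16, 16, 22, 22, 35, 35, 35, 61)

# jitter() adds uniform noise in [-amount, amount] to every value.
ages_jittered <- jitter(ages, amount = 0.25)

print(round(ages_jittered, 2))
```

After jittering, athletes who share the same recorded age no longer land on exactly the same spot, so overlapping outliers show up as separate points while each value stays within a quarter of a year of the original.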

Cheers.

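# Boxplots of age by sport, ordered by median age and filled by sex.
# Assumes dat.summer has columns Sport, Sex, Age, and a precomputed Age.median.
library(ggplot2)

p <- qplot(reorder(factor(Sport), Age.median), Age, fill = factor(Sex), data = dat.summer, geom = "boxplot")

# Originally written with opts() and theme_text(); later ggplot2 releases use theme(), element_text(), and ggtitle().
p + scale_fill_manual(name = "", values = c("green", "pink", "blue"), labels = c("B" = "Both", "F" = "Female", "M" = "Male")) +
  xlab("Sport") +
  theme(axis.text.x = element_text(angle = -90)) +
  ggtitle("Age Distribution of Olympic Athletes by Age and Gender: 2000-2008")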
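The plotting code above relies on an `Age.median` column in `dat.summer`, and the post does not show how that column was built. One possible way, sketched here on a tiny made-up data frame (the sports, sexes, and ages are hypothetical), is to use `ave()` to attach each sport's median age to every row:

```r
# Hypothetical stand-in for the scraped data: one row per athlete.
dat.summer <- data.frame(
  Sport = c("Gymnastics", "Gymnastics", "Equestrianism",
            "Equestrianism", "Swimming", "Swimming"),
  Sex   = c("F", "M", "F", "M", "F", "M"),
  Age   = c(17, 22, 36, 38, 21, 23)
)

# ave() applies median() within each sport and returns one value per row.
dat.summer$Age.median <- ave(dat.summer$Age, dat.summer$Sport, FUN = median)

# reorder() then sorts the factor levels by that per-sport value,
# which is what controls the left-to-right order of the boxes.
sport_order <- levels(reorder(factor(dat.summer$Sport), dat.summer$Age.median))

print(sport_order)
```

In this toy data the medians are 19.5 for gymnastics, 22 for swimming, and 37 for equestrianism, so the sports come out in that order.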

The next start after a No-Hitter

Cheers.

Olympics Boxplot

[7/23/2012 Addition: I’ve updated these plots using ggplot2 to look nicer.  They can be found here.]

Recently, I saw this pretty cool chart at the Washington Post (I originally saw the chart at this wonderful blog here) about the ages of Olympians from the past three Olympics.  I commented to myself that I thought it would be more interesting with boxplots of the data, rather than simple ranges, and I also wondered what it would look like if we used data from all of the past Olympics.

So, I wrote some R code and began scraping sports-reference.com/olympics to get a data set with all of the Olympic athletes from all of the games.  This took me quite some time (and work kept getting in the way), but I eventually got it right and collected the data.

Here are some of the resulting graphs:

Below is a graph of side-by-side boxplots of age for each sport by gender, with blue for male, pink for female, and green for mixed competition.  And no, the 11-year-old female swimmer is not a typo like I originally thought.


The previous graph was kind of messy, so I’ve sorted this one by median age.  Not surprisingly, female gymnastics and rhythmic gymnastics have the lowest median ages of competitors, while equestrianism has the highest median age at over 35 years.

The previous two graphs were only for the years 2000-2008, so I re-did the previous graph using data from all of the Olympics.  Since the obvious question arising from this graph is “What is roque?”, I have saved you the trouble of googling it by providing a wikipedia link for roque.


This graph shows boxplots of age by year, with the color representing the host continent.

[7/15/2012 Correction: The original post had the 1956 box colored blue for Europe.  However, commenter Mules points out that 1956 should actually be yellow for Australia.  They are correct and the correction has been made.  However, as I point out in response, I’m not totally wrong: The equestrian events had to be held in Stockholm, Sweden due to quarantine restrictions.]

[7/23/2012 Correction: The graph below had some mistakes in it, including an olympian who was over 90.  This was pointed out by Kate, and has been corrected.]


And finally, we have overall age by gender.

Cheers.

MLB rankings – 7/9/2012

StatsInTheWild MLB rankings as of July 9, 2012 at 1:22pm.  SOS = strength of schedule; "-" in the Change column = no change.

Team          Rank  Change  Record  ESPN  TeamRankings.com  SOS  Run Diff
NYY           1     ↑1      52-33   1     1                 4    +65
Texas         2     ↓1      52-34   2     2                 13   +79
LA Angels     3     ↑1      48-38   4     4                 11   +44
ChiSox        4     ↑2      47-38   6     5                 14   +63
Boston        5     ↓2      43-43   17    13                6    +43
Toronto       6     ↓1      43-43   18    12                2    +22
Washington    7     -       49-34   3     3                 21   +58
TampaBay      8     -       45-41   14    8                 3    +4
Detroit       9     ↑2      44-42   16    11                10   +6
Baltimore     10    ↓1      45-40   11    7                 1    -36
Oakland       11    ↓1      43-43   19    16                7    +3
Cincinnati    12    -       47-38   8     9                 25   +42
St. Louis     13    ↑2      46-40   13    17                26   +70
Atlanta       14    -       46-39   12    10                20   +34
Pittsburgh    15    ↑4      48-37   5     6                 27   +32
Cleveland     16    ↑1      44-41   15    14                12   -29
NY Mets       17    ↓4      46-40   10    15                17   +20
LA Dodgers    18    -       47-40   7     19                28   +10
Seattle       19    ↑2      36-51   26    24                5    -28
SF            20    ↓4      46-40   9     18                30   -8
Kansas City   21    ↑1      37-47   24    21                9    -41
Arizona       22    ↓2      42-43   20    20                29   +10
Miami         23    -       41-44   21    22                15   -56
Milwaukee     24    ↑1      40-45   22    25                24   -9
Minnesota     25    ↑1      36-49   25    23                8    -87
Philadelphia  26    ↓2      37-50   23    26                16   -28
Chic Cubs     27    ↑2      33-52   29    27                18   -69
Houston       28    ↓1      33-53   28    30                19   -72
Colorado      29    ↓1      33-52   30    28                22   -66
San Diego     30    -       34-53   27    29                23   -76

Past Rankings:

7/2/2012

6/25/2012

6/19/2012

6/9/2012

5/28/2012

5/23/2012

5/14/2012

5/7/2012

4/30/2012

4/23/2012

4/16/2012

4/13/2012

Cheers.