Category Archives: R
Here are some star plots for major league baseball batters, pitchers, and ball parks. The star plots represent the outcomes of a plate appearance for a given hitter, a given pitcher, or at a given ball park. In each plot, one of the three (batter, pitcher, or ball park) was varied while the other two were fixed at an average value. For instance, all batters’ outcomes are calculated as if they were facing J. Kinney at Wrigley Field; pitchers’ outcomes were calculated as if they were facing K. Medlen at Wrigley; and park factors were calculated as the outcome of J. Kinney vs. K. Medlen at different ball parks. The data used to calculate these were downloaded from baseball-reference.com and include the result of every plate appearance so far this season (about 125,000 in all) and where the game was played. Six outcomes of a plate appearance were considered: out, walk, single, double, triple, and home run. The probability of each of these events was estimated, creating a vector of six probabilities, one for each outcome. I’ve chosen to display this data using the star plots below. The key to the star plot can be found in the lower left corner of each plot and displays the probability of each outcome relative to other batters. For instance, a large blue pie piece on the left indicates that a batter’s plate appearances end in a home run more often than other players’. Likewise, a large red pie piece on the right indicates that the batter’s plate appearances end in an out more often than other players’.
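The post does not describe the model used to hold pitcher and park at an average value, so the Python sketch below (not the R used for the plots) covers only the simplest piece: turning a list of plate-appearance results into the six-element probability vector and expressing it relative to a league average. The function names, outcome labels, and toy data are hypothetical, raw frequencies like these ignore the opponent and park adjustments described above, and reading the star-plot key as a ratio to the league average is an assumption.

```python
from collections import Counter

# The six plate-appearance outcomes considered in the post.
OUTCOMES = ("out", "walk", "single", "double", "triple", "home run")


def outcome_probabilities(plate_appearances):
    """Empirical probability of each of the six outcomes.

    `plate_appearances` is a sequence of outcome labels, one per plate
    appearance. Labels not in OUTCOMES (e.g. "hit by pitch") are skipped.
    """
    counts = Counter(pa for pa in plate_appearances if pa in OUTCOMES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no plate appearances with a recognized outcome")
    # Counter returns 0 for outcomes that never occurred.
    return {outcome: counts[outcome] / total for outcome in OUTCOMES}


def relative_to_average(player_probs, league_probs):
    """Ratio of a player's probability to the league's, outcome by outcome.

    Assumption: a ratio above 1 corresponds to a larger-than-average
    star-plot piece. Outcomes with a league probability of zero are left out.
    """
    return {
        outcome: player_probs[outcome] / league_probs[outcome]
        for outcome in OUTCOMES
        if league_probs[outcome] > 0
    }


# Hypothetical toy data: 10 plate appearances for one batter.
hitter = ["out"] * 6 + ["walk", "single", "double", "home run"]
probs = outcome_probabilities(hitter)
print(probs["out"], probs["triple"])  # 0.6 0.0
```

Because each vector sums to one, a batter can only grow one piece of the star by shrinking others, which is why power hitters with many walks and home runs can still show sizable out pieces.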
I’ve chosen 100 batters to represent a wide range of hitting styles. In the first row, you’ll see the players who make outs at the lowest rates relative to other players. These include players like Joey Votto, Andrew McCutchen, David Wright, and Mike Trout. Further down, you’ll start to see players who you might describe as singles hitters. These include players like Ruben Tejada, Derek Jeter, B. Revere, and Juan Pierre. Finally, towards the bottom row, you’ll see the players who are primarily power hitters, like Adam Dunn and Jose Bautista, with large blue (home run) and orange (walk) pie pieces along with significant red (out) pieces. Other players on this row, like Saltalamacchia, Plouffe, and Rosario, have the large blue and significant red pieces, but they lack the walks.
The star plot for pitchers is below. The first thing we need to say here is that Justin Verlander is very, very good at pitching a baseball. Some other interesting pitchers here are Yu Darvish, Edinson Volquez, and Carlos Zambrano. They seem to give up relatively few hits, but they give up many more walks than the average pitcher.
The ball park plots are ordered from highest to lowest probability that an out will be made in a given plate appearance. Pittsburgh, Seattle, and San Francisco lead the way among pitcher-friendly parks. These are the same as the bottom three according to ESPN’s measure of park factor. The most hitter-friendly park is, no surprise, Coors Field in Colorado. Other hitter-friendly parks include Target Field and Chase Field in Minnesota and Arizona, respectively. Arizona is expected here, but Minnesota is a little bit surprising. It looks like, while it is rare to make an out there, most hits are only singles, which don’t generate as many runs as their extra-base counterparts. Home-run-friendly parks include Coors, Chase, Camden Yards, Miller Park, Comiskey, and Yankee Stadium. Fenway Park is solidly in the hitter category, but it gets there not by giving up many home runs but by yielding a greater percentage of doubles than any other park.
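Ordering the parks by out probability, and checking what fraction of a park’s hits are singles (the explanation offered for Target Field), are both simple operations on these probability vectors. A minimal Python sketch, assuming each park’s estimates are stored as a dict keyed by outcome; the function names are hypothetical, and the two parks and their numbers in the usage example are made up for illustration, not the post’s estimates.

```python
def order_by_out_probability(park_probs):
    """Park names sorted from highest to lowest probability of an out.

    `park_probs` maps a park name to a dict of outcome probabilities with
    keys "out", "walk", "single", "double", "triple", and "home run".
    """
    return sorted(park_probs, key=lambda park: park_probs[park]["out"], reverse=True)


def singles_share(probs):
    """Fraction of a park's hits that are singles (0.0 if there are no hits)."""
    hits = probs["single"] + probs["double"] + probs["triple"] + probs["home run"]
    return probs["single"] / hits if hits > 0 else 0.0


# Made-up numbers for two hypothetical parks, not the post's estimates.
parks = {
    "Park A": {"out": 0.70, "walk": 0.08, "single": 0.15,
               "double": 0.04, "triple": 0.01, "home run": 0.02},
    "Park B": {"out": 0.66, "walk": 0.08, "single": 0.19,
               "double": 0.04, "triple": 0.01, "home run": 0.02},
}
print(order_by_out_probability(parks))  # ['Park A', 'Park B']
print(round(singles_share(parks["Park B"]), 3))  # 0.731
```

In this toy example, Park B allows fewer outs than Park A, but all of its extra hits are singles, which is the pattern described for Target Field: a low out rate does not by itself make a park a run factory.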
StatsInTheWild MLB rankings as of August 6, 2012 at 8:17pm. SOS=strength of schedule
First off, for the R nerds out there (if you don’t care about R, skip to the next paragraph): I’m quickly becoming a huge fan of ggplot2. Below is an example of combining a facet grid with pie charts using polar coordinates. Pretty cool. (I know most graphics people hate pie charts, but I think they work nicely here, especially to display Olympic medal counts.) My only question about ggplot2 is how to color in the background of a plot without changing the pie chart. I’d like to highlight the plots for a country in the years when it hosted the Olympics. I’ve tried geom_rect() with unlimited bounds, but this seems to have problems in polar coordinates. Then I started trying some grob commands, but they appear to be out of date. Any ggplot2 experts out there have any suggestions?
Below is a plot of the Olympic medals won by year for the top 26 countries (by medal count in the 2008 Olympics) for the years 1952 through 2008. The size of each pie chart is proportional to the number of medals won by a team on a square root scale, and each pie chart shows the breakdown of total medals by type (gold, silver, or bronze). So you can see, for instance, that in 2008 China won about 100 medals (in fact, exactly 100), and it’s easy to see that over half of them were gold. You can also see that the United States won slightly more medals than China (110 to be exact), but its medals were nearly evenly distributed across the three types (36 gold, 38 silver, 36 bronze).
The graph below is the same as above, but using raw values instead of a square root scale. This demonstrates why I used a square root scale.
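One reading of “proportional … on a square root scale” is that each pie’s radius grows with the square root of the medal count, so that its area is proportional to the count. The Python sketch below assumes that reading (the post does not show its ggplot2 code) and compares it with a raw scale, using the 2008 totals quoted above plus a hypothetical country with 10 medals. The function name `pie_radii` is illustrative only.

```python
import math


def pie_radii(counts, sqrt_scale=True):
    """Pie radius per country, normalized so the largest pie has radius 1.

    sqrt_scale=True: radius grows with sqrt(count), so pie area is
    proportional to the medal count.
    sqrt_scale=False: radius grows with the count itself, so area grows
    with the square of the count.
    """
    transform = math.sqrt if sqrt_scale else float
    scaled = {country: transform(n) for country, n in counts.items()}
    largest = max(scaled.values())
    return {country: s / largest for country, s in scaled.items()}


# 2008 totals quoted in the post, plus a hypothetical 10-medal country.
totals = {"United States": 110, "China": 100, "Country X": 10}

for scale in (True, False):
    radii = pie_radii(totals, sqrt_scale=scale)
    # Relative area is radius squared, since area = pi * r**2.
    areas = {c: round(r ** 2, 3) for c, r in radii.items()}
    print("sqrt" if scale else "raw", areas)
```

With these inputs, the hypothetical 10-medal country gets a radius of about 0.30 of the U.S. pie on the square root scale but only about 0.09 on the raw scale, which leaves its pie with under 1% of the U.S. pie’s area. That illustrates why a raw scale makes countries with few medals hard to see.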
Here is a graph of the top 26 countries by medal count from the 2008 Olympics, across the years from 1896 through 2008.
Last week, I posted some boxplots of Olympic athletes’ ages by sport, which I then updated using ggplot2 to look nicer. I had a few requests for the code that I used to generate these plots, so I posted my code here.
Isomorphismes has suggested (politely) some improvements to my code in this post, “Outer Product of Character Vectors in R”, on R-bloggers.com. I’ve been reluctant to post my code in the past as I don’t consider myself to be a very good coder, but I need to get over this. There is too much to be gained by putting my code out there and having it critiqued by others who have more experience with coding than I do, as opposed to just keeping my crappy code to myself.
Here is a graph of the ATP points for the top 8 tennis players in the world since 2009. It’s interesting to note that since 2009, Andy Murray has been ranked above each of Nadal, Djokovic, and Federer at some point, but never above all of them at once. Also, over that same time period, none of the players currently ranked 5-8 has ever been ranked higher than any of the top 4 players.
About a week ago, Roger Federer won his seventh Wimbledon title, which ties Pete Sampras, and his 17th Grand Slam title overall, extending that record even further. Less (more?) importantly, Mr. Federer regains the ranking of number 1 in the world by the slimmest of margins (11075 to 11000) over Novak Djokovic. Here is my updated plot of the 19 players who have reached number 1 since 1990. Notably (still) absent is Mr. Andy Murray. But as he said himself after his Wimbledon defeat: “I’m getting closer.”
Note: This graph has been updated as it originally included Andy Murray and Benjamin Becker instead of Boris Becker. I apologize for the mistake.