Category Archives: R

Biggest NFL Spread Changes in Rematches in 2018

The Bears just clinched the NFC North for the first time in, I want to say, 100 years, by beating the Green Bay Packers last weekend at Soldier Field.  Their week 15 meeting was the second time these division rivals have played this season and their first meeting came way back in week 1 when Chicago blew a 20 point lead and they looked well on their way to a 5-11 season, while Aaron Rodgers looked like Superman.  But that was a long time ago and everyone seems to have caught up to the idea that the Bears are good this year and Green Bay is not.  And you can see this in the spreads for the two games.

In the first meeting at Lambeau Field in week 1, the Packers were 6.5 point favorites over the Bears, who covered despite losing in crushing fashion.  Fourteen weeks later, the spread for the Bears-Packers game at Soldier Field was Bears -5.5.  That is a shift of 12 points.

Now some of this has to do with home field advantage.  If two teams were essentially equal on a neutral field, you’d expect this difference to be about 6-ish (-3 at home and +3 away).  But 12 seemed rather large to me, and I wondered if that was the largest shift in spreads in a rematch this year.  While it is not, in fact, the largest, it is close.  There were two matchups that had a larger shift in spreads.  Stop reading and try to guess what those match-ups were.


Ok.  You ready now?  The largest difference this year was between the Titans and Jaguars.  In week 3 in September, the home Jaguars were favored by 10 over the Titans.  In week 13, the Titans at home were favored by 5.5, for a 15.5 point swing.  Coming in at number 2 were Atlanta and New Orleans.  In their first meeting in week 3, the Falcons were favored by 2.  In week 11, the Saints were favored by 12.5.  The previously mentioned Bears and Packers came in third, with a shift of 12.0.


The only other two double digit shifts were Buffalo-NY Jets and Dallas-Philadelphia.  The Bills vs Jets shift happened in only 4 weeks.  In week 10, the Jets were favored by 7, then in week 14 the Bills were favored by 4.5.  Rounding out the top five were the Cowboys and Eagles.  In week 10, the Eagles were favored by 7 points, then by week 14 the Cowboys were favored by 3.5.  Here is the list of all of the shifts of at least 5:

  1. Jacksonville – Tennessee: 15.5
  2. Atlanta – New Orleans: 14.5
  3. Chicago – Green Bay: 12.0
  4. Buffalo – New York Jets: 11.5
  5. Dallas – Philadelphia: 10.5
  6. Kansas City – LA Chargers: 7.0
  7. Dallas – Washington: 6.0
  8. Miami – New York Jets: 6.0
  9. San Francisco – Seattle: 5.5
  10. Baltimore – Cincinnati: 5.5
  11. Cleveland – Pittsburgh: 5.5
  12. LA Chargers – Oakland: 5.0
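
The shift calculation is easy to reproduce.  Here is a minimal sketch in R using the five double digit rematches described above.  The sign convention (negative means the first-listed team was favored) and the column names are assumptions made for illustration, and the 6 point home field baseline is the rough "3 points each way" figure mentioned earlier.

```r
# Spreads from the first-listed team's point of view; negative = favored.
# (This sign convention and the column names are assumptions.)
rematches <- data.frame(
  matchup = c("Jacksonville - Tennessee", "Atlanta - New Orleans",
              "Chicago - Green Bay", "Buffalo - New York Jets",
              "Dallas - Philadelphia"),
  first   = c(-10.0, -2.0,  6.5,  7.0,  7.0),
  second  = c(  5.5, 12.5, -5.5, -4.5, -3.5),
  stringsAsFactors = FALSE
)

# Shift = absolute change in the spread between the two meetings.
rematches$shift <- abs(rematches$second - rematches$first)

# Swapping home field alone is worth roughly 6 points (3 each way),
# so subtract that to see how much the view of the teams itself moved.
rematches$beyond_home_field <- rematches$shift - 6

# Largest shift first.
rematches <- rematches[order(-rematches$shift), ]
print(rematches, row.names = FALSE)
```

By this rough measure, about half of the Bears-Packers swing is just the change of venue.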

I’ll follow up on this when the season ends, and I also want to go back and look at past seasons.



Obama made out of letters

Per the request of @M_T_Patterson and @statsByLopez

The code to generate the following image is below.  Enjoy.


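[code language="r"]
library(EBImage)  # assumption: readImage() comes from EBImage; the original library() call is not shown

#im <- readImage("/Users/gregorymatthews/Desktop/guyAlley.jpg")
im <- readImage("/Users/gregorymatthews/Desktop/barry.png")

# ddd is never defined in the surviving code; dim(im) is assumed here
ddd <- dim(im)

for (i in 1:ddd[1]) {
  print(i)  # progress indicator
  for (j in 1:ddd[2]) {
    # the rest of the loop body is missing from the source
  }
}
[/code]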
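
Only a fragment of the original code is shown above, so the following is not the code that made the image.  It is a minimal base R sketch of the general idea, assuming one random letter per pixel, shaded by how dark that pixel is.  The toy `img` matrix and the `cells` data frame are made up for illustration; a real image would be read from a file instead.

```r
# Toy grayscale "image": 1 = dark, 0 = light. This stand-in matrix (a dark
# disc on a light background) lets the sketch run without an image file.
img <- outer(1:20, 1:30,
             function(r, k) as.numeric((r - 10)^2 + (k - 15)^2 < 60))

# One row per pixel, with its darkness and a randomly chosen letter.
set.seed(1)
cells <- expand.grid(row = seq_len(nrow(img)), col = seq_len(ncol(img)))
cells$dark   <- img[cbind(cells$row, cells$col)]
cells$letter <- sample(LETTERS, nrow(cells), replace = TRUE)

# Draw each letter at its pixel position; darker pixels get darker letters.
# Rows are flipped so that row 1 of the matrix is at the top of the plot.
plot(NULL, xlim = c(1, ncol(img)), ylim = c(1, nrow(img)),
     axes = FALSE, xlab = "", ylab = "", asp = 1)
text(cells$col, nrow(img) + 1 - cells$row, cells$letter,
     col = gray(0.85 - 0.75 * cells$dark), cex = 0.6)
```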


Fully open-source, transparent implementation of Wins Above Replacement: Results from 2013

Over the past year, I’ve been involved in a project with Ben Baumer (buy his book!) and Shane Jensen to develop an open source, completely transparent version of the (rather opaque) baseball statistic Wins Above Replacement that we’re calling openWAR.  We presented our preliminary results this past summer in a talk at JSM and this fall in a poster at NESSIS, but now our full paper is available on arXiv.  (Below you can see the chalkboard that resulted from our initial discussion….I assume this will be historic someday.)


As part of our open source proposal, we’ve also developed an R package, also called openWAR, that allows the user to scrape play by play data from the web and then, if they choose, compute our version of WAR.  The package is currently available on Ben’s github and should be available on CRAN soon.  (Jim Albert (!) mentioned this package in his recent book, which you should probably buy even if it didn’t have my name in it.  You should buy it twice, since my name is in it.)

Quick story about Jim Albert: When I was deciding where to go to grad school I applied to Bowling Green specifically because Jim Albert was there.  I got in and even had an email address and was all set to go, but they couldn’t give me an answer about funding.  UConn came along and offered me full funding, and the rest is history.  So it’s a pretty big honor for me to be mentioned in Jim Albert’s book.


So what are our results?  Below you’ll find our top 20 players for 2013.  One interesting thing to note is that according to our openWAR, Trout actually had a better year in 2013 than in 2012 and he still didn’t win the MVP award.

Here is a comparison of our top 10 players from 2013 versus FanGraphs’ top 10 players.  Both methods agree that Mike Trout was the best player in 2013, and both methods had Josh Donaldson, Miguel Cabrera, Chris Davis, and Paul Goldschmidt in the top 10.


Next is a table of the ten best and worst fielders of 2013.  What you should notice about this is that Miguel Cabrera, according to openWAR, was the worst fielder in baseball in 2013.  It’s really incredible that his offensive numbers are so good that they more than compensate for his poor fielding.

The best base runner of 2013 was Ian Kinsler with an RAA of 10.64 and the worst base runner was Victor Martinez.  The ninth worst base runner in 2013 was….Miguel Cabrera.  Again, think about how good Cabrera has to be as a hitter to overcome his weaknesses as a fielder AND a baserunner to have won TWO AL MVP awards in a row.


MLB Payroll vs Winning percentage

Dave Cameron over at FanGraphs wrote an interesting article about 2012 payroll and wins.  In it, he used a scatterplot, which I assume was made with Excel.  I’d like to try to persuade everyone to stop making graphics in Excel.  I’m probably a little bit biased, but R with the ggplot2 package is much, much better.  (And it’s easy!)  I present to you below my entire argument for why R with ggplot2 is better than Excel:

Here is the code for making this graph
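
The original code is not reproduced here.  As a stand-in, here is a minimal sketch of a payroll versus winning percentage scatterplot in ggplot2.  The `teams` data frame holds made-up placeholder numbers for hypothetical teams, not the real 2012 payrolls or records, and the plot is only drawn if ggplot2 is installed.

```r
# Placeholder numbers only: made-up payrolls (in $ millions) and winning
# percentages for hypothetical teams, NOT the real 2012 figures.
teams <- data.frame(
  team    = paste("Team", LETTERS[1:6]),
  payroll = c(55, 75, 90, 110, 150, 195),
  win_pct = c(0.580, 0.450, 0.520, 0.540, 0.500, 0.590),
  stringsAsFactors = FALSE
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(teams, aes(x = payroll, y = win_pct)) +
    geom_point() +
    # Straight-line fit to show the overall payroll/wins trend.
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
    geom_text(aes(label = team), vjust = -0.8, size = 3) +
    labs(x = "Payroll ($ millions)", y = "Winning percentage")
  print(p)
}
```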


Presidential Candidates, Search Engine Auto-Complete, and Word Clouds: Bicycles, Unicorns, and American Presidential Politics

Finally, I’ve managed to post something that’s not about @BillBarnwell’s flawed “study” titled “Mere Mortals” (Here’s why he’s wrong.  Here is what happens when I apply his logic to something else….you get nonsense).  Anyway, here is an update to the presidential candidates’ search engine auto-complete word clouds (the original post, with a description of how the data are collected and processed, is here).

According to search engines, Obama is a gay, socialist/communist, Muslim terrorist version of the Antichrist (or possibly not even a living thing, but a bicycle), and Romney is an idiot, douche bag, ass hole, Mormon unicorn that lies.


Idiots, Liars, and Unicorns: Presidential Politics and Search Engines

In the past I’ve posted search engine auto-completes for some of the presidential candidates.  For instance, here are Romney and Obama’s results from 5/30/2012, here are Romney and Obama’s results from 4/16/2012, and here are the Republican primary candidates from 12/29/2011.  Below you will find the auto-completes for the two presidential candidates from 8/16/2012.  I’m also including word clouds now.

I’m using three search engines (Google, Bing, and Yahoo!) and two search terms for each candidate (Mitt Romney, Mitt Romney is, Barack Obama, Barack Obama is).  I’m then weighting the terms from 10 to 1 for Google and Yahoo and 8 to 1 for Bing (as they only return 8 search terms), based on the order they appear in the auto-completes.  For the first two word clouds, I’m additionally weighting the search engines with Google getting weight 11.7, Bing gets 2.7, and Yahoo gets 2.4.  (These numbers are approximately the number, in billions, of searches performed on each site respectively in February 2012.)
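
The weighting described above can be sketched in a few lines of base R.  This is a minimal illustration, not the code used for the word clouds: the `suggestions` lists are shortened, hypothetical placeholders rather than the collected auto-completes, and the names `engine_weight`, `top_weight`, and `scored` are made up for the example.

```r
# Engine weights from the post (roughly billions of searches, Feb 2012).
engine_weight <- c(Google = 11.7, Bing = 2.7, Yahoo = 2.4)
# Weight of the first suggestion from each engine (10 or 8 results returned).
top_weight <- c(Google = 10, Bing = 8, Yahoo = 10)

# Hypothetical, shortened auto-complete lists in the order returned.
suggestions <- list(
  Google = c("term_a", "term_b", "term_c"),
  Bing   = c("term_b", "term_d"),
  Yahoo  = c("term_a", "term_d", "term_e")
)

# Order weight counts down from the top weight; multiply by engine weight.
scored <- do.call(rbind, lapply(names(suggestions), function(engine) {
  terms <- suggestions[[engine]]
  order_weight <- top_weight[[engine]] - seq_along(terms) + 1
  data.frame(term = terms,
             weight = order_weight * engine_weight[[engine]],
             stringsAsFactors = FALSE)
}))

# Total weight per term across engines, largest first.
totals <- sort(vapply(split(scored$weight, scored$term), sum, numeric(1)),
               decreasing = TRUE)
print(totals)
```

Dropping the `engine_weight` factor leaves only the order-based weights, which is the weighting used when the terms are broken out by search engine.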

The first word cloud represents all of the words with weighting for both presidential candidates.  Kind of makes you think a little bit about the political discourse in this country when some of the top words for presidential candidates are idiot, liar, and antichrist.  (For those of you new to the internet, here is the explanation for “your new bicycle”.)

This next word cloud is the same as the previous one, except it is separated by candidate.  The blue and red words are Obama and Romney, respectively.  If you’re wondering about the “Unicorn” on the Romney side of the word cloud, you may be interested in this Facebook page.  According to them, “There has never been a conclusive DNA test proving that Mitt Romney is not a unicorn. We have never seen him without his hair — hair that could be covering up a horn. No, we cannot prove it. But we cannot prove that it is not the case.”  Truer words have never been spoken….

The final word cloud of the trio breaks down the auto-complete terms by search engine.  Note that the words in this word cloud are not weighted by search engine, but they are weighted by order within each search engine.  I think it’s kind of interesting that, for Yahoo, the big words are religions: Muslim and Mormon.  This makes me wonder if the choice of search engine might in some way predict political affiliation, and, apparently, I’m not the only one who’s thought about this.  Looks like a group called Engage has already looked into this and their results are summarized nicely in this graphic.  According to them, Googlers tend to be more Democratic and Bingers (?) tend to be more Republican.  (I don’t see Yahoo on their graphic, which I find odd.)  Also, Bing users reportedly tend to be older than the average internet user, slightly more likely to have “some college” education, and slightly less likely to have a graduate degree.  Google and Yahoo users tend to be very much the average internet user with the exception that they are much less likely to be over the age of 65.

Below here, you’ll find screen shots of the Google, Yahoo, and Bing auto-completes if you’re interested in the raw data that I used.







Low hit, no hit, and perfect games: The King Felix Edition



Felix Hernandez of the Seattle Mariners just threw the third perfect game of the season, so I figured it was a good time to update my low hit games graphs that I posted in June.  So, here they are:



MLB Rankings – 8/14/2012

StatsInTheWild MLB rankings as of August 6, 2012 at 8:17pm.  SOS=strength of schedule

Team Rank Change Record ESPN SOS Run Diff
NYY 1 63-44 3 1 5 +92
Texas 2 63-44 4 2 13 +83
LA Angels 3 58-51 9 6 7 +49
Washington 4 ↑2 65-43 2 4 23 +82
ChiSox 5 ↑4 59-48 7 5 14 +64
Cincinnati 6 ↑5 66-42 1 3 29 +72
Oakland 7 ↑1 58-50 8 7 8 +28
Detroit 8 ↓1 58-50 13 9 12 +24
TampaBay 9 ↑1 56-52 14 12 3 +19
Atlanta 10 ↑5 62-46 5 8 21 +62
Boston 11 ↓6 54-55 17 13 6 +29
Toronto 12 ↓8 53-55 18 14 2 +9
St. Louis 13 ↑1 59-49 10 15 30 +110
Baltimore 14 ↓2 57-51 16 11 1 -57
Pittsburgh 15 ↓2 61-46 6 10 28 +36
Seattle 16 ↑1 51-59 20 17 4 -3
Arizona 17 ↑4 55-53 15 19 26 +42
SF 18 ↓2 59-49 11 16 27 +19
LA Dodgers 19 ↓1 59-50 12 18 25 +15
NY Mets 20 53-56 19 20 17 -5
Minnesota 21 ↑3 47-61 25 22 11 -79
Kansas City 22 45-62 26 23 10 -60
Cleveland 23 ↓4 50-58 21 21 9 -90
Milwaukee 24 ↓1 48-59 22 26 19 -13
Philadelphia 25 49-59 24 24 19 -29
Miami 26 49-60 23 25 15 -100
Chic Cubs 27 ↑1 43-63 28 27 18 -79
San Diego 28 ↓1 46-64 27 28 22 -61
Colorado 29 38-68 29 29 20 -117
Houston 30 36-73 30 30 16 -142


2012 Olympics – Final Medal Count

Canada is incredible.  They somehow managed to win 18 total medals and only 1 gold.  Amazing.  How is it possible to be so consistently nearly the best?  Has anyone ever won more medals with fewer golds?


All-Star Plots?

Here are some star plots for major league baseball batters, pitchers, and ball parks.  The star plots represent the outcomes of a particular at bat for a hitter, a pitcher, or at a given ball park.  For each plot, one of batter, pitcher, or ball park was varied, while the other two parameters were filled in with the average value.  For instance, all batters’ outcomes are calculated as if they were facing J. Kinney at Wrigley Field; pitchers’ data were calculated as if they were facing K. Medlen at Wrigley; and park factors were calculated as the outcome of J. Kinney vs K. Medlen at different ball parks.

The data used to calculate these were downloaded and include the result of every single plate appearance so far this season (about 125,000) and where the game was played.  Six outcomes to an at bat were considered: out, walk, single, double, triple, and home run.  The probability of each of these events was estimated, creating a vector of probabilities with six elements, one for each of the six outcomes considered.

I’ve chosen to display these data using the star plots below.  The key to the star plot can be found in the lower left corner of each plot and displays the probabilities of each outcome relative to other batters.  For instance, a large blue pie piece on the left indicates that the batter’s plate appearance ends with a HR more often relative to other players.  Likewise, a large red pie piece on the right indicates that the batter’s plate appearance ends in an out more often than other players.
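
To make the idea concrete, here is a minimal sketch using the `stars()` function from base R graphics.  It is not the model behind the plots in this post: it uses raw proportions rather than estimates that adjust for pitcher and park, and the `counts` matrix holds made-up numbers for three hypothetical batters.

```r
outcomes <- c("out", "walk", "single", "double", "triple", "home_run")

# Hypothetical plate-appearance counts for three made-up batters
# (placeholders, not real data); each row totals 600 plate appearances.
counts <- rbind(
  batter_a = c(400, 60,  90, 30, 3, 17),
  batter_b = c(420, 25, 120, 20, 8,  7),
  batter_c = c(414, 80,  50, 20, 1, 35)
)
colnames(counts) <- outcomes

# Six-element probability vector per batter: each row sums to 1.
probs <- prop.table(counts, margin = 1)

# scale = TRUE rescales each column to [0, 1] across batters, so the size
# of a wedge shows how common an outcome is relative to the other batters.
stars(probs, draw.segments = TRUE, scale = TRUE,
      main = "Plate appearance outcomes (toy data)")
```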

I’ve chosen 100 batters based on their wide range of hitting styles.  In the first row, you’ll see players who make outs at the lowest rates relative to other players.  These include players like Joey Votto, Andrew McCutchen, David Wright, and Mike Trout.  Further down, you’ll start to see players who you might describe as singles hitters.  These include players like Ruben Tejada, Derek Jeter, B. Revere, and Juan Pierre.  Finally, towards the bottom row, you’ll see the players who are primarily power hitters like Adam Dunn and Jose Bautista, with large blue (home runs) and orange (walks) pie pieces and significant red for outs.  Other players on this row like Saltalamacchia, Plouffe, and Rosario have the large blue and significant red pieces, but they lack walks.

The star plot for pitchers is below.  The first thing we need to say here is that Justin Verlander is very, very good at pitching a baseball.  Some other interesting pitchers here are Yu Darvish, Edinson Volquez, and Carlos Zambrano.  They seem to give up relatively few hits, but they give up many more walks than the average pitcher.

The ball park plots are ordered from highest to lowest probability that an out will be made in a given plate appearance.  Pittsburgh, Seattle, and San Francisco lead the way in pitcher-friendly parks.  These are the same as the bottom three according to ESPN’s measure of park factor.  The most hitter-friendly park is, no surprise, Coors Field in Colorado.  Other hitter-friendly parks include Target Field and Chase Field in Minnesota and Arizona, respectively.  Arizona is expected here, but Minnesota is a little bit surprising.  It looks like, while it is rare to make an out there, most hits are only singles, which don’t generate as many runs as their extra-base counterparts.  Home run friendly parks include Coors, Chase, Camden Yards, Miller, Comiskey, and Yankee Stadium.  Fenway Park is solidly in the hitter-friendly category, but it gets there by yielding a greater percentage of doubles than any other park rather than by giving up many home runs.


