Category Archives: Math Pictures

Chart Of The Day: 2012 NCAA Tournament Bracket Odds … And Pretty Bubbles


Cheers.

A rules question about Super Bowl squares

I just came across the article “A Statistician Shares How To Pick Your Super Bowl Pool Like A Champ” at businessinsider.com.  The author of the article, Jill Krasny, asked edgehogs.com statistician, William Briggs, for some advice:

“You want to pick the scores that are most likely to happen, and look at historical information about how score differentials (i.e., pairings) are most realized,” Briggs said. “You shouldn’t pick squares out of the blue that happen infrequently.”

Then she offers this note:

Note: Some people pick the labels on the rows and columns only after all the boxes have been bought, making the game more random. If your office does it this way, and not all do, these statistics will still help you figure your chance of winning.

I would argue that a fundamental rule of the Super Bowl squares game is that you pick a square BEFORE the numbers have been placed on the grid.  Instead of saying “Some people” in her note, she should say “Almost all people.”  (Am I wrong about this?  I’ve never, ever seen the numbers on the board before the squares are filled in.)

The article is still of some use, though, as you get some idea of what your chances of winning are after you get your numbers.  Of course, the whole premise that the article was written on (you get to choose your numbers) is almost never true.

Finally, they looked at the last 2,822 NFL games, but if you’re interested in complete results for over 14,000 games in a pretty heat map grid format, I’ve compiled that here.

Go Pats.

Cheers.

 

A Tale of Two Bradys: It was the best of his games; it was the worst of his games

Here is an article I wrote for Significance Magazine about Tom Brady and the Super Bowl called “A Tale of Two Bradys: It was the best of his games; it was the worst of his games.”

Go Pats.

Cheers.

 

Super Bowl Squares

I received an email this morning from a friend: “Is there any sort of a statistical breakdown for which are the best numbers to have in a Super Bowl squares pool (for entertainment purposes only)?”

Now, if my friend were going to use this information to gamble, it would be highly unethical.  However, since he clearly stated that it was for “entertainment purposes only,” I feel that I can conduct a study with a clear conscience.

If he had wanted to gamble on it, here is a quick explanation of how that usually takes place.  (According to that website: “Basically, if you are at a party where you don’t have betting squares you are a Communist.”)

Anyway, using data from football-reference.com, I created a ten-by-ten frequency table (using R, of course) of exactly how many times each combination of final-score last digits has occurred in the history of the NFL. You can find the graph here.
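The R code behind that table isn't in the post, but the core idea is simple: only the last digit of each team's final score matters. Here is a minimal sketch, assuming one row per game with the two final scores in hypothetical columns `score_a` and `score_b` (not football-reference.com's real field names). Which team's digit indexes the rows and which the columns is also an assumption, and it matters, since 7-0 and 0-7 are different squares.

```r
# Sketch: 10 x 10 Super Bowl squares frequency table from final scores.
# score_a / score_b are hypothetical column names; the last digit of
# score_a indexes the rows and the last digit of score_b the columns.
squares_table <- function(score_a, score_b) {
  # Only the last digit of each final score matters for the squares
  row_digit <- factor(score_a %% 10, levels = 0:9)
  col_digit <- factor(score_b %% 10, levels = 0:9)
  # Fixed levels 0:9 keep the table 10 x 10 even if a digit never occurs
  table(row_digit, col_digit)
}

# Made-up scores for illustration only (not real NFL results)
games <- data.frame(score_a = c(27, 17, 20, 10), score_b = c(10, 14, 0, 7))
tab <- squares_table(games$score_a, games$score_b)
tab["7", "0"]   # 1 (the 27-10 game)
```

Using factors with fixed levels, rather than tabulating raw digits, is what guarantees rare squares like 2-2 still show up, with a count of 0 if they never occurred.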

Some things to note:

  • 2-2 is the worst square by far.  It’s only happened 5 times in the history of the league.  The fair odds for this square are over 2800-to-1.
  • The best squares are, no surprise, 7-0 and 0-7, occurring 581 and 577 times, respectively.
  • The next-best squares to have are, in order, 0-3, 0-4, 4-7, and 7-4.  Each of these has occurred over 480 times.
  • These 6 outcomes (7-0, 0-7, 0-3, 0-4, 4-7, and 7-4) account for almost 23% of all the NFL games ever played.
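Fair odds compare misses to hits: a square that came up k times in N games has fair odds of (N - k)/k to 1. The total N isn't given in this post, so it is left as an input in this small sketch; for the 2-2 square's 5 hits, odds over 2800-to-1 imply N above 14,005. The helper names `fair_odds` and `coverage` are hypothetical.

```r
# Fair odds against a square that hit `count` times in `total_games` games:
# misses per hit, i.e. (total_games - count) / count to 1
fair_odds <- function(count, total_games) (total_games - count) / count

# Share of all games whose final score landed on any of the given squares
coverage <- function(counts, total_games) sum(counts) / total_games

fair_odds(5, 1005)     # 200, i.e. 200-to-1 (illustrative numbers)
coverage(c(1, 3), 8)   # 0.5
```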

Cheers.

Tebow Mania and Passer Rating

Now that the college season is over, football fans can concentrate on what really matters: Tebow-mania!  Timmy Terrific has led the Denver Broncos to the AFC Divisional Round of the NFL playoffs.  The Broncos, who started the season 1-4, turned things around with Mr. Tebow at the helm, winning 7 of their last 11 games, many in dramatic fashion.  That was good enough to squeak into the playoffs at 8-8 and even earn a home playoff game.  In that game, played Sunday, January 8th, they drew the heavily favored defending AFC champions, the Pittsburgh Steelers, who took an early 6-0 lead.  Denver battled back with a big second quarter, but the Steelers made their own charge and the game ultimately went to overtime.  This was the first playoff game to go to overtime since the NFL adopted its new overtime rules.  Previously, overtime was sudden death: the first team to score, by touchdown, field goal, or safety, won the game.  Under the new rules, only a touchdown on the first possession ends the game immediately; a field goal gives the other team a chance to possess the ball.  As it turned out, Denver won the coin toss (Pittsburgh called tails) and needed only one play to score a touchdown: an 80-yard pass over the middle that went the distance.

One reason this happened was that the Steelers had been bringing a lot of defenders close to the line of scrimmage, since they did not believe Tebow could beat them with his passing ability.  It was widely believed among the “experts” that Tebow, one of the greatest college football players of all time, and his style of play would not translate to success in the NFL.  Many people still believe this.

I hadn’t really thought much about Tebow one way or the other until one of his stats caught my eye.  In Denver’s week 10 win over the Kansas City Chiefs, Tebow’s passer rating was 102.6 based on 2 completions in 8 attempts, good for 69 yards, 1 touchdown, and no interceptions.  For some context, 102.6 was the 7th-best rating among starting quarterbacks that week.  This seemed odd to me, since going 2-for-8 for 69 yards and 1 touchdown seems like a terrible game.

So this got me wondering: What exactly is passer rating?  This website describes the formula in detail along with some of its history, but the basics are as follows.

1. Compute completions divided by passing attempts, subtract 0.3, and multiply by 5.
2. Compute yards divided by passing attempts, subtract 3, and multiply by 0.25.
3. Compute touchdowns divided by passing attempts and multiply by 20.
4. Compute interceptions divided by passing attempts, multiply by 25, and subtract the result from 2.375.

If any of the four results is less than 0 or greater than 2.375, that component is set to the nearest bound (0 or 2.375).  Now add the four, possibly adjusted, components together, multiply by 100, and divide by 6.  Since each component is capped at 2.375, the maximum possible rating is 4 × 2.375 × 100 / 6 ≈ 158.3.  (I swear I didn’t just make all of that up; the NFL actually uses this.)
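The four steps translate directly into a short R function.  This is a sketch of the formula as described above (each component clamped to the range 0 to 2.375), not the NFL's official code; the function and argument names are hypothetical.  Plugging in Tebow's week 10 line reproduces his 102.6.

```r
# Passer rating from the four-component formula described above (a sketch).
# Each component is clamped to [0, 2.375] before summing.
passer_rating <- function(comp, att, yds, tds, ints) {
  clamp <- function(v) pmin(pmax(v, 0), 2.375)
  a    <- clamp((comp / att - 0.3) * 5)    # completion percentage
  b    <- clamp((yds / att - 3) * 0.25)    # yards per attempt
  c_td <- clamp(tds / att * 20)            # touchdown rate
  d    <- clamp(2.375 - ints / att * 25)   # interception rate
  (a + b + c_td + d) * 100 / 6
}

# Tebow, week 10 vs. Kansas City: 2 of 8, 69 yards, 1 TD, 0 INT
round(passer_rating(2, 8, 69, 1, 0), 1)      # 102.6

# Every component at its cap gives the maximum: 4 * 2.375 * 100 / 6
round(passer_rating(10, 10, 200, 10, 0), 1)  # 158.3
```

The arithmetic also explains the oddity: Tebow's completion component is clamped up to 0 instead of going negative, while his touchdown rate (1 in 8) is high enough to be clamped down to the 2.375 cap, and zero interceptions earn the full 2.375 as well.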

Now, since I like football and I love R, I decided to do some graphical exploring with passer rating.  Since the only topic anyone wants to talk about in the NFL right now is Tim Tebow, I figured I had to look at him.  And who better to compare him to than his opponent next week, three-time Super Bowl champion Patriots quarterback Tom Brady?  Using the data from their regular season games (Tebow started 11 games and came in at halftime in week 5; Brady started all 16), I created these graphs for Tom Brady and Tim Tebow.  Each individual graph shows how passer rating would vary with the number of completions and total passing yards for a fixed number of passing attempts, touchdowns, and interceptions.  The green dot in each plot marks where the quarterback actually fell in that week’s game.
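Each of those graphs holds attempts, touchdowns, and interceptions fixed and lets completions and yards vary.  One way to build such a surface is with `outer()` over a grid.  This is a sketch, not the code behind the original graphs: the 0 to 500 yard range, the 10-yard step, and the helper `rating_grid` are assumptions, and the plotting uses plain base graphics.

```r
# Sketch: passer rating surface over completions x yards for fixed
# attempts, touchdowns, and interceptions (vectorized with pmin/pmax).
passer_rating <- function(comp, att, yds, tds, ints) {
  clamp <- function(v) pmin(pmax(v, 0), 2.375)
  (clamp((comp / att - 0.3) * 5) + clamp((yds / att - 3) * 0.25) +
     clamp(tds / att * 20) + clamp(2.375 - ints / att * 25)) * 100 / 6
}

# Hypothetical helper: rows = completions 0..att, columns = passing yards
rating_grid <- function(att, tds, ints, yds = seq(0, 500, by = 10)) {
  comp <- 0:att
  # outer() passes att, tds, ints through to passer_rating
  g <- outer(comp, yds, passer_rating, att = att, tds = tds, ints = ints)
  dimnames(g) <- list(comp, yds)
  g
}

# Same fixed inputs as Tebow's week 10 game: 8 attempts, 1 TD, 0 INT
yds <- seq(0, 500, by = 10)
g <- rating_grid(att = 8, tds = 1, ints = 0, yds = yds)
image(0:8, yds, g, xlab = "Completions", ylab = "Passing yards",
      main = "Passer rating: 8 attempts, 1 TD, 0 INT")
contour(0:8, yds, g, add = TRUE)
points(2, 69, pch = 19, col = "green")   # where the actual game fell
```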

What stands out to me in looking at these graphs is Brady’s consistency.  The green Brady dot almost always sits in the upper right of the graph.  Week in and week out he puts up around 300 yards (with the occasional 517-yard game thrown in) and a completion percentage in the mid to high 60s.  In fact, Brady completed over 50% of his passes in every single game this season.

Tebow, on the other hand, is, to put it politely, all over the place.  In week 13, Tebow put up a nearly perfect passer rating of 149.3, almost 14 points higher than Brady’s best passer rating of the season.  Yet in 5 of the 12 games in which Tebow played, his passer rating was lower than Brady’s worst of the season, 75.4.  So you could say that almost half of the time this season, Tebow was worse than Brady’s worst.

All of this suggests that the Broncos should lose to the Patriots.  Based on the stats, Brady is too good and Tebow too inconsistent for Denver to pull off a victory.  Of course, while you may find all of this interesting, in the end none of these numbers or pretty pictures mean anything at all to Tim Tebow, who, as they say, only cares about one stat, and that’s winning.

Republican Presidential Candidates and Google Auto-Complete – 12/29/2011

Auto-complete terms for Republican candidates for 12/28/2011.  I searched for these after signing out of Google, and I “disabled customizations based on search activity” so that my search history would not interfere with the auto-completes.

Here is a plot based on the Google auto-complete search data.

And here is the complete data for Romney, Perry, Cain, Paul and Gingrich.

“Mitt Romney”     “Rick Perry”      “Herman Cain”       “Ron Paul”        “Newt Gingrich”
wiki              gay               wiki                2012              scandal
bio               drunk             999                 wiki              affair
net worth         wiki              pokemon             polls             bio
political views   new hampshire     sexual harassment   on gay marriage   gay marriage
for president     issues            quotes              quotes            issues
economic plan     debate            net worth           issues            wives
racist            bad lip reading   abortion            abortion          polls website
evolution         hunting lodge     scandal             news              website
abortion          speech            Libya               ron paul          quotes
taxes             video             smoking ad          debate            books

Bachmann, Santorum, and Huntsman.

“Michele Bachmann”   “Rick Santorum”   “John Huntsman”
quotes               gay               daughters
hot                  wiki              net worth
newsweek             scandal           sr
corndog              for president     jr
bio                  quotes            political views
jimmy fallon         fetus             issues
husband              on the issues     chinese
crazy                biography         twitter
hpv                  evolution         abortion
twitter              twitter           polls

Cheers!

Multidimensional Scaling, Republican Presidential Candidates, and “a douchebag”

If you don’t want to read this whole thing, just check out the graph: Multidimensional Scaling: Republican Candidates – 8/16/2011

I was having a conversation with some friends today, and someone mentioned that Rick Perry might have problems in the election because there were rumors he was gay.  So I went to Google and typed in “Rick Perry is”, and Google kindly offered me the following auto-complete options: “gay”, “an idiot”, “a rino”, “evil”, “not a conservative”.  This got me wondering how this compared with the other candidates’ Google auto-completes.  For instance, if you google “Mitt Romney is”, you get suggestions like “a mormon” and “an idiot” as well as three others.  I did this for all of the major candidates (sorry Thaddeus) and recorded the five Google auto-complete suggestions for each.

Then I created a vector for each candidate based on the Google auto-complete words.  Each candidate was an observation and each word was a variable.  A candidate got a 5 if the word was first on their list, a 4 if it was second, and so on down to a 1 for the fifth, with a 0 if the word did not appear in their auto-completes at all.

I then used multidimensional scaling (the cmdscale function in R) to display the relative positions of the candidates visually.  This all led to this graphic: Multidimensional Scaling: Republican Candidates – 8/16/2011.  The location of each circle is based on multidimensional scaling, the size of each circle reflects the candidate’s standing in a national poll taken from fivethirtyeight.com, and the top five Google auto-completes are displayed in or near the appropriate circle.
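The scoring scheme and the `cmdscale` step can be sketched as follows.  Perry’s five suggestions are the ones quoted above; the other three candidates and their lists are hypothetical, made up only to give `cmdscale` something to compare.  Euclidean distance via `dist()` is also an assumption, since the post doesn’t say which distance was fed to `cmdscale`.

```r
# Build the candidate x term matrix: 5 points for the first auto-complete,
# 4 for the second, ..., 1 for the fifth, and 0 if the term is absent.
score_matrix <- function(suggestions) {
  terms <- sort(unique(unlist(suggestions)))
  m <- matrix(0, nrow = length(suggestions), ncol = length(terms),
              dimnames = list(names(suggestions), terms))
  for (cand in names(suggestions)) {
    s <- suggestions[[cand]]
    m[cand, s] <- rev(seq_along(s))   # 5, 4, 3, 2, 1 for a 5-item list
  }
  m
}

# Perry's list is quoted in the post; Candidates A-C are hypothetical
suggestions <- list(
  "Perry"       = c("gay", "an idiot", "a rino", "evil", "not a conservative"),
  "Candidate A" = c("an idiot", "hot", "stupid", "evil", "awesome"),
  "Candidate B" = c("an idiot", "a douchebag", "gay", "hot", "stupid"),
  "Candidate C" = c("a douchebag", "an idiot", "awesome", "evil", "gay")
)
m <- score_matrix(suggestions)

# Classical (metric) multidimensional scaling down to two dimensions
coords <- cmdscale(dist(m), k = 2)
plot(coords, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
text(coords, labels = rownames(coords))
```

Candidates whose lists share highly ranked terms end up with similar rows, so they land close together in the two-dimensional layout.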

Some thoughts:

  • Every single candidate has the term “an idiot” in either the first or second auto-complete term
  • 3 candidates were listed as “hot” (Palin, Bachmann, and Romney)
  • “stupid” was only used to describe women
  • Perry and Santorum (who has a much bigger Google problem than anything I’ve listed here) had “gay” listed in their auto-completes, and Pawlenty had “definitely not gay”
  • Bachmann’s and Palin’s circles are nearly identical in size (11.7% and 11.4%, respectively) and words (they share “an idiot”, “hot”, and “stupid”)
  • “a douchebag” appears in auto-completes for Santorum, Gingrich, and Pawlenty.  I imagine it will be hard to win with this word attached to your name. (John Kerry couldn’t do it.)
  • The only overwhelmingly positive google auto-complete was for Herman Cain whose fifth auto-complete option was “awesome”
It can’t be good for Perry that he is so close to Pawlenty and Santorum, but he does have a significant amount of support at this point.  I’ll be interested to see how these Google auto-completes change over time and with the polls.
For information on how Google auto-complete works, click here.
Cheers.

Chernoff Faces from aplpack

I’ve been playing around with the faces function from the R package aplpack.  I haven’t used it in a while, and there are some features that are either new or that I had never noticed before.  Color has been added to the faces, and you can now draw the faces at chosen positions on an existing plot.  There is also the superfluously fantastic option of displaying the faces as Santa Claus.

Here are some of my examples:

Golf: Statistics from several of my friends collected via oobgolf.com.  (I’m SITW on the lower right.) The face is handicap, the mouth is scoring average, the eyes are average putts, the hair is the percentage of fairways hit, nose is greens in regulation (GIR), and ears are the total number of rounds you play. The faces are plotted with fairway percentage on the x-axis and GIR on the y-axis.

Santa_Golf: Same golf data with Santa option.

NFL2010: Final NFL regular season team statistics.  The face represents the offense, and the defense is represented by the hair.  The size of the nose indicates sacks, and the ears indicate turnovers (ear width is interceptions; ear height is forced fumbles).  The eyes indicate penalties and, finally, the size of the mouth indicates wins, with a smiling face if the team made the playoffs (a really nice touch, if you ask me).  The face at the bottom right indicates the league leader.

Some observations on the NFL faces:  The two Super Bowl teams last year (Pittsburgh and Green Bay) are both located at the bottom of the graph, and their faces look very, very similar.  San Diego looks similar to both Green Bay and Pittsburgh (similar face, nose, eyes, and hair), but the big differences are the ears and, of course, the fact that the San Diego face is frowning.  Another thing that pops out at me is how similar Houston and New England look.  They have very similar face shapes, eyes, and hair.  The big differences are the nose and ears (sacks and turnovers).

 

Cheers.

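##NFL CODE

library(aplpack)

## Read the 2010 team statistics (forward slashes: "\S" is not a valid escape in an R string)
x <- read.csv("/StatsInTheWild/NFL2010.csv", header = TRUE)

## Add a 33rd row as a placeholder for the "league leader"
x[33, ] <- x[32, ]

## Team abbreviations in alphabetical order; "ZZ" sorts last and is replaced below
x$abbr <- sort(c("NE", "NYJ", "Mia", "Buf", "Pit", "Bal", "Cle", "Cin", "Ind", "Jac", "Hou",
                 "Ten", "KC", "SD", "Oak", "Den", "Phi", "NYG", "Dal", "Was", "Chi", "GB", "Det", "Min", "Atl",
                 "NO", "TB", "Car", "Sea", "StL", "SF", "Ari", "ZZ"))
x$abbr[27:28] <- c("SF", "Sea")   # ensure SF precedes Sea regardless of the locale's sort order
x$abbr[33] <- "League Leader"

## Time of possession in whole minutes (first two characters of TOP.x)
x$TOP <- as.numeric(substring(x$TOP.x, 1, 2))

##Playoff Teams: creating a playoff indicator (row numbers of the 12 playoff teams)
rows <- c(2, 3, 6, 12, 14, 16, 19, 20, 22, 24, 25, 28)
x$playoffs <- rep(0, 33)
x$playoffs[rows] <- 1

##Finding the league leader in all variables: the maximum for numeric columns...
num <- sapply(x, is.numeric)
x[33, num] <- sapply(x[, num], max)
## ...except points allowed, penalties, and defensive yardage, where lower is better
def <- c(6, 22:23, 26:29)
x[33, def] <- sapply(x[, def], min)

## Labels like "NE: 14" (abbreviation and wins), built after the league-leader row is filled in
x$lab <- paste(x$abbr, x$W, sep = ": ")

##Defining the names
names(x)[c(2, 3)] <- c("Wins", "Losses")
names(x)[c(13, 14, 15, 16)] <- c("Off PPG", "Off YPG", "Off Pass", "Off Rush")
names(x)[c(22, 23)] <- c("Penalties", "Pen Yards")
names(x)[c(26:29)] <- c("Def PPG", "Def YPG", "Def Pass", "Def Rush")
names(x)[c(5:6)] <- c("Points For", "Points Against")

pdf("/StatsInTheWild/NFL2010.pdf", width = 15, height = 10)

## Sort the rows by the fourth column
x <- x[order(x[, 4]), ]

##Columns used for plotting: Points For (x-axis) and Points Against (y-axis)
plot.cols <- c(5, 6)

##Columns used for the 15 facial features, in the order faces() assigns them:
## face (height, width, structure) = offense; mouth (height, width, smile) = wins and playoffs;
## eyes = penalties; hair = defense; nose = sacks; ears = interceptions (width), forced fumbles (height)
col <- c(15, 16, 14, 2, 2, 41, 22, 23, 28, 29, 27, 36, 36, 30, 32)

##creating the faces without plotting them
a <- faces(x[, col], labels = x$lab, face.type = 1, plot.faces = FALSE)

##creating text for the legend: which variable drives each facial feature
g <- paste(a[[2]][, 1], a[[2]][, 2], sep = ": ")

##building the plot: scatterplot of Points For vs. Points Against, plus the legend
plot(x[, plot.cols], bty = "n", xlim = c(200, 600), main = "2010 NFL Season")
text(rep(540, 15), seq(475, 325, length.out = 15), g)

##plotting the faces at each team's (Points For, Points Against) position
plot.faces(a, x[, plot.cols[1]], x[, plot.cols[2]], width = 30, height = 30)

dev.off()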

Pretty pictures from R (in the wild)

Awesome: The R graphics gallery.

Cheers.