Category Archives: Politics

Rick Perry and Google Auto-complete

Auto-complete for search “Rick Perry ” on Google over the last couple of weeks. The last row is the polling percentage based on Real Clear Politics polls.

8-17-2011 | 8-30-2011 | 9-6-2011 | 9-9-2011 | 9-12-2011
for president | for president | for president | for president | gay
gay | gay | gay | gay | for president
wiki | wiki | wiki | wiki | wiki
for president website | 2012 | prayer | prayer | prayer
2012 | for president 2012 | 2012 | galileo | secession
18.4 | 23 | 29 | 29 | 31.8

Auto-complete for search “Rick Perry is ” on Google over the last couple of weeks. The last row is the polling percentage based on Real Clear Politics polls.

8-17-2011 | 8-30-2011 | 9-6-2011 | 9-9-2011 | 9-12-2011
gay | gay | gay | gay | gay
an idiot | an idiot | an idiot | an idiot | an idiot
a rino | a rino | crazy | crazy | crazy
evil | evil | nuts | nuts | scary
not a conservative | not a conservative | stupid | stupid | evil
18.4 | 23 | 29 | 29 | 31.8


Republican Presidential Candidates and Multi-dimensional Scaling 3d

So, I’ve got a lot of blog posts that I meant to publish last week, but I never got around to it.  Here is a graph I made using the auto-complete terms from Google, Yahoo, and Bing for Republican presidential candidates.  I looked at the top five auto-completes from each site and scored each word 5 points if it was the first auto-complete, 4 points if it was the second, and so on down to 1 point for the fifth.  I did two searches for each candidate on each site: first using just the candidate’s name and a space, then the candidate’s name followed by the word “is” and a space (for example, “Mitt Romney ” and “Mitt Romney is ”).  I then weighted the search engines by their market share (about 75%, 15%, and 10% for Google, Yahoo, and Bing, respectively).  This gives me a data set with 8 observations (8 candidates) and several dozen variables (one variable for each word).  I then used multi-dimensional scaling to represent the distances between the vectors in, in this case, three dimensions.  The size of each circle is proportional to the candidate’s polling percentage from RealClearPolitics on August 29, 2011 (the same day the auto-completes were collected).  The word appearing in or next to each circle is the word with the highest score for that candidate.
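The scoring and weighting described above can be sketched in a few lines of Python. This is only an illustration, not the code behind the graph: the post does not say exactly how the market-share weights were combined with the rank scores, so a simple weighted sum is assumed here, and the function name `weighted_scores` is made up for the example. The input lists are the top five “Rick Perry is” completions from the 8/29/2011 Google, Yahoo, and Bing table elsewhere in this archive.

```python
# Approximate market-share weights quoted in the post.
WEIGHTS = {"Google": 0.75, "Yahoo": 0.15, "Bing": 0.10}

# Top five "Rick Perry is" completions on 8/29/2011 (from the table in this archive).
COMPLETIONS = {
    "Google": ["gay", "an idiot", "a rino", "evil", "not a conservative"],
    "Yahoo": ["an idiot", "crazy", "a scumbag", "not a conservative", "a republican"],
    "Bing": ["an idiot", "good", "a crook", "bad", "a scumbag"],
}


def weighted_scores(completions, weights, top_n=5):
    """Sum rank points (top_n for first place, 1 for last) across engines.

    Assumption: each engine's points are multiplied by its market-share
    weight and then added together. The post does not spell this step out.
    """
    totals = {}
    for engine, terms in completions.items():
        for rank, term in enumerate(terms[:top_n]):
            points = top_n - rank
            totals[term] = totals.get(term, 0.0) + weights[engine] * points
    return totals


scores = weighted_scores(COMPLETIONS, WEIGHTS)
top_term = max(scores, key=scores.get)  # the label that would go in the circle

for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{term:20s} {score:.2f}")
```

Under this assumption a term that ranks high on all three sites can overtake the top Google suggestion, which is the effect of weighting by market share rather than using Google alone.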

Also, one of Michele Bachmann’s auto-complete terms on Google is “slavery”.  I couldn’t imagine what she had done to warrant this as an auto-complete term, but then I found this article by Andrew Gelman (of the blog Statistical Modeling, Causal Inference, and Social Science).  Yikes.



Tracking Republican presidential candidates via online search auto-complete – 8/29/2011

Auto-complete for “Rick Perry is” on the three big search sites on 8/29/2011.

Google Yahoo Bing
gay an idiot an idiot
an idiot crazy good
a rino a scumbag a crook
evil not a conservative bad
not a conservative a republican a scumbag
evil running for president
awesome right about education
a joke


Michele Bachmann: Internet search auto-complete terms

The tables below are for the Google and Yahoo search “Michele Bachmann ” (including a space after the last name) on the dates indicated. Each column has the date of the search and the top five Google or Yahoo auto-complete terms for that search.

Michele Bachmann – Google

8-17-2011 | 8-22-2011 | 8-23-2011 | 8-29-2011 | 8-31-2011 – present
quotes | quotes | quotes | quotes | quotes
corn dog | husband | husband | husband | husband
husband | bio | bio | bio | wiki
elvis | corn dog | slavery | slavery | husband gay
bio | slavery | husband gay | husband gay | hot

Michele Bachmann – Yahoo

8-24-2011 | 8-29-2011 | 9-1-2011 | 9-2-2011 | 9-7-2011
hot | hurricane | hurricane | new hair | campaign manager
for president | sarasota | irene | margaret thatcher | hurricane irene
minnesota | hot | hot | hot | hot
bio | for president | for president | for president | for president
feet | minnesota | minnesota | minnesota | minnesota

What does all this mean?  I have no idea, but I suspect it will be difficult to win a Republican party nomination and then a general election with terms like “slavery” and “husband gay” attached to your name.

Another thought: I wonder whether the political affiliations of users are constant across the three major search sites, or whether there is a greater percentage of liberals on Bing than on Google, for instance.  Could you use auto-complete terms to gain any insight into this?  Or is this type of information perhaps already available?


Multidimensional Scaling, Republican Presidential Candidates, and “a douchebag”

If you don’t want to read this whole thing, just check out the graph: Multidimensional Scaling: Republican Candidates – 8/16/2011

I was having a conversation with some friends today and someone mentioned that Rick Perry might have problems in the election because there were rumors he was gay.  So I went to Google and typed in “Rick Perry is” and Google kindly offered me the following auto-complete options: “gay”, “an idiot”, “a rino”, “evil”, “not a conservative”.  This got me thinking about how this compared with the other candidates’ Google auto-completes.  For instance, if you Google “Mitt Romney is” you get suggestions like “a mormon” and “an idiot” as well as three other suggestions.  I did this for all of the major candidates (sorry Thaddeus) and recorded the five Google auto-complete suggestions for each.

Then I created a vector for each candidate based on the Google auto-complete words.  Each candidate was an observation and each word was a variable.  A candidate would get a 5 if the word was first on their list, a 4 if it was second, and so on, with a 0 if the word did not appear in their auto-completes.
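As a rough illustration of that vector construction, here is a small Python sketch. It is not the original code (the analysis was done in R), and the helper names `rank_scores` and `score_matrix` are invented for the example. Because the other candidates’ full lists are not reproduced in this post, the two rows below are stand-ins: two Google snapshots of “Rick Perry is” taken from the tables elsewhere in this archive.

```python
def rank_scores(terms, top_n=5):
    """Give top_n points to the first term, top_n - 1 to the second, and so on."""
    return {term: top_n - rank for rank, term in enumerate(terms[:top_n])}


def score_matrix(lists):
    """Return (vocab, rows): one row per observation, one column per word."""
    vocab = sorted({term for terms in lists.values() for term in terms})
    rows = {}
    for name, terms in lists.items():
        scores = rank_scores(terms)
        # A word that never appears for this observation scores 0.
        rows[name] = [scores.get(word, 0) for word in vocab]
    return vocab, rows


# Stand-in observations: two dated snapshots rather than two candidates.
lists = {
    "Perry 8-17-2011": ["gay", "an idiot", "a rino", "evil", "not a conservative"],
    "Perry 9-12-2011": ["gay", "an idiot", "crazy", "scary", "evil"],
}

vocab, rows = score_matrix(lists)
print(vocab)
for name, row in rows.items():
    print(name, row)
```

With the real data there would be one row per candidate, and the columns would be the union of every word seen across all candidates, so most entries in each row would be 0.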

I then used multidimensional scaling (the cmdscale function in R) to allow me to visually display the relative positions of the candidates to each other.  This all led to this graphic: Multidimensional Scaling: Republican Candidates – 8/16/2011.  The location of the circles is based on multidimensional scaling, the size of the circle is relative to their standings in a national poll taken from, and the top five Google auto-completes are displayed in or near the appropriate circle.
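For readers who have not seen it, classical multidimensional scaling (the method R’s cmdscale implements) can be sketched with NumPy as below. This is a stand-in written for illustration, not the R code used for the graphic: the function name `classical_mds`, the choice of Euclidean distance between score vectors, and `k=2` are all assumptions. The three rows are dated Google snapshots of “Rick Perry is” from the tables in this archive, used in place of the candidates’ lists.

```python
import numpy as np


def classical_mds(dist, k=2):
    """Classical (Torgerson) MDS: embed points in k dimensions from a distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering  # double-centred squared distances
    vals, vecs = np.linalg.eigh(gram)               # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]              # keep the k largest
    vals_k = np.clip(vals[order], 0.0, None)        # guard against tiny negatives
    return vecs[:, order] * np.sqrt(vals_k)


# Stand-in observations taken from the "Rick Perry is" Google table.
lists = {
    "8-17-2011": ["gay", "an idiot", "a rino", "evil", "not a conservative"],
    "9-6-2011": ["gay", "an idiot", "crazy", "nuts", "stupid"],
    "9-12-2011": ["gay", "an idiot", "crazy", "scary", "evil"],
}
vocab = sorted({term for terms in lists.values() for term in terms})
scores = np.array(
    [[5 - terms.index(w) if w in terms else 0 for w in vocab] for terms in lists.values()],
    dtype=float,
)

# Euclidean distance between every pair of score vectors (an assumed metric).
diff = scores[:, None, :] - scores[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

coords = classical_mds(dist, k=2)
for name, (x, y) in zip(lists, coords):
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

The coordinates are only determined up to rotation and reflection, so what matters in the plot is which circles end up near each other, not where any one circle sits.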

Some thoughts:

  • Every single candidate has the term “an idiot” as either the first or second auto-complete term
  • 3 candidates were listed as “hot” (Palin, Bachmann, and Romney)
  • “stupid” was only used to describe women
  • Perry and Santorum (who has a much bigger Google problem than anything I’ve listed here) had “gay” listed in their auto-completes and Pawlenty had “definitely not gay”
  • Bachmann’s and Palin’s circles are nearly identical in size (11.7% and 11.4%, respectively) and words (they share “an idiot”, “hot”, and “stupid”)
  • “a douchebag” appears in auto-completes for Santorum, Gingrich, and Pawlenty.  I imagine it will be hard to win with this word attached to your name. (John Kerry couldn’t do it.)
  • The only overwhelmingly positive Google auto-complete was for Herman Cain, whose fifth auto-complete option was “awesome”
It can’t be good for Perry that he is so close to Pawlenty and Santorum, but he does have a significant amount of support at this point.  I’ll be interested to see how these Google auto-completes change over time and with the polls.
For information on how Google auto-complete works, click here.

NCAA Basketball (in the wild)

It’s that magical time of year again. The three weeks the rest of the country and I care about college basketball. Check out the StatsInTheWild NCAA basketball top 25.

So here it is. The StatsInTheWild annual NCAA tournament preview.

Teams that should have gotten in but didn’t:
Seton Hall – I realize it’s hard to take a team that went 9-9 in their conference, but it’s the Big East. It really is that good.
Virginia Tech – They should have been in easily. Although 2 losses to lowly Miami, including one in the ACC tournament, are really bad.
Mississippi State – A good regular season and a very good run in the SEC tournament. Should have been in.

Teams that should not have gotten in:
Minnesota – They lost to Michigan twice in February and they lost to Indiana, who was 4-12 in the conference. And they were only 9-9 in the conference. The Big Ten is the most overrated conference in football and basketball.
UNLV – Third in the Mountain West gets in but third in the ACC doesn’t? This was a bad at-large bid.
Wake Forest – They finished 6th in the ACC at 9-7 in conference. How did Virginia Tech not get in again?
Georgia Tech – They finished 7th in the ACC at 7-9 in conference. How did Virginia Tech not get in again?

Best 16 seed: Lehigh
Best 15 seed: UC-Santa Barbara
Best 14 seed: Sam Houston State

Worst 1 seed: Duke
Worst 2 seed: Ohio State
Worst 3 seed: Pittsburgh
Worst 4 seed: Wisconsin

Most likely first round upsets:
(14) Sam Houston State over (3) Baylor
(12) New Mexico State over (5) Michigan State
(13) Wofford over (4) Wisconsin
(11) Old Dominion over (6) Notre Dame

Most likely long shot upset:
(15) UC-Santa Barbara over (2) Ohio State

Lower seed lock:
(10) Missouri over (7) Clemson
(9) Northern Iowa over (8) UNLV

Sweet Sixteen:
All the ones, twos, and threes along with (4) Maryland, (4) Butler, (5) Texas A&M, and (5) Temple.

Elite 8:
All the number one seeds and all of the number 2 seeds except Ohio State. Georgetown gets in.
So that’s (1) Kentucky, (1) Duke, (1) Syracuse, (1) Kansas, (2) Kansas St, (2) Villanova, (2) West Virginia, and (3) Georgetown.

Final 4:
(1) Kansas, (1) Kentucky, (1) Syracuse, (2) Villanova

(1) Kansas vs (1) Kentucky

(1) Kansas over (1) Kentucky 68-66

Push Polling (in the wild)

This is from a while ago (Nov 18, 2008), but it’s still interesting:
Zogby Engages in Apparent Push Polling for Right-Wing Website


Justice and Religion (in the wild)

Here is a nice little graphical display of how the Supreme Court’s religious make-up has changed over time. I sure do love a nice graphical display of data early in the morning. (Yes, I do consider 10:37am early in the morning.)