My gentle criticism of the RPI

Lately, I’ve been chirping about how bad the RPI is.  I had some free time this morning, so I thought I’d dig into it and write down exactly why I dislike the RPI.  Below you’ll find details on how to calculate the RPI, my criticisms of the formula, and, finally, an example of how RPI can go haywire.

RPI formula

There are three components to the RPI:

  1. Winning Percentage (WP)
  2. Opponents’ Winning Percentage (OWP)
  3. Opponents’ Opponents’ Winning Percentage (OOWP)

Winning percentage (WP)

This is calculated by taking the number of wins a team has and dividing it by the number of games that team has played.  However, since 2005, home wins count as 0.6 of a win and away wins count as 1.4 wins (neutral wins count as 1).
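
As a rough illustration of the weighting described above, here is a minimal Python sketch.  The function name weighted_wp and its arguments are made up for this example.  The loss weights (a home loss counting as 1.4 and an away loss as 0.6) are an assumption, since the text above only gives the win weights, and the sketch also assumes the divisor is the weighted total of wins and losses rather than the raw number of games.

```python
def weighted_wp(home_w, away_w, neutral_w, home_l, away_l, neutral_l):
    """Weighted winning percentage from a team's split record."""
    # Win weights from the text: home 0.6, away 1.4, neutral 1.
    wins = 0.6 * home_w + 1.4 * away_w + 1.0 * neutral_w
    # Assumption: losses mirror the win weights (home 1.4, away 0.6).
    losses = 1.4 * home_l + 0.6 * away_l + 1.0 * neutral_l
    return wins / (wins + losses)

# A hypothetical team that is 10-0 at home and 5-5 on the road.
unweighted = 15 / 20
weighted = weighted_wp(10, 5, 0, 0, 5, 0)
print(unweighted, round(weighted, 4))
```

Under these assumptions the unweighted 0.75 rises to about 0.81, because the five road wins count for more than the ten home wins.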

Opponents’ winning percentage (OWP)

For team i, calculate the winning percentage of each opponent, excluding games against team i from the calculation.  When calculating OWP, wins are not weighted as they are in the calculation of WP.  Once each opponent’s winning percentage is calculated, take the average of all these winning percentages to get the OWP component for team i.

Opponents’ opponents’ winning percentage (OOWP)

For team i, calculate the OWP for each of their opponents.  Include games against team i in this calculation.  Take the average of all these OWP to compute the OOWP for team i.

Linear Weight

RPI = 0.25WP+0.5OWP+0.25OOWP

For full details of the RPI see Ken Pomeroy’s explanation.

My criticisms of the RPI

  • Ad hoc weighting of the games in calculation of WP: Away wins are worth 1.4 wins whereas home wins are only counted as 0.6.  This makes an away win more than TWICE  as important as a home win.  This doesn’t sound right to me.  Does anyone know where these numbers (i.e. 1.4 and 0.6) came from?   I’d love to know.
  • Weighting not done in OWP or OOWP: When OWP or OOWP is calculated, all games are worth 1 win again.  Where does the weighting go?  This seems like another totally arbitrary decision.
  • Averaging averages: I didn’t realize this at first because I couldn’t believe the formula would actually do this, but the formula is taking the average of the opponents’ winning percentages.  That’s different from the winning percentage of the opponents.  Here is an example: imagine there are three teams A, B, and C.  Team A is 1-9 and teams B and C are both 1-0.  The combined winning percentage of these teams is 3/12=0.25.  But if we take the average of the averages, as RPI does, we get (.1+1+1)/3=0.7.  This does not make sense to me unless all teams play exactly the same number of games.
  • Excluding the team from OWP but not OOWP: To calculate the OWP for a team, that team is excluded from the calculation.  But that same team is added back in when calculating the OOWP.  Why?
  • Arbitrary linear weighting: Where did the 0.25, 0.5, 0.25 numbers come from?  This again seems entirely arbitrary.  (UPDATE: Thanks to the wonderful people of the internet, the answer can be found here.)
  • OWP gets the most weight: Why is opponents’ winning percentage more important than your own winning percentage?  Boosting your RPI is simple: just play good teams.  It doesn’t even matter if you lose, as long as your opponents just keep winning.  (See my example below.)
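
The averaging-averages point above can be checked in a few lines of Python.  This is only a sketch using the three made-up records from that bullet; the variable names are mine.

```python
# (wins, losses) for the three teams in the averaging-averages example.
records = {"A": (1, 9), "B": (1, 0), "C": (1, 0)}

# Pooled winning percentage: total wins over total games.
total_wins = sum(w for w, _ in records.values())
total_games = sum(w + l for w, l in records.values())
pooled = total_wins / total_games

# Average of each team's own winning percentage, which is what RPI uses.
mean_of_means = sum(w / (w + l) for w, l in records.values()) / len(records)

print(pooled)                   # 0.25
print(round(mean_of_means, 3))  # 0.7
```

The two numbers agree only when every team has played the same number of games; here a 1-0 team carries as much weight as a 1-9 team.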

A silly example of the RPI gone crazy

I spent this morning trying to come up with silly examples of the RPI.  If I’ve coded everything correctly (if you find a mistake please let me know), here’s one example of the RPI gone wild:

Let’s say there are 5 teams: A, B, C, D, and E.  A beats C twice, B beats D twice, C beats D twice, A beats E twice, and B beats E twice.  (In each set of two games, one game was home and one was away for each team).

Records:

  • A: 4-0 (Home 2-0, Away 2-0)
  • B: 4-0 (Home 2-0, Away 2-0)
  • C: 2-2 (Home 1-1, Away 1-1)
  • D: 0-4 (Home 0-2, Away 0-2)
  • E: 0-4 (Home 0-2, Away 0-2)

Before you look below, try to make a reasonable ranking of these teams in your head.  Write this down and come back to it.

The winning percentages for these teams are 1 for A and B, 0.5 for C, and 0 for D and E.  The OWPs for these teams are 1 for E, 0.5 for A, C, and D, and 0 for B.  And the OOWPs for these teams are 0.875 for B, 0.750 for A, 0.5 for C, 0.250 for D, and 0.125 for E.

When we apply the 0.25, 0.5, and 0.25 linear weights to WP, OWP, and OOWP, respectively, we get the following RPI results:

  • A: 0.6875
  • E: 0.53125
  • C: 0.5
  • B: 0.46875
  • D: 0.3125
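
For anyone who wants to check the numbers above, here is a minimal Python sketch of the whole calculation.  It is not the code I originally used, and the function names are made up for this example.  Two assumptions are baked in: losses are weighted as the mirror image of wins (home loss 1.4, away loss 0.6), and each opponent’s OWP inside the OOWP step is computed over all of that opponent’s games with nothing excluded, which is the reading that reproduces the OOWP figures listed above.

```python
from fractions import Fraction

HOME_WIN, AWAY_WIN = Fraction(3, 5), Fraction(7, 5)    # 0.6 and 1.4
HOME_LOSS, AWAY_LOSS = Fraction(7, 5), Fraction(3, 5)  # assumed mirror weights

# Each series is two games, one on each court: (winner, loser, home team).
series = [("A", "C"), ("B", "D"), ("C", "D"), ("A", "E"), ("B", "E")]
games = [(w, l, home) for w, l in series for home in (w, l)]
teams = sorted({t for w, l, _ in games for t in (w, l)})

def weighted_wp(team):
    wins = sum(HOME_WIN if h == team else AWAY_WIN for w, l, h in games if w == team)
    losses = sum(HOME_LOSS if h == team else AWAY_LOSS for w, l, h in games if l == team)
    return wins / (wins + losses)

def plain_wp(team, exclude=None):
    """Unweighted winning percentage, optionally ignoring games against `exclude`."""
    played = [(w, l) for w, l, _ in games if team in (w, l) and exclude not in (w, l)]
    return Fraction(sum(w == team for w, l in played), len(played))

def opponents(team):
    """One entry per game, so an opponent faced twice is counted twice."""
    return [l if w == team else w for w, l, _ in games if team in (w, l)]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def owp(team, exclude_self=True):
    skip = team if exclude_self else None
    return mean(plain_wp(o, exclude=skip) for o in opponents(team))

def oowp(team):
    # Assumption: opponents' OWPs keep every game (no exclusion) at this step.
    return mean(owp(o, exclude_self=False) for o in opponents(team))

def rpi(team):
    return (Fraction(1, 4) * weighted_wp(team)
            + Fraction(1, 2) * owp(team)
            + Fraction(1, 4) * oowp(team))

ratings = {t: float(rpi(t)) for t in teams}
for team, value in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(team, value)
```

Fractions are used so the results are exact.  The script prints the teams in the same order as the list above: A, E, C, B, D.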

Team A ranked first makes sense.  They were 4-0 and beat C and E.  D ranked last also makes sense.  They were 0-4 and lost to B and C.  But the three teams in the middle make no sense.  Team E is ranked 2nd with NO wins.  They are above C, who is 2-2 with both losses coming against team A.  Further, and here is the big finish, team E, at 0-4, is rated above undefeated team B in spite of the fact that team B beat E twice!  That makes no sense.  You could go on constructing these scenarios all day. It’s really not that difficult to make crazy scenarios happen with the RPI formula.  What’s happening here is that team E is being inflated by its OWP, which is the largest component of the RPI and counts for more than even a team’s own winning percentage.  The RPI makes no sense.

What is my point?

The RPI makes no sense.  Get rid of it.  For rankings that make sense, check out Ken Pomeroy or Jeff Sagarin.

Cheers.

Best and Worst Weeks in NCAA basketball

Projected Tournament Seeds.  Who had the best week?

Best Weeks

Up two seeds

Purdue  – On a 4 game winning streak with a win @ Indiana in the last week

Iowa – On a 4 game win streak with a win over Illinois in the last week

LSU – On a 3 game win streak with a win over Ole Miss yesterday

Georgia – On a 3 game win streak with a win @ Ole Miss on Wednesday

Up one seed

Villanova – They haven’t lost since January 3rd.  In the past week they beat Providence by 28 and Xavier by 12, both of whom I have in the NCAA tournament.  All this equals a 1 seed in my mind.

Baylor – They’ve won 4 in a row and in the last week they won at Iowa State and over West Virginia.  Really nice back-to-back wins over tournament teams.

Oklahoma – They beat TCU, which wasn’t that impressive, but enough to bump them up to a low 3 seed.

Maryland – All they did this week was beat Wisconsin.  I’ve got them as a high 4 seed.

Wichita St – They beat Northern Iowa yesterday by 14, splitting the season series with the Panthers and clinching the MVC regular season title.  Both of these teams are in unless something crazy happens.  I’ve got the Shockers up to a 5 seed, while Northern Iowa drops to a 6.

Butler – Beat Marquette and DePaul this week.  Not much, but I’ve got them moved up to the bottom 5 seed.

St. John’s – They beat Xavier on Monday, sweeping them on the season, and, more importantly, got a win over Georgetown yesterday.  Up to a 7 seed.

Stanford – Beat Oregon State by 27 points on Tuesday.  They’ve got a big game against Oregon today.

BYU – Biggest win of the week with a victory over Gonzaga AT Gonzaga.  I’ve got BYU as a 9 seed. They’d be higher but they have some not so great losses this year like a loss to San Diego and two losses to Pepperdine.

Dayton – Dayton went to VCU yesterday and won a close game by 4.  All the talk this year in the A10 has been about VCU, but right now there is a three way tie for first in the division and VCU is not one of those teams.

Davidson – They won at Rhode Island by one point on Wednesday putting themselves in a three way tie for first in the A10.  And they’ve won 7 in a row.  This A10 tournament is going to be awesome.  #parity

UCLA – It’s not really about what UCLA did (they beat Washington), but they benefit from losses by Rhode Island (to Davidson) and Illinois (to Iowa).  They play Washington State tonight.

Worst Weeks

Down three seeds

Texas – Texas ends its regular season schedule with 6 games against the top 6 teams in the Big 12 (Oklahoma, Iowa State, West Virginia, Kansas, Baylor, Kansas State).  They’ve lost the first 4.  I have them in, but realistically they are probably out.  If they win both of their remaining games they will be 8-10 in conference.  Hard to imagine the committee letting in a team that is 19-12, 8-10 (IF they win out).

Down two seeds

Providence – Lost to Villanova twice in their last 4 games, including a 28 point loss on Tuesday.

Ole Miss – Lost two important games this week, to Georgia on Wednesday and then to LSU yesterday.  Not great, as both of those losses should be fresh in the mind of the committee on selection Sunday.

Indiana – Lost to Northwestern yesterday.  Northwestern is now 5-11 in conference play and 14-15 overall.  #notGood (Northwestern also beat Iowa 10 days ago.  Weird.)

Down one seed

Wisconsin – Lost to Maryland yesterday. I still have Wisconsin as a 2 seed.

Utah – Lost to Arizona yesterday. No shame in that.  I dropped them to the top 4 seed as they were passed by both Oklahoma and Baylor.

Iowa State – Lost two games this week to Baylor and Kansas State.  Right now I have them as the bottom 4 seed.  Tomorrow’s game against Oklahoma is very important.

West Virginia – Lost to Baylor yesterday, though they were without their leading scorer for the game.

Northern Iowa – Lost to Wichita State.  Still a lock to make the tourney unless something crazy happens.  I’ve got them as a 6 seed.

Georgetown – Lost to St. John’s yesterday.

Oregon – Beat Utah and Cal this week.  Not their fault they fell a seed.  They were passed by Stanford, LSU, and Iowa.

Xavier – Lost to St. John’s and Villanova.  I have them as a 10 seed now.

Texas A&M – Lost to Arkansas this week.

NC State – A horrible loss to BC by 16 yesterday.  I have them in, but I wouldn’t be surprised if the committee felt differently than me.  These guys are the definition of a bubble team.

Science, “liberal” and conservative media, and global warming

While I was reading my Twitter feed, I came across this tweet:

Screen Shot 2015-02-23 at 10.04.25 PM

I was curious, because why would anyone ever have a statistician on a radio show (trying to put the general public to sleep?)?  So I clicked.  Breitbart radio was interviewing William Briggs because of the paper he wrote called “Why models run hot: results from an irreducibly simple climate model”.  As far as I can tell, the basic argument of this paper is that the projections in the Fifth Assessment Report (AR5) of the Intergovernmental Panel on Climate Change (IPCC) are too high.  In the conclusion, the authors state (emphasis added):

Resolving the discrepancies between the methodology adopted by IPCC in AR4 and AR5 is vital. Once those discrepancies are corrected for, it appears that the impact of anthropogenic global warming over the next century, and even as far as equilibrium many millennia hence, may be no more than one-third to one-half of IPCC’s current projections.

Ok.  So the authors are refuting one climate model’s projections.  That seems reasonable.  As far as I can tell, Briggs is a very good statistician (PhD in mathematical statistics from Cornell #impressive) and has a background in atmospheric science and meteorology.  I’m not making any comments on the scientific merits of their paper, but let’s see what happens if we assume that their conclusions are reasonable.  If that is the case, then climate change isn’t as bad as projected, but it’s still a problem that needs to be dealt with.  Basically, let’s still try to prevent climate change, but we don’t need to be as alarmed as we currently are.  But even if that is the case, the authors still acknowledge that climate change is man made.  They say:

Finally, suppose that remaining affordably recoverable reserves of fossil fuels are as much as thrice those that have been recovered and consumed so far. Then, the total warming we shall cause by consuming all remaining recoverable reserves will be little more than 2.2 K, and not the 12 K imagined by IPCC on the RCP 8.5 scenario.

“We shall cause”.  They are acknowledging that climate change is driven by humans burning fossil fuels; they are merely critiquing the magnitude of the change predicted by other projections.  They are NOT saying global warming is not occurring.  I know this because I asked one of the authors:

Screen Shot 2015-02-23 at 10.22.44 PM

Let me repeat that: An author of the paper is acknowledging that global warming exists and that they are simply arguing that the projection models are not correct.  That is a reasonable argument.  For the authors’ views on global warming see here.

So how did Breitbart cover this story?  Their headline was “Experts smeared by media and Greenpeace for debunking global warming”.  By the authors’ own admission, this is not what they did.  They didn’t “debunk global warming”.  Not even close.  Not even in the ballpark. This is so wrong.  By the authors’ own admission.  See the screen shot below from the Breitbart article:

Screen Shot 2015-02-23 at 5.31.59 PM

So here is where I lose it a little bit.  Breitbart is overtly a conservative media organization and one of their favorite pastimes is yelling about the liberal mainstream media.  It seems like they found a paper that sort of fits their conservative narrative and then took the conclusions a step further (i.e. global warming was debunked).

Screen Shot 2015-02-23 at 10.35.55 PM

Briggs seems to agree with the part about yelling about how terrible the liberal media is (even if he disagrees with the headline that Breitbart wrote).  A recent post from February 22, 2015 is all about how stupid mainstream reporters are.  That post was called “Goon Squad Fails To Distract Public From Fact That Climate Models Stink: Update 3“.  Here is a fun excerpt:

It turns out they’d [reporters] rather remain wallowing in their muck than learn about the subjects on which they write.

And one more for fun:

So I failed. I was a fool to try. I let myself forget that I was dealing with a class of people where the gap between actual and perceived ability is not only wide, but is a gaping chasm. To expect mainstream science reporters to understand science is like asking an environmentalist to be reasonable. I should have remembered most journalists suffer from reporteritis, the degrading ailment whereby because reporters cover important people and events they come believe they are important, too. Sadly, there is no known cure.

Ok.  Maybe he’s right (I don’t think he is).  Maybe science reporters in the media are idiots (I don’t think that) and we should be outraged that they don’t know what they are doing and are spreading false information to the masses.  If that’s the case then he must also be outraged that Breitbart misrepresented his findings so badly.  Below is his response to the headline when I told him what the headline said:

Screen Shot 2015-02-23 at 5.33.16 PM

It seems like Briggs’s words (“It turns out they’d [reporters] rather remain wallowing in their muck than learn about the subjects on which they write”) could equally be applied to Breitbart.  So here is my challenge to Briggs: If you are going to criticize the “liberal” media with such passion when they get it wrong, you should also criticize the conservative media with that same passion.  I look forward to your post about how idiotic the writers at Breitbart are.

Cheers.

Super Bowl Squares

statsinthewild:

From a few years ago.

Originally posted on Stats in the Wild:

Last year I wrote a post about super bowl squares:

I received an email this morning from a friend: “Is there any sort of a statistical breakdown for which are the best numbers to have in a Super Bowl squares pool (for entertainment purposes only)?”

Now, if my friend were going to use this information to gamble, it would be highly unethical.  However, since he clearly stated that it was for “entertainment purposes only,” I feel that I can conduct a study with a clear conscience.

If he had wanted to gamble on it, here is a quick explanation of how that usually takes place.  (According to that website: “Basically, if you are at a party where you don’t have betting squares you are a Communist.”)

Anyway, using data from football-reference.com I created a ten by ten frequency table (using R, of course) of exactly how many times each…


Cher and Trademarks: Part 2

Another post that has nothing to do with statistics.  (If you want to read about statistics though, maybe you’ll enjoy this piece that @statsbylopez and I wrote for Deadspin.)
Yesterday, I posted about my friend getting her picture she was trying to sell on Society6 pulled based on their concern over trademark issues with Cher.  Here is part 2:
Hi ***********,
We had about a foot and a half of snow fall since my last correspondence, and all of this precipitation has me thinking about intellectual property law. The company’s position, as I understand it, is that my use of the word “Belchertown” in the caption of a photograph that I took in that town is prohibited because contained in “Belchertown” is the word “Cher.” Cher, as we previously discussed, is an internationally popular recording artist who has presumably trademarked her name in the context of the goods and services she provides. (You might assume those goods and services consist primarily of auto-tuned dance hits, but I would urge you not to forget her theatrical work. The Witches of Eastwick was particularly memorable.)
While I admit that I am unable to conceive of precisely how my reference to Belchertown, Massachusetts infringes on Cher’s trademark, or how my photograph of Quabbin Reservoir, which, remarkably, has supplied the city of Boston with water for some 75 years, is in any way similar to Cher’s artistic works such that Cher might have a valid copyright infringement claim against either me or Society6(TM), I will happily take your word for it.
That said, I am curious as to whether you feel any of the following submissions present intellectual property concerns:
You have to admit, ******** — this one’s a doozy: http://society6.com/product/monsanto-mouse_print#1=45
Please advise.
Sincerely,
*******
Cheers!