My gentle criticism of the RPI
Lately, I’ve been chirping about how bad the RPI is. I had some free time this morning, so I thought I’d dig into it and write down exactly why I dislike the RPI. Below you’ll find details on how to calculate the RPI, my criticisms of the formula, and, finally, an example of how RPI can go haywire.
There are three components to the RPI:
- Winning Percentage (WP)
- Opponents’ Winning Percentage (OWP)
- Opponents’ Opponents’ Winning Percentage (OOWP)
Winning percentage (WP)
This is calculated by taking the number of wins a team has and dividing it by the number of games that team has played. However, since 2005, home wins count as 0.6 of a win and away wins count as 1.4 wins (neutral wins count as 1).
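Here is a minimal sketch of the weighted WP. The post only gives the win weights; the mirrored loss weights (a home loss counting 1.4, an away loss 0.6) and the use of weighted wins plus weighted losses as the denominator are my assumptions, and the names `weighted_wp`, `WIN_WEIGHT`, and `LOSS_WEIGHT` are made up for illustration.

```python
from fractions import Fraction

# Win weights are the ones quoted above; the loss weights are an ASSUMPTION
# (mirror image of the win weights), not something stated in the post.
WIN_WEIGHT = {"home": Fraction(3, 5), "away": Fraction(7, 5), "neutral": Fraction(1)}
LOSS_WEIGHT = {"home": Fraction(7, 5), "away": Fraction(3, 5), "neutral": Fraction(1)}


def weighted_wp(results):
    """results: list of (site, won) pairs, with site in {'home', 'away', 'neutral'}."""
    wins = sum(WIN_WEIGHT[site] for site, won in results if won)
    losses = sum(LOSS_WEIGHT[site] for site, won in results if not won)
    total = wins + losses
    return wins / total if total else Fraction(0)


# Three hypothetical 2-2 teams that differ only in where the games were played.
split = [("home", True), ("home", False), ("away", True), ("away", False)]
all_home = [("home", True), ("home", True), ("home", False), ("home", False)]
all_away = [("away", True), ("away", True), ("away", False), ("away", False)]

print(float(weighted_wp(split)))     # 0.5
print(float(weighted_wp(all_home)))  # 0.3
print(float(weighted_wp(all_away)))  # 0.7
```

Under these assumptions, the same 2-2 record is worth anywhere from 0.3 to 0.7 depending only on where the games were played.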
Opponents’ Winning percentage (OWP)
For a given team, calculate the winning percentage of each of its opponents, excluding games against that team from the calculation. When calculating OWP, wins are not weighted as they are in the calculation of WP. Once each opponent’s winning percentage is calculated, take the average of all these winning percentages to get the OWP component for the team.
Opponents’ opponents’ Winning percentage (OOWP)
For a given team, calculate the OWP for each of its opponents. Include games against the team in this calculation. Take the average of all these OWPs to compute the OOWP for the team.
For full details of the RPI see Ken Pomeroy’s explanation.
My criticisms of the RPI
- Ad hoc weighting of the games in calculation of WP: Away wins are worth 1.4 wins whereas home wins are only counted as 0.6. This makes an away win more than TWICE as important as a home win. This doesn’t sound right to me. Does anyone know where these numbers (i.e. 1.4 and 0.6) came from? I’d love to know.
- Weighting not done in OWP or OOWP: When OWP or OOWP is calculated, all games are worth 1 win again. Where does the weighting go? This seems like another totally arbitrary decision.
- Averaging averages: I didn’t realize this at first because I couldn’t believe the formula would actually do this, but the formula takes the average of the opponents’ winning percentages. That’s different from the winning percentage of the opponents. Here is an example: imagine there are three teams, A, B, and C. Team A is 1-9 and teams B and C are both 1-0. The combined winning percentage of these teams is 3/12=0.25. But if we take the average of the averages, as RPI does, we get (.1+1+1)/3=0.7. This does not make sense to me unless all teams play exactly the same number of games.
- Excluding the team from OWP but not OOWP: To calculate the OWP for a team, that team is excluded from the calculation. But that same team is added back in when calculating the OOWP. Why?
- Arbitrary linear weighting: Where did the 0.25, 0.5, 0.25 numbers come from? This again seems entirely arbitrary. (UPDATE: Thanks to the wonderful people of the internet, the answer can be found here.)
- OWP gets the most weight: Why is opponents’ winning percentage more important than your own winning percentage? Boosting your RPI is simple: just play good teams. It doesn’t even matter if you lose, as long as your opponents keep winning. (See my example below.)
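The “averaging averages” point is easy to check numerically. This short sketch uses the three-team example from that bullet; the variable names are mine.

```python
from fractions import Fraction

# (wins, losses) for the three teams in the example
records = {"A": (1, 9), "B": (1, 0), "C": (1, 0)}

# Winning percentage of the opponents taken together: total wins / total games
total_wins = sum(w for w, _ in records.values())
total_games = sum(w + l for w, l in records.values())
pooled = Fraction(total_wins, total_games)

# Average of the individual winning percentages, which is what RPI uses
per_team = [Fraction(w, w + l) for w, l in records.values()]
mean_of_means = sum(per_team) / len(per_team)

print(float(pooled))         # 0.25
print(float(mean_of_means))  # 0.7
```

The two numbers agree only when every team has played the same number of games.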
A silly example of the RPI gone crazy
I spent this morning trying to come up with silly examples of the RPI. If I’ve coded everything correctly (if you find a mistake please let me know), here’s one example of the RPI gone wild:
Let’s say there are 5 teams: A, B, C, D, and E. A beats C twice, B beats D twice, C beats D twice, A beats E twice, and B beats E twice. (In each set of two games, one game was home and one was away for each team).
- A: 4-0 (Home 2-0, Away 2-0)
- B: 4-0 (Home 2-0, Away 2-0)
- C: 2-2 (Home 1-1, Away 1-1)
- D: 0-4 (Home 0-2, Away 0-2)
- E: 0-4 (Home 0-2, Away 0-2)
Before you look below, try to make a reasonable ranking of these teams in your head. Write this down and come back to it.
The winning percentages for each team are 1 for A and B, 0.5 for C, and 0 for D and E. The OWP for these teams are 1 for E, 0.5 for A, C, and D, and 0 for B. And the OOWPs for these teams are 0.875 for B, 0.750 for A, 0.5 for C, 0.250 for D, and 0.125 for E.
When we apply the 0.25, 0.5, and 0.25 linear weights to WP, OWP, and OOWP, respectively, we get the following RPI results:
- A: 0.6875
- E: 0.53125
- C: 0.5
- B: 0.46875
- D: 0.3125
Team A ranked first makes sense. They were 4-0 and beat C and E. D ranked last also makes sense. They were 0-4 and lost to B and C. But the three teams in the middle make no sense. Team E is ranked 2nd with NO wins. They are above C, who is 2-2 with both losses coming against team A. Further, and here is the big finish, team E, at 0-4, is rated above undefeated team B in spite of the fact that team B beat E twice! That makes no sense. You could go on constructing these scenarios all day. It’s really not that difficult to make crazy scenarios happen with the RPI formula. What’s happening here is that team E is being inflated by their OWP, which is the largest component of the RPI, more important than even their own winning percentage. The RPI makes no sense.
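For anyone who wants to check the numbers, here is a rough sketch that reproduces the five-team example. It is not the code I originally used, and it rests on a few assumptions: losses are weighted as the mirror image of wins (home loss 1.4, away loss 0.6), opponents are averaged once per game played, and in the OOWP step the inner winning percentages use full records with no games excluded. That last reading of “include games against the team” is the one that matches the OOWP values quoted above. All function names are made up for illustration.

```python
from fractions import Fraction

# Each pairing was played twice, once at each team's home court.
PAIRS = [("A", "C"), ("B", "D"), ("C", "D"), ("A", "E"), ("B", "E")]
GAMES = []  # (winner, loser, home_team)
for winner, loser in PAIRS:
    GAMES.append((winner, loser, winner))  # winner played at home
    GAMES.append((winner, loser, loser))   # winner played on the road
TEAMS = sorted({t for w, l in PAIRS for t in (w, l)})

HOME, AWAY = Fraction(3, 5), Fraction(7, 5)  # 0.6 and 1.4


def mean(values):
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


def opponents(team):
    """One entry per game, so an opponent played twice appears twice."""
    return [l if w == team else w for w, l, _ in GAMES if team in (w, l)]


def record(team, exclude=None):
    """Unweighted winning percentage, optionally ignoring games against `exclude`."""
    games = [(w, l) for w, l, _ in GAMES if team in (w, l) and exclude not in (w, l)]
    if not games:
        return Fraction(0)
    return Fraction(sum(w == team for w, _ in games), len(games))


def weighted_wp(team):
    wins = losses = Fraction(0)
    for w, l, home in GAMES:
        if w == team:
            wins += HOME if home == team else AWAY
        elif l == team:
            losses += AWAY if home == team else HOME  # ASSUMED mirror weights
    return wins / (wins + losses)


def owp(team):
    # Each opponent's record with games against `team` removed
    return mean(record(opp, exclude=team) for opp in opponents(team))


def oowp(team):
    # ASSUMED reading: inner records are full records, nothing excluded
    return mean(mean(record(o2) for o2 in opponents(o1)) for o1 in opponents(team))


def rpi(team):
    return (Fraction(1, 4) * weighted_wp(team)
            + Fraction(1, 2) * owp(team)
            + Fraction(1, 4) * oowp(team))


ranking = sorted(TEAMS, key=rpi, reverse=True)
for team in ranking:
    print(team, float(rpi(team)))
# A 0.6875
# E 0.53125
# C 0.5
# B 0.46875
# D 0.3125
```

Exact fractions are used throughout so that the printed values match the ones listed above without rounding noise.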
What is my point?
The RPI makes no sense. Get rid of it. For rankings that make sense, check out Ken Pomeroy or Jeff Sagarin.
Posted on March 2, 2015, in Uncategorized. 9 Comments.
Andrew Dolphin suggested that the 0.25 / 0.50 / 0.25 weighting is an approximation to an ideal RPI that goes to infinite depth (one ideal formula being 0.27 / 0.46 / 0.27). He also derived a simple formula for an “infinitely deep” RPI, but unfortunately his web page has disappeared. My summary of it can be read here:
In fact you’ll find a variety of RPI discussion scattered through my blog. The short version is I don’t disagree with your criticisms. But given all that, RPI is surprisingly accurate.
That’s really interesting about the weights in the RPI. I agree that most of the time RPI does give reasonable rankings. I have a philosophical problem with ranking methods that work reasonably in spite of themselves (e.g. Elo). Also, with RPI there seem to be a few teams every year that are ranked in odd places (e.g. Kansas is 2 right now).
Also, I’d like to apologize to you for our winning approach. 🙂
I think one of the main takeaways from our paper is that getting the right data is very important. That’s not an earth-shattering realization, but I think sometimes people get too caught up in trying to make their models more complex rather than spending the time to find the right data. That’s just my opinion though.
Finally, your blog is fantastic. (And “Net Prophet” is a GREAT name.)
My review of your paper probably came off more negative than I intended. I’d have said the same thing about a write-up of my method. I was just really hoping that the Kaggle competition would result in a breakthrough, and I’m a little disappointed that didn’t happen.
Of course, you have a $15,000 consolation 🙂
And I did think your post-contest analysis was really good. I’m very curious to see how this year’s contest turns out, and I hope someone will be able to do a year-to-year analysis to see if that reveals any real distinctions between models (assuming we have a core of people who compete both years).
I didn’t take it as negative. It’s a valid point. We didn’t do anything novel. When we were asked to write the paper we had to figure out what to write about because we really didn’t do anything beyond basic statistical analysis. I agree it would have been way cooler if someone did something totally different and crushed everyone with their breakthrough idea. But then again, my bank account is glad that we won… 🙂
I should also mention that (coincidentally) Erik Forseth has written up something about RPI, but I don’t know that he’s published it yet. I’ll prod him to see if he wants to add it to this discussion.
Thanks Scott. Yeah, I wrote up some notes on RPI that can be found here http://erfunctionratings.com/essays-tutorials/notes-on-binomial-rpi/. These emerged from my thinking about variations on Dolphin’s routine and other ways to schedule-adjust team statistics. Along the way I noticed the standard formula for RPI popped out, which I thought was sort of neat.
If nothing else, I think these notes may provide answers to a few of your points, including “averaging averages,” “arbitrary linear weighting,” and “OWP gets the most weight.” I’ll point out that I don’t worry about the 0.6 vs. 1.4 home/away weights in these notes (nor do I use those weights in any of my code). The ideas here aren’t all that rigorous, and just because some of your questions have answers doesn’t mean RPI is perfect. But I find that infinite “RPI-like” routines work well for opponent-adjusting win percentage and other stats.
I totally welcome any comments or criticism on these, by the way! Planning to add some more notes of this kind if anything else occurs to me to write about.
All NCAA calculations have to be viewed as a political decision at some level. Hockey has some of the same issues in determining its criteria… HOWEVER, the criteria are the sole determinant of who makes the field… it is a function of your results and only of your results. This is nice because it means we can figure out its properties and its likely effect.
Now, that being said, my understanding is that the committee has in the past looked at various rating systems as well. It’s only a loose understanding though.
Also note that not all RPI formulations are the same across the NCAAs. For the most part the difference comes down to the weighting of record, SOS, and SOSOS. This is not really a transparent process and is likely done by taking a look at one or two seasons and then saying “this looks reasonable”.
We all know it’s a dinosaur. However, it’s at the level of sophistication of the powers that be.
Oh, for that matter, I should let you know I won’t be at JSM Seattle. A combo of my anxiety condition and not getting work in on time screwed that one up. I’m also working on building my own blog but this will be beer focused rather than anything math/stat oriented.