Blog Archives

My completely uninformed guide to March Madness and some thoughts on my kaggle entry

I submitted my March Madness Machine Learning Mania entries today. My two entries consist of picks made using the actual spread for first round games and then a simple Bradley-Terry model for the games past that. In my second bracket, which I called the “aggressive” one, I picked UConn in the men’s bracket and South Carolina in the women’s bracket to win each round (and thus the tournament) with probability 1. So if UConn and South Carolina win, maybe I have a shot at winning. I also manually adjusted two teams on the women’s side (South Carolina and USC). The problem with South Carolina is that they haven’t lost any games, and Bradley-Terry basically can’t handle that (the maximum likelihood estimate of an unbeaten team’s strength is unbounded). I adjusted their regression coefficients to match the market price that they win the tournament. I also adjusted USC on the women’s side because they were way off their market price too. I didn’t make any adjustments to the men’s side because the futures prices were generally in the ballpark of what I was estimating, i.e. there are three truly top teams in the men’s bracket (Purdue, UConn, Houston), then a sizable gap down to the next team (which I think is Iowa State; North Carolina got a gift of a 1 seed).
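For readers who haven't seen Bradley-Terry before, here is a minimal sketch of the win probability and of why an undefeated team is a problem. This is not my actual fitting code: the function names and the opponent strengths are made up for illustration, and it assumes the plain version of the model with one log-strength per team and no other covariates.

```python
import math

def bt_win_prob(beta_i: float, beta_j: float) -> float:
    """P(team i beats team j) under Bradley-Terry with log-strengths beta."""
    return 1.0 / (1.0 + math.exp(-(beta_i - beta_j)))

def undefeated_log_likelihood(beta_i: float, opponent_betas) -> float:
    """Log-likelihood contribution of a team that won every one of its games."""
    return sum(math.log(bt_win_prob(beta_i, b)) for b in opponent_betas)

# Hypothetical opponent log-strengths, chosen only for illustration.
opponents = [0.0, 0.5, 1.0]

# Evaluate the log-likelihood at larger and larger strengths for the unbeaten team.
lls = [undefeated_log_likelihood(b, opponents) for b in (1.0, 5.0, 10.0, 20.0)]
for beta, ll in zip((1.0, 5.0, 10.0, 20.0), lls):
    print(f"beta = {beta:5.1f}  log-likelihood = {ll:.3e}")
```

The log-likelihood keeps climbing toward 0 as the unbeaten team's strength grows, so there is no finite maximum. That is why some outside information, such as a market price, is needed to pin the coefficient down.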

I’m really excited about the new scoring system for Kaggle this year, and I think they got it right. A few weeks ago I believe I read that the scoring system was going to be average bracket score with traditional bracket scoring (1-2-4-8-16-32). My first thought when I saw this was that the best strategy is to just enter one bracket and hope. I think other people figured this out and they changed it to a Brier score metric. But what they really got right this year is that they don’t take the average over all the GAMES, they take the average over the 6 ROUNDS. This weights the game in the finals much more heavily than a game in the first round, which is much closer to traditional bracket scoring.
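To see why averaging over rounds matters, here is a rough sketch comparing the two ways of averaging squared errors. It is not Kaggle's official metric code; the function names and the tiny five-game example are assumptions for illustration only.

```python
from collections import defaultdict

def brier_by_game(predictions):
    """Mean squared error over all games, every game weighted equally."""
    errors = [(p - y) ** 2 for _, p, y in predictions]
    return sum(errors) / len(errors)

def brier_by_round(predictions):
    """Mean squared error within each round, then averaged across rounds."""
    per_round = defaultdict(list)
    for rnd, p, y in predictions:
        per_round[rnd].append((p - y) ** 2)
    round_means = [sum(v) / len(v) for v in per_round.values()]
    return sum(round_means) / len(round_means)

# Hypothetical mini-tournament: (round, predicted probability, outcome 0 or 1).
# Four perfect picks in round 1, one completely wrong pick in round 2.
preds = [(1, 1.0, 1), (1, 1.0, 1), (1, 0.0, 0), (1, 0.0, 0), (2, 0.0, 1)]

print(brier_by_game(preds))   # the one miss is 1 of 5 games
print(brier_by_round(preds))  # the one miss is all of round 2
```

In this toy case the single late-round miss costs 0.2 when averaging over games but 0.5 when averaging over rounds, because the later round has fewer games to share the weight.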

Anyway, the probabilities below are based on the 10000 brackets that I submitted:

To win the championship:

UConn – 29.3%

Houston – 20.9%

Purdue – 19.4%

Iowa St – 5.19%

North Carolina – 4.96%

Tennessee – 3.39%

Marquette – 2.49%

Auburn – 1.94%

Illinois – 1.51%

Baylor – 1.43%

Everyone else < 1%

To make the finals:

UConn – 45.5%

Houston – 34.8%

Purdue – 32.4%

North Carolina – 12.6%

Iowa St – 11.4%

Tennessee – 7.92%

Marquette – 6.51%

Auburn – 5.8%

Illinois – 4.54%

Baylor – 4.34%

Arizona – 3.16%

Alabama – 2.68%

South Carolina – 2.46%

Kansas – 2.31%

Kentucky – 2.13%

Creighton – 2.03%

Everyone else < 2%
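Percentages like the ones above are just the share of brackets in which a team wins the title or reaches the final. Here is a minimal sketch of how such numbers could be produced by simulation. It assumes each game is drawn from a Bradley-Terry win probability; the four-team field, the strengths, and the function names are all hypothetical, not my actual ratings or code.

```python
import math
import random
from collections import Counter

def win_prob(a, b, strength):
    """Bradley-Terry probability that team a beats team b."""
    return 1.0 / (1.0 + math.exp(-(strength[a] - strength[b])))

def simulate_bracket(teams, strength, rng):
    """Play one single-elimination bracket; teams are listed in bracket order."""
    alive = list(teams)
    finalists = tuple(alive)
    while len(alive) > 1:
        if len(alive) == 2:
            finalists = tuple(alive)  # the two teams playing for the title
        alive = [a if rng.random() < win_prob(a, b, strength) else b
                 for a, b in zip(alive[0::2], alive[1::2])]
    return alive[0], finalists

def estimate(teams, strength, n_sims=10000, seed=0):
    """Fraction of simulated brackets in which each team wins / makes the final."""
    rng = random.Random(seed)
    champs, finals = Counter(), Counter()
    for _ in range(n_sims):
        champ, finalists = simulate_bracket(teams, strength, rng)
        champs[champ] += 1
        finals.update(finalists)
    return ({t: champs[t] / n_sims for t in teams},
            {t: finals[t] / n_sims for t in teams})

# Hypothetical four-team field with made-up log-strengths.
teams = ["A", "B", "C", "D"]
strength = {"A": 2.0, "B": 0.0, "C": 0.5, "D": 0.0}
champ_prob, final_prob = estimate(teams, strength)
print(champ_prob)
print(final_prob)
```

The championship shares add up to 1 and the finals shares add up to 2, since every bracket has one champion and two finalists.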

And finally, here are my pre-tournament rankings 1 through UMass:

Rank TeamName
1 Connecticut
2 Houston
3 Purdue
4 Iowa St
5 North Carolina
6 Tennessee
7 Auburn
8 Marquette
9 Illinois
10 South Carolina
11 Baylor
12 Kansas
13 Utah St
14 Creighton
15 Arizona
16 Duke
17 Nevada
18 Kentucky
19 Alabama
20 San Diego St
21 BYU
22 Florida
23 Texas Tech
24 New Mexico
25 Wisconsin
26 Gonzaga
27 Nebraska
28 Colorado St
29 Dayton
30 Clemson
31 Virginia
32 Mississippi St
33 Boise St
34 Texas
35 St Mary’s CA
36 TCU
37 Drake
38 Oklahoma
39 Grand Canyon
40 Northwestern
41 Colorado
42 Texas A&M
43 Washington St
44 Indiana St
45 Pittsburgh
46 NC State
47 FL Atlantic
48 Syracuse
49 Providence
50 Michigan St
51 Oregon
52 St John’s
53 James Madison
54 Seton Hall
55 Mississippi
56 Kansas St
57 Ohio St
58 Indiana
59 Princeton
60 Wake Forest
61 Cincinnati
62 Iowa
63 Butler
64 Virginia Tech
65 Villanova
66 Samford
67 Richmond
68 Duquesne
69 McNeese St
70 Utah
71 Memphis
72 Loyola-Chicago
73 South Florida
74 UNLV
75 Florida St
76 Boston College
77 Xavier
78 UCF
79 LSU
80 Georgia
81 VCU
82 Minnesota
83 UAB
84 Washington
85 Bradley
86 San Francisco
87 Appalachian St
88 Arkansas
89 Rutgers
90 Cornell
91 Vermont
92 Miami FL
93 Col Charleston
94 Maryland
95 Penn St
96 Georgia Tech
97 St Joseph’s PA
98 Yale
99 USC
100 UC Irvine
101 St Bonaventure
102 Santa Clara
103 George Mason
104 Massachusetts
Cheers.

Sweet 16 locks: Florida, Arizona (or Belmont), and Saint Louis

Last year my first round submission to Stat Geek Idol was “Predicting the Sweet Sixteen with a Classification Tree.”  My two big suggestions were to 1.) not get overly excited about Florida State or Michigan and 2.) the 14 seeds looked good.  Looking back, neither Florida State nor Michigan made it to the Sweet Sixteen, but none of the 14 seeds won any of their games.  (Though just about every other seed did including 9,10,11,12,13, and 15.  Nearly all the lower seeds EXCEPT for 14.)

So, I was right about some things and wrong about others.  You can’t win them all.  Though I did get one strong endorsement from Twitter user @ClevTA, who claims that my classification tree helped him win his pool, besting about 350 other entries.  Let’s look at what the classification tree predicts this year.

The first split is based on an RPI of 0.6169.  Teams above this threshold will be the R groups (right hand side of the tree image below) and teams below the threshold will be the L groups.  Overall, in the years used to build the model, teams in the R group advanced to the Sweet Sixteen about 67 percent of the time, whereas teams in the L group advanced just less than 10 percent of the time.

The R group teams this year are Duke, New Mexico, Louisville, Miami (FL), Kansas, Gonzaga, Florida, Indiana, Michigan State, Georgetown, Ohio State, Marquette, Memphis, Syracuse, Arizona, North Carolina, Michigan, Kansas State, Belmont, Saint Louis.  All the other teams are in the L group.

[Image: CART2, the classification tree]

The R teams

The R group teams can be broken down into 4 more sub-groups, R1-R4.  Teams in the R1 group qualify for the Sweet Sixteen about 91% of the time, and every single team in the R2 group has qualified for the Sweet Sixteen in the years used to build the model (2007-2011).  On the other hand, teams in the R3 group have only qualified about 51% of the time, and no team in group R4 has qualified for the Sweet Sixteen between 2007 and 2011. So, who’s in each group?

R1 (91.18%): RPI >.643

(2)Duke, (3)New Mexico, (1)Louisville, (2)Miami (FL), (1)Kansas, (1)Gonzaga

R2 (100%): RPI>0.6169 & RPI <0.643 & Opp.Effective.Poss.Ratio<0.9147

(3)Florida, (6)Arizona, (11)Belmont, (4)Saint Louis

Of course all four of these teams can’t make it to the Sweet Sixteen as Arizona plays Belmont in the Second (nee First) round of the tournament.

R3 (51.61%): RPI>0.6169 & RPI <0.643 & Opp.Effective.Poss.Ratio>0.9147  & Avg.2nd.Half.Margin>2.998

(1)Indiana, (3)Michigan State, (2)Georgetown, (2)Ohio State, (6)Memphis, (4)Syracuse, (4)Michigan

R4 (0%): RPI>0.6169 & RPI <0.643 & Opp.Effective.Poss.Ratio>0.9147  & Avg.2nd.Half.Margin<2.998

(3)Marquette, (4)Kansas State, (8) North Carolina [Correction: In the original post, I had Marquette and Kansas State in the R3 group.  They should be in the R4 group.]

The L teams

L1 (5.34%): RPI < 0.6169 and Assists.Turnovers<1.317

Notables: (5) Oklahoma State, (5) VCU, (5) UNLV, (6) Memphis, (6) Butler, (7) Illinois, (7) San Diego State

Butler and VCU have both advanced to the Sweet Sixteen out of this group before.

L2 (17.86%): RPI < 0.6169 and Assists.Turnovers>1.317 and Opp Pct Pts From 2 >=.5133

(7)Notre Dame, (6)UCLA, (11)Bucknell, (7)Creighton, (9)Temple, (5)Wisconsin

L3 (66.67%): RPI < 0.6169 & Assists.Turnovers>1.317 & Opp Pct Pts From 2 <.5133

(8)Pittsburgh
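The splits listed above can be written out as a plain set of if/else rules. This is only a sketch of the tree as described in this post, not the fitted model object. The function and argument names are my own for illustration, and how ties at a threshold are handled is an assumption, since the group definitions above only give strict inequalities in most places.

```python
def sweet16_group(rpi, opp_eff_poss_ratio, avg_2nd_half_margin,
                  assists_turnovers, opp_pct_pts_from_2):
    """Return (group, historical Sweet Sixteen rate) using the splits in this post."""
    if rpi > 0.6169:                      # first split: the R groups
        if rpi > 0.643:
            return "R1", 0.9118
        if opp_eff_poss_ratio < 0.9147:
            return "R2", 1.0
        if avg_2nd_half_margin > 2.998:
            return "R3", 0.5161
        return "R4", 0.0
    # RPI at or below the first split: the L groups
    if assists_turnovers < 1.317:
        return "L1", 0.0534
    if opp_pct_pts_from_2 >= 0.5133:
        return "L2", 0.1786
    return "L3", 0.6667

# Hypothetical team statistics, made up purely to exercise each branch.
print(sweet16_group(0.650, 0.95, 1.0, 1.2, 0.55))  # R1
print(sweet16_group(0.630, 0.90, 1.0, 1.2, 0.55))  # R2
print(sweet16_group(0.600, 0.95, 1.0, 1.4, 0.50))  # L3
```

The rates returned are the 2007-2011 frequencies quoted for each group, so they describe the training years rather than a guarantee for this year's teams.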

2011 Results

Teams that Made the Sweet Sixteen in Bold

R1 (5/6): Syracuse, Michigan State, Kentucky, North Carolina, Kansas, Duke

R2 (1/1): Ohio State

R3 (3/7): Marquette, Baylor, Indiana, Missouri, Georgetown, Wichita State, Memphis

L1 (6/45): Wisconsin, Cincinnati, Louisville, Xavier, Ohio, North Carolina State, The remaining 39 teams.

L2 (1/9): Florida, St. Mary’s (CA), Notre Dame, Creighton, Purdue, California, South Dakota State, Belmont, Iona

Cheers.

March Madness Projections Updated – March 5, 2013

Full Rankings

Projected Seeds

Number 1 Seeds: Gonzaga, Indiana, Michigan, Duke

Last 4 in: Boise State, Wichita State, Virginia, Stanford

Last 4 out: California, La Salle, Arizona State, Baylor

Full Tournament Projection

Cheers.