My completely uninformed guide to March Madness and some thoughts on my Kaggle entry

I submitted my March Madness Machine Learning Mania entries today. My two entries consist of picks made using the actual spread for first round games and then a simple Bradley-Terry model for the games past that. In my second bracket, which I called the “aggressive” one, I picked UConn in the men’s bracket and South Carolina in the women’s bracket to win each round (and thus the tournament) with probability 1. So if UConn and South Carolina win, maybe I have a shot at winning. I also manually adjusted two teams on the women’s side (South Carolina and USC). The problem with South Carolina is that they haven’t lost any games, and Bradley-Terry basically can’t handle that (the maximum likelihood estimate of an undefeated team’s strength is infinite). I adjusted their regression coefficients to match the market price that they win the tournament. I also adjusted USC on the women’s side because they were way off their market price too. I didn’t make any adjustments to the men’s side because the futures prices were generally in the ballpark of what I was estimating (i.e. there are three truly top teams in the men’s bracket (Purdue, UConn, Houston), then a sizable gap down to the next team, which I think is Iowa State. North Carolina got a gift of a 1 seed).
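To make the Bradley-Terry issue concrete, here is a minimal sketch. It is not my actual model code: the function names and the coefficients are made up for illustration, and it assumes the usual logistic form where each team has a single log-strength coefficient. The second function shows why an undefeated team is a problem: its log-likelihood keeps creeping up toward 0 as the coefficient grows, so there is no finite best estimate.

```python
import math

def bt_win_prob(strength_a: float, strength_b: float) -> float:
    """P(A beats B) when each team has one log-strength coefficient."""
    return 1.0 / (1.0 + math.exp(-(strength_a - strength_b)))

def undefeated_log_lik(strength: float, n_wins: int, opp_strength: float = 0.0) -> float:
    """Log-likelihood for a team that won all n_wins games (opponents all rated opp_strength)."""
    return n_wins * math.log(bt_win_prob(strength, opp_strength))

# Hypothetical coefficients, purely for illustration.
print(bt_win_prob(1.5, 0.5))

# The log-likelihood never peaks: a bigger coefficient always fits an unbeaten record better.
for s in (1.0, 3.0, 5.0, 10.0):
    print(s, undefeated_log_lik(s, n_wins=30))
```

This is why I pinned South Carolina's coefficient to the market price instead of trusting the fit.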

I’m really excited about the new scoring system for Kaggle this year, and I think they got it right. A few weeks ago I believe I read that the scoring was going to be average bracket score with traditional bracket scoring (1-2-4-8-16-32). My first thought when I saw this was that the best strategy is to just enter one bracket and hope. I think other people figured this out and they changed it to a Brier score metric. But what they really got right this year is that they don’t take the average over all the GAMES, they take the average over the 6 ROUNDS. This weights the game in the finals much more heavily than a game in the first round, much closer to traditional bracket scoring.
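Here is a rough sketch of the difference between averaging over games and averaging over rounds. This is my reading of the metric as described above, not Kaggle's official scoring code, and the function names are made up. It assumes a 63-game bracket with 32, 16, 8, 4, 2, and 1 games per round.

```python
GAMES_PER_ROUND = [32, 16, 8, 4, 2, 1]  # one 63-game bracket

def brier(p: float, outcome: int) -> float:
    """Squared error between a predicted probability and the 0/1 result."""
    return (p - outcome) ** 2

def score_by_games(rounds):
    """rounds: one list per round of (predicted_prob, outcome) pairs. Every game counts equally."""
    games = [g for rnd in rounds for g in rnd]
    return sum(brier(p, y) for p, y in games) / len(games)

def score_by_rounds(rounds):
    """Average the Brier score within each round, then average the round means."""
    round_means = [sum(brier(p, y) for p, y in rnd) / len(rnd) for rnd in rounds]
    return sum(round_means) / len(round_means)

# Implied weight of a single game in each round when averaging over rounds.
weights = [1 / (len(GAMES_PER_ROUND) * n) for n in GAMES_PER_ROUND]
final_vs_first = weights[-1] / weights[0]
print(final_vs_first)
```

Under round averaging, the final carries 1/6 of the total score while each first round game carries 1/192, which is the same 32-to-1 ratio as 1-2-4-8-16-32 bracket scoring.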

Anyway, the probabilities below are based on the 10,000 brackets that I submitted:

To win the championship:

UConn – 29.3%

Houston – 20.9%

Purdue – 19.4%

Iowa St – 5.19%

North Carolina – 4.96%

Tennessee – 3.39%

Marquette – 2.49%

Auburn – 1.94%

Illinois – 1.51%

Baylor – 1.43%

Everyone else < 1%

To make the finals:

UConn – 45.5%

Houston – 34.8%

Purdue – 32.4%

North Carolina – 12.6%

Iowa St – 11.4%

Tennessee – 7.92%

Marquette – 6.51%

Auburn – 5.8%

Illinois – 4.54%

Baylor – 4.34%

Arizona – 3.16%

Alabama – 2.68%

South Carolina – 2.46%

Kansas – 2.31%

Kentucky – 2.13%

Creighton – 2.03%

Everyone else < 2%

And finally, here are my pre-tournament rankings 1 through UMass:

Rank TeamName
1 Connecticut
2 Houston
3 Purdue
4 Iowa St
5 North Carolina
6 Tennessee
7 Auburn
8 Marquette
9 Illinois
10 South Carolina
11 Baylor
12 Kansas
13 Utah St
14 Creighton
15 Arizona
16 Duke
17 Nevada
18 Kentucky
19 Alabama
20 San Diego St
21 BYU
22 Florida
23 Texas Tech
24 New Mexico
25 Wisconsin
26 Gonzaga
27 Nebraska
28 Colorado St
29 Dayton
30 Clemson
31 Virginia
32 Mississippi St
33 Boise St
34 Texas
35 St Mary’s CA
36 TCU
37 Drake
38 Oklahoma
39 Grand Canyon
40 Northwestern
41 Colorado
42 Texas A&M
43 Washington St
44 Indiana St
45 Pittsburgh
46 NC State
47 FL Atlantic
48 Syracuse
49 Providence
50 Michigan St
51 Oregon
52 St John’s
53 James Madison
54 Seton Hall
55 Mississippi
56 Kansas St
57 Ohio St
58 Indiana
59 Princeton
60 Wake Forest
61 Cincinnati
62 Iowa
63 Butler
64 Virginia Tech
65 Villanova
66 Samford
67 Richmond
68 Duquesne
69 McNeese St
70 Utah
71 Memphis
72 Loyola-Chicago
73 South Florida
74 UNLV
75 Florida St
76 Boston College
77 Xavier
78 UCF
79 LSU
80 Georgia
81 VCU
82 Minnesota
83 UAB
84 Washington
85 Bradley
86 San Francisco
87 Appalachian St
88 Arkansas
89 Rutgers
90 Cornell
91 Vermont
92 Miami FL
93 Col Charleston
94 Maryland
95 Penn St
96 Georgia Tech
97 St Joseph’s PA
98 Yale
99 USC
100 UC Irvine
101 St Bonaventure
102 Santa Clara
103 George Mason
104 Massachusetts
Cheers.

NCAA basketball squares

We all know about Super Bowl squares. But last night my friend asked me to be in an NCAA squares pool. I asked him how it worked and he explained to me that it’s the same as a Super Bowl pool, but you have the same numbers for all 63 games (not the stupid F-ing play-in games). So naturally, I wanted to know what the best numbers were. Here are the results based on data from 1985 – 2021, which is just the data that I happened to find on my computer first.

The worst sets of numbers are the ones where both digits are the same. Since it’s based only on the final score, the only way these numbers will hit is if a team wins by exactly 10, 20, 30, etc. Also, note that in the pct column, if all squares were equally likely, the pct would be exactly 1% for every square. So even in the worst case (i.e. 1-1), the percentage isn’t that far off from what you would expect if each square were equally likely.

The best numbers are combinations that are 2 or 3 digits apart, with 1-8, 0-8, 3-1, 5-2, and 9-6 being the top five, all with probabilities that are essentially the same.
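If you want to run the numbers on your own data, the calculation is just a tally of last digits. This is a minimal sketch and not the code I actually used: the function names are made up, the three scores at the bottom are invented examples, and it assumes each square is defined as (winner's last digit, loser's last digit).

```python
from collections import Counter

def square_counts(final_scores):
    """final_scores: iterable of (winner_points, loser_points) tuples."""
    counts = Counter()
    for winner_pts, loser_pts in final_scores:
        counts[(winner_pts % 10, loser_pts % 10)] += 1
    return counts

def square_pct(final_scores):
    """Percent of games landing on each square; 1% everywhere would mean all 100 squares are equal."""
    counts = square_counts(final_scores)
    total = sum(counts.values())
    return {square: 100 * n / total for square, n in counts.items()}

# Invented scores for illustration only (not from the 1985 - 2021 data).
example_scores = [(75, 65), (81, 78), (71, 68)]
for square, pct in sorted(square_pct(example_scores).items()):
    print(square, round(pct, 1))
```

Note that the 75-65 game lands on 5-5, which is the "win by exactly 10" case described above.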

To better see the pattern, I made a 2d histogram for all the scores. Here, red, white, and green indicate combinations that are worse, about the same, and better than 1%. You can see the dark green band running up the diagonal of combinations that are 2 and 3 apart.

And just for comparison’s sake, here is the same plot with NFL scores:

Happy gambling in March!

Cheers.

STRAIN and Editor’s Choice Collection

My article “Here Comes the STRAIN: Analyzing Defensive Pass Rush in American Football with Player Tracking Data” with Quang Nguyen and Ron Yurko was selected for inclusion in the AMSTAT News Editor’s Choice Collection. This means you can read the full article for free. (Note: You can always read any of my articles for free. If you just contact me, I’ll send you a free copy. Never ever ever pay the publisher for access to my work, please.)

While glancing through the list, there were a few articles that I am particularly interested in reading:

Cheers.

Covid-19 and myocarditis

“People on the internet” (I’m talking about the anti-vax morons) scream about how myocarditis is a side effect of the vaccine that big pharma doesn’t want you to know about. Chief, it’s literally listed on the CDC website.

And if you are that worried about myocarditis, you know what causes myocarditis at much higher rates? Actual Covid-19!

Cheers.

The thing about multiple hypothesis testing that has always bothered me

Here is what’s always bothered me about multiple hypothesis testing.

Let’s say there are 10 hypothesis tests and they come back and 3 of them are significant at the 0.05 level. However, after correcting for multiple testing using, for simplicity, a Bonferroni correction, none of these tests are significant. (Assume that it doesn’t matter what method you use to correct, but after correction you find nothing significant.) So when an individual researcher does these tests together, they have to report that they found nothing significant.

Now let’s say that 10 different researchers do one of these ten tests each, and they get the exact same p-values. Now 3 of these researchers will get “significant” results because they are only doing one test. So three of these researchers publish their results. 

It’s the same exact set of tests with the same exact p-values. But if a single researcher does it, there is nothing significant. And if they did report something significant they would be accused of p-hacking (and rightly so). But if 10 different independent researchers each do one of the tests, they will come up with 3 out of the 10 tests as significant. Same data. Same results. Same p-values. Different conclusions based on who performed the test. Weird, right? 
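Here is the scenario as a toy calculation. The p-values are invented purely for illustration, and the helper function is made up, but the Bonferroni rule is the standard one: compare each p-value to alpha divided by the number of tests performed together.

```python
def significant(p_values, alpha=0.05, bonferroni=False):
    """Flag each test as significant, optionally with a Bonferroni-corrected threshold."""
    threshold = alpha / len(p_values) if bonferroni else alpha
    return [p < threshold for p in p_values]

# Invented p-values: three are below 0.05, none are below 0.05 / 10 = 0.005.
p_values = [0.01, 0.02, 0.04, 0.20, 0.35, 0.50, 0.61, 0.72, 0.88, 0.95]

# One researcher runs all ten tests together and corrects for it.
one_researcher = significant(p_values, bonferroni=True)

# Ten researchers each run a single test, so each "family" has exactly one test.
ten_researchers = [significant([p], bonferroni=True)[0] for p in p_values]

print(sum(one_researcher), "significant for the single researcher")
print(sum(ten_researchers), "significant across the ten researchers")
```

Both groups apply the exact same correction rule. The only thing that changes is how many tests each person counts as belonging together.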

Cheers.

Brotherly Shove for the other 31 teams

The “Tush Push” in Philadelphia is called the “Brotherly Shove”. Here are the names I’ve come up with for the other 31 teams. I have no idea for a bunch of these. Please offer suggestions. 

  • Patriots: The Revolutionary Score
  • Jets: The Jet Pack
  • Bills: The Queen Careen (SNF just called this the “Buffalo Bobsled”. Mine is better.)
  • Dolphins: The Flipper Ripper
  • Steelers: Rust Thrust
  • Bengals: Tiger Pile
  • Ravens: Charm Offensive
  • Browns: The Dog Pound
  • Texans: 
  • Jaguars: 
  • Titans: 
  • Colts: The Colt Jolt
  • Chiefs: 
  • Broncos: The Pile High 
  • Chargers: The Hectic Electric
  • Raiders: The Silver and Black Stack
  • Cowboys: The Plow Ploy 
  • Commanders: 
  • Eagles: Brotherly Shove
  • Giants: The Big Grapple
  • Bears: 
  • Packers: Cheese Squeeze
  • Vikings: The Skol Stroll
  • Lions: The Roar for One More
  • Saints: The Big Easy First Down
  • Falcons: Peach Reach
  • Panthers: The Carolina Creeper
  • Buccaneers: The Buc Tuck
  • Rams: Ram Cram
  • Seahawks: 
  • 49ers: The Gold Rush
  • Cardinals: The Card Yard

As a math professor, I dislike the NFL tie breaker rules

Let’s say there are three generic teams A, B, and C, and I ask you which team is better: A or B. And let’s say you rank them:

  1. A
  2. B

Great. 

Now let’s say I ask you to rank A, B, and C. The only three reasonable rankings are:

  1. C
  2. A
  3. B

OR

  1. A
  2. C
  3. B

OR

  1. A
  2. B
  3. C

Why? Because you already told me that A was better than B head to head, so why would the addition of team C in the rankings switch the order of A and B? This condition is called Independence of Irrelevant Alternatives (I’ve written about violations of this in Olympic Sport Climbing).

So how is this related to the NFL tiebreakers? Consider this case:

Going into week 18 this year (2023-24 season), if the Jaguars and Steelers both lost their week 18 games (the Steelers have already won, so this is moot), they would both be 9-8. So which of these two teams gets into the playoffs? The answer is that it depends on the result of the Broncos vs Raiders game, featuring two teams that are both eliminated from the playoffs! This meant that there was the possibility that the Broncos vs Raiders game was a de facto playoff game... for the Jaguars vs Steelers, with the Broncos representing the Steelers and the Raiders representing the Jaguars!

How is this possible? In two way ties, the NFL tie breaker is based on head to head results, and the Jaguars beat the Steelers on October 29, so the Jaguars are better than the Steelers head to head. If the Raiders beat the Broncos, both those teams would finish 8-9 and the Jaguars and Steelers would be in a two way tie. If the Broncos won, that would put the Broncos at 9-8 and in a three way tie with the Steelers and Jaguars. In a three (or more) way tie, you apply the tie breakers and eliminate the bottom teams, repeating the procedure starting at step 1 after each elimination. And in the three way tie, the Jaguars are eliminated first based on strength of victory (all three teams would have been 6-6 in conference play). With the Jaguars eliminated, the Steelers beat Denver based on head to head tie breakers.

What this means is that we have the following rankings of teams with 9-8 records:

  1. Jaguars
  2. Steelers

AND

  1. Steelers
  2. Broncos
  3. Jaguars

(This actually COULD make sense if the Broncos’ win affected some tie breaking metric with the Jaguars, but that is not what is happening here.)

I don’t like this. What I’d like to see (and no one cares what I think, but I’ll tell you anyway) is a tie breaking metric that doesn’t change based on how many teams are tied; the ordering should be consistent whether it’s two teams or ten teams. So, without thinking very much about it, here are some tie breaking procedures that I think would be better:

  • Go straight to conference record. All teams will have the same number of conference games and the number of wins here will give the same ranking when comparing head to head or 3+ way ties.
  • Give ties to the team with more road wins. This would have worked better in the 16 game schedule, as all teams played 8 games each at home and away. Now with a 17 game schedule, you would have to use winning percentage in road games, but this isn’t that big of a deal. Again, this would give the same ranking in head to head ties or 3+ way ties. And this is similar to one way that chess breaks ties: if two or more players are tied, the player with the better record as black is the winner.
  • You could just use strength of victory as a first step tie breaker. The only possible issue with this is that there could be situations where your strength of victory is affected by two teams already eliminated from the playoffs in such a way that you end up in the exact same situation that I describe here that I don’t like. So this wouldn’t be my first choice. 

My major point here is this: There shouldn’t be separate head to head tie breaking procedures and 3+ way tie breaking procedures. This can lead to very strange situations like the one I describe here. A single procedure would also greatly simplify understanding of tie breakers for fans. Right now, the tie breaking procedure can get really complicated. So what I’m advocating for is that the NFL pick a few metrics and use those as the tie breaking procedure for everything. I should be able to write the NFL tie breaking procedure by using arrange and just specifying an ordered list of variables (like the EPL, though the EPL now has head-to-head tie breaks, but further down the list of tie breakers).
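As a rough illustration of what I mean by “an ordered list of variables”, here is a sketch using Python’s sorted as a stand-in for arrange. The teams, the column names, and the numbers are all hypothetical. The point is only that sorting on a fixed tuple of metrics can never flip the order of two teams when a third team joins the tie.

```python
from itertools import combinations

def rank(teams, metrics):
    """Order tied teams by an ordered list of metrics, higher is better on every metric."""
    return sorted(teams, key=lambda t: tuple(-t[m] for m in metrics))

# Hypothetical tied teams and made-up numbers.
teams = [
    {"team": "A", "conf_wins": 7, "road_win_pct": 0.500},
    {"team": "B", "conf_wins": 6, "road_win_pct": 0.625},
    {"team": "C", "conf_wins": 7, "road_win_pct": 0.375},
]
METRICS = ["conf_wins", "road_win_pct"]  # first metric decides, the next one breaks remaining ties

full_order = [t["team"] for t in rank(teams, METRICS)]
print(full_order)

# Every two way tie agrees with the three way ranking.
for pair in combinations(teams, 2):
    pair_order = [t["team"] for t in rank(list(pair), METRICS)]
    print(pair_order, full_order.index(pair_order[0]) < full_order.index(pair_order[1]))
```

With these numbers A finishes ahead of B whether or not C is part of the tie, which is exactly the consistency the current two way vs 3+ way split does not guarantee.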

Cheers.

NFL Playoff Scenarios and fun with tree diagrams!

Five AFC teams have already clinched playoff spots: Ravens, Dolphins, Chiefs, Texans, and Browns. This means there are two playoff spots left in the AFC. With the Steelers and Texans winning yesterday, the AFC is now pretty simple and there are only two games today that matter in terms of playoff qualification: Dolphins vs Bills and Jaguars vs Titans. Below you can see a tree diagram of all the possible outcomes for the two remaining spots.

The NFC is….more complicated:

Five teams have already clinched playoff spots in the NFC: 49ers, Cowboys, Lions, Eagles, and Rams. This leaves two spots available for six teams. This means that five of the eight NFC games today have playoff qualification implications. The three games that don’t matter are the two NFC East games and the Rams vs 49ers. My particular favorite part of the NFC tree diagram is the Vikings’ playoff scenarios. All they need to do to make the playoffs is to beat the Lions and have one of the following happen:

  1. Bears, Cardinals, and Falcons all win
  2. Bears, Cardinals, and Panthers win

This means that the Vikings are relying on four teams, all with losing records (7-9 Bears, the 4-12 Cardinals, the 2-14 Panthers, and the 7-9 Falcons), to all win today to make the playoffs. And also the Vikings have to beat the Lions. Skol! 

Cheers.

Eugenics

Cheers.

“Four Researchers”

I was mentioned on ESPN today. Well, not me. But my work. I am one of the “four” researchers (That paper has three authors….) they are talking about!

I’m a pretty big deal.

Cheers.