My Kaggle Elite 8 probs for Men and Women

Lost a few spots yesterday. Currently in 75th. I still need UConn and South Carolina to win in all in the Men’s and Women’s tournament, respectively.

Pre tournament probabilities that the remaining men’s teams would reach the Final 4:

UConn: 56.0% (100% in my aggressive Bracket)

Purdue: 51.8%

Tennessee: 18.2%

Alabama: 10.5%

Illinois: 8.11% (0% in aggressive Bracket)

Duke: 6.23%

Clemson: 2.87%

NC State: 0.87%

Pre tournament probabilities that the remaining men’s teams would reach the Final 4:

South Carolina: 82.5%% (100% in my aggressive Bracket)

Oregon St: 8.09% (0% in my aggressive Bracket)

Texas: 28.0%

NC State: 13.8%

LSU: 18.0%

Iowa: 28.4%

UConn: 27.5% / Duke: 1.37%

USC: 33.4% /Baylor: 4.21%

Cheers.

Kaggle March Madness Update – March 26, 2024

I thought I finally understood the scoring for this contest, but I can’t reproduce my score on the leaderboard. So I have no idea. But I’m at 141 after the first weekend. Unlikely to cash, looking to just finish top 100.

Sweet Sixteen probabilities for women: (Men’s Sweet Sixteen probabilities are here):

South Carolina: 91.9% (100% in aggressive bracket)
Indiana: 5.83% (0% in aggressive bracket)

Oregon St: 50.1%
Notre Dame: 41.9%

Gonzaga: 21.3%
Texas: 56.8%

NC State: 27.9%
Stanford: 65.4%

USC: 60.1%
Baylor: 12.2%

Duke: 5.1%
UConn: 47.2%

Colorado: 17.1%
Iowa: 56.6%

LSU: 34.0%
UCLA: 57.2%

Cheers.

Kaggle March Madness Update

Well, I’ve gone up in the rankings every scoring update. From 233rd to 170th to currently in 126th. Just about top 15%. That’s positive. And with the new scoring system, there is a ton of room to move up (and DOWN) very quickly.

Probabilities for women’s games today (remember these are pre-tournament probabilities that these teams would reach the Sweet Sixteen, NOT single game win probabilities):

Notre Dame: 79.33%

Ole Miss: 17.45%

NC State: 78.33%

Tennessee: 20.37%

Syracuse: 18.67%

UConn: 76.76%

Oklahoma: 26.11%

Indiana: 69.29%

West Virginia: 11.69%

Iowa: 83.15%

Creighton: 9.48%

UCLA: 85.15%

Kansas: 9.57%

USC: 85.8%

Utah: 40.58%

Gonzaga: 55.04%

And here are my probabilities for the men’s games next Thursday and Friday:

Thursday

Clemson: 7.75%

Arizona: 26.47%

San Diego St: 6.04% (0% in my aggressive bracket)

UConn: 71.53% (100% in my aggressive bracket)

Alabama: 21.37%

North Carolina: 51.43%

Illinois: 27.09%

Iowa St: 46.87%

Friday

NC State: 4.52%

Marquette: 38.57%

Gonzaga: 6.86%

Purdue: 69.26%

Duke: 12.87%

Houston: 69.82%

Creighton: 21.28%

Tennessee: 44.22%

In my aggressive bracket I have South Carolina and UConn winning the men’s and women’s titles with probability 100%. If that happens I pick up 0 loss for all those games in all 6 rounds. And 0 loss in the finals would be absolutely massive in reducing my score as it would be a full 1/6th of my score with 0 loss. So let’s go UConn and South Carolina!

Cheers.

March 24 Kaggle Probabilities

Currently in 170th. Better than yesterday.

My Kaggle probabilities for the men’s games today:
(Remember, these aren’t probabilities that these teams will win this particular game, it’s the pre-tournament probability that they would make it to a particular round.)

Colorado: 18.3%
Marquette: 60.27%

Purdue: 84.1%
Utah St: 7.62%

James Madison: 11.67%
Duke: 50.77%

Clemson: 17.55%
Baylor: 53.01%

Grand Canyon: 13.95%
Alabama: 49.01%

Northwestern: 4.51% (0% in my aggressive bracket)
UConn: 88.83% (100% in aggressive bracket)

Texas A&M: 6.36%
Houston: 83.74%

Yale: 3.25%
San Diego St: 31.47%

My Kaggle probabilities for the women’s games today:
(Remember, these aren’t probabilities that these teams will win this particular game, it’s the pre-tournament probability that they would make it to a particular round.)

Colorado: 42.38%
Kansas St: 53.03%

South Carolina: 97.7% (100% in aggressive bracket)
North Carolina: 0.79% (0% in my aggressive bracket)

Ohio St: 77.5%
Duke: 19.72%

MTSU: 2.73%
LSU: 78.24%

Nebraska: 10.95%
Oregon St: 83.56%

Alabama: 10.63%
Texas: 82.88%

Iowa St: 6.96%
Stanford: 89.52%

Baylor: 36.57%
Virginia Tech: 57.18%

Go UConn.

Cheers.

NCAA Tournament Stuff – March 23,2024

The real stories everyone should be talking about are how David Carr beat Keegan O’Toole in the semi-finals of the NCAA wrestling tournament at 165, Carter Starocci had to wrestle former national champions back to back in the quarter- (Mekhi Lewis) and semi-finals (Shane Griffith) and he beat them both, and Drake Ayala of Iowa made the finals keep Iowa’s streak of THIRTY THREE years in a row with a national finalist alive.

But you probably want to read about basketball, huh?

So here’s a fun, simple data viz of wins by seed:

Below is a plot of wins by seed in the first round of the men’s NCAA tournament. The 6 and 8 seeds combined for only two total wins ((6)Clemson and (8)Utah State). At least one of every seed 1 through 14 advanced and seed 9 through 12 went went a combined 9-7. In total the lower seed won 11 of the first 32 games.

As for the Kaggle contest, I’m currently in 233 out of 820 (top 30%). Not great, but also not terrible either. There should be a lot more shuffling of the leaderboards in these later rounds with the changes in scoring, so it’s going to be way more exciting than past contests, I think. At least in terms of chaos.

Here are my probabilities for the women’s games today:

Tennessee: 78.2%
UConn: 100%
Kansas: 55.6%
Indiana: 98.0%
Notre Dame: 99.1%
NC State: 98.4%
Iowa: 100%
Syracuse: 72.0%
Oklahoma: 76.0%
USC: 99.3%
Ole Miss: 72.7%
West Virginia: 63.0%
Creighton: 56.2%
Gonzaga: 96.9%
UCLA: 99.0%
Utah: 76.0%

And here are my Kaggle probabilities for the men’s games today:
(Remember, these aren’t probabilities that these teams will win this particular game, it’s the pre-tournament probability that they would make it to a particular round.)

Arizona: 51.6% 
Dayton: 19.5%

Kansas: 47.4%
Gonzaga: 35.1%

UNC: 73.9%
Michigan St: 13.3%

NC State: 14.0%
Oakland: 1.78%

Iowa St: 71.6%
Washington St: 11.9%

Tennessee: 69.5%
Texas: 17.7%

Illinois: 55.5%
Duquesne: 5.83%

Creighton: 47.5%
Oregon: 16.8%

Finally, and most importantly, my numbers (8 and 2) hit in the Utah State-TCU game in NCAA sqaures. So worst case scenario, I only lose half my square buy-in. Lettttttt’s gooooooooooooo.

Cheers.

My completely uninformed guide to March Madness and some thoughts on my kaggle entry

I submitted my March Madness Machine Learning Mania today. My two entries consist of picks made using the actual spread for first round games and then a simple Bradley-Terry model for the games past that. In my second bracket, which I called the “aggressive” one, I picks UConn in the men’s bracket and South Carolina in the women’s bracket to win each round (and thus the tournament) with probability 1. So is UConn and South Carolina win, maybe I have a shot at winning. I also manually adjusted two teams on the women’s side (South Carolina and USC). The problem with South Carolina is that they haven’t lost any games and Bradley-Terry basically can’t handle that. I adjusted their regression coefficients to match the market price that they win the tournament. I also adjusted USC on the women’s side because they were way off their market price too. I didn’t make any adjustments to the men’s side because the futures prices were generally in the ball park with what I was estimating (i.e. there are three truly top teams in the men’s bracket (Purdue, UConn, Houston), then a sizable gap down to the next team (which I think is Iowa State. North Carolina got a gift of a 1 seed.)

I’m really excited about the new scoring system for Kaggle this year. And I think they got the scoring system right. A few weeks ago I believe I read that the scoring system was going to be average bracket score with traditional bracket scoring (1-2-4-8-16-32). My first thought when I saw this was that the best strategy is to just enter one bracket and hope. I think other people figured this out and they change it to a Brier score metric. But what they really got right this year is that that don’t take the average over all the GAMES, they take the average over the 6 ROUNDS. This weights the game in the finals much more heavily than a game in the first round, much closer to traditional bracket games.

Anyway, here are some probabilities below are based on my 10000 brackets that I submitted:

To win the championship:

UConn – 29.3%

Houston – 20.9%

Purdue – 19.4%

Iowa St – 5.19%

North Carolina – 4.96%

Tennessee – 3.39%

Marquette – 2.49%

Auburn – 1.94%

Illinois – 1.51%

Baylor – 1.43%

Everyone else < 1%

To make the finals:

UConn – 45.5%

Houston – 34.8%

Purdue – 32.4%

North Carolina – 12.6%

Iowa St – 11.4%

Tennessee – 7..92%

Marquette – 6.51%

Auburn – 5.8%

Illinois – 4.54%

Baylor – 4.34%

Arizona – 3.16%

Alabama – 2.68%

South Carolina – 2.46%

Kansas – 2.31%

Kentucky – 2.13%

Creighton 2.03%

Everyone else < 2%

And finally, here are my pre-tournament rankings 1 through UMass:

Rank
TeamName
1Connecticut
2Houston
3Purdue
4Iowa St
5North Carolina
6Tennessee
7Auburn
8Marquette
9Illinois
10South Carolina
11Baylor
12Kansas
13Utah St
14Creighton
15Arizona
16Duke
17Nevada
18Kentucky
19Alabama
20San Diego St
21BYU
22Florida
23Texas Tech
24New Mexico
25Wisconsin
26Gonzaga
27Nebraska
28Colorado St
29Dayton
30Clemson
31Virginia
32Mississippi St
33Boise St
34Texas
35St Mary’s CA
36TCU
37Drake
38Oklahoma
39Grand Canyon
40Northwestern
41Colorado
42Texas A&M
43Washington St
44Indiana St
45Pittsburgh
46NC State
47FL Atlantic
48Syracuse
49Providence
50Michigan St
51Oregon
52St John’s
53James Madison
54Seton Hall
55Mississippi
56Kansas St
57Ohio St
58Indiana
59Princeton
60Wake Forest
61Cincinnati
62Iowa
63Butler
64Virginia Tech
65Villanova
66Samford
67Richmond
68Duquesne
69McNeese St
70Utah
71Memphis
72Loyola-Chicago
73South Florida
74UNLV
75Florida St
76Boston College
77Xavier
78UCF
79LSU
80Georgia
81VCU
82Minnesota
83UAB
84Washington
85Bradley
86San Francisco
87Appalachian St
88Arkansas
89Rutgers
90Cornell
91Vermont
92Miami FL
93Col Charleston
94Maryland
95Penn St
96Georgia Tech
97St Joseph’s PA
98Yale
99USC
100UC Irvine
101St Bonaventure
102Santa Clara
103George Mason
104Massachusetts
Cheers.

NCAA basketball squares

We all know about Super Bowl squares. But last night my friend asked me to be in an NCAA squares pool. I asked him how it worked and he explained to me that it’s the same as a Super Bowl pool, but you have the same numbers for all 63 games (not the stupid F-ing play-in games). So naturally, I wanted to know what the best numbers were. Here are the results based n data from 1985 – 2021, which is just the data that I happened find on my computer first.

The worst set of numbers are the ones that are the same. Since it’s based on only final score, the only way these numbers will hit if a team wins by exactly 10, 20, 30, etc. Also, note that in the pct column, if all squares were equally likely, the pct should be exactly 1% for every square. So even in the worst case (i.e. 1-1), the percentage isn’t that far off from what you would expect if each square was equally likely.

The best numbers are combinations that are all 2 or 3 digits away with 1-8, 0-8, 3-1, 5-2 and 9-6 being the top five all with probabilities that are essentially the same.

To better see the pattern, I made a 2d histogram for all the scores. Here, red, white, and green indicate combinations that are worse, about the same, and better than 1%. You can see the dark green band running up the diagonal of combinations that are 2 and 3 apart.

And just for comparison sake, here is the same plot with NFL scores:

Happy gambling in March!

Cheers.

STRAIN and Editors Choice Collection

My article “Here Comes the STRAIN: Analyzing Defensive Pass Rush in American Football with Player Tracking Data” with Quang Nguyen and Ron Yurko was selected for inclusion in the AMSTAT News Editor’s Choice Collection. This means you can read the full article for free (Note: You can always read any of my articles for free. If you just contact me, I’ll send you a free copy. Never ever ever pay the publisher fort access to my work, please).

While glancing through the list there were a few articles that I am particulars interested in reading:

Cheers.

Covid-19 and myocarditis

“People on the internet” (i’m talking about the anti-vax morons) scream about how myocarditis is a side effect of the vaccine, but big pharma doesn’t want you to know about. Chief, it’s literally listed on the CDC website. 

And if you are that worried about myocarditis, you know what causes myocarditis at much higher rates: Actual Covid-19!

Cheers.

The thing about multiple hypothesis testing that has always bothered me

Here is what’s always bothered me about multiple hypothesis testing.

Let’s say there are 10 hypothesis tests and they come back and 3 of them are significant at the 0.05 level. However, after correcting for multiple test using, for simplicity, a Bonferroni correction, none of these tests are significant. (Assume that it doesn’t matter what method you use to correct, but after correction you find nothing significant). So when an individual researcher does these tests together, they have to report that they found nothing significant. 

Now let’s say that 10 different researchers do one of these ten tests each, and they get the exact same p-values. Now 3 of these researchers will get “significant” results because they are only doing one test. So three of these researchers publish their results. 

It’s the same exact set of tests with the same exact p-values. But if a single researcher does it, there is nothing significant. And if they did report something significant they would be accused of p-hacking (and rightly so). But if 10 different independent researchers each do one of the tests, they will come up with 3 out of the 10 tests as significant. Same data. Same results. Same p-values. Different conclusions based on who performed the test. Weird, right? 

Cheers.