Stats in the Wild

From flight stats, describing a flight that is the first leg of a two-leg itinerary I’m flying in the near future – obviously this is the sort of flight where one is interested in knowing whether it tends to be on time, because one does not like being stuck in Charlotte:

This flight has an on-time performance of 84%. Statistically, when controlling for sample size, standard deviation, and mean, this flight is on-time more often than 95% of other flights.

I didn’t realize one could control for standard deviation and mean.

(Presumably controlling for “sample size” could mean some Bayesian approach, where if there is a small amount of data for a flight they tend to give moderate predictions. This is probably not too influential as flight stats uses a sixty-day window.)

View original post
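One way to read "controlling for sample size" is as shrinkage toward an overall on-time rate. The sketch below is a beta-binomial version of that idea in R. It is only an illustration: the function name `shrunk_rate`, the prior mean, the prior strength, and the flight counts are all made-up assumptions, not anything the flight-stats site documents.

```r
# Beta-binomial shrinkage of an on-time rate (illustrative sketch only).
# The prior parameters and the counts used below are hypothetical.
shrunk_rate <- function(on_time, flights, prior_mean = 0.80, prior_strength = 20) {
  # Beta(a, b) prior with mean prior_mean, worth prior_strength "pseudo-flights"
  a <- prior_mean * prior_strength
  b <- (1 - prior_mean) * prior_strength
  # Posterior mean of the on-time probability
  (a + on_time) / (a + b + flights)
}

# Five flights, all on time: the raw rate is 1.00, the estimate is pulled
# most of the way back toward the prior mean.
shrunk_rate(on_time = 5, flights = 5)

# Sixty flights, fifty on time: the raw rate is about 0.83 and the estimate
# barely moves, which is why a sixty-day window makes the prior matter little.
shrunk_rate(on_time = 50, flights = 60)
```

With more observed flights the data dominate the prior, so the adjustment fades as the window fills up.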

Posted in Uncategorized

NCAA Basketball Rankings – 2/26/2013

Feb 26

Posted by statsinthewild

Updated 2-26-2013 at 12:19am

Resume ranks the teams based on what they have actually accomplished this season. Predictor ranks teams based on how well they are expected to perform in the future. Seed gives the projected tournament seed for a team.

Teams	W	L	Conf	Resume	Predictor	Seed
indiana	24	3	big-ten	1	1	1
michigan	23	4	big-ten	2	5	1
duke	24	3	acc	3	6	1
gonzaga	27	2	wcc	4	4	1
arizona	23	4	pac-12	5	10	2
florida	22	4	sec	6	2	2
kansas	23	4	big-12	7	11	2
louisville	22	5	big-east	8	3	2
miami fl	22	4	acc	9	20	3
kansas state	22	5	big-12	10	41	3
georgetown	21	4	big-east	11	28	3
michigan state	22	6	big-ten	12	21	3
pittsburgh	21	7	big-east	13	8	4
syracuse	22	5	big-east	14	7	4
ohio state	20	7	big-ten	15	12	4
wisconsin	19	8	big-ten	16	9	4
new mexico	23	4	mwc	17	46	5
cincinnati	19	9	big-east	18	14	5
marquette	19	7	big-east	19	43	5
san diego state	20	7	mwc	20	36	5
oklahoma state	20	6	big-12	21	19	6
oklahoma	18	8	big-12	22	49	6
memphis	24	3	cusa	23	32	6
butler	22	6	atlantic-10	24	45	6
saint louis	21	5	atlantic-10	25	35	7

Full Rankings

Feb 18

Posted by statsinthewild

Normal Deviate

STATISTICS DECLARES WAR ON MACHINE LEARNING!

Well I hope the dramatic title caught your attention. Now I can get to the real topic of the post, which is: finite sample bounds versus asymptotic approximations.

In my last post I discussed Normal limiting approximations. One commenter, Csaba Szepesvari, wrote the following interesting comment:

What still surprises me about statistics or the way statisticians do their business is the following: The Berry-Esseen theorem says that a confidence interval chosen based on the CLT is possibly shorter by a good amount of $c/\sqrt{n}$. Despite this, statisticians keep telling me that they prefer their “shorter” CLT-based confidence intervals to ones derived by using finite-sample tail inequalities that we, “machine learning people prefer” (lies vs. honesty?). I could never understood the logic behind this reasoning and I am wondering if I am missing something. One possible answer is that the Berry-Esseen result could be…

View original post 556 more words
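To make the comparison in that comment concrete, here is a small R sketch that computes the half-width of a CLT-based interval and of a finite-sample interval for the mean of data bounded in [0, 1]. Hoeffding's inequality is my own choice of finite-sample bound, since the excerpt does not name one, and the simulated data and function names are hypothetical.

```r
# Half-widths of two 95% intervals for the mean of n observations in [0, 1].

clt_halfwidth <- function(x, alpha = 0.05) {
  # Asymptotic (CLT-based) interval: z * s / sqrt(n)
  qnorm(1 - alpha / 2) * sd(x) / sqrt(length(x))
}

hoeffding_halfwidth <- function(n, alpha = 0.05) {
  # Finite-sample interval from Hoeffding's inequality for data in [0, 1]:
  #   P(|mean - mu| >= t) <= 2 * exp(-2 * n * t^2)
  # Setting the right-hand side to alpha and solving for t gives:
  sqrt(log(2 / alpha) / (2 * n))
}

set.seed(1)
x <- rbinom(100, size = 1, prob = 0.5)  # hypothetical 0/1 outcomes

c(clt = clt_halfwidth(x), hoeffding = hoeffding_halfwidth(length(x)))
```

Both half-widths shrink at the rate 1/sqrt(n). The Hoeffding interval is wider because it holds for every n and every distribution on [0, 1], while the CLT interval is only approximately valid.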

March Madness Preview

Feb 17

Posted by statsinthewild

Using data from the 2012-2013 NCAA basketball season, I’ve ranked all of the Division I teams in two ways. First, I built a retrospective model for the season that ranks all of the teams based on what they have actually accomplished so far. These rankings weight individual games by margin of victory and weight strength of schedule a little more heavily than most models do. I used these rankings to create a tournament bracket by taking the highest-rated team from each conference plus the next 37 highest-rated teams as at-large bids. Once this bracket was created, I used my prospective rankings to predict the games. The results are here.
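The field-selection rule described above (best team per conference, then the next best teams as at-large bids) can be sketched in a few lines of R. This is not the code behind the bracket: `select_field`, the `ratings` data frame, its column names, and the toy scores are assumptions for illustration.

```r
# Pick a tournament field: one automatic bid per conference plus at-large bids.
# `ratings` is a hypothetical data frame with columns team, conf, score.
select_field <- function(ratings, n_at_large = 37) {
  ranked <- ratings[order(-ratings$score), ]
  # Automatic bids: the highest-rated team in each conference
  auto <- ranked[!duplicated(ranked$conf), ]
  # At-large bids: the next highest-rated teams not already in the field
  rest <- ranked[!(ranked$team %in% auto$team), ]
  at_large <- head(rest, n_at_large)
  field <- rbind(auto, at_large)
  field[order(-field$score), ]
}

# Toy example with six teams and a single at-large bid
toy <- data.frame(
  team  = c("michigan", "miami fl", "indiana", "duke", "florida", "gonzaga"),
  conf  = c("big-ten", "acc", "big-ten", "acc", "sec", "wcc"),
  score = c(86.9, 86.4, 86.1, 86.0, 84.9, 82.0),
  stringsAsFactors = FALSE
)
select_field(toy, n_at_large = 1)$team
```

In the toy example the four conference leaders get in automatically and Indiana takes the one at-large bid, leaving Duke out.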

I’m sure everyone out there will let me know what I got wrong.

Cheers.

NCAA Basketball Rankings – 2/9/2013

Feb 9

Posted by statsinthewild

Updated 2-9-2013 at 12:07am

Rank	Team	Conf	Record	Score
1	MICHIGAN	big10	21-2	86.86
2	MIAMI-FLORIDA	acc	18-3	86.35
3	INDIANA	big10	20-3	86.12
4	DUKE	acc	20-2	85.96
5	FLORIDA	sec	18-3	84.93
6	ARIZONA	pac10	20-2	84.4
7	KANSAS	big12	19-3	84.16
8	LOUISVILLE	bigeast	19-4	84.13
9	SYRACUSE	bigeast	19-3	83.24
10	PITTSBURGH	bigeast	19-5	82.62
11	GONZAGA	wcc	22-2	81.96
12	MINNESOTA	big10	17-6	81.7
13	MICHIGAN STATE	big10	19-4	81.69
14	GEORGETOWN	bigeast	16-4	80.56
15	OHIO STATE	big10	17-5	80.19
16	CINCINNATI	bigeast	18-5	80.15
17	NEW MEXICO	mountwest	20-3	79.54
18	MARQUETTE	bigeast	16-5	79.35
19	CREIGHTON	mvc	20-4	78.92
20	COLORADO STATE	mountwest	19-4	78.92
21	NOTRE DAME	bigeast	18-5	78.56
22	OKLAHOMA STATE	big12	16-5	77.95
23	UCLA	pac10	17-6	77.83
24	NC STATE	acc	16-7	77.43
25	WISCONSIN	big10	16-7	77.15

Full Rankings

NCAA Basketball

Feb 3

Posted by statsinthewild

Index	Home	Away	HomePred	AwayPred	Home Win Prob
1	army	lehigh	68	79	0.05
2	connecticut	south florida	63	60	0.65
3	georgia tech	virginia	52	58	0.16
4	illinois	wisconsin	59	66	0.15
5	louisville	marquette	72	64	0.89
6	manhattan	saint peters	62	57	0.77
7	marist	rider	67	71	0.27
8	mcneese state	northwestern state	70	74	0.29
9	minnesota	iowa	76	69	0.83
10	stanford	oregon state	75	70	0.76
11	villanova	providence	70	70	0.51

NCAA Basketball Rankings – 2/3/2013

Feb 3

Posted by statsinthewild

Updated 2/3/2013 at 12:34pm

Indiana reclaims the top ranking after defeating Michigan last night, while previous number 2 Kansas falls three spots to number 5 after a loss to Oklahoma State.

Oregon, Wichita State, and Colorado State all fell out of the top 25. Oregon and Wichita State are both on two-game losing streaks.

Oklahoma State jumps into the top 25 after beating Kansas (at Kansas!). UNLV and New Mexico also move into the top 25.

Rank	Team	Conf	Record	Score
1	INDIANA	big10	20-2	86.7
2	MICHIGAN	big10	20-2	86.49
3	FLORIDA	sec	18-2	86.05
4	MIAMI-FLORIDA	acc	17-3	85.61
5	KANSAS	big12	19-2	85.27
6	DUKE	acc	19-2	85.22
7	ARIZONA	pac10	19-2	84.22
8	LOUISVILLE	bigeast	17-4	82.27
9	PITTSBURGH	bigeast	18-5	82.25
10	SYRACUSE	bigeast	18-3	82.23
11	MINNESOTA	big10	16-5	81.94
12	CINCINNATI	bigeast	18-4	81.08
13	OHIO STATE	big10	17-4	80.93
14	CREIGHTON	mvc	20-3	80.73
15	MICHIGAN STATE	big10	18-4	80.65
16	GEORGETOWN	bigeast	16-4	80.61
17	GONZAGA	wcc	21-2	80.43
18	MARQUETTE	bigeast	15-4	79.79
19	NOTRE DAME	bigeast	18-4	79.75
20	COLORADO STATE	mountwest	18-4	78.82
21	NEW MEXICO	mountwest	19-3	78.54
22	NC STATE	acc	16-6	78.19
23	OKLAHOMA STATE	big12	15-5	77.55
24	UCLA	pac10	16-6	77.4
25	UNLV	mountwest	17-5	76.94

Full Rankings

What time does the Superbowl start? (and predictions)

Feb 2

Posted by statsinthewild

6:30.

I’ve previously released my Super Bowl pick here, but I’ve also decided to release my forecast for the box score of the game and some visualizations of the distributions of team scoring, totals, and margin of victory as a preview for what I’m going to try to do in the 2013 season.

So, here is my predicted box score of the game:

Team	Score	First Downs	Rushing Yards	Passing Yards	Total Yards	Turnovers
49ers	23.3	19.5	149.2	201.8	351.0	1.48
Ravens	20.2	18.4	108.1	223.3	331.4	1.59

Some selected probabilities:

Team	Win	Cover (4.5)	Cover (3.5)	Win 10 or more	Overtime	Over/Under (47)
49ers	63.2%	43.5%	48.3%	24.5%	4.9%	O 29.5%
Ravens	36.8%	56.5%	51.7%	8.5%	4.9%	U 66.2%

Cheers.

Feb 1

Posted by statsinthewild

I didn’t know this either!

Rmazing

I have been working with R for some time now, but once in a while, basic functions catch my eye that I was not aware of…
For some project I wanted to transform a correlation matrix into a covariance matrix. Now, since cor2cov does not exist, I thought about “reversing” the cov2cor function (stats:::cov2cor).
Inside the code of this function, a specific line jumped into my retina:

What’s this [ ]?

Well, it stands for every element $E_{ij}$ of matrix $E$. Consider this:

> mat
     [,1] [,2] [,3] [,4] [,5]
[1,]   NA   NA   NA   NA   NA
[2,]   NA   NA   NA   NA   NA
[3,]   NA   NA   NA   NA   NA
[4,]   NA   NA   NA   NA   NA
[5,]   NA   NA   NA   NA   NA

With the empty bracket, we can now substitute ALL values by a new value:

> mat
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    1    1…

View original post 55 more words
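Since the excerpt cuts off before the payoff, here is a short R sketch of what the empty bracket does and how it could be used to go from a correlation matrix back to a covariance matrix. The name `cor2cov_sketch` is hypothetical and not a function in base R. My recollection is that the line in `cov2cor` reads `r[] <- Is * V * rep(Is, each = p)`, but treat that as an assumption, since the excerpt does not show it.

```r
# Empty-bracket assignment fills every element but keeps the object's
# attributes (such as dim); plain assignment replaces the whole object.
mat <- matrix(NA, nrow = 5, ncol = 5)

mat[] <- 1   # every element becomes 1; mat is still a 5 x 5 matrix
dim(mat)

vec <- mat
vec <- 1     # plain assignment: vec is now a length-1 vector
dim(vec)     # NULL

# Sketch of the "reverse" of cov2cor: rebuild a covariance matrix from a
# correlation matrix R and a vector of standard deviations s.
cor2cov_sketch <- function(R, s) {
  V <- R
  # s recycles down the rows, rep(s, each = p) runs across the columns,
  # so V[i, j] = s[i] * R[i, j] * s[j]; V[] keeps R's dim and dimnames.
  V[] <- s * R * rep(s, each = length(s))
  V
}

# Round trip: covariance -> correlation -> covariance
V <- matrix(c(4, 2, 2, 9), nrow = 2)
cor2cov_sketch(cov2cor(V), sqrt(diag(V)))
```

The round trip should give back the original covariance matrix up to floating-point error.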

Feb 1

Posted by statsinthewild

Hilary: the most poisoned baby name in US history

Hilary Parker

I’ve always had a special fondness for my name, which — according to Ryan Gosling in “Lars and the Real Girl” — is a scientific fact for most people (Ryan Gosling constitutes scientific proof in my book). Plus, the root word for Hilary is the Latin word “hilarius” meaning cheerful and merry, which is the same root word for “hilarious” and “exhilarating.” It’s a great name.

Several years ago I came across this blog post, which provides a cursory analysis for why “Hillary” is the most poisoned name of all time. The author is careful not to comment on the details of why “Hillary” may have been poisoned right around 1992, but I’ll go ahead and make the bold causal conclusion that it’s because that was the year that Bill Clinton was elected, and thus the year Hillary Clinton entered the public sphere and was generally reviled for not wanting to…

View original post 1,430 more words
