Category Archives: Uncategorized

openWAR in 2014

Exploring Baseball Data with R

Over at Stats In the Wild, my collaborator Greg Matthews has been monitoring the results of openWAR for the current season. How is he doing this?

The first step, of course, is to load the openWAR package.

Getting the 2014 Data

Now we need some data. While data from the previous two years is bundled into the openWAR package, data from the current season is not – so we’ll have to download it. We have made this as painless as possible. All we have to do is tell the getData() function the time interval over which we want to download game data, and it will do the rest. In this case, we want all games from the season opener played by the Dodgers and Diamondbacks in Australia on March 22nd, through today’s games.

Warning: this will take a while to run – possibly an hour or so.

Since we…

View original post 640 more words

The Cubs grounds crew was short-staffed because the Cubs were trying to avoid Obamacare

2014 NFL Preview

The 2014 NFL season is almost here. That means it’s time for my NFL preview (2013 NFL Season Preview). I’ve updated and tweaked my model this year, so it’s hopefully better than last year. Below you can read my predictions for the 2014 season. Or if you just want to see the numbers and skip my terrible comparisons of NFL divisions to different countries’ economies, jump to the bottom. Cheers.

AFC

East

The AFC East is the story of America, and the Patriots are the 1%-ers. The rest of the division is the 99%. As in, I’m 99% sure that none of these other teams will make the play-offs this year. Pick: Patriots win the division by 7 games.

North

If the AFC East is all about inequality, then the AFC North is a socialist paradise. Like Finland. I have 3 teams going 9-7 and all making the play-offs, with Cleveland being the exception. (Cleveland is like a poor guy in Finland; you won’t read that sentence anywhere else. I promise you that.) Though if this division ended with four 8-8 teams I wouldn’t be surprised at all. Pick: Baltimore (Or Pittsburgh) (Or Cincinnati) (In Finland, everyone makes the playoffs).

South

Want another economy comparison? Well, you’re getting one. The AFC South is like India. They have people who are rich (Houston), but not rich-rich like America (New England); they have a developing middle class that is good, not great (Indianapolis and Tennessee); and they have absolute, abject poverty at the bottom (Jacksonville). Boom. That actually sort of makes sense, right? Pick: Houston. Because we all know they weren’t really 2-14 bad.

West

I’ve got nothing left. Wait. I do. The AFC West is like China. You have the super rich (Denver), a developing middle class (Kansas City and San Diego) and a terrible lower class (Oakland) that is poor, but not as poor as India’s poorest (Jacksonville). Pick: Denver. Because (if I was from ESPN) PEYTON! But really because they are the best TEAM in the AFC West.

NFC

East

Alright.  I guess I’m committed to this now.  The NFC East is….Denmark.  One of the most socialist countries in Europe with one of the lowest poverty rates of all OECD nations according to Wikipedia.  Philadelphia finishes 10-6, everyone else finishes 8-8.  And I’m sure Dallas will have a chance to win and get in in their last game of the season and Romo will throw an interception to lose for his team because everyone loves that narrative. (I’m not sure how exactly that is like Denmark, but it is.) Pick: Philadelphia.  Because someone has to win.  

North

Let’s go ahead and say the NFC North is like Brazil. 4 fan bases that couldn’t possibly be sadder when their team lets them down. Just like Brazil and the World Cup. (This is officially off the rails.) Pick: Green Bay. Because they are publicly owned.

South

The NFC South is like…France?  Because there are a lot of teams that want to make the play-offs but there just aren’t enough spots (NFC South Teams:French Youth::NFL Play-off spots:French Jobs). And New Orleans.  Pick: New Orleans.   Because they are the most French.

West

The NFC West is clearly Russia.  I can’t do this anymore.  Pick: San Francisco.  Slightly easier schedule than Seattle.  

So who’s gonna make the play-offs?

Pretty boring in the AFC. I’m picking the exact same 6 teams to make the play-offs as I did last year (New England, Denver, Houston, Baltimore, Pittsburgh, Cincinnati). (What’s the definition of insanity again? Doing the same thing….) In fact, the only thing I switched in the AFC was the seeds of the 5 and 6 teams (I swapped Pittsburgh and Cincinnati). In the NFC, I’m obviously picking San Francisco and Seattle to make the play-offs (as is just about everyone else), and one of these teams will be the 1 seed, with the edge to San Francisco, who play the (slightly) easier schedule. Along with those two, I’m taking New Orleans, Green Bay, and Philadelphia to win their divisions. And finally, as the 6 seed in the NFC I’m taking….(drumroll please)….Detroit! I have them squeaking in at 9-7. In the title games, I’m taking San Francisco over New Orleans in the NFC and New England over Denver in the AFC. This leads to a Patriots vs 49ers Super Bowl with the Patriots prevailing 24-23. (I know. Boring, right? Same pick as last year.)

Ranks

Retro is based only on games played in 2013 and weights strength of schedule heavily. New Orleans is ranked number 1 on this measure, as they finished 11-5 and played 10 teams who finished 8-8 or better.

Prosp is based on 4 years of data weighted for recency. It’s based on expected points, with New England ranked number 1 in this measure.

Both Retro and Prosp are framed in terms of the probability of defeating an average team.

The PredMargin column is the average margin of victory if each team played every other team at home and away. Denver is ranked number 1 in this measure due to their ability and willingness to score a ton of points.

 
Team           Retro (%)  Prosp (%)  PredMargin
New England           70         79        4.52
Seattle               68         70        3.66
San Francisco         58         69        3.72
Denver                64         68        4.56
New Orleans           71         67        3.17
Philadelphia          44         62        1.45
Carolina              61         61        1.58
Green Bay             46         61        2.93
Chicago               50         56        1.11
Houston               29         56        0.89
Washington            33         55       -0.49
Atlanta               40         54        0.56
Detroit               48         53        1.20
Kansas City           50         53       -0.47
San Diego             52         52        0.27
Cincinnati            66         51        0.95
Buffalo               47         47       -1.93
Baltimore             56         46        0.53
Indianapolis          58         45       -1.23
Pittsburgh            51         43        0.23
Dallas                54         43        0.19
St. Louis             51         43       -2.43
Tampa Bay             40         42       -1.86
Minnesota             49         41       -1.78
NY Giants             45         41       -0.32
Oakland               37         39       -3.47
NY Jets               54         38       -2.25
Arizona               58         38       -1.74
Tennessee             41         36       -2.32
Miami                 44         32       -1.79
Cleveland             39         31       -2.92
Jacksonville          26         24       -6.52
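
For anyone wondering what "average margin if each team played every other team at home and away" looks like in practice, here is a minimal Python sketch. It is not the model behind the table. The additive ratings, the made-up teams A through D, and the 2.5-point home edge are all assumptions chosen only to illustrate the averaging step.

```python
from statistics import mean

# Hypothetical ratings in points relative to an average team (NOT the post's values).
RATINGS = {"A": 4.0, "B": 1.0, "C": -2.0, "D": -3.0}
HOME_EDGE = 2.5  # assumed home-field advantage, in points


def predicted_margin(home, away, ratings=RATINGS, home_edge=HOME_EDGE):
    """Expected points margin for the home team under a simple additive model."""
    return ratings[home] - ratings[away] + home_edge


def average_margin(team, ratings=RATINGS, home_edge=HOME_EDGE):
    """Average margin for `team` over one home and one away game against every other team."""
    margins = []
    for opp in ratings:
        if opp == team:
            continue
        margins.append(predicted_margin(team, opp, ratings, home_edge))   # team plays at home
        margins.append(-predicted_margin(opp, team, ratings, home_edge))  # team plays away
    return mean(margins)


pred_margin = {team: round(average_margin(team), 2) for team in RATINGS}
print(pred_margin)
```

Under this additive assumption the home edge cancels out across the home and away games, so each team's number reduces to its own rating minus the average rating of its opponents.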

Playoff Probabilities

 
All values are percentages.

Team           WinDivision  MakePlayoffs  MakeSuperBowl  WinSuperBowl
Arizona               0.18          1.10           0.06          0.00
Atlanta              10.40         23.60           2.28          0.98
Baltimore            36.54         63.96           6.16          2.10
Buffalo               0.14          6.00           0.28          0.10
Carolina             18.88         37.98           4.52          1.98
Chicago              14.42         29.64           3.14          1.48
Cincinnati           29.70         58.72           5.78          2.26
Cleveland             1.12          4.32           0.30          0.00
Dallas               20.62         28.48           3.16          0.98
Denver               95.04         99.38          28.80         15.20
Detroit              20.32         38.96           5.08          2.06
Green Bay            64.70         78.06          15.68          6.92
Houston              86.98         92.04          14.92          6.32
Indianapolis          8.34         21.46           1.38          0.48
Jacksonville          0.00          0.00           0.00          0.00
Kansas City           1.10         22.58           1.16          0.48
Miami                 0.32         10.10           0.70          0.16
Minnesota             0.56          1.76           0.08          0.08
New England          99.42         99.78          29.82         19.54
New Orleans          70.26         82.94          17.06          9.36
NY Giants            15.38         21.46           2.20          0.70
NY Jets               0.12          3.32           0.26          0.08
Oakland               0.00          0.68           0.02          0.00
Philadelphia         48.68         57.48           8.76          4.54
Pittsburgh           32.64         61.18           6.26          1.84
San Diego             3.86         43.02           3.20          1.26
San Francisco        51.00         88.36          18.70         10.36
Seattle              48.76         86.70          16.76          9.36
St. Louis             0.06          0.50           0.02          0.00
Tampa Bay             0.46          1.16           0.06          0.02
Tennessee             4.68         13.46           0.96          0.24
Washington           15.32         21.82           2.44          1.12
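
Numbers like these typically come from simulating the season many times and counting how often each outcome happens. Every value in the table is a multiple of 0.02, which would be consistent with something like 5,000 simulated seasons, though the post does not say how many were run. Below is a rough Python sketch of the idea for a single division. It is a stand-in, not the actual model: the log5 formula for single-game win probabilities, the division-only schedule, and the random tiebreak are all simplifying assumptions. Only the Prosp values are taken from the Ranks table.

```python
import random
from collections import Counter
from itertools import combinations

# Prosp values (chance of beating an average team) for the AFC East, from the Ranks table.
PROSP = {"New England": 0.79, "Miami": 0.32, "Buffalo": 0.47, "NY Jets": 0.38}


def log5(p_a, p_b):
    """Chance that A beats B, given each team's chance of beating an average team."""
    return p_a * (1 - p_b) / (p_a * (1 - p_b) + p_b * (1 - p_a))


def simulate_division(strengths, n_sims=5000, seed=2014):
    """Return, for each team, the percent of simulated seasons in which it finishes first."""
    rng = random.Random(seed)
    titles = Counter()
    for _season in range(n_sims):
        wins = {team: 0 for team in strengths}
        for a, b in combinations(strengths, 2):
            for _game in range(2):  # assumed schedule: each pair meets twice
                winner = a if rng.random() < log5(strengths[a], strengths[b]) else b
                wins[winner] += 1
        best = max(wins.values())
        # Ties are broken at random here; real NFL tiebreakers are more involved.
        titles[rng.choice([t for t in strengths if wins[t] == best])] += 1
    return {team: 100 * titles[team] / n_sims for team in strengths}


division_odds = simulate_division(PROSP)
for team, pct in sorted(division_odds.items(), key=lambda kv: -kv[1]):
    print(f"{team:12s} {pct:6.2f}")
```

The results will not match the WinDivision column, since a real run would use the full 16-game schedule and the actual game model. The counting logic is the part that carries over.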

Projected Records

Format: Team – (median record) expected wins

AFC East

New England – (13-3) 13.044

Miami – (6-10) 6.344

Buffalo – (6-10) 5.905

NY Jets – (5-11) 5.329

AFC North

Baltimore – (9-7) 9.192

Pittsburgh – (9-7) 9.129

Cincinnati – (9-7) 9.041

Cleveland – (5-11) 5.328

AFC South

Houston – (11-5) 10.679

Indianapolis – (7-9) 7.114

Tennessee – (7-9) 6.623

Jacksonville – (2-14) 2.234

AFC West

Denver – (13-3) 12.636

San Diego – (8-8) 8.334

Kansas City – (7-9) 7.369

Oakland – (4-12) 4.213

NFC East

Philadelphia – (10-6) 9.579

Dallas – (8-8) 8.256

NY Giants – (8-8) 7.801

Washington – (8-8) 7.75

NFC North

Green Bay – (11-5) 10.659

Detroit – (9-7) 9.095

Chicago – (9-7) 8.505

Minnesota – (5-11) 5.352

NFC South

New Orleans – (11-5) 10.990

Carolina – (9-7) 8.909

Atlanta – (8-8) 8.227

Tampa Bay – (5-11) 5.227

NFC West

San Francisco – (12-4) 11.594

Seattle – (11-5) 11.449

Arizona – (5-11) 5.312

St. Louis – (5-11) 4.781

Projected Playoffs

AFC

1. New England

2. Denver

3. Houston

4. Baltimore

5. Pittsburgh

6. Cincinnati

NFC

1. San Francisco

2. New Orleans

3. Green Bay

4. Philadelphia

5. Seattle

6. Detroit

Projected Wild Card Round

AFC

Houston beats Cincinnati  22-20

Baltimore beats Pittsburgh 22-20

NFC

Green Bay beats Detroit 27-23

Seattle beats Philadelphia 23-22

Projected Divisional Round

AFC

New England beats Baltimore 28-22

Denver beats Houston 27-21

NFC

San Francisco beats Seattle 21-19

New Orleans beats Green Bay 27-24

Projected Conference Round

AFC

New England beats Denver 29-27

NFC

San Francisco beats New Orleans 24-22

Super Bowl

New England beats San Francisco 24-23

Season Long Bets

Win Totals

Arizona Under 7.5 +120

Cleveland Under 6.5 +115

Denver Over 11.5 +105

Houston Over 7.5 +125

Indianapolis Under 9.5 +110

Jacksonville Under 4.5 +145

NY Jets Under 7 +105

Kansas City Under 8.5 +145

Miami Under 7.5 -130

New England Over 10.5 -210

St. Louis Under 7.5 +135

Tampa Bay Under 7 +125

Win Division

Philadelphia +130

Green Bay -130

New England -320

Denver -300

Make Playoffs

AFC

Cincinnati +150

Baltimore +220

San Diego +300

Houston +280

NFC

Seattle -300

San Francisco -250

New Orleans -140

Carolina +235

Detroit +270

Miss Playoffs

Indianapolis -140

Crazy Long Shot Super Bowl Match-Up

Carolina vs Houston +70000
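
One way to read the Make Playoffs bets is to compare the probability implied by each listed price with the model’s MakePlayoffs probability from the table earlier in the post. Here is a rough Python sketch of that arithmetic. It is not necessarily how the picks were made: the function names are mine, the odds are treated as standard American (moneyline) odds, and the model probabilities are taken at face value.

```python
def implied_probability(american_odds):
    """Break-even win probability implied by American (moneyline) odds."""
    if american_odds > 0:
        return 100 / (american_odds + 100)
    return -american_odds / (-american_odds + 100)


def expected_profit(model_prob, american_odds, stake=1.0):
    """Expected profit on `stake` if the model probability is exactly right."""
    payout = american_odds / 100 if american_odds > 0 else 100 / -american_odds
    return stake * (model_prob * payout - (1 - model_prob))


# (listed odds, model MakePlayoffs probability), both copied from this post.
MAKE_PLAYOFFS = {
    "Cincinnati": (150, 0.5872),
    "Baltimore": (220, 0.6396),
    "San Diego": (300, 0.4302),
    "Houston": (280, 0.9204),
    "Seattle": (-300, 0.8670),
    "San Francisco": (-250, 0.8836),
    "New Orleans": (-140, 0.8294),
    "Carolina": (235, 0.3798),
    "Detroit": (270, 0.3896),
}

for team, (odds, prob) in MAKE_PLAYOFFS.items():
    print(f"{team:14s} implied {implied_probability(odds):.3f}  "
          f"model {prob:.3f}  EV per unit {expected_profit(prob, odds):+.2f}")
```

A bet only looks attractive when the model probability is higher than the implied one, and that is only as trustworthy as the model itself.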

 

Player tracking and snake oil: sports sessions at JSM 2014

StatsbyLopez

Two of the most interesting sessions – at least as judged by Twitter interest – at the 2014 Joint Statistical Meetings in Boston were the player tracking session (abstracts here) and the hockey analytics panel (here). Here’s a brief summary, with some relevant tweets.

A) First, some links

Here’s the .mp3 of our hockey analytics talk – we apologize about the sound quality!

Here are my slides on hockey’s point system

Here are Luke Bornn’s slides from the player tracking session.

Here are Sam Ventura’s slides on quantifying defensive ability in hockey.

Lastly, click here for Andrew’s post, which links to talks from Michael Schuckers and Kevin Mongeon.

B) Eye in the Sky: The Player Tracking Revolution in Sports Analytics

View original post 595 more words

Postscript to the 2014 Joint Statistical Meetings

StatsbyLopez

As in 2013, I had a great experience at the 2014 Joint Statistical Meetings (JSM), held August 3-7 in Boston.  I made it to about 20 sessions, and while that sounds like plenty, it still means I missed about 640 other ones!

Here are some of my take-home points from the last few days – I’ll post on the sports analytics panels I attended some time next week.

     1. The teaching of statistics is changing, and changing rapidly.

Among the most popular sessions that I attended were a pair (one, two) sponsored by the Section in Statistics Education. While these talks were really interesting, I was more amazed at how many statistics teachers appeared willing to enter unfamiliar territory by shunning the curricula that they’ve taught for the past several decades. For example, I heard several comments like these ones, from what appeared to be current…

View original post 738 more words

What I wish I had learned in my graduate program in statistics

The last part of our 4-part series about statistics and graduate school.

StatsbyLopez

(Note: This is the fourth and final part in a series about graduate life in statistics, co-written by Mike and Greg. For links to all articles in the series, click here).

There are a lot of things you don’t learn in graduate school that you probably should have.

Here’s our perspective, using our programs (Mike @ Brown + UMass, Greg @ UConn and WPI) and the experiences of our peers in other programs as baselines.

  1. Data science

Not many statistics programs cover this kind of thing in their curriculum, but it’s a pretty important skill. In our view, in modern statistics, if you can’t collect or organize data you’re essentially useless. In the “real world”, no one (or only very rarely someone) is going to give you a rectangular data file with no missing data and say “Analyze this.”

As a result, two elements of data…

View original post 855 more words

Part III: What I wish I had known when I started a graduate program in statistics

StatsbyLopez

(Note: This is the third in a series about graduate life in statistics, co-written by Mike and Greg. For links to all articles in the series, click here).

1. You’re on your own

Sure, you are going to take classes that are taught by professors, but you are the one responsible for learning the material. If you have a great professor, that’s wonderful, as it will probably be a lot easier to grasp the material and to do well on exams. If the professor is terrible, however, you still need to learn the material. And in college, you could learn that material, take a C, forget it and never think about that stuff again. In grad school, however, you are still responsible for that material, and in many cases it’s going to show up on your qualifying exams and/or general exam. (Shhhh: once you…

View original post 1,183 more words

Part II: Thriving in a graduate program in statistics

StatsbyLopez

(Note: This is the second in a series about graduate life in statistics. For links to all articles in the series, click here).

Here are the best pieces of advice that I can give someone currently involved in a biostatistics or statistics graduate program.

1- Know your interests, and exploit them

Did you sit through a martingale theory lecture and talk excitedly to the professor afterwards? Or, instead, did you start to think that you would have been better off as an actuary?

These types of gut feelings are useful when it comes to one of the most stressful periods of a graduate student’s career – picking a research topic. My best advice is to (i) find the type of methods papers in statistics journals that you actually enjoy reading, (ii) find a faculty member capable of leading a thesis or dissertation in this…

View original post 1,032 more words

Part I: Deciding on a graduate program in statistics

StatsbyLopez

(Note: This is the first in a three-part series about graduate life in statistics. For links to all articles in the series, click here).

Here are some key points to consider when choosing graduate programs in statistics and biostatistics.

1. What’s the difference between biostatistics & statistics?

When I first applied for master’s programs in statistics, I had little to no idea what biostatistics was. To the untrained eye – in my case, a liberal arts undergraduate student – the subject of biostatistics gave off a connotation aligned with phyla and petri dishes, things I had been hoping to avoid since roughly 10th grade.

Ironically, however, biostatistics is not the intersection of statistics and biology; instead, biostatistics is mostly just statistics applied to fields within or related to public health. In four years of a biostatistics program, for example, I didn’t take a single biology course…

View original post 1,294 more words

My experience with grad school in statistics

In honor of @StatsByLopez’s upcoming series “So you want a graduate degree in statistics”, which I will be contributing to, I’ve written a brief history of my graduate school experience.

When I was almost finished with my undergraduate degree at WPI, I got to do a senior project about ranking athletes and sports teams.  It was my first exposure to logistic regression (I had very little understanding of what was actually going on).  But I absolutely fell in love with the project.  Which led me to fall in love with statistics.

I was on pace to finish my bachelor’s degree in 3 years (not a big deal), but I wanted to do something to postpone the real world for at least another year (as any rational 20 year old would do.)  So I applied to graduate school in applied statistics at WPI.  I was equal parts really interested in statistics, really interested in not getting a job yet, and really unprepared for graduate school.

I struggled through 2 years of mathematical statistics, Bayesian analysis, linear regression, etc. And I graduated with a less than impressive GPA, but the important part is that I graduated. Towards the end of my time at WPI, I remember having a conversation with my advisor in which I told him that I wanted to go on and do a Ph.D. I assume he thought I was nuts because I hadn’t exactly dominated my way through the program. But he never told me I shouldn’t go. He did tell me that I didn’t need a Ph.D. to work in industry. (Which is solid advice.)

I moved on and worked for 2 years in the direct marketing department of a major catalog company, building predictive models. When I first started working there I was really excited to build these predictive models. I thought it was so cool (and I still think it’s cool) that you can take data from the past to help you better predict the future. So I asked where the data was. My boss told me it was here. And there. And over here. And also over there, but you had to modify that before you used it. And a lot of it was missing. I thought to myself “Where is the rectangular file with no missing data? I want to build models.” Ahh, young Gregory, you were so cute. I spent much of my time cleaning and organizing the data, and relatively little actually building the models. But you absolutely need to understand the modeling pieces to do the cleaning and organizing well. Otherwise you don’t really know or understand what data you (might) need.

After about a year I had had enough and wanted to go back to school for a Ph.D. in statistics.  I wanted to teach statistics and have more control over the type of work that I was doing.  I applied to several programs and told myself that I wasn’t going to go unless I got funding.  I got into 2 schools right away, but neither was willing to commit to funding.  I was pretty disappointed.  But at the last moment UConn came through with full funding for me.  I was in.  Go Huskies?

So after two years in the “real world” I went back to the cocoon of academia. I also went back to being broke. Not college broke. But like regular adult broke. (I probably took a 50% pay cut going back to grad school.)

I was 25 when I returned to grad school. And let me tell you, 25 is a lot different than 21. For instance, I never skipped a class in grad school at UConn to go to a fraternity event. School is a totally different experience after you’ve worked a full-time, 40-hour-a-week job. You should treat grad school like that job (except it’s probably 60 hours a week). After 3 semesters, I passed my qualifying exam, and I finished all of my exams and classes in 3 years. In total, UConn took four years to finish since I was doing research from day 1. (Expect more like 5 or 6 years (or 7 or 8) if you come in without a master’s degree.) I graduated and did a post-doc at UMass in genetics, and just recently hit the academic job market lottery and landed a position at Loyola University Chicago. I can’t wait to start.

While this post has been mostly a bio of my experience, my piece on StatsByLopez will contain more of my thoughts on what grad school was like for me and what advice I would give to someone else in grad school for statistics.

Cheers.