Stats in the Wild

ENAR (in the wild)

I’m currently attending the 2011 ENAR spring meeting in Miami. I arrived Sunday night and presented a poster at the opening poster session. On Monday, I attended two sessions in the afternoon: the survival analysis section and, later, the policy section.

In the policy section, I saw a presentation entitled “Issues in the use of survival analysis to estimate damages in equal employment cases” by Qing Pan and Joseph L. Gastwirth, which was published in the March 2009 issue of the journal Law, Probability, and Risk. The presentation was two-fold. First, they presented some basic methods for determining whether or not discrimination had taken place. In this case (age discrimination), it was fairly evident that the infraction had occurred. Second, the authors showed how to assess the compensation that should be awarded to the parties who had been discriminated against. To do so, they applied survival analysis techniques to estimate how long someone would have worked at the company if they had been employed. Very interesting stuff.

Along the same legal lines, I happened to pick up a book called “A Very Short Introduction to Statistics” by David J. Hand. While I was flipping through it, I came across a section about a woman named Sally Clark. She had two children, both of whom died within the first 11 weeks of their respective lives. Subsequently, she was charged with murder, as it seemed suspicious that TWO of her children had died so young. During the trial, Professor Sir Roy Meadow (famous for proposing the theory of Munchausen Syndrome by Proxy (MSbP)) claimed that the odds against two of her babies dying in this fashion purely by chance were 73,000,000:1. At those odds, I suppose you would have to convict the person. However, his method for arriving at this number was flawed. The Royal Statistical Society issued a statement that began “In the recent highly-publicised case of R v. Sally Clark, a medical expert witness drew on published studies to obtain a figure for the frequency of sudden infant death syndrome (SIDS, or “cot death”) in families having some of the characteristics of the defendant’s family. He went on to square this figure to obtain a value of 1 in 73 million for the frequency of two cases of SIDS in such a family.” (Read the whole statement here.) The way I feel about this can be summed up by some comments my friend (a lawyer) made when I emailed him about this case: “That’s wild that that happened in 1999. I figured it would be like 1899.”
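To see why squaring is a problem, here is a small Python sketch. It backs out the single-case figure implied by the quoted 1 in 73 million, then shows how much the joint figure moves if the two deaths are not independent. The conditional rate used below (1 in 100) is purely a made-up number for illustration, not an estimate from the statement or from any study.

```python
import math

def implied_single_one_in(joint_one_in):
    """If a '1 in N' joint figure came from squaring, recover the single-case '1 in n'."""
    return math.sqrt(joint_one_in)

def joint_one_in(first_one_in, second_given_first_one_in):
    """Joint '1 in N' figure: P(first) * P(second | first), expressed as 1 in N."""
    return first_one_in * second_given_first_one_in

single = implied_single_one_in(73_000_000)
print(f"Implied single-case figure: about 1 in {single:,.0f}")

# Squaring assumes independence: P(second | first) equals P(first).
print(f"Independent:  1 in {joint_one_in(single, single):,.0f}")

# HYPOTHETICAL: suppose shared genetic or environmental factors made a second
# death far more likely once a first had occurred (1 in 100 is invented here).
print(f"Dependent:    1 in {joint_one_in(single, 100):,.0f}")
```

The point is only that the 73 million figure depends entirely on treating the two deaths as independent events; any dependence within a family changes the answer by orders of magnitude.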

So anyway, I am now sitting in my hotel room at the Leamington (students can’t afford the Hyatt). I’ll leave you with a picture of the hotel I am staying at. I can’t wait to get a job.

Cheers.

Posted in Uncategorized

Watson (in the wild)

Feb 17

Posted by statsinthewild

Watson, an IBM computer, recently competed on Jeopardy, demoralizing his opponents Ken Jennings and Brad Rutter. (Jennings describes what it was like to lose to Watson here.) Really, it was an uneven playing field from the start. For example, “Watson’s brain showcases several IBM technologies. The hardware is jammed into 10 refrigerator-sized racks filled with Power7 server blades. To be exact, there are 90 Power750 servers filled with four processors each — and each processor has 8 cores, for a total of 2,880 cores altogether.” (InformaWorld) That’s a lot of cores for our tiny human brains to compete against. Therefore, I am calling for an all-computer Jeopardy tournament. Similar to the Netflix Prize, different organizations could pit their algorithms against one another, with the winner collecting a large cash prize and valuable free publicity and marketing. Just imagine IBM’s Watson competing against Google’s BrinPage and Microsoft’s Gates. That would be pretty intense.

Cheers.

Posted in Sports

NFL Week 15 (in the wild)

Dec 15

Posted by statsinthewild

Fourteen weeks of the NFL season are gone. Three weeks remain.

There have been a few big changes in the projected playoff seeds. In the AFC, the Jets fall to a projected 6 seed and the Ravens take over the 5 seed. Also, Jacksonville and Kansas City change places with Jacksonville projected to be the three seed with Kansas City the 4 seed.
In the NFC, Green Bay is out and the Giants are in as the six seed.

Here are the updated SITW projected playoff seeds:
View the full rankings here.

Week 15 Playoff projections (through week 14):
Playoff Projections:
AFC Projected seeds (Expected Wins) [Probability they make playoffs]{Ranking}:
1. New England (13.6973) [1]{1}
2. Pittsburgh (12.5784) [.9999]{3}
3. Jacksonville (9.9711) [.8196]{13}
4. Kansas City (9.7525) [.6738]{16}
5. Baltimore (11.3688) [.9992]{4}
6. New York Jets (10.837) [.9088]{5}

NFC Projected seeds:
1. Atlanta (13.6973) [1]{2}
2. Philadelphia (11.2133) [.9151]{8}
3. Chicago (10.6212) [.9047]{10}
4. St. Louis (7.3868) [.5744]{22}
5. New Orleans (11.2239) [.9038]{6}
6. New York Giants (10.6128) [.6983]{12}

Cheers.

Posted in Sports

NFL Week 14 (in the wild)

Dec 10

Posted by statsinthewild

The interesting thing about these playoff projections is that Philadelphia is projected to be the 2 seed in the NFC over Chicago, even though Chicago has one more win. Take a look at the rest of the Eagles’ schedule and then at the Bears’ remaining schedule, and it becomes very clear why these are the projections.

You can view the full rankings here.

Week 14 Playoff projections:
Playoff Projections:
AFC Projected seeds (Expected Wins) [Probability they make playoffs]{Ranking}:
1. New England (13.6874) [1]{2}
2. Pittsburgh (12.4096) [1]{3}
3. Kansas City (10.425) [.843]{15}
4. Jacksonville (9.518) [.7124]{14}
5. New York Jets (11.8408) [.9998]{4}
6. Baltimore (11.2332) [.9982]{5}

NFC Projected seeds:
1. Atlanta (13.6874) [1]{1}
2. Philadelphia (10.9962) [.8438]{9}
3. Chicago (10.587) [.8484]{10}
4. St. Louis (7.4758) [.477]{23}
5. New Orleans (11.091) [.8424]{6}
6. Green Bay (10.3086) [.7648]{7}

My Rankings: (Wins)
Teams Wins
1 Atlanta (10)
2 NewEngland (10)
3 Pittsburgh (9)
4 NYJets (9)
5 Baltimore (8)
6 NewOrleans (9)
7 GreenBay (8)
8 TampaBay (7)
9 Philadelphia (8)
10 Chicago (9)
11 Miami (6)
12 Cleveland (5)
13 NYGiants (8)
14 Jacksonville (7)
15 KansasCity (8)
16 Tennessee (5)
17 Indianapolis (6)
18 Minnesota (5)
19 SanDiego (6)
20 Oakland (6)
21 Washington (5)
22 Seattle (6)
23 StLouis (6)
24 Houston (5)
25 Dallas (4)
26 SanFrancisco (4)
27 Denver (3)
28 Arizona (3)
29 Cincinnati (2)
30 Buffalo (2)
31 Detroit (2)
32 Carolina (1)

Cheers.

Posted in Sports

NFL Week 13 (in the wild)

Nov 30

Posted by statsinthewild

First off, what a joke the NFC West is.
That said, I am excited about rooting for a 6-win team to make the playoffs and get a first-round home playoff game. Nice work, NFL.

A thought about being the 2 seed: I think in a year like this there may be some advantage to being the 2 seed rather than the 1 seed.

Let’s look at the AFC, and let’s assume that my playoff projections hold true. The seeds of the wild-card round winners will be either (3,4), (3,5), (4,6), or (5,6). After the first round of the playoffs, the one seed plays the lowest remaining seed, which has to be the 4, 5, or 6 seed. So they are going to have to play Indianapolis, the New York Jets, or Pittsburgh. I would argue that all three of these teams are better than the Kansas City Chiefs. The two seed has to play either the 3, 4, or 5 seed. They are guaranteed not to have to play the Steelers, and they avoid the Jets in all scenarios except the one where both the Jets and Steelers win. This looks to me like an easier path to the AFC championship game.
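The four scenarios are easy to enumerate. Here is a minimal Python sketch, assuming the usual format where 3 plays 6 and 4 plays 5 in the wild-card round, the one seed then gets the lowest remaining seed, and the two seed gets the other winner. The seed-to-team mapping is taken from the projections in this post.

```python
from itertools import product

# Projected AFC seeds from this post.
TEAMS = {
    1: "New England", 2: "Baltimore", 3: "Kansas City",
    4: "Indianapolis", 5: "New York Jets", 6: "Pittsburgh",
}

def divisional_matchups():
    """Return (wild-card winners, one seed's opponent, two seed's opponent) per scenario."""
    scenarios = []
    for winner_3v6, winner_4v5 in product((3, 6), (4, 5)):
        winners = tuple(sorted((winner_3v6, winner_4v5)))
        one_seed_opponent = max(winners)  # lowest remaining seed = largest seed number
        two_seed_opponent = min(winners)
        scenarios.append((winners, one_seed_opponent, two_seed_opponent))
    return scenarios

for winners, one_opp, two_opp in divisional_matchups():
    print(f"winners {winners}: 1 seed vs {TEAMS[one_opp]}, 2 seed vs {TEAMS[two_opp]}")
```

The two seed's opponent is the 6 seed in none of the four scenarios, and the 5 seed only when both the 5 and 6 seeds win.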

Week 12 Playoff projections:
Playoff Projections:
AFC Projected seeds (Expected Wins) [Probability they win the Super Bowl]{Ranking}:
1. New England (12.92) [.199]{2}
2. Baltimore (11.90) [.12]{5}
3. Kansas City (10.05) [.0008]{15}
4. Indianapolis (8.88) [.0004]{14}
5. New York Jets (12.45) [.1292]{3}
6. Pittsburgh (11.77) [.1162]{4}

NFC Projected seeds:
1. Atlanta (13.45) [.349]{1}
2. Philadelphia (10.86) [.0208]{10}
3. Chicago (10.49) [.0114]{11}
4. Seattle (7.29) [0]{22}
5. New Orleans (10.86) [.0254]{6}
6. Green Bay (10.15) [.0138]{8}

My Rankings: (Wins)
1 Atlanta (9)
2 NewEngland (9)
3 NYJets (9)
4 Pittsburgh (8)
5 Baltimore (8)
6 NewOrleans (8)
7 TampaBay (7)
8 GreenBay (7)
9 Miami (6)
10 Philadelphia (7)
11 Chicago (8)
12 Tennessee (5)
13 NYGiants (7)
14 Indianapolis (6)
15 KansasCity (7)
16 Cleveland (4)
17 SanDiego (6)
18 Jacksonville (6)
19 Washington (5)
20 Minnesota (4)
21 Oakland (5)
22 Seattle (5)
23 Houston (5)
24 StLouis (5)
25 SanFrancisco (4)
26 Denver (3)
27 Dallas (3)
28 Arizona (3)
29 Buffalo (2)
30 Cincinnati (2)
31 Detroit (2)
32 Carolina (1)

Cheers.

Posted in Sports

NFL Week 12 (in the wild)

Nov 25

Posted by statsinthewild

Playoff Picture:

AFC
1. NewEngland
2. Pittsburgh
3. Indianapolis
4. Kansas City
5. NYJets
6. Baltimore

NFC
1. Atlanta
2. Philadelphia
3. GreenBay
4. Seattle
5. NewOrleans
6. TampaBay

Estimated Probabilities of making the playoffs/conference champion/super bowl champion:

AFC.Playoff.Teams
Baltimore Cleveland Denver Houston Indianapolis
0.9706 0.0002 0.0046 0.0024 0.6040
Jacksonville KansasCity Miami NewEngland NYJets
0.3584 0.6874 0.0282 0.9950 0.9840
Oakland Pittsburgh SanDiego Tennessee
0.0830 0.9568 0.2334 0.0920

NFC.Playoff.Teams
Arizona Atlanta Chicago GreenBay NewOrleans
0.1446 0.9996 0.5944 0.8642 0.7922
NYGiants Philadelphia SanFrancisco Seattle StLouis
0.1460 0.9740 0.0332 0.6892 0.1338
TampaBay Washington
0.5752 0.0536

AFC.Champion
Baltimore Indianapolis Jacksonville KansasCity Miami
0.1668 0.0182 0.0046 0.0054 0.0004
NewEngland NYJets Pittsburgh SanDiego Tennessee
0.3490 0.2510 0.2018 0.0012 0.0016

NFC.Champion
Arizona Atlanta Chicago GreenBay NewOrleans
0.0002 0.5206 0.0246 0.0926 0.0704
NYGiants Philadelphia SanFrancisco Seattle StLouis
0.0042 0.2354 0.0002 0.0062 0.0004
TampaBay Washington
0.0434 0.0018

SB.Champion
Atlanta Baltimore Chicago GreenBay Indianapolis
0.2730 0.0854 0.0046 0.0308 0.0048
Jacksonville KansasCity Miami NewEngland NewOrleans
0.0010 0.0006 0.0004 0.2220 0.0214
NYGiants NYJets Philadelphia Pittsburgh SanDiego
0.0012 0.1452 0.0832 0.1116 0.0002
Seattle TampaBay Tennessee Washington
0.0014 0.0124 0.0006 0.0002

Cheers.

Posted in Sports

NFL Simulations (in the wild)

Nov 15

Posted by statsinthewild

Here are the StatsInTheWild rankings of the NFL teams after Week 10.
Rank Teams Wins
1 NewEngland 7
2 Atlanta 7
3 NYJets 7
4 Pittsburgh 6
5 Baltimore 6
6 Miami 5
7 Philadelphia 6
8 NewOrleans 6
9 TampaBay 6
10 GreenBay 6
11 Indianapolis 6
12 Tennessee 5
13 Cleveland 3
14 NYGiants 6
15 Chicago 6
16 Jacksonville 5
17 KansasCity 5
18 Oakland 5
19 Seattle 5
20 SanDiego 4
21 Houston 4
22 Washington 4
23 Denver 3
24 Minnesota 3
25 Cincinnati 2
26 Arizona 3
27 StLouis 4
28 SanFrancisco 3
29 Dallas 2
30 Detroit 2
31 Buffalo 1
32 Carolina 1

The two teams that pop out are the Giants and the Browns. The Browns are 3-6 with wins over the Bengals, Saints, and Patriots. Their losses are to Tampa Bay, Kansas City, Baltimore, Atlanta, Pittsburgh, and the Jets. By the win totals in the table above, those teams average 6.17 wins each, and all hold at least a share of the lead in their respective divisions.

The Giants have losses to Indianapolis, Tennessee, and Dallas. They have beaten Carolina, Chicago, Houston, Detroit, Dallas, and Seattle. The six teams they have beaten average only 3.33 wins. A vast difference in schedule.
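As a quick check on the schedule comparison, here is a short Python sketch that averages opponents' wins using the Week 10 win totals from the rankings table above. With those totals, the teams that beat the Browns come out at roughly 6.17 wins apiece and the teams the Giants have beaten at roughly 3.33.

```python
# Win totals copied from the Week 10 rankings table in this post.
WINS = {
    "TampaBay": 6, "KansasCity": 5, "Baltimore": 6, "Atlanta": 7,
    "Pittsburgh": 6, "NYJets": 7, "Carolina": 1, "Chicago": 6,
    "Houston": 4, "Detroit": 2, "Dallas": 2, "Seattle": 5,
}

def average_wins(opponents):
    """Mean number of wins among the listed opponents."""
    return sum(WINS[team] for team in opponents) / len(opponents)

browns_lost_to = ["TampaBay", "KansasCity", "Baltimore", "Atlanta", "Pittsburgh", "NYJets"]
giants_beat = ["Carolina", "Chicago", "Houston", "Detroit", "Dallas", "Seattle"]

print(f"Browns' losses came against teams averaging {average_wins(browns_lost_to):.2f} wins")
print(f"Giants' wins came against teams averaging {average_wins(giants_beat):.2f} wins")
```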

I also wrote some code to simulate the rest of the NFL season based on what has already happened. I used the games that have already been played as data for a model (a very simple model). This model predicts the probability of a win for a given team in a given game. Then the rest of the season is simulated 5000 times based on those estimated win probabilities.
Here are the results of that:

Probability that a team wins its division:
Division Winners:
AFCEast
Miami NewEngland NYJets
0.0080 0.6004 0.3916

AFC North
Baltimore Cleveland Pittsburgh
0.4712 0.0004 0.5284

AFC South
Houston Indianapolis Jacksonville Tennessee
0.0020 0.6158 0.1058 0.2764

AFC West
Denver KansasCity Oakland SanDiego
0.0450 0.5646 0.2138 0.1766

NFC East
NYGiants Philadelphia Washington
0.1850 0.8118 0.0032

NFC North
Chicago GreenBay Minnesota
0.2066 0.7886 0.0048

NFC South
Atlanta NewOrleans TampaBay
0.9150 0.0432 0.0418

NFC West
Arizona SanFrancisco Seattle StLouis
0.1726 0.0374 0.6958 0.0942

Conference Champions:
AFC
Baltimore Cleveland Indianapolis Jacksonville KansasCity Miami
0.1524 0.0002 0.0378 0.0010 0.0008 0.0078
NewEngland NYJets Oakland Pittsburgh Tennessee
0.3418 0.1844 0.0138 0.2586 0.0014

NFC
Atlanta Chicago GreenBay NewOrleans NYGiants Philadelphia
0.5882 0.0280 0.0484 0.0556 0.0084 0.2106
Seattle TampaBay
0.0156 0.0452

Super Bowl Champion:

Atlanta Baltimore Chicago GreenBay Indianapolis KansasCity
0.3112 0.0862 0.0040 0.0126 0.0094 0.0002
Miami NewEngland NewOrleans NYGiants NYJets Oakland
0.0032 0.2154 0.0154 0.0016 0.1068 0.0022
Philadelphia Pittsburgh Seattle TampaBay Tennessee
0.0640 0.1522 0.0016 0.0134 0.0006

So, right now SITW is predicting an Atlanta Falcons – New England Patriots Super Bowl with Atlanta winning.
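For anyone curious about the mechanics, here is a rough Python sketch of this kind of simulation. It is not the code or model behind the numbers above. The four-team league, the team strengths, the Bradley-Terry-style win probability, and the random tiebreak are all assumptions made up for illustration; the only things taken from this post are the idea of per-game win probabilities and the 5000 replications.

```python
import random
from collections import Counter

def simulate_season(current_wins, remaining_games, win_prob, n_sims=5000, seed=0):
    """Estimate each team's chance of finishing with the most wins."""
    rng = random.Random(seed)
    titles = Counter()
    for _ in range(n_sims):
        wins = dict(current_wins)
        for team_a, team_b in remaining_games:
            # win_prob(team_a, team_b) is the modeled chance that team_a wins.
            if rng.random() < win_prob(team_a, team_b):
                wins[team_a] += 1
            else:
                wins[team_b] += 1
        best = max(wins.values())
        # Ties are broken at random here; real NFL tiebreakers are far more involved.
        leaders = [team for team, w in wins.items() if w == best]
        titles[rng.choice(leaders)] += 1
    return {team: titles[team] / n_sims for team in current_wins}

# HYPOTHETICAL strengths, records, and schedule for a toy four-team division.
STRENGTH = {"A": 2.0, "B": 1.5, "C": 1.0, "D": 0.5}

def bradley_terry(team_a, team_b):
    """Assumed model: win probability proportional to relative strength."""
    return STRENGTH[team_a] / (STRENGTH[team_a] + STRENGTH[team_b])

current_wins = {"A": 7, "B": 6, "C": 5, "D": 3}
remaining_games = [("A", "B"), ("C", "D"), ("A", "C"), ("B", "D"), ("A", "D"), ("B", "C")]

probs = simulate_season(current_wins, remaining_games, bradley_terry)
for team, p in probs.items():
    print(f"{team}: {p:.4f}")
```

With 5000 replications, every estimate is a multiple of 0.0002, which is why the tables above have that granularity.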

Cheers.

Posted in Uncategorized

TrueSkill Ranking System (in the wild)

Oct 15

Posted by statsinthewild

So I’m in a poker league. We play three out of four Thursday nights in a month for a total of fifteen events in a season. At each event you earn a certain number of points. Your 10 best finishes, based on points, out of the fifteen events are counted. After the grueling fifteen-week season, there is a finale where you start with an amount of chips proportional to the number of points you earned during the season. Anyway, another player (Shaun) and I took over the scoring for the league this season. Our scoring system is pretty basic (50 points for showing up, 50 points for everyone you beat, 600-300-150-75 bonus for cashing (finishing top 4)). On top of this, I’ve devised a fairly reasonable ranking system, separate from the points, based on your average finish and how many events you have played. One criticism of my ranking system is that I’m not accounting for the strength of the field; I’m just looking at average percentile of finish.
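The points system is simple enough to write down in a few lines of Python. This is a sketch under one assumption: "everyone you beat" is read as every player who finished below you in that event. The function names are just for illustration.

```python
# Bonus for cashing (finishing in the top 4), as described above.
CASH_BONUS = {1: 600, 2: 300, 3: 150, 4: 75}

def event_points(finish, n_players):
    """Points for one event: 50 for showing up, 50 per player beaten, plus any cash bonus."""
    if not 1 <= finish <= n_players:
        raise ValueError("finish must be between 1 and n_players")
    players_beaten = n_players - finish  # assumption: everyone who finished below you
    return 50 + 50 * players_beaten + CASH_BONUS.get(finish, 0)

def season_points(event_scores, best_n=10):
    """Season total: only the best_n event scores count."""
    return sum(sorted(event_scores, reverse=True)[:best_n])

# Winning a 10-player event: 50 + 9 * 50 + 600
print(event_points(1, 10))
```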

So I usually carpool to and from events with Shaun, and we’ve been talking about ranking systems. Last night he mentioned that he had been doing some research on how Xbox Live does its rankings. Each player has some level of ability and an uncertainty associated with that skill level. After each game, each player’s rating (skill level and uncertainty) is updated. He described it a little bit more, and I mentioned that it sounded Bayesian to me. Turns out it is!

Here is an introduction to the TrueSkill ranking system and here is a more detailed description. For those of you who are interested in all the details, here is the paper “TrueSkill(TM): A Bayesian Skill Rating System” where they propose the system.
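To give a flavor of the update, here is a stripped-down Python sketch of the two-player, no-draw case: each player is a (mean, standard deviation) pair, and a win shifts the means apart and shrinks both uncertainties. This is a simplification, not the full system. It ignores draws, teams, multi-player games (which is what a poker league would actually need), and the small amount of uncertainty added back between games. The defaults of 25 and 25/3 for a new player and 25/6 for beta follow the commonly cited starting values, treated here as assumptions.

```python
import math

def _pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def _cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def update_two_player(winner, loser, beta=25 / 6):
    """Return updated (mu, sigma) pairs for the winner and the loser of one game."""
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    c2 = 2 * beta**2 + sigma_w**2 + sigma_l**2
    c = math.sqrt(c2)
    t = (mu_w - mu_l) / c
    # v and w are the mean and variance correction terms of a truncated Gaussian.
    # Note: for an extreme upset (very negative t), _cdf(t) can underflow to 0.
    v = _pdf(t) / _cdf(t)
    w = v * (v + t)
    new_winner = (mu_w + sigma_w**2 / c * v, sigma_w * math.sqrt(1 - sigma_w**2 / c2 * w))
    new_loser = (mu_l - sigma_l**2 / c * v, sigma_l * math.sqrt(1 - sigma_l**2 / c2 * w))
    return new_winner, new_loser

new_player = (25.0, 25 / 3)
new_w, new_l = update_two_player(new_player, new_player)
print(f"winner: mu={new_w[0]:.2f}, sigma={new_w[1]:.2f}")
print(f"loser:  mu={new_l[0]:.2f}, sigma={new_l[1]:.2f}")
```

The more surprising the result (a low-rated player beating a high-rated one), the bigger the shift, which is exactly the strength-of-field adjustment my average-finish ranking is missing.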

Cheers.

P.S. A big SITW congratulations to James T. O’Connor of Belchertown, MA for passing the CT bar exam.

P.P.S. I finished second in the regular season last year, but won the finale. This year I was briefly in first place until last night. I am now in second place by 150 points with 4 events to play.

Posted in Uncategorized

Early Sports Statistics (in the wild)

Sep 13

Posted by statsinthewild

This was forwarded to me by S.J. It’s quite a long journey from the stuff in this video to some of the present day statistics like spatial aggregate fielding evaluation (S.A.F.E).

“[Recorded: circa 1959] – “The Electronic Coach” is a short film made by IBM describing the use of computers in the management of a university basketball team. The film features computer science legend Don Knuth, then a senior at Case Institute of Technology. For all four of his undergraduate years at Case (1956-60), Knuth was manager of the basketball team and sought ways to improve his team’s play by analyzing a series of special statistics he captured during games. The scoring method was unusual in the weightings it gave to activities not necessarily associated with traditional coaching but Knuth’s insights into basketball, combined with his computerization of the reams of data he collected, helped Case’s coaching staff make their basketball team a winner. The computer used is an IBM 650. ”

Cheers.

Posted in Sports

Lottery Odds (in the wild)

Aug 16

Posted by statsinthewild

I’m teaching my first class this fall, and I’ve been preparing my notes for class this past week. I wanted to use keno as an example of how to compute probabilities. So I was computing some probabilities and checking them against the posted “odds” on masslottery.com. I couldn’t get my computed odds to match what the lottery had posted, which led to a brief period of panic that I wasn’t qualified to be teaching this class. Turns out, I’m not computing anything wrong. It’s just that what the lottery is calling “odds” are actually probabilities. Take a look again at masslottery.com and look at the posted odds for a one spot game. For the one spot game they say that the odds are 1:4. This is incorrect. The probability of winning this one spot game is $\frac{1}{4}=.25$, which would make the odds of winning $\frac{.25}{.75}=1:3$. Likewise, the odds against winning are 3:1. Generally, if the probability of an event is p, the odds of this event occurring are $\frac{p}{(1-p)}$.

So what the lottery is referring to as odds are actually probabilities of winning. They actually get this correct at the bottom, where they say “Probability of winning a prize in this game = 1:4.00”. The mistake is that they aren’t making any distinction between the probability of winning and the odds of winning when, in fact, these are different.
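Here is a short Python sketch of the distinction. It assumes the standard keno setup of 20 numbers drawn from a pool of 80 (which is what gives the one spot game its 1/4 probability), and it uses exact fractions so that probabilities and odds print cleanly. The function names are just for illustration.

```python
from fractions import Fraction
from math import comb

def keno_match_probability(spots, matches, pool=80, drawn=20):
    """Hypergeometric probability that exactly `matches` of your `spots` are drawn."""
    return Fraction(
        comb(drawn, matches) * comb(pool - drawn, spots - matches),
        comb(pool, spots),
    )

def odds_in_favor(p):
    """Convert a probability p into odds p / (1 - p)."""
    return p / (1 - p)

p = keno_match_probability(spots=1, matches=1)
odds = odds_in_favor(p)
print(f"probability of winning: {p}")  # a probability of 1/4
print(f"odds of winning: {odds.numerator}:{odds.denominator}")
print(f"odds against winning: {odds.denominator}:{odds.numerator}")
```

The same `keno_match_probability` function handles the bigger games too, for example the chance of matching all ten numbers in a ten spot game.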

Cheers.

Posted in Uncategorized
