Brown/Warren, Polls, Sampling, and Confidence Intervals

I was reading an article on Huffington Post about the Massachusetts Senate election that links to an article citing a poll conducted by Western New England University’s (née College) Polling Institute.  I was interested in this because I grew up near the college, and I had never heard of this polling institute before.  So, I decided to take a look.  I read through their survey and got to the end, where it has a description of the methodology.  I was casually reading it and some things jumped out at me.  I have one question and one comment.

First the question:

They state in their methodology:

The Polling Institute dialed household telephone numbers, known as “landline numbers,” and cell phone numbers for the survey. In order to draw a representative sample from the landline numbers, interviewers first asked for the youngest male age 18 or older who was home at the time of the call, and if no adult male was present, the youngest female age 18 or older who was at home at the time of the call.

This seems to me like it will bias the sample, as they are much more likely to be sampling men than women.  They do note that “The landline and cell phone data were combined and weighted to reflect the adult population of Massachusetts by gender, race, age, and county of residence using U.S. Census estimates for Massachusetts”, but then why ask for the youngest male over 18?  Is this a valid method?  It seems that in the final results they have a nearly even split of men vs women, but it seems to me that using this method you’re going to get a sample that is biased toward younger, male voters.  Can someone explain to me why this is or is not valid?  I really don’t know, but it seems odd to me.

And now the comment:

In the next paragraph, they state:

All surveys are subject to sampling error, which is the expected probable difference between interviewing everyone in a population versus a scientific sampling drawn from that population. The sampling error for a sample of 444 likely voters is +/- 4.6 percent at a 95 percent confidence interval. Thus if 55 percent of likely voters said they approved of the job that Scott Brown is doing as U.S. Senator, one would be 95 percent sure that the true figure would be between 50.4 percent and 59.6 percent (55 percent +/- 4.6 percent) had all Massachusetts voters been interviewed, rather than just a sample. The margin of sampling error for the sample of 545 registered voters is +/- 4.2 percent at a 95 percent confidence interval. Sampling error increases as the sample size decreases, so statements based on various population subgroups are subject to more error than are statements based on the total sample. Sampling error does not take into account other sources of variation inherent in public opinion studies, such as non-response, question wording, or context effects.

This is simply an incorrect explanation of a confidence interval (which I’ve actually written about before, a long time ago when I first started this blog).  In frequentist statistics there is a true value that you are trying to estimate, which is assumed to be fixed and unknown (hence you are trying to estimate it).  A sample of data is then collected to estimate the unknown quantity, and a confidence interval can be constructed.  However, the probability that the true figure is in this one confidence interval is either 0 or 1, since there is nothing stochastic about the true value that is being estimated.  The “95 percent sure” interpretation will lose you points on a statistics test.  So, I don’t know what they mean by being “95% sure” here.  The correct interpretation is that 95% of similarly constructed intervals will contain the true value.  This is a different statement than being 95% sure the true value is between the upper and lower limits of the one confidence interval you have constructed from your one sample.  Imagine that you conducted this survey with exactly the same N many times.  Each time you will come up with a different estimate of the true figure and a different confidence interval.  If you examined all of these confidence intervals together, 95% of them would contain the true value of the parameter that is being estimated.  This is a pretty common misinterpretation of the meaning of a confidence interval and it took me quite a long time to understand the difference, but what concerns me here is that this isn’t an intro stats course, it’s a polling institute.
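The repeated-survey thought experiment is easy to try out.  Below is a minimal sketch in Python, assuming a simple random sample and the usual normal-approximation interval for a proportion; the institute’s exact method is not stated, so the function names, the 2,000 repetitions, and the “true” value of 55 percent are all illustrative assumptions of mine.  Under those assumptions the margin-of-error formula gives roughly 4.65 points for N=444 and 4.2 points for N=545, in line with the quoted figures.

```python
import math
import random

Z_95 = 1.96  # normal critical value for a 95% interval


def margin_of_error(n, p=0.5, z=Z_95):
    """Half-width of the normal-approximation interval for a proportion.

    p=0.5 is the worst case, which is what pollsters usually report.
    """
    return z * math.sqrt(p * (1 - p) / n)


def coverage(true_p=0.55, n=444, n_polls=2000, seed=2012):
    """Fraction of repeated polls whose 95% interval contains true_p.

    true_p is fixed; only the sample (and so the interval) changes.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_polls):
        # One simulated poll: n independent yes/no answers.
        yes = sum(rng.random() < true_p for _ in range(n))
        p_hat = yes / n
        moe = margin_of_error(n, p_hat)
        # For this one interval the answer is simply yes or no.
        if p_hat - moe <= true_p <= p_hat + moe:
            hits += 1
    return hits / n_polls


if __name__ == "__main__":
    print("Margin of error, N=444:", round(100 * margin_of_error(444), 2))
    print("Margin of error, N=545:", round(100 * margin_of_error(545), 2))
    print("Share of intervals containing the true value:", coverage())
```

Each pass through the loop produces one interval that either does or does not contain the fixed true value.  The 95 percent describes the long-run share of such intervals, not any single one of them.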

Cheers.

Going for 2

Update: Apparently I am completely wrong about this, and it is well established that a team should go for two earlier.  Via Twitter, @bdoc87 points to the book “Mathletics”, which suggests going for two in this situation as early as the second quarter.  There is also this chart, which says that you should go for two at any point in the second half if you are down 9 after scoring a touchdown.

Here is an article by Chase Stuart from www.footballperspective.com with the title “Trailing by 15 in the middle of the 4th quarter, teams are foolish to not go for 2 after touchdowns”.  The argument, as stated in the title, is that if a team is down by 15 in the middle of the final quarter and scores a touchdown (cutting the lead to 9), it should go for two in an attempt to cut the lead to 7 rather than take the extra point and cut the lead to 8.  This is a fair argument and the author may well be correct, but the article seems to offer absolutely no evidence that this is the correct decision.  For instance, the author states:

If you are going to convert the 2-point attempt, it doesn’t matter all that much whether you go for it early or late. If you’re going to miss it, going for it earlier significantly improves your odds of pulling off a miraculous comeback, precisely because you’re [sic] got almost no chance if you miss it late.

The author first notes that it doesn’t really matter whether you go for the 2 points early or late if you are going to convert it.  Sure, fine, I’ll agree with that.  But then the next sentence argues that “going for it earlier significantly improves your odds of pulling off a miraculous comeback”.  Is this true?  It may well be, but I see nothing in the article that even remotely supports this point.  It seems to just be stated as fact with no supporting evidence.  I am gonna need more proof than this.  Ideally, one would collect actual data on this and compare the two decisions.  However, it seems like football coaches almost always go for one, so a simulation study may be better here.  Make some assumptions, develop a model for a football game, and simulate this scenario, say, 10000 times going for the extra point and 10000 times going for 2.  Then you can estimate the probabilities of a win and say a team will win X percent of the time going for 1 and Y percent of the time going for 2.  I suspect there probably really isn’t much of a difference at all, but I have no evidence for or against this point.  It’s simply my opinion.  It looks to me like this entire article is stating a hypothesis (going for two is better than going for one), and that it is the author’s opinion that going for two is better than going for one.  However, they seem to offer no evidence at all in support of their claims.  Although maybe I am missing something.
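To make the simulation idea concrete, here is a deliberately crude sketch in Python.  Everything in it is an assumption of mine, not something taken from the article or from data: the opponent never scores again, the extra point is always good, the trailing team scores a second touchdown with probability p_td, and a team that ends up down 2 wins only if it gets one more possession and makes a field goal.  The only place the timing of the 2-point try matters in this toy model is the chance of that extra possession (p_extra_early after missing early, p_extra_late after missing late), and all the probabilities are hypothetical placeholders.

```python
import random


def simulate(strategy, n_sims=10000, p_td=0.30, p_two=0.48, p_fg=0.80,
             p_extra_early=0.20, p_extra_late=0.10, p_ot=0.50, seed=1):
    """Estimated win probability for a team down 9 after a touchdown.

    strategy is "two_first" (try for 2 now) or "kick_first" (kick now,
    try for 2 after the second touchdown). All probabilities are
    hypothetical placeholders, not estimates from real games.
    """
    if strategy not in ("two_first", "kick_first"):
        raise ValueError("unknown strategy: " + strategy)
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        if strategy == "two_first":
            made_two = rng.random() < p_two   # down 7 if made, 9 if missed
            if rng.random() >= p_td:
                continue                      # no second touchdown: loss
            p_extra = p_extra_early           # a miss was known early
        else:
            if rng.random() >= p_td:
                continue                      # down 8, no touchdown: loss
            made_two = rng.random() < p_two   # tie if made, down 2 if missed
            p_extra = p_extra_late            # a miss is only known late
        if made_two:
            won = rng.random() < p_ot         # game is tied: overtime
        else:
            # Down 2: need one more possession and a field goal.
            won = rng.random() < p_extra and rng.random() < p_fg
        wins += int(won)
    return wins / n_sims


def exact(p_td=0.30, p_two=0.48, p_fg=0.80, p_extra=0.10, p_ot=0.50):
    """Closed-form win probability implied by the same toy model."""
    return p_td * (p_two * p_ot + (1 - p_two) * p_extra * p_fg)


if __name__ == "__main__":
    print("Kick first:", simulate("kick_first"))
    print("Two first: ", simulate("two_first"))
```

In this model the two strategies have exactly the same win probability when p_extra_early equals p_extra_late, and going for two first is better only to the extent that knowing about a miss early buys a better chance at an extra possession.  Whether that gap is large or negligible in real games is precisely the empirical question, and this sketch does not answer it.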

Go Falcons!

Cheers.

NCAA Football Rankings – 9/17/2012

NCAA Football Rankings after week 3 as of 9/17/2012 at 6:13am

 

Rank Team Record

1 TEXAS  3-0
2 TEXAS TECH  3-0
3 KANSAS STATE  3-0
4 ALABAMA  3-0
5 LSU  3-0
6 IOWA STATE  3-0
7 ARIZONA  3-0
8 STANFORD  3-0
9 OKLAHOMA  2-0
10 TCU  2-0
11 WEST VIRGINIA  2-0
12 BAYLOR  2-0
13 GEORGIA  3-0
14 SO CAROLINA  3-0
15 NOTRE DAME  3-0
16 OREGON  3-0
17 FLORIDA  3-0
18 OKLAHOMA STATE  2-1
19 UCLA  3-0
20 USC  2-1
21 MISS STATE  3-0
22 OREGON STATE  1-0
23 OHIO STATE  3-0
24 ARIZONA STATE  2-1
25 NORTHWESTERN  3-0
26 FLORIDA STATE  3-0
27 CLEMSON  3-0
28 TEXAS A&M  1-1
29 NEBRASKA  2-1
30 LOUISVILLE  3-0
31 CINCINNATI  2-0
32 MICHIGAN  2-1
33 MISSOURI  2-1
34 KANSAS  1-2
35 GEORGIA TECH  2-1
36 WASHINGTON  2-1
37 TENNESSEE  2-1
38 MINNESOTA  3-0
39 OLE MISS  2-1
40 BYU  2-1
41 MICHIGAN STATE  2-1
42 UTAH  2-1
43 RUTGERS  3-0
44 PURDUE  2-1
45 MARYLAND  2-1
46 WISCONSIN  2-1
47 NC STATE  2-1
48 IOWA  2-1
49 CONNECTICUT  2-1
50 WASHINGTON ST  2-1
51 KENTUCKY  1-2
52 MIAMI-FLORIDA  2-1
53 LOUISIANA TECH  2-0
54 INDIANA  2-1
55 FRESNO STATE  2-1
56 SAN JOSE ST  2-1
57 PITTSBURGH  1-2
58 CALIFORNIA  1-2
59 ILLINOIS  2-1
60 MIDDLE TENNESSEE  2-1
61 OHIO U  3-0
62 AUBURN  1-2
63 UTAH STATE  2-1
64 PENN STATE  1-2
65 ARKANSAS  1-2
66 SYRACUSE  1-2
67 DUKE  2-1
68 SO FLORIDA  2-1
69 NEVADA  2-1
70 VANDERBILT  1-2
71 VIRGINIA TECH  2-1
72 SAN DIEGO ST  2-1
73 BOISE STATE  1-1
74 TOLEDO  2-1
75 UCF  2-1
76 TEMPLE  1-1
77 TX-SAN ANTONIO  3-0
78 VIRGINIA  2-1
79 TULSA  2-1
80 BOSTON COLLEGE  1-2
81 TEXAS STATE  1-1
82 WAKE FOREST  2-1
83 NORTH CAROLINA  1-2
84 ULM  1-1
85 COLORADO  0-3
86 BALL STATE  2-1
87 LOUISIANA  2-1
88 COLORADO STATE  1-2
89 WESTERN KY  2-1
90 WYOMING  0-3
91 UNLV  0-3
92 HAWAII  1-1
93 NEW MEXICO  1-2
94 AIR FORCE  1-1
95 TEXAS-EL PASO  1-2
96 NORTH TEXAS  1-2
97 BUFFALO  1-1
98 TROY  1-2
99 NORTHERN ILL  2-1
100 EAST CAROLINA  2-1
101 ARKANSAS STATE  1-2
102 IDAHO  0-3
103 BOWLING GREEN  1-2
104 SOUTH ALABAMA  1-2
105 FLA ATLANTIC  1-2
106 MARSHALL  1-2
107 RICE  1-2
108 FIU  1-2
109 NEW MEXICO ST  1-2
110 SMU  1-2
111 WESTERN MICH  1-2
112 NAVY  0-2
113 KENT STATE  1-1
114 ARMY  0-2
115 CENTRAL MICH  1-1
116 SOUTHERN MISS  0-2
117 UAB  0-2
118 EASTERN MICH  0-3
119 MEMPHIS  0-3
120 TULANE  0-2
121 HOUSTON  0-3
122 MIAMI-OHIO  1-2
123 AKRON  1-2
124 MASSACHUSETTS  0-3

Cheers.

NFL Week 2 Predictions

Week 2 (11-5 SU, 9-7 ATS, 8-8 O/U)

Thursday @8:20pm

Chicago Bears at Green Bay Packers

Prediction: Packers win 26-21

Pick: Packers -5

O/U: Under 50.5

Sunday @1pm

Tampa Bay Buccaneers at New York Giants

Prediction: Giants win 24-21

Pick: Buccaneers +7.5

O/U: Over 43.5

Arizona Cardinals at New England Patriots

Prediction: Patriots win 28-20

Pick: Cardinals +13.5

O/U: Under 48

Minnesota Vikings at Indianapolis Colts

Prediction: Colts win 23-22

Pick: Colts +1.5

O/U: Over 44

New Orleans Saints at Carolina Panthers

Prediction: Saints win 29-20

Pick: Saints -2.5

O/U: Under 50.5

Kansas City Chiefs at Buffalo Bills

Prediction: Bills win 21-20

Pick: Chiefs +3.5

O/U: Under 45

Baltimore Ravens at Philadelphia Eagles

Prediction: Ravens win 25-24

Pick: Ravens +2.5

O/U: Over 46

Oakland Raiders at Miami Dolphins

Prediction: Dolphins win 21-18

Pick: Dolphins +2.5

O/U: Over 37.5

Cleveland Browns at Cincinnati Bengals

Prediction: Bengals win 18-16

Pick: Browns +9.5

O/U: Under 38.5

Houston Texans at Jacksonville Jaguars

Prediction: Texans win 25-18

Pick: Jaguars +7.5

O/U: Over 41.5

Sunday @4:05pm

Dallas Cowboys at Seattle Seahawks

Prediction: Cowboys win 24-18

Pick: Dallas -3.5

O/U: Over 41.5

Sunday @4:25pm

Washington Redskins at St. Louis Rams

Prediction: Redskins win 18-16

Pick: Rams +3.5

O/U: Under 45.5

New York Jets at Pittsburgh Steelers

Prediction: Steelers win 23-22

Pick: Jets +6.5

O/U: Over 41.5

Tennessee Titans at San Diego Chargers

Prediction: Chargers win 26-21

Pick: Titans +5.5

O/U: Over 44.5

Sunday @8:25pm

Detroit Lions at San Francisco 49ers

Prediction: 49ers win 22-21

Pick: Lions +6.5

O/U: Under 46.5

Monday @8:35pm

Denver Broncos at Atlanta Falcons

Prediction: Falcons win 25-20

Pick: Falcons -3.5

O/U: Under 51

MLB Ranking – 9/12/2012

StatsInTheWild MLB rankings as of September 11, 2012 at 12:18pm.  SOS=strength of schedule

Team Rank Change Record ESPN TeamRankings.com SOS Run Diff
Texas 1 83-57 2 1 11 +116
NYY 2 79-61 8 3 5 +92
Tampa Bay 3 77-63 9 5 7 +78
Washington 4 ↑1 87-54 1 4 25 +132
Oakland 5 ↓1 80-60 7 2 8 +83
LA Angels 6 77-64 10 6 6 +65
Atlanta 7 ↑3 81-61 4 8 19 +82
Chi WSox 8 76-64 11 10 14 +62
Baltimore 9 ↑2 78-62 5 7 3 -22
Cincinnati 10 ↓1 85-57 3 9 30 +82
Detroit 11 ↓4 73-67 12 11 13 +32
SF 12 ↑1 79-62 6 12 26 +43
St. Louis 13 ↓1 75-66 13 15 29 +43
Seattle 14 67-74 19 13 2 -20
Toronto 15 ↑2 64-75 22 14 1 -42
Boston 16 ↓1 63-78 23 18 4 -18
LA Dodgers 17 ↓1 74-67 14 17 23 +17
Arizona 18 69-72 18 20 24 +26
Philadelphia 19 ↑2 70-71 16 16 20 -8
Pittsburgh 20 ↓1 72-68 15 21 27 0
Milwaukee 21 ↑1 70-71 17 22 28 +33
Kansas City 22 ↑1 63-77 24 19 12 -45
NY Mets 23 ↓3 65-76 21 24 15 -46
San Diego 24 67-75 20 23 22 -38
Minnesota 25 59-82 26 25 10 -101
Miami 26 63-79 25 26 16 -94
Cleveland 27 59-82 28 27 9 -172
Colorado 28 57-83 27 28 17 -96
Chi Cubs 29 55-86 29 29 21 -121
Houston 30 44-97 30 30 18 -204

Past Rankings:

8/20/2012

8/14/2012

8/6/2012

7/23/2012

7/9/2012

7/2/2012

6/25/2012

6/19/2012

6/9/2012

5/28/2012

5/23/2012

5/14/2012

5/7/2012

4/30/2012

4/23/2012

4/16/2012

4/13/2012

Cheers.

MLB Playoff Probabilities – 9/12/2012

StatsInTheWild MLB rankings as of September 11, 2012 at 12:18pm.  SOS=strength of schedule

Team Rank Change Record Projected Record Prob of making playoffs SOS Run Diff
Texas 1 83-57 94-68 99.4% 11 +116
NYY 2 79-61 90-72 87.4% 5 +92
Tampa Bay 3 77-63 88-74 46.4% 7 +78
Washington 4 ↑1 87-54 98-64 99.9% 25 +132
Oakland 5 ↓1 80-60 90-72 74.7% 8 +83
LA Angels 6 77-64 86-76 25.0% 6 +65
Atlanta 7 ↑3 81-61 91-71 99.6% 19 +82
Chi WSox 8 76-64 86-76 82.1% 14 +62
Baltimore 9 ↑2 78-62 89-73 71.7% 3 -22
Cincinnati 10 ↓1 85-57 95-67 99.9% 30 +82
Detroit 11 ↓4 73-67 84-78 19.7% 13 +32
SF 12 ↑1 79-62 89-73 98.6% 26 +43
St. Louis 13 ↓1 75-66 84-78 47.7% 29 +43
Seattle 14 67-74 75-87 0% 2 -20
Toronto 15 ↑2 64-75 74-88 0% 1 -42
Boston 16 ↓1 63-78 72-90 0% 4 -18
LA Dodgers 17 ↓1 74-67 83-79 25.0% 23 +17
Arizona 18 69-72 78-84 0.5% 24 +26
Philadelphia 19 ↑2 70-71 80-82 2.2% 20 -8
Pittsburgh 20 ↓1 72-68 82-80 24.9% 27 0
Milwaukee 21 ↑1 70-71 79-83 1.5% 28 +33
Kansas City 22 ↑1 63-77 74-88 0% 12 -45
NY Mets 23 ↓3 65-76 74-88 0% 15 -46
San Diego 24 67-75 75-87 0% 22 -38
Minnesota 25 59-82 68-94 0% 10 -101
Miami 26 63-79 71-91 0% 16 -94
Cleveland 27 59-82 68-94 0% 9 -172
Colorado 28 57-83 67-95 0% 17 -96
Chi Cubs 29 55-86 64-98 0% 21 -121
Houston 30 44-97 52-110 0% 18 -204

Cheers.

MLB Payroll vs Winning percentage

Dave Cameron over at FanGraphs wrote an interesting article about 2012 payroll and wins.  In it, he used a scatterplot, which I assume was made with Excel.  I’d like to try to persuade everyone to stop making graphics in Excel.  I’m probably a little bit biased, but R with the ggplot2 package is much, much better.  (And it’s easy!)  I present to you below my entire argument for why R with ggplot2 is better than Excel:

Here is the code for making this graph

Cheers.

Statistics

To most people, statistics means plugging numbers into an advanced calculator that spits out values, without much thought involved. Those people don’t work with data.

-Nathan Yau

 

Cheers.

Infiltrated by liberals

My favorite critique of my article from Shark974:

Translation: Deadspin is shit and biased and wants to ban the NFL. Nothing more.

Sorry but you know it’s true. Studies these days are worth the paper they’re printed on, having been infiltrated by liberals (probably a few forests have been sacrificed printing fake global warming studies by liberals).

Also, you go through all that trouble to “prove”, drum roll please, baseball players are “no more likely” to die than FB players.

Hmm, given the notion out there that FB is a deathsport or something, I’d call that alone a big win for the NFL. The way Deadspin chose to spin and headline this article says a lot about their obvious bias. I suspect the author votes for Democrats, as well (liberals are much more likely to be both illogical and extremely biased in conducting studies).

But dont worry, we’ve got liberals on the case, I am sure plenty of fake lies, I mean, scientific studies, are coming soon that prove playing in the NFL rapes your dog and gives you cancer, to be shouted from the rooftops everywhere by the liberal media. Again, look to the global warming precedent.

The more I read this, the more I appreciate what a work of art it really is.

Cheers.

Death Rates: A cautionary note from 1937

The crude death rate, for well known reasons, is not a good measure, because it is quite seriously affected by differences in age composition.  Standardized death rates, on the other hand, have the disadvantage that they depend on an arbitrarily selected standard population.

–Dublin, L.I. and Lotka, A.J. “Use of the Life Table in Vital Statistics.” American Journal of Public Health.  Vol. 27, May, 1937.

Cheers.