Brown/Warren, Polls, Sampling, and Confidence Intervals

I was reading an article on Huffington Post about the Massachusetts Senate election that linked to an article citing a poll conducted by the Western New England University (née College) Polling Institute. I was interested because I grew up near the college and had never heard of this polling institute before, so I decided to take a look. I read through their survey and got to the end, where they describe the methodology. A few things jumped out at me, so I have one question and one comment.

First the question:

They state in their methodology:

The Polling Institute dialed household telephone numbers, known as “landline numbers,” and cell phone numbers for the survey. In order to draw a representative sample from the landline numbers, interviewers first asked for the youngest male age 18 or older who was home at the time of the call, and if no adult male was present, the youngest female age 18 or older who was at home at the time of the call.

This seems to me like it will bias the sample, since they are much more likely to end up sampling men than women. They do note that “The landline and cell phone data were combined and weighted to reflect the adult population of Massachusetts by gender, race, age, and county of residence using U.S. Census estimates for Massachusetts”, but then why ask for the youngest male over 18? Is this a valid method? The final results show a nearly even split of men and women, but it seems to me that with this method you’re going to get a sample that is biased toward younger, male voters. Can someone explain to me why this is or is not valid? I really don’t know, but it seems odd to me.

And now the comment:

In the next paragraph, they state:

All surveys are subject to sampling error, which is the expected probable difference between interviewing everyone in a population versus a scientific sampling drawn from that population. The sampling error for a sample of 444 likely voters is +/- 4.6 percent at a 95 percent confidence interval. Thus if 55 percent of likely voters said they approved of the job that Scott Brown is doing as U.S. Senator, one would be 95 percent sure that the true figure would be between 50.4 percent and 59.6 percent (55 percent +/- 4.6 percent) had all Massachusetts voters been interviewed, rather than just a sample. The margin of sampling error for the sample of 545 registered voters is +/- 4.2 percent at a 95 percent confidence interval. Sampling error increases as the sample size decreases, so statements based on various population subgroups are subject to more error than are statements based on the total sample. Sampling error does not take into account other sources of variation inherent in public opinion studies, such as non-response, question wording, or context effects.

This is simply an incorrect explanation of a confidence interval (something I wrote about a long time ago, when I first started this blog). In frequentist statistics, the true value you are trying to estimate is assumed to be fixed and unknown (hence the need to estimate it). A sample of data is collected to estimate the unknown quantity, and a confidence interval can be constructed from that sample. However, the probability that the true figure lies in this particular interval is either 0 or 1, since there is nothing stochastic about the true value being estimated. Their interpretation will lose you points on a statistics test, so I don’t know what they mean by being “95 percent sure” here.

The correct interpretation is that 95% of similarly constructed intervals will contain the true value. That is a different statement from being 95% sure the true value is between the upper and lower limits of the one confidence interval you have constructed from your one sample. Imagine that you conducted this survey with exactly the same N many times. Each time you would come up with a different estimate of the true figure and a different confidence interval. If you examined all of these confidence intervals together, about 95% of them would contain the true value of the parameter being estimated. This is a pretty common misinterpretation of a confidence interval, and it took me quite a long time to understand the difference. What concerns me here is that this isn’t an intro stats course; it’s a polling institute.
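As a rough check on both the reported margins and the repeated-sampling interpretation, here is a minimal Python sketch. It assumes simple random sampling and the textbook normal-approximation interval, which may not be exactly what the Polling Institute did (their data were weighted). The function names, the "true" value of 55 percent, and the number of repeated surveys are my own illustrative choices, not anything from the poll.

```python
import math
import random

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def margin_of_error(n, p=0.5, z=Z_95):
    """Half-width of the normal-approximation interval for a proportion.

    p=0.5 is the worst case, which is what pollsters usually report.
    Assumes simple random sampling (no weighting, no design effect).
    """
    return z * math.sqrt(p * (1 - p) / n)


def coverage(true_p, n, n_surveys, seed=0):
    """Fraction of repeated surveys whose 95% interval contains true_p.

    true_p is fixed for every survey; only the sample, and therefore
    the interval, changes from one survey to the next.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_surveys):
        # One hypothetical survey of n respondents.
        yes = sum(rng.random() < true_p for _ in range(n))
        p_hat = yes / n
        moe = margin_of_error(n, p_hat)
        if p_hat - moe <= true_p <= p_hat + moe:
            hits += 1
    return hits / n_surveys


if __name__ == "__main__":
    print("margin for n=444:", round(100 * margin_of_error(444), 2), "points")
    print("margin for n=545:", round(100 * margin_of_error(545), 2), "points")
    print("share of intervals covering 0.55:", coverage(0.55, 444, 2000))
```

Under these assumptions the worst-case margin works out to about 4.65 points for 444 respondents and about 4.2 points for 545, close to the figures in the methodology statement. The second function is the repeated-survey thought experiment: the true value never moves, only the intervals do, and roughly 95% of them should cover it.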

Cheers.

Posted in Politics, Sampling

Going for 2

Sep 17

Posted by statsinthewild

Update: Apparently I am completely wrong about this, and it is well established that a team should go for two earlier. Via Twitter, @bdoc87 points to the book “Mathletics”, which suggests going for two in this situation as early as the second quarter. There is also this chart, which says that you should go for two at any point in the second half if you are down 9 after scoring a touchdown.

Here is an article by Chase Stuart from www.footballperspective.com with the title “Trailing by 15 in the middle of the 4th quarter, teams are foolish to not go for 2 after touchdowns”. The argument, as stated in the title, is that if a team is down by 15 in the middle of the final quarter and scores a touchdown (cutting the lead to 9), it should go for two in an attempt to cut the lead to 7 rather than take the extra point and cut the lead to 8. This is a fair argument and the author may well be correct, but the article seems to offer absolutely no evidence that this is the correct decision. For instance, the author states:

If you are going to convert the 2-point attempt, it doesn’t matter all that much whether you go for it early or late. If you’re going to miss it, going for it earlier significantly improves your odds of pulling off a miraculous comeback, precisely because you’re [sic] got almost no chance if you miss it late.

The author first notes that it doesn’t really matter whether you go for the 2 points early or late if you are going to convert it. Sure, fine, I’ll agree with that. But the next sentence argues that “going for it earlier significantly improves your odds of pulling off a miraculous comeback”. Is this true? It may well be, but I see nothing in the article that even remotely supports this point. It seems to just be stated as fact with no supporting evidence. I am gonna need more proof than this.

Ideally, one would collect actual data on this and compare the two decisions. However, it seems like football coaches almost always go for one, so a simulation study may be better here. Make some assumptions, develop a model for a football game, and simulate this scenario, say, 10000 times going for the extra point and 10000 times going for 2. Then you can estimate the win probabilities and say a team will win X percent of the time going for 1 and Y percent of the time going for 2. I suspect there really isn’t much of a difference at all, but I have no evidence for or against this point. It’s simply my opinion. It looks to me like this entire article states a hypothesis (going for two is better than going for one) and that this is the author’s opinion. However, the author seems to offer no evidence at all in support of the claim. Although maybe I am missing something.
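A toy version of such a simulation might look like the following Python sketch. To be clear, every probability in it is a made-up placeholder rather than an estimate from real games, and the "model" of a football game is about as crude as it gets: the trailing team either scores one more touchdown or it doesn't, and if it still trails after that it gets one more score with some probability. The constant and function names are hypothetical.

```python
import random

# All values below are illustrative assumptions, not estimates from data.
P_XP = 0.99           # chance an extra-point kick is good
P_TWO = 0.48          # chance a 2-point try succeeds
P_SECOND_TD = 0.30    # chance the trailing team scores another touchdown
P_OT_WIN = 0.50       # chance of winning in overtime
P_EXTRA_EARLY = 0.15  # chance of one more score if the need was known early
P_EXTRA_LATE = 0.05   # chance of one more score if the need was learned late


def simulate_game(go_for_two_first, rng,
                  p_extra_early=P_EXTRA_EARLY, p_extra_late=P_EXTRA_LATE):
    """Simulate one game, starting right after the touchdown that cuts
    a 15-point deficit to 9. Returns True if the trailing team wins."""
    deficit = 9
    if go_for_two_first:
        if rng.random() < P_TWO:
            deficit -= 2
    elif rng.random() < P_XP:
        deficit -= 1
    # Still down 9 means the team knows now that it needs two more scores.
    knew_early = deficit == 9

    if rng.random() >= P_SECOND_TD:
        return False
    deficit -= 6  # second touchdown: deficit is now 1, 2, or 3

    if deficit == 2:
        if rng.random() < P_TWO:  # down 2: a 2-point try is needed to tie
            deficit -= 2
    elif rng.random() < P_XP:     # down 1 or 3: kick the extra point
        deficit -= 1

    if deficit > 0:
        # One more score (a field goal) is needed.
        p_extra = p_extra_early if knew_early else p_extra_late
        if rng.random() >= p_extra:
            return False
        deficit -= 3
    if deficit < 0:
        return True
    return rng.random() < P_OT_WIN  # tied: overtime


def estimate_win_prob(go_for_two_first, n_games=10000, seed=0,
                      p_extra_early=P_EXTRA_EARLY, p_extra_late=P_EXTRA_LATE):
    """Monte Carlo estimate of the trailing team's win probability."""
    rng = random.Random(seed)
    wins = sum(
        simulate_game(go_for_two_first, rng, p_extra_early, p_extra_late)
        for _ in range(n_games)
    )
    return wins / n_games


if __name__ == "__main__":
    print("go for 1 first:", estimate_win_prob(False))
    print("go for 2 first:", estimate_win_prob(True))
```

In this toy model, nearly all of the gap between the two strategies comes from the assumed difference between `P_EXTRA_EARLY` and `P_EXTRA_LATE`. Set them equal and the two estimates should roughly coincide. So the sketch does not settle the question; it only makes explicit which assumption would need real evidence behind it.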

Go Falcons!

Cheers.

Posted in Football, NFL, Sports

NCAA Football Rankings – 9/17/2012

Sep 17

Posted by statsinthewild

NCAA Football Rankings after week 3 as of 9/17/2012 at 6:13am


Rank	Team	Record
1	TEXAS	3-0
2	TEXAS TECH	3-0
3	KANSAS STATE	3-0
4	ALABAMA	3-0
5	LSU	3-0
6	IOWA STATE	3-0
7	ARIZONA	3-0
8	STANFORD	3-0
9	OKLAHOMA	2-0
10	TCU	2-0
11	WEST VIRGINIA	2-0
12	BAYLOR	2-0
13	GEORGIA	3-0
14	SO CAROLINA	3-0
15	NOTRE DAME	3-0
16	OREGON	3-0
17	FLORIDA	3-0
18	OKLAHOMA STATE	2-1
19	UCLA	3-0
20	USC	2-1
21	MISS STATE	3-0
22	OREGON STATE	1-0
23	OHIO STATE	3-0
24	ARIZONA STATE	2-1
25	NORTHWESTERN	3-0
26	FLORIDA STATE	3-0
27	CLEMSON	3-0
28	TEXAS A&M	1-1
29	NEBRASKA	2-1
30	LOUISVILLE	3-0
31	CINCINNATI	2-0
32	MICHIGAN	2-1
33	MISSOURI	2-1
34	KANSAS	1-2
35	GEORGIA TECH	2-1
36	WASHINGTON	2-1
37	TENNESSEE	2-1
38	MINNESOTA	3-0
39	OLE MISS	2-1
40	BYU	2-1
41	MICHIGAN STATE	2-1
42	UTAH	2-1
43	RUTGERS	3-0
44	PURDUE	2-1
45	MARYLAND	2-1
46	WISCONSIN	2-1
47	NC STATE	2-1
48	IOWA	2-1
49	CONNECTICUT	2-1
50	WASHINGTON ST	2-1
51	KENTUCKY	1-2
52	MIAMI-FLORIDA	2-1
53	LOUISIANA TECH	2-0
54	INDIANA	2-1
55	FRESNO STATE	2-1
56	SAN JOSE ST	2-1
57	PITTSBURGH	1-2
58	CALIFORNIA	1-2
59	ILLINOIS	2-1
60	MIDDLE TENNESSEE	2-1
61	OHIO U	3-0
62	AUBURN	1-2
63	UTAH STATE	2-1
64	PENN STATE	1-2
65	ARKANSAS	1-2
66	SYRACUSE	1-2
67	DUKE	2-1
68	SO FLORIDA	2-1
69	NEVADA	2-1
70	VANDERBILT	1-2
71	VIRGINIA TECH	2-1
72	SAN DIEGO ST	2-1
73	BOISE STATE	1-1
74	TOLEDO	2-1
75	UCF	2-1
76	TEMPLE	1-1
77	TX-SAN ANTONIO	3-0
78	VIRGINIA	2-1
79	TULSA	2-1
80	BOSTON COLLEGE	1-2
81	TEXAS STATE	1-1
82	WAKE FOREST	2-1
83	NORTH CAROLINA	1-2
84	ULM	1-1
85	COLORADO	0-3
86	BALL STATE	2-1
87	LOUISIANA	2-1
88	COLORADO STATE	1-2
89	WESTERN KY	2-1
90	WYOMING	0-3
91	UNLV	0-3
92	HAWAII	1-1
93	NEW MEXICO	1-2
94	AIR FORCE	1-1
95	TEXAS-EL PASO	1-2
96	NORTH TEXAS	1-2
97	BUFFALO	1-1
98	TROY	1-2
99	NORTHERN ILL	2-1
100	EAST CAROLINA	2-1
101	ARKANSAS STATE	1-2
102	IDAHO	0-3
103	BOWLING GREEN	1-2
104	SOUTH ALABAMA	1-2
105	FLA ATLANTIC	1-2
106	MARSHALL	1-2
107	RICE	1-2
108	FIU	1-2
109	NEW MEXICO ST	1-2
110	SMU	1-2
111	WESTERN MICH	1-2
112	NAVY	0-2
113	KENT STATE	1-1
114	ARMY	0-2
115	CENTRAL MICH	1-1
116	SOUTHERN MISS	0-2
117	UAB	0-2
118	EASTERN MICH	0-3
119	MEMPHIS	0-3
120	TULANE	0-2
121	HOUSTON	0-3
122	MIAMI-OHIO	1-2
123	AKRON	1-2
124	MASSACHUSETTS	0-3

Cheers.

Posted in Football, NCAA, NCAA College Football, Sports

NFL Week 2 Predictions

Sep 13

Posted by statsinthewild

Week 2 (11-5 SU, 9-7 ATS, 8-8 O/U)

Thursday @8:20pm

Chicago Bears at Green Bay Packers

Prediction: Packers win 26-21

Pick: Packers -5

O/U: Under 50.5

Sunday @1pm

Tampa Bay Buccaneers at New York Giants

Prediction: Giants win 24-21

Pick: Buccaneers +7.5

O/U: Over 43.5

Arizona Cardinals at New England Patriots

Prediction: Patriots win 28-20

Pick: Cardinals +13.5

O/U: Under 48

Minnesota Vikings at Indianapolis Colts

Prediction: Colts win 23-22

Pick: Colts +1.5

O/U: Over 44

New Orleans Saints at Carolina Panthers

Prediction: Saints win 29-20

Pick: Saints -2.5

O/U: Under 50.5

Kansas City Chiefs at Buffalo Bills

Prediction: Bills win 21-20

Pick: Chiefs +3.5

O/U: Under 45

Baltimore Ravens at Philadelphia Eagles

Prediction: Ravens win 25-24

Pick: Ravens +2.5

O/U: Over 46

Oakland Raiders at Miami Dolphins

Prediction: Dolphins win 21-18

Pick: Dolphins +2.5

O/U: Over 37.5

Cleveland Browns at Cincinnati Bengals

Prediction: Bengals win 18-16

Pick: Browns +9.5

O/U: Under 38.5

Houston Texans at Jacksonville Jaguars

Prediction: Texans win 25-18

Pick: Jaguars +7.5

O/U: Over 41.5

Sunday @4:05pm

Dallas Cowboys at Seattle Seahawks

Prediction: Cowboys win 24-18

Pick: Cowboys -3.5

O/U: Over 41.5

Sunday @4:25pm

Washington Redskins at St. Louis Rams

Prediction: Redskins win 18-16

Pick: Rams +3.5

O/U: Under 45.5

New York Jets at Pittsburgh Steelers

Prediction: Steelers win 23-22

Pick: Jets +6.5

O/U: Over 41.5

Tennessee Titans at San Diego Chargers

Prediction: Chargers win 26-21

Pick: Titans +5.5

O/U: Over 44.5

Sunday @8:25pm

Detroit Lions at San Francisco 49ers

Prediction: 49ers win 22-21

Pick: Lions +6.5

O/U: Under 46.5

Monday @8:35pm

Denver Broncos at Atlanta Falcons

Prediction: Falcons win 25-20

Pick: Falcons -3.5

O/U: Under 51

Posted in Football, NFL, Sports

MLB Ranking – 9/12/2012

Sep 12

Posted by statsinthewild

StatsInTheWild MLB rankings as of September 11, 2012 at 12:18pm. SOS=strength of schedule

Team	Rank	Change	Record	ESPN	TeamRankings.com	SOS	Run Diff
Texas	1	–	83-57	2	1	11	+116
NYY	2	–	79-61	8	3	5	+92
Tampa Bay	3	–	77-63	9	5	7	+78
Washington	4	↑1	87-54	1	4	25	+132
Oakland	5	↓1	80-60	7	2	8	+83
LA Angels	6	–	77-64	10	6	6	+65
Atlanta	7	↑3	81-61	4	8	19	+82
Chi WSox	8	–	76-64	11	10	14	+62
Baltimore	9	↑2	78-62	5	7	3	-22
Cincinnati	10	↓1	85-57	3	9	30	+82
Detroit	11	↓4	73-67	12	11	13	+32
SF	12	↑1	79-62	6	12	26	+43
St. Louis	13	↓1	75-66	13	15	29	+43
Seattle	14	–	67-74	19	13	2	-20
Toronto	15	↑2	64-75	22	14	1	-42
Boston	16	↓1	63-78	23	18	4	-18
LA Dodgers	17	↓1	74-67	14	17	23	+17
Arizona	18	–	69-72	18	20	24	+26
Philadelphia	19	↑2	70-71	16	16	20	-8
Pittsburgh	20	↓1	72-68	15	21	27	0
Milwaukee	21	↑1	70-71	17	22	28	+33
Kansas City	22	↑1	63-77	24	19	12	-45
NY Mets	23	↓3	65-76	21	24	15	-46
San Diego	24	–	67-75	20	23	22	-38
Minnesota	25	–	59-82	26	25	10	-101
Miami	26	–	63-79	25	26	16	-94
Cleveland	27	–	59-82	28	27	9	-172
Colorado	28	–	57-83	27	28	17	-96
Chi Cubs	29	–	55-86	29	29	21	-121
Houston	30	–	44-97	30	30	18	-204

Cheers.

Posted in Baseball, Sports

MLB Playoff Probabilities – 9/12/2012

Sep 12

Posted by statsinthewild

StatsInTheWild MLB rankings as of September 11, 2012 at 12:18pm. SOS=strength of schedule

Team	Rank	Change	Record	Projected Record	Prob of making playoffs	SOS	Run Diff
Texas	1	–	83-57	94-68	99.4%	11	+116
NYY	2	–	79-61	90-72	87.4%	5	+92
Tampa Bay	3	–	77-63	88-74	46.4%	7	+78
Washington	4	↑1	87-54	98-64	99.9%	25	+132
Oakland	5	↓1	80-60	90-72	74.7%	8	+83
LA Angels	6	–	77-64	86-76	25.0%	6	+65
Atlanta	7	↑3	81-61	91-71	99.6%	19	+82
Chi WSox	8	–	76-64	86-76	82.1%	14	+62
Baltimore	9	↑2	78-62	89-73	71.7%	3	-22
Cincinnati	10	↓1	85-57	95-67	99.9%	30	+82
Detroit	11	↓4	73-67	84-78	19.7%	13	+32
SF	12	↑1	79-62	89-73	98.6%	26	+43
St. Louis	13	↓1	75-66	84-78	47.7%	29	+43
Seattle	14	–	67-74	75-87	0%	2	-20
Toronto	15	↑2	64-75	74-88	0%	1	-42
Boston	16	↓1	63-78	72-90	0%	4	-18
LA Dodgers	17	↓1	74-67	83-79	25.0%	23	+17
Arizona	18	–	69-72	78-84	0.5%	24	+26
Philadelphia	19	↑2	70-71	80-82	2.2%	20	-8
Pittsburgh	20	↓1	72-68	82-80	24.9%	27	0
Milwaukee	21	↑1	70-71	79-83	1.5%	28	+33
Kansas City	22	↑1	63-77	74-88	0%	12	-45
NY Mets	23	↓3	65-76	74-88	0%	15	-46
San Diego	24	–	67-75	75-87	0%	22	-38
Minnesota	25	–	59-82	68-94	0%	10	-101
Miami	26	–	63-79	71-91	0%	16	-94
Cleveland	27	–	59-82	68-94	0%	9	-172
Colorado	28	–	57-83	67-95	0%	17	-96
Chi Cubs	29	–	55-86	64-98	0%	21	-121
Houston	30	–	44-97	52-110	0%	18	-204

Cheers.

Posted in Baseball, Sports

MLB Payroll vs Winning percentage

Sep 11

Posted by statsinthewild

Dave Cameron over at FanGraphs wrote an interesting article about 2012 payroll and wins. In it, he used a scatterplot, which I assume was made with Excel. I’d like to try to persuade everyone to stop making graphics in Excel. I’m probably a little bit biased, but R with the ggplot2 package is much, much better. (And it’s easy!) I present to you below my entire argument for why R with ggplot2 is better than Excel:

Here is the code for making this graph

Cheers.

Posted in Baseball, R, Sports

Statistics

Sep 10

Posted by statsinthewild

To most people, statistics means plugging numbers into an advanced calculator that spits out values, without much thought involved. Those people don’t work with data.

-Nathan Yau

Cheers.

Posted in Uncategorized

Infiltrated by liberals

Sep 6

Posted by statsinthewild

My favorite critique of my article from Shark974:

Translation: Deadspin is shit and biased and wants to ban the NFL. Nothing more.

Sorry but you know it’s true. Studies these days are worth the paper they’re printed on, having been infiltrated by liberals (probably a few forests have been sacrificed printing fake global warming studies by liberals).

Also, you go through all that trouble to “prove”, drum roll please, baseball players are “no more likely” to die than FB players.

Hmm, given the notion out there that FB is a deathsport or something, I’d call that alone a big win for the NFL. The way Deadspin chose to spin and headline this article says a lot about their obvious bias. I suspect the author votes for Democrats, as well (liberals are much more likely to be both illogical and extremely biased in conducting studies).

But dont worry, we’ve got liberals on the case, I am sure plenty of fake lies, I mean, scientific studies, are coming soon that prove playing in the NFL rapes your dog and gives you cancer, to be shouted from the rooftops everywhere by the liberal media. Again, look to the global warming precedent.

The more I read this, the more I appreciate what a work of art it really is.

Cheers.

Posted in Uncategorized

Death Rates: A cautionary note from 1937

Sep 6

Posted by statsinthewild

The crude death rate, for well known reasons, is not a good measure, because it is quite seriously affected by differences in age composition. Standardized death rates, on the other hand, have the disadvantage that they depend on an arbitrarily selected standard population.

–Dublin, L.I. and Lotka, A.J. “Use of the Life Table in Vital Statistics.” American Journal of Public Health. Vol. 27, May, 1937.
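To make the quote concrete, here is a small Python sketch with entirely made-up numbers. The two populations, their age-specific death rates, and the two "standard" age distributions are all hypothetical, chosen only to show how the crude rate tracks age composition and how a directly standardized rate depends on which standard you pick.

```python
# Hypothetical age-specific death rates (deaths per person per year).
RATES_A = {"young": 0.002, "old": 0.020}
RATES_B = {"young": 0.003, "old": 0.018}

# Hypothetical population counts by age group.
POP_A = {"young": 8000, "old": 2000}
POP_B = {"young": 5000, "old": 5000}

# Two arbitrary standard populations, given as age-group shares.
STANDARD_YOUNG = {"young": 0.9, "old": 0.1}
STANDARD_OLD = {"young": 0.2, "old": 0.8}


def crude_rate(population, rates):
    """Total deaths divided by total population."""
    deaths = sum(population[age] * rates[age] for age in population)
    return deaths / sum(population.values())


def standardized_rate(rates, standard_shares):
    """Directly standardized rate: age-specific rates weighted by the
    age shares of the chosen standard population."""
    return sum(standard_shares[age] * rates[age] for age in rates)


if __name__ == "__main__":
    print("crude A:", crude_rate(POP_A, RATES_A))
    print("crude B:", crude_rate(POP_B, RATES_B))
    for name, std in [("young", STANDARD_YOUNG), ("old", STANDARD_OLD)]:
        print(name, "standard, A:", standardized_rate(RATES_A, std))
        print(name, "standard, B:", standardized_rate(RATES_B, std))
```

With these invented numbers, B’s crude rate is nearly double A’s mostly because B is older, and which population looks worse after standardization flips depending on the standard: B is higher under the young-heavy standard, A is higher under the old-heavy one.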

Cheers.

Posted in Uncategorized
