A friend’s thoughts on the NFL
Really at the tipping point with the NFL you guys. It’s a complete blood sport played by barbarians with fake rules that protect no one and make the game less interesting. Never mind the media coverage which treats every story like a reading of The War of the Worlds. - Mike Christman
Well when you put it like that, I guess I’ll have to stop watching the NFL. On the other hand, SPORTS!
Cheers.
Using colorized PNG pictograms in R base plots
Awesome.
Today I stumbled across a figure in an explanation on multiple factor analysis which contained pictograms.
Figure 1 from Abdi & Valentin (2007), p. 8.
I wanted to reproduce a similar figure in R using pictograms and additionally color them, e.g. by group membership. I have almost no knowledge about image processing, so I tried out several methods to achieve what I want. The first thing I did was read in a PNG file and look at the data structure. The png package can read PNG files. Note that all of the below may not work on Windows machines, as Windows does not support semi-transparency (see ?readPNG).
NFL Picks – Week 2
Total – SU: 20-12-0 ATS: 18-14-0 O/U: 23-9-0
Week 1 – SU: 10-6-0 ATS: 8-8-0 O/U: 13-3-0
Week 2 – SU: 10-6-0 ATS: 10-6-0 O/U: 10-6-0
Pittsburgh at Baltimore
Prediction: Ravens 23-21 (56.3%)
Pick: Steelers +2.5 (50.8%)
Total: Under 44.5
Miami at Buffalo
Prediction: Bills 22-20 (55.0%)
Pick: Bills +1 (57.8%)
Total: Under 43
Detroit at Carolina
Prediction: Panthers 24-22 (55.1%)
Pick: Lions +2.5 (52.0%)
Total: Over 44
Atlanta at Cincinnati
Prediction: Bengals 25-22 (56.4%)
Pick: Falcons +5.5 (57.8%)
Total: Under 48.5
New Orleans at Cleveland
Prediction: Saints 26-23 (59.3%)
Pick: Browns +6.5 (59.1%)
Total: Over 48
New England at Minnesota
Prediction: Patriots 26-23 (58.8%)
Pick: Vikings +3.5 (51.1%)
Total: Over 49
Arizona at NY Giants
Prediction: Giants 22-19 (57.6%)
Pick: Giants +2 (63.1%)
Total: Under 43.5
Dallas at Tennessee
Prediction: Titans 23-22 (51.7%)
Pick: Cowboys +3.5 (58.3%)
Total: Under 49
Jacksonville at Washington
Prediction: Washington Football Team 25-18 (71.1%)
Pick: Washington Football Team -5.5 (56.5%)
Total: Under 43.5
Seattle at San Diego
Prediction: Seahawks 23-21 (55.4%)
Pick: Chargers +6 (61.5%)
Total: Under 45
St. Louis at Tampa Bay
Prediction: Buccaneers 22-19 (58.3%)
Pick: Rams +5.5 (57.2%)
Total: Over 37
Kansas City at Denver
Prediction: Broncos 27-19 (70.9%)
Pick: Chiefs +13.5 (66.1%)
Total: Under 52
NY Jets at Green Bay
Prediction: Packers 25-18 (66.9%)
Pick: NY Jets +8.5 (56.8%)
Total: Under 46
Houston at Oakland
Prediction: Texans 22-20 (57.0%)
Pick: Oakland +3 (51.5%)
Total: Over 39.5
Chicago at San Francisco
Prediction: 49ers 24-19 (63.0%)
Pick: Bears +7 (56.7%)
Total: Under 48.5
Philadelphia at Indianapolis
Prediction: Eagles 24-23 (51.5%)
Pick: Eagles +3 (57.0%)
Total: Under 54
The B1G had a bad Saturday. How bad was it?
The conference formerly known as the Big 10 seemingly had one of the worst weekends it could imagine, with most of its football teams either losing on the national stage or struggling in contests against perceived lower-level opponents.
So how bad was the B1G’s day?
To start, the 13 teams playing (Indiana was off) finished 2-11 against the Las Vegas point spread, with several teams falling well short of the game’s closing number.
Here’s a dot-chart of how each team did, relative to game point spreads. For example, Nebraska, which was favored by 35.5 points over McNeese State but only won by 7, had the conference’s worst day relative to the point spread expectation (-28.5).
On the whole, a conference finishing 2-11 ATS is bad; due to chance, and assuming each game’s ATS result is a coin flip, a sample of 13 games would only produce 2 wins or fewer about…
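As a rough illustration of the coin-flip calculation described above, here is a minimal Python sketch that evaluates the lower tail of a Binomial(13, 0.5) distribution. The function name `binom_cdf` is my own, and the result is simply what the stated assumption implies, not necessarily the figure quoted in the original post.

```python
from math import comb


def binom_cdf(k, n, p=0.5):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# 13 games, each ATS result treated as a fair coin flip (the post's assumption).
prob = binom_cdf(2, 13)
print(f"P(2 or fewer ATS wins in 13 games) = {prob:.4f}")
```

Under that assumption the tail probability is 92/8192, or a little over 1%.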
NFL Picks Week 1
Week 1 – SU: 10-6-0 ATS: 8-8-0 O/U: 13-3-0
Green Bay at Seattle
Prediction: Seahawks 24-21 (57.9%)
Pick: Packers +6 (59.1%)
Total: Over 45
New Orleans at Atlanta
Prediction: New Orleans 25-24 (51.3%)
Pick: Saints +1 (54.2%)
Total: Under 52
Buffalo at Chicago
Prediction: Chicago 24-19 (64.3%)
Pick: Bills +6.5 (53.9%)
Total: Under 48.5
Washington at Houston
Prediction: Texans 24-20 (59.6%)
Pick: Texans -2.5 (52.5%)
Total: Under 45.5
Tennessee at Kansas City
Prediction: Chiefs 23-19 (61.0%)
Pick: Titans +6 (55.9%)
Total: Under 44
New England at Miami
Prediction: Patriots 26-22 (61.7%)
Pick: Patriots -4 (50.5%)
Total: Over 47
Oakland at NY Jets
Prediction: Jets 22-18 (59.2%)
Pick: Raiders +5.5 (56.3%)
Total: Over 39.5
Jacksonville at Philadelphia
Prediction: Eagles 28-18 (76.5%)
Pick: Jaguars +11 (52.6%)
Total: Under 52
Cleveland at Pittsburgh
Prediction: Pittsburgh 24-19 (64.6%)
Pick: Browns +6 (52.2%)
Total: Over 41
Minnesota at St. Louis
Prediction: Rams 22-20 (54.1%)
Pick: Vikings +6 (62.8%)
Total: Under 45
Cincinnati at Baltimore
Prediction: Ravens 22-20 (54.7%)
Pick: Bengals +2.5 (52.4%)
Total: Under 43.5
San Francisco at Dallas
Prediction: 49ers 22-21 (54.2%)
Pick: Cowboys +5 (59.9%)
Total: Under 49
Carolina at Tampa Bay
Prediction: Panthers 21-20 (53.8%)
Pick: Buccaneers +2 (51.9%)
Total: Over 40
Indianapolis at Denver
Prediction: Broncos 29-21 (71.4%)
Pick: Broncos -7.5 (51.1%)
Total: Under 55.5
San Diego at Arizona
Prediction: Chargers 22-21 (50.2%)
Pick: Chargers +3.5 (59.6%)
Total: Under 44.5
NY Giants at Detroit
Prediction: Lions 25-21 (60.2%)
Pick: Giants +4 (51.1%)
Total: Over 45.5
Neural Networks
Here is a quote I liked from the blog post “Statistics: Losing Ground to CS, Losing Image Among Students”:
(Recently I was pleased to discover–“learn,” if you must–that the famous book by Hastie, Tibshirani and Friedman complains about what they call “hype” over neural networks; sadly, theirs is a rare voice on this matter.)
Cheers.
Genomic Privacy #jsm2014
This is the first of the blog posts in my “backblog” pertaining to #jsm2014.
My dissertation work was in statistical disclosure control and the post-doc work was in genetics. Almost immediately after starting my post-doc, I realized that privacy issues and genetic data seem to go hand in hand. (I recently submitted an R21 to the NIH about using synthetic data to protect privacy in genome-wide association data. It was not funded. I will re-submit.) Anyway, the point is when I saw this session entitled “Genomic Privacy: Risk and Protection Methods”, I absolutely had to go to it. The talks were all fantastic.
The first speaker was Guy Rothblum from Microsoft who was presenting work with Cynthia Dwork on an extension to differential privacy called concentrated differential privacy (CDP).
Here is what they say about CDP in their abstract:
We introduce Concentrated Differential Privacy (CDP), a relaxation of Differential Privacy geared towards improved accuracy and utility.
Like Differential Privacy, Concentrated Differential Privacy is robust to linkage attacks and to adaptive adversarial composition. In fact, it composes as well as Differential Privacy, while permitting (significantly) smaller noise in many settings.
This seems like a good step forward for differential privacy and attempts to address some of the very real issues with the method.
Guy’s introductory slides were a fantastic explanation of the problem at hand and he made a lot of really interesting points. One of the reports he mentioned in his intro slides was the big data review by the federal government. I haven’t finished reading all of this yet, but what I have read is really interesting. Check out the report here: Big Data Review (This report deserves its own blog post.)
Here is a list of some other points that I wrote down as quickly as I could during his talk:
- Anonymization (removing identifying attributes) does not seem robust in the context of big data. Defeated by the presence of big data. (Netflix Prize as an example of failed anonymization.)
- Concern 1.) Linkage attacks via partial information (ALL adversaries have partial information about us)
- Concern 2.) Composition: Each query is private in isolation, but not necessarily in multiple analyses
- Concern 3.) “Just release statistics”: Attacks include differencing, big band attacks. (e.g. Query 1: How many sickle cell individuals are in the DB? Query 2: How many sickle cell individuals not named Guy Rothblum are in the DB? The difference of these queries yields the status of Guy Rothblum.)
- Intuition of differential privacy: “Bad things can happen but not because you participate in the database.”
- Advantages of differential privacy: 1.) quantifiable 2.) handles linkage attacks/auxiliary data
- Concentrated differential privacy improves the noise addition by an order of magnitude. Better accuracy, mildly relaxed privacy.
- “A social choice must be made” This is a great point. Once we can quantify privacy, which we don’t all agree on yet, we need to have a discussion about how much privacy we want.
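The differencing attack from Concern 3 is easy to sketch in a few lines of Python. Everything here is made up for illustration: the toy database, the names, and the `count_sickle_cell` query function are hypothetical and are not taken from the talk.

```python
# Hypothetical toy database: name -> has the trait (made-up values).
db = {
    "Alice": False,
    "Bob": True,
    "Target Person": True,
    "Dana": False,
}


def count_sickle_cell(records, exclude=None):
    """Counting query: how many individuals have the trait, optionally skipping one name."""
    return sum(status for name, status in records.items() if name != exclude)


# Each query on its own looks like a harmless aggregate statistic.
q1 = count_sickle_cell(db)                           # everyone in the DB
q2 = count_sickle_cell(db, exclude="Target Person")  # everyone except one person

# The difference of the two counts reveals that one person's status.
inferred = (q1 - q2) == 1
print(f"Query 1 = {q1}, Query 2 = {q2}, target has trait: {inferred}")
```

Neither count identifies anyone by itself; it is the pair of exact answers that leaks the individual's status, which is why releasing only aggregate statistics is not enough.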
The second speaker was Bradley Malin from Vanderbilt who spoke about “Anonymization of Phenotypic data to support genotype association studies”
Some bullet points from his talk:
- Two quotes I took away from his first few slides were “Hurdle not Fort Knox” and “Possible doesn’t imply probable”. In terms of the first quote, my boss always describes this in terms of breaking into a house: Just because it’s illegal to break into someone else’s home, doesn’t mean we don’t lock our doors. But you don’t necessarily need bars on the windows either. We can’t simply rely on the law to deter adversarial data users, but we also don’t need to go overboard.
- “Often we use very strong adversary models. But almost perfect results can be achieved…. in the real world. We must be ‘reasonable and practical’” I am totally guilty of this. A few of my articles on the topic make very strong assumptions about what the adversary knows (often “worst case scenario”). My more recent papers are relaxing these worst case scenario assumptions.
- Examples of things that could potentially identify an individual: demographics, diagnosis codes, laboratory, DNA, location visits, movie reviews.
- Malin introduces a procedure named UGACLIP for anonymization in a GWAS setting.
- Malin talked about how stakeholders should somehow participate in the decision as to how much privacy they think they should have, but many people have no idea how to interpret the numbers. (i.e. The average person on the street doesn’t have any idea how secure 5-anonymization is.)
- When sharing data with NIH the generally accepted value of k (as in k-anonymization) is 5.
Speaker 3 was Fei Yu of Carnegie Mellon who presented joint work with Stephen Fienberg. Fei spoke about “Scalable Privacy preserving data sharing methodologies for GWAS.” (Full article on arXiv).
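To make the k = 5 number a little more concrete, here is a minimal Python sketch of a k-anonymity check. The table, the column names (`age_band`, `zip3`, `dx`), and the function `is_k_anonymous` are all hypothetical; this is the textbook definition of k-anonymity, not the UGACLIP procedure from the talk.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k=5):
    """True if every combination of quasi-identifier values occurs in at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


# Made-up records: five people share one quasi-identifier combination, four share another.
rows = (
    [{"age_band": "30-39", "zip3": "372", "dx": "A"} for _ in range(5)]
    + [{"age_band": "40-49", "zip3": "372", "dx": "B"} for _ in range(4)]
)
quasi_identifiers = ["age_band", "zip3"]

print(is_k_anonymous(rows, quasi_identifiers, k=5))  # the group of four fails k = 5
print(is_k_anonymous(rows, quasi_identifiers, k=4))
```

The idea is that anyone who knows a person's quasi-identifiers can narrow them down to a group of at least k records, but no further.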
- One of the big privacy concerns of GWAS is that even aggregate statistics from a GWAS (MAF, χ² statistics, regression coefficients) do not provide perfect privacy. Homer et al. (2008) showed that an intruder may be able to infer that someone has participated in a GWAS. This caused the NIH to review its data sharing policy of GWAS data.
- Fei presented work on how to share the top M SNPs (in terms of their significance) and achieve ε-Differential Privacy.
Finally, Hae Kyung Im of the University of Chicago spoke about “On Sharing Quantitative Trait GWAS Results in an Era of Multiple-Omics Data and the Limits of Genomic Privacy”. Since I am now a resident of the Midwest, I meant to try to meet Hae Kyung Im, but at the end of the session I bumped into someone else and ended up missing her. (This happens all the time at JSM. You try to do one thing, but then you bump into someone you haven’t seen in 5 years.)
- “For full advantage, broad sharing of data of results is needed; Must be careful about privacy.” I totally agree with this. There is so much potential benefit to sharing this type of data that we can’t just lock it up in a database and throw away the key.
- “Summary studies are considered safe, BUT with GWAS studies we may have millions of SNPs”. With big data our previous ideas about what is safe to release need to be re-evaluated. She again cited Homer et al. (2008), noting based on that article: Even if the DNA sample was a mixture of 1000 or more individuals, they were able to determine with high accuracy whether a given individual was in the sample or not. This is “Great for forensics, but has consequences for GWAS”.
- The question she posed in terms of GWAS is basically “Can we publish regression coefficients?” I was going to try to summarize her results, but I can’t interpret what I wrote about her results, so I’ll just wait until I get her slides. (I tweeted (Twitter is awesome) at her, hopefully she will be kind enough to share them.)
A discussion then ensued, which led Stephen Fienberg to make (roughly) the following statement:
I think that there is a misconception. In very high dimensional problems as in GWAS with the auxiliary data every individual is unique. You must find something to share. We cannot go on saying we can’t share anything. If I were on your IRB there is no issue with the faculty doing this. The only issue is what do they publish on what they have done. And I don’t think there is an IRB that understands how to do that in the entire country.
One final thought: I was struck by how few people were in this session. This is already a big deal. Right now with just a single hair, someone can reproduce your face up to a family resemblance. Who knows what we’ll be able to do in 5, 10 or 50 years? Privacy is a big deal.
Cheers.
openWAR in 2014
Exploring Baseball Data with R
Over at Stats In the Wild, my collaborator Greg Matthews has been monitoring the results of openWAR for the current season. How is he doing this?
The first step, of course, is to load the openWAR package.
Getting the 2014 Data
Now we need some data. While data from the previous two years is bundled into the openWAR package, data from the current season is not – so we’ll have to download it. We have made this as painless as possible. All we have to do is tell the getData() function the time interval over which we want to download game data, and it will do the rest. In this case, we want all games from the season opener played by the Dodgers and Diamondbacks in Australia on March 22nd, through today’s games.
Warning: this will take a while to run – possibly an hour or so.
Since we…
2014 NFL Preview
The 2014 NFL season is almost here. That means it’s time for my NFL preview (2013 NFL Season Preview). I’ve updated and tweaked my model this year so it’s hopefully better than last year. Below you can read my predictions for the 2014 season. Or if you just want to see the numbers and skip my terrible comparisons of NFL divisions to different countries’ economies, jump to the bottom. Cheers.
AFC
East
The AFC East is the story of America, and the Patriots are the 1%-ers. The rest of the division is the 99%. As in, I’m 99% sure that none of these other teams will make the play-offs this year. Pick: Patriots win the division by 7 games.
North
If the AFC East is all about inequality, then the AFC North is a socialist paradise. Like Finland. I have 3 teams going 9-7 and all making the play-offs with Cleveland being the exception. (Cleveland is like a poor guy in Finland; You won’t read that sentence anywhere else. I promise you that.) Though if this division ended with four 8-8 teams I wouldn’t be surprised at all. Pick: Baltimore (Or Pittsburgh) (Or Cincinnati) (In Finland, everyone makes the playoffs).
South
Want another economy comparison? Well you’re getting one. The AFC South is like India. They have people who are rich (Houston), but not rich-rich like America (New England); they have a developing middle class that is good, not great (Indianapolis and Tennessee); and absolute, abject poverty at the bottom (Jacksonville). Boom. That actually sort of makes sense, right? Pick: Houston. Because we all know they weren’t really 2-14 bad.
West
I’ve got nothing left. Wait. I do. The AFC West is like China. You have the super rich (Denver), a developing middle class (Kansas City and San Diego) and a terrible lower class (Oakland) that is poor, but not as poor as India’s poorest (Jacksonville). Pick: Denver. Because (if I was from ESPN) PEYTON! But really because they are the best TEAM in the AFC West.
NFC
East
Alright. I guess I’m committed to this now. The NFC East is….Denmark. One of the most socialist countries in Europe with one of the lowest poverty rates of all OECD nations according to Wikipedia. Philadelphia finishes 10-6, everyone else finishes 8-8. And I’m sure Dallas will have a chance to win and get in in their last game of the season and Romo will throw an interception to lose for his team because everyone loves that narrative. (I’m not sure how exactly that is like Denmark, but it is.) Pick: Philadelphia. Because someone has to win.
North
Let’s go ahead and say the NFC North is like Brazil. 4 fan bases that couldn’t possibly be sadder when their team lets them down. Just like Brazil and the World Cup. (This is officially off the rails) Pick: Green Bay. Because they are publicly owned.
South
The NFC South is like…France? Because there are a lot of teams that want to make the play-offs but there just aren’t enough spots (NFC South Teams:French Youth::NFL Play-off spots:French Jobs). And New Orleans. Pick: New Orleans. Because they are the most French.
West
The NFC West is clearly Russia. I can’t do this anymore. Pick: San Francisco. Slightly easier schedule than Seattle.
So who’s gonna make the play-offs?
Pretty boring in the AFC. I’m picking the exact same 6 teams to make the play-offs as I did last year (New England, Denver, Houston, Baltimore, Pittsburgh, Cincinnati). (What’s the definition of insanity again? Doing the same thing….) In fact, the only thing I switched in the AFC was the seeds of the 5 and 6 teams (I swapped Pittsburgh and Cincinnati). In the NFC, I’m obviously picking San Francisco and Seattle to make the play-offs (as is just about everyone else) and one of these teams will be the 1 seed, with the edge to San Francisco who plays the (slightly) easier schedule. Along with those two, I’m taking New Orleans, Green Bay, and Philadelphia to win their divisions. And finally, as the 6 seed in the NFC I’m taking….(drumroll please)….Detroit! I have them squeaking in at 9-7. Finally, I’m taking San Francisco over New Orleans in the NFC title game and New England over Denver in the AFC. This leads to a Patriots vs 49ers Super Bowl with the Patriots prevailing 24-23. (I know. Boring right? Same pick as last year.)
Ranks
Retro is based only on games played in 2013 and heavily weighs strength of schedule. New Orleans is ranked number 1 based on this as they finished 11-5 and played 10 teams who finished 8-8 or better.
Prosp is based on 4 years of data weighted for recency. It’s based on expected points with New England ranked number 1 in this measure.
Both Retro and Prosp are framed in terms of the probability of defeating an average team.
predMargin column is the average margin of victory if each team played every other team at home and away. Denver is ranked number 1 in this measure due to their ability and willingness to score a ton of points.
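For a sense of what an average over every home and away pairing looks like, here is a small Python sketch. To be clear, this is not my actual model: the three teams, their ratings, the `HOME_EDGE` value, and the simple "rating difference plus home edge" margin formula are all hypothetical stand-ins.

```python
# Hypothetical team ratings (points better than average) and home-field edge.
ratings = {"A": 3.0, "B": 0.0, "C": -3.0}
HOME_EDGE = 2.5


def predicted_margin(home, away):
    """Assumed predicted margin for the home team, in points."""
    return ratings[home] - ratings[away] + HOME_EDGE


def pred_margin(team):
    """Average predicted margin for `team` over a home and an away game against every other team."""
    margins = []
    for opp in ratings:
        if opp == team:
            continue
        margins.append(predicted_margin(team, opp))    # team plays at home
        margins.append(-predicted_margin(opp, team))   # team plays away
    return sum(margins) / len(margins)


for team in ratings:
    print(team, pred_margin(team))
```

With a symmetric home edge like this one, the home-field term cancels out, so the average just reflects how much better a team is than the rest of the league.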
Playoff Probabilities
Projected Records
Team – (Median wins) expected wins
AFC East
New England – (13-3) 13.044
Miami – (6-10) 6.344
Buffalo – (6-10) 5.905
NY Jets – (5-11) 5.329
AFC North
Baltimore – (9-7) 9.192
Pittsburgh – (9-7) 9.129
Cincinnati – (9-7) 9.041
Cleveland – (5-11) 5.328
AFC South
Houston – (11-5) 10.679
Indianapolis – (7-9) 7.114
Tennessee – (7-9) 6.623
Jacksonville – (2-14) 2.234
AFC West
Denver – (13-3) 12.636
San Diego – (8-8) 8.334
Kansas City – (7-9) 7.369
Oakland – (4-12) 4.213
NFC East
Philadelphia – (10-6) 9.579
Dallas – (8-8) 8.256
NY Giants – (8-8) 7.801
Washington – (8-8) 7.750
NFC North
Green Bay – (11-5) 10.659
Detroit – (9-7) 9.095
Chicago – (9-7) 8.505
Minnesota – (5-11) 5.352
NFC South
New Orleans – (11-5) 10.990
Carolina – (9-7) 8.909
Atlanta – (8-8) 8.227
Tampa Bay – (5-11) 5.227
NFC West
San Francisco – (12-4) 11.594
Seattle – (11-5) 11.449
Arizona – (5-11) 5.312
St. Louis – (5-11) 4.781
Projected Playoffs
AFC
1. New England
2. Denver
3. Houston
4. Baltimore
5. Pittsburgh
6. Cincinnati
NFC
1. San Francisco
2. New Orleans
3. Green Bay
4. Philadelphia
5. Seattle
6. Detroit
Projected Wild Card Round
AFC
Houston beats Cincinnati 22-20
Baltimore beats Pittsburgh 22-20
NFC
Green Bay beats Detroit 27-23
Seattle beats Philadelphia 23-22
Projected Divisional Round
AFC
New England beats Baltimore 28-22
Denver beats Houston 27-21
NFC
San Francisco beats Seattle 21-19
New Orleans beats Green Bay 27-24
Projected Conference Round
AFC
New England beats Denver 29-27
NFC
San Francisco beats New Orleans 24-22
Super Bowl
New England beats San Francisco 24-23
Season Long Bets
Win Totals
Arizona Under 7.5 +120
Cleveland Under 6.5 +115
Denver Over 11.5 +105
Houston Over 7.5 +125
Indianapolis Under 9.5 +110
Jacksonville Under 4.5 +145
NY Jets Under 7 +105
Kansas City Under 8.5 +145
Miami Under 7.5 -130
New England Over 10.5 -210
St. Louis Under 7.5 +135
Tampa Bay Under 7 +125
Win Division
Philadelphia +130
Green Bay -130
New England -320
Denver -300
Make Playoffs
AFC
Cincinnati +150
Baltimore +220
San Diego +300
Houston +280
NFC
Seattle -300
San Francisco -250
New Orleans -140
Carolina +235
Detroit +270
Miss Playoffs
Indianapolis -140
Crazy Long Shot Super Bowl Match-Up
Carolina vs Houston +70000
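If the American odds above are unfamiliar, the standard conversion to an implied win probability is short enough to sketch in Python. The function names are my own, and the implied probabilities include the bookmaker's margin, so they are not directly comparable to model probabilities without adjusting for that.

```python
def implied_probability(american_odds):
    """Break-even win probability implied by American (moneyline) odds."""
    if american_odds > 0:
        return 100 / (american_odds + 100)
    return -american_odds / (-american_odds + 100)


def profit_per_100(american_odds):
    """Profit on a winning 100-unit stake at the given American odds."""
    if american_odds > 0:
        return float(american_odds)
    return 100 * 100 / -american_odds


# A few of the lines listed above.
for label, odds in [("Philadelphia to win division", 130),
                    ("New England to win division", -320),
                    ("Carolina vs Houston Super Bowl", 70000)]:
    print(f"{label}: {odds:+d} -> {implied_probability(odds):.4f}")
```

A bet is only worth making if you think the true probability is higher than the implied one, which is why a long shot at +70000 needs to hit just a bit more often than once in about 700 tries to break even.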

