What is the definition of privacy?

The Atlantic posted an article on its site today entitled “By 2025, the Definition of ‘Privacy’ Will Have Changed”. This got me thinking again (and I’ve thought about it quite a bit): just what is a reasonable definition of privacy? My two major conclusions are that 1) it’s very complicated and 2) it depends.

A simple definition of privacy, used in [1] and attributed to Professor Westin of Columbia University, is

the right “to determine what information about ourselves we will share with others.”

In terms of directly opting in or out of some data collection process, this is a straightforward definition. However, it’s not always this simple. I may want to share my data with organization X but not share that same information with organization Y. So for starters, I would extend this definition to say something like: the right to determine what pieces of information about ourselves we are willing to share, and with whom.

Using this definition, someone would now have control over what data they are willing to share and which organizations they are willing to share it with. OK, easy, right?

Well, what about this scenario: You are in a class with three students. You all take a test, and afterward the professor hands back each exam with its grade and tells the class the average test score. If any two of the three students collude and share their test scores, they can immediately calculate the third student’s exact score. I would argue that this is clearly a violation of the third student’s privacy. That student never willingly shared their exam score with the two colluding students, but a piece of information about them was learned anyway.
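To make the arithmetic concrete, here is a minimal Python sketch of the collusion. The three scores are made up purely for illustration, and `third_score` is just a hypothetical helper name: the class total is three times the average, so subtracting the two shared scores leaves the third.

```python
def third_score(average, score_a, score_b, n_students=3):
    """Recover the one unknown score from the class average and the two shared scores."""
    # The class total is n_students * average; remove the two known scores.
    return n_students * average - score_a - score_b


# Hypothetical scores, chosen only for illustration.
scores = {"colluder_1": 90, "colluder_2": 70, "third_student": 80}

# This is the only thing the professor announces to the class.
class_average = sum(scores.values()) / len(scores)

# The two colluders pool their own scores with the announced average.
recovered = third_score(class_average, scores["colluder_1"], scores["colluder_2"])
print(recovered)
```

Nothing about the third student was released directly. The exact score falls out of a summary statistic plus what the colluders already know.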

However, what about situations where the third student’s score is only learned to within some range? Clearly, if you know the student’s score is between 0 and 100, their privacy is not violated. But what if you learn that their score was between 80 and 85? Above 50? Less than 70? Is learning any of these pieces of information a violation of privacy? I don’t know. Defining privacy is hard.
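A range like this can leak even without collusion. As a rough sketch (the function name and the numbers are hypothetical, and scores are assumed to lie between 0 and 100), a single student who knows only their own score and the announced average can already bound each classmate’s score:

```python
def other_score_range(average, own_score, n_students=3, lo=0, hi=100):
    """Bounds on one classmate's score, given only the average and your own score."""
    # Sum of the two scores you do not know.
    remaining = n_students * average - own_score
    # The other unknown score is somewhere in [lo, hi], which limits this one.
    return max(lo, remaining - hi), min(hi, remaining)


# Average of 90 and I scored 75: both classmates must be in [95, 100].
print(other_score_range(90, 75))

# Average of 50 and I scored 50: I learn nothing beyond [0, 100].
print(other_score_range(50, 50))
```

How tight the range is depends entirely on the data, which is part of what makes it hard to say where a privacy violation starts.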

The other aspect of defining privacy that I find fascinating is that privacy is not tied directly to a piece of information. It’s often about HOW that information was obtained. For instance, if a friend tells me that they have cancer, that is not a violation of privacy. However, if the hospital discloses that my friend has cancer without my friend’s consent, a clear violation of privacy has occurred, EVEN if I already knew that the friend had cancer. So it’s not the information itself that causes a privacy problem; it is the combination of the data AND the mode of transmission. (Another random question: Is it a privacy violation for a hospital to confirm that you do NOT have cancer? I would argue yes.)

My point here is that privacy is a very slippery concept, and an exact definition is difficult to pin down. But studying privacy is going to become increasingly valuable, as data about us is now being collected on a scale that would have been hard to imagine even ten years ago. Try to imagine what this will be like in another 10 years. Or 25 years? Or 50 years?

Finally, I especially like (and am terrified by) this quote from the last paragraph of that article:

We are embarked, irreversibly, I suspect, upon a trajectory toward a world in which those spaces, times, and spheres of activity free from data collection and monitoring will, for all practical purposes, disappear.

Cheers!

Citations:

[1] Fellegi, I.P., 1972. On the question of statistical confidentiality. Journal of the American Statistical Association 67 (337), 7–18.

NFL Picks – Week 16

Total (weeks 1-15) – SU: 150-73-1 ATS: 111-108-5 O/U: 120-102-2 

Week 1 – SU: 9-7-0 ATS: 8-8-0 O/U: 13-3-0

Week 2 – SU: 10-6-0 ATS: 10-6-0 O/U: 10-6-0

Week 3 – SU: 12-4-0 ATS: 9-6-1  O/U: 8-8-0

Week 4 – SU: 7-6-0 ATS: 5-7-1  O/U: 5-8-0

Week 5 – SU: 14-2-0 ATS: 6-9-0  O/U: 9-6-0

Week 6 – SU: 11-3-1 ATS: 8-7-0  O/U: 6-9-1

Week 7 – SU: 11-4-0 ATS: 7-8-0  O/U: 8-7-0

Week 8 – SU: 11-3-0 ATS: 8-7-0 O/U: 8-7-0

Week 9 – SU: 9-4-0 ATS: 8-5-0 O/U: 4-8-1

Week 10 – SU: 9-4-0 ATS: 4-9-0 O/U: 6-7-0

Week 11 – SU: 9-5-0 ATS: 8-6-0 O/U: 7-7-0

Week 12 – SU: 10-5-0 ATS: 7-8-0 O/U: 8-7-0

Week 13 – SU: 11-5-0 ATS: 8-8-0 O/U: 7-9-0

Week 14 – SU: 7-9-0 ATS: 9-6-1 O/U: 11-5-0

Week 15 – SU: 11-5-0 ATS: 6-8-2 O/U: 10-6-0

Tennessee at Jacksonville

Prediction: Titans 21-20 (50.8%)

Pick: Titans +3.5

Total: Over 40

Philadelphia at Washington

Prediction: Eagles 24-23 (51.0%)

Pick: Washington Football Team +8.5

Total: Under 50.5

San Diego at San Francisco

Prediction: 49ers 23-18 (64.5%)

Pick: 49ers -1.5

Total: Under 42

Cleveland at Carolina

Prediction: Panthers 24-19 (62.7%)

Pick: Panthers -3.5

Total: Over 39.5

Detroit at Chicago

Prediction: Bears 23-22 (52.7%)

Pick: Bears +7

Total: Under 46

Indianapolis at Dallas

Prediction: Cowboys 25-23 (56.1%)

Pick: Colts +3

Total: Under 56

Baltimore at Houston

Prediction: Texans 22-21 (54.1%)

Pick: Texans +6.5

Total: Over 41

Minnesota at Miami

Prediction: Dolphins 23-19 (61.5%)

Pick: Vikings +7 

Total: Under 42.5

Atlanta at New Orleans

Prediction: Saints 30-24 (67.3%)

Pick: Falcons +6.5

Total: Under 56

New England at NY Jets

Prediction: Patriots 25-21 (61.2%)

Pick: Jets +10.5

Total: Under 47

Buffalo at Oakland

Prediction: Bills 21-19 (54.7%)

Pick: Raiders +6

Total: Over 39

Kansas City at Pittsburgh

Prediction: Steelers 23-20 (58.8%)

Pick: Chiefs +3.5

Total: Under 46.5 

NY Giants at St. Louis

Prediction: Rams 22-20 (53.6%)

Pick: Giants +5

Total: Under 43.5

Green Bay at Tampa Bay 

Prediction: Packers 25-21 (61.5%)

Pick: Buccaneers +10.5

Total: Under 48.5

Seattle at Arizona

Prediction: Seahawks 22-18 (59.9%)

Pick: Cardinals +9

Total: Over 36

Denver at Cincinnati

Prediction:  Broncos 25-23 (56.0%)

Pick: Bengals +3.5

Total: Over 47.5

An NHL shootout lasted 20 rounds. How improbable was it?

Originally posted on StatsbyLopez:

Tonight’s Panthers/Capitals game lasted an improbable 20 rounds, with the teams recording the following round-by-round outcomes (X for misses, O for goals):

Washington: X X X O X X O X X O O X X X X X O X X X

Florida: X X X O X X O X X O O X X X X X O X X O

Estimating the likelihood of a shootout reaching 20 rounds is actually fairly straightforward, provided that you make some reasonable assumptions.

Assumption 1: The save percentage for each goalie is 67%.

For starters, the overall NHL save rate on shootouts is about 67%. And while I have made the point that not all NHL goalies are created equal in terms of stopping shootout attempts, in this game, both Florida goalie Roberto Luongo (67.3% save rate) and Washington counterpart Braden Holtby (66.0%) are nearly identical to…

Links for December 14

statsinthewild:

I enjoyed all of these links, especially the Christmas-themed ones.

Cheers!

Originally posted on God plays dice:

Erica Klarreich at Quanta on large prime gaps.

What makes Paris look like Paris?

Math-inspired Christmas ornaments (sadly, I don’t have a tree).

Nathan Yau at FlowingData has a list of data-ish physical gift things.

From the archives of the MAA, video of David Blackwell on predicting at random (1966).

NFL Picks – Week 15

Total (weeks 1-15) – SU: 150-73-1 ATS: 111-108-5 O/U: 120-102-2 

Week 1 – SU: 9-7-0 ATS: 8-8-0 O/U: 13-3-0

Week 2 – SU: 10-6-0 ATS: 10-6-0 O/U: 10-6-0

Week 3 – SU: 12-4-0 ATS: 9-6-1  O/U: 8-8-0

Week 4 – SU: 7-6-0 ATS: 5-7-1  O/U: 5-8-0

Week 5 – SU: 14-2-0 ATS: 6-9-0  O/U: 9-6-0

Week 6 – SU: 11-3-1 ATS: 8-7-0  O/U: 6-9-1

Week 7 – SU: 11-4-0 ATS: 7-8-0  O/U: 8-7-0

Week 8 – SU: 11-3-0 ATS: 8-7-0 O/U: 8-7-0

Week 9 – SU: 9-4-0 ATS: 8-5-0 O/U: 4-8-1

Week 10 – SU: 9-4-0 ATS: 4-9-0 O/U: 6-7-0

Week 11 – SU: 9-5-0 ATS: 8-6-0 O/U: 7-7-0

Week 12 – SU: 10-5-0 ATS: 7-8-0 O/U: 8-7-0

Week 13 – SU: 11-5-0 ATS: 8-8-0 O/U: 7-9-0

Week 14 – SU: 7-9-0 ATS: 9-6-1 O/U: 11-5-0

Week 15 – SU: 11-5-0 ATS: 6-8-2 O/U: 10-6-0

Arizona at St. Louis

Prediction: Rams 21-20 (53.0%)

Pick: Cardinals +4.5

Total: Over 40

Pittsburgh at Atlanta

Prediction: Falcons 25-24 (53.2%)

Pick: Falcons +2.5

Total: Under 55

Jacksonville at Baltimore

Prediction: Ravens 25-15 (76.3%)

Pick: Jaguars +14

Total: Under 46

Green Bay at Buffalo

Prediction:  Packers 24-22 (55.3%)

Pick: Bills +6

Total: Under 50.5

Tampa Bay at Carolina

Prediction: Panthers 23-18 (65.9%)

Pick: Panthers -3.5

Total: Under 41.5

Cincinnati at Cleveland

Prediction: Bengals 22-21 (50.7%)

Pick: Bengals +1.5

Total: Under 44

Minnesota at Detroit

Prediction: Lions 25-19 (65.7%)

Pick: Vikings +8

Total: Over 43

Houston at Indianapolis

Prediction: Colts 25-22 (57.7%)

Pick: Texans +7 PUSH

Total: Under 49

Oakland at Kansas City

Prediction: Chiefs 24-16 (69.8%)

Pick: Raiders +10.5

Total: Under 42

Miami at New England

Prediction: Patriots 28-21 (68.7%)

Pick: Dolphins +8

Total: Over 48

Washington at NY Giants

Prediction: Giants 24-21 (57.6%)

Pick: Washington Football Team +7

Total: Under 47

Denver at San Diego

Prediction: Broncos 27-24 (58.5%)

Pick: Chargers +4.5

Total: Over 50.5

San Francisco at Seattle

Prediction: Seahawks 21-18 (60.8%)

Pick: 49ers +10 PUSH

Total: Over 38

NY Jets at Tennessee

Prediction: Titans 20-19 (53.3%)

Pick: Titans +2

Total: Under 42

Dallas at Philadelphia

Prediction: Eagles 27-24 (60.0%)

Pick: Cowboys +3.5

Total: Under 56

New Orleans at Chicago

Prediction: Saints 26-25 (52.2%)

Pick: Bears +3

Total: Under 54.5

Voter bias, football polls, and TCU

Originally posted on StatsbyLopez:

One of the topics undersold during the arguments of which four NCAA football teams deserved a spot in the college football playoff was the effect of voter bias on decision making.

Specifically, literature has found NCAA football poll voters to be biased in a few ways.

Bias #1- Associated Press (AP) poll voters are biased towards teams (i) in the voter’s home state, (ii) in the same conference as teams in the voter’s state, (iii) in BCS conferences, and (iv) teams playing in more televised games.

Bias #2- Coaching poll voters are biased in favor of both their recent opponents and their alma maters.

Bias #3- AP voters are biased in favor of teams which were ranked higher earlier in the season.

It’s obviously too early to tell whether or not these biases will hold with the college football playoff selection committee over the long run. However, it’s particularly curious how the decision-making process…

College Football Conference Construction

So the Big XII is pissed. (And probably rightly so.) As I’ve said before, if the committee was trying to pick the best 4 teams in college football, they failed miserably. TCU and Baylor, both one-loss teams, are both better than Ohio State, imho. Further, there are several multi-loss teams that are better than Ohio State or Florida State. I’d suggest Mississippi State (2 losses) AND Ole Miss (3 losses) for starters. But I’d also include Georgia and basically any team from the SEC West, including Auburn and LSU. And screw it, I’m going to include Arkansas. I think Arkansas would beat Florida State or Ohio State. The SEC is that good.

Since the simpletons on the College Football Playoff committee can apparently only see wins, is it possible to game the system? For instance, could a conference add or remove teams to maximize its potential for getting a team into the playoffs? How would a conference do that? Let’s do a simulation study.

Motivating question

How could a college football conference choose its membership to maximize its chances of getting at least one of its teams into the playoff?

Simulation Description

Let’s assume a simple model of the college football world: 5 conferences, each with 10 teams. 25 of these 50 teams are “good” and the remaining 25 are “bad”. Each team plays nine games against the other teams in its conference and three “random” non-conference games, for a total of 12 games. When a good team plays a good team or a bad team plays a bad team, each team has a 0.5 probability of winning. When a good team plays a bad team, the good team’s probability of winning is expit(1) = 1/(1 + e^(-1)) ≈ 0.731. I then simulated a schedule and simulated the season. The 4 teams with the most wins counted as the four playoff teams. Ties in wins were broken by drawing lots (i.e. using runif in R). Finally, I counted how often a team from each conference made the playoffs, as a function of the number of “good” teams in that conference.
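For anyone who wants to play along, here is a rough Python sketch of a simulation along these lines. It is not the original code (that was written in R), and one detail is an assumption on my part: the description above doesn’t pin down how the three “random” non-conference games are scheduled, so this sketch shuffles the conferences and the teams within them, then pairs lineup position i with position i + 25, which guarantees that opponents come from different conferences. All function and variable names are made up for illustration.

```python
import math
import random
from itertools import combinations

N_CONF, TEAMS_PER_CONF, PLAYOFF_SPOTS = 5, 10, 4
P_GOOD_BEATS_BAD = 1 / (1 + math.exp(-1))  # expit(1), about 0.731


def simulate_season(good_per_conf, rng):
    """Simulate one season; return the set of conferences with a playoff team."""
    conf = [c for c in range(N_CONF) for _ in range(TEAMS_PER_CONF)]
    good = [i < g for g in good_per_conf for i in range(TEAMS_PER_CONF)]
    wins = [0] * len(conf)

    def play(a, b):
        # Equal-strength teams are a coin flip; otherwise the good team is favored.
        if good[a] == good[b]:
            p_a = 0.5
        else:
            p_a = P_GOOD_BEATS_BAD if good[a] else 1 - P_GOOD_BEATS_BAD
        wins[a if rng.random() < p_a else b] += 1

    # Nine conference games per team: a full round robin within each conference.
    for c in range(N_CONF):
        members = [t for t in range(len(conf)) if conf[t] == c]
        for a, b in combinations(members, 2):
            play(a, b)

    # Three non-conference games per team (ASSUMED scheduling scheme, see above).
    half = len(conf) // 2
    for _ in range(3):
        lineup = []
        for c in rng.sample(range(N_CONF), N_CONF):
            members = [t for t in range(len(conf)) if conf[t] == c]
            rng.shuffle(members)
            lineup.extend(members)
        for i in range(half):
            play(lineup[i], lineup[i + half])

    # Top four by wins; ties are broken by a uniform random draw.
    ranked = sorted(range(len(conf)), key=lambda t: (wins[t], rng.random()), reverse=True)
    return {conf[t] for t in ranked[:PLAYOFF_SPOTS]}


def playoff_rates(good_per_conf, n_sims=1000, seed=0):
    """Share of simulated seasons in which each conference places a playoff team."""
    rng = random.Random(seed)
    counts = [0] * N_CONF
    for _ in range(n_sims):
        for c in simulate_season(good_per_conf, rng):
            counts[c] += 1
    return [k / n_sims for k in counts]


split = [2, 3, 5, 7, 8]  # number of "good" teams in each conference
for n_good, rate in zip(split, playoff_rates(split)):
    print(f"{n_good} good teams: {rate:.1%}")
```

Because the scheduling scheme and the random draws differ from the original run, the percentages this prints should not be expected to match the reported numbers exactly.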

Results

Obviously, when all 5 conferences are completely balanced, with 5 good and 5 bad teams each, every conference has the same chance of getting a team to the playoffs. So let’s look at some unbalanced situations. If the good teams are split so that the conferences have 2, 3, 5, 7, and 8 good teams, respectively, the conference with 5 good teams is the most likely to get a team into the playoffs, getting in about 71.3% of the time. In this setting, a conference with 7 good teams is slightly less likely to get a team in, at 69.4%, and the probability drops further to about 63% with 8 good teams. While there are more good teams in the conference with an opportunity to reach the playoff, these good teams cannibalize each other.

This trend continues throughout all of the simple simulations I looked at: a conference with about half good and half bad teams was the most likely to put a team in the playoffs. Even at the extremes, where one conference has 5 good teams and the other conferences are all or nearly all good or all or nearly all bad, the conference with 5 good teams has the largest probability.

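Good teams in conference:   2      3      5      7      8
Playoff probability:        57.7%  66.5%  71.3%  69.4%  63.2%

Good teams in conference:   1      2      6      7      9
Playoff probability:        53.2%  63.4%  74.9%  72.6%  62.5%

Good teams in conference:   0      1      5      9      10
Playoff probability:        34.9%  59.6%  80.6%  71.5%  69.9%

Good teams in conference:   1      3      5      6      10
Playoff probability:        51.1%  71.3%  75.6%  74.0%  51.7%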

The plot below shows the results of the above table graphically. The x-axis is the number of “good” teams in each conference, and the y-axis is the probability that a team from that conference gets into the playoffs (i.e. is in the top 4 in terms of number of wins).

[Figure “collegeSim”: playoff probability versus number of “good” teams in the conference]

What does this mean?

From the point of view of a conference commissioner, if your goal is to build a conference that puts teams in a position to win a national championship, your best bet is NOT to construct a powerhouse conference. You want to put together a conference where about half of the teams are elite and the other half are not so good. If you construct a conference with all elite teams, the teams cannibalize each other and no team is clearly the best. No matter how hard the schedule, this committee, I believe, absolutely will not let a two-loss team into the playoffs, even if that two-loss team lost to the number 1 and number 2 teams in the country. So all those conference commissioners looking to add a powerhouse program should maybe reconsider, add a middling program instead, and get their elite teams another win. ’Cause after all, that’s basically all this committee cares about.

Future Work

What I’d like to do in the future is run this simulation on this year’s real college football teams to try to construct the ultimate conference for getting a team into the college football playoffs. I’d use some sort of estimated team strength (maybe Sagarin) to simulate games during a season and then take the top 4 teams in terms of wins. That could be really interesting.

Cheers.