Category Archives: Uncategorized
Bill Barnwell is up to his usual tricks at Grantland. This time, he’s tired of hearing that Flacco is an elite quarterback and wants a new measure of quarterback value. Flacco gets credit for piling up wins, which Barnwell thinks is unfair:
For whatever good or bad Flacco provides, he has spent his entire career as the starting quarterback of the Baltimore Ravens, who perennially possess one of the league’s best defenses. He also has Ray Rice and a solid running game to go alongside him on offense. It’s safe to say that a win by, say, Cam Newton usually requires more work from the quarterback than one by Flacco.
I agree with this wholeheartedly. In response, Barnwell tries to capture quarterback value by creating an “expected wins” measure based on points allowed by the defense and comparing this to actual wins. He argues that a quarterback with more…
View original post 955 more words
Edit: Please see my later post as well, which corrects an omission here.
Miguel Cabrera has a shot at the Triple Crown this year. No one has done it since Carl Yastrzemski. Is it really possible that he could win the Triple Crown and not win the MVP? Well, yes. Every advanced stats guy out there is trumpeting Mike Trout for MVP, with his “wins above replacement” (WAR) above 10 (next best in the majors is 6.8) and his 13 “total zone total fielding runs above average” (basically, this is the number of runs he has saved with his fielding, compared to an average fielder).
The discussion is eerily similar to the AL Cy Young conversation in 2010. Felix Hernandez won because he led the AL in innings pitched, ERA, and, most importantly, WAR, even though his win-loss record was a mediocre 13-12.
The 2010 Cy Young was a victory…
View original post 1,471 more words
Statistics
To most people, statistics means plugging numbers into an advanced calculator that spits out values, without much thought involved. Those people don’t work with data.
Cheers.
Infiltrated by liberals
My favorite critique of my article from Shark974:
Translation: Deadspin is shit and biased and wants to ban the NFL. Nothing more.
Sorry but you know it’s true. Studies these days are worth the paper they’re printed on, having been infiltrated by liberals (probably a few forests have been sacrificed printing fake global warming studies by liberals).
Also, you go through all that trouble to “prove”, drum roll please, baseball players are “no more likely” to die than FB players.
Hmm, given the notion out there that FB is a deathsport or something, I’d call that alone a big win for the NFL. The way Deadspin chose to spin and headline this article says a lot about their obvious bias. I suspect the author votes for Democrats, as well (liberals are much more likely to be both illogical and extremely biased in conducting studies).
But dont worry, we’ve got liberals on the case, I am sure plenty of fake lies, I mean, scientific studies, are coming soon that prove playing in the NFL rapes your dog and gives you cancer, to be shouted from the rooftops everywhere by the liberal media. Again, look to the global warming precedent.
The more I read this, the more I appreciate what a work of art it really is.
Cheers.
Death Rates: A cautionary note from 1937
The crude death rate, for well known reasons, is not a good measure, because it is quite seriously affected by differences in age composition. Standardized death rates, on the other hand, have the disadvantage that they depend on an arbitrarily selected standard population.
–Dublin, L.I. and Lotka, A.J. “Use of the Life Table in Vital Statistics.” American Journal of Public Health. Vol. 27, May, 1937.
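The trade-off Dublin and Lotka describe is easy to see with numbers. Below is a small Python sketch with hypothetical figures (the towns, rates, and populations are invented for illustration): two towns with identical age-specific mortality get very different crude rates purely because of age composition, while direct standardization against an arbitrarily chosen standard population removes the difference.

```python
young, old = "young", "old"

# age-specific death rates (deaths per person-year), identical in both towns
rate = {young: 0.002, old: 0.030}

# population counts: town A skews young, town B skews old
pop_a = {young: 9000, old: 1000}
pop_b = {young: 1000, old: 9000}

def crude_rate(pop):
    # total deaths divided by total population
    deaths = sum(rate[a] * pop[a] for a in pop)
    return deaths / sum(pop.values())

# crude rates differ only because of age structure
print(crude_rate(pop_a))  # 0.0048
print(crude_rate(pop_b))  # 0.0272

# direct standardization: apply the age-specific rates to one
# (arbitrarily selected) standard population
standard = {young: 5000, old: 5000}

def standardized_rate(age_rates, std):
    expected = sum(age_rates[a] * std[a] for a in std)
    return expected / sum(std.values())

print(standardized_rate(rate, standard))  # 0.016 for both towns
```

The standardized rate is the same for both towns, as it should be, but note that the 0.016 figure depends entirely on the 50/50 standard population chosen; a different standard gives a different number, which is exactly the arbitrariness the quote warns about.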
Cheers.
MLB Playoff Probabilities – 9/4/2012
StatsInTheWild MLB rankings as of September 4, 2012 at 12:18pm. SOS=strength of schedule
| Team | Rank | Change | Record | Projected Record | Prob of making playoffs | SOS | Run Diff |
| Texas | 1 | ↑1 | 80-54 | 95-67 | 99.3% | 11 | +121 |
| NYY | 2 | ↓1 | 76-58 | 90-72 | 87.4% | 6 | +86 |
| Tampa Bay | 3 | – | 74-61 | 87-75 | 43.4% | 7 | +78 |
| Oakland | 4 | ↑1 | 76-58 | 89-73 | 69.2% | 8 | +79 |
| Washington | 5 | ↓1 | 82-52 | 97-65 | 99.9% | 25 | +114 |
| LA Angels | 6 | ↑4 | 72-63 | 84-78 | 7.9% | 5 | +50 |
| Detroit | 7 | ↑2 | 72-62 | 86-76 | 53.8% | 13 | +37 |
| Chi WSox | 8 | ↓1 | 73-61 | 87-75 | 66.2% | 14 | +66 |
| Cincinnati | 9 | ↓1 | 82-54 | 95-67 | 99.9% | 30 | +84 |
| Atlanta | 10 | ↓4 | 76-59 | 89-73 | 96.0% | 22 | +82 |
| Baltimore | 11 | ↑1 | 75-59 | 88-74 | 72.8% | 3 | -31 |
| St. Louis | 12 | ↑1 | 70-64 | 85-77 | 44.5% | 29 | +97 |
| SF | 13 | ↑5 | 77-58 | 90-72 | 96.8% | 26 | +43 |
| Seattle | 14 | ↑1 | 66-70 | 77-85 | 0% | 2 | -9 |
| Boston | 15 | ↓4 | 62-74 | 73-89 | 0% | 4 | -9 |
| LA Dodgers | 16 | – | 73-63 | 85-77 | 34.4% | 24 | +28 |
| Toronto | 17 | ↓3 | 60-74 | 72-90 | 0% | 1 | -41 |
| Arizona | 18 | ↓1 | 66-70 | 76-84 | 0.1% | 23 | +23 |
| Pittsburgh | 19 | – | 70-64 | 84-78 | 27.4% | 28 | +9 |
| NY Mets | 20 | ↑1 | 64-71 | 76-86 | 0.1% | 15 | -28 |
| Philadelphia | 21 | ↑1 | 65-70 | 78-84 | 0% | 20 | -20 |
| Milwaukee | 22 | ↑3 | 65-69 | 77-85 | 0.7% | 27 | +19 |
| Kansas City | 23 | ↓3 | 60-74 | 73-89 | 0% | 12 | -56 |
| San Diego | 24 | ↑3 | 62-74 | 73-89 | 0% | 21 | -56 |
| Minnesota | 25 | – | 55-80 | 67-95 | 0% | 10 | -106 |
| Miami | 26 | – | 60-75 | 71-91 | 0% | 16 | -96 |
| Cleveland | 27 | ↓4 | 57-78 | 68-94 | 0% | 9 | -157 |
| Colorado | 28 | ↑1 | 55-78 | 67-95 | 0% | 18 | -95 |
| Chi Cubs | 29 | ↓1 | 51-83 | 63-99 | 0% | 19 | -115 |
| Houston | 30 | – | 42-93 | 52-110 | 0% | 17 | -197 |
If you haven’t yet discovered the competitive machine learning site kaggle.com, please do so now. I’ll wait.
Great – so, you checked it out, fell in love, and have made it back. I recently downloaded the data for the getting-started competition. It consists of 42,000 labelled images (28×28) of handwritten digits 0–9. The competition is a straightforward supervised learning problem of OCR (Optical Character Recognition). There are two sample R scripts on the site to get you started; they implement the k-nearest neighbours and Random Forest algorithms.
I wanted to get started by visualizing all of the training data, rendering some sort of average of each character. Visualizing the data is a great first step to developing a model. Here’s how I did it:
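(The original post's code is elided in this excerpt, and it used R. As a rough sketch of the idea, not the author's actual script, here is one way to compute per-digit average images in Python: group the pixel rows by label and take the mean of each group. The tiny arrays below stand in for the real 42,000-row train.csv, in which each row is a label followed by 784 pixel values.)

```python
import numpy as np

def digit_averages(pixels, labels):
    """Map each digit label to the mean image of all rows with that label."""
    pixels = np.asarray(pixels, dtype=float)
    labels = np.asarray(labels)
    return {d: pixels[labels == d].mean(axis=0) for d in np.unique(labels)}

# Tiny synthetic stand-in for the training data: three "images" of two pixels
pixels = np.array([[0, 10], [0, 30], [255, 0]])
labels = np.array([1, 1, 7])

avgs = digit_averages(pixels, labels)
print(avgs[1])  # mean of the two label-1 rows: [ 0. 20.]
```

With the real data, each value in `avgs` would be a 784-vector you could reshape to 28×28 and render as an image.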
Which gives you:
Notice the wobbly looking ‘1’. You can see that there is some variance in the angle of…
View original post 127 more words
