Color Survey Results


Who in the rainbow can draw the line where the violet tint ends and the orange tint begins? Distinctly we see the difference of the colors, but where exactly does the one first blendingly enter into the other? So with sanity and insanity.
—Herman Melville, Billy Budd

Orange, red? I don’t know what to believe anymore!
—Anonymous, Color Survey


Thank you so much for all the help on the color survey.  Over five million colors were named across 222,500 user sessions.  If you never got around to taking it, it’s too late to contribute any data, but if you want you can see how it worked and take it for fun here.

First, a few basic discoveries:

  • If you ask people to name colors long enough, they go totally…


Cinderella Plots 2017 #ncaa #marchmadness

I originally made these plots for the 2012 Stat Geek Idol contest run by Team Rankings, and they are still cool.  The original article describing them can be found here. Below I’ve updated the plots for games through March 19th.  (For games that haven’t been played yet this year, I’ve assumed the higher seed wins for the sake of the plot.)






$10.8 Billion. But none for the players. #ncaa

In 2010 CBS signed a deal with the NCAA for the rights to broadcast the NCAA tournament for 14 seasons from 2011-2024.  According to the NCAA, the deal is worth 10.8 BILLION dollars.

Let’s do some fun quick math, shall we:

There are 347 NCAA Division I basketball teams and they each offer 13 scholarships.  That’s a total of 4,511 spots.  If you paid every single one of these players $50,000 per year, that would cost about 225 million dollars ($225,550,000) in the first year.  Let’s assume salaries go up 3 percent a year for inflation, which would mean that in 2024 you are paying players around $73K per year.  The grand total over 14 years for this plan would be about 3.8 BILLION dollars ($3,853,820,415).  That’s a ton of money.  BUT it’s dwarfed by how much money this deal is worth.  If you took this money out of the deal to pay the players, you’d still have almost $7 BILLION left over ($6,946,179,585).  And keep in mind, this is just for the tournament.  That’s 67 games over three weeks in March.  There is more money than this in regular season TV contracts, not to mention the money schools make from selling tickets and merchandise.  Merchandise, some of which has players’ names on it!!!! The school can make money off of a player’s name, but the player can’t! (Well, the player can, he just can’t then play in the NCAA.)  Think about that.  Screw you NCAA.  Screw you.
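If you want to check the quick math yourself, here is a minimal Python sketch.  The inputs are the figures from the paragraph above; the variable names are my own, and I’m assuming the 3 percent raise kicks in starting with the second of the 14 seasons.

```python
# Inputs taken from the post; variable names are illustrative.
TEAMS = 347                  # Division I basketball teams
SCHOLARSHIPS_PER_TEAM = 13
BASE_SALARY = 50_000         # dollars per player in the first season
RAISE = 0.03                 # assumed yearly raise for inflation
SEASONS = 14                 # length of the TV deal
DEAL_VALUE = 10_800_000_000  # dollars, according to the NCAA

players = TEAMS * SCHOLARSHIPS_PER_TEAM

# Salary per player in each season: the raise starts in season two.
salaries = [BASE_SALARY * (1 + RAISE) ** year for year in range(SEASONS)]

first_year_cost = players * salaries[0]
total_cost = players * sum(salaries)
left_over = DEAL_VALUE - total_cost

print(f"Scholarship spots:      {players:,}")
print(f"First-year cost:        ${first_year_cost:,.0f}")
print(f"Final-season salary:    ${salaries[-1]:,.0f}")
print(f"Total over {SEASONS} seasons:  ${total_cost:,.0f}")
print(f"Left over from deal:    ${left_over:,.0f}")
```

Change `BASE_SALARY` or `RAISE` to see how much room is left in the deal under other assumptions.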

The NCAA is a cartel that makes billions of dollars off the back of labor that has absolutely no representation.  I don’t know why as a society we put up with this.  In literally every other area of life, if you are good enough at something to get paid for it, someone can pay you for it.  The exception is sports.  Why do we value amateurism in sports?  I have no idea.  But I’ll continue to call out the hypocrisy of the NCAA until they pay the players who actually make this stuff worth watching.  Coaches get to negotiate a salary and can come and go as they please.  Hell, some coaches get huge contract buyouts to NOT COACH.  But players have basically no rights.  So until the NCAA changes its ways, I hope they get sued from every direction until they have no choice but to pay the players.




Not our year #kaggle

After 48 games we are in 97th place in this year’s March Machine Learning Madness.  I’m going to guess that we finish in 45th.


Sunday’s NCAA game predictions with probabilities #kaggle

In 82nd after 32 games.  In 101st after 40 games.  Not a great day for us.  Need something big tomorrow to get within striking distance.  I’m guessing we will finish in the top 50, but it looks like our lucky run of three top 4 finishes in a row is pretty much over.


Here are our predicted probabilities for Sunday’s games (with ridiculous precision):

  • Louisville over Michigan: 0.7051029
  • Kansas over Michigan St: 0.8346071
  • UCLA over Cincinnati: 0.6074323
  • Oregon over Rhode Island: 0.7488454
  • Duke over South Carolina: 0.7873011
  • Kentucky over Wichita State: 0.6272257
  • North Carolina over Arkansas: 0.8434368
  • Baylor over USC: 0.8055379
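For a rough sense of what these numbers imply, here is a small Python sketch that adds up the probabilities above.  The probabilities are copied from the list; treating the eight games as independent is my simplifying assumption, not something the models guarantee.

```python
import math

# Predicted probabilities copied from the list above.
favorites = {
    "Louisville over Michigan": 0.7051029,
    "Kansas over Michigan St": 0.8346071,
    "UCLA over Cincinnati": 0.6074323,
    "Oregon over Rhode Island": 0.7488454,
    "Duke over South Carolina": 0.7873011,
    "Kentucky over Wichita State": 0.6272257,
    "North Carolina over Arkansas": 0.8434368,
    "Baylor over USC": 0.8055379,
}

# Expected number of favorites that win: the sum of the probabilities.
expected_favorite_wins = sum(favorites.values())

# Chance that every favorite wins, ASSUMING the games are independent.
prob_all_favorites_win = math.prod(favorites.values())

print(f"Expected favorite wins: {expected_favorite_wins:.2f} of {len(favorites)}")
print(f"Chance all favorites win: {prob_all_favorites_win:.3f}")
```

So even with every game leaning toward the favorite, the models expect about two upsets in eight games.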


Today’s NCAA game predictions with probabilities

@statsbylopez and I finished the first day of March Machine Learning Madness in 140th.  After 32 games we are currently in 82nd.


Here are our predicted probabilities for today’s games (with ridiculous precision):

  • Villanova over Wisconsin: 0.7490162
  • Florida over Virginia: 0.5047416
  • Gonzaga over Northwestern: 0.896185
  • West Virginia over Notre Dame: 0.6114829
  • Florida State over Xavier: 0.7793296
  • Arizona over St. Mary’s (CA): 0.5626207
  • Butler over Middle Tennessee State: 0.7420982
  • Purdue over Iowa State: 0.5234515


My most likely bracket #NCAA Tournament

Here is my bracket based entirely on my rankings using betting market data:


Nothing too interesting here.  I’ve got all of the higher seeds winning all the games except for two first round upsets.  I like Oklahoma State over Michigan and Rhode Island over Creighton.

South

I’ve got three first round upsets in this bracket, by the 9, 10 and 11 seeds.  That’s Seton Hall over Arkansas in one of the worst games of the first round, Kansas State over Cincinnati (if Wake Forest had beaten Kansas State I would have picked Wake Forest here; I really don’t like Cincinnati), and Wichita State over Dayton.  Wichita State being a 10 seed is the most head-scratching seeding of the tournament in my mind.  They probably won’t be favored over Kentucky in the second round, but I’d guess that Kentucky won’t be favored by much more than 3 or 4 (contrast that with North Carolina, who would be favored over Seton Hall by about 12 or 13 points).

I’ve also got UCLA getting to the Elite 8 in this region over Kentucky, with North Carolina coming out of this bracket.

West

This is the most boring region in my bracket.  Straight chalk the whole way, with Florida State’s small upset over Arizona the only exception.  The Northwestern-Vanderbilt game features teams that have a combined 26 losses on the season with FIFTEEN of those belonging to Vanderbilt.

East

Another pretty boring region.  I’ve got no upsets in the first round here and the only small upset I have is Virginia over Florida to make it to the Sweet Sixteen.  I’ve got Villanova coming out of this region triumphing over Duke.


Final Four

I have Villanova over Gonzaga by 1 point and North Carolina over Kansas by 3.5 points, so both of them make it back to the finals.


National Champion

I think North Carolina gets back to the championship game, but this year they pull it off.


I’ll be posting some more brackets later tonight if I have time.




Here is what I think the NCAA Tournament field should have looked like

If I chose the NCAA Tournament, here is what the field would look like:

(*automatic qualifier)

Actual Tournament Qualifiers

1 seeds: North Carolina, Villanova*, Gonzaga*, Duke*

2 seeds: Kansas, UCLA, Kentucky*, Louisville

3 seeds: Virginia, West Virginia, Oregon, Florida St.  

4 seeds: Baylor, Wisconsin, Arizona*, Purdue

5 seeds: Florida, Wichita St.*, Notre Dame, Butler

6 seeds: Miami (FL), St. Mary’s (CA), Oklahoma St., Dayton

7 seeds: Michigan*, Iowa St., VCU, Rhode Island*

8 seeds: Clemson, Kansas St., SMU*, South Carolina 

9 seeds: Wake Forest, Texas Tech, Creighton, Syracuse

10 seeds: Cincinnati, Utah, Marquette, New Mexico St.*

11 seeds: Maryland, Minnesota, USC, California/Seton Hall

12 seeds: UNC Wilmington*, TCU/Michigan St., MTSU*, Nevada*

13 seeds: Bucknell*, Princeton*, Vermont*, ETSU*

14 seeds: Winthrop*, UC Davis*, Iona*, Kent St.*

15 seeds: N Kentucky*, Troy*, NC Central*, North Dakota*

16 seeds: Texas Southern*, Jacksonville St.*, South Dakota St.*/Florida Gulf Coast*, Mount St. Mary’s*/New Orleans* 



NCAA Tournament – Some Thoughts

Well the brackets are out and I just finished filling my first one out.  Tomorrow and Tuesday I’ll work on my models for March Machine Learning Mania 2017.

Here are some of my initial thoughts:

  • I think the committee did a really good job picking 1 seeds this year.  I would have made Duke a 1 seed and Kansas a 2, but I’m fine either way.
  • I’m taking Oklahoma St over Michigan in the first round.  This is classic “not so great team goes on a great run and returns to reality in the NCAA tournament”.
  • My other first round upset picks are:
    • URI over Creighton
    • Seton Hall over Arkansas
    • Wake Forest or Kansas State over Cincinnati
    • Wichita State over Dayton (poor Dayton)
  • Speaking of Wichita State… a 10 seed?  How in the what?  I have them as a 5 seed and Ken Pomeroy has them as a TWO! A 10 seed is an absolute joke.
  • Clemson got screwed.  But they won football so screw them.
  • Congrats to Northwestern for finally making it to the NCAA tournament.  They get to play FIFTEEN-loss Vanderbilt, a game they can very easily win.  Then they will get absolutely roasted by Gonzaga.
  • The four 9 seeds all make me want to throw up: Michigan State, Seton Hall, Virginia Tech, and Vanderbilt.  All of those teams belong in the NIT.
  • The East region is stacked.

More to come tomorrow.


All win probability models are wrong — Some are useful

How does Matt Ryan sleep at night?


As in the moments following the 2016 US election, win probabilities took center stage in public discourse after New England’s comeback victory in the Super Bowl over Atlanta.

Unfortunately, not everyone was enamored.

While it’s tempting to deride conclusions like Pete’s, it’s also too easy of a way out. And, to be honest, I share a small piece of his frustration, because there’s a lingering secret behind win probability models:

Essentially, they’re all wrong.

But win probability models can still be useful.

To examine more deeply, I’ll compare 6 independently created win probability models using projections from Super Bowl 51. Lessons learned can help us better understand how these models operate.  Additionally, I’ll provide one example of how to check a win probability model’s accuracy, and share some guidelines for how we should better disseminate win probability information.

So, what…
