Author Archives: statsinthewild
Update: Someone on Twitter suggested that these advancement grids should be weighted by how likely the scenario is. Here is what those looked like before the last games of Groups G and H.
Each team in the World Cup is through 2 games and every team has one game left in the group stage. Below you will find graphics for each team’s advancement scenarios based on the two remaining games in their group.
- Green indicates that a team will win their group.
- Yellow means they finish second, and red means they are eliminated.
- Light green means they are tied for first after points and goal differential and the winner is determined by further tie breakers.
- Orange indicates a tie for second after points and goal differential and the team that moves on is determined by further tie breakers.
- Gray indicates a three way tie after points and goal differential and further tie breakers are applied to decide who moves on.
Group A is a pretty boring group.
- Russia wins the group with a win or tie.
- Uruguay wins the group with a win.
- Egypt is out.
- Saudi Arabia is out.
Group B is much more interesting than A.
- Morocco has been eliminated.
- Iran advances with a win OR a tie and Morocco wins by 2 or more over Spain. If Morocco wins by 1 over Spain and Iran wins, it goes to goals for as a tie breaker. Iran can actually still win the group with a win and a Morocco win or tie.
- Portugal advances with a win or tie. They can also advance with a loss and a Morocco win, as long as Morocco beats Spain by more than Portugal loses to Iran.
- Spain advances in basically all scenarios EXCEPT a loss and an Iran tie OR a loss and a Morocco win by more than Iran beats Portugal.
- France is through. They can win the group with a win or a tie over Denmark.
- Denmark advances unless they lose and Australia wins.
- Australia gets in with a win and a loss by Denmark. If Australia wins by 1 and France wins by 1, Denmark and Australia tie for second and the team with more goals scored would advance. If it’s still tied the rest of the tie breakers would be applied.
- Peru is eliminated.
- Croatia is through. They win the group unless they lose and Nigeria wins AND Nigeria can make up a goal differential of 5.
- Nigeria advances with any win and can advance with a tie as long as Iceland doesn’t win by 3 or more.
- Iceland can advance with a win and an Argentina win or tie. But they still need to make up a goal differential of 2.
- Argentina advance with a win and a Croatia win or tie. They can also advance with a win and an Iceland win, but they would need to make up the 1 goal differential with Iceland.
- Brazil advances with a win or tie. They can still advance with a loss as long as Costa Rica wins and Brazil maintains its 1 goal differential advantage over Switzerland.
- Switzerland is through with a win or tie. They also advance with any Brazil win. They can also advance with a loss as long as Serbia beats Brazil and they can make up the 1 goal differential behind Brazil.
- Serbia advances with a win. Or they can advance with a tie and a Costa Rica win.
- Costa Rica is eliminated.
Group F is nuts.
- Mexico is 2-0-0 and hasn’t clinched yet. They win the group with a win or a tie and there are even some scenarios where they win the group with a one goal loss and a South Korean tie or victory.
- Germany, who was almost eliminated by Sweden, advances with a win* or a tie and a Mexico win. Germany will tie for second if they tie and Mexico ties or if they lose by 1 and Mexico wins.
- Sweden advance with a win and a German loss. Or a tie and German loss. Or a win and a German tie. In fact they most likely advance with a win, with a few exceptions*.
- South Korea, currently sitting on two losses, can still somehow advance. They simply need to beat Germany by 2 and have Mexico beat Sweden. Simple……
*If Sweden wins by 1 and Germany wins by 1, Sweden, Germany, and Mexico would be in a three way tie for first with 6 points each and the same goal differential. That means some team with 6 points would not advance. Heartbreaking. 6 points is a lot.
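The three way tie in that footnote is easy to check with a bit of arithmetic. Here is a minimal sketch in base R. The standings going into the final games are inputs I am assuming (they are not read off the grids), and `apply_result` is a hypothetical helper written just for this check, not a function from any package.

```r
# Group F before the final games (assumed inputs: points and goal differential).
standings <- data.frame(
  team = c("Mexico", "Germany", "Sweden", "South Korea"),
  pts  = c(6, 3, 3, 0),
  gd   = c(2, 0, 0, -2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply one result to the table.
# `margin` is the first team's goals minus the second team's goals.
apply_result <- function(tab, team1, team2, margin) {
  i <- tab$team == team1
  j <- tab$team == team2
  # 3 points for a win, 1 each for a tie, 0 for a loss
  pts1 <- if (margin > 0) 3 else if (margin == 0) 1 else 0
  pts2 <- if (margin < 0) 3 else if (margin == 0) 1 else 0
  tab$pts[i] <- tab$pts[i] + pts1
  tab$pts[j] <- tab$pts[j] + pts2
  tab$gd[i] <- tab$gd[i] + margin
  tab$gd[j] <- tab$gd[j] - margin
  tab
}

# Scenario from the footnote: Sweden beats Mexico by 1, Germany beats South Korea by 1.
final <- apply_result(standings, "Sweden", "Mexico", 1)
final <- apply_result(final, "Germany", "South Korea", 1)

# Sort by points, then by goal differential.
final <- final[order(-final$pts, -final$gd), ]
print(final)
```

Under those assumed standings, Mexico, Germany, and Sweden all finish on 6 points with a goal differential of +1, so the further tie breakers have to decide who goes home.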
- England is through and wins the group with a win over Belgium.
- Belgium is through and wins the group with a win over England.
- Tunisia is eliminated.
- Panama is eliminated.
If Belgium and England tie, all the tiebreakers are tied down to fair play points and the winner of the group will be chosen based on who has fewer yellow cards.
- Japan is through with a win or tie. They can also advance with a loss and a Senegal win.
- Senegal is through with a win or tie. They can also advance with a loss and a Colombia win but that would come down to goal differential to break their tie with Japan.
- Colombia advances with any win or with a tie and a Poland win.
- Poland is eliminated.
So I was watching game 1 of the NBA finals, and I got to thinking about how some of these players have been around for a long time in the NBA (LeBron was drafted in 2003!!!). So I went to Basketball Reference and looked back at some old drafts. Then I got the idea to scrape drafts and current rosters to see what each current NBA team looks like in terms of the years and positions in a draft. This led to me staying up way past my bedtime screwing around with rvest and plotly.
What I started with was a scatter plot of draft position on the x-axis versus year drafted on the y-axis with each point having the color of their team. That’s easy enough to do in ggplot. But what I really wanted to do was make it interactive so that when you clicked on a point, all the other points for the team will also highlight. Now on Thursday night/very early Friday morning, I had no idea how to do this. And it drove me F-ing crazy until like 2 or 3 in the morning. You think I’m kidding, but look at my fitbit sleep Friday night. That has nothing to do with a kid; I just couldn’t figure out how to do this.
So Friday I wake up at like 6 barely functional, get baby out the door, go to my only meeting of the day, and then I happened to be having lunch with Carson Sievert who was in Chicago for an R conference. So I mention this problem to him after we finished eating tacos, and he casually pulls out his laptop and shows me:
That’s it. That’s all you need to do to make that work. The full plot code is then:
draft2 %>%
  SharedData$new(~Team) %>%
  plot_ly(x = ~Rk, y = ~Year, color = ~Team, text = ~paste(Player, Team), colors = pal)
That’s it. Check out the plot it makes here!
Next what I wanted to do was add some convex hulls around the points. Apparently this is super easy to do too using geom_polygon and the chull function. Check out the convex hulls plot here. At first it looks like a mess, but double click on the legend on the right to choose what to add to the plot. For instance, below is a screenshot of the convex hulls of the final 4 teams in the NBA playoffs. What’s so notable about these teams is that Golden State, Cleveland and Houston have very similar shapes, indicating that their teams are made up of some very high draft picks from several years ago, but notably no high draft picks from very recently. But look at Boston. Totally different shape, mostly in the upper left corner (indicating high draft picks in very recent years). Go play around with that plot. It’s really interesting.
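For anyone who wants to try the hull step on its own, here is a rough sketch using only base R. The data are made up (two hypothetical teams, not the scraped rosters); the column names `Team`, `Rk`, and `Year` follow the plot code above, and `hull_rows` is a hypothetical helper.

```r
# Toy data (hypothetical picks, not the scraped rosters): pick number and draft year.
draft <- data.frame(
  Team = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  Rk   = c(1, 30, 1, 30, 15, 5, 50, 5, 50),
  Year = c(2005, 2005, 2015, 2015, 2010, 2012, 2012, 2017, 2017),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the rows on a team's convex hull, in hull order.
# grDevices::chull() returns the indices of the points lying on the hull.
hull_rows <- function(d) d[grDevices::chull(d$Rk, d$Year), ]

# One hull per team, stacked back into a single data frame.
hulls <- do.call(rbind, lapply(split(draft, draft$Team), hull_rows))
print(hulls)
```

Team A's interior pick (15th in 2010) drops out and only the corner points remain. A data frame shaped like `hulls` is the sort of thing you would hand to geom_polygon with Rk on x, Year on y, and Team as the fill.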
What I think I’d like to do next is to see how these convex hulls change over time for an individual team. Or if someone has some free time you can take my code and modify it to do that. My full GitHub code for scraping the data and making the plots is here.
April 3, 2018 – Yankees vs Rays – Didi Gregorius: 6.58 RAA.bat, 4-4, BB, 3 R, 8 RBI, Double, 2 HR
- Bases Empty – Double
- Runners on 1st and 3rd – Homerun
- Runner on 1st – Walk
- Runners on 1st and 3rd – Homerun
- Bases Loaded – Single
April 13, 2018 – Angels vs Royals – Abraham Almonte: -3.09 RAA.bat, 0-5, K, 2 GIDP
- Bases Empty – Strikeout
- Runners on 1st and 2nd – Grounded into Double Play
- Runner on 3rd – Groundout
- Bases loaded – Groundout
- Runner on 1st – Grounded into Double Play
April 9, 2018 – Diamondbacks at Giants – Zack Godley: 4.32 RAA.pitch, 7 IP, 4 H, 9 K, 0 ER, 23 Batters Faced
- Lineout – K – K
- Single – K – GIDP
- K – K – K
- Groundout – K – K
- Single – Forceout – Single – Pop Out – Groundout
- Groundout – Groundout – K
- Single – Forceout – GIDP
April 7, 2018 – Marlins at Phillies – Dillon Peters: -7.56 RAA.pitch, 2.2 IP, 9 H, 9 ER, 3 BB, 3 K, 2 HR, 19 Batters Faced
- Walk – Single – Single – Walk – K – HR – Flyout – K
- Single – GIDP – Groundout
- K – Single – Single – Walk – HR – Single – Pop Out – Single
April 15, 2018 – Rockies vs Nationals – Michael Taylor: 1.59 RAA.br
- Walk – Advances to 2B on a sac bunt – Advanced to third on walk – Scores on passed ball
- Double to LF – Steals 3rd – Scores on passed ball
May 6, 2018 – Rockies vs Mets – David Dahl: -1.49 RAA.br
- Single – Steals 2B (Arenado then walks, Dahl gets no credit for the steal! This needs to be fixed!)
- Double – Thrown out trying to advance to third on a ground out to the shortstop.
One of the nice things about openWAR is that you can compute it over any time period and you can look at its individual components. Here I’ve looked at Mike Trout’s batting component in openWAR (raa.bat) over the course of the 2018 season. His best game performance so far in terms of hitting was on 4/8/2018, when he amassed 1.95 raa.bat by going 2-3 with a HR, 2 RBI, a walk and a strikeout. His worst game so far was worth -1.62 raa.bat on 3/29/2018, when Trout went 0-6 with a strikeout. As you can see, Trout started relatively slowly over the first week of the season, but since April 8, 18 of his last 23 games have been positive raa.bat. If he keeps up that kind of production, this kid might have a future in the major leagues…..
JSM Data Art show history
In the summer of 2016, JSM was held in Chicago. I live in Chicago and had the idea to try to have a data art show somewhere in Chicago to coincide with JSM. So I tweeted out the idea asking if anyone knew a venue that would be appropriate for hosting this. Well, through the power of Twitter, the good people at the ASA suggested I have the data art show at JSM. So we sent out a call for art and ended up with a nice little data art show featuring Alisa Singer, Craig Miller, Elizabeth Pirraglia, Gregory J. Matthews, Marcus Volz, and Jillian Pelto.
In 2017, we did the show again, this time in Baltimore featuring work by Lucy D’Agostino McGowan and Maëlle Salmon, Gregory J. Matthews, and Elizabeth Pirraglia. It was a little bit smaller in terms of participation, but I blame myself mostly. I had a baby at the end of 2016 so I spent a lot less time publicizing the 2017 show, and we had far fewer applicants.
So for the 2018 show in Vancouver, I want to get the word out that there will be another show and to encourage all of you to apply. (Yes you!) If you want to apply, you have until May 15 to submit your work for consideration to email@example.com. Full details of how and where to submit your work can be found here. And if you don’t want to apply yourself, please send this to someone who you think might be interested in submitting work.
Also, a few more favors to ask of you
- I will be unable to attend JSM this year for the first time in TEN years because I am having baby number 2 in July. So I’m looking for someone who will be attending the event who can act as a sort of coordinator for the event. This is minimal effort and basically requires you to check that it gets set-up. I’d also like you to take some pictures of the event and send them to me.
- Would anyone be willing to set my work up at JSM if I ship it to the convention center? And then ship it back to me? I will, of course, cover all the costs of shipping.
- If anyone reading this knows someone in Vancouver who is connected to the local art world there, I would appreciate them forwarding this to them.
Maybe I should move to netlify too?
To facilitate easier sharing of code and figures, I’ve started an RMarkdown blog, which you will find at http://statsbylopez.netlify.com/. All new blog posts will be shared at this new site.
I’m going to keep the WordPress site active for the time being, so past articles aren’t going anywhere. In the meantime, thanks for four years of reading and fun! Hopefully the next site will be a success.
Round of 64
I think generally the committee did a pretty good job this year, at least in terms of first round games. The only lower seeded teams that I have favored in the first round are Florida St. over Missouri and Butler over Arkansas (though I have Houston as only a tiny favorite over San Diego State). As far as most likely possible upsets? Here are the double digit seeds I think are most likely to win in round 1 (in order of likelihood):
(11) Loyola-Chicago over (6) Miami (That’s not what my model says, but I’m contractually obligated to say this)
(11) San Diego State over (6) Houston
(10) Texas over (7) Nevada
(10) Providence over (7) Texas A&M
(12) Davidson over (5) Kentucky
(12) New Mexico St over (5) Clemson
(11) St. Bonaventure over (6) Florida
(12) Murray State over (5) West Virginia
(12) South Dakota State over (5) Ohio State
Then if you want to get crazy and go for some big time first round upsets I would pick these (in order of likelihood):
(14) Montana over (3) Michigan
(13) Marshall over (4) Wichita State
(13) Charleston over (4) Auburn
(14) Wright State over (3) Tennessee
(14) S.F. Austin over (3) Texas Tech
(15) Georgia State over (2) Cincinnati
Round of 32
Nothing really interesting here. I have all the 1-4 seeds favored to make this round with the exception of Wichita State, which I have as an underdog to West Virginia.
Looking to pick an upset? Most likely 5 seed or higher to make the Sweet Sixteen:
(5) West Virginia
(8) Seton Hall
(11) San Diego State
(7) Texas A&M
Want to get real crazy with it?
(12) New Mexico State
(11) St. Bonaventure
(12) Murray State
Here is where things start to get a bit interesting. I have Villanova, Purdue, Kansas, Duke/Michigan St*, North Carolina, Cincinnati, Virginia, and Gonzaga.
I think the two most potentially interesting games in this round are Gonzaga vs Xavier and Duke vs Michigan St. I think Xavier is way overrated and Gonzaga is underrated so I think it will be interesting to see if Xavier lives up to its one seed here. The other game, Duke vs Michigan St, I think would be a good Final Four matchup. I’m taking whoever wins this game to go all the way to the finals. I just have no idea who is going to win this game, so I’m not picking the winner of that game, but I am advancing them in the bracket as a /. It’s my blog and I can do what I want.
Looking to pick a double digit seed to the Elite 8? How about these teams:
(11) San Diego State
(12) New Mexico State
(11) St. Bonaventure
(15) Georgia State
Alright. I’ve got Virginia over Cincinnati. Villanova over Purdue. Duke/Michigan State over Kansas. And North Carolina over Gonzaga. That’s 3 ACC teams. Ugh. And Duke. The most Ugh.
I think Butler and Texas A&M as Final Four teams are interesting picks as well as Seton Hall and Miami.
I’m taking Virginia over North Carolina and Duke/Michigan State over Villanova.
I’m taking Virginia over Duke.
Want some cray picks to win the championship?
(5) West Virginia
(5) Ohio State
(9) Florida State
(9) Seton Hall
If you need me Friday morning, I’ll be crying in a corner next to the remains of my bracket.
Oh. And for god’s sake NCAA, pay the players!
Of the four number 1 seeds, Virginia, Villanova, Kansas, and Xavier, Xavier is far and away the weakest number 1 seed in this tournament (I have them ranked 15th overall).
Estimated chances of making the Sweet Sixteen
- Villanova(1) – 81.14%
- Virginia(1) – 77.89%
- Purdue(2) – 76.24%
- Kansas(1) – 74.40%
- Duke(2) – 74.03%
- Michigan St(3) – 72.59%
- North Carolina(2) – 72.02%
- Cincinnati(2) – 66.60%
- Tennessee(3) – 59.16%
- Auburn(4) – 59.08%
- Gonzaga(4) – 58.10%
- Texas Tech(3) – 56.45%
- Xavier(1) – 56.06%
- Michigan(3) – 53.65%
- West Virginia(5) – 52.09%
- Arizona(4) – 49.81%
- Wichita St(4) – 43.58%
- Ohio St(5) – 39.19%
- Florida(6) – 38.18%
- Clemson(5) – 31.58%
- Kentucky(5) – 30.56%
- Miami FL(6) – 29.12%
- Florida St(9) – 29.12%
- Houston(6) – 28.67%
- Texas A&M(7) – 22.14%
- Oklahoma(10) – 19.01%
- TCU(6) – 17.83%
- Texas(10) – 16.31%
- Nevada(7) – 15.39%
- Butler(10) – 15.32%
- Missouri(8) – 14.65%
- Seton Hall(8) – 14.26%
- San Diego St(11) – 13.49%
- Creighton(8) – 12.19%
- Virginia Tech(8) – 12.05%
- Davidson(12) – 11.82%
- NC State(9) – 10.83%
- Loyola Chicago(11) – 9.83%
- Kansas St(9) – 9.77%
- Arizona St/Syracuse(11) – 8.38%
- Arkansas(7) – 8.32%
- Buffalo(13) – 7.81%
- Alabama(9) – 6.78%
- Rhode Island(7) – 6.59%
- New Mexico St(12) – 6.42%
- Providence(10) – 5.44%
- St. Bonaventure/UCLA(11) – 4.30%
- Montana(14) – 4.19%
- Murray St(12) – 3.64%
- Charleston(13) – 2.92%
- Wright St(14) – 1.89%
- Georgia St(15) – 1.70%
- S Dakota St(12) – 1.55%
- Bucknell(14) – 1.20%
- Greensboro(13) – 1.16%
- SF Austin(14) – 1.07%
>0% and <1%: Marshall(13), Penn(16), Lipscomb(15), Iona(15), Texas Southern(16), UMBC(16), CS Fullerton(15), Radford(16), LIU Brooklyn(16), NC Central(16).
Estimated chances of making the Final Four
- Virginia – 41.49%
- Villanova – 32.07%
- Purdue – 30.79%
- North Carolina – 30.33%
- Michigan St – 29.11%
- Duke – 27.19%
- Cincinnati – 22.78%
- Kansas – 21.91%
- Gonzaga – 21.27%
- West Virginia – 11.88%
- Xavier – 11.67%
- Tennessee – 10.91%
- Michigan – 10.79%
- Auburn – 10.38%
- Ohio St – 10.24%
- Texas Tech – 9.08%
- Arizona – 8.51%
- Wichita St – 7.31%
- Florida St – 4.71%
- Texas A&M – 4.68%
- Florida – 3.94%
- Houston – 3.54%
- Miami FL – 3.43%
- Kentucky – 3.42%
- Clemson – 2.99%
- TCU – 2.70%
- Oklahoma – 2.66%
- Creighton – 2.37%
- Texas – 2.27%
- Butler – 2.19%
- Nevada – 1.89%
- Kansas St – 1.64%
- Missouri – 1.51%
- Virginia Tech – 1.24%
>0% and <1%: Seton Hall, Arkansas, NC State, Arizona St, San Diego St, Loyola Chicago, Davidson, Alabama, Providence, Rhode Island, Buffalo, New Mexico St, Murray St, St Bonaventure, Montana, Georgia St, Charleston, Bucknell, Wright St, South Dakota St, Greensboro, SF Austin
Estimated chances of winning NCAA tournament
- Virginia – 13.85%
- Villanova – 12.50%
- Purdue – 11.94%
- Michigan St – 10.23%
- Duke – 9.03%
- North Carolina – 7.15%
- Kansas – 5.50%
- Cincinnati – 5.30%
- Gonzaga – 4.49%
- West Virginia – 3.33%
- Texas Tech – 1.84%
- Auburn – 1.84%
- Xavier – 5.13%
- Wichita St – 1.56%
- Tennessee – 1.38%
- Ohio St – 1.34%
- Michigan – 1.25%
- Arizona – 1.06%
- Florida St – 0.63%
- Florida – 0.60%
- Texas A&M – 0.47%
- TCU – 0.34%
- Oklahoma – 0.28%
- Houston – 0.26%
- Clemson – 0.25%
- Kentucky – 0.24%
- Creighton – 0.24%
- Butler – 0.24%
- Miami FL – 0.23%
- Missouri – 0.14%
- Virginia Tech – 0.13%
- Texas – 0.13%
- Kansas St – 0.13%
- Arizona St – 0.08%
- Seton Hall – 0.07%
- San Diego St – 0.06%
- Nevada – 0.06%
- NC State – 0.03%
- Arkansas – 0.03%
- Providence – 0.02%
- Alabama – 0.02%
- Rhode Island – 0.01%
- Loyola Chicago – 0.01%
- Davidson – 0.01%
Highest Seed Remaining in Conference Tournament
1 seeds: Virginia, Xavier, Michigan St, Villanova
2 seeds: Duke, Kansas, Purdue, Tennessee
3 seeds: Cincinnati, Wichita St, Auburn, North Carolina
4 seeds: Michigan, Kentucky, Ohio St, Clemson
5 seeds: Nevada, Arkansas, Miami (FL), Virginia Tech
6 seeds: Texas A&M, Gonzaga, West Virginia, Houston
7 seeds: Texas Tech, Missouri, TCU, New Mexico St
8 seeds: Kansas St, Arizona, Creighton, Mississippi St
9 seeds: Florida, Nebraska, Butler, Florida St
10 seeds: Louisville, NC St, Baylor, MTSU
11 seeds: St. Bonaventure, Seton Hall, Oklahoma/Oklahoma St, Texas/Marquette
12 seeds: Rhode Island, SF Austin, Loyola-Chicago, Murray St
13 seeds: San Diego St, Montana, Buffalo, Charleston
14 seeds: Bucknell, Penn, Marshall, Wright St
15 seeds: CS Fullerton, Georgia St, Texas Southern, Iona
16 seeds: NC Central, Lipscomb, UMBC/UNCG, Radford/LIU Brooklyn
First Four out: Alabama, USC, Providence, St. John’s
Next Four out: Notre Dame, ULL, Syracuse, LSU
Every couple of years in February I get around to writing about Super Bowl squares. It’s been a few years, so I decided to update the post. So here is the updated 2 dimensional histogram of how often certain numbers occur in Super Bowl squares. Nothing new here. You want to get some combination of 7-0 or 0-7 followed by 7-7, 7-4, and 4-7. 3-0, 4-0, and 0-0 are also good. Try not to get 2-2. (Though it does happen).
Next, rather than looking at 7-0 and 0-7 as different, I let those count as the same outcome giving the following 2 dimensional histogram. Basically the same amount of information — You want 7-0 and you don’t want 2-2.
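If you want to build these two tables yourself, the counting step can be sketched in a few lines of base R. The scores below are made up (toy data, not the real game log), and splitting each game into `winner` and `loser` is my assumption about how the scores are stored.

```r
# Hypothetical final scores (toy data, not the real game log).
games <- data.frame(
  winner = c(27, 24, 17, 20, 31, 14),
  loser  = c(20, 17, 10, 17, 24, 7)
)

# Last digit of each score; fixing the levels keeps all ten digits in the table.
w <- games$winner %% 10
l <- games$loser %% 10

# Ordered version: 7-0 and 0-7 are different squares.
ordered_counts <- table(
  winner = factor(w, levels = 0:9),
  loser  = factor(l, levels = 0:9)
)

# Unordered version: put the smaller digit first so 7-0 and 0-7 share one cell.
unordered_counts <- table(
  low  = factor(pmin(w, l), levels = 0:9),
  high = factor(pmax(w, l), levels = 0:9)
)

print(ordered_counts)
print(unordered_counts)
```

In the toy data 7-0 shows up twice and 0-7 once in the ordered table, and the unordered table folds them into a single cell with a count of 3.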
Next, what I was wondering about was how this changed over time. Here is a plot of each end digit for all games played by season. The most notable part of this graph is that 0 dropped very rapidly from 1920-1960 stemming from far fewer games ending with one team getting shut out. You can also see some other smaller trends over this time period such as 1, 7, 4, and 8 increasing with 6 and 3 decreasing. But this plot is kind of a mess and there are way too many lines on it. Let’s use facet_wrap().
Ahh! Much easier to see trends in numbers over time. Let’s go through these number by number. 0 dropped rapidly from 1920 – 1960 then increased slightly until about 1980 when it began another small decline. 1 increased quickly and has basically been flat since the 1970s. 2 has been flat forever. 3 has a small decline through 1940, but has been slowly increasing ever since. 4 looks like it peaked in 1950 and has been slowly dropping since then. 5 is basically 2: flat. 6 follows roughly the same pattern as 3, with a small decrease until 1950 followed by a slow increase. 7 peaked in 1940 and has been slowly decreasing since then. 8 peaked in 1950, came back down, and has been flat since 1970. Finally, 9 has been basically flat.
Lastly, another way to look at this is with a heat map over time. The plot below shows the relative frequency of last digits over time with dark red indicating large numbers and dark blue indicating low numbers.
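The quantity behind the line plots and the heat map is just the share of scores ending in each digit within a season. A minimal sketch in base R, assuming one row per team score with hypothetical `season` and `points` columns (toy numbers, not the real data):

```r
# Hypothetical team scores by season (toy data).
scores <- data.frame(
  season = c(1920, 1920, 1920, 1920, 2017, 2017, 2017, 2017),
  points = c(0, 0, 7, 13, 27, 24, 20, 17)
)

# Last digit of each score, keeping all ten digits as levels.
digit <- factor(scores$points %% 10, levels = 0:9)

# Rows are seasons, columns are last digits; margin = 1 makes each row sum to 1.
rel_freq <- prop.table(table(season = scores$season, digit = digit), margin = 1)
print(round(rel_freq, 2))

# Long format (one row per season and digit), convenient for faceting or tiles.
rel_freq_long <- as.data.frame(rel_freq)
```

The long version has one row per season and digit with the share in a `Freq` column, which is the shape facet_wrap() or a tile-based heat map would want.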
All the code for generating these plots can be found here.