Regression on Angular Data

I’ve been working on missing data in statistical shape analysis for the last several years, and one question that has popped up is how to deal with missing data for angular data (this question is related to how to deal with missing data in shapes, I swear).  So, as part of this, I’ve been reading about angular data in general and regression models for angular data in particular.
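A quick illustration of why angles need their own methods: the ordinary average of 350° and 20° is 185°, which points in almost exactly the wrong direction. A minimal sketch of the usual fix, averaging the unit vectors instead of the raw numbers, is below. The function name and the example angles are made up for illustration and are not from any of the papers.

```python
import math

def circular_mean_deg(angles_deg):
    """Mean direction of angles in degrees, computed by averaging unit vectors."""
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    # atan2 recovers the direction of the summed vector; % 360 maps it to [0, 360)
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0

angles = [350.0, 20.0]                      # hypothetical example data
naive_mean = sum(angles) / len(angles)      # 185 degrees, pointing the wrong way
circ_mean = circular_mean_deg(angles)       # approximately 5 degrees

print(naive_mean, circ_mean)
```

The same wrap-around issue is what makes plain linear regression awkward when the response is an angle.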

Here are some links to papers on the topic:

I graduated with a Ph.D. in statistics ten years ago, and I started studying statistics almost 20 years ago, and I’ve never ever thought about angular data analysis until now.  I’m just constantly amazed by how little I know.  There is just too much stuff.




I saw on Flowing Data this Snowflake Generator.  It’s way cooler than my snowflake generator.



Euro 2020: Some thoughts.

Here is a plot I made showing which seeds advanced to the round of eight (green), were eliminated in the round of 16 (blue), or were eliminated in the group stage (black).

Some thoughts:

  • The only group that didn’t advance anyone to the final 8 was the “group of death”, group F.  France, Germany, and Portugal all lost their games in the Round of 16.  That’s the last two World Cup Champions and the last Euro Champion all eliminated.
  • Of the 8 teams to advance, three of them finished third in their group.
  • Only 3 of the 6 group winners advanced to the Quarterfinals (England, Italy, Belgium).
  • Groups A, B, and D each had two teams advance.
  • Belgium and Italy, who play each other in the Quarterfinals, combined for 6 wins, 0 losses, 0 ties, and 18 points in group play.  The other 6 teams earned 26 points TOTAL from 7 wins, 6 losses, and 5 ties.
  • Two teams in the final 8 had two losses in the group stage (Denmark and Ukraine).
  • There were 7 teams that had 0 losses in the group stage.  Four of them advanced to the Quarterfinals (Spain, Belgium, Italy, England).  Three of those four (Spain, Belgium, Italy) are on the same side of the bracket.
  • If you are wondering how the matchups in the knockout stage were decided, check out this Wikipedia page.  Cheers.  And go Denmark, Switzerland, and Czech Republic!
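For anyone who wants to check the points arithmetic in the list above, here is a small sketch. It assumes the standard group-stage scoring of 3 points for a win and 1 for a tie, which the post does not spell out, and the function name is just for illustration.

```python
WIN_POINTS = 3   # assumed: standard group-stage scoring
TIE_POINTS = 1   # assumed: standard group-stage scoring

def group_points(wins, ties):
    """Points earned in group play; losses are worth nothing."""
    return WIN_POINTS * wins + TIE_POINTS * ties

# Belgium and Italy combined: 6 wins, 0 ties
belgium_italy = group_points(wins=6, ties=0)
print(belgium_italy)
```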


Computers generating puns!

Click to access 1910.10950.pdf

Euro 2020: Just some random thoughts.

Below are Luke Benz’s Euro Cup 2021 predictions from June 10.  You should all be following Luke.

Some thoughts on June 28:

  • The Czech Republic only had a 15% chance of reaching the quarterfinals!  And the Netherlands had the second highest probability of making the quarterfinals.
  • Belgium will play Italy in the quarterfinals.  That could have been a final.  Meanwhile, as Luke pointed out to me, either Denmark or Czech Republic will end up in the semi-finals.
  • England will not win this tournament.
  • Totally unrelated to this, it’s always weird when I remember that GREECE won Euro 2004.  GREECE!

A Bayesian marked spatial point processes model for basketball shot chart

This paper from the December 2020 issue of JQAS is wonderful: A Bayesian marked spatial point processes model for basketball shot chart.

Simply put, they build a model of where players take their shots and then, given a location, how often they make shots from there.

I’m particularly interested in this point from the paper:

The preferred models for all four players, which are intensity independent model for Curry and intensity dependent model for other three players, can reduce the MSE by 2.7, 1.3, 2.0, and 7.0%.

I think the correct way to interpret this is that three of the players analyzed have different chances of making a shot based on where the shot is taken.  But for Curry, the probability he makes a shot is INDEPENDENT of where he is taking the shot.  Basically he’s just good everywhere.  (If this is NOT the correct interpretation, let me know!)

I’d love to see the analysis expanded to all players in the league and see who else would end up with an intensity independent model.




Data art!

Tapestry for reflective data visualization


I made a useful shiny app (and you can too!)

I made a shiny app for organizing open mics in Chicago.  (Yes, I’ve started doing stand-up.  No, I’m not good……yet.)

The code for making this app can be found on my github.

And the wonderful Shiny cheatsheet can be found here.

The real key for me to get this to work was the addition of the global.R file.  I didn’t realize you could add this alongside the ui.R and server.R files.  Shiny runs global.R once when the app starts, and anything defined there is available to both ui.R and server.R.  I HIGHLY recommend using a global.R file in your shiny apps.  I’m going to use this as an example of a shiny app in my Data Science 101 course, which I am developing and will teach for the first time in Spring 2022.







Aging Curves in Baseball

So I’m working on a research project on aging curves in baseball with two of my students here at Loyola.  While quite a bit of work has been done on aging curves in many sports, including baseball, the question we are interested in is this: What would the aging curve look like if players played every season from the age of 22 through 40?  What we actually observe are only the players who “survive”.  What WOULD have happened if a player who was forced out of the league at the age of 30 had played until they were 40?  We view this as a missing data problem and are currently using multiple imputation with a hierarchical structure to impute missing seasons, then estimating the aging curves from the imputed data.  I’d like to do the aging curve estimation using functional data analysis, but……we’ll see.
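A toy simulation makes the “survivor” problem concrete. Everything in this sketch is an assumption for illustration: the quadratic curve, the peak age, the cutoff rule, and the numbers are made up, and this is not the imputation model from the project. It compares the average performance at each age among players still in the league with the average if everyone had kept playing.

```python
import random
from statistics import mean

def simulate_aging(n_players=5000, ages=range(22, 41), peak_age=28,
                   curvature=0.02, cutoff=-1.0, noise_sd=0.3, seed=1):
    """Simulate performance by age, with and without dropout (toy assumptions)."""
    rng = random.Random(seed)
    full = {age: [] for age in ages}      # counterfactual: everyone plays every season
    observed = {age: [] for age in ages}  # only seasons that were actually played
    for _ in range(n_players):
        talent = rng.gauss(0.0, 1.0)
        active = True
        for age in ages:
            # hypothetical quadratic aging effect, peaking at peak_age
            perf = talent - curvature * (age - peak_age) ** 2 + rng.gauss(0.0, noise_sd)
            full[age].append(perf)
            if active:
                observed[age].append(perf)
                if perf < cutoff:
                    active = False  # forced out after a season below the cutoff
    return full, observed

full, observed = simulate_aging()
for age in (22, 28, 34, 40):
    print(age, round(mean(full[age]), 2), round(mean(observed[age]), 2), len(observed[age]))
```

In this toy setup the observed average at the older ages sits well above the everyone-plays average, because only the high-talent players are still around. That gap is the part the missing seasons hide.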

Anyway, I’ve started doing some lit review for this and I figured I’d post some of the interesting articles that I’ve found related to the topic:

Albert (1992) looks at estimating models for home run rates, and as part of this Albert incorporates an aging curve into his model.  A quadratic form is assumed for the aging curve.

Berry et al. (1999) incorporate an aging curve into their analysis, but instead of a quadratic form they use a nonparametric model.  They looked at hockey, golf, and baseball.  (Albert (1999), in a comment, argues against the aging model presented in Berry et al. (1999).)

Fair (2008) looks at aging curves in baseball and follows from previous work that looked at aging curves in running, swimming, and chess (Fair (2007)).

Wakim and Jin (2014) take a functional data analysis approach to the problem and look at MLB and NBA.  This is probably the most sophisticated statistical analysis that I have seen so far with regard to aging curves.

Dendir (2016) in the Journal of Sports Analytics looks at when soccer players peak and, based on their analysis, found that players in top leagues peak somewhere between 25 and 27.

Vaci et al. (2019) looked at aging curves in NBA players.

This is clearly not an exhaustive list of papers related to aging curves in sports, but these are some of the interesting papers that I’ve come across so far.



Google Interview Riddle