At the beginning of my post-doc, my adviser told me that she wanted me to write an R package. I told her I didn’t know how to do that, and she told me to figure it out. It’s really not that difficult, and it’s incredibly rewarding. Even if you never put your package on CRAN, it’s a worthwhile exercise to see how R packages are made. And who knows, maybe you’ll be the next Hadley Wickham…
I have long wanted to write R packages. But for some reason, I thought that I wasn’t capable of such a feat. “R package authors are super stars, I’m just me,” I would think in a mix of despair and admiration, marveling at some new and useful R package, and wishing I could create such functional beauty.
But now the day I have long dreamed of has arrived! I have authored an R package, and it is perhaps the most satisfying feeling I have ever had using R. Trust that I have had many R-related feelings, so I do not make this statement lightly. On top of now holding myself in higher regard, I am also wondering what the heck took me so long?
I recently wrote an article about Todd Frazier’s stolen bases for BP Southside, the Baseball Prospectus White Sox site, and in doing so did a decent amount of digging into the different advanced measures of base-stealing productivity—something that would take into account all the necessary components and spit out a measure of runs saved or lost. I got frustrated by a few things, and so decided to type this up, as I think it encapsulates a lot of issues in public sports analysis. All of this is written from a baseball perspective, but it applies at least as much to hockey and probably even more to basketball.
Before I start I want to say that the various sports stats sites strike me in many ways as emblematic of the promise of the internet. Vast amounts of cross-indexed information that can be used with minimal technical abilities, and synthesizing months and years of work done often…
In 8 knockout games, three times a team scored first and lost (England, Switzerland, Ireland).
In a recent speech following the largest mass shooting in American history, at Pulse Nightclub in Orlando, cryogenically-frozen-fart-come-to-life Donald Trump said: “I refuse to allow America to become a place where gay people, Christian people, and Jewish people, are the targets of persecution and intimidation by Radical Islamic preachers of hate and violence.”
No one in America, including, as Trump notes, gays, Christians, and Jews, should be the target of persecution and intimidation by any group of people. However, Donald Trump, an ambitious corn dog that escaped from the concession stand at a rural Alabama fairground, stole an unattended wig, hopped a freight train to Atlantic City and never looked back, seems to have no problem doing the persecution and intimidation when it comes to Mexicans and Muslims. I’m sure that pointing out the massive, gaping hypocrisy of this particular Trump statement could not possibly convince this man, who I’m sure has never once considered even the possibility that anything he has ever said has been factually incorrect, inconsistent, or hypocritical. I want to live in a country where ALL Americans are free from being targeted for persecution and intimidation, not just groups that are politically convenient to protect.
This got me thinking about how much misguided persecution Muslims face in America, and I went looking for statistics on hate crimes in the U.S. The data that I found comes from the FBI and is from 2014. Below is a graphic that I made where the width of each box is the number of hate crime incidents and the height is the number of victims for a particular group.
By far the largest number of hate crimes fall into the category of being motivated by race. These accounted for 2,568 of the 5,462 total single-bias incidents and included 3,227 victims. The largest share of these were crimes committed against blacks (1,621), followed by whites (593), Asians (140), and Native Americans (130). Crimes based on religious bias and sexual orientation bias had nearly the same number of incidents in 2014, at 1,014 and 1,017, respectively. The largest shares of hate crimes based on religion were against Jewish people (609) and Muslims (154). This surprised me a little bit, as my guess was that there would be more hate crimes in the U.S. against Muslims than against any other religion, as illogical reactions to events such as 9/11 and other terrorist acts perpetrated by jihadi extremists. Of the hate crimes motivated by sexual orientation bias, almost 60% (599 out of 1,017) were committed against gay males.
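For anyone who wants to check the shares quoted above, here is a minimal sketch in Python that recomputes them from the counts in the paragraph. The variable names and the `share` helper are my own, purely for illustration; only the counts themselves come from the FBI figures cited above.

```python
# Incident counts quoted in the paragraph above (FBI, 2014).
total_single_bias = 5462
race_incidents = 2568
race_breakdown = {"Black": 1621, "White": 593, "Asian": 140, "Native American": 130}
sexual_orientation_incidents = 1017  # denominator used in the "599 out of 1,017" figure
gay_male_incidents = 599


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


race_share = share(race_incidents, total_single_bias)
gay_male_share = share(gay_male_incidents, sexual_orientation_incidents)
race_group_shares = {group: share(n, race_incidents) for group, n in race_breakdown.items()}

print(f"Race-motivated share of single-bias incidents: {race_share}%")
print(f"Anti-gay-male share of sexual-orientation incidents: {gay_male_share}%")
for group, pct in race_group_shares.items():
    print(f"  {group}: {pct}% of race-motivated incidents")
```

Run as written, this puts race-motivated incidents at roughly 47% of all single-bias incidents, and confirms that "almost 60%" is a fair summary of 599 out of 1,017.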
Well, I think I’ve depressed you enough for now.
Apparently 2009 me liked outliers.
So, I was at Barnes and Noble today with a few hours to kill. I sat down and I started reading Malcolm Gladwell’s new book, Outliers: The Story of Success. The further I read, the more it became clear to me that he wasn’t really talking about outliers. Also, since I have my qualifying exam on January 19th, it can’t hurt to do a review on detecting outliers. (I started this post before my qualifier, which has since come and gone. I passed, by the way.) I go on to do a review of what outliers are and then conclude by explaining how Gladwell’s book isn’t really about outliers. If the middle part bores you, just skip to the conclusion. (Note: I love Gladwell’s work. I have read all of his books and all of his New Yorker articles.)
Let’s try to answer this question: What is an…
After their first 30 games, the Chicago Cubs were 24-6 with a winning percentage of 0.800. How have they fared since? Well, the Cubs are now through 51 games and are at 36-15 with a winning percentage of 0.706. They’ve cooled off a bit since their blazing start (which isn’t surprising), but they still hold a 7.5-game lead in their division and are the only team in baseball with a winning percentage over 0.700.
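To put a number on "cooled off," here is a quick sketch in Python that recomputes the winning percentages from the records above and backs out the record over the games played since the first 30. The function and variable names are my own, for illustration only.

```python
def win_pct(wins, losses):
    """Winning percentage as wins divided by games played."""
    return wins / (wins + losses)


first_30 = (24, 6)     # record after 30 games
through_51 = (36, 15)  # record after 51 games

# Record over the games played since the 30-game mark.
since_30 = (through_51[0] - first_30[0], through_51[1] - first_30[1])

pct_first_30 = round(win_pct(*first_30), 3)
pct_through_51 = round(win_pct(*through_51), 3)
pct_since_30 = round(win_pct(*since_30), 3)

print(f"First 30 games:   {first_30[0]}-{first_30[1]}, {pct_first_30:.3f}")
print(f"Through 51 games: {through_51[0]}-{through_51[1]}, {pct_through_51:.3f}")
print(f"Since game 30:    {since_30[0]}-{since_30[1]}, {pct_since_30:.3f}")
```

The difference between the two records works out to 12-9 over the last 21 games, a 0.571 clip: still a winning pace, just not a historic one.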
After 30 games, I predicted that they would win 103.41 games with a 95% prediction interval of (84.58, 122.24) and a 50% prediction interval of (96.93, 109.89). After 51 games, my updated prediction is that they will win 102.90 games with a 95% prediction interval of (87.23, 118.58) and a 50% prediction interval of (97.51, 108.29).
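One way to read these intervals is to ask how much uncertainty they imply. The sketch below is a rough back-of-the-envelope check, not the model behind the predictions: it assumes the predictive distribution is approximately normal (my assumption for illustration) and backs out the standard deviation implied by each interval. The `implied_sd` helper is hypothetical.

```python
from statistics import NormalDist


def implied_sd(lower, upper, level):
    """Standard deviation implied by a symmetric interval at the given
    coverage level, ASSUMING a normal predictive distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for level=0.95
    return (upper - lower) / (2 * z)


# Intervals quoted above, keyed by games played and coverage level.
intervals = {
    (30, 0.95): (84.58, 122.24),
    (30, 0.50): (96.93, 109.89),
    (51, 0.95): (87.23, 118.58),
    (51, 0.50): (97.51, 108.29),
}

sd = {key: implied_sd(lo, hi, key[1]) for key, (lo, hi) in intervals.items()}

for (games, level), value in sd.items():
    print(f"After {games} games, {int(level * 100)}% interval: implied SD of about {value:.2f} wins")
```

Under that assumption, the 95% and 50% intervals at each checkpoint imply essentially the same standard deviation (about 9.6 wins after 30 games and about 8.0 wins after 51), so the extra 21 games tightened the prediction noticeably even though the point estimate barely moved.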
Below you can see a plot of where the Cubs stand in relation to some of the all-time winningest regular season teams. Even with their “cooling off,” the 2016 Cubs are still, at this point, ahead of the 1906 Cubs, who went 116-36-3.
In this clip with Gwen Ifill, Obama claimed that the federal government has fewer employees now than under Reagan. That seems hard to believe given the narrative that Reagan was all about small government and Obama is all about big government and ruining the economy (Thanks Obama). Turns out it’s actually true though. You can get the raw data here and see for yourself.
The total number of employees topped out in 1968 at 6.639 million, largely due to the increase in the size of the military during the Vietnam War. If we only consider 1977 on (Carter through Obama), the largest number of employees in the federal government was 5.301 million in 1987. The next three years in the Carter-to-Obama window with the most federal employees were 1989, 1988, and 1985.
If we want to ignore the size of the military and look only at the size of the executive branch, the five years with the largest federal government employment since 1962 are 1990, 1989, 1988, 1991, and 1969.
If we look at the size of the legislative and judicial branches, which make up a very small percentage of the total number of employees, these numbers were highest in 2009, 2002, 1993, and 1992, all with about 66,000 employees.