Predator X in the Wild
I was watching the show “Predator X” on the History Channel tonight. Apparently, they discovered this fossil of an enormous aquatic predator. It’s pretty awesome.

Here is a description of the bite force of this predator: (from here)
“At St Augustine Alligator Farm and Zoological Park in Florida, Dr. Hurum assisted evolutionary biologist Dr. Greg Erickson from Florida State University in calculating the bite force of this colossal creature. The jaws held in place a set of trihedral teeth, each measuring 12 inches, which clamped down on prey with an estimated 33,000lbs of bite force. The calculation is one of the largest bite forces ever calculated for any creature. Predator X would have had a bite force more than ten times the bite force of any animal alive today and four times the bite force of a T-Rex.”
(Here is a link of average bite forces for humans and a few selected animals.)
At this point you might be saying, “That IS awesome. But what does it have to do with this blog?” An astute observation. Well…
They estimated that the bite force of the predator was 33,000 pounds. The way they estimated this was by taking measurements of the bite force of different sized crocodiles (or alligators; I can never tell the difference). Then they plotted the data in a scatter plot, weight of crocodile versus bite force. There was a clear positive relationship between bite force and size of the animal. Then they fit a simple regression line through the data and extrapolated how much force a 50 foot long animal that weighed an estimated 45 tons could pack into its bite. That’s how I believe they came up with their estimated bite force. I’ll give them the benefit of the doubt and assume they did more than that to come up with the estimate but didn’t want to show the details on the History Channel. (If you have details on how they estimated the bite force, please send them my way.)
Let’s assume that all they did was extrapolate this simple regression line. What would be the problem with that? The problem is that they are extrapolating the linear trend outside of their domain. There is no guarantee that the bite force trend remains linear as the weight approaches the estimated 45 tons. They collected their data on crocodiles, which can weigh up to 1.5 tons. It seems naive to assume that the linear trend will continue as you increase the weight of an animal far beyond that.
Here is a simple example of why extrapolating outside of your domain is a bad idea.
Say you collected data on children’s ages and heights and you fit a regression line through the data. You’ll surely observe a positive relationship between age and height. As children get older, their heights generally increase. This increase can be approximated by a roughly linear trend, say between the ages of 10 and 18. Also, say that we find that children grow an average of 1 inch per year between 10 and 18. If I were then to predict the height of a person by extrapolating this trend, I would conclude that a 48 year old would be, on average, 30 inches taller than an 18 year old. Clearly, this is not true.
So just because a trend is linear over a certain domain does not mean that that linear trend continues outside of the tested domain.
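Here is the height example in numbers, as a minimal sketch. The 55-inch height at age 10 is a made-up starting value for illustration; only the 1 inch per year growth comes from the example above, and the helper names are hypothetical.

```python
# Hypothetical data: ages 10-18, growing exactly 1 inch per year.
# The 55-inch height at age 10 is an assumed value, not real data.
ages = list(range(10, 19))
heights = [55 + (age - 10) for age in ages]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(ages, heights)


def predict(age):
    """Height predicted by the fitted line, whether or not age is in range."""
    return intercept + slope * age


# Inside the observed range (10-18) the line matches the data.
print(predict(18))
# Outside it, the line keeps climbing: inches "gained" between 18 and 48.
print(predict(48) - predict(18))
```

The fit itself is perfect on the ages it saw. The trouble only shows up when you ask it about an age it never saw.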
Cheers.
NCAA Sweet 16 picks in the wild.
After re-evaluating, here are my picks after the first two rounds of the NCAA tournament:
Elite 8: UConn, Pitt, Memphis, Villanova, Louisville, Oklahoma, Michigan State, UNC
My final four is the same: Memphis, Louisville, Pitt, and UNC
Finals: Memphis versus Pitt
Winner: Memphis
Good picks for the Sweet 16 games: (Winners are in bold, losers are in italics.)
Villanova +120 (I have them as a favorite in this game).
Michigan State -1.5 at -110
Gonzaga +350 (They’ll probably lose, but this price is fantastic.)
Pitt -320 (Added 3/26/2009 11:28 am)
Results: 3-1. A bet of “100” on each of these games yields a profit of 142.16 for a 35.54% return.
For the tournament I am 8-5 with a 19.82% return per bet.
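For anyone who wants to check the arithmetic, here is a rough sketch of how American odds turn into profit. The bold and italic formatting is not visible in this text, so the win/loss flags below are an assumption, chosen to be consistent with the 3-1 record and the 142.16 profit. The function name is hypothetical, and the 115.45 first-round profit on 9 bets is taken from the first-round results post.

```python
def profit(odds, stake, won):
    """Profit on one bet at American odds; a losing bet costs the stake."""
    if not won:
        return -stake
    if odds > 0:
        # Underdog: +120 means a stake of 100 wins 120.
        return stake * odds / 100
    # Favorite: -320 means you stake 320 to win 100.
    return stake * 100 / -odds


# (team, odds, won). The outcomes are assumed, not stated in the text.
sweet_16 = [
    ("Villanova", 120, True),
    ("Michigan State", -110, True),
    ("Gonzaga", 350, False),
    ("Pitt", -320, True),
]

stake = 100
total = sum(profit(odds, stake, won) for _, odds, won in sweet_16)
print(round(total, 2))                                  # 142.16
print(round(100 * total / (stake * len(sweet_16)), 2))  # 35.54

# Add the 115.45 profit on 9 first-round bets for the tournament so far.
overall = (total + 115.45) / (stake * (len(sweet_16) + 9))
print(round(100 * overall, 2))                          # 19.82
```

Note that the return here is profit divided by the total amount staked, which is why a 3-1 record on mostly favorites comes out to about 35% and not something larger.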
Check out my results from the first round.
NIT predictions:
San Diego State beats St. Mary’s tonight.
Notre Dame beats Kentucky.
Then, San Diego State beats Baylor to go to the Finals, and Penn State beats Notre Dame for their spot in the finals.
And I’m switching to Penn State as my pick for the NIT champion.
Good luck.
And.
Cheers.
Choosing your bracket in the wild.
Check out page 35 of the old issue of Chance magazine which has an article talking about the best way to pick NCAA basketball teams to win your office pool. Of course this probably would have been more of a help if I had posted it five days ago, but it’s still interesting. And you can use the advice next year.
Cheers.
Gambling on Sports in the Wild
Of course, we’re not gambling with American dollars. That would be illegal, so we use “standard betting units” here. (1 “standard betting unit” is approximately 1 American dollar. But it is definitely NOT an American dollar.)
From my previous post before the first round started. Winners are in bold, losers are in italics:
Good first round bets:
Washington -220
FSU -145
Utah -110
Illinois -200
Arizona St -200
Michigan +190 (This price is fantastic)
Texas A and M +115
Oklahoma St +115 (Seriously? They’re an underdog here?)
Ohio St -160
If you bet 100 “standard betting units” per game, you would have gone 5-4 and be up 115.45 “standard betting units” at this point. That is a 12.8% return per bet. Not bad.
Cheers.
ENAR in review in the wild
So I’m back from ENAR and back from spring break. I’ve been greeted back to grad school by a midterm on Friday night from 6-8pm. What a fun time for a midterm!
Let me first start by saying that San Antonio is awesome. The river walk is great. I ate at two great Mexican restaurants for lunch two days in a row and they were both incredible. And I drank Shiner Bock the whole time, which I highly recommend.
Anyway, on Sunday night I presented a poster at ENAR (they served Shiner Bock during the poster presentations) about a paper that we wrote (and recently published) about synthetic data with binary variables. I met some very interesting people who stopped by my poster. One guy who stopped by informed me that he coined the term predictive mean matching (which I referenced on my poster). So, I asked him who he was, and he told me he was Rod Little. (He’s kind of a big deal.) He wrote the book on multiple imputation: Little, R.J.A. & Rubin, D.B. (1987). Statistical Analysis with Missing Data. New York: John Wiley.
So that was kind of neat. (I just visited Rod Little’s website and apparently this is the “most useful of all links”. Here is another interesting article called “Calibrated Bayes: A Bayes/Frequentist Roadmap”.)
The next day I spoke with some people from SAS and STATA, as well as some recruiters from Smith-Hanley (who I got my first job through) and Cambridge Group. The SAS people told me about a product called JMP, which I was very impressed by. The STATA people told me that I could buy a student STATA license for like $55 and then use it commercially after I graduate. (As opposed to several thousand dollars for a SAS license that only lasts a year.) And I could use it for as long as I wanted to. The only thing I would have to pay for would be upgrades. So STATA has that going for them. I am definitely going to try it out.
Cheers.
March Madness in the Wild
I was sitting in my office last semester and this really tall guy walked past my door. I kind of did a double take he was so tall. A few seconds later he came back and knocked on my door. He asked if my office mate was there so he could hand in some homework. I told him that he wasn’t around, but I could take the homework and give it to him when I saw him. He thanked me and left. It was Hasheem Thabeet. So in honor of Mr. Thabeet handing in his STAT 1100 homework to me, I have started a Yahoo! tourney pick’em group.
Join the Stats in the Wild Yahoo! Tournament Challenge. Join group #166142. Good luck.
Every year in March I start to care about college basketball. I even build some half-assed models to do my bracket. I finished in the 99th percentile of all Yahoo brackets last year by managing to pick the champion, the two finalists, and all four final four teams. (My goal for a long time was to pick all four of the final four. I’ve been stuck on three out of four since high school.) Anyway, in order to make my predictions, I build several models and then I combine the results.
Here are some thoughts on the tournament based on those models:
Most overrated teams in the tournament:
1. LSU: The SEC is a great football conference and a mediocre basketball conference (at least this year), and LSU is leading the charge of mediocre teams in this conference. LSU only wins their first round game because they play Butler. Speaking of Butler…..
2. Butler: They lost to Wisconsin Green Bay, Wisconsin Milwaukee and Loyola Chicago. Then they lost to Cleveland State in their tournament. Although in their defense they did beat IU-South Bend 87-33. They played two top 25 teams all year and they went 1-1: Ohio State (who is no longer in the top 25) and Xavier (another overrated team). Speaking of Xavier…..
3. Xavier: Look. I’m a UMass fan (from the days of Roe, Camby, Dana Dingle, Donta Bright, Derek Kellogg), but the A-10 stinks. Xavier deserves to be in the tourney, but as a 4 seed? I doubt it. This is the team that closed its regular season 4-4 with losses to Duquesne, Dayton, Charlotte, and Richmond, then lost to Temple in their conference tournament. I’m not exactly inspired by the Musketeers. Florida State beats them in the second round by 10.
4. Dayton: An at large bid? So Dayton is better than St. Mary’s, Penn State, Florida, Auburn, Creighton, and Miami? That’s what the selection committee is saying.
5. Boston College: USC should dispose of them in the first round. USC by 12.
Most Underrated teams in the tournament
West Virginia: I’ve got West Virginia as a Sweet sixteen team. They are going to dismantle Dayton in the first round, then beat Kansas by 2. Then they have a good chance against Michigan State for a shot at the Elite 8.
Wisconsin: I think they got screwed with a 12 seed, and then by drawing a very good FSU 5 seed. If they can get past this game, they should cruise into the sweet sixteen past Xavier.
First round upsets: USC over BC.
Most likely 13 seed to win first round: Cleveland State
Most likely 12 seed to win first round: Arizona
Most likely 11 seed to win first round: Utah State
Most likely 10 seed to win first round: USC
Most likely 9 seed to win first round: Butler (Only because LSU stinks too.)
Biggest 3 seed blowout: Missouri
4 seed: Washington
5 seed: Illinois
6 seed: West Virginia
7 seed: Cal
8 seed: Ohio State
First number 1 seed gone: UConn
Good first round bets:
Washington -220
FSU -145
Utah -110
Illinois -200
Arizona St -200
Michigan +190 (This price is fantastic)
Texas A and M +115
Oklahoma St +115 (Seriously? They’re an underdog here?)
Ohio St -160
And finally, the official final 4 picks of Stats in the Wild (Drumroll Please):
UNC, Pitt, Memphis, and Louisville with UNC beating Memphis in the finals 76-75.
Also, I am picking San Diego State to win the NIT with a 72-67 win over Miami in the finals.
And for my boys in the NIT…..what? They’re not even in the NIT. Bring back Calipari!
Go.
Go U.
Go U – Mass.
Go Umass!
Cheers.
ENAR in the wild
I’m sitting at Bradley International Airport waiting for a flight to Raleigh/Durham. Then I’ll hop on a connector to Memphis, followed by another connector to San Antonio, Texas. What’s in San Antonio, you might ask? The Alamo? Tim Duncan? Well yes, but why would I post that on a stats blog?
The reason I am headed to the Lone Star state is for the Eastern North American Region (ENAR) (pronounced EE-NAR) of the International Biometric Society (IBS). (IBS is an unfortunate acronym in itself, but I should note that the Western North American Region (WNAR) is pronounced WEE-NAR. That is unfortunate. I am also extremely childish.) I’ll be there from tonight until Monday afternoon, so I can attend a few talks in the morning before I leave.
On Sunday night from 8-11pm, I’m giving a poster presentation about synthetic data related to the paper I just published. It’s like a student poster bonanza. And they serve alcohol. So it should be interesting.
I’d love to post the abstracts to some of the talks I am interested in attending, but Bradley’s internet connection is not letting me view the abstract page.
If Raleigh/Durham or Memphis has a better connection, I’ll post them when I get there.
Cheers.
NCAA basketball in the wild
It’s that time of year again where I care about college basketball for three weeks and then forget it even exists.
I have collected data on all of the games played so far (through last night’s 6OT thriller) and I use some standard modeling techniques to come up with a ranking. Last year I accurately predicted all four of the final 4 (but so did everyone in America), both teams in the finals, and the champion. I also went 29 of 32 in the first round and my bracket finished in the top 1% of all brackets on Yahoo. (Of course all that means is that I will have a dismal year this year.)
Right now (2:17 pm EDT Friday 3/13/2009) here are my thoughts on the NCAA tournament (These will change as more games come in):
Go ahead and compare my predictions with ESPN’s, CBS’s, and Sports Illustrated’s picks (Of course some of those predicted brackets are old).
Try to bear in mind that I am not trying to predict what the committee will do, I am trying to predict what the committee should do. I don’t really think the selection committee will seed West Virginia as a four. And I have a hunch that Penn State will actually get in, but I’ll definitely pick them to lose first round.
The 4 number 1 seeds should be: North Carolina, Pittsburgh, Memphis, and UConn
With the 2 seeds going to Duke, Louisville, Michigan St., and Wake Forest
Three seeds: Washington, Gonzaga, Oklahoma, Syracuse
Four Seeds: Villanova, Missouri, UCLA, West Virginia
Last four teams in: Maryland, Auburn, Boston College, Tennessee
First four teams out: Florida, Virginia Tech, New Mexico, Providence
Next four teams out: USC, Northwestern, Penn St, Kentucky
Conference Breakdown:
ACC – 7
Big East – 7
Big Ten – 7
Big 12 – 6
Pac 10 – 5
SEC – 3
Mountain West – 3
Horizon League – 2
Over-rated teams that will get in: LSU and Butler
Under-rated teams that will get in: West Virginia, Illinois, Utah, BYU and Texas
Once the dust settles, I’ll re-evaluate and post my picks on Tuesday or Wednesday of next week.
Cheers.
Probability in the Wild
This isn’t really “statistics” or “in the wild” since it’s a problem from an advanced probability class I am taking, but it’s a neat problem so you’ll just have to live with it.
Here is the game:
I get a die and the opponent gets a die. (These dice are not necessarily numbered 1,2,3,4,5,6. The sides could be numbered 1,1,1,3,3,3, for example, but each die has six sides.) We both roll. If your number is higher than my number, you win a dollar. If my number is higher than your number, I win a dollar.
Can you design three six-sided dice in such a way that, if the opponent chooses his die first, I can always choose a die from the remaining two that makes my expected winnings positive?
STOP HERE IF YOU WANT TO TRY THIS YOURSELF!

You probably didn’t even try. So here’s an answer.
Here are three possible dice:
A: 0 0 6 6 6 6
B: 4 4 4 4 4 4
C: 2 2 2 2 8 8
Die A will beat Die B with probability 2/3.
Die B will beat Die C with probability 2/3.
Die C will beat Die A with probability 5/9.
(P(C>A) = P(C>A|A=0)*P(A=0) + P(C>A|A=6)*P(A=6) = 1*1/3 + 1/3*2/3 = 5/9)
So if I let my opponent choose first, I can always make a choice that will make my expectation positive.
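If you don’t trust the arithmetic, the three probabilities can be checked by brute force over all 36 pairs of faces. This is just a sketch; the dictionary layout and the function name `p_win` are my own choices for illustration.

```python
from fractions import Fraction
from itertools import product

dice = {
    "A": [0, 0, 6, 6, 6, 6],
    "B": [4, 4, 4, 4, 4, 4],
    "C": [2, 2, 2, 2, 8, 8],
}


def p_win(x, y):
    """Exact probability that die x rolls higher than die y."""
    wins = sum(1 for a, b in product(dice[x], dice[y]) if a > b)
    return Fraction(wins, len(dice[x]) * len(dice[y]))


for x, y in [("A", "B"), ("B", "C"), ("C", "A")]:
    p = p_win(x, y)
    # No two of these dice share a face value, so ties cannot happen
    # and the other die wins with probability 1 - p.
    expected = p - (1 - p)
    print(f"{x} beats {y} with probability {p}; expected winnings {expected}")
```

So whichever die the opponent picks, the die that beats it gives an expected gain of either 1/3 or 1/9 of a dollar per roll.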
The moral:
Make these dice, head to a bar, and fleece some drunks. (Just don’t tell them I sent you.) I guess that’s the “in the wild” part.
Cheers.
Shark, the Global Financial Crisis, and Correlation (in the wild)
I was reading the fantastic blog The Bernoulli Trial today (the Stock Market Seismometer is very interesting), and I stumbled onto one of his links which brought me to The Mr Science Show. All of the posts are very interesting, but one post seemed particularly relevant to this page.
One of the topics I have talked about in the past is correlations between two variables when the correlation is just a coincidence (like me being in grad school and the Red Sox winning the World Series). Here is a post on Mr. Science Show which relates the global financial crisis to the number of fatal shark attacks in the world. In fact, this correlation has won the prestigious Mr. Science Show Correlation of the Week. Congratulations.
Cheers.