Monday, June 17, 2013

Massachusetts Senate Race: Markey(D) vs. Gomez(R)
17 June Update


New polls have been released since last Monday regarding the special Senate election to fill John Kerry's open seat in Massachusetts. On June 9, two polls were released giving Democrat Ed Markey a 7 point lead over Republican Gabriel Gomez. Two more recent polls show double-digit leads, one by the Republican polling company Harper (49%-37% on 6/11) and the most recent by the Boston Globe (54%-43% on 6/14).

We are eight days away from the polls closing, and there has been nearly no movement in the Confidence of Victory numbers, which now sit at 97.5% for Markey. A week out, it's hard to move the needle, especially given the increased popularity of voting by mail. Polling companies would like news organizations to add the phrase "if the election were held when the poll was taken" to any report on a poll, but with absentee ballots the election is effectively taking place now. I expect more polls this week and at least one more report on the race.



Monday, June 10, 2013

Massachusetts Senate Race:
Markey(D) vs. Gomez(R)
10 June Update


We are now about two weeks out from the special election to fill John Kerry's Senate seat in Massachusetts. Polling data had been scarce, but it's starting to pick up. Now that Suffolk has published their data from a poll that ended yesterday, we have five polls from 2 June to 9 June. Here is the list from newest to oldest. (The next newest poll not included is from 15 May and was conducted by Public Policy Polling [PPP], who also completed a poll on June 4. This poll would be excluded from the list either for being too old or for being done by a company already represented.)


9 June: Suffolk 48%-41% Markey n=500
Confidence of Victory 95.2%
5 June: McLaughlin[R] 45%-44% Markey n=400
Confidence of Victory 58.4%
4 June: YouGov 51%-40% Markey n=500
Confidence of Victory 99.5%
4 June: PPP[D] 47%-39% Markey n=560
Confidence of Victory 98.0%
2 June: New England College 52%-40% Markey n=500
Confidence of Victory 100.0%

The [R] and [D] after McLaughlin and PPP respectively indicate that they are partisan companies, and I include that out of a sense of fairness. I do not skew the data; I let all the pollsters have their say and take the median, which this week happens to be PPP at 98% CoV. McLaughlin is currently the lone outlier saying the race is close, and even that poll does not give Gomez the lead.
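The post doesn't spell out how Confidence of Victory is calculated. Here is a minimal sketch of one common approach, assumed here rather than taken from the author: treat each poll's lead as roughly normal, using the standard error of the difference between two shares measured in the same sample, and report the chance that the true lead is above zero. Undecided voters are simply ignored, and the function name `confidence_of_victory` is hypothetical. The last step takes the median across the five polls listed above.

```python
from math import erf, sqrt
from statistics import median


def confidence_of_victory(p_a, p_b, n):
    """Approximate probability that candidate A's true share exceeds B's.

    Assumption (not necessarily the author's method): the lead p_a - p_b is
    normally distributed with the multinomial standard error
    sqrt((p_a + p_b - (p_a - p_b) ** 2) / n).
    """
    lead = p_a - p_b
    se = sqrt((p_a + p_b - lead ** 2) / n)
    z = lead / se
    # Standard normal CDF evaluated at z
    return 0.5 * (1 + erf(z / sqrt(2)))


# Markey share, Gomez share, sample size (from the 10 June list)
polls = {
    "Suffolk": (0.48, 0.41, 500),
    "McLaughlin": (0.45, 0.44, 400),
    "YouGov": (0.51, 0.40, 500),
    "PPP": (0.47, 0.39, 560),
    "New England College": (0.52, 0.40, 500),
}

cov = {name: confidence_of_victory(*poll) for name, poll in polls.items()}
for name, value in cov.items():
    print(f"{name}: {value:.1%}")
print(f"Median: {median(cov.values()):.1%}")
```

Under these assumptions the sketch lands close to the listed values for most of the polls, and the median is again the PPP poll. Any small mismatch suggests the author's exact method differs in some detail.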

Again, I consider these statements snapshots of the situation rather than forecasts, but in general I would say Markey looks to be comfortably in the lead, and Gomez has some small momentum that would have to improve markedly for him to have a chance.

More updates as more polls come in.

Thursday, June 6, 2013

2013 NBA Playoffs predictions:
Predictions for the finals

Currently on the ESPN website, there are 18 predictions for the outcome of the NBA Finals series between the Miami Heat and the San Antonio Spurs. Earlier in the week, there were two other predictions posted that have since been bumped. Here are all the predictions as well as the records of the experts so far in the first fourteen series match-ups.

Full time
Haberstroh: (14 predictions, 74.3%) Heat in 7
Elhassan: (14 predictions, 72.9%) Spurs in 6
Wallace: (14 predictions, 72.1%) Heat in 7
Pelton: (14 predictions, 70.7%) Heat in 7
Adande: (14 predictions, 70.0%) Heat in 7
Arnovitz: (14 predictions, 70.0%) Heat in 7
Windhorst: (14 predictions, 67.1%) Heat in 7
Gutierrez: (14 predictions, 66.4%) Heat in 6
Abbott: (14 predictions, 65.0%) Heat in 7
Stein: (14 predictions, 62.1%) Spurs in 6

The strong consensus of the people who haven't missed a chance to predict a series in these playoffs is that the Heat will win Game 7 and become the champions. Looking at the percentages, I am not filled with confidence that they actually know what will happen.


Almost full time
Legler: (12 predictions, 88.3%) Spurs in 6
Ford: (13 predictions, 84.6%) Heat in 7
Thorpe: (11 predictions, 76.4%) Heat in 6
Barry: (13 predictions, 64.6%) Heat in 7
Palmer: (12 predictions, 64.6%) Heat in 7

I single out Legler and Ford because they are head and shoulders above the rest as predictors so far. If I were grading on straight percentages, they would be looking at a B+ and a B, and grading in comparison to the rest of the class, they deserve A's.

And they are pointing in opposite directions.

Prediction is hard, especially about the future.


Part timers
McMenamin: (1 prediction, 70%) Heat in 6
Doolittle: (5 predictions, 68.0%) Heat in 6
Shelburne: (3 predictions, 66.7%) Spurs in 6
Broussard: (8 predictions, 62.5%) Heat in 6
Torres: (2 predictions, 0.0%) Spurs in 6

I include the rest of the predictors for completeness' sake. This group does not inspire much confidence.

Notice that no one predicts a blowout. A six- or seven-game series is what to expect between evenly matched teams. A four-game sweep is complete domination, and a five-game series is pretty lopsided as well. It should be noted that if the series goes to Game 7, it will be played in Miami.
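The claim that six- and seven-game series are what to expect between evenly matched teams can be checked with a simple model. This sketch assumes, purely for illustration, that every game is an independent coin flip weighted by a single win probability `p`, ignoring home court and the 2-3-2 format; `series_length_probs` is a hypothetical helper.

```python
from math import comb


def series_length_probs(p):
    """P(a best-of-seven series lasts exactly k games), for k = 4..7.

    Assumes each game is independent and team A wins any game with
    probability p (home-court advantage ignored).
    """
    q = 1 - p
    probs = {}
    for k in range(4, 8):
        # The series winner takes game k plus exactly 3 of the first k - 1 games
        a_wins = comb(k - 1, 3) * p ** 4 * q ** (k - 4)
        b_wins = comb(k - 1, 3) * q ** 4 * p ** (k - 4)
        probs[k] = a_wins + b_wins
    return probs


for games, prob in series_length_probs(0.5).items():
    print(f"{games} games: {prob:.1%}")
```

With `p = 0.5`, a sweep happens 12.5% of the time, a five-game series 25%, and six- and seven-game series together 62.5%. Raising `p` toward 0.6 or 0.7 shifts the weight toward the shorter series.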

I am not an expert on basketball, but looking at the patterns, my feeling is that the experts have done a pretty good job of understanding the Eastern Conference and a relatively poor job of understanding the Western Conference. Not one expert thought the Warriors would beat the Nuggets, and then they underestimated how well the Warriors would do against the Spurs. They thought the Clippers were better than the Grizzlies. Surprised that the Spurs did not dominate the Warriors easily, and having underestimated the Grizzlies, they then thought the Spurs-Grizzlies matchup would be a tough struggle. The Spurs won in four.

Personally, as a fan, I don't like the Miami Heat. They are the team that looks better on paper: talent in its prime vs. talent that is getting old. It might be that Heat coach Erik Spoelstra is at the beginning of a brilliant career, but I'm not convinced he's better than Gregg Popovich right now.

One advantage the Spurs have is the leisurely pace of the Finals, with games on Tuesdays, Thursdays and Sundays. The extra day of rest can help the older players.

Not just to be different, I'm going to predict the Spurs win the series on their home court in Game 5, four games to one. I make this prediction looking at the 2-3-2 schedule and the extra day of rest between games 1 and 2 and games 4 and 5, which should help the older players recover.  I fully admit this prediction is nearly equal parts math and wishful thinking, but given the unimpressive record of conventional wisdom in the playoffs so far, I'm happy to go out on a limb with this one.

Monday, June 3, 2013

2013 NBA Playoffs predictions: Grade after the Heat-Pacers series

All the basketball experts from ESPN who offered an opinion predicted the Heat would win the conference final against the Indiana Pacers, so all of them get some improvement in their overall records. Here are the percentages for the people who have put forward an opinion on all fourteen series so far.

Abbott 65.0%
Adande 70.0%
Arnovitz 70.0%
Elhassan 72.9%
Gutierrez 66.4%
Haberstroh 74.7%
Pelton 70.7%
Stein 62.1%
Wallace 72.1%
Windhorst 67.1%

If this were a math class I was teaching, I would not be happy. The median score is 70%, which is to say that nearly half the class is failing. The last question - Spurs or Heat - should be the toughest question of all.
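The class arithmetic is easy to check. This sketch takes the ten full-time percentages listed above and assumes, as on a typical letter-grade scale, that anything below 70% counts as failing; the `PASSING` constant is that assumption.

```python
from statistics import median

# Percentages after fourteen series for the predictors who picked every series
scores = {
    "Abbott": 65.0, "Adande": 70.0, "Arnovitz": 70.0, "Elhassan": 72.9,
    "Gutierrez": 66.4, "Haberstroh": 74.7, "Pelton": 70.7, "Stein": 62.1,
    "Wallace": 72.1, "Windhorst": 67.1,
}

# Assumption: 70% is the passing line
PASSING = 70.0

middle = median(scores.values())
failing = sorted(name for name, pct in scores.items() if pct < PASSING)
print(f"Median: {middle:.1f}%")
print(f"Below {PASSING:.0f}%: {len(failing)} of {len(scores)}: {', '.join(failing)}")
```

The median of an even-sized list is the average of the two middle scores, here both 70.0%, and four of the ten fall below the line.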

To be fair, if we look at people who have had predictions in more than ten of fourteen series, we have some better students.

Barry (13 of 14 series) 64.6%
Ford (13 of 14 series) 84.6%
Legler (12 of 14 series) 88.3%
Palmer (12 of 14 series) 67.5%
Thorpe (11 of 14 series) 76.4%

I don't publish this stuff to mock the participants. When there is as much randomness as we have in sporting events, prediction gets much harder than it is in American elections that drag on for months and months.

I will return to this topic at the end of the final series, which should be over in a week or two at the most.

Sunday, June 2, 2013

Evaluating the Moneyball draft eleven years later:
Part 3: the first round draftees of the Oakland A's and their competitors


A substantial portion of the book Moneyball concerns itself with the 2002 baseball draft. Due to trades and compensation for free agents taken from their roster, the Oakland Athletics had seven of the first forty draft picks in this draft, and figuring out who to pick was of great importance to general manager Billy Beane. Both in the book and the movie, it is noted that Beane was drafted straight out of high school and did not live up to his potential. The A's that year did not draft a single high school player, preferring to get players from the college ranks.

It makes for a more dramatic storyline to say that Billy Beane was doing all he could to not draft another Billy Beane, but there is another explanation that should be considered. As is noted in both the book and the movie, the A's were cheap. (And they still are, for that matter.) If you were trying to cut corners in a scouting organization, you could decide to focus only on college players or only on high school players. Because there are fewer colleges than high schools, the thriftiest decision would be to scout only the colleges.

The A's seven choices were four position players and three pitchers.

Position players: The A's first pick in the draft was the 16th pick overall and has to be seen as a great success, Nick Swisher. Their next two picks, John McCurdy (26th overall) and Jeremy Brown (35th overall), did not succeed: McCurdy never made it to the big leagues, and Brown managed six total bases in 10 official at bats and one walk. Their last pick in the first round was Mark Teahen (39th overall), who is a solid major leaguer but not a big star.

It's hard to compare the A's picks of position players over this stretch because from the 16th to 39th pick, only the A's were looking at college position players. It's not fair to look at the high school players they could have picked up, since we can assume they didn't scout them at all. In hindsight, the best college prospects that they didn't take in these early round picks include the major league successes from the second round, Joey Votto (44th overall), Christopher Snyder (68th overall) and Curtis Granderson (80th overall). As evidence of what a crap shoot the draft can be, the greatest college success in the late rounds is Howie Kendrick, the 274th overall pick with a respectable 1490 total bases in his career so far.

Pitchers: The A's got one major league pitcher out of their three first-round picks, Joe Blanton, the 24th overall pick. Their other two picks, Ben Fritz (30th) and Steve Obenchain (37th), did not make the majors. They could have done much worse, as the Chicago Cubs demonstrated. Like the A's, the Cubs were going only for college players, and they had a total of four first-round draft picks. They went entirely for pitchers, and none of them made the major leagues.

If the A's decided to eschew high school draftees because of Billy Beane's worries about getting another kid like himself, the numbers don't back that reasoning up. High draft picks are hit or miss regardless of whether they come from high school or college ball. As it happens, the very first pick in the 2002 draft was a pitcher out of college, Bryan Bullington, whose overall major league record was 1-9 with an ERA of 5.62. On the other side, the first round produced some great players out of high school, including pitchers Matt Cain, Cole Hamels and Zack Greinke. Among position players, the high school draftees from the first round include Prince Fielder, Jeff Francoeur, B.J. Upton and James Loney.

To conclude, the A's draft that year wasn't stellar and completely ignoring high school players is not a good strategy overall, but it might be the best strategy if you are trying to cut costs in the scouting department.

Saturday, June 1, 2013

Evaluating the Moneyball draft eleven years later:
Part 2: Value of the major leaguers, college vs. high school draftees


If you read the book or saw the movie Moneyball, a strong impression you could come away with is that Billy Beane had helped make the Oakland A's a better team than their payroll predicted they should be by using modern approaches for scouting talent. The book pays a lot of attention to the 2002 baseball draft, a draft where Beane chose only college players and left out the high school stars. The emotional reason behind this decision is that Beane himself had been drafted out of high school and did not turn out to be productive at the major league level.

Obviously, a sample size of n = 1 is not a basis for science. Let's look at a statistic for the high school and college players among the first 50 position players drafted and the first 50 pitchers drafted.

I admit that a single statistic measures only one dimension, and baseball is not a one-dimensional game. For position players, I'm using total bases, which here means adding up the hits, walks and stolen bases, giving one extra base for each double, two extra bases for each triple and three extra for a home run, and subtracting the number of times the player was caught stealing.
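Because this measure is broader than the official total bases statistic, which counts only hits, here is a minimal sketch of the formula as described. The `BattingLine` fields and the `extended_total_bases` name are illustrative assumptions, not taken from any real stats library.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Career counting stats (hypothetical field names for illustration)."""
    hits: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    stolen_bases: int
    caught_stealing: int


def extended_total_bases(b: BattingLine) -> int:
    # Every hit, walk and stolen base is worth one base...
    bases = b.hits + b.walks + b.stolen_bases
    # ...plus the extra bases on extra-base hits (the first base is already in hits)...
    bases += b.doubles + 2 * b.triples + 3 * b.home_runs
    # ...minus one for each time caught stealing
    return bases - b.caught_stealing
```

For a hypothetical season of 150 hits (30 doubles, 3 triples, 25 home runs), 60 walks, 10 steals and 4 times caught stealing, this gives 150 + 60 + 10 + 30 + 6 + 75 - 4 = 327.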

For the pitchers, I use innings pitched as the measure of their value over their career. It's not perfect, as it tends to favor starting pitchers over relievers, but in general it does show over a career how much value a pitcher had for his club.

Total base numbers for the players from the 2002 draft that made the majors
High school draftees 
===============
2902, 2114, 2111, 2062, 1629, 1243, 907, 527, 476, 179, 98, 54, 19
Average: 1101.6
Standard deviation: 974.4
n = 13

College draftees
===========
2670, 2649, 1928, 1292, 1270, 1100, 829, 541, 424, 124, 6, 3, 2
Average: 987.5
Standard deviation: 951.0
n = 13

For these numbers, the high schoolers are better prospects on average than the college players, due in large part to the most productive hitter on the list, Prince Fielder and his 2,902 career total bases so far. The top two college recruits were Nick Swisher and Curtis Granderson, who are also both still active.

The reason I included the standard deviation and the sample size is to find out whether the difference we see is statistically significant, and the answer is no. The simplest formula for statistical significance is the z-score method. The big standard deviations go into the denominator of that formula, and the resulting standard error is more than three times the 114-total-base gap between the two averages in the numerator.
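The post doesn't give its exact z-score formula. Here is a minimal sketch of the standard two-sample version, assumed here: the difference in averages divided by the standard error built from each group's sample standard deviation and size. `two_sample_z` is a hypothetical helper, and the lists are the total base figures above.

```python
from math import sqrt
from statistics import mean, stdev

# Career total bases of the 2002 draftees who reached the majors
high_school = [2902, 2114, 2111, 2062, 1629, 1243, 907, 527, 476, 179, 98, 54, 19]
college = [2670, 2649, 1928, 1292, 1270, 1100, 829, 541, 424, 124, 6, 3, 2]


def two_sample_z(a, b):
    """Difference of means divided by its standard error (sample SDs)."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se


for label, data in (("High school", high_school), ("College", college)):
    print(f"{label}: mean {mean(data):.1f}, SD {stdev(data):.1f}, n = {len(data)}")

z = two_sample_z(high_school, college)
# |z| must exceed about 1.645 for significance at the 90% level (two-sided)
print(f"z = {z:.2f}, significant at 90%: {abs(z) > 1.645}")
```

The same function can be pointed at the innings-pitched lists. With only 13 players per group a t-test would be the textbook choice, but with z this close to zero the conclusion would not change.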

Innings pitched numbers for the players from the 2002 draft that made the majors
High school draftees 
===============
1537, 1492, 1377, 1163, 1022, 473, 450, 233, 180, 164, 120, 15, 10
Average = 633.5
Standard deviation = 592.9
n = 13

College draftees
===========
1438, 1202, 1179, 1161, 1141, 495, 166, 157, 114, 82, 68, 20
Average = 602
Standard deviation = 566.2
n = 12

Once again, the big standard deviations mean the difference isn't significant, but the high school draftees did slightly outshine the college draftees.

Generally, Beane's decision to ignore high school talent appears to be counterproductive. On average, high school draftees are no more likely to wash out than college players, and the most talented can reach the major leagues at a younger age, with a chance at a longer career if they can stay healthy and productive.

One of the reasons the book focused on the draft was that the Athletics had negotiated a large number of first round picks in 2002, the highest being the 16th overall and the lowest being the 39th. Tomorrow, we will look at those picks and how the A's did against their competitors.

Thursday, May 30, 2013

Evaluating the Moneyball draft eleven years later:
Part 1: Making the majors or not


Michael Lewis' book Moneyball concentrated on the Oakland A's in 2002, more on the front office than on the players on the field. The idea was that general manager Billy Beane and his employees were looking at new and better ways to develop talent and use it, giving them a chance to be competitive with teams whose payrolls were several times higher than what the frugal A's ownership was willing to spend.

Beane and his team used sabermetrics, a word coined from the acronym SABR, the Society for American Baseball Research. The idea was that he would be able to draft players under-appreciated by other clubs and build a nucleus of young talent, even though the best would be lost to free agency within a few years.

Beane's method really wasn't that scientific. What Billy Beane wanted to avoid was drafting another Billy Beane. He was much sought after out of high school. He debated whether he would go to college or go straight into baseball out of high school. He was drafted out of high school and he made the major leagues eventually, but he wasn't the All-Star the scouts had hoped he would be.

Beane the general manager drafted no high school players in 2002, worried there was a high probability he might find too many kids like himself who crumbled under the pressure of professional baseball.

Using the 2002 draft as our data set, let's ask the question: Is there a significant difference between the success rates of high school and college draftees?

The null hypothesis: There is no significant difference.
Data set #1: The first 50 position players drafted
Data set #2: The first 50 pitchers drafted
How we split the sets: A player was either drafted from high school or college and the player either made the major league roster or did not.
We will perform a chi-square test to see if the differences we see are significant.

Problem with this test: We are lumping together some players with very good careers so far with some guys who just barely had a cup of coffee in The Show. That problem will be addressed in the test used tomorrow.

Position players
============
High school draft: 13 made the majors, 13 did not
College draft: 13 made the majors, 11 did not

Test statistic: chi square = 0.087, well below even the 90% confidence threshold of 2.706


Pitchers
======
High school draft: 13 made the majors, 10 did not
College draft: 12 made the majors, 15 did not

Test statistic: chi square = 0.725, well below even the 90% confidence threshold of 2.706
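For anyone who wants to rerun the test, here is a minimal sketch of the Pearson chi-square statistic for a 2x2 table, computed without Yates' continuity correction. That is an assumption about the author's method, but this version reproduces the 0.087 and 0.725 above. `chi_square_2x2` is a hypothetical helper.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table.

    table[i][j]: row i = draft source, column j = made majors / did not.
    """
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if draft source and making the majors are independent
            expected = rows[i] * cols[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


CRITICAL_90 = 2.706  # chi-square critical value, 1 degree of freedom, 90% level

position_players = [[13, 13],  # high school: made majors, did not
                    [13, 11]]  # college: made majors, did not
pitchers = [[13, 10],
            [12, 15]]

for label, table in (("Position players", position_players), ("Pitchers", pitchers)):
    stat = chi_square_2x2(table)
    print(f"{label}: chi square = {stat:.3f}, significant at 90%: {stat > CRITICAL_90}")
```

SciPy users can get the same statistics from `scipy.stats.chi2_contingency(table, correction=False)`; its default `correction=True` applies Yates' correction to 2x2 tables and would report smaller values.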

These numbers just count whether players made it to the majors or not, and as we can see, out of the first hundred or so players chosen, about half reached the major leagues, and high school draftees are not significantly different from college draftees. Another question is how good those major leaguers are when we compare the high schoolers to the collegians. Tomorrow, we will use a different statistical test on only one stat per player. It is not a completely fair test, but it does give an approximate idea of the players' worth to their squads.