Tuesday, June 25, 2013

Massachusetts Senate Race: Markey(D) vs. Gomez(R)
Final election day update

The special election to fill John Kerry's Senate seat in Massachusetts is today, and Democrat Ed Markey has led in the polls from day one. A couple of polls have shown a close race, but even in those Markey has been ahead, once by as little as a single percentage point. The four polls in the final week have had him ahead by 3%, 7% and twice by 10%. The Confidence of Victory of the median poll has never slipped under 95% for Markey.

There is always a first time, but in over 100 races where I have used the Confidence of Victory method, the favorite has won an overwhelming majority of the time, and a favorite at over 90% has yet to lose. Voter turnout is always important, and the method does not predict the margin of victory, just the victor. I'll be back tomorrow to give the results.

Friday, June 21, 2013

2013 NBA Playoff predictions:
The final tallies

The Heat in seven was a popular prediction with the experts on ESPN, so we have a lot of predictors whose record improved with the last series.

Experts with 15 predictions
Haberstroh: 76.0%
Wallace: 74.0%
Pelton: 72.7%
Adande: 72.0%
Arnovitz: 72.0%
Windhorst: 69.3%
Gutierrez: 68.0%
Elhassan: 68.0%
Abbott: 67.3%
Stein: 58.0%

If I'm grading this strictly, that's five grades of C, four grades of D and an F.

I am not trying to insult these people, though I would certainly drop Stein from the list of experts next year if I were the boss. My point is that there are a lot of random factors in games and prediction is tough.

Experts with fewer than 15 but more than 10 predictions
Ford: 14 predictions, 85.7%
Legler: 13 predictions, 81.5%
Thorpe: 12 predictions, 77.5%
Palmer: 13 predictions, 70.0%
Barry: 14 predictions, 67.1%

Without question, the part-timers did better than the full-timers, with 2 Bs, 2 Cs and 1 D. While I would drop Stein, I would want the predictions of Ford and Legler in every series the next time the playoffs roll around.
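The letter grades can be tallied mechanically. This is a minimal sketch that assumes the usual 90/80/70/60 cutoffs, which the post implies but never states; the function name is made up for illustration.

```python
from collections import Counter

def letter_grade(pct):
    """Map a percentage to a letter grade (assumed 90/80/70/60 cutoffs)."""
    if pct >= 90:
        return "A"
    if pct >= 80:
        return "B"
    if pct >= 70:
        return "C"
    if pct >= 60:
        return "D"
    return "F"

# Final records of the experts with 11 to 14 predictions, as listed above.
part_timers = {"Ford": 85.7, "Legler": 81.5, "Thorpe": 77.5,
               "Palmer": 70.0, "Barry": 67.1}

grade_counts = Counter(letter_grade(pct) for pct in part_timers.values())
print(dict(grade_counts))
```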

Monday, June 17, 2013

Massachusetts Senate Race: Markey(D) vs. Gomez(R)
17 June Update

New polls have been released since last Monday regarding the special Senate election to fill John Kerry's open seat in Massachusetts. On June 9, two polls were released giving Democrat Ed Markey a 7-point lead over Republican Gabriel Gomez. Two more recent polls show double-digit leads, one by the Republican polling company Harper (49%-37% on 6/11) and the most recent by the Boston Globe (54%-43% on 6/14).

We are eight days away from the polls closing and there has been almost no movement in the Confidence of Victory numbers, now sitting at 97.5% for Markey. A week out it's hard to move the needle, especially given the increased popularity of voting by mail. Polling companies would like it if news organizations would add the phrase "if the election were held when the poll was taken" to any report on a poll, but with absentee ballots the election is effectively taking place now. I expect more polls this week and at least one more report on the race.

Monday, June 10, 2013

Massachusetts Senate Race:
Markey(D) vs. Gomez(R)
10 June Update

We are now about two weeks out from the special election to fill John Kerry's Senate seat in Massachusetts. Polling data had been scarce, but it's starting to pick up. Now that Suffolk has published their data from a poll that ended yesterday, we have five polls from 2 June to 9 June. Here is the list from newest to oldest. (The next newest poll not included is from 15 May and was conducted by Public Policy Polling [PPP], who also completed a poll on 4 June. That May poll would be excluded from the list either way, for being too old or for being done by a company already represented.)

9 June: Suffolk 48%-41% Markey n=500
Confidence of Victory 95.2%
5 June: McLaughlin[R] 45%-44% Markey n=400
Confidence of Victory 58.4%
4 June: YouGov 51%-40% Markey n=500
Confidence of Victory 99.5%
4 June: PPP[D] 47%-39% Markey n=560
Confidence of Victory 98.0%
2 June: New England College 52%-40% Markey n=500
Confidence of Victory 100.0%
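The post does not spell out how the Confidence of Victory figures are computed. One plausible reconstruction, offered purely as an assumption, is to drop the undecided respondents, treat the leader's share of the two-candidate vote as a binomial proportion, and use a normal approximation to ask how likely it is that the true share is above 50%. The function name and every step below are that assumption, not the author's stated method. Under it, the Suffolk and McLaughlin polls come out near the listed 95.2% and 58.4%; not every listed figure need match exactly.

```python
import math

def confidence_of_victory(leader_pct, trailer_pct, sample_size):
    """Chance (in percent) that the poll leader is truly ahead.

    Assumed method, not taken from the post: undecided respondents are
    dropped and a normal approximation is applied to the leader's share
    of the two-candidate vote.
    """
    decided = leader_pct + trailer_pct               # percent of sample choosing either candidate
    share = leader_pct / decided                     # leader's two-candidate share
    effective_n = sample_size * decided / 100.0      # respondents who picked a side
    std_error = math.sqrt(share * (1.0 - share) / effective_n)
    z = (share - 0.5) / std_error
    # Standard normal CDF written with the error function.
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

suffolk = confidence_of_victory(48, 41, 500)
mclaughlin = confidence_of_victory(45, 44, 400)
print(f"Suffolk: {suffolk:.1f}%  McLaughlin: {mclaughlin:.1f}%")
```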

The [R] and [D] after McLaughlin and PPP respectively indicate that they are partisan companies, and I include that out of a sense of fairness. I do not skew the data. I let all the pollsters have their say and take the median, which this week happens to be PPP at 98% CoV. McLaughlin is currently the lone outlier saying the race is close, and even that poll does not give Gomez the lead.
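Taking the median of the five listed Confidence of Victory figures can be sketched in a few lines. The dictionary below simply restates the numbers from the list above; the variable names are illustrative.

```python
import statistics

# Confidence of Victory for Markey, one entry per pollster, as listed above.
cov_by_pollster = {
    "Suffolk": 95.2,
    "McLaughlin": 58.4,
    "YouGov": 99.5,
    "PPP": 98.0,
    "New England College": 100.0,
}

median_cov = statistics.median(cov_by_pollster.values())
# With an odd number of polls the median is one of the listed values,
# so the pollster behind it can be looked up directly.
median_pollster = next(name for name, cov in cov_by_pollster.items()
                       if cov == median_cov)
print(median_pollster, median_cov)
```

Using the median rather than the mean keeps a single outlier such as the McLaughlin poll from dragging the summary number down.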

Again, I consider these numbers snapshots of the situation rather than forecasts, but in general I would say Markey looks to be comfortably in the lead and Gomez has some small momentum, which has to improve markedly for him to have a chance.

More updates as more polls come in.

Thursday, June 6, 2013

2013 NBA Playoffs predictions:
Predictions for the finals

Currently on the ESPN website, there are 18 predictions for the outcome of the NBA Finals series between the Miami Heat and the San Antonio Spurs. Earlier in the week, there were two other predictions posted that have since been bumped. Here are all the predictions as well as the records of the experts so far in the first fourteen series match-ups.

Full time
Haberstroh: (14 predictions, 74.3%) Heat in 7
Elhassan: (14 predictions, 72.9%) Spurs in 6
Wallace: (14 predictions, 72.1%) Heat in 7
Pelton: (14 predictions, 70.7%) Heat in 7
Adande: (14 predictions, 70.0%) Heat in 7
Arnovitz: (14 predictions, 70.0%) Heat in 7
Windhorst: (14 predictions, 67.1%) Heat in 7
Gutierrez: (14 predictions, 66.4%) Heat in 6
Abbott: (14 predictions, 65.0%) Heat in 7
Stein: (14 predictions, 62.1%) Spurs in 6

The strong consensus of the people who haven't missed a chance to predict throughout the playoffs is that the Heat will win Game 7 and become the champions. Looking at the percentages, I am not filled with confidence that they actually know what will happen.

Almost full time
Legler: (12 predictions, 88.3%) Spurs in 6
Ford: (13 predictions, 84.6%) Heat in 7
Thorpe: (11 predictions, 76.4%) Heat in 6
Palmer: (12 predictions, 67.5%) Heat in 7
Barry: (13 predictions, 64.6%) Heat in 7

Legler's and Ford's numbers stand out because they are head and shoulders above the other predictors so far. If I were grading on straight percentages they would be looking at B+ and B, and grading in comparison to the rest of the class they deserve As.

And they are pointing in opposite directions.

Prediction is hard, especially about the future.

Part timers
McMenamin: (1 prediction, 70%) Heat in 6
Doolittle: (5 predictions, 68.0%) Heat in 6
Shelburne: (3 predictions, 66.7%) Spurs in 6
Broussard: (8 predictions, 62.5%) Heat in 6
Torres: (2 predictions, 0.0%) Spurs in 6

I put the rest of the predictors in for completeness' sake. This group does not inspire much confidence.

Notice that no one predicts a blowout. A six- or seven-game series is what to expect between evenly matched teams. A four-game sweep is complete domination and a five-game series is pretty lopsided as well. It should be noted that if the series goes to Game 7, it will be played in Miami.

I am not an expert on basketball, but looking at the patterns, my feeling is that the experts have done a pretty good job of understanding the Eastern Conference and a relatively poor job of understanding the Western Conference. Not one expert thought the Warriors would beat the Nuggets, and the experts then underestimated how well the Warriors would do against the Spurs. They thought the Clippers were better than the Grizzlies. Surprised that the Spurs did not dominate the Warriors, and having underestimated the Grizzlies, they then thought the Spurs-Grizzlies matchup would be a tough struggle. The Spurs won in four.

Personally, as a fan, I don't like the Miami Heat. They are the team that looks better on paper, talent in their prime vs. talent that is getting old. It might be that Heat coach Erik Spoelstra is at the beginning of a brilliant career, but I'm not convinced he's better than Gregg Popovich right now.

One advantage the Spurs have is the leisurely pace of the Finals, with games on Tuesdays, Thursdays and Sundays. The extra day of rest can help the older players.

Not just to be different, I'm going to predict the Spurs win the series on their home court in Game 5, four games to one. I make this prediction looking at the 2-3-2 schedule and the extra day of rest between games 1 and 2 and games 4 and 5, which should help the older players recover.  I fully admit this prediction is nearly equal parts math and wishful thinking, but given the unimpressive record of conventional wisdom in the playoffs so far, I'm happy to go out on a limb with this one.

Monday, June 3, 2013

2013 NBA Playoffs predictions: Grade after the Heat-Pacers series

All the basketball experts from ESPN who have an opinion predicted the Heat would win the conference final against the Indiana Pacers, so all of them get some improvement in their overall records. Here are the percentages for the people who have put forward an opinion on all fourteen contests so far.

Abbott 65.0%
Adande 70.0%
Arnovitz 70.0%
Elhassan 72.9%
Gutierrez 66.4%
Haberstroh 74.3%
Pelton 70.7%
Stein 62.1%
Wallace 72.1%
Windhorst 67.1%

If this were a math class I was teaching, I would not be happy. The median score is 70%, which is to say that nearly half the class is failing. The last question - Spurs or Heat - should be the toughest question of all.

To be fair, if we look at people who have had predictions in more than ten of fourteen series, we have some better students.

Barry (13 of 14 series) 64.6%
Ford (13 of 14 series) 84.6%
Legler (12 of 14 series) 88.3%
Palmer (12 of 14 series) 67.5%
Thorpe (11 of 14 series) 76.4%

I don't publish this stuff to mock the participants. When there is as much randomness as we have in sporting events, prediction gets much harder than it is in American elections that drag on for months and months.

I will return to this topic at the end of the final series, which should be over in a week or two at the most.

Sunday, June 2, 2013

Evaluating the Moneyball draft eleven years later:
Part 3: the first round draftees of the Oakland A's and their competitors

A substantial portion of the book Moneyball concerns itself with the 2002 baseball draft. Due to trades and compensation for free agents taken from their roster, the Oakland Athletics had seven of the first forty draft picks in this draft, and figuring out who to pick was of great importance to general manager Billy Beane. Both in the book and the movie, it is noted that Beane was drafted straight out of high school and did not live up to his potential. The A's that year did not draft a single high school player, preferring to get players from the college ranks.

It makes for a more dramatic story line to say that Billy Beane was doing all he could to not draft another Billy Beane, but there is another explanation that should be considered. As is noted in both the book and the movie, the A's were cheap. (And they still are, for that matter.) If you were trying to cut corners in a scouting organization, you could decide to only focus on college players or only focus on high school players. Because there are fewer colleges, the thriftiest decision would be to just scout the colleges.

The A's seven choices were four position players and three pitchers.

Position players: The A's first pick in the draft was the 16th pick overall and has to be seen as a great success, Nick Swisher. Their next two picks, John McCurdy (26th overall) and Jeremy Brown (35th overall), did not succeed, McCurdy never making it to the big leagues and Brown getting six total bases in 10 official at bats and one walk. Their last pick in the first round was Mark Teahen (39th overall), who is a solid major leaguer but not a big star.

It's hard to compare the A's picks of position players over this stretch because from the 16th to the 39th pick, only the A's were looking at college position players. It's not fair to look at the high school players they could have picked up, since we can assume they didn't scout them at all. In hindsight, the best college prospects that they didn't take with these early picks include the major league successes from the second round, Joey Votto (44th overall), Christopher Snyder (68th overall) and Curtis Granderson (80th overall). As evidence of what a crapshoot the draft can be, the greatest college success in the late rounds is Howie Kendrick, the 274th overall pick, with a respectable 1490 total bases in his career so far.

Pitchers: The A's got one major league pitcher in their three first round picks, Joe Blanton, the 24th overall pick. Their other two picks did not make the majors, Ben Fritz (30th) and Steve Obenchain (37th). They could have done much worse, as the Chicago Cubs demonstrated. Like the A's, the Cubs were going only for college players and they had a total of four first round draft picks. They went entirely for pitchers and none of them made the major leagues.

If the A's decided to eschew high school draftees due to Billy Beane's worries about getting another kid like himself, the numbers don't back this up. High draft picks are hit or miss regardless of whether they come from high school or college ball. As it happens, the very first pick in the 2002 draft was a pitcher out of college, Bryan Bullington, whose overall major league record was 1-9 with an ERA of 5.62. On the opposite side, the first round produced some great players out of high school, including pitchers Matt Cain, Cole Hamels and Zack Greinke. For position players, the high school draftees from the first round include Prince Fielder, Jeff Francoeur, B.J. Upton and James Loney.

To conclude, the A's draft that year wasn't stellar and completely ignoring high school players is not a good strategy overall, but it might be the best strategy if you are trying to cut costs in the scouting department.

Saturday, June 1, 2013

Evaluating the Moneyball draft eleven years later:
Part 2: Value of the major leaguers, college vs. high school draftees

If you read the book or saw the movie Moneyball, a strong impression you could come away with is that Billy Beane had helped make the Oakland A's a better team than their payroll predicted they should be by using modern approaches for scouting talent. The book pays a lot of attention to the 2002 baseball draft, a draft where Beane chose only college players and left out the high school stars. The emotional reason behind this decision is that Beane himself had been drafted out of high school and did not turn out to be productive at the major league level.

Obviously, a sample size of n = 1 is not a basis for science. Let's look at a statistic from the high school and college players among the first 50 drafted position players and the first 50 drafted pitchers.

I admit that a single statistic measures only a single dimension and baseball is not a one-dimensional game. For position players, I'm using total bases, which here means adding up the hits, walks and stolen bases, giving one extra base for each double, two extra bases for each triple and three extra for a home run, and subtracting the number of times the player was caught stealing.
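The total bases measure described above, which is broader than the official statistic because it also counts walks and net stolen bases, can be written as a short function. This is a minimal sketch; the function name and the sample stat line are hypothetical and do not belong to any real player.

```python
def broad_total_bases(hits, doubles, triples, home_runs,
                      walks, stolen_bases, caught_stealing):
    """Total bases as defined in this post (not the official MLB stat)."""
    # Every hit counts one base; extra-base hits add the bases beyond first.
    extra_bases = doubles + 2 * triples + 3 * home_runs
    return hits + walks + stolen_bases + extra_bases - caught_stealing

# Hypothetical season line, for illustration only.
example = broad_total_bases(hits=150, doubles=30, triples=5, home_runs=20,
                            walks=60, stolen_bases=10, caught_stealing=4)
print(example)
```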

For the pitchers, I use innings pitched as the measure of their value over their career. It's not perfect, as it tends to favor starting pitchers over relievers, but in general it does show over a career how much value a pitcher had for his club.

Total base numbers for the players from the 2002 draft that made the majors
High school draftees 
2902, 2114, 2111, 2062, 1629, 1243, 907, 527, 476, 179, 98, 54, 19
Average: 1101.6
Standard deviation: 974.4
n = 13

College draftees
2670, 2649, 1928, 1292, 1270, 1100, 829, 541, 424, 124, 6, 3, 2
Average: 987.5
Standard deviation: 951.0
n = 13

For these numbers, the high schoolers are better prospects on average than the college players, due in large part to the most productive hitter on the list, Prince Fielder and his 2,902 career total bases so far. The top two college recruits were Nick Swisher and Curtis Granderson, who are also both still active.

The reason I included the standard deviation and the sample size is to find out if the difference we see is statistically significant, and the answer is no. The simplest formula for statistical significance is the z-score method. The big standard deviations sit in the denominator of that formula, and they swamp the small difference between the averages in the numerator.
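The significance check can be sketched from the two lists above. This assumes the common two-sample form of the z-score, the difference of the means divided by the square root of the summed variance-over-n terms, and a two-sided 5% cutoff of 1.96; the post does not say which exact variant or cutoff was used. With only 13 players per group a t-test would be the more conventional choice, but the conclusion is the same.

```python
import math
import statistics

# Total bases for 2002 draftees who reached the majors, as listed above.
high_school = [2902, 2114, 2111, 2062, 1629, 1243, 907, 527, 476, 179, 98, 54, 19]
college = [2670, 2649, 1928, 1292, 1270, 1100, 829, 541, 424, 124, 6, 3, 2]

def two_sample_z(a, b):
    """z-score for the difference of two sample means (unpooled variances)."""
    mean_diff = statistics.mean(a) - statistics.mean(b)
    std_error = math.sqrt(statistics.variance(a) / len(a)
                          + statistics.variance(b) / len(b))
    return mean_diff / std_error

z = two_sample_z(high_school, college)
significant = abs(z) > 1.96  # assumed two-sided 5% threshold
print(f"z = {z:.2f}, significant: {significant}")
```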

Innings pitched numbers for the players from the 2002 draft that made the majors
High school draftees 
1537, 1492, 1377, 1163, 1022, 473, 450, 233, 180, 164, 120, 15, 10
Average = 633.5
Standard deviation = 592.9
n = 13

College draftees
1438, 1202, 1179, 1161, 1141, 495, 166, 157, 114, 82, 68, 20
Average = 602
Standard deviation = 566.2
n = 12

Once again, the big standard deviations mean the difference isn't significant, but the high school draftees did slightly outshine the college draftees.

Generally, Beane's decision to ignore high school talent appears to be counter-productive. On average, high school draftees are no more likely to wash out than college players, and the most talented can be in the major leagues at a younger age with a chance to have a longer career if they can stay healthy and productive.

One of the reasons the book focused on the draft was that the Athletics had negotiated a large number of first round picks in 2002, the highest being the 16th overall and the lowest being the 39th. Tomorrow, we will look at those picks and how the A's did against their competitors.