Saturday, December 7, 2013

World Cup prediction for the quarterfinals

The World Cup will be played in Brazil in 2014, if all the stadia can be built in time. (With Brazil as host in 2014, Russia in 2018 and Qatar in 2022, we could easily be looking at three disasters in a row, with 2022 almost certain to go down as the worst World Cup of all time.) The draw was yesterday, and as always, fans from some countries feel like they got hosed.

The toughest groups are for the most part considered to be Group B, Group D and Group G. When there are three teams that look good enough to be in the round of 16, the popular nickname is a Group of Death. In Group B, the three teams are Spain, Netherlands and Chile. In Group D, it's Uruguay, England and Italy. In Group G, Germany looks a cinch to qualify, but Portugal and the United States will fight for the second spot. (The U.S. also has a history of tough games against Ghana. My friend Art, a fan of football and the show Farscape, calls the U.S. draw We're So Screwed, Parts 1, 2 and 3, in honor of three episodes from the last season of the Australian sci-fi series.)

What I would like to point out is the weakness of Groups C and H. The rules stipulate four of these teams will be in the round of 16, but my prediction is that at most one will make it to the quarterfinals, and that team will be Colombia. Group C's winner and runner-up will face the runner-up and winner from Group D respectively, and Groups G and H have a similar reciprocal schedule. Because the tournament is in South America, I think Colombia might have a chance, should they win Group C, against whoever comes in second in Group D. But I predict here that none of the other seven teams (Greece, Ivory Coast, Japan, Belgium, Algeria, Russia or South Korea) will win a game after the group stage is over.

I'll be back in early July to state if I got this right or not.

Monday, November 11, 2013

Virginia governor's race post-mortem:
Grading the five pollsters from the final week

All the polls in the Virginia governor's race from mid-July to November showed Terry McAuliffe with a lead over Ken Cuccinelli, and in the final week the median lead was 7%. My system, which also factors in the size of the sample, made McAuliffe a 99.4% favorite to win. He won, but by a much smaller margin, leading many people to ask why the polls were so wrong.

The best explanation is a well-known tendency for third party candidates to do better in polls than they do in the actual election. The Libertarian candidate Robert Sarvis was getting good numbers for a third party, the last five polls giving him 13%, 12%, 10%, 8% and 4%. The median was 10% and the average 9.4%. His actual numbers were 6.6% and a lot of his alleged support appears to have favored the Republican Cuccinelli when the curtains on the booths were drawn.

Many websites give the margin of the lead instead of the "margin of error" these days, and that makes some sense, since the "margin of error" is the 95% confidence interval for the actual result, which really should be measured after the undecided and no preference numbers are removed from the tally. If all we care about is the final lead, the Emerson College Polling Society bathed themselves in glory by predicting a 2% lead for McAuliffe, very close to the actual 2.5% final margin when everyone else overstated the lead.

But looking at the 95% confidence intervals after undecideds are removed actually makes Emerson College look like the worst polling company of the final five.  Here's why.

Emerson College
Raw numbers: sample of 874, McAuliffe 42%, Cuccinelli 40%, Sarvis 13%
Removing undecided: sample of 830, McAuliffe 44.2%, Cuccinelli 42.1%, Sarvis 13.7%
Total missed percentage from reality: 14.25% (5th place out of 5)
95% confidence interval (aka margin of error) for each candidate: 
McAuliffe 47.59% to 40.83% (missed 47.97%)
Cuccinelli 45.46% to 38.75% (barely missed 45.47%)
Sarvis 16.02% to 11.35% (massively missed 6.56%)

Emerson missed all three true vote totals, only barely in the case of the front runners, but by a wide margin when looking at Sarvis.
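
For anyone who wants to check the arithmetic, here is a minimal Python sketch that reproduces the Emerson figures above. It assumes the standard normal approximation for a proportion with a 1.96 multiplier, which is consistent with the intervals listed but is not spelled out in the post. The function names are made up for illustration.

```python
import math

Z_95 = 1.96  # two-sided 95% critical value of the normal distribution


def rescale_decided(sample_size, shares):
    """Drop undecided voters and rescale the named candidates to sum to 1."""
    decided = sum(shares.values())
    n_decided = round(sample_size * decided)
    return n_decided, {name: s / decided for name, s in shares.items()}


def confidence_interval(p, n, z=Z_95):
    """Normal-approximation interval for a proportion p from n respondents."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Emerson's raw numbers and the actual result, both taken from the post.
emerson = {"McAuliffe": 0.42, "Cuccinelli": 0.40, "Sarvis": 0.13}
actual = {"McAuliffe": 0.4797, "Cuccinelli": 0.4547, "Sarvis": 0.0656}

n, scaled = rescale_decided(874, emerson)
intervals = {name: confidence_interval(p, n) for name, p in scaled.items()}
total_miss = sum(abs(scaled[name] - actual[name]) for name in scaled)

for name, (low, high) in intervals.items():
    captured = low <= actual[name] <= high
    print(f"{name}: {high:.2%} to {low:.2%}, captured={captured}")
print(f"Total missed percentage: {total_miss:.2%}")
```

Swapping in another pollster's sample size and raw percentages gives that pollster's intervals the same way.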

PPP
Raw numbers: sample of 870, McAuliffe 50%, Cuccinelli 43%, Sarvis 4%
Removing undecided: sample of 844, McAuliffe 51.5%, Cuccinelli 44.3%, Sarvis 4.1%
Total missed percentage from reality: 7.15% (second best)
95% confidence interval (aka margin of error) for each candidate: 
McAuliffe 54.92% to 48.17% (missed 47.97%)
Cuccinelli 47.67% to 40.98% (captured 45.47%)
Sarvis 5.47% to 2.78% (missed 6.56%)

PPP was the only company to underestimate Sarvis, so they were very close to the real numbers for Cuccinelli and overestimated McAuliffe. Only one company came closer to the real numbers across all three candidates.

Raw numbers: sample of 1002, McAuliffe 43%, Cuccinelli 36%, Sarvis 12%
Removing undecided: sample of 912, McAuliffe 47.3%, Cuccinelli 39.6%, Sarvis 13.2%
Total missed percentage from reality: 13.25% (4th place out of 5)
95% confidence interval (aka margin of error) for each candidate: 
McAuliffe 50.49% to 44.01% (captured 47.97%)
Cuccinelli 42.73% to 36.39% (missed 45.47%)
Sarvis 15.38% to 10.99% (massively missed 6.56%)

Like Emerson, the big overestimate of Sarvis skews these numbers badly, but they did capture McAuliffe's numbers.

Newport College
Raw numbers: sample of 1028, McAuliffe 45%, Cuccinelli 38%, Sarvis 10%
Removing undecided: sample of 965, McAuliffe 48.4%, Cuccinelli 40.9%, Sarvis 10.8%
Total missed percentage from reality: 9.22% (3rd best)

95% confidence interval (aka margin of error) for each candidate: 
McAuliffe 51.54% to 45.23% (captured 47.97%)
Cuccinelli 43.96% to 37.76% (missed 45.47%)
Sarvis 12.71% to 8.80% (missed 6.56%)

Like several other polls, Newport captured the McAuliffe number but underestimated Cuccinelli and overestimated Sarvis.

Quinnipiac
Raw numbers: sample of 1606, McAuliffe 46%, Cuccinelli 40%, Sarvis 8%
Removing undecided: sample of 1510, McAuliffe 48.9%, Cuccinelli 42.6%, Sarvis 8.5%
Total missed percentage from reality: 5.83% (best of 5)
95% confidence interval (aka margin of error) for each candidate: 
McAuliffe 51.46% to 46.41% (captured 47.97%)
Cuccinelli 45.05% to 40.06% (missed 45.47%)
Sarvis 9.92% to 7.10% (missed 6.56%)

Coming off their great call of the New York City Democratic comptroller primary, Quinnipiac is again the best polling company of those that polled in the last week, though it must be said they are the best of a bad bunch. Their downfall in the 95% confidence intervals was the size of their sample. Big samples mean smaller margins of error, but not necessarily more captures of the real numbers.

I don't change my model very often, and even though the candidate who my system favored actually won, I was a little embarrassed by assigning such a huge Confidence of Victory number (99.4%) to a race that turned out to be so close. The next time there is a third party candidate polling with significant support but no real chance to win, I'm going to figure out a fair way to re-assign the numbers, with the assumption that a Libertarian will lose support that will go Republican and a Green will lose support that will go Democratic.

Wednesday, November 6, 2013

Virginia Governor's race 2013:

Last night, the votes were tallied in the Virginia governor's race. The final result was

McAuliffe (Democratic) 48.0%
Cuccinelli (Republican) 45.5%
Sarvis (Libertarian) 6.6%

If you add these numbers together, you get 100.1%, which reflects rounding error and means there were effectively no votes for other candidates.

My system gave McAuliffe a 99.4% Confidence of Victory (CoV). That usually points to an easy win, but this was not easy; it was contested well into the night. What went wrong?

Well, my system does not promise to give the margin of victory, just who will win, so this one goes into the books as a win. The most recent poll, the one that gave us the 99.4%, was from Christopher Newport University, which had these numbers for the three candidates.

McAuliffe (Democratic) 45%
Cuccinelli (Republican) 38%
Sarvis (Libertarian) 10%

Notice that these numbers add up to 93%, so scaling up to make the three add up to 100% while keeping their relative sizes, we get

McAuliffe (Democratic) 48.4%
Cuccinelli (Republican) 40.9%
Sarvis (Libertarian) 10.8%

The prediction for McAuliffe is very close, but Cuccinelli is quite low and Sarvis is very high. This is where almost all the error lies in the median poll. There was an outlier poll in the last week by Emerson College that had the lead at only 2% (very close to the truth), but their numbers for the three candidates are actually not very close.

McAuliffe (Democratic) 42%
Cuccinelli (Republican) 40%
Sarvis (Libertarian) 13%

This adds up to 95%, so scaling things up we get

McAuliffe (Democratic) 44.2%
Cuccinelli (Republican) 42.1%
Sarvis (Libertarian) 13.7%

This one isn't close to any of the true totals, only to the true margin of victory.
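
The scaling step used for both polls is simple proportional rescaling, and a short Python sketch makes the comparison concrete. The poll and result figures are the ones quoted in this post; the helper name `scale_to_100` is invented for illustration.

```python
def scale_to_100(raw_pcts):
    """Rescale candidate percentages so they sum to 100, keeping their ratios."""
    total = sum(raw_pcts.values())
    return {name: 100 * pct / total for name, pct in raw_pcts.items()}


actual = {"McAuliffe": 48.0, "Cuccinelli": 45.5, "Sarvis": 6.6}
polls = {
    "Christopher Newport": {"McAuliffe": 45, "Cuccinelli": 38, "Sarvis": 10},
    "Emerson": {"McAuliffe": 42, "Cuccinelli": 40, "Sarvis": 13},
}

scaled = {pollster: scale_to_100(raw) for pollster, raw in polls.items()}
actual_lead = actual["McAuliffe"] - actual["Cuccinelli"]

for pollster, shares in scaled.items():
    lead = shares["McAuliffe"] - shares["Cuccinelli"]
    rounded = {name: round(pct, 1) for name, pct in shares.items()}
    print(f"{pollster}: {rounded}, lead {lead:.1f} vs actual {actual_lead:.1f}")
```

Seen this way, Emerson's scaled lead sits near the actual margin even though each of its three individual shares is off by several points.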

Quite simply, all the polls had the same problem. People lied about voting for the Libertarian.

As Republicans stake out a weird space (we want a government small enough to fit into your bedrooms and glove compartments) some people on the right have taken to calling themselves "Libertarians".

My problem with this is simple enough. I was registered Libertarian in 1976.

And then I met some of them.

I heard so many arguments that one nightmarish evening. Then, they were considered fringe and unacceptable but now, the same ideas are repeated across the Internet and in some circles considered mainstream. Here are a few.

a) Social Security can't survive.

b) The Post Office is an intolerable obstruction to liberty.

c) Government should have nothing to do with education.

d) Taxation is theft.

e) You should be able to smoke pot.

Well, I agree with one of those positions.

I want to keep my system as simple and clean as possible, but I have to find a way to factor in what a significant but hopeless third party candidate really means. A Libertarian vote does not have an equal chance to go Democratic or Republican or stay home if the option to vote Libertarian is removed.

Since all my system promises is to call the winner sometime during the last week, last night counts as a win. As the old baseball saying goes, every hit looks like a line drive in the box score. But the next time a Libertarian appears to be drawing a large number for a third party, my system has to come up with a method for adjusting the Confidence of Victory.

It is said you only learn from your mistakes. I got the winner this time, but it was still a mistake. We will see how much I learn from it.

Monday, November 4, 2013

VA Governor's race:
Election Eve update

Several elections are being held tomorrow. The polls say there is no suspense whatsoever in the governor's race in New Jersey or the mayor's race in New York, where Chris Christie and Bill de Blasio respectively are expected to win by double digits. In Virginia, the governor's race between Terry McAuliffe and Ken Cuccinelli looks closer, but if my system holds true, there is extremely little chance of an upset. (I have not been checking the data for the down-ticket races. Daily Kos says only the Attorney General race looks close.)

For a time in early September, Cuccinelli appeared to be gaining on McAuliffe, but that trend turned around and Cuccinelli's team has been starved for good news since. The last three updates on October 20, October 27 and November 3 have all put the Confidence of Victory number for McAuliffe at a formidable 99.4%. (In a strange coincidence, these three numbers came from three separate polls from three different companies, the well-known Quinnipiac and Rasmussen and the lesser-known Newport College Polling Group.)

My system does not predict the margin of victory, but except for one outlier predicting a close 2% margin, the rest of the polls in the past week make it look like the margin will be 6% to 7%.

I will report back on Wednesday about the results.

Wednesday, October 30, 2013

VA Governor's race:
Last October update

There are three notable elections held next Tuesday. Barring absolute craziness, Bill de Blasio will be mayor of New York City and Chris Christie will still be governor of New Jersey. That's a win for a liberal Democrat and a moderate Republican, at least by 2013 standards. The other election is for governor of Virginia, pitting establishment Democrat Terry McAuliffe against Tea Party conservative Ken Cuccinelli. This race is closer, but still not really that close.

Even Republican pollsters who say it's close still have McAuliffe in the lead. There were no identified partisan pollsters among the five polling companies that put out data in the past few days. Rasmussen was reliably skewed to the right in 2012, but founder Scott Rasmussen has resigned and their polls in 2013 are closer to the median now.

Here are the five companies, the date the poll closed, the lead they have for McAuliffe and the confidence of victory number, rounded to the nearest tenth of a percent.

Washington Post 10/27 12 point lead 100.0% CoV
Roanoke 10/27 15 point lead 100.0% CoV
Hampton 10/27 6 point lead 97.3% CoV
Rasmussen 10/28 7 point lead 99.4% CoV
Quinnipiac 10/28 4 point lead 93.1% CoV

In this set of data, Rasmussen is the median, which is the number I go with. 99.4% sounds like a lock, and given that I have collected data on fewer than 200 races, I don't have an instance of a candidate losing when the polls were so much in their favor. My system does not give more weight to one company and less to another, but it should be noted that Quinnipiac alone picked the winner in the New York City comptroller race, so they are on a roll, at least in a small way. If the race is as close as they say, I will at least have to consider making a special case of mentioning their position every time they take a final poll, but that can wait until Wednesday morning to be decided.
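
Picking the median from the five Confidence of Victory numbers listed above takes only a few lines of Python. This is just a sketch of that one step, using the figures from the list; the variable names are illustrative.

```python
from statistics import median

# CoV percentages for the five final-week pollsters, as listed in the post.
cov_by_pollster = {
    "Washington Post": 100.0,
    "Roanoke": 100.0,
    "Hampton": 97.3,
    "Rasmussen": 99.4,
    "Quinnipiac": 93.1,
}

median_cov = median(cov_by_pollster.values())
# With an odd number of polls the median is one of the listed values,
# so we can look up which pollster supplied it.
median_pollsters = [name for name, cov in cov_by_pollster.items() if cov == median_cov]

print(f"Median CoV: {median_cov}% from {', '.join(median_pollsters)}")
```

Using the median rather than the mean keeps the two 100.0% polls and the 93.1% poll from pulling the headline number in either direction.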

If there are more polls, I'll make another update before Tuesday's election.

Monday, October 14, 2013

Final report: New Jersey special election for Senate

The special election to fill the seat of the late Frank Lautenberg will be held this Wednesday, October 16. Why it isn't being held on the same date as the governor's race this November, or even being held on a Tuesday as most American elections are, only Chris Christie knows.

In any case, I have said almost nothing about this race because no poll has shown it being very close. Cory Booker has a commanding lead over Steve Lonegan in every poll so far. The only dispute is how big the lead is.

Oct 13: Rutgers: sample size 513 Booker ahead 58%-36% (22 points)
Oct 12: Monmouth: sample size 1393 Booker ahead 52%-42% (10 points)
Oct 8: Richard Stockton College: sample size 729 Booker ahead 50%-39% (11 points)
Oct 7: Quinnipiac: sample size 899 Booker ahead 53%-41% (12 points)
Oct 5: Fairleigh Dickinson: sample size 702 Booker ahead 45%-29% (16 points)

The story for polling nerds here is the huge discrepancy between the last two polls. My Confidence of Victory system does not try to pick the margin of victory, just the victor. Because the margin is so wide in all cases, the polls all agree Lonegan has next to no chance. Here are the odds using Confidence of Victory, stating them as x to 1 favorite, rounded to three significant digits.

Oct 13: Rutgers: Booker is a 16,000,000 to 1 favorite
Oct 12: Monmouth: Booker is an 18,500 to 1 favorite
Oct 8: Richard Stockton College: Booker is a 1,320 to 1 favorite
Oct 7: Quinnipiac: Booker is a 10,900 to 1 favorite
Oct 5: Fairleigh Dickinson: Booker is a 4,170,000 to 1 favorite
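
The post does not give the exact formula behind these odds, so the following Python sketch is an assumption: it treats the leader's share of decided voters as a normally distributed estimate and asks how likely the true share is to be below 50%. The function name `odds_to_one` is invented here. Under that reading the results land close to the listed odds for most of the polls, though not necessarily to three digits.

```python
from statistics import NormalDist


def odds_to_one(sample_size, leader_pct, trailer_pct):
    """Assumed reconstruction: odds in favor of the leader, as 'x to 1'."""
    decided_pct = leader_pct + trailer_pct
    n = sample_size * decided_pct / 100   # respondents with a preference
    p = leader_pct / decided_pct          # leader's share of decided voters
    std_err = (p * (1 - p) / n) ** 0.5
    z = (p - 0.5) / std_err               # distance from a tie, in standard errors
    upset = NormalDist().cdf(-z)          # chance the true share is under 50%
    return (1 - upset) / upset


# (pollster, sample size, Booker %, Lonegan %) from the list of polls above.
polls = [
    ("Rutgers", 513, 58, 36),
    ("Monmouth", 1393, 52, 42),
    ("Richard Stockton College", 729, 50, 39),
    ("Quinnipiac", 899, 53, 41),
    ("Fairleigh Dickinson", 702, 45, 29),
]

odds = {name: odds_to_one(n, lead, trail) for name, n, lead, trail in polls}
for name, value in odds.items():
    print(f"{name}: roughly {value:,.0f} to 1")
```

Because the tail of the normal curve falls off so quickly, a modest change in the lead or the sample size moves the odds by orders of magnitude, which is why the five numbers look so different.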

Can I explain these differences? This may seem strange, but my explanation is these differences really don't matter. Once you get past being more than a 1,000 to 1 favorite, it's over. I have not yet had 1,000 samples, but so far the biggest favorite to lose since 2008 was about 11 to 5, or 2.2 to 1. This was Spitzer vs. Stringer in the New York City comptroller race, which only Quinnipiac called for Stringer.

There were some very big upsets in the Republican presidential primaries after Herman Cain and Newt Gingrich both failed to be the Not Romney and Rick Santorum went from also-ran to contender, but two person races are pretty easy to call, especially when the numbers are this big.

I'm sure there will be bigger upsets in two candidate races eventually and there may be some I didn't log because it was in a race I skipped over, like the House races or state ballot propositions. But unless voter turnout is crazy low and skewed as hell, Cory Booker will be a Senator very soon.

Wednesday, October 9, 2013

VA governor's race:
First October update

There are races in New Jersey for senator and governor this year, but I haven't reported on them because they haven't looked interesting. Democrat Cory Booker will likely become a senator with a double digit win and Republican Chris Christie appears to be cruising to re-election by a wide margin as well.

The governor's race in Virginia is a little closer, but according to my Confidence of Victory system, only a little. Here's the latest data.

Since late August, polling companies have been taking a major interest in the race between Democrat Terry McAuliffe and Republican Ken Cuccinelli. Every week or so, two or more polls are released, so the data I present here are weekly medians of the Confidence of Victory numbers.

Advantage McAuliffe, if you haven't been paying attention.

Most news outlets that show polling data consider the percentage lead the most important information, but my system is based on the percentage lead among voters who have a preference. This I turn into a Confidence of Victory (CoV) number using very basic statistical methods. In mid-August, McAuliffe's CoV numbers were over 98%, but they fell in mid-September to about 85%. Since then, they have climbed back to 96.8%. I haven't collected enough data to say that there really is much difference when someone gets a CoV over 90%. No one with that strong a lead at the end lost in the data I have gathered since 2004.
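
The "very basic statistical methods" are not spelled out here, so what follows is one plausible reading rather than the actual system: drop everyone who does not pick one of the top two candidates, then use the normal approximation to ask how likely the leader's true share is to be above 50%. The function name is hypothetical. The inputs are two polls quoted elsewhere on this blog, a 43%-36% poll with a sample of 1002 and Emerson's 42%-40% poll with a sample of 874.

```python
from statistics import NormalDist


def confidence_of_victory(sample_size, leader_pct, trailer_pct):
    """Assumed reconstruction of CoV from a two-way share among decided voters."""
    decided_pct = leader_pct + trailer_pct
    n = sample_size * decided_pct / 100   # voters preferring one of the two
    p = leader_pct / decided_pct          # leader's share of those voters
    std_err = (p * (1 - p) / n) ** 0.5
    z = (p - 0.5) / std_err
    return NormalDist().cdf(z)            # chance the true share exceeds 50%


seven_point_lead = confidence_of_victory(1002, 43, 36)
two_point_lead = confidence_of_victory(874, 42, 40)

print(f"7-point lead, sample 1002: {seven_point_lead:.1%}")
print(f"2-point lead, sample 874: {two_point_lead:.1%}")
```

Under this assumption the 7-point lead comes out at about the 99.4% figure quoted for the final-week median, while a 2-point lead with a similar sample is far from a lock.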

Notice the prepositional phrase "at the end". We aren't at the end yet, because the election is a month away. But as it stands currently, Cuccinelli needs something like the miracle Bill de Blasio got when Anthony Weiner went from front-runner to pariah/dick joke. Unless there are pictures of Terry McAuliffe's junk floating undetected around the Internet, Cuccinelli faces very long odds indeed.

More updates when there is more data.