Math Year 2013: October 2013

Wednesday, October 30, 2013

VA Governor's race:
Last October update

There are three notable elections held next Tuesday. Barring absolute craziness, Bill DeBlasio will be mayor of New York City and Chris Christie will still be governor of New Jersey. That's a win for a liberal Democrat and a moderate Republican, at least by 2013 standards. The other election thatis closeris for governor of Virginia, pitting an establishment Democrat Terry McAuliffe against a Tea Party conservative Ken Cuccinelli. This race is closer, but still not really that close.

Even Republican pollsters who say it's close still have McAuliffe in the lead. There were no identified partisan pollsters among the five polling companies that put out data in the past few days. Rasmussen was reliably skewed to the right in 2012, but founder Scott Rasmussen has resigned and their polls in 2013 are closer to the median now.

Here are the five companies, the date the poll closed, the lead they have for McAuliffe and the confidence of victory number, rounded to the nearest tenth of a percent.

Washington Post 10/27 12 point lead 100.0% CoV
Roanoke 10/27 15 point lead 100.0% CoV
Hampton 10/27 6 point lead 97.3% CoV
Rasmussen 10/28 7 point lead 99.4% CoV
Quinnipiac 10/28 4 point lead 93.1% CoV

In this set of data, Rasmussen is the median, which is the number I go with. 99.4% sounds like a lock, and given that I have only collected data on less than 200 races, I don't have an instance of some candidate losing when the polls were so much in their favor. My system does not give more weigh to one company and less to another, but it should be noted that Quinnipiac alone picked a winner in the New York City comptroller race, so they are on a roll, at least in a small way. If the race is as close as they say, I will at least have to consider making a special case of mentioning their position every time they take a final poll, but that can wait until Wednesday morning to be decided.

If there are more polls, I'll make another update before Tuesday's election.

Monday, October 14, 2013

Final report: New Jersey special election for Senate

The special election to fill the seat of the late Frank Lautenberg will be held this Wednesday, October 16. Why it isn't being held on the same date as the governor's race this November or even being held on a Tuesday as most American election are, only Chris Christie knows.

In any case, I have said almost nothing about this race because no poll has shown it being very close. Cory Booker has a commanding lead over Steve Lonegan in every poll so far. The only dispute is how big the lead is.

Oct 13: Rutgers: sample size 513 Booker ahead 58%-36%(22 points)
Oct 12: Monmouth: sample size 1393 Booker ahead 52%-42% (10 points)
Oct 8: Richard Stockton College: sample size 729 Booker ahead 50%-39%(11 points)
Oct 7: Quinnipiac: sample size 899 Booker ahead 53%-41% (12 points)
Oct 5: Farliegh Dickinson: sample size 702 Booker ahead 45%-29% (16 points)

The story for polling nerds here is the huge discrepancy between the last two polls. My Confidence of Victory system does not try to pick the margin of victory, just the victor. Because the width of the margin is so high in all cases, the polls all agree Lonegan has next to no chance. Here are the odds using Confidence of Victory, stating them as x to 1 favorite, rounded to three significant digits.

Oct 13: Rutgers: Booker is a 16,000,000 to 1 favorite
Oct 12: Monmouth: Booker is a 18,500 to 1 favorite
Oct 8: Richard Stockton College: Booker is a 1,320 to 1 favorite
Oct 7: Quinnipiac: Booker is a 10,900 to 1 favorite
Oct 5: Farliegh Dickinson: Booker is a 4,170,000 to 1 favorite

Can I explain these differences? This may seem strange, but my explanation is these differences really don't matter. Once you get past being more than a 1,000 to 1 favorite, it's over. I have not yet had 1,000 samples, but so far the biggest favorite to lose since 2008 was about 11 to 5, or 2.2 to 1. This was Spitzer vs. Springer in New York City comptroller, a race only one Quinnipiac called for Springer.

There were some very big upsets in the Republican presidential primaries after Herman Cain and Newt Gingrich both failed to be the Not Romney and Rick Santorum went from also-ran to contender, but two person races are pretty easy to call, especially when the numbers are this big.

I'm sure there will be bigger upsets in two candidate races eventually and there may be some I didn't log because it was in a race I skipped over, like the House races or state ballot propositions. But unless voter turnout is crazy low and skewed as hell, Cory Booker will be a Senator very soon.

Wednesday, October 9, 2013

VA governor's race:
First October update

There are races in New Jersey for senator and governor this year, but I haven't reported on them because they haven't looked interesting. Democrat Cory Booker will likely become a senator with a double digit win and Republican Chris Christie appears to be cruising to re-election by a wide margin as well.

The governor's race in Virginia is a little closer, but according to my Confidence of Victory system, only a little. Here's the latest data.

Since late August, polling companies have been taking a major interest in the race between Democrat Terry McAuliffe and Republican Ken Cucinnelli. Every week or so, two or more polls are released, so the data I present here are weekly medians of the Confidence of Victory numbers.

Advantage McAuliffe, if you haven't been paying attention.

Most news outlets that show polling data consider the percentage lead the most important information, but my system is based on the percentage lead of voters who have a preference. This I turn into a Confidence of Victory (CoV) number using very basic statistical methods. In mid August, McAuliffe's CoV numbers were over 98%, but they fell in mid-September to about 85%. Since then, they have climbed back to 96.8%. I haven't collected enough data to say that there really is much difference when someone gets a CoV over 90%. No one with that strong a lead at the end lost in the data I gather since 2004.

Notice the prepositional phrase "at the end". We aren't at the end yet, because the election is a month away. But as it stands currently, Cuccinelli needs something like the miracle Bill De Blasio got when Anthony Weiner went from front-runner to pariah/dick joke. Unless there are pictures of Terry McAuliffe's junk floating undetected around the Internet, Cuccinelli faces very long odds indeed.

More updates when there is more data.

Friday, October 4, 2013

The Correlation Coefficient R and the reduction of range.

Those who followed my series on climate change might remember that I would take some area I could define as a rectangle in longitude and latitude and track the average temperature by year, comparing equivalent seasons. (To be precise, if a longitude/latitude area include either the North or South Pole, it would be more like a slice of pie than a rectangle.) I then split the time span from 1955 to 2010 into four eras based on the El Niño/La Niña cycles, these in particular each starting and ending in a strong La Niña year.

This particular region and season, Siberia in Spring, has a clearly increasing trend of the median temperature, the dotted red line moving up in four separate steps. The lowest temperature registered only moves upward twice and the maximum average temperature takes a step down in the era of 1999-2010, as the highest temperature was registered back in the 1990s.

Another method would be to add a line of regression, also known as a trendline or predictor line or the line of least squares. Excel has an option which gives the equation of the line and the variance R². R is called the correlation coefficient and it varies between -1 and 1. R² must be between 0 and 1 and is sometimes thought of as a proportion. In this case, the .3655 would be the proportion we would assign to the general increase we see in the temperatures, while the fluctuations are about .6345 of the influence.

This doesn't sound very convincing, but if R² = .3655, R in this case would be +.6046, a value that by nearly every standard of correlation is considered high, though it doesn't meet the Rule of Thumb criteria for very high, which would be over .8.

This is one of the many reasons I don't love using the predictor line and the correlation coefficient. The statements of confidence seem arbitrary - I know of three different systems and they disagree radically on whether an R score is strong or not - but also there is a way to cherry pick data in both directions, either to show more correlation or less.

Generally though not always, taking a subset of a sample by restricting the range will result in R and R² being reduced. For example, if we look at the first Consistent Oceanic Niña Interval from 1955 to 1975, we see somewhat less overall increase (here we check the number multiplying x, which went from .0417 to .0346) and a drastic drop in R² from .3655 to .05097. Here I can say without fear of contradiction that the correlation is not impressive.

In our second interval, R² is stronger at .19237, but still well below the larger set's value of .3655. It could be considered moderately strong by some measures, but notice that here the trend shows the region cooling. (It really is coincidence that the first year of this era shows a large jump in temperature over the previous.)

Here again we see a downward sloping trendline and an extremely weak R² value of .01592, which is to say nearly no correlation.

Yet again, a small downward slope and a low R² value.

In my view, the problem is cherry picking in both directions. People who wish to downplay or deny warming temperatures can take smaller samples, but when they do, the R² value will often give little confidence in the trend they try to show. On the other hand, people wanting to show strong evidence of warming have a natural advantage of generally higher R² scores in larger data sets. To be fair, in this particular set it is impossible to create a subset longer than thirty years that doesn't show a warming trend, though it can be minimized and so can the correlation coefficient.

If anyone is coming to the blog for the first time, you should know that I am not a denier of the general warming trend in temperatures around the globe in my lifetime, which started in the Strong La Niña year of 1955. What I hope for is a discussion where both sides can agree on terms and methods and avoid cherry picking at all costs. I realize this hope may very well be in vain, and yet I hold on to it.