Saturday, December 7, 2013

World Cup prediction for the quarterfinals

The World Cup will be played in Brazil in 2014, if all the stadia can be built in time. (With Brazil as host in 2014, Russia in 2018 and Qatar in 2022, we could easily be looking at three disasters in a row, with 2022 almost certain to go down as the worst World Cup of all time.) The draw was yesterday, and as always, fans from some countries feel like they got hosed.

The toughest groups are for the most part considered to be Group B, Group D and Group G. When there are three teams that look good enough to be in the round of 16, the popular nickname is a Group of Death. In Group B, the three teams are Spain, Netherlands and Chile. In Group D, it's Uruguay, England and Italy. In Group G, Germany looks a cinch to qualify, but Portugal and the United States will fight for the second spot. (The U.S. also has a history of tough games against Ghana. My friend Art, a fan of football and the show Farscape, calls the U.S. draw We're So Screwed, Parts 1, 2 and 3, in honor of three episodes from the last season of the Australian sci-fi series.)

What I would like to point out is the weakness of Groups C and H. The rules stipulate four of these teams will be in the round of 16, but my prediction is that at most one will make it to the quarterfinals, and that team will be Colombia. Group C's winner and runner-up will face the runner-up and winner from group D respectively, and Groups G and H have a similar reciprocal schedule. Because the tournament is in South America, I think Colombia might have a chance should they win Group C against whoever comes in second in Group D, but I predict here that no other team out of the other seven, Greece, Ivory Coast, Japan, Belgium, Algeria, Russia or South Korea, will win a game after the Group rounds are over.

I'll be back in early July to state if I got this right or not.

Monday, November 11, 2013

Virginia governor's race post-mortem:
Grading the five pollsters from the final week

All the polls in the Virginia governor's race from mid-July to November showed Terry McAuliffe with a lead over Ken Cuccinelli, and in the final week the median lead was 7%. My system, which also factors in the size of the sample, made McAuliffe a 99.4% favorite to win. He won, but by a much smaller margin, leading many people to ask why the polls were so wrong.

The best explanation is a well-known tendency for third party candidates to do better in polls than they do in the actual election. The Libertarian candidate Robert Sarvis was getting good numbers for a third party, the last five polls giving him 13%, 12%, 10%, 8% and 4%. The median was 10% and the average 9.4%. His actual numbers were 6.6% and a lot of his alleged support appears to have favored the Republican Cuccinelli when the curtains on the booths were drawn.

Many websites give the margin of the lead instead of the "margin of error" these days, and that makes some sense, since the "margin of error" is the 95% confidence interval for the actual result, which really should be measured after the undecided and no preference numbers are removed from the tally. If all we care about is the final lead, the Emerson College Polling Society bathed themselves in glory by predicting a 2% lead for McAuliffe, very close to the actual 2.5% final margin when everyone else overstated the lead.

But looking at the 95% confidence intervals after undecideds are removed actually makes Emerson College look like the worst polling company of the final five.  Here's why.

Emerson College
Raw numbers: sample of 874, McAuliffe 42%, Cuccinelli 40%, Sarvis 13%
Removing undecided: sample of 830, McAuliffe 44.2%, Cuccinelli 42.1%, Sarvis 13.7%
Total missed percentage from reality: 14.25% (5th place out of 5)
95% confidence interval (aka margin of error) for each candidate: 
McAuliffe 47.59% to 40.83% (missed 47.97%)
Cuccinelli 45.46% to 38.75% (barely missed 45.47%)
Sarvis 16.02% to 11.35% (massively missed 6.56%)

Emerson missed all three true vote totals, only barely in the case of the front runners, but by a wide margin when looking at Sarvis.
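
For anyone who wants to check these intervals, here is a minimal Python sketch of the arithmetic as it appears to work: drop the undecided share, shrink the sample by the same proportion, rescale each candidate, and apply the usual normal approximation with z = 1.96. The function name and the rounding of the reduced sample are assumptions made for illustration, not a description of the exact spreadsheet behind the numbers above.

```python
import math

def decided_intervals(sample, raw_pct, z=1.96):
    """Rescale raw poll percentages to decided voters and attach 95% intervals.

    sample  -- number of respondents in the raw poll
    raw_pct -- candidate name -> raw percentage (undecideds left out)
    """
    decided_total = sum(raw_pct.values())            # e.g. 42 + 40 + 13 = 95
    n = round(sample * decided_total / 100.0)        # reduced sample (assumed rounding)
    intervals = {}
    for name, pct in raw_pct.items():
        p = pct / decided_total                      # share of decided voters
        half_width = z * math.sqrt(p * (1 - p) / n)  # normal approximation
        intervals[name] = (100 * (p - half_width), 100 * p, 100 * (p + half_width))
    return n, intervals

# Emerson College numbers and the actual result quoted in this post.
emerson = {"McAuliffe": 42, "Cuccinelli": 40, "Sarvis": 13}
actual = {"McAuliffe": 47.97, "Cuccinelli": 45.47, "Sarvis": 6.56}

n, intervals = decided_intervals(874, emerson)
print(f"decided sample: {n}")
for name, (low, mid, high) in intervals.items():
    verdict = "captured" if low <= actual[name] <= high else "missed"
    print(f"{name}: {mid:.1f}% ({low:.2f}% to {high:.2f}%), {verdict} {actual[name]}%")
```

Swapping in another pollster's sample size and raw percentages should give the corresponding block for that company, give or take rounding.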

PPP
Raw numbers: sample of 870, McAuliffe 50%, Cuccinelli 43%, Sarvis 4%
Removing undecided: sample of 844, McAuliffe 51.5%, Cuccinelli 44.3%, Sarvis 4.1%
Total missed percentage from reality: 7.15% (second best)
95% confidence interval (aka margin of error) for each candidate: 
McAuliffe 54.92% to 48.17% (missed 47.97%)
Cuccinelli 47.67% to 40.98% (captured 45.47%)
Sarvis 5.47% to 2.78% (missed 6.56%)

PPP was the only company to underestimate Sarvis, so they were very close to the real numbers for Cuccinelli and overestimated McAuliffe. Only one company had a smaller total miss across all three candidates.

Rasmussen
Raw numbers: sample of 1002, McAuliffe 43%, Cuccinelli 36%, Sarvis 12%
Removing undecided: sample of 912, McAuliffe 47.3%, Cuccinelli 39.6%, Sarvis 13.2%
Total missed percentage from reality: 13.25% (4th place out of 5)
95% confidence interval (aka margin of error) for each candidate: 
McAuliffe 50.49% to 44.01% (captured 47.97%)
Cuccinelli 42.73% to 36.39% (missed 45.47%)
Sarvis 15.38% to 10.99% (massively missed 6.56%)

Like Emerson, the big overestimate of Sarvis skews these numbers badly, but they did capture McAuliffe's numbers.

Newport College
Raw numbers: sample of 1028, McAuliffe 45%, Cuccinelli 38%, Sarvis 10%
Removing undecided: sample of 965, McAuliffe 48.4%, Cuccinelli 40.9%, Sarvis 10.8%
Total missed percentage from reality: 9.22% (3rd best)

95% confidence interval (aka margin of error) for each candidate: 
McAuliffe 51.54% to 45.23% (captured 47.97%)
Cuccinelli 43.96% to 37.76% (missed 45.47%)
Sarvis 12.71% to 8.80% (missed 6.56%)

Like several other polls, Newport captured the McAuliffe number but underestimated Cuccinelli and overestimated Sarvis.

Quinnipiac
Raw numbers: sample of 1606, McAuliffe 46%, Cuccinelli 40%, Sarvis 8%
Removing undecided: sample of 1510, McAuliffe 48.9%, Cuccinelli 42.6%, Sarvis 8.5%
Total missed percentage from reality: 5.83% (best of 5)
95% confidence interval (aka margin of error) for each candidate: 
McAuliffe 51.46% to 46.41% (captured 47.97%)
Cuccinelli 45.05% to 40.06% (missed 45.47%)
Sarvis 9.92% to 7.10% (missed 6.56%)

Coming off their great call of the New York City Democratic comptroller primary, Quinnipiac is again the best polling company of those that polled in the last week, though it must be said they are the best of a bad bunch. Their downfall in the 95% confidence intervals was the size of their sample. Big samples mean smaller margins of error, but not necessarily more captures of the real numbers.

I don't change my model very often, and even though the candidate who my system favored actually won, I was a little embarrassed by assigning such a huge Confidence of Victory number (99.4%) to a race that turned out to be so close. The next time there is a third party candidate polling with significant support but no real chance to win, I'm going to figure out a fair way to re-assign the numbers, with the assumption that a Libertarian will lose support that will go Republican and a Green will lose support that will go Democratic.
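
As a rough illustration of what such a re-assignment could look like, here is a minimal Python sketch. The function name and, above all, the 40% fraction are hypothetical placeholders chosen only to show the mechanics; no fair value has been worked out yet.

```python
def reassign_third_party(shares, third_party, beneficiary, fraction):
    """Move a fraction of the third-party share to one major-party candidate."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    moved = shares[third_party] * fraction
    adjusted = dict(shares)          # leave the original poll untouched
    adjusted[third_party] -= moved
    adjusted[beneficiary] += moved
    return adjusted

# Newport's numbers after removing undecideds, from this post.
newport = {"McAuliffe": 48.4, "Cuccinelli": 40.9, "Sarvis": 10.8}

# HYPOTHETICAL: assume 40% of polled Libertarian support ends up Republican.
adjusted = reassign_third_party(newport, "Sarvis", "Cuccinelli", fraction=0.4)
lead_before = newport["McAuliffe"] - newport["Cuccinelli"]
lead_after = adjusted["McAuliffe"] - adjusted["Cuccinelli"]
print(f"lead before: {lead_before:.1f} points, after: {lead_after:.1f} points")
```

The adjusted shares would then be fed into the usual Confidence of Victory calculation in place of the raw ones. A Green candidate would be handled the same way with the Democrat as the beneficiary.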

Wednesday, November 6, 2013

Virginia Governor's race 2013:

Last night, the votes were tallied in the Virginia governor's race. The final result was

McAuliffe (Democratic) 48.0%
Cuccinelli (Republican) 45.5%
Sarvis (Libertarian) 6.6%

If you add these numbers together, you get 100.1%, which reflects rounding error and means there were effectively no votes for other candidates.

My system gave McAuliffe a 99.4% Confidence of Victory (CoV). That usually points to an easy win, but this was not easy, contested well into the night. What went wrong?

Well, my system does not promise to give the margin of victory, just who will win, so this one goes into the books as a win. The most recent poll that gave us the 99.4% was from Christopher Newport University, which had these numbers for the three candidates.

McAuliffe (Democratic) 45%
Cuccinelli (Republican) 38%
Sarvis (Libertarian) 10%

Notice that these numbers add up to 93%, so scaling up to make the three add up to 100% (100.1% after rounding) and keep their relative sizes, we get

McAuliffe (Democratic) 48.4%
Cuccinelli (Republican) 40.9%
Sarvis (Libertarian) 10.8%

The prediction for McAuliffe is very close, but Cuccinelli is quite low and Sarvis is very high. This is where almost all the error lies in the median poll. There was an outlier poll in the last week by Emerson College that had the lead at only 2% (very close to the truth), but their numbers for the three candidates are actually not very close.

McAuliffe (Democratic) 42%
Cuccinelli (Republican) 40%
Sarvis (Libertarian) 13%

This adds up to 95%, so scaling things up we get

McAuliffe (Democratic) 44.2%
Cuccinelli (Republican) 42.1%
Sarvis (Libertarian) 13.7%

This one isn't close to any of the true totals, only to the true margin of victory.
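
The contrast between getting the margin right and getting the candidates right can be sketched in a few lines of Python. The helper names are made up for this illustration, and the result uses the rounded final percentages given at the top of this post.

```python
def rescale(raw):
    """Scale raw poll percentages so the listed candidates sum to 100."""
    total = sum(raw.values())
    return {name: 100.0 * pct / total for name, pct in raw.items()}

def lead(shares):
    """McAuliffe's lead over Cuccinelli, in points."""
    return shares["McAuliffe"] - shares["Cuccinelli"]

def total_miss(shares, result):
    """Sum of the absolute misses over all three candidates."""
    return sum(abs(shares[name] - result[name]) for name in result)

result = {"McAuliffe": 48.0, "Cuccinelli": 45.5, "Sarvis": 6.6}
polls = {
    "Christopher Newport": {"McAuliffe": 45, "Cuccinelli": 38, "Sarvis": 10},
    "Emerson": {"McAuliffe": 42, "Cuccinelli": 40, "Sarvis": 13},
}

for pollster, raw in polls.items():
    shares = rescale(raw)
    print(f"{pollster}: lead off by {abs(lead(shares) - lead(result)):.1f} points, "
          f"candidates off by {total_miss(shares, result):.1f} points in total")
```

Measured by the lead alone, Emerson comes out far ahead of Newport. Measured by the three candidate shares, the order flips.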

Quite simply, all the polls had the same problem. People lied about voting for the Libertarian.

As Republicans stake out a weird space (we want a government small enough to fit into your bedrooms and glove compartments) some people on the right have taken to calling themselves "Libertarians".

My problem with this is simple enough. I was registered Libertarian in 1976.

And then I met some of them.

I heard so many arguments that one nightmarish evening. Then, they were considered fringe and unacceptable but now, the same ideas are repeated across the Internet and in some circles considered mainstream. Here are a few.

a) Social Security can't survive.

b) The Post Office is an intolerable obstruction to liberty.

c) Government should have nothing to do with education.

d) Taxation is theft.

e) You should be able to smoke pot.

Well, I agree with one of those positions.

I want to keep my system as simple and clean as possible, but I have to find a way to factor in what a significant but hopeless third party candidate really means. A Libertarian vote does not have an equal chance to go Democratic or Republican or stay home if the option to vote Libertarian is removed.

Since all my system promises is to call the winner sometime during the last week, last night counts as a win. As the old baseball saying goes, every hit looks like a line drive in the box score. But the next time a Libertarian appears to be drawing a large number for a third party, my system has to come up with a method for adjusting the Confidence of Victory.

It is said you only learn from your mistakes. I got the winner this time, but it was still a mistake. We will see how much I learn from it.

Monday, November 4, 2013

VA Governor's race:
Election Eve update

Several elections are being held tomorrow. The polls say there is no suspense whatsoever in the governor's race in New Jersey or the mayor's race in New York, where respectively Chris Christie and Bill De Blasio are expected to win by double digits. In Virginia, the governor's race between Terry McAuliffe and Ken Cuccinelli looks closer, but if my system holds true, there is extremely little chance of an upset. (I have not been checking the data for the down ticket races. Daily Kos says only the Attorney General race looks close.)

For a time in early September, Cuccinelli appeared to be gaining on McAuliffe, but that trend turned around and Cuccinelli's team has been starved for good news since. The last three updates on October 20, October 27 and November 3 have all put the Confidence of Victory number for McAuliffe at a formidable 99.4%. (In a strange coincidence, these three numbers came from three separate polls from three different companies, the well-known Quinnipiac and Rasmussen and the lesser known Newport College Polling Group.)

My system does not predict the margin of victory, but except for one outlier predicting a close 2% margin, the rest of the polls in the past week make it look like the margin will be 6% to 7%.

I will report back on Wednesday about the results.

Wednesday, October 30, 2013

VA Governor's race:
Last October update

There are three notable elections held next Tuesday. Barring absolute craziness, Bill De Blasio will be mayor of New York City and Chris Christie will still be governor of New Jersey. That's a win for a liberal Democrat and a moderate Republican, at least by 2013 standards. The other election is for governor of Virginia, pitting an establishment Democrat Terry McAuliffe against a Tea Party conservative Ken Cuccinelli. This race is closer, but still not really that close.

Even Republican pollsters who say it's close still have McAuliffe in the lead. There were no identified partisan pollsters among the five polling companies that put out data in the past few days. Rasmussen was reliably skewed to the right in 2012, but founder Scott Rasmussen has resigned and their polls in 2013 are closer to the median now.

Here are the five companies, the date the poll closed, the lead they have for McAuliffe and the confidence of victory number, rounded to the nearest tenth of a percent.

Washington Post 10/27 12 point lead 100.0% CoV
Roanoke 10/27 15 point lead 100.0% CoV
Hampton 10/27 6 point lead 97.3% CoV
Rasmussen 10/28 7 point lead 99.4% CoV
Quinnipiac 10/28 4 point lead 93.1% CoV

In this set of data, Rasmussen is the median, which is the number I go with. 99.4% sounds like a lock, and given that I have only collected data on fewer than 200 races, I don't have an instance of some candidate losing when the polls were so much in their favor. My system does not give more weight to one company and less to another, but it should be noted that Quinnipiac alone picked a winner in the New York City comptroller race, so they are on a roll, at least in a small way. If the race is as close as they say, I will at least have to consider making a special case of mentioning their position every time they take a final poll, but that can wait until Wednesday morning to be decided.
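
Picking the median poll from the table above takes only a few lines of Python. This is a minimal sketch. The function name is made up, and the choice of the lower of the two middle polls when the count is even is an arbitrary assumption, since the list here has an odd count.

```python
# (company, Confidence of Victory in percent), copied from the list above.
polls = [
    ("Washington Post", 100.0),
    ("Roanoke", 100.0),
    ("Hampton", 97.3),
    ("Rasmussen", 99.4),
    ("Quinnipiac", 93.1),
]

def median_poll(polls):
    """Return the (company, CoV) pair in the middle once sorted by CoV.

    With an even number of polls this picks the lower middle one,
    which is an assumption made only for this sketch.
    """
    ranked = sorted(polls, key=lambda poll: poll[1])
    return ranked[(len(ranked) - 1) // 2]

company, cov = median_poll(polls)
print(f"median poll: {company} at {cov}% CoV")
```

Every company counts once and counts the same, so an outlier at either end cannot drag the number the way it would drag an average.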

If there are more polls, I'll make another update before Tuesday's election.

Monday, October 14, 2013

Final report: New Jersey special election for Senate

The special election to fill the seat of the late Frank Lautenberg will be held this Wednesday, October 16. Why it isn't being held on the same date as the governor's race this November or even being held on a Tuesday as most American elections are, only Chris Christie knows.

In any case, I have said almost nothing about this race because no poll has shown it being very close. Cory Booker has a commanding lead over Steve Lonegan in every poll so far. The only dispute is how big the lead is.

Oct 13: Rutgers: sample size 513 Booker ahead 58%-36% (22 points)
Oct 12: Monmouth: sample size 1393 Booker ahead 52%-42% (10 points)
Oct 8: Richard Stockton College: sample size 729 Booker ahead 50%-39% (11 points)
Oct 7: Quinnipiac: sample size 899 Booker ahead 53%-41% (12 points)
Oct 5: Fairleigh Dickinson: sample size 702 Booker ahead 45%-29% (16 points)

The story for polling nerds here is the huge discrepancy between the last two polls. My Confidence of Victory system does not try to pick the margin of victory, just the victor. Because the margin is so wide in all cases, the polls all agree Lonegan has next to no chance. Here are the odds using Confidence of Victory, stating them as x to 1 favorite, rounded to three significant digits.

Oct 13: Rutgers: Booker is a 16,000,000 to 1 favorite
Oct 12: Monmouth: Booker is an 18,500 to 1 favorite
Oct 8: Richard Stockton College: Booker is a 1,320 to 1 favorite
Oct 7: Quinnipiac: Booker is a 10,900 to 1 favorite
Oct 5: Fairleigh Dickinson: Booker is a 4,170,000 to 1 favorite
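
For readers curious how a lead and a sample size turn into odds like these, here is a minimal Python sketch under stated assumptions: only voters who prefer one of the two candidates count, the sample shrinks in the same proportion, and the leader's share is treated as normally distributed. The function name is invented for this illustration. The results land in the same range as the figures listed above, though not necessarily digit for digit.

```python
import math

def odds_to_one(leader_pct, trailer_pct, sample):
    """Odds (x to 1) that the poll leader really is ahead, by normal approximation."""
    two_way = leader_pct + trailer_pct
    n = sample * two_way / 100.0                 # respondents with a preference
    p = leader_pct / two_way                     # leader's share of those respondents
    z = (p - 0.5) / math.sqrt(p * (1 - p) / n)   # standard errors above 50%
    upset = 0.5 * math.erfc(z / math.sqrt(2))    # upper-tail probability of the normal
    return (1 - upset) / upset

# (pollster, Booker %, Lonegan %, sample size), copied from the list above.
polls = [
    ("Rutgers", 58, 36, 513),
    ("Monmouth", 52, 42, 1393),
    ("Richard Stockton College", 50, 39, 729),
    ("Quinnipiac", 53, 41, 899),
    ("Fairleigh Dickinson", 45, 29, 702),
]

for name, booker, lonegan, sample in polls:
    print(f"{name}: Booker is roughly a {odds_to_one(booker, lonegan, sample):,.0f} to 1 favorite")
```

The complementary error function is used for the tail because subtracting a number very close to 1 from 1 loses precision at odds this long.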

Can I explain these differences? This may seem strange, but my explanation is these differences really don't matter. Once you get past being more than a 1,000 to 1 favorite, it's over. I have not yet had 1,000 samples, but so far the biggest favorite to lose since 2008 was about 11 to 5, or 2.2 to 1. This was Spitzer vs. Stringer for New York City comptroller, a race only Quinnipiac called for Stringer.

There were some very big upsets in the Republican presidential primaries after Herman Cain and Newt Gingrich both failed to be the Not Romney and Rick Santorum went from also-ran to contender, but two person races are pretty easy to call, especially when the numbers are this big.

I'm sure there will be bigger upsets in two candidate races eventually and there may be some I didn't log because it was in a race I skipped over, like the House races or state ballot propositions. But unless voter turnout is crazy low and skewed as hell, Cory Booker will be a Senator very soon.

Wednesday, October 9, 2013

VA governor's race:
First October update

There are races in New Jersey for senator and governor this year, but I haven't reported on them because they haven't looked interesting. Democrat Cory Booker will likely become a senator with a double digit win and Republican Chris Christie appears to be cruising to re-election by a wide margin as well.

The governor's race in Virginia is a little closer, but according to my Confidence of Victory system, only a little. Here's the latest data.

Since late August, polling companies have been taking a major interest in the race between Democrat Terry McAuliffe and Republican Ken Cuccinelli. Every week or so, two or more polls are released, so the data I present here are weekly medians of the Confidence of Victory numbers.

Advantage McAuliffe, if you haven't been paying attention.

Most news outlets that show polling data consider the percentage lead the most important information, but my system is based on the percentage lead of voters who have a preference. This I turn into a Confidence of Victory (CoV) number using very basic statistical methods. In mid-August, McAuliffe's CoV numbers were over 98%, but they fell in mid-September to about 85%. Since then, they have climbed back to 96.8%. I haven't collected enough data to say that there really is much difference when someone gets a CoV over 90%. No one with that strong a lead at the end lost in the data I have gathered since 2004.
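
One way to read "very basic statistical methods" is a normal approximation on the leader's share of voters who prefer one of the top two candidates. The Python sketch below works under that assumption. The function name and the sample figures are illustrative only, and the exact steps behind the published CoV numbers may differ.

```python
import math

def confidence_of_victory(leader_pct, trailer_pct, sample):
    """Probability that the poll leader is truly ahead, by normal approximation."""
    two_way = leader_pct + trailer_pct
    p = leader_pct / two_way                     # leader's share of voters with a preference
    n = sample * two_way / 100.0                 # sample reduced to those voters
    z = (p - 0.5) / math.sqrt(p * (1 - p) / n)   # standard errors above an even split
    return 0.5 * math.erfc(-z / math.sqrt(2))    # standard normal CDF at z

# Illustrative inputs: a 43% to 36% lead in polls of two different sizes.
big_poll = confidence_of_victory(43, 36, 1002)
small_poll = confidence_of_victory(43, 36, 400)
print(f"sample of 1002: {100 * big_poll:.1f}%   sample of 400: {100 * small_poll:.1f}%")
```

The same seven point lead is worth less confidence in the smaller poll, which is why the raw lead by itself is not the number tracked here.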

Notice the prepositional phrase "at the end". We aren't at the end yet, because the election is a month away. But as it stands currently, Cuccinelli needs something like the miracle Bill De Blasio got when Anthony Weiner went from front-runner to pariah/dick joke. Unless there are pictures of Terry McAuliffe's junk floating undetected around the Internet, Cuccinelli faces very long odds indeed.

More updates when there is more data.

Friday, October 4, 2013

The Correlation Coefficient R and the reduction of range.

Those who followed my series on climate change might remember that I would take some area I could define as a rectangle in longitude and latitude and track the average temperature by year, comparing equivalent seasons. (To be precise, if a longitude/latitude area includes either the North or South Pole, it would be more like a slice of pie than a rectangle.) I then split the time span from 1955 to 2010 into four eras based on the El Niño/La Niña cycles, these in particular each starting and ending in a strong La Niña year.

This particular region and season, Siberia in Spring, has a clearly increasing trend of the median temperature, the dotted red line moving up in four separate steps. The lowest temperature registered only moves upward twice and the maximum average temperature takes a step down in the era of 1999-2010, as the highest temperature was registered back in the 1990s.

Another method would be to add a line of regression, also known as a trendline, a predictor line or the least squares line. Excel has an option which gives the equation of the line and the coefficient of determination R². R is called the correlation coefficient and it varies between -1 and 1. R² must be between 0 and 1 and is often read as the proportion of the variance explained by the line. In this case, the .3655 would be the proportion we would assign to the general increase we see in the temperatures, while the fluctuations account for about .6345 of the influence.

This doesn't sound very convincing, but if R² = .3655, R in this case would be +.6046, a value that by nearly every standard of correlation is considered high, though it doesn't meet the Rule of Thumb criteria for very high, which would be over .8.

This is one of the many reasons I don't love using the predictor line and the correlation coefficient. The statements of confidence seem arbitrary - I know of three different systems and they disagree radically on whether an R score is strong or not - but also there is a way to cherry pick data in both directions, either to show more correlation or less.

Generally though not always, taking a subset of a sample by restricting the range will result in R and R² being reduced. For example, if we look at the first Consistent Oceanic Niña Interval from 1955 to 1975, we see somewhat less overall increase (here we check the number multiplying x, which went from .0417 to .0346) and a drastic drop in R² from .3655 to .05097. Here I can say without fear of contradiction that the correlation is not impressive.
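
The effect of restricting the range can be reproduced with a short Python sketch. The temperatures below are SYNTHETIC, a steady 0.04 degree per year rise with a fixed up-and-down wobble, invented purely to show the mechanics. They are not the Siberian data, and the function name is made up for this illustration.

```python
import math

def least_squares(xs, ys):
    """Return slope, intercept and correlation coefficient R of the trendline."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# SYNTHETIC series: same underlying trend every year, wobble of +/- 0.8 degrees.
years = list(range(1955, 2011))
temps = [0.04 * (year - 1955) + (0.8 if (year - 1955) % 2 == 0 else -0.8)
         for year in years]

slope_full, _, r_full = least_squares(years, temps)
slope_sub, _, r_sub = least_squares(years[:21], temps[:21])   # 1955 through 1975 only

print(f"1955-2010: slope {slope_full:.4f}, R² {r_full ** 2:.4f}")
print(f"1955-1975: slope {slope_sub:.4f}, R² {r_sub ** 2:.4f}")

# R recovered from the R² of .3655 quoted earlier; positive because the line slopes upward.
r_from_post = math.sqrt(0.3655)
print(f"R for R² = .3655: {r_from_post:.4f}")
```

The underlying trend is identical in both fits, yet the shorter window gives a much lower R² simply because the same wobble is being compared against a smaller total rise.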

In our second interval, R² is stronger at .19237, but still well below the larger set's value of .3655. It could be considered moderately strong by some measures, but notice that here the trend shows the region cooling. (It really is coincidence that the first year of this era shows a large jump in temperature over the previous.)

Here again we see a downward sloping trendline and an extremely weak R² value of .01592, which is to say nearly no correlation.

Yet again, a small downward slope and a low R² value.

In my view, the problem is cherry picking in both directions. People who wish to downplay or deny warming temperatures can take smaller samples, but when they do, the R² value will often give little confidence in the trend they try to show. On the other hand, people wanting to show strong evidence of warming have a natural advantage of generally higher R² scores in larger data sets. To be fair, in this particular set it is impossible to create a subset longer than thirty years that doesn't show a warming trend, though it can be minimized and so can the correlation coefficient.

If anyone is coming to the blog for the first time, you should know that I am not a denier of the general warming trend in temperatures around the globe in my lifetime, which started in the Strong La Niña year of 1955. What I hope for is a discussion where both sides can agree on terms and methods and avoid cherry picking at all costs. I realize this hope may very well be in vain, and yet I hold on to it.

Monday, September 23, 2013

VA governor's race:
Late September update

In Virginia, a state Barack Obama won by 4% in 2012 and 6% in 2008, the statewide races feature Tea Party favorites running on the Republican ticket. In the governor's race, both the Democrat Terry McAuliffe and the Republican Ken Cuccinelli have large negative ratings, but the basic math of the electorate says one thing clearly.

The Republicans are the minority party in the land. They've lost two presidential elections in a row, they don't hold the Senate and they got fewer votes across the country in the last congressional election, holding onto the majority in large part due to gerrymandering.

For all that, the needle is moving slightly for Cuccinelli if we give all polls the same weighting. Here are the Confidence of Victory numbers for recent polls, given by company name and date of poll. All give McAuliffe the advantage.

Washington Post 9/22 89.1%
Marist 9/19 90.3%
Harper[R] 9/16 94.2%
Roanoke 9/15 63.5%
Quinnipiac 9/15 84.9%

The good news for Cuccinelli: McAuliffe isn't cruising above 95% anymore and Quinnipiac, still crowing over calling the NYC comptroller race all by themselves with their new likely voter model, says the race is getting close, but still isn't really close.

The bad news for Cuccinelli: You see those two big dives downward in the graph, the one in May and the second in July?  The first was a Washington Post poll and the second was a Roanoke poll. The only polls that have had Cuccinelli ahead now have him behind.

The other bad news for Cuccinelli: My system may say 85% Confidence of Victory, but the track record so far says almost no underdog wins unless the Confidence of Victory splits about 60%-40%. The biggest upset against my system since 2008 was Stringer beating Spitzer for NYC Comptroller when Spitzer had 68.9% Confidence of Victory, the election when only Quinnipiac got the numbers right. (They have crowed loudly about that, but they had De Blasio cruising over 40% of decided voters and he only just beat that margin.)

The election is still about seven weeks off, which means my system is not built to make a prediction, but the small gains Cuccinelli has made are not what he needs. Romney was in a similar situation and saw some small gains after Obama's poor showing in the first debate, but those faded quickly enough. Cuccinelli needs something bigger than we've seen so far to close the gap, and appealing to the Republican "base", such as it is, is a step in the wrong direction right now.

More updates when I get more data.

Saturday, September 14, 2013

Virginia Gubernatorial Race:
McAuliffe (D) vs. Cuccinelli (R)

There was a recent article in The Washington Post by Ben Shapiro discussing the Virginia gubernatorial race. Several pundits were asked their opinion of the race. They quote one poll - Quinnipiac from last month showing a six point McAuliffe lead - and several pundits saying it leans towards McAuliffe or, in the words of often quoted Charlie Cook, it's a 50-50 race.

Not one poll aggregator is asked how it's going.

I could say that pundits are a lower life form. I could call them some scatological name or some rude word that describes a male or female sex organ.

But seriously, if you didn't get the memo from 2012, there is no lower life form than pundit. Pond scum should write strongly worded letters to the editor whenever they are compared to pundits.

Here's what the polls have said with remarkable consistency since early May. McAuliffe has a very big lead. A Washington Post poll said Cuccinelli had a lead on May 2, but since a week after that on May 9, only a mid-July Roanoke poll gave him the lead. Within days of that report, PPP and Quinnipiac both gave McAuliffe a substantial lead.

For those who have prejudices against some polls as being "liberal leaning", consider Rasmussen polls, which erred substantially towards Romney in the 2012 election cycle. In early June, they read a 3% lead for McAuliffe. In early September, they said it was a 7% lead for the Democrat. In my Confidence of Victory system, that 7% lead means McAuliffe is about a 140 to 1 favorite to win if the election were held when the poll was taken.

And now for my personal provisos. Nate Silver thinks he can give a percentage for what that number means today given the election happens November 5.

I make no such claims and I never will.

Here's the situation. Right now, it looks like a gimme for McAuliffe, but there are seven and a half weeks to go. What no one can predict, neither the hard working aggregators nor the lazy arrogant overpaid and over-quoted pundits, is what will move the needle in that period. As of right now, what Cuccinelli needs is an outright stunner, something along the lines of Anthony Weiner's penis pics or Mitt Romney's comments about the lazy scum-sucking 47% of the population.

Heck, maybe he can even get McAuliffe to admit he doesn't like the Soggy Bottom Boys. But anything less than that and he has a concession speech to write, the sooner the better.

More reports when there is more information.

Wednesday, September 11, 2013

Results from the Democratic mayoral race in New York City.

Almost all the ballots have been counted in the New York City municipal elections held yesterday. My last post here was on Sunday, but there was a late poll from Public Policy Polling that changed my numbers. After working on how I was going to consider the "median result" from a set of four polls in a three person race where there was a 40% threshold, my last prediction on Monday evening was posted to Twitter.

Final call on NYC Dem Mayor's race, different from  
55% De Blasio 1st ballot 
30% De Blasio/Thompson 
15% De Blasio/Quinn

Professor Wang's last prediction was a 90% probability of De Blasio on the first ballot.

The current vote count has Bill De Blasio at 40.3% and his closest rival Bill Thompson at 26.2%. Because his margin over the 40% threshold is so slim, we will have to wait for absentee ballots and an automatic recount for a result this close. The reported estimate is next Monday for a conclusive result.

Here on the blog, I made no mention of the comptroller's race, but I did mention it on Twitter in a two part tweet.

NYC mayoral primary today. and I agree that De Blasio first ballot win is most likely as is Spitzer for Comptroller. [1/2]

If Spitzer loses the Comptroller race, full credit should go , the only organization to favor his rival Stringer. [2/2]

There were several polls of this race as you can see at this link to the Real Clear Politics polling page. There was nothing that made this look like a close race until late August, with Eliot Spitzer's hopes for a political comeback looking very strong while Manhattan borough president Scott Stringer seemed unable to make any headway. But then Quinnipiac had a poll that said the race was tied, then another Quinnipiac poll had a small lead of 2% for Stringer on September 1, then a commanding 7% lead a week later.

The actual result was Stringer by 4%, 52% to 48%.

The reason my system - and Professor Wang's - favored Spitzer was that no other pollster agreed with Quinnipiac, not even once. After the first Quinnipiac poll that showed a close race, Siena, Marist and PPP all released polls, all of them favoring Spitzer.

So my system failed to predict this race. Often, when I get a race wrong, I try to find ways to adjust things to improve, but if I was given a similar situation tomorrow, I wouldn't change a thing. Here are my reasons.

1. I never weigh in more than a single poll from any polling company and always the most recent.

2. Because Quinnipiac only gets one result counted, they had one poll out of three in the final mix, along with PPP and Siena.

3. In a three poll situation, I take the median result, not the average of the three. Quinnipiac was the outlier, but outliers aren't often the closest to correct. Obviously, it will happen sometimes, but it is rare enough that I do not want to second guess myself in a situation like this. Polls miss the mark, sometimes by making bad assumptions, sometimes just by the fact that randomness is involved. For example, of the three polls Quinnipiac had the largest sample size, but that does not always (or even often) mean the most accurate. Looking at the mayoral polling and factoring out the undecided, Quinnipiac had De Blasio at 46% of the people who had a preference, while PPP had him at 42% and Siena had him at 39%. In this case, it's the low outlier that was closest.

So this means at least one more report on the mayor's race. It is widely agreed that the Republican primary winner Joe Lhota will be very hard pressed to win in the general election. De Blasio got 260,000 votes in the Democratic primary. Lhota finished first with over 50% in the Republican primary with about 30,000 votes, which would have given him sixth place in the Democratic race.

And while I did not use their poll, I want to congratulate Quinnipiac for being alone with the correct result in the comptroller's race. It is a field filled with randomness, but they had Stringer with a chance or the lead three separate times in barely two weeks when no one else did. That's a record they can point to with pride.

Sunday, September 8, 2013

New York City Mayoral Race: Democratic Primary
The Weekend before the election.

On Tuesday, New York City voters will go to the polls to decide the Democratic and Republican candidates to be the next mayor. The conventional wisdom is the Democratic nominee holds a huge advantage over whichever candidate the Republicans put forward, so the Democratic primary is being given the lion's share of interest.
Since mid-August, just after Anthony Weiner's habit of sending portraits of his penis to strangers became common knowledge, the front runner has been Bill De Blasio. It is widely agreed that the ads featuring his 15 year old son Dante, he of the remarkable adolescent baritone and even cooler Afro, have made quite the impact.

This Sunday, a new poll from Marist has been published which agrees with the general trend if not the exact numbers.

Candidate recent % (previous %)

Marist - 9/6 recent, 8/14 previous
De Blasio 36% (24%)
Thompson 20% (16%)
Quinn 20% (24%)
Other 16% (21%)
None of the Above 8% (15%)

Quinnipiac - 9/1 recent, 8/12 previous
De Blasio 43% (30%)
Thompson 20% (22%)
Quinn 18% (24%)
Other 12% (17%)
None of the Above 7% (7%)

Siena - 8/28 recent, 8/7 previous
De Blasio 32% (14%)
Thompson 18% (16%)
Quinn 17% (25%)
Other 16% (19%)
None of the Above 17% (26%)

The agreement is across the board. De Blasio had a great September, Quinn took a beating and Thompson improved enough to probably be the favorite for second place if that matters. De Blasio is seen as the progressive candidate and the candidate for the boroughs other than Manhattan. Even the fact that De Blasio is a Red Sox fan may not be enough to stop him.

Of the three polls, Quinnipiac has been the one that has found the most support for De Blasio and Siena has lagged, but that may be due to Siena being first to poll in each case. That leaves Marist, last to poll and the median poll for De Blasio support both mid-August and early September.

If we accept Marist as the most reliable because of being the median, there are three outcomes that look possible on Tuesday.

De Blasio gets more than 40% on the first ballot: 34%
A De Blasio-Thompson run-off: 36%
A De Blasio-Quinn run-off: 30%

If I were putting a wager on this three way outcome, I'd put my buck on De Blasio-Thompson in a run-off, though any of these three results can hardly be called an upset. The only thing that would make me doubt my sanity or the validity of my methods is De Blasio finishing third. There hasn't been a poll in more than three weeks that shows that result to be even close to plausible.

I'll be back on Tuesday evening to report the actual vote.

Wednesday, September 4, 2013

Rounding - standard method and statistician's method a.k.a Gaussian rounding or banker's rounding.

I do not like to tell tales out of school. I consider the relationship of teacher and student to be one of confidentiality, most especially on the teachers' part. But I will say this.

The students I teach have a heck of a time with rounding.

I've taught some of the remedial classes at community college, like arithmetic and pre-algebra, and the students in these classes often do not get the idea of rounding, either rounding to a nearest decimal - like tenths or hundredths or thousandths - or rounding to the nearest thousand or million or rounding to a certain number of significant digits. In later classes like statistics or even calculus or linear algebra, I see there are more than a few students who haven't grasped rounding up and rounding down. They usually always round down, also known as truncating.

For me, this is a major pedagogical hurdle. When I teach, I try to put myself in the mindset of when I didn't know how to do a thing and remember the things that helped me learn it. I'm not going to say I learned rounding in mere seconds, but it feels like I did. I'm sure I stumbled with it back some time in primary school, but after a few mistakes the mechanical rule fell into place and made sense.

Look at the digit that is going to vanish, the one to the right of the last place you are rounding to. If it is a 5, 6, 7, 8 or 9, add one to the digit you are rounding to. If it is a 0, 1, 2, 3 or 4, the last digit you are going to use remains the same. The first method is called rounding up and the second version is called rounding down.

Example: 4/7 = 0.571428571428..., a decimal place then the six digit pattern 571428 repeating forever.

Round 4/7 to the nearest tenth.
The digit in the tenths place is the .5, so the answer is going to be either .5 or .6; because the next digit (in some texts called the decision digit) is a 7, we add 1 to 5 and the answer is .6

Round 4/7 to the nearest hundredth.
If we truncate to the hundredths place, we get .57, so the answer is going to be either .57 or .58; because the next digit is a 1, we leave .57 as it is.

Round 4/7 to the nearest thousandth.
If we truncate to the thousandths place, we get .571, so the answer is going to be either .571 or .572; because the next digit is a 4, we leave .571 as it is.
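
The three examples above can be checked mechanically. This is a minimal sketch in Python, assuming the standard library's decimal module is an acceptable stand-in for the pencil-and-paper rule; the helper name round_standard is made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

four_sevenths = Decimal(4) / Decimal(7)  # 0.571428571428...

def round_standard(value, places):
    """Hypothetical helper: round to `places` decimals, 5 or more rounds up."""
    quantum = Decimal(10) ** -places  # places=2 gives Decimal("0.01")
    # ROUND_HALF_UP agrees with the decision-digit rule for positive numbers
    return value.quantize(quantum, rounding=ROUND_HALF_UP)

for places in (1, 2, 3):
    print(places, round_standard(four_sevenths, places))
# 1 0.6
# 2 0.57
# 3 0.571
```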

Okay, I expect that this is not news to many of my readers, though it may have been a while since you thought about it.

I am chagrined that I did not know about other methods given my advanced years, but a statistics text I am using for the first time has a math skills pre-test and uses a slightly different method, known by several names. I first heard of it as "banker's rounding", though doing more research, I understand that "statistician's method" or "Gaussian rounding" are more common and likely more accurate.

Let's say for argument's sake we are rounding off to the nearest dollar, getting rid of the pesky pennies. The method you likely learned in school, which here is called "Traditional", will simply look at the tenths position.

If we have between $2.00 and $2.49, this will "round down" to $2.00 exactly.

If instead the total is between $2.50 and $2.99, we "round up" to $3.00.

The new method agrees with the old method almost exactly, with the only contentious case being the half dollar. Technically, $2.51 should go to $3.00, because that's the closest value. (It's 49 cents away from $3 and 51 cents away from $2.) Likewise $2.49 should round down to $2.00, since that is the closest value. But $2.50 presents a philosophical dilemma, since it is exactly 50 cents away from $3.00 and 50 cents away from $2.00.

This new method says to round a number of the form x.5 to the nearest even number. That means half the time we round x.5 up and half the time we round down. The row in yellow shows the only disagreement between 1 and 3. 1.5 rounds up traditionally to 2, and in the new method rounds to the nearest even number, which is still 2. But 2.5 rounds traditionally to 3, and in the new method rounds to 2.
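
To see the two rules side by side, here is a small sketch using Python's decimal module. The helper name to_whole and the sample amounts are my own choices for illustration, not part of any textbook's method.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

def to_whole(amount, rounding):
    """Hypothetical helper: round a decimal string to a whole number."""
    return int(Decimal(amount).quantize(Decimal("1"), rounding=rounding))

for amount in ("1.50", "2.49", "2.50", "2.51", "3.50"):
    traditional = to_whole(amount, ROUND_HALF_UP)    # x.5 always goes up
    half_even = to_whole(amount, ROUND_HALF_EVEN)    # x.5 goes to the even neighbor
    print(amount, traditional, half_even)
# 1.50 2 2
# 2.49 2 2
# 2.50 3 2
# 2.51 3 3
# 3.50 4 4
```

Python's built-in round also uses the round-half-to-even rule, so round(2.5) gives 2 while round(1.5) and round(3.5) give 2 and 4.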

Why bother? Think of what we are changing in the sum of rounded numbers, assuming that all numbers are equally likely to show up. Let's say we had the list of numbers as follows:


These 201 entries add up to 402, which means the average is 402/201 = 2.

If we round them using the standard method, we will get

50 x 1 = 50
100 x 2 = 200
51 x 3 = 153

This adds up to 403, and 403/201 = 2.004975124..., which is to say that adding, then rounding will not give the same answer as rounding, then adding.

If we round this set using statistician's rounding, this is what happens.

50 x 1 = 50
101 x 2 = 202
50 x 3 = 150

This adds up to 402, and 402/201 = 2 exactly.
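
The counts above can be reproduced with a short sketch. It rests on one assumption: that the list is the 201 values 1.00, 1.01, 1.02 and so on up to 3.00, which is the evenly spaced list consistent with 201 entries adding up to 402. The function names are made up for illustration.

```python
from collections import Counter
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

# Assumption: the list is 1.00, 1.01, ..., 3.00 (201 entries)
values = [Decimal(cents) / 100 for cents in range(100, 301)]

def rounded_counts(rounding):
    """Count how many values land on each whole number."""
    return Counter(int(v.quantize(Decimal("1"), rounding=rounding)) for v in values)

def total(counts):
    """Sum of the rounded values."""
    return sum(value * count for value, count in counts.items())

standard = rounded_counts(ROUND_HALF_UP)
half_even = rounded_counts(ROUND_HALF_EVEN)

print(len(values), sum(values))            # 201 entries, sum 402
print(dict(standard), total(standard))     # counts 50, 100, 51 and total 403
print(dict(half_even), total(half_even))   # counts 50, 101, 50 and total 402
```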

As a teacher who knows my students already have a difficult time with rounding, this presents a problem. I do not want to "dumb down" the curriculum, but I also don't want to add in extra problems when I don't have to. Searching Wikipedia for this method, I see that this is the default rounding rule in IEEE 754, the standard for floating point operations.

Some but certainly not all of my students may see this in their careers. I would hope that people who go into programming would have a better grasp of math, but having worked for nearly two decades in the field, I know that is not always the case. More than once, I came onto a project at a large computer company that involved something like higher math and I was the only programmer on a large team who knew the right method. Sometimes it was something slightly esoteric, like group theory and the symmetries of the square. Another time, I was the only person who really understood how sine and cosine worked.

One of my favorite expressions I learned from my father is "You learn something new every day, if you aren't careful." Well, I wasn't careful and I learned something new yesterday. Now I have to decide how it should apply to the classes I teach.

It would be so much easier if I did what I was told and didn't give a rat's ass, but as I am now two score and seventeen years old, I get the feeling the "I don't give a rat's ass" option is not open to me.

Tuesday, September 3, 2013

The New York City Mayor's race:
One week from election day

There was a flurry of polls in the New York mayoral race in mid August and then nothing until last week. Now, we have four new polls, but two of them are from the same company, Quinnipiac, so I only count three, using the most recent Quinnipiac, one from amNew York-News 12 and a third from Siena, the polling company hired by The New York Times.

Both Quinnipiac and Siena polled mid-month and late month, so I will give their numbers in the form

Candidate recent% (previous %)

Quinnipiac - 9/1 recent, 8/12 previous
De Blasio 43% (30%)
Thompson 20% (22%)
Quinn 18% (24%)
Other 12% (17%)
None of the Above 7% (7%)

Siena - 8/28 recent, 8/7 previous
De Blasio 32% (14%)
Thompson 18% (16%)
Quinn 17% (25%)
Other 16% (19%)
None of the Above 17% (26%)

Both of these polls agree that August was a very good month for De Blasio and a bad month for almost everyone else. We also have a poll from amNew York-News12, who only polled once in August.

amNew York-News12 - 8/27 recent, no previous
De Blasio 29%
Thompson 24%
Quinn 17%
Other 17%
None of the Above 13%

The "median" poll can be a little difficult to judge in a multi-candidate contest. If we consider the race De Blasio vs. Thompson, the Siena poll is the median with a 14% lead.  In this case, De Blasio has about a 27% chance to cross the 40% threshold and win outright, and the Siena poll says the race for second is relatively close, but still gives Thompson a 65% Confidence of Victory over Quinn.

Here are the Confidence of Victory numbers from all three polls.

Quinnipiac
De Blasio outright win: 99.9%
De Blasio-Thompson run-off: 0.08%
De Blasio-Quinn run-off: 0.02%

Siena
De Blasio outright win: 27.2%
De Blasio-Thompson run-off: 47.3%
De Blasio-Quinn run-off: 25.5%

amNew York-News 12
De Blasio outright win: 0.1%
De Blasio-Thompson run-off: 99.9% 
De Blasio-Quinn run-off: 0%

There should be at least one more poll from Marist by the end of the week, but unless Siena is right, Quinn is pretty much out of the race.


Sunday, August 25, 2013

Some thoughts about election prediction.

It's been nearly two weeks since there has been any new data on the New York City mayoral race. The last few polls agree the three at the front of the pack are Christine Quinn, Bill De Blasio and Bill Thompson. Unless one of them gets over 40% of the vote, and the recent polls make that look very unlikely, the top two will go into a runoff election, and the odds make it look like an uphill climb for Thompson.

The New York Times endorsed Quinn. Her bio is certainly compelling from a progressive standpoint. She would be the first female mayor of America's largest city and she would be the first openly gay mayor. Her record on the issues is not as compelling from a progressive standpoint. She is seen by many as a person who would continue the Bloomberg status quo. The Nation, a truly progressive periodical unlike the Times, has endorsed De Blasio.

Nate Silver has published numbers saying that history favors the long term front runner of the year, which would be Quinn in this case. She had competition from Anthony Weiner after he announced and before his spectacular crash, but for months before that she led in every poll.

This is one of the many places in predictions where I part company with Silver. I assume that every election stands on its own and historical data should not outweigh current polling.

It's impossible to predict the big events and impossible to gauge the weight they will have. Based on name recognition, it looked like Anthony Weiner was a serious candidate, but the Carlos Danger situation cut his support in half, his position plummeting from first to a distant fourth. A judge deciding "stop and frisk" was unconstitutional gave De Blasio a big boost, but it's hard to say if it's a surge or a bounce until we get more data. The Times endorsement should be much more important than The Nation's, but is either of them really an event that makes a difference in the polls?

I have no idea.

When I say that the last polls look like an overwhelming likelihood of a Quinn/De Blasio runoff, I am completely willing to say something else if the next polls change. It's very hard to say what events will change the polling numbers, known in journalistic circles as "moving the needle".

In 2012, Romney's 47% comments moved the needle against him, but he was already in a horrible position when he said that. Obama's awful first debate definitely moved the needle in Romney's favor, but by the time of the Biden/Ryan and the second Obama/Romney debate, the movement towards Romney had stopped and the tide reversed. He never was favored to win.

Conservatives hoped Benghazi would move the needle. It didn't.

Silver went to Princeton and I went to Cal State Hayward. Silver was hired by the Times and I haven't been able to get anyone to pay me for my work. In my favor, I did better than he did in 2008 and 2012. (He missed Indiana in the electoral vote in 2008 and two Senate races in 2012 that I got right. One of them he missed by giving weight to historical data, something I would never do.)

Here's the big difference between us as far as I'm concerned. My education is in pure mathematics and his is in applied.

Election prediction is applied math. I deeply distrust it and I wait until I think the data really means something. I have arbitrarily chosen about a week before the polls close. By then, I figure enough mail-in ballots have been cast that the most important proviso of any opinion poll is actually true, that "the election is being held when the poll was taken".

Is seven days the right number? Maybe ten would be better, maybe four. If my predictions become less reliable in the future, changing this time period is one of my first places to look to improve my system.

This brings me to something else Silver has done that I disagree with vehemently. He is predicting the make-up of the Senate in 2014 and giving a margin of error. These numbers are being used as scare tactics by the Democrats begging for money.

As far as I'm concerned, these numbers are worse than nonsense. He does actual work, so I am not happy comparing him to lazy, ignorant anti-intellectuals like Dick Morris, Karl Rove, Jim Cramer or George F. Will, but predictions about a multi-part election more than fifteen months away from polls closing are mere balloon juice. Even if someone "crunches numbers", the sheer number of unknowable events in the next year and a quarter means his work has no more meaning than doing exact bio-rhythms or a complete astrological chart.

I do not know if Nate Silver will ever read what I write. If he does, I want to say to him, older nerd to younger nerd, stop doing the stuff that is most likely to hurt your reputation. A fifteen month prediction about elections is no better than your work in fantasy football or Oscar prediction, which you have been gracious enough to admit has been less than optimal.

Wait until just before the election. Then the numbers of people doing actual work, work like yours and mine and Sam Wang's, have meaning. Right now, your work in 2013 is like Gene Simmons of KISS going out on a limb in 2011 predicting President Rick Perry and giving us as the reason to believe him, the quote "I'm never wrong."

He was wrong. Even with your margin of error, saying there is 95% confidence in your numbers is plain silly.

Stop being silly. You are much better than that.

Here endeth the lesson.

Wednesday, August 21, 2013

The Witch of (Maria) Agnesi

Consider a circle that fits perfectly between two parallel lines. One of the easiest pairs of parallel lines to picture is a top and a bottom. In this picture, the point O is the point tangent to the bottom and the point M is the point tangent to the top.

Take any point on the circle and call it A. The line drawn from O to A is called a secant and it will cross the line at the top at a point we call N. We can create a right triangle by making a line parallel to both top and bottom. The point P is the corner of the right triangle that has the 90° angle. For every new A, there is a new N and a new P. We are interested in the curve created by all the points P.
This may seem like a very strange and roundabout way to make a curve in the plane. You wouldn't be wrong to think so. In the early days of the xy-plane, mathematicians worked on these kinds of very strange curves. The first work on this was done by Pierre de Fermat, an amateur mathematician working with René Descartes at the beginning of the discovery of the Cartesian plane, named for Descartes ("of the cards" in French). The curve is given by a formula in x, y and a, where a is the radius of the circle.

The simplest version of the equation is when the radius is ½, so the diameter is 1.
The curves are sometimes called bell-shaped curves, but this is not the formula for the most famous of the bell-shaped curves, the most studied version of that being called the normal curve. This curve is called the Witch of Maria Agnesi.
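
The construction can be tried numerically. This is a sketch under stated assumptions: I place O at the origin, make the bottom line the x-axis and the top line y = 2a, and take P to have the x-coordinate of N and the y-coordinate of A. I compare the result with the standard Cartesian form of the curve, y = 8a³/(x² + 4a²), which reduces to y = 1/(x² + 1) when the radius is ½. The function names are made up for illustration.

```python
import math

def witch_point(a, t):
    """P built from the point A at angle t on a circle of radius a (assumed layout)."""
    ax, ay = a * math.sin(t), a * (1 - math.cos(t))  # A on the circle centered at (0, a)
    nx = ax * (2 * a) / ay                           # N: the line OA extended to y = 2a
    return nx, ay                                    # P: x from N, y from A

def witch_formula(a, x):
    """Standard Cartesian form of the curve for a circle of radius a."""
    return 8 * a**3 / (x**2 + 4 * a**2)

a = 0.5
for t in (0.5, 1.0, 2.0, 3.0, 4.5):
    x, y = witch_point(a, t)
    # the last two columns should agree
    print(round(x, 4), round(y, 4), round(witch_formula(a, x), 4))
```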

The name comes about by a mistranslation of Italian to English, a mistranslation that might have been intentional but has most certainly lasted until today.

Fermat's first work on the curve is done in 1630, but in 1703, the Italian Guido Grandi also studies the shape and names it in Latin, the versoria, which is also the name for one of the ropes that holds a sail in place, usually called a sheet in English.

Move forward again to 1748 and a very odd occurrence at the time, an important mathematical paper written by a woman, the Italian mathematician Maria Agnesi. She writes her paper in Italian and calls the curve "versiera", the Italian translation for Grandi's Latin word "versoria". Here comes the confusion, possibly done on purpose.

The Italian for adversary, "aversiera", is sometimes shortened to "versiera". The adversary of God is the Devil, but because this is the feminine form, it can be translated to "witch". The translation by Cambridge professor John Colson, a contemporary of Maria Agnesi, turns the name for a rope on a boat to witch, a play on words that likely was done at the expense of the very rare female mathematician.

So here is to Maria Agnesi, not an originator of an idea but, like so many of us in mathematics, a worker in the field, doing her best to preserve knowledge for future generations. Her name lasts to this day in a somewhat mocking form, but at least we remember her, one of the few women in all of Europe to publish a mathematical work that has lasted from an era when women were not allowed to get a degree from any university on the continent.

Saturday, August 17, 2013

One more rule about Pythagorean triples

Yet again, let's consider Pythagorean triples, sets of three positive integers a, b and c such that a² + b² = c².  Here are several examples where all three numbers are less than 100.

3² + 4² = 5²
12² + 5² = 13²
15² + 8² = 17²
7² + 24² = 25²
9² + 40² = 41²
20² + 21² = 29²
48² + 55² = 73²

We know that we can generate Pythagorean triples from any two positive integers j and k, where j > k, with the formulas

a = j² - k²,   b = 2jk   and   c = j² + k²

This means there must be infinitely many different Pythagorean triples. If you aren't sure, consider k=1 and we get

a = j² - 1   b = 2j and  c = j² + 1

For any j > 1, we will get a unique Pythagorean triple.
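
Here is a minimal sketch of the generating formulas in Python. The function name triple is made up for illustration.

```python
def triple(j, k):
    """Pythagorean triple from positive integers j > k."""
    return j * j - k * k, 2 * j * k, j * j + k * k

# The k = 1 family: a = j² - 1, b = 2j, c = j² + 1
for j in range(2, 7):
    a, b, c = triple(j, 1)
    print(j, (a, b, c), a * a + b * b == c * c)
# 2 (3, 4, 5) True
# 3 (8, 6, 10) True
# 4 (15, 8, 17) True
# 5 (24, 10, 26) True
# 6 (35, 12, 37) True
```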

Here's a pattern I noticed. Every triple on our list above has one number that is divisible by 5. Let us start with a speculation.

Speculation: Every Pythagorean triple has at least one number that is divisible by 5.

We could create a triple where all three numbers are divisible by 5 just by taking any Pythagorean triple and multiplying all the numbers by 5.

Example: Because 3² + 4² = 5², we also have 15² + 20² = 25². The original squares are 9 + 16 = 25, and if we multiply all the numbers by 5, all the squares are multiplied by 25 and we get 225 + 400 = 625.

That explains why the modifier "at least" is used in the speculation.

Let's find facts that will help us prove the speculation.

Fact: The last digit of a square is determined by the last digit of the original number.

Think about multiplying two numbers that have two digits together, like 72 × 72. 


The ones place is just 2 × 2 and the 70 isn't involved. This would be true no matter how many digits in a number.

Fact: The last digit of a perfect square can only be 0, 1, 4, 5, 6 or 9.

We can prove this by looking at the list of perfect squares of the numbers 0 through 9 (0, 1, 4, 9, 16, 25, 36, 49, 64, 81) and using the first fact from above that the last digit of a number determines the last digit of the square.

Fact: The remainder of a perfect square when divided by 5 can only be 0, 1 or 4.

Because 10 = 2 × 5, the remainder upon division by 5 will look a lot like the last digit, except 5 becomes 0, 6 becomes 1, and 9 becomes 4.
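
Both facts about squares are small enough to check by brute force. A quick sketch in Python:

```python
# Last digits and remainders mod 5 of the squares of 0 through 9
last_digits = sorted({(n * n) % 10 for n in range(10)})
remainders = sorted({(n * n) % 5 for n in range(10)})
print(last_digits)  # [0, 1, 4, 5, 6, 9]
print(remainders)   # [0, 1, 4]

# The last digit of n decides the last digit of n², checked for n below 1000
print(all((n * n) % 10 == ((n % 10) ** 2) % 10 for n in range(1000)))  # True
```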

Fact: If you want to make a sum of two numbers from the set {0, 1, 4} to add up to 0, 1 or 4, there are only a finite number of patterns and not all of them work.

0 + 0 = 0 works
0 + 1 = 1 works
0 + 4 = 4 works
1 + 0 = 1 works
1 + 1 = 2 doesn't work
1 + 4 = 5 doesn't work, except when we divide 5 by 5 we get remainder 0, so it does work for us.
4 + 0 = 4 works
4 + 1 = 5  doesn't work, except when we divide 5 by 5 we get remainder 0, so it does work for us.
4 + 4 = 8 doesn't work
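
The same list can be produced by enumeration, reducing each sum to its remainder on division by 5. This is only a sketch, and the variable names are made up for illustration.

```python
from itertools import product

square_remainders = (0, 1, 4)
working = []
for r, s in product(square_remainders, repeat=2):
    total = (r + s) % 5                  # remainder of the sum
    works = total in square_remainders   # could the sum be a square?
    if works:
        working.append((r, s, total))
    print(r, "+", s, "leaves", total, "works" if works else "doesn't work")

# Every working pattern has a remainder of 0 somewhere in it
print(all(0 in pattern for pattern in working))  # True
```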

Fact: Every pattern that works has a 0 or 5 in it.

Fact: A remainder of 0 when dividing by five means the number is divisible by 5.

Fact: Because 5 is prime, getting a perfect square that is divisible by 5 means the original number is divisible by 5.

This means our speculation about the Pythagorean triples is in fact a pattern that is always true. We could call it a "theorem", but I'm going to save that word for

Fact: Every Pythagorean triple has at least one number that is divisible by 5.

Here are two other facts that can be proved using similar methods, which the reader can try to prove if interested.

Fact: Every Pythagorean triple has at least one number that is divisible by 3.

Fact: Every Pythagorean triple has at least one number that is divisible by 2 and one number divisible by 4.
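
A brute-force check is no substitute for a proof, but it is a cheap way to gain confidence before trying one. This sketch tests the divisors 3, 4 and 5 on every triple the j, k formulas produce for j below 60; the function names are made up for illustration.

```python
def triples(limit):
    """Triples from the j, k formulas for 1 <= k < j < limit."""
    for j in range(2, limit):
        for k in range(1, j):
            yield j * j - k * k, 2 * j * k, j * j + k * k

def some_multiple(triple, d):
    """True if at least one number in the triple is divisible by d."""
    return any(n % d == 0 for n in triple)

all_ok = all(some_multiple(t, d) for t in triples(60) for d in (3, 4, 5))
print(all_ok)                         # True
print(some_multiple((3, 4, 5), 7))    # False
```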

We don't have any other primes for which this is true. That is proved simply by the triple 3, 4, 5. All the primes greater than 5 (7, 11, 13, 17, ...) do not divide any of these numbers evenly, so we have a single counter-example that works for an infinite number of cases, a situation a lazy mathematician is always happy to find.

Friday, August 16, 2013

Another New York City mayoral poll.

A new poll for the New York City races was released Thursday night, this one from Marist, yet another small East Coast college that does polling. (The earlier polls were from Quinnipiac and Siena, also small East Coast colleges.)

The latest poll, completed on August 14 and sampling 679 Democrats, puts City Council Speaker Christine Quinn and public advocate Bill De Blasio in a tie at 24% and former comptroller William Thompson in third with 16%.

The rules of the race are that if any single candidate polls over 40%, he or she is declared the winner and goes on to the general election. If not, the top two vote getters are in a run-off.

If the election were held when the poll was taken, one candidate getting over the 40% mark is highly unlikely, so it's really a race for the top two spots. A week ago, that looked like Quinn and Thompson, but the two polls this week say Quinn and De Blasio. The event that is considered the turning point is the decision by a judge that "stop and frisk" policies are unconstitutional. De Blasio's long held objection to the policy in New York City is his key advantage.

Marist also polled the Republican race, where Joe Lhota has a large lead but is not polling over 40% yet. He could win without need for a run-off depending on how the undecided vote breaks, but he is not given a very good chance against whoever survives the Democratic race.

There will certainly be more polls between now and the election and I will give updates when available.

Thursday, August 15, 2013

Another rule about Pythagorean triples.

Last January, I wrote several posts about Pythagorean Triples, a set of three whole numbers a, b and c that follows the rule of the Pythagorean Theorem a² + b² = c². Here are some examples.

3² + 4² = 5²
8² + 15² = 17²
24² + 7² = 25²

If we think about these numbers as sides of a triangle, the area will be ½ab. In the cases from above, the areas are 6, 60 and 84, respectively.

There is a way to generate all such whole number triples, using two positive integers j and k, with j>k.

a = j² - k²
b = 2jk
c = j² + k²

Our first rule is that we should stipulate that j does not equal k, because in that case a = 0. Re-writing a and b in the area formula into multiples of j and k, we get this.

Area = ½ab = ½(j² - k²)2jk = (j² - k²)jk

Since j and k are whole numbers, the area must be a whole number. But more than that the number must be divisible by 6. The proof is split into two parts, first that the area is divisible by 2 and secondly that the area is divisible by 3.

Proof that the area is divisible by 2. If j or k is even, then (j² - k²)jk is even, since if you have a product of whole numbers and one is even, the product is even. The only other option is that both j and k are odd, and if that is the case then j² - k² = odd - odd = even.

Proof that the area is divisible by 3. If j or k is a multiple of 3, then (j² - k²)jk is a multiple of 3. The only other option is that neither j nor k is a multiple of 3. For any number n that isn't a multiple of 3, n² will be of the form 3p + 1, which is to say one more than some multiple of 3. If that is the case then j² - k² = (3p + 1) - (3q + 1) = 3(p - q), which is a multiple of 3 as well.
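
The two proofs can be backed up with a quick numerical check. This sketch computes the area ½ab for every pair with j below 40 and tests divisibility by 6; the function name area is made up for illustration.

```python
def area(j, k):
    """Area of the right triangle with legs a = j² - k² and b = 2jk."""
    a, b = j * j - k * k, 2 * j * k
    return a * b // 2  # exact, because b is even

areas = [area(j, k) for j in range(2, 40) for k in range(1, j)]
print(area(2, 1), area(4, 1), area(4, 3))  # 6 60 84
print(all(x % 6 == 0 for x in areas))      # True
```

The three printed areas come from the triples 3, 4, 5 and 15, 8, 17 and 7, 24, 25.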

Wednesday, August 14, 2013

More election news:
Results from New Jersey, new polls for New York City mayor

The results from the primary for the New Jersey special election came in and there were no surprises. Cory Booker won the Democratic nomination handily with 59% while his closest rival had 20%. Republican Steve Lonegan beat his single opponent Alieta Eck 79% to 21%.

The Booker vs. Lonegan election will be held on Wednesday, Oct. 16. Two months is a long time in politics, but currently Booker is leading comfortably in all the polls that have asked about this particular match-up.

The New York City mayoral race is much more up in the air. The Democratic primary polls taken this month by Siena and Quinnipiac don't agree on much except the fall of Anthony Weiner from the top tier of candidates.

The rules of the primary state that if a single candidate gets more than 40% of the vote, he or she is declared the winner. Otherwise the top two candidates are in a run-off. Here are the results for the top three candidates in the most recent polls.

Siena (8/2 to 8/7)
Quinn 25%
De Blasio 14%
Thompson 16%

Quinnipiac (8/7 to 8/12)
De Blasio 30%
Quinn 24%
Thompson 22%

My Confidence of Victory method does have ways of turning each of these sets of numbers into probabilities of victory for all three of these candidates, and as you might expect the Siena numbers from earlier in the month would make Quinn the prohibitive favorite while the Quinnipiac numbers would hand the big favorite role to De Blasio.

Quinn has been the favorite for several months, never polling less than second. Similar to the 2012 Republican presidential primary, there is a remarkable amount of what I call "churn" in the numbers. If I stretched the analogy a little farther (perhaps too far, I can't be sure), this would put Christine Quinn, the openly gay speaker of the New York City Council, in the role of Romney, with Weiner and De Blasio, the New York City public advocate and considered the most liberal candidate, in the roles of the many Not Romneys that the Republican field had to offer.

The last five elections for mayor have been won by a Republican (Giuliani) and a former Republican turned Independent (Bloomberg), but the current assumption is that the star power is gone from the Republican ranks and the Democratic primary winner will move to Gracie Mansion.

I've read analyses by Harry Enten and Nate Silver, both of whom are very keen on historical track records. Maybe it's because I'm older than either of them or I've just seen so many predictions done on flimsy data turn out so badly, but I'm not convinced the historical record is the right thing to look at. An issue like "stop and frisk" has a strong chance to be a game changer, and that would work to De Blasio's advantage.

Enten believes it's all about ethnic politics, which gives Thompson, who is black, an edge over De Blasio. Silver thinks Quinn's position as front runner for most of the year gives her the inside track. I claim no expertise about New York politics, but I believe more strongly in event driven elections being the correct modern model and that historical data is often so much balloon juice.

I'll post again when there is new data.

I will keep track of these races through the primary on September 10, the likely run-off on October 1 and the general election on November 5.

Wednesday, August 7, 2013

Not much election news to report.

My Internet crashed on the day of the reporting of the Massachusetts special Senate election. My prediction said Ed Markey looked like a prohibitive favorite and he won. The median poll said he was leading by seven to ten points, and he won by ten.

The other two elections in the news are the governor's race in Virginia and the special Senate election in New Jersey. There hasn't been any news since the middle of July in Virginia, where McAuliffe is still leading in the median polls.

There is news from New Jersey, but it is of zero interest to those who want an exciting race. Newark mayor Cory Booker has a huge lead in the Democratic primary and former Bogota mayor Steve Lonegan looks to be a shoo-in in the Republican primary. In the general election, Booker has a massive lead as well, though there is a long time between now and the October 16 special election, strangely scheduled for a Wednesday.

Thursday, August 1, 2013

Infinite sums of power series

Yesterday, it was shown that 1/2 + 1/4 + 1/8 + ... = 1. Today we are going to look at the power series x^0 + x^1 + x^2 + ... for any number x between -1 and 1 (not including 0).

We are going to use a trick called telescoping.

(1 - x)(1 + x + x²) can be split into the positive terms and the negative terms.

Positive: 1 + x + x²
Negative: - x - x² - x³

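Two of the positive terms cancel with two of the negative terms and we are left with just 1 - x³.

The same cancellation works no matter how many terms we take: (1 - x)(1 + x + x² + ... + x^n) = 1 - x^(n+1). If we have a number x between -1 and 1, as the powers increase towards infinity the power x^(n+1) gets closer to zero, so (1 - x) times the infinite sum is equal to 1. Dividing both sides by (1 - x) tells us the sum will be equal to 1/(1 - x).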

For example, if x = ½, the sum 1 + 1/2 + 1/4 + 1/8 + ... = 1/(1 - ½) = 1/½ = 2.

If x = -½, the sum 1 - ½ + 1/4 - 1/8 + ... = 1/(1 - (-½)) = 1/(3/2) = 2/3.
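
Both the telescoping step and the two examples can be checked with a short sketch. The function name partial_sum and the choice of 60 terms are mine, picked only so the leftover power is far too small to see.

```python
from fractions import Fraction

def partial_sum(x, terms):
    """1 + x + x² + ... using the first `terms` terms."""
    return sum(x**n for n in range(terms))

# Telescoping, checked exactly: (1 - x)(1 + x + x²) = 1 - x³
x = Fraction(1, 2)
print((1 - x) * partial_sum(x, 3) == 1 - x**3)  # True

# The partial sums settle on 1/(1 - x)
for x in (0.5, -0.5):
    print(x, partial_sum(x, 60), 1 / (1 - x))
```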

There are many other methods used in infinite sums, but this is one of the most basic.

Wednesday, July 31, 2013

A short introduction to infinite sums.

At first blush, it might seem that if you add an infinite number of positive numbers together, the sum must be unbounded. But if the numbers are getting small enough fast enough, the sum of an infinite series can be both finite and exactly defined.

Probably the simplest infinite sum is the powers of ½. Let's say that half of a room is painted in the first hour, and half of what remains is painted in the second hour, so now 3/4 of the room is painted. For reasons unknown, you decide to let this increasingly lazy person continue their plan of painting less and less each hour, one eighth in the third hour, one sixteenth in the fourth hour, on and on infinitely. I say infinitely because mathematically the room is never finished. (In reality, we will get to a sliver so tiny that it must be painted completely, if only because a molecule of paint once split isn't a molecule of paint anymore.)

In math, this is well understood now to be an infinite sum that adds up to 1. You might fairly say "we never get there", which is true. This was the point of one of the famous paradoxes of the ancient Greek mathematician and philosopher Zeno. But in modern math we have the concept of a limit as n approaches infinity. We can agree that the sum never gets to be more than 1, and the idea of a limit is that you get to choose how close you want the sum to be to 1 and I have to give a number of steps which will guarantee a sum that is closer than your given definition of "close enough". For example, if we want the sum to be within one millionth of 1, which is to say larger than .999999 but still less than 1, I can guarantee that after twenty steps, the sum will be that large, because ½ raised to the twentieth power is slightly less than one millionth.
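The "you pick the closeness, I pick the number of steps" game can be sketched in a few lines of Python. The name steps_needed is just an illustrative label. It relies on the fact that after n steps the unpainted part of the room is exactly ½ raised to the n-th power.

```python
def steps_needed(tolerance):
    """Smallest n for which the unpainted part, (1/2)**n, is below tolerance."""
    n = 0
    gap = 1.0  # fraction of the room still unpainted
    while gap >= tolerance:
        n += 1
        gap /= 2  # each hour paints half of what remains
    return n


print(steps_needed(1e-6))  # prints 20
```

Ask for a tighter tolerance and the function simply returns a larger number of steps, which is the whole idea of the limit.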

Tomorrow, one of the tricks used for the general power series of positive numbers less than 1.

Monday, July 29, 2013

The top of the bell shaped curve

This is the bell shaped curve for x ranging from -3 to 3. Note that this is NOT the normal curve, the famous workhorse of statistics. In math, "normal" usually means there is something about the object whose measure is 1. This curve has a highest point at (0,1) but the normal curve has an area between the curve and the x-axis of 1. (In math, we would talk about this as the value of the definite integral from negative infinity to infinity.)

While this is not the normal curve, one calculus-related attribute it shares with the normal curve is that the points of inflection are at x = -1 and x = 1.

Here is how the curve compares to the top of the unit circle from -1 to 1 and the parabola y = 1 - x² over the same range.

The curve that is very close to the same shape when the bottom points (-1, 0) and (1, 0) and top point (0, 1) are lined up is the cosine function. The bell shaped curve is shown in blue dots and the cosine function in purple dashes. The most they disagree by is about .00647, a little more than six parts in one thousand. Still, they do disagree and are not the same function as far as mathematicians are concerned.
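Here is a rough Python sketch of that comparison. It rests on two assumptions of mine that the post does not spell out: that the bell shaped curve is y = e^(-x²/2), which does have its peak at (0, 1) and inflection points at x = -1 and x = 1, and that "lined up" means shifting and stretching the piece between the inflection points so its ends sit at height 0 and its peak at height 1. The cosine is taken to be cos(πx/2), which passes through the same three points.

```python
import math


def bell_top(x):
    """Top of exp(-x**2 / 2) on [-1, 1], rescaled so the ends are at 0 and the peak at 1.

    Both the formula and the rescaling are assumptions made for this sketch.
    """
    edge = math.exp(-0.5)  # height of the curve at x = -1 and x = 1
    return (math.exp(-x * x / 2) - edge) / (1 - edge)


def cosine_arch(x):
    """Cosine arch through (-1, 0), (0, 1) and (1, 0)."""
    return math.cos(math.pi * x / 2)


def max_gap(samples=2001):
    """Largest absolute difference between the two curves on an even grid over [-1, 1]."""
    xs = [-1 + 2 * i / (samples - 1) for i in range(samples)]
    return max(abs(bell_top(x) - cosine_arch(x)) for x in xs)


print(max_gap())
```

Under these assumptions the largest gap comes out between .006 and .007, in the same neighborhood as the figure quoted above. The exact digits depend on where the curves are sampled.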

Sunday, July 28, 2013

The catenary and the parabola.

If the two ends of a chain are held fast at the same height, what is the shape of the hanging section? This question has been around for a long time. Galileo speculated it was a parabola, but some experimentation showed him that wasn't accurate.

The name of the shape is the catenary. The simplest equation for the shape is the hyperbolic cosine function, y = ½(e^x + e^-x). Unlike sine and cosine that oscillate up and down, hyperbolic cosine has just the one lowest point, like a parabola, and hyperbolic sine looks something like the cubic. The thing that makes them sort of like the trig functions is that each is the derivative and anti-derivative of the other. (I write "sort of" because the derivative of sine is cosine, but the derivative of cosine is negative sine.)

As pretty as the tangent line representations can be, looking at the actual functions is clearer in this case.

Red curve is the catenary from x = -3 to 3.

The blue shape is y = x² + 1.

The 1 is added so the two curves share the same lowest point and coincide more closely. A catenary grows slightly more slowly than the parabola near the vertex, but it is built from exponential functions, so when it starts to grow faster, it will overwhelm a quadratic function very quickly.

This drawing is the catenary in red and the parabola in blue yet again, but this time from -5 to 5. The shape of the hanging chain depends on the ratio of the length of the chain to the distance between the points from which it hangs. Getting an exact comparison of the catenary and the parabola for a given length and distance is harder to compute and describe, as it involves the arc length formulas for the parabola and the catenary, which turn into messy integrals.
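A quick table of values shows the race between the two curves. This is only a sketch: it uses Python's built-in math.cosh for the catenary and the parabola y = x² + 1 from the drawing, and the function names are mine.

```python
import math


def catenary(x):
    """Hyperbolic cosine, y = (e**x + e**-x) / 2."""
    return math.cosh(x)


def parabola(x):
    """The comparison parabola y = x**2 + 1."""
    return x * x + 1


# Tabulate both curves at a few whole-number values of x.
for x in (0, 1, 2, 3, 4, 5):
    print(x, round(catenary(x), 3), parabola(x))
```

The parabola is ahead at x = 1 and x = 2, the catenary edges past it just before x = 3, and by x = 5 the catenary is nearly three times as high.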

Tomorrow we will look at the top of the bell shaped curve before the second derivative goes to zero and compare it to the top of a circle.

Saturday, July 27, 2013

Tangents of the lower half of a circle

The bottom half of a circle has tangents that don't look anything like the tangents of a parabola or the trough of a sine wave. Most notably, the last two tangent lines, the one on the left and the one on the right, are both vertical.

Both the parabola and the circle have tangents that are neatly spaced out, but the parabola seen here only has slopes that range from -2 to 2, while the circle has slopes from negative infinity to infinity.
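The difference in slopes can be checked with a short Python sketch. I am assuming the parabola in the picture is y = x² on -1 to 1, which matches the slopes of -2 to 2, and that the circle is the lower half of the unit circle, y = -sqrt(1 - x²). Both are my guesses, and the function names are just labels.

```python
import math


def parabola_slope(x):
    """Slope of the assumed parabola y = x**2."""
    return 2 * x


def circle_slope(x):
    """Slope of y = -sqrt(1 - x**2), valid for -1 < x < 1.

    At x = -1 and x = 1 the tangent is vertical and the slope is undefined.
    """
    return x / math.sqrt(1 - x * x)


# The circle's slopes blow up near the ends while the parabola's stay modest.
for x in (-0.999, -0.9, -0.5, 0.0, 0.5, 0.9, 0.999):
    print(x, parabola_slope(x), round(circle_slope(x), 3))
```

The sample points stop just short of -1 and 1 on purpose, since dividing by zero there would raise an error.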

Tomorrow, a somewhat lesser known curve sometimes mistaken for a parabola, the catenary.

Friday, July 26, 2013

tangents of a sine wave trough

These are the tangent lines defined by the trough of a sine wave, with x between -pi and 0. Over this particular stretch, the curve might be hard to distinguish from the parabola with x varying from -1 to 1, but a close look at the tangent patterns should show how the two curves are different. Notice where the first downward sloping red tangent line meets the last upward sloping purple line.

Here in the sine wave curve, the first few tangents on the left are almost on top of each other, as are the last few tangents on the right.

Here are the parabola tangents again. The slopes of these tangents are increasing steadily, while the sine curve tangents max out at a slope of 1 and begin to get smaller as we leave the trough.
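The bunching of the tangents at the two ends can be seen in the numbers. The slope of the sine curve at x is cos(x), so this small Python sketch lists the slopes at nine evenly spaced points from -pi to 0. The spacing of pi/8 is my own choice and not necessarily the one used in the drawing.

```python
import math


def sine_slope(x):
    """Slope of y = sin(x) at x, which is cos(x)."""
    return math.cos(x)


# Nine evenly spaced points across the trough, from -pi to 0.
xs = [-math.pi + i * math.pi / 8 for i in range(9)]
for x in xs:
    print(round(x, 3), round(sine_slope(x), 3))
```

The slopes start at -1, pass through 0 at the bottom of the trough and finish at 1, and they change most slowly near the two ends, which is why the end tangents nearly overlap.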

Tomorrow, another famous "round valley" curve, the bottom half of a circle.