Why Nate Silver’s Forecasts Are Bad


Much has been made of Nate Silver’s forecasts regarding the 2016 Democratic presidential primary contest.  If you are a Sanders supporter you are probably sick and tired of people citing articles like this one, where Silver says that the chances of Bernie winning are very slim.  This article by Paul Krugman, titled ‘Feel the Math’, has been turned into a hashtag.

Annoying Hillary Supporter

It seems like the more Sanders wins, the louder these calls for him to pull out get.  I love Krugman more than most liberals do, because I studied his work extensively while getting my economics degree, but at the same time, it seems like Hillary supporters don’t understand math.  Or rather, they do understand math, but they do it badly in order to support their candidate.

Hillary’s only platform seems to be that Bernie is not a viable candidate.  Her supporters can’t debate Sanders supporters on issues, so they try this magic numbers gambit to make us lose heart so we give up hope and vote for Hillary.  The more viable Bernie becomes the louder they have to shout that he can’t win, which is what motivates them to cynically stick their fingers in their ears and shout “lalala” while doing math badly.  They don’t see or don’t care that Bernie is winning by large margins in recent weeks, and has raised more money than Hillary since January.

That’s why I’m explaining the math here, rather than brushing Hillary supporters aside like the 9/11 truthers they behave like.  There are much better ways to support a candidate than to do math badly.  I hope they find one soon, because we can’t win the general election if we can’t unify as a party behind Bernie Sanders.

Let’s begin with some facts.

Fact 1:  Nate Silver is correct in over 90% of his predictions

Sorry, fellow Bernie supporters, but it’s true.  If you look purely at Nate Silver’s predictions, he predicts the winner of any given primary contest accurately 90.48% of the time, on average.  When he predicts what percentage of the vote will go for Hillary Clinton in any given state contest, she underperforms his estimates by only 0.87 percentage points, on average.  Bernie Sanders overperforms his forecasts by an average of 2.63 percentage points.  Sure, there are contests where Silver is off by as much as 11 or 12 percentage points, like in Michigan, but overall, the mean of these errors is less than 10 percentage points.  Those numbers are pretty good.

Fact 2:  Nate Silver maintains his winning record by cherry picking the contests he makes predictions for.

There are 57 total Democratic presidential primary contests in 2016, including territories like Guam and Democrats Abroad contests.  Nate Silver has made 26 predictions about these state contests, total, so far.  5 of those predictions are about future contests, and 21 of them are about past contests.

When he predicts who will win a contest, Silver has so far predicted Clinton wins with 100% accuracy.  He has only predicted Sanders wins with 50% accuracy.  His cherry picking turns this record into a success rate of over 90%.
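The arithmetic behind that record can be checked with a few lines of Python.  This is only a sketch: the post gives the 21 past predictions, the 90.48% overall hit rate, and the 100% and 50% rates for Clinton and Sanders calls, but it does not say how many calls went to each candidate, so the split below is inferred from those figures rather than read off Silver’s data.  All names in the code are made up for illustration.

```python
# Figures quoted in the post.
PAST_PREDICTIONS = 21
CLINTON_CALL_ACCURACY = 1.0   # Clinton wins predicted with 100% accuracy
SANDERS_CALL_ACCURACY = 0.5   # Sanders wins predicted with 50% accuracy
OVERALL_ACCURACY_PCT = 90.48  # overall hit rate quoted under Fact 1


def overall_accuracy_pct(clinton_calls, sanders_calls):
    """Blend the two per-candidate hit rates into one overall percentage."""
    correct = (clinton_calls * CLINTON_CALL_ACCURACY
               + sanders_calls * SANDERS_CALL_ACCURACY)
    return 100 * correct / (clinton_calls + sanders_calls)


# Assumption: every past prediction named either Clinton or Sanders as winner.
# Try every possible split and keep the ones that reproduce 90.48%.
consistent_splits = [
    (clinton_calls, PAST_PREDICTIONS - clinton_calls)
    for clinton_calls in range(PAST_PREDICTIONS + 1)
    if round(overall_accuracy_pct(clinton_calls,
                                  PAST_PREDICTIONS - clinton_calls), 2)
    == OVERALL_ACCURACY_PCT
]

print(consistent_splits)  # (Clinton calls, Sanders calls) pairs that fit
```

Under these assumptions the only split that fits is 17 Clinton calls and 4 Sanders calls, which is why a 50% record on Sanders calls barely dents the overall number.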

Pie Chart-o-rama

I put together a series of pie charts showing the proportion of contests Silver has predictions for, because we all need a little more pie in our diets.

[Pie charts: Total Contests Forecasted, Past Contests Forecasted, Future Contests Forecasted]

I know what you Hillary supporters are saying… “Nate Silver’s argument against Sanders is that he can’t win enough delegates in the future, so the number of contests doesn’t matter, it’s the delegates that matter!”  Well, I made a special pie chart just for that argument.  It shows that Nate Silver only forecasts about 60% of the remaining delegates.

[Pie chart: Future Delegates Forecasted]

What does this mean?  Let’s get into the math…

Counting Delegates

There are 2073 delegates left to win.  Sanders needs 1403, and Clinton needs 1140 of them to win the nomination outright, without an open convention.  To put it another way, to win, Sanders needs to get 67.68% of the remaining delegates, and Clinton needs to get 54.99% of the remaining delegates.  Silver has made predictions for about 60% of the remaining delegates, which is barely enough for Hillary to win the party nomination.

This means that Silver has forecasted that Hillary can win, but he has not forecasted whether or not Sanders can win, because he’s almost 8 percentage points shy of predicting the delegates Sanders needs to win.
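For anyone who wants to check the delegate arithmetic, here is a minimal Python sketch.  The delegate counts are the ones quoted above, the 60% coverage figure is the rounded value from the pie chart, and the function and variable names are only illustrative.

```python
REMAINING_DELEGATES = 2073
DELEGATES_NEEDED = {"Sanders": 1403, "Clinton": 1140}
FORECAST_COVERAGE_PCT = 60.0  # approximate share of remaining delegates Silver forecasts


def required_share_pct(needed, remaining):
    """Percentage of the remaining delegates a candidate must win."""
    return 100 * needed / remaining


for candidate, needed in DELEGATES_NEEDED.items():
    share = required_share_pct(needed, REMAINING_DELEGATES)
    # A positive shortfall means the forecasts cover fewer delegates
    # than this candidate needs to clinch the nomination.
    shortfall = share - FORECAST_COVERAGE_PCT
    print(f"{candidate}: needs {share:.2f}% of remaining delegates, "
          f"coverage shortfall {shortfall:+.2f} points")
```

The shortfall is positive for Sanders (about 7.68 points) and negative for Clinton, which is the asymmetry described above.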

Reality vs Nate Silver’s Democratic Primary

Silver doesn’t see the same election happening that you and I see, because he does not track very many contests where Sanders wins.  The chart below shows the real election so far.  It looks pretty tight to me, with Sanders and Clinton trading leads all the time.

[Chart: Full Results]

The next chart shows only the contests that Silver watched.

[Chart: Silver’s Results]

See the difference?  Silver is only seeing Bernie beating Hillary a handful of times, because he’s not watching when Bernie wins an election.  With Silver’s data set I might be inclined to think that Sanders doesn’t have a chance to win the nomination, too, but I have access to better information than Silver has, because I watch the news.

Cherry Picking vs Lack of Information

On his web site, Silver says that he does not predict contests that he lacks polling data for.  Is that fair?  I don’t think so.

Nate Silver is a professional forecaster, working for FiveThirtyEight, which is owned by ESPN, one of the biggest, most well-funded media organizations on the planet.  He should be able to either pick up a phone and find out how the election is going in Guam or at least be able to make a qualitative prediction, without any polls.  Qualitative predictions are difficult to make well, but they are easy to make.  We make them all the time.  Nobody makes a spreadsheet to decide whether or not to carry an umbrella to work in the morning; we just look outside and make a judgement call about it.

Here’s an example:  I’m not a professional forecaster, and I haven’t seen any polls for the Puerto Rico primary.  I know that Puerto Rico pays no federal taxes, but they do receive federal benefits.  That means that they love the kinds of social programs that Sanders advocates for, so Sanders should win big in Puerto Rico.  I don’t know by how much, and you can say my prediction lacks rigor, but I think most people can agree that it’s not a completely irrational prediction.

If we assume that good polls truly do not exist for the contests that Nate Silver ignores, then why can’t he do good qualitative analysis?  Is he deliberately ignoring Sanders wins in order to campaign for Hillary, or is he really hamstrung by his training as a quantitative analyst and the limitations of modern opinion polling processes?  I don’t know, but I’d like to give him the benefit of the doubt and say he’s trapped by training and circumstances.

That sympathy doesn’t mean that I trust Silver’s forecasts.  It just means that I understand why he’s having problems.  As a professional forecaster, he needs to address his qualitative blind spot before I trust his forecasts.

Clinton vs Sanders vs The Polls

Some people think that Silver is deliberately skewing his results to favor Clinton.  For example, CounterPunch seems to really have it in for Silver.  They say he skews data to favor Clinton by 12.8%.  I think that looks like a little spin on their part.

If we look only at the opinion polls that Silver follows, we find that both candidates overperform the opinion poll projections.  As we move closer to a contest, they overperform by less and less, but they still overperform.

The graph below shows that Clinton overperforms Silver’s opinion polls by between 2.27 percentage points and 3.79 percentage points.  Sanders overperforms Silver’s opinion polls by between 5.08 percentage points and 19.22 percentage points, depending on how far ahead of election day the opinion poll was released.

[Chart: Overperformance]

The data points I used for the horizontal axis correspond with the number of days before an election that the opinion poll was released.  I picked these numbers of days before the election for technical reasons, which I explain in my third note on methodology below.

For now, what you have to know about the chart above is that the orange line above zero on the far left represents the number of percentage points Sanders overperformed opinion polls released on election day, and the blue line represents the number of percentage points Clinton overperformed an opinion poll that was released on election day.  On the far right, the orange line above 72 shows how much Sanders overperformed an opinion poll released 72 days before an election, and the blue line shows how much Clinton overperformed an opinion poll released 72 days before election day.

Polls that are released 72 days before an election are wildly inaccurate.  Right now, the California opinion poll released on 3/27/2016 shows Clinton with 47.7% support and Sanders with 39.8% support.  That poll was released 72 days before the California primary election.  If this poll has the same predictive power as previous opinion polls that were released 72 days before an election, then Sanders will win California by 8.05 percentage points.  I personally think that Sanders will win by more than that, but if we just go by the polls that Silver follows, and expect them to perform in the future as well as they have performed in the past, then Clinton will not win California.
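The adjustment works like this: add each candidate’s average overperformance to their poll number, then compare.  Here is a rough Python sketch.  The poll numbers are the ones quoted above, but the overperformance values are stand-ins: I am assuming the 72-day figures are the upper ends of the ranges quoted earlier (19.22 for Sanders, 3.79 for Clinton), which is not stated explicitly, so the result lands near the 8.05 figure rather than exactly on it.  The function name is made up for illustration.

```python
def projected_margin(sanders_poll, clinton_poll, sanders_over, clinton_over):
    """Sanders-minus-Clinton margin after adding each candidate's
    historical overperformance (all values in percentage points)."""
    sanders_projected = sanders_poll + sanders_over
    clinton_projected = clinton_poll + clinton_over
    return sanders_projected - clinton_projected


# California poll released 3/27/2016, as quoted in the post.
SANDERS_POLL = 39.8
CLINTON_POLL = 47.7

# Assumed 72-day overperformance values (upper ends of the quoted ranges).
SANDERS_OVER_72_DAYS = 19.22
CLINTON_OVER_72_DAYS = 3.79

raw_margin = SANDERS_POLL - CLINTON_POLL
adjusted_margin = projected_margin(SANDERS_POLL, CLINTON_POLL,
                                   SANDERS_OVER_72_DAYS, CLINTON_OVER_72_DAYS)

print(f"Raw poll margin:  {raw_margin:+.2f} points")
print(f"Adjusted margin:  {adjusted_margin:+.2f} points")
```

With these stand-in values the raw 7.9 point Clinton lead flips to a Sanders lead of roughly 7.5 points.  The exact figure depends on the precise 72-day averages from the chart.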

In Conclusion…

Sanders isn’t dropping out.  Nobody is abandoning him.  He has been consistently winning elections for the past month.  He is behind in delegates, but if he continues to win with even a fraction of the 42.27 percentage point margins he’s won with since mid-March, on average, then Clinton cannot win the nomination.

That said, the election is close.  Perhaps Sanders didn’t turn a corner.  Maybe Clinton will bounce back like she has this entire election cycle, with the candidates trading leads back and forth all the time.  In that case, this is still anybody’s contest.  If you look at the chart above, the one with the real election results (not Silver’s cherry picked results), you will see how close this election is.

This is not a blowout.  We have no clear winner right now.  Only 54.87% of the delegates have been pledged for either candidate, so Clinton supporters should worry less about the numbers and focus on what is important in a presidential election:  Issues.

If somebody likes Clinton’s stances on the issues then that’s ok, but deflecting into the realm of sacred geometry and magic numbers, based on flawed analysis from a forecaster with a weak record… that just makes them look stupid.  Don’t leverage our broken educational system to electioneer with bad math.  Mathematics is beautiful when done correctly.  Please, do a little research, discover what your candidate thinks about the issues and let the numbers fall where they may.  I have faith in Hillary supporters.  I know they aren’t as dumb as they look when they repeat Silver’s projections without thinking.  Just try to consider each candidate’s platform and support the one you agree with.


Notes on Methodology

Note 1:  I used this data from the New York Times for my information on delegates and outcomes in this election so far.  I used Nate Silver’s forecasts from here to do my analysis.

Note 2:  In this analysis I will not discuss unpledged delegates (AKA ‘super delegates’), because they simply don’t matter.  Unpledged delegates have not even voted in a Democratic presidential primary since 1984, and they can change their minds any time they want.  If it comes down to an open convention, they will go with the strongest candidate.  In that scenario, if Sanders wins the majority of pledged delegates, the unpledged delegates will flock to him.

Note 3:  I charted the election day performance of each candidate with respect to polls taken x number of days before the election, so the bottom axis represents days before the election.  The zero on the far left represents polls released on election day.

I began doing this analysis on April 3, 2016.  From April 3, the Wisconsin primary is 2 days away, Wyoming is 6 days away, New York is 16 days away, Connecticut, Delaware, Maryland, Pennsylvania and Rhode Island are 23 days away, Indiana is 30 days away, West Virginia is 37 days away, Kentucky and Oregon are 44 days away, Puerto Rico is 63 days away, California, New Jersey, New Mexico and South Dakota are 65 days away, and Washington DC is 72 days away.

In other words, I charted how accurate the polls have been before previous contests in order to try to predict how accurate they might be in future contests.  These numbers are the average of how much each candidate overperforms, so while Sanders might have overperformed polls that were 44 days old on election day by over 40 percentage points in Utah, he also underperformed on polls of the same age in Massachusetts by 0.197 percentage points.  I added up all the overperformance and underperformance numbers and took the mean.

I did this in order to do a regression, which I will probably blog about later, if I have interesting conclusions.
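The averaging described in Note 3 can be sketched in a few lines of Python.  The numbers in the example are invented purely to show the calculation; they are not taken from my spreadsheets or from any real contest, and the function names are only illustrative.

```python
from statistics import mean


def overperformance(actual_pct, polled_pct):
    """Points by which the election result beat the poll (negative = underperformed)."""
    return actual_pct - polled_pct


def mean_overperformance(results):
    """Average overperformance over (actual_pct, polled_pct) pairs taken from
    polls released the same number of days before their elections."""
    return mean(overperformance(actual, polled) for actual, polled in results)


# Hypothetical (actual %, polled %) pairs for one candidate and one poll age.
example_results = [
    (60.0, 50.0),  # beat the poll by 10 points
    (45.0, 47.0),  # fell short of the poll by 2 points
    (52.0, 48.0),  # beat the poll by 4 points
]

print(mean_overperformance(example_results))
```

Overperformance and underperformance are netted against each other before dividing, so one huge miss like Utah can pull the average up a long way.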

Note 4:  Take my spreadsheets!  Download them here, and do whatever you want with them.  I consider them public domain.  They aren’t pretty, but it’s a fun data set to play with.
