Lies, Damn Lies and Statistics
For those of you who remember your intro statistics class: out of curiosity about the general election, I ran a quick linear regression in Excel to predict, for each state, what % would vote for McCain based on just 2 inputs: per capita income and % of the population that is black.

I did it because McCain supporters have a stereotype among my "liberal elite" peers of being poor white trash.

Well... the regression was significant, and those 2 variables explained about 43% of the state-to-state variation in the % polling for McCain, with the linear model predicting:

% Vote for McCain = 44.9% - 0.92% x Per capita income in thousands above the national mean - 0.11% x % of population that is black.

Still, the income variable was very significant (p < 0.0001), while the % black variable wasn't significant, so I tossed it out and re-ran with just income:

% Vote for McCain = 44.2% - 1.05% x Per capita income in thousands above the national mean

For example, for a state whose per capita income is $5K/year above the national average, the model predicts that McCain would get 44.2% - 5.25% = 38.95%, or about 39% of the vote. For a state whose per capita income is $10K/year below the national average, the income term flips sign, so the model predicts that McCain would get 44.2% + 10.5% = 54.7% of the vote.
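If you'd rather check that arithmetic in code than by hand, here is a minimal Python sketch of the income-only model. The coefficients are the ones quoted above; the function and constant names are just my own labels for illustration.

```python
# Coefficients from the income-only model quoted in the post.
INTERCEPT = 44.2  # predicted % for McCain at the national mean income
SLOPE = -1.05     # change in % per $1K of per capita income above the mean


def predict_mccain_share(income_diff_thousands):
    """Return the predicted % vote for McCain.

    income_diff_thousands: state per capita income minus the national
    mean, in thousands of dollars (negative if the state is below the mean).
    """
    return INTERCEPT + SLOPE * income_diff_thousands


if __name__ == "__main__":
    # The two worked examples: $5K above and $10K below the national mean.
    print(round(predict_mccain_share(5), 2))
    print(round(predict_mccain_share(-10), 2))
```

Passing a negative number for a below-average state is what turns the subtraction into an addition in the second example.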

Read into it what you will, but the poor = McCain supporter stereotype isn't exactly getting contradicted by the numbers.

For you stats geeks, yes, I'd admit that a logit model would be better, along with jackknife estimates for each state (building the model excluding state X, predicting for state X, then comparing the prediction to the actual number to see if the state stands out), but explaining odds ratios from a logit model is a bitch, and I didn't have anything other than Excel at my fingertips.
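For the curious, the jackknife (leave-one-out) check described above could be sketched roughly like this in plain Python. This is not what I ran in Excel; the function names are hypothetical, it uses a plain linear fit rather than a logit, and the numbers in the demo are made up for illustration, NOT the real state data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def leave_one_out_errors(xs, ys):
    """For each point i, fit without i, then return actual - predicted for i."""
    errors = []
    for i in range(len(xs)):
        # Drop point i from the training data.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        errors.append(ys[i] - (a + b * xs[i]))
    return errors


if __name__ == "__main__":
    # Made-up illustrative numbers, NOT the real state data.
    income_diff = [-8.0, -4.0, -1.0, 0.0, 2.0, 5.0, 9.0]
    mccain_pct = [53.0, 48.5, 45.0, 44.0, 60.0, 39.0, 35.0]
    errors = leave_one_out_errors(income_diff, mccain_pct)
    for x, err in zip(income_diff, errors):
        print(f"income diff {x:+.1f}K: held-out error {err:+.1f} points")
```

A state with a large held-out error is one the rest of the country's data can't explain, which is the point of leaving it out before predicting it.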

For all of you outlier junkies, the interesting states where the model fits the worst (misses by more than 10 percentage points in either direction) fall into three groups: New England, where the model overpredicts McCain support (Maine, Vermont, Rhode Island); the cowboy-country Midwest, where the model underpredicts McCain support (Nebraska, Oklahoma, Utah, Wyoming); and candidate home states (Hawaii & Alaska):

Home State Misses:
Alaska (underpredicts McCain support by 12% vs. polls - likely due to Palin)
Hawaii (overpredicts McCain support by 14% vs. polls - due to Obama growing up there)

New England Misses:
Maine (overpredicts McCain support by 11% vs. polls)
Rhode Island (overpredicts McCain support by 12.5% vs. polls)
Vermont (overpredicts McCain support by 12% vs. polls)

Midwest Misses:
Nebraska (underpredicts McCain support by 11.5% vs. polls)
Utah (underpredicts McCain support by 12% vs. polls)
Wyoming (underpredicts McCain support by 18% vs. polls)
Oklahoma (underpredicts McCain support by 13% vs. polls)
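
For anyone who wants to reproduce this kind of list, here is a rough Python sketch of the flagging rule: compare each state's actual % with the model's prediction and keep the ones that miss by more than 10 points. The function name, the placeholder states, and the percentages are all made up for illustration; the 10-point threshold is the one used above.

```python
def flag_misses(actual, predicted, threshold=10.0):
    """Return {state: residual} where |actual - predicted| > threshold.

    A positive residual means the model underpredicted McCain support;
    a negative residual means it overpredicted.
    """
    flagged = {}
    for state, actual_pct in actual.items():
        residual = actual_pct - predicted[state]
        if abs(residual) > threshold:
            flagged[state] = residual
    return flagged


if __name__ == "__main__":
    # Placeholder states and made-up percentages, for illustration only.
    actual = {"State A": 61.0, "State B": 45.0, "State C": 33.0}
    predicted = {"State A": 47.0, "State B": 44.0, "State C": 46.0}
    for state, residual in flag_misses(actual, predicted).items():
        direction = "underpredicts" if residual > 0 else "overpredicts"
        print(f"{state}: model {direction} McCain support by {abs(residual):.1f} points")
```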

What do the electoral votes add up to?
