Election analysis

November 2022
 

Summary
Outcome
Project steps
Project files

Summary

In election campaigns, voters essentially come in two varieties: “persuadables” and “committeds.” In general election campaigns, the vast majority of voters are “committeds,” though the exact percentage depends largely on the type of election. Campaign organizations often understand “committeds” as voters who will either always or never vote for their candidate. (This is based almost exclusively on which party the voter prefers.)

One might expect “persuadables” (often called “independents” or “swing voters,” because they swing from one party to another between elections) to make up a larger share of the electorate, perhaps on the assumption that voters don’t make up their minds until they’ve considered the candidates carefully. In reality, most general election voters have ruled out one of the two major-party candidates early on: they’d have a lot of trouble voting for the Republican or, conversely, the Democrat.

Put simply, campaign organizations try to persuade the “persuadables” and to identify and motivate the “committeds” to come out and vote.

In this project, a class assignment from November 2022, I evaluated Presidential election data from the MIT Election Data Science Lab. I imagined myself a data analyst for some political group interested in identifying persuadable voters and persuading them to vote for a particular Presidential candidate in, say, the 2024 election. My task was to identify US counties where the group might have an impact.

Excel
Power Query
Tableau

Outcome

The data covers some 3,000 US counties, cities, and parishes. In the course of my analysis, I decided to focus on a select group of states from the “Rust Belt.” Within that subset, I identified slightly more than three dozen counties that showed the highest degree of “swing” across the Presidential elections between 2000 and 2020.

From here, my hypothetical political group would look at these counties and assess which ones it was best suited to influence. Those decisions would most likely be based on factors such as shared demographics or members’ ties to the counties.
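
The write-up does not say exactly how “swing” was measured, so the following is only a minimal Python sketch of one plausible definition, not the method used in the project. It assumes swing means how often a county's winning party flipped between consecutive elections, with the spread of the Democratic two-party margin as a tie-breaker. The function name `swing_scores`, the row layout, the county codes, and all vote counts are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (county_fips, year, dem_votes, rep_votes).
# The codes and numbers are made up for illustration, not taken from the MIT data.
rows = [
    ("00001", 2000, 520, 480),
    ("00001", 2004, 490, 510),
    ("00001", 2008, 530, 470),
    ("00002", 2000, 700, 300),
    ("00002", 2004, 690, 310),
    ("00002", 2008, 710, 290),
]


def swing_scores(rows):
    """Return {county_fips: (flips, margin_range)}.

    flips        = number of times the leading party changed between
                   consecutive elections (an exact tie counts as not Democratic)
    margin_range = max minus min of the Democratic two-party margin
    """
    margins = defaultdict(list)
    # Sort by county, then year, so each county's margins are in election order.
    for fips, year, dem, rep in sorted(rows, key=lambda r: (r[0], r[1])):
        margins[fips].append((dem - rep) / (dem + rep))

    scores = {}
    for fips, series in margins.items():
        flips = sum(1 for a, b in zip(series, series[1:]) if (a > 0) != (b > 0))
        scores[fips] = (flips, max(series) - min(series))
    return scores


scores = swing_scores(rows)
# Counties ordered from most to least "swingy" under this assumed definition.
ranked = sorted(scores, key=lambda f: scores[f], reverse=True)
```

In this toy data the first county changes hands twice and ranks ahead of the second, which never flips. A different definition of swing (for example, average change in margin) would produce a different ranking.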

Project steps
  1. Mine the data.

    I found county election results from the MIT Election Data Science Lab for the Presidential elections between 2000 and 2020. At the same site, I found data for the 50 states and the District of Columbia for the Presidential elections between 1976 and 2020.

    I also was curious about turnout in Presidential elections and ultimately found data in Vital Statistics on American Politics.
     
  2. Wrangle the data.

    I imported the CSV files into Excel using Power Query, which gave me maximum control over data types and the import process.

    In reviewing the data, I discovered that the 2000 election results from Florida did not match the final outcome, making it appear that Al Gore had won the election. Fortunately, Wikipedia had the correct results, which I used to fix the data.

    A handful of counties were identified inconsistently across election years, and those identifications needed to be fixed.

    Finally, I “denormalized” the data so that it would be easier to analyze in Tableau. I also added some calculations (such as percent of votes and the winner by county, by state, and overall).
     
  3. Analyze the data.

    The finished results from Step 2 were exported as CSV files for use in Tableau. I find that Tableau’s performance is better with CSV files than with Excel workbooks.

    I created a series of visualizations of Presidential elections from 1976 to 2020, as well as a single chart showing Presidential election voter turnout since 1792.
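
The wrangling in Step 2 was done in Excel with Power Query. For readers who prefer code, the denormalizing and the added calculations (percent of votes, winner by county) could be sketched in Python roughly as follows. This is an illustration under assumptions, not the project's workflow: the column names, the function `denormalize`, the county code, and the vote counts are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical long-format extract: one row per county, year, and party.
# Column names and vote counts are illustrative assumptions.
raw = """year,county_fips,party,votes
2016,00001,DEMOCRAT,400
2016,00001,REPUBLICAN,500
2016,00001,OTHER,100
2020,00001,DEMOCRAT,550
2020,00001,REPUBLICAN,450
"""


def denormalize(csv_text):
    """Collapse long rows into one wide row per (county, year)."""
    by_key = defaultdict(lambda: defaultdict(int))
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Keep the county code as text so leading zeros survive.
        key = (row["county_fips"], int(row["year"]))
        # Sum in case a party appears on more than one row for a county-year.
        by_key[key][row["party"]] += int(row["votes"])

    wide = []
    for (fips, year), votes in sorted(by_key.items()):
        total = sum(votes.values())
        record = {"county_fips": fips, "year": year, "total_votes": total}
        for party, count in votes.items():
            # Percent of votes for each party, one column per party.
            record[f"{party.lower()}_pct"] = round(100 * count / total, 1)
        # Winner = party with the most votes in this county and year.
        record["winner"] = max(votes, key=votes.get)
        wide.append(record)
    return wide


wide_rows = denormalize(raw)
```

Each element of `wide_rows` holds one county-year with a percentage column per party and a `winner` field, which is the kind of flat layout that is easy to load into a visualization tool. Writing the rows back out with `csv.DictWriter` would give a CSV file comparable to the export described in Step 3.
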
Project files