Chapter 7 State Voter Files

7.1 Motivation

Who one votes for is private, but when someone votes is a matter of public record. A state’s Secretary of State (SOS) maintains a public record of who voted. The accompanying data may vary state by state. For example, the state of New Hampshire does not record the age of the person voting. Companies such as Catalist have built their business around aggregating, cleaning, and enhancing public voter files. However, these databases can be quite expensive and are not updated at the same pace as the state voter file.

7.2 Overview

In this section we will demonstrate how to programmatically download and clean a state voter file. From there, we will identify potential voters and canvassing targets. Once those targets have been identified, they can be uploaded as a list into VAN, matched on their SOS ID, which should be available depending on your provider.

The OH SOS website provides links to the voter file, which is updated frequently. As of today (July 21st, 2019), these files were last updated on July 19th, 2019. We will combine the web scraping tools (rvest) that we learned previously with dplyr and data.table, a package that works exceptionally well with large data (see the wiki for a primer).

7.3 Exercise

As always, let’s set up our workspace.

library(rvest)
library(data.table)
library(dplyr)

The first step is to use rvest to identify the URLs that we can use to download the voter file. To do this, use the inspector tool in your web browser and identify the HTML elements of interest. In this use case, we will specify the highlight-row class and the a (anchor) tag. The a tag is used to create hyperlinks within a document, and its href attribute specifies the hyperlink's destination. We can extract the href attribute using rvest::html_attr().

7.3.1 Identifying download links

session <- html_session("https://www6.sos.state.oh.us/ords/f?p=VOTERFTP:STWD:::#stwdVtrFiles")

html_nodes(session, ".highlight-row a") %>% 
  html_attr("href")

## [1] "f?p=VOTERFTP:DOWNLOAD::FILE:NO:2:P2_PRODUCT_NUMBER:363"
## [2] "f?p=VOTERFTP:DOWNLOAD::FILE:NO:2:P2_PRODUCT_NUMBER:364"
## [3] "f?p=VOTERFTP:DOWNLOAD::FILE:NO:2:P2_PRODUCT_NUMBER:365"
## [4] "f?p=VOTERFTP:DOWNLOAD::FILE:NO:2:P2_PRODUCT_NUMBER:366"
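
To see what the selector is doing without hitting the live site, the same two calls can be run against a small hand-written HTML fragment. The markup below is invented for illustration (only the highlight-row class and the shape of the href values mirror the output above), so treat it as a sketch of the technique rather than a copy of the SOS page.

```r
library(rvest)

# Hypothetical HTML fragment: two rows carry the highlight-row class, one does not
toy_html <- '
<table>
  <tr class="highlight-row"><td><a href="f?p=VOTERFTP:DOWNLOAD::FILE:NO:2:P2_PRODUCT_NUMBER:363">File 1</a></td></tr>
  <tr class="highlight-row"><td><a href="f?p=VOTERFTP:DOWNLOAD::FILE:NO:2:P2_PRODUCT_NUMBER:364">File 2</a></td></tr>
  <tr class="other-row"><td><a href="about.html">About</a></td></tr>
</table>'

toy_page <- read_html(toy_html)

# ".highlight-row a" selects every <a> nested inside an element with that class
toy_hrefs <- toy_page %>%
  html_nodes(".highlight-row a") %>%
  html_attr("href")

toy_hrefs
```

The link in the row without the highlight-row class is not returned, which is why the class is part of the selector.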

Notice that this does not actually provide the full URL that is needed, but rather the relative path and query parameters. We can append each of these to the base URL (in this case https://www6.sos.state.oh.us/ords/) using purrr::map_chr(), which returns a character vector.

file_urls <- html_nodes(session, ".highlight-row a") %>% 
  html_attr("href") %>% 
  purrr::map_chr(~paste0("https://www6.sos.state.oh.us/ords/", .))

file_urls

## [1] "https://www6.sos.state.oh.us/ords/f?p=VOTERFTP:DOWNLOAD::FILE:NO:2:P2_PRODUCT_NUMBER:363"
## [2] "https://www6.sos.state.oh.us/ords/f?p=VOTERFTP:DOWNLOAD::FILE:NO:2:P2_PRODUCT_NUMBER:364"
## [3] "https://www6.sos.state.oh.us/ords/f?p=VOTERFTP:DOWNLOAD::FILE:NO:2:P2_PRODUCT_NUMBER:365"
## [4] "https://www6.sos.state.oh.us/ords/f?p=VOTERFTP:DOWNLOAD::FILE:NO:2:P2_PRODUCT_NUMBER:366"
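
Because paste0() is vectorised over its arguments, the same URLs can also be built without purrr. The sketch below uses two of the href values shown above; base_url, hrefs, and full_urls are just illustrative names.

```r
base_url <- "https://www6.sos.state.oh.us/ords/"
hrefs <- c(
  "f?p=VOTERFTP:DOWNLOAD::FILE:NO:2:P2_PRODUCT_NUMBER:363",
  "f?p=VOTERFTP:DOWNLOAD::FILE:NO:2:P2_PRODUCT_NUMBER:364"
)

# paste0() recycles the length-one base URL across every href
full_urls <- paste0(base_url, hrefs)
full_urls
```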

With that, we now have the file paths for the four statewide voter file data sets. For the sake of computation and memory, we will only work with the first URL.

7.3.2 Download the voter file

Now that we have the links to the voter file, we can use R to download them using download.file(). The first argument is the link to the file, the second is the destination of that file. Here we are saving the first file into the data folder. You can also do this iteratively using purrr::map().

# download the file as a .txt.gz (that's the file format if you download it from the web)
# mode = "wb" writes the file in binary mode, which matters on Windows
download.file(file_urls[1], destfile = "data/SWVF_1_22 (Adams-Erie).txt.gz", mode = "wb")
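
The iterative version mentioned above might look like the following sketch. The helper names (build_dest_paths(), download_voter_files()) and the numbered file-naming scheme are assumptions made for illustration; the SOS site's own file names differ. purrr::walk2() is used rather than map() because download.file() is called for its side effect, not its return value.

```r
# Build one destination path per URL, e.g. data/swvf_part_1.txt.gz (hypothetical naming)
build_dest_paths <- function(n_files, dest_dir = "data") {
  file.path(dest_dir, sprintf("swvf_part_%d.txt.gz", seq_len(n_files)))
}

# Download every URL to its matching destination path (hypothetical helper)
download_voter_files <- function(urls, dest_dir = "data") {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  dest_paths <- build_dest_paths(length(urls), dest_dir)
  # mode = "wb" keeps the compressed bytes intact on Windows
  purrr::walk2(urls, dest_paths, ~ download.file(.x, destfile = .y, mode = "wb"))
  invisible(dest_paths)
}

# Not run here, since it downloads several large files:
# download_voter_files(file_urls)
```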

Note that the file is in .gz compressed format. We can decompress it using R.utils::gunzip(), which writes the decompressed file alongside the archive and, by default, removes the .gz file.

R.utils::gunzip("data/SWVF_1_22 (Adams-Erie).txt.gz")

data.table reads text files using fread(). Since this file is very large, we will take only the first 10,000 observations by setting the nrows argument to 10000. Since a data.table is also a data.frame, we can combine data.table and dplyr functions.

7.3.3 Voter file exploration

swvf <- fread("data/SWVF_1_22 (Adams-Erie).txt", nrows = 10000)
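
The effect of nrows is easy to check on a small file. This sketch writes an invented five-row file to a temporary location (the real voter file is far larger) and reads back only the first three rows. The object names and values are made up for illustration.

```r
library(data.table)

# Invented toy data written to a temporary file
toy_path <- tempfile(fileext = ".txt")
toy <- data.frame(SOS_VOTERID = sprintf("OH%03d", 1:5),
                  COUNTY_NUMBER = c(6, 2, 6, 9, 18))
fwrite(toy, toy_path)

# Read only the first three data rows
toy_dt <- fread(toy_path, nrows = 3)

nrow(toy_dt)           # 3
is.data.frame(toy_dt)  # TRUE: a data.table is also a data.frame
```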

Do your due diligence and look at all 106 columns. This is a great time to embrace dirty data. Are the data tidy?

glimpse(swvf)

## Observations: 10,000
## Variables: 106
## $ SOS_VOTERID                   <chr> "OH0016238254", "OH0019414074", "O…
## $ COUNTY_NUMBER                 <int> 6, 2, 6, 9, 18, 13, 18, 18, 13, 4,…
## $ COUNTY_ID                     <int> 21511, 1010005, 40055, 482703, 204…
## $ LAST_NAME                     <chr> "KUETHER", "GEMLICK", "KITCHEN", "…
## $ FIRST_NAME                    <chr> "BARBARA", "JODI", "LESLIE", "AMAN…
## $ MIDDLE_NAME                   <chr> "A", "LYN", "L", "LEIGH", "J", "L"…
## $ SUFFIX                        <chr> "", "", "", "", "", "", "", "", ""…
## $ DATE_OF_BIRTH                 <chr> "1969-11-15", "1972-08-19", "1969-…
## $ REGISTRATION_DATE             <chr> "1998-02-23", "2007-01-29", "2008-…
## $ VOTER_STATUS                  <chr> "ACTIVE", "ACTIVE", "ACTIVE", "ACT…
## $ PARTY_AFFILIATION             <chr> "", "", "", "", "", "", "", "D", "…
## $ RESIDENTIAL_ADDRESS1          <chr> "725 OAKWOOD DR", "3464 WOODHAVEN …
## $ RESIDENTIAL_SECONDARY_ADDR    <chr> "", "", "", "", "", "", "", "", ""…
## $ RESIDENTIAL_CITY              <chr> "MINSTER", "LIMA", "ST MARYS", "MO…
## $ RESIDENTIAL_STATE             <chr> "OH", "OH", "OH", "OH", "OH", "OH"…
## $ RESIDENTIAL_ZIP               <int> 45865, 45806, 45885, 45050, 44105,…
## $ RESIDENTIAL_ZIP_PLUS4         <int> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ RESIDENTIAL_COUNTRY           <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ RESIDENTIAL_POSTALCODE        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ MAILING_ADDRESS1              <chr> "", "", "", "", "", "", "", "", ""…
## $ MAILING_SECONDARY_ADDRESS     <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ MAILING_CITY                  <chr> "", "", "", "", "", "", "", "", ""…
## $ MAILING_STATE                 <chr> "", "", "", "", "", "", "", "", ""…
## $ MAILING_ZIP                   <int> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ MAILING_ZIP_PLUS4             <int> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ MAILING_COUNTRY               <chr> "", "", "", "", "", "", "", "", ""…
## $ MAILING_POSTAL_CODE           <chr> "", "", "", "", "", "", "", "", ""…
## $ CAREER_CENTER                 <chr> "", "APOLLO CAREER CENTER", "", ""…
## $ CITY                          <chr> "", "", "ST. MARYS CITY", "MONROE …
## $ CITY_SCHOOL_DISTRICT          <chr> "", "", "ST MARYS CITY SD", "", "C…
## $ COUNTY_COURT_DISTRICT         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ CONGRESSIONAL_DISTRICT        <int> 4, 4, 4, 8, 11, 2, 11, 16, 2, 14, …
## $ COURT_OF_APPEALS              <int> 3, 3, 3, 12, 8, 12, 8, 8, 12, 11, …
## $ EDU_SERVICE_CENTER_DISTRICT   <chr> "AUGLAIZE COUNTY ESC", "ALLEN COUN…
## $ EXEMPTED_VILL_SCHOOL_DISTRICT <chr> "", "", "", "", "", "", "", "", ""…
## $ LIBRARY                       <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ LOCAL_SCHOOL_DISTRICT         <chr> "MINSTER LOCAL SD (AUGLAIZE)", "SH…
## $ MUNICIPAL_COURT_DISTRICT      <chr> "", "LIMA", "", "", "CLEVELAND", "…
## $ PRECINCT_NAME                 <chr> "PRECINCT MINSTER N", "SHAWNEE H",…
## $ PRECINCT_CODE                 <chr> "06AAZ", "02AFU", "06AAE", "09-P-A…
## $ STATE_BOARD_OF_EDUCATION      <int> 1, 1, 1, 3, 11, 10, 11, 5, 10, 7, …
## $ STATE_REPRESENTATIVE_DISTRICT <int> 84, 4, 82, 53, 9, 65, 12, 16, 65, …
## $ STATE_SENATE_DISTRICT         <int> 12, 12, 1, 4, 21, 14, 25, 24, 14, …
## $ TOWNSHIP                      <chr> "JACKSON TOWNSHIP", "Township Shaw…
## $ VILLAGE                       <chr> "MINSTER VILLAGE", "", "", "", "",…
## $ WARD                          <chr> "", "", "ST. MARYS WARD 3", "", "C…
## $ `PRIMARY-03/07/2000`          <chr> "", "", "", "", "", "", "", "", ""…
## $ `GENERAL-11/07/2000`          <chr> "", "", "", "", "", "", "X", "X", …
## $ `SPECIAL-05/08/2001`          <chr> "", "", "", "", "", "", "", "", ""…
## $ `GENERAL-11/06/2001`          <chr> "", "", "", "", "", "", "X", "", "…
## $ `PRIMARY-05/07/2002`          <chr> "", "", "", "", "", "", "", "", ""…
## $ `GENERAL-11/05/2002`          <chr> "", "", "", "", "", "", "X", "", "…
## $ `SPECIAL-05/06/2003`          <chr> "", "", "", "", "", "", "", "", ""…
## $ `GENERAL-11/04/2003`          <chr> "", "", "", "", "", "", "", "", ""…
## $ `PRIMARY-03/02/2004`          <chr> "", "", "", "", "", "", "D", "", "…
## $ `GENERAL-11/02/2004`          <chr> "", "X", "", "", "", "", "X", "", …
## $ `SPECIAL-02/08/2005`          <chr> "", "", "", "", "", "", "", "", "X…
## $ `PRIMARY-05/03/2005`          <chr> "", "", "", "", "", "", "", "", ""…
## $ `PRIMARY-09/13/2005`          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ `GENERAL-11/08/2005`          <chr> "", "", "", "", "", "", "X", "", "…
## $ `SPECIAL-02/07/2006`          <chr> "", "", "", "", "", "", "", "", ""…
## $ `PRIMARY-05/02/2006`          <chr> "", "", "", "", "", "", "D", "", "…
## $ `GENERAL-11/07/2006`          <chr> "X", "", "", "", "", "", "X", "X",…
## $ `PRIMARY-05/08/2007`          <chr> "", "", "", "", "", "", "", "", ""…
## $ `PRIMARY-09/11/2007`          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ `GENERAL-11/06/2007`          <chr> "", "", "", "", "", "", "", "", ""…
## $ `PRIMARY-11/06/2007`          <chr> "", "", "", "", "", "", "", "", ""…
## $ `GENERAL-12/11/2007`          <chr> "", "", "", "", "", "", "", "", ""…
## $ `PRIMARY-03/04/2008`          <chr> "", "", "", "", "", "", "D", "D", …
## $ `PRIMARY-10/14/2008`          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ `GENERAL-11/04/2008`          <chr> "X", "", "", "X", "", "X", "X", "X…
## $ `GENERAL-11/18/2008`          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ `PRIMARY-05/05/2009`          <chr> "", "", "", "", "", "", "", "", ""…
## $ `PRIMARY-09/08/2009`          <chr> "", "", "", "", "", "", "", "", ""…
## $ `PRIMARY-09/15/2009`          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ `PRIMARY-09/29/2009`          <chr> "", "", "", "", "", "", "", "", ""…
## $ `GENERAL-11/03/2009`          <chr> "", "", "", "X", "", "", "X", "", …
## $ `PRIMARY-05/04/2010`          <chr> "", "", "", "", "", "", "", "", ""…
## $ `PRIMARY-07/13/2010`          <chr> "", "", "", "", "", "", "", "", ""…
## $ `PRIMARY-09/07/2010`          <chr> "", "", "", "", "", "", "", "", ""…
## $ `GENERAL-11/02/2010`          <chr> "", "", "", "X", "", "X", "X", "X"…
## $ `PRIMARY-05/03/2011`          <chr> "X", "", "", "", "", "", "", "", "…
## $ `PRIMARY-09/13/2011`          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ `GENERAL-11/08/2011`          <chr> "", "", "", "X", "", "X", "X", "",…
## $ `PRIMARY-03/06/2012`          <chr> "", "", "", "", "", "", "D", "", "…
## $ `GENERAL-11/06/2012`          <chr> "X", "", "X", "X", "", "X", "X", "…
## $ `PRIMARY-05/07/2013`          <chr> "", "", "", "", "", "X", "", "", "…
## $ `PRIMARY-09/10/2013`          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ `PRIMARY-10/01/2013`          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ `GENERAL-11/05/2013`          <chr> "", "", "", "X", "", "", "X", "", …
## $ `PRIMARY-05/06/2014`          <chr> "", "", "", "", "", "", "D", "", "…
## $ `GENERAL-11/04/2014`          <chr> "", "", "", "X", "", "X", "X", "X"…
## $ `PRIMARY-05/05/2015`          <chr> "", "", "", "", "", "", "", "", ""…
## $ `PRIMARY-09/15/2015`          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ `GENERAL-11/03/2015`          <chr> "", "", "", "X", "", "", "", "", "…
## $ `PRIMARY-03/15/2016`          <chr> "", "", "", "", "", "", "", "D", "…
## $ `GENERAL-06/07/2016`          <chr> "", "", "", "", "", "", "", "", ""…
## $ `PRIMARY-09/13/2016`          <chr> "", "", "", "", "", "", "", "", ""…
## $ `GENERAL-11/08/2016`          <chr> "", "X", "", "X", "", "X", "X", "X…
## $ `PRIMARY-05/02/2017`          <chr> "", "", "", "", "", "", "", "", ""…
## $ `PRIMARY-09/12/2017`          <chr> "", "", "", "", "", "", "X", "X", …
## $ `GENERAL-11/07/2017`          <chr> "", "", "", "", "", "", "", "X", "…
## $ `PRIMARY-05/08/2018`          <chr> "", "", "", "", "", "", "", "", ""…
## $ `GENERAL-08/07/2018`          <chr> "", "", "", "", "", "", "", "", ""…
## $ `GENERAL-11/06/2018`          <chr> "", "", "X", "X", "", "", "X", "X"…
## $ `PRIMARY-05/07/2019`          <chr> "", "", "", "", "", "", "", "", ""…

Nope! They are not tidy. Currently each row represents one voter, and each election is its own column, named in the format ELECTION TYPE-MM/DD/YYYY. The date of the election should be its own column, as should the type (i.e. general, primary, or special). These data need to be converted to a longer format where each row is a unique combination of one voter and one election. This means that a voter who voted in five elections would have five rows with a recorded vote; elections they skipped appear as rows with a blank value.

There are two main steps to tidying these data. First, collect the column headers and their respective values into two new columns: election and party. Second, since the column headers contain two variables of interest, namely the election type and election date, split the election column into two.

Usually we would use gather() or pivot_longer() from tidyr. However, these data are quite large and we need to write performant code. data.table::melt() performs the same operation but is much faster. In this example we specify the id columns, which are the columns that will not be gathered into a new column. We also specify the name of the variable that will be created from the column headers with the variable.name argument. Finally, we specify the name of the column that will contain the values with the value.name argument.

sw_gathered <- swvf %>% 
  melt(id.vars = 1:46, variable.name = "election", value.name = "party") %>% 
  janitor::clean_names()
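
A toy version makes the reshaping easier to follow. The two voters and two election columns below are made up; the call mirrors the one above, except that the single id column is given by name rather than by position.

```r
library(data.table)

# Invented wide table: one row per voter, one column per election
toy_wide <- data.table(
  SOS_VOTERID = c("OH001", "OH002"),
  `PRIMARY-03/15/2016` = c("D", ""),
  `GENERAL-11/08/2016` = c("X", "X")
)

# One row per voter-election pair: 2 voters x 2 elections = 4 rows
toy_long <- melt(toy_wide,
                 id.vars = "SOS_VOTERID",
                 variable.name = "election",
                 value.name = "party")
toy_long
```

Note that the blank value for the second voter's primary is kept as its own row; melt() reshapes the data but does not drop empty records.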

Preview the results of melt(). The example below uses dplyr::distinct() to identify unique value pairs.

# View the results. The election column can be split into two columns:
# the election type (e.g. primary or general) and the election date.
# The party column needs cleaning: there are "" where NA should be.
distinct(sw_gathered, election, party)

##                election party
##   1: PRIMARY-03/07/2000      
##   2: PRIMARY-03/07/2000     D
##   3: PRIMARY-03/07/2000     X
##   4: PRIMARY-03/07/2000     R
##   5: PRIMARY-03/07/2000     L
##  ---                         
## 170: GENERAL-11/06/2018     X
## 171: PRIMARY-05/07/2019      
## 172: PRIMARY-05/07/2019     X
## 173: PRIMARY-05/07/2019     R
## 174: PRIMARY-05/07/2019     D

Notice that there are multiple values for party. What are these?

# what are the unique values?
distinct(sw_gathered, party)

##    party
## 1:      
## 2:     D
## 3:     X
## 4:     R
## 5:     L
## 6:  <NA>
## 7:     C
## 8:     G

7.3.4 Tidying

The SOS website has a downloadable data dictionary. The Voter File Layout says:

Variable field name with election type and date of each election. Value for this field indicates how the voter voted in that election.

Abbr.	Party Name
C	Constitution Party
D	Democrat Party
E	Reform Party
G	Green Party
L	Libertarian Party
N	Natural Law Party
R	Republican Party
S	Socialist Party
X	Voted without declaring party affiliation
Blank	Indicates that there is no voting record for this voter for this election

We can use this to clean up the party field using case_when(). Because no default is supplied, any value that matches none of the conditions (including the blank string) becomes NA. Note that the code below labels X as “Independent” as shorthand for voting without declaring a party affiliation. To create two separate columns for the election type and date, we can split on the -. tidyr::separate() will split the column into two or more columns based on the sep argument. Once the election_date column has been created, we need to parse it using lubridate. Since the date is formatted as MM/DD/YYYY, we can use lubridate::mdy() to parse it to class Date.

sw_clean <- sw_gathered %>% 
  tidyr::separate(election, into = c("election_type", "election_date"), sep = "-") %>% 
  mutate(election_date = lubridate::mdy(election_date),
         party = case_when(
           party == "C" ~ "Constitution",
           party == "D" ~ "Democrat",
           party == "E" ~ "Reform",
           party == "G" ~ "Green",
           party == "L" ~ "Libertarian",
           party == "N" ~ "Natural Law",
           party == "R" ~ "Republican",
           party == "S" ~ "Socialist",
           party == "X" ~ "Independent"
         ))

head(sw_clean)

##    sos_voterid county_number county_id last_name first_name middle_name
## 1 OH0016238254             6     21511   KUETHER    BARBARA           A
## 2 OH0019414074             2   1010005   GEMLICK       JODI         LYN
## 3 OH0019419095             6     40055   KITCHEN     LESLIE           L
## 4 OH0019489283             9    482703     GRACE     AMANDA       LEIGH
## 5 OH0015384921            18   2044314    CARNER    TIFFANY           J
## 6 OH0020115764            13   6100757 VAN SCYOC      SUSAN           L
##   suffix date_of_birth registration_date voter_status party_affiliation
## 1           1969-11-15        1998-02-23       ACTIVE                  
## 2           1972-08-19        2007-01-29       ACTIVE                  
## 3           1969-12-26        2008-01-09       ACTIVE                  
## 4           1974-11-09        2008-02-01       ACTIVE                  
## 5           1971-08-28        2016-05-25       ACTIVE                  
## 6           1973-08-01        2008-09-18       ACTIVE                  
##   residential_address1 residential_secondary_addr residential_city
## 1       725 OAKWOOD DR                                     MINSTER
## 2    3464 WOODHAVEN LN                                        LIMA
## 3      122 CONCORD AVE                                    ST MARYS
## 4 932 SLEEPY HOLLOW DR                                      MONROE
## 5    13902 BENWOOD AVE                                   CLEVELAND
## 6    1090 S MUSCOVY DR                                    LOVELAND
##   residential_state residential_zip residential_zip_plus4
## 1                OH           45865                    NA
## 2                OH           45806                    NA
## 3                OH           45885                    NA
## 4                OH           45050                    NA
## 5                OH           44105                    NA
## 6                OH           45140                    NA
##   residential_country residential_postalcode mailing_address1
## 1                  NA                     NA                 
## 2                  NA                     NA                 
## 3                  NA                     NA                 
## 4                  NA                     NA                 
## 5                  NA                     NA                 
## 6                  NA                     NA                 
##   mailing_secondary_address mailing_city mailing_state mailing_zip
## 1                        NA                                     NA
## 2                        NA                                     NA
## 3                        NA                                     NA
## 4                        NA                                     NA
## 5                        NA                                     NA
## 6                        NA                                     NA
##   mailing_zip_plus4 mailing_country mailing_postal_code
## 1                NA                                    
## 2                NA                                    
## 3                NA                                    
## 4                NA                                    
## 5                NA                                    
## 6                NA                                    
##                career_center           city        city_school_district
## 1                                                                      
## 2       APOLLO CAREER CENTER                                           
## 3                            ST. MARYS CITY            ST MARYS CITY SD
## 4                               MONROE CITY                            
## 5                                           CLEVELAND MUNICIPAL CITY SD
## 6 GREAT OAKS CAREER CAMPUSES                                           
##   county_court_district congressional_district court_of_appeals
## 1                    NA                      4                3
## 2                    NA                      4                3
## 3                    NA                      4                3
## 4                    NA                      8               12
## 5                    NA                     11                8
## 6                    NA                      2               12
##   edu_service_center_district exempted_vill_school_district library
## 1         AUGLAIZE COUNTY ESC                                    NA
## 2            ALLEN COUNTY ESC                                    NA
## 3                                                                NA
## 4           BUTLER COUNTY ESC                                    NA
## 5                                                                NA
## 6                                                                NA
##         local_school_district municipal_court_district
## 1 MINSTER LOCAL SD (AUGLAIZE)                         
## 2    SHAWNEE LOCAL SD (ALLEN)                     LIMA
## 3                                                     
## 4    MONROE LOCAL SD (BUTLER)                         
## 5                                            CLEVELAND
## 6                                                     
##           precinct_name precinct_code state_board_of_education
## 1    PRECINCT MINSTER N         06AAZ                        1
## 2             SHAWNEE H         02AFU                        1
## 3 PRECINCT ST. MARYS 3A         06AAE                        1
## 4              MONROE 2      09-P-AKN                        3
## 5        CLEVELAND-02-Q      18-P-ALJ                       11
## 6      MIAMI TOWNSHIP X      13-P-ACY                       10
##   state_representative_district state_senate_district          township
## 1                            84                    12  JACKSON TOWNSHIP
## 2                             4                    12  Township Shawnee
## 3                            82                     1 ST MARYS TOWNSHIP
## 4                            53                     4    LEMON TOWNSHIP
## 5                             9                    21                  
## 6                            65                    14         MIAMI TWP
##           village             ward election_type election_date party
## 1 MINSTER VILLAGE                        PRIMARY    2000-03-07  <NA>
## 2                                        PRIMARY    2000-03-07  <NA>
## 3                 ST. MARYS WARD 3       PRIMARY    2000-03-07  <NA>
## 4                                        PRIMARY    2000-03-07  <NA>
## 5                 CLEVELAND WARD 2       PRIMARY    2000-03-07  <NA>
## 6                                        PRIMARY    2000-03-07  <NA>
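
The same tidying logic can be checked on a few hand-made values. The sketch below swaps in base R equivalents (strsplit(), as.Date(), and a named lookup vector) for separate(), mdy(), and case_when(); it is not the pipeline used above, just a compact way to see what each step produces. The input values are invented.

```r
election <- c("PRIMARY-03/15/2016", "GENERAL-11/08/2016", "PRIMARY-05/07/2019")
party    <- c("D", "X", "")

# Split "TYPE-MM/DD/YYYY" on the hyphen into type and date
parts <- strsplit(election, "-", fixed = TRUE)
election_type <- vapply(parts, `[`, character(1), 1)
election_date <- as.Date(vapply(parts, `[`, character(1), 2), format = "%m/%d/%Y")

# Named vector as a lookup table; codes not listed (including "") become NA
party_lookup <- c(C = "Constitution", D = "Democrat", E = "Reform", G = "Green",
                  L = "Libertarian", N = "Natural Law", R = "Republican",
                  S = "Socialist", X = "Independent")
party_clean <- unname(party_lookup[party])

data.frame(election_type, election_date, party_clean)
```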

7.3.5 Creating targets

One thing that you will need to do is identify your potential voter base. These voters are sometimes referred to as your “targets” or your voter “universe”. Each voter is a unique culmination of experience, opinions, and biases. As such, not every person will be willing to vote for your candidate or maybe to even vote at all. Some people are habitual voters who vote in every election and always along party lines. Others may only sometimes vote in a general election. And others might not vote consistently along party lines.

Due to the never-ending complexities of people’s preferences and persuasions, it is important to break the pool of potential voters down into smaller groups, which I will refer to as tiers. We will break our voters into three distinct tiers. These categorizations are rather crude. Work with your state leadership team to determine the best way to segment your voter base.

Tier 1. Base voters - registered Democrats who have voted Democrat in a primary
Tier 2. Motivation - are not registered with a party but have voted Democrat in a primary
Tier 3. Persuasion - folks that have voted Democrat in a primary but are registered with another party

To begin this process, we want to identify everyone who has voted Democrat in a primary. We will filter sw_clean to these criteria. Then we will count the number of times each voter has voted for a Democrat. This will then be joined back to the original data frame so we can identify those people who have voted Democrat even if it conflicts with their party registration.

# Identify base targets: voters who have voted Democrat in a previous primary.
# Later we compare these voters with their registered party.
primary_dems <- sw_clean %>% 
  filter(election_type == "PRIMARY",
         party == "Democrat") %>% 
  count(sos_voterid, party)

head(primary_dems)

## # A tibble: 6 x 3
##   sos_voterid  party        n
##   <chr>        <chr>    <int>
## 1 OH0010002426 Democrat     1
## 2 OH0010004408 Democrat     1
## 3 OH0010012148 Democrat     1
## 4 OH0010012441 Democrat     3
## 5 OH0010013144 Democrat     1
## 6 OH0010015404 Democrat     1
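
On a small invented table the filter-then-count step behaves as follows. Only voter OH001 has Democratic primary records, so the other two voters drop out. All names prefixed with toy_ are made up for this sketch.

```r
library(dplyr)

# Invented long-format records: one row per voter-election pair
toy_clean <- tibble(
  sos_voterid   = c("OH001", "OH001", "OH002", "OH003"),
  election_type = c("PRIMARY", "PRIMARY", "GENERAL", "PRIMARY"),
  party         = c("Democrat", "Democrat", "Independent", "Republican")
)

toy_primary_dems <- toy_clean %>%
  filter(election_type == "PRIMARY", party == "Democrat") %>%
  count(sos_voterid, party)   # n = number of Democratic primaries per voter

toy_primary_dems
```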

Now that we have a table of everyone who has voted Democrat in a primary, we need to join it back onto the original voter file. Joining back onto the original table lets us use the other information it provides, such as party affiliation and date of birth.

# join these back to the original voter file to get additional information 
potential_targets <- inner_join(primary_dems, swvf, 
                                by = c("sos_voterid" = "SOS_VOTERID")) %>% 
  janitor::clean_names() %>% 
  select(sos_voterid, party_affiliation, precinct_name,
         date_of_birth, registration_date, party, n) %>% 
  mutate(age = as.integer((Sys.Date() - lubridate::ymd(date_of_birth)) / 365))

select(potential_targets, sos_voterid, precinct_name, party_affiliation, age, n) %>% 
  arrange(-n) %>% 
  head()

## # A tibble: 6 x 5
##   sos_voterid  precinct_name          party_affiliation   age     n
##   <chr>        <chr>                  <chr>             <int> <int>
## 1 OH0010322183 PRECINCT ASHTABULA 2-A D                    68    14
## 2 OH0015532776 ATHENS 3-4             D                    64    14
## 3 OH0010329359 PRECINCT ASHTABULA 5-A D                    57    13
## 4 OH0014653051 EAST CLEVELAND-03-C    D                    79    13
## 5 OH0014799002 PARMA-04-A             D                    60    13
## 6 OH0014805561 PARMA-05-A             D                    66    13
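
Dividing the number of days by 365 ignores leap years, so the result can be one year too high for someone whose birthday is only a few days away. If exact ages matter for your targeting, one option is to compare calendar fields directly. The helper below, exact_age(), is a hypothetical name and not part of the code above; it is a minimal base R sketch that uses a fixed reference date so the result is reproducible.

```r
# Age in completed years on a reference date (hypothetical helper)
exact_age <- function(dob, ref = Sys.Date()) {
  dob <- as.POSIXlt(as.Date(dob))
  ref <- as.POSIXlt(as.Date(ref))
  years <- ref$year - dob$year
  # Has this year's birthday already happened on the reference date?
  had_birthday <- (ref$mon > dob$mon) |
    (ref$mon == dob$mon & ref$mday >= dob$mday)
  as.integer(years - ifelse(had_birthday, 0L, 1L))
}

dob <- as.Date("2000-01-01")
ref <- as.Date("2019-12-28")

approx_age <- as.integer(as.numeric(ref - dob, units = "days") / 365)
approx_age            # 20: the extra leap days push the quotient past 20
exact_age(dob, ref)   # 19: the 20th birthday is still four days away
```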

Let’s create some cross-tabs to identify how many people of each registered party have voted Democrat in a primary.

# generate some cross-tabs. Everyone loves cross tabs.
count(potential_targets, party_affiliation, party) %>% 
  mutate(prop = round(n / sum(n), 2))

## # A tibble: 4 x 4
##   party_affiliation party        n  prop
##   <chr>             <chr>    <int> <dbl>
## 1 ""                Democrat  1093  0.3 
## 2 D                 Democrat  1828  0.51
## 3 G                 Democrat     4  0   
## 4 R                 Democrat   663  0.18

What is most telling from this table is that 30% of individuals who voted Democrat in a primary have no party affiliation. Now that we have a data frame with voter IDs and their registered party, we can begin segmenting this group based on the aforementioned definitions. This will be done with a basic case_when() statement.

# Tier 1: base voters, registered Democrat and have voted Democrat in a primary
# Tier 2: motivation group, not registered with a party but have voted Democrat
# Tier 3: persuasion group, registered with another party but have voted Democrat
targets <- potential_targets %>% 
  mutate(tier = case_when(
    party_affiliation == "D" ~ 1,
    party_affiliation == "" ~ 2,
    !party_affiliation %in% c("", "D") ~ 3
  ))
# targets can then be exported and uploaded to VAN or other management software

Now select the voter ID and tier, write them to a CSV, and bulk upload into VAN! The table below summarizes the resulting tiers.

## # A tibble: 3 x 5
##    tier med_age n_primaries tier_size     p
##   <dbl>   <dbl>       <int>     <int> <dbl>
## 1     1      59        8071      1828 0.700
## 2     2      56        2026      1093 0.176
## 3     3      62        1436       667 0.125
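
The code that produced this summary is not shown, so the following is only a guess at how such a table could be built, together with the export step described above. It runs on an invented four-row stand-in for targets. The column definitions (median age, summed n, row count, share of primary votes) are assumptions inferred from the column names, and the output path is a placeholder.

```r
library(dplyr)

# Invented stand-in for the targets data frame
toy_targets <- tibble(
  sos_voterid = c("OH001", "OH002", "OH003", "OH004"),
  tier        = c(1, 1, 2, 3),
  age         = c(60, 50, 40, 70),
  n           = c(5L, 3L, 1L, 1L)
)

# Per-tier summary using the same column names as the table above
tier_summary <- toy_targets %>%
  group_by(tier) %>%
  summarise(med_age = median(age),
            n_primaries = sum(n),
            tier_size = dplyr::n()) %>%
  mutate(p = n_primaries / sum(n_primaries))

tier_summary

# Keep only the columns needed for upload and write them to a CSV
out_path <- file.path(tempdir(), "targets.csv")
toy_targets %>%
  select(sos_voterid, tier) %>%
  write.csv(out_path, row.names = FALSE)
```

Check with your VAN administrator which ID column and file layout the bulk upload expects before relying on a file like this.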