This page summarizes what I know about voter data and systems for accessing it and was current as of April, 2008. Things may be different now.
In general, the data are compiled by individual County Registrars
of Voters, who then provide their data to the California Secretary of
State. I believe that the State imposes some requirements for the
minimum amount of data that must be reported, and I know that
individual County Registrars can, and sometimes do, maintain more than
the minimum required data.
This page has three major sections:
Aggregate Data and Election Results
Samples of Raw Data
Working with the Detailed Data
Consolidated information about voter turnout percentages and election results is available online from the California Secretary of State and the San Diego County Registrar of Voters. There are no doubt other sources, but these two come straight from the horse's mouth, so to speak.
The California Secretary of State website is at http://www.sos.ca.gov/elections/elections.htm
At the moment, that page looks like this:
The pages you'll want to check out are the Voter Registration Statistics, under the Voter Registration heading, and the Election Results, under Candidates and Elections. Those will give you information about the total number of registered voters by party, and election results and turnout statistics.
The San Diego Registrar of Voters home page is here: http://www.sdvote.com
As of 4/6/2008, that page looks like this:
You'll want to check out the link to Past Elections, for detailed reports of turnout and results of elections, and the link to Reports on line for general turnout and registration reports.
Most of us will never see what the data look like "under the hood" of a voter lookup system, but I think it's helpful to know what the underlying data look like.
My entry in the San Diego County Registrar of Voters database looks like this:
status | Abbr | affidavit | last_voted | name_prefix | name_last | name_first | name_middle | name_suffix | house_number | house_fraction | pre_dir | street | type | post_dir | building_number | apartment_number | city | state | zip | precinct | portion | consolidation | alpha_split | party | reg_date | image_id | phone_1 | phone_2 | military | gender | PAV | source | birth_place | birth_date | care_of | mail_street | mail_city | mail_state | mail_zip | mail_country | ltd | language | drivers_license | reg_date_original | perm_category | confidential | IDRequired | Citizen | UnderAge | precinct_name | hDist | sDist | 01 09/25/2007 special election 72 | 02 06/05/2007 city of vista special 71 | 03 03/06/2007 special election 70 | 04 11/07/2006 gubernatorial general 68 | 05 06/06/2006 gubernatorial primary 67 | 06 04/11/2006 special primary - 50th cong. district 66 | 07 01/10/2006 city of san diego special run-off 65 | 08 11/08/2005 special statewide 64 | 09 07/27/2005 sheriff reserve payroll 53 | 10 07/26/2005 city of san diego - spec muni election 63 | 11 06/07/2005 city of oceanside - spec muni election 61 | 12 05/03/2005 ramona mwd special election 62 | 13 03/08/2005 rainbow mwd #4 recall election 60 | 14 02/15/2005 city of santee special municipal election 59 | 15 01/04/2005 city of san diego special run-off election 58 | 16 11/16/2004 special municipal election 57 | 17 11/02/2004 presidential general 56 | 18 03/02/2004 presidential primary 51 | 19 10/07/2003 statewide special 50 | 20 11/05/2002 gubernatorial general 47 | |
A | V-C29 | BE083007 | POWELL | STEVEN | 9999 | ANYSTREET | ST | 2 | SAN DIEGO | CA | 92116 | 255550 | 0 | DEM | 1/3/2006 | 1002571 | 8585512021 | N | M | Y | CA | 1/1/1901 | 9999 ANYSTREET 2 | SAN DIEGO | CA | 92116 | 7/27/2006 | 5/13/2001 | PERM | N | HILLCREST | A | A(DEM) | A | A(DEM) | A(DEM) | V(DEM) | V | V |
If you scroll to the right, you'll see there are voter history data for 20 past elections. You can also see that there's a code in there saying "DEM," so you can see that in THIS database, you would be able to identify swing voters by finding voters who have voted on different parties' ballots in the past. The statewide data, described in more detail below, do NOT seem to contain this information.
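As a rough sketch of that swing-voter idea, the history cells can be scanned for the party in parentheses. This assumes the cells look like the ones in my record above ("A", "A(DEM)", "V(DEM)"); the function names are my own, and the second voter and the "REP" code are made up for illustration.

```python
import re

# Matches history cells such as "A(DEM)"; plain "A" or "V" cells carry
# no party and are skipped. The cell format is inferred from one record.
PARTY_RE = re.compile(r"\(([A-Z]+)\)")

def ballot_parties(history):
    """Return the set of party codes found in a voter's history cells."""
    parties = set()
    for cell in history:
        match = PARTY_RE.search(cell)
        if match:
            parties.add(match.group(1))
    return parties

def is_swing_voter(history):
    """True if the voter has pulled ballots for more than one party."""
    return len(ballot_parties(history)) > 1

# History cells from the county record shown above.
my_history = ["A", "A(DEM)", "A", "A(DEM)", "A(DEM)", "V(DEM)", "V", "V"]
# Hypothetical voter; "REP" is an assumed party code.
other_history = ["V(DEM)", "V", "A(REP)", "V"]

print(is_swing_voter(my_history))
print(is_swing_voter(other_history))
```

Only elections where a party ballot was recorded count here, so a voter with a single party-coded entry can never be flagged.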
You'll also notice a field for email address. Mine happens to be blank, but I can tell you that about 15% of the entries DO have an email address. Not a large percentage, and many of them are no doubt wrong, but it does suggest a cheap way to contact a lot of people. I'm not sure about the legality of using this field -- I know that it's legal to use the phone numbers, so it seems that using the email address would also be legal. You'll notice that the statewide data, again, do not contain this field.
I heard from someone who works with these data a lot that the registrar doesn't often bother to update phone numbers, and my record bears this out -- the phone number is a work phone from about 10 years ago. The rest of the data are current and correct.
My entry in the California statewide voter database as of about December 2007 looks like this:
Locality Code | Registrant ID | Last Name | First Name | Middle Name | Name Suffix | Addr Num | Addr Num Suff | St Dir Prefix | Street Name | Street Type | St Dir Suffix | Unit Type | Unit Number | City | State | Zip | Telephone (Area Code) | Telephone Exchange | Telephone Number | Mailing Address1 | Mailing Address2 | Mailing City | Mailing State | Mailing Zip | Language | Date of Birth | Gender | Party | Status | Status Reason | Registration Date | Precinct | Precinct Part | Registration Method Code | Assistance Flag | Place of Birth | Section Township Range Direction | Previous Registration ID | Previous County Code | Previous Last Name | Previous First Name | Previous Middle name | Previous Name Suffix | Previous Residence Street Number | Previous Residence Street Number Suffix | Previous Residence Street Name | Previous Residence Street Direction Prefix | Previous Residence Street Direction Suffix | Previous Residence Street Type | Previous Residence Unit Type | Previous Residence Unit Number | Previous Residence City | Previous Residence State | Previous Residence Zip | Name Prefix | Previous Name Prefix | Col058 | Elec1 | Elec2 | Elec3 | Elec4 | Elec5 | Elec6 | Elec7 | Elec8 | Non Standard Address |
37 | 1002571 | POWELL | STEVEN | 999 | ANYSTREET | ST | 2 | SAN DIEGO | CA | 92116 | 858 | 551 | 2021 | 01/01/1901 | M | DEM | A | 01/03/2006 | 258000 | 0 | M | CA | GG6 | GP6 | SS5 | PG4 | PP4 | SS3 | GG2 | PG0 |
The statewide database contains less detail than the San Diego County database -- note that there is no email field, and no data for local elections. If you scroll to the far right of the table above, you'll see the codes for my voting history in the last eight elections under the columns headed Elec1, Elec2, ... Elec8. The Decoder for the Election data is below.
Election Code | Election Description |
PP | Presidential Primary |
PG | Presidential General |
GP | Gubernatorial Primary |
GG | Gubernatorial General |
SS | Special Statewide |
CP | Congressional District Special Primary |
CG | Congressional District Special General |
SP | Other Legislative District Special Primary |
SG | Other Legislative District Special General |
It looks like there's no way to tell which party's ballot I pulled for primaries. Since there are entries in all 8 columns, you can tell that I voted in at least 8 elections, but I've looked at other people's records and the codes for a given past election turn up in DIFFERENT columns! For example, in the Elec8 column for me, it says PG0 (for Presidential General in 2000), but for someone else it might say GP2 (for Gubernatorial Primary 2002)! This is a crazy way to store data but that's how they do it. From looking at other records, it appears that what they store is the data for up to the last 8 times that the person voted, starting in the first column, Elec1, with the most recent election in which they voted, and moving across to Elec2, Elec3, etc. for the next most recent ones. A sample is below:
Voter | Elec1 | Elec2 | Elec3 | Elec4 | Elec5 | Elec6 | Elec7 | Elec8 |
My Voter History | GG6 | GP6 | SS5 | PG4 | PP4 | SS3 | GG2 | PG0 |
Someone Else's History | SS3 | GP2 | PG0 | PP0 | GG98 | GP98 | PG96 |
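Because a given election can land in any of the eight columns, checking whether someone voted in it means searching the whole row rather than one fixed column. Here is a minimal sketch of decoding the codes, using the decoder table above. The year rule is my guess from the samples (one digit means 200x, two digits mean 19xx), and the function names are my own.

```python
import re

ELECTION_TYPES = {
    "PP": "Presidential Primary",
    "PG": "Presidential General",
    "GP": "Gubernatorial Primary",
    "GG": "Gubernatorial General",
    "SS": "Special Statewide",
    "CP": "Congressional District Special Primary",
    "CG": "Congressional District Special General",
    "SP": "Other Legislative District Special Primary",
    "SG": "Other Legislative District Special General",
}

# Two letters for the election type, then one or two digits for the year.
CODE_RE = re.compile(r"^([A-Z]{2})(\d{1,2})$")

def decode(code):
    """Return (year, description) for a code like 'GG6', or None."""
    match = CODE_RE.match(code.strip())
    if not match:
        return None
    kind, digits = match.groups()
    # ASSUMPTION: "6" means 2006 and "98" means 1998, as in the samples.
    year = 2000 + int(digits) if len(digits) == 1 else 1900 + int(digits)
    return year, ELECTION_TYPES.get(kind, "Unknown")

def voted_in(history, code):
    """True if `code` appears in any of the Elec1..Elec8 cells."""
    return code in [cell.strip() for cell in history]

mine = ["GG6", "GP6", "SS5", "PG4", "PP4", "SS3", "GG2", "PG0"]
other = ["SS3", "GP2", "PG0", "PP0", "GG98", "GP98", "PG96"]

print(decode("GG6"))
print(voted_in(mine, "PG0"), voted_in(other, "PG0"))
```

Both sample voters have PG0, but it sits in Elec8 for me and in Elec3 for the other voter, which is why the lookup ignores column position.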
Most of us will want to buy or otherwise get access to a system that's already set up to let us query the data and produce reports by precinct or whatever our interest is. The reason for this, in a nutshell, is the size of the database. Since I do happen to know something about trying to work with the raw data, I've included some details about that at the end of this article. For now, let's just talk about using an existing system.
Commercial products exist that give access to voter data, allowing you to create precinct walk lists or phone lists, search for voters matching certain criteria, etc. From what I understand at the moment, the national Democratic Party has chosen a system called VAN -- Voter Activation Network, website at http://www.voteractivationnetwork.com/. Apparently, the party has negotiated for discounted rates for state party organizations to get access to the system. I don't have details on how this works.
It seems that the California system is accessed from California VoterConnect, website at http://www.cavoterconnect.com/. This is a bit mysterious to me, since the home page of this site makes it sound like a completely independent project, but the link to sign on to the system says "Sign into the VAN," so it must be related. I've talked to members of local Democratic clubs and some of them do have access to some system or other, so I imagine it's this one. From what I gather, it's supposed to be made available at least to candidates endorsed by the party.
As mentioned above, the statewide data are assembled from reports from the individual counties, which means that the state can never have more information than the county, and the county MAY have more information than the state. In the case of San Diego, the county data are indeed more comprehensive than what's in the state database: the San Diego County data contain an email address field, for example, whereas the State database doesn't store that at this time. However, the prices are quite different: When I last checked in about December 2007, you could get the database for the whole state for $30.00, whereas data for all of San Diego County cost over $400.00. Furthermore, the County data may be slightly more up-to-date, since the county has to put its data together before the state can incorporate them into the statewide database. However, I have it on good authority that it only takes a week or two for the state to incorporate new data from a county, so this shouldn't be a big problem.
If cost is an issue, keep in mind that buying data from the County gets you -- you guessed it -- County data. If what you're really interested in is a Congressional District, you're going to have to contact the registrar for every county your district touches. Maps of Congressional Districts are at http://www.calvoter.org/voter/maps/index.html (Be sure to check out the link to Archived 1991 district maps at the top of that page to see how the districts were gerrymandered for the latest boundaries. Apparently this was a joint effort by both Democrats and Republicans to maintain their seats in Congress.)
OK, so you can purchase the raw data from either the Secretary of State or the San Diego County Registrar of Voters. If you go this route, you're going to have a challenge due to the number of voter records (unless of course you only order some smaller subset of the data, which you can do). San Diego County currently has about 1.3 million voters, and the California state database has over 15 million. Either of these is far too big for an Excel 2003 spreadsheet, which accepts only 65,536 rows, and too big even for Excel 2007, which handles at most 1,048,576 rows. Microsoft Access, however, should be able to handle the whole 1.3 million records from the County database. As for the statewide database, Access 2003 has a file size limit of 2 GB. There MAY be some way to get most or all of the 15 million state records into it, since the compressed text file is only about 750 MB. However, the uncompressed text file is nearly 3 GB, and I can tell you that trying to import it directly into Access failed. When you go to buy the data from the state, they do warn you that you're not going to be able to do anything with it unless you have a real database program and some skill. Being a geek, I have tried to work with it, with some success. Here's what I know:
Skip this paragraph if you're not a computer geek because I'm not going to try to make this entirely understandable. The California statewide database is supplied as a tab-delimited text file. I tried importing it using the import wizard in SQL Server 2005, and had nothing but problems. I WAS, however, able to easily import it using the wizard in SQL Server 2000. I think I used an evaluation version of the full product, not that other stripped-down version that's free to use but has limits on database size and number of concurrent users, so I'm not sure if that one will work. THEN, I copied the table from the SQL Server 2000 database into a SQL Server 2005 database. Once it was imported, being a novice user, I had major performance problems until I learned to build some indexes. Once I'd built them, it was no problem to use ODBC from Access to query the data. (Before I built the indexes, Access could FIND the table but any attempt to read it or query it would, after a couple of minutes, produce nothing but an error window.) I haven't yet tried accessing the data in that original SQL Server 2000 database, so I can't say with authority whether building indexes in SQL Server 2000 would also make the data accessible via ODBC from Access, but I'd guess it would. A final note: I have it on good authority that the Registrant ID field in the California database is not, on its own, a key field, since individual county registrars assign it. I was, however, successful in creating a composite key using Locality Code and Registrant ID.
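For anyone who wants to try the same idea without SQL Server, here is a minimal sketch in Python using SQLite. It is not what I did above. It assumes the file has a header row with the column names shown in the statewide sample; it keeps only three columns for brevity; and the table name, index name, and second sample row are made up for illustration.

```python
import csv
import sqlite3

def load_voters(lines, conn):
    """Load tab-delimited voter rows into SQLite, keeping three columns."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS voters ("
        " locality_code TEXT NOT NULL,"
        " registrant_id TEXT NOT NULL,"
        " precinct TEXT,"
        # Registrant ID alone is not unique, so the key is composite.
        " PRIMARY KEY (locality_code, registrant_id))"
    )
    # ASSUMPTION: the first line of the file holds the column names.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = (
        (r["Locality Code"], r["Registrant ID"], r["Precinct"]) for r in reader
    )
    conn.executemany("INSERT INTO voters VALUES (?, ?, ?)", rows)
    # Index the column you expect to filter on; without it, every
    # precinct query scans the whole table.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_precinct ON voters (precinct)")
    conn.commit()

sample = [
    "Locality Code\tRegistrant ID\tPrecinct\n",
    "37\t1002571\t258000\n",
    "19\t1002571\t100200\n",  # made-up row: same ID, different county
]
conn = sqlite3.connect(":memory:")
load_voters(sample, conn)
total = conn.execute("SELECT COUNT(*) FROM voters").fetchone()[0]
in_precinct = conn.execute(
    "SELECT COUNT(*) FROM voters WHERE precinct = ?", ("258000",)
).fetchone()[0]
print(total, in_precinct)
```

For the real file you would pass an open file object instead of the sample list and connect to a file on disk rather than ":memory:". The reader streams one row at a time, so the 3 GB text file never has to fit in memory. The two sample rows share a Registrant ID but load without a key conflict because their Locality Codes differ.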