Frequently asked questions
Questions and answers about this project and the underlying CAL-ACCESS database.
Why is all this necessary?
California’s jumbled, dirty and difficult campaign-finance database, known as CAL-ACCESS, sprawls across 80 database tables and weighs in at more than 650 megabytes.
A significant effort is required to understand its esoteric structure and prepare the records for meaningful analysis.
That barrier blocks anyone seeking to interpret the information. Untangling the database requires weeks of study and significant guesswork, which discourages most analysts from trying and raises the risk that those who do will make a critical error.
That is the problem we aim to solve by leading an open-source effort to perfect the numerous transformations, filters and computer operations necessary to refine the raw data into an easy-to-use product.
What is the California Civic Data Coalition?
The California Civic Data Coalition is an open-source team of journalists and computer programmers from news organizations across America.
The coalition was formed in 2014 by Ben Welsh and Agustin Armendariz to lead the development of open-source software that makes California’s public data easier to access and analyze. The effort has drawn hundreds of contributions from developers and journalists at dozens of competing news outlets.
Learn more on our about page.
How far back does the CAL-ACCESS database go?
According to the California Secretary of State, electronic disclosure documents started being filed in CAL-ACCESS on Jan. 1, 2000. Historical analysis of the database should start from that date, the documentation says.
Do you offer all tables in the CAL-ACCESS database?
No. We’ve compared the list of tables in the daily database dump that powers this project to what’s described in the official CAL-ACCESS documentation. As many as 73 tables are excluded.
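The comparison described above amounts to a set difference between two lists of table names. The following is a minimal sketch of that idea, not the project's actual tooling. The function name `find_excluded_tables` and the placeholder `EXAMPLE_EXPORTED_TABLE` are hypothetical, and the short lists stand in for the full documented and exported table lists.

```python
def normalize(name):
    """Trim whitespace and upper-case a table name so comparisons are consistent."""
    return name.strip().upper()


def find_excluded_tables(documented, exported):
    """Return the documented table names that are absent from the export, sorted."""
    documented_names = {normalize(name) for name in documented}
    exported_names = {normalize(name) for name in exported}
    return sorted(documented_names - exported_names)


# Illustrative stand-ins only. EXAMPLE_EXPORTED_TABLE is a hypothetical name,
# not a real CAL-ACCESS table.
documented_tables = ["ELECTIONS", "ELECTION_RACES", "FORMS", "EXAMPLE_EXPORTED_TABLE"]
exported_tables = ["example_exported_table"]

missing = find_excluded_tables(documented_tables, exported_tables)
print(missing)
```

Normalizing case and whitespace before comparing guards against a table being counted as missing only because the two sources spell its name differently.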
Some of these missing tables have names or descriptions suggesting they could contain sensitive information, such as user credentials and bank account numbers. It’s understandable that these tables would not be released.
However, many of these tables contain information that we believe should be publicly available.
For instance, there is a series of tables that describe elections, races and candidates that are not included in the daily exports, even though the list of candidates for the current election is published on the CAL-ACCESS website.
When we reached out to the California Secretary of State, asking that they include any elections-related tables in the daily dumps, we were told that “[g]iven our current resource constraints, staff cannot modify the database export to include that other data.”
Here’s a sampling of missing tables we think should be made public:
Name | Description |
---|---|
CASH_RECON_REPORT_WRK | Table description contains this mysterious comment: "J M needs to describe this table. Cox - 4/28/2000" |
CODE_LIST | This table contains a list of CAL file codes. Examples include entity codes, office codes and expense codes |
CORRESPONDENCE | Table description contains this mysterious comment: "J M needs to describe this table. Cox - 4/28/2000" |
DISCLOSURE_PROCEDURES | Table description contains this mysterious comment: "J M needs to describe this table." |
ELECTION_CANDIDATES | This table indicates if a candidate for a given race is an incumbent. |
ELECTION_LINKS | No description |
ELECTION_RACES | No description |
ELECTION_TYPES | This table links election types and their descriptions. |
ELECTIONS | No description |
ERRORS_AND_OMISSIONS | This table contains results of audit checks and the validation process. |
FEDERAL_FORMS | Table used to log receipt of federal filings. |
FEES | Fees, descriptions and their value |
FILER_CORRESPONDENCE_BUILD2 | Table description contains this mysterious comment: "J M needs to describe this table." |
FILER_ELECTIONS | Table description contains this mysterious comment: "J M needs to describe this table. He indicates it is for future use." |
FILER_NOTICE_GENERATION_DEF | Table description contains this mysterious comment: "J M needs to describe this table. He indicates it is for future use." |
FILER_OBLIGATIONS | Table description contains this mysterious comment: "J M needs to describe this table. He indicates it is for future use." |
FILER_TYPES_TO_FORMS | Table description contains this mysterious comment: "J M needs to describe this table. It is in his list of tables designed for future releases." |
FILING_ERROR_TYPES | This lookup table provides a cross reference for errors and their messages. |
FILING_ERRORS | This table contains the errors associated with a given filing and each of its amendments. |
FILING_ID_TEMP | No description |
FORM_CODES | This lookup table associates record types to forms. |
FORMS | This table describes the form set. |
LATE_CONT_IND_EXP_REPORT | Table description contains this mysterious comment: "J M needs to describe this table." |
LOCAL_FORMS | This table is used to log receipt of local paper filings. |
PRD_DATA_AUDIT | No description |
PRD_FINE_DETAIL | Detail information on how a fine was calculated. |
PRD_FINES | Fine summary data table. |
PRD_LIMITS | Table description contains this mysterious comment: "J M needs to describe this table." |
PRD_WAIVERS | Table description contains this mysterious comment: "J Mo needs to describe this table." |
TVIEW_CONTRIBUTIONS3 | Campaign Disclosure reporting tables. "Need to get DH's Documentation to describe." |
Do you offer all of the available data?
Almost. The raw data provided by the state contains some errors in how values are escaped, quoted and delimited. As a result, a small number of records cannot yet be parsed automatically and are lost during our process.
However, according to our tracking, 99.9998% of records in the downloaded source files are offered here for download.
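To illustrate how delimiter and quoting errors can cost records, and how a share of retained records might be tracked, here is a rough sketch. It assumes a tab-delimited source file and treats any row whose field count differs from the header's as unparseable. That rule, the function names and the sample rows are all assumptions made for illustration, not a description of the project's actual parser or data.

```python
import csv
import io


def split_clean_rows(raw_text, delimiter="\t"):
    """Separate rows matching the header's field count from rows that do not."""
    reader = csv.reader(
        io.StringIO(raw_text), delimiter=delimiter, quoting=csv.QUOTE_NONE
    )
    header = next(reader)
    clean, rejected = [], []
    for row in reader:
        if len(row) == len(header):
            clean.append(row)
        else:
            # A stray delimiter or bad escape changes the field count.
            rejected.append(row)
    return header, clean, rejected


def percent_kept(clean_count, total_count):
    """Format the share of retained records with four decimal places."""
    return f"{clean_count / total_count * 100:.4f}%"


# Made-up sample: the second record carries an extra, unescaped tab.
sample = (
    "FILING_ID\tNAME\tAMOUNT\n"
    "1\tSMITH\t100\n"
    "2\tJONES\tEXTRA\t200\n"
    "3\tLEE\t50\n"
)

header, clean, rejected = split_clean_rows(sample)
print(percent_kept(len(clean), len(clean) + len(rejected)))
```

Setting malformed rows aside, rather than guessing at a repair, keeps the loaded data faithful to the source while still making it possible to report how much was left out.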
Do you modify the source data?
We make every effort to carefully parse and load the bulk CAL-ACCESS data from the state “as is.” Therefore, any undocumented modification of the data made during this process is considered a bug in our software.
There is one exception: We strip the time from any date field in the raw data. We consider this modification to be of little consequence since nearly every value has a time part of 12:00:00 AM.
Based on our inspection of the raw data, very little information is being lost and whatever is lost has questionable utility.
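Stripping the time from a date field can be sketched as below. The raw format string is an assumption based on the 12:00:00 AM time part mentioned above, the sample values are invented, and the helper names are hypothetical. This is not the project's own loading code.

```python
from datetime import datetime, time

# Assumed layout of a raw date value, for example "7/27/2000 12:00:00 AM".
RAW_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def strip_time(raw_value):
    """Parse a raw date-time string and keep only the calendar date."""
    return datetime.strptime(raw_value.strip(), RAW_FORMAT).date()


def has_midnight_time(raw_value):
    """Report whether the time part is exactly midnight, so nothing is lost."""
    parsed = datetime.strptime(raw_value.strip(), RAW_FORMAT)
    return parsed.time() == time(0, 0)


print(strip_time("7/27/2000 12:00:00 AM").isoformat())
print(has_midnight_time("7/27/2000 3:15:00 PM"))
```

A check like `has_midnight_time` is one way to measure how many values would actually lose information when the time is dropped.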