sandeepk97 / Federal-Election-Commission-Database

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

DMQL Milestone 2 Report

I have read and understood the course academic integrity policy in the syllabus of the class.

Team members:

  1. Ashwin Ashok (aashok3@buffalo.edu)
  2. Sandeep Kunusoth (skunusot@buffalo.edu)
  3. Shreyas Sujjaluru Lakshminarasimhan (ssujjalu@buffalo.edu)

Problem Statement:

The Federal Election Commission (FEC) is responsible for enforcing the campaign finance laws in the United States. As part of this responsibility, the FEC collects and publishes data on campaign contributions and expenditure. This data is used by journalists, researchers, and the public to monitor the influence of money in politics. The goal of this project is to create a comprehensive database of campaign finance data for the 2022-2023 election cycle. This database should be easily searchable and accessible to the public, journalists, and researchers. This data is crucial for building a platform to display the campaign contributions and expenditures from past years. It will be much simpler to maintain this data in databases, which also makes it possible to obtain findings instantly. Overall, it seems that the data elements in dataset are all related to the complex web of money and politics in the United States. By analyzing data together, it may be possible to gain insights into how money influences elections and which individuals and organizations are most involved in the political process.

Link: Federal Election Commission Database

Why Databases and not EXCEL?

EXCEL:

  • Large data is challenging to preserve due of size. Additionally, Excel has difficulty maintaining the large number of entities needed to establish links between the data.
  • Data relationships are difficult to maintain and are not possible.
  • It will not support the real time data.

DATABASE:

  • When it comes to manipulating and analyzing data, databases are significantly more powerful and adaptable than Excel. Data can be stored in a variety of formats, including text, photos, audio, and video.
  • It is flexible for the larger data.
  • The relations between the tables are easy.
  • The database facilitates real-time applications.
  • Using a database, we may conduct more complicated actions like connecting two tables, doing computations, and bulk updating. They also enable increased security and scalability.

Target User:

Journalists, Researchers, Advocacy Groups, Voters, and General Public comprise the end users. Journalists often rely on campaign finance data to uncover potential conflicts of interest and to report on the influence of money in politics. They may use the database to search for contributions from specific individuals or organizations, or to compare the fundraising and spending patterns of different candidates. Researchers in political science, economics, and other fields may use the database to analyze the impact of money on political campaigns, to identify trends in campaign finance, or to explore the relationship between political donations and policy outcomes. Advocacy groups may use the database to monitor the fundraising and spending activities of candidates or political parties, to track the activities of specific donors or interest groups, or to identify potential avenues for advocacy and engagement. Voters may use the database to research the candidates running in their districts or states, to learn more about the sources of funding for their campaigns, and to assess the potential influence of money on their elected representatives. The database can be accessed by anyone who is interested in learning more about the role of money in politics during the 2022-2023 election cycle in the United States. The data can be used to increase transparency and accountability in the political process and to inform public discourse on the issue.

Administrator: Federal Election commission maintains a public database, called the Electronic Filing System (EFS), which contains financial information provided by political committees and candidates.

Entity Relation Diagram (ERD)

Before we start designing the E/R diagram, we need to understand the tables in the dataset and their relationships. Based on the table names provided, we can identify the following entities:

Entities:

  1. Candidates
  2. Committees
  3. Contributors
  4. Campaigns
  5. Contributions
  6. Expenditures
  7. Transactions

Now, let's consider the relationships between these entities. Here are some possible relationships that we can identify based on the dataset:

Relationships:

Candidates – Committees:- Many to One

CurrentCampaignsForHouseandSenate - Candidates:- One to One

ContributionsByIndividuals - Committees:- Many to Many

IndependentExpenditures - Candidates:- Many to One

Committees – IndependentExpenditures:- One to Many

Committees – OperatingExpenditures:- One to Many

Committees – PAC&PartySummary:- One to One

Committees – AnyTransactionFromOneCommmitteeToAnother:- Many to Many

Based on these relationships, we can draw an E/R diagram that looks something like this:

In this diagram, we have entities represented as rectangles. The relationships between these entities are represented as lines connecting them.

NOTE:

In all the further sections green indicates the addition of new column/table post normalization and red indicates removal of column/table post the normalization.

Database Implementation:

Database Schema, Attributes, Primary key and Foreign Key:

  • candidate**

Description: The candidate contains information about individual candidates running for office. This will include data on the candidate’s name, party affiliation, campaign platform, and other relevant information. The committee relation contains the below attributes.

Attributes and Datatypes:

Column Name Column Type

Can be

Null

Default Description
CAND_ID VARCHAR(9) No ID of the candidate. First character indicates office sought - H=House, S=Senate, P=Presidential. Characters 3 & 4 are the state abbreviation for Congressional candidates. Example value (H0AK00105)
CAND_NAME VARCHAR(200) Yes '' Name of candidate. Example Value (Martha Washington)
CAND_PTY_AFFILIATION VARCHAR(3) Yes '' The political party affiliation reported by the candidate. Dem=Democrat
Rep=Republican
full list of party affiliations Example value (Dem)
CAND_ELECTION_YR Numeric(4) No Year of election. Example value: 2022
CAND_OFFICE_ST VARCHAR(2) Yes ''

Candidate state.

House = state of race
President = US
Senate = state of race

Example value: (VA)

CAND_OFFICE VARCHAR(1) Yes

''

Candidate Office.

H = House
P = President
S = Senate. Example value: (H)

CAND_OFFICE_DISTRICT VARCHAR(2) Yes

''

Candidate district. Congressional district number
Congressional at-large 00
Senate 00
Presidential 00. Example value: 01
CAND_ICI VARCHAR(1) Yes

''

Incumbent challenger status.

C = Challenger
I = Incumbent
O = Open Seat is used to indicate an open seat; Open seats are defined as seats where the incumbent never sought re-election.

Example value: I

CAND_STATUS VARCHAR(1) Yes

''

Candidate status.

C = Statutory candidate
F = Statutory candidate for future election
N = Not yet a statutory candidate
P = Statutory candidate in prior cycle.

Example value: C

CAND_PCC VARCHAR(9) Yes

''

Principal campaign committee. The ID assigned by the Federal Election Commission to the candidate's principal campaign committee for a given election cycle. Example value: C00100005
CAND_ST1 VARCHAR(34) Yes

''

Mailing address – street.

Example value: 1001 George Washington Hwy

CAND_ST2 VARCHAR(34) Yes

''

Mailing address – street2

Example value: Suite 100

CAND_CITY VARCHAR(30) Yes

''

Mailing address – city

Example value: Alexandria

CAND_ST VARCHAR(2) Yes

''

Mailing address – state

Example value: VA

CAND_ZIP VARCHAR(9) Yes

''

Mailing address - ZIP code

Example value: 22201

Primary Key: (CAND_ID) In the case of the candidate table, the CAND_ID column is the most appropriate choice for the primary key because it uniquely identifies each candidate. This is evident from the fact that the CAND_ID column has a unique value for each row in the table. Moreover, it is a unique identifier assigned by the Federal Election Commission (FEC) to each candidate and is used to track the candidate's fundraising and spending activities. Additionally, foreign keys in other tables that reference the candidate table can use the CAND_ID column to establish the relationship between the tables.

Schema:

CREATE TABLE candidate (

CAND_ID VARCHAR(9) NOT NULL PRIMARY KEY,

CAND_NAME VARCHAR(200),

CAND_PTY_AFFILIATION VARCHAR(3),

CAND_ELECTION_YR NUMERIC(4),

CAND_OFFICE_ST VARCHAR(2),

CAND_OFFICE VARCHAR(1),

CAND_OFFICE_DISTRICT VARCHAR(2),

CAND_ICI VARCHAR(1),

CAND_STATUS VARCHAR(1),

CAND_PCC VARCHAR(9),

CAND_ST1 VARCHAR(34),

CAND_ST2 VARCHAR(34),

CAND_CITY VARCHAR(30),

CAND_ST VARCHAR(2),

CAND_ZIP VARCHAR(9)

);

  • committee**

Description: The committee contains information about the various political committees involved in the election process. This includes data on the committee’s name, purpose, funding sources, and other relevant information. The committee relation contains the below attributes.

Attributes and Datatypes:

Column Name Column Type

Can be

Null

Default Description
CMTE_ID VARCHAR(9) No Committee identification. A 9-character alpha-numeric code assigned to a committee by the Federal Election Commission. Committee IDs are unique and an ID for a specific committee always remains the same. Example value: C00100005
CMTE_NM VARCHAR(200) Yes

''

Committee name . Example value: Martha Washington for Congress
TRES_NM VARCHAR(90) Yes

''

Treasurer's name. The officially registered treasurer for the committee. Example value: Alexander Hamilton
CMTE_ST1 VARCHAR(34) Yes '' Mailing address – street.Example value: 1001 George Washington Hwy
CMTE_ST2 VARCHAR(34) Yes

''

Mailing address – street2

Example value: Suite 100

CMTE_CITY VARCHAR(30) Yes

''

Mailing address – city

Example value: Alexandria

CMTE_ST VARCHAR(2) Yes

''

Mailing address – state

Example value: VA

CAND_ZIP VARCHAR(9) Yes

''

Mailing address - ZIP code

Example value: 22201

CMTE_DSGN VARCHAR(1) Yes

''

Committee designation. A = Authorized by a candidate
B = Lobbyist/Registrant PAC
D = Leadership PAC
J = Joint fundraiser
P = Principal campaign committee of a candidate
U = Unauthorized.

Example value: A

CMTE_TP VARCHAR(1) Yes

''

Committee type. List of committee type codes

Example value: H

CMTE_PTY_AFFILIATION VARCHAR(3) Yes

''

Committee party. List of party codes

Example value: NON

CMTE_FILING_FREQ VARCHAR(1) Yes

''

Filing frequency. A = Administratively terminated
D = Debt, M = Monthly filer, Q = Quarterly filer
T = Terminated, W = Waived.

Example value: Q

ORG_TP VARCHAR(1) Yes

''

Interest group category. C = Corporation
L = Labor organization
M = Membership organization
T = Trade association
V = Cooperative
W = Corporation without capital stock.

Example value: C

CONNECTED_ORG_NM VARCHAR(200) Yes

''

Connected organization's name.

Example value: ‘Widgets, Incorporated’

CAND_ID VARCHAR(9) Yes

''

Candidate identification. When a committee has a committee type designation of H, S, or P, the candidate's identification number will be entered in this field.

Example value: H1VA01225

Primary Key: (CMTE_ID) Using CMTE_ID as the primary key ensures that each row in the table is uniquely identifiable by its committee ID. It also allows for efficient searching and retrieval of data related to a specific committee using the primary key index. Furthermore, since CMTE_ID is a unique identifier assigned by the Federal Election Commission (FEC), it is a reliable and stable identifier that is unlikely to change over time, which makes it a good candidate for the primary key.

Foreign Key: (CAND_ID) This relationship exists because a committee can be associated with a candidate. A committee may support a candidate. Therefore, it is appropriate to include a foreign key constraint in the committee table that references the candidate table's primary key. Additionally, this foreign key relationship allows for easy querying of data related to a candidate's committees and contributions to those committees.

Schema:

CREATE TABLE committee (

CMTE_ID VARCHAR(9) NOT NULL PRIMARY KEY,

CMTE_NM VARCHAR(200),

TRES_NM VARCHAR(90),

CMTE_ST1 VARCHAR(34),

CMTE_ST2 VARCHAR(34),

CMTE_CITY VARCHAR(30),

CMTE_ST VARCHAR(2),

CMTE_ZIP VARCHAR(9),

CMTE_DSGN VARCHAR(1),

CMTE_TP VARCHAR(1),

CMTE_PTY_AFFILIATION VARCHAR(3),

CMTE_FILING_FREQ VARCHAR(1),

ORG_TP VARCHAR(1),

CONNECTED_ORG_NM VARCHAR(200),

CAND_ID VARCHAR(9)

);

  • Candidate- committee linkage**

Description: The candidate-committee linkage contains information linking the candidate's information to information about his or her committee**.** The candidate-committee linkage relation contains the bellow attributes.

Attributes and Datatypes:

Column Name Column Type

Can be

Null

Default Description
CAND_ID VARCHAR(9) No Candidate identification. A 9-character alpha-numeric code assigned to a candidate by the Federal Election Commission. The candidate ID for a specific candidate remains the same across election cycles as long as the candidate is running for the same office. Example value: C00100005
CAND_ELECTION_YR Numeric(4) No

Candidate election year. Example value: 2022
FEC_ELECTION_YR Numeric(4) No

FEC election year. Active 2-year period. Example value: 2022
CMTE_ID VARCHAR(9) No Committee identification. A 9-character alpha-numeric code assigned to a committee by the Federal Election Commission. The committee ID for a specific committee always remains the same. Example value: C00100005
CMTE_DSGN VARCHAR(1) Yes

''

Committee designation. A = Authorized by a candidate
B = Lobbyist/Registrant PAC
D = Leadership PAC
J = Joint fundraiser
P = Principal campaign committee of a candidate
U = Unauthorized.

Example value: A

CMTE_TP VARCHAR(1) Yes

''

Committee type. List of committee type codes

Example value: H

LINKAGE_ID Numeric(12) No

Linkage ID. Unique link ID

Example value: 123456789012

Primary Key: (LINKAGE_ID) The candidate_committee table serves as an association table that links the candidate and committee tables, which already have their own primary keys (CAND_ID and CMTE_ID, respectively). Therefore, the primary key for the candidate_committee table should be a unique combination of CAND_ID, CAND_ELECTION_YR, FEC_ELECTION_YR, CMTE_ID. For every unique combination of CAND_ID, CAND_ELECTION_YR, FEC_ELECTION_YR, CMTE_ID there is a unique LINKAGE_ID which should be the primary key. The CAND_ELECTION_YR and FEC_ELECTION_YR columns represent the year of the candidate's election and the year of the corresponding FEC election respectively.

Foreign Key: (CAND_ID, CMTE_ID). The candidate_committee table contains a link between a candidate and a committee, indicating which candidate has received contributions from which committee. Therefore, the foreign key in this table should reference the primary keys of the candidate and committee tables, which are CAND_ID and CMTE_ID, respectively. These foreign keys ensure that the link between a candidate and a committee is valid and references existing records in the candidate and committee tables.

Schema:

CREATE TABLE candidate_committee (

CAND_ID VARCHAR(9),

CAND_ELECTION_YR Numeric(4),

FEC_ELECTION_YR Numeric(4),

CMTE_ID VARCHAR(9),

CMTE_TP VARCHAR(1),

CMTE_DSGN VARCHAR(1),

LINKAGE_ID Numeric(12),

PRIMARY KEY (LINKAGE_ID),

UNIQUE (CAND_ID, CAND_ELECTION_YR, FEC_ELECTION_YR,CMTE_ID),

FOREIGN KEY (CAND_ID) REFERENCES candidate (CAND_ID),

FOREIGN KEY (CMTE_ID) REFERENCES committee (CMTE_ID)

);

  • contributor**

Description: The contributor contains the information of the contributor**.** The contributor relation contains the bellow attributes.

Attributes and Datatypes:

Column Name Column Type

Can be

Null

Default Description
CONTRIBUTOR_ID Varchar(9) No Contributor id
NAME Varchar(200) Yes Contributor/Lender/Trader Name
CITY Varchar(30) Yes City
STATE Varchar(2) Yes State
ZIP_CODE Varchar(9) Yes Zip Code
EMPLOYER VARCHAR(38) Yes Employer
OCCUPATION Varchar(38) Yes Occupation

Primary Key: (CONTRIBUTOR_ID) Using CONTRIBUTOR_ID as the primary key ensures that each row in the table is uniquely identifiable by its contributor ID. It also allows for efficient searching and retrieval of data related to a specific contributor using the primary key index.

Schema:

CREATE TABLE contributor (

CONTRIBUTOR_ID character varying(9),

NAME character varying(200),

CITY character varying(2),

STATE character varying(2),

ZIP_CODE character varying(9),

EMPLOYER character varying(38),

OCCUPATION character varying(38)

)

  • Contributionsbyindividuals

Description: The contributions by individuals relation contains information for contributions given by individuals.

Attributes & Datatyes:

Column Name Column Type

Can be

Null

Default Description
CMTE_ID Varchar(9) No A 9-character alpha-numeric code assigned to a committee by the Federal Election Commission
AMNDT_IND Varchar(1) Yes Indicates if the report being filed is new (N), an amendment (A) to a previous report or a termination (T) report.
RPT_TP Varchar(3) Yes Indicates the type of report filed
TRANSACTION_PGI Varchar(5) Yes This code indicates the election for which the contribution was made. 
IMAGE_NUM Varchar(18) Yes 18-digit Image Number Format
TRANSACTION_TP Varchar(3) Yes Transaction Type
ENTITY_TP Varchar(3) Yes Entity Type
CONTRIBUTOR_ID Varchar(9) Yes Contributor Id
NAME Varchar(200) Yes Contributor/Lender/Trader Name
CITY Varchar(30) Yes City
STATE Varchar(2) Yes State
ZIP_CODE Varchar(9) Yes Zip Code
EMPLOYER VARCHAR(38) Yes Employer
OCCUPATION Varchar(38) Yes Occupation
TRANSACTION_DT Date Yes Transaction. Date
TRANSACTION_AMT Numeric (14,2) Yes Transaction Amount
OTHER_ID Varchar(9) Yes For contributions from candidates or other committees this column will contain that contributor's FEC ID.
TRAN_ID Varchar(32) Yes Transaction ID
FILE_NUM Numeric(22) Yes Unique report id
MEMO_CD Varchar(1) Yes Memo code
MEMO_TEXT Varchar(100) Yes A description of the activity. Memo Text is available on itemized amounts on Schedules A and B.
SUB_ID Numeric(19) No FEC record number

Primary Key: (SUB_ID) In the case of the contributionbyindividuals table, the SUB_ID is the primary key as it uniquely identifies each contribution made by the individuals. Moreover, it is a unique identifier assigned by the Federal Election Commission (FEC) to record the individual’s contributions to committees.

Foreign Key: (CMTE_ID) The committee’s identification number is used to link the contributions made to the committees

Schema:

CREATE TABLE contributionsbyindividuals (

CMTE_ID character varying(9) NOT NULL,

AMNDT_IND character varying(1),

RPT_TP character varying(3),

TRANSACTION_PGI character varying(5),

IMAGE_NUM character varying(18),

TRANSACTION_TP character varying(3),

ENTITY_TP character varying(3),

CONTRIBUTOR_ID character varying(9),

NAME character varying(200),

CITY character varying(2),

STATE character varying(2),

ZIP_CODE character varying(9),

EMPLOYER character varying(38),

OCCUPATION character varying(38),

TRANSACTION_DT date,

TRANSACTION_AMT numeric(14,2),

OTHER_ID character varying(9),

TRAN_ID character varying(32),

FILE_NUM numeric(22,0),

MEMO_CD character varying(1),

MEMO_TEXT character varying(100), ** SUB_ID numeric(19,0) NOT NULL PRIMARY KEY,

FOREIGN KEY (CMTE_ID) REFERENCES Committee (CMTE_ID)

)

  • anytransactionfromcommitteetoanother

Description: This relation contains each contribution or independent expenditure that one committee gives to another during the two-year election cycle, including PACs, Party committees, Candidate committees or other federal committees

Attributes & Datatyes:

Column Name Column Type

Can be

Null

Default Description
CMTE_ID Varchar(9) No A 9-character alpha-numeric code assigned to a committee by the Federal Election Commission
AMNDT_IND Varchar(1) Yes Indicates if the report being filed is new (N), an amendment (A) to a previous report or a termination (T) report.
RPT_TP Varchar(3) Yes Indicates the type of report filed
TRANSACTION_PGI Varchar(5) Yes This code indicates the election for which the contribution was made. 
IMAGE_NUM Varchar(18) Yes 18-digit Image Number Format
TRANSACTION_TP Varchar(3) Yes Transaction Type
ENTITY_TP Varchar(3) Yes Entity Type
CONTRIBUTOR_ID Varchar(3) Yes Contributor Id
NAME Varchar(200) Yes Contributor/Lender/Trader Name
CITY Varchar(30) Yes City
STATE Varchar(2) Yes State
ZIP_CODE Varchar(9) Yes Zip Code
EMPLOYER VARCHAR(38) Yes Employer
OCCUPATION Varchar(38) Yes Occupation
TRANSACTION_DT Date Yes Transaction. Date
TRANSACTION_AMT Numeric (14,2) Yes Transaction Amount
OTHER_ID Varchar(9) Yes For contributions from candidates or other committees this column will contain that contributor's FEC ID.
TRAN_ID Varchar(32) Yes Transaction ID
FILE_NUM Numeric(22) Yes Unique report id
MEMO_CD Varchar(1) Yes Memo code
MEMO_TEXT Varchar(100) Yes A description of the activity. Memo Text is available on itemized amounts on Schedules A and B.
SUB_ID Numeric(19) No FEC record number

Primary Key: : (SUB_ID) In the case of the anytransactionfromonecommitteetoanother table, the SUB_ID is the primary key as it uniquely identifies each contribution made by the individuals. Moreover, it is a unique identifier assigned by the Federal Election Commission (FEC) to record the contributions to committees.

Foreign Key: (CMTE_ID) The committee’s identification number is used to link the contributions made from one committee to another committee

Schema:

CREATE TABLE anytransactionfromonecommitteetoanother (

CMTE_ID character varying(9) NOT NULL,

AMNDT_IND character varying(1),

RPT_TP character varying(3),

TRANSACTION_PGI character varying(5),

IMAGE_NUM character varying(18),

TRANSACTION_TP character varying(3),

ENTITY_TP character varying(3),

CONTRIBUTOR_ID character varying(9),

NAME character varying(200),

CITY character varying(2),

STATE character varying(2),

ZIP_CODE character varying(9),

EMPLOYER character varying(38),

OCCUPATION character varying(38),

TRANSACTION_DT date,

TRANSACTION_AMT numeric(14,2),

OTHER_ID character varying(9),

TRAN_ID character varying(32),

FILE_NUM numeric(22,0),

MEMO_CD character varying(1),

MEMO_TEXT character varying(100), ** SUB_ID numeric(19,0) NOT NULL PRIMARY KEY,

FOREIGN KEY (CMTE_ID) REFERENCES Committee (CMTE_ID)

)

  • currentcampaignforhouseandsenate

Description: The current campaigns for House and Senate relation contains summary financial information for each campaign committee. The relation has one record per House and Senate campaign committee and shows information about the candidate, total receipts, transfers received from authorized committees, total disbursements, transfers given to authorized committees, cash-on-hand totals, loans and debts, and other financial summary information.

Attributes & Datatyes:

Column Name Column Type

Can be

Null

Default Description
CAND_ID Varchar(9) No Chandidate identication
CAND_NAME Varchar(200) Yes Candidate name
CAND_ICI Varchar(1) Yes Incumbent challenger status
PTY_CD Varchar(1) Yes Party code
CAND_PTY_AFFILIATION Varchar(3) Yes Party affiliation
TTL_RECEIPTS Numeric(14,2) Yes Total receipts
TRANS_FROM_AUTH Numeric(14,2) Yes Transfers from authorized committees
TTL_DISB Numeric(14,2) Yes Total disbursements
TRANS_TO_AUTH Numeric(14,2) Yes Transfers to authorized committees
COH_BOP Numeric(14,2) Yes Beginning cash
COH_COP Numeric(14,2) Yes Ending cash
CAND_CONTRIB Numeric(14,2) Yes Contributions from candidate
CAND_LOANS Numeric(14,2) Yes Loans from candidates
OTHER_LOANS Numeric(14,2) Yes Other loans
CAND_LOAN_REPAY Numeric(14,2) Yes Candidate loan repayments
OTHER_LOAN_REPAY Numeric(14,2) Yes Other loan repayments
DEBTS_OWED_BY Numeric(14,2) Yes Debts owed by
TTL_INDIV_CONTRIB Numeric(14,2) Yes Total individuals contributions
CAND_OFFICE_ST Varchar(2) Yes Candidate state
CAND_OFFICE_DISTRICT Varchar(2) Yes Candidate district
SPEC_ELECTION Varchar(1) Yes Special election status
PRIM_ELECTION Varchar(1) Yes Primary election status
RUN_ELECTION Varchar(1) Yes Runoff election status
GEN_ELECTION Varchar(1) Yes General election status
GEN_ELECTION_PRECENT Numeric(7,4) Yes General election percentage
OTHER_POL_CMTE_CONTRIB Numeric(14,2) Yes Contributions from other political committees
POL_PTY_CONTRIB Numeric(14,2) Yes Contributions from party committees
CVG_END_DT Date Yes Coverage and date
INDIV_REFUNDS Numeric(14,2) Yes Refunds to individuals
CMTE_REFUNDS Numeric(14,2) Yes Refunds to committee

Primary Key: (CAND_ID) This table contains one record which contains information about the candidate, total receipts, total disbursements and other details per house and senate campaign committee. So, here candidate id uniquely identifies each record in the relation.

Foreign Key: (CAND_ID) The candidate’s identification number is used to link the current-campaign-for-house-and-senate with the candidate and it’s committee.

Schema:

CREATE TABLE currentcampaignforhouseandsenate (

CAND_ID character varying(9) NOT NULL PRIMARY KEY,

CAND_NAME character varying(200),

CAND_ICI character varying(1),

PTY_CD character varying(1),

CAND_PTY_AFFILIATION character varying(3),

TTL_RECEIPTS numeric(14,2),

TRANS_FROM_AUTH numeric(14,2),

TTL_DISB numeric(14,2),

TRANS_TO_AUTH numeric(14,2),

COH_BOP numeric(14,2),

COH_COP numeric(14,2),

CAND_CONTRIB numeric(14,2),

CAND_LOANS numeric(14,2),

OTHER_LOANS numeric(14,2),

CAND_LOAN_REPAY numeric(14,2),

OTHER_LOAN_REPAY numeric(14,2),

DEBTS_OWED_BY numeric(14,2),

TTL_INDIV_CONTRIB numeric(14,2),

CAND_OFFICE_ST character varying(2),

CAND_OFFICE_DISTRICT character varying(2),

SPEC_ELECTION character varying(1),

PRIM_ELECTION character varying(1),

RUN_ELECTION character varying(1),

GEN_ELECTION character varying(1),

GEN_ELECTION_PRECENT numeric(7,4),

OTHER_POL_CMTE_CONTRIB numeric(14,2),

POL_PTY_CONTRIB numeric(14,2),

CVG_END_DT date,

INDIV_REFUNDS numeric(14,2),

CMTE_REFUNDS numeric(14,2),

FOREIGN KEY (CAND_ID) REFERENCES Candidate (CAND_ID)

)

  • independentexpenditures

Description: This table contains each contribution or independent expenditure made by a:

  • PAC
  • Party committee
  • Candidate committee
  • Other federal committee

Attributes and Datatypes:

Column name Can be Null Data type Default Description
CMTE_ID N VARCHAR2 (9) A 9-character alpha-numeric code assigned to a committee by the Federal Election Commission
AMNDT_IND Y VARCHAR2 (1) Indicates if the report being filed is new (N), an amendment (A) to a previous report or a termination (T) report.
RPT_TP Y VARCHAR2 (3) Indicates the type of report filed. List of report type codes
TRANSACTION_PGI Y VARCHAR2 (5) This code indicates the election for which the contribution was made. EYYYY (election Primary, General, Other plus election year)
IMAGE_NUM Y VARCHAR2 (11)
or VARCHAR2(18)
11-digit image number format
YYOORRRFFFF
YY - scanning year
OO - office (01 - House, 02 - Senate, 03 - FEC Paper, 90-99 - FEC Electronic)
RRR - reel number
FFFF- frame number

18-digit image number normat (June 29, 2015)
YYYYMMDDSSPPPPPPPP
YYYY - scanning year
MM - scanning month
DD - scanning day
SS - source (02 - Senate, 03 - FEC Paper, 90-99 - FEC Electronic)
PPPPPPPP - page (reset to zero every year on January 1)
TRANSACTION_TP Y VARCHAR2 (3) Transaction types 10J, 11J, 13, 15J, 15Z, 16C, 16F, 16G, 16R, 17R, 17Z, 18G, 18J, 18K, 18U, 19J, 20, 20C, 20F, 20G, 20R, 22H, 22Z, 23Y, 24A, 24C, 24E, 24F, 24G, 24H, 24K, 24N, 24P, 24R, 24U, 24Z and 29 are included in the OTH file. Beginning with 2016 transaction types 30F, 30G, 30J, 30K, 31F, 31G, 31J, 31K, 32F, 32G, 32J, 32K, 40, 40Z, 41, 41Z, 42 and 42Z are also included in the OTH file.
For more information about transaction type codes see this list of transaction type codes
ENTITY_TP Y VARCHAR2 (3) ONLY VALID FOR ELECTRONIC FILINGS received after April 2002. CAN = Candidate
CCM = Candidate Committee
COM = Committee
IND = Individual (a person)
ORG = Organization (not a committee and not a person)
PAC = Political Action Committee
PTY = Party Organization
CONTRIBTOR_ID Y VARCHAR2 (9) Contributor Id
NAME Y VARCHAR2 (200) Name
CITY Y VARCHAR2 (30) City
STATE Y VARCHAR2 (2) State as a part of address
ZIP_CODE Y VARCHAR2 (9) ZIP_code as a part of address
EMPLOYER Y VARCHAR2 (38) Employer details
OCCUPATION Y VARCHAR2 (38) Occupation details
TRANSACTION_DT Y DATE Date on which transaction happened
TRANSACTION_AMT Y

NUMERIC(14,2)

Amount of transaction
OTHER_ID Y VARCHAR2 (9) For contributions from individuals this column is null. For contributions from candidates or other committees this column will contain that contributor's FEC ID.
CAND_ID Y VARCHAR2 (9) A 9-character alpha-numeric code assigned to a candidate by the Federal Election Commission. The candidate ID for a specific candidate remains the same across election cycles as long as the candidate is running for the same office.
TRAN_ID Y VARCHAR2 (32) ONLY VALID FOR ELECTRONIC FILINGS. A unique identifier associated with each itemization or transaction appearing in an FEC electronic file. A transaction ID is unique for a specific committee for a specific report. In other words, if committee, C1, files a Q3 New with transaction SA123 and then files 3 amendments to the Q3 transaction SA123 will be identified by transaction ID SA123 in all 4 filings.
FILE_NUM Y

NUMERIC(14,2)

Unique report id
MEMO_CD Y VARCHAR2 (1) 'X' indicates that the amount is NOT to be included in the itemization total.
MEMO_TEXT Y VARCHAR2 (100) A description of the activity. Memo text is available on itemized amounts on Schedules A and B. These transactions are included in the itemization total.
SUB_ID N

NUMERIC(14,2)

Unique row ID

Primary Key: : (SUB_ID) In the case of the independentexpenditures table, the SUB_ID is the primary key as it uniquely identifies each expenditure made by the individuals. Moreover, it is a unique identifier assigned by the Federal Election Commission (FEC) to record these expenditures.

Foreign Key: (CMTE_ID) The committee’s identification number is used to link the other tables.

Schema:

create table independentexpenditures(

CMTE_ID varchar(50),

AMNDT_IND varchar(10),

RPT_TP varchar(255),

TRANSACTION_PGI varchar(255),

TRANSACTION_TP varchar(255),

ENTITY_TP varchar(255),

CONTRIBUTOR_ID varchar(9)

NAME varchar(255),

CITY varchar(255),

STATE varchar(255),

ZIP_CODE varchar(100),

EMPLOYER varchar(100),

OCCUPATION varchar(100),

TRANSACTION_DT varchar(100),

TRANSACTION_AMT NUMERIC(255),

OTHER_ID NUMERIC(10),

TRAN_ID NUMERIC(10),

MEMO_CD varchar(50),

SUB_ID NUMERIC(50)) NOT NULL primary key,

FOREIGN KEY(CMTE_ID) REFERENCES COMMITTEE (CMTE_ID)

);

  • operatingexpenditures

Description:

The file contains information about the committee making the disbursement, the report where the operating expenditure is disclosed, the entity receiving the disbursement, the disbursement’s date, amount, purpose, and additional information about the operating expenditure (if provided).The end-of-line (EOL) marker is line feed '\n' (LF, 0x0A, 10 in decimal).

Attributes and Datatypes:

Column name Can be Null Data type Default Description
CMTE_ID N VARCHAR2 (9) Identification number of committee filing report. A 9-character alpha-numeric code assigned to a committee by the Federal Election Commission
AMNDT_IND Y VARCHAR2 (1) Indicates if the report being filed is new (N), an amendment (A) to a previous report, or a termination (T) report.
RPT_YR Y

NUMERIC(14,2)

RPT_TP Y VARCHAR2 (3) Indicates the type of report filed. List of report type codes
IMAGE_NUM Y VARCHAR2 (11) or VARCHAR2(18) 11-digit Image Number Format
YYOORRRFFFF
YY - scanning year
OO - office (01 - House, 02 - Senate, 03 - FEC Paper, 90-99 - FEC Electronic)
RRR - reel number
FFFF- frame number

18-digit Image Number Format (June 29, 2015)
YYYYMMDDSSPPPPPPPP
YYYY - scanning year
MM - scanning month
DD - scanning day
SS - source (02 - Senate, 03 - FEC Paper, 90-99 - FEC Electronic)
PPPPPPPP - page (reset to zero every year on January 1)
LINE_NUM Y

NUMERIC(14,2)

Indicates FEC form line number
FORM_TP_CD Y VARCHAR2 (8) Indicates FEC Form
SCHED_TP_CD Y VARCHAR2 (8) Schedule B - Itemized disbursements
CONTRIBUTOR_ID Y

VARCHAR2

(9)

Contributor Id
NAME Y VARCHAR2 (200) Name
CITY Y VARCHAR2 (30) City as a part of address
STATE Y VARCHAR2 (2) State as a part of address
ZIP_CODE Y VARCHAR2 (9) Zip code as a part of address
TRANSACTION_DT Y DATE

Date on which transaction was performed.

TRANSACTION_AMT Y

NUMERIC(14,2)

Amount of transaction
TRANSACTION_PGI Y VARCHAR2 (5)
PURPOSE Y VARCHAR2 (100) Purpose of transaction
CATEGORY Y VARCHAR2 (3) 001-012 and 101-107
CATEGORY_DESC Y VARCHAR2 (40) List of Disbursement Category Codes and their meaning
MEMO_CD Y VARCHAR2 (1) 'X' indicates that the amount is NOT to be included in the itemization total.
MEMO_TEXT Y VARCHAR2 (100) A description of the activity. Memo Text is available on itemized amounts on Schedule B. These transactions are included in the itemization total.
ENTITY_TP Y VARCHAR2 (3) ONLY VALID FOR ELECTRONIC FILINGS received after April 2002. CAN = Candidate
CCM = Candidate committee
COM = Committee
IND = Individual (a person)
ORG = Organization (not a committee and not a person)
PAC = Political action committee
PTY = Party organization
SUB_ID N

NUMERIC(14,2)

Unique row ID
FILE_NUM Y

NUMERIC(14,2)

Unique report id
TRAN_ID VARCHAR2 (32) ONLY VALID FOR ELECTRONIC FILINGS. A unique identifier associated with each itemization or transaction appearing in an FEC electronic file. A transaction ID is unique for a specific committee for a specific report. In other words, if committee, C1, files a Q3 New with transaction SA123 and then files 3 amendments to the Q3 transaction SA123 will be identified by transaction ID SA123 in all 4 filings.
BACK_REF_TRAN_ID Y VARCHAR2 (32) ONLY VALID FOR ELECTRONIC FILINGS. Used to associate one transaction with another transaction in the same report (using file number, transaction ID and back reference transaction ID). For example, a credit card payment and the subitemization of specific purchases. The back reference transaction ID of the specific purchases will equal the transaction ID of the payment to the credit card company.

Primary Key: (SUB_ID) In the case of the operating expenditures table, the SUB_ID is the primary key as it uniquely identifies each expenditure made by the committee. Moreover, it is a unique identifier assigned by the Federal Election Commission (FEC) to record the expenditures of the committees.

Foreign Key: (CMTE_ID) The committee’s identification number is used to link the expenditures made from one committee to another committee.

Table Schema:

create table operatingexpenditures(

CMTE_ID varchar(50) ,

AMNDT_IND varchar(10),

RTP_YR NUMERIC(10),

RPT_TP varchar(255),

LINE_NUM NUMERIC(100),

SCHED_TP_CD NUMERIC(10),

CONTRIBUTOR_ID varchar(9),

NAME varchar(255),

CITY varchar(255),

STATE varchar(255),

ZIP_CODE varchar(100),

TRANSACTION_DT varchar(100),

TRANSACTION_AMT varchar(100),

TRANSACTION_PGI varchar(100),

PURPOSE varchar(100),

CATEGORY varchar(100),

CATEGORY_DESC varchar(100),

MEMO_CD varchar(100),

MEMO_TEXT varchar(100),

ENTITY_TP varchar(100),

SUB_ID varchar(100) NOT NULL primary key,

FILE_NUM varchar(100),

TRAN_ID varchar(100),

BACK_REF_TRAN_ID varchar(100),

FOREIGN KEY(CMTE_ID) REFERENCES COMMITTEE (CMTE_ID),

FOREIGN KEY (CONTRIBUTOR_ID) REFERENCES contributor (CONTRIBUTOR_ID)

);

  • pacandpartysummary

Description:

The file has one record per PAC and Party committee and shows information about the committee, total receipts and disbursements, receipts and disbursements broken down by type, contributions to other committees, independent expenditures made by the committee, and other financial summary information. The end-of-line (EOL) marker is line feed '\n' (LF, 0x0A, 10 in decimal).

Attributes and Datatypes:

Column name Null Data type Default Description
CMTE_ID N VARCHAR2 (9) ID of the committee
CMTE_NM Y VARCHAR2 (200) Committee Name
CMTE_TP Y VARCHAR2 (1)
CMTE_DSGN Y VARCHAR2 (1)
CMTE_FILING_FREQ Y VARCHAR2 (1)
TTL_RECEIPTS Y NUMERIC(14,2)
TRANS_FROM_AFF Y

NUMERIC(14,2)

INDV_CONTRIB Y

NUMERIC(14,2)

OTHER_POL_CMTE_CONTRIB Y

NUMERIC(14,2)

CAND_CONTRIB Y

NUMERIC(14,2)

Not applicable
CAND_LOANS Y

NUMERIC(14,2)

Not applicable
TTL_LOANS_RECEIVED Y

NUMERIC(14,2)

TTL_DISB Y

NUMERIC(14,2)

TRANF_TO_AFF Y

NUMERIC(14,2)

INDV_REFUNDS Y

NUMERIC(14,2)

OTHER_POL_CMTE_REFUNDS Y

NUMERIC(14,2)

CAND_LOAN_REPAY Y

NUMERIC(14,2)

Not applicable
LOAN_REPAY Y

NUMERIC(14,2)

COH_BOP Y

NUMERIC(14,2)

COH_COP Y

NUMERIC(14,2)

DEBTS_OWED_BY Y

NUMERIC(14,2)

NONFED_TRANS_RECEIVED Y

NUMERIC(14,2)

CONTRIB_TO_OTHER_CMTE Y

NUMERIC(14,2)

IND_EXP Y

NUMERIC(14,2)

PTY_COORD_EXP Y

NUMERIC(14,2)

NONFED_SHARE_EXP Y

NUMERIC(14,2)

CVG_END_DT Y DATE(MM/DD/YYYY) Through date

Primary Key: (CMTE_ID) In the case of the pacandpartysummary table, the CMTE_ID is the primary key as it uniquely identifies each summary related to a committee. Moreover, it is a unique identifier assigned by the Federal Election Commission (FEC) to record the summary of these transactions.

Table Schema:

create table pacandpartysummary(

CMTE_ID varchar(9) NOT NULL primary key,

CMTE_NM varchar(200),

CMTE_TP varchar(1),

CMTE_DSGN varchar(1),

CMTE_FILING_FREQ varchar(1),

TTL_RECEIPTS NUMERIC(14,2),

TRANS_FROM_AFF NUMERIC(14,2),

INDV_CONTRIB NUMERIC(14,2),

OTHER_POL_CMTE_CONTRIB NUMERIC(14,2),

CAND_CONTRIB NUMERIC(14,2),

CAND_LOANS NUMERIC(14,2),

TTL_LOANS_RECEIVED NUMERIC(14,2),

TTL_DISB NUMERIC(14,2),

TRANF_TO_AFF NUMERIC(14,2),

INDV_REFUNDS NUMERIC(14,2),

OTHER_POL_CMTE_REFUNDS NUMERIC(14,2),

CAND_LOAN_REPAY NUMERIC(14,2),

LOAN_REPAY NUMERIC(14,2),

COH_BOP NUMERIC(14,2),

COH_COP NUMERIC(14,2),

DEBTS_OWED_BY NUMERIC(14,2),

NONFED_TRANS_RECEIVED NUMERIC(14,2),

CONTRIB_TO_OTHER_CMTE NUMERIC(14,2),

IND_EXP NUMERIC(14,2),

PTY_COORD_EXP NUMERIC(14,2),

NONFED_SHARE_EXP NUMERIC(14,2),

CVG_END_DT DATE,

FOREIGN KEY(CMTE_ID) REFERENCES COMMITTEE (CMTE_ID)

);

Note on Foreign keys: When the primary key (that the foreign key refer to) is deleted, the record with that foreign key is deleted (Cascade delete)

Database SQL Statements:

CREATE DATABASE federalelectioncommission;

DROP DATABASE federalelectioncommission;

Loading into Database:

Data for this dataset is present on federal election commission website in form of text files where each line is row separated by comma delimiter. We have written python script to load the data from these text files. Below snippet shows the code which connects to a PostgreSQL database and inserts data from text files into different tables in the database.

Insert statement for candidate:

INSERT INTO candidate (CAND_ID, CAND_NAME, CAND_PTY_AFFILIATION, CAND_ELECTION_YR, CAND_OFFICE_ST, CAND_OFFICE, CAND_OFFICE_DISTRICT, CAND_ICI, CAND_STATUS, CAND_PCC, CAND_ST1, CAND_ST2, CAND_CITY, CAND_ST, CAND_ZIP) VALUES ('C12345678', 'John Smith', 'REP', '2020', 'CA', 'H', '03', 'O', 'C', 'PCC123456', '123 Main Street', '', 'Los Angeles', 'CA', '90001');

Insert statement for committee:

INSERT INTO committee (CMTE_ID, CMTE_NM, TRES_NM, CMTE_ST1, CMTE_CITY, CMTE_ST, CMTE_ZIP, CMTE_DSGN, CMTE_TP, CMTE_PTY_AFFILIATION, CMTE_FILING_FREQ, ORG_TP, CONNECTED_ORG_NM, CAND_ID) VALUES

('C00000001', 'Friends of Jane Doe', 'John Smith', '123 Main St', 'Los Angeles', 'CA', '90001', 'A', 'P', 'DEM', 'Q', 'C', 'ABC Consulting', 'C12345678'),

('C00000002', 'Republicans for John Smith', 'Jane Doe', '456 Lincoln Blvd', 'San Francisco', 'CA', '94102', 'B', 'N', 'REP', 'M', 'P', NULL, 'C98765432'),

('C00000003', 'Independent Committee to Elect Jim Brown', 'Sarah Lee', '555 Independence Ave', 'Washington', 'DC', '20001', 'A', 'O', NULL, 'Q', 'M', 'XYZ Agency', 'C00098765');

Insert statement for candidate-committee:

INSERT INTO candidate_committee (CAND_ID, CAND_ELECTION_YR, FEC_ELECTION_YR, CMTE_ID, CMTE_TP, CMTE_DSGN, LINKAGE_ID) VALUES ('C12345678', 2022, 2022, 'C00000001', 'P', 'D', 34567);

Insert statement for contributor:

INSERT INTO contributor (CONTRIBUTOR_ID, NAME, CITY, STATE, ZIP_CODE, EMPLOYER, OCCUPATION) VALUES ('123456789', 'John Smith', 'NY', 'NY', '10001', 'ABC Inc.', 'Engineer');

Insert statement for contribution-by-individual:

INSERT INTO contributionsbyindividuals (CMTE_ID, AMNDT_IND, RPT_TP, TRANSACTION_PGI, IMAGE_NUM, TRANSACTION_TP, ENTITY_TP, CONTRIBUTOR_ID, TRANSACTION_DT, TRANSACTION_AMT, OTHER_ID, TRAN_ID, FILE_NUM, MEMO_CD, MEMO_TEXT, SUB_ID) VALUES ('C00000001', 'N', 'Q1', 'P', '12345678910111213', '15', 'IND', '123456789', '2022-01-01', '1000.00', NULL, 'ABCDE1234-2022-01-01-1', 1234567890, 'X', 'Donation for campaign expenses', 9876543210987654321);

Insert statement for any-transaction-from-one-committee-to-another: INSERT INTO anytransactionfromonecommitteetoanother (CMTE_ID, AMNDT_IND, RPT_TP, TRANSACTION_PGI,IMAGE_NUM, TRANSACTION_TP, ENTITY_TP, CONTRIBUTOR_ID,TRANSACTION_DT, TRANSACTION_AMT, OTHER_ID, TRAN_ID, FILE_NUM, MEMO_CD, MEMO_TEXT, SUB_ID) VALUES ('C00000001', 'N', 'Q1', 'P', '12345678910111213','15', 'COM', '123456789', '2022-01-01', 1000.00, NULL,'ABCDE1234-2022-01-01-1', 1234567890, 'X', 'Donation for campaign expenses', 9876543210987654321);

Insert statement for current-campaign-for-house-and-senate:

INSERT INTO currentcampaignforhouseandsenate (CAND_ID, PTY_CD, TTL_RECEIPTS, TRANS_FROM_AUTH, TTL_DISB, TRANS_TO_AUTH, COH_BOP, COH_COP, CAND_CONTRIB, CAND_LOANS, OTHER_LOANS, CAND_LOAN_REPAY, OTHER_LOAN_REPAY, DEBTS_OWED_BY, TTL_INDIV_CONTRIB, SPEC_ELECTION, PRIM_ELECTION, RUN_ELECTION, GEN_ELECTION, GEN_ELECTION_PRECENT, OTHER_POL_CMTE_CONTRIB, POL_PTY_CONTRIB, CVG_END_DT, INDIV_REFUNDS, CMTE_REFUNDS) VALUES ('C12345678', 'D', 100000.00, 5000.00, 80000.00, 6000.00, 20000.00, 18000.00, 10000.00, 5000.00, 0.00, 0.00, 0.00, 1000.00, 50000.00, 'N', 'Y', 'N','N', 0.00, 4000.00, 8000.00, '2022-11-03', 0.00, 0.00);

Insert statement for independentexpenditures:

INSERT INTO independentexpenditures (CMTE_ID, AMNDT_IND, RPT_TP, TRANSACTION_PGI, TRANSACTION_TP, ENTITY_TP, CONTRIBUTOR_ID, TRANSACTION_DT, TRANSACTION_AMT, OTHER_ID, TRAN_ID, MEMO_CD, SUB_ID) VALUES ('C00000001', 'N', 'M2', 'P', 'IE', 'ORG', '123456789', '2022-10-15', 5000.00, NULL, 987654321, NULL, 1234567890);

Insert statement for operatingexpenditures:

INSERT INTO operatingexpenditures (CMTE_ID, AMNDT_IND, RTP_YR, RPT_TP, LINE_NUM, SCHED_TP_CD, CONTRIBUTOR_ID, TRANSACTION_DT, TRANSACTION_AMT, TRANSACTION_PGI, PURPOSE, CATEGORY, CATEGORY_DESC,MEMO_CD, MEMO_TEXT, ENTITY_TP, SUB_ID, FILE_NUM, TRAN_ID, BACK_REF_TRAN_ID) VALUES ('C00000001', 'N', 2022, 'M2', 1, 1, '123456789','2022-10-15', 500.00, 'G', 'Test Purpose', 'Test Category', 'Test Category Desc', NULL, NULL, 'ORG', '1234567890', '100', '987654321', NULL);

Insert statement for pacandpartysummary:

INSERT INTO pacandpartysummary (CMTE_ID, TTL_RECEIPTS, TRANS_FROM_AFF, INDV_CONTRIB, OTHER_POL_CMTE_CONTRIB, CAND_CONTRIB, CAND_LOANS, TTL_LOANS_RECEIVED, TTL_DISB, TRANF_TO_AFF, INDV_REFUNDS, OTHER_POL_CMTE_REFUNDS, CAND_LOAN_REPAY, LOAN_REPAY, COH_BOP, COH_COP, DEBTS_OWED_BY, NONFED_TRANS_RECEIVED, CONTRIB_TO_OTHER_CMTE, IND_EXP, PTY_COORD_EXP, NONFED_SHARE_EXP, CVG_END_DT) VALUES ('C12345', 100000, 20000, 50000, 30000, 10000, 2000, 5000, 80000, 3000, 2000, 500, 1000, 200, 300000, 200000, 10000, 4000, 50000, 2000, 1000, 500, '2022-01-01');

Normalization of the tables

Functional Dependencies:

candidates

CAND_ID -> CAND_NAME, CAND_PTY_AFFILIATION, CAND_ELECTION_YR, CAND_OFFICE_ST, CAND_OFFICE, CAND_OFFICE_DISTRICT, CAND_ICI, CAND_STATUS, CAND_PCC, CAND_ST1, CAND_ST2, CAND_CITY, CAND_ST, CAND_ZIP

committees

CTME_ID -> CMTE_NM, TRES_NM, CMTE_ST1, CMTE_ST2, CMTE_CITY, CMTE_ST, CMTE_ZIP, CMTE_DSGN, CMTE_TP, CMTE_PTY_AFFILIATION, CMTE_FILING_FREQ, ORG_TP, CONNECTED_ORG_NM, CAND_ID

candidate_committee

LINKAGE_ID -> CAND_ID, CAND_ELECTION_YR, FEC_ELECTION_YR, CMTE_ID, CMTE_TP, CMTE_DSGN

contributor

CONTRIBUTOR_ID -> NAME, CITY, STATE, ZIP_CODE, EMPLOYER, OCCUPATION

contributionsbyindividuals

SUB_ID -> CMTE_ID, AMNDT_IND, RPT_TP, TRANSACTION_PGI, IMAGE_NUM, TRANSACTION_TP, ENTITY_TP, CONTRIBUTOR_ID, TRANSACTION_DT, TRANSACTION_AMT, OTHER_ID, TRAN_ID, FILE_NUM, MEMO_CD, MEMO_TEXT

CONTRIBUTOR_ID -> NAME, CITY, STATE, ZIP_CODE, EMPLOYER, OCCUPATION

anytransactionfromonecommitteetoanother

SUB_ID -> CMTE_ID, AMNDT_IND, RPT_TP, TRANSACTION_PGI, IMAGE_NUM, TRANSACTION_TP, ENTITY_TP, CONTRIBUTOR_ID, TRANSACTION_DT, TRANSACTION_AMT, OTHER_ID, TRAN_ID, FILE_NUM, MEMO_CD, MEMO_TEXT

CONTRIBUTOR_ID -> NAME, CITY, STATE, ZIP_CODE, EMPLOYER, OCCUPATION

Currentcampaignforhouseandsenate

CAND_ID -> CAND_NAME, CAND_ICI, PTY_CD, CAND_PTY_AFFILIATION, TTL_RECEIPTS, TRANS_FROM_AUTH, TTL_DISB, TRANS_TO_AUTH, COH_BOP, COH_COP, CAND_CONTRIB, CAND_LOANS, OTHER_LOANS, CAND_LOAN_REPAY, OTHER_LOAN_REPAY, DEBTS_OWED_BY, TTL_INDIV_CONTRIB, CAND_OFFICE_ST, CAND_OFFICE_DISTRICT, SPEC_ELECTION, PREM_ELECTION, RUN_ELECTION, GEN_ELECTION, GEN_ELECTION_PERCENT, OTHER_POL_CMTE_CONTRIB, POL_PTY_CONTRIB, CVG_END_DT, INDIV_REFUNDS, CMTE_REFUNDS

Independentexpenditures

SUB_ID -> CMTE_ID, AMNDT_IND, RPT_TP, TRANSACTION_PGI, TRANSACTION_TP, ENTITY_TP, CONTRIBUTOR_ID, TRANSACTION_DT, TRANSACTION_AMT, OTHER_ID, TRAN_ID, MEMO_CD

CONTRIBUTOR_ID -> NAME, CITY, STATE, ZIP_CODE, EMPLOYER, OCCUPATION

Operatingexpenditures

SUB_ID -> CMTE_ID, AMNDT_IND, RTP_YR, RTP_TP, LINE_NUM, SCHED_TP_CD, CONTRIBUTOR_ID, TRANSACTION_DT, TRANSACTION_AMT, TRANSACTION_PGI, PURPOSE, CATEGORY, CATEGORY_DESC, MEMO_CD, MEMO_TEXT, ENTITY_TP, FILE_NUM, TRAN_ID, BACK_REF_TRAN_ID

CONTRIBUTOR_ID -> NAME, CITY, STATE, ZIP_CODE

pacandpartysummary

CMTE_ID -> CMTE_NM, CMTE_TP, CMTE_DESN, CMTE_FILING_FREQ, TTL_RECEIPTS, TRANS_FROM_AFF, INDIV_CONTRIB, OTHER_POL_CMTE_CONTRIB, CAND_CONTRIB, CAND_LOANS, TTL_LOANS_RECEIVED, TTL_DISB, TRANF_TO_AFF, INDV_REFUNDS, OTHER_POL_CMTE_REFUNDS, CAND_LOAN_REPAY, LOAN_REPAY, COH_BOP, COH_COP, DEBTS_OWED_BY, NONFAD_TRANS_RECEIVED, CONTRIB_TO_OTHER_CMTE, IND_EXP, PTY_COORD_EXP, NONFED_SHARE_EXP, CVG_END_DT

The functional dependencies of each of the tables are listed above. We observe the LHS of the functional dependencies is a super key of its relation. All the functional dependencies are also non-trivial dependencies. These conditions imply that the relations are in BCNF. But, found that the relations contain redundancy anomalies. The details of the contributor repeated in all the tables, so we ended up creating a new relation contributors to store these details and add the id the contributor in the respective tables.

Indexing of the tables

Since the dataset was large, while loading the data, we had to do modifications to the data(to fit into our tables) and also since we decomposed the tables, we carried out data file manipulation and then we loaded it into the database. After doing these things and loading the data to the database, while performing the queries, our join operations specifically was taking lot of time for execution, therefore we added indexing for candidate and committee tables since all other tables mostly have cand_id and cmte_id as the foreign key. After incorporating this we could see significant improvement in the performance of the database.

CREATE INDEX idx_candid ON candidate (CAND_ID);

CREATE INDEX idx_comteid ON committee (cmte_id)

Testing the database with SQL queries.

Queries for each inserting, deleting, and updating operation in your dataset and select queries of different types of statements

  1. Candidate

  1. Committe

  1. Candidate_committee

  1. Contributor

  1. Contributionbyindividuals

  1. Anytransactionfromonecommitteetoanother

  1. Currentcampaignforhouseandsenate

  1. Independent Expenditures

  1. Operating Expenditures

  1. Pacpartyandsummary

  1. Joining candidate_committee and committe

  1. Join on candidate_committee and anytransactionsfromonetoanother

  1. Join on contributor and independent expenditures

  1. Join on operating expenditures and contributor

Query execution analysis

Problematic queries were:

1.

SELECT c.CAND_NAME, com.CMTE_NM

FROM candidate c

JOIN candidate_committee cc ON cc.CAND_ID = c.CAND_ID

JOIN committee com ON com.CMTE_ID = cc.CMTE_ID

WHERE cc.CAND_ELECTION_YR = 2020

GROUP BY c.CAND_NAME, com.CMTE_NM

ORDER BY c.CAND_NAME ASC, com.CMTE_NM ASC;

Based on the execution plan, it seems that the most expensive operation in the query is the sequential scan of the "committee" table, which has a cost of 24.29.

To improve the performance of the query, we can consider creating an index on the "CMTE_ID" column of the "committee" table, which is used in the join condition with the "candidate_committee" table. This will allow for faster retrieval of the relevant rows from the "committee" table.

CREATE INDEX committee_cmte_id_idx ON committee (CMTE_ID);

2.

SELECT *

FROM operatingexpenditures

LEFT JOIN contributor

ON operatingexpenditures.CONTRIBUTOR_ID = contributor.CONTRIBUTOR_ID;

![Graphical user interface, text, application, email

Description automatically generated](images/Aspose.Words.2c68961b-8eb5-42d1-9c96-a925b9664449.009.png)

Based on the execution pla, it appears that the problematic query involves a left join between the operatingexpenditures and contributor tables on their respective contributor_id columns. The left join operation can be expensive and can consume a lot of resources, especially if the tables are large.

To optimize this query, consider the following options:

  1. Add an index to the contributor_id column on both tables to speed up the join operation. This can significantly reduce the amount of time required to perform the join by allowing the database to quickly locate matching rows.
  2. Rewrite the query to use a more efficient join algorithm. For example, a nested loop join may be more efficient than a hash join or a merge join for small tables.
  3. Use a subquery or a common table expression (CTE) to filter the rows before joining. This can reduce the size of the result set and improve the efficiency of the join operation.

3.

SELECT *

FROM committee

FULL OUTER JOIN candidate_committee

ON committee.CMTE_ID = candidate_committee.CMTE_ID;

![Graphical user interface, text, application

Description automatically generated](images/Aspose.Words.2c68961b-8eb5-42d1-9c96-a925b9664449.010.png)

Based on the execution plan, it appears that the problematic query involves a full join between the candidate_committee and committee tables on their respective cmte_id columns. The full join operation can be expensive and can consume a lot of resources, especially if the tables are large.

To optimize this query, consider the following options:

  1. Add an index to the cmte_id column on both tables to speed up the join operation. This can significantly reduce the amount of time required to perform the join by allowing the database to quickly locate matching rows.
  2. Rewrite the query to use a more efficient join algorithm. For example, a nested loop join may be more efficient than a hash join or a merge join for small tables.
  3. Use a subquery or a common table expression (CTE) to filter the rows before joining. This can reduce the size of the result set and improve the efficiency of the join operation.

About


Languages

Language:HTML 53.8%Language:Python 45.8%Language:Shell 0.2%Language:CSS 0.1%