littlelittleboom / FOMCTextAnalysis

FOMC Text Analysis Project

Authors: Jose Luis Montiel Olea, Oliver Giesecke, Anand Chitale

Purpose: This repository contains tools (Python and Matlab scripts), raw data, and derived datasets that relate primarily to textual data produced in preparation for FOMC decisions.

Organization:

The repository is organized as follows: all relevant code and data are contained in the src folder.

The src folder is broken down into three subfolders:

  1. Collection: Contains tools for scraping, downloading, and extracting raw text from documents on the FOMC website.

  2. Derivation: Contains tools that use the raw data and transform the data into a format that is suitable for the data analysis.

  3. Analysis: Contains tools that perform the data analysis, produce summary statistics, and produce tables and figures for the draft.

Each subfolder contains the following, grouped by programming language:

  1. Data: Manually downloaded or modified files which must be read in (this is either raw data or manually produced data)
  2. Scripts: Programs which produce some output
  3. Output: Content generated by scripts

Collection

This folder contains scripts used in order to read and download all documents from the FOMC website.

Key Files:

  1. python/scripts/download_raw_doc_metadata.py:
    Reads through each page of the FOMC website (https://www.federalreserve.gov/monetarypolicy/fomc_historical_year.htm) and extracts meeting dates, available documents and the corresponding links by year. It produces the output file raw_data.csv which constitutes the universe of documents.

  2. python/scripts/extract_derived_data.py:
    Reads in the raw_doc_metadata and produces a derived file, derived_data.csv, containing the following fields: ['year', 'start_date', 'end_date', 'event_type', 'file_name', 'file_size', 'file_type', 'link', 'grouping', 'document_class']. This is a cleaned version of the previous file that adds information on the natural grouping of documents. It can be considered the reference file for available documents.

  3. python/scripts/perform_collection.sh:
    This is a shell script that executes all Python scripts in the correct order. Apart from the two steps described above, it also downloads the documents from the website of the Federal Reserve Board and extracts the raw text from the .pdf, .html, or other file types.
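As an illustration of the metadata step described above, the following is a minimal sketch of how document links and file types might be pulled out of one page of HTML. It uses only the Python standard library. The names DocLinkParser and extract_doc_links, the "label" field, and the sample HTML are assumptions made for this example and are not taken from the repository's scripts.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.federalreserve.gov"  # site root used to resolve relative links


class DocLinkParser(HTMLParser):
    """Collect anchors that point at document files (hypothetical helper)."""

    DOC_EXTENSIONS = (".pdf", ".htm", ".html")

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the anchor currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Keep only links whose target looks like a document file.
            if href and href.lower().endswith(self.DOC_EXTENSIONS):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append({
                "link": urljoin(BASE_URL, self._href),
                "file_type": self._href.rsplit(".", 1)[-1].lower(),
                "label": "".join(self._text).strip(),
            })
            self._href = None


def extract_doc_links(html):
    """Return one dict per document link found in the HTML string."""
    parser = DocLinkParser()
    parser.feed(html)
    return parser.links


# Made-up HTML fragment for illustration; not copied from the FOMC website.
SAMPLE_HTML = """
<div>
  <a href="/files/example_bluebook.pdf">Bluebook</a>
  <a href="/files/example_statement.htm">Statement</a>
  <a href="#top">Back to top</a>
</div>
"""

for row in extract_doc_links(SAMPLE_HTML):
    print(row["label"], row["file_type"], row["link"])
```

In a real run the HTML would come from the yearly pages listed above, and each row would be extended with the meeting dates before being written to a CSV file.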

Derivation

This folder contains scripts to extract meeting-specific information. So far, it focuses on extracting policy alternatives from the "Bluebook" and outcomes from the Statements.

Key Files:

  1. src/derivation/python/scripts/obtain_bluebook_alternatives.py:
    Reads in the raw text of the bluebooks from collection/python/output/bluebook_raw_text and uses regular expressions to extract all sentences that mention a policy alternative. It exports the extracted information to derivation/output/bluebook_alt_and_class_output.csv.

  2. src/derivation/python/scripts/produce_meeting_derived_file.py:
    Reads in the previous output and merges it with market information and meeting outcomes. Exports the merged information to meeting_derived_file.csv.
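The regular-expression step can be sketched roughly as follows. This is an assumption-laden illustration, not the pattern used in obtain_bluebook_alternatives.py: it assumes alternatives are labeled with a single capital letter ("Alternative A" through "Alternative E"), splits sentences naively on terminal punctuation, and uses made-up sample text. The function name extract_alternative_sentences is hypothetical.

```python
import re

# Assumed labeling scheme: "Alternative A", "alternatives B", and so on.
ALT_PATTERN = re.compile(r"\b[Aa]lternatives?\s+([A-E])\b")
# Naive sentence splitter: break on whitespace that follows ., ! or ?
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def extract_alternative_sentences(raw_text):
    """Return the sentences that mention at least one policy alternative."""
    rows = []
    for sentence in SENTENCE_SPLIT.split(raw_text):
        labels = sorted(set(ALT_PATTERN.findall(sentence)))
        if labels:
            rows.append({
                "alternatives": labels,
                # Collapse line breaks and repeated spaces left over from text extraction.
                "sentence": " ".join(sentence.split()),
            })
    return rows


# Made-up text for illustration; not quoted from any Bluebook.
SAMPLE_TEXT = (
    "Under Alternative B, the Committee would keep the target rate unchanged. "
    "Financial markets were volatile. "
    "Alternative A would lower the rate, while Alternative C would raise it."
)

for row in extract_alternative_sentences(SAMPLE_TEXT):
    print(row["alternatives"], row["sentence"])
```

A splitter this simple will break on abbreviations and decimal numbers, so a production script would likely need a more careful sentence tokenizer.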

Analysis

These scripts read in information from the derived output in order to produce figures, charts, graphs, and summary statistics.

Key Files:

  1. src/analysis/python/scripts/overleaf_production/produce_overleaf_files.sh:
    This shell script runs every file placed in the overleaf_production folder and exports the resulting graphs, charts, and figures to the overleaf_files output folder.

  2. src/analysis/python/scripts/produce_final_data_file.py:
    This final analysis script reads in all relevant information from the data sources and produces a final data file that provides monthly values for the list of dummies described in data_dictionary.md.
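To make the idea of a monthly dummy file concrete, here is a minimal sketch that expands a list of meeting-level records into one row per calendar month. The dummy names (d_meeting, d_rate_cut, d_rate_hike), the input layout, and the sample meetings are all hypothetical; the actual variables are the ones documented in data_dictionary.md.

```python
from datetime import date


def month_range(start, end):
    """Yield (year, month) tuples from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


def monthly_dummies(meetings, start, end):
    """Build one row per month from (meeting_date, rate_change) pairs.

    If several meetings fall in the same month, the last one listed wins.
    """
    by_month = {(d.year, d.month): change for d, change in meetings}
    rows = []
    for year, month in month_range(start, end):
        change = by_month.get((year, month))
        rows.append({
            "month": f"{year:04d}-{month:02d}",
            "d_meeting": int(change is not None),
            "d_rate_cut": int(change is not None and change < 0),
            "d_rate_hike": int(change is not None and change > 0),
        })
    return rows


# Made-up meetings for illustration: (date, change in target rate).
SAMPLE_MEETINGS = [
    (date(2000, 1, 15), 0.25),
    (date(2000, 3, 20), -0.50),
]

for row in monthly_dummies(SAMPLE_MEETINGS, (2000, 1), (2000, 4)):
    print(row)
```

Months without a meeting get zeros for every dummy, which keeps the output a balanced monthly panel.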

AKJ Replication

Contains original data and code and a simplified replication in Stata.

Key files:

  1. src/AKJ_Replication/original_code_data: This folder contains the original replication files for Angrist, Kuersteiner, Jorda (2017) from Guido Kuersteiner's website (https://www.econ.umd.edu/facultyprofile/kuersteiner/guido).

  2. src/AKJ_Replication/replication: Contains the Stata replication script, do_replication_acc_matlab, which uses the original data as input, and two scripts that explore variations on the original: do_replication_acc_matlab_mod and do_replication_acc_matlab_mod_res.

About

License: MIT License
