jackseagrist / Env_Policy_NLP

Data and notebooks used to create dataset for Google AutoML NER model

Environmental Policy NLP - Named Entity Recognition

This repository contains the data and notebook used to build the dataset for a named entity recognition (NER) model trained with Google AutoML. The goal of the project was to train a model that could help analyze policy documents related to climate change and the environment. The source documents are hearing transcripts of the House Select Committee on the Climate Crisis, obtained from govinfo.

Link to Data

Outline of Repository

  1. 0_Deprecated - Old documents

  2. 1_Raw_Climate_Crisis_Text - Hearing transcript txt files from the House Select Committee on the Climate Crisis

  3. 2_Processed_Data - Final annotated text in jsonl format, with the accompanying csv files used to create the dataset in Google Cloud Platform

  4. document_data_prep.ipynb - Notebook that converts the raw txt files into the final jsonl files
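
The txt-to-jsonl step handled by the notebook can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names `chunk_text` and `to_jsonl_line`, the `max_chars` limit, and the `POLICY` label are all hypothetical, and the record layout (`text_snippet`, `annotations`, `text_extraction`, `text_segment`, `display_name`) is an assumption based on the JSONL import format that AutoML Natural Language documents for entity extraction.

```python
import json


def chunk_text(text, max_chars=8000):
    """Split a transcript into chunks of at most max_chars characters,
    breaking on blank lines (paragraphs) where possible.
    max_chars is a hypothetical limit chosen for illustration."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Hard-split any single paragraph that is longer than the limit.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def to_jsonl_line(content, spans=()):
    """Serialize one text chunk and its entity spans as a single JSON line.
    spans is an iterable of (start_offset, end_offset, label) tuples.
    The field names are assumed from the AutoML entity extraction format."""
    record = {
        "text_snippet": {"content": content},
        "annotations": [
            {
                "text_extraction": {
                    "text_segment": {"start_offset": start, "end_offset": end}
                },
                "display_name": label,
            }
            for start, end, label in spans
        ],
    }
    return json.dumps(record)


if __name__ == "__main__":
    # Toy input standing in for a hearing transcript.
    sample = "Carbon tax proposals were discussed.\n\nThe hearing adjourned."
    for chunk in chunk_text(sample, max_chars=40):
        print(to_jsonl_line(chunk))
```

Each printed line is one record of a jsonl file. Unannotated chunks simply carry an empty `annotations` list, which leaves room for labels to be added later.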

By: Jack Seagrist

Languages

Jupyter Notebook 93.7%, Python 6.3%