joshuamil / VoteSmarterNC

Code for CLT project

NC Legislature Bill Scrapy

Use this Scrapy spider to obtain bill data from the NC Legislature website.

The spider extracts each bill's data into an object. Use the scrapy command to output a JSON list of bill objects.

Technology Stack

  • Python 3
  • PostgreSQL
  • Scrapy: Web Scraping Library
  • psycopg2: Postgres Connector

Prerequisites

  1. Python 3
  2. Environment variables stored for:
       • AWS Access Key (S3_ACCESS_KEY)
       • AWS Secret Key (S3_SECRET_KEY)
       • AWS S3 Bucket (S3_BUCKET_NAME)
  3. psycopg2 installed
       • pip install psycopg2
  4. tinys3 installed
       • pip install tinys3
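
The environment variables listed above can be checked before a run. The following is a minimal sketch, not part of the repository: the helper name missing_env_vars is hypothetical, and only the three variable names come from this README.

```python
import os

# Variable names as listed in the prerequisites of this README.
REQUIRED_VARS = ("S3_ACCESS_KEY", "S3_SECRET_KEY", "S3_BUCKET_NAME")


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; pass a dict to check something
    other than the real process environment.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Running this before the crawl reports any unset variable up front.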

Running the Scraper

  1. Requires Python 3
  2. Install Scrapy, for example with pip: pip install scrapy
  3. Navigate into the repo
  4. Tell Scrapy to crawl "bills": scrapy crawl bills -o <filename>.<ext> (<filename>.<ext> is a JSON file that will be created at runtime and will contain the extracted data.)
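
Once the crawl has written its output, the file can be read back with the standard library. This is a minimal sketch that assumes the output was written as a .json file. The function name load_bills is hypothetical, and the fields inside each bill object are not documented here, so the sketch only checks that the top level is a list.

```python
import json


def load_bills(path):
    """Load the JSON list of bill objects written by the crawl.

    Hypothetical helper for illustration; `path` is whatever file name
    was passed to `scrapy crawl bills -o`.
    """
    with open(path, encoding="utf-8") as handle:
        bills = json.load(handle)
    # The README describes the output as a JSON list of bill objects.
    if not isinstance(bills, list):
        raise ValueError("expected a JSON list of bill objects")
    return bills
```

For example, after scrapy crawl bills -o bills.json (bills.json is an assumed file name), len(load_bills("bills.json")) gives the number of bills extracted.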
