mdlincoln / caselaw-to-sqlite

Script to extract data dumps from the Caselaw Access Project into sqlite

case.law to sqlite

This script parses one or more data export files from the Caselaw Access Project into a single structured sqlite database, so the data is easier to query from analysis tools such as R or Python.

Requirements

  • jq
  • sqlite3 command line shell
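Before running the script, it can help to confirm that both tools are on your PATH. The following is a minimal sketch; the helper name missing_tools is hypothetical and not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# Prints each tool from the argument list that is not found on PATH,
# one per line, and returns non-zero if any are missing.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Check the two requirements listed above.
missing_tools jq sqlite3 || echo "Install the tools listed above first." >&2
```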

Usage

To run, unzip the export you want to add to your database and pass the path of the data.jsonl.xz file inside it:

./extract.sh Illinois-20180829-text/data/data.jsonl.xz

The script will first use jq to write out several CSV files to disk, and then load and index them into an sqlite database named caselaw.sqlite. You may repeat this process with other dumps from the project website; each run adds further data to caselaw.sqlite.
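The two-step flow described above (jq to CSV, then CSV into sqlite) can be sketched on a toy input. This is an illustration under stated assumptions, not the repository's extract script: the file names demo.jsonl, cases.csv and demo.sqlite, the table name cases, the function name build_demo_db, and the fields id, name and decision_date are all hypothetical.

```shell
#!/bin/sh
# Illustrative sketch only; this is NOT the repository's extract script.
# Hypothetical names: demo.jsonl, cases.csv, demo.sqlite, table "cases",
# and the fields id, name, decision_date.

# Build a small database inside directory $1 from two made-up records.
# The body runs in a subshell so the cd does not affect the caller.
build_demo_db() (
  cd "$1" || exit 1

  # Sample input in JSON Lines form: one JSON object per line.
  cat > demo.jsonl <<'EOF'
{"id": 1, "name": "Doe v. Roe", "decision_date": "1901-05-02"}
{"id": 2, "name": "Smith v. Jones", "decision_date": "1923-11-17"}
EOF

  # Step 1: jq turns each record into one CSV row
  # (-r emits raw text, @csv quotes and escapes the fields).
  jq -r '[.id, .name, .decision_date] | @csv' demo.jsonl > cases.csv

  # Step 2: sqlite3 creates the table, imports the CSV, then adds an index.
  rm -f demo.sqlite
  sqlite3 demo.sqlite <<'EOF'
CREATE TABLE cases (id INTEGER, name TEXT, decision_date TEXT);
.mode csv
.import cases.csv cases
CREATE INDEX cases_decision_date ON cases (decision_date);
EOF
)

if command -v jq >/dev/null 2>&1 && command -v sqlite3 >/dev/null 2>&1; then
  workdir=$(mktemp -d)
  build_demo_db "$workdir"
  # Print the number of imported rows.
  sqlite3 "$workdir/demo.sqlite" 'SELECT count(*) FROM cases;'
  rm -rf "$workdir"
else
  echo "jq and sqlite3 are required for this sketch" >&2
fi
```

A real data.jsonl.xz file is xz-compressed, so it would presumably need to be decompressed first, for example by piping the output of xz -dc into jq. How the actual script does this is not described here.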

If you have downloaded several states' files into one directory (here, a directory named downloads/), you can use this command to run the script over all of the dumps:

find downloads -name data.jsonl.xz -exec ./extract.sh {} \+
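The two terminators that find accepts for -exec behave differently, which matters when choosing how to invoke a script. The sketch below uses a stand-in command that only reports how many paths it received; the function names run_batched and run_per_file and the placeholder directory layout are hypothetical. Whether the extract script accepts several paths in one call is not stated here.

```shell
#!/bin/sh
# Stand-in for a real script: reports how many paths it received per call.
# Single quotes keep $# unexpanded until the inner sh runs it.
report='echo "received $# path(s)"'

# "+" form: find appends all matches to as few command lines as possible.
run_batched() {
  find "$1" -name data.jsonl.xz -exec sh -c "$report" sh {} +
}

# ";" form: find starts the command once for each match.
run_per_file() {
  find "$1" -name data.jsonl.xz -exec sh -c "$report" sh {} \;
}

# Throwaway tree with two empty placeholder files (hypothetical layout).
demo=$(mktemp -d)
mkdir -p "$demo/Illinois/data" "$demo/Arkansas/data"
: > "$demo/Illinois/data/data.jsonl.xz"
: > "$demo/Arkansas/data/data.jsonl.xz"

run_batched "$demo"     # one line: received 2 path(s)
run_per_file "$demo"    # two lines: received 1 path(s)

rm -rf "$demo"
```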

Note that the CSV files are only intermediaries, and will be overwritten every time you run the script with a new data file.

About

License: MIT License

Languages

PLpgSQL 63.0%, Shell 37.0%