thoppe / The-Pile-FreeLaw

Download, parse, and filter data from Court Listener, part of the FreeLaw projects. Data-ready for The-Pile.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

The-Pile-FreeLaw

Download, parse, and filter data from Court Listener, part of the FreeLaw project. Data-ready for The-Pile.

While Data exists for multiple modalites (eg. dockets, people, ...), we focus on court opinions for language modeling.

Pile-V2 Statistics

✔ Saved to data/FreeLaw_Opinions.jsonl.zst
ℹ Date completed 12/15/2021
ℹ Saved 4,940,710 opinions (1,378,695 added, 38.7% growth)
ℹ Uncompressed filesize 69,470,526,055
ℹ Compressed filesize   20,975,955,178
ℹ sha256sum 8a38c34f181aa121c3a7360ad63e3e8c0b1ea0913de08a4bf1b68b3eabae3e66

Additional columns to find free text were added [html_columbia, html_anon_2020, html_with_citations, xml_harvard] which accounts for the larger increase from V1 to V2.

Pile-V1 Statistics

✔ Saved to data/FreeLaw_Opinions.jsonl.zst
ℹ Saved 3,562,015 opinions
ℹ Uncompressed filesize   56,138,746,490
ℹ Compressed filesize     17,013,175,549
ℹ sha256sum 7d7ba907cf397e8585bb3ef148b3e9678edbf142b2247460f907c16aecbaed2d

About

Download, parse, and filter data from Court Listener, part of the FreeLaw projects. Data-ready for The-Pile.


Languages

Language:Python 100.0%