dpasse / pbp

Named Entity and Relation Extraction models for NFL play-by-play snippets

pbp

Process

  1. Scrape Data
  2. Centralize Data
    • combine multiple files into a single one
  3. Build Dataset / Model
    1. Split
      • splits off a random 1% subset for manageable inspection
    2. ITERATE
      1. Annotate Data
        • builds a redacted file for quick visual inspection
      2. Inspect Data
        • if issues, fix and annotate again
        • may require a complete reset of "gold standard" dataset
      3. Save
        • add data to be used in model building - "gold standard"
      4. Build Model

1. Scrape Data

scrape game ids and play-by-play text from ESPN for the 2022 NFL regular season.

from the project root

cd tasks\scrap
make scrap-schedules

output files found in "tasks/data/1/"

make scrap-pbp

output files found in "tasks/data/2/"

2. Centralize Data

create a main source file and split into dev / holdout datasets

from the project root

cd tasks\scrap
make centralize-data

output files found in "tasks/data/3/"
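
the centralize step can be pictured roughly as below. this is a minimal sketch, not the code behind `make centralize-data`: the function names, the `*.txt` pattern, the 20% holdout fraction, and the one-play-per-line layout are all assumptions made for illustration.

```python
import random
from pathlib import Path


def centralize(source_dir: Path, pattern: str = "*.txt") -> list[str]:
    """Read every matching file and return the non-empty lines as one list."""
    lines: list[str] = []
    for path in sorted(source_dir.glob(pattern)):  # sorted for a stable order
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.strip():
                lines.append(line.strip())
    return lines


def split_dev_holdout(lines, holdout_fraction=0.2, seed=0):
    """Shuffle a copy of the lines and cut it into (dev, holdout)."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]
```

fixing the seed keeps the dev / holdout boundary stable between runs, so plays in the holdout set never leak into the data used for labeling rules.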

3. Build Dataset / Model

from the project root

cd workspace

1. Split Data - small percentage at random

extr-ds --split

output files found in "workspace/2/"
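
as a rough illustration of what a small random split does (this is not the `extr-ds` implementation; `sample_subset` and its parameters are hypothetical names), drawing about 1% of the rows could look like:

```python
import random


def sample_subset(rows, fraction=0.01, seed=None):
    """Return a random subset holding roughly `fraction` of the rows."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Always keep at least one row so a tiny input still yields something.
    k = max(1, round(len(rows) * fraction)) if rows else 0
    return random.Random(seed).sample(rows, k)
```

passing a seed makes the subset reproducible, which helps when the same sample has to be inspected again after a labeling rule changes.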

2. Programmatically Label Named Entities (Iterate)

extr-ds --annotate

output files found in "workspace/3/"
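
to make the idea of programmatic labeling and the redacted inspection file concrete, here is a small sketch using only the standard `re` module. it does not reproduce what `extr-ds --annotate` does internally; the `PLAYER` and `QUANTITY` labels, both patterns, and the sample play text are made up for illustration.

```python
import re

# Hypothetical label -> pattern table; the project's real rules may differ.
PATTERNS = {
    "PLAYER": re.compile(r"\b[A-Z]\.[A-Z][a-zA-Z'-]+"),
    "QUANTITY": re.compile(r"\b\d+(?= yards?\b)"),
}


def annotate(text):
    """Return (label, start, end, value) tuples sorted by start offset."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((label, match.start(), match.end(), match.group()))
    return sorted(found, key=lambda item: item[1])


def redact(text, entities):
    """Replace each entity span with its label (spans must not overlap)."""
    pieces, cursor = [], 0
    for label, start, end, _ in entities:
        pieces.append(text[cursor:start])
        pieces.append(f"<{label}>")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


# Made-up play text, only to show the shape of the output.
play = "P.Mahomes pass short right to T.Kelce for 12 yards"
print(redact(play, annotate(play)))
# prints: <PLAYER> pass short right to <PLAYER> for <QUANTITY> yards
```

in the redacted form, anything the rules missed stays visible as raw text, which is what makes a quick visual scan useful.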

3. Programmatically Label Relations (Iterate)

extr-ds --relate

output files found in "workspace/3/"
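
one simple way to label relations programmatically is to look at the text between two neighbouring entities. the sketch below assumes entities arrive as (label, start, end, value) tuples; the `gains` relation name and the "for" rule are hypothetical and are not taken from `extr-ds --relate`.

```python
import re

# Hypothetical rule: a PLAYER and a QUANTITY joined only by the word "for"
# are related as ("gains", player, quantity).
GAP_RULES = {
    ("PLAYER", "QUANTITY"): ("gains", re.compile(r"^\s+for\s+$")),
}


def relate(text, entities):
    """Link each adjacent entity pair whose in-between text matches a rule."""
    relations = []
    ordered = sorted(entities, key=lambda item: item[1])
    for left, right in zip(ordered, ordered[1:]):
        rule = GAP_RULES.get((left[0], right[0]))
        if rule is None:
            continue
        name, gap_pattern = rule
        # left[2] is the end of the first span, right[1] the start of the next.
        if gap_pattern.match(text[left[2]:right[1]]):
            relations.append((name, left[3], right[3]))
    return relations
```

because the rule only fires on adjacent pairs, each new relation type can be added and inspected one at a time during the iterate loop.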

4. Save data for model building

extr-ds --save

output files found in "tasks/data/4/"

5. Build CRF Model

make crf
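
for readers new to CRF tagging, the sketch below shows the kind of input a linear-chain CRF toolkit typically expects: one feature dict per token and one BIO tag per token. it is an assumption about the general technique, not a description of what `make crf` runs; the feature names and helper functions are hypothetical.

```python
def token_features(tokens, i):
    """Describe token i with simple surface features plus its neighbours."""
    word = tokens[i]
    features = {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
        "suffix3": word[-3:],
    }
    if i > 0:
        features["prev_lower"] = tokens[i - 1].lower()
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        features["next_lower"] = tokens[i + 1].lower()
    else:
        features["EOS"] = True  # end of sentence
    return features


def bio_tags(tokens, spans):
    """Turn (label, first_token, last_token_exclusive) spans into BIO tags."""
    tags = ["O"] * len(tokens)
    for label, start, end in spans:
        tags[start] = f"B-{label}"
        for j in range(start + 1, end):
            tags[j] = f"I-{label}"
    return tags
```

the "gold standard" data saved in the previous step would supply the spans; the feature dicts and tags are then paired sentence by sentence for training.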

About

License: MIT License

Languages: Jupyter Notebook 83.4%, Python 16.4%, Makefile 0.1%, CSS 0.1%