jiajie-chen / squad-match-lstm

Undergrad project to reimplement "Machine Comprehension Using Match-LSTM and Answer Pointer" (https://doi.org/10.48550/arXiv.1608.07905)

Setup

virtualenv venv
source venv/bin/activate
pip install --upgrade pip
pip install https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-0.11.0rc2-py2-none-any.whl

The wheel above is the macOS, CPU-only build for Python 2. Look here for the TensorFlow URL that matches your platform.

  • unzip the GloVe (word embeddings) file in data/ (it's gitignored because it's big)

  • generate the .npy file that contains the embedding matrix (loading this is faster than parsing the GloVe file every time, but the file is too big for GitHub)

mkdir cache
python embedding.py
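
For readers unfamiliar with the caching step, here is a minimal sketch of the idea: parse the GloVe text file once, save the matrix as .npy, and load the .npy on later runs. This is not the actual contents of embedding.py; the function names and the file paths used in the example are assumptions made for illustration.

```python
import io
import os

import numpy as np


def load_glove_text(path):
    """Parse a GloVe-style text file: each line is a token followed by floats."""
    words, vectors = [], []
    with io.open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            words.append(parts[0])
            vectors.append(np.asarray(parts[1:], dtype=np.float32))
    return words, np.stack(vectors)


def load_embedding_matrix(glove_path, cache_path):
    """Return the embedding matrix, reading the .npy cache when it exists.

    cache_path should end in ".npy"; otherwise np.save appends the suffix
    and the existence check below would never find the file.
    """
    if os.path.exists(cache_path):
        return np.load(cache_path)
    _, matrix = load_glove_text(glove_path)
    np.save(cache_path, matrix)
    return matrix
```

A call such as load_embedding_matrix("data/glove.txt", "cache/embedding.npy") (both paths hypothetical) would parse the text file on the first run and read the cached array on every later run.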

Every time you develop

source venv/bin/activate

Open questions

How do we deal with the fact that many of the phrases/entities in the documents (e.g. "Denver Broncos") might not have meaningful embeddings?

  • idea: in addition to its word embedding, augment the vector representation of each passage word with a sparse binary vector of length N (where N is the length of the question), with a 1 at every index where that word appears in the question
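
The idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: tokens are matched by case-insensitive exact comparison, and the function names (question_position_features, augment_passage) are hypothetical rather than part of this repo.

```python
import numpy as np


def question_position_features(passage_tokens, question_tokens):
    """Return a (len(passage), len(question)) matrix of 0s and 1s.

    Entry [i, j] is 1 when passage token i equals question token j.
    Matching is case-insensitive exact match (an assumption of this sketch).
    """
    passage = [t.lower() for t in passage_tokens]
    question = [t.lower() for t in question_tokens]
    features = np.zeros((len(passage), len(question)), dtype=np.float32)
    for i, p_tok in enumerate(passage):
        for j, q_tok in enumerate(question):
            if p_tok == q_tok:
                features[i, j] = 1.0
    return features


def augment_passage(passage_embeddings, passage_tokens, question_tokens):
    """Concatenate each passage word's embedding with its N-long match vector."""
    feats = question_position_features(passage_tokens, question_tokens)
    return np.concatenate([passage_embeddings, feats], axis=1)
```

With this scheme, "Broncos" in the passage gets a 1 at the position where "Broncos" occurs in the question even if its embedding carries little meaning. Because N changes from question to question, a real model would probably need to pad or truncate these vectors to a fixed length; that choice is left open here.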


