Paraphrase detection
The goal of this project is to detect whether two sentences are paraphrases of each other.
Done for CSC 485: Natural Language Processing at the University of Victoria, taught by Dr. Alona Fyshe.
Usage
- Install conda (I use miniconda on Linux)
- Create the python environment
$ conda env create -f env.yml
- Activate the environment
$ source activate ml
# On Windows: activate ml
- Change variables in lstm/train.py
You will probably want to change the path variables, such as the train/test data paths and the checkpoint and logging directories.
- Run
$ jupyter notebook
to check out the notebooks, or
$ cd lstm && python train.py
to train the RNN. You may want to change the variables at the top of train.py first.
- View logs on tensorboard
$ tensorboard --logdir <path to tensorboard logs>
Methodology
The basic approach:
- Create vectors from sentences
- Calculate distance
- Interpret the boundary (decide which distances count as a paraphrase)
The reasoning is that sentences with similar meanings should have similar vector representations, so the distance between two paraphrases should be small.
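The three steps above can be sketched in a few lines of Python. This is only an illustration, not the code in this repo: it swaps the trained RNN for a plain bag-of-words count vector, uses cosine distance, and picks an arbitrary threshold of 0.5. The function names (sentence_vector, cosine_distance, is_paraphrase) are made up for this example.

```python
import math
from collections import Counter


def sentence_vector(sentence):
    """Step 1: sentence -> sparse word-count vector.

    Assumption: a bag-of-words stand-in for a learned sentence encoder.
    """
    return Counter(sentence.lower().split())


def cosine_distance(u, v):
    """Step 2: cosine distance (near 0 = same direction, 1 = no shared words)."""
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an empty sentence as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def is_paraphrase(a, b, threshold=0.5):
    """Step 3: interpret the boundary.

    Assumption: the 0.5 threshold is a placeholder, not a tuned value.
    """
    return cosine_distance(sentence_vector(a), sentence_vector(b)) < threshold


if __name__ == "__main__":
    print(is_paraphrase("the cat sat on the mat", "on the mat the cat sat"))
    print(is_paraphrase("the cat sat", "dogs bark loudly"))
```

A word-count vector ignores word order and meaning, which is the kind of gap a learned encoder is meant to close. In practice the threshold would be chosen on held-out labelled pairs rather than fixed by hand.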