erikreppel / paraphrase-detection

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Paraphrase detection

The goal of this project is to detect if two sentences are paraphrases.

Done for CSC 485: Natural Language processing at the University of Victoria, taught by Dr. Alona Fyshe.

Usage

  1. Install conda (I use miniconda on Linux)

  2. Create the python environment

$ conda env create -f env.yml
  1. Activate the environment
$ source activate ml
# On Windows: activate ml
  1. Change variables in lstm/train.py

You will probably want to change the path variables such as train/test data, and the checkpoint and logging directories.

  1. Run jupyter notebook to checkout the notebooks, cd lstm && python train.py to train the RNN. Might wanna change the variables at the top of train.py

  2. View logs on tensorboard

$ tensorboard --logdir <path to tensorboard logs>

Methodology

The basic approach:

  1. Create vectors from sentences
  2. Calculate distance
  3. Interperate boundary

The logic behind this is that sentences that are similar are going to have vector representations that are similar.

count vectorization

About


Languages

Language:Jupyter Notebook 99.7%Language:Python 0.3%