JIMMY-KSU / hoDMD-experiments

EigenSent: Spectral sentence embeddings using higher-order Dynamic Mode Decomposition

Information about this repo

This repo contains the code and data needed to reproduce the results reported in the paper "EigenSent: Spectral sentence embeddings using higher-order Dynamic Mode Decomposition", accepted at ACL 2019 as a long paper: https://www.aclweb.org/anthology/P19-1445. If you use this code in your work or experiments, please cite the paper as:

author = {Kayal, Subhradeep and Tsatsaronis, George},
title = {EigenSent: Spectral sentence embeddings using higher-order Dynamic Mode Decomposition},
booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL)},
year = {2019}

Guide to run code


Get resources

  • In the main folder, run: mkdir resources
  • Download the pretrained word embeddings from Google (GoogleNews-vectors-negative300.bin.gz) and put the file in the resources folder you have just created
  • Download the pretrained BERT model from https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-24_H-1024_A-16.zip, copy it to resources and unzip it
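
Before starting the BERT server it can help to confirm that both downloads are where the later steps expect them. The following is a minimal sketch, not part of the repo: the function name missing_resources is hypothetical, and it assumes the archive unzips into a folder called uncased_L-24_H-1024_A-16, which is the path used when starting the server.

```python
from pathlib import Path

# File and folder names taken from the download steps above.
WORD2VEC_FILE = "GoogleNews-vectors-negative300.bin.gz"
BERT_DIR = "uncased_L-24_H-1024_A-16"


def missing_resources(resources_dir="resources"):
    """Return the names of expected resources that are not present.

    Hypothetical helper for illustration; it is not part of this repo.
    """
    root = Path(resources_dir)
    missing = []
    # The word2vec embeddings are expected as a single file.
    if not (root / WORD2VEC_FILE).is_file():
        missing.append(WORD2VEC_FILE)
    # The BERT model is expected as an unzipped directory.
    if not (root / BERT_DIR).is_dir():
        missing.append(BERT_DIR)
    return missing


if __name__ == "__main__":
    absent = missing_resources()
    if absent:
        print("Missing from resources/:", ", ".join(absent))
    else:
        print("All resources found.")
```

Run it from the main folder. An empty result means both resources are in place.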

Run code

  • Open a CLI and run bert-serving-start -model_dir <path-to>resources/uncased_L-24_H-1024_A-16/ -num_worker=<as many as possible on your system>. Keep this running.
  • Open another CLI, go to sh_scripts and run: sh produce_pickled_resources.sh
  • This reads the datasets and produces lists of word embeddings, one list per sentence in the original dataset, and writes them to data/pickled_data/<dataset_name>
  • For more information about the pickle files, read the code in py_files/sent2wv.py
  • Next, run sh runall.sh to produce all the results, which are written to text files in data/results/
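
The pickled data described above is a collection of sentences, each stored as a list of word embeddings. The sketch below illustrates that shape with toy data. It is an assumption about the layout, not the repo's actual format: the helper names are hypothetical, plain Python lists stand in for the real vectors, and py_files/sent2wv.py remains the authoritative reference.

```python
import pickle
import tempfile
from pathlib import Path


def save_sentences(path, sentences):
    """Pickle a list of sentences, each a list of word vectors."""
    with open(path, "wb") as fh:
        pickle.dump(sentences, fh)


def load_sentences(path):
    """Load the pickled list of sentences back into memory."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def describe(sentences):
    """Return (number of words, vector dimension) for each sentence."""
    return [(len(s), len(s[0]) if s else 0) for s in sentences]


if __name__ == "__main__":
    # Toy 3-dimensional vectors; the real embeddings are much larger.
    toy = [
        [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],  # sentence with two words
        [[0.7, 0.8, 0.9]],                   # sentence with one word
    ]
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "toy_dataset.pkl"  # hypothetical file name
        save_sentences(out, toy)
        print(describe(load_sentences(out)))  # [(2, 3), (1, 3)]
```

Keeping one list per sentence preserves word order, which matters because the method treats each sentence as a sequence of embeddings.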


License: BSD 2-Clause "Simplified" License

Languages: Python 52.0%, Shell 48.0%