Code and data for "Textual Similarity on MD&A disclosure"
This code is written in Python. To use it, you will need:
- Python 3.6
- A recent version of scikit-learn
- A recent version of Numpy
- A recent version of NLTK
- A recent version of gensim
We provide the similarity scores for each method described in the paper, along with the data statistics.
To create the doc2vec model and then use it to find similar document vectors, run:

    python3 compute_doc2vec_sim.py
The data file can be found here: train_docs_sec7.txt
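
The "find the similar document vectors" step can be illustrated with a minimal sketch. It is not the code in compute_doc2vec_sim.py. It assumes the document vectors have already been produced by a trained model, and it assumes that similarity is measured with cosine similarity, which this README does not state. The names cosine_similarity, most_similar, and doc_vectors, and the toy vectors, are hypothetical and used only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query_id, doc_vectors, top_n=3):
    """Rank all other documents by cosine similarity to the query document."""
    query = doc_vectors[query_id]
    scores = [
        (doc_id, cosine_similarity(query, vec))
        for doc_id, vec in doc_vectors.items()
        if doc_id != query_id
    ]
    # Highest similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Toy vectors standing in for learned document embeddings (assumed values).
doc_vectors = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}

ranked = most_similar("doc_a", doc_vectors, top_n=2)
```

Here ranked holds (document id, score) pairs, with the closest document first. With real embeddings, the same ranking would typically be done with vectorized NumPy operations instead of Python loops.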