joaquimgomez / BachelorsThesis-TextSimilarityMeasures

Code and models used in my Bachelor's degree thesis on similarity measures for large texts. The similarity measures are combined with machine-learning-based embeddings. This repository also contains the raw results obtained from the tasks and experiments.

Analysis and Comparison of Text Similarity Measures

This is the repository for the Bachelor's Degree Thesis/Project carried out during the 2020/21 Winter Semester by Joaquim Gómez Sanchez.

The thesis/project is available at UPC's Repository.

Code and Models

This repository contains all the code developed for the thesis, as well as the raw results obtained and the models trained and used. The code not implemented by the author and the pretrained models used are referenced below.

Code:

  • Code for training GloVe. Obtained from the official repository, maintained by the model's authors.
  • Code for computing the Normalized Relative Compression distance. Provided by the thesis director, who obtained it from Armando J. Pinho.
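
For readers unfamiliar with compression-based distances, the general idea can be sketched in a few lines. The sketch below is not the Normalized Relative Compression code referenced above and is not taken from this repository. As an assumption made purely for illustration, it uses the related Normalized Compression Distance (NCD) with Python's standard `zlib` compressor. The function names `compressed_size` and `ncd` are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Return the length in bytes of the zlib-compressed input."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings.

    Illustrative stand-in only: this is NOT the NRC code used in the thesis.
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size in bytes.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Toy inputs, chosen only to show the contrast between similar
    # and dissimilar content.
    doc_a = b"the quick brown fox jumps over the lazy dog " * 20
    doc_b = bytes(range(256)) * 4
    print("ncd(a, a) =", ncd(doc_a, doc_a))
    print("ncd(a, b) =", ncd(doc_a, doc_b))
```

Values near 0 suggest that the two inputs share most of their structure, and values near 1 suggest that they share little. The exact numbers depend on the compressor, so results from this sketch are not comparable with the thesis results.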

Models:

Data

No data is published in this repository, in order to avoid legal problems. The data used for the experiments is listed in the thesis and can be obtained from the repository of the UPC (Universitat Politècnica de Catalunya) or from the repositories of other papers. The data used for training the models was collected entirely from UPC's repository.

If you are interested in the training data, the preprocessed files, or the experiment files, send me an e-mail.

About

License: MIT License


Languages

Python 59.2%, Jupyter Notebook 23.8%, Shell 17.1%