wilsonjr / UnsupervisedSentenceEmbeddings

Repository for experiments using Unsupervised Sentence Embeddings

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Unsupervised training method for Sentence Embeddings

Requirements

Instalation

python setup.py bdist_wheel
pip install dist/unsupervised_embeddings-0.0.1-py3-none-any.whl

Usage Examples

Unsupervised training with MLM and SimCSE

from unsupervised_embeddings import utils

# these will generate series of models combinations
# for max. of 5 epochs:
# 0 mlm + 5 simcse
# 1 mlm + 4 simcse
# ...
# see v0.0.1 release for these datasets

utils.perform_experiment(
    train_path='datasets/train_nli.csv',
    mlm_dev_path='datasets/dev_nli.csv',
    sim_cse_dev_path='datasets/stsbenchmark.tsv',
    test_path='datasets/stsbenchmark.tsv',
    model_name='distilbert-base-uncased',
    max_epochs=5,
    batch_size=124
)

Reproduce generated experiments

# Pearson correlation
utils.show_test_eval('./personal/UnsupervisedSentenceEmbeddings/experiment_output/experiment_5_statistics.csv')

# Projections of test set with training models
utils.show_projections('./personal/UnsupervisedSentenceEmbeddings/experiment_output/experiment_epochs_5_projections.csv')

Support

Please, if you have any questions feel free to contact me at wilson_jr@outlook.com

About

Repository for experiments using Unsupervised Sentence Embeddings

License:MIT License


Languages

Language:Jupyter Notebook 94.9%Language:Python 5.1%