tollefj / augmented-pair-encoder

A lightweight sentence-pair encoder for similarity tasks. Includes a data augmentation pipeline from sentence-transformers.

Augmented Pair Encoder

A pipeline for training a binary encoder on similarity-based data, with optional weak labeling.

Example

  • With weak labeling:
    • python pipeline.py bert-base-uncased --similarity_model="intfloat/multilingual-e5-base" --learning_rate=2e-5 --epochs=3 --k=2 --verbose
  • Without:
    • python pipeline.py bert-base-uncased --learning_rate=2e-5 --epochs=3 --verbose
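The README does not spell out how the weak labels are produced, so the following is only a minimal sketch of one plausible reading: a similarity model embeds each sentence, and each sentence is paired with its k nearest neighbours, with the cosine score kept as a weak ("silver") label. The function names `cosine` and `weak_label_pairs`, the interpretation of `--k` as a neighbour count, and the toy two-dimensional embeddings are all assumptions made for illustration; they are not taken from `pipeline.py`.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def weak_label_pairs(
    sentences: Sequence[str],
    embeddings: Sequence[Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, str, float]]:
    """Hypothetical helper: pair each sentence with its k most similar
    neighbours and keep the cosine score as a weak label."""
    if len(sentences) != len(embeddings):
        raise ValueError("sentences and embeddings must have the same length")

    pairs: List[Tuple[str, str, float]] = []
    seen = set()  # unordered index pairs already emitted
    for i, emb in enumerate(embeddings):
        # Score every other sentence against sentence i, best first.
        scored = sorted(
            ((cosine(emb, other), j) for j, other in enumerate(embeddings) if j != i),
            key=lambda item: item[0],
            reverse=True,
        )
        for score, j in scored[:k]:
            key = (min(i, j), max(i, j))
            if key in seen:
                continue  # skip the mirrored duplicate of an existing pair
            seen.add(key)
            pairs.append((sentences[key[0]], sentences[key[1]], score))
    return pairs


# Toy embeddings stand in for the output of a real similarity model (assumption).
sentences = ["a cat sits", "a kitten sits", "stocks fell"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
pairs = weak_label_pairs(sentences, embeddings, k=1)
for a, b, score in pairs:
    print(f"{score:.3f}\t{a}\t{b}")
```

In a real run the embeddings would presumably come from the model passed via `--similarity_model`, and the scored pairs would be added to the labeled training data before the encoder is fine-tuned.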

About

License: MIT License


Languages

Language: Python 100.0%