tollefj / augmented-pair-encoder

A lightweight sentence-pair encoder for similarity tasks. Includes a data augmentation pipeline from sentence-transformers.

Augmented Pair Encoder

A pipeline for training a binary encoder on similarity-based data with weak labeling.

With weak labeling:
- python pipeline.py bert-base-uncased --similarity_model="intfloat/multilingual-e5-base" --learning_rate=2e-5 --epochs=3 --k=2 --verbose
Without weak labeling:
- python pipeline.py bert-base-uncased --learning_rate=2e-5 --epochs=3 --verbose
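The README does not spell out how the weak-labeling step works, so the following is only a minimal sketch of one common approach, not the repository's actual code. It assumes that a similarity model has already produced one embedding per sentence, and that `k` is the number of nearest neighbours paired with each sentence; both are assumptions about what `--similarity_model` and `--k` might do. The function name `weak_label_pairs` and the toy embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def weak_label_pairs(sentences, embeddings, k=2):
    """Pair each sentence with its k most similar other sentences.

    Returns (sentence_a, sentence_b, score) tuples, where the cosine score
    serves as the weak (automatically derived) similarity label.
    Hypothetical helper: assumes embeddings come from a similarity model.
    """
    pairs = {}
    for i, emb_i in enumerate(embeddings):
        scored = [
            (cosine(emb_i, emb_j), j)
            for j, emb_j in enumerate(embeddings)
            if j != i
        ]
        # Highest similarity first; ties broken by index for determinism.
        scored.sort(key=lambda t: (-t[0], t[1]))
        for score, j in scored[:k]:
            # Store each unordered pair once, regardless of direction.
            pairs[(min(i, j), max(i, j))] = score
    return [(sentences[a], sentences[b], s) for (a, b), s in sorted(pairs.items())]


# Toy data: in practice the embeddings would come from the similarity model.
sentences = ["a cat sits", "a kitten sits", "stocks fell"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
demo_pairs = weak_label_pairs(sentences, embeddings, k=1)
```

The resulting weakly labeled pairs could then be added to the gold training pairs before fine-tuning the pair encoder. Using plain Python lists keeps the sketch dependency-free; a real pipeline would more likely compute embeddings and similarities in batches with a tensor library.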


MIT License

Language: Python 100.0%