anvie / indobert-embedding

Text embedding encoder using IndoBERT

A text embedding encoder that uses IndoBERT as its underlying model.

Installation

pip install indobert-embedding

Usage

from indo_bert_embedding import get_embedding

embedding = get_embedding("Saya belajar NLP di Neuversity.")

To get the similarity distance between two texts:

from indo_bert_embedding import text_similarity

distance = text_similarity("Saya belajar NLP di Neuversity.", "Aku belajar NLP di Universitas Indonesia.")

text_similarity uses cosine similarity to calculate the distance between the two texts.
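
As a rough illustration of what cosine similarity computes, here is a minimal sketch in plain Python. The function name cosine_similarity and the toy vectors are hypothetical and not part of indobert-embedding; the package's internal implementation may differ.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings (illustrative only).
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]    # points in the same direction as u
w = [-3.0, 0.0, 1.0]   # orthogonal to u (their dot product is 0)

sim_parallel = cosine_similarity(u, v)    # close to 1.0
sim_orthogonal = cosine_similarity(u, w)  # close to 0.0
```

Cosine similarity ranges from -1 to 1, and a cosine distance is commonly defined as 1 minus the similarity. Which of the two text_similarity returns is not stated here, so it is worth checking the value for two identical sentences before relying on it.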

Citation

@inproceedings{koto2020indolem,
  title={IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP},
  author={Fajri Koto and Afshin Rahimi and Jey Han Lau and Timothy Baldwin},
  booktitle={Proceedings of the 28th COLING},
  year={2020}
}

License: MIT License

