kernelmachine / sister

SImple SenTence EmbeddeR

Repository from Github https://github.com/kernelmachine/sister

sister

SISTER (SImple SenTence EmbeddeR)

Installation

pip install sister

Basic Usage

import sister
sentence_embedding = sister.MeanEmbedding(lang="en")

sentence = "I am a dog."
vector = sentence_embedding(sentence)
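The class name MeanEmbedding suggests that the sentence vector is the average of the sentence's word vectors. Here is a minimal pure-Python sketch of that idea, not the library's actual code. The toy vector table `TOY_VECTORS`, the helper `mean_embedding`, and the naive whitespace tokenization are all hypothetical and stand in for the pre-trained fastText vectors and the language-specific tokenizer.

```python
from typing import Dict, List

# Hypothetical 3-dimensional word vectors, for illustration only.
# Real pre-trained fastText vectors have far more dimensions.
TOY_VECTORS: Dict[str, List[float]] = {
    "i": [1.0, 0.0, 0.0],
    "am": [0.0, 1.0, 0.0],
    "a": [0.0, 0.0, 1.0],
    "dog": [1.0, 1.0, 1.0],
}


def mean_embedding(sentence: str,
                   vectors: Dict[str, List[float]],
                   dim: int = 3) -> List[float]:
    """Average the vectors of all known words in the sentence."""
    # Naive tokenization: split on whitespace, strip punctuation, lowercase.
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        # No known word: fall back to a zero vector.
        return [0.0] * dim
    # zip(*known) iterates over the vectors dimension by dimension.
    return [sum(column) / len(known) for column in zip(*known)]


vector = mean_embedding("I am a dog.", TOY_VECTORS)
print(vector)  # [0.5, 0.5, 0.5]
```

Averaging makes the result independent of sentence length, so sentences of different lengths map to vectors of the same size.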

Supported languages.

  • English
  • Japanese
  • French

To support a new language, implement a Tokenizer that inherits from sister.tokenizers.Tokenizer, and add the URL of the language's pre-trained fastText model to word_embedders.get_fasttext() (see the list of model URLs).
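A rough sketch of what a custom tokenizer might look like. To stay runnable without the package installed, it defines a stand-in base class instead of importing sister.tokenizers.Tokenizer. The `tokenize` method name and signature are an assumption about the base class interface, and `WhitespaceTokenizer` is a hypothetical name. Check the library source before relying on either.

```python
from typing import List


class Tokenizer:
    """Stand-in for sister.tokenizers.Tokenizer (interface assumed)."""

    def tokenize(self, sentence: str) -> List[str]:
        raise NotImplementedError


class WhitespaceTokenizer(Tokenizer):
    """Hypothetical tokenizer for a language written with spaces."""

    def tokenize(self, sentence: str) -> List[str]:
        # str.split() with no arguments splits on any run of whitespace.
        return sentence.split()


tokens = WhitespaceTokenizer().tokenize("Ich bin ein Hund.")
print(tokens)  # ['Ich', 'bin', 'ein', 'Hund.']
```

In real use the subclass would inherit from sister.tokenizers.Tokenizer directly. A language without whitespace word boundaries would need a proper morphological analyzer instead of `split()`.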

BERT models are supported for en, fr, and ja (as of 2020-06-29).

Specifically, ALBERT is used for English, CamemBERT for French, and BERT for Japanese.
To use these models, install sister with pip install 'sister[bert]'.

import sister
bert_embedding = sister.BertEmbedding(lang="en")

sentence = "I am a dog."
vector = bert_embedding(sentence)

You can also pass multiple sentences at once, which is more efficient.

import sister
bert_embedding = sister.BertEmbedding(lang="en")

sentences = ["I am a dog.", "I want to be a cat."]
vectors = bert_embedding(sentences)
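A common next step with sentence vectors is to compare them by cosine similarity. The sketch below assumes each embedding can be treated as a flat sequence of floats. The helper `cosine_similarity` is hypothetical and not part of sister, and the two short vectors are made-up stand-ins for real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v, or 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up vectors standing in for the embeddings of two sentences.
vectors = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
similarity = cosine_similarity(vectors[0], vectors[1])
print(round(similarity, 6))  # 0.5
```

Values close to 1 indicate vectors pointing in nearly the same direction, and values near 0 indicate unrelated directions.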
