sacovo / document-embedding

Document Embeddings

Based on the results of: https://arxiv.org/abs/2304.14796

Implemented Methods

  • Average pooling over sentence embeddings, with an adjustable range of sentences to include.
  • PERT-weighted average pooling of sentence embeddings.
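The two pooling strategies listed above can be illustrated roughly as follows. This is a minimal NumPy sketch, not the repository's code: the function names (`average_pooling`, `pert_weights`, `pert_weighted_pooling`) and the parameters `start`, `end`, `mode` and `lam` are assumptions made for illustration. The PERT weights here are taken from the standard PERT (rescaled beta) density evaluated at normalized sentence positions, which may differ in detail from what the repository or the paper does.

```python
import numpy as np


def average_pooling(sentence_embeddings, start=0, end=None):
    """Mean of the sentence embeddings in the slice [start:end]."""
    emb = np.asarray(sentence_embeddings, dtype=float)
    selected = emb[start:end]
    if selected.shape[0] == 0:
        raise ValueError("sentence range selects no sentences")
    return selected.mean(axis=0)


def pert_weights(n_sentences, mode=0.25, lam=4.0):
    """Normalized PERT-shaped weights over sentence positions.

    Assumption: positions are mapped to the midpoints of n equal bins in
    (0, 1), and `mode` is the relative position that gets the most weight.
    """
    x = (np.arange(n_sentences) + 0.5) / n_sentences
    # PERT shape parameters on the unit interval (min=0, max=1).
    alpha = 1.0 + lam * mode
    beta = 1.0 + lam * (1.0 - mode)
    # Unnormalized beta density; the constant cancels when normalizing.
    density = x ** (alpha - 1.0) * (1.0 - x) ** (beta - 1.0)
    return density / density.sum()


def pert_weighted_pooling(sentence_embeddings, mode=0.25, lam=4.0):
    """Weighted average of sentence embeddings using PERT weights."""
    emb = np.asarray(sentence_embeddings, dtype=float)
    weights = pert_weights(emb.shape[0], mode=mode, lam=lam)
    return weights @ emb
```

With `mode` below 0.5 the early sentences of a document contribute more than the late ones, while `mode=0.5` weights the middle of the document most and is symmetric around it.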

Usage

AverageDocumentEmbedding wraps a sentence-embedding model, which provides the underlying embedding functionality:

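from sentence_transformers import SentenceTransformer
# AverageDocumentEmbedding is defined in this repository and must be imported from it as well.

# Any sentence-embedding model can be used; this one is multilingual.
sentence_model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

document_model = AverageDocumentEmbedding(sentence_model, language='german')


doc1 = "Arbitrary text"
doc2 = "..."

# Encode a list of documents (add further documents to the list as needed).
embeddings = document_model.encode([doc1, doc2])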
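To make the idea of a wrapper concrete, the sketch below shows one plausible shape for such a class: split each document into sentences, embed the sentences with the wrapped model, and average the result. It is a hypothetical stand-in, not the repository's implementation. The class name `SketchAverageDocumentEmbedding`, the `start` and `end` arguments, and the naive regex sentence splitter are all assumptions; the real class takes a `language` argument, which suggests a language-aware sentence splitter instead.

```python
import re

import numpy as np


class SketchAverageDocumentEmbedding:
    """Hypothetical illustration of a document-embedding wrapper."""

    def __init__(self, sentence_model, start=0, end=None):
        # sentence_model: any object with encode(list_of_str) -> 2D array,
        # such as a SentenceTransformer instance.
        self.sentence_model = sentence_model
        self.start = start
        self.end = end

    @staticmethod
    def split_sentences(text):
        # Naive splitter (assumption): break after '.', '!' or '?'
        # when followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", text.strip())
        return [part for part in parts if part]

    def encode(self, documents):
        vectors = []
        for doc in documents:
            sentences = self.split_sentences(doc)[self.start:self.end]
            if not sentences:
                raise ValueError("document has no sentences in the selected range")
            emb = np.asarray(self.sentence_model.encode(sentences), dtype=float)
            # Average pooling over the selected sentence embeddings.
            vectors.append(emb.mean(axis=0))
        return np.stack(vectors)
```

Passing a `SentenceTransformer` instance as `sentence_model` should work here, since its `encode` method accepts a list of strings and returns a NumPy array by default.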
