MIND-Lab / OCTIS

OCTIS: Comparing Topic Models is Simple! A Python package to optimize and evaluate topic models (accepted at the EACL 2021 demo track)


BERT

nassera2014 opened this issue

Thank you for your great work. Can I use these models with BERT as a word embedding model?

Thanks :)
The model that supports BERT embeddings is CTM (Contextualized Topic Models). This is a snippet to run it in OCTIS:

from octis.models.CTM import CTM
from octis.dataset.dataset import Dataset

model = CTM(
    num_topics=10, num_epochs=30, inference_type='combined',
    bert_model="bert-base-nli-mean-tokens",
    bert_path="path/to/store/the/embeddings/")

The bert_model parameter is the name of the contextualized (sentence-embedding) model; the other supported models are listed here: https://www.sbert.net/docs/pretrained_models.html
The remaining parameters are documented here: https://github.com/MIND-Lab/OCTIS/blob/master/octis/models/CTM.py

Just a note: this integrated model uses the pre-processed text to generate the document embeddings. If you want to use the unpreprocessed documents, as the original model does, you should refer to the original implementation: https://github.com/MilaNLProc/contextualized-topic-models

ETM also uses embeddings, but static word embeddings, so it cannot easily be adapted to BERT's contextual embeddings. See the implementation here: https://github.com/MIND-Lab/OCTIS/blob/master/octis/models/etm.py