BERT
nassera2014 opened this issue · comments
Thank you for your great work! Can I use these models with BERT as the word embedding model?
Thanks :)
The model that supports BERT embeddings is CTM (Contextualized Topic Models). This is a snippet to run it in OCTIS:
```python
from octis.models.CTM import CTM
from octis.dataset.dataset import Dataset

model = CTM(
    num_topics=10, num_epochs=30, inference_type='combined',
    bert_model="bert-base-nli-mean-tokens", bert_path="path/to/store/the/embeddings/")
```
where the parameter `bert_model` is the name of the contextualized model. You can find the other supported pre-trained models here: https://www.sbert.net/docs/pretrained_models.html
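As the name suggests, the "mean-tokens" models build one document vector by averaging the token embeddings produced by BERT. A minimal sketch of just that pooling step, using made-up Python lists in place of real BERT token vectors (the function name and numbers are illustrative, not part of OCTIS):

```python
def mean_pool(token_vectors):
    """Average a list of token embedding vectors into one document vector."""
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]

# Three fake 4-dimensional "token embeddings" for one document.
tokens = [
    [1.0, 0.0, 2.0, 0.0],
    [3.0, 0.0, 0.0, 2.0],
    [2.0, 3.0, 1.0, 1.0],
]
document_vector = mean_pool(tokens)
print(document_vector)  # [2.0, 1.0, 1.0, 1.0]
```

In the real pipeline this pooling happens inside sentence-transformers; CTM then consumes the resulting document vector alongside the bag-of-words input.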
You can find the other parameters here: https://github.com/MIND-Lab/OCTIS/blob/master/octis/models/CTM.py
Just a note: this integrated model uses the pre-processed text to generate the document embeddings. If you want to use the unpreprocessed documents, as in the original model, you should refer to the original implementation: https://github.com/MilaNLProc/contextualized-topic-models
ETM also uses embeddings, but static word embeddings, so it cannot easily be adapted to BERT embeddings. See the implementation here: https://github.com/MIND-Lab/OCTIS/blob/master/octis/models/etm.py
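To see why the adaptation is not straightforward: a static embedding table maps each vocabulary word to a single fixed vector, so a word gets the same vector in every context, whereas BERT produces a different vector per occurrence. A toy illustration of the static case (the table and its vectors are invented for the example):

```python
# Toy static embedding table: one fixed vector per vocabulary word,
# in the style of the word2vec/GloVe embeddings ETM expects.
static_embeddings = {
    "bank": [0.2, 0.7],
    "river": [0.9, 0.1],
    "money": [0.1, 0.8],
}

def embed(word):
    """Look up a word's single, context-independent vector."""
    return static_embeddings[word]

# "bank" gets the identical vector in both contexts:
v_river_bank = embed("bank")  # as in "the river bank"
v_bank_loan = embed("bank")   # as in "the bank loan"
assert v_river_bank == v_bank_loan  # static embeddings cannot disambiguate
```

BERT, by contrast, would return two different vectors for those two occurrences, which is exactly what ETM's per-word embedding matrix has no slot for.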