MIND-Lab / OCTIS

OCTIS: Comparing Topic Models is Simple! A python package to optimize and evaluate topic models (accepted at EACL2021 demo track)

load_custom_dataset_from_folder

srashtchi opened this issue

Hi Silvia

I managed to get my code running fine, thanks for your response.

I have another question. I am trying to make the code smoother: right now, in order to create a dataset object, I have to save my variable to a .tsv file first and then use the load_custom_dataset_from_folder method to load the data from the .tsv into an empty dataset object. Without this object, the get_corpus() method obviously wouldn't do its magic. See the sample code below.

So basically the question is: is there a way to directly pass my variable to a dataset object without saving and loading?

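from pathlib import Path
from octis.dataset.dataset import Dataset

# df is an existing pandas DataFrame with a 'document' column
f = Path('/myFolderPath/corpus.tsv')
df.to_csv(f, sep="\t", index=False, header=False, columns=['document'])

# Load the saved .tsv into an empty Dataset object
dataset = Dataset()
dataset.load_custom_dataset_from_folder('/myFolderPath/')

texts = dataset.get_corpus()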

Originally posted by @srashtchi in #68 (comment)

Is there any chance you could respond to this question?

Hello, sorry for the late reply.
If you need the dataset only to compute coherence, you can define "texts" directly as a list of lists of strings, i.e.:

texts=[['a', 'b', 'c'], ['a', 'd', 'e'], ...]

This does not require saving and loading the dataset.
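
As a minimal sketch of building such a list without touching the disk: the snippet below assumes the documents are already available as plain strings (for example from df['document'].tolist()) and that whitespace splitting is an acceptable tokenization. The helper name build_texts is hypothetical and is not part of OCTIS.

```python
def build_texts(documents):
    """Turn raw document strings into a list of token lists.

    Assumption: whitespace tokenization is good enough; empty or
    whitespace-only documents are skipped.
    """
    return [doc.split() for doc in documents if doc and doc.strip()]


# Stand-in for the DataFrame column, e.g. df['document'].tolist()
documents = ["a b c", "a d e"]

texts = build_texts(documents)
print(texts)  # [['a', 'b', 'c'], ['a', 'd', 'e']]
```

If the corpus needs the same preprocessing that the saved .tsv would have received, that step would have to be applied to the strings before splitting.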
Let me know if this helped :)

Silvia

Thanks for the quick reply. I will try this.