kyr0 / lars_datasets

Lars's datasets

Lars Datasets

This repo contains datasets used for my research.

Topics Dataset 50

This dataset contains 50 synthetically generated topics and 200 synthetically generated sentences, where each sentence is related to one of the 50 topics.

The topics are found in topics_dataset_50/topics_english.csv. The sentences are found in topics_dataset_50/topic_sentences_<language>.csv.
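As a rough sketch of how these files might be loaded, the snippet below builds the two paths and reads each CSV into a list of dictionaries. It assumes the files are UTF-8 encoded and have a header row, since the column names are not documented here. The helper names (`topics_path`, `sentences_path`, `load_rows`) are hypothetical, and `"english"` as a value for `<language>` is an assumption based on the topics file name.

```python
import csv
from pathlib import Path

# Dataset folder, relative to the repository root.
DATASET_DIR = Path("topics_dataset_50")


def topics_path(root=DATASET_DIR):
    """Path to the topics file."""
    return Path(root) / "topics_english.csv"


def sentences_path(language, root=DATASET_DIR):
    """Path to the sentences file for one language."""
    return Path(root) / f"topic_sentences_{language}.csv"


def load_rows(path):
    """Read a CSV into a list of dicts keyed by its header row.

    Assumption: the first row is a header and the file is UTF-8.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    # "english" is an assumed language value; adjust to the files present.
    for path in (topics_path(), sentences_path("english")):
        if path.exists():
            rows = load_rows(path)
            print(path, len(rows), "rows; columns:", list(rows[0]) if rows else [])
        else:
            print(path, "not found")
```

Printing the column names first is a cheap way to check the actual schema before wiring the data into a topic modeling pipeline.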

The dataset covers varying difficulty levels and is used for evaluating topic modeling algorithms.
