cominsys / Sep_TD_Tel01

Topic Detection dataset on Persian Telegram Posts

Sep_TD_Tel01

Introduction

The Sep_TD_Tel01 dataset was compiled by ComInSyS. It was created because Persian is a low-resource language and the Telegram social network is highly popular in Iran. The data were collected through an official API published by Telegram. To respect privacy, only data from public channels and public groups were collected.

This dataset contains 10,209 records of messages sent to public channels and groups during the month between 1 January 2017 (12 Dey 1395) and 31 January 2017 (12 Bahman 1395). The dataset is divided into sixty 12-hour windows and includes two very hot topics: "The death of Ayatollah Hashemi Rafsanjani" and "Plasco building fire". To make it possible to review and evaluate performance, nine of these windows were selected as ground truth (GT) and labeled. These windows are [14, 15, 16, 17, 18, 37, 38, 39, 40].
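The 12-hour windowing described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the description does not say whether windows are numbered from 0 or 1, at what time the first window starts, or which time zone applies. The sketch assumes 1-based numbering starting at midnight on 1 January 2017, and the names `START`, `window_index`, and `is_ground_truth` are hypothetical helpers, not part of the dataset.

```python
from datetime import datetime, timedelta

# Assumptions (not stated in the dataset description): windows are numbered
# from 1, and the first window starts at 00:00 on 1 January 2017.
START = datetime(2017, 1, 1)
WINDOW = timedelta(hours=12)

# Labeled ground-truth windows, as listed in the dataset description.
GT_WINDOWS = {14, 15, 16, 17, 18, 37, 38, 39, 40}


def window_index(ts):
    """Return the assumed 1-based 12-hour window number for a timestamp."""
    offset = ts - START
    if offset < timedelta(0):
        raise ValueError("timestamp is before the start of the dataset")
    # Floor-dividing two timedeltas yields an int: the number of whole windows.
    return offset // WINDOW + 1


def is_ground_truth(ts):
    """Return True if the timestamp falls in one of the labeled GT windows."""
    return window_index(ts) in GT_WINDOWS


if __name__ == "__main__":
    sample = datetime(2017, 1, 8, 13, 0)
    print(window_index(sample), is_ground_truth(sample))
```

If the dataset's actual numbering starts from 0 or uses a different start time, only `START` and the `+ 1` offset would need to change.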

Cite:

@Misc{Sep-TD-Tel_Mendeley,
	title={Sep\_TD\_Tel01},
	author = {Mehrdad Ranjbar-Khadivi and Mohammad-Reza Feizi-Derakhshi and Aynaz Forouzandeh and Pejman Gholami and Ali-Reza Feizi-Derakhshi and Elnaz Zafarani-Moattar},
	year = {2022},
	doi = {10.17632/372rnwf9pc},
	url = {https://github.com/cominsys/Sep_TD_Tel01}
}
@article{Sep-TD-Tel,
	author = {Pejman Gholami-Dastgerdi and Mohammad-Reza Feizi-Derakhshi and Aynaz Forouzandeh},
	journal = {Concurrency and Computation: Practice and Experience},
	month = {10},
	number = {27},
	title = {Named entities detection by beam search algorithm},
	volume = {34},
	year = {2022},
	doi = {10.1002/cpe.7325},
	url = {http://dx.doi.org/10.1002/cpe.7325},
}

License: GNU General Public License v3.0