In this repository we provide the data set for the corpus on German topic classification and success (GTCS6k). Please see the corpus website for more information about the corpus.
We provide the corpus to the scientific community. However, for legal reasons, we are not allowed to share the entire data of the posts directly. In order to still publish the corpus while respecting the rights of third parties, we instead provide the annotations along with the IDs of the posts and a script that allows interested readers to retrieve the posts of the corpus on their own.
Requirement: You need a Facebook app that has successfully passed the review and holds the Public Page Content Access permission.
Necessary steps for the retrieval of the posts:
- Edit the file data/config.py and add your Facebook Access Token
- Run data/retreive_posts.py to retrieve the posts
- Open data/posts.json to inspect the posts
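For the inspection step, a minimal Python sketch along the following lines may help as a quick sanity check. It assumes that data/posts.json is valid UTF-8 JSON whose top level is either a list or an object; the exact layout of the file is not described here, and the helper name summarize_posts is hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def summarize_posts(path="data/posts.json"):
    """Load the retrieved posts and return (top-level JSON type, entry count).

    Hypothetical helper for illustration; it is not shipped with the repository.
    """
    with Path(path).open(encoding="utf-8") as handle:
        posts = json.load(handle)
    # Assumption: the top level is a list of posts or a mapping keyed by post ID.
    # The actual layout is not documented here, so we only report what we find.
    return type(posts).__name__, len(posts)
```

Calling summarize_posts() from the repository root after the retrieval script has finished should report how many entries were downloaded, which can then be compared against the number of post IDs in the annotations.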
The usage of the experiments is described in a separate README in the experiments folder.
The software in this repository is available under the MIT license (LICENSE).
The corpus itself, however, is provided under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License. By using the corpus, you agree to this license.