dykang/cgraph

time-series explanation causal-explanation natural-language-generation natural-language-processing causality granger-causality

This repository contains the dataset used in "Detecting and Explaining Causes From Text For a Time Series Event" (EMNLP'17). Please contact Dongyeop Kang (dongyeok@cs.cmu.edu) if you have any questions.

How-to-download

./download_extract.sh

This script will automatically download all datasets and extract each zipped file into separate directories.

Dataset

Each time series uses the format [Date] \t [Count/Probability]. The following datasets are included:

sentis: sentiment (positive/negative) time series for each company and politician
topics: topic time series for each company and politician
topics.sentis: sentiment of each topic time series for each company and politician
unigram: unigram time series (12,804 words); uni.filtered.events contains the temporal dynamics of each word
bigram: bigram time series (25,909 words); uni.filtered.events contains the temporal dynamics of each word
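
As a quick sanity check after extraction, a file in the format above can be summarized with awk. The sketch below is only an illustration and is not part of the repository: the function name summarize_series is hypothetical, and it assumes one tab-separated record per line, with the date in the first column and a numeric count or probability in the second.

```shell
#!/bin/sh
# Summarize one time-series file in the "[Date] \t [Count/Probability]" format.
# Prints four tab-separated fields: row count, first date, last date, value sum.
# NOTE: summarize_series is a hypothetical helper, not shipped with this dataset.
summarize_series() {
  awk -F '\t' '
    NF >= 2 {                 # skip blank or malformed lines
      n++
      if (n == 1) first = $1  # remember the first date seen
      last = $1               # overwritten on every row, ends as the last date
      sum += $2               # accumulate counts or probabilities
    }
    END { printf "%d\t%s\t%s\t%s\n", n, first, last, sum }
  ' "$1"
}
```

Calling summarize_series with the path of any extracted series file prints the number of rows, the covered date range, and the column total, which makes truncated or empty downloads easy to spot.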

For better replication, we additionally share the following data:

Stock prices used in the experiment are available under ./stock_price.
10K tweet IDs per day are also shared under ./tweet_ids_10k_per_day.

Reference

If you find this dataset useful for your research, please consider citing this paper:

@inproceedings{kang2017detecting,
  title={Detecting and Explaining Causes From Text For a Time Series Event},
  author={Kang, Dongyeop and Gangal, Varun and Lu, Ang and Chen, Zheng and Hovy, Eduard},
  booktitle={Conference on Empirical Methods in Natural Language Processing},
  year={2017}
}

License

MIT
