srbnghosh99 / twitter_download

Download scripts for distributing Twitter data.

SemEval Twitter data download script

A script for downloading tweets that are distributed as tweet IDs to protect user privacy. It uses the format of the SemEval Twitter sentiment analysis dataset.
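
Judging by the .tsv extension, the distribution files are tab-separated with one tweet per line. The minimal sketch below reads such a file. It assumes, though this README does not say so, that the tweet ID is in the first column, and it keeps the remaining columns as they are. `read_dist` and `DistRow` are hypothetical names and are not part of download_tweets_api.py.

```python
import csv
import io
from typing import Iterator, List, NamedTuple, TextIO


class DistRow(NamedTuple):
    tweet_id: str     # assumption: the first column holds the tweet ID
    extra: List[str]  # remaining columns, kept verbatim


def read_dist(stream: TextIO) -> Iterator[DistRow]:
    """Yield one DistRow per non-blank line of a tab-separated .dist.tsv file."""
    # QUOTE_NONE: treat quote characters as ordinary text, not CSV quoting.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields or not fields[0].strip():
            continue  # skip blank lines
        yield DistRow(tweet_id=fields[0].strip(), extra=fields[1:])


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = io.StringIO("1001\t42\tpositive\n\n1002\t43\tneutral\n")
    for row in read_dist(sample):
        print(row.tweet_id, row.extra)
```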

Prerequisites:

  • sixohsix/twitter (the Python Twitter Tools package)
  • tqdm/tqdm (progress bars)

Install both with pip:

pip install twitter tqdm

Usage:

The first time you run the script, it should open a web browser, ask you to log in to Twitter, and display a PIN that you enter at a prompt shown by the script.

  1. Log in to Twitter with your user name in your default browser.
  2. Run the script like this to obtain your credentials:
     python download_tweets_api.py --dist=tweeti-a.dist.tsv
  3. Download the tweets like so:
     python download_tweets_api.py --dist=tweeti-a.dist.tsv --output=downloaded.tsv

Note that downloading the SemEval sentiment analysis training dataset takes about 18 hours.
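
The README does not say what dominates the download time. If it is the Twitter API rate limit (an assumption), a rough estimate is the number of rate-limit windows the requests need, multiplied by the window length. In the sketch below every rate is a caller-supplied parameter. `estimate_hours` is a hypothetical helper, and the example numbers are made up rather than the real dataset size or API limits.

```python
import math


def estimate_hours(n_tweets: int, requests_per_window: int,
                   window_minutes: float = 15.0,
                   tweets_per_request: int = 1) -> float:
    """Rough wall-clock estimate when the rate limit is the only bottleneck.

    Counts how many rate-limit windows the requests need and multiplies that
    by the window length. All parameter values are the caller's assumptions.
    """
    if requests_per_window <= 0 or tweets_per_request <= 0:
        raise ValueError("rates must be positive")
    requests = math.ceil(n_tweets / tweets_per_request)
    windows = math.ceil(requests / requests_per_window)
    return windows * window_minutes / 60.0


if __name__ == "__main__":
    # Hypothetical numbers, not the real dataset size or API limit.
    print(estimate_hours(n_tweets=10_000, requests_per_window=150))
```

Raising `tweets_per_request` models a batch lookup endpoint, if one is used. With one tweet per request, the runtime grows linearly with the dataset size.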

Restarting after a partial download:

If the script hangs partway through the download, use the --partial argument to specify the file containing the partially downloaded results.
This way you won't have to start from scratch:

python download_tweets_api.py --dist=tweeti-a.dist.tsv --partial=downloaded.tsv --output=downloaded2.tsv
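
One way a resume step like --partial can work is to collect the IDs already present in the partial file and fetch only the rest, in their original order. The sketch below assumes that the partial output file is tab-separated with the tweet ID in its first column. The helper names are hypothetical, and the logic is not taken from download_tweets_api.py.

```python
import csv
import io
from typing import Iterable, List, Set, TextIO


def already_downloaded(partial: TextIO) -> Set[str]:
    """Collect tweet IDs from the first column of a partial output file."""
    done: Set[str] = set()
    for fields in csv.reader(partial, delimiter="\t", quoting=csv.QUOTE_NONE):
        if fields and fields[0].strip():
            done.add(fields[0].strip())
    return done


def remaining_ids(all_ids: Iterable[str], done: Set[str]) -> List[str]:
    """Keep the original order while dropping IDs that were already fetched."""
    return [tid for tid in all_ids if tid not in done]


if __name__ == "__main__":
    # Made-up partial output: tweet ID, user ID, label, tweet text.
    partial = io.StringIO("1001\t42\tpositive\tgreat day\n")
    done = already_downloaded(partial)
    print(remaining_ids(["1001", "1002", "1003"], done))
```

Keeping the finished IDs in a set makes each membership check constant time on average. If the output file also records tweets that could not be fetched, this sketch treats them as done, and retrying them would need an extra check.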

Task A Mention Test Script

To print the Task A mentions and their annotations, run the testIndices.py script like so:

python testIndices.py downloaded.tsv

This just prints out the mentions with sentiment annotations for easier inspection.
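
The name testIndices.py suggests that Task A mentions are located by indices into the tweet. The sketch below shows how a mention could be extracted. It assumes the span is given as 0-based, inclusive start and end token indices over whitespace-separated tokens. The README does not state the convention, so check it against testIndices.py. `mention_tokens` is a hypothetical helper.

```python
from typing import List


def mention_tokens(text: str, start: int, end: int) -> List[str]:
    """Return tokens start..end (inclusive, 0-based) of a whitespace-split tweet."""
    tokens = text.split()
    if not 0 <= start <= end < len(tokens):
        raise IndexError(f"span {start}-{end} is outside {len(tokens)} tokens")
    return tokens[start:end + 1]


if __name__ == "__main__":
    # Made-up tweet and span, for illustration only.
    print(mention_tokens("I really love this new phone", 2, 2))
```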

Notes:

  • You may need to manually edit the authorization link that the script prints so that it uses https:// instead of http://.
  • The time on your computer needs to be set accurately. Thanks to Canberk for noting this on the email list.
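
A likely reason for the clock requirement is that OAuth 1.0a signs each request with a timestamp, and Twitter rejects requests whose timestamp is too far from its own clock. To check your clock, you can compare it with the Date header of an HTTPS response. This sketch does that comparison. `clock_skew_seconds` is a hypothetical helper, and the sample header is made up.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_skew_seconds(date_header: str, local_now: datetime) -> float:
    """Seconds the local clock is ahead (+) of or behind (-) a server's Date header.

    local_now must be timezone-aware. HTTP Date headers have one-second
    resolution, so treat the result as accurate to about +/- 1 s.
    """
    server_time = parsedate_to_datetime(date_header)
    if server_time.tzinfo is None:
        # A "-0000" zone parses as a naive datetime; interpret it as UTC.
        server_time = server_time.replace(tzinfo=timezone.utc)
    return (local_now - server_time).total_seconds()


if __name__ == "__main__":
    # The header string is a made-up example. In practice it could be read
    # from any HTTPS response, e.g. urllib.request.urlopen(url).headers["Date"].
    header = "Tue, 01 Jan 2013 12:00:00 GMT"
    print(clock_skew_seconds(header, datetime.now(timezone.utc)))
```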

About

License:MIT License

