twitass

Scrapes tweets from the Twitter Advanced Search webpage - bypasses the 7 day historical limit of the public API

How do I get set up?

Clone the repo

git clone https://github.com/gutfeeling/twitass.git
cd twitass

Install dependencies

First, the Python packages:

  1. Create a virtualenv
  • If you want to use python 2

    virtualenv venv

  • If you want to use python 3

    virtualenv -p python3 venv

  2. Activate the virtualenv

    source venv/bin/activate

  3. Install the python modules
  • If you want to use python 2

    pip install -r requirements2.txt

  • If you want to use python 3

    pip install -r requirements3.txt


Start scraping

Here is a basic example that searches for the word "python" on the Twitter Advanced Search webpage and returns the first 200 tweets.

>>> from scraper import AdvancedSearchScraper
>>> ass = AdvancedSearchScraper("python", 200)
>>> tweets = ass.scrape()    # Returns the first 200 tweets in a list
>>> tweets[0]    # Each list element is a dict containing data of one tweet
{'tweet_timestamp': '1470408709000', 
 'tweet_id': 761575443145162752, 
 'author_href': '/ulysseas', 
 'tweet_permalink': '/ulysseas/status/761575443145162752', 
 'retweets': 0, 
 'author_name': 'Don Sheu 許家豪', 
 'tweet_time': '7:51 AM - 5 Aug 2016', 
 'author_handle': 'ulysseas', 
 'tweet_language': 'en', 
 'favorites': 0, 
 'author_id': 229946505, 
 'tweet_text': "@DaveParkerSEA @DRNilssen in Rio? Hope you connect w/ @ChicagoPython 's @brianray , Brian's my best friend, introduced me to Python community"
 }
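
The scraped dictionaries are plain Python data, so they can be post-processed with the standard library alone. The short sketch below is an illustration rather than part of the package: it assumes the keys shown above, and that tweet_timestamp is a Unix timestamp in milliseconds (which the sample value and tweet_time suggest). It converts the timestamp to an ISO 8601 string and dumps the whole list to a JSON file. The file name and the tweet_datetime key are arbitrary choices for the example.

    import json
    from datetime import datetime

    from scraper import AdvancedSearchScraper

    # Same query as above: the first 200 tweets mentioning "python"
    tweets = AdvancedSearchScraper("python", 200).scrape()

    for tweet in tweets:
        # tweet_timestamp looks like a Unix timestamp in milliseconds
        # (an assumption based on the sample output above), so divide by 1000
        seconds = int(tweet["tweet_timestamp"]) / 1000.0
        tweet["tweet_datetime"] = datetime.utcfromtimestamp(seconds).isoformat()

    # Save the enriched tweets to disk as a single JSON array
    with open("python_tweets.json", "w") as f:
        json.dump(tweets, f, indent=2)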
