Twistream helps you automatically collect and store data from the Twitter Streaming API.
Latest stable release:
pip install twistream
From source:
git clone https://github.com/guillermo-carrasco/twistream.git
cd twistream
pip install .
You need Twitter credentials to use the Twitter API. To get them, create an application here. Once it is created, save the credentials so you can configure twistream.
You can use the command twistream init to help you create a correctly formatted configuration file for your collections.
Once created, you will have a file that looks like this:
~> cat ~/.twistream/twistream.yml
twitter:
  consumer_key: your_consumer_key
  consumer_secret: your_consumer_secret
  access_token_key: your_access_token_key
  access_token_secret: your_access_token_secret
backend: backend_name
backend_params:
  username: db_username
  password: db_password
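A missing or misspelled key is an easy mistake to make when editing this file by hand. The following is a minimal sketch of a sanity check over an already-parsed configuration. It assumes the four credentials are nested under `twitter` and that `backend` is a top-level key; `missing_keys` is a hypothetical helper, not part of twistream, and reading the file itself (for example with PyYAML's `yaml.safe_load`) is left out to keep the sketch dependency-free.

```python
# Hypothetical helper, not part of twistream: report which expected keys
# are absent from an already-parsed configuration dictionary.
REQUIRED_TWITTER_KEYS = (
    "consumer_key",
    "consumer_secret",
    "access_token_key",
    "access_token_secret",
)


def missing_keys(config):
    """Return the dotted names of expected keys missing from `config`."""
    missing = []
    twitter = config.get("twitter") or {}
    for key in REQUIRED_TWITTER_KEYS:
        if not twitter.get(key):
            missing.append(f"twitter.{key}")
    if not config.get("backend"):
        missing.append("backend")
    return missing


# Example: a parsed config in which one credential was left out.
example = {
    "twitter": {
        "consumer_key": "your_consumer_key",
        "consumer_secret": "your_consumer_secret",
        "access_token_key": "your_access_token_key",
    },
    "backend": "backend_name",
    "backend_params": {"username": "db_username", "password": "db_password"},
}
print(missing_keys(example))
```

An empty list means every expected key was found. The check deliberately ignores `backend_params`, since its contents depend on the backend in use.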
Remember that --help is always an available option.
Once you have created a configuration file, start collecting tweets!
twistream collect --tracks tracks,to,follow config.yaml
Refer to the Twitter documentation to learn what tracks are. In short:
A comma-separated list of phrases which will be used to determine what Tweets will be delivered on the stream. A phrase may be one or more terms separated by spaces, and a phrase will match if all of the terms in the phrase are present in the Tweet, regardless of order and ignoring case. By this model, you can think of commas as logical ORs, while spaces are equivalent to logical ANDs (e.g. ‘the twitter’ is the AND twitter, and ‘the,twitter’ is the OR twitter).
If what you want is to follow hashtags, don't forget to include the # character.
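To make the OR/AND rule concrete, here is a small sketch that approximates how a `--tracks` value would match a tweet. It is only an illustration: the real matching happens on Twitter's side, its tokenization is more involved than the regular expression used here, and `matches_tracks` is a hypothetical name that is not part of twistream.

```python
import re


def matches_tracks(tweet_text, tracks):
    """Approximate track matching: commas are ORs, spaces are ANDs."""
    # Lowercase and tokenize, keeping a leading '#' or '@' attached to the word.
    tokens = set(re.findall(r"[#@]?\w+", tweet_text.lower()))
    for phrase in tracks.split(","):
        terms = phrase.lower().split()
        # A phrase matches when every one of its terms is present, in any order.
        if terms and all(term in tokens for term in terms):
            return True
    return False


print(matches_tracks("Twitter is down", "the twitter"))   # needs both terms
print(matches_tracks("Twitter is down", "the,twitter"))   # either term is enough
print(matches_tracks("Learning #Python today", "#python"))
```

In this simplified model, `the twitter` requires both words to appear, while `the,twitter` is satisfied by either one.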
Since version 0.1.3, twistream supports two backends: a relational database (SQLite) and a NoSQL database (MongoDB).
Note that the SQLite backend saves only a couple of tweet fields, whereas the MongoDB backend saves the whole blob. It is a trade-off between information and storage space.
backend: sqlite
backend_params:
  db_path: /path/to/your/db
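This document does not list which tweet fields the SQLite backend stores, so one way to find out is to inspect the database file after collecting some tweets. The sketch below uses only Python's standard `sqlite3` module and makes no assumption about table or column names; `describe_database` is a hypothetical helper, not part of twistream.

```python
import sqlite3


def describe_database(db_path):
    """Map each table name in the SQLite file to its list of column names."""
    connection = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in connection.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        layout = {}
        for table in tables:
            # PRAGMA table_info yields one row per column; index 1 is the name.
            columns = connection.execute(f'PRAGMA table_info("{table}")')
            layout[table] = [column[1] for column in columns]
        return layout
    finally:
        connection.close()
```

Call it with the same path you set as `db_path`, for example `describe_database("/path/to/your/db")`. Keep in mind that `sqlite3.connect` creates an empty file if the path does not exist yet, in which case the result is an empty dictionary.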
backend: mongodb
backend_params:
  db_string: database_connection_string
(See database connection string documentation)