- Python 3.6.* or 3.7.*
- PostgreSQL
- Twitter APIs
- Google APIs
git clone https://github.com/re3turn/twicrawler.git
cd twicrawler
pip3 install -r requirements.txt
Execute `get_refresh_token.py` after setting the environment variables `GOOGLE_CLIENT_ID` and `GOOGLE_CLIENT_SECRET`.
$ python3 get_refresh_token.py
Please visit this URL to authorize this application: https://accounts.google.com/o/oauth2/auth?response_type=code&.....
Enter the authorization code: {AUTHORIZATION CODE}
refresh_token: {REFRESH TOKEN}
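
For orientation, the authorization URL printed above follows the usual OAuth 2.0 authorization-code pattern. The following is a minimal sketch of how such a URL might be assembled with the standard library only; it is not the code in `get_refresh_token.py`. The function name `build_auth_url`, the placeholder scope, and the out-of-band redirect URI are assumptions made for illustration.

```python
import os
from urllib.parse import urlencode

# Endpoint taken from the URL shown in the console output above.
AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/auth"


def build_auth_url(client_id, scope, redirect_uri="urn:ietf:wg:oauth:2.0:oob"):
    """Return an authorization URL for the OAuth 2.0 authorization-code flow.

    The redirect URI default and the scope passed in are assumptions;
    the real script may use different values.
    """
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        # access_type=offline asks Google to issue a refresh token.
        "access_type": "offline",
    }
    return AUTH_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical scope; replace it with the scope the application needs.
    example_scope = "https://example.com/placeholder.scope"
    client_id = os.environ.get("GOOGLE_CLIENT_ID", "your-client-id")
    print(build_auth_url(client_id, example_scope))
```

Exchanging the authorization code for a refresh token additionally requires `GOOGLE_CLIENT_SECRET` and a request to Google's token endpoint, which this sketch leaves out.
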
Environment variable | Description | Required |
---|---|---|
TWITTER_USER_IDS | Twitter user IDs to crawl. If multiple users are specified, separate them with `,` | ✓ |
INTERVAL | Crawler interval in minutes. default=5 | |
TWEET_COUNT | Specifies the number of tweet statuses to retrieve. default=100 | |
TWEET_PAGES | Specifies the page of results to retrieve. default=5 | |
DATABASE_URL | Database URL. Format: `postgres://<username>:<password>@<hostname>:<port>/<database>` | ✓ |
DATABASE_SSLMODE | Database sslmode. default=require | |
TZ | Time zone | |
TWITTER_CONSUMER_KEY | Twitter consumer API key | ✓ |
TWITTER_CONSUMER_SECRET | Twitter consumer API secret key | ✓ |
TWITTER_ACCESS_TOKEN | Twitter access token | ✓ |
TWITTER_ACCESS_TOKEN_SECRET | Twitter access token secret | ✓ |
GOOGLE_CLIENT_ID | Google API client ID | ✓ |
GOOGLE_CLIENT_SECRET | Google API client secret | ✓ |
GOOGLE_REFRESH_TOKEN | Google API refresh token | ✓ |
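
To check a configuration before starting the crawler, the table can be turned into a small validation routine. This is a hedged sketch and not part of twicrawler: `load_settings`, `REQUIRED`, `DEFAULTS`, and the keys of the returned dictionary are hypothetical names. Only the variable names and default values come from the table.

```python
import os
from urllib.parse import urlsplit

# Variables marked as required in the table.
REQUIRED = (
    "TWITTER_USER_IDS",
    "DATABASE_URL",
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
    "GOOGLE_CLIENT_ID",
    "GOOGLE_CLIENT_SECRET",
    "GOOGLE_REFRESH_TOKEN",
)

# Defaults documented in the table for the optional variables.
DEFAULTS = {
    "INTERVAL": "5",
    "TWEET_COUNT": "100",
    "TWEET_PAGES": "5",
    "DATABASE_SSLMODE": "require",
}


def load_settings(env=None):
    """Validate the environment and return the parsed settings as a dict."""
    env = os.environ if env is None else env

    # Fail early with one message that lists every missing variable.
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )

    # postgres://<username>:<password>@<hostname>:<port>/<database>
    db = urlsplit(env["DATABASE_URL"])

    return {
        # Comma-separated list, tolerating spaces around each ID.
        "user_ids": [u.strip() for u in env["TWITTER_USER_IDS"].split(",") if u.strip()],
        "interval_minutes": int(env.get("INTERVAL", DEFAULTS["INTERVAL"])),
        "tweet_count": int(env.get("TWEET_COUNT", DEFAULTS["TWEET_COUNT"])),
        "tweet_pages": int(env.get("TWEET_PAGES", DEFAULTS["TWEET_PAGES"])),
        "sslmode": env.get("DATABASE_SSLMODE", DEFAULTS["DATABASE_SSLMODE"]),
        "database": {
            "user": db.username,
            "host": db.hostname,
            "port": db.port,
            "name": db.path.lstrip("/"),
        },
    }
```

Calling `load_settings()` with no argument reads `os.environ`; passing a plain dictionary instead makes the check easy to try without touching the real environment.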