Twitch Livestream Data Scraper
Overview
This application uses a GitHub Actions workflow to periodically fetch livestream data from the Twitch API and store it in a Google Cloud SQL (MySQL) instance.
Dependencies
Setup
Get a Twitch API Token
- Register a Twitch developer application: https://dev.twitch.tv/console/apps
- Generate a new client secret.
- You now have a client_id and a client_secret. Store both values as GitHub secrets; the GitHub Actions workflow uses them to authenticate against the Twitch API. Name the secrets as follows:
  - TWITCH_CLIENT_ID
  - TWITCH_CLIENT_SECRET
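The README does not show how the scraper uses these credentials. A minimal sketch, assuming the standard Twitch OAuth client credentials flow (`https://id.twitch.tv/oauth2/token`) and the Helix "Get Streams" endpoint; the function names are hypothetical, and it uses only the Python standard library rather than whatever HTTP client the project actually uses.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://id.twitch.tv/oauth2/token"
STREAMS_URL = "https://api.twitch.tv/helix/streams"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the client credentials request for an app access token."""
    body = urllib.parse.urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "client_credentials",
    }).encode()
    # Form-encoded POST body; urllib adds the Content-Type header when sending.
    return urllib.request.Request(TOKEN_URL, data=body, method="POST")


def build_streams_request(client_id: str, token: str, user_id: str) -> urllib.request.Request:
    """Build a Helix 'Get Streams' request for a single broadcaster."""
    query = urllib.parse.urlencode({"user_id": user_id})
    return urllib.request.Request(
        f"{STREAMS_URL}?{query}",
        headers={"Client-Id": client_id, "Authorization": f"Bearer {token}"},
    )


def send_json(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

A run would first call `send_json(build_token_request(...))["access_token"]` and then `send_json(build_streams_request(...))["data"]`, which is an empty list while the streamer is offline.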
Set up a Google Cloud SQL instance
- Create a Cloud SQL for MySQL instance.
- Create a new user with password authentication.
- Store the MySQL user credentials as the following GitHub secrets:
  - MYSQL_USER
  - MYSQL_PW
- Set up the Cloud SQL Proxy and generate a private key: https://cloud.google.com/sql/docs/mysql/connect-external-app#proxy
- Store the value of the private key in the GitHub secret GCP_SQL_KEY.
- Store the instance connection name (format PROJECT:REGION:INSTANCE) in the GitHub secret GCP_SQL_INSTANCE.
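The README does not show how the scraper reaches the database. A minimal sketch, assuming the Cloud SQL Proxy is started in the workflow listening on 127.0.0.1:3306 and that MYSQL_USER, MYSQL_PW and MYSQL_DB reach the scraper as environment variables; both helper names are hypothetical.

```python
import os
from collections.abc import Mapping


def parse_instance_connection_name(name: str) -> tuple[str, str, str]:
    """Split a Cloud SQL instance connection name into (project, region, instance)."""
    # rsplit keeps domain-scoped project IDs like "example.com:my-project" intact.
    parts = name.rsplit(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected PROJECT:REGION:INSTANCE, got {name!r}")
    project, region, instance = parts
    return project, region, instance


def mysql_config_from_env(env: Mapping[str, str] = os.environ) -> dict:
    """Collect connection settings for a locally listening Cloud SQL Proxy.

    Assumption: the proxy was started to listen on 127.0.0.1:3306.
    """
    return {
        "host": "127.0.0.1",
        "port": 3306,
        "user": env["MYSQL_USER"],
        "password": env["MYSQL_PW"],
        "database": env["MYSQL_DB"],
    }
```

With mysql-connector-python, for example, `mysql.connector.connect(**mysql_config_from_env())` would open the connection through the proxy.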
Configure the Scraper
- Define the workflow schedule in crontab format (see https://crontab.guru/ for help). GitHub Actions evaluates cron schedules in UTC.
- Set the Twitch user ID to scrape in the env variable TWITCH_USER_ID in the scraper.yml file.
- Set the name of the MySQL database that stores the data in the env variable MYSQL_DB in the scraper.yml file.
- Set the env variable STORE_IF_OFFLINE to control whether data is also stored while the livestream is offline.
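The README does not say how the scraper reads these settings or which values STORE_IF_OFFLINE accepts. A minimal sketch, assuming they arrive as plain environment variables (GitHub Actions passes `env:` values to steps as strings) and that common true/false spellings are accepted; `ScraperConfig`, `load_config` and `parse_bool` are hypothetical names.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

_TRUE_VALUES = {"1", "true", "yes", "on"}
_FALSE_VALUES = {"0", "false", "no", "off", ""}


def parse_bool(value: str) -> bool:
    """Interpret a string env value as a boolean (accepted spellings are an assumption)."""
    normalized = value.strip().lower()
    if normalized in _TRUE_VALUES:
        return True
    if normalized in _FALSE_VALUES:
        return False
    raise ValueError(f"not a boolean: {value!r}")


@dataclass(frozen=True)
class ScraperConfig:
    twitch_user_id: str
    mysql_db: str
    store_if_offline: bool


def load_config(env: Mapping[str, str] = os.environ) -> ScraperConfig:
    """Read the scraper settings from environment variables."""
    return ScraperConfig(
        twitch_user_id=env["TWITCH_USER_ID"],
        mysql_db=env["MYSQL_DB"],
        # Hypothetical default: skip offline rows unless explicitly enabled.
        store_if_offline=parse_bool(env.get("STORE_IF_OFFLINE", "false")),
    )
```

Failing on an unrecognized value, instead of silently treating it as false, makes a typo in scraper.yml visible on the first run.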
DB schema:
```python
# Table definitions keyed by table name (MySQL DDL)
TABLES = {}
TABLES['stream_data'] = (
    "CREATE TABLE `stream_data` ("
    "  `STREAM_LIVE` bool,"
    "  `USER_ID` VARCHAR(30),"
    "  `STREAM_ID` VARCHAR(30),"
    "  `STREAM_TITLE` VARCHAR(100),"
    "  `GAME_ID` VARCHAR(30),"
    "  `GAME_NAME` VARCHAR(30),"
    "  `START_TIMESTAMP` datetime,"
    "  `CHECK_TIMESTAMP` datetime"
    ") ENGINE=InnoDB")
```
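Something has to execute each entry of TABLES. A minimal sketch, assuming a DB-API cursor from mysql-connector-python, whose errors carry the MySQL server error code in `errno` (1050 is "Table already exists"); `create_tables` is a hypothetical helper, not the project's code.

```python
from collections.abc import Mapping

ER_TABLE_EXISTS_ERROR = 1050  # MySQL server error: "Table already exists"


def create_tables(cursor, tables: Mapping[str, str]) -> list[str]:
    """Run each CREATE TABLE statement, skipping tables that already exist.

    Returns the names of the tables that were newly created.
    """
    created = []
    for name, ddl in tables.items():
        try:
            cursor.execute(ddl)
        except Exception as err:
            # Only swallow the "already exists" error; re-raise anything else.
            if getattr(err, "errno", None) == ER_TABLE_EXISTS_ERROR:
                continue
            raise
        created.append(name)
    return created
```

Called as `create_tables(cnx.cursor(), TABLES)`, this is safe to run on every workflow execution. Changing the DDL to `CREATE TABLE IF NOT EXISTS` would make the error handling unnecessary.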
Example table content:
```
mysql> SELECT * FROM stream_data;
+-------------+-----------+-------------+---------------------------------------------------+---------+----------------+---------------------+---------------------+
| STREAM_LIVE | USER_ID   | STREAM_ID   | STREAM_TITLE                                      | GAME_ID | GAME_NAME      | START_TIMESTAMP     | CHECK_TIMESTAMP     |
+-------------+-----------+-------------+---------------------------------------------------+---------+----------------+---------------------+---------------------+
|           1 | XXXXXXXXX | XXXXXXXXXXX | XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX     | 65876   | Cyberpunk 2077 | 2020-12-19 10:03:59 | 2020-12-19 12:25:26 |
|           1 | XXXXXXXXX | XXXXXXXXXXX | XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX     | 65876   | Cyberpunk 2077 | 2020-12-19 10:03:59 | 2020-12-19 12:31:40 |
|           1 | XXXXXXXXX | XXXXXXXXXXX | XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX     | 65876   | Cyberpunk 2077 | 2020-12-19 10:03:59 | 2020-12-19 12:32:00 |
|           1 | XXXXXXXXX | XXXXXXXXXXX | XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX     | 65876   | Cyberpunk 2077 | 2020-12-19 10:03:59 | 2020-12-19 12:32:43 |
|           1 | XXXXXXXXX | XXXXXXXXXXX | XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX     | 65876   | Cyberpunk 2077 | 2020-12-19 10:03:59 | 2020-12-19 13:04:21 |
+-------------+-----------+-------------+---------------------------------------------------+---------+----------------+---------------------+---------------------+
```
Column | Description |
---|---|
STREAM_LIVE | True if the streamer is live |
USER_ID | The streamer's user_id |
STREAM_ID | The ID of the current livestream |
STREAM_TITLE | The current stream title |
GAME_ID | The ID of the current game |
GAME_NAME | The name of the current game |
START_TIMESTAMP | The timestamp when the livestream started |
CHECK_TIMESTAMP | The timestamp of the data scraper run |
All timestamps are in UTC.
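The README does not show how a Get Streams result becomes a row. A minimal sketch, assuming the Helix response fields `id`, `title`, `game_id`, `game_name` and `started_at`, `%s` placeholders as used by mysql-connector-python, and NULLs in the stream columns of offline rows (the README does not say what offline rows contain); `build_row`, `rows_to_store` and `INSERT_STREAM_DATA` are hypothetical names.

```python
from datetime import datetime, timezone
from typing import Optional

INSERT_STREAM_DATA = (
    "INSERT INTO stream_data "
    "(STREAM_LIVE, USER_ID, STREAM_ID, STREAM_TITLE, GAME_ID, GAME_NAME, "
    "START_TIMESTAMP, CHECK_TIMESTAMP) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)"
)


def to_utc_naive(dt: datetime) -> datetime:
    """Convert an aware datetime to naive UTC, since DATETIME stores no time zone."""
    return dt.astimezone(timezone.utc).replace(tzinfo=None)


def build_row(user_id: str, stream: Optional[dict], checked_at: datetime) -> tuple:
    """Map one Get Streams item (None if offline) to a stream_data row.

    checked_at must be timezone-aware, e.g. datetime.now(timezone.utc).
    """
    check_ts = to_utc_naive(checked_at)
    if stream is None:
        # Assumption: offline rows only record liveness, user and check time.
        return (False, user_id, None, None, None, None, None, check_ts)
    # Helix returns RFC 3339 timestamps such as "2020-12-19T10:03:59Z".
    started = datetime.fromisoformat(stream["started_at"].replace("Z", "+00:00"))
    return (
        True,
        user_id,
        stream["id"],
        stream["title"][:100],     # STREAM_TITLE is VARCHAR(100)
        stream["game_id"],
        stream["game_name"][:30],  # GAME_NAME is VARCHAR(30)
        to_utc_naive(started),
        check_ts,
    )


def rows_to_store(user_id: str, streams: list, checked_at: datetime,
                  store_if_offline: bool) -> list:
    """Return the rows to insert; Get Streams returns an empty list when offline."""
    if streams:
        return [build_row(user_id, streams[0], checked_at)]
    return [build_row(user_id, None, checked_at)] if store_if_offline else []
```

The rows can then be written with `cursor.executemany(INSERT_STREAM_DATA, rows)` followed by `cnx.commit()`. Truncating the title and game name is a design choice to avoid insert errors under MySQL strict mode; widening the columns would be the alternative.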
Contributing
I welcome direct contributions to the Twitch Livestream Data Scraper code base. Thank you!
License
This is open source software licensed under the GNU General Public License v3.0.