minhphan03 / DataCollector

A multi-part data pipeline project to automate importing data and making predictions.

InfluxDB-based Data Collector with Airflow

This project uses InfluxDB, Python Scrapy, and Airflow to automate a schedule-based data collector. It serves as a predecessor and guiding template for the web-based Twitter Bot (still under construction) and for StockCollector's other branch (the model branch). The two branches experiment with different scraping subjects, but both will eventually be unified to use the same automated back-end.
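
To illustrate how a scraped record might be handed to InfluxDB, here is a minimal sketch that serializes one item into InfluxDB's line protocol (`measurement,tags fields timestamp`). It assumes each Scrapy item is a flat dict split into tags and fields. The function name `to_line_protocol` and the `stock_price` measurement are hypothetical and not taken from this repository, which may write points through a client library instead.

```python
def _escape_measurement(name: str) -> str:
    # Measurement names must escape commas and spaces.
    return name.replace(",", r"\,").replace(" ", r"\ ")


def _escape_key(text: str) -> str:
    # Tag keys, tag values and field keys must escape commas, equals signs and spaces.
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _format_field(value) -> str:
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # the "i" suffix marks an integer field
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, str):
        # String fields are double-quoted; escape backslashes and quotes inside.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def to_line_protocol(measurement: str, tags: dict, fields: dict, timestamp_ns=None) -> str:
    """Build one line-protocol point (hypothetical helper for a scraped item)."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    head = _escape_measurement(measurement)
    # Tags are sorted by key, which InfluxDB recommends for write performance.
    for key in sorted(tags):
        head += f",{_escape_key(key)}={_escape_key(str(tags[key]))}"
    body = ",".join(f"{_escape_key(k)}={_format_field(v)}" for k, v in fields.items())
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {int(timestamp_ns)}"  # nanosecond precision by default
    return line


if __name__ == "__main__":
    print(to_line_protocol("stock_price", {"symbol": "ABC"},
                           {"close": 12.5, "volume": 1000}, 1700000000000000000))
```

In an Airflow setup, a scheduled task could run the crawl and then pass each resulting line to InfluxDB's write endpoint. Keeping the serialization a pure function makes it easy to test without a running database.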

Resources

Special thanks to Mr. Pham Thanh Hai (my internship mentor at TMA Solutions) for helping me with Airflow and InfluxDB concepts, including the hooks and connectors in the user-defined library (the 'plugins' folder).

Languages

Python 98.8%, Dockerfile 1.2%