renewollny / Gans

SQL/Python/AWS-Project creating Data-Pipelines

Gans

SQL/Python/AWS-Project from DataScience-Bootcamp

Creating Data-Pipelines

Gans is a startup developing an e-scooter-sharing system. It aspires to operate in the most populous cities all around the world. In each city, the company will have hundreds of e-scooters parked in the streets and allow users to rent them by the minute.

Gans has seen that its operational success depends on having its scooters parked where users need them.

The company wants to anticipate scooter movements as accurately as possible.

The task is to collect data from external sources that can potentially help Gans predict e-scooter movement. Since the data is needed every day, in real time, and must be accessible to everyone in the company, the challenge is to assemble and automate a data pipeline in the cloud.

The task is divided into two phases:

1.1 Scrape data from the web
1.2 Collect data with APIs
1.3 Create database model
1.4 Store data on local MySQL instance

2.1 Set up cloud database
2.2 Move the scripts to AWS-Lambda
2.3 Automate the pipeline
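
Steps 1.3 and 1.4 can be illustrated with a minimal sketch. It is not the project's actual schema: the `cities` and `weather` tables, their columns, and the helper functions are hypothetical, and the standard-library `sqlite3` module stands in for the local MySQL instance so the example runs without a database server. With MySQL, the SQL dialect and the connection setup would differ.

```python
import sqlite3

# Hypothetical data model: one row per city, plus time-stamped
# weather records that reference a city. Names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS cities (
    city_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE,
    country TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS weather (
    weather_id    INTEGER PRIMARY KEY,
    city_id       INTEGER NOT NULL REFERENCES cities(city_id),
    forecast_time TEXT NOT NULL,
    temperature_c REAL,
    UNIQUE (city_id, forecast_time)
);
"""


def create_model(conn):
    """Create the tables if they do not exist yet."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
    conn.executescript(SCHEMA)


def store_city(conn, name, country):
    """Insert a city once and return its id on every call."""
    conn.execute(
        "INSERT OR IGNORE INTO cities (name, country) VALUES (?, ?)",
        (name, country),
    )
    conn.commit()
    row = conn.execute(
        "SELECT city_id FROM cities WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def store_weather(conn, city_id, records):
    """Upsert weather records so that a daily re-run does not duplicate rows."""
    conn.executemany(
        "INSERT OR REPLACE INTO weather (city_id, forecast_time, temperature_c) "
        "VALUES (?, ?, ?)",
        [(city_id, r["forecast_time"], r["temperature_c"]) for r in records],
    )
    conn.commit()
```

Keying the weather rows on city and timestamp is one way to make the load step safe to repeat, which matters once the pipeline runs on a schedule.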

The second phase was completed during the project, but for cost reasons the cloud setup is no longer live. For that reason, only the AWS Lambda functions are kept in the repo, in their own folder.
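
For orientation, a Lambda function in this kind of pipeline might look roughly like the sketch below. It is not the code from the repo: `collect_data` and `store_data` are hypothetical placeholders for the scraping/API step and the database write, and the shape of the `event` payload is an assumption. Only the `lambda_handler(event, context)` entry-point signature follows the usual AWS Lambda convention for Python.

```python
import json


def collect_data(cities):
    # Placeholder (assumption): the real function would scrape pages or
    # call external APIs for each city.
    return [{"city": city, "status": "collected"} for city in cities]


def store_data(rows):
    # Placeholder (assumption): the real function would write the rows
    # to the cloud database and return the number of rows stored.
    return len(rows)


def lambda_handler(event, context):
    """Entry point that AWS Lambda invokes on each scheduled run."""
    # The "cities" key and its default value are illustrative only.
    cities = (event or {}).get("cities", ["Berlin"])
    rows = collect_data(cities)
    stored = store_data(rows)
    return {
        "statusCode": 200,
        "body": json.dumps({"rows_stored": stored}),
    }
```

Calling `lambda_handler({"cities": ["Berlin", "Hamburg"]}, None)` locally exercises the same code path as a scheduled invocation, which makes the function easy to test before it is deployed.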

About

License: GNU General Public License v3.0


Languages

Jupyter Notebook 94.7%, Python 5.3%