dataquestio / analytics_pipeline

Code to build a simple analytics data pipeline with Python

Home Page: http://www.dataquest.io/blog/data-pipelines-tutorial/

Analytics Pipeline

This repo contains the code for creating a data pipeline to calculate metrics for a fake webserver:

  • log_generator.py -- generates fake webserver logs.
  • store_logs.py -- parses the logs and stores them in a SQLite database.
  • count_visitors.py -- pulls from the database to count visitors to the site per day.
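
The three stages above can be illustrated with a minimal, self-contained sketch. It is not the code from this repo: the log format (modeled on the common "combined" web server format), the logs table schema, the function names, and the sample date are all assumptions made for illustration, and an in-memory SQLite database stands in for a file on disk.

```python
import random
import re
import sqlite3
from datetime import datetime, timedelta, timezone

# Assumed log line format (hypothetical, not taken from the repo):
# 10.0.0.1 - - [09/Mar/2017:01:15:59 +0000] "GET /page1 HTTP/1.1" 200 512
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+)'
)
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def generate_lines(n, start, seed=0):
    """Stage 1: produce n fake log lines, spaced six hours apart."""
    rng = random.Random(seed)
    lines = []
    for i in range(n):
        ip = "10.0.0.{}".format(rng.randint(1, 5))
        stamp = (start + timedelta(hours=6 * i)).strftime(TIME_FORMAT)
        lines.append('{} - - [{}] "GET /page{} HTTP/1.1" 200 {}'.format(
            ip, stamp, rng.randint(1, 3), rng.randint(100, 999)))
    return lines


def parse_line(line):
    """Return (ip, ISO timestamp, request, status), or None if unparseable."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    when = datetime.strptime(match.group("time"), TIME_FORMAT)
    return (match.group("ip"), when.isoformat(),
            match.group("request"), int(match.group("status")))


def store_lines(conn, lines):
    """Stage 2: parse lines and insert the valid ones; return rows stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS logs "
        "(ip TEXT, time_local TEXT, request TEXT, status INTEGER)")
    rows = [row for row in map(parse_line, lines) if row is not None]
    conn.executemany("INSERT INTO logs VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def count_visitors(conn):
    """Stage 3: count distinct IP addresses per calendar day."""
    query = (
        "SELECT substr(time_local, 1, 10) AS day, COUNT(DISTINCT ip) "
        "FROM logs GROUP BY day ORDER BY day")
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # assumption: no file on disk
    start = datetime(2017, 3, 9, tzinfo=timezone.utc)  # arbitrary sample date
    store_lines(conn, generate_lines(8, start))
    for day, visitors in count_visitors(conn):
        print(day, visitors)
```

Counting distinct IP addresses is only one possible definition of a "visitor"; the scripts in this repo may define it differently.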

Installation

To get this repo running:

  • Install Python 3, following the official installation instructions for your platform.
  • Create a virtual environment.
  • Clone this repo with git clone git@github.com:dataquestio/analytics_pipeline.git
  • Change into the folder with cd analytics_pipeline
  • Install the requirements with pip install -r requirements.txt

Usage

  • Run the three scripts listed above in order: log_generator.py, then store_logs.py, then count_visitors.py.

You should see output from count_visitors.py.
