iamaziz / etl

simple ETL example

What

This repo contains a script that demonstrates a simple ETL data pipeline: it extracts data from the source, transforms it into the desired format, and loads it into a SQLite file.

Simple analysis queries are then run on the stored data. See: analysis notebook.
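As a rough illustration of what such an analysis query might look like, here is a minimal sketch using Python's built-in sqlite3 module. The table name `unemployment`, its columns, and the sample rows are assumptions made up for this example; the actual schema of "db.sqlite" may differ, so check the analysis notebook for the real queries.

```python
import sqlite3

# Hypothetical schema and placeholder rows: the real table and column
# names in db.sqlite may differ.
conn = sqlite3.connect(":memory:")  # use "db.sqlite" to query the pipeline output
conn.execute("CREATE TABLE unemployment (county TEXT, year INTEGER, rate REAL)")
conn.executemany(
    "INSERT INTO unemployment VALUES (?, ?, ?)",
    [("County A", 2010, 9.5), ("County A", 2011, 8.5), ("County B", 2010, 7.0)],
)


def average_rate_by_county(connection):
    """Return (county, average rate) rows, ordered by county name."""
    cursor = connection.execute(
        "SELECT county, AVG(rate) FROM unemployment GROUP BY county ORDER BY county"
    )
    return cursor.fetchall()


rows = average_rate_by_county(conn)
for county, avg_rate in rows:
    print(f"{county}: {avg_rate:.2f}")
```

Pointing `sqlite3.connect` at "db.sqlite" instead of ":memory:" (and dropping the setup statements) would run the same kind of query against the file the pipeline produces, provided the table and column names are adjusted to match.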

Note: The data covers the US population and unemployment rate over the past decade.


Getting started

First, install the dependencies:

$ pip install -r requirements.txt

Note: preferably, run inside a virtualenv:

$ virtualenv .env -p python3
$ source .env/bin/activate
$ pip install -r requirements.txt

Second, run the main pipeline file:

$ python pipeline.py
Data Pipeline created
	 extracting data from source ....
	 formatting and transforming data ...
	 loading into database ...

Done. See: result in "db.sqlite"
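The three stages printed above (extract, transform, load) can be sketched with pandas and sqlite3 roughly as follows. This is not the code in pipeline.py: the function names, the column layout, the `unemployment` table name, and the sample values are all hypothetical, and the extract step uses an in-memory frame as a stand-in for reading the real source file.

```python
import sqlite3

import pandas as pd


def extract():
    # Stand-in for reading the real source (e.g. an Excel file via pd.read_excel).
    # The columns and values here are placeholders, not the actual data.
    return pd.DataFrame(
        {"County": ["County A", "County B"], "2010": [9.5, 7.0], "2011": [8.5, 6.0]}
    )


def transform(raw):
    # Reshape from one column per year to one row per (county, year).
    tidy = raw.melt(id_vars="County", var_name="year", value_name="rate")
    tidy = tidy.rename(columns={"County": "county"})
    tidy["year"] = tidy["year"].astype(int)
    return tidy


def load(frame, connection, table="unemployment"):
    # Write the tidy frame to SQLite, replacing the table if it already exists.
    frame.to_sql(table, connection, if_exists="replace", index=False)


conn = sqlite3.connect(":memory:")  # the real pipeline writes to "db.sqlite"
load(transform(extract()), conn)
row_count = conn.execute("SELECT COUNT(*) FROM unemployment").fetchone()[0]
print(f"loaded {row_count} rows")
```

Keeping each stage as its own function makes it possible to test the transform step on a small frame without touching the source file or the database.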

Requirements

  • pandas
  • xlrd (for reading Excel files)
  • Python >= 3.6

Data source and description

POPULATION BY METROPOLITAN AREA AND COUNTY

UNEMPLOYMENT BY COUNTY

Languages

  • Jupyter Notebook 95.2%
  • Python 4.8%