This project analyzes the relationship between countries' GDP and their latitude. It contains 1) a Python script that scrapes data from Wikipedia using lxml, creates and merges data frames using pandas, runs a linear regression using statsmodels, and plots the data using Matplotlib, and 2) a paper describing the process and results. It was done for a class on Computational Economics.
First, install Python (I used 2.7.6), pip, and LaTeX (LaTeX is only needed if you want to build the paper).
To install the dependencies, invoke
pip install lxml numpy pandas statsmodels matplotlib patsy
To run the script, invoke
python gdp-lat.py
which prints the regression output and generates figure/scatter.pdf.
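For readers who want a feel for the regression step without running the whole script, here is a minimal sketch using the statsmodels formula interface. It is not taken from gdp-lat.py: the column names, the made-up numbers, and the choice to regress log GDP on absolute latitude are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up illustrative values, NOT the project's data.
# Column names are hypothetical.
df = pd.DataFrame({
    "latitude": [-35.0, -15.0, 1.0, 20.0, 40.0, 52.0, 60.0],
    "gdp_per_capita": [50000.0, 9000.0, 4000.0, 10000.0,
                       30000.0, 45000.0, 60000.0],
})

# Assumption: use distance from the equator and log GDP.
df["abs_latitude"] = df["latitude"].abs()
df["log_gdp"] = np.log(df["gdp_per_capita"])

# Ordinary least squares: log_gdp = a + b * abs_latitude + error
model = smf.ols("log_gdp ~ abs_latitude", data=df).fit()

# Fitted intercept and slope
print(model.params)
```

Calling model.summary() instead of printing model.params gives the full regression table, though statsmodels may warn about some diagnostics on a sample this small.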
To make the paper, go to paper/ and invoke
pdflatex document.tex
bibtex document
pdflatex document.tex
pdflatex document.tex
The web scraping part of the script is likely to break in the future, so I've cached the cleaned data in data/lat_gdp.csv.
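One way to use the cached file is as a fallback when scraping fails. The sketch below is only an illustration of that idea, not what gdp-lat.py does: load_rows and its scrape argument are hypothetical names, and the only thing taken from this README is the path data/lat_gdp.csv. It makes no assumption about the CSV's column names.

```python
import csv

CACHE_PATH = "data/lat_gdp.csv"  # cached, cleaned data mentioned above


def load_rows(scrape, path=CACHE_PATH):
    """Return a list of row dicts.

    `scrape` is a hypothetical callable that returns fresh rows from
    the web; if it raises for any reason, read the cached CSV instead.
    """
    try:
        return scrape()
    except Exception:
        # Scraping broke (page layout changed, no network, ...),
        # so fall back to the cached copy.
        with open(path) as f:
            return list(csv.DictReader(f))
```

For example, load_rows(my_scraper) returns whatever my_scraper() produces when it succeeds, and otherwise the rows of data/lat_gdp.csv, with every value read as a string.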