Logs Analysis

This is my solution for the Logs Analysis project in the Udacity Full Stack Nanodegree course. The task was to run complex queries against a large database (more than a million rows) to extract interesting statistics.

The database belongs to a newspaper company and has 3 tables: articles, authors and log.

articles - Contains the articles posted in the newspaper so far.
authors - Contains the list of authors who have published articles.
log - Stores a log entry for every request sent to the newspaper server.

This project answers each of the questions with a single query. See logs.py for more details.
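As an illustration of the single-query approach, here is a minimal sketch for the "most viewed articles" question. It is not the code from logs.py. It uses Python's built-in sqlite3 module as a stand-in so it runs without the VM, whereas the real project queries the PostgreSQL news database. The column names (title, slug, path, status), the '/article/<slug>' path format, the '200 OK' status text, and the helper names top_articles and build_demo_db are all assumptions made for this example, and the sample rows are made up.

```python
import sqlite3


def top_articles(conn, limit=3):
    """Return (title, views) for the most viewed articles using one query.

    Assumes log.path looks like '/article/<slug>' and that successful
    requests have status '200 OK' (both are assumptions, not taken
    from logs.py).
    """
    query = """
        SELECT articles.title, COUNT(*) AS views
        FROM articles
        JOIN log ON log.path = '/article/' || articles.slug
        WHERE log.status = '200 OK'
        GROUP BY articles.title
        ORDER BY views DESC, articles.title
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


def build_demo_db():
    """Create a tiny in-memory database with made-up rows for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE articles (title TEXT, slug TEXT);
        CREATE TABLE log (path TEXT, status TEXT);
    """)
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?)",
        [("First post", "first-post"), ("Second post", "second-post")],
    )
    conn.executemany(
        "INSERT INTO log VALUES (?, ?)",
        [
            ("/article/first-post", "200 OK"),
            ("/article/first-post", "200 OK"),
            ("/article/second-post", "200 OK"),
            ("/article/missing", "404 NOT FOUND"),
        ],
    )
    return conn


if __name__ == "__main__":
    for title, views in top_articles(build_demo_db()):
        print(f"{title}: {views} views")
```

The join, filter, aggregation and ordering all happen inside one SQL statement, so Python only has to print the rows that come back.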

How to run

Prerequisites:

Setup Project:

Install Vagrant and VirtualBox
Download or clone the fullstack-nanodegree-vm repository.
Download the data from here.
Unzip this file after downloading it. The file inside is called newsdata.sql.
Copy the newsdata.sql file together with the contents of this repository, which you can download or clone from Here

Launching the Virtual Machine:

Launch the Vagrant VM from the vagrant sub-directory of the downloaded fullstack-nanodegree-vm repository using the command:

  $ vagrant up

Then log in to the VM using the command:

  $ vagrant ssh

Change directory to /vagrant and look around with ls.

Setting up the database

Load the data into the local database using the command:

  psql -d news -f newsdata.sql

Use psql -d news to connect to the database.

Output

The program will output a report for the following questions:

Top articles: the 3 most viewed articles.
Top authors: the most popular authors.
Requests with errors: the days on which more than 1% of page requests resulted in an error.
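The 1% threshold in the last question can be sketched in plain Python as follows. This only illustrates the arithmetic and is not how logs.py does it, since the project computes the answer with a SQL query. The function name error_days, the (day, status) input format, and the rule that any status not starting with "200" counts as an error are assumptions for this example, and the dates are made up.

```python
from collections import Counter


def error_days(requests, threshold=0.01):
    """Return [(day, error_rate)] for days whose error rate exceeds threshold.

    `requests` is an iterable of (day, status) pairs. Any status that does
    not start with "200" is counted as an error (an assumption).
    """
    totals = Counter()
    errors = Counter()
    for day, status in requests:
        totals[day] += 1
        if not status.startswith("200"):
            errors[day] += 1

    result = []
    for day in sorted(totals):
        rate = errors[day] / totals[day]
        if rate > threshold:
            result.append((day, rate))
    return result


if __name__ == "__main__":
    # Made-up sample: 3 errors in 100 requests, then 1 error in 200 requests.
    sample = (
        [("2016-07-17", "200 OK")] * 97
        + [("2016-07-17", "404 NOT FOUND")] * 3
        + [("2016-07-18", "200 OK")] * 199
        + [("2016-07-18", "404 NOT FOUND")] * 1
    )
    for day, rate in error_days(sample):
        print(f"{day}: {rate:.2%} errors")
```

In this sample only the first day is reported, because 3 of 100 requests (3%) exceeds the threshold while 1 of 200 (0.5%) does not.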

Example of output

An example of the output is available in output.txt.
