This is the solution for the Logs Analysis project in Udacity Full Stack Nanodegree course. I had to execute complex queries on a large database (> 1000k rows) to extract intersting stats.
The database in question is a newspaper company database where we have 3 tables; articles
, authors
and log
.
articles
- Contains articles posted in the newspaper so far.authors
- Contains list of authors who have published their articles.log
- Stores log of every request sent to the newspaper server.
This project implements a single query solution for each of the question in hand. See logs.py for more details.
- Install Vagrant and VirtualBox
- Download or Clone fullstack-nanodegree-vm repository.
- Download the data from here.
- Unzip this file after downloading it. The file inside is called newsdata.sql.
- Copy the newsdata.sql file and content of this current repository, by either downloading or cloning it from Here
- Launch the Vagrant VM inside Vagrant sub-directory in the downloaded fullstack-nanodegree-vm repository using command:
$ vagrant up
- Then Log into this using command:
$ vagrant ssh
- Change directory to /vagrant and look around with ls.
- Load the data in local database using the command:
psql -d news -f newsdata.sql
- Use
psql -d news
to connect to database.
The program will output a report for the following questions:
- Top Articles, the 3 most viewed articles
- Top Authors, the most popular authors
- Requests with errors, The days with more than 1% of error on page requests.
You also find an example of output in output.txt