Log Analysis is part of Udacity's Full Stack Nanodegree. Its purpose is to generate insights from a database containing user logs, authors, and articles data.

The database is a PostgreSQL database containing 3 main tables:

- `log`
- `articles`
- `authors`

The aim of this simple tool is to run certain static SQL queries against the database from Python code and generate a report in a text file. Before executing the Python code, an extra view named `log_cleaned` has to be created in the database; check below.
## Install Vagrant, the VM, and the configuration file

- Download and install the VirtualBox version that corresponds to your operating system.
- Download and install the Vagrant version that corresponds to your operating system.
- Make sure that Vagrant is installed by running this command in your terminal:

  ```shell
  vagrant --version
  ```

  N.B. If you're using Windows, you'll need to install and use the Git terminal.

## Download the VM configuration

- Download and unzip the file offered by Udacity. You'll end up with a directory containing the configuration file; `cd` to that directory.
- Start the virtual machine by running:

  ```shell
  vagrant up
  ```

  This will cause Vagrant to download the Linux operating system and install it.
- Clone the contents of this GitHub repository into the directory containing the configuration file. This is the folder shared with your virtual machine:

  ```shell
  git clone https://github.com/i-mw/log-analysis
  ```
## Run the Vagrant virtual machine

- `cd` to the folder containing the Vagrant configuration file.
- Run `vagrant ssh`.
## From the Vagrant terminal

- `cd /vagrant/log-analysis` to move to the project directory in the shared folder.
- `psql news` to enter the database.
- Create a view named `log_cleaned`:

  ```sql
  create view log_cleaned as (
      select split_part(path, '/', 3) slug, ip, method, status, time, id
      from log
  );
  ```

- `ctrl + d` to exit the database.
- `python3 analyser.py` to run the program.
- Check the generated output in `report.txt`.
A view named `log_cleaned` must be created before running the Python code or fetching anything from the database. The `log_cleaned` view contains the same columns as the `log` table, except that the `path` column is split and the part corresponding to an article's slug is stored in a new column in the view named `slug`.

To create the view, execute this statement inside `psql`:

```sql
create view log_cleaned as (
    select split_part(path, '/', 3) slug, ip, method, status, time, id
    from log
);
```
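To see what `split_part` does to the `path` column, here is a small Python analogue (the real splitting happens in PostgreSQL; this function and the sample path are purely illustrative):

```python
def split_part(path, delimiter, n):
    """Python analogue of PostgreSQL's split_part: return the n-th
    field (1-indexed) of path split on delimiter, or '' if there is
    no such field."""
    parts = path.split(delimiter)
    return parts[n - 1] if n <= len(parts) else ""

# A request path like '/article/some-slug' splits on '/' into
# ['', 'article', 'some-slug'], so the third field is the slug.
print(split_part("/article/candidate-is-jerk", "/", 3))  # candidate-is-jerk
```

This is why the view selects the third field: paths in the `log` table start with a leading `/`, so the first field is the empty string before it.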
The Python code of the program consists of 4 functions/parts:

- `get_queries`: stores and returns the static SQL queries, along with other descriptive info about each query.
- `write_text_block`: writes whatever text you feed it into the `report.txt` file. So, it writes the formatted results of the queries to `report.txt`.
- `connect_to_db`: connects to the database, nothing more.
- `execute_queries`: sends the SQL queries to the database, executes them, and returns the results, then feeds them to `write_text_block` to generate insights in `report.txt`.
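The four parts above can be sketched as below. The function names follow the description; the sample query, the report formatting, and the psycopg2 usage are assumptions for illustration, not the exact contents of `analyser.py`:

```python
def get_queries():
    # Returns the static SQL queries plus descriptive info about each.
    # The query shown is a hypothetical example against the log_cleaned view.
    return [
        {
            "title": "Most popular articles",
            "sql": """select articles.title, count(*) as views
                      from log_cleaned
                      join articles on articles.slug = log_cleaned.slug
                      group by articles.title
                      order by views desc limit 3;""",
        },
    ]


def connect_to_db(dbname="news"):
    # Connects to the database, nothing more.
    import psycopg2  # imported here so the sketch reads top-down
    return psycopg2.connect(dbname=dbname)


def format_results(title, rows):
    # Turns one query's result rows into a text block for the report.
    lines = [title] + ["- {}: {}".format(name, count) for name, count in rows]
    return "\n".join(lines)


def write_text_block(text, filename="report.txt"):
    # Appends a block of formatted text to the report file.
    with open(filename, "a") as report:
        report.write(text + "\n\n")


def execute_queries():
    # Runs each query and feeds the formatted results to write_text_block.
    conn = connect_to_db()
    cursor = conn.cursor()
    for query in get_queries():
        cursor.execute(query["sql"])
        write_text_block(format_results(query["title"], cursor.fetchall()))
    conn.close()
```

Keeping `connect_to_db` and `write_text_block` as single-purpose functions makes each piece easy to test in isolation, which is presumably why the program is split this way.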