hridayns / Big-Data-Apache-server-logs-analysis-using-Pig-and-Python

Big Data – Apache server logs analysis using Pig and Python

Project Details

This repository contains code that analyzes Apache server logs to find the most visited page, using an Apache Pig script extended with Python user-defined functions (UDFs). It was run on an Ubuntu virtual machine hosted on Oracle's VirtualBox and provisioned with Vagrant.
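The core idea of the analysis, counting hits per page and picking the largest count, can be sketched in plain Python. This is only an illustration of the grouping-and-counting step, not the repository's Pig script; the function name `most_visited` and the sample paths are hypothetical.

```python
from collections import Counter


def most_visited(pages, top_n=1):
    """Return the top_n (page, hits) pairs, most visited first.

    `pages` is any iterable of requested paths, one entry per log line.
    (Hypothetical helper, not part of the repository.)
    """
    return Counter(pages).most_common(top_n)


if __name__ == "__main__":
    # Made-up request paths standing in for parsed log entries.
    sample_pages = [
        "/index.html",
        "/about.html",
        "/index.html",
        "/contact.html",
        "/index.html",
    ]
    print(most_visited(sample_pages))  # [('/index.html', 3)]
```

In Pig, the same step would typically be expressed by grouping the parsed records by page and counting each group.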

Contents

  • shareFiles/pig_script.py contains code to compute the page hits and store them.
  • shareFiles/script.py contains the Python UDF to parse the sample Apache logs.
  • shareFiles/sample_log contains the sample logs on which the scripts are run.
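As a rough illustration of what a log-parsing UDF might do, the sketch below extracts the client host, requested path, and status code from one line. It assumes the logs follow the Apache Common Log Format, which the repository does not state; `LOG_PATTERN` and `parse_log_line` are hypothetical names, and this is not the code in the repository's UDF file.

```python
import re

# Assumed layout (Apache Common Log Format):
#   host ident user [timestamp] "METHOD path PROTOCOL" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)(?: \S+)?" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_log_line(line):
    """Return (host, path, status) for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return match.group("host"), match.group("path"), int(match.group("status"))


if __name__ == "__main__":
    # Example line in Common Log Format, not taken from the repository's sample logs.
    line = '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
    print(parse_log_line(line))  # ('127.0.0.1', '/apache_pb.gif', 200)
```

Returning None for malformed lines lets the caller filter them out before counting, rather than failing the whole job on one bad record.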

Languages

  • Python: 60.2%
  • PigLatin: 39.8%