rishabhindoria / Big-Data-Hadoop-Pig-Latin

Apache Pig Latin script to count letters in multiple input text files, using the HortonWorks Hadoop Sandbox or Google Cloud Platform

Hadoop-Pig

• Objective: Count how many times each character occurs across a dataset of text files (para1.txt to para6.txt), as an exercise in big data analysis.

• Created a script, countChar.pig, written as SQL-like Pig Latin statements. Pig automatically translates these statements into multiple mappers and reducers that run in parallel in the background, so the script scales to large inputs. Its output lists the count of each letter of the alphabet in the dataset.

• Created a script, popularFlavor.pig, which uses two text files, purchases.txt (all the purchases made by kids over time) and kids.txt (the count of purchases made by each individual kid), to find the most popular flavor among the kids.
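The map and reduce stages behind the letter count can be sketched in plain Python. This is only an illustration of the idea, not the contents of countChar.pig: the function names `map_letters` and `count_letters` and the sample strings are hypothetical, and treating upper- and lower-case letters as the same letter is an assumption.

```python
from collections import Counter
from functools import reduce


def map_letters(text):
    """Map stage: count the ASCII letters in one file's text.

    Assumption: counting is case-insensitive and ignores non-letters.
    """
    return Counter(ch for ch in text.lower() if "a" <= ch <= "z")


def count_letters(texts):
    """Reduce stage: merge the per-file counts into one total."""
    per_file = (map_letters(t) for t in texts)
    return reduce(lambda a, b: a + b, per_file, Counter())


if __name__ == "__main__":
    # Hypothetical stand-ins for the contents of para1.txt to para6.txt.
    samples = ["Pig Latin", "Hadoop"]
    totals = count_letters(samples)
    for letter in sorted(totals):
        print(letter, totals[letter])
```

Each call to `map_letters` depends only on its own input, which is why the per-file work can run in parallel; only the final merge needs to see every partial count.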

Languages

Language:PigLatin 100.0%