tukichen / Big-Data-Analysis

Big data projects using Hadoop MapReduce

Big Data Projects

1. N-Gram Modeling (Google Auto-Completion with MapReduce)

Google Auto-Completion with MapReduce and Hadoop

Java, Hadoop, MapReduce; Tools: Hadoop, Docker, MySQL, jQuery, PHP, Ajax, IntelliJ

  • Constructing an N-gram library from Wikipedia data
  • Implementing a language model based on statistical probabilities and pushing it into the database
  • Using jQuery, PHP, and Ajax to query the database, providing auto-completion in real time
  • Displaying the search engine's auto-completion feature on the web side
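
The N-gram pipeline above can be sketched in plain Java. This is a minimal in-memory sketch, not the project's Hadoop job: `countNGrams` mimics a mapper plus combiner that counts every phrase of 2 to N words, and `buildModel` mimics a reducer that groups n-grams by their prefix and keeps the most frequent following words. The class and method names, the rare-phrase `threshold`, and the `topK` cut-off are hypothetical, and the database write is omitted. Ranking by raw count is equivalent to ranking by the conditional probability P(next word | prefix), because every candidate for a given prefix shares the same denominator.

```java
import java.util.*;
import java.util.stream.Collectors;

public class NGramModelSketch {

    // Mapper + combiner logic: count every phrase of 2..maxN consecutive words.
    static Map<String, Integer> countNGrams(List<String> sentences, int maxN) {
        Map<String, Integer> counts = new HashMap<>();
        for (String sentence : sentences) {
            // Normalize: lowercase, replace non-letters with spaces, split on whitespace.
            String[] words = sentence.toLowerCase()
                    .replaceAll("[^a-z ]", " ").trim().split("\\s+");
            for (int i = 0; i < words.length; i++) {
                StringBuilder phrase = new StringBuilder(words[i]);
                // Extend the phrase one word at a time while its length stays <= maxN.
                for (int j = i + 1; j < words.length && j - i < maxN; j++) {
                    phrase.append(' ').append(words[j]);
                    counts.merge(phrase.toString(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Reducer logic: split each n-gram into (prefix, next word) and keep the
    // topK most frequent next words per prefix, ignoring n-grams seen fewer
    // than `threshold` times.
    static Map<String, List<String>> buildModel(Map<String, Integer> counts,
                                                int threshold, int topK) {
        Map<String, Map<String, Integer>> byPrefix = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() < threshold) continue;
            String gram = e.getKey();
            int split = gram.lastIndexOf(' ');
            byPrefix.computeIfAbsent(gram.substring(0, split), p -> new HashMap<>())
                    .put(gram.substring(split + 1), e.getValue());
        }
        Map<String, List<String>> model = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : byPrefix.entrySet()) {
            List<String> ranked = e.getValue().entrySet().stream()
                    // Highest count first; ties broken alphabetically for stable output.
                    .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                            .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                    .limit(topK)
                    .map(Map.Entry::getKey)
                    .collect(Collectors.toList());
            model.put(e.getKey(), ranked);
        }
        return model;
    }

    public static void main(String[] args) {
        List<String> corpus = List.of(
                "I love big data",
                "I love big data analysis",
                "I love big apples");
        Map<String, Integer> counts = countNGrams(corpus, 3);
        Map<String, List<String>> model = buildModel(counts, 1, 2);
        System.out.println(model.get("love big")); // prints [data, apples]
    }
}
```

For the prefix "love big", the sketch suggests "data" (seen twice) ahead of "apples" (seen once). In a deployment like the one described, each (prefix, word, count) result would be stored in the database that the jQuery/PHP front end queries.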

2. Mock Google Search Engine by Implementing the PageRank Algorithm

Mini Google Search Engine

Java, JavaScript, HTML5; Tools: Hadoop, Docker, MapReduce, IntelliJ, Maven

  • Implementing the PageRank algorithm, similar to the one behind Google Search
  • Grabbing the data set from Wikipedia
  • Constructing the relationships among different websites using an adjacency matrix
  • Calculating the PageRank of each website based on the relationships constructed between websites
  • Ranking pages by their PageRank values once the iteration converges
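
The PageRank iteration described above can be sketched as follows. This is an in-memory sketch under assumptions that this README does not state: the link graph is given as an adjacency list (`links[i]` holds the pages that page i links to), the damping factor is the commonly used 0.85, pages with no out-links spread their rank evenly over all pages, and iteration stops once the total (L1) change between rounds falls below `epsilon`. Each round computes PR'(j) = d · (Σ over pages i linking to j of PR(i) / outdeg(i) + dangling / N) + (1 − d) / N, which is the transition-matrix multiplication that a MapReduce version would distribute across jobs. The class, method, and parameter names are hypothetical.

```java
import java.util.Arrays;

public class PageRankSketch {

    // links[i] lists the pages that page i links to (one row of the adjacency list).
    static double[] pageRank(int[][] links, double damping, double epsilon, int maxIter) {
        int n = links.length;
        double[] pr = new double[n];
        Arrays.fill(pr, 1.0 / n); // start from a uniform rank vector
        for (int iter = 0; iter < maxIter; iter++) {
            double[] next = new double[n];
            double danglingMass = 0.0;
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    danglingMass += pr[i]; // no out-links: spread this rank over all pages below
                    continue;
                }
                // Column i of the transition matrix: rank is split evenly over out-links.
                double share = pr[i] / links[i].length;
                for (int j : links[i]) {
                    next[j] += share;
                }
            }
            double change = 0.0;
            for (int j = 0; j < n; j++) {
                next[j] = damping * (next[j] + danglingMass / n) + (1 - damping) / n;
                change += Math.abs(next[j] - pr[j]);
            }
            pr = next;
            if (change < epsilon) {
                break; // converged: ranks no longer change meaningfully
            }
        }
        return pr;
    }

    public static void main(String[] args) {
        // Hypothetical 4-page web: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}, 3 -> {2}
        int[][] links = { {1, 2}, {2}, {0}, {2} };
        double[] pr = pageRank(links, 0.85, 1e-10, 1000);
        for (int i = 0; i < pr.length; i++) {
            System.out.printf("page %d: %.4f%n", i, pr[i]);
        }
    }
}
```

In this toy graph, page 2, which is linked from three pages, ends up with the highest rank, while page 3, which has no incoming links, keeps only the teleport share (1 − 0.85) / 4 = 0.0375. The ranks always sum to 1.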

3. Top K - Hot Topic Analysis

Social media networks, such as Twitter and Facebook, provide exciting opportunities that can open up a new era of social science research. Data extracted from social media has attracted growing interest among researchers attempting to better understand the nature and power of social media. In this project, we access tweets and tokenize them to find the top k words used in any given time interval (in our case, k ranges from 1 to 100). We hope the project can be used as a tool for finding patterns, news feeds, etc.
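
Counting words and selecting the top k for one time interval can be sketched as below. This is a single-machine sketch, assuming the tweets for the interval have already been collected; the tokenization rule (lowercase, then split on anything other than letters, digits, and `#`, so hashtags survive) is a hypothetical choice, and no stop-word filtering is done. A min-heap capped at k entries keeps selection at O(m log k) for m distinct words, the same pattern a reducer could use to emit a global top k. The class and method names are hypothetical.

```java
import java.util.*;

public class TopKWordsSketch {

    // Counts words in the tweets of one time interval and returns the k most
    // frequent ones, most frequent first (ties broken alphabetically).
    static List<String> topK(List<String> tweets, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tweet : tweets) {
            // Lowercase, then split on anything that is not a letter, digit or '#'.
            for (String token : tweet.toLowerCase().split("[^a-z0-9#]+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        // Min-heap of at most k entries: the head is the weakest candidate
        // (lowest count; among equal counts, the alphabetically later word).
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(
                (a, b) -> a.getValue().equals(b.getValue())
                        ? b.getKey().compareTo(a.getKey())
                        : Integer.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll(); // evict the weakest so the heap never exceeds k
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result); // heap yields weakest first; flip to strongest first
        return result;
    }

    public static void main(String[] args) {
        List<String> window = List.of(
                "Big data with #Hadoop",
                "big data, big wins",
                "#hadoop streaming");
        System.out.println(topK(window, 2)); // prints [big, #hadoop]
    }
}
```

Here "big" appears three times, and "data" and "#hadoop" tie at two; the alphabetical tie-break keeps "#hadoop" because '#' sorts before letters. Changing k from 1 to 100 only changes the heap capacity, not the counting pass.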

4. Movie Recommender System

Languages

Java: 100.0%