There are 4 repositories under the cloudera-hadoop topic.
Ansible playbook to deploy Cloudera Hadoop components to a cluster
Docker image for Cloudera Hadoop components (CDH5)
A quick-and-dirty CDH cluster skeleton using Docker for testing
Otto-von-Guericke Universität Magdeburg - Big Data, summer semester (SoSe) 2017
Getting Started with Hadoop and Big Data
Spark Benchmark suite to evaluate cluster configuration and compare the performance with other big data frameworks.
Hadoop/MapReduce Streaming
This repository computes TF-IDF scores for the documents in the Canterbury dataset against a user-given search query
How to install Cloudera quickstart
The goal of this programming assignment is to compute the PageRanks of an input set of hyperlinked Wikipedia documents using Hadoop MapReduce. The PageRank score of a web page serves as an indicator of the importance of the page. Many web search engines (e.g., Google) use PageRank scores in some form to rank results for user-submitted queries. The goals of this assignment are to: (1) understand the PageRank algorithm and how it works in MapReduce; (2) implement PageRank and execute it on a large corpus of data; (3) examine the output from running PageRank on Simple English Wikipedia to measure the relative importance of pages in the corpus. To run your program on the full Simple English Wikipedia archive, you will need to run it on the dsba-hadoop cluster to which you have access.
This project creates a small local Hadoop cluster using Cloudera CDH and CentOS.
Learn how Hive works through a simple example
Chatbot for HipChat (cloud or on-premises) that lets you talk to your Cloudera Manager
This is my final project for Data Engineer Expert course at Naya College.
This project analyses airline datasets to solve a set of problem statements using Hadoop.
GCP-hosted product for over 1 million movie investors on HSX.com that aids online movie trading and box-office investments using Big Data technologies such as Hive and Hadoop, together with Tableau dashboards
This repository shows how to perform sentiment analysis on tweets from Twitter using Hive. The tweets are collected with Flume; because they arrive in JSON format, they are loaded into Hive using a JSON input format, for which the Cloudera Hive JSON SerDe is used.
Anticipatory customer order prediction after the purchase of item(s).
Data processing using Docker containers, Kafka, Spark, and Hadoop
fundamental-hadoop is an introduction to Apache Hadoop and its ecosystem.
Cloudera commands used for Big Data Analytics
Navigator is a data service that prepares the content for travel agencies, ready for exploration in EWNS (East-West-North-South) direction and hence allows them to render content to the end-user based on their desire to travel.
Running my first PySpark app in CDH5
Keyword network builder based on TF-IDF, built on the Hadoop platform
A simple Apache Spark tutorial
A simple HBase tutorial
Learn how Hive works with HBase through a simple example
A simple SparkSQL tutorial
This repository includes two versions of Hadoop management tools