In this project, we used a Decision Tree learning model as the main algorithm. Because of the large volume of flight data, we implemented the project using MRJob, PySpark and Spark's MLlib, then compared the performance and accuracy of those implementations.
Exercises and examples developed for the Hadoop with Python tutorial
Movie rating prediction application
Analysed New York City's Yellow taxi data set with Big Data tools such as Hadoop, HBase, Sqoop, MapReduce and AWS Cloud Infrastructure.
Distributed data processing project using Python, MRJob and AWS EMR
Information Retrieval (Recuperació de la Informació), academic year 2023-24, EPSEVG
Project that performs dictionary-based sentiment analysis, implemented with MrJob using a map-reduce model. It can be executed locally or in HDFS environments (such as Hadoop or AWS).
Practice tasks in Python programming language using Hadoop, MRJob, PySpark for Big Data Analytics.
Search engine for movie cast generation.
Samples related to data engineering, e.g. spark, embulk, airflow, etc.
Accurate and high performance C++ interop code generator for C#.
Big Data analysis project using MapReduce in Python to process movie ratings. Includes scripts for aggregating ratings and identifying the most rated movies, demonstrating data analysis on a large scale.
In this exercise, a Python application that downloads tweets and analyzes them based on sentiment scores is packaged and distributed. The analysis results are stored in a MongoDB database, and the information is displayed on the web.
A data analysis module using MapReduce. Made for Andromeda by Damascus Labs.
Big Data Management Systems course assignments
Exercises in the Scala programming language with an emphasis on big data programming and applications in Apache Hadoop and Apache Spark.
Showcase of the coursework deliverables for ECS765 Big Data Processing at Queen Mary University of London, Fall semester 2020.
Analyzing Amazon product reviews
Analysis of the full set of transactions that have occurred on the Ethereum network
Analyzes book review data from Amazon and the Amazon Vine program using PySpark and the Amazon Web Services Relational Database Service (AWS RDS)
ETH analysis using big data for the QMUL Big Data Processing module. Intended to promote analysis of data retrieved via big data processing
Using MrJob, create a word counter that breaks a paragraph into small incremental tasks whose results can be aggregated across a larger Hadoop cluster.
Creating your Big Data ecosystem in the cloud
A performance evaluation of two algorithms for performing matrix multiplication using MapReduce
Simple map-reduce program that returns the number of times a word occurs inside a file
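
Several of the projects above are built around map-reduce word counting. As a rough illustration of that pattern only, here is a minimal sketch in plain Python. It does not use MRJob or Hadoop and is not taken from any listed repository; the names `mapper`, `reducer` and `word_count` are hypothetical, and the shuffle step is simulated in memory with a dictionary.

```python
import re
from collections import defaultdict

# Assumption: a "word" is a run of letters, digits, underscores or apostrophes.
WORD_RE = re.compile(r"[\w']+")


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one line."""
    for word in WORD_RE.findall(line):
        yield word.lower(), 1


def reducer(word, counts):
    """Reduce phase: sum all the counts emitted for a single word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, an in-memory shuffle (group by key), then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for word, one in mapper(line):
            grouped[word].append(one)
    return dict(reducer(word, counts) for word, counts in grouped.items())
```

For example, `word_count(["to be or not to be"])` returns `{'to': 2, 'be': 2, 'or': 1, 'not': 1}`. In a real cluster the grouping step is performed by the framework rather than by a local dictionary.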