There are 5 repositories under the mapreduce-python topic.
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Course project for "Big Data Mining Techniques" @ Fudan University: mining frequent 2-itemsets of high-support keywords from the Sogou Lab user query log dataset (2008). For the implementation, I set up a small Hadoop cluster of five servers and implemented the three MapReduce passes of the Parallel FP-Growth algorithm in Python.
KMeans, CURE, and Canopy algorithms demonstrated using PySpark.
Using Hadoop to analyze data from an automobile tracking platform that records important incidents after the initial sale of a new vehicle.
A user–product recommender system based on item-based CF and XGBRegressor.
An AWS Lambda function to start an EMR cluster and run a MapReduce job.
A REST-based service that translates SQL queries into MapReduce and Spark jobs, runs them, and returns the results as JSON objects.
A repository containing the source codes for the assignments done as a part of the Big Data course (UE18CS322) at PES University.
Pulled 10 GB of Yelp business data through the terminal via the Kaggle API. The data was then pushed to an AWS S3 bucket for storage and analyzed on an Elastic MapReduce cluster in a Jupyter Notebook using PySpark.
This repository contains code that extracts meaningful information from a news-headline dataset.
Market basket analysis of finding frequent itemsets using SON algorithm in Spark
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real-world tasks are expressible in this model.
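The model described above can be sketched in plain Python: a map function turns each input into intermediate key/value pairs, a shuffle step groups the pairs by key, and a reduce function merges each group. Word count is the canonical example; the function names here are illustrative, not from any particular repository:

```python
from collections import defaultdict

def map_fn(document):
    # Emit an intermediate (word, 1) pair for every word in the document.
    for word in document.split():
        yield word, 1

def reduce_fn(word, counts):
    # Merge all intermediate values associated with the same key.
    return word, sum(counts)

def mapreduce(documents):
    # Shuffle phase: group intermediate values by key.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    # Reduce phase: merge each group into a final result.
    return dict(reduce_fn(k, v) for k, v in groups.items())

print(mapreduce(["big data", "big compute"]))
# → {'big': 2, 'data': 1, 'compute': 1}
```

Real frameworks like Hadoop distribute the map and reduce phases across machines, but the data flow is exactly this three-step pipeline.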
⚡️Let's do data mining using Spark, a publicly available MapReduce platform⚡️
This repository develops a basic search engine using Hadoop's MapReduce framework to index and process large text corpora efficiently. The dataset is a 5.2 GB subset of the English Wikipedia dump. The project implements a naive search algorithm to address challenges in information retrieval.
Implementation of Hadoop and Spark
Apache Hadoop docker image | Running Python MapReduce
Various programs I used for my Hadoop projects.
Implementation of the MapReduce PageRank algorithm using the Spark framework both in Python and in Java (developed for Cloud Computing course)
MapReduce programs written in Java with minimal complexity!
Mapreduce Presentation
Python implementations of MapReduce patterns that were not included in the final TFG (undergraduate thesis).
Multiprocessing can be an effective way to speed up a time-consuming workflow via parallelization. This article illustrates how multiprocessing can be utilized in a concise way when implementing MapReduce-like workflows.
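As a minimal illustration of that approach (the function names and sample data are assumptions, not taken from the article), the map phase can be parallelized with `multiprocessing.Pool` while the reduce phase stays sequential:

```python
from collections import Counter
from multiprocessing import Pool

def map_chunk(text):
    # Map step: count words within one chunk of the input.
    return Counter(text.split())

def reduce_counts(counters):
    # Reduce step: merge the per-chunk counters into one total.
    total = Counter()
    for c in counters:
        total.update(c)
    return total

if __name__ == "__main__":
    chunks = ["map reduce", "map in parallel"]
    with Pool(processes=2) as pool:
        # Each chunk is mapped in a separate worker process.
        partials = pool.map(map_chunk, chunks)
    print(reduce_counts(partials))
```

Because each chunk is independent, the map step scales with the number of worker processes; only the final merge is serial.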
A Hadoop MapReduce-based SQL engine.
Emulation-based System for Distributed File storage and Parallel Computation
Performing MapReduce to compute PageRank on the WDC data.
Distributed Computing using Hadoop, Docker and Python (Map Reduce)
Understand how MapReduce works for parsing text data, with parallel processing of subtasks using multithreading.
Big Data analysis project using MapReduce in Python to process movie ratings. Includes scripts for aggregating ratings and identifying the most rated movies, demonstrating data analysis on a large scale.
Average age of the males and females who died on the Titanic, computed using MapReduce programming in Python.
BigData Workshop - Python MapReduce for word frequency analysis on varied datasets.
Big data training material