There are 5 repositories under the mapreduce-python topic.
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Course project for "Big Data Mining Techniques" @ Fudan University: mining frequent 2-itemsets of high-support keywords from the Sogou Lab user query log dataset (2008). For the implementation, I set up a small Hadoop cluster of five servers and implemented the three MapReduce passes of the Parallel FP-Growth algorithm in Python.
KMeans, CURE, and Canopy algorithms demonstrated using PySpark.
Using Hadoop to analyze data from an automobile tracking platform that records important incidents after the initial sale of a new vehicle.
A user–product recommender system based on item-based CF and XGBRegressor.
An AWS Lambda function to start an EMR cluster and run a MapReduce job.
A REST-based service that translates SQL queries into MapReduce and Spark jobs, runs them, and returns the results as JSON objects.
A repository containing the source codes for the assignments done as a part of the Big Data course (UE18CS322) at PES University.
Pulled 10 GB of Yelp business data through the terminal via the Kaggle API. The data was then pushed to an AWS S3 bucket for storage and analyzed on an Elastic MapReduce cluster in a Jupyter Notebook using PySpark.
This repository contains code that extracts meaningful information from a news-headline dataset.
Market basket analysis of finding frequent itemsets using SON algorithm in Spark
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real-world tasks are expressible in this model.
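The model described above can be sketched in plain Python: a map function turns each input into intermediate key/value pairs, a shuffle step groups the pairs by key, and a reduce function merges each group. Word count is the canonical example; the function names here are illustrative, not from any particular repository:

```python
from collections import defaultdict

def map_fn(document):
    # Emit an intermediate (word, 1) pair for every word in the document.
    for word in document.split():
        yield word, 1

def reduce_fn(word, counts):
    # Merge all intermediate values associated with the same key.
    return word, sum(counts)

def mapreduce(documents):
    # Shuffle phase: group intermediate values by key.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    # Reduce phase: merge each group into a final result.
    return dict(reduce_fn(k, v) for k, v in groups.items())

print(mapreduce(["big data", "big compute"]))
# → {'big': 2, 'data': 1, 'compute': 1}
```

Real frameworks like Hadoop distribute the map and reduce phases across machines, but the data flow is exactly this three-step pipeline.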
⚡️Let's do data mining using Spark, a publicly available MapReduce platform⚡️
This repository develops a basic search engine using Hadoop's MapReduce framework to index and process large text corpora efficiently. The dataset is a 5.2 GB subset of the English Wikipedia dump. The project implements a naive search algorithm to address challenges in information retrieval.
Implementation of Hadoop and Spark
Apache Hadoop docker image | Running Python MapReduce
Various programs I used for my Hadoop projects.
Implementation of the MapReduce PageRank algorithm using the Spark framework both in Python and in Java (developed for Cloud Computing course)
MapReduce programs written in Java with minimal complexity!
Mapreduce Presentation
Python implementations of MapReduce patterns that were not included in the final TFG (undergraduate thesis).
Multiprocessing can be an effective way to speed up a time-consuming workflow via parallelization. This article illustrates how multiprocessing can be utilized in a concise way when implementing MapReduce-like workflows.
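As a minimal illustration of that approach (the function names and sample data are assumptions, not taken from the article), the map phase can be parallelized with `multiprocessing.Pool` while the reduce phase stays sequential:

```python
from collections import Counter
from multiprocessing import Pool

def map_chunk(text):
    # Map step: count words within one chunk of the input.
    return Counter(text.split())

def reduce_counts(counters):
    # Reduce step: merge the per-chunk counters into one total.
    total = Counter()
    for c in counters:
        total.update(c)
    return total

if __name__ == "__main__":
    chunks = ["map reduce", "map in parallel"]
    with Pool(processes=2) as pool:
        # Each chunk is mapped in a separate worker process.
        partials = pool.map(map_chunk, chunks)
    print(reduce_counts(partials))
```

Because each chunk is independent, the map step scales with the number of worker processes; only the final merge is serial.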
A Hadoop MapReduce-based SQL engine.
Emulation-based System for Distributed File storage and Parallel Computation
Performing MapReduce to compute PageRank on the WDC data.
Distributed Computing using Hadoop, Docker and Python (Map Reduce)
Understand how MapReduce works for parsing text data, with parallel processing of subtasks using multithreading.
Big Data analysis project using MapReduce in Python to process movie ratings. Includes scripts for aggregating ratings and identifying the most rated movies, demonstrating data analysis on a large scale.
Average age of the males and females who died on the Titanic, computed using MapReduce programming in Python.
BigData Workshop - Python MapReduce for word frequency analysis on varied datasets.
Big data training material