There are 11 repositories under the mapreduce topic.
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Redisson - Redis Java client with features of In-Memory Data Grid. Over 50 Redis based Java objects and services: Set, Multimap, SortedSet, Map, List, Queue, Deque, Semaphore, Lock, AtomicLong, Map Reduce, Publish / Subscribe, Bloom filter, Spring Cache, Tomcat, Scheduler, JCache API, Hibernate, MyBatis, RPC, local cache ...
Enterprise job scheduling middleware with distributed computing ability.
Python clone of Spark, a MapReduce-like framework in Python
MapReduce, Spark, Java, and Scala for Data Algorithms Book
C# and F# language binding and extensions to Apache Spark
distributed_computing includes MapReduce, a KV store, etc.
🐎 A serverless MapReduce framework written for AWS Lambda
An open source framework for building data analytic applications.
A serverless cluster computing system for the Go programming language
Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster.
t-Digest data structure in Python. Useful for percentiles and quantiles, including in distributed environments like PySpark
Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
Dynamic execution framework for your Redis data
Firestorm is a Remote Shuffle Service that lets Apache Spark and Apache Hadoop MapReduce applications store shuffle data on remote servers
⚡ 6.824: Distributed Systems (Spring 2017). A course that presents abstractions and implementation techniques for engineering distributed systems.
An easy-to-use MapReduce parallel-computing framework in Go, inspired by the 2021 6.824 lab1. It currently supports multiple worker threads or multiple processes on a single machine.
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
DTail is a distributed DevOps tool for tailing, grepping, catting logs and other text files on many remote machines at once.
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
An in-process MapReduce library to help you optimize service response time or concurrent task processing.
Big Data for Data Engineers Coursera Specialization from Yandex
A Simple and Efficient Distributed Multidimensional BI Analysis Engine.
Teaching Materials for Distributed Statistical Computing (teaching materials for big data distributed computing)
A light-weight distributed stream computing framework for Golang
A repository for my curriculum project