There are 16 repositories under the mapreduce topic.
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Redisson - Easy Redis Java client and Real-Time Data Platform. Valkey compatible. Sync/Async/RxJava/Reactive API. Over 50 Redis based Java objects and services: Set, Multimap, SortedSet, Map, List, Queue, Deque, Semaphore, Lock, AtomicLong, Map Reduce, Bloom filter, Spring Cache, Tomcat, Scheduler, JCache API, Hibernate, RPC, local cache ...
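A getting-started guide to big data :star:
:dart: :star2: [Big data interview questions] A collection of big data interview questions gathered from around the web, together with the author's own answer summaries. Currently covers interview questions and knowledge summaries for the Hadoop/Hive/Spark/Flink/HBase/Kafka/ZooKeeper frameworks.
A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.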
MapReduce, Spark, Java, and Scala for Data Algorithms Book
distributed_computing includes mapreduce, kvstore, etc.
t-Digest data structure in Python. Useful for percentiles and quantiles, including in distributed environments like PySpark
Uniffle is a high performance, general purpose Remote Shuffle Service.
Dynamic execution framework for your Redis data
Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster.
Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
🎉🎉🐳 Datawhale Introduction to Big Data Processing tutorial | The opening course for the big data technology track 🎉🎉
Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shuffle data on remote servers
An easy-to-use MapReduce parallel-computing framework in Go, inspired by the 2021 6.824 lab1. It currently supports multiple worker threads and multiple processes on a single machine.
:zap: 6.824: Distributed Systems (Spring 2017). A course which presents abstractions and implementation techniques for engineering distributed systems.
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Companion to Learning Hadoop and Learning Spark courses on LinkedIn Learning
An in-process MapReduce library to help you optimize service response time or concurrent task processing.
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Hadoop, MapReduce Distributed Crawling of Data Information from All Chinese Universities.
DTail is a distributed DevOps tool for tailing, grepping, catting logs and other text files on many remote machines at once.
Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
Tangseng search engine, including full-text search and vector search, based on Golang. A Go-based search engine and information retrieval system.
Teaching Materials for Distributed Statistical Computing (teaching materials for big data distributed computing)