There are 16 repositories under mapreduce topic.
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Redisson - Easy Redis Java client and Real-Time Data Platform. Valkey compatible. Sync/Async/RxJava/Reactive API. Over 50 Redis based Java objects and services: Set, Multimap, SortedSet, Map, List, Queue, Deque, Semaphore, Lock, AtomicLong, Map Reduce, Bloom filter, Spring Cache, Tomcat, Scheduler, JCache API, Hibernate, RPC, local cache ...
A beginner's guide to big data :star:
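:dart: :star2: [Big data interview questions] A collection of big data interview questions gathered from the web, together with the author's own summarized answers. Currently covers interview topics for the Hadoop/Hive/Spark/Flink/HBase/Kafka/ZooKeeper frameworks.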
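A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.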
MapReduce, Spark, Java, and Scala for Data Algorithms Book
distributed_computing: includes MapReduce, a KV store, etc.
t-Digest data structure in Python. Useful for percentiles and quantiles, including in distributed environments like PySpark
Uniffle is a high performance, general purpose Remote Shuffle Service.
Dynamic execution framework for your Redis data
Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
🎉🎉🐳 Datawhale Introduction to Big Data Processing tutorial | An opening course for the big data technology track 🎉🎉
:zap: 6.824: Distributed Systems (Spring 2017). A course that presents abstractions and implementation techniques for engineering distributed systems.
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Distributed crawling of information about all Chinese universities, using Hadoop MapReduce.
Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
Tangseng search engine, including full-text search and vector search, based on Golang. A Go-based search engine and information retrieval system.
Teaching Materials for Distributed Statistical Computing (teaching materials for distributed big data computing)