There are 350 repositories under the big-data topic.
The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
ClickHouse® is a free analytics DBMS for big data
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
A beginner's guide to big data :star:
PredictionIO, a machine learning server for developers and ML engineers.
A distributed, fast open-source graph database featuring horizontal scalability and high availability
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
Arkime (formerly Moloch) is an open source, large scale, full packet capturing, indexing, and database system.
Open-source distributed computation and storage platform.
Stream Framework is a Python library that lets you build news feeds, activity streams, and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology.
StarRocks is a next-gen sub-second MPP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics and ad-hoc query.
The open big data serving engine. https://vespa.ai
⚡️ A Vue component that supports large data lists with efficient, high-performance rendering.
Apache Arrow DataFusion SQL Query Engine