There are 288 repositories under the big-data topic.
The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
ClickHouse® is a free analytics DBMS for big data
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
PredictionIO, a machine learning server for developers and ML engineers.
A Beginner's Guide to Big Data :star:
A fast, scalable, high-performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
The Metadata Platform for the Modern Data Stack
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
Arkime (formerly Moloch) is an open source, large scale, full packet capturing, indexing, and database system.
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs for Scala, Java, Rust, Ruby, and Python.
Apache Doris is an MPP-based interactive SQL data warehouse for reporting and analysis.
Stream Framework is a Python library that lets you build news feeds, activity streams, and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology:
The open big data serving engine. https://vespa.ai
⚡️ A Vue component that supports large data lists with high rendering performance and efficiency.
CrateDB is a distributed SQL database that makes it simple to store and analyze massive amounts of machine data in real-time.
Koalas: pandas API on Apache Spark