spark

There are 409 repositories under the spark topic.

apache / spark
Apache Spark - A unified analytics engine for large-scale data processing
big-data java jdbc python r scala spark sql
Language:Scala 38318
donnemartin / data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
python machine-learning deep-learning data-science big-data aws tensorflow theano caffe scikit-learn kaggle spark mapreduce hadoop matplotlib pandas numpy scipy keras
Language:Python 26450
getredash / redash
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
redash python visualization analytics bi redshift bigquery athena mysql postgresql dashboard javascript business-intelligence databricks spark spark-sql hacktoberfest
Language:Python 24920
yeasy / docker_practice
Learn and understand Docker & container technologies, with real DevOps practice!
docker book cloud-computing container kubernetes swarm mesos spark devops linux
Language:Go 24171
DataTalksClub / data-engineering-zoomcamp
Free Data Engineering course!
data-engineering dbt docker kafka prefect spark
Language:Jupyter Notebook 22395
heibaiying / BigData-Notes
A beginner's guide to big data ⭐
hadoop hdfs yarn mapreduce hive spark storm hbase scala kafka zookeeper flume azkaban sqoop phoenix bigdata big-data
Language:Java 15268
GaiZhenbiao / ChuanhuChatGPT
GUI for ChatGPT API and many LLMs. Supports agents, file-based QA, GPT finetuning and query with web search. All with a neat UI.
chatbot chatglm chatgpt-api claude dalle3 ernie gemini gemma inspurai llama midjourney minimax moss ollama qwen spark stablelm
Language:Python 14684
zhisheng17 / flink-learning
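Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, internals, hands-on practice, performance tuning, and source code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, Table API & SQL, and more, plus case studies of large-scale Flink production projects (PV/UV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). Support for my column "Big Data Real-Time Computing Engine: Flink in Practice and Performance Optimization" is welcome.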
flink kafka elasticsearch spark redis mysql rocketmq hbase rabbitmq stream-processing streaming clickhouse loki influxdb opentsdb
Language:Java 14247
horovod / horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
tensorflow uber machine-learning machinelearning mpi baidu deep-learning deeplearning keras pytorch mxnet spark ray
Language:Python 13943
aalansehaiyang / technology-talk
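[Big Tech Interview Column] A technical guide for Java programmers, with interview questions, system architecture, career tips, mainstream middleware, and more, to help you become a better version of yourself!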
java spring springboot dubbo kafka git hbase mycat spark es6
13923
FavioVazquez / ds-cheatsheets
List of Data Science Cheatsheets to rule the world
datascience python r spark programming jupyter cheatsheet
13556
deeplearning4j / deeplearning4j
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learning using automatic differentiation.
artificial-intelligence clojure deeplearning deeplearning4j dl4j gpu hadoop intellij java linear-algebra matrix-library neural-nets python scala spark
Language:Java 13422
apache / doris
Apache Doris is an easy-to-use, high performance and unified analytics database.
olap database hadoop hive hudi iceberg real-time sql bigquery dbt delta-lake elt etl lakehouse query-engine redshift snowflake spark
Language:Java 11314
wangzhiwubigdata / God-Of-BigData
Focused on big data learning and interview preparation; start your path to big data mastery. Flink/Spark/Hadoop/Hbase/Hive...
flink spark hadoop hdfs hive hbase kafka zookeeper bigdata flume azkaban
9268
mage-ai / mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
artificial-intelligence data data-engineering data-integration data-pipelines data-science dbt elt etl machine-learning orchestration pipeline pipelines python reverse-etl spark sql transformation
Language:Python 6980
delta-io / delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
spark acid big-data analytics delta-lake
Language:Scala 6874
h2oai / h2o-3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
h2o machine-learning data-science deep-learning big-data ensemble-learning gbm random-forest naive-bayes pca opensource distributed java python r hadoop spark gpu automl h2o-automl
Language:Jupyter Notebook 6722
Angel-ML / angel
A Flexible and Powerful Parameter Server for large-scale machine learning
machine-learning parameter-server spark scala model high-dimensional online-learning spark-streaming
Language:Java 6706
Alluxio / alluxio
Alluxio, data orchestration for analytics and machine learning in the cloud
alluxio memory-speed hadoop spark presto tensorflow data-analysis data-orchestration virtual-distributed-filesystem
Language:Java 6631
risingwavelabs / risingwave
Cloud-native SQL stream processing, analytics, and management. KsqlDB and Apache Flink alternative. 🚀 10x more productive. 🚀 10x more cost-efficient.
database stream-processing cloud-native sql distributed-database rust serverless postgresql real-time postgres postgresql-database flink kafka analytics big-data spark-streaming ksqldb spark materialized-view data-engineering
Language:Rust 6281
apache / zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
big-data database flink java javascript nosql scala spark zeppelin
Language:Java 6261
donnemartin / dev-setup
macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.
macos mac vim sublime-text bash iterm2 python spark aws cloud android-development cli git mysql postgresql mongodb redis elasticsearch nodejs linux
Language:Python 6052
tobymao / sqlglot
Python SQL Parser and Transpiler
transpiler sql python parser optimizer bigquery duckdb hive mysql postgres presto snowflake spark sqlite sqlparser trino tsql clickhouse redshift databricks
Language:Python 5436
microsoft / SynapseML
Simple and Distributed Machine Learning
spark pyspark azure scala microsoft ml machine-learning databricks cognitive-services lightgbm http model-deployment deep-learning ai apache-spark data-science synapse big-data onnx opencv
Language:Scala 4966
PipelineAI / pipeline
PipelineAI
machine-learning artificial-intelligence tensorflow kubernetes cassandra spark kafka airflow docker redis neural-network gpu pipelineai tfx keras kubeflow pytorch scikit-learn
Language:Jsonnet 4161
yahoo / TensorFlowOnSpark
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
tensorflow spark yahoo machine-learning cluster featured python scala
Language:Python 3864
Cyb3rWard0g / HELK
The Hunting ELK
hunting elasticsearch kibana logstash hunting-platforms elk elk-stack elastic docker jupyter-notebook threat-hunting spark dockerhub
Language:Jupyter Notebook 3693
JohnSnowLabs / spark-nlp
State of the Art Natural Language Processing
nlp natural-language-processing spark pyspark named-entity-recognition sentiment-analysis lemmatizer spell-checker entity-extraction part-of-speech-tagger bert transformers albert tensorflow language-detection machine-translation text-classification language-model llm question-answering
Language:Scala 3671
lw-lin / CoolplaySpark
Coolplay Spark: Spark source code analysis, Spark libraries, and more
spark spark-streaming structured-streaming sparkcore apache-spark
Language:Scala 3447
RoaringBitmap / RoaringBitmap
A better compressed bitset in Java: used by Apache Spark, Netflix Atlas, Apache Pinot, Tablesaw, and many others
java bitset roaring-bitmaps roaringbitmap druid spark lucene
Language:Java 3377
liyupi / sql-generator
🔨 Generate structured SQL statements from JSON. Built with Vue3 + TypeScript + Vite + Ant Design + MonacoEditor; a simple project (heavy on logic, light on UI) that is well suited for practice.
bigdata hive javascript json spark sql mysql typescript vite vue ant-design monaco-editor vue3
Language:Vue 3375
databricks / koalas
Koalas: pandas API on Apache Spark
big-data data-science dataframe mlflow pandas pydata spark
Language:Python 3319
apache / linkis
Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.
sql spark hive pyspark livy linkis engine storage resource-manager application-manager context-service scriptis udf hive-table rest-api jobserver thrift-server jdbc presto impala
Language:Java 3227
spark-notebook / spark-notebook
Interactive and Reactive Data Science using Scala and Spark.
apache-spark notebook scala data-science spark reactive
Language:JavaScript 3146
awslabs / deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
dataquality spark unit-testing scala
Language:Scala 3119
WeBankFinTech / DataSphereStudio
DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
workflow governance azkaban davinci linkis spark hive hadoop visualis zeppelin hue supperset tableau dataworks griffin kettle airflow flink dolphinscheduler atlas
Language:Java 2942
