There are 28 repositories under the spark-sql topic.
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
Big data platform for e-commerce user behavior analysis
The Internals of Spark SQL
New Generation Open-Source Data Stack Demo
🐍 Quick reference guide to common patterns & functions in PySpark.
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy-to-use experience to help with creation, editing and management of Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Apache Spark™ and Scala Workshops
A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!
MCW Big data analytics and visualization
A prototype big data platform project; the source code for the book Big Data Platform Architecture and Prototype
Spark Structured Streaming / Kafka / Cassandra / Elastic
An encrypted data analytics platform
Spark SQL implementations of ItemCF, UserCF, and Swing: recommender systems, recommendation algorithms, collaborative filtering
Apache Spark 3 - Structured Streaming Course Material
Movie recommendation system and movie recommendation engine, built with Spark
Spark Connector to read and write with Pulsar
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
A library for Spark DataFrame using MinIO Select API
Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.
Apache Spark Course Material
Comprehensive Spark example code (Java, Scala)
A curated list of Pulsar tools, integrations and resources.
Apache Spark Connect Client for Rust
bring sf to spark in production