There are 8 repositories under the sparksql topic.
Geo Spatial Data Analytics on Spark
Scala examples for learning to use Spark
Process Common Crawl data with Python and Spark
Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Sparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.
Geospatial Raster support for Spark DataFrames
Quill for Scala 3
Spring-Shiro-Spark is an attempt at integrating Spring-Boot, Hibernate, Spark, Spark-SQL, Shiro, iView, VueJs, and more.
PySpark functions and utilities with examples. Assists the ETL process of data modeling.
Type-class based data cleansing library for Apache Spark SQL
A JupyterLab extension providing a SQL formatter, auto-completion, and syntax highlighting for Spark SQL and Trino
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Google Spreadsheets datasource for SparkSQL and DataFrames
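Already merged into apache/incubator-kyuubi. ACL Management for Apache Spark SQL with Apache Ranger.
A complete set of introductory big data tutorials, including the most basic CentOS and Maven. The big data material mainly covers hdfs, mr, yarn, hbase, kafka, scala, sparkcore, sparkstreaming, and sparksql. The tutorials include all source code demos and online documentation.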
This repository contains Spark, MLlib, PySpark and Dataframes projects
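PostgreSQL and Greenplum Data Source for Apache Spark
Cloud-based SQL engine using Spark, where data is accessible as a JDBC/ODBC data source via Spark ThriftServer.
Hive-JDBC-Proxy is a high-performance proxy service for HiveServer2 and Spark ThriftServer, providing load balancing and rule-based forwarding of Hive JDBC Client requests to HiveServer2 and Spark ThriftServer.
Spark 2.x worked examples: code samples in Scala and in Java 1.8 with lambdas. Covers the core Spark technologies SparkCore, SparkSql, and SparkStreaming. Also provides advanced Spark performance optimization, serialization, broadcast variables, data skew, operator optimization, JVM optimization, troubleshooting, and data skew solutions. Compiled from years of work experience!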
A library for querying Druid data sources with Apache Spark
Spark DataFrames for earth observation data
Spark Streaming reads messages from Kafka, writes offsets to Redis, computes word frequencies with Spark, and finally writes the results to a Hive table.
A SparkSQL formatter based on https://github.com/zeroturnaround/sql-formatter, with customizations and extra features.
Analyzing the safety (311) dataset published by Azure Open Datasets for Chicago, Boston, and New York City using SparkR, SparkSQL, and Azure Databricks, with visualization using ggplot2 and leaflet. The focus is on descriptive analytics, visualization, clustering, time series forecasting, and anomaly detection.