devyn / GeoSpark

A Cluster Computing System for Processing Large-Scale Spatial Data

Home Page: http://datasystemslab.github.io/GeoSpark

GeoSpark@Twitter || GeoSpark Discussion Board || Join the chat at https://gitter.im/geospark-datasys/Lobby

GeoSpark is a cluster computing system for processing large-scale spatial data. GeoSpark extends Apache Spark / SparkSQL with a set of out-of-the-box Spatial Resilient Distributed Datasets (SRDDs) / SpatialSQL that efficiently load, process, and analyze large-scale spatial data across machines.

GeoSpark contains several modules:

| Name | API | Spark compatibility | Introduction |
| --- | --- | --- | --- |
| Core | RDD | Spark 2.X/1.X | SpatialRDDs and Query Operators. |
| SQL | SQL/DataFrame | SparkSQL 2.1+ | SQL interfaces for GeoSpark core. |
| Viz | RDD, SQL/DataFrame | RDD - Spark 2.X/1.X, SQL - Spark 2.1+ | Visualization for Spatial RDD and DataFrame. |
| Zeppelin | Apache Zeppelin | Spark 2.1+, Zeppelin 0.8.1+ | GeoSpark plugin for Apache Zeppelin. |
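
To make the idea of a spatial query operator over partitioned data more concrete, here is a minimal single-machine sketch in plain Python. It does not use the GeoSpark API and is not how GeoSpark is implemented; the function names (`grid_cell`, `partition_points`, `range_query`), the uniform grid, and the sample points are all assumptions chosen for illustration. It groups points into grid cells, roughly as a spatial partitioner might assign them to workers, and then answers a rectangular range query by scanning only the cells that overlap the query window.

```python
from collections import defaultdict


def grid_cell(point, cell_size):
    """Map an (x, y) point to the integer grid cell that contains it."""
    x, y = point
    return (int(x // cell_size), int(y // cell_size))


def partition_points(points, cell_size):
    """Group points by grid cell (a stand-in for spatial partitioning)."""
    partitions = defaultdict(list)
    for p in points:
        partitions[grid_cell(p, cell_size)].append(p)
    return partitions


def range_query(partitions, envelope, cell_size):
    """Return points inside envelope = (min_x, min_y, max_x, max_y).

    The boundary is inclusive. Only cells overlapping the envelope are scanned.
    """
    min_x, min_y, max_x, max_y = envelope
    lo_cx, lo_cy = grid_cell((min_x, min_y), cell_size)
    hi_cx, hi_cy = grid_cell((max_x, max_y), cell_size)
    hits = []
    for cx in range(lo_cx, hi_cx + 1):
        for cy in range(lo_cy, hi_cy + 1):
            for x, y in partitions.get((cx, cy), []):
                if min_x <= x <= max_x and min_y <= y <= max_y:
                    hits.append((x, y))
    return hits


# Hypothetical sample data, not taken from GeoSpark.
points = [(1.0, 1.0), (2.5, 3.5), (7.0, 8.0), (-3.0, 4.0), (4.0, 4.0)]
partitions = partition_points(points, cell_size=2.0)
result = sorted(range_query(partitions, (0.0, 0.0, 4.0, 4.0), cell_size=2.0))
print(result)
```

In a distributed setting each cell's points would live on some worker, and pruning non-overlapping cells is what keeps a range query from touching every partition.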

GeoSpark supports several programming languages: Scala, Java, SQL, Python and R.

Please visit the GeoSpark website for detailed documentation.

Original Contributors

  • (Mo)hamed Sarwat (Twitter: @MoSarwat)
  • Jia Yu

Impact

GeoSpark Downloads on Maven Central

The GeoSpark ecosystem has around 10K downloads per month.

About

License: Apache License 2.0


Languages

  • Java 56.4%
  • Python 21.4%
  • Scala 17.9%
  • Jupyter Notebook 4.0%
  • JavaScript 0.3%
  • Shell 0.0%