abhishek-ch / GeoSpark

A Cluster Computing System for Processing Large-Scale Spatial Data

Home Page: http://datasystemslab.github.io/GeoSpark

GeoSpark@Twitter || GeoSpark Discussion Board || Join the chat at https://gitter.im/geospark-datasys/Lobby

GeoSpark is a cluster computing system for processing large-scale spatial data. It extends Apache Spark / SparkSQL with a set of out-of-the-box Spatial Resilient Distributed Datasets (SRDDs) / SpatialSQL that efficiently load, process, and analyze large-scale spatial data across machines.
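As a rough illustration of the kind of work a partitioned spatial dataset makes cheaper, the following dependency-free Java sketch buckets points into grid cells and answers a rectangular range query by visiting only the cells that overlap the query window. It is a single-machine analogy under stated assumptions, not GeoSpark's API or its actual partitioning scheme: the class `GridRangeQuerySketch`, its nested `Point` and `Envelope` types, and the methods `partition` and `rangeQuery` are all hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, single-JVM sketch; none of these names come from GeoSpark.
class GridRangeQuerySketch {

    /** A 2D point standing in for one spatial record. */
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    /** An axis-aligned query window with inclusive bounds. */
    static final class Envelope {
        final double minX, minY, maxX, maxY;
        Envelope(double minX, double minY, double maxX, double maxY) {
            this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        }
        boolean contains(Point p) {
            return p.x >= minX && p.x <= maxX && p.y >= minY && p.y <= maxY;
        }
    }

    /** Index of the grid cell that a coordinate falls into. */
    static long cellOf(double v, double cellSize) {
        return (long) Math.floor(v / cellSize);
    }

    static String key(long cx, long cy) {
        return cx + ":" + cy;
    }

    /** Partition step: group points by the grid cell that contains them. */
    static Map<String, List<Point>> partition(List<Point> points, double cellSize) {
        Map<String, List<Point>> cells = new HashMap<>();
        for (Point p : points) {
            String k = key(cellOf(p.x, cellSize), cellOf(p.y, cellSize));
            cells.computeIfAbsent(k, unused -> new ArrayList<>()).add(p);
        }
        return cells;
    }

    /** Query step: scan only cells overlapping the window, then filter exactly. */
    static List<Point> rangeQuery(Map<String, List<Point>> cells, Envelope q, double cellSize) {
        List<Point> hits = new ArrayList<>();
        for (long cx = cellOf(q.minX, cellSize); cx <= cellOf(q.maxX, cellSize); cx++) {
            for (long cy = cellOf(q.minY, cellSize); cy <= cellOf(q.maxY, cellSize); cy++) {
                List<Point> bucket = cells.getOrDefault(key(cx, cy), Collections.emptyList());
                for (Point p : bucket) {
                    if (q.contains(p)) {
                        hits.add(p);
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Point> points = Arrays.asList(
                new Point(1, 1), new Point(4.5, 4.5), new Point(9, 9), new Point(-3, 2));
        Map<String, List<Point>> cells = partition(points, 5.0);
        List<Point> hits = rangeQuery(cells, new Envelope(0, 0, 5, 5), 5.0);
        System.out.println(hits.size()); // prints 2
    }
}
```

In a cluster setting the grid cells would roughly correspond to partitions spread across machines, so a query would only need to touch the partitions whose extent overlaps the window. For the real API, refer to the GeoSpark website.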

GeoSpark contains several modules:

| Name | API | Spark compatibility | Introduction |
| --- | --- | --- | --- |
| Core | RDD | Spark 2.X/1.X | SpatialRDDs and Query Operators. |
| SQL | SQL/DataFrame | SparkSQL 2.1+ | SQL interfaces for GeoSpark core. |
| Viz | RDD, SQL/DataFrame | RDD - Spark 2.X/1.X, SQL - Spark 2.1+ | Visualization for Spatial RDD and DataFrame. |
| Zeppelin | Apache Zeppelin | Spark 2.1+, Zeppelin 0.8.1+ | GeoSpark plugin for Apache Zeppelin. |

GeoSpark supports several programming languages: Scala, Java, and R.

Please visit the GeoSpark website for details and documentation.

Impact

GeoSpark Downloads on Maven Central

The GeoSpark ecosystem has around 8K to 10K downloads per month.

Research

The GeoSpark development team has published many papers about GeoSpark. Please see Publications.

GeoSpark was evaluated in the PVLDB 2018 paper "How Good Are Modern Spatial Analytics Systems?" by Varun Pandey, Andreas Kipf, Thomas Neumann, and Alfons Kemper (Technical University of Munich), which states:

GeoSpark comes close to a complete spatial analytics system. It also exhibits the best performance in most cases.

About

License: Apache License 2.0


Languages

Java 80.1%, Scala 19.4%, JavaScript 0.5%