gorpipe / gor-spark

Relational query engine that unites SparkSQL and GORpipe into a single declarative query framework.

gor-spark

Spark-enabled GOR

GOR made scalable through the Apache Spark engine (https://spark.apache.org).

Check out and build SparkGOR

git clone git@github.com:gorpipe/gor-spark.git
cd gor-spark
./gradlew clean installDist

Usage

Once the build has finished, you can run SparkSQL from within GOR through the gorpipe script:

spark/build/install/gor-scripts/bin/gorpipe "select * from genes.gor limit 10"
spark/build/install/gor-scripts/bin/gorpipe "create xxx = select * from <(select * from genes.gor) where Gene_Symbol like 'B%'; gor [xxx] | top 10"
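The second command combines two steps in one query string: a `create` statement that names the result of a SparkSQL select over a nested query, and a GOR pipeline that reads that named result and keeps the first rows. As a rough sketch of how such a string could be assembled and handed to gorpipe from a script, the Python below builds the same query. The helper names `nested_query` and `gorpipe_argv` are hypothetical and not part of gor-spark, and the script path assumes the build was run from the repository root. The sketch only prints the command and does not execute it.

```python
import shlex

# Script path taken from the usage lines above; assumes the current
# directory is the repository root after "./gradlew clean installDist".
GORPIPE = "spark/build/install/gor-scripts/bin/gorpipe"


def nested_query(name, source, predicate, rows):
    """Hypothetical helper: compose a 'create ...; gor [...] | top N' query."""
    inner = f"select * from {source}"
    create = f"create {name} = select * from <({inner}) where {predicate}"
    return f"{create}; gor [{name}] | top {rows}"


def gorpipe_argv(query):
    """Hypothetical helper: argument list with the whole query as one argument."""
    return [GORPIPE, query]


query = nested_query("xxx", "genes.gor", "Gene_Symbol like 'B%'", 10)
argv = gorpipe_argv(query)

# Print a shell-quoted form of the command for inspection.
print(shlex.join(argv))
```

Passing `argv` as a list to `subprocess.run` would avoid shell quoting problems with the single quotes and the `<( ... )` nested query, since the query reaches gorpipe as one unmodified argument.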

SDK usage

Scala demo: gorspark.scala

spark-shell --packages org.gorpipe:gor-spark:3.10.2 --exclude-packages "org.apache.logging.log4j:log4j-core,org.apache.logging.log4j:log4j-api" -I gorspark.scala

Python demo: gorspark.py

pyspark --packages org.gorpipe:gor-spark:3.10.2 --exclude-packages "org.apache.logging.log4j:log4j-core,org.apache.logging.log4j:log4j-api" -I gorspark.py

About

License: GNU Affero General Public License v3.0


Languages

Java 77.8%, Scala 16.9%, Shell 2.5%, Python 1.9%, Groovy 0.5%, Makefile 0.3%, Dockerfile 0.0%