flacomenoide / scio

A Scala API for Google Cloud Dataflow

Home Page: http://spotify.github.io/scio

Scio

Ecclesiastical Latin IPA: /ˈʃi.o/, [ˈʃiː.o], [ˈʃi.i̯o]

Verb: I can, know, understand, have knowledge.

Scio is a Scala API for Google Cloud Dataflow inspired by Spark and Scalding. See the current API documentation for more information.

Features

  • Scala API close to that of Spark and Scalding core APIs
  • Fully managed service*
  • Unified batch and streaming programming model*
  • Integration with Google Cloud products: Cloud Storage, BigQuery, Pub/Sub, Datastore, Bigtable*
  • HDFS source/sink
  • Interactive mode with Scio REPL
  • Type safe BigQuery
  • Integration with Algebird and Breeze
  • Pipeline orchestration with Scala Futures
  • Distributed cache

* provided by Google Cloud Dataflow

Quick Start

The ubiquitous word count example can be run directly with SBT in local mode, using README.md as input.

sbt "project scio-examples" "run-main com.spotify.scio.examples.WordCount --input=README.md --output=wc"
cat wc/part-00000-of-00001.txt
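For orientation, a word count pipeline in Scio might look roughly like the sketch below. It is not a copy of the bundled `com.spotify.scio.examples.WordCount`; the object name `WordCountSketch` and the helpers `tokenize` and `format` are hypothetical names introduced for illustration. It assumes `scio-core` is on the classpath and uses the `sc.close()` call from Scio releases of this era; later releases may expect `sc.run()` instead.

```scala
import com.spotify.scio._

// A minimal sketch, not the bundled example. Object and helper names are hypothetical.
object WordCountSketch {

  // Split a line into words, keeping letters and apostrophes and dropping empty tokens.
  def tokenize(line: String): List[String] =
    line.split("[^a-zA-Z']+").filter(_.nonEmpty).toList

  // Render one (word, count) pair as a line of output text.
  def format(kv: (String, Long)): String = kv._1 + ": " + kv._2

  def main(cmdlineArgs: Array[String]): Unit = {
    // Parse --input and --output (plus any runner options) from the command line.
    val (sc, args) = ContextAndArgs(cmdlineArgs)

    sc.textFile(args("input"))          // read lines of text
      .flatMap(tokenize)                // one element per word
      .countByValue                     // (word, count) pairs
      .map(format)                      // format as "word: count"
      .saveAsTextFile(args("output"))   // write part files under the output path

    // Submit the pipeline for execution.
    sc.close()
  }
}
```

Keeping the tokenizing and formatting steps as plain functions means they can be checked without starting a pipeline.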

Documentation

Artifacts

Scio includes the following artifacts:

  • scio-core: core library
  • scio-test: test utilities; add to your project as a "test" dependency
  • scio-bigquery: add-on for BigQuery; included in scio-core but can also be used standalone
  • scio-bigtable: add-on for Bigtable
  • scio-extra: extra utilities for working with collections, Breeze, etc.
  • scio-hdfs: add-on for HDFS
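As an illustration of how these artifacts might be pulled into an sbt build, the fragment below adds `scio-core` and, in test scope, `scio-test`. The `scioVersion` value is a placeholder, not a real release number; substitute the version published on Maven Central.

```scala
// build.sbt (sketch). "x.y.z" is a placeholder, not an actual Scio release.
val scioVersion = "x.y.z"

libraryDependencies ++= Seq(
  // Core library
  "com.spotify" %% "scio-core" % scioVersion,
  // Test utilities, only on the test classpath
  "com.spotify" %% "scio-test" % scioVersion % "test"
)
```

The other add-ons listed above (for example `scio-bigtable` or `scio-hdfs`) would presumably be added the same way when needed.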

License

Copyright 2016 Spotify AB.

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

Languages

Scala 66.7%, Java 32.4%, Python 0.8%, Shell 0.1%