GEizaguirre / seercloud

A shuffle manager for serverless data analytics

Seer

Seer is a serverless data analytics framework that dynamically optimizes its data exchange steps. It is built on Lithops, a multi-cloud distributed computing framework, and runs on cloud functions and object storage.

Documentation and execution instructions are available in the project documentation.

Programmatic API

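import yaml

from seercloud.scheduler import Job
from seercloud.operation import Scan, Exchange, Sort

# Load the Lithops configuration. safe_load needs no explicit Loader argument,
# which yaml.load requires in recent PyYAML releases.
with open("config.yaml", "rb") as config_file:
    lithops_config = yaml.safe_load(config_file)

# Stage 0 scans the input file and exchanges (shuffles) the data; stage 1 sorts it.
job = Job(num_stages=2, lithops_config=lithops_config)
job.add(stage=0, op=Scan, file="terasort_1GB.csv", bucket="seer-data")
job.add(stage=0, op=Exchange)
job.add(stage=1, op=Sort, key="0")
job.dependency(parent=0, child=1)  # stage 1 runs after stage 0
job.run()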
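
To show what a two-stage scan, exchange, and sort job of this shape computes, here is a minimal local sketch in plain Python. It is not Seer's implementation and uses none of its API. The function names (scan, exchange, sort_partition, run_pipeline), the fixed range boundaries, and the reading of key "0" as the first CSV column are all assumptions made for illustration.

```python
from bisect import bisect_right


def scan(lines):
    """Stage 0, step 1: parse raw CSV lines into (key, line) records.

    Assumption: the sort key is the first comma-separated column.
    """
    return [(line.split(",")[0], line) for line in lines]


def exchange(records, boundaries):
    """Stage 0, step 2: route each record to a partition by key range.

    `boundaries` is a sorted list of split points (hypothetical, fixed here);
    n boundaries yield n + 1 partitions.
    """
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for key, line in records:
        partitions[bisect_right(boundaries, key)].append((key, line))
    return partitions


def sort_partition(partition):
    """Stage 1: sort a single partition locally by key."""
    return sorted(partition, key=lambda record: record[0])


def run_pipeline(lines, boundaries):
    """Run stage 0 then stage 1 and concatenate the sorted partitions."""
    partitions = exchange(scan(lines), boundaries)
    return [line for part in partitions for _, line in sort_partition(part)]


rows = ["m,3", "a,1", "z,4", "c,2"]
result = run_pipeline(rows, boundaries=["g", "p"])
print(result)
```

Because the partitions cover disjoint, ordered key ranges, sorting each one independently and concatenating them gives a globally sorted result. This is why the sort stage only depends on the exchange stage having completed.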

Acknowledgements

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 825184.


License: Apache License 2.0


Languages

Python 98.3%, Cython 1.7%