koven2049 / native_spark

A new arguably faster implementation of Apache Spark from scratch in Rust

native_spark

Join the chat at https://gitter.im/fast_spark/community

Documentation

A new, arguably faster, implementation of Apache Spark, written from scratch in Rust. This project is a work in progress.

The framework has been tested only on Linux and requires nightly Rust. Read how to get started in the documentation.

ToDo

  • Error handling (priority)
  • Fault tolerance

RDD

Most of these, except the file reader and writer, are trivial to implement.

  • map
  • flat_map
  • filter
  • group_by
  • reduce_by
  • distinct
  • count
  • take_sample
  • union
  • glom
  • cartesian
  • pipe
  • map_partitions
  • for_each
  • collect
  • reduce
  • fold
  • aggregate
  • take
  • first
  • sample
  • zip
  • save_as_text_file (can only save as a text file in the executor's local file system)
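
To give a rough idea of what a few of the operations above compute, here is a minimal sketch that uses only the Rust standard library. It is not the native_spark API. The `Partitions` alias, `parallelize`, and the free functions are hypothetical names chosen for illustration; `reduce_by_key` stands in for the `reduce_by` entry, and everything runs in a single process with no executors.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Toy stand-in for a partitioned dataset: each inner Vec is one partition.
/// Hypothetical type for illustration only, not the native_spark RDD.
type Partitions<T> = Vec<Vec<T>>;

/// Split `data` into at most `n` contiguous partitions of roughly equal size.
fn parallelize<T: Clone>(data: &[T], n: usize) -> Partitions<T> {
    let n = n.max(1);
    let chunk = ((data.len() + n - 1) / n).max(1);
    data.chunks(chunk).map(|c| c.to_vec()).collect()
}

/// map: apply `f` to every element, keeping the partition layout.
fn map<T, U>(p: &Partitions<T>, f: impl Fn(&T) -> U) -> Partitions<U> {
    p.iter().map(|part| part.iter().map(&f).collect()).collect()
}

/// flat_map: each element yields zero or more outputs within its partition.
fn flat_map<T, U, I>(p: &Partitions<T>, f: impl Fn(&T) -> I) -> Partitions<U>
where
    I: IntoIterator<Item = U>,
{
    p.iter().map(|part| part.iter().flat_map(&f).collect()).collect()
}

/// filter: keep only the elements for which `pred` returns true.
fn filter<T: Clone>(p: &Partitions<T>, pred: impl Fn(&T) -> bool) -> Partitions<T> {
    p.iter()
        .map(|part| part.iter().filter(|x| pred(*x)).cloned().collect())
        .collect()
}

/// count: total number of elements across all partitions.
fn count<T>(p: &Partitions<T>) -> usize {
    p.iter().map(|part| part.len()).sum()
}

/// reduce: combine all elements with `f`; returns None for an empty dataset.
fn reduce<T: Clone>(p: &Partitions<T>, f: impl Fn(T, T) -> T) -> Option<T> {
    p.iter().flatten().cloned().reduce(f)
}

/// reduce_by_key: merge the values of each key with `f`.
fn reduce_by_key<K, V>(p: &Partitions<(K, V)>, f: impl Fn(V, V) -> V) -> HashMap<K, V>
where
    K: Eq + Hash + Clone,
    V: Clone,
{
    let mut out: HashMap<K, V> = HashMap::new();
    for (k, v) in p.iter().flatten() {
        let merged = match out.remove(k) {
            Some(acc) => f(acc, v.clone()),
            None => v.clone(),
        };
        out.insert(k.clone(), merged);
    }
    out
}

fn main() {
    // Word count over two partitions of text lines.
    let lines: Vec<String> = ["a b", "b c", "c"].iter().map(|s| s.to_string()).collect();
    let parts = parallelize(&lines, 2);
    let words = flat_map(&parts, |l| {
        l.split_whitespace().map(String::from).collect::<Vec<_>>()
    });
    let pairs = map(&words, |w| (w.clone(), 1u32));
    let counts = reduce_by_key(&pairs, |a, b| a + b);
    println!("words: {}, distinct: {}", count(&words), counts.len());

    // Numeric example: keep even numbers, then sum them.
    let nums = parallelize(&[1, 2, 3, 4, 5, 6], 3);
    let evens = filter(&nums, |x| x % 2 == 0);
    println!("sum of evens: {:?}", reduce(&evens, |a, b| a + b));
}
```

In a distributed engine the per-partition closures would run on separate executors, and an operation such as `reduce_by` would need a shuffle between partitions; the sketch collapses all of that into local loops so that only the data semantics remain.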

Config Files

  • Replace hard-coded values

License: Apache License 2.0


Languages

Rust 98.7%, Shell 0.7%, Dockerfile 0.6%, Cap'n Proto 0.0%