dongfeiwww / dpark

Python clone of Spark, a MapReduce-like framework in Python


DPark is a Python clone of Spark, a MapReduce(R)-like computing framework that supports iterative computation.

Example for word counting:

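    import dpark

    # Read the input file as a dataset of lines
    file = dpark.textFile("/tmp/words.txt")
    # Split each line into words and pair every word with a count of 1
    words = file.flatMap(lambda x: x.split()).map(lambda x: (x, 1))
    # Sum the counts per word and collect the result as a dict
    wc = words.reduceByKey(lambda x, y: x + y).collectAsMap()
    print(wc)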

This script can run locally or on a Mesos cluster without any modification, just using different command-line arguments:

$ python
$ python -m process
$ python -m host[:port]

See examples/ for more use cases.
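
To make the data flow of the word-count example concrete, here is a minimal plain-Python sketch of what the `flatMap`, `map`, `reduceByKey` and `collectAsMap` steps compute on a small in-memory list. It does not use DPark at all; the function name `word_count` and the sample input are made up for illustration, and it says nothing about how DPark distributes the work.

```python
from collections import defaultdict


def word_count(lines):
    """Plain-Python equivalent of flatMap -> map -> reduceByKey -> collectAsMap."""
    # flatMap: split every line into words and flatten into one sequence
    words = (word for line in lines for word in line.split())
    # map: pair each word with an initial count of 1
    pairs = ((word, 1) for word in words)
    # reduceByKey: combine the counts of identical words with x + y
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    # collectAsMap: return the result as an ordinary dict
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical sample input standing in for the lines of /tmp/words.txt
    print(word_count(["a b a", "b c"]))
```

In DPark the same chain of operations runs over partitions of the input file rather than over a single list, which is why the reduce step needs a function that can merge partial counts.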

Some more docs (in Chinese):

DPark can run with Mesos 0.9 or higher.

If a $MESOS_MASTER environment variable is set, you can use a shortcut and run DPark with Mesos just by typing

$ python -m mesos

$MESOS_MASTER can use any Mesos master address scheme, for example:

$ export MESOS_MASTER=zk://zk1:2181,zk2:2181,zk3:2181/mesos_master
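
As a rough illustration of how such an address is structured, the sketch below splits a master string into its scheme, host list and path. `parse_master` is a hypothetical helper written for this README, not a DPark or Mesos function, and the real parsing done by either project may differ.

```python
import os


def parse_master(master):
    """Split a master address into (scheme, hosts, path).

    Hypothetical helper for illustration only; not part of DPark.
    """
    scheme, sep, rest = master.partition("://")
    if not sep:
        # No scheme given, e.g. a bare "host:port"
        scheme, rest = "", master
    # Everything before the first "/" is a comma-separated host list
    hostpart, slash, path = rest.partition("/")
    hosts = [h for h in hostpart.split(",") if h]
    return scheme, hosts, slash + path


if __name__ == "__main__":
    # Fall back to the example value from this README if the variable is unset
    default = "zk://zk1:2181,zk2:2181,zk3:2181/mesos_master"
    print(parse_master(os.environ.get("MESOS_MASTER", default)))
```

For the ZooKeeper example above, this yields the scheme `zk`, three `host:port` entries and the path `/mesos_master`.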

To speed up shuffling, deploy Nginx on port 5055 to serve the data in DPARK_WORK_DIR (default: /tmp/dpark), for example:

        server {
                listen 5055;
                server_name localhost;
                root /tmp/dpark/;
        }

Mailing list: (


License: BSD 3-Clause "New" or "Revised" License