douban / dpark

Python clone of Spark, a MapReduce-like framework in Python


DPark

Join the chat at https://gitter.im/douban/dpark

DPark is a Python clone of Spark: a MapReduce(R)-like computing framework that supports iterative computation.
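"Iterative computation" means making many passes over the same dataset, which is where a Spark-style framework differs from one-shot MapReduce. The sketch below is plain Python, not DPark's API: a small in-memory list stands in for a cached dataset, and each iteration is one map plus one reduce over it, fitting a one-parameter linear model by gradient descent. The data and function names are hypothetical.

```python
from functools import reduce

# Hypothetical (x, y) pairs; in a Spark-style framework this would be a
# cached, partitioned dataset reused by every iteration.
points = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]

def gradient(w, point):
    """Map step: gradient of the squared error (w*x - y)^2 for one point."""
    x, y = point
    return 2.0 * (w * x - y) * x

def fit(data, iterations=50, lr=0.01):
    """Run repeated map/reduce passes over the same data."""
    w = 0.0
    for _ in range(iterations):
        # One pass: map every point to its gradient, then reduce by summing.
        grad = reduce(lambda a, b: a + b, map(lambda p: gradient(w, p), data))
        w -= lr * grad / len(data)
    return w

w = fit(points)
```

Keeping the dataset in memory between passes is what makes this loop cheap; a classic MapReduce job would re-read its input on every iteration.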

Installation

Example

A word-counting example (wc.py):

This script can run locally or on a Mesos cluster without modification; only the command-line arguments differ:

See examples/ for more use cases.
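A Spark-style word count is typically written as flatMap (split lines into words), map (word to (word, 1)) and reduceByKey (sum the counts per word). As a hedged illustration of those semantics only, the plain-Python sketch below mimics each step on an in-memory list of lines; the helper names are hypothetical and this is not DPark's implementation.

```python
from itertools import chain

def flat_map(func, items):
    """Apply func to each item and flatten the resulting iterables."""
    return list(chain.from_iterable(func(item) for item in items))

def reduce_by_key(func, pairs):
    """Merge values sharing a key with a binary function, like reduceByKey."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged

lines = ["the quick brown fox", "the lazy dog", "the fox"]
words = flat_map(lambda line: line.split(), lines)   # flatMap step
pairs = [(word, 1) for word in words]                # map step
counts = reduce_by_key(lambda a, b: a + b, pairs)    # reduceByKey step
print(counts)
```

In a distributed run, the reduceByKey step is where a shuffle happens: pairs with the same key must be brought to the same task before they can be merged.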

Configuration

DPark can run with Mesos 0.9 or higher.

If a $MESOS_MASTER environment variable is set, you can use a shortcut and run DPark with Mesos just by typing

$MESOS_MASTER can be given in any of the address schemes a Mesos master accepts.
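For reference, Mesos itself accepts a master address as a plain host:port, as a ZooKeeper URL (zk://host1:port1,host2:port2,.../path, optionally with username:password@ before the hosts), or as file:///path/to/file pointing at a file that contains one of those. The helper below is a hypothetical sketch, not part of DPark, that classifies a $MESOS_MASTER value into those forms; exactly which forms DPark accepts is not confirmed here.

```python
import os
from urllib.parse import urlsplit

def classify_mesos_master(master):
    """Hypothetical helper: classify a Mesos master string by scheme.

    Returns (kind, details). This is not DPark's own parsing logic.
    """
    if master.startswith("zk://"):
        rest = master[len("zk://"):]
        # Drop optional credentials: user:pass@h1:p1,h2:p2/path
        _, _, rest = rest.rpartition("@")
        hosts, _, path = rest.partition("/")
        return "zookeeper", {"hosts": hosts.split(","), "path": "/" + path}
    if master.startswith("file://"):
        # The file is expected to contain one of the other forms.
        return "file", {"path": urlsplit(master).path}
    host, sep, port = master.rpartition(":")
    if sep and host and port.isdigit():
        return "host_port", {"host": host, "port": int(port)}
    raise ValueError("unrecognized Mesos master: %r" % master)

def mesos_master_from_env():
    """Classify $MESOS_MASTER if it is set; otherwise return None."""
    master = os.environ.get("MESOS_MASTER")
    return classify_mesos_master(master) if master else None
```

The ZooKeeper form is the one to prefer with several masters, since it lets the framework find whichever master is currently the leader.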

To speed up shuffling, deploy Nginx on port 5055 to serve the data in DPARK_WORK_DIR (default: /tmp/dpark).
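Nginx's job here is simply to serve the files under DPARK_WORK_DIR over HTTP on port 5055. For a quick local experiment without Nginx, a rough stand-in built on Python's standard http.server might look like the sketch below. It assumes shuffle data is fetched as plain HTTP GETs of files under the work directory; the function name is hypothetical, and this is not a replacement for the recommended Nginx deployment.

```python
import functools
import os
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

def serve_work_dir(work_dir=None, port=5055):
    """Serve files under DPARK_WORK_DIR over HTTP (rough stand-in for Nginx).

    Runs in a daemon thread; call .shutdown() on the returned server to stop.
    """
    work_dir = work_dir or os.environ.get("DPARK_WORK_DIR", "/tmp/dpark")
    # Requires Python 3.7+ for the `directory` argument and ThreadingHTTPServer.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=work_dir)
    server = ThreadingHTTPServer(("0.0.0.0", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Nginx remains the better choice in production: it serves static files with far less per-request overhead than a Python handler, which is the point of deploying it for shuffling.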

UI

The UI shows two DAGs:

  1. Stage graph: a stage is a unit of execution containing a set of tasks; each task runs the same operations on one split of an RDD.
  2. User API callsite graph.
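The stage definition above can be made concrete with a toy model. The classes below are hypothetical and are not DPark's internals: a Stage holds an RDD's splits plus a list of per-record operations, and it expands into one Task per split, each applying the same operations to its own split.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Task:
    split_index: int
    split: list
    ops: List[Callable]

    def run(self):
        """Apply every operation, in order, to each record of this split."""
        out = []
        for record in self.split:
            for op in self.ops:
                record = op(record)
            out.append(record)
        return out

@dataclass
class Stage:
    splits: List[list]    # the RDD's data, partitioned into splits
    ops: List[Callable]   # the same operations for every task

    def tasks(self):
        """One task per split; all tasks share the stage's operations."""
        return [Task(i, s, self.ops) for i, s in enumerate(self.splits)]

stage = Stage(splits=[[1, 2], [3, 4], [5]], ops=[lambda x: x * 10, lambda x: x + 1])
results = [task.run() for task in stage.tasks()]
print(results)
```

Because every task in a stage runs identical code on independent data, the tasks can be scheduled on any machine in any order.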

UI when running

Open the URL printed in the log, e.g. in the line "start listening on Web UI http://server_01:40812".

UI after running

  1. Before running, configure LOGHUB and LOGHUB_PATH_FORMAT in dpark.conf, and create LOGHUB_DIR in advance.
  2. Get the loghub directory from a log line such as "logging/prof to LOGHUB_DIR/2018/09/27/16/b2e3349b-9858-4153-b491-80699c757485-8754"; the path includes the Mesos framework ID.
  3. Run "dpark_web.py -p 9999 -l LOGHUB_DIR/2018/09/27/16/b2e3349b-9858-4153-b491-80699c757485-8728/" (dpark_web.py is in tools/).
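Steps 2 and 3 above can be scripted. The helpers below are a hypothetical sketch: they assume the log line has the form shown in step 2 ("logging/prof to <dir>"), reuse only the -p and -l flags shown in step 3, and assume dpark_web.py is launched with "python" from the root of a DPark checkout. The /data/loghub path is a made-up LOGHUB_DIR.

```python
import re
import shlex

# Assumed log format, taken from the example in step 2: "logging/prof to <dir>".
_PROF_RE = re.compile(r"logging/prof to (\S+)")

def hub_dir_from_log(line):
    """Return the loghub directory mentioned in a log line, or None."""
    match = _PROF_RE.search(line)
    return match.group(1) if match else None

def dpark_web_command(hub_dir, port=9999):
    """Build the step-3 invocation (interpreter and tools/ path are assumptions)."""
    return ["python", "tools/dpark_web.py", "-p", str(port), "-l", hub_dir.rstrip("/") + "/"]

line = "logging/prof to /data/loghub/2018/09/27/16/b2e3349b-9858-4153-b491-80699c757485-8754"
hub = hub_dir_from_log(line)
print(shlex.join(dpark_web_command(hub)))
```

The returned list can be passed directly to subprocess.run; shlex.join (Python 3.8+) is used only to print a copy-pasteable command.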

UI examples for features

Showing shared shuffle map output:


Nodes are combined if and only if they have the same lineage, forming a logical tree inside a stage; each node then contains a PIPELINE of RDDs.
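One way to picture this rule is a prefix tree: each RDD's lineage is the sequence of operations that produced it, two nodes merge exactly when those sequences match, and an unbranched chain of nodes collapses into a single pipeline. The sketch below is a hypothetical illustration of that idea in plain Python, not DPark's UI code; the lineages are made up.

```python
def build_tree(lineages):
    """Insert each lineage (a tuple of op names) into a prefix tree.

    Two nodes are merged iff their whole lineage up to that point matches.
    """
    root = {}
    for lineage in lineages:
        node = root
        for op in lineage:
            node = node.setdefault(op, {})
    return root

def collapse(tree):
    """Collapse unbranched chains into pipelines: returns [(pipeline, children)].

    This toy version does not track which nodes are lineage end points.
    """
    result = []
    for op, child in tree.items():
        pipeline = [op]
        while len(child) == 1:
            (next_op, next_child), = child.items()
            pipeline.append(next_op)
            child = next_child
        result.append((pipeline, collapse(child)))
    return result

lineages = [
    ("textFile", "flatMap", "map"),
    ("textFile", "flatMap", "filter", "map"),
]
print(collapse(build_tree(lineages)))
```

Here the shared prefix textFile, flatMap becomes one pipeline node, and the tree branches only where the two lineages diverge.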


More docs (in Chinese)

https://dpark.readthedocs.io/zh_CN/latest/

https://github.com/jackfengji/test_pro/wiki

Mailing list: dpark-users@googlegroups.com (http://groups.google.com/group/dpark-users)


License: BSD 3-Clause "New" or "Revised" License

