denx20 / Hoo-Ray

A Haskell Auto-Parallelizer for Distributed Computing

Hoo-Ray

Meet Hoo-Ray: a distributed execution engine for Haskell, written in Haskell. The name comes from our initial plan to build a system similar to Ray, although in the end the two are nothing alike.

Hoo-Ray parses the abstract syntax tree of a given Haskell program to extract the data dependencies in its main function, then uses a greedy scheduler to dispatch every computation whose dependencies have been met.
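
To make "data dependencies in the main function" concrete, here is a small, hypothetical input program. The function name slowSquare and the exact shape of main are assumptions for illustration only; the syntactic forms that Hoo-Ray's parser actually accepts are not specified in this README.

```haskell
-- Hypothetical input program; slowSquare stands in for an expensive computation.
slowSquare :: Int -> Int
slowSquare x = x * x

main :: IO ()
main = do
  let a = slowSquare 3   -- no dependencies
      b = slowSquare 4   -- no dependencies, independent of a
      c = a + b          -- depends on both a and b
  print c                -- depends on c; prints 25
```

In this sketch, a and b have no unmet dependencies, so a greedy scheduler could dispatch both immediately, while c has to wait until both results are available.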

This project was done as part of the course CS 512: Distributed Systems at Duke University in Spring 2023.

Installation

  • Install the Haskell toolchain, preferably via GHCup.
  • Run stack build inside this directory.

Executable Modules

dependency-graph

Generates a dependency graph (as text) from an input file and writes it to stdout, e.g. stack exec build_graph test/matmul_ss_test.hs

We also provide a Python script to visualize the graph. To run it:

# Follow the instructions for installing graphviz at
# https://pygraphviz.github.io/documentation/stable/install.html
# For example, with Homebrew:
brew install graphviz

# Install Python dependencies
pip install -r python-scripts/requirements.txt

Then redirect the output of build_graph to a file, e.g. stack exec build_graph test/matmul_ss_test.hs > graph.out. Finally, modify the corresponding variables at the top of python-scripts/fetch_graph.py and run python python-scripts/fetch_graph.py to get a PNG of the parsed computation graph.

Output of stack exec dependency-graph Tests/pure.hs > graph.out and python fetch_graph.py as an example:

matmul_test_gen

When called with the -t flag, generates test files containing matrix multiplication operations: single-threaded (Tests/matmul_ss_test.hs), multi-threaded (Tests/matmul_ms_test.hs), and queue.hs-compatible (Tests/matmul_coarse_test.hs, Tests/matmul_fine_test.hs). For a detailed look at the configuration options, run

cabal run matmul_test_gen -- -h

To run the resulting multi-threaded test file, pass in the runtime flags like so:

cabal run matmul_ms_test -- +RTS -N

To time the execution of any program, use the time command (e.g. time cabal run matmul_ms_test -- +RTS -N).
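
To check that the +RTS -N flag actually took effect, a program can report how many capabilities (OS threads running Haskell code) the runtime is using. The following is a minimal sketch, separate from the generated test files; describeCapabilities is a hypothetical helper, and the program is assumed to be compiled with GHC's -threaded and -rtsopts options so that the runtime accepts -N.

```haskell
import Control.Concurrent (getNumCapabilities)

-- Hypothetical helper: format the capability count for display.
describeCapabilities :: Int -> String
describeCapabilities n = "Capabilities: " ++ show n

main :: IO ()
main = do
  -- With +RTS -N this is the number of processors GHC detected;
  -- without it, the threaded runtime defaults to 1.
  n <- getNumCapabilities
  putStrLn (describeCapabilities n)
```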

queue

Runs the Hoo-Ray algorithm. More specifically, it generates the dependency graph for the input test program on the master and dispatches jobs for remote execution on slave workers.

The dependency graph is first reversed such that (A depends on B) => (there is an edge from B to A). Then the following two procedures run concurrently on the master node:

  • assignJobs: Finds all the nodes with indegree 0 (computations that have all dependencies met) and assigns them to the slaves. When a job is dispatched, the master adds the node to a visited list so that it is not sent again.
  • processReply: Listens for responses from the worker nodes. When a worker sends back the result of a computation, the indegree of every node that depends on that computation is decremented.
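
The two procedures above can be sketched as a pure, single-process simulation. This is not the code in queue.hs: it runs in synchronous rounds instead of concurrently, it models the visited list by deleting dispatched nodes from a map, and the names Deps, reverseGraph, and schedule are made up for illustration. It also assumes every node appears as a key in the input map, with an empty list if it has no dependencies.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

type Node = String

-- Each node mapped to the nodes it depends on (assumed format).
type Deps = Map Node [Node]

-- Reverse the graph: "a depends on b" becomes an edge from b to a.
reverseGraph :: Deps -> Map Node [Node]
reverseGraph deps =
  Map.fromListWith (++) [ (b, [a]) | (a, bs) <- Map.toList deps, b <- bs ]

-- Greedy schedule in rounds. Each round dispatches every node with
-- indegree 0 (assignJobs), then decrements the indegree of the nodes
-- that depend on the finished ones (processReply).
schedule :: Deps -> [[Node]]
schedule deps = go (Map.map length deps)
  where
    dependents = reverseGraph deps
    go remaining
      | Map.null remaining = []
      | null ready         = error "dependency cycle or missing node"
      | otherwise          = ready : go decremented
      where
        ready       = Map.keys (Map.filter (== 0) remaining)
        undispatched = foldr Map.delete remaining ready
        unblocked   = [ d | r <- ready, d <- Map.findWithDefault [] r dependents ]
        decremented = foldr (Map.adjust (subtract 1)) undispatched unblocked

exampleDeps :: Deps
exampleDeps = Map.fromList
  [ ("a", []), ("b", []), ("c", ["a", "b"]), ("d", ["c"]) ]

main :: IO ()
main = print (schedule exampleDeps)  -- prints [["a","b"],["c"],["d"]]
```

Each inner list is a batch of computations that could be sent to workers at the same time. In the real system a node becomes ready as soon as its last dependency replies, rather than waiting for the whole batch to finish.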

To start workers on the local server, run

# Start a single worker
stack exec queue slave 127.0.0.1 {port}

# Or start a number of them
./start.sh {number_of_workers}

Then, to start the master, run

stack exec queue master 127.0.0.1 8000 {path/to/file/to/parallelize.hs}

Benchmarking

Run the benchmarks with one of the following:

# Single thread
time stack exec {matmul_ss_test/mlp_example/transformer_example} 
# Distributed parallel
python benchmark_servers.py {num_workers}
# Shared memory parallel (we only wrote SMP for matmul_test)
time stack exec -- matmul_ms_test +RTS -N

To kill all the workers without having to find each process in a process manager, use the script ./kill_all_workers.sh.

Troubleshooting

Testing UDP multicast

The master and worker servers rely on UDP multicast, provided by distributed-process-simplelocalnet, to communicate. You can verify that UDP multicast is configured properly on your network by running:

# Terminal window 1
cd udp-test
make
./sender 239.255.255.250 1900

# Terminal window 2
cd udp-test
./listener 239.255.255.250 1900

If UDP multicast is configured properly (see footnote 1), you should see Hello, World! printed in the terminal where listener is running.

Footnotes

  1. In the particular case of misconfiguration that happened to me while running a Linux machine on Duke's network, I had to run sudo vim /etc/resolv.conf and change the nameserver line to nameserver 152.3.72.100. This might not apply to you, but it may be worth checking.

License: MIT License

