Current version is 6.0.2.
Go to the latest release for download links.
To install PBala, simply extract the contents of one of the attached downloads in the release page and copy the executables PBala
and PBala_task
to the directory where you want to work (you can ignore the include
directory).
Also, make sure you have these lines in your .bashrc
(for sh users):
export PVM_ROOT=/usr/lib/pvm3
export PVM_DPATH=$PVM_ROOT/lib/pvmd
or this in your .cshrc
(for csh users):
setenv PVM_ROOT /usr/lib/pvm3
See the examples directory the examples directory.
See the changelog.
This C program uses PVM libraries in order to create a parallellization interface for
- Maple scripts
- C programs
- Python scripts
- Pari/GP scripts (although Pari should be also supported through the Pari C/C++ library or through gp2c by manually editing the output C code and compiling as a C program using the command provided by gp2c. The executable can then be run using the C module of this software.)
- Sage scripts
- Octave scripts
This interface lets the user execute a same script/program over multiple input data in several CPUs located at the antz computing server. It sports memory management so nodes do not run out of RAM due to too many processes being started in the same node. It also reports resource usage data after execution.
- Parallel Virtual Machine (PVM).
- C compiler
- CMake (version 2.6 or higher)
- Clone this git repository:
git clone https://github.com/oscarsaleta/PBala.git
in a directory of your system, - Change dir into PBala:
cd PBala
, - Create a build directory:
mkdir build && cd build
, - Run the CMake command:
cmake ..
, - Compile the code with
make
, the executables PBala and PBala_task will be found in build/src. - Optionally, perform a system wide install:
sudo make install
, which will install PBala and PBala_task into /usr/local/bin and PBala_config.h into /usr/local/include (this requires root permissions).
Run doxigen
and take a look at html/index.html for documentation.
The program admits standard --help
(-?
), --usage
and --version
(-V
) arguments.
Output from ./PBala --help
:
Usage: PBala [OPTION...] programflag programfile datafile nodefile outdir
PBala -- PVM SPMD execution parallellizer.
programflag argument can be: 0 (Maple), 1 (C), 2 (Python), 3 (Pari), 4 (Sage),
or 5 (Octave)
-c, --custom-process=/path/to/exec
Specify a custom path for the executable program
-e, --create-errfiles Create stderr files
-g, --create-memfiles Create memory files
-h, --create-slavefile Create node file
-k, --kill Kill remainig PBala/PVM processes (WARNING: use at
own risk! Use only if something goes wrong during
an execution and PVM stops working and you have no
other important processes running)
-m, --max-mem-size=MAX_MEM Max memory size of a task (KB)
-s, --maple-single-core Force single core Maple
-?, --help Give this help list
--usage Give a short usage message
-V, --version Print program version
Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.
Report bugs to <osr@mat.uab.cat>.
Output from ./PBala --usage
:
Usage: PBala [-ceghs?V] [-m MAX_MEM] [--create-errfiles] [--create-memfiles]
[--create-slavefile] [--max-mem-size=MAX_MEM] [--maple-single-core]
[--help] [--usage] [--version]
programflag programfile datafile nodefile outdir
Mandatory arguments explained:
programflag
:- 0 = Maple
- 1 = C
- 2 = Python
- 3 = Pari
- 4 = Sage
- 5 = Octave
programfile
: path to program filedatafile
: path to data file- Line format is "tasknumber,arg1,arg2,...,argN"
nodefile
: path to PVM node file- Line format is "nodename number_of_processes"
outdir
: path to output directory
Options explained:
-c, --custom-process=/path/to/exec
: Specify a custom path for the Maple/Python/etc executable to use-e, --create-errfiles
: Save stderr output for each execution in a task_stderr.txt file-g, --create-memfiles
: Save memory info for each execution in a task_mem.txt file-h, --create-slavefile
: Save a log of which task is given to which slave in node_info.txt-m, --max-mem-size=MAX_MEM
: max amount of RAM (in KB) that a single execution can require- Remark: the default behaviour is not giving work to a slave unless more than 15% of the max memory is available
-s, --maple-single-core
: Force Maple to use a single core for its executions
- Computer Algebra Systems (CAS): Maple, PARI and Sage:
- For Maple, we define 2 variables:
taskId
andtaskArgs
. taskId is an identifier for the task number that we are sending to the Maple script.taskArgs
are the actual arguments that Maple has to use to do the computations. It is important to use these names because they are passed to Maple this way. - PARI and Sage are executed by creating auxiliary scripts where
taskId
andtaskArgs
are defined, so they could be directly used in the scripts just like in Maple.
- For Maple, we define 2 variables:
- Programming languages: C, Python:
- For C and Python we use the
argv
arrays so make sure the program can read and use those variables (and perform the error checking because this software has no way of knowing if the data file is suitable for your program).
- For C and Python we use the
- Octave:
- We define the
taskArgs
list as a cell array, so you need to use the proper indexing and accessors to use the stored values (see the Examples section for more information).
- We define the
There is no need of starting PVM using a hostfile (it is actually advised not to do so). Instead, execute the program and we will start PVM for you, using the information extracted from the nodefile.
- Maple: Maple uses multithreading to parallelize the executions by default. This is good for performance but bad for resource management, because the PVM task loses control of the processes spawned. Therefore, the output files task_mem*.txt don't show accurate values for resource usage of the program, because we can only track the parent Maple process, which doesn't do any work besides spawning and controlling its child processes.
- Workaround: we execute Maple with the
-t
flag, so in *_err.txt (output error file) we can see thekernelopts
line that reports memory usage and computation time directly from Maple.
- Workaround: we execute Maple with the
- C, Python, Pari and Sage: as long as the program to be executed is sequential, the PVM task will be able to get resource usage from the execution (using C system call
getrusage()
) and print it to the task_mem*.txt file.
If PBala aborts during execution you can check the return value in the command line to know which kind of error happened. You can also check the output file for more information.
Existing codes:
- PBala.c
- 10: error with command line arguments passed to the program
- 11: error when counting number of lines of nodefile
- 12: error when trying to open nodefile
- 13: error when trying to read nodefile
- 14: error when getting the current working directory
- 15: error when asking pvmd for my TID (task identifier)
- 16: error when asking pvmd for the task parent
- 17: error when counting number of lines of datafile
- 18: error when creating output file
- 19: error when spawning pvm tasks
- 20: error when reading first column of datafile (must be task id)
- 21: error when creating out_dir/node_info.txt (maybe out_dir does not exist?)
- 22: invalid task number
- 23: error when writing to a file (PARI/GP or Sage aux script)
- 24: error when sanitizing Maple script for single CPU execution
- 25: error when there is a duplicate PVM host and PBala fails to remove it
- PBala_task.c
- 10: error when forking process
- 11: task killed by system