A polyglot framework for Factorized ML.
This repo contains code artifacts for Towards A Polyglot Framework for Factorized ML. For more information about the Morpheus line of projects, please see the project page here.
The easiest way to try Trinity is to use our Cloudlab image. There, you'll find a copy of all the source code, with the project fully built and ready to use.
To use our Cloudlab image, instantiate this profile and then head straight to the "Running the benchmarks" section.
cd into /mydata/trinity/graal/.
You should see the following directory structure:
root@node:~$ cd /mydata/trinity/
root@node:/mydata/trinity$ ls
deps fastr graal graaljs graalpython mx mygraal set_enviroment_vars.sh trinity-benchmarking
root@node:/mydata/trinity$
Prepare the environment by running source set_enviroment_vars.sh.
Next, cd into /mydata/trinity/graal/trinity/trinity-benchmarking/.
You should now see the following directory structure:
root@node:/mydata/trinity/graal/trinity/trinity-benchmarking$ ls
fastr_benchmarking_suite graalJS_benchmarking_suite graalpython_benchmarking_suite
These directories contain the benchmarking code for running the FastR (for R), graalJS (JS) and graalpython (Python) experiments in the Trinity paper.
cd into /mydata/trinity/graal/trinity/trinity-benchmarking/graalJS_benchmarking_suite.
To run the JS experiments, run python runEval.py.
If everything goes well, you should see a trace like the following:
root@node:/mydata/trinity/graal/trinity/trinity-benchmarking/graalJS_benchmarking_suite# python runEval.py
RUNNING: TR= 1 FR= 1
mx --dy /compiler,js/graal-nodejs --cp-sfx ../../mxbuild/dists/jdk1.8/morpheusdsl.jar --jdk jvmci node --polyglot morpheus.js ./benchparams/synthesized_logRegJS.json logisticRegression 5000 results_javascript trinity T 1 1
Downloading ICU4J from ['https://repo1.maven.org/maven2/com/ibm/icu/icu4j/67.1/icu4j-67.1.jar', 'https://search.maven.org/remotecontent?filepath=com/ibm/icu/icu4j/67.1/icu4j-67.1.jar']
13106771 bytes (100%)
Beginning benchmarking loop
Create Context for MorpheusDSL
logisticRegression TR= 1 FR= 1
iteration: 0 / 25 | current timeDiff 113.773967771
iteration: 1 / 25 | current timeDiff 74.433360308
iteration: 2 / 25 | current timeDiff 62.463385001
iteration: 3 / 25 | current timeDiff 52.48378036
iteration: 4 / 25 | current timeDiff 50.722579041
iteration: 5 / 25 | current timeDiff 62.261986307
iteration: 6 / 25 | current timeDiff 59.890999968
This runs Logistic Regression on a synthetic dataset for 25 iterations, first for a Trinity Normalized Matrix, and then for "the materialized approach".
After running the experiments, you can find the duration of each training loop, in order and in seconds, in the results_javascript directory.
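If you only have the console trace at hand, the per-iteration timings can be pulled out of it with a few lines of Python. The following is a minimal sketch, not part of the repo: the function names are hypothetical, it assumes the timeDiff values in the trace are per-iteration durations in seconds, and it assumes the first iteration is dominated by JIT warm-up and so is best left out of the average.

```python
import re
from statistics import mean

# Matches trace lines such as "iteration: 3 / 25 | current timeDiff 52.48378036".
ITERATION_LINE = re.compile(
    r"iteration:\s*(\d+)\s*/\s*(\d+)\s*\|\s*current timeDiff\s+([0-9.]+)"
)


def parse_js_trace(trace):
    """Return the per-iteration durations found in a runEval.py trace."""
    return [float(m.group(3)) for m in ITERATION_LINE.finditer(trace)]


def summarize(durations, skip=1):
    """Summarize durations, leaving the first `skip` iterations out of the mean.

    Skipping early iterations is an assumption of this sketch (JIT warm-up),
    not something the benchmark suite prescribes.
    """
    steady = durations[skip:] or durations
    return {
        "count": len(durations),
        "min": min(durations),
        "steady_mean": mean(steady),
    }


# First three iteration lines from the sample trace above.
sample = """\
iteration: 0 / 25 | current timeDiff 113.773967771
iteration: 1 / 25 | current timeDiff 74.433360308
iteration: 2 / 25 | current timeDiff 62.463385001
"""
durations = parse_js_trace(sample)
print(summarize(durations))
```

To use it on a real run, redirect the output of runEval.py to a file and pass the file's contents to parse_js_trace.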
cd into /mydata/trinity/graal/trinity/trinity-benchmarking/fastr_benchmarking_suite.
To run the Python experiments, run python runPyEval.py.
If everything goes well, you should see a trace like the following:
root@node:/mydata/trinity/graal/trinity/trinity-benchmarking/fastr_benchmarking_suite# vim runPyAlgorithms.py
root@node:/mydata/trinity/graal/trinity/trinity-benchmarking/fastr_benchmarking_suite# python runPyAlgorithms.py
RUNNING:
mx --dynamicimports /compiler,graalpython,fastr --cp-sfx ../../mxbuild/dists/jdk1.8/morpheusdsl.jar:../../../../fastr/mxbuild/dists/jdk1.8/fastr.jar --J @-Xmx220G --jdk jvmci python --polyglot ../graalpython_benchmarking_suite/benchmarkRunner.py --fpath ./benchparams/synthesized_py.json --task linearRegression --numWarmups 10000 --mode trinity --monolang False --outputDir results_python --TR 10 --FR 1
['linearRegression']
in method for ‘getNumRows’ with signature ‘"MatrixLibAdapter"’: no definition for class “MatrixLibAdapter”
in method for ‘getNumCols’ with signature ‘"MatrixLibAdapter"’: no definition for class “MatrixLibAdapter”
Creating a new generic function for ‘transpose’ in the global environment
in method for ‘^’ with signature ‘"MatrixLibAdapter","ANY"’: no definition for class “MatrixLibAdapter”
in method for ‘^’ with signature ‘"ANY","MatrixLibAdapter"’: no definition for class “MatrixLibAdapter”
in method for ‘%*%’ with signature ‘"MatrixLibAdapter","ANY"’: no definition for class “MatrixLibAdapter”
in method for ‘%*%’ with signature ‘"ANY","MatrixLibAdapter"’: no definition for class “MatrixLibAdapter”
in method for ‘*’ with signature ‘"MatrixLibAdapter","ANY"’: no definition for class “MatrixLibAdapter”
in method for ‘*’ with signature ‘"ANY","MatrixLibAdapter"’: no definition for class “MatrixLibAdapter”
in method for ‘divisionArr’ with signature ‘"MatrixLibAdapter","ANY"’: no definition for class “MatrixLibAdapter”
in method for ‘divisionArr’ with signature ‘"MatrixLibAdapter","MatrixLibAdapter"’: no definition for class “MatrixLibAdapter”
Create Context for MorpheusDSL
in method for ‘getNumRows’ with signature ‘"MatrixLibAdapter"’: no definition for class “MatrixLibAdapter”
in method for ‘getNumCols’ with signature ‘"MatrixLibAdapter"’: no definition for class “MatrixLibAdapter”
in method for ‘^’ with signature ‘"MatrixLibAdapter","ANY"’: no definition for class “MatrixLibAdapter”
in method for ‘^’ with signature ‘"ANY","MatrixLibAdapter"’: no definition for class “MatrixLibAdapter”
in method for ‘%*%’ with signature ‘"MatrixLibAdapter","ANY"’: no definition for class “MatrixLibAdapter”
in method for ‘%*%’ with signature ‘"ANY","MatrixLibAdapter"’: no definition for class “MatrixLibAdapter”
in method for ‘*’ with signature ‘"MatrixLibAdapter","ANY"’: no definition for class “MatrixLibAdapter”
in method for ‘*’ with signature ‘"ANY","MatrixLibAdapter"’: no definition for class “MatrixLibAdapter”
in method for ‘divisionArr’ with signature ‘"MatrixLibAdapter","ANY"’: no definition for class “MatrixLibAdapter”
in method for ‘divisionArr’ with signature ‘"MatrixLibAdapter","MatrixLibAdapter"’: no definition for class “MatrixLibAdapter”
Warning message:
In numRowsK * numColsK : NAs produced by integer overflow
__________
Beginning benchmark loop
iteration: 0 /25 | time total: 73550
iteration: 1 /25 | time total: 57386
iteration: 2 /25 | time total: 53405
iteration: 3 /25 | time total: 52116
iteration: 4 /25 | time total: 53315
You can ignore those warnings; they're the result of loading only a portion of the R codebase.
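If the warnings make the trace hard to read, they can be filtered out after the fact. The sketch below is an illustration rather than a tool shipped with the repo: the function name is hypothetical, and it assumes that every ignorable line contains the phrase "no definition for class", as the lines in the sample trace do.

```python
# Substring shared by the class-definition warnings in the sample trace above.
# Treating every line that contains it as noise is an assumption of this sketch.
NOISE_MARKER = "no definition for class"


def drop_class_warnings(trace):
    """Return the trace without the lines that contain NOISE_MARKER."""
    return "\n".join(
        line for line in trace.splitlines() if NOISE_MARKER not in line
    )


# A shortened excerpt of the sample trace above.
sample = "\n".join([
    "['linearRegression']",
    "in method for ‘getNumRows’ with signature ‘\"MatrixLibAdapter\"’: "
    "no definition for class “MatrixLibAdapter”",
    "Create Context for MorpheusDSL",
    "Beginning benchmark loop",
    "iteration: 0 /25 | time total: 73550",
])
print(drop_class_warnings(sample))
```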
After running the experiments, you can find the duration of each training loop, in order and in milliseconds, in the ./graalpython_benchmarking/results_python directory.
cd into /mydata/trinity/graal/trinity/trinity-benchmarking/fastr_benchmarking_suite.
To run the R experiments, run python runREval.py.
If everything goes well, you should see a trace like the following:
root@node:/mydata/trinity/graal/trinity/trinity-benchmarking/fastr_benchmarking_suite# python runAlgos.py
mx --dynamicimports fastr,/compiler --cp-sfx ../../mxbuild/dists/jdk1.8/morpheusdsl.jar --J @'-Xmx220G' --jdk jvmci R --polyglot -f benchmarkRunner.r --args -fpath ./benchparams/movie_metadata.json -task linearRegression -outputDir results_R -mode trinity -TR 1 -FR 1
R version 3.6.1 (FastR)
Copyright (c) 2013-19, Oracle and/or its affiliates
Copyright (c) 1995-2018, The R Core Team
Copyright (c) 2018 The R Foundation for Statistical Computing
Copyright (c) 2012-4 Purdue University
Copyright (c) 1997-2002, Makoto Matsumoto and Takuji Nishimura
All rights reserved.
FastR is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
... a large R source-code trace ...
Create Context for MorpheusDSL
[1] "Execution mode: trinity Experiment: linearRegression TR-FR: ( 1 , 1 )"
[1] "Testing performance"
[1] "WARMING: 1/5"
[1] "WARMING: 2/5"
[1] "WARMING: 3/5"
After running the experiments, you can find the duration of each training loop, in order and in milliseconds, in the results_R directory.
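Note that the JS results are reported in seconds while the Python and R results are in milliseconds, so the numbers need a unit conversion before they can be compared. A minimal sketch of that conversion follows. The helper name is hypothetical, and the input values are the first five "time total" figures from the sample Python trace earlier in this README, assumed here to be milliseconds.

```python
from statistics import mean

MS_PER_SECOND = 1000.0


def ms_to_seconds(durations_ms):
    """Convert a list of millisecond durations to seconds."""
    return [d / MS_PER_SECOND for d in durations_ms]


# "time total" values copied from the sample Python trace (assumed milliseconds).
python_trace_ms = [73550, 57386, 53405, 52116, 53315]
python_trace_s = ms_to_seconds(python_trace_ms)

print("per-iteration seconds:", python_trace_s)
print("mean seconds:", mean(python_trace_s))
```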
If you wish to build Trinity from scratch, you'll need the following dependencies:
- Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-176-generic x86_64) for the OS. This isn't strictly required, but this is the OS used to develop the paper.
- OpenJDK with JVMCI release version 20.2-b01: openjdk1.8.0_252-jvmci-20.2-b01
- A clone of mx at commit 13a4df76efdd150b8fdb7b492f5e4dadc5c5383f.
- An installation of GNU-R, as it helps ease the dependency burden when building FastR.
- For building FastR: A Fortran compiler and libraries. Typically gfortran 4.8 or later
- For building FastR: The pcre package, version 8.38 or later
- For building FastR: The zlib package, version 1.2.8 or later
- For building FastR: The ed, sed, and make utilities (usually available on modern *nix systems)
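Before starting a build, it can help to confirm that the command-line tools from the list above are actually installed. The following is a rough sketch using only the Python standard library. The function name is hypothetical, it only checks that each executable is on PATH (not its version), and it does not cover the pcre and zlib libraries, which are not executables.

```python
import shutil

# Executables named in the dependency list above. "R" is the conventional
# launcher name for GNU R; versions are NOT checked by this sketch.
REQUIRED_TOOLS = ("gfortran", "ed", "sed", "make", "R")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All listed command-line tools were found.")
```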
- Create a target directory where the source code will live. From now on, we'll refer to it as the reproducer directory.
- Clone our fork of GraalVM in the reproducer directory.
- Clone this repo inside the ./graal/trinity/ directory.
- Clone the mx repo at commit 13a4df76efdd150b8fdb7b492f5e4dadc5c5383f in the trinity directory.
- Clone the fastR directory at commit 369741e3972688cd782a7e57ddb5b23257f07315 in the reproducer directory.
- Clone the graalJS directory at commit 4ff10d928938583a9562189058762165643cc2fb in the reproducer directory.
- Clone the graalpython directory at commit ff3d6a887f54eaff3927049f45963d588ba932b6 in the reproducer directory.
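The pinned checkouts above can be scripted so the commit hashes are not retyped by hand. The sketch below only builds the git command lines and does not run them. The function name is hypothetical, the commit hashes are copied from the steps above, and the repository URLs are not given in this README, so they are left as a parameter the caller must supply.

```python
import posixpath

# Commit hashes copied from the steps above.
PINNED_COMMITS = {
    "mx": "13a4df76efdd150b8fdb7b492f5e4dadc5c5383f",
    "fastr": "369741e3972688cd782a7e57ddb5b23257f07315",
    "graaljs": "4ff10d928938583a9562189058762165643cc2fb",
    "graalpython": "ff3d6a887f54eaff3927049f45963d588ba932b6",
}


def clone_commands(name, url, parent_dir):
    """Build the git commands that clone `name` and check out its pinned commit.

    `url` is a placeholder supplied by the caller; this README does not
    state the repository URLs.
    """
    target = posixpath.join(parent_dir, name)
    return [
        ["git", "clone", url, target],
        ["git", "-C", target, "checkout", PINNED_COMMITS[name]],
    ]


for command in clone_commands("fastr", "<fastr-repo-url>", "/path/to/reproducer"):
    print(" ".join(command))
```

Each returned command can be passed to subprocess.run(command, check=True) once the placeholder URL and directory are replaced with real values.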
- In the reproducer directory, run mkdir -p deps/uncompressed/
- Run cd deps/uncompressed to get to the directory you just created. There, unpack your OpenJDK installation. You should now have a directory named openjdk1.8.0_252-jvmci-20.2-b01 inside deps/uncompressed/.
- Head back to the reproducer directory.
- Copy the set_enviroment_vars.sh from this repo, trinity, by running cp set_enviroment_vars.sh .
- Run source set_enviroment_vars.sh. This will load the JDK and mx into your PATH. It will also set your JAVA_HOME environment variable.
- Head to the graal directory via cd graal.
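After sourcing the script, a quick sanity check can confirm that the environment looks as described. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and it assumes JAVA_HOME ends up pointing somewhere inside the openjdk1.8.0_252-jvmci-20.2-b01 directory, which the script's description suggests but this README does not spell out.

```python
import os
import shutil

# Directory name of the unpacked JDK, as given in the steps above.
JDK_DIR_NAME = "openjdk1.8.0_252-jvmci-20.2-b01"


def environment_problems(env=None, which=shutil.which):
    """Return a list of human-readable problems with the build environment."""
    env = os.environ if env is None else env
    problems = []

    java_home = env.get("JAVA_HOME", "")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif JDK_DIR_NAME not in java_home:
        # Assumption: the script points JAVA_HOME at the unpacked JDK.
        problems.append("JAVA_HOME does not point at " + JDK_DIR_NAME)

    # Look for mx on the PATH of the environment being checked.
    if which("mx", path=env.get("PATH")) is None:
        problems.append("mx is not on PATH")

    return problems


for problem in environment_problems():
    print("WARNING:", problem)
```

Run it from the same shell in which you sourced set_enviroment_vars.sh; an empty output means both checks passed.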
- In the graal directory, you'll need to build several components using mx. So head to graal/compiler, graal/sdk, graal/sulong/, graal/tools, graal/truffle and graal/vm, and inside each of them run mx build.
- Head back to the reproducer directory. Then cd to fastr and run mx build in there as well. If this step fails, you might be missing some system dependency. Review these docs for help.
- Head back to the reproducer directory. Then cd into graaljs. There, run mx build inside graal-nodejs and graal-js.
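The build steps above amount to running mx build in nine directories, in order. They can be sketched as a small Python driver. This is an illustration, not a script from the repo: the function names are hypothetical, the directory names are taken from the steps above, and run_plan assumes mx is already on PATH (that is, set_enviroment_vars.sh has been sourced).

```python
import posixpath
import subprocess

# Component directories named in the build steps above, in the order listed.
GRAAL_COMPONENTS = ("compiler", "sdk", "sulong", "tools", "truffle", "vm")
GRAALJS_COMPONENTS = ("graal-nodejs", "graal-js")


def build_plan(reproducer_dir):
    """Return (working_directory, command) pairs for every mx build step."""
    dirs = [posixpath.join(reproducer_dir, "graal", c) for c in GRAAL_COMPONENTS]
    dirs.append(posixpath.join(reproducer_dir, "fastr"))
    dirs += [posixpath.join(reproducer_dir, "graaljs", c) for c in GRAALJS_COMPONENTS]
    return [(d, ["mx", "build"]) for d in dirs]


def run_plan(plan):
    """Run each step in order, stopping at the first failing build."""
    for cwd, command in plan:
        print("building in", cwd)
        subprocess.run(command, cwd=cwd, check=True)


# Print the plan without executing it; call run_plan(...) to actually build.
for cwd, command in build_plan("/path/to/reproducer"):
    print(cwd, "->", " ".join(command))
```

Because check=True makes subprocess.run raise on a non-zero exit code, a failed component stops the sequence instead of letting later builds run against a broken one.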
- Install numpy via mx --dy graalpython python -m ginstall install numpy
- To install math.js, cd into /mydata/trinity/graal/trinity/trinity-benchmarking/graalJS_benchmarking_suite and run npm install mathjs
- To install data.table and Matrix for R, cd into /mydata/trinity/graal/trinity/trinity-benchmarking/fastr_benchmarking_suite and run mx --dy fastR R. Then, in the shell, run install.fastr.packages(c("Matrix", "data.table"))