josephchenhk / prog-eval

Programmability evaluation

Programmability Evaluation Repo Instructions

  1. Download your key file here: <KEYFILE.pem>

  2. Access your virtual machine by running

ssh -i <KEYFILE.pem> ubuntu@<SYSTEM_IP>

Notes:

  • VMs may take a couple of minutes to start
  • If you get a warning about permissions, you may have to run chmod 600 <KEYFILE.pem>
  • If you're using Windows, you may have to convert the key to .ppk
  3. Once you log into your VM, run
sh startEval.sh

to enter a Docker container that contains the system documented below. If you forget to do this, you will run into lots of errors (missing dependencies, etc.).

  4. Navigate to /data/test/{task} according to the task you've been assigned.

The /data/test/{task} directory contains a number of files:

├── README.md ──────── documentation
├── docs ───────────── more documentation (figures, etc.)
│
├── run.sh ─────────── script showing how code should be run
├── prep.py ────────── code for dataset preprocessing
│                      Note: The data is already preprocessed, so you do not need to run this script
│
├── data ───────────── data (already preprocessed)
│
├── main-redacted.py ─ implementation skeleton
├── logs ───────────── verbose logs from a reference implementation to aid testers
└── validate.py ────── validation script

You should start by reading the README.md and any associated materials.

Your task as a tester is to write a functional main.py that passes the correctness check in validate.py. main-redacted.py is an implementation skeleton that illustrates how I/O should work, specifies all of the necessary parameters, and roughs out one way that a correct solution could be implemented. You do not have to use any code from main-redacted.py in your implementation, but you must write your results in exactly the same format and location, or the correctness check in validate.py will fail. Note that some of the functions referenced in main-redacted.py may not be available in the performer system you're testing.
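
For orientation only, a minimal main.py usually has the shape sketched below. Every argument, file name, and output path here is hypothetical; the real parameters and the required output format and location are defined in main-redacted.py and the task README, which take precedence.

import argparse

import numpy as np


def parse_args():
    # Hypothetical arguments -- the real ones are defined in main-redacted.py / run.sh
    parser = argparse.ArgumentParser()
    parser.add_argument('--inpath', type=str, default='data/example.npy')
    parser.add_argument('--outpath', type=str, default='scores.npy')
    return parser.parse_args()


def run_algorithm(X):
    # Placeholder for the task's actual algorithm (see README.md)
    return X.sum(axis=1)


if __name__ == '__main__':
    args = parse_args()

    X = np.load(args.inpath)       # load preprocessed data
    scores = run_algorithm(X)      # compute results

    # Results must be written in the exact format and location that validate.py expects
    np.save(args.outpath, scores)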

You do not have to run prep.py -- preprocessed data can be found in the data directory. We've included prep.py to provide insight into the format of the preprocessed data, which may or may not be helpful.
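
If you want a quick look at the preprocessed data before writing any code, a short inspection script is usually enough. The file name below is hypothetical; list the data directory and read prep.py to see what your task actually provides.

import numpy as np

# Hypothetical file name -- check the data/ directory for the actual files
X = np.load('data/example.npy')

print(type(X), X.dtype, X.shape)
print(X[:5])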

The logs directory contains verbose logs from the reference implementation, included as a debugging aid for testers. Some workflows (lgc and ipnsw) run the same function on multiple inputs; in those cases, the logs only contain output for the first input. For neural network workflows, we show only a trace of the first training iteration.
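
One way to use the logs is to print your own intermediate values in the same order and compare them numerically rather than by eye. The sketch below assumes both files contain one floating-point value per line, which is purely an assumption; the actual log format varies by task.

import numpy as np

# Hypothetical file names and format -- adapt to the logs for your task
ref  = np.loadtxt('logs/reference.log')
mine = np.loadtxt('my_debug.log')

n = min(len(ref), len(mine))
bad = np.where(~np.isclose(ref[:n], mine[:n], rtol=1e-4))[0]
print('all compared values match' if len(bad) == 0 else 'first mismatch at value %d' % bad[0])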

Note that the validate.py scripts require python>=3.6 and may require some common packages (numpy and scipy). The scripts should not be particularly sensitive to package versions, but the versions the tests were written against are listed in install.sh.
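
A quick way to check that your environment roughly matches what the tests were written against is to print the versions and compare them with the ones pinned in install.sh:

import sys

import numpy
import scipy

print('python:', sys.version.split()[0])
print('numpy :', numpy.__version__)
print('scipy :', scipy.__version__)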

  5. Implement the algorithm! As shown in run.sh, use validate.py to make sure that your results are correct.

  6. Once you have a correct implementation, try to improve the performance of your implementation using the profiling tools provided with your specific system.

  7. Important: When you are done:

  • You must write your results to the location specified in main-redacted.py, and they must pass the correctness check in validate.py, or your submission will be considered incomplete.
  • The specific system you're testing has some way to measure/estimate the performance of your code. You must save this output to a file called /data/test/{task}/profiling_results.txt or your submission will be considered incomplete. (One way to capture output to that file is sketched below.)
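
The profiling output you submit must come from the measurement tools of the system you are testing, so the snippet below is only a generic fallback for sanity checking. It shows one way to time and profile a Python entry point and write the result to the required file; replace {task}, and the work() placeholder, with your actual task and code.

import time
import cProfile
import pstats


def work():
    # Placeholder -- call into your actual implementation here
    pass


profiler = cProfile.Profile()
t0 = time.time()
profiler.enable()
work()
profiler.disable()
elapsed = time.time() - t0

# Required output location (replace {task} with your assigned task)
with open('/data/test/{task}/profiling_results.txt', 'w') as f:
    f.write('wall-clock time: %.3f seconds\n' % elapsed)
    pstats.Stats(profiler, stream=f).sort_stats('cumulative').print_stats(20)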

License: MIT License

