Telamon

Telamon is a framework to find good combinations of optimizations for computational kernels on GPUs. It currently focuses on dense linear algebra. For more information on how it works internally, we encourage you to read our paper.

Getting Started

To compile Telamon, you need Rust 1.27 or higher installed. If you want to generate code for GPUs, you will also need a CUDA toolchain installed, with cuda, curand and cupti accessible in the include and library paths. You can also view the documentation on GitHub.

Examples of kernels are located in the kernels/ directory. In particular, kernels/src/linalg.rs contains linear algebra kernels. You can compare the code generated by Telamon to state-of-the-art implementations on GPUs by running

cargo bench --features=cuda --bench=cuda-search

in the kernels/ directory. To see the progress of the exploration, set the environment variable RUST_LOG=telamon::explorer=warn by prefixing the command with it.

Writing a Kernel

To write a kernel, you must first define the inputs of the kernel and the context to optimize for. Here, we assume we optimize for a CUDA GPU, but the process is the same for other backends.

use telamon::device::cuda;
use telamon::helper;

let _ = env_logger::init(); // Enable logging
let executor = cuda::Executor::init(); // Setup the interface with the device.
// Build the signature and bind the inputs in the context.
let mut context = cuda::Context::new(&executor);
let a;
let signature = {
    let mut sig_builder = helper::SignatureBuilder::new("my_kernel", &mut context);
    // Create a signature with two arguments: a scalar `n` and an array of floats `a`. We
    // give the value we want to optimize for to each argument.
    sig_builder.scalar("n", 1000i32);
    a = sig_builder.array::<f32>("a", 1000); // Creates an array of size 1000.

    sig_builder.get()
};

We can now describe the body of the kernel itself. Here we create a kernel that computes a[i] = 2*i for each i in 0..n. For that, we use a builder that creates the loops and the instructions for us. The builder keeps the list of open loops and nests new instructions in them.

use telamon::ir;

let mut builder = helper::Builder::new(&signature, context.device());

// Open a loop of size n.
let size = builder.param_size("n");
let dim0 = builder.open_dim(size);
// Compute `x = 2*i` where `i` is the index on loop `dim0`.
let x = builder.mul(&dim0, &2i32);
// Store `x` in `a[i]`. For that, we first compute the address of `a[i]` and build a
// pattern that describes the memory access for the performance model.
let (addr, access_pattern) = builder.tensor_access(&"a", a, &ir::Type::I(32), &[&dim0]);
builder.st(&addr, x, access_pattern);

// Close the loop.
builder.close_dim(&dim0);

let search_space = builder.get();
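
The way a builder can keep track of open loops and nest new instructions in them can be illustrated with a small standalone sketch. This is not Telamon's implementation or API: `ToyBuilder`, `ToyInst` and `build_example` are hypothetical names used only for illustration, and instructions are reduced to plain strings.

```rust
/// A toy instruction: an operation name plus the loops it is nested in.
/// (Hypothetical type, not part of Telamon.)
#[derive(Debug, Clone, PartialEq)]
struct ToyInst {
    op: String,
    dims: Vec<usize>,
}

/// A toy builder that remembers which loops are currently open.
/// (Hypothetical type, not part of Telamon.)
#[derive(Default)]
struct ToyBuilder {
    next_dim: usize,
    open_dims: Vec<usize>,
    insts: Vec<ToyInst>,
}

impl ToyBuilder {
    /// Opens a new loop and returns its identifier.
    fn open_dim(&mut self) -> usize {
        let id = self.next_dim;
        self.next_dim += 1;
        self.open_dims.push(id);
        id
    }

    /// Closes a loop: later instructions are no longer nested in it.
    fn close_dim(&mut self, dim: usize) {
        self.open_dims.retain(|&d| d != dim);
    }

    /// Records an instruction nested in every loop that is open right now.
    fn inst(&mut self, op: &str) {
        self.insts.push(ToyInst {
            op: op.to_string(),
            dims: self.open_dims.clone(),
        });
    }
}

/// Mirrors the shape of the kernel above: one loop containing a
/// multiplication and a store, then one instruction after the loop.
fn build_example() -> Vec<ToyInst> {
    let mut builder = ToyBuilder::default();
    let dim0 = builder.open_dim();
    builder.inst("mul");
    builder.inst("st");
    builder.close_dim(dim0);
    builder.inst("after_loop");
    builder.insts
}
```

Calling `build_example()` yields three instructions: the first two carry the identifier of the open loop, while the last one, created after the loop is closed, is nested in no loop.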

We are now ready to start the search space exploration to find the best candidate.

use telamon::explorer;

let best = explorer::find_best(explorer::config::read(), &context, search_space).unwrap();
context.device().gen_code(&best, &mut std::io::stdout());
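
One way to sanity-check a generated kernel is to compare its output with a plain CPU reference. The sketch below is standalone Rust, not Telamon code: `reference` and `first_mismatch` are hypothetical helpers, the values are assumed to be 32-bit integers as in the store above, and how the output array is read back from the device is not covered here.

```rust
/// CPU reference for the example kernel: element i holds 2*i, for i in 0..n.
/// (Hypothetical helper, not part of Telamon.)
fn reference(n: i32) -> Vec<i32> {
    (0..n).map(|i| 2 * i).collect()
}

/// Returns the index of the first element where `output` differs from the
/// reference, or `None` if they match. If the lengths differ and the common
/// prefix matches, the length of the shorter one is reported.
fn first_mismatch(output: &[i32], n: i32) -> Option<usize> {
    let expected = reference(n);
    let mismatch = output.iter().zip(&expected).position(|(o, e)| o != e);
    match mismatch {
        Some(index) => Some(index),
        None if output.len() != expected.len() => Some(output.len().min(expected.len())),
        None => None,
    }
}
```

For example, `first_mismatch(&[0, 2, 4, 6], 4)` returns `None`, while `first_mismatch(&[0, 2, 5, 6], 4)` returns `Some(2)`.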

License

Telamon is released under the Apache License (version 2.0). See LICENSE for more details.
