# candle
Candle is a minimalist ML framework for Rust with a focus on performance (including GPU support) and ease of use. Try our online demos: whisper, llama2.

Get started with a simple matrix multiplication:
```rust
use candle_core::{Device, Tensor};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let a = Tensor::randn(0f32, 1., (2, 3), &Device::Cpu)?;
    let b = Tensor::randn(0f32, 1., (3, 4), &Device::Cpu)?;
    let c = a.matmul(&b)?;
    println!("{c}");
    Ok(())
}
```
## Check out our examples
- Whisper: speech recognition model.
- Llama and Llama-v2: general LLM.
- Falcon: general LLM.
- Bert: useful for sentence embeddings.
- StarCoder: LLM specialized to code generation.
Run them using the following commands:
```bash
cargo run --example whisper --release
cargo run --example llama --release
cargo run --example falcon --release
cargo run --example bert --release
cargo run --example bigcode --release
```
In order to use CUDA, add `--features cuda` to the example command line.
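For instance, to run the llama example on a CUDA-enabled machine:

```bash
cargo run --example llama --release --features cuda
```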
There are also some wasm examples for whisper and llama2.c. You can either build them with trunk or try them online: whisper, llama2.
For llama2, run the following command to retrieve the weight files and start a test server:
```bash
cd candle-wasm-examples/llama2-c
wget https://huggingface.co/spaces/lmz/candle-llama2/resolve/main/model.bin
wget https://huggingface.co/spaces/lmz/candle-llama2/resolve/main/tokenizer.json
trunk serve --release --public-url /candle-llama2/ --port 8081
```
And then head over to http://localhost:8081/candle-llama2.
## Features
- Simple syntax, looks and feels like PyTorch.
- CPU and CUDA backends, M1, f16, bf16.
- Serverless (on CPU): small and fast deployments.
- WASM support, run your models in a browser.
- Model training.
- Distributed computing using NCCL.
- Model support out of the box: Llama, Whisper, Falcon, StarCoder...
- Embed user-defined ops/kernels, such as flash-attention v2.
## How to use
Cheatsheet:
|            | Using PyTorch                       | Using Candle                                                                  |
|------------|-------------------------------------|-------------------------------------------------------------------------------|
| Creation   | `torch.Tensor([[1, 2], [3, 4]])`    | `Tensor::new(&[[1f32, 2.], [3., 4.]], &Device::Cpu)?`                         |
| Creation   | `torch.zeros((2, 2))`               | `Tensor::zeros((2, 2), DType::F32, &Device::Cpu)?`                            |
| Indexing   | `tensor[:, :4]`                     | `tensor.i((.., ..4))?`                                                        |
| Operations | `tensor.view((2, 2))`               | `tensor.reshape((2, 2))?`                                                     |
| Operations | `a.matmul(b)`                       | `a.matmul(&b)?`                                                               |
| Arithmetic | `a + b`                             | `&a + &b`                                                                     |
| Device     | `tensor.to(device="cuda")`          | `tensor.to_device(&Device::new_cuda(0)?)?`                                    |
| Dtype      | `tensor.to(dtype=torch.float16)`    | `tensor.to_dtype(DType::F16)?`                                                |
| Saving     | `torch.save({"A": A}, "model.bin")` | `candle::safetensors::save(&HashMap::from([("A", A)]), "model.safetensors")?` |
| Loading    | `weights = torch.load("model.bin")` | `candle::safetensors::load("model.safetensors", &device)?`                    |
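Putting a few of these rows together, here is a minimal sketch (the variable names are just for illustration; slicing with `.i(...)` comes from the `IndexOp` trait):

```rust
use candle_core::{DType, Device, IndexOp, Tensor};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let device = Device::Cpu;
    // Creation: candle's analogue of torch.Tensor([[1, 2, 3, 4], [5, 6, 7, 8]]).
    let t = Tensor::new(&[[1f32, 2., 3., 4.], [5., 6., 7., 8.]], &device)?;
    // Indexing: all rows, first two columns, like tensor[:, :2] in PyTorch.
    let s = t.i((.., ..2))?;
    // Operations and dtype: reshape the 2x2 slice to 4x1, then cast to f16.
    let r = s.reshape((4, 1))?.to_dtype(DType::F16)?;
    println!("{r}");
    Ok(())
}
```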
## Structure
- candle-core: Core ops, devices, and `Tensor` struct definition.
- candle-nn: Tools to build real models.
- candle-examples: Examples of using the library in realistic settings.
- candle-kernels: CUDA custom kernels.
- candle-datasets: Datasets and data loaders.
- candle-transformers: transformers-related utilities.
- candle-flash-attn: Flash attention v2 layer.
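To depend on these crates from your own project, a minimal sketch (assuming you want the published crates.io releases):

```bash
cargo add candle-core candle-nn
```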
## FAQ
### Why should I use Candle?
Candle's core goal is to make serverless inference possible. Full machine learning frameworks like PyTorch are very large, which makes creating instances on a cluster slow. Candle allows deployment of lightweight binaries.
Secondly, Candle lets you remove Python from production workloads. Python overhead can seriously hurt performance, and the GIL is a notorious source of headaches.
Finally, Rust is cool! A lot of the HF ecosystem already has Rust crates, like safetensors and tokenizers.
### Other ML frameworks
- dfdx is a formidable crate, with shapes included in types. This prevents a lot of headaches by getting the compiler to complain about shape mismatches right off the bat. However, we found that some features still require nightly, and writing code can be a bit daunting for non-Rust experts. We're leveraging and contributing to other core crates for the runtime, so hopefully both crates can benefit from each other.
- burn is a general crate that can leverage multiple backends so you can choose the best engine for your workload.
- tch-rs provides bindings to the torch library in Rust. Extremely versatile, but it brings the entire torch library into the runtime. The main contributor of tch-rs is also involved in the development of candle.
### Missing symbols when compiling with the mkl feature.
If you get some missing symbols when compiling binaries/tests using the mkl feature, e.g.:
```text
  = note: /usr/bin/ld: (....o): in function `blas::sgemm':
          .../blas-0.22.0/src/lib.rs:1944: undefined reference to `sgemm_'
          collect2: error: ld returned 1 exit status

  = note: some `extern` functions couldn't be found; some native libraries may need to be installed or have their path specified
  = note: use the `-l` flag to specify native libraries to link
  = note: use the `cargo:rustc-link-lib` directive to specify the native libraries to link with Cargo (see https://doc.rust-lang.org/cargo/reference/build-scripts.html#cargorustc-link-libkindname)
```
This is likely due to a missing linker flag that was needed to enable the mkl library. You can try adding the following at the top of your binary:
```rust
extern crate intel_mkl_src;
```
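As a fuller sketch, here is what that looks like in a binary's `main.rs` (the surrounding program is illustrative, not part of the fix, and assumes `intel-mkl-src` is already listed as a dependency):

```rust
// Referencing the crate makes the linker pull in the MKL symbols.
extern crate intel_mkl_src;

use candle_core::{Device, Tensor};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Any BLAS-backed op such as matmul now links against MKL.
    let a = Tensor::randn(0f32, 1., (4, 4), &Device::Cpu)?;
    println!("{}", a.matmul(&a)?);
    Ok(())
}
```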
### Tracking down errors
You can set `RUST_BACKTRACE=1` to be provided with backtraces when a candle error is generated.
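For example, when running one of the examples above:

```bash
RUST_BACKTRACE=1 cargo run --example llama --release
```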