bakjos / arrow-udf

Arrow User-Defined Function Framework.

Arrow User-Defined Functions Framework

Easily create and run user-defined functions (UDFs) on Apache Arrow. You can define functions in Rust, Python, or JavaScript, and run them natively or on WebAssembly.

Language      Native              WebAssembly
Rust          arrow-udf           arrow-udf-wasm
Python        arrow-udf-python    N/A
JavaScript    arrow-udf-js        N/A

Usage

You can integrate this library into your Rust project to quickly define and use custom functions.

Add the following lines to your Cargo.toml:

[dependencies]
arrow-udf = "0.2"

Define your functions with the #[function] macro:

use arrow_udf::function;

#[function("gcd(int, int) -> int", output = "eval_gcd")]
fn gcd(mut a: i32, mut b: i32) -> i32 {
    while b != 0 {
        (a, b) = (b, a % b);
    }
    a
}

The macro generates a function that takes a RecordBatch as input and returns a RecordBatch as output. You can name the generated function with the optional output parameter. If it is not specified, the function gets an auto-generated name such as gcd_int4_int4_int4_eval.

You can then call the generated function on a RecordBatch:

let input: RecordBatch = ...;
let output: RecordBatch = eval_gcd(&input).unwrap();

If you print the input and output batches side by side, they look like this (a row with a null input produces a null output):

 input     output
+----+----+-----+
| a  | b  | gcd |
+----+----+-----+
| 15 | 25 | 5   |
|    | 1  |     |
+----+----+-----+
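The table above can be reproduced conceptually with plain Rust. The following is a minimal sketch using only the standard library, not the code that the #[function] macro actually generates: columns are modeled as slices of Option<i32> instead of Arrow arrays, and eval_gcd_rows is a hypothetical helper name. It assumes that a null in either input yields a null output, as the second row of the table suggests.

```rust
/// Plain-Rust copy of the scalar function from the example above.
fn gcd(mut a: i32, mut b: i32) -> i32 {
    while b != 0 {
        (a, b) = (b, a % b);
    }
    a
}

/// Hypothetical stand-in for the generated batch function.
/// Each column is a slice of `Option<i32>`, where `None` models an Arrow null.
/// A row evaluates to `None` when either input is `None` (assumed behavior).
fn eval_gcd_rows(a: &[Option<i32>], b: &[Option<i32>]) -> Vec<Option<i32>> {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| match (x, y) {
            (Some(x), Some(y)) => Some(gcd(*x, *y)),
            _ => None,
        })
        .collect()
}

fn main() {
    // Same two rows as in the table: (15, 25) and (null, 1).
    let a = [Some(15), None];
    let b = [Some(25), Some(1)];
    let out = eval_gcd_rows(&a, &b);
    println!("{:?}", out);
}
```

The real generated function works on Arrow arrays inside a RecordBatch rather than on slices, so treat this only as a model of the row-by-row semantics.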

See arrow-udf for more details.

Benchmarks

We have benchmarked the performance of function calls in different environments. You can run the benchmarks with the following command:

cargo bench --bench wasm

Performance comparison of calling gcd on a chunk of 1024 rows:

benchmark           time        relative to native
gcd/native          1.4476 µs   x1
gcd/wasm            16.006 µs   x11
gcd/js              82.103 µs   x57
gcd/python          122.52 µs   x85
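The multipliers in the last column follow from dividing each timing by the native timing. As a small sketch of that arithmetic (the helper names slowdown and per_row_ns are hypothetical, and the numbers are copied from the listing above):

```rust
/// Slowdown of `other_us` relative to `baseline_us`, rounded to the nearest integer.
fn slowdown(baseline_us: f64, other_us: f64) -> u32 {
    (other_us / baseline_us).round() as u32
}

/// Average cost per row in nanoseconds for a batch that took `batch_us` microseconds.
fn per_row_ns(batch_us: f64, rows: u32) -> f64 {
    batch_us * 1000.0 / rows as f64
}

fn main() {
    let native = 1.4476; // µs per 1024-row chunk, from the listing above
    let others = [("wasm", 16.006), ("js", 82.103), ("python", 122.52)];

    for (name, time_us) in others {
        println!("gcd/{}: x{}", name, slowdown(native, time_us));
    }
    println!("native: {:.2} ns per row", per_row_ns(native, 1024));
}
```

On these figures the native call costs roughly 1.4 ns per row, which helps put the relative overhead of the other runtimes in perspective.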

About

License: Apache License 2.0


Languages

Rust 83.4%, JavaScript 16.4%, TypeScript 0.1%