abs-tudelft / time-to-fly-high

Benchmarking Arrow Flight - A wire-speed protocol for data transfer, querying and microservices

Benchmarking Arrow Flight

To measure client-server performance (local and remote), we used the Arrow Flight benchmark; the Python script is placed here.

Cartesius local node example:

singularity exec  /scratch-shared/tahmad/bio_data/flight.simg /arrow/cpp/release/release/arrow-flight-benchmark --server_host tcn541

singularity exec  /scratch-shared/tahmad/bio_data/flight.simg /arrow/cpp/release/release/arrow-flight-perf-server --server_host tcn541
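The two commands above can also be driven from a script. The following is a minimal sketch, assuming that the perf server should be listening before the benchmark client connects and that a short fixed startup delay is sufficient. The helper names (`flight_command`, `run_local_benchmark`) and the delay value are illustrative and not part of this repository; the image path, binary paths and the `--server_host` flag are taken from the example above.

```python
import subprocess
import time

# Paths copied from the Cartesius example above.
IMAGE = "/scratch-shared/tahmad/bio_data/flight.simg"
BIN_DIR = "/arrow/cpp/release/release"


def flight_command(binary, server_host):
    """Build the `singularity exec` argument list for one Flight binary."""
    return [
        "singularity", "exec", IMAGE,
        f"{BIN_DIR}/{binary}",
        "--server_host", server_host,
    ]


def run_local_benchmark(server_host, startup_delay=2.0):
    """Start the perf server, run the benchmark client, then stop the server.

    startup_delay is an assumed grace period (hypothetical value), giving
    the server time to start listening before the client connects.
    """
    server = subprocess.Popen(
        flight_command("arrow-flight-perf-server", server_host)
    )
    try:
        time.sleep(startup_delay)
        result = subprocess.run(
            flight_command("arrow-flight-benchmark", server_host),
            capture_output=True, text=True, check=True,
        )
        return result.stdout
    finally:
        # Always shut the server down, even if the client fails.
        server.terminate()
        server.wait()
```

Calling `run_local_benchmark("tcn541")` on the node would return the benchmark client's standard output as a string.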

For querying the NYC Taxi dataset on remote Dremio (client-server) nodes with a varying number of records (1-16 million), implementations using different protocols (ODBC, turbodbc and Arrow Flight) are available here.
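A sweep over record counts can be sketched independently of the protocol. The example below is a hedged illustration, not the repository's script: it assumes the record count is controlled with a SQL `LIMIT` clause and that the sweep doubles from 1 to 16 million. The `fetch` callable, the function names and the table name passed by the caller are all hypothetical; `fetch` stands in for a thin wrapper around an ODBC, turbodbc or Arrow Flight client.

```python
import time

# Assumed sweep: 1, 2, 4, 8 and 16 million records (doubling steps).
RECORD_COUNTS = tuple(n * 1_000_000 for n in (1, 2, 4, 8, 16))


def build_query(table, limit):
    """Return a LIMIT query; the table name is supplied by the caller."""
    return f"SELECT * FROM {table} LIMIT {limit}"


def time_queries(fetch, table, record_counts=RECORD_COUNTS):
    """Time fetch(query) once for each record count.

    fetch is any callable that takes the SQL text and returns an object
    with a length (rows, a table, ...). It is a placeholder for the
    protocol-specific client code.
    """
    results = []
    for limit in record_counts:
        query = build_query(table, limit)
        start = time.perf_counter()
        rows = fetch(query)
        elapsed = time.perf_counter() - start
        results.append(
            {"records": limit, "returned": len(rows), "seconds": elapsed}
        )
    return results
```

Passing the same `table` and `record_counts` with a different `fetch` wrapper for each protocol keeps the timing loop identical across ODBC, turbodbc and Arrow Flight, so only the transport differs between runs.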

Starting Dremio:

./dremio-community-15.0.0-202103312106020527-0be9c719/bin/dremio start
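Because the server runs in the background after `start`, a client script may want to wait until it accepts connections before issuing queries. The following is a small sketch of such a readiness check using only the standard library. The function name `wait_for_port` is hypothetical, and the port to probe is an assumption: 32010 is commonly the Dremio Arrow Flight port, but the value should be confirmed against the local Dremio configuration.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not reachable yet; wait and retry until the deadline passes.
            time.sleep(interval)
    return False
```

For example, `wait_for_port("localhost", 32010)` (port assumed, see above) could be called before starting a query run. A successful TCP connection only shows that the port is open, not that Dremio has finished initializing.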

For querying the NYC Taxi dataset with a varying number of records (0.1-16 million) through a remote DataFusion client-server Flight connection, we used an updated DataFusion Flight client-server implementation.

Commands for creating the Arrow Flight based Singularity container:

sudo singularity build -w flight.simg flight.def
sudo singularity shell -w flight.simg
> mkdir /arrow
> cp -r arrow/cpp/release/release /arrow

License: Apache License 2.0

