datafusion-contrib / datafusion-hdfs-native

Connecting DataFusion to HDFS based on libhdfs3

DataFusion-hdfs-native

Connects DataFusion to HDFS through the native HDFS client (libhdfs3).

Set up libhdfs3

  1. Install libhdfs3

You can either install it via Conda

conda install -c conda-forge libhdfs3

or build it from source

# Check out a specific commit that compiles on macOS and targets HDFS 2.6.x
git clone https://github.com/ClickHouse-Extras/libhdfs3.git
cd libhdfs3
git checkout 24b058c356794ef6cc2d31323dc9adf0386652ff

# then build it
mkdir build && cd build
../bootstrap --prefix=/usr/local
make
make install
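
After a source build it can be useful to confirm that a library file actually landed under the chosen prefix. The following is a minimal sketch, not part of this project: `find_libhdfs3` is a hypothetical helper name, and it assumes the library is installed as `<prefix>/lib/libhdfs3.*` (some systems may use a different directory such as `lib64`).

```shell
# Hypothetical helper: print the first libhdfs3 library file found under <prefix>/lib.
# The prefix defaults to /usr/local, matching the --prefix passed to bootstrap above.
# Assumption: the library is installed as <prefix>/lib/libhdfs3.<ext>.
find_libhdfs3() {
    prefix="${1:-/usr/local}"
    for f in "$prefix"/lib/libhdfs3.*; do
        # An unmatched glob stays literal, so test that the file really exists.
        if [ -e "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    echo "libhdfs3 not found under $prefix/lib" >&2
    return 1
}
```

Calling `find_libhdfs3` with no argument checks `/usr/local`; pass a different prefix if you changed `--prefix`.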

Configuration

# Client configuration: set the LIBHDFS3_CONF environment variable, or place hdfs-client.xml in the working directory
export LIBHDFS3_CONF=/path/to/libhdfs3-hdfs-client.xml
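
To see which configuration file would be picked up from the current shell, the lookup described above can be sketched as a small function. This is an illustration only: `resolve_hdfs_conf` is a hypothetical name, and the precedence shown (environment variable first, then the working directory) is an assumption based on the comment above, not a statement about libhdfs3 internals.

```shell
# Hypothetical helper: print the client configuration path that would be used.
# Assumed order: LIBHDFS3_CONF if set and non-empty, else ./hdfs-client.xml.
resolve_hdfs_conf() {
    if [ -n "${LIBHDFS3_CONF:-}" ]; then
        echo "$LIBHDFS3_CONF"
    elif [ -f "./hdfs-client.xml" ]; then
        echo "$PWD/hdfs-client.xml"
    else
        echo "no libhdfs3 client configuration found" >&2
        return 1
    fi
}
```

Running `resolve_hdfs_conf` before starting a DataFusion job is a quick way to catch a missing or misplaced client configuration. Note that the function only reports the path; it does not check that the file named by `LIBHDFS3_CONF` exists.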

License: Apache License 2.0


Languages

Rust 93.6%, Dockerfile 3.9%, Shell 2.5%