apache / datafusion-ballista

Apache DataFusion Ballista Distributed Query Engine

Home Page: https://datafusion.apache.org/ballista

DecodeErrors using pyarrow flight connector

Maxsparrow opened this issue

Describe the bug
Calling get_flight_info through the pyarrow Flight client against a Ballista deployment built from the latest code fails with a DecodeError. The exact error depends on the query.

Query1:

create external table sample stored as CSV with header row location '/mnt/sample.csv';

Error1 after calling get_flight_info:

ArrowInvalid: Flight returned invalid argument error, with message: DecodeError { description: "buffer underflow", stack: [("Any", "type_url")] }

Query2:

select 'Hello from Arrow Ballista!';

Error2:

ArrowInvalid: Flight returned invalid argument error, with message: DecodeError { description: "unexpected end group tag", stack: [] }
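Both messages read like protobuf decode failures. Assuming the scheduler parses the descriptor's command bytes as a protobuf message (an assumption, nothing in the logs confirms it), the ASCII bytes of the SQL text would be interpreted as field keys. The sketch below uses a hypothetical helper, `as_protobuf_key`, to show how each byte of `select` would look as a single-byte key. It inspects bytes one at a time and does not emulate a real decoder, which would also consume payload bytes after each key.

```python
# Protobuf wire types, keyed by the low three bits of a field key.
WIRE_TYPES = {
    0: "varint",
    1: "fixed64",
    2: "length-delimited",
    3: "start group",
    4: "end group",
    5: "fixed32",
}


def as_protobuf_key(byte: int) -> tuple[int, str]:
    """Interpret one byte (< 0x80) as a protobuf field key.

    Hypothetical helper for illustration: returns (field_number, wire_type_name).
    """
    return byte >> 3, WIRE_TYPES.get(byte & 0x07, "invalid")


if __name__ == "__main__":
    # Show how the first bytes of the SQL text would read as field keys.
    for ch in b"select":
        field_number, wire_type = as_protobuf_key(ch)
        print(f"{chr(ch)!r}: field {field_number}, {wire_type}")
```

Wire types 3 and 4 are the protobuf group start and end markers, so plain SQL text contains bytes that a decoder could treat as group tags.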

I also tried the arrow-ballista-python client, installed from its latest code, and it fails to connect:

In [3]: ctx = ballista.BallistaContext(hostname, client_port)
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 ctx = ballista.BallistaContext(hostname, client_port)

Exception: Ballista error: DataFusionError(Execution("Status { code: Internal, message: \"Error parsing request\", metadata: MetadataMap { headers: {\"content-type\": \"application/grpc\", \"date\": \"Tue, 19 Dec 2023 21:27:04 GMT\", \"content-length\": \"0\"} }, source: None }"))

To Reproduce
Steps to reproduce the behavior:

  • Deploy the Ballista scheduler and executors, built from the repo at commit 934b32f
  • Install latest pyarrow 14.0.2 in a Python 3.10 environment

Run against your service:

from pyarrow import flight

# hostname and port are those of your own Ballista deployment
client = flight.FlightClient(f'grpc://{hostname}:{port}')
client.authenticate_basic_token("admin", "password")
query = "select 'Hello from Arrow Ballista!';"
descriptor = flight.FlightDescriptor.for_command(query)
info = client.get_flight_info(descriptor)
# Errors here

Expected behavior
get_flight_info returns a FlightInfo object without raising an error.

Additional context
I deployed Ballista in Kubernetes, so this could still be a networking or setup issue. However, the Ballista scheduler and executor logs suggest they started up correctly, and they show no errors. The Ballista UI for my deployment also works, and the 'client.authenticate_basic_token' call succeeds in Python, which suggests the server is running and reachable from my client.

I'm new to Rust and the whole DataFusion ecosystem, so I don't know whether there's an easier way to test that my deployment is working. Any advice would be appreciated.