incorrect result, glaredb's sql api is lazy

Question

sundy-li opened this issue 10 months ago · comments

It just constructs a plan; we should use gdb.sql(query).execute() to actually run the query.

https://github.com/GlareDB/glaredb/blob/main/bindings/python/src/connection.rs#L107-L130
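To illustrate why a lazy API skews a benchmark, here is a toy sketch in plain Python. It does not use GlareDB at all: `LazyResult`, `sql`, and `timed` are hypothetical stand-ins, and the 50 ms sleep is an assumed cost for the real work. It only shows that timing the plan-building call alone measures almost nothing.

```python
import time


class LazyResult:
    """Toy stand-in for a lazy query result (hypothetical, not GlareDB's class)."""

    def __init__(self, work):
        self._work = work  # callable that performs the real work

    def execute(self):
        # The query only runs when execute() is called.
        return self._work()


def sql(query):
    """Toy stand-in for a lazy sql(): builds a plan and runs nothing."""

    def work():
        time.sleep(0.05)  # pretend the scan takes about 50 ms
        return [(42,)]

    return LazyResult(work)


def timed(fn):
    """Return (result, elapsed seconds) for a zero-argument callable."""
    start = time.perf_counter()
    out = fn()
    return out, time.perf_counter() - start


# Timing only sql() measures plan construction, not the query itself.
_, plan_only = timed(lambda: sql("SELECT COUNT(*) FROM hits"))
# Timing sql().execute() includes the actual work.
rows, full_run = timed(lambda: sql("SELECT COUNT(*) FROM hits").execute())
print(f"plan only: {plan_only:.4f}s, full run: {full_run:.4f}s")
```

A benchmark harness would need to time the second form (or whichever call actually materializes results in the engine being tested).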

Lorenzo Mangani · Answer 1 · Mon Nov 13 2023 17:17:10 GMT+0800 (China Standard Time)

Thanks @sundy-li, I'm new to GlareDB and this really helps! The suggested format didn't work, but it seems gdb.execute(query) does. Could you check and confirm whether this is acceptable?

Lorenzo Mangani · Answer 2 · Mon Nov 13 2023 18:03:25 GMT+0800 (China Standard Time)

After some testing, it seems only the .show() function produces realistic results, and the .execute() function does not appear to be supported any longer.

Lorenzo Mangani · Answer 3 · Mon Nov 13 2023 18:14:59 GMT+0800 (China Standard Time)

Resolved! Thanks for your precious input @sundy-li and let me know if you have other suggestions 👍

sundyli · Answer 4 · Tue Nov 14 2023 08:44:10 GMT+0800 (China Standard Time)

let me know if you have other suggestions

Maybe you can try adding databend to this bench; it has a similar API to glaredb: https://github.com/datafuselabs/databend/blob/main/src/bendpy/README.md

Lorenzo Mangani · Answer 5 · Tue Nov 14 2023 08:46:16 GMT+0800 (China Standard Time)

I thought databend required a service. I'll definitely add it if it works embedded! Thanks for the suggestion

Lorenzo Mangani · Answer 6 · Tue Nov 14 2023 09:47:10 GMT+0800 (China Standard Time)

@sundy-li databend added, but I couldn't find a way to query a local parquet file. Feel free to send a PR

Lorenzo Mangani · Answer 7 · Wed Nov 15 2023 20:54:54 GMT+0800 (China Standard Time)

@sundy-li my apologies, I deleted your comment by mistake instead of replying to it! 😢

read local parquet examples:

 select * from 'fs:///home/sundy/data_parquet/parquet-00001.snappy.parquet' limit 3;

Lorenzo Mangani · Answer 8 · Thu Nov 16 2023 01:20:21 GMT+0800 (China Standard Time)

I've played with the approach a little, and indeed it seems to want full paths. Something like this works, but I see no way of selecting multiple local files without adding them to the stage, which seems overly complex for this test.

>>> import os
>>> db.sql("SELECT COUNT(*) FROM 'fs://"+os.getcwd()+"/hits_0.parquet'").collect()
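Since the `fs://` form seems to need an absolute path, the string building can be pulled into a small helper. This is only a sketch: `local_parquet_uri` and `count_query` are hypothetical names, the `/data` directory is made up, and the loop over several files simply produces one query per file rather than a single multi-file query, which is an assumed workaround and not a Databend feature.

```python
import os


def local_parquet_uri(filename, base_dir=None):
    """Build an fs:// URI with an absolute path (hypothetical helper)."""
    base = os.path.abspath(base_dir if base_dir is not None else os.getcwd())
    return "fs://" + os.path.join(base, filename)


def count_query(filename, base_dir=None):
    """Build a COUNT(*) query string over one local parquet file."""
    return f"SELECT COUNT(*) FROM '{local_parquet_uri(filename, base_dir)}'"


# One query per file; each string would then be passed to db.sql(...).collect()
# as in the snippet above.
queries = [count_query(f"hits_{i}.parquet", "/data") for i in range(3)]
for q in queries:
    print(q)
```

Running one query per file and combining the results on the Python side avoids the stage setup, at the cost of extra round trips.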