multiprocessio / dsq

Command-line tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.

Ingest limited columns in basic SQL queries

eatonphil opened this issue

There may not be an existing SQLite parser we can use from Go, but for simple queries we can use a PostgreSQL parser; see here for a good one and an example of its use.

This would work by first attempting to parse the query. If the query parses and consists only of syntax that we support, we return all fields referenced in the query and pass that list to the SQLiteWriter. If the list is set in the SQLiteWriter, then when we write fields to SQLite we write only the ones in the list.
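The writer side of this can be sketched roughly as below. This is only an illustration: `fieldFilter`, `newFieldFilter`, and `keep` are hypothetical names, not part of dsq's actual SQLiteWriter, and rows are assumed to arrive as maps keyed by column name.

```go
package main

import "fmt"

// fieldFilter is a hypothetical stand-in for the field list handed to the
// SQLiteWriter. A nil filter means "no list set, write every field".
type fieldFilter map[string]struct{}

// newFieldFilter builds a filter from the fields found in the query.
// Passing nil (query not supported) yields a nil filter.
func newFieldFilter(fields []string) fieldFilter {
	if fields == nil {
		return nil
	}
	f := make(fieldFilter, len(fields))
	for _, name := range fields {
		f[name] = struct{}{}
	}
	return f
}

// keep returns only the entries of row that the filter allows.
func (f fieldFilter) keep(row map[string]interface{}) map[string]interface{} {
	if f == nil {
		return row
	}
	out := make(map[string]interface{}, len(f))
	for name, value := range row {
		if _, ok := f[name]; ok {
			out[name] = value
		}
	}
	return out
}

func main() {
	filter := newFieldFilter([]string{"x", "y"})
	row := map[string]interface{}{"x": 1, "y": 2, "z": "never queried"}
	fmt.Println(filter.keep(row)) // map[x:1 y:2]
}
```

Treating a nil list as "write everything" keeps the existing behavior as the fallback whenever the query cannot be analyzed.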

For a first pass I'd suggest supporting:

  • SELECT x FROM {} WHERE y = 1 where this returns ['x', 'y']

Additional ones that won't be too bad:

  • SELECT COUNT(x) FROM {} WHERE y = 2 returns ['x', 'y']
  • SELECT x FROM {} GROUP BY z returns ['x', 'z']

Harder but reasonable examples:

  • SELECT a.x FROM {0} a JOIN {1} b ON a.id = b.json_id returns {'a': ['x', 'id'], 'b': ['json_id']}

Examples this must fail on (this is not a comprehensive list):

  • SELECT x, * FROM {} (because of the star operator)
  • SELECT x FROM {0} JOIN {1} ON id=json_id (ambiguous where x, id, and json_id come from; also requires supporting different columns for different tables)
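The accept/reject rules in these examples could look something like the sketch below. It deliberately skips the parsing step: `columnRef` is a hypothetical, already-parsed column reference (whatever the PostgreSQL parser produces would need to be mapped into it), and `columnsByTable` and `errUnsupported` are made-up names for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// columnRef is a hypothetical, already-parsed column reference.
type columnRef struct {
	Table string // alias the column was qualified with, "" if unqualified
	Name  string // column name, unused when Star is set
	Star  bool   // true for * or alias.*
}

// errUnsupported tells the caller to fall back to ingesting every column.
var errUnsupported = errors.New("query not supported for column pruning")

// columnsByTable groups referenced columns by table alias, keeping
// first-seen order and dropping duplicates. It rejects star selections,
// unknown aliases, and unqualified columns when more than one table is
// involved (the ambiguous JOIN case).
func columnsByTable(aliases []string, refs []columnRef) (map[string][]string, error) {
	known := make(map[string]bool, len(aliases))
	for _, alias := range aliases {
		known[alias] = true
	}
	type key struct{ table, name string }
	seen := make(map[key]bool)
	out := make(map[string][]string)
	for _, ref := range refs {
		if ref.Star {
			return nil, errUnsupported
		}
		table := ref.Table
		if table == "" {
			if len(aliases) != 1 {
				return nil, errUnsupported
			}
			table = aliases[0]
		}
		if !known[table] {
			return nil, errUnsupported
		}
		k := key{table, ref.Name}
		if seen[k] {
			continue
		}
		seen[k] = true
		out[table] = append(out[table], ref.Name)
	}
	return out, nil
}

func main() {
	// SELECT a.x FROM {0} a JOIN {1} b ON a.id = b.json_id
	cols, err := columnsByTable([]string{"a", "b"}, []columnRef{
		{Table: "a", Name: "x"},
		{Table: "a", Name: "id"},
		{Table: "b", Name: "json_id"},
	})
	fmt.Println(cols, err) // map[a:[x id] b:[json_id]] <nil>

	// SELECT x, * FROM {} must be rejected because of the star.
	_, err = columnsByTable([]string{"t"}, []columnRef{{Name: "x"}, {Star: true}})
	fmt.Println(err) // query not supported for column pruning
}
```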

This could also be extended to support LIMIT x without an ORDER BY clause to have it ingest only x rows.
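A rough sketch of the LIMIT idea, assuming rows are pulled from some iterator and pushed to a writer callback. `row`, `sliceSource`, and `ingest` are hypothetical names, not dsq functions. Without an ORDER BY any x rows satisfy the query, so ingestion can stop early; a WHERE clause or aggregate would presumably have to rule this out as well, since the first x rows ingested are not necessarily the first x that match.

```go
package main

import "fmt"

// row is a hypothetical decoded input record.
type row map[string]interface{}

// sliceSource returns an iterator over rows; the bool is false once the
// input is exhausted.
func sliceSource(rows []row) func() (row, bool) {
	i := 0
	return func() (row, bool) {
		if i >= len(rows) {
			return nil, false
		}
		r := rows[i]
		i++
		return r, true
	}
}

// ingest copies rows from next into write and stops after limit rows.
// A negative limit means "no limit". It returns the number of rows written.
func ingest(next func() (row, bool), write func(row) error, limit int) (int, error) {
	written := 0
	for limit < 0 || written < limit {
		r, ok := next()
		if !ok {
			break
		}
		if err := write(r); err != nil {
			return written, err
		}
		written++
	}
	return written, nil
}

func main() {
	rows := []row{{"x": 1}, {"x": 2}, {"x": 3}}
	var stored []row
	n, err := ingest(sliceSource(rows), func(r row) error {
		stored = append(stored, r)
		return nil
	}, 2) // as if the query ended in LIMIT 2
	fmt.Println(n, len(stored), err) // 2 2 <nil>
}
```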

Also, this mode must be disabled when -C/--cache is on.

I wouldn't mind having a go at this. I've been doing some initial investigations with the pg_query_go library and I think I can get something together to cover some of the above cases.

Hey @mc-borscht, there's a PR open for this (#76), but I got stuck because pg_query_go doesn't build on Windows.

If you want, you can pick up that PR and get it working. Although before merging it I wanted to have some benchmarks that show it's actually an improvement.

To deal with pg_query_go not building on Windows, we could either fix pg_query_go's build process or use build constraints (compile flags) in Go to leave this feature out on Windows.
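For the second option, a minimal sketch of what one of the two files might look like. The names `columnPruningSupported` and `pruningEnabled` are hypothetical. This file carries the constraint `!windows` and is where the parser-backed code would live; a sibling file with the constraint `windows` would define the same names with the constant set to false and no parser import. The cache check from earlier in the thread is folded in as an assumption about where that rule could sit.

```go
//go:build !windows

package main

import "fmt"

// columnPruningSupported is a hypothetical flag. In this file (built
// everywhere except Windows) it is true; the Windows-only sibling file
// would declare it as false.
const columnPruningSupported = true

// pruningEnabled reports whether only the referenced columns should be
// ingested. The mode is off when this build lacks the parser and also
// when -C/--cache is on.
func pruningEnabled(cacheOn bool) bool {
	return columnPruningSupported && !cacheOn
}

func main() {
	fmt.Println(pruningEnabled(false)) // true on non-Windows builds
	fmt.Println(pruningEnabled(true))  // false
}
```

Because the constraint excludes the whole file, the parser import never reaches the Windows build, which is the point of this approach.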