multiprocessio / dsq

Command-line tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.

Performance ideas

eatonphil opened this issue

Catch-all, for now, for potential performance improvements to datastation/dsq.

  • SQL pre-processing
    • Import only used fields (see #71)
    • Pre-filter data in SQLiteWriter: only insert rows that match the WHERE clause
  • Support more input types using SQLiteWriter; this basically requires support for expanding nested objects (see notes in #67)
  • Maybe handle JSONL in parallel, since a newline can never appear inside an individual JSON line
  • Get rid of map[string]any inside datastation
    • At the very least, put WriteRecord into the ResultWriter interface so SQLiteWriter can avoid map[string]any, which it converts from anyway
  • CSV parser improvements
  • Add benchmarks for every file format, not just CSV. Basically, every file format needs to be worked on individually.