cube2222 / octosql

OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Relaxed mode?

chapmanjacobd opened this issue · comments

I really like how fast octosql is and it seems like a great tool to bring unstructured data into a strict hierarchical, columnar format.

Unfortunately, wrangling unstructured data usually requires some flexibility and interpretation. I have two main issues so far:

  1. Selecting variables which are in some files but not others:

    ⭐ It would be really nice to have a --relaxed flag where unknown variables are silently replaced with null

unzstd --memory=2048MB --stdout RS_2008-02.zst | octosql "SELECT overridden_url, valid_column FROM stdin.json" -o csv

Error: typecheck error: unknown variable: 'overridden_url'

Ideally, that would:

  • with octosql --relaxed turn into "SELECT null, valid_column FROM stdin.json"
  • with normal, strict mode be the same error
  1. No way to cast booleans to int?
octosql "SELECT cast(false as int)"
Error: typecheck error: typecast of non-union type expression 'Boolean'

octosql "SELECT int(false)"
Error: typecheck error: unknown function: int(Boolean)

For example, in sqlite:

sqlite-utils memory -t 'select cast(false as int) as hello'
  hello
-------
      0

also, what's going on with len() ?

octosql "SELECT * FROM docs.function_signatures"
...
| 'len'            | []                       | 'NULL'        | true   | false            |
| 'len'            | []                       | 'NULL'        | true   | false            |
| 'len'            | []                       | 'NULL'        | true   | false            |
| 'len'            | ['String']               | 'Int'         | true   | true             |
...

octosql "SELECT len((0,1,2,3))"
+-------+
| col_0 |
+-------+
|     4 |
+-------+

seems to work fine...

octosql -v
octosql version dev

also, you might also want to store/print the git commit during compile time or something, for example:

'octosql version dev' + git describe --match=NeVeRmAtCh --always --abbrev=8 --dirty && git status --short

Hey!

So, multiple things you mentioned:

relaxed flag

The use case makes sense, I'll think about it.

No way to cast booleans to int

Good point, happy to add this.

what's going on with len()

When the simple_signature is false, then the arg types / return types have no meaning. But it's a good point that this way it's completely unreadable right now what's supported and what isn't. I'll see if I can add "synthetic" signatures there.

len works with strings, lists, tuples, and objects right now.

No way to cast booleans to int

This is now added in cc43f25 and released in 0.11.1.