multiprocessio / dsq

Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.

Filtering on nested object data

Shogan opened this issue

Maybe I simply missed this in the documentation and it's really straightforward, but is there a way to filter on nested object data in tables (perhaps through some kind of expansion)?

For example, I have the following representative data from parquet:

[{"PARGO_PREFIX___violations":{},"Role":"master","Foo":{"Id":"bar1"},"Created":"\ufffd-\ufffd\u0019\ufffd\u001a\u0000\u0000\ufffd\ufffd%\u0000","Version":1},
{"PARGO_PREFIX___violations":{},"Role":"master","Foo":{"Id":"bar2"},"Created":"\ufffd\ufffd\ufffd\u0011\ufffd\u001a\u0000\u0000\ufffd\ufffd%\u0000","Version":1},
{"PARGO_PREFIX___violations":{},"Role":"master","Foo":{"Id":"bar3"},"Created":"\ufffd6ګ\ufffd\u001a\u0000\u0000\ufffd\ufffd%\u0000","Version":1}]

I can use dsq to select everything with Version == 1, but how would I select all items where Foo.Id == "bar2" for example?

I've tried what I thought would be the most logical:

dsq ~/Downloads/part-00030.snappy.parquet "SELECT * FROM {} WHERE Foo.Id == 'bar2'"

But I get a response back:

no such column: Foo.Id

Hey thanks for the question! It's noted in the README that nested fields are ignored right now. But my goal is definitely to support filtering on nested fields. I'll keep this issue up to date when support comes in.

Thanks for that @eatonphil 👍

This is now supported in version 0.2.0, just released (the binaries will be uploaded shortly). See the README for details. Thanks!
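
For anyone on an older version, or just curious how filtering on a nested field can work at all, here is a minimal Python sketch. It assumes nested objects are flattened into columns named by their dotted path (for example `Foo.Id`), which then have to be double-quoted in SQL because `.` is otherwise read as a table/column separator. The `flatten` and `query_nested` helpers and the table name `t` are made up for this illustration and are not part of dsq; check the README for the exact syntax dsq itself expects.

```python
import json
import sqlite3


def flatten(obj, prefix=""):
    """Collapse nested dicts into one level, joining keys with '.'."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def query_nested(rows, sql):
    """Load flattened rows into an in-memory SQLite table `t` and run `sql`."""
    flat_rows = [flatten(r) for r in rows]
    columns = sorted({c for r in flat_rows for c in r})
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    # Quote every column name so dotted paths are accepted as identifiers.
    col_defs = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f"CREATE TABLE t ({col_defs})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO t VALUES ({placeholders})",
        [[r.get(c) for c in columns] for r in flat_rows],
    )
    result = [dict(r) for r in conn.execute(sql)]
    conn.close()
    return result


# Simplified version of the rows from the question (hypothetical sample data).
rows = [
    {"Role": "master", "Foo": {"Id": "bar1"}, "Version": 1},
    {"Role": "master", "Foo": {"Id": "bar2"}, "Version": 1},
    {"Role": "master", "Foo": {"Id": "bar3"}, "Version": 1},
]

# The dotted path is double-quoted so SQLite treats it as a single column name.
matches = query_nested(rows, """SELECT * FROM t WHERE "Foo.Id" = 'bar2'""")
print(json.dumps(matches))
```

The unquoted form `Foo.Id` from the original query fails in this sketch too, because SQL parses it as column `Id` of a table named `Foo`.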