multiprocessio / dsq

Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.

Add support for dumping results as CSV (and eventually other formats)

infojunkie opened this issue

I'm trying to put together a simple data migration pipeline that reads from Excel sheets into a PostgreSQL database, and I'd like to avoid writing new code if possible. My current thought is to use dsq to write the queries that transform Excel data into something ingestible, then hand the data over to pgloader for robust loading into Pg.

But it seems that dsq only outputs JSON results, whereas pgloader does not support JSON as an input type (it does support CSV and SQLite). My question is: does dsq support CSV output? Or, if I understood correctly that DataStation uses SQLite under the hood, can I get access to that database from dsq?

Or am I barking up the wrong tree altogether? Thanks for your advice!

Great questions! dsq uses an in-memory SQLite database by default. Maybe it would be good to allow user-specified SQLite files.

However, you can get it to use an actual file for SQLite today with the --cache/-C flag. Then you can run the same query again with the --cache-file/-D flag and it will give you the location of the file on disk (and then exit).
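Roughly (an untested sketch; the file name and query here are just placeholders):

# First run with --cache so dsq writes its SQLite database to disk
dsq --cache data/sheet.xlsx 'SELECT * FROM {}'

# Same query again with --cache-file to get the on-disk location of that database
dsq --cache-file data/sheet.xlsx 'SELECT * FROM {}'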

It's already annoying to me that dsq doesn't support dumping CSV though. And I've considered having it output various file types. So this would be a good feature request.

Thanks! I found the SQLite file as you described. Outputting to CSV or to a specified SQLite file would be useful.

I will let you close this or keep it open as you prefer, but I have now something to work with 👍

One further question here: the SQLite file that gets cached is the raw import, correct? So any transformation done via dsq does not get reflected in the file.

If this is the case, then writing the SQL queries in dsq (as opposed to writing them against SQLite directly) becomes less useful for migration purposes, unless the final output can be imported elsewhere.

I haven't tested it, but you should be able to run any valid SQLite statements, like:

dsq 'CREATE TABLE x (whatever...); INSERT INTO x SELECT blah from DM_getPanel() WHERE blah'

And then you can export from there.
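For instance, once --cache-file/-D has told you where the database lives, something like this (a sketch; the path and table name are placeholders) would dump a table as CSV with the stock sqlite3 shell:

# CACHE_DB is the path reported by dsq's --cache-file/-D; "x" is whatever table you created
sqlite3 -header -csv "$CACHE_DB" 'SELECT * FROM x' > out.csv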

Does that work/make sense?

Not sure I'm doing it right, but

dsq "data/sheet.xlsx" 'CREATE TABLE x (noc text primary key); INSERT INTO x SELECT NOC FROM {}'

gives me

no such table: x

I've had a look at this and I can see that SQLite ignores the table name. It gets added (in my tests) as "t_0" rather than "x". If your subsequent statements reference "t_0" rather than "x" they'll probably work.

Not sure why this is, or what further added tables would be named (I imagine t_1, t_2, etc.)...

Thanks, this worked.

t_X is the naming pattern dsq uses for each file it ingests. If you pass two files to dsq, the second one is t_1, etc.
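So, for example, a query against the underlying table names could look like this (untested, with hypothetical file names and columns):

# t_0 is the first ingested file, t_1 the second
dsq data/sheet.xlsx data/codes.csv 'SELECT t_0.NOC, t_1.region FROM t_0 JOIN t_1 ON t_0.NOC = t_1.NOC'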

+1 to this feature request! I'm trying to get the result of this into a spreadsheet and outputting as CSV would be very helpful :)

Love the tool, btw! It was very easy to use and solved my problem of joining data across local CSV files :)

Actually, there are a ton of JSON-to-CSV tools available online from a quick Google search, which worked perfectly for me.
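If you'd rather stay on the command line, a rough equivalent (assuming dsq's output is a JSON array of objects and that jq is installed; the file and column names are placeholders):

# Turn the JSON array into CSV, with a header row taken from the first object's keys
dsq data/sheet.xlsx 'SELECT NOC FROM {}' | jq -r '(.[0] | keys_unsorted) as $k | ($k | @csv), (.[] | [.[$k[]]] | @csv)' > out.csv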