multiprocessio / dsq

Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.

Add support for dumping results as CSV (and eventually other formats)

infojunkie opened this issue

I'm trying to put together a simple data migration pipeline that reads from Excel sheets into a PostgreSQL database, and I'd like to avoid writing new code if possible. My current thought is to use dsq to write the queries that transform Excel data into something ingestible, then hand the data over to pgloader for robust loading into Pg.

But it seems that dsq only outputs JSON results, whereas pgloader does not support JSON as an input type (it does support CSV and SQLite). My question is: does dsq support CSV output? Or, if I understood correctly that DataStation uses SQLite under the hood, can I get access to that database from dsq?

Or am I barking up the wrong tree altogether? Thanks for your advice!

Great questions! dsq uses an in-memory SQLite database by default. Maybe it would be good to allow user-specified SQLite files.

However, you can get it to use an actual file for SQLite today with the --cache/-C flag. Then you can run the same query again with the --cache-file/-D flag and it will give you the location of the file on disk (and then exit).
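Roughly (an untested sketch; the file name and query here are just placeholders):

# First run with --cache so dsq writes its SQLite database to disk
dsq --cache data/sheet.xlsx 'SELECT * FROM {}'

# Same query again with --cache-file to get the on-disk location of that database
dsq --cache-file data/sheet.xlsx 'SELECT * FROM {}'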

It's already annoying to me that dsq doesn't support dumping CSV though. And I've considered having it output various file types. So this would be a good feature request.

Thanks! I found the SQLite file as you described. Outputting to CSV or to a specified SQLite file would be useful.

I will let you close this or keep it open as you prefer, but I have now something to work with 👍

One further question here: the SQLite file that gets cached is the raw import, correct? So any transformation done via dsq does not get reflected in the file.

If this is the case, then writing the SQL queries in dsq (as opposed to writing them against SQLite directly) becomes less useful for migration purposes, unless the final output can be imported elsewhere.

I haven't tested it, but you should be able to run any valid SQLite statements, like:

dsq 'CREATE TABLE x (whatever...); INSERT INTO x SELECT blah from DM_getPanel() WHERE blah'

And then you can export from there.
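For instance, once --cache-file/-D has told you where the database lives, something like this (a sketch; the path and table name are placeholders) would dump a table as CSV with the stock sqlite3 shell:

# CACHE_DB is the path reported by dsq's --cache-file/-D; "x" is whatever table you created
sqlite3 -header -csv "$CACHE_DB" 'SELECT * FROM x' > out.csv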

Does that work/make sense?

Not sure I'm doing it right, but

dsq "data/sheet.xlsx" 'CREATE TABLE x (noc text primary key); INSERT INTO x SELECT NOC FROM {}'

gives me

no such table: x

I've had a look at this and I can see that SQLite ignores the table name. It gets added (in my tests) as "t_0" rather than "x". If your subsequent statements reference "t_0" rather than "x" they'll probably work.

Not sure why this is, or what further added tables would be named (I imagine t_1, t_2, etc.)...

Thanks, this worked.

t_X is the naming pattern dsq uses for each file it ingests. If you pass two files to dsq, the second one is t_1, etc.
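So, for example, a query against the underlying table names could look like this (untested, with hypothetical file names and columns):

# t_0 is the first ingested file, t_1 the second
dsq data/sheet.xlsx data/codes.csv 'SELECT t_0.NOC, t_1.region FROM t_0 JOIN t_1 ON t_0.NOC = t_1.NOC'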

+1 to this feature request! I'm trying to get the result of this into a spreadsheet and outputting as CSV would be very helpful :)

Love the tool, btw! It was very easy to use and solved my problem of joining data across local CSV files :)

Actually, there are a ton of JSON-to-CSV tools available online from a quick Google search, which worked perfectly for me.
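If you'd rather stay on the command line, a rough equivalent (assuming dsq's output is a JSON array of objects and that jq is installed; the file and column names are placeholders):

# Turn the JSON array into CSV, with a header row taken from the first object's keys
dsq data/sheet.xlsx 'SELECT NOC FROM {}' | jq -r '(.[0] | keys_unsorted) as $k | ($k | @csv), (.[] | [.[$k[]]] | @csv)' > out.csv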