mbostock / ndjson-cli

Command line tools for operating on newline-delimited JSON streams.

Add ndjson-join?

mbostock opened this issue

It’d be cool to join two ndjson streams (e.g., a stream from a CSV file and a stream of features from a GeoJSON collection).

Hypothetical example using bash process substitution:

ndjson-join <(shp2json -n example.shp) <(csv2json -n example.csv)

So, you’d specify a key expression that takes d (the object) and i (the line index) and is evaluated for each line of both inputs, A and B (in parallel, hopefully). The results would go into a Map from each evaluated key to its [a, b] entry.

If an object in A has no corresponding key in B, then the corresponding entry is [a, null]. Likewise, if an object in B has no corresponding key in A, then the corresponding entry is [null, b].

But what if multiple objects in A have the same key, or multiple objects in B have the same key? I believe you’d want the Cartesian product in that case: for example, if a1, a2, b1 and b2 all have the same key, you should see [a1, b1], [a1, b2], [a2, b1] and [a2, b2] in the output stream.
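
To make the proposed semantics concrete, here is a minimal sketch of the full outer join described above, including the Cartesian product for duplicate keys. It is only an illustration: joinByKey is a hypothetical name, not part of ndjson-cli, and it assumes both inputs have already been read into arrays of parsed objects rather than processed as streams. It also assumes keys are primitives (strings or numbers), since a Map compares object keys by identity.

```javascript
// Hypothetical sketch; joinByKey is not an ndjson-cli API.
// a and b are arrays of parsed objects; key is a function (d, i) => key.
function joinByKey(a, b, key) {
  // Group both inputs by evaluated key. A Map preserves insertion order,
  // so output follows the order in which keys are first seen.
  const groups = new Map(); // key -> {as: [...], bs: [...]}
  const group = (k) => {
    let g = groups.get(k);
    if (!g) groups.set(k, (g = {as: [], bs: []}));
    return g;
  };
  a.forEach((d, i) => group(key(d, i)).as.push(d));
  b.forEach((d, i) => group(key(d, i)).bs.push(d));

  const pairs = [];
  for (const {as, bs} of groups.values()) {
    if (bs.length === 0) {
      for (const x of as) pairs.push([x, null]); // key only in A
    } else if (as.length === 0) {
      for (const y of bs) pairs.push([null, y]); // key only in B
    } else {
      // Key in both inputs: emit the Cartesian product.
      for (const x of as) for (const y of bs) pairs.push([x, y]);
    }
  }
  return pairs;
}

// Usage: a1, a2, b1 and b2 share key 1, so four pairs are emitted.
const a = [{id: 1, name: "a1"}, {id: 1, name: "a2"}];
const b = [{id: 1, name: "b1"}, {id: 1, name: "b2"}];
for (const pair of joinByKey(a, b, (d) => d.id)) {
  console.log(JSON.stringify(pair)); // one [a, b] pair per line
}
```

Buffering both inputs keeps the sketch short. A streaming version would still need to hold at least one input in memory before it could decide which keys are unmatched.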

You might also want options like whether it’s an inner join (don’t output any [a, null] or [null, b]), a left outer join (don’t output any [null, b]), a right outer join (don’t output any [a, null]), or a full outer join (include all results). Although it’d also be possible to do that in a subsequent ndjson-filter.
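
The four join types reduce to simple predicates over the [a, b] pairs of a full outer join, which is why a follow-up filter would be enough. The sketch below assumes the join emits two-element arrays with null for a missing side; the joinFilters name and its keys are illustrative only.

```javascript
// Hypothetical predicates over [a, b] pairs from a full outer join.
const joinFilters = {
  inner: ([a, b]) => a !== null && b !== null, // drop [a, null] and [null, b]
  left: ([a]) => a !== null,                   // drop [null, b]
  right: ([, b]) => b !== null,                // drop [a, null]
  outer: () => true,                           // keep everything
};

// Example pairs: one match, one A-only entry, one B-only entry.
const pairs = [
  [{id: 1}, {id: 1}],
  [{id: 2}, null],
  [null, {id: 3}],
];

for (const [name, keep] of Object.entries(joinFilters)) {
  console.log(name, pairs.filter(keep).length);
}
```

In a pipeline, the inner-join case would presumably be a filter expression such as d[0] && d[1] passed to ndjson-filter, assuming each joined line is a two-element array.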