dflemstr / rq

Record Query - A tool for doing record analysis and transformation

joining multiple fields

janetvanderpuye opened this issue

Hi, I'm new to rq so forgive me if this is a noob question. It is more of a question than a bug/issue. I looked around a bit and could not find a solution, which is why I'm posting here. I have a JSON structure similar to this:

{
	"id": "123456",
	"header": {"more": false},
	"result": [
		{
			"e_id": "XXX",
			"e_type": "ENTITY",
			"identifiers": [
				{
					"type": "NAME",
					"id": "XXX_0",
					"name": "Segosan Itabachi",
					"modified_ts": "2017-04-06 20:27:02.0",
					"main": true
				},
				{
					"type": "TAG",
					"id": "XXX_1",
					"name": "Segosan",
					"modified_ts": "2017-04-06 20:27:02.0",
					"main": false
				},
				{
					"type": "NAME",
					"id": "XXX_2",
					"name": "Segosan Itabachi-san",
					"modified_ts": "2017-04-06 20:27:02.0",
					"main": false
				}
			]
		}
	]
}

What I want to do is filter and flatten the whole structure into a sort of CSV. The selection criterion is: go through all the objects in the identifiers array, check whether the identifier type == "NAME", and if so output one row containing the parent e_id followed by that identifier's id and name. So for the above, the output would look like

"XXX", "XXX_0", "Segosan Itabachi"
"XXX", "XXX_2", "Segosan Itabachi-san"

So far, I'm stuck at rq 'at "result" | spread' < rds_tickers.json. If I use the map function with the e_id field, I can't access the identifiers. If I use the flatMap function with the identifiers array, then I no longer have access to the e_id field. Any tips or pointers in the right direction would be greatly appreciated.

You can do this:

rq 'at "result" | spread | map (x) => { _.map(_.filter(x.identifiers, function(e) {return e.type === "NAME"}), function(e) {return [x.e_id, e.id, e.name]}) } | spread'
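
To see what the lambda body does to a single record, here is a minimal plain JavaScript sketch of the same filter-then-map step. It assumes nothing about rq itself: `record` is a hypothetical stand-in for one element of "result", `nameRows` is a name made up for illustration, and native Array methods replace the lodash calls. The point is that the inner callback closes over `x`, so `x.e_id` stays available while iterating over the identifiers.

```javascript
// Hypothetical stand-in for one element of "result" (timestamps omitted).
const record = {
  e_id: "XXX",
  e_type: "ENTITY",
  identifiers: [
    { type: "NAME", id: "XXX_0", name: "Segosan Itabachi", main: true },
    { type: "TAG", id: "XXX_1", name: "Segosan", main: false },
    { type: "NAME", id: "XXX_2", name: "Segosan Itabachi-san", main: false },
  ],
};

// Same logic as the lambda body, using native Array methods instead of lodash.
// The inner arrow functions can still read x.e_id because they close over x.
function nameRows(x) {
  return x.identifiers
    .filter((e) => e.type === "NAME")
    .map((e) => [x.e_id, e.id, e.name]);
}

const rows = nameRows(record);
console.log(JSON.stringify(rows));
```

Each record yields an array of rows rather than a single row, which is presumably why the command ends with another spread.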

And if you use the latest version just released, you can add -V to get CSV output.

But it is arguably quite clumsy. Not sure if there is an easier way.

Slightly better version:

rq 'at "result" | spread | map (x) => { _.map(_.filter(x.identifiers, e=>e.type === "NAME"), e=>[x.e_id, e.id, e.name]) } | spread'
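
For anyone who wants to check the expected rows outside rq, the whole transformation can be sketched in plain JavaScript (Node 11 or later for `flatMap`). This is only an illustration under assumptions: `flattenNames` and `toQuotedLine` are hypothetical helper names, the sample document is trimmed from the question, and the quoting imitates the layout asked for in the question, not whatever rq's own CSV output produces.

```javascript
// Trimmed copy of the document from the question (timestamps omitted).
const doc = {
  id: "123456",
  header: { more: false },
  result: [
    {
      e_id: "XXX",
      e_type: "ENTITY",
      identifiers: [
        { type: "NAME", id: "XXX_0", name: "Segosan Itabachi", main: true },
        { type: "TAG", id: "XXX_1", name: "Segosan", main: false },
        { type: "NAME", id: "XXX_2", name: "Segosan Itabachi-san", main: false },
      ],
    },
  ],
};

// Hypothetical helper: one row per NAME identifier, across all entries in "result".
// flatMap plays the role of "map, then spread" in the rq pipeline.
function flattenNames(d) {
  return d.result.flatMap((x) =>
    x.identifiers
      .filter((e) => e.type === "NAME")
      .map((e) => [x.e_id, e.id, e.name])
  );
}

// Hypothetical helper: wrap every field in double quotes (doubling any embedded
// quotes) and join with ", " to match the layout shown in the question.
function toQuotedLine(row) {
  return row.map((f) => '"' + String(f).replace(/"/g, '""') + '"').join(", ");
}

const lines = flattenNames(doc).map(toQuotedLine);
console.log(lines.join("\n"));
```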