jqlang / jq

Command-line JSON processor

Home Page: https://jqlang.github.io/jq/

Stream delimiters in `fromstream`

ab-pm opened this issue

When using --stream or tostream, I was surprised to find some elements in the stream that have only a path but no value. After some experimentation, I found that these are emitted whenever a non-empty object or array ends, with the path being that of the last entry in the container being closed.

$ echo '{"a":13, "b":[{"x":true}, {}, 42]}' | jq --stream -c '.'
[["a"],13]
[["b",0,"x"],true]
[["b",0,"x"]]
[["b",1],{}]
[["b",2],42]
[["b",2]]
[["b"]]

The same happens with jq -nc 'inputs | tostream'.
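The two kinds of events can be told apart by their length: a leaf event is a two-element array [path, value], while a closing event is a one-element array [path]. A minimal sketch that splits the stream from the example above, assuming jq is on the PATH (the variable names doc, leaves and closers are only illustrative):

```shell
# Sample document, the same one used in the transcript above.
doc='{"a":13, "b":[{"x":true}, {}, 42]}'

# Leaf events are two-element arrays: [path, value].
leaves=$(printf '%s\n' "$doc" | jq -c 'tostream | select(length == 2)')

# Closing events are one-element arrays: [path].
closers=$(printf '%s\n' "$doc" | jq -c 'tostream | select(length == 1)')

printf 'leaf events:\n%s\n' "$leaves"
printf 'closing events:\n%s\n' "$closers"
```

Note that the empty object at .b[1] shows up as a leaf event and gets no closing event of its own.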

Issue 1

First, a documentation request. I found that this behaviour is documented:

Streaming forms include […] [<path>] (to indicate the end of an array or object)

but there is no explanation of why these are necessary, how they should be treated, or what the significance of the path is in these "object/array terminators". Why not use or allow any other value?

(I can guess that fromstream needs them in order to generate multiple results, but this is not obvious, and it doesn't explain why this particular format is needed.)
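The guess about multiple results can be checked with a small sketch: two unrelated values are turned into one continuous event stream with tostream and fed to fromstream, which emits each value again once its top-level closing event arrives. This assumes jq is on the PATH; the variable name rebuilt is only illustrative.

```shell
# Stream two separate values back to back, then rebuild them.
# The closing events [["a"]] and [[1]] mark where each top-level value ends.
rebuilt=$(jq -nc 'fromstream(({"a":1}, [2,3]) | tostream)')

printf '%s\n' "$rebuilt"
```

Without the closing events, fromstream would have no way to tell where the first value stops and the second begins.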

Issue 2

Second, I would like to report a bug / feature request: fromstream fails silently when these terminator values are missing or invalid.
I found by experimentation that a valid terminator is any array that has a non-empty array as its single element. It doesn't matter what the values inside the inner array are; the inner array just mustn't be empty. [] and [[]] are ignored.

Reproduction

$ jq -nc 'fromstream([["a"],13], [["b",0,"x"], true], [["b",0,"x"]], [["b",0]], [["b"]])'
{"a":13,"b":[{"x":true}]} # as expected

$ jq -nc 'fromstream([["a"],13], [["b",0,"x"], true], [["b"]])'
{"a":13,"b":[{"x":true}]} # works even without terminators for the inner object and array

$ jq -nc 'fromstream([["a"],13], [["b",0,"x"], true], [[ {}, null, false ]])'
{"a":13,"b":[{"x":true}]} # really works with arbitrary values in the path

$ jq -nc 'fromstream([["a"],13], [["b",0,"x"], true])'
$ jq -nc 'fromstream([["a"],13], [["b",0,"x"], true], [])'
$ jq -nc 'fromstream([["a"],13], [["b",0,"x"], true], [[]])'
# no output at all!

Expected behaviour

I would expect this to either just generate the output once the input generator ends, or to throw an error about the missing terminator.

Background

I was trying to merge multiple objects that are provided separately. I later found out that this can be achieved more easily with

$ echo '{"a":13} {"b":42}' | jq -c --slurp 'reduce .[] as $obj ({}; . * $obj)'
{"a":13,"b":42}
# or
$ echo '{"a":13} {"b":42}' | jq -cn 'reduce inputs as $obj ({}; . * $obj)'
{"a":13,"b":42}

but my first attempt was to use streams:

$ echo '{"a":13} {"b":42}' | jq -c --stream 'fromstream(.)'
# no output at all?! bug?
$ echo '{"a":13} {"b":42}' | jq -c --stream --slurp 'fromstream(.[])'
{"a":13}
{"b":42}
# not what I wanted, but ok
$ echo '{"a":13} {"b":42}' | jq -cn --stream 'fromstream(inputs)'
{"a":13}
{"b":42}
# same thing

before I realised that I had to remove (filter out) the terminators between the objects and then append my own at the end:

$ echo '{"a":13} {"b":42}' | jq -n --stream 'fromstream((inputs | select(has(1))), [[0]])' -c
{"a":13,"b":42}