Stream delimiters in `fromstream`
ab-pm opened this issue
When using `--stream` or `tostream`, I was surprised to find some elements in the stream that have only a path but no value. After some experimentation, I found that these are emitted whenever an object or array ends, with the path being that of the last element in the container that just ended (often the same as the path of the previous leaf).
$ echo '{"a":13, "b":[{"x":true}, {}, 42]}' | jq --stream -c '.'
[["a"],13]
[["b",0,"x"],true]
[["b",0,"x"]]
[["b",1],{}]
[["b",2],42]
[["b",2]]
[["b"]]
Same with `jq -nc 'inputs | tostream'`.
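To make the shape of these events concrete, here is a minimal Python sketch that reproduces the output above for an already-parsed document. It only models the observed behaviour and is not jq's implementation; the function name `tostream` is borrowed from jq for readability.

```python
import json


def tostream(value, path=()):
    """Yield jq-style streaming events for an already-parsed JSON value."""
    if isinstance(value, dict):
        items = list(value.items())
    elif isinstance(value, list):
        items = list(enumerate(value))
    else:
        items = None

    if not items:
        # Scalars and empty containers are leaves: [path, value].
        yield [list(path), value]
        return

    for key, child in items:
        yield from tostream(child, path + (key,))

    # Closing event: a path only, pointing at the container's last element.
    yield [list(path) + [items[-1][0]]]


if __name__ == "__main__":
    doc = json.loads('{"a":13, "b":[{"x":true}, {}, 42]}')
    for event in tostream(doc):
        print(json.dumps(event, separators=(",", ":")))
```

In this model the closing event for the inner object reuses the path of its only key (`["b",0,"x"]`), while the closing event for the top-level object is `[["b"]]`, the path of its last key.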
Issue 1
First, I would like to make a documentation request. I found that this behaviour is documented:
> Streaming forms include […] `[<path>]` (to indicate the end of an array or object)

but there is no explanation of why these are necessary, how they should be treated, or what the significance of the path is in these "object/array terminators". Why not use or allow any other value?
(I can guess that `fromstream` needs them to be able to generate multiple results, but this is not obvious, and it doesn't explain why this particular format is needed.)
Issue 2
Second, I would like to report a bug / feature request: `fromstream` fails silently when these terminator values are missing or invalid.
I found by experimentation that a valid terminator is any array that has a non-empty array as its single element. It doesn't matter what the values inside the inner array are; it just mustn't be empty. `[]` and `[[]]` are ignored.
Reproduction
$ jq -nc 'fromstream([["a"],13], [["b",0,"x"], true], [["b",0,"x"]], [["b",0]], [["b"]])'
{"a":13,"b":[{"x":true}]} # as expected
$ jq -nc 'fromstream([["a"],13], [["b",0,"x"], true], [["b"]])'
{"a":13,"b":[{"x":true}]} # works even without terminators for the inner object and array
$ jq -nc 'fromstream([["a"],13], [["b",0,"x"], true], [[ {}, null, false ]])'
{"a":13,"b":[{"x":true}]} # really works with arbitrary values in the path
$ jq -nc 'fromstream([["a"],13], [["b",0,"x"], true])'
$ jq -nc 'fromstream([["a"],13], [["b",0,"x"], true], [])'
$ jq -nc 'fromstream([["a"],13], [["b",0,"x"], true], [[]])'
# no output at all!
Expected behaviour
I would expect this to either just generate the output once the input generator ends, or to throw an error about the missing terminator.
Background
I was trying to merge multiple objects that are provided separately. I later found out that this can be achieved more easily with
$ echo '{"a":13} {"b":42}' | jq -c --slurp 'reduce .[] as $obj ({}; . * $obj)'
{"a":13,"b":42}
# or
$ echo '{"a":13} {"b":42}' | jq -cn 'reduce inputs as $obj ({}; . * $obj)'
{"a":13,"b":42}
but my first attempt was to use streams
$ echo '{"a":13} {"b":42}' | jq -c --stream 'fromstream(.)'
# no output at all?! bug?
$ echo '{"a":13} {"b":42}' | jq -c --stream --slurp 'fromstream(.[])'
{"a":13}
{"b":42}
# not what I wanted, but ok
$ echo '{"a":13} {"b":42}' | jq -cn --stream 'fromstream(inputs)'
{"a":13}
{"b":42}
# same thing
before I realised that I had to remove (filter out) the terminators between the objects and then append my own at the end:
$ echo '{"a":13} {"b":42}' | jq -n --stream 'fromstream((inputs | select(has(1))), [[0]])' -c
{"a":13,"b":42}
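The effect of that last command can be modelled in Python as follows. This is a sketch under assumptions: `set_path` and `merge_leaf_events` are hypothetical helpers, `set_path` only loosely imitates jq's `setpath` (it replaces a mismatched container instead of raising an error), and the event list is written out by hand from what `--stream` emits for the two inputs.

```python
import json


def set_path(root, path, leaf):
    """Return a copy of root with leaf stored at path (loosely like jq's setpath)."""
    if not path:
        return leaf
    key, rest = path[0], path[1:]
    if isinstance(key, int):
        # Integer keys address arrays; pad with None up to the index.
        container = list(root) if isinstance(root, list) else []
        while len(container) <= key:
            container.append(None)
    else:
        # String keys address objects.
        container = dict(root) if isinstance(root, dict) else {}
        container.setdefault(key, None)
    container[key] = set_path(container[key], rest, leaf)
    return container


def merge_leaf_events(events):
    """Drop closing events and apply every [path, leaf] event to one result."""
    result = None
    for event in events:
        if len(event) == 2:  # same filter as select(has(1))
            result = set_path(result, event[0], event[1])
    return result


if __name__ == "__main__":
    # Events for the inputs {"a":13} and {"b":42}, closing events included.
    events = [[["a"], 13], [["a"]], [["b"], 42], [["b"]]]
    print(json.dumps(merge_leaf_events(events), separators=(",", ":")))
```

Because the closing events are discarded, the model never has to decide where one top-level value ends, which is the same reason the jq command needs exactly one terminator appended at the end.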