jqlang / jq

Command-line JSON processor

Home Page: https://jqlang.github.io/jq/

Reduce over raw files - EOF doesn't terminate `inputs`

eddyashton opened this issue

This looks like a bug, but perhaps I'm just missing something obvious. I'm trying to combine multiple raw (non-JSON) files and embed them in a JSON object. These files are ultimately found by a glob, so I don't think I can use --rawfile without extensive bash machinery on top; instead I need to reduce them with -R to read each line, and recombine them based on input_filename.

$ jq -Rn 'reduce inputs as $line ({}; .[input_filename] += [$line])' foo.txt bar.txt baz.txt

This almost works, but the last line of each file is inserted into the wrong list:

{
  "foo.txt": [
    "First line of foo.",
    "I'm a text file with multiple lines!"
  ],
  "bar.txt": [
    "Last line of foo.First line of bar.",
    "I'm also a text file!"
  ],
  "baz.txt": [
    "Last line of bar.First line of baz.",
    "I'm a third text file.",
    "Last line of baz."
  ]
}

It took me a while to work out, but this is because there's no trailing newline at the end of my files. If I add one to each test file, it works correctly. That is, if I rewrite foo.txt from:

First line of foo.\nI'm a text file with multiple lines!\nLast line of foo.

to

First line of foo.\nI'm a text file with multiple lines!\nLast line of foo.\n

But I can't guarantee that the real files will end with a newline. Surely inputs should 'split' at EOF, as well as at each newline? It clearly does for the final file, since we get a final entry for the last line there, so why does it combine entries from earlier files across EOF?
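For anyone wanting to check whether their own files are affected, here is a minimal sketch that lists files whose last byte is not a newline. The function name missing_final_newline is made up for illustration; it relies only on `tail -c 1` and on the fact that command substitution strips a trailing newline.

```shell
# Hypothetical helper: print the name of every non-empty file
# whose last byte is something other than a newline.
missing_final_newline() {
  for f in "$@"; do
    # $(...) drops a trailing newline, so the result is non-empty
    # only when the file ends in some other byte.
    if [ -s "$f" ] && [ -n "$(tail -c 1 "$f")" ]; then
      printf '%s\n' "$f"
    fi
  done
}
```

Running `missing_final_newline foo.txt bar.txt baz.txt` should name exactly the files whose last line would be merged into the next file.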

Just to confirm, this also happens with direct invocations of input:

$ jq -Rn '[input] + [input] + [input] + [input] + [input] + [input] + [input]' foo.txt bar.txt baz.txt
[
  "Last line of baz.",
  "I'm a third text file.",
  "Last line of bar.First line of baz.",
  "I'm also a text file!",
  "Last line of foo.First line of bar.",
  "I'm a text file with multiple lines!",
  "First line of foo."
]

This is the behaviour on both 1.5 and 1.6, afaict.

Another data point - this weirdness on the last line of a file also affects input_line_number, which is off-by-one on the final line of the final file:

$ jq -Rn 'reduce inputs as $line ({}; .[input_filename] += [input_line_number, $line])' foo.txt bar.txt baz.txt
{
  "foo.txt": [
    1,
    "First line of foo.",
    2,
    "I'm a text file with multiple lines!"
  ],
  "bar.txt": [
    1,
    "Last line of foo.First line of bar.",
    2,
    "I'm also a text file!"
  ],
  "baz.txt": [
    1,
    "Last line of bar.First line of baz.",
    2,
    "I'm a third text file.",
    2,
    "Last line of baz."
  ]
}

$ echo -n first line > a.txt
$ echo second line > b.txt
$ jq -R '.' a.txt b.txt    # gives "first linesecond line"
$ gojq -R '.' a.txt b.txt  # prints two lines, as expected
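If only the lines matter and not which file they came from, one possible workaround is to normalize the input before jq sees it. The sketch below assumes an awk that does not carry a record across file boundaries, which is true of the common implementations as far as I know; normalize_lines is a made-up name.

```shell
# Hypothetical helper: reprint every line of every file followed by a
# newline, so an unterminated last line is not fused with the next file.
normalize_lines() {
  # The pattern "1" is always true, so awk prints each record,
  # terminated by its output record separator (a newline).
  awk 1 "$@"
}
```

Usage would be something like `normalize_lines a.txt b.txt | jq -R .`. Note that jq then reads from stdin, so input_filename no longer identifies the original file.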

Please note that, for better or worse, jq behaves just like cat:

$ cat foo.txt bar.txt
foo1
foo2bar1
bar2

In other words, stringing together file names on the jq command line is more akin to running cat than grep. Call that a bug if you wish, but jq's input functions generally ignore EOF between files. For example, with the single byte 1 in a file named one.txt and the single byte 2 in two.txt:

$ jq . one.txt two.txt

yields the single number: 12

So for jq 1.3 through jq 1.6, this perhaps undesirable behavior is, for most intents and purposes, a "feature" given the "backwards compatibility" constraint for X.Y versions.
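Given that behavior, one way to get the per-file grouping from the original report is to run jq once per file, so that a line can never span two files, and then merge the results. This is only a sketch: combine_raw_files is a made-up name, and it assumes the file names are passed as arguments (for example from a glob).

```shell
# Hypothetical helper: build {"<file>": [lines...]} for each file in a
# separate jq invocation, then merge the objects into one.
combine_raw_files() {
  for f in "$@"; do
    # -R reads raw lines, -n defers reading to `inputs`,
    # --arg passes the file name in as the string $f.
    jq -Rn --arg f "$f" '{($f): [inputs]}' "$f"
  done | jq -s 'add'   # -s slurps the per-file objects into an array
}
```

For example, `combine_raw_files *.txt` should give one key per file. The file name comes from the shell rather than from input_filename, because input_filename is not yet set when the key is evaluated under -n.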

@pkoppstein Thanks for the clarification. So my proposed change in #2375 does actually break backwards compatibility here, as it now splits at EOF when processing raw input:

$ jq . one.txt two.txt
12
$ jq -R . one.txt two.txt
"12"
$ ./jq . one.txt two.txt
12
$ ./jq -R . one.txt two.txt
"1"
"2"

Another related problem is that jq normally ignores U+FEFF (the byte order mark) at the start of a file, but if you pass multiple files, it ignores it only at the start of the first file.

$ jq . /dev/stdin /dev/fd/3 <<<$'\ufeff'1 3<<<$'\ufeff'2
1
jq: parse error: Invalid numeric literal at line 3, column 0
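Until that is handled, a possible workaround is to strip the UTF-8 byte order mark from each file before handing the data to jq. The sketch below assumes UTF-8 input (where U+FEFF is the bytes EF BB BF) and files that end with a newline; strip_bom is a made-up name.

```shell
# Hypothetical helper: print each file with a leading UTF-8 BOM removed.
strip_bom() {
  # Build the three BOM bytes (EF BB BF) with octal escapes.
  bom=$(printf '\357\273\277')
  for f in "$@"; do
    # LC_ALL=C makes sed match raw bytes; only line 1 is touched.
    LC_ALL=C sed "1s/^$bom//" "$f"
  done
}
```

Usage would be along the lines of `strip_bom one.json two.json | jq .`, again at the cost of jq no longer knowing the original file names.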

GitHub needs more emoticons, so we can express horror, for example.