mangiucugna / json_repair

A python module to repair invalid JSON, commonly used to parse the output of LLMs

Home Page: https://pypi.org/project/json-repair/


Not working for basic example

dcollien opened this issue

Describe the bug
The following "broken" JSON (note the unescaped double quotes around foobar in the second object):

[
    {
        "foo": "Foo bar baz",
        "tag": "#foo-bar-baz"
    },
    {
        "foo": "foo bar "foobar" foo bar baz.",
        "tag": "#foo-bar-foobar"
    }
]

is repaired well by: https://josdejong.github.io/jsonrepair/

but not by this library.
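
As a quick sanity check on why this input counts as broken, the sketch below feeds it to Python's standard json module and reports where strict parsing fails. The helper name first_json_error is hypothetical and only used for illustration; it is not part of json_repair.

```python
import json

# The input from the report: the inner quotes around foobar are not escaped.
bad_json = """[
    {
        "foo": "Foo bar baz",
        "tag": "#foo-bar-baz"
    },
    {
        "foo": "foo bar "foobar" foo bar baz.",
        "tag": "#foo-bar-foobar"
    }
]"""


def first_json_error(text):
    """Return (line, column, message) for the first strict-parse error, or None if valid."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err.lineno, err.colno, err.msg
    return None


if __name__ == "__main__":
    # The strict parser stops on line 7, right after the string "foo bar " ends early.
    print(first_json_error(bad_json))
```

The strict parser treats the quote before foobar as the end of the string, so any repair tool has to guess that the quote was meant to be part of the value.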

To Reproduce

>>> bad_json
'[\n    {\n        "foo": "Foo bar baz",\n        "tag": "#foo-bar-baz"\n    },\n    {\n        "foo": "foo bar "foobar" foo bar baz.",\n        "tag": "#foo-bar-foobar"\n    }\n]'
>>> json_repair.loads(bad_json)
[{'foo': 'Foo bar baz', 'tag': '#foo-bar-baz"\n    },\n    {\n        "foo', 'foo bar "foobar" foo bar baz.': 'tag', '#foo-bar-foobar': ''}]

Expected behavior
Expected output:

[
    {
        "foo": "Foo bar baz",
        "tag": "#foo-bar-baz"
    },
    {
        "foo": "foo bar \"foobar\" foo bar baz.",
        "tag": "#foo-bar-foobar"
    }
]

(as per https://josdejong.github.io/jsonrepair/)
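
For reference, the expected text above is what the standard json module would emit for the intended data. The sketch below assumes the intended value of the second "foo" field is the string with literal quotes around foobar; the variable names are illustrative only.

```python
import json

# Assumed intended data: the second "foo" value contains literal double quotes.
intended = [
    {"foo": "Foo bar baz", "tag": "#foo-bar-baz"},
    {"foo": 'foo bar "foobar" foo bar baz.', "tag": "#foo-bar-foobar"},
]

# Serializing escapes the inner quotes as \" so the text is valid JSON.
expected_text = json.dumps(intended, indent=4)

if __name__ == "__main__":
    print(expected_text)
    # Round trip: parsing the serialized text gives back the intended data.
    print(json.loads(expected_text) == intended)
```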

Actual output:

[{'foo': 'Foo bar baz', 'tag': '#foo-bar-baz"\n    },\n    {\n        "foo', 'foo bar "foobar" foo bar baz.': 'tag', '#foo-bar-foobar': ''}]

Super interesting, thanks for reporting. Somehow the whitespace is messing with the library. I will take a look.

0.15.6 is out, can you try it please? This example has now been added to the tests, and they are all green.
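
For anyone who wants to re-check the fix locally, here is a minimal sketch of a regression check. It is not the test that was added to the library; the helper repair_matches_expected and the constants are hypothetical. The function takes any loads-like callable, so it can be pointed at json_repair.loads (the call used in the report) once the package is installed.

```python
# Input from the report, with unescaped quotes around foobar.
BAD_JSON = """[
    {
        "foo": "Foo bar baz",
        "tag": "#foo-bar-baz"
    },
    {
        "foo": "foo bar "foobar" foo bar baz.",
        "tag": "#foo-bar-foobar"
    }
]"""

# The structure the report expects after repair.
EXPECTED = [
    {"foo": "Foo bar baz", "tag": "#foo-bar-baz"},
    {"foo": 'foo bar "foobar" foo bar baz.', "tag": "#foo-bar-foobar"},
]


def repair_matches_expected(loads):
    """Return True if the loads-like callable turns BAD_JSON into EXPECTED."""
    return loads(BAD_JSON) == EXPECTED


# Usage, assuming json_repair is installed:
#   import json_repair
#   print(repair_matches_expected(json_repair.loads))
```

Passing the parser in as an argument keeps the check independent of any particular version, so the same function can compare releases before and after 0.15.6.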

Looks great, thank you!