koaning / bulk

A Simple Bulk Labelling Tool

Load json and jsonl files

rsbohn opened this issue.

Finding .csv files a bit limiting.

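from pathlib import Path
import pandas as pd


def read_any(file: Path) -> pd.DataFrame:
    # Pick a pandas reader based on the file extension.
    suffix = file.suffix.lower()
    if suffix == ".jsonl":
        # JSON Lines: one JSON object per line.
        return pd.read_json(file, lines=True)
    if suffix == ".json":
        return pd.read_json(file)
    if suffix == ".csv":
        return pd.read_csv(file)
    raise ValueError(f"Can't read {file}.")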

I wouldn't mind adding .jsonl, but is .json really a format people use for columnar data?
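A JSON file can hold columnar data when it is a single top-level array of flat objects: each object is a row and each key is a column, exactly like JSON Lines minus the one-object-per-line layout. The sketch below uses only the standard library to show that both layouts parse to the same rows. The helper names `load_rows` and `column_names` and the sample `text`/`label` fields are hypothetical, chosen for illustration; in pandas the same distinction is `pd.read_json(file)` versus `pd.read_json(file, lines=True)`.

```python
import json


def load_rows(text: str, lines: bool) -> list[dict]:
    """Parse JSON text into a list of row dicts (hypothetical helper).

    lines=True  -> JSON Lines: one object per non-blank line.
    lines=False -> plain JSON: a single top-level array of objects.
    """
    if lines:
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    rows = json.loads(text)
    if not isinstance(rows, list):
        raise ValueError("expected a top-level JSON array of objects")
    return rows


def column_names(rows: list[dict]) -> list[str]:
    """Collect column names in first-seen order across all rows."""
    seen: dict[str, None] = {}
    for row in rows:
        for key in row:
            seen.setdefault(key, None)
    return list(seen)


# Made-up sample data in both layouts.
as_json = '[{"text": "hello", "label": "greeting"}, {"text": "bye", "label": "farewell"}]'
as_jsonl = '{"text": "hello", "label": "greeting"}\n{"text": "bye", "label": "farewell"}\n'

rows_a = load_rows(as_json, lines=False)
rows_b = load_rows(as_jsonl, lines=True)
print(rows_a == rows_b)        # True
print(column_names(rows_a))    # ['text', 'label']
```

The practical difference is that JSON Lines can be read or appended one record at a time, while a JSON array must be parsed as a whole.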

I've been using Datasette and sqlite-utils, where the default output format is JSON. You can get JSON Lines instead by adding '--nl'.
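If you already have an export in the default JSON format, converting it to JSON Lines is a small step. This is a minimal sketch, assuming the file is a single top-level array of objects (the shape described in the linked sqlite-utils docs); `json_array_to_jsonl` is a hypothetical helper, and its output is not guaranteed to match sqlite-utils' '--nl' output byte for byte.

```python
import json


def json_array_to_jsonl(text: str) -> str:
    """Convert a top-level JSON array of objects into JSON Lines text (hypothetical helper)."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # One JSON document per line, each terminated by a newline.
    return "".join(json.dumps(record) + "\n" for record in records)


array_output = '[{"id": 1, "text": "hello"}, {"id": 2, "text": "bye"}]'
print(json_array_to_jsonl(array_output), end="")
# {"id": 1, "text": "hello"}
# {"id": 2, "text": "bye"}
```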

https://sqlite-utils.datasette.io/en/stable/cli.html#returning-json

I'm adding support for jsonl in #45.

For now I'll consider .json out of scope for this project, partly because .jsonl feels superior and partly because supporting only one JSON layout keeps things simple.