BrianHicks / elm-csv

Decode CSV in the most boring way possible.

Home Page: https://package.elm-lang.org/packages/BrianHicks/elm-csv/latest/

Better error messages for missing columns

gampleman opened this issue

When a column is missing, the function errorToString produces extremely repetitive output:

I saw 5 problems while decoding this CSV:

There was a problem on row 1, in the `foo` field: The `foo` field wasn't provided in the field names.

There was a problem on row 2, in the `foo` field: The `foo` field wasn't provided in the field names.

There was a problem on row 3, in the `foo` field: The `foo` field wasn't provided in the field names.

There was a problem on row 4, in the `foo` field: The `foo` field wasn't provided in the field names.

There was a problem on row 5, in the `foo` field: The `foo` field wasn't provided in the field names.

This can easily lead to thousands of lines of error output, even though the actual problem is only on the first line (the header row).

See this Ellie for a minimal example.
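For readers without access to the Ellie, a reproduction along these lines might look like the sketch below. It is not the linked example. It assumes the `Csv.Decode` API of a recent elm-csv release (`decodeCsv`, `FieldNamesFromFirstRow`, `field`, `string`, `errorToString`); the module name `Repro` and the sample data are made up for illustration.

```elm
module Repro exposing (csv, result)

import Csv.Decode as Decode


-- A header row that has no `foo` column, followed by five data rows.
csv : String
csv =
    String.join "\r\n" [ "bar", "1", "2", "3", "4", "5" ]


-- Ask for a `foo` field that the header never defines, then render
-- whatever error comes back as a human-readable string.
result : Result String (List String)
result =
    Decode.decodeCsv Decode.FieldNamesFromFirstRow (Decode.field "foo" Decode.string) csv
        |> Result.mapError Decode.errorToString
```

With the behavior described in this issue, the `Err` string would repeat the same complaint once per data row.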

Yep, that makes sense. What would you expect to see instead?

There was a problem on row 1: the field definitions were supposed to define a `foo` field, but none was found.

or something like that, depending on how fancy one wants to go. I suppose a similar issue exists for numbered columns: if a CSV only has 3 columns, there isn't much point in emitting a problem for every row saying that column 4 is missing.

I made an improved version in this Ellie. Would you like me to make a PR?

It's been a while since I was in this code, so I'm not 100% sure what you changed. Is it that FieldNotFound, ColumnNotFound, and FieldNotProvided are rolled up into a single occurrence at row 0, column 0? I confess I'm not such a big fan of that: it may make sense for FieldNotProvided, but not the other two.

What if we allowed specifying multiple locations for an error? At least then we could say, "there was a problem on rows 1, 2, 3, 4, 5, 6…" Still not as compact as I'd like but better, and leaves the door open to calculating ranges: "there was a problem on rows 1–50 and 61"
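The range idea can be sketched independently of the library. The following is a minimal, self-contained illustration, not elm-csv code: the module `RowRanges` and the functions `toRanges` and `describeRows` are hypothetical names, and the wording of the message is only an assumption about what the output could look like.

```elm
module RowRanges exposing (describeRows, toRanges)

-- Collapse row numbers into inclusive (start, end) ranges.
-- Input order does not matter; duplicates are absorbed.
toRanges : List Int -> List ( Int, Int )
toRanges rows =
    rows
        |> List.sort
        |> List.foldl extend []
        |> List.reverse


-- Grow the most recent range if `row` touches it, else start a new one.
extend : Int -> List ( Int, Int ) -> List ( Int, Int )
extend row ranges =
    case ranges of
        ( start, end ) :: rest ->
            if row <= end + 1 then
                ( start, max end row ) :: rest

            else
                ( row, row ) :: ranges

        [] ->
            [ ( row, row ) ]


rangeToString : ( Int, Int ) -> String
rangeToString ( start, end ) =
    if start == end then
        String.fromInt start

    else
        String.fromInt start ++ "–" ++ String.fromInt end


-- Join as "a, b and c".
joinWithAnd : List String -> String
joinWithAnd parts =
    case List.reverse parts of
        [] ->
            ""

        [ only ] ->
            only

        last :: restReversed ->
            String.join ", " (List.reverse restReversed) ++ " and " ++ last


-- Assumes `rows` is non-empty; an empty list yields a dangling message.
describeRows : List Int -> String
describeRows rows =
    let
        ranges =
            toRanges rows

        noun =
            case ranges of
                [ ( start, end ) ] ->
                    if start == end then
                        "row"

                    else
                        "rows"

                _ ->
                    "rows"
    in
    "There was a problem on " ++ noun ++ " " ++ joinWithAnd (List.map rangeToString ranges)
```

For example, `describeRows (List.range 1 50 ++ [ 61 ])` evaluates to `"There was a problem on rows 1–50 and 61"`.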

I confess I'm not such a big fan of that—it may make sense for FieldNotProvided, but not the other two.

Why not? It doesn't seem like a row problem if you ask for the seventh column in a file that only has five. I am confused about the distinction between FieldNotProvided and FieldNotFound, though. How are they different?

What if we allowed specifying multiple locations for an error? At least then we could say, "there was a problem on rows 1, 2, 3, 4, 5, 6…" Still not as compact as I'd like but better, and leaves the door open to calculating ranges: "there was a problem on rows 1–50 and 61"

Are you thinking of non-rectangular CSV files? Otherwise the column is either present or it isn't, and it doesn't make much difference which row you are on.

From the docs:

  • ColumnNotFound Int and FieldNotFound String: we looked for the
    specified column, but couldn't find it. The argument specifies where we
    tried to look.
  • FieldNotProvided String: we looked for a specific field, but it wasn't
    present in the first row or the provided field names (depending on your
    configuration.)

I'm thinking of non-rectangular CSV files, yes, much like the ones you're describing in #23. I have also been in situations where I'm editing a CSV directly instead of exporting it from some tool. In those situations, getting errors that say a column wasn't in a row is pretty helpful! But, as you've found, the current implementation gets spammy, which is why I think rolling them up may be a better idea.
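One way to picture the roll-up is to group identical problems and collect the rows they occurred on. The sketch below is only an illustration under stated assumptions: `Problem` is a local stand-in that mirrors the three constructors quoted from the docs, while `RowError` and `rollUp` are hypothetical names and do not reflect how elm-csv represents errors internally.

```elm
module RollUp exposing (Problem(..), RowError, rollUp)

-- Local stand-in mirroring the three constructors quoted from the docs.
type Problem
    = ColumnNotFound Int
    | FieldNotFound String
    | FieldNotProvided String


-- Hypothetical shape: one problem seen on one row.
type alias RowError =
    { row : Int
    , problem : Problem
    }


-- Group identical problems, keeping first-seen order and every row
-- on which each problem occurred.
rollUp : List RowError -> List ( Problem, List Int )
rollUp errors =
    List.foldl add [] errors


add : RowError -> List ( Problem, List Int ) -> List ( Problem, List Int )
add error groups =
    case groups of
        [] ->
            [ ( error.problem, [ error.row ] ) ]

        ( problem, rows ) :: rest ->
            if problem == error.problem then
                ( problem, rows ++ [ error.row ] ) :: rest

            else
                ( problem, rows ) :: add error rest
```

Because the rows are kept rather than discarded, a non-rectangular file still reports exactly which rows were short, and a problem that hits every row shows up as one entry with a long row list instead of one entry per row.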