Better error messages for missing columns
gampleman opened this issue · comments
When a column is missing, the function `errorToString` produces extremely repetitive output:
I saw 5 problems while decoding this CSV:
There was a problem on row 1, in the `foo` field: The `foo` field wasn't provided in the field names.
There was a problem on row 2, in the `foo` field: The `foo` field wasn't provided in the field names.
There was a problem on row 3, in the `foo` field: The `foo` field wasn't provided in the field names.
There was a problem on row 4, in the `foo` field: The `foo` field wasn't provided in the field names.
There was a problem on row 5, in the `foo` field: The `foo` field wasn't provided in the field names.
This can easily lead to thousands of lines of error output, even though the error is only on the first line.
Yep, that makes sense. What would you expect to see instead?
There was a problem on row 1: the field definitions were supposed to define a `foo` field, but none was found.
or something like that, depending on how fancy one wants to go. (I suppose a similar issue exists for numbered columns: if a CSV only has 3 columns, there isn't much point in emitting a problem for each row saying that column 4 is missing.)
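To illustrate the idea, here is a minimal sketch in Python. It is not the library's Elm code and not the linked Ellie; the function names `missing_fields` and `describe_missing` are hypothetical. It assumes the field names come from the first line of the file, checks them once, and emits a single message per missing field no matter how many rows follow.

```python
import csv
import io


def missing_fields(csv_text, required_fields):
    """Return the required field names that are absent from the header line."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input is treated as an empty header
    return [name for name in required_fields if name not in header]


def describe_missing(csv_text, required_fields):
    """Build one message per missing field, independent of the row count."""
    return [
        "There was a problem on row 1: the field definitions were supposed "
        f"to define a `{name}` field, but none was found."
        for name in missing_fields(csv_text, required_fields)
    ]
```

With a five-row file whose header lacks `foo`, this yields one message instead of five.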
I made an improved version in this Ellie. Would you like me to make a PR?
It's been a while since I was in this code so I'm not 100% sure what you changed. Is it that `FieldNotFound`, `ColumnNotFound`, and `FieldNotProvided` are rolled up into a single occurrence at row 0, column 0? I confess I'm not such a big fan of that; it may make sense for `FieldNotProvided`, but not the other two.
What if we allowed specifying multiple locations for an error? At least then we could say, "there was a problem on rows 1, 2, 3, 4, 5, 6…" Still not as compact as I'd like but better, and leaves the door open to calculating ranges: "there was a problem on rows 1–50 and 61"
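The range idea could look roughly like the following Python sketch. The helper name `format_rows` and the exact wording of the output are assumptions for illustration, not part of the library. It sorts the row numbers, merges consecutive runs, and joins the pieces in the style of "1–50 and 61".

```python
EN_DASH = "\u2013"  # separator used between the two ends of a range


def format_rows(rows):
    """Collapse row numbers into consecutive ranges joined as readable text."""
    ordered = sorted(set(rows))
    if not ordered:
        return ""

    # Walk the sorted rows, closing a range whenever a gap appears.
    ranges = []
    start = prev = ordered[0]
    for row in ordered[1:]:
        if row == prev + 1:
            prev = row
            continue
        ranges.append((start, prev))
        start = prev = row
    ranges.append((start, prev))

    parts = [str(a) if a == b else f"{a}{EN_DASH}{b}" for a, b in ranges]
    if len(parts) == 1:
        return parts[0]
    return ", ".join(parts[:-1]) + " and " + parts[-1]
```

A message would then be built as `"there was a problem on rows " + format_rows(rows)`.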
> I confess I'm not such a big fan of that; it may make sense for `FieldNotProvided`, but not the other two.
Why not? It doesn't seem like a row problem if you ask for the seventh column in a file that only has five. Although I am confused about the distinction between `FieldNotProvided` and `FieldNotFound`. How are they different?
> What if we allowed specifying multiple locations for an error? At least then we could say, "there was a problem on rows 1, 2, 3, 4, 5, 6…" Still not as compact as I'd like but better, and leaves the door open to calculating ranges: "there was a problem on rows 1–50 and 61"
Are you thinking of non-rectangular CSV files? Because otherwise the column is either present or it isn't; which row you are on doesn't make much difference.
From the docs:
- `ColumnNotFound Int` and `FieldNotFound String`: we looked for the specified column, but couldn't find it. The argument specifies where we tried to look.
- `FieldNotProvided String`: we looked for a specific field, but it wasn't present in the first row or the provided field names (depending on your configuration.)
I'm thinking of non-rectangular CSV files, yes, much like the ones you're describing in #23. I have also been in situations where I'm editing a CSV directly instead of exporting it from some tool. In those situations, getting errors that say a column wasn't in a row is pretty helpful! But, as you've found, the current implementation gets spammy. That's why I think rolling them up may be a better idea.
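As a rough sketch of what rolling up could mean, the Python below groups identical problems and keeps every row they occurred on, so row information survives for non-rectangular files. The `Problem` class, `roll_up`, `render`, and the message format are all hypothetical stand-ins, not the library's types or output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Problem:
    kind: str    # e.g. "ColumnNotFound", "FieldNotFound", "FieldNotProvided"
    target: str  # column index or field name, kept as text for simplicity


def roll_up(errors):
    """Group (row, Problem) pairs by problem, preserving first-seen order."""
    grouped = {}
    for row, problem in errors:
        grouped.setdefault(problem, []).append(row)
    return grouped


def render(grouped):
    """Emit one line per distinct problem, listing all affected rows."""
    lines = []
    for problem, rows in grouped.items():
        label = "row" if len(rows) == 1 else "rows"
        row_list = ", ".join(str(r) for r in rows)
        lines.append(
            f"There was a problem on {label} {row_list}: "
            f"{problem.kind} {problem.target}"
        )
    return lines
```

For a file where rows 1, 2, and 5 are too short to have a column 4, this produces a single line naming those three rows rather than three separate messages.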