BrianHicks / elm-csv

Decode CSV in the most boring way possible.

Home Page: https://package.elm-lang.org/packages/BrianHicks/elm-csv/latest/

Should a field containing only spaces be considered "blank"?

jpagex opened this issue

Using the same example and use case as in issue #11, I wonder whether blank should also treat cells containing only spaces as blank (that is, apply trim before isEmpty).

Here is an example:

[ "   Name, Age"
, "  Alice,  12"
, "    Bob,    "
, "Charlie,  24"
]
|> String.join "\r\n"
-- |> String.replace " " ""
|> decodeCsv FieldNamesFromFirstRow (field "Age" (blank int))

I would expect to get:

Ok [Just 12, Nothing, Just 24]

But instead, I get:

Err (DecodingErrors [{ column = Field "Age" (Just 1), problems = [ExpectedInt ("    ")], row = 2 }])

If I uncomment the String.replace " " "" line, it works as expected.

As we discussed previously, this could cause problems with blank string: someone might expect to get Just "   " instead of Nothing.

However, should it work for blank int and blank float?

I can certainly add the following to make it work:

customBlank : Decoder a -> Decoder (Maybe a)
customBlank decoder =
    andThen
        (\maybeBlank ->
            if String.isEmpty (String.trim maybeBlank) then
                succeed Nothing

            else
                map Just decoder
        )
        string


[ "   Name, Age"
, "  Alice,  12"
, "    Bob,    "
, "Charlie,  24"
]
|> String.join "\r\n"
|> decodeCsv FieldNamesFromFirstRow (field "Age" (customBlank int))
                                                  ^^^^^^^^^^^

Sorry for this long message. I have seen that you consider the package "done", but I was wondering what you would expect in this situation. Thanks a lot for your work and the nice 3.0.1 upgrade!

I realize that this may be an edge case. I work with a small dataset and visually align the columns to make it look nice, which would not be the case for real datasets. (Besides, I am not blocked by it, because I can still trim the columns myself.)

I do not want to waste your time. Feel free to close the issue if it is not helpful or does not go in the direction you want. Otherwise, I am open to discussion.

Thanks for bringing it up. This seems reasonable! Wanna make a PR?

I should clarify the done-ness language: I'd consider this more a bug fix than a feature since it makes the package work more in ways you'd expect.

Great! Sure, I'll make the PR.

One question first: is it OK to consider any whitespace as blank? Even for strings?

Even for strings? What do you mean?

I think that for the blank function we get to define what a blank field is. If people don't agree with the definition, it's easy to roll your own. If people have some field where whitespace is significant, they can use string directly.

Like with the number decoders, the difference is one of intent. People are signaling their intent that a blank string doesn't mean anything, so we can be a little bit more aggressive in our data handling.

Oh, do you mean if you specify blank string? Yes, even then. That won't trim the string, it'll just turn an empty string into Nothing. Subtle difference, but IMO an acceptable one.

Exactly.

So we are saying that a blank field consists of any number of whitespace characters, not only the empty string.

As you said, decodeCsv NoFieldNames (blank string) "    " should decode into Nothing, and we can use string directly if needed.

I'll make a PR then. Thank you for the feedback.
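
For reference, the definition of "blank" agreed on in this thread can be sketched in plain Elm, independent of the package. The module name and both helpers (isBlank, blankToNothing) are hypothetical names chosen for illustration; they are not part of elm-csv and only mirror the rule discussed above: a value counts as blank when nothing is left after trimming, and a non-blank value is passed through untrimmed.

```elm
module BlankSketch exposing (blankToNothing, isBlank)

-- Hypothetical helpers for illustration only; not part of elm-csv.


-- A field is blank when it is empty or contains only whitespace.
isBlank : String -> Bool
isBlank raw =
    String.isEmpty (String.trim raw)


-- Blank fields become Nothing. Non-blank fields are returned as-is,
-- so surrounding whitespace is preserved rather than trimmed.
blankToNothing : String -> Maybe String
blankToNothing raw =
    if isBlank raw then
        Nothing

    else
        Just raw
```

Under this sketch, blankToNothing "    " evaluates to Nothing, while blankToNothing "  Bob" evaluates to Just "  Bob". That is the subtle difference mentioned above: the check trims, the result does not.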