BrianHicks / elm-csv

Decode CSV in the most boring way possible.

Home Page: https://package.elm-lang.org/packages/BrianHicks/elm-csv/latest/

Trim cell content

jpagex opened this issue

I would expect to be able to add "padding" to cells in order to align the CSV columns. Example:

decodeCsv
    FieldNamesFromFirstRow
    (map2 Tuple.pair
        (field "Name" string)
        (field "Age" int)
    )
<|
    String.join "\n"
        [ "   Name, Age"
        , "  Alice,  12"
        , "    Bob,  14"
        , " Victor,  18"
        ]

I would expect this to decode into:

[ ( "Alice", 12 )
, ( "Bob", 14 )
, ( "Victor", 18 )
]

Would this be desirable? Anyone who does not want a cell trimmed could still wrap its content in double quotes (").

So:

  • yes for numbers. Trimming before parsing a number will probably never do anything wrong, since we already know that it's supposed to be a number in the field.
  • no for strings, because the package can't have an idea of the user's intent when calling Decode.string. If you'd like your string fields to do that, it's easy to specify anyway: Decode.map String.trim Decode.string.
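To make the opt-in approach concrete, here is a minimal sketch of a per-field trimmed string decoder. The names trimmedString and namesAndNotes, and the "Note" column, are hypothetical and only for illustration. Only Decode.map String.trim Decode.string comes from the comment above.

```elm
module TrimmedString exposing (namesAndNotes, trimmedString)

import Csv.Decode as Decode exposing (Decoder)


{-| Hypothetical helper, not part of the package: decode a cell as a
string, then strip leading and trailing whitespace.
-}
trimmedString : Decoder String
trimmedString =
    Decode.map String.trim Decode.string


{-| Opt in per field: "Name" is trimmed, while "Note" is decoded with the
plain string decoder. The column names are assumptions for this sketch.
-}
namesAndNotes : String -> Result Decode.Error (List ( String, String ))
namesAndNotes csv =
    Decode.decodeCsv Decode.FieldNamesFromFirstRow
        (Decode.map2 Tuple.pair
            (Decode.field "Name" trimmedString)
            (Decode.field "Note" Decode.string)
        )
        csv
```

Because trimming is chosen field by field, callers who rely on significant whitespace keep the existing behavior of Decode.string.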

I'd consider the number thing a bug, actually… is this giving you problems right now?

Thanks for your fast response. Actually, I am just playing with the package for a personal project, so no worries.

I think we have 3 cases:

  • Header line: I think we need to trim them if Decode.FieldNamesFromFirstRow is specified. Otherwise, for my previous example, it would not be possible to use Decode.field "Name" Decode.string. We would get the error:
    There was a problem on row 1: I looked for a column named `Name`, but couldn't find one.
    
  • Numbers: As you say, it should not be a problem to trim them. As a workaround for now, we can use:
    Decode.string
        |> Decode.andThen
            (\str ->
                case String.toInt (String.trim str) of
                    Just int ->
                        Decode.succeed int
    
                    Nothing ->
                        Decode.fail "Invalid int"
            )
  • Strings: I understand your point. Do you think the performance of calling Decode.map String.trim Decode.string is not far away from handling the case on the parser (I have not checked the code)?
    If there is a performance cost, something like Decode.trimmedString could be nice (perhaps with a better name). We could make the trimmed version the default and also add Decode.rawString. These are just rough ideas.
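Until headers and numbers are trimmed by the package itself, one possible user-side workaround for the padded example is to address cells by position instead of by header name. The sketch below assumes that Decode.column selects a cell by zero-based index regardless of the header text, and that the first row is still skipped when Decode.FieldNamesFromFirstRow is used. The names trimmedInt and people are hypothetical.

```elm
module PaddedCsv exposing (people, trimmedInt)

import Csv.Decode as Decode exposing (Decoder)


{-| Same idea as the number workaround above, wrapped in a reusable
(hypothetical) decoder: trim the cell, then parse it as an Int.
-}
trimmedInt : Decoder Int
trimmedInt =
    Decode.string
        |> Decode.andThen
            (\str ->
                case String.toInt (String.trim str) of
                    Just int ->
                        Decode.succeed int

                    Nothing ->
                        Decode.fail "Invalid int"
            )


{-| Decode padded rows by column position, so padded header names such
as "   Name" never have to be matched by Decode.field.
-}
people : String -> Result Decode.Error (List ( String, Int ))
people csv =
    Decode.decodeCsv Decode.FieldNamesFromFirstRow
        (Decode.map2 Tuple.pair
            (Decode.column 0 (Decode.map String.trim Decode.string))
            (Decode.column 1 trimmedInt)
        )
        csv
```

The trade-off is that positional decoding breaks if the column order changes, which is why trimming the header names in the package would be the more durable fix.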

For info, I came into this by using the ArrangeColumn feature of csv.vim (https://github.com/chrisbra/csv.vim#arrangecolumn).

Hmm, headers are indeed a special case. It does not seem controversial to trim those.

> Do you think the performance of calling Decode.map String.trim Decode.string is not far away from handling the case on the parser (I have not checked the code)?

I am not concerned over performance right now—let's make it right, then make it fast.

My bigger point here is that I do not want to make trimmed strings the default. CSV is mostly an interchange format, in my experience. Most of the time when I've dealt with CSVs the source of data has actually been a database or a spreadsheet, and CSV is just a convenient plain-text export from those programs. If I ever have to edit a CSV, I open it up in Numbers/Sheets/Excel, make my changes, then re-export to CSV.

I perceive tools like XSV and ArrangeColumn to be more about understanding some data you've just been handed, instead of ways to do long-term maintenance of CSV-formatted data. Maybe that's wrong! But I would want to see a couple real-world workflows that depend on insignificant whitespace before I'd change the default here.

I do want to thank you for your points about numbers and headers, though. Those are nice quality-of-life improvements that will make the library more robust to unexpected input, and they can go in right away. If you'd like to try your hand at adding them, I'd welcome the PRs! (You probably just need to add a String.trim in int, float, and getFieldNames.)

I agree with all your points. Indeed, I am using ArrangeColumn for displaying a simple and short CSV file that I use for a really small personal project. I would not imagine manipulating a big/production CSV file in vim with ArrangeColumn either.

For the two other points, I would be glad to make a PR. It should be quite straightforward.