benhoyt / goawk

A POSIX-compliant AWK interpreter written in Go, with CSV support

Home Page: https://benhoyt.com/writings/goawk/

Add helper functions for CSV processing

benhoyt opened this issue

It'd be good to add a library of various functions to help with processing CSV files (or other tabular data, it wouldn't be limited to CSV). For example:

  • The printrow() function mentioned in csv.md.
  • If we add the above, we may also want a printheader() function that prints the names in OFIELDS (or just use print?).
  • A function to delete a field or fields, for example delfield(n) to delete a single field, or maybe delfield(n[, c]) to delete c fields starting at field n (c defaults to 1). For one implementation, see the rmcol definition in this StackOverflow answer.
    • Do we also need a delfieldbyname()? Though with a better name.
  • A function to insert a field or fields, e.g., insfield(n, val). With standard AWK you can kind of cheat with something like {$n=val FS $n;}, but that doesn't work for CSV escaping.
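To make the delfield(n[, c]) idea concrete, here is a rough sketch in plain AWK, run from the shell. It is only an illustration under a few assumptions: the function body is hypothetical (it is not the rmcol code from the StackOverflow answer and not GoAWK's implementation), it uses the system awk rather than goawk, and it treats fields as plain comma-separated text, so it does not handle CSV quoting or escaping.

```sh
# Hypothetical delfield(n[, c]) sketch: delete c fields starting at field n.
# The record is rebuilt from the kept fields instead of decrementing NF,
# because not every awk rebuilds $0 when NF shrinks.
delfield_lib='
function delfield(n, c,    i, out, sep) {
    if (c == "") c = 1              # c defaults to 1 when omitted
    out = ""; sep = ""
    for (i = 1; i <= NF; i++) {
        if (i >= n && i < n + c)    # skip the fields being deleted
            continue
        out = out sep $i
        sep = OFS
    }
    $0 = out                        # re-splits the record using FS
}'

# Delete the "name" and "email" columns (fields 2 and 3).
out=$(printf 'id,name,email,age\n1,Bob,bob@example.com,42\n' |
    awk -F, -v OFS=, "$delfield_lib"' { delfield(2, 2); print }')
printf '%s\n' "$out"
# prints:
# id,age
# 1,42
```

Because the record is re-split on FS after the assignment to $0, this only round-trips cleanly when FS and OFS are the same single character and no field contains it.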

We could start by making this a simple AWK library that you include, e.g., goawk -f lib.awk -f prog.awk (or, when using the Go API, prepend/append the library to the program source).
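As a rough sketch of the include-a-library approach, the script below writes a tiny lib.awk and prog.awk to a temporary directory and loads both with two -f options. The insfield body is a hypothetical plain-AWK version (no CSV quoting or escaping), the file contents are made up for illustration, and the system awk stands in for goawk here; the same two -f arguments are what the goawk command above would take.

```sh
tmpdir=$(mktemp -d)

# lib.awk: hypothetical helper library, shared between programs.
cat > "$tmpdir/lib.awk" <<'EOF'
# insfield(n, val): insert val as a new field before the current field n.
function insfield(n, val,    i) {
    for (i = NF + 1; i > n; i--)   # shift fields n..NF one place right
        $i = $(i - 1)
    $n = val
}
EOF

# prog.awk: the user's program, which calls the library function.
cat > "$tmpdir/prog.awk" <<'EOF'
BEGIN { FS = OFS = "," }
{ insfield(2, NR == 1 ? "row" : NR - 1); print }
EOF

result=$(printf 'id,name\n7,Bob\n' |
    awk -f "$tmpdir/lib.awk" -f "$tmpdir/prog.awk")
printf '%s\n' "$result"
# prints:
# id,row,name
# 7,1,Bob

rm -rf "$tmpdir"
```

The order of the -f options does not matter for function definitions, since AWK resolves function names after reading all program files.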

When we want to add them as builtins to GoAWK, we should do it in a backwards-compatible way: rather than making them keywords like the other builtins, if the user defines a function or variable with the same name, the user's definition takes precedence.

If you're thinking about helper functions for CSV processing, it would be worthwhile to look at csvkit (https://csvkit.readthedocs.io/en/latest/), which is a set of command-line tools for processing CSV data. I'm pretty sure that the simple tools all have direct equivalents in goawk, but some of the others don't, and those might be inspiration.

(thanks for goawk, always nice to see a favorite old language get a modern implementation)

@vielmetti Thanks for that. Yeah, I've looked at csvkit some when thinking about this (see https://github.com/benhoyt/goawk/blob/master/csv.md#examples-based-on-csvkit). Select, cut, and reorder are fairly straightforward with the @ operator, and the functions in #127 augment that with field insertion/deletion when you need it.

Some things that csvkit can do probably aren't going to be included, though: for example, converting to JSON, or sorting, which just doesn't fit the row-by-row AWK model very well.

You could also look at Miller for inspiration. Miller is heavily inspired by awk and the unix toolbox, but adds support for formats like CSV, JSON, etc. Miller is also written in Go, so you could even borrow some code for other parts of your project, like buffers and such.

PS: there is also csvtk in Go!