artemklevtsov / tabby

Direct to object CSV/TSV/tabulated data parser with hooks.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Tabby - Fast CSV parser with hooks.

nimble install tabby

Github Actions

API reference

This library has no dependencies other than the Nim standard library.

About

This library parses .csv or .tsv files directly into Nim objects. This is different from how Nim's standard library parsecsv works which first parses them into an intermediate representation. This makes tabby generate fewer memory allocations.

Tabby also has a simpler API and is easier to use with just two calls fromCsv/toCsv:

let rows = strData.fromCsv(seq[RowObj])

and back:

echo rows.toCsv()

Tabby also supports arbitrary delimiters. Not only standard tab \t and , with linux \n and windows \r\n line endings, but any delimiter can be used. It's trivial to convert your data to and from any tabular format:

strData.fromCsv(seq[RowObj], separator = ":", lineEnd = ";")

Tabby can also guess delimiters with fromCsvGuess() function.

This library is similar to my other jsony project that is for json, if you like jsony you should like tabby.

How to use

You need to have CSV strData:

word,count
the,23135851162
of,13151942776
and,12997637966

And a regular Nim object to parse into to:

type FreqRow = object
  word: string
  count: int

Then simply read in the data:

var rows = strData.fromCsv(seq[FreqRow])

Compare this single line with confusing but equivalent std/parcecsv code:

var
  rows: seq[FreqRow]
  strm = newStringStream(content)
  p: CsvParser
p.open(strm, "tmp.csv")
p.readHeaderRow()
while p.readRow():
  rows.add FreqRow(word: p.row[0], count: parseInt(p.row[1]))
p.close()

Speed

Even though tabby does not allocate intermediate objects, it does use header and field re-ordering, and parse and dump hooks, which makes makes it speed close to std/parcecsv but with much simpler API.

name ............................... min time      avg time    std dv   runs
tabby ............................ 134.951 ms    141.897 ms    ±7.301   x100
parsecsv ......................... 129.861 ms    136.086 ms    ±7.026   x100

Prase Hooks

Some times the data you get from csv file is not quite the format you are looking for. You can define your own parseHook function to parse any data.

let csvData = """
country,budget
US,$2000
GB,$1000
DE,$1000
"""

type
  Money = uint64 # in cents

  CountryMoney = object
    country: string
    budget: Money

proc parseHook(p: ParseContext, v: var Money) =
  inc p.i # skip the $
  var num: int
  p.parseHook(num)
  v = num.uint64 * 100 # in cents

var rows = csvData.fromCsv(seq[CountryMoney])

Dump Hooks

Just like parse hooks some times the data you want int a csv file is not quite the format your objects are. You can define your own dumpHook function to output your data into any format.

proc dumpHook(p: PrintContext, v: Money) =
  # read teh %
  p.data.add "$"
  p.data.add $(v div 100)

echo rows.toCsv()

About

Direct to object CSV/TSV/tabulated data parser with hooks.

License:MIT License


Languages

Language:Nim 100.0%