meiersi / Frames

Data frames for tabular data.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Frames

Data Frames for Haskell

User-friendly, type safe, runtime efficient tooling for working with tabular data deserialized from comma-separated values (CSV) files. The type of each row of data is inferred from data, which can then be streamed from disk, or worked with in memory.

We provide streaming and in-memory interfaces for efficiently working with datasets that can be safely indexed by column names found in the data files themselves. This type safety of column access and manipulation is checked at compile time.

Tutorial

For comparison to working with data frames in other languages, see the tutorial.

Demos

To play with various demos, I recommend working in a cabal sandbox, and not installing the executables associated with each demo.

$ cabal sandbox init
$ cabal install --dependencies-only -f demos
$ cabal configure -f demos
$ cabal run getdata
$ cabal run plot

The getdata program downloads data sets used by the plot and plot2 demo programs. Feel free to download those files manually; they are linked from the source for the demo programs that use them.

You can also explore the demos interactively by specifying a build target to cabal repl,

$ cabal repl plot

Benchmarks

The benchmark shows several ways of dealing with data when you want to perform multiple traversals.

Another demo shows how to fuse multiple passes into one so that the full data set is never resident in memory. A Pandas version of a similar program is also provided for comparison.

This is a trivial program, but shows that performance is comparable to Pandas, and the memory savings of a compiled program are substantial.

Trivial Benchmark

About

Data frames for tabular data.

License:BSD 3-Clause "New" or "Revised" License


Languages

Language:Haskell 98.2%Language:Nix 1.5%Language:Python 0.3%