kaleidicassociates / fastcsv

WIP std.experimental.allocator fork of HS Teoh's experimental fast CSV to nested array parser

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Fast CSV parser in D

This is an experimental project in improving the performance of loading CSV files in D.

Its features include:

  • Extremely fast. It can run up to about 18x faster than std.csv (as of Jan 2016, using a 2.1 million record CSV test file).

  • Parses anything that conforms to RFC 4810.

  • Optional range-based interface (returns an input range of records).

  • Supports fast transcription of CSV data to a range of structs. Specially optimized for structs with string fields, that outperforms std.csv by an order of magnitude.

Its limitations are:

  • Does not perform data validation. Will produce nonsensical results for malformed CSV data (anything not conformant to RFC 4810).

  • Requires the input data to be entirely in memory. This may not be practical for very large CSV files or low-memory machines.

  • Cannot handle CSV data containing records that have more than 4096 fields each. (This can be statically raised by modifying a constant in the code.)

  • Cannot handle string fields larger than 64KB. (This can be statically raised by modifying a constant in the code.)

  • The input range interface still requires the entire input data to be an in-memory string. No forward range capability is provided.

  • Requires UTF-8 input. Does not support UTF-16 or UTF-32 or other encodings.

Usage:

import std.file;
import fastcsv;

// Load CSV into memory
auto input = cast(string) read(myCsvFile);

// Convert CSV to array
const(char)[][] mydata = csvToArray(input);

// Iterate over CSV as input range
foreach (record; csvByRecord(input))
{
    // do something with record (as array of string values)
}

About

WIP std.experimental.allocator fork of HS Teoh's experimental fast CSV to nested array parser


Languages

Language:D 100.0%