ashtum / lazycsv

A fast, lightweight and single-header C++ csv parser library

Lazycsv can forget the name of the last csv column

ibaaj opened this issue

In Python, I generated a CSV file as follows:

import random
import csv

headercsv = ["x", "y", "z", "result"]

dataset = []
for i in range(1, 10000):
    x = round(random.uniform(-10, 10),3)
    y = round(random.uniform(-10, 10),3)
    z = round(random.uniform(-10, 10),3)

    result =  round(x * y * z, 3)

    dataset.append([x, y, z, result])

with open('data.csv', 'w', encoding='UTF8') as f:
    writer1 = csv.DictWriter(f, fieldnames=headercsv)
    writer1.writeheader()
    writer = csv.writer(f)
    writer.writerows(dataset)

$ head -n 5 data.csv
x,y,z,result
-1.925,5.298,-7.699,78.519
-0.437,2.224,-3.304,3.211
0.82,1.699,5.656,7.88
-6.65,5.465,-2.115,76.864

I then tried to parse the column names of the generated csv file "data.csv" using lazycsv:

#include <iostream>
#include "lazycsv.h"

int main() {
    lazycsv::parser<
            lazycsv::mmap_source,           /* source type of csv data */
            lazycsv::has_header<true>,      /* first row is header or not */
            lazycsv::delimiter<','>,        /* column delimiter */
            lazycsv::quote_char<'"'>,       /* quote character */
            lazycsv::trim_chars<' ', '\t'>> /* trim characters of cells */
    parser{"data.csv"};

    auto header = parser.header();

    for (const auto cell: header) {

        std::cout << cell.raw() << std::endl;
    }
    std::cout << "===========" << std::endl;

    for (const auto cell: header) {

        std::cout << cell.raw() << " vs" << " result" << std::endl;

        if (cell.raw() == "result") {
            std::cout << "catched result" << std::endl;
        }
    }


    return 0;
}

This code generates the following output:

x
y
z
result
===========
x vs result
y vs result
z vs result
 vs result

Comparing the last column's cell.raw() with "result" fails ("catched result" is never printed).
The last output line also shows that when I print cell.raw() for the last column together with other text, the column name is missing from the output.

If I move the "result" column to a place other than the last column, I can capture it.

Thanks for reporting,

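The problem is that the generated CSV file uses \r\n as the line terminator, so the last column of each row includes the trailing \r. In this case the last header cell is result\r, which does not compare equal to "result". When it is printed to a terminal, the \r moves the cursor back to the start of the line, so the text that follows overwrites the column name. If you add \r to the trim_chars list and use the trimmed output, you get the expected result: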

#include <iostream>
#include "lazycsv.h"

int main() {
    lazycsv::parser<
            lazycsv::mmap_source,                 /* source type of csv data */
            lazycsv::has_header<true>,            /* first row is header or not */
            lazycsv::delimiter<','>,              /* column delimiter */
            lazycsv::quote_char<'"'>,             /* quote character */
            lazycsv::trim_chars<' ', '\t', '\r'>> /* trim characters of cells; note the added '\r' */
    parser{"data.csv"};

    auto header = parser.header();

    for (const auto cell: header) {

        std::cout << cell.raw() << std::endl;
    }
    std::cout << "===========" << std::endl;

    for (const auto cell: header) {
        std::cout << cell.trimed() << " vs" << " result" << std::endl;

        if (cell.trimed() == "result") {
            std::cout << "catched result" << std::endl;
        }
    }

    return 0;
}
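
The cause can also be checked without lazycsv. The following is only a sketch: it assumes the file was written with Python's csv module at its default settings, and the helper name last_header_cell is made up for this example. It writes the header row to an in-memory buffer, splits it the way a reader that treats only LF as the row terminator would, and shows that the last cell carries a trailing carriage return.

```python
import csv
import io


def last_header_cell(fieldnames, lineterminator="\r\n"):
    """Return the last header cell as seen by a reader that splits rows on LF only.

    Hypothetical helper for illustration; it is not part of lazycsv.
    """
    buffer = io.StringIO()
    # csv.writer uses CR LF as its default line terminator.
    writer = csv.writer(buffer, lineterminator=lineterminator)
    writer.writerow(fieldnames)
    first_line = buffer.getvalue().split("\n")[0]
    return first_line.split(",")[-1]


cell = last_header_cell(["x", "y", "z", "result"])
print(repr(cell))                     # repr makes the hidden carriage return visible
print(cell == "result")               # the extra character breaks the comparison
print(cell.rstrip("\r") == "result")  # stripping the carriage return restores the match
```

Printing the repr rather than the raw string matters here, because a terminal interprets the carriage return instead of displaying it.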

BTW, I've created an issue to address this problem.

Thank you very much for your help.
This solves the problem, at least for now.
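
An alternative to trimming on the C++ side is to avoid writing \r in the first place. The sketch below assumes the generator script can be changed. The function name write_dataset and the temporary path are illustrative, not taken from the thread. It passes lineterminator="\n" to csv.writer and opens the file with newline="", as the csv module documentation recommends for files passed to a writer.

```python
import csv
import os
import tempfile


def write_dataset(path, header, rows):
    """Write header and rows with LF line endings only (illustrative helper)."""
    # newline="" stops Python from translating line endings on write;
    # lineterminator="\n" overrides the csv module's default of "\r\n".
    with open(path, "w", encoding="UTF8", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


path = os.path.join(tempfile.mkdtemp(), "data.csv")
write_dataset(path, ["x", "y", "z", "result"], [[0.82, 1.699, 5.656, 7.88]])
with open(path, "rb") as f:
    print(f.read())  # raw bytes, so any carriage return would be visible
```

With a file written this way, the original trim_chars<' ', '\t'> configuration should see "result" as the last header cell, although adding '\r' to trim_chars remains the safer choice when the producer of the file is not under your control.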