d99kris / rapidcsv

C++ CSV parser library

`pTrim` unexpectedly fails when `pSeparator` is the space character

matwey opened this issue

Hi,

I am using rapidcsv 8.65 and see the following issue.
Imagine the following test.dat file:

1.2   3.4

(three spaces between 1.2 and 3.4)

and the following code:

#include <iostream>
#include <vector>
#include <rapidcsv.h>

int main() {
        rapidcsv::Document doc("test.dat",
                rapidcsv::LabelParams(-1, -1),
                rapidcsv::SeparatorParams(' ', true),
                rapidcsv::ConverterParams(),
                rapidcsv::LineReaderParams(true));

        std::vector<float> row = doc.GetColumn<float>(1);
        for (const auto& x: row) {
                std::cout << x << std::endl;
        }

        return 0;
}

What I see is:

terminate called after throwing an instance of 'std::invalid_argument'
  what():  stof
fish: './test' terminated by signal SIGABRT (Abort)

What I expected to see is:

3.4

I expect this because when I change the separator in both the data and the code as follows:

1.2 , 3.4

and

#include <iostream>
#include <vector>
#include <rapidcsv.h>

int main() {
        rapidcsv::Document doc("test.dat",
                rapidcsv::LabelParams(-1, -1),
                rapidcsv::SeparatorParams(',', true),
                rapidcsv::ConverterParams(),
                rapidcsv::LineReaderParams(true));

        std::vector<float> row = doc.GetColumn<float>(1);
        for (const auto& x: row) {
                std::cout << x << std::endl;
        }

        return 0;
}

then the code prints 3.4.
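
A likely explanation for the difference, sketched without rapidcsv itself: if a parser splits on every single separator character and only trims each cell afterwards, three spaces produce two empty cells, so column 1 is an empty string and std::stof throws std::invalid_argument on it. With ',' as the separator the spaces are just padding inside the cells and trimming removes them. SplitEvery and TrimCell below are hypothetical helpers written for illustration, not rapidcsv functions, and rapidcsv's internal logic is assumed rather than quoted.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of rapidcsv): split on EVERY occurrence of
// sep, keeping the empty cells that appear between consecutive separators.
std::vector<std::string> SplitEvery(const std::string& line, char sep)
{
  std::vector<std::string> cells;
  std::string cell;
  for (char c : line)
  {
    if (c == sep)
    {
      cells.push_back(cell);  // may push an empty cell
      cell.clear();
    }
    else
    {
      cell += c;
    }
  }
  cells.push_back(cell);  // last cell on the line
  return cells;
}

// Hypothetical helper: strip leading and trailing spaces/tabs from one cell.
std::string TrimCell(const std::string& s)
{
  const std::size_t first = s.find_first_not_of(" \t");
  if (first == std::string::npos)
  {
    return "";  // cell is empty or whitespace only
  }
  const std::size_t last = s.find_last_not_of(" \t");
  return s.substr(first, last - first + 1);
}
```

Under these assumptions, SplitEvery("1.2   3.4", ' ') gives four cells with an empty string at index 1, while SplitEvery("1.2 , 3.4", ',') gives two cells whose second entry trims to "3.4".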

Hi @matwey - I'm not at a computer now so I haven't checked your issue in detail, but I suspect the use case is similar to #93, which was deemed out of scope for rapidcsv. There were some proposed changes in that issue which you could take a look at if you like, until I review your issue properly.

Hi again - I can confirm the code in #93 should work for your use case.

You can apply the patch consecutive-separators.patch.txt attached in this issue to rapidcsv:

git apply consecutive-separators.patch.txt

and then use the mIgnoreConsecutive option like this:

#include <iostream>
#include <vector>
#include "rapidcsv.h"

int main()
{
  rapidcsv::LabelParams labelParams(-1, -1);
  rapidcsv::SeparatorParams separatorParams;
  separatorParams.mSeparator = ' ';
  separatorParams.mIgnoreConsecutive = true;
  rapidcsv::Document doc("test.dat", labelParams, separatorParams);

  std::vector<float> col = doc.GetColumn<float>(1);
  for (auto& val : col)
  {
    std::cout << "Value: " << val << std::endl;
  }
}

This use case was reviewed before and judged to be out of scope for rapidcsv, so there are currently no plans to integrate this patch.

Feel free to re-open this issue if you have any follow-up questions.
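
For anyone who prefers not to patch the header, one possible alternative is to normalize the input before parsing, so that every run of separators becomes a single separator. The sketch below is only an illustration: CollapseSeparators is a hypothetical helper, not part of rapidcsv, and it assumes '\n' line endings and no quoted fields containing the separator.

```cpp
#include <string>

// Hypothetical helper (not part of rapidcsv): collapse each run of `sep`
// into a single `sep`, and drop runs at the start or end of a line.
// Assumes '\n' line endings and no quoted cells containing `sep`.
std::string CollapseSeparators(const std::string& text, char sep)
{
  std::string out;
  bool pendingSep = false;   // a run of separators was seen, not yet emitted
  bool atLineStart = true;   // no cell content emitted on this line yet
  for (char c : text)
  {
    if (c == sep)
    {
      pendingSep = true;     // remember the run, emit at most one later
    }
    else if (c == '\n')
    {
      out += c;              // trailing separators are dropped here
      pendingSep = false;
      atLineStart = true;
    }
    else
    {
      if (pendingSep && !atLineStart)
      {
        out += sep;          // one separator stands in for the whole run
      }
      out += c;
      pendingSep = false;
      atLineStart = false;
    }
  }
  return out;
}
```

The cleaned text could then be wrapped in a std::istringstream and handed to the rapidcsv::Document constructor that accepts a stream, together with the same LabelParams and SeparatorParams(' ') as in the examples above. With the sample line, "1.2   3.4" becomes "1.2 3.4", so column 1 should hold 3.4.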