d3 / d3-dsv

A parser and formatter for delimiter-separated values, such as CSV and TSV.

Home Page: https://d3js.org/d3-dsv

TSV Parses ""'s in Columns Incorrectly

agrow opened this issue

Greetings!

TL;DR: Fields that begin with a quoted item but include other text afterwards, such as:
"Hello" world
will parse as
"Hello"<tab> world
which is two entries rather than one.
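To see how such a split can happen, here is a simplified model of a quoted-field tokenizer, written in Python for illustration. It is not d3-dsv's code. The assumed rule is that a field starting with `"` ends at its closing quote (with `""` as an escaped quote) and the one character after that closing quote is skipped as if it were the delimiter. The `parse_row` name is hypothetical.

```python
from __future__ import annotations


def parse_row(line: str, delimiter: str = "\t") -> list[str]:
    """Split one line into fields using a simplified quoted-field rule.

    Assumption (a model, not d3-dsv's actual code): a field that starts with '"'
    ends at the first lone closing quote, and exactly one character after that
    quote is consumed, whether or not it is the delimiter.
    """
    fields: list[str] = []
    i, n = 0, len(line)
    while i <= n:
        if i < n and line[i] == '"':
            # Quoted field: scan to the closing quote, turning "" into ".
            j = i + 1
            buf = []
            while j < n:
                if line[j] == '"':
                    if j + 1 < n and line[j + 1] == '"':
                        buf.append('"')
                        j += 2
                        continue
                    break
                buf.append(line[j])
                j += 1
            fields.append("".join(buf))
            # Skip the closing quote plus one more character (normally the delimiter).
            i = j + 2
        else:
            # Unquoted field: everything up to the next delimiter, quotes included.
            j = line.find(delimiter, i)
            if j == -1:
                j = n
            fields.append(line[i:j])
            i = j + 1
    return fields
```

Under this assumption, `parse_row('"Hello" world')` swallows the space after the closing quote and starts a new field at `world`, so every later column shifts right by one. `Hello "world"` stays a single field because it does not start with a quote.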

My data includes plain text and is exported to the TSV file correctly (verified by viewing whitespace characters in Word). However, when it is imported via d3.tsv, an entry such as the one above is split into two, shifting all of my other data over by one column.
I do not have time to make an isolated test case right now, but here are some screenshots of the incorrect parse (and one showing an adjacent correct parse).

One instance: [screenshots 2016-12-13_1404, 2016-12-13_1403, 2016-12-13_1405]

Another instance: [screenshots 2016-12-13_1418, 2016-12-13_1411, 2016-12-13_1413; the last one also shows a correct parse]

Per RFC 4180:

If fields are not enclosed with double quotes, then double quotes may not appear inside the fields.

If you want the parsed value of the field to be "Hello" world, the serialized text for that field should be """Hello"" world".
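The RFC 4180 quoting rule can be sketched as a small formatter. This is a minimal Python illustration with hypothetical `format_field` and `format_row` helpers, not d3-dsv's API. It assumes a field needs quoting only when it contains a double quote, the delimiter, or a line break, and that embedded quotes are escaped by doubling them.

```python
from __future__ import annotations


def format_field(value: str, delimiter: str = "\t") -> str:
    """Quote one field RFC 4180-style, generalized to any delimiter (hypothetical helper)."""
    # Quoting is needed only if the value contains a quote, the delimiter, or a line break.
    if any(ch in value for ch in ('"', delimiter, "\n", "\r")):
        # Wrap in quotes and escape each embedded quote by doubling it.
        return '"' + value.replace('"', '""') + '"'
    return value


def format_row(values: list[str], delimiter: str = "\t") -> str:
    """Join formatted fields with the delimiter (hypothetical helper)."""
    return delimiter.join(format_field(v, delimiter) for v in values)
```

For example, `format_field('"Hello" world')` returns `"""Hello"" world"`, the serialized form given above.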

commented

That link is for CSV, not TSV.

Also, Hello "world" parses correctly. So in a TSV, double quotes "may not appear" only when they begin the field.

This repo generalizes CSV as defined in RFC 4180 to support other delimiters besides comma, including tab (\t). But otherwise it adheres to that specification.

The fact that Hello "world" parses as expected does not mean the input is correctly formatted; the correctly formatted field would be "Hello ""world""". This library doesn’t validate its input, so its behavior is undefined when given invalid input.
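Since the parser does not validate its input, one option is to check lines before handing them to d3.tsv. Below is a minimal Python sketch of such a check using a hypothetical `find_malformed_fields` helper (not part of d3-dsv). It assumes RFC 4180 rules with a configurable delimiter and flags two cases: an unquoted field containing a double quote, and a quoted field that is unterminated or followed by anything other than the delimiter or end of line.

```python
from __future__ import annotations


def find_malformed_fields(line: str, delimiter: str = "\t") -> list[int]:
    """Return indexes of fields in one line that break RFC 4180 quoting (hypothetical checker)."""
    bad: list[int] = []
    i, n, index = 0, len(line), 0
    while True:
        if i < n and line[i] == '"':
            # Quoted field: find the closing quote, skipping escaped "" pairs.
            j = i + 1
            while j < n:
                if line[j] == '"':
                    if j + 1 < n and line[j + 1] == '"':
                        j += 2
                        continue
                    break
                j += 1
            if j >= n:
                # Unterminated quoted field: nothing after it can be checked reliably.
                bad.append(index)
                break
            end = j + 1  # position just after the closing quote
            if end < n and line[end] != delimiter:
                # Text after the closing quote, e.g. '"Hello" world'.
                bad.append(index)
                # Resynchronize at the next delimiter so later fields are still checked.
                nxt = line.find(delimiter, end)
                end = n if nxt == -1 else nxt
        else:
            # Unquoted field: it must not contain any double quote.
            nxt = line.find(delimiter, i)
            end = n if nxt == -1 else nxt
            if '"' in line[i:end]:
                bad.append(index)
        if end >= n:
            break
        i = end + 1
        index += 1
    return bad
```

With this check, both `"Hello" world` and the unquoted `Hello "world"` are reported as malformed, while `"Hello ""world"""` passes.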