Repeated column names erase each other for xParse
mcnuttandrew opened this issue
There is a small ambiguity in the way that tsvParse and csvParse parse files with columns that have non-unique names. For instance, if you have a TSV like
Example A Example B Example A
1 5 0
2 5 0
3 5 0
4 5 0
and you run it through tsvParse, then you get
[
{ 'Example A': '0', 'Example B': '5' },
{ 'Example A': '0', 'Example B': '5' },
{ 'Example A': '0', 'Example B': '5' },
{ 'Example A': '0', 'Example B': '5' },
columns: [ 'Example A', 'Example B', 'Example A' ]
]
The problem, of course, is that the data from the first Example A column is blown away during the parse. I'm not sure what the right solution might be: maybe some messaging in the docs that column names need to be unique? Or maybe appending an incrementing index to the duplicated columns ('Example A-1' or something). Having recently been bitten by this, I can say it is a real hair-pulling issue to find and resolve, so any help offered to people in a similar situation would no doubt be welcomed.
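The incrementing-index idea could be sketched roughly like this (a hypothetical helper; the suffix format `Example A 1` is illustrative only, not necessarily what the library adopted):

```javascript
// Rename duplicate column names by appending an incrementing suffix,
// so later columns no longer silently overwrite earlier ones.
function dedupeColumns(columns) {
  const seen = new Map(); // name -> how many times it has appeared so far
  return columns.map((name) => {
    const count = seen.get(name) || 0;
    seen.set(name, count + 1);
    return count === 0 ? name : `${name} ${count}`;
  });
}

dedupeColumns(["Example A", "Example B", "Example A"]);
// → ["Example A", "Example B", "Example A 1"]
```

Running the header row through something like this before building the row objects would preserve all three columns.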
I think it's a good idea, and a possible implementation is given in #73
Note, however, that it would be a breaking change (people who already have code running on this type of data expect it to keep working).
I like your solution, but I don't know if it's worth issuing a breaking change. I think just adding something to the documentation would probably get most people over the hurdle of identifying this error.
I don't know… The thing is that, when the data has this shape (and you don't control it), it's currently quite difficult to manipulate: you have to load it as text, fiddle with the first line, then dsv.parse it… I had to do exactly this just last week.
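The manual workaround described above might look like the following sketch, where a plain tab-split stands in for d3.tsvParse (assumptions: simple TSV with no quoted fields, and a hypothetical `name N` suffix for duplicates):

```javascript
// Workaround sketch: read the file as text, rewrite the header line to
// make the names unique, then parse the remaining lines as rows.
function parseWithUniqueHeader(text) {
  const lines = text.trim().split("\n");
  const seen = new Map();
  // Fiddle with the first line: suffix repeated names so none collide.
  const header = lines[0].split("\t").map((name) => {
    const n = seen.get(name) || 0;
    seen.set(name, n + 1);
    return n === 0 ? name : `${name} ${n}`;
  });
  // Build one object per remaining line, keyed by the deduped header.
  return lines.slice(1).map((line) => {
    const row = {};
    line.split("\t").forEach((value, i) => {
      row[header[i]] = value;
    });
    return row;
  });
}

parseWithUniqueHeader("Example A\tExample B\tExample A\n1\t5\t0");
// → [{ "Example A": "1", "Example B": "5", "Example A 1": "0" }]
```

Having to reimplement even this much by hand is a good argument for handling it in the library itself.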
(Plus, we're going to issue a major version soon, so having a breaking change is not that problematic.)
Oh i didn't know a major version was coming! This seems like a great approach then
I ♻️ my code into a notebook (and added "empty names" as well)
https://observablehq.com/@fil/csv-duplicate-names
Fixed in 8ab1ab8; thank you!