Escape Quotes in Cells
calebkleveter opened this issue · comments
When encoding or serializing data to CSV format, we don't escape double-quote characters ("), which causes the data to be parsed incorrectly. We need to escape each quote character in the cell contents by adding another double-quote before it:
"Exactly!" He replied
Becomes this:
""Exactly!"" He replied
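A minimal sketch of that escaping step, assuming cells are plain Swift strings. The name `escapeQuotes` is hypothetical and is not taken from the library's API:

```swift
// Hypothetical helper, not the library's API:
// doubles every double-quote character found in a cell.
func escapeQuotes(_ cell: String) -> String {
    var escaped = ""
    for character in cell {
        if character == "\"" {
            escaped.append("\"") // add the escaping quote first
        }
        escaped.append(character)
    }
    return escaped
}
```

Cells without any double-quotes pass through unchanged, so the helper can be applied to every cell unconditionally.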
@calebkleveter I've been reviewing what I understand to be the spec for CSV (RFC 4180). It looks like if you want to include quotes anywhere in what we consider a cell, then you are supposed to wrap the whole cell in quotes as well. See items 6 and 7 of Section 2. Do the changes to the encoder and decoder/parser in #9 account for this?
I was looking at tackling this issue alongside the config work I've been doing, but if you've got a good approach for this then that's even better!
The parser and decoder handle escaped quotes in all contexts, even if the cell is not wrapped in double quotes. The serializer and encoder automatically surround all cells in quotes already. Once the PR for this issue is merged, all cases for double quotes should be handled.
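For illustration, here is a rough sketch of what "surround all cells in quotes" plus escaping could look like for a single row. The function name `serializeRow` is an assumption made for this example, not the serializer's actual interface:

```swift
// Hypothetical sketch, not the library's serializer:
// wraps every cell in double-quotes and doubles any quotes inside it.
func serializeRow(_ cells: [String]) -> String {
    let quoted = cells.map { cell -> String in
        var escaped = ""
        for character in cell {
            if character == "\"" {
                escaped.append("\"") // escape by doubling
            }
            escaped.append(character)
        }
        return "\"" + escaped + "\""
    }
    return quoted.joined(separator: ",")
}
```

Quoting every cell means commas and quotes inside a cell never need special-casing on the way out.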
@calebkleveter That's great!
So it would handle all of these examples?
full name,quote type
the hulk,no quotes
"aqua man","wrapped in quotes"
"""iron man","""wrapped and leading quote"
"spyder man""","wrapped and trailing quote"""
"bat""man","wrapped and""inserted quote"
Yes, it would. If you parsed that into a dictionary, it would look like this:
[
"full name": ["the hulk", "aqua man", "\"iron man", "spyder man\"", "bat\"man"],
"quote type": ["no quotes", "wrapped in quotes", "\"wrapped and leading quote", "wrapped and trailing quote\"", "wrapped and\"inserted quote"]
]
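To make the unescaping rules concrete, here is a simplified single-row parser. It is only a sketch under stated assumptions, not the parser from the PR: the name `parseRow` is hypothetical, it ignores line breaks inside cells, and an unwrapped cell that begins with an escaped quote is ambiguous and not handled.

```swift
// Hypothetical sketch, not the library's parser: splits one CSV line into cells.
// A doubled quote ("") becomes one literal quote, both inside wrapped cells
// and in the middle of unwrapped cells.
func parseRow(_ line: String) -> [String] {
    let chars = Array(line)
    var cells: [String] = []
    var current = ""
    var inQuotes = false
    var i = 0
    while i < chars.count {
        let ch = chars[i]
        let nextIsQuote = i + 1 < chars.count && chars[i + 1] == "\""
        if ch == "\"" {
            if nextIsQuote && (inQuotes || !current.isEmpty) {
                current.append("\"") // escaped quote: keep one, skip both
                i += 2
                continue
            }
            if inQuotes {
                inQuotes = false     // closing quote of a wrapped cell
            } else if current.isEmpty {
                inQuotes = true      // opening quote of a wrapped cell
            } else {
                current.append(ch)   // stray quote: keep it literally
            }
        } else if ch == "," && !inQuotes {
            cells.append(current)    // an unquoted comma ends the cell
            current = ""
        } else {
            current.append(ch)
        }
        i += 1
    }
    cells.append(current)
    return cells
}
```

Run against the rows above, `parseRow` yields the same cell values as the dictionary, for example `bat"man` and `wrapped and"inserted quote` for the last row.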