wireservice / csvkit

A suite of utilities for converting to and working with CSV, the king of tabular file formats.

Home Page: https://csvkit.readthedocs.io

conversion to TSV without double doublequotes (")

ciupicri opened this issue

If I try to convert a CSV containing values with double quotes (") to TSV, e.g. this file:

Software,Year
"The ""best"" software",2023

csvformat --out-tabs --out-no-doublequote fails with:

Error: need to escape, but no escapechar set

One liner for testing:

printf 'Software,Year\n"The ""best"" software",2023\n' | csvformat -T -B

You need to set -P:

  -P OUT_ESCAPECHAR, --out-escapechar OUT_ESCAPECHAR
                        Character used to escape the delimiter in the output CSV file if --quoting 3 ("Quote None") is specified and to escape the QUOTECHAR if --no-doublequote is specified.

Specifically: to escape the QUOTECHAR if --no-doublequote is specified.

Edit: Fixed the typo in 71210e6, as the -P help text should mention --out-no-doublequote, not --no-doublequote.

But I don't want to escape double quotes, because that doesn't make sense for TSV.

printf 'Software,Year\n"The ""best"" software",2023\n' | csvformat -T -B -P 'X' outputs:

Software	Year
The X"bestX" software	2023

when the ideal output is:

Software	Year
The "best" software	2023

Just in case, -P '' gives me:

TypeError: "escapechar" must be a 1-character string
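
For reference, both results can be reproduced outside csvkit with Python's standard csv module. This is only a sketch: the helper name write_tsv and the exact writer arguments are assumptions chosen to mirror -T -B and -P 'X', not csvkit's actual code.

```python
import csv
import io

ROWS = [["Software", "Year"], ['The "best" software', "2023"]]


def write_tsv(rows, **fmtparams):
    """Write rows as tab-separated text and return the result as a string."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n", **fmtparams)
    writer.writerows(rows)
    return buf.getvalue()


# doublequote=False without an escapechar: the writer cannot emit the
# quote character inside the field, so it raises csv.Error.
try:
    write_tsv(ROWS, doublequote=False)
    error_message = None
except csv.Error as exc:
    error_message = str(exc)
print(error_message)

# With an escapechar the write succeeds, but the escape character
# ends up in front of every quote in the data.
escaped = write_tsv(ROWS, doublequote=False, escapechar="X")
print(escaped)
```

The second call shows why setting an escape character is not a satisfying answer here: the output is valid for the chosen dialect, but it is no longer the plain text a TSV consumer expects.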

"need to escape, but no escapechar set" comes from Python's own csv module.

You can change the quote character to one that doesn't occur in the text.

e.g. csvformat -T -Q~ or csvformat -T -Q"🦀"
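
Why this works can be sketched with Python's csv module: with the default minimal quoting, a field is only quoted when it contains the delimiter, the quote character, or a line-terminator character. The helper name to_tsv and its arguments are illustrative assumptions, not csvkit internals.

```python
import csv
import io

ROWS = [["Software", "Year"], ['The "best" software', "2023"]]


def to_tsv(rows, quotechar):
    """Write rows as TSV using the given quote character (minimal quoting)."""
    buf = io.StringIO()
    writer = csv.writer(
        buf, delimiter="\t", quotechar=quotechar, lineterminator="\n"
    )
    writer.writerows(rows)
    return buf.getvalue()


# "~" never occurs in the data, so no field needs quoting and the
# double quotes pass through untouched.
print(to_tsv(ROWS, "~"))

# If the chosen quote character does occur in a field, that field is
# quoted and the character is doubled, so the choice matters.
print(to_tsv([["a~b"]], "~"))
```

The workaround therefore depends on picking a character that is guaranteed not to appear anywhere in the input.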

This stuff ought to be in the manual.
By the way, the csvformat -u 3 -U 3 -Q "" example is broken.

How so?

Doing:

csvformat -u3 examples/optional_quote_characters.csv

causes:

a,b,c
"""1""","""2""","""3"""

which looks horrible. Doing the documented incantation produces this instead:

a,b,c
"1","2","3"

# csvformat -u 3 -U 3 -Q ""
No input file or piped data provided. Waiting for standard input:
TypeError: "quotechar" must be a 1-character string

You’re not providing any input, as communicated in the output.

$ printf 'Software,Year\n"The ""best"" software",2023\n' | csvformat -u 3 -U 3 -Q ""
TypeError: "quotechar" must be a 1-character string

What's your python --version? I get no error.

You can also run with --verbose (-v) for the traceback.

Aha: "Changed in version 3.11: An empty quotechar is not allowed." (https://docs.python.org/3/library/csv.html#dialects-and-formatting-parameters) I've updated the example to use -Q🐍.
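
A quick way to check which behaviour a given interpreter has is to try constructing a writer with an empty quotechar. This is a minimal sketch; the function name accepts_empty_quotechar is made up for illustration, and quoting=csv.QUOTE_NONE is assumed to correspond to -U 3.

```python
import csv
import io
import sys


def accepts_empty_quotechar():
    """Return True if this Python's csv module accepts quotechar=''."""
    try:
        csv.writer(io.StringIO(), quoting=csv.QUOTE_NONE, quotechar="")
    except TypeError:
        # Python 3.11 and later reject an empty quotechar.
        return False
    return True


print(sys.version_info[:2], accepts_empty_quotechar())
```

On Python 3.11 and later this should report False, which matches the TypeError shown earlier in the thread.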

For what it's worth I'm using:

  • python3-3.11.2-1.fc37.x86_64
  • python3-csvkit-1.0.7-3.fc37.noarch