Inconsistent header treatment for csv tables
AndydeCleyre opened this issue
Andy Kluger commented
Hello!
I'm sorry, I'm not sure exactly what's going on here, so I'll get right to it. Using Zsh:
$ rows=( Package,Version,Latest,Project 'tomli,2.0.0,2.0.1,~/Code/zpy' 'click,8.0.1,8.0.3,~/Code/archbuilder_iosevka' 'pep517,0.11.0,0.12.0,~/Code/archbuilder_iosevka' 'ruamel.yaml,0.17.17,0.17.21,~/Code/archbuilder_iosevka' 'tomli,1.2.1,2.0.1,~/Code/archbuilder_iosevka' )
$ rich --csv - <<<${(F)rows}
$ rows=( 'Package,Version,Latest,Project' 'tomli,2.0.0,2.0.1,~/Code/zpy' 'click,8.0.1,8.0.3,~/Code/archbuilder_iosevka' 'pep517,0.11.0,0.12.0,~/Code/archbuilder_iosevka' 'ruamel.yaml,0.17.17,0.17.21,~/Code/archbuilder_iosevka' 'tomli,1.2.1,2.0.1,~/Code/archbuilder_iosevka' )
$ rich --csv - <<<${(F)rows}
Same result as above
$ rows=( 'tomli,2.0.0,2.0.1,~/Code/zpy' 'click,8.0.1,8.0.3,~/Code/archbuilder_iosevka' 'pep517,0.11.0,0.12.0,~/Code/archbuilder_iosevka' 'ruamel.yaml,0.17.17,0.17.21,~/Code/archbuilder_iosevka' 'tomli,1.2.1,2.0.1,~/Code/archbuilder_iosevka' )
$ rich --csv - <<<${(F)rows}
What determines whether the first row gets treated as a header?
Thanks for any help!
Will McGugan commented
It’s a heuristic used by Python's csv module, which is imperfect, as you have noticed. In the future I’ll expose a way to adjust this via an option.
Andy Kluger commented
Thanks! Do you know what about the input in this case gives the csv module the wrong idea, so that I can work around it?
Will McGugan commented
Not sure. You could have a look at the source of the csv module.
Andy Kluger commented
FYI:
- csv.Sniffer.has_header:
def has_header(self, sample):
# Creates a dictionary of types of data in each column. If any
# column is of a single type (say, integers), *except* for the first
# row, then the first row is presumed to be labels. If the type
# can't be determined, it is assumed to be a string in which case
# the length of the string is the determining factor: if all of the
# rows except for the first are the same length, it's a header.
# Finally, a 'vote' is taken at the end for each column, adding or
# subtracting from the likelihood of the first row being a header.
- I spotted "lexter" @ https://github.com/Textualize/rich-cli/blob/main/src/rich_cli/__main__.py#L569
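Feeding the two inputs from this issue straight into `csv.Sniffer.has_header` shows the heuristic described above at work. This is a sketch that assumes rich-cli passes the whole input to `has_header`; `column_lengths` is a hypothetical helper written only for illustration, and it is simpler than the real method (it does not skip ragged rows or stop after 20 rows). None of the cells parse as `complex()` (a value like `2.0.0` is not a number), so every column falls back to the string-length rule. With the real header present, every column has mixed lengths below the first row (e.g. `tomli` vs. `pep517`, `~/Code/zpy` vs. `~/Code/archbuilder_iosevka`), so all columns are discarded and the vote ends at 0, meaning no header. Without the header, the `Project` column is always 26 characters below the first row, while `~/Code/zpy` is 10, so that single column votes the first data row in as a header.

```python
import csv

# The data rows from the issue, with and without the "Package,..." header row.
DATA_ROWS = [
    "tomli,2.0.0,2.0.1,~/Code/zpy",
    "click,8.0.1,8.0.3,~/Code/archbuilder_iosevka",
    "pep517,0.11.0,0.12.0,~/Code/archbuilder_iosevka",
    "ruamel.yaml,0.17.17,0.17.21,~/Code/archbuilder_iosevka",
    "tomli,1.2.1,2.0.1,~/Code/archbuilder_iosevka",
]
WITH_HEADER = "\n".join(["Package,Version,Latest,Project"] + DATA_ROWS)
WITHOUT_HEADER = "\n".join(DATA_ROWS)


def column_lengths(sample: str) -> list[set[int]]:
    """Hypothetical helper: per column, the set of string lengths below row 1.

    has_header() drops any column whose cells (below the first row) do not
    all share one type/length, so a set with more than one entry means that
    column is excluded from the final vote.
    """
    rows = list(csv.reader(sample.splitlines()))
    body = rows[1:]
    return [{len(row[i]) for row in body} for i in range(len(rows[0]))]


sniffer = csv.Sniffer()
print(sniffer.has_header(WITH_HEADER))     # False: every column dropped, vote is 0
print(sniffer.has_header(WITHOUT_HEADER))  # True: "Project" column votes +1
print(column_lengths(WITH_HEADER))
print(column_lengths(WITHOUT_HEADER))
```

In the second case only the last column survives with a single length (26), and the first row's `~/Code/zpy` (length 10) differs from it, which is enough to tip the vote.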