wireservice / csvkit

A suite of utilities for converting to and working with CSV, the king of tabular file formats.

Home Page: https://csvkit.readthedocs.io

Support zstandard-compressed (.zst) CSV files

gsauthof opened this issue

Say you have a directory of gzip- or zstandard-compressed CSV files that you want to merge.

For this it would be great if csvstack would auto-detect the compression, i.e. stream the files into a decompressor and process the files as usual.

This would also make sense for the other tools: many CSV files compress very well, and other tools (e.g. duckdb) already support compression transparently, so such support would increase interoperability when exchanging these files.

Example usage:

csvstack *.csv.gz | zstd -o complete.csv.zst -f

It does already autodetect the compression as long as the filenames end with .gz, .bz2 or .xz.
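For illustration, extension-based detection of this kind could look roughly like the sketch below. It is not csvkit's actual code: the `OPENERS` mapping and the `open_maybe_compressed` helper are hypothetical names, and only standard-library modules are used.

```python
import bz2
import gzip
import lzma
from pathlib import Path

# Hypothetical mapping from filename suffix to an opener function.
# Illustrative only; csvkit's real implementation may differ.
OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
    ".xz": lzma.open,
}


def open_maybe_compressed(path, encoding="utf-8"):
    """Open a CSV file as text, decompressing based on its extension."""
    # Fall back to the builtin open() for unrecognised suffixes.
    opener = OPENERS.get(Path(path).suffix.lower(), open)
    # newline="" leaves line endings untouched, as the csv module expects.
    return opener(path, "rt", encoding=encoding, newline="")
```

A file object returned this way can be passed straight to `csv.reader`, so the rest of the processing does not need to know whether the input was compressed.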

Ok, cool. Looks like I primarily tried csvstack on zstandard compressed files.

So how about adding zstandard support then (based on the .zst extension)?

You would have to modify this method and then submit a pull request.

The Python standard library doesn't have support for zstandard, but I can add https://pypi.org/project/zstandard/ as an optional dependency.
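A minimal sketch of how such an optional dependency might be wired in, assuming a reasonably recent release of the `zstandard` package that provides `zstandard.open()`. The `open_zst` helper and its error message are hypothetical, not part of csvkit.

```python
try:
    # Optional third-party dependency: https://pypi.org/project/zstandard/
    import zstandard
except ImportError:
    zstandard = None


def open_zst(path, encoding="utf-8"):
    """Open a .zst file as decompressed text (hypothetical helper)."""
    if zstandard is None:
        # Fail with a clear message instead of an ImportError deep in the stack.
        raise RuntimeError(
            "Reading .zst files requires the optional 'zstandard' package"
        )
    # Assumes zstandard.open() accepts text mode like gzip.open() does.
    return zstandard.open(path, "rt", encoding=encoding, newline="")
```

With this shape, users who never touch `.zst` files do not need the extra package installed, and those who do get an actionable error if it is missing.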