Write custom metadata to output files with dataframe.to_parquet?
thehomebrewnerd opened this issue
Is it possible to save custom metadata when writing a dataframe to a parquet file?
For example, with Dask, users can add custom metadata to the output files like this:

```python
custom_metadata = {"custom_metadata": "my custom metadata"}
dataframe.to_parquet(path, custom_metadata=custom_metadata)
```
This code adds the custom metadata to the footer of each saved parquet file, and the metadata can then be read back in with `pyarrow.parquet.read_metadata`.
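For context, a minimal sketch of reading that metadata back with pyarrow (the part-file path is illustrative; Dask writes one part file per partition):

```python
import pyarrow.parquet as pq

# Inspect the footer of one written part file (path is hypothetical)
meta = pq.read_metadata("output/part.0.parquet")

# Custom entries live in the key-value metadata; keys and values are bytes
print(meta.metadata[b"custom_metadata"])  # b'my custom metadata'
```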
Is it possible to do something similar with Koalas? So far, I have not been able to find a way. I also attempted to manually update the metadata in the files after writing them with `ks.DataFrame.to_parquet`, but that causes a checksum mismatch when reading the files back into a dataframe with `koalas.read_parquet`.
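For reference, my manual update looked roughly like the sketch below, which rewrites a part file's schema-level metadata with pyarrow (the file path is hypothetical). My guess is that the checksum mismatch comes from rewriting the files in place without also updating or removing the `.crc` sidecar files that Spark/Hadoop write on local filesystems, but I have not confirmed that.

```python
import pyarrow.parquet as pq

# Hypothetical path to one of the part files Koalas/Spark wrote
file_path = "output/part-00000.snappy.parquet"

table = pq.read_table(file_path)

# Merge the custom entry into any existing schema-level key/value metadata
existing = table.schema.metadata or {}
merged = {**existing, b"custom_metadata": b"my custom metadata"}

# Rewrite the file in place with the updated metadata
pq.write_table(table.replace_schema_metadata(merged), file_path)
```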
Can you file a ticket in the Apache Spark JIRA (https://issues.apache.org/jira/projects/SPARK)? This repository is in maintenance mode.