Bytelength
Loganhex2021 opened this issue
Background [Optional]
We are using the Cobrix library to read EBCDIC files in Databricks. There is a validation requirement to check the byte length of each record in the file.
Question
Is there an option to generate the byte length of each record while reading an EBCDIC file?
@yruslan - Could you please let me know if you have any idea how to calculate the byte length for each record in an EBCDIC file?
Do you need the record size for each record, or the file size for each record?
You can get a file name for each record using either
.option("with_input_file_name_col", "input_file_name")
or
df.withColumn("input_file_name", input_file_name())
depending on the type of file (variable length vs fixed length)
You can then use a filesystem API (the Hadoop client, etc.) to get the file size for each file.
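A minimal sketch of that approach, assuming a Spark session `spark` is available (e.g. on Databricks) and using placeholder copybook and data paths:

```scala
import org.apache.hadoop.fs.Path

// Read the EBCDIC data with Cobrix, capturing the source file name for each record.
// "/path/to/copybook.cpy" and "/path/to/data" are placeholders.
val df = spark.read
  .format("cobol")
  .option("copybook", "/path/to/copybook.cpy")
  .option("with_input_file_name_col", "input_file_name")
  .load("/path/to/data")

// Look up the size of each distinct input file through the Hadoop filesystem API.
val hadoopConf = spark.sparkContext.hadoopConfiguration
val fileSizes = df.select("input_file_name").distinct().collect()
  .map(_.getString(0))
  .map { name =>
    val path = new Path(name)
    val fs = path.getFileSystem(hadoopConf)
    (name, fs.getFileStatus(path).getLen)
  }

// Join the sizes back so every record carries the size of the file it came from.
import spark.implicits._
val sizesDf = fileSizes.toSeq.toDF("input_file_name", "file_size_bytes")
val withFileSize = df.join(sizesDf, Seq("input_file_name"))
```

Note that this gives the size of the whole source file per record, not the byte length of each individual record, which is what the original question asks about.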
Hi, sorry for the late reply. Currently, this is not supported. I've added this to feature requests.
We can make
.option("generate_record_id", "true")
generate record length as well.
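For reference, a minimal sketch of how `generate_record_id` is used today (it adds record-identifier columns such as `File_Id` and `Record_Id`; emitting the record's byte length alongside them is the feature requested here), again with placeholder paths:

```scala
// Existing behavior: generate_record_id adds record-identifier columns to each row.
// The proposal is to also emit the record's byte length.
val df = spark.read
  .format("cobol")
  .option("copybook", "/path/to/copybook.cpy")
  .option("generate_record_id", "true")
  .load("/path/to/data")

df.printSchema()
```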