AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark


Bytelength

Loganhex2021 opened this issue · comments

commented

Background [Optional]

We are using the Cobrix library to read EBCDIC files in Databricks. There is a validation requirement to check the byte length of each record in the file.

Question

Is there an option to generate the byte length of each record while reading an EBCDIC file?

commented

@yruslan - Could you please let me know if you have any idea how to calculate the byte length of each record in an EBCDIC file?

Do you need the record size for each record, or the file size for each record?

You can get a file name for each record using either

.option("with_input_file_name_col", "input_file_name")

or

df.withColumn("input_file_name", input_file_name())

depending on the type of file (variable-length vs. fixed-length).
You can then use a filesystem API (Hadoop client, etc.) to get the size of each file.
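To illustrate the file-size approach, here is a minimal sketch. The paths and copybook name are hypothetical, and the join assumes the path strings returned by input_file_name() and by the Hadoop listing use the same URI scheme:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.functions.input_file_name

// Read the EBCDIC data with Cobrix and tag each record with its source file.
val df = spark.read
  .format("cobol")
  .option("copybook", "/mnt/copybooks/record.cpy")   // hypothetical copybook path
  .load("/mnt/data/")                                // hypothetical data path
  .withColumn("input_file_name", input_file_name())

// Use the Hadoop FileSystem API to look up the size of each input file.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val fileSizes = fs.listStatus(new Path("/mnt/data/"))
  .map(status => (status.getPath.toString, status.getLen))
  .toSeq
val sizesDf = spark.createDataFrame(fileSizes).toDF("input_file_name", "file_size_bytes")

// Attach the file size to every record coming from that file.
val withFileSize = df.join(sizesDf, Seq("input_file_name"), "left")
```

This gives a per-record file size, not a per-record byte length, which is why the follow-up question below matters.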

commented

Thanks @yruslan, I need the record size for each record.

commented

@yruslan, could you please help here?

Hi, sorry for the late reply. Currently, this is not supported. I've added this to feature requests.
We can make

.option("generate_record_id", "true")

generate the record length as well.
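For context, here is a minimal sketch of how that could look from the user's side once implemented. The existing generate_record_id option adds File_Id and Record_Id columns; the Record_Byte_Length column shown here is purely hypothetical and only illustrates the requested feature:

```scala
import org.apache.spark.sql.functions.col

// Hypothetical sketch: assumes a future Cobrix version where
// .option("generate_record_id", "true") also emits a record-length column.
// The column name Record_Byte_Length is an assumption, not current API.
val df = spark.read
  .format("cobol")
  .option("copybook", "/mnt/copybooks/record.cpy")   // hypothetical copybook path
  .option("generate_record_id", "true")
  .load("/mnt/data/ebcdic_file.dat")                 // hypothetical data path

// Validation: flag records whose byte length differs from the expected size.
val expectedLength = 120                             // assumed record size in bytes
val invalidRecords = df.filter(col("Record_Byte_Length") =!= expectedLength)
```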