AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark

Add an option to specify minimum record length

yruslan opened this issue

Background

This comes from an issue with some ASCII files, but it is relevant to EBCDIC as well.

Cobrix ignores all empty lines in ASCII files, but some files contain an EOF character at the end:

aaaa bbbb 1234
cccc dddd 5678
EOF

Since that last line contains a character, it is not empty and is treated as a record, resulting in one additional record with all fields null:

+-----+-----+-----+
|A    |B    |C    |
+-----+-----+-----+
|aaaa |bbbb |1234 |
|cccc |dddd |5678 |
|null |null |null |
+-----+-----+-----+

The expected result is:

+-----+-----+-----+
|A    |B    |C    |
+-----+-----+-----+
|aaaa |bbbb |1234 |
|cccc |dddd |5678 |
+-----+-----+-----+

Feature

Add an option to specify minimum record length.

Proposed Solution

.option("minimum_record_length", 2)
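
To illustrate the intended semantics, here is a minimal Python sketch of the filtering step. It is not Cobrix code: the function name `filter_short_records` is hypothetical, and it assumes the trailing EOF marker is the single character 0x1A (Ctrl-Z), which is why a minimum length of 2 would be enough to drop it.

```python
from typing import Iterable, Iterator


def filter_short_records(lines: Iterable[str],
                         minimum_record_length: int = 1) -> Iterator[str]:
    """Yield only records that are at least minimum_record_length characters long.

    Hypothetical helper: with the default of 1, only empty lines are
    dropped, which mirrors the current behaviour described above.
    """
    for line in lines:
        record = line.rstrip("\r\n")  # strip line terminators before measuring
        if len(record) >= minimum_record_length:
            yield record


# Assumption: the EOF marker is the single character 0x1A on its own line.
raw = "aaaa bbbb 1234\ncccc dddd 5678\n\x1a\n"
lines = raw.split("\n")

# Current behaviour: the EOF line survives and becomes a third record.
default_records = list(filter_short_records(lines))

# Proposed behaviour: records shorter than 2 characters are skipped.
filtered_records = list(filter_short_records(lines, minimum_record_length=2))

print(len(default_records))
print(filtered_records)
```

A length threshold, rather than a check for one specific character, would also cover other short trailer lines, and the same rule could apply to EBCDIC records measured in bytes.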