AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark

Allow specifying ASCII partial record handling strategy

yruslan opened this issue.

Background

Currently, if an ASCII record is longer than the record size defined by the copybook, the remaining bytes become part of the next record.
In most cases this is not the desired behavior: the redundant bytes should be discarded instead.

Feature

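Add an option

.option("allow_partial_records", "true")

with false as the default. With false, bytes beyond the copybook record size are discarded; with true, they are read as additional records, the last of which may be partial.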
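To make the two strategies concrete, here is a minimal pure-Scala model of how a single ASCII line could be cut into fixed-size records. It does not use Spark or Cobrix; `PartialRecordModel` and `splitLine` are hypothetical names, and the model assumes one byte per character. It emits every leftover chunk, including a one-byte tail, so it is an approximation of the idea rather than a replica of every row in the tables below.

```scala
// Hypothetical, simplified model of splitting one ASCII line into
// fixed-size records. This is NOT Cobrix's implementation.
object PartialRecordModel {

  // Split a single text line into records of `recordSize` characters.
  //  - allowPartialRecords = false: keep only the first record; extra bytes are discarded.
  //  - allowPartialRecords = true:  extra bytes form further records; the last may be partial.
  def splitLine(line: String, recordSize: Int, allowPartialRecords: Boolean): Seq[String] = {
    require(recordSize > 0, "recordSize must be positive")
    if (allowPartialRecords) line.grouped(recordSize).toSeq
    else Seq(line.take(recordSize))
  }

  def main(args: Array[String]): Unit = {
    val recordSize = 4 // 1 byte for A + 3 bytes for B in the example copybook
    for (line <- Seq("12", "123456789")) {
      println(s"$line -> false: ${splitLine(line, recordSize, allowPartialRecords = false)}")
      println(s"$line -> true:  ${splitLine(line, recordSize, allowPartialRecords = true)}")
    }
  }
}
```

With `false`, "123456789" yields only "1234"; with `true`, the model also returns "5678" and the one-byte tail "9".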

Example

Given this copybook:

         01  ENTITY.
           05  A    PIC X(1).
           05  B    PIC X(3).

and the data file:

1
12
123
1234
12345
123456
1234567
12345678
123456789
12345678901234567890123456789
5678

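If allow_partial_records = false (the default), each line produces exactly one record and the bytes beyond the 4-byte record size are discarded: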

+---+---+
|A  |B  |
+---+---+
|1  |   |
|1  |2  |
|1  |23 |
|1  |234|
|1  |234|
|1  |234|
|1  |234|
|1  |234|
|1  |234|
|1  |234|
|5  |678|
+---+---+
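The table above can be reproduced by hand: with A at offset 0 (1 byte) and B at offset 1 (3 bytes), the record size is 4, and each line is truncated to 4 bytes before being cut into fields. The sketch below is a hypothetical pure-Scala illustration (`DiscardExtraBytesExample`, `layout`, `decode` and `table` are made-up names), not Cobrix code. It assumes one byte per character and does not add the space padding Spark uses when printing, so an empty B is shown as an empty string.

```scala
// Hypothetical sketch: decode each line of the example data file with the
// copybook (A PIC X(1), B PIC X(3)), discarding bytes beyond the 4-byte record.
object DiscardExtraBytesExample {

  // Field layout derived by hand from the copybook: (name, offset, length).
  val layout: Seq[(String, Int, Int)] = Seq(("A", 0, 1), ("B", 1, 3))

  // Record size = end of the last field = 4.
  val recordSize: Int = layout.map { case (_, offset, length) => offset + length }.max

  val exampleLines: Seq[String] = Seq(
    "1", "12", "123", "1234", "12345", "123456", "1234567", "12345678",
    "123456789", "12345678901234567890123456789", "5678")

  // Cut one record into field values; a short record yields short or empty fields.
  def decode(record: String): Seq[String] =
    layout.map { case (_, offset, length) => record.slice(offset, offset + length) }

  // One record per line: keep the first recordSize bytes and drop the rest.
  def table(lines: Seq[String]): Seq[Seq[String]] =
    lines.map(line => decode(line.take(recordSize)))

  def main(args: Array[String]): Unit =
    table(exampleLines).foreach(row => println(row.mkString("|", "|", "|")))
}
```

Every line of 4 or more bytes starting with "1234" collapses to A = 1, B = 234, which is why seven identical rows appear in the middle of the table.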

If allow_partial_records = true:

+---+---+
|A  |B  |
+---+---+
|1  |   |
|1  |2  |
|1  |23 |
|1  |234|
|1  |234|
|1  |234|
|5  |6  |
|1  |234|
|5  |67 |
|1  |234|
|5  |678|
|1  |234|
|5  |678|
|1  |234|
|5  |678|
|9  |012|
|3  |456|
|7  |890|
|1  |234|
|5  |678|
|5  |678|
+---+---+