AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark

segment_filter not working in version 2.6.2

saikumare-a opened this issue

Background [Optional]

Hi,
We are receiving a multi-segment ASCII file and would like to filter the data for a particular segment based on a column.

As per the documentation, we tried using option("segment_filter").

Even after applying this filter, we observe that no filtering of the data is happening. Can you help look into this?

Hi @yruslan,

I just tested this in 2.6.1 and it works fine, but it does not work in 2.6.2. Can you look into this?

Hi,
What's your full spark.read code snippet?

Hi @yruslan,

Below are the options used:

final_options = {
    'copybook': '<copybook_path>',
    'generate_record_id': 'false',
    'drop_value_fillers': 'false',
    'drop_group_fillers': 'false',
    'pedantic': 'true',
    'encoding': 'ascii',
    'variable_size_occurs': 'true',
    'record_format': 'D',
    'segment_field': 'BASE_RCRD_ID',
    'segment_filter': 'ABC'
}

df = spark.read.format("cobol").options(**final_options).load()

Yeah, I can see why it is happening. For now, you can work around it by filtering your data frame with .filter(col("BASE_RCRD_ID") == "ABC").
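A minimal sketch of that workaround, assuming the final_options from your snippet and a placeholder '<data_path>' for the load path:

from pyspark.sql.functions import col

# Read as before; in 2.6.2 the 'segment_filter' option has no effect,
# so the DataFrame contains records of all segments.
df = spark.read.format("cobol").options(**final_options).load('<data_path>')

# Workaround: filter the resulting DataFrame on the segment field instead.
df_abc = df.filter(col("BASE_RCRD_ID") == "ABC")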

Hi @yruslan ,

Is this issue fixed, or is there a timeline for when it could be fixed?

Hi @yruslan,

Any luck looking into this? Thanks in advance!

Not yet. Please use the workaround for now.

This should be fixed in 2.6.3, released yesterday.
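Once on 2.6.3, the segment_filter option should take effect at read time again. A minimal sketch to verify, reusing the placeholders from the snippet above:

from pyspark.sql.functions import col

# With the fix, the read itself should return only 'ABC' records.
df = spark.read.format("cobol").options(**final_options).load('<data_path>')

# Sanity check: no rows with a different segment id should remain.
assert df.filter(col('BASE_RCRD_ID') != 'ABC').count() == 0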