AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark

segment_filter not working in version 2.6.2

saikumare-a opened this issue

Background [Optional]

Hi,
We are receiving a multi-segment ASCII file and would like to filter the data for a particular segment based on a column.

As per the documentation, we tried using option("segment_filter").

Even after applying this filter, we observe that no filtering of the data is happening. Can you help look into this?

Hi @yruslan,

I just tested this in 2.6.1 and it works fine, but it does not work in 2.6.2. Can you look into this?

Hi,
What's your full spark.read code snippet?

Hi @yruslan,

Below are the options used:

final_options = {
    'copybook': '<copybook_path>',
    'generate_record_id': 'false',
    'drop_value_fillers': 'false',
    'drop_group_fillers': 'false',
    'pedantic': 'true',
    'encoding': 'ascii',
    'variable_size_occurs': 'true',
    'record_format': 'D',
    'segment_field': 'BASE_RCRD_ID',
    'segment_filter': 'ABC'
}

df = spark.read.format("cobol").options(**final_options).load()

Yeah, I can see why it is happening. For now, you can work around it by filtering your data frame with .filter(col("BASE_RCRD_ID") == "ABC").
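A minimal sketch of that workaround, assuming the final_options from your snippet and a placeholder '<data_path>' for the load path:

from pyspark.sql.functions import col

# Read as before; in 2.6.2 the 'segment_filter' option has no effect,
# so the DataFrame contains records of all segments.
df = spark.read.format("cobol").options(**final_options).load('<data_path>')

# Workaround: filter the resulting DataFrame on the segment field instead.
df_abc = df.filter(col("BASE_RCRD_ID") == "ABC")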

Hi @yruslan ,

Is this issue fixed, or is there a timeline for when it could be fixed?

Hi @yruslan,

Any luck looking into this? Thanks in advance!

Not yet. Please use the workaround for now.

This should be fixed in 2.6.3, released yesterday.
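Once on 2.6.3, the segment_filter option should take effect at read time again. A minimal sketch to verify, reusing the placeholders from the snippet above:

from pyspark.sql.functions import col

# With the fix, the read itself should return only 'ABC' records.
df = spark.read.format("cobol").options(**final_options).load('<data_path>')

# Sanity check: no rows with a different segment id should remain.
assert df.filter(col('BASE_RCRD_ID') != 'ABC').count() == 0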