AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark

Handling FILLER Names

saikumare-a opened this issue · comments

Background

Cobrix adds numeric suffixes to FILLER column names when a copybook contains multiple FILLER fields.

INPUT:
01 CUST-NAME PIC X.
01 FILLER PIC X.
01 CUST-ADDR PIC X(100).
01 CUST-NO PIC X.
01 FILLER PIC X.

OUTPUT - column names
CUST_NAME
FILLER
CUST_ADDR
CUST_NO
FILLER2

Problem

If a new FILLER is added between these two fillers, the column previously named "FILLER2" becomes "FILLER3", which impacts downstream processing.

INPUT:
01 CUST-NAME PIC X.
01 FILLER PIC X.
01 CUST-ADDR PIC X(100).
01 FILLER PIC X.
01 CUST-NO PIC X.
01 FILLER PIC X.

OUTPUT - column names
CUST_NAME
FILLER
CUST_ADDR
FILLER2
CUST_NO
FILLER3
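The renumbering can be reproduced with a small sketch. This is not Cobrix's actual implementation; `name_fillers_by_sequence` is a hypothetical helper that only mimics the naming shown in the two outputs above (hyphens become underscores, the first filler keeps the bare name, later ones get a running number).

```python
def name_fillers_by_sequence(field_names):
    """Number FILLER fields in order of appearance: FILLER, FILLER2, FILLER3, ..."""
    result = []
    filler_count = 0
    for name in field_names:
        column = name.replace("-", "_")
        if column.upper() == "FILLER":
            filler_count += 1
            # The first filler keeps the bare name, as in the outputs above.
            column = "FILLER" if filler_count == 1 else f"FILLER{filler_count}"
        result.append(column)
    return result


before = name_fillers_by_sequence(
    ["CUST-NAME", "FILLER", "CUST-ADDR", "CUST-NO", "FILLER"]
)
after = name_fillers_by_sequence(
    ["CUST-NAME", "FILLER", "CUST-ADDR", "FILLER", "CUST-NO", "FILLER"]
)

# The last field is the same filler in both copybooks, yet its column name differs.
print(before[-1], "->", after[-1])
```

Because the suffix depends only on how many fillers come earlier in the copybook, inserting one anywhere before an existing filler renames every filler after it.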

Feature

An option to give fillers suffixes that do not change when new fillers are added would be very helpful.

Optional parameter: option("filler_suffixing", "previous_column_name"), with "seq_number" as the default.

Proposed Solution [Optional]

Solution Ideas

  1. One approach is to use the previous column name as the suffix, e.g. "FILLER_AFTER_{prev_column_name}". This still breaks if a new column is added between an existing column and its filler, but that should be rarer than the renaming caused by number suffixes.

Example [Optional]

OUTPUT - column names
CUST_NAME
FILLER_AFTER_CUST_NAME
CUST_ADDR
FILLER_AFTER_CUST_ADDR
CUST_NO
FILLER_AFTER_CUST_NO
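The proposed naming can be sketched as follows. This is only an illustration of the idea, not how Cobrix implements it; `name_fillers_by_previous_field` is a hypothetical helper, and the handling of a filler with no preceding named field is an assumption.

```python
def name_fillers_by_previous_field(field_names):
    """Name each FILLER after the closest preceding non-filler field."""
    result = []
    previous = None
    for name in field_names:
        column = name.replace("-", "_")
        if column.upper() == "FILLER":
            # Assumption: a filler with no preceding named field keeps the bare name.
            column = f"FILLER_AFTER_{previous}" if previous else "FILLER"
        else:
            previous = column
        result.append(column)
    return result


two_fillers = name_fillers_by_previous_field(
    ["CUST-NAME", "FILLER", "CUST-ADDR", "CUST-NO", "FILLER"]
)
three_fillers = name_fillers_by_previous_field(
    ["CUST-NAME", "FILLER", "CUST-ADDR", "FILLER", "CUST-NO", "FILLER"]
)

# The filler after CUST-NO keeps its name when another filler is added earlier.
print(two_fillers[-1], three_fillers[-1])
```

In this sketch two consecutive fillers would receive the same name, so a real implementation would need a tie-breaker for that case.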

Thank you for the feature request. I can see the issue it would solve. Could you help me understand your use case better?

FILLERs are unnamed fields that are usually added as padding and should not contain useful information. Cobrix initially dropped all FILLERs, but we got many user requests to retain them.

If FILLERs in your case do contain useful information, and the way you name them is important, why wouldn't you just name these fields in the copybook in the first place? If you replace your fillers with any other name, it will work exactly as you expect.

  1. We receive other important columns in FILLER groups (this scales to many files and many sub-columns within FILLERs).
  2. New columns are added to the files every quarter.

The data comes from a third-party system (they provide the copybook), so:

  1. We have no control over the copybook and cannot ask them to change it.
  2. We could handle this internally by editing the copybook, but because columns are added so frequently, this approach is becoming a problem for us.

So we thought that handling this in Cobrix would be helpful and could benefit other Cobrix users.

Ok, I see how this can make your life easier. Adding to the backlog...

Hi @yruslan ,

Any thoughts on a timeline for this enhancement? Thank you for the support!

Sorry, I can't provide timelines. A ballpark estimate might be early next year.

Thank you for the update.

Hi @yruslan,

It seems this enhancement is done. Could you attach the latest snapshot jar so that I can test it and confirm?

Thank you for the support!

It would really help if you could build it yourself. You just need JDK 1.8 and sbt (any version). Then use one of the commands from here: https://github.com/AbsaOSS/cobrix#creating-an-uber-jar