AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark


Copybook.parseSimple not dropping the fillers in output ast

codealways opened this issue

String copybookContents =
    "01 RECORD.\n" +
    "05 FILLER PIC X(1).\n" +
    "05 COMPANY_PREFIX PIC X(3).\n" +
    "05 FILLER PIC X(1).\n" +
    "05 FILLER PIC X(1).\n" +
    "05 COMPANY_NAME PIC X(9).";

Group grp = CopybookParser.parseSimple(copybookContents, true, true, CommentPolicy.apply(false, 6, 72)).ast();

In the above scenario, since dropValueFillers is true, the output AST should not contain the FILLERs. But the output AST still provides the details of every column, including the fillers.

Thanks for the bug report. Dropping of fillers is a feature of 'spark-cobol'. Fillers should be dropped from the Spark schema while still being present in the AST.
But your expectation makes sense. We can drop fillers from the AST as well.

Hi, I was revisiting this issue.
Each AST element has an isFiller flag, so you can always ignore all fields for which the flag is true.
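
For reference, a minimal sketch of that approach (assuming the Group/Statement AST classes from Cobrix's parser module and the isFiller flag mentioned above; nonFillerNames is a hypothetical helper, not part of Cobrix):

import za.co.absa.cobrix.cobol.parser.ast.Group;
import za.co.absa.cobrix.cobol.parser.ast.Statement;

import java.util.ArrayList;
import java.util.List;

// Walks the AST recursively and collects the names of all fields
// whose isFiller flag is not set.
static List<String> nonFillerNames(Statement node) {
    List<String> names = new ArrayList<>();
    if (!node.isFiller()) {
        names.add(node.name());
    }
    if (node instanceof Group) {
        scala.collection.Iterator<Statement> it = ((Group) node).children().iterator();
        while (it.hasNext()) {
            names.addAll(nonFillerNames(it.next()));
        }
    }
    return names;
}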

Is it necessary for your use case to actually remove fillers from the AST if you have this flag?

That's how we are handling it in our code while parsing the AST. It would still be a good-to-have feature to drive this via parseSimple.

Could you please elaborate on what you are trying to achieve?

Every use case that I can think of can be done by just ignoring fields (AST statements) for which isFiller == true.

We are currently parsing the copybook via the AST and converting it to JSON; we use parseSimple to get the AST. If, based on user options, we could drop or retain the fillers in the AST, then we would not need to handle them ourselves while converting to JSON.
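
For illustration, a minimal sketch of such an AST-to-JSON pass (toJson is a hypothetical helper, not part of Cobrix; leaf values are emitted as null because only the structure matters here):

import za.co.absa.cobrix.cobol.parser.ast.Group;
import za.co.absa.cobrix.cobol.parser.ast.Statement;

// Renders the copybook structure as a JSON object, dropping FILLERs
// by checking the isFiller flag on every child node.
static String toJson(Statement node) {
    if (!(node instanceof Group)) {
        return "null";  // placeholder for primitive field values
    }
    StringBuilder sb = new StringBuilder("{");
    scala.collection.Iterator<Statement> it = ((Group) node).children().iterator();
    boolean first = true;
    while (it.hasNext()) {
        Statement child = it.next();
        if (child.isFiller()) continue;  // skip FILLER fields
        if (!first) sb.append(',');
        sb.append('"').append(child.name()).append("\":").append(toJson(child));
        first = false;
    }
    return sb.append('}').toString();
}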

Sure, but node.isFiller can be used to achieve the same, can't it?

What's the code snippet that you are using?

I see why you are asking for the feature. There are parameters named 'dropGroupFillers' and 'dropValueFillers', but nothing is actually dropped, only marked. It makes sense. We can implement the dropping as well.

You are correct. Currently we are parsing and dropping the fillers ourselves by utilizing isFiller; the parameter we pass to parseSimple for dropping fillers is not functional.

This is implemented.

Please check the latest master. Note that the signature of the method (parseSimple) has changed. It now reflects what is actually being done.

Spark schemas do not support two columns having the same name, so FILLERs need to be renamed in order to be retained. The method therefore allows controlling whether fillers are renamed, and a separate flag controls whether non-renamed fillers are dropped.

@yruslan If the signature is changed, it will break our existing code when we update the version; we need it to be backward compatible. Can we have it overloaded?

Okay, fair point. Will make it compatible.

@yruslan thanks a lot. You can mark the existing parseSimple signature as @deprecated.

The changes are in master - you can check. To retain compatibility, the behavior is the same by default.
Use 'dropFillersFromAst = true' to actually drop fillers from the AST.
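
A sketch of the updated call from Java, based on this thread: the parameter names dropGroupFillers, dropValueFillers and dropFillersFromAst are taken from the comments above, but their exact positions in the released signature are an assumption, so check the 2.4.8 API. Note that from Java every argument must be passed explicitly, since Scala default values are not applied automatically:

import za.co.absa.cobrix.cobol.parser.Copybook;
import za.co.absa.cobrix.cobol.parser.CopybookParser;
import za.co.absa.cobrix.cobol.parser.ast.Group;
import za.co.absa.cobrix.cobol.parser.policies.CommentPolicy;

// Parameter order is assumed here; names come from this thread.
Copybook parsed = CopybookParser.parseSimple(
        copybookContents,
        true,                               // dropGroupFillers
        true,                               // dropValueFillers
        CommentPolicy.apply(false, 6, 72),  // comment policy, as in the snippet above
        true);                              // dropFillersFromAst: actually remove FILLERs from the AST
Group grp = parsed.ast();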

So parseSimple now has the new parameter dropFillersFromAst in its signature, which means once we upgrade we have to provide a value for it in our calling code. In Scala it won't be required since it defaults to false, but in Java we have to explicitly pass false when calling.

Yeah, but this is the only way to preserve backward-compatible behavior. Since the method has default values, it cannot be overloaded.

2.4.8 is released and it has the fix.

Btw, Cobrix has a 'converters' project that currently just provides examples of how to convert mainframe files to JSON without Spark. Consider contributing your converter to the project :)

Sure, I would be happy to contribute.
Can I know the overall requirements and the detailed plan for it? Is there a high-level design document, or is it still in progress? I can contribute there as well. Are there plans to have the code base in Java, or will it be completely in Scala? :)

We don't have an exact plan at the moment, just ideas. The one we are most likely to implement is a command-line tool + library that allows converting mainframe files to JSON.

Yes, the project will continue to be in Scala.

Sure, then I can think of two requirements:

  1. Converting the copybook AST to the corresponding JSON
  2. Converting the copybook to a map whose keys are the parent hierarchy and whose values are the column names (see the sketch below)
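
A minimal sketch of requirement 2 (collectColumns is a hypothetical helper, not part of Cobrix), assuming the same AST classes as above; a list is used per key because one parent group usually holds several columns:

import za.co.absa.cobrix.cobol.parser.ast.Group;
import za.co.absa.cobrix.cobol.parser.ast.Statement;

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Maps each dot-separated parent hierarchy to the column names under it,
// e.g. "RECORD" -> [FILLER, COMPANY_PREFIX, FILLER, FILLER, COMPANY_NAME].
static void collectColumns(Statement node, String path, Map<String, List<String>> out) {
    if (node instanceof Group) {
        String childPath = path.isEmpty() ? node.name() : path + "." + node.name();
        scala.collection.Iterator<Statement> it = ((Group) node).children().iterator();
        while (it.hasNext()) {
            collectColumns(it.next(), childPath, out);
        }
    } else {
        out.computeIfAbsent(path, k -> new ArrayList<>()).add(node.name());
    }
}

// Usage: collectColumns(grp, "", new LinkedHashMap<>());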