AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark

Does Cobrix handle the Easytrieve layout?

AnveshAeturi opened this issue · comments

Background [Optional]

I have an Easytrieve layout that contains packed unsigned fields (data type U in Easytrieve), binary unsigned fields (data type B in Easytrieve), and alphanumeric fields (data type A in Easytrieve, storing hex bits). The data file we are trying to convert is EBCDIC data.

Question

Is there a way we can convert this data through Cobrix by providing the above-mentioned Easytrieve layout? @yruslan

Hi, could you attach an example copybook and a link to the documentation for the data type, please?

The COBOL copybook says X(2), but the data itself comes from Easytrieve with a data type of U (Packed Unsigned).
Example: VARIABLE PIC X(2). The data stored is actually an unsigned packed field (the U definition in Easytrieve).

Data Type Link: https://www.mvsforums.com/manuals/EZT_PL_APP_63_MASTER.pdf
Page 35 ('Library 2-11' is the footer on that page)

Easyterieve_Layout_sample.xlsx

Hi @yruslan, this is the Excel file we created from the Easytrieve layout; only sample fields are included here.

I see. The data types look parsable at first glance. The only thing is that you need a proper copybook that matches the data in order to parse records like that, and for that you would need a mapping between Easytrieve data types and COBOL data types.
For instance, an Easytrieve type U with length 4 can have a PIC 9(4) (or PIC 9(9) if the encoding is binary).

Do I understand correctly that the fields specified in the Excel file are not all the fields of the record? Field 'CRSCON' with length 1 at offset 10 is followed by CRADTR at offset 20. That means there are other fields between CRSCON and CRADTR that fill the remaining 9 bytes.

Hello. I am adding a comment because I also need to request this same support for Unsigned Packed fields in the mainframe records.
Here is what is meant by "Unsigned Packed" :
An Easytrieve U (Unsigned Packed) field is the same as a normal Packed field, but without the sign-nibble on the end.

For example, let's say we have an account date value of '20220425'.
As a Packed number, that field would be defined in COBOL like this:
ACCT-DATE PIC 9(8) COMP-3.
. . .and in memory, that field would contain this:
X'020220425F'

As a U (Unsigned Packed) number, that field would be defined in COBOL like this:
ACCT-DATE PIC X(4).
. . .and in memory, that field would contain this:
X'20220425'

Unsigned Packed (U) fields must be defined in COBOL as PIC X fields because COBOL does not support the Unsigned Packed format.
It is invalid data to COBOL.
Therefore, when COBOL programmers encounter Unsigned Packed fields in their data, they have to write special code to convert it to a normal Packed value by inserting the sign nibble at the end, then processing it as a Packed field.
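That conversion can be sketched in a few lines of Python (illustrative only; this is not Cobrix code, and the helper names are made up):

```python
# Hypothetical sketch: decode an Easytrieve "Unsigned Packed" (U) field,
# and convert it to a normal COMP-3 value by appending the sign nibble.

def decode_unsigned_packed(data: bytes) -> int:
    """Each byte holds two BCD digits; there is no sign nibble."""
    digits = []
    for b in data:
        digits.append((b >> 4) & 0x0F)  # high nibble
        digits.append(b & 0x0F)         # low nibble
    return int("".join(str(d) for d in digits))

def to_signed_packed(data: bytes) -> bytes:
    """Shift the digits and append the positive sign nibble X'F',
    producing a valid COMP-3 (packed) field."""
    value = decode_unsigned_packed(data)
    hex_str = f"{value:0{len(data) * 2}d}" + "F"
    if len(hex_str) % 2:  # pad to a whole number of bytes
        hex_str = "0" + hex_str
    return bytes.fromhex(hex_str)

acct_date = bytes.fromhex("20220425")  # X'20220425', the example above
assert decode_unsigned_packed(acct_date) == 20220425
assert to_signed_packed(acct_date) == bytes.fromhex("020220425F")
```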

The Unsigned Packed field cannot be declared as a COBOL BINARY (COMP) field because it does not contain a binary value. It contains a Packed value without the sign nibble.
If you took our example data above and defined it as Binary in COBOL . . . 'PIC 9(8) COMP', the X'20220425' value is now treated as a Binary value, which is 539,100,197.
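That mis-read can be reproduced in two lines of Python (illustrative only):

```python
# Treating the 4 unsigned-packed bytes X'20220425' as a big-endian
# binary (COMP) integer yields a meaningless number:
raw = bytes.fromhex("20220425")
assert int.from_bytes(raw, byteorder="big") == 539100197
```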

Adding support for Unsigned Packed fields would be pretty simple in Cobrix. You could add an "Unsigned Packed" flag to the 'decodeBCDIntegralNumber' function that handles Packed (COMP-3) values, and just leave out the sign nibble if it's the Unsigned Packed format.
You could add a Cobrix-specific parm, like COMP-UP (similar to what you did for COMP-9).
Then, users could code this in their COBOL copybook for the Unsigned Packed field:
ACCT-DATE PIC X(4) COMP-UP.

Please let me know if you'd like to chat more about this. Thank you very much.

Hi @mike-childs,

Makes sense. I might ask a couple more questions as we go.

The first one,

When you have ACCT-DATE PIC X(4). in unsigned packed format, does this mean that the maximum number of digits of the packed number is 4, or that the field occupies 4 bytes and so can hold 8 digits?

I see the answer to the question in your description. Sorry.
I think adding a special USAGE like COMP-UP would indeed be the best way to do it.
Or maybe COMP-3U (since it is like COMP-3, just without the sign nibble).

Hi @yruslan,
Yes, COMP-3U would also be excellent. And yes, the 'X(4)' length refers to 4-bytes in memory (8 digits). And please do feel free to ask questions. I have experience with this topic.
Thank you very much for accepting this request. It will be extremely helpful for us, (and others).

Great, thanks for the answer and for such a detailed description!

Will implement it soon.

One more question. Would it be okay if the PIC for packed numerics were required to be

PIC 9(4) COMP-3U.

not

PIC X(4) COMP-3U.

?
This is because the parser relies heavily on the use of '9' in PIC for numeric data types.

Yes, requiring the '9' (as in 'PIC 9(4) COMP-3U') makes perfect sense because the field should contain only numeric data. The field would have all the same rules as a normal Packed field, other than the lack of a sign nibble.
Thank you.

This is added. You can try building spark-cobol from master. Let me know if it works as expected.

Thank you very much @yruslan! We have a story in our backlog to pull in the latest Cobrix version and do thorough testing with the new COMP-3U type parm. I will add an update here once we have done that work. We really appreciate you adding this functionality.

Hello @yruslan. We have finished our testing with the new COMP-3U parm, and it correctly converted the Unsigned Packed fields. I have attached a screenshot showing my input, output, and test results. Please let me know if you need any further information. Thank you very much.

CobrixTestResults

Hi @mike-childs , Thanks a lot for confirming! Glad it works as expected.

@yruslan I am getting the below error when updating the copybook to COMP-3U.

za.co.absa.cobrix.cobol.parser.exceptions.SyntaxErrorException: Syntax error in the copybook at line 28: Invalid input 'COMP-3U' at position 28:64

Use spark-cobol 2.6.4.
If you are already using the latest Cobrix, let me know what your copybook statement looks like for that field.

@yruslan I have upgraded to spark-cobol 2.6.4 and am getting this error:

java.lang.NoClassDefFoundError: scala/$less$colon$less

here is the command:

class_poc_df = spark.read.format("cobol") \
    .option("copybook", class_copybook) \
    .option("record_format", "D") \
    .option("schema_retention_policy", "collapse_root") \
    .option("drop_value_fillers", "false") \
    .load(class_data)

The error suggests that you are using a spark-cobol build for a Scala version different from the one in your Spark environment.

Use the artifact that matches your Scala version:

  • spark-cobol_2.11
  • spark-cobol_2.12
  • spark-cobol_2.13

or build one that matches your environment exactly using 'sbt assembly' (the full command is in the README).
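As an aside, the Scala binary version is just the suffix after the underscore in the artifact name, so checking compatibility amounts to comparing that suffix with the Scala version your Spark build reports (e.g. from 'spark-shell --version'). A tiny sketch, with a made-up helper name:

```python
# Illustrative helper (not part of Cobrix): extract the Scala binary
# version suffix from a Spark artifact name.
def scala_binary_version(artifact: str) -> str:
    return artifact.rsplit("_", 1)[-1]

assert scala_binary_version("spark-cobol_2.12") == "2.12"
# A Spark distribution built with Scala 2.12 needs the _2.12 artifact;
# one built with Scala 2.13 needs _2.13, and so on.
```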

Hi @diddyp20, I have faced a similar error ("copybook at line 28: Invalid input 'COMP-3U' at position 28:64") in the past with my copybooks. Check the field alignment in the copybook; it should be aligned properly with respect to the data inside the file. Hopefully that solves the issue.