AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark

Does Cobrix handle the Easytrieve layout?

AnveshAeturi opened this issue · comments

Background [Optional]

I have an Easytrieve layout that contains packed unsigned fields (data type U in Easytrieve), binary unsigned fields (data type B in Easytrieve), and alphanumeric fields (data type A in Easytrieve, storing hex bits). The data file we are trying to convert is EBCDIC data.

Question

Is there a way we can convert this data through Cobrix by providing the above-mentioned Easytrieve layout? @yruslan

Hi, could you attach an example copybook and a link to the documentation for the data type, please?

The COBOL copybook says X(2), but the data itself comes from Easytrieve with a data type of U (Packed Unsigned).
Example: VARIABLE PIC X(2). The data stored is actually an unsigned packed field (the U definition in Easytrieve).

Data Type Link: https://www.mvsforums.com/manuals/EZT_PL_APP_63_MASTER.pdf
Page 35 ('Library 2-11' is the footer on that page)

Easyterieve_Layout_sample.xlsx

Hi @yruslan, this is the Excel file we created from the Easytrieve layout; only sample fields are included here.

I see. The data types look parsable at first glance. The only thing is that you need a proper copybook that matches the data in order to parse records like that, and for that you would need a mapping between Easytrieve data types and COBOL data types.
For instance, an Easytrieve type U with length 4 can have a PIC 9(4) (or PIC 9(9) if the encoding is binary).

Do I understand correctly that the fields specified in the Excel file are not all the fields of the record? Field 'CRSCON' with length 1 at offset 10 is followed by CRADTR at offset 20. That means there are other fields between CRSCON and CRADTR that fill the remaining 9 bytes.

Hello. I am adding a comment because I also need to request this same support for Unsigned Packed fields in the mainframe records.
Here is what is meant by "Unsigned Packed" :
An Easytrieve U (Unsigned Packed) field is the same as a normal Packed field, but without the sign-nibble on the end.

For example, let's say we have an account date value of '20220425'.
As a Packed number, that field would be defined in COBOL like this:
ACCT-DATE PIC 9(8) COMP-3.
. . .and in memory, that field would contain this:
X'020220425F'

As a U (Unsigned Packed) number, that field would be defined in COBOL like this:
ACCT-DATE PIC X(4).
. . .and in memory, that field would contain this:
X'20220425'

Unsigned Packed (U) fields must be defined in COBOL as PIC X fields because COBOL does not support the Unsigned Packed format.
It is invalid data to COBOL.
Therefore, when COBOL programmers encounter Unsigned Packed fields in their data, they have to write special code to convert it to a normal Packed value by inserting the sign nibble at the end, then processing it as a Packed field.
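That conversion can be sketched in a few lines of Python (illustrative only; this is not Cobrix code, and the helper names are made up):

```python
# Hypothetical sketch: decode an Easytrieve "Unsigned Packed" (U) field,
# and convert it to a normal COMP-3 value by appending the sign nibble.

def decode_unsigned_packed(data: bytes) -> int:
    """Each byte holds two BCD digits; there is no sign nibble."""
    digits = []
    for b in data:
        digits.append((b >> 4) & 0x0F)  # high nibble
        digits.append(b & 0x0F)         # low nibble
    return int("".join(str(d) for d in digits))

def to_signed_packed(data: bytes) -> bytes:
    """Shift the digits and append the positive sign nibble X'F',
    producing a valid COMP-3 (packed) field."""
    value = decode_unsigned_packed(data)
    hex_str = f"{value:0{len(data) * 2}d}" + "F"
    if len(hex_str) % 2:  # pad to a whole number of bytes
        hex_str = "0" + hex_str
    return bytes.fromhex(hex_str)

acct_date = bytes.fromhex("20220425")  # X'20220425', the example above
assert decode_unsigned_packed(acct_date) == 20220425
assert to_signed_packed(acct_date) == bytes.fromhex("020220425F")
```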

The Unsigned Packed field cannot be declared as a COBOL BINARY (COMP) field because it does not contain a binary value. It contains a Packed value without the sign nibble.
If you took our example data above and defined it as Binary in COBOL . . . 'PIC 9(8) COMP', the X'20220425' value is now treated as a Binary value, which is 539,100,197.
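That mis-read can be reproduced in two lines of Python (illustrative only):

```python
# Treating the 4 unsigned-packed bytes X'20220425' as a big-endian
# binary (COMP) integer yields a meaningless number:
raw = bytes.fromhex("20220425")
assert int.from_bytes(raw, byteorder="big") == 539100197
```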

Adding support for Unsigned Packed fields would be pretty simple in Cobrix. You could add an "Unsigned Packed" flag to the 'decodeBCDIntegralNumber' function that handles Packed (COMP-3) values, and just leave out the sign nibble if it's the Unsigned Packed format.
You could add a Cobrix-specific parm, like COMP-UP (similar to what you did for COMP-9).
Then, users could code this in their COBOL copybook for the Unsigned Packed field:
ACCT-DATE PIC X(4) COMP-UP.

Please let me know if you'd like to chat more about this. Thank you very much.

Hi @mike-childs,

Makes sense. I might ask a couple more questions as we go.

The first one,

When you have ACCT-DATE PIC X(4). in unsigned packed format, does this mean that the maximum number of digits of the packed number is 4, or that the field occupies 4 bytes and so can hold 8 digits?

I see the answer to the question in your description. Sorry.
I think adding a special USAGE like COMP-UP would indeed be the best way to do it.
Or maybe COMP-3U (since it is like COMP-3, just without the sign nibble).

Hi @yruslan,
Yes, COMP-3U would also be excellent. And yes, the 'X(4)' length refers to 4-bytes in memory (8 digits). And please do feel free to ask questions. I have experience with this topic.
Thank you very much for accepting this request. It will be extremely helpful for us, (and others).

Great, thanks for the answer and for such a detailed description!

Will implement it soon.

One more question. Would it be okay if the PIC for packed numerics were required to be

PIC 9(4) COMP-3U.

not

PIC X(4) COMP-3U.

?
This is because the parser relies heavily on the use of '9' in PIC for numeric data types.

Yes, requiring the '9' (as in 'PIC 9(4) COMP-3U') makes perfect sense because the field should contain only numeric data. The field would have all the same rules as a normal Packed field, other than the lack of a sign nibble.
Thank you.

This is added. You can try building spark-cobol from master. Let me know if it works as expected.

Thank you very much @yruslan! We have a story in our backlog to pull in the latest Cobrix version and do thorough testing with the new COMP-3U type parm. I will add an update here once we have done that work. We really appreciate you adding this functionality.

Hello @yruslan. We have finished our testing with the new COMP-3U parm, and it correctly converted the Unsigned Packed fields. I have attached a screenshot showing my input, output, and test results. Please let me know if you need any further information. Thank you very much.

CobrixTestResults

Hi @mike-childs , Thanks a lot for confirming! Glad it works as expected.

@yruslan I am getting the below error when updating the copybook to COMP-3U.

za.co.absa.cobrix.cobol.parser.exceptions.SyntaxErrorException: Syntax error in the copybook at line 28: Invalid input 'COMP-3U' at position 28:64

Use spark-cobol 2.6.4.
If you are already using the latest Cobrix, let me know what your copybook statement looks like for that field.

@yruslan I have upgraded to spark-cobol 2.6.4 and am getting this error:

java.lang.NoClassDefFoundError: scala/$less$colon$less

here is the command:

class_poc_df = spark.read.format("cobol") \
    .option("copybook", class_copybook) \
    .option("record_format", "D") \
    .option("schema_retention_policy", "collapse_root") \
    .option("drop_value_fillers", "false") \
    .load(class_data)

The error suggests that you are using a spark-cobol build for a Scala version different from the one in your Spark environment.

Use the artifact that matches your Scala version:

  • spark-cobol_2.11
  • spark-cobol_2.12
  • spark-cobol_2.13

or build one that matches your environment exactly using 'sbt assembly' (the full command is in the README).
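As an aside, the Scala binary version is just the suffix after the underscore in the artifact name, so checking compatibility amounts to comparing that suffix with the Scala version your Spark build reports (e.g. from 'spark-shell --version'). A tiny sketch, with a made-up helper name:

```python
# Illustrative helper (not part of Cobrix): extract the Scala binary
# version suffix from a Spark artifact name.
def scala_binary_version(artifact: str) -> str:
    return artifact.rsplit("_", 1)[-1]

assert scala_binary_version("spark-cobol_2.12") == "2.12"
# A Spark distribution built with Scala 2.12 needs the _2.12 artifact;
# one built with Scala 2.13 needs _2.13, and so on.
```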

Hi @diddyp20, I have faced a similar error ("copybook at line 28: Invalid input 'COMP-3U' at position 28:64") in the past with my copybooks. Check the field alignment in the copybook; it should be aligned properly with respect to the data inside the file. Hopefully that solves the issue.