AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark

Debug functionality of EBCDIC data

bprasen opened this issue · comments

Hi,
This is not a bug report but a feature request, mainly for debugging. When Cobrix creates a dataframe, it decodes the EBCDIC data according to the data type of each primitive field. Would it be possible to also show the hex value of a column when an option such as add_hex = true is set? This functionality would be for debugging only, to check the data.
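In terms of the DataFrame reader API, the requested behaviour would look roughly like the sketch below. The `add_hex` option name is only the name proposed in this issue, not an existing Cobrix option, and the copybook/data paths are placeholders.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cobrix-hex-debug").getOrCreate()

// Standard Cobrix read, plus the debug switch proposed in this issue.
// "add_hex" is hypothetical -- it is the name suggested above, not a real option.
val df = spark.read
  .format("cobol")
  .option("copybook", "/path/to/copybook.cpy")
  .option("add_hex", "true")
  .load("/path/to/ebcdic/data")

df.show()   // ideally shows decoded values alongside the raw hex of each field
```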

Interesting idea. This could be helpful to diagnose decoding issues.
We cannot make Spark itself show the original bytes in hex, but we can add additional fields to the output dataframe. For instance, if a schema has ID, FIRST-NAME and LAST-NAME and the debug option is turned on, the schema will contain additional ID_DEBUG, FIRST-NAME_DEBUG and LAST-NAME_DEBUG fields containing the HEX values of the original data before decoding.
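To illustrate what "HEX values of the original data" would mean, the raw bytes of a field can be rendered as a hex string before any EBCDIC decoding, roughly like this (a sketch only, not the actual implementation):

```scala
// Sketch: render a field's raw bytes as a hex string for a *_DEBUG column.
def toHexString(bytes: Array[Byte]): String =
  bytes.map(b => "%02X".format(b & 0xFF)).mkString

// Example: "ABC" in EBCDIC (code page 037) is the bytes 0xC1 0xC2 0xC3.
val raw = Array(0xC1, 0xC2, 0xC3).map(_.toByte)
println(toHexString(raw))   // prints C1C2C3
```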

Please clarify a couple of things about your use case:

  • Do you want to debug a particular column or all columns in the schema?
  • The HEX values should correspond to the original data before conversion to ASCII/Unicode, right?

I was thinking about all the columns, and yes, the HEX values should be the original data before conversion. To be honest, I have also been trying to modify the source code to add that functionality, currently for the FixedLengthNested option only. I can share the code with you if you want, though it may need some standardisation. Thanks for your interest; please let me know your email so that I can send the code for your review.

Great, thanks for the answers! I think this is a helpful feature and we are going to implement it.
You can send your code as a pull request, but it is not necessary. The feature seems pretty straightforward.

🎉 @bprasen, this very helpful feature is now implemented and is part of 2.0.5, released today.
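For reference, enabling the released feature should look roughly like the sketch below. The exact option name and the naming of the generated debug columns are assumptions here and should be checked against the Cobrix README for release 2.0.5.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cobrix-debug").getOrCreate()

val df = spark.read
  .format("cobol")
  .option("copybook", "/path/to/copybook.cpy")
  .option("debug", "true")   // assumed option name for the new debug feature
  .load("/path/to/ebcdic/data")

df.printSchema()
// Expected shape (illustrative): each original field is accompanied by a
// debug column holding the hex of its raw bytes, e.g. ID plus ID_DEBUG,
// following the naming discussed above.
```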