AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark


copybook metadata for RDBMS

sree018 opened this issue · comments

Background

Currently, copybook metadata is exposed as a Spark schema; we need the schema at the RDBMS level (column types and lengths).

Example [Optional]

```cobol
01 MASTER-RECORD.
   02 RDT-TLF-MTHD-NM         PIC X(08).
   02 RDT-ADJ-ORGN-TRAN-DT    PIC 9(06).
   02 FILLER                  PIC X(03).
   02 RDT-ADDL-DATA-GROUP.
      05 RDT-ADDL-DATA OCCURS 0 TO 2 TIMES
         DEPENDING ON RDT-ADDL-SEGS-NO.
         10 RDT-ADDL-SEG-KEY.
            15 RDT-ADDL-SEG-KEY-PROD PIC X(02).
            15 RDT-ADDL-SEG-KEY-TYPE PIC S9(15)V99 COMP-3.
```
Current schema:
```
root
 |-- RDT-TLF-MTHD-NM: string
 |-- RDT-ADJ-ORGN-TRAN-DT: integer
 |-- RDT-ADDL-DATA-GROUP
 |    |-- RDT-ADDL-SEG-KEY
 |    |    |-- RDT-ADDL-SEG-KEY-PROD: string
 |    |    |-- RDT-ADDL-SEG-KEY-TYPE: decimal(15,2)
```

Expected output:
```
 |-- RDT-TLF-MTHD-NM: VARCHAR(08)
 |-- RDT-ADJ-ORGN-TRAN-DT: INTEGER(06)
 |-- RDT-ADDL-DATA-GROUP
 |    |-- RDT-ADDL-SEG-KEY
 |    |    |-- RDT-ADDL-SEG-KEY-PROD: VARCHAR(02)
 |    |    |-- RDT-ADDL-SEG-KEY-TYPE: DECIMAL(15,2)
```

We are only able to get parent-level element lengths before flattening:

```scala
df.schema.fields(0).metadata.getLong("maxLength")
```

is there any option to get the expected schema?
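As an aside, the per-field `maxLength` lookup above can be generalized by walking the nested schema recursively and collecting the length of every leaf field. The sketch below uses a simplified stand-in for Spark's `StructType`/`Metadata` (the `SchemaNode` model is hypothetical, not part of Cobrix or Spark) so the idea stays self-contained:

```scala
// Simplified stand-in for a nested Spark schema: a leaf field carries an
// optional "maxLength" (as Cobrix stores it in field metadata), a group
// carries child fields. This model is illustrative only.
sealed trait SchemaNode
final case class Leaf(name: String, typeName: String, maxLength: Option[Long]) extends SchemaNode
final case class Group(name: String, children: Seq[SchemaNode]) extends SchemaNode

// Recursively collect the dotted path and maxLength of every leaf field.
def collectMaxLengths(node: SchemaNode, prefix: String = ""): Seq[(String, Option[Long])] =
  node match {
    case Leaf(name, _, maxLen) => Seq((prefix + name, maxLen))
    case Group(name, children) =>
      children.flatMap(collectMaxLengths(_, prefix + name + "."))
  }

// The copybook example above, modeled with this toy schema.
val schema = Group("root", Seq(
  Leaf("RDT-TLF-MTHD-NM", "string", Some(8L)),
  Leaf("RDT-ADJ-ORGN-TRAN-DT", "integer", Some(6L)),
  Group("RDT-ADDL-DATA-GROUP", Seq(
    Group("RDT-ADDL-SEG-KEY", Seq(
      Leaf("RDT-ADDL-SEG-KEY-PROD", "string", Some(2L)),
      Leaf("RDT-ADDL-SEG-KEY-TYPE", "decimal(15,2)", None)
    ))
  ))
))
```

With the real Spark API, the same traversal would pattern-match on `StructType` and read `field.metadata` per leaf.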

Spark does not have a VARCHAR(n) type or an INTEGER(6) type, only StringType and IntegerType, so the expected output you specified is not possible as a Spark schema.

However, it could be possible to retain the metadata after schema flattening. How do you flatten the schema?

```scala
SparkUtils.flattenSchema(df, useShortFieldNames = false)
```

I've tested if retaining the metadata is possible, and it is.

This PR makes SparkUtils.flattenSchema() retain metadata: #635

It is already merged into master. Please test it if you can and let me know if it works for you.
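With the metadata retained after flattening, one could derive RDBMS column types from each flattened field. The sketch below is hypothetical (the mapping function and the fallback length are mine, not a Cobrix or Spark API); it assumes the Spark type name and the `maxLength` metadata are available per field, and that decimal precision/scale are already encoded in the type name:

```scala
// Hypothetical mapping from a flattened Spark field description to an RDBMS
// column type. Inputs mirror what is available after SparkUtils.flattenSchema():
// the Spark type name plus the optional "maxLength" metadata value.
def toRdbmsType(sparkTypeName: String, maxLength: Option[Long]): String =
  sparkTypeName match {
    case "string" =>
      // Cobrix stores the PIC X(n) length in "maxLength" metadata.
      // VARCHAR(255) is an arbitrary fallback for this sketch.
      maxLength.map(n => s"VARCHAR($n)").getOrElse("VARCHAR(255)")
    case "integer" => "INTEGER"
    case d if d.startsWith("decimal") =>
      // Spark already encodes precision and scale, e.g. "decimal(15,2)".
      d.toUpperCase
    case other => other.toUpperCase
  }
```

For example, `toRdbmsType("string", Some(8L))` would produce `VARCHAR(8)`, and `toRdbmsType("decimal(15,2)", None)` would produce `DECIMAL(15,2)`.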

@yruslan

The new feature is working.

Thanks for the feature!

Awesome! Thanks for letting me know