Copybook metadata for RDBMS
sree018 opened this issue
Background
Currently, copybook metadata is exposed as a Spark schema; we need the schema at the RDBMS level.
Example [Optional]
```cobol
01  MASTER-RECORD.
    02  RDT-TLF-MTHD-NM          PIC X(08).
    02  RDT-ADJ-ORGN-TRAN-DT     PIC 9(06).
    02  FILLER                   PIC X(03).
    02  RDT-ADDL-DATA-GROUP.
        05  RDT-ADDL-DATA        OCCURS 0 TO 2 TIMES
                                 DEPENDING ON RDT-ADDL-SEGS-NO.
            10  RDT-ADDL-SEG-KEY.
                15  RDT-ADDL-SEG-KEY-PROD  PIC X(02).
                15  RDT-ADDL-SEG-KEY-TYPE  PIC S9(15)V99 COMP-3.
```
Current schema:

```
root
 |-- RDT-TLF-MTHD-NM: string
 |-- RDT-ADJ-ORGN-TRAN-DT: integer
 |-- RDT-ADDL-DATA-GROUP
 |    |-- RDT-ADDL-SEG-KEY
 |    |    |-- RDT-ADDL-SEG-KEY-PROD: string
 |    |    |-- RDT-ADDL-SEG-KEY-TYPE: decimal(15,2)
```
Expected output:

```
root
 |-- RDT-TLF-MTHD-NM: VARCHAR(08)
 |-- RDT-ADJ-ORGN-TRAN-DT: INTEGER(06)
 |-- RDT-ADDL-DATA-GROUP
 |    |-- RDT-ADDL-SEG-KEY
 |    |    |-- RDT-ADDL-SEG-KEY-PROD: VARCHAR(02)
 |    |    |-- RDT-ADDL-SEG-KEY-TYPE: DECIMAL(15,2)
```
We are able to get parent-level element lengths only before flattening:

df.schema.fields(0).metadata.getLong("maxLength")

Is there any option to get the expected schema?
Spark does not have a varchar() type, nor an integer(6) data type, only string and integer, so the expected output you specified is not possible.
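That said, once field lengths are available as metadata, an RDBMS-style schema can be derived outside of Spark by mapping each field's Spark type name together with its length to a DDL type string. A minimal sketch, assuming you do this post-processing yourself (the RdbmsTypeMapper helper and its mapping rules are hypothetical, not part of Cobrix):

```scala
// Hypothetical helper (not part of Cobrix): derive an RDBMS-style type
// string from a Spark type name plus the optional "maxLength" metadata.
object RdbmsTypeMapper {
  def toRdbmsType(sparkTypeName: String, maxLength: Option[Long]): String =
    (sparkTypeName, maxLength) match {
      case ("string", Some(n))  => s"VARCHAR($n)"    // e.g. PIC X(08) -> VARCHAR(8)
      case ("string", None)     => "VARCHAR(255)"    // arbitrary fallback length
      case ("integer", Some(n)) => s"NUMERIC($n)"    // e.g. PIC 9(06) -> NUMERIC(6)
      case ("integer", None)    => "INTEGER"
      case (other, _)           => other.toUpperCase // decimal(15,2) -> DECIMAL(15,2)
    }
}
```

The fallback cases are placeholders; each target database would need its own rules.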
However, it could be possible to retain metadata after schema flattening. How do you flatten the schema?
SparkUtils.flattenSchema(df, useShortFieldNames = false)
I've tested whether retaining the metadata is possible, and it is.
This PR makes SparkUtils.flattenSchema() retain metadata: #635
It has already been merged into master. Please test it if you can and let me know if it works for you.
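For anyone verifying the fix, a minimal sketch of such a check (it assumes a live Spark session and a DataFrame `df` already loaded via the Cobrix `cobol` source, and uses the `maxLength` metadata key shown earlier in this thread):

```scala
import za.co.absa.cobrix.spark.cobol.utils.SparkUtils

// Flatten the schema, keeping the full (long) field names.
val flatDf = SparkUtils.flattenSchema(df, useShortFieldNames = false)

// After the fix, length metadata should survive flattening:
// print each field with its type and maxLength, when present.
flatDf.schema.fields.foreach { f =>
  val len =
    if (f.metadata.contains("maxLength")) f.metadata.getLong("maxLength").toString
    else "-"
  println(s"${f.name}: ${f.dataType.typeName}($len)")
}
```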
Awesome! Thanks for letting me know