AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark


string to varchar with length

anilpanicker opened this issue

Background [Optional]

A clear explanation of the reason for raising the question.
This gives us a better understanding of your use cases and how we might accommodate them.

Question

We want to write the DataFrame to SQL Server. The DataFrame has string columns whose type we want to change to varchar with the correct length. Is there a way to get fieldName, dataType, and length from the copybook?

Hi, you can get lengths and other parameters from an AST generated by parsing a copybook using CopybookParser.parseSimple(copyBookContents).

Example: https://github.com/AbsaOSS/cobrix#spark-sql-schema-extraction

When invoking parseSimple() you get an AST that you can traverse and read field lengths and other field properties.
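For reference, here is a minimal sketch of such a traversal. It assumes the AST classes Group and Primitive and the binaryProperties.actualSize property that recent Cobrix versions expose; the copybook itself is only an illustration:

```scala
import za.co.absa.cobrix.cobol.parser.CopybookParser
import za.co.absa.cobrix.cobol.parser.ast.{Group, Primitive, Statement}

// A tiny copybook used only for illustration.
val copyBookContents =
  """        01  RECORD.
    |            05  PRODUCT-ID    PIC 9(4).
    |            05  PRODUCT-NAME  PIC X(100).
    |""".stripMargin

val copybook = CopybookParser.parseSimple(copyBookContents)

// Walk the AST recursively: groups contain children, primitives carry the
// data type and binary properties (including the field size in bytes).
def printFields(stmt: Statement): Unit = stmt match {
  case group: Group    => group.children.foreach(printFields)
  case prim: Primitive =>
    println(s"${prim.name}: ${prim.dataType}, length = ${prim.binaryProperties.actualSize}")
}

copybook.ast.children.foreach(printFields)
```

For each primitive (non-group) field this prints the name, the parsed COBOL data type, and the size in bytes, which is what you would map to a varchar length.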

OK, thanks, let me try.

I'm also thinking of adding a metadata field to the generated Spark schema that will contain the maximum lengths of string fields, so I'm converting this question to a feature request.

Thanks, Ruslan, the same idea came to my mind as well. Our use case is to load the data into an RDBMS; currently, all strings default to the maximum length (nvarchar). If we have the lengths available, we can add an option like this:
df.write.format("jdbc").option("createTableColumnTypes", "ProductID INT, ProductName NVARCHAR(100)")

The new metadata field ('maxLength') for each Spark schema column is now available in the 'master' branch.
Here are details on this: https://github.com/AbsaOSS/cobrix#spark-schema-metadata
You can try it out by cloning master and building from source, or you can wait for the release of Cobrix 2.6.0, which should be out soon.
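For anyone following along, here is a sketch of how the new metadata could drive the createTableColumnTypes option mentioned above. The "cobol" format and "copybook" option follow the Cobrix README; the paths, JDBC URL, and table name are placeholders, and it assumes 'maxLength' is stored as a numeric metadata value:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Read the mainframe file; Cobrix attaches 'maxLength' metadata to string columns.
val df = spark.read
  .format("cobol")
  .option("copybook", "/path/to/copybook.cpy")   // placeholder path
  .load("/path/to/data")                         // placeholder path

// Build "COL1 VARCHAR(n), COL2 VARCHAR(m), ..." from the schema metadata.
val columnTypes = df.schema.fields
  .collect { case f if f.metadata.contains("maxLength") =>
    s"${f.name} VARCHAR(${f.metadata.getLong("maxLength")})"
  }
  .mkString(", ")

df.write
  .format("jdbc")
  .option("url", "jdbc:sqlserver://host;databaseName=mydb")  // placeholder connection
  .option("dbtable", "Products")                             // placeholder table
  .option("createTableColumnTypes", columnTypes)
  .save()
```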