AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark


string to varchar with length

anilpanicker opened this issue

Background [Optional]

A clear explanation of the reason for raising the question.
This gives us a better understanding of your use cases and how we might accommodate them.

Question

We want to write the DataFrame to SQL Server. The DataFrame has string columns whose type we want to change to varchar with the correct length. Is there a way to get fieldName, dataType, and length from the copybook?

Hi, you can get lengths and other parameters from an AST generated by parsing a copybook using CopybookParser.parseSimple(copyBookContents).

Example: https://github.com/AbsaOSS/cobrix#spark-sql-schema-extraction

When invoking parseSimple() you get an AST that you can traverse and read field lengths and other field properties.
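For reference, here is a minimal sketch of such a traversal. It assumes the AST classes Group and Primitive and the binaryProperties.actualSize property that recent Cobrix versions expose; the copybook itself is only an illustration:

```scala
import za.co.absa.cobrix.cobol.parser.CopybookParser
import za.co.absa.cobrix.cobol.parser.ast.{Group, Primitive, Statement}

// A tiny copybook used only for illustration.
val copyBookContents =
  """        01  RECORD.
    |            05  PRODUCT-ID    PIC 9(4).
    |            05  PRODUCT-NAME  PIC X(100).
    |""".stripMargin

val copybook = CopybookParser.parseSimple(copyBookContents)

// Walk the AST recursively: groups contain children, primitives carry the
// data type and binary properties (including the field size in bytes).
def printFields(stmt: Statement): Unit = stmt match {
  case group: Group    => group.children.foreach(printFields)
  case prim: Primitive =>
    println(s"${prim.name}: ${prim.dataType}, length = ${prim.binaryProperties.actualSize}")
}

copybook.ast.children.foreach(printFields)
```

For each primitive (non-group) field this prints the name, the parsed COBOL data type, and the size in bytes, which is what you would map to a varchar length.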

OK, thanks, let me try.

I'm also thinking of adding a metadata field to the generated Spark schema that will contain the maximum lengths of string fields, so I'm converting this question to a feature request.

Thanks, Ruslan, the same idea came to my mind as well. Our use case is to load the data into an RDBMS; currently, all strings default to the maximum length (nvarchar). If we have the lengths available, we can add an option like this:
df.write.format("jdbc").option("createTableColumnTypes", "ProductID INT, ProductName NVARCHAR(100)")

The new metadata field ('maxLength') for each Spark schema column is now available in the 'master' branch.
Here are details on this: https://github.com/AbsaOSS/cobrix#spark-schema-metadata
You can try it out by cloning master and building from source, or you can wait for the release of Cobrix 2.6.0, which should be out soon.
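For anyone following along, here is a sketch of how the new metadata could drive the createTableColumnTypes option mentioned above. The "cobol" format and "copybook" option follow the Cobrix README; the paths, JDBC URL, and table name are placeholders, and it assumes 'maxLength' is stored as a numeric metadata value:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Read the mainframe file; Cobrix attaches 'maxLength' metadata to string columns.
val df = spark.read
  .format("cobol")
  .option("copybook", "/path/to/copybook.cpy")   // placeholder path
  .load("/path/to/data")                         // placeholder path

// Build "COL1 VARCHAR(n), COL2 VARCHAR(m), ..." from the schema metadata.
val columnTypes = df.schema.fields
  .collect { case f if f.metadata.contains("maxLength") =>
    s"${f.name} VARCHAR(${f.metadata.getLong("maxLength")})"
  }
  .mkString(", ")

df.write
  .format("jdbc")
  .option("url", "jdbc:sqlserver://host;databaseName=mydb")  // placeholder connection
  .option("dbtable", "Products")                             // placeholder table
  .option("createTableColumnTypes", columnTypes)
  .save()
```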