AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark

Add support for parsing copybooks given Spark options

yruslan opened this issue

Background

Sometimes we want to use RDDs and Spark schemas separately when processing input files. In this case it is important to generate a Spark schema that matches the record schema exactly. However, the copybook parser accepts its own set of options, while the Spark reader for the 'cobol' format accepts options via '.option()' (see the sketch below). It would be useful for the copybook parser to also accept options as a Map[String, String], with the same semantics as the Spark 'cobol' format reader.
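
For context, this is roughly how options reach the 'cobol' reader today (a sketch only; the copybook and data paths are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// The reader receives its configuration through .option() calls and derives
// the Spark schema from the copybook internally, so the schema is not
// available as a separate object.
val df = spark.read
  .format("cobol")
  .option("copybook", "/path/to/copybook.cpy")
  .option("generate_record_id", "true")
  .load("/path/to/data")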

Feature

Add support for parsing copybooks given Spark options.

Example

val sparkOptions = Map("generate_record_id" -> "true")
val cobolSchema = CobolSchema.fromSparkOptions(sparkOptions)
val sparkSchema = cobolSchema.getSparkSchema
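
As a usage illustration (not part of the proposal itself), the resulting schema could then be paired with records processed separately as an RDD; an empty RDD stands in here for custom record decoding:

import org.apache.spark.sql.Row

// Illustration only: the schema obtained from the parser can be combined
// with an RDD[Row] produced by the user's own processing of the input files.
val rdd = spark.sparkContext.emptyRDD[Row]
val df2 = spark.createDataFrame(rdd, sparkSchema)
df2.printSchema()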

Proposed Solution

As per the example above: add a method such as CobolSchema.fromSparkOptions that accepts a Map[String, String] of options with the same names and semantics as the 'cobol' Spark data source.
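
A minimal, purely illustrative sketch of the option-handling semantics such a method could follow (ParserOptions and parserOptionsFrom are hypothetical names, not part of cobrix):

// Hypothetical sketch of translating reader-style options into parser settings.
// Only 'generate_record_id' is taken from the issue; this is not the actual
// cobrix option set.
final case class ParserOptions(generateRecordId: Boolean)

def parserOptionsFrom(options: Map[String, String]): ParserOptions = {
  // Option keys and values arrive as strings, as they do via .option().
  val normalized = options.map { case (k, v) => k.toLowerCase -> v }
  ParserOptions(
    generateRecordId = normalized.get("generate_record_id").exists(_.toBoolean)
  )
}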