eed3si9n / scalaxb

scalaxb is an XML data binding tool for Scala.

Home Page: http://scalaxb.org/

case classes generated by scalaxb cannot be serialized to Parquet

case class Type(code: Option[String] = None,
  description: Option[String] = None,
  any: Seq[scalaxb.DataRecord[Any]] = Nil)
case class Types(rdc_type: Seq[Type] = Nil,
  attributes: Map[String, scalaxb.DataRecord[Any]] = Map.empty) {
  lazy val typeValue = attributes("@type").as[String]
}
case class RID(rdc_types: Seq[Types] = Nil,
  any: Seq[scalaxb.DataRecord[Any]] = Nil)

When I try to write these case classes to a Parquet file or a DataFrame, I get an error for DataRecord[Any]. How should I resolve the issue?

@ag4s

> When I try to write a parquet file or Dataframe I am getting issue for DataRecord[Any]. How should I resolve the issue?

Could you provide more details please? - https://scalaxb.org/issue-reporting-guideline
From what you posted above, I can't tell what the actual problem is.

scalaxb generated the case classes above, and they contain DataRecord[Any]. Using these case classes, I read the XML file with fromXML and store the result in a val. Then I try to save it in Parquet format using Spark or another tool, and that is where the problem happens: Spark cannot recognize DataRecord[Any] when it reads the value.

The generated case classes read the XML fine. My problem is how to save the result as Parquet or as a DataFrame, and especially how to handle DataRecord[Any]. An example of handling DataRecord[Any] when creating Parquet would be great.

Could you copy-paste the actual error message that you see during runtime? Is it missing Jackson databinding?

ParquetWriter.writeAndClose(path, val)
:15: error: could not find implicit value for parameter writerFactory: com.github.mjakubowski84.parquet4s.ParquetWriter.ParquetWriterFactory[RID]
Error occurred in an application involving default arguments.
ParquetWriter.writeAndClose(path, val)

Here val .... = scalaxb.fromXML[RID](pathofXML)
This is with ParquetWriter. I also tried createDataFrame from SQLContext, but with no success. Unfortunately, the error gives no further details.

According to the readme, this is how you can write a codec?

import com.github.mjakubowski84.parquet4s.{OptionalValueCodec, Value, ValueCodecConfiguration}
import scalaxb.DataRecord

// Placeholder codec: both methods still need a real implementation.
implicit def datarecordDummyCodec[A]: OptionalValueCodec[DataRecord[A]] =
  new OptionalValueCodec[DataRecord[A]] {
    override protected def decodeNonNull(value: Value, configuration: ValueCodecConfiguration): DataRecord[A] = ???
    override protected def encodeNonNull(data: DataRecord[A], configuration: ValueCodecConfiguration): Value = ???
  }
implicit val scalaxbCodec: OptionalValueCodec[DataRecord[Any]] =
  new OptionalValueCodec[DataRecord[Any]] {
    override protected def decodeNonNull(value: Value, configuration: ValueCodecConfiguration): DataRecord[Any] = ???
    override protected def encodeNonNull(data: DataRecord[Any], configuration: ValueCodecConfiguration): Value =
      data match {
        case DataRecord(uri, key, Some(value: Int)) => implicitly[ValueCodec[Int]].encode(value, configuration)
      }
  }

I wrote the code above but still get the same error. I'm not sure whether anything else is required to complete it.
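
A different route, offered only as a rough sketch and not as scalaxb's or parquet4s's recommended approach, is to avoid a codec for DataRecord[Any] altogether: copy the parsed value into flat case classes that hold only String, Option, and Seq fields before writing. The sketch below is self-contained and uses a simplified stand-in for scalaxb.DataRecord. The row classes (TypeRow, TypesRow, RidRow), the Flatten object, and the "key=value" rendering are all hypothetical names and choices made for illustration.

```scala
// Stand-in for scalaxb.DataRecord, reduced to the three members used here.
// Assumption: the real trait exposes namespace, key and value in a similar way.
final case class DataRecord[+A](namespace: Option[String], key: Option[String], value: A)

// Generated-style classes copied from the issue (the lazy val is omitted).
final case class Type(code: Option[String] = None,
  description: Option[String] = None,
  any: Seq[DataRecord[Any]] = Nil)
final case class Types(rdc_type: Seq[Type] = Nil,
  attributes: Map[String, DataRecord[Any]] = Map.empty)
final case class RID(rdc_types: Seq[Types] = Nil,
  any: Seq[DataRecord[Any]] = Nil)

// Hypothetical flat rows: no Any-typed field is left in them.
final case class TypeRow(code: Option[String], description: Option[String], extras: Seq[String])
final case class TypesRow(typeValue: Option[String], types: Seq[TypeRow])
final case class RidRow(types: Seq[TypesRow], extras: Seq[String])

object Flatten {
  // Render an untyped record as "key=value"; a record without a key keeps only its value.
  def render(r: DataRecord[Any]): String =
    r.key.fold(r.value.toString)(k => s"$k=${r.value}")

  def typeRow(t: Type): TypeRow =
    TypeRow(t.code, t.description, t.any.map(render))

  // The "@type" attribute is looked up safely, so a missing attribute becomes None.
  def typesRow(ts: Types): TypesRow =
    TypesRow(ts.attributes.get("@type").map(_.value.toString), ts.rdc_type.map(typeRow))

  def ridRow(rid: RID): RidRow =
    RidRow(rid.rdc_types.map(typesRow), rid.any.map(render))
}
```

With this in place, Flatten.ridRow(parsed) would yield a RidRow, and that value (not the original RID) is what would be handed to the writer. Stringifying the wildcard content loses type information, so this only fits cases where the `any` fields are incidental; whether a given Parquet library derives a schema for these row classes still needs to be checked against that library.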