AtomGraph / JSON2RDF

Streaming generic JSON to RDF converter

Home Page: https://hub.docker.com/r/atomgraph/json2rdf

Adding an Apache Spark UDF?

mielvds opened this issue

Hi! Would it make sense to have a small addition that makes the library usable in Apache Spark? Something along the lines of

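package com.atomgraph.etl.json;

import java.io.ByteArrayOutputStream;
import java.io.Reader;
import java.io.StringReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.system.StreamRDF;
import org.apache.jena.riot.system.StreamRDFWriter;
import org.apache.spark.sql.api.java.UDF1;

public class Json2rdfUDF implements UDF1<String, String> {
    private static final long serialVersionUID = 1L;
    // Base URI for the generated RDF; assumed to be supplied by the caller
    private final URI baseURI;

    public Json2rdfUDF(URI baseURI) {
        this.baseURI = baseURI;
    }

    @Override
    public String call(String jsonString) throws Exception {
        Reader reader = new StringReader(jsonString);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // UDF1<String, String> must return a String, so the triples are
        // serialized as N-Triples here (the output format is an assumption)
        StreamRDF rdfStream = StreamRDFWriter.getWriterStream(out, Lang.NTRIPLES);
        new JsonStreamRDFWriter(reader, rdfStream, baseURI.toString()).convert();
        return out.toString(StandardCharsets.UTF_8.name());
    }
}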

Is it the serialVersionUID that makes the class usable in Spark? If it doesn't change the rest of the logic, then fine.
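
On the serialVersionUID question, a minimal sketch may help. Spark's UDF1 interface extends java.io.Serializable, and that is what allows a UDF instance to be shipped to another JVM; the serialVersionUID field only pins the version of the serialized form and does not touch the conversion logic. The example below uses only the JDK. StringUdf, PrefixUdf and roundTrip are hypothetical names made up for illustration and stand in for UDF1<String, String> and the proposed class; they are not part of Spark or JSON2RDF.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableUdfSketch {

    // Hypothetical stand-in for Spark's UDF1<String, String>, which also extends Serializable
    public interface StringUdf extends Serializable {
        String call(String input);
    }

    // Hypothetical UDF whose only state is a String, so it serializes cleanly
    public static class PrefixUdf implements StringUdf {
        // Pins the version of the serialized form; if omitted, the JVM derives one from the class
        private static final long serialVersionUID = 1L;
        private final String prefix;

        public PrefixUdf(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public String call(String input) {
            return prefix + input;
        }
    }

    // Serializes and deserializes an object in memory, a rough imitation of
    // sending a UDF instance from one JVM to another
    @SuppressWarnings("unchecked")
    public static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("UDF could not be serialized", e);
        }
    }

    public static void main(String[] args) {
        StringUdf copy = roundTrip(new PrefixUdf("rdf:"));
        System.out.println(copy.call("abc"));
    }
}
```

The copy behaves the same as the original after the round trip. For a real UDF the practical consequence is that every field it holds (such as a base URI) has to be serializable as well.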

Will you make a PR?

@mielvds ping. What is required here?