jonmarin / spark-xml-utils

spark-xml-utils

This site offers background information on how to use the spark-xml-utils library within an Apache Spark application. Some Java examples that use Apache Spark are provided. So far the focus has been on showing how things work rather than on performance. As time permits, we plan to optimize the implementation.

The javadoc for spark-xml-utils is also available and can help with understanding how the classes interact.

Motivation

The spark-xml-utils library was developed because our big datasets contain a large amount of XML, and I felt this data could be better served by some helpful XML utilities. These include the ability to filter documents based on an XPath expression, return specific nodes for an XPath or XQuery expression, and transform documents using an XSLT stylesheet. By providing basic wrappers around Saxon, the spark-xml-utils library exposes basic XPath, XSLT, and XQuery functionality that can readily be leveraged by any Spark application.
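
To make the filtering and evaluation ideas above concrete, here is a minimal sketch in plain Java. It deliberately does not use the spark-xml-utils API or Saxon: it relies only on the JDK's built-in `javax.xml.xpath` support (XPath 1.0), and the class name `XPathFilterSketch` and its methods `matches`, `evaluate`, and `filter` are hypothetical names chosen for illustration. The sample book documents are also made up.

```java
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical illustration only: NOT the spark-xml-utils API.
// Uses the JDK's XPath 1.0 engine rather than Saxon.
final class XPathFilterSketch {

    // Returns true when the XPath expression evaluates to true for the document.
    static boolean matches(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            InputSource source = new InputSource(new StringReader(xml));
            return (Boolean) xpath.evaluate(expression, source, XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed: " + expression, e);
        }
    }

    // Returns the string value of the expression's result (the first node for a node set).
    static String evaluate(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            InputSource source = new InputSource(new StringReader(xml));
            return (String) xpath.evaluate(expression, source, XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed: " + expression, e);
        }
    }

    // Keeps only the documents for which the expression is true.
    static List<String> filter(List<String> documents, String expression) {
        return documents.stream()
                .filter(doc -> matches(doc, expression))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample documents.
        List<String> docs = List.of(
                "<book><title>Spark</title><price>25</price></book>",
                "<book><title>Hadoop</title><price>15</price></book>");

        // Filter on a boolean XPath expression, then pull one value from each match.
        for (String doc : filter(docs, "/book/price > 20")) {
            System.out.println(evaluate(doc, "/book/title"));
        }
    }
}
```

In a Spark application the same kind of predicate would typically be applied per record, for example inside a `filter` call on an RDD of XML strings. This sketch builds a new `XPath` object on every call for simplicity, which is the kind of cost a tuned implementation would avoid.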

About

License:Apache License 2.0


Languages

Language:Java 100.0%