RevolutionAnalytics / RHadoop

RHadoop

Home Page: https://github.com/RevolutionAnalytics/RHadoop/wiki

Parsing XML in Mappers

ywen2000ge opened this issue

I need to parse some XML files into a structured format. I set the mapper's input format to make.input.format("text") and used the sample code below. The XML library loaded correctly. The problem is that a single XML file is split across multiple mappers, so the input to each map call is no longer valid XML. Is there a way to force one mapper to process an entire file? A typical file is around 1 MB.

Also, how do I load the input as one string? Currently the file is loaded as a vector with one element per line of the file, which is why I had to collapse it before parsing. Is there a better way?

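library(XML)  # provides xmlTreeParse

mapper <- function(., value)
{
  # Join the vector of lines back into a single string before parsing
  value <- paste0(value, collapse = "")
  # asText = TRUE: treat the argument as XML content, not a file name
  doc <- xmlTreeParse(value, asText = TRUE)
  ...
}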

Thank you!
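The collapse-then-parse step described above can be tried locally, without Hadoop, as a rough sketch. The sample lines and the function name parse_whole_document are hypothetical and only stand in for what the "text" input format would hand to a map call. This only illustrates rebuilding one string from a vector of lines; it does not stop Hadoop from splitting a file across mappers.

```r
library(XML)  # provides xmlTreeParse, xmlRoot, xmlName

# Hypothetical sample: one small XML document, one vector element per line,
# mimicking how the "text" input format delivers a file to the mapper.
lines <- c(
  "<records>",
  "  <record id=\"1\"><name>alpha</name></record>",
  "  <record id=\"2\"><name>beta</name></record>",
  "</records>"
)

# Same shape as the mapper in the question: ignore the key, rebuild the
# document as a single string, then parse it as text rather than as a file.
parse_whole_document <- function(., value) {
  text <- paste(value, collapse = "\n")
  doc <- xmlTreeParse(text, asText = TRUE)
  xmlRoot(doc)
}

root <- parse_whole_document(NULL, lines)
print(xmlName(root))  # name of the root element
```

This works only when the vector passed in holds every line of the document, which is exactly the condition that fails once a file is split between map calls.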

Hi,
Please file your request in the rmr2 issue tracker. Thanks.

Antonio

(In reply to ywen2000ge's message of Wed, May 7, 2014, 5:55 AM.)