RevolutionAnalytics / RHadoop


Home Page: https://github.com/RevolutionAnalytics/RHadoop/wiki


hadoop streaming failed with error code 5

JohnnyxB opened this issue

I have created a multi-node Hadoop cluster from my two laptops and tested it successfully.
After that I installed RHadoop on top of the Hadoop environment. All the necessary packages are installed and the path variables are set.

Then I try to run a wordcount example as follows:

library(rmr2)  # provides keyval(), mapreduce(), etc.

# Split each line on whitespace and emit each word with a count of 1
map <- function(k, lines) {
   words.list <- strsplit(lines, "\\s")
   words <- unlist(words.list)
   return(keyval(words, 1))
}

# Sum the counts collected for each word
reduce <- function(word, counts) {
   keyval(word, sum(counts))
}

wordcount <- function(input, output = NULL) {
   mapreduce(input = input, output = output, input.format = "text",
             map = map, reduce = reduce)
}

hdfs.root <- "wordcount"
hdfs.data <- file.path(hdfs.root, "data")
hdfs.out <- file.path(hdfs.root, "out")
out <- wordcount(hdfs.data, hdfs.out)

I get the following error:

15/05/24 21:09:20 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/05/24 21:09:20 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/05/24 21:09:20 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
15/05/24 21:09:21 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/app/hadoop/tmp/mapred/staging/master91618435/.staging/job_local91618435_0001
15/05/24 21:09:21 ERROR streaming.StreamJob: Error Launching job : No such file or directory
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce,  : 
  hadoop streaming failed with error code 5
Called from: mapreduce(input = input, output = output, input.format = "text", 
    map = map, reduce = reduce)

Prior to running this, I created two HDFS folders, wordcount/data and wordcount/out, and uploaded some text to the first using the command line.
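For completeness, the same setup can also be done from the R session with the rhdfs package. This is only a sketch (it assumes rhdfs is installed and HADOOP_CMD points at the hadoop binary; the local file name is hypothetical):

```r
library(rhdfs)
hdfs.init()                       # connect to the cluster
hdfs.mkdir("wordcount/data")      # input directory
hdfs.put("~/some_text.txt",       # local file to upload (hypothetical name)
         "wordcount/data")
```

Note that Hadoop normally refuses to write to an output directory that already exists, so wordcount/out is usually better left for the job to create itself.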

A further issue: I have two users on my computer, hduser and master; the first was created for the Hadoop installation. I suppose that when I open R/RStudio I run it as master, and because Hadoop was set up for hduser there may be permission issues that lead to this error. As one can see on the fourth line of the output, the system looks for master91618435, which, I suspect, should be hduser....
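For what it's worth, rmr2 locates Hadoop through the HADOOP_CMD and HADOOP_STREAMING environment variables, and these must be visible to the R session itself; RStudio does not always inherit the shell profile where they were exported. A sketch of setting them from R before loading rmr2 (the paths are hypothetical and must match the actual installation):

```r
# Paths below are illustrative -- adjust to the real Hadoop install.
Sys.setenv(HADOOP_CMD = "/usr/local/hadoop/bin/hadoop")
Sys.setenv(HADOOP_STREAMING =
  "/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar")

library(rmr2)          # load rmr2 only after the variables are set
Sys.getenv("HADOOP_CMD")   # quick check that the session sees the value
```

If HADOOP_STREAMING is unset or points at a missing jar, the streaming launcher typically fails with exactly a "No such file or directory" error like the one above.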

My question is, how can I get rid of this error?
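As a diagnostic aid, the map/reduce logic itself can be exercised without touching Hadoop at all via rmr2's local backend, which separates errors in the R code from cluster-configuration problems. A sketch, assuming the map and reduce functions above are already defined:

```r
library(rmr2)
rmr.options(backend = "local")   # run in-process, no Hadoop involved

out <- mapreduce(
  input = to.dfs(c("hello world", "hello hadoop")),  # small in-memory input
  map = map, reduce = reduce)

from.dfs(out)                    # inspect the resulting word counts
```

If this runs cleanly, the R side is fine and the failure is in the Hadoop/streaming configuration rather than in the wordcount code.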