RevolutionAnalytics / RHadoop

Home Page:https://github.com/RevolutionAnalytics/RHadoop/wiki

R-rmr2 PipeMapRed.waitOutputThreads(): subprocess failed with code 2

naveenkumar87 opened this issue

I am running an rmr2 example from here. This is the code I tried:

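# Point rmr2 and rhdfs at the Hadoop installation
Sys.setenv(HADOOP_HOME="/home/istvan/hadoop")
Sys.setenv(HADOOP_CMD="/home/istvan/hadoop/bin/hadoop")

library(rmr2)
library(rhdfs)

# Write the integers 1..100 to HDFS
ints = to.dfs(1:100)
# Map-only job: emit each value alongside its double
calc = mapreduce(input = ints,
                 map = function(k, v) cbind(v, 2*v))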
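
Before involving the cluster, it can help to confirm that the map function itself behaves as expected in plain R. The sketch below is only an illustration: `map_fn` and `local_out` are names introduced here, and the optional second half assumes rmr2 is installed and that its local backend (`rmr.options(backend = "local")`) is usable on the machine, which has not been verified for this setup.

```r
# The map function from the job above, pulled out so it can be called directly.
# (map_fn and local_out are illustrative names, not part of the original post.)
map_fn <- function(k, v) cbind(v, 2 * v)

# Plain R, no Hadoop involved: apply it to the same input the job uses.
local_out <- map_fn(NULL, 1:100)
print(dim(local_out))
print(head(local_out, 3))

# Optional: run the same job through rmr2's local backend.
# This branch only runs if rmr2 is installed (an assumption about the machine).
if (requireNamespace("rmr2", quietly = TRUE)) {
  library(rmr2)
  rmr.options(backend = "local")
  ints <- to.dfs(1:100)
  calc <- mapreduce(input = ints, map = map_fn)
  print(head(from.dfs(calc)$val, 3))
}
```

If this works locally but the Hadoop run still fails, the problem is more likely in the cluster environment than in the map function.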

I am using hadoop-streaming-1.1.1.jar. After I call the mapreduce function, the job starts and then fails with this exception:

    2013-12-17 14:03:10,260 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
    2013-12-17 14:03:11,034 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /app/cloudera/mapred/local/taskTracker/nkumar/jobcache/job_201312170101_0004/jars/job.jar <- /app/cloudera/mapred/local/taskTracker/nkumar/jobcache/job_201312170101_0004/attempt_201312170101_0004_m_000001_1/work/job.jar
    2013-12-17 14:03:11,038 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /app/cloudera/mapred/local/taskTracker/nkumar/jobcache/job_201312170101_0004/jars/.job.jar.crc <- /app/cloudera/mapred/local/taskTracker/nkumar/jobcache/job_201312170101_0004/attempt_201312170101_0004_m_000001_1/work/.job.jar.crc
    2013-12-17 14:03:11,123 WARN org.apache.hadoop.conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id
    2013-12-17 14:03:11,124 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
    2013-12-17 14:03:11,700 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
    2013-12-17 14:03:11,707 INFO org.apache.hadoop.mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1164b9b6
    2013-12-17 14:03:12,115 INFO org.apache.hadoop.mapred.MapTask: Processing split: hdfs://host:8020/user/nkumar/hadoop:2510+2510
    2013-12-17 14:03:12,160 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES is deprecated. Use FileInputFormatCounters as group name and  BYTES_READ as counter name instead
    2013-12-17 14:03:12,166 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
    2013-12-17 14:03:12,321 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/usr/bin/Rscript, ./rmr-streaming-map500f5e28b244]
    2013-12-17 14:03:12,384 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
    2013-12-17 14:03:12,385 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
    2013-12-17 14:03:12,403 INFO org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done
    2013-12-17 14:03:12,403 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed failed!
    2013-12-17 14:03:14,348 WARN org.apache.hadoop.streaming.PipeMapRed: java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at org.apache.hadoop.typedbytes.TypedBytesInput.readRawBytes(TypedBytesInput.java:218)
        at org.apache.hadoop.typedbytes.TypedBytesInput.readRaw(TypedBytesInput.java:152)
        at org.apache.hadoop.streaming.io.TypedBytesOutputReader.readKeyValue(TypedBytesOutputReader.java:51)
        at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:418)

    2013-12-17 14:03:14,376 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
    2013-12-17 14:03:14,381 WARN org.apache.hadoop.mapred.Child: Error running child
    java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:135)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
    2013-12-17 14:03:14,403 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task

It creates a sequence file in the /tmp directory on HDFS. Any suggestions on how to fix this? Thanks.
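
The log shows Hadoop launching `/usr/bin/Rscript` on the task node and that subprocess exiting with a non-zero status. One thing worth ruling out (not a confirmed cause here) is that the R installation on the node cannot load rmr2. A minimal sketch of that check, to be run with the node's own Rscript; `missing_packages` is a hypothetical helper name introduced for illustration:

```r
# Return the packages from `pkgs` that cannot be loaded in this R installation.
# (missing_packages is a hypothetical helper, not part of rmr2 or Hadoop.)
missing_packages <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

# Path taken from the "PipeMapRed exec" line in the log above.
rscript_path <- "/usr/bin/Rscript"
cat("Rscript present:", file.exists(rscript_path), "\n")

# requireNamespace() also fails when a dependency of rmr2 is missing,
# so checking rmr2 alone covers its dependencies too.
cat("Missing packages:", missing_packages("rmr2"), "\n")
```

The check is only meaningful when run as the same user and on the same machine that executes the map task, since library paths can differ per user.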

Edit:

I found this answer: http://stackoverflow.com/questions/4460522/hadoop-streaming-job-failed-error-in-python so I also tried running the R script with these two lines at the top:

#!/usr/bin/Rscript
#!/usr/bin/env Rscript