RevolutionAnalytics / RHadoop

RHadoop

Home Page: https://github.com/RevolutionAnalytics/RHadoop/wiki


Not able to read an HDFS file in the map function of rmr2

sureshappana opened this issue

Hi,
I am trying to access an HDFS file in the map function of rmr2. (The file is in NetCDF format, i.e. a .cdf file.) I am using the following approach but have not been able to get it to work.

Normal approach in R (without using MapReduce):
d <- open.ncdf("file.cdf")

This refers to a local file.

Approach I am trying in rmr2:

x <- hdfs.file("file.cdf")
d <- open.ncdf(x)  # this call would go inside the map function

Error: No file found with the specified name. (I even tried giving the absolute path.)

I am replacing the local file reference with an HDFS reference. (I can't use hdfs.read.text.file because my file is not in text format.)

So could anyone tell me whether there is a way to read an HDFS file other than a text file?

(P.S.: I can't use from.dfs in my map either, because the file is ~70 MB.)
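
The closest workaround I can think of is to copy the file from HDFS to the task's local disk inside the map function and open the local copy with open.ncdf(). A rough, untested sketch of what I mean (the HDFS path and the output key/value are placeholders, and rhdfs plus the ncdf package would have to be installed on every task node):

library(rmr2)
library(rhdfs)
library(ncdf)

cdf.map <- function(k, v) {
  hdfs.init()                                        # rhdfs has to be initialised on the task node
  local.copy <- file.path(tempdir(), "file.cdf")
  hdfs.get("/user/cloudera/file.cdf", local.copy)    # placeholder HDFS path
  d <- open.ncdf(local.copy)
  # ... pull the needed variables out of 'd' here ...
  close.ncdf(d)
  keyval(k, 1)                                       # placeholder output
}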

Environment:
R 3.2.2
rmr2 3.3.1
Cloudera QuickStart VM 5.5.0

Please let me know if any further information is required.

Thanks

Hi Suresh,

Were you able to find a solution to the problem you mentioned?
I'm facing the same problem...

Thank you,
Ravikiran C K

Hi Suresh,

Check this example; maybe it could help you.

# Set up the environment
Sys.setenv(HADOOP_CMD = '/usr/bin/hadoop')
Sys.setenv(HADOOP_HOME = '/usr/lib/hadoop-0.20-mapreduce')
Sys.setenv(HADOOP_STREAMING = '/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.7.1.jar')
library(rJava)
library(rmr2)
library(rhdfs)
hdfs.init()

# Load a sample data set, coerce it to a numeric matrix and write it to HDFS
table <- read.csv('http://archive.ics.uci.edu/ml/machine-learning-databases/00265/CASP.csv', sep = ",")
table <- as.numeric(unlist(table))
table <- matrix(table, ncol = 10)
X1 <- to.dfs(table)
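
Once the matrix is in HDFS you can run a job over it with mapreduce() and read the result back with from.dfs(); a quick sketch (the map body is only an illustration, not tested):

# Run a trivial job over X1 and pull the result back into the R session
out <- mapreduce(
  input = X1,
  map = function(k, v) keyval(1, colMeans(v))  # v is a chunk of rows from the matrix
)
from.dfs(out)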

good luck

Regards

Juan