RevolutionAnalytics / RHadoop

Home Page: https://github.com/RevolutionAnalytics/RHadoop/wiki

output is in sequence format

RajkumarB opened this issue

After installing RHadoop (the rmr2 and rhdfs packages) as suggested, I ran a small example as follows.

small.ints = to.dfs(1:10)
Warning: $HADOOP_HOME is deprecated.
14/09/10 08:16:17 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/09/10 08:16:17 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/09/10 08:16:17 INFO compress.CodecPool: Got brand-new compressor

mapreduce(input = small.ints, map = function(k, v) cbind(v, v^2))
Warning: $HADOOP_HOME is deprecated.
packageJobJar: [/home/mlcoeadmin/rajkumar/hadoop-1.2.1/temp/hadoop-unjar6616713832785091129/] [] /tmp/streamjob8123934586004586291.jar tmpDir=null
14/09/10 08:16:35 INFO mapred.FileInputFormat: Total input paths to process : 1
14/09/10 08:16:36 INFO streaming.StreamJob: getLocalDirs(): [/home/mlcoeadmin/rajkumar/hadoop-1.2.1/temp/mapred/local]
14/09/10 08:16:36 INFO streaming.StreamJob: Running job: job_201409100815_0001
14/09/10 08:16:36 INFO streaming.StreamJob: To kill this job, run:
14/09/10 08:16:36 INFO streaming.StreamJob: /home/mlcoeadmin/rajkumar/hadoop-1.2.1/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201409100815_0001
14/09/10 08:16:36 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201409100815_0001
14/09/10 08:16:37 INFO streaming.StreamJob: map 0% reduce 0%
14/09/10 08:16:50 INFO streaming.StreamJob: map 100% reduce 0%
14/09/10 08:16:55 INFO streaming.StreamJob: map 100% reduce 100%
14/09/10 08:16:55 INFO streaming.StreamJob: Job complete: job_201409100815_0001
14/09/10 08:16:55 INFO streaming.StreamJob: Output: /tmp/file91f7466a911
function ()
{
fname
}
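
That trailing function () { fname } appears to be the value mapreduce() returns: a closure wrapping the HDFS output path. A minimal sketch of how it can be used, assuming from.dfs() accepts it directly (the path differs on every run):

out = mapreduce(input = small.ints, map = function(k, v) cbind(v, v^2))
out()                 # the closure returns the HDFS output path, e.g. "/tmp/file91f7466a911"
res = from.dfs(out)   # reads the job's native-format output back into R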

It ran fine. When I checked the output folder, it contained the following files:

hdfs.ls("/tmp/file91f7466a911")
  permission      owner      group size          modtime                            file
1 -rw-r--r-- mlcoeadmin supergroup    0 2014-09-10 08:16   /tmp/file91f7466a911/_SUCCESS
2 drwxr-xr-x mlcoeadmin supergroup    0 2014-09-10 08:16      /tmp/file91f7466a911/_logs
3 -rw-r--r-- mlcoeadmin supergroup  122 2014-09-10 08:16 /tmp/file91f7466a911/part-00000
4 -rw-r--r-- mlcoeadmin supergroup  797 2014-09-10 08:16 /tmp/file91f7466a911/part-00001
But the output files are in Hadoop sequence-file format. They look like this:

hdfs.cat("/tmp/file91f7466a911/part-00000")
[1] "SEQ\006/org.apache.hadoop.typedbytes.TypedBytesWritable/org.apache.hadoop.typedbytes.TypedBytesWritable\xc0\x80\xc0\x80... [unreadable binary payload]"
hdfs.cat("/tmp/file91f7466a911/part-00001")
[1] "SEQ\006/org.apache.hadoop.typedbytes.TypedBytesWritable/org.apache.hadoop.typedbytes.TypedBytesWritable\xc0\x80\xc0\x80... [unreadable binary payload]"

Which one is the actual output file, and how can I convert this output into a readable format?

Thanks in advance,
Rajkumar.

ThiDiff commented

I got exactly the same problem and output. I'm running a 64-bit MapR distribution with the following configuration:

  • Hadoop 0.20.2
  • Java
java version "1.7.0_65"
OpenJDK Runtime Environment (rhel-2.5.1.2.el6_5-x86_64 u65-b17)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
  • R
$version.string
[1] "R version 3.1.1 (2014-07-10)"

Any help would be great!

ThiDiff.

Hi ThiDiff,

Problem solved. Just pass one more argument, output.format, to the mapreduce function:

mapreduce(input = small.ints, output.format = "text", map = function(k, v) cbind(v, v^2))
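
A side note, in case it helps: output.format = "text" makes the part files human-readable on HDFS, but if the goal is only to get the results back into an R session, the default native format can be read directly with from.dfs(). A minimal sketch, assuming the same small.ints job (keys() and values() are rmr2's accessors for the returned key-value object):

# Default native format round-trips cleanly through from.dfs():
res = from.dfs(mapreduce(input = small.ints, map = function(k, v) cbind(v, v^2)))
values(res)   # the cbind(v, v^2) matrix, assembled from all part files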

ThiDiff commented

Hi RajkumarB,

Thank you, that resolved my problem!
I have another question, about the from.dfs(...) function. When I try to run

from.dfs(mapreduce(input=to.dfs(1:10), output.format='text', map=function(k,v) cbind(v,v^2)))

I get the following error:

Error in if (file.exists(cmd)) return(cmd) : argument is of length zero

I don't really understand why. This also happens when I run a simple from.dfs(to.dfs(1:10)).
Do you have any idea?

Thank you in advance!
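
For anyone who lands here with the same error: a zero-length argument in file.exists(cmd) usually means rmr2 could not locate the hadoop binary because the HADOOP_CMD (and HADOOP_STREAMING) environment variables were not set before the package was loaded. That is an assumption about ThiDiff's setup, not something confirmed in this thread, and the paths below are illustrative only:

# Point rmr2 at the hadoop binary and streaming jar (adjust paths for your install):
Sys.setenv(HADOOP_CMD = "/opt/hadoop/bin/hadoop")
Sys.setenv(HADOOP_STREAMING = "/opt/hadoop/contrib/streaming/hadoop-streaming.jar")
library(rmr2)
from.dfs(to.dfs(1:10))   # should now round-trip without the file.exists() error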