akuroda / hadoop-jruby-adapter

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

hadoop-jruby-adapter

1. Summary
This library provides adapter classes to write MapReduce in JRuby.
Developed for hadoop-0.23.0-cdh4b1, but should work fine as long as using new API(org.apache.hadoop.mapreduce).

2. How to use
- write main class in JRuby
- write Mapper and Reducer in JRuby
- set mapper and reducer class in hadoop-jruby-adapter-conf.xml
- compile jruby classes using jruby --javac

3. Sample
# mapper
class CountMapper
  def initialize
    @one = IntWritable.new(1)
    @word = Text.new
  end

  java_signature 'void map(Writable, Writable, Mapper.Context)'
  def map(key, value, context)
    #puts "mapper: #{value}"
    value.to_s.split.each do |i|
      @word.set(i)
      context.write(@word, @one)
    end
  end
end

# reducer
class CountReducer
  java_signature 'void reduce(Writable, Iterable, Reducer.Context)'
  def reduce(key, values, context)
    #puts "reducer: #{key}"
    sum = 0
    values.each do |i|
      sum += i.get
    end
    context.write key, IntWritable.new(sum)
  end
end

# main
class MapReduceTestMain

  java_signature 'void main(String[])'
  def self.main(args)
    puts "MapReduceTest"
    job = Job.instance
   
    # set actual mapper via hadoop-jruby-adapter-conf.xml
    job.mapperClass = MapperAdapter.java_class
    puts job.mapperClass
    job.mapOutputKeyClass = Text.java_class
    puts job.mapOutputKeyClass
    job.mapOutputValueClass = IntWritable.java_class
    puts job.mapOutputValueClass

    # set actual reducer via hadoop-jruby-adapter-conf.xml
    job.reducerClass = ReducerAdapter.java_class
    puts job.reducerClass
    job.outputKeyClass = Text.java_class
    puts job.outputKeyClass
    job.outputValueClass = IntWritable.java_class
    puts job.outputValueClass

    FileInputFormat.addInputPath job, Path.new(args[0])
    FileOutputFormat.setOutputPath job, Path.new(args[1])

    job.waitForCompletion true
  end
end

4. Limitation
Currently support only Mapper and Reducer.

5. License
Copyright (c) 2012 Akira Kuroda <akuroda@gmail.com>
Released under MIT License.

About


Languages

Language:Java 70.9%Language:Ruby 29.1%