khuongav / ImageMatching_MapReduce

Image matching with MapReduce on a small Hadoop cluster

Image Matching with MapReduce

  • Find identical images by computing a hash of each image in parallel with MapReduce and comparing the hash values.
  • Handle the HDFS small-files problem by packing the images into a SequenceFile.
  • Tune the Hadoop 2.x configuration for a small cluster (1 master and 3 slaves, each with 2 processors and 8 GB of memory):
    • yarn.nodemanager.resource.memory-mb: 7168
    • yarn.scheduler.minimum-allocation-mb: 256
    • mapreduce.map.memory.mb: 3072
    • mapreduce.reduce.memory.mb: 256
    • mapreduce.input.fileinputformat.split.maxsize: 3221225472
    • mapreduce.input.fileinputformat.split.minsize: 3221225472
  • Performance (20 GB of image data):
    • Speedup: S = 295752 ms / 95772 ms ≈ 3.09
    • Efficiency: E = S/N = 3.09/6 ≈ 0.51
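
The hash-based matching in the first bullet can be sketched in plain Java without any Hadoop dependency. This is only an illustration under stated assumptions: the README does not say which hash function the project uses, so MD5 is an assumption here, and the names `ImageHashSketch`, `digest`, and `findDuplicates` are hypothetical. In a real job, the digest would presumably be the mapper's output key, and the grouping would be done by the shuffle and the reducer.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the repository's code.
class ImageHashSketch {

    // "Map" step: hex-encoded digest of an image's raw bytes.
    // MD5 is an assumed choice; the source does not name the hash function.
    static String digest(byte[] imageBytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] hash = md.digest(imageBytes);
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // "Shuffle and reduce" step: group file names by digest and keep
    // only the groups that contain more than one file.
    static Map<String, List<String>> findDuplicates(Map<String, byte[]> images) {
        Map<String, List<String>> byDigest = new TreeMap<>();
        for (Map.Entry<String, byte[]> entry : images.entrySet()) {
            byDigest.computeIfAbsent(digest(entry.getValue()), k -> new ArrayList<>())
                    .add(entry.getKey());
        }
        byDigest.values().removeIf(names -> names.size() < 2);
        return byDigest;
    }

    public static void main(String[] args) {
        // Tiny in-memory stand-ins for image files.
        Map<String, byte[]> images = new LinkedHashMap<>();
        images.put("a.jpg", new byte[] {1, 2, 3});
        images.put("b.jpg", new byte[] {9, 9});
        images.put("c.jpg", new byte[] {1, 2, 3});
        // Prints one group that lists a.jpg and c.jpg under their shared digest.
        System.out.println(findDuplicates(images));
    }
}
```

Because only byte-identical files share a digest, this approach finds exact duplicates; visually similar images with different encodings would not match.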
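
The configuration and performance figures above can be checked with a little arithmetic. The sketch below assumes that YARN fits whole containers into each NodeManager's memory, that 295752 ms is the baseline run time, and that N = 6 is the 3 slaves times 2 processors each; none of this is stated explicitly in the README, and the class and method names are hypothetical.

```java
// Hypothetical sketch for checking the numbers, not the repository's code.
class ClusterMathSketch {

    // Whole containers of the given size that fit into one NodeManager's memory.
    static int containersPerNode(int nodeMemoryMb, int containerMb) {
        return nodeMemoryMb / containerMb; // integer division rounds down
    }

    // Speedup: baseline run time divided by the improved run time.
    static double speedup(double baselineMs, double improvedMs) {
        return baselineMs / improvedMs;
    }

    // Efficiency: speedup divided by the number of processors.
    static double efficiency(double speedup, int processors) {
        return speedup / processors;
    }

    public static void main(String[] args) {
        // 7168 MB per node and 3072 MB per map task give 2 map containers per node.
        int mapsPerNode = containersPerNode(7168, 3072);
        int concurrentMaps = mapsPerNode * 3; // assumed: 3 worker nodes
        double s = speedup(295752, 95772);    // roughly 3.09
        double e = efficiency(s, 6);          // roughly 0.51
        System.out.println("map containers per node: " + mapsPerNode);
        System.out.println("concurrent map tasks: " + concurrentMaps);
        System.out.println("speedup: " + s + ", efficiency: " + e);
    }
}
```

Under these assumptions, each node runs two 3072 MB map containers, one per processor, and the remaining 1024 MB is left for smaller containers such as the 256 MB reducers.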

Languages

Java: 100.0%