Hadoop example: Number of Mappers is always equal to 1

Question

AayushiDwivedi opened this issue 9 years ago

@mibrahim @wnilkamal
I am running DataGenerator in distributed mode. I have a cluster that can run several mappers, but for some reason I always end up with multiple blocks (from 2 to 100) and only 2 containers. I believe this is the default setting for the number of containers.
Here is the script I use to run DataGenerator:
hadoop jar $HADOOP_CLASSPATH org.finra.datagenerator.samples.CmdLine -Dmapreduce.map.memory.mb="3072" -Dmapreduce.map.java.opts="-Xmx2362m" -Dmapred.map.tasks="10" -libjars $HADOOP_CLASSPATH -i sampleStateMachine.xml

I tried setting the number of map tasks to more than 1, but the number of mappers is still 1 (containers = 2).
Is there some basic configuration setting that I need to tune?
Thanks!

Mohamed Ibrahim · Answer 1 · Sat Jul 25 2015 12:00:27 GMT+0800 (China Standard Time)

Hi Aayushi,

DataGenerator first attempts to split the graph into a set of subgraphs
before executing the mappers. The function that sets the minimum number of
subgraphs required before distribution is here:
https://github.com/FINRAOS/DataGenerator/blob/master/dg-example-hadoop/src/main/java/org/finra/datagenerator/samples/CmdLine.java#L148

If you set bootstrapmin to 1 (the default) while running on a multinode
cluster, you will end up with 1 mapper, because the problem is never split
further. For large problems, consider splitting them into at least 100 to
500 pieces before distribution.
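
To make the effect of the minimum concrete, here is a minimal sketch of the idea. It is not DataGenerator's actual code: the class name BootstrapSplitSketch, the split method, and the toy graph are all hypothetical, and the graph is assumed to be acyclic. It expands partial paths breadth-first until at least bootstrapMin of them are pending, and each resulting piece stands for one unit of work that could go to its own mapper.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of DataGenerator.
class BootstrapSplitSketch {

    /**
     * Expands partial paths breadth-first from start until at least
     * bootstrapMin pieces exist, or until nothing is left to expand.
     * Assumes the graph is acyclic. Each returned path is one piece of work.
     */
    public static List<List<String>> split(Map<String, List<String>> graph,
                                           String start, int bootstrapMin) {
        Deque<List<String>> frontier = new ArrayDeque<>();
        frontier.add(List.of(start));
        List<List<String>> finished = new ArrayList<>();

        while (!frontier.isEmpty()
                && frontier.size() + finished.size() < bootstrapMin) {
            List<String> path = frontier.poll();
            String last = path.get(path.size() - 1);
            List<String> next = graph.getOrDefault(last, List.of());
            if (next.isEmpty()) {
                finished.add(path); // terminal state, cannot be split further
                continue;
            }
            // Replace the path with one extended copy per outgoing transition.
            for (String state : next) {
                List<String> extended = new ArrayList<>(path);
                extended.add(state);
                frontier.add(extended);
            }
        }

        List<List<String>> pieces = new ArrayList<>(finished);
        pieces.addAll(frontier);
        return pieces;
    }

    public static void main(String[] args) {
        // Toy state machine: start -> {a, b}, a -> {c, d}, b -> {e, f}
        Map<String, List<String>> graph = Map.of(
                "start", List.of("a", "b"),
                "a", List.of("c", "d"),
                "b", List.of("e", "f"));

        System.out.println(split(graph, "start", 1).size()); // prints 1
        System.out.println(split(graph, "start", 4).size()); // prints 4
    }
}
```

With a minimum of 1, the loop never runs and the whole problem stays in a single piece, which matches the single mapper described in the question. Raising the minimum forces more expansion before distribution and so produces more pieces to hand out.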

Let me know if that didn't answer your question.

AayushiDwivedi · Answer 2 · Tue Jul 28 2015 06:03:04 GMT+0800 (China Standard Time)

@mibrahim Thanks! That helped.

Mohamed Ibrahim · Answer 3 · Tue Jul 28 2015 21:25:43 GMT+0800 (China Standard Time)

Closing, since answered.