FINRAOS / DataGenerator

DataGenerator is a Java library for systematically producing large volumes of data. DataGenerator frames data production as a modeling problem, with a user providing a model of dependencies among variables and the library traversing the model to produce relevant data sets.

Home Page: http://finraos.github.io/DataGenerator

Hadoop example: Number of Mappers is always equal to 1

AayushiDwivedi opened this issue:

@mibrahim @wnilkamal
I am running DataGenerator in distributed mode. I have a cluster that can run a number of mappers, but for some reason I always end up with multiple blocks (from 2 to 100) and 2 containers. This, I believe, is the default setting for the number of containers.
Here is the script I use to run DataGenerator:
hadoop jar $HADOOP_CLASSPATH org.finra.datagenerator.samples.CmdLine -Dmapreduce.map.memory.mb="3072" -Dmapreduce.map.java.opts="-Xmx2362m" -Dmapred.map.tasks="10" -libjars $HADOOP_CLASSPATH -i sampleStateMachine.xml

I tried setting the number of map tasks to more than 1, but the number of mappers is still 1 (containers = 2).
Is there some basic configuration setting that I need to tune?
Thanks!

Hi Aayushi,

DataGenerator first attempts to split the graph into a set of subgraphs
before executing the mappers. The function that sets the minimum number of
subgraphs required before distribution is here:
https://github.com/FINRAOS/DataGenerator/blob/master/dg-example-hadoop/src/main/java/org/finra/datagenerator/samples/CmdLine.java#L148

If you set the bootstrapmin to 1 (the default) while running on a multinode
cluster, you will end up with 1 mapper. For large problems, consider
splitting them into at least 100 to 500 pieces before distribution.

Let me know if that didn't answer your question.
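
To illustrate why a bootstrap minimum of 1 leaves only one unit of work to distribute, here is a minimal Java sketch of the idea described above: expand partial paths breadth-first until at least the requested number of pieces exists, then hand each piece to a worker. This is not DataGenerator's code. The class BootstrapSketch, the method bootstrap, and the toy graph are hypothetical names made up for the example, and the graph is assumed to be acyclic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the DataGenerator API.
class BootstrapSketch {

    // Expands partial paths breadth-first until at least minBootstrap pieces
    // exist, or until no path can be expanded further. Assumes an acyclic graph.
    static List<List<String>> bootstrap(Map<String, List<String>> graph,
                                        String start, int minBootstrap) {
        Deque<List<String>> frontier = new ArrayDeque<>();
        List<List<String>> finished = new ArrayList<>();
        frontier.add(List.of(start));

        while (!frontier.isEmpty()
                && frontier.size() + finished.size() < minBootstrap) {
            List<String> path = frontier.poll();
            String last = path.get(path.size() - 1);
            List<String> next = graph.getOrDefault(last, List.of());
            if (next.isEmpty()) {
                // Terminal state: the path is complete and cannot be split.
                finished.add(path);
                continue;
            }
            for (String state : next) {
                List<String> extended = new ArrayList<>(path);
                extended.add(state);
                frontier.add(extended);
            }
        }

        // Every remaining piece is one independent unit of work.
        List<List<String>> pieces = new ArrayList<>(finished);
        pieces.addAll(frontier);
        return pieces;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
                "start", List.of("a", "b"),
                "a", List.of("c", "d"),
                "b", List.of("e", "f"));

        // A minimum of 1 is already met by the start state: one piece.
        System.out.println(bootstrap(graph, "start", 1).size()); // prints 1
        // A minimum of 4 forces expansion into four independent pieces.
        System.out.println(bootstrap(graph, "start", 4).size()); // prints 4
    }
}
```

In this sketch, a minimum of 1 is satisfied before any expansion happens, so a single piece is all there is to distribute, no matter how many map tasks are requested.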

(In reply to AayushiDwivedi's message of Fri, Jul 24, 2015, 8:05 PM.)

@mibrahim Thanks! That helped.

Closing, since answered.