trinodb / tpch

Port of TPC-H dbgen to Java

How to use airlift-tpch

prashant23 opened this issue

Hello developers,
I was searching for a Java utility to generate TPC-H data and found your code. Can you tell me how to use it as an API? I looked for a README file but didn't find one.

Thanks and Regards

Prashant

I haven't gotten around to adding a command line interface yet, but you can create files with code like the following:

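// scaleFactor, part, and numberOfParts must be defined by the caller.
// try-with-resources closes the writer so buffered output is flushed to disk.
try (Writer writer = new FileWriter("yourFile")) {
    for (Customer entity : new CustomerGenerator(scaleFactor, part, numberOfParts)) {
        writer.write(entity.toLine());
        writer.write('\n');
    }
}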

Each table in TPC-H has an associated generator, and each generator is an Iterable<TpchEntity>. Each entity has getters for the individual column values, or you can call toLine() to generate a standard TPC-H output line.
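Since every generator is an Iterable and every entity offers toLine(), the loop above can be factored into one helper that works for any table. The following is only a minimal sketch: `TableWriter`, `writeAll`, and the `LineEntity` interface are hypothetical names, and `LineEntity` is a stand-in for the library's entity type so the example runs without the library on the classpath. Only the toLine() method described above is assumed.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;

class TableWriter {
    // Hypothetical stand-in for the library's entity type; only toLine() is assumed.
    interface LineEntity {
        String toLine();
    }

    // Writes one line per entity and returns the number of rows written.
    static long writeAll(Iterable<? extends LineEntity> generator, Writer writer) {
        long rows = 0;
        try {
            for (LineEntity entity : generator) {
                writer.write(entity.toLine());
                writer.write('\n');
                rows++;
            }
            writer.flush();
        } catch (IOException e) {
            // Rethrow unchecked so callers do not need a throws clause.
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Two made-up rows in place of a real generator.
        List<LineEntity> fake = List.of(() -> "1|first|", () -> "2|second|");
        StringWriter out = new StringWriter();
        long rows = writeAll(fake, out);
        System.out.print(out);
        System.out.println(rows + " rows written");
    }
}
```

With the real library, the parameter type would presumably be Iterable<? extends TpchEntity>, and the Writer would be a FileWriter opened by the caller.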

When I try to run the above code, I get the following error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Unknown Source)
at java.lang.String.<init>(Unknown Source)
at java.lang.StringBuilder.toString(Unknown Source)
at io.airlift.tpch.TextPool.<init>(TextPool.java:62)
at io.airlift.tpch.TextPool.<init>(TextPool.java:40)
at io.airlift.tpch.TextPool.getDefaultTestPool(TextPool.java:31)
at io.airlift.tpch.CustomerGenerator.<init>(CustomerGenerator.java:44)
at tpch.data.GenerateData.main(GenerateData.java:14)

I think it is generating a large amount of garbage.
I am using Windows 7 with 4 GB of RAM.

Is the amount of RAM the problem?

Kindly let me know if there is a workaround.

This code has not been optimized for running in memory-constrained environments. I'm sure there is a lot of room for improvement here if you want to take a look at it. Also, the latest commits in trunk improve performance and reduce the rate of garbage generation, but I'm not sure what the minimum amount of memory required to run the generator is. I would guess you need at least a few GB.

I don't know how the JVM chooses the default heap size on Windows, but it might be too small. Try increasing the heap size when running Java:

java -Xms2G -Xmx2G ...

This sets the starting size and maximum size to 2 GB, so it will allocate that much memory up front and use a fixed-size heap. If that doesn't work, try 1G or 3G instead.
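To see which maximum heap size the JVM actually picked, with or without the -Xmx flag, you can ask the runtime directly. This is a small sketch using the standard Runtime.maxMemory() call; the class name `HeapCheck` and the method `maxHeapMiB` are made up for illustration.

```java
class HeapCheck {
    // Maximum heap the JVM will attempt to use, in MiB (reflects -Xmx if given).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it once as `java HeapCheck` and once as `java -Xmx2G HeapCheck` shows the default limit next to the explicit one, which should tell you whether the default is too small for the generator.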