trinodb / tpch

Port of TPC-H dbgen to Java

toString() method on all TpchEntity

rmetzger opened this issue

Hi,

first of all: Great project. I'm currently looking into using the code for creating a distributed TPC-H data generator.
I'm pretty confident that your code is suitable for that. Most likely, I'll contribute some stuff back, if that's okay with you.

One question (that could lead to a first contribution from my side): Is there a particular reason why the classes implementing TpchEntity don't also use toLine() as their toString() implementation?

It's very handy when using a debugger and for "system out" debugging.

No reason. When we switch to Java 8, we can just add a default method, but for now you can add the toString calls.
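For illustration, adding the toString call to one class might look like the sketch below. Only `TpchEntity` and `toLine()` come from this thread; the interface here is a minimal stand-in, and `SampleEntity`, its fields, and the pipe-delimited output are hypothetical.

```java
// Minimal stand-in for the library's entity interface.
// Only the toLine() method is taken from the discussion above.
interface TpchEntity {
    String toLine();
}

// Hypothetical entity, used only for illustration; not a class from the project.
final class SampleEntity implements TpchEntity {
    private final long key;
    private final String name;

    SampleEntity(long key, String name) {
        this.key = key;
        this.name = name;
    }

    @Override
    public String toLine() {
        // Pipe-delimited layout assumed for this sketch.
        return key + "|" + name + "|";
    }

    @Override
    public String toString() {
        // Delegate so debuggers and println show the generated row.
        return toLine();
    }
}

final class ToStringSketch {
    public static void main(String[] args) {
        TpchEntity entity = new SampleEntity(1, "example");
        System.out.println(entity); // prints 1|example|
    }
}
```

One caveat on the Java 8 idea: the language does not allow an interface default method to override a method of `Object` such as `toString()`, so the delegation would still have to live in each class or in a shared abstract base class.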

BTW, there is already a distributed generator in Presto (https://github.com/facebook/presto/tree/master/presto-tpch). We use it for testing. The generator is so fast that we run the initial SQL tests directly on the generator instead of loading the data into a file-backed store.

Okay. I'll do it once I find time for it. I found another solution so the issue isn't blocking me from using airlift/tpch.

Thanks for the pointer to Presto. I'm doing something similar with Apache Flink to generate TPC-H data in a cluster.
The purpose of my program is a) to provide an example of how to use Flink for (distributed) data generation and b) to have a tool for generating data for our internal testing.
It would be too much work to set up Presto "just" to generate the data.