trinodb / tpch

Port of TPC-H dbgen to Java

Line Item count is off by a little bit

hgschmie opened this issue

Running this code:

// Iterables is Guava's com.google.common.collect.Iterables.
// Each call generates the whole table as a single part (part 1 of 1) and counts its rows.
public void generate() throws Exception {
    double scale = 10;

    System.out.println("Line Item:     " + Iterables.size(TpchTable.LINE_ITEM.createGenerator(scale, 1, 1)));
    System.out.println("Nation:        " + Iterables.size(TpchTable.NATION.createGenerator(scale, 1, 1)));
    System.out.println("Region:        " + Iterables.size(TpchTable.REGION.createGenerator(scale, 1, 1)));
    System.out.println("Part:          " + Iterables.size(TpchTable.PART.createGenerator(scale, 1, 1)));
    System.out.println("Customer:      " + Iterables.size(TpchTable.CUSTOMER.createGenerator(scale, 1, 1)));
    System.out.println("Orders:        " + Iterables.size(TpchTable.ORDERS.createGenerator(scale, 1, 1)));
    System.out.println("Part Supplier: " + Iterables.size(TpchTable.PART_SUPPLIER.createGenerator(scale, 1, 1)));
    System.out.println("Supplier:      " + Iterables.size(TpchTable.SUPPLIER.createGenerator(scale, 1, 1)));
}

With scale == 10, this yields:

Line Item:     59986052
Nation:        25
Region:        5
Part:          2000000
Customer:      1500000
Orders:        15000000
Part Supplier: 8000000
Supplier:      100000

With scale == 1:

Line Item:     6001215
Nation:        25
Region:        5
Part:          200000
Customer:      150000
Orders:        1500000
Part Supplier: 800000
Supplier:      10000

With scale == 0.1:

Line Item:     600572
Nation:        25
Region:        5
Part:          20000
Customer:      15000
Orders:        150000
Part Supplier: 80000
Supplier:      1000

All the other counts match the spec exactly, but the Line Item count is slightly off at every scale, and for scale == 10 there are actually fewer rows than the spec calls for.

I guess clause 4.2.5 of the TPC-H specification answers my question, so this seems to be normal. Sorry for the noise.
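
For anyone who wants to quantify "a little bit", here is a minimal sketch that compares the counts reported above against a nominal Line Item cardinality. It assumes the nominal figure is 6,000,000 rows per unit of scale factor; that constant, the class name `LineItemDeviation`, and its helper methods are illustrative and not part of the tpch library. The observed counts are copied from the output above.

```java
public class LineItemDeviation {
    // Assumption: nominal LINEITEM cardinality of 6,000,000 rows per unit of scale factor.
    static final long NOMINAL_ROWS_PER_SCALE = 6_000_000L;

    // Nominal row count for a given scale factor, rounded to a whole row.
    static long nominalRows(double scale) {
        return Math.round(NOMINAL_ROWS_PER_SCALE * scale);
    }

    // Positive result: more rows than nominal. Negative result: fewer rows.
    static long deviation(double scale, long observedRows) {
        return observedRows - nominalRows(scale);
    }

    public static void main(String[] args) {
        // Scale factors and Line Item counts reported in this issue.
        double[] scales = {10, 1, 0.1};
        long[] observed = {59_986_052L, 6_001_215L, 600_572L};

        for (int i = 0; i < scales.length; i++) {
            long nominal = nominalRows(scales[i]);
            long diff = deviation(scales[i], observed[i]);
            double percent = 100.0 * diff / nominal;
            System.out.printf("scale %-5s observed %,d  nominal %,d  diff %+d (%+.4f%%)%n",
                    scales[i], observed[i], nominal, diff, percent);
        }
    }
}
```

Under that assumption the differences are -13948 rows at scale 10, +1215 at scale 1, and +572 at scale 0.1, all well under a tenth of a percent of the nominal count.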