ljishen / tpch-data

Generate TPC-H data in Parquet format.

tpch-data

$ pip install pyarrow duckdb
$ python3
>>> import duckdb
>>> import pyarrow.parquet as pq
>>> con = duckdb.connect(database=':memory:')
>>> con.execute("INSTALL tpch")  # install the TPC-H extension
>>> con.execute("LOAD tpch")
>>> con.execute("CALL dbgen(sf=10)")  # scale factor 10, roughly 10 GB of raw data
>>> print(con.execute("SHOW TABLES").fetchall())
[('customer',), ('lineitem',), ('nation',), ('orders',), ('part',), ('partsupp',), ('region',), ('supplier',)]
>>> tables = ["customer", "lineitem", "nation", "orders", "part", "partsupp", "region", "supplier"]
>>> for t in tables:
...     # materialize each table as an Arrow table, then write it to <table>.parquet
...     res = con.query("SELECT * FROM " + t)
...     pq.write_table(res.to_arrow_table(), t + ".parquet")
...

License: MIT License