Generate TPC-DS dataset through dsgen
yjshen opened this issue · comments
Currently, the TPC-DS sf=1 dataset is generated once and checked in under 'dev/tpcds_1g', which makes our repo huge.
To avoid tracking the data in git without regenerating it on every run, we should generate the dataset in GitHub Actions and cache it.
- GitHub Actions grants each repository a 10 GB cache, and we currently use less than 1 GB.
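
A minimal sketch of the "generate only on a cache miss" step, assuming the CI cache restores the data directory before this runs. The `ensure_dataset` helper, the `.generated` marker file, and the `generate` callback are hypothetical names for illustration; the actual generator invocation is left to the caller because its flags and paths are not specified here.

```python
from pathlib import Path
from typing import Callable

# Hypothetical marker file written after a successful generation.
MARKER = ".generated"


def ensure_dataset(data_dir: Path, generate: Callable[[Path], None]) -> bool:
    """Run `generate` only if `data_dir` has no completion marker.

    Returns True if generation ran, False if the existing data was reused
    (for example, because the CI cache restored it).
    """
    marker = data_dir / MARKER
    if marker.exists():
        return False
    data_dir.mkdir(parents=True, exist_ok=True)
    generate(data_dir)  # caller supplies the real generator invocation
    marker.touch()      # written last, so a failed run is retried next time
    return True
```

Writing the marker only after `generate` returns means a partially generated directory is not mistaken for a complete dataset.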