Data generator should provide BYOD feature
vgkowski opened this issue · comments
Currently, BatchReplayer consumes a PreparedDataset to generate data. We can provide a new construct that prepares the data for replay during provisioning of the CDK application.
This construct can take a source dataset as an input parameter and run a synchronous AWS Glue job that transforms the dataset so that the BatchReplayer can consume it.
Pre-requisites for BatchReplayer are listed in the PreparedDataset construct documentation.
Add some quality checks to prevent the preparation from failing.
Ensure the PySpark script is packaged into the core library.
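
As a rough illustration of the quality checks mentioned above, here is a minimal sketch in plain Python of validations that could run before the preparation job starts. Everything in it is an assumption made for illustration: the function name `check_source_dataset`, the `CheckResult` type, the schema representation (a dict of column name to type name), and the accepted datetime types are hypothetical and are not taken from the library or from the PreparedDataset documentation.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical: column types assumed acceptable for the datetime column.
ACCEPTED_DATETIME_TYPES = ("timestamp", "date")


@dataclass
class CheckResult:
    """Collects human-readable errors found while checking the source dataset."""
    errors: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def check_source_dataset(schema: Dict[str, str], datetime_column: str, row_count: int) -> CheckResult:
    """Run basic sanity checks before launching the preparation job.

    schema maps column names to type names (for example as read from a
    Spark DataFrame's dtypes). All checks are illustrative assumptions.
    """
    result = CheckResult()

    # An empty source dataset cannot be replayed.
    if row_count <= 0:
        result.errors.append("source dataset is empty")

    # The column used to order the replay must exist and be time-like.
    if datetime_column not in schema:
        result.errors.append(f"missing datetime column '{datetime_column}'")
    elif schema[datetime_column] not in ACCEPTED_DATETIME_TYPES:
        result.errors.append(
            f"column '{datetime_column}' has type '{schema[datetime_column]}', "
            f"expected one of {ACCEPTED_DATETIME_TYPES}"
        )

    return result


# Example usage with a hypothetical schema.
report = check_source_dataset({"sale_ts": "timestamp", "amount": "double"}, "sale_ts", 1000)
if not report.ok:
    raise ValueError("; ".join(report.errors))
```

Collecting all errors before failing, rather than stopping at the first one, lets the user fix every problem with the source dataset in a single pass instead of re-running the provisioning repeatedly.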