rix0rrr / cdk-trainingset

Collecting a potential LLM training set for CDK code

CDK Training Set

Collecting a CDK programs training set.

This training set is based on the integ tests of the CDK repository.

The data set

The data set has a number of high-level CDK programs, and the CloudFormation templates that these high-level programs produce. A single high-level program may produce more than one output template.
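As a rough illustration of the one-to-many relationship between programs and templates, the sketch below lists the templates found for each test. It assumes the `output/<test_name>/high_level.ts.cdk.out` layout described under "Synthesizing example programs" and relies on the CDK convention of writing one `<StackName>.template.json` file per stack; the function names are made up for this example and are not part of the repository.

```python
from pathlib import Path


def list_templates(cdk_out: Path) -> list[Path]:
    """Return the CloudFormation templates found in one cdk.out directory."""
    # The CDK CLI writes one `<StackName>.template.json` file per stack.
    return sorted(cdk_out.glob("*.template.json"))


def templates_by_test(output_root: Path) -> dict[str, list[Path]]:
    """Map each test name to the templates its high-level program produced."""
    result: dict[str, list[Path]] = {}
    # Assumed layout: <output_root>/<test_name>/high_level.ts.cdk.out
    for cdk_out in sorted(output_root.glob("*/high_level.ts.cdk.out")):
        result[cdk_out.parent.name] = list_templates(cdk_out)
    return result
```

Calling `templates_by_test(Path("output"))` after a synth run would then show which tests produced several templates and which produced none.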

For every template, a reverse-engineered low-level CDK program is provided if possible. These have been automatically converted using cdk-from-cfn. This tool doesn't support all templates yet, so some low-level CDK programs may be missing.

Preparation

You must have Node.js installed. Run the following command to install all Node dependencies:

npm ci

Running the collector

You must have Python as well, and a local checkout of the CDK repository. Run the following command to refresh the samples taken from the CDK repo:

python3 collect-samples.py /path/to/aws-cdk

Synthesizing example programs

The example programs are synthesized using ts-node and have been slightly rewritten so that they work with the tsconfig.json in this directory.

Synthesizing a single program to a cdk.out directory looks like this:

npx cdk -a "ts-node --transpile-only /path/to/program.ts" synth

To synthesize everything to its own output/<test_name>/high_level.ts.cdk.out directory, run:

./synth-all.sh
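The batch run can be pictured roughly as follows. This is a sketch under assumptions, not the contents of synth-all.sh: it assumes each program lives at `output/<test_name>/high_level.ts` (inferred from the output directory name above) and uses the CDK CLI's `--output` option to choose the target directory. The helper names are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path


def synth_command(program: Path) -> list[str]:
    """Build the `cdk synth` invocation for one high-level program."""
    # The app command is quoted so that paths with spaces survive.
    app = f"ts-node --transpile-only {shlex.quote(str(program))}"
    # `--output` selects the directory the synthesized output is written to.
    return ["npx", "cdk", "-a", app, "synth", "--output", f"{program}.cdk.out"]


def synth_all(output_root: Path) -> dict[str, bool]:
    """Synthesize every program and record which ones succeeded."""
    results: dict[str, bool] = {}
    # Assumed layout: <output_root>/<test_name>/high_level.ts
    for program in sorted(output_root.glob("*/high_level.ts")):
        proc = subprocess.run(synth_command(program), capture_output=True)
        # A failing program is recorded instead of aborting the whole run.
        results[program.parent.name] = proc.returncode == 0
    return results
```

Recording a success flag per test, instead of stopping at the first error, lets a single failing program leave the remaining examples unaffected.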

Not all examples are guaranteed to synthesize correctly, as some of them depend on files that may not exist in the current repository.

Evaluating the CloudFormation template

A CloudFormation template can be symbolically evaluated using cfngine. This produces a JSON-lines formatted output stream: cfngine picks an order in which to evaluate the resources, and evaluates the expressions in the template to symbolic values that represent the inputs.

npx cfngine create /path/to/template.json

Or run eval-all.sh to do all of them:

./eval-all.sh
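Because the output is JSON lines, it can be consumed one record at a time. The sketch below parses such a stream without assuming anything about which fields cfngine emits; the function name is hypothetical.

```python
import json
from typing import Iterable, Iterator


def read_json_lines(lines: Iterable[str]) -> Iterator[object]:
    """Yield one parsed JSON value per non-empty input line."""
    for line in lines:
        line = line.strip()
        if not line:
            # Blank lines carry no record, so they are skipped.
            continue
        yield json.loads(line)
```

For example, passing `sys.stdin` to `read_json_lines` in a small script would let the output of `npx cfngine create /path/to/template.json` be piped straight into it.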

License: MIT License