This repository contains a data set of 860 programs generated by AI tools, helper scripts, and analysis results. Be aware that analyzing the whole data set takes more than a day and consumes more than 50 GB of disk space, due to the large number of CodeQL databases, which are not particularly storage efficient.
The data set is located in the vulnerability_analysis folder. There are 5 folders: rq_1, rq_2, rq_3, rq_4, and rq_8.
rq_1 contains Python code, rq_2 contains C code, rq_3 contains C# code, rq_4 contains JavaScript code, and rq_8 contains Bash and PowerShell code.
You need the CodeQL CLI installed to run the vulnerability analysis. The helper scripts in the analysis folder are written in Python, so you also need a Python interpreter (3.7+ should be OK). Analyzing C# code requires the .NET 7.0 SDK, analyzing C code requires a gcc compiler (tested with 11.4) and Make, and analyzing Python code requires a Python interpreter.
To analyze the C code from the data set, you must have the following libraries installed on your system:
- libcurl4-openssl-dev
- libsqlite3-dev
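The tool requirements above can be checked before starting a long analysis run. The following is a minimal sketch, not part of the repository's helper scripts: the function name `missing_tools` and the executable names in `REQUIRED_TOOLS` are assumptions and may differ on your platform. It does not check the two C libraries.

```python
import shutil

# Executable names are assumptions; adjust them to your platform and setup.
REQUIRED_TOOLS = ["codeql", "gcc", "make", "dotnet"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is sufficient.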
These are the commands to create the CodeQL database for Python, C, C#, and JavaScript, in that order (the C# command uses PowerShell syntax):
codeql database create codeql_database --language=python --overwrite
codeql database create codeql_database --language=cpp --overwrite --command="make clean all"
Remove-Item -Path codeql_database -Recurse -ErrorAction SilentlyContinue; codeql database create codeql_database --command='dotnet build /t:rebuild' --language=csharp
codeql database create codeql_database --language=javascript --overwrite
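If you want to drive these commands from Python, they can be assembled per language. This is a sketch under assumptions, not the repository's own helper: `build_create_command` and `create_database` are hypothetical names, and the C# variant uses `--overwrite` in place of the `Remove-Item` step shown above.

```python
import subprocess

# Build commands taken from the commands listed above; languages without an
# entry (python, javascript) need no build step.
BUILD_COMMANDS = {
    "cpp": "make clean all",
    "csharp": "dotnet build /t:rebuild",
}


def build_create_command(language, database="codeql_database"):
    """Return the argument list for `codeql database create` for one language."""
    cmd = ["codeql", "database", "create", database,
           f"--language={language}", "--overwrite"]
    build = BUILD_COMMANDS.get(language)
    if build:
        cmd.append(f"--command={build}")
    return cmd


def create_database(source_dir, language):
    """Run the create command inside `source_dir`; raises if CodeQL fails."""
    return subprocess.run(build_create_command(language),
                          cwd=source_dir, check=True)
```

For example, `create_database("path/to/program", "cpp")` would run the C variant in that directory; the path is a placeholder.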
This command analyzes the CodeQL database and generates a CSV report:
codeql database analyze codeql_database --format=csv --output=codeql_results.csv
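The resulting report can be summarized with a few lines of Python. The sketch below assumes that the CSV has no header row and that the first column holds the name of the query that produced the finding; verify this against your own `codeql_results.csv` before relying on it. The function name `count_findings_by_rule` is hypothetical.

```python
import csv
from collections import Counter


def count_findings_by_rule(lines):
    """Count findings per rule name.

    `lines` is any iterable of CSV lines, for example an open file.
    Assumption: no header row, rule name in the first column.
    """
    return Counter(row[0] for row in csv.reader(lines) if row)


if __name__ == "__main__":
    with open("codeql_results.csv", newline="", encoding="utf-8") as handle:
        for rule, count in count_findings_by_rule(handle).most_common():
            print(f"{count:5d}  {rule}")
```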