Code and Documentation-Based Classification of Sources and Sinks
This repository describes our approach to automatically classifying Android sources and sinks, and contains all the artefacts needed to reproduce our study.
Download and build the tool:
git clone https://github.com/JordanSamhi/CoDoC.git
cd CoDoC
mvn clean install
Run the tool with:
java -jar CoDoC/target/CoDoC-0.0.1-SNAPSHOT-jar-with-dependencies.jar [options]
Options:
-c : Extract source code.
-d : Extract documentation.
-o : Destination folder for files.
-s : Source folder.
The resulting files are stored in the path given with the -o option, in two separate folders: source_code and documentation.
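If you drive CoDoC from a script, the options above can be assembled programmatically. The sketch below is a hypothetical helper, not part of the repository: `build_codoc_command` and `expected_output_dirs` are illustrative names, the jar path is copied from the run command above, and it assumes that -c and -d can be combined in a single run.

```python
from pathlib import Path

# Jar path copied from the run command above; adjust it to your checkout.
JAR = "CoDoC/target/CoDoC-0.0.1-SNAPSHOT-jar-with-dependencies.jar"


def build_codoc_command(source_dir, output_dir, extract_code=True, extract_doc=True):
    """Hypothetical helper: build the argument list for one CoDoC run.

    Assumes -c and -d may be passed together in the same invocation.
    """
    cmd = ["java", "-jar", JAR]
    if extract_code:
        cmd.append("-c")  # extract source code
    if extract_doc:
        cmd.append("-d")  # extract documentation
    cmd += ["-s", str(source_dir), "-o", str(output_dir)]
    return cmd


def expected_output_dirs(output_dir):
    """Hypothetical helper: the two folders written under the -o path."""
    out = Path(output_dir)
    return {"code": out / "source_code", "doc": out / "documentation"}
```

The list can be passed directly to `subprocess.run(build_codoc_command("PATH_TO_ANDROID_SOURCE", "OUTPUT_PATH"), check=True)`; keeping it as a list avoids shell quoting issues with paths that contain spaces.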
Vectorize the extracted documentation with Sentence-BERT:
python3 vectorize_documentation_using_sentence_bert.py EXTRACTED_SOURCE_FOLDER
The resulting vectors are stored in the current directory, in a file named documentation_vectors.txt.
To generate the source code vectors with code2vec, first copy the source code folder generated by CoDoC (the one you want to analyze) into the artefacts/code2vec/android_source/ folder:
cp -r OUTPUT_PATH_SOURCE artefacts/code2vec/android_source/
Build the JavaExtractor tool
cd artefacts/code2vec/JavaExtractor/JPredict/
mvn clean install
Prepare the datasets for code2vec (5% of the files for validation, 5% for testing, and the rest for training)
cd ../../android_source/
mkdir android_source_train android_source_val android_source_test
count_src=$(ls -1 | grep -v android | wc -l)
five_percent=$((count_src * 5 / 100))
ls -1 | grep -v "android" | head -$five_percent | parallel -j24 mv {} android_source_val/{}.java
ls -1 | grep -v "android" | head -$five_percent | parallel -j24 mv {} android_source_test/{}.java
ls -1 | grep -v "android" | parallel -j24 mv {} android_source_train/{}.java
cd ../
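The split performed by the shell commands above can be reasoned about with the following sketch. `split_dataset` is a hypothetical name that does not exist in the repository, and Python's `sorted()` only approximates the order of `ls -1`, which depends on the locale's collation rules.

```python
def _with_java_suffix(names):
    # `mv {} dir/{}.java` appends a .java suffix to every moved entry.
    return [n + ".java" for n in names]


def split_dataset(names, percent=5):
    """Hypothetical mirror of the shell split: returns (train, val, test)."""
    # Same filter as `grep -v android`: skips the android_source_* folders.
    candidates = sorted(n for n in names if "android" not in n)
    # Integer division, like $((count_src * 5 / 100)) in bash.
    k = len(candidates) * percent // 100
    val = candidates[:k]          # first `head` call
    test = candidates[k:2 * k]    # second `head` call, after val was moved away
    train = candidates[2 * k:]    # everything that is left
    return _with_java_suffix(train), _with_java_suffix(val), _with_java_suffix(test)
```

For example, 1,000 extracted files give 50 validation files, 50 test files and 900 training files. Because of the integer division, a folder with fewer than 20 files yields empty validation and test sets.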
Preprocess the data
sh preprocess.sh
Train the model on the dataset
sh train.sh
Generate the source code vectors
python3 code2vector_infer.py -o OUTPUT
The vectors are then available in OUTPUT/source_code_vectors.txt.
To run the whole pipeline at once, use this script:
./launch_codoc.sh -s PATH_TO_ANDROID_SOURCE -o OUTPUT_PATH
Built with:
- Maven - Dependency Management
This project is licensed under the GNU General Public License, Version 3. See the LICENSE file for details.
For any questions regarding this study, please contact Jordan Samhi.