JordanSamhi / CoDoC

CoDoC

Code and Documentation Based Classification of Sources and Sinks.

This repository contains information about our approach to automatically classifying Android sources and sinks, together with all the artefacts necessary to reproduce our study.

Getting started

Downloading the tool

git clone https://github.com/JordanSamhi/CoDoC.git

Installing the tool

cd CoDoC
mvn clean install

Using the tool

Extracting source code and documentation from Android sources:

java -jar CoDoC/target/CoDoC-0.0.1-SNAPSHOT-jar-with-dependencies.jar OPTIONS

Options:

  • -c : Extract source code.
  • -d : Extract documentation.
  • -o : Destination folder for files.
  • -s : Source folder.

The resulting files are stored in the path given with the -o option, in two separate folders: source_code and documentation.
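The extraction step can also be driven from a script. The following is a minimal Python sketch, not part of CoDoC: `build_codoc_command` and `expected_output_dirs` are hypothetical helpers that only assemble the documented -c, -d, -s and -o options and compute the two result folders. It assumes the options may be given in any order.

```python
from pathlib import Path

# Jar path taken from the command above (built by `mvn clean install`).
DEFAULT_JAR = "CoDoC/target/CoDoC-0.0.1-SNAPSHOT-jar-with-dependencies.jar"


def build_codoc_command(source_dir, output_dir, extract_code=True,
                        extract_docs=True, jar=DEFAULT_JAR):
    """Return the argv list for one CoDoC extraction run (hypothetical helper)."""
    if not (extract_code or extract_docs):
        raise ValueError("select at least one of -c / -d")
    cmd = ["java", "-jar", jar]
    if extract_code:
        cmd.append("-c")  # extract source code
    if extract_docs:
        cmd.append("-d")  # extract documentation
    # Source folder and destination folder, as documented in the options list.
    cmd += ["-s", str(source_dir), "-o", str(output_dir)]
    return cmd


def expected_output_dirs(output_dir):
    """Folders in which CoDoC stores its results, according to this README."""
    out = Path(output_dir)
    return out / "source_code", out / "documentation"
```

To actually run the extraction, the returned list can be passed to `subprocess.run(cmd, check=True)`.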

Generating documentation vectors using Sentence-BERT:

python3 vectorize_documentation_using_sentence_bert.py EXTRACTED_SOURCE_FOLDER

The resulting vectors are stored in the current directory, in a file named documentation_vectors.txt.
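The layout of documentation_vectors.txt is not described here, so the parser below is only a sketch under an assumed layout: on each line, an identifier, a tab, then whitespace-separated floats. `load_vectors` and its `sep` parameter are hypothetical and should be adapted to the file the script actually produces.

```python
from pathlib import Path


def load_vectors(path, sep="\t"):
    """Parse a vectors file into {identifier: [float, ...]}.

    ASSUMED layout (hypothetical, check against the real file): one entry
    per line, an identifier, then `sep`, then whitespace-separated floats.
    """
    vectors = {}
    with Path(path).open(encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            identifier, found, values = line.partition(sep)
            if not found:
                raise ValueError(f"line {line_no}: separator {sep!r} not found")
            vectors[identifier] = [float(v) for v in values.split()]
    # All embeddings produced by one model should have the same dimension.
    dims = {len(v) for v in vectors.values()}
    if len(dims) > 1:
        raise ValueError(f"inconsistent vector sizes: {sorted(dims)}")
    return vectors
```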

Generating source code vectors using code2vec:

Copy the source code folder generated by CoDoC that you want to analyze into the artefacts/code2vec/android_source/ folder:

cp -r OUTPUT_PATH_SOURCE artefacts/code2vec/android_source/

Build the JavaExtractor tool

cd artefacts/code2vec/JavaExtractor/JPredict/
mvn clean install

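Prepare the datasets for code2vec

The commands below move 5% of the extracted files (rounded down) into a validation set, the next 5% into a test set and the remaining files into a training set, appending the .java extension to each file name. Entries whose name contains "android" (including the three new folders) are skipped: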

cd ../../android_source/
mkdir android_source_train android_source_val android_source_test
count_src=$(ls -1|grep -v android|wc -l)
five_percent=$((count_src * 5 / 100))
ls -1 |grep -v "android"|head -$five_percent|parallel -j24 mv {} android_source_val/{}.java
ls -1 |grep -v "android"|head -$five_percent|parallel -j24 mv {} android_source_test/{}.java
ls -1 |grep -v "android"|parallel -j24 mv {} android_source_train/{}.java
cd ../
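For readers less familiar with the shell pipeline, the same split can be sketched in Python. `split_dataset` is a hypothetical helper, not part of the repository. It mirrors the commands above, except that `sorted()` uses code-point order whereas `ls` sorts according to the locale, so the exact files picked for each split may differ.

```python
import shutil
from pathlib import Path


def split_dataset(root, percent=5):
    """Move files in `root` into val/test/train folders like the shell commands.

    Names containing "android" are skipped; the first `percent`% (integer
    division) go to validation, the next `percent`% to test and the rest to
    training. ".java" is appended to every moved file name.
    """
    root = Path(root)
    splits = {name: root / f"android_source_{name}" for name in ("train", "val", "test")}
    for folder in splits.values():
        folder.mkdir(exist_ok=True)

    # Equivalent of `ls -1 | grep -v android` (code-point order, see above).
    files = sorted(p.name for p in root.iterdir() if "android" not in p.name)
    n = len(files) * percent // 100  # same arithmetic as $((count_src * 5 / 100))

    assignment = {"val": files[:n], "test": files[n:2 * n], "train": files[2 * n:]}
    for split, names in assignment.items():
        for name in names:
            shutil.move(str(root / name), str(splits[split] / f"{name}.java"))
    return {split: len(names) for split, names in assignment.items()}
```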

Preprocess the data

sh preprocess.sh

Train the model

sh train.sh

Generate the source code vectors

python3 code2vector_infer.py -o OUTPUT

The vectors will then be available in OUTPUT/source_code_vectors.txt.
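Since the classification relies on both code and documentation, the two sets of vectors eventually have to be matched per method. One simple option, sketched below, is to concatenate the two vectors of each method; this is an assumption for illustration, not necessarily how the study combines them. `combine_vectors` is hypothetical and also assumes that both files use the same method identifiers and have already been parsed into dictionaries mapping identifiers to lists of floats.

```python
def combine_vectors(code_vectors, doc_vectors):
    """Concatenate code and documentation vectors of methods present in both.

    Returns the combined {identifier: [float, ...]} mapping and the sorted list
    of identifiers that appear in only one of the two inputs.
    """
    shared = code_vectors.keys() & doc_vectors.keys()
    missing = (code_vectors.keys() | doc_vectors.keys()) - shared
    combined = {m: list(code_vectors[m]) + list(doc_vectors[m]) for m in sorted(shared)}
    return combined, sorted(missing)
```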

Run everything at once

To launch the whole pipeline at once, you can simply use this script:

./launch_codoc.sh -s PATH_TO_ANDROID_SOURCE -o OUTPUT_PATH

Built With

  • Maven - Dependency Management

License

This project is licensed under the GNU General Public License Version 3. See the LICENSE file for details.

Contact

For any questions regarding this study, please contact Jordan Samhi.
