Main Code for Code Retrieval Model
Repository Structure

/code: All code for model training and evaluation
/data: Directory for data and model checkpoints
Code Structure

codesearcher.py: The main script for the Code Retrieval Model
config.py: Configurations for the models defined in models.py. Each function defines the hyper-parameters for the corresponding model (a sketch follows this list).
data.py: Dataset loader
utils.py: Utilities for models and training
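As a rough illustration of the per-model configuration pattern, here is a minimal sketch of what one such function in config.py might look like. The function name get_config_bilstm, the dictionary layout, and data_dir are assumptions; only the hyper-parameter names mirror the command-line flags used later in this README.

```python
# Hypothetical per-model config function in config.py; the name and layout
# are illustrative. Only the hyper-parameter names are taken from the
# commands in this README.
def get_config_bilstm():
    return {
        'use_qb': 1,             # 1: use the Question Body input, 0: code only
        'code_enc': 'bilstm',    # code encoder type
        'reload': -1,            # -1: train from scratch; >0: load a saved checkpoint
        'dropout': 0.25,
        'emb_size': 200,         # word-embedding dimensionality
        'lstm_dims': 400,        # BiLSTM hidden size
        'batch_size': 256,
        'data_dir': '../data/',  # assumed location of datasets and checkpoints
    }
```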
Usage
The /data folder provides all the datasets:

- Main StaQC dataset
- CodeNN dataset
- CodeNN sanity-check dataset (sanity - pad, unk)
- StaQC dataset to check generalization to Code Annotation (anno)
- StaQC sanity-check dataset (sanity check - code empty, QB empty)
To test on a specific dataset, change the dataset settings in config.py; a sketch of such a change follows.
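The switch presumably amounts to pointing the data directory at a different subfolder of /data. The helper name get_config, the key names, and the file names below are assumptions for illustration only.

```python
# Hypothetical dataset selection in config.py; the directory layout under
# /data and every name here are assumptions, not the repository's code.
def get_config(dataset='staqc'):
    return {
        'data_dir': '../data/%s/' % dataset,  # e.g. 'staqc', 'codenn', 'anno'
        'train_path': 'train.pkl',            # assumed file names
        'eval_path': 'test.pkl',
    }

# Evaluate on CodeNN instead of StaQC by changing the argument:
conf = get_config(dataset='codenn')
```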
For the CodeNN dataset, comment out the call to eval and uncomment eval_codenn in codesearcher.py, as sketched below.
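The toggle presumably looks something like the following; the CodeSearcher class, its method signatures, and the surrounding structure are assumptions based only on the instruction above.

```python
# Hypothetical stand-in for the searcher in codesearcher.py; the class and
# method bodies are placeholders. Only the method names eval and
# eval_codenn come from this README.
class CodeSearcher:
    def eval(self, model):           # StaQC evaluation (assumed default)
        print('evaluating on StaQC')

    def eval_codenn(self, model):    # CodeNN evaluation
        print('evaluating on CodeNN')

searcher, model = CodeSearcher(), None  # model is a placeholder here
# searcher.eval(model)        # default: comment this out for CodeNN
searcher.eval_codenn(model)   # and uncomment this instead
```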
To train and test our model:
Configuration

- Edit hyper-parameters and settings in config.py
- Do not run training without changing the parameters; it will overwrite the saved best models
- The parameters provided below load the best model for evaluation (a sketch of how the flags are parsed follows this list)
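The command-line flags used below are presumably parsed along these lines. The defaults shown are the best-model values from this README; the parser itself is a sketch, not the repository's actual code.

```python
# Hypothetical argument parsing in codesearcher.py, inferred from the
# commands below; flag names match the README, everything else is assumed.
import argparse

parser = argparse.ArgumentParser(description='Code Retrieval Model')
parser.add_argument('--mode', choices=['train', 'eval'], default='train')
parser.add_argument('--use_qb', type=int, default=1)    # 1: use Question Body, 0: code only
parser.add_argument('--code_enc', type=str, default='bilstm')
parser.add_argument('--reload', type=int, default=-1)   # -1: from scratch; 1: load best checkpoint
parser.add_argument('--dropout', type=float, default=0.25)
parser.add_argument('--emb_size', type=int, default=200)
parser.add_argument('--lstm_dims', type=int, default=400)
parser.add_argument('--batch_size', type=int, default=256)
args = parser.parse_args()
```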
Model Using QB (Best Model parameters)
+ Batch sizes of both 512 and 256 produced similar results; I have placed both sets of weights in the data directory.
Train
```
python codesearcher.py --mode train --use_qb 1 --code_enc bilstm --reload -1 --dropout 0.25 --emb_size 200 --lstm_dims 400 --batch_size 256
python codesearcher.py --mode train --use_qb 1 --code_enc bilstm --reload -1 --dropout 0.25 --emb_size 200 --lstm_dims 400 --batch_size 512
```
Evaluation
```
python codesearcher.py --mode eval --use_qb 1 --code_enc bilstm --reload 1 --dropout 0.25 --emb_size 200 --lstm_dims 400 --batch_size 256
python codesearcher.py --mode eval --use_qb 1 --code_enc bilstm --reload 1 --dropout 0.25 --emb_size 200 --lstm_dims 400 --batch_size 512
```
Model without QB (Best Model parameters)
Train
```
python codesearcher.py --mode train --use_qb 0 --code_enc bilstm --reload -1 --dropout 0.35 --emb_size 200 --lstm_dims 400 --batch_size 1024
```
Evaluation
```
python codesearcher.py --mode eval --use_qb 0 --code_enc bilstm --reload 1 --dropout 0.35 --emb_size 200 --lstm_dims 400 --batch_size 1024
```