weiguoPian / MetaTPTrans

MetaTPTrans

This repository is based on the implementation of TPTrans.

Raw data

To run the experiments, first create the datasets from the raw code snippets of the CodeSearchNet (CSN) dataset. Download and unzip the raw jsonl data of CSN into the raw_data directory, laid out as follows:

├── raw_data     
│   ├── python         
│   │   ├── train    
│   │   │   ├── XXXX.jsonl...
│   │   ├── test    
│   │   ├── valid   
│   ├── ruby          
│   ├── go        
│   ├── javascript        
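
Before preprocessing, it can help to confirm that the raw data landed where the layout above expects it. The following is a minimal sketch, not part of this repository: the helper names (missing_split_dirs, iter_jsonl, count_records) are hypothetical, and it assumes that every language directory holds the same train/test/valid splits shown for python.

```python
import json
from pathlib import Path
from typing import Dict, Iterator, List

# Language names come from the directory tree above.
# Assumption: every language has the same three splits shown for python.
LANGUAGES = ["python", "ruby", "go", "javascript"]
SPLITS = ["train", "test", "valid"]


def missing_split_dirs(raw_root: Path) -> List[Path]:
    """Return every expected <language>/<split> directory that is absent."""
    return [
        raw_root / lang / split
        for lang in LANGUAGES
        for split in SPLITS
        if not (raw_root / lang / split).is_dir()
    ]


def iter_jsonl(path: Path) -> Iterator[Dict]:
    """Yield one decoded JSON object per non-empty line of a .jsonl file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_records(raw_root: Path, lang: str, split: str) -> int:
    """Count the records in all .jsonl files of one language/split pair."""
    split_dir = raw_root / lang / split
    return sum(1 for f in sorted(split_dir.glob("*.jsonl")) for _ in iter_jsonl(f))
```

A typical use would be to call missing_split_dirs(Path("raw_data")) and stop if the returned list is not empty, then spot-check a split with count_records(Path("raw_data"), "python", "train").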

For the subset used in the code completion task, please download it here and parse it.

Preprocess

AST Parser

We use Tree-Sitter to parse the source code snippets into ASTs. Please put the language parsers into the vendor folder like this:

├── vendor        
│   ├── tree-sitter-python  (from https://github.com/tree-sitter/tree-sitter-python)         
│   ├── tree-sitter-javascript  (from https://github.com/tree-sitter/tree-sitter-javascript)     
│   ├── tree-sitter-go  (from https://github.com/tree-sitter/tree-sitter-go)
│   ├── tree-sitter-ruby  (from https://github.com/tree-sitter/tree-sitter-ruby)
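
The vendor layout above can be checked, and optionally compiled, with a short script. This is a sketch under stated assumptions rather than the repository's own build step: the function names and the output path are hypothetical, and build_grammar_library assumes a py-tree-sitter release older than 0.22, where Language.build_library is still available. The preprocessing scripts may already handle compilation themselves.

```python
from pathlib import Path
from typing import List

# Grammar directory names taken from the vendor tree above.
GRAMMARS = [
    "tree-sitter-python",
    "tree-sitter-javascript",
    "tree-sitter-go",
    "tree-sitter-ruby",
]


def missing_grammars(vendor_root: Path) -> List[str]:
    """Return the names of grammar directories not found under vendor_root."""
    return [name for name in GRAMMARS if not (vendor_root / name).is_dir()]


def build_grammar_library(vendor_root: Path, output: Path) -> None:
    """Compile the vendored grammars into a single shared library.

    Assumption: py-tree-sitter < 0.22, which still provides
    Language.build_library(output_path, repo_paths). Newer releases
    removed it in favour of per-language wheels.
    """
    missing = missing_grammars(vendor_root)
    if missing:
        raise FileNotFoundError("missing grammars: " + ", ".join(missing))
    # Imported lazily so the layout check works without tree_sitter installed.
    from tree_sitter import Language

    Language.build_library(
        str(output),
        [str(vendor_root / name) for name in GRAMMARS],
    )
```

For example, build_grammar_library(Path("vendor"), Path("build/languages.so")) would fail early with the list of absent grammars instead of failing partway through compilation; the output path here is only an illustration.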

Then run the script multi_language_parse.py to preprocess the data for the code summarization task.

If applicable, run multi_language_parse_completion.py to preprocess the data for the code completion task.
