- Python 3 (a version newer than 3.5 is recommended)
- networkx (if it is not installed, run the command below)
(sudo) pip(3) install networkx
- pydot (if it is not installed, run the command below)
(sudo) pip(3) install pydot
- dg
- GED - Python programs that compute the similarity of two C files (mainly based on the GED, or graph edit distance, algorithm). In most cases you should just call the method
get_similarity(cfile1,cfile2,dg_tools_path)
in GraphSimilarity.py; the other programs are only for testing.
  - classify - the result of matching incorrect programs to the clusters of correct programs
  - correct - the clustering result of correct programs
  - incorrect - the clustering result of incorrect programs (not that important; you can ignore this directory for now)
- Work Report - my reports about my work. You can ignore this directory.
- iterate - the data set used in our research. You can find checksum.c in each subdirectory, for example
/iterate/problems/XXX/00X
To install this project, run the following commands:
git clone https://github.com/BryceTsui/UIUCSummerResearch.git
Input: C programs Output: dot files
In this step, I use dg, an LLVM-based tool, to compile the C programs and then generate the DD, CD, and CFG graphs of these programs.
Your data set should follow this layout:
-datasetName (may contain several problems)
--problems
---students
----all of the student's submissions
-----some files and one C file with the same name as the problem (for example checksum.c)
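The layout above can be walked with a few lines of Python. This is only a minimal sketch: it assumes that problems, students, and submissions are plain nested directories and that each submission holds a C file named after its problem. The function name `find_submissions` is hypothetical and is not part of this project.

```python
from pathlib import Path


def find_submissions(dataset_path):
    """Yield (problem, student, submission, c_file) for every submission
    directory that contains a C file named after its problem.

    Hypothetical helper; the directory depth is an assumption taken from
    the layout described above.
    """
    root = Path(dataset_path)
    for problem_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for student_dir in sorted(p for p in problem_dir.iterdir() if p.is_dir()):
            for submit_dir in sorted(p for p in student_dir.iterdir() if p.is_dir()):
                c_file = submit_dir / (problem_dir.name + ".c")
                # Directories without the expected C file are skipped.
                if c_file.is_file():
                    yield problem_dir.name, student_dir.name, submit_dir.name, c_file
```

Calling `list(find_submissions(datasetPath))` before running the pipeline is a quick way to check that a new data set matches the expected layout.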
Input: dot files Output: Graphs(CD,DD,CFG)
In this step, the dot files are the input and the output is the CD, DD, and CFG graphs.
To implement this step, I use networkx and pydot.
Input: graphs Output: similarity(0.0 - 1.0)
In this step, I use the graph edit distance to measure the similarity between two graphs.
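To make the idea concrete, here is a small self-contained sketch of a graph edit distance turned into a 0.0 - 1.0 score. It is not the code in GraphSimilarity.py: the cost model (1 per inserted or deleted node or edge, free node matching), the normalization, and the function names are all assumptions made for illustration. It tries every node mapping, so it is only usable on very small graphs; for real graphs, networkx offers `graph_edit_distance`.

```python
from itertools import permutations


def graph_edit_distance(nodes1, edges1, nodes2, edges2):
    """Exact edit distance between two small unlabeled directed graphs.

    Assumed cost model: inserting or deleting a node or an edge costs 1,
    and matching two nodes is free. Runs in factorial time.
    """
    # Make graph 1 the smaller one; the distance is symmetric.
    if len(nodes1) > len(nodes2):
        nodes1, edges1, nodes2, edges2 = nodes2, edges2, nodes1, edges1
    nodes1, nodes2 = list(nodes1), list(nodes2)
    edges1, edges2 = set(edges1), set(edges2)
    node_cost = len(nodes2) - len(nodes1)  # unmatched nodes must be inserted
    best = None
    # Try every injective mapping of graph 1's nodes into graph 2's nodes.
    for image in permutations(nodes2, len(nodes1)):
        mapping = dict(zip(nodes1, image))
        mapped = {(mapping[u], mapping[v]) for u, v in edges1}
        # Edges present on only one side must be deleted or inserted.
        cost = node_cost + len(mapped ^ edges2)
        if best is None or cost < best:
            best = cost
    return best


def similarity(nodes1, edges1, nodes2, edges2):
    """Map the edit distance to [0.0, 1.0]; 1.0 means isomorphic graphs."""
    # Worst case: delete everything in one graph, insert everything in the other.
    worst = len(nodes1) + len(set(edges1)) + len(nodes2) + len(set(edges2))
    if worst == 0:
        return 1.0  # two empty graphs are identical
    return 1.0 - graph_edit_distance(nodes1, edges1, nodes2, edges2) / worst


# A 3-node chain against a 2-node chain: one node and one edge must be removed.
print(similarity([0, 1, 2], [(0, 1), (1, 2)], [0, 1], [(0, 1)]))  # 0.75
```

Dividing by the cost of deleting one graph entirely and rebuilding the other keeps the score inside the 0.0 - 1.0 range, because no edit script needs to cost more than that.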
I combined these three steps into one program so that you can run them easily. To run them, execute the main function in GED.py after changing the 'datasetPath' parameter to the path of your data set.
if __name__ == '__main__':
    datasetPath = "G:\\UIUC\\IntroClass_Dot\\IntroClass"
    ......
Input: test cases Output: Incorrect programs and correct programs
To judge whether a program is correct or incorrect, we have to run it against test cases. To help you test all programs easily and automatically, I wrote a program that does this.
You only have to run the main function in TestResult.py after changing
'names' to the names of the problems (>=1) and 'path' to the path of the data set
if __name__ == '__main__':
    names = ['syllables']
    path = 'G:\\UIUC\\IntroClass_Dot\\IntroClass'
    for name in names:
        getAllResult(path, path+'\\'+name+'\\tests\\whitebox\\',
                     path+'\\'+name+'\\tests\\blackbox\\',name)
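For readers who want to see what judging a program with test cases can look like, here is a minimal sketch. It does not show how getAllResult works, and the format of the files in the whitebox and blackbox directories is not described here, so the sketch assumes the caller has already loaded each test as an (input text, expected output) pair. The name `passes_all` is hypothetical.

```python
import subprocess


def passes_all(command, cases, timeout=5):
    """Return True if `command` prints the expected output for every case.

    `command` is an argument list such as ["./checksum"]; `cases` is a list
    of (stdin_text, expected_stdout) pairs. Hypothetical helper.
    """
    for stdin_text, expected in cases:
        try:
            result = subprocess.run(command, input=stdin_text,
                                    stdout=subprocess.PIPE,
                                    stderr=subprocess.PIPE,
                                    universal_newlines=True,
                                    timeout=timeout)
        except subprocess.TimeoutExpired:
            return False  # treat a hang as a failed test
        # Compare outputs, ignoring leading and trailing whitespace only.
        if result.stdout.strip() != expected.strip():
            return False
    return True
```

A program would count as correct only if `passes_all` returns True for its compiled binary. Whether the real tests also check the exit code or allow other whitespace differences is left open here.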
Input: Correct programs, Incorrect programs, Similarity Matrix Output: All Clusters
In this step, I use the Affinity Propagation clustering algorithm.
You can run the main function in CLuster.py to finish this step.
if __name__ == '__main__':
    names = ['checksum']
    types = ['CFG','CD','DD']
    for name in names:
        for type in types:
            #classifyIncorrectSamples(name,type,"G:\\UIUC\\IntroClass_Dot\\IntroClass")
            correctClustering(name, type, "G:\\UIUC\\IntroClass_Dot\\IntroClass")
If you only want to calculate graph similarity, you can call get_similarity in /GED/GraphSimilarity.py after changing the paths to your own files. Note that this project only targets the data set in iterate. If you want to use a new data set, you will need to modify the programs carefully.
get_similarity('G:\\UIUC\\GED\\testData\\checksum1.c','G:\\UIUC\\GED\\testData\\checksum2.c','G:\\UIUC\\DDCD\\dg\\build\\tools')