hvdthong / Concept_Graph

Construct a Concept Graph for a Course Dataset


The code in this project generates a concept graph for a course dataset (e.g., Udemy). There are two types of concept graph: the cover network and the order network.

Cover Network

  • A cover network consists of concept nodes and cover edges.

  • A cover edge if and for some course and section .

Order Network

  • An order network consists of concept nodes and order links.

  • An order link if and for some course , section , and section such that .

Details about the order and cover network can be found here.
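
As a rough illustration of the two edge types, the sketch below counts candidate edges for one toy course. Both the data layout (a dict with `title_concepts` and an ordered list of per-section concept lists) and the edge rules are assumptions inferred from the inputs each network uses in Step 2, not the project's actual definitions: a cover edge is taken to link a concept in the course title to a concept in one of its sections, and an order link is taken to link a concept in an earlier section to a concept in a later section of the same course.

```python
from collections import Counter
from itertools import combinations


def cover_edges(course):
    """Count (title concept -> section concept) pairs for one course (assumed rule)."""
    edges = Counter()
    for section in course["sections"]:
        for a in course["title_concepts"]:
            for b in section:
                if a != b:
                    edges[(a, b)] += 1
    return edges


def order_edges(course):
    """Count (earlier-section concept -> later-section concept) pairs (assumed rule)."""
    edges = Counter()
    # combinations() keeps list order, so `earlier` always precedes `later`.
    for earlier, later in combinations(course["sections"], 2):
        for a in earlier:
            for b in later:
                if a != b:
                    edges[(a, b)] += 1
    return edges


# Hypothetical course record, for illustration only.
course = {
    "title_concepts": ["python"],
    "sections": [["variables", "loops"], ["functions"]],
}
cover = cover_edges(course)
order = order_edges(course)
```

Summing these counters over all courses would give raw edge weights, which a threshold such as the `-threshold` option could then filter.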

Implementation Environment

Please install the necessary libraries before running our tool:

  • python
  • tqdm


Below is the list of main files used to generate the concept graph (cover or order network):

  • main_extraction.py: extracts the concept information from the title, sections, and lectures of each course
  • main_graph.py: generates the net cover or order network
  • main_root_graph.py: generates the net cover or order graph rooted at a given concept
  • main_rwr.py: applies random walk with restart (rwr) to the cover or order network
  • main_data.py: converts the cover or order network into JSON format for visualization



Parameters of main_extraction.py:

  • -concept: Path to the list of concepts
  • -course: Path to the list of courses
  • -p: Number of threads used to speed up data preprocessing (Default: 2)


Parameters of main_graph.py:

  • -title: Path to the file containing the matching information for course titles
  • -section: Path to the file containing the matching information for course sections
  • -lecture: Path to the file containing the matching information for course lectures
  • -option: Which graph to generate, cover or order (Default: cover)
  • -threshold: Threshold used to construct the net graph for the cover or order graph (Default: 0.1)


Parameters of main_root_graph.py:

  • -concept: Name of the root concept
  • -graph_edge: Path to the list of edges in the graph


Parameters of main_rwr.py:

  • -graph_edge: Path to the list of edges in the graph
  • -c: Restart probability (rwr) or jumping probability (otherwise) (Default: 0.15)
  • -epsilon: Error tolerance for power iteration (Default: 1e-9)
  • -max_iters: Maximum number of iterations for power iteration (Default: 100)
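
To make the roles of `-c`, `-epsilon`, and `-max_iters` concrete, here is a minimal pure-Python sketch of random walk with restart solved by power iteration. It is not taken from main_rwr.py: the function name, the unweighted `(source, target)` edge format, the single seed node, and the handling of nodes without outgoing edges are all assumptions made for illustration.

```python
def random_walk_with_restart(edges, seed, c=0.15, epsilon=1e-9, max_iters=100):
    """Return RWR scores for every node, restarting at `seed` with probability c."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)

    restart = {n: 1.0 if n == seed else 0.0 for n in nodes}
    scores = dict(restart)
    for _ in range(max_iters):
        # Every step, a fraction c of the probability mass jumps back to the seed.
        new = {n: c * restart[n] for n in nodes}
        for src in nodes:
            if out[src]:
                share = (1 - c) * scores[src] / len(out[src])
                for dst in out[src]:
                    new[dst] += share
            else:
                # Assumption: a node with no outgoing edge sends its mass to the seed.
                new[seed] += (1 - c) * scores[src]
        diff = sum(abs(new[n] - scores[n]) for n in nodes)
        scores = new
        if diff < epsilon:  # stop once the L1 change falls below the tolerance
            break
    return scores


# Hypothetical three-node cycle, for illustration only.
edges = [("a", "b"), ("b", "c"), ("c", "a")]
scores = random_walk_with_restart(edges, seed="a")
```

A larger `c` keeps more probability mass near the seed, while `epsilon` and `max_iters` only control when the iteration stops.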


Parameters of main_data.py:

  • -graph_edge: Path to the list of edges in the graph


Step 1:

  • To extract the concept information from the dataset, please follow this command:

    $ python main_extraction.py -concept [path of the concept dictionary] -course [path of the course data] -p 5 

After running this command, we will see three files, beginning with 'matching_title...', 'matching_sections...', 'matching_each_section...', in the main folder.
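
The matching step can be pictured with the short sketch below. It is only an assumed approach (case-insensitive whole-phrase matching with regular expressions); the function name and the sample data are hypothetical, and main_extraction.py may match concepts differently.

```python
import re


def match_concepts(text, concepts):
    """Return the concepts that occur in `text` as whole words or phrases."""
    found = []
    lowered = text.lower()
    for concept in concepts:
        # \b keeps "sql" from matching inside a longer word such as "nosql".
        pattern = r"\b" + re.escape(concept.lower()) + r"\b"
        if re.search(pattern, lowered):
            found.append(concept)
    return found


# Hypothetical concept list and course title, for illustration only.
concepts = ["machine learning", "python", "sql"]
title = "Machine Learning with Python: A Beginner's Guide"
matched = match_concepts(title, concepts)
```

Applying the same function to section and lecture titles would yield the per-section and per-lecture matches.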

Step 2:

  • To generate the cover network, please follow this command:

    $ python main_graph.py -title [path of matching information of course title extracted by the main_extraction.py] -section [path of matching information of course section extracted by the main_extraction.py] -option cover

Note that we use the two files, named 'matching_title...' and 'matching_sections...', to construct the cover network.

  • A similar command (with -option order) generates the order network; however, its inputs are the two files named 'matching_sections...' and 'matching_each_section...'.

Step 3:

  • To generate the JSON data for graph visualization, please follow this command:

     $ python main_data.py -graph_edge [path of the list of edges in the graph generated by the main_graph.py]

After running this command, we will see a pickle file in the main folder with the same name as the file from Step 2.
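
For readers who want a feel for the visualization format, the sketch below turns an edge list into a node-link JSON string. The `(source, target, weight)` tuple format, the `nodes`/`links` layout, and the function name are assumptions for illustration; the actual structure written by main_data.py may differ.

```python
import json


def edges_to_json(edges):
    """Convert (source, target, weight) tuples into a node-link JSON string."""
    nodes = sorted({n for src, dst, _ in edges for n in (src, dst)})
    graph = {
        "nodes": [{"id": n} for n in nodes],
        "links": [{"source": s, "target": t, "weight": w} for s, t, w in edges],
    }
    return json.dumps(graph, indent=2)


# Hypothetical edges, for illustration only.
edges = [("python", "functions", 0.4), ("python", "loops", 0.25)]
payload = edges_to_json(edges)
```

A node-link layout of this kind is what many graph visualization libraries accept as input.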


  • To generate the tree concept graph, please follow this command:

    $ python main_root_graph.py -concept [name of the concept] -graph_edge [path of the list of edges in the graph generated by the main_graph.py]

  • To apply the random walk with restart, please follow this command:

    $ python main_rwr.py -graph_edge [path of the edge data constructed by the main_graph.py]
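
For the tree concept graph built from a root concept, one plausible reading is a breadth-first traversal that keeps the first edge reaching each concept. The sketch below follows that assumption; the function name, the unweighted `(source, target)` edge format, and the traversal order are hypothetical and not taken from main_root_graph.py.

```python
from collections import defaultdict, deque


def tree_from_root(edges, root):
    """Return the tree edges reached by a breadth-first search from `root`."""
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)

    tree, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:  # keep only the first edge that reaches a concept
                seen.add(child)
                tree.append((node, child))
                queue.append(child)
    return tree


# Hypothetical edge list, for illustration only.
edges = [
    ("python", "functions"),
    ("python", "loops"),
    ("loops", "functions"),
    ("sql", "joins"),
]
tree = tree_from_root(edges, "python")
```

Concepts that cannot be reached from the root, such as "sql" and "joins" here, are left out of the resulting tree.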


