- First we present an example of the methods used to extract keywords (see Graph of words and keywords extraction.ipynb and K-truss_code_example.ipynb)
- Then we give a code to compute the k_core and obtain the graphs of directories of files or all files in directories containing sub-directories (see K_core_corpus.py)
- We also give an implementation of the K-truss algorithm (see K-truss_code.py)
- We make a time analysis to see the evolution of some words through time, in order to detect events related to them.
- Networkx to create and vizualize graphs
- Spacy to preprocess the text
- The k-core is directly taken from Networkx library.
- The k-truss is implemented following https://arxiv.org/abs/1205.6693
- The density and inflexion methods are implemented following https://www.aclweb.org/anthology/D16-1191
This notebook is dedicated to people who want to extract keywords from text document or corpus documents using a graph approach.
The goal of this notebook is to extract keywords from a text file using four different approachs :
- Best Coverage keywords extraction - http://www2013.w3c.br/proceedings/p715.pdf
- Div Rank keywords extraction - http://clair.si.umich.edu/~radev/papers/SIGKDD2010.pdf
- K-core Number - Python Library Networkx
- K-Truss - https://arxiv.org/abs/1205.6693
Through a french summary of Games of Thrones, we bring an example of the outputs of the four different approaches.
This jupyter notebook is an example of the following script
Two functions are implemented.
- The first one compute the K-truss of each node in G, the maximum non empty subgraph, the k from the maximum non empty subgraph and the necessary informations to compute the density and inflexion method.
- The second one gives the k-truss subgraph of the graph, where k is given as an input