The idea of our group's project is to work on related Wikipedia pages: given a Wikipedia page as input, the program outputs the names of related pages. It goes through the text of the input page and collects all Wikipedia pages mentioned there, then goes through each of those pages and collects the pages they mention, and so on until a specified number of pages has been visited. In the process, some pages will be mentioned many times, which suggests they are related to the input page. This can be seen as a network whose nodes are Wikipedia pages, with an edge from node A to node B if page B is mentioned in page A. The goal is then to retrieve the most-linked nodes reachable from the input node.
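The crawl described above can be sketched as a breadth-first traversal. Here is a minimal sketch in which the link-extraction step is mocked with a toy dictionary (TOY_WIKI and get_links are hypothetical stand-ins; the real program parses the actual Wikipedia page text):

```python
from collections import Counter, deque

# Hypothetical stand-in for fetching the links of a Wikipedia page;
# the real project extracts them from the page text instead.
TOY_WIKI = {
    "Graph theory": ["Vertex", "Edge", "Leonhard Euler"],
    "Vertex": ["Graph theory", "Edge"],
    "Edge": ["Graph theory", "Vertex"],
    "Leonhard Euler": ["Graph theory"],
}

def get_links(title):
    return TOY_WIKI.get(title, [])

def crawl(start, max_pages):
    """Breadth-first crawl: count how often each page is mentioned."""
    mentions = Counter()
    visited = set()
    queue = deque([start])
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        if page in visited:
            continue
        visited.add(page)
        for link in get_links(page):
            mentions[link] += 1
            if link not in visited:
                queue.append(link)
    # The most-mentioned pages (other than the input) are the suggestions.
    return [p for p, _ in mentions.most_common() if p != start]

print(crawl("Graph theory", 4))
```

Pages that many visited pages point to rise to the top of the mention count, which is exactly the "most linked" criterion described above.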
Here is a short description of the steps you will have to go through:
First, make sure that all required packages are installed.
To install ForceAtlas2:
pip install fa2
To install NetworkX:
pip install networkx
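As a quick sanity check, you can verify that both packages are available before running anything; this snippet uses only the standard library (the is_installed helper is just an illustration, not part of the project):

```python
import importlib.util

def is_installed(package):
    """Return True if `package` can be imported in this environment."""
    return importlib.util.find_spec(package) is not None

for pkg in ("fa2", "networkx"):
    status = "ok" if is_installed(pkg) else "MISSING - run: pip install " + pkg
    print(pkg, ":", status)
```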
Now you are ready to use our program.
The first parameter is the title of the Wikipedia page you want suggestions for. The second one is the depth limit (we recommend not exceeding 3, otherwise the program will take a very long time to run and the suggested pages might be off topic).
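To see why the depth limit matters: if every page links to roughly b other pages, a crawl of depth d can touch on the order of b**d pages. The value b = 50 below is a made-up average, purely for illustration:

```python
# Rough illustration of why depth matters: with an average branching
# factor b, a depth-d crawl touches on the order of b**d pages.
b = 50  # hypothetical average number of links per page
for d in range(1, 5):
    print(f"depth {d}: up to ~{b**d:,} pages")
```

At depth 3 this is already millions of candidate pages, which is why higher depths both slow the crawl down and dilute the relevance of the suggestions.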
By default, two graphs are already provided as .gml files, which saves you the time of building them. If you want to use one of them, go to the graphAnalyzer.py file and specify which file you want to analyze on line 127.
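If you want to inspect a .gml file yourself, NetworkX can read and write the format directly. A small round-trip sketch (the node names and the example.gml filename are made up here, not the provided graph files):

```python
import networkx as nx

# Build a tiny directed graph, save it as .gml, and load it back --
# the same pattern the project uses for its pre-built graph files.
g = nx.DiGraph()
g.add_edge("Graph theory", "Vertex")
g.add_edge("Graph theory", "Edge")
nx.write_gml(g, "example.gml")

loaded = nx.read_gml("example.gml")
print(loaded.number_of_nodes(), loaded.number_of_edges())
```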
Warning: please keep in mind that the DBSCAN algorithm does not produce very relevant output here and may take up to 10 minutes to run. We therefore recommend using the K-Means algorithm instead (that is, pass False as input when running graphAnalyzer.py).
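For intuition about what the K-Means step does, here is a minimal pure-Python K-Means sketch on 2-D points. The project clusters node coordinates produced by the ForceAtlas2 layout in a similar spirit; this toy version and its data are illustrative only:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-Means: alternate assigning points to the nearest
    center and moving each center to the mean of its cluster."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x, y in points:
            i = min(range(k),
                    key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2)
            clusters[i].append((x, y))
        centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centers[i]  # keep an empty cluster's old center
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Two well-separated blobs -> K-Means with k=2 should split them evenly.
pts = [(0, 0), (0.1, 0.2), (0.2, 0.1), (5, 5), (5.1, 5.2), (5.2, 4.9)]
centers, clusters = kmeans(pts, 2)
print(sorted(len(c) for c in clusters))
```

Unlike DBSCAN, K-Means always produces exactly k clusters and runs in a few fast passes over the data, which matches the speed difference noted above.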