EdwinPuertas / KnowledgeExtraction

Knowledge Extraction

The knowledge extraction component queries the Wikipedia articles on a particular topic and extracts their summaries, texts, links, categories, and keywords, among other fields, in order to find the most frequent words across the set of articles. Extraction is done through the public Wikipedia API, and entities are extracted as well. The precision and exhaustiveness of the extracted texts are also calculated using Jaccard similarity and a second similarity measure. After this process, the most frequent words are identified, exported to a CSV file, and sent to a web interface to be validated by domain experts.
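
As a rough illustration of the last two steps, the sketch below computes a Jaccard similarity between two texts and exports the most frequent words as CSV. It is not taken from this repository: the function names (`tokenize`, `jaccard_similarity`, `most_frequent_words`, `write_frequencies_csv`), the simple regex tokenizer, and the sample texts are all assumptions made for the example, and the component itself may tokenize, filter stop words, and compare texts differently.

```python
import csv
import io
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def jaccard_similarity(text_a, text_b):
    """Jaccard similarity of the two texts' token sets: |A & B| / |A | B|."""
    set_a, set_b = set(tokenize(text_a)), set(tokenize(text_b))
    if not set_a and not set_b:
        return 0.0  # convention chosen here for two empty texts
    return len(set_a & set_b) / len(set_a | set_b)


def most_frequent_words(texts, top_n=10):
    """Count tokens across all texts and return the top_n (word, count) pairs."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common(top_n)


def write_frequencies_csv(rows, stream):
    """Write (word, count) rows to a file-like object as CSV with a header row."""
    writer = csv.writer(stream)
    writer.writerow(["word", "count"])
    writer.writerows(rows)


# Made-up stand-ins for texts extracted from Wikipedia articles.
articles = [
    "Machine learning is a field of artificial intelligence.",
    "Deep learning is a subset of machine learning.",
]

top_words = most_frequent_words(articles, top_n=3)
similarity = jaccard_similarity(articles[0], articles[1])

# An in-memory buffer keeps the example self-contained; a real run would
# pass a file opened with open(path, "w", newline="").
buffer = io.StringIO()
write_frequencies_csv(top_words, buffer)
print(buffer.getvalue())
print(f"Jaccard similarity: {similarity:.2f}")
```

Comparing token sets rather than raw strings keeps the measure insensitive to word order and repetition, which suits a quick overlap check between an extracted text and a reference text.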

Languages

Language:Python 100.0%