A Search Engine written in Python3.
The features implemented in this ranked retrieval model are listed below. They can be used to run queries against the English and Arabic corpora.
- Searching top 10 articles based on a given query.
- Computing the cosine similarity between two articles present in the corpus.
- Implementation of wildcard queries.
- Term Auto-completion for search suggestions using tries.
- English Corpus - the source from which the English articles were downloaded and processed.
- Arabic Corpus - the source from which the Arabic articles were downloaded and processed.
- python-nltk - NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to corpora and lexical resources.
- Install NLTK:
$ sudo pip install -U nltk
- Install Numpy:
$ sudo pip install -U numpy
- Test installation:
$ python3
>>> import nltk
- python-flask - framework used to create an interactive frontend application in Python (see "Installing Flask in Python").
- WikiExtractor - Python script that extracts and cleans text from enwiki dumps.
- arwiki_parser - Python script that extracts and cleans text from arwiki dumps.
- The search engine can be used from any browser that supports JavaScript, CSS and jQuery, provided the server is set up.
- To set up the server, Python is required with both NLTK and Flask installed.
- Creating query logs and applying machine learning techniques to those logs to make query searching over the corpus more efficient.
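
The ranking, cosine-similarity and auto-completion features listed above can be illustrated with a minimal sketch. It is not the code used in this project: the names `rank`, `cosine` and `Trie` are hypothetical, the log-scaled TF-IDF weighting is an assumption, and only the standard library is used so that the example runs on its own. Documents are assumed to be already tokenized; since Python strings are Unicode, the same code would apply to English and Arabic tokens.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two sparse vectors (dict: term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query_tokens, docs, k=10):
    """Return up to k (doc_index, score) pairs, best match first."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}

    def vectorize(tokens):
        # Log-scaled term frequency times inverse document frequency (assumed weighting).
        counts = Counter(tokens)
        return {t: (1 + math.log(c)) * idf[t] for t, c in counts.items() if t in idf}

    q = vectorize(query_tokens)
    scored = [(i, cosine(q, vectorize(doc))) for i, doc in enumerate(docs)]
    # Stable sort: documents with equal scores keep their original order.
    return sorted(scored, key=lambda pair: -pair[1])[:k]


class Trie:
    """Character trie used to suggest completions for a typed prefix."""

    def __init__(self):
        self.children = {}
        self.is_word = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def complete(self, prefix):
        """Return all inserted words that start with prefix, sorted."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results, stack = [], [(node, prefix)]
        while stack:
            current, word = stack.pop()
            if current.is_word:
                results.append(word)
            for ch, child in current.children.items():
                stack.append((child, word + ch))
        return sorted(results)
```

For example, `rank(query_tokens, docs)` with the default `k=10` corresponds to the "top 10 articles" feature, and `Trie.complete(prefix)` returns candidate terms for search suggestions.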