iwasingh / Wikoogle

Wikoogle is a Wikipedia search engine. Snippet generation, query expansion, PageRank, HITS, Flask, Whoosh, and so on


Wikoogle

The Wikipedia search engine!
Demo (the service might be unavailable)

Wikoogle is a Wikipedia information retrieval system; in other words, a Wikipedia search engine.

Installation

Alongside the other dependencies used to build the project (explained below), you will need:

  • Wikipedia dumps. Pick a few dumps, and only those with pages-articles-multistream in their name. Download them and put them in the dumps directory at the root of the project.
  • Enough free RAM, which depends on the number of dumps you want to index: you can easily run out of memory during the indexing or running phase. For example, with 3 or more dumps (> 3 GB decompressed) you will need at least 4 GB of free RAM, while for a single small dump (600 MB to 1.3 GB decompressed) 2 GB of free RAM should be enough.
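The two requirements above can be checked with a small script before indexing. This is a minimal sketch and is not part of Wikoogle. The helper names are hypothetical, and the script assumes the dumps directory sits at the project root as described above. It also assumes that the companion index files should be skipped; their names also contain pages-articles-multistream. The RAM figures only restate the examples given above, and the function returns None for sizes the README gives no figure for.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def is_article_dump(filename: str) -> bool:
    # Keep "pages-articles-multistream" files but skip the companion index
    # files, which share the prefix (assumption: only XML parts are indexed).
    return "pages-articles-multistream" in filename and "-index" not in filename


def select_dumps(filenames: Iterable[str]) -> List[str]:
    """Return the dump files worth indexing, in a stable order."""
    return sorted(name for name in filenames if is_article_dump(name))


def find_dumps(dumps_dir: Path = Path("dumps")) -> List[str]:
    """List candidate dumps in the dumps directory at the project root."""
    if not dumps_dir.is_dir():
        return []
    return select_dumps(p.name for p in dumps_dir.iterdir() if p.is_file())


def suggested_free_ram_gb(decompressed_gb: float) -> Optional[int]:
    """Map the total decompressed dump size to the README's RAM examples.

    Returns None for sizes the README gives no figure for.
    """
    if decompressed_gb > 3:
        return 4
    if decompressed_gb <= 1.3:
        return 2
    return None


if __name__ == "__main__":
    print(find_dumps())
```

Run it from the root of the project. An empty list means no suitable dump was found in the dumps directory.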

Windows/Unix

Requirements

  • python (>= 3.7)
  • pipenv

Check the requirements:

python --version
pipenv --version

then follow these steps from the root of the project:

Installing dependencies

pipenv install
pipenv run python -m nltk.downloader 'popular'

Run

  1. Specify the entrypoint
    export FLASK_APP=main.py
    or, in Windows PowerShell (Windows key + X -> Windows PowerShell):
    $env:FLASK_APP = "main.py"
  2. Run
    cd src
    pipenv run python -m flask run --host 0.0.0.0 --port PORT --no-reload
    
    replacing PORT with the port you want to use (e.g. 8888)

Docker

As an alternative to the installation above, you can install and run the project within a Linux container. Make sure Docker is installed: https://docs.docker.com/get-docker/

  1. Build the image, tagged information_retrieval (you can change the tag)

    docker image build -t information_retrieval -f Dockerfile.dev .

    The image is based on the python:latest image. If the build fails because the base image is missing, download it with docker pull python and retry.

  2. Create the container with the name ir_container (you can change it)

    docker container create -p 8888:8888 -v ${PWD}:/app -it --name ir_container information_retrieval

    You can change the port mapping and the name of the container. 8888 is the only port exposed by the image, so change only the host port (the left side of -p) and leave the container port at 8888. Be sure to give ENOUGH RAM to the container (see the requirements at the beginning of the Installation section), otherwise the next step might fail.

  3. Run

    docker start -ia ir_container # or the name you specified before
    cd src
    export FLASK_APP=main.py
    python -m flask run --host 0.0.0.0 --port 8888 --no-reload
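The options of step 2 can also be assembled programmatically. This makes the port-mapping constraint explicit: only the host side varies, while the container side stays at 8888. The helper below is hypothetical and is not part of the project. It uses Docker's --memory option of docker container create as one way to reserve RAM for the container. The "4g" limit in the usage line is only an example, not a value the project prescribes.

```python
import os
import shlex
from typing import List, Optional

CONTAINER_PORT = 8888  # the only port exposed by the image: never change this side


def docker_create_args(host_port: int = 8888,
                       name: str = "ir_container",
                       image: str = "information_retrieval",
                       source_dir: Optional[str] = None,
                       memory: Optional[str] = None) -> List[str]:
    """Build the argv of the `docker container create` command from step 2."""
    # Bind mounts need an absolute host path, as ${PWD} provides in the README.
    source = os.path.abspath(source_dir or os.getcwd())
    args = ["docker", "container", "create",
            "-p", f"{host_port}:{CONTAINER_PORT}",  # host:container
            "-v", f"{source}:/app",
            "-it", "--name", name]
    if memory:
        # Docker's memory limit for the container, e.g. "4g"
        args += ["--memory", memory]
    args.append(image)  # the image name must come after all options
    return args


if __name__ == "__main__":
    print(" ".join(shlex.quote(a) for a in docker_create_args(host_port=9000, memory="4g")))
```

The printed command is equivalent to the one in step 2. It maps host port 9000 instead of 8888 and adds the memory limit.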

Usage

The usage is straightforward: you can try the online demo at http://212.237.42.43:8080/, or, after running the application on your computer, point your browser to localhost:PORT, where PORT is the port you specified in the previous steps.
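You might want to check from a script that the application answers at localhost:PORT, for example before running an evaluation. A minimal sketch using only the Python standard library could look like this. The function name is hypothetical, and any HTTP status below 400 on the root page is treated as "up".

```python
import urllib.request


def is_wikoogle_up(port: int = 8888, host: str = "localhost", timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP on http://host:port/."""
    url = f"http://{host}:{port}/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError, HTTPError (4xx/5xx), refused connections and timeouts
        # all derive from OSError.
        return False


if __name__ == "__main__":
    print(is_wikoogle_up(8888))
```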

Wikoogle resembles Google (at least, we try): the query language is almost the same, and you can configure the search parameters of the models (e.g. PageRank, query expansion) from a user-friendly menu.
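PageRank is one of the configurable models mentioned above. For readers unfamiliar with it, here is a generic textbook PageRank computed by power iteration over a toy link graph. It is not Wikoogle's implementation. The graph format, the damping factor of 0.85 (the conventional default) and the convergence settings are assumptions made for illustration.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> Dict[str, float]:
    """Textbook PageRank by power iteration.

    `links` maps each page to the pages it links to. Pages without outgoing
    links (dangling nodes) spread their rank uniformly over all pages.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(max_iter):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Random jump plus the redistributed mass of dangling pages
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:  # L1 change small enough: converged
            break
    return rank


if __name__ == "__main__":
    print(pagerank({"A": ["B"], "B": ["A"], "C": ["A"]}))
```

In the usage example, page A ends up with the highest score because both B and C link to it. The scores always sum to 1.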

Browser support

All major modern browsers are supported:

  • Chrome (>=57)
  • Edge (>=16)
  • Firefox (>=52)

Docs

See the docs, where you will find evaluation and performance measures as well as other aspects of the project, such as challenges, architecture and models.

License

MIT
