BlueSkyLT / Info_retrieval

CI6226 Information Retrieval Assignment

Information Retrieval

NTU CI6226 Information Retrieval Assignment

Group members

Cheng Hao, Guo Lanqing, Lan Tian, Li Ruibo, Yang Ze

Introduction

Information Search is an information retrieval system built with Django for the web application and Bootstrap for the front end. The system includes two corpora: 1) our Novels dataset and 2) the HillaryEmails dataset; and three different search methods: 1)

Program Structure

├── InforRetrieval
│   ├── __init__.py
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
├── manage.py                         --entry point
├── search_web                        --a Django app
│   ├── Info_retrieval                --search algorithms
│   │   ├── components.py
│   │   └── main.py
│   ├── spider                        --spider for the novel website
│   │   ├── Conversion_encoding_to_utf_8.py
│   │   └── Renumber.py
│   ├── __init__.py
│   ├── admin.py
│   ├── apps.py
│   ├── migrations
│   │   └── __init__.py
│   ├── models.py
│   ├── tests.py
│   └── views.py                      --view functions that handle data requests
├── static                            --static resources
│   ├── css
│   ├── img
│   └── js
│       ├── bootstrap
│       ├── font-awesome
│       ├── jquery
│       └── simple-line-icons
└── templates                         --HTML templates
    ├── content.html
    └── index.html
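The tree lists spider/Conversion_encoding_to_utf_8.py without describing it. Below is a minimal sketch, assuming the script's job is to re-encode scraped novel text as UTF-8. The function names and the list of candidate encodings are hypothetical, not taken from the repository. Trying encodings in order and keeping the first that succeeds is only a heuristic: a permissive codec such as GB18030 can decode bytes that were really written in another encoding, so the order of the candidates matters.

```python
"""Re-encode a scraped text file as UTF-8 (hypothetical sketch, not the repo's script)."""
from pathlib import Path

# Assumed candidates, tried in order; adjust to the charset the source site really uses.
CANDIDATE_ENCODINGS = ("utf-8", "gb18030", "big5")


def decode_with_fallback(raw, encodings=CANDIDATE_ENCODINGS):
    """Return (text, encoding) for the first encoding that decodes raw without errors."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {encodings} could decode the input")


def convert_to_utf8(src, dst=None, encodings=CANDIDATE_ENCODINGS):
    """Rewrite src in place (or write to dst) as UTF-8; return the detected encoding."""
    src = Path(src)
    text, encoding = decode_with_fallback(src.read_bytes(), encodings)
    # write_bytes avoids newline translation, so line endings are preserved as-is.
    Path(dst if dst is not None else src).write_bytes(text.encode("utf-8"))
    return encoding
```

For example, convert_to_utf8("chapter_001.txt") would rewrite a (hypothetical) GB18030-encoded chapter file in place and return "gb18030".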

Installation

  • Python 3.7
  • nltk
  • tqdm
  • Django
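Before starting the server, a quick check that the requirements above are met can save a failed launch. The helper below is hypothetical and not part of the repository; it assumes Python 3.7 is a minimum version rather than an exact pin, and it only checks that each package can be found, not its version.

```python
"""Check the interpreter and packages listed under Installation (hypothetical helper)."""
import importlib.util
import sys

# Top-level import names of the listed packages (Django imports as "django").
REQUIRED_MODULES = ("nltk", "tqdm", "django")
MIN_PYTHON = (3, 7)  # assumption: 3.7 is treated as a minimum, not an exact pin


def missing_modules(names):
    """Return the names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def environment_problems(required=REQUIRED_MODULES, min_python=MIN_PYTHON):
    """Describe every unmet requirement; an empty list means all is well."""
    problems = []
    if sys.version_info[:2] < tuple(min_python):
        found = "%d.%d" % sys.version_info[:2]
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required, found {found}")
    problems.extend(f"missing package: {name}" for name in missing_modules(required))
    return problems


if __name__ == "__main__":
    issues = environment_problems()
    for issue in issues:
        print(issue)
    sys.exit(1 if issues else 0)
```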

Dataset

Our Novels dataset can be downloaded here.

Usage

  • Clone this repository to a local path.
  • python manage.py runserver  # run the server on the default port 8000
  • Open http://127.0.0.1:8000/index in a browser.
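Once the server is running, a small standard-library check like the one below can confirm that the index page responds. is_index_up is a hypothetical helper, not part of the repository; its default URL is the local address from the steps above, and the same call works against any other deployment URL.

```python
"""Smoke-check that the search page responds (hypothetical helper)."""
import urllib.request


def is_index_up(url="http://127.0.0.1:8000/index", timeout=5.0):
    """Return True if a GET request to url answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError (e.g. a 404), refused connections and
        # socket timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    print("up" if is_index_up() else "down")
```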

Building the index for the Novels and HillaryEmails corpora takes about 40 minutes. Because of this, we have already deployed the system on a server, which you can access at http://154.8.218.119:10101/index. Note that the server is physically located in China, so the connection may be slow. Thank you for your patience.
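The README does not describe how the index is built. As a rough illustration of what creating an index involves, here is a minimal inverted index with Boolean AND retrieval. It is a generic sketch, not the code in components.py or main.py: it uses a plain regex tokenizer in place of whatever nltk processing the project performs, and it assumes document ids are sortable (for example, integers).

```python
"""Minimal inverted index with Boolean AND retrieval (generic sketch)."""
import re
from collections import defaultdict

# \w+ matches runs of Unicode letters/digits; it does no word segmentation
# for languages written without spaces. Stand-in for the project's nltk pipeline.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Lower-case text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def build_index(docs):
    """Map each term to the sorted list of doc ids containing it (its postings list)."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_and(index, query):
    """Return the sorted ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # Intersect the shortest postings lists first to keep intermediate sets small.
    lists = sorted((index.get(term, []) for term in terms), key=len)
    result = set(lists[0])
    for plist in lists[1:]:
        result.intersection_update(plist)
    return sorted(result)
```

For example, search_and(build_index({1: "The old man and the sea", 2: "The sea wolf"}), "old sea") returns [1], because only document 1 contains both terms.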
