maztak / pysearch

Web crawler and Search engine in Python.

Home Page: http://nwpct1.hatenablog.com/entry/python-search-engine

Search Engine and Web Crawler in Python

  • Implement a web crawler
  • Japanese morphological analysis using Janome
  • Implement a search engine
  • Store in MongoDB
  • Web frontend using Flask

More details are available on my tech blog (in Japanese).
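
As a rough illustration of the search-engine feature listed above, here is a minimal inverted-index sketch using only the standard library. It is not the code in this repository: `tokenize`, `build_index`, `search`, and the sample pages are hypothetical, and the whitespace tokenizer is a stand-in for the morphological analysis that Japanese text would need.

```python
from collections import defaultdict


def tokenize(text):
    # Stand-in tokenizer: lowercase and split on whitespace.
    # Japanese text would need a morphological analyzer such as Janome instead.
    return text.lower().split()


def build_index(pages):
    """Map each token to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every token of the query (AND search)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


# Hypothetical crawled pages: URL -> extracted text.
pages = {
    "http://example.com/a": "Python web crawler",
    "http://example.com/b": "Python search engine",
}
index = build_index(pages)
```

Calling `search(index, "python crawler")` returns only the pages that contain both words. A real deployment would persist the index (for example in MongoDB) instead of keeping it in memory.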

Requirements

  • Python 3.5

Setup

  1. Clone repository

    $ git clone git@github.com:maztak/pysearch.git
    
  2. Install Python packages

    $ cd pysearch
    $ pip3 install -r requirements.txt -c constraints.txt
    
  3. MongoDB settings

    Install MongoDB

    $ 
    

    Install pymongo

    $
    

    Start the mongo shell and switch to the database to use (e.g. index). MongoDB creates the database when data is first written to it.

    $ mongo
    > use index
    

    In config.py, set the MongoDB URL as shown below.

    MONGO_URL = 'mongodb://127.0.0.1:27017/index'
    

    If you want a GUI for MongoDB, I recommend the free tool Robo 3T. Robo 3T alone is sufficient (Studio 3T is not needed).

  4. Run

    $ python3 manage.py crawler # build an index
    $ python3 manage.py webpage # then access http://127.0.0.1:9000
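
To make the MONGO_URL setting from step 3 more concrete, here is a minimal sketch that splits the URL into host, port, and database name using only the standard library. `describe_mongo_url` and `connect` are hypothetical helpers, not part of pysearch; `connect` additionally assumes that pymongo is installed and a MongoDB server is running.

```python
from urllib.parse import urlparse

MONGO_URL = 'mongodb://127.0.0.1:27017/index'


def describe_mongo_url(url):
    """Split a MongoDB URL into host, port, and database name."""
    parsed = urlparse(url)
    return {
        "host": parsed.hostname,
        "port": parsed.port,
        "database": parsed.path.lstrip("/"),
    }


def connect(url=MONGO_URL):
    """Return the database named in the URL (assumes pymongo and a running server)."""
    # Imported lazily so describe_mongo_url works without pymongo installed.
    from pymongo import MongoClient
    return MongoClient(url).get_default_database()


info = describe_mongo_url(MONGO_URL)
```

The last path segment of the URL is the database name, so it should match the name used with `use` in the mongo shell.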
    

Languages

  • Python 78.8%
  • HTML 21.2%