PC-Pedia / Crawler

Simple Crawler, Indexer and Search Engine Web Application

Crawler

A simple crawler, indexer, and search engine web application.

NuGet Restore

Open the project, right-click the solution, and choose NuGet package restore. Wait until the package restore completes.

Configuration

  1. Build and run the first project, Crawler. Starting from its seed, it downloads sites recursively (breadth-first search) and stores them in the Data.Db and Crawler.Db files. When you have gathered enough data, simply close the program.

  2. Build and run the second project, Indexer. Copy the Crawler.Db file from the previous step here. When the program opens, it starts indexing the downloaded data and generates three files: Sites.Db, TitleIndex.Db, and BodyIndex.Db.

  3. Copy the files generated in the previous step to the App_Data folder.
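
The crawl-then-index flow described in the steps above can be sketched roughly as follows. This is only an illustration in Python, not the project's C# code: the names `crawl_bfs`, `build_index`, and the in-memory `PAGES` data are hypothetical, and nothing here touches the network or the .Db files. It shows a breadth-first visit starting from a seed, followed by building separate title and body word indexes, which is assumed to be roughly what TitleIndex.Db and BodyIndex.Db hold.

```python
from collections import defaultdict, deque


def crawl_bfs(seed, get_links, max_pages=100):
    """Return page ids in breadth-first visit order, starting from seed."""
    order = []
    seen = {seed}
    queue = deque([seed])
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen:  # enqueue each page only once
                seen.add(link)
                queue.append(link)
    return order


def build_index(pages, field):
    """Map each lowercased word in the given field to the pages containing it."""
    index = defaultdict(set)
    for url, page in pages.items():
        for word in page[field].lower().split():
            index[word].add(url)
    return index


# Hypothetical stand-in for downloaded sites (assumption, not real crawl data).
PAGES = {
    "a": {"title": "Home Page", "body": "welcome to the crawler demo", "links": ["b", "c"]},
    "b": {"title": "About", "body": "about the demo", "links": ["d", "a"]},
    "c": {"title": "Contact Page", "body": "contact us", "links": ["d"]},
    "d": {"title": "Deep", "body": "deep page", "links": []},
}

order = crawl_bfs("a", lambda url: PAGES[url]["links"])
crawled = {url: PAGES[url] for url in order}
title_index = build_index(crawled, "title")
body_index = build_index(crawled, "body")
```

Keeping the title and body indexes separate lets a search weight title matches differently from body matches, which may be the reason the Indexer emits two index files.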

Enjoy.

License: MIT License


Languages

JavaScript 48.8%, CSS 40.5%, C# 9.4%, HTML 1.3%, ASP.NET 0.0%, PowerShell 0.0%