ZdsAlpha / SearchEngine

Implementation of Google Research Paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine" on Wikipedia dataset along with intense optimizations and data parallelism.

Large-Scale Search Engine

It can efficiently index a 60 GB Wikipedia data dump containing millions of articles in about 3 hours on an average laptop, by using all available system resources and eliminating processing bottlenecks.
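The data parallelism mentioned above can be sketched as follows. This is a minimal, hypothetical illustration (not the repository's actual code): it builds an inverted index (term → document ids) across all CPU cores with `Parallel.ForEach` and a thread-safe `ConcurrentDictionary`; the real project's tokenization and on-disk structures may differ.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical sketch of a parallel inverted-index build.
class InvertedIndexer
{
    // Thread-safe map from term to the ids of documents containing it.
    private readonly ConcurrentDictionary<string, ConcurrentBag<int>> index = new();

    public void IndexAll(IEnumerable<(int Id, string Text)> articles)
    {
        // Each article is tokenized and merged into the index on a worker thread.
        Parallel.ForEach(articles, article =>
        {
            // Naive whitespace tokenizer; a real indexer would also
            // strip wiki markup, normalize, and stem terms.
            foreach (var term in article.Text
                         .Split(' ', StringSplitOptions.RemoveEmptyEntries)
                         .Select(t => t.ToLowerInvariant())
                         .Distinct())
            {
                index.GetOrAdd(term, _ => new ConcurrentBag<int>()).Add(article.Id);
            }
        });
    }

    public IReadOnlyCollection<int> Lookup(string term) =>
        index.TryGetValue(term.ToLowerInvariant(), out var docs)
            ? docs
            : Array.Empty<int>();
}
```

Using a concurrent dictionary keyed by term lets every core index articles independently, which is what allows the full machine to stay busy during the multi-hour build.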

Dataset

I have used the Wikipedia dump (60 GB) in XML format. It can be found at this link: https://en.wikipedia.org/wiki/Wikipedia:Database_download#XML_schema
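Since the dump is far too large to load into memory, it has to be read as a stream. A minimal sketch of that approach, assuming the `page`/`title`/`text` element names from the dump's published schema (the repository's actual parser may be structured differently):

```csharp
using System;
using System.Xml;

// Hypothetical sketch: stream articles out of the Wikipedia XML dump
// one element at a time, so memory use stays flat regardless of file size.
class DumpReader
{
    public static void ReadPages(string path, Action<string, string> onPage)
    {
        using var reader = XmlReader.Create(path);
        string title = null;
        while (reader.Read())
        {
            if (reader.NodeType != XmlNodeType.Element) continue;
            if (reader.Name == "title")
                title = reader.ReadElementContentAsString();
            else if (reader.Name == "text" && title != null)
                onPage(title, reader.ReadElementContentAsString()); // article wikitext
        }
    }
}
```

`XmlReader` is forward-only, so each `<page>` is handed to the callback and then discarded, which is what makes indexing a 60 GB file on a laptop feasible.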

Languages

C#: 97.5%
HTML: 2.5%