divithraju / divith-raju-SearchEngine-Wikipedia

A complete search engine experience built on top of a 75 GB Wikipedia corpus, with subsecond latency for searches. Results are wiki pages ordered by TF-IDF relevance to the given search word(s). From optimized code to the k-way merge sort algorithm, this project addresses latency, indexing, and big data challenges.
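The description names two techniques, a k-way merge and TF-IDF ranking, without showing how they fit together. The following is a minimal sketch, not the repository's actual code: it assumes the k-way merge is used to combine sorted partial index runs (a common way to index a corpus too large for memory) and that a document's score is the sum of `tf * log(N / df)` over the query terms. The function names `build_run`, `merge_runs`, and `search` are hypothetical, and the runs are kept in memory here for brevity.

```python
import heapq
import math
from collections import Counter, defaultdict
from itertools import groupby
from operator import itemgetter


def build_run(docs):
    """Build one sorted run of (term, doc_id, term_frequency) postings.

    `docs` maps doc_id -> text. Tokenization is plain whitespace splitting,
    which is an illustrative simplification.
    """
    postings = []
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            postings.append((term, doc_id, tf))
    return sorted(postings)


def merge_runs(runs):
    """K-way merge sorted runs into one index: term -> {doc_id: tf}.

    heapq.merge is lazy, so the same pattern would work with iterators
    that read postings from files on disk (assumed, not shown here).
    """
    index = defaultdict(dict)
    for term, group in groupby(heapq.merge(*runs), key=itemgetter(0)):
        for _, doc_id, tf in group:
            index[term][doc_id] = tf
    return dict(index)


def search(index, n_docs, query):
    """Return doc ids ranked by summed TF-IDF over the query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in the corpus
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    batch_a = {1: "python search engine", 2: "wikipedia corpus index"}
    batch_b = {3: "search index merge sort", 4: "python python ranking"}
    index = merge_runs([build_run(batch_a), build_run(batch_b)])
    print(search(index, n_docs=4, query="search index"))
```

Because each run is already sorted, the merge only ever compares the current head of each run, which is what keeps memory use bounded when the runs live on disk.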

search-engine

search engine optimization


Languages

Python 99.7%, Shell 0.3%