sahar-hamdi / Web-Crawler-Information-Retrieval-

A web crawler and retrieval pipeline: 1- crawls Wikipedia starting from 2 seed URLs, of which this one is listed: https://en.wikipedia.org/wiki/List_of_pharaohs 2- builds an inverted index over the visited pages 3- gets a query (a set of words) 4- computes the cosine similarity between each file and the query 5- ranks the top k=10 files by cosine similarity

Web-Crawler-Information-Retrieval-

1- A web crawler that crawls Wikipedia starting from 2 seed URLs, of which this one is listed: https://en.wikipedia.org/wiki/List_of_pharaohs

2- Builds an inverted index over the visited pages

3- Gets a query (a set of words)

4- Computes the cosine similarity between each file and the query

5- Ranks the top k=10 files by cosine similarity
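The indexing and ranking steps above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the class name `CosineRanker`, its method names, the simple lowercase tokenizer, and the use of raw term frequencies as vector weights are all assumptions made for the example (a TF-IDF weighting could be swapped in). Crawling and HTML parsing are left out, so documents are passed in as plain strings.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: an in-memory inverted index with cosine-similarity ranking.
class CosineRanker {
    // Inverted index: term -> (document id -> term frequency in that document).
    private final Map<String, Map<String, Integer>> index = new HashMap<>();

    // Assumed tokenizer: lowercase, split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Adds one crawled page (already reduced to plain text) to the index.
    void addDocument(String docId, String text) {
        for (String term : tokenize(text)) {
            index.computeIfAbsent(term, k -> new HashMap<>()).merge(docId, 1, Integer::sum);
        }
    }

    // Returns up to k (docId, cosine) pairs, best first.
    // Documents sharing no term with the query have cosine 0 and are omitted.
    List<Map.Entry<String, Double>> rank(String query, int k) {
        Map<String, Integer> queryTf = new HashMap<>();
        for (String t : tokenize(query)) queryTf.merge(t, 1, Integer::sum);

        // Dot product of the query vector with each document vector, via the postings.
        Map<String, Double> dot = new HashMap<>();
        double queryNormSq = 0.0;
        for (Map.Entry<String, Integer> q : queryTf.entrySet()) {
            queryNormSq += (double) q.getValue() * q.getValue();
            Map<String, Integer> postings =
                    index.getOrDefault(q.getKey(), Collections.<String, Integer>emptyMap());
            for (Map.Entry<String, Integer> p : postings.entrySet()) {
                dot.merge(p.getKey(), (double) q.getValue() * p.getValue(), Double::sum);
            }
        }

        // Squared Euclidean norm of every document vector (recomputed per query for brevity).
        Map<String, Double> docNormSq = new HashMap<>();
        for (Map<String, Integer> postings : index.values()) {
            for (Map.Entry<String, Integer> p : postings.entrySet()) {
                docNormSq.merge(p.getKey(), (double) p.getValue() * p.getValue(), Double::sum);
            }
        }

        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, Double> e : dot.entrySet()) {
            double denom = Math.sqrt(queryNormSq) * Math.sqrt(docNormSq.get(e.getKey()));
            scored.add(new AbstractMap.SimpleEntry<String, Double>(e.getKey(), e.getValue() / denom));
        }
        scored.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        return new ArrayList<>(scored.subList(0, Math.min(k, scored.size())));
    }

    public static void main(String[] args) {
        CosineRanker ranker = new CosineRanker();
        ranker.addDocument("page-1", "List of pharaohs of ancient Egypt");
        ranker.addDocument("page-2", "The Nile is a river in Egypt");
        for (Map.Entry<String, Double> hit : ranker.rank("pharaohs of Egypt", 10)) {
            System.out.println(hit.getKey() + " " + hit.getValue());
        }
    }
}
```

Going through the postings of the query terms means only documents that share at least one term with the query are scored, which is the main reason to build the inverted index before ranking. In a larger crawl the document norms would normally be precomputed once at indexing time rather than on every query.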

Languages

Language: Java 100.0%