This project implements a simple search engine. The application is divided into two parts: a front end and a back end. The structure of the software is described in more detail below.
FRONT END
The front end is a simple interface through which the user accesses the search engine.
index.html
script.js
style.css
BACK END
All the work of crawling, indexing and processing queries is done here.
- application
Application.java Response.java RouteController.java
- services
- crawler
CrawlTask.java Crawler.java
- indexer
Indexer.java IndexingTask.java
- query_processor
Query.java
- crawler
- util
ArrayIndexerComparator.java HtmlDocument.java StopWords.java Tupple.java
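The indexing and query-processing work named above can be pictured with a minimal inverted index. This is only a sketch under stated assumptions, not the code in Indexer.java or Query.java: the class name TinyIndex, the short stop-word list, and the ranking rule (number of distinct query terms a document contains) are all hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; the project's real indexer may work differently.
final class TinyIndex {
    // Assumed stop-word list; the project's StopWords.java likely holds a longer one.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "of", "the");

    // term -> ids of the documents that contain it
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Lower-case the text, split on non-alphanumeric characters, drop stop words.
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Register every term of the document in the postings map.
    void add(String docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Return at most `limit` document ids, ordered by how many distinct query terms they match.
    List<String> search(String query, int limit) {
        Map<String, Integer> scores = new TreeMap<>();
        for (String term : new LinkedHashSet<>(tokenize(query))) {
            for (String docId : postings.getOrDefault(term, Set.of())) {
                scores.merge(docId, 1, Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        // The sort is stable, so documents with equal scores stay in alphabetical order.
        ranked.sort((a, b) -> scores.get(b) - scores.get(a));
        return ranked.subList(0, Math.max(0, Math.min(limit, ranked.size())));
    }
}
```

The `limit` parameter plays the role of the "number of pages you want to get as a result" that the front end asks for. A real engine would typically weight terms (for example with TF-IDF) rather than count matches.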
- Chrome extension : Allow CORS: Access-Control-Allow-Origin
- Java 13.0.2 or higher
- IntelliJ IDEA (if you choose the second option)
- Maven (if you choose the second option)
First Way : download the precompiled .jar file from the repository
- Download search-engine-1.0-SNAPSHOT.jar
- Open a command line in the folder where you downloaded the .jar file
- Run the following command
java -jar search-engine-1.0-SNAPSHOT.jar <website> <number of pages to crawl> <number of threads> <use old data?(true/false)> <no crawling?(true/false)>
- Wait for the server to initialize (about 1-2 minutes, depending on the hardware used)
- Open the index.html file and run a search by entering a query and the number of pages you want to get as a result.
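The five positional arguments of the command above could be read along the following lines. This is a sketch under assumptions, not the project's Application.java: the class LaunchOptions and its field names are hypothetical, and the validation rules (exactly five arguments, positive counts) are a guess at sensible behaviour.

```java
import java.net.URI;

// Hypothetical holder for the five command-line arguments; not the project's actual class.
final class LaunchOptions {
    final URI website;          // <website>
    final int pagesToCrawl;     // <number of pages to crawl>
    final int threads;          // <number of threads>
    final boolean useOldData;   // <use old data?(true/false)>
    final boolean noCrawling;   // <no crawling?(true/false)>

    private LaunchOptions(URI website, int pagesToCrawl, int threads,
                          boolean useOldData, boolean noCrawling) {
        this.website = website;
        this.pagesToCrawl = pagesToCrawl;
        this.threads = threads;
        this.useOldData = useOldData;
        this.noCrawling = noCrawling;
    }

    // Parse the arguments in the order shown in the command above.
    static LaunchOptions parse(String[] args) {
        if (args.length != 5) {
            throw new IllegalArgumentException(
                "Expected 5 arguments: <website> <pages> <threads> <use old data> <no crawling>");
        }
        URI website = URI.create(args[0]);
        // Integer.parseInt throws NumberFormatException (an IllegalArgumentException) on bad input.
        int pages = Integer.parseInt(args[1]);
        int threads = Integer.parseInt(args[2]);
        if (pages < 1 || threads < 1) {
            throw new IllegalArgumentException("Page and thread counts must be positive");
        }
        // Boolean.parseBoolean yields true only for "true" (case-insensitive), false otherwise.
        boolean useOldData = Boolean.parseBoolean(args[3]);
        boolean noCrawling = Boolean.parseBoolean(args[4]);
        return new LaunchOptions(website, pages, threads, useOldData, noCrawling);
    }

    public static void main(String[] args) {
        LaunchOptions options = parse(args);
        System.out.println("Start page: " + options.website
            + ", pages: " + options.pagesToCrawl
            + ", threads: " + options.threads
            + ", use old data: " + options.useOldData
            + ", no crawling: " + options.noCrawling);
    }
}
```

Note that with `Boolean.parseBoolean` any value other than "true" (for example "yes") is silently read as false, so a stricter parser may be preferable if typos should be reported.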
Second Way : download the whole project and generate your own .jar file using Maven
- Download the project from the repository
- Import the project into your IDE (IntelliJ IDEA)
- Perform the Maven Lifecycle clean operation
- Perform the Maven Lifecycle package operation
- The .jar file is produced and saved in the newly created folder named target
- The remaining steps are the same as in the previous option
In the following tutorial we test the software using a precalculated dictionary that was created from the following website: https://en.wikipedia.org/wiki/Cabinet_of_Kyriakos_Mitsotakis
This page describes the composition of the current Greek government: the Ministers, the Ministries and the Prime Minister.
First, copy the following files into the directory that contains your .jar file:
Next, run the following command from the directory where you placed the above files, which is the directory containing search-engine-1.0-SNAPSHOT.jar
java -jar search-engine-1.0-SNAPSHOT.jar https://en.wikipedia.org/wiki/Cabinet_of_Kyriakos_Mitsotakis 200 8 true true