RahulDwiwedi / Vertical-Search-Engine

A vertical search engine, as distinct from a general web search engine, focuses on a specific segment of online content. Vertical search engines are also called specialty or topical search engines. The vertical content area may be based on topicality, media type, or genre of content. As people explore the options provided by Google, they quickly move out of the search engine and visit topical sites, such as Best Buy or Apple, and carry out specific searches there.

Vertical-Search-Engine

What is a Vertical Search Engine?

A vertical search engine, as distinct from a general web search engine, focuses on a specific segment of online content. Vertical search engines are also called specialty or topical search engines. The vertical content area may be based on topicality, media type, or genre of content. As people explore the options provided by Google, they quickly move out of the search engine and visit topical sites, such as Best Buy or Apple, and carry out specific searches there. Although the initial search activity often originates in Google, a majority of the follow-up searches, the “specific searches”, have shifted to vertical and topical sites. Specific searches are also becoming more popular with the rise of smartphones.

IMPORTANCE OF VERTICAL SEARCH ENGINE OVER OTHER SEARCH ENGINES

Vertical search offers several potential benefits over general search engines:

• Greater precision due to limited scope

• Ability to leverage domain knowledge, including taxonomies and ontologies

• Support for specific, unique user tasks

WORKING:-

(1) CRAWLING: Web crawlers, also known as “spiders” or “bots”, are used for crawling. A web crawler is a program or automated script that browses the World Wide Web in a methodical, automated manner. Crawling is a never-ending process, because pages are continually added and changed. The crawler module extracts data and key information from each page. Web crawlers can also copy the pages they visit and store them in a page repository for later processing. Large parts of the Internet, the “deep web”, were once essentially invisible to search engines (this is rarer now). TOR-hosted websites remain unindexed by Google and are accessible only by connecting to the TOR network and knowing the address.
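The crawl loop described above can be sketched in a few lines of Python. This is a minimal illustration rather than the repository's actual crawler: the names `LinkExtractor`, `crawl`, and `FAKE_WEB` are hypothetical, and the `fetch` argument is assumed to be any callable that returns a page's HTML (or `None` on failure). An in-memory dictionary stands in for the network so the example runs offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from seed; fetch(url) returns HTML or None."""
    repository = {}            # url -> raw HTML (the "page repository")
    frontier = deque([seed])   # URLs waiting to be visited
    seen = {seed}              # URLs already queued, to avoid revisits
    while frontier and len(repository) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        repository[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return repository


# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a>',
}

pages = crawl("http://example.test/", FAKE_WEB.get)
print(sorted(pages))
```

A real crawler would pass a `fetch` that performs HTTP requests, and would also need to respect robots.txt and rate limits, which this sketch leaves out.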

(2) INDEXING: Indexing refers to the various methods for indexing the contents of a website or of the Internet as a whole. Search engines usually use keywords and metadata to provide a more useful vocabulary for Internet or onsite searching. Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval. The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query. Indexed documents are then stored in databases.
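One common way to realise the indexing step is an inverted index, which maps each keyword to the set of documents containing it. The sketch below assumes that structure; the README does not say which index the project uses, and `tokenize`, `build_index`, `lookup`, and the sample `DOCS` are hypothetical names chosen for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map every term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def lookup(index, query):
    """Return the ids of documents containing every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents.
DOCS = {
    "d1": "Cheap laptops and phones",
    "d2": "Laptop repair guide",
    "d3": "Phones and laptops compared",
}

index = build_index(DOCS)
print(sorted(lookup(index, "laptops phones")))
```

Because the index is built once and queried many times, each lookup costs roughly the size of the matching posting sets rather than a scan over every stored page.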

(3) RETRIEVAL OR SEARCHING: When a user enters text to search for, the keywords are looked up in the database where condensed summaries of web pages are stored. After the web pages relevant to the entered keywords are found, a page rank algorithm is applied to those pages. The page with the highest rank is shown as the first result on the search results page.
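The ranking step can be illustrated with a simplified PageRank computed by power iteration. This is a sketch under stated assumptions, not the project's implementation: the damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages without outgoing links, and the names `pagerank`, `rank_results`, and `LINKS` are all choices made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank; links maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


def rank_results(matches, rank):
    """Order the pages that matched a query by descending rank."""
    return sorted(matches, key=lambda page: rank[page], reverse=True)


# Hypothetical link graph: a and b both link to c, and c links back to a.
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"]}

rank = pagerank(LINKS)
print(rank_results(["a", "b", "c"], rank))
```

Here page `c` receives links from two pages and ends up first, while `b`, which nothing links to, ends up last.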

TYPES OF WEB CRAWLER:

(1) Distributed Crawler: Many crawlers work together, sharing the crawling process among them, in order to cover as much of the web as possible. A central server manages the communication and synchronization of the nodes, which are geographically distributed. It typically uses a page rank algorithm to improve efficiency and search quality. The benefit of a distributed web crawler is that it is robust against system crashes and other events, and it can be adapted to various crawling applications.
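One way a central server might divide work among crawler nodes is to hash each URL's host name, so that all pages of a site go to the same node. The sketch below assumes that scheme purely for illustration; the README does not describe how nodes are assigned, and `assign_node` and `partition` are hypothetical names.

```python
import hashlib
from urllib.parse import urlsplit


def assign_node(url, num_nodes):
    """Pick a node index for a URL from a stable hash of its host name."""
    host = urlsplit(url).netloc.lower()
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(urls, num_nodes):
    """Group URLs into one work list per crawler node."""
    buckets = [[] for _ in range(num_nodes)]
    for url in urls:
        buckets[assign_node(url, num_nodes)].append(url)
    return buckets


# Hypothetical URLs; the two shop.example.test pages share a node.
URLS = [
    "http://shop.example.test/laptops",
    "http://shop.example.test/phones",
    "http://news.example.test/today",
]

for node, work in enumerate(partition(URLS, 3)):
    print(node, work)
```

Keeping a host on a single node makes per-site politeness limits easier to enforce, at the cost of uneven load when a few hosts dominate.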

(2) Parallel Crawler: Multiple crawlers are often run in parallel; these are referred to as parallel crawlers. Parallel crawlers depend on page freshness and page selection. A parallel crawler can run on a local network or be distributed across geographically distant locations. Parallelizing the crawling system is vital for downloading documents in a reasonable amount of time.
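On a single machine, parallel downloading can be sketched with a thread pool from Python's standard library. The example assumes `fetch` is any callable that takes a URL and returns its content; `fetch_all`, `fake_fetch`, and the sample URLs are hypothetical, and the fake fetch avoids real network access.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=4):
    """Download several URLs concurrently and return a url -> content map."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in the same order as the input URLs.
        bodies = list(pool.map(fetch, urls))
    return dict(zip(urls, bodies))


def fake_fetch(url):
    """Stand-in for a real HTTP download (hypothetical)."""
    return "content of " + url


URLS = ["http://example.test/a", "http://example.test/b"]
results = fetch_all(URLS, fake_fetch)
print(results)
```

Threads suit this task because downloading is dominated by waiting on the network rather than by computation.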

(3) Incremental Crawler: An incremental crawler refreshes the existing collection of pages by revisiting them, with a frequency based on an estimate of how often each page changes. It also replaces less important pages with new and more important ones. This addresses the problem of page freshness. The benefit of an incremental crawler is that only valuable data is provided to the user, so network bandwidth is saved and data enrichment is achieved.
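The idea of revisiting pages according to how often they change can be sketched with a priority queue of due times. The policy here is an assumption made for illustration: the revisit interval is halved when a page's content hash has changed since the last visit and doubled when it has not. `RevisitScheduler`, `fingerprint`, and the interval bounds are hypothetical.

```python
import hashlib
import heapq


def fingerprint(content):
    """Hash the page content so changes can be detected cheaply."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class RevisitScheduler:
    """Schedules revisits; pages that change often are visited sooner."""

    def __init__(self, min_interval=1.0, max_interval=64.0):
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.queue = []   # heap of (due_time, url)
        self.state = {}   # url -> (interval, last fingerprint)

    def add(self, url, now=0.0, interval=8.0):
        self.state[url] = (interval, None)
        heapq.heappush(self.queue, (now, url))

    def next_due(self):
        """Pop the URL with the earliest due time."""
        return heapq.heappop(self.queue)

    def record_visit(self, url, content, now):
        """Update the interval after a visit and schedule the next one."""
        interval, old_fp = self.state[url]
        new_fp = fingerprint(content)
        changed = old_fp is not None and new_fp != old_fp
        if changed:
            interval = max(self.min_interval, interval / 2)
        elif old_fp is not None:
            interval = min(self.max_interval, interval * 2)
        self.state[url] = (interval, new_fp)
        heapq.heappush(self.queue, (now + interval, url))
        return changed


scheduler = RevisitScheduler()
scheduler.add("http://example.test/news")
due, url = scheduler.next_due()
scheduler.record_visit(url, "version 1", due)         # first visit
due, url = scheduler.next_due()
print(due, scheduler.record_visit(url, "version 2", due))
```

Production crawlers usually estimate change rates statistically over many visits; the halve-or-double rule is only the simplest policy that shows the feedback loop.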

(4) Focused Web Crawler: A focused crawler tries to download pages that are related to each other. It collects documents that are specific and relevant to a given topic, which is why it is also known as a topic crawler. It determines how relevant a given page is to the topic and how to proceed from there. The benefit of a focused web crawler is that it is economically feasible in terms of hardware and network resources, and it can reduce the amount of network traffic and downloads. A vertical search engine uses a focused web crawler.
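A focused crawler can be sketched as a crawl loop that scores each page against the topic and follows links only from pages that score well. The relevance measure below, the fraction of topic terms that appear on the page, is a deliberately simple assumption, as are the threshold and the names `relevance`, `focused_crawl`, `TOPIC`, and `FAKE_WEB`. The `fetch` callable is assumed to return a `(text, links)` pair or `None`.

```python
import heapq
import re


def relevance(text, topic_terms):
    """Fraction of the topic terms that occur in the page text."""
    if not topic_terms:
        return 0.0
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(words & topic_terms) / len(topic_terms)


def focused_crawl(seed, fetch, topic_terms, threshold=0.5, max_pages=50):
    """Crawl from seed, keeping and expanding only on-topic pages."""
    frontier = [(-1.0, seed)]  # max-heap via negated priority
    seen = {seed}
    collected = {}             # url -> relevance score
    while frontier and len(collected) < max_pages:
        _, url = heapq.heappop(frontier)
        page = fetch(url)
        if page is None:
            continue
        text, links = page
        score = relevance(text, topic_terms)
        if score < threshold:
            continue  # off-topic: neither stored nor expanded
        collected[url] = score
        for link in links:
            if link not in seen:
                seen.add(link)
                # Links inherit the score of the page they were found on.
                heapq.heappush(frontier, (-score, link))
    return collected


# Hypothetical topic and in-memory pages: url -> (text, outgoing links).
TOPIC = {"laptop", "battery", "screen"}
FAKE_WEB = {
    "s": ("laptop battery and screen review", ["a", "b"]),
    "a": ("laptop screen sizes", ["c"]),
    "b": ("holiday recipes", ["d"]),
    "c": ("battery care for your laptop", []),
    "d": ("laptop battery screen deals", []),
}

collected = focused_crawl("s", FAKE_WEB.get, TOPIC)
print(sorted(collected))
```

Page `d` is on topic but is never downloaded, because the only link to it sits on the off-topic page `b`. This is the trade-off that saves bandwidth: a focused crawler accepts missing some relevant pages in exchange for not exploring irrelevant regions.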



Languages

Python 71.5%, CSS 21.7%, HTML 6.8%