Anzo52 / JCrawl

Java web crawler

JCrawl

JCrawl - Java Websites Crawler

JCrawl is a basic web crawler implemented in Java. It scrapes web pages starting from a given URL and extracts the links found on those pages. Web crawling is the process of navigating web pages and extracting information from them; it is commonly used by search engines and web scrapers.

Features

  • Web crawling from a starting URL.
  • Limit the number of links to scrape by setting a breakpoint.
  • Extract links from web pages.
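
The features above can be illustrated with a minimal sketch. This is not the code in JCrawl.java: the class name CrawlSketch, the methods extractLinks and crawl, the fetcher parameter, and the example.com pages are all hypothetical, chosen only to show the idea of a breadth-first crawl that stops once a breakpoint number of links has been collected. Pages are served from an in-memory map so the example runs without network access, and it assumes Java 9 or later (for Map.of).

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names and structure are assumptions, not JCrawl's actual code.
class CrawlSketch {
    // Simplified pattern for href values in anchor tags (not a full HTML parser).
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Extracts http(s) links from html, resolving relative links against base.
    static List<String> extractLinks(String html, URI base) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                URI resolved = base.resolve(m.group(1).trim());
                String scheme = resolved.getScheme();
                if ("http".equals(scheme) || "https".equals(scheme)) {
                    links.add(resolved.toString());
                }
            } catch (IllegalArgumentException e) {
                // Skip href values that are not valid URIs.
            }
        }
        return links;
    }

    // Breadth-first crawl from startUrl that stops once `breakpoint` distinct links are found.
    // `fetcher` maps a URL to its HTML, or null if the page is unavailable.
    static List<String> crawl(String startUrl, int breakpoint, Function<String, String> fetcher) {
        Set<String> found = new LinkedHashSet<>();   // links in discovery order
        Set<String> visited = new HashSet<>();       // pages already fetched
        Queue<String> frontier = new ArrayDeque<>(); // pages waiting to be fetched
        frontier.add(startUrl);
        while (!frontier.isEmpty() && found.size() < breakpoint) {
            String url = frontier.remove();
            if (!visited.add(url)) {
                continue; // already fetched this page
            }
            String html = fetcher.apply(url);
            if (html == null) {
                continue;
            }
            for (String link : extractLinks(html, URI.create(url))) {
                if (found.size() >= breakpoint) {
                    break;
                }
                if (found.add(link)) {
                    frontier.add(link);
                }
            }
        }
        return new ArrayList<>(found);
    }

    public static void main(String[] args) {
        // In-memory stand-in for the web, so the sketch runs offline.
        Map<String, String> pages = Map.of(
                "https://example.com/", "<a href=\"/a\">A</a> <a href=\"/b\">B</a>",
                "https://example.com/a", "<a href=\"/c\">C</a> <a href=\"/\">Home</a>",
                "https://example.com/b", "<a href=\"/d\">D</a>");
        for (String link : crawl("https://example.com/", 3, pages::get)) {
            System.out.println(link);
        }
    }
}
```

To crawl real pages, the fetcher could be replaced with a function that downloads each URL, for example with java.net.http.HttpClient. Passing the fetcher as a parameter keeps the crawl logic testable without network access.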

Prerequisites

  • Java Development Kit (JDK) installed on your system.

Usage

  1. Clone or download this repository to your local machine.
  2. Compile the JCrawl.java file using javac: javac JCrawl.java

  3. Run the program: java JCrawl

License: GNU General Public License v3.0

Languages

Language: Java 100.0%