Scraper

This is a Python-based web scraping application, built with BeautifulSoup, that lets you extract data from websites in a simple and efficient way. It handles links and pagination, making it easy to scrape multiple pages or follow links within a site, and saves the results to a CSV file.
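
At its core, the extraction step works along these lines. This is only a minimal sketch using requests and BeautifulSoup; the function and parameter names are illustrative, not the application's actual API:

    import requests
    from bs4 import BeautifulSoup

    def scrape(url, element, class_name=None, id_name=None):
        # Fetch the page and parse it with BeautifulSoup.
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")

        # Build the optional attribute filter from the class/ID inputs.
        attrs = {}
        if class_name:
            attrs["class"] = class_name
        if id_name:
            attrs["id"] = id_name

        # Return the text of every matching element.
        return [tag.get_text(strip=True) for tag in soup.find_all(element, attrs=attrs)]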

Features

  • Extract data from websites using specified element names, class names, and ID names.
  • Handle links and pagination to scrape multiple pages or follow links within a website (see the sketch after this list).
  • Save the scraped data to a CSV file for further analysis or processing.
  • Customizable user agent selection to mimic different web browsers or devices.
  • Modern GUI interface for easy input and interaction.
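
The pagination and CSV features above might look roughly like the following. Again, this is a sketch under assumptions: the "next page" selector and the one-column CSV layout depend on the target site and are not the application's real internals:

    import csv
    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin

    def scrape_all_pages(start_url, element, max_pages=10):
        rows, url = [], start_url
        for _ in range(max_pages):
            soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
            rows.extend(tag.get_text(strip=True) for tag in soup.find_all(element))

            # Follow the "next" link if the page exposes one; stop otherwise.
            next_link = soup.find("a", rel="next")
            if not next_link or not next_link.get("href"):
                break
            url = urljoin(url, next_link["href"])
        return rows

    def save_to_csv(rows, path="output.csv"):
        # Write one scraped value per row in a single-column CSV.
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["text"])
            writer.writerows([row] for row in rows)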

Installation

  1. Clone the repository: git clone https://github.com/pi22by7/scraper.git
  2. Navigate to the project directory: cd scraper
  3. Install the required dependencies: pip install -r requirements.txt

Usage

  1. Run the application: python gui.py
  2. Enter the URL to scrape, element name, class name (optional), and ID name (optional) in the GUI.
  3. [WIP] Optionally, select a user agent from the dropdown menu to mimic different web browsers or devices.
  4. Click the "Scrape" button to start the scraping process.
  5. The scraped data will be saved to a CSV file in your chosen directory.
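
The user-agent option in step 3 amounts to sending a custom User-Agent header with each request. A minimal illustration of the idea (the header string is just an example, not the application's exact code):

    import requests

    # Pretend to be a desktop Chrome browser; swap the string to mimic another client.
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"}
    response = requests.get("https://example.com", headers=headers, timeout=10)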

Screenshots

Screenshot

Contributing

Contributions are welcome! If you have any suggestions, bug reports, or feature requests, please open an issue or submit a pull request.

License

This project is licensed under the GNU General Public License v3.0 (GPLv3).
