gayathri-pan / rateGain


Home Page: https://github.com/gayathri-pan


RateGain

Web Scraping Hackathon Solution


Welcome to my solution for the web scraping hackathon! In this challenge, I developed a program using Python and the Scrapy library to extract specific information from the "https://rategain.com/blog" webpage. The goal was to collect data from various blog posts, including blog titles, publication dates, image URLs, and the number of likes each post has received.

Project Screenshots:


[Screenshots: image, image-1]

๐Ÿ› ๏ธ Setup:


1. Before running the program, make sure you have the required dependencies installed. You can install them using the following command:

pip install scrapy

๐Ÿง How the solution Works


1. Navigating Through Pages: The program starts by visiting the target URL and navigates through various pages to gather data comprehensively. It uses Scrapy's capabilities for web crawling and scraping.

2. Extracting Information:

  • Blog Title: The program captures the titles of the blog posts.
  • Blog Date: It retrieves the publication dates of each blog post.
  • Blog Image URL: The program extracts the URLs of the images associated with the blogs.
  • Blog Likes Count: It records the number of likes each blog post has received.

3. Data Management:

  • The extracted data is organized and saved efficiently.
  • The preferred format for storage is CSV for ease of use and compatibility with various analysis tools.

🚀 Running the Solution


1. Clone this repository:

git clone https://github.com/your-username/hackathon-web-scraping.git

2. Navigate to the project directory:

cd hackathon-web-scraping

3. Run the Scrapy spider:

scrapy crawl blog_scraper -o output_data.csv

This command will execute the spider and save the extracted data in a CSV file named output_data.csv.
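If you prefer not to pass `-o` on every run, the same output can be configured once in the project's settings.py via Scrapy's FEEDS setting (a sketch; the `overwrite` key requires Scrapy 2.4 or newer):

```python
# settings.py — equivalent to passing `-o output_data.csv` on the command line
FEEDS = {
    "output_data.csv": {"format": "csv", "overwrite": True},
}
```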

Results


After running the solution, you will have a CSV file (output_data.csv) containing the organized data extracted from the target webpage. This file is ready for further analysis and exploration.
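As a quick sanity check on the results, the file can be read back with Python's standard csv module. This sketch assumes the column names the spider yielded (title, date, image_url, likes) and uses a hand-written sample string in place of real output:

```python
import csv
import io


def summarize(csv_text):
    """Return (row count, total likes) for a scraped CSV (assumed columns)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    total_likes = sum(int(row["likes"] or 0) for row in rows)
    return len(rows), total_likes


# Two hand-written rows standing in for real output_data.csv content:
sample = (
    "title,date,image_url,likes\n"
    "Post A,Nov 1 2023,https://example.com/a.jpg,3\n"
    "Post B,Nov 2 2023,https://example.com/b.jpg,7\n"
)
count, likes = summarize(sample)
```

In practice you would pass `open("output_data.csv").read()` instead of the sample string.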

Feel free to reach out if you have any questions or need assistance with the solution. Happy coding!

💻 Built with


Technologies used in the project:
  • Python
  • Scrapy
  • Visual Studio Code
  • Web Scraping
  • Data Extraction


