gayathri-pan / rateGain


Home Page: https://github.com/gayathri-pan


RateGain

Web Scraping Hackathon Solution


Welcome to my solution for the web scraping hackathon! In this challenge, I developed a program using Python and the Scrapy library to extract specific information from the "https://rategain.com/blog" webpage. The goal was to collect data from various blog posts, including blog titles, publication dates, image URLs, and the number of likes each post has received.

Project Screenshots:


[Screenshots: image, image-1]

๐Ÿ› ๏ธ Setup:


1. Before running the program, make sure you have the required dependencies installed. You can install them using the following command:

pip install scrapy

๐Ÿง How the solution Works


1. Navigating Through Pages: The program starts by visiting the target URL and navigates through various pages to gather data comprehensively. It uses Scrapy's capabilities for web crawling and scraping.

2. Extracting Information:

  • Blog Title: The program captures the titles of the blog posts.
  • Blog Date: It retrieves the publication dates of each blog post.
  • Blog Image URL: The program extracts the URLs of the images associated with the blogs.
  • Blog Likes Count: It records the number of likes each blog post has received.

3. Data Management:

  • The extracted data is organized and saved efficiently.
  • The preferred format for storage is CSV for ease of use and compatibility with various analysis tools.

🚀 Running the Solution


1. Clone this repository:

git clone https://github.com/your-username/hackathon-web-scraping.git

2. Navigate to the project directory:

cd hackathon-web-scraping

3. Run the Scrapy spider:

scrapy crawl blog_scraper -o output_data.csv

This command will execute the spider and save the extracted data in a CSV file named output_data.csv.
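If you prefer not to pass `-o` on every run, the same output can be configured once in the project's settings.py via Scrapy's FEEDS setting (a sketch; the `overwrite` key requires Scrapy 2.4 or newer):

```python
# settings.py — equivalent to passing `-o output_data.csv` on the command line
FEEDS = {
    "output_data.csv": {"format": "csv", "overwrite": True},
}
```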

Results


After running the solution, you will have a CSV file (output_data.csv) containing the organized data extracted from the target webpage. This file is ready for further analysis and exploration.
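As a quick sanity check on the results, the file can be read back with Python's standard csv module. This sketch assumes the column names the spider yielded (title, date, image_url, likes) and uses a hand-written sample string in place of real output:

```python
import csv
import io


def summarize(csv_text):
    """Return (row count, total likes) for a scraped CSV (assumed columns)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    total_likes = sum(int(row["likes"] or 0) for row in rows)
    return len(rows), total_likes


# Two hand-written rows standing in for real output_data.csv content:
sample = (
    "title,date,image_url,likes\n"
    "Post A,Nov 1 2023,https://example.com/a.jpg,3\n"
    "Post B,Nov 2 2023,https://example.com/b.jpg,7\n"
)
count, likes = summarize(sample)
```

In practice you would pass `open("output_data.csv").read()` instead of the sample string.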

Feel free to reach out if you have any questions or need assistance with the solution. Happy coding!

💻 Built with


Technologies used in the project:
  • Python
  • Scrapy
  • Visual Studio Code
  • Web Scraping
  • Data Extraction


